mirror of
https://gerrit.wikimedia.org/r/mediawiki/extensions/DiscussionTools
synced 2024-11-24 08:23:52 +00:00
6f32369b6a
In JS, strings are internally encoded as UTF-16, and properties like .length return values in UTF-16 code units. In PHP, strings are internally encoded as UTF-8, and we have the option of using functions that return bytes, like strlen(), or code points, like mb_strlen(). However, the offsets produced by preg_match( …, PREG_OFFSET_CAPTURE ) are in bytes, and there's nothing we can do about that. So let's use bytes throughout, because mixing the two types results in meaningless numbers. Then in the test code, we have to calculate UTF-16 code unit offsets based on the UTF-8 byte offsets. We also have to copy the entire workaround for mw:Entity nodes… Maybe the parser should be fixed to return the real nodes for ranges' ends in this case. Change-Id: I05804489d7de0d60be6e9f84e6a49a885e9fb870 |
||
---|---|---|
.. | ||
DiscussionToolsCommentParserTest.php | ||
DiscussionToolsTestCase.php |
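
The offset conversion described in the commit message can be illustrated with a minimal sketch. This is not the code from the test files listed above; it is written in Python for illustration only, and the function name `utf8_byte_offset_to_utf16_units` is hypothetical. It assumes the byte offset falls on a character boundary, which should hold for offsets of the kind preg_match() reports for valid UTF-8 input.

```python
def utf8_byte_offset_to_utf16_units(text: str, byte_offset: int) -> int:
    """Convert an offset in UTF-8 bytes into an offset in UTF-16 code units.

    Hypothetical helper for illustration; not part of DiscussionTools.
    """
    data = text.encode("utf-8")
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte offset is outside the string")
    # Decode only the bytes before the offset. This raises UnicodeDecodeError
    # if the offset splits a multi-byte character.
    prefix = data[:byte_offset].decode("utf-8")
    # Every UTF-16 code unit is two bytes, so halve the encoded length.
    # "utf-16-le" is used because it does not prepend a byte order mark.
    return len(prefix.encode("utf-16-le")) // 2


# "é" is 2 bytes in UTF-8 but 1 UTF-16 code unit;
# "😀" is 4 bytes in UTF-8 but 2 UTF-16 code units (a surrogate pair).
sample = "aé😀b"
offsets = [utf8_byte_offset_to_utf16_units(sample, n) for n in (0, 1, 3, 7, 8)]
```

The two kinds of offsets agree for ASCII text and drift apart after the first non-ASCII character, which is why mixing them gives meaningless numbers.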