mirror of
https://gerrit.wikimedia.org/r/mediawiki/extensions/DiscussionTools
synced 2024-11-15 03:44:02 +00:00
c1f4668806
The PHP DOM extension measures lengths and offsets in Unicode codepoints. Our PHP code used UTF-8 bytes, causing some offsets to be slightly off. Now it mostly uses Unicode codepoints as well (we're forced to use bytes in a few places, because preg_match returns offsets in bytes). In practice, the mismatch had no visible effect for the user. It caused the markers `<span data-mw-comment-end="..."></span>` to be placed at the end of their container instead of the correct position when the timestamp contained multibyte characters (e.g. "ź" in Polish), but the correct position is usually at the end of the container anyway. In the test cases, the only difference is that these markers are now placed before a trailing line break inside `<p>...</p>` tags rather than after it. The patch also accidentally fixes another bug, where element nodes with no children (mostly `<img>`) were incorrectly excluded when calling cloneContents(), because they were treated as if they were text nodes. Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee |
||
---|---|---|
ckb-big-parsoid-getHTML.json | ||
ckb-big-parsoid-getText.json | ||
ckb-big-parsoid.html | ||
ckb-big-parsoid.json |
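
The byte versus codepoint mismatch described in the commit message can be reproduced with a short PHP sketch. It is an illustration only, not code from DiscussionTools: the helper name `byteOffsetToCodepointOffset` and the sample string are assumptions. It relies on `preg_match` with `PREG_OFFSET_CAPTURE` reporting offsets in bytes, and converts such an offset by counting the codepoints in the text that precedes it.

```php
<?php
// Hypothetical helper for illustration; not taken from DiscussionTools.
function byteOffsetToCodepointOffset( string $text, int $byteOffset ): int {
	// Everything before the match, as raw UTF-8 bytes.
	$prefix = substr( $text, 0, $byteOffset );
	// With /u, "." consumes one whole codepoint (and /s lets it match newlines),
	// so the number of matches equals the number of codepoints in the prefix.
	$count = preg_match_all( '/./su', $prefix );
	if ( $count === false ) {
		// Happens when the prefix is not valid UTF-8, e.g. the byte offset
		// points into the middle of a multibyte character.
		throw new InvalidArgumentException( 'Prefix is not valid UTF-8' );
	}
	return $count;
}

// Sample timestamp fragment with a multibyte character ("ź" is two bytes in UTF-8).
$text = 'paź 2019';

// PREG_OFFSET_CAPTURE makes each match an array of [ matched string, byte offset ].
preg_match( '/\d{4}/', $text, $matches, PREG_OFFSET_CAPTURE );
$byteOffset = $matches[0][1];

// Offset in the unit that the commit message says the DOM extension uses.
$codepointOffset = byteOffsetToCodepointOffset( $text, $byteOffset );

printf( "byte offset: %d, codepoint offset: %d\n", $byteOffset, $codepointOffset );
```

For this sample the two offsets differ by one, because "ź" occupies two bytes but counts as a single codepoint. Passing the byte offset to an API that expects codepoints would therefore point one character too far, which matches the kind of "slightly off" offset the commit describes.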