mediawiki-extensions-Discus.../tests/cases/ckb-big-oldparser
Bartosz Dziewoński c1f4668806 Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes
The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).
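
To illustrate the mismatch, here is a minimal Python sketch (not the extension's PHP code). It searches the UTF-8 bytes of a Polish-style timestamp, mimicking how PHP's preg_match() with PREG_OFFSET_CAPTURE reports byte offsets, and then converts that result to a codepoint offset. The sample timestamp and the helper byte_to_codepoint_offset() are hypothetical and exist only for illustration.

```python
import re


def byte_to_codepoint_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into `text` to a Unicode codepoint offset.

    Hypothetical helper for illustration. Raises UnicodeDecodeError if the
    byte offset falls in the middle of a multibyte character.
    """
    prefix = text.encode("utf-8")[:byte_offset]
    return len(prefix.decode("utf-8"))


# Hypothetical timestamp containing a multibyte character ("ź", 2 bytes in UTF-8)
timestamp = "12:34, 1 paź 2021 (CEST)"

# Matching against the raw bytes yields a byte offset, like preg_match() does
match = re.search(rb"\(CEST\)", timestamp.encode("utf-8"))
byte_off = match.start()
cp_off = byte_to_codepoint_offset(timestamp, byte_off)
print(byte_off, cp_off)
```

Each "ź" before the match pushes the byte offset one past the codepoint offset, which is the kind of small discrepancy described above.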

In practice, this had no visible effect on the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.

In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than after it.

The patch also accidentally fixes another bug, where element nodes
with no children (mostly `<img>`) were incorrectly excluded when calling
`cloneContents()`, because they were treated as if they were text nodes.
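
A simplified Python sketch of the distinction behind this second bug: the DOM Standard defines a node's length as the length of its character data for text-like nodes and as its number of children otherwise, so a childless element such as `<img>` has length 0 without being a text node. The Node class and both is_character_data helpers are hypothetical and are not taken from ImmutableRange. They only contrast a type-based check with a "has no children" heuristic, and the real fix may differ in detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, minimal DOM-like node used only for illustration."""
    name: str
    data: Optional[str] = None  # character data; None for element nodes
    children: List["Node"] = field(default_factory=list)


def is_character_data(node: Node) -> bool:
    # Type-based check: only nodes that carry character data count as text.
    return node.data is not None


def is_character_data_buggy(node: Node) -> bool:
    # Flawed heuristic: any childless node (e.g. <img>) looks like a text node.
    return not node.children


def node_length(node: Node) -> int:
    # Simplified DOM Standard "length": character data -> length of its data
    # (Python counts codepoints, like PHP's DOM extension); otherwise -> child count.
    if is_character_data(node):
        return len(node.data)
    return len(node.children)


text = Node("#text", data="paź")
img = Node("img")
p = Node("p", children=[Node("#text", data="Hi "), img])
```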

Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
2021-09-27 19:04:16 +00:00
ckb-big-oldparser-getHTML.json Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes 2021-09-27 19:04:16 +00:00
ckb-big-oldparser-getText.json Add some tests covering ThreadItem::getHTML() and related methods 2021-08-24 07:54:09 +02:00
ckb-big-oldparser.html Add integration tests using pages from ckb.wp 2020-09-01 01:50:33 +02:00
ckb-big-oldparser.json Introduce comment "names" to identify comments across revisions/pages 2021-03-23 16:08:42 +00:00