wikimedia/mediawiki-extensions-DiscussionTools

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/DiscussionTools synced 2024-11-14 19:35:38 +00:00

Commit graph

Author

SHA1

Message

Date

Bartosz Dziewoński

c1f4668806

Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes

The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).
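
The mismatch between the two kinds of offsets can be reproduced outside PHP. The following is a minimal sketch in Python, not the extension's code: the sample timestamp string and the helper `byte_to_codepoint_offset` are hypothetical, and a regex search over bytes stands in for `preg_match`, which reports offsets in bytes.

```python
import re


def byte_to_codepoint_offset(data: bytes, byte_offset: int) -> int:
    """Convert an offset in UTF-8 bytes into an offset in Unicode codepoints.

    Assumes byte_offset falls on a character boundary; decoding raises
    UnicodeDecodeError if it lands inside a multibyte sequence.
    """
    return len(data[:byte_offset].decode("utf-8"))


# Hypothetical timestamp containing one multibyte character ("ź" is 2 bytes).
text = "12:00, 5 paź 2021 (CEST)"
data = text.encode("utf-8")

# Searching the encoded bytes yields a byte offset, like preg_match does.
match = re.search(rb"\d{4}", data)
byte_offset = match.start()
codepoint_offset = byte_to_codepoint_offset(data, byte_offset)

# The byte offset is one larger than the codepoint offset, so using it
# directly as a codepoint offset points one character too far.
wrong_slice = text[byte_offset:byte_offset + 4]
right_slice = text[codepoint_offset:codepoint_offset + 4]

# The end of the string is 25 in bytes but 24 in codepoints, so a byte-based
# end offset lies past the end of the text when counted in codepoints.
end_in_bytes = len(data)
end_in_codepoints = len(text)
```

Here `right_slice` recovers the year, while `wrong_slice` starts one character late. An end offset that overshoots the text in the same way is the kind of error that affected the marker position described below.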

In practice, the bug had no visible effect for the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.

In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than after it.

The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.

Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee

2021-09-27 19:04:16 +00:00

Bartosz Dziewoński

a6a547f2b2

Add some tests covering ThreadItem::getHTML() and related methods

* ThreadItem::getText
* CommentItem::getBodyText (used when generating notifications)
* ThreadItem::getHTML (may soon be used in API)
* CommentItem::getBodyHTML (may soon be used in API)
* ImmutableRange::cloneContents (the common implementation for all
  of the above)

The outputs are only lightly reviewed. This is mostly meant to
document the current behavior rather than the expected behavior,
to avoid making unintentional changes while refactoring.

Change-Id: I14471ee4969aa3d0b5577d9de2a6d4462fab4d09

2021-08-24 07:54:09 +02:00

2 commits