Commit graph

5 commits

Author SHA1 Message Date
Bartosz Dziewoński 3a9997d6ea Improve handling for comment separators
* Detect comment separators at the end of comments too
* Consider TemplateStyles associated with ignored templates

This unexpectedly improves a lot of cases other than T313097 too,
mostly where <br> or {{outdent}} was used within a paragraph:
splitting comments that were previously jumbled together, or restoring
content that was previously ignored for apps / notifications.

Bug: T313097
Change-Id: I9b2ef6b760f2ffd97141ad7000f70919aeab7803
2023-01-10 01:59:52 +00:00
Ed Sanders 664d5d041a Fix fetching of oldest comment in a thread
The implementation in Parser doesn't descend into sub-thread.
Re-use the getThreadSummary method in ThreadItem and traverse
the thread properly.

Bug: T298617
Change-Id: I318d9012eb83f37ccbe463923524ef2e9f995ced
2022-09-01 21:22:09 +00:00
Bartosz Dziewoński ddd391b6db Ignore "tracked" templates at the beginning of comments
This improves the behavior when replying to these comments
and the message snippets shown in notifications.

Bug: T313097
Change-Id: Ia10400472c9e999fa526c7437a03b72461c37b74
2022-07-31 03:56:36 +02:00
Bartosz Dziewoński c1f4668806 Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes
The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).

In practice, this had no visible effect to the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.

In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than before it.

The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.

Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
2021-09-27 19:04:16 +00:00
Bartosz Dziewoński a6a547f2b2 Add some tests covering ThreadItem::getHTML() and related methods
* ThreadItem::getText
* CommentItem::getBodyText (used when generating notifications)
* ThreadItem::getHTML (may soon be used in API)
* CommentItem::getBodyHTML (may soon be used in API)
* ImmutableRange::cloneContents (the common implementation for all
  of the above)

The outputs are only lightly reviewed. This is mostly meant to
document the current behavior rather than the expected behavior,
to avoid making unintentional changes while refactoring.

Change-Id: I14471ee4969aa3d0b5577d9de2a6d4462fab4d09
2021-08-24 07:54:09 +02:00