The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).
In practice, this had no visible effect to the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.
In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than before it.
The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.
Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
We added it because the initial designs for the subscribe action were
much easier to implement like this, and topic "containers" (T269950)
would have required it.
However, the latest design of the subscribe action will not need it
(T279149), and topic containers are still very far away, so let's
remove it for now.
Bug: T280433
Change-Id: I21a23e9bea43f24d265750926fbd62b99038d3f1
Longer, but follows the style guide and less likely to conflict.
We need to account for init classes in the cache being around for
a while.
Change-Id: I738bc93393850db320fdbda2b003ca8ac40556da
The code we're testing already produces a string of serialized HTML,
no need to parse and re-serialize it.
Also, we recently learned that the precise format matters here
(T274709), and now this test *actually* covers the fix for that bug.
Follow-up to 5b26e9664b.
As a downside, this test might now spuriously fail if the format of
the output of Parsoid's XMLSerializer changes. Hopefully that won't
happen too often.
Change-Id: I69b514f545e47dcb437fb39a83edb8e2f19ed99b
Now it detect signatures generated by en.wp's {{Undated}} template,
and signatures of people who do weird stuff to the timestamps.
Bug: T275938
Change-Id: I27b07f6786ca5433a3c02a5fe68e4716d41401bb
While working on T270009, I noticed that <style> and <link> nodes
are treated differently, which seemed weird. Rewrite this again,
hopefully this is the last time.
The changed test cases also involve <area> and <input> nodes,
and the new results make more sense to me.
Bug: T264116
Change-Id: I3af90c84768a4b3dc53446927f4dba6f72175a2f
We've recently decided that we want to "extend" comments until
the end of the paragraph (e36dc8e78a,
d0ae6c4e44).
However, we still had this special case that did the opposite: it
ensured that if a comment ended in the middle of a text node, the
comment would not be extended to the end of the node. Remove it.
Note the change in the test file signatures-funny-formattedreply.html,
which actually covered this case specifically.
Change-Id: Id1384bb0c6e1a5f0c70f55efcb4caa240f230f07
The end marker is skipped forward until an open or close
block tag is reached. In tree traversal terms this means
moving either to the next sibling, or the parent (to skip
over close tags).
Bug: T256033
Change-Id: Iaa2c588698790d576ac4f9ecc126f58a082ef6b3
As CommentFormatter no longer needs HTMLFormatter, remove
the inheritance and make addReplyLinks a static method.
Testing locally this is marginally slower, going from 2.55s
to 2.9s for the CommentFormatterTest case.
Bug: T266317
Bug: T267973
Change-Id: If69749cae678a1647a138d782a32032189f55cec
Use the same logic for marking ranges in the document, and ensure
that the heading range does not include section edit links or
section numberings.
Change-Id: I782caafc34fee2a822b0a17b24dd6b9528202eca
To avoid old threads re-appearing on popular pages when someone
uses a vague title (e.g. dozens of threads titled "question" on
[[Wikipedia:Help desk]]: https://w.wiki/fbN), include the oldest
timestamp in the thread (i.e. date the thread was started) in the
heading ID.
Bug: T264478
Change-Id: If918bfd5e025248923d1939bc86916697ead95a0
Sequential numbers aren't great because they change when an earlier
comment is archived. Parent comment/heading IDs should change less
often.
This also makes much more sense for disambiguating subsections,
e.g. a dozen identical ===Votes=== sections for a dozen proposals.
Bug: T264478
Change-Id: I466454984fd919ebef35f2b37ddb5d86dc842996
Our threads now also contain all replies to their sub-threads.
This is similar to how sections work in MediaWiki, where the parent
section also contains the content of all the lower-level sections.
We're going to need this for notifications about replies in a thread.
Bug: T264478
Change-Id: I241fc58e2088a7555942824b0f184ed21e3a8b6f
Previously, only comments could have IDs, because we only needed IDs
for replying. But we might also use them for notifications soon.
Bug: T264478
Change-Id: I1bcad02bf17ab54bc5028a959543c10f0430836b
I haven't really reviewed the outputs, but at least a) they don't
crash b) they will fail if the output suddenly changes (which could
cause problems due to caching).
Bug: T252555
Change-Id: I1bbcbc5dd17ce1e24b3622062f5e8df4baf5f389