I was curious why running some of the PHPUnit tests in this code base
takes so long. While I could not spot an obvious bottleneck I found
a lot of code that is extremely hot, e.g. called a hundred thousand
times. A few obvious optimizations are possible in this code, e.g.
not calling the surprisingly expensive DOMCompat::getClassList
multiple times.
Change-Id: If22bbc1aedd2c36db1ab2343de5737009050b7bb
MediaWiki's PHPCS plugin requires documentation comments on all
properties, unless those properties are typed.
This has potential to introduce bugs – in particular, because typed
properties without a default value will throw an exception if their
value is accessed before it's defined, while previously they defaulted
to null. I fixed this when I found it (making them nullable and null
by default), but I may have missed some cases.
Change-Id: If5b1f4d542ce3e1b69327ee4283f7c3e133a62a0
Since 92f5cfd8 we support "mw-notalk" to suppressing comment detection
in pages or sections.
Until now, it only worked when the comment timestamp was surrounded by
a marked element. However, when a marked element was directly adjacent
to a comment, it would sometimes become a part of the comment range.
This can no longer happen now.
Existing use cases for this were the {{outdent}} and {{tracked}}
templates, which we handle specially since 50ad5bb2 and ddd391b6.
It's a bit ugly to hardcode specific templates like that, and this
provides a better solution for the future. The added test case
displays some other potential uses.
Bug: T324132
Change-Id: I7ffd299ef5957b35da8d01f9a0ed5a7a9a78be83
Also merge setMwGlobals() calls because they are really expensive.
Also utilize the more readable str_contains() and related.
Change-Id: Iebde6aa17c2e366f0c0a98fe13a454f6a06c299b
This code was unnecessarily copied from VE. It's not needed for
anything in this extension, and it causes the headings to be treated
as modified by selser, which in turn causes dirty diffs.
Bug: T328268
Change-Id: Ibdbed430f2ff28d0ea2e67644075c1621d9fae53
* Detect comment separators at the end of comments too
* Consider TemplateStyles associated with ignored templates
This unexpectedly improves a lot of cases other than T313097 too,
mostly where <br> or {{outdent}} was used within a paragraph:
splitting comments that were previously jumbled together, or restoring
content that was previously ignored for apps / notifications.
Bug: T313097
Change-Id: I9b2ef6b760f2ffd97141ad7000f70919aeab7803
This improves the behavior when replying to these comments
and the message snippets shown in notifications.
Bug: T313097
Change-Id: Ia10400472c9e999fa526c7437a03b72461c37b74
Othercontent would often contain the opening tag of the next heading /
section. By looking for the closest node with a previousSibling we can
more-reliably escape the heading.
Also, only add the initial placeholder if there's content before the
first heading. We do this by testing for any siblings before the
startContainer of the first heading -- if there are any, assume this
means there's some sort of content. (This can still result in a
placeholder with `othercontent:""` if there's only whitespace before
the first heading.)
Bug: T313850
Change-Id: I080205b74413c46d3cf3442e79276145aaa9439c
Rename ThreadItem to ContentThreadItem, then create a new ThreadItem
interface containing only the methods that we'll be able to implement
using only the persistently stored data (no parsing), then create a
DatabaseThreadItem. Do the same for CommentItem and HeadingItem.
ThreadItemSet gets a similar treatment, but it's basically only for
Phan's type checking. (This is sad.)
Change-Id: I1633049befe8ec169753b82eb876459af1f63fe8
While in many cases the class will never be sub-classed, it's easier
just to always use static:: and not worry about predicting which
classes might have problems in the future.
Change-Id: I23072a1701b5acf62bb3379a877de97627d8fcf3
Change the order of checks to ensure that we have at least one comment
before we try comparing ranges, to avoid issues with empty headings
having collapsed ranges. It should be a tiny bit faster this way, too.
Bug: T304377
Change-Id: I59ad30cfc075dcec882e048d2d199744efec2114
Also fix a bug where headings would be ignored while checking for
comment frames. See task for detailed explanation.
Bug: T303396
Change-Id: I6495826b4b050ea80680e0798ac6ab4497a7c09e
Phan can analyze them now and reports some issues with types.
* Add some assertions on types where we're sure that we're using an
Element or non-null, but Phan can't prove it
* Fix incorrect type hints on getFullyCoveredSiblings() and
getCoveredSiblings(), luckily it was harmless
Change-Id: I8cc12450378efa7434c4d66882378b715edd4a70
`tagName` is only defined on Element, and it returns its tag name.
`nodeName` is defined on Node, and it returns the tag name for Elements,
and a string like '#text' or '#document-fragment' for other types.
We were using both, which made it harder to reason about what types
we're dealing with.
Change-Id: I8e621e5872bdf78c84ec553cfbfcdbf0192f0589
It is friendlier for static analysis tools like Phan, which can't
infer anything from the `->nodeType === …` checks, and we were already
using it in most places.
Fix newly revealed Phan failures (and one unneeded suppression).
Change-Id: Id789f05e16a210f7ba22ca7514587c392fac0741
Added in 76289cdf73,
should no longer be needed since we switch to Parsoid's
HTML parser in 3e6ab2c4d2.
Change-Id: Ic0b7ed8089b71f2338e604f68d547759e069f0b2
Also special-case thumbnail wrappers generated by
MediaTransformOutput::linkWrap, for compatibility with
TimedMediaHandler.
Bug: T301427
Bug: T302296
Change-Id: I7f48d8b2261507c5a33526c54109f5187d062ed3
Previously, we required a signature at the end of the comment.
This was a pretty rough heuristic that did not correctly handle
many comments that we would consider entirely properly signed
in CommentParser (e.g. comments wrapped in formatting like
<small>…</small>, comments with a post-scriptum or in parentheses,
or comments generated by various templates).
Now we process the user input using the same code that adds reply
links, and only add a signature when we detect that there really
isn't a signature (including template-generated), or if the signature
is in the wrong place and would result in the reply link showing up
in the wrong place as well (not at the end of the comment).
Bug: T278442
Bug: T268558
Bug: T278355
Bug: T291421
Bug: T282983
Change-Id: I46b6110af328ebdf93b7dfc2bd941e04391a1599
Also, in ThreadItem::getSinglePageTransclusionTitle(), we don't need
this terribly complicated method.
Change-Id: If02c09aaa2f4dd66b2bc253a1edec4ea107564ee
When we encounter a node that doesn't represent comment contents, e.g.:
* a [reply] link we inserted (T297034#7641334)
* an {{outdent}} template (see changed test case)
…we should ignore it together with its descendants (like in
Parser#nextInterestingLeafNode), instead of processing descendants
and possibly detecting comment contents in them.
Follow-up to 8de940b587,
72b9c2c6f5.
Bug: T297034
Change-Id: Ib2fa40c5fa389572b0e88ef558728fa06e3621b0
In PHP, use DOMCompat::getClassList(), provided by Parsoid.
In JS, use `.classList`, available in all supported browsers.
This may fix some bugs where we were incorrectly checking for exactly
one class. The change in isOurGeneratedNode() is needed for
Ib2fa40c5fa389572b0e88ef558728fa06e3621b0.
Change-Id: Ia28d31678fd3d617b69280c4b7857755300fa515
Reimplement getFullyCoveredSiblings() using compareRanges(), which
checks basically the same thing, but works better and I like it more.
Bug: T297034
Change-Id: I33dc1d088bdee984064315290e378bfbfa830b10
Previously: 569db3603c (2020-06).
Unfortunately we've found cases where the previous implementation
doesn't work correctly, resulting in comments being added to the wrong
pages or page corruption.
Bug: T289873
Bug: T298051
Change-Id: Id867b3005ebc46906d6df852a525fcaec9e6b19b
Usually this isn't a problem, because the comments are marked as
template-generated and we don't allow replying to them. But we had a
special case where we were trying to skip over some invisible
elements, which was causing us to skip into the middle of the
about-group in some cases. When Parsoid sees that, it serializes the
contents twice.
Bug: T290940
Change-Id: I9fe0b8d43ab874ccef371990799f77bfc46bc954
The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).
In practice, this had no visible effect to the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.
In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than before it.
The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.
Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
Use DOMCompat::querySelectorAll() instead.
CommentModifier::isHtmlSigned()
* Copied the CSS selector from the JS equivalent function.
CommentUtils::unwrapParsoidSections()
* Copied the CSS selector from the JS equivalent function (in VisualEditor).
CommentItem::getMentions()
* Trivial.
This causes Phan to report some more issues, which are also fixed.
Follow-up to 25272e7a4a.
Change-Id: Iaf1222f7114916f2eca19942c3686168899486fd
These changes ensure that DiscussionTools is independent of DOM
library choice, and will not break if/when Parsoid switches to an
alternate (more standards-compliant) DOM library.
We run `phan` against the Dodo standards-compliant DOM library,
so this ends up flagging uses of non-standard PHP extensions to
the DOM. These will be suppressed for now with a "Nonstandard DOM"
comment that can be grepped for, since they will eventually
will need to be rewritten or worked around.
Most frequent issues:
* Node::nodeValue and Node::textContent and Element::getAttribute()
can return null in a spec-compliant implementation. Add `?? ''` to
make spec-compliant results consistent w/ what PHP returns.
* DOMXPath doesn't accept anything except DOMDocument. These uses
should be replaced with DOMCompat::querySelectorAll() or similar
(which end up using DOMXPath under the covers for DOMDocument any way,
but are implemented more efficiently in a spec-compliant
implementation).
* A couple of times we have code like:
`while ($node->firstChild!==null) { $node = $node->firstChild; }`
and phan's analysis isn't strong enough to determine that $node is still
non-null after the while. This same issue should appear with DOMDocument
but phan doesn't complain for some reason.
One apparently legit issue:
* Node::insertBefore() is once called in a funny way which leans on
the fact that the second option is optional in PHP. This seems to be
a workaround for an ancient PHP bug, and can probably be safely
removed.
Bug: T287611
Bug: T217867
Change-Id: I3c4f41c3819770f85d68157c9f690d650b7266a3