Commit graph

70 commits

Author SHA1 Message Date
Bartosz Dziewoński be012ced04 Only match article path until first '?' when parsing links
Bug: T324028
Change-Id: I7aca1a8f20695b9ecd3f63f2d0a3f5684616655e
2022-11-29 17:16:03 +00:00
Bartosz Dziewoński 433e57394c Use PHP 7.4 property types
Change-Id: I788db64f0c0c00894d77256b7f016d44eda4bbb1
2022-10-28 21:56:38 +02:00
jenkins-bot c9dadbfe7d Merge "Remove support for <span class="mw-headline-number"> in headings" 2022-10-26 16:29:41 +00:00
Bartosz Dziewoński c6cd20f682 Remove support for <span class="mw-headline-number"> in headings
This feature has been removed from MediaWiki in change
Ic9ed88f419419cf4cc5cc32010539eea8b76314b.

Change-Id: If11b33589f47eab614f5129b38e80d0f3cafa083
2022-10-25 18:59:05 +00:00
Bartosz Dziewoński 8664de52d1 Don't insert comment markers inside <figure>
…when wgParserEnableLegacyMediaDOM=false. See task for details.

Bug: T320285
Change-Id: I397cb70f915bb8d974fe2796198d252b1be9a368
2022-10-08 23:23:54 +02:00
Bartosz Dziewoński ddd391b6db Ignore "tracked" templates at the beginning of comments
This improves the behavior when replying to these comments
and the message snippets shown in notifications.

Bug: T313097
Change-Id: Ia10400472c9e999fa526c7437a03b72461c37b74
2022-07-31 03:56:36 +02:00
David Lynch ec0e2920ae API ThreadItemsHTML: improve generation of othercontent
Othercontent would often contain the opening tag of the next heading /
section. By looking for the closest node with a previousSibling we can
more-reliably escape the heading.

Also, only add the initial placeholder if there's content before the
first heading. We do this by testing for any siblings before the
startContainer of the first heading -- if there are any, assume this
means there's some sort of content. (This can still result in a
placeholder with `othercontent:""` if there's only whitespace before
the first heading.)

Bug: T313850
Change-Id: I080205b74413c46d3cf3442e79276145aaa9439c
2022-07-28 02:51:18 -05:00
Bartosz Dziewoński 880f9755e0 Separate ContentThreadItem and DatabaseThreadItem etc.
Rename ThreadItem to ContentThreadItem, then create a new ThreadItem
interface containing only the methods that we'll be able to implement
using only the persistently stored data (no parsing), then create a
DatabaseThreadItem. Do the same for CommentItem and HeadingItem.

ThreadItemSet gets a similar treatment, but it's basically only for
Phan's type checking. (This is sad.)

Change-Id: I1633049befe8ec169753b82eb876459af1f63fe8
2022-07-04 23:35:50 +02:00
Ed Sanders 4accd2fc7e Add some missing typehints
Change-Id: Idb111dd907972d9e02dab4b26c3fc106b12b1035
2022-06-29 15:15:52 +00:00
Ed Sanders af54bae2ec Prefer late static binding over self::
While in many cases the class will never be sub-classed, it's easier
just to always use static:: and not worry about predicting which
classes might have problems in the future.

Change-Id: I23072a1701b5acf62bb3379a877de97627d8fcf3
2022-06-09 15:12:48 +01:00
jenkins-bot 35b3fd2fc0 Merge "CommentParser: Replace uses of Title with TitleValue" 2022-03-23 01:14:16 +00:00
Bartosz Dziewoński c5375e05b9 CommentUtils: Fix isSingleCommentSignedBy() with empty heading
Change the order of checks to ensure that we have at least one comment
before we try comparing ranges, to avoid issues with empty headings
having collapsed ranges. It should be a tiny bit faster this way, too.

Bug: T304377
Change-Id: I59ad30cfc075dcec882e048d2d199744efec2114
2022-03-22 00:12:42 +01:00
Bartosz Dziewoński c7723baf72 CommentParser: Replace uses of Title with TitleValue
Another small step towards removing the reliance on global state.

Change-Id: Ifb4a5bcbef6606d02f1c7aa7385d72822cb0bad0
2022-03-18 18:24:34 +00:00
Bartosz Dziewoński 01b253c5b6 Don't allow the root node to be treated like a comment frame
Also fix a bug where headings would be ignored while checking for
comment frames. See task for detailed explanation.

Bug: T303396
Change-Id: I6495826b4b050ea80680e0798ac6ab4497a7c09e
2022-03-10 17:45:08 +00:00
jenkins-bot dd24b0edcd Merge "Improve handling for comments after fake headings using wikitext ;" 2022-03-10 16:21:18 +00:00
Bartosz Dziewoński 08c79142fb ImmutableRange: Add @property annotations for magic props
Phan can analyze them now and reports some issues with types.

* Add some assertions on types where we're sure that we're using an
  Element or non-null, but Phan can't prove it
* Fix incorrect type hints on getFullyCoveredSiblings() and
  getCoveredSiblings(), luckily it was harmless

Change-Id: I8cc12450378efa7434c4d66882378b715edd4a70
2022-03-08 23:29:40 +00:00
Bartosz Dziewoński 0e576216b2 CommentUtils: Fix confusing types in getIndentLevel()
Change-Id: I548cf4ad54e92c22da64caf53ee028a906cd3b62
2022-03-08 23:29:15 +00:00
Bartosz Dziewoński 584f6a020c Use tagName rather than nodeName when we know the node is an element
`tagName` is only defined on Element, and it returns its tag name.

`nodeName` is defined on Node, and it returns the tag name for Elements,
and a string like '#text' or '#document-fragment' for other types.

We were using both, which made it harder to reason about what types
we're dealing with.

Change-Id: I8e621e5872bdf78c84ec553cfbfcdbf0192f0589
2022-03-08 23:29:05 +00:00
Bartosz Dziewoński 063174e71c Use instanceof for checking for text/element nodes in PHP
It is friendlier for static analysis tools like Phan, which can't
infer anything from the `->nodeType === …` checks, and we were already
using it in most places.

Fix newly revealed Phan failures (and one unneeded suppression).

Change-Id: Id789f05e16a210f7ba22ca7514587c392fac0741
2022-03-08 23:28:39 +00:00
Bartosz Dziewoński b2ee19b441 Remove check for CDATA nodes
Added in 76289cdf73,
should no longer be needed since we switch to Parsoid's
HTML parser in 3e6ab2c4d2.

Change-Id: Ic0b7ed8089b71f2338e604f68d547759e069f0b2
2022-03-04 22:14:41 +01:00
jenkins-bot 3c91a800ed Merge "Improve detecting already signed comments" 2022-03-02 14:14:13 +00:00
jenkins-bot e4fa34f025 Merge "Don't insert comment markers inside replaced elements (like <video>)" 2022-02-28 17:16:11 +00:00
Bartosz Dziewoński 1e3ce9c88a Don't insert comment markers inside replaced elements (like <video>)
Also special-case thumbnail wrappers generated by
MediaTransformOutput::linkWrap, for compatibility with
TimedMediaHandler.

Bug: T301427
Bug: T302296
Change-Id: I7f48d8b2261507c5a33526c54109f5187d062ed3
2022-02-22 15:11:34 +00:00
Bartosz Dziewoński 0ecc8a4c05 Improve detecting already signed comments
Previously, we required a signature at the end of the comment.
This was a pretty rough heuristic that did not correctly handle
many comments that we would consider entirely properly signed
in CommentParser (e.g. comments wrapped in formatting like
<small>…</small>, comments with a post-scriptum or in parentheses,
or comments generated by various templates).

Now we process the user input using the same code that adds reply
links, and only add a signature when we detect that there really
isn't a signature (including template-generated), or if the signature
is in the wrong place and would result in the reply link showing up
in the wrong place as well (not at the end of the comment).

Bug: T278442
Bug: T268558
Bug: T278355
Bug: T291421
Bug: T282983
Change-Id: I46b6110af328ebdf93b7dfc2bd941e04391a1599
2022-02-21 21:21:26 +00:00
Bartosz Dziewoński aea36bab3a CommentParser: Fix a small use of global state
Also, in ThreadItem::getSinglePageTransclusionTitle(), we don't need
this terribly complicated method.

Change-Id: If02c09aaa2f4dd66b2bc253a1edec4ea107564ee
2022-02-21 18:15:31 +00:00
Bartosz Dziewoński e414d1acaf Improve handling for comments after fake headings using wikitext ;
Bug: T265964
Change-Id: I77db68928c5426fd885a277eec52c6e164d559bb
2022-02-11 23:35:32 +00:00
Bartosz Dziewoński 5945a4a0eb Fix some typos in comments
Change-Id: I699d9d105b8706cef0800ccc086cde687de54078
2022-02-04 20:36:28 +01:00
Bartosz Dziewoński 110a59200f One more tweak for comparing comment ranges to transclusion/DOM ranges
When we encounter a node that doesn't represent comment contents, e.g.:
* a [reply] link we inserted (T297034#7641334)
* an {{outdent}} template (see changed test case)

…we should ignore it together with its descendants (like in
Parser#nextInterestingLeafNode), instead of processing descendants
and possibly detecting comment contents in them.

Follow-up to 8de940b587,
72b9c2c6f5.

Bug: T297034
Change-Id: Ib2fa40c5fa389572b0e88ef558728fa06e3621b0
2022-01-24 17:42:18 +00:00
Bartosz Dziewoński f15693eefa Use class list everywhere for adding/checking CSS classes
In PHP, use DOMCompat::getClassList(), provided by Parsoid.
In JS, use `.classList`, available in all supported browsers.

This may fix some bugs where we were incorrectly checking for exactly
one class. The change in isOurGeneratedNode() is needed for
Ib2fa40c5fa389572b0e88ef558728fa06e3621b0.

Change-Id: Ia28d31678fd3d617b69280c4b7857755300fa515
2022-01-24 18:40:00 +01:00
Bartosz Dziewoński 8b426c7e5c Fix placeholder headings causing exceptions in getTranscludedFrom()
Follow-up to 8de940b587.

Change-Id: Iddf045105fac6ab8cdaa933fd2abcf6dbbd37d42
2022-01-11 23:24:45 +00:00
Ed Sanders 272b6595f5 Docs: Illustrate range overlaps in comment
Change-Id: If6f5d83719b8d078cd13327c0b9cbaef03f87508
2022-01-11 17:28:25 +00:00
Bartosz Dziewoński 72b9c2c6f5 Ignore some invisible nodes when looking for comment frames
Reimplement getFullyCoveredSiblings() using compareRanges(), which
checks basically the same thing, but works better and I like it more.

Bug: T297034
Change-Id: I33dc1d088bdee984064315290e378bfbfa830b10
2022-01-11 17:01:53 +00:00
Bartosz Dziewoński 8de940b587 Improve detecting transcluded comments again
Previously: 569db3603c (2020-06).

Unfortunately we've found cases where the previous implementation
doesn't work correctly, resulting in comments being added to the wrong
pages or page corruption.

Bug: T289873
Bug: T298051
Change-Id: Id867b3005ebc46906d6df852a525fcaec9e6b19b
2022-01-11 16:07:44 +00:00
Bartosz Dziewoński ef7274d69e Move some helpers from CommentParser to CommentUtils
Change-Id: I0e323d3b75f47459a5548a13e9684f4c6ff4ba0c
2021-12-13 17:13:41 +01:00
Bartosz Dziewoński 83ba496919 Avoid splitting about-groups starting with an empty <span>
Usually this isn't a problem, because the comments are marked as
template-generated and we don't allow replying to them. But we had a
special case where we were trying to skip over some invisible
elements, which was causing us to skip into the middle of the
about-group in some cases. When Parsoid sees that, it serializes the
contents twice.

Bug: T290940
Change-Id: I9fe0b8d43ab874ccef371990799f77bfc46bc954
2021-11-15 16:03:38 +00:00
Bartosz Dziewoński c1f4668806 Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes
The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).

In practice, this had no visible effect to the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.

In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than before it.

The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.

Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
2021-09-27 19:04:16 +00:00
libraryupgrader 26b69d2c70 build: Updating composer dependencies
* mediawiki/mediawiki-phan-config: 0.10.6 → 0.11.0
* php-parallel-lint/php-parallel-lint: 1.3.0 → 1.3.1

Change-Id: I76996ed939d706739d2094077c64eeca6f51126a
2021-09-08 23:14:53 +00:00
Bartosz Dziewoński 7a9fd40eb9 Remove use of DOMXPath to remove Phan suppressions
Use DOMCompat::querySelectorAll() instead.

CommentModifier::isHtmlSigned()
* Copied the CSS selector from the JS equivalent function.
CommentUtils::unwrapParsoidSections()
* Copied the CSS selector from the JS equivalent function (in VisualEditor).
CommentItem::getMentions()
* Trivial.

This causes Phan to report some more issues, which are also fixed.

Follow-up to 25272e7a4a.

Change-Id: Iaf1222f7114916f2eca19942c3686168899486fd
2021-08-02 18:23:16 +02:00
C. Scott Ananian 25272e7a4a Don't refer directly to PHP dom extension classes; avoid nonstandard behavior
These changes ensure that DiscussionTools is independent of DOM
library choice, and will not break if/when Parsoid switches to an
alternate (more standards-compliant) DOM library.

We run `phan` against the Dodo standards-compliant DOM library,
so this ends up flagging uses of non-standard PHP extensions to
the DOM.  These will be suppressed for now with a "Nonstandard DOM"
comment that can be grepped for, since they will eventually
will need to be rewritten or worked around.

Most frequent issues:

* Node::nodeValue and Node::textContent and Element::getAttribute()
can return null in a spec-compliant implementation.  Add `?? ''` to
make spec-compliant results consistent w/ what PHP returns.

* DOMXPath doesn't accept anything except DOMDocument.  These uses
should be replaced with DOMCompat::querySelectorAll() or similar
(which end up using DOMXPath under the covers for DOMDocument any way,
but are implemented more efficiently in a spec-compliant
implementation).

* A couple of times we have code like:
  `while ($node->firstChild!==null) { $node = $node->firstChild; }`
and phan's analysis isn't strong enough to determine that $node is still
non-null after the while.  This same issue should appear with DOMDocument
but phan doesn't complain for some reason.

One apparently legit issue:

* Node::insertBefore() is once called in a funny way which leans on
the fact that the second option is optional in PHP.  This seems to be
a workaround for an ancient PHP bug, and can probably be safely
removed.

Bug: T287611
Bug: T217867
Change-Id: I3c4f41c3819770f85d68157c9f690d650b7266a3
2021-07-30 18:15:40 -04:00
C. Scott Ananian 5203d30ea6 Use DOMCompat::newDocument() to create a new Document
For compatibility with Parsoid's document abstraction (Parsoid may
switch to an alternate DOM library in the future), don't explicitly
create a new document object using `new DOMDocument`; instead use
the Parsoid wrapper `DOMCompat::newDocument()`.  This ensures that
the Document object created will be compatible with Parsoid.

There are a number of other subtle dependencies on the PHP `dom`
extension in DiscussionTools, like explicit `instanceof` tests; those
will be tweaked in a follow-up patch
(I3c4f41c3819770f85d68157c9f690d650b7266a3) since they do not affect
correctness so long as Parsoid is aliasing Document to a subclass of
the built-in DOMDocument.  Similarly, the Phan warnings we suppress
do not cause runtime errors (because of the fixes included in
c5265341afd9efde6b54ba56dc009aab88eff83c) but phan will be happier
once the follow-up patch lands and aligns all the DOM types.

Bug: T287611
Depends-On: If0671255779571a91d3472a9d90d0f2d69dd1f7d
Change-Id: Ib98bd5b76de7a0d32a29840d1ce04379c72ef486
2021-07-30 18:15:11 -04:00
libraryupgrader b0884b177c build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)
* glob-parent: 5.1.1 → 5.1.2
  * https://npmjs.com/advisories/1751 (CVE-2020-28469)
* trim-newlines: 3.0.0 → 3.0.1
  * https://npmjs.com/advisories/1753 (CVE-2021-33623)

Change-Id: I7a71e23da561599da417db3b3077b78d91173bbc
2021-07-22 16:29:04 +00:00
libraryupgrader 12fb65b9f1 build: Updating composer dependencies
* mediawiki/mediawiki-codesniffer: 35.0.0 → 36.0.0
* php-parallel-lint/php-parallel-lint: 1.2.0 → 1.3.0

Change-Id: I5c152292e83e7f3441e2c08b7d0ad23ac90f194b
2021-05-05 11:14:52 +00:00
Ed Sanders c4de603ef9 Give comments IDs so they can be scrolled to with hash links
Bug: T265268
Change-Id: Idb985ed38bdb74e23cb7840899a61dc919f05f6f
2021-03-20 15:43:23 +00:00
Ed Sanders 45cda20cf3 Don't attempt to put comment markers in <noscript> tags
Bug: T276455
Change-Id: Ia427d97528b137111145ac79680972a660f28e37
2021-03-04 22:24:04 +01:00
Bartosz Dziewoński efe95494a8 Improve signature detection to handle formatting on the timestamp
Now it detect signatures generated by en.wp's {{Undated}} template,
and signatures of people who do weird stuff to the timestamps.

Bug: T275938
Change-Id: I27b07f6786ca5433a3c02a5fe68e4716d41401bb
2021-02-27 02:33:30 +01:00
Bartosz Dziewoński e767ee1741 CommentUtils: Fix edge case bug in getCoveredSiblings()
In some cases it would return the parent node, instead of the siblings
it should return.

It's a private method only called by getFullyCoveredSiblings(), and
that method had a bug that cancelled out this one, so everything
worked correctly. But I want to use it elsewhere now and ran into it.

Change-Id: Ic12f007d57a8502a1bea5f0af17b29e9d59093d6
2021-02-27 02:26:42 +01:00
Thiemo Kreuz 1e0d2d93b3 Add missing out-of-index guard to CommentUtils
I found this error in our logstash. I was not able to find an
existing Phabricator ticket.

Note how line #348 extracts the last element from the
$siblings array. It uses the function end() there, which
returns false in case the array is empty. $siblings[0] can't
do this but yields an error.

An alternative is to use reset(), which can return false as
well. But that's not really better. Especially not better
readable, I would argue.

Change-Id: Ic90cd2392ede15078ba0d5b4d67b8dc5d05f9bf7
2021-02-09 12:27:41 +01:00
Bartosz Dziewoński c781b127c9 Handle category links at ends of comments affecting indentation
* Ignore rendering-transparent nodes between discussion comments.
* Improve isRenderingTransparentNode() so that <link> nodes
  representing TemplateStyles are not considered transparent,
  otherwise this would undo ae920b831f.
  Using a regexp from Parsoid.

Bug: T272746
Change-Id: I0b3c3251156ba6c4826abf5ba44ea93f80ebc01d
2021-01-26 04:55:03 +01:00
jenkins-bot e17467b09c Merge "Fix exception when trying to use non-existent 'typeof' attribute" 2021-01-18 18:58:35 +00:00
Bartosz Dziewoński 8f42c74985 Fix skipping to the end of paragraph, now it considers nested tags
Add yet another tree walking utility: CommentUtils::linearWalk().
Unlike TreeWalker, it allows handling the beginnings and ends of nodes
separately – kind of like parsing a XML token stream, or kind of like
VisualEditor's linear model.

(Add unit tests for this utility. The simple.html test case is copied
from [VisualEditor/VisualEditor]/demos/ve/pages/simple.html.)

Use this utility to stop skipping when we reach either a closing or
opening block node tag. Previously we'd skip over such tags inside
nested "transparent" nodes (like <a>, <del>, or apparently <font>).

Bug: T271385
Change-Id: I201a942eb3a56335e84d94e150ec2c33f8b4f4e0
2021-01-18 18:20:20 +00:00