Commit graph

35 commits

Author SHA1 Message Date
Bartosz Dziewoński c1f4668806 Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes
The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).

In practice, this had no visible effect to the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.

In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than before it.

The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.

Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
2021-09-27 19:04:16 +00:00
libraryupgrader 26b69d2c70 build: Updating composer dependencies
* mediawiki/mediawiki-phan-config: 0.10.6 → 0.11.0
* php-parallel-lint/php-parallel-lint: 1.3.0 → 1.3.1

Change-Id: I76996ed939d706739d2094077c64eeca6f51126a
2021-09-08 23:14:53 +00:00
Bartosz Dziewoński 7a9fd40eb9 Remove use of DOMXPath to remove Phan suppressions
Use DOMCompat::querySelectorAll() instead.

CommentModifier::isHtmlSigned()
* Copied the CSS selector from the JS equivalent function.
CommentUtils::unwrapParsoidSections()
* Copied the CSS selector from the JS equivalent function (in VisualEditor).
CommentItem::getMentions()
* Trivial.

This causes Phan to report some more issues, which are also fixed.

Follow-up to 25272e7a4a.

Change-Id: Iaf1222f7114916f2eca19942c3686168899486fd
2021-08-02 18:23:16 +02:00
C. Scott Ananian 25272e7a4a Don't refer directly to PHP dom extension classes; avoid nonstandard behavior
These changes ensure that DiscussionTools is independent of DOM
library choice, and will not break if/when Parsoid switches to an
alternate (more standards-compliant) DOM library.

We run `phan` against the Dodo standards-compliant DOM library,
so this ends up flagging uses of non-standard PHP extensions to
the DOM.  These will be suppressed for now with a "Nonstandard DOM"
comment that can be grepped for, since they will eventually
will need to be rewritten or worked around.

Most frequent issues:

* Node::nodeValue and Node::textContent and Element::getAttribute()
can return null in a spec-compliant implementation.  Add `?? ''` to
make spec-compliant results consistent w/ what PHP returns.

* DOMXPath doesn't accept anything except DOMDocument.  These uses
should be replaced with DOMCompat::querySelectorAll() or similar
(which end up using DOMXPath under the covers for DOMDocument any way,
but are implemented more efficiently in a spec-compliant
implementation).

* A couple of times we have code like:
  `while ($node->firstChild!==null) { $node = $node->firstChild; }`
and phan's analysis isn't strong enough to determine that $node is still
non-null after the while.  This same issue should appear with DOMDocument
but phan doesn't complain for some reason.

One apparently legit issue:

* Node::insertBefore() is once called in a funny way which leans on
the fact that the second option is optional in PHP.  This seems to be
a workaround for an ancient PHP bug, and can probably be safely
removed.

Bug: T287611
Bug: T217867
Change-Id: I3c4f41c3819770f85d68157c9f690d650b7266a3
2021-07-30 18:15:40 -04:00
C. Scott Ananian 5203d30ea6 Use DOMCompat::newDocument() to create a new Document
For compatibility with Parsoid's document abstraction (Parsoid may
switch to an alternate DOM library in the future), don't explicitly
create a new document object using `new DOMDocument`; instead use
the Parsoid wrapper `DOMCompat::newDocument()`.  This ensures that
the Document object created will be compatible with Parsoid.

There are a number of other subtle dependencies on the PHP `dom`
extension in DiscussionTools, like explicit `instanceof` tests; those
will be tweaked in a follow-up patch
(I3c4f41c3819770f85d68157c9f690d650b7266a3) since they do not affect
correctness so long as Parsoid is aliasing Document to a subclass of
the built-in DOMDocument.  Similarly, the Phan warnings we suppress
do not cause runtime errors (because of the fixes included in
c5265341afd9efde6b54ba56dc009aab88eff83c) but phan will be happier
once the follow-up patch lands and aligns all the DOM types.

Bug: T287611
Depends-On: If0671255779571a91d3472a9d90d0f2d69dd1f7d
Change-Id: Ib98bd5b76de7a0d32a29840d1ce04379c72ef486
2021-07-30 18:15:11 -04:00
libraryupgrader b0884b177c build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)
* glob-parent: 5.1.1 → 5.1.2
  * https://npmjs.com/advisories/1751 (CVE-2020-28469)
* trim-newlines: 3.0.0 → 3.0.1
  * https://npmjs.com/advisories/1753 (CVE-2021-33623)

Change-Id: I7a71e23da561599da417db3b3077b78d91173bbc
2021-07-22 16:29:04 +00:00
libraryupgrader 12fb65b9f1 build: Updating composer dependencies
* mediawiki/mediawiki-codesniffer: 35.0.0 → 36.0.0
* php-parallel-lint/php-parallel-lint: 1.2.0 → 1.3.0

Change-Id: I5c152292e83e7f3441e2c08b7d0ad23ac90f194b
2021-05-05 11:14:52 +00:00
Ed Sanders c4de603ef9 Give comments IDs so they can be scrolled to with hash links
Bug: T265268
Change-Id: Idb985ed38bdb74e23cb7840899a61dc919f05f6f
2021-03-20 15:43:23 +00:00
Ed Sanders 45cda20cf3 Don't attempt to put comment markers in <noscript> tags
Bug: T276455
Change-Id: Ia427d97528b137111145ac79680972a660f28e37
2021-03-04 22:24:04 +01:00
Bartosz Dziewoński efe95494a8 Improve signature detection to handle formatting on the timestamp
Now it detect signatures generated by en.wp's {{Undated}} template,
and signatures of people who do weird stuff to the timestamps.

Bug: T275938
Change-Id: I27b07f6786ca5433a3c02a5fe68e4716d41401bb
2021-02-27 02:33:30 +01:00
Bartosz Dziewoński e767ee1741 CommentUtils: Fix edge case bug in getCoveredSiblings()
In some cases it would return the parent node, instead of the siblings
it should return.

It's a private method only called by getFullyCoveredSiblings(), and
that method had a bug that cancelled out this one, so everything
worked correctly. But I want to use it elsewhere now and ran into it.

Change-Id: Ic12f007d57a8502a1bea5f0af17b29e9d59093d6
2021-02-27 02:26:42 +01:00
Thiemo Kreuz 1e0d2d93b3 Add missing out-of-index guard to CommentUtils
I found this error in our logstash. I was not able to find an
existing Phabricator ticket.

Note how line #348 extracts the last element from the
$siblings array. It uses the function end() there, which
returns false in case the array is empty. $siblings[0] can't
do this but yields an error.

An alternative is to use reset(), which can return false as
well. But that's not really better. Especially not better
readable, I would argue.

Change-Id: Ic90cd2392ede15078ba0d5b4d67b8dc5d05f9bf7
2021-02-09 12:27:41 +01:00
Bartosz Dziewoński c781b127c9 Handle category links at ends of comments affecting indentation
* Ignore rendering-transparent nodes between discussion comments.
* Improve isRenderingTransparentNode() so that <link> nodes
  representing TemplateStyles are not considered transparent,
  otherwise this would undo ae920b831f.
  Using a regexp from Parsoid.

Bug: T272746
Change-Id: I0b3c3251156ba6c4826abf5ba44ea93f80ebc01d
2021-01-26 04:55:03 +01:00
jenkins-bot e17467b09c Merge "Fix exception when trying to use non-existent 'typeof' attribute" 2021-01-18 18:58:35 +00:00
Bartosz Dziewoński 8f42c74985 Fix skipping to the end of paragraph, now it considers nested tags
Add yet another tree walking utility: CommentUtils::linearWalk().
Unlike TreeWalker, it allows handling the beginnings and ends of nodes
separately – kind of like parsing a XML token stream, or kind of like
VisualEditor's linear model.

(Add unit tests for this utility. The simple.html test case is copied
from [VisualEditor/VisualEditor]/demos/ve/pages/simple.html.)

Use this utility to stop skipping when we reach either a closing or
opening block node tag. Previously we'd skip over such tags inside
nested "transparent" nodes (like <a>, <del>, or apparently <font>).

Bug: T271385
Change-Id: I201a942eb3a56335e84d94e150ec2c33f8b4f4e0
2021-01-18 18:20:20 +00:00
Bartosz Dziewoński c20e7765ea Fix exception when trying to use non-existent 'typeof' attribute
Bug: T272090
Change-Id: I4d1e7457441f28d789dec8b7fd2dc3ba10fd995e
2021-01-14 22:11:32 +01:00
Bartosz Dziewoński 6e37a172ae Fix detecting decorative comment frames with whitespace
As a result of 0fc71f60cd, "empty" text
nodes (containing only whitespace) at the end of the comment may be
inside the comment's range, and trying to ignore them caused the
ranges not to match and the frame not to be detected.

Now the code works whether they're inside the comment's range or not.

Add a test case for wrapped discussion comments with HTML comments and
with whitespace.

Bug: T250126
Bug: T268407
Change-Id: I2217ff5a635fd1c9c9e803f46795b1bfb3d17535
2021-01-04 20:31:33 +01:00
Bartosz Dziewoński 6c7a0ca9a2 Fix trying to insert start/end markers in impossible locations
Bug: T270009
Bug: T266288
Change-Id: I962128e7d9290e7b5eb49bfdb5847fd17714bae1
2020-12-14 21:09:56 +01:00
Ed Sanders fb0cc01ff8 Skip over empty inline templates (e.g. tracking templates)
Bug: T269036
Change-Id: I15e56041c1f1ecb85e9e368a9fbb07882438bf8d
2020-12-09 18:51:41 +00:00
Bartosz Dziewoński 8c9230fa10 Handle category links like comments (rendering-transparent nodes)
Bug: T269036
Change-Id: Id4321ad09907b5030881456c93da90a39bdfdd75
2020-12-08 21:39:16 +00:00
Thiemo Kreuz 8ffe0d55da Remove comments that literally repeat what the code says
Change-Id: Ib928cf61dc512fbbf39a3279789376d635a82c52
2020-11-11 09:31:59 +01:00
Bartosz Dziewoński bed717d329 Move getHeadlineNodeAndOffset() to utils
Needed by I7d35098d672d0edb50d49e22de1686d5cc83b60e.

Change-Id: I44bf927213de570fe9de43e485e09cfae6778eef
2020-11-05 16:11:30 +01:00
Bartosz Dziewoński 10899af666 Fix parsing links in Parsoid documents without short URLs
Move the code so that we check for "?title=" query parameter first,
because we don't handle this right in the other code path.

Use parse_url() instead of wfParseUrl() because the latter doesn't
accept relative URLs, and we don't care about the other differences.

Bug: T261711
Depends-On: I4da952876e1c3d1a41d06b51f7e26015ff5e34d7
Change-Id: I70fac2b41befd782b0a47a4f726ae748dc0f775d
2020-09-02 23:42:37 +02:00
Bartosz Dziewoński e36dc8e78a Skip to the end of the paragraph in the parser, not modifier
When a comment ended before the end of a paragraph, the next
comment would begin right there in the middle of the paragraph.
This could result in the detected indentation level of that
comment being incorrect, and replies being inserted in wrong
places, as seen in the 'signatures-funny' test case.

The code moved to the parser was previously repeated twice in
addListItem() and addReplyLink(), which should have been a hint
that something isn't quite right.

Also, fix the code guarding against overlapping signatures,
now that signatures may not be at the end of a comment.

Bug: T260855
Change-Id: Ic26a87642f8a15d5de2f7073d4d8176b299c7f94
2020-08-20 19:35:55 +00:00
Bartosz Dziewoński 31b26a5bec Fix indentation level when replying to comments with mixed indentation
When adding a reply, we take a node at the end of the previous comment,
compare that comment's indentation level to the expected indentation level
of the reply, and add (or remove) that number of wrapper lists.

The existing code did not consider that comments may have lists within
them, and so the indentation of that node may not match the indentation
of the comment.

Bug: T252702
Change-Id: Icc5ff19783d2b213bff99f283cb0599a8b5c1ab4
2020-08-06 01:25:33 +02:00
Bartosz Dziewoński 569db3603c Improve detecting template-generated multi-line comments
Bug: T252058
Change-Id: Ic010b8aeff9b177031184f02f92fcdea5280dc36
2020-07-21 22:26:45 +01:00
Ed Sanders a4636d39fc Move #getTranscludedFrom from parser to ThreadItem
Also requires moving getTitleFromUrl to CommentUtils

Change-Id: I9cb83a3fdd456eba66899433b866ce7a7f00eeb5
2020-07-20 15:56:48 +01:00
Ed Sanders b32f991913 Documentation fixes
Change-Id: I2c7ccecbf8a50bd4d658b0f17f4a21fe90a3c399
2020-07-20 13:34:08 +01:00
James D. Forrester d6c3df31f5 Remove various phan suppressions and fix issues
Change-Id: I73b535f284566a0a8876a3198b9784b47567fac6
2020-06-12 20:35:59 +01:00
Ed Sanders 7be0cc3209 Create ThreadItem classes
Change-Id: Id2c5324d74eccb1209ccb76768c557722c6d9400
2020-06-12 20:35:59 +01:00
Umherirrender 48e860916a build: Add mediawiki/mediawiki-phan-config
Replace phan-taint-check-plugin by phan, it is now included

Change-Id: I0e682a83afd30faa8967e3c586431be4ae9a29b3
2020-06-10 22:21:07 +02:00
Ed Sanders da433037a3 Move getTranscludedFromElement to Utils
Change-Id: I8bdd757f949c013ba426150a192d71243fadf45d
2020-06-01 22:32:23 +01:00
Bartosz Dziewoński 01b4a8f4f4 Support replying when timestamp is template-generated
* Move modifier#getFullyCoveredWrapper to utils
* Use that method to find the node where we start searching for
  template wrappers, rather than using endContainer

Bug: T252058
Change-Id: I55de58102f3468fce01290bd413a7fdc96d322d6
2020-05-27 21:16:03 +02:00
Ed Sanders b1427163af Parser.php: Add tests for getTranscludedFrom
Requires an implementation of unwrapParsoidSections

Change-Id: I96c929b1117ba652dbd5af6a1ee37a5f9e87ed1e
2020-05-18 19:53:01 +00:00
Ed Sanders b78fb3f4c1 Move all PHP to the MediaWiki\Extension\DiscussionTools namespace
Change-Id: I654ebb3e646a6d8d62f7bd14d48805e39f836d7e
2020-05-15 21:57:13 +02:00
Renamed from includes/DiscussionToolsCommentUtils.php (Browse further)