Commit graph

117 commits

Author SHA1 Message Date
Bartosz Dziewoński 6a59149132 Ignore LRM and RLM in more places in the timestamp
We previously ignored them before timezone indicator (e9c401e3aa),
but they can end up in other places too, e.g. after the time.

Now we ignore them after every token. This is way overkill, but it
shouldn't hurt.

Bug: T308448
Change-Id: I20f7aaa34dba23f2a2faf1be258c1aea32ab770f
2022-05-17 02:00:22 +02:00
Bartosz Dziewoński c7723baf72 CommentParser: Replace uses of Title with TitleValue
Another small step towards removing the reliance on global state.

Change-Id: Ifb4a5bcbef6606d02f1c7aa7385d72822cb0bad0
2022-03-18 18:24:34 +00:00
jenkins-bot 32d9ef573a Merge "CommentParser: Avoid using a dynamic undeclared property" 2022-03-10 00:22:16 +00:00
jenkins-bot 76478dda26 Merge "Move signatureScanLimit to a constant in JS" 2022-03-10 00:22:14 +00:00
Bartosz Dziewoński 4c29304484 CommentParser: Avoid using a dynamic undeclared property
Change-Id: Iefa8dea83bc0d31b9c6b3509189eeaa652dd9ea0
2022-03-08 23:30:11 +00:00
Bartosz Dziewoński 08c79142fb ImmutableRange: Add @property annotations for magic props
Phan can analyze them now and reports some issues with types.

* Add some assertions on types where we're sure that we're using an
  Element or non-null, but Phan can't prove it
* Fix incorrect type hints on getFullyCoveredSiblings() and
  getCoveredSiblings(), luckily it was harmless

Change-Id: I8cc12450378efa7434c4d66882378b715edd4a70
2022-03-08 23:29:40 +00:00
Bartosz Dziewoński eb1fe7a8fb CommentParser: Fix redundant uses of getHeadlineNodeAndOffset()
We call CommentUtils::getHeadlineNodeAndOffset() before constructing
the HeadingItem in CommentParser, so the range's startContainer
is always the headline node.

Change-Id: I2afb6ba9100e785cd91f31d82f4cea59fa8b5443
2022-03-08 23:29:34 +00:00
Bartosz Dziewoński 584f6a020c Use tagName rather than nodeName when we know the node is an element
`tagName` is only defined on Element, and it returns its tag name.

`nodeName` is defined on Node, and it returns the tag name for Elements,
and a string like '#text' or '#document-fragment' for other types.

We were using both, which made it harder to reason about what types
we're dealing with.

Change-Id: I8e621e5872bdf78c84ec553cfbfcdbf0192f0589
2022-03-08 23:29:05 +00:00
Bartosz Dziewoński 063174e71c Use instanceof for checking for text/element nodes in PHP
It is friendlier for static analysis tools like Phan, which can't
infer anything from the `->nodeType === …` checks, and we were already
using it in most places.

Fix newly revealed Phan failures (and one unneeded suppression).

Change-Id: Id789f05e16a210f7ba22ca7514587c392fac0741
2022-03-08 23:28:39 +00:00
jenkins-bot 542da89530 Merge "Don't detect comments within references" 2022-02-28 16:47:21 +00:00
Bartosz Dziewoński 8a2715bdd5 Move signatureScanLimit to a constant in JS
Change-Id: Ieb60c148fd060ab62e4a493e2d0dff6c051f945c
2022-02-21 22:42:14 +01:00
Bartosz Dziewoński 4244418e56 Don't detect comments within references
Bug: T301213
Change-Id: Ifd5198651c8ed0ce53379fb5e35938089cd54a09
2022-02-21 19:57:44 +00:00
jenkins-bot 1a48f8cd7e Merge "CommentParser: Inject a forgotten service" 2022-02-21 19:30:55 +00:00
Bartosz Dziewoński 85165543f4 CommentParser: Inject a forgotten service
Also sort alphabetically.

Change-Id: I9e77c4aa1fba930f382e3c4f17ac0504c2f06668
2022-02-21 20:15:54 +01:00
Bartosz Dziewoński aea36bab3a CommentParser: Fix a small use of global state
Also, in ThreadItem::getSinglePageTransclusionTitle(), we don't need
this terribly complicated method.

Change-Id: If02c09aaa2f4dd66b2bc253a1edec4ea107564ee
2022-02-21 18:15:31 +00:00
Bartosz Dziewoński 8e44b43df0 Split off ThreadItemSet from CommentParser
Goal:
-----
Finishing the work from Iadb7757debe000025e52770ca51ebcf24ca8ee66
by changing CommentParser::parse() to return a data object, instead of
the whole parser.

Changes:
--------
ThreadItemSet.php:
ThreadItemSet.js:
* New data class to access the results of parsing a discussion. Most
  methods and properties are moved from CommentParser with no changes.

CommentParser.php:
Parser.js:
* parse() returns a new ThreadItemSet.
* Remove methods moved to ThreadItemSet.
* Placeholder headings are generated slightly differently, as we process
  things in a different order.
* Grouping threads and computing IDs/names is no longer lazy. We always
  needed IDs/names anyway.
* computeId() explicitly uses a ThreadItemSet to check the existing IDs
  when de-duplicating.

controller.js:
* Move the code for turning some nodes annotated by CommentFormatter
  into a ThreadItemSet (previously a Parser) from controller#init to
  ThreadItemSet.static.newFromAnnotatedNodes, and rewrite it to handle
  assigning parents/replies and recalculating legacy IDs more nicely.
* mw.dt.pageThreads is now a ThreadItemSet.

Change-Id: I49bfe019aa460651447fd383f73eafa9d7180a92
2022-02-21 16:22:32 +00:00
Bartosz Dziewoński 4613ae78e7 Change CommentParser into a service
Goal:
-----
To have a method like CommentParser::parse(), which just takes a node
to parse and a title and returns plain data, so that we don't need to
keep track of the config to construct a CommentParser object (the
required config like content language is provided by services) and
we don't need to keep that object around after parsing.

Changes:
--------
CommentParser.php:
* …is now a service. Constructor only takes services as arguments.
  The node and title are passed to a new parse() method.
* parse() should return plain data, but I split this part to a separate
  patch for ease of review: I49bfe019aa460651447fd383f73eafa9d7180a92.
* CommentParser still cheats and accesses global state in a few places,
  e.g. calling Title::makeTitleSafe or CommentUtils::getTitleFromUrl,
  so we can't turn its tests into true unit tests. This work is left
  for future commits.

LanguageData.php:
* …is now a service, instead of a static class.

Parser.js:
* …is not a real service, but it's changed to behave in a similar way.
  Constructor takes only the required config as argument,
  and node and title are instead passed to a new parse() method.

CommentParserTest.php:
parser.test.js:
* Can be simplified, now that we don't need a useless node and title
  to test internal methods that don't use them.

testUtils.js:
* Can be simplified, now that we don't need to override internal
  ResourceLoader stuff just to change the parser config.

Change-Id: Iadb7757debe000025e52770ca51ebcf24ca8ee66
2022-02-19 19:51:57 +01:00
Bartosz Dziewoński f51f3a1051 CommentParser: Remove unused method getThreadItemsByName()
Follow-up to a5099739a6.

Change-Id: I53cbf6a7a2c9b95674998734689b3930dfe74149
2022-02-19 19:51:57 +01:00
Bartosz Dziewoński 99b5de8038 Split Data class into ResourceLoaderData and LanguageData
The Data class contained utilities for two unrelated purposes.
Split each half to a separate class.

Notably, this improves the signature of the getLocalData() function.

Change-Id: Icde615fb9d483fee1f352c34909b37f8ffde8081
2022-02-19 19:37:34 +01:00
Bartosz Dziewoński ae9f26a9e5 Various code quality tweaks
(suggested by PhpStorm)

composer.json:
* Document required PHP extensions

Parser.js:
* Remove incorrect param documentation
* Fix some typos in comments (missing parentheses)

CommentParser.php:
* Fix some typos in comments (missing parentheses)

ImmutableRange.php:
* Remove unused property
* Add a `throw` to indicate that code path is unreachable

SubscribedNewCommentPresentationModel.php:
* Add missing `return false`

CommentParserTest.php:
* Remove unnecessary pass-by-reference

CommentModifierTest.php:
* Remove unused variable

CommentParserTest.php:
* Don't construct Element objects directly. PHP's DOMElement allows
  it, but Parsoid/Dodo's doesn't, and we use the latter for static
  analysis. This generates all kinds of confusing warnings.

Change-Id: Ia9598ebea0e99830dd485296e94a9d96acc4b258
2022-02-19 19:36:52 +01:00
Bartosz Dziewoński 13ab1db6da Don't count leading/trailing whitespace against signature scan limit
It's an arbitrary limit, it seems harmless to relax it to support the
use case in the task, even if it's weird.

Bug: T300949
Change-Id: I7c895c7019726758bbae3183b9c3ecbd9eabcf38
2022-02-04 19:35:29 +00:00
Ed Sanders 0b42aea276 CommentParser: Cache variables in getUsernameFromLink
Change-Id: I625e6ded3badd75a7a658c8d000576d0d165a18b
2022-02-04 19:35:18 +00:00
Ed Sanders 8ad1df7dc8 CommentParser: Name parts of return value from findSignature
Change-Id: I3a5ad36df0afdedc0aa9a15e5d83c5426b03b790
2022-02-04 19:34:18 +00:00
Bartosz Dziewoński f15693eefa Use class list everywhere for adding/checking CSS classes
In PHP, use DOMCompat::getClassList(), provided by Parsoid.
In JS, use `.classList`, available in all supported browsers.

This may fix some bugs where we were incorrectly checking for exactly
one class. The change in isOurGeneratedNode() is needed for
Ib2fa40c5fa389572b0e88ef558728fa06e3621b0.

Change-Id: Ia28d31678fd3d617b69280c4b7857755300fa515
2022-01-24 18:40:00 +01:00
Ed Sanders f80ff74fc6 Handle selflinks by returning the current page's title
Bug: T287818
Change-Id: I67f10ac9976581279d1e6a477e90d55875ebab20
2022-01-12 21:18:04 +00:00
Ed Sanders 34011b7a07 Parser: Pass in title of page being parsed
Will be used to parse selflinks in the future.

Change-Id: I2bc29d1c5c69cb6309f582f162f9af7d96ce8913
2022-01-12 21:17:59 +00:00
Bartosz Dziewoński ef7274d69e Move some helpers from CommentParser to CommentUtils
Change-Id: I0e323d3b75f47459a5548a13e9684f4c6ff4ba0c
2021-12-13 17:13:41 +01:00
Ed Sanders 8e4f08182e Add missing typehints
Change-Id: Ia25c5bea1834a3fdd26f32a9d5ed097789329824
2021-12-01 14:57:09 +00:00
Ed Sanders a86d308d66 CommentItem.php: Store timestamp object instead of string
We do something similar in CommentItem.js with a moment object.
The object can be converted to a string when required.

Change-Id: Id7221e9201db0d89c3b771574634c878c9515ca0
2021-11-09 16:37:45 +00:00
Bartosz Dziewoński c1f4668806 Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes
The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).

In practice, this had no visible effect to the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.

In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than before it.

The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.

Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
2021-09-27 19:04:16 +00:00
jenkins-bot 15747aba3a Merge "CommentParser: Remove outdated legacy ID algorithm" 2021-09-21 16:50:37 +00:00
Alexander Vorwerk 97a702cbc7 CommentParser: use IPUtils instead of the deprecated IP class
Bug: T291008
Change-Id: I0207940a642a32f2ca997b78387c9ff0af101599
2021-09-14 22:19:05 +02:00
Bartosz Dziewoński 9819df3288 CommentParser: Remove outdated legacy ID algorithm
Last changed in March (4a0802065c),
was only needed for about 2 weeks for compatibility with cached data.

Change-Id: I510238cb86a7b4d7ae5e8636716d1e9ca2d0e402
2021-09-07 17:41:30 +02:00
Bartosz Dziewoński a5099739a6 Improve notifications for comments posted in close succession
In case 4 and case 6, no notifications are expected. In all other
cases we now get the expected notifications.

Bug: T285528
Change-Id: I9e813bb3a053bc1232783f9eae1ad75672b4fa7e
2021-08-01 12:27:33 +02:00
C. Scott Ananian 25272e7a4a Don't refer directly to PHP dom extension classes; avoid nonstandard behavior
These changes ensure that DiscussionTools is independent of DOM
library choice, and will not break if/when Parsoid switches to an
alternate (more standards-compliant) DOM library.

We run `phan` against the Dodo standards-compliant DOM library,
so this ends up flagging uses of non-standard PHP extensions to
the DOM.  These will be suppressed for now with a "Nonstandard DOM"
comment that can be grepped for, since they will eventually
will need to be rewritten or worked around.

Most frequent issues:

* Node::nodeValue and Node::textContent and Element::getAttribute()
can return null in a spec-compliant implementation.  Add `?? ''` to
make spec-compliant results consistent w/ what PHP returns.

* DOMXPath doesn't accept anything except DOMDocument.  These uses
should be replaced with DOMCompat::querySelectorAll() or similar
(which end up using DOMXPath under the covers for DOMDocument any way,
but are implemented more efficiently in a spec-compliant
implementation).

* A couple of times we have code like:
  `while ($node->firstChild!==null) { $node = $node->firstChild; }`
and phan's analysis isn't strong enough to determine that $node is still
non-null after the while.  This same issue should appear with DOMDocument
but phan doesn't complain for some reason.

One apparently legit issue:

* Node::insertBefore() is once called in a funny way which leans on
the fact that the second option is optional in PHP.  This seems to be
a workaround for an ancient PHP bug, and can probably be safely
removed.

Bug: T287611
Bug: T217867
Change-Id: I3c4f41c3819770f85d68157c9f690d650b7266a3
2021-07-30 18:15:40 -04:00
libraryupgrader b0884b177c build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)
* glob-parent: 5.1.1 → 5.1.2
  * https://npmjs.com/advisories/1751 (CVE-2020-28469)
* trim-newlines: 3.0.0 → 3.0.1
  * https://npmjs.com/advisories/1753 (CVE-2021-33623)

Change-Id: I7a71e23da561599da417db3b3077b78d91173bbc
2021-07-22 16:29:04 +00:00
libraryupgrader 12fb65b9f1 build: Updating composer dependencies
* mediawiki/mediawiki-codesniffer: 35.0.0 → 36.0.0
* php-parallel-lint/php-parallel-lint: 1.2.0 → 1.3.0

Change-Id: I5c152292e83e7f3441e2c08b7d0ad23ac90f194b
2021-05-05 11:14:52 +00:00
jenkins-bot ef7073b8fd Merge "Simplify how warnings for IDs equal to legacy IDs are avoided" 2021-04-15 23:31:45 +00:00
Bartosz Dziewoński 42ce942c86 Introduce comment "names" to identify comments across revisions/pages
The existing comment IDs can't be used to find the same comment on
a different revision or page (when it's transcluded), because they
depend on the comment's parent and its position on the page.

Comment names depend only on the author and timestamp. The trade-off
is that they can't distinguish comments posted within the same minute,
or in the same edit, so we will still need the IDs sometimes.

Prefer using comment names when replying, if they're not ambiguous.
This fixes T273413 and T275821.

Heading names depend on the author and timestamp of the oldest comment.
This way we don't have to detect changes to the heading text, but we
can't distinguish headings without any comments.

Bug: T274685
Bug: T273413
Bug: T275821
Change-Id: Id85c50ba38d1e532cec106708c077b908a3fcd49
2021-03-23 16:08:42 +00:00
Bartosz Dziewoński b28290fa62 Simplify how warnings for IDs equal to legacy IDs are avoided
I don't like the extra parameter.

Follow-up to d05109b24d.

Change-Id: Ic0f403a816fd3182982002da326bb32d591ebcf7
2021-03-22 20:15:07 +00:00
Ed Sanders 4a0802065c Make IDs (to be used as URL hashes) wikitext safe
* Use hyphens instead of pipes a separators
* Use underscores for spaces in usernames

Change-Id: I6efd9739fc73e45002e50e64c43ce0de1c2f1239
2021-03-18 20:45:21 +01:00
Bartosz Dziewoński a103abb8ae Ignore warnings about legacy IDs in tests
Change-Id: I3c74b4e65aac9b84494917547cce7eb6a75995b4
2021-03-18 20:42:03 +01:00
Bartosz Dziewoński f5059e6ea6 Don't detect comments within 'cite' elements too
Follow-up to 024a978ffd.

Bug: T275881
Change-Id: I53448ad22cd0531e7fd4aa0ea5d15782879cce14
2021-03-01 21:40:43 +01:00
jenkins-bot 0eb37a87df Merge "Don't detect comments within quotes" 2021-02-28 22:56:20 +00:00
Bartosz Dziewoński 024a978ffd Don't detect comments within quotes
Bug: T275881
Change-Id: I8f7a4279837bd95ebf5b604ff350c0a3f29c2c05
2021-02-28 22:49:48 +00:00
jenkins-bot 8bb5eea999 Merge "Improve signature detection to handle formatting on the timestamp" 2021-02-27 22:54:50 +00:00
jenkins-bot 49938a88dc Merge "Improve merging multiple comments on one paragraph" 2021-02-27 22:54:43 +00:00
Daimona Eaytoy 67096cb431 Stop using deprecated Language methods
Change-Id: I7cf21365df355a4a62f9e353be61aaa03ed58b9d
2021-02-27 14:48:49 +00:00
Bartosz Dziewoński efe95494a8 Improve signature detection to handle formatting on the timestamp
Now it detect signatures generated by en.wp's {{Undated}} template,
and signatures of people who do weird stuff to the timestamps.

Bug: T275938
Change-Id: I27b07f6786ca5433a3c02a5fe68e4716d41401bb
2021-02-27 02:33:30 +01:00
Bartosz Dziewoński af082908a5 Improve merging multiple comments on one paragraph
The horrendous 11-line if() condition did not correctly handle
signatures wrapped in inline formatting markup, like <small>.

Instead, implement this logic in the code for skipping to the end
of a paragraph, which didn't exist yet when that condition was
added, but seems like a much better place to check this now.

Bug: T275934
Change-Id: I5cccff889b5e15b5f8fde0538bf4bccb22e762cf
2021-02-27 02:21:36 +01:00