98 commits

Author SHA1 Message Date
jenkins-bot fb4d4d77a5 Merge "Trim whitespace from truncated heading titles in IDs" 2024-06-20 02:20:23 +00:00
Ed Sanders bf09928773 build: Update eslint-config-wikimedia to 0.28.0
Change-Id: Ie6bcea16debc74d3dd1283f6a4975fb3bd5056b0
2024-06-03 18:37:15 +01:00
Ed Sanders ea5639b20b ESLint: Enforce prefer-const
Change-Id: I5985d1b532988bb3b71ff1aa24eae57ac2e1b9c5
2024-05-24 16:50:13 +01:00
Ed Sanders 44303de9cf ESLint: Replace some lets with consts by moving declarations
Change-Id: I710bff5fe04268172a40af75eb6ff5910eaf0255
2024-05-24 16:50:13 +01:00
Ed Sanders dda9227947 ESLint: Manually fix remaining no-var violations
Change-Id: I4474bd0205e7a1ed8e60147e52675e3e0b93ccd9
2024-05-24 16:50:11 +01:00
Ed Sanders ca5157156a ESLint: Autofix no-var rule
Leave rule off for now as manual fixes are required.

Also temporarily disable prefer-const rule as that
will also require some manual fixes.

Change-Id: I8c3478f26f51287acb943bd38c9c1020c06b9f39
2024-05-24 16:49:36 +01:00
Ed Sanders 501046f38c Trim whitespace from truncated heading titles in IDs
Bug: T356196
Change-Id: Iddcda0cee624fda7f78a05e0a3d70eaee2635da9
2024-05-01 14:20:12 +01:00
jenkins-bot 3c4b5364f2 Merge "Prefer short arrow functions" 2024-04-30 16:57:37 +00:00
Bartosz Dziewoński 3cffe1190e Clean up handling of <span class="mw-headline">
The PHP code should never see a `<span class="mw-headline">` node
since MediaWiki change If04d72f427ec3c3730e757cbb3ade8840c09f7d3,
but we have to support it for our old integration tests (T363031).

Improve code comments to explain this and move the handling to
one place, so that it can be deleted more easily in the future.

Follow-up to 08f61b2609.

Change-Id: I5ab9d3373a6911c1456c30d844b66576b278a1b5
2024-04-20 00:16:45 +00:00
Bartosz Dziewoński 7445294b3b Remove the "offset" from getHeadlineNodeAndOffset()
Since c6cd20f682 the offset is always 0.

Change-Id: I9c1c8230f897d8bb287ca47056f5fa9fb187d060
2024-04-20 00:34:32 +02:00
Bartosz Dziewoński d4c5aebd8f Prefer short arrow functions
When an arrow function body contains just a single `return` statement,
the braces can be omitted.

(Changes are mostly made by `grunt eslint --fix`, with only some line
breaks added by hand.)

Change-Id: I37f259f87085c8d20ed09cfa58a8456dd36cdc38
2024-04-20 00:08:51 +02:00
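
The style change described in d4c5aebd8f can be illustrated with a minimal sketch. The `items` array and the variable names are hypothetical and are not taken from the repository.

```javascript
// Hypothetical data, for illustration only.
const items = [ { id: 'a', level: 1 }, { id: 'b', level: 2 } ];

// Before: block body containing a single `return` statement.
const idsLong = items.map( ( item ) => {
	return item.id;
} );

// After: concise body, with the braces and `return` omitted.
const idsShort = items.map( ( item ) => item.id );

// Returning an object literal needs parentheses, otherwise the
// braces are parsed as a block body.
const wrapped = items.map( ( item ) => ( { id: item.id } ) );
```

Both forms produce the same array; only the syntax differs.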
Ed Sanders a74c00ba8c Prefer arrow functions for callbacks
This allows us to remove our `this` bindings.

Change-Id: Ie8c8c38d36af8a033b5181870c39f8981a57b939
2024-04-19 12:34:23 +01:00
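
As a rough illustration of why arrow functions make `this` bindings unnecessary, consider the sketch below. The `Counter` class is hypothetical and only demonstrates the language behaviour, not any code from this repository.

```javascript
// Hypothetical class, for illustration only.
class Counter {
	constructor() {
		this.total = 0;
	}

	// Before: a regular function callback needs an explicit binding,
	// because `this` inside it would otherwise not be the Counter.
	addAllBound( values ) {
		values.forEach( function ( value ) {
			this.total += value;
		}.bind( this ) );
	}

	// After: an arrow function uses the enclosing method's `this`.
	addAllArrow( values ) {
		values.forEach( ( value ) => {
			this.total += value;
		} );
	}
}

const bound = new Counter();
bound.addAllBound( [ 1, 2, 3 ] );
const arrow = new Counter();
arrow.addAllArrow( [ 1, 2, 3 ] );
```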
Bartosz Dziewoński 6a1f2accee Parser: Minor code cleanup
We had some unused variables and roundabout checks.

Change-Id: I454b60ffa05c1cc12288c5de88c849a25aa35447
2024-02-08 16:59:04 +01:00
Bartosz Dziewoński 9db35873a4 Parser: Fix the main loop getting stuck on some signatures
In certain cases the parser could go back rather than forward after
finding a signature, causing it to find the same signature forever
until it ran out of memory.

Test cases coming later in a separate patch.

Bug: T356884
Change-Id: I8ac72b05e5e4ed45e6119c012a69708c9d8eda29
2024-02-07 22:33:57 +01:00
Ed Sanders 8069585489 CommentParser: Ignore generated timestamp links
This will be present in parser cache output and can
sometimes be mistaken for user page links.

Bug: T356142
Change-Id: I800b23d8466f72affcadfa336aab07abf7f8d79e
2024-01-30 10:29:36 +00:00
Ed Sanders 329a268564 ThreadItem.js: Rename getNativeRange to getRange
Makes the class more similar to PHP. The non-native
ranges exist for efficiency, but users will usually
want native ranges.

Change-Id: Ifd7dd034d2e0f3b9af050ecdab3e063df73dde5e
2023-12-15 16:22:52 +00:00
Ed Sanders 4051c7faf4 Ignore signatures with invalid timestamps
Bug: T352455
Change-Id: Ie499db4594bfa23b618907383d0ac583849ff582
2023-12-05 13:23:15 +00:00
Ed Sanders f2f0ec2f65 build: Update linters and fix
Change-Id: Iec16f3330f94d38bb50492b7dcc9207786b964a4
2023-11-28 16:10:47 +00:00
jenkins-bot 048d5364e2 Merge "Replace preg_replace_callback with strtr in CommentParser" 2023-10-31 13:35:19 +00:00
thiemowmde 10dcd1f847 Replace preg_replace_callback with strtr in CommentParser
It does the same as before.

I think performance is not a concern here, and wasn't my motivation
either. But I hope this makes the code easier to read and to reason
about.

I added a pure unit test case (without involving an actual Language
object) to cover the previously uncovered digits feature.

Change-Id: I6a0fc86035817eabb42b55e58183ae094c052aa6
2023-10-31 08:55:40 +01:00
thiemowmde 1491b47b12 Improve performance of CommentParser::getUsernameFromLink
I was curious why running the CommentParserTest takes so long. I
found this is one of the bottlenecks because it's called so often,
but many link titles that are parsed as user names turn out to be
something else. This little hack speeds up the test by 15% and
probably has a similar impact in production scenarios.

Change-Id: I5a0b3a49ba5793c8a345baaa7118fed500c082b6
2023-10-30 17:59:46 +01:00
Theodore Dubois 4ca17b8c33 Support ISO 8601 timestamps in the parser
https://wikipesija.org is currently using ISO 8601 as the default date
format. The format is xnY-xnm-xnd"T"xnH:xni:xns and 'xn', 'm', and 's'
need support added.

Change-Id: I235098a578eb92ddd23ea47fa23d60df4b28f590
2023-06-17 11:36:43 -07:00
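
To make the format string in 4ca17b8c33 concrete, here is a simplified sketch of turning it into a regular expression. It is not the parser's actual implementation: `formatToRegex`, `escapeRegex` and `CODE_PATTERNS` are hypothetical names, and only the codes that appear in this one format are handled. It assumes the MediaWiki meaning of `xn` (use ASCII digits for the next numeric code) and that double-quoted text is a literal.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex( text ) {
	return text.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
}

// Assumption: only the codes used by the ISO 8601 format are covered.
const CODE_PATTERNS = {
	Y: '(\\d{4})', // four-digit year
	m: '(\\d{2})', // zero-padded month
	d: '(\\d{2})', // zero-padded day
	H: '(\\d{2})', // zero-padded hour, 24-hour clock
	i: '(\\d{2})', // zero-padded minutes
	s: '(\\d{2})' // zero-padded seconds
};

function formatToRegex( format ) {
	let source = '';
	let i = 0;
	while ( i < format.length ) {
		if ( format.startsWith( 'xn', i ) ) {
			// \d in a JavaScript regex only matches ASCII digits,
			// so the 'xn' prefix needs no extra pattern here.
			i += 2;
		} else if ( format[ i ] === '"' ) {
			// Quoted text is matched literally.
			const end = format.indexOf( '"', i + 1 );
			const stop = end === -1 ? format.length : end;
			source += escapeRegex( format.slice( i + 1, stop ) );
			i = stop + 1;
		} else {
			const ch = format[ i ];
			source += Object.prototype.hasOwnProperty.call( CODE_PATTERNS, ch ) ?
				CODE_PATTERNS[ ch ] :
				escapeRegex( ch );
			i++;
		}
	}
	return new RegExp( source );
}

const isoRegex = formatToRegex( 'xnY-xnm-xnd"T"xnH:xni:xns' );
const isoMatch = '2023-06-17T11:36:43'.match( isoRegex );
```

Each capture group in `isoMatch` corresponds to one numeric code, in the order the codes appear in the format.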
Ed Sanders 92f5cfd821 Support suppressing comment detection in pages or sections
This can be done within sections using CSS:
* mw-notalk

Or at a page level using a magic word:
* __NOTALK__

"notalk" suppresses all comment detection, treating the content as
not containing any comments even if there are signatures present.

Bug: T295553
Bug: T249293
Change-Id: Ic1d7294bafcf7071e16838e70684ecadd7bc6fd3
2023-04-03 18:36:34 +02:00
Ed Sanders 2fcc505d50 Parser: Store timestamp ranges
Change-Id: Ifcbe22011f11f4374f38b7aa346da5a96cac968c
2023-03-28 23:51:17 +00:00
Ed Sanders b82af45735 CommentParser: Output display name if different to username
The only normalisation we apply for comparison is lowercasing.

Change-Id: Id3d57c2066429fcedc7dcc091e74ed46e17060f1
2023-02-23 23:03:32 +00:00
Bartosz Dziewoński 3a9997d6ea Improve handling for comment separators
* Detect comment separators at the end of comments too
* Consider TemplateStyles associated with ignored templates

This unexpectedly improves a lot of cases other than T313097 too,
mostly where <br> or {{outdent}} was used within a paragraph:
splitting comments that were previously jumbled together, or restoring
content that was previously ignored for apps / notifications.

Bug: T313097
Change-Id: I9b2ef6b760f2ffd97141ad7000f70919aeab7803
2023-01-10 01:59:52 +00:00
Ed Sanders e24550fae9 Refactor thread summary getters
Replace getThreadSummary with individual getters that call
calculateThreadSummary once.

Change-Id: Ie8a8b4d7cb5121847b78dbc20bca2c8d48c7d857
2022-09-06 23:19:13 +02:00
Ed Sanders 664d5d041a Fix fetching of oldest comment in a thread
The implementation in Parser doesn't descend into sub-threads.
Re-use the getThreadSummary method in ThreadItem and traverse
the thread properly.

Bug: T298617
Change-Id: I318d9012eb83f37ccbe463923524ef2e9f995ced
2022-09-01 21:22:09 +00:00
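
The difference between checking only top-level replies and traversing the whole thread can be sketched as follows. The data shape (`timestamp` in milliseconds, `replies` array, `id`) and the function name `getOldestComment` are assumptions made for this example; they are not the ThreadItem API.

```javascript
// Hypothetical thread structure: each item has a `timestamp` (number,
// or null for headings) and a `replies` array of nested items.
function getOldestComment( item ) {
	let oldest = item.timestamp !== null ? item : null;
	for ( const reply of item.replies ) {
		// Recurse so that comments in sub-threads are considered too.
		const candidate = getOldestComment( reply );
		if ( candidate && ( !oldest || candidate.timestamp < oldest.timestamp ) ) {
			oldest = candidate;
		}
	}
	return oldest;
}

const thread = {
	id: 'h1',
	timestamp: null,
	replies: [
		{
			id: 'c1',
			timestamp: 2000,
			replies: [ { id: 'c2', timestamp: 1000, replies: [] } ]
		},
		{ id: 'c3', timestamp: 3000, replies: [] }
	]
};
const oldestComment = getOldestComment( thread );
```

In this sample the oldest comment (`c2`) is a nested reply, so a search limited to the heading's direct replies would miss it.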
Ed Sanders 0ad9b4c6b2 Move placeholder heading level (99) to a constant
Change the HeadingItem constructor to take a 'null' headingLevel
and store this internally with the constant. Change the JSON
serializer to convert this back to null.

Change-Id: I27508eed75d94b99c5189548919309f8da7deb75
2022-06-14 22:51:49 +01:00
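
A minimal sketch of the pattern described in 0ad9b4c6b2, assuming a simplified class: the constructor accepts `null`, stores the constant internally, and serialization converts it back to `null`. `HeadingItemSketch` and `PLACEHOLDER_HEADING_LEVEL` are illustrative names, not necessarily those used in the repository.

```javascript
// Sentinel used internally for placeholder headings (the value 99
// comes from the commit subject; the constant name is hypothetical).
const PLACEHOLDER_HEADING_LEVEL = 99;

class HeadingItemSketch {
	// `headingLevel` is a number, or null for a placeholder heading.
	constructor( headingLevel ) {
		this.headingLevel = headingLevel === null ?
			PLACEHOLDER_HEADING_LEVEL :
			headingLevel;
	}

	// JSON.stringify() calls toJSON(); the sentinel is turned back
	// into null so it never leaks into serialized output.
	toJSON() {
		return {
			headingLevel: this.headingLevel === PLACEHOLDER_HEADING_LEVEL ?
				null :
				this.headingLevel
		};
	}
}

const placeholderJson = JSON.stringify( new HeadingItemSketch( null ) );
const realJson = JSON.stringify( new HeadingItemSketch( 2 ) );
```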
Bartosz Dziewoński 6a59149132 Ignore LRM and RLM in more places in the timestamp
We previously ignored them before timezone indicator (e9c401e3aa),
but they can end up in other places too, e.g. after the time.

Now we ignore them after every token. This is way overkill, but it
shouldn't hurt.

Bug: T308448
Change-Id: I20f7aaa34dba23f2a2faf1be258c1aea32ab770f
2022-05-17 02:00:22 +02:00
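
The "ignore them after every token" approach can be sketched roughly as below. This is an illustration under assumptions, not the parser's code: `buildTimestampRegex` and the token patterns are hypothetical, and the sample timestamp format is invented for the example.

```javascript
// Zero or more left-to-right (U+200E) or right-to-left (U+200F) marks.
const DIRECTION_MARKS = '[\\u200E\\u200F]*';

// Join per-token patterns, allowing direction marks after each one.
function buildTimestampRegex( tokenPatterns ) {
	return new RegExp(
		tokenPatterns.map( ( pattern ) => pattern + DIRECTION_MARKS ).join( '' )
	);
}

// Hypothetical token patterns for a timestamp like "02:00, 17 May 2022 (UTC)".
const timestampRegex = buildTimestampRegex( [
	'(\\d{2}):(\\d{2})',
	', ',
	'(\\d{1,2})',
	' ',
	'([A-Za-z]+)',
	' ',
	'(\\d{4})',
	' ',
	'\\((UTC)\\)'
] );

const plainTimestamp = '02:00, 17 May 2022 (UTC)';
// Same text with an LRM after the time and an RLM before the timezone.
const markedTimestamp = '02:00\u200E, 17 May 2022 \u200F(UTC)';

const plainMatch = plainTimestamp.match( timestampRegex );
const markedMatch = markedTimestamp.match( timestampRegex );
```

Allowing the marks after every token is broader than strictly necessary, but it means the regex does not need to know where the marks tend to appear.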
Ed Sanders 579b8bb1d4 Implement getTimestampString on CommentItem
Change-Id: I1768e9993debe904d6a228942ad0188486d65c0b
2022-03-24 16:49:35 +00:00
Bartosz Dziewoński c7723baf72 CommentParser: Replace uses of Title with TitleValue
Another small step towards removing the reliance on global state.

Change-Id: Ifb4a5bcbef6606d02f1c7aa7385d72822cb0bad0
2022-03-18 18:24:34 +00:00
jenkins-bot 32d9ef573a Merge "CommentParser: Avoid using a dynamic undeclared property" 2022-03-10 00:22:16 +00:00
jenkins-bot 76478dda26 Merge "Move signatureScanLimit to a constant in JS" 2022-03-10 00:22:14 +00:00
Bartosz Dziewoński 4c29304484 CommentParser: Avoid using a dynamic undeclared property
Change-Id: Iefa8dea83bc0d31b9c6b3509189eeaa652dd9ea0
2022-03-08 23:30:11 +00:00
Bartosz Dziewoński eb1fe7a8fb CommentParser: Fix redundant uses of getHeadlineNodeAndOffset()
We call CommentUtils::getHeadlineNodeAndOffset() before constructing
the HeadingItem in CommentParser, so the range's startContainer
is always the headline node.

Change-Id: I2afb6ba9100e785cd91f31d82f4cea59fa8b5443
2022-03-08 23:29:34 +00:00
Bartosz Dziewoński 8a2715bdd5 Move signatureScanLimit to a constant in JS
Change-Id: Ieb60c148fd060ab62e4a493e2d0dff6c051f945c
2022-02-21 22:42:14 +01:00
Bartosz Dziewoński 4244418e56 Don't detect comments within references
Bug: T301213
Change-Id: Ifd5198651c8ed0ce53379fb5e35938089cd54a09
2022-02-21 19:57:44 +00:00
Bartosz Dziewoński 8e44b43df0 Split off ThreadItemSet from CommentParser
Goal:
-----
Finishing the work from Iadb7757debe000025e52770ca51ebcf24ca8ee66
by changing CommentParser::parse() to return a data object, instead of
the whole parser.

Changes:
--------
ThreadItemSet.php:
ThreadItemSet.js:
* New data class to access the results of parsing a discussion. Most
  methods and properties are moved from CommentParser with no changes.

CommentParser.php:
Parser.js:
* parse() returns a new ThreadItemSet.
* Remove methods moved to ThreadItemSet.
* Placeholder headings are generated slightly differently, as we process
  things in a different order.
* Grouping threads and computing IDs/names is no longer lazy. We always
  needed IDs/names anyway.
* computeId() explicitly uses a ThreadItemSet to check the existing IDs
  when de-duplicating.

controller.js:
* Move the code for turning some nodes annotated by CommentFormatter
  into a ThreadItemSet (previously a Parser) from controller#init to
  ThreadItemSet.static.newFromAnnotatedNodes, and rewrite it to handle
  assigning parents/replies and recalculating legacy IDs more nicely.
* mw.dt.pageThreads is now a ThreadItemSet.

Change-Id: I49bfe019aa460651447fd383f73eafa9d7180a92
2022-02-21 16:22:32 +00:00
Bartosz Dziewoński 4613ae78e7 Change CommentParser into a service
Goal:
-----
To have a method like CommentParser::parse(), which just takes a node
to parse and a title and returns plain data, so that we don't need to
keep track of the config to construct a CommentParser object (the
required config like content language is provided by services) and
we don't need to keep that object around after parsing.

Changes:
--------
CommentParser.php:
* …is now a service. Constructor only takes services as arguments.
  The node and title are passed to a new parse() method.
* parse() should return plain data, but I split this part to a separate
  patch for ease of review: I49bfe019aa460651447fd383f73eafa9d7180a92.
* CommentParser still cheats and accesses global state in a few places,
  e.g. calling Title::makeTitleSafe or CommentUtils::getTitleFromUrl,
  so we can't turn its tests into true unit tests. This work is left
  for future commits.

LanguageData.php:
* …is now a service, instead of a static class.

Parser.js:
* …is not a real service, but it's changed to behave in a similar way.
  Constructor takes only the required config as argument,
  and node and title are instead passed to a new parse() method.

CommentParserTest.php:
parser.test.js:
* Can be simplified, now that we don't need a useless node and title
  to test internal methods that don't use them.

testUtils.js:
* Can be simplified, now that we don't need to override internal
  ResourceLoader stuff just to change the parser config.

Change-Id: Iadb7757debe000025e52770ca51ebcf24ca8ee66
2022-02-19 19:51:57 +01:00
Bartosz Dziewoński 99b5de8038 Split Data class into ResourceLoaderData and LanguageData
The Data class contained utilities for two unrelated purposes.
Split each half to a separate class.

Notably, this improves the signature of the getLocalData() function.

Change-Id: Icde615fb9d483fee1f352c34909b37f8ffde8081
2022-02-19 19:37:34 +01:00
Bartosz Dziewoński ae9f26a9e5 Various code quality tweaks
(suggested by PhpStorm)

composer.json:
* Document required PHP extensions

Parser.js:
* Remove incorrect param documentation
* Fix some typos in comments (missing parentheses)

CommentParser.php:
* Fix some typos in comments (missing parentheses)

ImmutableRange.php:
* Remove unused property
* Add a `throw` to indicate that code path is unreachable

SubscribedNewCommentPresentationModel.php:
* Add missing `return false`

CommentParserTest.php:
* Remove unnecessary pass-by-reference

CommentModifierTest.php:
* Remove unused variable

CommentParserTest.php:
* Don't construct Element objects directly. PHP's DOMElement allows
  it, but Parsoid/Dodo's doesn't, and we use the latter for static
  analysis. This generates all kinds of confusing warnings.

Change-Id: Ia9598ebea0e99830dd485296e94a9d96acc4b258
2022-02-19 19:36:52 +01:00
Bartosz Dziewoński 13ab1db6da Don't count leading/trailing whitespace against signature scan limit
It's an arbitrary limit, it seems harmless to relax it to support the
use case in the task, even if it's weird.

Bug: T300949
Change-Id: I7c895c7019726758bbae3183b9c3ecbd9eabcf38
2022-02-04 19:35:29 +00:00
Ed Sanders 0b42aea276 CommentParser: Cache variables in getUsernameFromLink
Change-Id: I625e6ded3badd75a7a658c8d000576d0d165a18b
2022-02-04 19:35:18 +00:00
Ed Sanders 8ad1df7dc8 CommentParser: Name parts of return value from findSignature
Change-Id: I3a5ad36df0afdedc0aa9a15e5d83c5426b03b790
2022-02-04 19:34:18 +00:00
Ed Sanders f80ff74fc6 Handle selflinks by returning the current page's title
Bug: T287818
Change-Id: I67f10ac9976581279d1e6a477e90d55875ebab20
2022-01-12 21:18:04 +00:00
Ed Sanders 34011b7a07 Parser: Pass in title of page being parsed
Will be used to parse selflinks in the future.

Change-Id: I2bc29d1c5c69cb6309f582f162f9af7d96ce8913
2022-01-12 21:17:59 +00:00
Ed Sanders 2e1241289c Better document {Object} types
Change-Id: Ibfaf2ded443301c68552dbf98a1897a50bda9ef5
2021-12-20 17:25:54 +00:00
Bartosz Dziewoński ef7274d69e Move some helpers from CommentParser to CommentUtils
Change-Id: I0e323d3b75f47459a5548a13e9684f4c6ff4ba0c
2021-12-13 17:13:41 +01:00
Ed Sanders 7c3e583bec build: Update eslint-config-wikimedia to 0.21.0
Change-Id: I72de463d5a878e555eeed0e7ce2772e1d3a46f06
2021-11-08 19:03:40 +00:00