Commit graph

157 commits

Author SHA1 Message Date
thiemowmde f59335d85a Make use of the modern ??= operator in one more place
Change-Id: I57c56547893536cd1ca4de567158eccdfff508f9
2024-11-20 08:13:29 +01:00
Ammarpad 84d0dfc51c commentparser: Allow signature of user '0' to be detected
Bug: T379918
Change-Id: I11a447c11fe64dae88a5035b007fe762a335e6c5
2024-11-15 19:02:43 +01:00
Umherirrender eef833a112 Use namespaced classes
Changes to the use statements done automatically via script

Change-Id: Ife9826099657bcaeca1beec94863a1600fdf55f8
2024-10-19 23:39:18 +02:00
jenkins-bot c11460a527 Merge "Fix parsing usernames with +" 2024-06-21 15:19:33 +00:00
Bartosz Dziewoński 20ff1a519f Fix parsing usernames with +
urldecode() should be used for decoding URL query parameters,
rawurldecode() should be used for decoding URL paths.

Bug: T367977
Change-Id: I7a7b14da85fb89f612c701d2746803d830017842
2024-06-19 17:14:10 +02:00
Ed Sanders 501046f38c Trim whitespace from truncated heading titles in IDs
Bug: T356196
Change-Id: Iddcda0cee624fda7f78a05e0a3d70eaee2635da9
2024-05-01 14:20:12 +01:00
Bartosz Dziewoński 3cffe1190e Clean up handling of <span class="mw-headline">
The PHP code should never see a `<span class="mw-headline">` node
since MediaWiki change If04d72f427ec3c3730e757cbb3ade8840c09f7d3,
but we have to support it for our old integration tests (T363031).

Improve code comments to explain this and move the handling to
one place, so that it can be deleted more easily in the future.

Follow-up to 08f61b2609.

Change-Id: I5ab9d3373a6911c1456c30d844b66576b278a1b5
2024-04-20 00:16:45 +00:00
Bartosz Dziewoński 7445294b3b Remove the "offset" from getHeadlineNodeAndOffset()
Since c6cd20f682 the offset is always 0.

Change-Id: I9c1c8230f897d8bb287ca47056f5fa9fb187d060
2024-04-20 00:34:32 +02:00
Bartosz Dziewoński 69e8e948b2 Remove now redundant PHPDoc blocks
MediaWiki's PHPCS plugin requires documentation comments on all
methods, unless those methods are fully typed (all parameters and
return value).

It turns out that almost all of our methods are fully typed already.

Procedure:

1. Find: \*(\s*\*\s*(@param \??[\w\\]+(\|null)? &?\$\w+|@return \??[\w\\]+(\|null)?)\n)+\s*\*/
   Replace with: */
   This deletes type annotations, except those not representable
   as PHP type hints such as union types `a|b` or typed arrays `a[]`,
   or those with documentation beyond type hints, or those on
   functions with any other annotations.

2. Find: /\*\*/\n\s*
   Replace with nothing
   This deletes the remaining comments on methods that had no prose
   documentation.

3. Undo all changes that PHPCS complains about (those comments
   were not redundant)

4. Review the diff carefully, these regexps are imprecise :)

Change-Id: Ic82e8b23f2996f44951208dbd9cfb4c8e0738dac
2024-03-10 23:53:04 +00:00
Bartosz Dziewoński b16dd9dd96 Update PHPCS overrides
There's now a different rule for the same thing in
mediawiki/mediawiki-codesniffer v43.0.0. Also document the reason for
the override. Follow-up to 8b00546749.

Change-Id: I392ee10639ffda6de55b091555e8c3cadd2af485
2024-03-10 22:59:05 +01:00
Umherirrender 8b00546749 build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0
Change-Id: I889efe00ac06fa857fc3ae063193368927bcff7a
2024-03-10 17:36:18 +01:00
Bartosz Dziewoński 08f61b2609 Support for core section heading formatting in post-cache transform
We already supported plain headings without the 'mw-headline'
wrappers, but now we need to parse the 'id' from a different
attribute.

Needed-By: If04d72f427ec3c3730e757cbb3ade8840c09f7d3
Bug: T357723
Change-Id: If85f89c40834618f23dc0ace2e599efb3b6d5ed4
2024-02-16 20:28:53 +00:00
Bartosz Dziewoński 6a1f2accee Parser: Minor code cleanup
We had some unused variables and roundabout checks.

Change-Id: I454b60ffa05c1cc12288c5de88c849a25aa35447
2024-02-08 16:59:04 +01:00
Bartosz Dziewoński 9db35873a4 Parser: Fix the main loop getting stuck on some signatures
In certain cases the parser could go back rather than forward after
finding a signature, causing it to find the same signature forever
until it ran out of memory.

Test cases coming later in a separate patch.

Bug: T356884
Change-Id: I8ac72b05e5e4ed45e6119c012a69708c9d8eda29
2024-02-07 22:33:57 +01:00
Ed Sanders 8069585489 CommentParser: Ignore generated timestamp links
This will be present in parser cache output and can
sometimes be mistaken for user page links.

Bug: T356142
Change-Id: I800b23d8466f72affcadfa336aab07abf7f8d79e
2024-01-30 10:29:36 +00:00
Bartosz Dziewoński cc9cccbd35 CommentParser: Replace new uses of Title with TitleValue
Follow-up to Ic718a964e309ae3a8e15e299081f46d4db860731.

Change-Id: Ida70ee080de44ec36f11c3c40816f6a198b63798
2023-12-11 22:19:13 +01:00
Bartosz Dziewoński a27e27fc68 Move finding transclusion source from ContentThreadItem to CommentParser
Reasons:
* Various other methods dealing with ranges already live there
* It would be neat if ContentThreadItem was just a value class
  without a lot of logic, similar to DatabaseThreadItem,
  particularly for writing unit tests
* The methods access global state through Title, which can't
  be fixed while they're in ContentThreadItem (see I9dfccc83)

The computation is now always done, instead of only when needed,
but that's a small drawback, since it's fast (fast enough that
I don't see the difference in the time taken when running tests),
and we were already computing it for all comments in many places.

Change-Id: Ic718a964e309ae3a8e15e299081f46d4db860731
2023-12-11 22:18:30 +01:00
Umherirrender 64bcb583e9 Use namespaced classes
Done automatically via script
Change to extension.json done manually

Change-Id: Ied7bbddd357290ac6be6bf480be0ee9116e77365
2023-12-11 16:38:02 +01:00
jenkins-bot d69dc24161 Merge "Ignore signatures with invalid timestamps" 2023-12-07 18:00:09 +00:00
Ed Sanders 4051c7faf4 Ignore signatures with invalid timestamps
Bug: T352455
Change-Id: Ie499db4594bfa23b618907383d0ac583849ff582
2023-12-05 13:23:15 +00:00
thiemowmde 00ad50c673 Use upstream Title::inNamespace() instead of manual comparisons
This might be a matter of personal preference. Not sure if it's
worth it. Both is well readable. On the other hand, the method
exists. Why not use it?

Change-Id: Id66fc6c888db6ae1cf28e60a51f90d9ae2cdb6ee
2023-11-28 09:49:00 +01:00
jenkins-bot 048d5364e2 Merge "Replace preg_replace_callback with strtr in CommentParser" 2023-10-31 13:35:19 +00:00
thiemowmde 10dcd1f847 Replace preg_replace_callback with strtr in CommentParser
It does the same as before.

I think performance is not a concern here, and wasn't my motivation
either. But I hope this makes the code easier to read and to reason
with.

I added a pure unit test case (without involving an actual Language
object) to cover the previously uncovered digits feature.

Change-Id: I6a0fc86035817eabb42b55e58183ae094c052aa6
2023-10-31 08:55:40 +01:00
thiemowmde 1491b47b12 Improve performance of CommentParser::getUsernameFromLink
I was curious why running the CommentParserTest takes so long. I
found this is one of the bottlenecks because it's called so often,
but many link titles that are parsed as user names turn out to be
something else. This little hack speeds up the test by 15% and has
probably a similar impact in production scenarios.

Change-Id: I5a0b3a49ba5793c8a345baaa7118fed500c082b6
2023-10-30 17:59:46 +01:00
Bartosz Dziewoński 781a33357b Use type hints for properties, remove PHPCS overrides
MediaWiki's PHPCS plugin requires documentation comments on all
properties, unless those properties are typed.

This has potential to introduce bugs – in particular, because typed
properties without a default value will throw an exception if their
value is accessed before it's defined, while previously they defaulted
to null. I fixed this when I found it (making them nullable and null
by default), but I may have missed some cases.

Change-Id: If5b1f4d542ce3e1b69327ee4283f7c3e133a62a0
2023-10-19 19:31:02 +00:00
Ed Sanders 1bb29faa58 Use strict comparison with array_search
Change-Id: Id920d49701bd9436a6247763ed40df052877ad2f
2023-07-24 18:50:00 +01:00
Theodore Dubois 4ca17b8c33 Support ISO 8601 timestamps in the parser
https://wikipesija.org is currently using ISO 8601 as the default date
format. The format is xnY-xnm-xnd"T"xnH:xni:xns and 'xn', 'm', and 's'
need support added.

Change-Id: I235098a578eb92ddd23ea47fa23d60df4b28f590
2023-06-17 11:36:43 -07:00
thiemowmde 8bbbf39bbd Make use of named MainConfigNames::… constants
Also merge setMwGlobals() calls because they are really expensive.

Also utilize the more readable str_contains() and related.

Change-Id: Iebde6aa17c2e366f0c0a98fe13a454f6a06c299b
2023-05-19 12:12:32 +02:00
Ed Sanders 92f5cfd821 Support suppressing comment detection in pages or sections
This can be done within sections using CSS:
* mw-notalk

Or at a page level using a magic word:
* __NOTALK__

"notalk" suppresses all comment detection, treating the content as
not containing any comments even if there are signatures present.

Bug: T295553
Bug: T249293
Change-Id: Ic1d7294bafcf7071e16838e70684ecadd7bc6fd3
2023-04-03 18:36:34 +02:00
Ed Sanders 2fcc505d50 Parser: Store timestamp ranges
Change-Id: Ifcbe22011f11f4374f38b7aa346da5a96cac968c
2023-03-28 23:51:17 +00:00
Ed Sanders b82af45735 CommentParser: Output display name if different to username
The only normalisation we apply for comparison is lowercasing.

Change-Id: Id3d57c2066429fcedc7dcc091e74ed46e17060f1
2023-02-23 23:03:32 +00:00
Bartosz Dziewoński af68c835bb Update exception handling for new code conventions
Change code to match the documented consensus formed on T321683:
https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP#Exception_handling

* Do not directly throw Exception, Error or MWException
* Document checked exceptions with @throws
* Do not document unchecked exceptions

For this extension, I think it makes sense to consider DOMException an
unchecked exception too (in addition to the usual LogicException and
RuntimeException).

Depends-On: Id07e301c3f20afa135e5469ee234a27354485652
Depends-On: I869af06896b9757af18488b916211c5a41a8c563
Depends-On: I42d9b7465d1406a22ef1b3f6d8de426c60c90e2c
Change-Id: Ic9d9efd031a87fa5a93143f714f0adb20f0dd956
2023-01-22 18:17:11 +00:00
Bartosz Dziewoński 3a9997d6ea Improve handling for comment separators
* Detect comment separators at the end of comments too
* Consider TemplateStyles associated with ignored templates

This unexpectedly improves a lot of cases other than T313097 too,
mostly where <br> or {{outdent}} was used within a paragraph:
splitting comments that were previously jumbled together, or restoring
content that was previously ignored for apps / notifications.

Bug: T313097
Change-Id: I9b2ef6b760f2ffd97141ad7000f70919aeab7803
2023-01-10 01:59:52 +00:00
Bartosz Dziewoński 433e57394c Use PHP 7.4 property types
Change-Id: I788db64f0c0c00894d77256b7f016d44eda4bbb1
2022-10-28 21:56:38 +02:00
Ed Sanders e24550fae9 Refactor thread summary getters
Replace getThreadSummary with individual getters that call
calculateThreadSummary once.

Change-Id: Ie8a8b4d7cb5121847b78dbc20bca2c8d48c7d857
2022-09-06 23:19:13 +02:00
Ed Sanders 664d5d041a Fix fetching of oldest comment in a thread
The implementation in Parser doesn't descend into sub-thread.
Re-use the getThreadSummary method in ThreadItem and traverse
the thread properly.

Bug: T298617
Change-Id: I318d9012eb83f37ccbe463923524ef2e9f995ced
2022-09-01 21:22:09 +00:00
Bartosz Dziewoński cfa45a5f4c Remove all stuff about legacy IDs
We can no longer change IDs so easily, because they're stored in the
permalink database, so remove this mechanism to make sure it's not
accidentally used in the future.

Change-Id: I392ee1f49c48fc2f23d05e9a37c643438b4f2b9a
2022-08-24 01:01:09 +02:00
Bartosz Dziewoński 880f9755e0 Separate ContentThreadItem and DatabaseThreadItem etc.
Rename ThreadItem to ContentThreadItem, then create a new ThreadItem
interface containing only the methods that we'll be able to implement
using only the persistently stored data (no parsing), then create a
DatabaseThreadItem. Do the same for CommentItem and HeadingItem.

ThreadItemSet gets a similar treatment, but it's basically only for
Phan's type checking. (This is sad.)

Change-Id: I1633049befe8ec169753b82eb876459af1f63fe8
2022-07-04 23:35:50 +02:00
Ed Sanders 0ad9b4c6b2 Move placeholder heading level (99) to a constant
Change the HeadingItem constructor to take a 'null' headingLevel
and store this internally with the constant. Change the JSON
serializer to convert this back to null.

Change-Id: I27508eed75d94b99c5189548919309f8da7deb75
2022-06-14 22:51:49 +01:00
Ed Sanders af54bae2ec Prefer late static binding over self::
While in many cases the class will never be sub-classed, it's easier
just to always use static:: and not worry about predicting which
classes might have problems in the future.

Change-Id: I23072a1701b5acf62bb3379a877de97627d8fcf3
2022-06-09 15:12:48 +01:00
Bartosz Dziewoński 6a59149132 Ignore LRM and RLM in more places in the timestamp
We previously ignored them before timezone indicator (e9c401e3aa),
but they can end up in other places too, e.g. after the time.

Now we ignore them after every token. This is way overkill, but it
shouldn't hurt.

Bug: T308448
Change-Id: I20f7aaa34dba23f2a2faf1be258c1aea32ab770f
2022-05-17 02:00:22 +02:00
Bartosz Dziewoński c7723baf72 CommentParser: Replace uses of Title with TitleValue
Another small step towards removing the reliance on global state.

Change-Id: Ifb4a5bcbef6606d02f1c7aa7385d72822cb0bad0
2022-03-18 18:24:34 +00:00
jenkins-bot 32d9ef573a Merge "CommentParser: Avoid using a dynamic undeclared property" 2022-03-10 00:22:16 +00:00
jenkins-bot 76478dda26 Merge "Move signatureScanLimit to a constant in JS" 2022-03-10 00:22:14 +00:00
Bartosz Dziewoński 4c29304484 CommentParser: Avoid using a dynamic undeclared property
Change-Id: Iefa8dea83bc0d31b9c6b3509189eeaa652dd9ea0
2022-03-08 23:30:11 +00:00
Bartosz Dziewoński 08c79142fb ImmutableRange: Add @property annotations for magic props
Phan can analyze them now and reports some issues with types.

* Add some assertions on types where we're sure that we're using an
  Element or non-null, but Phan can't prove it
* Fix incorrect type hints on getFullyCoveredSiblings() and
  getCoveredSiblings(), luckily it was harmless

Change-Id: I8cc12450378efa7434c4d66882378b715edd4a70
2022-03-08 23:29:40 +00:00
Bartosz Dziewoński eb1fe7a8fb CommentParser: Fix redundant uses of getHeadlineNodeAndOffset()
We call CommentUtils::getHeadlineNodeAndOffset() before constructing
the HeadingItem in CommentParser, so the range's startContainer
is always the headline node.

Change-Id: I2afb6ba9100e785cd91f31d82f4cea59fa8b5443
2022-03-08 23:29:34 +00:00
Bartosz Dziewoński 584f6a020c Use tagName rather than nodeName when we know the node is an element
`tagName` is only defined on Element, and it returns its tag name.

`nodeName` is defined on Node, and it returns the tag name for Elements,
and a string like '#text' or '#document-fragment' for other types.

We were using both, which made it harder to reason about what types
we're dealing with.

Change-Id: I8e621e5872bdf78c84ec553cfbfcdbf0192f0589
2022-03-08 23:29:05 +00:00
Bartosz Dziewoński 063174e71c Use instanceof for checking for text/element nodes in PHP
It is friendlier for static analysis tools like Phan, which can't
infer anything from the `->nodeType === …` checks, and we were already
using it in most places.

Fix newly revealed Phan failures (and one unneeded suppression).

Change-Id: Id789f05e16a210f7ba22ca7514587c392fac0741
2022-03-08 23:28:39 +00:00
jenkins-bot 542da89530 Merge "Don't detect comments within references" 2022-02-28 16:47:21 +00:00