94 commits

Author SHA1 Message Date
thiemowmde 3cc4e00647 Use faster DOMUtils::hasClass()
I tried to benchmark this. It's not really a big bottleneck in this
codebase, but can still be seen. For example, the relative runtime
of the method CommentUtils::isCommentSeparator (the heaviest user of
this feature) goes down from 1.5% to 0.3%.

Depends-On: If9252a97562542e7a4bec29edcc6e8dee0fb8221
Change-Id: I6b94a2481b5c4f7983df78ee48d45c0d8a50e53b
2024-11-20 08:16:31 +01:00
Bartosz Dziewoński 20ff1a519f Fix parsing usernames with +
urldecode() should be used for decoding URL query parameters,
rawurldecode() should be used for decoding URL paths.
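
The difference is how a literal "+" is treated. A minimal sketch of the
same distinction, using Python's urllib.parse as a stand-in for the PHP
functions (unquote_plus() behaves like urldecode(), unquote() like
rawurldecode()); the helper names and the sample username are made up
for illustration and are not taken from this codebase:

```python
from urllib.parse import unquote, unquote_plus


def decode_query_value(value: str) -> str:
    # Query-string rules: "+" stands for a space (like PHP urldecode()).
    return unquote_plus(value)


def decode_path_segment(segment: str) -> str:
    # Path rules: "+" is a literal plus sign (like PHP rawurldecode()).
    return unquote(segment)


# Hypothetical user page title taken from a URL path.
encoded = "User:A+B%20C"

print(decode_query_value(encoded))   # the "+" is turned into a space
print(decode_path_segment(encoded))  # the "+" survives
```

Decoding a path with the query-string rules silently turns the "+" in a
username into a space, which is the kind of mismatch this change avoids.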

Bug: T367977
Change-Id: I7a7b14da85fb89f612c701d2746803d830017842
2024-06-19 17:14:10 +02:00
jenkins-bot 8b9a813b11 Merge "Clean up handling of <span class="mw-headline">" 2024-04-30 16:48:56 +00:00
jenkins-bot 9f7de3e917 Merge "Remove the "offset" from getHeadlineNodeAndOffset()" 2024-04-30 16:48:54 +00:00
Bartosz Dziewoński 67bad433dc Fix signature check not to look at the content of previous section
It needs to only look at the end of the added comments, and ignore
whatever is going on at the beginning, since the only thing it really
cares about is a functioning "Reply" button at the end.

Bug: T363285
Change-Id: Ia337be754deda741617d1343972f3e0a21c41b05
2024-04-25 23:15:18 +02:00
Bartosz Dziewoński 3cffe1190e Clean up handling of <span class="mw-headline">
The PHP code should never see a `<span class="mw-headline">` node
since MediaWiki change If04d72f427ec3c3730e757cbb3ade8840c09f7d3,
but we have to support it for our old integration tests (T363031).

Improve code comments to explain this and move the handling to
one place, so that it can be deleted more easily in the future.

Follow-up to 08f61b2609.

Change-Id: I5ab9d3373a6911c1456c30d844b66576b278a1b5
2024-04-20 00:16:45 +00:00
Bartosz Dziewoński 7445294b3b Remove the "offset" from getHeadlineNodeAndOffset()
Since c6cd20f682 the offset is always 0.

Change-Id: I9c1c8230f897d8bb287ca47056f5fa9fb187d060
2024-04-20 00:34:32 +02:00
Umherirrender 6c005d293f Add explicit parentheses around mixed boolean operator
Mixing different binary boolean operators within an expression
without using parentheses to clarify precedence is not allowed (T358966)

Change-Id: I9e9c9531ab0fa373606b19a5865cec748a3f36ff
2024-03-23 00:52:30 +01:00
Bartosz Dziewoński 69e8e948b2 Remove now redundant PHPDoc blocks
MediaWiki's PHPCS plugin requires documentation comments on all
methods, unless those methods are fully typed (all parameters and
return value).

It turns out that almost all of our methods are fully typed already.

Procedure:

1. Find: \*(\s*\*\s*(@param \??[\w\\]+(\|null)? &?\$\w+|@return \??[\w\\]+(\|null)?)\n)+\s*\*/
   Replace with: */
   This deletes type annotations, except those not representable
   as PHP type hints such as union types `a|b` or typed arrays `a[]`,
   or those with documentation beyond type hints, or those on
   functions with any other annotations.

2. Find: /\*\*/\n\s*
   Replace with nothing
   This deletes the remaining comments on methods that had no prose
   documentation.

3. Undo all changes that PHPCS complains about (those comments
   were not redundant)

4. Review the diff carefully, these regexps are imprecise :)
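
Steps 1 and 2 can be tried out on small samples before running them
over a whole tree. A rough sketch in Python, whose `re` module accepts
both patterns unchanged; the function name and the sample PHP snippets
are invented for illustration, and steps 3 and 4 (PHPCS and manual
review) are still needed afterwards:

```python
import re

# Step 1 pattern, copied verbatim from the procedure above.
TYPE_ONLY_TAGS = re.compile(
    r"\*(\s*\*\s*(@param \??[\w\\]+(\|null)? &?\$\w+|@return \??[\w\\]+(\|null)?)\n)+\s*\*/"
)
# Step 2 pattern: a doc comment that has become completely empty.
EMPTY_DOCBLOCK = re.compile(r"/\*\*/\n\s*")


def strip_redundant_phpdoc(source: str) -> str:
    """Apply step 1, then step 2, to a string of PHP source."""
    source = TYPE_ONLY_TAGS.sub("*/", source)
    return EMPTY_DOCBLOCK.sub("", source)


# Hypothetical fully typed method: the comment only repeats the types.
redundant = "\n".join([
    "\t/**",
    "\t * @param string $name",
    "\t * @return bool",
    "\t */",
    "\tpublic function has( string $name ): bool {",
])

# Hypothetical method with a union type, which step 1 must leave alone.
union_typed = "\n".join([
    "\t/**",
    "\t * @param string|int $key",
    "\t */",
    "\tpublic function get( $key ) {",
])

print(strip_redundant_phpdoc(redundant))
print(strip_redundant_phpdoc(union_typed))
```

The first sample loses its whole comment, while the second is returned
unchanged because `string|int` does not fit the type pattern.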

Change-Id: Ic82e8b23f2996f44951208dbd9cfb4c8e0738dac
2024-03-10 23:53:04 +00:00
Bartosz Dziewoński b16dd9dd96 Update PHPCS overrides
There's now a different rule for the same thing in
mediawiki/mediawiki-codesniffer v43.0.0. Also document the reason for
the override. Follow-up to 8b00546749.

Change-Id: I392ee10639ffda6de55b091555e8c3cadd2af485
2024-03-10 22:59:05 +01:00
Umherirrender 8b00546749 build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0
Change-Id: I889efe00ac06fa857fc3ae063193368927bcff7a
2024-03-10 17:36:18 +01:00
Bartosz Dziewoński 6419c19d1e Fix PHP getTitleFromUrl() when $wgArticlePath is '/$1'
In this case, the generated regexp would match the '/local' part in
the generated URL. Prefixing 'https://local' is no longer necessary
since 10899af666.

Add tests for this, and some tests to cover T261711 as well.

Bug: T358321
Change-Id: Idf54deba13f30b799b7b8d17de1897bc90f95701
2024-02-24 02:03:17 +01:00
Bartosz Dziewoński 08f61b2609 Support for core section heading formatting in post-cache transform
We already supported plain headings without the 'mw-headline'
wrappers, but now we need to parse the 'id' from a different
attribute.

Needed-By: If04d72f427ec3c3730e757cbb3ade8840c09f7d3
Bug: T357723
Change-Id: If85f89c40834618f23dc0ace2e599efb3b6d5ed4
2024-02-16 20:28:53 +00:00
Umherirrender 64bcb583e9 Use namespaced classes
Done automatically via script
Change to extension.json done manually

Change-Id: Ied7bbddd357290ac6be6bf480be0ee9116e77365
2023-12-11 16:38:02 +01:00
thiemowmde bbe5bed02d Optimize performance of very hot code paths in CommentUtils
I was curious why running some of the PHPUnit tests in this code base
takes so long. While I could not spot an obvious bottleneck I found
a lot of code that is extremely hot, e.g. called a hundred thousand
times. A few obvious optimizations are possible in this code, e.g.
not calling the surprisingly expensive DOMCompat::getClassList
multiple times.

Change-Id: If22bbc1aedd2c36db1ab2343de5737009050b7bb
2023-10-30 17:29:32 +01:00
Bartosz Dziewoński 781a33357b Use type hints for properties, remove PHPCS overrides
MediaWiki's PHPCS plugin requires documentation comments on all
properties, unless those properties are typed.

This has the potential to introduce bugs – in particular, because typed
properties without a default value will throw an exception if their
value is accessed before it's defined, while previously they defaulted
to null. I fixed this when I found it (making them nullable and null
by default), but I may have missed some cases.

Change-Id: If5b1f4d542ce3e1b69327ee4283f7c3e133a62a0
2023-10-19 19:31:02 +00:00
Bartosz Dziewoński f1edc47050 Support ignoring "mw-notalk" before/after/between comments
Since 92f5cfd8 we support "mw-notalk" to suppress comment detection
in pages or sections.

Until now, it only worked when the comment timestamp was surrounded by
a marked element. However, when a marked element was directly adjacent
to a comment, it would sometimes become a part of the comment range.
This can no longer happen now.

Existing use cases for this were the {{outdent}} and {{tracked}}
templates, which we handle specially since 50ad5bb2 and ddd391b6.
It's a bit ugly to hardcode specific templates like that, and this
provides a better solution for the future. The added test case
displays some other potential uses.

Bug: T324132
Change-Id: I7ffd299ef5957b35da8d01f9a0ed5a7a9a78be83
2023-10-07 00:32:27 +00:00
Umherirrender fd0de6b09a Use namespaced Title
Bug: T321681
Change-Id: I66a498679d0743b7740887c636eca001efc170cd
2023-08-19 20:16:15 +02:00
Ed Sanders e389fc48f1 Always use === in PHP
Change-Id: I30ca7cdf73921dcae48997841816099972cdbed0
2023-07-26 14:29:40 +01:00
Ed Sanders dda86f8ebf Always use the strict equality flag when using in_array
Change-Id: Ia09f5aadc3bbf64645ba174f047e53db49e07925
2023-06-06 13:08:00 +01:00
thiemowmde 8bbbf39bbd Make use of named MainConfigNames::… constants
Also merge setMwGlobals() calls because they are really expensive.

Also utilize the more readable str_contains() and related.

Change-Id: Iebde6aa17c2e366f0c0a98fe13a454f6a06c299b
2023-05-19 12:12:32 +02:00
Ed Sanders 4367595bfd EventDispatcher: Generate dt-added-topic events
Change-Id: I98b67e016995866558274809743fa21ed23ee063
2023-03-20 14:41:21 +00:00
Bartosz Dziewoński 3624d89c8b Don't add custom attributes in unwrapParsoidSections()
This code was unnecessarily copied from VE. It's not needed for
anything in this extension, and it causes the headings to be treated
as modified by selser, which in turn causes dirty diffs.

Bug: T328268
Change-Id: Ibdbed430f2ff28d0ea2e67644075c1621d9fae53
2023-01-31 01:32:02 +01:00
Bartosz Dziewoński 3a9997d6ea Improve handling for comment separators
* Detect comment separators at the end of comments too
* Consider TemplateStyles associated with ignored templates

This unexpectedly improves a lot of cases other than T313097 too,
mostly where <br> or {{outdent}} was used within a paragraph:
splitting comments that were previously jumbled together, or restoring
content that was previously ignored for apps / notifications.

Bug: T313097
Change-Id: I9b2ef6b760f2ffd97141ad7000f70919aeab7803
2023-01-10 01:59:52 +00:00
Bartosz Dziewoński be012ced04 Only match article path until first '?' when parsing links
Bug: T324028
Change-Id: I7aca1a8f20695b9ecd3f63f2d0a3f5684616655e
2022-11-29 17:16:03 +00:00
Bartosz Dziewoński 433e57394c Use PHP 7.4 property types
Change-Id: I788db64f0c0c00894d77256b7f016d44eda4bbb1
2022-10-28 21:56:38 +02:00
jenkins-bot c9dadbfe7d Merge "Remove support for <span class="mw-headline-number"> in headings" 2022-10-26 16:29:41 +00:00
Bartosz Dziewoński c6cd20f682 Remove support for <span class="mw-headline-number"> in headings
This feature has been removed from MediaWiki in change
Ic9ed88f419419cf4cc5cc32010539eea8b76314b.

Change-Id: If11b33589f47eab614f5129b38e80d0f3cafa083
2022-10-25 18:59:05 +00:00
Bartosz Dziewoński 8664de52d1 Don't insert comment markers inside <figure>
…when wgParserEnableLegacyMediaDOM=false. See task for details.

Bug: T320285
Change-Id: I397cb70f915bb8d974fe2796198d252b1be9a368
2022-10-08 23:23:54 +02:00
Bartosz Dziewoński ddd391b6db Ignore "tracked" templates at the beginning of comments
This improves the behavior when replying to these comments
and the message snippets shown in notifications.

Bug: T313097
Change-Id: Ia10400472c9e999fa526c7437a03b72461c37b74
2022-07-31 03:56:36 +02:00
David Lynch ec0e2920ae API ThreadItemsHTML: improve generation of othercontent
Othercontent would often contain the opening tag of the next heading /
section. By looking for the closest node with a previousSibling we can
more-reliably escape the heading.

Also, only add the initial placeholder if there's content before the
first heading. We do this by testing for any siblings before the
startContainer of the first heading -- if there are any, assume this
means there's some sort of content. (This can still result in a
placeholder with `othercontent:""` if there's only whitespace before
the first heading.)

Bug: T313850
Change-Id: I080205b74413c46d3cf3442e79276145aaa9439c
2022-07-28 02:51:18 -05:00
Bartosz Dziewoński 880f9755e0 Separate ContentThreadItem and DatabaseThreadItem etc.
Rename ThreadItem to ContentThreadItem, then create a new ThreadItem
interface containing only the methods that we'll be able to implement
using only the persistently stored data (no parsing), then create a
DatabaseThreadItem. Do the same for CommentItem and HeadingItem.

ThreadItemSet gets a similar treatment, but it's basically only for
Phan's type checking. (This is sad.)

Change-Id: I1633049befe8ec169753b82eb876459af1f63fe8
2022-07-04 23:35:50 +02:00
Ed Sanders 4accd2fc7e Add some missing typehints
Change-Id: Idb111dd907972d9e02dab4b26c3fc106b12b1035
2022-06-29 15:15:52 +00:00
Ed Sanders af54bae2ec Prefer late static binding over self::
While in many cases the class will never be sub-classed, it's easier
just to always use static:: and not worry about predicting which
classes might have problems in the future.

Change-Id: I23072a1701b5acf62bb3379a877de97627d8fcf3
2022-06-09 15:12:48 +01:00
jenkins-bot 35b3fd2fc0 Merge "CommentParser: Replace uses of Title with TitleValue" 2022-03-23 01:14:16 +00:00
Bartosz Dziewoński c5375e05b9 CommentUtils: Fix isSingleCommentSignedBy() with empty heading
Change the order of checks to ensure that we have at least one comment
before we try comparing ranges, to avoid issues with empty headings
having collapsed ranges. It should be a tiny bit faster this way, too.

Bug: T304377
Change-Id: I59ad30cfc075dcec882e048d2d199744efec2114
2022-03-22 00:12:42 +01:00
Bartosz Dziewoński c7723baf72 CommentParser: Replace uses of Title with TitleValue
Another small step towards removing the reliance on global state.

Change-Id: Ifb4a5bcbef6606d02f1c7aa7385d72822cb0bad0
2022-03-18 18:24:34 +00:00
Bartosz Dziewoński 01b253c5b6 Don't allow the root node to be treated like a comment frame
Also fix a bug where headings would be ignored while checking for
comment frames. See task for detailed explanation.

Bug: T303396
Change-Id: I6495826b4b050ea80680e0798ac6ab4497a7c09e
2022-03-10 17:45:08 +00:00
jenkins-bot dd24b0edcd Merge "Improve handling for comments after fake headings using wikitext ;" 2022-03-10 16:21:18 +00:00
Bartosz Dziewoński 08c79142fb ImmutableRange: Add @property annotations for magic props
Phan can analyze them now and reports some issues with types.

* Add some assertions on types where we're sure that we're using an
  Element or non-null, but Phan can't prove it
* Fix incorrect type hints on getFullyCoveredSiblings() and
  getCoveredSiblings(), luckily it was harmless

Change-Id: I8cc12450378efa7434c4d66882378b715edd4a70
2022-03-08 23:29:40 +00:00
Bartosz Dziewoński 0e576216b2 CommentUtils: Fix confusing types in getIndentLevel()
Change-Id: I548cf4ad54e92c22da64caf53ee028a906cd3b62
2022-03-08 23:29:15 +00:00
Bartosz Dziewoński 584f6a020c Use tagName rather than nodeName when we know the node is an element
`tagName` is only defined on Element, and it returns its tag name.

`nodeName` is defined on Node, and it returns the tag name for Elements,
and a string like '#text' or '#document-fragment' for other types.

We were using both, which made it harder to reason about what types
we're dealing with.
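
The same split between the two properties exists in other DOM
implementations. A small sketch using Python's xml.dom.minidom as a
stand-in for the PHP DOM classes (the `describe` helper and the sample
markup are made up for illustration):

```python
from xml.dom import minidom


def describe(node: minidom.Node) -> str:
    # Only elements have a tagName; every node has a nodeName.
    if isinstance(node, minidom.Element):
        return "element <" + node.tagName + ">"
    return "non-element " + node.nodeName


doc = minidom.parseString("<p>Hello <b>world</b></p>")
paragraph = doc.documentElement   # the <p> element
text_node = paragraph.firstChild  # the text "Hello "
fragment = doc.createDocumentFragment()

for node in (paragraph, text_node, fragment):
    print(describe(node))
```

Checking the node type first and then reading `tagName` makes it
explicit that the code path only ever handles elements.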

Change-Id: I8e621e5872bdf78c84ec553cfbfcdbf0192f0589
2022-03-08 23:29:05 +00:00
Bartosz Dziewoński 063174e71c Use instanceof for checking for text/element nodes in PHP
It is friendlier for static analysis tools like Phan, which can't
infer anything from the `->nodeType === …` checks, and we were already
using it in most places.

Fix newly revealed Phan failures (and one unneeded suppression).

Change-Id: Id789f05e16a210f7ba22ca7514587c392fac0741
2022-03-08 23:28:39 +00:00
Bartosz Dziewoński b2ee19b441 Remove check for CDATA nodes
Added in 76289cdf73,
should no longer be needed since we switched to Parsoid's
HTML parser in 3e6ab2c4d2.

Change-Id: Ic0b7ed8089b71f2338e604f68d547759e069f0b2
2022-03-04 22:14:41 +01:00
jenkins-bot 3c91a800ed Merge "Improve detecting already signed comments" 2022-03-02 14:14:13 +00:00
jenkins-bot e4fa34f025 Merge "Don't insert comment markers inside replaced elements (like <video>)" 2022-02-28 17:16:11 +00:00
Bartosz Dziewoński 1e3ce9c88a Don't insert comment markers inside replaced elements (like <video>)
Also special-case thumbnail wrappers generated by
MediaTransformOutput::linkWrap, for compatibility with
TimedMediaHandler.

Bug: T301427
Bug: T302296
Change-Id: I7f48d8b2261507c5a33526c54109f5187d062ed3
2022-02-22 15:11:34 +00:00
Bartosz Dziewoński 0ecc8a4c05 Improve detecting already signed comments
Previously, we required a signature at the end of the comment.
This was a pretty rough heuristic that did not correctly handle
many comments that we would consider entirely properly signed
in CommentParser (e.g. comments wrapped in formatting like
<small>…</small>, comments with a post-scriptum or in parentheses,
or comments generated by various templates).

Now we process the user input using the same code that adds reply
links, and only add a signature when we detect that there really
isn't a signature (including template-generated), or if the signature
is in the wrong place and would result in the reply link showing up
in the wrong place as well (not at the end of the comment).

Bug: T278442
Bug: T268558
Bug: T278355
Bug: T291421
Bug: T282983
Change-Id: I46b6110af328ebdf93b7dfc2bd941e04391a1599
2022-02-21 21:21:26 +00:00
Bartosz Dziewoński aea36bab3a CommentParser: Fix a small use of global state
Also, in ThreadItem::getSinglePageTransclusionTitle(), we don't need
this terribly complicated method.

Change-Id: If02c09aaa2f4dd66b2bc253a1edec4ea107564ee
2022-02-21 18:15:31 +00:00
Bartosz Dziewoński e414d1acaf Improve handling for comments after fake headings using wikitext ;
Bug: T265964
Change-Id: I77db68928c5426fd885a277eec52c6e164d559bb
2022-02-11 23:35:32 +00:00