Commit graph

29 commits

Author SHA1 Message Date
thiemowmde e739e6c4df Rewrite ImmutableRange::findCommonAncestorContainer for speed
While this method is not a huge bottleneck in this codebase it still
sticks out because it calls end() and array_pop() literally millions
of times. (Tested by running the unit test suite, which currently
takes about 30 seconds on my machine.)

Because of the way the method is used in this codebase (see especially
ImmutableRange::computePosition) $a is almost always a sub-element of
$b, or the other way around. It's almost never necessary to go all the
way up to the root element. We can use this additional knowledge and
stop much, much earlier. The extra code is worth it because we know it
will succeed very, very often.

When I measure the runtime of this method alone it goes down to less
than 1% of the previous runtime. The final loop at the end of the
method is almost never executed now (about 30 times in 15,000).

I also micro-benchmarked the final loop and optimized it to work with
passive array-indexes instead of actively manipulating the array with
array_pop().

Change-Id: Iffcaa8848780a85fde38e322649050c687567f29
2023-11-02 13:57:23 +01:00
Bartosz Dziewoński 781a33357b Use type hints for properties, remove PHPCS overrides
MediaWiki's PHPCS plugin requires documentation comments on all
properties, unless those properties are typed.

This has potential to introduce bugs – in particular, because typed
properties without a default value will throw an exception if their
value is accessed before it's defined, while previously they defaulted
to null. I fixed this when I found it (making them nullable and null
by default), but I may have missed some cases.

Change-Id: If5b1f4d542ce3e1b69327ee4283f7c3e133a62a0
2023-10-19 19:31:02 +00:00
Bartosz Dziewoński af68c835bb Update exception handling for new code conventions
Change code to match the documented consensus formed on T321683:
https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP#Exception_handling

* Do not directly throw Exception, Error or MWException
* Document checked exceptions with @throws
* Do not document unchecked exceptions

For this extension, I think it makes sense to consider DOMException an
unchecked exception too (in addition to the usual LogicException and
RuntimeException).

Depends-On: Id07e301c3f20afa135e5469ee234a27354485652
Depends-On: I869af06896b9757af18488b916211c5a41a8c563
Depends-On: I42d9b7465d1406a22ef1b3f6d8de426c60c90e2c
Change-Id: Ic9d9efd031a87fa5a93143f714f0adb20f0dd956
2023-01-22 18:17:11 +00:00
Ed Sanders e57cf6c60f ImmutableRange: Add surroundContents method
Also adds extractContents method.

Change-Id: Ie81dab12007b4038ebaaae904a1c032f80ef43cb
2022-10-04 16:51:08 +02:00
Ed Sanders c3efa2fd89 ImmutableRange: Fix setStart/setEnd to avoid backwards range
The new behaviour better matches the DOM spec.

Change-Id: Ib795e99cdaeedfb1d9e03eb888dd0a243028a61e
2022-07-11 21:40:09 +00:00
Ed Sanders af54bae2ec Prefer late static binding over self::
While in many cases the class will never be sub-classed, it's easier
just to always use static:: and not worry about predicting which
classes might have problems in the future.

Change-Id: I23072a1701b5acf62bb3379a877de97627d8fcf3
2022-06-09 15:12:48 +01:00
Bartosz Dziewoński 05996d71fe ImmutableRange: Fix Phan suppression
Change-Id: I9549ab3113e3255375e69e3337f3a4aadfa9b21e
2022-03-18 23:27:58 +00:00
Bartosz Dziewoński 08c79142fb ImmutableRange: Add @property annotations for magic props
Phan can analyze them now and reports some issues with types.

* Add some assertions on types where we're sure that we're using an
  Element or non-null, but Phan can't prove it
* Fix incorrect type hints on getFullyCoveredSiblings() and
  getCoveredSiblings(), luckily it was harmless

Change-Id: I8cc12450378efa7434c4d66882378b715edd4a70
2022-03-08 23:29:40 +00:00
Bartosz Dziewoński ae9f26a9e5 Various code quality tweaks
(suggested by PhpStorm)

composer.json:
* Document required PHP extensions

Parser.js:
* Remove incorrect param documentation
* Fix some typos in comments (missing parentheses)

CommentParser.php:
* Fix some typos in comments (missing parentheses)

ImmutableRange.php:
* Remove unused property
* Add a `throw` to indicate that code path is unreachable

SubscribedNewCommentPresentationModel.php:
* Add missing `return false`

CommentParserTest.php:
* Remove unnecessary pass-by-reference

CommentModifierTest.php:
* Remove unused variable

CommentParserTest.php:
* Don't construct Element objects directly. PHP's DOMElement allows
  it, but Parsoid/Dodo's doesn't, and we use the latter for static
  analysis. This generates all kinds of confusing warnings.

Change-Id: Ia9598ebea0e99830dd485296e94a9d96acc4b258
2022-02-19 19:36:52 +01:00
Bartosz Dziewoński 12aa136f54 Add Range::compareBoundaryPoints
Change-Id: I7b8b9f196bb1ed95304d8b7755d51246c4515bc9
2021-12-17 00:55:00 +00:00
Ed Sanders ff46f37b56 ImmutableRange: Port over changes from upstream
* Explode boundary point tuples passed to computePosition
* Note that we won't use previousSibling instead of array_reverse
* Simplify logic using xor

Change-Id: I927256e31b5e441aade91b4fd0d83d8f0d89afbe
2021-10-21 12:42:27 +01:00
Bartosz Dziewoński c1f4668806 Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes
The PHP DOM extension measures lengths and offsets in Unicode codepoints.
Our PHP code used UTF-8 bytes, causing some offsets to be slightly off.
Now it mostly uses Unicode codepoints as well (we're forced to use bytes
in a few places, because preg_match returns offsets in bytes).

In practice, this had no visible effect to the user. It caused the
markers `<span data-mw-comment-end="..."></span>` to be placed at
the end of their container instead of the correct position when the
timestamp contained multibyte characters (e.g. "ź" in Polish); but
the correct position is usually at the end of the container anyway.

In the test cases, the only difference is placing these markers before
a trailing line break inside `<p>...</p>` tags rather than before it.

The patch also accidentally fixes another bug, where element nodes
with no children (mostly <img>) were incorrectly excluded when calling
cloneContents(), because they were treated as if they were text nodes.

Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee
2021-09-27 19:04:16 +00:00
Bartosz Dziewoński 8b73612585 ImmutableRange: Remove Phan suppression
We can just use insertBefore() normally. There was never a PHP bug,
but rather a Phan bug, and it no longer affects us.
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/596813/70/includes/ImmutableRange.php#373

This reveals a bug in CommentParser where it sometimes produces
incorrect ranges (we were incorrectly treating `false` like `null`,
hiding the issue). I'll fix it in a separate commit.

Follow-up to 25272e7a4a.

Change-Id: I4afba38f1d82ddbf8732bfe3e4d4f6ebe2f8de5d
2021-08-02 17:51:26 +02:00
C. Scott Ananian 25272e7a4a Don't refer directly to PHP dom extension classes; avoid nonstandard behavior
These changes ensure that DiscussionTools is independent of DOM
library choice, and will not break if/when Parsoid switches to an
alternate (more standards-compliant) DOM library.

We run `phan` against the Dodo standards-compliant DOM library,
so this ends up flagging uses of non-standard PHP extensions to
the DOM.  These will be suppressed for now with a "Nonstandard DOM"
comment that can be grepped for, since they will eventually
will need to be rewritten or worked around.

Most frequent issues:

* Node::nodeValue and Node::textContent and Element::getAttribute()
can return null in a spec-compliant implementation.  Add `?? ''` to
make spec-compliant results consistent w/ what PHP returns.

* DOMXPath doesn't accept anything except DOMDocument.  These uses
should be replaced with DOMCompat::querySelectorAll() or similar
(which end up using DOMXPath under the covers for DOMDocument any way,
but are implemented more efficiently in a spec-compliant
implementation).

* A couple of times we have code like:
  `while ($node->firstChild!==null) { $node = $node->firstChild; }`
and phan's analysis isn't strong enough to determine that $node is still
non-null after the while.  This same issue should appear with DOMDocument
but phan doesn't complain for some reason.

One apparently legit issue:

* Node::insertBefore() is once called in a funny way which leans on
the fact that the second option is optional in PHP.  This seems to be
a workaround for an ancient PHP bug, and can probably be safely
removed.

Bug: T287611
Bug: T217867
Change-Id: I3c4f41c3819770f85d68157c9f690d650b7266a3
2021-07-30 18:15:40 -04:00
libraryupgrader b0884b177c build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)
* glob-parent: 5.1.1 → 5.1.2
  * https://npmjs.com/advisories/1751 (CVE-2020-28469)
* trim-newlines: 3.0.0 → 3.0.1
  * https://npmjs.com/advisories/1753 (CVE-2021-33623)

Change-Id: I7a71e23da561599da417db3b3077b78d91173bbc
2021-07-22 16:29:04 +00:00
Ed Sanders 7d349808c2 ImmutableRange: Guard against appending empty fragments
This triggers a PHP warning.

Change-Id: I5ccb204287d55b38fadcef8cc846400a277e8491
2020-11-16 19:22:26 +00:00
Ed Sanders b03165fcce Compare node positions using upward traversal
A TreeWalker ends up walking potentially every single subsequent
node in the document looking for a target node. Instead use upwards
traversal to find a common ancestor, then sibling traversal to
compare document order.

This makes calling cloneContents on every comment on a 300k talk page
significantly faster, going from >30s to 500ms locally.

Change-Id: I28a2b8c11d4098d9bc44d19b98e19ccc02273098
2020-11-16 19:22:10 +00:00
jenkins-bot 5a5b2e61f1 Merge "ImmutableRange: Avoid doing expensive TreeWalker computation twice" 2020-11-15 14:43:01 +00:00
jenkins-bot 5937226f5f Merge "ImmutableRange: Skip redundant calls to isFullyContainedNode()" 2020-11-15 14:42:50 +00:00
Thiemo Kreuz 8ffe0d55da Remove comments that literally repeat what the code says
Change-Id: Ib928cf61dc512fbbf39a3279789376d635a82c52
2020-11-11 09:31:59 +01:00
Bartosz Dziewoński 6f2ada2bb4 ImmutableRange: Avoid doing expensive TreeWalker computation twice
If A follows B, then we can assume that B does not follow A.
Calling the function recursively computes that twice,
we can instead make some simple changes to "invert" the result.

Change-Id: I709aca7cb997dd2fe3980468a8c6bde6f366fb5b
2020-10-22 23:39:14 +02:00
Bartosz Dziewoński a29cecdf73 ImmutableRange: Skip redundant calls to isFullyContainedNode()
It's an expensive method, and we previously called it for
every child of the common ancestor, completely unnecessarily.

These changes follow from two observations:
* If there is a $firstPartiallyContainedChild, then the
  first fully contained child must follow it; similarly,
  if there is a $lastPartiallyContainedChild, then the
  last fully contained child must precede it.
* All nodes between the first and last fully contained
  children are also fully contained.

Maybe it can be made cleverer still, but it's a lot better.

Change-Id: I4e596c62274c2c0be115f0ddec42629115b430a4
2020-10-22 23:31:21 +02:00
Bartosz Dziewoński 1ad6389292 ImmutableRange: Optimize parent check in computePosition()
We can check whether a node is a child of another node directly,
without iterating over all its children.

Change-Id: I3a26df89365bf765348d96b477c983ec9c4e43fe
2020-10-20 11:14:00 +02:00
Bartosz Dziewoński fe520ab175 ImmutableRange: Remove unused variable
The only use was removed in code review, but we missed this.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/596813/70..71/includes/ImmutableRange.php

Change-Id: Ia761a6a350711c32f0bd4c8af48c7c1a35a4afb4
2020-10-15 11:23:43 +00:00
Ed Sanders 64392af485 Add reply links on the server
Bug: T252555
Change-Id: I4e60fdbc098c1a74757d6e60fec6bcf8e5db37c1
2020-10-08 13:27:08 +01:00
Ed Sanders e329027bf3 Implement cloneContents on ImmutableRange
Change-Id: Ife6bdbe8d7af2da6662814459f0e8726482034f4
2020-09-12 13:19:35 +01:00
James D. Forrester d6c3df31f5 Remove various phan suppressions and fix issues
Change-Id: I73b535f284566a0a8876a3198b9784b47567fac6
2020-06-12 20:35:59 +01:00
Umherirrender 48e860916a build: Add mediawiki/mediawiki-phan-config
Replace phan-taint-check-plugin by phan, it is now included

Change-Id: I0e682a83afd30faa8967e3c586431be4ae9a29b3
2020-06-10 22:21:07 +02:00
Ed Sanders b3ca37c1c5 Create ImmutableRange class in PHP
TODO: Create one in JS as well

Change-Id: I6c9dc2455afcb8d0b68674a2985c5e43dd94b6fb
2020-05-22 15:01:09 +01:00