Commit graph

9 commits

Author SHA1 Message Date
Ed Sanders b78fb3f4c1 Move all PHP to the MediaWiki\Extension\DiscussionTools namespace
Change-Id: I654ebb3e646a6d8d62f7bd14d48805e39f836d7e
2020-05-15 21:57:13 +02:00
Ed Sanders 340572bc05 Create a Utils class in PHP
Also move htmlTrim to utils in JS.

Change-Id: Ia5356d713c1c5d521c396cc28bcd4ecc7ee5bbbb
2020-05-15 00:25:32 +01:00
Ed Sanders a3889fd400 Port modifier.js to PHP
Change-Id: I03b9e4377cb3ce6a5ca9d06e49dca9b2516f4979
2020-05-15 00:20:41 +01:00
Bartosz Dziewoński 6f32369b6a tests: Fix comparing PHP and JS ranges
In JS, strings are internally encoded as UTF-16, and properties like
.length return values in UTF-16 code units.

In PHP, strings are internally encoded as UTF-8, and we have the
option of using methods that return bytes like strlen() or UTF-8 code
units like mb_strlen().

However, the values produced by preg_match( …, PREG_OFFSET_CAPTURE )
are in bytes, and there's nothing we can do about that. So let's use
bytes throughout, mixing the two types results in meaningless numbers.

Then in the test code, we have to calculate UTF-16 code units offsets
based on the UTF-8 byte offsets.

We also have to copy the entire workaround for mw:Entity nodes… Maybe
the parser should be fixed to return the real nodes for ranges' ends
in this case.

Change-Id: I05804489d7de0d60be6e9f84e6a49a885e9fb870
2020-05-14 22:37:34 +00:00
Bartosz Dziewoński 76289cdf73 tests: Fix failures due to CDATA handling in PHP
It appears PHP's DOM library always uses CDATA nodes for the contents of
<style> tags, even if there is no such markup in the source HTML.

Change-Id: Id04b27086c5e7a0b016a3a440b2b4895d6b13c93
2020-05-14 22:37:23 +00:00
Bartosz Dziewoński 33d69e26c9 tests: Fix different whitespace trimming in PHP and JS
Notably, JS trims the no-break space, while PHP doesn't. There are
some other differences that don't come up in our tests. What we really
want is to trim the ASCII whitespace as defined in the HTML spec.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim
https://www.php.net/manual/en/function.trim.php
https://infra.spec.whatwg.org/#ascii-whitespace

Change-Id: I95b8fb38878716a2fa7ec84c9f2e8065ebe77c0d
2020-05-14 21:37:26 +00:00
Ed Sanders 076c6d828a Parser: Use Element instead of Node when appropriate
Change-Id: I67015989954e61810302bf8931d05419f007ffeb
2020-05-13 14:46:36 +01:00
Bartosz Dziewoński b8d7a75c34 Fix performance of DiscussionToolsCommentParser::childIndexOf()
Profiling reveals that >87% of the run time of our test suite is spent
in this tiny method. Apparently, DOMNodeList::item() is extremely slow
(possibly it's linear time instead of constant time?).

Profiled using XDebug and KCacheGrind:
https://phabricator.wikimedia.org/F31815264

We can calculate the child's index in its parent by counting its
precending siblings instead, which turns out to be much faster.

Before:
 1. 275444ms to run DiscussionToolsCommentParserTest:testGetComments with data set #2
 2. 12668ms to run DiscussionToolsCommentParserTest:testGetComments with data set #3
 ...
After:
 1. 9545ms to run DiscussionToolsCommentParserTest:testGetComments with data set #2
 2. 5549ms to run DiscussionToolsCommentParserTest:testGetComments with data set #3
 ...

That's still kind of slow but now it's bearable to run the test suite.

Change-Id: I49155f7aa2e231a9a20bf282cf6aaa28fc902e0b
2020-05-13 02:56:39 +02:00
Roan Kattouw 7b7a2cd69c The Great Parser JS to PHP port of 2020!*
* Not to be confused with the Parsing Team's
"Great Parser JS to PHP port of 2019"

Gasp as OR hacks are changed to null coalescing operators.
Applaud as variable declarations are dropped.
Cheer as parameters and return values are type-hinted.
Shudder as DomNodeLists have no indexOf method.

Moving discussion parsing to the server should allow
us to implement much cleaner APIs for commenting.

Bug: T252252
Co-authored-by: Ed Sanders <esanders@wikimedia.org>
Change-Id: Ic1438d516e223db462cb227f6668e856672f538c
2020-05-12 12:33:04 +01:00