Commit graph

50 commits

Author SHA1 Message Date
Ed Sanders d5376e28fc Improve ThreadItem documentation
Change-Id: Ia266fc22b02af0edbb32f356b4e0d92fe3a4da5f
2020-06-26 14:56:19 +02:00
James D. Forrester d6c3df31f5 Remove various phan suppressions and fix issues
Change-Id: I73b535f284566a0a8876a3198b9784b47567fac6
2020-06-12 20:35:59 +01:00
Ed Sanders 7be0cc3209 Create ThreadItem classes
Change-Id: Id2c5324d74eccb1209ccb76768c557722c6d9400
2020-06-12 20:35:59 +01:00
Ed Sanders da433037a3 Move getTranscludedFromElement to Utils
Change-Id: I8bdd757f949c013ba426150a192d71243fadf45d
2020-06-01 22:32:23 +01:00
Bartosz Dziewoński 79ae8a32c5 Support parsing when timestamp is wrapped in a link
Bug: T252059
Change-Id: Ib8952fb80503bad407e8d0fe725103a0fae12a6a
2020-05-27 22:47:17 +02:00
Bartosz Dziewoński 01b4a8f4f4 Support replying when timestamp is template-generated
* Move modifier#getFullyCoveredWrapper to utils
* Use that method to find the node where we start searching for
  template wrappers, rather than using endContainer

Bug: T252058
Change-Id: I55de58102f3468fce01290bd413a7fdc96d322d6
2020-05-27 21:16:03 +02:00
Bartosz Dziewoński 8f583a23be Use the faster childIndexOf() approach in JS too
I was wondering if the different approach to childIndexOf()
implemented in PHP in b8d7a75c34 would
be faster in JS as well, and yes, it is.

Our test suite now takes (on my machine):
* Chrome: 8337 ms → 7355 ms (average over 5 tries)
* Firefox: 5321 ms → 5044 ms (average over 5 tries)

Change-Id: I71963eeb92dcea9bfd59cbf01a7aa0b7de5d9cf1
2020-05-22 19:33:35 +02:00
Bartosz Dziewoński 515af82061 Reduce duplication between PHP parser and data gen for JS parser
Also, make the handling of TranslateNumerals and digitsRegexp the same
between PHP and JS.

Change-Id: I1d81343d0b59ab3ecd59ba1c2ad99a729d983ac4
2020-05-19 20:54:44 +02:00
jenkins-bot 366aca2ccd Merge "Stop printing console warnings" 2020-05-18 22:52:30 +00:00
Bartosz Dziewoński 219339551c Stop printing console warnings
It was useful when I was debugging those parts of the code, but now
it's usually annoying.

The warnings can still sometimes be useful for understanding how the
tool parses some discussion, though. To keep that functionality, add
displaying warnings for each comment in the debug mode.

Change-Id: I2d218a8a394f179bcc0990ff988a0567c275ccf2
2020-05-18 23:37:37 +02:00
Ed Sanders 607440498e Spell check pass
Change-Id: Ia20da358297126bd52a968bd77c960f81fe82b8f
2020-05-18 21:24:14 +00:00
Bartosz Dziewoński c848d8a90e Parser tweaks
Follow-up to Ic1438d516e223db462cb227f6668e856672f538c.
Minor corrections and comment improvements in PHP parser,
and "backporting" some changes to JS parser that I like.

Change-Id: I5e54121914ec6b323e556dd133bcb71b3aefbb61
2020-05-18 19:53:26 +00:00
Ed Sanders b78fb3f4c1 Move all PHP to the MediaWiki\Extension\DiscussionTools namespace
Change-Id: I654ebb3e646a6d8d62f7bd14d48805e39f836d7e
2020-05-15 21:57:13 +02:00
Ed Sanders 340572bc05 Create a Utils class in PHP
Also move htmlTrim to utils in JS.

Change-Id: Ia5356d713c1c5d521c396cc28bcd4ecc7ee5bbbb
2020-05-15 00:25:32 +01:00
Bartosz Dziewoński 76289cdf73 tests: Fix failures due to CDATA handling in PHP
It appears PHP's DOM library always uses CDATA nodes for the contents of
<style> tags, even if there is no such markup in the source HTML.

Change-Id: Id04b27086c5e7a0b016a3a440b2b4895d6b13c93
2020-05-14 22:37:23 +00:00
Bartosz Dziewoński 33d69e26c9 tests: Fix different whitespace trimming in PHP and JS
Notably, JS trims the no-break space, while PHP doesn't. There are
some other differences that don't come up in our tests. What we really
want is to trim the ASCII whitespace as defined in the HTML spec.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim
https://www.php.net/manual/en/function.trim.php
https://infra.spec.whatwg.org/#ascii-whitespace

Change-Id: I95b8fb38878716a2fa7ec84c9f2e8065ebe77c0d
2020-05-14 21:37:26 +00:00
Ed Sanders 076c6d828a Parser: Use Element instead of Node when appropriate
Change-Id: I67015989954e61810302bf8931d05419f007ffeb
2020-05-13 14:46:36 +01:00
Ed Sanders 77f1be3bd1 parser.js: Fix some typos and variable names
Change-Id: Ib80fd87893f8fad47eaeac8edabe03472b78b75c
2020-05-08 20:22:49 +01:00
Ed Sanders 4f22e32304 Fix typo in parser.js 'dl' => 'dd'
Doesn't actually affect the behaviour as we are
computing indentation level.

Change-Id: I8b6824460ffbee1c0acb361c26503d3b6c439025
2020-05-08 14:24:15 +01:00
jenkins-bot f0f457562d Merge "Handle &nbsp; and other entities in the timestamp/timezone" 2020-05-07 19:44:12 +00:00
Ed Sanders 44fd8ec0ef Parser: Return mw.Title from getTitleFromUrl
Change-Id: I2c3ce50c5ff7486d6c1f3a0c26cb3a9f1c252b60
2020-05-05 21:34:46 +00:00
Bartosz Dziewoński 0269626adc Handle &nbsp; and other entities in the timestamp/timezone
Bug: T251838
Change-Id: Iba8d7c71e332c63229eec4bc7c80b10627135784
2020-05-05 22:38:16 +02:00
Ed Sanders 5da26e2d0b parser: Have getAuthors return a list
Change-Id: Ieda881a26f802a9e8c73822fcbcc70dd6035bbb6
2020-05-05 12:13:12 +01:00
Bartosz Dziewoński 4ad9aa8449 parser: Don't crash on links to invalid titles
Bug: T251045
Change-Id: I1a7b8b9b34d38ec4e3edc12687bdf99bb0e61bd8
2020-04-27 19:59:26 +02:00
Bartosz Dziewoński dab37fd7b4 parser: Make #getTranscludedFrom return page title in text form
It's more convenient for display or comparing it with other things.

Depends-On: I03bc455d5484a6c51f3fa2397c64936b829fe7e3
Change-Id: I88d7aa68977210b16860075ed52983a5e99ee0f7
2020-03-24 22:29:35 +00:00
Bartosz Dziewoński e9b583d1c3 parser: Improve merging multiple comments on one line
Now also works if the "follow-up" comment is wrapped in e.g. `<small>`.

Change-Id: Ic37cb6afdb42021f109a1818f5c4299d907ed094
2020-03-14 13:34:42 +00:00
Bartosz Dziewoński 04365c0188 Merge RL modules which are only loaded by 'ext.discussionTools.init'
Bug: T240474
Change-Id: I1b83aa18666be8f1ea6a3602b299f92574d42cb7
2020-03-14 14:33:23 +01:00
jenkins-bot 2e0c299a1f Merge "Fix signatureRanges overlapping for some comments" 2020-03-09 20:31:54 +00:00
Bartosz Dziewoński e3e4ef9de4 parser: Detect comments transcluded from another page
When trying to reply to a comment that is inside a transclusion,
detect if it's transcluded from a subpage or simply wrapped in a
template, and show appropriate error messages.

References:
* VisualEditor ve.dm.Converter#getAboutGroup()
* VisualEditor ve.dm.ModelRegistry#matchElement()
* Parsoid Linter#findEnclosingTemplateName()

Bug: T245694
Change-Id: If3dd1ebbf1d02ee4379c200019bfc3a8ec02325b
2020-03-09 20:28:56 +01:00
Bartosz Dziewoński b4029c3c58 Fix signatureRanges overlapping for some comments
If two signatures for a single comment were near each other,
we would sometimes treat them as one huge signature.

Change-Id: Ied4b3aa535a9ca6bebef8a004ae48b7d5a8f2f9b
2020-03-09 13:28:22 +00:00
Bartosz Dziewoński 0ca851aa92 parser: Return signature and timestamp ranges
Currently not used for anything. May be used later for editing
comments (T245225) or reformatting timestamps (T240360).

Note that a comment may have multiple signatures+timestamps,
and we return them all so that you have to deal with that.

Fix some unrelated incorrect documentation comments.

Bug: T245220
Change-Id: I51b8bf4a3bb7968f35e32c7e44c95c2ab079d9ac
2020-03-05 14:28:17 +01:00
Bartosz Dziewoński ea26009896 Work around mw.Uri crash on fallback encoding in links
Bug: T245889
Change-Id: I182f9ffa84a3b3cf4afafd536360572eda9d2714
2020-02-29 19:08:01 +01:00
Bartosz Dziewoński e9c401e3aa Ignore LRM and RLM before timezone indicator
They are not generated by MediaWiki, but they often appear when users
sign others' unsigned comments by copy-pasting the timestamp from the
history page.

Add test config data for nlwiki, exported by running this in the
browser console:

  copy(
    JSON.stringify( { wgArticlePath, wgNamespaceIds, wgFormattedNamespaces }, null, 2 ) + '\n' +
    JSON.stringify( mw.loader.moduleRegistry['ext.discussionTools.parser'].packageExports['data.json'], null, 2 )
  );

Bug: T245784
Change-Id: Icbcdc5a028e9ce2cb09173f87769e525ec3082fc
2020-02-25 00:20:00 +00:00
Bartosz Dziewoński ff0386239f Only detect comments with real signatures
Consequences of this are visible in the test cases:

* (en) Tech News posts are not detected.
  Examples: "21:22, 1 July 2019 (UTC)", "21:42, 29 July 2019 (UTC)"

* (en) Comments by users who customize the timestamp are not detected.
  Examples: "10:49, 28 June 2019 (UTC)", "21:34, 14 July 2019 (UTC)"

* (en) Comments with signatures missing a username are not detected.
  This sometimes happens if a comment is accidentally signed with
  '~~~~~' (five tildes), which only inserts the timestamp.
  Examples: "17:17, 27 July 2019 (UTC)", "10:25, 29 July 2019 (UTC)"

* (pl) A lone timestamp at the beginning of a thread is not detected.
  It's not part of a post, it was added to aid automatic archiving.
  Example: "21:03, 18 paź 2018 (CET)"

Bug: T245692
Change-Id: I0767bb239a1800f2e538917b5995fc4f0fa4d043
2020-02-21 01:30:54 +01:00
Bartosz Dziewoński 7761f62b42 Fix edit summary for comments in 0th section (no heading)
Bug: T245765
Change-Id: I9eb4726ef096b8d7459cc1409814514ec1dc89ae
2020-02-21 00:44:42 +01:00
Bartosz Dziewoński e5e6fdd3af Stop using native Range objects, they're too annoying
Native Range objects are automatically updated when the DOM elements
they refer to are affected (e.g. detached from the DOM, or their offset
changes because of siblings being added/removed).

This seemed harmless or maybe even slightly useful, but it turns out
it conflicts with VisualEditor, which has to wrap the entire page in a
new DOM node when it opens (and unwrap it when it closes), effectively
temporarily detaching it from the DOM, which destroys all our ranges.

Just use a plain object that stores the same data as a Range. And when
we need to use Range's API, we can simply construct a temporary one.

Bug: T241861
Change-Id: Iee64aa3d667877265ef8a59293c202e6478d7fb6
2020-02-05 19:42:03 +01:00
Bartosz Dziewoński e29b8173bf Handle comments before first section heading
The loop in parser.js assumed that there was always a heading before
any comments (not counting the page title, only section headings).

Bug: T243869
Change-Id: I3a0bb06716e75d4a17e25c40748673a071ee5f30
2020-01-30 00:14:46 -08:00
Bartosz Dziewoński 30fcfec1fd parser: Merge multiple comments on one line
Even when you have multiple signatures by multiple users in one
paragraph (or list item), it's still basically a single comment.
We don't want to offer multiple buttons to reply to it.

The changed parser test cases are illustrative:
* All affected comments in the "pl" example are comments with a
  "post-scriptum", which is now more intuitively treated as part of
  the main comment.
* The first comment in the "en" example would probably have been
  better if it wasn't merged, but a weird use of the outdent template
  causes us to not be able to distinguish that the two parts of the
  comment display on separate lines.
* The last comment in the "en" example (isn't that neat?) was previously
  incorrectly treated as two comments, because there's a timestamp in
  the middle of it (the user is referring to another comment).
* Remaining affected comments in the "en" example are also comments
  with a "post-scriptum" and their treatment is clearly better now.

It also accidentally fixes some problems with modifier tests (but not
all), where previously <dl> nodes would be inserted in the middle of
<p> nodes, to reply to the comments which are now merged.

Bug: T240640
Change-Id: I0f2d9238aff75d78286250affd323cd145661a11
2020-01-22 02:21:43 +01:00
Bartosz Dziewoński da668b72d5 Identify comments by username+timestamp+seq
Possible use cases:
* Matching comments between PHP and Parsoid HTML [implemented here]
* Finding the same comment in a different revision of a page
  (e.g. while resolving an edit conflict, or to allow resuming
  composition of autosaved comments) [implemented for highlighting
  user's own posted comment only]
* Permanent links to comments [future]

The reasoning for this form of ID is:
* _Timestamp_ by itself is a nearly unique identifier, so it's a good
  thing to start with
* Users may post multiple comments in one edit (or in many edits in
  one minute), so we need the _sequential number_ to distinguish them
* _Username_ is probably not required, but it may reduce the need
  for sequential numbers, and will help with human-readability if we
  add permanent links

The ID remains stable when a new comment is added anywhere by anyone
(excepts comments within the same minute by the same user), or when a
section is renamed.

It's not always stable when a comment is moved or when an entire
section is moved or deleted (archived), but you can't have everything.

Change-Id: Idaae6427d659d12b82e37f1791bd03833632c7c0
2019-12-09 13:45:31 +00:00
Bartosz Dziewoński 4021ca1642 Add unit tests for parser#getTimestampParser
Change-Id: I03cba04489194539d6ff3a32acdb9a8fe3d499e5
2019-10-30 00:13:56 +01:00
Bartosz Dziewoński fc34556b04 Fix parsing links to subpages in user signatures
Change-Id: I381087c252eeb7530e63c4d99cecc1b2ee041b0a
2019-10-30 00:13:56 +01:00
Bartosz Dziewoński c83201b10c Fix parsing non-standard case in links to user contribs
Change-Id: I2da72e2731019ad5be0ba33aa229ad914a7aaf10
2019-10-30 00:13:56 +01:00
Bartosz Dziewoński e8012b7094 Fix incorrectly detecting a section heading inside the table of contents
Change-Id: I7209b523c3322b3b379e6aa82a4b2014cc39c404
2019-10-30 00:13:56 +01:00
Bartosz Dziewoński 9efe8b1dd4 Add unit tests for parser#getTimestampRegexp
Depends-On: I6c3d186de1877f73d4a4e3fec7d6d632a5d5fa83
Change-Id: Icdb44f793a8f5e56666ec635bb8b0125041b5aab
2019-10-24 23:21:29 +02:00
Bartosz Dziewoński 97ce480767 Document methods in parser.js
Change-Id: I9272a619770f805f36686d722eebba586d2650e4
2019-10-22 14:38:53 +00:00
Bartosz Dziewoński 96af61bbc4 Minor naming and comment cleanup after re-reading the code
Change-Id: I5d0309329e56034697070ebadf551ac704323d5c
2019-10-22 14:38:46 +00:00
Bartosz Dziewoński db80e48933 Handle timestamps in daylight saving time
Add the Moment Timezone library. Add a script for managing libraries,
like in MediaWiki core.

Depends-On: I9a59a6ad01850b30327e4215f2be61b8d1c41277
Change-Id: I64bc79e7d0ccdf42b006e5a225c8aa70ea5f4e15
2019-10-22 16:33:21 +02:00
Bartosz Dziewoński 3dc5d79b20 Fix regexp for HTML heading tags
Change-Id: I91b8c2626e76d340da83b9e36a655c3f5158ac3c
2019-10-20 17:18:01 +00:00
Bartosz Dziewoński 282cf3c386 Escape regexp special characters in date formats
For example, the default date format for Japanese (ja) is
"Y年n月j日 (D) H:i", which contains parentheses.

Change-Id: I4fce11f2913959dad06b3846d03df1da1e84e435
2019-10-20 17:17:55 +00:00
Bartosz Dziewoński b105bf7ded Detect and parse timestamps, signatures, comments and threads
Bug: T232780
Bug: T234404
Change-Id: Ie9c80121089742cfc7cd7c04d694c2e0fe8d6a98
2019-10-18 13:59:07 +02:00