wikimedia/mediawiki-extensions-DiscussionTools

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/DiscussionTools synced 2024-11-12 09:58:17 +00:00

Author	SHA1	Message	Date
C. Scott Ananian	25272e7a4a	Don't refer directly to PHP `dom` extension classes; avoid nonstandard behavior These changes ensure that DiscussionTools is independent of DOM library choice, and will not break if/when Parsoid switches to an alternate (more standards-compliant) DOM library. We run `phan` against the Dodo standards-compliant DOM library, so this ends up flagging uses of non-standard PHP extensions to the DOM. These will be suppressed for now with a "Nonstandard DOM" comment that can be grepped for, since they will eventually need to be rewritten or worked around. Most frequent issues: * Node::nodeValue and Node::textContent and Element::getAttribute() can return null in a spec-compliant implementation. Add `?? ''` to make spec-compliant results consistent w/ what PHP returns. * DOMXPath doesn't accept anything except DOMDocument. These uses should be replaced with DOMCompat::querySelectorAll() or similar (which end up using DOMXPath under the covers for DOMDocument anyway, but are implemented more efficiently in a spec-compliant implementation). * A couple of times we have code like: `while ($node->firstChild!==null) { $node = $node->firstChild; }` and phan's analysis isn't strong enough to determine that $node is still non-null after the while. This same issue should appear with DOMDocument but phan doesn't complain for some reason. One apparently legit issue: * Node::insertBefore() is once called in a funny way which leans on the fact that the second option is optional in PHP. This seems to be a workaround for an ancient PHP bug, and can probably be safely removed. Bug: T287611 Bug: T217867 Change-Id: I3c4f41c3819770f85d68157c9f690d650b7266a3	2021-07-30 18:15:40 -04:00
libraryupgrader	b0884b177c	build: Updating dependencies composer: * mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0 npm: * postcss: 7.0.35 → 7.0.36 * https://npmjs.com/advisories/1693 (CVE-2021-23368) * glob-parent: 5.1.1 → 5.1.2 * https://npmjs.com/advisories/1751 (CVE-2020-28469) * trim-newlines: 3.0.0 → 3.0.1 * https://npmjs.com/advisories/1753 (CVE-2021-33623) Change-Id: I7a71e23da561599da417db3b3077b78d91173bbc	2021-07-22 16:29:04 +00:00
libraryupgrader	12fb65b9f1	build: Updating composer dependencies * mediawiki/mediawiki-codesniffer: 35.0.0 → 36.0.0 * php-parallel-lint/php-parallel-lint: 1.2.0 → 1.3.0 Change-Id: I5c152292e83e7f3441e2c08b7d0ad23ac90f194b	2021-05-05 11:14:52 +00:00
jenkins-bot	ef7073b8fd	Merge "Simplify how warnings for IDs equal to legacy IDs are avoided"	2021-04-15 23:31:45 +00:00
Bartosz Dziewoński	42ce942c86	Introduce comment "names" to identify comments across revisions/pages The existing comment IDs can't be used to find the same comment on a different revision or page (when it's transcluded), because they depend on the comment's parent and its position on the page. Comment names depend only on the author and timestamp. The trade-off is that they can't distinguish comments posted within the same minute, or in the same edit, so we will still need the IDs sometimes. Prefer using comment names when replying, if they're not ambiguous. This fixes T273413 and T275821. Heading names depend on the author and timestamp of the oldest comment. This way we don't have to detect changes to the heading text, but we can't distinguish headings without any comments. Bug: T274685 Bug: T273413 Bug: T275821 Change-Id: Id85c50ba38d1e532cec106708c077b908a3fcd49	2021-03-23 16:08:42 +00:00
Bartosz Dziewoński	b28290fa62	Simplify how warnings for IDs equal to legacy IDs are avoided I don't like the extra parameter. Follow-up to `d05109b24d`. Change-Id: Ic0f403a816fd3182982002da326bb32d591ebcf7	2021-03-22 20:15:07 +00:00
Ed Sanders	4a0802065c	Make IDs (to be used as URL hashes) wikitext safe * Use hyphens instead of pipes as separators * Use underscores for spaces in usernames Change-Id: I6efd9739fc73e45002e50e64c43ce0de1c2f1239	2021-03-18 20:45:21 +01:00
Bartosz Dziewoński	a103abb8ae	Ignore warnings about legacy IDs in tests Change-Id: I3c74b4e65aac9b84494917547cce7eb6a75995b4	2021-03-18 20:42:03 +01:00
Bartosz Dziewoński	f5059e6ea6	Don't detect comments within 'cite' elements too Follow-up to `024a978ffd`. Bug: T275881 Change-Id: I53448ad22cd0531e7fd4aa0ea5d15782879cce14	2021-03-01 21:40:43 +01:00
jenkins-bot	0eb37a87df	Merge "Don't detect comments within quotes"	2021-02-28 22:56:20 +00:00
Bartosz Dziewoński	024a978ffd	Don't detect comments within quotes Bug: T275881 Change-Id: I8f7a4279837bd95ebf5b604ff350c0a3f29c2c05	2021-02-28 22:49:48 +00:00
jenkins-bot	8bb5eea999	Merge "Improve signature detection to handle formatting on the timestamp"	2021-02-27 22:54:50 +00:00
jenkins-bot	49938a88dc	Merge "Improve merging multiple comments on one paragraph"	2021-02-27 22:54:43 +00:00
Daimona Eaytoy	67096cb431	Stop using deprecated Language methods Change-Id: I7cf21365df355a4a62f9e353be61aaa03ed58b9d	2021-02-27 14:48:49 +00:00
Bartosz Dziewoński	efe95494a8	Improve signature detection to handle formatting on the timestamp Now it detects signatures generated by en.wp's {{Undated}} template, and signatures of people who do weird stuff to the timestamps. Bug: T275938 Change-Id: I27b07f6786ca5433a3c02a5fe68e4716d41401bb	2021-02-27 02:33:30 +01:00
Bartosz Dziewoński	af082908a5	Improve merging multiple comments on one paragraph The horrendous 11-line if() condition did not correctly handle signatures wrapped in inline formatting markup, like <small>. Instead, implement this logic in the code for skipping to the end of a paragraph, which didn't exist yet when that condition was added, but seems like a much better place to check this now. Bug: T275934 Change-Id: I5cccff889b5e15b5f8fde0538bf4bccb22e762cf	2021-02-27 02:21:36 +01:00
Bartosz Dziewoński	35738b1f9b	CommentParser: Replace getThreadStartTimestamp with getThreadStartComment Change-Id: Ia8d878594306b5ce4039ca06d6dcec753e5dea28	2021-02-24 12:26:58 +00:00
Ed Sanders	fa484e0c4a	Don't allow CommentItem author to be null Change-Id: Idb12bfa62e42bff521e872ab358b5ba9a8d24089	2021-02-22 20:55:35 +00:00
Bartosz Dziewoński	1998c983f1	computeId() can't return null It used to return null for headings, but now it doesn't. Simplify some code checking for that. Change-Id: I28131c4aee89b901879b4c49953d6b15ed91b5e7	2021-02-13 00:08:15 +01:00
Ed Sanders	d05109b24d	Truncate user generated part of IDs to 80 characters This ensures that IDs fit in a 255 character database field. Bug: T273658 Change-Id: I3cfe4fce6a865b4343f0f01121cd696aa5f98b22	2021-02-03 15:04:58 +00:00
Bartosz Dziewoński	c781b127c9	Handle category links at ends of comments affecting indentation * Ignore rendering-transparent nodes between discussion comments. * Improve isRenderingTransparentNode() so that <link> nodes representing TemplateStyles are not considered transparent, otherwise this would undo `ae920b831f`. Using a regexp from Parsoid. Bug: T272746 Change-Id: I0b3c3251156ba6c4826abf5ba44ea93f80ebc01d	2021-01-26 04:55:03 +01:00
Bartosz Dziewoński	8f42c74985	Fix skipping to the end of paragraph, now it considers nested tags Add yet another tree walking utility: CommentUtils::linearWalk(). Unlike TreeWalker, it allows handling the beginnings and ends of nodes separately – kind of like parsing an XML token stream, or kind of like VisualEditor's linear model. (Add unit tests for this utility. The simple.html test case is copied from [VisualEditor/VisualEditor]/demos/ve/pages/simple.html.) Use this utility to stop skipping when we reach either a closing or opening block node tag. Previously we'd skip over such tags inside nested "transparent" nodes (like <a>, <del>, or apparently <font>). Bug: T271385 Change-Id: I201a942eb3a56335e84d94e150ec2c33f8b4f4e0	2021-01-18 18:20:20 +00:00
Bartosz Dziewoński	50ad5bb2b4	Ignore outdent templates at the beginning of comments Bug: T264116 Change-Id: Iae9dbb30a1aead897cc274f655d3ecff4b297dbd	2020-12-14 21:35:56 +01:00
Bartosz Dziewoński	ae920b831f	Change which nodes are ignored at the beginning of comments again While working on T270009, I noticed that <style> and <link> nodes are treated differently, which seemed weird. Rewrite this again, hopefully this is the last time. The changed test cases also involve <area> and <input> nodes, and the new results make more sense to me. Bug: T264116 Change-Id: I3af90c84768a4b3dc53446927f4dba6f72175a2f	2020-12-14 21:33:50 +01:00
Bartosz Dziewoński	0fc71f60cd	Skip to the end of the paragraph if it's just text, too We've recently decided that we want to "extend" comments until the end of the paragraph (`e36dc8e78a`, `d0ae6c4e44`). However, we still had this special case that did the opposite: it ensured that if a comment ended in the middle of a text node, the comment would not be extended to the end of the node. Remove it. Note the change in the test file signatures-funny-formattedreply.html, which actually covered this case specifically. Change-Id: Id1384bb0c6e1a5f0c70f55efcb4caa240f230f07	2020-11-25 00:48:53 +01:00
Ed Sanders	d0ae6c4e44	Skip end marker "forward" until a block tag is reached The end marker is skipped forward until an open or close block tag is reached. In tree traversal terms this means moving either to the next sibling, or the parent (to skip over close tags). Bug: T256033 Change-Id: Iaa2c588698790d576ac4f9ecc126f58a082ef6b3	2020-11-23 15:08:29 +00:00
Ed Sanders	44a1bbcc59	Fix start node for comments following headings The general rule is that comments start after their preceding thread item, but when that is a heading we should skip past the entire <h[1-6]> node to avoid making section edit links part of the first comment. Bug: T267988 Change-Id: Ia7f1b27e0a69a9aab7c7da743bf8549479304096	2020-11-19 23:48:30 +00:00
Thiemo Kreuz	8ffe0d55da	Remove comments that literally repeat what the code says Change-Id: Ib928cf61dc512fbbf39a3279789376d635a82c52	2020-11-11 09:31:59 +01:00
jenkins-bot	e378a9122b	Merge "Don't detect comments within headings"	2020-11-05 16:56:02 +00:00
Bartosz Dziewoński	bed717d329	Move getHeadlineNodeAndOffset() to utils Needed by I7d35098d672d0edb50d49e22de1686d5cc83b60e. Change-Id: I44bf927213de570fe9de43e485e09cfae6778eef	2020-11-05 16:11:30 +01:00
Bartosz Dziewoński	986e83ee61	Fix getHeadlineNodeAndOffset() returning text nodes The condition was wrong, it could return either an element child with .mw-headline, or a non-element child. Bug: T267284 Change-Id: I28cda22ee8c5fe4a3259621adddd647b31291703	2020-11-05 16:09:35 +01:00
Bartosz Dziewoński	1626242863	Don't detect comments within headings Bug: T267068 Change-Id: Id134f15e086fd070801c4b1d836dbfbf9bf444ad	2020-11-04 21:57:33 +01:00
libraryupgrader	fb6706a606	build: Updating mediawiki/mediawiki-codesniffer to 32.0.0 The following sniffs are failing and were disabled: * MediaWiki.Commenting.PropertyDocumentation.MissingDocumentationPrivate * MediaWiki.Commenting.PropertyDocumentation.MissingDocumentationProtected * MediaWiki.Commenting.PropertyDocumentation.MissingDocumentationPublic Additional changes: * Dropped .inc files from .phpcs.xml (T200956). Change-Id: I340d6b573e9ae2a99085fb19a705fcf567b03f92	2020-10-29 10:53:01 +00:00
Ed Sanders	3aca622894	Treat headings like comments now they have IDs Use the same logic for marking ranges in the document, and ensure that the heading range does not include section edit links or section numberings. Change-Id: I782caafc34fee2a822b0a17b24dd6b9528202eca	2020-10-28 12:38:18 +00:00
Bartosz Dziewoński	ba8434e2e0	Add legacy IDs as of wmf/1.36.0-wmf.14 Bug: T264478 Change-Id: I099e1068fedc25d671cd1245ac8b32941dca7232	2020-10-22 22:52:59 +02:00
Bartosz Dziewoński	0ddc171c8a	Add oldest timestamp in the thread to heading IDs To avoid old threads re-appearing on popular pages when someone uses a vague title (e.g. dozens of threads titled "question" on [[Wikipedia:Help desk]]: https://w.wiki/fbN), include the oldest timestamp in the thread (i.e. date the thread was started) in the heading ID. Bug: T264478 Change-Id: If918bfd5e025248923d1939bc86916697ead95a0	2020-10-22 02:19:21 +02:00
Bartosz Dziewoński	b09bbfe668	Disambiguate comments by parent ID, rather than sequential numbers Sequential numbers aren't great because they change when an earlier comment is archived. Parent comment/heading IDs should change less often. This also makes much more sense for disambiguating subsections, e.g. a dozen identical ===Votes=== sections for a dozen proposals. Bug: T264478 Change-Id: I466454984fd919ebef35f2b37ddb5d86dc842996	2020-10-22 02:19:21 +02:00
Bartosz Dziewoński	3137d76f40	Connect sub-threads to their parent threads Our threads now also contain all replies to their sub-threads. This is similar to how sections work in MediaWiki, where the parent section also contains the content of all the lower-level sections. We're going to need this for notifications about replies in a thread. Bug: T264478 Change-Id: I241fc58e2088a7555942824b0f184ed21e3a8b6f	2020-10-22 02:05:02 +02:00
Bartosz Dziewoński	9ee0fd69f5	Allow headings to have IDs Previously, only comments could have IDs, because we only needed IDs for replying. But we might also use them for notifications soon. Bug: T264478 Change-Id: I1bcad02bf17ab54bc5028a959543c10f0430836b	2020-10-22 02:04:28 +02:00
Bartosz Dziewoński	6719d17364	Handle cached "legacy" IDs (and other JSON-serialized data) The output of CommentFormatter::addReplyLinks() and consequently ThreadItem::jsonSerialize() can end up in the HTTP cache (Varnish) on Wikimedia wikis. We need to consider that when changing that code. Introduce a concept of legacy ID (generated by the older algorithm after it changes), add some placeholder code that will generate them in the future, and update some code to find comments by either normal or legacy IDs. Add dire comments in a bunch of places (as if that ever helps). Bug: T264478 Change-Id: I4368f366800ab21b8b184b09378037614fdecd33	2020-10-22 00:53:06 +02:00
Bartosz Dziewoński	3b8d63467e	CommentParser: Remove confused comments about references and objects "This modifies the original objects…" – I feel like this is obvious now, but maybe it wasn't so obvious when this code was structured differently before `a2431fe006`. Also, it refers to a variable that doesn't exist. "FIXME this will clone the reply…" – No, actually, it will not. It would if replies were associative arrays, but they are objects, and have always been, ever since the PHP parser was merged in `7b7a2cd69c`. Maybe they were arrays once in Roan's mind before he pushed that for review. Change-Id: I1348e111699fdbde99cd1f9ef45d8f465f7391b0	2020-10-21 21:01:27 +02:00
Bartosz Dziewoński	88b5be11fd	CommentParser: Avoid unnecessary reference in foreach() This is not necessary, and never has been. This variable contains an object and it's never assigned to. Instead, the reference creates hard-to-debug bugs (I've just spent an hour debugging one). When the variable name is reused later in the same function as the loop variable of another foreach() loop (such as in If918bfd5e0), the result is overwriting of the last entry in $this->threadItems with the last entry from the other array. I was questioning everything I know about variables until I noticed. Change-Id: Ibb57f915b39dd4d6d2e744903f9ecadd67b1f52d	2020-10-15 17:58:06 +02:00
Bartosz Dziewoński	ed17f640b6	Ignore other empty-ish things at the beginning of comments Follow-up to `432a959436`. Bug: T264116 Change-Id: I0685cafab70c7e9d22f504f1a1309c9a28d6f2e1	2020-09-30 23:42:47 +02:00
Bartosz Dziewoński	17b7a481a2	Fix detecting username from the wrong links sometimes When a timestamp directly followed a `<div>…</div>` tag (or perhaps some other wrapper containing lots of content), we would detect the username from the earliest links in the wrapper (furthest from the timestamp), rather than the latest links (closest to the timestamp). Bug: T262573 Change-Id: Id16449a86a731b13dc79846bb30ecf6554e26f1d	2020-09-29 22:31:24 +02:00
Bartosz Dziewoński	432a959436	Ignore empty paragraphs at the beginning of comments The wikitext parser outputs `<p><br></p>` for empty paragraphs, so we need to ignore `<br>` tags when searching for an "interesting" node that marks the beginning of a comment. Otherwise the empty paragraphs mess up the detection of indentation levels. Bug: T264116 Change-Id: I84a97ab577baa7336b78935ccdc48041ecfc231a	2020-09-29 22:22:35 +02:00
Bartosz Dziewoński	329df8c953	Parsing discussions converted to language variants * Export parser data (date format, digits, timezone names, and messages for weekday/month names) converted to language variants * Update the parsers to try matching using every variant, in case the page is displayed in non-default variant (and to avoid problems with incomplete variant conversion) Bug: T259818 Change-Id: I04d73992cd31ce06fa79f87df0c0a53d7efc3c58	2020-09-16 22:07:07 +00:00
Ed Sanders	92a8ca3469	Documentation fix Change-Id: Ic37bb713bed8af7390a3e8be7ea0203b4687ce0e	2020-09-15 01:38:54 +02:00
Bartosz Dziewoński	14fb013515	Match handling of "signature scan limit" between JS and PHP PHP was counting UTF-8 bytes, JS was counting UTF-16 code units. Both should have been counting codepoints (although it doesn't really matter as long as they both count the same things). I noticed the issue after adding some tests using the Cyrillic script, when one case had different results in PHP and JS: Id25b537fecd789640c209ff7f30e777455a3aece. Change-Id: Ic31240678f71ba48e6ec202126bf490cea12bb66	2020-09-08 03:27:01 +02:00
Bartosz Dziewoński	2d3fe47ac1	Fix parsing localised digits in PHP discussion parser The PHP code incorrectly assumed that the digits are single-byte in UTF-8, which is never the case (except for 0-9). The JS code worked correctly because it uses UTF-16 strings, so the bug would only affect non-BMP digits there. This was noted in a TODO comment, but we overlooked it when reimplementing in PHP. Instead of a string of 10 characters, use an array of 10 single-character strings. Bug: T261706 Change-Id: Ic5421382474c88f003424799c53ff473d99cce92	2020-09-01 01:50:33 +02:00
Bartosz Dziewoński	e36dc8e78a	Skip to the end of the paragraph in the parser, not modifier When a comment ended before the end of a paragraph, the next comment would begin right there in the middle of the paragraph. This could result in the detected indentation level of that comment being incorrect, and replies being inserted in wrong places, as seen in the 'signatures-funny' test case. The code moved to the parser was previously repeated twice in addListItem() and addReplyLink(), which should have been a hint that something isn't quite right. Also, fix the code guarding against overlapping signatures, now that signatures may not be at the end of a comment. Bug: T260855 Change-Id: Ic26a87642f8a15d5de2f7073d4d8176b299c7f94	2020-08-20 19:35:55 +00:00
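
Several commits above (`e36dc8e78a`, `d0ae6c4e44`, `8f42c74985`) deal with extending a comment to the end of its paragraph by skipping forward until an opening or closing block tag is reached. The following is a minimal JavaScript sketch of that idea, not the extension's actual code: the plain `{ tag, children }` node shape, the `BLOCK_TAGS` set, the `el()` helper, and the `skipToBlockBoundary()` function are all assumptions made for illustration, and `linearWalk()` here only borrows the name and the enter/leave idea from the commit message.

```javascript
// Assumed (incomplete) list of block-level tags, for illustration only.
const BLOCK_TAGS = new Set(['p', 'div', 'ul', 'ol', 'dl', 'li', 'dd', 'dt', 'blockquote']);

// Hypothetical helper to build plain-object nodes instead of real DOM nodes.
const el = (tag, ...children) => ({ tag, children });

// Visit every node twice: once when it opens ('enter') and once when it
// closes ('leave'), similar to reading a stream of open/close tags.
function linearWalk(node, callback) {
  callback('enter', node);
  for (const child of node.children) {
    linearWalk(child, callback);
  }
  callback('leave', node);
}

// Starting right after `start` closes, collect the tags of nodes that are
// passed over completely, and stop at the first block tag, whether it is
// opening or closing, even when it sits inside a nested inline node.
function skipToBlockBoundary(root, start) {
  const skipped = [];
  let state = 'before'; // 'before' -> 'skipping' -> 'done'
  linearWalk(root, (event, node) => {
    if (state === 'before') {
      if (event === 'leave' && node === start) {
        state = 'skipping';
      }
    } else if (state === 'skipping') {
      if (BLOCK_TAGS.has(node.tag)) {
        state = 'done';
      } else if (event === 'leave') {
        skipped.push(node.tag);
      }
    }
  });
  return skipped;
}

// A signature followed by inline markup in the same paragraph.
const sig = el('span');
const page = el('div', el('p', sig, el('small', el('#text'))), el('p', el('#text')));
console.log(skipToBlockBoundary(page, sig)); // [ '#text', 'small' ]

// A block element nested inside an inline one stops the skip early.
const sig2 = el('span');
const nested = el('div', el('p', sig2, el('a', el('#text'), el('div', el('#text')))));
console.log(skipToBlockBoundary(nested, sig2)); // [ '#text' ]
```

Handling 'enter' and 'leave' separately is what lets the second case stop at the opening `div` inside the `a` element; a walker that only reports whole nodes would step over the `a` element and everything in it.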


83 commits
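
Commits `14fb013515` and `2d3fe47ac1` both come down to which unit a string is measured in. As a rough illustration in JavaScript (not code from the extension), the sketch below counts the same text as UTF-8 bytes, UTF-16 code units, and code points, and maps localised digits through an array of single-character strings rather than by indexing into one string. The function names `countUnits()` and `toAsciiDigits()` are hypothetical.

```javascript
// Measure one string three ways. The numbers differ as soon as the text
// leaves ASCII, which is how two parsers can disagree on a length limit.
function countUnits(text) {
  return {
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16CodeUnits: text.length,
    codePoints: Array.from(text).length,
  };
}

// Replace localised digits with 0-9. `digits` holds the ten local digit
// characters in order; Array.from() splits it by code point, so each array
// entry is one whole character even when it is multi-byte or non-BMP.
function toAsciiDigits(text, digits) {
  const digitList = Array.from(digits);
  return Array.from(text, (ch) => {
    const index = digitList.indexOf(ch);
    return index === -1 ? ch : String(index);
  }).join('');
}

console.log(countUnits('Привет'));
// { utf8Bytes: 12, utf16CodeUnits: 6, codePoints: 6 }

// Bengali digits occupy U+09E6 to U+09EF.
const bengali = Array.from({ length: 10 }, (_, i) => String.fromCodePoint(0x09E6 + i)).join('');
const year = [2, 0, 2, 0].map((d) => String.fromCodePoint(0x09E6 + d)).join('');
console.log(toAsciiDigits(year, bengali)); // 2020
```

For Cyrillic text the UTF-16 and code point counts happen to agree, so the mismatch only shows up against the UTF-8 byte count; a non-BMP character makes all three differ.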