wikimedia/mediawiki-extensions-DiscussionTools

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/DiscussionTools synced 2024-11-28 02:00:57 +00:00

Author	SHA1	Message	Date
Bartosz Dziewoński	eb1fe7a8fb	CommentParser: Fix redundant uses of getHeadlineNodeAndOffset() We call CommentUtils::getHeadlineNodeAndOffset() before constructing the HeadingItem in CommentParser, so the range's startContainer is always the headline node. Change-Id: I2afb6ba9100e785cd91f31d82f4cea59fa8b5443	2022-03-08 23:29:34 +00:00
Bartosz Dziewoński	584f6a020c	Use `tagName` rather than `nodeName` when we know the node is an element `tagName` is only defined on Element, and it returns its tag name. `nodeName` is defined on Node, and it returns the tag name for Elements, and a string like '#text' or '#document-fragment' for other types. We were using both, which made it harder to reason about what types we're dealing with. Change-Id: I8e621e5872bdf78c84ec553cfbfcdbf0192f0589	2022-03-08 23:29:05 +00:00
Bartosz Dziewoński	063174e71c	Use `instanceof` for checking for text/element nodes in PHP It is friendlier for static analysis tools like Phan, which can't infer anything from the `->nodeType === …` checks, and we were already using it in most places. Fix newly revealed Phan failures (and one unneeded suppression). Change-Id: Id789f05e16a210f7ba22ca7514587c392fac0741	2022-03-08 23:28:39 +00:00
jenkins-bot	542da89530	Merge "Don't detect comments within references"	2022-02-28 16:47:21 +00:00
Bartosz Dziewoński	4244418e56	Don't detect comments within references Bug: T301213 Change-Id: Ifd5198651c8ed0ce53379fb5e35938089cd54a09	2022-02-21 19:57:44 +00:00
jenkins-bot	1a48f8cd7e	Merge "CommentParser: Inject a forgotten service"	2022-02-21 19:30:55 +00:00
Bartosz Dziewoński	85165543f4	CommentParser: Inject a forgotten service Also sort alphabetically. Change-Id: I9e77c4aa1fba930f382e3c4f17ac0504c2f06668	2022-02-21 20:15:54 +01:00
Bartosz Dziewoński	aea36bab3a	CommentParser: Fix a small use of global state Also, in ThreadItem::getSinglePageTransclusionTitle(), we don't need this terribly complicated method. Change-Id: If02c09aaa2f4dd66b2bc253a1edec4ea107564ee	2022-02-21 18:15:31 +00:00
Bartosz Dziewoński	8e44b43df0	Split off ThreadItemSet from CommentParser Goal: ----- Finishing the work from Iadb7757debe000025e52770ca51ebcf24ca8ee66 by changing CommentParser::parse() to return a data object, instead of the whole parser. Changes: -------- ThreadItemSet.php: ThreadItemSet.js: * New data class to access the results of parsing a discussion. Most methods and properties are moved from CommentParser with no changes. CommentParser.php: Parser.js: * parse() returns a new ThreadItemSet. * Remove methods moved to ThreadItemSet. * Placeholder headings are generated slightly differently, as we process things in a different order. * Grouping threads and computing IDs/names is no longer lazy. We always needed IDs/names anyway. * computeId() explicitly uses a ThreadItemSet to check the existing IDs when de-duplicating. controller.js: * Move the code for turning some nodes annotated by CommentFormatter into a ThreadItemSet (previously a Parser) from controller#init to ThreadItemSet.static.newFromAnnotatedNodes, and rewrite it to handle assigning parents/replies and recalculating legacy IDs more nicely. * mw.dt.pageThreads is now a ThreadItemSet. Change-Id: I49bfe019aa460651447fd383f73eafa9d7180a92	2022-02-21 16:22:32 +00:00
Bartosz Dziewoński	4613ae78e7	Change CommentParser into a service Goal: ----- To have a method like CommentParser::parse(), which just takes a node to parse and a title and returns plain data, so that we don't need to keep track of the config to construct a CommentParser object (the required config like content language is provided by services) and we don't need to keep that object around after parsing. Changes: -------- CommentParser.php: * …is now a service. Constructor only takes services as arguments. The node and title are passed to a new parse() method. * parse() should return plain data, but I split this part to a separate patch for ease of review: I49bfe019aa460651447fd383f73eafa9d7180a92. * CommentParser still cheats and accesses global state in a few places, e.g. calling Title::makeTitleSafe or CommentUtils::getTitleFromUrl, so we can't turn its tests into true unit tests. This work is left for future commits. LanguageData.php: * …is now a service, instead of a static class. Parser.js: * …is not a real service, but it's changed to behave in a similar way. Constructor takes only the required config as argument, and node and title are instead passed to a new parse() method. CommentParserTest.php: parser.test.js: * Can be simplified, now that we don't need a useless node and title to test internal methods that don't use them. testUtils.js: * Can be simplified, now that we don't need to override internal ResourceLoader stuff just to change the parser config. Change-Id: Iadb7757debe000025e52770ca51ebcf24ca8ee66	2022-02-19 19:51:57 +01:00
Bartosz Dziewoński	f51f3a1051	CommentParser: Remove unused method getThreadItemsByName() Follow-up to `a5099739a6`. Change-Id: I53cbf6a7a2c9b95674998734689b3930dfe74149	2022-02-19 19:51:57 +01:00
Bartosz Dziewoński	99b5de8038	Split Data class into ResourceLoaderData and LanguageData The Data class contained utilities for two unrelated purposes. Split each half to a separate class. Notably, this improves the signature of the getLocalData() function. Change-Id: Icde615fb9d483fee1f352c34909b37f8ffde8081	2022-02-19 19:37:34 +01:00
Bartosz Dziewoński	ae9f26a9e5	Various code quality tweaks (suggested by PhpStorm) composer.json: * Document required PHP extensions Parser.js: * Remove incorrect param documentation * Fix some typos in comments (missing parentheses) CommentParser.php: * Fix some typos in comments (missing parentheses) ImmutableRange.php: * Remove unused property * Add a `throw` to indicate that code path is unreachable SubscribedNewCommentPresentationModel.php: * Add missing `return false` CommentParserTest.php: * Remove unnecessary pass-by-reference CommentModifierTest.php: * Remove unused variable CommentParserTest.php: * Don't construct Element objects directly. PHP's DOMElement allows it, but Parsoid/Dodo's doesn't, and we use the latter for static analysis. This generates all kinds of confusing warnings. Change-Id: Ia9598ebea0e99830dd485296e94a9d96acc4b258	2022-02-19 19:36:52 +01:00
Bartosz Dziewoński	13ab1db6da	Don't count leading/trailing whitespace against signature scan limit It's an arbitrary limit, it seems harmless to relax it to support the use case in the task, even if it's weird. Bug: T300949 Change-Id: I7c895c7019726758bbae3183b9c3ecbd9eabcf38	2022-02-04 19:35:29 +00:00
Ed Sanders	0b42aea276	CommentParser: Cache variables in getUsernameFromLink Change-Id: I625e6ded3badd75a7a658c8d000576d0d165a18b	2022-02-04 19:35:18 +00:00
Ed Sanders	8ad1df7dc8	CommentParser: Name parts of return value from findSignature Change-Id: I3a5ad36df0afdedc0aa9a15e5d83c5426b03b790	2022-02-04 19:34:18 +00:00
Bartosz Dziewoński	f15693eefa	Use class list everywhere for adding/checking CSS classes In PHP, use DOMCompat::getClassList(), provided by Parsoid. In JS, use `.classList`, available in all supported browsers. This may fix some bugs where we were incorrectly checking for exactly one class. The change in isOurGeneratedNode() is needed for Ib2fa40c5fa389572b0e88ef558728fa06e3621b0. Change-Id: Ia28d31678fd3d617b69280c4b7857755300fa515	2022-01-24 18:40:00 +01:00
Ed Sanders	f80ff74fc6	Handle selflinks by returning the current page's title Bug: T287818 Change-Id: I67f10ac9976581279d1e6a477e90d55875ebab20	2022-01-12 21:18:04 +00:00
Ed Sanders	34011b7a07	Parser: Pass in title of page being parsed Will be used to parse selflinks in the future. Change-Id: I2bc29d1c5c69cb6309f582f162f9af7d96ce8913	2022-01-12 21:17:59 +00:00
Bartosz Dziewoński	ef7274d69e	Move some helpers from CommentParser to CommentUtils Change-Id: I0e323d3b75f47459a5548a13e9684f4c6ff4ba0c	2021-12-13 17:13:41 +01:00
Ed Sanders	8e4f08182e	Add missing typehints Change-Id: Ia25c5bea1834a3fdd26f32a9d5ed097789329824	2021-12-01 14:57:09 +00:00
Ed Sanders	a86d308d66	CommentItem.php: Store timestamp object instead of string We do something similar in CommentItem.js with a moment object. The object can be converted to a string when required. Change-Id: Id7221e9201db0d89c3b771574634c878c9515ca0	2021-11-09 16:37:45 +00:00
Bartosz Dziewoński	c1f4668806	Change CommentParser and ImmutableRange to use offsets in codepoints instead of bytes The PHP DOM extension measures lengths and offsets in Unicode codepoints. Our PHP code used UTF-8 bytes, causing some offsets to be slightly off. Now it mostly uses Unicode codepoints as well (we're forced to use bytes in a few places, because preg_match returns offsets in bytes). In practice, this had no visible effect for the user. It caused the markers `<span data-mw-comment-end="..."></span>` to be placed at the end of their container instead of the correct position when the timestamp contained multibyte characters (e.g. "ź" in Polish); but the correct position is usually at the end of the container anyway. In the test cases, the only difference is placing these markers before a trailing line break inside `<p>...</p>` tags rather than after it. The patch also accidentally fixes another bug, where element nodes with no children (mostly <img>) were incorrectly excluded when calling cloneContents(), because they were treated as if they were text nodes. Change-Id: Iccdccf1078598f4b62cab96225e9c85a4c0e93ee	2021-09-27 19:04:16 +00:00
jenkins-bot	15747aba3a	Merge "CommentParser: Remove outdated legacy ID algorithm"	2021-09-21 16:50:37 +00:00
Alexander Vorwerk	97a702cbc7	CommentParser: use IPUtils instead of the deprecated IP class Bug: T291008 Change-Id: I0207940a642a32f2ca997b78387c9ff0af101599	2021-09-14 22:19:05 +02:00
Bartosz Dziewoński	9819df3288	CommentParser: Remove outdated legacy ID algorithm Last changed in March (`4a0802065c`), was only needed for about 2 weeks for compatibility with cached data. Change-Id: I510238cb86a7b4d7ae5e8636716d1e9ca2d0e402	2021-09-07 17:41:30 +02:00
Bartosz Dziewoński	a5099739a6	Improve notifications for comments posted in close succession In case 4 and case 6, no notifications are expected. In all other cases we now get the expected notifications. Bug: T285528 Change-Id: I9e813bb3a053bc1232783f9eae1ad75672b4fa7e	2021-08-01 12:27:33 +02:00
C. Scott Ananian	25272e7a4a	Don't refer directly to PHP `dom` extension classes; avoid nonstandard behavior These changes ensure that DiscussionTools is independent of DOM library choice, and will not break if/when Parsoid switches to an alternate (more standards-compliant) DOM library. We run `phan` against the Dodo standards-compliant DOM library, so this ends up flagging uses of non-standard PHP extensions to the DOM. These will be suppressed for now with a "Nonstandard DOM" comment that can be grepped for, since they will eventually need to be rewritten or worked around. Most frequent issues: * Node::nodeValue and Node::textContent and Element::getAttribute() can return null in a spec-compliant implementation. Add `?? ''` to make spec-compliant results consistent w/ what PHP returns. * DOMXPath doesn't accept anything except DOMDocument. These uses should be replaced with DOMCompat::querySelectorAll() or similar (which end up using DOMXPath under the covers for DOMDocument anyway, but are implemented more efficiently in a spec-compliant implementation). * A couple of times we have code like: `while ($node->firstChild!==null) { $node = $node->firstChild; }` and phan's analysis isn't strong enough to determine that $node is still non-null after the while. This same issue should appear with DOMDocument but phan doesn't complain for some reason. One apparently legit issue: * Node::insertBefore() is once called in a funny way which leans on the fact that the second argument is optional in PHP. This seems to be a workaround for an ancient PHP bug, and can probably be safely removed. Bug: T287611 Bug: T217867 Change-Id: I3c4f41c3819770f85d68157c9f690d650b7266a3	2021-07-30 18:15:40 -04:00
libraryupgrader	b0884b177c	build: Updating dependencies composer: * mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0 npm: * postcss: 7.0.35 → 7.0.36 * https://npmjs.com/advisories/1693 (CVE-2021-23368) * glob-parent: 5.1.1 → 5.1.2 * https://npmjs.com/advisories/1751 (CVE-2020-28469) * trim-newlines: 3.0.0 → 3.0.1 * https://npmjs.com/advisories/1753 (CVE-2021-33623) Change-Id: I7a71e23da561599da417db3b3077b78d91173bbc	2021-07-22 16:29:04 +00:00
libraryupgrader	12fb65b9f1	build: Updating composer dependencies * mediawiki/mediawiki-codesniffer: 35.0.0 → 36.0.0 * php-parallel-lint/php-parallel-lint: 1.2.0 → 1.3.0 Change-Id: I5c152292e83e7f3441e2c08b7d0ad23ac90f194b	2021-05-05 11:14:52 +00:00
jenkins-bot	ef7073b8fd	Merge "Simplify how warnings for IDs equal to legacy IDs are avoided"	2021-04-15 23:31:45 +00:00
Bartosz Dziewoński	42ce942c86	Introduce comment "names" to identify comments across revisions/pages The existing comment IDs can't be used to find the same comment on a different revision or page (when it's transcluded), because they depend on the comment's parent and its position on the page. Comment names depend only on the author and timestamp. The trade-off is that they can't distinguish comments posted within the same minute, or in the same edit, so we will still need the IDs sometimes. Prefer using comment names when replying, if they're not ambiguous. This fixes T273413 and T275821. Heading names depend on the author and timestamp of the oldest comment. This way we don't have to detect changes to the heading text, but we can't distinguish headings without any comments. Bug: T274685 Bug: T273413 Bug: T275821 Change-Id: Id85c50ba38d1e532cec106708c077b908a3fcd49	2021-03-23 16:08:42 +00:00
Bartosz Dziewoński	b28290fa62	Simplify how warnings for IDs equal to legacy IDs are avoided I don't like the extra parameter. Follow-up to `d05109b24d`. Change-Id: Ic0f403a816fd3182982002da326bb32d591ebcf7	2021-03-22 20:15:07 +00:00
Ed Sanders	4a0802065c	Make IDs (to be used as URL hashes) wikitext safe * Use hyphens instead of pipes as separators * Use underscores for spaces in usernames Change-Id: I6efd9739fc73e45002e50e64c43ce0de1c2f1239	2021-03-18 20:45:21 +01:00
Bartosz Dziewoński	a103abb8ae	Ignore warnings about legacy IDs in tests Change-Id: I3c74b4e65aac9b84494917547cce7eb6a75995b4	2021-03-18 20:42:03 +01:00
Bartosz Dziewoński	f5059e6ea6	Don't detect comments within 'cite' elements too Follow-up to `024a978ffd`. Bug: T275881 Change-Id: I53448ad22cd0531e7fd4aa0ea5d15782879cce14	2021-03-01 21:40:43 +01:00
jenkins-bot	0eb37a87df	Merge "Don't detect comments within quotes"	2021-02-28 22:56:20 +00:00
Bartosz Dziewoński	024a978ffd	Don't detect comments within quotes Bug: T275881 Change-Id: I8f7a4279837bd95ebf5b604ff350c0a3f29c2c05	2021-02-28 22:49:48 +00:00
jenkins-bot	8bb5eea999	Merge "Improve signature detection to handle formatting on the timestamp"	2021-02-27 22:54:50 +00:00
jenkins-bot	49938a88dc	Merge "Improve merging multiple comments on one paragraph"	2021-02-27 22:54:43 +00:00
Daimona Eaytoy	67096cb431	Stop using deprecated Language methods Change-Id: I7cf21365df355a4a62f9e353be61aaa03ed58b9d	2021-02-27 14:48:49 +00:00
Bartosz Dziewoński	efe95494a8	Improve signature detection to handle formatting on the timestamp Now it detects signatures generated by en.wp's {{Undated}} template, and signatures of people who do weird stuff to the timestamps. Bug: T275938 Change-Id: I27b07f6786ca5433a3c02a5fe68e4716d41401bb	2021-02-27 02:33:30 +01:00
Bartosz Dziewoński	af082908a5	Improve merging multiple comments on one paragraph The horrendous 11-line if() condition did not correctly handle signatures wrapped in inline formatting markup, like <small>. Instead, implement this logic in the code for skipping to the end of a paragraph, which didn't exist yet when that condition was added, but seems like a much better place to check this now. Bug: T275934 Change-Id: I5cccff889b5e15b5f8fde0538bf4bccb22e762cf	2021-02-27 02:21:36 +01:00
Bartosz Dziewoński	35738b1f9b	CommentParser: Replace getThreadStartTimestamp with getThreadStartComment Change-Id: Ia8d878594306b5ce4039ca06d6dcec753e5dea28	2021-02-24 12:26:58 +00:00
Ed Sanders	fa484e0c4a	Don't allow CommentItem author to be null Change-Id: Idb12bfa62e42bff521e872ab358b5ba9a8d24089	2021-02-22 20:55:35 +00:00
Bartosz Dziewoński	1998c983f1	computeId() can't return null It used to return null for headings, but now it doesn't. Simplify some code checking for that. Change-Id: I28131c4aee89b901879b4c49953d6b15ed91b5e7	2021-02-13 00:08:15 +01:00
Ed Sanders	d05109b24d	Truncate user generated part of IDs to 80 characters This ensures that IDs fit in a 255 character database field. Bug: T273658 Change-Id: I3cfe4fce6a865b4343f0f01121cd696aa5f98b22	2021-02-03 15:04:58 +00:00
Bartosz Dziewoński	c781b127c9	Handle category links at ends of comments affecting indentation * Ignore rendering-transparent nodes between discussion comments. * Improve isRenderingTransparentNode() so that <link> nodes representing TemplateStyles are not considered transparent, otherwise this would undo `ae920b831f`. Using a regexp from Parsoid. Bug: T272746 Change-Id: I0b3c3251156ba6c4826abf5ba44ea93f80ebc01d	2021-01-26 04:55:03 +01:00
Bartosz Dziewoński	8f42c74985	Fix skipping to the end of paragraph, now it considers nested tags Add yet another tree walking utility: CommentUtils::linearWalk(). Unlike TreeWalker, it allows handling the beginnings and ends of nodes separately – kind of like parsing an XML token stream, or kind of like VisualEditor's linear model. (Add unit tests for this utility. The simple.html test case is copied from [VisualEditor/VisualEditor]/demos/ve/pages/simple.html.) Use this utility to stop skipping when we reach either a closing or opening block node tag. Previously we'd skip over such tags inside nested "transparent" nodes (like <a>, <del>, or apparently <font>). Bug: T271385 Change-Id: I201a942eb3a56335e84d94e150ec2c33f8b4f4e0	2021-01-18 18:20:20 +00:00
Bartosz Dziewoński	50ad5bb2b4	Ignore outdent templates at the beginning of comments Bug: T264116 Change-Id: Iae9dbb30a1aead897cc274f655d3ecff4b297dbd	2020-12-14 21:35:56 +01:00
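
Note on commit c1f4668806 (offsets in codepoints instead of bytes): the mismatch it describes can be illustrated with a minimal sketch. This is not code from DiscussionTools. The helper name byteOffsetToCodepointOffset and the sample timestamp string are assumptions made for illustration, and the sketch is in JavaScript rather than PHP. It only shows why a byte offset, like the ones the commit says preg_match returns, points past the intended position once the text contains multibyte characters such as "ź".

```javascript
// Hypothetical helper, not part of DiscussionTools: convert a UTF-8 byte
// offset (as reported by a byte-oriented regex engine) into an offset
// counted in Unicode code points.
function byteOffsetToCodepointOffset( text, byteOffset ) {
	const bytes = new TextEncoder().encode( text );
	// Decode only the bytes before the offset. With `fatal: true` this throws
	// if the offset falls in the middle of a multibyte character.
	const prefix = new TextDecoder( 'utf-8', { fatal: true } )
		.decode( bytes.subarray( 0, byteOffset ) );
	// Array.from() iterates a string by code points, not UTF-16 code units.
	return Array.from( prefix ).length;
}

// Sample text "źródło 12:00" (Polish letters written as escapes).
const text = '\u017Ar\u00F3d\u0142o 12:00';
// Byte offset of "12:00": the prefix "źródło " occupies 10 bytes in UTF-8.
const byteOffset = new TextEncoder().encode( '\u017Ar\u00F3d\u0142o ' ).length;
// The same position counted in code points is 7.
const codepointOffset = byteOffsetToCodepointOffset( text, byteOffset );

console.log( byteOffset, codepointOffset ); // 10 7
```

Using the byte offset (10) directly as a DOM text offset would land three characters too late in this sample, which matches the kind of "slightly off" placement the commit message describes.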

110 commits