wikimedia/mediawiki-extensions-DiscussionTools

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/DiscussionTools synced 2024-12-04 21:08:32 +00:00

Author	SHA1	Message	Date
Bartosz Dziewoński	9db35873a4	Parser: Fix the main loop getting stuck on some signatures In certain cases the parser could go back rather than forward after finding a signature, causing it to find the same signature forever until it ran out of memory. Test cases coming later in a separate patch. Bug: T356884 Change-Id: I8ac72b05e5e4ed45e6119c012a69708c9d8eda29	2024-02-07 22:33:57 +01:00
Ed Sanders	8069585489	CommentParser: Ignore generated timestamp links This will be present in parser cache output and can sometimes be mistaken for user page links. Bug: T356142 Change-Id: I800b23d8466f72affcadfa336aab07abf7f8d79e	2024-01-30 10:29:36 +00:00
Bartosz Dziewoński	cc9cccbd35	CommentParser: Replace new uses of Title with TitleValue Follow-up to Ic718a964e309ae3a8e15e299081f46d4db860731. Change-Id: Ida70ee080de44ec36f11c3c40816f6a198b63798	2023-12-11 22:19:13 +01:00
Bartosz Dziewoński	a27e27fc68	Move finding transclusion source from ContentThreadItem to CommentParser Reasons: * Various other methods dealing with ranges already live there * It would be neat if ContentThreadItem was just a value class without a lot of logic, similar to DatabaseThreadItem, particularly for writing unit tests * The methods access global state through Title, which can't be fixed while they're in ContentThreadItem (see I9dfccc83) The computation is now always done, instead of only when needed, but that's a small drawback, since it's fast (fast enough that I don't see the difference in the time taken when running tests), and we were already computing it for all comments in many places. Change-Id: Ic718a964e309ae3a8e15e299081f46d4db860731	2023-12-11 22:18:30 +01:00
Umherirrender	64bcb583e9	Use namespaced classes Done automatically via script Change to extension.json done manually Change-Id: Ied7bbddd357290ac6be6bf480be0ee9116e77365	2023-12-11 16:38:02 +01:00
jenkins-bot	d69dc24161	Merge "Ignore signatures with invalid timestamps"	2023-12-07 18:00:09 +00:00
Ed Sanders	4051c7faf4	Ignore signatures with invalid timestamps Bug: T352455 Change-Id: Ie499db4594bfa23b618907383d0ac583849ff582	2023-12-05 13:23:15 +00:00
thiemowmde	00ad50c673	Use upstream Title::inNamespace() instead of manual comparisons This might be a matter of personal preference. Not sure if it's worth it. Both are readable. On the other hand, the method exists. Why not use it? Change-Id: Id66fc6c888db6ae1cf28e60a51f90d9ae2cdb6ee	2023-11-28 09:49:00 +01:00
jenkins-bot	048d5364e2	Merge "Replace preg_replace_callback with strtr in CommentParser"	2023-10-31 13:35:19 +00:00
thiemowmde	10dcd1f847	Replace preg_replace_callback with strtr in CommentParser It does the same as before. I think performance is not a concern here, and wasn't my motivation either. But I hope this makes the code easier to read and to reason about. I added a pure unit test case (without involving an actual Language object) to cover the previously uncovered digits feature. Change-Id: I6a0fc86035817eabb42b55e58183ae094c052aa6	2023-10-31 08:55:40 +01:00
thiemowmde	1491b47b12	Improve performance of CommentParser::getUsernameFromLink I was curious why running the CommentParserTest takes so long. I found this is one of the bottlenecks because it's called so often, but many link titles that are parsed as user names turn out to be something else. This little hack speeds up the test by 15% and probably has a similar impact in production scenarios. Change-Id: I5a0b3a49ba5793c8a345baaa7118fed500c082b6	2023-10-30 17:59:46 +01:00
Bartosz Dziewoński	781a33357b	Use type hints for properties, remove PHPCS overrides MediaWiki's PHPCS plugin requires documentation comments on all properties, unless those properties are typed. This has potential to introduce bugs – in particular, because typed properties without a default value will throw an exception if their value is accessed before it's defined, while previously they defaulted to null. I fixed this when I found it (making them nullable and null by default), but I may have missed some cases. Change-Id: If5b1f4d542ce3e1b69327ee4283f7c3e133a62a0	2023-10-19 19:31:02 +00:00
Ed Sanders	1bb29faa58	Use strict comparison with array_search Change-Id: Id920d49701bd9436a6247763ed40df052877ad2f	2023-07-24 18:50:00 +01:00
Theodore Dubois	4ca17b8c33	Support ISO 8601 timestamps in the parser https://wikipesija.org is currently using ISO 8601 as the default date format. The format is xnY-xnm-xnd"T"xnH:xni:xns and 'xn', 'm', and 's' need support added. Change-Id: I235098a578eb92ddd23ea47fa23d60df4b28f590	2023-06-17 11:36:43 -07:00
thiemowmde	8bbbf39bbd	Make use of named MainConfigNames::… constants Also merge setMwGlobals() calls because they are really expensive. Also utilize the more readable str_contains() and related. Change-Id: Iebde6aa17c2e366f0c0a98fe13a454f6a06c299b	2023-05-19 12:12:32 +02:00
Ed Sanders	92f5cfd821	Support suppressing comment detection in pages or sections This can be done within sections using CSS: * mw-notalk Or at a page level using a magic word: * __NOTALK__ "notalk" suppresses all comment detection, treating the content as not containing any comments even if there are signatures present. Bug: T295553 Bug: T249293 Change-Id: Ic1d7294bafcf7071e16838e70684ecadd7bc6fd3	2023-04-03 18:36:34 +02:00
Ed Sanders	2fcc505d50	Parser: Store timestamp ranges Change-Id: Ifcbe22011f11f4374f38b7aa346da5a96cac968c	2023-03-28 23:51:17 +00:00
Ed Sanders	b82af45735	CommentParser: Output display name if different to username The only normalisation we apply for comparison is lowercasing. Change-Id: Id3d57c2066429fcedc7dcc091e74ed46e17060f1	2023-02-23 23:03:32 +00:00
Bartosz Dziewoński	af68c835bb	Update exception handling for new code conventions Change code to match the documented consensus formed on T321683: https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP#Exception_handling * Do not directly throw Exception, Error or MWException * Document checked exceptions with @throws * Do not document unchecked exceptions For this extension, I think it makes sense to consider DOMException an unchecked exception too (in addition to the usual LogicException and RuntimeException). Depends-On: Id07e301c3f20afa135e5469ee234a27354485652 Depends-On: I869af06896b9757af18488b916211c5a41a8c563 Depends-On: I42d9b7465d1406a22ef1b3f6d8de426c60c90e2c Change-Id: Ic9d9efd031a87fa5a93143f714f0adb20f0dd956	2023-01-22 18:17:11 +00:00
Bartosz Dziewoński	3a9997d6ea	Improve handling for comment separators * Detect comment separators at the end of comments too * Consider TemplateStyles associated with ignored templates This unexpectedly improves a lot of cases other than T313097 too, mostly where <br> or {{outdent}} was used within a paragraph: splitting comments that were previously jumbled together, or restoring content that was previously ignored for apps / notifications. Bug: T313097 Change-Id: I9b2ef6b760f2ffd97141ad7000f70919aeab7803	2023-01-10 01:59:52 +00:00
Bartosz Dziewoński	433e57394c	Use PHP 7.4 property types Change-Id: I788db64f0c0c00894d77256b7f016d44eda4bbb1	2022-10-28 21:56:38 +02:00
Ed Sanders	e24550fae9	Refactor thread summary getters Replace getThreadSummary with individual getters that call calculateThreadSummary once. Change-Id: Ie8a8b4d7cb5121847b78dbc20bca2c8d48c7d857	2022-09-06 23:19:13 +02:00
Ed Sanders	664d5d041a	Fix fetching of oldest comment in a thread The implementation in Parser doesn't descend into sub-thread. Re-use the getThreadSummary method in ThreadItem and traverse the thread properly. Bug: T298617 Change-Id: I318d9012eb83f37ccbe463923524ef2e9f995ced	2022-09-01 21:22:09 +00:00
Bartosz Dziewoński	cfa45a5f4c	Remove all stuff about legacy IDs We can no longer change IDs so easily, because they're stored in the permalink database, so remove this mechanism to make sure it's not accidentally used in the future. Change-Id: I392ee1f49c48fc2f23d05e9a37c643438b4f2b9a	2022-08-24 01:01:09 +02:00
Bartosz Dziewoński	880f9755e0	Separate ContentThreadItem and DatabaseThreadItem etc. Rename ThreadItem to ContentThreadItem, then create a new ThreadItem interface containing only the methods that we'll be able to implement using only the persistently stored data (no parsing), then create a DatabaseThreadItem. Do the same for CommentItem and HeadingItem. ThreadItemSet gets a similar treatment, but it's basically only for Phan's type checking. (This is sad.) Change-Id: I1633049befe8ec169753b82eb876459af1f63fe8	2022-07-04 23:35:50 +02:00
Ed Sanders	0ad9b4c6b2	Move placeholder heading level (99) to a constant Change the HeadingItem constructor to take a 'null' headingLevel and store this internally with the constant. Change the JSON serializer to convert this back to null. Change-Id: I27508eed75d94b99c5189548919309f8da7deb75	2022-06-14 22:51:49 +01:00
Ed Sanders	af54bae2ec	Prefer late static binding over self:: While in many cases the class will never be sub-classed, it's easier just to always use static:: and not worry about predicting which classes might have problems in the future. Change-Id: I23072a1701b5acf62bb3379a877de97627d8fcf3	2022-06-09 15:12:48 +01:00
Bartosz Dziewoński	6a59149132	Ignore LRM and RLM in more places in the timestamp We previously ignored them before timezone indicator (`e9c401e3aa`), but they can end up in other places too, e.g. after the time. Now we ignore them after every token. This is way overkill, but it shouldn't hurt. Bug: T308448 Change-Id: I20f7aaa34dba23f2a2faf1be258c1aea32ab770f	2022-05-17 02:00:22 +02:00
Bartosz Dziewoński	c7723baf72	CommentParser: Replace uses of Title with TitleValue Another small step towards removing the reliance on global state. Change-Id: Ifb4a5bcbef6606d02f1c7aa7385d72822cb0bad0	2022-03-18 18:24:34 +00:00
jenkins-bot	32d9ef573a	Merge "CommentParser: Avoid using a dynamic undeclared property"	2022-03-10 00:22:16 +00:00
jenkins-bot	76478dda26	Merge "Move signatureScanLimit to a constant in JS"	2022-03-10 00:22:14 +00:00
Bartosz Dziewoński	4c29304484	CommentParser: Avoid using a dynamic undeclared property Change-Id: Iefa8dea83bc0d31b9c6b3509189eeaa652dd9ea0	2022-03-08 23:30:11 +00:00
Bartosz Dziewoński	08c79142fb	ImmutableRange: Add @property annotations for magic props Phan can analyze them now and reports some issues with types. * Add some assertions on types where we're sure that we're using an Element or non-null, but Phan can't prove it * Fix incorrect type hints on getFullyCoveredSiblings() and getCoveredSiblings(), luckily it was harmless Change-Id: I8cc12450378efa7434c4d66882378b715edd4a70	2022-03-08 23:29:40 +00:00
Bartosz Dziewoński	eb1fe7a8fb	CommentParser: Fix redundant uses of getHeadlineNodeAndOffset() We call CommentUtils::getHeadlineNodeAndOffset() before constructing the HeadingItem in CommentParser, so the range's startContainer is always the headline node. Change-Id: I2afb6ba9100e785cd91f31d82f4cea59fa8b5443	2022-03-08 23:29:34 +00:00
Bartosz Dziewoński	584f6a020c	Use `tagName` rather than `nodeName` when we know the node is an element `tagName` is only defined on Element, and it returns its tag name. `nodeName` is defined on Node, and it returns the tag name for Elements, and a string like '#text' or '#document-fragment' for other types. We were using both, which made it harder to reason about what types we're dealing with. Change-Id: I8e621e5872bdf78c84ec553cfbfcdbf0192f0589	2022-03-08 23:29:05 +00:00
Bartosz Dziewoński	063174e71c	Use `instanceof` for checking for text/element nodes in PHP It is friendlier for static analysis tools like Phan, which can't infer anything from the `->nodeType === …` checks, and we were already using it in most places. Fix newly revealed Phan failures (and one unneeded suppression). Change-Id: Id789f05e16a210f7ba22ca7514587c392fac0741	2022-03-08 23:28:39 +00:00
jenkins-bot	542da89530	Merge "Don't detect comments within references"	2022-02-28 16:47:21 +00:00
Bartosz Dziewoński	8a2715bdd5	Move signatureScanLimit to a constant in JS Change-Id: Ieb60c148fd060ab62e4a493e2d0dff6c051f945c	2022-02-21 22:42:14 +01:00
Bartosz Dziewoński	4244418e56	Don't detect comments within references Bug: T301213 Change-Id: Ifd5198651c8ed0ce53379fb5e35938089cd54a09	2022-02-21 19:57:44 +00:00
jenkins-bot	1a48f8cd7e	Merge "CommentParser: Inject a forgotten service"	2022-02-21 19:30:55 +00:00
Bartosz Dziewoński	85165543f4	CommentParser: Inject a forgotten service Also sort alphabetically. Change-Id: I9e77c4aa1fba930f382e3c4f17ac0504c2f06668	2022-02-21 20:15:54 +01:00
Bartosz Dziewoński	aea36bab3a	CommentParser: Fix a small use of global state Also, in ThreadItem::getSinglePageTransclusionTitle(), we don't need this terribly complicated method. Change-Id: If02c09aaa2f4dd66b2bc253a1edec4ea107564ee	2022-02-21 18:15:31 +00:00
Bartosz Dziewoński	8e44b43df0	Split off ThreadItemSet from CommentParser Goal: ----- Finishing the work from Iadb7757debe000025e52770ca51ebcf24ca8ee66 by changing CommentParser::parse() to return a data object, instead of the whole parser. Changes: -------- ThreadItemSet.php: ThreadItemSet.js: * New data class to access the results of parsing a discussion. Most methods and properties are moved from CommentParser with no changes. CommentParser.php: Parser.js: * parse() returns a new ThreadItemSet. * Remove methods moved to ThreadItemSet. * Placeholder headings are generated slightly differently, as we process things in a different order. * Grouping threads and computing IDs/names is no longer lazy. We always needed IDs/names anyway. * computeId() explicitly uses a ThreadItemSet to check the existing IDs when de-duplicating. controller.js: * Move the code for turning some nodes annotated by CommentFormatter into a ThreadItemSet (previously a Parser) from controller#init to ThreadItemSet.static.newFromAnnotatedNodes, and rewrite it to handle assigning parents/replies and recalculating legacy IDs more nicely. * mw.dt.pageThreads is now a ThreadItemSet. Change-Id: I49bfe019aa460651447fd383f73eafa9d7180a92	2022-02-21 16:22:32 +00:00
Bartosz Dziewoński	4613ae78e7	Change CommentParser into a service Goal: ----- To have a method like CommentParser::parse(), which just takes a node to parse and a title and returns plain data, so that we don't need to keep track of the config to construct a CommentParser object (the required config like content language is provided by services) and we don't need to keep that object around after parsing. Changes: -------- CommentParser.php: * …is now a service. Constructor only takes services as arguments. The node and title are passed to a new parse() method. * parse() should return plain data, but I split this part to a separate patch for ease of review: I49bfe019aa460651447fd383f73eafa9d7180a92. * CommentParser still cheats and accesses global state in a few places, e.g. calling Title::makeTitleSafe or CommentUtils::getTitleFromUrl, so we can't turn its tests into true unit tests. This work is left for future commits. LanguageData.php: * …is now a service, instead of a static class. Parser.js: * …is not a real service, but it's changed to behave in a similar way. Constructor takes only the required config as argument, and node and title are instead passed to a new parse() method. CommentParserTest.php: parser.test.js: * Can be simplified, now that we don't need a useless node and title to test internal methods that don't use them. testUtils.js: * Can be simplified, now that we don't need to override internal ResourceLoader stuff just to change the parser config. Change-Id: Iadb7757debe000025e52770ca51ebcf24ca8ee66	2022-02-19 19:51:57 +01:00
Bartosz Dziewoński	f51f3a1051	CommentParser: Remove unused method getThreadItemsByName() Follow-up to `a5099739a6`. Change-Id: I53cbf6a7a2c9b95674998734689b3930dfe74149	2022-02-19 19:51:57 +01:00
Bartosz Dziewoński	99b5de8038	Split Data class into ResourceLoaderData and LanguageData The Data class contained utilities for two unrelated purposes. Split each half to a separate class. Notably, this improves the signature of the getLocalData() function. Change-Id: Icde615fb9d483fee1f352c34909b37f8ffde8081	2022-02-19 19:37:34 +01:00
Bartosz Dziewoński	ae9f26a9e5	Various code quality tweaks (suggested by PhpStorm) composer.json: * Document required PHP extensions Parser.js: * Remove incorrect param documentation * Fix some typos in comments (missing parentheses) CommentParser.php: * Fix some typos in comments (missing parentheses) ImmutableRange.php: * Remove unused property * Add a `throw` to indicate that code path is unreachable SubscribedNewCommentPresentationModel.php: * Add missing `return false` CommentParserTest.php: * Remove unnecessary pass-by-reference CommentModifierTest.php: * Remove unused variable CommentParserTest.php: * Don't construct Element objects directly. PHP's DOMElement allows it, but Parsoid/Dodo's doesn't, and we use the latter for static analysis. This generates all kinds of confusing warnings. Change-Id: Ia9598ebea0e99830dd485296e94a9d96acc4b258	2022-02-19 19:36:52 +01:00
Bartosz Dziewoński	13ab1db6da	Don't count leading/trailing whitespace against signature scan limit It's an arbitrary limit, it seems harmless to relax it to support the use case in the task, even if it's weird. Bug: T300949 Change-Id: I7c895c7019726758bbae3183b9c3ecbd9eabcf38	2022-02-04 19:35:29 +00:00
Ed Sanders	0b42aea276	CommentParser: Cache variables in getUsernameFromLink Change-Id: I625e6ded3badd75a7a658c8d000576d0d165a18b	2022-02-04 19:35:18 +00:00
Ed Sanders	8ad1df7dc8	CommentParser: Name parts of return value from findSignature Change-Id: I3a5ad36df0afdedc0aa9a15e5d83c5426b03b790	2022-02-04 19:34:18 +00:00
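
Commit 6a59149132 above describes ignoring LRM and RLM marks after every token of a timestamp rather than only before the timezone indicator. The following is a minimal sketch of that idea, not the extension's actual code: the helper name `buildTimestampRegexp` and the token patterns are hypothetical, chosen only to illustrate how an optional direction mark can be allowed after each token.

```javascript
// Hypothetical illustration, not taken from DiscussionTools.
// U+200E (LRM) and U+200F (RLM) are invisible direction marks that can end
// up inside a rendered timestamp; allow at most one of them after each token.
const DIRECTION_MARK = '[\\u200E\\u200F]?';

// Joins regexp source fragments (one per timestamp token) into a single
// regexp, appending the optional direction mark after every fragment.
function buildTimestampRegexp( tokenPatterns ) {
	const source = tokenPatterns
		.map( ( pattern ) => pattern + DIRECTION_MARK )
		.join( '' );
	return new RegExp( source );
}

// Assumed token list for an English-style "HH:MM, D Month YYYY (UTC)" format.
const timestampRegexp = buildTimestampRegexp( [
	'(\\d{2}):',
	'(\\d{2})',
	', ',
	'(\\d{1,2})',
	' ',
	'([A-Za-z]+)',
	' ',
	'(\\d{4})',
	' ',
	'\\((UTC)\\)'
] );

// A plain timestamp and one with stray marks after the time and before the
// timezone are both accepted by the same regexp.
const plain = '12:34, 5 May 2022 (UTC)';
const withMarks = '12:34\u200E, 5 May 2022 \u200F(UTC)';
console.log( timestampRegexp.test( plain ), timestampRegexp.test( withMarks ) );
```

As the commit message itself notes, permitting the marks after every token is more permissive than strictly necessary; the sketch accepts that trade-off in the same way.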

144 commits