wikimedia/mediawiki-extensions-DiscussionTools

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/DiscussionTools synced 2024-12-12 08:35:37 +00:00

Author	SHA1	Message	Date
Bartosz Dziewoński	ba8434e2e0	Add legacy IDs as of wmf/1.36.0-wmf.14 Bug: T264478 Change-Id: I099e1068fedc25d671cd1245ac8b32941dca7232	2020-10-22 22:52:59 +02:00
Bartosz Dziewoński	0ddc171c8a	Add oldest timestamp in the thread to heading IDs To avoid old threads re-appearing on popular pages when someone uses a vague title (e.g. dozens of threads titled "question" on [[Wikipedia:Help desk]]: https://w.wiki/fbN), include the oldest timestamp in the thread (i.e. date the thread was started) in the heading ID. Bug: T264478 Change-Id: If918bfd5e025248923d1939bc86916697ead95a0	2020-10-22 02:19:21 +02:00
Bartosz Dziewoński	b09bbfe668	Disambiguate comments by parent ID, rather than sequential numbers Sequential numbers aren't great because they change when an earlier comment is archived. Parent comment/heading IDs should change less often. This also makes much more sense for disambiguating subsections, e.g. a dozen identical ===Votes=== sections for a dozen proposals. Bug: T264478 Change-Id: I466454984fd919ebef35f2b37ddb5d86dc842996	2020-10-22 02:19:21 +02:00
Bartosz Dziewoński	3137d76f40	Connect sub-threads to their parent threads Our threads now also contain all replies to their sub-threads. This is similar to how sections work in MediaWiki, where the parent section also contains the content of all the lower-level sections. We're going to need this for notifications about replies in a thread. Bug: T264478 Change-Id: I241fc58e2088a7555942824b0f184ed21e3a8b6f	2020-10-22 02:05:02 +02:00
Bartosz Dziewoński	9ee0fd69f5	Allow headings to have IDs Previously, only comments could have IDs, because we only needed IDs for replying. But we might also use them for notifications soon. Bug: T264478 Change-Id: I1bcad02bf17ab54bc5028a959543c10f0430836b	2020-10-22 02:04:28 +02:00
Bartosz Dziewoński	6719d17364	Handle cached "legacy" IDs (and other JSON-serialized data) The output of CommentFormatter::addReplyLinks() and consequently ThreadItem::jsonSerialize() can end up in the HTTP cache (Varnish) on Wikimedia wikis. We need to consider that when changing that code. Introduce a concept of legacy ID (generated by the older algorithm after it changes), add some placeholder code that will generate them in the future, and update some code to find comments by either normal or legacy IDs. Add dire comments in a bunch of places (as if that ever helps). Bug: T264478 Change-Id: I4368f366800ab21b8b184b09378037614fdecd33	2020-10-22 00:53:06 +02:00
Bartosz Dziewoński	3b8d63467e	CommentParser: Remove confused comments about references and objects "This modifies the original objects…" – I feel like this is obvious now, but maybe it wasn't so obvious when this code was structured differently before `a2431fe006`. Also, it refers to a variable that doesn't exist. "FIXME this will clone the reply…" – No, actually, it will not. It would if replies were associative arrays, but they are objects, and have always been, ever since the PHP parser was merged in `7b7a2cd69c`. Maybe they were arrays once in Roan's mind before he pushed that for review. Change-Id: I1348e111699fdbde99cd1f9ef45d8f465f7391b0	2020-10-21 21:01:27 +02:00
Bartosz Dziewoński	88b5be11fd	CommentParser: Avoid unnecessary reference in foreach() This is not necessary, and never has been. This variable contains an object and it's never assigned to. Instead, the reference creates hard-to-debug bugs (I've just spent an hour debugging one). When the variable name is reused later in the same function as the loop variable of another foreach() loop (such as in If918bfd5e0), the result is overwriting of the last entry in $this->threadItems with the last entry from the other array. I was questioning everything I know about variables until I noticed. Change-Id: Ibb57f915b39dd4d6d2e744903f9ecadd67b1f52d	2020-10-15 17:58:06 +02:00
Bartosz Dziewoński	ed17f640b6	Ignore other empty-ish things at the beginning of comments Follow-up to `432a959436`. Bug: T264116 Change-Id: I0685cafab70c7e9d22f504f1a1309c9a28d6f2e1	2020-09-30 23:42:47 +02:00
Bartosz Dziewoński	17b7a481a2	Fix detecting username from the wrong links sometimes When a timestamp directly followed a `<div>…</div>` tag (or perhaps some other wrapper containing lots of content), we would detect the username from the earliest links in the wrapper (furthest from the timestamp), rather than the latest links (closest to the timestamp). Bug: T262573 Change-Id: Id16449a86a731b13dc79846bb30ecf6554e26f1d	2020-09-29 22:31:24 +02:00
Bartosz Dziewoński	432a959436	Ignore empty paragraphs at the beginning of comments The wikitext parser outputs `<p><br></p>` for empty paragraphs, so we need to ignore `<br>` tags when searching for an "interesting" node that marks the beginning of a comment. Otherwise the empty paragraphs mess up the detection of indentation levels. Bug: T264116 Change-Id: I84a97ab577baa7336b78935ccdc48041ecfc231a	2020-09-29 22:22:35 +02:00
Bartosz Dziewoński	329df8c953	Parsing discussions converted to language variants * Export parser data (date format, digits, timezone names, and messages for weekday/month names) converted to language variants * Update the parsers to try matching using every variant, in case the page is displayed in non-default variant (and to avoid problems with incomplete variant conversion) Bug: T259818 Change-Id: I04d73992cd31ce06fa79f87df0c0a53d7efc3c58	2020-09-16 22:07:07 +00:00
Ed Sanders	92a8ca3469	Documentation fix Change-Id: Ic37bb713bed8af7390a3e8be7ea0203b4687ce0e	2020-09-15 01:38:54 +02:00
Bartosz Dziewoński	14fb013515	Match handling of "signature scan limit" between JS and PHP PHP was counting UTF-8 bytes, JS was counting UTF-16 bytes. Both should have been counting codepoints (although it doesn't really matter as long as they both count the same things). I noticed the issue after adding some tests using the Cyrillic script, when one case had different results in PHP and JS: Id25b537fecd789640c209ff7f30e777455a3aece. Change-Id: Ic31240678f71ba48e6ec202126bf490cea12bb66	2020-09-08 03:27:01 +02:00
Bartosz Dziewoński	2d3fe47ac1	Fix parsing localised digits in PHP discussion parser The PHP code incorrectly assumed that the digits are single-byte in UTF-8, which is never the case (except for 0-9). The JS code worked correctly because it uses UTF-16 strings, so the bug would only affect non-BMP digits there. This was noted in a TODO comment, but we overlooked it when reimplementing in PHP. Instead of a string of 10 characters, use an array of 10 single-character strings. Bug: T261706 Change-Id: Ic5421382474c88f003424799c53ff473d99cce92	2020-09-01 01:50:33 +02:00
Bartosz Dziewoński	e36dc8e78a	Skip to the end of the paragraph in the parser, not modifier When a comment ended before the end of a paragraph, the next comment would begin right there in the middle of the paragraph. This could result in the detected indentation level of that comment being incorrect, and replies being inserted in wrong places, as seen in the 'signatures-funny' test case. The code moved to the parser was previously repeated twice in addListItem() and addReplyLink(), which should have been a hint that something isn't quite right. Also, fix the code guarding against overlapping signatures, now that signatures may not be at the end of a comment. Bug: T260855 Change-Id: Ic26a87642f8a15d5de2f7073d4d8176b299c7f94	2020-08-20 19:35:55 +00:00
Bartosz Dziewoński	84cb9d1dca	parser: Code quality tweaks Do things in a more intuitive order, avoid some repetition, rename a vaguely named variable. Change-Id: Ic1a0bb54134682eaf126231e04eb67847d6a5da6	2020-08-20 20:52:42 +02:00
Bartosz Dziewoński	375bfe028e	parser: Fix comment ranges when timestamp has entities Previously, parser would output offsets that don't exist in their containers, because we were pretending that entities are parts of their neighboring text nodes. Turns out it's much easier to do it right when going backwards. Change-Id: I9bccca2d403f1a976ae517449989170cdd99721e	2020-08-11 20:41:06 +02:00
Bartosz Dziewoński	7cd370615f	Reindent CommentParser::findTimestamp() Something terrible has happened to this function… It seems that I have brutalized it when rebasing `092cfd6075`. Change-Id: I12d75c69d15645112563a7bc345209b23b54cb3e	2020-08-11 06:45:45 +02:00
Bartosz Dziewoński	31b26a5bec	Fix indentation level when replying to comments with mixed indentation When adding a reply, we take a node at the end of the previous comment, compare that comment's indentation level to the expected indentation level of the reply, and add (or remove) that number of wrapper lists. The existing code did not consider that comments may have lists within them, and so the indentation of that node may not match the indentation of the comment. Bug: T252702 Change-Id: Icc5ff19783d2b213bff99f283cb0599a8b5c1ab4	2020-08-06 01:25:33 +02:00
Ed Sanders	a2431fe006	Refactor CommentParser * Pass rootNode to the constructor * Rename getters to match CommentItem/HeadingItem/ThreadItem value classes. * Always build the thread tree so CommentItems always have an ID and replies/parent. Change-Id: I508be9534de59016ff806e3d84edcbb1c76cb0c6	2020-07-20 23:38:10 +01:00
Ed Sanders	a4636d39fc	Move #getTranscludedFrom from parser to ThreadItem Also requires moving getTitleFromUrl to CommentUtils Change-Id: I9cb83a3fdd456eba66899433b866ce7a7f00eeb5	2020-07-20 15:56:48 +01:00
Ed Sanders	7ae5bbf384	Move #getAuthors from parser to ThreadItem Change-Id: I16e513000e5366b3044b17a99da07d8d0f47a61f	2020-07-20 15:13:59 +01:00
Ed Sanders	b32f991913	Documentation fixes Change-Id: I2c7ccecbf8a50bd4d658b0f17f4a21fe90a3c399	2020-07-20 13:34:08 +01:00
Ed Sanders	092cfd6075	Parser: Replace findTimestamps with findTimestamp Instead of doing a separate tree walk and finding all timestamps separately, make it part of the getComments tree walk, and find timestamps one at a time. Change-Id: I47f466eaf228504faa189fd99e07493bc7f022cd	2020-07-15 21:34:22 +02:00
Bartosz Dziewoński	308c2747b0	CommentParser.php: Use tree walking instead of XPath This is similar to what the JS version does. The TreeWalker and NodeFilter classes are adapted from https://github.com/Krinkle/dom-TreeWalker-polyfill (MIT license). This makes #getComments twice as fast on en-big-oldparser.html Change-Id: I2441f33e6e7bad753ac830d277e6a2e81ee8c93d	2020-07-15 16:40:50 +00:00
Ed Sanders	ed70d49285	CommentParser.php: Fix URL parsing Change-Id: I406fd98b308dd4d975ea974f2369737a7052b556	2020-07-01 17:06:02 +01:00
Ed Sanders	d5376e28fc	Improve ThreadItem documentation Change-Id: Ia266fc22b02af0edbb32f356b4e0d92fe3a4da5f	2020-06-26 14:56:19 +02:00
James D. Forrester	d6c3df31f5	Remove various phan suppressions and fix issues Change-Id: I73b535f284566a0a8876a3198b9784b47567fac6	2020-06-12 20:35:59 +01:00
Ed Sanders	7be0cc3209	Create ThreadItem classes Change-Id: Id2c5324d74eccb1209ccb76768c557722c6d9400	2020-06-12 20:35:59 +01:00
Umherirrender	48e860916a	build: Add mediawiki/mediawiki-phan-config Replace phan-taint-check-plugin by phan, it is now included Change-Id: I0e682a83afd30faa8967e3c586431be4ae9a29b3	2020-06-10 22:21:07 +02:00
Ed Sanders	da433037a3	Move getTranscludedFromElement to Utils Change-Id: I8bdd757f949c013ba426150a192d71243fadf45d	2020-06-01 22:32:23 +01:00
Bartosz Dziewoński	79ae8a32c5	Support parsing when timestamp is wrapped in a link Bug: T252059 Change-Id: Ib8952fb80503bad407e8d0fe725103a0fae12a6a	2020-05-27 22:47:17 +02:00
Bartosz Dziewoński	01b4a8f4f4	Support replying when timestamp is template-generated * Move modifier#getFullyCoveredWrapper to utils * Use that method to find the node where we start searching for template wrappers, rather than using endContainer Bug: T252058 Change-Id: I55de58102f3468fce01290bd413a7fdc96d322d6	2020-05-27 21:16:03 +02:00
Ed Sanders	b3ca37c1c5	Create ImmutableRange class in PHP TODO: Create one in JS as well Change-Id: I6c9dc2455afcb8d0b68674a2985c5e43dd94b6fb	2020-05-22 15:01:09 +01:00
Bartosz Dziewoński	515af82061	Reduce duplication between PHP parser and data gen for JS parser Also, make the handling of TranslateNumerals and digitsRegexp the same between PHP and JS. Change-Id: I1d81343d0b59ab3ecd59ba1c2ad99a729d983ac4	2020-05-19 20:54:44 +02:00
jenkins-bot	366aca2ccd	Merge "Stop printing console warnings"	2020-05-18 22:52:30 +00:00
Bartosz Dziewoński	219339551c	Stop printing console warnings It was useful when I was debugging those parts of the code, but now it's usually annoying. The warnings can still sometimes be useful for understanding how the tool parses some discussion, though. To keep that functionality, add displaying warnings for each comment in the debug mode. Change-Id: I2d218a8a394f179bcc0990ff988a0567c275ccf2	2020-05-18 23:37:37 +02:00
Ed Sanders	607440498e	Spell check pass Change-Id: Ia20da358297126bd52a968bd77c960f81fe82b8f	2020-05-18 21:24:14 +00:00
Bartosz Dziewoński	c848d8a90e	Parser tweaks Follow-up to Ic1438d516e223db462cb227f6668e856672f538c. Minor corrections and comment improvements in PHP parser, and "backporting" some changes to JS parser that I like. Change-Id: I5e54121914ec6b323e556dd133bcb71b3aefbb61	2020-05-18 19:53:26 +00:00
Ed Sanders	b1427163af	Parser.php: Add tests for getTranscludedFrom Requires an implementation of unwrapParsoidSections Change-Id: I96c929b1117ba652dbd5af6a1ee37a5f9e87ed1e	2020-05-18 19:53:01 +00:00
Ed Sanders	bc437fc43f	PHP: More missing typehints Change-Id: I483c9e70b65dcd685436b4099bcfc4925c65b002	2020-05-16 16:46:25 +01:00
jenkins-bot	9cc665a8bc	Merge "Add missing use MWException"	2020-05-15 21:46:21 +00:00
jenkins-bot	a6fcb965ea	Merge "Fix return type of callable"	2020-05-15 21:46:20 +00:00
Reedy	ac6cd26ca0	Don't call non static functions statically Change-Id: I2db66a8da3ab325f2bbabb37afd276d4a62077e9	2020-05-15 22:02:57 +01:00
Reedy	70c5a1e435	Fix return type of callable Change-Id: I7e594a9f9f6f9d4737fd880e449c43b9b2cf24fb	2020-05-15 22:01:21 +01:00
Reedy	c3a7ba1d13	Add missing use MWException Change-Id: I4d00106718c0f7e32060d23aaa2bc8c74a4d6d1f	2020-05-15 22:00:30 +01:00
Ed Sanders	e6e0b1ead9	PHP: Add missing typehints Change-Id: I5639f8cbdae9aaa9cfa06136e19cc94f9fad10ea	2020-05-15 22:04:47 +02:00
Ed Sanders	b78fb3f4c1	Move all PHP to the MediaWiki\Extension\DiscussionTools namespace Change-Id: I654ebb3e646a6d8d62f7bd14d48805e39f836d7e	2020-05-15 21:57:13 +02:00
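
The length mismatch described in 14fb013515 and the digit-splitting issue described in 2d3fe47ac1 can be sketched in JavaScript as follows. This is only an illustration under stated assumptions, not code from the extension: every function name and sample string below is hypothetical.

```javascript
// Illustrative sketch only; these helpers are hypothetical and are not
// part of DiscussionTools.

// Length in UTF-8 bytes (what a byte-oriented count sees).
function utf8ByteLength(text) {
	return new TextEncoder().encode(text).length;
}

// Length in UTF-16 code units (what String.prototype.length reports).
function utf16UnitLength(text) {
	return text.length;
}

// Length in Unicode code points (string iteration goes by code point).
function codePointLength(text) {
	return Array.from(text).length;
}

// Split a digits string into one array entry per digit. Unlike indexing
// with digits[i], this stays correct for digits outside the BMP.
function splitDigits(digits) {
	return Array.from(digits);
}

// Six Cyrillic letters: two bytes each in UTF-8, one code unit each in UTF-16.
const cyrillic = 'Привет';

// Mathematical bold digits U+1D7CE..U+1D7D7: non-BMP, so each digit takes
// two UTF-16 code units and four UTF-8 bytes.
const boldDigits = String.fromCodePoint(
	...Array.from({ length: 10 }, (_, i) => 0x1D7CE + i)
);

console.log(utf8ByteLength(cyrillic), utf16UnitLength(cyrillic), codePointLength(cyrillic));
console.log(utf8ByteLength(boldDigits), utf16UnitLength(boldDigits), splitDigits(boldDigits).length);
```

For the Cyrillic sample the three counts are 12, 6 and 6, so a limit expressed as a plain number cuts at different places depending on the unit. For the non-BMP digits the counts are 40, 20 and 10, which is why an array of ten single-character strings is safer than indexing into one string.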


149 commits