mediawiki-extensions-Cite

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/Cite synced 2024-12-12 15:15:12 +00:00

Author	SHA1	Message	Date
Subramanya Sastry	965203b301	Process <ref>s found in nodes with mw:ExpandedAttrs typeof * This is an instance of a bigger issue that we need to look closely when we are integrating Parsoid with core parser. Bug: T235656 Change-Id: I3d652727293461c7968e83be8994ba0572bae8e4	2019-10-29 21:52:09 +00:00
Subramanya Sastry	79183b596d	Cite wt->html: Match html->wt and make datamw->body an object not an array * Though this doesn't immediately affect anything, it just makes usage a bit more consistent. * A followup patch that fixes gaps in shiftDSR code will now be able to reference the html property as dmw->body->html to match html2wt usage. Change-Id: I9dfcd9d40205f6e64e139bf3f75a322915af3232	2019-10-15 19:43:11 -05:00
Subramanya Sastry	2be3ab72c6	References.php: Use strlen not mb_strlen to count page length * When a page is missing an explicit <references /> tag, we insert an implicit <references /> tag and assign it a zero-width DSR with a starting offset equal to the length of the page. However, now that we have byte offsets, that should have been strlen, not mb_strlen. This was causing incorrect DSR assignment on this implicit tag and causing trailing newline selser diffs on these pages. * Debugged on this reduced test case: "* a – b <ref>x</ref>\n\nc\n\n" and comparing selser trace and then DSR offsets on the DOMs. Change-Id: I8aebf307197935259df78251fb4a26c593f29603	2019-10-03 23:14:33 -05:00
Subramanya Sastry	167a28bbea	Use PHPUtils::lastItem() over end() in more places Tim Starling has indicated in a couple of different places that end(..) is not preferred and he had implemented a private version of lastItem in the PEG grammar code whereas PHPUtils::lastItem was recommending use of end(..). In this patch, I moved the implementation from the grammar to PHPUtils and replaced end(..) with PHPUtils::lastItem in a number of places in the codebase. We should discuss whether we want to use this helper everywhere. Resolved a couple of PORT-FIXMEs in the bargain. Change-Id: I837f2a98003df8ab7dbdf9af045e17bdd6e27799	2019-10-03 03:41:39 +00:00
Arlo Breault	6d0c6201dd	Resolve some PORT-FIXMEs around Selser construction And rename Selser -> SelserData Change-Id: Ia6a23f4194d4c05b7269498bfbbd31e236c86ce6	2019-09-19 17:53:57 -04:00
Subramanya Sastry	23b666ad14	Unconditionally add D modifier to regexps ending in $ check * We could potentially also exclude regexps for node name checks * A few of the additions look like they could have caused subtle failures in edge cases. * Unrelated changes: Used # as the regexp delimiter in a number of regexps to avoid escaping the / character. Bug: T231980 Change-Id: Ie2451349684c248d93e064e3e7009d0d2d60acf3	2019-09-09 21:43:58 +00:00
Arlo Breault	e6204a1561	Test against ref name length instead of coercing to bool Since "0" is falsy in PHP. A couple of tests now pass. Change-Id: I9b62b9f78680de6e1d5c31723af7212a58a535f3	2019-08-14 18:59:28 -04:00
Arlo Breault	dc7d19a1a8	Avoid normalizing fragment being passed to newFromText Matches what we're doing on the JS side. Change-Id: I93a0770b84e496ddf3290a36fa6b8073919ed183	2019-08-14 22:23:53 +00:00
Arlo Breault	b202964b5a	Fix dropping nested refs The extension handler expects a `null` for this, `false` is an indication not to use the toDOM handler. php bin/parserTests.php tests/citeParserTests.txt --filter "CircularRef" --wt2html Change-Id: I849a9aca1133f8a793c9d77e05f192a6af5d78f9	2019-08-09 16:48:03 -04:00
Tim Starling	b002802b4b	Workaround for PHP bug involving constant arrays cast to objects Introduce PHPUtils::arrayToObject() which duplicates the array before converting it to an object. Workaround for https://bugs.php.net/bug.php?id=78379 Bug: T228346 Change-Id: If9ef35e9e5183117025bc9cd705b695f270aa244	2019-08-07 02:33:44 +00:00
Subramanya Sastry	c7e6c2d0f8	Bug fixes in Cite.php + Parsoid Extension API Two classic PHP errors - string '0' is a falsy value in if conditions and so we need an explicit === '' check for empty strings - arrays need to be passed by reference to capture modifications in callers. Change-Id: I07d0e39c44a923ac1faeb2de01433e951c3de914	2019-07-29 13:24:17 -05:00
Subramanya Sastry	13cbce41c5	Assorted fixes for problems found from parserTest runs Change-Id: I75b158df00f54ff163455c630b51f2c00af24888	2019-07-18 09:01:46 -05:00
Subramanya Sastry	8757e89e21	Assorted fixes after running PHP parser tests in different modes * Fixes crashers, notices emitted during parser test runs. Change-Id: I0e337f22594f6cd36a7dff21afaa7a9dc9c862cf	2019-07-15 17:00:45 -05:00
Subramanya Sastry	d76ac30440	Followup to `0f0d6e0e`: Fix sig & return value to match ext-api changes * The new PS of `0f0d6e0e` missed this. Change-Id: I731bd85dc3ae4522d64c87085f49330691a10e36	2019-07-12 17:32:17 -05:00
Subramanya Sastry	0f0d6e0ed7	Followup to `7b6839ac`: Fix crashers during serialization * Don't unconditionally run fromHTML and before handlers without checking if we have a native extension handler. * Remove unintentional implementation of `before` in Ref.php * Hybrid tests for Cite now pass again. * Discovered while running native parser tests and isolated those crashers to this. Change-Id: I45b48b595a5aee2b8b8d00b4ebcf73a5ea7bc8a3	2019-07-12 13:48:50 -04:00
C. Scott Ananian	7b6839ac59	Remove SerialHandler interface in favor of default methods on ExtensionTag Change-Id: I245c4b9393720982654d5f4e944329c9d764e04e	2019-07-11 18:51:55 +00:00
C. Scott Ananian	7e444d6364	All extensions implement Extension; tags all implement ExtensionTag Restructure ExtensionTag as an abstract class with default do-nothing implementations of all methods. So instead of Translate and LST not implementing ExtensionTag::toDOM, they inherit the default implementation which returns false, and that has the same effect. The intention is to move SerialHandler::fromHTML and SerialHandler::before into this framework as well. Every "optional" method should have a default implementation in the base class which returns false. Change-Id: I0ad5c714601c0cf0b5189d4d282c67c6b53fc760	2019-07-11 18:46:14 +00:00
C. Scott Ananian	b84b71af22	Gallery: shift TSRs in the DOM, rather than fibbing about srcOffset Passing srcOffsets which don't actually correspond to actual regions of the source wikitext cause problems in the token offset conversion code. Instead, parse the wikitext as itself, then adjust the TSRs in the DOM tree. Since Gallery isn't ported to PHP (yet), update the automatically-generated Gallery/index.php. The newly-added ContentUtils::shiftDSR() was ported, however. Change-Id: I28f3d3398930733ae2bcf9759e49c45f93bc7190	2019-06-28 14:10:16 +00:00
C. Scott Ananian	c790d125de	Convert `dsr` properties to DomSourceRange instances in PHP port Change-Id: I7795cedf14e6ff56a31eeaba0a32c3c5c3166f08	2019-06-27 15:35:05 -04:00
Subramanya Sastry	720f1db084	Fix exception handling: Don't catch exceptions and suppress them * Now that we are in sync land, we don't need to catch exceptions and log error messages at multiple places. Let them bubble up to the top. * I noticed this was actually getting in the way of debugging because with $env->log unimplemented, I was only getting very generic failures instead of the root cause that was being suppressed and unlogged. * There are still a couple of places where we have generic Exception catching in place where it does make sense currently. For example, we aren't interested in what caused a templatedata fetch to fail. We simply fall back to regular serialization - the rationale here is that it is better to emit a transclusion without the preferred formatting (but syntactically correct) instead of losing the edit altogether. * Minor unrelated fix in Cite/Ref.php: Use !isset() instead of empty() Change-Id: Iebff6f37dcd8278185c4a74b72a99b528efa20ff	2019-06-26 15:50:49 -05:00
Subramanya Sastry	bc72a99fb2	Minor fixes to Cite port Change-Id: Iccd6823c572059948e5ad1a7c91d567d39494934	2019-06-26 11:59:48 -05:00
Subramanya Sastry	3f0b81b085	Followup to 31d356a5, `005176a3` and assorted fixes * Source offset fixes: followup to 31d356a5 - there were instances of $tsr[0] and $tsr[1] that hadn't been converted over to $tsr->start, $tsr->end - removed dead code * Cite fixes: followup to `005176a3` - Fixes array / object mixups - Bug fix * html2wt/WikiLinkHandler fixes - Protect access to missing properties in data-mw opt list * Other assorted fixes - Added missing typehints and improved doc types - Simplified some code patterns - Cast extension attributes to object since that ends up in data-mw which is a stdclass object. Change-Id: Idd04b0d3819be3660823047a90330fd1213388cf	2019-06-24 16:56:56 +00:00
Pavel Astakhov	005176a355	Port Cite extension * All wt2wt, html2wt, and all but one html2html tests pass in hybrid mode when entire html2wt code is run in PHP Set "Serializer: true" in the html2wt section of phpconfig.yaml * The single failing html2html test is a <gallery> test which is presumably related to the unported <gallery> extension code, but not sure. Not investigating it now. * Update Parsoid Extension API to provide access to extension source without exposing internals. Change-Id: I6d6e21ad2324acfc4306b32c9055d6c088708c48	2019-06-21 16:23:42 -05:00
C. Scott Ananian	4e334fa727	Fix an incorrectly capitalized typeOf in automatically-generated code Follow up to `04efa43c4c`. Change-Id: I12fa70f002ba65b5a5835ef65a557b6c39782f51	2019-06-20 18:54:13 -04:00
Subramanya Sastry	04efa43c4c	s/typeOf/typeof in various files - git grep is a wonderful thing to help catch identical errors. - Saw this "bug" first in the cite extension port. - Turns out this is only a bug on the PHP side since the PHP DOM treats attributes in a case-sensitive manner but Domino.js treats attributes in a case-insensitive manner. - Better to use the correct attribute name everywhere. Change-Id: I3735dc768a10a820b4816c211aa72291df9b1413	2019-06-20 11:46:07 -05:00
Subramanya Sastry	30aaf6574c	DOMFragments: Use sealFragment instead of unwrapFragment * unwrapFragment had a somewhat unusual behavior which could be a source of bugs while reasoning with it. If undefined, its default value is true which is contrary to how we think of undefined. * Flipping the polarity of the flag to sealFragment makes the semantics easier to reason with and where !empty(..) applies more naturally to it. Change-Id: Ia50cba345f37e815e5f5f95abb452c8eefcf9011	2019-06-13 13:38:20 -05:00
C. Scott Ananian	320d045ee8	Update automatically-generated PHP files w/ latest js2php Mostly comment formatting improvements, some significant code changes to the JS side. Change-Id: I7a8f2105173df74dc09f2024d68268f5dc6fa632	2019-06-05 17:13:34 -04:00
Subramanya Sastry	f5c8aacc6e	Cite lint handler: Use nextSibling instead of nextNode * Not sure why nextNode was used in the first place, possibly some oversight? Change-Id: Ied76591947bb7505c59b9a11589a19b13dc58790	2019-05-28 17:30:10 +00:00
Arlo Breault	05cb13ddf9	Make extensions with post-processors return constructors This allows us to finish the cleanup started in 0b3bb10 and inline setupProcessors. Change-Id: Ia7840091607e9a75153031b5db7600d5a0018da6	2019-04-03 18:44:21 +00:00
Arlo Breault	20c627e3f4	Convert cite extension to es6 class structure Also, runs js2php on these files. Change-Id: Id8ee13ad536d75f63e0045a21fdfdb667a0df65d	2019-04-03 12:20:41 -04:00
C. Scott Ananian	eb70a83eb0	Audit uses of Node#getAttribute() + add missing file to PHP codebase In PHP, DOMNode#getAttribute() returns '' if the attribute is not present, not null. Audit our uses and try to either explicitly use `|| ''` (which will ensure that PHP behaves the same way as JS) or use `hasAttribute` to explicitly test for the presence of the attribute. Changes have also been ported to PHP from JS. Also added src/Wt2Html/PP/Processors/AddMediaInfo.php which was missing. Change-Id: Ie1ae1df88e4fca70daf97b6f720f28014ebc99ed	2019-03-15 15:48:20 +00:00
C. Scott Ananian	25385a06e8	Apply recent JS changes to automatically-generated PHP port This applies the JS changes from the following recently-merged patches: 6679c3bf Protect data-object-id attribute d4e76d5b Fix new linter category to enable code work with templates e567db8d Tweak storeDataAttribs to suppress DOM nodes in data-parsoid.tmp 16603953 Fix setting dsr on body for genTest 3a84a9dd Fix stashing data attributes for mw:StartTag 22c4a19a Remove redundant dataParsoid call ed7b0ba0 Fix crasher in newly added linter category 505a357b Linter.js: Add new function to detect the use of links in links 8885b20e Move redlink updating into lib/parse.js ccfce23d templatedepth is either an int or false 6d1571bd Move language conversion work into lib/parse.js 5a89c7de Avoid serialize/parse of data attributes when treebuilding 021d9958 Rename `document.env` to `document.bag` c03ba494 Use XMLSerializer on both PHP & JS side in the DOM pass test script e0c3cca9 Use env.createDocument in lib/api/apiUtils.js `550d3d71` Use a bag-on-the-side implementation for node data f8de8b25 Add bin/inspectTokenizer.js db704eea Add ability to splice a PHP transformer into the pipeline `a8be3ad6` Fix crasher in cite extension from accessing data after it's stored 2874f200 Simplify and clean up stops usage 6368265d Add some strategic isElt guards 5ae9553f DRY out transform test runners + tweak genTest to enable that b0f2adc6 Assert that the .dataobject isn't touched after storing attrs on a node 1ce6a98d Skip separators when looking for the next th/td Change-Id: I6a66ecb061e7ee7ed53feba1895dd315d9324715	2019-03-05 17:33:32 -05:00
Arlo Breault	550d3d71eb	Use a bag-on-the-side implementation for node data Centralizing where all the docs are created is also useful for T179082 Bug: T204608 Change-Id: I5f0f1d4e696794dc8666edcbf290dea790c06673	2019-02-21 15:13:32 +00:00
Arlo Breault	a8be3ad6e4	Fix crasher in cite extension from accessing data after it's stored Follow up to b0f2adc Change-Id: I840924ba1ba1b7af963d541c2cb6619543fbe69d	2019-02-19 13:05:10 -05:00
C. Scott Ananian	f2948cd170	Skeleton PHP files generated by automatic conversion from JS Change-Id: I93dbbdb474d37f88e0bab1d810b3dd51304055fd	2019-02-13 12:34:44 -05:00
C. Scott Ananian	11147b88f4	Minor JS fixes to make conversion to PHP better Change-Id: Ia36dab7c9a0a59df80c582b18cbc5a0a9bf8a36a	2019-02-04 13:39:16 -05:00
Arlo Breault	1855c046ac	Add media info in a post-processing pass The basic idea here is to generate the media structure in the token stream using a stuffed span with a redlink, as in T169975, and augmenting the nodes on the DOM once the media info has been fetched. A redlink is justified as the canonical representation of the media elements before info is fetched because it's the fallback if fetching fails and the media type is unknown until the info is retrieved. Most options are stored in data-mw until the media type is fetched and it's determined that they're applicable. This is a bit of a reversion of how things were done before where inapplicable options were removed post-facto. For consistency and styling's sake, figcaptions are now always added to block figures. The pass has to be run before generating heading anchors, since that depends on the text content (i.e. redlinks). This rearranges things in the post-processor and adds another pass. The post-processing pass to add media info is run on subpipelines as well as the top level so that the media info is present in cases where we embed HTML in data-mw (which is currently skipped by the top level only passes, except for the cite extension, which has special handling, see T214994) and to avoid an additional post-processing pass for the gallery extension, which scales media of packed galleries. This comes at the cost of making additional queries for each pipeline and requires the add media pass to be idempotent. Filed T214241 for figuring out what to do about data-mw info being clobbered by template annotations. The newly failing blacklisted tests are from roundtripping media options in galleries, which requires a general refactor for support. See the FIXMEs added there. Performance should be expected to regress by the amount of work we're able to overlap in the async phase of the pipeline while the media info is being fetched. Considering a lot of that work is caught up waiting for the batch to return (other async requests are found in the same batch), this doesn't turn out to be much in practice in the average case. Bug: T153080 Bug: T169975 Change-Id: I856ee962b70cef1f8d49652396ea5264e11a8ade	2019-02-01 13:30:07 -08:00
Subramanya Sastry	bf21cf0ce9	Init src/ with .js files copied over with .php extensions * This initialization lets us do a git log --follow and follow git history for that file across the js -> php port boundary. This works because git uses content hashes for objects and the copied code in the new .php file will have the same content hash as the .js file. * The following directories were skipped - ./lib/config/baseconfig - ./tests * The following JS files were skipped - ./lib/utils/promise.js - ./lib/config/wmf.sitematrix.json - ./tools/sync-baseconfig.js - ./tools/sync-parserTests.js - ./tools/fetch_ve_nowiki_edits.js - ./tools/fetch-parserTests.txt.js - ./tools/fetch-wmf-sitematrix.js - ./tools/compare.linter.results.js - ./tools/fetch-revision-data.js - ./tools/fetch-wt.js - ./tools/regression-testing.js - ./tools/build-langconv-fst.js - ./bin/server.js Change-Id: I0b22057c23b72795aebbd66e3abcb627c6858ef3	2019-01-09 11:59:29 -06:00
Subramanya Sastry	59b6621db3	extapi: Stop leaking manager (an impl. detail) to extensions * Extensions only need config info (env) and potentially abstract parsing context (frame). * Added some FIXMEs about potential future improvements. Change-Id: Ib4cec4a77ecb96c855798eeb06f7742c5efb0729	2019-01-08 00:21:56 +00:00
Arlo Breault	df161e78eb	Add helpers to ease binding context when load/storing data attribs Bug: T209772 Change-Id: Iebe68b179656538955c0c438806a1a724e7d185c	2018-12-18 12:39:04 -05:00
Subramanya Sastry	8a9908abbc	Remove .bind() usages in native extension code. Bug: T205336 Change-Id: I544205d7f32f755c0d0616af404a68ea40f4d2f5	2018-12-07 03:23:31 +00:00
Arlo Breault	c1b003553d	Move applying about ids from fragment wrappers to extension content It's only the extension content that needs the about ids. This matches b241ac6 where span wrapping was only applied to extension content, and goes hand-in-hand. The one necessitates the other. Fragment wrappers don't need abouts since they don't have siblings and are represented as a single node. When we get to unpacking, it's now clear that if the wrapper has an about id, it came from template encapsulation and would need to be applied to the top-level nodes in the fragment. This being true of all fragments, it lets us get rid of the `isForeignContent` option to `encapsulateExpansionHTML`, since that no longer serves to distinguish anything. Change-Id: Id9faae3d4e3c8771c2de1fb42ba62a7d92d76673	2018-12-06 23:35:07 +00:00
Subramanya Sastry	cee1086f35	Update HISTORY.md + bump version numbers to 0.10.0 for deb release Change-Id: I0be0c80a3f47ba3b1c2f94016bd9304de99388d4	2018-12-05 12:09:26 -05:00
Subramanya Sastry	7bee635f21	Split utils/DOMUtils.js into separate functional units * DOMUtils was perhaps the biggest kitchen sink utility class we had with different classes of utilities. * The split reflects a clear dependence hierarchy: DOMUtils -> DOMDataUtils -> WTUtils -> ContentUtils This also seems to reflect a bit in the use patterns. Content helpers are not used as much in html2wt. * DOMUtils now only has DOM utilities that are independent of Parsoid and could be useful in another project. * Moved a couple helpers into WTSUtils.js since they are very narrowly scoped to html2wt functionality and are unlikely to be ever useful elsewhere. * Moved diffing related utils to html2wt/DiffUtils.js - There is still a dependency in development / debug mode when doms have to be dumped. I couldn't think of an easy way of removing this dependency. - But otherwise, DiffUtils is scoped to html2wt use cases only. * One more circular dependency eliminated. * All tests pass. Bug: T208360 Bug: T205333 Change-Id: I522e8b5c7d726706f994386282476102fe35e91e	2018-12-03 14:57:40 -06:00
Arlo Breault	878f6c7937	Unnecessary passing along of "responsive" attribute These are already set on the first encapsulation node by toDOM. Change-Id: I1eefdaf94e28222fc44d0be9d017eeab2f70c592	2018-12-03 03:54:26 +00:00
Arlo Breault	26984c0294	Allow extensions to set more options to encapsulateExpansionHTML The nowiki extension will want to override `isForeignContent` so that about attributes are omitted on the contents. Change-Id: Iea28d7a66a90d516226ba9dce4518be6b996d18e	2018-11-29 20:28:54 -05:00
Subramanya Sastry	b4d7a158cd	Cite: Import Sanitizer from extapi Change-Id: Ic4cb95d2fbe678641b4496d38295ec8e06269d28	2018-11-21 20:24:48 +00:00
Arlo Breault	e644753c6b	Introduce a parseTokenContentsToDOM helper It calls out to a slimmed down parseWikitextToDOM, since only a subset of that functionality is needed in some extensions, like gallery. Change-Id: I72e29c4b285fc8cae0207966cc2eddd0fe3ebcf3	2018-11-19 12:20:43 -05:00
Subramanya Sastry	678effaf0e	Extension API: Add a html2wtPreprocessor DOM processor stub * This patch doesn't do any actual work beyond adding the framework in place for HTML preprocessing before actual serialization. Change-Id: I2505f5937657813480abef8ec5da9d765a953e29	2018-11-07 17:22:31 +00:00
Subramanya Sastry	a944352878	Extracted token-related helpers from Util to TokenUtils * Things to investigate as followups: - Should this be part of the Token base class in parser.defines.js? - Can we remove nowiki & cite usage of these helpers? Why do they need to know about (parser internals) tokens? Bug: T208360 Change-Id: Ib962b4acf9534852240fcb083ce67d696a997a83	2018-10-30 18:05:24 -05:00
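
The byte-offset fix in 2be3ab72c6 can be checked against the reduced test case quoted in that commit message. The sketch below uses Python purely as a stand-in for PHP, assuming UTF-8 source text: `len()` on a `str` counts code points, as `mb_strlen()` does, while `len()` on the encoded bytes matches `strlen()`. The variable names and the `(start, end)` tuple standing in for a DSR are hypothetical, not Parsoid's API.

```python
# Reduced test case from commit 2be3ab72c6; "\u2013" is the en dash in "a – b".
source = "* a \u2013 b <ref>x</ref>\n\nc\n\n"

# Code-point count: what PHP's mb_strlen() returns for UTF-8 text.
char_length = len(source)
# Byte count: what PHP's strlen() returns.
byte_length = len(source.encode("utf-8"))

# Hypothetical zero-width range for an implicit <references /> tag appended at
# the end of the page. With byte offsets, both ends must use the byte length.
implicit_refs_dsr = (byte_length, byte_length)

print(char_length, byte_length, implicit_refs_dsr)  # 25 27 (27, 27)
```

The en dash takes three bytes in UTF-8, so an end offset computed from the code-point count falls two bytes short of the real end of the page, which is consistent with the trailing-newline selser diffs the commit describes.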
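
The getAttribute() pitfall described in eb70a83eb0 can be reproduced with Python's `xml.dom.minidom`, which, like PHP's DOM extension, returns an empty string rather than a null value for a missing attribute. This is only an analogy chosen so the sketch is runnable; the element markup is made up for illustration. The last lookup also shows the case-sensitive attribute matching mentioned in 04efa43c4c.

```python
from xml.dom.minidom import parseString

# Three made-up elements: attribute with a value, empty attribute, no attribute.
doc = parseString(
    '<root><span typeof="mw:Extension/ref"/><span typeof=""/><span/></root>'
)
with_value, empty_value, missing = doc.documentElement.getElementsByTagName("span")

# getAttribute() returns "" both for an empty attribute and a missing one,
# so its result alone cannot tell the two cases apart.
print(repr(empty_value.getAttribute("typeof")), repr(missing.getAttribute("typeof")))

# hasAttribute() tests for presence explicitly.
print(empty_value.hasAttribute("typeof"), missing.hasAttribute("typeof"))

# Attribute names are matched case-sensitively here, so "typeOf" is not found.
print(repr(with_value.getAttribute("typeOf")))
```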

245 commits