wikimedia/mediawiki-extensions-VisualEditor

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/VisualEditor synced 2024-11-15 10:35:48 +00:00

Author	SHA1	Message	Date
Gabriel Wicke	e53bc93a8e	Check out old results before running tests Change-Id: Ia56aec22194a14c94620237041c30c269ac2e56a	2012-05-30 17:37:10 +02:00
Gabriel Wicke	36084c5d93	Preserve original newlines in HTML and serialization 254 round-trip tests (up from 184) are now passing. Also: * tweaked runtests.sh slightly (use less -R instead of -r). * made sure the EOFTk is preserved in phase 3 transforms Change-Id: I1de22186bdb78e52019370e43f096877005b8f5a	2012-05-29 23:29:03 +02:00
Subramanya Sastry	8174c9dafc	First attempt implementing rewriting rules on the DOM - This is implemented as a post-processing pass. - Might require additional checks to verify rewriteability. - Implemented as a pair-wise tag DOM minimization strategy, i.e. it takes tag pairs (B, I) for ex, and attempts to normalize the tree just for those tag pairs. Normalizing across multiple tags is implemented as pairwise rewriting across all pairs: Ex:(b,i), (b,u),(i,u) for (b,i,u) - Copied over attributes as part of rewriting, but some of the attributes lose their meaning on rewriting since tags are reordered (ex: sourcePosn, sourceTagPosn). How do we handle this? Output examples and possible issues to fix: <i><b><u>biu</u></b></i><b><u>bu</u></b><u>u</u> gets rewritten to: <u><b><i>biu</i>bu</b>u</u> But, the equivalent wikitext form: '''''<u>biu</u>''''''''<u>bu</u>'''<u>u</u> does not get rewritten because of parsing differences. This wikitext gets parsed into: <i><b><u>biu</u>'''</b></i><u>bu<b>u</b></u> The extra ''' token in the middle thwarts DOM rewriting. However, a slightly different version: "'''''<u>biu</u>''<u>bu</u>'''<u>u</u>" gets properly normalized to: <u>'''''biu''bu'''u</u> An alternative, but fun strategy to play with is to use the following two normalization primitives: S(wap) and M(erge). - S rewrites T1(T2(x)) into T2(T1(x)) (ex: <b><i>foo</i></b> ==> <i><b>foo</b></i>) - M rewrites (T(x),T(y)) into (T(x,y)). (ex: <b>foo</b><b>bar</b> ==> <b>foobar</b>) The current rewriting strategy could possibly be re-implemented as S-M rewriting. The problem to solve there would be to find an efficient rewriting strategy that is guaranteed to lead to a normal form. I may not play with it now, but just documenting it for later (to play with in my spare time). This commit is just as a record of fun/experimental code where I get to learn details of JS, wikitext, parsing, and DOM manipulation. Next version of this code will attempt to introduce minimal DOM restructuring across multiple tags at once which can be more efficient. gwicke: Removed now passing test from whitelist, and updated another whitelist entry which is now improved. Change-Id: Ie97bcb164eb62c34ba61aa76ba2f4c232aa713d8	2012-05-29 08:17:57 +02:00
Gabriel Wicke	c52c24b0cb	Slightly improve formatting of web service; test commit message tweak Change-Id: Ibac3ce3dd9aa2c4faf11eed351fea941ebf1e4b3	2012-05-25 16:36:14 +02:00
Gabriel Wicke	c692bc2307	Use '/bin/sh' instead of '/bin/bash' Bash omits the time output for some reason. We would like to keep a record of performance too. Change-Id: I7c435b1cf2e2f237f78a45b2819a195a240e3aa4	2012-05-25 16:19:46 +02:00
Gabriel Wicke	1ce2bc605d	Add a small test runner with result archiving in git repo Supports both -> HTML DOM and round-trip testing. Displays the diff to the last results using less -r. Change-Id: Ib3fbadeda3c8f4f7e3d2e6e5236a73ff7a773623	2012-05-25 14:28:53 +02:00
Gabriel Wicke	b89f5071e5	Basic parser / serializer web service * After installing Parsoid (sudo npm install -g in modules/parser), run 'node server.js' from the api directory and navigate to http://localhost:8000/ and follow the directions. You can start to navigate the English wikipedia at http://localhost:8000/Main_Page, or manually enter wikitext or HTML DOM to convert. * Uses the express framework, could also use just connect * Uses the cluster module to manage workers per-core and restart those on failure Change-Id: I443f2996ed3df00826b038b7476a2f966ab0c425	2012-05-23 12:35:00 +02:00
Gabriel Wicke	39c6f42879	Link round-tripping and other improvements * Changed RDFa for links according to http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary * Added basic support for internal/external link serialization * Moved numbering of external links from tokenizer to LinkHandler * Added round-tripping for generic HTML tags * Replaced nowiki tag with <meta typeOf="mw:tag" content="nowiki"> and <meta typeOf="mw:tag" content="/nowiki"> for now. * 154 round-trip tests passing (node parserTests.js --roundtrip). Change-Id: I16c4db21b1b543ee57c73e569c83025b64664542	2012-05-22 13:36:06 +02:00
Gabriel Wicke	fb7d5418a5	Round-trip nowiki Change-Id: I5f7e6a43f5fdc1708ee710b2a601b20db733452c	2012-05-21 18:06:09 +02:00
Subramanya Sastry	9b84e931db	First pass updating parserTests to verify dom->wikitext serialization. - Just a quick first pass updating the parserTests.js script so we can test DOM -> wikitext serialization (but which in effect really tests roundtripping). - There is no output normalization yet which is needed for now since we are not yet preserving white-space. Change-Id: Ie52058e0dc3330f852c24fa05641dced19f950e0	2012-05-18 09:51:22 +02:00
Gabriel Wicke	04fc74c76a	Strip RDFa attributes in parserTests We are adding some extra information in those, which should not make tests fail. Change-Id: I42cca596330252efeff5d51508f97ef1c566475b	2012-05-17 17:03:44 +02:00
Gabriel Wicke	542921b5a3	Removed html5 parser patch no longer needed with 0.3.8 Change-Id: Id8c23d34e8cca49a360f536e792144a85a8468a3	2012-05-16 12:06:42 +02:00
Adam Wight	0a7f0b7630	List markup is created during the sync23 phase. This makes it possible to transclude list items from a template. Note: "5 quotes" test is broken by this patch, it appears that ListHandler newline processing is changing some state which mysteriously affects the QuoteTransformer. This is ominous, hopefully there's a simple explanation... gwicke: fix a bug in tokenizer triggered by definition lists like this: **; foo : bar Change-Id: I4e3a86596fe9bffcbfc4bf22895362c3bf742bad	2012-05-08 11:39:36 +02:00
Gabriel Wicke	3be4992782	'Obama finally expands' ;) Misc fixes and documentation updates * [[:en:Barack Obama]] can now be expanded in 77 seconds using 330MB RAM, while it would previously run out of RAM after ~30 minutes. Wohoooo! The token transform framework rework really paid off. * 303 parser tests are passing in the new record time of 5.5 seconds. Two more tests are passing since these tests expect the day of the week to be Thursday. Won't be the case tomorrow. Change-Id: I56e850838476b546df10c6a239c8c9e29a1a3136	2012-04-26 18:18:08 +02:00
Gabriel Wicke	8ff810659a	Rename text/wiki and tokens/wiki to text/x-mediawiki and similar Change-Id: I70113629f4633685cd6db3914303a15e4c79a50a	2012-04-25 20:19:43 +02:00
Gabriel Wicke	8368e17d6a	Biggish token transform system refactoring * All parser pipelines including tokenizer and DOM stuff are now constructed from a 'recipe' data structure in a ParserPipelineFactory. * All sub-pipelines of these can now be cached * Event registrations to a pipeline are directly forwarded to the last pipeline member to save relatively expensive event forwarding. * Some APIs for on-demand expansion / format conversion of parameters from parser functions are added: param.to('tokens/expanded', cb) param.to('text/wiki', cb) (this does not work yet) All parameters are additionally wrapped into a Param object that provides methods for positional parameter naming (.named()) or conversion to a dict (.dict()). * The async token transform manager is now separated from a frame object, with the frame holding arguments, an on-demand expansion method and loop checks. * Only keys of template parameters are now expanded. Parser functions or template arguments trigger an expansion on-demand. This (unsurprisingly) makes a big performance difference with typical switch-heavy template systems. * Return values from async transforms are no longer used in favor of plain callbacks. This saves the complication of having to maintain two code paths. A trick in transformTokens still avoids the construction of unneeded TokenAccumulators. * The results of template expansions are no longer buffered. * 301 parser tests are passing Known issues: * Cosmetic cleanup remains to do * Some parser functions do not support async expansions yet, and need to be modified. Change-Id: I1a7690baffbe8141cadf67270904a1b2e1df879a	2012-04-25 16:51:36 +02:00
Gabriel Wicke	aaca5eac7d	More tweaks: safesubst and image options * Ignore safesubst for now * Remove an unneeded whitelist entry * Make sure the caption is not lost for thumbs (fix to last commit) and remove debug print Change-Id: I243584ed0838cf7c3b4110fe9cdf869272477312	2012-04-17 11:02:52 +02:00
Gabriel Wicke	afa5b95bc1	Don't work around html5 library tokenizer attribute reordering The HTML5 parser we are using to normalize expected HTML output in parserTests reverses the order of attributes (see https://github.com/aredridel/html5/pull/53 for the fix). Remove whitelist entries concerned with this and use the proper order in external image attributes. Change-Id: If1868cae05396a150757c85a20473ab756cbcd97	2012-04-16 17:09:06 +02:00
Gabriel Wicke	9913108b40	Fix fetch-parserTests (it is in path instead of fs) Change-Id: I169502079ea2609a4f4af776b15767cf0c3ec8b5	2012-04-04 20:40:09 +02:00
Adam Wight	b234edba88	As much as I have loved writing Makefiles... I've replaced its functionality with package.json, mostly so we can avoid non-node dependencies. This is one of the recommended practices. We should consider moving tests/parser into modules/parser/tests, other node projects keep all module code in one directory. Explained in the README how to use npm to load the dependencies and run tests. Too bad about NODE_PATH... Don't try to find parserTests.txt in assorted places--if it isn't present, fetch from gerrit. You can symlink from core if you're developing on both parsers, and the fetch script will not overwrite. Use __dirname in parserTests.js to allow the script to run independent of current working directory. Change-Id: I4c8b884e91f4fdeae385c7697aff768bdd199dd5	2012-04-04 11:02:58 -07:00
Gabriel Wicke	f662690d02	Shorten data-mw-rt to data-mw and clean up whitelist Instead of a proliferation of data-mw-* attributes, it should be easier to stash all private / non-semantic round-trip information in a JSON object stored in data-mw. Change-Id: Id200a6a8789fa152f29ea530e5a24b6ee7b4b285	2012-04-02 18:12:49 +02:00
Gabriel Wicke	5ef3438ee5	Change path to parserTests from phase3 to core after switch to git. Change-Id: Ie13f678eaa81447e98db5c8c394ab103caad8454	2012-04-02 17:10:06 +02:00
Audrey Tang	d3602bb459	* Get parser tests from GitWeb, not Subversion. Change-Id: I39f933b9e0320dc62736da07ce097ec1badec9aa	2012-03-28 23:39:01 +08:00
Antoine Musso	f637756319	node modules required: request & jshashes	2012-03-13 15:14:18 +00:00
Gabriel Wicke	af03eb4f29	Improve generic attribute expansion before external link processing, and make wgUploadPath configurable. Also change the hard-coded fall-back image sizes to sensible defaults. This breaks three parser tests until image size retrieval from the wiki is implemented.	2012-03-06 18:02:35 +00:00
Gabriel Wicke	227103e12c	Accept empty table cell attribute sections, and consider percent-encoded %2525 valid. 270 tests passing.	2012-03-06 14:32:45 +00:00
Gabriel Wicke	2efcd3cd57	Reworked percent encoding handling for URIs to get closer to the 'url construction' part of the HTML5 spec: http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation Removed a few whitelisted test cases that are now passing directly. The encoding canonicalization could also be moved to the Sanitizer. Doing this early in token stream processing however has the advantage of providing further transformations uniform data to work with. We could even consider to move this even further into the tokenizer.	2012-03-06 13:49:37 +00:00
Gabriel Wicke	19fe9726a2	Fix invalid external link representation. 268 tests passing.	2012-03-05 18:06:29 +00:00
Gabriel Wicke	7b0c807710	Change wikilink tokenization strategy to split on pipes. This makes it possible to support template / template argument expansion in image options, and causes little trouble for wikilinks. Non-image wikilinks with multiple text pipes are quite rare in the dumps, and concatenating description tokens with a plain '\|' is quite easy. 261 parser tests passing.	2012-03-05 12:00:38 +00:00
Gabriel Wicke	009d7a4dea	Namespaces to the rescue.	2012-03-02 15:49:05 +00:00
Gabriel Wicke	fe681042c0	Collect some statistics while grepping.	2012-03-01 16:42:28 +00:00
Gabriel Wicke	e0838db315	Capturing the regexp is no longer necessary, and speeds up the grepper. Also tweaked the multi-line ISBN regexp slightly.	2012-02-29 13:02:46 +00:00
Gabriel Wicke	e3deb304db	Add a misc regexp file for dump grepping.	2012-02-29 11:07:17 +00:00
Gabriel Wicke	14f40aa7d5	Support capturing regexps in dumpGrepper.	2012-02-29 10:49:00 +00:00
Gabriel Wicke	ebcfc2c7a1	Improve grepper documentation.	2012-02-28 14:24:37 +00:00
Gabriel Wicke	b767e03449	Tweak martian regexp and grepper output format.	2012-02-28 14:11:44 +00:00
Gabriel Wicke	4806505ce4	Finish color highlighting for dump grepper / fix broken commit r112592.	2012-02-28 13:48:47 +00:00
Gabriel Wicke	7daeb34d4d	Implement onlyinclude transformer. 254 tests passing.	2012-02-28 13:21:01 +00:00
Gabriel Wicke	32012c00cd	Add martian-endtags regexp wrapper around dumpGrepper.	2012-02-27 16:51:20 +00:00
Gabriel Wicke	19c67c28a2	Add a simple dump grepper using DumpReader. Useful to inform parser design decisions, and as a way to exercise the dump reader in preparation for tests over full dumps.	2012-02-27 16:40:01 +00:00
Gabriel Wicke	21855c99cd	Tweak dumpReader to work with current libxmljs and stdin 'data' events.	2012-02-27 15:46:08 +00:00
Gabriel Wicke	2e41b19af8	Green two more parser tests by implementing some parser functions.	2012-02-22 16:39:50 +00:00
Gabriel Wicke	3568dfee14	Add some support for functionhooks in test parser and parserTests.js, and tweak a few parser functions.	2012-02-22 15:59:11 +00:00
au	f1fb937b4a	* Instead of sorting attributes, whitelist the one parserTest where it matters.	2012-02-20 22:26:24 +00:00
au	0ca9b00100	* Convert __patched-html-parser to .coffee. Note that the compiled .js file (generated by "make"/"make test") is still under version control so folks can work on the project even without a running "coffee" command in PATH. Also updated README to mention coffee-script and "make test".	2012-02-18 18:54:12 +00:00
au	4d1c6c7d6e	* Add a "make test" target that auto-fetches parserTests.txt.	2012-02-18 17:28:46 +00:00
au	0360e62da7	* Locally apply the HTML5.Marker.type patch. This is needed until https://github.com/aredridel/html5/issues/44 is merged into the upstream "html5" module.	2012-02-18 17:28:35 +00:00
Gabriel Wicke	025f9cddb3	Prefix all internal data- attributes with data-mw- and adjust the whitelist and test output normalization accordingly. 235 tests passing.	2012-02-13 13:54:07 +00:00
Gabriel Wicke	a122e51eec	Move data-* annotations into separate object on tokens, that is then serialized into a single data-mw-rt attribute if present. Update parserTests to ignore this attribute for comparisons with expected parser output. A few more tweaks and notes are thrown into this commit too. 233 tests are passing now.	2012-02-11 16:43:25 +00:00
Gabriel Wicke	1f6db903e9	Pluck a few low-hanging fruit in external link tokenization, and add a simple localurl parser function implementation. 230 parser tests now passing.	2012-02-07 10:28:23 +00:00

134 commits
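
Commit 8174c9dafc describes two normalization primitives, S(wap) and M(erge), without showing code. The following is a minimal sketch of what they might look like, not the code from that commit. It assumes a hypothetical tree model in which a node is either a text string or an object { tag, children }; the names el, swap, merge and serialize are invented for illustration. It ignores attributes and does not attempt the open problem the commit mentions, namely finding a rewriting order that is guaranteed to reach a normal form.

```javascript
// Hypothetical minimal tree model (not Parsoid's DOM):
// a node is either a string (text) or an object { tag, children }.
function el(tag, ...children) {
  return { tag, children };
}

// S(wap): rewrite T1(T2(x)) into T2(T1(x)).
// Applies only when the outer element wraps exactly one element child;
// any other node is returned unchanged.
function swap(node) {
  if (typeof node === "string" || node.children.length !== 1) return node;
  const inner = node.children[0];
  if (typeof inner === "string") return node;
  return el(inner.tag, el(node.tag, ...inner.children));
}

// M(erge): rewrite adjacent siblings (T(x), T(y)) into T(x, y).
// Takes a list of sibling nodes and returns a new list.
function merge(siblings) {
  const out = [];
  for (const node of siblings) {
    const prev = out[out.length - 1];
    const bothElements =
      prev !== undefined && typeof prev !== "string" && typeof node !== "string";
    if (bothElements && prev.tag === node.tag) {
      out[out.length - 1] = el(prev.tag, ...prev.children, ...node.children);
    } else {
      out.push(node);
    }
  }
  return out;
}

// Serialize a node back to HTML-like text for inspection.
function serialize(node) {
  if (typeof node === "string") return node;
  return `<${node.tag}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

// The two examples from the commit message:
console.log(serialize(swap(el("b", el("i", "foo")))));
// prints <i><b>foo</b></i>
console.log(merge([el("b", "foo"), el("b", "bar")]).map(serialize).join(""));
// prints <b>foobar</b>
```

Both functions return new nodes rather than mutating their input, which keeps each rewrite step easy to test in isolation.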