wikimedia/mediawiki-extensions-VisualEditor

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/VisualEditor synced 2024-11-26 23:31:02 +00:00

Author	SHA1	Message	Date
Gabriel Wicke	bd98eb4c5a	Land big TokenTransformDispatcher and eventization refactoring. The TokenTransformDispatcher now actually implements an asynchronous, phased token transformation framework as described in https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations. Additionally, the parser pipeline is now mostly held together using events. The tokenizer still emits a lame single event with all tokens, as block-level emission failed with scoping issues specific to the PEGJS parser generator. All stages clean up when receiving the end tokens, so that the full pipeline can be used for repeated parsing. The QuoteTransformer is not yet 100% fixed to work with the new interface, and the Cite extension is disabled for now pending adaptation. Bold-italic related tests are failing currently.	2012-01-03 18:44:31 +00:00
Neil Kandalgaonkar	20374b5911	fix substr for IE, followup r107464	2011-12-30 21:51:03 +00:00
Gabriel Wicke	8e00a72d0a	Improvements to link trail handling, and two tweaks to the whitelist. 182 tests now passing. Link trails depend on language-dependent positive character classes in the PHP parser. These classes all seem to disallow punctuation implicitly and list differing plain text characters instead, so it might be possible to get away with identifying a common class of non-trail punctuation instead. This would help to keep the tokenizer independent of configurations, which is very desirable for caching and simplified external parsing.	2011-12-30 12:47:06 +00:00
Gabriel Wicke	11ece76b7b	Fix suffix handling for wiki links.	2011-12-30 09:35:57 +00:00
Gabriel Wicke	b3a0270d69	Remove env and load grammar in tokenizer constructor. Re-add property hack to keep parserTests running for now. Really need a different pipeline for html serialization or a reference to the HTML DOM.	2011-12-28 17:04:16 +00:00
Gabriel Wicke	3a63fb118e	Add a few comments inline, and remove unneeded html serialization as we are only interested in WikiDom output in this parser wrapper.	2011-12-28 13:46:52 +00:00
Neil Kandalgaonkar	8fbf36e63e	put add terminal token inside tokenize method (will pull it out again for streaming interface)	2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar	6103646ec8	remove need to add newline at end of input	2011-12-28 01:37:11 +00:00
Neil Kandalgaonkar	4158f82d7e	refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js	2011-12-28 01:37:06 +00:00
Neil Kandalgaonkar	d91a67ba99	nodeName not defined	2011-12-28 01:36:54 +00:00
Neil Kandalgaonkar	962d1262fc	create tokenizer without need to modify namespace with PEG source	2011-12-28 01:36:36 +00:00
Gabriel Wicke	33e60dd4d9	Update comments a bit.	2011-12-22 12:37:24 +00:00
Gabriel Wicke	9ee0e660ec	Fix regression introduced by r107060 for regular table cells. Good to have a test suite ;)	2011-12-22 12:09:25 +00:00
Gabriel Wicke	a94d0ec10c	Re-add support for row-only tables.	2011-12-22 11:58:32 +00:00
Gabriel Wicke	1c7fe0eb34	Refactor table productions to support table fragments in templates (table start / row / table end). The old productions are not deleted yet to make it easy to compare the output on more complex articles. 181 tests passing after adding two table tests with whitespace-only differences to the whitelist.	2011-12-22 11:43:55 +00:00
Gabriel Wicke	2845ba9552	Handle noinclude and includeonly at start of line, so that syntax after it still matches as if it actually was preceded by a newline.	2011-12-21 11:38:50 +00:00
Gabriel Wicke	3a631db6d9	Fix ranges for annotations in implicit paragraphs within branch nodes.	2011-12-16 19:36:04 +00:00
Gabriel Wicke	cc06551f2e	Rename table_header production to table_heading. Those non-natives strike again.	2011-12-16 19:24:59 +00:00
Gabriel Wicke	605ed23fd2	Fix attributes in table headings.	2011-12-16 19:22:13 +00:00
Gabriel Wicke	08255ff3e6	Small bug fix to heading level, spotted by Mike from localwiki- thanks!	2011-12-15 23:59:35 +00:00
Gabriel Wicke	a04744b2ec	Add some more attribute remapping capabilities to the DOMConverter, and clean up some grammar formatting.	2011-12-15 17:33:07 +00:00
Gabriel Wicke	e98dd9e722	Implement 1-char-minimum width for annotations, and some additional minor cleanup.	2011-12-15 11:05:52 +00:00
Gabriel Wicke	22ba27295b	Clean up the DOMConverter a bit.	2011-12-15 10:55:30 +00:00
Gabriel Wicke	e72dee76e4	Follow-up to r106208 and r106207. Both good catches, thanks Yair! As this code is in its early stages and nowhere near deployment, please Be Bold and just commit things like this directly! IMHO it makes more sense to fully review this once it settles down a bit.	2011-12-15 10:13:50 +00:00
Gabriel Wicke	3585bd9c8e	Accept row-only tables. The parser now eats [[en:Barack Obama]] as-is. Hooray!	2011-12-15 00:39:28 +00:00
Gabriel Wicke	6df94a34a1	Less lust for urls	2011-12-15 00:26:22 +00:00
Gabriel Wicke	ce2ee067f7	Minor tweak to wiki link production	2011-12-15 00:12:58 +00:00
Gabriel Wicke	377226a120	Comment out a stray console.log	2011-12-14 23:44:58 +00:00
Gabriel Wicke	574abd9774	A collection of small bug fixes to the grammar, Cite, the Token format converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all parserTests.	2011-12-14 23:38:46 +00:00
Gabriel Wicke	dc77d73ad5	Add ability to pass through JSON data to WikiDom in data-json-* attributes, and fix parser to actually parse the Barack Obama article except for one table with nested templates at the start-of-line.	2011-12-14 17:25:09 +00:00
Gabriel Wicke	f6e4267fca	Handle a few more element types, and reset offset for each leaf node. Not sure if the latter is correct, as the documentation at https://www.mediawiki.org/wiki/Visual_editor/Software_design#Data_Structures and the actual sample WikiDom in the editor sandbox seem to disagree on this point.	2011-12-14 16:22:27 +00:00
Gabriel Wicke	6676a47008	Add implicit level attribute to WikiDom headings.	2011-12-14 15:55:58 +00:00
Gabriel Wicke	3018ca690b	Improve WikiDom conversion: Handle text and annotations in branch nodes as paragraphs and treat list items as branches.	2011-12-14 15:40:40 +00:00
Gabriel Wicke	a09aa4d599	Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of parser tests using 'node parserTests.js --wikidom'.	2011-12-14 15:15:41 +00:00
Gabriel Wicke	5f80d30428	Clean up access to document and body after building the tree.	2011-12-14 09:40:49 +00:00
Gabriel Wicke	30749b8d8d	Update comments a bit and add a note on things to improve in API.	2011-12-14 09:33:25 +00:00
Gabriel Wicke	55ff272847	Comment TokenTransformDispatcher.	2011-12-13 20:13:09 +00:00
Gabriel Wicke	44deefe303	Minor tweak to comment.	2011-12-13 18:55:44 +00:00
Gabriel Wicke	c61b32eaa7	Clean up and comment the Cite extension a bit.	2011-12-13 18:45:09 +00:00
Gabriel Wicke	feee9ded9f	Convert the Cite extension to a token stream transformer. This required a few further additions to the TokenTransformDispatcher. In particular, there is now an 'any' token match whose callbacks are executed before more specific callbacks. This is used by the Cite extension to eat all tokens between ref and /ref tags. This need is very common, so should be broken out to an intermediate layer in the future. In general, the requirements for the TokenTransformDispatcher API are now clearer, and the API should likely be cleaned up / simplified.	2011-12-13 14:48:47 +00:00
Gabriel Wicke	8e55e79b67	Rename TokenTransformer to TokenTransformDispatcher.	2011-12-13 11:45:12 +00:00
Gabriel Wicke	8231511217	Replace custom object copy with $.extend.	2011-12-13 11:18:15 +00:00
Gabriel Wicke	39aedd4378	Improve comments in QuoteTransformer.	2011-12-13 10:25:18 +00:00
Gabriel Wicke	0ad08b9ae3	Add a README file pointing to the wiki documentation.	2011-12-12 22:30:11 +00:00
Gabriel Wicke	a8fa9433c4	Convert quote handling (italic/bold) to a core extension operating on the token stream. This is the first token transformation exercising the TokenTransformer class as its dispatcher. Template expansions, wiki link formatting, tag sanitation and extensions should be able to use the same dispatcher by registering for specific token types. The parser performance is very slightly improved as the token stream is only traversed once.	2011-12-12 20:53:14 +00:00
Gabriel Wicke	752b0990b2	Refactor parserTests somewhat into a class-like structure, and wire up the TokenTransformer.	2011-12-12 14:03:54 +00:00
Gabriel Wicke	d616f07a79	Don't re-build the wiki tokenizer for each test. This speeds up the full parserTests.js run slightly from 7-8 minutes to about 14 seconds ;) A few very minor tweaks to the grammar are also thrown into this commit.	2011-12-12 10:47:42 +00:00
Gabriel Wicke	89c5e0cafb	Follow-up to r105859: Add missing new.	2011-12-12 10:09:13 +00:00
Gabriel Wicke	9ebce5839a	Further development of the TokenTransformer framework.	2011-12-12 10:01:47 +00:00
Gabriel Wicke	80d5067813	Add a TokenTransformer dispatcher class. This class provides subscriptions by token type, and supports asynchronous token expansion (for example for async template expansion). This code is not yet tested or used. The interface for token insertion from transformation functions will be expanded as needed.	2011-12-08 14:37:31 +00:00
Gabriel Wicke	c2b69e2486	Clean up newline handling. Emit a NEWLINE token for each non-{comment,pre,nowiki} newline.	2011-12-08 14:34:18 +00:00
Gabriel Wicke	abc2254110	A bit of comment clean-up and wrapping of tree building into try/catch block to actually count failures.	2011-12-08 11:40:59 +00:00
Gabriel Wicke	92fdf99384	Further renaming, this time from pegParser to pegTokenizer.	2011-12-08 10:59:44 +00:00
Gabriel Wicke	76bc477038	Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to TreeBuilder.	2011-12-08 10:37:18 +00:00
Gabriel Wicke	19a1f0850f	Tidy up the grammar a bit.	2011-12-08 10:33:23 +00:00
Gabriel Wicke	3742d70abd	Add some documentation to syntax flags	2011-12-07 15:54:55 +00:00
Gabriel Wicke	545ca1809f	Convert template argument production to generic inline with syntactic stop. Fix a bug in generic inline production. Nested multi-line templates are now parsed okayish.	2011-12-07 15:39:39 +00:00
Gabriel Wicke	902db40a1f	Process template arguments into an object.	2011-12-07 14:46:07 +00:00
Gabriel Wicke	51a40e4dbc	Follow-up to r105423: Fix off-by-one bug.	2011-12-07 11:56:12 +00:00
Gabriel Wicke	49c286a67b	Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve tag-only-line handling. 180 parser tests now passing.	2011-12-07 11:51:24 +00:00
Gabriel Wicke	418a5067c6	Parse attributes in tables using generic attribute production. Some table tests still do not pass as the MW table output reorders attributes ;)	2011-12-06 22:03:21 +00:00
Gabriel Wicke	3d06707152	Slightly speed up inline tag productions using guards and grouping; Fix list processing function.	2011-12-06 18:35:05 +00:00
Gabriel Wicke	ea8f226fd5	Remove ext and references special cases, now subsumed by generic XML tag productions. Document issue around special tokenizer mode for other extension tags.	2011-12-06 16:44:27 +00:00
Gabriel Wicke	e7de089d5b	Decode urls and html entities, 163 tests now passing.	2011-12-06 13:17:14 +00:00
Gabriel Wicke	a72a9e55a3	Don't match internal links with url as target. 161 passing.	2011-12-06 12:26:57 +00:00
Gabriel Wicke	2b5cc67bf5	Further tweaks to headings. 157 tests now passing.	2011-12-06 11:59:41 +00:00
Gabriel Wicke	f4d123886e	Convert heading rules to single rule that figures out the level. This saves a lot of backtracking and inline break complexity.	2011-12-06 11:06:05 +00:00
Gabriel Wicke	33e19f7275	Recognize block-level elements independent of case; Ignore toc and section edit links in tests. 148 parser tests passing.	2011-12-05 20:03:24 +00:00
Gabriel Wicke	9ed9cb31bd	Fix template argument handling somewhat.	2011-12-05 17:58:11 +00:00
Gabriel Wicke	1760210d13	Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused by improper caching of production results, which interfered with the flag-dependent inline_break production.	2011-12-04 19:23:24 +00:00
Gabriel Wicke	63c728924b	Use pegjs from npm	2011-12-01 15:23:23 +00:00
Antoine Musso	5ab379f479	fix vim modeline	2011-12-01 15:19:37 +00:00
Gabriel Wicke	0ce1e9fcf3	Add a quick html entity decoding hack, and document need for general decoder.	2011-12-01 14:39:55 +00:00
Gabriel Wicke	d00743ad79	Improve external links and definition lists, now 133 tests passing ;) Also add printwhitelist option to test runner, provides js code copy/pastable to whitelist.	2011-12-01 14:25:59 +00:00
Gabriel Wicke	82e31ffd42	Do not allow newlines in various attributes	2011-11-30 15:12:53 +00:00
Gabriel Wicke	821162484e	Allow inlines in the term part of ; term : definition	2011-11-30 14:53:28 +00:00
Gabriel Wicke	f758894de7	Let another test pass by swapping the default order of italic/bold for '''''. Minor test output cosmetics.	2011-11-30 13:54:57 +00:00
Gabriel Wicke	e0fca805a6	Expand tabs in grammar.	2011-11-30 13:42:26 +00:00
Gabriel Wicke	2bb512a4de	A bit of tokenizer grammar clean-up and additional expected-html normalization. 99 parser tests now passing.	2011-11-30 13:40:17 +00:00
Gabriel Wicke	127d8c8621	Simplify DOM paragraph wrapping postprocessor	2011-11-30 12:28:45 +00:00
Gabriel Wicke	f0edc5cb9a	Fix a few more tests by allowing inline content inside links. 76 now passing.	2011-11-29 18:43:27 +00:00
Gabriel Wicke	ae0b5f9af4	* Split paragraph handling between tokenizer and DOM postprocessor for better html markup handling. * Remove global 'use strict' declarations from html5 parser. * Add trailing whitespace handling in dt. Overall, 55 parser tests are now passing.	2011-11-29 15:11:51 +00:00
Gabriel Wicke	b16c295b98	Consider dl as a block-level element.	2011-11-28 16:54:58 +00:00
Gabriel Wicke	d3f0196df7	Add primitive HTML comparison to detect passing parser tests. The expected HTML is parsed using an HTML parser and re-serialized, and the output compared to the serialization of the new parser's dom. Newline normalization is a cheap hack for now, need to improve that later.	2011-11-28 11:10:39 +00:00
Gabriel Wicke	6b8c109cf0	Separate block-level tags in tokenizer to delimit inlines and avoid wrapping block-level in paragraphs.	2011-11-25 17:41:26 +00:00
Gabriel Wicke	859379a635	Improvements to nowiki/pre interaction. Will need to distinguish block-level tags from inline HTML tags next.	2011-11-25 15:02:44 +00:00
Gabriel Wicke	dd5cd59ac6	Better HTML, pre and blocklevel handling. Hackish source formatting for easier comparison with parserTest results.	2011-11-25 12:47:03 +00:00
Gabriel Wicke	5b3a4497aa	Add generic HTML tokenization and nowiki handling.	2011-11-25 10:59:43 +00:00
Gabriel Wicke	6c36ddcbce	Follow-up to r104164: Clean-up comments, remove old italic/bold productions.	2011-11-24 14:20:56 +00:00
Gabriel Wicke	dee262658f	Add MediaWiki-compatible quote handling including quirks and overlapped structures like ''[[Link|Link text'']]. This is another transform on the token stream.	2011-11-24 13:56:30 +00:00
Gabriel Wicke	baf55875b9	Re-add modified wiki list handling to tokenizer.	2011-11-23 14:27:51 +00:00
Gabriel Wicke	694b998f24	Minor improvement to italic/bold, documentation on failed modularization of static parser functions.	2011-11-22 16:51:05 +00:00
Gabriel Wicke	d1b0293569	Fix comment token conversion and serialization	2011-11-21 09:22:30 +00:00
Gabriel Wicke	65afd9b610	Improve internal link handling	2011-11-18 14:48:32 +00:00
Gabriel Wicke	d744e65c48	Add missing token adapter.	2011-11-18 14:00:14 +00:00
Gabriel Wicke	b750ce38b8	Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer. Builds a DOM tree (jsdom) from the tokens and then serializes that using document.innerHTML. This is all very experimental, so don't be surprised by rough edges.	2011-11-18 13:57:07 +00:00
Gabriel Wicke	11e487d8c0	Flatten inline token lists before merging text into text tokens.	2011-11-17 15:43:31 +00:00
Gabriel Wicke	ea87e7aaee	Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of tokens, which for now is still completely built before parsing can proceed. For each top-level block, the source start/end positions are added as attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.	2011-11-17 15:26:02 +00:00
Gabriel Wicke	ef3c84bd2e	Extract text from inline elements for better testing. Slightly improved handling of comment-only lines. Change pre to leaf content model.	2011-11-08 16:08:05 +00:00
Gabriel Wicke	18ead89b37	Improved paragraph, br, comment parsing and switched headings to generic inlineline with syntactic flags.	2011-11-07 23:09:30 +00:00
Gabriel Wicke	944d010eb2	Indentation cleanup in PEG parser and Html serializer	2011-11-07 21:05:37 +00:00
Gabriel Wicke	c3a0c56e56	rename definition{term,description} to just {term,description}	2011-11-07 20:36:34 +00:00
Gabriel Wicke	71891131c3	Grammar improvements * replaced regexp stack with a set of break rules for inline content within specialized parse contexts, switched more rules to generic inlineline/inline/block rules. * don't consume end-of-line for proper start-of-line matching * added some pre support * still no conversion of inline elements to annotations	2011-11-07 14:39:12 +00:00
Gabriel Wicke	06ca9f12fe	Rename definitiondata to definitiondescription, minor fixes	2011-11-04 12:25:01 +00:00
Gabriel Wicke	7e5c196732	Some more progress for tables and definition lists	2011-11-04 12:06:49 +00:00
Gabriel Wicke	83a80bad49	Fixes for definition lists	2011-11-04 11:08:11 +00:00
Gabriel Wicke	85def70a8a	Add basic list serialization to HtmlSerializer * Added 'definitionterm' and 'definitiondata' styles to support definition lists, and special-case handling in the serializer to wrap both in dls.	2011-11-04 10:02:59 +00:00
Gabriel Wicke	63398b5749	Update parserTests to latest serializers	2011-11-04 07:45:05 +00:00
Gabriel Wicke	a8838dab18	Start by handling paragraphs, at least a bit.	2011-11-03 15:16:05 +00:00
Gabriel Wicke	0d30a5528e	First combination of WikiDom serializers with existing parser in tests/parser/parserTests.js. * Removed var from es in es.js to allow node.js to access it as global. Only alternative solution appears to be a node-specific 'exports' construct: http://nodejs.org/docs/v0.3.1/api/modules.html * Added es.Document.js and es.Document.Serializer.js in es/bases. Not sure if this is the desired location. * Changed es.extend to es.extendClass in the serializers * Modified the first parser test to include the WikiDom modules and call the new HTML serializer	2011-11-03 13:55:48 +00:00
Trevor Parscal	5bae153214	Moving parser stuff back into the modules folder (oops)	2011-11-02 21:45:57 +00:00
Trevor Parscal	2b499d5990	Reorganized modules by javascript namespace	2011-11-02 21:31:45 +00:00
Brion Vibber	213ee7d4a8	followup r101685: the peg definition	2011-11-02 21:09:19 +00:00
Brion Vibber	56a75ccca7	Copy several of the experimental JS parser bits from ParserPlayground to VisualEditor. They'll need retooling to hook up with the wikidom stuff.	2011-11-02 21:07:51 +00:00


264 commits