86 commits

Author SHA1 Message Date
Gabriel Wicke 574abd9774 A collection of small bug fixes to the grammar, Cite, the Token format
converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all
parserTests.
2011-12-14 23:38:46 +00:00
Gabriel Wicke dc77d73ad5 Add ability to pass through JSON data to WikiDom in data-json-* attributes,
and fix parser to actually parse the Barack Obama article except for one table
with nested templates at the start-of-line.
2011-12-14 17:25:09 +00:00
Gabriel Wicke f6e4267fca Handle a few more element types, and reset offset for each leaf node. Not sure
if the latter is correct, as the documentation at
https://www.mediawiki.org/wiki/Visual_editor/Software_design#Data_Structures
and the actual sample WikiDom in the editor sandbox seem to disagree on this
point.
2011-12-14 16:22:27 +00:00
Gabriel Wicke 6676a47008 Add implicit level attribute to WikiDom headings. 2011-12-14 15:55:58 +00:00
Gabriel Wicke 3018ca690b Improve WikiDom conversion: Handle text and annotations in branch nodes as
paragraphs and treat list items as branches.
2011-12-14 15:40:40 +00:00
Gabriel Wicke a09aa4d599 Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of
parser tests using 'node parserTests.js --wikidom'.
2011-12-14 15:15:41 +00:00
Gabriel Wicke 5f80d30428 Clean up access to document and body after building the tree. 2011-12-14 09:40:49 +00:00
Gabriel Wicke 30749b8d8d Update comments a bit and add a note on things to improve in API. 2011-12-14 09:33:25 +00:00
Gabriel Wicke 55ff272847 Comment TokenTransformDispatcher. 2011-12-13 20:13:09 +00:00
Gabriel Wicke 44deefe303 Minor tweak to comment. 2011-12-13 18:55:44 +00:00
Gabriel Wicke c61b32eaa7 Clean up and comment the Cite extension a bit. 2011-12-13 18:45:09 +00:00
Gabriel Wicke feee9ded9f Convert the Cite extension to a token stream transformer.
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so it should be
broken out to an intermediate layer in the future.

In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
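The dispatch order described in feee9ded9f ('any' callbacks first, then callbacks for the specific token type) can be illustrated with a minimal sketch. This is not the actual TokenTransformDispatcher code: the class name SketchDispatcher, the helper registerRefEater, the token shape ({ type, name, value }) and the type names 'TAG', 'ENDTAG' and 'TEXT' are all assumptions made for illustration.

```javascript
// Hypothetical sketch: 'any' handlers run before type-specific handlers.
// A handler returns a token to pass it on, or null to drop it.
class SketchDispatcher {
  constructor() {
    this.anyHandlers = [];
    this.typeHandlers = new Map();
  }

  onAny(handler) {
    this.anyHandlers.push(handler);
  }

  on(type, handler) {
    if (!this.typeHandlers.has(type)) this.typeHandlers.set(type, []);
    this.typeHandlers.get(type).push(handler);
  }

  transform(tokens) {
    const out = [];
    for (const input of tokens) {
      let token = input;
      // Phase 1: every 'any' handler sees the token first.
      for (const handler of this.anyHandlers) {
        if (token === null) break;
        token = handler(token);
      }
      // Phase 2: handlers registered for this token's type.
      const specific = token === null ? [] : this.typeHandlers.get(token.type) || [];
      for (const handler of specific) {
        if (token === null) break;
        token = handler(token);
      }
      if (token !== null) out.push(token);
    }
    return out;
  }
}

// Hypothetical Cite-like transformer: swallows everything between
// <ref> and </ref> and leaves a numbered marker in the output stream.
function registerRefEater(dispatcher, refs) {
  let captured = null; // non-null while inside a ref

  dispatcher.onAny((token) => {
    if (captured === null) return token; // outside a ref: pass through
    if (token.type === 'ENDTAG' && token.name === 'ref') return token;
    captured.push(token); // inside a ref: eat the token
    return null;
  });

  dispatcher.on('TAG', (token) => {
    if (token.name !== 'ref') return token;
    captured = [];
    return null;
  });

  dispatcher.on('ENDTAG', (token) => {
    if (token.name !== 'ref' || captured === null) return token;
    refs.push(captured);
    captured = null;
    return { type: 'TEXT', value: `[${refs.length}]` };
  });
}

const refs = [];
const dispatcher = new SketchDispatcher();
registerRefEater(dispatcher, refs);
const output = dispatcher.transform([
  { type: 'TEXT', value: 'a' },
  { type: 'TAG', name: 'ref' },
  { type: 'TEXT', value: 'note' },
  { type: 'ENDTAG', name: 'ref' },
  { type: 'TEXT', value: 'b' },
]);
console.log(output.map((t) => t.value).join('')); // a[1]b
```

Because the 'any' handler runs first, the transformer can eat tokens of every type while a ref is open without registering for each type separately, which is the need the commit message suggests moving into an intermediate layer.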
Gabriel Wicke 8e55e79b67 Rename TokenTransformer to TokenTransformDispatcher. 2011-12-13 11:45:12 +00:00
Gabriel Wicke 8231511217 Replace custom object copy with $.extend. 2011-12-13 11:18:15 +00:00
Gabriel Wicke 39aedd4378 Improve comments in QuoteTransformer. 2011-12-13 10:25:18 +00:00
Gabriel Wicke 0ad08b9ae3 Add a README file pointing to the wiki documentation. 2011-12-12 22:30:11 +00:00
Gabriel Wicke a8fa9433c4 Convert quote handling (italic/bold) to a core extension operating on the
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitation and extensions should be able to use the same
dispatcher by registering for specific token types.

The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke 752b0990b2 Refactor parserTests somewhat into a class-like structure, and wire up the
TokenTransformer.
2011-12-12 14:03:54 +00:00
Gabriel Wicke d616f07a79 Don't re-build the wiki tokenizer for each test. This speeds up the full
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)

A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
Gabriel Wicke 89c5e0cafb Follow-up to r105859: Add missing new. 2011-12-12 10:09:13 +00:00
Gabriel Wicke 9ebce5839a Further development of the TokenTransformer framework. 2011-12-12 10:01:47 +00:00
Gabriel Wicke 80d5067813 Add a TokenTransformer dispatcher class. This class provides subscriptions by
token type, and supports asynchronous token expansion (for example for async
template expansion). This code is not yet tested or used. The interface for
token insertion from transformation functions will be expanded as needed.
2011-12-08 14:37:31 +00:00
Gabriel Wicke c2b69e2486 Clean up newline handling. Emit a NEWLINE token for each
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke abc2254110 A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke 92fdf99384 Further renaming, this time from pegParser to pegTokenizer. 2011-12-08 10:59:44 +00:00
Gabriel Wicke 76bc477038 Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke 19a1f0850f Tidy up the grammar a bit. 2011-12-08 10:33:23 +00:00
Gabriel Wicke 3742d70abd Add some documentation to syntax flags 2011-12-07 15:54:55 +00:00
Gabriel Wicke 545ca1809f Convert template argument production to generic inline with syntactic stop.
Fix a bug in generic inline production. Nested multi-line templates are now
parsed okayish.
2011-12-07 15:39:39 +00:00
Gabriel Wicke 902db40a1f Process template arguments into an object. 2011-12-07 14:46:07 +00:00
Gabriel Wicke 51a40e4dbc Follow-up to r105423: Fix off-by-one bug. 2011-12-07 11:56:12 +00:00
Gabriel Wicke 49c286a67b Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve
tag-only-line handling. 180 parser tests now passing.
2011-12-07 11:51:24 +00:00
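Commit 49c286a67b does not say which sort() behavior caused the doQuotes bug. One well-known surprise, shown here purely as an assumption about what may have been involved, is that Array.prototype.sort without a comparator compares elements as strings, so numbers are not ordered numerically. The array contents below are made up for illustration.

```javascript
// Default sort converts elements to strings before comparing them.
const values = [10, 9, 2, 33];

const defaultSorted = [...values].sort();
const numericSorted = [...values].sort((a, b) => a - b);

console.log(defaultSorted); // [ 10, 2, 33, 9 ]
console.log(numericSorted); // [ 2, 9, 10, 33 ]
```

Passing an explicit comparator such as (a, b) => a - b gives numeric order.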
Gabriel Wicke 418a5067c6 Parse attributes in tables using generic attribute production. Some table
tests still do not pass as the MW table output reorders attributes ;)
2011-12-06 22:03:21 +00:00
Gabriel Wicke 3d06707152 Slightly speed up inline tag productions using guards and grouping; Fix list
processing function.
2011-12-06 18:35:05 +00:00
Gabriel Wicke ea8f226fd5 Remove ext and references special cases, now subsumed by generic XML tag
productions. Document issue around special tokenizer mode for other extension
tags.
2011-12-06 16:44:27 +00:00
Gabriel Wicke e7de089d5b Decode urls and html entities, 163 tests now passing. 2011-12-06 13:17:14 +00:00
Gabriel Wicke a72a9e55a3 Don't match internal links with url as target. 161 passing. 2011-12-06 12:26:57 +00:00
Gabriel Wicke 2b5cc67bf5 Further tweaks to headings. 157 tests now passing. 2011-12-06 11:59:41 +00:00
Gabriel Wicke f4d123886e Convert heading rules to single rule that figures out the level. This saves a
lot of backtracking and inline break complexity.
2011-12-06 11:06:05 +00:00
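The idea in f4d123886e of a single heading rule that computes the level, instead of one rule per level, can be sketched outside the grammar. The following is a regex-based illustration in plain JavaScript, not the PEG production from the commit; the function name parseHeading and the returned shape are hypothetical, and treating surplus '=' characters as heading text is an assumption about the intended behavior.

```javascript
// Hypothetical sketch: one rule matches any heading and derives its level
// from the shorter of the two '=' runs, capped at 6.
function parseHeading(line) {
  const match = /^(=+)(.+?)(=+)\s*$/.exec(line);
  if (!match) return null;
  const [, open, body, close] = match;
  const level = Math.min(open.length, close.length, 6);
  // Surplus '=' characters on either side are kept as part of the text.
  const text = (open.slice(level) + body + close.slice(level)).trim();
  return { level, text };
}

console.log(parseHeading('== Title =='));  // { level: 2, text: 'Title' }
console.log(parseHeading('=== Foo =='));   // { level: 2, text: '= Foo' }
console.log(parseHeading('plain text'));   // null
```

A single rule of this kind is tried once per line, whereas six level-specific rules could each fail and backtrack before the right one matches.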
Gabriel Wicke 33e19f7275 Recognize block-level elements independent of case; Ignore toc and section
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Gabriel Wicke 9ed9cb31bd Fix template argument handling somewhat. 2011-12-05 17:58:11 +00:00
Gabriel Wicke 1760210d13 Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
by improper caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Gabriel Wicke 63c728924b Use pegjs from npm 2011-12-01 15:23:23 +00:00
Antoine Musso 5ab379f479 fix vim modeline 2011-12-01 15:19:37 +00:00
Gabriel Wicke 0ce1e9fcf3 Add a quick html entity decoding hack, and document need for general decoder. 2011-12-01 14:39:55 +00:00
Gabriel Wicke d00743ad79 Improve external links and definition lists, now 133 tests passing ;)
Also add a printwhitelist option to the test runner, which prints JS code that
can be copied and pasted into the whitelist.
2011-12-01 14:25:59 +00:00
Gabriel Wicke 82e31ffd42 Do not allow newlines in various attributes 2011-11-30 15:12:53 +00:00
Gabriel Wicke 821162484e Allow inlines in the term part of ; term : definition 2011-11-30 14:53:28 +00:00
Gabriel Wicke f758894de7 Let another test pass by swapping the default order of italic/bold for '''''.
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke e0fca805a6 Expand tabs in grammar. 2011-11-30 13:42:26 +00:00