Commit graph

65 commits

Author SHA1 Message Date
Gabriel Wicke 80d5067813 Add a TokenTransformer dispatcher class. This class provides subscriptions by
token type, and supports asynchronous token expansion (for example for async
template expansion). This code is not yet tested or used. The interface for
token insertion from transformation functions will be expanded as needed.
2011-12-08 14:37:31 +00:00
Gabriel Wicke c2b69e2486 Clean up newline handling. Emit a NEWLINE token for each
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke abc2254110 A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke 92fdf99384 Further renaming, this time from pegParser to pegTokenizer. 2011-12-08 10:59:44 +00:00
Gabriel Wicke 76bc477038 Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke 19a1f0850f Tidy up the grammar a bit. 2011-12-08 10:33:23 +00:00
Gabriel Wicke 3742d70abd Add some documentation to syntax flags 2011-12-07 15:54:55 +00:00
Gabriel Wicke 545ca1809f Convert template argument production to generic inline with syntactic stop.
Fix a bug in generic inline production. Nested multi-line templates are now
parsed okayish.
2011-12-07 15:39:39 +00:00
Gabriel Wicke 902db40a1f Process template arguments into an object. 2011-12-07 14:46:07 +00:00
Gabriel Wicke 51a40e4dbc Follow-up to r105423: Fix off-by-one bug. 2011-12-07 11:56:12 +00:00
Gabriel Wicke 49c286a67b Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve
tag-only-line handling. 180 parser tests now passing.
2011-12-07 11:51:24 +00:00
Gabriel Wicke 418a5067c6 Parse attributes in tables using generic attribute production. Some table
tests still do not pass as the MW table output reorders attributes ;)
2011-12-06 22:03:21 +00:00
Gabriel Wicke 3d06707152 Slightly speed up inline tag productions using guards and grouping; Fix list
processing function.
2011-12-06 18:35:05 +00:00
Gabriel Wicke ea8f226fd5 Remove ext and references special cases, now subsumed by generic XML tag
productions. Document issue around special tokenizer mode for other extension
tags.
2011-12-06 16:44:27 +00:00
Gabriel Wicke e7de089d5b Decode urls and html entities, 163 tests now passing. 2011-12-06 13:17:14 +00:00
Gabriel Wicke a72a9e55a3 Don't match internal links with url as target. 161 passing. 2011-12-06 12:26:57 +00:00
Gabriel Wicke 2b5cc67bf5 Further tweaks to headings. 157 tests now passing. 2011-12-06 11:59:41 +00:00
Gabriel Wicke f4d123886e Convert heading rules to single rule that figures out the level. This saves a
lot of backtracking and inline break complexity.
2011-12-06 11:06:05 +00:00
Gabriel Wicke 33e19f7275 Recognize block-level elements independent of case; Ignore toc and section
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Gabriel Wicke 9ed9cb31bd Fix template argument handling somewhat. 2011-12-05 17:58:11 +00:00
Gabriel Wicke 1760210d13 Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
by improperly caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Gabriel Wicke 63c728924b Use pegjs from npm 2011-12-01 15:23:23 +00:00
Antoine Musso 5ab379f479 fix vim modeline 2011-12-01 15:19:37 +00:00
Gabriel Wicke 0ce1e9fcf3 Add a quick html entity decoding hack, and document need for general decoder. 2011-12-01 14:39:55 +00:00
Gabriel Wicke d00743ad79 Improve external links and definition lists, now 133 tests passing ;)
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Gabriel Wicke 82e31ffd42 Do not allow newlines in various attributes 2011-11-30 15:12:53 +00:00
Gabriel Wicke 821162484e Allow inlines in the term part of ; term : definition 2011-11-30 14:53:28 +00:00
Gabriel Wicke f758894de7 Let another test pass by swapping the default order of italic/bold for '''''.
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke e0fca805a6 Expand tabs in grammar. 2011-11-30 13:42:26 +00:00
Gabriel Wicke 2bb512a4de A bit of tokenizer grammar clean-up and additional expected-html
normalization. 99 parser tests now passing.
2011-11-30 13:40:17 +00:00
Gabriel Wicke 127d8c8621 Simplify DOM paragraph wrapping postprocessor 2011-11-30 12:28:45 +00:00
Gabriel Wicke f0edc5cb9a Fix a few more tests by allowing inline content inside links. 76 now passing. 2011-11-29 18:43:27 +00:00
Gabriel Wicke ae0b5f9af4 * Split paragraph handling between tokenizer and DOM postprocessor for better
html markup handling. 
* Remove global 'use strict' declarations from html5 parser. 
* Add trailing whitespace handling in dt

Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke b16c295b98 Consider dl as a block-level element. 2011-11-28 16:54:58 +00:00
Gabriel Wicke d3f0196df7 Add primitive HTML comparison to detect passing parser tests. The expected
HTML is parsed using a HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
2011-11-28 11:10:39 +00:00
Gabriel Wicke 6b8c109cf0 Separate block-level tags in tokenizer to delimit inlines and avoid wrapping
block-level in paragraphs.
2011-11-25 17:41:26 +00:00
Gabriel Wicke 859379a635 Improvements to nowiki/pre interaction. Will need to distinguish block-level
tags from inline HTML tags next.
2011-11-25 15:02:44 +00:00
Gabriel Wicke dd5cd59ac6 Better HTML, pre and blocklevel handling. Hackish source formatting for easier
comparison with parserTest results.
2011-11-25 12:47:03 +00:00
Gabriel Wicke 5b3a4497aa Add generic HTML tokenization and nowiki handling. 2011-11-25 10:59:43 +00:00
Gabriel Wicke 6c36ddcbce Follow-up to r104164: Clean-up comments, remove old italic/bold productions. 2011-11-24 14:20:56 +00:00
Gabriel Wicke dee262658f Add MediaWiki-compatible quote handling including quirks and overlapped
structures like ''[[Link|Link text'']]. This is another transform on the token
stream.
2011-11-24 13:56:30 +00:00
Gabriel Wicke baf55875b9 Re-add modified wiki list handling to tokenizer. 2011-11-23 14:27:51 +00:00
Gabriel Wicke 694b998f24 Minor improvement to italic/bold, documentation on failed modularization of
static parser functions.
2011-11-22 16:51:05 +00:00
Gabriel Wicke d1b0293569 Fix comment token conversion and serialization 2011-11-21 09:22:30 +00:00
Gabriel Wicke 65afd9b610 Improve internal link handling 2011-11-18 14:48:32 +00:00
Gabriel Wicke d744e65c48 Add missing token adapter. 2011-11-18 14:00:14 +00:00
Gabriel Wicke b750ce38b8 Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer.
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.
2011-11-18 13:57:07 +00:00
Gabriel Wicke 11e487d8c0 Flatten inline token lists before merging text into text tokens. 2011-11-17 15:43:31 +00:00
Gabriel Wicke ea87e7aaee Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of
tokens, which for now is still completely built before parsing can proceed.
For each top-level block, the source start/end positions are added as
attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.
2011-11-17 15:26:02 +00:00
Gabriel Wicke ef3c84bd2e Extract text from inline elements for better testing. Slightly improved
handling of comment-only lines. Change pre to leaf content model.
2011-11-08 16:08:05 +00:00