Commit graph

61 commits

Author SHA1 Message Date
Gabriel Wicke 76bc477038 Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to TreeBuilder. 2011-12-08 10:37:18 +00:00
Gabriel Wicke 19a1f0850f Tidy up the grammar a bit. 2011-12-08 10:33:23 +00:00
Gabriel Wicke 3742d70abd Add some documentation to syntax flags 2011-12-07 15:54:55 +00:00
Gabriel Wicke 545ca1809f Convert template argument production to generic inline with syntactic stop. Fix a bug in generic inline production. Nested multi-line templates are now parsed okayish. 2011-12-07 15:39:39 +00:00
Gabriel Wicke 902db40a1f Process template arguments into an object. 2011-12-07 14:46:07 +00:00
Gabriel Wicke 51a40e4dbc Follow-up to r105423: Fix off-by-one bug. 2011-12-07 11:56:12 +00:00
Gabriel Wicke 49c286a67b Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve tag-only-line handling. 180 parser tests now passing. 2011-12-07 11:51:24 +00:00
Gabriel Wicke 418a5067c6 Parse attributes in tables using generic attribute production. Some table tests still do not pass as the MW table output reorders attributes ;) 2011-12-06 22:03:21 +00:00
Gabriel Wicke 3d06707152 Slightly speed up inline tag productions using guards and grouping; Fix list processing function. 2011-12-06 18:35:05 +00:00
Gabriel Wicke ea8f226fd5 Remove ext and references special cases, now subsumed by generic XML tag productions. Document issue around special tokenizer mode for other extension tags. 2011-12-06 16:44:27 +00:00
Gabriel Wicke e7de089d5b Decode urls and html entities, 163 tests now passing. 2011-12-06 13:17:14 +00:00
Gabriel Wicke a72a9e55a3 Don't match internal links with url as target. 161 passing. 2011-12-06 12:26:57 +00:00
Gabriel Wicke 2b5cc67bf5 Further tweaks to headings. 157 tests now passing. 2011-12-06 11:59:41 +00:00
Gabriel Wicke f4d123886e Convert heading rules to single rule that figures out the level. This saves a lot of backtracking and inline break complexity. 2011-12-06 11:06:05 +00:00
Gabriel Wicke 33e19f7275 Recognize block-level elements independent of case; Ignore toc and section edit links in tests. 148 parser tests passing. 2011-12-05 20:03:24 +00:00
Gabriel Wicke 9ed9cb31bd Fix template argument handling somewhat. 2011-12-05 17:58:11 +00:00
Gabriel Wicke 1760210d13 Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused by improper caching of production results, which interfered with the flag-dependent inline_break production. 2011-12-04 19:23:24 +00:00
Gabriel Wicke 63c728924b Use pegjs from npm 2011-12-01 15:23:23 +00:00
Antoine Musso 5ab379f479 fix vim modeline 2011-12-01 15:19:37 +00:00
Gabriel Wicke 0ce1e9fcf3 Add a quick html entity decoding hack, and document need for general decoder. 2011-12-01 14:39:55 +00:00
Gabriel Wicke d00743ad79 Improve external links and definition lists, now 133 tests passing ;) Also add printwhitelist option to test runner, which provides JS code that can be copy/pasted into the whitelist. 2011-12-01 14:25:59 +00:00
Gabriel Wicke 82e31ffd42 Do not allow newlines in various attributes 2011-11-30 15:12:53 +00:00
Gabriel Wicke 821162484e Allow inlines in the term part of ; term : definition 2011-11-30 14:53:28 +00:00
Gabriel Wicke f758894de7 Let another test pass by swapping the default order of italic/bold for '''''. Minor test output cosmetics. 2011-11-30 13:54:57 +00:00
Gabriel Wicke e0fca805a6 Expand tabs in grammar. 2011-11-30 13:42:26 +00:00
Gabriel Wicke 2bb512a4de A bit of tokenizer grammar clean-up and additional expected-html normalization. 99 parser tests now passing. 2011-11-30 13:40:17 +00:00
Gabriel Wicke 127d8c8621 Simplify DOM paragraph wrapping postprocessor 2011-11-30 12:28:45 +00:00
Gabriel Wicke f0edc5cb9a Fix a few more tests by allowing inline content inside links. 76 now passing. 2011-11-29 18:43:27 +00:00
Gabriel Wicke ae0b5f9af4 * Split paragraph handling between tokenizer and DOM postprocessor for better html markup handling.
* Remove global 'use strict' declarations from html5 parser.
* Add trailing whitespace handling in dt

Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke b16c295b98 Consider dl as a block-level element. 2011-11-28 16:54:58 +00:00
Gabriel Wicke d3f0196df7 Add primitive HTML comparison to detect passing parser tests. The expected HTML is parsed using an HTML parser and re-serialized, and the output is compared to the serialization of the new parser's DOM. Newline normalization is a cheap hack for now, need to improve that later. 2011-11-28 11:10:39 +00:00
Gabriel Wicke 6b8c109cf0 Separate block-level tags in tokenizer to delimit inlines and avoid wrapping block-level in paragraphs. 2011-11-25 17:41:26 +00:00
Gabriel Wicke 859379a635 Improvements to nowiki/pre interaction. Will need to distinguish block-level tags from inline HTML tags next. 2011-11-25 15:02:44 +00:00
Gabriel Wicke dd5cd59ac6 Better HTML, pre and blocklevel handling. Hackish source formatting for easier comparison with parserTest results. 2011-11-25 12:47:03 +00:00
Gabriel Wicke 5b3a4497aa Add generic HTML tokenization and nowiki handling. 2011-11-25 10:59:43 +00:00
Gabriel Wicke 6c36ddcbce Follow-up to r104164: Clean-up comments, remove old italic/bold productions. 2011-11-24 14:20:56 +00:00
Gabriel Wicke dee262658f Add MediaWiki-compatible quote handling including quirks and overlapped structures like ''[[Link|Link text'']]. This is another transform on the token stream. 2011-11-24 13:56:30 +00:00
Gabriel Wicke baf55875b9 Re-add modified wiki list handling to tokenizer. 2011-11-23 14:27:51 +00:00
Gabriel Wicke 694b998f24 Minor improvement to italic/bold, documentation on failed modularization of static parser functions. 2011-11-22 16:51:05 +00:00
Gabriel Wicke d1b0293569 Fix comment token conversion and serialization 2011-11-21 09:22:30 +00:00
Gabriel Wicke 65afd9b610 Improve internal link handling 2011-11-18 14:48:32 +00:00
Gabriel Wicke d744e65c48 Add missing token adapter. 2011-11-18 14:00:14 +00:00
Gabriel Wicke b750ce38b8 Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer. Builds a DOM tree (jsdom) from the tokens and then serializes that using document.innerHTML. This is all very experimental, so don't be surprised by rough edges. 2011-11-18 13:57:07 +00:00
Gabriel Wicke 11e487d8c0 Flatten inline token lists before merging text into text tokens. 2011-11-17 15:43:31 +00:00
Gabriel Wicke ea87e7aaee Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of tokens, which for now is still completely built before parsing can proceed. For each top-level block, the source start/end positions are added as attributes to the top-most tokens. No tracking of wiki vs. html syntax yet. 2011-11-17 15:26:02 +00:00
Gabriel Wicke ef3c84bd2e Extract text from inline elements for better testing. Slightly improved handling of comment-only lines. Change pre to leaf content model. 2011-11-08 16:08:05 +00:00
Gabriel Wicke 18ead89b37 Improved paragraph, br, comment parsing and switched headings to generic inlineline with syntactic flags. 2011-11-07 23:09:30 +00:00
Gabriel Wicke 944d010eb2 Indentation cleanup in PEG parser and Html serializer 2011-11-07 21:05:37 +00:00
Gabriel Wicke c3a0c56e56 rename definition{term,description} to just {term,description} 2011-11-07 20:36:34 +00:00
Gabriel Wicke 71891131c3 Grammar improvements
* replaced regexp stack with a set of break rules for inline content within
  specialized parse contexts, switched more rules to generic
  inlineline/inline/block rules.
* don't consume end-of-line for proper start-of-line matching
* added some pre support
* still no conversion of inline elements to annotations
2011-11-07 14:39:12 +00:00