Gabriel Wicke
80d5067813
Add a TokenTransformer dispatcher class. This class provides subscriptions by
token type, and supports asynchronous token expansion (for example for async
template expansion). This code is not yet tested or used. The interface for
token insertion from transformation functions will be expanded as needed.
2011-12-08 14:37:31 +00:00
Gabriel Wicke
c2b69e2486
Clean up newline handling. Emit a NEWLINE token for each
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke
abc2254110
A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke
92fdf99384
Further renaming, this time from pegParser to pegTokenizer.
2011-12-08 10:59:44 +00:00
Gabriel Wicke
76bc477038
Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke
19a1f0850f
Tidy up the grammar a bit.
2011-12-08 10:33:23 +00:00
Gabriel Wicke
3742d70abd
Add some documentation to syntax flags
2011-12-07 15:54:55 +00:00
Gabriel Wicke
545ca1809f
Convert template argument production to generic inline with syntactic stop.
Fix a bug in generic inline production. Nested multi-line templates are now
parsed okayish.
2011-12-07 15:39:39 +00:00
Gabriel Wicke
902db40a1f
Process template arguments into an object.
2011-12-07 14:46:07 +00:00
Gabriel Wicke
51a40e4dbc
Follow-up to r105423: Fix off-by-one bug.
2011-12-07 11:56:12 +00:00
Gabriel Wicke
49c286a67b
Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve
tag-only-line handling. 180 parser tests now passing.
2011-12-07 11:51:24 +00:00
Gabriel Wicke
418a5067c6
Parse attributes in tables using generic attribute production. Some table
tests still do not pass as the MW table output reorders attributes ;)
2011-12-06 22:03:21 +00:00
Gabriel Wicke
3d06707152
Slightly speed up inline tag productions using guards and grouping; Fix list
processing function.
2011-12-06 18:35:05 +00:00
Gabriel Wicke
ea8f226fd5
Remove ext and references special cases, now subsumed by generic XML tag
productions. Document issue around special tokenizer mode for other extension
tags.
2011-12-06 16:44:27 +00:00
Gabriel Wicke
e7de089d5b
Decode urls and html entities, 163 tests now passing.
2011-12-06 13:17:14 +00:00
Gabriel Wicke
a72a9e55a3
Don't match internal links with url as target. 161 passing.
2011-12-06 12:26:57 +00:00
Gabriel Wicke
2b5cc67bf5
Further tweaks to headings. 157 tests now passing.
2011-12-06 11:59:41 +00:00
Gabriel Wicke
f4d123886e
Convert heading rules to single rule that figures out the level. This saves a
lot of backtracking and inline break complexity.
2011-12-06 11:06:05 +00:00
Gabriel Wicke
33e19f7275
Recognize block-level elements independent of case; Ignore toc and section
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Gabriel Wicke
9ed9cb31bd
Fix template argument handling somewhat.
2011-12-05 17:58:11 +00:00
Gabriel Wicke
1760210d13
Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
by improper caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Gabriel Wicke
63c728924b
Use pegjs from npm
2011-12-01 15:23:23 +00:00
Antoine Musso
5ab379f479
fix vim modeline
2011-12-01 15:19:37 +00:00
Gabriel Wicke
0ce1e9fcf3
Add a quick html entity decoding hack, and document need for general decoder.
2011-12-01 14:39:55 +00:00
Gabriel Wicke
d00743ad79
Improve external links and definition lists, now 133 tests passing ;)
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Gabriel Wicke
82e31ffd42
Do not allow newlines in various attributes
2011-11-30 15:12:53 +00:00
Gabriel Wicke
821162484e
Allow inlines in the term part of ; term : definition
2011-11-30 14:53:28 +00:00
Gabriel Wicke
f758894de7
Let another test pass by swapping the default order of italic/bold for '''''.
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke
e0fca805a6
Expand tabs in grammar.
2011-11-30 13:42:26 +00:00
Gabriel Wicke
2bb512a4de
A bit of tokenizer grammar clean-up and additional expected-html
normalization. 99 parser tests now passing.
2011-11-30 13:40:17 +00:00
Gabriel Wicke
127d8c8621
Simplify DOM paragraph wrapping postprocessor
2011-11-30 12:28:45 +00:00
Gabriel Wicke
f0edc5cb9a
Fix a few more tests by allowing inline content inside links. 76 now passing.
2011-11-29 18:43:27 +00:00
Gabriel Wicke
ae0b5f9af4
* Split paragraph handling between tokenizer and DOM postprocessor for better
html markup handling.
* Remove global 'use strict' declarations from html5 parser.
* Add trailing whitespace handling in dt
Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke
b16c295b98
Consider dl as a block-level element.
2011-11-28 16:54:58 +00:00
Gabriel Wicke
d3f0196df7
Add primitive HTML comparison to detect passing parser tests. The expected
HTML is parsed using an HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
2011-11-28 11:10:39 +00:00
Gabriel Wicke
6b8c109cf0
Separate block-level tags in tokenizer to delimit inlines and avoid wrapping
block-level in paragraphs.
2011-11-25 17:41:26 +00:00
Gabriel Wicke
859379a635
Improvements to nowiki/pre interaction. Will need to distinguish block-level
tags from inline HTML tags next.
2011-11-25 15:02:44 +00:00
Gabriel Wicke
dd5cd59ac6
Better HTML, pre and blocklevel handling. Hackish source formatting for easier
comparison with parserTest results.
2011-11-25 12:47:03 +00:00
Gabriel Wicke
5b3a4497aa
Add generic HTML tokenization and nowiki handling.
2011-11-25 10:59:43 +00:00
Gabriel Wicke
6c36ddcbce
Follow-up to r104164: Clean-up comments, remove old italic/bold productions.
2011-11-24 14:20:56 +00:00
Gabriel Wicke
dee262658f
Add MediaWiki-compatible quote handling including quirks and overlapped
structures like ''[[Link|Link text'']]. This is another transform on the token
stream.
2011-11-24 13:56:30 +00:00
Gabriel Wicke
baf55875b9
Re-add modified wiki list handling to tokenizer.
2011-11-23 14:27:51 +00:00
Gabriel Wicke
694b998f24
Minor improvement to italic/bold, documentation on failed modularization of
static parser functions.
2011-11-22 16:51:05 +00:00
Gabriel Wicke
d1b0293569
Fix comment token conversion and serialization
2011-11-21 09:22:30 +00:00
Gabriel Wicke
65afd9b610
Improve internal link handling
2011-11-18 14:48:32 +00:00
Gabriel Wicke
d744e65c48
Add missing token adapter.
2011-11-18 14:00:14 +00:00
Gabriel Wicke
b750ce38b8
Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer.
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.
2011-11-18 13:57:07 +00:00
Gabriel Wicke
11e487d8c0
Flatten inline token lists before merging text into text tokens.
2011-11-17 15:43:31 +00:00
Gabriel Wicke
ea87e7aaee
Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of
tokens, which for now is still completely built before parsing can proceed.
For each top-level block, the source start/end positions are added as
attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.
2011-11-17 15:26:02 +00:00
Gabriel Wicke
ef3c84bd2e
Extract text from inline elements for better testing. Slightly improved
handling of comment-only lines. Change pre to leaf content model.
2011-11-08 16:08:05 +00:00