Gabriel Wicke
c2b69e2486
Clean up newline handling. Emit a NEWLINE token for each
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke
abc2254110
A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke
92fdf99384
Further renaming, this time from pegParser to pegTokenizer.
2011-12-08 10:59:44 +00:00
Gabriel Wicke
76bc477038
Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke
19a1f0850f
Tidy up the grammar a bit.
2011-12-08 10:33:23 +00:00
Gabriel Wicke
3742d70abd
Add some documentation to syntax flags
2011-12-07 15:54:55 +00:00
Gabriel Wicke
545ca1809f
Convert template argument production to generic inline with syntactic stop.
Fix a bug in generic inline production. Nested multi-line templates are now
parsed okayish.
2011-12-07 15:39:39 +00:00
Gabriel Wicke
902db40a1f
Process template arguments into an object.
2011-12-07 14:46:07 +00:00
Gabriel Wicke
51a40e4dbc
Follow-up to r105423: Fix off-by-one bug.
2011-12-07 11:56:12 +00:00
Gabriel Wicke
49c286a67b
Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve
tag-only-line handling. 180 parser tests now passing.
2011-12-07 11:51:24 +00:00
Gabriel Wicke
418a5067c6
Parse attributes in tables using generic attribute production. Some table
tests still do not pass as the MW table output reorders attributes ;)
2011-12-06 22:03:21 +00:00
Gabriel Wicke
3d06707152
Slightly speed up inline tag productions using guards and grouping; Fix list
processing function.
2011-12-06 18:35:05 +00:00
Gabriel Wicke
ea8f226fd5
Remove ext and references special cases, now subsumed by generic XML tag
productions. Document issue around special tokenizer mode for other extension
tags.
2011-12-06 16:44:27 +00:00
Gabriel Wicke
e7de089d5b
Decode urls and html entities, 163 tests now passing.
2011-12-06 13:17:14 +00:00
Gabriel Wicke
a72a9e55a3
Don't match internal links with url as target. 161 passing.
2011-12-06 12:26:57 +00:00
Gabriel Wicke
2b5cc67bf5
Further tweaks to headings. 157 tests now passing.
2011-12-06 11:59:41 +00:00
Gabriel Wicke
f4d123886e
Convert heading rules to single rule that figures out the level. This saves a
lot of backtracking and inline break complexity.
2011-12-06 11:06:05 +00:00
Gabriel Wicke
33e19f7275
Recognize block-level elements independent of case; Ignore toc and section
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Gabriel Wicke
9ed9cb31bd
Fix template argument handling somewhat.
2011-12-05 17:58:11 +00:00
Gabriel Wicke
1760210d13
Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
by improper caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Gabriel Wicke
63c728924b
Use pegjs from npm
2011-12-01 15:23:23 +00:00
Antoine Musso
5ab379f479
fix vim modeline
2011-12-01 15:19:37 +00:00
Gabriel Wicke
0ce1e9fcf3
Add a quick html entity decoding hack, and document need for general decoder.
2011-12-01 14:39:55 +00:00
Gabriel Wicke
d00743ad79
Improve external links and definition lists, now 133 tests passing ;)
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Gabriel Wicke
82e31ffd42
Do not allow newlines in various attributes
2011-11-30 15:12:53 +00:00
Gabriel Wicke
821162484e
Allow inlines in the term part of ; term : definition
2011-11-30 14:53:28 +00:00
Gabriel Wicke
f758894de7
Let another test pass by swapping the default order of italic/bold for '''''.
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke
e0fca805a6
Expand tabs in grammar.
2011-11-30 13:42:26 +00:00
Gabriel Wicke
2bb512a4de
A bit of tokenizer grammar clean-up and additional expected-html
normalization. 99 parser tests now passing.
2011-11-30 13:40:17 +00:00
Gabriel Wicke
127d8c8621
Simplify DOM paragraph wrapping postprocessor
2011-11-30 12:28:45 +00:00
Gabriel Wicke
f0edc5cb9a
Fix a few more tests by allowing inline content inside links. 76 now passing.
2011-11-29 18:43:27 +00:00
Gabriel Wicke
ae0b5f9af4
* Split paragraph handling between tokenizer and DOM postprocessor for better
html markup handling.
* Remove global 'use strict' declarations from html5 parser.
* Add trailing whitespace handling in dt
Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke
b16c295b98
Consider dl as a block-level element.
2011-11-28 16:54:58 +00:00
Gabriel Wicke
d3f0196df7
Add primitive HTML comparison to detect passing parser tests. The expected
HTML is parsed using an HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
2011-11-28 11:10:39 +00:00
Gabriel Wicke
6b8c109cf0
Separate block-level tags in tokenizer to delimit inlines and avoid wrapping
block-level in paragraphs.
2011-11-25 17:41:26 +00:00
Gabriel Wicke
859379a635
Improvements to nowiki/pre interaction. Will need to distinguish block-level
tags from inline HTML tags next.
2011-11-25 15:02:44 +00:00
Gabriel Wicke
dd5cd59ac6
Better HTML, pre and blocklevel handling. Hackish source formatting for easier
comparison with parserTest results.
2011-11-25 12:47:03 +00:00
Gabriel Wicke
5b3a4497aa
Add generic HTML tokenization and nowiki handling.
2011-11-25 10:59:43 +00:00
Gabriel Wicke
6c36ddcbce
Follow-up to r104164: Clean-up comments, remove old italic/bold productions.
2011-11-24 14:20:56 +00:00
Gabriel Wicke
dee262658f
Add MediaWiki-compatible quote handling including quirks and overlapped
structures like ''[[Link|Link text'']]. This is another transform on the token
stream.
2011-11-24 13:56:30 +00:00
Gabriel Wicke
baf55875b9
Re-add modified wiki list handling to tokenizer.
2011-11-23 14:27:51 +00:00
Gabriel Wicke
694b998f24
Minor improvement to italic/bold, documentation on failed modularization of
static parser functions.
2011-11-22 16:51:05 +00:00
Gabriel Wicke
d1b0293569
Fix comment token conversion and serialization
2011-11-21 09:22:30 +00:00
Gabriel Wicke
65afd9b610
Improve internal link handling
2011-11-18 14:48:32 +00:00
Gabriel Wicke
d744e65c48
Add missing token adapter.
2011-11-18 14:00:14 +00:00
Gabriel Wicke
b750ce38b8
Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer.
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.
2011-11-18 13:57:07 +00:00
Gabriel Wicke
11e487d8c0
Flatten inline token lists before merging text into text tokens.
2011-11-17 15:43:31 +00:00
Gabriel Wicke
ea87e7aaee
Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of
tokens, which for now is still completely built before parsing can proceed.
For each top-level block, the source start/end positions are added as
attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.
2011-11-17 15:26:02 +00:00
Gabriel Wicke
ef3c84bd2e
Extract text from inline elements for better testing. Slightly improved
handling of comment-only lines. Change pre to leaf content model.
2011-11-08 16:08:05 +00:00
Gabriel Wicke
18ead89b37
Improved paragraph, br, comment parsing and switched headings to
generic inlineline with syntactic flags.
2011-11-07 23:09:30 +00:00
Gabriel Wicke
944d010eb2
Indentation cleanup in PEG parser and Html serializer
2011-11-07 21:05:37 +00:00
Gabriel Wicke
c3a0c56e56
rename definition{term,description} to just {term,description}
2011-11-07 20:36:34 +00:00
Gabriel Wicke
71891131c3
Grammar improvements
* replaced regexp stack with a set of break rules for inline content within
specialized parse contexts, switched more rules to generic
inlineline/inline/block rules.
* don't consume end-of-line for proper start-of-line matching
* added some pre support
* still no conversion of inline elements to annotations
2011-11-07 14:39:12 +00:00
Gabriel Wicke
06ca9f12fe
Rename definitiondata to definitiondescription, minor fixes
2011-11-04 12:25:01 +00:00
Gabriel Wicke
7e5c196732
Some more progress for tables and definition lists
2011-11-04 12:06:49 +00:00
Gabriel Wicke
83a80bad49
Fixes for definition lists
2011-11-04 11:08:11 +00:00
Gabriel Wicke
85def70a8a
Add basic list serialization to HtmlSerializer
* Added 'definitionterm' and 'definitiondata' styles to support definition
lists, and special-case handling in the serializer to wrap both in dls.
2011-11-04 10:02:59 +00:00
Gabriel Wicke
63398b5749
Update parserTests to latest serializers
2011-11-04 07:45:05 +00:00
Gabriel Wicke
a8838dab18
Start by handling paragraphs, at least a bit.
2011-11-03 15:16:05 +00:00
Gabriel Wicke
0d30a5528e
First combination of WikiDom serializers with existing parser in
tests/parser/parserTests.js.
* Removed var from es in es.js to allow node.js to access it as global. Only
alternative solution appears to be a node-specific 'exports' construct:
http://nodejs.org/docs/v0.3.1/api/modules.html
* Added es.Document.js and es.Document.Serializer.js in es/bases. Not sure if
this is the desired location.
* Changed es.extend to es.extendClass in the serializers
* Modified the first parser test to include the WikiDom modules and call the
new HTML serializer
2011-11-03 13:55:48 +00:00
Trevor Parscal
5bae153214
Moving parser stuff back into the modules folder (oops)
2011-11-02 21:45:57 +00:00
Trevor Parscal
2b499d5990
Reorganized modules by javascript namespace
2011-11-02 21:31:45 +00:00
Brion Vibber
213ee7d4a8
followup r101685: the peg definition
2011-11-02 21:09:19 +00:00
Brion Vibber
56a75ccca7
Copy several of the experimental JS parser bits from ParserPlayground to VisualEditor. They'll need retooling to hook up with the wikidom stuff.
2011-11-02 21:07:51 +00:00