Gabriel Wicke
c61b32eaa7
Clean up and comment the Cite extension a bit.
2011-12-13 18:45:09 +00:00
Gabriel Wicke
feee9ded9f
Convert the Cite extension to a token stream transformer.
...
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so should be
broken out to an intermediate layer in the future.
In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
Gabriel Wicke
8e55e79b67
Rename TokenTransformer to TokenTransformDispatcher.
2011-12-13 11:45:12 +00:00
Gabriel Wicke
8231511217
Replace custom object copy with $.extend.
2011-12-13 11:18:15 +00:00
Gabriel Wicke
39aedd4378
Improve comments in QuoteTransformer.
2011-12-13 10:25:18 +00:00
Gabriel Wicke
0ad08b9ae3
Add a README file pointing to the wiki documentation.
2011-12-12 22:30:11 +00:00
Gabriel Wicke
a8fa9433c4
Convert quote handling (italic/bold) to a core extension operating on the
...
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitation and extensions should be able to use the same
dispatcher by registering for specific token types.
The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke
752b0990b2
Refactor parserTests somewhat into a class-like structure, and wire up the
...
TokenTransformer.
2011-12-12 14:03:54 +00:00
Gabriel Wicke
d616f07a79
Don't re-build the wiki tokenizer for each test. This speeds up the full
...
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)
A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
Gabriel Wicke
89c5e0cafb
Follow-up to r105859: Add missing new.
2011-12-12 10:09:13 +00:00
Gabriel Wicke
9ebce5839a
Further development of the TokenTransformer framework.
2011-12-12 10:01:47 +00:00
Gabriel Wicke
80d5067813
Add a TokenTransformer dispatcher class. This class provides subscriptions by
...
token type, and supports asynchronous token expansion (for example for async
template expansion). This code is not yet tested or used. The interface for
token insertion from transformation functions will be expanded as needed.
2011-12-08 14:37:31 +00:00
Gabriel Wicke
c2b69e2486
Clean up newline handling. Emit a NEWLINE token for each
...
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke
abc2254110
A bit of comment clean-up and wrapping of tree building into try/catch block
...
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke
92fdf99384
Further renaming, this time from pegParser to pegTokenizer.
2011-12-08 10:59:44 +00:00
Gabriel Wicke
76bc477038
Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
...
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke
19a1f0850f
Tidy up the grammar a bit.
2011-12-08 10:33:23 +00:00
Gabriel Wicke
3742d70abd
Add some documentation to syntax flags
2011-12-07 15:54:55 +00:00
Gabriel Wicke
545ca1809f
Convert template argument production to generic inline with syntactic stop.
...
Fix a bug in generic inline production. Nested multi-line templates are now
parsed okayish.
2011-12-07 15:39:39 +00:00
Gabriel Wicke
902db40a1f
Process template arguments into an object.
2011-12-07 14:46:07 +00:00
Gabriel Wicke
51a40e4dbc
Follow-up to r105423: Fix off-by-one bug.
2011-12-07 11:56:12 +00:00
Gabriel Wicke
49c286a67b
Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve
...
tag-only-line handling. 180 parser tests now passing.
2011-12-07 11:51:24 +00:00
Gabriel Wicke
418a5067c6
Parse attributes in tables using generic attribute production. Some table
...
tests still do not pass as the MW table output reorders attributes ;)
2011-12-06 22:03:21 +00:00
Gabriel Wicke
3d06707152
Slightly speed up inline tag productions using guards and grouping; Fix list
...
processing function.
2011-12-06 18:35:05 +00:00
Gabriel Wicke
ea8f226fd5
Remove ext and references special cases, now subsumed by generic XML tag
...
productions. Document issue around special tokenizer mode for other extension
tags.
2011-12-06 16:44:27 +00:00
Gabriel Wicke
e7de089d5b
Decode urls and html entities, 163 tests now passing.
2011-12-06 13:17:14 +00:00
Gabriel Wicke
a72a9e55a3
Don't match internal links with url as target. 161 passing.
2011-12-06 12:26:57 +00:00
Gabriel Wicke
2b5cc67bf5
Further tweaks to headings. 157 tests now passing.
2011-12-06 11:59:41 +00:00
Gabriel Wicke
f4d123886e
Convert heading rules to single rule that figures out the level. This saves a
...
lot of backtracking and inline break complexity.
2011-12-06 11:06:05 +00:00
Gabriel Wicke
33e19f7275
Recognize block-level elements independent of case; Ignore toc and section
...
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Gabriel Wicke
9ed9cb31bd
Fix template argument handling somewhat.
2011-12-05 17:58:11 +00:00
Gabriel Wicke
1760210d13
Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
...
by improperly caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Gabriel Wicke
63c728924b
Use pegjs from npm
2011-12-01 15:23:23 +00:00
Antoine Musso
5ab379f479
fix vim modeline
2011-12-01 15:19:37 +00:00
Gabriel Wicke
0ce1e9fcf3
Add a quick html entity decoding hack, and document need for general decoder.
2011-12-01 14:39:55 +00:00
Gabriel Wicke
d00743ad79
Improve external links and definition lists, now 133 tests passing ;)
...
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Gabriel Wicke
82e31ffd42
Do not allow newlines in various attributes
2011-11-30 15:12:53 +00:00
Gabriel Wicke
821162484e
Allow inlines in the term part of ; term : definition
2011-11-30 14:53:28 +00:00
Gabriel Wicke
f758894de7
Let another test pass by swapping the default order of italic/bold for '''''.
...
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke
e0fca805a6
Expand tabs in grammar.
2011-11-30 13:42:26 +00:00
Gabriel Wicke
2bb512a4de
A bit of tokenizer grammar clean-up and additional expected-html
...
normalization. 99 parser tests now passing.
2011-11-30 13:40:17 +00:00
Gabriel Wicke
127d8c8621
Simplify DOM paragraph wrapping postprocessor
2011-11-30 12:28:45 +00:00
Gabriel Wicke
f0edc5cb9a
Fix a few more tests by allowing inline content inside links. 76 now passing.
2011-11-29 18:43:27 +00:00
Gabriel Wicke
ae0b5f9af4
* Split paragraph handling between tokenizer and DOM postprocessor for better
...
html markup handling.
* Remove global 'use strict' declarations from html5 parser.
* Add trailing whitespace handling in dt
Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke
b16c295b98
Consider dl as a block-level element.
2011-11-28 16:54:58 +00:00
Gabriel Wicke
d3f0196df7
Add primitive HTML comparison to detect passing parser tests. The expected
...
HTML is parsed using a HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
2011-11-28 11:10:39 +00:00
Gabriel Wicke
6b8c109cf0
Separate block-level tags in tokenizer to delimit inlines and avoid wrapping
...
block-level in paragraphs.
2011-11-25 17:41:26 +00:00
Gabriel Wicke
859379a635
Improvements to nowiki/pre interaction. Will need to distinguish block-level
...
tags from inline HTML tags next.
2011-11-25 15:02:44 +00:00
Gabriel Wicke
dd5cd59ac6
Better HTML, pre and blocklevel handling. Hackish source formatting for easier
...
comparison with parserTest results.
2011-11-25 12:47:03 +00:00
Gabriel Wicke
5b3a4497aa
Add generic HTML tokenization and nowiki handling.
2011-11-25 10:59:43 +00:00
Gabriel Wicke
6c36ddcbce
Follow-up to r104164: Clean-up comments, remove old italic/bold productions.
2011-11-24 14:20:56 +00:00
Gabriel Wicke
dee262658f
Add MediaWiki-compatible quote handling including quirks and overlapped
...
structures like ''[[Link|Link text'']]. This is another transform on the token
stream.
2011-11-24 13:56:30 +00:00
Gabriel Wicke
baf55875b9
Re-add modified wiki list handling to tokenizer.
2011-11-23 14:27:51 +00:00
Gabriel Wicke
694b998f24
Minor improvement to italic/bold, documentation on failed modularization of
...
static parser functions.
2011-11-22 16:51:05 +00:00
Gabriel Wicke
d1b0293569
Fix comment token conversion and serialization
2011-11-21 09:22:30 +00:00
Gabriel Wicke
65afd9b610
Improve internal link handling
2011-11-18 14:48:32 +00:00
Gabriel Wicke
d744e65c48
Add missing token adapter.
2011-11-18 14:00:14 +00:00
Gabriel Wicke
b750ce38b8
Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer.
...
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.
2011-11-18 13:57:07 +00:00
Gabriel Wicke
11e487d8c0
Flatten inline token lists before merging text into text tokens.
2011-11-17 15:43:31 +00:00
Gabriel Wicke
ea87e7aaee
Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of
...
tokens, which for now is still completely built before parsing can proceed.
For each top-level block, the source start/end positions are added as
attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.
2011-11-17 15:26:02 +00:00
Gabriel Wicke
ef3c84bd2e
Extract text from inline elements for better testing. Slightly improved
...
handling of comment-only lines. Change pre to leaf content model.
2011-11-08 16:08:05 +00:00
Gabriel Wicke
18ead89b37
Improved paragraph, br, comment parsing and switched headings to
...
generic inlineline with syntactic flags.
2011-11-07 23:09:30 +00:00
Gabriel Wicke
944d010eb2
Indentation cleanup in PEG parser and Html serializer
2011-11-07 21:05:37 +00:00
Gabriel Wicke
c3a0c56e56
rename definition{term,description} to just {term,description}
2011-11-07 20:36:34 +00:00
Gabriel Wicke
71891131c3
Grammar improvements
...
* replaced regexp stack with a set of break rules for inline content within
specialized parse contexts, switched more rules to generic
inlineline/inline/block rules.
* don't consume end-of-line for proper start-of-line matching
* added some pre support
* still no conversion of inline elements to annotations
2011-11-07 14:39:12 +00:00
Gabriel Wicke
06ca9f12fe
Rename definitiondata to definitiondescription, minor fixes
2011-11-04 12:25:01 +00:00
Gabriel Wicke
7e5c196732
Some more progress for tables and definition lists
2011-11-04 12:06:49 +00:00
Gabriel Wicke
83a80bad49
Fixes for definition lists
2011-11-04 11:08:11 +00:00
Gabriel Wicke
85def70a8a
Add basic list serialization to HtmlSerializer
...
* Added 'definitionterm' and 'definitiondata' styles to support definition
lists, and special-case handling in the serializer to wrap both in dls.
2011-11-04 10:02:59 +00:00
Gabriel Wicke
63398b5749
Update parserTests to latest serializers
2011-11-04 07:45:05 +00:00
Gabriel Wicke
a8838dab18
Start by handling paragraphs, at least a bit.
2011-11-03 15:16:05 +00:00
Gabriel Wicke
0d30a5528e
First combination of WikiDom serializers with existing parser in
...
tests/parser/parserTests.js.
* Removed var from es in es.js to allow node.js to access it as global. Only
alternative solution appears to be a node-specific 'exports' construct:
http://nodejs.org/docs/v0.3.1/api/modules.html
* Added es.Document.js and es.Document.Serializer.js in es/bases. Not sure if
this is the desired location.
* Changed es.extend to es.extendClass in the serializers
* Modified the first parser test to include the WikiDom modules and call the
new HTML serializer
2011-11-03 13:55:48 +00:00
Trevor Parscal
5bae153214
Moving parser stuff back into the modules folder (oops)
2011-11-02 21:45:57 +00:00
Trevor Parscal
2b499d5990
Reorganized modules by javascript namespace
2011-11-02 21:31:45 +00:00
Brion Vibber
213ee7d4a8
followup r101685: the peg definition
2011-11-02 21:09:19 +00:00
Brion Vibber
56a75ccca7
Copy several of the experimental JS parser bits from ParserPlayground to VisualEditor. They'll need retooling to hook up with the wikidom stuff.
2011-11-02 21:07:51 +00:00