22 commits

Author SHA1 Message Date
Gabriel Wicke bd98eb4c5a Land big TokenTransformDispatcher and eventization refactoring.
The TokenTransformDispatcher now actually implements an asynchronous, phased
token transformation framework as described in
https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations.

Additionally, the parser pipeline is now mostly held together using events.
The tokenizer still emits a lame single event with all tokens, as block-level
emission failed with scoping issues specific to the PEGJS parser generator.
All stages clean up when receiving the end tokens, so that the full pipeline
can be used for repeated parsing.

The QuoteTransformer is not yet 100% fixed to work with the new interface, and
the Cite extension is disabled for now pending adaptation. Bold-italic related
tests are failing currently.
2012-01-03 18:44:31 +00:00
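A minimal sketch of the event wiring and end-of-input cleanup described in the commit above, assuming Node's built-in `events` module. The class names `SketchTokenizer` and `SketchCollector` and the event names `chunk`, `end` and `document` are hypothetical and are not taken from the actual code. The sketch is synchronous and does not model the asynchronous, phased transformations.

```javascript
const { EventEmitter } = require('events');

// Hypothetical tokenizer stand-in: emits one event with all tokens,
// followed by an end event.
class SketchTokenizer extends EventEmitter {
  process(text) {
    const tokens = text
      .split(/\s+/)
      .filter(Boolean)
      .map((word) => ({ type: 'TEXT', value: word }));
    this.emit('chunk', tokens);
    this.emit('end');
  }
}

// Hypothetical downstream stage: buffers tokens and resets its state when
// the end event arrives, so the same pipeline can parse another input.
class SketchCollector extends EventEmitter {
  constructor(source) {
    super();
    this.buffer = [];
    source.on('chunk', (tokens) => {
      this.buffer.push(...tokens);
    });
    source.on('end', () => {
      const result = this.buffer;
      this.buffer = []; // clean up for the next parse
      this.emit('document', result);
    });
  }
}
```

Because the collector clears its buffer on the end event, calling `process` twice on the same tokenizer yields two independent `document` events rather than one that accumulates tokens from both inputs.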
Gabriel Wicke 8e00a72d0a Improvements to link trail handling, and two tweaks to the whitelist. 182
tests now passing. 

Link trails depend on language-dependent positive character classes in the PHP
parser. These classes all seem to disallow punctuation implicitly and list
differing plain text characters instead, so it might be possible to get away
with identifying a common class of non-trail punctuation instead. This would
help to keep the tokenizer independent of configurations, which is very
desirable for caching and simplified external parsing.
2011-12-30 12:47:06 +00:00
Gabriel Wicke 11ece76b7b Fix suffix handling for wiki links. 2011-12-30 09:35:57 +00:00
Gabriel Wicke 33e60dd4d9 Update comments a bit. 2011-12-22 12:37:24 +00:00
Gabriel Wicke 9ee0e660ec Fix regression introduced by r107060 for regular table cells. Good to have a
test suite ;)
2011-12-22 12:09:25 +00:00
Gabriel Wicke a94d0ec10c Re-add support for row-only tables. 2011-12-22 11:58:32 +00:00
Gabriel Wicke 1c7fe0eb34 Refactor table productions to support table fragments in templates (table
start / row / table end). The old productions are not deleted yet to make it
easy to compare the output on more complex articles. 181 tests passing after
adding two table tests with whitespace-only differences to the whitelist.
2011-12-22 11:43:55 +00:00
Gabriel Wicke 2845ba9552 Handle noinclude and includeonly at start of line, so that syntax after it
still matches as if it actually was preceded by a newline.
2011-12-21 11:38:50 +00:00
Gabriel Wicke cc06551f2e Rename table_header production to table_heading. Those non-natives strike again. 2011-12-16 19:24:59 +00:00
Gabriel Wicke 605ed23fd2 Fix attributes in table headings. 2011-12-16 19:22:13 +00:00
Gabriel Wicke a04744b2ec Add some more attribute remapping capabilities to the DOMConverter, and clean
up some grammar formatting.
2011-12-15 17:33:07 +00:00
Gabriel Wicke 3585bd9c8e Accept row-only tables. The parser now eats [[en:Barack Obama]] as-is. Hooray! 2011-12-15 00:39:28 +00:00
Gabriel Wicke 6df94a34a1 Less lust for urls 2011-12-15 00:26:22 +00:00
Gabriel Wicke ce2ee067f7 Minor tweak to wiki link production 2011-12-15 00:12:58 +00:00
Gabriel Wicke 574abd9774 A collection of small bug fixes to the grammar, Cite, the Token format
converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all
parserTests.
2011-12-14 23:38:46 +00:00
Gabriel Wicke dc77d73ad5 Add ability to pass through JSON data to WikiDom in data-json-* attributes,
and fix parser to actually parse the Barack Obama article except for one table
with nested templates at the start-of-line.
2011-12-14 17:25:09 +00:00
Gabriel Wicke feee9ded9f Convert the Cite extension to a token stream transformer.
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so it should be
broken out to an intermediate layer in the future.

In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
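The 'any' match described in the commit above can be illustrated with a small sketch. All names here (`SketchDispatcher`, `addTransform`, `registerRefEater`, and the token shapes with `type` and `name` fields) are hypothetical and are not the actual TokenTransformDispatcher API. The sketch only shows the ordering rule (callbacks registered for 'any' run before type-specific ones) and how a transformer could use it to swallow tokens between ref and /ref.

```javascript
// Hypothetical dispatcher sketch, not the real TokenTransformDispatcher.
class SketchDispatcher {
  constructor() {
    this.anyTransforms = []; // callbacks run for every token, first
    this.byType = new Map(); // token type -> array of callbacks
  }

  // Register a callback for one token type, or for 'any' token.
  addTransform(type, callback) {
    if (type === 'any') {
      this.anyTransforms.push(callback);
      return;
    }
    if (!this.byType.has(type)) {
      this.byType.set(type, []);
    }
    this.byType.get(type).push(callback);
  }

  // A callback returns the (possibly modified) token, or null to swallow it.
  transformToken(token) {
    const specific = this.byType.get(token.type) || [];
    const callbacks = this.anyTransforms.concat(specific);
    let current = token;
    for (const callback of callbacks) {
      current = callback(current);
      if (current === null) {
        return null;
      }
    }
    return current;
  }

  transformTokens(tokens) {
    const out = [];
    for (const token of tokens) {
      const result = this.transformToken(token);
      if (result !== null) {
        out.push(result);
      }
    }
    return out;
  }
}

// Hypothetical Cite-like transformer: eats everything between a ref start
// tag and the matching end tag, keeping the eaten tokens in its own state.
function registerRefEater(dispatcher) {
  const state = { inRef: false, captured: [] };
  dispatcher.addTransform('any', (token) => {
    if (token.type === 'TAG' && token.name === 'ref') {
      state.inRef = true;
      return null;
    }
    if (token.type === 'ENDTAG' && token.name === 'ref') {
      state.inRef = false;
      return null;
    }
    if (state.inRef) {
      state.captured.push(token);
      return null;
    }
    return token;
  });
  return state;
}
```

Running the 'any' callbacks first is what lets the ref eater see, and swallow, tokens before any type-specific handler gets a chance to act on them.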
Gabriel Wicke a8fa9433c4 Convert quote handling (italic/bold) to a core extension operating on the
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitization and extensions should be able to use the same
dispatcher by registering for specific token types.

The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke d616f07a79 Don't re-build the wiki tokenizer for each test. This speeds up the full
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)

A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
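The speed-up in the commit above comes from building the tokenizer once and reusing it across tests. A minimal sketch of that idea follows; `buildTokenizer`, `getTokenizer`, `runTests` and the `buildCount` counter are hypothetical stand-ins, and the toy tokenizer merely splits on newlines instead of being generated from a PEG grammar.

```javascript
// Counts how often the (hypothetically expensive) build step runs.
let buildCount = 0;

// Hypothetical stand-in for generating a tokenizer from a grammar.
function buildTokenizer() {
  buildCount += 1;
  return {
    // Toy tokenizer: split into lines and newline separators.
    tokenize: (text) => text.split(/(\n)/).filter((part) => part !== ''),
  };
}

let cachedTokenizer = null;

// Build on first use only; later calls reuse the same instance.
function getTokenizer() {
  if (cachedTokenizer === null) {
    cachedTokenizer = buildTokenizer();
  }
  return cachedTokenizer;
}

// Run every test input through the shared tokenizer.
function runTests(inputs) {
  return inputs.map((input) => getTokenizer().tokenize(input));
}
```

With this shape, the build cost is paid once per test run rather than once per test case, regardless of how many inputs are passed to `runTests`.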
Gabriel Wicke c2b69e2486 Clean up newline handling. Emit a NEWLINE token for each
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke abc2254110 A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00
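Wrapping tree building in try/catch, as the commit above describes, lets a test runner count a crash as a failure instead of aborting the whole run. A small sketch under assumed names (`countResults` and the `buildTree` callback are hypothetical):

```javascript
// Run buildTree on every input; an exception counts as one failure
// and does not stop the remaining inputs from being processed.
function countResults(inputs, buildTree) {
  const counts = { completed: 0, failed: 0 };
  for (const input of inputs) {
    try {
      buildTree(input);
      counts.completed += 1;
    } catch (err) {
      counts.failed += 1;
    }
  }
  return counts;
}
```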
Gabriel Wicke 92fdf99384 Further renaming, this time from pegParser to pegTokenizer. 2011-12-08 10:59:44 +00:00
Renamed from modules/parser/pegParser.pegjs.txt