* [[:en:Barack Obama]] can now be expanded in 77 seconds using 330MB RAM,
while it would previously run out of RAM after ~30 minutes. Wohoooo!
The token transform framework rework really paid off.
* 303 parser tests are passing in the new record time of 5.5 seconds. Two more
tests pass only because they expect the day of the week to be Thursday, which
won't be the case tomorrow.
Change-Id: I56e850838476b546df10c6a239c8c9e29a1a3136
* All parser pipelines including tokenizer and DOM stuff are now constructed
from a 'recipe' data structure in a ParserPipelineFactory.
* All sub-pipelines of these can now be cached
* Event registrations to a pipeline are directly forwarded to the last
pipeline member to save relatively expensive event forwarding.
* Some APIs for on-demand expansion / format conversion of parameters from
parser functions are added:
param.to('tokens/expanded', cb)
param.to('text/wiki', cb) (this does not work yet)
All parameters are additionally wrapped into a Param object that provides
methods for positional parameter naming (.named()) or conversion to a dict
(.dict()).
* The async token transform manager is now separated from a frame object, with
the frame holding arguments, an on-demand expansion method and loop checks.
* Only keys of template parameters are now expanded. Parser functions or
template arguments trigger an expansion on-demand. This (unsurprisingly)
makes a big performance difference with typical switch-heavy template
systems.
* Return values from async transforms are no longer used in favor of plain
callbacks. This saves the complication of having to maintain two code paths.
A trick in transformTokens still avoids the construction of unneeded
TokenAccumulators.
* The results of template expansions are no longer buffered.
* 301 parser tests are passing
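The parameter wrapper mentioned in the list above could look roughly like the
following. This is a minimal sketch and not the actual implementation: the
constructor name Params, the { k, v } pair shape and the 1-based numbering of
positional parameters are assumptions made for illustration; only the method
names .named() and .dict() come from this commit message.

```javascript
// Hypothetical sketch, not the real Param implementation.
// Each parameter is a { k, v } pair; positional parameters have an empty key.
function Params(pairs) {
  this.pairs = pairs;
}

// Give positional (empty-key) parameters 1-based numeric names.
Params.prototype.named = function () {
  let n = 1;
  return this.pairs.map(function (p) {
    return p.k === '' ? { k: String(n++), v: p.v } : { k: p.k, v: p.v };
  });
};

// Convert to a plain key -> value dictionary; later duplicates win.
Params.prototype.dict = function () {
  const res = {};
  this.named().forEach(function (p) {
    res[p.k.trim()] = p.v;
  });
  return res;
};

const params = new Params([
  { k: '', v: 'first' },
  { k: 'style', v: 'color: red' },
  { k: '', v: 'second' }
]);
console.log(params.dict());
```

A parser function that only cares about named lookups would call .dict()
once, while one that walks its arguments in order would use .named().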
Known issues:
* Cosmetic cleanup remains to be done
* Some parser functions do not support async expansions yet, and need to be
modified.
Change-Id: I1a7690baffbe8141cadf67270904a1b2e1df879a
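Why on-demand expansion helps switch-heavy templates can be illustrated with
a small sketch. Everything here is hypothetical: lazyParam, expand and
switchFunction are illustration names, the "expansion" is a stand-in that
just upper-cases a string, and the callback is invoked synchronously to keep
the example short. Only the param.to('tokens/expanded', cb) call shape is
taken from the commit message above.

```javascript
// Hypothetical sketch of on-demand argument expansion.
let expansions = 0;

// Stand-in for an expensive template expansion (the real one would be async).
function expand(source, cb) {
  expansions++;
  cb(source.toUpperCase());
}

// Wrap an unexpanded argument; no work happens until to() is called.
function lazyParam(source) {
  return {
    to: function (format, cb) {
      if (format !== 'tokens/expanded') {
        throw new Error('unsupported format: ' + format);
      }
      expand(source, cb);
    }
  };
}

// A #switch-like function: compare the cheap keys, expand only the match.
function switchFunction(selector, cases, cb) {
  for (const c of cases) {
    if (c.key === selector) {
      return c.value.to('tokens/expanded', cb);
    }
  }
  cb('');
}

const cases = [
  { key: 'a', value: lazyParam('first branch') },
  { key: 'b', value: lazyParam('second branch') },
  { key: 'c', value: lazyParam('third branch') }
];
let result;
switchFunction('b', cases, function (tokens) { result = tokens; });
console.log(result, expansions);
```

With eager expansion all three branches would be expanded before the switch
runs; here the counter shows that only the selected branch is.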
* Less verbose logging in noinclude processing and template expansion
* Give priority to the processing of templates transcluded from transclusions
to get closer to depth-first processing. This serves to minimize memory
usage from queued-up tokens.
* Increase the maximum outstanding requests per template retrieval. 10000
amazingly proved too low a limit on some big pages.
* Only process a single template request callback at a time for now
* Add a debug print in the treebuilder wrapper
* Don't treat multiple comments on a single line as a single comment to match
the PHP parser's behavior
Change-Id: I9a86b6d7bec3b9e1f17415daf1bf74170240721a
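The prioritization of nested transclusions described in the commit above can
be sketched as a queue ordered by nesting depth. This is only an illustration
under stated assumptions: RequestQueue, its add/next methods and the explicit
depth argument are hypothetical and not taken from the actual code.

```javascript
// Hypothetical sketch: a request queue that favors deeper transclusions.
function RequestQueue() {
  this.queue = [];
}

// Insert before the first entry with a smaller depth, so deeper requests are
// served first and requests of equal depth keep their arrival order.
RequestQueue.prototype.add = function (title, depth) {
  let i = 0;
  while (i < this.queue.length && this.queue[i].depth >= depth) {
    i++;
  }
  this.queue.splice(i, 0, { title: title, depth: depth });
};

// Hand out a single request at a time.
RequestQueue.prototype.next = function () {
  return this.queue.shift();
};

const queue = new RequestQueue();
queue.add('A', 0);
queue.add('B', 0);

const order = [];
let req = queue.next();
order.push(req.title);
// Processing 'A' discovers two nested transclusions.
queue.add('A1', 1);
queue.add('A2', 1);
while ((req = queue.next())) {
  order.push(req.title);
}
console.log(order.join(' '));
```

Because the nested requests are finished before the sibling 'B' is started,
fewer tokens from partially processed templates have to stay queued.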
Eat unbalanced external link parts within template parameters. This does not
produce the same output as the PHP parser
(try echo '{{YouTube}}' | node parse.js), but preserves a level of sanity.
Need to check how common this is for external links. If it is rare enough,
moving the ']' after the parser function manually would fix the rendering for
the YouTube case.
Change-Id: I597d808efff36baa22191e7946a0061cc31120e8
Behavior switches are converted to tokens which set parser.environment flags during the async transformation stage.
The next step would be for handlers in the sync23 stage to generate the TOC, section edit links, and so on according to these directives.
No tests written, because the switches are consumed and don't appear in the rendered HTML. We can test the magic word layout controls individually once they're implemented.
Another small change was to store option flags directly in the environment object, not that it makes much difference.
Change-Id: I863fbf4be1a17d2f6c31158298dd301f19ae1137
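How a handler might consume behavior switch tokens and record them in the
environment can be sketched as follows. The token shape ({ type, word }), the
flag names and the function name consumeBehaviorSwitches are assumptions for
illustration; the magic words themselves are standard MediaWiki behavior
switches.

```javascript
// Hypothetical sketch; token shape and flag names are assumptions.
const SWITCH_FLAGS = {
  '__NOTOC__': 'notoc',
  '__FORCETOC__': 'forcetoc',
  '__NOEDITSECTION__': 'noeditsection'
};

// Drop behavior switch tokens from the stream and set the matching flag
// directly on the environment object.
function consumeBehaviorSwitches(tokens, environment) {
  return tokens.filter(function (token) {
    if (token.type === 'behavior-switch') {
      const flag = SWITCH_FLAGS[token.word];
      if (flag) {
        environment[flag] = true;
      }
      return false; // consumed: never reaches the rendered output
    }
    return true;
  });
}

const environment = {};
const output = consumeBehaviorSwitches(
  ['Some text', { type: 'behavior-switch', word: '__NOTOC__' }, ' and more'],
  environment
);
console.log(output, environment);
```

A later stage could then check a flag such as environment.notoc when deciding
whether to generate a table of contents.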
Match pairs of {{!}} or | for template productions, but not a mix of the two.
Example:
{{#if:1|{{!}}-
{{!}} {{#if:1|style="color: red"{{!}}|}}
}}
Note that the style parameter ends up as the *key* of an empty-valued
attribute on the table cell currently.
Change-Id: I5f9357dd1645ef97b0af89f32e8d92ae49218c72
Parser functions which only accept positional arguments now return both the
key and value of arguments. Complete attributes (key and value) for templates
and the like from parser functions are not yet supported though.
Change-Id: I3f81bb35acd27186222ce6d5217e820042527c01
gets a bit closer to supporting table fragments passed through template
arguments. Next, we'll need a way to indicate start-of-line position to
enable sol block-levels in template parameters.
Example:
{|
{{#if: true|{{!}}Table cell|}}
|}
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation
Removed a few whitelisted test cases that are now passing directly.
The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing, however, has the advantage of giving later
transformations uniform data to work with. We could even consider moving this
further up into the tokenizer.
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
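Re-joining the pipe-separated parts of a non-image wikilink into a single
description can be sketched like this. The function name joinDescription and
the representation of the parts as an array of token arrays are assumptions
made for illustration, not the actual data structures.

```javascript
// Hypothetical sketch: parts holds one token array per pipe-separated chunk
// of the link text, with the link target already removed.
function joinDescription(parts) {
  const tokens = [];
  parts.forEach(function (part, i) {
    if (i > 0) {
      tokens.push('|'); // restore the pipe as plain text
    }
    Array.prototype.push.apply(tokens, part);
  });
  return tokens;
}

const description = joinDescription([
  ['first'],
  [{ type: 'TAG', name: 'b' }, 'second']
]);
console.log(description);
```

For image links the parts would instead be kept separate, since each one may
be an option that still needs template expansion.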
mediawiki.tokenizer.js module, and pass a reference to parse(). Faster
inline_breaks production using a JS function which seems to be generally
correct, but still breaks five tests when enabled. Seems to be some weird
interaction with peg.js, possibly something to do with caching.