Commit graph

247 commits

Author SHA1 Message Date
Gabriel Wicke 7e22020398 Convert syntactical break flags for templates from counters to the stack
variant to fix the precedence for {{!}} (break on these inside table content,
but not in template options within tables).
2012-03-14 16:30:59 +00:00
Gabriel Wicke 77a61dd687 Improve support for {{!}}, and don't produce a pre for indented tables. 2012-03-14 10:58:11 +00:00
Gabriel Wicke 835914b2de Support {{=}}. 2012-03-14 09:07:01 +00:00
Gabriel Wicke 2195c31abf Move link types to data-mw-rt, and support some more template tokenization
edge cases. For example, the PHP parser treats | foo | = bar | as | foo = bar |,
believe it or not ;)
2012-03-13 12:32:31 +00:00
Gabriel Wicke 4cd8b302ac Improved template tokenization. The parser can now template-expand
[[:en:Barack Obama]] without exceeding 1.7GB of memory (which is the node
limit).
2012-03-12 17:31:45 +00:00
Gabriel Wicke 3c5fe2523c Tolerate more newlines and spaces in templates, and support templates and
comments in urls.
2012-03-12 14:31:06 +00:00
Gabriel Wicke ae4ab7a39c Refactor syntactic stops into an object and add a stack variant for option
values.
2012-03-12 13:08:43 +00:00
Roan Kattouw 29f416937e Fix some usages of splice.apply in the data model to use
ve.batchedSplice(). Added FIXME comments for occurrences outside of DM
2012-03-10 00:31:28 +00:00
Gabriel Wicke ffc9383096 Temporary fix for template tokenization, especially needed for
[[Template:Cite core]].
2012-03-08 14:24:04 +00:00
Gabriel Wicke 39017dd769 Percent-encode spaces in URLs, so that they are recognized as valid URLs later
on.
2012-03-08 11:53:15 +00:00
Gabriel Wicke 7518db8197 A few fixes to parser functions and template expansion. Trim whitespace off
template arguments, let the last duplicate key win and fake pagenamee slightly
better.
2012-03-08 11:44:37 +00:00
Gabriel Wicke 51023feaa4 Improvements for image option handling. 2012-03-08 10:03:22 +00:00
Gabriel Wicke b1e131d568 A bit more documentation and naming cleanup in the tokenizer wrapper. 2012-03-08 09:00:45 +00:00
Gabriel Wicke f02ff95aa3 Token representation clean-up. Now all tokens are differentiated using
constructors instead of type attributes.
2012-03-07 20:06:54 +00:00
Gabriel Wicke f157093a41 Delegate responsibility for resetting the token rank to transforms, if full
re-processing in a phase is wanted. By default, after a token type change or
the return of multiple tokens only the remaining transforms with higher ranks
are applied.

Updated a few comments as well.
2012-03-07 19:29:53 +00:00
Gabriel Wicke 1f8c43b9e2 A few minor documentation updates. 2012-03-07 18:42:26 +00:00
Gabriel Wicke 5f618103d7 Set allTokensProcessed flag for async callbacks from the template expander. 2012-03-07 17:36:33 +00:00
Gabriel Wicke e5a1116817 Start re-transformation as soon as possible in TokenAccumulator._returnTokens
to maximize IO concurrency. Signal that all tokens are fully transformed to
callbacks called from TokenAccumulator._returnTokens. The result should be a
single re-transformation when entering the callback chain, and only if the
transform does not signal that it took care of full transformation itself.
Template expansion would set this flag, as the nested transform pipeline
processes all tokens to the end of phase async12.
2012-03-07 16:29:06 +00:00
Gabriel Wicke 656524dbbc Fixes for multi-transformer expansion in AsyncTransformManager. Added argument
to callback which lets transforms indicate if their returned tokens are fully
processed for their phase. If not, the callback re-processes them so that any
remaining transforms are applied.
2012-03-07 15:39:18 +00:00
Gabriel Wicke af03eb4f29 Improve generic attribute expansion before external link processing, and make
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
2012-03-06 18:02:35 +00:00
Gabriel Wicke 227103e12c Accept empty table cell attribute sections, and consider percent-encoded %2525
valid. 270 tests passing.
2012-03-06 14:32:45 +00:00
Gabriel Wicke 2efcd3cd57 Reworked percent encoding handling for URIs to get closer to the 'url
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation

Removed a few whitelisted test cases that are now passing directly.

The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing however has the advantage of providing further
transformations uniform data to work with. We could even consider to move this
even further into the tokenizer.
2012-03-06 13:49:37 +00:00
Gabriel Wicke 19fe9726a2 Fix invalid external link representation. 268 tests passing. 2012-03-05 18:06:29 +00:00
Gabriel Wicke a9ebc1d986 Support external images wrapped in a clickable link using bracketed external
link syntax. 265 tests passing.
2012-03-05 16:23:00 +00:00
Gabriel Wicke 7f7202e89c A few improvements to external link and image handling. 264 tests passing. 2012-03-05 15:34:27 +00:00
Gabriel Wicke 7b0c807710 Change wikilink tokenization strategy to split on pipes. This makes it
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
2012-03-05 12:00:38 +00:00
Gabriel Wicke 3e6f1b6bea Use some options primitively. 2012-03-02 14:19:33 +00:00
Gabriel Wicke 167dbdb0fa Parse image options. 2012-03-02 13:36:37 +00:00
Gabriel Wicke 8b7ba9051b Add productions for image option tokenization, and prepare to call those from
the LinkHandler token stream transformer.
2012-03-01 18:07:20 +00:00
Gabriel Wicke b1a7119a46 Hack up some rudimentary image rendering. Using jshashes for the md5, and
a few hard-coded image image sizes ;) 262 tests passing.
2012-03-01 13:51:53 +00:00
Gabriel Wicke d4faf9eaf4 More work on wiki link rendering and general wiki title / namespace
functionality.
2012-03-01 12:47:05 +00:00
Gabriel Wicke 4b9bd45b82 Start to move wikilink expansion to a separate async token transformer. 2012-02-29 13:56:29 +00:00
Gabriel Wicke b8bb503199 Actually commit onlyinclude, as already announced in r112592. 2012-02-28 13:24:35 +00:00
Gabriel Wicke 3227903d48 Follow-up to r112116, accidentally committed from subdirectory. 2012-02-22 16:41:01 +00:00
Gabriel Wicke 3568dfee14 Add some support for functionhooks in test parser and parserTests.js, and
tweak a few parser functions.
2012-02-22 15:59:11 +00:00
Gabriel Wicke d7da324272 Basic fall-through support for #switch parser function 2012-02-22 14:57:50 +00:00
Gabriel Wicke 491ad5ffef Cleanup and commenting. 2012-02-22 13:13:18 +00:00
Gabriel Wicke 9b3313d923 Speed up flatten slightly by avoiding garbage for already flat arrays. Also,
use simple string concatenation instead of arrays as the strings tend to be
few and short.
2012-02-22 11:25:44 +00:00
Gabriel Wicke 8dde1f77b4 Reduce debug print overhead, roughly a 10% speed-up on parserTests. 2012-02-21 18:49:43 +00:00
Gabriel Wicke 058c4213a4 Remove some more unused code and tidy up some more. 2012-02-21 18:26:40 +00:00
Gabriel Wicke 416126c041 Fix the bug in the inline_breaks replacement, and write another switch-based
version, which is slightly faster and shorter. Performance is improved by
about 5% for parserTests.
2012-02-21 17:57:30 +00:00
Gabriel Wicke 18a04f7581 Tidy up and comment the tokenizer a bit more. Start to move code into
mediawiki.tokenizer.js module, and pass a reference to parse(). Faster
inline_breaks production using a JS function which seems to be generally
correct, but still breaks five tests when enabled. Seems to be some weird
interaction with peg.js, possibly something to do with caching.
2012-02-21 17:21:42 +00:00
Gabriel Wicke 8718bd65bc Add list of HTML5 and deprecated HTML3/4 elements in preparation for
end-of-potential-extension rules; Support indented tag-wrapped pre blocks.
2012-02-21 14:44:56 +00:00
Gabriel Wicke ffec77273a Comment and minor code tweaks. 2012-02-21 11:24:20 +00:00
au ea15bffb27 Revert "* Always sort attributes (+1 test pass)."
This reverts commit 45ca281da8eef8030bdd1986418cb914fc9a717c.
2012-02-20 22:26:12 +00:00
Gabriel Wicke 5806705733 Push transformer setup a bit further into the attribute pipeline. 2012-02-20 12:56:00 +00:00
Gabriel Wicke 8eddb4ec6b Add some comments to the Sanitizer 2012-02-20 11:14:53 +00:00
Gabriel Wicke 71e95bd54b Set up token stream transformers from a map of phases per input content type.
Not yet applied to attribute pipeline creation. 249 tests passing.
2012-02-20 11:07:21 +00:00
au 9c55f5e8b7 * Always sort attributes (+1 test pass).
The performance impact for .sort is quite small (12.079s => 12.158s)
  and Sanitizer is probably one of the more accessible places to do this.
2012-02-18 21:01:07 +00:00
au aa589d989b * Rudimentary CSS validation; +4 tests pass. (Bug 2304, 3244). 2012-02-18 20:16:23 +00:00