78 commits

Author SHA1 Message Date
Gabriel Wicke 227103e12c Accept empty table cell attribute sections, and consider percent-encoded %2525
valid. 270 tests passing.
2012-03-06 14:32:45 +00:00
Gabriel Wicke 2efcd3cd57 Reworked percent encoding handling for URIs to get closer to the 'url
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation

Removed a few whitelisted test cases that are now passing directly.

The encoding canonicalization could also be moved to the Sanitizer. Doing it
early in token stream processing, however, has the advantage of giving later
transformations uniform data to work with. We could even consider moving it
earlier still, into the tokenizer.
2012-03-06 13:49:37 +00:00
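The canonicalization described above could look roughly like the following minimal sketch. The function name `canonicalizePercentEncoding` and its rules are assumptions for illustration, not the project's actual code and not the full HTML5 algorithm. The assumed rules are: upper-case the hex digits of valid escapes, encode a stray `%` as `%25`, and escape whitespace, quotes and angle brackets.

```javascript
// Hypothetical helper (not Parsoid's code): canonicalize percent-encoding
// in a URL string so later token transforms see uniform data.
function canonicalizePercentEncoding(url) {
  return url.replace(/%([0-9a-fA-F]{2})|%|[\s"<>]/g, (m, hex) => {
    if (hex !== undefined) {
      // Valid escape: keep it, but normalize the hex digits to upper case.
      return '%' + hex.toUpperCase();
    }
    if (m === '%') {
      // Stray '%' not followed by two hex digits: encode it.
      return '%25';
    }
    // Whitespace, quotes and angle brackets get percent-encoded.
    return encodeURIComponent(m);
  });
}
```

Because a valid escape is only re-cased, an already-encoded value such as `%2525` passes through unchanged.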
Gabriel Wicke a9ebc1d986 Support external images wrapped in a clickable link using bracketed external
link syntax. 265 tests passing.
2012-03-05 16:23:00 +00:00
Gabriel Wicke 7f7202e89c A few improvements to external link and image handling. 264 tests passing. 2012-03-05 15:34:27 +00:00
Gabriel Wicke 7b0c807710 Change wikilink tokenization strategy to split on pipes. This makes it
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
2012-03-05 12:00:38 +00:00
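The pipe-splitting strategy can be illustrated with a sketch over plain strings. The helper name and the returned shape are hypothetical. The tokenizer itself works on token streams so that templates inside image options can still be expanded.

```javascript
// Hypothetical sketch: split the inside of [[...]] on '|'.
// Operates on plain strings; the real tokenizer works on token arrays.
function splitWikilink(inner) {
  const parts = inner.split('|');
  const target = parts[0].trim();
  // Image options, or the pieces of a plain link's description.
  const options = parts.slice(1);
  return {
    target,
    options,
    // For non-image links, rejoin the pieces with a plain '|';
    // with no pipe at all, the target doubles as the link text.
    description: options.length > 0 ? options.join('|') : target,
  };
}

const link = splitWikilink('Foo|bar|baz');
// link.description === 'bar|baz'
```

For an image link such as `File:X.png|thumb|200px`, the same split yields the option list `['thumb', '200px']` directly.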
Gabriel Wicke 167dbdb0fa Parse image options. 2012-03-02 13:36:37 +00:00
Gabriel Wicke 8b7ba9051b Add productions for image option tokenization, and prepare to call those from
the LinkHandler token stream transformer.
2012-03-01 18:07:20 +00:00
Gabriel Wicke 4b9bd45b82 Start to move wikilink expansion to a separate async token transformer. 2012-02-29 13:56:29 +00:00
Gabriel Wicke b8bb503199 Actually commit onlyinclude, as already announced in r112592. 2012-02-28 13:24:35 +00:00
Gabriel Wicke 491ad5ffef Cleanup and commenting. 2012-02-22 13:13:18 +00:00
Gabriel Wicke 9b3313d923 Speed up flatten slightly by avoiding garbage for already flat arrays. Also,
use simple string concatenation instead of arrays as the strings tend to be
few and short.
2012-02-22 11:25:44 +00:00
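The "avoid garbage for already flat arrays" idea can be sketched as a fast path that returns the input untouched when it has no nested arrays. This is an illustrative version assuming `flatten` takes an array whose elements may themselves be arrays. It is not the committed code.

```javascript
// Sketch: flatten nested arrays, but return already-flat input as-is
// so that no new array (and no garbage) is allocated in the common case.
function flatten(arr) {
  if (!arr.some(Array.isArray)) {
    return arr; // fast path: nothing to flatten
  }
  const out = [];
  for (const item of arr) {
    if (Array.isArray(item)) {
      out.push(...flatten(item));
    } else {
      out.push(item);
    }
  }
  return out;
}
```

The trade-off is one extra scan over flat input in exchange for skipping an allocation. That pays off when most inputs are already flat.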
Gabriel Wicke 8dde1f77b4 Reduce debug print overhead, roughly a 10% speed-up on parserTests. 2012-02-21 18:49:43 +00:00
Gabriel Wicke 058c4213a4 Remove some more unused code and tidy up some more. 2012-02-21 18:26:40 +00:00
Gabriel Wicke 416126c041 Fix the bug in the inline_breaks replacement, and write another switch-based
version, which is slightly faster and shorter. Performance is improved by
about 5% for parserTests.
2012-02-21 17:57:30 +00:00
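A switch-based break check dispatches on a single character instead of trying several grammar alternatives in turn. The sketch below only shows that pattern. The function signature, the `stops` flags and the break rules are hypothetical simplifications and do not reproduce the real inline_breaks production.

```javascript
// Hypothetical, simplified sketch: a switch on the current character decides
// whether inline content must stop here. Flags and rules are illustrative.
function inlineBreaks(input, pos, stops) {
  switch (input[pos]) {
    case '|':
      // Cell separator, only relevant inside tables.
      return Boolean(stops.table);
    case ']':
      // ']' closes an external link; ']]' closes a wikilink description.
      return Boolean(stops.extlink) ||
        (Boolean(stops.linkdesc) && input[pos + 1] === ']');
    case '}':
      // '}}' closes a template.
      return Boolean(stops.template) && input[pos + 1] === '}';
    case '\n':
      // Assumed here: a newline always ends the inline run.
      return true;
    default:
      // Most characters never break, so they exit immediately.
      return false;
  }
}
```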
Gabriel Wicke 18a04f7581 Tidy up and comment the tokenizer a bit more. Start to move code into the
mediawiki.tokenizer.js module, and pass a reference to parse(). Add a faster
inline_breaks production using a JS function, which seems to be generally
correct but still breaks five tests when enabled. This appears to be an odd
interaction with peg.js, possibly related to caching.
2012-02-21 17:21:42 +00:00
Gabriel Wicke 8718bd65bc Add list of HTML5 and deprecated HTML3/4 elements in preparation for
end-of-potential-extension rules; Support indented tag-wrapped pre blocks.
2012-02-21 14:44:56 +00:00
Gabriel Wicke 059ff94bc4 Reject match for invalid urlencoded code points. 2012-02-16 13:57:56 +00:00
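The commit does not say how invalid code points are detected. One plausible approach in JavaScript is to lean on `decodeURIComponent`, which throws a `URIError` for malformed escapes and for byte sequences that are not valid UTF-8. The sketch below shows that approach as an assumption, not the tokenizer's actual rule. The helper name is hypothetical.

```javascript
// Sketch: decodeURIComponent throws URIError on malformed escapes or on
// percent-encoded bytes that are not valid UTF-8, so it can act as a validator.
function isValidPercentEncoding(s) {
  try {
    decodeURIComponent(s);
    return true;
  } catch (e) {
    if (e instanceof URIError) {
      return false;
    }
    throw e; // unexpected error: re-throw
  }
}
```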
Gabriel Wicke dc1d30fcb5 Tweaked template parameters a bit further, and made the self-closing tag
protection a bit less trigger-happy.
2012-02-15 15:56:11 +00:00
Gabriel Wicke 089413298c Protect self-closing tags in generic attribute production. 2012-02-15 13:23:50 +00:00
Gabriel Wicke 5e94a238fc Prepare for the support of tables (and later generally block-level elements)
in template parameters. 244 tests passing.
2012-02-15 11:51:29 +00:00
Gabriel Wicke 774a3189c8 Improve support for generic attribute names coming from
templates/templateargs.
2012-02-15 10:19:39 +00:00
Gabriel Wicke 1ce6f5a3c4 Improve support for single-line attributes with preprocessor support. 243
tests passing.
2012-02-14 21:25:52 +00:00
Gabriel Wicke f02b3d91c6 Port urlencoded char support to preprocessor-supporting link target
production, and remove old link_target production.
2012-02-14 21:08:25 +00:00
Gabriel Wicke 001194b140 Replace console.log with console.warn in all debug statements 2012-02-14 20:56:14 +00:00
Gabriel Wicke f42b379e52 Fix named wikilink options (image options really) in template arguments, and
speed up template parameter parsing by eliminating some backtracking. 238
tests passing (unchanged).
2012-02-14 15:45:18 +00:00
Gabriel Wicke 0b8d1b0387 * Add custom toString methods for tokens to aid debugging
* Convert all attributes into strings in Sanitizer
* Use strict comparison against empty string in tokenizer
* Add very simple sitename parserfunction
* 238 tests passing
2012-02-13 17:02:23 +00:00
Gabriel Wicke 025f9cddb3 Prefix all internal data- attributes with data-mw- and adjust the whitelist
and test output normalization accordingly. 235 tests passing.
2012-02-13 13:54:07 +00:00
Gabriel Wicke b1617b1d71 Add some support for ideographic spaces in external links, support the
int: namespace alias and perform some normalization on the MediaWiki namespace
prefix.
2012-02-13 13:35:46 +00:00
Gabriel Wicke a122e51eec Move data-* annotations into separate object on tokens, that is then
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.

A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
2012-02-11 16:43:25 +00:00
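The round-trip annotation scheme can be sketched as follows. The token shape (`attribs` as key/value pairs plus a separate `dataAttribs` object) and both helper names are assumptions for illustration. The sketch serializes the data object into one `data-mw-rt` attribute only when it is non-empty. It also strips that attribute before comparing output, as parserTests is described as doing.

```javascript
// Hypothetical token shape: { attribs: [[key, value], ...], dataAttribs: {...} }.
// Serialize round-trip data into a single data-mw-rt attribute if present.
function serializeAttribs(token) {
  const attribs = token.attribs.slice(); // don't mutate the token
  const data = token.dataAttribs;
  if (data && Object.keys(data).length > 0) {
    attribs.push(['data-mw-rt', JSON.stringify(data)]);
  }
  return attribs;
}

// Drop the round-trip attribute (single- or double-quoted) before
// comparing against expected parser output.
function stripRoundTripAttrib(html) {
  return html.replace(/\s+data-mw-rt=(?:"[^"]*"|'[^']*')/g, '');
}
```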
Gabriel Wicke aff30be131 Some comments and reshuffling in the grammar, and a typo fix in the
AttributeExpander.
2012-02-09 22:27:45 +00:00
Gabriel Wicke 6e33255503 Improve support for preprocessor functionality in attributes; Support
multi-line xmlish tags with preprocessor stuff in attributes.
2012-02-09 16:36:29 +00:00
Gabriel Wicke 16ded7d955 Fix a bug in wikilink with trail tokenization. 2012-02-09 14:06:35 +00:00
Gabriel Wicke 3f7c1499cd Enable support for general preprocessor functionality in attribute keys and
values. This includes comments, templates and template arguments.

This also replaces the specialized expansion logic in the TemplateHandler.
Removing link validation causes one more parser test to fail for now. External
link target validation will need to be implemented in the token stream handler
for links. This is noted as TODO in
https://www.mediawiki.org/wiki/Future/Parser_development#Token_stream_transforms.
2012-02-08 15:10:30 +00:00
Gabriel Wicke 1f6db903e9 Pluck a few low-hanging fruit in external link tokenization, and add a simple
localurl parser function implementation. 230 parser tests now passing.
2012-02-07 10:28:23 +00:00
Gabriel Wicke cf8b7bf45d External links don't nest. 2012-02-07 09:38:28 +00:00
Gabriel Wicke 53bf4f2bd0 Temporarily disable the sanitizer and start to support preprocessor
functionality (comments, templates, template arguments) in arbitrary
attributes. The grammar for this is still quite rough, and that area will need
to be consolidated.
2012-02-06 19:15:44 +00:00
Gabriel Wicke 0bea9fdfbb Fix nowiki tokenization regression introduced in r110495 2012-02-03 13:10:04 +00:00
Gabriel Wicke 8c75aa1a7a Remove type attribute for tag tokens. 2012-02-01 18:37:48 +00:00
Gabriel Wicke a5cc10a06b Change token format to plain strings for text tokens, and specific objects for
other tokens. This is only the first half of the conversion. The next step is
to drop the type attribute on most tokens and match on the constructor in the
token transform machinery.
2012-02-01 16:30:43 +00:00
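The token format described above can be sketched as follows. Text tokens are bare strings, and other tokens are instances of specific classes, so transforms can match on the constructor instead of a `type` attribute. The class names and the `describe` helper are illustrative assumptions.

```javascript
// Hypothetical token classes; text tokens are plain JS strings.
class TagTk {
  constructor(name, attribs = []) {
    this.name = name;
    this.attribs = attribs;
  }
}
class EndTagTk {
  constructor(name) {
    this.name = name;
  }
}
class NlTk {}

// Match on the token's constructor rather than on a type attribute.
function describe(token) {
  if (typeof token === 'string') {
    return 'text';
  }
  switch (token.constructor) {
    case TagTk:
      return 'start:' + token.name;
    case EndTagTk:
      return 'end:' + token.name;
    case NlTk:
      return 'newline';
    default:
      return 'unknown';
  }
}
```

Plain strings for text keep the most common token type cheap in memory. Constructor matching then replaces a per-token `type` field.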
Gabriel Wicke 14a8a13678 A few more debug helpers including a --trace mode for light debugging. Some
improvements to parser functions on the way to supporting the Cite extension.
Preparation for generic support of templates and template args in attributes. 222
parser tests now passing.
2012-01-31 16:50:16 +00:00
Gabriel Wicke 7cd94df47d A few minor tweaks to reduce memory usage 2012-01-27 13:32:44 +00:00
Gabriel Wicke 4e6a54560a * Emit token chunks for top-level block elements by patching the source of the
tokenizer
* Fix a bug uncovered by this
* Increase the number of outstanding listeners on a single download to 10000
2012-01-22 23:21:53 +00:00
Gabriel Wicke 785a4af76f Implement a few parser functions. 220 parser tests now passing. 2012-01-21 20:38:13 +00:00
Gabriel Wicke 1a6546fbca Support empty template arguments and default values in arg expansion 2012-01-21 03:03:33 +00:00
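Argument expansion with defaults follows MediaWiki's `{{{name|default}}}` semantics. An argument that is passed but empty still counts as set. A missing argument falls back to the default, or to the literal `{{{name}}}` when no default is given. The sketch below illustrates this. The helper name and the plain-object `params` map are hypothetical, and argument names are trimmed as in the whitespace-trimming commit further down.

```javascript
// Hypothetical helper: expand {{{name|default}}} against a template's params.
function expandTemplateArg(name, defaultValue, params) {
  const key = name.trim(); // argument names are trimmed
  if (Object.prototype.hasOwnProperty.call(params, key)) {
    // An empty but passed argument still wins over the default.
    return params[key];
  }
  // Missing: use the default, or leave the argument unexpanded.
  return defaultValue !== undefined ? defaultValue : '{{{' + key + '}}}';
}
```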
Gabriel Wicke fdd048b3b2 Remove a few stray debug prints and disable debugging in parse.js 2012-01-20 22:21:33 +00:00
Gabriel Wicke 145df2655c * NoInclude and IncludeOnly improvements
* Tokenizer support for templates and template args in template arguments and titles
* Async attribute expansion fixes
2012-01-20 22:02:23 +00:00
Gabriel Wicke 336be4f617 Eat '[[[' as a plain text token; 212 parser tests now passing. 2012-01-18 00:23:17 +00:00
Gabriel Wicke 178adbc342 Accept IPv6 (and IPv4) addresses in the tokenizer, so another test passes. 2012-01-18 00:00:47 +00:00
Gabriel Wicke e7381da5b8 Trim whitespace off template titles and argument names. 209 parser tests now
passing.
2012-01-17 23:18:33 +00:00
Gabriel Wicke f50fecf1e3 Fix template argument expansion. 200 parser tests now passing. 2012-01-17 22:29:26 +00:00