Commit graph

282 commits

Author SHA1 Message Date
Gabriel Wicke 1f8c43b9e2 A few minor documentation updates. 2012-03-07 18:42:26 +00:00
Gabriel Wicke 5f618103d7 Set allTokensProcessed flag for async callbacks from the template expander. 2012-03-07 17:36:33 +00:00
Gabriel Wicke e5a1116817 Start re-transformation as soon as possible in TokenAccumulator._returnTokens
to maximize IO concurrency. Signal that all tokens are fully transformed to
callbacks called from TokenAccumulator._returnTokens. The result should be a
single re-transformation when entering the callback chain, and only if the
transform does not signal that it took care of full transformation itself.
Template expansion would set this flag, as the nested transform pipeline
processes all tokens to the end of phase async12.
2012-03-07 16:29:06 +00:00
Gabriel Wicke 656524dbbc Fixes for multi-transformer expansion in AsyncTransformManager. Added argument
to callback which lets transforms indicate if their returned tokens are fully
processed for their phase. If not, the callback re-processes them so that any
remaining transforms are applied.
2012-03-07 15:39:18 +00:00
Gabriel Wicke af03eb4f29 Improve generic attribute expansion before external link processing, and make
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
2012-03-06 18:02:35 +00:00
Gabriel Wicke 227103e12c Accept empty table cell attribute sections, and consider percent-encoded %2525
valid. 270 tests passing.
2012-03-06 14:32:45 +00:00
Gabriel Wicke 2efcd3cd57 Reworked percent encoding handling for URIs to get closer to the 'url
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation

Removed a few whitelisted test cases that are now passing directly.

The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing however has the advantage of providing further
transformations uniform data to work with. We could even consider to move this
even further into the tokenizer.
2012-03-06 13:49:37 +00:00
Gabriel Wicke 19fe9726a2 Fix invalid external link representation. 268 tests passing. 2012-03-05 18:06:29 +00:00
Gabriel Wicke a9ebc1d986 Support external images wrapped in a clickable link using bracketed external
link syntax. 265 tests passing.
2012-03-05 16:23:00 +00:00
Gabriel Wicke 7f7202e89c A few improvements to external link and image handling. 264 tests passing. 2012-03-05 15:34:27 +00:00
Gabriel Wicke 7b0c807710 Change wikilink tokenization strategy to split on pipes. This makes it
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
2012-03-05 12:00:38 +00:00
Gabriel Wicke 3e6f1b6bea Use some options primitively. 2012-03-02 14:19:33 +00:00
Gabriel Wicke 167dbdb0fa Parse image options. 2012-03-02 13:36:37 +00:00
Gabriel Wicke 8b7ba9051b Add productions for image option tokenization, and prepare to call those from
the LinkHandler token stream transformer.
2012-03-01 18:07:20 +00:00
Gabriel Wicke b1a7119a46 Hack up some rudimentary image rendering. Using jshashes for the md5, and
a few hard-coded image image sizes ;) 262 tests passing.
2012-03-01 13:51:53 +00:00
Gabriel Wicke d4faf9eaf4 More work on wiki link rendering and general wiki title / namespace
functionality.
2012-03-01 12:47:05 +00:00
Gabriel Wicke 4b9bd45b82 Start to move wikilink expansion to a separate async token transformer. 2012-02-29 13:56:29 +00:00
Gabriel Wicke b8bb503199 Actually commit onlyinclude, as already announced in r112592. 2012-02-28 13:24:35 +00:00
Gabriel Wicke 3227903d48 Follow-up to r112116, accidentally committed from subdirectory. 2012-02-22 16:41:01 +00:00
Gabriel Wicke 3568dfee14 Add some support for functionhooks in test parser and parserTests.js, and
tweak a few parser functions.
2012-02-22 15:59:11 +00:00
Gabriel Wicke d7da324272 Basic fall-through support for #switch parser function 2012-02-22 14:57:50 +00:00
Gabriel Wicke 491ad5ffef Cleanup and commenting. 2012-02-22 13:13:18 +00:00
Gabriel Wicke 9b3313d923 Speed up flatten slightly by avoiding garbage for already flat arrays. Also,
use simple string concatenation instead of arrays as the strings tend to be
few and short.
2012-02-22 11:25:44 +00:00
Gabriel Wicke 8dde1f77b4 Reduce debug print overhead, roughly a 10% speed-up on parserTests. 2012-02-21 18:49:43 +00:00
Gabriel Wicke 058c4213a4 Remove some more unused code and tidy up some more. 2012-02-21 18:26:40 +00:00
Gabriel Wicke 416126c041 Fix the bug in the inline_breaks replacement, and write another switch-based
version, which is slightly faster and shorter. Performance is improved by
about 5% for parserTests.
2012-02-21 17:57:30 +00:00
Gabriel Wicke 18a04f7581 Tidy up and comment the tokenizer a bit more. Start to move code into
mediawiki.tokenizer.js module, and pass a reference to parse(). Faster
inline_breaks production using a JS function which seems to be generally
correct, but still breaks five tests when enabled. Seems to be some weird
interaction with peg.js, possibly something to do with caching.
2012-02-21 17:21:42 +00:00
Gabriel Wicke 8718bd65bc Add list of HTML5 and deprecated HTML3/4 elements in preparation for
end-of-potential-extension rules; Support indented tag-wrapped pre blocks.
2012-02-21 14:44:56 +00:00
Gabriel Wicke ffec77273a Comment and minor code tweaks. 2012-02-21 11:24:20 +00:00
au ea15bffb27 Revert "* Always sort attributes (+1 test pass)."
This reverts commit 45ca281da8eef8030bdd1986418cb914fc9a717c.
2012-02-20 22:26:12 +00:00
Gabriel Wicke 5806705733 Push transformer setup a bit further into the attribute pipeline. 2012-02-20 12:56:00 +00:00
Gabriel Wicke 8eddb4ec6b Add some comments to the Sanitizer 2012-02-20 11:14:53 +00:00
Gabriel Wicke 71e95bd54b Set up token stream transformers from a map of phases per input content type.
Not yet applied to attribute pipeline creation. 249 tests passing.
2012-02-20 11:07:21 +00:00
au 9c55f5e8b7 * Always sort attributes (+1 test pass).
The performance impact for .sort is quite small (12.079s => 12.158s)
  and Sanitizer is probably one of the more accessible places to do this.
2012-02-18 21:01:07 +00:00
au aa589d989b * Rudimentary CSS validation; +4 tests pass. (Bug 2304, 3244). 2012-02-18 20:16:23 +00:00
Gabriel Wicke 4d80b8daa8 Detail comments about next steps and divide parser functions in those that
need more information from the wiki and readily implementable items.
2012-02-17 10:23:14 +00:00
Gabriel Wicke 059ff94bc4 Reject match for invalid urlencoded code points. 2012-02-16 13:57:56 +00:00
Gabriel Wicke dc1d30fcb5 Tweaked template parameters a bit further, and made the self-closing tag
protection a bit less trigger-happy.
2012-02-15 15:56:11 +00:00
Gabriel Wicke 089413298c Protect self-closing tags in generic attribute production. 2012-02-15 13:23:50 +00:00
Gabriel Wicke 5e94a238fc Prepare for the support of tables (and later generally block-level elements)
in template parameters. 244 tests passing.
2012-02-15 11:51:29 +00:00
Gabriel Wicke 774a3189c8 Improve support for generic attribute names coming from
templates/templateargs.
2012-02-15 10:19:39 +00:00
Gabriel Wicke 1ce6f5a3c4 Improve support for single-line attributes with preprocessor support. 243
tests passing.
2012-02-14 21:25:52 +00:00
Gabriel Wicke f02b3d91c6 Port urlencoded char support to preprocessor-supporting link target
production, and remove old link_target production.
2012-02-14 21:08:25 +00:00
Gabriel Wicke 001194b140 Replace console.log with console.warn in all debug statements 2012-02-14 20:56:14 +00:00
Gabriel Wicke f42b379e52 Fix named wikilink options (image options really) in template arguments, and
speed up template parameter parsing by eliminating some backtracking. 238
tests passing (unchanged).
2012-02-14 15:45:18 +00:00
Gabriel Wicke 64f63b3714 request is automatically installed by jsdom. Follow-up to r111459. Thanks
Hashar!
2012-02-14 14:15:50 +00:00
Gabriel Wicke 466e8e54ad Tweak comment about request module 2012-02-14 14:01:13 +00:00
Gabriel Wicke 0b8d1b0387 * Add custom toString methods for tokens to aid debugging
* Convert all attributes into strings in Sanitizer
* Use strict comparison against empty string in tokenizer
* Add very simple sitename parserfunction
* 138 tests passing
2012-02-13 17:02:23 +00:00
Gabriel Wicke 9945175416 Reformat Date.replaceChars 2012-02-13 14:23:48 +00:00
Gabriel Wicke 0b40741e1c Strip trailing newlines from included templates 2012-02-13 14:17:03 +00:00
Gabriel Wicke 025f9cddb3 Prefix all internal data- attributes with data-mw- and adjust the whitelist
and test output normalization accordingly. 235 tests passing.
2012-02-13 13:54:07 +00:00
Gabriel Wicke b1617b1d71 Add some support for ideographic spaces in external links, support the
int: namespace alias and perform some normalization on the MediaWiki namespace
prefix.
2012-02-13 13:35:46 +00:00
Gabriel Wicke 55ddb4fd66 Remove WikiDom default serialization and --html argument from parse.js
wrapper. HTML ist now the only supported format. The DOMConverter is now no
longer used. Roan, feel free to remove / butcher it for direct HTML to linear
model conversion.
2012-02-11 17:59:17 +00:00
Gabriel Wicke a122e51eec Move data-* annotations into separate object on tokens, that is then
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.

A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
2012-02-11 16:43:25 +00:00
Gabriel Wicke aff30be131 Some comments and reshuffling in the grammar, and a typo in the
AttributeExpander.
2012-02-09 22:27:45 +00:00
Gabriel Wicke 6e33255503 Improve support for preprocessor functionality in attributes; Support
multi-line xmlish tags with preprocessor stuff in attributes.
2012-02-09 16:36:29 +00:00
Gabriel Wicke 16ded7d955 Fix a bug in wikilink with trail tokenization. 2012-02-09 14:06:35 +00:00
Gabriel Wicke 6983481561 Move attribute expansion back to separate handler, as this makes it easier to
only expand used branches selected by parser functions. Template (and
-argument) expansion is simply registered before general expansion.

Additionally, a few more simple time-based magic words are added in
ParserFunctions.
2012-02-09 13:44:20 +00:00
Gabriel Wicke 3f7c1499cd Enable support for general preprocessor functionality in attribute keys and
values. This includes comments, templates and template arguments.

This also replaces the specialized expansion logic in the TemplateHandler. The
removal of link validation lets one more parser test fail for now. External
link target validation will need to be implemented in the token stream handler
for links. This is noted as TODO in
https://www.mediawiki.org/wiki/Future/Parser_development#Token_stream_transforms.
2012-02-08 15:10:30 +00:00
Gabriel Wicke 157c495a9e Normalize the title in localurl. 232 tests passing. 2012-02-07 12:26:00 +00:00
Gabriel Wicke b4892102a4 Clean up transform callback interface 2012-02-07 11:53:29 +00:00
Gabriel Wicke 1f6db903e9 Pluck a few low-hanging fruit in external link tokenization, and add a simple
localurl parser function implementation. 230 parser tests now passing.
2012-02-07 10:28:23 +00:00
Gabriel Wicke cf8b7bf45d External links don't nest. 2012-02-07 09:38:28 +00:00
Gabriel Wicke 53bf4f2bd0 Temporarily disable the sanitizer and start to support preprocessor
functionality (comments, templates, template arguments) in arbitrary
attributes. The grammar for this is still quite rough, will need to
consolidate that area.
2012-02-06 19:15:44 +00:00
Gabriel Wicke c26243989e Improve toJSON handlers to include all properties 2012-02-06 19:12:29 +00:00
Gabriel Wicke 0bea9fdfbb Fix nowiki tokenization regression introduced r110495 2012-02-03 13:10:04 +00:00
Gabriel Wicke 26f2026cff Add custom JSON serializers for tokens that include a type attribute 2012-02-03 13:09:01 +00:00
Gabriel Wicke 8c75aa1a7a Remove type attribute for tag tokens. 2012-02-01 18:37:48 +00:00
Gabriel Wicke 689f697a93 Push token format conversion a bit further along, and add defines that were
missing in last commit.
2012-02-01 17:03:08 +00:00
Gabriel Wicke a5cc10a06b Change token format to plain strings for text tokens, and specific objects for
other tokens. This is only the first half of the conversion. The next step is
to drop the type attribute on most tokens and match on the constructor in the
token transform machinery.
2012-02-01 16:30:43 +00:00
Gabriel Wicke dd3707ded5 Remove some modules normally bundled with node.js from dependencies, and
remove some older ones that are only used in currently-dead code.
2012-02-01 10:32:33 +00:00
Gabriel Wicke e65c6502c0 Add source for #time implementation in comment 2012-02-01 10:14:01 +00:00
Gabriel Wicke 14a8a13678 A few more debug helpers including a --trace mode for light debugging. Some
improvements to parser functions on the way to support the cite extensions.
Preparation for generic template and template arg in attribute support. 222
parser tests now passing.
2012-01-31 16:50:16 +00:00
Neil Kandalgaonkar 2688f823ef added dependencies to README 2012-01-31 00:56:07 +00:00
Neil Kandalgaonkar f0b934ef2e first pass at an API method that returns wikidom. Shells out to node. Some issues with XML API result formatting but works fine in JSON 2012-01-31 00:02:48 +00:00
Gabriel Wicke 7cd94df47d A few minor tweaks to reduce memory usage 2012-01-27 13:32:44 +00:00
Gabriel Wicke 4e6a54560a * Emit token chunks for top-level block elements by patching the source of the
tokenizer
* Fix a bug uncovered by this
* Increase the number of outstanding listeners on a single download to 10000
2012-01-22 23:21:53 +00:00
Gabriel Wicke 7ea4d7d3db A few parser function fixes and maximum template expansion in environment
config.
2012-01-22 19:32:28 +00:00
Gabriel Wicke 561cf3c237 Bug fixes and a first stab at a #time parser function. You can expand the main
page like this:

cd extensions/VisualEditor/modules/parser
echo '{{:Main Page}}' | node parse.js
echo '{{:Main Page}}' | node parse.js --html
echo '{{:Main Page}}' | node parse.js --debug

Even the date-based includes work somewhat, although they don't yet accept
passed-in dates.
2012-01-22 07:07:16 +00:00
Gabriel Wicke 60e45bb739 A bit of template expansion bug fixing and parser function documentation 2012-01-22 01:27:22 +00:00
Gabriel Wicke e8a7034acf Add some commandline switches to parse.js. Supports switching on/off debug
mode and a selection of html/WikiDom serialization.
2012-01-21 22:42:54 +00:00
Gabriel Wicke 785a4af76f Implement a few parser functions. 220 parser tests now passing. 2012-01-21 20:38:13 +00:00
Gabriel Wicke 1a6546fbca Support empty template arguments and default values in arg expansion 2012-01-21 03:03:33 +00:00
Gabriel Wicke fdd048b3b2 Remove a few stray debug prints and disable debugging in parse.js 2012-01-20 22:21:33 +00:00
Gabriel Wicke 145df2655c * NoInclude and IncludeOnly improvements
* Tokenizer support for templates and template args in template arguments and titles
* Async attribute expansion fixes
2012-01-20 22:02:23 +00:00
Gabriel Wicke 348cac6cf0 Fix a bug in TokenCollector, and misc tweaks for template expansions. 2012-01-20 18:47:17 +00:00
Gabriel Wicke 7cc8e69147 Collapse all requests per template into a single outstanding request using an
event-emitting TemplateRequest object and a request queue.
2012-01-20 02:36:18 +00:00
Gabriel Wicke fc2088bb21 Add some rudimentary noinclude / includeonly support and fix up
TokenCollector.
2012-01-20 01:46:16 +00:00
Gabriel Wicke c15e0d4167 Minor cleanup in TemplateHandler 2012-01-20 00:49:27 +00:00
Gabriel Wicke d0ece16c86 Fix async template expansion, so we can now render simple pages with templates
directly to WikiDom from enwiki using a commandline like this:

  echo '{{User:GWicke/Test}}' | node parse.js

Wohoo!

Complex pages with templates won't render properly yet, as noinclude /
includeonly and parser functions are not yet implemented. As a result, the
parser will run out of memory or hit the currently low expansion depth limit
as it tries to expand documentation for all templates.
2012-01-19 23:43:39 +00:00
Gabriel Wicke 2233d0a488 Eventify parser tests and parse.js commandline wrapper to actuallly allow
async template fetching. Async expansion is not yet fully debugged, but at
least the preconditions for that are now there.
2012-01-18 23:46:01 +00:00
Gabriel Wicke 5b8054636e Make template fetching somewhat functional on node with Inez' help, but
disable it by default in parserTests as it tries to fetch all sorts of parser
functions and is not yet fully supported in parserTests. The next step will be
to build a list of parser functions (to avoid fetching them as templates) and
pushing the event interface into parserTests.
2012-01-18 19:38:32 +00:00
Gabriel Wicke 4bd4307924 Fix comment to reflect the actual regexp/spec in the JS version as well. 2012-01-18 19:35:13 +00:00
Gabriel Wicke 14e6728cc4 Add the start of a minimal sanitizer stage, that only strips IDN ignored
characters from host portions of links hrefs for now. This module needs to be
filled up with pretty much everything Sanitizer.php does, including tag and
attribute whitelists and attribute value sanitation (especially for style
attributes).

We'll also need to think about round-tripping of sanitized tokens.
2012-01-18 01:42:56 +00:00
Gabriel Wicke 336be4f617 Eat '[[[' as plain text token, makes it 212 passing. 2012-01-18 00:23:17 +00:00
Gabriel Wicke 178adbc342 Accept IPv6 (and IPv4) addresses in the tokenizer, so another test passes. 2012-01-18 00:00:47 +00:00
Gabriel Wicke e7381da5b8 Trim whitespace off template titles and argument names. 209 parser tests now
passing.
2012-01-17 23:18:33 +00:00
Gabriel Wicke f50fecf1e3 Fix template argument expansion. 200 parser tests now passing. 2012-01-17 22:29:26 +00:00
Gabriel Wicke 34025251a3 Clean up 'END' token handling a bit. 2012-01-17 20:01:21 +00:00
Gabriel Wicke 7f579398c7 Use isBlockTag in DOMPostProcessor 2012-01-17 18:30:22 +00:00