Commit graph

33 commits

Author SHA1 Message Date
Gabriel Wicke 4cd8b302ac Improved template tokenization. The parser can now template-expand
[[:en:Barack Obama]] without exceeding 1.7GB of memory (which is the node
limit).
2012-03-12 17:31:45 +00:00
Gabriel Wicke ffc9383096 Temporary fix for template tokenization, especially needed for
[[Template:Cite core]].
2012-03-08 14:24:04 +00:00
Gabriel Wicke 39017dd769 Percent-encode spaces in URLs, so that they are recognized as valid URLs later
on.
2012-03-08 11:53:15 +00:00
Gabriel Wicke 7518db8197 A few fixes to parser functions and template expansion. Trim whitespace off
template arguments, let the last duplicate key win and fake pagenamee slightly
better.
2012-03-08 11:44:37 +00:00
Gabriel Wicke f02ff95aa3 Token representation clean-up. Now all tokens are differentiated using
constructors instead of type attributes.
2012-03-07 20:06:54 +00:00
Gabriel Wicke af03eb4f29 Improve generic attribute expansion before external link processing, and make
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
2012-03-06 18:02:35 +00:00
Gabriel Wicke 227103e12c Accept empty table cell attribute sections, and consider percent-encoded %2525
valid. 270 tests passing.
2012-03-06 14:32:45 +00:00
Gabriel Wicke 2efcd3cd57 Reworked percent encoding handling for URIs to get closer to the 'url
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation

Removed a few whitelisted test cases that are now passing directly.

The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing however has the advantage of providing further
transformations uniform data to work with. We could even consider to move this
even further into the tokenizer.
2012-03-06 13:49:37 +00:00
Gabriel Wicke a9ebc1d986 Support external images wrapped in a clickable link using bracketed external
link syntax. 265 tests passing.
2012-03-05 16:23:00 +00:00
Gabriel Wicke 7b0c807710 Change wikilink tokenization strategy to split on pipes. This makes it
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
2012-03-05 12:00:38 +00:00
Gabriel Wicke b1a7119a46 Hack up some rudimentary image rendering. Using jshashes for the md5, and
a few hard-coded image image sizes ;) 262 tests passing.
2012-03-01 13:51:53 +00:00
Gabriel Wicke d4faf9eaf4 More work on wiki link rendering and general wiki title / namespace
functionality.
2012-03-01 12:47:05 +00:00
Gabriel Wicke d7da324272 Basic fall-through support for #switch parser function 2012-02-22 14:57:50 +00:00
Gabriel Wicke 001194b140 Replace console.log with console.warn in all debug statements 2012-02-14 20:56:14 +00:00
Gabriel Wicke b1617b1d71 Add some support for ideographic spaces in external links, support the
int: namespace alias and perform some normalization on the MediaWiki namespace
prefix.
2012-02-13 13:35:46 +00:00
Gabriel Wicke 53bf4f2bd0 Temporarily disable the sanitizer and start to support preprocessor
functionality (comments, templates, template arguments) in arbitrary
attributes. The grammar for this is still quite rough, will need to
consolidate that area.
2012-02-06 19:15:44 +00:00
Gabriel Wicke a5cc10a06b Change token format to plain strings for text tokens, and specific objects for
other tokens. This is only the first half of the conversion. The next step is
to drop the type attribute on most tokens and match on the constructor in the
token transform machinery.
2012-02-01 16:30:43 +00:00
Gabriel Wicke 14a8a13678 A few more debug helpers including a --trace mode for light debugging. Some
improvements to parser functions on the way to support the cite extensions.
Preparation for generic template and template arg in attribute support. 222
parser tests now passing.
2012-01-31 16:50:16 +00:00
Gabriel Wicke 7ea4d7d3db A few parser function fixes and maximum template expansion in environment
config.
2012-01-22 19:32:28 +00:00
Gabriel Wicke 785a4af76f Implement a few parser functions. 220 parser tests now passing. 2012-01-21 20:38:13 +00:00
Gabriel Wicke 7cc8e69147 Collapse all requests per template into a single outstanding request using an
event-emitting TemplateRequest object and a request queue.
2012-01-20 02:36:18 +00:00
Gabriel Wicke fc2088bb21 Add some rudimentary noinclude / includeonly support and fix up
TokenCollector.
2012-01-20 01:46:16 +00:00
Gabriel Wicke d0ece16c86 Fix async template expansion, so we can now render simple pages with templates
directly to WikiDom from enwiki using a commandline like this:

  echo '{{User:GWicke/Test}}' | node parse.js

Wohoo!

Complex pages with templates won't render properly yet, as noinclude /
includeonly and parser functions are not yet implemented. As a result, the
parser will run out of memory or hit the currently low expansion depth limit
as it tries to expand documentation for all templates.
2012-01-19 23:43:39 +00:00
Gabriel Wicke 5b8054636e Make template fetching somewhat functional on node with Inez' help, but
disable it by default in parserTests as it tries to fetch all sorts of parser
functions and is not yet fully supported in parserTests. The next step will be
to build a list of parser functions (to avoid fetching them as templates) and
pushing the event interface into parserTests.
2012-01-18 19:38:32 +00:00
Gabriel Wicke 14e6728cc4 Add the start of a minimal sanitizer stage, that only strips IDN ignored
characters from host portions of links hrefs for now. This module needs to be
filled up with pretty much everything Sanitizer.php does, including tag and
attribute whitelists and attribute value sanitation (especially for style
attributes).

We'll also need to think about round-tripping of sanitized tokens.
2012-01-18 01:42:56 +00:00
Gabriel Wicke e7381da5b8 Trim whitespace off template titles and argument names. 209 parser tests now
passing.
2012-01-17 23:18:33 +00:00
Gabriel Wicke f50fecf1e3 Fix template argument expansion. 200 parser tests now passing. 2012-01-17 22:29:26 +00:00
Gabriel Wicke f4081bef08 First template expansion tests start working, and a bug fix in
DOMPostProcessor paragraph wrapper. 187 parser tests now passing.
2012-01-14 00:58:20 +00:00
Gabriel Wicke 32c9bccd7c Results of early template expansion debugging. Still disabled by default, but
getting closer.
2012-01-11 19:48:49 +00:00
Gabriel Wicke 5ec30252f1 More token transform and pipeline setup refactoring to support template
expansion better.
2012-01-10 01:09:50 +00:00
Trevor Parscal 5bae153214 Moving parser stuff back into the modules folder (oops) 2011-11-02 21:45:57 +00:00
Trevor Parscal 2b499d5990 Reorganized modules by javascript namespace 2011-11-02 21:31:45 +00:00
Brion Vibber 56a75ccca7 Copy several of the experimental JS parser bits from ParserPlayground to VisualEditor. They'll need retooling to hook up with the wikidom stuff. 2011-11-02 21:07:51 +00:00