31 commits

Author SHA1 Message Date
Gabriel Wicke 25523f4cf0 Implement urlencode parser function
Change-Id: I4fca3134c9c3eb9a7d6f3360be6de054fb47477c
2012-04-16 14:54:03 +02:00
Gabriel Wicke 5bb2d96869 Token stream transform improvements
* add fast paths for empty arguments etc
* cache attribute token transform pipelines
* fix bugs in TokenCollector and NoIncludeOnly handler, and improve its
  efficiency by only registering for 'end' tokens on demand
* Remove empty reset methods from a few handlers
* Add a simple 'ap' debug print function that makes it easy to only print some
  debug prints by temporarily changing 'dp' to 'ap'
* Improvements and bug fixes in AttributeExpander

Change-Id: Ie69729c8f62d48bba922712e44ebce484c621c50
2012-04-12 15:42:09 +02:00
Gabriel Wicke 9ae572cca0 Fixes to template expansion / token transform managers, 296 tests passing.
* Convert isNoInclude logic to positive isInclude throughout and set it
  properly on attribute pipelines. Also don't cache non-include pipelines.
* Add a --pagename parameter to parse.js, which sets the page name in the
  environment. This is then returned by {{PAGENAME}}. Not the final solution,
  but useful for taxobox testing as taxons are selected based on PAGENAME.
* Add rudimentary pagenamebase parser function

Change-Id: If9c0be4c255200d0f2a30f02e5619437b4fd8f12
2012-04-11 16:34:27 +02:00
Gabriel Wicke b8d980a229 Don't eat newline / space in template parameters
..so that block_lines can match.

Change-Id: I4c464dc44249f40e4aa280df35fb726bfce3a745
2012-04-04 11:22:31 +02:00
Gabriel Wicke 0fe062fbe1 JSHint cleanups and parser function argument handling improvements
Parser functions which only accept positional arguments now return both the
key and value of arguments. Complete attributes (key and value) for templates
and the like from parser functions are not yet supported though.

Change-Id: I3f81bb35acd27186222ce6d5217e820042527c01
2012-04-03 18:10:48 +02:00
Gabriel Wicke 4cd8b302ac Improved template tokenization. The parser can now template-expand
[[:en:Barack Obama]] without exceeding 1.7GB of memory (which is the node
limit).
2012-03-12 17:31:45 +00:00
Gabriel Wicke 7518db8197 A few fixes to parser functions and template expansion. Trim whitespace off
template arguments, let the last duplicate key win and fake pagenamee slightly
better.
2012-03-08 11:44:37 +00:00
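
The argument handling described in 7518db8197 (trim whitespace, last duplicate key wins) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `collectArguments` and the `[key, value]` pair format are hypothetical, and real arguments are token lists rather than plain strings.

```javascript
// Hypothetical sketch: `pairs` is an ordered list of [key, value] strings.
// Later pairs overwrite earlier ones, so the last duplicate key wins.
function collectArguments(pairs) {
    // No prototype, so keys such as 'constructor' are stored like any other.
    var args = Object.create(null);
    pairs.forEach(function (pair) {
        // Trim whitespace off both the key and the value.
        args[pair[0].trim()] = pair[1].trim();
    });
    return args;
}
```

With this sketch, `collectArguments([[' name ', ' a '], ['name', 'b\n']])` yields a single `name` entry holding `'b'`.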
Gabriel Wicke af03eb4f29 Improve generic attribute expansion before external link processing, and make
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
2012-03-06 18:02:35 +00:00
Gabriel Wicke 7b0c807710 Change wikilink tokenization strategy to split on pipes. This makes it
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
2012-03-05 12:00:38 +00:00
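
The split-on-pipes strategy from 7b0c807710 can be illustrated on plain strings. This is a simplified sketch under stated assumptions: `splitWikiLink` is a hypothetical name, and the actual tokenizer works on token streams rather than strings.

```javascript
// Hypothetical sketch: split the content of [[...]] on every pipe.
// For image links the parts after the target are separate options;
// for other wikilinks they are re-joined with a plain '|' to form
// the link description.
function splitWikiLink(content) {
    var parts = content.split('|');
    var rest = parts.slice(1);
    return {
        target: parts[0],
        parts: rest,
        // null when the link has no pipe at all
        description: rest.length > 0 ? rest.join('|') : null
    };
}
```

An image-style consumer would read `parts` one option at a time, while a plain wikilink consumer would only use `description`.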
Gabriel Wicke 3227903d48 Follow-up to r112116, accidentally committed from subdirectory. 2012-02-22 16:41:01 +00:00
Gabriel Wicke 3568dfee14 Add some support for functionhooks in test parser and parserTests.js, and
tweak a few parser functions.
2012-02-22 15:59:11 +00:00
Gabriel Wicke d7da324272 Basic fall-through support for #switch parser function 2012-02-22 14:57:50 +00:00
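
Fall-through in `#switch` means that a case written without `=` shares the result of the next case that has one. A minimal sketch of that behaviour, assuming plain string cases and a hypothetical `switchFallThrough` helper (not the implementation from d7da324272, and without numeric comparison or the "last case without `=` is the default" rule):

```javascript
// Hypothetical sketch: `cases` is an ordered list of [key] or [key, value].
// A case without a value falls through to the next case that has one.
function switchFallThrough(target, cases) {
    var wanted = target.trim();
    var found = false;   // set once a value-less case has matched
    var fallback = '';   // value of '#default', if seen before any match
    for (var i = 0; i < cases.length; i++) {
        var key = cases[i][0].trim();
        var value = cases[i][1];
        if (value === undefined) {
            // Case without '=': remember the match and keep going.
            if (key === wanted) { found = true; }
        } else if (found || key === wanted) {
            return value.trim();
        } else if (key === '#default') {
            fallback = value.trim();
        }
    }
    return fallback;
}
```

For `{{#switch: b | a | b | c = abc | #default = other}}` the cases would be `[['a'], ['b'], ['c', 'abc'], ['#default', 'other']]`, and a target of `b` falls through to the value of `c`.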
Gabriel Wicke ffec77273a Comment and minor code tweaks. 2012-02-21 11:24:20 +00:00
Gabriel Wicke 4d80b8daa8 Detail comments about next steps and divide parser functions into those that
need more information from the wiki and those that are readily implementable.
2012-02-17 10:23:14 +00:00
Gabriel Wicke 059ff94bc4 Reject match for invalid urlencoded code points. 2012-02-16 13:57:56 +00:00
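
One way to check whether a percent-encoded sequence decodes to valid code points is to lean on the standard `decodeURIComponent`, which throws a `URIError` on malformed input. This is only a sketch of the idea; `isWellFormedPercentEncoding` is a hypothetical helper, and how the tokenizer in 059ff94bc4 actually performs the check is not shown in this log.

```javascript
// Hypothetical sketch: true if every %XX escape in `text` is part of a
// well-formed UTF-8 sequence, false otherwise.
function isWellFormedPercentEncoding(text) {
    try {
        decodeURIComponent(text);
        return true;
    } catch (e) {
        if (e instanceof URIError) {
            return false;
        }
        throw e; // anything else is unexpected
    }
}
```

A tokenizer rule could use such a predicate to reject a match and fall back to treating the text as plain characters.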
Gabriel Wicke 001194b140 Replace console.log with console.warn in all debug statements 2012-02-14 20:56:14 +00:00
Gabriel Wicke 0b8d1b0387 * Add custom toString methods for tokens to aid debugging
* Convert all attributes into strings in Sanitizer
* Use strict comparison against empty string in tokenizer
* Add very simple sitename parserfunction
* 138 tests passing
2012-02-13 17:02:23 +00:00
Gabriel Wicke 9945175416 Reformat Date.replaceChars 2012-02-13 14:23:48 +00:00
Gabriel Wicke a122e51eec Move data-* annotations into a separate object on tokens, which is then
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.

A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
2012-02-11 16:43:25 +00:00
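
The serialization step described in a122e51eec might look roughly like this. The token shape (`attribs` as a list of pairs, `dataAttribs` as a plain object), the function name, and the use of JSON as the serialization format are all assumptions made for illustration; only the `data-mw-rt` attribute name comes from the commit message.

```javascript
// Hypothetical sketch: fold a token's separate annotation object into a
// single data-mw-rt attribute, but only if there is something to store.
function serializeAttributes(token) {
    var attribs = token.attribs.slice(); // copy, leave the token untouched
    var data = token.dataAttribs || {};
    if (Object.keys(data).length > 0) {
        // JSON is an assumed encoding, not confirmed by the commit.
        attribs.push(['data-mw-rt', JSON.stringify(data)]);
    }
    return attribs;
}
```

Keeping the annotations in one attribute makes it straightforward for a test harness to drop that attribute before comparing against expected parser output.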
Gabriel Wicke 6983481561 Move attribute expansion back to separate handler, as this makes it easier to
only expand used branches selected by parser functions. Template (and
-argument) expansion is simply registered before general expansion.

Additionally, a few more simple time-based magic words are added in
ParserFunctions.
2012-02-09 13:44:20 +00:00
Gabriel Wicke 3f7c1499cd Enable support for general preprocessor functionality in attribute keys and
values. This includes comments, templates and template arguments.

This also replaces the specialized expansion logic in the TemplateHandler. The
removal of link validation causes one more parser test to fail for now. External
link target validation will need to be implemented in the token stream handler
for links. This is noted as TODO in
https://www.mediawiki.org/wiki/Future/Parser_development#Token_stream_transforms.
2012-02-08 15:10:30 +00:00
Gabriel Wicke 157c495a9e Normalize the title in localurl. 232 tests passing. 2012-02-07 12:26:00 +00:00
Gabriel Wicke 1f6db903e9 Pluck a few low-hanging fruit in external link tokenization, and add a simple
localurl parser function implementation. 230 parser tests now passing.
2012-02-07 10:28:23 +00:00
Gabriel Wicke 53bf4f2bd0 Temporarily disable the sanitizer and start to support preprocessor
functionality (comments, templates, template arguments) in arbitrary
attributes. The grammar for this is still quite rough and will need to be
consolidated.
2012-02-06 19:15:44 +00:00
Gabriel Wicke 8c75aa1a7a Remove type attribute for tag tokens. 2012-02-01 18:37:48 +00:00
Gabriel Wicke a5cc10a06b Change token format to plain strings for text tokens, and specific objects for
other tokens. This is only the first half of the conversion. The next step is
to drop the type attribute on most tokens and match on the constructor in the
token transform machinery.
2012-02-01 16:30:43 +00:00
Gabriel Wicke e65c6502c0 Add source for #time implementation in comment 2012-02-01 10:14:01 +00:00
Gabriel Wicke 14a8a13678 A few more debug helpers including a --trace mode for light debugging. Some
improvements to parser functions on the way to support the cite extensions.
Preparation for generic template and template arg in attribute support. 222
parser tests now passing.
2012-01-31 16:50:16 +00:00
Gabriel Wicke 561cf3c237 Bug fixes and a first stab at a #time parser function. You can expand the main
page like this:

cd extensions/VisualEditor/modules/parser
echo '{{:Main Page}}' | node parse.js
echo '{{:Main Page}}' | node parse.js --html
echo '{{:Main Page}}' | node parse.js --debug

Even the date-based includes work somewhat, although they don't yet accept
passed-in dates.
2012-01-22 07:07:16 +00:00
Gabriel Wicke 60e45bb739 A bit of template expansion bug fixing and parser function documentation 2012-01-22 01:27:22 +00:00
Gabriel Wicke 785a4af76f Implement a few parser functions. 220 parser tests now passing. 2012-01-21 20:38:13 +00:00