264 commits

Author SHA1 Message Date
Gabriel Wicke dbdd320348 Improve parameter tokenization support especially for table rows
Change-Id: I961d69e228b96adc69ea9acb3733d13f5898602d
2012-04-05 16:00:26 +02:00
Gabriel Wicke 7a35e5db16 Remove behaviors var in tokenizer, now handled in token handler
Change-Id: I68eeff3f05ce29c13e347c2cd7ea6519e58b0e03
2012-04-04 21:17:29 +02:00
GWicke da60861be8 Merge ""magic words" are tokenized and used to set parser.environment flags" 2012-04-04 19:11:03 +00:00
Adam Wight a85ed36efa "magic words" are tokenized and used to set parser.environment flags
Behavior switches are converted to tokens that set parser.environment flags during the async transformation stage.

The next step would be for handlers in the sync23 stage to generate the TOC, section edit links, and so on according to these directives.

No tests written, because the switches are consumed and don't appear in the rendered HTML. We can test the magic word layout controls individually once they're implemented.

Another small change was to store option flags directly in the environment object, not that it makes much difference.

Change-Id: I863fbf4be1a17d2f6c31158298dd301f19ae1137
2012-04-04 11:25:29 -07:00
Adam Wight b234edba88 As much as I have loved writing Makefiles... I've replaced the Makefile's functionality with package.json, mostly so we can avoid non-node dependencies. This is one of the recommended practices. We should consider moving tests/parser into modules/parser/tests, since other node projects keep all module code in one directory.
Explained in the README how to use npm to load the dependencies and run tests.  Too bad about NODE_PATH...

Don't try to find parserTests.txt in assorted places--if it isn't present, fetch from gerrit.  You can symlink from core if you're developing on both parsers, and the fetch script will not overwrite.

Use __dirname in parserTests.js to allow the script to run independent of current working directory.

Change-Id: I4c8b884e91f4fdeae385c7697aff768bdd199dd5
2012-04-04 11:02:58 -07:00
Gabriel Wicke e3a745a024 Improvements for template / -argument precedence; support for empty params
Change-Id: Id0894ccbedfa47fa3658817ca65119a2af76be3e
2012-04-04 16:29:47 +02:00
Gabriel Wicke 2037215185 Disallow '[' in generic attribute names
This avoids interpreting something like

! [[foo|bar]]

as

<th [[foo=''>bar]]</th>.

Change-Id: If59708fa90eb0117a15b2b6446890d1ae19a857c
2012-04-04 14:31:11 +02:00
Gabriel Wicke f588d2a7aa Fix table headings in template parameters
Change-Id: Icdfc5655968fc845230ad7638124309d6b8c1ada
2012-04-04 12:54:34 +02:00
Gabriel Wicke b8d980a229 Don't eat newline / space in template parameters
..so that block_lines can match.

Change-Id: I4c464dc44249f40e4aa280df35fb726bfce3a745
2012-04-04 11:22:31 +02:00
Trevor Parscal 606d97da99 Merge "Add HTML DOM -> linear model converter" 2012-04-03 17:52:55 +00:00
Gabriel Wicke 47de122a95 Improve support for table / template interaction
Match pairs of {{!}} or | for template productions, but not a mix of the two.
Example:

{{#if:1|{{!}}-
{{!}} {{#if:1|style="color: red"{{!}}|}}
}}

Note that the style parameter ends up as the *key* of an empty-valued
attribute on the table cell currently.

Change-Id: I5f9357dd1645ef97b0af89f32e8d92ae49218c72
2012-04-03 18:48:35 +02:00
Gabriel Wicke 0fe062fbe1 JSHint cleanups and parser function argument handling improvements
Parser functions which only accept positional arguments now return both the
key and value of arguments. Complete attributes (key and value) for templates
and the like from parser functions are not yet supported though.

Change-Id: I3f81bb35acd27186222ce6d5217e820042527c01
2012-04-03 18:10:48 +02:00
GWicke b7db83e09a Merge "Magic links and behavior switch tokenization by Ori Livneh" 2012-04-02 16:43:13 +00:00
Gabriel Wicke f662690d02 Shorten data-mw-rt to data-mw and clean up whitelist
Instead of a proliferation of data-mw-* attributes, it should be easier to
stash all private / non-semantic round-trip information in a JSON object
stored in data-mw.

Change-Id: Id200a6a8789fa152f29ea530e5a24b6ee7b4b285
2012-04-02 18:12:49 +02:00
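
The single-attribute approach described in f662690d02 might look roughly like the sketch below. The helper names and the round-trip keys (`linkType`, `src`) are hypothetical and are not taken from the commit; the actual attribute layout in the parser may differ.

```javascript
// Hypothetical helper: pack private round-trip information into one
// data-mw attribute instead of several data-mw-* attributes.
function toDataMw(roundTripInfo) {
    return { name: 'data-mw', value: JSON.stringify(roundTripInfo) };
}

// Reading the information back is a single JSON.parse on the value.
function fromDataMw(attribute) {
    return JSON.parse(attribute.value);
}

// Example with made-up keys.
const exampleAttribute = toDataMw({ linkType: 'wikilink', src: '[[Foo]]' });
console.log(exampleAttribute.name, exampleAttribute.value);
```

Keeping everything in one JSON object means a whitelist only has to allow a single attribute name.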
Gabriel Wicke 5248fd31e8 Magic links and behavior switch tokenization by Ori Livneh
Commit first patch by Ori, which lets 288 parser tests pass. Yay!

Change-Id: Iac8c3d1ad1984900350b20f7e725c40618a1e8ba
2012-04-02 17:31:34 +02:00
Catrope 8dc994f037 Add HTML DOM -> linear model converter
Also, in ParserPipeline:
* Import the LM converter and expose it through getLinearModel()
* Fix getWikiDom() to actually work (still unused)

In parse.js:
* Add --help option that prints usage information (was unreachable)
* Add --linearmodel option to output linear model JSON instead of HTML

Change-Id: Ic534e03ff40a7c9117bb63f0c635a4213d5e3406
2012-03-29 12:47:14 -07:00
Gabriel Wicke 5ef2074251 Enable support for block-level wiki constructs in template arguments. This
gets a bit closer to supporting table fragments passed through template
arguments. Next, we'll need a way to indicate start-of-line position to
enable sol block-levels in template parameters. 

Example:

{|
{{#if: true|{{!}}Table cell|}}
|}
2012-03-15 11:43:49 +00:00
Gabriel Wicke 7e22020398 Convert syntactical break flags for templates from counters to the stack
variant to fix the precedence for {{!}} (break on these inside table content,
but not in template options within tables).
2012-03-14 16:30:59 +00:00
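
The difference between counters and a stack in 7e22020398 can be illustrated with a small sketch. The class, method, and flag names here are assumptions made for illustration, not the tokenizer's actual API. A counter can only say "we are N levels deep"; a stack lets an inner context switch a flag off and restores the outer value on exit.

```javascript
// Illustrative only: a per-flag stack of boolean values.
class SyntaxFlags {
    constructor() {
        this.stacks = {};
    }
    // Enter a context that sets the flag to a new value.
    push(flag, value) {
        if (!this.stacks[flag]) {
            this.stacks[flag] = [];
        }
        this.stacks[flag].push(value);
    }
    // Leave the context and restore the previous value.
    // Assumes a matching push() happened earlier.
    pop(flag) {
        return this.stacks[flag].pop();
    }
    // Current value: top of the stack, or false if nothing is pushed.
    current(flag) {
        const stack = this.stacks[flag];
        return stack && stack.length > 0 ? stack[stack.length - 1] : false;
    }
}

const flags = new SyntaxFlags();
flags.push('breakOnBang', true);   // table content: break on {{!}}
flags.push('breakOnBang', false);  // template options inside the table: don't
const insideOptions = flags.current('breakOnBang');
flags.pop('breakOnBang');          // back in table content
const insideTable = flags.current('breakOnBang');
console.log(insideOptions, insideTable);
```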
Gabriel Wicke 77a61dd687 Improve support for {{!}}, and don't produce a pre for indented tables. 2012-03-14 10:58:11 +00:00
Gabriel Wicke 835914b2de Support {{=}}. 2012-03-14 09:07:01 +00:00
Gabriel Wicke 2195c31abf Move link types to data-mw-rt, and support some more template tokenization
edge cases. For example, the PHP parser treats | foo | = bar | as | foo = bar |,
believe it or not ;)
2012-03-13 12:32:31 +00:00
Gabriel Wicke 4cd8b302ac Improved template tokenization. The parser can now template-expand
[[:en:Barack Obama]] without exceeding 1.7GB of memory (which is the node
limit).
2012-03-12 17:31:45 +00:00
Gabriel Wicke 3c5fe2523c Tolerate more newlines and spaces in templates, and support templates and
comments in urls.
2012-03-12 14:31:06 +00:00
Gabriel Wicke ae4ab7a39c Refactor syntactic stops into an object and add a stack variant for option
values.
2012-03-12 13:08:43 +00:00
Roan Kattouw 29f416937e Fix some usages of splice.apply in the data model to use
ve.batchedSplice(). Added FIXME comments for occurrences outside of DM
2012-03-10 00:31:28 +00:00
Gabriel Wicke ffc9383096 Temporary fix for template tokenization, especially needed for
[[Template:Cite core]].
2012-03-08 14:24:04 +00:00
Gabriel Wicke 39017dd769 Percent-encode spaces in URLs, so that they are recognized as valid URLs later
on.
2012-03-08 11:53:15 +00:00
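
A minimal sketch of the space encoding mentioned in 39017dd769, assuming a hypothetical helper name; where and how the parser actually applies this is not shown in the commit message.

```javascript
// Hypothetical helper: replace literal spaces with %20 so that later
// stages can treat the string as a valid URL.
function percentEncodeSpaces(url) {
    return url.replace(/ /g, '%20');
}

console.log(percentEncodeSpaces('http://example.com/some page'));
```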
Gabriel Wicke 7518db8197 A few fixes to parser functions and template expansion. Trim whitespace off
template arguments, let the last duplicate key win and fake pagenamee slightly
better.
2012-03-08 11:44:37 +00:00
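
The argument handling described in 7518db8197 (trim whitespace, last duplicate key wins) can be sketched as follows. The function name and the input shape (an array of `[key, value]` string pairs) are assumptions for illustration only.

```javascript
// Hypothetical sketch: build a lookup of template arguments from
// [key, value] string pairs.
function collectTemplateArgs(pairs) {
    const args = new Map();
    for (const [key, value] of pairs) {
        // Map.set overwrites an existing entry, so the last duplicate wins.
        args.set(key.trim(), value.trim());
    }
    return args;
}

const exampleArgs = collectTemplateArgs([
    [' title ', ' First '],
    ['title', 'Second ']
]);
console.log(exampleArgs.get('title'));
```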
Gabriel Wicke 51023feaa4 Improvements for image option handling. 2012-03-08 10:03:22 +00:00
Gabriel Wicke b1e131d568 A bit more documentation and naming cleanup in the tokenizer wrapper. 2012-03-08 09:00:45 +00:00
Gabriel Wicke f02ff95aa3 Token representation clean-up. Now all tokens are differentiated using
constructors instead of type attributes.
2012-03-07 20:06:54 +00:00
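
One way to read "differentiated using constructors instead of type attributes" in f02ff95aa3 is sketched below. The constructor names and fields are illustrative assumptions and may not match the parser's real token classes.

```javascript
// Illustrative token constructors (names and fields are assumptions).
function TagTk(name, attribs) {
    this.name = name;
    this.attribs = attribs || [];
}

function EndTagTk(name) {
    this.name = name;
}

// Dispatch on the constructor rather than on a `type` string property.
// Plain text is represented here as a JavaScript string.
function describeToken(token) {
    switch (token.constructor) {
        case TagTk:
            return 'start tag <' + token.name + '>';
        case EndTagTk:
            return 'end tag </' + token.name + '>';
        case String:
            return 'text';
        default:
            return 'unknown';
    }
}

console.log(describeToken(new TagTk('b')));
console.log(describeToken('plain text'));
```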
Gabriel Wicke f157093a41 Delegate responsibility for resetting the token rank to transforms, if full
re-processing in a phase is wanted. By default, after a token type change or
the return of multiple tokens only the remaining transforms with higher ranks
are applied.

Updated a few comments as well.
2012-03-07 19:29:53 +00:00
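
The default behavior described in f157093a41 (only transforms with a higher rank than the token's current rank are applied) can be sketched as below. The data shapes and names are hypothetical, and the case where a transform resets the rank to request full re-processing is deliberately left out.

```javascript
// Hypothetical sketch: `transforms` is an array of { rank, fn } objects
// sorted by ascending rank; `token.rank` records how far the token has
// already been processed.
function runTransforms(token, transforms) {
    for (const transform of transforms) {
        if (transform.rank <= token.rank) {
            continue; // this transform was already applied (or skipped)
        }
        token = transform.fn(token);
        token.rank = transform.rank;
    }
    return token;
}

// A token that already passed rank 1 only sees the rank 2 and 3 transforms.
const seen = [];
const exampleTransforms = [1, 2, 3].map(function (rank) {
    return {
        rank: rank,
        fn: function (tok) { seen.push(rank); return tok; }
    };
});
runTransforms({ rank: 1 }, exampleTransforms);
console.log(seen);
```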
Gabriel Wicke 1f8c43b9e2 A few minor documentation updates. 2012-03-07 18:42:26 +00:00
Gabriel Wicke 5f618103d7 Set allTokensProcessed flag for async callbacks from the template expander. 2012-03-07 17:36:33 +00:00
Gabriel Wicke e5a1116817 Start re-transformation as soon as possible in TokenAccumulator._returnTokens
to maximize IO concurrency. Signal that all tokens are fully transformed to
callbacks called from TokenAccumulator._returnTokens. The result should be a
single re-transformation when entering the callback chain, and only if the
transform does not signal that it took care of full transformation itself.
Template expansion would set this flag, as the nested transform pipeline
processes all tokens to the end of phase async12.
2012-03-07 16:29:06 +00:00
Gabriel Wicke 656524dbbc Fixes for multi-transformer expansion in AsyncTransformManager. Added argument
to callback which lets transforms indicate if their returned tokens are fully
processed for their phase. If not, the callback re-processes them so that any
remaining transforms are applied.
2012-03-07 15:39:18 +00:00
Gabriel Wicke af03eb4f29 Improve generic attribute expansion before external link processing, and make
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
2012-03-06 18:02:35 +00:00
Gabriel Wicke 227103e12c Accept empty table cell attribute sections, and consider percent-encoded %2525
valid. 270 tests passing.
2012-03-06 14:32:45 +00:00
Gabriel Wicke 2efcd3cd57 Reworked percent encoding handling for URIs to get closer to the 'url
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation

Removed a few whitelisted test cases that are now passing directly.

The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing, however, has the advantage of giving later
transformations uniform data to work with. We could even consider moving this
further into the tokenizer.
2012-03-06 13:49:37 +00:00
Gabriel Wicke 19fe9726a2 Fix invalid external link representation. 268 tests passing. 2012-03-05 18:06:29 +00:00
Gabriel Wicke a9ebc1d986 Support external images wrapped in a clickable link using bracketed external
link syntax. 265 tests passing.
2012-03-05 16:23:00 +00:00
Gabriel Wicke 7f7202e89c A few improvements to external link and image handling. 264 tests passing. 2012-03-05 15:34:27 +00:00
Gabriel Wicke 7b0c807710 Change wikilink tokenization strategy to split on pipes. This makes it
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
2012-03-05 12:00:38 +00:00
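
The split-on-pipes strategy from 7b0c807710 can be approximated on plain strings as below. This is a simplified sketch with a hypothetical function name; the real tokenizer works on grammar productions and tokens rather than on `String.prototype.split`.

```javascript
// Simplified sketch: split wikilink content on pipes. The first part is
// the target; for a non-image link the remaining parts are joined back
// together with a plain '|' to form the description.
function splitWikilink(content) {
    const parts = content.split('|');
    return {
        target: parts[0],
        parts: parts.slice(1),
        description: parts.slice(1).join('|')
    };
}

console.log(splitWikilink('Foo|bar|baz').description);
```

For an image link, `parts` would instead be treated as separate options, which is what makes template expansion inside individual options possible.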
Gabriel Wicke 3e6f1b6bea Use some options primitively. 2012-03-02 14:19:33 +00:00
Gabriel Wicke 167dbdb0fa Parse image options. 2012-03-02 13:36:37 +00:00
Gabriel Wicke 8b7ba9051b Add productions for image option tokenization, and prepare to call those from
the LinkHandler token stream transformer.
2012-03-01 18:07:20 +00:00
Gabriel Wicke b1a7119a46 Hack up some rudimentary image rendering. Using jshashes for the md5, and
a few hard-coded image sizes ;) 262 tests passing.
2012-03-01 13:51:53 +00:00
Gabriel Wicke d4faf9eaf4 More work on wiki link rendering and general wiki title / namespace
functionality.
2012-03-01 12:47:05 +00:00
Gabriel Wicke 4b9bd45b82 Start to move wikilink expansion to a separate async token transformer. 2012-02-29 13:56:29 +00:00
Gabriel Wicke b8bb503199 Actually commit onlyinclude, as already announced in r112592. 2012-02-28 13:24:35 +00:00