Roan Kattouw
d70aa70707
Add test for replacing a table with a list. This only works because
nesting validity isn't checked yet (lists inside lists are illegal
IIRC), but for now it tests the reversal of the order of the closing
tags nicely
2012-03-09 02:19:50 +00:00
Roan Kattouw
b13d0a849d
Add a check for the length of unwrapOuter, and add a test for each
exception
2012-03-09 01:44:31 +00:00
Roan Kattouw
bc600b34be
Make prepareWrap() use the data from the model rather than the unwrap
parameters. This fixes the case where rolling back a list unwrap would
restore the list items without their attributes
2012-03-09 01:14:41 +00:00
Roan Kattouw
3bc6b3d8c7
Add tests for unwrapping a list
This also exercises unwrapEach. One of the tests is still subtly broken
in that the attributes on the listItems aren't preserved; I'll fix that
next.
2012-03-09 00:38:35 +00:00
Roan Kattouw
5054ed320e
Implement prepareWrap and add tests for it
2012-03-08 23:21:26 +00:00
Roan Kattouw
10a6ee73f4
Add tests for content replacements
2012-03-08 23:21:23 +00:00
Trevor Parscal
3ec0c07843
Fixed name of test suite to match actual class name
2012-03-08 19:37:13 +00:00
Trevor Parscal
becb1daa39
Added more tests for ve.dm.DocumentSynchronizer and fixed some bugs along the way
2012-03-08 19:35:51 +00:00
Trevor Parscal
459c4fa271
Added some basic tests for resize and insert. Fixed some bugs in both of those code paths along the way.
2012-03-08 00:52:30 +00:00
Gabriel Wicke
af03eb4f29
Improve generic attribute expansion before external link processing, and make
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
2012-03-06 18:02:35 +00:00
Gabriel Wicke
227103e12c
Accept empty table cell attribute sections, and consider percent-encoded %2525
valid. 270 tests passing.
2012-03-06 14:32:45 +00:00
Gabriel Wicke
2efcd3cd57
Reworked percent encoding handling for URIs to get closer to the 'url
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation
Removed a few whitelisted test cases that are now passing directly.
The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing, however, has the advantage of providing
subsequent transformations with uniform data to work with. We could even
consider moving this further into the tokenizer.
2012-03-06 13:49:37 +00:00
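A minimal sketch of what percent-encoding canonicalization, as discussed in commit 2efcd3cd57, might look like. The function name and the exact rules (upper-case well-formed escapes, escape stray '%' characters) are assumptions for illustration; this is not the project's code and not a faithful implementation of the HTML5 URL rules.

```javascript
// Hypothetical sketch: keep well-formed %XX escapes (upper-casing the hex
// digits) and escape any stray '%' as %25.
function canonicalizePercents(uri) {
    return uri.replace(/%([0-9a-fA-F]{2})?/g, function (match, hex) {
        // `hex` is undefined when the '%' is not followed by two hex digits.
        return hex ? '%' + hex.toUpperCase() : '%25';
    });
}

console.log(canonicalizePercents('a%2fb')); // a%2Fb
console.log(canonicalizePercents('100%'));  // 100%25
```

Running this once, early in the token stream, would give every later stage the same normalized form to compare against.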
Gabriel Wicke
19fe9726a2
Fix invalid external link representation. 268 tests passing.
2012-03-05 18:06:29 +00:00
Gabriel Wicke
7b0c807710
Change wikilink tokenization strategy to split on pipes. This makes it
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
2012-03-05 12:00:38 +00:00
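The pipe-splitting strategy from commit 7b0c807710 can be sketched roughly as follows. Both helper names and the returned object shape are hypothetical; the real tokenizer is a PEG grammar and works on tokens rather than plain strings.

```javascript
// Hypothetical helper, not the project's actual tokenizer code.
// Splits the inside of [[...]] on pipes: the first part is the target,
// the remaining parts are kept as separate pieces.
function splitWikilink(content) {
    var parts = content.split('|');
    return { target: parts[0], args: parts.slice(1) };
}

// For a non-image wikilink, the description is rebuilt by joining the
// remaining pieces with a plain '|'. Falls back to the target if none exist.
function linkDescription(link) {
    return link.args.length ? link.args.join('|') : link.target;
}

var link = splitWikilink('Main Page|text with | a pipe');
console.log(link.target);           // Main Page
console.log(linkDescription(link)); // text with | a pipe
```

Keeping the pieces separate until the link type is known is what leaves room for treating them as image options instead of description text.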
Trevor Parscal
0e41da3340
Fixed tests that were broken by r112150.
2012-03-02 23:12:38 +00:00
Gabriel Wicke
009d7a4dea
Namespaces to the rescue.
2012-03-02 15:49:05 +00:00
Gabriel Wicke
fe681042c0
Collect some statistics while grepping.
2012-03-01 16:42:28 +00:00
Gabriel Wicke
e0838db315
Capturing the regexp is no longer necessary, and speeds up the grepper. Also
tweaked the multi-line ISBN regexp slightly.
2012-02-29 13:02:46 +00:00
Gabriel Wicke
e3deb304db
Add a misc regexp file for dump grepping.
2012-02-29 11:07:17 +00:00
Gabriel Wicke
14f40aa7d5
Support capturing regexps in dumpGrepper.
2012-02-29 10:49:00 +00:00
Gabriel Wicke
ebcfc2c7a1
Improve grepper documentation.
2012-02-28 14:24:37 +00:00
Gabriel Wicke
b767e03449
Tweak martian regexp and grepper output format.
2012-02-28 14:11:44 +00:00
Gabriel Wicke
4806505ce4
Finish color highlighting for dump grepper / fix broken commit r112592.
2012-02-28 13:48:47 +00:00
Gabriel Wicke
7daeb34d4d
Implement onlyinclude transformer. 254 tests passing.
2012-02-28 13:21:01 +00:00
Gabriel Wicke
32012c00cd
Add martian-endtags regexp wrapper around dumpGrepper.
2012-02-27 16:51:20 +00:00
Gabriel Wicke
19c67c28a2
Add a simple dump grepper using DumpReader. Useful to inform parser design
decisions, and as a way to exercise the dump reader in preparation for tests
over full dumps.
2012-02-27 16:40:01 +00:00
Gabriel Wicke
21855c99cd
Tweak dumpReader to work with current libxmljs and stdin 'data' events.
2012-02-27 15:46:08 +00:00
Gabriel Wicke
2e41b19af8
Green two more parser tests by implementing some parser functions.
2012-02-22 16:39:50 +00:00
Gabriel Wicke
3568dfee14
Add some support for functionhooks in test parser and parserTests.js, and
tweak a few parser functions.
2012-02-22 15:59:11 +00:00
au
f1fb937b4a
* Instead of sorting attributes, whitelist the one parserTest where it matters.
2012-02-20 22:26:24 +00:00
au
0ca9b00100
* Convert __patched-html-parser to .coffee.
Note that the compiled .js file (generated by "make"/"make test")
is still under version control so folks can work on the project
even without a running "coffee" command in PATH.
Also updated README to mention coffee-script and "make test".
2012-02-18 18:54:12 +00:00
au
4d1c6c7d6e
* Add a "make test" target that auto-fetches parserTests.txt.
2012-02-18 17:28:46 +00:00
au
0360e62da7
* Locally apply the HTML5.Marker.type patch.
This is needed until https://github.com/aredridel/html5/issues/44
is merged into the upstream "html5" module.
2012-02-18 17:28:35 +00:00
Gabriel Wicke
025f9cddb3
Prefix all internal data- attributes with data-mw- and adjust the whitelist
and test output normalization accordingly. 235 tests passing.
2012-02-13 13:54:07 +00:00
Gabriel Wicke
a122e51eec
Move data-* annotations into separate object on tokens, that is then
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.
A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
2012-02-11 16:43:25 +00:00
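A rough sketch of the idea in commit a122e51eec: annotations are kept in a separate object on the token and collapsed into a single data-mw-rt attribute on output. The token shape (`attribs`, `dataAttribs`) and both function names are assumptions made for this example only.

```javascript
// Hypothetical token shape: round-trip annotations live in `dataAttribs`,
// separate from the regular HTML attributes in `attribs`.
function serializeAttribs(token) {
    var attribs = token.attribs.slice();
    if (token.dataAttribs && Object.keys(token.dataAttribs).length > 0) {
        // Collapse all annotations into one JSON-valued attribute.
        attribs.push(['data-mw-rt', JSON.stringify(token.dataAttribs)]);
    }
    return attribs;
}

// Test-side normalization: drop the round-trip attribute before comparing
// against the expected parser output.
function stripRoundTrip(attribs) {
    return attribs.filter(function (kv) { return kv[0] !== 'data-mw-rt'; });
}

var token = { attribs: [['href', 'Foo']], dataAttribs: { sourceLength: 7 } };
console.log(serializeAttribs(token));
```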
Gabriel Wicke
1f6db903e9
Pluck a few low-hanging fruit in external link tokenization, and add a simple
localurl parser function implementation. 230 parser tests now passing.
2012-02-07 10:28:23 +00:00
Gabriel Wicke
d321d96bab
Fix parserTests summary with filtering enabled
2012-02-07 09:27:47 +00:00
Trevor Parscal
5d71c888f9
Updated unit tests in response to structural changes in r110805
2012-02-07 00:12:31 +00:00
Gabriel Wicke
a5b7ea7bcd
Add --debug and --trace options to parserTests as well.
2012-02-01 17:02:37 +00:00
Gabriel Wicke
7cd94df47d
A few minor tweaks to reduce memory usage
2012-01-27 13:32:44 +00:00
Gabriel Wicke
348cac6cf0
Fix a bug in TokenCollector, and misc tweaks for template expansions.
2012-01-20 18:47:17 +00:00
Gabriel Wicke
2233d0a488
Eventify parser tests and parse.js commandline wrapper to actually allow
async template fetching. Async expansion is not yet fully debugged, but at
least the preconditions for that are now there.
2012-01-18 23:46:01 +00:00
Gabriel Wicke
34025251a3
Clean up 'END' token handling a bit.
2012-01-17 20:01:21 +00:00
Gabriel Wicke
f4081bef08
First template expansion tests start working, and a bug fix in
DOMPostProcessor paragraph wrapper. 187 parser tests now passing.
2012-01-14 00:58:20 +00:00
Gabriel Wicke
5ec30252f1
More token transform and pipeline setup refactoring to support template
expansion better.
2012-01-10 01:09:50 +00:00
Gabriel Wicke
2e35171fd1
Fix quote handling and tweak the whitelist a bit. 'any' token registrations
are now merged with specific registrations by rank. Not yet clear if that is a
good idea overall, need to check use cases when implementing template expansion
and other functionality.
183 parser tests now passing.
2012-01-04 14:09:05 +00:00
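Merging 'any' registrations with type-specific ones by rank, as described in commit 2e35171fd1, could look roughly like the sketch below. The `Dispatcher` class and its method names are invented for illustration, and the sketch is synchronous, whereas the TokenTransformDispatcher is described elsewhere in this log as asynchronous.

```javascript
// Hypothetical sketch of rank-ordered handler lookup; names are illustrative.
function Dispatcher() {
    this.handlers = { any: [] };
}

// Register a handler for a token type (or 'any') with a numeric rank.
Dispatcher.prototype.on = function (type, rank, fn) {
    (this.handlers[type] = this.handlers[type] || []).push({ rank: rank, fn: fn });
};

// Merge 'any' handlers with the type-specific ones and order them by rank.
Dispatcher.prototype.handlersFor = function (type) {
    var specific = type === 'any' ? [] : (this.handlers[type] || []);
    return this.handlers.any.concat(specific).sort(function (a, b) {
        return a.rank - b.rank;
    });
};

// Run a token through all matching handlers, lowest rank first.
Dispatcher.prototype.transform = function (token) {
    return this.handlersFor(token.type).reduce(function (tok, h) {
        return h.fn(tok);
    }, token);
};

var d = new Dispatcher();
d.on('quote', 2, function (t) { t.log.push('quote'); return t; });
d.on('any', 1, function (t) { t.log.push('any-early'); return t; });
d.on('any', 3, function (t) { t.log.push('any-late'); return t; });
console.log(d.transform({ type: 'quote', log: [] }).log);
```

With rank-based merging, an 'any' handler can run either before or after a specific handler, which is the open design question the commit message raises.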
Gabriel Wicke
29362cc53c
Rename ParseThingy to ParserPipeline and fix up broken WikiDom generation and
commandline runner.
2012-01-04 08:39:45 +00:00
Gabriel Wicke
bd98eb4c5a
Land big TokenTransformDispatcher and eventization refactoring.
The TokenTransformDispatcher now actually implements an asynchronous, phased
token transformation framework as described in
https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations .
Additionally, the parser pipeline is now mostly held together using events.
The tokenizer still emits a lame single event with all tokens, as block-level
emission failed with scoping issues specific to the PEGJS parser generator.
All stages clean up when receiving the end tokens, so that the full pipeline
can be used for repeated parsing.
The QuoteTransformer is not yet 100% fixed to work with the new interface, and
the Cite extension is disabled for now pending adaptation. Bold-italic related
tests are failing currently.
2012-01-03 18:44:31 +00:00
Gabriel Wicke
8e00a72d0a
Improvements to link trail handling, and two tweaks to the whitelist. 182
tests now passing.
Link trails depend on language-dependent positive character classes in the PHP
parser. These classes all seem to disallow punctuation implicitly and list
differing plain text characters instead, so it might be possible to get away
with identifying a common class of non-trail punctuation instead. This would
help to keep the tokenizer independent of configurations, which is very
desirable for caching and simplified external parsing.
2011-12-30 12:47:06 +00:00
Gabriel Wicke
b3a0270d69
Remove env and load grammar in tokenizer constructor. Re-add property hack to
keep parserTests running for now. Really need a different pipeline for html
serialization or a reference to the HTML DOM.
2011-12-28 17:04:16 +00:00
Neil Kandalgaonkar
8fbf36e63e
add the terminal token inside the tokenize method (will pull it out again for streaming interface)
2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar
6103646ec8
remove need to add newline at end of input
2011-12-28 01:37:11 +00:00
Neil Kandalgaonkar
4158f82d7e
refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js
2011-12-28 01:37:06 +00:00
Neil Kandalgaonkar
aedc6751ae
made parseThingy, temp class for refactoring all thingies related to parsing
2011-12-28 01:36:58 +00:00
Neil Kandalgaonkar
5ff2b4d475
make peg src path outside of peg tokenizer
2011-12-28 01:36:50 +00:00
Neil Kandalgaonkar
962d1262fc
create tokenizer without need to modify namespace with PEG source
2011-12-28 01:36:36 +00:00
Gabriel Wicke
1c7fe0eb34
Refactor table productions to support table fragments in templates (table
start / row / table end). The old productions are not deleted yet to make it
easy to compare the output on more complex articles. 181 tests passing after
adding two table tests with whitespace-only differences to the whitelist.
2011-12-22 11:43:55 +00:00
Gabriel Wicke
574abd9774
A collection of small bug fixes to the grammar, Cite, the Token format
converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all
parserTests.
2011-12-14 23:38:46 +00:00
Gabriel Wicke
dc77d73ad5
Add ability to pass through JSON data to WikiDom in data-json-* attributes,
and fix parser to actually parse the Barack Obama article except for one table
with nested templates at the start-of-line.
2011-12-14 17:25:09 +00:00
Gabriel Wicke
a09aa4d599
Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of
parser tests using 'node parserTests.js --wikidom'.
2011-12-14 15:15:41 +00:00
Gabriel Wicke
5f80d30428
Clean up access to document and body after building the tree.
2011-12-14 09:40:49 +00:00
Gabriel Wicke
feee9ded9f
Convert the Cite extension to a token stream transformer.
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so should be
broken out to an intermediate layer in the future.
In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
Gabriel Wicke
c33f74d227
Follow-up to r106001: Fix typo spotted by Nikerabbit. Good catch!
2011-12-13 13:00:57 +00:00
Gabriel Wicke
8e55e79b67
Rename TokenTransformer to TokenTransformDispatcher.
2011-12-13 11:45:12 +00:00
Gabriel Wicke
815c63ba6c
Disabled es* inclusion for now as the serializers are not currently used, and
the recent addition of references to window is not compatible with node.js.
2011-12-13 11:17:33 +00:00
Gabriel Wicke
dc70687ed0
Update README
2011-12-13 10:03:01 +00:00
Gabriel Wicke
a8fa9433c4
Convert quote handling (italic/bold) to a core extension operating on the
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitation and extensions should be able to use the same
dispatcher by registering for specific token types.
The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke
752b0990b2
Refactor parserTests somewhat into a class-like structure, and wire up the
TokenTransformer.
2011-12-12 14:03:54 +00:00
Gabriel Wicke
d616f07a79
Don't re-build the wiki tokenizer for each test. This speeds up the full
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)
A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
Gabriel Wicke
abc2254110
A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke
92fdf99384
Further renaming, this time from pegParser to pegTokenizer.
2011-12-08 10:59:44 +00:00
Gabriel Wicke
76bc477038
Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke
1d299f6aa9
Also print out options for failing tests.
2011-12-07 11:45:05 +00:00
Gabriel Wicke
0734fb24c5
Add a few more items to the whitelist
2011-12-07 11:44:38 +00:00
Gabriel Wicke
7e1585d360
Add empty tables to the whitelist (legal in HTML5). Also add one more
functionally identical italic/bold/link permutation to the whitelist.
2011-12-06 22:05:43 +00:00
Trevor Parscal
e61e66856c
Fixed issue in transaction processor's insert method - no need for a special case for structural offsets anymore
2011-12-06 22:04:18 +00:00
Trevor Parscal
88f22ec10f
Added test which currently fails because Transaction processor is broken
2011-12-06 21:36:36 +00:00
Gabriel Wicke
1a5ffacc5c
Add slightly different but functionally identical italic/bold/link nesting to
whitelist.
2011-12-06 16:45:19 +00:00
Gabriel Wicke
a922d595cf
Really minor: Add a newline after whitelist printout.
2011-12-06 13:16:43 +00:00
Gabriel Wicke
1bd3f8321e
Minor beautification of whitelist entry print-out header.
2011-12-06 12:35:32 +00:00
Gabriel Wicke
228fccd0c1
Strip toc and edit sections from expected html for now.
2011-12-06 11:39:53 +00:00
Antoine Musso
350d1e8978
util.inspect to dump tokens
It gives better output than JSON.stringify since inspect nicely indents
the object/array dump. Makes it easier to read for humans.
2011-12-06 10:23:58 +00:00
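To illustrate the difference commit 350d1e8978 refers to, here is a small comparison. The token objects are made up for the example, and the options-object form of util.inspect shown here is the current Node.js signature, which may differ from the one available at the time of the commit.

```javascript
var util = require('util');

// Made-up tokens, only for demonstrating the two dump styles.
var tokens = [
    { type: 'TAG', name: 'a', attribs: [['href', 'Foo']] },
    { type: 'TEXT', value: 'Foo' }
];

// JSON.stringify without an indent argument emits a single line.
console.log(JSON.stringify(tokens));

// util.inspect wraps and indents nested structures when they get long;
// depth: null expands all nesting levels instead of stopping at the default.
console.log(util.inspect(tokens, { depth: null }));
```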
Gabriel Wicke
33e19f7275
Recognize block-level elements independent of case; Ignore toc and section
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Trevor Parscal
07af0cab63
* Moved getContent and getText from leaf nodes to document model nodes
* Renamed getContent to getContentData
* Renamed getText to getContentText
* Added getElementData
2011-12-05 19:41:04 +00:00
Gabriel Wicke
a6867d76c5
Ignore missing redlink for now, we are concerned with the parser and not a
complete wiki at this stage.
2011-12-05 17:07:06 +00:00
Gabriel Wicke
1760210d13
Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
by improper caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Antoine Musso
7ead617a2e
--cache to save the test cases parsing
This is optional but speeds up launch time when other files are not
modified.
2011-12-01 17:51:07 +00:00
Antoine Musso
c21a81ee45
warn on invalid regex passed to --filter
2011-12-01 15:45:40 +00:00
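One way the --filter handling from commits 3038df313f and c21a81ee45 might work is sketched below. The function names, the warning text, and the `title` property on test objects are assumptions; only the RegExp constructor throwing on an invalid pattern is standard behaviour.

```javascript
// Hypothetical helper: compile the --filter argument, warning instead of
// crashing when the pattern is not a valid regular expression.
function compileFilter(pattern) {
    try {
        return new RegExp(pattern);
    } catch (e) {
        console.warn('Invalid --filter regexp "' + pattern + '": ' + e.message);
        return null;
    }
}

// Keep only the tests whose title matches; a null filter keeps everything.
function filterTests(tests, filter) {
    if (!filter) {
        return tests;
    }
    return tests.filter(function (t) { return filter.test(t.title); });
}

var tests = [{ title: 'Table with attributes' }, { title: 'Simple list' }];
console.log(filterTests(tests, compileFilter('^Table')).length);
console.log(filterTests(tests, compileFilter('(')).length);
```

Falling back to the unfiltered list on a bad pattern is a design choice made for this sketch; a runner could just as reasonably exit after printing the warning.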
Gabriel Wicke
63c728924b
Use pegjs from npm
2011-12-01 15:23:23 +00:00
Gabriel Wicke
d00743ad79
Improve external links and definition lists, now 133 tests passing ;)
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Antoine Musso
cb682c5ade
option to disable color output (use --no-color )
2011-12-01 12:30:15 +00:00
Gabriel Wicke
5d50c6bbf3
Follow-up to r104845: s/args/argv
2011-12-01 12:10:43 +00:00
Gabriel Wicke
edf40c616c
Make whitelist usage an option; tweak comment a bit
2011-12-01 11:47:22 +00:00
Gabriel Wicke
5f72acec8f
Add option to disable whitelist
2011-12-01 11:08:05 +00:00
Gabriel Wicke
35efed6634
Add a parser test whitelist for manually-checked tests, and an option to print
JSON-serialized parser output for failing tests, which can then be added to
the whitelist if appropriate.
2011-12-01 10:58:12 +00:00
Gabriel Wicke
e7f182d786
Strip the patch header lines, don't really need those
2011-11-30 18:21:53 +00:00
Antoine Musso
2b6d1896cb
colorize numbers in test summary
Also added Brion's ALL TEST PASSED when it makes sense
2011-11-30 17:43:54 +00:00
Antoine Musso
ed74636ab5
--quick Suppress diff output of failed tests
A long block of code was not reindented to make this patch easier
to read for people not ignoring whitespace changes :D
2011-11-30 17:18:24 +00:00
Antoine Musso
ebfc6f08fd
--quiet suppress notification of passed tests
--no-quiet will make sure you always see PASSING tests :)
2011-11-30 17:10:07 +00:00
Antoine Musso
3038df313f
allow test filtering using a regexp on test title
use --filter :)
2011-11-30 17:03:29 +00:00