Commit graph

194 commits

Author SHA1 Message Date
au 0360e62da7 * Locally apply the HTML5.Marker.type patch.
This is needed until https://github.com/aredridel/html5/issues/44
  is merged into the upstream "html5" module.
2012-02-18 17:28:35 +00:00
Gabriel Wicke 025f9cddb3 Prefix all internal data- attributes with data-mw- and adjust the whitelist
and test output normalization accordingly. 235 tests passing.
2012-02-13 13:54:07 +00:00
Gabriel Wicke a122e51eec Move data-* annotations into separate object on tokens, that is then
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.

A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
2012-02-11 16:43:25 +00:00
Gabriel Wicke 1f6db903e9 Pluck a few low-hanging fruit in external link tokenization, and add a simple
localurl parser function implementation. 230 parser tests now passing.
2012-02-07 10:28:23 +00:00
Gabriel Wicke d321d96bab Fix parserTests summary with filtering enabled 2012-02-07 09:27:47 +00:00
Trevor Parscal 5d71c888f9 Updated unit tests in response to structural changes in r110805 2012-02-07 00:12:31 +00:00
Gabriel Wicke a5b7ea7bcd Add --debug and --trace options to parserTests as well. 2012-02-01 17:02:37 +00:00
Gabriel Wicke 7cd94df47d A few minor tweaks to reduce memory usage 2012-01-27 13:32:44 +00:00
Gabriel Wicke 348cac6cf0 Fix a bug in TokenCollector, and misc tweaks for template expansions. 2012-01-20 18:47:17 +00:00
Gabriel Wicke 2233d0a488 Eventify parser tests and parse.js commandline wrapper to actuallly allow
async template fetching. Async expansion is not yet fully debugged, but at
least the preconditions for that are now there.
2012-01-18 23:46:01 +00:00
Gabriel Wicke 34025251a3 Clean up 'END' token handling a bit. 2012-01-17 20:01:21 +00:00
Gabriel Wicke f4081bef08 First template expansion tests start working, and a bug fix in
DOMPostProcessor paragraph wrapper. 187 parser tests now passing.
2012-01-14 00:58:20 +00:00
Gabriel Wicke 5ec30252f1 More token transform and pipeline setup refactoring to support template
expansion better.
2012-01-10 01:09:50 +00:00
Gabriel Wicke 2e35171fd1 Fix quote handling and tweak the whitelist a bit. 'any' token registrations
are now merged with specific registrations by rank. Not yet clear if that is a
good idea overall, need to check use cases when implementing template expansion
and other functionality.

183 parser test now passing.
2012-01-04 14:09:05 +00:00
Gabriel Wicke 29362cc53c Rename ParseThingy to ParserPipeline and fix up broken WikiDom generation and
commandline runner.
2012-01-04 08:39:45 +00:00
Gabriel Wicke bd98eb4c5a Land big TokenTransformDispatcher and eventization refactoring.
The TokenTransformDispatcher now actually implements an asynchronous, phased
token transformation framework as described in
https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations.

Additionally, the parser pipeline is now mostly held together using events.
The tokenizer still emits a lame single events with all tokens, as block-level
emission failed with scoping issues specific to the PEGJS parser generator.
All stages clean up when receiving the end tokens, so that the full pipeline
can be used for repeated parsing.

The QuoteTransformer is not yet 100% fixed to work with the new interface, and
the Cite extension is disabled for now pending adaptation. Bold-italic related
tests are failing currently.
2012-01-03 18:44:31 +00:00
Gabriel Wicke 8e00a72d0a Improvements to link trail handling, and two tweaks to the whitelist. 182
tests now passing. 

Link trails depend on language-dependent positive character classes in the PHP
parser. These classes all seem to disallow punctuation implicitly and list
differing plain text characters instead, so it might be possible to get away
with identifying a common class of non-trail punctuation instead. This would
help to keep the tokenizer independent of configurations, which is very
desirable for caching and simplified external parsing.
2011-12-30 12:47:06 +00:00
Gabriel Wicke b3a0270d69 Remove env and load grammar in tokenizer constructor. Re-add property hack to
keep parserTests running for now. Really need a different pipeline for html
serialization or a reference to the HTML DOM.
2011-12-28 17:04:16 +00:00
Neil Kandalgaonkar 8fbf36e63e put add terminal token inside tokenize method (will pull it out again for streaming interface) 2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar 6103646ec8 remove need to add newline at end of input 2011-12-28 01:37:11 +00:00
Neil Kandalgaonkar 4158f82d7e refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js 2011-12-28 01:37:06 +00:00
Neil Kandalgaonkar aedc6751ae made parseThingy, temp class for refactoring all thingies related to parsing 2011-12-28 01:36:58 +00:00
Neil Kandalgaonkar 5ff2b4d475 make peg src path outside of peg tokenizer 2011-12-28 01:36:50 +00:00
Neil Kandalgaonkar 962d1262fc create tokenizer without need to modify namespace with PEG source 2011-12-28 01:36:36 +00:00
Gabriel Wicke 1c7fe0eb34 Refactor table productions to support table fragments in templates (table
start / row / table end). The old productions are not deleted yet to make it
easy to compare the output on more complex articles. 181 tests passing after
adding two table tests with whitespace-only differences to the whitelist.
2011-12-22 11:43:55 +00:00
Gabriel Wicke 574abd9774 A collection of small bug fixes to the grammar, Cite, the Token format
converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all
parserTests.
2011-12-14 23:38:46 +00:00
Gabriel Wicke dc77d73ad5 Add ability to pass through JSON data to WikiDom in data-json-* attributes,
and fix parser to actually parse the Barack Obama article except for one table
with nested templates at the start-of-line.
2011-12-14 17:25:09 +00:00
Gabriel Wicke a09aa4d599 Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of
parser tests using 'node parserTests.js --wikidom'.
2011-12-14 15:15:41 +00:00
Gabriel Wicke 5f80d30428 Clean up access to document and body after building the tree. 2011-12-14 09:40:49 +00:00
Gabriel Wicke feee9ded9f Convert the Cite extension to a token stream transformer.
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so should be
broken out to an intermediate layer in the future.

In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
Gabriel Wicke c33f74d227 Follow-up to r106001: Fix typo spotted by Nikerabbit. Good catch! 2011-12-13 13:00:57 +00:00
Gabriel Wicke 8e55e79b67 Rename TokenTransformer to TokenTransformDispatcher. 2011-12-13 11:45:12 +00:00
Gabriel Wicke 815c63ba6c Disabled es* inclusion for now as the serializers are not currently used, and
the recent addition of references to window are not compatible with node.js.
2011-12-13 11:17:33 +00:00
Gabriel Wicke dc70687ed0 Update README 2011-12-13 10:03:01 +00:00
Gabriel Wicke a8fa9433c4 Convert quote handling (italic/bold) to a core extension operating on the
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitation and extensions should be able to use the same
dispatcher by registering for specific token types.

The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke 752b0990b2 Refactor parserTests somewhat into a class-like structure, and wire up the
TokenTransformer.
2011-12-12 14:03:54 +00:00
Gabriel Wicke d616f07a79 Don't re-build the wiki tokenizer for each test. This speeds up the full
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)

A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
Gabriel Wicke abc2254110 A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke 92fdf99384 Further renaming, this time from pegParser to pegTokenizer. 2011-12-08 10:59:44 +00:00
Gabriel Wicke 76bc477038 Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke 1d299f6aa9 Also print out options for failing tests. 2011-12-07 11:45:05 +00:00
Gabriel Wicke 0734fb24c5 Add a few more items to the whitelist 2011-12-07 11:44:38 +00:00
Gabriel Wicke 7e1585d360 Add empty tables to the whitelist (legal in HTML5). Also add one more
functionally identical italic/bold/link permmutation on the whitelist.
2011-12-06 22:05:43 +00:00
Trevor Parscal e61e66856c Fixed issue in transaction processor's insert method - no need for a special case for structural offsets anymore 2011-12-06 22:04:18 +00:00
Trevor Parscal 88f22ec10f Added test which currently fails because Transaction processor is broken 2011-12-06 21:36:36 +00:00
Gabriel Wicke 1a5ffacc5c Add slightly different but functionally identical italic/bold/link nesting to
whitelist.
2011-12-06 16:45:19 +00:00
Gabriel Wicke a922d595cf Really minor: Add a newline after whitelist printout. 2011-12-06 13:16:43 +00:00
Gabriel Wicke 1bd3f8321e Minor beautification of whitelist entry print-out header. 2011-12-06 12:35:32 +00:00
Gabriel Wicke 228fccd0c1 Strip toc and edit sections from expected html for now. 2011-12-06 11:39:53 +00:00
Antoine Musso 350d1e8978 util.inspect to dump tokens
It gets a better output over JSON.stringify since inspect nicely indent
the object/array dump. Makes it easier to read for humans.
2011-12-06 10:23:58 +00:00
Gabriel Wicke 33e19f7275 Recognize block-level elements independent of case; Ignore toc and section
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Trevor Parscal 07af0cab63 * Moved getContent and getText from leaf nodes to document model nodes
* Renamed getContent to getContentData
* Renamed getText to getContentText
* Added getElementData
2011-12-05 19:41:04 +00:00
Gabriel Wicke a6867d76c5 Ignore missing redlink for now, we are concerned with the parser and not a
complete wiki at this stage.
2011-12-05 17:07:06 +00:00
Gabriel Wicke 1760210d13 Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
by improperly caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Antoine Musso 7ead617a2e --cache to save the test cases parsing
This is optional but speed up launchtime when other files are not
modified.
2011-12-01 17:51:07 +00:00
Antoine Musso c21a81ee45 warn on invalid regex passed to --filter 2011-12-01 15:45:40 +00:00
Gabriel Wicke 63c728924b Use pegjs from npm 2011-12-01 15:23:23 +00:00
Gabriel Wicke d00743ad79 Improve external links and definition lists, now 133 tests passing ;)
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Antoine Musso cb682c5ade option to disable color output (use --no-color ) 2011-12-01 12:30:15 +00:00
Gabriel Wicke 5d50c6bbf3 Follow-up to r104845: s/args/argv 2011-12-01 12:10:43 +00:00
Gabriel Wicke edf40c616c Make whitelist usage an option; tweak comment a bit 2011-12-01 11:47:22 +00:00
Gabriel Wicke 5f72acec8f Add option to disable whitelist 2011-12-01 11:08:05 +00:00
Gabriel Wicke 35efed6634 Add a parser test whitelist for manually-checked tests, and an option to print
JSON-serialized parser output for failing tests, which can then be added to
the whitelist if appropriate.
2011-12-01 10:58:12 +00:00
Gabriel Wicke e7f182d786 Strip the patch header lines, don't really need those 2011-11-30 18:21:53 +00:00
Antoine Musso 2b6d1896cb colorize numbers in test summary
Also added Brion's ALL TEST PASSED when it makes sense
2011-11-30 17:43:54 +00:00
Antoine Musso ed74636ab5 --quick Suppress diff output of failed tests
A long block of code was not reindented to make this patch easier
to read for people not ignoring white spaces changes :D
2011-11-30 17:18:24 +00:00
Antoine Musso ebfc6f08fd --quiet suppress notification of passed tests
--no-quiet will make sure you always see PASSING tests :)
2011-11-30 17:10:07 +00:00
Antoine Musso 3038df313f allow test filtering using a regexp on title test
use --filter :)
2011-11-30 17:03:29 +00:00
Antoine Musso 513b2e85b7 proper argument handling (requires 'optimist' module)
Handle arguments and options properly by using the 'optimist' node module.
Please note wordwrapping in usage does not seem to work on my setup :(

Only --help implemented yet.

Example:

$ node parserTests.js --help
Starting up JS parser tests
Usage: node ./parserTests.js

Options:
  --filter, --regex  Only run tests whose descriptions which match given regex (option not implemented)
  --help, -h         Show this help message                                                            
  --disabled         Run disabled tests (default false) (option not implemented)                         [boolean]
2011-11-30 16:33:26 +00:00
Antoine Musso 567ef896e7 add some console messages 2011-11-30 15:33:56 +00:00
Antoine Musso 302e1519b3 add colors to visual editor parser testing
TODO: add an option to switch color scheme for light/dark backgrounds
2011-11-30 15:20:46 +00:00
Gabriel Wicke f758894de7 Let another test pass by swapping the default order of italic/bold for '''''.
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke 2bb512a4de A bit of tokenizer grammar clean-up and additional expected-html
normalization. 99 parser tests now passing.
2011-11-30 13:40:17 +00:00
Gabriel Wicke ae0b5f9af4 * Split paragraph handling between tokenizer and DOM postprocessor for better
html markup handling. 
* Remove global 'use strict' declarations from html5 parser. 
* Add trailing whitespace handling in dt

Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke d7537d9777 Improve comment and general data-* attribute normalization. 2011-11-28 16:55:50 +00:00
Gabriel Wicke 1c91daa7be Provide a summary of failures. 2011-11-28 14:53:07 +00:00
Gabriel Wicke a875597530 Keep going of the HTML parser fails to normalize the expected test outcome.
Minor code simplification, and recognition of tr, td and tbody as block-level
elements.
2011-11-28 14:00:14 +00:00
Antoine Musso 9e887cc34c Adds a path fallback to find test file.
I do not fetch mediawiki in ../../../../phase3 . This patch use another
path as a fallback.
2011-11-28 11:41:47 +00:00
Antoine Musso 901b0a8911 list some more needed node module 2011-11-28 11:40:14 +00:00
Gabriel Wicke 901a089358 Shorten diff output and display comments before each failing test. 2011-11-28 11:38:48 +00:00
Gabriel Wicke 5c2a145bdf Add diff output as well. 2011-11-28 11:19:50 +00:00
Gabriel Wicke d3f0196df7 Add primitive HTML comparison to detect passing parser tests. The expected
HTML is parsed using a HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
2011-11-28 11:10:39 +00:00
Gabriel Wicke dd5cd59ac6 Better HTML, pre and blocklevel handling. Hackish source formatting for easier
comparison with parserTest results.
2011-11-25 12:47:03 +00:00
Roan Kattouw 5ac817a6f4 Fix bugs in prepareContentAnnotation() related to structural offsets, and add a test. Also add parenthesis to the if statement mixing || and &&, for clarity 2011-11-24 16:27:40 +00:00
Roan Kattouw 6f3c407314 Introduce es.DocumentNode.getCommonAncestorPaths(), with tests 2011-11-24 15:34:12 +00:00
Gabriel Wicke dee262658f Add MediaWiki-compatible quote handling including quirks and overlapped
structures like ''[[Link|Link text'']]. This is another transform on the token
stream.
2011-11-24 13:56:30 +00:00
Trevor Parscal 3ed6544fe2 Added test that exposes bugs in prepareContentAnnotation 2011-11-23 23:24:05 +00:00
Trevor Parscal 8ed6ee5e3c Fixed a test which was poorly named and had incorrect data 2011-11-23 19:37:57 +00:00
Trevor Parscal 20da830a26 Rewrite of undo/redo - now completely implemented in es.SurfaceModel 2011-11-22 22:59:05 +00:00
Gabriel Wicke 8def550629 Fix parserTests path for full svn checkout 2011-11-22 12:32:34 +00:00
Trevor Parscal 631323b9bd * Refactored es.HistoryModel to always be working from a single array rather than a buffer and an array
* Added support for associating a selection with a state
2011-11-21 23:51:37 +00:00
Trevor Parscal 779a63f486 * Switched to using JSON for hashing, allowing us to use the native JSON.stringify where available, which is much faster
* Added a bunch of utility functions for working with character data and annotations
* Got toolbar button states to follow selection of more than one character
2011-11-21 22:32:22 +00:00
Gabriel Wicke d1b0293569 Fix comment token conversion and serialization 2011-11-21 09:22:30 +00:00
Gabriel Wicke b750ce38b8 Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer.
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.
2011-11-18 13:57:07 +00:00
Roan Kattouw 35a99b4be0 Make es.TransactionProcessor.remove() handle deep merges correctly, and add test cases. The code is still a bit rough and ugly and needs a bit more work, but I'll clean that up later; at least it works now. 2011-11-18 10:17:35 +00:00
Trevor Parscal 48e7f4c3c6 Initial checkin of new es.HistoryModel (needs tests) 2011-11-17 22:44:11 +00:00
Trevor Parscal 6fded56cec Renamed es.Transaction to es.TransactionModel 2011-11-17 22:42:18 +00:00
Roan Kattouw 117c785d85 Improve the merging logic in prepareRemoval() to also allow merging nested nodes, e.g. by deleting </p></li><li><p> 2011-11-17 19:23:15 +00:00
Gabriel Wicke ea87e7aaee Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of
tokens, which for now is still completely built before parsing can proceed.
For each top-level block, the source start/end positions are added as
attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.
2011-11-17 15:26:02 +00:00
Roan Kattouw be994da373 Make selectNodes() also descend (recurse) into child nodes when only the start or only the end is in the middle of a child node. Without this, it was stuff like ranges with only openings and no closings. 2011-11-17 15:01:47 +00:00