This is for the case where we have a zero-length range in between two
siblings, and we need to know what index that corresponds to in order to
be able to insert nodes there (rebuildNodes() will use it for this
purpose)
Change-Id: I357d1cd665667a76f955a10b8d9d2810976cdbd7
Selecting a zero-length range at the start or end of a text node
(e.g. (1,1)) would return the text node's parent instead of the text
node
Change-Id: I7fe089bf66b93185dd3415eff53aa7e04e3ffdb2
This makes it possible to transclude list items from a template.
Note: "5 quotes" test is broken by this patch, it appears that ListHandler
newline processing is changing some state which mysteriously affects the
QuoteTransformer. This is ominous, hopefully there's a simple explanation...
gwicke: fix a bug in tokenizer triggered by definition lists like this:
**; foo : bar
Change-Id: I4e3a86596fe9bffcbfc4bf22895362c3bf742bad
This gets rid of the length == outerLength-2 hack in getDataFromNode()
and will make it easier to implement similar logic in selectNodes()
Change-Id: I1294350b67ca3eefde2b7fe9fea0bc6d8b90f772
* Moved implementation of getting and updating a DOM wrapper to ve.ce.BranchNode
* Updated ve.ce.BranchNode tests
* Renamed ve.NodeFactory.createNode to ve.NodeFactory.create
* Added ve.NodeFactory.lookup which gets the constructor of a type
* Added attribute pass-through to ve.dm.BranchNodeStub
Change-Id: I8f5b7d3d3ae616cc5f39828b24b655163d782ae5
* Makes it simpler in the linear model because we don't have to use style: "item" for regular list items and style: "definition" for definition lists
* Enforces correct nesting through existing node rules systems
* Updates tests accordingly
Change-Id: I64d80af938e325f1961226505bdc386bb35ccdda
* Also fixed calls to addListenerMethod
* Also routed adding children in the constructor of ve.dm.BranchNode to the splice method
* Renamed types of ve.dm stub nodes to avoid collisions (since we have to register ce nodes by the same names for them to be generated by onSplice)
Change-Id: Ia2e75cf0a62186cc0e214683feb25c619590318a
* Fixed constructor of ve.ce.BranchNode which was calling the wrong method to perform an onSplice and with the wrong arguments
* Removed/renamed events emitted from ve.ce.BranchNode.onSplice
* Reintroduced .$ to all ce nodes
* Ported over functionality for DOM node type variance used in headings, lists and list items
* Moved the old ve.ce.Content guts to ve.ce.TextNode
* Added getOffsetFromNode and getDataFromNode to ve.dm.DocumentFragment
* Added setDocument and getDocument to dm nodes
Change-Id: I185423ba2f1a858dde562cb2f5bc3852aec930db
* Add .process() and its helpers from ve1
* Fix applyAnnotations()
** Use data[i] rather than data[j]
** Don't add empty annotation objects for no reason
** Remove empty annotation objects
*** I thought this wasn't needed, but it is needed for clean rollbacks
* Remove unused second parameter in applyAnnotations() call
Change-Id: Ia338f62d2eaf2a76f8ef653eead05bc44757a122
* Also removed beforeSplice and afterSplice in favor of just plain splice which is the same as afterSplice used to be - beforeSplice was never used and it was making things more complex looking than needed
Change-Id: Icbbc57eac73a2a206ba35409ab57b3d1a49ab1a5
* Added support for asking if a given node type can have children or grandchildren and what types of nodes can be it's parent or child
* Removed canHaveChildren methods from leaf and branch nodes and converted use of them to depend on factory to read static rules from constructor lookup by type
Change-Id: I9769f95647066576416bacb791c4b68dd0285b35
These are needed to make sure the base classes behave correctly, the ones in sub classes are needed to make sure the prototypes of the base class are correctly inherited
Change-Id: I334cc3ce1c4c0ce2eed23c79e8877332a953f7c3
Neither of these nodes have elements around them, so they must override the default behavior of ve.dm.Node
Change-Id: I19c02c210bfc04b6e2ee1a37b8890e84a236eee3
* Moved node tree assertion to ve.dm.example
* Added rebuildNodes test
* Fixed some typos in rebuildNodes
Change-Id: I4853ded4b062aaa3758435093368bc23667ca3bf
Image nodes are leafs, so providing an empty array to their children/length "contents" constructor argument ends up setting the numeric length to [], which casts to an empty string in arithmetic, causing all further calculations to be concatenations instead of additions.
Change-Id: I40e1ea2295f6095318bc4c24185cadfdfb684557
* Indexes for attribute references were off (the comments that were previously wrong before were relied on and cause this to also be wrong)
* Tree should start with a document node
Change-Id: Ia8e4faa4bcb25db797ff97f6e798ba253adfe535
Within ve.dm.DocumentFragment it makes more sense to call the root node (which is always a document node) a document node, especially since there may be a different node used as a root.
This commit also adds test for getDocumentNode and getNodeFromOffset which uses the offset map.
Change-Id: Ic4609233cedc41f7e5a5f8fdb0e6178652c95554
And fixed ve.dm.DocumentFragment constructor to generate a correct offset map which creates references to branch nodes only
Change-Id: If9e515be0c63d272bfed9bf4da625a48edd36f48
* [[:en:Barack Obama]] can now be expanded in 77 seconds using 330MB RAM,
while it would prevously run out of RAM after ~30 minutes. Wohoooo!
The token transform framework rework really paid off.
* 303 parser tests are passing in the new record time of 5.5 seconds. Two more
tests are passing since these tests expect the day of the week to be
Thursday. Won't be the case tomorrow.
Change-Id: I56e850838476b546df10c6a239c8c9e29a1a3136
* All parser pipelines including tokenizer and DOM stuff are now constructed
from a 'recipe' data structure in a ParserPipelineFactory.
* All sub-pipelines of these can now be cached
* Event registrations to a pipeline are directly forwarded to the last
pipeline member to save relatively expensive event forwarding.
* Some APIs for on-demand expansion / format conversion of parameters from
parser functions are added:
param.to('tokens/expanded', cb)
param.to('text/wiki', cb) (this does not work yet)
All parameters are additionally wrapped into a Param object that provides
method for positional parameter naming (.named() or conversion to a dict
(.dict()).
* The async token transform manager is now separated from a frame object, with
the frame holding arguments, an on-demand expansion method and loop checks.
* Only keys of template parameters are now expanded. Parser functions or
template arguments trigger an expansion on-demand. This (unsurprisingly)
makes a big performance difference with typical switch-heavy template
systems.
* Return values from async transforms are no longer used in favor of plain
callbacks. This saves the complication of having to maintain two code paths.
A trick in transformTokens still avoids the construction of unneeded
TokenAccumulators.
* The results of template expansions are no longer buffered.
* 301 parser tests are passing
Known issues:
* Cosmetic cleanup remains to do
* Some parser functions do not support async expansions yet, and need to be
modified.
Change-Id: I1a7690baffbe8141cadf67270904a1b2e1df879a
* Changed splice to check all elements about to be inserted are allowed before inserting any of them so that catching an exception leaves you in a sane state
* Fixed the order of execution of parent class constructors in ve.dm.LeafNode and ve.dm.TwigNode so that canHaveChildren and canHaveGrandchildren produce correct values and added tests to ensure these methods are correctly inherited in subclasses
* Added tests that check for exceptions when adding nodes that can have children to nodes that can not have grandchildren
* Added test that check for events being emitted before and after splicing, including that beforeSplice should be emitted even in cases where a splice fails and throws an exception because the nodes are incompatible (but afterSplice is not called in this case) since beforeSplice might modify the nodes in some way before the compatibility tests are run
Change-Id: Id12aea995a42c26ff63a74ae3d31f2bf455759e3
* Moved getParent and getRoot from ve.dm.Node back to ve.Node
* Fixed use of getElementLength that should have been changed to getOuterLength, but was changed to getLength (oops)
Change-Id: Ibe5b855aef533dcd493f762a8a02c6a11ce6e7de
In this commit several methods (child node add/remove and parent/root modification) were also moved to ve.dm.BranchNode ve.dm.Node respectively. ve.Node and ve.BranchNode are immutable. ve.dm.Node and ve.dm.BranchNode are mutable. Other subclasses of ve.Node and ve.BranchNode should implement functionality to mimic changes made to a data model.
Change-Id: Ia9ff78764f8f50f99fc8f9f9593657c0a0bf287e
* Ignore safesubst for now
* Remove an unneeded whitelist entry
* Make sure the caption is not lost for thumbs (fix to last commit) and remove
debug print
Change-Id: I243584ed0838cf7c3b4110fe9cdf869272477312
The HTML5 parser we are using to normalize expected HTML output in parserTests
reverses the order of attributes (see
https://github.com/aredridel/html5/pull/53 for the fix). Remove whitelist
entries concerned with this and use the proper order in external image
attributes.
Change-Id: If1868cae05396a150757c85a20473ab756cbcd97
This has some TODOs still but I want to land it now anyway, and fix the
TODOs later.
* Add this.offsetMap which maps each linear model offset to a model tree node
* Refactor createNodesFromData()
** Rename it to buildSubtreeFromData()
** Have it build an offset map as well as a node subtree
** Have it set the root on the fake root node so that when the subtree
is attached to the main tree later, we don't get a rippling root
update all the way down
** Normalize the way the loop processes content, that way adding offsets
for content is easier
* Add rebuildNodes() which uses buildSubtreeFromData() to rebuild stuff
* Use rebuildNodes() in DocumentSynchronizer
* Use pushRebuild() in TransactionProcessor
* Optimize setRoot() for the case where the root is already set correctly
Change-Id: I8b827d0823c969e671615ddd06e5f1bd70e9d54c
Explained in the README how to use npm to load the dependencies and run tests. Too bad about NODE_PATH...
Don't try to find parserTests.txt in assorted places--if it isn't present, fetch from gerrit. You can symlink from core if you're developing on both parsers, and the fetch script will not overwrite.
Use __dirname in parserTests.js to allow the script to run independent of current working directory.
Change-Id: I4c8b884e91f4fdeae385c7697aff768bdd199dd5
Instead of a proliferation of data-mw-* attributes, it should be easier to
stash all private / non-semantic round-trip information in a JSON object
stored in data-mw.
Change-Id: Id200a6a8789fa152f29ea530e5a24b6ee7b4b285
To handle replace operations that are not themselves consistent (these
are common, for instance when replacing an opening element in one place,
then replacing the closing element somewhere else), we process
subsequent replace operations inside the first one until things are
balanced again, then issue a single rebuild for the whole thing.
Change-Id: Ide4613f046fabfeeef383138c39e350b1b710033
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation
Removed a few whitelisted test cases that are now passing directly.
The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing however has the advantage of providing further
transformations uniform data to work with. We could even consider to move this
even further into the tokenizer.
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.
Note that the compiled .js file (generated by "make"/"make test")
is still under version control so folks can work on the project
even without a running "coffee" command in PATH.
Also updated README to mention coffee-script and "make test".
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.
A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
are now merged with specific registrations by rank. Not yet clear if that is a
good idea overall, need to check use cases when implementing template expansion
and other functionality.
183 parser test now passing.
The TokenTransformDispatcher now actually implements an asynchronous, phased
token transformation framework as described in
https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations.
Additionally, the parser pipeline is now mostly held together using events.
The tokenizer still emits a lame single events with all tokens, as block-level
emission failed with scoping issues specific to the PEGJS parser generator.
All stages clean up when receiving the end tokens, so that the full pipeline
can be used for repeated parsing.
The QuoteTransformer is not yet 100% fixed to work with the new interface, and
the Cite extension is disabled for now pending adaptation. Bold-italic related
tests are failing currently.
tests now passing.
Link trails depend on language-dependent positive character classes in the PHP
parser. These classes all seem to disallow punctuation implicitly and list
differing plain text characters instead, so it might be possible to get away
with identifying a common class of non-trail punctuation instead. This would
help to keep the tokenizer independent of configurations, which is very
desirable for caching and simplified external parsing.
start / row / table end). The old productions are not deleted yet to make it
easy to compare the output on more complex articles. 181 tests passing after
adding two table tests with whitespace-only differences to the whitelist.
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so should be
broken out to an intermediate layer in the future.
In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitation and extensions should be able to use the same
dispatcher by registering for specific token types.
The parser performance is very slightly improved as the token stream is only
traversed once.
Handle arguments and options properly by using the 'optimist' node module.
Please note wordwrapping in usage does not seem to work on my setup :(
Only --help implemented yet.
Example:
$ node parserTests.js --help
Starting up JS parser tests
Usage: node ./parserTests.js
Options:
--filter, --regex Only run tests whose descriptions which match given regex (option not implemented)
--help, -h Show this help message
--disabled Run disabled tests (default false) (option not implemented) [boolean]
html markup handling.
* Remove global 'use strict' declarations from html5 parser.
* Add trailing whitespace handling in dt
Overall, 55 parser tests are now passing.
HTML is parsed using a HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
* Added a bunch of utility functions for working with character data and annotations
* Got toolbar button states to follow selection of more than one character
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.