* Fixed constructor of ve.ce.BranchNode which was calling the wrong method to perform an onSplice and with the wrong arguments
* Removed/renamed events emitted from ve.ce.BranchNode.onSplice
* Reintroduced .$ to all ce nodes
* Ported over functionality for DOM node type variance used in headings, lists and list items
* Moved the old ve.ce.Content guts to ve.ce.TextNode
* Added getOffsetFromNode and getDataFromNode to ve.dm.DocumentFragment
* Added setDocument and getDocument to dm nodes
Change-Id: I185423ba2f1a858dde562cb2f5bc3852aec930db
* Add .process() and its helpers from ve1
* Fix applyAnnotations()
** Use data[i] rather than data[j]
** Don't add empty annotation objects for no reason
** Remove empty annotation objects
*** I thought this wasn't needed, but it is needed for clean rollbacks
* Remove unused second parameter in applyAnnotations() call
Change-Id: Ia338f62d2eaf2a76f8ef653eead05bc44757a122
* Also removed beforeSplice and afterSplice in favor of just plain splice which is the same as afterSplice used to be - beforeSplice was never used and it was making things more complex looking than needed
Change-Id: Icbbc57eac73a2a206ba35409ab57b3d1a49ab1a5
They are used by ve.dm.factory so this might make it easier for people to understand what's going on.
Change-Id: I490627e3bfc55ca9c96fdc4f5d047737b6a3db8c
* Added support for asking if a given node type can have children or grandchildren and what types of nodes can be it's parent or child
* Removed canHaveChildren methods from leaf and branch nodes and converted use of them to depend on factory to read static rules from constructor lookup by type
Change-Id: I9769f95647066576416bacb791c4b68dd0285b35
* Moved node tree assertion to ve.dm.example
* Added rebuildNodes test
* Fixed some typos in rebuildNodes
Change-Id: I4853ded4b062aaa3758435093368bc23667ca3bf
Image nodes are leafs, so providing an empty array to their children/length "contents" constructor argument ends up setting the numeric length to [], which casts to an empty string in arithmetic, causing all further calculations to be concatenations instead of additions.
Change-Id: I40e1ea2295f6095318bc4c24185cadfdfb684557
Within ve.dm.DocumentFragment it makes more sense to call the root node (which is always a document node) a document node, especially since there may be a different node used as a root.
This commit also adds test for getDocumentNode and getNodeFromOffset which uses the offset map.
Change-Id: Ic4609233cedc41f7e5a5f8fdb0e6178652c95554
And fixed ve.dm.DocumentFragment constructor to generate a correct offset map which creates references to branch nodes only
Change-Id: If9e515be0c63d272bfed9bf4da625a48edd36f48
* parentCB (if set) is called with { async: true } if expansion is going to be
asynchronous.
* Strings are handled efficiently
* all value parameter chunks can now be converted using .to().
Change-Id: Ib013e1bc3d8e7f692009038209db6a056887326e
Now setting up multiple toolbars per config
Tools & Modes are now configurable per toolbar per instance
Base elements are created on demand and no longer id specific
Note: There are some bugs with multiple instances.
Change-Id: Id0bbbca2d1b76fd2db3f3b0f9abd90194930b610
* [[:en:Barack Obama]] can now be expanded in 77 seconds using 330MB RAM,
while it would prevously run out of RAM after ~30 minutes. Wohoooo!
The token transform framework rework really paid off.
* 303 parser tests are passing in the new record time of 5.5 seconds. Two more
tests are passing since these tests expect the day of the week to be
Thursday. Won't be the case tomorrow.
Change-Id: I56e850838476b546df10c6a239c8c9e29a1a3136
* All parser pipelines including tokenizer and DOM stuff are now constructed
from a 'recipe' data structure in a ParserPipelineFactory.
* All sub-pipelines of these can now be cached
* Event registrations to a pipeline are directly forwarded to the last
pipeline member to save relatively expensive event forwarding.
* Some APIs for on-demand expansion / format conversion of parameters from
parser functions are added:
param.to('tokens/expanded', cb)
param.to('text/wiki', cb) (this does not work yet)
All parameters are additionally wrapped into a Param object that provides
method for positional parameter naming (.named() or conversion to a dict
(.dict()).
* The async token transform manager is now separated from a frame object, with
the frame holding arguments, an on-demand expansion method and loop checks.
* Only keys of template parameters are now expanded. Parser functions or
template arguments trigger an expansion on-demand. This (unsurprisingly)
makes a big performance difference with typical switch-heavy template
systems.
* Return values from async transforms are no longer used in favor of plain
callbacks. This saves the complication of having to maintain two code paths.
A trick in transformTokens still avoids the construction of unneeded
TokenAccumulators.
* The results of template expansions are no longer buffered.
* 301 parser tests are passing
Known issues:
* Cosmetic cleanup remains to do
* Some parser functions do not support async expansions yet, and need to be
modified.
Change-Id: I1a7690baffbe8141cadf67270904a1b2e1df879a
Renamed Selection method to more suitable name.
Misc cleanup
Patchset 2, whitespace cleanup
Patchset 3: Change values used with selection direction to -1 or 1
1 for left to right (normal)
-1 for right to left (opposite)
Change-Id: If9ecc721ace1c7550903170f92395947f1ccc22c
It's not being used at all, and it's broken because
this.lengthDifference is set to zero regardless of what length
difference the operations passed into the constructor might cause
Change-Id: I3b7a312a1920347e7bf34df88a05bf6f2ff11f7d
* Changed splice to check all elements about to be inserted are allowed before inserting any of them so that catching an exception leaves you in a sane state
* Fixed the order of execution of parent class constructors in ve.dm.LeafNode and ve.dm.TwigNode so that canHaveChildren and canHaveGrandchildren produce correct values and added tests to ensure these methods are correctly inherited in subclasses
* Added tests that check for exceptions when adding nodes that can have children to nodes that can not have grandchildren
* Added test that check for events being emitted before and after splicing, including that beforeSplice should be emitted even in cases where a splice fails and throws an exception because the nodes are incompatible (but afterSplice is not called in this case) since beforeSplice might modify the nodes in some way before the compatibility tests are run
Change-Id: Id12aea995a42c26ff63a74ae3d31f2bf455759e3
* Moved getParent and getRoot from ve.dm.Node back to ve.Node
* Fixed use of getElementLength that should have been changed to getOuterLength, but was changed to getLength (oops)
Change-Id: Ibe5b855aef533dcd493f762a8a02c6a11ce6e7de
In this commit several methods (child node add/remove and parent/root modification) were also moved to ve.dm.BranchNode ve.dm.Node respectively. ve.Node and ve.BranchNode are immutable. ve.dm.Node and ve.dm.BranchNode are mutable. Other subclasses of ve.Node and ve.BranchNode should implement functionality to mimic changes made to a data model.
Change-Id: Ia9ff78764f8f50f99fc8f9f9593657c0a0bf287e
Ground-up rewrite of the data model. Putting this in the ve2 directory for now so we still have the old code floating around.
Main changes so far in this rewrite:
* Renamed hasChildren() to canHaveChildren()
* Added canHaveGrandchildren()
* Added a new node type TwigNode that can have children but not grandchildren (so all of its children must be LeafNodes)
* Implemented push/pop/shift/unshift as wrappers around splice()
* Renamed getElementType() to getType(). Nodes now take a string as a type, and the element stuff is gone and won't be back
* Removed clearRoot(), replaced it with setRoot( this ) where needed
Change-Id: I23f3bb1b4a2473575e5446e87fdf17af107bacf6
This is a bit better than cloning tokens wholesale, but not by much. There is
a lot of potential for much better per-token caching with reduced token
cloning. Need to map out all dependencies besides token attributes expanded
from template parameters or other scoped state. Even if tokens themselves
don't need transformation, they might still need to be considered for other
token transformers, so simply keeping the final rank won't quite work even if
the token itself is fully transformed. As a minimum, a shallow clone would
need to be made and the rank reset (as in env.cloneTokens).
Change-Id: I4329113bb21750bae9a635229ed1b08da75dc614
* Added an LRU cache (using the lru-cache node module) for tokenizer output
* Mutation of nested attributes now replaces the containers. A shallow copy of
tokens is sufficient to isolate token transformations. Need to investigate
if we can actually get away without isolation and re-transformation for most
ordinary tokens.
Change-Id: I9136b1d7a1fbcc538183a319d4ecaa290d616fdf
* Ignore safesubst for now
* Remove an unneeded whitelist entry
* Make sure the caption is not lost for thumbs (fix to last commit) and remove
debug print
Change-Id: I243584ed0838cf7c3b4110fe9cdf869272477312
The HTML5 parser we are using to normalize expected HTML output in parserTests
reverses the order of attributes (see
https://github.com/aredridel/html5/pull/53 for the fix). Remove whitelist
entries concerned with this and use the proper order in external image
attributes.
Change-Id: If1868cae05396a150757c85a20473ab756cbcd97
* less verbose logging in noinclude processing and template expansion
* Give priority to the processing of templates transcluded from transclusions
to get closer to depth-first processing. This serves to minimize memory
usage from queued-up tokens.
* Increase the maximum outstanding requests per template retrieval. 10000
amazingly proved too low a limit on some big pages.
* Only process a single template request callback at a time for now
* Add a debug print in the treebuilder wrapper
* Don't treat multiple comments on a single line as a single comment to match
the PHP parser's behavior
Change-Id: I9a86b6d7bec3b9e1f17415daf1bf74170240721a
This has some TODOs still but I want to land it now anyway, and fix the
TODOs later.
* Add this.offsetMap which maps each linear model offset to a model tree node
* Refactor createNodesFromData()
** Rename it to buildSubtreeFromData()
** Have it build an offset map as well as a node subtree
** Have it set the root on the fake root node so that when the subtree
is attached to the main tree later, we don't get a rippling root
update all the way down
** Normalize the way the loop processes content, that way adding offsets
for content is easier
* Add rebuildNodes() which uses buildSubtreeFromData() to rebuild stuff
* Use rebuildNodes() in DocumentSynchronizer
* Use pushRebuild() in TransactionProcessor
* Optimize setRoot() for the case where the root is already set correctly
Change-Id: I8b827d0823c969e671615ddd06e5f1bd70e9d54c
In experiments this dropped the memory consumption further, and reduces the
queuing overhead in the node reactor.
Change-Id: I9409b6ca863b43b7557663bbec9572365059c078
Only call back a few callbacks per reactor iteration from the template fetch
request queue. This changes the expansion pattern from a (memory intensive)
breadth-first expansion to something quite close to depth-first expansion.
Additionally, retrieved pages are quickly added to the page cache so that a
lot of request queuing is avoided in favor of synchronous expansion from the
cache. On pages like Barack Obama that previously ran out of memory after
consuming node's 1.6G heap limit, expansion now runs in relatively constant
100-300M resident (so far, still running).
Change-Id: Ie34a1eeff00d868416de45ef8d289898258f560c
Eat unbalanced external link parts within template parameters. This does not
produce the same output as the PHP parser
(try echo '{{YouTube}}' | node parse.js), but preserves a level of sanity.
Need to check how common this is for external links. If it is rare enough,
moving the ']' after the parser function manually would fix the rendering for
the YouTube case.
Change-Id: I597d808efff36baa22191e7946a0061cc31120e8
Effectively stopping & starting polling prior to conversion
Getting Selection from model
Reselecting after conversion (TODO: modify selection to entire block ?)
Change-Id: I9ba331b5393bf568cc8d137646b43244ae2640a8
* add past paths for empty arguments etc
* cache attribute token transform pipelines
* fix bugs in TokenCollector and NoIncludeOnly handler, and improve its
efficiency by only registering for 'end' tokens on demand
* Remove empty reset methods from a few handlers
* Add a simple 'ap' debug print function that makes it easy to only print some
debug prints by temporarily changing 'dp' to 'ap'
* Improvements and bug fixes in AttributeExpander
Change-Id: Ie69729c8f62d48bba922712e44ebce484c621c50
Non-include attribute pipelines are not cached for now. Adding separate
caching for non-include attribute pipelines is very likely worth it, but
deferred for now.
Change-Id: I13f949d9f0a04536f9ccfcb73a2be69c5c08be01
This was an artifact from experimentation with multiple cursors long long ago in a land far far away
Change-Id: I14491c4adbd40bb8df4b1c31725cb1621351bef2
* Convert isNoInclude logic to positive isInclude throughout and set it
properly on attribute pipelines. Also don't cache non-include pipelines.
* Add a --pagename parameter to parse.js, which sets the page name in the
environment. This is then returned by {{PAGENAME}}. Not the final solution,
but useful for taxobox testing as taxons are selected based on PAGENAME.
* Add rudimentary pagenamebase parser function
Change-Id: If9c0be4c255200d0f2a30f02e5619437b4fd8f12
* DOM based on Wikia's thumb output: HTML5, clean caption without magnify
icon.
* basic RDFa annotations, but most options additionally in data-mw object-
might want to move more (or all?) of those into RDFa data using meta tags.
* no support yet for framed or other formats, image scaling etc
* also tweaked some config options in the environment
Change-Id: Ie461fcdce060cfc2dec65cc057709ae650ef3368
This makes it possible to get identical rendering in the editor, but may make other things more complex. The Wikitext serializer is no longer compatible for rendering lists so it's been stubbed out. Also the way the toolbar works with lists is broken, so that's been disabled. The HTML serializer has been fixed to work correctly and no-longer-used styles have been removed.
Change-Id: If156f55068b1f6d229b3fa789164f28b2e3dfc76
Also:
* Simplified ve.ce.Surface.getLeafNode, which may be better to just be removed and be used inline in the few places it's being used.
* Removed method wrapper for static function ve.ce.Surface.getLeafNode
Change-Id: I1d4cf0bb7ecc8f07f030753e40a13ebef7d02daa
behavior switches are converted to tokens which set parser.environment flags during the async transformation stage.
The next step would be for handlers in the sync23 stage to generate the TOC, section edit links, and so on according to these directives.
No tests written, because the switches are consumed and don't appear in rendered html. We can test the magic word layout controls individually, once they're implemented.
Another small change was to store option flags directly in the environment object, not that it makes much difference.
Change-Id: I863fbf4be1a17d2f6c31158298dd301f19ae1137
Explained in the README how to use npm to load the dependencies and run tests. Too bad about NODE_PATH...
Don't try to find parserTests.txt in assorted places--if it isn't present, fetch from gerrit. You can symlink from core if you're developing on both parsers, and the fetch script will not overwrite.
Use __dirname in parserTests.js to allow the script to run independent of current working directory.
Change-Id: I4c8b884e91f4fdeae385c7697aff768bdd199dd5
Match pairs of {{!}} or | for template productions, but not a mix of the two.
Example:
{{#if:1|{{!}}-
{{!}} {{#if:1|style="color: red"{{!}}|}}
}}
Note that the style parameter ends up as the *key* of an empty-valued
attribute on the table cell currently.
Change-Id: I5f9357dd1645ef97b0af89f32e8d92ae49218c72
Parser functions which only accept positional arguments now return both the
key and value of arguments. Complete attributes (key and value) for templates
and the like from parser functions are not yet supported though.
Change-Id: I3f81bb35acd27186222ce6d5217e820042527c01
Instead of a proliferation of data-mw-* attributes, it should be easier to
stash all private / non-semantic round-trip information in a JSON object
stored in data-mw.
Change-Id: Id200a6a8789fa152f29ea530e5a24b6ee7b4b285
* This high level surface object is responsible for creating & managing editor instances
* Revised Sandbox demo to invoke in this way.
Change-Id: I4043779af9a2ab964deaf26079a992e82ebeef27
* Configured VisualEditorSandbox to use es
* Reconfigured the ce demo to share the sandbox module
* Removed es demo
* Renamed ce demo to ve (es is broken anyways)
Patchset 2: squashed in https://gerrit.wikimedia.org/r/3953
Change-Id: If8d13bf7011616d222be78899b23186859d5ed70
Also, in ParserPipeline:
* Import the LM converter and expose it through getLinearModel()
* Fix getWikiDom() to actually work (still unused)
In parse.js:
* Add --help option that prints usage information (was unreachable)
* Add --linearmodel option to output linear model JSON instead of HTML
Change-Id: Ic534e03ff40a7c9117bb63f0c635a4213d5e3406
To handle replace operations that are not themselves consistent (these
are common, for instance when replacing an opening element in one place,
then replacing the closing element somewhere else), we process
subsequent replace operations inside the first one until things are
balanced again, then issue a single rebuild for the whole thing.
Change-Id: Ide4613f046fabfeeef383138c39e350b1b710033
gets a bit closer to supporting table fragments passed through template
arguments. Next, we'll need a way to indicate start-of-line position to
enable sol block-levels in template parameters.
Example:
{|
{{#if: true|{{!}}Table cell|}}
|}
re-processing in a phase is wanted. By default, after a token type change or
the return of multiple tokens only the remaining transforms with higher ranks
are applied.
Updated a few comments as well.
to maximize IO concurrency. Signal that all tokens are fully transformed to
callbacks called from TokenAccumulator._returnTokens. The result should be a
single re-transformation when entering the callback chain, and only if the
transform does not signal that it took care of full transformation itself.
Template expansion would set this flag, as the nested transform pipeline
processes all tokens to the end of phase async12.
to callback which lets transforms indicate if their returned tokens are fully
processed for their phase. If not, the callback re-processes them so that any
remaining transforms are applied.
wgUploadPath configurable. Also change the hard-coded fall-back image sizes to
sensible defaults. This breaks three parser tests until image size retrieval
from the wiki is implemented.
construction' part of the HTML5 spec:
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation
Removed a few whitelisted test cases that are now passing directly.
The encoding canonicalization could also be moved to the Sanitizer. Doing this
early in token stream processing however has the advantage of providing further
transformations uniform data to work with. We could even consider to move this
even further into the tokenizer.
possible to support template / template argument expansion in image options,
and causes little trouble for wikilinks. Non-image wikilinks with multiple
text pipes are quite rare in the dumps, and concatenating description tokens
with a plain '|' is quite easy. 261 parser tests passing.