* [[:en:Barack Obama]] can now be expanded in 77 seconds using 330MB RAM,
while it would prevously run out of RAM after ~30 minutes. Wohoooo!
The token transform framework rework really paid off.
* 303 parser tests are passing in the new record time of 5.5 seconds. Two more
tests are passing since these tests expect the day of the week to be
Thursday. Won't be the case tomorrow.
Change-Id: I56e850838476b546df10c6a239c8c9e29a1a3136
* All parser pipelines including tokenizer and DOM stuff are now constructed
from a 'recipe' data structure in a ParserPipelineFactory.
* All sub-pipelines of these can now be cached
* Event registrations to a pipeline are directly forwarded to the last
pipeline member to save relatively expensive event forwarding.
* Some APIs for on-demand expansion / format conversion of parameters from
parser functions are added:
param.to('tokens/expanded', cb)
param.to('text/wiki', cb) (this does not work yet)
All parameters are additionally wrapped into a Param object that provides
method for positional parameter naming (.named() or conversion to a dict
(.dict()).
* The async token transform manager is now separated from a frame object, with
the frame holding arguments, an on-demand expansion method and loop checks.
* Only keys of template parameters are now expanded. Parser functions or
template arguments trigger an expansion on-demand. This (unsurprisingly)
makes a big performance difference with typical switch-heavy template
systems.
* Return values from async transforms are no longer used in favor of plain
callbacks. This saves the complication of having to maintain two code paths.
A trick in transformTokens still avoids the construction of unneeded
TokenAccumulators.
* The results of template expansions are no longer buffered.
* 301 parser tests are passing
Known issues:
* Cosmetic cleanup remains to do
* Some parser functions do not support async expansions yet, and need to be
modified.
Change-Id: I1a7690baffbe8141cadf67270904a1b2e1df879a
Renamed Selection method to more suitable name.
Misc cleanup
Patchset 2, whitespace cleanup
Patchset 3: Change values used with selection direction to -1 or 1
1 for left to right (normal)
-1 for right to left (opposite)
Change-Id: If9ecc721ace1c7550903170f92395947f1ccc22c
This is a bit better than cloning tokens wholesale, but not by much. There is
a lot of potential for much better per-token caching with reduced token
cloning. Need to map out all dependencies besides token attributes expanded
from template parameters or other scoped state. Even if tokens themselves
don't need transformation, they might still need to be considered for other
token transformers, so simply keeping the final rank won't quite work even if
the token itself is fully transformed. As a minimum, a shallow clone would
need to be made and the rank reset (as in env.cloneTokens).
Change-Id: I4329113bb21750bae9a635229ed1b08da75dc614
* Added an LRU cache (using the lru-cache node module) for tokenizer output
* Mutation of nested attributes now replaces the containers. A shallow copy of
tokens is sufficient to isolate token transformations. Need to investigate
if we can actually get away without isolation and re-transformation for most
ordinary tokens.
Change-Id: I9136b1d7a1fbcc538183a319d4ecaa290d616fdf
* Ignore safesubst for now
* Remove an unneeded whitelist entry
* Make sure the caption is not lost for thumbs (fix to last commit) and remove
debug print
Change-Id: I243584ed0838cf7c3b4110fe9cdf869272477312
The HTML5 parser we are using to normalize expected HTML output in parserTests
reverses the order of attributes (see
https://github.com/aredridel/html5/pull/53 for the fix). Remove whitelist
entries concerned with this and use the proper order in external image
attributes.
Change-Id: If1868cae05396a150757c85a20473ab756cbcd97
* less verbose logging in noinclude processing and template expansion
* Give priority to the processing of templates transcluded from transclusions
to get closer to depth-first processing. This serves to minimize memory
usage from queued-up tokens.
* Increase the maximum outstanding requests per template retrieval. 10000
amazingly proved too low a limit on some big pages.
* Only process a single template request callback at a time for now
* Add a debug print in the treebuilder wrapper
* Don't treat multiple comments on a single line as a single comment to match
the PHP parser's behavior
Change-Id: I9a86b6d7bec3b9e1f17415daf1bf74170240721a
This has some TODOs still but I want to land it now anyway, and fix the
TODOs later.
* Add this.offsetMap which maps each linear model offset to a model tree node
* Refactor createNodesFromData()
** Rename it to buildSubtreeFromData()
** Have it build an offset map as well as a node subtree
** Have it set the root on the fake root node so that when the subtree
is attached to the main tree later, we don't get a rippling root
update all the way down
** Normalize the way the loop processes content, that way adding offsets
for content is easier
* Add rebuildNodes() which uses buildSubtreeFromData() to rebuild stuff
* Use rebuildNodes() in DocumentSynchronizer
* Use pushRebuild() in TransactionProcessor
* Optimize setRoot() for the case where the root is already set correctly
Change-Id: I8b827d0823c969e671615ddd06e5f1bd70e9d54c
In experiments this dropped the memory consumption further, and reduces the
queuing overhead in the node reactor.
Change-Id: I9409b6ca863b43b7557663bbec9572365059c078
Only call back a few callbacks per reactor iteration from the template fetch
request queue. This changes the expansion pattern from a (memory intensive)
breadth-first expansion to something quite close to depth-first expansion.
Additionally, retrieved pages are quickly added to the page cache so that a
lot of request queuing is avoided in favor of synchronous expansion from the
cache. On pages like Barack Obama that previously ran out of memory after
consuming node's 1.6G heap limit, expansion now runs in relatively constant
100-300M resident (so far, still running).
Change-Id: Ie34a1eeff00d868416de45ef8d289898258f560c
Eat unbalanced external link parts within template parameters. This does not
produce the same output as the PHP parser
(try echo '{{YouTube}}' | node parse.js), but preserves a level of sanity.
Need to check how common this is for external links. If it is rare enough,
moving the ']' after the parser function manually would fix the rendering for
the YouTube case.
Change-Id: I597d808efff36baa22191e7946a0061cc31120e8
Effectively stopping & starting polling prior to conversion
Getting Selection from model
Reselecting after conversion (TODO: modify selection to entire block ?)
Change-Id: I9ba331b5393bf568cc8d137646b43244ae2640a8
* add past paths for empty arguments etc
* cache attribute token transform pipelines
* fix bugs in TokenCollector and NoIncludeOnly handler, and improve its
efficiency by only registering for 'end' tokens on demand
* Remove empty reset methods from a few handlers
* Add a simple 'ap' debug print function that makes it easy to only print some
debug prints by temporarily changing 'dp' to 'ap'
* Improvements and bug fixes in AttributeExpander
Change-Id: Ie69729c8f62d48bba922712e44ebce484c621c50
Non-include attribute pipelines are not cached for now. Adding separate
caching for non-include attribute pipelines is very likely worth it, but
deferred for now.
Change-Id: I13f949d9f0a04536f9ccfcb73a2be69c5c08be01
This was an artifact from experimentation with multiple cursors long long ago in a land far far away
Change-Id: I14491c4adbd40bb8df4b1c31725cb1621351bef2