Commit graph

20 commits

Author SHA1 Message Date
Gabriel Wicke 14e6728cc4 Add the start of a minimal sanitizer stage, that only strips IDN ignored
characters from host portions of links hrefs for now. This module needs to be
filled up with pretty much everything Sanitizer.php does, including tag and
attribute whitelists and attribute value sanitation (especially for style
attributes).

We'll also need to think about round-tripping of sanitized tokens.
2012-01-18 01:42:56 +00:00
Gabriel Wicke e7381da5b8 Trim whitespace off template titles and argument names. 209 parser tests now
passing.
2012-01-17 23:18:33 +00:00
Gabriel Wicke f50fecf1e3 Fix template argument expansion. 200 parser tests now passing. 2012-01-17 22:29:26 +00:00
Gabriel Wicke 6bd7ca1e75 Misc improvements, now 196 parser tests passing.
* Add handler for post-expand paragraph wrapping on token stream, to handle
  things like comments on its own line post-expand
* Add general Util module
* Fix self-closing tag handling in HTML5 tree builder
2012-01-17 18:22:10 +00:00
Gabriel Wicke f4081bef08 First template expansion tests start working, and a bug fix in
DOMPostProcessor paragraph wrapper. 187 parser tests now passing.
2012-01-14 00:58:20 +00:00
Gabriel Wicke 196d704e8e Template expansion now enabled and somewhat working, but template fetching
still fails all the time.
2012-01-13 18:48:25 +00:00
Gabriel Wicke 32c9bccd7c Results of early template expansion debugging. Still disabled by default, but
getting closer.
2012-01-11 19:48:49 +00:00
Gabriel Wicke 6b6ec2933d More work towards template expansion.
* Created AttributeTokenTransformManager for generic attribute conversion, and
  removed { title, template argument {key, value} } expansion from
  TemplateHandler.
* Added caching for attribute and input sub-pipelines. Especially attribute
  pipelines would otherwise be recreated for each attribute value and key.
2012-01-11 00:05:51 +00:00
Gabriel Wicke 5ec30252f1 More token transform and pipeline setup refactoring to support template
expansion better.
2012-01-10 01:09:50 +00:00
Gabriel Wicke 287604c422 A bit of cleanup in ParserPipeline, with better and more consistent support
for multiple input types.
2012-01-09 19:33:49 +00:00
Gabriel Wicke e99d7a2a55 Two batteries worth of token transform manager refactoring.
* TokenTransformDispatcher is now renamed to TokenTransformManager, and is
  also turned into a base class
* SyncTokenTransformManager and AsyncTokenTransformManager subclass
  TokenTransformManager and implement synchronous (phase 1,3) and asynchronous
  (phase 2) transformation stages.
* Communication between stages uses the same chunk / end events as all the
  other token stages.
* The AsyncTokenTransformManager now supports the creation of nested
  AsyncTokenTransformManagers for template expansion.
  The AsyncTokenTransformManager object takes on the responsibilities of a
  preprocessor frame. Transforms are newly created (or potentially resurrected
  from a cache), so that transforms do not have to worry about concurrency.
* The environment is pushed through to all transform managers and the
  individual transforms.
2012-01-09 17:49:16 +00:00
Gabriel Wicke 6cd95fea37 Fix up constructors in EventEmitter inheritance and tweak a few more comments. 2012-01-04 12:28:41 +00:00
Gabriel Wicke e3ae9a702b Fix JSHint warnings (mostly about comment indentation) from r108012. 2012-01-04 11:06:24 +00:00
Gabriel Wicke 4c4a24f0a0 Hook up the DOMPostProcessor using events as well, and rename the subscription
methods to tell a story. Also document idea on how to dynamically configure
the pipeline depending on event registrations in comment.
2012-01-04 11:00:54 +00:00
Gabriel Wicke 29362cc53c Rename ParseThingy to ParserPipeline and fix up broken WikiDom generation and
commandline runner.
2012-01-04 08:39:45 +00:00
Gabriel Wicke bd98eb4c5a Land big TokenTransformDispatcher and eventization refactoring.
The TokenTransformDispatcher now actually implements an asynchronous, phased
token transformation framework as described in
https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations.

Additionally, the parser pipeline is now mostly held together using events.
The tokenizer still emits a lame single events with all tokens, as block-level
emission failed with scoping issues specific to the PEGJS parser generator.
All stages clean up when receiving the end tokens, so that the full pipeline
can be used for repeated parsing.

The QuoteTransformer is not yet 100% fixed to work with the new interface, and
the Cite extension is disabled for now pending adaptation. Bold-italic related
tests are failing currently.
2012-01-03 18:44:31 +00:00
Gabriel Wicke b3a0270d69 Remove env and load grammar in tokenizer constructor. Re-add property hack to
keep parserTests running for now. Really need a different pipeline for html
serialization or a reference to the HTML DOM.
2011-12-28 17:04:16 +00:00
Gabriel Wicke 3a63fb118e Add a few comments inline, and remove unneeded html serialization as we are
only interested in WikiDom output in this parser wrapper.
2011-12-28 13:46:52 +00:00
Neil Kandalgaonkar 8fbf36e63e put add terminal token inside tokenize method (will pull it out again for streaming interface) 2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar 4158f82d7e refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js 2011-12-28 01:37:06 +00:00