Commit graph

16 commits

Author SHA1 Message Date
Gabriel Wicke a146fcb8ad Improve the handling of newlines for round-tripping
An improvement, but there still are some extra newlines inserted after
paragraphs. Example input:

-------

Foo:
{|
|foo
|}
-------

Extra newlines are inserted after the Foo: and the foo in the table. They are
not fed as tokens or text to the tree builder, so there is likely a bug in the
html5 library or JSDom.

Change-Id: I83eb6180e3cd1c4e7f9b15b31d339e1d32bccd3f
2012-06-06 10:17:03 +02:00
Gabriel Wicke 63abd57fc8 Improve newline-before-paragraph round-tripping support
Change-Id: I9176a97f9695018650d9a63b89514c07e0d6be90
2012-06-02 16:39:33 +02:00
Gabriel Wicke 36084c5d93 Preserve original newlines in HTML and serialization
254 round-trip tests (up from 184) are now passing.

Also:
* tweaked runtests.sh slightly (use less -R instead of -r).
* made sure the EOFTk is preserved in phase 3 transforms

Change-Id: I1de22186bdb78e52019370e43f096877005b8f5a
2012-05-29 23:29:03 +02:00
Gabriel Wicke d918fa18ac Big token transform framework overhaul part 2
* Tokens are now immutable. The progress of transformations is tracked on
  chunks instead of tokens. Tokenizer output is cached and can be directly
  returned without a need for cloning. Transforms are required to clone or
  newly create tokens they are modifying.

* Expansions per chunk are now shared between equivalent frames via a cache
  stored on the chunk itself. Equivalence of frames is not yet ideal though,
  as right now a hash tree of *unexpanded* arguments is used. This should be
  switched to a hash of the fully expanded local parameters instead.

* There is now a vastly improved maybeSyncReturn wrapper for async transforms
  that either forwards processing to the iterative transformTokens if the
  current transform is still ongoing, or manages a recursive transformation if
  needed.

* Parameters for parser functions are now wrapped in abstract Params and
  ParserValue objects, which support some handy on-demand *value* expansions.
  Keys are always expanded. Parser functions are converted to use these
  interfaces, and now properly expand their values in the correct frame.
  Making this expansion lazier is certainly possible, but would complicate
  transformTokens and other token-handling machinery. Need to investigate if
  it would really be worth it. Dead branch elimination is certainly a bigger
  win overall.

* Complex recursive asynchronous expansions should now be closer to correct
  for both the iterative (transformTokens) and recursive (maybeSyncReturn
  after transformTokens has returned) code paths.

* Performance degraded slightly. There are no micro-optimizations done yet
  and the shared expansion cache still has a low hit rate. The progress
  tracking on chunks is not yet perfect, so there are likely a lot of unneeded
  re-expansions that can be easily eliminated. There is also more debug
  tracing right now. Obama currently expands in 54 seconds on my laptop.

Change-Id: I4a603f3d3c70ca657ebda9fbb8570269f943d6b6
2012-05-15 17:05:47 +02:00
Gabriel Wicke 6e21f6bb27 Forward-port Cite extension
* Adapted Cite extension to use current interfaces and token formats
* Improved TokenCollector

Change-Id: I20419b19edd9bbad2c2abf17a2ff1411b99c0c04
2012-05-03 13:22:01 +02:00
Gabriel Wicke 421ef44621 Match the empty string as whitespace too
Change-Id: I1a8ed882021804f62855b9db4368270feebbfc16
2012-04-16 14:48:39 +02:00
Gabriel Wicke b8d980a229 Don't eat newline / space in template parameters
..so that block_lines can match.

Change-Id: I4c464dc44249f40e4aa280df35fb726bfce3a745
2012-04-04 11:22:31 +02:00
Gabriel Wicke f02ff95aa3 Token representation clean-up. Now all tokens are differentiated using
constructors instead of type attributes.
2012-03-07 20:06:54 +00:00
Gabriel Wicke 001194b140 Replace console.log with console.warn in all debug statements 2012-02-14 20:56:14 +00:00
Gabriel Wicke a122e51eec Move data-* annotations into separate object on tokens, that is then
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.

A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
2012-02-11 16:43:25 +00:00
Gabriel Wicke b4892102a4 Clean up transform callback interface 2012-02-07 11:53:29 +00:00
Gabriel Wicke 8c75aa1a7a Remove type attribute for tag tokens. 2012-02-01 18:37:48 +00:00
Gabriel Wicke a5cc10a06b Change token format to plain strings for text tokens, and specific objects for
other tokens. This is only the first half of the conversion. The next step is
to drop the type attribute on most tokens and match on the constructor in the
token transform machinery.
2012-02-01 16:30:43 +00:00
Gabriel Wicke 34025251a3 Clean up 'END' token handling a bit. 2012-01-17 20:01:21 +00:00
Gabriel Wicke 7f579398c7 Use isBlockTag in DOMPostProcessor 2012-01-17 18:30:22 +00:00
Gabriel Wicke 6bd7ca1e75 Misc improvements, now 196 parser tests passing.
* Add handler for post-expand paragraph wrapping on token stream, to handle
  things like comments on its own line post-expand
* Add general Util module
* Fix self-closing tag handling in HTML5 tree builder
2012-01-17 18:22:10 +00:00