This patch fixes a tokenizer syntax error encountered on
[[:en:Template:JacksonvilleWikiProject-Member]] and [[:en:Template:Infobox
former country]] by allowing optional whitespace before start-of-line template
syntax.
Change-Id: Ic214a731de58bf766e51f23d5e24ea2ce6788f58
254 round-trip tests (up from 184) are now passing.
Also:
* tweaked runtests.sh slightly (use less -R instead of -r).
* made sure the EOFTk is preserved in phase 3 transforms
Change-Id: I1de22186bdb78e52019370e43f096877005b8f5a
- This is implemented as a post-processing pass.
- Might require additional checks to verify rewriteability.
- Implemented as a pair-wise tag DOM minimization strategy,
i.e. it takes tag pairs (B, I) for ex, and attempts to
normalize the tree just for those tag pairs. Normalizing
across multiple tags is implemented as pairwise rewriting
across all pairs: Ex:(b,i), (b,u),(i,u) for (b,i,u)
- Copied over attributes as part of rewriting, but some of the
attributes lose their meaning on rewriting since tags are
reordered (ex: sourcePosn, sourceTagPosn). How do we handle this?
Output examples and possible issues to fix:
<i><b><u>biu</u></b></i><b><u>bu</u></b><u>u</u>
gets rewritten to:
<u><b><i>biu</i>bu</b>u</u>
But, the equivalent wikitext form:
'''''<u>biu</u>''''''''<u>bu</u>'''<u>u</u>
does not get rewritten because of parsing differences.
This wikitext gets parsed into:
<i><b><u>biu</u>'''</b></i><u>bu<b>u</b></u>
The extra ''' token in the middle thwarts DOM rewriting.
However, a slightly different version:
"'''''<u>biu</u>''<u>bu</u>'''<u>u</u>"
gets properly normalized to:
<u>'''''biu''bu'''u</u>
An alternative, but fun strategy to play with is to use the following
two normalization primitives: S(wap) and M(erge).
- S rewrites T1(T2(x)) into T2(T1(x))
(ex: <b><i>foo</i></b> ==> <i><b>foo</b></i>)
- M rewrites (T(x),T(y)) into (T(x,y)).
(ex: <b>foo</b><b>bar</b> ==> <b>foobar</b>)
The current rewriting strategy could possibly be re-implemented as S-M
rewriting. The problem to solve there would be to find an efficient
rewriting strategy that is guaranteed to lead to a normal form. I may
not play with it now, but just documenting it for later (to play with
in my spare time).
This commit is just as a record of fun/experimental code where I get to
learn details of JS, wikitext, parsing, and DOM manipulation. Next
version of this code will attempt to introduce minimal DOM restructuring
across multiple tags at once which can be more efficient.
gwicke: Removed now passing test from whitelist, and updated another whitelist
entry which is now improved.
Change-Id: Ie97bcb164eb62c34ba61aa76ba2f4c232aa713d8
Save action
1) posts html to parsoid service in exchange for wikitext
2) saves wikitext
3) returns parsed content.
On save, VE is hidden and page content is replaced.
Demoing save in toolbar, followup commit will redesign save options
Change-Id: Ibfbe52de08e3483e1a33f0740c03f96ec2b7f90a
This was caused by the fact that a non-structural leaf can not have children, which makes it appear incompatible as a sibling to an arbitrary structural element (like a paragraph) but since it can not contain content we can check that instead.
Change-Id: Ie3c58b4b43f2aa6921f8f82aa82511e231207854
* Added a generic stx_v 'syntax variant' round-trip attribute
* For pre, use stx:'html' vs. no syntax annotation. This might not be 100%
safe for arbitrary html input, so we might want to flip this to stx:'wiki'
later.
* 181 round-trip tests passing
Change-Id: If6080917a3a7c069066db3db60efe59b1f6c28d8
Supports both -> HTML DOM and round-trip testing. Displays the diff to the
last results using less -r.
Change-Id: Ib3fbadeda3c8f4f7e3d2e6e5236a73ff7a773623