* Don't explicitly add the newline in the pre, as we preserve newline tokens
now. This avoids doubling of newlines when round-tripping.
* Use the sHref attribute even if the href contains spaces.
Change-Id: I8bec8fbfd6a7836bf2e5eec20869a0edd95c93b6
Lists interrupted by non-empty lines would not close the list properly.
Register for any token instead of just for newlines and close the list if no
listItem follows the newline.
Change-Id: I1743901e3db541bbeda78d17707db943e6ceb9b9
If the href would not denormalize, add a copy of the original href in data-mw
and use it to preserve non-conventional capitalization etc.
Change-Id: Ifef50eec7343b0e6b0ba66b6d19a8a3e8c9f8001
The char-based diff looked good in some pages, but yielded terrible results in
others. The word-based algo is more consistent overall.
Change-Id: I7f2d40315ad96df037c2d9a1d50739e3d21b6c81
A tail containing regexp syntax (a ? in [[:en:Main Page]]) would crash the
serializer. Use substr instead.
Change-Id: I8519aec9c07dfe31893d676b1c936a42d2af74a0
The word or char-based algorithm does not scale well beyond 5k chars or so. We
now perform a line-based diff and then continue to diff the line differences
using the char-based algorithm. This gives a char-based diff even for bigger
inputs.
Change-Id: Iec87ca56540060e4df2859ba54c992e7ff5cfe10
* Stay in round-trip mode in HTML DOM output
* Return DOM, wikitext and diff as soon as they are available
Change-Id: I7f8f44cfe8eed63a521d1318d116c22232cb6b1b
- Added a tail json attribute for wikiLinks
- During serialization, this attribute is used to strip the tail from
the link target and render it after the link
[[hen]]s ==> <a ... data-mw="{gc:1, tail: 's'}" ...>hens</a>
==> [[hen]]s
- 2 more roundtrip tests green
Change-Id: I84f3dabaf0271f7a67641a00148467daa8310eb0
This allows us to check the watchlist checkbox on save dialog.
Added watchlist toggling to ve save api.
Added some i18n messages to core integration.
Change-Id: Ibed8edb2c59ad49e1738c937c3bea518238d0845
* The state of syntax stops is now properly included in the cache key for the
tokenizer-internal backtracking cache. This fixes some mis-parses when
re-parsing a bit of text with different flags.
* Clear the backtracking cache after each toplevelblock. This drops the peak
memory usage when expanding [[:en:Barack Obama]] from ~380M to ~110M.
Change-Id: Icdb879cae5907e4595903dd6acba2e686e8c2e4b
* Added converters to all relevant node implementations
* Added new annotation objects with their own factory
Change-Id: I9870d6d5eac45083929d74d2e58917d0939ca917
Also:
* Refactored tests
* Added tests for ve.dm.Transaction.newFromInsertion
* Added tests for ve.dm.Transaction.newFromRemoval
* Fixed problems with ve.dm.Transaction.newFromInsertion
* Added ve.dm.Node.canBeMergedWith which is partially a port of ve.Node.getCommonAncestorPaths merged with canMerge from within ve.dm.DocumentNode.prepareRemoval from the old ve codebase
Change-Id: Ibbc3887d08286d8ab33fd6296487802d65b319fa
* This routine attempts to rewrite the DOM to maximize tag overlap
and thus minimize tag uses.
* This takes as input a set of tags which participate in the
minimization.
* Tested on the following example
<b><i><u><s>BIUS</s></u></i></b><b><i><s>BIS</s></i></b><b><u><s>BUS</s></u></b><u><i>UI</i></u>
with multiple combinations of the 2^4 possible variations of i,b,u,s
tags: [], ['i','b','u','s'], ['i'], ['b','s'], ['i','b','u']
- But, I am not fully sure if this implements the right behavior when
only a subset of inline tags are provided. Needs discussion and tweaking
as necessary.
* Also tested on few others:
<b>B</b><b><i>BI</i></b><b><i><u>BIU</u></i></b><b><i><u><s>BIUS</s></u></i></b>
<s><i><b>SIB</s></i></b><s><i><u>SIU</u></i></s><i><u>IU</u></i><i>I</i>
* The previous pairwise tag rewriting version fails on several of these
examples, so this new version is a definite improvement.
* No change in parserTests run (203 passing before and after).
* Possible improvements that could/should be undertaken:
- get rid of useless/idempotent add/remove of nodes that don't change
the DOM.
- ensure that node attributes post-restructuring are correct.
Change-Id: Ib4a8b39583fa96a2be880a77021ca81cefa06484
Copy-pasting things like "text<IMAGE>moretext" failed spectacularly,
this commit fixes that.
* Check for content rather than structure in the inserted/removed data
* In the content case
** Run selectNodes() over the removal range, rather than just the cursor
*** i.e. no longer assume that content replacements only affect one node
** If there is structure involved, rebuild all affected nodes
Change-Id: I80e40b5b7c514a3fb105d57e4a17770d0fefaaea
Some of the replacement code was assuming that "does not contain
elements" and "is content" were the same. They're not any more, because
we have content nodes (like image) now, so I need a separate function
to distinguish between these cases.
Change-Id: I206ccdf082b7baddf99d382eb3cdd77ea34fb479
If the last element of the input data array was text, the resulting text
node would have length=0 rather than the expected length value.
Change-Id: I3d089a80b8a447a12ba411b2e11c1b84f14f2959