* Removed dead commented out code.
* Cleaned up newline handling in serializer some more.
* Now, onNewLine and onStartOfLine reflect serializer state
more accurately.
* No implicit new lines for explicit html tags.
* 9 more roundtrip tests now green.
Change-Id: I9f640de2ae769c7472538fa687400dc8a40c2b2d
297 round-trip tests are passing with this patch.
TODO:
* generalize data-mw-gc handling in the serializer for any tag
* use data-mw-gc="both" and data-mw.src: 'the wikitext' for round-tripping of
wikitext structures, optionally with some presentational (but read-only)
content
* use span and data-mw-gc="both" for nowiki
Change-Id: I700142a56818977c20c8c06e6a5f2e77a708d25e
This makes sure that we escape start-of-line syntax when needed, since
onNewline is often not yet set.
Discussion / background:
[19:18] <subbu> this will fix it, but, i think this is asking for another
minor refactoring of these flags ... because this is a subtle fix which means
it might be possible to make it clearer. onNewline is one true in on
direction, i.e. if true, we are in a new line state, but if we are in a
newline context, onNewline is not true, which is why this new method is
needed.
[19:19] <subbu> i dont know if it is possible, but it seems like it shoudl be
possible. but, something for later.
[19:20] <subbu> badly phraed. "onNewline" ==> in new line context, but if in
new line context, onNewline may be false.
[19:20] <gwicke> we should perhaps update it as early as possible instead
[19:21] <subbu> i cannot today, but possible monday. i am heading out in
about 15-30 mins.
[19:22] <gwicke> will need to check all conditions depending on it in
_serializeToken
[19:22] <subbu> oh, i misunderstood you :)
[19:22] <gwicke> and if there are cases where the onNewline / onStartOfLine
state could be reverted later
[19:23] <subbu> you were referring to the flag, i thought you meant we should
fix this sooner than later.
[19:23] <gwicke> yes, I wasn't terribly clear
[19:23] <gwicke> you wrote something about following productions swallowing
newlines, but I think we don't actually do that any more
[19:24] <gwicke> I'm quite optimistic that updating those flags much earlier
would work
[19:25] <subbu> yes, it could fix it.
[19:26] <subbu> you might be right reg. swallowing. it was happening earlier.
but, not right now, after single-line mode and other fixes.
Change-Id: Ic1d8141c04eb54a59977d0ba87bcf06bafd421e0
This should not really be needed if the tokenizer did not decode html entities
on the fly. It is still a quick way to make sure no htmlish content can be
inserted even with the current decoding.
The next step and proper fix is to make entity decoding either optional in the
tokenizer (flag-controlled), or move it to a later stage in the token
processing pipeline.
Change-Id: Ife093dcfb95113763dab5635b098c795d3550586
* Renamed defaultOptions to initialState
* Got rid of unused state property
* Added comments explaining how state attributes
and tag handler flags are used
* Refactored listItemHandler check into functions and
added FIXME possible rewriting of that check.
* Protected serializeDOM in a try-catch handler to
catch exceptions and output the exception to the console.
Change-Id: I3d351c06e4b86baeb5a55243b11dbfa9baca5bb7
This fixes a bug Trevor reported where selecting from a list item across
a heading and into a paragraph, pressing backspace, then clicking undo
caused an exception.
Change-Id: Id2851271529e10548f6979a030a198054aa1c48f
ve.ce.TextNode listed textStyle annotations that didn't actually exist,
and failed to recognize some that did exist (such as span; bug 37808).
Added all annotations to both places. <span> tags are now tolerated by
the editor in that it doesn't crash anymore, but they're displayed (and
saved!) without any attributes, so <span style="color:yellow;">y</span>
doesn't show a yellow 'y' in the editor and is saved back as
<span>y</span> .
Change-Id: Iaae11ad5044150fa904010983ff83579cb37733d
Fixed by adding the specialMessages module which is only loaded once the
editor loads. Then after it's loaded we use the summary message from
there to update the (possibly broken) summary message in the save
dialog.
Change-Id: I67f5c59501cdf7c66c925cef8d4dd42b0f2cfde3
* Removed murky ' :' -> ' :' replacement in tokenizer. This breaks four
parser tests, and should be fixed in a token stream transformer or DOM
postprocessor. This replacement clashes with round-tripping, and is not
terribly important visually.
* Added stx:row annotation to single-line dt/dd pairs and use it to preserve
single-line syntax in the serializer. There is no attempt yet to support the
addition of nested lists in an originally single-line dd. We'd need to look
ahead in the serializer to support this. Perhaps the editor can simply drop
data-mw in that case.
* Switched default dt/dd serialization to multi-line. This supports all nested
lists and multiple dds.
* Don't close dls when switching from dt to dd or back in the token stream
ListHandler.
Overall 290 round-trip tests are passing now (up from 284, some due to ,
some due to lists). The number of passing parser tests dropped slightly from
303 to 297 (or 301/295 on weekdays other than Thursday).
Change-Id: I85ff40571833713388c6523e6a4ba2e94daa3807
Basically only prefix all bullets if the serialization output is going to be
in start-of-line context. The test for that is currently inline, but should
perhaps be factored out to a method or state flag instead.
We could alternatively consider to return the start-of-line prefix and let it
be used in _serializeToken in case we end up in start-of-line context.
This patch also fixes a newline issue on input like this:
:d1
::: d3
Both the list and list item handlers now set the startsNewline flag
dynamically depending on the context, so that we don't depend on the
suppression of newlines from list syntax by the singleLineMode any more.
There is still an extra newline inserted between list items in the following
example:
;t1 :d1
;;t2 ::d2
This looks like a bug in the produced DOM and not in the serializer, since the
outer definition list is closed and re-opened between d1 and t2.
Change-Id: I78e3a1ef34cf9159d5a1e86fb64c774ff111e71d
* changes:
Got rid of iteration to get the surface
Removed attach and detach methods from ve.ce.Node
Track adjustments in DocumentSynchronizer and apply them to oldRange
This is needed because oldRange is relative to the state of the model before any changes were made, but when we call selectNodes() it's gonna operate on a partially updated model tree.
This is a genuine bug in DocumentSynchronizer proper, which means I owe the entire team lunch
Change-Id: Ia6510de19df02e961c7f25fb8e7833abceb8d25b
* Adjust both start and end for preceding operations
* Adjust end for the current operation as well
Change-Id: I2f96d609bddf3788aa5700ad1f0b46208f3517d7
text for a link is the selected text.
patchset 2 - add case for if data offset is an object, be sure it
is a string prior to adding it. truncate to 255 chars.
patchset 3 - actually add the patched file
Change-Id: Ibddf942c2a0ba3412d93cf9730f74eb858025fad
Also fixed unlisting to not break when unlisting a range that includes things that are not list nodes
Change-Id: Ib9d4ea851c3ed9bf72a93aa87e470ce40c308453
contained by a list node the button is off. Button is now only toggled on
if all nodes in selection have a parent list node.
Patchset 2 - cleanup whitespace
Change-Id: Ia9adc39c0d5c75e2e96580f0e172f5b602540ac3
The main issue is that the bullets from dd/dt were not stored on the stack. I
added a separate field for it in each stack entry, which now fixes the basic
indent case without (afaik) breaking anything else.
There are still some newline issues, and the need to handle the single-line
dd/dt vs. the multi-line variant.
Change-Id: I65939c05e2c5dde0789bf8aefd7651161a2f137c
to update save dialog checkbox with correct watched state.
Call mw.page.watch.updateWatchLink onSave to refresh icon
with watched state.
Patchset 2- updated event name
Change-Id: I23ef1aad9c8ace13df1b9a6bf0bfeddb9d8bcb37
* Don't escape html-syntax pre content for now; Should parse this with a new
pre content production later (which needs to be split out of the regular pre
production in the tokenizer)
* Protect indent-pre content from start-of-line syntax escaping
* Preserve extra leading spaces in the tokenizer
* Two more (now 284) round-trip tests are passing
Change-Id: I199b89c0ee7fae12546df10c1b5117c97caccac5
Queued newlines and new trailing newlines were not cleanly separated so far,
which caused some trailing newlines to be consumed for needed leading
newlines. This change fixes several newline bugs, taking the number of passing
round-trip tests from 276 back up to 282.
Change-Id: Idb4706e15ce71e63085033e3f3f29557915c11a8
Fixed a bug in the list handler for multiple dds in a definition list. Also
fixed a few JSHint warnings.
Change-Id: I3e883786698a9521347fc2a5e6420646318813a7
Because of the end>=left condition, the loop was exiting right before
hitting the startBefore case, so use end>=left-1 instead.
Fixing this exposed another bug that caused nodeRange and nodeOuterRange
to be off by one: we need to increment left after storing it in
startOffset, not before
Change-Id: I54e18fb2119c8caefb4f7a7f2be43c6129afc4c0
Known issue: breaks round-tripping of :;;;::. That test is normally disabled
anyway, so we can fix it later.
Change-Id: I7954271311bfb7e71caae59d8177e3f04a9ebbca
This handles internal and external links separately, only setting
'title' for the former and only 'href' for the latter.
On the way out, detect external links using a simple regex (needs
improvement). Also don't write sHref any more per Gabriel's request.
Change-Id: Ibc2436d0a12de1d027e116f181db640cbcf3d522
* Started to add more complete tag source range (tsr) annotations to most
start / empty tags. These replace the old sourcePos and sourceTagPos
annotations, and look more promising for general round-tripping than block
source ranges (bsr). See
http://www.mediawiki.org/wiki/User:GWicke/Parsoid_source_ranges for some
notes on this.
* Added an escapeWikitext method in the serializer that tokenizes supposedly
text-only content from the DOM with the tokenizer and wraps runs of returned
non-text tokens into nowiki tags. The source corresponding to non-text
tokens is retrieved using the tsr annotations.
* Removed old (unused) table productions to avoid confusion.
* 276 round-trip tests are passing, vs. 283 without escaping.
Known issues:
* harmless for now, can be improved later: urllinks in external link captions
are wrapped in nowiki. Example HTML:
<a rel='mw:extLink' href="http://example.com">http://example2.com</a>
* some start-of-line syntax in wiki-syntax preformatted blocks might be
wrapped into nowiki when that would not really be needed. Example HTML DOM:
<pre>
* foo
* bar
</pre>
Change-Id: I01c34aedd5c566614d36924add47a6a960e91987
Now that the message keys are showing in the demo we can see how bad things look if the drop-down label is too big and wraps
Change-Id: I776a7e480a3f6240c0929f2f50a865c2de4ec0a5
Various parts of DM and CE choke on completely empty documents, so
return an empty paragraph instead.
Change-Id: I67062b66a44efe53a1bdaf60907653f0cc55dd25
Made edit tabs rendered on non view pages work correctly by routing them to the view page with an extra param that auto-initializes the editor
Change-Id: I4fd9106c8b45c6fc79af9ccb44e18944e9b9d8b9
* Added a newlineTransparent flag to handlers that prevents changes to the
onNewline status, so that content following it is still considered to be in
start-of-line context. This fixes a few rt tests where a comment or nowiki
tag is at the start of the line, and following content should end up on the
same line.
* 283 rt parser tests are now passing.
Change-Id: Ie58dcb9e5e9af9000fff61c2e1db5d8649ffc3f6
This solves lots of issues in the integration work, and also makes it much easier to extend this class to integrate it into other skins
Change-Id: I3b3c5b22a5664e6cf37e429cc0ac3be2e75b630f
* tokens are not modified any more (they are supposed to be immutable)
* handler info is now split in start / end objects and potentially a 'make'
method; added more flags to govern the newline behavior of different tags
* added a generic singleLine mode for single-line syntactical environments
* switched the web service to line-based diffs to avoid issues when diffing
the round-trip results of [[:en:Programming language]]
* 280 round-trip tests are passing now
Change-Id: I74b8ffbf69643c5d6e5ec852ec58e680c9018901
HTML5 defines space characters as [ \r\n\t\f] in
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#space-character.
It treats these specially in a few contexts. As an example, the foster
parenting algorithm does not apply to space characters.
As a result, this change fixes the round-tripping of spaces between table
tags, which were previously moved before the table.
Change-Id: I32ab29275a9f824fc66d8286638eb42748cfc9a5
from Parsoid HTML output as well as VE HTML output. There are still
some newline related failures from parser tests that needs fixing, but
this is getting close. So committing for now so other eyes can make the
bugs shallow :).
Change-Id: Ia6a218ee9fb3e18fe0573c89ff3a4236779e1e64
- Check if href for links has the wgScriptPath prefix before
attempting to strip it from the href.
Change-Id: I844151ef7317476668d1306b96a2aec5a56fd0f1
- Something like this:
<ul><li>1</li><li>2<ul><li>2.1</li><li>2.2<ul><li>2.2.1</li><li>2.2.2</li></ul></li><li>2.3</li></ul></li><li>3</li></ul>
now serializes properly to:
*1
*2
**2.1
**2.2
***2.2.1
***2.2.2
**2.3
*3
So does this form which is what the above wikitext parses to:
<ul><li>1
</li><li>2
<ul><li>2.1
</li><li>2.2
<ul><li>2.2.1
</li><li>2.2.2
</li></ul></li><li>2.3
</li></ul></li><li>3
</li></ul>
- Lists (and nested lists) are not entirely newline-insensitive.
They still depend on newlines *between* lists. The opening
<ul> tag for non-nested lists should always start on a new line.
So, for example,
<ul><li>foo</li></ul><ul><li>bar</li></ul>
will serialize to:
*foo
*bar
which is incorrect. But,
<ul><li>foo</li></ul>
<ul><li>bar</li></ul>
will correctly serialize to:
*foo
*bar
Change-Id: I13a0290368574865957bcf57aebab488fbbb7026
- More pieces are now simplified and all(?) newline handling
is now centralized in the serializeToken function.
- This commit fixes bugs in rt-ing some code snippets
----------
Ex 1: foo<p>bar</p>baz
----------
- This commit fixes bugs serializing VE generated html
----------
Ex 2: <p>foo</p><pre>bar</pre> ==> foo\n bar
----------
- But, this round of fixes introduces RT failures for certain
code examples in parserTests.txt. In all these failing cases,
inline text/html is embedded within a generated <p> tag during
parsing. If these generated <p> tags can have a "gc:1" attribute
added to them, we can properly serialize them to the original
form.
----------
Ex 3: foo<pre>bar</pre>
Parsed HTML: <p>foo</p><pre>bar</pre>
----------
Note how this parsed HTML is identical to what the VE outputs
in Example 2 above. So, without the gc:1 attribute, we now
have conflicting requirements on the example same HTML.
This increases confidence in the correctness of my commit here.
Change-Id: I86beadec91c445a7f8a6d36a639b406697daa0a2
- Eliminated newline handling from several places in code and
mostly isolated it to serializeToken thus simplifying newline
handling logic.
- Fixing some bugs in the process: # of green roundtrip tests
went up by 5 (294 --> 299) but actually introduced failures on
a few originally succeeding tests (additional leading/trailing
newlines on the entire test output).
- Added bonus: made list serializing (mostly) insensitive to
newlines between tags. So, all the following DOM serialize
identically to the following wikitext:
*foo
*bar
----------
<ul><li>foo</li><li>bar</li></ul>
----------
<ul>
<li>foo</li>
<li>bar</li>
</ul>
----------
<ul>
<li>
foo
</li>
<li>
bar</li>
</ul>
----------
Change-Id: I76be56c4b2789039dff5f47de4659746882e45d6
HTML5 defines space characters as [ \r\n\t\f] in
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#space-character.
It treats these specially in a few contexts. As an example, the foster
parenting algorithm does not apply to space characters.
As a result, this change fixes the round-tripping of spaces between table
tags, which were previously moved before the table.
Change-Id: I32ab29275a9f824fc66d8286638eb42748cfc9a5
from Parsoid HTML output as well as VE HTML output. There are still
some newline related failures from parser tests that needs fixing, but
this is getting close. So committing for now so other eyes can make the
bugs shallow :).
Change-Id: Ia6a218ee9fb3e18fe0573c89ff3a4236779e1e64
* Parsoid outputs bare newlines after a heading unless it's followed by
a <p>, so strip leading and trailing newlines in all bare text
* Adding a leading newline in <p>s is only needed if preceded by a
heading, don't add it otherwise
* Headings need a bare newline after them unless followed by a <p>
* Headings also need a bare newline before them if preceded by a <pre>
Change-Id: Ib02f800b26453541604e920fbb3845c51cdc6dea
This strips certain newlines added by Parsoid so they don't end up in
the linear model, and puts them back in on the way out so Parsoid
doesn't freak out and produce invalid wikitext
Change-Id: I256aaded4229c915868dc868ec6eaa1a73e00be1
I know this code is still being worked on but I felt like I should put
this in anyway, it might save the person working on it some work
Change-Id: I1535399b3798cd8de2fc5334cd1eac64b71e8821
This is needed because there are onTransact event handlers that use the
selection and expect it to be up-to-date. The previous behavior caused a
bug when pressing backspace at the end of the document, because the old
selection (at the end) was invalid in the context of the updated
document.
Change-Id: I159e37894d14d437f46495604c14804c0a13e84e
Empty annotation objects are unexpected by the rest of the data model
and cause weird breakage in the converter, resulting in inserted text
being in its own paragraph
Change-Id: I63de37c3c5e19ac650e7c7f2d1a0bfab21d45da9
I forgot to adjust a range based on this.cursor for this.adjustment .
This indirectly caused Rob to get an exception when trying to wrap
the last node in the document, because the unadjusted range was past the
end of the document.
Change-Id: If9d5b76568fae25ba2c0f405f1c4fcdd8d879e4f
- Check if href for links has the wgScriptPath prefix before
attempting to strip it from the href.
Change-Id: I844151ef7317476668d1306b96a2aec5a56fd0f1
This fixed a bug where an <h2> element with HTML attributes would be
converted to a 'heading' element with those HTML attributes but without
the 'level' attribute, which indirectly caused an exception somewhere in
ve.ce
Change-Id: I8bf32ff0d8e0f9d016b2abc6cb31824df05bdfc2
- Something like this:
<ul><li>1</li><li>2<ul><li>2.1</li><li>2.2<ul><li>2.2.1</li><li>2.2.2</li></ul></li><li>2.3</li></ul></li><li>3</li></ul>
now serializes properly to:
*1
*2
**2.1
**2.2
***2.2.1
***2.2.2
**2.3
*3
So does this form which is what the above wikitext parses to:
<ul><li>1
</li><li>2
<ul><li>2.1
</li><li>2.2
<ul><li>2.2.1
</li><li>2.2.2
</li></ul></li><li>2.3
</li></ul></li><li>3
</li></ul>
- Lists (and nested lists) are not entirely newline-insensitive.
They still depend on newlines *between* lists. The opening
<ul> tag for non-nested lists should always start on a new line.
So, for example,
<ul><li>foo</li></ul><ul><li>bar</li></ul>
will serialize to:
*foo
*bar
which is incorrect. But,
<ul><li>foo</li></ul>
<ul><li>bar</li></ul>
will correctly serialize to:
*foo
*bar
Change-Id: I13a0290368574865957bcf57aebab488fbbb7026
- More pieces are now simplified and all(?) newline handling
is now centralized in the serializeToken function.
- This commit fixes bugs in rt-ing some code snippets
----------
Ex 1: foo<p>bar</p>baz
----------
- This commit fixes bugs serializing VE generated html
----------
Ex 2: <p>foo</p><pre>bar</pre> ==> foo\n bar
----------
- But, this round of fixes introduces RT failures for certain
code examples in parserTests.txt. In all these failing cases,
inline text/html is embedded within a generated <p> tag during
parsing. If these generated <p> tags can have a "gc:1" attribute
added to them, we can properly serialize them to the original
form.
----------
Ex 3: foo<pre>bar</pre>
Parsed HTML: <p>foo</p><pre>bar</pre>
----------
Note how this parsed HTML is identical to what the VE outputs
in Example 2 above. So, without the gc:1 attribute, we now
have conflicting requirements on the example same HTML.
This increases confidence in the correctness of my commit here.
Change-Id: I86beadec91c445a7f8a6d36a639b406697daa0a2
Previously, data-mw-gc (generated content) elements were unconditionally
converted to alienInline nodes, and unrecognized elements were
unconditionally converted to alienBlock nodes. This is wrong and
produced weird results when I started experimenting with <code> tags.
Instead, I made both gc and unknown element trigger alienation, but the
decision of whether we generate an alienInline or an alienBlock node is
separate and is based only on whether we're inside a content node.
Change-Id: I12335337c3fa60c725ae7bcfbfb52a1dda153fb5
- Eliminated newline handling from several places in code and
mostly isolated it to serializeToken thus simplifying newline
handling logic.
- Fixing some bugs in the process: # of green roundtrip tests
went up by 5 (294 --> 299) but actually introduced failures on
a few originally succeeding tests (additional leading/trailing
newlines on the entire test output).
- Added bonus: made list serializing (mostly) insensitive to
newlines between tags. So, all the following DOM serialize
identically to the following wikitext:
*foo
*bar
----------
<ul><li>foo</li><li>bar</li></ul>
----------
<ul>
<li>foo</li>
<li>bar</li>
</ul>
----------
<ul>
<li>
foo
</li>
<li>
bar</li>
</ul>
----------
Change-Id: I76be56c4b2789039dff5f47de4659746882e45d6
Parsoid ignores sHref when converting back to wikitext, so we have to
set the href attribute to "/$title"
Change-Id: I1068116c0be72197619d0df3b4d1231a3879fa14
And made it not start on it's own, but be started by ve.Surface - this makes it so it's not polling in the unit tests, for instance
Change-Id: I940df04d392fd134d18847949efe0e2232328323
* As part of an earlier fix, I had changed default value of 'res'
to null instead of ''. But, this was potentially buggy because
the previous check was (res !== '') which could be triggered
by return values of handlers. By changing the check to null,
I was effectively changing the code paths for those handlers that
returned ''.
Change-Id: I2302023be7422ce4fb384ff5a50fe53fa7732855
Fix Inspector bug which prevented applying a link annotation to data
already containing annotations.
Change-Id: I6f315d50805c8c71f2155f955ea5674a7ce98656
This was causing some issues when you started typing there, and it's not clear that the white text is really the best way to go anyways
Change-Id: I8a9d6571ea204603729e96b7ff77184279a31a95
The offset map was broken from the start because it wasn't updated when
adjusting the length of a text node, and if we fix that bug it's
doubtful whether the costs of updating the offset map outweigh the
benefits, especially considering that adjusting the length of a text
node is something we do for almost every keypress. If it turns out
having an offset map does make sense, we can always reintroduce it
later.
Change-Id: I59e8bc154f7d07aa1bab2f473c13ff466d0e463f
paragraphs in lists.
* We need to look at other special-case handling requirements of
html tags in lists (and other contexts like tables).
Change-Id: I84b8402d90a186c9075c2d45263c94377312927a
Parsoid outputs rel="mw:wikiLink" or rel="mw:extLink", so we convert that to link/wikiLink and wiki/extLink respectively.
Also preserve the data-mw attribute; we probably need to do this more generally but this'll do for now.
Change-Id: I32e570bffa5a73a733a120d52cfd8b75d3191e02
Adding a 'style' attribute which is set to either 'data' or 'header'
This breaks even more tests because of missing style attributes
Change-Id: I0a75d8c1578b4414eeae8c484f6c4d6f8a59472a
When a text node was closed (either by encountering a non-text node or by reaching the end of the document), the fixupStack would not be processed. This led to attributes being dropped when nodes were split because of text being inserted into them
Change-Id: I41f6d20e0c1bfc8d8689b7e6325e724dd8156ab1
* Add code to handle elements and annotations
* Drop support for aliens from getDomElementFromDataElement() and move it into getDomFromData()
* Implement getDomElementFromDataAnnotation()
* Document a few functions
Change-Id: Ic6a418cbf9d7d1ad96299d7d3633970a876c6103
it does not include Alien nodes and Images - because we are not going to
support them for June release).
Change-Id: I229e4b5f2881714252699f23aef164655fa8bcf6
* Moved wikipedia default prefixes to environment
* Added 'addInterwiki' method
* Adjusted link handling normalizeTitle to reflect this
Change-Id: If5b2314cc36346b6da8649ed410457a612d80a22
* mw:Foo now loads pages from mediawiki.org
* The default prefix still is 'en'. You can switch this to 'mw' in ParserService.js.
Change-Id: I1208667e6114bd711b7988a8b3adb32ffab70969
- Three bugs that were messing up quote transformations.
- Now, the following cases are handled properly:
* ''foo'''
* '''foo''
* ''foo''''
* ''''foo''
These tests (and other quote tests) have to be added to core parser
tests file.
- One more parser test green.
Change-Id: I4f93e8910639f546bfc9304becab17d26d5529de
Created new document method to determine if a specific annotation
object is inside an annotation array.
Change-Id: Id645929cbf31030b8b0fcacb8dfb36e61aaad129
This fixes a bug where the second replace operation in a transaction
would cause the rebuild of the wrong range, or the adjustment of the
wrong text node.
Change-Id: I9b1c68d84999d538fe10bb193f4dfdd694121d2a
This is needed to make the results of certain transactions' tree sync
round-trip cleanly through the ve.dm.Document constructor
Change-Id: I2ab0758ec6bd7afba5b6645c7330f9fa2d45205d
new static method looks for annotation in annotation object.
ve2 Cleanup on annotate method and surface model.
Partially revive UI tools by exchaning old method usage
for ve2 methods.
Change-Id: Id0ac58330292d76801bbcf1d71a919b493f8ab9e
An improvement, but there still are some extra newlines inserted after
paragraphs. Example input:
-------
Foo:
{|
|foo
|}
-------
Extra newlines are inserted after the Foo: and the foo in the table. They are
not fed as tokens or text to the tree builder, so there is likely a bug in the
html5 library or JSDom.
Change-Id: I83eb6180e3cd1c4e7f9b15b31d339e1d32bccd3f
* Possibly more efficient under heavy GC load -- untested.
* No change in time and memory use for single file parsing.
Change-Id: Id2f3f65cc0e5f38ed968bbda60b97e46523e700e
* Moved the tail attribute to the second attribute (a bit cleaner)
* Disallowed newlines in the tail production
* Improved the selection of round-tripped href vs. generated content vs. href
in the serializer
* renamed state.linkTail to state.dropTail
Change-Id: I5d98c704b6ea566011e22237786f8da17548570f
Pages titles with a wikipedia interwiki prefix now load the page from
corresponding Wikipedia. Links in a page then stay within the given language.
Note that Parsoid currently makes no effort to recognize localized namespaces,
so it won't render media files, categories etc correctly.
Change-Id: I7bc4102e81a402772ea23231170734d580ea15b9
Functional changes (fixes):
* Make writeElement() also update parentNode and parentType for openings
* Also add to fixupStack when opening a wrapper for a text node
Non-functional changes (cleanup&docs):
* Document all variables at the beginning of the function
* Group variables according to where/how they're used
* Move expectedType into writeElement()
* Kill node, duplicates parentNode unnecessarily
* Kill paragraphOpened, was misnamed and unnecessary
* Rename closedElements to reopenElements
Change-Id: Ie5b4e4f30b267943048fdc170accb29139039192
* Push entire elements onto openingStack rather than type strings
* When closing an element, build a clone of the opening and push it onto
closedElements, then insert that clone when reopening the element
Change-Id: I8b0fb44394aed6c471dc6dacaab03e44c2333733
* Don't explicitly add the newline in the pre, as we preserve newline tokens
now. This avoids doubling of newlines when round-tripping.
* Use the sHref attribute even if the href contains spaces.
Change-Id: I8bec8fbfd6a7836bf2e5eec20869a0edd95c93b6
Lists interrupted by non-empty lines would not close the list properly.
Register for any token instead of just for newlines and close the list if no
listItem follows the newline.
Change-Id: I1743901e3db541bbeda78d17707db943e6ceb9b9
If the href would not denormalize, add a copy of the original href in data-mw
and use it to preserve non-conventional capitalization etc.
Change-Id: Ifef50eec7343b0e6b0ba66b6d19a8a3e8c9f8001
A tail containing regexp syntax (a ? in [[:en:Main Page]]) would crash the
serializer. Use substr instead.
Change-Id: I8519aec9c07dfe31893d676b1c936a42d2af74a0
- Added a tail json attribute for wikiLinks
- During serialization, this attribute is used to strip the tail from
the link target and render it after the link
[[hen]]s ==> <a ... data-mw="{gc:1, tail: 's'}" ...>hens</a>
==> [[hen]]s
- 2 more roundtrip tests green
Change-Id: I84f3dabaf0271f7a67641a00148467daa8310eb0
This allows us to check the watchlist checkbox on save dialog.
Added watchlist toggling to ve save api.
Added some i18n messages to core integration.
Change-Id: Ibed8edb2c59ad49e1738c937c3bea518238d0845
* The state of syntax stops is now properly included in the cache key for the
tokenizer-internal backtracking cache. This fixes some mis-parses when
re-parsing a bit of text with different flags.
* Clear the backtracking cache after each toplevelblock. This drops the peak
memory usage when expanding [[:en:Barack Obama]] from ~380M to ~110M.
Change-Id: Icdb879cae5907e4595903dd6acba2e686e8c2e4b
* Added converters to all relevant node implementations
* Added new annotation objects with their own factory
Change-Id: I9870d6d5eac45083929d74d2e58917d0939ca917
Also:
* Refactored tests
* Added tests for ve.dm.Transaction.newFromInsertion
* Added tests for ve.dm.Transaction.newFromRemoval
* Fixed problems with ve.dm.Transaction.newFromInsertion
* Added ve.dm.Node.canBeMergedWith which is partially a port of ve.Node.getCommonAncestorPaths merged with canMerge from within ve.dm.DocumentNode.prepareRemoval from the old ve codebase
Change-Id: Ibbc3887d08286d8ab33fd6296487802d65b319fa
* This routine attempts to rewrite the DOM to maximize tag overlap
and thus minimize tag uses.
* This takes as input a set of tags which participate in the
minimization.
* Tested on the following example
<b><i><u><s>BIUS</s></u></i></b><b><i><s>BIS</s></i></b><b><u><s>BUS</s></u></b><u><i>UI</i></u>
with multiple combinations of the 2^4 possible variations of i,b,u,s
tags: [], ['i','b','u','s'], ['i'], ['b','s'], ['i','b','u']
- But, I am not fully sure if this implements the right behavior when
only a subset of inline tags are provided. Needs discussion and tweaking
as necessary.
* Also tested on few others:
<b>B</b><b><i>BI</i></b><b><i><u>BIU</u></i></b><b><i><u><s>BIUS</s></u></i></b>
<s><i><b>SIB</s></i></b><s><i><u>SIU</u></i></s><i><u>IU</u></i><i>I</i>
* The previous pairwise tag rewriting version fails on several of these
examples, so this new version is a definite improvement.
* No change in parserTests run (203 passing before and after).
* Possible improvements that could/should be undertaken:
- get rid of useless/idempotent add/remove of nodes that don't change
the DOM.
- ensure that node attributes post-restructuring are correct.
Change-Id: Ib4a8b39583fa96a2be880a77021ca81cefa06484
Copy-pasting things like "text<IMAGE>moretext" failed spectacularly,
this commit fixes that.
* Check for content rather than structure in the inserted/removed data
* In the content case
** Run selectNodes() over the removal range, rather than just the cursor
*** i.e. no longer assume that content replacements only affect one node
** If there is structure involved, rebuild all affected nodes
Change-Id: I80e40b5b7c514a3fb105d57e4a17770d0fefaaea
Some of the replacement code was assuming that "does not contain
elements" and "is content" were the same. They're not any more, because
we have content nodes (like image) now, so I need a separate function
to distinguish between these cases.
Change-Id: I206ccdf082b7baddf99d382eb3cdd77ea34fb479
If the last element of the input data array was text, the resulting text
node would have length=0 rather than the expected length value.
Change-Id: I3d089a80b8a447a12ba411b2e11c1b84f14f2959
To allow non sysops to save via VE, refactored ve save api
to use doEdit which bypasses namespace protection.
Add edit link in view nav for non sysop so that they may edit
Add View source link in dropdown for non sysops
Add Edit source link in dropdown for sysops
Cleaned up some of the integration core code
UI tweaks
Change-Id: Ib4249bc5fb7ffa6410e4f2d278aafbb871800981
WARNING: This is not as fast as the implementation of getNodeFromOffset in dm
Change-Id: I5fbe9b6edc66169b9caaa6751fde1b7b752814d1
NOTE: ve.ce.getNodeFromOffset and ve.dm.getNodeFromOffset should be renamed to getBranchNodeFromOffset to clarify that they only return branch nodes.
This patch fixes a tokenizer syntax error encountered on
[[:en:Template:JacksonvilleWikiProject-Member]] and [[:en:Template:Infobox
former country]] by allowing optional whitespace before start-of-line template
syntax.
Change-Id: Ic214a731de58bf766e51f23d5e24ea2ce6788f58
254 round-trip tests (up from 184) are now passing.
Also:
* tweaked runtests.sh slightly (use less -R instead of -r).
* made sure the EOFTk is preserved in phase 3 transforms
Change-Id: I1de22186bdb78e52019370e43f096877005b8f5a
- This is implemented as a post-processing pass.
- Might require additional checks to verify rewriteability.
- Implemented as a pair-wise tag DOM minimization strategy,
i.e. it takes tag pairs (B, I) for ex, and attempts to
normalize the tree just for those tag pairs. Normalizing
across multiple tags is implemented as pairwise rewriting
across all pairs: Ex:(b,i), (b,u),(i,u) for (b,i,u)
- Copied over attributes as part of rewriting, but some of the
attributes lose their meaning on rewriting since tags are
reordered (ex: sourcePosn, sourceTagPosn). How do we handle this?
Output examples and possible issues to fix:
<i><b><u>biu</u></b></i><b><u>bu</u></b><u>u</u>
gets rewritten to:
<u><b><i>biu</i>bu</b>u</u>
But, the equivalent wikitext form:
'''''<u>biu</u>''''''''<u>bu</u>'''<u>u</u>
does not get rewritten because of parsing differences.
This wikitext gets parsed into:
<i><b><u>biu</u>'''</b></i><u>bu<b>u</b></u>
The extra ''' token in the middle thwarts DOM rewriting.
However, a slightly different version:
"'''''<u>biu</u>''<u>bu</u>'''<u>u</u>"
gets properly normalized to:
<u>'''''biu''bu'''u</u>
An alternative, but fun strategy to play with is to use the following
two normalization primitives: S(wap) and M(erge).
- S rewrites T1(T2(x)) into T2(T1(x))
(ex: <b><i>foo</i></b> ==> <i><b>foo</b></i>)
- M rewrites (T(x),T(y)) into (T(x,y)).
(ex: <b>foo</b><b>bar</b> ==> <b>foobar</b>)
The current rewriting strategy could possibly be re-implemented as S-M
rewriting. The problem to solve there would be to find an efficient
rewriting strategy that is guaranteed to lead to a normal form. I may
not play with it now, but just documenting it for later (to play with
in my spare time).
This commit is just as a record of fun/experimental code where I get to
learn details of JS, wikitext, parsing, and DOM manipulation. Next
version of this code will attempt to introduce minimal DOM restructuring
across multiple tags at once which can be more efficient.
gwicke: Removed now passing test from whitelist, and updated another whitelist
entry which is now improved.
Change-Id: Ie97bcb164eb62c34ba61aa76ba2f4c232aa713d8
Save action
1) posts html to parsoid service in exchange for wikitext
2) saves wikitext
3) returns parsed content.
On save, VE is hidden and page content is replaced.
Demoing save in toolbar, followup commit will redesign save options
Change-Id: Ibfbe52de08e3483e1a33f0740c03f96ec2b7f90a
This was caused by the fact that a non-structural leaf can not have children, which makes it appear incompatible as a sibling to an arbitrary structural element (like a paragraph) but since it can not contain content we can check that instead.
Change-Id: Ie3c58b4b43f2aa6921f8f82aa82511e231207854
* Added a generic stx_v 'syntax variant' round-trip attribute
* For pre, use stx:'html' vs. no syntax annotation. This might not be 100%
safe for arbitrary html input, so we might want to flip this to stx:'wiki'
later.
* 181 round-trip tests passing
Change-Id: If6080917a3a7c069066db3db60efe59b1f6c28d8
* Very basic support attribute key-value pairs emitted from templates
* Add TALKPAGENAME stub implementation
* Only show 'no revisions' message for top-level pages
Change-Id: I4b4ac0c7b2c0531ac4b39f0f49f4217302576ab9
Using this argument will only return true if the offset is a place you can add any element to (hence the unrestricted part of it). This is good for testing if a paragraph could potentially be inserted there.
Change-Id: I6cc91da437c52493de03eb687b28966198270fea
This code still needs a lot of work, but it seems to work for most
cases. Things that still need to be done:
* Documentation and comments
* Handling of content and text nodes
** Use Trevor's isContent/canContainContent code which I don't have yet
* Preserve attributes when reopening closed elements
* Tests :)
Change-Id: I3bc16c964ef158693490a61ce12beb21e6fe2a9d
The 'insert' and 'remove' operations weren't implemented in the
transaction processor and were a holdover from the old DM
implementation.
Also migrated the tests, especially those that asserted that consecutive
insert/remove operations were combined (this is no longer the case now
that they are replace operations)
Change-Id: I2379fe92b331c5316f70f4b695397da41581cce9
Removed hard-coding of alien nodes, now aliens are automatically used for anything unknown, and block or inline aliens are selected based on whether the parent element can contain content or not.
Change-Id: I5d2a521ead4f4c96cb44d084a5c160cc20d8048e
* After installing Parsoid (sudo npm install -g in modules/parser), run 'node
server.js' from the api directory and navigate to http://localhost:8000/ and
follow the directions. You can start to navigate the English wikipedia at
http://localhost:8000/Main_Page, or manually enter wikitext or HTML DOM to
convert.
* Uses the express framework, could also use just connect
* Uses the cluster module to manage workers per-core and restart those on
failure
Change-Id: I443f2996ed3df00826b038b7476a2f966ab0c425
* Changed RDFa for links according to
http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary
* Added basic support for internal/external link serialization
* Moved numbering of external links from tokenizer to LinkHandler
* Added round-tripping for generic HTML tags
* Replaced nowiki tag with <meta typeOf="mw:tag" content="nowiki"> and <meta
typeOf="mw:tag" content="/nowiki"> for now.
* 154 round-trip tests passing (node parserTests.js --roundtrip).
Change-Id: I16c4db21b1b543ee57c73e569c83025b64664542
Create context instance in surface.
Move over getSelectionRect into ce.surface
Cleanup ve.surface contstructor class
Move linmod/html test objects to sandbox.js
Change-Id: I0cf602ef991100bf6128c68750b02a00566911dc
Switch sandbox demo to use new ui modules
Update VisualEditor.php to use ve2 modules
SpecialPageSandbox working
Change-Id: I8261d6bf6ceb6ae7b7bfa5f61aec6a0121906765
* added stx: 'html' round-trip information for html tags
* added t_stx: 'row' info for row-wise table wiki syntax, and support for it
in the serializer
* the first table row is implicit in wikitext
* renamed lastToken to prevToken in serializer
* strip first newline in an initial chunkCB
Change-Id: I014b046539d1b674d830551c5fd1b74a67f81993
in the new DM. Change method name getAnnotationRange from offset to
getAnnotatedRangeFromOffset. Write tests
Change-Id: I7028803065409e271ceced73e4803954d4a956dc
Lists are a bit tricky, as nested lists are not wrapped in a separate list
item. Should work now though.
Change-Id: I2e5f29f6afa6bdd2d5e5c0c5d019b70c611b73d1
Splits and merges now work, or at least the tests for it pass
The strategy I used is to gather the affected ranges for each of the
following:
* removed stuff
* the entirety of each node touched by a non-zero removal
* if the inserted data busts out of its parent, the entirety of that
parent node (the 'scope')
then get the covering range of all those ranges, and rebuild that.
Change-Id: I7c3b421abc0ba134157ac8b59042675bb1b5073c
getAnnotationRangeFromOffset and offsetContainsAnnotation
which deprecated getAnnotationBoundaries, and getIndexOfAnnotation
write unit tests for proof
Change-Id: I6c0d4e3ca96dd569b1909cd22fce68c3a6fe382c
This fix only affects following transforms, of which there are few right now.
Also removed a stray token mutation in QuoteTransformer.
Change-Id: Id6d4adce944b06fc1a3651cfbf63fc2670125225
* Tokens are now immutable. The progress of transformations is tracked on
chunks instead of tokens. Tokenizer output is cached and can be directly
returned without a need for cloning. Transforms are required to clone or
newly create tokens they are modifying.
* Expansions per chunk are now shared between equivalent frames via a cache
stored on the chunk itself. Equivalence of frames is not yet ideal though,
as right now a hash tree of *unexpanded* arguments is used. This should be
switched to a hash of the fully expanded local parameters instead.
* There is now a vastly improved maybeSyncReturn wrapper for async transforms
that either forwards processing to the iterative transformTokens if the
current transform is still ongoing, or manages a recursive transformation if
needed.
* Parameters for parser functions are now wrapped in abstract Params and
ParserValue objects, which support some handy on-demand *value* expansions.
Keys are always expanded. Parser functions are converted to use these
interfaces, and now properly expand their values in the correct frame.
Making this expansion lazier is certainly possible, but would complicate
transformTokens and other token-handling machinery. Need to investigate if
it would really be worth it. Dead branch elimination is certainly a bigger
win overall.
* Complex recursive asynchronous expansions should now be closer to correct
for both the iterative (transformTokens) and recursive (maybeSyncReturn
after transformTokens has returned) code paths.
* Performance degraded slightly. There are no micro-optimizations done yet
and the shared expansion cache still has a low hit rate. The progress
tracking on chunks is not yet perfect, so there are likely a lot of unneeded
re-expansions that can be easily eliminated. There is also more debug
tracing right now. Obama currently expands in 54 seconds on my laptop.
Change-Id: I4a603f3d3c70ca657ebda9fbb8570269f943d6b6
This means inserting things like </p><p> are now synced correctly and
split the paragraph in the model tree. Merges (removing e.g. </p><p>)
aren't supported yet.
Also, this needs tests, Trevor tells me he's working on porting replace
tests from the old ve/ directory
Change-Id: Ic5050849d7d007a1696dc36548654979aedb53a8
The tree sync for content replacements was adjusting the parent of the
text node affected, rather than the text node itself. This was because
it called getNodeFromOffset(), which returns branch nodes. Switched it
to use selectNodes() in leaves mode
Change-Id: I50a9be18151a1b75815ab19b787b16b6be385bf9
This was because the while loop was never entered as end >= left was
true from the start. Convert the while loop to a do-while loop to make
sure it runs at least once
Change-Id: I9c6436a7b296e65a36b8301095b6edd00507d321
Now returning an empty array when a non annotated character is found
in the range. No longer looping through each annotation, simply
comparing to previous characters annotations and trimming differences.
Write additional test.
Change-Id: I41d2422a931a74325693edca409aed6d5da20ba8
This makes TransactionProcessor work for regular replacements, as well
as insertions and deletions of self-contained pieces of data. This does
NOT yet work for inserting and deleting unbalanced data
(splitting/merging nodes).
I've tested this from the console for insertions and deletions and
simple replacements, but I haven't tested wrappings. We should write a
bunch of unit tests for this some time :)
Change-Id: Ic2fd75d1cf2e127bc9ae58debce67576be2c912f
This is for the case where we have a zero-length range in between two
siblings, and we need to know what index that corresponds to in order to
be able to insert nodes there (rebuildNodes() will use it for this
purpose)
Change-Id: I357d1cd665667a76f955a10b8d9d2810976cdbd7
* Initialize startOffset to 0 not 1, don't know what I was thinking
* Use currentFrame for nodeRange instead of parentFrame, don't know what
I was thinking there either
* If the returned node has no parent (is the document node), don't
attempt to access parentFrame and don't set index
Change-Id: Iad969a7c29436cdf4151ead7e9d3d8e2a30befb3
Selecting a zero-length range at the start or end of a text node
(e.g. (1,1)) would return the text node's parent instead of the text
node
Change-Id: I7fe089bf66b93185dd3415eff53aa7e04e3ffdb2
* Needs to be initialized to 1, not 0
* Needs to be stored *after* left is incremented to account for an
opening
Change-Id: I7978ae241578a8a17120e494684e6e93626a8529
* Text nodes do not have a wrapper to set classes on
* Use CSS class names that are equivalent to JS class names, swapping . with -
Change-Id: I49c877dd5c9b5dd2a9afad3137f12b14883043a1
Trying something trivial like echo 'Hello world' | node parse.js
would throw TypeError: Function.prototype.apply: Arguments list has wrong type
Change-Id: Ia0a1154b0f3edbfb1f228a1d2072fced1b147141
* Setting the rank on tokens is still used currently, but will be phased out
in favor of setting it on chunks. Tokens will be immutable to allow sharing
and caching without a need for cloning.
* Only register for newline and end tokens in QuoteTransformer when active.
Change-Id: I2c45bc7e4a105219a1404ab221eed7f242128f1e
I was using data.length to check if the range was out of bounds, but
this is a problem when using selectNodes() inside of tree sync code
(which happens when performing rebuilds). While tree sync is in
progress, the model tree and the linear model don't match, so we
shouldn't be looking at the linear model for information about the model
tree. Instead, get the length of the DocumentNode and use that.
`
Change-Id: I11a378544ce1281a89cdcd4363c5cb1bf56f3434
This makes it possible to transclude list items from a template.
Note: "5 quotes" test is broken by this patch, it appears that ListHandler
newline processing is changing some state which mysteriously affects the
QuoteTransformer. This is ominous, hopefully there's a simple explanation...
gwicke: fix a bug in tokenizer triggered by definition lists like this:
**; foo : bar
Change-Id: I4e3a86596fe9bffcbfc4bf22895362c3bf742bad
Currently only implements mode=='leaves', i.e. traverse all leaf nodes.
Seems to work from casual testing, but is missing unit tests. See also
other TODO comments in this commit
Change-Id: I41292c21c627a18af7985e8ef9e23c7b14252b21
This gets rid of the length == outerLength-2 hack in getDataFromNode()
and will make it easier to implement similar logic in selectNodes()
Change-Id: I1294350b67ca3eefde2b7fe9fea0bc6d8b90f772