- Only strip newlines after ">" chars (still not robust,
but better than stripping everywhere). This prevents
useless/incorrect diffs in rtve mode and lets us identify
real bugs.
Change-Id: Iab7b41c4b3d6351c090f8d3a3070330325e876d4
Various parts of DM and CE choke on completely empty documents, so
return an empty paragraph instead.
Change-Id: I67062b66a44efe53a1bdaf60907653f0cc55dd25
This hook tried to enforce $wgNamespaceProtection , but that's a core
config variable that's already enforced by MediaWiki itself
Change-Id: I15a40d92e0296eb60054d9845d5c42b880a8f278
Made edit tabs rendered on non view pages work correctly by routing them to the view page with an extra param that auto-initializes the editor
Change-Id: I4fd9106c8b45c6fc79af9ccb44e18944e9b9d8b9
* Added a newlineTransparent flag to handlers that prevents changes to the
onNewline status, so that content following it is still considered to be in
start-of-line context. This fixes a few rt tests where a comment or nowiki
tag is at the start of the line, and following content should end up on the
same line.
* 283 rt parser tests are now passing.
Change-Id: Ie58dcb9e5e9af9000fff61c2e1db5d8649ffc3f6
This solves lots of issues in the integration work, and also makes it much easier to extend this class to integrate it into other skins
Change-Id: I3b3c5b22a5664e6cf37e429cc0ac3be2e75b630f
* tokens are not modified any more (they are supposed to be immutable)
* handler info is now split in start / end objects and potentially a 'make'
method; added more flags to govern the newline behavior of different tags
* added a generic singleLine mode for single-line syntactical environments
* switched the web service to line-based diffs to avoid issues when diffing
the round-trip results of [[:en:Programming language]]
* 280 round-trip tests are passing now
Change-Id: I74b8ffbf69643c5d6e5ec852ec58e680c9018901
This mode strips all newlines from the html source before serializing it back
to wikitext, thus simulating newline-less DOM output from the VE. This
simplistic method also strips newlines in preformatted text, which will show
up as noise in the diff. This simple mode is still useful for the
identification of basic newline-less DOM serialization issues.
An improved version could try to approximate the VE's behavior more closely by
only stripping some newlines.
Due to the experimental nature this mode is not linked from the index page for
now.
Change-Id: I1dfec7ec3e6c12b7de4bbb9ff6f2d8b7834e2857
HTML5 defines space characters as [ \r\n\t\f] in
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#space-character.
It treats these specially in a few contexts. As an example, the foster
parenting algorithm does not apply to space characters.
As a result, this change fixes the round-tripping of spaces between table
tags, which were previously moved before the table.
Change-Id: I32ab29275a9f824fc66d8286638eb42748cfc9a5
from Parsoid HTML output as well as VE HTML output. There are still
some newline related failures from parser tests that needs fixing, but
this is getting close. So committing for now so other eyes can make the
bugs shallow :).
Change-Id: Ia6a218ee9fb3e18fe0573c89ff3a4236779e1e64
- Check if href for links has the wgScriptPath prefix before
attempting to strip it from the href.
Change-Id: I844151ef7317476668d1306b96a2aec5a56fd0f1
- Something like this:
<ul><li>1</li><li>2<ul><li>2.1</li><li>2.2<ul><li>2.2.1</li><li>2.2.2</li></ul></li><li>2.3</li></ul></li><li>3</li></ul>
now serializes properly to:
*1
*2
**2.1
**2.2
***2.2.1
***2.2.2
**2.3
*3
So does this form which is what the above wikitext parses to:
<ul><li>1
</li><li>2
<ul><li>2.1
</li><li>2.2
<ul><li>2.2.1
</li><li>2.2.2
</li></ul></li><li>2.3
</li></ul></li><li>3
</li></ul>
- Lists (and nested lists) are not entirely newline-insensitive.
They still depend on newlines *between* lists. The opening
<ul> tag for non-nested lists should always start on a new line.
So, for example,
<ul><li>foo</li></ul><ul><li>bar</li></ul>
will serialize to:
*foo
*bar
which is incorrect. But,
<ul><li>foo</li></ul>
<ul><li>bar</li></ul>
will correctly serialize to:
*foo
*bar
Change-Id: I13a0290368574865957bcf57aebab488fbbb7026
- More pieces are now simplified and all(?) newline handling
is now centralized in the serializeToken function.
- This commit fixes bugs in rt-ing some code snippets
----------
Ex 1: foo<p>bar</p>baz
----------
- This commit fixes bugs serializing VE generated html
----------
Ex 2: <p>foo</p><pre>bar</pre> ==> foo\n bar
----------
- But, this round of fixes introduces RT failures for certain
code examples in parserTests.txt. In all these failing cases,
inline text/html is embedded within a generated <p> tag during
parsing. If these generated <p> tags can have a "gc:1" attribute
added to them, we can properly serialize them to the original
form.
----------
Ex 3: foo<pre>bar</pre>
Parsed HTML: <p>foo</p><pre>bar</pre>
----------
Note how this parsed HTML is identical to what the VE outputs
in Example 2 above. So, without the gc:1 attribute, we now
have conflicting requirements on the example same HTML.
This increases confidence in the correctness of my commit here.
Change-Id: I86beadec91c445a7f8a6d36a639b406697daa0a2