General changes
---------------
* Replaced the hacky 'inBlockNode' parser pipeline option with
a cleaner 'noPWrapping' option that suppresses paragraph wrapping
in sub-pipelines (ex: recursive link content, ref tags, attribute
content, etc.).
Changes to wt2html pipeline
---------------------------
* Fixed paragraph-wrapping code to ensure that there are no bare
text nodes left behind, but without removing the line-based block-tag
influences on p-wrapping. Some simplifications as well.
TODO: There are still some discrepancies around <blockquote>
p-wrapping behavior. These will be investigated and addressed
in a future patch.
* Fixed foster parenting code to ensure that fostered content is
added in p-tags where necessary rather than span-tags.
Changes to html2wt/selser pipeline
----------------------------------
* Fixed DOMDiff to tag mw:DiffMarker nodes with a is-block-node
attribute when the deleted node is a block node. This is used
during selective serialization to discard original separators
between adjacent p-nodes if either of their neighbors is a
deleted block node.
* Fixed serialization to account for changes to p-wrapping.
- Updated tag handlers for the <p> tag.
- Updated separator handling to deal with deleted block tags
and their influence on separators around adjacent p-tags.
- Updated selser output code to test whether a deleted block
tag forces nowiki escaping on unedited content from adjacent
p-tags.
Changes to parser tests / test setup
------------------------------------
* Tweaked selser test generation to ensure that text nodes are always
inserted in p-wrappers where necessary.
* Updated parser test output for several tests to introduce p-tags
instead of span-tags or missing p-tags, add html/parsoid section,
or in one case, add missing HTML output.
Parser Test Result changes
--------------------------
Newly passing
- 12 wt2html
- 1 wt2wt
- 3 html2html
- 3 html2wt
Newly failing
- 1 html2wt
"3. Leading whitespace in indent-pre suppressing contexts should not be escaped"
This is just normalization of output where multiple HTML forms
serialize to the same wikitext with a newline difference. It is not
worth the complexity to fix this.
- 1 wt2wt
""Trailing newlines in a deep dom-subtree that ends a wikitext line"
This is again normalization during serialization where an extra
unnecessary newline is introduced.
- A bunch of selser test changes.
182 +add, 188 -add => 6 fewer selser failures
- That is a lot of changes to sift through, and I didn't look at every
one of those, but a number of changes seem to be harmless, and just
a change to previously "failing" tests.
- "Media link with nasty text" test seems to have a lot of selser
changes, but the HTML generated by Parsoid seems to be "buggy" with
interesting DSR values as well. That needs investigation separately.
- "HTML nested bullet list, closed tags (bug 5497) [[3,3,4,[0,1,4],3]]"
has seen a degradation where a dirty diff got introduced.
Haven't investigated carefully why that is so.
Change-Id: Ia9c9950717120fbcd03abfe4e09168e787669ac4
* Nested <ref> tag support was broken in 69b6ec4d since that patch
effectively processed <refs> only on the final DOM rather than
extracted <ref> information in sub-pipelines. In doing so, it
broke support for <ref> in <ref> tags which are supported by {{efn}}
templates and used as follows.
{{efn|A clarification.{{sfn|Smith|2009|p=2}}}}
The {{sfn}} tpl generates a <ref> tag inside another <ref> tag
generated by the {{efn}} tpl.
* This patch fixes the breakage by processing <ref> content after it
is extracted (in case it is known to have nested <ref> tags).
* This fixes the rendering on enwiki/Otto_I%2C_Holy_Roman_Emperor
and is an improvement over when it was broken. The nested <ref>
gets id 18 (just like the Cite.php handling) whereas it used to
get id 1 before the breakage.
* TODO:
- Figure out a way to add a test for this.
Change-Id: Ib82b75f66249b2133a123adbe6fd7acbfd8ec8fb
* Process <ref> and <references> tag on the top-level DOM only
and ignore the generateRefs pass when processing other content.
* This required a few fixes:
- ensure that DOMPostProcessor knows about the top-level.
- ensure that DOMVisitor knows about the top-level.
- cleanup pass leaves behind the ref-marker metas from DOMs from
non top-level content.
- process nested references content.
* One of the references tests had incorrect parsed output. That test
has been updated to reflect the correct output from this patch.
* Barack Obama seems to now have the correct numbering on references.
Change-Id: I5465721d2fc715f2168f267e773a446bc37d198b
* Keep track of table nesting in token stream patcher and use it to
convert <td>, <tr>, and <th> tags to plain strings.
* This fix is only enabled on the top-level token stream.
To support this, fixed the resetState function in the parser
construction code to pass in a toplevel flag which lets the
token stream patcher know the context it is in.
* Fixes 29 (wt2html,wt2wt,html2html,selser) tests and improves
results of 1 previously blacklisted tests. The failing selser
test is actually a false failure because selser is more accurate
than non-selser wts.
* Consolidated a few separate tests into a single test that covers
all this functionality.
- This new test fails wt2wt and html2wt modes because serializer
uses tokenizer information which continues to return table tokens
and results in <nowiki> wrappers.
Bug: 66489
Bug: 66498
Change-Id: I9f42354ea9efb0f8adfc96c23760012220d00dd4
The failure looks like it was caused by
Ib7aa9449bbd994cb23b83b3f23cff944b1cddadf in core. Its not a regression,
just different looking equally valid results.
Change-Id: Icfe02eaf7f2a2d8273e973442e04006b6024684d
We've had Parser::recursiveTagParse since MediaWiki 1.8, back in 2006.
Remove code that only gets used if it's not available.
Change-Id: I76eed5570a675a14cf70ab10981661e0bc8bda99