Instead of outputting a <ref>'s HTML in both data-mw and in
<references>, output it only in the latter and point to it from
data-mw.body.id.
Also preserve data-parsoid for the <ref>'s text in <references>, since
that is now its only representation.
To correctly do html2wt when there are <ref>s inside <references>
we need access to the main document DOM when serializing, so also
ensure that env.page.dom is correctly set (it was only set in v2
before).
Updated test results and blacklist (some tests now pass).
Change-Id: I0fa7ad692585af19136909bfec39db9868b137c5
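
As a rough illustration of the pointer scheme in the commit above, the sketch below stores each ref body once and resolves it through a data-mw-like structure. The id string, the `references_html` and `ref_data_mw` names, and the `resolve_ref_body` helper are assumptions made for this example, not Parsoid's actual code.

```python
import json

# Hypothetical stand-in for the <references> section: each ref body is
# stored exactly once, keyed by the id of the element that holds it.
references_html = {
    "mw-reference-text-cite_note-1": "Some <b>cited</b> text",
}

# Hypothetical data-mw for a <ref>: it no longer embeds the HTML,
# it only points at the element inside <references> via body.id.
ref_data_mw = json.loads(
    '{"name": "ref", "body": {"id": "mw-reference-text-cite_note-1"}}'
)


def resolve_ref_body(data_mw, bodies):
    """Return the ref's HTML by following data-mw.body.id, or None if missing."""
    body_id = data_mw.get("body", {}).get("id")
    return bodies.get(body_id)
```

Because the body lives in one place only, a serializer working on a single <ref> needs access to the whole document to find it, which is why the commit also makes sure env.page.dom is set.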
Recent core update to gallery display includes srcset support; one of the
parser test cases for Cite uses a gallery and needs updating to match.
Bug: T64709
Change-Id: I6283415e2f7608d9a5c53bc94804fd95a79d3793
* Fixed parser test output of one test to add unique ids. Output for
other parser tests modified in f528f508 still needs fixing up; that is
left for a separate patch.
Change-Id: I04546c2a590930121d960239a1954b26771e9c80
* <references /> was not appearing on its own line and was
instead getting tacked onto the previous line of wikitext output.
* Change in blacklisted wt2wt test shows that the new output
is better.
Change-Id: Ie82401a3bc6082b733339e2456810b6b1c87529a
This patch emits a reflist for all ref groups that still have
<ref>s in them at the end of the document. Currently Cite.php only
does so for the default group. See also T88290.
On html2wt the missing <references> are added to the wikitext,
which makes the wikitext correct. Selser catches this if not part
of the edit.
Change tests to include an explicit <references /> tag, and add
one for explicitly testing that they do get added. This last one
has to be blacklisted as the new <references /> don't appear with
selser.
Change-Id: I79af2c34481cadbf0d68d9571928979adf559b58
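
The behaviour described in the commit above could be sketched roughly as follows. This is a minimal model, assuming pending refs are tracked per group in a dict; `missing_reflists` and `references_tag` are hypothetical helpers, and the empty string stands for the default group.

```python
def missing_reflists(refs_by_group, emitted_groups):
    """Return the groups that still hold refs and have no <references> yet."""
    return [
        group
        for group, refs in refs_by_group.items()
        if refs and group not in emitted_groups
    ]


def references_tag(group):
    """Build the wikitext tag for a group; "" stands for the default group."""
    if group == "":
        return "<references />"
    return '<references group="%s" />' % group
```

At the end of the document, one tag would be emitted per group returned by `missing_reflists`, rather than only for the default group.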
From prod error logs:
Undefined index: 0 in Cite_body.php on line 396
Undefined index: 1 in Cite_body.php on line 396
Undefined index: 2 in Cite_body.php on line 396
Undefined index: 3 in Cite_body.php on line 396
Undefined index: follow in Cite_body.php on line 396
Change-Id: Id727f2fd7e72d8c4ceb74fdac42885d5c030b4af
One particular case is that Cite.php treats a name and its
entity-encoded form as equal, e.g. "a & b" and the same name with the
ampersand written as an HTML entity. Added a new test for
this case, but blacklisted it on html2wt, wt2wt and html2html due
to a different problem with how Parsoid encodes entities. This
will be investigated separately, as a simple fix could break
unrelated cases.
Also updated tests and blacklist to the new ids.
Change-Id: I87637a1dc812a3a8f29327b9e6c0040b22a651c4
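
One way to picture the name equivalence from the commit above is to decode entities before comparing. The sketch uses Python's standard `html.unescape`; the `normalize_ref_name` helper and the sample name are assumptions for illustration and do not show how Cite.php or Parsoid implement this.

```python
import html


def normalize_ref_name(name):
    """Decode HTML entities so a name and its encoded form compare equal."""
    return html.unescape(name)


# The plain and the entity-encoded spelling normalize to the same key.
same = normalize_ref_name("a &amp; b") == normalize_ref_name("a & b")
```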
Also encode cite ids properly as now they can contain arbitrary
text. Change in blacklist due to this.
TODO: Investigate if it would be better to do this directly in
the tokenizer.
Change-Id: Ic112124e90d256d73a351d0d57fe3c7546fa065f
* Although this resolves the crashes, I'm unsatisfied with it as a
proper fix to the underlying issue. There are many places throughout
the codebase where we serialize and then parse document fragments
that should be instrumented to store and unpack data-* attributes.
Bug: T76518
Change-Id: Idca1b0a37ec924a71cb51160d000c7de9717d422
The coding conventions suggest avoiding ==,
and for this condition definedness is actually more relevant
than whether the string has any text, but since
the string can also be '0', checking for !$text doesn't work.
Similar to I15b422d3345bf4522e68a17dce9682ff28484559.
Change-Id: Ib823678b639bf4f1a92dffcd9e41c780b56ab128
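
The pitfall in the commit above comes from PHP converting both "" and "0" to false. The sketch below models that rule in Python to show why a truthiness check drops "0". The commit message does not state the exact replacement condition, so `has_text_strict` is only one assumed possibility, and all three helpers are hypothetical.

```python
def php_is_falsy_string(text):
    """Model PHP's boolean conversion for strings: "" and "0" are falsy."""
    return text in ("", "0")


def has_text_loose(text):
    """Mimic a PHP `!$text` style check; it wrongly rejects "0"."""
    return text is not None and not php_is_falsy_string(text)


def has_text_strict(text):
    """Mimic a definedness check plus a strict comparison against ""."""
    return text is not None and text != ""
```

With these definitions the loose check treats "0" as missing, while the strict check accepts it and still rejects an undefined or empty value.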
The coding conventions suggest avoiding ==,
and for this condition definedness is actually more relevant
than whether the string has any text.
Change-Id: I15b422d3345bf4522e68a17dce9682ff28484559
* Currently, this mimics Cite.php behavior where "a b" and "a_b"
are considered identical ids.
* Added new parser test.
* Fixed output of another test.
* Fixed section name of a commented out test.
Change-Id: I0c51404c3e659bbddfe9a8909aa6a109d368b762
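
The id equivalence mentioned in the commit above can be pictured as a small normalization step. This is a sketch under the assumption that spaces are mapped to underscores; `normalize_ref_id` is a hypothetical helper and not the actual Parsoid or Cite.php function.

```python
def normalize_ref_id(name):
    """Map spaces to underscores so "a b" and "a_b" yield one id."""
    return name.replace(" ", "_")
```

Two refs whose names differ only in space versus underscore would then share one id and be treated as the same reference.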
In this function $text can be both false and empty string.
It is more intuitive to use a boolean operator here than
to rely on the fact that comparing to '' using == happens
to give the correct result.
Change-Id: I08248a3fcade7744287e9b9f3bc176d29ac1ecde
* This is both faster and consistent with how we're accessing other
parsoid attributes. It's also a step towards not having this data in
the html output.
* Changes to parserTests and the blacklist are for attribute order.
* Requires upgrading domino to 1.0.18
https://github.com/fgnass/domino/pull/48
Change-Id: I1edbc260887d480adf04763b15043c374e27cceb
General changes
---------------
* Replaced the hacky 'inBlockNode' parser pipeline option with
a cleaner 'noPWrapping' option that suppresses paragraph wrapping
in sub-pipelines (ex: recursive link content, ref tags, attribute
content, etc.).
Changes to wt2html pipeline
---------------------------
* Fixed paragraph-wrapping code to ensure that there are no bare
text nodes left behind, but without removing the line-based block-tag
influences on p-wrapping. Some simplifications as well.
TODO: There are still some discrepancies around <blockquote>
p-wrapping behavior. These will be investigated and addressed
in a future patch.
* Fixed foster parenting code to ensure that fostered content is
added in p-tags where necessary rather than span-tags.
Changes to html2wt/selser pipeline
----------------------------------
* Fixed DOMDiff to tag mw:DiffMarker nodes with an is-block-node
attribute when the deleted node is a block node. This is used
during selective serialization to discard original separators
between adjacent p-nodes if either of their neighbors is a
deleted block node.
* Fixed serialization to account for changes to p-wrapping.
- Updated tag handlers for the <p> tag.
- Updated separator handling to deal with deleted block tags
and their influence on separators around adjacent p-tags.
- Updated selser output code to test whether a deleted block
tag forces nowiki escaping on unedited content from adjacent
p-tags.
Changes to parser tests / test setup
------------------------------------
* Tweaked selser test generation to ensure that text nodes are always
inserted in p-wrappers where necessary.
* Updated parser test output for several tests to introduce p-tags
instead of span-tags or missing p-tags, add html/parsoid section,
or in one case, add missing HTML output.
Parser Test Result changes
--------------------------
Newly passing
- 12 wt2html
- 1 wt2wt
- 3 html2html
- 3 html2wt
Newly failing
- 1 html2wt
"3. Leading whitespace in indent-pre suppressing contexts should not be escaped"
This is just normalization of output where multiple HTML forms
serialize to the same wikitext with a newline difference. It is not
worth the complexity to fix this.
- 1 wt2wt
"Trailing newlines in a deep dom-subtree that ends a wikitext line"
This is again normalization during serialization where an extra
unnecessary newline is introduced.
- A bunch of selser test changes.
182 +add, 188 -add => 6 fewer selser failures
- That is a lot of changes to sift through, and I didn't look at every
one of those, but a number of changes seem to be harmless, and just
a change to previously "failing" tests.
- "Media link with nasty text" test seems to have a lot of selser
changes, but the HTML generated by Parsoid seems to be "buggy" with
interesting DSR values as well. That needs investigation separately.
- "HTML nested bullet list, closed tags (bug 5497) [[3,3,4,[0,1,4],3]]"
has seen a degradation where a dirty diff got introduced.
Haven't investigated carefully why that is so.
Change-Id: Ia9c9950717120fbcd03abfe4e09168e787669ac4