Commit graph

172 commits

Author SHA1 Message Date
Subramanya Sastry e66fedcfbe Fix typo in closing sup tag in parsoid html in a test
Change-Id: I06b827ed4b03216eaf73296739cb428eabe14aae
2024-08-12 16:23:36 -05:00
Adam Wight 525b7ab876 Reading mode square brackets become customizable through CSS
Now the brackets can be hidden with `display: none;` or made
invisible but appear in copy-pasted content with `font-size: 0;`.

Bug: T370512
Change-Id: I21accb6bf9cdd7f96144cc9a762ff889e48efca4
2024-07-24 12:14:07 +02:00
thiemowmde 1aeac001fe Additional parser test cases for delayed extends usage
We want it to be possible to turn a <ref> into an extended one after
it was re-used for the first time, not knowing if it later turns out
to be an extended ref.

This should work: <ref name=x/> is a short re-use of a ref that later
turns out to be a <ref extends=… name=x>…</ref>.

<ref name=x></ref> is just another syntax that should behave
identical.

However, it should probably not be possible to turn
<ref name=x>foo</ref> into a subref later because it really, really
looks like a normal ref. Even if the content matches with a later
<ref extends=… name=x>foo</ref> and we usually ignore identical
content, I suggest to block this with a dedicated error message. But
this is for a later patch. This patch here just documents the status
quo.

This patch also contains minor code cleanups that will be useful in
Ia752a7d.

Bug: T367749
Change-Id: Ie38769b36e5c476b96e7af7f03b0fc800b32ba97
2024-07-05 13:08:23 +02:00
Subramanya Sastry f38bcde14e Add 'mw-cite-backlink' class to Parsoid's HTML
This patch adds 'mw-cite-backlink' to the linkback span for both
named and unnamed refs. This requires us to add a span wrapper
for the unnamed refs case.

Verified in local testing that this causes aria attributes to be
added to the linkback tags in Parsoid HTML.

This should likely fix other gadgets and code that rely on this
class name to do their work.

Strictly speaking, this is a breaking change since we add an
extra span wrapper for the unnamed ref backlinks which *could*
break anyone using a li > a[rel="mw:referencedBy"] selector.
But, given the specificity of the a[rel] selector, the "li >"
part is unnecessary and might not be used. So, if we wanted to
push our luck (and break process), we could get this in.

Alternatively, we could:
- do this in the the read views OutputTransformPipeline.
- do a real major version bump -- we would be exercising that
  functionality and have to fix and implement any missing pieces
  that may have broke as part of the RESTBase sunsetting.
- not add the span wrapper and fix gadgets to explicitly look for
  both named and unnamed refs with their selectors.

Bug: T328695
Change-Id: Icbd325ebd12cb42186c5b5220dc016835eb18b64
2024-05-31 14:42:04 -05:00
Subramanya Sastry 50f01c4f28 Add 'reference-text' class to Parsoid's HTML
This adds the 'reference-text' class where Parsoid added
'mw-reference-text'.

If we don't care about the "mw-" prefix, since there are very
few wiki and code references to 'mw-reference-text', it might
seem like we could update all those references and rip out
'mw-reference-text' from Parsoid output.

But, Parsoid HTML is also exposed via the REST API which means
there are likely many users out there analyzing Parsoid HTML.
https://github.com/search?q=%22mw-reference-text%22+NOT+language%3AHTML&type=code
says there are 512 references to this string - so looks like
we are probably going to rely on a major HTML version bump in
Parsoid in the future and then rip out all the duplicate
classes (mw-ref, mw-references, mw-reference-text OR
reference, references, reference-text).

Bug: T328695
Change-Id: I04b18ac75863a0e3e61bdd47b34508e5547dc872
2024-05-28 12:36:49 -05:00
Isabelle Hurbain-Palatin d94d7f923b Re-enable and fix tests after Parsoid modification
Bug: T214241
Depends-On: I022501851ea45cfd2554f9e914aa6f4fe2b07548
Change-Id: Ib7d7e6debb9ea73cb05d4142392b00a195644116
2024-04-09 15:57:17 -04:00
jenkins-bot 5d878d41f7 Merge "Disable some tests for Parsoid CI" 2024-04-04 15:56:55 +00:00
Isabelle Hurbain-Palatin b1ad116f89 Disable some tests for Parsoid CI
Bug: T214241
Needed-By: I022501851ea45cfd2554f9e914aa6f4fe2b07548
Change-Id: I40f3c82cbf861426137b935f1bf0bc4b3c6194b0
2024-04-03 03:01:21 +00:00
thiemowmde 7a9a48b1e0 Improve test coverage for incomplete follow=… rendering
There is currently no test coverage for recursively parsing the
contents of a <ref>…</ref> together with an incomplete follow="…".
This is critical because that's an entirely separate, special code
path (the one that creates a <p> instead of an <li>). Without this
 test we could return unparsed wikitext and never notice.

I discovered this while playing with I0b0e358.

Bug: T245549
Change-Id: Ie65c6bf6bf75db26e0fff733c93cfa28ee7bd228
2024-03-14 15:31:29 +01:00
thiemowmde c02595bb97 Drop obscure error message about an unused group
The message was part of the original patch that introduced the group
feature in 2009, see https://phabricator.wikimedia.org/rECIT75004e33.
Notice how there was never a test scenario for this message. A test
was added in 2020 via I07738cc.

The message appears only in a rare edge-case when a group is entirely
unused in the text, and only when the group is not empty. The shortest
possible example is:

 <references group=g>
 <ref group=g name=a>a</ref>
 </references>

Just adding something unrelated like `<ref group=g>x</ref>` to the
text changes the error message. Now the group is "used". But this
notion is confusing to begin with. References can be part of a group,
and we can use references, but we can't use groups as if they are a
separate entity.

A better error message already exists.

Notice how this special error message doesn't appear anywhere in the
Parsoid code path. That was already using the other, more generic
error message.

Bug: T269531
Change-Id: I63f663d76e45e6c3d664f145d9a564ee00ff53cd
2024-03-04 13:04:36 +01:00
thiemowmde e5eee2d049 Fix confusing error message with empty group
This is about the error message that currently says:

 »Cite error: <ref> tag with name "a" defined in <references> has
  group attribute "" which does not appear in prior text.«

This is a special error message that appears only when a group name
does not appear anywhere in the text. In all other cases a simpler
error message is shown:

 »Cite error: <ref> tag with name "a" defined in <references> is
  not used in prior text.«

While the first error message is not wrong in the edge-case
described in T269531, it's very confusing for a multitude of
reasons. For example:
* There is no group attribute in the wikitext.
* Just adding something completely unrelated like `<ref>x</ref>` to
  the text shows the other error message.

The reason for this behavior is that the assumed default is an empty
`group=""`. The error message changes the moment any other <ref> in
the same group appears in the text vs. when the group is entirely
unused.

We can probably remove this error message entirely, but should at
least not use it when there is no group.

Notice how the Parsoid code path was already using the other error
message.

Bug: T269531
Change-Id: Ifa2e97254f4cda72233a057d8760fb1116143552
2024-03-04 12:43:26 +01:00
Subramanya Sastry 15f5cc71be Add dummy html/parsoid sections for three failing tests
* This is to worka round some confusing html2html failures in CI
  for these tests (that are not reproducible locally for me).

Change-Id: I07725155ef5e04eb4346a90c34cbacbd70e88ea6
2024-01-26 23:56:41 -06:00
Subramanya Sastry b3dff81b6d Enable all Parsoid test modes for citeParserTests.txt
* Updated the known failures file.

Depends-On: I093aeed933fe5927e031c879d48e0191e8ef4685
Depends-On: I0df36653de4c70c7a4669b874f43385a18ac5e9d
Depends-On: Id81e2f4331ce73145389e717b7c78f49b29743bf
Depends-On: I7fb86778cfb674ecc76f748786fa08d8f6ac9c65
Change-Id: I101564587dc7959cd620f0b2cc4f8e436bfba6da
2024-01-25 22:11:49 +00:00
thiemowmde 80c8eaf3ce Convert (almost) all .css files to .less
Intentionally no other change is made (yet). This is for a later,
separate patch.

Intentionally not touching the huge list of per-language
ext.cite.style.*.css files for the moment. Again, I would prefer to
do this in a separate patch.

Change-Id: I4e392c7bd1c69849a6c7946676a64c749ddbcd60
2024-01-21 15:39:08 +01:00
Subramanya Sastry ff699c31aa Sync up Cite repo with Parsoid
This now aligns with Parsoid commit a8db16f33a638931f91ba75d639bb9d661cb979b

Change-Id: Ibd53e38121231a3501dbbfce1c1e1db370740c0b
2024-01-19 12:24:59 -06:00
C. Scott Ananian 234da84418 Hook up Parsoid implementation of Cite
This commit also moves certain parser tests involving <ref> from
the Parsoid repo to citeParserTests.txt in this repo.

Bug: T354215
Change-Id: Ie5b211d2af01a56684473723c68a9ab2775542e3
2024-01-19 11:57:11 -05:00
Adam Wight e4b964eec7 Documenting state of subref reuse rollback
Some interesting stuff is happening, seems to have revealed bugs:
* Rolled-back warnings are still present on the ref
* Subref reuse numbering starts at 0 instead of 1, and formatting is cringe.

But subref rollback does seem to work!

Change-Id: If6321b34d27370553ba85e63dd1e2ae6a3b7c099
2024-01-09 18:05:05 +01:00
Arlo Breault fa5100fa92 Sync up Cite repo with Parsoid
This now aligns with Parsoid commit 285e5e390af1c9370203bb3f6111f01fd41d3009

Change-Id: I9311cd580b938d4dabc43f4a659fb49243f22783
2024-01-05 14:30:56 -05:00
jenkins-bot 9b770fba99 Merge "Include more information in missing parent placeholder" 2024-01-05 11:50:06 +00:00
jenkins-bot 1f7d6527a4 Merge "Render list-defined parent without a backlink" 2024-01-05 11:40:37 +00:00
Adam Wight 76e6e870d4 Include more information in missing parent placeholder
This allows the subreferences to be collected together under a heading.

Bug: T353451
Change-Id: Ibf28f0baca14de8140c87b03ad4aa86d2f81a20d
2024-01-05 11:21:12 +00:00
Adam Wight 5a69c54900 Render list-defined parent without a backlink
In this case, there was never a ref with this name in the article so
no backlinks should be rendered.

TODO:
* test case with empty parent backlink and LDR parent

Bug: T353451
Change-Id: I8a7abd05a48ce83da3beb92b15e894d53252bd33
2024-01-05 12:07:22 +01:00
thiemowmde b01b420199 Track errors in a status object instead of an array
This is another improvement after I7390b68. Status objects are made
to keep track of multiple errors. The only difference is: The merge
method skips duplicates when the message and all parameters are
identical. This causes a minor user-facing change. One of the
shortest possible examples is:

 <references>
 <ref />
 <ref />
 </references>

This showed two identical, indistinguishable error messages before,
but will only show one now. We argue this is fine. The duplicates
are confusing and of (almost) no value to the user. In case the
information is relevant the correct solution is to make the error
messages distinguishable, or introduce a message like "multiple
<ref> tags defined in <references> have the same error". This is
something for a later patch, if needed.

Bug: T353266
Change-Id: I444105462ed24d5ba37b057622b4dc847b40f8d8
2024-01-05 10:49:08 +01:00
thiemowmde ca3203699c Capitalized dir="RTL" should not trigger any error
This fixes a minor issue introduced in I294b59f. Two identical
dir="…" with different capitalizations should not be reported as an
error.

Turns out the implementation in the Cite extension doesn't care
about this capitalization at all. That's why I suggest to do the
normalization as early as possible. This is slightly different in
the Parsoid implementation.

Bug: T202593
Change-Id: I96b4a281d6020d61d1f36ec027cf833bbb244f03
2024-01-03 16:30:16 +00:00
jenkins-bot fc5d0bb6a8 Merge "Disable Parsoid-integrated-mode tests for now" 2024-01-02 20:04:15 +00:00
Subramanya Sastry c3b1492d10 Disable Parsoid-integrated-mode tests for now
* Since Cite development happens in two repos (here and the Parsoid
  repo), integrated tests ensures that changes don't fall too far
  out of sync.

  CI runs Parsoid-integrated-mode tests in extensions repo with the
  vendor-released Parsoid.

  Parsoid CI runs Parsoid-standalone-mode tests in the Parsoid repo
  which also has a copy of the citeParserTests.txt file found here.
  But, that CI run uses the Parsoid patch itself.

  This difference makes for unnecessrily laborious test syncing
  while making changes to the two repos. It is manageable for one-off
  changes but when making lots of updates that changes tests a lot,
  this quickly becomes painful.

* For now, we can break this coupling temporarily by disabling
  Parsoid-integrated-mode test runs. This simplifies the test syncing
  by letting patches in Cite repo to be merged in a chain and then
  doing a single test sync to the Parsoid repo (otherwise, Parsoid's
  CI will be broken since the html/php sections in Parsoid's cite
  test copy will be out of date).

* Filed T354215 to move Parsoid's Cite implemntation to this repo
  which eliminates this complexity altogether.

Change-Id: Id5727381b0e23058d098180c308797b2555ad02f
2024-01-02 19:31:52 +00:00
jenkins-bot d8711ce3ed Merge "Show warning when dir="…" don't match" 2023-12-21 21:24:13 +00:00
thiemowmde b181614ba1 Show warning when dir="…" don't match
This classifies as a "warning" because we still show everything,
just with an error message appended.

Disabling the Parsoid tests right away hopefully makes it easier to
do the same change in Parsoid.

Bug: T202593
Depends-On: If14acd1070617ca8c4d15be6b1759bd47ead4926
Change-Id: I294b59f989f553932b40d08308906dd72d92d2cd
2023-12-19 14:17:30 +01:00
Subramanya Sastry 58f008ae1e Sync up Cite repo tests with Parsoid + (en/dis)able some Parsoid tests
* This now aligns with Parsoid commit 0fab92ba453d424aedeadaaa9e1514c42bbd94d1
* Disabled the newly added tests because that Parsoid fixes for the
  tests haven't been released to vendor to let CI pass these tests.
* Re-enabled a previously disabled test.

Change-Id: I4ab87d2d486b7a1fef652c50c4f1e79ddfe83ce6
2023-12-18 16:35:01 -06:00
jenkins-bot f112648cb2 Merge "Revert "Revert "Temporarily disable a Parsoid test to let us change code in Parsoid""" 2023-12-15 21:14:43 +00:00
Subramanya Sastry 8ac343718b Revert "Revert "Temporarily disable a Parsoid test to let us change code in Parsoid""
This reverts commit b163add15b.

Reason for revert: This was my mistake. I forgot that reverting this
would break Parsoid CI once the Parsoid Cite patch merged. So, I have to
wait till the Parsoid Cite change is released to vendor before I sync
the test change here.

Change-Id: Icaecee1e56907980681aae01be377b6906bd93a6
2023-12-15 20:36:43 +00:00
jenkins-bot 708436c998 Merge "Revert "Temporarily disable a Parsoid test to let us change code in Parsoid"" 2023-12-15 19:24:05 +00:00
jenkins-bot d5fe9559e7 Merge "Temporarily disable a Parsoid test to let us change code in Parsoid" 2023-12-15 19:10:57 +00:00
xiplus f7a181ed42 Give a different error from too_many_keys when 'follow' attribute conflicts
Add message "cite_error_ref_follow_conflicts" for tags with
conflicting parameters.

Bug: T299280
Change-Id: Ie64f4ab4831966f66f812ea67cc244718f818afb
2023-12-15 15:23:53 +01:00
jenkins-bot cdc2bd2b96 Merge "Skip URL encoding in id="…" attributes that aren't URLs" 2023-12-12 14:34:07 +00:00
thiemowmde 89bd26fcf5 Skip URL encoding in id="…" attributes that aren't URLs
I played around with a few options (see patchset 1) but ended
introducing new terminology:

* "Backlink" describes the ↑ button down in the list of <references>
  that jumps back up into the article. The code was already using
  "backlink" in some places.
* "Backlink target" is the id="…" attribute up there, visible as the
  typical [1] in the article.
* I use "jump" to describe the idea that clicking the [1] jumps down
  to the full reference.
* "Jump target" is the id="…" down there in the list of <references>.
* "Jump link" is the same id, but encoded to be used as the href="…"
  attribute when clicking the [1].

I hope this makes sense. Suggestions welcome.

Another benefit is that "normalization" is really only normalization
now, not any URL and/or HTML encoding.

Bug: T298278
Change-Id: I5a64ac43aef895110b61df65b27f683b131886fb
2023-12-12 13:56:37 +00:00
thiemowmde 6a18eac513 Fix regular expressions not being case-insensitive
The "parser marker" placeholders are case-sensitive, e.g. for a tag
that's written like <rEf> the placeholder will also say …-rEf-…. This
was really just a mistake.

The error is as old as this code is. Added in commit 75004e33 in
2009.

Note we shouldn't use /i at the end because the marker itself should
not be case-insensitive. Only the tag name.

Instead of adding more (slow) test cases I update two that are
exactly about this part of Cite (nested tags) anyway.

Bug: T64335
Change-Id: I44c7a42a0da682a1082952fd1af817bf7d45378c
2023-12-11 19:21:12 +01:00
Subramanya Sastry b163add15b Revert "Temporarily disable a Parsoid test to let us change code in Parsoid"
This reverts commit 650d6a9f13.

Depends-On: I7249bd03a7942ff7725a20178a051300b777e3a8
Change-Id: Ica414002604f2dffde866dfac6a85db400ea714e
2023-12-08 11:35:13 -06:00
Subramanya Sastry 650d6a9f13 Temporarily disable a Parsoid test to let us change code in Parsoid
Change-Id: I401656265253a429691cc76adc5db5b129cff2cc
2023-12-08 11:33:38 -06:00
Subramanya Sastry 69529bdcf6 Sync up Cite repo with Parsoid
This now aligns with Parsoid commit 2f962cd9a66c9fd69664e3e8a2d79820cd6f1453

Change-Id: Ia93f8ced5c79e2ba49d40aafe6ea14d1691609b0
2023-12-07 18:46:23 +00:00
thiemowmde 0bae6eb224 Fix confusing wording of "invalid parameter in <ref>" message
This error message really always meant nothing but "there is an
unknown parameter in your <ref> tag". It's unnecessarily confusing
only for historical reasons. See T299280#9384546 for a long
explanation.

Bug: T299280
Change-Id: Ic224d5828f7b7ac0928c44f526c61654ccf3425e
2023-12-07 10:54:46 +01:00
jenkins-bot e73c7d61ca Merge "Correctly encode non-breaking spaces in reference names" 2023-12-06 18:00:43 +00:00
thiemowmde f9bb125e4c Correctly encode non-breaking spaces in reference names
Note how this currently behaves. The user input is
<ref name="…&nbsp;…">
But what we get in the end is
<li id="…&#160;…">
This implies that the &nbsp; is decoded and re-encoded with a
slightly different entity encoding. (Note that &nbsp; and &#160;
and &#xa0; are all the same character.)

Also note how there is only an underscore in the href="…", but the
non-breaking space is gone. This is identical to what happens in
links and headlines. Try for example [[a&nbsp;_a]]. Multiple
underscores, non-breaking spaces, and normal spaces will be
normalized. We just do the same in the id="…" attributes.

Note this fixes only one of the issues listed in T298278.

Bug: T298278
Change-Id: Ia01f2fdd3b3e9ee6aaa9da60ca3386dcd5d6b1a0
2023-12-05 07:58:38 +01:00
Subramanya Sastry f267635b48 Update tests to match update to <gallery> output in core
Depends-On: I5039c7ef9e07199c256fd568b4f94714e5831d17
Change-Id: I69776da432eeca134785329d424d310fb506bce6
2023-11-27 18:09:03 -06:00
Subramanya Sastry 4929e015d1 Temporarily disable test to allow us to make changes in core
* Needed by change I5039c7ef9e07199c256fd568b4f94714e5831d17

Change-Id: Ieeb6b98afc74595a928bd141889486acfc9eb346
2023-11-27 18:07:35 -06:00
thiemowmde 5aa6cb0c7b Replace extremely slow parser test with fast unit tests
This parser test is a bit obscure, in my opinion. We added it in
I8c4de96 to make sure we don't get thousand separators in most
places.

We continued reworking the code since then. By now it's effectively
impossible to "accidentally" get thousand separators. The
problematic methods from the Language class are not even accessible
any more from this code.

To make the tests more robust we now use createNoOpMock (done via
the previous patch) where it matters, specifically for all Language
and Parser mocks. This proves the problematic Language methods are
never called.

Bug: T253743
Bug: T238187
Change-Id: I9bfe1f4decfaf699996da63e19473c2c0d581d9d
2023-07-28 00:32:38 +00:00
Arlo Breault 9177b50feb Sync up Cite repo with Parsoid
This now aligns with Parsoid commit 7b724ddc6c4abd44de2e1f67f64ca1d9685c6b4f

Change-Id: If24ad064d7d84fb070cb0c7976d56373e8a9db3a
2023-05-12 09:59:44 -04:00
Arlo Breault 6ed2daec12 Disabled tests to break circular dependency
html/php sections are added since otherwise it complains that the
"Test lacks html or metadata section on lines"

Change-Id: Ib1c47be09bdbe1e84b595373ad71772f2a983fc9
2023-05-10 19:47:13 -04:00
Arlo Breault 1c8a0115e9 Update parsertests with new media classes
Depends-On: Ifd4001e312a5fa4b7beaad63ba8c4e79e3201b9b
Change-Id: I80b76e2f4f538eba323f47cb2bf831016e2b2dc2
2023-04-26 13:19:58 -04:00
Arlo Breault 984aa7750c Disable tests to break circular dependency
Needed-By: Ifd4001e312a5fa4b7beaad63ba8c4e79e3201b9b
Change-Id: Ie85ee7048273023a2c51f42a333a9c1493360404
2023-04-20 16:00:43 -04:00