These tests pass today because Parsoid is providing an
alternative implementation of Cite, but that means this
test case isn't actually testing the code in this repo.
Bug: T354215
Change-Id: I42521026bab36035ae5eded7c05716234a5a29ea
This commit also moves certain parser tests involving <ref> from
the Parsoid repo to citeParserTests.txt in this repo.
Bug: T354215
Change-Id: Ie5b211d2af01a56684473723c68a9ab2775542e3
Such a message shouldn't exist, and doesn't:
https://global-search.toolforge.org/?q=.®ex=1&namespaces=8&title=Cite+link+label+group-
Additional notes:
* Rename the method to make it more obvious that it's not a cheap
getter, but doing something slightly more expensive.
* Use more appropriate array_key_exists to check if a cache entry
already exists.
* Also add a bit more documentation.
Bug: T297430
Bug: T353227
Change-Id: Ia5827bbf6fd700b87a749aac17320796428f0688
This encapsulation gives us field name, type validation and code
documentation.
This patch only affects ReferenceStack and continues to return
approximately the same array outputs to callers. Some additional
information is included and the placeholder column has a new name.
Bug: T353451
Change-Id: I405fe7ac241f6991fd4c526bfbb58fbc34f2e147
The previous patch deprecated the last conditional depending on magic
meanings of 0 and -1, so now we're free to let "count" take on a more
natural meaning: the number of times a footnote mark appears in
article text.
Includes a small hack to avoid changing parser output, by
artificially decrementing the count by one during rendering. The
hack can be removed and test output updated in a separate patch.
Bug: T353227
Change-Id: I6f76c50357b274ff97321533e52f435798048268
Encapsulate all information about a ref inside of the internal
structure, rather than relying on the container to be organized by
group.
Bug: T353451
Change-Id: I4c91e8089638b7655bf120402a4a5fcbd1b35452
This is another improvement after I7390b68. Status objects are made
to keep track of multiple errors. The only difference is: The merge
method skips duplicates when the message and all parameters are
identical. This causes a minor user-facing change. One of the
shortest possible examples is:
<references>
<ref />
<ref />
</references>
This showed two identical, indistinguishable error messages before,
but will only show one now. We argue this is fine. The duplicates
are confusing and of (almost) no value to the user. In case the
information is relevant the correct solution is to make the error
messages distinguishable, or introduce a message like "multiple
<ref> tags defined in <references> have the same error". This is
something for a later patch, if needed.
Bug: T353266
Change-Id: I444105462ed24d5ba37b057622b4dc847b40f8d8
Same as Icfa8215 where we removed the …_suffix messages.
This patch is not blocked on anything according to CodeSearch:
https://codesearch.wmcloud.org/search/?q=cite_references%3F_link_prefix
According to GlobalSearch there are 2 usages we need to talk about:
https://global-search.toolforge.org/?q=.®ex=1&namespaces=8&title=Cite.references%3F.link.prefix.*
zh.wiktionary replaces "cite_ref-" with "_ref-", and "cite_note-"
with "_note-", i.e. they did nothing but remove the word "cite". This
happened in 2006, with no explanation.
ka.wikibooks and ka.wikiquote replace "cite_note-" with "_შენიშვნა-",
which translates back to "_note-". One user did this in 2007,
16 seconds apart.
It appears like both are attempts to localize what can be localized,
no matter if it's really necessary or not.
https://zh.wiktionary.org/wiki/Special:Contributions/Shibo77?offset=20060510https://ka.wikiquote.org/wiki/Special:Contributions/Trulala?offset=20070219
Note how one user experimented with an "a" in some of the edits to
see what effect the change might have, to imediatelly revert it.
The modifications don't really have an effect on anything, except on
the anchors in the resulting <a href="#_ref-5"> and <sup id="_ref-5">
HTML. It might also be briefly visible in the browser's address bar
when such a link is clicked. We can only assume the two users did this
to make the URL appear shorter (?). A discussion apparently never
happened. Bot users are inactive.
Both pieces of HTML are generated in the Cite code. Removing the
messages will change all places the same time. All links will
continue to work. The only possible effect is that hard-coded
weblinks to an individual reference will link to the top of the
article instead. But:
a) This is extremely unlikely to happen. There is no reason to link
to a reference from outside of the article.
b) Such links are not guaranteed to work anyway as they can break
for a multitude of other reasons, e.g. the <ref> being renamed,
removed, or replaced.
c) Even if such a link breaks, it still links to the correct article.
There is also no on-wiki code on zh.wiktionary that would do anything
with the shortened prefix:
https://zh.wiktionary.org/w/index.php?search=insource%3A%2F_%28ref%7Cnote%29-%2F&title=Special%3A%E6%90%9C%E7%B4%A2&profile=advanced&fulltext=1&ns2=1&ns4=1&ns8=1&ns10=1&ns12=1&ns828=1&ns2300=1
I argue this is safe to remove, even without contacting the mentioned
communities first.
Bug: T321217
Change-Id: I160a119710dc35679dbdc2f39ddf453dbd5a5dfa
This fixes a minor issue introduced in I294b59f. Two identical
dir="…" with different capitalizations should not be reported as an
error.
Turns out the implementation in the Cite extension doesn't care
about this capitalization at all. That's why I suggest to do the
normalization as early as possible. This is slightly different in
the Parsoid implementation.
Bug: T202593
Change-Id: I96b4a281d6020d61d1f36ec027cf833bbb244f03
Check out how this gets rid of so many "to do" as well as
"deprecated" comments.
Next qustion: The elements in the stack become more and more
complicated. It's probably worth converting them from arrays into
first-class objects. But this is for another patch.
Bug: T353266
Change-Id: If14acd1070617ca8c4d15be6b1759bd47ead4926
For example, use convenient upstream methods, and generally make the
test setup a bit more readable.
Bug: T353227
Change-Id: Ifab71041fcc3f804315793ca7b783f84829c7a0f
Same arguments as in Iafa2412. The one reason to use more detailled
per-method @covers annotations is to avoid "accidental coverage"
where code is marked as being covered by tests that don't assert
anything that would be meaningful for this code. This is especially a
problem with older, bigger classes with lots of side effects.
But all the new classes we introduced over the years are small, with
predictable, local effects.
That's also why we keep the more detailled @covers annotations for
the original Cite class.
Bug: T353227
Bug: T353269
Change-Id: I69850f4d740d8ad5a7c2368b9068dc91e47cc797
I wanted to make this a unit test but it turns out the
Sanitizer::safeEncodeAttribute() calls currently make this
impossible.
Bug: T353269
Change-Id: I5266e7b8b67db1c812dc9e4675d0c079ab1f9a40
This patch only moves existing code around without changing any
behavior. What I basically did was merging the old "guardedReferences"
method into "references", and then splitting the resulting code in
other ways. Now we see a few other concepts emerging. But the idea
something would be "guarded" (how?) is gone.
The most critical detail in this patch are the new method names, and
how the code is split. The names should tell a story, and the methods
should do exactly what the name says. Suggestions?
Bug: T353266
Change-Id: I8b7921ce24487e9657e4193ea6a2e3e7d7b0b1c3
This removes almost 200 lines from the main class.
This patch intentionally doesn't make any changes to the code but
only moves it around. Further improvements are for later patches.
Bug: T353269
Change-Id: Ic73f1b7458b3f7b7b89806a88a1111161e3cf094
The main benefit is that the two lines that set and reset
$this->inReferencesGroup are now next to each other. More can be
done in later patches.
Bug: T353266
Change-Id: Ib3f40c40e0b1854f8e5a32af600f28931fffdb8c
This moves the actual parsing down to be done much later in the
process. This won't make any difference in production but makes it
easier to refactor the code further.
Note I tried to use a StatusValue object but couldn't because it
merges seemingly identical messages, while the plain array is fine
with containing duplicates. There is one parser test that covers
this. While we could change this it needs discussion and most
probably a PM decision.
Change-Id: I7390b688a33dace95753470a927bbe4de43ea03a
Two problems:
1. Manipulating globals directly affects all following tests. They
are not independent from each other. This problem can be seen in
CiteTest.
2. Some test cases in testValidateRef don't test what you think.
For example, the test for a conflicting "extends" + "follow" was not
failing because of the conflict but because "extends" was disabled
and disallowed.
Change-Id: Iaa4e1f3f3222155d59984e577cba3f0b8dec40c3
This patch makes only sense together with I5a64ac4 where it is split
from. See I5a64ac4 for details.
The idea is that this patch just re-arranges the code without making
any changes to how the code behaves. This leaves a minimal change
behind that's much easier to revert, if needed.
Bug: T298278
Change-Id: Ie78313b7f3ac1ec7bce5ac7512e60a3bb011480a
This patch does two things:
1. The "normalization" function was never only doing normalization,
but also all the necessary HTML encoding. This is now more visible
and split into two separate functions.
2. To make this easier we change the order slightly. Because of this
the normalization step must now consider spaces. Before spaces have
been converted to underscores by escapeIdForLink.
The results are all the exact same as before.
This is split from I5a64ac4 to make that easier to review.
Bug: T298278
Change-Id: I9435a2ddaa21559e29587c58b7523103141467f7
This parser test is a bit obscure, in my opinion. We added it in
I8c4de96 to make sure we don't get thousand separators in most
places.
We continued reworking the code since then. By now it's effectively
impossible to "accidentally" get thousand separators. The
problematic methods from the Language class are not even accessible
any more from this code.
To make the tests more robust we now use createNoOpMock (done via
the previous patch) where it matters, specifically for all Language
and Parser mocks. This proves the problematic Language methods are
never called.
Bug: T253743
Bug: T238187
Change-Id: I9bfe1f4decfaf699996da63e19473c2c0d581d9d
Both Language and Parser are extremely complex classes with hundreds
of public methods. We really want to make sure we are not depending
on anything unexpected from these classes. If calls are made into
these classes we want to know exactly what is called.
Doing this also showed that some mocked methods are not even needed.
Change-Id: Icdfff6c07be78a47bf7cadb1813a72581a51272a
This is a mistake that exists in this codebase for who knows how
long.
Cite mis-uses the messaging system a lot for internal things we still
want to customize somehow, but are not labels that will ever be shown
on the screen. The prefix/suffix messages in this patch are meant to be
part of the HTML in id="…" attributes. Prefix/suffix must be a static
plain text strings. Using e.g. {{GENDER}} or {{PLURAL}} in these
messages is not even possible because there is no $1 parameter to use.
Note how all other similar messages already use ->plain().
A few wikis override these messages, but stick to the plain-text
convention, as they should:
https://global-search.toolforge.org/?q=.®ex=1&namespaces=8&title=Cite.*reference.*fix
This will continue to work.
This has minor performance implications. Fetching these messages is
faster if we can skip transformations.
Bug: T321217
Change-Id: I7969c255fe4ce897e904897081da5f52678721aa
This is mostly because recent IDEs can understand createMock() quite
good. We usually don't add such hints every time we use createMock().
We would have a million of them. ;-)
Change-Id: If9e37807a6945c4408d374fc97664cd636020ffd