With this change the <ol> is added after the inner list of <li> was
parsed as wikitext. In other words, the outer <ol> is now raw HTML
and not sanitized any more. This is fine because it's generated via
code. It doesn't contain any user input.
You might ask how it is possible to parse "invalid" HTML that
contains a sequence of <li> without the outer <ol>. This is fine
because the wikitext parser doesn't care about the nesting structure
of HTML elements. Nesting is fixed later by Remex (Tidy). But Remex is
never called here. What we care about here is that the wikitext
parser sanitizes the individual HTML elements and their attributes.
The <ol> doesn't need sanitization.
This will make it possible to use reserved data attributes for
T196828. A bazillion unit and parser tests prove that this change
doesn't have any unwanted side-effects.
Bug: T196828
Bug: T239572
Change-Id: I0a9d419f48cad5ddb7251c8fdd2cf9506649436b
I'm importing the current keys from the Popups extension without
renaming. After that's done keys can be renamed and we can take
care of the fallback solutions.
Bug: T363156
Change-Id: I788c16c5bddc0df7f00dbbc39625b9adaa5bf184
This patch adds 'mw-cite-backlink' to the linkback span for both
named and unnamed refs. This requires us to add a span wrapper
for the unnamed refs case.
Verified in local testing that this causes aria attributes to be
added to the linkback tags in Parsoid HTML.
This should likely fix other gadgets and code that rely on this
class name to do their work.
Strictly speaking, this is a breaking change since we add an
extra span wrapper for the unnamed ref backlinks which *could*
break anyone using a li > a[rel="mw:referencedBy"] selector.
But, given the specificity of the a[rel] selector, the "li >"
part is unnecessary and might not be used. So, if we wanted to
push our luck (and break process), we could get this in.
Alternatively, we could:
- do this in the read views OutputTransformPipeline.
- do a real major version bump -- we would be exercising that
functionality and have to fix and implement any missing pieces
that may have broken as part of the RESTBase sunsetting.
- not add the span wrapper and fix gadgets to explicitly look for
both named and unnamed refs with their selectors.
Bug: T328695
Change-Id: Icbd325ebd12cb42186c5b5220dc016835eb18b64
Page property is removed immediately since $wgCiteBookReferencing has
never been enabled in production.
Bug: T239989
Change-Id: I6252fcf1485994244dca40470cc5955e8d4f6917
This adds the 'reference-text' class where Parsoid added
'mw-reference-text'.
If we don't care about the "mw-" prefix, since there are very
few wiki and code references to 'mw-reference-text', it might
seem like we could update all those references and rip out
'mw-reference-text' from Parsoid output.
But, Parsoid HTML is also exposed via the REST API which means
there are likely many users out there analyzing Parsoid HTML.
https://github.com/search?q=%22mw-reference-text%22+NOT+language%3AHTML&type=code
says there are 512 references to this string, so it looks like
we are probably going to rely on a major HTML version bump in
Parsoid in the future and then rip out all the duplicate
classes (mw-ref, mw-references, mw-reference-text OR
reference, references, reference-text).
Bug: T328695
Change-Id: I04b18ac75863a0e3e61bdd47b34508e5547dc872
The two are different:
* CiteReferencePreviews as specified in extension.json is a feature
flag that allows us to disable the feature entirely. It could be
named "CiteReferencePreviewsFeature" or "CiteEnableReferencePreviews",
but renaming a feature flag that's already in use is hard.
* The client-side flag tells the JavaScript code "you know what, it
was kind of a mistake you got loaded, please stop". This is because
we can not make all decisions before we register the ResourceLoader
module, e.g. if the user has certain gadgets enabled.
Adding the word "Active" is not a huge improvement, but at least
makes the two different now. Suggestions welcome.
Bug: T362771
Change-Id: I0f6a911df8772616ac50c1301f402f77dbe32089
PHP classes and tests are more or less copies from the Popups
codebase. Some refactoring was applied. More could be done. Not too
sure whether more of this should happen in follow-ups though.
Could also reduce the complexity of checks on the JS side. Most of
these things can only change on page load. The only dynamic part
left is the anon user setting managed by the Popups extension.
Note that I needed to add a new PHP config here although the
other still exists and is needed in the Popups extension. This
will change when the user settings code also moves.
I guess it's okay for now though. Both settings default to true
and are not overridden in the config repos.
Also needed to add the Gadget extension as phan dependency.
Bug: T362771
Depends-On: Ia028c41f8aaa1c522dfc7c372e1ce51e40933a5e
Change-Id: Ie6e8bc706235724494036c7f0d873f5c996c46e6
I50557e0 changes the ExtensionTagHandler::lintHandler interface so it
can't return a node any more but must return true instead. True
indicates linting has been handled.
Depends-On: I441699e7fe9827a5e06e4638ce88c685deb9b856
Change-Id: I8fe4edc41a840c72cb539bf6f931d45ac777f8a0
The ::setPageProperty() method has some tricky corner cases where the
type of the value determines whether or not the page property will be
sorted. Since sort order for the BOOK_REF_PROPERTY is irrelevant,
use ::setUnsortedPageProperty() to communicate this clearly to the
reader.
Depends-On: Ia94c192c429d0482c58467bed787fd2e0aca052f
Followup-To: Ibfd84b52057baa8e249d321ec9df612efd6a29a6
Change-Id: I399f4895ec8720ff2927c5cd5a09c7af4664ee46
Whether the dynamic property is present or not, it should have a null
value when 'unset' -- and don't use `unset` to delete an *actual*
property when one is present!
Change-Id: Ifcb9492cc5c814d702c6e61e8231abfd8ea0647c
* Improve PHPDoc documentation.
* Add some missing language-level type declarations.
* Remove unused code.
* Avoid count() when the code doesn't care about the actual count.
* Use the short ??= operator where possible.
Change-Id: I79de49b65d32661b7efa67ecc350276968943e11
This does exactly the same as the previously used generic stdClass
object, just strictly typed. Turned out to be surprisingly
straightforward, as proven by the small size of this patch.
I'm intentionally not adding anything new in this patch. For
example, the new class is perfect to write longer documentation
for every field. But this is for a later patch.
Change-Id: Ibf696f6b5ef1bfdbe846b571fb7e9ded96693351
The ext.cite.referencePreviews module will transparently replace the
ext.popups.referencePreviews module after this patch. Configuration
stays in Popups for now, we can migrate it in later work.
CSS classes may be renamed in the future but this will be handled
separately since it could be a breaking change for on-wiki
customizations.
A lot of fancy footwork happens in this patch to emulate a soft
dependency on Popups. This mechanism doesn't exist explicitly in
either ResourceLoader or QUnit, so lots of workarounds are used, to
conditionally load the module and to dynamically skip dependent tests.
renderer.test.js is fully skipped for now, but can be wired up in
later work.
Bug: T355194
Change-Id: I0dc47abb59a40d4e41e7dda0eb7b415a2e1ae508
The first two files have been added to the root modules/ directory
via I487095d in 2015. No problem.
Many, many more files have been added via I000b453 in 2022. It's
really hard to tell what is what since then.
I'm not absolutely sure what the naming convention for this folder
should be. Could as well be "localized-styles/" or just "Parsoid/".
Bug: T156350
Change-Id: Ibcf8c7a6db5400ed8a9811244a070e03ff372a39
The information read from the …cite-tool-definition.json files is
effectively user input, even if only interface administrators can
edit it. Usually we carefully validate user input. But as of now
this code starts failing with all kinds of uncaught errors.
* An entry with no name, an empty name, or a name that's not a
string will cause all kinds of undefined behavior.
* An entry with an empty title results in an invisible button.
* A missing message results in a technical <…> placeholder, even if
the name is usually a sensible fallback.
Note that hard-coding titles as plain text strings in the ….json
file was already possible.
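The checks listed above could be sketched roughly as follows. This is
a minimal illustration in plain JavaScript, not the code from this
patch: the function name `sanitizeToolDefinitions` and the assumed
entry shape (a `name` plus an optional `title`) are hypothetical.

```javascript
// Hypothetical helper, for illustration only. Assumes each entry is
// an object with a "name" and an optional "title".
function sanitizeToolDefinitions( entries ) {
    if ( !Array.isArray( entries ) ) {
        return [];
    }
    return entries
        // Skip entries with no name, an empty name, or a non-string name.
        .filter( ( entry ) => entry !== null && typeof entry === 'object' &&
            typeof entry.name === 'string' && entry.name !== '' )
        // Fall back to the name when the title is missing or empty, so
        // the button never ends up without a visible label.
        .map( ( entry ) => ( {
            name: entry.name,
            title: typeof entry.title === 'string' && entry.title !== '' ?
                entry.title : entry.name
        } ) );
}

const tools = sanitizeToolDefinitions( [
    { name: 'web', title: 'Website' },
    { name: 'book', title: '' },
    { name: '' },
    { title: 'No name' },
    { name: 42 }
] );
console.log( tools );
```

Only the first two entries survive in this sketch, and the second one
gets its name as title.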
Change-Id: Iddcedbe859e86ac4c3f79a53d36237daff86c0db
The message was part of the original patch that introduced the group
feature in 2009, see https://phabricator.wikimedia.org/rECIT75004e33.
Notice how there was never a test scenario for this message. A test
was added in 2020 via I07738cc.
The message appears only in a rare edge-case when a group is entirely
unused in the text, and only when the group is not empty. The shortest
possible example is:
<references group=g>
<ref group=g name=a>a</ref>
</references>
Just adding something unrelated like `<ref group=g>x</ref>` to the
text changes the error message. Now the group is "used". But this
notion is confusing to begin with. References can be part of a group,
and we can use references, but we can't use groups as if they are a
separate entity.
A better error message already exists.
Notice how this special error message doesn't appear anywhere in the
Parsoid code path. That was already using the other, more generic
error message.
Bug: T269531
Change-Id: I63f663d76e45e6c3d664f145d9a564ee00ff53cd
This is about the error message that currently says:
»Cite error: <ref> tag with name "a" defined in <references> has
group attribute "" which does not appear in prior text.«
This is a special error message that appears only when a group name
does not appear anywhere in the text. In all other cases a simpler
error message is shown:
»Cite error: <ref> tag with name "a" defined in <references> is
not used in prior text.«
While the first error message is not wrong in the edge-case
described in T269531, it's very confusing for a multitude of
reasons. For example:
* There is no group attribute in the wikitext.
* Just adding something completely unrelated like `<ref>x</ref>` to
the text shows the other error message.
The reason for this behavior is that the assumed default is an empty
`group=""`. The error message changes the moment any other <ref> in
the same group appears in the text vs. when the group is entirely
unused.
We can probably remove this error message entirely, but should at
least not use it when there is no group.
Notice how the Parsoid code path was already using the other error
message.
Bug: T269531
Change-Id: Ifa2e97254f4cda72233a057d8760fb1116143552
This was added in 2006 via commit eb3a3f78, see
https://phabricator.wikimedia.org/rECITeb3a3f78
Hard to tell what happened back then. It's obviously not needed any
more, as proven by the tests. I mean, even if there would be an
extra newline character, it would be irrelevant at the end of an
<ol>…</ol>.
Change-Id: I5715cd9f31ac7ef86c1ea227642336ae71684291
I always found the name a little ambiguous. The fact that it outputs
an actual HTML list and not just some "references" – whatever that
means – is relevant, in my opinion.
Change-Id: I0d169455c8d2b42d62da4dccb8376c09fb6902bc
… as well as "cite_warning". Both are extremely trivial and don't
really do anything by default. All they do is to add the prefix
"Cite error:" or "Cite warning:" to all error messages.
This patch will make it possible to disable both messages by
default, i.e. replace their default in en.json with "-" without
breaking anything. That's part of the plan outlined in T353695.
Local on-wiki overrides will continue to work.
Bug: T353695
Change-Id: I374800d0d0b837cd17ed3a1fdde20b70325b06de
This commit also moves certain parser tests involving <ref> from
the Parsoid repo to citeParserTests.txt in this repo.
Bug: T354215
Change-Id: Ie5b211d2af01a56684473723c68a9ab2775542e3
The namespace change avoids a conflict with the existing Parsoid
implementation in Wikimedia\Parsoid\Ext\Cite and matches the current
Cite codebase better. We also need to add some phan stubs to allow
Cite to use Parsoid's generic DOM implementation classes, and some
type assertions to satisfy phan.
Bug: T354215
Change-Id: Ic904601b29555c9485a804f131061f207970ddd4
Parsoid's phpcs configuration is slightly different from the one in
this repository; this commit just keeps CI happy with the imported
code.
Change-Id: I9ce2993e8a9416f331b5157dfcfb01fb6e31baaf
Further commits will be necessary to complete the migration, but
this merge commit imports all of the existing history of the Cite
extension. It was generated using the following command on a checkout
of Parsoid:
git filter-repo --path src/Ext/Cite --path src/lib/ext/Cite \
--path lib/ext/Cite --path lib/ext/Cite.js --path lib/ext.Cite.js \
--path js/lib/ext.Cite.js --path modules/parser/ext.Cite.js \
--tag-rename '':'parsoid-' \
--path-rename src/Ext/Cite:src/Parsoid \
--path-rename src/lib/ext/Cite:src/Parsoid
And then, in the Cite repository:
git remote add parsoid ../path/to/parsoid/checkout
git merge parsoid/master --allow-unrelated-histories
Bug: T354215
Change-Id: I54edd9cf7951ca024c66fe357e8777eed85ab13b
Same as I294b59f in the Cite codebase.
An additional, necessary change is that we need to track all dir="…"
values in the ReferencesData object, even if we aren't going to use
the value from a <ref name="…" dir="…" /> reuse without content.
This is the same as what's done in the ReferenceStack in the Cite
codebase.
Bug: T202593
Depends-On: I294b59f989f553932b40d08308906dd72d92d2cd
Change-Id: Ida38ae6a41e8550089cf7a37a549080d17943521
This test was obscured by testing for a field on the parent, but that
would exist if and only if the parent also existed. Clarify the
guard condition and introduce a named local variable for the parent.
Change-Id: I03079f45cf5ba00d54642c89ac4232a944b2f353
Such a message shouldn't exist, and doesn't:
https://global-search.toolforge.org/?q=.&regex=1&namespaces=8&title=Cite+link+label+group-
Additional notes:
* Rename the method to make it more obvious that it's not a cheap
getter, but doing something slightly more expensive.
* Use more appropriate array_key_exists to check if a cache entry
already exists.
* Also add a bit more documentation.
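Why the existence check matters for a cache: in PHP, isset() treats a
stored null like a missing key, so a cached "nothing found" result
would be recomputed on every call. A rough JavaScript analogy, where
Map.has() plays the role of array_key_exists. The names
`makeCachedLookup` and `compute`, and the assumption that null is a
legitimate cached result, are hypothetical and not taken from the
patch.

```javascript
// Hypothetical cache wrapper, for illustration only.
function makeCachedLookup( compute ) {
    const cache = new Map();
    return function ( key ) {
        // has() is true even when the stored value is null, so an
        // expensive "not found" result is computed only once.
        if ( !cache.has( key ) ) {
            cache.set( key, compute( key ) );
        }
        return cache.get( key );
    };
}

let calls = 0;
const lookup = makeCachedLookup( ( key ) => {
    calls++;
    return key === 'known' ? [ 'a', 'b' ] : null;
} );

lookup( 'missing' );
lookup( 'missing' );
console.log( calls );
```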
Bug: T297430
Bug: T353227
Change-Id: Ia5827bbf6fd700b87a749aac17320796428f0688
This encapsulation gives us field name, type validation and code
documentation.
This patch only affects ReferenceStack and continues to return
approximately the same array outputs to callers. Some additional
information is included and the placeholder column has a new name.
Bug: T353451
Change-Id: I405fe7ac241f6991fd4c526bfbb58fbc34f2e147
The placeholder field will only be set if the ref exists, so we can
put these in a more logical order.
Change-Id: I2ddfb501fcc3aca936bb45c0d40e4f68c5d2b192
The previous patch deprecated the last conditional depending on magic
meanings of 0 and -1, so now we're free to let "count" take on a more
natural meaning: the number of times a footnote mark appears in
article text.
Includes a small hack to avoid changing parser output, by
artificially decrementing the count by one during rendering. The
hack can be removed and test output updated in a separate patch.
Bug: T353227
Change-Id: I6f76c50357b274ff97321533e52f435798048268
Stop relying on the magic number distinction between "count" = 0 and -1,
by explicitly testing the "name" field instead.
Bug: T353227
Change-Id: I9dce16b01814e19f508d45b927de570049f0e0f5
These can be hard to read so this patch introduces named, temporary
variables.
PHP reference assignment is helpful here, and has the nice property
of responding correctly to `isset` as if it were called on the
referenced variable. However, we're prevented from using this trick
in more places in the code because of an unfortunate side-effect that
PHP will store `null` under the referenced array key. In some cases
(the ones here), this is harmless because we always test using
`isset` and null behaves the same as an unset value. In other cases
such as arrays that are iterated over, the spurious key and null
value would be more of a nuisance.
Bug: T353227
Change-Id: Ie43592a2f10677ba19842e92fa29eb4bf3be240c
Encapsulate all information about a ref inside of the internal
structure, rather than relying on the container to be organized by
group.
Bug: T353451
Change-Id: I4c91e8089638b7655bf120402a4a5fcbd1b35452
In this case, there was never a ref with this name in the article so
no backlinks should be rendered.
TODO:
* test case with empty parent backlink and LDR parent
Bug: T353451
Change-Id: I8a7abd05a48ce83da3beb92b15e894d53252bd33
This is another improvement after I7390b68. Status objects are made
to keep track of multiple errors. The only difference is: The merge
method skips duplicates when the message and all parameters are
identical. This causes a minor user-facing change. One of the
shortest possible examples is:
<references>
<ref />
<ref />
</references>
This showed two identical, indistinguishable error messages before,
but will only show one now. We argue this is fine. The duplicates
are confusing and of (almost) no value to the user. In case the
information is relevant the correct solution is to make the error
messages distinguishable, or introduce a message like "multiple
<ref> tags defined in <references> have the same error". This is
something for a later patch, if needed.
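The merge behavior described above can be illustrated with a small
sketch. This is plain JavaScript and not MediaWiki's Status class:
`mergeErrors`, the `{ key, params }` error shape and the message key
`example-error` are all made up for the example.

```javascript
// Hypothetical error list, for illustration only. Each error is
// assumed to be { key: string, params: Array }.
function mergeErrors( target, incoming ) {
    const identify = ( error ) => JSON.stringify( [ error.key, error.params ] );
    const seen = new Set( target.map( identify ) );
    for ( const error of incoming ) {
        const id = identify( error );
        // Skip an error only when the message key and all parameters
        // are identical to one that is already tracked.
        if ( !seen.has( id ) ) {
            seen.add( id );
            target.push( error );
        }
    }
    return target;
}

const errors = [];
mergeErrors( errors, [ { key: 'example-error', params: [] } ] );
mergeErrors( errors, [ { key: 'example-error', params: [] } ] );
mergeErrors( errors, [ { key: 'example-error', params: [ 'a' ] } ] );
console.log( errors.length );
```

The second call adds nothing, while the third one is kept because its
parameters differ.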
Bug: T353266
Change-Id: I444105462ed24d5ba37b057622b4dc847b40f8d8
Same as Icfa8215 where we removed the …_suffix messages.
This patch is not blocked on anything according to CodeSearch:
https://codesearch.wmcloud.org/search/?q=cite_references%3F_link_prefix
According to GlobalSearch there are 2 usages we need to talk about:
https://global-search.toolforge.org/?q=.&regex=1&namespaces=8&title=Cite.references%3F.link.prefix.*
zh.wiktionary replaces "cite_ref-" with "_ref-", and "cite_note-"
with "_note-", i.e. they did nothing but remove the word "cite". This
happened in 2006, with no explanation.
ka.wikibooks and ka.wikiquote replace "cite_note-" with "_შენიშვნა-",
which translates back to "_note-". One user did this in 2007,
16 seconds apart.
It appears like both are attempts to localize what can be localized,
no matter if it's really necessary or not.
https://zh.wiktionary.org/wiki/Special:Contributions/Shibo77?offset=20060510
https://ka.wikiquote.org/wiki/Special:Contributions/Trulala?offset=20070219
Note how one user experimented with an "a" in some of the edits to
see what effect the change might have, only to immediately revert it.
The modifications don't really have an effect on anything, except on
the anchors in the resulting <a href="#_ref-5"> and <sup id="_ref-5">
HTML. It might also be briefly visible in the browser's address bar
when such a link is clicked. We can only assume the two users did this
to make the URL appear shorter (?). A discussion apparently never
happened. Both users are inactive.
Both pieces of HTML are generated in the Cite code. Removing the
messages will change all places at the same time. All links will
continue to work. The only possible effect is that hard-coded
weblinks to an individual reference will link to the top of the
article instead. But:
a) This is extremely unlikely to happen. There is no reason to link
to a reference from outside of the article.
b) Such links are not guaranteed to work anyway as they can break
for a multitude of other reasons, e.g. the <ref> being renamed,
removed, or replaced.
c) Even if such a link breaks, it still links to the correct article.
There is also no on-wiki code on zh.wiktionary that would do anything
with the shortened prefix:
https://zh.wiktionary.org/w/index.php?search=insource%3A%2F_%28ref%7Cnote%29-%2F&title=Special%3A%E6%90%9C%E7%B4%A2&profile=advanced&fulltext=1&ns2=1&ns4=1&ns8=1&ns10=1&ns12=1&ns828=1&ns2300=1
I argue this is safe to remove, even without contacting the mentioned
communities first.
Bug: T321217
Change-Id: I160a119710dc35679dbdc2f39ddf453dbd5a5dfa
This fixes a minor issue introduced in I294b59f. Two identical
dir="…" with different capitalizations should not be reported as an
error.
Turns out the implementation in the Cite extension doesn't care
about this capitalization at all. That's why I suggest doing the
normalization as early as possible. This is slightly different in
the Parsoid implementation.
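The idea of normalizing early can be sketched like this. It is a
JavaScript illustration only, not the PHP code in either
implementation, and the helper names `normalizeDir` and
`hasDirConflict` are hypothetical.

```javascript
// Hypothetical helpers, for illustration only.
function normalizeDir( value ) {
    // Normalize as early as possible, so later comparisons never see
    // different capitalizations of the same direction.
    return typeof value === 'string' ? value.toLowerCase() : null;
}

function hasDirConflict( first, second ) {
    const a = normalizeDir( first );
    const b = normalizeDir( second );
    // A missing dir on either side is not treated as a conflict here.
    return a !== null && b !== null && a !== b;
}

console.log( hasDirConflict( 'RTL', 'rtl' ) );
console.log( hasDirConflict( 'rtl', 'ltr' ) );
```

With the normalization in place the first pair is not reported, while
the second pair still is.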
Bug: T202593
Change-Id: I96b4a281d6020d61d1f36ec027cf833bbb244f03
* Same as Ie64f4ab in the Cite codebase.
* Mark the changed tests as standalone since this Parsoid code isn't yet
released to vendor and integrated tests run with vendor.
Bug: T299280
Depends-On: Ie64f4ab4831966f66f812ea67cc244718f818afb
Change-Id: I0ea1bc3f57576d215ba4060a0e886e588ffda0b3
Internal ref key is always an int, but another string `key` is
created in the formatters. This patch makes the typing explicit. We
can distinguish between these two different values in a later patch.
Bug: T353451
Change-Id: Id5e40517705961f4d54622e91264430d9f62008d
Thanks to strict types and a recent MediaWiki CodeSniffer update a
lot of the PHPDoc comments in this codebase became redundant. Only
very few comments in this codebase contain additional information.
The redundant ones don't add anything to what the code alone
already says. We started removing them in many other codebases
already.
In case someone wants to add more documentation to a method the
basic PHPDoc block can usually automatically be generated with a
button press in the IDE.
The only additional change in this patch is that I occasionally
add a missing `void` return type. This is necessary to be able to
remove the comment.
Change-Id: Id7d6d6a437175a9d017f564daf7ed16e76f09158
This is doing the same as before, in pretty much the same execution
order. The only difference is the syntax.
In JavaScript it's relevant to not do array initializations too early.
Otherwise different instances share the same array. But this doesn't
happen in PHP.
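For readers less familiar with the JavaScript pitfall mentioned
above, here is a small sketch. It assumes the classic pattern of
putting the array on the prototype; the constructor names are made up
for the example.

```javascript
// Pitfall: an array initialized on the prototype exists only once and
// is shared by all instances.
function SharedStack() {}
SharedStack.prototype.items = [];
SharedStack.prototype.push = function ( item ) {
    this.items.push( item );
};

// Safe: every instance gets its own array in the constructor.
function OwnStack() {
    this.items = [];
}
OwnStack.prototype.push = function ( item ) {
    this.items.push( item );
};

const sharedA = new SharedStack();
const sharedB = new SharedStack();
sharedA.push( 'x' );
console.log( sharedB.items.length ); // 1, the array is shared

const ownA = new OwnStack();
const ownB = new OwnStack();
ownA.push( 'x' );
console.log( ownB.items.length ); // 0, each instance has its own array
```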
Change-Id: I56363ccadf29f2b806f765ab8f54a3c1863fc10f
I'm not sure how much this helps. But this merges two code paths
that are both about "we are in the middle of a <references> section
right now".
Nothing changes, as proven by the tests.
Bug: T353266
Change-Id: I446e224b81d35c47736a437d78527c0cc8636f77
This classifies as a "warning" because we still show everything,
just with an error message appended.
Disabling the Parsoid tests right away hopefully makes it easier to
do the same change in Parsoid.
Bug: T202593
Depends-On: If14acd1070617ca8c4d15be6b1759bd47ead4926
Change-Id: I294b59f989f553932b40d08308906dd72d92d2cd
By now I'm sure this really doesn't belong here. The code in the Cite
extension is doing this because it generates HTML by concatenating
plain strings. In such a context the necessary HTML entity encoding
(&quot; and such) must be done manually. Here in the Parsoid context
this is not needed.
This is split from I7249bd0. See the discussion over there.
Change-Id: I5589e5c2147bfc9f205a0ff80d8bdd247ab49c63
* This partly replicates the fixes in I9435a2d and Ia01f2fd. More
to be done in later patches.
* Updated html/parsoid test output (which matches the change in the
html/php section).
Depends-On: I401656265253a429691cc76adc5db5b129cff2cc
Change-Id: I7249bd03a7942ff7725a20178a051300b777e3a8
This moves one more error situation into the stack class, together
with other error situations that are already there.
Bug: T353266
Change-Id: Icf169650f67f64e6d29d175c3b47cf558b8de3d4
Check out how this gets rid of so many "to do" as well as
"deprecated" comments.
Next question: The elements in the stack become more and more
complicated. It's probably worth converting them from arrays into
first-class objects. But this is for another patch.
Bug: T353266
Change-Id: If14acd1070617ca8c4d15be6b1759bd47ead4926
We have been discussing this for a long time and finally renamed the tag
on Phabricator: https://phabricator.wikimedia.org/tag/cite-extends
This patch updates only places where it can't have any negative
consequences.
This is also a direct follow-up to Ic73f1b7 where this class was
created.
Bug: T353269
Change-Id: I644fe41d3386b9bf02b83366654301633efd535f
Same arguments as in Iafa2412. The one reason to use more detailed
per-method @covers annotations is to avoid "accidental coverage"
where code is marked as being covered by tests that don't assert
anything that would be meaningful for this code. This is especially a
problem with older, bigger classes with lots of side effects.
But all the new classes we introduced over the years are small, with
predictable, local effects.
That's also why we keep the more detailed @covers annotations for
the original Cite class.
Bug: T353227
Bug: T353269
Change-Id: I69850f4d740d8ad5a7c2368b9068dc91e47cc797
This is a concept that's only relevant when a sub-reference (formerly
known as BookReferencing) appears before the parent reference it
belongs to. Let the name reflect this.
Bug: T353227
Change-Id: Iabf259e72942ea70cb1cc1e0ca5a5d8cf15d7225
This patch only moves existing code around without changing any
behavior. What I basically did was merging the old "guardedReferences"
method into "references", and then splitting the resulting code in
other ways. Now we see a few other concepts emerging. But the idea
that something would be "guarded" (how?) is gone.
The most critical details in this patch are the new method names, and
how the code is split. The names should tell a story, and the methods
should do exactly what the name says. Suggestions?
Bug: T353266
Change-Id: I8b7921ce24487e9657e4193ea6a2e3e7d7b0b1c3
This removes almost 200 lines from the main class.
This patch intentionally doesn't make any changes to the code but
only moves it around. Further improvements are for later patches.
Bug: T353269
Change-Id: Ic73f1b7458b3f7b7b89806a88a1111161e3cf094
Allow other extensions to provide lists of page content
models for which they want to load the Cite toolbar button.
This will, for example, make it possible for ProofreadPage
to have the button on Page pages.
Bug: T348403
Change-Id: Id28cb0b6cb8a2b86a66b17232575afe513969c54