Commit graph

5 commits

Author SHA1 Message Date
thiemowmde 89bd26fcf5 Skip URL encoding in id="…" attributes that aren't URLs
I played around with a few options (see patchset 1) but ended up
introducing new terminology:

* "Backlink" describes the ↑ button down in the list of <references>
  that jumps back up into the article. The code was already using
  "backlink" in some places.
* "Backlink target" is the id="…" attribute up there, visible as the
  typical [1] in the article.
* I use "jump" to describe the idea that clicking the [1] jumps down
  to the full reference.
* "Jump target" is the id="…" down there in the list of <references>.
* "Jump link" is the same id, but encoded to be used as the href="…"
  attribute when clicking the [1].

I hope this makes sense. Suggestions welcome.

Another benefit is that "normalization" is really only normalization
now, not any URL and/or HTML encoding.
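
The split between "jump target" and "jump link" can be illustrated with a
minimal Python sketch. The function names and the sample id below are
hypothetical, and the percent-encoding shown is only a stand-in: it is not
the encoding the extension or the Sanitizer actually applies.

```python
import html
from urllib.parse import quote


def jump_target_attribute(target_id: str) -> str:
    """Render the id="…" attribute: HTML-escaped only, no URL encoding."""
    return f'id="{html.escape(target_id, quote=True)}"'


def jump_link_attribute(target_id: str) -> str:
    """Render the href="…" attribute: the same id, encoded as a fragment.

    Assumption: plain percent-encoding stands in for the real rules.
    """
    return f'href="#{quote(target_id, safe="")}"'


sample_id = "ref_ä"  # hypothetical, already normalized id
print(jump_target_attribute(sample_id))
print(jump_link_attribute(sample_id))
```

The point of the sketch is only that both attributes start from the same
normalized id, and that encoding happens separately per attribute.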

Bug: T298278
Change-Id: I5a64ac43aef895110b61df65b27f683b131886fb
2023-12-12 13:56:37 +00:00
thiemowmde f9bb125e4c Correctly encode non-breaking spaces in reference names
Note how this currently behaves. The user input is
<ref name="…&nbsp;…">
But what we get in the end is
<li id="…&#160;…">
This implies that the &nbsp; is decoded and re-encoded with a
slightly different entity encoding. (Note that &nbsp; and &#160;
and &#xa0; are all the same character.)

Also note that the href="…" contains only an underscore; the
non-breaking space is gone. This is identical to what happens in
links and headlines. Try for example [[a&nbsp;_a]]. Multiple
underscores, non-breaking spaces, and normal spaces will be
normalized. We just do the same in the id="…" attributes.
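
The normalization described above can be sketched in Python as follows.
This is a rough illustration, not the actual implementation: the helper
name is hypothetical, and collapsing every run of spaces, non-breaking
spaces, and underscores into one underscore is an assumption based on the
description in this message.

```python
import html
import re


def normalize_ref_name(name: str) -> str:
    """Hypothetical helper mirroring the link/headline-style normalization."""
    # Decode entities first, so &nbsp;, &#160; and &#xa0; all become U+00A0.
    decoded = html.unescape(name)
    # Assumption: any run of spaces, non-breaking spaces and underscores
    # collapses into a single underscore.
    return re.sub(r"[ _\u00a0]+", "_", decoded)


# All three entity spellings decode to the same character.
assert html.unescape("&nbsp;") == html.unescape("&#160;") == html.unescape("&#xa0;")

print(normalize_ref_name("a&nbsp;_a"))
```

With this sketch, the name from the [[a&nbsp;_a]] example ends up with a
single underscore and no non-breaking space.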

Note this fixes only one of the issues listed in T298278.

Bug: T298278
Change-Id: Ia01f2fdd3b3e9ee6aaa9da60ca3386dcd5d6b1a0
2023-12-05 07:58:38 +01:00
C. Scott Ananian af2352e523 parser tests: Make !! config values JSON-compatible
Bug: T307720
Change-Id: Ib716c70bc47659701edfc572674b3e890e19605b
2022-05-11 21:05:55 -04:00
Thiemo Kreuz 7b30a165e4 Use correct Sanitizer method for id/fragment escaping
Note how only the HTML5 mode behavior changes, but nothing in
legacy mode.

Also note this does not 100% fix the issue. The example with a
non-breaking space is still broken. But it's already much better
than before.

Bug: T298278
Change-Id: Idf50dad4219ff4c594a0cc15f63cb10fdac5ffb7
2022-01-03 16:23:45 +01:00
Thiemo Kreuz 83041449a7 Add parser & unit test cases for different $wgFragmentMode values
This is only to document the status quo and make later patches
smaller and easier to review.

Bug: T298278
Change-Id: I6c78f4d3ee32de596f2b5ee081d56eaffb1cc7bd
2022-01-03 14:14:47 +01:00