This time, I've written a Parser method called serialiseHalfParsedText. As the name implies, it takes some half-parsed text, fixes up all of the strip markers and link comments, and makes the result safe to import later with unserialiseHalfParsedText. I tested it by live-hacking the cache key to be a constant
and then putting <references /> on a completely different page, where it worked perfectly.
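
A rough sketch of the idea, not the actual Parser code: half-parsed text contains placeholder markers that only mean something to the parser instance that created them, so serialising has to bundle the marker contents with the text, and unserialising has to re-register them under the importing parser's own markers. Everything here (the ToyParser class, the marker format, the dictionary layout) is hypothetical and only illustrates the round trip; link comments are not modelled.

```python
import re
import secrets


class ToyParser:
    """Hypothetical stand-in for a parser that hides content behind strip markers."""

    def __init__(self):
        # Each parser instance gets its own unguessable marker prefix (assumed format).
        self.prefix = "\x7fUNIQ" + secrets.token_hex(4)
        self.strip_state = {}
        self.counter = 0

    def add_strip_item(self, content):
        """Store content and return the marker that stands in for it."""
        marker = f"{self.prefix}-item-{self.counter}-QINU\x7f"
        self.counter += 1
        self.strip_state[marker] = content
        return marker

    def serialise_half_parsed_text(self, text):
        """Bundle the text with the content of every marker of ours that it uses."""
        pattern = re.compile(re.escape(self.prefix) + r"-item-\d+-QINU\x7f")
        used = {m: self.strip_state[m] for m in pattern.findall(text)}
        return {"text": text, "strip_items": used}

    def unserialise_half_parsed_text(self, data):
        """Import a bundle, swapping foreign markers for markers of this parser."""
        text = data["text"]
        for old_marker, content in data["strip_items"].items():
            text = text.replace(old_marker, self.add_strip_item(content))
        return text

    def unstrip(self, text):
        """Replace this parser's markers with their stored content."""
        for marker, content in self.strip_state.items():
            text = text.replace(marker, content)
        return text
```

In this sketch, text serialised by one ToyParser can be imported by another and then unstripped there, which is the property the constant-cache-key test above was exercising.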
Calling it with no extra arguments will now assume that you're escaping
a whole id, not an id fragment, which is safer. Also, instead of ugly
bitfield-based options, I've changed the options to use an array of
strings. I fixed all callers in trunk. Out-of-tree callers that were
using Sanitizer::NONE will get correct behavior, while those that were
calling it with no arguments will get slightly changed behavior (an x
will be prepended). I think this is harmless enough that we can skip
back-compat cruft here.
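
To illustrate the interface change, here is a minimal Python sketch, not the Sanitizer code itself. The option name 'noninitial', the escaping rules, and the condition for prepending the x (the escaped id not starting with a letter) are assumptions made for the example; only the "array of strings instead of a bitfield" shape and the prepended x come from the description above.

```python
from urllib.parse import quote_plus


def escape_id(raw_id, options=()):
    """Escape a string for use as an id.

    options is a collection of strings rather than a bitfield.
    'noninitial' (assumed name) marks raw_id as a fragment that will not
    start the final id, so no leading letter is forced.
    """
    options = set(options)
    # Assumed escaping scheme: spaces become underscores, other unsafe
    # characters are percent-encoded, and '%' is then rewritten as '.'.
    escaped = quote_plus(raw_id.replace(" ", "_"), safe=":")
    escaped = escaped.replace("%", ".")
    # Default (no options): treat the input as a whole id and make sure
    # it starts with a letter by prepending an x.
    if "noninitial" not in options and not escaped[:1].isalpha():
        escaped = "x" + escaped
    return escaped
```

With no options, escape_id("1st section") gains a leading x, while escape_id("1st section", ["noninitial"]) is left without one, which mirrors the two groups of out-of-tree callers described above.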
This should cause no visible changes. No parser test regressions.
My last commit caught only plain <ref>; <ref name="foo"> would still have been allowed. Fixed using a regex from the patch on bug 12757 by Max Semenik.
This basically uses the patch I posted to that bug two years ago. It's crude, but it should avoid the most common false positives while hopefully not causing too many false negatives. It should be possible to refine it to avoid even more false negatives, but for the time being, this will at least prevent most of the constant headaches that newbies get when chunks of articles vanish because they forgot a closing </ref>.
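
For illustration only, a crude check in this spirit could look like the Python sketch below. It is not the regex from the patch on bug 12757; the pattern and the function name has_unclosed_ref are assumptions. It treats <ref> and <ref name="foo"> as opening tags, ignores self-closing <ref ... /> and <references />, and simply compares the number of opening and closing tags.

```python
import re

# Opening <ref> tag, with or without attributes, but not self-closing
# (<ref name="foo" />) and not <references />.
OPENING_REF = re.compile(r"<ref(?:\s+[^>]*)?(?<!/)>", re.IGNORECASE)
CLOSING_REF = re.compile(r"</ref\s*>", re.IGNORECASE)


def has_unclosed_ref(wikitext):
    """Return True if there are more opening <ref> tags than closing ones.

    Deliberately crude: it only counts tags, so it ignores nesting and
    tags inside <nowiki> or comments, and can be wrong in both directions.
    """
    openings = len(OPENING_REF.findall(wikitext))
    closings = len(CLOSING_REF.findall(wikitext))
    return openings > closings
```

A count-based check like this accepts some broken input (for example a stray </ref> elsewhere on the page balances the count), which is the kind of false negative mentioned above.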
It isn't clear how this is meant to be used, and there are no test cases for the newly added mode.
The newly added code doesn't follow our coding standards, which makes it harder to read, and it contains mysterious things like "$argv=array_merge($argv);", which seem a bit odd.