mirror of
https://gerrit.wikimedia.org/r/mediawiki/extensions/Cite
synced 2024-11-24 06:54:00 +00:00
Refine and fix "unclosed <ref> detected" regular expression
This simplifies as well as fixes a series of issues with this regular expression:

* Before, the wikitext `<REF><REF>` would not trigger the error, but `<ref><ref>` would. Parser tags are case-insensitive, but the error check was not.
* Before, the wikitext `<ref><ref name="<">` would not trigger the error. That's a valid name. The error check should not stop just because it found a `<`.
* Both the old and the new code do *not* fail with the wikitext `<ref><ref</ref>` where the inner `<ref` does not have a closing `>`. I was thinking about changing this, but figured it might be used as a feature.
* The old code was not able to properly understand HTML comments, <nowiki> tags and such that contain a line break. That caused inconsistent and confusing error reporting in some cases, but not in others. This change *reduces* the amount of errors this code produces.
* The old code was looking for "SGML tags" with names that could be anything, not just alphanumeric characters. This allowed for strange edge-cases like `<ref><>><ref></>></ref>` that have not been reported, but should be. This change *increases* the amount of errors. However, relevant edge-cases should be extremely rare.

Note the ++ avoids backtracking, speeding up the regex.

Change-Id: I0c61a245f4f743871b4cad886ce239650af2b37c
parent a3c589ac42
commit a7ee7c9586
@@ -241,8 +241,10 @@ class Cite {
 			}
 		}
 
-		if ( preg_match( '/<ref\b[^<]*?>/',
-			preg_replace( '#<([^ ]+?).*?>.*?</\\1 *>|<!--.*?-->#', '', $text ) ) ) {
+		if ( preg_match(
+			'/<ref\b.*?>/i',
+			preg_replace( '#<(\w++).*?>.*?</\1\s*>|<!--.*?-->#s', '', $text )
+		) ) {
 			// (bug T8199) This most likely implies that someone left off the
 			// closing </ref> tag, which will cause the entire article to be
 			// eaten up until the next <ref>. So we bail out early instead.
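The behaviour changes listed in the commit message can be checked in isolation with a small sketch like the one below. The two patterns are copied from the diff; the function names `hasUnclosedRefOld` and `hasUnclosedRefNew` and the sample inputs are hypothetical and only serve the illustration. It assumes `$text` is the content found inside an outer `<ref>` tag, which is how the check appears to be used here.

```php
<?php
// Hypothetical wrappers around the old and new checks from the diff.
// Both first strip balanced tags and HTML comments, then look for a
// leftover opening <ref ...> tag in what remains.

function hasUnclosedRefOld( string $text ): bool {
	return (bool)preg_match( '/<ref\b[^<]*?>/',
		preg_replace( '#<([^ ]+?).*?>.*?</\\1 *>|<!--.*?-->#', '', $text ) );
}

function hasUnclosedRefNew( string $text ): bool {
	return (bool)preg_match(
		'/<ref\b.*?>/i',
		preg_replace( '#<(\w++).*?>.*?</\1\s*>|<!--.*?-->#s', '', $text )
	);
}

// Sample inner texts (assumptions made for this sketch).
$cases = [
	'upper-case tag'      => '<REF>',
	'"<" inside the name' => '<ref name="<">',
	'multi-line comment'  => "<!-- <ref>\n -->",
	'balanced inner tag'  => '<b>x</b>',
];

foreach ( $cases as $label => $text ) {
	printf(
		"%-22s old=%s new=%s\n",
		$label,
		var_export( hasUnclosedRefOld( $text ), true ),
		var_export( hasUnclosedRefNew( $text ), true )
	);
}
```

The first two cases are ones the old check misses and the new check reports. The third is one the old check reports by mistake, because without the `s` modifier `.` does not match the line break inside the comment.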