Remove expensive regular expression that doesn't have any effect

See, this last part of the compiled regular expression is wrapped in
an (…)*, which means it is entirely optional. It does not make any
difference if this part is found or not. The compiled regular
expression matches with or without any of these "line ending"
fragments being present.
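
As an illustration only, here is a minimal sketch of that claim. The
timestamp pattern is a simplified, hypothetical stand-in (the real one
comes from getTimestampRegex() and depends on the wiki), and the sample
lines are made up. The trailing fragment is built the same way as in
getLineEndingRegex().

```php
<?php
// ASSUMPTION: simplified stand-in for the wiki-specific timestamp regex.
$timestampRegex = '\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)';

// The optional trailing fragment, built as in getLineEndingRegex().
$endOfLine = '(?:' . implode( '|', [
	'\s*',
	preg_quote( '}' ),
	preg_quote( '{' ),
	'\<[^\>]+\>',
	preg_quote( '{{' ) . '[^}]+' . preg_quote( '}}' ),
] ) . ')*';

// Made-up sample lines: bare, wrapped in HTML, inside a template, no timestamp.
$lines = [
	'Hi. --[[User:Example|Example]] 18:15, 25 September 2018 (UTC)',
	'<div><small>Hi. --Example 18:15, 25 September 2018 (UTC)</small></div>',
	'{{unsigned|Example 18:15, 25 September 2018 (UTC)}}',
	'A line without any timestamp at all.',
];

$mismatches = [];
foreach ( $lines as $line ) {
	$with = [];
	$without = [];
	$foundWith = preg_match( "/$timestampRegex$endOfLine/mu", $line, $with, PREG_OFFSET_CAPTURE );
	$foundWithout = preg_match( "/$timestampRegex/mu", $line, $without, PREG_OFFSET_CAPTURE );

	// Compare whether a match was found and, if so, where it starts.
	$offsetWith = $foundWith ? $with[0][1] : null;
	$offsetWithout = $foundWithout ? $without[0][1] : null;
	if ( $foundWith !== $foundWithout || $offsetWith !== $offsetWithout ) {
		$mismatches[] = $line;
	}
}

echo $mismatches === [] ? "identical results\n" : "results differ\n";
```

Because the trailing group may match zero times, both patterns succeed
or fail on the same lines and report the same start offset. Only the
length of the matched text can differ.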

I cannot really figure out what the intention of this was. A line
ending anchor ($) is not missing – I'm pretty sure about this.
Otherwise it could not detect signatures that are wrapped in more
than a single HTML tag, for example.

Instead of fixing it, I decided to remove it. The tests should show
this code was not needed.

The motivation for this patch is to improve performance. This part of
the regular expression is quite heavy and can cause a lot of
backtracking for literally zero benefit.

Bug: T203930
Bug: T204291
Change-Id: Ia5323b401b947edeb7094d7eec131ba6c80edf70
Thiemo Kreuz 2018-09-25 18:15:49 +02:00
parent 7d2aa4a596
commit acba72e011

@@ -871,10 +871,9 @@ abstract class EchoDiscussionParser {
 	 */
 	static function getTimestampPosition( $line ) {
 		$timestampRegex = self::getTimestampRegex();
-		$endOfLine = self::getLineEndingRegex();
 		$tsMatches = [];
 		if ( !preg_match(
-			"/$timestampRegex$endOfLine/mu",
+			"/$timestampRegex/mu",
 			$line,
 			$tsMatches,
 			PREG_OFFSET_CAPTURE
@@ -1121,26 +1120,6 @@ abstract class EchoDiscussionParser {
 		return User::getCanonicalName( $userMatches[0], false );
 	}
-	/**
-	 * Gets a regular expression fragmentmatching characters that
-	 * can appear in a line after the signature.
-	 *
-	 * @return string regular expression fragment.
-	 */
-	static function getLineEndingRegex() {
-		$ignoredEndings = [
-			'\s*',
-			preg_quote( '}' ),
-			preg_quote( '{' ),
-			'\<[^\>]+\>',
-			preg_quote( '{{' ) . '[^}]+' . preg_quote( '}}' ),
-		];
-		$regex = '(?:' . implode( '|', $ignoredEndings ) . ')*';
-		return $regex;
-	}
 	/**
 	 * Gets a regular expression that will match this wiki's
 	 * timestamps as given by ~~~~.