Fix DiscussionParser failing in certain languages

It appears that the initial \h in this non-Unicode regular
expression matches part of a UTF-8 character, destroying it.
This makes the final preg_match() in this method fail when
$output is used as a pattern.
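
A minimal sketch of the suspected failure mode, not the actual
EchoDiscussionParser code. The sample string and the trimTimezone()
helper are hypothetical. It assumes the cause is that, without the
/u modifier, \h also matches the single byte 0xA0, which is the
second byte of some UTF-8 characters such as "à" (0xC3 0xA0).

```php
<?php
// Hypothetical sample: "\xC3\xA0" is the UTF-8 encoding of "à".
$exemplar = "12:00, 27 \xC3\xA0 (UTC)";

// Hypothetical helper mirroring the trimming step in the diff.
function trimTimezone( string $regex, string $text ): string {
	$m = [];
	if ( preg_match( $regex, $text, $m, PREG_OFFSET_CAPTURE ) ) {
		// The captured offset is a byte offset, with or without /u.
		return substr( $text, 0, $m[0][1] );
	}
	return $text;
}

$broken = trimTimezone( '/\h*\(\w+\)\h*$/', $exemplar );
$fixed = trimTimezone( '/\h*\(\w+\)\h*$/u', $exemplar );

// preg_match( '//u', ... ) returns 1 only if the subject is valid UTF-8.
var_dump( preg_match( '//u', $broken ) === 1 ); // bool(false)
var_dump( preg_match( '//u', $fixed ) === 1 ); // bool(true)
var_dump( $fixed === "12:00, 27 \xC3\xA0" ); // bool(true)
```

Without /u the match starts at the 0xA0 byte, so substr() leaves a
dangling 0xC3 and the result is no longer valid UTF-8. With /u the
match starts at the space after the character.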

Bug: T264922
Change-Id: Iaf240bc2e0808c2f57c1f8bab2589d3207915afe
Thiemo Kreuz 2020-10-27 19:59:26 +01:00
parent b8d29d62ef
commit 8880df4123
2 changed files with 6 additions and 1 deletions

@@ -1155,7 +1155,7 @@ abstract class EchoDiscussionParser {
// Step 2: Generalise it
// Trim off the timezone to replace at the end
$output = $exemplarTimestamp;
-$tzRegex = '/\h*\(\w+\)\h*$/';
+$tzRegex = '/\h*\(\w+\)\h*$/u';
$tzMatches = [];
if ( preg_match( $tzRegex, $output, $tzMatches, PREG_OFFSET_CAPTURE ) ) {
$output = substr( $output, 0, $tzMatches[0][1] );

@@ -1013,6 +1013,11 @@ TEXT
$this->assertSame( 1, $match );
}
+public function testTimestampRegex_T264922() {
+$this->setMwGlobals( 'wgLanguageCode', 'skr' );
+$this->assertIsString( EchoDiscussionParser::getTimestampRegex(), 'does not fail' );
+}
public function testGetTimestampPosition() {
$line = 'Hello World. ' . self::getExemplarTimestamp();
$pos = EchoDiscussionParser::getTimestampPosition( $line );