mediawiki-extensions-TextEx.../includes/TextTruncator.php

<?php

namespace MediaWiki\Extension\TextExtracts;

use MediaWiki\MediaWikiServices;

/**
 * This class needs to understand HTML as well as plain text. It tries to not break HTML tags, but
 * might break pairs of tags, leaving unclosed tags behind. We can tidy the output to fix
 * this.
 *
 * @license GPL-2.0-or-later
 */
class TextTruncator {
	/**
	 * @var bool Whether to tidy the output
	 */
	private $useTidy;

	/**
	 * @param bool $useTidy
	 */
	public function __construct( bool $useTidy ) {
		$this->useTidy = $useTidy;
	}

	/**
	 * Returns no more than the given number of sentences
	 *
	 * @param string $text Source text to extract from
	 * @param int $requestedSentenceCount Maximum number of sentences to extract
	 * @return string
	 */
	public function getFirstSentences( $text, $requestedSentenceCount ) {
		if ( $requestedSentenceCount <= 0 ) {
			return '';
		}

		// Based on code from OpenSearchXml by Brion Vibber
		$endchars = [
			// regular ASCII
			'\P{Lu}\.(?=[ \n]|$)',
			'[!?](?=[ \n]|$)',
			// full-width ideographic full-stop
			'。',
			// double-width roman forms
			'．',
			'！',
			'？',
			// half-width ideographic full stop
			'｡',
		];

		$regexp = '/(?:' . implode( '|', $endchars ) . ')+/u';
		$res = preg_match_all( $regexp, $text, $matches, PREG_OFFSET_CAPTURE );

		if ( !$res ) {
			// Just return the first line
			$lines = explode( "\n", $text, 2 );
			return trim( $lines[0] );
		}

		$index = min( $requestedSentenceCount, $res ) - 1;
		// Each match is a pair of [ matched sentence terminator, byte offset ]
		[ $tail, $length ] = $matches[0][$index];
		// PREG_OFFSET_CAPTURE yields byte offsets, so use substr() rather than mb_substr()
		$text = substr( $text, 0, $length ) . $tail;

		return $this->tidy( $text );
	}

	/**
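	 * Returns roughly the requested number of characters. The cut is extended to the end of
	 * the current word or HTML tag, so the result can be slightly longer than requested.
	 *
	 * @param string $text Source text to extract from
	 * @param int $requestedLength Number of characters after which to cut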
	 * @return string
	 */
	public function getFirstChars( $text, $requestedLength ) {
		if ( $requestedLength <= 0 ) {
			return '';
		}

		$length = mb_strlen( $text );
		if ( $length <= $requestedLength ) {
			return $text;
		}

		// Extend the cut to the end of the current word or HTML tag. This pattern always
		// matches, but the match may be an empty string.
		$pattern = '/^[\w\/]*>?/su';
		preg_match( $pattern, mb_substr( $text, $requestedLength ), $m );
		$truncatedText = mb_substr( $text, 0, $requestedLength ) . $m[0];
		if ( $truncatedText === $text ) {
			return $text;
		}

		return $this->tidy( $truncatedText );
	}

	/**
	 * @param string $text
	 * @return string
	 */
	private function tidy( $text ) {
		if ( $this->useTidy ) {
			$text = MediaWikiServices::getInstance()->getTidy()->tidy( $text );
		}

		return trim( $text );
	}

}
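
The two truncation strategies in the class above can be tried outside MediaWiki. The following is a minimal sketch, not the extension's API: the function names `firstSentences` and `firstChars` are hypothetical, the tidy step is left out, and the `mbstring` extension is assumed to be available.

```php
<?php

// Hypothetical standalone helper that mirrors getFirstSentences() without tidy.
function firstSentences( string $text, int $count ): string {
	if ( $count <= 0 ) {
		return '';
	}

	$endchars = [
		// ASCII full stop that does not follow an uppercase letter (skips initials like "J.")
		'\P{Lu}\.(?=[ \n]|$)',
		'[!?](?=[ \n]|$)',
		// CJK and full-width terminators
		'。', '．', '！', '？', '｡',
	];
	$regexp = '/(?:' . implode( '|', $endchars ) . ')+/u';
	$res = preg_match_all( $regexp, $text, $matches, PREG_OFFSET_CAPTURE );

	if ( !$res ) {
		// No sentence terminator found, fall back to the first line
		return trim( explode( "\n", $text, 2 )[0] );
	}

	$index = min( $count, $res ) - 1;
	// Each match is [ matched terminator, byte offset ]
	[ $tail, $offset ] = $matches[0][$index];
	return trim( substr( $text, 0, $offset ) . $tail );
}

// Hypothetical standalone helper that mirrors getFirstChars() without tidy.
function firstChars( string $text, int $length ): string {
	if ( $length <= 0 ) {
		return '';
	}
	if ( mb_strlen( $text ) <= $length ) {
		return $text;
	}

	// Extend the cut to the end of the current word or HTML tag
	preg_match( '/^[\w\/]*>?/su', mb_substr( $text, $length ), $m );
	$truncated = mb_substr( $text, 0, $length ) . $m[0];
	return $truncated === $text ? $text : trim( $truncated );
}

echo firstSentences( 'J. R. R. Tolkien was a writer. He wrote.', 1 ), "\n";
echo firstChars( 'The quick brown fox', 6 ), "\n";
```

The first call skips the initials because the character before each of their dots is an uppercase letter, so the cut lands after "writer.". The second call cuts after six characters and then extends the result to the end of the word "quick".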
Extract unrelated static code from ExtractFormatter

This is a straightforward baseline patch that does nothing but moving
existing code around, without touching it. I'm not even trying to
remove the "static" keyword. The actual refactoring will be done in
the next patch. I hope with this the changes I do in the refactoring
become more visible and much easier to review.

Change-Id: Idba859ec0c24f3622ea8fb8d7a9b11843d1e3827
2019-03-19 11:43:25 +00:00

Use extension namespace for TextExtracts

Change-Id: I01177e9bef0f25b6245ee3e93f605dc771642273
2024-07-12 22:40:54 +00:00

Replace use of deprecated MWTidy class

Bump required MW to >= 1.36.0

Change-Id: Ida40e6c1d84eec0e51e53f6aa98ac9f09fd52666
2021-10-20 18:33:06 +00:00

Move Tidy functionality to TextTruncator

I argue that the code fixing unclosed HTML tags is – even if optional –
an integral part of the code that potentially breaks these HTML tags in
the first place. Notice how much code disappears in the ApiQueryExtracts
class.

Additionally, the new approach uses an interface instead of a static
function call that is impossible to mock and hard to test.

Change-Id: Ic1a65995f4dba11d060a8738d642905cbfc79271
2019-03-19 11:21:41 +00:00

Tidy is no longer configurable in MW 1.35

Remove use of deprecated MWTidy::isEnabled() and internal
MWTidy::singleton() methods.  See I3584181070da7ed4888beaaf04e083114aca1eab
for context.

Bug: T198214
Change-Id: I511068cc7b2398773a837f66e08def206cbb5626
2020-05-02 05:25:10 +00:00

Fix truncate code potentially removing whitespace from extract

By turning the (?:…) into (?=…) they become lookaheads and are not
part of the returned string in $tail any more. This is exactly what we
want here. All we want is to *know* if the dot, question or exclamation
mark is followed by a space. But we don't need the space captured.

Change-Id: I4be715c4c084165e5ab25da77609f12ffce4d385
2019-03-19 17:28:58 +00:00

build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0

Change-Id: I7a3887f4fac7c4e78e0828fca52d0c355357e5b1
2024-03-12 19:22:40 +00:00

Fix API adding ellipsis… when not needed

When the text is short enough to be returned as it is, it's very
confusing to see it with an ellipsis added at the end. There is
no more text. It should not look like there is more text.

Change-Id: I7ef205fde6c358a1cbcbb41346a1c9e2a856d8fd
2021-01-08 08:05:35 +00:00
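
The lookahead change described in the "Fix truncate code potentially removing whitespace from extract" commit can be checked in isolation. This is a small sketch that assumes the earlier pattern used a non-capturing group `(?:[ \n]|$)`, as the commit message implies. The sample text and variable names are illustrative only.

```php
<?php

$text = 'One. Two';

// Non-capturing group: the trailing space is consumed and becomes part of the match
preg_match( '/\.(?:[ \n]|$)/u', $text, $consuming );

// Lookahead: the space is only checked, the match is just the dot
preg_match( '/\.(?=[ \n]|$)/u', $text, $lookahead );

var_dump( $consuming[0] ); // string(2) ". "
var_dump( $lookahead[0] ); // string(1) "."
```

With the lookahead form, `$tail` in `getFirstSentences()` holds only the sentence terminator. The whitespace after it is tested for but never captured.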