Use ParserOutput::getRawText() instead of deprecated getText().
This avoids a dependency on MediaWiki\OutputTransform classes,
which call into Skin and lots of other scary code.
Small updates are needed so that we continue stripping the TOC
and section edit links in this format.
This simultaneously reverts two patches that were only needed to work
around problems caused by markup added by OutputTransforms:
* Revert "Skip <h2> in TOC when extracting first section"
This reverts commit 60e1c5ad83.
* Revert "ExtractFormatter: Rescue headings from being removed"
This reverts commit 0fafa44a20.
Change-Id: Ie436576a356d05f2c4c4b84c8c1d82ba70d357d4
I argue that the code fixing unclosed HTML tags is – even if optional –
an integral part of the code that potentially breaks these HTML tags in
the first place. Notice how much code disappears in the ApiQueryExtracts
class.
Additionally, the new approach uses an interface instead of a static
function call that is impossible to mock and hard to test.
Change-Id: Ic1a65995f4dba11d060a8738d642905cbfc79271
Note how only two files mentioned the license before. For consistency
it should be either all or none. Both solutions would be possible. Even
*not* mentioning the license anywhere in these files would be fine from
a legal perspective, as long as the relevant file COPYING is still
there in the root folder of this extension.
The overly long "deed" text does not serve much of a purpose. It's not a
complete, legally relevant license text. It's also hard to read: the fact
that this is "GPL2+" is surprisingly hard to find. The @license tag solves
these problems, and is recognized by documentation generators.
Change-Id: I7844be0c5f4f3d7562156cd9f34fe466552a9c9d
This is a straightforward baseline patch that does nothing but move
existing code around, without touching it. I'm not even trying to
remove the "static" keyword. The actual refactoring will be done in
the next patch. I hope this makes the changes in the refactoring
more visible and much easier to review.
Change-Id: Idba859ec0c24f3622ea8fb8d7a9b11843d1e3827
* Use the ?? operator.
* Use "\u{00A0}" instead of "\xC2\xA0".
Also increase the minimum required MediaWiki version from 1.30 to 1.31
because 1.31 requires PHP 7.
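
As a quick check of the second bullet: U+00A0 (NO-BREAK SPACE) encodes to
the bytes C2 A0 in UTF-8, so both spellings denote the same string. The
sketch below verifies this in Python rather than PHP, purely as an
illustration; the variable names are made up for the example.

```python
# U+00A0 (NO-BREAK SPACE), written as a code point escape.
nbsp = "\u00A0"

# Its UTF-8 encoding is the two-byte sequence that the older
# "\xC2\xA0" literal spelled out by hand.
nbsp_bytes = nbsp.encode("utf-8")

assert nbsp_bytes == b"\xc2\xa0"
assert len(nbsp) == 1        # one code point
assert len(nbsp_bytes) == 2  # two bytes
```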
Change-Id: Ic5c279976f50b381cec65e74b7cc821a210c2173
Also renames $action to $name in ApiQueryExtracts.php, because trying to
document the parameter revealed that "action" doesn't match the use of
the parameter.
Bug: T170580
Change-Id: I1b7f3f0e17b118ea9bcfd28c69321aa692aad4e3
Changes:
- ApiBase::setWarning() is deprecated, use addWarning() instead
- ParserCache::singleton() is deprecated, use MediaWikiServices instead
- Exception import is not used, drop it
- added MediaWiki 1.29 as a requirement
Bug: T166714
Change-Id: Ib81e5acbb28e1f803c7a792b9f990f2aa6d57521
Some space characters, such as &nbsp; or &thinsp;, usually do not indicate
the end of a sentence, so they shouldn't be used as separators.
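
A minimal sketch of the idea, written in Python for illustration and not
taken from the extension: the separator is an explicit list of ordinary
whitespace, so U+00A0 (no-break space) and U+2009 (thin space) never end a
sentence. The pattern and the function name split_sentences are
assumptions made for this example.

```python
import re

# Hypothetical separator set: only plain ASCII whitespace may follow the
# sentence-ending punctuation. Python's \s would also match U+00A0 and
# U+2009, which is exactly what this sketch wants to avoid.
SENTENCE_END = re.compile(r"(?<=[.!?])[ \t\r\n]+")


def split_sentences(text: str) -> list:
    """Split text into sentences at ordinary whitespace only."""
    return [part for part in SENTENCE_END.split(text) if part]


# "p." is followed by a thin space, so no split happens there.
sample = "See p.\u200912 for details. Done."
assert split_sentences(sample) == ["See p.\u200912 for details.", "Done."]
```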
Bug: T115817
Change-Id: Ieb56b0ef723dd299f848ea88b66613d92977bef0
Disables sentence termination at a full stop preceded by a capital
letter, which is likely to be an initial.
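
The heuristic can be sketched as follows. This is a Python illustration
with an assumed pattern and a hypothetical function name, not the regex
used by the extension: a negative lookbehind rejects any split point where
the full stop directly follows a capital letter.

```python
import re

# Split after ., ! or ? followed by whitespace, unless the two characters
# before the whitespace are a capital letter plus a full stop ("J. ").
SPLIT_POINT = re.compile(r"(?<=[.!?])(?<![A-Z]\.)\s+")


def split_keeping_initials(text: str) -> list:
    """Split into sentences, treating 'X.' as an initial, not an ending."""
    return [part for part in SPLIT_POINT.split(text) if part]


text = "J. R. R. Tolkien wrote it. He was English."
assert split_keeping_initials(text) == [
    "J. R. R. Tolkien wrote it.",
    "He was English.",
]
```

The trade-off is that a sentence that really ends in a capital letter, such
as "See plan B. Then go.", is no longer split at that point.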
Bug: T115795
Change-Id: Ibf38e87823155c704ffb106642944cbd05e3f632
Allows sentences to end with numbers before a full stop in query
extractsentences.
Also added some more unit tests.
Bug: T118621
Change-Id: I9cbf487601d4165b490696d38d5fcbcf6d8f4637
In the spirit of escaping as close to the output as possible, ensure
that the number of sentences is an integer before using it in a regex.
Just in case someone changes the API's param definition.
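
A hedged sketch of the pattern, in Python instead of the extension's PHP:
the count is coerced to an integer at the point where it is interpolated
into the regex, so a non-numeric value fails loudly instead of changing the
pattern. The function name first_sentences and the regex itself are
assumptions made for this example.

```python
import re


def first_sentences(text: str, count) -> str:
    """Return up to `count` leading sentences of `text`."""
    # Coerce right before building the pattern; int() raises ValueError
    # for anything that is not a plain number, such as "2}|.*".
    n = int(count)
    if n < 1:
        raise ValueError("count must be at least 1")
    # A sentence ends at ., ! or ? followed by whitespace or end of text.
    pattern = re.compile(r"^(?:.*?[.!?](?=\s|$)){1,%d}" % n, re.DOTALL)
    match = pattern.match(text)
    return match.group(0) if match else text


assert first_sentences("One. Two. Three.", 2) == "One. Two."
assert first_sentences("One. Two. Three.", "2") == "One. Two."
```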
Change-Id: I406d6ed365ecd53bd8f56a09218a7e1403fe0fa9
... so that per-span information for different languages, i.e. lang and
dir attributes, isn't lost.
Bug: T59582
Change-Id: If1b04714fdc0f4d581ddb858d8d53f6f340dc10b