
20 commits

Author SHA1 Message Date
jenkins-bot fbe7379738 Merge "The last sentence of the paragraph was lost." 2016-04-14 00:00:21 +00:00
Max Semenik 9bc33683a0 Switch to librarized HtmlFormatter
Bug: T125001
Change-Id: Iac73553ac4b03e75ef321c6a659ece1ac155260b
2016-04-12 21:23:01 -07:00
Sergey Leschina ae7fe951f1 The last sentence of the paragraph was lost.
Change-Id: I963ca71b73dc7396156e8b5fcf5d2952e4abbc05
2016-04-11 02:08:14 +03:00
Sergey Leschina 472d84c9de Fix separation of text into sentences.
Some space characters, such as &nbsp; or &thinsp;, usually do not indicate the end of a sentence, so they shouldn't be used as separators.

Bug: T115817
Change-Id: Ieb56b0ef723dd299f848ea88b66613d92977bef0
2016-04-01 10:49:17 +03:00
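The idea in the commit above can be illustrated with a minimal Python sketch. This is not the extension's PHP code; the `SENTENCE_BREAK` pattern and the `split_sentences` helper are hypothetical names, and the separator set is an assumption. The point it shows: in Python 3, `\s` also matches U+00A0 (no-break space) and U+2009 (thin space), so the sketch lists the accepted separators explicitly.

```python
import re

# Assumed separator set: only a plain space, tab or newline may end a sentence.
# U+00A0 (no-break space) and U+2009 (thin space) are deliberately left out.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])[ \t\n]+")


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by plain whitespace."""
    return [part for part in SENTENCE_BREAK.split(text) if part]


# The no-break space after "5." does not start a new sentence.
print(split_sentences("It costs 5.\u00a0000 units. Next one."))
```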
Kunal Mehta 0664ddbf94 Add missing use statement
Removed another unused one, and cleaned up the doc block.

Bug: T121283
Change-Id: I1a8b9920152e6d52ffb59de385fcc29c92f33c92
2015-12-11 15:11:43 -08:00
mhutti1 80703452ed Converted TextExtracts to new extension registration system
Moved most of TextExtracts.php to the new extension.json
and added a method for backward-compatible loading
of the extension if it is still called through the PHP file. Moved the
unit test hook to Hooks.php and deleted the old i18n.php.

Bug: T87979
Change-Id: I3d26bd931ad2941268b94474f3e6327282da24ec
2015-12-10 22:59:49 +01:00
Sumit Asthana 13d6592978 TextExtracts do not crop after initials
Disables sentence termination at a full stop preceded by a capital
letter, which is likely to be an initial.

Bug: T115795
Change-Id: Ibf38e87823155c704ffb106642944cbd05e3f632
2015-12-03 07:11:36 +05:30
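A rough Python sketch of the rule described in the commit above, assuming a regex-based splitter. The pattern and the `split_sentences` name are hypothetical and do not reproduce the extension's actual expression. A negative lookbehind refuses to break after a capital letter followed by a full stop.

```python
import re

# Break after sentence punctuation and whitespace, unless the full stop
# directly follows a capital letter (treated here as a likely initial).
SENTENCE_BREAK = re.compile(r"(?<=[.!?])(?<![A-Z]\.)\s+")


def split_sentences(text):
    """Split text into sentences without cropping after initials."""
    return [part for part in SENTENCE_BREAK.split(text) if part]


print(split_sentences("J. R. R. Tolkien wrote books. He was English."))
```

One limitation of this sketch: a sentence that really ends in a capital letter (for example an acronym such as "NASA.") is not split either.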
Sumit Asthana d83ac976e3 TextExtracts allow sentence end with numbers
Allows sentences to end with numbers before a full stop in query
extractsentences.

Also added some more unit tests.

Bug: T118621
Change-Id: I9cbf487601d4165b490696d38d5fcbcf6d8f4637
2015-11-18 20:11:20 -06:00
Kunal Mehta 36d1b4f3c4 Use page_touched in cache key instead of page_latest
Because the extracts depend upon template inclusion, to make sure
the extract is properly updated whenever the page's dependencies change,
use the page_touched timestamp instead of the latest revision id.

Since we're changing the cache key format, remove the 'mf' prefix from
back when it was still in MobileFrontend.

As a side-effect, this will also make action=purge invalidate the cache
since it updates page_touched.

Bug: T117322
Change-Id: Ib6f415c756c57caf6c83be495a4f229446e8b61e
2015-10-31 22:00:51 -07:00
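To illustrate why the commit above keys the cache on page_touched, here is a small Python sketch. The key layout, the `textextracts` prefix and the `extract_cache_key` function are assumptions for illustration only, and the timestamps are made-up values. A key built from the touched timestamp changes when a dependency such as a template is updated, even though the latest revision id stays the same.

```python
def extract_cache_key(page_id, page_touched, flavor):
    """Build a cache key that changes whenever page_touched changes.

    page_touched is assumed to be a timestamp string; flavor stands in for
    whatever request options distinguish one cached extract from another.
    """
    return ":".join(["textextracts", str(page_id), page_touched, flavor])


# Same page and same latest revision, but a template edit (or a purge)
# bumped page_touched, so the old cache entry is no longer looked up.
before = extract_cache_key(42, "20151031050000", "plain")
after = extract_cache_key(42, "20151031053000", "plain")
print(before != after)
```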
Matthew Flaschen 63b358fca2 SECURITY: Disallow extracts for non-wikitext for now.
Note that the sensitive information is still in the TextExtracts
memcached, so this requires security review (and either eviction
or a cache key change) before enabling other content models.

Bug: T107170
Change-Id: I57642e84db39d585c5b04453f86102b10fb69cdf
(cherry picked from commit f5c114c571)
2015-08-04 00:08:43 +00:00
jenkins-bot 0285c9e033 Merge "Ensure sentences is an int" 2015-07-15 20:35:34 +00:00
Ori Livneh 7c1ea48971 Update for rename of WikiPage::isParserCacheUsed() in I7de67937f0
Make the code compatible with both the old name (WikiPage::isParserCacheUsed)
and new name (WikiPage::shouldCheckParserCache).

Change-Id: If5d5da8eab132eb6d60f7141884ed2aeaa46e444
2015-06-22 20:44:23 -07:00
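The compatibility approach in the commit above can be sketched in Python as a check for the new method name with a fallback to the old one. `OldPage`, `NewPage` and `check_parser_cache` are hypothetical stand-ins, and the argument list is illustrative, not the real WikiPage signature.

```python
class OldPage:
    """Stand-in for a page object that only has the old method name."""

    def isParserCacheUsed(self, options, old_id):
        return "old"


class NewPage:
    """Stand-in for a page object that has the renamed method."""

    def shouldCheckParserCache(self, options, old_id):
        return "new"


def check_parser_cache(page, options, old_id):
    """Call the new method when it exists, otherwise the old one."""
    if hasattr(page, "shouldCheckParserCache"):
        return page.shouldCheckParserCache(options, old_id)
    return page.isParserCacheUsed(options, old_id)


print(check_parser_cache(OldPage(), None, 0), check_parser_cache(NewPage(), None, 0))
```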
Brad Jorsch 95002e7a59 Further cleanup for core API change
PS25 and later changed things around a fair bit, meaning the previous update
needs some further updating. In some cases additional cleanup is also necessary
for future core API changes.

Bug: T96595
Change-Id: I1573e523cf3c945fca95d8d2db002f5abcdbb29d
2015-04-20 14:41:29 -04:00
csteipp 97495d1ff3 Ensure sentences is an int
In the spirit of escaping as close to the output as possible, ensure
that the number of sentences is an integer before using it in a regex.
Just in case someone changes the API's param definition.

Change-Id: I406d6ed365ecd53bd8f56a09218a7e1403fe0fa9
2015-03-24 12:54:54 -07:00
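A minimal Python sketch of the defensive step described in the commit above: coerce the sentence count to an integer immediately before it is interpolated into a regex. The pattern and the `first_sentences` helper are assumptions for illustration, not the extension's code.

```python
import re


def first_sentences(text, sentences):
    """Return at most `sentences` leading sentences of `text`."""
    # Coerce right before use: int() raises ValueError for input such as
    # "2}|(.*", so no unexpected text can reach the regex quantifier.
    count = int(sentences)
    if count < 1:
        return ""
    pattern = re.compile(r"^(?:.*?[.!?](?:\s+|$)){1,%d}" % count, re.S)
    match = pattern.match(text)
    # Fall back to the whole text when no sentence terminator is found.
    return match.group(0).strip() if match else text


print(first_sentences("One. Two. Three.", "2"))
```

The count is validated next to the place where it is used, so the function stays safe even if the caller's own parameter validation changes later.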
Brad Jorsch c3eb02a9a6 Update ApiResult handling for mediawiki/core change I7b37295e
Change I7b37295e for mediawiki/core deprecates several methods, and more
importantly changes the format of the data returned from
ApiResult::getData(). This change should handle these differences in a
backwards-compatible manner.

Change-Id: I7b37295e8862b188d1f3b0cd07f66ac34629678e
2015-02-17 14:37:22 -05:00
Chad Horohoe d9869ef8d0 Remove obvious function-level profiling
Change-Id: I0c272eb337566eff28d46d198c9aa065ffdbddb2
2015-02-11 08:49:13 -08:00
jenkins-bot 1c58fd6df9 Merge "Don't flatten spans" 2015-01-13 20:40:04 +00:00
Sam Smith 59633e2be9 Don't flatten spans
... so that per-span information for different languages, i.e. the lang and
dir attributes, isn't lost.

Bug: T59582
Change-Id: If1b04714fdc0f4d581ddb858d8d53f6f340dc10b
2015-01-13 16:31:01 +00:00
Ori Livneh 23dcce746a MWException -> Exception
Change-Id: If111014ef2d7aea5c72bdcf4600a9067e2e21e00
2015-01-09 19:06:21 -08:00
Max Semenik fbd8e93a8b Reorg: move hooks to a separate class, introduce namespaces
Change-Id: Ic784010e79b1168f0e112cf912f463036255eb64
2014-12-31 15:05:19 -08:00