Commit graph

12 commits

Author SHA1 Message Date
jenkins-bot fbe7379738 Merge "The last sentence of the paragraph was lost." 2016-04-14 00:00:21 +00:00
Max Semenik 9bc33683a0 Switch to librarized HtmlFormatter
Bug: T125001
Change-Id: Iac73553ac4b03e75ef321c6a659ece1ac155260b
2016-04-12 21:23:01 -07:00
Sergey Leschina ae7fe951f1 The last sentence of the paragraph was lost.
Change-Id: I963ca71b73dc7396156e8b5fcf5d2952e4abbc05
2016-04-11 02:08:14 +03:00
Sergey Leschina 472d84c9de Fix separation of text into sentences.
Some space characters, such as &nbsp; or &thinsp;, usually do not indicate the end of a sentence, so they shouldn't be used as separators.

Bug: T115817
Change-Id: Ieb56b0ef723dd299f848ea88b66613d92977bef0
2016-04-01 10:49:17 +03:00
Sumit Asthana 13d6592978 TextExtracts do not crop after initials
Disables sentence termination at a full stop preceded by a capital
letter, which is likely to be an initial.

Bug: T115795
Change-Id: Ibf38e87823155c704ffb106642944cbd05e3f632
2015-12-03 07:11:36 +05:30
Sumit Asthana d83ac976e3 TextExtracts allow sentence end with numbers
Allows sentences to end with numbers before a full stop in query
extractsentences.

Also added some more unit tests.

Bug: T118621
Change-Id: I9cbf487601d4165b490696d38d5fcbcf6d8f4637
2015-11-18 20:11:20 -06:00
csteipp 97495d1ff3 Ensure sentences is an int
In the spirit of escaping as close to the output as possible, ensure
that the number of sentences is an integer before using it in a regex.
Just in case someone changes the api's param definition.

Change-Id: I406d6ed365ecd53bd8f56a09218a7e1403fe0fa9
2015-03-24 12:54:54 -07:00
Chad Horohoe d9869ef8d0 Remove obvious function-level profiling
Change-Id: I0c272eb337566eff28d46d198c9aa065ffdbddb2
2015-02-11 08:49:13 -08:00
jenkins-bot 1c58fd6df9 Merge "Don't flatten spans" 2015-01-13 20:40:04 +00:00
Sam Smith 59633e2be9 Don't flatten spans
... so that per-span information for different languages, i.e. lang and
dir attributes aren't lost.

Bug: T59582
Change-Id: If1b04714fdc0f4d581ddb858d8d53f6f340dc10b
2015-01-13 16:31:01 +00:00
Ori Livneh 23dcce746a MWException -> Exception
Change-Id: If111014ef2d7aea5c72bdcf4600a9067e2e21e00
2015-01-09 19:06:21 -08:00
Max Semenik fbd8e93a8b Reorg: move hooks to a separate class, introduce namespaces
Change-Id: Ic784010e79b1168f0e112cf912f463036255eb64
2014-12-31 15:05:19 -08:00
Renamed from ExtractFormatter.php
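
The sentence-handling commits above (472d84c9de, 13d6592978, d83ac976e3, 97495d1ff3) describe rules that can be illustrated together. The following is a minimal Python sketch of those rules, not the extension's actual PHP code: the function name `first_sentences`, the `SEPARATOR` constant, and the regular expression itself are assumptions made for illustration only.

```python
import re

# Assumption: only plain ASCII whitespace separates sentences. U+00A0
# (no-break space) and U+2009 (thin space) are deliberately left out,
# so they never end a sentence.
SEPARATOR = r"[ \t\r\n]"


def first_sentences(text, count):
    """Return up to `count` leading sentences of `text` (hypothetical helper)."""
    # Coerce to int right before interpolating into the pattern, so a
    # non-numeric value raises ValueError instead of altering the regex.
    count = int(count)
    if count < 1:
        return ""
    # A full stop ends a sentence unless it follows a single capital letter
    # (likely an initial). Digits before the full stop are allowed.
    end = r"(?:(?<!\b[A-Z])\.|[?!])"
    pattern = re.compile(
        r"(?:.+?%s(?=%s|$)%s*){1,%d}" % (end, SEPARATOR, SEPARATOR, count),
        re.DOTALL,
    )
    match = pattern.match(text)
    # If no terminator is found, fall back to the whole text.
    return match.group(0).strip() if match else text


print(first_sentences("J. R. R. Tolkien was born in 1892. He wrote books.", 1))
print(first_sentences("One. Two. Three.", "2"))
```

Listing the separator characters explicitly, rather than using `\s`, matters in Python because `\s` on a `str` pattern also matches the no-break and thin spaces.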