ConfirmEdit was tripling the API save time, because it was parsing the
entire content twice to evaluate whether the addurl trigger is hit.
While I was here, I stopped using the deprecated non-Content hooks. The
new hook, EditEditFilterMergedContent, does not pass an EditPage object,
which means that Title or WikiPage objects need to be passed around
instead. Also, since EditPage::showEditForm() cannot be called without
an EditPage object, use the EditPage::showEditForm:fields hook instead.
If non-wikitext content is edited, assume that the regex trigger is not
hit.
For further architectural details, see the associated core change:
I4b4270dd868a. MW_EDITFILTERMERGED_SUPPORTS_API is a constant
introduced to detect the presence of the associated core change.
Also, in APIGetAllowedParams, set the allowed parameters even if we are
not on the help screen. This allows API users to submit their CAPTCHA
answer without it failing with an "unrecognized parameter" error.
Compatibility with MediaWiki 1.21 is retained; compatibility with
earlier versions is dropped.
Change-Id: I9529b7e8d3fc9301c754b28fda185aa3ab36f13e
MediaWiki core change I04b1a384 added support for i18n of API module
help. This takes advantage of that while still maintaining backwards
compatibility with earlier versions of MediaWiki.
Once support for MediaWiki before 1.25 is dropped, the methods marked
deprecated in this patch may be removed.
Change-Id: I67395aff48185f3e09da31b51a08aa2541fe6a17
Adding a gradient defeats naive thresholding, giving a Tesseract break
rate of 0 out of 1000, even if a sensible threshold value is
hand-chosen. Reduced the
text value and noise to make room for the gradient, but kept an SNR of
1.3, as before, which provides good legibility.
Obviously the gradient can be removed with custom preprocessing -- the
point of these changes is to raise the bar from "unconfigured Tesseract"
to "some small amount of developer effort".
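A minimal sketch of why a gradient defeats a single global threshold.
This is not the generator's actual code: the image size, the pixel
values, the choice of a left-to-right background ramp, and all function
names are assumptions made for illustration.

```python
def make_image(width, height, text_pixels, text_value, gradient_span):
    """Return a greyscale image as rows of floats in 0..1.

    The background rises linearly from 0 at the left edge to
    gradient_span at the right edge (an assumed gradient shape); text
    pixels are raised by text_value on top of the background.
    """
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            value = gradient_span * x / (width - 1)
            if (x, y) in text_pixels:
                value += text_value
            row.append(value)
        image.append(row)
    return image


def threshold_errors(image, text_pixels, threshold):
    """Count pixels that one global threshold classifies wrongly."""
    errors = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            is_text = (x, y) in text_pixels
            if (value > threshold) != is_text:
                errors += 1
    return errors


def best_global_threshold_errors(image, text_pixels, steps=100):
    """Try evenly spaced thresholds and return the lowest error count."""
    return min(
        threshold_errors(image, text_pixels, t / steps)
        for t in range(steps + 1)
    )


# Two hypothetical vertical strokes, one near each edge of the image.
strokes = {(1, y) for y in range(4)} | {(18, y) for y in range(4)}

flat = make_image(20, 4, strokes, text_value=0.3, gradient_span=0.0)
graded = make_image(20, 4, strokes, text_value=0.3, gradient_span=0.6)

flat_errors = best_global_threshold_errors(flat, strokes)
graded_errors = best_global_threshold_errors(graded, strokes)
```

On the flat image some threshold separates the strokes from the
background perfectly. On the graded image the background on the right
is brighter than the text on the left, so every global threshold
misclassifies some pixels.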
Change-Id: I30ebc904ca59bf29a2aa812f881a077a13493e68
Tesseract is a popular open source OCR package. Running it on
FancyCaptcha images, with no training or configuration, yielded a 56%
break rate. Restricting the OCR character set raised the break rate to
66%.
So:
* Reduce k, increase wob scale, increase rr fuzz. The net effect of
these three changes is to more reliably bend the baseline. In the old
captcha, the baseline would often be bent by chance, but when it
wasn't bent, it provided a very easy challenge for the OCR engine.
This reduced the break rate from 66% to around 40%.
* Introduce additive noise, based on a bilinear upscale of a random
greyscale image. This, combined with the above change, reduces the
Tesseract break rate to 6%.
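The additive noise step can be sketched roughly as follows. This is an
illustration in plain Python, not the generator's actual code: the cell
size, the amplitude, the clipping to 0..1, and all function names are
assumptions.

```python
import random


def bilinear_upscale(small, factor):
    """Upscale a 2D grid by an integer factor with bilinear interpolation."""
    sh, sw = len(small), len(small[0])
    out_h = (sh - 1) * factor + 1
    out_w = (sw - 1) * factor + 1
    out = []
    for y in range(out_h):
        y0, fy = divmod(y, factor)
        y1 = min(y0 + 1, sh - 1)
        ty = fy / factor
        row = []
        for x in range(out_w):
            x0, fx = divmod(x, factor)
            x1 = min(x0 + 1, sw - 1)
            tx = fx / factor
            # Interpolate along x on the two nearest source rows,
            # then along y between those two results.
            top = small[y0][x0] * (1 - tx) + small[y0][x1] * tx
            bottom = small[y1][x0] * (1 - tx) + small[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out


def smooth_noise(width, height, cell, rng):
    """Build width x height noise from a coarse random greyscale grid."""
    small_w = -(-(width - 1) // cell) + 1   # ceil((width - 1) / cell) + 1
    small_h = -(-(height - 1) // cell) + 1
    small = [[rng.random() for _ in range(small_w)] for _ in range(small_h)]
    big = bilinear_upscale(small, cell)
    return [row[:width] for row in big[:height]]


def add_noise(image, noise, amplitude):
    """Add scaled noise to an image, clipping the result to 0..1."""
    return [
        [min(1.0, max(0.0, p + amplitude * n)) for p, n in zip(irow, nrow)]
        for irow, nrow in zip(image, noise)
    ]


rng = random.Random(0)                      # fixed seed for repeatability
blank = [[0.0] * 30 for _ in range(12)]     # stand-in for a rendered captcha
noise = smooth_noise(30, 12, cell=4, rng=rng)
noisy = add_noise(blank, noise, amplitude=0.5)
```

Because the noise is interpolated from a coarse grid, it varies
smoothly over several pixels instead of flipping pixel by pixel, so it
is not removed by simple per-pixel despeckling.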
Change-Id: I05b5bb6475de9378cd89cce13b1b2f28b32cd405