Script fails on macOS Python 3.9.2 otherwise, with "name 'output' is not
defined".
Change-Id: Id3df19b4a4dedf69f860f1a41348770ac1207377
(cherry picked from commit d1bc02e2fb)
Also remove the int() casting from captcha-old.py, which was added
in 66152162fe but diverges from captcha.py.
Most of the code in these two files should be the same, bar the
actual image processing.
Bug: T263223
Follows-Up: If4f6bc9048aceacc41538c001255425e848fd8e9
Change-Id: I3062e64fc380022ca9fee793bc522f212eb873d3
This defeats naive thresholding, giving a Tesseract break rate of 0 out
of 1000, even if a sensible threshold value is hand-chosen. Reduced the
text value and noise to make room for the gradient, but kept an SNR of
1.3, as before, which provides good legibility.
Obviously the gradient can be removed with custom preprocessing -- the
point of these changes is to raise the bar from "unconfigured Tesseract"
to "some small amount of developer effort".
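
A toy illustration of why a gradient defeats a single global threshold.
This is not code from captcha.py: the function names, pixel values and
gradient span are all assumptions chosen for the example, and noise is
left out to keep the effect visible.

```python
# Hypothetical sketch, not taken from captcha.py.

def make_row(width, text_cols, background=200, text_drop=60, gradient_span=120):
    """Return one row of greyscale values with a left-to-right gradient.

    Text pixels are darker than the local background by text_drop.
    The gradient darkens the row linearly by gradient_span in total.
    """
    row = []
    for x in range(width):
        ramp = -gradient_span * x / (width - 1)
        value = background + ramp
        if x in text_cols:
            value -= text_drop
        row.append(value)
    return row


def separable_by_global_threshold(row, text_cols):
    """True if one cut-off puts every text pixel below every background pixel."""
    text = [v for x, v in enumerate(row) if x in text_cols]
    background = [v for x, v in enumerate(row) if x not in text_cols]
    return max(text) < min(background)


text_cols = {2, 50, 97}
flat = make_row(100, text_cols, gradient_span=0)
sloped = make_row(100, text_cols)
print(separable_by_global_threshold(flat, text_cols))
print(separable_by_global_threshold(sloped, text_cols))
```

Once the gradient span exceeds the text contrast, background pixels at the
dark end are darker than text pixels at the bright end, so any fixed
cut-off misclassifies one or the other.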
Change-Id: I30ebc904ca59bf29a2aa812f881a077a13493e68
Tesseract is a popular open source OCR package. Running it on
FancyCaptcha images, with no training or configuration, yielded a 56%
break rate. Restricting the character set raised the OCR break rate
to 66%.
So:
* Reduce k, increase wob scale, increase rr fuzz. The net effect of
these three changes is to more reliably bend the baseline. In the old
captcha, the baseline would often be bent by chance, but when it
wasn't bent, it provided a very easy challenge for the OCR engine.
This reduced the break rate from 66% to around 40%.
* Introduce additive noise, based on a bilinear upscale of a random
greyscale image. This, combined with the above change, reduces the
Tesseract break rate to 6%.
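
A rough sketch of the additive-noise idea, assuming a greyscale image held
as a list of rows. It uses only the standard library. The function names,
the cell size and the amplitude are illustrative assumptions, not the
values or the implementation used in captcha.py.

```python
import random


def bilinear_noise(width, height, cell=8, amplitude=1.0, rng=None):
    """Return height x width noise made by bilinearly upscaling a coarse
    random greyscale grid. 'cell' is the upscale factor (an assumption)."""
    rng = rng or random.Random()
    # One extra row and column so every pixel has four grid neighbours.
    grid_w = width // cell + 2
    grid_h = height // cell + 2
    grid = [[rng.random() * amplitude for _ in range(grid_w)]
            for _ in range(grid_h)]
    out = []
    for y in range(height):
        gy, fy = y // cell, (y % cell) / cell
        row = []
        for x in range(width):
            gx, fx = x // cell, (x % cell) / cell
            # Interpolate along x on the two bracketing grid rows, then along y.
            top = grid[gy][gx] * (1 - fx) + grid[gy][gx + 1] * fx
            bottom = grid[gy + 1][gx] * (1 - fx) + grid[gy + 1][gx + 1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def add_noise(image, noise):
    """Add noise to the image pixel by pixel, clamping to the 0..255 range."""
    return [[max(0, min(255, int(round(p + n))))
             for p, n in zip(image_row, noise_row)]
            for image_row, noise_row in zip(image, noise)]


blank = [[128] * 40 for _ in range(20)]
noisy = add_noise(blank, bilinear_noise(40, 20, amplitude=50))
```

Because the noise varies smoothly over several pixels, it behaves more like
uneven lighting than like speckle, which is harder to remove with a simple
blur or median filter.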
Change-Id: I05b5bb6475de9378cd89cce13b1b2f28b32cd405
If a blacklist is not specified using the '--blacklist' command-line option,
captcha.py will use a default blacklist, included in this patch-set.
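
The fallback can be sketched as below, assuming the optparse-based option
handling used by the script. The helper names and the default file name are
hypothetical and may differ from what captcha.py actually does.

```python
from optparse import OptionParser


def build_parser(default_blacklist):
    """Build a parser whose --blacklist option falls back to a default path."""
    parser = OptionParser()
    parser.add_option(
        "--blacklist",
        dest="blacklist",
        default=default_blacklist,
        help="file of words to avoid, one per line (default: %default)",
    )
    return parser


def read_blacklist(path):
    """Return the lowercased, non-empty lines of the blacklist file."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip().lower() for line in handle if line.strip()]


# "blacklist" is a hypothetical default file name.
parser = build_parser("blacklist")
options, args = parser.parse_args([])
print(options.blacklist)
```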
Bug: 21025
Change-Id: I93eeaead4a86b38cf5aa0049ac5e61e5b4935b58
* Use optparse instead of getopt
* Replace deprecated md5 module
* Replace deprecated string module functions with string methods
* More graceful failure
* Allow users to set the font size
* Don't run forever if no valid word combinations can be found
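
A minimal sketch of three of these changes in modern Python: hashlib in
place of the md5 module, string methods in place of string module
functions, and a bounded search for a valid word pair. All names,
parameters and the attempt limit are assumptions for illustration, not the
script's actual code.

```python
import hashlib
import random


def hash_answer(secret, salt, word):
    """hashlib.md5 replaces the old standalone md5 module."""
    data = (secret + salt + word).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def pick_word(words, is_valid, max_attempts=1000, rng=None):
    """Try random two-word combinations, giving up after max_attempts."""
    if not words:
        raise ValueError("word list is empty")
    rng = rng or random.Random()
    for _ in range(max_attempts):
        candidate = rng.choice(words) + rng.choice(words)
        if is_valid(candidate):
            # str.lower() method rather than string.lower(candidate).
            return candidate.lower()
    raise RuntimeError("no valid word combination found in %d attempts"
                       % max_attempts)


word = pick_word(["Apple", "Stone"], lambda w: len(w) <= 10)
print(word, hash_answer("secret", "salt", word))
```

Raising an exception after a fixed number of attempts lets the caller
report the problem and exit, instead of looping indefinitely when the
filters reject every combination.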