13 commits

Author SHA1 Message Date
Tim Starling 6f286e52db Add a gradient to FancyCaptcha
This defeats naive thresholding, giving a Tesseract break rate of 0 out of
1000, even if a sensible threshold value is hand-chosen. Reduced the
text value and noise to make room for the gradient, but kept an SNR of
1.3, as before, which provides good legibility.

Obviously the gradient can be removed with custom preprocessing -- the
point of these changes is to raise the bar from "unconfigured Tesseract"
to "some small amount of developer effort".

Change-Id: I30ebc904ca59bf29a2aa812f881a077a13493e68
2014-09-26 10:41:26 +10:00
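
Why a gradient defeats a single global threshold can be illustrated on one row of pixels. This is a minimal sketch, not the code from captcha.py: the function names, the one-dimensional row, and the numeric values are all assumptions chosen for illustration.

```python
def linear_gradient(width, start, end):
    """Return `width` background values ramping linearly from start to end."""
    if width == 1:
        return [float(start)]
    return [start + (end - start) * x / (width - 1) for x in range(width)]


def separable_by_global_threshold(background, text_mask, text_delta):
    """Check whether one threshold splits text pixels from background pixels.

    `text_mask` holds 1 where a glyph covers the pixel and 0 elsewhere.
    Both classes must be present in the mask.
    """
    pixels = [b + text_delta * m for b, m in zip(background, text_mask)]
    text = [p for p, m in zip(pixels, text_mask) if m]
    plain = [p for p, m in zip(pixels, text_mask) if not m]
    # A global threshold works only if the two value ranges do not overlap.
    return min(text) > max(plain) or max(text) < min(plain)


# Hypothetical values: a flat background versus a ramp wider than the text delta.
flat = [100.0] * 4
ramp = linear_gradient(4, 0, 200)
print(separable_by_global_threshold(flat, [0, 1, 0, 1], 50))
print(separable_by_global_threshold(ramp, [0, 1, 0, 0], 50))
```

Once the ramp spans more than the text contrast, some background pixels are brighter than some text pixels, so any fixed threshold misclassifies part of the image. Removing the ramp first (the "custom preprocessing" mentioned above) restores separability.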
Tim Starling df4806c64c Improve FancyCaptcha resistance to OCR
Tesseract is a popular open source OCR package. Running it on
FancyCaptcha images, with no training or configuration, yielded a 56%
break rate. Restricting the character set raised the OCR break rate
to 66%.

So:
* Reduce k, increase wob scale, increase rr fuzz. The net effect of
  these three changes is to more reliably bend the baseline. In the old
  captcha, the baseline would often be bent by chance, but when it
  wasn't bent, it provided a very easy challenge for the OCR engine.
  This reduced the break rate from 66% to around 40%.
* Introduce additive noise, based on a bilinear upscale of a random
  greyscale image. This, combined with the above change, reduces the
  Tesseract break rate to 6%.

Change-Id: I05b5bb6475de9378cd89cce13b1b2f28b32cd405
2014-09-26 08:47:55 +10:00
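
The additive noise described above (a small random greyscale image, upscaled bilinearly and added to the captcha) could look roughly like the following. This is a pure-Python sketch under stated assumptions: the grid size, the amplitude, the 0 to 255 pixel range, and all function names are hypothetical, and the actual script may rely on an imaging library instead.

```python
import random


def random_grid(cols, rows, rng):
    """Small random greyscale image with values in [0.0, 1.0)."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def bilinear_upscale(grid, out_w, out_h):
    """Upscale a 2-D list of floats to out_w x out_h by bilinear interpolation."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        # Map the output row to a fractional source row.
        fy = y * (rows - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, rows - 1)
        ty = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (cols - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, cols - 1)
            tx = fx - x0
            # Blend horizontally on the two source rows, then vertically.
            top = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx
            bottom = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out


def add_noise(image, noise, amplitude):
    """Add amplitude * noise to each pixel, clamped to [0, 255]."""
    return [
        [max(0.0, min(255.0, p + amplitude * n)) for p, n in zip(irow, nrow)]
        for irow, nrow in zip(image, noise)
    ]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
noise = bilinear_upscale(random_grid(4, 3, rng), 16, 12)
noisy = add_noise([[128.0] * 16 for _ in range(12)], noise, 60)
```

Upscaling a coarse grid yields smooth, blob-like noise rather than per-pixel speckle.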
emufarmers cd7106e0d9 Add a default blacklist for FancyCaptcha.
If a blacklist is not specified using the '--blacklist' command-line option,
captcha.py will use a default blacklist, included in this patch-set.

Bug: 21025
Change-Id: I93eeaead4a86b38cf5aa0049ac5e61e5b4935b58
2013-06-12 20:52:27 -07:00
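
The fallback described in this commit (use the file given with '--blacklist' if present, otherwise a bundled default) might be sketched as below. The helper names, the one-entry-per-line file format, and the case-insensitive substring match are assumptions for illustration and are not taken from captcha.py.

```python
def choose_blacklist_path(option_value, default_path):
    """Prefer the path given on the command line; otherwise use the default."""
    return option_value if option_value else default_path


def read_blacklist(lines):
    """Parse one entry per line, ignoring blank lines and surrounding whitespace."""
    return [line.strip().lower() for line in lines if line.strip()]


def is_blacklisted(word, blacklist):
    """True if any blacklist entry occurs inside the word (case-insensitive)."""
    lowered = word.lower()
    return any(entry in lowered for entry in blacklist)


# Hypothetical usage with in-memory lines instead of a real file.
entries = read_blacklist(["bad\n", "\n", "  Worse \n"])
print(is_blacklisted("NotBadAtAll", entries))
```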
Aaron Schulz 3b987fb098 Fixed blacklist param for captcha script.
* Also closes the word list file handles.

Change-Id: I4feac2eca6ed29756b6fcbd38acc5e16bc26b2ce
2012-12-18 13:20:42 -08:00
Platonides 1f740061f5 Support generation of random challenges instead of wordlist-based ones.
Change-Id: Ib23dc4c7baab67184d25b15b4dde9b2a1f879924
2012-11-01 16:56:48 +01:00
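
A random challenge with a bounded length, as opposed to one built from wordlist entries, could be generated along these lines. This is only a sketch: the function name, the lowercase ASCII alphabet, and the parameter names are assumptions rather than the script's actual options.

```python
import random
import string


def random_challenge(min_length, max_length, rng, alphabet=string.ascii_lowercase):
    """Return a random string whose length lies in [min_length, max_length]."""
    length = rng.randint(min_length, max_length)  # both bounds inclusive
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random()
print(random_challenge(5, 8, rng))
```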
Platonides 8cfde58450 Allow generating challenges from more than 2 wordlist entries.
Change-Id: I94a84e703a4072eb083177158690de190bee53fa
2012-11-01 15:49:37 +00:00
Platonides 008c232ca6 Provide parameters for setting the min/max length of the captcha "word" (challenge).
Change-Id: Ic2968cec884534dfa8ae7479589a1622a4db7de0
2012-11-01 15:48:18 +00:00
Sam Reed 0324318073 Correct the address of the FSF in extension GPL headers
59 Temple Place -> 51 Franklin Street
2010-06-21 13:45:17 +00:00
Alex Z. f81c299c27 Various code cleanups for the captcha generating script
* Use optparse instead of getopt
* Replace deprecated md5 module
* Replace deprecated string module functions with string methods
* More graceful failure
* Allow users to set the font size
* Don't run forever if no valid word combinations can be found
2009-09-08 01:11:52 +00:00
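
The last item in the list above, giving up when no valid word combination exists, amounts to bounding the retry loop. A minimal sketch follows; the function name, the `max_tries` default, and the validity callback are hypothetical and not taken from the script.

```python
import random


def pick_word_pair(words, is_valid, max_tries=1000, rng=random):
    """Try up to max_tries random pairs; return None if none is valid.

    `words` must be non-empty, since choosing from an empty list raises IndexError.
    """
    for _ in range(max_tries):
        candidate = rng.choice(words) + rng.choice(words)
        if is_valid(candidate):
            return candidate
    return None  # the caller can report the failure and exit cleanly


# Hypothetical usage: no pair of these words is ever 20 characters long.
result = pick_word_pair(["cat", "dog"], lambda w: len(w) == 20, max_tries=50)
print(result)
```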
Greg Sabino Mullane 9aafa888c0 Skip words if they don't contain all letters.
2008-01-07 03:28:38 +00:00
Brion Vibber b2e474ebba Optional blacklist for word pair generation
2007-06-29 19:57:01 +00:00
Brion Vibber 74e3c3bb9f Add options to break up the captcha image storage with hash-digit subdirectories to avoid trawling through a giant directory on every hit
2007-02-19 20:09:03 +00:00
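
Hash-digit subdirectories spread files across many small directories by using the leading hex digits of a hash as nested directory names. The sketch below assumes an MD5 of the file name and a configurable number of levels; both choices, and the function name, are illustrative assumptions rather than the script's documented behaviour.

```python
import hashlib
import posixpath


def sharded_path(base_dir, filename, levels):
    """Place filename under `levels` nested one-hex-digit directories."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i] for i in range(levels)]
    return posixpath.join(base_dir, *parts, filename)


print(sharded_path("captcha", "abc", 2))
```

With two levels there are 16 x 16 = 256 leaf directories, so each holds roughly 1/256 of the images.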
Brion Vibber cf1c61a3bd Captcha generating script by Neil Harris
with some tweaks for command-line options
Requires Python Imaging Library, a word list file, and a TrueType font.
2006-01-27 10:22:37 +00:00