69 commits

Author SHA1 Message Date
Andrew Garrett 95f53efdfe Follow-up to r56296, replace htmlspecialchars_decode with html_entity_decode. 2009-09-15 10:25:15 +00:00
Brion Vibber 9bbd4f8bc9 Merge remaining unmerged line of live hacks from r53208 on AbuseFilter 2009-09-14 21:17:09 +00:00
Andrew Garrett 55c83ea218 Add HTML entity decoding to AbuseFilter ccnorm() function 2009-09-14 11:33:44 +00:00
Andrew Garrett 47d513310d Use multibyte-safe string operations in AbuseFilter bug 19333 2009-07-31 11:26:30 +00:00
Andrew Garrett 2eafa9bd66 Bug 19604, backwards-compatibility issues with AbuseFilter count() function. 2009-07-17 16:55:31 +00:00
Andrew Garrett 5cf4cf2d5f Fix Abuse Filter fatals. Resulted from the fact that whenever a regex error was encountered, the error handler was not reset. This error handler was then triggering for any PHP notice, E_STRICT or whatever, causing fatals on Wikimedia 2009-06-18 20:13:52 +00:00
Andrew Garrett db3c0bbe05 Fix regex error handling by returning immediately if error reporting is disabled. 2009-06-17 11:38:31 +00:00
Andrew Garrett 6678b42d8e Remove special-case list handling for contains_any, len, like/in -- breaks backwards-compatibility with old filters. 2009-06-16 14:28:00 +00:00
Andrew Garrett 48bfcc35ee Various code quality fixes for AbuseFilter suggested by Tim Starling in a private email, including bugfixes, memory safeguards, performance improvements, removal of redundant code, consolidation of similar functionality. 2009-05-26 13:08:15 +00:00
Tim Starling da372fdec0 Reverted r49855, r49656, r49401, r49399, r49397. The language converter cannot be used outside the parser at present without generating a large number of bugs, due to global lifetime state variables, inappropriate $wgParser references, etc. Some refactoring needs to be done before it can be used in this way. 2009-05-26 07:46:29 +00:00
Tim Starling 268d72f43b Code formatting and comments. 2009-05-22 06:42:10 +00:00
Andrew Garrett 7e70a0d197 Merge in r49312 from preferences-work -- non preference related performance improvement to the AbuseFilter parser 2009-04-23 03:37:51 +00:00
Philip Tzou 28202160b8 Add a new function named 'convert()', allowing users to convert a string to a specified variant in Abuse Filter, with the support of LanguageConverter, which was updated in r49397. 2009-04-11 10:59:38 +00:00
Victor Vasiliev 128ae5983b Introduce list (non-associative array) support into abuse filter parser. 2009-04-05 17:11:17 +00:00
Victor Vasiliev 258d340fb5 Abuse filter:
* Introduce := operator for setting variables
* Throw an exception when user tries to override built-in variable
* Fix UTF-8 handling in fnmatch() fallback
* Copy three main abuse filters from enwiki to test suite
* Fix update.php integration
2009-04-05 11:47:42 +00:00
Andrew Garrett 7c2a7a2fe0 Support for variable setting with the set_var function, and multiple expressions separated by semicolons (;). In evaluation, the result of the LAST expression will be the return value. 2009-04-01 06:53:18 +00:00
Andrew Garrett ba0b30a054 Add syntax error messages for invalid regexes 2009-04-01 05:56:24 +00:00
Andrew Garrett 3f62707206 String manipulation functions substr, str_replace and strpos for AbuseFilter 2009-04-01 05:05:23 +00:00
Andrew Garrett c597c1915f Add contains_any function, for searching a single haystack for multiple needles. Implemented with FSS with a fallback to a for loop, so it should be really fast. 2009-03-26 02:03:32 +00:00
Andrew Garrett d4d2f4913d Patch by Robert Rohde to prevent empty-string matches of a regex intended to match numbers 2009-03-26 01:30:05 +00:00
Andrew Garrett 20f8b1d16b Properly fix regex munging 2009-03-25 12:43:53 +00:00
Andrew Garrett 1bb05bb402 Fix regex munging by not breaking with regexes with already-escaped /s 2009-03-25 12:15:28 +00:00
Andrew Garrett 5e70316a3a Faster brace short-circuit in Abuse Filter Parser. Patch by Robert Rohde. 2009-03-25 11:48:33 +00:00
Andrew Garrett 86e4081206 Abuse Filter Parser:
* Efficiency -- use /A instead of PREG_OFFSET_CAPTURE and comparing offsets.
* Expand error messages to enhance debugging.
* General code quality
2009-03-25 11:36:38 +00:00
Andrew Garrett fa2ef6a6ca Revert half-done patch from r48802 2009-03-25 10:57:46 +00:00
Andrew Garrett 91d501a4e0 Remove OBSOLETE file for PasswordReset 2009-03-25 10:55:43 +00:00
Andrew Garrett cf6f2899f6 Follow-up to r48674. 2009-03-22 10:34:54 +00:00
Andrew Garrett de32554f33 Fix remote execution vulnerability (exploitable only by admins) 2009-03-22 10:31:26 +00:00
Andrew Garrett 2495c5fcf7 Optimise rmdoubles by replacing its entire code with a single regex. Benchmarking shows it's up to 20 times faster. 2009-03-22 02:39:34 +00:00
Andrew Garrett 12f62fdea4 Fix another annoying bug 2009-03-19 00:18:03 +00:00
Andrew Garrett 33a83c67a2 Some fixes for r48545 2009-03-19 00:07:29 +00:00
Andrew Garrett e2ad3830a0 New short-circuiting of expensive operations when a boolean op means that the result won't matter 2009-03-18 23:28:35 +00:00
Andrew Garrett 1f4f45f8f2 Again revert accidentally-committed half-done code 2009-03-16 08:24:20 +00:00
Andrew Garrett 334582b645 Fix weird bug occurring in corrupted databases. 2009-03-16 08:21:24 +00:00
Andrew Garrett a8a4d7fc5a Revert half-done code introduced in r48372 2009-03-13 08:11:43 +00:00
Andrew Garrett 0e070fac7f Fix problems with prevention of double warnings 2009-03-13 08:02:05 +00:00
Andrew Garrett 864a73e907 New ip_in_range function 2009-03-09 12:39:52 +00:00
Andrew Garrett 5983a65415 Change escaping handling -- make \d => \d instead of d. It helps with writing regexes. 2009-03-07 01:31:35 +00:00
Andrew Garrett 55b417f517 Add rcount function, same as count except it takes a regex as the needle 2009-03-07 01:26:42 +00:00
Andrew Garrett e60dee6cac Add an interface for extensions to add variables into the variable list (only for ones generated for filtering, for now). Includes an implementation in the TorBlock extension 2009-03-05 02:43:05 +00:00
Andrew Garrett 92698e95ba Improve AbuseFilter performance by implementing lazy initialisation of computed variables.
This has been done by replacing simple associative arrays with an AbuseFilterVariableHolder, which recognises helper classes called AFComputedVariables.
Computation may occur during the abuse filter analysis, or later when testing and reviewing filters.
2009-02-26 12:15:14 +00:00
Andrew Garrett 05ea5b783d Add rmwhitespace function 2009-02-18 19:42:01 +00:00
Andrew Garrett 32d676942d Remove remnants of ctype_, and replace them with appropriate regexes (which, while slower, are locale-safe). 2009-02-11 20:01:00 +00:00
Andrew Garrett 35e61feeb6 Abuse Filter Parser updates
* Deprecate parseTokens in favour of a parse-as-you-go approach, faster and uses less memory.
* Display variables in lower_case so they aren't SHOUTING_AT_PEOPLE.
* Tell people if they try to use variables that don't exist, rather than silently returning NULL.
2009-02-11 20:00:33 +00:00
Andrew Garrett 0880f444b1 Abuse Filter Parser updates:
* Use strcspn to scan ahead for long regions of uninteresting text in string handling (performance).
* Remove cruft specific to my system in phpTest.php.
* Remove a test that was in incorrect syntax, and useless without adding variable support.
2009-02-11 18:23:21 +00:00
Andrew Garrett bfe57be65d Rewrite of Abuse Filter parser tokeniser.
I've made it more performant and fixed a few bugs by using regexes
instead of PHP loops, where possible, under the assumption that the
PCRE parser is more efficient than the same thing implemented in pure PHP.
Also, I'm now passing the same string around and calculating offsets, which
Tim tells me is far more performant than continually truncating the same string.

All tests still pass, with the exception of string.t, which I've modified
to remove the offending code, which never worked.
2009-02-11 01:41:51 +00:00
Andrew Garrett 430c95a60d Make variable names and keywords case-insensitive. 2009-01-30 23:46:25 +00:00
Andrew Garrett 48748d8fa7 Fix use of instance methods in nextToken, which is a static method. 2009-01-27 04:09:53 +00:00
Andrew Garrett 11ab345814 Localise Abuse Filter exceptions. 2009-01-26 23:32:46 +00:00
Andrew Garrett d50a26f04d Explicit detection for division by zero. 2009-01-25 05:54:49 +00:00