wikimedia/mediawiki-extensions-AbuseFilter

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/AbuseFilter.git synced 2024-11-24 22:15:26 +00:00

Author	SHA1	Message	Date
Victor Vasiliev	46faa02c49	Fix the associativity of boolean logic operators Change-Id: Icaf0fde0d74064532af4b110faef4014f8303f80	2016-11-06 20:30:07 -05:00
Victor Vasiliev	aa399da279	Implement a tree-caching abuse filter parser This filter is fully functional. The old filter is still enabled by default for a transitional period in case the new one suddenly has issues. Change-Id: I4aea5f00c62420108030e60e79d5bf34e913e95d	2016-09-24 02:53:26 +00:00
Victor Vasiliev	5da98b67bf	Add test coverage for more bizarre features of the filter parser I am pretty sure all of the behavior documented in these tests is a bad idea. It is possible that we can fix it since some of those features are probably unused, but for now those tests will serve as a documentation of the current behavior. Change-Id: Ia2a2f57a538d7aef2ac73fb2e47fe82dd5d5e09a	2016-08-21 18:45:22 -04:00
Kaldari	acd28cb00f	Update tests for AntiSpoof fixes Bug: T29987 Depends-On: Iccb3e50073bbbc2b979cb62dd0e129afd1c2e55f Change-Id: I8bef839b9b9ca5fced94ce6428e769133ede868f	2016-08-13 20:37:43 +00:00
Bartosz Dziewoński	5fc30112c7	Optimize 'count()' function substr_count() is just as fast as looped strpos() when there are no matches, and gets faster as the number of matches increases. Note that this introduces a small change in behavior when the needle is composed of repeated substrings, e.g. 'asdasdasd' or 'aa', and haystack is such that the needle can be matched in overlapping positions, e.g. 'asdasdasdasd' or 'aaaaa'. The old implementation counted overlapping matches, the new one doesn't. I don't think this behavior was intentional and I don't think this change will cause any real problems. Change-Id: Icc905ca34bf08d63e969787a5e3c119d498bf878	2016-04-17 08:32:27 +02:00
Bartosz Dziewoński	7d83540527	Add some tests for behavior of 'count()' function Change-Id: I29a6c91d0780dc9a1eaee6d29d3b1f9c9c708df7	2016-04-17 08:18:29 +02:00
Bartosz Dziewoński	e79b45b71f	Improve ignoring short-circuited operations Previously, 'false & a == b' would actually execute the comparison and count it against the condition limit, while 'false & (a == b)' wouldn't. They behave the same now. mShortCircuit was only checked for the most potentially expensive operations (computing functions and getting variables), all the other operations on bogus values generated by this would be executed and the results ignored later. This probably doesn't noticeably improve performance, but it corrects how the condition limit is counted. Bug: T43693 Change-Id: Id1d5f577b14b6ae6d987ded12689788eb7922474	2016-04-09 16:25:52 +02:00
Bartosz Dziewoński	3b32cf00e9	Improve how the number of conditions is counted With the new behavior, the number of conditions is incremented when: * Evaluating a function * Evaluating a comparison operator (== === != !== < > <= >= =) * Evaluating a keyword (in like matches contains rlike irlike regex) Previously, the number of conditions was incremented when: * Evaluating a function * Entering the comparison operator evaluation mode This resulted in a number of surprising behaviors. In particular: * '(((a == b)))' counted as 4 conditions, not 1 * 'contains_any(a, b, c)' counted as 5 conditions, not 1 * 'a == b == c' counted as 1 condition, not 2 * 'a in b + c in d + e in f' counted as 1 condition, not 3 * 'true' counted as 1 condition, not 0 It is still possible to easily cheat the count by rewriting comparisons as arithmetic operations. I believe this is meant to advise users of the complexity of their rules and not really enforce strict limits. Bug: T132190 Change-Id: I897769db4c2ceac802e3ae5d6fa8e9c9926ef246	2016-04-09 16:16:27 +02:00
Ori Livneh	bab9832415	Move rule tokenization to new AbuseFilterTokenizer class * Move AbuseFilterParser::nextToken() and the various AbuseFilterParser properties that accompanied it to a new class, AbuseFilterTokenizer. * Tokenize rules eagerly and cache the result in APC. Change-Id: I15f5b5b65e8c4ec4fba3000d7c9fd78b98967d1d	2015-08-25 14:00:10 -07:00
Ori Livneh	b388dfab1b	Clean-up of AbuseFilterParser::nextToken() No functional changes. * Don't include $code as part of the return value; it is ignored anyway. * Removed AbuseFilterParser::lastHandledToken and AFPParserState::lastInput, because AbuseFilterParser::nextToken() no longer calls itself recursively. * The regular expression that matches operators is no longer constructed dynamically, but hard-coded into the class. To make sure it does not drift apart from the more legible AbuseFilterParser::$mOps, add a unit test that constructs the regex dynamically as before and compares it to AbuseFilterParser::OPERATOR_RE. * AbuseFilterParser::RADIX_RE ditto. Change-Id: I9c23b60759ed2f4c73a9b480243b16bbce5a208f	2015-08-25 10:50:31 -07:00
Ori Livneh	0e36b728e3	Fix double escaping in AFPData::keywordLike() If we don't map '\-' and '\+' to themselves, the leading slash gets escaped, and the resultant pattern only matches a literal slash. Bug: 67670 Change-Id: Ifa1e3edd6f41985a3bb97bfb1497985f8fa64af5	2014-07-11 14:56:42 -07:00
Marius Hoch	35747761fb	Allow running the AbuseFilter parser tests via phpunit I've also added myself to the credits file as I'm the only maintainer of this extension for a while now. Change-Id: Id998172ea2abd70b8243de9db1a96cc2cfa47a64	2013-07-08 19:22:43 +02:00
jenkins-bot	3c83358506	Merge "Add parser tests for bug 25373"	2013-05-01 21:25:11 +00:00
Kunal Mehta	4bec58cd54	Add a "ucase" function to convert the provided string to uppercase. I basically took the lcase code and tweaked it to work for uppercase. Bug: 47321 Change-Id: I230dbd99c27bf3a4a042befd6d334b4c0439bde0	2013-04-17 11:48:15 -05:00
Marius Hoch	3010d78950	Add parser tests for bug 25373 Change-Id: I2f2524731098f323e61bbc0442e7b56b11cdea37	2013-03-23 21:49:57 +01:00
Marius Hoch	03da29b9da	Fix the abusefilter array parser test The abusefilter array test failed because length( ['a', 'b', 'c'] ) returned 12 instead of 6. That was due to it converting the array to a string with newline-separated values first before measuring the string length. Changed that behaviour to act like the PHP count() function or the Python len() function, which seems far more useful to me. The old behaviour can be established using length( string( array ) ). Change-Id: I16646891837c9743ca5af2dd328077a7225bb5f1	2012-12-20 02:19:55 +01:00
Alexandre Emsenhuber	56e6f0a262	svn:eol-style native	2009-04-09 20:45:31 +00:00
Victor Vasiliev	27fb1303a8	* Use lists instead of implode()d strings in built-in variables wherever it's possible ATTENTION! This may break filters that rely on "added_lines contains 'bla-bla'" syntax. They'll need to be replaced with "string(added_lines) contains 'bla-bla'"	2009-04-05 19:07:47 +00:00
Victor Vasiliev	128ae5983b	Introduce list (non-associated array) support into abuse filter parser.	2009-04-05 17:11:17 +00:00
Victor Vasiliev	258d340fb5	Abuse filter: * Introduce := operator for setting variables * Throw an exception when user tries to override built-in variable * Fix UTF-8 handling in fnmatch() fallback * Copy three main abuse filters from enwiki to test suite * Fix update.php integration	2009-04-05 11:47:42 +00:00
Andrew Garrett	86e4081206	Abuse Filter Parser: * Efficiency -- use /A instead of PREG_OFFSET_CAPTURE and comparing offsets. * Expand error messages to enhance debugging. * General code quality	2009-03-25 11:36:38 +00:00
Andrew Garrett	0880f444b1	Abuse Filter Parser updates: * Use strcspn to scan ahead for long regions of uninteresting text in string handling (performance). * Remove cruft specific to my system in phpTest.php. * Remove a test that was in incorrect syntax, and useless without adding variable support.	2009-02-11 18:23:21 +00:00
Andrew Garrett	bfe57be65d	Rewrite of Abuse Filter parser tokeniser. I've made it more performant and fixed a few bugs by using regexes instead of PHP loops, where possible, under the assumption that the PCRE parser is more efficient than the same thing implemented in pure PHP. Also, I'm now passing the same string around and calculating offsets, which Tim tells me is far more performant than continually truncating the same string. All tests still pass, with the exception of string.t, which I've modified to remove the offending code, which never worked.	2009-02-11 01:41:51 +00:00
Andrew Garrett	53179c675f	Apply changes from change-tagging branch. I will remove all of the stuff actually related to change tagging in a moment, to avoid trunk changes on Wikimedia sites.	2009-01-23 19:23:19 +00:00

24 commits
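
Note on commit 5fc30112c7: the behavior change it describes for 'count()' (overlapping matches counted before, not counted after) can be illustrated with a small sketch. This is a Python analogue, not the extension's PHP code. The function names are hypothetical, and it assumes the old strpos() loop advanced one character after each hit, which is what the commit message's examples imply.

```python
def count_overlapping(haystack: str, needle: str) -> int:
    """Count matches by restarting the search one character after each hit,
    so overlapping matches are counted (analogue of a looped strpos())."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        pos = haystack.find(needle, pos + 1)
    return count


def count_non_overlapping(haystack: str, needle: str) -> int:
    """Count matches by skipping past each hit; Python's str.count does this,
    much like PHP's substr_count()."""
    if not needle:
        raise ValueError("needle must be non-empty")
    return haystack.count(needle)


# The two haystack/needle pairs named in the commit message.
print(count_overlapping("asdasdasdasd", "asdasdasd"))      # 2
print(count_non_overlapping("asdasdasdasd", "asdasdasd"))  # 1
print(count_overlapping("aaaaa", "aa"))                    # 4
print(count_non_overlapping("aaaaa", "aa"))                # 2
```

The two functions agree whenever matches cannot overlap, which is why the commit expects no real-world filters to be affected.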