Commit graph

128 commits

Author SHA1 Message Date
Victor Vasiliev aa399da279 Implement a tree-caching abuse filter parser
This filter is fully functional.  The old filter is still enabled by
default for a transitional period in case the new one suddenly has
issues.

Change-Id: I4aea5f00c62420108030e60e79d5bf34e913e95d
2016-09-24 02:53:26 +00:00
Paladox b3081e1798 Update mediawiki/mediawiki-codesniffer to 0.5.1
Change-Id: I4b2055a76db4362a8136e3fd595228cf07d083a9
2016-08-23 16:18:27 +02:00
Max Semenik 817b57a8fe Remove PHP 5.4 compatibility hacks
Change-Id: Id26a014478e5ec550bd81a8f660ddca7f6fd93f8
2016-07-28 15:35:51 -07:00
Brad Jorsch 8ee834c376 AFPUserVisibleException should log in English, not the user's language
The user's language should only be used when the exception is actually
displayed to the user.

This will also avoid "User::loadFromSession called before the end of
Setup.php" warnings when the syntax error is encountered during filter
execution for account autocreation, where we don't display it to the
user.

Bug: T124367
Change-Id: Ic17f56aecbe575ef15c6970c4298f889249e1904
2016-07-20 15:28:27 +00:00
Bartosz Dziewoński 3283e2f7b8 Optimize 'rcount()' function
Starting with PHP 5.4, preg_match_all() allows omitting the '$matches'
argument. This lets PHP avoid computing them, just count the matches.

I'm not sure how various versions of HHVM handle this, so I made it
just test whether this works rather than check version numbers.

Change-Id: Ib27505c8355109c6e6a9f1c4631281a0f0caaa75
2016-04-27 18:01:56 +02:00
Bartosz Dziewoński 5fc30112c7 Optimize 'count()' function
substr_count() is just as fast as looped strpos() when there are no
matches, and gets faster as the number of matches increases.

Note that this introduces a small change in behavior when the needle
is composed of repeated substrings, e.g. 'asdasdasd' or 'aa', and
haystack is such that the needle can be matched in overlapping
positions, e.g. 'asdasdasdasd' or 'aaaaa'. The old implementation
counted overlapping matches, the new one doesn't. I don't think this
behavior was intentional and I don't think this change will cause any
real problems.

Change-Id: Icc905ca34bf08d63e969787a5e3c119d498bf878
2016-04-17 08:32:27 +02:00
Kunal Mehta 5a5844b840 Use $wgExtensionDirectory instead of hardcoding $IP/extensions
Change-Id: Icf28a218d18df2ba90726f8b5a86621043138e48
2016-04-12 09:13:05 -07:00
Bartosz Dziewoński e00909b3c3 Remove unnecessary phpcs overrides
Change-Id: Ic7648c14f2ac2a07878f69e102832215af6855c8
2016-04-12 12:36:06 +00:00
Bartosz Dziewoński e79b45b71f Improve ignoring short-circuited operations
Previously, 'false & a == b' would actually execute the comparison and
count it against the condition limit, while 'false & (a == b)' wouldn't.
They behave the same now.

mShortCircuit was only checked for the most potentially expensive
operations (computing functions and getting variables), all the other
operations on bogus values generated by this would be executed and the
results ignored later.

This probably doesn't noticeably improve performance, but it corrects
how the condition limit is counted.

Bug: T43693
Change-Id: Id1d5f577b14b6ae6d987ded12689788eb7922474
2016-04-09 16:25:52 +02:00
Bartosz Dziewoński 3b32cf00e9 Improve how the number of conditions is counted
With the new behavior, the number of conditions in incremented when:
* Evaluating a function
* Evaluating a comparison operator (== === != !== < > <= >= =)
* Evaluating a keyword (in like matches contains rlike irlike regex)

Previously, the number of conditions was incremented when:
* Evaluating a function
* Entering the comparison operator evaluation mode

This resulted in a number of surprising behaviors. In particular:
* '(((a == b)))' counted as 4 conditions, not 1
* 'contains_any(a, b, c)' counted as 5 conditions, not 1
* 'a == b == c' counted as 1 condition, not 2
* 'a in b + c in d + e in f' counted as 1 condition, not 3
* 'true' counted as 1 condition, not 0

It is still possible to easily cheat the count by rewriting comparisons
as arithmetic operations. I believe this is meant to advise users of
the complexity of their rules and not really enforce strict limits.

Bug: T132190
Change-Id: I897769db4c2ceac802e3ae5d6fa8e9c9926ef246
2016-04-09 16:16:27 +02:00
Siebrand Mazeland ce1396aea7 Add support for PHP CodeSniffer checks
Also fix any remaining errors and warnings.

Change-Id: Ie49c6172e6bbf8040e5524d33d2f719e96784745
2016-01-06 09:59:47 -08:00
Ori Livneh d80a737921 AbuseFilter: don't install custom error handler
Scrap the AFPRegexErrorHandler custom error handler clusterfuck and replace it
with a simple try-catch that accomplishes the same thing.

Change-Id: Ice1b6da433b892d9871780a9753c098aa639bf6c
2015-10-21 23:56:53 +00:00
Ori Livneh bab9832415 Move rule tokenization to new AbuseFilterTokenizer class
* Move AbuseFilterParser::nextToken() and the various AbuseFilterParser
  properties that accompanied it to a new class, AbuseFilterTokenizer.
* Tokenize rules eagerly and cache the result in APC.

Change-Id: I15f5b5b65e8c4ec4fba3000d7c9fd78b98967d1d
2015-08-25 14:00:10 -07:00
Ori Livneh b388dfab1b Clean-up of AbuseFilterParser::nextToken()
No functional changes.

* Don't include $code as part of the return value; it is ignored anyway.
* Removed AbuseFilterParser::lastHandledToken and AFPParserState::lastInput,
  because AbuseFilterParser::nextToken() no longer calls itself recursively.
* The regular expression that matches operators is no longer constructed
  dynamically, but hard-coded into the class. To make sure it does not drift
  apart from the more legible AbuseFilterParser::$mOps, add a unit test that
  constructs the regex dynamically as before and compares it to
  AbuseFilterParser::OPERATOR_RE.
* AbuseFilterParser::RADIX_RE ditto.

Change-Id: I9c23b60759ed2f4c73a9b480243b16bbce5a208f
2015-08-25 10:50:31 -07:00
Ori Livneh 3eee5c7d89 Drop unused AbuseFilterParser::$mParams property
Change-Id: I0f1af7778c22ca7b5701fba7eda2701c038ec1d5
2015-08-21 13:58:41 -07:00
Chad Horohoe 0715c05ed9 Remove obvious function-level profiling
Change-Id: I85f65a65abb5b91b44441caa21fb9c5e8edb2ebe
2015-02-04 10:26:04 -08:00
Ori Livneh 0e36b728e3 Fix double escaping in AFPData::keywordLike()
If we don't map '\-' and '\+' to themselves, the leading slash gets escaped,
and the resultant pattern only matches a literal slash.

Bug: 67670
Change-Id: Ifa1e3edd6f41985a3bb97bfb1497985f8fa64af5
2014-07-11 14:56:42 -07:00
Ori Livneh ea46bfdd16 Use preg_match rather than fnmatch for 'like' operator
fnmatch() will not recognize 'é' as a single character when the LC_CTYPE locale
is set to C / POSIX. So transform the shell-style pattern to PCRE, and use
preg_match() instead.

fnmatch() was not available on Windows prior to PHP 5.3, so code snippets for
preg_match()-powered polyfills abound. I used the pattern translation map from
<http://www.php.net/manual/en/function.fnmatch.php#100207> after testing
different implementations and finding it to be the most complete.

Bug: 66930
Change-Id: Ice12c7b9dbe6472fe4131679a48a0ad54fac6394
2014-06-23 11:38:08 -07:00
Andrew Garrett 3b7cae1965 Fix ccnorm() function -- strtr fails on empty key
ReplacementArray->replace() calls strtr() which fails if an empty key is provided.

Change-Id: I635f057dab53edcfe1736f74829b6dbe1e7739d3
2014-05-30 11:24:21 -07:00
Marius Hoch 74bef04015 Add missing wfProfileOut calls
Change-Id: I4699284984f21e97fd85d62ca6d4722d3470a6f3
2014-01-26 04:54:16 +00:00
Jackmcbarn 4046a3ceaf Don't pass empty strings to strpos
When an empty parameter is passed to a function using strpos, such as when
an extra comma appears at the end of contains_any's parameter list, don't
call strpos on the empty string.

Bug: 60203
Change-Id: I6221a01ad1ec9090de7bfc1d9d6583f22ba0eb2e
2014-01-18 12:10:42 -05:00
Siebrand Mazeland 2bb441ee9c Declare visibility on class properties
Also compress some variable documentation.

Change-Id: I47d31e18df18492373aa407ea19f1a81b2cf5d57
2013-11-06 16:11:09 +00:00
Siebrand Mazeland 153c285758 Update comments and satisfy analyzer
Change-Id: I7d8ddaa61ca8a521a98fc10237df27a11a5bbe85
2013-10-14 21:45:02 +02:00
Siebrand Mazeland 84d5cd33a2 Remove unused local variables
Change-Id: If708a112ae6df070da19f52682f0e2b19fe67959
2013-10-14 21:39:35 +02:00
Marius Hoch 9823d78e3b Make AbuseFilter work without AntiSpoof
This dependency isn't really needed in many cases
while it can cause troubles if not fulfilled.

Change-Id: I9990e99c4d3d49b5bc400bbc4a0ec02142d6f055
2013-07-02 21:50:23 +02:00
jenkins-bot cd87fb0ee8 Merge "Conditional operator with () returns false." 2013-05-01 21:07:48 +00:00
jenkins-bot 84487b86d8 Merge "Deprecate addHolder for addHolders." 2013-04-23 21:22:37 +00:00
nischayn22 454a7cc897 Deprecate addHolder for addHolders.
lot of code was using ::merge() to create a new AbuseFilterVariableHolder
this is now simplified using a single addHolders method.

merge() still exists as its usful as a static function.
addHolder() is deprecated.

Change-Id: Ia4f6a56f642242a04cf2973b74ce44d91fce00eb
2013-04-23 23:19:32 +02:00
Kunal Mehta 4bec58cd54 Add a "ucase" function to convert the provided string to uppercase.
I basically took the lcase code and tweaked it to work for uppercase.

Bug: 47321
Change-Id: I230dbd99c27bf3a4a042befd6d334b4c0439bde0
2013-04-17 11:48:15 -05:00
nischayn22 dd28075e46 Conditional operator with () returns false.
Directly applying the patch given in the bug report. Thanks to orlodrim.
bug: 25373

Change-Id: I72b27e3dd22416288f3113e5a7c5a21ffbac01fb
2013-03-23 11:11:25 +05:30
shirayuki b460fa790f Adding trailing dot + comment for grep (24 messages)
Change-Id: I7b532028a2bdbed11f0dfe6cdf4eb4514671294c
2013-03-07 23:03:59 +09:00
Marius Hoch 42bd0d84f4 AbuseFilter: Change format of database logging/ performance
AF is setting several lazy load variables for the currently editing user.
To do this it's passing along the user name extracted from a user object
and generating a new user object later from that name which is of course
pointless. With this patch I'll pass user objects directly to prevent that.
On top of that I've deprecated a method in AFComputedVariable::compute which
was redundant as there is a more generic one which can solve that task
just fine.

Furthermore I've changed the logging behaviour from serializing the whole
AbuseFilterVariableHolder object to only store the variables. That has two
major advantages:
 * The amount of data that needs to be saved on a filter hit is reduced
   to about 1/10 of what the old version needed.
 * This is much more forward compatible as the old way of saving this
   relied on the class structure to stay the same while this is a simple
   array containing the vars.

On top of that we now only log variables already set by the time
a filter is hit. On top of the obvious performance increasement
that makes it easier for the user to spot the relevant data.

Another thing this change alters is the way the AbuseFilter internally
works with AbuseFilterVariableHolder objects. Right now we use one for
testing the filter(s) and later we use another one to compute the same
data again in case a filter was hit (for logging)!

This is not thoroughly tested yet, but way more sane than what we're
currently doing!

Change-Id: Ib15e7501bff32a54afe2d103ef5aedb950e58ef6
2013-02-28 22:35:22 +01:00
Chad Horohoe 0cd1053d21 Remove in_string() usage
Change-Id: I8ff74a827d742fdaf7d63d51be1f2300bbb5436b
2013-02-06 14:46:03 -05:00
CSteipp 27c083a9d5 Merge "Fix the abusefilter array parser test" 2013-01-31 23:24:37 +00:00
Marius Hoch fc5ef1666b Minor removal of duplicate code within the AF parser
Change-Id: I4e318028c2c623f77f1615971090eb06fc21f2d3
2012-12-20 17:09:10 +01:00
Marius Hoch 03da29b9da Fix the abusefilter array parser test
The abusefilter array test failed because length( ['a', 'b', 'c'] )
returned 12 instead of 6. That was du to it converted the array
to a string with new line seperated values first before measuring
the string length. Changed that behaviour to act like the php count()
function or the python len() function which seems far more useful to me.
The old behaviour can be established using length( string( array ) ).

Change-Id: I16646891837c9743ca5af2dd328077a7225bb5f1
2012-12-20 02:19:55 +01:00
Siebrand Mazeland 176227e721 Maintenance for AbuseFilter extension.
* Replace deprecated methods.
* Remove no longer needed function fnmatch().
* Remove superfluous newlines.
* Remove unused and redundant local variables and globals.
* Deglobalization.
* Update documentation.
* Fix incorrect return values or add FIXMEs when in doubt.
* Escape output in a few places where needed.
* Remove unneeded MEDIAWIKI constant checks.
* Fix various JSHint/JSLint issues.

Patch Set 11: Merged https://gerrit.wikimedia.org/r/24701 into
this one per Siebrand's request

Change-Id: I02ba4ce31b6aca5b7324114093f8ece143abc295
2012-10-09 22:26:45 +02:00
Andrew Garrett 53aea9c0ce AbuseFilter: Resolve bug 18374, bug 28633.
* Store the revision ID associated with a log entry
 if the action is successful.
* Expose this as a diff link in the UI.
* Implicitly hide log entries if their
 corresponding revisions are also hidden.
* Includes scope for expanding to log entries if desired.

Change-Id: Ie2d43dd1bacf14289fdf0492bb22267590ee649d
2012-07-11 10:16:59 -04:00
Sam Reed 8417c901f7 Few more types top flesh out the previous revisions
Change-Id: I1215dcf92f1b64e744c9ab41e0c5c046114dd48c
2012-03-26 16:03:23 +02:00
Sam Reed bea9cb0874 A LOT of function level documentation
Change-Id: I8b591be3c2da7cfb29d3be026772816d14037d37
2012-03-26 16:03:22 +02:00
Roan Kattouw 6c4bd57043 Revert r111217 (unreviewed rev in AbuseFilter) and its dependencies r113585, r113587, r113588, r113589.
All of these revisions are tagged with 'gerritmigration' and will be resubmitted into Gerrit after the Gerrit switchover. See also http://lists.wikimedia.org/pipermail/wikitech-l/2012-March/059124.html
2012-03-21 19:41:11 +00:00
Sam Reed 06e4721b80 Few more types top flesh out the previous revisions 2012-03-11 21:01:29 +00:00
Sam Reed 0c99b2bc15 A LOT of function level documentation 2012-03-11 20:40:04 +00:00
Andrew Garrett 5e4289ce4e AbuseFilter: Resolve bugs 18374, 28633.
* Store the revision ID associated with a log entry if the action is successful.
* Expose this as a diff link in the UI.
* Implicitly hide log entries if their corresponding revisions are also hidden.
* Includes scope for expanding to log entries if desired.
2012-02-10 23:41:05 +00:00
John Du Hart 5e724d1ada require -> require_once per report in IRC by Ciencia_Al_Poder 2011-12-26 19:22:47 +00:00
Victor Vasiliev dd289e2f89 Add comment explaining why we do not need second parameter to preg_quote, per Nikerabbit's comments on r100139 CR. 2011-10-19 18:57:36 +00:00
Victor Vasiliev eca7343487 (part of this commit is in r100135 due to SVN fail)
* (bug 24109) Add regex escaping function to abuse filter

Patch by Jérémie Roquet
2011-10-18 17:57:33 +00:00
Tim Starling da936e9bfe (bug 31379) Don't use the $errcontext parameter of a PHP error handler to get information for error display, this introduces an unexpected, difficult-to-maintain data flow which leads to bugs like the referenced one above. 2011-10-05 23:31:34 +00:00
Sam Reed a9e738f099 More document
Few minor code improvements
2011-08-24 22:11:52 +00:00
Sam Reed 6d548203f7 Parameter and Return Type hints 2011-02-10 17:32:57 +00:00