This filter is fully functional. The old filter is still enabled by
default for a transitional period in case the new one suddenly has
issues.
Change-Id: I4aea5f00c62420108030e60e79d5bf34e913e95d
The user's language should only be used when the exception is actually
displayed to the user.
This will also avoid "User::loadFromSession called before the end of
Setup.php" warnings when the syntax error is encountered during filter
execution for account autocreation, where we don't display it to the
user.
Bug: T124367
Change-Id: Ic17f56aecbe575ef15c6970c4298f889249e1904
Starting with PHP 5.4, preg_match_all() allows omitting the '$matches'
argument. This lets PHP avoid computing them, just count the matches.
I'm not sure how various versions of HHVM handle this, so I made it
just test whether this works rather than check version numbers.
Change-Id: Ib27505c8355109c6e6a9f1c4631281a0f0caaa75
substr_count() is just as fast as looped strpos() when there are no
matches, and gets faster as the number of matches increases.
Note that this introduces a small change in behavior when the needle
is composed of repeated substrings, e.g. 'asdasdasd' or 'aa', and
haystack is such that the needle can be matched in overlapping
positions, e.g. 'asdasdasdasd' or 'aaaaa'. The old implementation
counted overlapping matches, the new one doesn't. I don't think this
behavior was intentional and I don't think this change will cause any
real problems.
Change-Id: Icc905ca34bf08d63e969787a5e3c119d498bf878
Previously, 'false & a == b' would actually execute the comparison and
count it against the condition limit, while 'false & (a == b)' wouldn't.
They behave the same now.
mShortCircuit was only checked for the most potentially expensive
operations (computing functions and getting variables), all the other
operations on bogus values generated by this would be executed and the
results ignored later.
This probably doesn't noticeably improve performance, but it corrects
how the condition limit is counted.
Bug: T43693
Change-Id: Id1d5f577b14b6ae6d987ded12689788eb7922474
With the new behavior, the number of conditions in incremented when:
* Evaluating a function
* Evaluating a comparison operator (== === != !== < > <= >= =)
* Evaluating a keyword (in like matches contains rlike irlike regex)
Previously, the number of conditions was incremented when:
* Evaluating a function
* Entering the comparison operator evaluation mode
This resulted in a number of surprising behaviors. In particular:
* '(((a == b)))' counted as 4 conditions, not 1
* 'contains_any(a, b, c)' counted as 5 conditions, not 1
* 'a == b == c' counted as 1 condition, not 2
* 'a in b + c in d + e in f' counted as 1 condition, not 3
* 'true' counted as 1 condition, not 0
It is still possible to easily cheat the count by rewriting comparisons
as arithmetic operations. I believe this is meant to advise users of
the complexity of their rules and not really enforce strict limits.
Bug: T132190
Change-Id: I897769db4c2ceac802e3ae5d6fa8e9c9926ef246
Scrap the AFPRegexErrorHandler custom error handler clusterfuck and replace it
with a simple try-catch that accomplishes the same thing.
Change-Id: Ice1b6da433b892d9871780a9753c098aa639bf6c
* Move AbuseFilterParser::nextToken() and the various AbuseFilterParser
properties that accompanied it to a new class, AbuseFilterTokenizer.
* Tokenize rules eagerly and cache the result in APC.
Change-Id: I15f5b5b65e8c4ec4fba3000d7c9fd78b98967d1d
No functional changes.
* Don't include $code as part of the return value; it is ignored anyway.
* Removed AbuseFilterParser::lastHandledToken and AFPParserState::lastInput,
because AbuseFilterParser::nextToken() no longer calls itself recursively.
* The regular expression that matches operators is no longer constructed
dynamically, but hard-coded into the class. To make sure it does not drift
apart from the more legible AbuseFilterParser::$mOps, add a unit test that
constructs the regex dynamically as before and compares it to
AbuseFilterParser::OPERATOR_RE.
* AbuseFilterParser::RADIX_RE ditto.
Change-Id: I9c23b60759ed2f4c73a9b480243b16bbce5a208f
If we don't map '\-' and '\+' to themselves, the leading slash gets escaped,
and the resultant pattern only matches a literal slash.
Bug: 67670
Change-Id: Ifa1e3edd6f41985a3bb97bfb1497985f8fa64af5
fnmatch() will not recognize 'é' as a single character when the LC_CTYPE locale
is set to C / POSIX. So transform the shell-style pattern to PCRE, and use
preg_match() instead.
fnmatch() was not available on Windows prior to PHP 5.3, so code snippets for
preg_match()-powered polyfills abound. I used the pattern translation map from
<http://www.php.net/manual/en/function.fnmatch.php#100207> after testing
different implementations and finding it to be the most complete.
Bug: 66930
Change-Id: Ice12c7b9dbe6472fe4131679a48a0ad54fac6394
When an empty parameter is passed to a function using strpos, such as when
an extra comma appears at the end of contains_any's parameter list, don't
call strpos on the empty string.
Bug: 60203
Change-Id: I6221a01ad1ec9090de7bfc1d9d6583f22ba0eb2e
lot of code was using ::merge() to create a new AbuseFilterVariableHolder
this is now simplified using a single addHolders method.
merge() still exists as its usful as a static function.
addHolder() is deprecated.
Change-Id: Ia4f6a56f642242a04cf2973b74ce44d91fce00eb
AF is setting several lazy load variables for the currently editing user.
To do this it's passing along the user name extracted from a user object
and generating a new user object later from that name which is of course
pointless. With this patch I'll pass user objects directly to prevent that.
On top of that I've deprecated a method in AFComputedVariable::compute which
was redundant as there is a more generic one which can solve that task
just fine.
Furthermore I've changed the logging behaviour from serializing the whole
AbuseFilterVariableHolder object to only store the variables. That has two
major advantages:
* The amount of data that needs to be saved on a filter hit is reduced
to about 1/10 of what the old version needed.
* This is much more forward compatible as the old way of saving this
relied on the class structure to stay the same while this is a simple
array containing the vars.
On top of that we now only log variables already set by the time
a filter is hit. On top of the obvious performance increasement
that makes it easier for the user to spot the relevant data.
Another thing this change alters is the way the AbuseFilter internally
works with AbuseFilterVariableHolder objects. Right now we use one for
testing the filter(s) and later we use another one to compute the same
data again in case a filter was hit (for logging)!
This is not thoroughly tested yet, but way more sane than what we're
currently doing!
Change-Id: Ib15e7501bff32a54afe2d103ef5aedb950e58ef6
The abusefilter array test failed because length( ['a', 'b', 'c'] )
returned 12 instead of 6. That was du to it converted the array
to a string with new line seperated values first before measuring
the string length. Changed that behaviour to act like the php count()
function or the python len() function which seems far more useful to me.
The old behaviour can be established using length( string( array ) ).
Change-Id: I16646891837c9743ca5af2dd328077a7225bb5f1
* Replace deprecated methods.
* Remove no longer needed function fnmatch().
* Remove superfluous newlines.
* Remove unused and redundant local variables and globals.
* Deglobalization.
* Update documentation.
* Fix incorrect return values or add FIXMEs when in doubt.
* Escape output in a few places where needed.
* Remove unneeded MEDIAWIKI constant checks.
* Fix various JSHint/JSLint issues.
Patch Set 11: Merged https://gerrit.wikimedia.org/r/24701 into
this one per Siebrand's request
Change-Id: I02ba4ce31b6aca5b7324114093f8ece143abc295
* Store the revision ID associated with a log entry
if the action is successful.
* Expose this as a diff link in the UI.
* Implicitly hide log entries if their
corresponding revisions are also hidden.
* Includes scope for expanding to log entries if desired.
Change-Id: Ie2d43dd1bacf14289fdf0492bb22267590ee649d
* Store the revision ID associated with a log entry if the action is successful.
* Expose this as a diff link in the UI.
* Implicitly hide log entries if their corresponding revisions are also hidden.
* Includes scope for expanding to log entries if desired.