This is a first step towards MCR support in AbuseFilter. The textual
representation of all slots is concatenated. Since AbuseFilter uses
getTextForSearchIndex to determine the textual representation of
content, blind concatenation should not break any assumptions
made by AbsueFilter rules: this naive approach is no worse than
AbuseFilters handling of non-textual content in general, and should
work fine for textual content.
Bug: T209291
Change-Id: Ic141085cad2e11bfe106fe83dafcb35ac31206ba
For several reasons:
*We're not really checking permissions (and the hook previously used is
meant to be used in such case)
*We'll show a cleaner error message (i.e. without the "You do not have
permission..." part)
*Filtering will happen closer to the actual move
Bug: T208907
Depends-On: I4733724075b7514e9db59e7be772d9409aa9da87
Change-Id: If88f736a446247f8b4b13c055c641d56f544d1ea
Simplify some logic constructs, reduce the amount of return statements
inside methods, explicitly declare variables before using them, reduce
code duplication, add names to JS anonymous function to produce clearer
stack traces.
Change-Id: Ife4546a91c30d4c519d09a712ba56a2f33abe579
Long (sigh) explanation in T203587#4569698. Also, simplified the way
TagMultiselect are generated, this one and the one for change tags.
This new selector is back-compat both with the old textarea and the OOUI
checkboxMultiselect; actually, this one is //fully// compatible with the
old textarea.
Add validation for throttle parameters and unit tests for validation
(split from I976c95658cddb2585910b6f8a5f047aadc4e4d47).
Added a trim when retrieving throttle identifier to allow syntax like
'ip, user'.
Improved the message shown on history.
Re-added the maintenance script to clean DB.
As I wrote in the task, a review by two other people would be great, at
least for the maintenance script (it could potentially break the DB).
Bug: T203587
Bug: T203336
Bug: T203584
Bug: T203585
Depends-On: I3b2e763bd8835207dc5df1db43d3e1881e6961c3
Change-Id: I7831dbb0bab55807392ac1f7915d6cb0cb713593
* AbuseFilterConsequencesTest is somehow leaving blocks behind. Mark
ipblocks as being used to avoid that.
* AFComputedVariable::getLastPageAuthors() uses indeterminate order for
multiple revisions with the same timestamp. Fall back to rev_id
ordering like MySQL accidentally did before.
* AbuseFilterTest tries to create revisions attributed to users that
don't exist. Switch to interwiki usernames.
Change-Id: I30f7cdcc3875f3f7af116c1e41e88f62ab9e91d0
These are updated in deferred updates and should not rely on the same
User instance being used in those updates. This also avoids convoluted
logic in User to set the new edit count for various cases.
Change-Id: I6d239a5ea286afb10d9e317b2ee1436de60f7e4f
This simplifies the test for user_age, although I'm not totally sure it
will be fixed. AFAICS, there's nothing wrong in there, but we'll see on
future phpunit executions.
Bug: T206501
Change-Id: Iee1a2a65d08c2cffc7a0d655be1eadb018d8bf37
Use a single function to check parameters amount, avoid duplication
between keywordIn and keywordContains, use if...elseif instead of
if-else when statements have a return inside, simplify some other logic,
add typehinting, and change method visibility according to use of such
methods.
Change-Id: I22225a5cbbb93679a0e78bf6e15866829167fbf4
Fixed some comments adding explanations, fixing syntax, and parameter types
for docblocks. Also fixed some whitespace mess, and added a missing use
statement.
Change-Id: I3547c90bdaa2cab5443e8bf0c63b217fe6ba663f
This problem have been making filters potentially fail silently since
2009. Also add tests for arrays to make sure that no problems arise
when short circuit is used.
Bug: T204841
Change-Id: Ie4e2e06498c1202ba73afcc5d164a72427abbca5
This test checks every deprecated variable to be identical to the
newly-named one, and to emit a debug notice. It also changes such debug
to be emitted via logger instead of wfDebug.
Bug: T201193
Bug: T173889
Change-Id: Ie55746bb7731062ae2d46d84857af2a05d78cf4c
This will make tokenizer almost fully covered. The only uncovered parts
are the one with cache and an else condition which I think won't ever be
executed, and thus added a comment for that. Also, remove an obsolete
xxx comment from ComputedVariable (fixed in
I8e420f0259ef6c9e579f7a00beb58f28af9da37d)
Bug: T201193
Change-Id: I6e9a73aa9e437f096f6a1e20d53a7cb50e5ed85d
This should help with tracking code coverage and also explains some
coverage discrepancies encountered while writing other tests.
Bug: T201193
Change-Id: I8b20abc46c2d6c6f582953139b9a9f3710b2e4ea
Check a bunch of them, they should be computed and be identical to the
ones with new syntax.
Bug: T173889
Depends-On: I5c370b54e6516889624088e27928ad3a1f48a821
Change-Id: I276913a98e06b5f2ff1c5f5f3ba5bcc7b1e8c997
Variables regarding title (full list in task description) are quite
deceiving, since they use "text" instead of "title". As proposed in the
task, this is the first patch to add aliases for those variables and
slightly deprecate the old ones. In the future we may be able to replace
every occurrence (either with a search function or directly on the
database), but even a coexistence would be enough to avoid
confusion. A wfDebug log is generated whenever a deprecated variable is
parsed. The "article_" prefix is also changed to "title_", in the same
way as above.
Also, added a hook which other extension may use to specify their
deprecated variables, which will be handled the same as core ones.
Bug: T173889
Change-Id: I5c370b54e6516889624088e27928ad3a1f48a821
Add some tests and improve others to raise coverage percentage. This
should lead to almost 100% for the AbuseFilterParser class. Aside from
this, a couple of changes:
* Remove an unused function
* Let equals_to_any return a genuine result with empty strings
* Remove an if which will never be true in skipOverBraces, since the
function is called after checking the same conditions.
Bug: T201193
Change-Id: I7020b2ed996236c38c5784d161ad98ec44163406
We're currently emitting the same error twice, but in one of those cases
it's completely wrong. Damned copy&pasting!
Bug: T202073
Change-Id: I7687826a85f3ef0abaf15d7cd973afc4e55758b2
Adding tests for generic functions in AbuseFilter class, ranging from
simple utility function to variable computation.
Bug: T42478
Change-Id: I903fb7ffbc436b27462e3e4611ab65ecb8a543ba
Adding the template for unit tests and some tests. These should cover
all the validation failure cases.
Bug: T42478
Depends-On: Ib7a0335fa7fb3b8a21765438a720205656c1ea09
Change-Id: I3fd0d627295d680ed33b1cbc730435df0446277f
The last one of what I think are the must-have tests. This patch
provides the basic tests and the framework, which may be further
expanded later on. Please note that the failures are due to an actual
problem in core, for which there is I7bb0e92b2906a2511fc4290bdc76fc39ec4617fe.
Bug: T42478
Change-Id: I28eb464c63fda7faa3ec7d1f6082f36154d66962
We're really missing exception tests: in fact, 'noparams' not being
thrown was discovered only a few days ago and worked like that for
years. This patch adds phpunit tests for both noparams and notenoughargs
exception, also checking the returned message.
Depends-On: I484fe2994292970276150d2e417801453339e540
Change-Id: Ia0b9b8fd5c979be06879723b746f9356c628f5cd
Follow-up of Iacb8f7a361079e3e117dc6845597c7bd8473e54a for exceptions
thrown outside the parser. With this patch all uses of AFPUserVisibleException
will be covered.
Depends-On: Iacb8f7a361079e3e117dc6845597c7bd8473e54a
Change-Id: Ia7ef6eb832d5725a804a60cb58bc110b06c8abe2
All uses of "throw" inside AbuseFilterParser are now covered.
Bonus: added a standard suppresswarning when checking regex validity.
Change-Id: Iacb8f7a361079e3e117dc6845597c7bd8473e54a
Arrays were introduced with the name "lists". While it **may** look
user-friendlier and so on, it actually uses a wrong name: lists are
different from arrays. I ran a grep and I should've replaced
every occurrence, plus everything seems to work, however a double check
wouldn't be bad.
Change-Id: I6a858f02f5dd9250ba7e1abf9c6422fd98758c9e
This is taken from I6a57a28f22600aafb2e529587ecce6083e9f7da4 and makes
all the needed changes to make phan pass. Seccheck will instead fail,
but since it's not clear how to fix it (and it is non-voting), for the
moment we may merge this and enable phan on IC.
Bug: T192325
Change-Id: I77648b6f8e146114fd43bb0f4dfccdb36b7ac1ac
I added on MW an example of comparison with empty array, which we should
keep inside the dedicated test as well.
Change-Id: Ifa4bca85c8978ef24ed5bb26787730bb4521261f
Introduce a new function which can be used to group multiple comparisons
in a single condition. In particular, equals_to_any(S, A, B) is the
equivalent of S === A || S === B. This is especially useful in checking
for multiple namespaces, as proposed in the Community health initiative.
Change-Id: I9dcfe303eb5e51e1882fe4a65fa876aa93db7686
I left as ToDo the checks between an array and something else. With this
patch, it'll work like PHP: the result will be true iff the comparison
is loose, the array is empty and the other operand is either false or
null.
Change-Id: Idc5cadb697ed4fc7f4856967274169f77495ed9f
This should fix every error with excluded rules, leaving only the one
for $wgTitle. A double check would be nice in order to avoid regressions
due to stupid mistakes.
Bug: T178007
Change-Id: I22c179f3a01d652640304b59e43fcb5b5a9abac3
Some of them are actually too simple, and may be unuseful in tricky
situations. This patch adds a lot of test cases to provide an (almost)
bombproof safety with future patches.
Depends-On: I0bb1ed0109af66997e238b532d342d82d4c4ae19
Change-Id: I274ef306775c36be20acb662353f6537ff3f1a33
So that type and value will be identical to PHP's ones.
Bug: T191688
Depends-On: I1140900cdda63eed292d9f20aefd721ef9247fcd
Change-Id: I398c9a972b7e9fcb27d055d23939be2b8bb68244
This feature was never implemented. I'm not sure whether we need a way to compare array and other types of variables (left as ToDo), since e.g. in PHP it's always false.
Bug: T179238
Change-Id: I5d2c33fd117e69cbc84c0b04b6cb82edbdcadf16
The following sniffs are failing and were disabled:
* MediaWiki.Commenting.LicenseComment.InvalidLicenseTag
The following sniffs now pass and were enabled:
* MediaWiki.Commenting.FunctionComment.MissingParamComment
Change-Id: I38c334ea6c6ff07dfcb64d551413a02dc8c5e51e
Added the contains_all function, with basically the same role as
contains_any but using logic AND instead of OR. Also added
ccnorm_contains_all, that is the same of ccnorm_contains_any but with
AND mode. Finally, fixed three wrong task IDs.
Co-authored with Valerio Bozzolan.
Bug: T21176
Change-Id: Ib0a8b783db6ce0d5db64771c8e0c70f0f8d13d36
Use the new equivset library instead of AntiSpoof.
Bug: T175413
Change-Id: I439387deeba99543e194c210953ac73ff98bc5b7
Depends-On: I977d3498b2084a426e2ab4d85c000d1b9dcfe824
This filter is fully functional. The old filter is still enabled by
default for a transitional period in case the new one suddenly has
issues.
Change-Id: I4aea5f00c62420108030e60e79d5bf34e913e95d
I am pretty sure all of the behavior documented in these tests is a bad
idea. It is possible that we can fix it since some of those features
are probably unused, but for now those tests will serve as a
documentation of the current behavior.
Change-Id: Ia2a2f57a538d7aef2ac73fb2e47fe82dd5d5e09a
substr_count() is just as fast as looped strpos() when there are no
matches, and gets faster as the number of matches increases.
Note that this introduces a small change in behavior when the needle
is composed of repeated substrings, e.g. 'asdasdasd' or 'aa', and
haystack is such that the needle can be matched in overlapping
positions, e.g. 'asdasdasdasd' or 'aaaaa'. The old implementation
counted overlapping matches, the new one doesn't. I don't think this
behavior was intentional and I don't think this change will cause any
real problems.
Change-Id: Icc905ca34bf08d63e969787a5e3c119d498bf878
Previously, 'false & a == b' would actually execute the comparison and
count it against the condition limit, while 'false & (a == b)' wouldn't.
They behave the same now.
mShortCircuit was only checked for the most potentially expensive
operations (computing functions and getting variables), all the other
operations on bogus values generated by this would be executed and the
results ignored later.
This probably doesn't noticeably improve performance, but it corrects
how the condition limit is counted.
Bug: T43693
Change-Id: Id1d5f577b14b6ae6d987ded12689788eb7922474
With the new behavior, the number of conditions in incremented when:
* Evaluating a function
* Evaluating a comparison operator (== === != !== < > <= >= =)
* Evaluating a keyword (in like matches contains rlike irlike regex)
Previously, the number of conditions was incremented when:
* Evaluating a function
* Entering the comparison operator evaluation mode
This resulted in a number of surprising behaviors. In particular:
* '(((a == b)))' counted as 4 conditions, not 1
* 'contains_any(a, b, c)' counted as 5 conditions, not 1
* 'a == b == c' counted as 1 condition, not 2
* 'a in b + c in d + e in f' counted as 1 condition, not 3
* 'true' counted as 1 condition, not 0
It is still possible to easily cheat the count by rewriting comparisons
as arithmetic operations. I believe this is meant to advise users of
the complexity of their rules and not really enforce strict limits.
Bug: T132190
Change-Id: I897769db4c2ceac802e3ae5d6fa8e9c9926ef246
* Move AbuseFilterParser::nextToken() and the various AbuseFilterParser
properties that accompanied it to a new class, AbuseFilterTokenizer.
* Tokenize rules eagerly and cache the result in APC.
Change-Id: I15f5b5b65e8c4ec4fba3000d7c9fd78b98967d1d
No functional changes.
* Don't include $code as part of the return value; it is ignored anyway.
* Removed AbuseFilterParser::lastHandledToken and AFPParserState::lastInput,
because AbuseFilterParser::nextToken() no longer calls itself recursively.
* The regular expression that matches operators is no longer constructed
dynamically, but hard-coded into the class. To make sure it does not drift
apart from the more legible AbuseFilterParser::$mOps, add a unit test that
constructs the regex dynamically as before and compares it to
AbuseFilterParser::OPERATOR_RE.
* AbuseFilterParser::RADIX_RE ditto.
Change-Id: I9c23b60759ed2f4c73a9b480243b16bbce5a208f
If we don't map '\-' and '\+' to themselves, the leading slash gets escaped,
and the resultant pattern only matches a literal slash.
Bug: 67670
Change-Id: Ifa1e3edd6f41985a3bb97bfb1497985f8fa64af5
I've also added myself to the credits file as I'm the only
maintainer of this extension for a while now.
Change-Id: Id998172ea2abd70b8243de9db1a96cc2cfa47a64
The abusefilter array test failed because length( ['a', 'b', 'c'] )
returned 12 instead of 6. That was du to it converted the array
to a string with new line seperated values first before measuring
the string length. Changed that behaviour to act like the php count()
function or the python len() function which seems far more useful to me.
The old behaviour can be established using length( string( array ) ).
Change-Id: I16646891837c9743ca5af2dd328077a7225bb5f1
ATTENTION! This may break filters that rely on "added_lines contains 'bla-bla'" syntax. They'll need to be replaced with "string(added_lines) contains 'bla-bla'"
* Introduce := operator for setting variables
* Throw an exception when user tries to override built-in variable
* Fix UTF-8 handling in fnmatch() fallback
* Copy three main abuse filters from enwiki to test suite
* Fix update.php integration
* Use strcspn to scan ahead for long regions of uninteresting text in string handling (performance).
* Remove cruft specific to my system in phpTest.php.
* Remove a test that was in incorrect syntax, and useless without adding variable support.
I've made it more performant and fixed a few bugs by using regexes
instead of PHP loops, where possible, under the assumption that the
PCRE parser is more efficient than the same thing implemented in pure PHP.
Also, I'm now passing the same string around and calculating offsets, which
Tim tells me is far more performant than continually truncating the same string.
All tests still pass, with the exception of string.t, which I've modified
to remove the offending code, which never worked.