Commit graph

231 commits

Author SHA1 Message Date
Daimona Eaytoy a0de056299 Add contains_all and ccnorm_contains_all functions
Added the contains_all function, which plays basically the same role as
contains_any but uses logical AND instead of OR. Also added
ccnorm_contains_all, which is the same as ccnorm_contains_any but in
AND mode. Finally, fixed three wrong task IDs.
Co-authored with Valerio Bozzolan.

Bug: T21176
Change-Id: Ib0a8b783db6ce0d5db64771c8e0c70f0f8d13d36
2018-02-09 17:33:24 +01:00
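
The OR/AND distinction described in this commit can be sketched as follows. This is a minimal Python illustration, not the extension's PHP code: the function bodies, the argument order (haystack first, then needles) and the behaviour with zero needles are assumptions made for the example.

```python
def contains_any(haystack: str, *needles: str) -> bool:
    """True if at least one needle occurs in haystack (logical OR)."""
    return any(needle in haystack for needle in needles)


def contains_all(haystack: str, *needles: str) -> bool:
    """True only if every needle occurs in haystack (logical AND)."""
    return all(needle in haystack for needle in needles)


text = "free prizes, click here"
print(contains_any(text, "prizes", "lottery"))  # one needle matches
print(contains_all(text, "prizes", "lottery"))  # "lottery" is missing
print(contains_all(text, "prizes", "click"))    # both needles match
```

The ccnorm_ variants would normalize the haystack before the same check; that normalization step is omitted here.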
Kunal Mehta 5238c8e8b5 Improve @covers tags
Change-Id: I3df3698b5d3f3eae95db8c740c611f365ff9cb31
2018-01-23 14:08:52 -08:00
Daimona Eaytoy 4e20c933f4 Add get_matches function
Added the get_matches function to store a regex match.

Bug: T179957
Change-Id: I19366ebcaa4d0f007dd675a61c91457dde57f604
2017-11-13 17:32:45 +01:00
David Barratt 5335b6c811 Use Equivset library instead of AntiSpoof
Use the new equivset library instead of AntiSpoof.

Bug: T175413
Change-Id: I439387deeba99543e194c210953ac73ff98bc5b7
Depends-On: I977d3498b2084a426e2ab4d85c000d1b9dcfe824
2017-10-21 21:55:18 -07:00
Dayllan Maza 2bc8873c30 Add ccnorm_contains_any function
Normalize and search a string for multiple substrings

Bug: T65242
Change-Id: I4034c0054a6849babbf2d96ea13dc97d3660d5b4
2017-10-06 11:32:45 -04:00
Umherirrender 1a58507870 build: Updating mediawiki/mediawiki-codesniffer to 0.10.0
Change-Id: I5f37c45d748d5f0da21aceaef32cc89367e312ff
2017-07-08 20:49:30 +02:00
Umherirrender a063e33ee8 Use short array syntax
Done by phpcbf over composer fix

Change-Id: I53fd1fc8d056b9b60194d2d630852cfca37aadea
2017-06-15 17:02:57 +02:00
Victor Vasiliev 46faa02c49 Fix the associativity of boolean logic operators
Change-Id: Icaf0fde0d74064532af4b110faef4014f8303f80
2016-11-06 20:30:07 -05:00
Victor Vasiliev aa399da279 Implement a tree-caching abuse filter parser
This filter is fully functional.  The old filter is still enabled by
default for a transitional period in case the new one suddenly has
issues.

Change-Id: I4aea5f00c62420108030e60e79d5bf34e913e95d
2016-09-24 02:53:26 +00:00
Victor Vasiliev 5da98b67bf Add test coverage for more bizarre features of the filter parser
I am pretty sure all of the behavior documented in these tests is a bad
idea.  It is possible that we can fix it since some of those features
are probably unused, but for now those tests will serve as
documentation of the current behavior.

Change-Id: Ia2a2f57a538d7aef2ac73fb2e47fe82dd5d5e09a
2016-08-21 18:45:22 -04:00
Kaldari acd28cb00f Update tests for AntiSpoof fixes
Bug: T29987
Depends-On: Iccb3e50073bbbc2b979cb62dd0e129afd1c2e55f
Change-Id: I8bef839b9b9ca5fced94ce6428e769133ede868f
2016-08-13 20:37:43 +00:00
Bartosz Dziewoński 5fc30112c7 Optimize 'count()' function
substr_count() is just as fast as looped strpos() when there are no
matches, and gets faster as the number of matches increases.

Note that this introduces a small change in behavior when the needle
is composed of repeated substrings, e.g. 'asdasdasd' or 'aa', and
haystack is such that the needle can be matched in overlapping
positions, e.g. 'asdasdasdasd' or 'aaaaa'. The old implementation
counted overlapping matches, the new one doesn't. I don't think this
behavior was intentional and I don't think this change will cause any
real problems.

Change-Id: Icc905ca34bf08d63e969787a5e3c119d498bf878
2016-04-17 08:32:27 +02:00
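
The behavior change noted in this commit can be reproduced with a short Python sketch. It is an analogy, not the extension's code: `count_overlapping` is a hypothetical stand-in for the old looped-strpos() approach (assumed to advance one character after each hit), and Python's `str.count` stands in for PHP's substr_count(), since both count non-overlapping occurrences.

```python
def count_overlapping(needle: str, haystack: str) -> int:
    """Count matches, allowing them to overlap (old behavior, as assumed)."""
    if not needle:
        raise ValueError("needle must not be empty")
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        # Resume one character later, so overlapping matches are found.
        pos = haystack.find(needle, pos + 1)
    return count


def count_non_overlapping(needle: str, haystack: str) -> int:
    """Count matches without overlap (new behavior)."""
    return haystack.count(needle)


for needle, haystack in [("aa", "aaaaa"), ("asdasdasd", "asdasdasdasd")]:
    print(needle, haystack,
          count_overlapping(needle, haystack),
          count_non_overlapping(needle, haystack))
# 'aa' in 'aaaaa': 4 overlapping, 2 non-overlapping
# 'asdasdasd' in 'asdasdasdasd': 2 overlapping, 1 non-overlapping
```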
Bartosz Dziewoński 7d83540527 Add some tests for behavior of 'count()' function
Change-Id: I29a6c91d0780dc9a1eaee6d29d3b1f9c9c708df7
2016-04-17 08:18:29 +02:00
Bartosz Dziewoński e79b45b71f Improve ignoring short-circuited operations
Previously, 'false & a == b' would actually execute the comparison and
count it against the condition limit, while 'false & (a == b)' wouldn't.
They behave the same now.

mShortCircuit was only checked for the most potentially expensive
operations (computing functions and getting variables), all the other
operations on bogus values generated by this would be executed and the
results ignored later.

This probably doesn't noticeably improve performance, but it corrects
how the condition limit is counted.

Bug: T43693
Change-Id: Id1d5f577b14b6ae6d987ded12689788eb7922474
2016-04-09 16:25:52 +02:00
Bartosz Dziewoński 3b32cf00e9 Improve how the number of conditions is counted
With the new behavior, the number of conditions is incremented when:
* Evaluating a function
* Evaluating a comparison operator (== === != !== < > <= >= =)
* Evaluating a keyword (in like matches contains rlike irlike regex)

Previously, the number of conditions was incremented when:
* Evaluating a function
* Entering the comparison operator evaluation mode

This resulted in a number of surprising behaviors. In particular:
* '(((a == b)))' counted as 4 conditions, not 1
* 'contains_any(a, b, c)' counted as 5 conditions, not 1
* 'a == b == c' counted as 1 condition, not 2
* 'a in b + c in d + e in f' counted as 1 condition, not 3
* 'true' counted as 1 condition, not 0

It is still possible to easily cheat the count by rewriting comparisons
as arithmetic operations. I believe this is meant to advise users of
the complexity of their rules and not really enforce strict limits.

Bug: T132190
Change-Id: I897769db4c2ceac802e3ae5d6fa8e9c9926ef246
2016-04-09 16:16:27 +02:00
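
The new counting rule can be illustrated with a small Python sketch over hand-built expression trees. Everything here is an assumption made for illustration: the nested-tuple tree format, the node kind names and the `count_conditions` helper do not come from the extension, and the grouping of the 'in' example assumes keywords bind more tightly than '+'. The sketch also counts statically and ignores short-circuiting.

```python
# Hypothetical node kinds; only these three cost one condition each.
COUNTED_KINDS = {"function", "comparison", "keyword"}


def count_conditions(node) -> int:
    """Count function, comparison and keyword nodes in a nested-tuple tree."""
    if not isinstance(node, tuple):
        return 0  # atoms (variable names, literals) cost nothing
    kind, _label, *children = node
    own = 1 if kind in COUNTED_KINDS else 0
    return own + sum(count_conditions(child) for child in children)


# '(((a == b)))': parentheses add no nodes, so the tree is one comparison.
parenthesized = ("comparison", "==", "a", "b")
# 'contains_any(a, b, c)'
call = ("function", "contains_any", "a", "b", "c")
# 'a == b == c'
chained = ("comparison", "==", ("comparison", "==", "a", "b"), "c")
# 'a in b + c in d + e in f'
summed = ("arithmetic", "+",
          ("arithmetic", "+",
           ("keyword", "in", "a", "b"),
           ("keyword", "in", "c", "d")),
          ("keyword", "in", "e", "f"))

for tree in (parenthesized, call, chained, summed, "true"):
    print(count_conditions(tree))  # 1, 1, 2, 3, 0
```

Because arithmetic nodes are free in this model, the sketch also shows why rewriting comparisons as arithmetic would lower the count.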
Ori Livneh bab9832415 Move rule tokenization to new AbuseFilterTokenizer class
* Move AbuseFilterParser::nextToken() and the various AbuseFilterParser
  properties that accompanied it to a new class, AbuseFilterTokenizer.
* Tokenize rules eagerly and cache the result in APC.

Change-Id: I15f5b5b65e8c4ec4fba3000d7c9fd78b98967d1d
2015-08-25 14:00:10 -07:00
Ori Livneh b388dfab1b Clean-up of AbuseFilterParser::nextToken()
No functional changes.

* Don't include $code as part of the return value; it is ignored anyway.
* Removed AbuseFilterParser::lastHandledToken and AFPParserState::lastInput,
  because AbuseFilterParser::nextToken() no longer calls itself recursively.
* The regular expression that matches operators is no longer constructed
  dynamically, but hard-coded into the class. To make sure it does not drift
  apart from the more legible AbuseFilterParser::$mOps, add a unit test that
  constructs the regex dynamically as before and compares it to
  AbuseFilterParser::OPERATOR_RE.
* AbuseFilterParser::RADIX_RE ditto.

Change-Id: I9c23b60759ed2f4c73a9b480243b16bbce5a208f
2015-08-25 10:50:31 -07:00
Ori Livneh 0e36b728e3 Fix double escaping in AFPData::keywordLike()
If we don't map '\-' and '\+' to themselves, the leading slash gets escaped,
and the resultant pattern only matches a literal slash.

Bug: 67670
Change-Id: Ifa1e3edd6f41985a3bb97bfb1497985f8fa64af5
2014-07-11 14:56:42 -07:00
Marius Hoch 35747761fb Allow running the AbuseFilter parser tests via phpunit
I've also added myself to the credits file as I've been the only
maintainer of this extension for a while now.

Change-Id: Id998172ea2abd70b8243de9db1a96cc2cfa47a64
2013-07-08 19:22:43 +02:00
jenkins-bot 3c83358506 Merge "Add parser tests for bug 25373"
2013-05-01 21:25:11 +00:00
Kunal Mehta 4bec58cd54 Add a "ucase" function to convert the provided string to uppercase.
I basically took the lcase code and tweaked it to work for uppercase.

Bug: 47321
Change-Id: I230dbd99c27bf3a4a042befd6d334b4c0439bde0
2013-04-17 11:48:15 -05:00
Marius Hoch 3010d78950 Add parser tests for bug 25373
Change-Id: I2f2524731098f323e61bbc0442e7b56b11cdea37
2013-03-23 21:49:57 +01:00
Marius Hoch 03da29b9da Fix the abusefilter array parser test
The abusefilter array test failed because length( ['a', 'b', 'c'] )
returned 12 instead of 6. That was due to it converting the array
to a string with newline-separated values before measuring
the string length. Changed that behaviour to act like the PHP count()
function or the Python len() function, which seems far more useful to me.
The old behaviour can be reproduced using length( string( array ) ).

Change-Id: I16646891837c9743ca5af2dd328077a7225bb5f1
2012-12-20 02:19:55 +01:00
Alexandre Emsenhuber 56e6f0a262 svn:eol-style native
2009-04-09 20:45:31 +00:00
Victor Vasiliev 27fb1303a8 * Use lists instead of implode()d strings in built-in variables wherever it's possible
ATTENTION! This may break filters that rely on "added_lines contains 'bla-bla'" syntax. They'll need to be replaced with "string(added_lines) contains 'bla-bla'"
2009-04-05 19:07:47 +00:00
Victor Vasiliev 128ae5983b Introduce list (non-associative array) support into abuse filter parser.
2009-04-05 17:11:17 +00:00
Victor Vasiliev 258d340fb5 Abuse filter:
* Introduce := operator for setting variables
* Throw an exception when user tries to override built-in variable
* Fix UTF-8 handling in fnmatch() fallback
* Copy three main abuse filters from enwiki to test suite
* Fix update.php integration
2009-04-05 11:47:42 +00:00
Andrew Garrett 86e4081206 Abuse Filter Parser:
* Efficiency -- use /A instead of PREG_OFFSET_CAPTURE and comparing offsets.
* Expand error messages to enhance debugging.
* General code quality
2009-03-25 11:36:38 +00:00
Andrew Garrett 0880f444b1 Abuse Filter Parser updates:
* Use strcspn to scan ahead for long regions of uninteresting text in string handling (performance).
* Remove cruft specific to my system in phpTest.php.
* Remove a test that used incorrect syntax and was useless without adding variable support.
2009-02-11 18:23:21 +00:00
Andrew Garrett bfe57be65d Rewrite of Abuse Filter parser tokeniser.
I've made it more performant and fixed a few bugs by using regexes
instead of PHP loops, where possible, under the assumption that the
PCRE parser is more efficient than the same thing implemented in pure PHP.
Also, I'm now passing the same string around and calculating offsets, which
Tim tells me is far more performant than continually truncating the same string.

All tests still pass, with the exception of string.t, which I've modified
to remove the offending code, which never worked.
2009-02-11 01:41:51 +00:00
Andrew Garrett 53179c675f Apply changes from change-tagging branch. I will remove all of the stuff actually related to change tagging in a moment, to avoid trunk changes on Wikimedia sites.
2009-01-23 19:23:19 +00:00