Commit graph

92 commits

Author SHA1 Message Date
jenkins-bot f1ab591d27 Merge "Avoid implicit casts from DUNDEFINED to something else" 2019-08-20 13:04:48 +00:00
Daimona Eaytoy d58b5930f8 Add the filter ID to empty operand logging
To make debugging a lot easier.

Bug: T156096
Bug: T153251
Change-Id: I1f905c6e1a524a745240b05709ef9d1dfc3c23a1
2019-08-13 15:22:55 +00:00
Daimona Eaytoy 1197eb6b41 Make parser aware of the filter it is parsing
This information will mostly be used for debugging purposes.

Change-Id: Ia1bcc2acc22aba97d855382b5b173ac3d5f2c54b
2019-08-13 15:22:38 +00:00
Daimona Eaytoy 1fe3647268 Avoid implicit casts from DUNDEFINED to something else
This patch keeps the current behaviour for everything (since DUNDEFINED
was always casted to boolean false), but handles the cast at a higher
level instead of relying on what AFPData::castTypes will do. This way
it's easier to spot places where we may get DUNDEFINED, and decide how
to handle them one by one.

Change-Id: I1070e15ea03c7dd4a4231b87afbc42240a558581
2019-08-12 11:18:15 +02:00
Daimona Eaytoy 69ad23da98 Ban variable variables
As explained on phab, it's not worth the effort of keeping this feature.

Bug: T229947
Change-Id: Ic6067cab8e1ede98545e704888c99e2ed9a004e4
2019-08-11 01:47:35 +00:00
Daimona Eaytoy 2ed6272bb2 Partly handle set and set_var in shortcircuit
This is more complicated than the := operator, because the var name
could be a complicated expression, and we have to handle a function
call. This patch only covers the case where the variable name is a
literal, which is enough for WMF production.

Bug: T214674
Change-Id: I6c0f8e95663919a0235b5ccf0c88ad0a539315a7
2019-08-06 16:14:34 +02:00
Daimona Eaytoy afeff7c222 Really avoid DEMPTY leak
Follow-up I7831f3ed9f7c0656e0e8f77ded049c20eca682ba, really avoid the leak. My addition was pointless because we need DUNDEFINED, not DEMPTY, and I spent way too much time trying to understand what was still wrong.
Still have to get used to these new names...

Change-Id: I332967f6fb00b67fd355547b19638c95ffa5bba7
2019-08-04 22:02:13 +00:00
Daimona Eaytoy f977e858ab Avoid DEMPTY leak
As shown in the coverage reports [0], some empty operand logging lines are covered, but no test should have empty operands. I see one of the cause is skipOverBraces keeping $result as is, even if DEMPTY, so turn it into a DUNDEFINED.

[0] - https://doc.wikimedia.org/cover-extensions/AbuseFilter/includes/parser/AbuseFilterParser.php.html

Change-Id: I7831f3ed9f7c0656e0e8f77ded049c20eca682ba
2019-08-04 18:25:04 +00:00
jenkins-bot e733872e13 Merge "Allow accessing offsets of built-in variables" 2019-08-04 17:48:46 +00:00
Daimona Eaytoy 517919fca8 Allow accessing offsets of built-in variables
I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7 almost fixed this bug, but we
also have to make it possible to access builtin variables as arrays.
This will only make sense for a few variables (e.g. added_lines and
removed_lines), but I don't think we should validate it when checking
syntax.

Bug: T198531
Change-Id: I417e1b8d4802bbfccd091ce5c7617659cfd1e4ea
2019-08-04 17:14:44 +00:00
Daimona Eaytoy 5f4491f9aa Further deprecation for empty conditions
Start deprecating "empty" logic operators, and now that we have DEMPTY, simplify handling of empty function arguments introduced in Ica3e49f5b00595a95513d9683732e490aa7aae17.

Bug: T156096
Change-Id: Ied6b385e8690b6cc6e69afcf614389f737ab95bd
2019-08-04 15:33:49 +00:00
Daimona Eaytoy 9049be3609 Specialize empty AFPData types
As described in T156096#5389655.

Change-Id: Ifbf95a6b72a280cd77db6affbd8d642499bbfedc
2019-08-04 15:26:57 +00:00
Daimona Eaytoy a85e1ccc59 Make AbuseFilterParser::$funcCache non-static
Change-Id: I312efe3ce4d1f06e697aa4564aeec1bacbaf97d3
2019-08-03 09:19:49 +00:00
Daimona Eaytoy 09d0254172 Better handling of DNONE
This patch includes:
 * Making it possible to access offsets of a DNONE (returning a DNONE)
 * Initializing user-defined variables as DNONE inside short-circuited branches
 * Make DNONE propagate with other operators
 * Make DNONE count as false for logic operators
 * Remove a now-outaded bit in doLevelAtom. In case of shortcircuit,
   $result is now DNONE instead of DNULL, and thus it's possible to
   access offsets of it. Performance++!
 * Don't allow modifying or adding an element of a DNONE as if it were an
    array (to avoid inconsistencies)

This re-applies Id85c673337fa90a3782fd22eb9690cd996967111 with several fixes.

NOTE: Haven't tested locally, although I'm pretty confident thanks to
the amount of tests added.

Bug: T214674
Bug: T228677
Change-Id: I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7
2019-08-02 21:05:08 +00:00
jenkins-bot e3e157361d Merge "Revert "Initialize user-defined variables during shortcircuit"" 2019-07-29 23:30:50 +00:00
Daimona Eaytoy 13cdb86dd2 Revert "Initialize user-defined variables during shortcircuit"
Reason for revert: T214674#5374806

This reverts commit 56e6117afd.

Bug: T214674
Change-Id: Iccce248d2693cd9877a740b74e72a577e730435e
2019-07-29 23:06:23 +00:00
Daimona Eaytoy 4720c97530 Add a new class for methods related to running filters
Currently we strongly abuse (pardon the pun) the AbuseFilter class: its
purpose should be to hold static functions intended as generic utility
functions (e.g. to format messages, determine whether a filter is global
etc.), but we actually use it for all methods related to running filters.
This patch creates a new class, AbuseFilterRunner, containing all such
methods, which have been made non-static. This leads to several
improvements (also for related methods and the parser), and opens the
way to further improve the code.
Aside from making the code prettier, less global and easier to test,
this patch could also produce a performance improvement, although I
don't have tools to measure that.
Also note that many public methods have been removed, and almost any of
them has been made protected; a couple of them (the ones used from outside)
are left for back-compat, and will be removed in the future.

Change-Id: I2eab2e50356eeb5224446ee2d0df9c787ae95b80
2019-07-23 19:06:27 +00:00
Daimona Eaytoy 56e6117afd Initialize user-defined variables during shortcircuit
Bug: T214674
Depends-On: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05
Change-Id: Id85c673337fa90a3782fd22eb9690cd996967111
2019-07-23 12:20:53 +00:00
Daimona Eaytoy 18d7d2ed62 Start using AFPData::DNONE
This should allow more flexibility when checking syntax, and a saner
behaviour overall.
Aside from not throwing exception in certain cases, the results should
be almost equal to the ones you would get without this patch. However,
there are still a few things to improve (which for convenience I wrote
inside the parser test) and many to test.

Bug: T204654
Depends-On: I69bfec45c76509fb1112641393f78e8d8834adcd
Change-Id: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05
2019-07-14 08:48:47 +00:00
jenkins-bot c3dcd95733 Merge "Start making APFData members private" 2019-07-09 09:23:17 +00:00
jenkins-bot 35ab35978b Merge "Add a new data type for non-initialized stuff" 2019-07-09 08:58:48 +00:00
Daimona Eaytoy 3aaeb20063 Start making APFData members private
$data and $type are meant to be read-only and should have getter
functions, but as usual they're just public. Add getter methods, a
comment with a @private annotation and remove usages in our codebase.

Change-Id: I5e51efc9f982a4e340b48d20cb1b38a75bb10021
2019-07-09 10:57:00 +02:00
Daimona Eaytoy 304b58d46a Make AbuseFilterVariableHolder::mVars private
This property is meant to be private, since it has all kinds of
getters/setters, aside from one which is introduced in this patch.

Change-Id: I217b1e22cabd3c0468c84b1d6a69a6ed3c6fa8e6
2019-07-08 16:25:10 +02:00
Daimona Eaytoy bc79962803 Add a new data type for non-initialized stuff
Split from I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05.
This adds a new data type to use for empty AFPDatas. Using NULL for that
makes it impossible to distinguish cases where we really got a null
value, and cases where there was nothing to parse.
For now, DNONE is the same as DNULL, but I've explicited DNULL where
necessary. A subsequent patch will make proper use of DNONE.

Bug: T156096
Change-Id: I69bfec45c76509fb1112641393f78e8d8834adcd
2019-07-08 15:35:02 +02:00
Daimona Eaytoy 7398730563 Disallow consecutive comparisons
As explained on phabricator, they don't work with shortcircuit, so they
already fail for all filters using them. Plus IMHO it's an unnecessary
deviation from PHP's behaviour, given that this syntax doesn't do what
users may expect.

Bug: T218906
Change-Id: If9e7545e14044c8dc3b4163bb6fca8ab0683b9fa
2019-07-04 19:15:07 +02:00
Daimona Eaytoy 382751a707 Move conditions-related stuff inside AbuseFilterParser
Instead of relying on static methods and members in the AbuseFilter
class, move everything related to conditions inside the Parser, as the
amount of used conditions is something pertaining a single
AbuseFilter(Caching)Parser instance.
This change requires changing some signatures and adding parameters,
but will make introducing the new AbuseFilterRunner class easier (and
that will clean signatures, too).

Depends-On: I5b29ff556eca45fe59d15e2e3df4d06f1f6b3934
Change-Id: I7c1ea17adf7f42cf9260d416906bfbf3b8a20688
2019-06-19 15:14:17 +00:00
Daimona Eaytoy a8e8611509 Remove log_ids meta-variables
This is the second part of removing meta-variables. To achieve this, a
static property is added and another one removed.

Depends-On: I7f60df24dc8e706af289ebbbde7536c0baf8d5c3
Change-Id: I5b29ff556eca45fe59d15e2e3df4d06f1f6b3934
2019-06-19 14:44:56 +00:00
Daimona Eaytoy 39fc7c12af Restore unit tests for CachingParser and fix it
Added cachingParser back to *all* the parser tests, fixed a couple of
differences with the normal parser, and added a couple of tests so that
any cachingParser-related file has 100% coverage. Also move the remaining
get_matches tests inside parserTests, and specify the parser used in case of failure.
This also adds a new base class for parser-related tests with a couple
of util methods.

Bug: T201193
Change-Id: I980aec3481a52ecc35f1811a366014a5581a7cdb
2019-05-25 10:55:24 +02:00
jenkins-bot 058e215882 Merge "Refactor tokenizer caching" 2019-05-24 19:09:03 +00:00
Daimona Eaytoy 33fca4e096 Ignore trailing commas in function calls
First step before removing this weird syntax. I'd love to add a unit
test for params count, but I couldn't find a way, since doLevelFunction
is protected, relies on class members, and the args count is local.

Bug: T153251
Change-Id: Ica3e49f5b00595a95513d9683732e490aa7aae17
2019-05-21 23:13:31 +00:00
Daimona Eaytoy 2276d8ed2a Refactor tokenizer caching
Split a method, use WAN cache so that we're enabled to use
getWithSetCallback, pass the "version" option there and adapt the test
to it.
Follow-up of I9b3bc36b552901bc6ca7609ee51e80be2979a9c4

Change-Id: I4dd81a723e2bdb828b90594ad66a3918d8ec5b6c
2019-04-23 19:38:10 +02:00
Daimona Eaytoy 4bcb64b01a Increase code coverage a bit
This patch mostly adds coverageIgnore comments for intendedly
unreachable code etc. Some of them could be made testable by adding a new
filter function (e.g. array cast), but this patch is meant to be
comment-only (aside from the parser test).
Ignoring coverage for these lines makes some methods reach 100%
coverage, which in turn makes it easier to look at the coverage chart
and identify at a glance which parts of the code *really* need to be
covered.

Bug: T201193
Change-Id: Ic30883f7d261d974a2be46308d023e2714119e95
2019-04-13 18:30:14 +02:00
jenkins-bot efe32b7c93 Merge "Add doc for every class member" 2019-04-06 14:37:19 +00:00
Daimona Eaytoy 27545422b4 Add a function to the parser to retrieve the next token
It provides some sort of look ahead capability, avoiding to move and
then roll back.

Change-Id: I6293cbd355572c9de3a8591dd8286b14a239ffb2
2019-03-23 11:55:08 +01:00
Daimona Eaytoy f2c1beec44 Replace double-equals with triple-equals
Since double-equals are evil. I left some of them in place where I
wasn't sure, but I may be changed some which were intended to be
doubles. It could be a good idea to delay merging this patch until we'll
have more code coverage.

Change-Id: I1721a3ba532d481e3ecf35f51099c1438b6b73b2
2019-03-22 16:12:13 +01:00
Daimona Eaytoy bedbe36744 Add doc for every class member
Adding PHPdocs to every class members, in every file. This patch only
touches comments, and moved properties on their own lines. Note that
some of these properties would need to be moved, somehow changed, or
just removed (either because they're old, unused leftovers, or just
because we can move them to local scope), but I wanted to keep this
patch doc-only.

Change-Id: I9fe701445bea8f09d82783789ff1ec537ac6704b
2019-03-17 11:40:24 +01:00
jenkins-bot 3f3e98fbc5 Merge "Fix shortcircuit for consecutive operations" 2019-03-17 10:04:14 +00:00
jenkins-bot e2f1880922 Merge "Don't use wgLang and wgContLang" 2019-03-17 09:53:16 +00:00
Daimona Eaytoy 53ab2b5067 Fix documentation errors reported by Phan
Change-Id: I5788147ba1998235ded9eedbf64ebad37fce236f
2019-03-16 09:27:05 +00:00
Daimona Eaytoy 6f4bfc9597 Fix shortcircuit for consecutive operations
Using break could halt parsing between operations, instead use continue
to parse all operations.

Bug: T214642
Change-Id: If67ddaffef280c2448c55ae536013758617bba68
2019-02-08 17:55:59 +00:00
Daimona Eaytoy 51120e51c5 Don't use wgLang and wgContLang
For wgLang, there's a Language object available in the proximity, so just pass it.
For wgContLang, use MediaWikiServices.

Change-Id: Ic492007f2d5eeb8048d0919a4b9b7dd98c15c350
2019-02-06 12:00:44 +01:00
libraryupgrader b744d18526 build: Updating mediawiki/mediawiki-codesniffer to 24.0.0
The following sniffs are failing and were disabled:
* MediaWiki.Usage.DeprecatedGlobalVariables.Deprecated$wgContLang

Change-Id: Ic167fc5e836c5437edc6b330e5d73f9913bc2859
2019-02-06 09:28:26 +00:00
jenkins-bot b44984c50a Merge "Remove unused stuff" 2019-01-19 12:18:22 +00:00
jenkins-bot a2bee3bcf3 Merge "Simplify parser methods" 2019-01-19 12:11:04 +00:00
Daimona Eaytoy 6217ffb928 Remove unused stuff
Variables declared but never used, redundant code, and old leftovers.

Change-Id: Ic51044a45a1b49ad6c7af06c646b11893411a7cd
2019-01-18 17:04:19 +01:00
Daimona Eaytoy 93e8cb5ac5 Tune logging channel
As follow-up of I10b1fd2d9bdfe518089c053d77fef568170ecb65, use
'AbuseFilter' instead of 'AbuseFilterDeprecatedVars' as channel name.
Raise level for null-title filtering. Since with a null title
several things are likely to break, a warning is more appropriate here.
Tweaked the message as well, to include the bug number and to avoid
pointlessly including the title (which is null).
Lower the level for stashedit hit/miss (as it's really spammy and not
that useful right now).
Use 'abusefilter' instead of 'AbuseFilter' for statsd so that everything
has the same prefix.
Also raise the level for parser exceptions and unrecognized
consequences.

Change-Id: I1f9988155e924232b201281795cd322636da8082
2019-01-16 08:56:22 +00:00
Daimona Eaytoy 16475c0266 Fix regex group counting for get_matches
Adding the * as character to match after parentheses, since it may be
used with backtrack verbs (e.g. (*FAIL), (*SKIP)). I guess this is a
very, very rare use case, but since the fix is easy, let's include it.
Also, added a ToDo since we should probably find a better way to count
capturing groups, although I cannot figure out any.

Change-Id: Idcb303b4740530af9d3f009414d35d68f59effd0
2018-11-01 11:52:33 +01:00
jenkins-bot 7e151f5edc Merge "Unbreak short circuit for arrays" 2018-10-18 04:04:31 +00:00
Daimona Eaytoy eafb4f56c7 Avoid useless error message for regexfailure exception
Users writing filters probably don't care about preg_match or whatever
happens in PHP. Also, it's not that useful to see "unspecified error".

Change-Id: I014742fa6f678126f55ac5ccff38e44b2c5a7d15
2018-10-08 19:19:01 +02:00
Daimona Eaytoy 6d54b83f2c Simplify parser methods
Use a single function to check parameters amount, avoid duplication
between keywordIn and keywordContains, use if...elseif instead of
if-else when statements have a return inside, simplify some other logic,
add typehinting, and change method visibility according to use of such
methods.

Change-Id: I22225a5cbbb93679a0e78bf6e15866829167fbf4
2018-10-03 17:19:40 +02:00