Commit graph

220 commits

Author SHA1 Message Date
jenkins-bot 7a684c487c Merge "Move some misplaced AbuseFilterParser entry points" 2020-09-29 13:51:17 +00:00
Daimona Eaytoy 55ba083b13 Introduce a KeywordsManager service
This will decouple a bit the huge and chaotic tangle of AF classes. Some
boilerplate code for AbuseFilter services is also added with this patch.

Note that this requires injecting a KeywordsManager in
AbuseFilterVariableHolder, or unit tests would fail. This is still
incomplete, and the Manager is only injected in tests, because
VariableHolder still has to be refactored.

The test for the UpdateVarDumps script had to be updated, because
serializing VHs in there was a bad choice. As pointed out in a comment,
the test is likely going to break again once we remove the BC code, but
I hope that we'll be able to remove the test at that point.

Change-Id: I12a656a310adb8c5f75cab63f6db9e121e109717
2020-09-28 23:03:52 +00:00
Daimona Eaytoy a1626a0d7f Move some misplaced AbuseFilterParser entry points
These methods had no reals reason to be static and belong to the
AbuseFilter class. Most of them were moved to Parser class as common
variations of the existing entry points. One was specific to the
EvalExpression API module and was moved there.

This change comes at no cost, and will make it possible to inject a
parser where needed.

Change-Id: Ifd169cfc99df8a5eb4ca94ac330f301ca28a2442
2020-09-29 00:36:08 +02:00
Daimona Eaytoy e5746bbb0e parser: Add a BC option to get DNULL for unset variables
While checking a filter, if a variable is not set (e.g. added_lines for
an account creation), the VariableHolder will return a DNULL, rather
than a DUNDEFINED. This means that some filters will resume working, and
the WMF servers will stop getting AF warnings at a rate of 4 millions per
day. This also requires adjusting some tests to reflect the new
behaviour (which is actually the OLD behaviour, that filters had until
last year when we introduced the DUNDEFINED data type). It also requires
adjusting a check in the old parser, but that's not really relevant
because the plan is to remove the old parser before 1.36 is released
(see I0e75f334c7e0dfc1239f2e5f5f7d7452b0bbf29e).

Bug: T230256
Change-Id: I4d06303047397674c1edbfc32628f1bc83ac3340
2020-09-18 15:05:58 +02:00
proc a31f4e46af
Strict type comparison
Bug: T248806
Change-Id: I039ab7f103bb37052987b815412b71f70643a6d2
2020-06-27 15:55:57 +01:00
jenkins-bot b118fd50dc Merge "Improve var dumping in /details, /examine and /tools" 2020-04-29 20:00:54 +00:00
Daimona Eaytoy 1d6b9f6617 Add new methods for checking DUNDEFINED recursively, use them
The problem is explained at T250570#6068702; basically, the previous
check didn't account for DUNDEFINED nested deep inside arrays.

Bug: T250570
Change-Id: Iacee2db54ca00108de6339bb3dae70af7e2eeb56
2020-04-19 13:58:14 +02:00
Daimona Eaytoy 4c98aecf4d Improve var dumping in /details, /examine and /tools
Using var_export for better visual effect, especially for arrays.
The result from /tools is much clearer and the 'wrong syntax' message is
a bit more explicative than before.

Bug: T190653
Bug: T239972
Change-Id: I79a17305c7f19f7900f896f895e9365bb5f2fd58
2020-03-28 17:35:43 +01:00
Daimona Eaytoy b9a1e86245 Remove old number syntax
Bug: T212730
Change-Id: I7573da1683efc83b5002b8948c97dd7f6658a488
2020-02-25 23:38:19 +00:00
Daimona Eaytoy 472d1221bd tests: Increase and rebalance code coverage
Also fix a couple of broken tests in Consequences:
 - For createaccount, $user->addToDatabase must be called before
   testForAccountCreation, or it will throw a CannotCreateActorException.
 - In testThrottleLimit, also set wgAbuseFilterEmergencyDisableThreshold
   to avoid relying on the local config.

Bug: T201193
Change-Id: If1a50b0a729e4d554485f2e2225d5877510966b6
2020-02-07 18:32:17 +00:00
libraryupgrader a14ec744f7 build: Updating composer dependencies
* mediawiki/minus-x: 0.3.2 → 1.0.0
* mediawiki/mediawiki-phan-config: 0.9.0 → 0.9.1

Change-Id: I119f4d56cce674302f34e938e598e6cc6bf28dc0
2020-01-28 17:51:38 +00:00
Ammar Abdulhamid 641aeebbcf Replace deprecated IP class with IPUtils
Bug: T242556
Change-Id: If8e9034885726b673d1500fa8b538b5302e66165
2020-01-24 18:27:26 +01:00
jenkins-bot 70d31da673 Merge "Stop using deprecated stuff with easy replacements" 2020-01-22 16:44:57 +00:00
Daimona Eaytoy e9fe252def Fix remaining PHPCS issues
Mainly, add visibility modifiers on constants.

Change-Id: I41e8e2d691b2bad6ea6f244d54517d37d7783181
2020-01-21 12:36:37 +00:00
libraryupgrader 1d911b8187 build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 28.0.0 → 29.0.0
  The following sniffs are failing and were disabled:
  * MediaWiki.Commenting.FunctionComment.MissingParamTag
  * MediaWiki.Commenting.FunctionComment.ParamNameNoMatch

npm:
* eslint-config-wikimedia: 0.13.1 → 0.15.0
* grunt-stylelint: 0.11.1 → 0.13.0
* stylelint-config-wikimedia: 0.6.0 → 0.8.0

Additional changes:
* Remove direct "stylelint" dependency in favor of "grunt-stylelint".
* Also sorted "composer fix" command to run phpcbf last.
* Removing manual reportUnusedDisableDirectives for eslint.

Change-Id: I8f73202db1333fbc36ccf556b3bb05b1e8c279cb
2020-01-21 07:38:54 +00:00
Daimona Eaytoy 87459ec679 build: Upgrade phan
Depends-On: I6d538ce3ca7fd2d495c2bafbab7cc279da69db1c
Change-Id: Ic8c3a01a5c37fdf461f4fd5598e597eb9c9073d3
2020-01-19 18:48:51 +00:00
Daimona Eaytoy 10c2fe7151 Stop using deprecated stuff with easy replacements
This patch is mostly replacing Revision::* constants,
Wikimedia\(restore|suppress)Warnings, and wfWikiId.

Change-Id: I13544cc3e12955a9376ccce3c120e2cee1f2ee2e
2020-01-08 14:59:30 +01:00
jenkins-bot 8fea62529b Merge "Fix AbuseFilterCachingParser violating return type constraint" 2019-12-27 10:04:57 +00:00
jenkins-bot 5c9fe8bd9b Merge "Always evaluate the offset when retrieving array elements" 2019-12-27 09:58:50 +00:00
Daimona Eaytoy 8ad4ecd31d Always evaluate the offset when retrieving array elements
Even if the array is DUNDEFINED, we need to check the offset to ensure
that it's valid.

Bug: T237351
Change-Id: Ibfa360c4ae1d80abe14d9fdf66991b76cb5954df
2019-12-23 16:04:45 +00:00
Daimona Eaytoy b3e0529d55 Log deprecated vars in the cached phase in the new parser
For the new parser, xhgui shows that AbuseFilterParser::getVarValue is
taking up a lot of time; in turn, most of the time spent inside
getVarValue is used to log the use of deprecated variables. Hence, given
that:
 - We should keep the new parser performant
 - There are tons of deprecated variables out there and they likely
 won't be replaced
 - Having gazillions of debugLog entries doesn't help

log them only in the cached phase.

Bug: T234427
Change-Id: I2bfc692c829c3cbe889e5076f5205e2c99097087
2019-12-16 13:54:58 +01:00
Daimona Eaytoy f382304aae Add a base class for parser transition
Change-Id: I31282b8632c332b6d46a6bb4a42f57ac0d005b5f
2019-12-15 13:29:56 +00:00
Daimona Eaytoy d5ab147dcf Fix AbuseFilterCachingParser violating return type constraint
This is identical to I8a3c31e7385283d95b4712d457784016239a0b3b, except
for the array append case.

Bug: T236870
Change-Id: Iac033ba467232f6ff110d575920e968759ce0e15
2019-12-04 18:27:46 +00:00
Daimona Eaytoy 07572da2fe Really throw for too many params
Bug: T230803
Change-Id: I4e68bb7220f1151bb32b2be859f6cffc55888a30
2019-11-30 10:57:16 +00:00
Daimona Eaytoy 2ddd79fd98 Forbid assignments where the LHS is a built-in identifier
And not just a built-in variable.

Bug: T237130
Bug: T237216
Change-Id: Ie1d86dc324993efcb863be23697732e6aa1dac10
2019-11-28 14:40:38 +00:00
jenkins-bot a8c50150d6 Merge "Convert static arrays to constants" 2019-11-22 13:39:39 +00:00
Daimona Eaytoy c03f0a3b08 Convert static arrays to constants
Beloved PHP7!

Change-Id: Id5170662f7c5ceacfc0ac8d90787f2c92fd93464
2019-11-16 16:32:36 +01:00
Daimona Eaytoy c73381b6db Tokenizer: don't strip backslashes from \x
Bug: T238475
Change-Id: I8c2ea6ad369946df93440eece60d456dc1a3fd7a
2019-11-16 16:21:39 +01:00
Daimona Eaytoy 98bcad25c3 Also parse numbers with the new syntax and hard-deprecate the old one
This will allow people to switch their filters to the new syntax. The
deprecation warning is now more exhaustive, and the info() warning is
kept to ensure that everything proceeds smoothly.
The regex v2 has also been fixed to:
 - Consume all the digits/letters on the right (*)
 - Have named groups
 - Be created dynamically with other constants

(*) The previous version of v2 could complete the match and leave
digits/letters on the right when encountering numbers with the old
syntax, hence dropping support too early. We also cannot use a word
boundary (\b) because that would prevent matching numbers with trailing
dots (e.g. "5.").

Bug: T212730
Change-Id: Ibf6ac571f6b5c09149d69a19c38240ce6b024dff
2019-11-12 11:52:38 +00:00
Daimona Eaytoy a77a59b962 Hard-deprecate empty operands
This bumps the level to WARN, and makes it very clear that people should
fix the affected filters. It also removes the calling method, which was
mostly meant for debugging purposes, and changes the type to 'op_type'
to avoid conflicting with type:mediawiki in logstash.

Bug: T156096
Change-Id: Ie73f1604e8ed82bc2e1be9fc90fa065be37889a3
2019-11-12 11:39:25 +00:00
Daimona Eaytoy f7ac35d5c6 Hard-deprecate too many params
Bug: T230803
Change-Id: Icec8bcb8ab23956654857acc8b3d235889f587a9
2019-11-10 12:59:33 +00:00
jenkins-bot 91bc961712 Merge "Check for 0-like floats passed to the modulo operator" 2019-11-10 11:51:28 +00:00
Daimona Eaytoy c0f8374624 Check for 0-like floats passed to the modulo operator
That throws an error in PHP.

Bug: T237459
Change-Id: Ia0b29d6a8b9f4aac6b5b72ce8f2f45afb03f4c99
2019-11-10 11:22:04 +00:00
jenkins-bot 7ff4b95aec Merge "Expand the list of types that can be cast to int" 2019-11-10 11:00:36 +00:00
Daimona Eaytoy 585d6cdb24 Make to sure to report division by zero when the LHS is undefined
Bug: T234339
Change-Id: I1575ec013c1e7e321a8f13f40804ebc5ab076268
2019-11-08 14:08:52 +00:00
Daimona Eaytoy 1abaff1aac Better handling of keywords and functions
Always run the keyword/function handler, even if there are DUNDEFINED
arguments, so that the handler can perform further validation on the
input and report any error to the user. However, replace DUNDEFINED with
DNULL before running the handler, to avoid special-casing DUNDEFINED in
every handler. If any argument was a DUNDEFINED, we will return
DUNDEFINED anyway.

Also centralize the keyword handling logic to a new method, like it
happens for functions.

Bug: T234339
Change-Id: I875cb77418a39790e91fe5867c49917bfe406ed4
2019-11-08 15:07:20 +01:00
Daimona Eaytoy e98799a00a Centralize the code for calling keywords
This allows sharing the code between cachingparser and the old parser
(for DRY-ness), and even when the old parser will be killed, having the
logic outside of the generic parse method seems saner.

This copies what I446a307e5395ea8cc8ec5ca5d5390b074bea2f24 did for
functions.

Change-Id: Ie6290243a6c78661510a9b4cb713d6e7b2778248
2019-11-08 15:02:17 +01:00
Daimona Eaytoy b7c7ae168d Explicitly forbid negative indexes in arrays
This emits its own error because:
1- It's clearer to understand
2- It's easier to find where we're dealing with negative offsets, if
we'll ever want to allow that.

Note that trying to use a negative index already results in a hard PHP
error being thrown.

Bug: T237219
Change-Id: Ib11eaaca5e21f740269141c75e62bac48093e8d0
2019-11-08 05:55:56 +00:00
Daimona Eaytoy a7b28369ea Expand the list of types that can be cast to int
Bug: T237624
Change-Id: I2220cb8a8ec998a433a4469d7e0591ec0b4f2b12
2019-11-07 15:14:17 +01:00
Daimona Eaytoy b9e4475985 build: Upgrade mediawiki-phan-config to 0.8.0
This is to verify that our CI is able to handle the new version.

Bug: T235049
Change-Id: Ib7427e15f673a575738489476e604c387f449ddd
2019-10-09 19:12:51 +02:00
jenkins-bot c6ee722273 Merge "Remove AFPData::dup" 2019-10-04 19:42:52 +00:00
jenkins-bot 9ab13cf24b Merge "Replace array_map with foreach" 2019-10-04 19:42:49 +00:00
Daimona Eaytoy c7fa503e9b Remove AFPData::dup
The method, which simply duplicates an AFPData instance, is only used
when casting types, to return a different instance when the object
already has the desired type.
However, nothing is assuming that, so we can just return the original
instance and save some time.

Bug: T234427
Change-Id: Id8067b418a00260ceead35f234e55268390699ab
2019-10-04 19:15:08 +00:00
Daimona Eaytoy 703835e835 Drop HHVM support
Change-Id: Ib7ccb4f68278ba8ca009e9d18e9d8b127f799cde
2019-10-03 12:27:18 +00:00
Daimona Eaytoy 337771f83b Replace array_map with foreach
This is a micro-optimization, but IMHO it's necessary. The AF parser
code is executed for every active filter, for every
edit/move/deletion/accountcreation. In PHP, foreach is usually faster
than array_map. Especially in the case of variadic functions potentially
taking hundreds of strings, foreach will consume less time.

Bug: T234427
Change-Id: I1beedf419a6637a9a3dd668635645df950ceda21
2019-10-02 11:29:19 +00:00
Daimona Eaytoy 4c8be4d374 Add profiling points throughout the code for the CachingParser switch
Bug: T156095
Change-Id: Ib934be34a953166fe1b94cfe8ed216afe3b906ca
2019-09-18 10:02:55 +00:00
jenkins-bot 48713c824b Merge "Throw AFPUserVisibleExceptions for empty operands in CachingParser" 2019-09-15 08:36:39 +00:00
Daimona Eaytoy a4e25c1ac9 Throw AFPUserVisibleExceptions for empty operands in CachingParser
Instead of TypeErrors. Basically, only empty parenthesis had to be
fixed.

Bug: T156096
Change-Id: I019615c7bfaa179c2184b5d3ea2c6b5da91366e3
2019-09-14 18:35:40 +00:00
Daimona Eaytoy 5267082c85 Better logging for unset variables
We have many log entries, so we need some more debug data.

Bug: T230256
Change-Id: I0e9638c1ffe537ea6cfd6886ff32ef447fdacc28
2019-09-14 16:49:55 +00:00
Daimona Eaytoy 6e9a9a3bc2 CachingParser: ensure to catch errors inside short-circuited blocks
This is similar to the old parser: when discarding a node, actually
evaluate it if short-circuit is not allowed.
Add a whole lot of tests for all possible exceptions.
Move the logic to extract a message from an AFPUserVisibleException away
from the parser, to keep unit tests working.

Bug: T232498
Change-Id: I31ee4e255c6a87dd693b9bcd582539fdf57acd45
2019-09-13 21:13:15 +00:00