wikimedia/mediawiki-extensions-AbuseFilter

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/AbuseFilter.git synced 2024-12-01 09:06:22 +00:00

Author	SHA1	Message	Date
Daimona Eaytoy	337771f83b	Replace array_map with foreach This is a micro-optimization, but IMHO it's necessary. The AF parser code is executed for every active filter, for every edit/move/deletion/accountcreation. In PHP, foreach is usually faster than array_map. Especially in the case of variadic functions potentially taking hundreds of strings, foreach will consume less time. Bug: T234427 Change-Id: I1beedf419a6637a9a3dd668635645df950ceda21	2019-10-02 11:29:19 +00:00
Daimona Eaytoy	4c8be4d374	Add profiling points throughout the code for the CachingParser switch Bug: T156095 Change-Id: Ib934be34a953166fe1b94cfe8ed216afe3b906ca	2019-09-18 10:02:55 +00:00
Daimona Eaytoy	6e9a9a3bc2	CachingParser: ensure to catch errors inside short-circuited blocks This is similar to the old parser: when discarding a node, actually evaluate it if short-circuit is not allowed. Add a whole lot of tests for all possible exceptions. Move the logic to extract a message from an AFPUserVisibleException away from the parser, to keep unit tests working. Bug: T232498 Change-Id: I31ee4e255c6a87dd693b9bcd582539fdf57acd45	2019-09-13 21:13:15 +00:00
Daimona Eaytoy	004ccfdb5c	Annotate the AST with var names before caching the AST This implements T230982#5475400, and it should speed up the CachingParser by roughly 40%. Bug: T230982 Change-Id: I803cc58637d50eb90e57decf243f5ca78075d63d	2019-09-13 19:43:50 +00:00
jenkins-bot	5be19f6f65	Merge "Add a 'strict' option to VariableHolder::getVar"	2019-09-05 19:23:23 +00:00
Daimona Eaytoy	489da0d229	Add a 'strict' option to VariableHolder::getVar This will help mitigate problems like T230256 by enforcing that the requested variables must exist. For now, it will only log bad usages, thus providing a way to identify affected filters and fix them. Bug: T230256 Change-Id: I7a61916576e444a56f0e07da7b6e5033346226bd	2019-09-04 18:19:23 +00:00
Daimona Eaytoy	13b1e880f2	Hotfix other DUNDEFINED casts to bool These were spotted on testwiki with wmf.21. Change-Id: Ic4d67a2b83aedfeb574fa1363a9fc618b2862f95	2019-09-04 18:06:22 +00:00
Daimona Eaytoy	d51ca862c6	Move parser tests to /unit IMHO these can be considered unit tests; they were already fast, but now they're executed in an instant. This requires several changes: 1 - delay retrieving messages in AFPUserVisibleException, to avoid having to deal with i18n whenever we want to test exceptions; 2 - Use some DI for Parser and Tokenizer. Equivset-dependent tests are also moved to a new class, thus helping to fix the AF part of T189560. Change-Id: If4585bf9bb696857005cf40a0d6985c36ac7e7a8	2019-08-28 16:36:37 +00:00
Daimona Eaytoy	71730f7d44	Warn if a function has been given too many parameters While this is not as important as throwing for too few parameters, IMHO it's still important to fail in this case. Mostly because if a function receives too many parameters, chances are that whoever wrote the filter didn't do that intentionally, and thus there may be a hidden bug. Bonus: fix a few docblocks. Bug: T230803 Change-Id: Iac2931f17b50ace8c8f4c2faa44b3f54ca134c54	2019-08-26 20:29:49 +02:00
Daimona Eaytoy	d515af0ae6	Add a new class for the CachingParser's AST This allows a little more abstraction: we can store other data in the tree, without having to store it in a specific node (e.g. the variables map, which is still unused). It also adds a few typehints, and specializes the return value of eval'ing the AST: previously, it was that of evalNode, which wasn't guaranteed to be an AFPData. Now we have this guarantee. Last but not least, we can now measure runtime metrics for evalTree, which doesn't recurse. Bonus: fix a check in the old parser, which used the wrong variable when reporting outofbounds errors. Change-Id: Iff806793b1d968e9bb6220f1459f3d0ac587c7da	2019-08-25 17:29:16 +00:00
jenkins-bot	47838715fa	Merge "Allow if without else"	2019-08-20 20:12:19 +00:00
jenkins-bot	5e605aaa62	Merge "Even better handling of DUNDEFINED"	2019-08-20 20:00:52 +00:00
jenkins-bot	bf8ccccade	Merge "Fix a bug in the return value of the CachingParser"	2019-08-20 19:58:38 +00:00
Daimona Eaytoy	af7744781f	Allow if without else Bug: T230727 Depends-On: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45 Change-Id: I3b85087677607573f4fa68681735dc35348dcd87	2019-08-20 19:36:14 +00:00
Daimona Eaytoy	963221ad6d	Even better handling of DUNDEFINED Ensure that the variable isn't set before marking it as DUNDEFINED: that's only for when we cannot use a default, but if the variable is set we already have one. Most notably, this fixes conditionals handling: right now, if you have a conditional with an assignment in both branches, the variable will be undefined. That's obviously wrong, so it's fixed in this patch. Plus: catch only AFPExceptions in a test to avoid unintentionally catching the assert exception; simplify some assignments using wfSetVar. Depends-On: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24 Change-Id: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45	2019-08-20 19:17:30 +00:00
Daimona Eaytoy	fa76405ea7	Fix a bug in the return value of the CachingParser This has always been wrong, and remained unnoticed. Also added a typehint for added safety. Change-Id: I8a3c31e7385283d95b4712d457784016239a0b3b	2019-08-20 20:54:19 +02:00
Daimona Eaytoy	aa867bd370	Better handling of function params in CachingParser This patch includes various fixes to how func arguments are handled in CachingParser: - Add a comment about a future improvement of checkSyntax, which we could limit to trying to build the AST. - Having enough args for each function is now also checked when building the AST. This allows implementing the previous point without ceasing to report notenoughargs at syntaxcheck-time (otherwise it'd be a runtime error). And it also ensures that we check for the params count inside skipped branches, e.g. inside if/else: these were already only discovered at runtime in CachingParser. The old parser is not affected by this change, because when checking syntax it will always execute all branches, and at runtime it will skip braces altogether. - Fix arg count for CachingParser, which previously added a bogus param in case of a function called without parameters. This was fixed for the other parser in I484fe2994292970276150d2e417801453339e540, and I just ported the updated fix. Also note that the CachingParser was already failing for e.g. `count()`, but instead of complaining about missing arguments, it failed hard when trying to pass NULL to evalNode. - Fixed some tests not to use setExpectedException, which caused the previous point to remain unnoticed: calling that method prevents the loop from continuing, and thus only the AbuseFilterParser part was being executed. The new implementation checks the exception ID and is thus more future-proof if the i18n message changes. - Fixed some function names in error reporting for the old parser. - The arg count is now checked outside of the function handlers, thus it's no longer necessary to call checkEnoughArguments at the beginning of each handler. This also produces clearer error messages in case of aliases (e.g. set/set_var). - Check the args count even if some of the args are DUNDEFINED. This is much easier now that the check is outside of the handler. This will make syntax check fail for e.g. `contains_any(added_lines)`. Bug: T156095 Change-Id: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24	2019-08-20 15:32:02 +00:00
jenkins-bot	7addec7b4a	Merge "Make some other AFPData methods non-static"	2019-08-20 14:16:16 +00:00
jenkins-bot	1f45336157	Merge "Move keywords handlers to the Parser"	2019-08-20 14:16:10 +00:00
jenkins-bot	f18d0814e2	Merge "Make several AFPData functions non-static"	2019-08-20 14:06:02 +00:00
jenkins-bot	f1ab591d27	Merge "Avoid implicit casts from DUNDEFINED to something else"	2019-08-20 13:04:48 +00:00
Daimona Eaytoy	d715f6d2c0	Increase cache hits for CachingParser If $parser->parse returns a falsey value (=null), that's because the filter doesn't have any statement. But that's not a valid reason not to cache the filter. Hence, return whatever parse() is returning inside the callback, so that the result is always cached. Change-Id: Ib6b0e72d882dc484456a3be6bbc74da36ef48bf7	2019-08-13 18:03:13 +02:00
Daimona Eaytoy	4b0911ee01	Make some other AFPData methods non-static Change-Id: I22ea337a36f911c57d3dadb9a3c45fc2c8b7c628	2019-08-12 14:40:51 +02:00
Daimona Eaytoy	3f171dc0a5	Move keywords handlers to the Parser Just like we do for functions, it doesn't really make sense to have keywords separately, in AFPData. Change-Id: I208a9b1ce2bd12038e9fbcc515c48d604ec80eb8	2019-08-12 14:29:56 +02:00
Daimona Eaytoy	2fdf091eb9	Make several AFPData functions non-static The keywords-related ones will be handled in a subsequent patch. Change-Id: Ifcfad438023ef136dc6f2cd5529e867df9b23789	2019-08-12 14:12:16 +02:00
Daimona Eaytoy	1fe3647268	Avoid implicit casts from DUNDEFINED to something else This patch keeps the current behaviour for everything (since DUNDEFINED was always cast to boolean false), but handles the cast at a higher level instead of relying on what AFPData::castTypes will do. This way it's easier to spot places where we may get DUNDEFINED, and decide how to handle them one by one. Change-Id: I1070e15ea03c7dd4a4231b87afbc42240a558581	2019-08-12 11:18:15 +02:00
Daimona Eaytoy	69ad23da98	Ban variable variables As explained on phab, it's not worth the effort of keeping this feature. Bug: T229947 Change-Id: Ic6067cab8e1ede98545e704888c99e2ed9a004e4	2019-08-11 01:47:35 +00:00
Daimona Eaytoy	2ed6272bb2	Partly handle set and set_var in shortcircuit This is more complicated than the := operator, because the var name could be a complicated expression, and we have to handle a function call. This patch only covers the case where the variable name is a literal, which is enough for WMF production. Bug: T214674 Change-Id: I6c0f8e95663919a0235b5ccf0c88ad0a539315a7	2019-08-06 16:14:34 +02:00
Daimona Eaytoy	517919fca8	Allow accessing offsets of built-in variables I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7 almost fixed this bug, but we also have to make it possible to access builtin variables as arrays. This will only make sense for a few variables (e.g. added_lines and removed_lines), but I don't think we should validate it when checking syntax. Bug: T198531 Change-Id: I417e1b8d4802bbfccd091ce5c7617659cfd1e4ea	2019-08-04 17:14:44 +00:00
Daimona Eaytoy	9049be3609	Specialize empty AFPData types As described in T156096#5389655. Change-Id: Ifbf95a6b72a280cd77db6affbd8d642499bbfedc	2019-08-04 15:26:57 +00:00
Daimona Eaytoy	a85e1ccc59	Make AbuseFilterParser::$funcCache non-static Change-Id: I312efe3ce4d1f06e697aa4564aeec1bacbaf97d3	2019-08-03 09:19:49 +00:00
Daimona Eaytoy	09d0254172	Better handling of DNONE This patch includes: * Making it possible to access offsets of a DNONE (returning a DNONE) * Initializing user-defined variables as DNONE inside short-circuited branches * Make DNONE propagate with other operators * Make DNONE count as false for logic operators * Remove a now-outdated bit in doLevelAtom. In case of shortcircuit, $result is now DNONE instead of DNULL, and thus it's possible to access offsets of it. Performance++! * Don't allow modifying or adding an element of a DNONE as if it were an array (to avoid inconsistencies) This re-applies Id85c673337fa90a3782fd22eb9690cd996967111 with several fixes. NOTE: Haven't tested locally, although I'm pretty confident thanks to the amount of tests added. Bug: T214674 Bug: T228677 Change-Id: I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7	2019-08-02 21:05:08 +00:00
Daimona Eaytoy	13cdb86dd2	Revert "Initialize user-defined variables during shortcircuit" Reason for revert: T214674#5374806 This reverts commit `56e6117afd`. Bug: T214674 Change-Id: Iccce248d2693cd9877a740b74e72a577e730435e	2019-07-29 23:06:23 +00:00
Daimona Eaytoy	56e6117afd	Initialize user-defined variables during shortcircuit Bug: T214674 Depends-On: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05 Change-Id: Id85c673337fa90a3782fd22eb9690cd996967111	2019-07-23 12:20:53 +00:00
Daimona Eaytoy	18d7d2ed62	Start using AFPData::DNONE This should allow more flexibility when checking syntax, and a saner behaviour overall. Aside from not throwing exceptions in certain cases, the results should be almost equal to the ones you would get without this patch. However, there are still a few things to improve (which for convenience I wrote inside the parser test) and many to test. Bug: T204654 Depends-On: I69bfec45c76509fb1112641393f78e8d8834adcd Change-Id: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05	2019-07-14 08:48:47 +00:00
jenkins-bot	c3dcd95733	Merge "Start making AFPData members private"	2019-07-09 09:23:17 +00:00
jenkins-bot	35ab35978b	Merge "Add a new data type for non-initialized stuff"	2019-07-09 08:58:48 +00:00
Daimona Eaytoy	3aaeb20063	Start making AFPData members private $data and $type are meant to be read-only and should have getter functions, but as usual they're just public. Add getter methods, a comment with a @private annotation and remove usages in our codebase. Change-Id: I5e51efc9f982a4e340b48d20cb1b38a75bb10021	2019-07-09 10:57:00 +02:00
Daimona Eaytoy	304b58d46a	Make AbuseFilterVariableHolder::mVars private This property is meant to be private, since it has all kinds of getters/setters, aside from one which is introduced in this patch. Change-Id: I217b1e22cabd3c0468c84b1d6a69a6ed3c6fa8e6	2019-07-08 16:25:10 +02:00
Daimona Eaytoy	bc79962803	Add a new data type for non-initialized stuff Split from I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05. This adds a new data type to use for empty AFPDatas. Using NULL for that makes it impossible to distinguish cases where we really got a null value, and cases where there was nothing to parse. For now, DNONE is the same as DNULL, but I've made DNULL explicit where necessary. A subsequent patch will make proper use of DNONE. Bug: T156096 Change-Id: I69bfec45c76509fb1112641393f78e8d8834adcd	2019-07-08 15:35:02 +02:00
Daimona Eaytoy	382751a707	Move conditions-related stuff inside AbuseFilterParser Instead of relying on static methods and members in the AbuseFilter class, move everything related to conditions inside the Parser, as the amount of used conditions is something pertaining to a single AbuseFilter(Caching)Parser instance. This change requires changing some signatures and adding parameters, but will make introducing the new AbuseFilterRunner class easier (and that will clean signatures, too). Depends-On: I5b29ff556eca45fe59d15e2e3df4d06f1f6b3934 Change-Id: I7c1ea17adf7f42cf9260d416906bfbf3b8a20688	2019-06-19 15:14:17 +00:00
Daimona Eaytoy	39fc7c12af	Restore unit tests for CachingParser and fix it Added cachingParser back to all the parser tests, fixed a couple of differences with the normal parser, and added a couple of tests so that any cachingParser-related file has 100% coverage. Also move the remaining get_matches tests inside parserTests, and specify the parser used in case of failure. This also adds a new base class for parser-related tests with a couple of util methods. Bug: T201193 Change-Id: I980aec3481a52ecc35f1811a366014a5581a7cdb	2019-05-25 10:55:24 +02:00
Daimona Eaytoy	f2c1beec44	Replace double-equals with triple-equals Since double-equals are evil. I left some of them in place where I wasn't sure, but I may have changed some which were intended to be doubles. It could be a good idea to delay merging this patch until we have more code coverage. Change-Id: I1721a3ba532d481e3ecf35f51099c1438b6b73b2	2019-03-22 16:12:13 +01:00
Daimona Eaytoy	53ab2b5067	Fix documentation errors reported by Phan Change-Id: I5788147ba1998235ded9eedbf64ebad37fce236f	2019-03-16 09:27:05 +00:00
Daimona Eaytoy	cea1b0aec3	Compact some other comments Remove unnecessary ones, make clear what the cryptic ones mean, and inline them when possible. Change-Id: I384859871a66ced8cb0d81260c06c5a5b278866f	2018-07-17 17:17:44 +02:00
Daimona Eaytoy	c75bc35f7d	Rename lists to arrays Arrays were introduced with the name "lists". While it may look user-friendlier and so on, it actually uses a wrong name: lists are different from arrays. I ran a grep and I should've replaced every occurrence, plus everything seems to work, however a double check wouldn't be bad. Change-Id: I6a858f02f5dd9250ba7e1abf9c6422fd98758c9e	2018-06-26 14:42:23 +02:00
Daimona Eaytoy	caa4b1c763	Add phan configuration This is taken from I6a57a28f22600aafb2e529587ecce6083e9f7da4 and makes all the needed changes to make phan pass. Seccheck will instead fail, but since it's not clear how to fix it (and it is non-voting), for the moment we may merge this and enable phan on CI. Bug: T192325 Change-Id: I77648b6f8e146114fd43bb0f4dfccdb36b7ac1ac	2018-04-30 08:32:58 +00:00
Daimona Eaytoy	3c3a521fec	Fix coding conventions exclusion rules This should fix every error with excluded rules, leaving only the one for $wgTitle. A double check would be nice in order to avoid regressions due to stupid mistakes. Bug: T178007 Change-Id: I22c179f3a01d652640304b59e43fcb5b5a9abac3	2018-04-20 08:40:18 +00:00
Umherirrender	a2ebd0c70a	Improve some parameter docs Change-Id: Ibac10a20243a4eedd826485d56eddd5234da6fec	2017-10-07 00:54:58 +02:00
Umherirrender	a063e33ee8	Use short array syntax Done by phpcbf over composer fix Change-Id: I53fd1fc8d056b9b60194d2d630852cfca37aadea	2017-06-15 17:02:57 +02:00

51 commits