wikimedia/mediawiki-extensions-AbuseFilter

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/AbuseFilter.git synced 2024-11-24 14:13:54 +00:00

Author	SHA1	Message	Date
Daimona Eaytoy	c73381b6db	Tokenizer: don't strip backslashes from \x Bug: T238475 Change-Id: I8c2ea6ad369946df93440eece60d456dc1a3fd7a	2019-11-16 16:21:39 +01:00
Daimona Eaytoy	98bcad25c3	Also parse numbers with the new syntax and hard-deprecate the old one This will allow people to switch their filters to the new syntax. The deprecation warning is now more exhaustive, and the info() warning is kept to ensure that everything proceeds smoothly. The regex v2 has also been fixed to: - Consume all the digits/letters on the right () - Have named groups - Be created dynamically with other constants () The previous version of v2 could complete the match and leave digits/letters on the right when encountering numbers with the old syntax, hence dropping support too early. We also cannot use a word boundary (\b) because that would prevent matching numbers with trailing dots (e.g. "5."). Bug: T212730 Change-Id: Ibf6ac571f6b5c09149d69a19c38240ce6b024dff	2019-11-12 11:52:38 +00:00
Daimona Eaytoy	a7b28369ea	Expand the list of types that can be cast to int Bug: T237624 Change-Id: I2220cb8a8ec998a433a4469d7e0591ec0b4f2b12	2019-11-07 15:14:17 +01:00
Daimona Eaytoy	7b06be0204	Allow dangling commas in variargs This is because there are many filters using this feature. Moreover, it could make it a little easier to add new arguments, just like dangling commas in PHP arrays do. Also re-align the CachingParser code of doLevelFunctions to the one in the old Parser. Bug: T153251 Change-Id: Ie4325159f47310788da57415a5e36e62aa4efad0	2019-09-07 11:19:14 +02:00
Daimona Eaytoy	d51ca862c6	Move parser tests to /unit IMHO these can be considered unit tests; they were already fast, but now they're executed in an instant. This requires several changes: 1 - delay retrieving messages in AFPUserVisibleException, to avoid having to deal with i18n whenever we want to test exceptions; 2 - Use some DI for Parser and Tokenizer. Equivset-dependend tests are also moved to a new class, thus helping to fix the AF part of T189560. Change-Id: If4585bf9bb696857005cf40a0d6985c36ac7e7a8	2019-08-28 16:36:37 +00:00
Daimona Eaytoy	d515af0ae6	Add a new class for the CachingParser's AST This allows a little bit more of abstraction: we can store other data in the tree, without having to store it in a specific node (e.g. the variables map, which is still unused). It also adds a few typehints, and specializes the return value of eval'ing the AST: previously, it was the one of evalNode, which wasn't guaranteed to be an AFPData. Now we have this guarantee. Last but not least, we can now measure runtime metrics for evalTree, which doesn't recurse. Bonus: fix a check in the old parser, which used the wrong variable when reporting outofbounds errors. Change-Id: Iff806793b1d968e9bb6220f1459f3d0ac587c7da	2019-08-25 17:29:16 +00:00
jenkins-bot	47838715fa	Merge "Allow if without else"	2019-08-20 20:12:19 +00:00
jenkins-bot	5e605aaa62	Merge "Even better handling of DUNDEFINED"	2019-08-20 20:00:52 +00:00
Daimona Eaytoy	af7744781f	Allow if without else Bug: T230727 Depends-On: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45 Change-Id: I3b85087677607573f4fa68681735dc35348dcd87	2019-08-20 19:36:14 +00:00
Daimona Eaytoy	963221ad6d	Even better handling of DUNDEFINED Ensure that the variable isn't set before marking it as DUNDEFINED: that's only for when we cannot use a default, but if the variable is set we already have one. Most notably, this fixes conditionals handling: right now, if you have a conditional with an assignment in both branches, the variable will be undefined. That's obviously wrong, so it's fixed in this patch. Plus: catch only AFPExceptions in a test to avoid unintentionally catching the assert exception; simplify some assignments using wfSetVar. Depends-On: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24 Change-Id: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45	2019-08-20 19:17:30 +00:00
Daimona Eaytoy	fa76405ea7	Fix a bug in the return value of the CachingParser This has always been wrong, and remained unnoticed. Also added a typehint for added safety. Change-Id: I8a3c31e7385283d95b4712d457784016239a0b3b	2019-08-20 20:54:19 +02:00
Daimona Eaytoy	430ba818d0	Add test for multiple conditions inside conditionals The regression itself was fixed in I980aec3481a52ecc35f1811a366014a5581a7cdb, so this patch only adds a test for it. Also remove a comment about CachingParser failures: we don't want to encourage people to remove it from tests anymore. Bug: T152281 Change-Id: I3ad49050ea49bf45d3226878e091da3c8dbefdb1	2019-08-12 18:18:05 +02:00
Daimona Eaytoy	517919fca8	Allow accessing offsets of built-in variables I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7 almost fixed this bug, but we also have to make it possible to access builtin variables as arrays. This will only make sense for a few variables (e.g. added_lines and removed_lines), but I don't think we should validate it when checking syntax. Bug: T198531 Change-Id: I417e1b8d4802bbfccd091ce5c7617659cfd1e4ea	2019-08-04 17:14:44 +00:00
Daimona Eaytoy	9049be3609	Specialize empty AFPData types As described in T156096#5389655. Change-Id: Ifbf95a6b72a280cd77db6affbd8d642499bbfedc	2019-08-04 15:26:57 +00:00
Daimona Eaytoy	09d0254172	Better handling of DNONE This patch includes: * Making it possible to access offsets of a DNONE (returning a DNONE) * Initializing user-defined variables as DNONE inside short-circuited branches * Make DNONE propagate with other operators * Make DNONE count as false for logic operators * Remove a now-outaded bit in doLevelAtom. In case of shortcircuit, $result is now DNONE instead of DNULL, and thus it's possible to access offsets of it. Performance++! * Don't allow modifying or adding an element of a DNONE as if it were an array (to avoid inconsistencies) This re-applies Id85c673337fa90a3782fd22eb9690cd996967111 with several fixes. NOTE: Haven't tested locally, although I'm pretty confident thanks to the amount of tests added. Bug: T214674 Bug: T228677 Change-Id: I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7	2019-08-02 21:05:08 +00:00
Daimona Eaytoy	9937f8b050	Remove extra file from parser tests Added in I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05, but .r files aren't used anymore since I6c06e596587750c4ebaabafbd277bc75eeb436a5, and I forgot to remove the file upon rebasing. Change-Id: Id688d215b1136bd0a04b8c0d8d8d16de5da1295e	2019-07-15 12:22:09 +02:00
Daimona Eaytoy	18d7d2ed62	Start using AFPData::DNONE This should allow more flexibility when checking syntax, and a saner behaviour overall. Aside from not throwing exception in certain cases, the results should be almost equal to the ones you would get without this patch. However, there are still a few things to improve (which for convenience I wrote inside the parser test) and many to test. Bug: T204654 Depends-On: I69bfec45c76509fb1112641393f78e8d8834adcd Change-Id: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05	2019-07-14 08:48:47 +00:00
Daimona Eaytoy	39fc7c12af	Restore unit tests for CachingParser and fix it Added cachingParser back to all the parser tests, fixed a couple of differences with the normal parser, and added a couple of tests so that any cachingParser-related file has 100% coverage. Also move the remaining get_matches tests inside parserTests, and specify the parser used in case of failure. This also adds a new base class for parser-related tests with a couple of util methods. Bug: T201193 Change-Id: I980aec3481a52ecc35f1811a366014a5581a7cdb	2019-05-25 10:55:24 +02:00
Daimona Eaytoy	909eec6716	Tweak coverage part 2 Follow-up of Ic30883f7d261d974a2be46308d023e2714119e95, with two files that I forgot to git-add and a repositioning of comments to avoid the last bracket to be reported as uncovered. Bug: T201193 Change-Id: I6bf7e5892a0f49f6a138792f0aedf230a70c18a8	2019-04-13 19:26:01 +02:00
Daimona Eaytoy	4bcb64b01a	Increase code coverage a bit This patch mostly adds coverageIgnore comments for intendedly unreachable code etc. Some of them could be made testable by adding a new filter function (e.g. array cast), but this patch is meant to be comment-only (aside from the parser test). Ignoring coverage for these lines makes some methods reach 100% coverage, which in turn makes it easier to look at the coverage chart and identify at a glance which parts of the code really need to be covered. Bug: T201193 Change-Id: Ic30883f7d261d974a2be46308d023e2714119e95	2019-04-13 18:30:14 +02:00
Daimona Eaytoy	0ff581e246	Clean AbuseFilterParserTests Mostly delete result files and assume the result is always true. The few exceptions were either moved to standalone test, or inverted. Change-Id: I6c06e596587750c4ebaabafbd277bc75eeb436a5	2019-03-23 12:59:03 +01:00
Daimona Eaytoy	6f4bfc9597	Fix shortcircuit for consecutive operations Using break could halt parsing between operations, instead use continue to parse all operations. Bug: T214642 Change-Id: If67ddaffef280c2448c55ae536013758617bba68	2019-02-08 17:55:59 +00:00
Daimona Eaytoy	d9d5af3890	Unbreak short circuit for arrays This problem have been making filters potentially fail silently since 2009. Also add tests for arrays to make sure that no problems arise when short circuit is used. Bug: T204841 Change-Id: Ie4e2e06498c1202ba73afcc5d164a72427abbca5	2018-10-03 16:44:10 +02:00
Daimona Eaytoy	775c736512	Improve coverage for AbuseFilterTokenizer This will make tokenizer almost fully covered. The only uncovered parts are the one with cache and an else condition which I think won't ever be executed, and thus added a comment for that. Also, remove an obsolete xxx comment from ComputedVariable (fixed in I8e420f0259ef6c9e579f7a00beb58f28af9da37d) Bug: T201193 Change-Id: I6e9a73aa9e437f096f6a1e20d53a7cb50e5ed85d	2018-08-25 10:25:16 +02:00
Daimona Eaytoy	447d434e2a	Improve code coverage Add some parser tests, improve existing ones, and add missing @covers. Bug: T201193 Change-Id: I9c0d2d83560baa4a3e1d4465b7919a48c4e26ac1	2018-08-22 19:07:14 +02:00
Daimona Eaytoy	4f3b020f5d	Improve code coverage for AbuseFilterParser Add some tests and improve others to raise coverage percentage. This should lead to almost 100% for the AbuseFilterParser class. Aside from this, a couple of changes: * Remove an unused function * Let equals_to_any return a genuine result with empty strings * Remove an if which will never be true in skipOverBraces, since the function is called after checking the same conditions. Bug: T201193 Change-Id: I7020b2ed996236c38c5784d161ad98ec44163406	2018-08-20 14:38:40 +02:00
Daimona Eaytoy	c75bc35f7d	Rename lists to arrays Arrays were introduced with the name "lists". While it may look user-friendlier and so on, it actually uses a wrong name: lists are different from arrays. I ran a grep and I should've replaced every occurrence, plus everything seems to work, however a double check wouldn't be bad. Change-Id: I6a858f02f5dd9250ba7e1abf9c6422fd98758c9e	2018-06-26 14:42:23 +02:00
Huji Lee	2792fce41e	Introduce sanitize() function Normalizes HTML entities into unicode characters Bug: T169122 Change-Id: Ic916a6f8976e486d62d65156fa2dab56a55cf22a	2018-06-03 16:37:23 -04:00
Daimona Eaytoy	9eea111d9f	Sync parser tests with examples on mediawiki I added on MW an example of comparison with empty array, which we should keep inside the dedicated test as well. Change-Id: Ifa4bca85c8978ef24ed5bb26787730bb4521261f	2018-04-26 18:47:51 +02:00
jenkins-bot	6aa6b8fc13	Merge "Add the remaining equality checks"	2018-04-26 13:25:56 +00:00
Daimona Eaytoy	71f375f19a	Add equals_to_any function Introduce a new function which can be used to group multiple comparisons in a single condition. In particular, equals_to_any(S, A, B) is the equivalent of S === A || S === B. This is especially useful in checking for multiple namespaces, as proposed in the Community health initiative. Change-Id: I9dcfe303eb5e51e1882fe4a65fa876aa93db7686	2018-04-25 23:12:19 +00:00
Daimona Eaytoy	24c8d7d54e	Add the remaining equality checks I left as ToDo the checks between an array and something else. With this patch, it'll work like PHP: the result will be true iff the comparison is loose, the array is empty and the other operand is either false or null. Change-Id: Idc5cadb697ed4fc7f4856967274169f77495ed9f	2018-04-25 10:16:50 +02:00
Daimona Eaytoy	8cfd527f31	Reinforce parser tests Some of them are actually too simple, and may be unuseful in tricky situations. This patch adds a lot of test cases to provide an (almost) bombproof safety with future patches. Depends-On: I0bb1ed0109af66997e238b532d342d82d4c4ae19 Change-Id: I274ef306775c36be20acb662353f6537ff3f1a33	2018-04-09 16:25:54 +02:00
Daimona Eaytoy	2dda2e381c	Convert division/multiplication/modulo results after calculation So that type and value will be identical to PHP's ones. Bug: T191688 Depends-On: I1140900cdda63eed292d9f20aefd721ef9247fcd Change-Id: I398c9a972b7e9fcb27d055d23939be2b8bb68244	2018-04-09 16:16:04 +02:00
Daimona Eaytoy	284ab234fd	Allow comparing two lists This feature was never implemented. I'm not sure whether we need a way to compare array and other types of variables (left as ToDo), since e.g. in PHP it's always false. Bug: T179238 Change-Id: I5d2c33fd117e69cbc84c0b04b6cb82edbdcadf16	2018-04-06 11:44:28 +00:00
Daimona Eaytoy	a0de056299	Add contains_all and ccnorm_contains_all functions Added the contains_all function, with basically the same role as contains_any but using logic AND instead of OR. Also added ccnorm_contains_all, that is the same of ccnorm_contains_any but with AND mode. Finally, fixed three wrong task IDs. Co-authored with Valerio Bozzolan. Bug: T21176 Change-Id: Ib0a8b783db6ce0d5db64771c8e0c70f0f8d13d36	2018-02-09 17:33:24 +01:00
Dayllan Maza	2bc8873c30	Add ccnorm_contains_any function Normalize and search a string for multiple substrings Bug: T65242 Change-Id: I4034c0054a6849babbf2d96ea13dc97d3660d5b4	2017-10-06 11:32:45 -04:00
Victor Vasiliev	46faa02c49	Fix the associativity of boolean logic operators Change-Id: Icaf0fde0d74064532af4b110faef4014f8303f80	2016-11-06 20:30:07 -05:00
Victor Vasiliev	aa399da279	Implement a tree-caching abuse filter parser This filter is fully functional. The old filter is still enabled by default for a transitional period in case the new one suddenly has issues. Change-Id: I4aea5f00c62420108030e60e79d5bf34e913e95d	2016-09-24 02:53:26 +00:00
Victor Vasiliev	5da98b67bf	Add test coverage for more bizzare features of the filter parser I am pretty sure all of the behavior documented in these tests is a bad idea. It is possible that we can fix it since some of those features are probably unused, but for now those tests will serve as a documentation of the current behavior. Change-Id: Ia2a2f57a538d7aef2ac73fb2e47fe82dd5d5e09a	2016-08-21 18:45:22 -04:00
Kaldari	acd28cb00f	Update tests for AntiSpoof fixes Bug: T29987 Depends-On: Iccb3e50073bbbc2b979cb62dd0e129afd1c2e55f Change-Id: I8bef839b9b9ca5fced94ce6428e769133ede868f	2016-08-13 20:37:43 +00:00
Bartosz Dziewoński	5fc30112c7	Optimize 'count()' function substr_count() is just as fast as looped strpos() when there are no matches, and gets faster as the number of matches increases. Note that this introduces a small change in behavior when the needle is composed of repeated substrings, e.g. 'asdasdasd' or 'aa', and haystack is such that the needle can be matched in overlapping positions, e.g. 'asdasdasdasd' or 'aaaaa'. The old implementation counted overlapping matches, the new one doesn't. I don't think this behavior was intentional and I don't think this change will cause any real problems. Change-Id: Icc905ca34bf08d63e969787a5e3c119d498bf878	2016-04-17 08:32:27 +02:00
Bartosz Dziewoński	7d83540527	Add some tests for behavior of 'count()' function Change-Id: I29a6c91d0780dc9a1eaee6d29d3b1f9c9c708df7	2016-04-17 08:18:29 +02:00
Bartosz Dziewoński	e79b45b71f	Improve ignoring short-circuited operations Previously, 'false & a == b' would actually execute the comparison and count it against the condition limit, while 'false & (a == b)' wouldn't. They behave the same now. mShortCircuit was only checked for the most potentially expensive operations (computing functions and getting variables), all the other operations on bogus values generated by this would be executed and the results ignored later. This probably doesn't noticeably improve performance, but it corrects how the condition limit is counted. Bug: T43693 Change-Id: Id1d5f577b14b6ae6d987ded12689788eb7922474	2016-04-09 16:25:52 +02:00
Ori Livneh	0e36b728e3	Fix double escaping in AFPData::keywordLike() If we don't map '\-' and '\+' to themselves, the leading slash gets escaped, and the resultant pattern only matches a literal slash. Bug: 67670 Change-Id: Ifa1e3edd6f41985a3bb97bfb1497985f8fa64af5	2014-07-11 14:56:42 -07:00
Marius Hoch	35747761fb	Allow running the AbuseFilter parser tests via phpunit I've also added myself to the credits file as I'm the only maintainer of this extension for a while now. Change-Id: Id998172ea2abd70b8243de9db1a96cc2cfa47a64	2013-07-08 19:22:43 +02:00

46 commits
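
Note on commit 5fc30112c7 (Optimize 'count()' function): the message describes a change from counting overlapping matches to counting non-overlapping ones. The sketch below illustrates that difference in Python rather than in the extension's PHP; it is not the extension's code. The function names are hypothetical, and Python's str.find and str.count are used as stand-ins for the looped strpos() and substr_count() calls mentioned in the message.

```python
def count_overlapping(haystack: str, needle: str) -> int:
    """Count matches by restarting the search one character after each hit.

    Stand-in for the old looped-strpos() approach described in the commit.
    """
    if not needle:
        return 0  # assumption: treat an empty needle as "no matches"
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        # Advance by one character only, so overlapping matches are found.
        pos = haystack.find(needle, pos + 1)
    return count


def count_non_overlapping(haystack: str, needle: str) -> int:
    """Count matches that do not share characters.

    Stand-in for the new substr_count() approach described in the commit.
    """
    if not needle:
        return 0  # assumption: treat an empty needle as "no matches"
    return haystack.count(needle)


# The examples named in the commit message:
print(count_overlapping("aaaaa", "aa"), count_non_overlapping("aaaaa", "aa"))
print(count_overlapping("asdasdasdasd", "asdasdasd"),
      count_non_overlapping("asdasdasdasd", "asdasdasd"))
```

The two functions differ only when the needle can match at overlapping positions, which is the case the commit message singles out; for all other inputs they return the same count.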