Commit graph

51 commits

Author SHA1 Message Date
Daimona Eaytoy 337771f83b Replace array_map with foreach
This is a micro-optimization, but IMHO it's necessary. The AF parser
code is executed for every active filter, for every
edit/move/deletion/accountcreation. In PHP, foreach is usually faster
than array_map. Especially in the case of variadic functions potentially
taking hundreds of strings, foreach will consume less time.

Bug: T234427
Change-Id: I1beedf419a6637a9a3dd668635645df950ceda21
2019-10-02 11:29:19 +00:00
Daimona Eaytoy 4c8be4d374 Add profiling points throughout the code for the CachingParser switch
Bug: T156095
Change-Id: Ib934be34a953166fe1b94cfe8ed216afe3b906ca
2019-09-18 10:02:55 +00:00
Daimona Eaytoy 6e9a9a3bc2 CachingParser: ensure to catch errors inside short-circuited blocks
This is similar to the old parser: when discarding a node, actually
evaluate it if short-circuit is not allowed.
Add a whole lot of tests for all possible exceptions.
Move the logic to extract a message from an AFPUserVisibleException away
from the parser, to keep unit tests working.

Bug: T232498
Change-Id: I31ee4e255c6a87dd693b9bcd582539fdf57acd45
2019-09-13 21:13:15 +00:00
Daimona Eaytoy 004ccfdb5c Annotate the AST with var names before caching the AST
This implements T230982#5475400, and it should speed up the CachingParser by roughly 40%.

Bug: T230982
Change-Id: I803cc58637d50eb90e57decf243f5ca78075d63d
2019-09-13 19:43:50 +00:00
jenkins-bot 5be19f6f65 Merge "Add a 'strict' option to VariableHolder::getVar" 2019-09-05 19:23:23 +00:00
Daimona Eaytoy 489da0d229 Add a 'strict' option to VariableHolder::getVar
This will help mitigating problems like T230256 by enforcing that the
requested variables must exist. For now, it will only log bad usages,
thus providing a way to identify affected filters and fix them.

Bug: T230256
Change-Id: I7a61916576e444a56f0e07da7b6e5033346226bd
2019-09-04 18:19:23 +00:00
Daimona Eaytoy 13b1e880f2 Hotfix other DUNDEFINED casts to bool
These were spotted on testwiki with wmf.21.

Change-Id: Ic4d67a2b83aedfeb574fa1363a9fc618b2862f95
2019-09-04 18:06:22 +00:00
Daimona Eaytoy d51ca862c6 Move parser tests to /unit
IMHO these can be considered unit tests; they were already fast, but now
they're executed in an instant.
This requires several changes: 1 - delay retrieving messages in
AFPUserVisibleException, to avoid having to deal with i18n whenever we
want to test exceptions; 2 - Use some DI for Parser and Tokenizer.
Equivset-dependend tests are also moved to a new class, thus helping to
fix the AF part of T189560.

Change-Id: If4585bf9bb696857005cf40a0d6985c36ac7e7a8
2019-08-28 16:36:37 +00:00
Daimona Eaytoy 71730f7d44 Warn if a function has been given too many parameters
While this is not as important as throwing for too few parameters, IMHO
it's still important to fail in this case. Mostly because if a function
receives too many parameters, chances are that who wrote the filter
didn't do that intendedly, and thus there may be a hidden bug.
Bonus: fix a few docblocks.

Bug: T230803
Change-Id: Iac2931f17b50ace8c8f4c2faa44b3f54ca134c54
2019-08-26 20:29:49 +02:00
Daimona Eaytoy d515af0ae6 Add a new class for the CachingParser's AST
This allows a little bit more of abstraction: we can store other data in the
tree, without having to store it in a specific node (e.g. the variables map,
which is still unused). It also adds a few typehints, and specializes
the return value of eval'ing the AST: previously, it was the one of
evalNode, which wasn't guaranteed to be an AFPData. Now we have this
guarantee. Last but not least, we can now measure runtime metrics for
evalTree, which doesn't recurse.
Bonus: fix a check in the old parser, which used the wrong variable when
reporting outofbounds errors.

Change-Id: Iff806793b1d968e9bb6220f1459f3d0ac587c7da
2019-08-25 17:29:16 +00:00
jenkins-bot 47838715fa Merge "Allow if without else" 2019-08-20 20:12:19 +00:00
jenkins-bot 5e605aaa62 Merge "Even better handling of DUNDEFINED" 2019-08-20 20:00:52 +00:00
jenkins-bot bf8ccccade Merge "Fix a bug in the return value of the CachingParser" 2019-08-20 19:58:38 +00:00
Daimona Eaytoy af7744781f Allow if without else
Bug: T230727
Depends-On: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45
Change-Id: I3b85087677607573f4fa68681735dc35348dcd87
2019-08-20 19:36:14 +00:00
Daimona Eaytoy 963221ad6d Even better handling of DUNDEFINED
Ensure that the variable isn't set before marking it as DUNDEFINED:
that's only for when we cannot use a default, but if the variable is set
we already have one. Most notably, this fixes conditionals handling: right
now, if you have a conditional with an assignment in both
branches, the variable will be undefined. That's obviously wrong, so
it's fixed in this patch.
Plus: catch only AFPExceptions in a test to avoid unintentionally
catching the assert exception; simplify some assignments using wfSetVar.

Depends-On: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24
Change-Id: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45
2019-08-20 19:17:30 +00:00
Daimona Eaytoy fa76405ea7 Fix a bug in the return value of the CachingParser
This has always been wrong, and remained unnoticed. Also added a
typehint for added safety.

Change-Id: I8a3c31e7385283d95b4712d457784016239a0b3b
2019-08-20 20:54:19 +02:00
Daimona Eaytoy aa867bd370 Better handling of function params in CachingParser
This patch includes various fixes to how func arguments are handled in
CachingParser:
- Add a comment about a future improvement of checkSyntax, which we
  could limit to try building the AST.
- Having enough args for each function is now also checked when
  building the AST. This allows implementing the previous point without
  stopping to report notenoughargs at syntaxcheck-time (otherwise it'd be
  a runtime error). And it also ensure that we check for the params count
  inside skipped branches, e.g. inside if/else: these were already only
  discovered at runtime in CachingParser. The old parser is not affected
  by this change, because when checking syntax it will always execute
  all branches, and at runtime it will skip braces altogether.
- Fix arg count for CachingParser, which previously added a bogus param
  in case of a function called without parameters. This was fixed for
  the other parser in I484fe2994292970276150d2e417801453339e540, and I
  just ported the updated fix. Also note that the CachingParser was
  already failing for e.g. `count()`, but instead of complaining about
  missing arguments, it failed hard when trying to pass NULL to
  evalNode.
- Fixed some tests not to use setExpectedException, which caused the
  previous point to remain unnoticed: calling that method prevents the
  loop from continuing, and thus only the AbuseFilterParser part was
  being executed. The new implementation checks the exception ID and is
  thus more future-proof if the i18n message changes.
- Fixed some function names in error reporting for the old parser.
- The arg count is now checked outside of the function handlers, thus
  it's no more necessary to call checkEnoughArguments at the beginning
  of each handler. This also produces clearer error messages in case of
  aliases (e.g. set/set_var).
- Check the args count even if some of the args are DUNDEFINED. This is
  much easier now that the check is outside of the handler. This will
  make syntax check fail for e.g. `contains_any(added_lines)`.

Bug: T156095
Change-Id: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24
2019-08-20 15:32:02 +00:00
jenkins-bot 7addec7b4a Merge "Make some other AFPData methods non-static" 2019-08-20 14:16:16 +00:00
jenkins-bot 1f45336157 Merge "Move keywords handlers to the Parser" 2019-08-20 14:16:10 +00:00
jenkins-bot f18d0814e2 Merge "Make several AFPData functions non-static" 2019-08-20 14:06:02 +00:00
jenkins-bot f1ab591d27 Merge "Avoid implicit casts from DUNDEFINED to something else" 2019-08-20 13:04:48 +00:00
Daimona Eaytoy d715f6d2c0 Increase cache hits for CachingParser
If $parser->parse returns a falsey value (=null), that's because the
filter doesn't have any statement. But that's not a valid reason not to
cache the filter. Hence, return whatever parse() is returning inside the
callback, so that the result is always cached.

Change-Id: Ib6b0e72d882dc484456a3be6bbc74da36ef48bf7
2019-08-13 18:03:13 +02:00
Daimona Eaytoy 4b0911ee01 Make some other AFPData methods non-static
Change-Id: I22ea337a36f911c57d3dadb9a3c45fc2c8b7c628
2019-08-12 14:40:51 +02:00
Daimona Eaytoy 3f171dc0a5 Move keywords handlers to the Parser
Just like we do for functions, it doesn't really make sense to have
keywords separately, in AFPData.

Change-Id: I208a9b1ce2bd12038e9fbcc515c48d604ec80eb8
2019-08-12 14:29:56 +02:00
Daimona Eaytoy 2fdf091eb9 Make several AFPData functions non-static
The keywords-related ones will be handled in a subsequent patch.

Change-Id: Ifcfad438023ef136dc6f2cd5529e867df9b23789
2019-08-12 14:12:16 +02:00
Daimona Eaytoy 1fe3647268 Avoid implicit casts from DUNDEFINED to something else
This patch keeps the current behaviour for everything (since DUNDEFINED
was always casted to boolean false), but handles the cast at a higher
level instead of relying on what AFPData::castTypes will do. This way
it's easier to spot places where we may get DUNDEFINED, and decide how
to handle them one by one.

Change-Id: I1070e15ea03c7dd4a4231b87afbc42240a558581
2019-08-12 11:18:15 +02:00
Daimona Eaytoy 69ad23da98 Ban variable variables
As explained on phab, it's not worth the effort of keeping this feature.

Bug: T229947
Change-Id: Ic6067cab8e1ede98545e704888c99e2ed9a004e4
2019-08-11 01:47:35 +00:00
Daimona Eaytoy 2ed6272bb2 Partly handle set and set_var in shortcircuit
This is more complicated than the := operator, because the var name
could be a complicated expression, and we have to handle a function
call. This patch only covers the case where the variable name is a
literal, which is enough for WMF production.

Bug: T214674
Change-Id: I6c0f8e95663919a0235b5ccf0c88ad0a539315a7
2019-08-06 16:14:34 +02:00
Daimona Eaytoy 517919fca8 Allow accessing offsets of built-in variables
I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7 almost fixed this bug, but we
also have to make it possible to access builtin variables as arrays.
This will only make sense for a few variables (e.g. added_lines and
removed_lines), but I don't think we should validate it when checking
syntax.

Bug: T198531
Change-Id: I417e1b8d4802bbfccd091ce5c7617659cfd1e4ea
2019-08-04 17:14:44 +00:00
Daimona Eaytoy 9049be3609 Specialize empty AFPData types
As described in T156096#5389655.

Change-Id: Ifbf95a6b72a280cd77db6affbd8d642499bbfedc
2019-08-04 15:26:57 +00:00
Daimona Eaytoy a85e1ccc59 Make AbuseFilterParser::$funcCache non-static
Change-Id: I312efe3ce4d1f06e697aa4564aeec1bacbaf97d3
2019-08-03 09:19:49 +00:00
Daimona Eaytoy 09d0254172 Better handling of DNONE
This patch includes:
 * Making it possible to access offsets of a DNONE (returning a DNONE)
 * Initializing user-defined variables as DNONE inside short-circuited branches
 * Make DNONE propagate with other operators
 * Make DNONE count as false for logic operators
 * Remove a now-outaded bit in doLevelAtom. In case of shortcircuit,
   $result is now DNONE instead of DNULL, and thus it's possible to
   access offsets of it. Performance++!
 * Don't allow modifying or adding an element of a DNONE as if it were an
    array (to avoid inconsistencies)

This re-applies Id85c673337fa90a3782fd22eb9690cd996967111 with several fixes.

NOTE: Haven't tested locally, although I'm pretty confident thanks to
the amount of tests added.

Bug: T214674
Bug: T228677
Change-Id: I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7
2019-08-02 21:05:08 +00:00
Daimona Eaytoy 13cdb86dd2 Revert "Initialize user-defined variables during shortcircuit"
Reason for revert: T214674#5374806

This reverts commit 56e6117afd.

Bug: T214674
Change-Id: Iccce248d2693cd9877a740b74e72a577e730435e
2019-07-29 23:06:23 +00:00
Daimona Eaytoy 56e6117afd Initialize user-defined variables during shortcircuit
Bug: T214674
Depends-On: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05
Change-Id: Id85c673337fa90a3782fd22eb9690cd996967111
2019-07-23 12:20:53 +00:00
Daimona Eaytoy 18d7d2ed62 Start using AFPData::DNONE
This should allow more flexibility when checking syntax, and a saner
behaviour overall.
Aside from not throwing exception in certain cases, the results should
be almost equal to the ones you would get without this patch. However,
there are still a few things to improve (which for convenience I wrote
inside the parser test) and many to test.

Bug: T204654
Depends-On: I69bfec45c76509fb1112641393f78e8d8834adcd
Change-Id: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05
2019-07-14 08:48:47 +00:00
jenkins-bot c3dcd95733 Merge "Start making APFData members private" 2019-07-09 09:23:17 +00:00
jenkins-bot 35ab35978b Merge "Add a new data type for non-initialized stuff" 2019-07-09 08:58:48 +00:00
Daimona Eaytoy 3aaeb20063 Start making APFData members private
$data and $type are meant to be read-only and should have getter
functions, but as usual they're just public. Add getter methods, a
comment with a @private annotation and remove usages in our codebase.

Change-Id: I5e51efc9f982a4e340b48d20cb1b38a75bb10021
2019-07-09 10:57:00 +02:00
Daimona Eaytoy 304b58d46a Make AbuseFilterVariableHolder::mVars private
This property is meant to be private, since it has all kinds of
getters/setters, aside from one which is introduced in this patch.

Change-Id: I217b1e22cabd3c0468c84b1d6a69a6ed3c6fa8e6
2019-07-08 16:25:10 +02:00
Daimona Eaytoy bc79962803 Add a new data type for non-initialized stuff
Split from I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05.
This adds a new data type to use for empty AFPDatas. Using NULL for that
makes it impossible to distinguish cases where we really got a null
value, and cases where there was nothing to parse.
For now, DNONE is the same as DNULL, but I've explicited DNULL where
necessary. A subsequent patch will make proper use of DNONE.

Bug: T156096
Change-Id: I69bfec45c76509fb1112641393f78e8d8834adcd
2019-07-08 15:35:02 +02:00
Daimona Eaytoy 382751a707 Move conditions-related stuff inside AbuseFilterParser
Instead of relying on static methods and members in the AbuseFilter
class, move everything related to conditions inside the Parser, as the
amount of used conditions is something pertaining a single
AbuseFilter(Caching)Parser instance.
This change requires changing some signatures and adding parameters,
but will make introducing the new AbuseFilterRunner class easier (and
that will clean signatures, too).

Depends-On: I5b29ff556eca45fe59d15e2e3df4d06f1f6b3934
Change-Id: I7c1ea17adf7f42cf9260d416906bfbf3b8a20688
2019-06-19 15:14:17 +00:00
Daimona Eaytoy 39fc7c12af Restore unit tests for CachingParser and fix it
Added cachingParser back to *all* the parser tests, fixed a couple of
differences with the normal parser, and added a couple of tests so that
any cachingParser-related file has 100% coverage. Also move the remaining
get_matches tests inside parserTests, and specify the parser used in case of failure.
This also adds a new base class for parser-related tests with a couple
of util methods.

Bug: T201193
Change-Id: I980aec3481a52ecc35f1811a366014a5581a7cdb
2019-05-25 10:55:24 +02:00
Daimona Eaytoy f2c1beec44 Replace double-equals with triple-equals
Since double-equals are evil. I left some of them in place where I
wasn't sure, but I may be changed some which were intended to be
doubles. It could be a good idea to delay merging this patch until we'll
have more code coverage.

Change-Id: I1721a3ba532d481e3ecf35f51099c1438b6b73b2
2019-03-22 16:12:13 +01:00
Daimona Eaytoy 53ab2b5067 Fix documentation errors reported by Phan
Change-Id: I5788147ba1998235ded9eedbf64ebad37fce236f
2019-03-16 09:27:05 +00:00
Daimona Eaytoy cea1b0aec3 Compact some other comments
Remove unnecessary ones, make clear what the cryptic ones mean, and
inline them when possible.

Change-Id: I384859871a66ced8cb0d81260c06c5a5b278866f
2018-07-17 17:17:44 +02:00
Daimona Eaytoy c75bc35f7d Rename lists to arrays
Arrays were introduced with the name "lists". While it **may** look
user-friendlier and so on, it actually uses a wrong name: lists are
different from arrays. I ran a grep and I should've replaced
every occurrence, plus everything seems to work, however a double check
wouldn't be bad.

Change-Id: I6a858f02f5dd9250ba7e1abf9c6422fd98758c9e
2018-06-26 14:42:23 +02:00
Daimona Eaytoy caa4b1c763 Add phan configuration
This is taken from I6a57a28f22600aafb2e529587ecce6083e9f7da4 and makes
all the needed changes to make phan pass. Seccheck will instead fail,
but since it's not clear how to fix it (and it is non-voting), for the
moment we may merge this and enable phan on IC.

Bug: T192325
Change-Id: I77648b6f8e146114fd43bb0f4dfccdb36b7ac1ac
2018-04-30 08:32:58 +00:00
Daimona Eaytoy 3c3a521fec Fix coding conventions exclusion rules
This should fix every error with excluded rules, leaving only the one
for $wgTitle. A double check would be nice in order to avoid regressions
due to stupid mistakes.

Bug: T178007
Change-Id: I22c179f3a01d652640304b59e43fcb5b5a9abac3
2018-04-20 08:40:18 +00:00
Umherirrender a2ebd0c70a Improve some parameter docs
Change-Id: Ibac10a20243a4eedd826485d56eddd5234da6fec
2017-10-07 00:54:58 +02:00
Umherirrender a063e33ee8 Use short array syntax
Done by phpcbf over composer fix

Change-Id: I53fd1fc8d056b9b60194d2d630852cfca37aadea
2017-06-15 17:02:57 +02:00