This will decouple a bit the huge and chaotic tangle of AF classes. Some
boilerplate code for AbuseFilter services is also added with this patch.
Note that this requires injecting a KeywordsManager in
AbuseFilterVariableHolder, or unit tests would fail. This is still
incomplete, and the Manager is only injected in tests, because
VariableHolder still has to be refactored.
The test for the UpdateVarDumps script had to be updated, because
serializing VHs in there was a bad choice. As pointed out in a comment,
the test is likely going to break again once we remove the BC code, but
I hope that we'll be able to remove the test at that point.
Change-Id: I12a656a310adb8c5f75cab63f6db9e121e109717
Even if the array is DUNDEFINED, we need to check the offset to ensure
that it's valid.
Bug: T237351
Change-Id: Ibfa360c4ae1d80abe14d9fdf66991b76cb5954df
For the new parser, xhgui shows that AbuseFilterParser::getVarValue is
taking up a lot of time; in turn, most of the time spent inside
getVarValue is used to log the use of deprecated variables. Hence, given
that:
- We should keep the new parser performant
- There are tons of deprecated variables out there and they likely
won't be replaced
- Having gazillions of debugLog entries doesn't help
log them only in the cached phase.
Bug: T234427
Change-Id: I2bfc692c829c3cbe889e5076f5205e2c99097087
This is identical to I8a3c31e7385283d95b4712d457784016239a0b3b, except
for the array append case.
Bug: T236870
Change-Id: Iac033ba467232f6ff110d575920e968759ce0e15
This allows sharing the code between cachingparser and the old parser
(for DRY-ness), and even when the old parser will be killed, having the
logic outside of the generic parse method seems saner.
This copies what I446a307e5395ea8cc8ec5ca5d5390b074bea2f24 did for
functions.
Change-Id: Ie6290243a6c78661510a9b4cb713d6e7b2778248
This emits its own error because:
1- It's clearer to understand
2- It's easier to find where we're dealing with negative offsets, if
we'll ever want to allow that.
Note that trying to use a negative index already results in a hard PHP
error being thrown.
Bug: T237219
Change-Id: Ib11eaaca5e21f740269141c75e62bac48093e8d0
This is a micro-optimization, but IMHO it's necessary. The AF parser
code is executed for every active filter, for every
edit/move/deletion/accountcreation. In PHP, foreach is usually faster
than array_map. Especially in the case of variadic functions potentially
taking hundreds of strings, foreach will consume less time.
Bug: T234427
Change-Id: I1beedf419a6637a9a3dd668635645df950ceda21
This is similar to the old parser: when discarding a node, actually
evaluate it if short-circuit is not allowed.
Add a whole lot of tests for all possible exceptions.
Move the logic to extract a message from an AFPUserVisibleException away
from the parser, to keep unit tests working.
Bug: T232498
Change-Id: I31ee4e255c6a87dd693b9bcd582539fdf57acd45
This implements T230982#5475400, and it should speed up the CachingParser by roughly 40%.
Bug: T230982
Change-Id: I803cc58637d50eb90e57decf243f5ca78075d63d
This will help mitigating problems like T230256 by enforcing that the
requested variables must exist. For now, it will only log bad usages,
thus providing a way to identify affected filters and fix them.
Bug: T230256
Change-Id: I7a61916576e444a56f0e07da7b6e5033346226bd
IMHO these can be considered unit tests; they were already fast, but now
they're executed in an instant.
This requires several changes: 1 - delay retrieving messages in
AFPUserVisibleException, to avoid having to deal with i18n whenever we
want to test exceptions; 2 - Use some DI for Parser and Tokenizer.
Equivset-dependend tests are also moved to a new class, thus helping to
fix the AF part of T189560.
Change-Id: If4585bf9bb696857005cf40a0d6985c36ac7e7a8
While this is not as important as throwing for too few parameters, IMHO
it's still important to fail in this case. Mostly because if a function
receives too many parameters, chances are that who wrote the filter
didn't do that intendedly, and thus there may be a hidden bug.
Bonus: fix a few docblocks.
Bug: T230803
Change-Id: Iac2931f17b50ace8c8f4c2faa44b3f54ca134c54
This allows a little bit more of abstraction: we can store other data in the
tree, without having to store it in a specific node (e.g. the variables map,
which is still unused). It also adds a few typehints, and specializes
the return value of eval'ing the AST: previously, it was the one of
evalNode, which wasn't guaranteed to be an AFPData. Now we have this
guarantee. Last but not least, we can now measure runtime metrics for
evalTree, which doesn't recurse.
Bonus: fix a check in the old parser, which used the wrong variable when
reporting outofbounds errors.
Change-Id: Iff806793b1d968e9bb6220f1459f3d0ac587c7da
Ensure that the variable isn't set before marking it as DUNDEFINED:
that's only for when we cannot use a default, but if the variable is set
we already have one. Most notably, this fixes conditionals handling: right
now, if you have a conditional with an assignment in both
branches, the variable will be undefined. That's obviously wrong, so
it's fixed in this patch.
Plus: catch only AFPExceptions in a test to avoid unintentionally
catching the assert exception; simplify some assignments using wfSetVar.
Depends-On: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24
Change-Id: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45
This patch includes various fixes to how func arguments are handled in
CachingParser:
- Add a comment about a future improvement of checkSyntax, which we
could limit to try building the AST.
- Having enough args for each function is now also checked when
building the AST. This allows implementing the previous point without
stopping to report notenoughargs at syntaxcheck-time (otherwise it'd be
a runtime error). And it also ensure that we check for the params count
inside skipped branches, e.g. inside if/else: these were already only
discovered at runtime in CachingParser. The old parser is not affected
by this change, because when checking syntax it will always execute
all branches, and at runtime it will skip braces altogether.
- Fix arg count for CachingParser, which previously added a bogus param
in case of a function called without parameters. This was fixed for
the other parser in I484fe2994292970276150d2e417801453339e540, and I
just ported the updated fix. Also note that the CachingParser was
already failing for e.g. `count()`, but instead of complaining about
missing arguments, it failed hard when trying to pass NULL to
evalNode.
- Fixed some tests not to use setExpectedException, which caused the
previous point to remain unnoticed: calling that method prevents the
loop from continuing, and thus only the AbuseFilterParser part was
being executed. The new implementation checks the exception ID and is
thus more future-proof if the i18n message changes.
- Fixed some function names in error reporting for the old parser.
- The arg count is now checked outside of the function handlers, thus
it's no more necessary to call checkEnoughArguments at the beginning
of each handler. This also produces clearer error messages in case of
aliases (e.g. set/set_var).
- Check the args count even if some of the args are DUNDEFINED. This is
much easier now that the check is outside of the handler. This will
make syntax check fail for e.g. `contains_any(added_lines)`.
Bug: T156095
Change-Id: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24
If $parser->parse returns a falsey value (=null), that's because the
filter doesn't have any statement. But that's not a valid reason not to
cache the filter. Hence, return whatever parse() is returning inside the
callback, so that the result is always cached.
Change-Id: Ib6b0e72d882dc484456a3be6bbc74da36ef48bf7
Just like we do for functions, it doesn't really make sense to have
keywords separately, in AFPData.
Change-Id: I208a9b1ce2bd12038e9fbcc515c48d604ec80eb8
This patch keeps the current behaviour for everything (since DUNDEFINED
was always casted to boolean false), but handles the cast at a higher
level instead of relying on what AFPData::castTypes will do. This way
it's easier to spot places where we may get DUNDEFINED, and decide how
to handle them one by one.
Change-Id: I1070e15ea03c7dd4a4231b87afbc42240a558581
This is more complicated than the := operator, because the var name
could be a complicated expression, and we have to handle a function
call. This patch only covers the case where the variable name is a
literal, which is enough for WMF production.
Bug: T214674
Change-Id: I6c0f8e95663919a0235b5ccf0c88ad0a539315a7
I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7 almost fixed this bug, but we
also have to make it possible to access builtin variables as arrays.
This will only make sense for a few variables (e.g. added_lines and
removed_lines), but I don't think we should validate it when checking
syntax.
Bug: T198531
Change-Id: I417e1b8d4802bbfccd091ce5c7617659cfd1e4ea
This patch includes:
* Making it possible to access offsets of a DNONE (returning a DNONE)
* Initializing user-defined variables as DNONE inside short-circuited branches
* Make DNONE propagate with other operators
* Make DNONE count as false for logic operators
* Remove a now-outaded bit in doLevelAtom. In case of shortcircuit,
$result is now DNONE instead of DNULL, and thus it's possible to
access offsets of it. Performance++!
* Don't allow modifying or adding an element of a DNONE as if it were an
array (to avoid inconsistencies)
This re-applies Id85c673337fa90a3782fd22eb9690cd996967111 with several fixes.
NOTE: Haven't tested locally, although I'm pretty confident thanks to
the amount of tests added.
Bug: T214674
Bug: T228677
Change-Id: I5ec4ab44c4e88aaf18c0d7b73355d27050beeda7
This should allow more flexibility when checking syntax, and a saner
behaviour overall.
Aside from not throwing exception in certain cases, the results should
be almost equal to the ones you would get without this patch. However,
there are still a few things to improve (which for convenience I wrote
inside the parser test) and many to test.
Bug: T204654
Depends-On: I69bfec45c76509fb1112641393f78e8d8834adcd
Change-Id: I5a14d4b2bc3ffd9caaaa095f16f36b9b6009db05
$data and $type are meant to be read-only and should have getter
functions, but as usual they're just public. Add getter methods, a
comment with a @private annotation and remove usages in our codebase.
Change-Id: I5e51efc9f982a4e340b48d20cb1b38a75bb10021