Commit graph

191 commits

Author SHA1 Message Date
Daimona Eaytoy a77a59b962 Hard-deprecate empty operands
This bumps the level to WARN, and makes it very clear that people should
fix the affected filters. It also removes the calling method, which was
mostly meant for debugging purposes, and changes the type to 'op_type'
to avoid conflicting with type:mediawiki in logstash.

Bug: T156096
Change-Id: Ie73f1604e8ed82bc2e1be9fc90fa065be37889a3
2019-11-12 11:39:25 +00:00
Daimona Eaytoy f7ac35d5c6 Hard-deprecate too many params
Bug: T230803
Change-Id: Icec8bcb8ab23956654857acc8b3d235889f587a9
2019-11-10 12:59:33 +00:00
jenkins-bot 91bc961712 Merge "Check for 0-like floats passed to the modulo operator" 2019-11-10 11:51:28 +00:00
Daimona Eaytoy c0f8374624 Check for 0-like floats passed to the modulo operator
That throws an error in PHP.

Bug: T237459
Change-Id: Ia0b29d6a8b9f4aac6b5b72ce8f2f45afb03f4c99
2019-11-10 11:22:04 +00:00
jenkins-bot 7ff4b95aec Merge "Expand the list of types that can be cast to int" 2019-11-10 11:00:36 +00:00
Daimona Eaytoy 585d6cdb24 Make to sure to report division by zero when the LHS is undefined
Bug: T234339
Change-Id: I1575ec013c1e7e321a8f13f40804ebc5ab076268
2019-11-08 14:08:52 +00:00
Daimona Eaytoy 1abaff1aac Better handling of keywords and functions
Always run the keyword/function handler, even if there are DUNDEFINED
arguments, so that the handler can perform further validation on the
input and report any error to the user. However, replace DUNDEFINED with
DNULL before running the handler, to avoid special-casing DUNDEFINED in
every handler. If any argument was a DUNDEFINED, we will return
DUNDEFINED anyway.

Also centralize the keyword handling logic to a new method, like it
happens for functions.

Bug: T234339
Change-Id: I875cb77418a39790e91fe5867c49917bfe406ed4
2019-11-08 15:07:20 +01:00
Daimona Eaytoy e98799a00a Centralize the code for calling keywords
This allows sharing the code between cachingparser and the old parser
(for DRY-ness), and even when the old parser will be killed, having the
logic outside of the generic parse method seems saner.

This copies what I446a307e5395ea8cc8ec5ca5d5390b074bea2f24 did for
functions.

Change-Id: Ie6290243a6c78661510a9b4cb713d6e7b2778248
2019-11-08 15:02:17 +01:00
Daimona Eaytoy b7c7ae168d Explicitly forbid negative indexes in arrays
This emits its own error because:
1- It's clearer to understand
2- It's easier to find where we're dealing with negative offsets, if
we'll ever want to allow that.

Note that trying to use a negative index already results in a hard PHP
error being thrown.

Bug: T237219
Change-Id: Ib11eaaca5e21f740269141c75e62bac48093e8d0
2019-11-08 05:55:56 +00:00
Daimona Eaytoy a7b28369ea Expand the list of types that can be cast to int
Bug: T237624
Change-Id: I2220cb8a8ec998a433a4469d7e0591ec0b4f2b12
2019-11-07 15:14:17 +01:00
Daimona Eaytoy b9e4475985 build: Upgrade mediawiki-phan-config to 0.8.0
This is to verify that our CI is able to handle the new version.

Bug: T235049
Change-Id: Ib7427e15f673a575738489476e604c387f449ddd
2019-10-09 19:12:51 +02:00
jenkins-bot c6ee722273 Merge "Remove AFPData::dup" 2019-10-04 19:42:52 +00:00
jenkins-bot 9ab13cf24b Merge "Replace array_map with foreach" 2019-10-04 19:42:49 +00:00
Daimona Eaytoy c7fa503e9b Remove AFPData::dup
The method, which simply duplicates an AFPData instance, is only used
when casting types, to return a different instance when the object
already has the desired type.
However, nothing is assuming that, so we can just return the original
instance and save some time.

Bug: T234427
Change-Id: Id8067b418a00260ceead35f234e55268390699ab
2019-10-04 19:15:08 +00:00
Daimona Eaytoy 703835e835 Drop HHVM support
Change-Id: Ib7ccb4f68278ba8ca009e9d18e9d8b127f799cde
2019-10-03 12:27:18 +00:00
Daimona Eaytoy 337771f83b Replace array_map with foreach
This is a micro-optimization, but IMHO it's necessary. The AF parser
code is executed for every active filter, for every
edit/move/deletion/accountcreation. In PHP, foreach is usually faster
than array_map. Especially in the case of variadic functions potentially
taking hundreds of strings, foreach will consume less time.

Bug: T234427
Change-Id: I1beedf419a6637a9a3dd668635645df950ceda21
2019-10-02 11:29:19 +00:00
Daimona Eaytoy 4c8be4d374 Add profiling points throughout the code for the CachingParser switch
Bug: T156095
Change-Id: Ib934be34a953166fe1b94cfe8ed216afe3b906ca
2019-09-18 10:02:55 +00:00
jenkins-bot 48713c824b Merge "Throw AFPUserVisibleExceptions for empty operands in CachingParser" 2019-09-15 08:36:39 +00:00
Daimona Eaytoy a4e25c1ac9 Throw AFPUserVisibleExceptions for empty operands in CachingParser
Instead of TypeErrors. Basically, only empty parenthesis had to be
fixed.

Bug: T156096
Change-Id: I019615c7bfaa179c2184b5d3ea2c6b5da91366e3
2019-09-14 18:35:40 +00:00
Daimona Eaytoy 5267082c85 Better logging for unset variables
We have many log entries, so we need some more debug data.

Bug: T230256
Change-Id: I0e9638c1ffe537ea6cfd6886ff32ef447fdacc28
2019-09-14 16:49:55 +00:00
Daimona Eaytoy 6e9a9a3bc2 CachingParser: ensure to catch errors inside short-circuited blocks
This is similar to the old parser: when discarding a node, actually
evaluate it if short-circuit is not allowed.
Add a whole lot of tests for all possible exceptions.
Move the logic to extract a message from an AFPUserVisibleException away
from the parser, to keep unit tests working.

Bug: T232498
Change-Id: I31ee4e255c6a87dd693b9bcd582539fdf57acd45
2019-09-13 21:13:15 +00:00
Daimona Eaytoy 004ccfdb5c Annotate the AST with var names before caching the AST
This implements T230982#5475400, and it should speed up the CachingParser by roughly 40%.

Bug: T230982
Change-Id: I803cc58637d50eb90e57decf243f5ca78075d63d
2019-09-13 19:43:50 +00:00
Daimona Eaytoy 7b06be0204 Allow dangling commas in variargs
This is because there are many filters using this feature. Moreover, it
could make it a little easier to add new arguments, just like dangling
commas in PHP arrays do.
Also re-align the CachingParser code of doLevelFunctions to the one in
the old Parser.

Bug: T153251
Change-Id: Ie4325159f47310788da57415a5e36e62aa4efad0
2019-09-07 11:19:14 +02:00
jenkins-bot 5be19f6f65 Merge "Add a 'strict' option to VariableHolder::getVar" 2019-09-05 19:23:23 +00:00
Daimona Eaytoy 489da0d229 Add a 'strict' option to VariableHolder::getVar
This will help mitigating problems like T230256 by enforcing that the
requested variables must exist. For now, it will only log bad usages,
thus providing a way to identify affected filters and fix them.

Bug: T230256
Change-Id: I7a61916576e444a56f0e07da7b6e5033346226bd
2019-09-04 18:19:23 +00:00
Daimona Eaytoy 13b1e880f2 Hotfix other DUNDEFINED casts to bool
These were spotted on testwiki with wmf.21.

Change-Id: Ic4d67a2b83aedfeb574fa1363a9fc618b2862f95
2019-09-04 18:06:22 +00:00
Daimona Eaytoy ce8539e2a5 Move parser tests back to /unit
Using `new LanguageEn()` involved a global, so use a MockObject instead.
Also fix LoggerFactory usage in Tokenizer to use DI instead.

Change-Id: I94d03f9459ab6444e239386eb96a0c2434bfe3dc
2019-09-03 13:23:11 +00:00
jenkins-bot 25d63aa639 Merge "Add new number syntax as experimental" 2019-08-31 17:12:10 +00:00
Daimona Eaytoy d51ca862c6 Move parser tests to /unit
IMHO these can be considered unit tests; they were already fast, but now
they're executed in an instant.
This requires several changes: 1 - delay retrieving messages in
AFPUserVisibleException, to avoid having to deal with i18n whenever we
want to test exceptions; 2 - Use some DI for Parser and Tokenizer.
Equivset-dependend tests are also moved to a new class, thus helping to
fix the AF part of T189560.

Change-Id: If4585bf9bb696857005cf40a0d6985c36ac7e7a8
2019-08-28 16:36:37 +00:00
Daimona Eaytoy 71730f7d44 Warn if a function has been given too many parameters
While this is not as important as throwing for too few parameters, IMHO
it's still important to fail in this case. Mostly because if a function
receives too many parameters, chances are that who wrote the filter
didn't do that intendedly, and thus there may be a hidden bug.
Bonus: fix a few docblocks.

Bug: T230803
Change-Id: Iac2931f17b50ace8c8f4c2faa44b3f54ca134c54
2019-08-26 20:29:49 +02:00
Daimona Eaytoy 4d86758a49 Add new number syntax as experimental
For now it will only report successful parse. Next step is formally
deprecating the old one (escalated to warning), then removing it in
favour of the new one (in another MW version).

Bug: T212730
Change-Id: I5dd11fd67d8e57d1d0c52ddfa026920ebfc5ee13
2019-08-26 08:15:55 +00:00
jenkins-bot ff2f6ee26f Merge "Add a new class for the CachingParser's AST" 2019-08-25 18:00:24 +00:00
Daimona Eaytoy d515af0ae6 Add a new class for the CachingParser's AST
This allows a little bit more of abstraction: we can store other data in the
tree, without having to store it in a specific node (e.g. the variables map,
which is still unused). It also adds a few typehints, and specializes
the return value of eval'ing the AST: previously, it was the one of
evalNode, which wasn't guaranteed to be an AFPData. Now we have this
guarantee. Last but not least, we can now measure runtime metrics for
evalTree, which doesn't recurse.
Bonus: fix a check in the old parser, which used the wrong variable when
reporting outofbounds errors.

Change-Id: Iff806793b1d968e9bb6220f1459f3d0ac587c7da
2019-08-25 17:29:16 +00:00
jenkins-bot 6196801178 Merge "Log more empty operands" 2019-08-24 20:53:01 +00:00
Daimona Eaytoy 2d031d0bee Log more empty operands
And fix a couple of minor bugs.

Bug: T156096
Depends-On: I3b85087677607573f4fa68681735dc35348dcd87
Change-Id: Ia4c713a1d45827f6a8bc5566a8d8835c49f8108a
2019-08-24 19:59:53 +00:00
jenkins-bot 47838715fa Merge "Allow if without else" 2019-08-20 20:12:19 +00:00
jenkins-bot 5e605aaa62 Merge "Even better handling of DUNDEFINED" 2019-08-20 20:00:52 +00:00
jenkins-bot bf8ccccade Merge "Fix a bug in the return value of the CachingParser" 2019-08-20 19:58:38 +00:00
Daimona Eaytoy af7744781f Allow if without else
Bug: T230727
Depends-On: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45
Change-Id: I3b85087677607573f4fa68681735dc35348dcd87
2019-08-20 19:36:14 +00:00
Daimona Eaytoy 963221ad6d Even better handling of DUNDEFINED
Ensure that the variable isn't set before marking it as DUNDEFINED:
that's only for when we cannot use a default, but if the variable is set
we already have one. Most notably, this fixes conditionals handling: right
now, if you have a conditional with an assignment in both
branches, the variable will be undefined. That's obviously wrong, so
it's fixed in this patch.
Plus: catch only AFPExceptions in a test to avoid unintentionally
catching the assert exception; simplify some assignments using wfSetVar.

Depends-On: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24
Change-Id: I8e7f7710b8cb37ada8531b631456a3ce7b27ee45
2019-08-20 19:17:30 +00:00
Daimona Eaytoy fa76405ea7 Fix a bug in the return value of the CachingParser
This has always been wrong, and remained unnoticed. Also added a
typehint for added safety.

Change-Id: I8a3c31e7385283d95b4712d457784016239a0b3b
2019-08-20 20:54:19 +02:00
Daimona Eaytoy aa867bd370 Better handling of function params in CachingParser
This patch includes various fixes to how func arguments are handled in
CachingParser:
- Add a comment about a future improvement of checkSyntax, which we
  could limit to try building the AST.
- Having enough args for each function is now also checked when
  building the AST. This allows implementing the previous point without
  stopping to report notenoughargs at syntaxcheck-time (otherwise it'd be
  a runtime error). And it also ensure that we check for the params count
  inside skipped branches, e.g. inside if/else: these were already only
  discovered at runtime in CachingParser. The old parser is not affected
  by this change, because when checking syntax it will always execute
  all branches, and at runtime it will skip braces altogether.
- Fix arg count for CachingParser, which previously added a bogus param
  in case of a function called without parameters. This was fixed for
  the other parser in I484fe2994292970276150d2e417801453339e540, and I
  just ported the updated fix. Also note that the CachingParser was
  already failing for e.g. `count()`, but instead of complaining about
  missing arguments, it failed hard when trying to pass NULL to
  evalNode.
- Fixed some tests not to use setExpectedException, which caused the
  previous point to remain unnoticed: calling that method prevents the
  loop from continuing, and thus only the AbuseFilterParser part was
  being executed. The new implementation checks the exception ID and is
  thus more future-proof if the i18n message changes.
- Fixed some function names in error reporting for the old parser.
- The arg count is now checked outside of the function handlers, thus
  it's no more necessary to call checkEnoughArguments at the beginning
  of each handler. This also produces clearer error messages in case of
  aliases (e.g. set/set_var).
- Check the args count even if some of the args are DUNDEFINED. This is
  much easier now that the check is outside of the handler. This will
  make syntax check fail for e.g. `contains_any(added_lines)`.

Bug: T156095
Change-Id: I446a307e5395ea8cc8ec5ca5d5390b074bea2f24
2019-08-20 15:32:02 +00:00
jenkins-bot 7addec7b4a Merge "Make some other AFPData methods non-static" 2019-08-20 14:16:16 +00:00
jenkins-bot 1f45336157 Merge "Move keywords handlers to the Parser" 2019-08-20 14:16:10 +00:00
jenkins-bot f18d0814e2 Merge "Make several AFPData functions non-static" 2019-08-20 14:06:02 +00:00
jenkins-bot f1ab591d27 Merge "Avoid implicit casts from DUNDEFINED to something else" 2019-08-20 13:04:48 +00:00
jenkins-bot ea01809f5e Merge "Add the filter ID to empty operand logging" 2019-08-20 13:01:14 +00:00
jenkins-bot d32b03ca10 Merge "Increase cache hits for CachingParser" 2019-08-20 12:50:31 +00:00
jenkins-bot d0b30c2534 Merge "Make parser aware of the filter it is parsing" 2019-08-20 12:50:26 +00:00
Daimona Eaytoy d715f6d2c0 Increase cache hits for CachingParser
If $parser->parse returns a falsey value (=null), that's because the
filter doesn't have any statement. But that's not a valid reason not to
cache the filter. Hence, return whatever parse() is returning inside the
callback, so that the result is always cached.

Change-Id: Ib6b0e72d882dc484456a3be6bbc74da36ef48bf7
2019-08-13 18:03:13 +02:00