Sorawee Porncharoenwase 320e3d696f Add a static analyzer for the filter language
This commit adds a class, AFPSyntaxChecker, which can statically analyze
filter code to detect the following errors:

- unbound variables (checked in one of two modes, conservative or liberal;
  the default is conservative)
- unused variables (disabled by default for compatibility)
- assignment to built-in identifiers
- arity mismatch in a function application
- invalid function name in a function application
- non-string literal as the first argument of set / set_var

The existing parser and evaluator are modified as follows:

- The new (caching) evaluator no longer needs to perform variable
  hoisting at runtime.
  - Note that for array assignment, this changes the semantics.
- The new parser is more lenient, reducing parsing errors.
  The static analyzer catches these errors instead, which lets us
  give much better error messages and reduces the complexity of
  the parser.
  * The parser now allows the function name to be any identifier.
  * The parser now allows arity mismatches to occur.
  * The parser now allows the first argument of set to be any expression.
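
As a rough illustration of the checks that move from the parser to the
static analyzer, here is a minimal Python sketch. It is not the
AFPSyntaxChecker code: the FUNCTIONS table, the check_call helper, the
(kind, value) argument encoding, and the wording of the arity and set
messages are all assumptions made for this example. Only the
'f is not a valid function' message comes from this commit message.

```python
# Hypothetical arity table: name -> (min_args, max_args).
# Illustrative only; the real extension keeps its own function list.
FUNCTIONS = {
    "length": (1, 1),
    "set": (2, 2),
    "set_var": (2, 2),
}


def check_call(name, args):
    """Return a list of error strings for one function application.

    `args` is a list of (kind, value) tuples, where kind is "string" for a
    string literal and anything else for other expressions (assumed encoding).
    """
    if name not in FUNCTIONS:
        # The parser accepts any identifier here; the analyzer rejects it.
        return [f"{name} is not a valid function"]

    errors = []
    lo, hi = FUNCTIONS[name]
    if not lo <= len(args) <= hi:
        errors.append(f"{name} expects {lo} to {hi} argument(s), got {len(args)}")
    if name in ("set", "set_var") and args and args[0][0] != "string":
        errors.append(f"first argument of {name} must be a string literal")
    return errors


if __name__ == "__main__":
    print(check_call("f", [("number", 1)]))
    print(check_call("length", []))
    print(check_call("set", [("number", 1), ("number", 2)]))
```

Because the parser no longer has to reject these cases itself, each one
can be reported with a message that names the offending function.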

Concretely, obvious changes that users will see are:

1. a := [1]; false & (a[] := 2); a[0] === 1

   now evaluates to true, whereas it used to evaluate to the undefined
   value because of hoisting

2. f(1)

   will now error with 'f is not a valid function' as opposed to
   'Unexpected "T_BRACE"'

3. length

   will now error with 'Illegal use of built-in identifier "length"'
   as opposed to 'Expected a ('

Appendix: conservative and liberal mode

The conservative mode is completely compatible with the current evaluator.
That is,

false & (a := 1); a

will not deem `a` as unbound, though this is actually undesirable because
`a` would then be bound to the troublesome undefined value.

The liberal mode rejects the above pattern by deeming `a` as unbound.
However, it also rejects

true & (a := 1); a

even though (a := 1) is always executed. Since there are several filters
in Wikimedia projects that rely on this behavior, we default the mode
to conservative for now.
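
The difference between the two modes can be sketched in Python as
follows. This is an illustration under stated assumptions, not the
AFPSyntaxChecker implementation: the tuple-based mini-AST, the check
function, and the unbound_variables helper are hypothetical. The only
behavior taken from this commit message is how each mode treats an
assignment in the right operand of `&`.

```python
# Hypothetical mini-AST (not the extension's real node types):
#   ("lit", value)
#   ("var", name)
#   ("assign", name, expr)
#   ("and", left, right)      # the short-circuiting `&`
#   ("seq", [expr, ...])      # `;`-separated statements


def check(node, bound, mode, unbound):
    """Walk `node`; return the set of names considered bound after it."""
    kind = node[0]
    if kind == "lit":
        return bound
    if kind == "var":
        if node[1] not in bound:
            unbound.append(node[1])
        return bound
    if kind == "assign":
        bound = check(node[2], bound, mode, unbound)
        return bound | {node[1]}
    if kind == "and":
        after_left = check(node[1], bound, mode, unbound)
        after_right = check(node[2], after_left, mode, unbound)
        # Conservative: keep bindings from the right operand even though it
        # may be skipped at runtime. Liberal: drop them.
        return after_right if mode == "conservative" else after_left
    if kind == "seq":
        for child in node[1]:
            bound = check(child, bound, mode, unbound)
        return bound
    raise ValueError(f"unknown node kind: {kind}")


def unbound_variables(program, mode="conservative"):
    """Return the names reported as unbound, in order of use."""
    unbound = []
    check(program, frozenset(), mode, unbound)
    return unbound


if __name__ == "__main__":
    # false & (a := 1); a
    program = ("seq", [
        ("and", ("lit", False), ("assign", "a", ("lit", 1))),
        ("var", "a"),
    ])
    print(unbound_variables(program, "conservative"))
    print(unbound_variables(program, "liberal"))
```

In this sketch the liberal mode never looks at the value of the left
operand, which is why it treats `true & (a := 1); a` the same way as the
`false` variant.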

Note that even the liberal mode doesn't really respect lexical scope
as found in some other programming languages (see also T234690).
For instance:

(if true then (a := 1) else (a := 2) end); a

would be accepted by the liberal checker, even though under lexical scope,
`a` would be unbound. However, it is unlikely that lexical scope
will be suitable for the filter language, as most filters in
Wikimedia projects that have user-defined variables violate lexical scope.
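
To make the contrast with lexical scope concrete, here is a small Python
sketch. It is an assumption-laden illustration rather than the real
checker: the mini-AST, the bound_after function, and the "lexical" mode
are hypothetical. In particular, treating a name as bound after an `if`
only when both branches assign it is one plausible reading of the liberal
mode, chosen because it accepts the example above.

```python
# Hypothetical mini-AST:
#   ("lit", value), ("var", name), ("assign", name, expr),
#   ("if", cond, then_expr, else_expr), ("seq", [expr, ...])


def bound_after(node, bound, mode, unbound):
    """Return the names considered bound after `node` under `mode`."""
    kind = node[0]
    if kind == "lit":
        return bound
    if kind == "var":
        if node[1] not in bound:
            unbound.append(node[1])
        return bound
    if kind == "assign":
        return bound_after(node[2], bound, mode, unbound) | {node[1]}
    if kind == "if":
        after_cond = bound_after(node[1], bound, mode, unbound)
        after_then = bound_after(node[2], after_cond, mode, unbound)
        after_else = bound_after(node[3], after_cond, mode, unbound)
        if mode == "lexical":
            # Under lexical scope, bindings made inside a branch do not escape.
            return after_cond
        # Liberal (assumed): a name survives only if both branches bind it.
        return after_then & after_else
    if kind == "seq":
        for child in node[1]:
            bound = bound_after(child, bound, mode, unbound)
        return bound
    raise ValueError(f"unknown node kind: {kind}")


def unbound_names(program, mode):
    unbound = []
    bound_after(program, frozenset(), mode, unbound)
    return unbound


if __name__ == "__main__":
    # (if true then (a := 1) else (a := 2) end); a
    program = ("seq", [
        ("if", ("lit", True),
         ("assign", "a", ("lit", 1)),
         ("assign", "a", ("lit", 2))),
        ("var", "a"),
    ])
    print(unbound_names(program, "liberal"))
    print(unbound_names(program, "lexical"))
```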

Bug: T260903
Bug: T238709
Bug: T237610
Bug: T234690
Bug: T231536
Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2
2021-08-31 03:28:24 +02:00
.phan Notify of a throttled filter 2020-12-19 10:31:29 +01:00
db_patches Give MySQL indexes explicit names, align MySQL and SQLite 2021-04-15 11:30:30 +02:00
i18n Add a static analyzer for the filter language 2021-08-31 03:28:24 +02:00
includes Add a static analyzer for the filter language 2021-08-31 03:28:24 +02:00
maintenance build: Updating dependencies 2021-07-21 18:51:18 +00:00
modules Check response code and prevent exception in worker-abusefilter 2021-02-12 00:01:51 +00:00
tests Add a static analyzer for the filter language 2021-08-31 03:28:24 +02:00
.eslintrc.json build: Updating eslint-config-wikimedia to 0.19.0 2021-03-10 23:05:48 +00:00
.gitignore Add config for Selenium and basic tests 2019-09-17 16:23:07 +00:00
.gitreview Whoops, track not trace 2016-10-24 17:01:30 -07:00
.phpcs.xml Merge "Use FauxRequest::setUpload in AbuseFilterUploadTestTrait::doUpload" 2021-05-15 12:22:10 +00:00
.stylelintrc.json build: Bump devDependencies to latest 2018-02-10 21:00:53 +00:00
AbuseFilter.alias.php Add aliases for Serbian language 2018-12-24 02:16:55 +00:00
CODE_OF_CONDUCT.md build: Updating mediawiki/phan-taint-check-plugin to 1.4.0 2018-09-01 05:29:54 +00:00
composer.json build: Updating dependencies 2021-07-21 18:51:18 +00:00
COPYING
extension.json Switch filterable actions hooks to the new system 2021-08-16 14:18:56 +00:00
Gruntfile.js eslint config tweaks 2020-06-09 19:39:03 +01:00
package-lock.json build: Updating path-parse to 1.0.7 2021-08-11 06:47:32 +00:00
package.json selenium: Upgrade WebdriverIO to v7 2021-06-09 20:45:38 +05:30