Commit graph

9 commits

Author SHA1 Message Date
Daimona Eaytoy 7c26c4b8d5 More cleanup for parser-related classes
Change-Id: I6a2bbf519e1d5c6fe2778f69624bd80b9ea1ef86
2021-09-10 12:50:20 +00:00
Daimona Eaytoy f8e9ac7e2a Rename AbuseFilterCachingParser -> FilterEvaluator
It's an evaluator, not a parser.

Change-Id: Ib6d33e8423ea72709cf5a33f4397ba33e352ea80
2021-09-08 13:40:47 +02:00
Daimona Eaytoy 6684ea6450 Remove AFPTransitionBase
Also cleanup the mPos hack in the CachingParser.

Change-Id: Ib5693802a3ceb80cb736880ed65e27340abef689
2021-09-06 19:33:48 +00:00
Sorawee Porncharoenwase 320e3d696f Add a static analyzer for the filter language
This commit adds a class AFPSyntaxChecker which can statically analyze
a filter code to detect the following errors:

- unbound variables (which comes in two modes: conservative and liberal,
  default to conservative)
- unused variables (disabled by default for compatibilty)
- assignment on built-in identifiers
- function application's arity mismatch
- function application's invalid function name
- non-string literal in the first argument of set / set_var

The existing parser and evaluator are modified as follows:

- The new (caching) evaluator no longer needs to perform variable
  hoisting at runtime.
  - Note that for array assignment, this changes the semantics.
- The new parser is more lenient, reducing parsing errors.
  The static analyzer will catch these errors instead, allowing us
  to give a much better error message and reduces the complexity of
  the parser.
  * The parser now allows function name to be any identifier.
  * The parser now allows arity mismatch to occur.
  * The parser now allows the first argument of set to be any expression.

Concretely, obvious changes that users will see are:

1. a := [1]; false & (a[] := 2); a[0] === 1

   would evaluate to true, while it used to evaluate to the undefined value
   due to hoisting

2. f(1)

   will now error with 'f is not a valid function' as opposed to
   'Unexpected "T_BRACE"'

3. length

   will now error with 'Illegal use of built-in identifier "length"'
   as opposed to 'Expected a ('

Appendix: conservative and liberal mode

The conservative mode is completely compatible with the current evaluator.
That is,

false & (a := 1); a

will not deem `a` as unbound, though this is actually undesirable because
`a` would then be bound to the troublesome undefined value.

The liberal mode rejects the above pattern by deeming `a` as unbound.
However, it also rejects

true & (a := 1); a

even though (a := 1) is always executed. Since there are several filters
in Wikimedia projects that rely on this behavior, we default the mode
to conservative for now.

Note that even the liberal mode doesn't really respect lexical scope
appeared in some other programming languages (see also T234690).
For instance:

(if true then (a := 1) else (a := 2) end); a

would be accepted by the liberal checker, even though under lexical scope,
`a` would be unbound. However, it is unlikely that lexical scope
will be suitable for the filter language, as most filters in
Wikimedia projects that have user-defined variable do violate lexical scope.

Bug: T260903
Bug: T238709
Bug: T237610
Bug: T234690
Bug: T231536
Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2
2021-08-31 03:28:24 +02:00
Daimona Eaytoy 704364a5e7 Move parser exceptions to specific namespace and rename them
Create a dedicated "Exception" sub-namespace and remove the "AFP"
prefix, a leftover from the pre-namespace era.

Change-Id: I7e5fded9316d8b7d1628bc1a6ba8b1879ac901e1
2021-08-29 23:38:31 +00:00
libraryupgrader 5377ebe819 build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)

Change-Id: I2b382f3bb236fb44eb24c6a257b13b8fd886541c
2021-07-21 18:51:18 +00:00
Daimona Eaytoy f8438a4647 Remove the old parser
All methods were moved to the new parser. Tests and other pieces were
adjusted to expect just a single parser. There are still some TODOs
(remove AFPTransitionBase, remove $this->mCur), but these are left for
another commit.

Note that the new parser was not renamed: this is because the names are
wrong anyway (CachingParser is more of an Evaluator than a Parser, and
AFPTreeParser is the real parser, and should be renamed as well).

NOTE to reviewers: this patch looks quite big, but if you diff the old
parser with the new version of the CachingParser, you'll notice that the
diff is actually small, since everything was basically copied verbatim.

Bug: T239990
Change-Id: Ie914ef64c70503a201b4d2dec698ca2fa8e69b10
2021-04-09 13:23:07 +00:00
Daimona Eaytoy 2bb5c3c7b5 Align arg counting between the parsers
1 - Change the structure of if/elseif for readability
2 - In the old parser, if there's an empty argument, never add it (the
new parser was already doing that).

Bug: T156095
Bug: T156096
Change-Id: I4237b1a0ba01e7ce04dcc945f7daf34612fcf07d
2021-02-20 14:33:56 +00:00
Daimona Eaytoy da1c71ec4c Move parser classes to a dedicated namespace
Names were kept for now.

Change-Id: Ib2eb5d7b523a64f2a0f72fdcdde2043a76cc9a37
2020-12-09 01:30:20 +00:00
Renamed from includes/parser/AFPTreeParser.php (Browse further)