This commit adds a class AFPSyntaxChecker which can statically analyze
a filter code to detect the following errors:
- unbound variables (which comes in two modes: conservative and liberal,
default to conservative)
- unused variables (disabled by default for compatibilty)
- assignment on built-in identifiers
- function application's arity mismatch
- function application's invalid function name
- non-string literal in the first argument of set / set_var
The existing parser and evaluator are modified as follows:
- The new (caching) evaluator no longer needs to perform variable
hoisting at runtime.
- Note that for array assignment, this changes the semantics.
- The new parser is more lenient, reducing parsing errors.
The static analyzer will catch these errors instead, allowing us
to give a much better error message and reduces the complexity of
the parser.
* The parser now allows function name to be any identifier.
* The parser now allows arity mismatch to occur.
* The parser now allows the first argument of set to be any expression.
Concretely, obvious changes that users will see are:
1. a := [1]; false & (a[] := 2); a[0] === 1
would evaluate to true, while it used to evaluate to the undefined value
due to hoisting
2. f(1)
will now error with 'f is not a valid function' as opposed to
'Unexpected "T_BRACE"'
3. length
will now error with 'Illegal use of built-in identifier "length"'
as opposed to 'Expected a ('
Appendix: conservative and liberal mode
The conservative mode is completely compatible with the current evaluator.
That is,
false & (a := 1); a
will not deem `a` as unbound, though this is actually undesirable because
`a` would then be bound to the troublesome undefined value.
The liberal mode rejects the above pattern by deeming `a` as unbound.
However, it also rejects
true & (a := 1); a
even though (a := 1) is always executed. Since there are several filters
in Wikimedia projects that rely on this behavior, we default the mode
to conservative for now.
Note that even the liberal mode doesn't really respect lexical scope
appeared in some other programming languages (see also T234690).
For instance:
(if true then (a := 1) else (a := 2) end); a
would be accepted by the liberal checker, even though under lexical scope,
`a` would be unbound. However, it is unlikely that lexical scope
will be suitable for the filter language, as most filters in
Wikimedia projects that have user-defined variable do violate lexical scope.
Bug: T260903
Bug: T238709
Bug: T237610
Bug: T234690
Bug: T231536
Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2
Create a dedicated "Exception" sub-namespace and remove the "AFP"
prefix, a leftover from the pre-namespace era.
Change-Id: I7e5fded9316d8b7d1628bc1a6ba8b1879ac901e1
Previously, for non-newly-created pages, AbuseFilter would get the text
for filtering twice: once in AbuseFilterHooks::filterEdit(), and then
again in RunVariableGenerator::getEditTextForFiltering(). (Plus another
call for the text of the previous revision.) The first copy of the text
is only passed into RunVariableGenerator::getEditVars(), and there only
used if the title doesn’t exist, otherwise it’s overwritten with the
second copy. Instead, let’s make AbuseFilterHooks not get the text at
all, and only get the text from the content when we actually need it
(the content is new).
Change-Id: Id12430fa6ba4643113b945e0d0c01b9c0ee1742f
Regression tests to make sure T286140 does not
happen again.
In the process, discovered what caused that bug
with afl_rev_id not being set: EditRevUpdater::updateRev()
compares the WikiPage given in the PageSaveComplete hook
to the one given to it by AbuseFilterHooks from
onEditFilterMergedContent, and compares the two using
`===`, meaning that they must refer to the same underlying
object. That bug was caused because AbuseFilterHooks
changed to providing a different object, despite still
referring to the same underlying page.
We should probably change that behavior in EditRevUpdater,
but for now updated AbuseFilterConsequencesTest to pass
the same object around by using RequestContext::setWikiPage()
and providing the WikiPage object to
MediaWikiIntegrationTestCase::editPage().
Bug: T286140
Change-Id: I6562f513c463538af6b59b12a64564b254024613
This reverts commit 15fc159cb1.
Reason for revert: this is breaking the addition of rev ids to filter
hits after edits are saved. I suspect this is because the context wikipage
is for a different title than the one being edited, though I'm not sure
way - regardless, testing on patchdemo shows that with this revert
is applied, rev ids are once again added to filter hits.
Bug: T286140
Change-Id: I3ab6324a73050154cef1c20a2bf8307eb11eea2d