Commit graph

33 commits

Author SHA1 Message Date
Daimona Eaytoy a722dfe1a4 Rename ParserFactory -> RuleCheckerFactory
The old parser now has the correct name "Evaluator", so the
ParserFactory name was outdated. Additionally, the plan is to create a
new RuleChecker class, acting as a facade for the different
parsing-related stages (lexer, parser, evaluator, etc.), which is what
most if not all callers should use. The RuleCheckerFactory still returns
a FilterEvaluator for now.
Also, "Parser" is a specific term defining *how* things happen
internally, whereas "RuleChecker" describes *what* callers should expect
from the new class.

Change-Id: I25b47a162d933c1e385175aae715ca38872b1442
2021-09-08 21:59:34 +02:00
Daimona Eaytoy 357ddd498c Clean up / simplify parser-related classes
Remove unnecessary setters, injecting everything in the constructor.
These were leftovers from before the introduction of ParserFactory.
Remove public access to the conds used, include the information inside
the returned ParserStatus instead, and consequently simplify callers.

Change-Id: I0a30e044877c6c858af3ff73f819d5ec7c4cc769
2021-09-08 13:41:52 +02:00
Daimona Eaytoy f8e9ac7e2a Rename AbuseFilterCachingParser -> FilterEvaluator
It's an evaluator, not a parser.

Change-Id: Ib6d33e8423ea72709cf5a33f4397ba33e352ea80
2021-09-08 13:40:47 +02:00
Daimona Eaytoy 6684ea6450 Remove AFPTransitionBase
Also clean up the mPos hack in the CachingParser.

Change-Id: Ib5693802a3ceb80cb736880ed65e27340abef689
2021-09-06 19:33:48 +00:00
Sorawee Porncharoenwase 320e3d696f Add a static analyzer for the filter language
This commit adds a class AFPSyntaxChecker which can statically analyze
filter code to detect the following errors:

- unbound variables (checked in one of two modes, conservative or liberal;
  the default is conservative)
- unused variables (disabled by default for compatibility)
- assignment to built-in identifiers
- arity mismatch in a function application
- invalid function name in a function application
- non-string literal in the first argument of set / set_var

The existing parser and evaluator are modified as follows:

- The new (caching) evaluator no longer needs to perform variable
  hoisting at runtime.
  - Note that for array assignment, this changes the semantics.
- The new parser is more lenient, reducing parsing errors.
  The static analyzer will catch these errors instead, which allows us
  to give much better error messages and reduces the complexity of
  the parser.
  * The parser now allows the function name to be any identifier.
  * The parser now allows arity mismatches to occur.
  * The parser now allows the first argument of set to be any expression.

Concretely, obvious changes that users will see are:

1. a := [1]; false & (a[] := 2); a[0] === 1

   now evaluates to true, whereas it used to evaluate to the undefined
   value due to hoisting

2. f(1)

   will now error with 'f is not a valid function' as opposed to
   'Unexpected "T_BRACE"'

3. length

   will now error with 'Illegal use of built-in identifier "length"'
   as opposed to 'Expected a ('
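
As a rough illustration of changes 2 and 3, the checks that moved from the parser to the static analyzer can be sketched as follows. This is a minimal sketch in Python, not the AFPSyntaxChecker code: the BUILTINS table, the helper names and the wording of the arity message are assumptions made for the example; only the two quoted error messages come from the list above.

```python
from typing import Dict, List, Tuple

# Hypothetical table of built-in functions: name -> (min args, max args).
# The real analyzer has its own, much longer table.
BUILTINS: Dict[str, Tuple[int, int]] = {
    "length": (1, 1),
    "set": (2, 2),
}


def check_call(name: str, arg_count: int) -> List[str]:
    """Check a function application that the lenient parser accepted."""
    if name not in BUILTINS:
        return [f"{name} is not a valid function"]
    lo, hi = BUILTINS[name]
    if not lo <= arg_count <= hi:
        # The wording of this message is invented for the sketch.
        return [f"{name} expects {lo} to {hi} arguments, got {arg_count}"]
    return []


def check_variable(name: str) -> List[str]:
    """Check a bare identifier used as a variable."""
    if name in BUILTINS:
        return [f'Illegal use of built-in identifier "{name}"']
    return []
```

Here check_call("f", 1) reports that f is not a valid function, and check_variable("length") reports the illegal use of a built-in identifier, matching the two messages quoted above.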

Appendix: conservative and liberal mode

The conservative mode is completely compatible with the current evaluator.
That is,

false & (a := 1); a

will not deem `a` as unbound, though this is actually undesirable because
`a` would then be bound to the troublesome undefined value.

The liberal mode rejects the above pattern by deeming `a` as unbound.
However, it also rejects

true & (a := 1); a

even though (a := 1) is always executed. Since there are several filters
in Wikimedia projects that rely on this behavior, we default the mode
to conservative for now.

Note that even the liberal mode doesn't really respect lexical scope
as found in some other programming languages (see also T234690).
For instance:

(if true then (a := 1) else (a := 2) end); a

would be accepted by the liberal checker, even though under lexical scope,
`a` would be unbound. However, it is unlikely that lexical scope
will be suitable for the filter language, as most filters in
Wikimedia projects that have user-defined variables violate lexical scope.
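
The difference between the two modes can be sketched with a toy checker. This is a minimal sketch in Python under stated assumptions, not the AFPSyntaxChecker implementation: the tuple-based AST, the node names and the merge rules (union of bindings in conservative mode; dropping right-hand bindings of & and intersecting if-branches in liberal mode) are hypothetical and were chosen only so that the three examples above behave as described.

```python
from typing import List, Set, Tuple

# Hypothetical mini-AST, one tuple per node:
#   ("lit", value)            ("var", name)
#   ("assign", name, expr)    ("and", left, right)
#   ("if", cond, then, else)  ("seq", [statements])


def check(node, bound: Set[str], liberal: bool) -> Tuple[Set[str], List[str]]:
    """Return (variables bound after node, unbound variable uses found)."""
    kind = node[0]
    if kind == "lit":
        return bound, []
    if kind == "var":
        return bound, ([] if node[1] in bound else [node[1]])
    if kind == "assign":
        after, errs = check(node[2], bound, liberal)
        return after | {node[1]}, errs
    if kind == "seq":
        errs: List[str] = []
        for stmt in node[1]:
            bound, found = check(stmt, bound, liberal)
            errs = errs + found
        return bound, errs
    if kind == "and":
        after_left, e1 = check(node[1], bound, liberal)
        after_right, e2 = check(node[2], after_left, liberal)
        # Liberal: the right operand may be skipped, so its bindings are dropped.
        return (after_left if liberal else after_right), e1 + e2
    if kind == "if":
        after_cond, e0 = check(node[1], bound, liberal)
        after_then, e1 = check(node[2], after_cond, liberal)
        after_else, e2 = check(node[3], after_cond, liberal)
        # Liberal: keep only what both branches bind; conservative: keep all.
        merged = (after_then & after_else) if liberal else (after_then | after_else)
        return merged, e0 + e1 + e2
    raise ValueError(f"unknown node kind: {kind}")


def unbound(node, liberal: bool) -> List[str]:
    """Convenience wrapper: start with no bound variables."""
    return check(node, set(), liberal)[1]
```

With this sketch, the filter false & (a := 1); a yields no errors in conservative mode and reports a in liberal mode, while the if/else example is accepted in both modes because both branches bind a.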

Bug: T260903
Bug: T238709
Bug: T237610
Bug: T234690
Bug: T231536
Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2
2021-08-31 03:28:24 +02:00
Daimona Eaytoy 704364a5e7 Move parser exceptions to specific namespace and rename them
Create a dedicated "Exception" sub-namespace and remove the "AFP"
prefix, a leftover from the pre-namespace era.

Change-Id: I7e5fded9316d8b7d1628bc1a6ba8b1879ac901e1
2021-08-29 23:38:31 +00:00
libraryupgrader 5377ebe819 build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)

Change-Id: I2b382f3bb236fb44eb24c6a257b13b8fd886541c
2021-07-21 18:51:18 +00:00
jenkins-bot 0dc93136d6 Merge "Improve test coverage of API modules" 2021-04-18 16:03:25 +00:00
Matěj Suchánek a2ee8c41e2 Improve test coverage of API modules
Also solve one TODO.

Change-Id: I61a38f3c741274f00ad0ad4789106a943daef222
2021-04-18 10:37:38 +02:00
Daimona Eaytoy f8438a4647 Remove the old parser
All methods were moved to the new parser. Tests and other pieces were
adjusted to expect just a single parser. There are still some TODOs
(remove AFPTransitionBase, remove $this->mCur), but these are left for
another commit.

Note that the new parser was not renamed: this is because the names are
wrong anyway (CachingParser is more of an Evaluator than a Parser, and
AFPTreeParser is the real parser, and should be renamed as well).

NOTE to reviewers: this patch looks quite big, but if you diff the old
parser with the new version of the CachingParser, you'll notice that the
diff is actually small, since everything was basically copied verbatim.

Bug: T239990
Change-Id: Ie914ef64c70503a201b4d2dec698ca2fa8e69b10
2021-04-09 13:23:07 +00:00
Umherirrender 32f7ae140e Use ::class for class name
This also works for non-existing classes,
because it is resolved at compile time

Change-Id: Ia3a1484c9c4f46a128c367ddd057c41dd560111d
2021-04-08 20:54:48 +02:00
Daimona Eaytoy 4dbde4dcf0 Use a different message prefix for parser warnings
The abusefilter-warning prefix is reserved for filter warnings. Pointed
out by Matěj.

Change-Id: I169e4c3d29b08c7f5af2136a683fc4427f8e93f5
2021-02-06 15:42:33 +00:00
jenkins-bot a94b2247f6 Merge "Cover some API modules by tests" 2021-02-04 23:43:02 +00:00
Matěj Suchánek a0fcfbcc32 Cover some API modules by tests
Change-Id: Icc57e260b3b06a58fc05f304d6e63dc40f970fe9
2021-02-04 15:17:00 +01:00
Daimona Eaytoy b0058c0f1b Use Authority in TextExtractor
And make its test a pure unit test, as per TODO comment.

Change-Id: Ia3ca38702ea61c5e551a581248d2b9471ef881fb
2021-02-02 00:43:01 +00:00
Petr Pchelko 6aa8f6f67b Do not mock User in TextExtractorTest.
In I63d9807264d7e2295afef51fc9d982447f92fcbd we are
changing how the permission checks are applied for revisions,
so that they use the passed User instance as an Authority. However, when
the user is mocked, the tests break because the new User methods
are not mocked. Pass a real user for now to fix the test. Once
Authority reaches maturity and is OK to use in extensions, the
test should be rewritten to use Authority directly.

Bug: T271458
Change-Id: Iacab813b253cc6e1439007e573e8ace06645860f
2021-01-20 09:32:18 -06:00
Daimona Eaytoy 005cc83642 Increase coverage for more classes
Change-Id: Iae6a24291f821fda77a45d8c1584de010af6a834
2021-01-17 17:38:58 +00:00
Daimona Eaytoy 8639e0c368 Introduce subclasses of Filter with specific use cases
In particular, this brings stronger typing for getID(), and we can get
rid of many phan suppressions.

Change-Id: Icbf3a6f7db8105082646ec227f62c09449fb165d
2021-01-17 00:47:29 +00:00
Daimona Eaytoy 8368b5d9b7 Use overrideUserPermissions in TextExtractorTest
This allows merging I1acd55c07d07b4a0d43fd838e11374b6d9be98d9.

Change-Id: I99ab3a69c41b3ec6721f9504ad6c77d3122df591
2021-01-06 12:46:11 +01:00
Daimona Eaytoy a5eab82204 Add a bunch of tests
Code change: in buildVarDumpTable remove special-cased null value. This
was used to avoid passing null to Html::element, but is no longer
necessary, since we now pretty-print the value.

Change-Id: I6180f6c53448d2a8c8c6066f222e9fd9df577554
2021-01-04 15:54:54 +01:00
jenkins-bot b0e8a76b2e Merge "DI for AbuseFilterSpecialPage" 2021-01-03 12:40:04 +00:00
Daimona Eaytoy 762d71c51d Create a dedicated namespace for variables-related classes
Some cleanup is left for later to keep the diff easier to read.

Change-Id: Ife445b5e47e707ab77ec867ac3b005866aa74ef2
2021-01-02 18:16:48 +01:00
Daimona Eaytoy d3b330b6d4 Create a VariablesManager service
This makes VariableHolder a true value object, and introduces a
stateless service, VariablesManager, to operate on it.

Note, in theory, this new service is still cyclically coupled with
LazyVariableComputer. However, it's now two stateless services being
coupled, not two smart/god value objects, so we've still gained
something. For now, the dependency is hidden by using a callback. Some
alternatives for that are mentioned in a code comment.
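
The general shape of this split (a plain value object, a service that operates on it, and a callback that hides the dependency between the service and the lazy computation) can be sketched as follows. This is a minimal sketch in Python, not the extension's PHP code: the class layout, the method names and the callback signature are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class LazyVariable:
    """Hypothetical placeholder for a value computed on first access."""
    method: str


@dataclass
class VariableHolder:
    """Value object: stores variables and has no logic of its own."""
    values: Dict[str, Any] = field(default_factory=dict)


# The computation receives a getter callback instead of the manager itself,
# so it can read other variables without depending on the manager class.
Getter = Callable[[str], Any]
Computer = Callable[[LazyVariable, Getter], Any]


class VariablesManager:
    """Service that reads variables from a holder, resolving lazy ones."""

    def __init__(self, compute: Computer) -> None:
        self._compute = compute

    def get_var(self, holder: VariableHolder, name: str) -> Any:
        value = holder.values[name]
        if isinstance(value, LazyVariable):
            value = self._compute(value, lambda other: self.get_var(holder, other))
            holder.values[name] = value  # store the computed value for reuse
        return value


def compute(var: LazyVariable, get: Getter) -> Any:
    """Example computation (hypothetical): doubles the 'base' variable."""
    if var.method == "double":
        return get("base") * 2
    raise ValueError(f"unknown method: {var.method}")
```

For example, with manager = VariablesManager(compute) and holder = VariableHolder({"base": 21, "twice": LazyVariable("double")}), calling manager.get_var(holder, "twice") resolves the lazy value through the callback and stores the result in the holder.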

Bug: T261069
Change-Id: I2f2c84c8e91472ba36084a8bbb4a923f6e04354b
2021-01-02 17:15:31 +00:00
Matěj Suchánek de5b7ee8ea DI for AbuseFilterSpecialPage
Change-Id: I5c702990398e0adb5fa73be54638cb8b6b268beb
2021-01-02 11:13:41 +01:00
Matěj Suchánek f5b18a36bf Move special page classes to own namespace
Change-Id: Ic2d13518924e77b1be96d1a7489abcd07e6d1dab
2021-01-02 10:54:13 +01:00
jenkins-bot d2884049be Merge "Add a TextExtractor service" 2021-01-01 19:36:42 +00:00
Daimona Eaytoy aafd3bcfcd Inject the condition limit into AbuseFilterParser
Change-Id: I487ba25ca3f3ac4b84c3afaf88b35678944cdb4d
2021-01-01 18:27:06 +01:00
Daimona Eaytoy fad9a11f7a Add a TextExtractor service
This is an important step towards removing the AbuseFilter class. Note:
proposals for the name of the new service are welcome.

Change-Id: Ib4632173f728b1bdafadef96e01645a833bfceaa
2021-01-01 18:25:32 +01:00
Daimona Eaytoy e381d1995b Partly reorg integration tests
Move to 'integration' all tests that are meant to stay there. Move
SaveTest outside because, while we might want to finalize it as an
integration test, some parts can still be moved to a unit test.

Change-Id: Id4b6deaac6875fdd85eebbebf0c5fb952d1fbb06
2021-01-01 15:54:52 +00:00
Matěj Suchánek d76affb1db Move ChangeTags stuff to separate namespace
Change-Id: I6d7bed0e62f001f82c00a3528cc0018388c9c70e
2020-11-27 15:13:34 +00:00
Matěj Suchánek 872b6118f4 Introduce ChangeTagValidator service
Just moving code around. Without a unit test because DI
coverage of change tags in core isn't available yet.

Change-Id: Iac861e1e24dae13581b8d9173357a1d6c94be88a
2020-11-27 15:11:48 +01:00
Daimona Eaytoy 9595bd9da5 Introduce a service for saving filters
Change-Id: I6b7d16ad7ea1124989ed67c74413979cfd0275c4
2020-11-20 22:33:21 +01:00
Daimona Eaytoy 1bcfdc3b13 Introduce a FilterValidator
This moves a lot of things away from the AbuseFilter class. There's a
nasty static dependency on ChangeTags, but it's very limited anyway, and
it's going to be fixed once T245964 is resolved.

Change-Id: Ia7df4b4d3289c2722323f59ceecf3fdd38277785
2020-11-18 01:41:31 +00:00