Commit graph

40 commits

Author SHA1 Message Date
Daimona Eaytoy d5bb976f51 Fix logging for parser exceptions
This was likely a rebase artefact or something: the 'implode' was meant
to be called with two parameters as usual. Currently, the parameters are
simply concatenated which makes the logs quite hard to read.

Change-Id: I84f9a7cb05e210f60a791d513dfb5b74fa7dfb8a
2022-03-07 13:32:54 +01:00
Daimona Eaytoy 2f5a587b1d Normalize logged parser error messages
Change-Id: I31cf73533a46ab5e452c2870fccb8603bb54d3df
2022-02-26 12:57:42 +01:00
Huji 52827acbab Make rmspecials preserve whitespace
The existing filters on WMF wikis has been changes such that calls
to rmspecials() are now rmspecials(rmwhitespace()) to ensure no change
is made in behaviour. Filter admins can change this back if filter is
not meant to trigger when part of the input is contains spaces.

Bug: T263024
Change-Id: Idde09b50fb8eda357afbedc1199a5483fa8217c1
2022-02-06 06:07:46 +00:00
jenkins-bot 72d03778d0 Merge "Refactor ParserStatus" 2021-09-24 09:34:20 +00:00
Daimona Eaytoy b2dc2c4dd8 Refactor ParserStatus
ParserStatus is now more lightweight, and doesn't know about "result"
and "from cache". Instead, it has an isValid() method which is merely a
shorthand for checking whether getException() is null.

Introduce a child class, RuleCheckerStatus, which knows about result and
cache and can be (un)serialized.

This removes the ambiguity of the $result field, and helps the
transition to a new RuleChecker class.

Change-Id: I0dac7ab4febbfdabe72596631db630411d967ab5
2021-09-17 11:25:54 +00:00
jenkins-bot 0ba45db169 Merge "Remove various AtEase and error_reporting" 2021-09-16 15:29:36 +00:00
Timo Tijhof 3f33e08bac Remove various AtEase and error_reporting
Something somewhere is leaving error_reporting in a dirty state
causing AbuseFilter's ConsequencesExecutorTest case to fail for
the core change Ic9fee6cdd88001025.

Per T253461, we're meant to eventually remove this anyway, so might
as well remove it in areas that are known to get it wrong somehow.

Change-Id: I2a665f09a357f2f2cc258d8c4011d49a7ab9c13b
2021-09-16 02:59:37 +00:00
Daimona Eaytoy 7c26c4b8d5 More cleanup for parser-related classes
Change-Id: I6a2bbf519e1d5c6fe2778f69624bd80b9ea1ef86
2021-09-10 12:50:20 +00:00
Daimona Eaytoy a722dfe1a4 Rename ParserFactory -> RuleCheckerFactory
The old parser now has the correct name "Evaluator", so the
ParserFactory name was outdated. Additionally, the plan is to create a
new RuleChecker class, acting as a facade for the different
parsing-related stages (lexer, parser, evaluator, etc.), which is what
most if not all callers should use. The RuleCheckerFactory still returns
a FilterEvaluator for now.
Also, "Parser" is a specific term defining *how* things happen
internally, whereas "RuleChecker" describes *what* callers should expect
from the new class.

Change-Id: I25b47a162d933c1e385175aae715ca38872b1442
2021-09-08 21:59:34 +02:00
Daimona Eaytoy 357ddd498c Clean up / simplify parser-related classes
Remove unnecessary setters, injecting everything in the constructor.
These were leftovers from before the introduction of ParserFactory.
Remove public access to the conds used, include the information inside
the returned ParserStatus instead, and consequently simplify callers.

Change-Id: I0a30e044877c6c858af3ff73f819d5ec7c4cc769
2021-09-08 13:41:52 +02:00
Daimona Eaytoy f8e9ac7e2a Rename AbuseFilterCachingParser -> FilterEvaluator
It's an evaluator, not a parser.

Change-Id: Ib6d33e8423ea72709cf5a33f4397ba33e352ea80
2021-09-08 13:40:47 +02:00
libraryupgrader 2a4860e322 build: Updating mediawiki/mediawiki-phan-config to 0.11.0
Change-Id: I097d051e3c30e61d74a8e329b6110b219c72ec1a
2021-09-07 19:30:42 -07:00
Daimona Eaytoy 6684ea6450 Remove AFPTransitionBase
Also cleanup the mPos hack in the CachingParser.

Change-Id: Ib5693802a3ceb80cb736880ed65e27340abef689
2021-09-06 19:33:48 +00:00
Sorawee Porncharoenwase 320e3d696f Add a static analyzer for the filter language
This commit adds a class AFPSyntaxChecker which can statically analyze
a filter code to detect the following errors:

- unbound variables (which comes in two modes: conservative and liberal,
  default to conservative)
- unused variables (disabled by default for compatibilty)
- assignment on built-in identifiers
- function application's arity mismatch
- function application's invalid function name
- non-string literal in the first argument of set / set_var

The existing parser and evaluator are modified as follows:

- The new (caching) evaluator no longer needs to perform variable
  hoisting at runtime.
  - Note that for array assignment, this changes the semantics.
- The new parser is more lenient, reducing parsing errors.
  The static analyzer will catch these errors instead, allowing us
  to give a much better error message and reduces the complexity of
  the parser.
  * The parser now allows function name to be any identifier.
  * The parser now allows arity mismatch to occur.
  * The parser now allows the first argument of set to be any expression.

Concretely, obvious changes that users will see are:

1. a := [1]; false & (a[] := 2); a[0] === 1

   would evaluate to true, while it used to evaluate to the undefined value
   due to hoisting

2. f(1)

   will now error with 'f is not a valid function' as opposed to
   'Unexpected "T_BRACE"'

3. length

   will now error with 'Illegal use of built-in identifier "length"'
   as opposed to 'Expected a ('

Appendix: conservative and liberal mode

The conservative mode is completely compatible with the current evaluator.
That is,

false & (a := 1); a

will not deem `a` as unbound, though this is actually undesirable because
`a` would then be bound to the troublesome undefined value.

The liberal mode rejects the above pattern by deeming `a` as unbound.
However, it also rejects

true & (a := 1); a

even though (a := 1) is always executed. Since there are several filters
in Wikimedia projects that rely on this behavior, we default the mode
to conservative for now.

Note that even the liberal mode doesn't really respect lexical scope
appeared in some other programming languages (see also T234690).
For instance:

(if true then (a := 1) else (a := 2) end); a

would be accepted by the liberal checker, even though under lexical scope,
`a` would be unbound. However, it is unlikely that lexical scope
will be suitable for the filter language, as most filters in
Wikimedia projects that have user-defined variable do violate lexical scope.

Bug: T260903
Bug: T238709
Bug: T237610
Bug: T234690
Bug: T231536
Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2
2021-08-31 03:28:24 +02:00
Daimona Eaytoy 704364a5e7 Move parser exceptions to specific namespace and rename them
Create a dedicated "Exception" sub-namespace and remove the "AFP"
prefix, a leftover from the pre-namespace era.

Change-Id: I7e5fded9316d8b7d1628bc1a6ba8b1879ac901e1
2021-08-29 23:38:31 +00:00
jenkins-bot 9b93b0256a Merge "Avoid passing invalid offset to mb_strpos" 2021-08-18 18:45:12 +00:00
libraryupgrader 5377ebe819 build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)

Change-Id: I2b382f3bb236fb44eb24c6a257b13b8fd886541c
2021-07-21 18:51:18 +00:00
Daimona Eaytoy 069fa064f5 Avoid passing invalid offset to mb_strpos
Bug: T285978
Change-Id: I3d100fd05f34fe3b01ecbbce5361badc613f9406
2021-07-02 14:07:46 +00:00
Daimona Eaytoy 57f11631ba Pass a valid regexp to preg_match in checkRegexMatchesEmpty
Bug: T283966
Change-Id: I99688aa8f3e62e410392a9142df56b1a3c708987
2021-05-29 11:38:07 +00:00
Umherirrender 1fa7a83f60 Use static closures where safe to use
Created by I25a17fb22b6b669e817317a0f45051ae9c608208

Change-Id: I533690311ca559685de8a4bf123348c9bcfa5931
2021-04-30 20:55:35 +02:00
Daimona Eaytoy f8438a4647 Remove the old parser
All methods were moved to the new parser. Tests and other pieces were
adjusted to expect just a single parser. There are still some TODOs
(remove AFPTransitionBase, remove $this->mCur), but these are left for
another commit.

Note that the new parser was not renamed: this is because the names are
wrong anyway (CachingParser is more of an Evaluator than a Parser, and
AFPTreeParser is the real parser, and should be renamed as well).

NOTE to reviewers: this patch looks quite big, but if you diff the old
parser with the new version of the CachingParser, you'll notice that the
diff is actually small, since everything was basically copied verbatim.

Bug: T239990
Change-Id: Ie914ef64c70503a201b4d2dec698ca2fa8e69b10
2021-04-09 13:23:07 +00:00
Daimona Eaytoy 2bb5c3c7b5 Align arg counting between the parsers
1 - Change the structure of if/elseif for readability
2 - In the old parser, if there's an empty argument, never add it (the
new parser was already doing that).

Bug: T156095
Bug: T156096
Change-Id: I4237b1a0ba01e7ce04dcc945f7daf34612fcf07d
2021-02-20 14:33:56 +00:00
Daimona Eaytoy e64049c30b Create dedicated types of parser exceptions
Introduce a clear distinction between internal exceptions and
user-visible exceptions, leaving AFPException as base abstract class.

Later, it should be possible to narrow some types around, e.g. in
ParserStatus (that might work with user-visible exceptions only).

Also a future TODO is putting all the exceptions in their own namespace
(probably ...\Parser\Exception).

Change-Id: I4e33a45117f0a3e73af03cc1e3f2734beaf2b5e1
2021-02-12 13:56:02 +00:00
Matěj Suchánek a51b9bf903 Serialize all data for edit stash
Thanks to this, we will be able to provide more information
to consequences and watchers, which will open door for new
features and possibly cleaner code.

Change-Id: I7135509823ea84b2a2923d2c1831ce293b98a9f9
2021-02-11 15:09:50 +01:00
jenkins-bot 27c0130d53 Merge "Skip regexp validation if the regex is (partly) unknown" 2021-02-06 21:50:35 +00:00
Daimona Eaytoy 4dbde4dcf0 Use a different message prefix for parser warnings
The abusefilter-warning prefix is reserved for filter warnings. Pointed
out by Matěj.

Change-Id: I169e4c3d29b08c7f5af2136a683fc4427f8e93f5
2021-02-06 15:42:33 +00:00
Daimona Eaytoy 1893120748 Fix doc of AbuseFilterParser::evaluateExpression
It was changed to use AFPData::toNative, so it no longer returns a
string. Instead, it can return any PHP native type.

Change-Id: I92eba03a5fa1149860634a97318b5b15807eb5a5
2021-02-05 16:23:37 +01:00
Daimona Eaytoy 28bd23f38d Skip regexp validation if the regex is (partly) unknown
Bug: T273809
Change-Id: Ib8ab29ad69088baf5b826d9cdada0ded29a58871
2021-02-04 15:16:22 +00:00
Daimona Eaytoy 5c43c0ab35 Allow single IPs in ip_in_range
Also add a bunch of tests for this function.

REMINDER: Change the docs on mw.org when this will be merged.

Bug: T218074
Depends-On: I155024341e8e6b13240e37b30c31b95dc83a47e0
Change-Id: I979e45110bc0e76b499679184993085062ffcac5
2021-01-26 04:37:51 +00:00
Daimona Eaytoy a9722868ab Improve coverage of parser-related classes
Change-Id: I229c528505f0208b34f37d8c969450731e5a08a3
2021-01-15 03:16:48 +00:00
Daimona Eaytoy 6e27a9ddb3 Cleanup variables-related classes
Change-Id: I20a7fe1a40255043ed0d125dee61ea6052dda69c
2021-01-02 18:19:38 +01:00
Daimona Eaytoy 762d71c51d Create a dedicated namespace for variables-related classes
Some cleanup is left for later to keep the diff easier to read.

Change-Id: Ife445b5e47e707ab77ec867ac3b005866aa74ef2
2021-01-02 18:16:48 +01:00
Daimona Eaytoy d3b330b6d4 Create a VariablesManager service
This makes VariableHolder a true value object, and introduces a
stateless service, VariableManager, to operate on it.

Note, in theory, this new service is still cyclically coupled with
LazyVariableComputed. However, it's now two stateless service being
coupled, not two smart/god value objects, so we've still earned
something. For now, the dependency is hidden by using a callback. Some
alternatives for that are mentioned in a code comment.

Bug: T261069
Change-Id: I2f2c84c8e91472ba36084a8bbb4a923f6e04354b
2021-01-02 17:15:31 +00:00
Daimona Eaytoy aafd3bcfcd Inject the condition limit into AbuseFilterParser
Change-Id: I487ba25ca3f3ac4b84c3afaf88b35678944cdb4d
2021-01-01 18:27:06 +01:00
Daimona Eaytoy 5d4b2fde27 Avoid 'finally' clause in AbuseFilterParser::parseDetailed
Bug: T270514
Change-Id: I1e3e6675ec8c3bfd435797cb044b85b3d2a34450
2020-12-19 11:17:58 +00:00
Daimona Eaytoy 7c1d1c6d7d Return warnings from the parser, add warning for catch-all regexps
This commit introduces some boilerplate for emitting warnings from the
AbuseFilter parser, and also code for showing these warnings in the ace
editor. Adding new warnings should be as simple as appending to
AbuseFilterParser::warnings (and adding the relevant i18n).

Bug: T264768
Bug: T269770
Change-Id: Ic11021b379f997a89f59c8c0572338d957e089a6
2020-12-18 18:22:41 +01:00
jenkins-bot 6f848578ea Merge "Allow the parsers to return extra info" 2020-12-11 16:35:25 +00:00
Daimona Eaytoy 3e0c30ff92 Allow the parsers to return extra info
This is achieved by creating a new ParserStatus class. Aside from the
result of parse(), it contains whether the cache was warm. This can be
used to differentiate profiling data as part of T231112.

Another use case is returning non-fatal warnings (T269770).

Change-Id: Ifcbda861ce1a44bbe9bffba5b83cd9ef338a8dba
2020-12-11 15:03:23 +00:00
libraryupgrader 281eec8e4d build: Updating mediawiki/mediawiki-phan-config to 0.10.5
Change-Id: Ie3fcfdf733885aac2ef0ee07cc1a8d4f3fedb7d7
2020-12-10 18:28:54 +00:00
Daimona Eaytoy da1c71ec4c Move parser classes to a dedicated namespace
Names were kept for now.

Change-Id: Ib2eb5d7b523a64f2a0f72fdcdde2043a76cc9a37
2020-12-09 01:30:20 +00:00