wikimedia/mediawiki-extensions-AbuseFilter

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/AbuseFilter.git synced 2024-12-02 17:46:30 +00:00

Author	SHA1	Message	Date
Huji	52827acbab	Make rmspecials preserve whitespace The existing filters on WMF wikis have been changed such that calls to rmspecials() are now rmspecials(rmwhitespace()) to ensure no change is made in behaviour. Filter admins can change this back if a filter is not meant to trigger when part of the input contains spaces. Bug: T263024 Change-Id: Idde09b50fb8eda357afbedc1199a5483fa8217c1	2022-02-06 06:07:46 +00:00
jenkins-bot	72d03778d0	Merge "Refactor ParserStatus"	2021-09-24 09:34:20 +00:00
Daimona Eaytoy	b2dc2c4dd8	Refactor ParserStatus ParserStatus is now more lightweight, and doesn't know about "result" and "from cache". Instead, it has an isValid() method which is merely a shorthand for checking whether getException() is null. Introduce a child class, RuleCheckerStatus, which knows about result and cache and can be (un)serialized. This removes the ambiguity of the $result field, and helps the transition to a new RuleChecker class. Change-Id: I0dac7ab4febbfdabe72596631db630411d967ab5	2021-09-17 11:25:54 +00:00
jenkins-bot	0ba45db169	Merge "Remove various AtEase and error_reporting"	2021-09-16 15:29:36 +00:00
Timo Tijhof	3f33e08bac	Remove various AtEase and error_reporting Something somewhere is leaving error_reporting in a dirty state causing AbuseFilter's ConsequencesExecutorTest case to fail for the core change Ic9fee6cdd88001025. Per T253461, we're meant to eventually remove this anyway, so might as well remove it in areas that are known to get it wrong somehow. Change-Id: I2a665f09a357f2f2cc258d8c4011d49a7ab9c13b	2021-09-16 02:59:37 +00:00
Daimona Eaytoy	7c26c4b8d5	More cleanup for parser-related classes Change-Id: I6a2bbf519e1d5c6fe2778f69624bd80b9ea1ef86	2021-09-10 12:50:20 +00:00
Daimona Eaytoy	a722dfe1a4	Rename ParserFactory -> RuleCheckerFactory The old parser now has the correct name "Evaluator", so the ParserFactory name was outdated. Additionally, the plan is to create a new RuleChecker class, acting as a facade for the different parsing-related stages (lexer, parser, evaluator, etc.), which is what most if not all callers should use. The RuleCheckerFactory still returns a FilterEvaluator for now. Also, "Parser" is a specific term defining how things happen internally, whereas "RuleChecker" describes what callers should expect from the new class. Change-Id: I25b47a162d933c1e385175aae715ca38872b1442	2021-09-08 21:59:34 +02:00
Daimona Eaytoy	357ddd498c	Clean up / simplify parser-related classes Remove unnecessary setters, injecting everything in the constructor. These were leftovers from before the introduction of ParserFactory. Remove public access to the conds used, include the information inside the returned ParserStatus instead, and consequently simplify callers. Change-Id: I0a30e044877c6c858af3ff73f819d5ec7c4cc769	2021-09-08 13:41:52 +02:00
Daimona Eaytoy	f8e9ac7e2a	Rename AbuseFilterCachingParser -> FilterEvaluator It's an evaluator, not a parser. Change-Id: Ib6d33e8423ea72709cf5a33f4397ba33e352ea80	2021-09-08 13:40:47 +02:00
libraryupgrader	2a4860e322	build: Updating mediawiki/mediawiki-phan-config to 0.11.0 Change-Id: I097d051e3c30e61d74a8e329b6110b219c72ec1a	2021-09-07 19:30:42 -07:00
Daimona Eaytoy	6684ea6450	Remove AFPTransitionBase Also cleanup the mPos hack in the CachingParser. Change-Id: Ib5693802a3ceb80cb736880ed65e27340abef689	2021-09-06 19:33:48 +00:00
Sorawee Porncharoenwase	320e3d696f	Add a static analyzer for the filter language This commit adds a class AFPSyntaxChecker which can statically analyze filter code to detect the following errors: - unbound variables (which come in two modes: conservative and liberal, defaulting to conservative) - unused variables (disabled by default for compatibility) - assignment on built-in identifiers - function application's arity mismatch - function application's invalid function name - non-string literal in the first argument of set / set_var The existing parser and evaluator are modified as follows: - The new (caching) evaluator no longer needs to perform variable hoisting at runtime. - Note that for array assignment, this changes the semantics. - The new parser is more lenient, reducing parsing errors. The static analyzer will catch these errors instead, allowing us to give a much better error message and reducing the complexity of the parser. * The parser now allows function name to be any identifier. * The parser now allows arity mismatch to occur. * The parser now allows the first argument of set to be any expression. Concretely, obvious changes that users will see are: 1. a := [1]; false & (a[] := 2); a[0] === 1 would evaluate to true, while it used to evaluate to the undefined value due to hoisting 2. f(1) will now error with 'f is not a valid function' as opposed to 'Unexpected "T_BRACE"' 3. length will now error with 'Illegal use of built-in identifier "length"' as opposed to 'Expected a (' Appendix: conservative and liberal mode The conservative mode is completely compatible with the current evaluator. That is, false & (a := 1); a will not deem `a` as unbound, though this is actually undesirable because `a` would then be bound to the troublesome undefined value. The liberal mode rejects the above pattern by deeming `a` as unbound. However, it also rejects true & (a := 1); a even though (a := 1) is always executed. Since there are several filters in Wikimedia projects that rely on this behavior, we default the mode to conservative for now. Note that even the liberal mode doesn't really respect lexical scope as it appears in some other programming languages (see also T234690). For instance: (if true then (a := 1) else (a := 2) end); a would be accepted by the liberal checker, even though under lexical scope, `a` would be unbound. However, it is unlikely that lexical scope will be suitable for the filter language, as most filters in Wikimedia projects that have user-defined variables do violate lexical scope. Bug: T260903 Bug: T238709 Bug: T237610 Bug: T234690 Bug: T231536 Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2	2021-08-31 03:28:24 +02:00
Daimona Eaytoy	704364a5e7	Move parser exceptions to specific namespace and rename them Create a dedicated "Exception" sub-namespace and remove the "AFP" prefix, a leftover from the pre-namespace era. Change-Id: I7e5fded9316d8b7d1628bc1a6ba8b1879ac901e1	2021-08-29 23:38:31 +00:00
jenkins-bot	9b93b0256a	Merge "Avoid passing invalid offset to mb_strpos"	2021-08-18 18:45:12 +00:00
libraryupgrader	5377ebe819	build: Updating dependencies composer: * mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0 npm: * postcss: 7.0.35 → 7.0.36 * https://npmjs.com/advisories/1693 (CVE-2021-23368) Change-Id: I2b382f3bb236fb44eb24c6a257b13b8fd886541c	2021-07-21 18:51:18 +00:00
Daimona Eaytoy	069fa064f5	Avoid passing invalid offset to mb_strpos Bug: T285978 Change-Id: I3d100fd05f34fe3b01ecbbce5361badc613f9406	2021-07-02 14:07:46 +00:00
Daimona Eaytoy	57f11631ba	Pass a valid regexp to preg_match in checkRegexMatchesEmpty Bug: T283966 Change-Id: I99688aa8f3e62e410392a9142df56b1a3c708987	2021-05-29 11:38:07 +00:00
Umherirrender	1fa7a83f60	Use static closures where safe to use Created by I25a17fb22b6b669e817317a0f45051ae9c608208 Change-Id: I533690311ca559685de8a4bf123348c9bcfa5931	2021-04-30 20:55:35 +02:00
Daimona Eaytoy	f8438a4647	Remove the old parser All methods were moved to the new parser. Tests and other pieces were adjusted to expect just a single parser. There are still some TODOs (remove AFPTransitionBase, remove $this->mCur), but these are left for another commit. Note that the new parser was not renamed: this is because the names are wrong anyway (CachingParser is more of an Evaluator than a Parser, and AFPTreeParser is the real parser, and should be renamed as well). NOTE to reviewers: this patch looks quite big, but if you diff the old parser with the new version of the CachingParser, you'll notice that the diff is actually small, since everything was basically copied verbatim. Bug: T239990 Change-Id: Ie914ef64c70503a201b4d2dec698ca2fa8e69b10	2021-04-09 13:23:07 +00:00
Daimona Eaytoy	2bb5c3c7b5	Align arg counting between the parsers 1 - Change the structure of if/elseif for readability 2 - In the old parser, if there's an empty argument, never add it (the new parser was already doing that). Bug: T156095 Bug: T156096 Change-Id: I4237b1a0ba01e7ce04dcc945f7daf34612fcf07d	2021-02-20 14:33:56 +00:00
Daimona Eaytoy	e64049c30b	Create dedicated types of parser exceptions Introduce a clear distinction between internal exceptions and user-visible exceptions, leaving AFPException as base abstract class. Later, it should be possible to narrow some types around, e.g. in ParserStatus (that might work with user-visible exceptions only). Also a future TODO is putting all the exceptions in their own namespace (probably ...\Parser\Exception). Change-Id: I4e33a45117f0a3e73af03cc1e3f2734beaf2b5e1	2021-02-12 13:56:02 +00:00
Matěj Suchánek	a51b9bf903	Serialize all data for edit stash Thanks to this, we will be able to provide more information to consequences and watchers, which will open door for new features and possibly cleaner code. Change-Id: I7135509823ea84b2a2923d2c1831ce293b98a9f9	2021-02-11 15:09:50 +01:00
jenkins-bot	27c0130d53	Merge "Skip regexp validation if the regex is (partly) unknown"	2021-02-06 21:50:35 +00:00
Daimona Eaytoy	4dbde4dcf0	Use a different message prefix for parser warnings The abusefilter-warning prefix is reserved for filter warnings. Pointed out by Matěj. Change-Id: I169e4c3d29b08c7f5af2136a683fc4427f8e93f5	2021-02-06 15:42:33 +00:00
Daimona Eaytoy	1893120748	Fix doc of AbuseFilterParser::evaluateExpression It was changed to use AFPData::toNative, so it no longer returns a string. Instead, it can return any PHP native type. Change-Id: I92eba03a5fa1149860634a97318b5b15807eb5a5	2021-02-05 16:23:37 +01:00
Daimona Eaytoy	28bd23f38d	Skip regexp validation if the regex is (partly) unknown Bug: T273809 Change-Id: Ib8ab29ad69088baf5b826d9cdada0ded29a58871	2021-02-04 15:16:22 +00:00
Daimona Eaytoy	5c43c0ab35	Allow single IPs in ip_in_range Also add a bunch of tests for this function. REMINDER: Change the docs on mw.org when this will be merged. Bug: T218074 Depends-On: I155024341e8e6b13240e37b30c31b95dc83a47e0 Change-Id: I979e45110bc0e76b499679184993085062ffcac5	2021-01-26 04:37:51 +00:00
Daimona Eaytoy	a9722868ab	Improve coverage of parser-related classes Change-Id: I229c528505f0208b34f37d8c969450731e5a08a3	2021-01-15 03:16:48 +00:00
Daimona Eaytoy	6e27a9ddb3	Cleanup variables-related classes Change-Id: I20a7fe1a40255043ed0d125dee61ea6052dda69c	2021-01-02 18:19:38 +01:00
Daimona Eaytoy	762d71c51d	Create a dedicated namespace for variables-related classes Some cleanup is left for later to keep the diff easier to read. Change-Id: Ife445b5e47e707ab77ec867ac3b005866aa74ef2	2021-01-02 18:16:48 +01:00
Daimona Eaytoy	d3b330b6d4	Create a VariablesManager service This makes VariableHolder a true value object, and introduces a stateless service, VariablesManager, to operate on it. Note, in theory, this new service is still cyclically coupled with LazyVariableComputed. However, it's now two stateless services being coupled, not two smart/god value objects, so we've still earned something. For now, the dependency is hidden by using a callback. Some alternatives for that are mentioned in a code comment. Bug: T261069 Change-Id: I2f2c84c8e91472ba36084a8bbb4a923f6e04354b	2021-01-02 17:15:31 +00:00
Daimona Eaytoy	aafd3bcfcd	Inject the condition limit into AbuseFilterParser Change-Id: I487ba25ca3f3ac4b84c3afaf88b35678944cdb4d	2021-01-01 18:27:06 +01:00
Daimona Eaytoy	5d4b2fde27	Avoid 'finally' clause in AbuseFilterParser::parseDetailed Bug: T270514 Change-Id: I1e3e6675ec8c3bfd435797cb044b85b3d2a34450	2020-12-19 11:17:58 +00:00
Daimona Eaytoy	7c1d1c6d7d	Return warnings from the parser, add warning for catch-all regexps This commit introduces some boilerplate for emitting warnings from the AbuseFilter parser, and also code for showing these warnings in the ace editor. Adding new warnings should be as simple as appending to AbuseFilterParser::warnings (and adding the relevant i18n). Bug: T264768 Bug: T269770 Change-Id: Ic11021b379f997a89f59c8c0572338d957e089a6	2020-12-18 18:22:41 +01:00
jenkins-bot	6f848578ea	Merge "Allow the parsers to return extra info"	2020-12-11 16:35:25 +00:00
Daimona Eaytoy	3e0c30ff92	Allow the parsers to return extra info This is achieved by creating a new ParserStatus class. Aside from the result of parse(), it contains whether the cache was warm. This can be used to differentiate profiling data as part of T231112. Another use case is returning non-fatal warnings (T269770). Change-Id: Ifcbda861ce1a44bbe9bffba5b83cd9ef338a8dba	2020-12-11 15:03:23 +00:00
libraryupgrader	281eec8e4d	build: Updating mediawiki/mediawiki-phan-config to 0.10.5 Change-Id: Ie3fcfdf733885aac2ef0ee07cc1a8d4f3fedb7d7	2020-12-10 18:28:54 +00:00
Daimona Eaytoy	da1c71ec4c	Move parser classes to a dedicated namespace Names were kept for now. Change-Id: Ib2eb5d7b523a64f2a0f72fdcdde2043a76cc9a37	2020-12-09 01:30:20 +00:00
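
Note on commit 52827acbab: the behaviour change it describes for rmspecials() can be illustrated with a small Python sketch. This is not the extension's PHP code. The three function names below mirror the filter-language functions, but their bodies are assumptions: "special" is approximated here as "neither alphanumeric nor whitespace" using Python's str.isalnum() and str.isspace(), which may differ from the character classes AbuseFilter actually uses.

```python
def rmwhitespace(text: str) -> str:
    """Drop every whitespace character (approximation of the filter function)."""
    return "".join(ch for ch in text if not ch.isspace())


def rmspecials_old(text: str) -> str:
    """Assumed pre-change behaviour: keep only letters and digits."""
    return "".join(ch for ch in text if ch.isalnum())


def rmspecials_new(text: str) -> str:
    """Assumed post-change behaviour: keep letters, digits, and whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


sample = "f o-o!"
print(repr(rmspecials_old(sample)))                # spaces removed too
print(repr(rmspecials_new(sample)))                # spaces preserved
print(repr(rmspecials_new(rmwhitespace(sample))))  # same result as the old call
```

Under these assumptions, wrapping the argument in rmwhitespace() reproduces the old result, which is why existing filters were rewritten to rmspecials(rmwhitespace()).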

38 commits
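
Note on commit 320e3d696f: the difference between the conservative and liberal unbound-variable modes can be modelled with a toy checker. The sketch below is not AFPSyntaxChecker and makes no claim about its algorithm. The tuple-based AST, the node names ("lit", "var", "assign", "and", "seq") and the function unbound_variables are all hypothetical, and only the `&` cases from the commit message are covered (the if/else example is not modelled).

```python
def _walk(node, bound, liberal, errors):
    """Visit one node, recording reads of names that are not yet bound."""
    kind = node[0]
    if kind == "lit":
        return
    if kind == "var":
        if node[1] not in bound:
            errors.append(node[1])
    elif kind == "assign":
        _walk(node[2], bound, liberal, errors)
        bound.add(node[1])
    elif kind == "and":
        _walk(node[1], bound, liberal, errors)
        if liberal:
            # The right operand may be skipped by short-circuiting, so
            # bindings made there are checked against a copy and discarded.
            _walk(node[2], set(bound), liberal, errors)
        else:
            # Conservative: bindings in the right operand still count.
            _walk(node[2], bound, liberal, errors)
    elif kind == "seq":
        for child in node[1]:
            _walk(child, bound, liberal, errors)
    else:
        raise ValueError(f"unknown node kind: {kind!r}")


def unbound_variables(program, liberal=False):
    """Return the names read while unbound, in order of occurrence."""
    errors = []
    _walk(program, set(), liberal, errors)
    return errors


# false & (a := 1); a
skipped = ("seq", [("and", ("lit", False), ("assign", "a", ("lit", 1))), ("var", "a")])
# true & (a := 1); a
executed = ("seq", [("and", ("lit", True), ("assign", "a", ("lit", 1))), ("var", "a")])

print(unbound_variables(skipped))                  # conservative mode
print(unbound_variables(skipped, liberal=True))    # liberal mode
print(unbound_variables(executed, liberal=True))   # liberal mode
```

In this model the conservative mode accepts both programs, while the liberal mode reports `a` as unbound in both, including the second one where the assignment always runs. That matches the trade-off the commit message describes.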