67 commits

Author SHA1 Message Date
Anne Haunime 69ea21dc99 Support named capturing groups in get_matches()
AF rules don't support associative arrays, so the named capturing groups are provided in the array only by their numeric keys.

Bug: T374294
Change-Id: I53b39917e6677f3a5b8f68bcf0faebf48668ea27
2024-09-07 11:25:48 +00:00
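As a rough illustration of the behaviour described above (named groups exposed only under numeric keys), here is a minimal Python sketch. The helper name `get_matches_sketch` is hypothetical, Python's `(?P<name>...)` syntax stands in for the PCRE syntax, and returning an empty list on no match is a simplification of this sketch, not a statement about the real function.

```python
import re


def get_matches_sketch(pattern: str, text: str) -> list:
    """Return [whole match, group 1, group 2, ...] using numeric positions only."""
    m = re.search(pattern, text)
    if m is None:
        # Simplification for this sketch: no match yields an empty list.
        return []
    # m.groups() lists every capturing group in order, named or not,
    # so the group names are dropped and only the positions remain.
    return [m.group(0), *m.groups()]


matches = get_matches_sketch(r"(?P<year>\d{4})-(\d{2})", "date: 2024-09")
print(matches)
```

The named group `year` is still reachable, but only as index 1 of the returned list.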
Bartosz Dziewoński 237d54d545 Replace gettype() with get_debug_type() in exception messages etc.
get_debug_type() does the same thing but better (spelling type names
in the same way as in type declarations, and including names of
object classes and resource types). It was added in PHP 8, but the
symfony/polyfill-php80 package provides it while we still support 7.4.

Also remove uses of get_class() where the new method already provides
the same information.

For reference:
https://www.php.net/manual/en/function.get-debug-type.php
https://www.php.net/manual/en/function.gettype.php

Change-Id: I5e65a0759df7fa0c10bfa26ebc3cda436630f456
2024-08-12 23:05:16 +02:00
Matěj Suchánek bf180e0490 Simplify FilterEvaluator::getUsedVars using ::checkSyntax
Alternative approach to fixing the regression proposed by
Daimona in I78d3a2cd7bada962d7ef9b0f2c39d898bf8987ce.

Bug: T368203
Change-Id: I637367c3b3850f7988d890379fef7f4753159953
2024-07-05 11:32:09 +02:00
Daimona Eaytoy 99bb44beb4 Miscellaneous minor fixes
- Rename `$hidden` to `$privacyLevel` in Flags::__construct for
  consistency with other places.
- Rename `shouldProtectFilter` and simplify its return value to always
  be an array, since that's how it's currently used. Rename a variable
  that is assigned the return value of this method.
- Add a missing message key to a list of dynamic message keys.
- Rename a property from 'hidden' to 'privacy' in FilterStoreTest for
  consistency. Add a test for removing the protected flag.
- Update old comment referencing `filterHidden`; the method was removed
  in I40b8c8452d9df.
- Use ISQLPlatform::bitAnd() instead of manual SQL in
  AbuseFilterHistoryPager.
- Update mysterious reference to "formatRow" in SpecialAbuseLog.
- Update other references to the very same method in two other places,
  this time credited as "SpecialAbuseLog".
- Add type hints to a few methods; this not only helps with type safety,
  but it also allows PHPUnit to automatically use the proper type in
  mocks.

Change-Id: Ib0167d993b761271c1e5311808435a616b6576fe
2024-07-03 02:31:38 +02:00
Umherirrender c3af3157b4 Use namespaced classes
Changes to the use statements done automatically via script
Addition of missing use statement done manually

Change-Id: I48fcc02c61d423c9c5111ae545634fdc5c5cc710
2024-06-12 20:01:35 +02:00
STran f5d7b68908 Force full evaluation of filter in FilterEvaluator->getUsedVars()
In some cases, evaluation short-circuits when getting a list of
used variables resulting in an incomplete array of variables. This
subsequently causes issues when using those arrays for validation
checks (eg. if protected variables are used).

- Force full evaluation by setting `mAllowShort` to false

Bug: T364485
Change-Id: Idf2112d9ebf63846cde3ce9b8a8ade0ed909505d
2024-06-11 02:43:47 -07:00
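The short-circuit problem described in this commit can be sketched in Python with a toy expression tree. Everything here is an assumption for illustration: the tuple-based tree, the `evaluate` and `get_used_vars` helpers, and the `allow_short` flag (a stand-in for `mAllowShort`) are not the extension's actual code.

```python
def evaluate(node, env, allow_short, seen):
    """Evaluate a toy expression tree, recording each variable that is visited."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "var":
        seen.append(node[1])
        return env.get(node[1])
    if kind == "and":
        left = evaluate(node[1], env, allow_short, seen)
        if allow_short and not left:
            # Short circuit: the right operand is never visited,
            # so any variable it uses goes unrecorded.
            return False
        right = evaluate(node[2], env, allow_short, seen)
        return bool(left and right)
    raise ValueError(f"unknown node kind: {kind}")


def get_used_vars(tree, allow_short):
    """Collect the variables seen while evaluating the tree."""
    seen = []
    evaluate(tree, {}, allow_short, seen)
    return seen


# Toy equivalent of the filter `false & user_ip`.
tree = ("and", ("lit", False), ("var", "user_ip"))
print(get_used_vars(tree, allow_short=True))
print(get_used_vars(tree, allow_short=False))
```

With short-circuiting enabled the list misses `user_ip`, which is the kind of incomplete result that would defeat a later protected-variable check.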
STran bf28dbce0e Allow variables to be restricted by user right
Some exposed variables (eg. `user_ip`) used in filters are sensitive
and need to only be available to restricted groups of users.

Back-end changes:
- Add `AbuseFilterProtectedVariables` which defines what variables are
  protected by the new right `abusefilter-access-protected-vars`
- Add the concept of a `protected` variable, the use of which will
  denote the entire filter as protected via a flag on `af_hidden`

New UX features:
- Display changes to the protected status of filters on history and diff
  pages
- Check for protected variables and the right to see them in filter
  validation and don't allow a filter to be saved if it uses a variable
  that the user doesn't have access to
- Check for the right to view protected variables before allowing access
  and edits to existing filters that use them

Bug: T364465
Bug: T363906
Change-Id: I828bbb4015e87040f69a8e10c7888273c4f24dd3
2024-06-04 06:54:53 -07:00
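The save-time check described above might look roughly like the following Python sketch. The right name `abusefilter-access-protected-vars` and the variable `user_ip` come from the commit message; the function names and the shape of the data are hypothetical.

```python
# Stand-in for the AbuseFilterProtectedVariables setting (contents assumed).
PROTECTED_VARIABLES = {"user_ip"}
ACCESS_RIGHT = "abusefilter-access-protected-vars"


def filter_is_protected(used_vars):
    """A filter counts as protected as soon as it uses any protected variable."""
    return bool(PROTECTED_VARIABLES & set(used_vars))


def can_save_filter(used_vars, user_rights):
    """Refuse the save when the filter is protected and the user lacks the right."""
    if not filter_is_protected(used_vars):
        return True
    return ACCESS_RIGHT in user_rights


print(can_save_filter(["user_ip", "user_name"], {"abusefilter-modify"}))
print(can_save_filter(["user_ip"], {ACCESS_RIGHT}))
```

The same predicate could gate viewing and editing of existing filters, since the commit applies the right in both places.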
thiemowmde c9f8343173 Use modern str_starts_with() and [ ... ] syntax
Change-Id: I2f2182e1e0850d7ebf832b7b8e0630ce56aad88b
2024-04-11 14:07:43 +02:00
libraryupgrader a8c9fab2cc build: Updating mediawiki/mediawiki-codesniffer to 43.0.0
The following sniffs are failing and were disabled:
* MediaWiki.Commenting.FunctionComment.MissingDocumentationPublic

Change-Id: I6075c76d53a899aac56af027f9a956a6b9e6a667
2024-03-16 18:53:05 +00:00
Umherirrender bd84a6514c Use namespaced classes
This requires 1.42 for some new names

Changes to the use statements done automatically via script
Addition of missing use statements and changes to docs done manually

Change-Id: Ic1e2c9a0c891382744e4792bba1effece48e53f3
2023-12-10 23:03:12 +01:00
Matěj Suchánek 9beeca3752 Fix various typos and documentation issues
Change-Id: I1e9d297f665282d251343598e102e1d342488965
2023-09-04 12:55:17 +02:00
jenkins-bot 0c33716f5b Merge "Mark protected stuff in classes with no subclasses as private" 2023-06-23 18:35:48 +00:00
thiemowmde 9316a7d65f Mark some unused public class features as private
These are not used anywhere outside of these classes.

Change-Id: I0a0a5cf1e84133bae69b95da771c285ee27f926c
2023-06-23 12:32:38 +02:00
thiemowmde 24888bea15 Mark protected stuff in classes with no subclasses as private
Protected effectively means "public to subclasses" and should be
avoided for the same reasons as marking everything as public should
be avoided.

Change-Id: Iba674b486ce53fd1f94f70163d47824e969abb77
2023-06-23 12:28:06 +02:00
thiemowmde 7e6132d4d7 Remove bits of unused code across the codebase
Mostly found with the code inspection tools in PHPStorm.

Change-Id: I7f59dddca0aaab0ddd1093d52c07ec12efd20d6d
2023-06-14 19:41:00 +00:00
Daimona Eaytoy caee78c24d Replace deprecated MWException
These are all unchecked.

Bug: T328220
Change-Id: I8d2f098a8b634d4a226b40ddaef31f0303a0789f
2023-06-07 17:41:20 +02:00
Jean-Luc Hassec 6c500f8ea9 Clean up unused DEMPTY data type
Bug: T334640
Change-Id: Ie20d760b6e31a9dc97083d3fe4008fb31c990076
2023-04-13 05:27:38 +00:00
Brian Wolff c6d3e6638c Explicitly cast mod (%) operands to ints.
PHP does this automatically, however in PHP8 this causes an
E_DEPRECATED warning.

This fixes a phpunit test

Change-Id: Ie2b2dbf4a1c0ff500ba251ee43a37823432e3047
2022-10-03 08:30:45 -07:00
Timo Tijhof d2fc2ff8bb maintenance,includes: Clean up file headers
Follows-up Iaa1b4683c5c856.

* Match $IP pattern verbatim from most other WMF extensions.

* Improve descriptions a bit, and move/merge any meaningful
  information from file docblock into class docblock. The file blocks
  are visually ignored and identical in each file, and often out of
  date or duplicated when given text separately from the class block.

  See also similar changes in core:
  https://gerrit.wikimedia.org/r/q/message:ingroup+owner:Krinkle

* Use `@internal` instead of `@private` as per Stable interface
  policy.

Change-Id: I8bed9a625af003446c7e25f6b794931164767b5a
2022-09-29 17:56:49 +01:00
Umherirrender 4fca77068c Clean up line indent with mixed tabs and whitespaces
Change-Id: Icc418130ad34e5f169bfc51bb13b58a7806bd636
2022-07-31 16:34:07 +02:00
jenkins-bot 1a6985469b Merge "Inline/simplify smaller pieces of duplicate/complex PHP code" 2022-06-03 20:38:22 +00:00
Thiemo Kreuz bbded6231c Inline/simplify smaller pieces of duplicate/complex PHP code
Change-Id: I59d0f17b77c8c3d47bc532bdefd9d8c0883f180b
2022-06-03 21:04:38 +02:00
jenkins-bot bb94c0914c Merge "Add support for regex string replacements." 2022-05-31 14:54:33 +00:00
Daimona Eaytoy a46db47bd5 Fix validation for ip_in_ranges
We want to make sure that all parameters are valid regardless of whether
there's a match.

Also make the minimum number of parameters = 2, so it's easier to switch
between this function and ip_in_range.

Change-Id: I141558a7ef4533485e315b3d93ea9b64f0959db7
2022-05-21 15:39:21 +02:00
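A minimal Python sketch of the validation order this fix describes: every range is parsed before any matching happens, so a malformed range is reported even when an earlier range already matches. The function name is hypothetical, and only CIDR notation is handled here, which is an assumption of the sketch rather than a description of the formats the real function accepts.

```python
import ipaddress


def ip_in_ranges_validated(*args):
    """Return True if the IP (first argument) lies in any of the given ranges."""
    if len(args) < 2:
        # Same minimum as a single-range check: one IP plus one range.
        raise ValueError("expected an IP and at least one range")
    ip = ipaddress.ip_address(args[0])
    # Parse every range up front. A lazy generator inside any() would stop at
    # the first hit and leave later, possibly malformed, ranges unchecked.
    networks = [ipaddress.ip_network(r, strict=False) for r in args[1:]]
    return any(ip in net for net in networks)


print(ip_in_ranges_validated("10.0.0.5", "192.168.0.0/16", "10.0.0.0/8"))
```

Building the list first is the design point: validation no longer depends on where the match happens to occur.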
fossifer b1739a588f Add ip_in_ranges function
Added support for ip_in_ranges which allows multiple ranges to be
checked at the same time. If the IP is in any of the ranges, the
function returns true.

Bug: T305017
Change-Id: Ic75c87ecd4cacf47ce2ff1b04173405230ff81d0
2022-05-11 12:27:16 +08:00
Thiemo Kreuz a25e2c784a Fix capitalization of method calls across the codebase
Change-Id: Icbbad4858735c24611daee693c53af479c75d1fb
2022-04-26 17:42:34 +02:00
proc 1d1215bafb Add support for regex string replacements.
Bug: T285468
Change-Id: I25f8ad1b58cc10f4c6f6ef5ebab99fe58ec71b1e
2022-04-20 18:38:24 +01:00
Daimona Eaytoy d5bb976f51 Fix logging for parser exceptions
This was likely a rebase artefact or something: the 'implode' was meant
to be called with two parameters as usual. Currently, the parameters are
simply concatenated which makes the logs quite hard to read.

Change-Id: I84f9a7cb05e210f60a791d513dfb5b74fa7dfb8a
2022-03-07 13:32:54 +01:00
Daimona Eaytoy 2f5a587b1d Normalize logged parser error messages
Change-Id: I31cf73533a46ab5e452c2870fccb8603bb54d3df
2022-02-26 12:57:42 +01:00
Huji 52827acbab Make rmspecials preserve whitespace
The existing filters on WMF wikis have been changed so that calls
to rmspecials() are now rmspecials(rmwhitespace()), ensuring no change
in behaviour. Filter admins can change this back if a filter is
not meant to trigger when part of the input contains spaces.

Bug: T263024
Change-Id: Idde09b50fb8eda357afbedc1199a5483fa8217c1
2022-02-06 06:07:46 +00:00
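The behaviour change can be approximated in Python as follows. These are stand-ins, not the extension's implementation: `str.isalnum()` is used as a rough proxy for "letter or digit", and the exact character classes of the real functions are not reproduced.

```python
def rmwhitespace(s: str) -> str:
    """Drop all whitespace characters."""
    return "".join(c for c in s if not c.isspace())


def rmspecials(s: str) -> str:
    """Drop special characters but, after this change, keep whitespace."""
    return "".join(c for c in s if c.isalnum() or c.isspace())


text = "f*o o!"
print(repr(rmspecials(text)))                # whitespace survives
print(repr(rmspecials(rmwhitespace(text))))  # previous behaviour via composition
```

Composing the two functions is how the migrated filters keep their old results.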
jenkins-bot 72d03778d0 Merge "Refactor ParserStatus" 2021-09-24 09:34:20 +00:00
Daimona Eaytoy b2dc2c4dd8 Refactor ParserStatus
ParserStatus is now more lightweight, and doesn't know about "result"
and "from cache". Instead, it has an isValid() method which is merely a
shorthand for checking whether getException() is null.

Introduce a child class, RuleCheckerStatus, which knows about result and
cache and can be (un)serialized.

This removes the ambiguity of the $result field, and helps the
transition to a new RuleChecker class.

Change-Id: I0dac7ab4febbfdabe72596631db630411d967ab5
2021-09-17 11:25:54 +00:00
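The split between the two status classes can be pictured with this Python sketch. The class names mirror the commit message, but the field names, the dict-based serialization, and the way the exception is stored as a string are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParserStatus:
    """Lightweight status: only knows whether an exception occurred."""
    exception: Optional[Exception] = None

    def is_valid(self) -> bool:
        # Shorthand for "there is no exception".
        return self.exception is None


@dataclass
class RuleCheckerStatus(ParserStatus):
    """Adds the match result and cache flag, and can be (un)serialized."""
    result: bool = False
    from_cache: bool = False

    def to_dict(self) -> dict:
        return {
            "exception": None if self.exception is None else str(self.exception),
            "result": self.result,
            "from_cache": self.from_cache,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "RuleCheckerStatus":
        exc = data["exception"]
        return cls(
            exception=None if exc is None else Exception(exc),
            result=data["result"],
            from_cache=data["from_cache"],
        )


status = RuleCheckerStatus(result=True)
restored = RuleCheckerStatus.from_dict(status.to_dict())
print(status.is_valid(), restored.result)
```

Keeping `result` out of the base class is what removes the ambiguity the commit mentions: a plain `ParserStatus` can no longer be mistaken for a match outcome.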
jenkins-bot 0ba45db169 Merge "Remove various AtEase and error_reporting" 2021-09-16 15:29:36 +00:00
Timo Tijhof 3f33e08bac Remove various AtEase and error_reporting
Something somewhere is leaving error_reporting in a dirty state
causing AbuseFilter's ConsequencesExecutorTest case to fail for
the core change Ic9fee6cdd88001025.

Per T253461, we're meant to eventually remove this anyway, so might
as well remove it in areas that are known to get it wrong somehow.

Change-Id: I2a665f09a357f2f2cc258d8c4011d49a7ab9c13b
2021-09-16 02:59:37 +00:00
Daimona Eaytoy 7c26c4b8d5 More cleanup for parser-related classes
Change-Id: I6a2bbf519e1d5c6fe2778f69624bd80b9ea1ef86
2021-09-10 12:50:20 +00:00
Daimona Eaytoy a722dfe1a4 Rename ParserFactory -> RuleCheckerFactory
The old parser now has the correct name "Evaluator", so the
ParserFactory name was outdated. Additionally, the plan is to create a
new RuleChecker class, acting as a facade for the different
parsing-related stages (lexer, parser, evaluator, etc.), which is what
most if not all callers should use. The RuleCheckerFactory still returns
a FilterEvaluator for now.
Also, "Parser" is a specific term defining *how* things happen
internally, whereas "RuleChecker" describes *what* callers should expect
from the new class.

Change-Id: I25b47a162d933c1e385175aae715ca38872b1442
2021-09-08 21:59:34 +02:00
Daimona Eaytoy 357ddd498c Clean up / simplify parser-related classes
Remove unnecessary setters, injecting everything in the constructor.
These were leftovers from before the introduction of ParserFactory.
Remove public access to the conds used, include the information inside
the returned ParserStatus instead, and consequently simplify callers.

Change-Id: I0a30e044877c6c858af3ff73f819d5ec7c4cc769
2021-09-08 13:41:52 +02:00
Daimona Eaytoy f8e9ac7e2a Rename AbuseFilterCachingParser -> FilterEvaluator
It's an evaluator, not a parser.

Change-Id: Ib6d33e8423ea72709cf5a33f4397ba33e352ea80
2021-09-08 13:40:47 +02:00
libraryupgrader 2a4860e322 build: Updating mediawiki/mediawiki-phan-config to 0.11.0
Change-Id: I097d051e3c30e61d74a8e329b6110b219c72ec1a
2021-09-07 19:30:42 -07:00
Daimona Eaytoy 6684ea6450 Remove AFPTransitionBase
Also clean up the mPos hack in the CachingParser.

Change-Id: Ib5693802a3ceb80cb736880ed65e27340abef689
2021-09-06 19:33:48 +00:00
Sorawee Porncharoenwase 320e3d696f Add a static analyzer for the filter language
This commit adds a class AFPSyntaxChecker which can statically analyze
filter code to detect the following errors:

- unbound variables (checked in one of two modes, conservative or liberal,
  defaulting to conservative)
- unused variables (disabled by default for compatibility)
- assignment on built-in identifiers
- function application's arity mismatch
- function application's invalid function name
- non-string literal in the first argument of set / set_var

The existing parser and evaluator are modified as follows:

- The new (caching) evaluator no longer needs to perform variable
  hoisting at runtime.
  - Note that for array assignment, this changes the semantics.
- The new parser is more lenient, reducing parsing errors.
  The static analyzer will catch these errors instead, allowing us
  to give a much better error message and reducing the complexity of
  the parser.
  * The parser now allows the function name to be any identifier.
  * The parser now allows arity mismatch to occur.
  * The parser now allows the first argument of set to be any expression.

Concretely, obvious changes that users will see are:

1. a := [1]; false & (a[] := 2); a[0] === 1

   would evaluate to true, while it used to evaluate to the undefined value
   due to hoisting

2. f(1)

   will now error with 'f is not a valid function' as opposed to
   'Unexpected "T_BRACE"'

3. length

   will now error with 'Illegal use of built-in identifier "length"'
   as opposed to 'Expected a ('

Appendix: conservative and liberal mode

The conservative mode is completely compatible with the current evaluator.
That is,

false & (a := 1); a

will not deem `a` as unbound, though this is actually undesirable because
`a` would then be bound to the troublesome undefined value.

The liberal mode rejects the above pattern by deeming `a` as unbound.
However, it also rejects

true & (a := 1); a

even though (a := 1) is always executed. Since there are several filters
in Wikimedia projects that rely on this behavior, we default the mode
to conservative for now.

Note that even the liberal mode doesn't really respect lexical scope
as found in some other programming languages (see also T234690).
For instance:

(if true then (a := 1) else (a := 2) end); a

would be accepted by the liberal checker, even though under lexical scope,
`a` would be unbound. However, it is unlikely that lexical scope
will be suitable for the filter language, as most filters in
Wikimedia projects that have user-defined variables do violate lexical scope.

Bug: T260903
Bug: T238709
Bug: T237610
Bug: T234690
Bug: T231536
Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2
2021-08-31 03:28:24 +02:00
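The difference between the conservative and liberal modes can be illustrated with a small Python sketch. This is one possible reading of the examples in the commit message, not AFPSyntaxChecker's actual algorithm: the tuple-based tree, the `check` and `unbound_vars` helpers, and the rule that an `if` binds a name only when both branches assign it are all assumptions.

```python
def check(node, bound, liberal, errors):
    """Walk a toy syntax tree; return the names bound after the node."""
    kind = node[0]
    if kind == "lit":
        return bound
    if kind == "var":
        if node[1] not in bound:
            errors.append(node[1])
        return bound
    if kind == "assign":
        bound = check(node[2], bound, liberal, errors)
        return bound | {node[1]}
    if kind == "seq":
        for stmt in node[1]:
            bound = check(stmt, bound, liberal, errors)
        return bound
    if kind == "and":
        after_left = check(node[1], bound, liberal, errors)
        after_right = check(node[2], after_left, liberal, errors)
        # Liberal mode: the right operand may be skipped at runtime,
        # so its assignments are not trusted afterwards.
        return after_left if liberal else after_right
    if kind == "if":
        after_cond = check(node[1], bound, liberal, errors)
        after_then = check(node[2], after_cond, liberal, errors)
        after_else = check(node[3], after_cond, liberal, errors)
        # Liberal mode: only names assigned in both branches stay bound.
        return (after_then & after_else) if liberal else (after_then | after_else)
    raise ValueError(f"unknown node kind: {kind}")


def unbound_vars(program, liberal):
    """Return the variables used while unbound, in order of appearance."""
    errors = []
    check(program, frozenset(), liberal, errors)
    return errors


# false & (a := 1); a
guarded = ("seq", [("and", ("lit", False), ("assign", "a", ("lit", 1))), ("var", "a")])
# (if true then (a := 1) else (a := 2) end); a
branched = ("seq", [("if", ("lit", True), ("assign", "a", ("lit", 1)),
                     ("assign", "a", ("lit", 2))), ("var", "a")])

print(unbound_vars(guarded, liberal=False))
print(unbound_vars(guarded, liberal=True))
print(unbound_vars(branched, liberal=True))
```

Because the sketch never looks at the value of the left operand, it rejects `true & (a := 1); a` in liberal mode as well, matching the trade-off the commit message describes.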
Daimona Eaytoy 704364a5e7 Move parser exceptions to specific namespace and rename them
Create a dedicated "Exception" sub-namespace and remove the "AFP"
prefix, a leftover from the pre-namespace era.

Change-Id: I7e5fded9316d8b7d1628bc1a6ba8b1879ac901e1
2021-08-29 23:38:31 +00:00
jenkins-bot 9b93b0256a Merge "Avoid passing invalid offset to mb_strpos" 2021-08-18 18:45:12 +00:00
libraryupgrader 5377ebe819 build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 36.0.0 → 37.0.0

npm:
* postcss: 7.0.35 → 7.0.36
  * https://npmjs.com/advisories/1693 (CVE-2021-23368)

Change-Id: I2b382f3bb236fb44eb24c6a257b13b8fd886541c
2021-07-21 18:51:18 +00:00
Daimona Eaytoy 069fa064f5 Avoid passing invalid offset to mb_strpos
Bug: T285978
Change-Id: I3d100fd05f34fe3b01ecbbce5361badc613f9406
2021-07-02 14:07:46 +00:00
Daimona Eaytoy 57f11631ba Pass a valid regexp to preg_match in checkRegexMatchesEmpty
Bug: T283966
Change-Id: I99688aa8f3e62e410392a9142df56b1a3c708987
2021-05-29 11:38:07 +00:00
Umherirrender 1fa7a83f60 Use static closures where safe to use
Created by I25a17fb22b6b669e817317a0f45051ae9c608208

Change-Id: I533690311ca559685de8a4bf123348c9bcfa5931
2021-04-30 20:55:35 +02:00
Daimona Eaytoy f8438a4647 Remove the old parser
All methods were moved to the new parser. Tests and other pieces were
adjusted to expect just a single parser. There are still some TODOs
(remove AFPTransitionBase, remove $this->mCur), but these are left for
another commit.

Note that the new parser was not renamed: this is because the names are
wrong anyway (CachingParser is more of an Evaluator than a Parser, and
AFPTreeParser is the real parser, and should be renamed as well).

NOTE to reviewers: this patch looks quite big, but if you diff the old
parser with the new version of the CachingParser, you'll notice that the
diff is actually small, since everything was basically copied verbatim.

Bug: T239990
Change-Id: Ie914ef64c70503a201b4d2dec698ca2fa8e69b10
2021-04-09 13:23:07 +00:00
Daimona Eaytoy 2bb5c3c7b5 Align arg counting between the parsers
1 - Change the structure of if/elseif for readability
2 - In the old parser, if there's an empty argument, never add it (the
new parser was already doing that).

Bug: T156095
Bug: T156096
Change-Id: I4237b1a0ba01e7ce04dcc945f7daf34612fcf07d
2021-02-20 14:33:56 +00:00
Daimona Eaytoy e64049c30b Create dedicated types of parser exceptions
Introduce a clear distinction between internal exceptions and
user-visible exceptions, leaving AFPException as base abstract class.

Later, it should be possible to narrow some types around, e.g. in
ParserStatus (that might work with user-visible exceptions only).

Also a future TODO is putting all the exceptions in their own namespace
(probably ...\Parser\Exception).

Change-Id: I4e33a45117f0a3e73af03cc1e3f2734beaf2b5e1
2021-02-12 13:56:02 +00:00
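The distinction between internal and user-visible exceptions could be laid out as in this Python sketch. All class names, the `message_key` and `position` fields, and the `report` helper are hypothetical; only the idea of a shared base type with two families comes from the commit message.

```python
class ParserExceptionBase(Exception):
    """Common base type; by convention never raised directly."""


class InternalException(ParserExceptionBase):
    """Signals a bug or unexpected state; details are not shown to users."""


class UserVisibleException(ParserExceptionBase):
    """Signals a problem in the filter itself that the user can fix."""

    def __init__(self, message_key: str, position: int):
        super().__init__(message_key)
        self.message_key = message_key
        self.position = position


def report(exc: ParserExceptionBase) -> str:
    """Only user-visible exceptions expose their details."""
    if isinstance(exc, UserVisibleException):
        return f"{exc.message_key} at position {exc.position}"
    return "internal error"


print(report(UserVisibleException("unexpected-token", 12)))
print(report(InternalException("evaluator reached an impossible state")))
```

A caller that only deals with user-facing errors can then narrow its type to `UserVisibleException`, which is the follow-up the commit anticipates.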