Most notably, make it not use additional DB tables to test global
filters. Instead, just pretend that the local database is not local (via
config) and "hide" local filters with a simple test-only flag in
FilterLookup.
Change-Id: Ib431dbf6c9d84978ee84e7f0671cfcbf8a54d7a2
Echo will at some point try to load the user with the given ID, and will
throw an exception if it doesn't exist. The test is currently passing
just because we're not properly cleaning DB tables, and the user with ID
1 happens to exist at that point, but it will fail with core change
Ie2f1809d.
Change-Id: Ie686f4d5c2842e45a6ed564b311bb5d9b0265091
tests\selenium\specs\importingFilters.js
81:1 warning This line has a length of 106. Maximum allowed is 100
max-len
Change-Id: Ic8bee37971afb6fe4ba7102d687f0fa212092f4c
These tests were accessing the Database, for mainly 3 reasons:
- User::newSystemUser
- Static methods in ChangeTags
- Echo's Event class
There isn't much we can do about them, so add tests to the Database
group where needed. In some cases, there are already comments that these
tests should be made unit tests once possible.
Bug: T155147
Change-Id: I8a0d52e0a4cae8a4059b62867853a73e60c878a1
TestUser requires a DB connection, so avoid using it in database-less
tests. Add to the Database group tests that are making DB writes (e.g.,
for log entries).
Change-Id: I211cb60296e5c2446128fcdf2caaadc728a8c272
I'm planning to add support for bypass and regex-based blocking which
means it'll grow a bit. So let's give it a dedicated class.
Bug: T337431
Change-Id: I5a6fe2fd2f1efdebd8cada0ba6c481341f830e27
The handler class uses hook interfaces from the CheckUser extension, so
it can't run if CheckUser is not installed.
Change-Id: I5f40366f27cc885e95e1bb93ec421b09c7caa9a6
This method actually consists of two: add derived vars, and initialize
content vars. The former part depends on no parameters of this method.
On the other hand, the latter part combines multiple implementations
for some of the content variables using branching.
The branching is a dirty workaround and inferior to the GRASP principle:
"When related alternatives or behaviors vary by type, assign
responsibility for the behavior to the types for which the behavior
varies."
In other words, the callers (extensions) should be responsible for
choosing the initialization strategy themselves, instead of letting
VariableGenerator figure it out.
As the first step, split the former part to a separate method.
For now, it will be implicitly called by ::addEditVars.
Change-Id: I5ff00dbdbf29ec54eabfd95c44a4fd7f713969f5
Service wiring should only depend on config, not on request state.
Creating a session object during service wiring causes issues with entry
points such as opensearch_desc.php that disable the session.
Bug: T340113
Change-Id: I2450b0b6821ff0b097e283ff660a0b8aeea9590a
Protected effectively means "public to subclasses" and should be
avoided for the same reasons as marking everything as public should
be avoided.
Change-Id: Iba674b486ce53fd1f94f70163d47824e969abb77
Unlike what the 20-year old source comments in UrlUtils.php would
have you believe, parse_url() works fine nowadays, including for
protocol-relative URLs and indeed lots of prod code uses it directly.
The class still has some convenience value for case where you need to
expand or manipulate URLs, but for the common case of extracting a part
of it, you really don't need it.
Test plan:
$ php phpunit.php ../../extensions/AbuseFilter/tests/phpunit/integration/FilteredActionsHandlerTest.php
Bug: T337431
Change-Id: I1e76d2f5aef65365743214530faba656325b965a
Or at least one step forward to better isolation.
Some providers were replaced with individual test cases.
Bug: T337144
Change-Id: I5cd8c5e79993260f18c3a17c40b8501a4da3c17f
And register AbuseFilterRunnerFactory as a service name that’s allowed
to not have a getRunnerFactory() method without the test complaining
(the service was renamed, getFilterRunnerFactory() exists).
Change-Id: Idedb87e64a6df02b0edae8d9e7dbf441752dc480
Needed-By: If5af88e7f70b83d53f66b9617a5ef37daf81830f
When forFilter is true and PreparedUpdate is available
(most save operations), retrieve all_links from
PreparedUpdate::getParserOutputForMetaData. Otherwise
do what was done before.
Note that this change probably leaves some dead code. It will be dealt
with later.
NOTE: this changes code potentially executed on every save operation.
Bug: T65632
Bug: T264104
Change-Id: I3628a56e5277846c1b90444fb55983870eb54c1e
The method for old_links retrieval depends on the "forFilter"
value, which we know in advance. If it's true, old_links should
be retrieved from the database. Make a case in the switch
that does nothing but retrieves links from the database,
and direct the evaluation to it.
This change was split from I3628a56e5 to make its review easier.
NOTE: this changes code potentially executed on every save operation.
Change-Id: I33b688f6be3c58beec403f7bf26407a42e7c18ab
Regarding array building: Instead of adding to array with
$array[] = 'foo' and then doing array_flip(), simply do
$array['foo'] = true;
Regarding tests: I originally wanted to create a unit test but I ended
up mocking so many things that it wasn't worth it and the config variable
is globaly which first we need to clean up after deployment is done.
Bug: T337431
Change-Id: Iac8dca7078668ee3441d19b6aafe499c1aa0d732
Treat temporary users the same as IP users. Neither has user groups,
so return early for both.
Bug: T335062
Change-Id: I20b48608cf6ba5f8e8e36a378d66c603d84b032f
It is behind a feature flag. Improvements on it can happen in follow
ups. The patch is already quite massive.
Bug: T337431
Bug: T279275
Change-Id: I3df949c4d41ce65bb4afa013da9c691ac05fc760
* Set the same block expiry for temp and anon users
* Don't block autopromote for temp users; they can't be autopromoted
* Bail early from CheckUserHandler if the user is temporary
Bug: T335062
Change-Id: I6b72537f568c4c70a0b86f1825ea30b767f5634a
- Do not try to switch the editor if CodeEditor is not installed
- Make sure that the state before the test runs is known
- Add more specific assertion on error message
- Add missing await to async function call
- Re-enable test disabled in I47b4794c4098a25696ffe42c42d00fd767971b5d.
Selenium tests were run 100 times in CI in PS2 and they always passed.
Bug: T300790
Change-Id: Id9ca618dcc59396980b1ee38819677583107738c
WebdriverIO has dropped support of sync mode, hence changed to async.
Update npm packages: @wdio/*, wdio-mediawiki
because async mode needs at least @wdio v7.9.
Remove npm packages: @wdio/dot-reporter and @wdio/sync.
Bug: T300790
Change-Id: Ifcc95a8c7e47ba4513f6910620ecce83e1ead15a
This patch migrates abuse_filter and abuse_filter_history tables
to new actor schema.
MigrateActorsAF was copy-pasted from core's
maintenance/includes/MigrateActors.php before removal (ba3155214).
Bug: T188180
Change-Id: Ic755526d5f989c4a66b1d37527cda235f61cb437
Use the very new getPrimaryDatabase and getReplicaDatabase.
We skip FilterLookup and CentralDBManager in this patch.
Change-Id: I22c6f8fa60be90599ee177a4ac4a97e1547f79be
Hook on to CheckUserInsertPrivateEventRow and CheckUserInsertLogEventRow
to override the IP, XFF and User-Agent string when the user is the
abuse filter user for log events.
These two hooks are being added as log entries are being removed from
cu_changes and added into two new tables. Because the columns and their
names are different for these tables, reusing the same hook won't work
for callers that rely on setting values for a specific column name.
Edits and log entries performed by the abuse filter user need to be
marked as being by the software (and not using the IP, XFF and
User-Agent provided in the main request).
These hooks will not be run until the appropriate config is set to
write to the two new tables. Until that point using the one currently
defined hook will work for all actions.
Bug: T324907
Bug: T44345
Depends-On: I7c7754323ade9a8d96273c1742f30b1b5fbe5828
Follow-Up: Idd77545af94f9f9930d9ff38ab6423a72e680df9
Change-Id: Id78417e9d95220946f110afbe1430df5b3bb4f4f
We might consider adding an in-process cache because there
will be a duplicate database lookup for content model and
wikitext of the same revision.
Bug: T230295
Change-Id: I9723f21069e03a49fa7131bd8f79c6e7e442104b
The motivation is to have a single immutable object providing
information about the action. It can represent the current
action being filtered, but also a past action stored in the
abuse log. It will hopefully help us get rid of passing
User(Identity) and Title/LinkTarget objects around together.
Change-Id: I52fa3a7ea14c98d33607d4260acfed3d3ba60f65
For fixing bugs like T65632, T105325, or T264104, we will need
to update code in more than one place at once. To prevent
regressions, create an integration test which tests the whole
pipeline, from the request submission to variable evaluation.
Edits are simulated using action=edit API call because the hook
AbuseFilter uses is run from EditPage.
To increase confidence in test coverage, remove some annotations
from AbuseFilterConsequencesTest or make them less greedy.
Ideally, it would only test consequences.
This patch includes refactoring of AbuseFilterCreateAccountTestTrait
which now only inserts the user into the database if it really
should be created.
It also restores test coverage of some other classes.
Change-Id: I661f4e0e2bcac4770e499708fca4e4e153f31fed
Change the IP to 127.0.0.1 (to indicate an internal IP), and blank
the XFF and UA when the performer of an action being logged by
CheckUser is the abuse filter user. Actions performed by the abuse
filter user can only be initated by the software, and as such should
not use the request's IP, XFF and UA. Also test the newly added
code.
Bug: T44345
Depends-On: I28acaaebd2d0067b700da0930e7b7ba924fa5c1c
Change-Id: Idd77545af94f9f9930d9ff38ab6423a72e680df9
Test that null edits do not trigger filters, but sole
content model change does.
Also do some cleanup in AbuseFilterConsequencesTest.
For better isolation, do not access the service
container and do not initialize objects in
the constructor.
Change-Id: I043ecb312226a69d1f485a8382d558ccb899a270
A new core facility written for this use case.
Bug: T310662
Depends-On: I26b1cdba0a06ad16ad8bb71b455e1b6180924d17
Change-Id: I2b902d034a8c3308c0ba9878b69e873ca8fbda52
In action=abusefilterunblockautopromote, leave UserIdentity
instantiation to the parent. Note that this changes the "code"
in the response from "baduser_user" to "baduser".
Change-Id: I97d2bf3fa3c5486e461823f840cad2763e1bcfea
Almost all callers already provide an Authority in the form
of a User object, so mostly just need to change the typehints
Depends-On: I58661943c7e1acb6ff09798ee1a30be0fde3f459
Change-Id: I2ad86859c8194c14d7331f58db62b7cff4698085
- use unique ids to find rc entry, to support parallel unit tests and
rdbms where the auto increment value must not increase in time
- Change from Title::newFromText to Title::makeTitle to avoid parsing
the title
- Pass the title to editPage() to avoid reparse of the title
- Use assertSame to compare values
Change-Id: I455b4412a6669475463dee7dea0969ae1cbd8ebb
These are apparently the only two variables for which we can
quickly determine their value in such simple way.
Later, we can also try it for recent contributions.
Bug: T102944
Change-Id: Iecfa9e5c5ba8c078691334b676cc6f289790cb74
Using Title::newFromText is parsing the string, which is expensive.
Just use Title::makeTitle when the result is known.
editPage() can take a Title or WikiPage instead of a string, avoid
creation of Title there.
The default ns on editPage() is only needed when giving a string
Change-Id: Ie303b9e6d6b8d6ac80286059f8e86bfc76b779af
This was most definitely my intention when I introduced the concept of
"generic vars", so it's a bit surprising to discover, 3.5 years later,
that the timestamp isn't computed there.
Also make the timestamp always be a string for consistency, since that's
the type documented on mw.org. I've manually checked all filters on
Wikimedia wikis using the timestamp variable, and added explicit int
casts where needed (although I think they'd still work due to implicit
casts).
Change-Id: Ib6e15225dd95c2eead7e48c200d203d6918e0c18
We want to make sure that all parameters are valid regardless of whether
there's a match.
Also make the minimum number of parameters = 2, so it's easier to switch
between this function and ip_in_range.
Change-Id: I141558a7ef4533485e315b3d93ea9b64f0959db7
Added support for ip_in_ranges which allow multiple ranges to be
checked at the same time. If the IP is in any of the ranges, the
function returns true.
Bug: T305017
Change-Id: Ic75c87ecd4cacf47ce2ff1b04173405230ff81d0
- Define it with the extension.json key, instead of using the
registration callback
- Inject the services it needs
- Replace direct User instantiation with UserFactory
- Move log subtypes to extension.json as well
Change-Id: I86a761c7fa844b1f417b974798373622a15f6411
Convert a few integration tests to unit tests now that it's possible,
split the AbuseFilterSaveTest file into three different classes.
Change-Id: Ia2c0d7ab878b20a89324336a532abdc44f1e6b74
Introduce shorter methods, one for each steps, so that it's easier to
understand what the code is doing and figure out if the order makes
sense. The ConsequencesExecutor test is now a proper unit test. Also
simplify AbuseFilterConsequencesTest, removing old/wrong logic and
fixing two expected values that were actually wrong (but worked because
of the aforementioned wrong logic).
The only functional changes should be:
- We pick the longest block *after* checking the ConsequenceDisabler
consequences, so e.g. if a filter has a long block + warn and another
filter has a shorter block, we still keep the second one if warn will
disable the block.
- Remove disallow in presence of dangerous actions after checking
ConsequenceDisabler's and deduplicating blocks. Otherwise we may
remove disallow for filters where block (etc.) doesn't end up being
disabled. We may also want to consider not removing disallow at all,
now that messages are customizable.
Bug: T303059
Change-Id: If00adbf2056758222eaaea70b16d3b4f89502c20
- Use a /64 range for IPv6 instead of /16.
- Fix a curious and serious bug for IPv6, where grouping by range
would only use the first (!) number of the IP address, due to the
'v6-' prefix returned by IP::toHex.
- Fail hard if the identifier is unknown -- it's not something that's
supposed to happen.
- Include the type name in each identifier, instead of prefixing all
type names to all identifiers. This makes it easier to understand the
parts of the key.
- Test the whole lot.
Bug: T211101
Change-Id: I54c4209f2f0d5a4c5e7b81bed240ca3e28a2ded7
assertStatusMessage is being added to MediaWikiTestCaseTrait, rename
a method of the same name in FilterValidatorTest to avoid conflicts.
Change-Id: I642a3b620ab4d8ad620f7a1253fed98d6796883d
NeededBy: Ic01715b9a55444d3df6b5d4097e78cb8ac082b3e
List which actions were disabled, or explicitly say that no actions were
disabled if that's the case. Also avoid the word "throttle" in messages
as it may be hard to translate. Also don't suggest optimizations to the
filter conditions -- unoptimized rules have nothing to do with a filter
being throttled.
Bug: T200036
Change-Id: Id989fb185453d068b7685241ee49189a2df67b5f
This is a plain value object that represents the action being filtered,
replacing associative arrays that were being used up to this point.
We should now check whether it's possible to make it not require an
accountname (which complicates things), and then use it in related
classes as well, e.g. Parameters.
Change-Id: I9550c14819b600c97c46b632cc1c2d447972d69c
The existing filters on WMF wikis has been changes such that calls
to rmspecials() are now rmspecials(rmwhitespace()) to ensure no change
is made in behaviour. Filter admins can change this back if filter is
not meant to trigger when part of the input is contains spaces.
Bug: T263024
Change-Id: Idde09b50fb8eda357afbedc1199a5483fa8217c1
WikiPage::factory() is deprecated since 1.36 and should be replaced
with WikiPageFactory::newFromTitle().
Bug: T297688
Change-Id: I85d3566519ab977aad8c517cc48fc8c271e5589a
This replaces the previous pattern of callers having to use
RevisionLookup if the result was 'implicit'. Also, in some cases where
we were just hiding things if the visibility was !== true, properly
handle the implicit case by using the new method. Make the new method
return string constants rather than bool|string.
The new method also fixes some potential info leaks which happened when
the row was hidden, the user could view suppressed AbuseLog entries, but
the associated revision was also deleted and the user couldn't see it
(this shouldn't be relevant for WMF wikis since AF deletion is
oversight-level).
Also add a bunch of tests for the various cases to ensure we don't
regress again.
Bug: T261532
Change-Id: I929f865acf5d207b739cb3af043f70cb59243ee0
ParserStatus is now more lightweight, and doesn't know about "result"
and "from cache". Instead, it has an isValid() method which is merely a
shorthand for checking whether getException() is null.
Introduce a child class, RuleCheckerStatus, which knows about result and
cache and can be (un)serialized.
This removes the ambiguity of the $result field, and helps the
transition to a new RuleChecker class.
Change-Id: I0dac7ab4febbfdabe72596631db630411d967ab5
The old parser now has the correct name "Evaluator", so the
ParserFactory name was outdated. Additionally, the plan is to create a
new RuleChecker class, acting as a facade for the different
parsing-related stages (lexer, parser, evaluator, etc.), which is what
most if not all callers should use. The RuleCheckerFactory still returns
a FilterEvaluator for now.
Also, "Parser" is a specific term defining *how* things happen
internally, whereas "RuleChecker" describes *what* callers should expect
from the new class.
Change-Id: I25b47a162d933c1e385175aae715ca38872b1442
Remove unnecessary setters, injecting everything in the constructor.
These were leftovers from before the introduction of ParserFactory.
Remove public access to the conds used, include the information inside
the returned ParserStatus instead, and consequently simplify callers.
Change-Id: I0a30e044877c6c858af3ff73f819d5ec7c4cc769
So that the method can be typehinted in core.
Also add phan-var to fix broken master build due to typehint additions
in core.
Change-Id: I4a072e00ffeeb437753fc3d3c1f15de9929df510
This commit adds a class AFPSyntaxChecker which can statically analyze
a filter code to detect the following errors:
- unbound variables (which comes in two modes: conservative and liberal,
default to conservative)
- unused variables (disabled by default for compatibilty)
- assignment on built-in identifiers
- function application's arity mismatch
- function application's invalid function name
- non-string literal in the first argument of set / set_var
The existing parser and evaluator are modified as follows:
- The new (caching) evaluator no longer needs to perform variable
hoisting at runtime.
- Note that for array assignment, this changes the semantics.
- The new parser is more lenient, reducing parsing errors.
The static analyzer will catch these errors instead, allowing us
to give a much better error message and reduces the complexity of
the parser.
* The parser now allows function name to be any identifier.
* The parser now allows arity mismatch to occur.
* The parser now allows the first argument of set to be any expression.
Concretely, obvious changes that users will see are:
1. a := [1]; false & (a[] := 2); a[0] === 1
would evaluate to true, while it used to evaluate to the undefined value
due to hoisting
2. f(1)
will now error with 'f is not a valid function' as opposed to
'Unexpected "T_BRACE"'
3. length
will now error with 'Illegal use of built-in identifier "length"'
as opposed to 'Expected a ('
Appendix: conservative and liberal mode
The conservative mode is completely compatible with the current evaluator.
That is,
false & (a := 1); a
will not deem `a` as unbound, though this is actually undesirable because
`a` would then be bound to the troublesome undefined value.
The liberal mode rejects the above pattern by deeming `a` as unbound.
However, it also rejects
true & (a := 1); a
even though (a := 1) is always executed. Since there are several filters
in Wikimedia projects that rely on this behavior, we default the mode
to conservative for now.
Note that even the liberal mode doesn't really respect lexical scope
appeared in some other programming languages (see also T234690).
For instance:
(if true then (a := 1) else (a := 2) end); a
would be accepted by the liberal checker, even though under lexical scope,
`a` would be unbound. However, it is unlikely that lexical scope
will be suitable for the filter language, as most filters in
Wikimedia projects that have user-defined variable do violate lexical scope.
Bug: T260903
Bug: T238709
Bug: T237610
Bug: T234690
Bug: T231536
Change-Id: Ic6d030503e554933f8d220c6f87b680505918ae2
Create a dedicated "Exception" sub-namespace and remove the "AFP"
prefix, a leftover from the pre-namespace era.
Change-Id: I7e5fded9316d8b7d1628bc1a6ba8b1879ac901e1
Regression tests to make sure T286140 does not
happen again.
In the process, discovered what caused that bug
with afl_rev_id not being set: EditRevUpdater::updateRev()
compares the WikiPage given in the PageSaveComplete hook
to the one given to it by AbuseFilterHooks from
onEditFilterMergedContent, and compares the two using
`===`, meaning that they must refer to the same underlying
object. That bug was caused because AbuseFilterHooks
changed to providing a different object, despite still
referring to the same underlying page.
We should probably change that behavior in EditRevUpdater,
but for now updated AbuseFilterConsequencesTest to pass
the same object around by using RequestContext::setWikiPage()
and providing the WikiPage object to
MediaWikiIntegrationTestCase::editPage().
Bug: T286140
Change-Id: I6562f513c463538af6b59b12a64564b254024613