* Having separate config variables to enable the maintenance
migrateNamespace and migrateTagTemplate scripts is redundant;
they should be shared with the write-enable config variables.
Bug: T329342
Change-Id: I4cb453fc0678b065cb42a2ca59863da1ab9cdbe4
* The linter migrate code for the linter_tag and linter_template
fields is constrained by the database schema to 32 characters
for the tag field and 255 characters for the template field.
In some anomalous circumstances Parsoid can report tag and/or
template fields in the linter_params object that exceed those
character limits. This code truncates such excessively long
strings to protect the database migrate update code from a
length-exceeded error.
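A minimal sketch of that guard in PHP (the constant and helper
names are illustrative, not the extension's actual code):

  // Schema limits from the commit message: 32 characters for the
  // tag field, 255 for the template field.
  const MAX_TAG_LENGTH = 32;
  const MAX_TEMPLATE_LENGTH = 255;

  function truncateForSchema( string $tag, string $template ): array {
      // mb_strcut() truncates on byte length without splitting a
      // multibyte character partway through.
      return [
          mb_strcut( $tag, 0, MAX_TAG_LENGTH ),
          mb_strcut( $template, 0, MAX_TEMPLATE_LENGTH ),
      ];
  }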
Bug: T329113
Change-Id: I8af7c44759f172eae77d3519a6eac47110e9b1e7
* The linter write code for the linter_tag and linter_template
fields is constrained by the database schema to 32 characters
for the tag field and 255 characters for the template field.
In some anomalous circumstances Parsoid can report tag and/or
template fields in the linter_params object that exceed those
character limits. This code truncates these anomalous strings
to protect the database update code from a length-exceeded
error.
Bug: T328979
Change-Id: I057ae2e32a9e1a7735b5300409e5693e8db5c764
* The migrate code is designed to perform a one-time update of
linter_params JSON-encoded template and tag information into
the new discrete template and tag text fields for use as
additional search criteria. The function can be restarted if
it is interrupted.
* It now uses configurable batching and sleep times between
batches, allowing the database to do other work and replication
to catch up without stressing infrastructure (see the sketch
after this list).
* The migrate code is currently only called by test code; it
still needs to be invoked once from a maintenance script.
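A rough shape for such a restartable, batched loop, assuming
hypothetical $batchSize/$sleepSeconds settings (the real code
lives in Database.php):

  use MediaWiki\MediaWikiServices;

  function migrateInBatches( int $batchSize, int $sleepSeconds ): void {
      $services = MediaWikiServices::getInstance();
      $dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );
      $lastId = 0;
      do {
          // Walk the table in primary-key order so an interrupted
          // run can be restarted and continue where it left off.
          $ids = $dbw->selectFieldValues(
              'linter', 'linter_id',
              [ 'linter_id > ' . (int)$lastId ],
              __METHOD__,
              [ 'ORDER BY' => 'linter_id', 'LIMIT' => $batchSize ]
          );
          if ( !$ids ) {
              break;
          }
          // ... decode linter_params and update the new columns here ...
          $lastId = (int)end( $ids );
          // Let replicas catch up, then give the database a breather.
          $services->getDBLoadBalancerFactory()->waitForReplication();
          sleep( $sleepSeconds );
      } while ( count( $ids ) === $batchSize );
  }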
Bug: T175177
Change-Id: Idc4ca88d4762bc7a3bcbc4e66c0f275562083867
* Migrates namespace info from the page table's page_namespace field
to the new linter table field linter_namespace. This duplication
of the namespace value was requested to greatly reduce the amount
of database activity required by the linter search and reporting
code.
* This patch has been prepared as a dark-launch patch, enabled with
the config value LinterMigrateNamespaceStage. It assumes that the
linter table has had the linter_namespace column added to it, and
that recording of the namespace field is already enabled and is
populating the namespace column.
* The migrate code is now runnable from the Linter/maintenance
directory using migrateNamespace.php, which will be deployed in
a separate patch. The maintenance code creates an appropriate
environment to call migrateNamespace() in Database.php.
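Such a wrapper would follow the standard MediaWiki
maintenance-script skeleton, roughly as below (a sketch; the
class name and the call signature are illustrative and the real
migrateNamespace.php may differ):

  <?php
  $IP = getenv( 'MW_INSTALL_PATH' );
  if ( $IP === false ) {
      $IP = __DIR__ . '/../../..';
  }
  require_once "$IP/maintenance/Maintenance.php";

  class MigrateNamespace extends Maintenance {
      public function __construct() {
          parent::__construct();
          $this->addDescription(
              'Copy page_namespace into the linter_namespace column'
          );
          $this->requireExtension( 'Linter' );
      }

      public function execute() {
          // Delegate to the migrate entry point in Database.php
          // described above; the exact call here is illustrative.
          MediaWiki\Linter\Database::migrateNamespace();
      }
  }

  $maintClass = MigrateNamespace::class;
  require_once RUN_MAINTENANCE_IF_MAIN;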
Bug: T299612
Change-Id: I73cb80729d6a5a8716fe93164ad1e42e6958d672
If908c4dc99c966cde2981f9a03be38a577406a4e introduced the
wgLinterWriteNamespaceColumnStage global but didn't add it to
extension.json.
Change-Id: I8ee5849f67ddfd894a25425582b59404eb52aef2
* Writes the namespace ID to the new "linter_namespace" field if
the global "wgLinterWriteNamespaceColumnStage" is present
Bug: T299612
Change-Id: If908c4dc99c966cde2981f9a03be38a577406a4e
The article ID of the title is set to 0 when the page is deleted,
so although the lint job from the hook runs, it doesn't remove
anything.
Reverts most of I06b821b65f65609ddac8ed4e7c662336082d8266
Bug: T298782
Bug: T170313
Change-Id: I2610b9b16d4032b0e18b3537cc9ed51bfdaff299
Currently we select 20 rows and return the accurate count if it's
fewer than that, i.e. up to 19 rows. Since we want to return an
accurate count for 20 rows or fewer, select one more row, 21, so
we can differentiate between having exactly 20 result rows and
hitting the limit. This is the same technique used in MediaWiki's
Pager system.
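In code the trick is just a LIMIT of N + 1 (a sketch; $dbr and
$catId are assumed context, and the limit matches the
MAX_ACCURATE_COUNT constant described in the next entry):

  $limit = 20; // MAX_ACCURATE_COUNT
  $rows = $dbr->selectRowCount(
      'linter', '*',
      [ 'linter_cat' => $catId ],
      __METHOD__,
      [ 'LIMIT' => $limit + 1 ]
  );
  // Up to 20 rows: $rows is the accurate count. 21 rows: we hit
  // the limit, so fall back to the estimated count instead.
  $accurate = $rows <= $limit;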
Change-Id: I50fa96238eb4c7178414ee92c53799fd69520926
* The code now produces an accurate count if the number of
errors for a category is below the threshold set by a
public constant MAX_ACCURATE_COUNT (currently 20).
The database record count limit was originally set to 1,
to determine accurately whether there were actually 0 errors
in a category, as the estimate code would never report 0.
If not 0, it would use the estimated count, which does not
produce an accurate count for any other number of errors.
For low error counts this is annoying to editors and
unnecessary. The additional CPU/disk activity to accurately
check for low error counts is not significantly more than
checking for 0 or 1, as checking for 0 likely requires
a complete table scan, which is probably expensive compared
to a low count that exits early when it hits the record limit.
* An improvement to consider is recording the accurate count in
a separate tiny table, maintaining an accurate count there that
is used in preference to doing the select with a row limit,
refreshed on, say, a 30-second TTL, to prevent a stampede of
requests from performing extraneous database operations.
* Added unit test coverage for accurately counting low error
conditions that are below the threshold, and also verifying
that the estimate is inaccurate beyond the error count
threshold.
Bug: T194872
Change-Id: I4f74cfe3bf9601baa0dc8fa6464a68030ac2bc4b
This eases deployment dependencies by allowing Parsoid to supply an
appropriate database category ID so that new lint categories can be
appropriately stored during the interval between adding a new lint
category to Parsoid and deploying an Extension:Linter patch to
describe it.
Change-Id: Ib7b2342168fa53ca2abac7d5f54fe313be341eb7
On large wikis with lots of lint errors, counting the entire table can
be problematic from a performance perspective, sometimes taking minutes.
Instead, use Database::estimateRowCount(), which uses EXPLAIN SELECT
COUNT(*) to get an approximate value for the number of rows. We do
make sure that if the category actually has no rows, it will return 0.
This should be considered a temporary solution, and we should look into
doing something like the SiteStats incremental updates table in the long
run.
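A sketch of that approach ($dbr and $catId are assumed context):

  $count = $dbr->estimateRowCount(
      'linter', '*', [ 'linter_cat' => $catId ], __METHOD__
  );
  // EXPLAIN can report a non-zero row estimate for an empty result,
  // so confirm with a cheap LIMIT 1 select before reporting it.
  if ( $count > 0 && !$dbr->selectField(
      'linter', '1', [ 'linter_cat' => $catId ], __METHOD__ )
  ) {
      $count = 0;
  }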
Bug: T184280
Change-Id: I2d4dcc615477fd60e41dfed4a3d1a3ad52a9f4af
If a newer version of MediaWiki gets rolled back, it's possible for
there to be lint entries in the database that don't exist according to
CategoryManager.
Instead of showing an error to the user, just silently hide those rows.
All callers to CategoryManager::getCategoryId() already check the
category exists. The callers for CategoryManager::getCategoryName() will
catch the MissingCategoryException, and log it if necessary. Notably,
LintError::makeLintError() will return false on invalid rows, and all
callers have been updated to handle that.
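The resulting caller pattern looks roughly like this (a sketch
following the names in the text; $this->categoryManager and $row
are assumed context):

  try {
      $name = $this->categoryManager->getCategoryName( $row->linter_cat );
  } catch ( MissingCategoryException $e ) {
      // Likely a rollback left an orphaned category id; hide the
      // row from the user and log it for operators if needed.
      return false;
  }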
Bug: T179423
Change-Id: Ia5f56f18a51fa871511b02410222a6079efbfff6
The query itself is too expensive to be run on large Wikimedia wikis,
so put it behind WAN cache and touch the check key for each category
whenever errors are added to or deleted from it.
If this happens to get out of sync, it will get fully refreshed
regularly when the totals are sent to statsd.
WANObjectCache's 'lockTSE' feature will help avoid cache stampedes that
made this query expensive in the past.
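Roughly, the caching pattern the commit describes (a sketch;
$cache is a WANObjectCache instance, $dbr/$catId are assumed
context, and the key names are illustrative):

  $key = $cache->makeKey( 'linter', 'total', $catId );
  $checkKey = $cache->makeKey( 'linter', 'total-check', $catId );

  $total = $cache->getWithSetCallback(
      $key,
      WANObjectCache::TTL_MINUTE * 10,
      static function () use ( $dbr, $catId ) {
          // The expensive per-category count runs only on cache miss.
          return (int)$dbr->selectRowCount(
              'linter', '*', [ 'linter_cat' => $catId ], __METHOD__
          );
      },
      [
          'checkKeys' => [ $checkKey ],
          // Serialize regeneration so a hot key cannot stampede the DB.
          'lockTSE' => 30,
      ]
  );

  // Whenever errors are added to or deleted from the category:
  $cache->touchCheckKey( $checkKey );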
Change-Id: I3774103a29fa0f29d36283950f136259fa71bffe
The most likely scenario of duplicate key errors is that it's the exact
same lint error and there's just a race condition when calculating which
new errors need to be inserted, so just ignore them.
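With the rdbms layer that amounts to the IGNORE option on the
insert (a sketch; $newRows is assumed context):

  // Duplicate rows are silently skipped instead of aborting the batch.
  $dbw->insert( 'linter', $newRows, __METHOD__, [ 'IGNORE' ] );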
Follows-up 419610bcdb.
Change-Id: I84749ab221bbd517b474be8875bb6a59e4f3258e
These tests insert variations of fake lint errors into the database, and
then read out of the database to check they round-trip properly.
And while we're at it, improve the setForPage() return value.
These tests can be run with something like:
php tests/phpunit/phpunit.php extensions/Linter/tests/phpunit/
Change-Id: Ifdba8a8a104d218a822f909bc5d7b3512aca499d
Move location to two separate columns in the database: linter_start and
linter_end. This allows us to have the database enforce the uniqueness
of those fields, instead of just relying upon the PHP code to do so,
which could be bypassed since we have multiple servers and concurrent
processes.
Change-Id: I3e67ce1b7cb3c93866a388ec3248af4cff2a81e0
If no errors existed for the page, inserting new ones would fail with a
database error, since $errors was keyed by unique ID, which the database
wrapper interpreted as field names, causing issues. Using array_values()
gets rid of the keys, fixing the issue.
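The fix in miniature (sketch):

  // Broken: the unique-id string keys are taken as field names.
  // $dbw->insert( 'linter', $errors, __METHOD__ );

  // Fixed: reindex numerically so each element is one row to insert.
  $dbw->insert( 'linter', array_values( $errors ), __METHOD__ );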
Change-Id: I7645de4e5d3ac4462d7980374c8ef8be6280442b
It's possible to have duplicate, identical lint errors if the same exact
error is repeated in a template transclusion (e.g. {{1x|<b/> <b/>}})
since the position via dsr is the same. In this case, just de-duplicate
the errors since we can't differentiate them.
At the same time, trim excessive errors on the same page in the same
category. It's most likely that if a page has that many of the same
errors, the editor or bot will just fix all of them at the same time, so
we don't need to include all of them in the database. 20 is kind of a
low value, but we can always increase it later on as necessary.
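Roughly (a sketch; the array shape and field names are
illustrative):

  $unique = [];
  foreach ( $errors as $error ) {
      // Same category and same dsr position means the errors are
      // indistinguishable, so keep only one of them.
      $key = $error['category'] . ':' . $error['start'] . ':' . $error['end'];
      $unique[$key] = $error;
  }

  $kept = [];
  $perCategory = [];
  foreach ( $unique as $error ) {
      $cat = $error['category'];
      $perCategory[$cat] = ( $perCategory[$cat] ?? 0 ) + 1;
      if ( $perCategory[$cat] <= 20 ) { // trim excessive repeats
          $kept[] = $error;
      }
  }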
Change-Id: I9cded720169870d0eea574e1a930ce4e9b190ac0
Instead of having a complex auto-increment category manager that had
additional caching, just hardcode the (currently 6) ids for each
category, which allows us to simplify a lot of code.
If Parsoid sends a lint error in a category that we don't know about, it
is silently dropped.
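Conceptually (a sketch; the actual names and ids live in the
extension's CategoryManager, and the entries below are
illustrative):

  class CategoryManager {
      // A fixed name => id map replaces the auto-increment manager.
      private const CATEGORIES = [
          'fostered' => 1,
          'obsolete-tag' => 2,
          'bogus-image-options' => 3,
          'missing-end-tag' => 4,
          'self-closed-tag' => 5,
          'deletable-table-tag' => 6,
      ];

      public function getCategoryId( string $name ): ?int {
          // Unknown categories from Parsoid are silently dropped.
          return self::CATEGORIES[$name] ?? null;
      }
  }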
Bug: T151287
Change-Id: Ice6edf1b7985390aa0c1c410d357bc565bb69108
linter_cat is now an int ID that points to a row in the new
lint_categories table.
The mapping between category names and ids is all handled in PHP, and
cached in APC.
Note that you will need to drop the `linter` table manually and re-run
update.php for this to take effect.
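A sketch of the lookup path (the field names and the cache key are
illustrative assumptions):

  use Wikimedia\Rdbms\IDatabase;

  function getCategoryMap( IDatabase $dbr ): array {
      // Cache the full name => id map in APC so most requests never
      // touch the lint_categories table.
      $map = apcu_fetch( 'linter:category-map' );
      if ( $map === false ) {
          $map = [];
          $res = $dbr->select(
              'lint_categories', [ 'lc_id', 'lc_name' ], '', __METHOD__
          );
          foreach ( $res as $row ) {
              $map[$row->lc_name] = (int)$row->lc_id;
          }
          apcu_store( 'linter:category-map', $map, 300 );
      }
      return $map;
  }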
Change-Id: I369d9b4d8d08289b4a20d1cd29a2e327bad28ef8