When RESTBase is turned off, Parsoid runs will no longer be triggered
on template changes. This creates a new mechanism to do that, based on
the RevisionDataUpdates hook called by DerivedPageDataUpdater. The new
behavior is controlled by a feature flag, LinterParseOnDerivedDataUpdate,
which is enabled by default. In WMF production, this should be
turned off as long as we are still triggering Parsoid parses through
the pregeneration mechanism in RESTBase.
Note that this will not write ParserOutput to the ParserCache. On edits,
pages will get parsed with Parsoid twice, once to trigger the lint data
update, and once by ParsoidCachePrewarmJob to populate the ParserCache.
Both parses will trigger the ParserLogLinterData hook, so the lint data
from the second parse is redundant.
However, while ParsoidCachePrewarmJob and RevisionDataUpdates get
triggered together on edits, they also get triggered separately:
ParsoidCachePrewarmJob by page views with parser cache misses; and
RevisionDataUpdates when pages get invalidated due to template changes.
Because ParsoidCachePrewarmJob and RevisionDataUpdates generally get
triggered in different situations, it seems cleaner to keep the two
mechanisms independent of each other, and live with the duplicate parse
on edit.
Bug: T361013
Change-Id: If53841ee583ce240dd245d640b9ea9c97e1eaa55
Instead, pass the page id when using methods for a page. The change
avoids constructing Database with a dummy page id when those methods
aren't going to be used.
getFromId doesn't seem like it needs a page id, since the linter id is
the primary key.
Also, a namespace id should no longer be optional for setForPage. The
LinterWriteNamespaceColumnStage option already gates whether to include
it in the row.
Follows-Up: I9fd6e7724dcf33be0b1feb19ec8eb448738cab09
Change-Id: Ib3d3622144b670ebe1a4ce04e6db6811584d42c8
And addresses some other cleanup from review comments.
Follows-Up: I9fd6e7724dcf33be0b1feb19ec8eb448738cab09
Change-Id: If87b0bf91930f0f8d89ed046d18aadb8f346f9aa
When searching for a specific page title, it's necessary to specify
page_namespace, not just linter_namespace, so that the relevant index in
the page table can be used.
Submitting the form with an empty namespace box led to a search for
namespace zero, because getCheck() returns true for an empty string.
It's not easy to search for a title part in all namespaces. So drop
that hidden feature and interpret a title part with a missing namespace
as being a search for namespace 0.
It's possible to search for a category with an empty title and zero or
more namespaces. Implement the namespace filter in this case using the
linter_namespace field. But ignore the namespace filter if there is no
category, since there is no index on linter_namespace alone.
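The filter rules described above can be summarised in a short sketch. This
is an illustration in Python, not the extension's PHP code: the function
name resolve_namespace_filter, its arguments and its return shape are all
hypothetical, and accepting a list of namespaces for a title search is an
assumption.

```python
def resolve_namespace_filter(title, namespaces, category):
    """Pick the column to filter namespaces on, or None for no filter.

    Hypothetical helper: returns (column_name, namespace_ids) or None.
    """
    if title:
        # Title searches filter on page_namespace so the page table's
        # index can be used; a missing namespace means namespace 0.
        return ("page_namespace", namespaces or [0])
    if category and namespaces:
        # Category search with an empty title: filter on linter_namespace.
        return ("linter_namespace", namespaces)
    # No category (or no namespaces selected): ignore the namespace
    # filter, since there is no index on linter_namespace alone.
    return None
```

For example, a title with no namespace selected resolves to
("page_namespace", [0]), while namespaces without a category resolve to
None.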
Bug: T360865
Change-Id: I00934eaaf1a99e4098f177166b43069d33d9f137
* Using namespacesmultiselect type in HTMLForm element to
provide multiple namespace selection criteria in reports.
* New namespace URL encoding implemented which matches namespace
parameters against active namespaces to ensure parameter
security and validation.
* Test system updated to use new URL namespace encoding
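As a rough illustration of matching namespace parameters against the
active namespaces, here is a minimal Python sketch. It is not the
extension's code: parse_namespace_param is a hypothetical name, and the
separator used in the URL parameter is an assumption (passed in as an
argument for that reason).

```python
import re


def parse_namespace_param(raw, active_namespaces, separator="\n"):
    """Return the namespace ids in `raw` that are active, in order.

    Anything that is not a plain non-negative integer, or that is not an
    active namespace, is dropped. Duplicates are dropped as well.
    """
    selected = []
    for part in raw.split(separator):
        part = part.strip()
        if not re.fullmatch(r"[0-9]+", part):
            continue  # reject non-numeric input outright
        ns = int(part)
        if ns in active_namespaces and ns not in selected:
            selected.append(ns)
    return selected
```

Only values that survive this check would be used in a query; unknown or
malformed values are silently ignored in this sketch.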
Bug: T231161
Change-Id: Ic3190cffe259aecdea429c10e35122eabdbe10d4
Use the methods provided by MediaWikiIntegrationTestCase to more simply
create an existing test page. The manual user creation could be replaced
by getTestUser, but it turns out these tests don't really need a
specific account, so we can let IntegrationTestCase use whatever account
it wants to use.
Also remove @var annotations that can easily be inferred from the doc
comment on the relevant methods.
Change-Id: I8bfd0799b79721c7c9de0d2a10a97c498d192a15
* Tag and template search is enabled by the config variable
'LinterUserInterfaceTagAndTemplateStage'. The report code also
checks that the linter table column 'linter_tag' exists, to
protect itself from errors if the column is absent. Because the
linter table alter maintenance added linter_tag and
linter_template at the same time, there is no reason to check
both. The user interface code does not check for the presence of
the field, only the config variable.
* This code depends on the recordLintJob code writing the tag
and template data which is enabled by the config variable
'LinterWriteTagAndTemplateColumnsStage' and also assumes the
data migration maintenance script migrateTagTemplate.php has
been run to populate linter error records created prior to
the table alter and the write code being enabled.
Bug: T175177
Change-Id: I2f951dfcd34e3dc6ca17e8754cfaeba8baa3e835
* Having separate config variables to enable the maintenance
scripts migrateNamespace and migrateTagTemplate is redundant;
the scripts should share the write enable config variables.
Bug: T329342
Change-Id: I4cb453fc0678b065cb42a2ca59863da1ab9cdbe4
* The linter migrate code for the linter_tag and linter_template
fields is constrained by the database schema to 32 characters
for the tag field and 255 characters for the template field.
In some anomalous circumstances Parsoid can report tag and/or
template fields in the linter_params object that exceed those
character limits. This code truncates these excessively long
strings to protect the database migrate update code from a
length exceeded error.
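The truncation described above amounts to clamping each value to its
column limit before writing. A minimal Python sketch, assuming the limits
are counted in characters as stated; the constant and function names are
hypothetical, and a real schema might count bytes instead:

```python
from typing import Tuple

MAX_TAG_LENGTH = 32        # limit for the linter_tag field
MAX_TEMPLATE_LENGTH = 255  # limit for the linter_template field


def truncate_for_columns(tag: str, template: str) -> Tuple[str, str]:
    """Clamp tag and template values to their column limits."""
    return tag[:MAX_TAG_LENGTH], template[:MAX_TEMPLATE_LENGTH]
```

Values already within the limits pass through unchanged.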
Bug: T329113
Change-Id: I8af7c44759f172eae77d3519a6eac47110e9b1e7
* The linter write code for the linter_tag and linter_template
fields is constrained by the database schema to 32 characters
for the tag field and 255 characters for the template field.
In some anomalous circumstances Parsoid can report tag and/or
template fields in the linter_params object that exceed those
character limits. This code truncates these anomalous strings
to protect the database update code from a length exceeded
error.
Bug: T328979
Change-Id: I057ae2e32a9e1a7735b5300409e5693e8db5c764
* The migrate code is designed to perform a one-time update of
linter_params JSON encoded template and tag information into
the new discrete template and tag text fields for use as
additional search criteria. The function can be restarted if
it is interrupted.
* It now uses configurable batching and sleep times between
batches to allow the database to do other work and replication
to occur without stressing infrastructure.
* The migrate code is currently called only by test code; it needs
to be called once from a maintenance script.
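The batching and restart behaviour described above could look roughly
like the following Python sketch. It is an illustration only: the
callbacks fetch_batch and migrate_row, the "id" key, and resuming from the
last processed id are all assumptions, not the extension's actual
mechanism.

```python
import time


def migrate_in_batches(fetch_batch, migrate_row, batch_size=500,
                       sleep_seconds=1.0, start_after_id=0,
                       sleep=time.sleep):
    """Migrate rows in id order, pausing between batches.

    fetch_batch(after_id, limit) must return up to `limit` rows with
    row["id"] > after_id, ordered by id. Returns (migrated, last_id) so
    an interrupted run can be resumed by passing last_id back in as
    start_after_id.
    """
    last_id = start_after_id
    migrated = 0
    while True:
        batch = fetch_batch(last_id, batch_size)
        if not batch:
            return migrated, last_id
        for row in batch:
            migrate_row(row)
        migrated += len(batch)
        last_id = batch[-1]["id"]
        # Give the database and replication time to catch up.
        sleep(sleep_seconds)
```

Making the batch size and sleep time parameters mirrors the configurable
values mentioned above; the defaults here are placeholders.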
Bug: T175177
Change-Id: Idc4ca88d4762bc7a3bcbc4e66c0f275562083867
* Migrates namespace info from the page table's page_namespace field
to the new linter table field linter_namespace. This duplication
of the namespace value was requested to greatly reduce the amount
of database activity required by the linter search and reporting
code.
* This patch has been prepared as a dark launch patch enabled with
config value LinterMigrateNamespaceStage and assumes that the
Linter table has had the linter_namespace column added to it,
and recording of the namespace field is already enabled and is
populating the namespace column.
* The migrate code is now runnable from the Linter/maintenance
directory, using migrateNamespace.php, which will be deployed in a
separate patch. The maintenance code creates an appropriate
environment to call migrateNamespace() in Database.php.
Bug: T299612
Change-Id: I73cb80729d6a5a8716fe93164ad1e42e6958d672
* Add "mw-blank" as another tag value that erases all lint errors
for a page as a blank page cannot have any lint errors.
Bug: T280193
Change-Id: Iaad8ce75950588b2676de5dfb5f5221d64231f0e
* Determines if new content type is not wikitext and if so
deletes all existing lint error records for that pageID.
Bug: T298343
Change-Id: I20fac9a0c901f3e7a5cc898566a4487fbe70798f
WikiPage::factory() is deprecated since 1.36 and should be replaced
with WikiPageFactory::newFromTitle().
Bug: T297688
Change-Id: I63bf3ba1c2ad6f8b59d369d91777af0418746a6b
* Adapted the user, title and page creation code from other core
PHPUnit tests to avoid creating a mock title, so that the job
runner finds the page (title) in the database and runs the job
without the hackery of populating the title in the constructor
of RecordLintJob. When getForPage() runs, it finds the page and
its lint errors through the standard code paths.
Bug: T225337
Change-Id: Ibb57523ee2f066c7bd0465c14f0dcb2bab51286b
Currently we select 20 rows, and return the accurate count if it's less
than that, so up to 19 rows. Since we want to return an accurate count
if it's 20 rows or less, select one more row, 21, so we can differentiate
between only having 20 result rows or hitting the limit. This is the same
technique used in MediaWiki's Pager system.
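The "select one more row than the limit" technique can be sketched as
follows. This is a Python illustration with hypothetical callbacks
(fetch_rows and estimate), not the extension's PHP code:

```python
MAX_ACCURATE_COUNT = 20


def count_or_estimate(fetch_rows, estimate):
    """Return an exact count up to MAX_ACCURATE_COUNT, else an estimate.

    fetch_rows(limit) returns at most `limit` rows. Asking for one extra
    row distinguishes "exactly 20 rows" from "more than 20 rows".
    """
    rows = fetch_rows(MAX_ACCURATE_COUNT + 1)
    if len(rows) <= MAX_ACCURATE_COUNT:
        return len(rows)
    return estimate()
```

With a limit of 20, a result of 20 rows would be ambiguous; with a limit
of 21, only a full 21-row result means the limit was hit.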
Change-Id: I50fa96238eb4c7178414ee92c53799fd69520926
* The code now produces an accurate count if the number of
errors for a category is below the threshold set by a
public constant MAX_ACCURATE_COUNT (currently 20).
The database record count limit was originally set to 1, to
determine accurately whether there were actually 0 errors in a
category, as the estimate code would never report 0. If not 0,
it would use the estimated count, which does not produce an
accurate count for any other number of errors. For low error
counts this is annoying to editors and unnecessary. The
additional CPU/disk activity to accurately check for low error
counts is not significantly more than checking for 0 or 1, as
checking for 0 likely requires a complete table scan, which is
probably expensive compared to a low count that exits early
when it hits the record limit.
* An improvement to consider is recording the accurate count in
a separate tiny table, and maintaining an accurate count there
which is used in preference to doing the select with row limit
based on say a 30 second TTL, to prevent a stampede of requests
from doing extraneous database operations.
* Added unit test coverage for accurately counting low error
conditions that are lower than the threshold and also verify
that the estimate is inaccurate beyond the error count
threshold.
Bug: T194872
Change-Id: I4f74cfe3bf9601baa0dc8fa6464a68030ac2bc4b
No integration needed.
Requires bumping the minimum MediaWiki version to 1.34, when
MediaWikiUnitTestCase was introduced.
Change-Id: Ibc0a1028cc61a7bdc149081aeaa1109de18ee119
"Using assertContains() with string haystacks is
deprecated and will not be supported in PHPUnit 9.
Refactor your test to use assertStringContainsString()
or assertStringContainsStringIgnoringCase() instead."
Change-Id: I88df8a91660eb332a0ec87070eff31cfcf8c4955
The following sniffs are failing and were disabled:
* MediaWiki.Commenting.FunctionComment.MissingDocumentationPrivate
Additional changes:
* Also sorted "composer fix" command to run phpcbf last.
Change-Id: Icdd0d0e60dd543921a5757162548ae149c3316ea
This eases deployment dependencies by allowing Parsoid to supply an
appropriate database category ID so that new lint categories can be
appropriately stored during the interval between adding a new lint
category to Parsoid and deploying an Extension:Linter patch to
describe it.
Change-Id: Ib7b2342168fa53ca2abac7d5f54fe313be341eb7
This test previously wasn't running because the foreach() in the data
provider was totally wrong.
Also the -details variant for fostered isn't supposed to exist, so
hardcode in an exception.
Finally, add the @coversNothing annotation since this test is just
verifying the contents of en.json, not any PHP code.
Change-Id: I7ffffcc3a910aefb082f7ff59265d3be8bc46347
The following sniffs are failing and were disabled:
* MediaWiki.FunctionComment.Missing.Protected
* MediaWiki.FunctionComment.Missing.Public
Change-Id: I96e32df48d13040893bfd1be6d90d0db4f7c7d0a
The query itself is too expensive to be run on large Wikimedia wikis. So
put it behind WAN cache and touch the check keys for each category
whenever those have errors added or deleted from them.
If this happens to get out of sync, it will get fully refreshed
regularly when the totals are sent to statsd.
WANObjectCache's 'lockTSE' feature will help avoid cache stampedes that
made this query expensive in the past.
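The check-key idea can be illustrated with a small in-memory model. This
Python sketch is not WANObjectCache: the class and method names are
hypothetical, it uses a logical counter instead of timestamps, and it
models neither TTLs nor 'lockTSE'.

```python
import itertools


class CheckKeyCache:
    """Toy cache where touching a check key invalidates dependent values."""

    def __init__(self):
        self._clock = itertools.count(1)  # logical clock, strictly increasing
        self._values = {}       # key -> (stored_at, value)
        self._check_times = {}  # check key -> touched_at

    def touch_check_key(self, check_key):
        # Called whenever errors are added to or deleted from a category.
        self._check_times[check_key] = next(self._clock)

    def get_with_set_callback(self, key, check_key, compute):
        entry = self._values.get(key)
        touched_at = self._check_times.get(check_key, 0)
        if entry is not None and entry[0] > touched_at:
            return entry[1]  # cached value is newer than the last touch
        value = compute()    # the expensive query runs only on a miss
        self._values[key] = (next(self._clock), value)
        return value
```

The expensive query only runs again after the check key for that
category has been touched.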
Change-Id: I3774103a29fa0f29d36283950f136259fa71bffe