Commit graph

59 commits

Author SHA1 Message Date
C. Scott Ananian e6a510fbed Use ParserOutputAccess for LintUpdate job
This avoids a duplicate parse with DiscussionTools (T376325) and also
reduces some redundancy by using the metrics-gathering code from
ParserOutput instead of having to clone it here.  Finally, it allows
the parse to use the output of a previous parse for selective
update.

Bug: T376325
Follows-Up: I64a4556a74da4f735a5b562070c21310ecda36d1
Change-Id: I11386e307caaa9fce34870b08bd4dce4c5e6eb25
2024-10-02 20:06:15 -04:00
C. Scott Ananian 0937838f1e Collect selective update statistics from LintUpdate job
This ensures that all Parsoid parses are accounted for in our
statistics.  In the future we might want to query the cache for
an existing 'dirty' parse in this codepath to potentially allow
for selective update, but for now assume that selective updates
are not possible here.

Bug: T371713
Depends-On: I5b8c7ab48d5a1d6c1e311149fcac6abdc523aa13
Change-Id: I391e928175f60a1ff2e5c181e20ed72efe4dfd66
2024-09-19 14:00:48 -04:00
C. Scott Ananian ba41d323f9 LintUpdate: use content handler instead of directly invoking ParsoidParser
We don't need to directly handle the ParsoidParserFactory in the
LintUpdate job; use the existing ContentHandler pathways to reduce
dependencies.

Change-Id: I64a4556a74da4f735a5b562070c21310ecda36d1
2024-09-17 16:44:37 -04:00
Arlo Breault ed8e449e13 Drop disabled lints
Covered by RecordLintJobTest::testDropInlineMediaCaptionLints

Change-Id: I564389ec9bd20cf36ec7a9bf96b1aebf7777cbbc
2024-07-25 11:29:36 -04:00
sbailey 0dfaa5523e Remove linter tag and template dual mode config and code
* Removed the write and user interface config variables and
  fixed the tests affected by their removal.

Bug: T331883
Change-Id: If44ceedae7278f498158b8cdd528dfa32be609eb
2024-06-14 15:40:47 -04:00
sbailey 72653441b2 Remove linter namespace field dual mode config and code
* Manual tests completed and query code reviewed

Bug: T331883
Change-Id: Ie1628799bb40ad74a24ab57a27a4176c2364fb82
2024-06-14 09:29:07 -07:00
daniel 8b22ad5d78 Trigger Parsoid run when page metadata is being updated
When RESTBase is turned off, Parsoid runs will no longer be triggered
on template changes. This creates a new mechanism to do that, based on
the RevisionDataUpdates hook called by DerivedPageDataUpdater. The new
behavior is controlled by a feature flag, LinterParseOnDerivedDataUpdate,
which is enabled by default. In WMF production, this should be
turned off as long as we are still triggering Parsoid parses through
the pregeneration mechanism in RESTBase.

Note that this will not write ParserOutput to the ParserCache. On edits,
pages will get parsed with Parsoid twice, once to trigger the lint data
update, and once by ParsoidCachePrewarmJob to populate the ParserCache.
Both parses will trigger the ParserLogLinterData hook; the lint data
from the second parse is redundant.

However, while ParsoidCachePrewarmJob and RevisionDataUpdates get
triggered together on edits, they also get triggered separately:
ParsoidCachePrewarmJob by page views with parser cache misses; and
RevisionDataUpdates when pages get invalidated due to template changes.

Because ParsoidCachePrewarmJob and RevisionDataUpdates generally get
triggered in different situations, it seems cleaner to keep the two
mechanisms independent of each other, and live with the duplicate parse
on edit.

Bug: T361013
Change-Id: If53841ee583ce240dd245d640b9ea9c97e1eaa55
2024-06-03 16:50:17 -05:00
Arlo Breault 261339c2a3 Inject Database into TotalsLookup
Change-Id: I01e6b89b4ce9b1cea241bba9cad7ef6673803166
2024-04-11 12:24:42 -04:00
Arlo Breault ffc266eae6 Drop DatabaseFactory, just have Database as the service
Change-Id: Id25271c82bc7ba833d32dff3fb11d3dfe15a3f02
2024-04-10 21:21:40 -04:00
Arlo Breault c04b075858 Stop constructing Database with a page id
Instead, pass the page id when using methods for a page.  The change
avoids constructing Database with a dummy page id when those methods aren't
going to be used.

getFromId doesn't seem like it needs a page id, since the linter id is
the primary key.

Also, a namespace id should no longer be optional to setForPage.  The
LinterWriteNamespaceColumnStage option already gates whether to include
it in the row.

Follows-Up: I9fd6e7724dcf33be0b1feb19ec8eb448738cab09
Change-Id: Ib3d3622144b670ebe1a4ce04e6db6811584d42c8
2024-04-10 21:07:08 -04:00
Arlo Breault 1c53684200 Construct services with ServiceOptions
And addresses some other cleanup from review comments.

Follows-Up: I9fd6e7724dcf33be0b1feb19ec8eb448738cab09
Change-Id: If87b0bf91930f0f8d89ed046d18aadb8f346f9aa
2024-04-10 12:34:05 -04:00
C. Scott Ananian 4f991b5d0c [DI] Clean up LintErrorsPager
Inject the services required by LintErrorsPager from the SpecialLintErrors
class.

Change-Id: Ie20e00cccef895fbad8536a94dfc1978f20c4220
2024-04-09 18:35:34 -04:00
C. Scott Ananian 633d6024a4 [DI] Make TotalsLookup an injectable service
Change-Id: I71d41ca5b0a901afd59950b3539d8e19c4cead5f
2024-04-09 18:35:32 -04:00
C. Scott Ananian 24f771a6a3 [DI] Make CategoryManager and Database injectable services
Change-Id: I9fd6e7724dcf33be0b1feb19ec8eb448738cab09
2024-04-09 18:33:13 -04:00
C. Scott Ananian fde916fff5 [DI] Use dependency injection for RecordLintJob
Change-Id: I3b8cd95e075af92c77a7dec4f12a0a81eab3ae4b
2024-04-04 21:42:10 -04:00
C. Scott Ananian d8970278d1 [DI] Use dependency injection for SpecialLintErrors
Change-Id: I211d70d5fb4a321cf302cc10f6e160480468a347
2024-04-04 18:43:10 -04:00
Tim Starling 4dd75df2e8 Fix index usage when searching for page titles
When searching for a specific page title, it's necessary to specify
page_namespace, not just linter_namespace, so that the relevant index in
the page table can be used.

Submitting the form with an empty namespace box led to a search for
namespace zero, because getCheck() returns true for an empty string.
It's not easy to search for a title part in all namespaces. So drop
that hidden feature and interpret a title part with a missing namespace
as being a search for namespace 0.

It's possible to search for a category with an empty title and zero or
more namespaces. Implement the namespace filter in this case using the
linter_namespace field. But ignore the namespace filter if there is no
category, since there is no index on linter_namespace alone.

Bug: T360865
Change-Id: I00934eaaf1a99e4098f177166b43069d33d9f137
2024-03-27 11:44:59 +11:00
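
The filter rules described in the commit above (4dd75df2e8) can be read as a small decision table. The following is an illustrative Python sketch, not the extension's PHP code: `build_conditions` and the dictionary keys `category` and `title_prefix` are hypothetical names, and the handling of several selected namespaces together with a title is an assumption, since the log only spells out the "missing namespace means 0" case.

```python
from typing import Dict, List, Optional


def build_conditions(
    title_prefix: str,
    namespaces: List[int],
    category: Optional[str],
) -> Dict[str, object]:
    """Return query conditions following the rules in the commit message."""
    conds: Dict[str, object] = {}
    if category is not None:
        conds["category"] = category
    if title_prefix:
        # A title search must constrain page_namespace so that the index
        # on the page table can be used; a missing namespace means 0.
        # (Assumption: the sketch sets only page_namespace here.)
        conds["page_namespace"] = namespaces if namespaces else [0]
        conds["title_prefix"] = title_prefix
    elif category is not None and namespaces:
        # Category search with an empty title: filter on linter_namespace.
        conds["linter_namespace"] = namespaces
    # No title and no category: the namespace filter is ignored, because
    # there is no index on linter_namespace alone.
    return conds
```

For example, a title prefix with no namespace selected yields a search in namespace 0, while a namespace selection with neither title nor category yields no namespace condition at all.
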
sbailey dd0836d232 Implement multiple namespace selection for Linter filters
* Using namespacesmultiselect type in HTMLForm element to
   provide multiple namespace selection criteria in reports.

 * New namespace URL encoding implemented which matches namespace
   parameters against active namespaces to ensure parameter
   security and validation.

 * Test system updated to use new URL namespace encoding

Bug: T231161
Change-Id: Ic3190cffe259aecdea429c10e35122eabdbe10d4
2023-10-26 10:39:57 -07:00
Daimona Eaytoy 0ba87557d5 tests: Simplify creation of fixtures
Use the methods provided by MediaWikiIntegrationTestCase to more simply
create an existing test page. The manual user creation could be replaced
by getTestUser, but it turns out these tests don't really need a
specific account, so we can let IntegrationTestCase use whatever account
it wants to use.

Also remove @var annotations that can easily be inferred by the doc
comment on the relevant methods.

Change-Id: I8bfd0799b79721c7c9de0d2a10a97c498d192a15
2023-09-04 19:46:53 +02:00
Bartosz Dziewoński 8e5d85e6ee SpecialLintErrorsTest: Add @group Database
Several tests seem to require it.

Change-Id: I822e9185133edd6ab0e45dbd8d0d1cf17312e932
2023-09-04 19:21:08 +02:00
gerritbot e181c2ef66 Replace some moved Title class uses, now MediaWiki\Title\Title
Bug: T321681
Change-Id: Id325b25e154b8b2bbd1d0b1d7b1c7830b40873f6
2023-08-19 12:37:07 +00:00
gerritbot e4fafd1cf1 Update moved class FauxRequest
See T321882. Moved in I832b133aaf61ee

Bug: T321681
Change-Id: Ia110be1e079628b30fff5bfcc0a58b0cbf82a372
2023-05-19 10:24:44 +00:00
sbailey 6aa4cdeba9 Linter Tag and Template search feature, UI and report code
* Tag and Template search is enabled using config variable
   'LinterUserInterfaceTagAndTemplateStage' and also checks for
   the linter table column 'linter_tag' to exist to protect the
   report code from error if the column is absent. As the linter
   table alter maintenance added both the linter_tag and
   linter_template at the same time, there is no reason to check
   both. The user interface code does not check for the field's
   presence, only the config variable.

 * This code depends on the recordLintJob code writing the tag
   and template data which is enabled by the config variable
   'LinterWriteTagAndTemplateColumnsStage' and also assumes the
   data migration maintenance script migrateTagTemplate.php has
   been run to populate linter error records created prior to
   the table alter and the write code being enabled.

Bug: T175177
Change-Id: I2f951dfcd34e3dc6ca17e8754cfaeba8baa3e835
2023-02-27 06:55:06 -08:00
sbailey d12bf639f6 Change linter maintenance scripts to use existing config variables
* Having separate config variables to enable the maintenance
   migrateNamespace and migrateTagTemplate scripts is redundant;
   the scripts should share the write-enable config variables.

Bug: T329342
Change-Id: I4cb453fc0678b065cb42a2ca59863da1ab9cdbe4
2023-02-14 09:43:54 -08:00
sbailey 2768a70218 Fix migrate data error when params has excessively long strings
* The linter migrate code for the linter_tag and linter_template
   fields is constrained by the database schema to 32 characters
   for the tag field and 255 characters for the template field.
   In some anomalous circumstances Parsoid can report tag and/or
   template fields in the linter_params object that exceed those
   character limits. This code truncates these excessively long
   strings to protect the database migrate update code from a
   length-exceeded error.

Bug: T329113
Change-Id: I8af7c44759f172eae77d3519a6eac47110e9b1e7
2023-02-09 18:20:46 +00:00
sbailey 07046457f0 Fix write error when linter_params has excessively long strings
* The linter write code for the linter_tag and linter_template
   fields is constrained by the database schema to 32 characters
   for the tag field and 255 characters for the template field.
   In some anomalous circumstances Parsoid can report tag and/or
   template fields in the linter_params object that exceed those
   character limits. This code truncates these anomalous strings
   to protect the database update code from a length-exceeded
   error.

Bug: T328979
Change-Id: I057ae2e32a9e1a7735b5300409e5693e8db5c764
2023-02-08 10:40:12 -08:00
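
The truncation described in the two commits above (2768a70218 and 07046457f0) can be sketched as below. This is a minimal Python illustration, not the extension's PHP code: `truncate_fields` and the two constant names are hypothetical, and only the limits of 32 and 255 characters come from the commit messages. Python slicing counts Unicode code points, which may differ from how a given database column measures length.

```python
from typing import Tuple

MAX_TAG_LENGTH = 32        # linter_tag limit stated in the commit messages
MAX_TEMPLATE_LENGTH = 255  # linter_template limit stated in the commit messages


def truncate_fields(tag: str, template: str) -> Tuple[str, str]:
    """Clamp both values to the column limits before writing them."""
    # Slicing leaves short strings unchanged and cuts long ones.
    return tag[:MAX_TAG_LENGTH], template[:MAX_TEMPLATE_LENGTH]
```
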
sbailey 350d677c5b Phase 3 of T175177: Migrate linter_params into new fields
* The migrate code is designed to perform a one-time update of
   linter_params JSON encoded template and tag information into
   the new discrete template and tag text fields for use as
   additional search criteria. The function can be restarted if
   it is interrupted.
 * It now uses configurable batching and sleep times between
   batches to allow the database to do other work and replication
   to occur without stressing infrastructure.
 * The migrate code is only called by test code and needs to be
   called one-time from a maintenance script.

Bug: T175177
Change-Id: Idc4ca88d4762bc7a3bcbc4e66c0f275562083867
2022-12-09 12:01:06 -08:00
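
The batched, restartable migration described in the commit above (350d677c5b) might look roughly like the following. This is a Python sketch under stated assumptions, not the extension's PHP code: `migrate_params`, its parameters, and the JSON keys `name` and `templateInfo` are hypothetical stand-ins, and an in-memory list of dictionaries replaces the database table.

```python
import json
import time
from typing import Any, Dict, List


def migrate_params(
    rows: List[Dict[str, Any]],
    batch_size: int = 100,
    sleep_seconds: float = 0.0,
    start_after_id: int = 0,
) -> int:
    """Copy tag and template out of linter_params, one batch at a time.

    Returns the last linter_id processed, so an interrupted run can be
    resumed by passing that value as start_after_id.
    """
    ordered = sorted(rows, key=lambda r: r["linter_id"])
    last_id = start_after_id
    while True:
        # Take the next batch of rows beyond the last processed id.
        batch = [r for r in ordered if r["linter_id"] > last_id][:batch_size]
        if not batch:
            return last_id
        for row in batch:
            # Hypothetical key names; a missing value becomes "".
            params = json.loads(row.get("linter_params") or "{}")
            row["linter_tag"] = params.get("name", "")
            template_info = params.get("templateInfo") or {}
            row["linter_template"] = template_info.get("name", "")
        last_id = batch[-1]["linter_id"]
        # Pause between batches so other work and replication can proceed.
        time.sleep(sleep_seconds)
```

Because each row is rewritten with the same values on every pass, running the function again over already migrated rows is harmless in this sketch.
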
sbailey 702ce215d0 Phase 3 migrate code for namespace column add to Linter table
* Migrates namespace info from the page table's page_namespace field
   to the new linter table field linter_namespace. This duplication
   of the namespace value was requested to greatly reduce the amount
   of database activity required by the linter search and reporting
   code.

 * This patch has been prepared as a dark launch patch enabled with
   config value LinterMigrateNamespaceStage and assumes that the
   Linter table has had the linter_namespace column added to it,
   and recording of the namespace field is already enabled and is
   populating the namespace column.

 * The migrate code is now runnable from the Linter/maintenance directory,
   using migrateNamespace.php, which will be deployed in a separate
   patch. The maintenance code creates an appropriate environment
   to call migrateNamespace() in Database.php.

Bug: T299612
Change-Id: I73cb80729d6a5a8716fe93164ad1e42e6958d672
2022-11-28 08:07:54 -08:00
Reedy 89d3f6152b Minor cleanup
Change-Id: I0b8abdbeaece73fe8759ee220b9a3aefce240e68
2022-09-07 02:48:18 +01:00
sbailey b358b20dca Second phase of T175177: Adds template and tag to RecordLintJob
Bug: T175177
Change-Id: I59be7cabb80ace98da3c7f6f36a0d3d4f6b17d23
2022-08-22 12:47:01 -07:00
Arlo Breault 8f043ce7e0 Disable flaky tests
These tests seem to be making false assumptions about the estimate
that EXPLAIN returns.

Change-Id: I8ae90b2173aba5286727b9b85bdb67fbdfee1baf
2022-08-04 12:11:38 -04:00
sbailey 79e825a466 Provide search by title prefix for any category of lint error
Bug: T185685
Change-Id: Ib667fcf5b2b1e752fde297b32b8bbe37dceabc5a
2022-06-16 13:27:14 -07:00
sbailey 6925519cb5 Delete lint errors when blank page saved while changing content type
* Add "mw-blank" as another tag value that erases all lint errors
   for a page as a blank page cannot have any lint errors.

Bug: T280193
Change-Id: Iaad8ce75950588b2676de5dfb5f5221d64231f0e
2022-02-28 15:03:16 -08:00
sbailey 0e56c22277 Delete lint error records when content model changes from wikitext
* Determines if new content type is not wikitext and if so
   deletes all existing lint error records for that pageID.

Bug: T298343
Change-Id: I20fac9a0c901f3e7a5cc898566a4487fbe70798f
2022-02-25 13:22:10 -08:00
Subramanya Sastry 70ffca650e Drop 'inline-media-caption' lint requests
Bug: T297443
Bug: T299302
Change-Id: Id158f1fef8be06ddac733c71b7c1e26a58270955
2022-01-17 12:55:51 -06:00
Alexander Vorwerk 9a1ce6e392 Avoid using WikiPage::factory()
WikiPage::factory() is deprecated since 1.36 and should be replaced
with WikiPageFactory::newFromTitle().

Bug: T297688
Change-Id: I63bf3ba1c2ad6f8b59d369d91777af0418746a6b
2021-12-16 23:00:32 +00:00
Arlo Breault 2fa7a30f14 Remove hardcoded list of categories with no parameters
Change-Id: Ic8b9ced613c873cada0a9909ed0d3799160504a1
2021-12-15 17:27:36 -05:00
Alexander Vorwerk 2b3ca01871 MediaWikiTestCase -> MediaWikiIntegrationTestCase
MediaWikiTestCase has been renamed to MediaWikiIntegrationTestCase in 1.34.

Bug: T293043
Change-Id: I2e76733232bad0201a4e1e97617f5f7c1cf97235
2021-10-12 21:52:50 +02:00
sbailey cc2e08546b Fix broken RecordLintJobTest
* Adapted other core phpunit test user, title and page creation
   code to avoid creating a MOCK title such that the job runner
   finds the page(title) in the database and runs the job without
   hackery of populating the title in the constructor of
   RecordLintJob. When getForPage() runs, it finds the
   page and its lint errors through the standard code paths.

Bug: T225337
Change-Id: Ibb57523ee2f066c7bd0465c14f0dcb2bab51286b
2021-08-11 15:39:49 -07:00
Kunal Mehta 4f4b700fbd Fix off-by-one error around MAX_ACCURATE_COUNT
Currently we select 20 rows, and return the accurate count if it's less
than that, so up to 19 rows. Since we want to return an accurate count
if it's 20 rows or less, select one more row, 21, so we can differentiate
between only having 20 result rows or hitting the limit. This is the same
technique used in MediaWiki's Pager system.

Change-Id: I50fa96238eb4c7178414ee92c53799fd69520926
2021-08-06 13:05:29 -07:00
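
The limit-plus-one technique from the commit above (4f4b700fbd) can be sketched as follows. This is an illustrative Python sketch, not the extension's PHP code: `count_or_estimate`, `fetch_rows`, and `estimate` are hypothetical names, and only the threshold of 20 comes from the log.

```python
from typing import Callable, List

MAX_ACCURATE_COUNT = 20  # threshold named in the commit log


def count_or_estimate(
    fetch_rows: Callable[[int], List[int]],
    estimate: Callable[[], int],
) -> int:
    """Return an exact count when at most MAX_ACCURATE_COUNT rows exist.

    fetch_rows(limit) stands in for a SELECT ... LIMIT query and
    estimate() for the database's approximate row count.
    """
    # Ask for one row more than the threshold so that "exactly 20 rows"
    # can be told apart from "more than 20 rows".
    rows = fetch_rows(MAX_ACCURATE_COUNT + 1)
    if len(rows) <= MAX_ACCURATE_COUNT:
        return len(rows)
    return estimate()
```

With a limit of exactly 20, a result of 20 rows would be ambiguous, which is the off-by-one the commit describes; fetching 21 removes that ambiguity.
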
libraryupgrader 577a074b69 build: Updating composer dependencies
* mediawiki/mediawiki-codesniffer: 35.0.0 → 36.0.0
* php-parallel-lint/php-parallel-lint: 1.2.0 → 1.3.0

Change-Id: Ib1e2319da19d8c5589d1d41d3c0fe8f882792721
2021-05-05 06:09:03 +00:00
jenkins-bot 46aa330369 Merge "Make Linter category counts more accurate when counts are low"
2021-04-27 18:00:36 +00:00
sbailey 201b47e01d Make Linter category counts more accurate when counts are low
* The code now produces an accurate count if the number of
   errors for a category is below the threshold set by a
   public constant MAX_ACCURATE_COUNT (currently 20).
   The database record count limit was originally set to 1
   to determine accurately whether there were actually 0 errors
   in a category, as the estimate code would never report 0.
   If not 0, it would use the estimated count which does not
   produce an accurate count for any other number of errors.
   For low error counts this is annoying to editors and
   unnecessary. The additional CPU/disk activity to accurately
   check for low error counts is not significantly more than
   checking for 0 or 1, as checking for 0 likely requires
   a complete table scan which is probably expensive compared
   to a low count that exits early when it hits the record limit.

 * An improvement to consider is recording the accurate count in
   a separate tiny table, and maintaining an accurate count there
   which is used in preference to doing the select with row limit
   based on say a 30 second TTL, to prevent a stampede of requests
   from doing extraneous database operations.

 * Added unit test coverage for accurately counting low error
   conditions that are lower than the threshold and also verify
   that the estimate is inaccurate beyond the error count
   threshold.

Bug: T194872
Change-Id: I4f74cfe3bf9601baa0dc8fa6464a68030ac2bc4b
2021-04-27 10:38:24 -07:00
Kunal Mehta cb9329672a Update Legoktm's email address
Change-Id: Iceef061f4882b83661e5be6a931d85628b566f4c
2021-04-11 19:08:44 -07:00
DannyS712 830f879e22 Convert LintErrorTest to pure unit tests
No integration needed.

Requires bumping minimum version of mediawiki to when
MediaWikiUnitTestCase was introduced in 1.34.

Change-Id: Ibc0a1028cc61a7bdc149081aeaa1109de18ee119
2021-03-27 02:28:25 +00:00
Bartosz Dziewoński 11421eab59 Update for deprecations in PHPUnit
"Using assertContains() with string haystacks is
deprecated and will not be supported in PHPUnit 9.
Refactor your test to use assertStringContainsString()
or assertStringContainsStringIgnoringCase() instead."

Change-Id: I88df8a91660eb332a0ec87070eff31cfcf8c4955
2020-07-09 17:00:41 +02:00
libraryupgrader 210cada8e6 build: Updating mediawiki/mediawiki-codesniffer to 29.0.0
The following sniffs are failing and were disabled:
* MediaWiki.Commenting.FunctionComment.MissingDocumentationPrivate

Additional changes:
* Also sorted "composer fix" command to run phpcbf last.

Change-Id: Icdd0d0e60dd543921a5757162548ae149c3316ea
2020-01-10 10:06:28 +00:00
C. Scott Ananian 551a1fb398 Allow Parsoid to provide category ID hints
This eases deployment dependencies by allowing Parsoid to supply an
appropriate database category ID so that new lint categories can be
appropriately stored during the interval between adding a new lint
category to Parsoid and deploying an Extension:Linter patch to
describe it.

Change-Id: Ib7b2342168fa53ca2abac7d5f54fe313be341eb7
2019-12-03 23:26:34 -05:00
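
The category ID hint mechanism from the commit above (551a1fb398) amounts to a lookup with a fallback. The following Python sketch is only an illustration of that idea: `resolve_category_id`, the `KNOWN_CATEGORY_IDS` table, and the IDs in it are hypothetical and are not taken from the extension.

```python
from typing import Dict, Optional

# Hypothetical table of categories the extension already describes.
KNOWN_CATEGORY_IDS: Dict[str, int] = {"example-category": 1}


def resolve_category_id(name: str, hint: Optional[int] = None) -> Optional[int]:
    """Prefer the locally known ID; otherwise fall back to the hint.

    The hint stands in for the ID supplied by Parsoid for a category
    that the extension does not yet describe.
    """
    if name in KNOWN_CATEGORY_IDS:
        return KNOWN_CATEGORY_IDS[name]
    # Unknown category: store it under the hinted ID, if one was given.
    return hint
```
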
Max Semenik c87c38eb20 tests: getMock() is deprecated
Bug: T192167
Change-Id: I0513626d69ee7fbfac40f3d648865e7bb9e23421
2019-10-21 22:15:34 -07:00
Kunal Mehta 5bea96cb43 Mark RecordLintJobTest as Broken
Change-Id: I3fa2cd4049a3d4ba3065b56343e63ab0b093ee94
2019-06-07 12:27:38 -04:00