Commit graph

127 commits

Author SHA1 Message Date
jdlrobson 00b9f9fdc4 Restrict page images to lead section
When $wgPageImagesLeadSectionOnly is true restrict
the scoring of page images to images in the lead section.
This results in an additional parse of the lead section of the
content, but this should only happen once after an edit
has occurred and is deferred.

If false all images will be considered as candidates for
the page image choice.

Note that the choice between modes is per site not by
request. This is intentional to avoid having to store 4
different properties respecting license and
article position. As a result when enabling or disabling
this switch on existing setups, there will be a transitional
period where pages previously parsed will show pageimages
as calculated by the previous value of this config variable.

Bug: T87336
Change-Id: I09bdae82515f6e93f5606553259f10b3a10e9eaa
2016-12-08 10:41:21 -08:00
Erik Bernhardson 9b20854a59 Wrap waitForReplication in try/catch
Very few of these jobs seem to be finishing, due to some
replicas being so far behind we get DBReplicationWaitError
thrown, which causes the job to be restarted. Catch and log,
but don't stop processing because of it.

There is also a problem with jobs that blow out the memory limits,
but not sure what to do with that.

Change-Id: Idffcfad76936f5e62e9018c58f2cb57db35af4b8
2016-12-07 10:18:53 -08:00
Erik Bernhardson 03e14d0c86 Add job queue option for initImageData maintenance script
Trying to run this script in the cluster fatals out due to memory
problems somewhat regularly. The --start option helps to restart
it where it fell down, but when trying to run against hundreds of
wiki's that is a one-off solution that makes ensuring everything is
actually visited a pain.

To try and isolate errors add an option to push the parsing into the
job queue. There is still the possibility to miss pages, but job queue
retries should take care of us for the most part. Attempts to keep
load down on the databases by making sure no more than a specified
number of jobs are queued/processing at a given time.

Bug: T152155
Change-Id: I3a4e3a415b2f03de0bb36ac0515241e950130fde
2016-12-06 10:56:52 -08:00
Baha 9215a7f9d9 Choose the best image
Follow up to If6cbb82f01fa298945c615a2f25e972a9d767d58
Change-Id: I5e46985bfd22b2f96eb659a3d14137bdadbc4b99
2016-12-03 02:44:11 +00:00
jenkins-bot 7fa933cf00 Merge "Return any image, not just the non-free image" 2016-12-03 01:32:39 +00:00
Baha 250103022d Return any image, not just the non-free image
The mobileview API directly accesses the page image,
bypassing the PageImages API.

I1d35e965dc37c8c4ecdcc43313b3198e951e1978 fixes the issue for
the PageImages API.

This patch fixes the issue for the mobileview API.

Change-Id: If6cbb82f01fa298945c615a2f25e972a9d767d58
2016-12-02 20:29:35 -05:00
jenkins-bot 2cf50b6340 Merge "Convert to new array syntax" 2016-12-02 23:11:38 +00:00
rlot c45564e22f Implement "Help" link for the PageImages Api
Implemented getHelpUrls() function in ApiQueryPageImages.php

Bug: T97835
Change-Id: Ia6416bd67d92809422a54431764602da555236e9
2016-12-02 20:04:09 +01:00
Max Semenik 5145dd3d74 Convert to new array syntax
Change-Id: Iaec0c9ad47d28559adb8c46a82a00a61fba3602d
2016-12-01 16:49:13 -08:00
Erik Bernhardson f6f6bf00e1 Page images return non-free images by default
Page images was updated to have a split between the 'best' page
image, and the best free page image. Unfortunately the deployment
plan didn't take into account that the default 'free' would be
pointing to an unpopulated page prop, which will not be populated
until LinksUpdate has run for every page on every wiki which could
take weeks or months.

To restore some semblance of order, make the default point at the
currently populated field. A followup will need to be done to
populate the appropriate field.

Bug: T152155
Change-Id: I1d35e965dc37c8c4ecdcc43313b3198e951e1978
2016-12-01 23:54:14 +00:00
Brad Jorsch 26a2dab573 Update for API error i18n
See Iae0e2ce3.

Change-Id: Ib87f051a50f3073d5d4e60094036a307bfe198cb
2016-11-29 12:43:18 -05:00
Baha c34a838f73 Fix phpcs warnings and errors
Change-Id: I5c3c685f286fb379c7a1be3d483665cbb43ca803
2016-11-21 18:29:28 -05:00
Baha 382c70f981 Allow querying non-free images too
The API accepts a new query parameter `license`, whose
value can either be `free` or `any`. `free` is the default value.

When the value of `licence` is:
  * `free`, then only the best image whose copyright allows
    reusing it will be returned;
  * `any`, then the best image, regardless of its copyright
    status, will be returned.

Bug: T131105
Change-Id: I83ac5266e382d2d121aff3f7d28711787251c03b
2016-11-21 17:29:25 -05:00
Baha 76108cba37 Add phpunit tests
Bug: T131105
Change-Id: Ib774f18e62f050f48783e5ccccaedc23000533d2
2016-11-16 16:34:19 -05:00
Mukunda Modell c399c4156e Don't attempt to call methods on $file if it's not an object
This fixes a fatal which is blocking deployment of 1.29.0-wmf.1
refs T149059 fixes T149849

Bug: 149849
Change-Id: Ieb61585af3aa60b7af58597091151d0b494b2fd6
2016-11-03 18:36:32 +00:00
Reedy d52f94205f Convert PageImages to extension registration
Bug: T87953
Change-Id: Icc9096060899b9e401d53590f38865b0937a73ff
2016-10-16 20:45:54 +01:00
Thiemo Mättig 4fee70fc1f Remove not needed "return true" from hook handlers
* Do not "return true" in hook handlers. I was told it's good practice
  to either return false on failure, or nothing/null.
* Remove "static" from method that does not need to be static.
* Make some type hints more specific.
* Add missing imports. I wonder how this can work without the imports.
  My PHPStorm complains.

Change-Id: Ia0e980ff99f0e004d700b22dd07ff17f04bed4ec
2016-03-07 12:07:06 +01:00
Petr Matas 5a60c10b1a Thumbnails generated from SVG may be larger than original
Bug: T128278
Change-Id: I2b10ff28389fd9c99f0703d7ae4ee2d63e0bd000
2016-03-02 12:21:00 +00:00
Gergő Tisza 7c78ba622c Weigh images by copyright status
Fetch the extended metadata for every image, and if the NonFree key
(provided by CommonsMetadata) is present, rank down the image.

The check is run on LinksUpdate so changes to image copyright
templates won't have effect until the page is reparsed.

Unlike other weights, the work of extracting the weight factor from
the file/page is wholly done on LinksUpdate, instead of doing most
of it on parse and storing as ParserOutput extension metadata.
This is to avoid getting the article page parsing and the file
description page parsing conflicting with each other.

Bug: T124225
Change-Id: I21111ecbc80ded864806a2a93002f3254c3c68a9
2016-03-02 00:59:02 +00:00
Thiemo Mättig 0e8d85a333 Rename class to plural ParserFileProcessingHookHandlers
As discussed in I7fdc582.

Change-Id: I76ffd85323f2b6b8ad4a22ee4671f0bb5a95456d
2016-02-14 12:21:00 +01:00
jenkins-bot 9f09934913 Merge "Extract ParserMakeImageParams/AfterParserFetchFileAndTitle hook handlers" 2016-02-12 17:29:37 +00:00
Kunal Mehta 59afdff5d5 Don't pass default pref as fallback to User::getOption()
User::getOption() already handles that, and this code bypassed the hook
in User::getDefaultOptions().

Change-Id: I41f9df177988dffd62de0060cb691a97161729e4
2016-01-26 10:07:39 -08:00
Thiemo Mättig fe0da50344 Extract ParserMakeImageParams/AfterParserFetchFileAndTitle hook handlers
This patch only moves existing code around, but does not change any
implementation detail.

I found it very suprising that all code called by these two hook handlers
is 100% exclusive to these hook handlers. There is zero interaction
between these hook handlers code and all other code. Why is it in the
same file then? And why is it all static? It doesn't have to be. I
had to change literally nothing, except cutting and pasting, removing
all "static" and replacing all "self::..." with "$this->...". That's
all.

Change-Id: I7fdc582db425d3b95f7d02934b439eb9c102e712
2016-01-18 13:00:21 +01:00
Thiemo Mättig d9df7178a0 Extract LinksUpdateHookHandler to separate file
This patch only moves existing code around, but does not change any
implementation detail.

I found it very suprising that all code called by this hook handler
is 100% exclusive to this hook handler. There is zero interaction
between this hook handlers code and all other code. Why is it in the
same file then? And why is it all static? It doesn't have to be. I
had to change literally nothing, except cutting and pasting, removing
all "static" and replacing all "self::..." with "$this->...". That's
all.

Change-Id: I5ffe6fdf4e57135e6f3b32636c80f22be758607c
2015-12-30 13:40:48 +01:00
Thiemo Mättig 9b9d7d93c2 Author prefers full name in @author tag
Requested by the author in
https://gerrit.wikimedia.org/r/#/c/253327/1/includes/PageImages.php

Change-Id: I0e48ea1f3e7bc7098901562e7f048849c27e153a
2015-11-17 09:54:24 +01:00
Thiemo Mättig 369eda24b8 Add missing and fix wrong @license tags
Change-Id: I07f77735bb0050e2e59491d3c045b11705f3a08f
2015-11-16 15:59:34 +01:00
Thiemo Mättig 9b65ac4def Organize code in includes/maintenance directories
Change-Id: I3a243074049a5b8b212de5bcd1e341c36e4a13e0
2015-11-12 08:33:43 +01:00