43 commits

Author SHA1 Message Date
wfan 8bbcae9355 Migrate MediaWiki.pdfhandler.shell.retrieve_meta_data.rate to statslib
Bug: T359370
Change-Id: I12f5bddf1007625fd4a8e62a7361239263d6749b
2024-05-05 14:01:53 +00:00
Derk-Jan Hartman f87fc5a6ad Improve logging for Pdf's retrieveMetadata.sh
- Don't check for file execution, but for command execution. This way
  pdfinfo and pdftotext work without specifying the path
- Only pipe the stdout content of the commands to the output files
- Exit as failure when the pdfinfo command is available, but its
  execution failed
- Check and log the error output of retrieveMetadata.sh

Bug: T299521
Change-Id: Ia072469f4df6cce51793ab48823c7f4e4e13997b
2024-03-16 09:37:34 +00:00
jenkins-bot a9945906c7 Merge "Centralise configuration of Shellbox /bin/sh location" 2024-01-19 01:32:33 +00:00
Tim Starling 1e1e7ef135 Centralise configuration of Shellbox /bin/sh location
Use $wgShellboxShell, introduced for this purpose in
If41a2baada2e4e2462518c1f437af458feb29632.

Change-Id: Ic35598f26a868624a8b2e37ab064a6c0c27e392f
2024-01-19 11:36:47 +11:00
Umherirrender 11a07899fa Use namespaced PoolCounterWorkViaCallback class
This requires 1.42 for some new names

Done automatically via script

Change-Id: Ia272567b9ca6d178d35910b2129491b803ed7f65
2024-01-05 22:41:23 +01:00
Timo Tijhof a43a11136a PdfHandler: Set cache with clearer key group and finite expiry
* The key group should generally reflect a single logical key,
  usually a single getWithSet call, with the rest being variants
  of that key, e.g. not a namespace.

* Give it an explicit expiry. I noticed this issue by reviewing the
  remaining data stored on WMF's mainstash-redis instances.

  There were some old file-pdf:* keys there from long ago. They were
  able to survive this long due to not having an expiry set.

  Ref https://phabricator.wikimedia.org/T280586#8168908
  Ref https://phabricator.wikimedia.org/T314453#8168858

Change-Id: Ibad24c0ac7d1e7f965227f49320814b96375858e
2022-09-02 15:18:23 +02:00
Derk-Jan Hartman b846970ae2 Use the PDF cropbox for rendering
By default the mediabox is used. This is the full potential area of
pages, as also used by PDF editors, and can contain areas outside of
the page.
The cropbox is also the size that is reported by pdfinfo as the
pagesize.

Bug: T167420
Change-Id: I92267a9dbe81b6e0e471b8eae1e4c2ba4e5d84e9
2022-06-15 18:39:35 +00:00
Umherirrender 42c1ee7c0a build: Remove unneeded suppression after ImageHandler::normaliseParams
Depends-On: I8d14e49340d30c56642422cd88169019dd2c4286
Change-Id: Ie4561840fe70e0ef92e8eec1c6edc335f0da6c1f
2022-04-06 21:32:32 +02:00
zoranzoki21 68b4048e0e Fix excluded PHPCS error MediaWiki.Commenting.FunctionComment.MissingDocumentationProtected
Change-Id: Iba99ebfc6e96bdfbd61f2a8151c54f1e50272d26
2022-03-06 13:23:43 +00:00
Umherirrender 44e2d41b8f build: Disable phan option scalar_implicit_cast and make pass
Change-Id: I30c11acd17334f292c7a83dba49cf59d182728f3
2021-09-26 12:18:33 +00:00
Umherirrender 85fbb12dbe Remove unneeded check for return value of explode
explode returns an array with one item,
but the empty string is already checked before the explode

Change-Id: I441309978b25754bad04eeba69993913de4d48c3
2021-09-26 11:46:08 +00:00
Kunal Mehta b253dc04c4 Port retrieveMetaData to BoxedCommand
Combine all 3 shellouts into one script, retrieveMetaData.sh.

The script is executed by /bin/sh by default; this can be changed for
Windows users by setting $wgPdfHandlerShell.

pdftotext is a bit special since its behavior varies based on the
program's exit code, so save that in a file so we can check it
independently of the overall exit status.

Bug: T289228
Change-Id: I29750bcc282bd5f9b8e2f79aa340869738ea5f5b
2021-09-20 10:28:27 -07:00
Reedy 7d6a851e1d Code tweaks
* Some better variable names
* Remove some temporary variables

Change-Id: I71ac904c43449891e34769e6dfdc271fe91ef865
2021-09-19 01:56:26 +01:00
Kunal Mehta bbb15eb87c Remove questionable PdfHandler::isEnabled() implementation
The variables are set to default values in extension.json, so
someone would have to manually set them to null values to trigger
this check. At that point, surely you'd just disable the extension.

A check like is_executable() might've made sense, however with the
introduction of Shellbox, it would be totally fine for the binaries
to not exist on the host where MediaWiki is running, but only in the
container. So just removing the check seems like the most
straightforward thing to do.

Bug: T289228
Change-Id: I5da0625959fdfa01c36c955c82320dbc591b3f23
2021-09-03 01:07:49 -07:00
Kunal Mehta 2328f802b1 Move pdfhandler.messages module to extension.json
These messages have not changed since they were originally defined
in f4f87ceb (2015). Instead of using a hook to register the module
at runtime, put it in extension.json with a note next to the warning
configuration to keep the messages in sync.

Change-Id: I135bf1a9f2cd59926a40cc565e5c8a2a6f2483c0
2021-09-03 00:59:19 -07:00
jenkins-bot 4e38d0ed99 Merge "Remove $wgPdfCreateThumbnailsInJobQueue" 2021-06-29 00:47:10 +00:00
Tim Starling 86df1cd6c6 Fix broken PDF XMP extraction
XMP extraction does not work for me with libpoppler 0.86, because when
the output of the two commands is concatenated, there is no "Metadata:"
prefix introducing the XMP. It ends up splitting every line of the XML
on colon characters in attribute names, spamming lots of little
properties into the final result.

I can confirm that it's also broken in production.

So, just treat the output of pdfinfo -meta as plain XML.

Change-Id: Ia3df17daed0f27e95294b5d97872ec064c79965c
2021-06-11 15:57:04 +10:00
Tim Starling 989b42b8eb Use the new metadata splitting facility to improve PDF performance
* Migrate to the new metadata system: override getSizeAndMetadata()
* Use getHandlerState() instead of a custom property on the File object.
* Opt in to metadata splitting. Avoid loading the text item unless it is
  really needed.
* In getDimensionInfo(), use getHandlerState() instead of the
  WANObjectCache process cache (pcTTL). This is just a
  micro-optimisation, informed by profiling, which showed 90 calls to
  this function during an image page view.

Depends-On: I876ea5c9d3a1881e278f689d2f8a3ae20240c703
Change-Id: I30d0b0009fcb11c14d14663bd1f2c2a3dfac55d6
2021-06-11 15:56:57 +10:00
Tim Starling 06fec68e08 Remove $wgPdfCreateThumbnailsInJobQueue
The feature is nonfunctional due to the page count always being zero
when the hook is called. The core feature $wgUploadThumbnailRenderMap
can be used as a replacement, after I add multipage support to it.

Bug: T284416
Change-Id: Id83a6a148f1ca12f1399b5e11951a9d80afb5c2d
2021-06-07 14:56:34 +10:00
libraryupgrader d82137564f build: Updating dependencies
composer:
* mediawiki/mediawiki-codesniffer: 35.0.0 → 36.0.0
  The following sniffs now pass and were enabled:
  * MediaWiki.Commenting.PropertyDocumentation.MissingDocumentationPublic

* php-parallel-lint/php-parallel-lint: 1.2.0 → 1.3.0

npm:
* grunt: 1.3.0 → 1.4.0
* lodash: 4.17.19 → 4.17.21
  * https://npmjs.com/advisories/1673 (CVE-2021-23337)

Change-Id: I1afc4814a0d4645c4ff08f9c6845f0bbc2353900
2021-05-12 14:12:43 +00:00
Umherirrender 5740d02758 Use ::class for class name
This also works for non-existing classes,
because it is resolved at compile time

Change-Id: I789df6994b22529a8bc09123369dae0f1b52d565
2021-04-08 21:02:46 +02:00
vladshapik c73192e23d Avoid using User::getDefaultOption
Stop using User::getDefaultOption, since this method will be hard-deprecated. It is currently soft-deprecated.

Bug: T276035
Change-Id: I6b489dc7236998bcfee6fa136167c3712757dd39
2021-03-01 12:25:47 +02:00
Reedy f375ff3bde Code cleanup
Change-Id: I8eaba727c73560eadb11ae471853e5cedc547809
2021-02-11 20:33:34 +00:00
Reedy 5f0b70972d Namespace extension
Change-Id: I1e80a32a71e4b15d38e1e91b866dbcca848f188c
2021-02-11 04:14:37 +00:00
libraryupgrader 85b38761df build: Updating mediawiki/mediawiki-phan-config to 0.10.4
Change-Id: Id8810e6914149d0163aa28ee0eb99cd68b7752ce
2020-11-20 13:24:35 +00:00
C. Scott Ananian f0799bec7b Don't try to format pdf-specific metadata as numbers
Bug: T266677
Depends-On: I184a7976f2e63f2e70a87257d7749af688659c9d
Change-Id: I80ba13af986859f8f2d751d320a0fcfc73f1672c
2020-10-30 12:44:11 -04:00
Reedy e52d6252ea Fix PSR12.Properties.ConstantVisibility.NotFound
Bug: T253169
Change-Id: I99eeade79dc53eb0b7f00f4304ec0526524943b6
2020-05-20 00:36:48 +01:00
Umherirrender e0d1ad38f3 Use MediaWikiServices::getRepoGroup
This required MediaWiki 1.34

Change-Id: I5d2a2078755a63d6209aaa3d884c7f05718ee819
2020-03-14 14:46:27 +01:00
jenkins-bot 7f04c083c2 Merge "Send ghostscript errors to stderr instead of stdout" 2020-03-12 20:27:37 +00:00
libraryupgrader d082c93d2d build: Updating mediawiki/mediawiki-codesniffer to 29.0.0
Additional changes:
* Also sorted "composer fix" command to run phpcbf last.

Change-Id: I62df724a7c5a7d01dc02c4be7c43c953a9687f69
2020-01-14 09:01:59 +00:00
Umherirrender 6bc6eff1e3 Revert explict casts and use implict casts as before
This reverts commit df484dbe70.

Bug: T242517
Change-Id: I60adf4aa64586d457a32cb220b1fcd7518d32a5e
2020-01-12 09:06:35 +00:00
libraryupgrader df484dbe70 build: Updating mediawiki/mediawiki-phan-config to 0.9.0
Additional changes:
* Added .eslintcache to .gitignore.

Change-Id: I51c91ac0d00d272a93162528a5ee16096def0881
2019-12-28 19:03:34 +00:00
Umherirrender 3cfaa49fef build: Updating mediawiki/mediawiki-phan-config to 0.8.0
Bug: T235049
Change-Id: Ie482803032eb2682b165525c4d418d89e64e43c5
2019-11-04 18:51:59 +01:00
Seb35 3182cba012 Send ghostscript errors to stderr instead of stdout
According to ghostscript developers the parameter -sstdout should
be after -sOutputFile.

Bug: T50007
Change-Id: I13fd25ada571aee9eb793cd6e195a04eb86bce63
2019-10-22 16:32:17 +02:00
Derick Alangi 51185ca9cb Avoid usage of deprecated ObjectCache::getMainWANInstance()
The replacement service was made available in 1.28, and this extension
requires 1.32, so the replacement is safe.

Change-Id: I7939726f5a1d516f17e416bec1999faab95db806
2019-07-03 14:32:32 +01:00
Kunal Mehta 16abfa4af8 Upgrade to newer phan
Bug: T216935
Change-Id: I31b3dd55ffe1d6d5532d25081ac0b2c1ce467237
2019-03-16 22:17:42 -07:00
Reedy 0affe889db Update MediaWiki namespaced AtEase global functions
Change-Id: Icf009b46a2c63ebba6ff94ddd66c5b27129a3a4f
2019-02-13 00:24:51 +00:00
Umherirrender 8a7814ba47 Add method scope visibility
Change-Id: If65af857042ee67122b2bc623176efb177cbe0bb
2018-11-01 21:44:19 +01:00
Kunal Mehta 57b8c36e38 Use librarized XMPReader class and minor cleanup
* Use ::isSupported() instead of checking for a specific function manually
* Remove mention of the XMPGetInfo hook, which was removed in 4feb2ac7f2224d

Depends-On: Ic9044bf3260d1a474a6c74844949602441ffc865
Change-Id: I4333d427a2039aaffb897a1f41504b74d60c3c8b
2018-05-31 19:49:15 -07:00
Brion Vibber 8c345b2784 Fix for pdfinfo changes in poppler-utils 0.48
PDF metadata querying was done with pdfinfo's "-meta" and "-l" options
at the same time, which was supported in poppler 0.26 but not in
poppler 0.48.

Upstream change: https://bugs.freedesktop.org/show_bug.cgi?id=96801

Local change is to run the two as separate commands, then send the
output together into the existing processing. Should work with older
poppler-utils on Jessie as well as current one on Stretch.

Bug: T117839
Bug: T193200
Change-Id: Ib4ee9cf12ac04304c576087727eff5dc521ae751
2018-04-26 15:37:52 -07:00
Kunal Mehta 484c07ca62 Fix PhanTypeMismatchArgumentInternal error
Change-Id: I46fa1de4fa55add4e19db15d39233ec3d73eab5e
2018-02-24 17:57:17 -08:00
Kunal Mehta 0a049abdcd Add phan configuration
This required updating ThumbnailImage constructors to the new call
signature.

Change-Id: Ia04d4dd523e1778992dcd5f45e9d3126649369c1
2018-02-24 16:43:55 -08:00
Kunal Mehta b89ddbca99 Move classes to includes/
Change-Id: I4ad03611ac644541903897276e8da37c3cfeed8b
2018-02-24 16:43:51 -08:00