
21 commits

Author SHA1 Message Date
wfan 8bbcae9355 Migrate MediaWiki.pdfhandler.shell.retrieve_meta_data.rate to statslib
Bug: T359370
Change-Id: I12f5bddf1007625fd4a8e62a7361239263d6749b
2024-05-05 14:01:53 +00:00
Derk-Jan Hartman f87fc5a6ad Improve logging for Pdf's retrieveMetadata.sh
- Don't check for file execution, but for command execution. This way
  pdfinfo and pdftext work without specifying the path
- Only pipe the stdout content of the commands to the outputfiles
- Exit as failure when the pdfinfo command is available, but its
  execution failed
- Check and log the error output of retrieveMetadata.sh

Bug: T299521
Change-Id: Ia072469f4df6cce51793ab48823c7f4e4e13997b
2024-03-16 09:37:34 +00:00
Tim Starling 1e1e7ef135 Centralise configuration of Shellbox /bin/sh location
Use $wgShellboxShell, introduced for this purpose in
If41a2baada2e4e2462518c1f437af458feb29632.

Change-Id: Ic35598f26a868624a8b2e37ab064a6c0c27e392f
2024-01-19 11:36:47 +11:00
Umherirrender 44e2d41b8f build: Disable phan option scalar_implicit_cast and make pass
Change-Id: I30c11acd17334f292c7a83dba49cf59d182728f3
2021-09-26 12:18:33 +00:00
Umherirrender 85fbb12dbe Remove unneeded check for return value of explode
explode returns an array with one item,
but the empty string is already checked before the explode

Change-Id: I441309978b25754bad04eeba69993913de4d48c3
2021-09-26 11:46:08 +00:00
Kunal Mehta b253dc04c4 Port retrieveMetaData to BoxedCommand
Combine all 3 shellouts into one script, retrieveMetaData.sh.

The script is executed by /bin/sh by default, it can be changed for
Windows users by setting $wgPdfHandlerShell.

pdftotext is a bit special since its behavior varies based on the
program's exit code, so save that in a file so we can check it
independently of the overall exit status.

Bug: T289228
Change-Id: I29750bcc282bd5f9b8e2f79aa340869738ea5f5b
2021-09-20 10:28:27 -07:00
Reedy 7d6a851e1d Code tweaks
* Some better variable names
* Remove some temporary variables

Change-Id: I71ac904c43449891e34769e6dfdc271fe91ef865
2021-09-19 01:56:26 +01:00
Tim Starling 86df1cd6c6 Fix broken PDF XMP extraction
XMP extraction does not work for me with libpoppler 0.86, because when
the output of the two commands is concatenated, there is no "Metadata:"
prefix introducing the XMP. It ends up splitting every line of the XML
on colon characters in attribute names, spamming lots of little
properties into the final result.

I can confirm that it's also broken in production.

So, just treat the output of pdfinfo -meta as plain XML.

Change-Id: Ia3df17daed0f27e95294b5d97872ec064c79965c
2021-06-11 15:57:04 +10:00
Tim Starling 989b42b8eb Use the new metadata splitting facility to improve PDF performance
* Migrate to the new metadata system: override getSizeAndMetadata()
* Use getHandlerState() instead of a custom property on the File object.
* Opt in to metadata splitting. Avoid loading the text item unless it is
  really needed.
* In getDimensionInfo(), use getHandlerState() instead of the
  WANObjectCache process cache (pcTTL). This is just a
  micro-optimisation, informed by profiling, which showed 90 calls to
  this function during an image page view.

Depends-On: I876ea5c9d3a1881e278f689d2f8a3ae20240c703
Change-Id: I30d0b0009fcb11c14d14663bd1f2c2a3dfac55d6
2021-06-11 15:56:57 +10:00
Reedy f375ff3bde Code cleanup
Change-Id: I8eaba727c73560eadb11ae471853e5cedc547809
2021-02-11 20:33:34 +00:00
Reedy 5f0b70972d Namespace extension
Change-Id: I1e80a32a71e4b15d38e1e91b866dbcca848f188c
2021-02-11 04:14:37 +00:00
C. Scott Ananian f0799bec7b Don't try to format pdf-specific metadata as numbers
Bug: T266677
Depends-On: I184a7976f2e63f2e70a87257d7749af688659c9d
Change-Id: I80ba13af986859f8f2d751d320a0fcfc73f1672c
2020-10-30 12:44:11 -04:00
Umherirrender 6bc6eff1e3 Revert explict casts and use implict casts as before
This reverts commit df484dbe70.

Bug: T242517
Change-Id: I60adf4aa64586d457a32cb220b1fcd7518d32a5e
2020-01-12 09:06:35 +00:00
libraryupgrader df484dbe70 build: Updating mediawiki/mediawiki-phan-config to 0.9.0
Additional changes:
* Added .eslintcache to .gitignore.

Change-Id: I51c91ac0d00d272a93162528a5ee16096def0881
2019-12-28 19:03:34 +00:00
Umherirrender 3cfaa49fef build: Updating mediawiki/mediawiki-phan-config to 0.8.0
Bug: T235049
Change-Id: Ie482803032eb2682b165525c4d418d89e64e43c5
2019-11-04 18:51:59 +01:00
Kunal Mehta 16abfa4af8 Upgrade to newer phan
Bug: T216935
Change-Id: I31b3dd55ffe1d6d5532d25081ac0b2c1ce467237
2019-03-16 22:17:42 -07:00
Umherirrender 8a7814ba47 Add method scope visibility
Change-Id: If65af857042ee67122b2bc623176efb177cbe0bb
2018-11-01 21:44:19 +01:00
Kunal Mehta 57b8c36e38 Use librarized XMPReader class and minor cleanup
* Use ::isSupported() instead of checking for a specific function manually
* Remove mention of the XMPGetInfo hook, which was removed in 4feb2ac7f2224d

Depends-On: Ic9044bf3260d1a474a6c74844949602441ffc865
Change-Id: I4333d427a2039aaffb897a1f41504b74d60c3c8b
2018-05-31 19:49:15 -07:00
Brion Vibber 8c345b2784 Fix for pdfinfo changes in poppler-utils 0.48
PDF metadata querying was done with pdfinfo's "-meta" and "-l" options
at the same time, which was supported in poppler 0.26 but not in
poppler 0.48.

Upstream change: https://bugs.freedesktop.org/show_bug.cgi?id=96801

Local change is to run the two as separate commands, then send the
output together into the existing processing. Should work with older
poppler-utils on Jessie as well as current one on Stretch.

Bug: T117839
Bug: T193200
Change-Id: Ib4ee9cf12ac04304c576087727eff5dc521ae751
2018-04-26 15:37:52 -07:00
Kunal Mehta 0a049abdcd Add phan configuration
This required updating ThumbnailImage constructors to the new call
signature.

Change-Id: Ia04d4dd523e1778992dcd5f45e9d3126649369c1
2018-02-24 16:43:55 -08:00
Kunal Mehta b89ddbca99 Move classes to includes/
Change-Id: I4ad03611ac644541903897276e8da37c3cfeed8b
2018-02-24 16:43:51 -08:00
Renamed from PdfHandler.image.php