- Don't check for file execution, but for command execution. This way
pdfinfo and pdftotext work without specifying the path
- Only pipe the stdout content of the commands to the output files
- Exit as failure when the pdfinfo command is available, but its
execution failed
- Check and log the error output of retrieveMetadata.sh
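
A minimal sketch of the points above, written in Python purely for
illustration (the extension itself uses PHP and a shell script). The
helper names command_available and run_to_file are hypothetical and not
part of the extension.

```python
import shutil
import subprocess
from typing import List, Tuple


def command_available(name: str) -> bool:
    """Return True if `name` can be run, via PATH lookup or as an explicit path."""
    # shutil.which() resolves bare command names through PATH, so "pdfinfo"
    # works without a full path; a plain is-this-file-executable test would not.
    return shutil.which(name) is not None


def run_to_file(argv: List[str], out_path: str) -> Tuple[int, str]:
    """Run argv, write only its stdout to out_path, return (exit code, stderr text)."""
    with open(out_path, "wb") as out:
        proc = subprocess.run(argv, stdout=out, stderr=subprocess.PIPE)
    # stderr is kept out of the output file so the caller can log it separately.
    return proc.returncode, proc.stderr.decode(errors="replace")
```

A caller would treat a non-zero exit code from an available command as
a failure and log the returned stderr text.
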
Bug: T299521
Change-Id: Ia072469f4df6cce51793ab48823c7f4e4e13997b
* The key group should generally reflect a single logical key,
usually a single getWithSet call, with the rest being variants
of that key, e.g. not a namespace.
* Give it an explicit expiry. I noticed this issue by reviewing the
remaining data stored on WMF's mainstash-redis instances.
There were some old file-pdf:* keys there from long ago. They were
able to survive this long due to not having an expiry set.
Ref https://phabricator.wikimedia.org/T280586#8168908
Ref https://phabricator.wikimedia.org/T314453#8168858
Change-Id: Ibad24c0ac7d1e7f965227f49320814b96375858e
By default, the mediabox is used. This is the full potential area of a
page, as also used by PDF editors, and it can contain areas outside of
the page.
The cropbox is also the size that pdfinfo reports as the page size.
Bug: T167420
Change-Id: I92267a9dbe81b6e0e471b8eae1e4c2ba4e5d84e9
explode() returns an array with one item even for an empty string,
but the empty string is already checked before the explode() call
Change-Id: I441309978b25754bad04eeba69993913de4d48c3
Combine all 3 shellouts into one script, retrieveMetaData.sh.
The script is executed by /bin/sh by default; this can be changed for
Windows users by setting $wgPdfHandlerShell.
pdftotext is a bit special, since its behavior varies based on the
program's exit code. Save that code in a file so we can check it
independently of the overall exit status.
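
The exit-code handling can be sketched as follows, in Python driving a
small /bin/sh script. This is an illustration only, not the contents of
the actual script: the "(exit 7)" subshell stands in for pdftotext, and
the file name text_exit_code is an assumption.

```python
import os
import subprocess
import tempfile

# Stand-in script: the failing step's exit code is written to a file,
# while the script as a whole still exits 0.
SCRIPT = """
(exit 7)                      # stand-in for the pdftotext call
printf '%s' "$?" > "$1"       # record that step's exit code
exit 0                        # overall status is independent of it
"""


def run_and_read_step_status(workdir: str):
    """Run the script and return (overall exit status, recorded step exit code)."""
    status_file = os.path.join(workdir, "text_exit_code")
    proc = subprocess.run(["/bin/sh", "-c", SCRIPT, "sh", status_file])
    with open(status_file) as f:
        return proc.returncode, int(f.read())


with tempfile.TemporaryDirectory() as workdir:
    overall, step = run_and_read_step_status(workdir)
```

The caller can then branch on the recorded step code without the whole
script being reported as failed.
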
Bug: T289228
Change-Id: I29750bcc282bd5f9b8e2f79aa340869738ea5f5b
The variables are set to default values in extension.json, so
someone would have to manually set them to null values to trigger
this check. At that point, surely you'd just disable the extension.
A check like is_executable() might have made sense. However, with the
introduction of Shellbox, it would be totally fine for the binaries
to not exist on the host where MediaWiki is running, but only in the
container. So just removing the check seems like the most
straightforward thing to do.
Bug: T289228
Change-Id: I5da0625959fdfa01c36c955c82320dbc591b3f23
These messages have not changed since they were originally defined
in f4f87ceb (2015). Instead of using a hook to register the module
at runtime, put it in extension.json with a note next to the warning
configuration to keep the messages in sync.
Change-Id: I135bf1a9f2cd59926a40cc565e5c8a2a6f2483c0
XMP extraction does not work for me with libpoppler 0.86, because when
the output of the two commands is concatenated, there is no "Metadata:"
prefix introducing the XMP. It ends up splitting every line of the XML
on colon characters in attribute names, spamming lots of little
properties into the final result.
I can confirm that it's also broken in production.
So, just treat the output of pdfinfo -meta as plain XML.
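
A small Python sketch of the failure mode described above. The parser
and the sample output are invented for illustration and are not the
extension's code; they only show how colon splitting mangles XML lines.

```python
def parse_key_value_lines(text):
    """Naively parse "Key: value" lines by splitting each line at its first colon."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Invented sample output, for illustration only.
info_output = "Title: Example\nPages: 3\n"
xmp_output = (
    '<rdf:Description rdf:about=""\n'
    '  xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
    "<dc:format>application/pdf</dc:format>\n"
)

# Concatenating both outputs feeds XML lines to the line parser,
# which produces junk keys such as "<rdf" and "<dc".
broken = parse_key_value_lines(info_output + xmp_output)

# Parsing only the plain pdfinfo output and keeping the XMP as an
# unparsed XML string avoids the junk keys.
fixed = parse_key_value_lines(info_output)
xmp_xml = xmp_output
```
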
Change-Id: Ia3df17daed0f27e95294b5d97872ec064c79965c
* Migrate to the new metadata system: override getSizeAndMetadata()
* Use getHandlerState() instead of a custom property on the File object.
* Opt in to metadata splitting. Avoid loading the text item unless it is
really needed.
* In getDimensionInfo(), use getHandlerState() instead of the
WANObjectCache process cache (pcTTL). This is just a
micro-optimisation, informed by profiling, which showed 90 calls to
this function during an image page view.
Depends-On: I876ea5c9d3a1881e278f689d2f8a3ae20240c703
Change-Id: I30d0b0009fcb11c14d14663bd1f2c2a3dfac55d6
The feature is nonfunctional due to the page count always being zero
when the hook is called. The core feature $wgUploadThumbnailRenderMap
can be used as a replacement, after I add multipage support to it.
Bug: T284416
Change-Id: Id83a6a148f1ca12f1399b5e11951a9d80afb5c2d
Remove use of User::getDefaultOption(), since this method is currently
soft-deprecated and will be hard-deprecated.
Bug: T276035
Change-Id: I6b489dc7236998bcfee6fa136167c3712757dd39
According to the Ghostscript developers, the -sstdout parameter should
come after -sOutputFile.
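
A sketch of the resulting argument order, in Python for illustration.
Only the relative order of -sOutputFile and -sstdout comes from this
change; the function name, the other flags, and the %stderr value are
assumptions.

```python
from typing import List


def build_gs_args(src_path: str, out_path: str) -> List[str]:
    """Build a Ghostscript argument list with -sstdout placed after -sOutputFile."""
    return [
        "gs",
        "-dBATCH",
        "-dNOPAUSE",
        "-q",
        "-sDEVICE=jpeg",
        f"-sOutputFile={out_path}",
        "-sstdout=%stderr",  # placed after -sOutputFile, per the note above
        src_path,
    ]
```
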
Bug: T50007
Change-Id: I13fd25ada571aee9eb793cd6e195a04eb86bce63
The replacement services were made available in 1.28, and this
extension requires 1.32, so the replacement is fine.
Change-Id: I7939726f5a1d516f17e416bec1999faab95db806
* Use ::isSupported() instead of checking for a specific function manually
* Remove mention of the XMPGetInfo hook, which was removed in 4feb2ac7f2224d
Depends-On: Ic9044bf3260d1a474a6c74844949602441ffc865
Change-Id: I4333d427a2039aaffb897a1f41504b74d60c3c8b
PDF metadata querying was done with pdfinfo's "-meta" and "-l" options
at the same time, which was supported in poppler 0.26 but not in
poppler 0.48.
Upstream change: https://bugs.freedesktop.org/show_bug.cgi?id=96801
The local change is to run the two as separate commands, then send the
output together into the existing processing. This should work with the
older poppler-utils on Jessie as well as the current one on Stretch.
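
The approach can be sketched like this, in Python for illustration. The
function names and the injected run callable are hypothetical; only the
-l and -meta options come from the description above.

```python
from typing import Callable, List


def build_pdfinfo_commands(pdf_path: str, last_page: int) -> List[List[str]]:
    """Return two separate pdfinfo invocations instead of one combined call."""
    return [
        ["pdfinfo", "-l", str(last_page), pdf_path],  # info up to last_page
        ["pdfinfo", "-meta", pdf_path],               # XMP metadata
    ]


def collect_output(commands: List[List[str]], run: Callable[[List[str]], str]) -> str:
    """Run each command in order and concatenate the outputs for one parser."""
    return "".join(run(argv) for argv in commands)
```

Passing the runner in as a parameter keeps the sketch testable without
poppler-utils installed.
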
Bug: T117839
Bug: T193200
Change-Id: Ib4ee9cf12ac04304c576087727eff5dc521ae751