* The key group should generally reflect a single logical key,
usually a single getWithSet call, with the remaining components being
variants of that key; it is not a namespace.
* Give it an explicit expiry. I noticed this issue while reviewing the
remaining data stored on WMF's mainstash-redis instances.
There were some old file-pdf:* keys there from long ago. They were
able to survive this long due to not having an expiry set.
Ref https://phabricator.wikimedia.org/T280586#8168908
Ref https://phabricator.wikimedia.org/T314453#8168858
Change-Id: Ibad24c0ac7d1e7f965227f49320814b96375858e
By default the mediabox is used. This is the full potential area of
pages, as also used by PDF editors, and it can contain areas outside
of the page.
The cropbox is also the size that pdfinfo reports as the page size.
Bug: T167420
Change-Id: I92267a9dbe81b6e0e471b8eae1e4c2ba4e5d84e9
The File::getMetadataItems method used here is part of
I039785d5b6439d71dcc21dcb972177dba5c3a67d
Follow-Up: I30d0b0009fcb11c14d14663bd1f2c2a3dfac55d6
Change-Id: Iddacb31efc9ec9da3ba87c7c8cdac04a36a1af39
explode() returns an array with one item for an empty string,
but the empty string is already checked before the explode() call
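The behaviour described above can be illustrated with a rough Python analogue. This is only a sketch: str.split() with an explicit separator stands in for PHP's explode(), and parse_fields() is a hypothetical helper, not code from this extension.

```python
def parse_fields(raw: str) -> list[str]:
    """Hypothetical caller that guards against the empty string first."""
    if raw == "":
        # The empty input is handled here, before any splitting happens.
        return []
    # Past the guard, splitting always yields at least one item,
    # so a second emptiness check on the result would be redundant.
    return raw.split(":")


# Like PHP's explode(), splitting an empty string on an explicit
# separator gives a one-item list holding the empty string.
unguarded = "".split(":")
guarded_empty = parse_fields("")
guarded_normal = parse_fields("a:b")
```

Without the guard the result is never an empty list, which is why a check placed after the split cannot catch the empty-input case.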
Change-Id: I441309978b25754bad04eeba69993913de4d48c3
Combine all 3 shellouts into one script, retrieveMetaData.sh.
The script is executed by /bin/sh by default; this can be changed for
Windows users by setting $wgPdfHandlerShell.
pdftotext is a bit special since its behavior varies based on the
program's exit code, so save that in a file so we can check it
independently of the overall exit status.
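A minimal sketch of the "save the exit code in a file" pattern follows, driven from Python for illustration. The script body, the file names step_exit_code and later_output, and the failing stand-in command are all assumptions made for this example; they are not the contents of retrieveMetaData.sh.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical stand-in for a combined script: one step's exit code is
# written to a file so it can be inspected separately afterwards.
SCRIPT = """
sh -c 'exit 3'
echo $? > step_exit_code
echo "still running" > later_output
"""


def run_combined(workdir: str) -> dict:
    """Run the script in workdir and collect the recorded statuses."""
    result = subprocess.run(["/bin/sh", "-c", SCRIPT], cwd=workdir)
    work = Path(workdir)
    return {
        # Exit status of the whole script (that of its last command).
        "overall": result.returncode,
        # Exit code of the individual step, read back from its file.
        "step": int((work / "step_exit_code").read_text().strip()),
        # Output written by a command that ran after the failing step.
        "later": (work / "later_output").read_text().strip(),
    }


with tempfile.TemporaryDirectory() as tmp:
    status = run_combined(tmp)
```

Here the overall status says nothing about the failing step, while the recorded file still carries its exit code, so the caller can react to that one step on its own.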
Bug: T289228
Change-Id: I29750bcc282bd5f9b8e2f79aa340869738ea5f5b
The variables are set to default values in extension.json, so
someone would have to manually set them to null values to trigger
this check. At that point, surely you'd just disable the extension.
A check like is_executable() might've made sense, however with the
introduction of Shellbox, it would be totally fine for the binaries
to not exist on the host where MediaWiki is running, but only in the
container. So just removing the check seems like the most
straightforward thing to do.
Bug: T289228
Change-Id: I5da0625959fdfa01c36c955c82320dbc591b3f23
These messages have not changed since they were originally defined
in f4f87ceb (2015). Instead of using a hook to register the module
at runtime, put it in extension.json with a note next to the warning
configuration to keep the messages in sync.
Change-Id: I135bf1a9f2cd59926a40cc565e5c8a2a6f2483c0
XMP extraction does not work for me with libpoppler 0.86, because when
the output of the two commands is concatenated, there is no "Metadata:"
prefix introducing the XMP. It ends up splitting every line of the XML
on colon characters in attribute names, spamming lots of little
properties into the final result.
I can confirm that it's also broken in production.
So, just treat the output of pdfinfo -meta as plain XML.
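The failure mode can be reproduced in a small Python sketch. The XMP packet below is a hypothetical, heavily shortened sample, and split_as_key_value() is an illustrative stand-in for "Key: value" parsing, not the extension's actual parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened XMP packet; real pdfinfo -meta output is larger.
XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>"""

DC_NS = "http://purl.org/dc/elements/1.1/"


def split_as_key_value(text: str) -> dict:
    """Naive 'Key: value' parsing applied line by line."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Splitting on colons turns XML tag prefixes into bogus property names.
mangled = split_as_key_value(XMP)

# Treating the same text as XML recovers the intended value.
root = ET.fromstring(XMP)
dc_format = root.find(f".//{{{DC_NS}}}format").text
```

The colon-splitting result contains keys such as "<x" and "<rdf", which matches the "lots of little properties" symptom, whereas the XML parse yields the format value directly.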
Change-Id: Ia3df17daed0f27e95294b5d97872ec064c79965c
* Migrate to the new metadata system: override getSizeAndMetadata()
* Use getHandlerState() instead of a custom property on the File object.
* Opt in to metadata splitting. Avoid loading the text item unless it is
really needed.
* In getDimensionInfo(), use getHandlerState() instead of the
WANObjectCache process cache (pcTTL). This is just a
micro-optimisation, informed by profiling, which showed 90 calls to
this function during an image page view.
Depends-On: I876ea5c9d3a1881e278f689d2f8a3ae20240c703
Change-Id: I30d0b0009fcb11c14d14663bd1f2c2a3dfac55d6
The feature is nonfunctional due to the page count always being zero
when the hook is called. The core feature $wgUploadThumbnailRenderMap
can be used as a replacement, after I add multipage support to it.
Bug: T284416
Change-Id: Id83a6a148f1ca12f1399b5e11951a9d80afb5c2d