This code was left over and forgotten about from my GSoC project
back in 2010 (with minor modifications).
(Note about message names: they start with "exif-" to be compatible with the
built-in metadata code, which uses that prefix to be compatible with
the older messages).
Change-Id: I9e546d9e6ae9a60604c9dd1633cb2225c9d1109d
* use UtfNormal::cleanUp() for UTF-8 and control char cleanup instead of iconv() and a manual strip
* remove the htmlspecialchars() call, which looks like it shouldn't be here; this is for internal data storage, not HTML output
* Use a nice simple PHP array instead of constructing unnecessary XML. This removes the dependency on PHP 5.1.3 for a SimpleXML method.
* Tell pdfinfo to give us metadata encoded in UTF-8. If we start outputting title and creator info this will be nice!
* Tell pdfinfo to give us page size information for all pages (at least through page 99999 :) rather than just the first page
* Make use of that per-page size information so we can properly render pages of differing sizes. Without this, they get stretched or squooshed in interesting ways.
* Rename the pdf_no_xml message to pdf_no_metadata (in English)
* require PHP 5.1.3 (for SimpleXMLElement::addChild())
* Properly escape the filename before passing it to the shell
* arrays created by explode() start at index 0
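A rough Python sketch of the pdfinfo handling described in the list above may help: quote the filename for the shell, ask pdfinfo for UTF-8 metadata (`-enc UTF-8`) and per-page sizes through page 99999 (`-l 99999`), then parse the text output into a plain dictionary instead of XML. This is an illustration, not the extension's PHP code. The function names are hypothetical, and the `Page    N size: W x H pts` line layout is assumed from typical poppler/xpdf pdfinfo output. `shlex.quote()` serves as the Python counterpart of PHP's `escapeshellarg()`.

```python
import re
import shlex

# Assumed layout of per-page lines, e.g. "Page    1 size: 612 x 792 pts (letter)".
PAGE_SIZE_RE = re.compile(r"^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)")


def build_pdfinfo_command(filename: str, last_page: int = 99999) -> str:
    """Hypothetical helper: build a shell command that asks pdfinfo for
    UTF-8 metadata and per-page sizes for pages 1..last_page, with the
    filename safely quoted for the shell."""
    return "pdfinfo -enc UTF-8 -l {} {}".format(last_page, shlex.quote(filename))


def parse_pdfinfo(output: str) -> dict:
    """Hypothetical helper: turn pdfinfo text output into a plain dict.

    Per-page sizes go into info["pages"][page_number] = (width, height);
    every other "Key: value" line is stored as info[key] = value.
    """
    info = {"pages": {}}
    for line in output.splitlines():
        match = PAGE_SIZE_RE.match(line)
        if match:
            page = int(match.group(1))
            info["pages"][page] = (float(match.group(2)), float(match.group(3)))
            continue
        # Split on the first colon only, so values such as timestamps keep
        # theirs; like PHP's explode(), the resulting list starts at index 0.
        parts = line.split(":", 1)
        if len(parts) == 2:
            key, value = parts[0].strip(), parts[1].strip()
            if key:
                info[key] = value
    return info


if __name__ == "__main__":
    sample = (
        "Title:          Example\n"
        "Pages:          2\n"
        "Page    1 size: 612 x 792 pts (letter)\n"
        "Page    2 size: 842 x 595 pts\n"
    )
    print(build_pdfinfo_command("my file.pdf"))
    print(parse_pdfinfo(sample)["pages"])
```

Keeping a size per page, rather than one size for the whole document, is what lets each page be scaled using its own dimensions.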