[BUGFIX] Avoid double UTF-8 encoded PDF metadata in file indexer

There are different versions of pdfinfo available and used by
different providers/distributions.

a) Debian/Fedora use pdfinfo (>v20) from the poppler-utils package.
   Also hosters like Hetzner use this version.
   This variant defaults to UTF-8 output for metadata:

   https://linux.die.net/man/1/pdfinfo
   > -enc encoding-name
   > Sets the encoding to use for text output.
   > This defaults to "UTF-8".

   pdfinfo -v
   pdfinfo version 21.08.0
   Copyright 2005-2021 The Poppler Developers - http://poppler.freedesktop.org
   Copyright 1996-2011 Glyph & Cog, LLC

b) Older servers and hosters with legacy software (Mittwald,
   Domainfactory) use pdfinfo v3.
   This one defaults to Latin1 output:

   https://www.xpdfreader.com/pdfinfo-man.html
   > -enc encoding-name
   > Sets the encoding to use for text output. […]
   > This defaults to "Latin1"

   pdfinfo -v
   pdfinfo version 3.02
   Copyright 1996-2007 Glyph & Cog, LLC

Both versions support an -enc UTF-8 option, which is now used to
circumvent the differences between these tools, instead of implying
Latin1 output (as done in #80085) which breaks variant a) by
interpreting valid UTF-8 as ISO-8859-1 and thus applying a double
encoding.

Resolves: #99352
Related: #80085
Releases: main, 11.5, 10.4
Change-Id: Ib8f7ae742c5edc73036afcb7d2608cd01f4176fd
Reviewed-on: https://review.typo3.org/c/Packages/TYPO3.CMS/+/77081
Reviewed-by: Benni Mack <benni@typo3.org>
Tested-by: Benjamin Franzke <bfr@qbus.de>
Tested-by: Benni Mack <benni@typo3.org>
Reviewed-by: Stefan Bürk <stefan@buerk.tech>
Tested-by: Stefan Bürk <stefan@buerk.tech>
Reviewed-by: Benjamin Franzke <bfr@qbus.de>
Tested-by: core-ci <typo3@b13.com>
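
The double encoding described in this message can be reproduced in a few lines. The following is only an illustrative sketch in Python, not the code touched by this change: the function names `assume_latin1_and_convert` and `pdfinfo_command` and the sample title "Müller" are assumptions made up for the example. It shows that treating pdfinfo output as ISO-8859-1 is harmless for variant b) but garbles the already valid UTF-8 of variant a), and how the encoding could instead be requested explicitly via the -enc option.

```python
from typing import List


def assume_latin1_and_convert(raw: bytes) -> bytes:
    """Old assumption: treat pdfinfo output as ISO-8859-1, convert to UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


def pdfinfo_command(path: str) -> List[str]:
    """Hypothetical helper: argument list that requests UTF-8 explicitly."""
    return ["pdfinfo", "-enc", "UTF-8", path]


title = "Müller"  # sample metadata value, chosen for illustration

latin1_output = title.encode("iso-8859-1")  # default of variant b) (v3)
utf8_output = title.encode("utf-8")         # default of variant a) (poppler)

# Variant b): the Latin1 assumption holds, the result is correct UTF-8.
ok = assume_latin1_and_convert(latin1_output)

# Variant a): the bytes are already UTF-8, so they get encoded a second time.
broken = assume_latin1_and_convert(utf8_output)

print(ok.decode("utf-8"))
print(broken.decode("utf-8"))
print(pdfinfo_command("document.pdf"))
```

With -enc UTF-8 passed to both variants, the output bytes are the same regardless of which pdfinfo is installed, so no conversion step is needed afterwards.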