From 390092d83f5122c2e64bbff26858e3078d60a85d Mon Sep 17 00:00:00 2001
From: Garvin Hicking <gh@faktor-e.de>
Date: Thu, 23 Nov 2023 13:03:55 +0100
Subject: [PATCH] [BUGFIX] No "update storage index" FAL task fail with too many records
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Indexer builds a large array of all actual files on a storage
(identifiedFileUids). If many files exist, this array can get very
large.

This array was then passed to a QueryBuilder to fetch all records
NOT IN that array. Since a NOT IN query is passed to the database
as a string, it can exceed the maximum length allowed for a query,
making the whole task fail.

Since a NOT IN query cannot be chunked easily, the logic has been
changed. Instead of fetching a restricted list of database records,
all records are fetched and iterated. Even with a million sys_file
records on a single (!) storage this performs acceptably and stays
within practical usage scenarios.

Each database record is then checked against the large array of
known file UIDs. Records with a match are skipped; all others run
through the same logic as before.

To benchmark the implications, the following test was run:

Baseline:

* sys_file with 50,736 entries
* 16,912 marked as missing
* 33,824 marked as existing
* Filesystem with 8,771 actual files

Tested setup via a script which:

* Resets to baseline sys_file storage
* Executes scheduler task
  "File Abstraction Layer: Update storage index (scheduler)"
* Flags 41,965 files as missing, 8,771 as found.

The script was executed 50 times and a mean average was calculated,
once with the patch in place, once without.

Old variant (using NOT IN query): 11.787 seconds
New variant (fetching all records): 12.0544 seconds

On top of staying at the same performance level, the new method no
longer triggers a database exception (see ticket).

Resolves: #102295
Releases: main, 12.4
Change-Id: Id998d7cd062fe75aac738b896bfb307b51f5cef8
Reviewed-on: https://review.typo3.org/c/Packages/TYPO3.CMS/+/82237
Tested-by: Stefan Bürk <stefan@buerk.tech>
Reviewed-by: Stefan Bürk <stefan@buerk.tech>
Tested-by: core-ci <typo3@b13.com>
---
 .../core/Classes/Resource/Index/Indexer.php | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/typo3/sysext/core/Classes/Resource/Index/Indexer.php b/typo3/sysext/core/Classes/Resource/Index/Indexer.php
index 4a9f0877b8a6..cb8d48198666 100644
--- a/typo3/sysext/core/Classes/Resource/Index/Indexer.php
+++ b/typo3/sysext/core/Classes/Resource/Index/Indexer.php
@@ -160,17 +160,25 @@ class Indexer implements LoggerAwareInterface
     }
 
     /**
-     * Since by now all files in filesystem have been looked at it is save to assume,
-     * that files that are in indexed but not touched in this run are missing
+     * Since by now all files in filesystem have been looked at, it is safe to assume,
+     * that files that are indexed, but not touched in this run, are missing
      */
     protected function detectMissingFiles()
     {
-        $indexedNotExistentFiles = $this->getFileIndexRepository()->findInStorageAndNotInUidList(
+        $allCurrentFiles = $this->getFileIndexRepository()->findInStorageAndNotInUidList(
             $this->storage,
-            $this->identifiedFileUids
+            []
         );
 
-        foreach ($indexedNotExistentFiles as $record) {
+        foreach ($allCurrentFiles as $record) {
+            // Check if the record retrieved from the database was associated
+            // with an existing file.
+            // If yes: All is good, file is in index and in database.
+            // If no: Database record may need to be marked as removed (extra check!)
+            if (in_array($record['uid'], $this->identifiedFileUids, true)) {
+                continue;
+            }
+
             if (!$this->storage->hasFile($record['identifier'])) {
                 $this->getFileIndexRepository()->markFileAsMissing($record['uid']);
             }
-- 
GitLab