[BUGFIX] Links on external pages don't get indexed
Allow the crawler to start indexing at a specific file such as
www.domain.tld/foobar.html instead of only at www.domain.tld/.
This change only affects the comparison against the base URL. It
lets the crawler start at, for example, a file that contains a
manually generated list of links to follow.

Before this change, checkUrl() rejected even links to targets on
the same domain if the base URL pointed to a file instead of "/",
because the base URL was then not contained in the target URL.
The path is now stripped from the base URL for this comparison,
so crawling can also start from a file.

Change-Id: I2727a9a447754b88d2c279c24b32b5c3a2df26c0
Resolves: #16534
Releases: 6.2, 6.1, 6.0, 4.7, 4.5
Reviewed-on: https://review.typo3.org/6990
Reviewed-by: Michael Stucki
Tested-by: Michael Stucki
Reviewed-by: Georg Ringer
Tested-by: Georg Ringer
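
The effect of the changed comparison can be illustrated with a minimal Python sketch. This is not the actual checkUrl() code from the patch. The function names strip_path, check_url_old and check_url_new are hypothetical, the http:// scheme is an assumption added so the URLs can be parsed, and the real method may perform further checks.

```python
from urllib.parse import urlsplit


def strip_path(base_url):
    """Reduce a base URL to scheme and host, dropping any path or file name."""
    parts = urlsplit(base_url)
    return f"{parts.scheme}://{parts.netloc}/"


def check_url_old(base_url, target_url):
    """Old behaviour (as described above): the target must contain the full base URL."""
    return target_url.startswith(base_url)


def check_url_new(base_url, target_url):
    """New behaviour (sketch): compare against the base URL with its path removed."""
    return target_url.startswith(strip_path(base_url))


# Hypothetical example URLs; the scheme is an assumption.
base = "http://www.domain.tld/foobar.html"
target = "http://www.domain.tld/page.html"

print(check_url_old(base, target))
print(check_url_new(base, target))
```

With a base URL that points to a file, the old comparison rejects a same-domain target because the file name is part of the prefix being matched, while the new comparison accepts it.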