[!!!][FEATURE] New TypoScript parser
This adds a rewritten TypoScript syntax parser to ext:core.

The heart of the existing parser dates back to Kasper's initial commit in 2003, and most parts have never been touched structurally. Even though various parts of TypoScript became less important in the Frontend rendering chain over time, it still plays a central role. Looking at the existing parser approach, it became clear the project can benefit from a revamped parser based on modern PHP code.

Goals:
* A well-structured, flexible and tested codebase.
* Fix long-standing syntax shenanigans and limits of the current parser approach, and make the syntax interpretation more robust and resilient.
* Have a better cache layer that kicks in more often, ultimately speeding up building the FE TypoScript.
* Allow improving the Backend "Template" module to show and analyze more TypoScript details.
* Have an object-based structure for the final TypoScript tree, and allow an array export of it to keep backwards compatibility.

The new implementation aims to fully substitute the old parser. This first patch, however, only brings the "library" part of the needed code changes: there is no usage of the new parser whatsoever with this patch, except, of course, unit and functional tests.

To understand the structure of the new parser, it helps to have a rough understanding of the old parser. We'll look at the frontend parsing steps only; Backend usages in the Template module are similar, though.

When accessing a frontend page, TYPO3 at some point determines the requested page uid. It then builds the "rootline" of page records from the top page root node down to the requested page. This creates a list of page database records; mount points and similar are taken care of at this point, too. This rootline is fed to the old TypoScript parser. The parser then finds all relevant sys_template records attached to the rootline pages. These are the "entry" TypoScript snippets to parse.
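The following Python sketch illustrates the rootline step. It is not TYPO3 code. Page and sys_template records are plain dicts, and the names `PAGES`, `TEMPLATES`, `build_rootline` and `collect_entry_templates` are hypothetical. The sketch assumes that each page record carries a `pid` pointing to its parent, with 0 marking the tree root, and it ignores mount points entirely.

```python
from typing import Dict, List

# Hypothetical page records: uid -> record; pid 0 marks the top of the page tree.
PAGES: Dict[int, dict] = {
    1: {"uid": 1, "pid": 0, "title": "Root"},
    2: {"uid": 2, "pid": 1, "title": "Products"},
    3: {"uid": 3, "pid": 2, "title": "Shoes"},
}

# Hypothetical sys_template snippets, keyed by the uid of the page they are attached to.
TEMPLATES: Dict[int, List[str]] = {
    1: ["page = PAGE"],
    3: ["page.10 = TEXT"],
}


def build_rootline(pages: Dict[int, dict], uid: int) -> List[dict]:
    """Walk up via pid from the requested page, then reverse so the root comes first."""
    rootline = []
    while uid != 0:
        page = pages[uid]
        rootline.append(page)
        uid = page["pid"]
    return list(reversed(rootline))


def collect_entry_templates(rootline: List[dict], templates: Dict[int, List[str]]) -> List[str]:
    """Return the template snippets of all rootline pages, top page first."""
    snippets: List[str] = []
    for page in rootline:
        snippets.extend(templates.get(page["uid"], []))
    return snippets


rootline = build_rootline(PAGES, 3)
print([page["title"] for page in rootline])          # ['Root', 'Products', 'Shoes']
print(collect_entry_templates(rootline, TEMPLATES))  # ['page = PAGE', 'page.10 = TEXT']
```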
The old parser then resolves all "includes": snippets from sys_template include_static_file database records, snippets from various globals, snippets from sys_template basedOn, and so on. It also resolves @import and <INCLUDE_TYPOSCRIPT: syntax and substitutes them with the included content. All of that is gathered into one huge string. The main parsing process then goes through this string and creates the TypoScript array, while handling conditions and constant substitution at the same time.

This approach has various drawbacks:

First, gathering everything into one main string forces the entire string to be parsed anew for each and every page: a different page could have different includes or sys_template records attached.

Second, evaluating conditions and constants while creating the final array from the source string ties "runtime" information (especially conditions) tightly into the main parsing process, which makes an effective cache layer impossible.

Third, the strategy of gathering everything into one string leads to various funny details. For instance, when a condition is opened in one file and not closed with [end] or [global], this state leaks into the next file. Handling of the closing bracket "}" has similar issues.

These structural issues can't be fixed by refactoring the existing codebase; the only option is to rewrite the entire thing.

IncludeTree: The entry point to the new parser is IncludeTree/TreeBuilder. It again receives the page "rootline" from the Frontend. But instead of creating one huge string, it creates an object tree of includes. Each attached sys_template record is a child of "RootInclude" in the tree, and each sys_template include node can have children for further includes (like include_static_file). The source snippet of each include is tokenized (see below) and analyzed for @import, <INCLUDE_TYPOSCRIPT: and conditions. If any of these exist, the include is marked as "split", and each part is then represented by a child include.
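The splitting idea can be illustrated with a simplified Python sketch. `Include` and `split_snippet` are hypothetical names and are much simpler than the real IncludeTree node classes. Only `@import` lines and `[...]` condition lines are recognized. `<INCLUDE_TYPOSCRIPT:` and `[else]` are left out. The sketch assumes a condition body ends at `[end]`, `[global]`, the next condition, or the end of the snippet. Because each snippet becomes its own node, an unclosed condition cannot leak into the next file.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Include:
    """Hypothetical include node: its kind, own source lines and child includes."""
    kind: str                                  # "snippet", "segment", "import" or "condition"
    value: str = ""                            # import path or condition expression
    lines: List[str] = field(default_factory=list)
    children: List["Include"] = field(default_factory=list)
    split: bool = False


def split_snippet(source: str) -> Include:
    """Build one node per snippet; split it into children at @import and condition lines."""
    node = Include(kind="snippet")
    segment = Include(kind="segment")          # collects lines outside of conditions
    condition: Optional[Include] = None        # currently open condition, if any

    def flush_segment() -> None:
        nonlocal segment
        if segment.lines:
            node.children.append(segment)
            segment = Include(kind="segment")

    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower() in ("[end]", "[global]"):
            condition = None                   # explicit end of the condition body
        elif line.startswith("[") and line.endswith("]"):
            flush_segment()
            condition = Include(kind="condition", value=line[1:-1])
            node.children.append(condition)
        elif line.startswith("@import"):
            path = line[len("@import"):].strip().strip("'\"")
            imported = Include(kind="import", value=path)
            if condition is not None:
                condition.children.append(imported)
            else:
                flush_segment()
                node.children.append(imported)
        elif condition is not None:
            condition.lines.append(line)
        else:
            segment.lines.append(line)

    flush_segment()
    # A condition still open here simply ends with this snippet; nothing leaks onward.
    node.split = any(child.kind != "segment" for child in node.children)
    if not node.split:
        # Nothing to split: keep the plain lines on the node itself.
        node.lines = [text for child in node.children for text in child.lines]
        node.children = []
    return node


first = split_snippet("page = PAGE\n[frontend.user.isLoggedIn]\npage.10 = TEXT\n"
                      "@import 'EXT:site/user.typoscript'")
second = split_snippet("page.20 = TEXT")
print([child.kind for child in first.children])  # ['segment', 'condition']
print(second.split, second.lines)                # False ['page.20 = TEXT']
```

Since the result depends only on the source text and not on whether a condition matches, a structure like this could be cached and reused across pages.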
We thus receive a tree of objects in which imports are already resolved and conditions are child nodes. The main advantage is that this tree does not carry runtime information: a single snippet *always* leads to the same tree, no matter whether a condition matched or not. This way, the tree can be cached, and the full tree or parts of it can be reused when a different page is requested. To do that efficiently, a new cache "typoscript" is established to store a serialized representation of the tree as a PHP file, which can be unserialized relatively quickly.

Tokenizer: The tokenizer takes a single source snippet and creates a stream of Line objects from it, with tokens representing the source line. For instance, a TypoScript snippet like "foo = fooValue" creates an IdentifierAssignmentLine object, with a T_IDENTIFIER token for "foo" and a T_VALUE token for "fooValue". There are Line classes for all the different line types (assignment, copy, condition, ...) and tokens for the various details (T_VALUE, T_CONSTANT, ...). Depending on the line type, tokens are sometimes encapsulated in TokenStreams. For example, the multiple identifiers in "foo.bar = barValue" create a stream of the two T_IDENTIFIER tokens "foo" and "bar".

There are two different tokenizers. The Frontend uses the LossyTokenizer, which creates a stream of "relevant" tokens only: empty lines, comments and lines with invalid syntax are ignored. The LosslessTokenizer, however, creates a 1:1 representation of the source snippet, including token positions (line and start column). This allows detailed analysis in the Backend Template module.

AstBuilder: This is the third part of the structure: a representation of parsed TypoScript as an object tree. It is similar to the array we're dealing with now, but it can carry additional functionality. For instance, the well-known TypoScript array can be extracted from it. Using the AstBuilder works like this: Create the IncludeTree first (from cache).
Next, traverse the IncludeTree with a visitor that looks at condition includes and records whether they match. Then, feed all matching includes to the AstBuilder to create the object tree. The AstBuilder receives a LineStream for each include and extends the object tree depending on the line type. This means that IncludeTree building and source tokenizing can be cached in the Frontend. Applying constants and condition verdicts, together with building the final TypoScript object tree, is a runtime step on each page. With tokenizing and AST building separated by a cache layer in between, this is roughly twice as fast in the Frontend as the previous solution.

The patch establishes these three main structures and comes with an extensive set of unit and functional tests. It is an extract of a bigger WIP patch that already contains usages of the new structure in the Frontend and some parts of the Backend.

The patch also adds a ReST file to explain some subtle TypoScript syntax changes. These will kick in as soon as we start using the structure with upcoming changes.

The new parser should be rather robust already. Further changes that start using the new parser will probably only change minor things. For now, though, the entire structure is marked @internal: the API is still only the array representation. This will change later when we actively start using the TypoScript object tree in the core.

Change-Id: I4047a878494078b6ac149553fa305c1d69329e37
Resolves: #97816
Resolves: #96503
Resolves: #90146
Resolves: #41327
Resolves: #76447
Releases: main
Reviewed-on: https://review.typo3.org/c/Packages/TYPO3.CMS/+/74987
Tested-by: core-ci <typo3@b13.com>
Tested-by: Benni Mack <benni@typo3.org>
Tested-by: Stefan Bürk <stefan@buerk.tech>
Tested-by: Christian Kuhn <lolli@schwarzbu.ch>
Reviewed-by: Benni Mack <benni@typo3.org>
Reviewed-by: Stefan Bürk <stefan@buerk.tech>
Reviewed-by: Christian Kuhn <lolli@schwarzbu.ch>
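The frontend flow described in this message can be sketched end to end in Python. That flow is to tokenize each include, mark matching conditions, feed only matching includes to the tree builder, and export the well-known array. Everything here is a hypothetical simplification rather than the TYPO3 API:

- `tokenize` stands in for a lossy tokenizer and only understands plain "=" assignments.
- A dict of condition verdicts replaces the condition-evaluating visitor.
- `Node.to_array` mimics the classic array layout, where child nodes live under a "key." entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One tokenized line: identifier path plus value, e.g. (["foo", "bar"], "barValue").
Line = Tuple[List[str], str]


def tokenize(source: str) -> List[Line]:
    """Lossy sketch: keep 'a.b = value' lines; drop empty lines, comments and everything else."""
    stream: List[Line] = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "//")) or "=" not in line:
            continue
        identifier, value = line.split("=", 1)
        stream.append((identifier.strip().split("."), value.strip()))
    return stream


@dataclass
class TokenizedInclude:
    lines: List[Line]                # cacheable: does not depend on runtime state
    condition: Optional[str] = None  # None: not inside a condition


@dataclass
class Node:
    name: str
    value: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)

    def to_array(self) -> dict:
        """Export children like the classic array: 'key' => value, 'key.' => sub-array."""
        result: dict = {}
        for name, child in self.children.items():
            if child.value is not None:
                result[name] = child.value
            if child.children:
                result[name + "."] = child.to_array()
        return result


def build_ast(includes: List[TokenizedInclude], verdicts: Dict[str, bool]) -> Node:
    """Runtime step: skip includes whose condition did not match, then grow the tree."""
    root = Node("root")
    for include in includes:
        if include.condition is not None and not verdicts.get(include.condition, False):
            continue
        for path, value in include.lines:
            node = root
            for segment in path:
                node = node.children.setdefault(segment, Node(segment))
            node.value = value
    return root


includes = [
    TokenizedInclude(tokenize("page = PAGE\npage.10 = TEXT\n# comment\npage.10.value = Hello")),
    TokenizedInclude(tokenize("page.10.value = Hello member"), condition="loggedIn"),
]
print(build_ast(includes, {"loggedIn": False}).to_array())
# {'page': 'PAGE', 'page.': {'10': 'TEXT', '10.': {'value': 'Hello'}}}
```

In this sketch, the tokenized `includes` list is the part that could be cached. Only `build_ast` depends on the per-request condition verdicts.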
- typo3/sysext/core/Classes/TypoScript/AST/AstBuilder.php (321 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/CurrentObjectPath/CurrentObjectPath.php (73 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/CurrentObjectPath/CurrentObjectPathStack.php (63 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Node/AbstractChildNode.php (88 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Node/AbstractNode.php (148 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Node/ChildNode.php (25 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Node/ChildNodeInterface.php (28 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Node/NodeInterface.php (107 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Node/ReferenceChildNode.php (51 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Node/RootNode.php (96 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Traverser/AstTraverser.php (73 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Visitor/AstSortChildrenVisitor.php (43 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/AST/Visitor/AstVisitorInterface.php (36 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/IncludeTree/IncludeNode/AbstractConditionInclude.php (83 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/IncludeTree/IncludeNode/AbstractInclude.php (180 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/IncludeTree/IncludeNode/AtImportInclude.php (29 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/IncludeTree/IncludeNode/ConditionElseInclude.php (39 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/IncludeTree/IncludeNode/ConditionInclude.php (31 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/IncludeTree/IncludeNode/ConditionIncludeTyposcriptInclude.php (30 additions, 0 deletions)
- typo3/sysext/core/Classes/TypoScript/IncludeTree/IncludeNode/DefaultTypoScriptInclude.php (28 additions, 0 deletions)