# Parser Extensions
Forte's extension system lets you teach the parser new syntax. You can create custom tokens during lexing, custom AST nodes during tree building, or custom attribute prefixes inside HTML elements. Extensions integrate with the full parsing pipeline. The nodes they produce work with traversal, XPath, and rewriting just like built-in nodes.
## Extension Types
Forte provides three kinds of extension, each targeting a different phase of the parsing pipeline:

| Type | Phase | Purpose |
|---|---|---|
| `LexerExtension` | Tokenization | Recognize new syntax and emit custom tokens |
| `TreeExtension` | Tree building | Convert custom tokens into AST nodes |
| `AttributeExtension` | Attribute parsing | Recognize custom attribute prefixes inside HTML elements |

Most extensions need both lexing and tree building. Forte provides abstract base classes that combine these phases and reduce boilerplate:

| Class | Implements | Use when |
|---|---|---|
| `AbstractExtension` | Lexer + Tree | You need to tokenize new syntax and build AST nodes (most common) |
| `AbstractLexerExtension` | Lexer only | You only need custom tokens, not custom nodes |
| `AbstractTreeExtension` | Tree only | You consume tokens produced by another extension |
| `AbstractAttributeExtension` | Attribute | You need a custom attribute prefix like `#` or `...` |

## Building a Hashtag Extension
The most common extension pattern tokenizes custom syntax and builds AST nodes from the resulting tokens. Here we'll build a #hashtag extension step by step using AbstractExtension.
### Defining the Node
Extension nodes should extend GenericNode so that XPath queries can discover them through the DOM mapper. The getDocumentContent method returns the source text covered by the node:

```php
<?php

use Forte\Ast\GenericNode;

class HashtagNode extends GenericNode
{
    public function hashtag(): string
    {
        return ltrim($this->getDocumentContent(), '#');
    }
}
```

### The Extension Class
Extending AbstractExtension requires four methods:

- `id()`: a unique string identifier for the extension
- `triggerCharacters()`: which source characters cause the lexer to consult this extension
- `registerTypes(TokenTypeRegistry)`: register custom token types with the lexer
- `doTokenize(LexerContext)`: the tokenization logic itself

Two additional methods are optional:

- `registerKinds(NodeKindRegistry)`: register custom node kinds with the tree builder
- `doHandle(TreeContext)`: custom tree-building logic (defaults to creating a node of the first registered kind)

```php
<?php

use Forte\Extensions\AbstractExtension;
use Forte\Lexer\Extension\LexerContext;
use Forte\Lexer\Tokens\TokenTypeRegistry;
use Forte\Parser\NodeKindRegistry;

class HashtagExtension extends AbstractExtension
{
    private int $hashtagType;

    public function id(): string
    {
        return 'hashtags';
    }

    public function triggerCharacters(): string
    {
        return '#';
    }

    protected function registerTypes(TokenTypeRegistry $registry): void
    {
        $this->hashtagType = $this->registerType($registry, 'Hashtag');
    }

    protected function registerKinds(NodeKindRegistry $registry): void
    {
        $this->registerKind($registry, 'Hashtag', HashtagNode::class);
    }

    protected function doTokenize(LexerContext $ctx): bool
    {
        if ($ctx->current() !== '#') {
            return false;
        }

        $start = $ctx->position();
        $ctx->advance(); // skip #

        if (! ctype_alnum($ctx->current() ?? '')) {
            return false;
        }

        while ($ctx->current() !== null && (ctype_alnum($ctx->current()) || $ctx->current() === '_')) {
            $ctx->advance();
        }

        $ctx->emit($this->hashtagType, $start, $ctx->position());

        return true;
    }
}
```

When doTokenize returns true, the lexer records the emitted token and advances past the consumed input. When it returns false, the lexer falls back to its built-in handling for the current position.
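
The scanning rule itself can be tried out without Forte. The following is a minimal sketch, assuming a hypothetical helper named `scanHashtags()` that works on a plain string. It is not part of Forte's API; it only mirrors the rule used in doTokenize (a `#` followed by an alphanumeric byte, then any run of alphanumerics or underscores) and returns the start and end offsets that would be passed to emit:

```php
<?php

/**
 * Hypothetical helper, not part of Forte: returns the [start, end) byte
 * offsets of every hashtag in $source, using the same rule as doTokenize.
 *
 * @return list<array{start: int, end: int}>
 */
function scanHashtags(string $source): array
{
    $spans = [];
    $length = strlen($source);
    $pos = 0;

    while ($pos < $length) {
        // Only '#' followed by an alphanumeric byte starts a hashtag.
        if ($source[$pos] !== '#' || $pos + 1 >= $length || ! ctype_alnum($source[$pos + 1])) {
            $pos++;
            continue;
        }

        $start = $pos;
        $pos++; // skip '#'

        // Consume the run of alphanumerics and underscores.
        while ($pos < $length && (ctype_alnum($source[$pos]) || $source[$pos] === '_')) {
            $pos++;
        }

        $spans[] = ['start' => $start, 'end' => $pos];
    }

    return $spans;
}

$source = 'Check out #php and #laravel_11! Price: # 5';

foreach (scanHashtags($source) as $span) {
    // Prints "#php" and then "#laravel_11"; the lone "#" is skipped.
    echo substr($source, $span['start'], $span['end'] - $span['start']), PHP_EOL;
}
```

Unlike the extension, this helper checks the byte after `#` before moving forward, so a rejected `#` never changes the scan position. Whether Forte restores the lexer position after doTokenize returns false is not covered here.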
### Using the Extension
Pass the extension class to ParserOptions::withExtensions() and use the resulting options with Forte::parse(). Extension nodes appear alongside regular nodes and can be found with findAll:

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

$options = ParserOptions::withExtensions(HashtagExtension::class);
$doc = Forte::parse('Check out #php and #laravel!', $options);

$hashtags = $doc->findAll(fn ($n) => $n instanceof HashtagNode);

count($hashtags); // 2
$hashtags[0]->hashtag(); // "php"
$hashtags[1]->hashtag(); // "laravel"
```

Extension nodes preserve the original source content, so render() always reproduces the original template exactly:

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

$template = 'Hello #world from #php!';
$options = ParserOptions::withExtensions(HashtagExtension::class);
$doc = Forte::parse($template, $options);

$doc->render(); // "Hello #world from #php!"
```

## Registering Extensions
You register extensions through ParserOptions, which handles wiring them into the lexer and tree builder. You can pass either a class string (Forte instantiates it with new) or a pre-configured instance:

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

// Static constructor
$options = ParserOptions::withExtensions(HashtagExtension::class);

$options->hasExtensions(); // true

$doc = Forte::parse('#test', $options);
$doc->findAll(fn ($n) => $n instanceof HashtagNode); // [HashtagNode]
```

The fluent extension() and extensions() methods work on any ParserOptions instance:

```php
<?php

use Forte\Parser\ParserOptions;

$options = ParserOptions::defaults()
    ->extension(HashtagExtension::class)
    ->extension(new AnotherExtension());

// Or register multiple at once
$options = ParserOptions::defaults()
    ->extensions([HashtagExtension::class, AnotherExtension::class]);
```

## Building an Attribute Extension
Attribute extensions handle custom attribute prefixes inside HTML element tags. Use AbstractAttributeExtension when you need to teach Forte about a new attribute syntax like @click, #ref, or ...$spread.
### A Vue Event Extension
The extension below recognizes simple @event attributes. The shouldActivate method receives a context positioned just after the prefix character. Returning false lets the lexer fall through to its default handling, which is critical for distinguishing @click from an @ in an email address:

```php
<?php

use Forte\Extensions\AbstractAttributeExtension;
use Forte\Lexer\Extension\AttributeLexerContext;

class VueEventExtension extends AbstractAttributeExtension
{
    public function id(): string
    {
        return 'vue-event';
    }

    public function attributePrefix(): string
    {
        return '@';
    }

    public function shouldActivate(AttributeLexerContext $ctx): bool
    {
        // Activate only when a letter follows the prefix.
        $nextChar = $ctx->peek() ?? '';

        return ctype_alpha($nextChar);
    }
}
```

Extension attributes are identified on the resulting element nodes with isExtensionAttribute() and extensionId():

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

$options = ParserOptions::withExtensions(VueEventExtension::class);
$doc = Forte::parse('<button @click="handleClick">', $options);

$element = $doc->elements->first();
$attr = $element->getAttributes()[0];

$attr->isExtensionAttribute(); // true
$attr->extensionId(); // "vue-event"
$attr->rawName(); // "@click"
$attr->valueText(); // "handleClick"
```

### Attribute Extension Methods

| Method | Required | Description |
|---|---|---|
| `id()` | Yes | Unique extension identifier |
| `attributePrefix()` | Yes | The character(s) that trigger this extension (e.g., `@`, `#`, `...`) |
| `shouldActivate()` | No | Whether to activate for this specific position (default: true) |
| `acceptsValue()` | No | Whether this attribute type accepts `="value"` (default: true) |
| `tokenizeAttributeName()` | No | Custom name tokenization logic |
| `buildAttributeNode()` | No | Custom node building logic |

## Configuration
The configure method merges new options with existing ones and returns the extension instance for fluent chaining. Use option to retrieve values with optional defaults:

```php
<?php

$ext = new HashtagExtension();
$ext->configure(['maxLength' => 20, 'strict' => true]);

$ext->option('maxLength'); // 20
$ext->option('strict'); // true
$ext->option('missing', 'default'); // "default"
$ext->hasOption('maxLength'); // true
$ext->hasOption('missing'); // false
```

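
The merge behavior can be pictured with a small stand-in class. This is a sketch under assumptions, not Forte's implementation: the `OptionBag` name is hypothetical, and it assumes that a later configure call overrides keys it repeats while leaving the other keys in place.

```php
<?php

// Hypothetical stand-in for an extension's option handling; not a Forte class.
final class OptionBag
{
    /** @var array<string, mixed> */
    private array $options = [];

    /** Merge new options over existing ones and return $this for chaining. */
    public function configure(array $options): self
    {
        $this->options = array_merge($this->options, $options);

        return $this;
    }

    /** Return the stored value, or $default when the key was never set. */
    public function option(string $key, mixed $default = null): mixed
    {
        return array_key_exists($key, $this->options) ? $this->options[$key] : $default;
    }

    public function hasOption(string $key): bool
    {
        return array_key_exists($key, $this->options);
    }
}

$bag = (new OptionBag())
    ->configure(['maxLength' => 20, 'strict' => true])
    ->configure(['maxLength' => 40]); // overrides maxLength, keeps strict

var_dump($bag->option('maxLength'));          // int(40)
var_dump($bag->option('strict'));             // bool(true)
var_dump($bag->option('missing', 'default')); // string(7) "default"
```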
## Diagnostics
Extensions can report warnings, errors, and informational messages during tokenization using the warn, error, and info methods from HasDiagnostics. Retrieve them after parsing with getDiagnostics:

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

// DiagnosticExtension stands for one of your own extensions that
// reports a warning while tokenizing; it is not defined on this page.
$ext = new DiagnosticExtension();
$options = ParserOptions::withExtensions($ext);

Forte::parse('some input', $options);

$diagnostics = $ext->getDiagnostics();
$diagnostics[0]->message; // "Warning message"
$diagnostics[0]->isWarning(); // true

$ext->clearDiagnostics();
count($ext->getDiagnostics()); // 0
```

## Extension Registry
When you register extensions through ParserOptions, they are managed by an ExtensionRegistry that handles dependency resolution and conflict detection automatically.
### Dependencies and Conflicts
Extensions can declare dependencies via the dependencies() method. The registry performs a topological sort to ensure extensions initialize in the correct order. If a required dependency is missing, Forte throws a RuntimeException. Similarly, conflicts() declares extensions that cannot coexist:

```php
<?php

use Forte\Extensions\AbstractExtension;

class HighlightExtension extends AbstractExtension
{
    public function id(): string
    {
        return 'highlight';
    }

    public function dependencies(): array
    {
        return ['hashtags']; // requires HashtagExtension
    }

    // ...
}
```

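
To make the ordering concrete, here is a minimal sketch of dependency ordering with a depth-first topological sort. It is an illustration only: the `resolveOrder()` function and its input shape are hypothetical, and Forte's ExtensionRegistry may use a different algorithm and different error messages. The sketch mirrors the documented behavior of throwing a RuntimeException for a missing dependency; the circular-dependency check is an addition of this sketch.

```php
<?php

/**
 * Hypothetical illustration, not Forte's ExtensionRegistry.
 *
 * @param array<string, list<string>> $dependencies extension id => ids it depends on
 * @return list<string> ids ordered so that dependencies come first
 */
function resolveOrder(array $dependencies): array
{
    $order = [];
    $state = []; // id => 'visiting' | 'done'

    $visit = function (string $id) use (&$visit, &$order, &$state, $dependencies): void {
        if (($state[$id] ?? null) === 'done') {
            return; // already placed in the order
        }
        if (($state[$id] ?? null) === 'visiting') {
            throw new RuntimeException("Circular dependency involving '{$id}'");
        }
        if (! array_key_exists($id, $dependencies)) {
            throw new RuntimeException("Missing dependency '{$id}'");
        }

        $state[$id] = 'visiting';
        foreach ($dependencies[$id] as $dependency) {
            $visit($dependency); // place dependencies before the dependent
        }
        $state[$id] = 'done';
        $order[] = $id;
    };

    foreach (array_keys($dependencies) as $id) {
        $visit($id);
    }

    return $order;
}

// 'hashtags' is placed before 'highlight', which depends on it.
print_r(resolveOrder([
    'highlight' => ['hashtags'],
    'hashtags'  => [],
]));
```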
### Querying the Registry
Use getExtensionRegistry() on ParserOptions to inspect registered extensions:

```php
<?php

use Forte\Parser\ParserOptions;

$options = ParserOptions::withExtensions(HashtagExtension::class);
$registry = $options->getExtensionRegistry();

$registry->has('hashtags'); // true
$registry->has('unknown'); // false

$ext = $registry->get('hashtags');
$ext->id(); // "hashtags"
```

## LexerContext Reference
The LexerContext object is passed to your doTokenize method and provides methods for inspecting and consuming source bytes:

| Method | Returns | Description |
|---|---|---|
| `position()` | `int` | Current byte offset in the source |
| `current()` | `?string` | Current byte, or null at end |
| `peek(int $offset)` | `?string` | Byte at offset from current position |
| `matches(string $needle)` | `bool` | Whether source matches needle at current position |
| `advance(int $bytes = 1)` | `void` | Move forward by N bytes |
| `advanceUntil(string $char)` | `string` | Advance until character found, return consumed text |
| `advancePast(string $needle)` | `bool` | Advance past needle if it matches |
| `emit(int $type, int $start, int $end)` | `void` | Emit a token of the given type |
| `substr(int $start, int $length)` | `string` | Extract substring from source |
| `source()` | `string` | Full source string |

## TreeContext Reference
The TreeContext object is passed to your doHandle method and provides methods for building AST nodes from the token stream:

| Method | Returns | Description |
|---|---|---|
| `position()` | `int` | Current index in the token stream |
| `currentToken()` | `?array` | Current token (`{type, start, end}`) or null |
| `peekToken(int $offset)` | `?array` | Token at offset from current position |
| `tokenText(array $token)` | `string` | Source text of a token |
| `addNode(int $kind, int $start, int $count)` | `int` | Create a node and return its index |
| `addChild(int $nodeIndex)` | `void` | Add node as child of current parent |
| `pushElement(int $nodeIndex)` | `void` | Push node onto open element stack |
| `popElement()` | `?int` | Pop current element from stack |
| `setNodeMeta(int $index, string $key, mixed $value)` | `void` | Attach metadata during building |
| `source()` | `string` | Full source string |

## See also
- Extension DOM Mapping: Query extension nodes with XPath
- Parser Options: Configure the parser and register extensions
- Diagnostics: Understanding Forte's diagnostic system
- Traversal: Navigate and query the document tree