Parser Extensions
Forte's extension system lets you teach the parser new syntax. You can create custom tokens during lexing, custom AST nodes during tree building, or custom attribute prefixes inside HTML elements. Extensions integrate with the full parsing pipeline. The nodes they produce work with traversal, XPath, and rewriting just like built-in nodes.
#Extension Types
Forte provides three kinds of extension, each targeting a different phase of the parsing pipeline:
| Type | Phase | Purpose |
|---|---|---|
| LexerExtension | Tokenization | Recognize new syntax and emit custom tokens |
| TreeExtension | Tree building | Convert custom tokens into AST nodes |
| AttributeExtension | Attribute parsing | Recognize custom attribute prefixes inside HTML elements |
Most extensions need both lexing and tree building. Forte provides abstract base classes that combine these phases and reduce boilerplate:
| Class | Implements | Use when |
|---|---|---|
| AbstractExtension | Lexer + Tree | You need to tokenize new syntax and build AST nodes (most common) |
| AbstractLexerExtension | Lexer only | You only need custom tokens, not custom nodes |
| AbstractTreeExtension | Tree only | You consume tokens produced by another extension |
| AbstractAttributeExtension | Attribute | You need a custom attribute prefix like # or ... |
#Building a Hashtag Extension
The most common extension pattern tokenizes custom syntax and builds AST nodes from the resulting tokens. Here we'll build a #hashtag extension step by step using AbstractExtension.
#Defining the Node
Extension nodes should extend GenericNode so that XPath queries can discover them through the DOM mapper. The getDocumentContent method returns the source text covered by the node:
<?php

use Forte\Ast\GenericNode;

class HashtagNode extends GenericNode
{
    public function hashtag(): string
    {
        // Drop the leading "#" so only the tag name remains.
        return ltrim($this->getDocumentContent(), '#');
    }
}
#The Extension Class
Extending AbstractExtension requires four methods:
- id(): a unique string identifier for the extension
- triggerCharacters(): which source characters cause the lexer to consult this extension
- registerTypes(TokenTypeRegistry): register custom token types with the lexer
- doTokenize(LexerContext): the tokenization logic itself
Two additional methods are optional:
- registerKinds(NodeKindRegistry): register custom node kinds with the tree builder
- doHandle(TreeContext): custom tree-building logic (defaults to creating a node of the first registered kind)
<?php

use Forte\Extensions\AbstractExtension;
use Forte\Lexer\Extension\LexerContext;
use Forte\Lexer\Tokens\TokenTypeRegistry;
use Forte\Parser\NodeKindRegistry;

class HashtagExtension extends AbstractExtension
{
    // Token type id returned by registerType().
    private int $hashtagType;

    public function id(): string
    {
        return 'hashtags';
    }

    public function triggerCharacters(): string
    {
        return '#';
    }

    protected function registerTypes(TokenTypeRegistry $registry): void
    {
        $this->hashtagType = $this->registerType($registry, 'Hashtag');
    }

    protected function registerKinds(NodeKindRegistry $registry): void
    {
        $this->registerKind($registry, 'Hashtag', HashtagNode::class);
    }

    protected function doTokenize(LexerContext $ctx): bool
    {
        if ($ctx->current() !== '#') {
            return false;
        }

        $start = $ctx->position();
        $ctx->advance(); // skip #

        // A hashtag needs at least one alphanumeric character after "#".
        if (! ctype_alnum($ctx->current() ?? '')) {
            return false;
        }

        // Consume the rest of the tag: letters, digits, and underscores.
        while ($ctx->current() !== null && (ctype_alnum($ctx->current()) || $ctx->current() === '_')) {
            $ctx->advance();
        }

        $ctx->emit($this->hashtagType, $start, $ctx->position());

        return true;
    }
}
When doTokenize returns true, the lexer records the emitted token and advances past the consumed input. When it returns false, the lexer falls back to its built-in handling for the current position.
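The scanning rule in doTokenize can be tried out without Forte. The sketch below is a standalone approximation that works on a plain string: scanHashtag is a hypothetical helper, not part of Forte's API, and it returns the byte range the extension would pass to emit(), or null where doTokenize would return false. Because it works on local variables rather than a shared cursor, a failed match has no side effects.

```php
<?php

/**
 * Hypothetical helper (not part of Forte): mirrors the hashtag scanning rule
 * on a plain string. Returns [start, end] byte offsets, or null on no match.
 */
function scanHashtag(string $source, int $pos): ?array
{
    $length = strlen($source);

    // A hashtag must start with '#'.
    if ($pos < 0 || $pos >= $length || $source[$pos] !== '#') {
        return null;
    }

    // The first character after '#' must be alphanumeric.
    $cursor = $pos + 1;
    if ($cursor >= $length || ! ctype_alnum($source[$cursor])) {
        return null;
    }

    // Consume letters, digits, and underscores.
    while ($cursor < $length && (ctype_alnum($source[$cursor]) || $source[$cursor] === '_')) {
        $cursor++;
    }

    return [$pos, $cursor];
}

$source = 'Check out #php_8 and # alone';

$range = scanHashtag($source, 10);       // [10, 16]
$tag = substr($source, $range[0], $range[1] - $range[0]); // "#php_8"

$noMatch = scanHashtag($source, 21);     // null: "#" is followed by a space
```

The end offset is exclusive, which matches how the extension passes $ctx->position() to emit() after the loop has stepped past the last accepted byte.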
#Using the Extension
Pass the extension class to ParserOptions::withExtensions() and use the resulting options with Forte::parse(). Extension nodes appear alongside regular nodes and can be found with findAll:
<?php
use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;
$options = ParserOptions::withExtensions(HashtagExtension::class);
$doc = Forte::parse('Check out #php and #laravel!', $options);
$hashtags = $doc->findAll(fn ($n) => $n instanceof HashtagNode);
count($hashtags); // 2
$hashtags[0]->hashtag(); // "php"
$hashtags[1]->hashtag(); // "laravel"
Extension nodes preserve the original source content, so render() always reproduces the original template exactly:
<?php
use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;
$template = 'Hello #world from #php!';
$options = ParserOptions::withExtensions(HashtagExtension::class);
$doc = Forte::parse($template, $options);
$doc->render(); // "Hello #world from #php!"
#Registering Extensions
You register extensions through ParserOptions, which handles wiring them into the lexer and tree builder. You can pass either a class string (Forte instantiates it with new) or a pre-configured instance:
<?php
use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;
// Static constructor
$options = ParserOptions::withExtensions(HashtagExtension::class);
$options->hasExtensions(); // true
$doc = Forte::parse('#test', $options);
$doc->findAll(fn ($n) => $n instanceof HashtagNode); // [HashtagNode]
The fluent extension() and extensions() methods work on any ParserOptions instance:
<?php

use Forte\Parser\ParserOptions;

$options = ParserOptions::defaults()
    ->extension(HashtagExtension::class)
    ->extension(new AnotherExtension());

// Or register multiple at once
$options = ParserOptions::defaults()
    ->extensions([HashtagExtension::class, AnotherExtension::class]);
#Building an Attribute Extension
Attribute extensions handle custom attribute prefixes inside HTML element tags. Use AbstractAttributeExtension when you need to teach Forte about a new attribute syntax like @click, #ref, or ...$spread.
#A Vue Event Extension
The extension below recognizes simple @event attributes. The shouldActivate method receives a context positioned just after the prefix character. Returning false lets the lexer fall through to its default handling, which is critical for distinguishing @click from a plain @, such as the one in an email address:
<?php

use Forte\Extensions\AbstractAttributeExtension;
use Forte\Lexer\Extension\AttributeLexerContext;

class VueEventExtension extends AbstractAttributeExtension
{
    public function id(): string
    {
        return 'vue-event';
    }

    public function attributePrefix(): string
    {
        return '@';
    }

    public function shouldActivate(AttributeLexerContext $ctx): bool
    {
        $nextChar = $ctx->peek() ?? '';

        return ctype_alpha($nextChar);
    }
}
Extension attributes are identified on the resulting element nodes with isExtensionAttribute() and extensionId():
<?php
use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;
$options = ParserOptions::withExtensions(VueEventExtension::class);
$doc = Forte::parse('<button @click="handleClick">', $options);
$element = $doc->elements->first();
$attr = $element->getAttributes()[0];
$attr->isExtensionAttribute(); // true
$attr->extensionId(); // "vue-event"
$attr->rawName(); // "@click"
$attr->valueText(); // "handleClick"
#Attribute Extension Methods
| Method | Required | Description |
|---|---|---|
| id() | Yes | Unique extension identifier |
| attributePrefix() | Yes | The character(s) that trigger this extension (e.g., @, #, ...) |
| shouldActivate() | No | Whether to activate for this specific position (default: true) |
| acceptsValue() | No | Whether this attribute type accepts ="value" (default: true) |
| tokenizeAttributeName() | No | Custom name tokenization logic |
| buildAttributeNode() | No | Custom node building logic |
#Configuration
The configure method merges new options with existing ones and returns the extension instance for fluent chaining. Use option to retrieve values with optional defaults:
<?php
$ext = new HashtagExtension();
$ext->configure(['maxLength' => 20, 'strict' => true]);
$ext->option('maxLength'); // 20
$ext->option('strict'); // true
$ext->option('missing', 'default'); // "default"
$ext->hasOption('maxLength'); // true
$ext->hasOption('missing'); // false
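To make the merge behaviour concrete, here is a minimal standalone sketch of how such an option store could work. OptionBag is a hypothetical class written for illustration, not Forte's implementation, and it assumes that values passed to a later configure call override earlier values with the same key.

```php
<?php

// Hypothetical stand-in for an extension's option storage (not Forte code).
final class OptionBag
{
    /** @var array<string, mixed> */
    private array $options = [];

    // Merge new options over the existing ones and return $this for chaining.
    public function configure(array $options): self
    {
        // Assumption: later values win when keys collide.
        $this->options = array_merge($this->options, $options);

        return $this;
    }

    // Return the stored value, or $default when the key was never set.
    public function option(string $key, mixed $default = null): mixed
    {
        return array_key_exists($key, $this->options) ? $this->options[$key] : $default;
    }

    public function hasOption(string $key): bool
    {
        return array_key_exists($key, $this->options);
    }
}

$bag = (new OptionBag())
    ->configure(['maxLength' => 20, 'strict' => true])
    ->configure(['maxLength' => 30]);

$maxLength = $bag->option('maxLength');       // 30: the second call overrides the first
$strict = $bag->option('strict');             // true: untouched keys survive the merge
$fallback = $bag->option('missing', 'default'); // "default"
```

Using array_key_exists rather than isset means an option explicitly set to null still counts as present, which is one reasonable reading of hasOption.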
#Diagnostics
Extensions can report warnings, errors, and informational messages during tokenization using the warn, error, and info methods from HasDiagnostics. Retrieve them after parsing with getDiagnostics:
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

// DiagnosticExtension stands for any extension that reports a warning while tokenizing.
$ext = new DiagnosticExtension();
$options = ParserOptions::withExtensions($ext);
Forte::parse('some input', $options);

$diagnostics = $ext->getDiagnostics();
$diagnostics[0]->message; // "Warning message"
$diagnostics[0]->isWarning(); // true

$ext->clearDiagnostics();
count($ext->getDiagnostics()); // 0
#Extension Registry
When you register extensions through ParserOptions, they are managed by an ExtensionRegistry that handles dependency resolution and conflict detection automatically.
#Dependencies and Conflicts
Extensions can declare dependencies via the dependencies() method. The registry performs a topological sort to ensure extensions initialize in the correct order. If a required dependency is missing, Forte throws a RuntimeException. Similarly, conflicts() declares extensions that cannot coexist:
<?php

use Forte\Extensions\AbstractExtension;

class HighlightExtension extends AbstractExtension
{
    public function id(): string
    {
        return 'highlight';
    }

    public function dependencies(): array
    {
        return ['hashtags']; // requires HashtagExtension
    }

    // ...
}
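The ordering step can be illustrated without Forte. The sketch below is a standalone depth-first topological sort over a map of extension ids to the ids they depend on. The resolveOrder function and its input shape are hypothetical and are not the ExtensionRegistry API. It throws a RuntimeException for a missing dependency, as described above, and, as an added assumption, for a dependency cycle.

```php
<?php

/**
 * Hypothetical helper (not Forte's registry): orders extension ids so that
 * every extension comes after the extensions it depends on.
 *
 * @param array<string, string[]> $dependencies id => ids it depends on
 * @return string[] ids in initialization order
 */
function resolveOrder(array $dependencies): array
{
    $order = [];
    $state = []; // id => 'visiting' | 'done'

    $visit = function (string $id) use (&$visit, &$order, &$state, $dependencies): void {
        if (($state[$id] ?? null) === 'done') {
            return;
        }
        if (($state[$id] ?? null) === 'visiting') {
            throw new RuntimeException("Circular dependency involving '{$id}'");
        }

        $state[$id] = 'visiting';
        foreach ($dependencies[$id] as $dependency) {
            if (! array_key_exists($dependency, $dependencies)) {
                throw new RuntimeException("Extension '{$id}' requires missing extension '{$dependency}'");
            }
            $visit($dependency);
        }

        // All dependencies are placed, so this extension can follow them.
        $state[$id] = 'done';
        $order[] = $id;
    };

    foreach (array_keys($dependencies) as $id) {
        $visit($id);
    }

    return $order;
}

$order = resolveOrder([
    'highlight' => ['hashtags'],
    'hashtags' => [],
]);
// $order is ['hashtags', 'highlight'], regardless of registration order
```

Registering highlight without hashtags makes the same function throw, which mirrors the missing-dependency failure mode.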
#Querying the Registry
Use getExtensionRegistry() on ParserOptions to inspect registered extensions:
<?php
use Forte\Parser\ParserOptions;
$options = ParserOptions::withExtensions(HashtagExtension::class);
$registry = $options->getExtensionRegistry();
$registry->has('hashtags'); // true
$registry->has('unknown'); // false
$ext = $registry->get('hashtags');
$ext->id(); // "hashtags"
#LexerContext Reference
The LexerContext object is passed to your doTokenize method and provides methods for inspecting and consuming source bytes:
| Method | Returns | Description |
|---|---|---|
| position() | int | Current byte offset in the source |
| current() | ?string | Current byte, or null at end |
| peek(int $offset) | ?string | Byte at offset from current position |
| matches(string $needle) | bool | Whether source matches needle at current position |
| advance(int $bytes = 1) | void | Move forward by N bytes |
| advanceUntil(string $char) | string | Advance until character found, return consumed text |
| advancePast(string $needle) | bool | Advance past needle if it matches |
| emit(int $type, int $start, int $end) | void | Emit a token of the given type |
| substr(int $start, int $length) | string | Extract substring from source |
| source() | string | Full source string |
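Because LexerContext works in byte offsets rather than characters, multibyte UTF-8 text shifts positions. The standalone snippet below uses only PHP's built-in string functions, not Forte, to show the effect; the sample string is made up for illustration.

```php
<?php

// "\u{E9}" is "é", which UTF-8 encodes as two bytes.
$source = "caf\u{E9} #php";

// strlen() and strpos() count bytes, the same unit position() is documented to use.
$byteLength = strlen($source);      // 10 bytes for 9 visible characters
$hashOffset = strpos($source, '#'); // 6, although "#" is the 6th character (index 5)

// substr() also slices by byte, so byte offsets round-trip safely.
$tag = substr($source, $hashOffset, 4); // "#php"
```

When an extension computes token boundaries, staying in byte offsets throughout avoids mixing units. Character-based functions such as mb_strpos would return 5 for the same string.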
#TreeContext Reference
The TreeContext object is passed to your doHandle method and provides methods for building AST nodes from the token stream:
| Method | Returns | Description |
|---|---|---|
| position() | int | Current index in the token stream |
| currentToken() | ?array | Current token ({type, start, end}) or null |
| peekToken(int $offset) | ?array | Token at offset from current position |
| tokenText(array $token) | string | Source text of a token |
| addNode(int $kind, int $start, int $count) | int | Create a node and return its index |
| addChild(int $nodeIndex) | void | Add node as child of current parent |
| pushElement(int $nodeIndex) | void | Push node onto open element stack |
| popElement() | ?int | Pop current element from stack |
| setNodeMeta(int $index, string $key, mixed $value) | void | Attach metadata during building |
| source() | string | Full source string |
#See also
- Extension DOM Mapping: Query extension nodes with XPath
- Parser Options: Configure the parser and register extensions
- Diagnostics: Understanding Forte's diagnostic system
- Traversal: Navigate and query the document tree