Parser Extensions

Forte's extension system lets you teach the parser new syntax. You can create custom tokens during lexing, custom AST nodes during tree building, or custom attribute prefixes inside HTML elements. Extensions integrate with the full parsing pipeline, so the nodes they produce work with traversal, XPath, and rewriting just like built-in nodes.

Extension Types

Forte provides three kinds of extension, each targeting a different phase of the parsing pipeline:

| Type | Phase | Purpose |
|---|---|---|
| LexerExtension | Tokenization | Recognize new syntax and emit custom tokens |
| TreeExtension | Tree building | Convert custom tokens into AST nodes |
| AttributeExtension | Attribute parsing | Recognize custom attribute prefixes inside HTML elements |

Most extensions need both lexing and tree building. Forte provides abstract base classes that combine these phases and reduce boilerplate:

| Class | Implements | Use when |
|---|---|---|
| AbstractExtension | Lexer + Tree | You need to tokenize new syntax and build AST nodes (most common) |
| AbstractLexerExtension | Lexer only | You only need custom tokens, not custom nodes |
| AbstractTreeExtension | Tree only | You consume tokens produced by another extension |
| AbstractAttributeExtension | Attribute | You need a custom attribute prefix like # or ... |

Building a Hashtag Extension

The most common extension pattern tokenizes custom syntax and builds AST nodes from the resulting tokens. Here we'll build a #hashtag extension step by step using AbstractExtension.

Defining the Node

Extension nodes should extend GenericNode so that XPath queries can discover them through the DOM mapper. The getDocumentContent method returns the source text covered by the node:

```php
<?php

use Forte\Ast\GenericNode;

class HashtagNode extends GenericNode
{
    public function hashtag(): string
    {
        return ltrim($this->getDocumentContent(), '#');
    }
}
```

The Extension Class

Extending AbstractExtension requires four methods:

  • id(): a unique string identifier for the extension
  • triggerCharacters(): which source characters cause the lexer to consult this extension
  • registerTypes(TokenTypeRegistry): register custom token types with the lexer
  • doTokenize(LexerContext): the tokenization logic itself

Two additional methods are optional:

  • registerKinds(NodeKindRegistry): register custom node kinds with the tree builder
  • doHandle(TreeContext): custom tree-building logic (defaults to creating a node of the first registered kind)

```php
<?php

use Forte\Extensions\AbstractExtension;
use Forte\Lexer\Extension\LexerContext;
use Forte\Lexer\Tokens\TokenTypeRegistry;
use Forte\Parser\NodeKindRegistry;

class HashtagExtension extends AbstractExtension
{
    private int $hashtagType;

    public function id(): string
    {
        return 'hashtags';
    }

    public function triggerCharacters(): string
    {
        return '#';
    }

    protected function registerTypes(TokenTypeRegistry $registry): void
    {
        $this->hashtagType = $this->registerType($registry, 'Hashtag');
    }

    protected function registerKinds(NodeKindRegistry $registry): void
    {
        $this->registerKind($registry, 'Hashtag', HashtagNode::class);
    }

    protected function doTokenize(LexerContext $ctx): bool
    {
        if ($ctx->current() !== '#') {
            return false;
        }

        // Inspect the byte after '#' before consuming anything, so a bare
        // '#' is declined without moving the lexer's position.
        if (! ctype_alnum($ctx->peek(1) ?? '')) {
            return false;
        }

        $start = $ctx->position();
        $ctx->advance(); // skip '#'

        // Consume letters, digits, and underscores.
        while ($ctx->current() !== null && (ctype_alnum($ctx->current()) || $ctx->current() === '_')) {
            $ctx->advance();
        }

        $ctx->emit($this->hashtagType, $start, $ctx->position());

        return true;
    }
}
```

When doTokenize returns true, the lexer records the emitted token and advances past the consumed input. When it returns false, the lexer falls back to its built-in handling for the current position.
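The true/false contract above can be illustrated outside Forte. The sketch below is plain PHP with no Forte classes: a hypothetical lexWithExtensions() function consults an extension only when the current byte is one of its trigger characters, records a token when the extension accepts, and otherwise treats the byte as ordinary text. The function, the callable shape, and the 'Text' token type are assumptions for illustration, not Forte's internals.

```php
<?php

// Standalone sketch; none of these names are Forte's API.
// Each extension is ['triggers' => string, 'tokenize' => callable]. The
// callable returns the end offset of a recognized token, or null to decline.
function lexWithExtensions(string $source, array $extensions): array
{
    $tokens = [];
    $pos = 0;
    $textStart = 0;
    $length = strlen($source);

    while ($pos < $length) {
        $char = $source[$pos];
        $handled = false;

        foreach ($extensions as $type => $ext) {
            // Only consult extensions whose trigger set contains this byte.
            if (strpos($ext['triggers'], $char) === false) {
                continue;
            }
            $end = $ext['tokenize']($source, $pos);
            if ($end !== null) {
                // Flush pending plain text before the extension token.
                if ($textStart < $pos) {
                    $tokens[] = ['Text', $textStart, $pos];
                }
                $tokens[] = [$type, $pos, $end];
                $pos = $textStart = $end;
                $handled = true;
                break;
            }
        }

        if (! $handled) {
            // Fallback: the byte stays part of ordinary text.
            $pos++;
        }
    }

    if ($textStart < $length) {
        $tokens[] = ['Text', $textStart, $length];
    }

    return $tokens;
}

// Hashtag rule mirroring doTokenize: '#' plus an alphanumeric, then
// any run of alphanumerics or underscores.
$hashtag = function (string $src, int $pos): ?int {
    if (! ctype_alnum($src[$pos + 1] ?? '')) {
        return null; // decline, so the lexer falls back
    }
    $end = $pos + 1;
    while ($end < strlen($src) && (ctype_alnum($src[$end]) || $src[$end] === '_')) {
        $end++;
    }
    return $end;
};

$tokens = lexWithExtensions('Check out #php and # alone', [
    'Hashtag' => ['triggers' => '#', 'tokenize' => $hashtag],
]);
```

The second # is declined because a space follows it, so it simply becomes part of the surrounding text token.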

Using the Extension

Pass the extension class to ParserOptions::withExtensions() and use the resulting options with Forte::parse(). Extension nodes appear alongside regular nodes and can be found with findAll:

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

$options = ParserOptions::withExtensions(HashtagExtension::class);
$doc = Forte::parse('Check out #php and #laravel!', $options);

$hashtags = $doc->findAll(fn ($n) => $n instanceof HashtagNode);

count($hashtags); // 2
$hashtags[0]->hashtag(); // "php"
$hashtags[1]->hashtag(); // "laravel"
```

Extension nodes preserve the original source content, so render() always reproduces the original template exactly:

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

$template = 'Hello #world from #php!';
$options = ParserOptions::withExtensions(HashtagExtension::class);
$doc = Forte::parse($template, $options);

$doc->render(); // "Hello #world from #php!"
```
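Round-tripping works because tokens record byte ranges into the original source rather than copies of it. As a plain-PHP sketch (not Forte's renderer): if the [type, start, end] ranges tile the source with no gaps or overlaps, concatenating each token's bytes must reproduce the input. The token list is written by hand for the template above, and the helper names are hypothetical.

```php
<?php

// Sketch only: tokens are [type, start, end) byte ranges into $source.

function renderTokens(string $source, array $tokens): string
{
    $out = '';
    foreach ($tokens as [$type, $start, $end]) {
        // Each token contributes exactly the bytes it covers.
        $out .= substr($source, $start, $end - $start);
    }
    return $out;
}

// True when the ranges start at 0, are contiguous, and end at strlen().
function coversSource(string $source, array $tokens): bool
{
    $expected = 0;
    foreach ($tokens as [, $start, $end]) {
        if ($start !== $expected) {
            return false; // gap or overlap
        }
        $expected = $end;
    }
    return $expected === strlen($source);
}

$template = 'Hello #world from #php!';
$tokens = [
    ['Text', 0, 6],       // "Hello "
    ['Hashtag', 6, 12],   // "#world"
    ['Text', 12, 18],     // " from "
    ['Hashtag', 18, 22],  // "#php"
    ['Text', 22, 23],     // "!"
];
```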

Registering Extensions

You register extensions through ParserOptions, which handles wiring them into the lexer and tree builder. You can pass either a class string (Forte instantiates it with new) or a pre-configured instance:

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

// Static constructor
$options = ParserOptions::withExtensions(HashtagExtension::class);

$options->hasExtensions(); // true

$doc = Forte::parse('#test', $options);
$doc->findAll(fn ($n) => $n instanceof HashtagNode); // [HashtagNode]
```

The fluent extension() and extensions() methods work on any ParserOptions instance:

```php
<?php

use Forte\Parser\ParserOptions;

// AnotherExtension stands in for any other extension class.
$options = ParserOptions::defaults()
    ->extension(HashtagExtension::class)
    ->extension(new AnotherExtension());

// Or register multiple at once
$options = ParserOptions::defaults()
    ->extensions([HashtagExtension::class, AnotherExtension::class]);
```
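Accepting either a class string or an instance comes down to a small normalization step. This sketch assumes a hypothetical ExtensionSketch interface and normalizeExtensions() helper; like the behavior described above, it instantiates class strings with new and no constructor arguments, so any configuration has to be applied to an instance before it is passed in.

```php
<?php

// Hypothetical types for illustration; not Forte's classes.
interface ExtensionSketch
{
    public function id(): string;
}

final class DemoExtension implements ExtensionSketch
{
    public function __construct(private string $name = 'demo') {}

    public function id(): string
    {
        return $this->name;
    }
}

/** @param array<class-string<ExtensionSketch>|ExtensionSketch> $extensions */
function normalizeExtensions(array $extensions): array
{
    $instances = [];
    foreach ($extensions as $ext) {
        // Class strings are instantiated with `new` and no arguments;
        // pre-configured instances are used as-is.
        $instance = is_string($ext) ? new $ext() : $ext;
        if (! $instance instanceof ExtensionSketch) {
            throw new InvalidArgumentException('Not an extension: ' . get_debug_type($ext));
        }
        // Key by id so later lookups are by identifier.
        $instances[$instance->id()] = $instance;
    }
    return $instances;
}

$configured = new DemoExtension('custom');
$resolved = normalizeExtensions([DemoExtension::class, $configured]);
```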

Building an Attribute Extension

Attribute extensions handle custom attribute prefixes inside HTML element tags. Use AbstractAttributeExtension when you need to teach Forte about a new attribute syntax like @click, #ref, or ...$spread.

A Vue Event Extension

The extension below recognizes simple @event attributes. The shouldActivate method receives context positioned after the prefix character. Returning false lets the lexer fall through to default handling, which is critical for distinguishing @click from @ in email addresses:

```php
<?php

use Forte\Extensions\AbstractAttributeExtension;
use Forte\Lexer\Extension\AttributeLexerContext;

class VueEventExtension extends AbstractAttributeExtension
{
    public function id(): string
    {
        return 'vue-event';
    }

    public function attributePrefix(): string
    {
        return '@';
    }

    public function shouldActivate(AttributeLexerContext $ctx): bool
    {
        $nextChar = $ctx->peek() ?? '';

        return ctype_alpha($nextChar);
    }
}
```

Extension attributes appear in the element's attribute list and can be identified with isExtensionAttribute() and extensionId():

```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

$options = ParserOptions::withExtensions(VueEventExtension::class);
$doc = Forte::parse('<button @click="handleClick">', $options);

$element = $doc->elements->first();
$attr = $element->getAttributes()[0];

$attr->isExtensionAttribute(); // true
$attr->extensionId(); // "vue-event"
$attr->rawName(); // "@click"
$attr->valueText(); // "handleClick"
```
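The shouldActivate() check in VueEventExtension only looks at the byte after the prefix. The plain-PHP sketch below reproduces that decision against a string, assuming (as described above) that the offset points just past the @. activatesVueEvent() is a hypothetical helper, not part of Forte.

```php
<?php

// Sketch of the activation rule: a letter must follow the '@' prefix.
// $offset is assumed to point at the first byte after '@'.
function activatesVueEvent(string $source, int $offset): bool
{
    // Out-of-range offsets yield '' and ctype_alpha('') is false,
    // so a trailing '@' never activates.
    return ctype_alpha($source[$offset] ?? '');
}

// Each sample starts with '@', so the byte to inspect is at offset 1.
$samples = ['@click', '@submit.prevent', '@ ', '@1x', '@'];
$results = array_map(fn (string $s) => activatesVueEvent($s, 1), $samples);
```

Only the first two samples activate; for the rest the extension declines and the lexer falls back to its default attribute handling.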

Attribute Extension Methods

| Method | Required | Description |
|---|---|---|
| id() | Yes | Unique extension identifier |
| attributePrefix() | Yes | The character(s) that trigger this extension (e.g., @, #, ...) |
| shouldActivate() | No | Whether to activate for this specific position (default: true) |
| acceptsValue() | No | Whether this attribute type accepts ="value" (default: true) |
| tokenizeAttributeName() | No | Custom name tokenization logic |
| buildAttributeNode() | No | Custom node building logic |

Configuration

The configure() method merges new options into the existing ones and returns the extension instance for fluent chaining. Use option() to read a value, with an optional default, and hasOption() to check whether a key has been set:

```php
<?php

$ext = new HashtagExtension();
$ext->configure(['maxLength' => 20, 'strict' => true]);

$ext->option('maxLength'); // 20
$ext->option('strict'); // true
$ext->option('missing', 'default'); // "default"
$ext->hasOption('maxLength'); // true
$ext->hasOption('missing'); // false
```
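The configuration semantics described above (merge, read with a default, test for presence) can be sketched as a small trait. HasOptionsSketch and ConfigurableDemo are hypothetical names, and using array_key_exists() rather than isset() is a design assumption: it lets an option explicitly set to null still count as present.

```php
<?php

// Hypothetical trait illustrating merge-style configuration.
trait HasOptionsSketch
{
    private array $options = [];

    public function configure(array $options): static
    {
        // Later string keys overwrite earlier ones; untouched keys survive.
        $this->options = array_merge($this->options, $options);

        return $this;
    }

    public function option(string $key, mixed $default = null): mixed
    {
        return array_key_exists($key, $this->options) ? $this->options[$key] : $default;
    }

    public function hasOption(string $key): bool
    {
        return array_key_exists($key, $this->options);
    }
}

final class ConfigurableDemo
{
    use HasOptionsSketch;
}

// Two configure() calls: maxLength is overwritten, strict is kept.
$ext = (new ConfigurableDemo())
    ->configure(['maxLength' => 20, 'strict' => true])
    ->configure(['maxLength' => 30]);
```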

Diagnostics

Extensions can report warnings, errors, and informational messages during tokenization using the warn, error, and info methods from HasDiagnostics. Retrieve them after parsing with getDiagnostics() and reset them with clearDiagnostics():

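```php
<?php

use Forte\Facades\Forte;
use Forte\Parser\ParserOptions;

// DiagnosticExtension stands in for any extension that reports a
// warning with the message "Warning message" while tokenizing.
$ext = new DiagnosticExtension();
$options = ParserOptions::withExtensions($ext);

Forte::parse('some input', $options);

$diagnostics = $ext->getDiagnostics();
$diagnostics[0]->message; // "Warning message"
$diagnostics[0]->isWarning(); // true

$ext->clearDiagnostics();
count($ext->getDiagnostics()); // 0
```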
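A diagnostics mechanism like the one described above can be as simple as a list of records tagged with a severity. The sketch below is not Forte's HasDiagnostics: DiagnosticSketch, CollectsDiagnostics, and the toy EmptyHashtagChecker are hypothetical, and the optional byte offset on each record is an assumption.

```php
<?php

// Hypothetical diagnostic record.
final class DiagnosticSketch
{
    public function __construct(
        public string $severity,
        public string $message,
        public ?int $offset = null,
    ) {}

    public function isWarning(): bool
    {
        return $this->severity === 'warning';
    }

    public function isError(): bool
    {
        return $this->severity === 'error';
    }
}

// Hypothetical trait that collects diagnostics in order of reporting.
trait CollectsDiagnostics
{
    private array $diagnostics = [];

    protected function warn(string $message, ?int $offset = null): void
    {
        $this->diagnostics[] = new DiagnosticSketch('warning', $message, $offset);
    }

    protected function error(string $message, ?int $offset = null): void
    {
        $this->diagnostics[] = new DiagnosticSketch('error', $message, $offset);
    }

    protected function info(string $message, ?int $offset = null): void
    {
        $this->diagnostics[] = new DiagnosticSketch('info', $message, $offset);
    }

    public function getDiagnostics(): array
    {
        return $this->diagnostics;
    }

    public function clearDiagnostics(): void
    {
        $this->diagnostics = [];
    }
}

// Toy scanner: warns about a '#' that is not followed by an alphanumeric.
final class EmptyHashtagChecker
{
    use CollectsDiagnostics;

    public function scan(string $source): void
    {
        $len = strlen($source);
        for ($i = 0; $i < $len; $i++) {
            if ($source[$i] === '#' && ! ctype_alnum($source[$i + 1] ?? '')) {
                $this->warn('Empty hashtag', $i);
            }
        }
    }
}

$checker = new EmptyHashtagChecker();
$checker->scan('ok #php, bad # and #!');
```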

Extension Registry

When you register extensions through ParserOptions, they are managed by an ExtensionRegistry that handles dependency resolution and conflict detection automatically.

Dependencies and Conflicts

Extensions can declare dependencies via the dependencies() method. The registry performs a topological sort to ensure extensions initialize in the correct order. If a required dependency is missing, Forte throws a RuntimeException. Similarly, conflicts() declares extensions that cannot coexist:

```php
<?php

use Forte\Extensions\AbstractExtension;

class HighlightExtension extends AbstractExtension
{
    public function id(): string
    {
        return 'highlight';
    }

    public function dependencies(): array
    {
        return ['hashtags']; // requires HashtagExtension
    }

    // ...
}
```
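The ordering the registry performs is a topological sort over the dependencies() graph. The sketch below uses Kahn's algorithm in plain PHP; it is an illustration, not Forte's ExtensionRegistry. Throwing on conflicts and on circular dependencies is an assumption here, since the text above only states that a missing dependency raises a RuntimeException.

```php
<?php

/**
 * Sketch: order extension ids so every dependency comes first.
 *
 * @param array<string, array{deps: list<string>, conflicts: list<string>}> $extensions
 * @return list<string>
 */
function resolveOrder(array $extensions): array
{
    $inDegree = [];
    $dependents = [];

    foreach ($extensions as $id => $meta) {
        foreach ($meta['conflicts'] as $other) {
            if (isset($extensions[$other])) {
                throw new RuntimeException("Extension '$id' conflicts with '$other'");
            }
        }
        // In-degree = number of unresolved dependencies.
        $inDegree[$id] = count($meta['deps']);
        foreach ($meta['deps'] as $dep) {
            if (! isset($extensions[$dep])) {
                throw new RuntimeException("Extension '$id' requires missing '$dep'");
            }
            $dependents[$dep][] = $id;
        }
    }

    // Start with extensions that depend on nothing.
    $queue = array_keys(array_filter($inDegree, fn (int $d) => $d === 0));
    $order = [];

    while ($queue !== []) {
        $id = array_shift($queue);
        $order[] = $id;
        foreach ($dependents[$id] ?? [] as $next) {
            // Release a dependent once all of its dependencies are placed.
            if (--$inDegree[$next] === 0) {
                $queue[] = $next;
            }
        }
    }

    if (count($order) !== count($extensions)) {
        throw new RuntimeException('Circular extension dependency detected');
    }

    return $order;
}

$order = resolveOrder([
    'highlight' => ['deps' => ['hashtags'], 'conflicts' => []],
    'hashtags' => ['deps' => [], 'conflicts' => []],
]);
```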

Querying the Registry

Use getExtensionRegistry() on ParserOptions to inspect registered extensions:

```php
<?php

use Forte\Parser\ParserOptions;

$options = ParserOptions::withExtensions(HashtagExtension::class);
$registry = $options->getExtensionRegistry();

$registry->has('hashtags'); // true
$registry->has('unknown'); // false

$ext = $registry->get('hashtags');
$ext->id(); // "hashtags"
```

LexerContext Reference

The LexerContext object is passed to your doTokenize method and provides methods for inspecting and consuming source bytes:

| Method | Returns | Description |
|---|---|---|
| position() | int | Current byte offset in the source |
| current() | ?string | Current byte, or null at end |
| peek(int $offset) | ?string | Byte at offset from current position |
| matches(string $needle) | bool | Whether source matches needle at current position |
| advance(int $bytes = 1) | void | Move forward by N bytes |
| advanceUntil(string $char) | string | Advance until character found, return consumed text |
| advancePast(string $needle) | bool | Advance past needle if it matches |
| emit(int $type, int $start, int $end) | void | Emit a token of the given type |
| substr(int $start, int $length) | string | Extract substring from source |
| source() | string | Full source string |
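To reason about byte offsets without running the parser, it can help to model LexerContext over a plain string. LexerContextSketch below is hypothetical: the method names and signatures follow the table above, but the bodies are assumptions, in particular that advanceUntil() stops before the character rather than consuming it.

```php
<?php

// Hypothetical stand-in for LexerContext; bodies are assumptions.
final class LexerContextSketch
{
    private int $pos = 0;

    /** @var list<array{int, int, int}> tokens as [type, start, end] */
    public array $tokens = [];

    public function __construct(private string $src) {}

    public function position(): int
    {
        return $this->pos;
    }

    public function current(): ?string
    {
        return $this->src[$this->pos] ?? null;
    }

    public function peek(int $offset): ?string
    {
        $i = $this->pos + $offset;
        // Guard against PHP's negative string offsets, which count from the end.
        return $i >= 0 ? ($this->src[$i] ?? null) : null;
    }

    public function matches(string $needle): bool
    {
        return substr($this->src, $this->pos, strlen($needle)) === $needle;
    }

    public function advance(int $bytes = 1): void
    {
        $this->pos = min($this->pos + $bytes, strlen($this->src));
    }

    // Assumption: stops before $char (or at end of input).
    public function advanceUntil(string $char): string
    {
        $start = $this->pos;
        $found = strpos($this->src, $char, $this->pos);
        $this->pos = $found === false ? strlen($this->src) : $found;

        return substr($this->src, $start, $this->pos - $start);
    }

    public function advancePast(string $needle): bool
    {
        if (! $this->matches($needle)) {
            return false;
        }
        $this->advance(strlen($needle));

        return true;
    }

    public function emit(int $type, int $start, int $end): void
    {
        $this->tokens[] = [$type, $start, $end];
    }

    public function substr(int $start, int $length): string
    {
        return substr($this->src, $start, $length);
    }

    public function source(): string
    {
        return $this->src;
    }
}

// Consume an echo like "{{ $name }}" and emit it as one token of type 1.
$ctx = new LexerContextSketch('{{ $name }} rest');
$ctx->advancePast('{{');
$inner = $ctx->advanceUntil('}');
$ctx->advancePast('}}');
$ctx->emit(1, 0, $ctx->position());
```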

TreeContext Reference

The TreeContext object is passed to your doHandle method and provides methods for building AST nodes from the token stream:

| Method | Returns | Description |
|---|---|---|
| position() | int | Current index in the token stream |
| currentToken() | ?array | Current token ({type, start, end}) or null |
| peekToken(int $offset) | ?array | Token at offset from current position |
| tokenText(array $token) | string | Source text of a token |
| addNode(int $kind, int $start, int $count) | int | Create a node and return its index |
| addChild(int $nodeIndex) | void | Add node as child of current parent |
| pushElement(int $nodeIndex) | void | Push node onto open element stack |
| popElement() | ?int | Pop current element from stack |
| setNodeMeta(int $index, string $key, mixed $value) | void | Attach metadata during building |
| source() | string | Full source string |
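A toy model of TreeContext shows how index-based node building fits together: nodes live in a flat array, children are stored as indexes, and an open-element stack decides which parent addChild() attaches to. TreeContextSketch is hypothetical. The token array shape follows the {type, start, end} description above; reading addNode()'s start and count as a token index and token count, and the advance() helper (which is not in the table), are assumptions.

```php
<?php

// Hypothetical stand-in for TreeContext; not Forte's implementation.
final class TreeContextSketch
{
    private int $pos = 0;
    public array $nodes = [];
    private array $stack = [];

    public function __construct(private string $src, private array $tokens)
    {
        // Node 0 is the document root and stays open.
        $this->nodes[] = ['kind' => 0, 'start' => 0, 'count' => count($tokens), 'children' => [], 'meta' => []];
        $this->stack[] = 0;
    }

    public function position(): int { return $this->pos; }
    public function currentToken(): ?array { return $this->tokens[$this->pos] ?? null; }
    public function peekToken(int $offset): ?array { return $this->tokens[$this->pos + $offset] ?? null; }

    // Assumption: a helper to move to the next token (not in the table).
    public function advance(): void { $this->pos++; }

    public function tokenText(array $token): string
    {
        return substr($this->src, $token['start'], $token['end'] - $token['start']);
    }

    // Assumption: $start is a token index and $count a number of tokens.
    public function addNode(int $kind, int $start, int $count): int
    {
        $this->nodes[] = ['kind' => $kind, 'start' => $start, 'count' => $count, 'children' => [], 'meta' => []];
        return count($this->nodes) - 1;
    }

    public function addChild(int $nodeIndex): void
    {
        $parent = $this->stack[count($this->stack) - 1];
        $this->nodes[$parent]['children'][] = $nodeIndex;
    }

    public function pushElement(int $nodeIndex): void { $this->stack[] = $nodeIndex; }

    public function popElement(): ?int
    {
        // Never pop the root.
        return count($this->stack) > 1 ? array_pop($this->stack) : null;
    }

    public function setNodeMeta(int $index, string $key, mixed $value): void
    {
        $this->nodes[$index]['meta'][$key] = $value;
    }
}

const TOKEN_HASHTAG = 7; // hypothetical token type id
const KIND_HASHTAG = 1;  // hypothetical node kind id

$ctx = new TreeContextSketch('Hi #php', [
    ['type' => 0, 'start' => 0, 'end' => 3],
    ['type' => TOKEN_HASHTAG, 'start' => 3, 'end' => 7],
]);

while (($token = $ctx->currentToken()) !== null) {
    if ($token['type'] === TOKEN_HASHTAG) {
        // One node spanning one token, attached to the current parent.
        $node = $ctx->addNode(KIND_HASHTAG, $ctx->position(), 1);
        $ctx->addChild($node);
        $ctx->setNodeMeta($node, 'tag', ltrim($ctx->tokenText($token), '#'));
    }
    $ctx->advance();
}
```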

See also