AST generation and tokenization
AST generation
To generate an AST (abstract syntax tree) for your JavaScript code, write:
$source = "var a = 1"; // JavaScript code
$options = array(); // parsing settings, see the Options section
$ast = Peast\Peast::latest($source, $options)->parse();
The previous code generates this structure:
Peast\Syntax\Node\Program
    getSourceType() => "script"
    getBody() => array(
        Peast\Syntax\Node\VariableDeclaration
            getKind() => "var"
            getDeclarations() => array(
                Peast\Syntax\Node\VariableDeclarator
                    getId() => Peast\Syntax\Node\Identifier
                        getName() => "a"
                    getInit() => Peast\Syntax\Node\NumericLiteral
                        getFormat() => "decimal"
                        getValue() => 1
            )
    )
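As a minimal sketch of how the structure above can be read back in PHP, the snippet below walks the tree using only the getters listed in that output. It assumes Peast was installed with Composer and that the autoloader is at vendor/autoload.php next to the script; the $summary variable is an illustrative name, not part of Peast.

```php
<?php
// Assumption: Peast is installed through Composer; adjust the path if needed.
require_once __DIR__ . '/vendor/autoload.php';

$source = "var a = 1"; // JavaScript code
$ast = Peast\Peast::latest($source, array())->parse();

$summary = array();
// Program::getBody() returns the top-level statements
foreach ($ast->getBody() as $statement) {
    // Only variable declarations expose getKind() and getDeclarations()
    if (!$statement instanceof Peast\Syntax\Node\VariableDeclaration) {
        continue;
    }
    foreach ($statement->getDeclarations() as $declarator) {
        $init = $declarator->getInit(); // null when there is no initializer
        $summary[] = $statement->getKind() . " "
            . $declarator->getId()->getName() . " = "
            . ($init !== null ? $init->getValue() : "undefined");
    }
}

echo implode("\n", $summary), "\n";
```

The instanceof check keeps the loop safe for sources that contain other kinds of statements, which do not expose the same getters.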
Tokenization
To tokenize your JavaScript code, write:
$source = "var a = 1"; // JavaScript code
$options = array(); // parsing settings, see the Options section
$tokens = Peast\Peast::latest($source, $options)->tokenize();
This function produces an array of tokens from your code:
array(
    Peast\Syntax\Token
        getType() => "Keyword"
        getValue() => "var"
    Peast\Syntax\Token
        getType() => "Identifier"
        getValue() => "a"
    Peast\Syntax\Token
        getType() => "Punctuator"
        getValue() => "="
    Peast\Syntax\Token
        getType() => "Numeric"
        getValue() => "1"
)
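The token array can be consumed with a plain loop. The following is a small sketch that prints one line per token using the getType() and getValue() methods shown above; the Composer autoloader path and the $lines variable are assumptions made for the example.

```php
<?php
// Assumption: Peast is installed through Composer; adjust the path if needed.
require_once __DIR__ . '/vendor/autoload.php';

$source = "var a = 1"; // JavaScript code
$tokens = Peast\Peast::latest($source, array())->tokenize();

$lines = array();
foreach ($tokens as $token) {
    // Pad the type so the values line up in a column
    $lines[] = sprintf("%-12s %s", $token->getType(), $token->getValue());
}

echo implode("\n", $lines), "\n";
```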
EcmaScript version
Peast can parse different versions of EcmaScript. You choose the version by calling the corresponding method on the main class. Available methods are:
- Peast::ES2015(source, options) or Peast::ES6(source, options): parse using EcmaScript 2015 (ES6) syntax
- Peast::ES2016(source, options) or Peast::ES7(source, options): parse using EcmaScript 2016 (ES7) syntax
- Peast::ES2017(source, options) or Peast::ES8(source, options): parse using EcmaScript 2017 (ES8) syntax
- Peast::ES2018(source, options) or Peast::ES9(source, options): parse using EcmaScript 2018 (ES9) syntax
- Peast::ES2019(source, options) or Peast::ES10(source, options): parse using EcmaScript 2019 (ES10) syntax
- Peast::ES2020(source, options) or Peast::ES11(source, options): parse using EcmaScript 2020 (ES11) syntax
- Peast::ES2021(source, options) or Peast::ES12(source, options): parse using EcmaScript 2021 (ES12) syntax
- Peast::ES2022(source, options) or Peast::ES13(source, options): parse using EcmaScript 2022 (ES13) syntax
- Peast::ES2023(source, options) or Peast::ES14(source, options): parse using EcmaScript 2023 (ES14) syntax
- Peast::latest(source, options): parse using the latest EcmaScript syntax version implemented
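For illustration, the sketch below parses the same source with an explicit version method, with its alias, and with latest. It only uses methods from the list above; the Composer autoloader path and the variable names are assumptions made for the example.

```php
<?php
// Assumption: Peast is installed through Composer; adjust the path if needed.
require_once __DIR__ . '/vendor/autoload.php';

$source = "var a = 1"; // JavaScript code
$options = array();

// Explicit version: parse with the EcmaScript 2015 syntax
$astEs2015 = Peast\Peast::ES2015($source, $options)->parse();

// ES6 is listed as the alias of ES2015
$astEs6 = Peast\Peast::ES6($source, $options)->parse();

// Latest syntax version implemented by the installed Peast release
$astLatest = Peast\Peast::latest($source, $options)->parse();

echo get_class($astEs2015), "\n";
echo get_class($astEs6), "\n";
echo get_class($astLatest), "\n";
```

Pinning an explicit version can be useful when the code to parse targets a known runtime, while latest follows whatever the installed release supports.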
Options
In the examples above you may have noticed the $options parameter. This parameter is an associative array that specifies settings for the parser. Available options are:
- "sourceType": this can be one of the source type constants defined in the Peast class:
  - Peast\Peast::SOURCE_TYPE_SCRIPT: this is the default source type and indicates that the code is a script, which means that import and export keywords are not parsed
  - Peast\Peast::SOURCE_TYPE_MODULE: this indicates that the code is a module and activates the parsing of import and export keywords
- "comments" (from version 1.5): enables comment parsing and attaches the comments to the nodes in the tree. You can get the comments attached to a node using the getLeadingComments and getTrailingComments methods.
- "jsx" (from version 1.8): enables parsing of JSX syntax.
- "sourceEncoding": specifies the encoding of the code to parse. If not specified, the parser assumes UTF-8.
- "strictEncoding" (from version 1.9.4): if false, the parser handles invalid UTF-8 characters in the source code by replacing them with the character defined in the "mbstring.substitute_character" ini setting; otherwise it throws an exception.
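A short sketch of an options array in use: it parses a module with comment parsing enabled and counts the comments attached to each top-level node. The JavaScript sample and the Composer autoloader path are assumptions made for the example; the option names and getters are the ones listed above.

```php
<?php
// Assumption: Peast is installed through Composer; adjust the path if needed.
require_once __DIR__ . '/vendor/autoload.php';

// Illustrative JavaScript module with one comment
$source = "/* entry point */\nexport var a = 1;";

$options = array(
    "sourceType" => Peast\Peast::SOURCE_TYPE_MODULE, // parse import/export
    "comments"   => true,                            // attach comments to nodes
);

$ast = Peast\Peast::latest($source, $options)->parse();

foreach ($ast->getBody() as $node) {
    printf(
        "%s: %d leading, %d trailing comment(s)\n",
        get_class($node),
        count($node->getLeadingComments()),
        count($node->getTrailingComments())
    );
}
```

Options that are left out of the array keep their defaults, so only the settings that differ need to be listed.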
Differences from ESTree
There is only one big difference from ESTree: parenthesized expressions. This type of expression has been introduced to let the user know when an expression is wrapped in round brackets. For example, (a + b) is a parenthesized expression and generates a ParenthesizedExpression node.
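To make the difference concrete, here is a minimal sketch that parses (a + b) and inspects the resulting nodes. It assumes the Composer autoloader path shown, and that both the statement node and the ParenthesizedExpression node expose a getExpression() getter for the expression they wrap.

```php
<?php
// Assumption: Peast is installed through Composer; adjust the path if needed.
require_once __DIR__ . '/vendor/autoload.php';

$ast = Peast\Peast::latest("(a + b)", array())->parse();

// The program contains a single expression statement
$body = $ast->getBody();
$statement = $body[0];

// Outer node: the wrapper that records the round brackets
$outer = $statement->getExpression();

// Inner node: the expression written between the brackets
$inner = $outer->getExpression();

echo get_class($outer), "\n";
echo get_class($inner), "\n";
```

Code that only cares about the wrapped expression can unwrap the outer node in this way before processing it.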
From version 1.3, literals have their own classes: StringLiteral, NumericLiteral, BooleanLiteral and NullLiteral.
From version 1.8, when parsing JSX, two new token types are emitted: JSXIdentifier, which represents a valid JSX identifier, and JSXText, which represents text inside JSX elements and fragments.
From version 1.13.7, the new rawName property has been added to Identifier nodes. This property reports the raw name of the identifier, with unicode escape sequences left unconverted.