AST generation and tokenization
==========
AST generation
-------------
To generate an AST (abstract syntax tree) for your JavaScript code, write:
```php
$source = "var a = 1"; // JavaScript code
$options = array(); // parsing settings, see the "Options" section
$ast = Peast\Peast::latest($source, $options)->parse();
```
The previous code generates this structure:
```
Peast\Syntax\Node\Program
    getSourceType() => "script"
    getBody() => array(
        Peast\Syntax\Node\VariableDeclaration
            getKind() => "var"
            getDeclarations() => array(
                Peast\Syntax\Node\VariableDeclarator
                    getId() => Peast\Syntax\Node\Identifier
                        getName() => "a"
                    getInit() => Peast\Syntax\Node\NumericLiteral
                        getFormat() => "decimal"
                        getValue() => 1
            )
    )
```
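The getters in the structure above can be chained to read values back out of the tree. The following is a minimal sketch, not part of the library documentation: it assumes Peast is installed through Composer (the `vendor/autoload.php` path is an assumption) and uses only the getters and node classes listed above. It only handles plain identifiers initialized with numeric literals and skips everything else.

```php
<?php
require 'vendor/autoload.php'; // assumption: Peast is installed through Composer

$ast = Peast\Peast::latest("var a = 1", array())->parse();

// Collect "kind name = value" strings for top-level variable declarations
$assignments = array();
foreach ($ast->getBody() as $statement) {
    if (!$statement instanceof Peast\Syntax\Node\VariableDeclaration) {
        continue; // not a var/let/const declaration
    }
    foreach ($statement->getDeclarations() as $declarator) {
        $id = $declarator->getId();
        $init = $declarator->getInit();
        // Skip destructuring patterns and non-numeric initializers
        if ($id instanceof Peast\Syntax\Node\Identifier
            && $init instanceof Peast\Syntax\Node\NumericLiteral) {
            $assignments[] = $statement->getKind() . " " . $id->getName()
                . " = " . $init->getValue();
        }
    }
}

print_r($assignments);
```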
Tokenization
-------------
To tokenize your JavaScript code, write:
```php
$source = "var a = 1"; // JavaScript code
$options = array(); // parsing settings, see the "Options" section
$tokens = Peast\Peast::latest($source, $options)->tokenize();
```
The `tokenize` method returns an array of tokens for your code:
```
array(
    Peast\Syntax\Token
        getType() => "Keyword"
        getValue() => "var"
    Peast\Syntax\Token
        getType() => "Identifier"
        getValue() => "a"
    Peast\Syntax\Token
        getType() => "Punctuator"
        getValue() => "="
    Peast\Syntax\Token
        getType() => "Numeric"
        getValue() => "1"
)
```
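Since the result is a plain PHP array, it can be processed with ordinary loops. As a small sketch (again assuming a Composer install, with `$valuesByType` being an illustrative variable name), this groups token values by their type using only the `getType` and `getValue` methods shown above:

```php
<?php
require 'vendor/autoload.php'; // assumption: Peast is installed through Composer

$tokens = Peast\Peast::latest("var a = 1", array())->tokenize();

// Group token values by token type
$valuesByType = array();
foreach ($tokens as $token) {
    $valuesByType[$token->getType()][] = $token->getValue();
}

foreach ($valuesByType as $type => $values) {
    echo $type . ": " . implode(" ", $values) . "\n";
}
```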
ECMAScript version
-------------
Peast can parse different versions of ECMAScript. To choose a version, call the corresponding static method on the main class.
Available methods are:
* ```Peast::ES2015(source, options)``` or ```Peast::ES6(source, options)```: parse using ECMAScript 2015 (ES6) syntax
* ```Peast::ES2016(source, options)``` or ```Peast::ES7(source, options)```: parse using ECMAScript 2016 (ES7) syntax
* ```Peast::ES2017(source, options)``` or ```Peast::ES8(source, options)```: parse using ECMAScript 2017 (ES8) syntax
* ```Peast::ES2018(source, options)``` or ```Peast::ES9(source, options)```: parse using ECMAScript 2018 (ES9) syntax
* ```Peast::ES2019(source, options)``` or ```Peast::ES10(source, options)```: parse using ECMAScript 2019 (ES10) syntax
* ```Peast::ES2020(source, options)``` or ```Peast::ES11(source, options)```: parse using ECMAScript 2020 (ES11) syntax
* ```Peast::ES2021(source, options)``` or ```Peast::ES12(source, options)```: parse using ECMAScript 2021 (ES12) syntax
* ```Peast::ES2022(source, options)``` or ```Peast::ES13(source, options)```: parse using ECMAScript 2022 (ES13) syntax
* ```Peast::ES2023(source, options)``` or ```Peast::ES14(source, options)```: parse using ECMAScript 2023 (ES14) syntax
* ```Peast::latest(source, options)```: parse using the latest ECMAScript syntax version implemented
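
To see how the chosen version affects parsing, the same source can be tried against several of the methods listed above. This is only a sketch: `parsesWith` is a hypothetical helper, the Composer autoload path is an assumption, and the code assumes that a syntax error is reported by throwing an exception, which it catches generically.

```php
<?php
require 'vendor/autoload.php'; // assumption: Peast is installed through Composer

/**
 * Hypothetical helper: reports whether $source parses under the given
 * version method name (for example "ES2015" or "latest").
 */
function parsesWith($version, $source)
{
    try {
        // The version methods are static, so they can be called by name
        call_user_func(array('Peast\Peast', $version), $source, array())->parse();
        return true;
    } catch (\Exception $e) {
        // Assumption: a syntax error is reported by throwing an exception
        return false;
    }
}

$source = "var x = 2 ** 3"; // exponentiation operator, introduced in ES2016
foreach (array("ES2015", "ES2016", "latest") as $version) {
    $result = parsesWith($version, $source) ? "parsed" : "rejected";
    echo $version . ": " . $result . "\n";
}
```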

Options
-------------
The examples above pass an `$options` parameter. This is an associative array of parsing settings. Available options are:
* "sourceType": one of the source type constants defined in the Peast class:
    * `Peast\Peast::SOURCE_TYPE_SCRIPT`: the default source type. It indicates that the code is a script, so `import` and `export` keywords are not parsed.
    * `Peast\Peast::SOURCE_TYPE_MODULE`: indicates that the code is a module and activates the parsing of `import` and `export` keywords.
* "comments" (from version 1.5): enables comment parsing and attaches the comments to the nodes in the tree. You can get the comments attached to a node using its `getLeadingComments` and `getTrailingComments` methods.
* "jsx" (from version 1.8): enables parsing of JSX syntax.
* "sourceEncoding": the encoding of the code to parse. If not specified, the parser assumes UTF-8.
* "strictEncoding" (from version 1.9.4): if false, the parser handles invalid UTF-8 characters in the source code by replacing them with the character defined in the "mbstring.substitute_character" ini setting; otherwise it throws an exception.
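
As an illustration of the options above, the following sketch parses a small module with comment parsing enabled and reports how many comments were attached to each top-level node. The Composer autoload path and the sample source are assumptions made for the example. The code does not assume which node a given comment ends up attached to; it only prints the counts.

```php
<?php
require 'vendor/autoload.php'; // assumption: Peast is installed through Composer

$source = "// Exported counter\nexport var counter = 0;";
$options = array(
    "sourceType" => Peast\Peast::SOURCE_TYPE_MODULE, // enable import/export parsing
    "comments" => true, // attach comments to nodes (version 1.5 or later)
);
$ast = Peast\Peast::latest($source, $options)->parse();

// The program node reports the source type it was parsed with
$isModule = $ast->getSourceType() === Peast\Peast::SOURCE_TYPE_MODULE;
echo "Parsed as module: " . ($isModule ? "yes" : "no") . "\n";

// Count the comments attached to each top-level node
foreach ($ast->getBody() as $node) {
    $leading = count($node->getLeadingComments());
    $trailing = count($node->getTrailingComments());
    echo get_class($node) . ": " . $leading . " leading, "
        . $trailing . " trailing comment(s)\n";
}
```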

Differences from ESTree
-------------
There is only one major difference from ESTree: parenthesized expressions. This node type was introduced to let the user know when an expression is wrapped in round brackets. For example, `(a + b)` is a parenthesized expression and generates a `ParenthesizedExpression` node.

From version 1.3, literals have their own classes: `StringLiteral`, `NumericLiteral`, `BooleanLiteral` and `NullLiteral`.

From version 1.8, when parsing JSX, two new token types are emitted: `JSXIdentifier`, which represents a valid JSX identifier, and `JSXText`, which represents text inside JSX elements and fragments.

From version 1.13.7, the new `rawName` property has been added to `Identifier` nodes. This property reports the raw name of the identifier with unconverted Unicode escape sequences.
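
The parenthesized expression difference can be observed with a short sketch. The `getExpression` getter used here is not shown elsewhere in this document; it is assumed to follow the ESTree `expression` property of expression statements and to be available on the wrapper node as well. The Composer autoload path is also an assumption.

```php
<?php
require 'vendor/autoload.php'; // assumption: Peast is installed through Composer

$ast = Peast\Peast::latest("(a + b)", array())->parse();

// The only statement is an expression statement wrapping the expression
$statement = $ast->getBody()[0];
$expression = $statement->getExpression(); // assumed getter, see note above

// The brackets are kept in the tree as their own node
$isParenthesized = $expression instanceof Peast\Syntax\Node\ParenthesizedExpression;
echo "Outer node: " . get_class($expression) . "\n";

if ($isParenthesized) {
    // Unwrap to reach the expression written inside the brackets
    echo "Inner node: " . get_class($expression->getExpression()) . "\n";
}
```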