lrparser.dll
Classes:
#module Root.System.Parsers
The LRParser class implements a "lazy" parser.
The parser is constructed from a specification: the lexical scanner is
described in the language of regular expressions and the parser in a BNF-like
language. The specification is passed to the Construct method or to the
constructor.
For instance:
... = instance LRParser(
<scanner-specification>,
<parser-specification>,
<using-exact-regular-expressions-flag>,
<complete-constructing-flag>,
<constant-registration-flag>
);
The description of lexical scanner specification can be found in the comments to
the LexScanner class.
<parser-specification> :=
"
<constant[s]-definition>
...
%%
<grammatical-expression-and-node-parameters>
...
%%
";
<constant[s]-definition-block>:=
#define <constant-name> <integer-constant>
or
enum {
<constant-name> [ = <integer-constant>] ,
<constant-name> [ = <integer-constant>] ,
...
} // as in C but without enumeration type;
<integer-constant> := decimal or hexadecimal C-constant
Constants defined in the lexical scanner are added to the parser constants.
<grammatical-expression-and-node-parameters>:=
<grammatical-expression> '{|' [ <constant-name>[ '('<mask>')' ] ] '|}'
<grammatical-expression>:=
<nonterminal> ':'
[ <terminal-or-nonterminal> ... ]
['|'[ <terminal-or-nonterminal> ... ] ... ]
<terminal>:= a <constant-name> defined in the scanner specification and used
as a token id, or an "exact" regular expression from the scanner
specification, if this is allowed in the Construct method.
<nonterminal>:= an identifier that consists of digits, Russian and Latin
letters, and the '-' and '_' symbols. A digit cannot be the first
character; the '-' symbol cannot be the first or the last
character.
<mask>:= <position> [ , <position> ... ]
<position>:= the number of a symbol (terminal or nonterminal) to the right of
the ':' symbol, counted from 1.
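The naming rule for nonterminals can be checked with a regular expression. The
following is a minimal sketch in Python, not the library's own code. It assumes
that "Russian and Latin characters" means ASCII letters plus the basic Cyrillic
alphabet (including Ё/ё); the function name is_valid_nonterminal is
hypothetical.

```python
import re

# Assumption: letters are ASCII Latin letters and basic Cyrillic letters.
_LETTER = "A-Za-zА-Яа-яЁё"

# First character: a letter or '_' (not a digit, not '-').
# Last character: anything allowed except '-'.
_NONTERMINAL_RE = re.compile(
    rf"[{_LETTER}_](?:[{_LETTER}0-9_-]*[{_LETTER}0-9_])?"
)


def is_valid_nonterminal(name: str) -> bool:
    """Return True if `name` satisfies the nonterminal naming rule."""
    return _NONTERMINAL_RE.fullmatch(name) is not None


print(is_valid_nonterminal("expr-list"))  # accepted
print(is_valid_nonterminal("expr-"))      # rejected: ends with '-'
```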
Grammatical expression.
A grammatical expression defines the nonterminal that precedes the ':' symbol
in terms of the terminals and nonterminals that stand to the right of the ':'
symbol.
The nonterminal whose defining expression comes first in the specification is
called the starting nonterminal.
Every nonterminal that appears in the right part of any grammatical
expression must be defined by at least one grammatical expression.
The '?' sign can be appended to a terminal or nonterminal name, that is
<certain-nonterminal>?. This automatically adds the following expression to
the specification:
<certain-nonterminal>? : <certain-nonterminal> | \*empty*\ {||}
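The effect of the '?' suffix can be pictured as a rewriting step over the
grammar. The sketch below is written in Python and is only an illustration of
the rule stated above. The grammar representation (a dictionary from a
nonterminal to its list of alternatives) and all rule names are assumptions
made for the example.

```python
def expand_optional(rules):
    """Add `X? : X | <empty>` for every symbol `X?` used in a right part.

    `rules` maps a nonterminal to a list of alternatives; each alternative
    is a list of symbol names. An empty list stands for the empty alternative.
    """
    expanded = {lhs: [list(alt) for alt in alts] for lhs, alts in rules.items()}
    for alts in rules.values():
        for alt in alts:
            for symbol in alt:
                is_optional = len(symbol) > 1 and symbol.endswith("?")
                if is_optional and symbol not in expanded:
                    # The added rule has empty parameters ({||}), so it
                    # creates no tree node of its own.
                    expanded[symbol] = [[symbol[:-1]], []]
    return expanded


# Hypothetical grammar:  decl : TYPE NAME init?     init : ASSIGN NUMBER
rules = {
    "decl": [["TYPE", "NAME", "init?"]],
    "init": [["ASSIGN", "NUMBER"]],
}
expanded = expand_optional(rules)
print(expanded["init?"])
```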
Grammatical expression parameters.
The result of the parser's work is a semantic tree that corresponds to the
input sequence. The tree is made of TreeNode objects, which represent its
nodes, and LexToken objects, which represent its leaves.
Each node has a code and a vector of leaves and child nodes. Each node
corresponds to the grammatical expression of the parser specification that was
used to parse the input sequence. The node code is the constant specified in
the grammatical expression parameters. Leaves correspond to the terminals of
the grammatical expression; the child nodes of a node correspond to its
nonterminals.
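The shape of such a tree can be modelled with two small record types. This is
a sketch in Python under stated assumptions, not the actual TreeNode and
LexToken classes: the field names code and items, the helper collect_leaves
and the numeric codes are hypothetical, and only the id and val fields of a
token are mentioned elsewhere in this description.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


@dataclass
class LexToken:
    """A leaf: a token id plus the token value."""
    id: int
    val: Any


@dataclass
class TreeNode:
    """A node: a code plus an ordered vector of leaves and child nodes."""
    code: int
    items: List[Union["TreeNode", LexToken]] = field(default_factory=list)


def collect_leaves(node: TreeNode) -> List[LexToken]:
    """Return all leaves below `node`, left to right."""
    found: List[LexToken] = []
    for item in node.items:
        if isinstance(item, TreeNode):
            found.extend(collect_leaves(item))
        else:
            found.append(item)
    return found


# Hypothetical codes for rules such as:  sum : operand PLUS operand {| SUM |}
NUM, PLUS, SUM, OPERAND = 1, 2, 100, 101
tree = TreeNode(SUM, [
    TreeNode(OPERAND, [LexToken(NUM, "1")]),
    LexToken(PLUS, "+"),
    TreeNode(OPERAND, [LexToken(NUM, "2")]),
])
print([token.val for token in collect_leaves(tree)])
```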
The optional mask in the expression parameters lists the positions in the
expression that are mapped to the vector of leaves and sub-nodes of the
corresponding node. Only positions to the right of the ':' symbol are counted;
the first position has number 1. The mask determines which symbols (terminals
and nonterminals) are mapped to the node and in what order. If the mask is
omitted, all symbols to the right of the ':' symbol are mapped to the vector of
leaves and sub-nodes in the order in which they appear.
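The mask amounts to a 1-based selection with reordering. A minimal Python
sketch of that selection follows; the function name apply_mask and the sample
rule are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_mask(rhs_items: Sequence[T],
               mask: Optional[Sequence[int]] = None) -> List[T]:
    """Pick the right-part items named by `mask` (1-based), in mask order.

    Without a mask every item is kept in its original order.
    """
    if mask is None:
        return list(rhs_items)
    for position in mask:
        if not 1 <= position <= len(rhs_items):
            raise IndexError(f"mask position {position} is out of range")
    return [rhs_items[position - 1] for position in mask]


# Hypothetical rule:  if-stmt : IF cond THEN stmt {| IF_NODE(2,4) |}
rhs = ["IF", "cond", "THEN", "stmt"]
print(apply_mask(rhs, [2, 4]))   # keeps only the condition and the statement
print(apply_mask(rhs, [4, 2]))   # same items, reversed order
```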
If all grammatical expression parameters are absent (i.e. {| \*empty*\ |}),
no new node is created in the tree when this expression is used to parse the
input sequence. Instead, one leaf or child node of the node that was not
created is attached as a leaf or child node to the higher tree node.
The selection is carried out in the following manner:
* if there is one symbol in the right part of the grammatical expression, it
is selected as a leaf (if it is a terminal) or as a child node (if it is a
nonterminal);
* if there are several symbols in the right part of the grammatical
expression, exactly one of them should be a nonterminal, and this
nonterminal is selected as the child node.
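The selection rule for expressions with empty parameters can be sketched as
follows. The code is Python and illustrative only. It assumes each right-part
item is tagged as a terminal or a nonterminal; the behaviour for an empty right
part (returning None) is also an assumption, since the text above does not
cover that case.

```python
def select_passthrough(rhs):
    """Choose the item that is handed up when no node is created.

    `rhs` is a list of (kind, value) pairs where kind is "terminal"
    or "nonterminal".
    """
    if not rhs:
        return None  # assumption: an empty alternative contributes nothing
    if len(rhs) == 1:
        return rhs[0][1]
    nonterminals = [value for kind, value in rhs if kind == "nonterminal"]
    if len(nonterminals) != 1:
        raise ValueError("expected exactly one nonterminal in the right part")
    return nonterminals[0]


# Hypothetical rule:  primary : LPAREN expr RPAREN {||}
rhs = [("terminal", "("), ("nonterminal", "expr-node"), ("terminal", ")")]
print(select_passthrough(rhs))
```

With this rule, purely syntactic wrappers such as parentheses leave no trace
in the resulting tree.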
Methods:
- Construct(refer object String)
- Construct(refer object String, refer object String)
- Construct(refer object String, refer object String, boolean)
- Construct(refer object String, refer object String, boolean, boolean)
- Construct(refer object String, refer object String, boolean, boolean, boolean)
- GetStackData(int)
- GetStackDataPos(int)
- GetStackDepth(void)
- IsValid(void)
- LRParser(refer object String)
- LRParser(refer object String, boolean, boolean, boolean)
- LRParser(refer object String, refer object String)
- LRParser(refer object String, refer object String, boolean)
- LRParser(refer object String, refer object String, boolean, boolean)
- LRParser(refer object String, refer object String, boolean, boolean, boolean)
- LRParser(void)
- NumberOfProductions(void)
- Parse(refer object String)
- Parse(refer object String, refer object SymbolTable)
- ParseValidate(refer object String)
- ParseValidate(refer object String, refer object SymbolTable)
- ReBuild(refer object String, boolean)
- Size(void)
- TraceOn(boolean)
- TryClose(void)
- ~LRParser(void)
param specName;
Constructs the parser from the scanner specification and the parser
specification that are both named by the specName parameter.
The specifications are loaded from the global SpecSet fields of the LexScanner
and LRParser classes respectively.
Performs partial parser construction in order to work in the "idle" mode using
the "exact" regular expressions;
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
Performs partial parser construction in order to work in the "idle" mode using
the "exact" regular expressions;
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
Performs partial parser construction for work in the "idle" mode.
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe, fullConstruct;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe, fullConstruct, regConstantes;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
If the regConstants parameter is TRUE, the constants are registered in the
Pluk-system, including the constants defined in the lexical scanner.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param n;
Returns a node in the parsing stack that lies at the depth specified via the n
parameter.
Used for processing nested grammars.
param n;
Returns the position in the analyzed text that corresponds to a node in the
parsing stack that lies at the depth specified via the n parameter.
Used for processing nested grammars.
Returns the current depth of the parsing stack.
Used for processing nested grammars.
Returns TRUE if the parser is already constructed, that is, if a specification
is loaded in it; otherwise returns FALSE.
param specName;
Constructs the parser from the scanner specification and the parser
specification that are both named by the specName parameter.
The specifications are loaded from the global SpecSet fields of the LexScanner
and LRParser classes respectively.
Performs partial parser construction in order to work in the "idle" mode using
the "exact" regular expressions;
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param specName, useExactRe, fullConstruct, regConstantes;
Constructs the parser from the scanner specification and the parser
specification that are both named by the specName parameter.
The specifications are loaded from the global SpecSet fields of the LexScanner
and LRParser classes respectively.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
If the regConstants parameter is TRUE, the constants are registered in the
Pluk-system, including the constants defined in the lexical scanner.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
Performs partial parser construction in order to work in the "idle" mode using
the "exact" regular expressions;
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
Performs partial parser construction for work in the "idle" mode.
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe, fullConstruct;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe, fullConstruct, regConstantes;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
If the regConstants parameter is TRUE, the constants are registered in the
Pluk-system, including the constants defined in the lexical scanner.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
Creates a parser object.
The parser itself is constructed by the Construct method.
Returns the number of grammatical expressions defined in the parser.
param inpData;
Builds the semantic tree for the string specified via the inpData parameter.
Returns a TreeNode object that represents the root of the tree.
TreeNode objects can be child nodes of this object, and LexToken objects can
be its leaves.
If input data does not meet the parser syntax then a Pluk-error is raised.
param inpData, symTable;
Builds the semantic tree for the string specified via the inpData parameter.
Returns a TreeNode object that represents the root of the tree.
TreeNode objects can be child nodes of this object, and LexToken objects can
be its leaves. If the .val field of the LexScanner class object has the int
type, it is the id of a string stored in the symbol table given by the
symTable parameter.
If input data does not meet the parser syntax then a Pluk-error is raised.
param inpData;
Builds the semantic tree for the string specified via the inpData parameter.
Returns a TreeNode object that represents the root of the tree.
TreeNode objects can be child nodes of this object, and LexToken objects can
be its leaves.
If the input data does not match the parser syntax, EMPTY is returned, unlike
the Parse method, which raises a Pluk-error.
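The difference between the two calling conventions can be shown on a toy
parser. The sketch below is Python and does not reflect how LRParser is
implemented: the balanced-parentheses grammar, the ParseError class and the
functions parse and parse_validate are all hypothetical, with None standing in
for EMPTY and an exception standing in for a Pluk-error.

```python
class ParseError(Exception):
    """Stands in for the Pluk-error raised on a syntax error."""


def parse(text):
    """Toy parser: accepts balanced parentheses, returns nested lists."""
    stack = [[]]
    for ch in text:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")" and len(stack) > 1:
            stack.pop()
        else:
            raise ParseError(f"unexpected {ch!r}")
    if len(stack) != 1:
        raise ParseError("unexpected end of input")
    return stack[0]


def parse_validate(text):
    """Same analysis, but a syntax error yields None instead of an error."""
    try:
        return parse(text)
    except ParseError:
        return None


print(parse_validate("(())"))  # a tree
print(parse_validate("(()"))   # None: the input is not valid
```

The validating form suits callers that only need to test whether the input is
acceptable and do not want to handle an error.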
param inpData, symTable;
Builds the semantic tree for the string specified via the inpData parameter.
Returns a TreeNode object that represents the root of the tree.
TreeNode objects can be child nodes of this object, and LexToken objects can
be its leaves. If the .val field of the LexScanner class object has the int
type, it is the id of a string stored in the symbol table given by the
symTable parameter.
If the input data does not match the parser syntax, EMPTY is returned, unlike
the Parse method, which raises a Pluk-error.
param parSpec, useExactRe;
Reconstructs parser according to the parser specification defined via the
parSpec parameter when the same scanner is used.
The useExactRe parameter determines whether "exact" regular expressions defined
in the lexical scanner are used (for more information see the parser and scanner
specification description);
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
Returns the current number of lines in the parser transition table.
param traceOn;
Turns on or off the mode that traces which grammatical expressions are used
during syntactic analysis of the input data by the Parse and ParseValidate
methods.
Tries to derive the root of the parser grammar (i.e. to complete parsing)
using only the nodes that are currently on the parsing stack.
Returns the root node of the parse tree on success; otherwise returns EMPTY.
Used for processing nested grammars.
#module Root.System.Parsers
The LRParserEngine class implements "idle" parsers. It differs from the
LRParser class in that it does not build a semantic tree.
When the Parse method of an object of this type is called, syntactic analysis
of the input sequence is performed. For each grammatical expression used
during parsing, the specified method of a given object is called with the
parameters specified by this expression. This method should have the
following form:
method_name( int , int , refer object Vector ) = <|
param prodNo, prodId, params;
...
|>;
where:
prodNo - the number of the grammatical expression used;
prodId - the constant defined in the expression parameters. LRParser
uses it as the node code;
params - the vector of objects that correspond to the symbols of the
grammatical expression. In LRParser it is the vector of leaves
and sub-nodes, but here it may contain objects of any type. The
rules for associating objects with the symbols of the
grammatical expression are the same as in LRParser.
This method can return an object of any type or EMPTY.
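The callback convention can be illustrated with a hand-driven example. The
sketch is in Python and makes several assumptions: the parser itself is not
modelled, the reductions are issued by hand in bottom-up order, the value
returned by one call is presumed to become an element of params for the
enclosing expression, and the names Calculator, on_production, reduce_with and
the ids NUM and ADD are hypothetical.

```python
NUM, ADD = 1, 2  # hypothetical ids from the expression parameters


class Calculator:
    """Receiver object whose method is called for each expression used."""

    def on_production(self, prod_no, prod_id, params):
        if prod_id == NUM:
            return int(params[0])          # params[0] is the token text
        if prod_id == ADD:
            return params[0] + params[1]   # values returned by earlier calls
        return None


def reduce_with(target, method_name, prod_no, prod_id, params):
    """Look the method up by name on `target` and call it."""
    return getattr(target, method_name)(prod_no, prod_id, params)


# Hand-simulated reductions for the input "1 + 2" under the hypothetical rules
#   sum : num PLUS num {| ADD(1,3) |}      num : NUMBER {| NUM |}
calc = Calculator()
left = reduce_with(calc, "on_production", 1, NUM, ["1"])
right = reduce_with(calc, "on_production", 1, NUM, ["2"])
total = reduce_with(calc, "on_production", 0, ADD, [left, right])
print(total)
```

Passing the method by name together with a receiver object lets one grammar
drive several different handlers.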
Methods:
- LRParserEngine(refer object String)
- LRParserEngine(refer object String, boolean, boolean, boolean)
- LRParserEngine(refer object String, refer object String)
- LRParserEngine(refer object String, refer object String, boolean)
- LRParserEngine(refer object String, refer object String, boolean, boolean)
- LRParserEngine(refer object String, refer object String, boolean, boolean, boolean)
- LRParserEngine(void)
- OnProduction(int, int, refer object Vector)
- Parse(refer object String)
- Parse(refer object String, refer object String)
- Parse(refer object String, refer object String, refer any)
- Parse(refer object String, refer object SymbolTable)
- Parse(refer object String, refer object SymbolTable, refer object String)
- Parse(refer object String, refer object SymbolTable, refer object String, refer any)
- ParseValidate(refer object String)
- ParseValidate(refer object String, refer object String)
- ParseValidate(refer object String, refer object String, refer any)
- ParseValidate(refer object String, refer object SymbolTable)
- ParseValidate(refer object String, refer object SymbolTable, refer object String)
- ParseValidate(refer object String, refer object SymbolTable, refer object String, refer any)
param specName;
Constructs the parser from the scanner specification and the parser
specification that are both named by the specName parameter.
The specifications are loaded from the global SpecSet fields of the LexScanner
and LRParser classes respectively.
Performs partial parser construction in order to work in the "idle" mode using
the "exact" regular expressions;
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param specName, useExactRe, fullConstruct, regConstantes;
Constructs the parser from the scanner specification and the parser
specification that are both named by the specName parameter.
The specifications are loaded from the global SpecSet fields of the LexScanner
and LRParser classes respectively.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
If the regConstants parameter is TRUE, the constants are registered in the
Pluk-system, including the constants defined in the lexical scanner.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
Performs partial parser construction in order to work in the "idle" mode using
the "exact" regular expressions;
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
Performs partial parser construction for work in the "idle" mode.
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe, fullConstruct;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
Constants are registered in the Pluk-system. Moreover, the constants defined in
the lexical scanner are also registered.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
param lexSpec, parSpec, useExactRe, fullConstruct, regConstantes;
Constructs parser according to the scanner specification defined via the lexSpec
parameter and parser specification defined via the parSpec parameter.
The useExactRe parameter determines whether the "exact" regular expressions
defined in the lexical scanner are used (for more information, see the parser
and scanner specification descriptions).
If the fullConstruct parameter is TRUE, the parser is constructed completely;
if it is FALSE, the parser is constructed partially, for work in the "idle"
mode.
If the regConstants parameter is TRUE, the constants are registered in the
Pluk-system, including the constants defined in the lexical scanner.
Parser and scanner specification descriptions are given in the comments to the
classes of these objects.
Creates an object.
The parser itself is constructed by the LRParser::Construct(...) method.
param prodNo , prodID , prodParams;
The parser calls this method by default during syntactic analysis. It receives
as parameters the number of the grammatical expression just used for parsing,
the ID of this expression as given in the specification, and the elements of
the analyzed text that correspond to the grammar elements selected in the
specification for passing to this function.
This function does nothing. It is expected either to be overridden or to be
replaced by a method of another object.
param inpData;
It is equivalent to calling the LRParserEngine::Parse( inpData, "OnProduction", self ) method.
param inpData, methodName;
It is equivalent to calling the LRParserEngine::Parse( inpData, methodName, self ) method.
param inpData, methodName, objectRef;
Performs syntactic analysis of the string specified via the inpData parameter.
During parsing it calls the method named by the methodName parameter on the
object specified via the objectRef parameter, passing it the number, the
identifier and the parameters of each grammatical expression used for parsing.
Grammatical expressions without identifiers do not result in a call to this
method.
If the input data does not match the parser grammar, a Pluk-error is raised.
The method named by the methodName parameter is expected to have the following
form:
method_name( int , int , refer object Vector ) = <|
param prodNo, prodId, params;
...
|>;
where:
prodNo - the number of the grammatical expression used;
prodId - the constant defined in the expression parameters. LRParser
uses it as the node code;
params - the vector of objects that correspond to the symbols of the
grammatical expression. In LRParser it is the vector of leaves
and sub-nodes, but here it may contain objects of any type. The
rules for associating objects with the symbols of the
grammatical expression are the same as in LRParser.
This method can return an object of any type or EMPTY.
param inpData, symTable;
It is equivalent to calling the LRParserEngine::Parse( inpData, symTable, "OnProduction", self ) method;
param inpData, symTable, methodName;
It is equivalent to calling the LRParserEngine::Parse( inpData, symTable, methodName, self ) method;
param inpData, symTable, methodName, objectRef;
Performs syntactic analysis of the string specified via the inpData parameter.
During parsing it calls the method named by the methodName parameter on the
object specified via the objectRef parameter, passing it the number, the
identifier and the parameters of each grammatical expression used for parsing.
Grammatical expressions without identifiers do not result in a call to this
method.
If the input data does not match the parser grammar, a Pluk-error is raised.
The lexical scanner uses the symbol table specified via the symTable parameter
to store strings.
The method named by the methodName parameter is expected to have the following
form:
method_name( int , int , refer object Vector ) = <|
param prodNo, prodId, params;
...
|>;
where:
prodNo - the number of the grammatical expression used;
prodId - the constant defined in the expression parameters. LRParser
uses it as the node code;
params - the vector of objects that correspond to the symbols of the
grammatical expression. In LRParser it is the vector of leaves
and sub-nodes, but here it may contain objects of any type. The
rules for associating objects with the symbols of the
grammatical expression are the same as in LRParser.
This method can return an object of any type or EMPTY.
param inpData;
It is equivalent to calling the LRParserEngine::ParseValidate( inpData, "OnProduction", self ) method.
param inpData, methodName;
It is equivalent to calling the LRParserEngine::ParseValidate( inpData, methodName, self ) method.
param inpData, methodName, objectRef;
Performs syntactic analysis of the string specified via the inpData parameter.
During parsing it calls the method named by the methodName parameter on the
object specified via the objectRef parameter, passing it the number, the
identifier and the parameters of each grammatical expression used for parsing.
Grammatical expressions without identifiers do not result in a call to this
method.
If the input data does not match the parser syntax, EMPTY is returned, unlike
the Parse method, which raises a Pluk-error.
The method named by the methodName parameter is expected to have the following
form:
method_name( int , int , refer object Vector ) = <|
param prodNo, prodId, params;
...
|>;
where:
prodNo - the number of the grammatical expression used;
prodId - the constant defined in the expression parameters. LRParser
uses it as the node code;
params - the vector of objects that correspond to the symbols of the
grammatical expression. In LRParser it is the vector of leaves
and sub-nodes, but here it may contain objects of any type. The
rules for associating objects with the symbols of the
grammatical expression are the same as in LRParser.
This method can return an object of any type or EMPTY.
param inpData, symTable;
It is equivalent to calling the LRParserEngine::ParseValidate( inpData, symTable, "OnProduction", self ) method;
param inpData, symTable, methodName;
It is equivalent to calling the LRParserEngine::ParseValidate( inpData, symTable, methodName, self ) method;
param inpData, symTable, methodName, objectRef;
Performs syntactic analysis of the string specified via the inpData parameter.
During parsing it calls the method named by the methodName parameter on the
object specified via the objectRef parameter, passing it the number, the
identifier and the parameters of each grammatical expression used for parsing.
Grammatical expressions without identifiers do not result in a call to this
method.
If the input data does not match the parser syntax, EMPTY is returned, unlike
the Parse method, which raises a Pluk-error.
The lexical scanner uses the symbol table specified via the symTable parameter
to store strings.
The method named by the methodName parameter is expected to have the following
form:
method_name( int , int , refer object Vector ) = <|
param prodNo, prodId, params;
...
|>;
where:
prodNo - the number of the grammatical expression used;
prodId - the constant defined in the expression parameters. LRParser
uses it as the node code;
params - the vector of objects that correspond to the symbols of the
grammatical expression. In LRParser it is the vector of leaves
and sub-nodes, but here it may contain objects of any type. The
rules for associating objects with the symbols of the
grammatical expression are the same as in LRParser.
This method can return an object of any type or EMPTY.
#module Root.System.Parsers
The LexFilter class extends the lexical scanner with a Filter function for
filtering a character stream (see the Filter(...) method).
Methods:
param strData, [ ( tknFilter , [onFilter = TRUE] | callBackFunc | callBackMethodeName , objRef ), ... ];
Filters the input string specified via the strData parameter and composes the
output string in the following manner:
- the input string is converted to a token sequence according to the
specification;
- the token sequence is passed through the filter, which can modify token
strings by means of the callback function (method) specified in the
method's parameters. If the value returned by this function is not EMPTY,
it is accepted as the new value of the token string; otherwise the token
is discarded from the sequence;
- the token sequence is converted from the strings of the corresponding
tokens back to a character sequence.
To decrease the number of callback function (method) calls, it is possible to
set the list of token identifiers that do or do not trigger the callback
function (method).
The filter applies the following rules:
- if a token does not have an identifier, its string is passed to the
output stream without modification;
- if a token has an identifier and no token filter is given, the token is
passed to the callback function (method), which should return a string
value or EMPTY. The string value is placed into the output stream;
- if a token has an identifier and a token filter is given, the token is
passed to the callback function (method) if it complies with the
specified filter; otherwise it is discarded. If the callback function
(method) returns a string value, this string is placed into the output
stream.
The callback function or method is necessarily to be indicated as the parameter.
The callback method is indicated by setting the string which contains this
method name and the object for which this method should be called.
The callback function should have the following form:
<|
param id, val;
...
return ...;
|>
The callback method should have the following form:
new ClassName::MethodeName( int , refer object String ) =
<|
param id, val;
...
return ...;
|>
where id and val have the same meaning as in the LexToken class.
The following additional parameters are possible, each at most once:
- Token filter. This is a vector of integer numbers (token
identifiers) for which the callback function (method) is or is not
called. A parameter of Boolean type may follow the filter to indicate
how the token filter is interpreted: TRUE means the vector lists the
tokens that are passed, FALSE means it lists the tokens that are not
passed.
If this parameter is omitted, TRUE is assumed.
- Coefficient for the output buffer reservation. This number must be
greater than zero. Multiplied by the length of the strData string, it
gives the initial size reserved for the output buffer.
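The filtering rules can be illustrated with a small Python model. This is only a sketch under stated assumptions: the token identifiers T_NUM and T_WORD, the PATTERNS table and the function names are hypothetical, tokens matched by the last pattern stand for tokens without an identifier, and None plays the role of EMPTY.

```python
import re

# Hypothetical token ids and patterns; id None means "no identifier".
T_NUM, T_WORD = 1, 2
PATTERNS = [(T_NUM, r"[0-9]+"), (T_WORD, r"[a-z]+"), (None, r"\s+|.")]

def tokenize(text):
    """Split text into (id, string) pairs; the first matching pattern wins."""
    pos, out = 0, []
    while pos < len(text):
        for tok_id, pat in PATTERNS:
            m = re.compile(pat).match(text, pos)
            if m:
                out.append((tok_id, m.group()))
                pos = m.end()
                break
    return out

def filter_text(text, callback, tkn_filter=None, on_filter=True):
    """Rebuild text from its tokens, letting callback rewrite or drop them."""
    out = []
    for tok_id, s in tokenize(text):
        if tok_id is None:
            out.append(s)            # no identifier: copied unchanged
            continue
        if tkn_filter is not None and (tok_id in tkn_filter) != on_filter:
            continue                 # does not comply with the filter: discarded
        new = callback(tok_id, s)
        if new is not None:          # None plays the role of EMPTY
            out.append(new)
    return "".join(out)
```

For example, `filter_text("ab 12 cd", lambda i, v: "<" + v + ">", [T_NUM])` keeps the blanks, wraps the number and drops both words, because the words do not comply with the filter.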
param lexSpec;
See LexScanner::LexScanner( refer object String )
param lexSpec, fullConstruct;
See LexScanner::LexScanner( refer object String, boolean )
param lexSpec, fullConstruct, regConstants;
See LexScanner::LexScanner( refer object String, boolean, boolean )
See LexScanner::LexScanner( void )
#module Root.System.Parsers
The LexScanner class implements an "idle" (lazy) lexical scanner which serves as
a source of tokens for the parser classes.
The scanner is constructed from a specification, which in turn is written in
the language of regular expressions and passed to the Construct method or the
constructor.
For instance:
... = instance LexScanner(
<specification>,
...
);
<specification> :=
"
<constant[s]-definition>
...
%%
<denominate-regular-expression> | <token-description>
...
%%
";
<constant[s]-definition-block>:=
#define <constant-name> <integer-constant>
or
enum {
<constant-name> [ = <integer-constant>] ,
<constant-name> [ = <integer-constant>] ,
...
} // as in C but without enumeration type;
<integer-constant> := decimal or hexadecimal C-constant
<denominate-regular-expression>:=
#define <regular-expression-name> <regular-expression>
<token-description>:=
<regular-expression> '{|' [ <constant-name>['<'[<modifiers>]'>' ]] '|}'
<regular-expression>:=
<regular-expression> <regular-expression>
| <regular-expression> '|' <regular-expression>
| <regular-expression>'+'
| <regular-expression>'*'
| <regular-expression>'?'
| '(' <regular-expression> ')'
| '{'<regular-expression-name>'}'
| <non-special-character> [ '{' number [, number] '}' ]
| <string>
| '.' [ '{' number [, number] '}' ]
| <characters-class> [ '{' number [, number] '}' ]
<non-special-character>:= any character except |+*,()'. and carriage return
| any character (except carriage return) included in
single quotes
| <escape-sequence>
The c{n[, m]} entry that follows a character or class indicates the
number of its repetitions, for instance:
a{3} := aaa
a{3,5} := aaaa?a?
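The two expansions can be checked mechanically. The sketch below uses Python's re module, on the assumption that its {n} and {n,m} qualifiers mean the same as the notation described here; the helper name same_language is hypothetical.

```python
import re
from itertools import product

def same_language(pat_a, pat_b, alphabet="a", max_len=7):
    """Compare two patterns on every string over `alphabet` up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(pat_a, s)) != bool(re.fullmatch(pat_b, s)):
                return False
    return True

# a{3} accepts exactly 'aaa'; a{3,5} accepts three, four or five 'a's.
exact_three = same_language("a{3}", "aaa")
three_to_five = same_language("a{3,5}", "aaaa?a?")
```

The comparison is exhaustive only up to max_len characters, which is enough to separate these patterns but is not a general proof of equivalence.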
<string>:= a sequence of any characters (except carriage return) enclosed in
single quotes
<characters-class>:
* the definition of a characters class begins with the '[' character and
ends with the ']' character;
* all characters that belong to the characters class are enumerated between
the [] characters;
* as the scanner specification is read line by line, all characters except
the carriage return are valid. The carriage return can be entered via an
<escape-sequence>;
* in addition, the ~ and - characters can have a special meaning within
the characters class description;
* if the ~ character is not the first one, then all the characters
enumerated within [] belong to the characters class;
* if the ~ character occupies the first position after the '[' character,
then all the characters except those enumerated within [] belong to the
characters class;
* [~] means that the characters class consists of the single ~ element;
* to describe large character groups you can define a range of
characters, for instance, 0-9;
* the '-' character at the first position, at the second position after
the ~ character, or at the last position denotes just the '-'
character;
* the '.' within regular expressions is treated as a special character and
denotes the "any character except the carriage return" characters class
(i.e. [~\n]);
* examples:
the regular expression [-+][0-9]+ describes decimal integers with a sign;
the regular expression [_a-zA-Z][_0-9a-zA-Z]* describes C language
identifiers.
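The class rules above differ from Python's re syntax mainly in the negation character. As a rough illustration, the following sketch translates a class written in the notation of this description into a Python pattern. The function name class_to_python is hypothetical, and the sketch assumes the class body contains no ']' character.

```python
import re

def class_to_python(cls):
    """Translate a class written as [..] or [~..] into Python `re` syntax."""
    body = cls[1:-1]                 # strip the enclosing brackets
    if body == "~":
        return re.escape("~")        # [~] is the single character '~'
    negate = body.startswith("~")
    if negate:
        body = body[1:]
    # '^' is ordinary in the source notation but special in a Python class.
    body = body.replace("^", "\\^")
    return "[" + ("^" if negate else "") + body + "]"

# The signed-integer example from the list above.
signed_int = class_to_python("[-+]") + class_to_python("[0-9]") + "+"
```

The '-' conventions (literal at the first position, right after the negation character, or at the last position) coincide with Python's, so ranges and literal hyphens need no rewriting.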
<escape-sequence>:= escape-sequence permitted in C
!!! When the specification is written as a string parameter, Pluk itself
converts the escape-sequences in that string.
Thus a \n escape-sequence is converted to the carriage return (newline)
character, and the scanner specification parser interprets it as the end of
a line, which may cause a syntax error. In such cases \\n should be
written.
Regular expressions
    expression            described sequences
    --------------------  ------------------------------
 1. ab                    only 'ab'.
 2. ab?                   'a' or 'ab'.
 3. ab*                   'a', 'ab', 'abb', 'abbb', ...
 4. ab+                   'ab', 'abb', 'abbb', ...
 5. a|b                   'a' or 'b'.
 6. (ab)                  only 'ab'.
 7. (ab)+                 'ab', 'abab', 'ababab', ...
 8. (a|b)+                'a', 'b', 'aa', 'bb', 'ab', ...: any string made of
                          'a' and 'b'.
 9. [ab]+                 the same as the example 8.
10. [0-9]+                any unsigned decimal number.
11. (-|'+')[0-9]+         any signed decimal number.
12. [-+][0-9]+            the same as the example 11.
13. {let}({let}|{dec})*   any identifiers permitted in C, where let and dec
                          are defined in the following manner:
                            #define let [_a-zA-Z]
                            #define dec [0-9]
14. \\x{h}{h}?            hexadecimal escape-sequences permitted in C, where
                          h is defined in the following manner:
                            #define h [0-9a-fA-F]
                          for instance \xa or \x40
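Examples 13 and 14 rely on named regular expressions. The following Python sketch shows one way such {name} references could be expanded before matching. It is an illustration only: the DEFINES table and the expand function are hypothetical, and the patterns are interpreted by Python's re module, where a raw-string \\ stands for one literal backslash.

```python
import re

# Named expressions, mirroring the #define lines of examples 13 and 14.
DEFINES = {"let": "[_a-zA-Z]", "dec": "[0-9]", "h": "[0-9a-fA-F]"}

def expand(pattern, defines=DEFINES):
    """Replace each {name} with its definition; leave {n} and {n,m} alone."""
    def repl(m):
        name = m.group(1)
        if name in defines:
            return "(?:" + defines[name] + ")"
        return m.group(0)            # unknown name: keep the text as written
    return re.sub(r"\{([A-Za-z_][A-Za-z_0-9]*)\}", repl, pattern)

identifier = expand("{let}({let}|{dec})*")
hex_escape = expand(r"\\x{h}{h}?")
```

Wrapping each definition in a non-capturing group keeps a following operator such as ? or * attached to the whole named expression rather than to its last character.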
Token description
On calling the GetToken method the scanner returns an object of the LexToken
class. Its fields are populated in the following manner:
.id == <constant-name>;
.val == depends on the modifiers (see below);
!!! If the {| |} block in the token description is empty, the strings which
match the regular expression of this token are skipped. For instance, this can
be used to describe blanks or one-line comments.
An input sequence can be described by several regular expressions from the
specification. The scanner selects the very first token description which
contains such an expression.
A regular expression in a token description may match a multitude of strings.
But if the expression contains no operators (except concatenation) and no
characters classes, it matches exactly one string. Let us call such an
expression an "exact" regular expression.
For instance:
if {| OP_IF |}
The scanner does not include the recognition of such expressions in its
transition table and places them into a special table of "exact" regular
expressions. In order to generate a token with .id == OP_IF for the "if" input,
the scanner must define an expression whose multitude of strings includes the
"if" string. Besides, the exact modifier should be specified in the token
description of that expression. This modifier forces a check against the table
of "exact" regular expressions. If these conditions are not met, a token with
.id == OP_IF will never be returned.
Modifiers:
There are three groups of modifiers:
* tolower/toupper;
* exact ( + aslower/asupper );
* lexem/id/empty;
At most one modifier from each group can be listed in a token description.
Suppose a sequence of characters described by one or several regular
expressions of the specification is input to the scanner. The scanner selects
the very first token description in the specification which contains such an
expression and checks the list of its modifiers.
* First, if this list contains the tolower or toupper word, the scanner
converts all characters of the input string to lower or upper case
accordingly.
* Then, if this list contains the exact word, the scanner checks whether the
table of "exact" regular expressions contains the input string. If such a
string exists, the token description which contains this "exact" regular
expression is selected. The aslower or asupper key word can be used
together with the exact word to force the scanner to convert the input
string to lower or upper case accordingly before checking the table of
"exact" regular expressions. In this case the string passed together with
the token is not modified.
* Finally, the lexeme/id/empty modifiers are looked up in the description
of the token (the initial one or the one selected after checking the table
of "exact" regular expressions). These modifiers determine what is
returned in the token .val field:
* lexeme - the acquired string is returned (String type object);
* id - an integer number, which is the index of the acquired string in
the character table connected to the scanner;
* empty - EMPTY is returned. This modifier is the default;
If a token description has no list of modifiers, the token is returned with
.val == EMPTY and the table of "exact" regular expressions is not searched.
!!! If the description of any token contains the id modifier and no
character table is connected to the scanner, an error is raised when this
token is detected.
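The order in which the modifier groups are applied can be modelled in a few lines of Python. This is a sketch of the described behaviour, not the scanner's implementation: the identifiers ID_NAME, OP_IF and OP_WHILE, the EXACT table layout and the make_token function are hypothetical, the character table is modelled as a dict from string to index, and None plays the role of EMPTY.

```python
# Hypothetical token ids; a real specification defines them as constants.
ID_NAME, OP_IF, OP_WHILE = 10, 20, 21

# Table of "exact" regular expressions: string -> (token id, its modifiers).
EXACT = {"if": (OP_IF, set()), "while": (OP_WHILE, {"lexeme"})}

def make_token(text, tok_id, mods, symtab=None):
    """Return (id, val) for a matched string and the modifiers of its rule."""
    # Group 1: case conversion changes the string itself.
    if "tolower" in mods:
        text = text.lower()
    elif "toupper" in mods:
        text = text.upper()
    # Group 2: look the string up in the table of exact expressions.
    if "exact" in mods:
        key = text
        if "aslower" in mods:
            key = text.lower()       # only the lookup key is converted
        elif "asupper" in mods:
            key = text.upper()
        if key in EXACT:
            tok_id, mods = EXACT[key]
    # Group 3: decide what goes into the .val field.
    if "lexeme" in mods:
        return tok_id, text
    if "id" in mods:
        if symtab is None:
            raise RuntimeError("id modifier used without a character table")
        return tok_id, symtab.setdefault(text, len(symtab))
    return tok_id, None              # empty is the default
```

Note that after a successful lookup the modifiers of the selected exact description decide the .val field, which is why "if" yields no string here while "while" does.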
Methods:
- Check(refer object String)
- Check(refer object String, boolean)
- CheckPrefix(refer object String)
- CheckPrefix(refer object String, boolean)
- ClearCallback(void)
- Construct(refer object String)
- Construct(refer object String, boolean)
- Construct(refer object String, boolean, boolean)
- ContinuePrefix(refer object String)
- GetExactReTable(void)
- GetLine(void)
- GetPos(void)
- GetSource(void)
- GetToken(refer object LexToken)
- GetToken(void)
- IsValid(void)
- LexScanner(refer object String)
- LexScanner(refer object String, boolean)
- LexScanner(refer object String, boolean, boolean)
- LexScanner(void)
- LoadBuffer(refer object String)
- LoadBuffer(refer object String, refer object SymbolTable)
- PopState(void)
- PushState(void)
- SetCallback(refer object Vector, boolean, refer object String, refer ...)
- SetPos(int)
- Size(void)
- Tokenize(refer object String, refer ...)
- Trace(refer object String, ...)
- UngetToken(void)
- _Construct(refer object String, boolean, boolean)
- _Construct(refer object String, boolean, boolean, refer object String, int)
- _Init(void)
- ~LexScanner(void)
param strData;
Checks whether the string complies with the scanner specification. Returns a
Boolean value.
param strData, oneToken;
Checks whether the string complies with the scanner specification.
If oneToken == TRUE, the whole string is checked as a single token.
Returns a Boolean value.
param strData;
Checks whether the string can be the beginning of an expression which complies
with the scanner specification.
Returns a Boolean value.
param strData, oneToken;
Checks whether the string can be the beginning of an expression which complies
with the scanner specification.
If oneToken == TRUE, the whole string is checked as a single token.
Returns a Boolean value.
param lexSpec;
Performs initial construction of the lexical scanner according to the
specification given via the lexSpec parameter. The scanner will work in the
"idle" mode, that is, further construction is performed during scanning.
Registers constants in the Pluk-system.
The specification format is described in the comments to the LexScanner class.
param lexSpec, fullConstruct;
Performs complete ( fullConstruct == TRUE ) or initial ( fullConstruct == FALSE )
construction of the lexical scanner according to the specification given
via the lexSpec parameter. In the latter case the scanner will work in the
"idle" mode, that is, further construction is performed during scanning.
Registers constants in the Pluk-system.
The specification format is described in the comments to the LexScanner class.
param lexSpec, fullConstruct, regConstants;
Performs complete ( fullConstruct == TRUE ) or initial ( fullConstruct == FALSE )
construction of the lexical scanner according to the specification given
via the lexSpec parameter. In the latter case the scanner will work in the
"idle" mode, that is, further construction is performed during scanning.
Registers constants in the Pluk-system if regConstants == TRUE.
The specification format is described in the comments to the LexScanner class.
param strData;
Checks whether the string can be the beginning of an expression which complies
with the scanner specification and tries to continue it if there are no other
alternatives.
Returns the continued string if continuation is possible; otherwise returns
the initial string.
Returns EMPTY if CheckPrefix(strData) == FALSE.
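The three outcomes described above can be illustrated on a simplified case. The sketch below assumes the specification is just a finite list of words (a real scanner works with regular expressions); the function name continue_prefix and the word list are hypothetical, and None plays the role of EMPTY.

```python
def continue_prefix(prefix, words):
    """Extend prefix while exactly one continuation exists among words."""
    candidates = [w for w in words if w.startswith(prefix)]
    if not candidates:
        return None                  # models EMPTY: not a valid prefix
    result = prefix
    while True:
        # Characters that may follow `result` in some candidate word.
        following = {w[len(result)] for w in candidates if len(w) > len(result)}
        complete = result in candidates
        # Stop on a fork, at the end of a word, or when stopping here
        # is itself a valid alternative.
        if len(following) != 1 or complete:
            return result
        result += following.pop()

WORDS = ["while", "when", "if", "ifdef", "return"]
```

With this word list, "r" is continued to "return", while "w" is continued only to "wh", because "while" and "when" diverge after that point.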
Returns the current line number in the scanned buffer.
Returns the current position number in the scanned buffer.
Serialization support.
param token;
Reads the next token from the scanner buffer. Populates the token
object. Sets the token to EMPTY if the end of the buffer is detected.
Reads the next token from the scanner buffer. Returns an object of the
LexToken type. Returns EMPTY if the end of the buffer is detected.
Returns TRUE/FALSE. Indicates whether the scanner object is already constructed,
that is, whether a specification is loaded in it.
param lexSpec;
Creates an object and performs initial construction of the lexical scanner
according to the specification given via the lexSpec parameter. The scanner
will work in the "idle" mode, that is, further construction is performed
during scanning. Registers constants in the Pluk-system.
The specification format is described in the comments to the LexScanner class.
param lexSpec, fullConstruct;
Creates an object and performs complete ( fullConstruct == TRUE ) or initial
( fullConstruct == FALSE ) construction of the lexical scanner according to the
specification given via the lexSpec parameter. In the latter case the scanner
will work in the "idle" mode, that is, further construction is performed
during scanning. Registers constants in the Pluk-system.
The specification format is described in the comments to the LexScanner class.
param lexSpec, fullConstruct, regConstants;
Creates an object and performs complete ( fullConstruct == TRUE ) or initial
( fullConstruct == FALSE ) construction of the lexical scanner according to the
specification given via the lexSpec parameter. In the latter case the scanner
will work in the "idle" mode, that is, further construction is performed
during scanning. Registers constants in the Pluk-system if regConstants == TRUE.
The specification format is described in the comments to the LexScanner class.
Creates an empty lexical scanner object. Construction is performed by
the Construct method.
param strData;
Loads the initial string into the buffer for scanning. This method is used if
the scanner specification contains no instruction to use the character table.
Otherwise the LoadBuffer( refer object String, refer object SymbolTable ) method
is used.
param strData, symTable;
Loads the initial string into the buffer for scanning and connects the
character table. This method is used if the scanner specification contains an
instruction to use the character table. Otherwise the LoadBuffer( refer object
String ) method is used.
Restores the scanner's state which was saved by the PushState() method.
Saves the scanner's state. The state is restored by the PopState() method.
The saved information is lost after the buffer is reloaded.
Sets the current position in the input buffer. Further scanning is performed
from this position.
Returns the current size (in rows) of the scanner transition table. If the
scanner is in the "idle" mode, this value may increase.
param strData, [ ( tknFilter , [onFilter = TRUE] | callBackFunc | callBackMethodeName , objRef ), ... ];
When called with only the strData parameter, returns a vector of LexToken
objects which correspond to the tokens found in the string specified via the
strData parameter.
The following three types of additional parameters are possible, each at most
once:
- Token filter. This is a vector of integer numbers (token
identifiers) which are or are not passed to the resultant vector. A
parameter of Boolean type may follow the filter to indicate how the
token filter is interpreted: TRUE means the vector lists the tokens
that are passed, FALSE means it lists the tokens that are not passed.
If this parameter is omitted, TRUE is assumed.
- Callback function which is called for each token and, if
necessary, an object passed as the context to the callback
function. If the value returned by this function is not EMPTY, it
is placed into the resultant vector.
- Name of the callback method which is called for each token and, if
necessary, the object for which this method is called. If the
value returned by this method is not EMPTY, it is placed into
the resultant vector.
A filter, a callback function (method), or a filter in combination with a
callback function (method) can be specified. In the latter case the filtered
tokens are passed to the callback function (method).
If the context object is present, the callback function should have the
following form:
<|
param id, val, contextObj;
...
return ...;
|>
If the context object is absent, the callback function should have the
following form:
<|
param id, val;
...
return ...;
|>
The callback method should have the following form:
new ClassName::MethodeName( int, refer object String ) =
<|
param id, val;
...
return ...;
|>
where id and val have the same meaning as in the LexToken class.
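How the filter, the callback and the optional context object combine can be sketched in Python. The model below is an assumption-laden illustration: it starts from an already scanned list of (id, val) pairs instead of a string, the function and parameter names are hypothetical, and None plays the role of EMPTY.

```python
def tokenize(tokens, tkn_filter=None, on_filter=True, callback=None, context=None):
    """`tokens` is a list of (id, val) pairs standing in for scanner output."""
    result = []
    for tok_id, val in tokens:
        # The filter decides which tokens go any further.
        if tkn_filter is not None and (tok_id in tkn_filter) != on_filter:
            continue
        if callback is None:
            result.append((tok_id, val))
            continue
        # The context object, when given, becomes the third argument.
        if context is None:
            item = callback(tok_id, val)
        else:
            item = callback(tok_id, val, context)
        if item is not None:         # None plays the role of EMPTY
            result.append(item)
    return result

# Hypothetical token ids and a hand-made token sequence.
T_NUM, T_WORD = 1, 2
SAMPLE = [(T_WORD, "ab"), (T_NUM, "12"), (T_WORD, "cd")]
```

For instance, `tokenize(SAMPLE, [T_NUM], True, lambda i, v, scale: int(v) * scale, 10)` filters first and then hands only the number to the callback together with the context value.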
param buf[, symTable];
Scans the input buffer and outputs information about the tokens found.
Rolls the current position pointer back to the beginning of the last
selected token in the scanner buffer.
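The interplay of reading, rolling back and saving the state can be shown with a toy model in Python. Everything here is an assumption made for illustration: MiniScanner is hypothetical, its tokens are simply whitespace-separated words, saved states are kept on a stack (the description does not say how many states can be saved), and None plays the role of EMPTY.

```python
import re

class MiniScanner:
    """Toy stand-in for a scanner: whitespace-separated words are the tokens."""

    def __init__(self):
        self.buf, self.pos, self.last, self.saved = "", 0, 0, []

    def load_buffer(self, text):
        self.buf, self.pos, self.last = text, 0, 0
        self.saved = []              # saved states are lost on reload

    def get_token(self):
        m = re.compile(r"\s*(\S+)").match(self.buf, self.pos)
        if not m:
            return None              # models EMPTY at the end of the buffer
        self.last = m.start(1)       # where the returned token begins
        self.pos = m.end()
        return m.group(1)

    def unget_token(self):
        self.pos = self.last         # next read returns the same token again

    def push_state(self):
        self.saved.append((self.pos, self.last))

    def pop_state(self):
        self.pos, self.last = self.saved.pop()
```

A typical use is speculative reading: save the state, read a few tokens to decide what follows, then restore the state and scan the same tokens again.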
The lexical scanner specification set.
Methods:
param name, spec, comment;
Adds a scanner specification to the specification set.
name - name,
spec - specification,
comment - comment.
param name, spec, comment, file, offset;
Adds a scanner specification to the specification set.
name - name,
spec - specification,
comment - comment,
file - file name,
offset - position at which the specification begins in the file given via
the file parameter.
param name;
Returns a reference to the comment of the specification identified by
the name parameter in the set of scanner specifications.
param name;
Returns a reference to the record of the specification identified by
the name parameter in the set of scanner specifications.
param name;
Deletes the specification identified by the name parameter from the
set of scanner specifications.
param name;
Returns a reference to the text of the specification identified by the
name parameter in the set of scanner specifications.
#module Root.System.Parsers.Utils
An object of the LexToken class is returned by the scanner and describes a token
of the scanned buffer. The .id field contains the token identifier. The .val
field contains the token string itself, the integer identifier of the token
string in the character table, or remains EMPTY if the .id field is sufficient
for token identification.
Parser specification set.
#module Root.System.Parsers.Utils
The character table is used to store strings which are obtained during the
scanning process. This table can be connected to the scanner or parser before
scanning or syntactic analysis accordingly. An integer index is assigned to each
string in the table. This index can be obtained when placing the string into the
table and by the search method. If the string being placed into the table
already exists, the same index is returned.
Methods:
Clears the character table.
param str;
Inserts a string into the character table. Returns the index of this string. If
an identical string already exists, the new one is not inserted and the index
of the existing one is returned.
param id;
Looks up a string in the character table by its index. Returns this
string if found.
Otherwise returns EMPTY.
param str;
Looks up a string in the character table. Returns its index if found.
Otherwise returns EMPTY.
Returns the number of elements within the character table.
Creates an object of the SymbolTable class.
See the comment to the class.
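The behaviour of the character table can be modelled with a list and a dictionary. The Python class below is a sketch, not the Pluk class: the method names are lower-case counterparts of Insert, Lookup, Size and Clear, indices starting at zero are an assumption, and None plays the role of EMPTY.

```python
class SymbolTable:
    """Model of a table that assigns a stable integer index to each string."""

    def __init__(self):
        self._strings = []           # index -> string
        self._index = {}             # string -> index

    def insert(self, s):
        if s in self._index:
            return self._index[s]    # identical string: reuse its index
        self._index[s] = len(self._strings)
        self._strings.append(s)
        return self._index[s]

    def lookup(self, key):
        """Look up by index (int) or by string; None if nothing is found."""
        if isinstance(key, int):
            if 0 <= key < len(self._strings):
                return self._strings[key]
            return None
        return self._index.get(key)

    def size(self):
        return len(self._strings)

    def clear(self):
        self._strings, self._index = [], {}
```

Keeping both directions of the mapping makes each lookup a constant-time operation, at the cost of storing every string twice by reference.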
#module Root.System.Parsers.Utils
The TreeNode class describes a node in a semantic tree. A parser of the
LRParser type returns the root node of such a tree. The .code field contains the
node code, that is, the identifier specified in the parser specification. The
.child field contains the vector of sub-nodes (of the TreeNode class) and leaves
(of the LexToken class).
Methods:
param parObject;
Prints the tree to a string using constant names from the specified parser object.
For the names to be shown correctly, the constants should have unique values
across the scanner and parser specifications.
Prints the tree to a string without constant names.
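The tree structure and the two printing variants can be sketched in Python. The classes below only mirror the fields named in this description (.code, .child, .id, .val); the print_tree function, the indentation-based output format and the names mapping are hypothetical and do not reproduce the actual PrintTree output.

```python
from dataclasses import dataclass, field

@dataclass
class LexToken:
    id: int
    val: object = None

@dataclass
class TreeNode:
    code: int
    child: list = field(default_factory=list)

def print_tree(node, names=None, depth=0):
    """Render a tree as indented lines; names maps constant values to names."""
    names = names or {}
    pad = "  " * depth
    if isinstance(node, LexToken):
        return f"{pad}{names.get(node.id, node.id)}: {node.val!r}\n"
    text = f"{pad}{names.get(node.code, node.code)}\n"
    for c in node.child:
        text += print_tree(c, names, depth + 1)
    return text

tree = TreeNode(100, [LexToken(1, "2"), TreeNode(101, [LexToken(1, "3")])])
```

Because node codes and token identifiers are looked up in one mapping, a value shared by a scanner constant and a parser constant would be printed under a single name, which is why unique values are needed.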
- AddSpec, method of class LexSpecSet
- Check, method of class LexScanner
- CheckPrefix, method of class LexScanner
- Clear, method of class SymbolTable
- Comment, method of class LexSpecSet
- Construct, method of class LexScanner
- Construct, method of class LRParser
- ContinuePrefix, method of class LexScanner
- DFA_DATA, class
- Filter, method of class LexFilter
- GetLine, method of class LexScanner
- GetPos, method of class LexScanner
- GetSource, method of class LexScanner
- GetStackData, method of class LRParser
- GetStackDataPos, method of class LRParser
- GetStackDepth, method of class LRParser
- GetToken, method of class LexScanner
- Insert, method of class SymbolTable
- IsValid, method of class LexScanner
- IsValid, method of class LRParser
- LexFilter, class
- LexFilter, method of class LexFilter
- LexScanner, class
- LexScanner, method of class LexScanner
- LexSpecSet, class
- LexToken, class
- LoadBuffer, method of class LexScanner
- Lookup, method of class SymbolTable
- LRParser, class
- LRParser, method of class LRParser
- LRParserEngine, class
- LRParserEngine, method of class LRParserEngine
- NumberOfProductions, method of class LRParser
- OnProduction, method of class LRParserEngine
- Parse, method of class LRParser
- Parse, method of class LRParserEngine
- ParseValidate, method of class LRParser
- ParseValidate, method of class LRParserEngine
- ParSpecSet, class
- PopState, method of class LexScanner
- PrintTree, method of class TreeNode
- PushState, method of class LexScanner
- ReBuild, method of class LRParser
- Record, method of class LexSpecSet
- RemoveSpec, method of class LexSpecSet
- SetPos, method of class LexScanner
- Size, method of class LexScanner
- Size, method of class LRParser
- Size, method of class SymbolTable
- Spec, method of class LexSpecSet
- SymbolTable, class
- SymbolTable, method of class SymbolTable
- Tokenize, method of class LexScanner
- Trace, method of class LexScanner
- TraceOn, method of class LRParser
- TreeNode, class
- TryClose, method of class LRParser
- UngetToken, method of class LexScanner