Plex Reference

Class Lexicon

Constructor
Token definitions
State definitions

Patterns

Str
Any, AnyBut, AnyChar
Empty
+ operator
| operator
Seq, Alt, Opt
Rep, Rep1
NoCase, Case
Bol, Eol, Eof

Actions

function
IGNORE
TEXT
Begin
returning a value

Scanner States

Class Scanner

Constructor
Methods

read
position
begin
produce
eof

Module Plex.Traditional

re

Class `Lexicon`

A Lexicon instance embodies a collection of lexical token definitions for use by a Scanner. Once constructed, a single Lexicon can be used by many Scanners.

Constructor

Lexicon(specification) builds a lexical analyser from the given specification. The specification consists of a list of specification items. Each specification item may be one of:

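A token definition, which is a tuple:

(pattern, action)

where pattern is built from the pattern constructors described under Patterns, and action is one of the actions described under Actions.

A state definition:

State(name, tokens)

where name is the name of the state and tokens is a list of token definitions as above. The meaning and usage of states is described under States.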

Patterns

Plex patterns are built using the following constructors.

Str(s)
Matches the literal string s.
Str(s1,s2, ...)
Matches either the string s1 or s2 or ...
Equivalent to Alt(Str(s1),Str(s2),...).
Any(s)
Matches any single character in the string s.
AnyBut(s)
Matches any single character (including newline) which is not in the string s.
AnyChar
Matches any single character (including newline). Equivalent to AnyBut('').
Empty
Matches the empty string.
p1 + p2
Matches the pattern p1 followed by p2. Equivalent to Seq(p1, p2).
p1 | p2
Matches either the pattern p1 or p2. Equivalent to Alt(p1, p2).
Seq(p1, p2, ...)
Matches the pattern p1 followed by p2 followed by ...
Alt(p1, p2, ...)
Matches either the pattern p1 or p2 or ...
Opt(p)
Matches either the pattern p or the empty string. Equivalent to p | Empty.
Rep(p)
Matches zero or more repetitions of the pattern p.
Rep1(p)
Matches one or more repetitions of the pattern p.
NoCase(p)
Matches the same strings as the pattern p, except that, in any part of p not enclosed by a Case(), upper and lower case letters are treated as equivalent.
Case(p)
Matches the same strings as the pattern p, except that, in any part of p not enclosed by a NoCase(), upper and lower case letters are treated as distinct.
Bol
Matches an imaginary character at the beginning of a line (i.e. at the start of the file or just after a newline).
Eol
Matches an imaginary character at the end of a line (i.e. just before a newline or at the end of the file).
Eof
Matches an imaginary character at the end of the file.
Note: The patterns Bol, Eol and Eof will only match once at any given position.
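
As a rough way to read the constructors above, the sketch below maps a few of them onto source strings for Python's standard `re` module. It does not use Plex: the lowercase helpers (`lit`, `any_of`, `any_but`, `seq`, `alt`, `opt`, `rep`, `rep1`) are hypothetical stand-ins written for this illustration, and only whole-string matching is compared.

```python
import re

# Hypothetical stand-ins that build `re` source strings. They are not Plex
# objects; they only mirror the matching rules described above.
def lit(*strings):
    """Like Str(s1, s2, ...): any one of the literal strings."""
    return "(?:" + "|".join(re.escape(s) for s in strings) + ")"

def any_of(chars):
    """Like Any(s): one character taken from a non-empty string."""
    return "[" + "".join(re.escape(c) for c in chars) + "]"

def any_but(chars):
    """Like AnyBut(s): one character (newline included) not in a non-empty string."""
    return "[^" + "".join(re.escape(c) for c in chars) + "]"

def seq(*patterns):
    """Like Seq(p1, p2, ...): each pattern followed by the next."""
    return "(?:" + "".join(patterns) + ")"

def alt(*patterns):
    """Like Alt(p1, p2, ...): any one of the patterns."""
    return "(?:" + "|".join(patterns) + ")"

def opt(pattern):
    """Like Opt(p): the pattern or the empty string."""
    return "(?:" + pattern + ")?"

def rep(pattern):
    """Like Rep(p): zero or more repetitions."""
    return "(?:" + pattern + ")*"

def rep1(pattern):
    """Like Rep1(p): one or more repetitions."""
    return "(?:" + pattern + ")+"

letter = any_of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_")
digit = any_of("0123456789")
identifier = seq(letter, rep(alt(letter, digit)))
number = seq(opt(lit("+", "-")), rep1(digit))
string_body = rep(any_but('"'))

def matches(pattern, text):
    """True when the whole of text is matched by the pattern."""
    return re.fullmatch(pattern, text) is not None
```

With these definitions, `matches(identifier, "x_1")` holds while `matches(identifier, "1x")` does not, and `string_body` accepts text containing newlines because a negated set includes them.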

Actions

The action in a token specification may be one of three things:

A function, which is called as follows:

function(scanner, text)
where scanner is the relevant Scanner instance, and text is the matched text. If the function returns anything other than None, that value is returned as the value of the token. If it returns None, scanning continues as if the IGNORE action were specified (see below).

One of the following special actions:

IGNORE

The recognised characters will be treated as white space and ignored. Scanning will continue until the next non-ignored token is recognised before returning.
TEXT

Causes the scanned text itself to be returned as the value of the token.
Begin(state)

Causes the Scanner to enter the state named state (see below).

Any other value, which is returned as the value of the token.
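
The rules above can be summarised as a small dispatch function. This is a minimal stdlib-only model, not Plex's own code: `IGNORE` and `TEXT` here are local placeholder objects, `resolve_action` is a hypothetical name, and `Begin` is left out because it acts on the scanner's state, which this model does not track.

```python
# Local placeholders standing in for the special actions (assumption: only
# their identity matters for this illustration).
IGNORE = object()
TEXT = object()

def resolve_action(action, scanner, text):
    """Return the token value, or None when scanning should simply continue."""
    if action is IGNORE:
        return None                    # treated as white space
    if action is TEXT:
        return text                    # the matched text is the token value
    if callable(action):
        return action(scanner, text)   # a None result behaves like IGNORE
    return action                      # any other value is the token value


def to_int(scanner, text):
    """Example action function: convert the matched digits to an int."""
    return int(text)

def skip(scanner, text):
    """Example action function that returns None, so nothing is produced."""
    return None
```

For instance, `resolve_action(to_int, None, "42")` gives the integer 42, while `resolve_action("keyword", None, "if")` simply hands back the string `"keyword"`.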

States

At any given time, the scanner is in one of a number of states. Associated with each state is a set of possible tokens. When scanning, only tokens associated with the current state are recognised.

There is a default state, whose name is the empty string. Token definitions which are not inside any State definition belong to the default state.

The initial state of the scanner is the default state. The state can be changed by:

Using Begin(state_name) as the action of a token.

Calling the begin(state_name) method of the Scanner.

To change back to the default state, use '' as the state name.
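
To make the state rules concrete, here is a minimal model written with the standard `re` module rather than Plex itself. The table layout, the `scan` function and the `'comment'` state name are assumptions made for the illustration; only the behaviour (default state `''`, tokens restricted to the current state, switching states from a token's action) follows the description above. The first matching entry wins here, which is a simplification.

```python
import re

# Token tables keyed by state name; '' is the default state.
# Each entry is (regex, token value or None to ignore, next state or None to stay).
TABLES = {
    "": [
        (r"\(\*", None, "comment"),        # opening delimiter: enter 'comment'
        (r"[A-Za-z_]\w*", "ident", None),
        (r"\s+", None, None),
    ],
    "comment": [
        (r"\*\)", None, ""),               # closing delimiter: back to default
        (r"[^*]+|\*", None, None),         # comment body, ignored
    ],
}

def scan(text):
    """Yield (value, matched_text) pairs, honouring the current state."""
    state, pos = "", 0
    while pos < len(text):
        # Only the entries of the current state are tried.
        for regex, value, next_state in TABLES[state]:
            m = re.compile(regex).match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"no token matches at offset {pos} in state {state!r}")
        pos = m.end()
        if next_state is not None:
            state = next_state
        if value is not None:
            yield (value, m.group())
```

Scanning `"foo (* bar *) baz"` with this model yields only the two identifiers outside the comment, because the identifier rule is not part of the `'comment'` state.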

Class `Scanner`

A Scanner instance associates a Lexicon with a stream of characters and provides a means of reading tokens from the stream.

Constructor

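Scanner(lexicon, stream[, name = ''])

lexicon
A Lexicon instance.
stream
The source of characters; it must provide a read() method.
name
Optional; returned as the first element of the position() tuple.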

Methods

read() --> (value, text)
Reads the next lexical token from the stream and returns a tuple (value, text), where value is the value associated with the token as specified by the Lexicon, and text is the actual string read from the stream. Returns (None, '') on end of file.
position() --> (name, line, col)
Returns a tuple (name,line,col) representing the location of the last token read using the read() method. name is the name that was provided to the Scanner constructor; line is the line number in the stream (1-based); col is the position within the line of the first character of the token (0-based).
begin(state_name)
Sets the current state of the Scanner to the state named state_name.
produce(value [, text])
Called from an action procedure, causes value to be returned as the token value from the current call to read(). If text is supplied, it is returned in place of the scanned text.
produce() can be called more than once during a single call to an action procedure. In this case, scanning is suspended and tokens are queued and returned one at a time by subsequent calls to read(). When the queue is empty, scanning resumes.
eof()
This method can be overridden to perform an action when the end of the input stream is encountered. The default implementation does nothing.
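
The queueing behaviour described for produce() can be modelled in a few lines. The sketch is stdlib-only and does not subclass the real Scanner: the `QueueingScanner` class, its word-splitting "scan" step and the `explode` action are hypothetical, chosen only to show several values being produced by one action call and then drained by successive read() calls.

```python
from collections import deque

class QueueingScanner:
    """Toy model: each whitespace-separated word is one scanned text."""

    def __init__(self, text, action):
        self._words = deque(text.split())
        self._action = action          # called as action(scanner, text)
        self._queue = deque()          # tokens waiting to be returned
        self._current = ""

    def produce(self, value, text=None):
        # If text is omitted, the scanned text is used instead.
        self._queue.append((value, self._current if text is None else text))

    def read(self):
        # Scanning resumes only when the queue is empty.
        while not self._queue:
            if not self._words:
                return (None, "")      # end of input
            self._current = self._words.popleft()
            value = self._action(self, self._current)
            if value is not None:
                self._queue.append((value, self._current))
        return self._queue.popleft()


def explode(scanner, text):
    """Hypothetical action: one token per character of multi-character words."""
    if len(text) > 1:
        for ch in text:
            scanner.produce("char", ch)
        return None
    return "single"
```

Reading from `QueueingScanner("ab c", explode)` returns the two queued `"char"` tokens first, then scans the next word, and finally reports `(None, '')` at end of input.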

Module `Plex.Traditional`

The Traditional submodule provides support for writing regular expressions using a more traditional character-string syntax.

re( )

.
matches any single character, except a newline.
^
matches the beginning of a line.
$
matches the end of a line.
\ * + ? | ( )
[set]
matches any character in set.
[^set]
matches any character not in set.
