org.apache.regexp

Class RECompiler

public class RECompiler extends Object

A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.

Version: $Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $

Author: Jonathan Locke, Michael McCallum

See Also: RE

Nested Class Summary
class RECompiler.RERange
Local, nested class for maintaining character ranges for character classes.
Field Summary
int bracketMin
int bracketOpt
static int bracketUnbounded
static int ESC_BACKREF
static int ESC_CLASS
static int ESC_COMPLEX
static int ESC_MASK
static Hashtable hashPOSIX
int idx
char[] instruction
int len
int lenInstruction
static int NODE_NORMAL
static int NODE_NULLABLE
static int NODE_TOPLEVEL
int parens
String pattern
Constructor Summary
RECompiler()
Constructor.
Method Summary
int atom()
Absorb an atomic character string.
void bracket()
Match a bracket {m,n} expression and put the results in the bracket member variables.
int branch(int[] flags)
Compile the body of one branch of an or (|) operator (implements concatenation).
int characterClass()
Compile a character class.
int closure(int[] flags)
Compile a possibly closured terminal.
REProgram compile(String pattern)
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
void emit(char c)
Emit a single character into the program stream.
void ensure(int n)
Ensures that n more characters can fit in the program buffer.
int escape()
Match an escape sequence.
int expr(int[] flags)
Compile an expression with possible parens around it.
void internalError()
Throws a new internal error exception.
int node(char opcode, int opdata)
Adds a new node.
void nodeInsert(char opcode, int opdata, int insertAt)
Inserts a node with a given opcode and opdata at insertAt.
void setNextOfEnd(int node, int pointTo)
Appends a node to the end of a node chain.
void syntaxError(String s)
Throws a new syntax error exception.
int terminal(int[] flags)
Match a terminal node.

Field Detail

bracketMin

int bracketMin

bracketOpt

int bracketOpt

bracketUnbounded

static final int bracketUnbounded

ESC_BACKREF

static final int ESC_BACKREF

ESC_CLASS

static final int ESC_CLASS

ESC_COMPLEX

static final int ESC_COMPLEX

ESC_MASK

static final int ESC_MASK

hashPOSIX

static final Hashtable hashPOSIX

idx

int idx

instruction

char[] instruction

len

int len

lenInstruction

int lenInstruction

NODE_NORMAL

static final int NODE_NORMAL

NODE_NULLABLE

static final int NODE_NULLABLE

NODE_TOPLEVEL

static final int NODE_TOPLEVEL

parens

int parens

pattern

String pattern

Constructor Detail

RECompiler

public RECompiler()
Constructor. Creates (initially empty) storage for a regular expression program.

Method Detail

atom

int atom()
Absorb an atomic character string. This method is a little tricky because it can un-include the last character of the string if a closure operator follows. This is correct because *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).

Returns: Index of new atom node

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.
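The precedence rule can be illustrated with a small sketch. It is not RECompiler's code: the class AtomSplitDemo and its methods are hypothetical, and the sketch treats every character other than *, + and ? as a literal (no escapes, character classes, groups or {m,n} brackets).

```java
// Hypothetical illustration of the atom() precedence rule; not RECompiler's code.
final class AtomSplitDemo {
    private AtomSplitDemo() {}

    /** Returns true if c is one of the closure operators *, + or ?. */
    static boolean isClosure(char c) {
        return c == '*' || c == '+' || c == '?';
    }

    /**
     * Reads a run of literal characters starting at 'start' and returns the part
     * that forms the atom. If a closure operator follows a run longer than one
     * character, the last character is left out so the operator binds to that
     * character alone (ABC* means AB(C*)).
     */
    static String splitAtom(String pattern, int start) {
        int end = start;
        // Assumption for the sketch: every non-operator character is a literal.
        while (end < pattern.length() && !isClosure(pattern.charAt(end))) {
            end++;
        }
        boolean closureFollows = end < pattern.length();
        if (closureFollows && end - start > 1) {
            end--; // "un-include" the last character
        }
        return pattern.substring(start, end);
    }
}
```

For ABC*, splitAtom returns AB and leaves C to be parsed on its own as a closured terminal, which is why the * applies only to C.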

bracket

void bracket()
Match a bracket {m,n} expression and put the results in the bracket member variables.

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.
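A minimal sketch of parsing a {m,n} bracket, assuming (as the field names bracketMin, bracketOpt and bracketUnbounded suggest) that the minimum count is kept together with the number of optional extra repetitions, n - m, and that a sentinel marks an open upper bound as in {m,}. The class BracketParser, its int[] result, the sentinel value -1 and the use of IllegalArgumentException in place of RESyntaxException are all assumptions of this sketch.

```java
// Hypothetical sketch of {m,n} parsing; not RECompiler's implementation.
final class BracketParser {
    /** Assumed sentinel meaning "no upper bound", standing in for bracketUnbounded. */
    static final int UNBOUNDED = -1;

    private BracketParser() {}

    /**
     * Parses "{m}", "{m,}" or "{m,n}" starting at index 'at', which must hold '{'.
     * Returns {min, opt, nextIndex}, where opt is the number of optional
     * repetitions (n - m), 0 for {m}, or UNBOUNDED for {m,}.
     * IllegalArgumentException stands in for RESyntaxException.
     */
    static int[] parse(String p, int at) {
        int i = at;
        if (i >= p.length() || p.charAt(i++) != '{') {
            throw new IllegalArgumentException("Expected '{'");
        }
        // Read m.
        int start = i;
        while (i < p.length() && Character.isDigit(p.charAt(i))) i++;
        if (i == start) throw new IllegalArgumentException("Expected digit");
        int min = Integer.parseInt(p.substring(start, i));
        if (i >= p.length()) throw new IllegalArgumentException("Expected comma or right bracket");
        if (p.charAt(i) == '}') {
            return new int[] {min, 0, i + 1};          // {m}: exactly m times
        }
        if (p.charAt(i++) != ',') throw new IllegalArgumentException("Expected comma");
        if (i < p.length() && p.charAt(i) == '}') {
            return new int[] {min, UNBOUNDED, i + 1};  // {m,}: no upper bound
        }
        // Read n and store the optional count n - m.
        start = i;
        while (i < p.length() && Character.isDigit(p.charAt(i))) i++;
        if (i == start) throw new IllegalArgumentException("Expected digit");
        int opt = Integer.parseInt(p.substring(start, i)) - min;
        if (opt < 0) throw new IllegalArgumentException("Bad range");
        if (i >= p.length() || p.charAt(i) != '}') {
            throw new IllegalArgumentException("Missing close brace");
        }
        return new int[] {min, opt, i + 1};
    }
}
```

Under this convention, a{2,5} yields min 2 and opt 3: two required repetitions followed by up to three optional ones.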

branch

int branch(int[] flags)
Compile the body of one branch of an or (|) operator (implements concatenation).

Parameters: flags Flags passed by reference

Returns: Pointer to first node in the branch

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.

characterClass

int characterClass()
Compile a character class

Returns: Index of class node

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.
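RECompiler.RERange is described only as a nested class for maintaining character ranges. As a sketch of what such bookkeeping can involve, the hypothetical RangeSet below keeps a sorted list of disjoint inclusive ranges and merges overlapping or adjacent ones as a class such as [a-fg-k0-9] is read. Its names and merge strategy are assumptions and are not taken from RERange.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical range set for character classes; it does not reproduce
// RECompiler.RERange's actual fields or methods.
final class RangeSet {
    // Sorted, disjoint, non-adjacent inclusive ranges, each stored as {min, max}.
    private final List<int[]> ranges = new ArrayList<>();

    /** Adds the inclusive range [min, max], merging overlapping or adjacent ranges. */
    void include(char min, char max) {
        if (min > max) {
            throw new IllegalArgumentException("Bad range: " + min + "-" + max);
        }
        int lo = min;
        int hi = max;
        List<int[]> merged = new ArrayList<>();
        boolean placed = false;
        for (int[] r : ranges) {
            if (r[1] + 1 < lo) {
                merged.add(r);                   // lies entirely before [lo, hi]
            } else if (hi + 1 < r[0]) {
                if (!placed) {                   // first range entirely after [lo, hi]
                    merged.add(new int[] {lo, hi});
                    placed = true;
                }
                merged.add(r);
            } else {
                lo = Math.min(lo, r[0]);         // overlaps or touches: absorb it
                hi = Math.max(hi, r[1]);
            }
        }
        if (!placed) {
            merged.add(new int[] {lo, hi});
        }
        ranges.clear();
        ranges.addAll(merged);
    }

    /** Returns true if c falls inside any stored range. */
    boolean contains(char c) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    /** Number of disjoint ranges currently stored. */
    int size() {
        return ranges.size();
    }
}
```

For [a-fg-k0-9] the sketch ends up with two ranges, 0-9 and a-k, because a-f and g-k touch and are merged.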

closure

int closure(int[] flags)
Compile a possibly closured terminal

Parameters: flags Flags passed by reference

Returns: Index of closured node

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.

compile

public REProgram compile(String pattern)
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.

Parameters: pattern Regular expression pattern to compile (see the RE class for the supported syntax).

Returns: A compiled regular expression program.

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.

See Also: RECompiler RE

emit

void emit(char c)
Emit a single character into the program stream.

Parameters: c Character to add

ensure

void ensure(int n)
Ensures that n more characters can fit in the program buffer. If n more can't fit, then the size is doubled until it can.

Parameters: n Number of additional characters to ensure will fit.
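The doubling policy described for ensure(int), together with emit(char), can be sketched as follows. ProgramBuffer is a hypothetical stand-in that mirrors the instruction and lenInstruction fields; the initial capacity of 128 is an assumption, not a value taken from RECompiler.

```java
// Hypothetical sketch of ensure()/emit(); not RECompiler's code.
final class ProgramBuffer {
    char[] instruction = new char[128]; // assumed initial capacity (must be > 0)
    int lenInstruction = 0;             // number of chars used so far

    /** Makes room for n more chars, doubling the array size until they fit. */
    void ensure(int n) {
        int curlen = instruction.length;
        if (lenInstruction + n > curlen) {
            while (lenInstruction + n > curlen) {
                curlen *= 2;
            }
            char[] bigger = new char[curlen];
            System.arraycopy(instruction, 0, bigger, 0, lenInstruction);
            instruction = bigger;
        }
    }

    /** Appends one char to the program stream, growing the buffer if needed. */
    void emit(char c) {
        ensure(1);
        instruction[lenInstruction++] = c;
    }
}
```

Doubling keeps the amortized cost of each emit constant, at the price of up to half the buffer being unused.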

escape

int escape()
Match an escape sequence. Handles quoted characters and octal escapes as well as normal escape characters, and always advances the input stream by the correct amount. This code "understands" the subtle difference between an octal escape and a backreference. When ESC_CLASS, ESC_COMPLEX or ESC_BACKREF is returned, the specific escape can be determined by looking at pattern.charAt(idx - 1).

Returns: ESC_* code or character if simple escape

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.
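The octal-versus-backreference distinction can be illustrated with a small classifier. It assumes the convention that a digit escape is octal when it starts with 0 or is followed by another digit (\012, \12), and is a backreference when it is a single digit from 1 to 9 (\1). This rule, the class EscapeKind and its enum are assumptions for illustration; the RE class documentation describes the syntax RECompiler actually accepts.

```java
// Hypothetical classifier for digit escapes; not RECompiler.escape().
final class EscapeKind {
    enum Kind { OCTAL, BACKREF, OTHER }

    private EscapeKind() {}

    /** Classifies the escape whose backslash is at index 'at' in the pattern. */
    static Kind classify(String pattern, int at) {
        if (at + 1 >= pattern.length() || pattern.charAt(at) != '\\') {
            throw new IllegalArgumentException("No escape at index " + at);
        }
        char c = pattern.charAt(at + 1);
        if (c < '0' || c > '9') {
            return Kind.OTHER; // e.g. a class escape such as \d
        }
        char after = at + 2 < pattern.length() ? pattern.charAt(at + 2) : 0;
        boolean nextIsDigit = after >= '0' && after <= '9';
        // Assumed rule: a leading 0 or two digits in a row means octal.
        if (c == '0' || nextIsDigit) {
            return Kind.OCTAL;
        }
        return Kind.BACKREF; // a lone \1 .. \9
    }
}
```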

expr

int expr(int[] flags)
Compile an expression with possible parens around it. Paren matching is done at this level so we can tie the branch tails together.

Parameters: flags Flag value passed by reference

Returns: Node index of expression in instruction array

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.

internalError

void internalError()
Throws a new internal error exception

Throws: Error Thrown in the event of an internal error.

node

int node(char opcode, int opdata)
Adds a new node

Parameters: opcode Opcode for node; opdata Opdata for node (only the low 16 bits are currently used)

Returns: Index of new node in program

nodeInsert

void nodeInsert(char opcode, int opdata, int insertAt)
Inserts a node with a given opcode and opdata at insertAt. The node relative next pointer is initialized to 0.

Parameters: opcode Opcode for new node; opdata Opdata for new node (only the low 16 bits are currently used); insertAt Index at which to insert the new node in the program
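The node operations can be made concrete with a sketch that assumes each node occupies three chars: opcode, opdata and a relative next offset. Storing opdata in a 16-bit Java char is consistent with the note that only its low 16 bits are used. NodeBuffer, the OFFSET_* constants, NODE_SIZE and the initial capacity are assumptions of this sketch rather than RECompiler's documented internals.

```java
import java.util.Arrays;

// Hypothetical three-char node layout; not RECompiler's documented internals.
final class NodeBuffer {
    static final int OFFSET_OPCODE = 0;
    static final int OFFSET_OPDATA = 1;
    static final int OFFSET_NEXT = 2;
    static final int NODE_SIZE = 3;

    char[] instruction = new char[16];
    int lenInstruction = 0;

    /** Grows the buffer so that n more chars fit. */
    private void ensure(int n) {
        if (lenInstruction + n > instruction.length) {
            instruction = Arrays.copyOf(instruction,
                    Math.max(instruction.length * 2, lenInstruction + n));
        }
    }

    /** Appends a node and returns its index; opdata is truncated to 16 bits. */
    int node(char opcode, int opdata) {
        ensure(NODE_SIZE);
        int at = lenInstruction;
        instruction[at + OFFSET_OPCODE] = opcode;
        instruction[at + OFFSET_OPDATA] = (char) opdata; // keeps the low 16 bits
        instruction[at + OFFSET_NEXT] = 0;               // no next node yet
        lenInstruction += NODE_SIZE;
        return at;
    }

    /** Shifts everything from insertAt onward right by one node and writes a new node there. */
    void nodeInsert(char opcode, int opdata, int insertAt) {
        ensure(NODE_SIZE);
        // System.arraycopy handles overlapping source and destination correctly.
        System.arraycopy(instruction, insertAt, instruction, insertAt + NODE_SIZE,
                lenInstruction - insertAt);
        instruction[insertAt + OFFSET_OPCODE] = opcode;
        instruction[insertAt + OFFSET_OPDATA] = (char) opdata;
        instruction[insertAt + OFFSET_NEXT] = 0;         // relative next pointer starts at 0
        lenInstruction += NODE_SIZE;
    }
}
```

Inserting at the index of an already emitted node moves that node and everything after it forward by one node, which is the kind of operation a compiler needs when an operator written after a terminal must be placed in front of it. The sketch does not adjust relative next offsets that span the insertion point.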

setNextOfEnd

void setNextOfEnd(int node, int pointTo)
Appends a node to the end of a node chain by making the last node in the chain point to it.

Parameters: node Start of node chain to traverse; pointTo Node to have the tail of the chain point to
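Assuming each node stores a relative next offset at position 2 of a three-char node, with 0 marking the end of a chain, the traversal can be sketched as below. Offsets are read back as signed 16-bit values so that a tail may also point to an earlier node. ChainDemo, OFFSET_NEXT and this encoding are assumptions, not RECompiler's documented internals.

```java
// Hypothetical sketch of setNextOfEnd() over an assumed node layout.
final class ChainDemo {
    static final int OFFSET_NEXT = 2; // assumed position of the next offset in a node

    private ChainDemo() {}

    /** Follows relative next offsets from 'node' and makes the last node point to 'pointTo'. */
    static void setNextOfEnd(char[] instruction, int node, int pointTo) {
        // Read offsets as signed shorts so backward links decode correctly.
        int next = (short) instruction[node + OFFSET_NEXT];
        // A next offset of 0 marks the end of the chain.
        while (next != 0) {
            node += next;
            next = (short) instruction[node + OFFSET_NEXT];
        }
        // Store the distance from the tail to the target as its relative next offset.
        instruction[node + OFFSET_NEXT] = (char) (pointTo - node);
    }
}
```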

syntaxError

void syntaxError(String s)
Throws a new syntax error exception

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.

terminal

int terminal(int[] flags)
Match a terminal node.

Parameters: flags Flags passed by reference

Returns: Index of terminal node (closeable)

Throws: RESyntaxException Thrown if the regular expression has invalid syntax.

Copyright © 2001-2007 Apache Software Foundation. All Rights Reserved.