Regular Expressions

The regular expressions used by SELECTOR are derived from the notation used in automata theory to describe formal languages and finite state machines.

Regular expressions consist of two kinds of characters: operands, such as "a", "O", and ",", and meta-characters (operators), such as "+", "|", and "[". Like the familiar arithmetic expressions, a regular expression can be either a basic expression or a complex expression formed by applying operators to several basic expressions.

The meta-characters for regular expressions, and their uses, are as follows:

Operator

Used for

\

Used in escape sequences to specify characters that would otherwise have no representation (similar to those used in the C language).

\b

Backspace.

\t

Horizontal tab.

\n

Newline or linefeed.

\f

New page or form feed.

\r

Carriage return.

\ddd

The character whose octal value is ddd.

\c

c represents any other character, which is taken literally.

^

The caret matches the beginning of a string. For example, "^abc" matches only those strings that start with "abc". When used as the first character of a character class, it denotes a negated class.

$

The dollar matches the end of a string. For example, "z$" matches only strings with "z" as the last character.

.

The full stop matches any single character. Be careful: ".*" matches every string.

[

The open bracket denotes the start of a character class.

]

The closed bracket denotes the end of a character class.

|

The pipe character is the alternation operator. For example, "a|b" matches either "a" or "b".

()

Parentheses are used to group expressions in much the same way as for arithmetic expressions.

*

The closure operator matches 0 or more instances of the preceding expression.

+

The plus sign indicates positive closure and matches 1 or more instances of the preceding expression. For example, "z+" matches a string of 1 or more "z"s.

?

The question mark matches 0 or 1 instance of the preceding expression. For example, "9?" matches either the null string or "9".
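
The operators in the table can be tried out in any regular expression engine that uses the same notation. The following is a minimal sketch using Python's re module as a stand-in; it is not SELECTOR's own engine, and the helper name matches is hypothetical. It assumes an unanchored search, so a pattern matches if it is found anywhere in the string unless "^" or "$" pins it down.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if pattern is found anywhere in text (unanchored search)."""
    return re.search(pattern, text) is not None

# Anchors: "^" pins the match to the start, "$" to the end.
print(matches(r"^abc", "abcdef"))  # True
print(matches(r"^abc", "xabc"))    # False
print(matches(r"z$", "jazz"))      # True
print(matches(r"z$", "zebra"))     # False

# Alternation with grouping: exactly "a" or exactly "b".
print(matches(r"^(a|b)$", "a"))    # True
print(matches(r"^(a|b)$", "c"))    # False

# Closure operators are written after the expression they apply to.
print(matches(r"^z*$", ""))        # True: 0 or more "z"
print(matches(r"^z+$", ""))        # False: needs at least one "z"
print(matches(r"^z+$", "zzz"))     # True
print(matches(r"^9?$", "9"))       # True: 0 or 1 "9"
print(matches(r"^9?$", "99"))      # False
```

Details such as escape handling may differ between Python and SELECTOR, so treat the output as an illustration of the notation rather than a guarantee of SELECTOR's behaviour.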

Character classes

Character classes are shorthand for matching one of several characters.

For example, [AaBb] is the same as (A|a|B|b) and matches "A", "a", "B", or "b".

There are also character ranges, such as [A-Z], which matches any uppercase letter.

Negated character classes specify characters that should not be matched. For example, [^A-Z] matches any character except an uppercase letter.
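
The equivalence between a character class and an alternation, and the effect of ranges and negation, can be checked with a short sketch. It again uses Python's re module as an assumed stand-in for SELECTOR's engine, and all function names are hypothetical.

```python
import re

def in_class(ch: str) -> bool:
    """True if the single character ch belongs to the class [AaBb]."""
    return re.search(r"^[AaBb]$", ch) is not None

def in_alternation(ch: str) -> bool:
    """Same test written as an alternation instead of a class."""
    return re.search(r"^(A|a|B|b)$", ch) is not None

def all_uppercase(text: str) -> bool:
    """True if text consists only of characters in the range [A-Z]."""
    return re.search(r"^[A-Z]+$", text) is not None

def has_non_uppercase(text: str) -> bool:
    """True if text contains at least one character outside [A-Z]."""
    return re.search(r"[^A-Z]", text) is not None

# The class and the alternation agree on every character tried.
for ch in "AaBbCc":
    print(ch, in_class(ch), in_alternation(ch))

print(all_uppercase("SYSTEM"))      # True
print(all_uppercase("System"))      # False
print(has_non_uppercase("SYSTEM"))  # False
print(has_non_uppercase("SYS1"))    # True
```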

Examples of Regular Expressions

All variable names that have a first branch of "System":

^System

All variable names that end in a number:

[0-9]$

All variable names:

.*
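
The three expressions above can be applied to a list of names to see which ones each selects. This is a sketch only: the variable names and the dot used to separate branches are invented for illustration, the select function is hypothetical, and Python's re module is assumed to behave like SELECTOR for these simple patterns.

```python
import re

# Hypothetical variable names; the dot-separated branch layout is an assumption.
NAMES = ["System.Cpu.Load", "Plant.Tank1", "System.Fan2", "Plant.Valve"]

def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names in which pattern is found (unanchored search)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]

print(select(r"^System", NAMES))  # ['System.Cpu.Load', 'System.Fan2']
print(select(r"[0-9]$", NAMES))   # ['Plant.Tank1', 'System.Fan2']
print(select(r".*", NAMES))       # all four names
```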