Syntax Reference: Regular Expressions

Page Summary

On this page we discuss regular expressions as they are used by some plugins to define how to match and/or parse incoming data.

Regex Intro

Regex is short for regular expression. A regular expression is a string written in a special syntax that defines how to match or parse another string. As a simple example, the + character in a regular expression matches one or more instances of the preceding character or capture group. So the regular expression a+ matches a, aa, aab, baa, and baac, because each of those strings contains a sequence of one or more a characters. Notice that regular expressions generally do not require the match to start at the beginning of the string, though, as we’ll see, we can specify that as well.
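If you want to experiment with this behavior outside JADE, here is a minimal sketch using Python's standard re module. Python is an assumption chosen for illustration only; JADE's own regex engine may differ in details, but the + quantifier and unanchored searching behave as described above.

```python
import re

# re.search scans the whole string and reports a match found anywhere,
# i.e. the pattern is not anchored to the start of the string.
pattern = re.compile(r"a+")

for text in ["a", "aa", "aab", "baa", "baac", "bcd"]:
    match = pattern.search(text)
    # match.group() is the run of one or more a characters that was found
    print(text, "->", match.group() if match else None)

# Prefixing the pattern with ^ anchors it to the start of the string,
# so "baa" no longer matches while "aab" still does.
anchored = re.compile(r"^a+")
print(anchored.search("baa"))
print(anchored.search("aab").group())
```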

Plugins often allow regular expressions to be used in configuration. This is helpful when, for example, matching and parsing string data received from an instrument. Below we break regular expressions down in more detail and provide many practical examples that are commonly used in JADE applications.

Basic Regular Expression Syntax

The basics of regular expression syntax are covered in the table below. This is not a complete syntax reference, but it covers the most commonly used regex syntax, which is likely all you’ll need in JADE and beyond.

Syntax Description of Behavior Example
^ matches the start of a string Regex ^abc matches abc, abcd, and abc123 but does NOT match xabc, aabc, or bc
$ matches the end of a string Regex xyz$ matches xyz, axyz, and 123xyz but does NOT match xy, ayz, or yz
. matches any single character Regex a.c matches abc, aac, a3c, aZc, and va4c but does NOT match ac, abd, or abbc
* matches 0 or more of the preceding character or capture group Regex ab*c matches ac, abc, and abbc but does NOT match ab, bc, or aad
+ matches 1 or more of the preceding character or capture group Regex ab+c matches abc, abbc, and abbbc but does NOT match ac
? optionally matches (0 or 1 times) the preceding single character or capture group Regex ab?c matches abc and ac but does NOT match aec
{n} matches exactly n instances of the preceding character or capture group Regex ab{2}c matches abbc but does NOT match abc or abbbc
{n,m} matches between n and m instances of the preceding character or capture group Regex ab{2,3}c matches abbc and abbbc but does NOT match abc or abbbbc
{n,} matches n or more instances of the preceding character or capture group Regex ab{2,}c matches abbc, abbbc, and abbbbc (and so on) but does NOT match abc or ac
[...] matches any single character specified between the square brackets [ and ] Regex a[bc]d matches abd and acd but does NOT match aec. To match a literal square bracket, use \[ or \]
[^...] matches any single character except those specified between the square brackets [ and ] Regex a[^bc]d matches aed but does NOT match abd or acd
(...) parentheses around a subexpression define a capture group Regex a(b*)c used against string abc captures b, used against string abbc captures bb, and used against string ac captures an empty string
(?:...) parentheses around a subexpression beginning with ?: group the subexpression without capturing it Regex a(?:b)c matches abc but does not capture b
(?=...) parentheses around a subexpression beginning with ?= performs positive lookahead without capture Regex a(?=b) matches the a in ab but will not match the a in ac
(?!...) parentheses around a subexpression beginning with ?! performs negative lookahead without capture Regex a(?!b) matches the a in ac but will not match the a in ab
(?<=...) parentheses around a subexpression beginning with ?<= performs positive lookbehind without capture Regex (?<=b)c matches the c in bc but will not match the c in ac
(?<!...) parentheses around a subexpression beginning with ?<! performs negative lookbehind without capture Regex (?<!b)c matches the c in ac but will not match the c in bc
(?&name) matches the regular expression subroutine that has been defined with the specified name JADE provides a common subroutine named number, which matches any number (integer, floating point, or scientific notation). To use this named regex, use the syntax (?&number). For example, regex (?&number) matches 10, 10.5, 10.5E2, 10.5E+2, and 10.5E-2. Notably, it will not match a leading + character in front of a number.
\cX matches the control character entered by typing Ctrl + X Regex \cP matches the control character produced when one types Ctrl + P (character code 0x10)
\s matches a whitespace character, including vertical whitespace characters Regex a\sb matches a, followed by a space, followed by b, but does NOT match aXb
\S matches a non-whitespace character Regex a\Sb matches aXb but does NOT match a, followed by a space, followed by b
\d matches a digit character, i.e. one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Regex a\db matches a5b but does NOT match aXb
\D matches a non-digit character, i.e. anything but 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Regex a\Db matches aXb and a, followed by a space, followed by b, but does NOT match a5b
\w matches a word character, which is essentially equivalent to [a-zA-Z0-9_], i.e. upper and lower case letters, digits, and underscores Regex a\wb matches aXb but does NOT match a, followed by a space, followed by b
\W matches a non-word character, which is essentially equivalent to [^a-zA-Z0-9_], i.e. anything except upper and lower case letters, digits, and underscores Regex a\Wb matches a, followed by a space, followed by b, but does NOT match aXb
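\v matches a vertical whitespace character; for line-oriented data this is essentially equivalent to the regex [\r\n] Regex a\vb matches a, followed by a line feed or carriage return, followed by b, but does NOT match aXb
\V matches any character that is not vertical whitespace; this is essentially equivalent to the regex [^\r\n] Regex a\Vb matches aXb but does NOT match a, followed by a line feed or carriage return, followed by b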
\n matches a line feed character Regex a\nb matches a, followed by a line feed character, followed by b
\r matches a carriage return character Regex a\rb matches a, followed by a carriage return character, followed by b
\t matches a tab character Regex a\tb matches a, followed by a tab character, followed by b
\f matches a form feed character Regex a\fb matches a followed by a form feed character, followed by b
\0 matches a null character Regex a\0b matches a, followed by a null character, followed by b
\YYY matches the character with octal code YYY Regex \112 matches J (octal 112 is decimal 74, the character code of J)
\xYY matches the character with hexadecimal code YY Regex \x4A matches J
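To check the table's examples mechanically, here is a minimal sketch that runs many of its rows through Python's standard re module. Python is an assumption for illustration; JADE's engine may differ in details. Python's re does not support (?&name) or \c, rejects \V, and treats \v as a vertical tab only, so those rows are left out. The CASES table and the check function are hypothetical helpers written for this example.

```python
import re

# Hypothetical test table built from the syntax table above:
# (regex, strings that should match, strings that should NOT match).
# Rows using (?&number), \c, \v or \V are omitted because Python's re
# module either rejects them or interprets them differently.
CASES = [
    (r"^abc", ["abc", "abcd", "abc123"], ["xabc", "aabc", "bc"]),
    (r"xyz$", ["xyz", "axyz", "123xyz"], ["xy", "ayz", "yz"]),
    (r"a.c", ["abc", "aac", "a3c", "aZc", "va4c"], ["ac", "abd", "abbc"]),
    (r"ab*c", ["ac", "abc", "abbc"], ["ab", "bc", "aad"]),
    (r"ab+c", ["abc", "abbc", "abbbc"], ["ac"]),
    (r"ab?c", ["abc", "ac"], ["aec"]),
    (r"ab{2}c", ["abbc"], ["abc", "abbbc"]),
    (r"ab{2,3}c", ["abbc", "abbbc"], ["abc", "abbbbc"]),
    (r"ab{2,}c", ["abbc", "abbbc", "abbbbc"], ["abc", "ac"]),
    (r"a[bc]d", ["abd", "acd"], ["aec"]),
    (r"a[^bc]d", ["aed"], ["abd", "acd"]),
    (r"a(?=b)", ["ab"], ["ac"]),
    (r"a(?!b)", ["ac"], ["ab"]),
    (r"(?<=b)c", ["bc"], ["ac"]),
    (r"(?<!b)c", ["ac"], ["bc"]),
    (r"a\sb", ["a b"], ["aXb"]),
    (r"a\Sb", ["aXb"], ["a b"]),
    (r"a\db", ["a5b"], ["aXb"]),
    (r"a\Db", ["aXb", "a b"], ["a5b"]),
    (r"a\wb", ["aXb"], ["a b"]),
    (r"a\Wb", ["a b"], ["aXb"]),
    (r"a\nb", ["a\nb"], ["aXb"]),
    (r"a\rb", ["a\rb"], ["a\nb"]),
    (r"a\tb", ["a\tb"], ["a b"]),
    (r"a\fb", ["a\fb"], ["ab"]),
    (r"a\0b", ["a\0b"], ["ab"]),
    (r"\112", ["J"], ["K"]),
    (r"\x4A", ["J"], ["K"]),
]


def check(cases):
    """Assert every expectation in the table; return the number of rows checked."""
    for regex, should_match, should_not_match in cases:
        pattern = re.compile(regex)
        for text in should_match:
            # re.search looks for a match anywhere in the string (unanchored)
            assert pattern.search(text), f"{regex!r} should match {text!r}"
        for text in should_not_match:
            assert not pattern.search(text), f"{regex!r} should not match {text!r}"
    return len(cases)


print(check(CASES), "rows passed")

# Capture groups: group(1) is the text matched inside the parentheses.
capture = re.compile(r"a(b*)c")
captured = [capture.search(s).group(1) for s in ["abc", "abbc", "ac"]]
print(captured)

# A non-capturing group matches the same text but creates no group.
print(re.search(r"a(?:b)c", "abc").groups())
```

Running the script prints 28 rows passed, then ['b', 'bb', ''] for the captured strings, then an empty tuple for the non-capturing group.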

Regex Examples

Below are several examples which show how to combine the syntax described above in meaningful ways.

Example Regex Description Sample Matching Strings
^((?&number)),((?&number)),((?&number))\s*$ matches a string consisting of exactly 3 comma-delimited numbers, with no whitespace around the commas, optionally followed by trailing whitespace; each number is captured in its own group "2,-30,5", "2.0,-30.0,5.0", "2,-3E1,5.0"
^((?&number))\s*,\s*((?&number))\s*,\s*((?&number))\s*$ matches a string consisting of exactly 3 comma-delimited numbers, with 0 or more whitespace characters around each comma, optionally followed by trailing whitespace; each number is captured in its own group "2, -30, 5", "2.0 , -30.0 , 5.0", "2 ,-3E1 ,5.0"
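Python's re module has no equivalent of JADE's (?&number) subroutine, so a minimal sketch of the second example has to substitute its own number pattern. The NUMBER pattern and the parse_line function below are hypothetical, written for illustration and not taken from JADE; NUMBER follows the behavior described earlier (integers, decimals, and scientific notation, with an optional minus sign but no leading +).

```python
import re

# ASSUMPTION: stand-in for JADE's (?&number), which Python's re lacks.
# Accepts an optional leading minus (but no leading +), an integer or
# decimal part, and an optional exponent such as E2, E+2 or E-2.
# Only non-capturing groups are used, so it adds no capture groups.
NUMBER = r"-?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

# Second example regex from the table, with (?&number) replaced by NUMBER.
# Each number sits in its own capture group; whitespace is allowed around
# the commas and at the end of the line.
LINE = re.compile(rf"^({NUMBER})\s*,\s*({NUMBER})\s*,\s*({NUMBER})\s*$")


def parse_line(text):
    """Return the three numbers as floats, or None if the line does not match."""
    match = LINE.match(text)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


for sample in ["2, -30, 5", "2.0 , -30.0 , 5.0", "2 ,-3E1 ,5.0", "2,-30", "+2,-30,5"]:
    print(repr(sample), "->", parse_line(sample))
```

The first three samples each parse to (2.0, -30.0, 5.0); "2,-30" is rejected because it holds only two numbers, and "+2,-30,5" is rejected because the number pattern does not accept a leading +.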

Further Exploration

There’s more to the regex story and several great online resources which dive deeper.