Page Summary
On this page we discuss regular expressions as they are used by some plugins to define how to match and/or parse incoming data.
Regex Intro
Regex is short for Regular Expression. A regular expression is a string composed of special syntax that defines how to match or parse another string. As a simple example, the + character in regular expressions matches one or more instances of the preceding character or capture group. So a regular expression a+ would match a, aa, aab, baa, and baac because each of those strings has a sequence of one or more a characters. Notice that regular expressions generally do not assume the match must start at the beginning of the string, though as we’ll see we can specify that as well.
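The unanchored behavior described above can be tried out with a short sketch. Python's re module is used here purely for illustration and is an assumption of this example; it is not necessarily the engine a given plugin uses, though the a+ and ^ syntax behaves the same way in most regex dialects. The sample string xyz is added only to show a non-match.

```python
import re

# Sample strings from the paragraph above, plus "xyz" which contains no "a".
samples = ["a", "aa", "aab", "baa", "baac", "xyz"]

# re.search scans the whole string, so the match may start anywhere.
pattern = re.compile(r"a+")
found = {s: pattern.search(s) is not None for s in samples}

# Prefixing the pattern with ^ pins the match to the start of the string.
anchored = re.compile(r"^a+")
found_at_start = {s: anchored.search(s) is not None for s in samples}

for s in samples:
    print(f"{s!r}: anywhere={found[s]}, at start={found_at_start[s]}")
```

With the anchor in place, baa and baac no longer match because their first character is not a.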
Plugins often allow regular expressions to be used in configuration. This is helpful when, for example, matching and parsing string data received from an instrument. We’ll break down much more about regular expressions below, and provide many practical examples which are commonly used in JADE applications.
Basic Regular Expression Syntax
The basics of regular expression syntax are covered in the table below. This is not a comprehensive syntax breakdown, but it covers the most common regex syntax and is probably all you’ll ever need in JADE and beyond.
Syntax | Description of Behavior | Example |
---|---|---|
^ | matches the start of a string | Regex ^abc matches abc, abcd, and abc123 but does NOT match xabc, aabc, or bc |
$ | matches the end of a string | Regex xyz$ matches xyz, axyz, and 123xyz but does NOT match xy, ayz, or yz |
. | matches any single character | Regex a.c matches abc, aac, a3c, aZc, and va4c but does NOT match ac, abd, or abbc |
* | matches 0 or more instances of the preceding character or capture group | Regex ab*c matches ac, abc, and abbc but does NOT match ab, bc, or aad |
+ | matches 1 or more instances of the preceding character or capture group | Regex ab+c matches abc, abbc, and abbbc but does NOT match ac |
? | optionally matches the preceding single character or capture group | Regex ab?c matches abc and ac but does NOT match aec |
{n} | matches exactly n instances of the preceding character or capture group | Regex ab{2}c matches abbc but does NOT match abc or abbbc |
{n,m} | matches between n and m instances of the preceding character or capture group | Regex ab{2,3}c matches abbc and abbbc but does NOT match abc or abbbbc |
{n,} | matches n or more instances of the preceding character or capture group | Regex ab{2,}c matches abbc, abbbc, and abbbbc (and so on) but does NOT match abc or ac |
[...] | matches any single character specified between the flat brackets [ and ] | Regex a[bc]d matches abd and acd but does NOT match aec. To match a literal flat bracket use \[ or \] |
[^...] | matches any single character except those specified between the flat brackets [ and ] | Regex a[^bc]d matches aed but does NOT match abd or acd |
(...) | parentheses around a subexpression define a capture group | Regex a(b*)c used against string abc captures b, used against string abbc captures bb, and used against string ac captures an empty string |
(?:...) | parentheses around a subexpression beginning with ?: group the subexpression without capturing it | Regex a(?:b) matches the ab in abc but will not capture b |
(?=...) | parentheses around a subexpression beginning with ?= perform positive lookahead without capture | Regex a(?=b) matches the a in ab but will not match the a in ac |
(?!...) | parentheses around a subexpression beginning with ?! perform negative lookahead without capture | Regex a(?!b) matches the a in ac but will not match the a in ab |
(?<=...) | parentheses around a subexpression beginning with ?<= perform positive lookbehind without capture | Regex (?<=b)c matches the c in bc but will not match the c in ac |
(?<!...) | parentheses around a subexpression beginning with ?<! perform negative lookbehind without capture | Regex (?<!b)c matches the c in ac but will not match the c in bc |
(?&name) | matches a regular expression subroutine which has been defined with the specified name | JADE provides a common subroutine named number which will match any number (integer, floating point, scientific notation). To use this named regex, use the syntax (?&number). For example, regex (?&number) matches 10, 10.5, 10.5E2, 10.5E+2, and 10.5E-2. Notably, it will not match a leading + character in front of a number. |
\c | matches a control character | Regex \cP matches the control character entered when one types Ctrl + P |
\s | matches a whitespace character, including vertical whitespace characters | Regex a\sb matches a b but does not match aXb |
\S | matches a non-whitespace character | Regex a\Sb matches aXb but does not match a b |
\d | matches a digit character, i.e. one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | Regex a\db matches a5b but does not match aXb |
\D | matches a non-digit character, i.e. anything but 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | Regex a\Db matches aXb but does not match a5b |
\w | matches a word character, which is essentially equivalent to [a-zA-Z0-9_], i.e. upper and lower case letters, digits, and underscores | Regex a\wb matches aXb but does not match a b |
\W | matches a non-word character, which is essentially equivalent to [^a-zA-Z0-9_], i.e. anything except upper and lower case letters, digits, and underscores | Regex a\Wb matches a b but does not match aXb |
\v | matches a vertical whitespace character; this is essentially equivalent to the regex [\r\n] | Regex a\vb matches a, followed by a line break, followed by b but does not match aXb |
\V | matches any character that is not vertical whitespace; this is essentially equivalent to the regex [^\r\n] | Regex a\Vb matches aXb but does not match a, followed by a line break, followed by b |
\n | matches a line feed character | Regex a\nb matches a, followed by a line feed character, followed by b |
\r | matches a carriage return character | Regex a\rb matches a, followed by a carriage return character, followed by b |
\t | matches a tab character | Regex a\tb matches a, followed by a tab character, followed by b |
\f | matches a form feed character | Regex a\fb matches a, followed by a form feed character, followed by b |
\0 | matches a null character | Regex a\0b matches a string beginning with a, followed by a null character, followed by b |
\YYY | matches the character with octal code YYY | Regex \112 matches J |
\xYY | matches the character with hexadecimal code YY | Regex \x4A matches J |
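Several rows of the table can be checked mechanically. The sketch below uses Python's re module, which is an assumption made for illustration: it shares the syntax exercised here with most regex dialects, but it does not support everything in the table (for example, it has no (?&name) subroutines and treats \v as a single vertical tab). The CASES list and variable names are hypothetical and exist only for this example.

```python
import re

# (pattern, subject, expected) triples drawn from or modeled on the table rows above.
CASES = [
    (r"^abc", "abc123", True),
    (r"^abc", "xabc", False),
    (r"xyz$", "123xyz", True),
    (r"a.c", "va4c", True),
    (r"a.c", "abbc", False),
    (r"ab*c", "ac", True),
    (r"ab+c", "ac", False),
    (r"ab?c", "aec", False),
    (r"ab{2}c", "abbbc", False),
    (r"ab{2,3}c", "abbbc", True),
    (r"ab{2,}c", "abbbbc", True),
    (r"a[^bc]d", "aed", True),
    (r"a(?=b)", "ac", False),
    (r"a(?!b)", "ac", True),
    (r"(?<=b)c", "bc", True),
    (r"(?<!b)c", "bc", False),
    (r"\x4A", "J", True),
    (r"\112", "J", True),
]

# Collect every case whose actual result disagrees with the expected one.
mismatches = [
    (pattern, subject)
    for pattern, subject, expected in CASES
    if (re.search(pattern, subject) is not None) != expected
]

# Capture groups: group(1) holds whatever the parenthesized b* consumed.
captured = [re.search(r"a(b*)c", s).group(1) for s in ("abc", "abbc", "ac")]

# Non-capturing group: the b is part of the match but produces no capture.
non_capturing = re.search(r"a(?:b)", "abc")

print("mismatches:", mismatches)
print("captures:", captured)
print("matched:", non_capturing.group(0), "groups:", non_capturing.groups())
```

Keeping the patterns and their expected outcomes in one list makes it easy to add a row whenever a pattern behaves unexpectedly in a plugin configuration.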
Regex Examples
Below are several examples which show how to combine the syntax described above in meaningful ways.
Example Regex | Description | Sample Matching Strings |
---|---|---|
^((?&number)),((?&number)),((?&number))\s*$ | matches a string made up of 3 comma-delimited numbers, each captured in its own group, followed by 0 or more whitespace characters | 2,-30,5 ; 2.0,-30.0,5.0 ; 2,-3E1,5.0 |
^((?&number))\s*,\s*((?&number))\s*,\s*((?&number))\s*$ | matches a string made up of 3 comma-delimited numbers, each captured in its own group, with 0 or more whitespace characters around the commas, followed by 0 or more whitespace characters | 2, -30, 5 ; 2.0 , -30.0 , 5.0 ; 2 ,-3E1 ,5.0 |
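To experiment with the second example outside JADE, the sketch below rebuilds it in Python. Python's re module has no (?&name) subroutines, so the NUMBER pattern here is a hypothetical stand-in written for this example; JADE's actual definition of number is not shown on this page and may differ. The helper name parse_triplet is likewise an assumption.

```python
import re

# Hypothetical stand-in for JADE's (?&number) subroutine: an optional minus sign,
# digits, an optional fractional part, and an optional exponent.
# A leading + is deliberately not accepted, mirroring the note in the syntax table.
NUMBER = r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"

# Same shape as the second example: three captured numbers, optional whitespace
# around the commas, and optional trailing whitespace.
LINE = re.compile(rf"^({NUMBER})\s*,\s*({NUMBER})\s*,\s*({NUMBER})\s*$")


def parse_triplet(text):
    """Return the three captured numbers as floats, or None if text does not match."""
    match = LINE.search(text)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


print(parse_triplet("2, -30, 5"))
print(parse_triplet("2 ,-3E1 ,5.0\r\n"))
print(parse_triplet("+2,3,4"))
```

The trailing \s* is what lets a line terminated by a carriage return and line feed, as instruments often send, still match up to the $ anchor.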
Further Exploration
There’s more to the regex story and several great online resources which dive deeper.