Wolfram Language & System Documentation Center

RegularExpression

RegularExpression["regex"] represents the generalized regular expression specified by the string "regex".

Details

RegularExpression can be used to represent classes of strings in functions like StringMatchQ, StringReplace, StringCases, and StringSplit.
RegularExpression supports standard regular expression syntax of the kind used in typical string manipulation languages.
The following basic elements can be used in regular expression strings:

	c	the literal character c
	.	any character except newline
	[c₁c₂…]	any of the characters c_i
	[c₁-c₂]	any character in the range c₁–c₂
	[^c₁c₂…]	any character except the c_i
	p*	p repeated zero or more times
	p+	p repeated one or more times
	p?	zero or one occurrence of p
	p{m,n}	p repeated between m and n times
	p*?,p+?,p??	the shortest consistent strings that match
	(p₁p₂…)	strings matching the sequence p₁, p₂, …
	p₁|p₂	strings matching p₁ or p₂

The following represent classes of characters:

	\\d	digit 0–9
	\\D	nondigit
	\\s	space, newline, tab, or other whitespace character
	\\S	non-whitespace character
	\\w	word character (letter, digit, or _)
	\\W	nonword character
	[[:class:]]	characters in a named class
	[^[:class:]]	characters not in a named class

The following named classes can be used: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, word, xdigit.
The following represent positions in strings:

	^	the beginning of the string (or line)
	$	the end of the string (or line)
	\\b	word boundary
	\\B	anywhere except a word boundary

The following set options for all regular expression elements that follow them:

	(?i)	treat uppercase and lowercase as equivalent (ignore case)
	(?m)	make ^ and $ match start and end of lines (multiline mode)
	(?s)	allow . to match newline
	(?-c)	unset options
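
As a brief illustration of the option settings listed above, the following sketch compares matching with and without (?i) and (?s). The test strings are made up for this example and are not part of the original reference.

```wolfram
(* (?i) makes the elements that follow it match regardless of case *)
StringCases["Abc aBC abd", RegularExpression["(?i)abc"]]
(* expected: {"Abc", "aBC"} *)

(* without (?s), . does not match the newline between "a" and "b" *)
StringMatchQ["a\nb", RegularExpression["a.b"]]
(* expected: False *)

(* with (?s), . also matches newline *)
StringMatchQ["a\nb", RegularExpression["(?s)a.b"]]
(* expected: True *)
```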

\\., \\[, etc. represent literal characters ., [, etc.
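
A small sketch of the difference between an escaped and an unescaped period may help here. The input string is an assumption chosen for illustration.

```wolfram
(* \\. matches only a literal period *)
StringCases["3.14 and 2x71", RegularExpression["\\d\\.\\d"]]
(* expected: {"3.1"} *)

(* an unescaped . matches any character except newline, so "2x7" is also found *)
StringCases["3.14 and 2x71", RegularExpression["\\d.\\d"]]
(* expected: {"3.1", "2x7"} *)
```
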
Analogs of named Wolfram Language patterns such as x:expr can be set up in regular expression strings using (regex).
Within a regular expression string, \\gn represents the substring matched by the nth parenthesized regular expression object (regex). The shorter \\n is often equivalent to \\gn.
For the purpose of functions such as StringReplace and StringCases, any $n appearing in the right‐hand side of a rule RegularExpression["regex"]->rhs is taken to correspond to the substring matched by the nth parenthesized regular expression object in regex. $0 represents the whole matched string.
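
The use of $n and $0 on the right-hand side of a rule can be sketched with StringReplace as follows. The date and the replacement formats are illustrative assumptions only.

```wolfram
(* $1, $2, $3 refer to the three parenthesized groups, in order *)
StringReplace["2024-01-15", RegularExpression["(\\d+)-(\\d+)-(\\d+)"] -> "$3/$2/$1"]
(* expected: "15/01/2024" *)

(* $0 stands for the whole matched substring *)
StringReplace["a1 b22", RegularExpression["\\d+"] -> "<$0>"]
(* expected: "a<1> b<22>" *)
```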

Examples

Basic Examples (2)

Find words involving the characters a, b, c, d, e:

Wolfram Language code: StringCases["adefgh12c34", RegularExpression["[a-e]+"]]

Equivalent form using string patterns:

Wolfram Language code: StringCases["adefgh12c34", CharacterRange["a", "e"]..]

Decide whether the string consists of words and whitespace:

Wolfram Language code:

StringMatchQ["abcd
efgh
1234", RegularExpression["(\\w|\\s)*"]]

Equivalent form using string patterns:

Wolfram Language code:

StringMatchQ["abcd
efgh
1234", (WordCharacter | Whitespace)...]

Scope (22)

Basic Constructs (17)

Extract any character except newline:

Wolfram Language code: StringCases["a23b42c63d80, 123", RegularExpression["."]]

Wolfram Language code:

StringCases["a23b42c63d80, 123", Except["
", _] ]

Either of the characters "a" and "b":

Wolfram Language code: StringCases["a13b12c1da32efg", RegularExpression["[ab]"]]

Wolfram Language code: StringCases["a13b12c1da32efg", "a" | "b"]

Any character between "a" and "e", including "a" and "e":

Wolfram Language code: StringCases["adefgh12c34", RegularExpression["[a-e]"]]

Wolfram Language code: StringCases["adefgh12c34", CharacterRange["a", "e"]]

Any character except "a" and "1":

Wolfram Language code: StringCases["a13b12c17a32", RegularExpression["[^a1]"]]

Wolfram Language code: StringCases["a13b12c17a32", Except["a" | "1", _]]

Any digit repeated one or more times:

Wolfram Language code: StringCases["a23b4222c63333d80", RegularExpression["\\d+"]]

Wolfram Language code: StringCases["a23b4222c63333d80", NumberString]

The character "a" repeated 2 or 3 times:

Wolfram Language code: StringCases["aabc1aaaagh2ade", RegularExpression["a{2,3}"]]

Wolfram Language code: StringCases["aabc1aaaagh2ade", w : (x_ ...) /; (2 ≤ StringLength[w] ≤ 3) ]

Any digit:

Wolfram Language code: StringCases["a2322c63333d80", RegularExpression["\\d"]]

Wolfram Language code: StringCases["a2322c63333d80", DigitCharacter]

Nondigit characters:

Wolfram Language code: StringCases["a2322c63333d80", RegularExpression["\\D"]]

Wolfram Language code: StringCases["a2322c63333d80", Except[DigitCharacter]]

Space, newline, tab, or other whitespace character:

Wolfram Language code:

StringCases["13
a22	 bbb", RegularExpression["\\s"]]//InputForm

Wolfram Language code:

StringCases["13
a22	 bbb", WhitespaceCharacter]//InputForm

Non-whitespace characters:

Wolfram Language code:

StringCases["13
a22	 bbb", RegularExpression["\\S"]]

Wolfram Language code:

StringCases["13
a22	 bbb", Except[WhitespaceCharacter]]

Word characters:

Wolfram Language code: StringCases["a23b42c63,d80", RegularExpression["\\w"]]

Wolfram Language code: StringCases["a23b42c63,d80", WordCharacter]

Nonword characters:

Wolfram Language code: StringCases["a23b:42c63;d80", RegularExpression["\\W"]]

Wolfram Language code: StringCases["a23b:42c63;d80", Except[WordCharacter]]

Find all uppercase letters:

Wolfram Language code: StringCases["AaBBccDDeefG", RegularExpression["[[:upper:]]+"]]

Wolfram Language code: StringCases["AaBBccDDeefG", CharacterRange["A", "Z"]..]

Split a string at the beginning of each line:

Wolfram Language code:

StringSplit["line1
line2
line3", RegularExpression["(?m)^"]]//InputForm

Wolfram Language code:

StringSplit["line1
line2
line3", StartOfLine]//InputForm

Split a string at the end of each line:

Wolfram Language code:

StringSplit["line1
line2
line3", RegularExpression["(?m)$"]]//InputForm

Wolfram Language code:

StringSplit["line1
line2
line3", EndOfLine]//InputForm

Insert a character at the boundary of each word:

Wolfram Language code: StringReplace["123 45 6 789", RegularExpression["\\b"] :> "X"]

Wolfram Language code: StringReplace["123 45 6 789", WordBoundary :> "X"]

Split a string at every character except at the boundary of a word:

Wolfram Language code: StringSplit["12X X5X X89", RegularExpression["\\B"]]

Wolfram Language code: StringSplit["12X X5X X89", Except[WordBoundary]]

Compound Constructs (5)

StringExpression can contain RegularExpression objects:

Wolfram Language code: StringCases["a13b12c17a32", "a" ~~ x : RegularExpression["\\d+"] -> x]

Wolfram Language code: StringCases["a13b12c17a32", "a" ~~ x : DigitCharacter.. -> x]

Conditional patterns:

Wolfram Language code: StringCases["a23b42c63d80, 123", x : RegularExpression["\\d+"] /; Mod[ToExpression[x], 2] == 0]

Wolfram Language code: StringCases["a23b42c63d80, 123", x : DigitCharacter.. /; Mod[ToExpression[x], 2] == 0]

Use alternatives to match one or more line breaks:

Wolfram Language code:

StringMatchQ["abcd
efgh
1234", RegularExpression["(.*|\\s*)*"]]

Wolfram Language code:

StringMatchQ["abcd
efgh
1234", (WordCharacter... | Whitespace)...]

Non-greedy matches are done by appending a question mark "?" to the quantifiers:

Wolfram Language code: StringCases["abc1agh2cde", RegularExpression["a.+?\\d"]]

Wolfram Language code: StringCases["abc1agh2cde", Shortest["a" ~~ __ ~~ DigitCharacter]]

Here $1 refers to the character matched by (.):

Wolfram Language code: StringCases["aaabcccabbaacba", RegularExpression["(.)\\g1"] -> "$1"]

Wolfram Language code: StringCases["aaabcccabbaacba", x_ ~~ x_ -> x]

Numbered subpatterns:

Wolfram Language code: StringCases["a1b6a3b3a3c3a8b8", RegularExpression["(a(\\d))b\\g2"] -> {"$0", "$1", "$2"}]

Wolfram Language code: StringCases["a1b6a3b3a3c3a8b8", g0 : ((g1 : ("a" ~~ g2 : DigitCharacter)) ~~ "b" ~~ g2_) :> {g0, g1, g2}]

Properties & Relations (3)

Use StringMatchQ to determine string pattern matches:

Wolfram Language code: StringMatchQ["12345", RegularExpression["\\d+"]]

Use StringCases to find matching substrings:

Wolfram Language code: StringCases["aaaa bbbb 1234", RegularExpression["[a-z]+"]]

Use StringSplit to split a string into substrings using a delimiter pattern:

Wolfram Language code: StringSplit["1.23, 4.56 7.89", RegularExpression["(\\s|,)+"]]
