About

regex

Tag Info

Regular expressions provide a declarative language to match patterns within strings. They are commonly used for string validation, parsing, and transformation. Specify the language (PHP, Python, etc.) or tool (grep, VS Code, Google Analytics, etc.) that you are using. Do not post questions asking for an explanation of what a symbol means or what a particular regular expression will match.

IMPORTANT NOTE: Requests to explain a regular expression pattern or construct will be closed as duplicates of the canonical post What does this regex mean which contains a lot of details on regular expression constructs. The post also contains links to many popular online regular expression testers (where the meanings of regex constructs can be found). One such tool is Regex101.

Regular expressions are a powerful formalism for pattern matching in strings. They are available in a variety of dialects (also known as flavors) in a number of programming languages and text-processing tools, as well as many specialized applications. The term "Regular expression" is typically abbreviated as "RegEx" or "regex".

Before asking a question here, please take the time to review the following brief guidelines.

How To Ask

Specify what tool or language you are using

Regexes are everywhere. Different languages like Python, PHP and Java all use regexes, but with minor differences. Many different tools use regexes as well, from grep to most text editors to Google Analytics, also with their own differences. Specify the tool or language in your question, and ideally add the corresponding tag to your question. (Perhaps see also Why are there so many different regular expression dialects?)

Be clear about what you need.

Keep in mind that regex dialects are different; the lowest common denominator will usually be quite different from what is possible and recommended for a tool with a modern, souped-up regex engine. (See previous section.)

Also, are you looking for a regular expression for input validation (which needs to be rather strict), or do you need one for information extraction (which can be somewhat relaxed)?
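
As a rough sketch of that distinction, here it is in Python (chosen only for illustration; the date pattern and the sample strings are made up). Validation anchors the pattern to the whole input, while extraction searches for it inside a larger text.

```python
import re

# Hypothetical pattern for an ISO-style date. It checks the shape only,
# not whether the date exists on the calendar.
DATE = r"\d{4}-\d{2}-\d{2}"

def is_valid_date(text):
    """Validation: the whole input must match, nothing before or after."""
    return re.fullmatch(DATE, text) is not None

def extract_dates(text):
    """Extraction: pick out every date-like substring and ignore the rest."""
    return re.findall(DATE, text)

print(is_valid_date("2024-01-31"))       # True
print(is_valid_date("due 2024-01-31"))   # False
print(extract_dates("from 2024-01-31 to 2024-02-15"))
# ['2024-01-31', '2024-02-15']
```

The same pattern serves both purposes here; what changes is how strictly it is applied.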

If your question relates to regular expressions in the strict computer science/automata theory sense, please state this explicitly.

For most other questions, you should always include sample input, expected output, and an outline of what you have tried, and where you are stuck. Often, an example of what you do not want to match is also very helpful to include.

Show us what you tried.

A link to one of the many online regex testing tools (see link section) with your attempt and some representative data can do wonders.

However, keep in mind, again, that there are many different regular expression dialects. (See earlier bullet points.) A result from an online tool for JavaScript or PHP does not necessarily work in Python or Java or sed or Awk or ... what have you.

Even if you cannot post your problem online, showing us your best attempt helps us focus on what you need help with.

Search for duplicates.

Before posting, check if your issue has already been solved by somebody else asking something similar. See also the following section.

Avoid Common Problems and Pitfalls

There are some common recurring beginner topics.

Do not assume that the tool you are using supports precisely the syntax of another tool.

While modern Perl / Ruby / Python / PHP / Java regular expression support is widespread, you cannot assume that it is universal. In particular, many older tools (Awk, sed, grep, lex, etc.), as well as some newer ones (JavaScript, many text editors), use different dialects, some of which do not necessarily support e.g. non-capturing parentheses (?:...), non-greedy quantifiers *?, backreferences (\1, \2, etc.), common character class abbreviations (\t, \d, \b or \</\> word boundaries, POSIX character classes [[:class:]]), arbitrary repetition {m,n}, lookarounds (?=...), (?<=...), (?!...), etc.
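
A small illustration of such a dialect difference, using Python's re module as one example engine (the sample text is made up). The \< and \> word boundaries known from GNU grep and sed are not boundary operators in Python, which reads them as escaped literal characters and spells the boundary \b instead.

```python
import re

text = "a dog barked"

# \< and \> are word-boundary operators in GNU grep/sed, but Python's re
# treats them as escaped literal "<" and ">" characters.
gnu_style = re.search(r"\<dog\>", text)

# Python spells the word boundary \b.
python_style = re.search(r"\bdog\b", text)

print(gnu_style)              # None
print(python_style.group())   # dog
```

The first pattern is not rejected; it silently means something else, which is why a pattern copied from another tool can fail without any error message.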

If your question is not specific to any particular implementation, try the language-agnostic tag. This will generally imply a fairly minimal set of operators, corresponding to the ones specified in the common mathematical definition of regular languages.

Understand the difference between "glob" expressions and true regular expressions.

Glob patterns are a less potent pattern matching language, which is commonly used for file name wildcards. In glob, * means "anything", while a lone * in a regular expression is, in fact, a syntax error in some dialects (though many engines will silently ignore it, rather than issue a warning; and others still will see it as a literal *).

For the record, the regex way to say (as much as possible of) "anything" is .* where the "any single character (except newline, usually)" . metacharacter is repeated zero or more times (*). But see below about how "any character" and greediness are sometimes problematic.

See also What are the differences between glob-style patterns and regular expressions?
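
A minimal side-by-side sketch in Python, using the standard fnmatch module for the glob and re for the regex (the file name is made up). It also shows that Python's engine is one of those that reject a lone * outright.

```python
import fnmatch
import re

name = "notes.txt"

# Glob: * means "any run of characters", and the dot is literal.
glob_hit = fnmatch.fnmatchcase(name, "*.txt")

# Regex: the same idea needs .* and an escaped dot.
regex_hit = re.fullmatch(r".*\.txt", name) is not None

# A lone * has nothing to repeat, so Python's re refuses to compile it.
try:
    re.compile("*")
    lone_star_ok = True
except re.error:
    lone_star_ok = False

print(glob_hit, regex_hit, lone_star_ok)   # True True False
```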

Backslashes can be gnarly

In many languages like C, JavaScript, Python, etc., the host language uses strings to represent regular expressions. This leads to a situation where any literal backslash that should make it to the regular expression engine needs to be doubled, because that's how you represent a backslash in a string.

(Python, for its part, offers "raw strings" r"..." in which a backslash is just a literal backslash.)
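
To make the doubling concrete, here is a short Python sketch (the pattern and sample text are made up). Both spellings produce the same pattern string; the raw string is simply easier to read.

```python
import re

# In an ordinary string literal, every backslash meant for the regex
# engine has to be doubled.
doubled = "\\d+\\.\\d+"

# In a raw string, a backslash is just a backslash.
raw = r"\d+\.\d+"

print(doubled == raw)   # True
print(re.findall(raw, "pi is 3.14, e is 2.718"))   # ['3.14', '2.718']
```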

See also Why do strings given to the RegExp constructor need to be double escaped? (This particular question is about JavaScript, but the answers readily apply to other languages with the same mechanism.)

Conversely, some other languages like sed, Awk, Perl, and Ruby have a native syntax for regular expressions. Commonly, this turns the slash into a syntactic delimiter, so that you then need to backslash-escape any literal slash. Many of these languages also let you use a different delimiter; in sed, for example, you can say s%value/with/slashes%another/one%. Meanwhile, many beginners needlessly backslash-escape literal slashes where they don't have to.

Specifying a single repetition is unnecessary.

Using {1} as a single-repetition quantifier is harmless but never useful. It is basically an indication of inexperience and/or confusion.

h{1}t{1}t{1}p{1} matches the same string as the simpler expression http (or ht{2}p for that matter) but as you can see, the redundant {1} repetitions only make it harder to read.
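
A quick check of that equivalence in Python (the sample strings are made up):

```python
import re

samples = ["http", "https://example.com", "htp", "ftp"]

verbose = re.compile(r"h{1}t{1}t{1}p{1}")
simple = re.compile(r"http")
doubled_t = re.compile(r"ht{2}p")

# All three patterns accept and reject exactly the same samples.
agree = all(
    bool(verbose.search(s)) == bool(simple.search(s)) == bool(doubled_t.search(s))
    for s in samples
)
matched = [s for s in samples if simple.search(s)]

print(agree)     # True
print(matched)   # ['http', 'https://example.com']
```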

Square brackets are commonly misunderstood or misused.

Beginners often attempt to use square brackets for everything, including grouping. While [Jun][Jul] may look like a regex for matching months, it actually matches JJ, Ju, Jl, uJ, uu, ul, nJ, nu, or nl; not Jun or Jul. [Jun|Jul] is a wasteful way to write the functionally identical [|Junl] — it matches any one character from the set comprising |, J, u, l, and n.

For the record, [abc] defines a character class which matches a single character which can be a or b or c. The proper way to express alternation is (Jun|Jul|Aug) in many dialects (though BRE and related dialects will need backslashes; \(Jun\|Jul\|Aug\) for traditional grep et al.) or, somewhat more parsimoniously, (Ju[nl]|Aug). The round parentheses (as opposed to the square brackets of character classes) perform grouping, and the | operator indicates matching alternatives.

See also What is the difference between square brackets and parentheses in a regex?
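
The difference is easy to see when the patterns are run against a few candidate strings. This is a sketch in Python; the candidate list is made up, and fullmatch is used so that each string has to match in its entirety.

```python
import re

candidates = ["Jun", "Jul", "Aug", "Ju", "nl"]

# Two character classes: exactly two characters, one from each set.
wrong = re.compile(r"[Jun][Jul]")
# Grouping with alternation: one of the listed month names.
right = re.compile(r"(Jun|Jul|Aug)")
# Same matches as the previous pattern, written more compactly.
compact = re.compile(r"(Ju[nl]|Aug)")

wrong_hits = [c for c in candidates if wrong.fullmatch(c)]
right_hits = [c for c in candidates if right.fullmatch(c)]
compact_hits = [c for c in candidates if compact.fullmatch(c)]

print(wrong_hits)     # ['Ju', 'nl']
print(right_hits)     # ['Jun', 'Jul', 'Aug']
print(compact_hits)   # ['Jun', 'Jul', 'Aug']
```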

Negation is tricky.

Related to the previous, beginners will use negated character classes to attempt to restrict what can be matched. For example, to match turn but not turned, turn[^ed] does not do what you want: it matches turn followed by any single character which is not e or d. So it will not match turner, for example, and it does not match just turn followed by nothing, because it requires one more character which is not e or d.

In fact, traditional regex syntax does not allow for this to be expressed easily. With ERE, you could say turn($|[^e]|e$|e[^d]) to say that turn can be followed by nothing, or by a character which is not e, or by e if it is not in turn followed by d. Modern regular expression dialects have an extension called lookarounds which allows you to say turn(?!ed), but make sure your tool supports this syntax before plunging ahead.

Notice also how the character class negation operator is distinct from the beginning of line anchor (^[abc] matches a, b, or c at beginning of the line, whereas [^abc] matches a single character which is not a, b, or c).
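
The three variants discussed in this item can be compared directly. The sketch below uses Python, whose re module happens to accept both the ERE-style alternation and the lookahead; the word list is made up.

```python
import re

words = ["turn", "turned", "turner", "turns", "turnip"]

# Negated class: demands one more character, which is neither e nor d.
negated_class = re.compile(r"turn[^ed]")
# ERE-style spelling that avoids lookarounds.
ere_style = re.compile(r"turn($|[^e]|e$|e[^d])")
# Negative lookahead: turn, as long as ed does not follow.
lookahead = re.compile(r"turn(?!ed)")

class_hits = [w for w in words if negated_class.search(w)]
ere_hits = [w for w in words if ere_style.search(w)]
lookahead_hits = [w for w in words if lookahead.search(w)]

print(class_hits)       # ['turns', 'turnip']
print(ere_hits)         # ['turn', 'turner', 'turns', 'turnip']
print(lookahead_hits)   # ['turn', 'turner', 'turns', 'turnip']
```

Only the negated class wrongly rejects turn and turner; the other two reject turned alone.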

See also the next bullet point.

If there is a way to match, the engine will find it.

A common beginner's mistake is to supply useless optional leading or trailing elements. The trailing s? in dogs? does nothing to prevent a match on doggone or endogenous. If you want to prevent those, you will need to elaborate, perhaps with something like dogs?\> (provided your dialect supports the final word boundary operator and provided that's what you mean).

As it is, the regular expression dogs? will match exactly the same strings as just dog (though if your application captures the match, only the former will capture a trailing s if there is one).
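
A short Python sketch of this behaviour (the word list is made up). Python does not support \>, so \b stands in for the trailing word boundary here; that substitution is an assumption of this example, not part of the original pattern.

```python
import re

words = ["dog", "dogs", "doggone", "endogenous"]

loose = re.compile(r"dogs?")
# \b is Python's word boundary; tools like GNU grep would write \> here.
bounded = re.compile(r"dogs?\b")

loose_hits = [w for w in words if loose.search(w)]
bounded_hits = [w for w in words if bounded.search(w)]

print(loose_hits)     # ['dog', 'dogs', 'doggone', 'endogenous']
print(bounded_hits)   # ['dog', 'dogs']

# The optional s only changes what is captured, not whether there is a match.
print(re.search(r"dogs?", "dogs").group())   # dogs
print(re.search(r"dog", "dogs").group())     # dog
```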

Matches are greedy.

The regex a.*b will match the entire string "abbbbbb" because * is greedy: it matches as much as possible while still letting the overall match succeed. Say a[^ab]*b if you want to stop at the first b, or use non-greedy matching (a.*?b) if your dialect supports it.
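
The same string run through a greedy, a non-greedy, and a negated-class pattern, sketched in Python (which supports the *? quantifier):

```python
import re

text = "abbbbbb"

# Greedy: .* grabs as much as it can and gives back only what it must.
greedy = re.search(r"a.*b", text).group()
# Non-greedy: .*? takes as little as possible.
lazy = re.search(r"a.*?b", text).group()
# Negated class: cannot run past an a or b, so it stops at the first b.
negated = re.search(r"a[^ab]*b", text).group()

print(greedy)    # abbbbbb
print(lazy)      # ab
print(negated)   # ab
```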

Watch what you capture

If you use grouping parentheses, the parentheses define what is captured into a backreference. If you edit in parentheses for grouping purposes, make sure you are not renumbering your backreferences.

Also, in particular, watch out for (abc){2,3} which only captures the last occurrence of abc in the matched string. If you want the repetition to be part of the capture, it needs to be inside the parentheses, like this: ((abc){2,3})
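
A Python sketch of what each group ends up holding (the sample string is made up). The last variant uses a non-capturing group (?:...), available in Python and many other modern dialects, so that grouping for repetition does not add a numbered backreference.

```python
import re

text = "abcabcabc"

# The group sits inside the repetition: only the last abc is kept.
inner_only = re.search(r"(abc){2,3}", text)
print(inner_only.group(0))   # abcabcabc
print(inner_only.group(1))   # abc

# An outer group around the repetition captures the whole run.
whole = re.search(r"((abc){2,3})", text)
print(whole.group(1))        # abcabcabc
print(whole.group(2))        # abc

# Non-capturing inner group: the outer group stays number 1 and is the only one.
tidy = re.search(r"((?:abc){2,3})", text)
print(tidy.groups())         # ('abcabcabc',)
```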

Don't use regex for everything!

In particular, using (typically line-oriented) traditional regex tools to handle structured formats like HTML, XML, JSON, configuration files with block structure (Apache, nginx, many name servers, etc.) is likely to fail, or to produce incorrect results in numerous corner cases.

Asking for HTML regexes tends to be met with negative reactions. The reasoning extends to all structured formats. If there is a parser for it, use that instead.
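
As one possible illustration of the parser route, the sketch below collects link targets with html.parser from Python's standard library. The LinkCollector class and the sample markup are made up for this example; other languages and dedicated HTML libraries offer comparable parsers.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper that collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over lower-cased tag and attribute names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

# Attributes in any order, a line break inside the tag, mixed case and
# mixed quoting: all awkward for a line-oriented regex.
html = '''<p>See <a class="x"
   href="https://example.com/a">one</a> and <A HREF='/b'>two</A>.</p>'''

collector = LinkCollector()
collector.feed(html)
print(collector.links)   # ['https://example.com/a', '/b']
```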

Further Reading

Basic concept of how RegularExpression parsing works
Wikipedia entry on regular expressions
Regular-Expressions.info (informative website for learning regular expressions)
RexEgg (a regular expressions tutorial that goes deep into advanced features)
RegexOne ("learn regular expressions with simple, interactive examples")
Learn Regex The Hard Way (Online book, new version is in planning phase)
From realpython.com:
- Regular Expressions: Regexes in Python (Part 1) - Long Read
- Regular Expressions: Regexes in Python (Part 2) - Long Read
- Regular Expressions and Building Regexes in Python - Video Course
DataCamp: Python Regular Expression Tutorial - Long Read

Books

Documentation for JavaScript

Online sandboxes (for testing and publishing regexes online)

RegexPlanet (supports a variety of flavors to choose from)
Regexpal (ECMAScript flavor, as implemented by JavaScript)
Regexhero (.NET flavor)
RegexStorm.net (.NET flavor with link sharing capability)
RegExr v2.1 (in JavaScript)
RegExr v1.0 (ECMAScript flavor, as implemented by Adobe Flash)
Rubular (Ruby flavor)
myregexp.com (Java-applet with source code)
regexe.com (German; probably Java flavor)
regex101 (in ECMAScript (JavaScript), Python, PHP (PCRE 16-bit), Golang, Java, generates explanation of pattern)
regexper.com (generates graphical representation for ECMAScript flavor)
debuggex (generates graphical representation and shows processing of pattern – JavaScript, Python, and PCRE-compatible)
pyregex.com (Web validator for Python regular expressions)
regviz.org (Visual debugging of regular expressions for JavaScript)
Ultrapico Expresso (a standalone tool for testing .NET regular expressions)
Pythex (Quick way to test your Python regular expressions)

Online Regex generator (for building Regular Expressions via simplified input)

Regex Numeric Range Generator (enter a min and max and receive the Regex for it)

Regex Uses:

Regular expressions are useful in a wide variety of text processing tasks, and more generally string processing, where the data need not be textual. Common applications include data validation, data scraping (especially web scraping), data wrangling, simple parsing, the production of syntax highlighting systems, and many other tasks.

While regular expressions would be useful on Internet search engines, processing them across the entire database could consume excessive computer resources depending on the complexity and design of the regex. Although in many cases system administrators can run regex-based queries internally, most search engines do not offer regex support to the public. Notable exceptions are searchcode and, previously, Google Code Search, which was shut down in 2012.
Google also offers re2, a fast, safe, thread-friendly C++ alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python: it does not backtrack and guarantees linear runtime growth with input size.

Synonyms

regexp regular-expression regular-expressions regularexpression perl-regex

Stats

created	15 years, 11 months ago
viewed	27178 times
active	20 days ago
editors	84
