What Are Regular Expressions?
Regular expressions (regex or regexp) are sequences of characters that define search patterns. They are one of the most powerful tools in a developer’s toolkit — used for text matching, validation, extraction, and replacement across virtually every programming language and text editor.
While regex has a reputation for being cryptic, understanding the basics is surprisingly straightforward. This guide will take you from zero to confident, covering the most useful patterns with practical examples you can apply immediately.
Basic Building Blocks
Literal Characters
The simplest regex is just a literal string. The pattern hello matches the text “hello” wherever it appears. Most characters match themselves; the complexity comes from metacharacters, which carry special meanings.
Metacharacters
These characters have special meanings in regex:
- . matches any single character (except newline)
- ^ matches the start of a string
- $ matches the end of a string
- * matches 0 or more of the preceding element
- + matches 1 or more of the preceding element
- ? matches 0 or 1 of the preceding element (makes it optional)
- | is alternation (OR): cat|dog matches “cat” or “dog”
- \ escapes a metacharacter to match it literally
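To see these metacharacters in action, here is a minimal sketch using Python's standard re module. The sample strings and variable names are made up for illustration.

```python
import re

text = "cat cot cut dog"

# "." matches any single character except a newline.
dot_matches = re.findall(r"c.t", text)

# "|" is alternation: match either side.
alt_matches = re.findall(r"cat|dog", text)

# "?" makes the preceding element optional.
optional = re.findall(r"colou?r", "color colour")

# "^" anchors to the start of the string, "$" to the end.
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None

# "\" escapes a metacharacter so it matches literally.
literal_dot = re.findall(r"3\.14", "3.14 3x14")

print(dot_matches, alt_matches, optional, starts_with_cat, ends_with_cat, literal_dot)
```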
Character Classes
Square brackets define a set of characters to match:
- [abc] matches “a”, “b”, or “c”
- [a-z] matches any lowercase letter
- [A-Z0-9] matches any uppercase letter or digit
- [^abc] matches any character except a, b, or c (negation)
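A quick way to check what a character class accepts is to list every single character it matches. The sketch below does this in Python. The input strings are illustrative only.

```python
import re

# [a-z] picks out lowercase letters only.
lowercase = re.findall(r"[a-z]", "aBc")

# [A-Z0-9] accepts an uppercase letter or a digit.
upper_or_digit = re.findall(r"[A-Z0-9]", "Room B12, floor 3")

# A leading ^ inside the brackets negates the set.
not_abc = re.findall(r"[^abc]", "abcdef")

print(lowercase, upper_or_digit, not_abc)
```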
Shorthand Classes
Common character classes have shortcuts:
- \d matches any digit (equivalent to [0-9])
- \w matches any word character (letters, digits, underscore: [a-zA-Z0-9_])
- \s matches any whitespace character (space, tab, newline)
- \D, \W, \S are the negated versions (non-digit, non-word, non-whitespace)
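The shorthand classes are easiest to understand on a concrete string. This is a small Python sketch with an invented sample line. Note that in Python 3, \d and \w on ordinary strings also match non-ASCII digits and letters unless the re.ASCII flag is passed.

```python
import re

text = "Order 66 shipped_on 2024-01-15"

# \d+ : runs of digits
digits = re.findall(r"\d+", text)

# \w+ : runs of word characters (the hyphen is not a word character)
words = re.findall(r"\w+", text)

# \s+ : split on runs of whitespace
parts = re.split(r"\s+", text)

# \D : remove everything that is not a digit
only_digits = re.sub(r"\D", "", text)

print(digits, words, parts, only_digits)
```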
Quantifiers: How Many to Match
Quantifiers specify how many times a pattern should repeat:
- {3} matches exactly 3 times
- {2,5} matches between 2 and 5 times
- {3,} matches 3 or more times
- * matches 0 or more times (same as {0,})
- + matches 1 or more times (same as {1,})
- ? matches 0 or 1 times (same as {0,1})
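The counted quantifiers can be checked against strings of different lengths. The helper name matches_fully below is hypothetical, chosen for this sketch. It relies on re.fullmatch so that the whole string must fit the pattern.

```python
import re

def matches_fully(pattern: str, s: str) -> bool:
    """Return True only if the entire string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# {3}: exactly three repetitions
exactly_three = [matches_fully(r"a{3}", s) for s in ("aa", "aaa", "aaaa")]

# {2,5}: between two and five repetitions
two_to_five = [matches_fully(r"a{2,5}", s) for s in ("a", "aa", "aaaaa", "aaaaaa")]

# {3,}: three or more repetitions
three_or_more = [matches_fully(r"a{3,}", s) for s in ("aa", "aaa", "aaaaaaa")]

print(exactly_three, two_to_five, three_or_more)
```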
Groups and Capturing
Parentheses () create groups that serve two purposes: grouping for quantifiers and capturing matched text for later use.
// Pattern: (https?://)(\w+\.)+\w+
// Matches URLs like https://www.example.com
// Group 1 captures: "https://"
// Group 2 captures: "example." (last match of the repeating group)
Use (?:...) for non-capturing groups when you need grouping but do not need to capture the contents.
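The same URL pattern can be tried in Python to see what each group holds, and how a non-capturing group changes the result. This is a sketch. The sample sentence is invented.

```python
import re

text = "Visit https://www.example.com today"

# Two capturing groups: the protocol and the repeated "label." part.
capturing = re.compile(r"(https?://)(\w+\.)+\w+")
m = capturing.search(text)
full_match = m.group(0)
protocol = m.group(1)
last_label = m.group(2)  # a repeated group keeps only its last iteration

# Same pattern, but the repeated part is grouped without capturing.
non_capturing = re.compile(r"(https?://)(?:\w+\.)+\w+")
groups_without_capture = non_capturing.search(text).groups()

print(full_match, protocol, last_label, groups_without_capture)
```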
Practical Examples
Validate an Email Address
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This matches standard email formats. Note: fully validating emails per RFC 5322 requires a much more complex pattern — for production use, it is better to send a confirmation email.
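As a rough illustration, the pattern can be wrapped in a small Python helper. The function name looks_like_email is hypothetical, and the check only confirms the general shape of an address, not that it exists.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """Return True if value has the rough shape of an email address."""
    # fullmatch requires the whole string to match, which also rejects a
    # trailing newline that "$" on its own would tolerate in Python.
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("first.last+tag@mail.example.org"))
print(looks_like_email("user@localhost"))
```

The second call fails the check because the pattern requires a dot followed by at least two letters after the @ sign.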
Extract Phone Numbers
\+?\d{1,3}[\s-]?\(?\d{1,4}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}
Matches international formats like +1 (555) 123-4567 and +49 30 12345678.
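To extract numbers from running text, the pattern can be passed to re.findall in Python. This is a sketch with an invented sentence. Because the pattern has no capturing groups (the parentheses are escaped), findall returns the full matches.

```python
import re

PHONE_RE = re.compile(r"\+?\d{1,3}[\s-]?\(?\d{1,4}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}")

text = "Call +1 (555) 123-4567 or +49 30 12345678."
numbers = PHONE_RE.findall(text)

print(numbers)
```

The pattern is permissive by design, so it may also pick up other long digit sequences. Treat the results as candidates rather than verified phone numbers.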
Find URLs in Text
https?://[^\s<>"']+
A simple pattern that captures most URLs by matching the protocol followed by any run of characters that are not whitespace, angle brackets, or quotes.
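Here is how the pattern might be used in Python to pull URLs out of mixed text and markup. The sample strings are invented for this sketch.

```python
import re

# Triple quotes let the pattern contain both " and ' without escaping.
URL_RE = re.compile(r"""https?://[^\s<>"']+""")

text = 'See https://example.com/docs?page=2 and <a href="http://test.org/a">link</a>.'
urls = URL_RE.findall(text)

# Caveat: punctuation that directly follows a URL is captured too.
with_trailing_dot = URL_RE.findall("Go to https://example.com.")

print(urls, with_trailing_dot)
```

If trailing punctuation matters for your use case, one option is to strip it from each match afterwards.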
Match an IPv4 Address
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
Matches patterns like 192.168.1.1. For strict validation (ensuring each octet is 0-255), you would need a more detailed pattern.
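One possible stricter pattern spells out the allowed range of each octet. The sketch below is one common approach, not the only one. The names OCTET, STRICT_IPV4_RE, and is_ipv4 are hypothetical, and this version also rejects leading zeros such as 01.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Four octets separated by dots; re.ASCII keeps \d limited to 0-9.
STRICT_IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}", re.ASCII)

def is_ipv4(value: str) -> bool:
    """Return True if value is a dotted-quad address with octets in 0-255."""
    return STRICT_IPV4_RE.fullmatch(value) is not None

print(is_ipv4("192.168.1.1"), is_ipv4("256.1.1.1"))
```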
Extract HTML Tags
<(\w+)[^>]*>(.*?)</\1>
Matches opening and closing HTML tags with content. The \1 is a backreference to the first capture group, ensuring the closing tag matches the opening tag. Important caveat: regex is generally not suitable for parsing nested HTML — use a proper HTML parser for complex documents.
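The backreference and the nesting caveat both show up clearly in a short Python experiment. The HTML snippets are invented for this sketch.

```python
import re

TAG_RE = re.compile(r"<(\w+)[^>]*>(.*?)</\1>")

# The mismatched <i>...</em> pair is skipped because \1 must equal "i".
html = '<p class="intro">Hello</p><b>bold</b><i>mismatch</em>'
pairs = TAG_RE.findall(html)

# Nested tags: the lazy .*? stops at the first closing tag it finds,
# so the captured content is cut off in the middle of the inner element.
nested = TAG_RE.findall("<div><div>x</div></div>")

print(pairs, nested)
```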
Common Pitfalls
- Greedy vs. lazy matching: By default, quantifiers are greedy (match as much as possible). Add ? after a quantifier to make it lazy: .*? matches as little as possible.
- Escaping special characters: To match a literal dot, use \. rather than a bare dot, which matches any character.
- Anchoring: Without ^ and $, your pattern matches anywhere in the string. Use anchors when validating entire strings.
- Catastrophic backtracking: Nested quantifiers like (a+)+ can cause exponential processing time. Always test patterns with edge cases.
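The first three pitfalls are easy to reproduce. The Python sketch below uses made-up inputs. The catastrophic case is deliberately not executed, since it can take a very long time on longer inputs.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last </b>. Lazy: .*? stops at the first one.
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# An unescaped dot accepts any character in that position.
unescaped_dot = re.fullmatch(r"3.14", "3x14") is not None
escaped_dot = re.fullmatch(r"3\.14", "3x14") is not None

# Without anchors, four digits anywhere in the string count as a match.
unanchored = re.search(r"\d{4}", "abc12345") is not None
anchored = re.search(r"^\d{4}$", "abc12345") is not None

# (a+)+ matches the same strings as plain a+, which avoids the nested
# quantifier and the exponential backtracking that comes with it.
safe = re.fullmatch(r"a+", "aaaa") is not None

print(greedy, lazy, unescaped_dot, escaped_dot, unanchored, anchored, safe)
```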
Regex in Different Languages
While the core syntax is similar, regex implementations vary between languages:
- JavaScript: /pattern/flags literal syntax or new RegExp(). Supports g (global), i (case-insensitive), m (multiline), s (dotall), and u (unicode) flags.
- Python: the re module with re.match(), re.search(), re.findall(), and re.sub().
- PHP: PCRE functions such as preg_match(), preg_replace(), and preg_match_all().
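As a brief illustration of the Python functions listed above, the sketch below applies each one to the same invented log line.

```python
import re

text = "2024-01-15 error; 2024-02-20 ok"

# re.match only tries to match at the very beginning of the string.
year_at_start = re.match(r"\d{4}", text).group()
match_error = re.match(r"error", text)  # None: "error" is not at the start

# re.search scans the string for the first match anywhere.
first_error = re.search(r"error", text).group()

# re.findall returns every non-overlapping match as a list.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# re.sub replaces every match with the given replacement.
masked = re.sub(r"\d", "#", text)

print(year_at_start, match_error, first_error, dates, masked)
```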
Practice Makes Perfect
The best way to learn regex is by experimenting. Our Regex Tester lets you write patterns, test them against sample text in real time, and see matches highlighted instantly — all in your browser with no setup required.