Regular Expressions for Beginners: A Practical Guide with Examples

What Are Regular Expressions?

Regular expressions (regex or regexp) are sequences of characters that define search patterns. They are one of the most powerful tools in a developer’s toolkit — used for text matching, validation, extraction, and replacement across virtually every programming language and text editor.

While regex has a reputation for being cryptic, understanding the basics is surprisingly straightforward. This guide will take you from zero to confident, covering the most useful patterns with practical examples you can apply immediately.

Basic Building Blocks

Literal Characters

The simplest regex is just a literal string. The pattern hello matches the text “hello” wherever it appears. Most characters match themselves; the complexity comes from metacharacters, which the regex engine interprets as instructions rather than as literal text.

Metacharacters

These characters have special meanings in regex:

  • . — Matches any single character (except newline)
  • ^ — Matches the start of a string (or the start of each line in multiline mode)
  • $ — Matches the end of a string (or the end of each line in multiline mode)
  • * — Matches 0 or more of the preceding element
  • + — Matches 1 or more of the preceding element
  • ? — Matches 0 or 1 of the preceding element (makes it optional)
  • | — Alternation (OR): cat|dog matches “cat” or “dog”
  • \ — Escapes a metacharacter to match it literally
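
The metacharacters above can be tried out with a minimal sketch like the following. It assumes Python's built-in re module, and the sample strings are made up for illustration.

```python
import re

# re.search scans the whole string and returns a match object or None.
dot_match = re.search(r"c.t", "a cut above")         # "." stands in for "u"
start_match = re.search(r"^cat", "catalog")          # "^" anchors to the start
end_match = re.search(r"cat$", "bobcat")             # "$" anchors to the end

# re.findall returns every non-overlapping match as a list of strings.
optional = re.findall(r"colou?r", "color colour")    # "u?" makes the "u" optional
either = re.findall(r"cat|dog", "cat, dog, bird")    # "|" means "cat" or "dog"
any_char = re.findall(r"3.14", "3.14 and 3x14")      # bare "." also matches "x"
literal_dot = re.findall(r"3\.14", "3.14 and 3x14")  # "\." matches only a real dot

print(dot_match.group())  # cut
print(optional)           # ['color', 'colour']
print(either)             # ['cat', 'dog']
print(any_char)           # ['3.14', '3x14']
print(literal_dot)        # ['3.14']
```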

Character Classes

Square brackets define a set of characters to match:

  • [abc] — Matches “a”, “b”, or “c”
  • [a-z] — Matches any lowercase letter
  • [A-Z0-9] — Matches any uppercase letter or digit
  • [^abc] — Matches any character except a, b, or c (negation)
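
As a quick illustration of these sets, here is a small sketch that assumes Python's re module. The input strings are invented for the example.

```python
import re

# Each call returns all non-overlapping matches of the character class.
set_match = re.findall(r"[abc]", "cabbage")          # only a, b, or c
lower_runs = re.findall(r"[a-z]+", "abc DEF ghi")    # runs of lowercase letters
upper_digit = re.findall(r"[A-Z0-9]+", "ID: X9Z, ok")  # uppercase letters or digits
negated = re.findall(r"[^abc]", "cabbage")           # anything except a, b, c

print(set_match)    # ['c', 'a', 'b', 'b', 'a']
print(lower_runs)   # ['abc', 'ghi']
print(upper_digit)  # ['ID', 'X9Z']
print(negated)      # ['g', 'e']
```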

Shorthand Classes

Common character classes have shortcuts:

  • \d — Any digit (equivalent to [0-9] for ASCII text; Unicode-aware engines may also match other digit characters)
  • \w — Any word character (letters, digits, underscore; [a-zA-Z0-9_] for ASCII text)
  • \s — Any whitespace character (space, tab, newline)
  • \D, \W, \S — The negated versions (non-digit, non-word, non-whitespace)
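
The shorthand classes can be sketched as follows, assuming Python's re module and a made-up sample string. One Python-specific detail: for str patterns, \w and \d are Unicode-aware by default, and the re.ASCII flag restricts them to the ASCII ranges listed above.

```python
import re

sample = "Room 42, floor_3\tEast"

digits = re.findall(r"\d+", sample)       # runs of digits
words = re.findall(r"\w+", sample)        # runs of word characters
parts = re.split(r"\s+", sample)          # split on any whitespace run
digits_only = re.sub(r"\D", "", sample)   # delete every non-digit

# "\u00e9" is the single character e-acute; it counts as a word
# character in Unicode mode but not under re.ASCII.
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(digits)       # ['42', '3']
print(words)        # ['Room', '42', 'floor_3', 'East']
print(parts)        # ['Room', '42,', 'floor_3', 'East']
print(digits_only)  # 423
print(ascii_words)  # ['caf']
```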

Quantifiers: How Many to Match

Quantifiers specify how many times a pattern should repeat:

  • {3} — Exactly 3 times
  • {2,5} — Between 2 and 5 times
  • {3,} — 3 or more times
  • * — 0 or more (same as {0,})
  • + — 1 or more (same as {1,})
  • ? — 0 or 1 (same as {0,1})
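
To see how the quantifiers above behave, consider this short sketch. It assumes Python's re module, and the test strings are invented.

```python
import re

# {3}: exactly three digits; fullmatch requires the whole string to match.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None
four_digits = re.fullmatch(r"\d{3}", "1234") is not None

# {2,5}: whole words of two to five characters (\b is a word boundary).
short_words = re.findall(r"\b\w{2,5}\b", "a to three eleven")

# {3,}: three or more "o" characters.
long_goals = re.findall(r"go{3,}al", "goal gooal goooal gooooal")

# * allows zero "b" characters, + requires at least one.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

print(exactly_three, four_digits)  # True False
print(short_words)                 # ['to', 'three']
print(long_goals)                  # ['goooal', 'gooooal']
print(star)                        # ['a', 'ab', 'abbb']
print(plus)                        # ['ab', 'abbb']
```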

Groups and Capturing

Parentheses () create groups that serve two purposes: grouping for quantifiers and capturing matched text for later use.

// Pattern: (https?://)(\w+\.)+\w+
// Matches URLs like https://www.example.com
// Group 1 captures: "https://"
// Group 2 captures: "example." (last match of the repeating group)

Use (?:...) for non-capturing groups when you need grouping but do not need to capture the contents.
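
The capturing behaviour described above can be checked with a small sketch, assuming Python's re module. The second pattern is an illustrative variant, not part of the original example: it makes the repeated group non-capturing and wraps the whole host name in one capturing group instead.

```python
import re

url = "https://www.example.com"

# Original pattern: the repeated group keeps only its last iteration.
m = re.fullmatch(r"(https?://)(\w+\.)+\w+", url)
scheme = m.group(1)
last_repeat = m.group(2)

# Variant: (?:...) groups without capturing, so group 2 is the full host.
m2 = re.fullmatch(r"(https?://)((?:\w+\.)+\w+)", url)
host = m2.group(2)

print(scheme)       # https://
print(last_repeat)  # example.
print(host)         # www.example.com
print(m2.groups())  # ('https://', 'www.example.com')
```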

Practical Examples

Validate an Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This matches most common email formats, but it also accepts some invalid addresses and rejects some unusual valid ones. Note: fully validating emails per RFC 5322 requires a much more complex pattern, so for production use it is better to send a confirmation email.
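
A minimal sketch of how this pattern might be used, assuming Python's re module. The helper name looks_like_email is hypothetical. It uses fullmatch rather than search because, in Python, $ also matches just before a trailing newline.

```python
import re

# The pattern from the text, compiled once for reuse.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(candidate: str) -> bool:
    """Return True if the whole string fits the simple email pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None

print(looks_like_email("alice@example.com"))               # True
print(looks_like_email("first.last+tag@mail.example.org")) # True
print(looks_like_email("no-at-sign.example.com"))          # False
print(looks_like_email("user@localhost"))                  # False
# The pattern is permissive: consecutive dots in the local part pass.
print(looks_like_email("a..b@example.com"))                # True
```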

Extract Phone Numbers

\+?\d{1,3}[\s-]?\(?\d{1,4}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}

Matches international formats like +1 (555) 123-4567 and +49 30 12345678.
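
As a rough sketch of extraction with this pattern, assuming Python's re module and an invented sample sentence:

```python
import re

# The phone pattern from the text. It has no capturing groups,
# so findall returns the full text of each match.
PHONE_RE = re.compile(r"\+?\d{1,3}[\s-]?\(?\d{1,4}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}")

text = "Call +1 (555) 123-4567 or +49 30 12345678 today."
numbers = PHONE_RE.findall(text)

print(numbers)  # ['+1 (555) 123-4567', '+49 30 12345678']
```

Because the pattern is unanchored, it will also pick up other long digit runs, so the results are best treated as candidates rather than verified phone numbers.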

Find URLs in Text

https?://[^\s<>"']+

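A simple pattern that captures most URLs by matching the protocol followed by any run of characters other than whitespace, angle brackets, and quotes. Note that trailing punctuation, such as a sentence-ending period, is included in the match.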
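
The following sketch, assuming Python's re module and made-up input text, shows the pattern in use. The rstrip cleanup step at the end is an assumption added for illustration and not part of the pattern itself.

```python
import re

# Triple-quoted raw string so the pattern can contain both quote characters.
URL_RE = re.compile(r"""https?://[^\s<>"']+""")

text = 'See https://example.com/docs?page=2 and <a href="http://example.org/a">link</a>.'
urls = URL_RE.findall(text)
print(urls)  # ['https://example.com/docs?page=2', 'http://example.org/a']

# A URL at the end of a sentence picks up the final period.
raw = URL_RE.findall("Visit https://example.com.")
print(raw)   # ['https://example.com.']

# One possible cleanup (an assumption): strip common trailing punctuation.
cleaned = [u.rstrip(".,;:!?)") for u in raw]
print(cleaned)  # ['https://example.com']
```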

Match an IPv4 Address

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

Matches patterns like 192.168.1.1. For strict validation (ensuring each octet is 0-255), you would need a more detailed pattern.
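
One way such a stricter pattern could look is sketched below, assuming Python's re module. The names OCTET, IPV4_RE, and is_ipv4 are hypothetical, and rejecting leading zeros is a design choice of this sketch rather than a requirement from the text.

```python
import re

# One octet, 0-255, without leading zeros:
# 250-255 | 200-249 | 100-199 | 0-99
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Four octets separated by literal dots; {{3}} becomes {3} in the f-string.
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

def is_ipv4(candidate: str) -> bool:
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4_RE.fullmatch(candidate) is not None

# The simple pattern from the text accepts out-of-range octets.
loose_hit = re.search(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", "999.999.999.999")

print(loose_hit is not None)       # True
print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False
print(is_ipv4("1.2.3"))            # False
```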

Extract HTML Tags

<(\w+)[^>]*>(.*?)</\1>

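Matches an opening tag, its content, and the corresponding closing tag. The \1 is a backreference to the first capture group, ensuring the closing tag name matches the opening tag name. Because . does not match newlines by default, content that spans several lines is not matched unless dotall mode is enabled. Important caveat: regex is generally not suitable for parsing nested HTML, so use a proper HTML parser for complex documents.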
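
A short sketch, assuming Python's re module and invented HTML snippets, shows both the backreference at work and the nesting limitation:

```python
import re

TAG_RE = re.compile(r"<(\w+)[^>]*>(.*?)</\1>")

# With two capturing groups, findall returns (tag name, content) tuples.
# The mismatched <i>...</b> pair is skipped because \1 must equal "i".
flat = TAG_RE.findall('<p class="x">Hello</p><b>bold</b><i>oops</b>')
print(flat)    # [('p', 'Hello'), ('b', 'bold')]

# Nested tags of the same name: the lazy .*? stops at the FIRST </div>,
# so the captured content is cut off in the middle of the structure.
nested = TAG_RE.findall("<div><div>a</div></div>")
print(nested)  # [('div', '<div>a')]
```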

Common Pitfalls

  • Greedy vs. lazy matching — By default, quantifiers are greedy (match as much as possible). Add ? after a quantifier to make it lazy: .*? matches as little as possible.
  • Escaping special characters — To match a literal dot, write \. rather than a bare dot, because an unescaped . matches any character.
  • Anchoring — Without ^ and $, your pattern matches anywhere in the string. Use anchors when validating entire strings.
  • Catastrophic backtracking — In backtracking engines, nested quantifiers like (a+)+ can cause exponential processing time, typically when the input almost matches but ultimately fails. Always test patterns with edge cases.
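
The first three pitfalls above can be reproduced with a brief sketch, assuming Python's re module and made-up inputs. Catastrophic backtracking is deliberately not run here, since a demonstration could take a very long time to finish.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy .* runs to the LAST </b>; lazy .*? stops at the first one.
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)
print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']

# Escaping: re.escape builds a pattern that matches the text literally.
escaped = re.escape("3.14")
print(escaped)  # 3\.14

# Anchoring: search finds four digits anywhere; fullmatch demands
# that the entire string be exactly four digits.
unanchored = re.search(r"\d{4}", "id-12345-x") is not None
anchored = re.fullmatch(r"\d{4}", "id-12345-x") is not None
valid = re.fullmatch(r"\d{4}", "2024") is not None
print(unanchored, anchored, valid)  # True False True
```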

Regex in Different Languages

While the core syntax is similar, regex implementations vary between languages:

  • JavaScript — /pattern/flags literal syntax or new RegExp(). Supports g (global), i (case-insensitive), m (multiline), s (dotall), and u (unicode) flags.
  • Python — re module with re.match(), re.search(), re.findall(), and re.sub().
  • PHP — PCRE functions: preg_match(), preg_replace(), preg_match_all().
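
As a brief tour of the Python functions named above, here is a sketch that assumes only the standard re module. The date strings are invented sample data.

```python
import re

text = "2024-01-15 and 2025-12-31"

# re.match only tries to match at the very start of the string.
at_start = re.match(r"\d{4}", text).group()
not_at_start = re.match(r"and", text)

# re.search scans forward and returns the first match anywhere.
first_and = re.search(r"and", text).group()

# re.findall returns every non-overlapping match.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# re.sub replaces matches; \1, \2, \3 refer to the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# Flags are passed as an extra argument, e.g. case-insensitive matching.
greetings = re.findall(r"hello", "Hello HELLO", re.IGNORECASE)

print(at_start)      # 2024
print(not_at_start)  # None
print(first_and)     # and
print(dates)         # ['2024-01-15', '2025-12-31']
print(reordered)     # 15/01/2024 and 31/12/2025
print(greetings)     # ['Hello', 'HELLO']
```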

Practice Makes Perfect

The best way to learn regex is by experimenting. Our Regex Tester lets you write patterns, test them against sample text in real time, and see matches highlighted instantly — all in your browser with no setup required.