Regex Fundamentals: Complete Beginner's Guide to Regular Expressions
Master regular expressions from scratch with comprehensive examples, practical exercises, and real-world applications. Perfect for developers new to regex.
Regular expressions (regex) are one of the most powerful tools in a developer's toolkit. They provide a concise and flexible way to search, match, and manipulate text patterns. This comprehensive guide will take you from complete beginner to confident regex user with practical examples and hands-on exercises.
Table of Contents
- What are Regular Expressions?
- Basic Regex Syntax and Characters
- Character Classes and Ranges
- Quantifiers: Controlling Repetition
- Anchors and Boundaries
- Groups and Capturing
- Advanced: Lookahead and Lookbehind
- Practical Examples and Use Cases
- Common Regex Patterns Library
- Debugging and Testing Regex
- Best Practices and Performance Tips
What are Regular Expressions?
Regular expressions are patterns used to match character combinations in strings. Think of them as a search language that allows you to:
- Find specific patterns in text (like email addresses or phone numbers)
- Validate user input (ensuring data meets specific formats)
- Replace or extract parts of strings based on patterns
- Split strings at complex boundaries
Real-World Examples
Before diving into syntax, let's see regex in action:
// Find all email addresses in text
const text = "Contact us at support@example.com or sales@company.org";
const emailPattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
const emails = text.match(emailPattern);
console.log(emails); // ["support@example.com", "sales@company.org"]
// Validate phone number format
const phonePattern = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
console.log(phonePattern.test("(555) 123-4567")); // true
console.log(phonePattern.test("not-a-phone")); // false
// Replace multiple spaces with single space
const messyText = "Too    many     spaces";
const cleanText = messyText.replace(/\s+/g, " ");
console.log(cleanText); // "Too many spaces"
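Splitting at complex boundaries, the fourth use listed earlier, works along the same lines. The snippet below is only a minimal sketch; the sample string and the variable names are made up for illustration.

```javascript
// Split on commas or semicolons, swallowing any whitespace around them
const rawTags = "regex ,  javascript;text-processing ;  parsing";
const tags = rawTags.split(/\s*[,;]\s*/);
console.log(tags); // ["regex", "javascript", "text-processing", "parsing"]
```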
Basic Regex Syntax and Characters
Literal Characters
The simplest regex patterns match literal characters:
// Pattern: cat
// Matches: "cat", "catch", "scattered" (anywhere "cat" appears)
// Doesn't match: "Car", "CAT" (case sensitive by default)
const pattern = /cat/;
console.log(pattern.test("I have a cat")); // true
console.log(pattern.test("Dog lover")); // false
Special Characters (Metacharacters)
These characters have special meanings in regex:
| Character | Meaning | Example |
|---|---|---|
| . | Matches any single character (except newline) | c.t matches "cat", "cut", "c@t" |
| * | Zero or more of the preceding character | ca*t matches "ct", "cat", "caat" |
| + | One or more of the preceding character | ca+t matches "cat", "caat" but not "ct" |
| ? | Zero or one of the preceding character | ca?t matches "ct" and "cat" |
| ^ | Start of string | ^cat matches "cat" only at the beginning |
| $ | End of string | cat$ matches "cat" only at the end |
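One way to get a feel for the table is to run each pattern against an input that sits on the edge of what it accepts. The sketch below does that; the `checks` array and its inputs are chosen purely for illustration.

```javascript
// Each entry pairs a pattern from the table with an input and the expected result
const checks = [
  { pattern: /c.t/, input: "c@t", expected: true },     // . accepts any character
  { pattern: /ca*t/, input: "ct", expected: true },     // * allows zero "a"s
  { pattern: /ca+t/, input: "ct", expected: false },    // + needs at least one "a"
  { pattern: /ca?t/, input: "caat", expected: false },  // ? allows at most one "a"
  { pattern: /^cat/, input: "concat", expected: false },// "cat" is not at the start
  { pattern: /cat$/, input: "concat", expected: true }  // "cat" is at the end
];

for (const { pattern, input } of checks) {
  console.log(`${pattern} on "${input}": ${pattern.test(input)}`);
}
```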
Escaping Special Characters
To match special characters literally, escape them with a backslash:
// To match a literal dot
const pattern = /3\.14/; // Matches "3.14"
// To match a literal question mark
const questionPattern = /How are you\?/; // Matches "How are you?"
// To match a literal backslash
const pathPattern = /C:\\Users/; // Matches "C:\Users"
Flags
Flags modify how the pattern matching works:
// Case insensitive matching
const caseInsensitive = /cat/i;
console.log(caseInsensitive.test("CAT")); // true
// Global matching (find all matches)
const text = "cat and cat and cat";
console.log(text.match(/cat/)); // ["cat"] - first match only
console.log(text.match(/cat/g)); // ["cat", "cat", "cat"] - all matches
// Multiline mode
const multiline = /^start/m; // ^ matches start of any line, not just string
// Common flag combinations
const emailPattern = /[a-z]+@[a-z]+\.[a-z]+/gi; // Case insensitive + global
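The multiline flag is easiest to understand by comparing match counts with and without it. This is a small sketch; the `logText` string is an invented sample.

```javascript
// Three lines, two of which begin with "start"
const logText = "start: boot\nloading\nstart: listen";

// Without m, ^ only matches at the very beginning of the string
const withoutM = logText.match(/^start/g).length;
// With m, ^ also matches right after each line break
const withM = logText.match(/^start/gm).length;

console.log(withoutM); // 1
console.log(withM);    // 2
```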
Character Classes and Ranges
Basic Character Classes
Character classes match any character from a specific set:
// Match any vowel
const vowels = /[aeiou]/;
console.log(vowels.test("hello")); // true (matches 'e')
// Match any digit
const digits = /[0123456789]/;
// Shorthand: /[0-9]/
console.log(digits.test("abc123")); // true
// Match any letter
const letters = /[a-zA-Z]/;
console.log(letters.test("123abc")); // true
Predefined Character Classes
Common character classes have shortcuts:
| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \w | [a-zA-Z0-9_] | Any word character |
| \s | [ \t\n\r\f\v] | Any whitespace (in JavaScript this also covers Unicode spaces such as the non-breaking space) |
| \D | [^0-9] | Any non-digit |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \S | [^\s] | Any non-whitespace |
// Practical examples
const phoneDigits = /\d{3}-\d{3}-\d{4}/; // 123-456-7890
const wordBoundary = /\bcat\b/; // "cat" as whole word
const noSpaces = /\S+/; // Any non-whitespace sequence
// Combining character classes
const alphanumeric = /[a-zA-Z0-9]+/; // Letters and numbers
const notVowels = /[^aeiou]/; // Anything except vowels
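Because \w is defined as [a-zA-Z0-9_], it stops at accented letters. The sketch below shows the effect and one possible alternative, Unicode property escapes with the u flag; the sample word and variable names are illustrative only.

```javascript
// "caf\u00e9" is "café" written with an escape so the encoding is unambiguous
const word = "caf\u00e9";

// \w is ASCII-only, so the match ends before the accented letter
const asciiWord = word.match(/\w+/)[0];
// \p{L} (any letter) and \p{N} (any number) require the u flag
const unicodeWord = word.match(/[\p{L}\p{N}_]+/u)[0];

console.log(asciiWord);   // "caf"
console.log(unicodeWord); // "café"
```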
Negated Character Classes
Use ^ at the start of a character class to negate it:
// Match anything except digits
const notDigits = /[^0-9]/;
console.log(notDigits.test("abc")); // true
console.log(notDigits.test("123")); // false
// Match anything except vowels
const consonants = /[^aeiouAEIOU]/;
console.log(consonants.test("hello")); // true (matches 'h')
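Negated classes are often most useful with replace, where they strip everything outside an allowed set. A minimal sketch, with an invented phone string:

```javascript
// Remove every character that is not a digit
const rawPhone = "(555) 123-4567";
const digitsOnly = rawPhone.replace(/[^0-9]/g, "");
console.log(digitsOnly); // "5551234567"

// ^ only negates when it is the first character in the class;
// anywhere else it is a literal caret
const literalCaret = /[a^b]/.test("^");
console.log(literalCaret); // true
```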
Quantifiers: Controlling Repetition
Basic Quantifiers
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | ab*c matches "ac", "abc", "abbbbc" |
| + | 1 or more | ab+c matches "abc", "abbbbc" but not "ac" |
| ? | 0 or 1 (optional) | colou?r matches "color" and "colour" |
Specific Quantifiers
Use curly braces to specify exact repetition counts:
// Exact count: {n}
const exactThree = /a{3}/; // Matches "aaa"
// Range: {min,max}
const twoToFour = /a{2,4}/; // Matches "aa", "aaa", "aaaa"
// Minimum: {min,}
const twoOrMore = /a{2,}/; // Matches "aa", "aaa", "aaaa", etc.
// Practical examples
const zipCode = /\d{5}/; // Exactly 5 digits
const zipPlus4 = /\d{5}-\d{4}/; // 12345-6789
const phoneNumber = /\d{3}-\d{3}-\d{4}/; // 123-456-7890
// Password requirements
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;
// At least 8 chars with lowercase, uppercase, and digit
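A counted quantifier limits how many characters the pattern consumes, not how long the input may be. Without anchors, \d{5} still finds five digits inside a longer run. The sketch below contrasts the two; `looseZip` and `strictZip` are illustrative names.

```javascript
const looseZip = /\d{5}/;                 // finds 5 digits anywhere in the input
const strictZip = /^\d{5}(?:-\d{4})?$/;   // the whole input must be a ZIP or ZIP+4

console.log(looseZip.test("1234567"));     // true (matches the first five digits)
console.log(strictZip.test("1234567"));    // false
console.log(strictZip.test("12345"));      // true
console.log(strictZip.test("12345-6789")); // true
```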
Greedy vs. Non-Greedy Quantifiers
By default, quantifiers are greedy (match as much as possible):
// Greedy matching
const text = "<div>Hello</div><div>World</div>";
const greedyPattern = /<div>.*<\/div>/;
console.log(text.match(greedyPattern)[0]);
// Result: "<div>Hello</div><div>World</div>" (matches everything!)
// Non-greedy (lazy) matching - add ? after quantifier
const lazyPattern = /<div>.*?<\/div>/;
console.log(text.match(lazyPattern)[0]);
// Result: "<div>Hello</div>" (stops at first closing tag)
// More examples
const greedyQuotes = /".*"/; // Matches from first " to last "
const lazyQuotes = /".*?"/; // Matches individual quoted strings
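The quote patterns can be run against a sentence with two quoted strings to see the difference. This is a small sketch with an invented sentence; the third pattern, a negated character class, is a common alternative to a lazy quantifier and is included here as a suggestion rather than something the patterns above require.

```javascript
const sentence = 'She said "hello" and then "goodbye" to everyone';

// Greedy: runs from the first quote to the last one
const greedyResult = sentence.match(/".*"/)[0];
// Lazy: stops at the nearest closing quote, so each string is found separately
const lazyResult = sentence.match(/".*?"/g);
// Negated class: "anything that is not a quote" gives the same result here
const negatedResult = sentence.match(/"[^"]*"/g);

console.log(greedyResult);  // '"hello" and then "goodbye"'
console.log(lazyResult);    // ['"hello"', '"goodbye"']
console.log(negatedResult); // ['"hello"', '"goodbye"']
```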
Anchors and Boundaries
String Anchors
// ^ - Start of string
const startsWithThe = /^The/;
console.log(startsWithThe.test("The quick brown fox")); // true
console.log(startsWithThe.test("A quick brown fox")); // false
// $ - End of string
const endsWithFox = /fox$/;
console.log(endsWithFox.test("The quick brown fox")); // true
console.log(endsWithFox.test("The quick brown fox jumps")); // false
// Combining anchors for exact match
const exactMatch = /^hello$/;
console.log(exactMatch.test("hello")); // true
console.log(exactMatch.test("hello world")); // false
Word Boundaries
// \b - Word boundary (between word and non-word character)
const wholeWord = /\bcat\b/;
console.log(wholeWord.test("cat")); // true
console.log(wholeWord.test("catch")); // false
console.log(wholeWord.test("the cat")); // true
console.log(wholeWord.test("scattered")); // false
// \B - Non-word boundary
const notWordBoundary = /\Bcat\B/;
console.log(notWordBoundary.test("scattered")); // true
console.log(notWordBoundary.test("cat")); // false
// Practical use: Find whole words only
function findWholeWord(text, word) {
const pattern = new RegExp(`\\b${word}\\b`, 'gi');
return text.match(pattern) || [];
}
console.log(findWholeWord("The cat caught a rat", "cat")); // ["cat"]
console.log(findWholeWord("The cat caught a rat", "at")); // []
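findWholeWord inserts the word into the pattern unchanged, so a search term containing metacharacters such as a dot would be interpreted as regex syntax. One possible guard is to escape the term first. The sketch below assumes a hypothetical helper named `escapeRegExp` and a variant named `findWholeWordSafe`; neither is part of the original function.

```javascript
// Hypothetical helper: put a backslash before every regex metacharacter
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same idea as findWholeWord, but the search term is treated as plain text
function findWholeWordSafe(text, word) {
  const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, "gi");
  return text.match(pattern) || [];
}

// Unescaped, "a.b" would also match "axb" because . matches any character
console.log(findWholeWordSafe("Use a.b not axb", "a.b")); // ["a.b"]
```

Note that \b only behaves as expected when the search term starts and ends with a word character.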
Groups and Capturing
Basic Groups
Parentheses create groups for applying quantifiers or capturing matches:
// Group with quantifier
const repeatedGroup = /(ab)+/;
console.log(repeatedGroup.test("ababab")); // true
// Without groups (wrong!)
const wrongPattern = /ab+/; // This means "a" followed by one or more "b"
console.log(wrongPattern.test("ababab")); // true, but matches different pattern
// Alternation in groups
const colorPattern = /(red|blue|green)/;
console.log(colorPattern.test("I like blue")); // true
Capturing Groups
Groups capture matched content that you can extract:
// Extract parts of a date
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2023-12-25".match(datePattern);
console.log(match[0]); // "2023-12-25" (full match)
console.log(match[1]); // "2023" (first group - year)
console.log(match[2]); // "12" (second group - month)
console.log(match[3]); // "25" (third group - day)
// Destructuring for cleaner code
const [fullMatch, year, month, day] = "2023-12-25".match(datePattern);
console.log(`Year: ${year}, Month: ${month}, Day: ${day}`);
// Extract name parts
const namePattern = /(\w+)\s+(\w+)/;
const [, firstName, lastName] = "John Doe".match(namePattern);
console.log(`First: ${firstName}, Last: ${lastName}`);
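Captured groups can also be reused without extracting them first: in a replacement string as $1, $2 and so on, and inside the pattern itself as \1, \2. A minimal sketch with made-up inputs:

```javascript
// Reorder the captured parts of an ISO date in the replacement string
const isoDate = "2023-12-25";
const usDate = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(usDate); // "12/25/2023"

// \1 matches whatever the first group captured, which finds doubled words
const doubledWord = /\b(\w+)\s+\1\b/;
console.log(doubledWord.test("this is is a typo")); // true
console.log(doubledWord.test("this is a test"));    // false
```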
Non-Capturing Groups
Use (?:) when you need grouping but don't want to capture:
// Non-capturing group
const pattern = /(?:Mr|Mrs|Ms)\.\s+(\w+)/;
const match = "Mr. Smith".match(pattern);
console.log(match[1]); // "Smith" (only the name is captured, not the title)
// Why use non-capturing groups?
// 1. Better performance (no memory allocation for capture)
// 2. Cleaner captured groups array
// 3. Still allows quantifiers and alternation
Named Capturing Groups
Modern JavaScript supports named groups for better readability:
// Named groups syntax: (?<name>pattern)
const phonePattern = /(?<area>\d{3})-(?<exchange>\d{3})-(?<number>\d{4})/;
const match = "555-123-4567".match(phonePattern);
console.log(match.groups.area); // "555"
console.log(match.groups.exchange); // "123"
console.log(match.groups.number); // "4567"
// Destructuring named groups
const { area, exchange, number } = "555-123-4567".match(phonePattern).groups;
console.log(`(${area}) ${exchange}-${number}`);
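Named groups combine well with matchAll when a string contains several matches, and they can be referenced in a replacement string as $<name>. The following is a sketch under the assumption of an ES2020-capable runtime; the `contacts` string and variable names are invented.

```javascript
const contacts = "Office: 555-123-4567, Mobile: 555-987-6543";
const phoneRe = /(?<area>\d{3})-(?<exchange>\d{3})-(?<number>\d{4})/g;

// matchAll requires the g flag and yields one match object per occurrence
const parsedPhones = [...contacts.matchAll(phoneRe)].map(m => ({ ...m.groups }));
console.log(parsedPhones.length);  // 2
console.log(parsedPhones[1].area); // "555"

// $<name> inserts the text captured by the named group
const reformatted = contacts.replace(phoneRe, "($<area>) $<exchange>-$<number>");
console.log(reformatted); // "Office: (555) 123-4567, Mobile: (555) 987-6543"
```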
Advanced: Lookahead and Lookbehind
Positive Lookahead (?=)
Match something followed by something else, without including the "something else":
// Find "Java" only when followed by "Script"
const javaScriptOnly = /Java(?=Script)/;
console.log(javaScriptOnly.test("JavaScript")); // true
console.log(javaScriptOnly.test("Java")); // false
// Password validation: must contain a digit
const hasDigit = /^(?=.*\d).{6,}$/;
console.log(hasDigit.test("password123")); // true
console.log(hasDigit.test("password")); // false
// Complex password validation
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;
// Must have: lowercase, uppercase, digit, special char, min 8 chars
Negative Lookahead (?!)
Match something NOT followed by something else:
// Find "Java" not followed by "Script"
const notJavaScript = /Java(?!Script)/;
console.log(notJavaScript.test("Java programming")); // true
console.log(notJavaScript.test("JavaScript")); // false
// Find words that don't end with "ing"
// The lookahead sits at the start of the word so it can inspect the whole word
const notEndingWithIng = /\b(?!\w*ing\b)\w+\b/g;
console.log("running walking sit".match(notEndingWithIng)); // ["sit"]
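Positive and negative lookahead can be combined in one pattern. A well-known illustration is inserting thousands separators: the pattern matches only positions, never characters, so replace inserts commas without removing anything. The function name `addThousandsSeparators` is an illustrative choice, and the sketch assumes the input is a plain string of digits.

```javascript
// Insert a comma at every position that is
//  - inside the number (\B, so not at its start), and
//  - followed by one or more groups of three digits (?=(\d{3})+ ...)
//  - with no further digit after those groups (?!\d)
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

console.log(addThousandsSeparators("1234567")); // "1,234,567"
console.log(addThousandsSeparators("999"));     // "999"
```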
Positive Lookbehind (?<=)
Match something preceded by something else:
// Find numbers preceded by "$"
const dollarAmounts = /(?<=\$)\d+/;
console.log("$100 and 50 euros".match(dollarAmounts)); // ["100"]
// Extract filenames from paths
const filename = /(?<=\/)[^/]+$/;
console.log("/path/to/file.txt".match(filename)[0]); // "file.txt"
Negative Lookbehind (?<!)
Match something NOT preceded by something else:
// Find numbers not preceded by "$"
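const notDollars = /(?<!\$)\b\d+/; // \b keeps a match from starting in the middle of a number
console.log("$100 and 50 euros".match(notDollars)[0]); // "50"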
Practical Examples and Use Cases
Form Validation
class FormValidator {
static email(email) {
const pattern = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
return pattern.test(email);
}
static phone(phone) {
// Supports various formats: (555) 123-4567, 555-123-4567, 555.123.4567
const pattern = /^\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$/;
return pattern.test(phone);
}
static password(password) {
// At least 8 chars, one uppercase, one lowercase, one digit
const pattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d@$!%*?&]{8,}$/;
return pattern.test(password);
}
static creditCard(card) {
// Remove spaces and hyphens
const cleaned = card.replace(/[\s-]/g, '');
// Basic format check (13-19 digits)
const pattern = /^\d{13,19}$/;
if (!pattern.test(cleaned)) return false;
// Luhn algorithm check
return this.luhnCheck(cleaned);
}
static luhnCheck(cardNumber) {
let sum = 0;
let alternate = false;
for (let i = cardNumber.length - 1; i >= 0; i--) {
let n = parseInt(cardNumber.charAt(i), 10);
if (alternate) {
n *= 2;
if (n > 9) n = (n % 10) + 1;
}
sum += n;
alternate = !alternate;
}
return sum % 10 === 0;
}
}
// Usage
console.log(FormValidator.email("user@example.com")); // true
console.log(FormValidator.phone("(555) 123-4567")); // true
console.log(FormValidator.password("MyPass123")); // true
Text Processing
class TextProcessor {
// Extract all URLs from text
static extractURLs(text) {
const urlPattern = /https?:\/\/(?:[-\w.])+(?:[:\d]+)?(?:\/(?:[\w\/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:[\w.])*)?)?/g;
return text.match(urlPattern) || [];
}
// Extract hashtags from social media text
static extractHashtags(text) {
const hashtagPattern = /#[a-zA-Z0-9_]+/g;
return text.match(hashtagPattern) || [];
}
// Extract mentions (@username)
static extractMentions(text) {
const mentionPattern = /@[a-zA-Z0-9_]+/g;
return text.match(mentionPattern) || [];
}
// Clean up extra whitespace
static normalizeWhitespace(text) {
return text
.replace(/\s+/g, ' ') // Multiple spaces to single space
.replace(/^\s+|\s+$/g, ''); // Trim start and end
}
// Convert text to title case
static toTitleCase(text) {
return text.replace(/\b\w+/g, (word) => {
return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
});
}
// Extract file extensions
static getFileExtension(filename) {
const match = filename.match(/\.([^.]+)$/);
return match ? match[1].toLowerCase() : null;
}
// Mask sensitive information
static maskEmail(email) {
return email.replace(/(.)(.*)(.@.*)/, (match, first, middle, domain) => {
return first + '*'.repeat(middle.length) + domain;
});
}
static maskPhone(phone) {
return phone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) ***-$3');
}
}
// Usage examples
const socialText = "Check out https://example.com #coding @john_doe and #javascript";
console.log(TextProcessor.extractURLs(socialText)); // ["https://example.com"]
console.log(TextProcessor.extractHashtags(socialText)); // ["#coding", "#javascript"]
console.log(TextProcessor.extractMentions(socialText)); // ["@john_doe"]
console.log(TextProcessor.toTitleCase("hello world")); // "Hello World"
console.log(TextProcessor.maskEmail("john.doe@example.com")); // "j******e@example.com"
Data Parsing and Extraction
class DataExtractor {
// Parse CSV line with quoted fields
static parseCSVLine(line) {
const csvPattern = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/;
return line.split(csvPattern).map(field => {
// Remove surrounding quotes and unescape internal quotes
return field.replace(/^"(.*)"$/, '$1').replace(/""/g, '"');
});
}
// Extract dates in various formats
static extractDates(text) {
const datePatterns = [
/\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b/g, // MM/DD/YYYY
/\b(\d{4})-(\d{2})-(\d{2})\b/g, // YYYY-MM-DD
/\b(\w+)\s+(\d{1,2}),?\s+(\d{4})\b/g // Month DD, YYYY
];
const dates = [];
datePatterns.forEach(pattern => {
let match;
while ((match = pattern.exec(text)) !== null) {
dates.push(match[0]);
}
});
return dates;
}
// Parse log file entries
static parseLogEntry(logLine) {
// Common log format: IP - - [timestamp] "request" status size
const logPattern = /^([\d.]+) - - \[([^\]]+)\] "([^"]+)" (\d{3}) (\d+|-)$/;
const match = logLine.match(logPattern);
if (match) {
return {
ip: match[1],
timestamp: match[2],
request: match[3],
status: parseInt(match[4], 10),
size: match[5] === '-' ? 0 : parseInt(match[5], 10)
};
}
return null;
}
// Extract structured data from text
static extractStructuredData(text) {
const patterns = {
emails: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
phones: /(?:\(\d{3}\)|\b\d{3})[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
urls: /https?:\/\/[^\s<>"{}|\\^`\[\]]+/g,
ipAddresses: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
creditCards: /\b(?:\d{4}[\s-]?){3}\d{4}\b/g,
socialSecurity: /\b\d{3}-\d{2}-\d{4}\b/g
};
const extracted = {};
for (const [key, pattern] of Object.entries(patterns)) {
extracted[key] = text.match(pattern) || [];
}
return extracted;
}
}
// Usage
const csvLine = '"John Doe","john@example.com","(555) 123-4567"';
console.log(DataExtractor.parseCSVLine(csvLine));
// ["John Doe", "john@example.com", "(555) 123-4567"]
const textWithDates = "Meeting on 12/25/2023 and follow-up on January 15, 2024";
console.log(DataExtractor.extractDates(textWithDates));
// ["12/25/2023", "January 15, 2024"]
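The log parser in DataExtractor reads groups by index. A variant with named groups can make the field mapping easier to follow. The sketch below is a standalone rewrite under that assumption; `LOG_LINE`, `parseLogLine` and the sample line are illustrative and not part of the class above.

```javascript
// Same common log format, with each field given a name
const LOG_LINE = /^(?<ip>[\d.]+) - - \[(?<timestamp>[^\]]+)\] "(?<request>[^"]+)" (?<status>\d{3}) (?<size>\d+|-)$/;

function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  if (!m) return null; // line does not follow the expected format
  const { ip, timestamp, request, status, size } = m.groups;
  return {
    ip,
    timestamp,
    request,
    status: Number(status),
    size: size === "-" ? 0 : Number(size) // "-" means no body was sent
  };
}

const sampleLine = '192.168.1.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326';
const entry = parseLogLine(sampleLine);
console.log(entry.ip, entry.status, entry.size); // 192.168.1.10 200 2326
```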
Common Regex Patterns Library
Validation Patterns
const ValidationPatterns = {
// Email validation (comprehensive)
email: /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/,
// Phone numbers (US format)
phoneUS: /^\+?1?[-.\s]?\(?[2-9]\d{2}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/,
// URLs
url: /^https?:\/\/(?:[-\w.])+(?:[:\d]+)?(?:\/(?:[\w\/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:[\w.])*)?)?$/,
// Credit card numbers
creditCard: {
visa: /^4[0-9]{12}(?:[0-9]{3})?$/,
mastercard: /^5[1-5][0-9]{14}$/,
amex: /^3[47][0-9]{13}$/,
discover: /^6(?:011|5[0-9]{2})[0-9]{12}$/
},
// Password strength
password: {
weak: /^.{6,}$/, // At least 6 characters
medium: /^(?=.*[a-zA-Z])(?=.*\d).{8,}$/, // Letters + numbers, 8+
strong: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/, // Upper + lower + digit, 8+
veryStrong: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/ // All types, 8+
},
// Date formats
date: {
mmddyyyy: /^(0?[1-9]|1[012])\/(0?[1-9]|[12][0-9]|3[01])\/(19|20)\d\d$/,
ddmmyyyy: /^(0?[1-9]|[12][0-9]|3[01])\/(0?[1-9]|1[012])\/(19|20)\d\d$/,
yyyymmdd: /^(19|20)\d\d[-.](0?[1-9]|1[012])[-.](0?[1-9]|[12][0-9]|3[01])$/,
iso8601: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?Z?$/
},
// Postal codes
postalCode: {
us: /^\d{5}(-\d{4})?$/, // 12345 or 12345-6789
uk: /^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$/i, // UK postcode
canada: /^[A-Z]\d[A-Z]\s?\d[A-Z]\d$/i // Canadian postal code
},
// File types
fileExtension: {
image: /\.(jpg|jpeg|png|gif|bmp|svg|webp)$/i,
document: /\.(pdf|doc|docx|txt|rtf|odt)$/i,
video: /\.(mp4|avi|mov|wmv|flv|webm|mkv)$/i,
audio: /\.(mp3|wav|flac|aac|ogg|wma)$/i
}
};
// Usage helper function
function validateInput(input, type, subtype = null) {
let pattern;
if (subtype && ValidationPatterns[type] && ValidationPatterns[type][subtype]) {
pattern = ValidationPatterns[type][subtype];
} else if (ValidationPatterns[type]) {
pattern = ValidationPatterns[type];
} else {
throw new Error(`Unknown validation type: ${type}`);
}
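if (!(pattern instanceof RegExp)) {
throw new Error(`Validation type "${type}" requires a valid subtype`);
}
return pattern.test(input);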
}
// Examples
console.log(validateInput("user@example.com", "email")); // true
console.log(validateInput("4111111111111111", "creditCard", "visa")); // true
console.log(validateInput("StrongPass123!", "password", "strong")); // true
Debugging and Testing Regex
Regex Testing Function
class RegexTester {
static test(pattern, testCases) {
console.log(`Testing pattern: ${pattern}`);
console.log('='.repeat(50));
const regex = new RegExp(pattern);
const results = {
passed: 0,
failed: 0,
total: testCases.length
};
testCases.forEach((testCase, index) => {
const { input, expected, description } = testCase;
const actual = regex.test(input);
const passed = actual === expected;
if (passed) {
results.passed++;
} else {
results.failed++;
}
const status = passed ? '✓ PASS' : '✗ FAIL';
const expectedStr = expected ? 'should match' : 'should NOT match';
console.log(`${status} - ${description || `Test ${index + 1}`}`);
console.log(` Input: "${input}"`);
console.log(` Expected: ${expectedStr}`);
console.log(` Actual: ${actual ? 'matched' : 'did not match'}`);
if (!passed) {
console.log(` ❌ Expected ${expected}, got ${actual}`);
}
console.log();
});
console.log('Summary:');
console.log(` Passed: ${results.passed}/${results.total}`);
console.log(` Failed: ${results.failed}/${results.total}`);
console.log(` Success Rate: ${(results.passed/results.total*100).toFixed(1)}%`);
return results;
}
// Test with capture groups
static testCaptures(pattern, testString) {
const regex = new RegExp(pattern);
const match = testString.match(regex);
console.log(`Pattern: ${pattern}`);
console.log(`Input: "${testString}"`);
if (match) {
console.log('✓ Match found!');
console.log(`Full match: "${match[0]}"`);
if (match.length > 1) {
console.log('Captured groups:');
for (let i = 1; i < match.length; i++) {
console.log(` Group ${i}: "${match[i]}"`);
}
}
if (match.groups) {
console.log('Named groups:');
for (const [name, value] of Object.entries(match.groups)) {
console.log(` ${name}: "${value}"`);
}
}
} else {
console.log('✗ No match found');
}
}
// Performance testing
static performance(pattern, testString, iterations = 100000) {
const regex = new RegExp(pattern);
console.log(`Performance test: ${iterations} iterations`);
console.log(`Pattern: ${pattern}`);
console.log(`Input: "${testString}"`);
const start = Date.now();
for (let i = 0; i < iterations; i++) {
regex.test(testString);
}
const end = Date.now();
const duration = end - start;
const opsPerSecond = Math.round(iterations / (duration / 1000));
console.log(`Duration: ${duration}ms`);
console.log(`Operations per second: ${opsPerSecond.toLocaleString()}`);
return { duration, opsPerSecond };
}
}
// Example usage
const emailTestCases = [
{ input: "user@example.com", expected: true, description: "Valid email" },
{ input: "invalid.email", expected: false, description: "Missing @ symbol" },
{ input: "user@", expected: false, description: "Missing domain" },
{ input: "@domain.com", expected: false, description: "Missing local part" },
{ input: "user@domain", expected: false, description: "Missing TLD" }
];
RegexTester.test(
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
emailTestCases
);
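RegexTester.test reuses one RegExp object for every test case. That is safe for patterns like the one above, but a pattern with the g (or y) flag remembers where its last match ended in lastIndex, so repeated test calls can give alternating results. The sketch below shows the effect in isolation; the variable names are illustrative.

```javascript
const globalDigits = /\d+/g;

const first = globalDigits.test("abc123");   // matches "123"
const indexAfterFirst = globalDigits.lastIndex;
const second = globalDigits.test("abc123");  // search resumes at index 6, finds nothing
const third = globalDigits.test("abc123");   // lastIndex was reset to 0 after the failure

console.log(first, indexAfterFirst); // true 6
console.log(second);                 // false
console.log(third);                  // true
```

When writing test harnesses, it is usually simplest to pass patterns without the g flag, or to reset lastIndex to 0 before each call.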
Common Debugging Techniques
// 1. Break down complex patterns
function debugComplexPattern() {
// Instead of this complex pattern all at once:
const complex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;
// Test each part separately:
const parts = {
hasLowercase: /(?=.*[a-z])/,
hasUppercase: /(?=.*[A-Z])/,
hasDigit: /(?=.*\d)/,
hasSpecial: /(?=.*[!@#$%^&*])/,
minLength: /.{8,}/
};
const testPassword = "MyPass123!";
console.log('Testing password:', testPassword);
for (const [name, pattern] of Object.entries(parts)) {
const result = pattern.test(testPassword);
console.log(`${name}: ${result ? '✓' : '✗'}`);
}
}
// 2. Visualize what your pattern matches
function visualizeMatches(pattern, text) {
const regex = new RegExp(pattern, 'g');
let result;
const matches = [];
while ((result = regex.exec(text)) !== null) {
matches.push({
match: result[0],
start: result.index,
end: result.index + result[0].length
});
}
console.log('Original text:', text);
console.log('Pattern:', pattern);
console.log('Matches found:', matches.length);
matches.forEach((match, i) => {
console.log(`Match ${i + 1}: "${match.match}" at position ${match.start}-${match.end}`);
});
// Highlight matches in text
let highlighted = text;
matches.reverse().forEach(match => {
highlighted = highlighted.slice(0, match.start) +
`[${highlighted.slice(match.start, match.end)}]` +
highlighted.slice(match.end);
});
console.log('Highlighted:', highlighted);
}
// Usage
visualizeMatches(/\d+/g, "I have 5 cats and 10 dogs");
Best Practices and Performance Tips
Performance Optimization
// 1. Compile regex patterns once, reuse many times
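// BAD: Regex literal re-evaluated on every call (engines usually cache the compiled form, but the pattern still cannot be named or reused)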
function slowEmailValidation(emails) {
return emails.filter(email => {
return /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(email);
});
}
// GOOD: Compiled once, reused
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
function fastEmailValidation(emails) {
return emails.filter(email => EMAIL_PATTERN.test(email));
}
// 2. Use specific patterns instead of broad ones
// BAD: Too broad, potentially slow
const slowPattern = /.*@.*\..*$/;
// GOOD: Specific character classes
const fastPattern = /^[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,}$/;
// 3. Anchor your patterns when possible
// BAD: Can match anywhere in string
const unanchored = /\d{3}-\d{2}-\d{4}/;
// GOOD: Anchored for exact match
const anchored = /^\d{3}-\d{2}-\d{4}$/;
// 4. Use non-capturing groups when you don't need the capture
// BAD: Unnecessary capturing
const capturing = /(https?|ftp):\/\/([\w.-]+)/;
// GOOD: Non-capturing for protocol
const nonCapturing = /(?:https?|ftp):\/\/([\w.-]+)/;
// 5. Be careful with quantifiers
// BAD: Potential catastrophic backtracking
const dangerous = /(a+)+b/;
// GOOD: More specific, linear time
const safe = /a+b/;
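The rewrite in point 5 is only valid if both patterns accept the same inputs. A quick way to gain some confidence is to compare them on a handful of short strings. This is a sketch, not a proof of equivalence; the sample list is arbitrary and deliberately kept short so the nested pattern cannot backtrack for long.

```javascript
const nestedPattern = /(a+)+b/; // nested quantifiers
const flatPattern = /a+b/;      // single quantifier

// Short inputs only: long runs of "a" without a "b" are what makes the nested form slow
const samples = ["ab", "aaab", "b", "aac", "xaab"];
const patternsAgree = samples.every(
  s => nestedPattern.test(s) === flatPattern.test(s)
);

console.log(patternsAgree); // true
```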
Readability and Maintenance
// 1. Use comments and variables for complex patterns
class ReadableRegex {
static createEmailPattern() {
// Break down email pattern into readable parts
const localPart = '[a-zA-Z0-9._%+-]+'; // username part
const domain = '[a-zA-Z0-9.-]+'; // domain name
const tld = '[a-zA-Z]{2,}'; // top-level domain
return new RegExp(`^${localPart}@${domain}\\.${tld}$`); // \\. keeps the dot literal inside a template string
}
static createPhonePattern() {
const areaCode = '\\(?[2-9]\\d{2}\\)?'; // (555) or 555
const exchange = '[2-9]\\d{2}'; // 123 (not starting with 0 or 1)
const number = '\\d{4}'; // 4567
const separator = '[-.\\s]?'; // Optional separator
return new RegExp(`^${areaCode}${separator}${exchange}${separator}${number}$`);
}
}
// 2. Use the 'x' flag for verbose patterns (in languages that support it)
// In JavaScript, use template strings for readability
function createComplexPattern() {
return new RegExp(`
^
(?=.*[a-z]) # at least one lowercase letter
(?=.*[A-Z]) # at least one uppercase letter
(?=.*\\d) # at least one digit (backslash doubled because this is a template string)
(?=.*[!@#$%^&*]) # at least one special character
.{8,} # minimum 8 characters
$
`.replace(/\s+#.*$/gm, '').replace(/\s+/g, ''));
}
// 3. Create a regex library for your application
class AppRegexLibrary {
static patterns = {
email: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
phone: /^\(?[2-9]\d{2}\)?[-.\s]?[2-9]\d{2}[-.\s]?\d{4}$/,
zipCode: /^\d{5}(-\d{4})?$/,
strongPassword: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/
};
static validate(type, input) {
const pattern = this.patterns[type];
if (!pattern) {
throw new Error(`Unknown pattern type: ${type}`);
}
return pattern.test(input);
}
static addPattern(name, pattern) {
this.patterns[name] = pattern;
}
}
// 4. Document your regex patterns
/**
* Email validation pattern
*
* Pattern: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
*
* Explanation:
* ^ - Start of string
* [a-zA-Z0-9._%+-]+ - Local part: letters, digits, and common special chars
* @ - Literal @ symbol
* [a-zA-Z0-9.-]+ - Domain: letters, digits, dots, hyphens
* \. - Literal dot before TLD
* [a-zA-Z]{2,} - TLD: at least 2 letters
* $ - End of string
*
* Examples:
* - Matches: "user@example.com", "test.email+tag@domain.co.uk"
* - Doesn't match: "plainaddress", "@domain.com", "user@"
*/
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
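For the verbose-pattern idea in point 2, String.raw is one way to avoid doubling backslashes, since it leaves escape sequences in the template untouched. The sketch below assumes a hypothetical helper called `verbose`; it strips "# comment" tails and all whitespace, so it is only suitable for patterns that contain no literal spaces and no literal " #" sequence.

```javascript
// Hypothetical helper: turn a commented, multi-line source into a RegExp
function verbose(source, flags = "") {
  const cleaned = source
    .replace(/\s+#.*$/gm, "") // drop whitespace followed by "# comment" up to line end
    .replace(/\s+/g, "");     // drop all remaining whitespace, including newlines
  return new RegExp(cleaned, flags);
}

// String.raw keeps \d as two characters, so no double escaping is needed
const strongPasswordRe = verbose(String.raw`
  ^
  (?=.*[a-z])   # at least one lowercase letter
  (?=.*[A-Z])   # at least one uppercase letter
  (?=.*\d)      # at least one digit
  .{8,}         # minimum 8 characters
  $
`);

console.log(strongPasswordRe.source);            // ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$
console.log(strongPasswordRe.test("MyPass123")); // true
console.log(strongPasswordRe.test("mypass123")); // false
```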
Security Considerations
// 1. Always validate on the server side
// Client-side regex can be bypassed, so always re-validate on server
// 2. Use whitelist approach for security
// BAD: Trying to block dangerous characters
const blacklistApproach = /[^<>"'&]/; // Can be bypassed
// GOOD: Only allow safe characters
const whitelistApproach = /^[a-zA-Z0-9\s.,!?-]+$/;
// 3. Be aware of ReDoS (Regular Expression Denial of Service)
// BAD: Can cause exponential backtracking
const vulnerable = /(a+)+b/;
// Test with: "a".repeat(25) + "c" - each extra "a" roughly doubles the matching time
// GOOD: Linear time complexity
const safe = /a+b/;
// 4. Sanitize input before and after regex validation
function secureValidation(input) {
// Sanitize input
const sanitized = input.trim().toLowerCase();
// Validate with regex
const isValid = /^[a-z0-9._-]{3,20}$/.test(sanitized);
if (!isValid) {
throw new Error('Invalid input format');
}
// Additional security checks
const blockedWords = ['admin', 'root', 'system'];
if (blockedWords.includes(sanitized)) {
throw new Error('Reserved word not allowed');
}
return sanitized;
}
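// 5. Use a timeout for regex execution in critical applications
// pattern.test() is synchronous, so a setTimeout on the same thread can never
// interrupt it. This version assumes Node.js and runs the match through the
// built-in vm module, whose timeout option aborts a script that runs too long.
const vm = require('vm');
function safeRegexTest(pattern, input, timeout = 1000) {
  return new Promise((resolve, reject) => {
    try {
      const result = vm.runInNewContext(
        'pattern.test(input)',
        { pattern, input },
        { timeout } // milliseconds before execution is aborted
      );
      resolve(result);
    } catch (error) {
      reject(error); // includes the timeout error thrown by vm
    }
  });
}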
Conclusion and Next Steps
Congratulations! You've completed a comprehensive journey through regular expressions. You now understand:
- Basic syntax and metacharacters for building patterns
- Character classes and quantifiers for flexible matching
- Groups and capturing for extracting data
- Advanced features like lookahead and lookbehind
- Practical applications in validation, parsing, and text processing
- Performance and security best practices
Practice Exercises
To solidify your understanding, try these exercises:
- Create a log parser that extracts IP addresses, timestamps, and HTTP status codes from server logs
- Build a markdown parser that identifies headers, links, and code blocks
- Design a data anonymizer that masks sensitive information in text files
- Implement a URL router that matches dynamic URL patterns
- Create a SQL query validator that prevents injection attacks
Further Learning
Continue your regex journey with:
- Language-specific features: Learn regex implementations in your favorite programming languages
- Advanced topics: Recursive patterns, conditional expressions, and regex engines
- Tools and libraries: Explore regex testing tools, debuggers, and specialized libraries
- Real-world projects: Apply regex to actual problems in your development work
Remember: Regular expressions are powerful but should be used judiciously. Sometimes a simple string method or a dedicated parser is more appropriate than a complex regex. The key is knowing when and how to use regex effectively.
Happy pattern matching!