Regex Fundamentals: Complete Beginner's Guide to Regular Expressions

Master regular expressions from scratch with comprehensive examples, practical exercises, and real-world applications. Perfect for developers new to regex.

Regular expressions (regex) are one of the most powerful tools in a developer's toolkit. They provide a concise and flexible way to search, match, and manipulate text patterns. This comprehensive guide will take you from complete beginner to confident regex user with practical examples and hands-on exercises.

What are Regular Expressions?
Basic Regex Syntax and Characters
Character Classes and Ranges
Quantifiers: Controlling Repetition
Anchors and Boundaries
Groups and Capturing
Advanced: Lookahead and Lookbehind
Practical Examples and Use Cases
Common Regex Patterns Library
Debugging and Testing Regex
Best Practices and Performance Tips

What are Regular Expressions?

Regular expressions are patterns used to match character combinations in strings. Think of them as a search language that allows you to:

Find specific patterns in text (like email addresses or phone numbers)
Validate user input (ensuring data meets specific formats)
Replace or extract parts of strings based on patterns
Split strings at complex boundaries

Real-World Examples

Before diving into syntax, let's see regex in action:

// Find all email addresses in text
const text = "Contact us at support@example.com or sales@company.org";
const emailPattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
const emails = text.match(emailPattern);
console.log(emails); // ["support@example.com", "sales@company.org"]

// Validate phone number format
const phonePattern = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
console.log(phonePattern.test("(555) 123-4567")); // true
console.log(phonePattern.test("not-a-phone"));     // false

// Replace multiple spaces with single space
const messyText = "Too    many     spaces";
const cleanText = messyText.replace(/\s+/g, " ");
console.log(cleanText); // "Too many spaces"
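
The fourth use listed earlier, splitting at complex boundaries, can be sketched in the same style. The sample string and variable names here are illustrative assumptions:

```javascript
// Split on commas, semicolons, or pipes, ignoring any whitespace around them
const rawList = "apples, oranges;bananas |  grapes";
const items = rawList.split(/\s*[,;|]\s*/);
console.log(items); // ["apples", "oranges", "bananas", "grapes"]
```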

Basic Regex Syntax and Characters

Literal Characters

The simplest regex patterns match literal characters:

// Pattern: cat
// Matches: "cat", "catch", "scattered" (anywhere "cat" appears)
// Doesn't match: "Car", "CAT" (case sensitive by default)

const pattern = /cat/;
console.log(pattern.test("I have a cat"));      // true
console.log(pattern.test("Dog lover"));        // false

Special Characters (Metacharacters)

These characters have special meanings in regex:

Character	Meaning	Example
`.`	Matches any single character (except newline)	`c.t` matches "cat", "cut", "c@t"
`*`	Zero or more of the preceding character	`ca*t` matches "ct", "cat", "caat"
`+`	One or more of the preceding character	`ca+t` matches "cat", "caat" but not "ct"
`?`	Zero or one of the preceding character	`ca?t` matches "ct" and "cat"
`^`	Start of string	`^cat` matches "cat" only at the beginning
`$`	End of string	`cat$` matches "cat" only at the end
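
A quick way to check the rows of this table is to run each pattern against a matching and a non-matching input. The following is a minimal sketch; the `checks` array and its inputs are made up for illustration:

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)]
const checks = [
  [/c.t/,  "cut",    true],   // . matches any single character
  [/ca*t/, "ct",     true],   // * allows zero "a" characters
  [/ca+t/, "ct",     false],  // + needs at least one "a"
  [/ca?t/, "caat",   false],  // ? allows at most one "a"
  [/^cat/, "bobcat", false],  // ^ anchors to the start
  [/cat$/, "bobcat", true],   // $ anchors to the end
];

for (const [pattern, input, expected] of checks) {
  const actual = pattern.test(input);
  console.log(`${pattern} on "${input}": ${actual} (expected ${expected})`);
}
```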

Escaping Special Characters

To match special characters literally, escape them with a backslash:

// To match a literal dot
const pattern = /3\.14/;  // Matches "3.14"

// To match a literal question mark
const questionPattern = /How are you\?/;  // Matches "How are you?"

// To match a literal backslash
const pathPattern = /C:\\Users/;  // Matches "C:\Users"
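
When the text to match comes from a variable rather than a literal, the same escaping has to be applied programmatically before the string is passed to `new RegExp()`. The sketch below assumes a hypothetical helper named `escapeRegExp`; it is not part of the examples above:

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "3.14 (approx)";
const literalPattern = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp(userInput));                     // 3\.14 \(approx\)
console.log(literalPattern.test("pi is 3.14 (approx)"));  // true
console.log(literalPattern.test("pi is 3x14 (approx)"));  // false
```

Without the helper, the dot would match any character and the parentheses would form a group instead of matching literally.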

Flags

Flags modify how the pattern matching works:

// Case insensitive matching
const caseInsensitive = /cat/i;
console.log(caseInsensitive.test("CAT"));  // true

// Global matching (find all matches)
const text = "cat and cat and cat";
console.log(text.match(/cat/));   // ["cat"] - first match only
console.log(text.match(/cat/g));  // ["cat", "cat", "cat"] - all matches

// Multiline mode
const multiline = /^start/m;  // ^ matches start of any line, not just string

// Common flag combinations
const emailPattern = /[a-z]+@[a-z]+\.[a-z]+/gi;  // Case insensitive + global
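
The effect of the multiline flag is easiest to see by counting matches with and without it. This is a small sketch with a made-up `logText` string:

```javascript
const logText = "start: boot\nerror: disk\nstart: retry";

// Without m, ^ only matches at the very start of the string
console.log(logText.match(/^start/g).length);  // 1

// With m, ^ also matches right after each line break
console.log(logText.match(/^start/gm).length); // 2

// Flags combine: i ignores case, m lets ^ match at the second line
console.log(/^ERROR/im.test(logText));         // true
```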

Character Classes and Ranges

Basic Character Classes

Character classes match any character from a specific set:

// Match any vowel
const vowels = /[aeiou]/;
console.log(vowels.test("hello"));  // true (matches 'e')

// Match any digit
const digits = /[0123456789]/;
// Shorthand: /[0-9]/
console.log(digits.test("abc123"));  // true

// Match any letter
const letters = /[a-zA-Z]/;
console.log(letters.test("123abc"));  // true

Predefined Character Classes

Common character classes have shortcuts:

Shorthand	Equivalent	Meaning
`\d`	`[0-9]`	Any digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\s`	`[ \t\n\r\f\v]`	Any whitespace (in JavaScript, also Unicode spaces such as the non-breaking space)
`\D`	`[^0-9]`	Any non-digit
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\S`	`[^\s]`	Any non-whitespace

// Practical examples
const phoneDigits = /\d{3}-\d{3}-\d{4}/;     // 123-456-7890
const wordBoundary = /\bcat\b/;              // "cat" as whole word
const noSpaces = /\S+/;                     // Any non-whitespace sequence

// Combining character classes
const alphanumeric = /[a-zA-Z0-9]+/;        // Letters and numbers
const notVowels = /[^aeiou]/;               // Anything except vowels
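
To see the shorthand classes side by side, the sketch below runs several of them against one made-up `sample` string:

```javascript
const sample = "Order #42 shipped to Jane_Doe on 2024-01-15";

console.log(sample.match(/\d+/g));
// ["42", "2024", "01", "15"]

console.log(sample.match(/\w+/g));
// ["Order", "42", "shipped", "to", "Jane_Doe", "on", "2024", "01", "15"]

console.log(sample.split(/\s+/).length); // 7

// Negated class: everything that is neither a word character nor whitespace
console.log(sample.match(/[^\w\s]/g));   // ["#", "-", "-"]
```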

Negated Character Classes

Use ^ at the start of a character class to negate it:

// Match anything except digits
const notDigits = /[^0-9]/;
console.log(notDigits.test("abc"));   // true
console.log(notDigits.test("123"));   // false

// Match anything except vowels
const consonants = /[^aeiouAEIOU]/;
console.log(consonants.test("hello")); // true (matches 'h')

Quantifiers: Controlling Repetition

Basic Quantifiers

Quantifier	Meaning	Example
`*`	0 or more	`ab*c` matches "ac", "abc", "abbbbc"
`+`	1 or more	`ab+c` matches "abc", "abbbbc" but not "ac"
`?`	0 or 1 (optional)	`colou?r` matches "color" and "colour"

Specific Quantifiers

Use curly braces to specify exact repetition counts:

// Exact count: {n}
const exactThree = /a{3}/;           // Matches "aaa"

// Range: {min,max}
const twoToFour = /a{2,4}/;          // Matches "aa", "aaa", "aaaa"

// Minimum: {min,}
const twoOrMore = /a{2,}/;           // Matches "aa", "aaa", "aaaa", etc.

// Practical examples
const zipCode = /\d{5}/;             // 5 consecutive digits (add ^ and $ to reject longer input)
const zipPlus4 = /\d{5}-\d{4}/;      // 12345-6789
const phoneNumber = /\d{3}-\d{3}-\d{4}/; // 123-456-7890

// Password requirements
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;
// At least 8 chars with lowercase, uppercase, and digit
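
A counted quantifier only limits the repetition itself, not the text around it, so an unanchored pattern still matches inside longer input. The following sketch (variable names are illustrative) shows the difference:

```javascript
const fiveDigits = /\d{5}/;
const exactlyFiveDigits = /^\d{5}$/;

console.log(fiveDigits.test("1234567"));        // true  (finds "12345" inside)
console.log(exactlyFiveDigits.test("1234567")); // false
console.log(exactlyFiveDigits.test("12345"));   // true

// {min,max} takes as many repetitions as it can, up to max
console.log("aaaaaa".match(/a{2,4}/)[0]);       // "aaaa"
```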

Greedy vs. Non-Greedy Quantifiers

By default, quantifiers are greedy (match as much as possible):

// Greedy matching
const text = "<div>Hello</div><div>World</div>";
const greedyPattern = /<div>.*<\/div>/;
console.log(text.match(greedyPattern)[0]);
// Result: "<div>Hello</div><div>World</div>" (matches everything!)

// Non-greedy (lazy) matching - add ? after quantifier
const lazyPattern = /<div>.*?<\/div>/;
console.log(text.match(lazyPattern)[0]);
// Result: "<div>Hello</div>" (stops at first closing tag)

// More examples
const greedyQuotes = /".*"/;         // Matches from first " to last "
const lazyQuotes = /".*?"/;          // Matches individual quoted strings
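
Running the two quote patterns on a concrete string makes the difference visible. This sketch uses a made-up `quoted` string and also shows a negated character class, a common alternative to the lazy quantifier:

```javascript
const quoted = 'She said "yes" and then "no"';

// Greedy: runs from the first quote to the last one
console.log(quoted.match(/".*"/)[0]);   // "yes" and then "no"

// Lazy: stops at the nearest closing quote
console.log(quoted.match(/".*?"/g));    // ['"yes"', '"no"']

// Negated class: "anything except a quote" gives the same result here
console.log(quoted.match(/"[^"]*"/g));  // ['"yes"', '"no"']
```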

Anchors and Boundaries

String Anchors

// ^ - Start of string
const startsWithThe = /^The/;
console.log(startsWithThe.test("The quick brown fox")); // true
console.log(startsWithThe.test("A quick brown fox"));  // false

// $ - End of string
const endsWithFox = /fox$/;
console.log(endsWithFox.test("The quick brown fox")); // true
console.log(endsWithFox.test("The quick brown fox jumps")); // false

// Combining anchors for exact match
const exactMatch = /^hello$/;
console.log(exactMatch.test("hello"));        // true
console.log(exactMatch.test("hello world"));  // false

Word Boundaries

// \b - Word boundary (between word and non-word character)
const wholeWord = /\bcat\b/;
console.log(wholeWord.test("cat"));         // true
console.log(wholeWord.test("catch"));       // false
console.log(wholeWord.test("the cat"));     // true
console.log(wholeWord.test("scattered"));   // false

// \B - Non-word boundary
const notWordBoundary = /\Bcat\B/;
console.log(notWordBoundary.test("scattered")); // true
console.log(notWordBoundary.test("cat"));       // false

// Practical use: Find whole words only
function findWholeWord(text, word) {
  const pattern = new RegExp(`\\b${word}\\b`, 'gi');
  return text.match(pattern) || [];
}

console.log(findWholeWord("The cat caught a rat", "cat")); // ["cat"]
console.log(findWholeWord("The cat caught a rat", "at"));  // []

Groups and Capturing

Basic Groups

Parentheses create groups for applying quantifiers or capturing matches:

// Group with quantifier
const repeatedGroup = /(ab)+/;
console.log(repeatedGroup.test("ababab")); // true

// Without groups (wrong!)
const wrongPattern = /ab+/;  // This means "a" followed by one or more "b"
console.log(wrongPattern.test("ababab")); // true, but matches different pattern

// Alternation in groups
const colorPattern = /(red|blue|green)/;
console.log(colorPattern.test("I like blue")); // true

Capturing Groups

Groups capture matched content that you can extract:

// Extract parts of a date
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2023-12-25".match(datePattern);

console.log(match[0]); // "2023-12-25" (full match)
console.log(match[1]); // "2023" (first group - year)
console.log(match[2]); // "12" (second group - month)
console.log(match[3]); // "25" (third group - day)

// Destructuring for cleaner code
const [fullMatch, year, month, day] = "2023-12-25".match(datePattern);
console.log(`Year: ${year}, Month: ${month}, Day: ${day}`);

// Extract name parts
const namePattern = /(\w+)\s+(\w+)/;
const [, firstName, lastName] = "John Doe".match(namePattern);
console.log(`First: ${firstName}, Last: ${lastName}`);
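
Captured groups are also available inside replacement strings, and `match()` returns `null` when nothing matches, which makes direct destructuring throw. The sketch below illustrates both points; `parseIsoDate` is a hypothetical helper name:

```javascript
const isoDate = /(\d{4})-(\d{2})-(\d{2})/;

// $1, $2, $3 in the replacement string refer to the captured groups
console.log("Due 2023-12-25".replace(isoDate, "$3/$2/$1")); // "Due 25/12/2023"

// Guard against null before destructuring the match array
function parseIsoDate(text) {
  const result = text.match(isoDate);
  if (!result) return null;
  const [, year, month, day] = result;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate("2023-12-25")); // { year: 2023, month: 12, day: 25 }
console.log(parseIsoDate("no date"));    // null
```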

Non-Capturing Groups

Use (?:...) when you need grouping but don't want to capture:

// Non-capturing group
const pattern = /(?:Mr|Mrs|Ms)\.\s+(\w+)/;
const match = "Mr. Smith".match(pattern);
console.log(match[1]); // "Smith" (only the name is captured, not the title)

// Why use non-capturing groups?
// 1. Better performance (no memory allocation for capture)
// 2. Cleaner captured groups array
// 3. Still allows quantifiers and alternation

Named Capturing Groups

Modern JavaScript supports named groups for better readability:

// Named groups syntax: (?<name>pattern)
const phonePattern = /(?<area>\d{3})-(?<exchange>\d{3})-(?<number>\d{4})/;
const match = "555-123-4567".match(phonePattern);

console.log(match.groups.area);     // "555"
console.log(match.groups.exchange); // "123"
console.log(match.groups.number);   // "4567"

// Destructuring named groups
const { area, exchange, number } = "555-123-4567".match(phonePattern).groups;
console.log(`(${area}) ${exchange}-${number}`);
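
Named groups can also be referenced in replacement strings with `$<name>`, and `matchAll()` exposes `.groups` for every occurrence. A minimal sketch, assuming a made-up `contacts` string:

```javascript
const phone = /(?<area>\d{3})-(?<exchange>\d{3})-(?<number>\d{4})/g;
const contacts = "Office: 555-123-4567, Mobile: 555-987-6543";

// $<name> inserts the text captured by the named group
console.log(contacts.replace(phone, "($<area>) $<exchange>-$<number>"));
// "Office: (555) 123-4567, Mobile: (555) 987-6543"

// matchAll() yields one match object per occurrence (requires the g flag)
const lastFour = [...contacts.matchAll(phone)].map(m => m.groups.number);
console.log(lastFour); // ["4567", "6543"]
```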

Advanced: Lookahead and Lookbehind

Positive Lookahead (?=)

Match something followed by something else, without including the "something else":

// Find "Java" only when followed by "Script"
const javaScriptOnly = /Java(?=Script)/;
console.log(javaScriptOnly.test("JavaScript")); // true
console.log(javaScriptOnly.test("Java"));       // false

// Password validation: must contain a digit
const hasDigit = /^(?=.*\d).{6,}$/;
console.log(hasDigit.test("password123")); // true
console.log(hasDigit.test("password"));    // false

// Complex password validation
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;
// Must have: lowercase, uppercase, digit, special char, min 8 chars

Negative Lookahead (?!)

Match something NOT followed by something else:

// Find "Java" not followed by "Script"
const notJavaScript = /Java(?!Script)/;
console.log(notJavaScript.test("Java programming")); // true
console.log(notJavaScript.test("JavaScript"));       // false

// Find words that don't end with "ing"
// The lookahead sits at the start of the word and checks the whole word ahead
const notEndingWithIng = /\b(?!\w*ing\b)\w+\b/g;
console.log("running walking sit".match(notEndingWithIng)); // ["sit"]
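
Because lookaheads match a position rather than text, they can be used to insert characters without consuming anything. The sketch below combines a positive and a negative lookahead to add thousands separators; `addThousandsSeparators` is a hypothetical helper that assumes its input is a plain string of digits:

```javascript
// Insert a comma at every position inside the number that is followed by
// one or more groups of exactly three digits and then no further digit
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(?:\d{3})+(?!\d))/g, ",");
}

console.log(addThousandsSeparators("1234567")); // "1,234,567"
console.log(addThousandsSeparators("999"));     // "999"
```

The `\B` keeps a comma from being inserted at the very start of the number.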

Positive Lookbehind (?<=)

Match something preceded by something else:

// Find numbers preceded by "$"
const dollarAmounts = /(?<=\$)\d+/;
console.log("$100 and 50 euros".match(dollarAmounts)); // ["100"]

// Extract filenames from paths
const filename = /(?<=\/)[^/]+$/;
console.log("/path/to/file.txt".match(filename)[0]); // "file.txt"

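Negative Lookbehind (?<!)

Match something NOT preceded by something else:

// Find numbers not preceded by "$"
// (\d in the lookbehind keeps the match from starting in the middle of "100")
const notDollars = /(?<![$\d])\d+/;
console.log("$100 and 50 euros".match(notDollars)[0]); // "50"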

Practical Examples and Use Cases

Form Validation

class FormValidator {
  static email(email) {
    const pattern = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
    return pattern.test(email);
  }
  
  static phone(phone) {
    // Supports various formats: (555) 123-4567, 555-123-4567, 555.123.4567
    const pattern = /^\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})$/;
    return pattern.test(phone);
  }
  
  static password(password) {
    // At least 8 chars, one uppercase, one lowercase, one digit;
    // only letters, digits, and @$!%*?& are allowed
    const pattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d@$!%*?&]{8,}$/;
    return pattern.test(password);
  }
  
  static creditCard(card) {
    // Remove spaces and hyphens
    const cleaned = card.replace(/[\s-]/g, '');
    
    // Basic format check (13-19 digits)
    const pattern = /^\d{13,19}$/;
    
    if (!pattern.test(cleaned)) return false;
    
    // Luhn algorithm check
    return this.luhnCheck(cleaned);
  }
  
  static luhnCheck(cardNumber) {
    let sum = 0;
    let alternate = false;
    
    for (let i = cardNumber.length - 1; i >= 0; i--) {
      let n = parseInt(cardNumber.charAt(i), 10);
      
      if (alternate) {
        n *= 2;
        if (n > 9) n = (n % 10) + 1;
      }
      
      sum += n;
      alternate = !alternate;
    }
    
    return sum % 10 === 0;
  }
}

// Usage
console.log(FormValidator.email("user@example.com")); // true
console.log(FormValidator.phone("(555) 123-4567"));   // true
console.log(FormValidator.password("MyPass123"));     // true

Text Processing

class TextProcessor {
  // Extract all URLs from text
  static extractURLs(text) {
    const urlPattern = /https?:\/\/(?:[-\w.])+(?:[:\d]+)?(?:\/(?:[\w\/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:[\w.])*)?)?/g;
    return text.match(urlPattern) || [];
  }
  
  // Extract hashtags from social media text
  static extractHashtags(text) {
    const hashtagPattern = /#[a-zA-Z0-9_]+/g;
    return text.match(hashtagPattern) || [];
  }
  
  // Extract mentions (@username)
  static extractMentions(text) {
    const mentionPattern = /@[a-zA-Z0-9_]+/g;
    return text.match(mentionPattern) || [];
  }
  
  // Clean up extra whitespace
  static normalizeWhitespace(text) {
    return text
      .replace(/\s+/g, ' ')         // Multiple spaces to single space
      .replace(/^\s+|\s+$/g, '');   // Trim start and end
  }
  
  // Convert text to title case
  static toTitleCase(text) {
    return text.replace(/\b\w+/g, (word) => {
      return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
    });
  }
  
  // Extract file extensions
  static getFileExtension(filename) {
    const match = filename.match(/\.([^.]+)$/);
    return match ? match[1].toLowerCase() : null;
  }
  
  // Mask sensitive information
  static maskEmail(email) {
    return email.replace(/(.)(.*)(.@.*)/, (match, first, middle, domain) => {
      return first + '*'.repeat(middle.length) + domain;
    });
  }
  
  static maskPhone(phone) {
    return phone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) ***-$3');
  }
}

// Usage examples
const socialText = "Check out https://example.com #coding @john_doe and #javascript";
console.log(TextProcessor.extractURLs(socialText));     // ["https://example.com"]
console.log(TextProcessor.extractHashtags(socialText)); // ["#coding", "#javascript"]
console.log(TextProcessor.extractMentions(socialText)); // ["@john_doe"]

console.log(TextProcessor.toTitleCase("hello world")); // "Hello World"
console.log(TextProcessor.maskEmail("john.doe@example.com")); // "j******e@example.com"

Data Parsing and Extraction

class DataExtractor {
  // Parse CSV line with quoted fields
  static parseCSVLine(line) {
    const csvPattern = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/;
    return line.split(csvPattern).map(field => {
      // Remove surrounding quotes and unescape internal quotes
      return field.replace(/^"(.*)"$/, '$1').replace(/""/g, '"');
    });
  }
  
  // Extract dates in various formats
  static extractDates(text) {
    const datePatterns = [
      /\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b/g,        // MM/DD/YYYY
      /\b(\d{4})-(\d{2})-(\d{2})\b/g,              // YYYY-MM-DD
      /\b(\w+)\s+(\d{1,2}),?\s+(\d{4})\b/g       // Month DD, YYYY
    ];
    
    const dates = [];
    datePatterns.forEach(pattern => {
      let match;
      while ((match = pattern.exec(text)) !== null) {
        dates.push(match[0]);
      }
    });
    
    return dates;
  }
  
  // Parse log file entries
  static parseLogEntry(logLine) {
    // Common log format: IP - - [timestamp] "request" status size
    const logPattern = /^([\d.]+) - - \[([^\]]+)\] "([^"]+)" (\d{3}) (\d+|-)$/;
    const match = logLine.match(logPattern);
    
    if (match) {
      return {
        ip: match[1],
        timestamp: match[2],
        request: match[3],
        status: parseInt(match[4], 10),
        size: match[5] === '-' ? 0 : parseInt(match[5], 10)
      };
    }
    
    return null;
  }
  
  // Extract structured data from text
  static extractStructuredData(text) {
    const patterns = {
      emails: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
      phones: /(?:\(\d{3}\)|\b\d{3})[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
      urls: /https?:\/\/[^\s<>"{}|\\^`\[\]]+/g,
      ipAddresses: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
      creditCards: /\b(?:\d{4}[\s-]?){3}\d{4}\b/g,
      socialSecurity: /\b\d{3}-\d{2}-\d{4}\b/g
    };
    
    const extracted = {};
    
    for (const [key, pattern] of Object.entries(patterns)) {
      extracted[key] = text.match(pattern) || [];
    }
    
    return extracted;
  }
}

// Usage
const csvLine = '"John Doe","john@example.com","(555) 123-4567"';
console.log(DataExtractor.parseCSVLine(csvLine));
// ["John Doe", "john@example.com", "(555) 123-4567"]

const textWithDates = "Meeting on 12/25/2023 and follow-up on January 15, 2024";
console.log(DataExtractor.extractDates(textWithDates));
// ["12/25/2023", "January 15, 2024"]
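
The log parser above has no usage example, so here is a self-contained sketch of the same Common Log Format pattern, rewritten with named groups. The sample line is invented for illustration:

```javascript
// Same layout as the log pattern above, with named groups for readability
const logLinePattern =
  /^(?<ip>[\d.]+) - - \[(?<timestamp>[^\]]+)\] "(?<request>[^"]+)" (?<status>\d{3}) (?<size>\d+|-)$/;

// Made-up sample line
const sampleLine =
  '192.168.1.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326';

const parsed = sampleLine.match(logLinePattern);
if (parsed) {
  const { ip, request, status, size } = parsed.groups;
  console.log(ip);                               // "192.168.1.10"
  console.log(request);                          // "GET /index.html HTTP/1.1"
  console.log(Number(status));                   // 200
  console.log(size === "-" ? 0 : Number(size));  // 2326
}
```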

Common Regex Patterns Library

Validation Patterns

const ValidationPatterns = {
  // Email validation (comprehensive)
  email: /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/,
  
  // Phone numbers (US format)
  phoneUS: /^\+?1?[-.\s]?\(?[2-9]\d{2}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/,
  
  // URLs
  url: /^https?:\/\/(?:[-\w.])+(?:[:\d]+)?(?:\/(?:[\w\/_.])*(?:\?(?:[\w&=%.])*)?(?:#(?:[\w.])*)?)?$/,
  
  // Credit card numbers
  creditCard: {
    visa: /^4[0-9]{12}(?:[0-9]{3})?$/,
    mastercard: /^5[1-5][0-9]{14}$/,  // 51-55 prefixes only; the newer 2221-2720 range is not covered
    amex: /^3[47][0-9]{13}$/,
    discover: /^6(?:011|5[0-9]{2})[0-9]{12}$/
  },
  
  // Password strength
  password: {
    weak: /^.{6,}$/,                                    // At least 6 characters
    medium: /^(?=.*[a-zA-Z])(?=.*\d).{8,}$/,           // Letters + numbers, 8+
    strong: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/,  // Upper + lower + digit, 8+
    veryStrong: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/ // All types, 8+
  },
  
  // Date formats
  date: {
    mmddyyyy: /^(0?[1-9]|1[012])\/(0?[1-9]|[12][0-9]|3[01])\/(19|20)\d\d$/,
    ddmmyyyy: /^(0?[1-9]|[12][0-9]|3[01])\/(0?[1-9]|1[012])\/(19|20)\d\d$/,
    yyyymmdd: /^(19|20)\d\d[-.](0?[1-9]|1[012])[-.](0?[1-9]|[12][0-9]|3[01])$/,
    iso8601: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?Z?$/
  },
  
  // Postal codes
  postalCode: {
    us: /^\d{5}(-\d{4})?$/,           // 12345 or 12345-6789
    uk: /^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$/i, // UK postcode
    canada: /^[A-Z]\d[A-Z]\s?\d[A-Z]\d$/i        // Canadian postal code
  },
  
  // File types
  fileExtension: {
    image: /\.(jpg|jpeg|png|gif|bmp|svg|webp)$/i,
    document: /\.(pdf|doc|docx|txt|rtf|odt)$/i,
    video: /\.(mp4|avi|mov|wmv|flv|webm|mkv)$/i,
    audio: /\.(mp3|wav|flac|aac|ogg|wma)$/i
  }
};

// Usage helper function
function validateInput(input, type, subtype = null) {
  let pattern;
  
  if (subtype && ValidationPatterns[type] && ValidationPatterns[type][subtype]) {
    pattern = ValidationPatterns[type][subtype];
  } else if (ValidationPatterns[type] instanceof RegExp) {
    pattern = ValidationPatterns[type];
  } else {
    // Also reached when a grouped type (e.g. "creditCard") is given no valid subtype
    throw new Error(`Unknown validation type: ${type}${subtype ? `.${subtype}` : ''}`);
  }
  
  return pattern.test(input);
}

// Examples
console.log(validateInput("user@example.com", "email"));           // true
console.log(validateInput("4111111111111111", "creditCard", "visa")); // true
console.log(validateInput("StrongPass123!", "password", "strong"));   // true

Debugging and Testing Regex

Regex Testing Function

class RegexTester {
  static test(pattern, testCases) {
    console.log(`Testing pattern: ${pattern}`);
    console.log('='.repeat(50));
    
    const regex = new RegExp(pattern);
    const results = {
      passed: 0,
      failed: 0,
      total: testCases.length
    };
    
    testCases.forEach((testCase, index) => {
      const { input, expected, description } = testCase;
      const actual = regex.test(input);
      const passed = actual === expected;
      
      if (passed) {
        results.passed++;
      } else {
        results.failed++;
      }
      
      const status = passed ? '✓ PASS' : '✗ FAIL';
      const expectedStr = expected ? 'should match' : 'should NOT match';
      
      console.log(`${status} - ${description || `Test ${index + 1}`}`);
      console.log(`  Input: "${input}"`);
      console.log(`  Expected: ${expectedStr}`);
      console.log(`  Actual: ${actual ? 'matched' : 'did not match'}`);
      
      if (!passed) {
        console.log(`  ❌ Expected ${expected}, got ${actual}`);
      }
      
      console.log();
    });
    
    console.log('Summary:');
    console.log(`  Passed: ${results.passed}/${results.total}`);
    console.log(`  Failed: ${results.failed}/${results.total}`);
    console.log(`  Success Rate: ${(results.passed/results.total*100).toFixed(1)}%`);
    
    return results;
  }
  
  // Test with capture groups
  static testCaptures(pattern, testString) {
    const regex = new RegExp(pattern);
    const match = testString.match(regex);
    
    console.log(`Pattern: ${pattern}`);
    console.log(`Input: "${testString}"`);
    
    if (match) {
      console.log('✓ Match found!');
      console.log(`Full match: "${match[0]}"`);
      
      if (match.length > 1) {
        console.log('Captured groups:');
        for (let i = 1; i < match.length; i++) {
          console.log(`  Group ${i}: "${match[i]}"`);
        }
      }
      
      if (match.groups) {
        console.log('Named groups:');
        for (const [name, value] of Object.entries(match.groups)) {
          console.log(`  ${name}: "${value}"`);
        }
      }
    } else {
      console.log('✗ No match found');
    }
  }
  
  // Performance testing
  static performance(pattern, testString, iterations = 100000) {
    const regex = new RegExp(pattern);
    
    console.log(`Performance test: ${iterations} iterations`);
    console.log(`Pattern: ${pattern}`);
    console.log(`Input: "${testString}"`);
    
    const start = Date.now();
    
    for (let i = 0; i < iterations; i++) {
      regex.test(testString);
    }
    
    const end = Date.now();
    const duration = end - start;
    const opsPerSecond = Math.round(iterations / (duration / 1000));
    
    console.log(`Duration: ${duration}ms`);
    console.log(`Operations per second: ${opsPerSecond.toLocaleString()}`);
    
    return { duration, opsPerSecond };
  }
}

// Example usage
const emailTestCases = [
  { input: "user@example.com", expected: true, description: "Valid email" },
  { input: "invalid.email", expected: false, description: "Missing @ symbol" },
  { input: "user@", expected: false, description: "Missing domain" },
  { input: "@domain.com", expected: false, description: "Missing local part" },
  { input: "user@domain", expected: false, description: "Missing TLD" }
];

RegexTester.test(
  /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
  emailTestCases
);

Common Debugging Techniques

// 1. Break down complex patterns
function debugComplexPattern() {
  // Instead of this complex pattern all at once:
  const complex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;
  
  // Test each part separately:
  const parts = {
    hasLowercase: /(?=.*[a-z])/,
    hasUppercase: /(?=.*[A-Z])/,
    hasDigit: /(?=.*\d)/,
    hasSpecial: /(?=.*[!@#$%^&*])/,
    minLength: /.{8,}/
  };
  
  const testPassword = "MyPass123!";
  
  console.log('Testing password:', testPassword);
  for (const [name, pattern] of Object.entries(parts)) {
    const result = pattern.test(testPassword);
    console.log(`${name}: ${result ? '✓' : '✗'}`);
  }
}

// 2. Visualize what your pattern matches
function visualizeMatches(pattern, text) {
  const regex = new RegExp(pattern, 'g');
  let result;
  const matches = [];
  
  while ((result = regex.exec(text)) !== null) {
    matches.push({
      match: result[0],
      start: result.index,
      end: result.index + result[0].length
    });
  }
  
  console.log('Original text:', text);
  console.log('Pattern:', pattern);
  console.log('Matches found:', matches.length);
  
  matches.forEach((match, i) => {
    console.log(`Match ${i + 1}: "${match.match}" at position ${match.start}-${match.end}`);
  });
  
  // Highlight matches in text
  let highlighted = text;
  matches.reverse().forEach(match => {
    highlighted = highlighted.slice(0, match.start) + 
                 `[${highlighted.slice(match.start, match.end)}]` + 
                 highlighted.slice(match.end);
  });
  
  console.log('Highlighted:', highlighted);
}

// Usage
visualizeMatches(/\d+/g, "I have 5 cats and 10 dogs");
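
An alternative to the `exec()` loop is `String.prototype.matchAll()`, which returns the same match objects and does not loop forever on patterns that can match an empty string. The following is a sketch, with `listMatches` as a hypothetical helper name:

```javascript
// Collect every match with its start and end position
function listMatches(pattern, text) {
  // matchAll() requires the g flag, so add it if the caller left it out
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const regex = new RegExp(pattern.source, flags);

  return [...text.matchAll(regex)].map(m => ({
    match: m[0],
    start: m.index,
    end: m.index + m[0].length
  }));
}

console.log(listMatches(/\d+/, "I have 5 cats and 10 dogs"));
// [ { match: "5", start: 7, end: 8 }, { match: "10", start: 18, end: 20 } ]
```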

Best Practices and Performance Tips

Performance Optimization

// 1. Compile regex patterns once, reuse many times
// BAD: Creating new regex each time
function slowEmailValidation(emails) {
  return emails.filter(email => {
    return /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(email);
  });
}

// GOOD: Compiled once, reused
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
function fastEmailValidation(emails) {
  return emails.filter(email => EMAIL_PATTERN.test(email));
}

// 2. Use specific patterns instead of broad ones
// BAD: Too broad, potentially slow
const slowPattern = /.*@.*\..*$/;

// GOOD: Specific character classes
const fastPattern = /^[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,}$/;

// 3. Anchor your patterns when possible
// BAD: Can match anywhere in string
const unanchored = /\d{3}-\d{2}-\d{4}/;

// GOOD: Anchored for exact match
const anchored = /^\d{3}-\d{2}-\d{4}$/;

// 4. Use non-capturing groups when you don't need the capture
// BAD: Unnecessary capturing
const capturing = /(https?|ftp):\/\/([\w.-]+)/;

// GOOD: Non-capturing for protocol
const nonCapturing = /(?:https?|ftp):\/\/([\w.-]+)/;

// 5. Be careful with quantifiers
// BAD: Potential catastrophic backtracking
const dangerous = /(a+)+b/;

// GOOD: More specific, linear time
const safe = /a+b/;
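
One caveat when reusing a pattern object: a regex with the `g` (or `y`) flag remembers where its last match ended in `lastIndex`, so repeated `test()` calls on a shared instance can alternate between `true` and `false`. A short sketch of the effect, with illustrative variable names:

```javascript
const sharedGlobal = /\d+/g;

console.log(sharedGlobal.test("abc 123")); // true  (lastIndex is now 7)
console.log(sharedGlobal.test("abc 123")); // false (search resumed at index 7)
console.log(sharedGlobal.test("abc 123")); // true  (lastIndex was reset to 0)

// Without the g flag, test() always starts from the beginning
const sharedPlain = /\d+/;
console.log(sharedPlain.test("abc 123"));  // true
console.log(sharedPlain.test("abc 123"));  // true
```

For reusable validation patterns, leaving out the `g` flag avoids this problem.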

Readability and Maintenance

// 1. Use comments and variables for complex patterns
class ReadableRegex {
  static createEmailPattern() {
    // Break down email pattern into readable parts
    const localPart = '[a-zA-Z0-9._%+-]+';     // username part
    const domain = '[a-zA-Z0-9.-]+';           // domain name
    const tld = '[a-zA-Z]{2,}';                // top-level domain
    
    // Inside a template string, the backslash itself must be escaped to get \. in the regex
    return new RegExp(`^${localPart}@${domain}\\.${tld}$`);
  }
  
  static createPhonePattern() {
    const areaCode = '\\(?[2-9]\\d{2}\\)?';   // (555) or 555
    const exchange = '[2-9]\\d{2}';           // 123 (not starting with 0 or 1)
    const number = '\\d{4}';                  // 4567
    const separator = '[-.\\s]?';             // Optional separator
    
    return new RegExp(`^${areaCode}${separator}${exchange}${separator}${number}$`);
  }
}

// 2. Use the 'x' flag for verbose patterns (in languages that support it)
// JavaScript has no 'x' flag, so build the pattern from a String.raw template
// (String.raw keeps backslashes such as \d intact), then strip comments and whitespace
function createComplexPattern() {
  return new RegExp(String.raw`
    ^
    (?=.*[a-z])     # at least one lowercase letter
    (?=.*[A-Z])     # at least one uppercase letter
    (?=.*\d)        # at least one digit
    (?=.*[!@#$%^&*]) # at least one special character
    .{8,}           # minimum 8 characters
    $
  `.replace(/\s+#.*$/gm, '').replace(/\s+/g, ''));
}
}

// 3. Create a regex library for your application
class AppRegexLibrary {
  static patterns = {
    email: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
    phone: /^\(?[2-9]\d{2}\)?[-.\s]?[2-9]\d{2}[-.\s]?\d{4}$/,
    zipCode: /^\d{5}(-\d{4})?$/,
    strongPassword: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/
  };
  
  static validate(type, input) {
    const pattern = this.patterns[type];
    if (!pattern) {
      throw new Error(`Unknown pattern type: ${type}`);
    }
    return pattern.test(input);
  }
  
  static addPattern(name, pattern) {
    this.patterns[name] = pattern;
  }
}

// 4. Document your regex patterns
/**
 * Email validation pattern
 * 
 * Pattern: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
 * 
 * Explanation:
 * ^ - Start of string
 * [a-zA-Z0-9._%+-]+ - Local part: letters, digits, and common special chars
 * @ - Literal @ symbol
 * [a-zA-Z0-9.-]+ - Domain: letters, digits, dots, hyphens
 * \. - Literal dot before TLD
 * [a-zA-Z]{2,} - TLD: at least 2 letters
 * $ - End of string
 * 
 * Examples:
 * - Matches: "user@example.com", "test.email+tag@domain.co.uk"
 * - Doesn't match: "plainaddress", "@domain.com", "user@"
 */
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

Security Considerations

// 1. Always validate on the server side
// Client-side regex can be bypassed, so always re-validate on server

// 2. Use whitelist approach for security
// BAD: Trying to block dangerous characters
const blacklistApproach = /[^<>"'&]/;  // Can be bypassed

// GOOD: Only allow safe characters
const whitelistApproach = /^[a-zA-Z0-9\s.,!?-]+$/;

// 3. Be aware of ReDoS (Regular Expression Denial of Service)
// BAD: Can cause exponential backtracking
const vulnerable = /(a+)+b/;

// Test with: "a".repeat(30) + "c" - matching time roughly doubles with each extra "a"

// GOOD: Linear time complexity
const safe = /a+b/;

// 4. Sanitize input before and after regex validation
function secureValidation(input) {
  // Sanitize input
  const sanitized = input.trim().toLowerCase();
  
  // Validate with regex
  const isValid = /^[a-z0-9._-]{3,20}$/.test(sanitized);
  
  if (!isValid) {
    throw new Error('Invalid input format');
  }
  
  // Additional security checks
  const blockedWords = ['admin', 'root', 'system'];
  if (blockedWords.includes(sanitized)) {
    throw new Error('Reserved word not allowed');
  }
  
  return sanitized;
}

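// 5. Bound regex execution time in critical applications
// test() is synchronous, so a setTimeout() on the same thread cannot interrupt it.
// This Node.js version runs the match in a worker thread and terminates the
// worker on timeout (in browsers, a Web Worker plays the same role).
const { Worker } = require('worker_threads');

const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { source, flags, input } = workerData;
  parentPort.postMessage(new RegExp(source, flags).test(input));
`;

function safeRegexTest(pattern, input, timeout = 1000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { source: pattern.source, flags: pattern.flags, input }
    });
    const timer = setTimeout(() => {
      worker.terminate();
      reject(new Error('Regex execution timeout'));
    }, timeout);
    worker.once('message', (result) => {
      clearTimeout(timer);
      resolve(result);
    });
    worker.once('error', (error) => {
      clearTimeout(timer);
      reject(error);
    });
  });
}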
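
A cheaper first line of defense against runaway backtracking is to cap the input length before the regex runs at all. The sketch below assumes a hypothetical helper `testWithLengthLimit` and an arbitrary default limit of 256 characters; pick a limit that fits the field being validated:

```javascript
// Hypothetical helper: reject oversized input before the regex ever runs.
// maxLength is an assumed limit, not a recommended universal value.
function testWithLengthLimit(pattern, input, maxLength = 256) {
  if (typeof input !== "string" || input.length > maxLength) {
    return false;
  }
  return pattern.test(input);
}

const usernamePattern = /^[a-z0-9._-]{3,20}$/;

console.log(testWithLengthLimit(usernamePattern, "jane_doe"));        // true
console.log(testWithLengthLimit(usernamePattern, "a".repeat(10000))); // false (regex never runs)
```

This does not make a vulnerable pattern safe, but it bounds how much work a single request can trigger.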

Conclusion and Next Steps

Congratulations! You've completed a comprehensive journey through regular expressions. You now understand:


Basic syntax and metacharacters for building patterns
Character classes and quantifiers for flexible matching
Groups and capturing for extracting data
Advanced features like lookahead and lookbehind
Practical applications in validation, parsing, and text processing
Performance and security best practices


Practice Exercises
To solidify your understanding, try these exercises:


Create a log parser that extracts IP addresses, timestamps, and HTTP status codes from server logs
Build a markdown parser that identifies headers, links, and code blocks
Design a data anonymizer that masks sensitive information in text files
Implement a URL router that matches dynamic URL patterns
Create a SQL input checker that flags suspicious patterns (parameterized queries, not regex, remain the actual defense against injection)


Further Learning
Continue your regex journey with:


Language-specific features: Learn regex implementations in your favorite programming languages
Advanced topics: Recursive patterns, conditional expressions, and regex engines
Tools and libraries: Explore regex testing tools, debuggers, and specialized libraries
Real-world projects: Apply regex to actual problems in your development work


Remember: Regular expressions are powerful but should be used judiciously. Sometimes a simple string method or a dedicated parser is more appropriate than a complex regex. The key is knowing when and how to use regex effectively.

Happy pattern matching!
