Regular Expressions

Regular expressions are patterns for matching, extracting, and replacing text. PHP exposes them through the preg_* functions, which are built on the PCRE library.

Use regex when the rule is genuinely pattern-based: references, slugs, route parameters, log lines, identifiers, simple tokens, and controlled text cleanup. Do not reach for regex when a normal string function is clearer, and do not use regex to parse full HTML, XML, or programming languages.
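
As a rough illustration of that boundary, the sketch below checks a fixed prefix and a fixed substring with plain string functions and keeps regex for the part that really has a variable shape. The values are hypothetical, and str_starts_with() and str_contains() assume PHP 8.0 or later.

```php
<?php

declare(strict_types=1);

// Hypothetical values, used only for illustration.
$reference = 'PFZ-2026-001';
$logLine = 'ERROR: payment gateway timeout';

// Fixed prefix: str_starts_with() says exactly what is being checked.
$hasPrefix = str_starts_with($reference, 'PFZ-');

// Fixed substring: str_contains() needs no delimiters or escaping.
$mentionsTimeout = str_contains($logLine, 'timeout');

// Variable shape (four digits, then three digits): this is a job for regex.
$hasShape = preg_match('/^PFZ-\d{4}-\d{3}$/', $reference) === 1;

echo ($hasPrefix ? 'prefix ok' : 'prefix missing') . PHP_EOL;
echo ($mentionsTimeout ? 'timeout found' : 'no timeout') . PHP_EOL;
echo ($hasShape ? 'shape ok' : 'shape invalid') . PHP_EOL;

// Prints:
// prefix ok
// timeout found
// shape ok
```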

Match a pattern

preg_match() returns 1 for a match, 0 for no match, and false for an error.

PHP example
<?php

declare(strict_types=1);

$subject = 'Order #PFZ-2026-001';

if (preg_match('/PFZ-\d{4}-\d{3}/', $subject, $matches) === 1) {
    echo $matches[0] . PHP_EOL;
}

// Prints:
// PFZ-2026-001

Always compare the result strictly. A regex error and a clean "no match" are different outcomes.
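
One way to keep the three outcomes apart is sketched below. The describeMatch() helper is hypothetical, and preg_last_error_msg() assumes PHP 8.0 or later. The error case is triggered with a byte that is not valid UTF-8, which makes a pattern using the u modifier fail.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: reports the three outcomes of preg_match() separately.
function describeMatch(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // preg_last_error_msg() describes what went wrong (PHP 8.0+).
        return 'error: ' . preg_last_error_msg();
    }

    return $result === 1 ? 'match' : 'no match';
}

echo describeMatch('/PFZ-\d{4}-\d{3}/', 'Order #PFZ-2026-001') . PHP_EOL;
echo describeMatch('/PFZ-\d{4}-\d{3}/', 'Order #ABC-1') . PHP_EOL;

// "\xFF" is not valid UTF-8, so the u modifier makes this call return false.
echo describeMatch('/PFZ/u', "\xFF") . PHP_EOL;

// The first two lines print:
// match
// no match
// The third line starts with "error: " followed by PHP's own message.
```

A loose check such as if (!preg_match(...)) would treat the third case as a plain "no match" and hide the failure.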

Anchor validation patterns

Use anchors when the whole value must match the rule.

PHP example
<?php

declare(strict_types=1);

function isValidOrderReference(string $reference): bool
{
    // The D modifier makes $ match only at the very end, so a trailing newline is rejected.
    return preg_match('/^PFZ-\d{4}-\d{3}$/D', $reference) === 1;
}

echo isValidOrderReference('PFZ-2026-001') ? 'valid' : 'invalid';
echo PHP_EOL;

// Prints:
// valid

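Without ^ and $, the regex could match a valid-looking substring inside a larger invalid value. Note that $ also matches just before a final newline, so use the D modifier or \z when a trailing newline must be rejected.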
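
The difference between anchoring styles is easy to see side by side. This is a small comparison sketch with hypothetical input values; the \A and \z anchors are standard PCRE and always mean the very start and very end of the subject.

```php
<?php

declare(strict_types=1);

$unanchored = '/PFZ-\d{4}-\d{3}/';
$anchored = '/^PFZ-\d{4}-\d{3}$/';
$strict = '/\APFZ-\d{4}-\d{3}\z/';

// Hypothetical inputs: one padded value, one with a trailing newline.
$padded = 'xxPFZ-2026-001xx';
$withNewline = "PFZ-2026-001\n";

$checks = [
    'unanchored, padded value' => preg_match($unanchored, $padded) === 1,
    'anchored, padded value' => preg_match($anchored, $padded) === 1,
    'anchored, trailing newline' => preg_match($anchored, $withNewline) === 1,
    'strict, trailing newline' => preg_match($strict, $withNewline) === 1,
];

foreach ($checks as $label => $matched) {
    echo $label . ': ' . ($matched ? 'match' : 'no match') . PHP_EOL;
}

// Prints:
// unanchored, padded value: match
// anchored, padded value: no match
// anchored, trailing newline: match
// strict, trailing newline: no match
```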

Use named captures for readable extraction

Named captures let you read values from $matches by name instead of by position.

PHP example
<?php

declare(strict_types=1);

$reference = 'PFZ-2026-001';

// The D modifier keeps $ from matching before a trailing newline.
if (preg_match('/^PFZ-(?<year>\d{4})-(?<number>\d{3})$/D', $reference, $matches) !== 1) {
    throw new InvalidArgumentException('Invalid order reference.');
}

echo $matches['year'] . ' / ' . $matches['number'] . PHP_EOL;

// Prints:
// 2026 / 001

This is better than relying on $matches[1] and $matches[2] in code that someone else must maintain.
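
One detail worth knowing before returning $matches from a function: PHP stores each named group under its name and under its numeric index. The sketch below, which is only one possible approach, keeps the named entries by filtering on string keys.

```php
<?php

declare(strict_types=1);

$reference = 'PFZ-2026-001';

preg_match('/^PFZ-(?<year>\d{4})-(?<number>\d{3})$/', $reference, $matches);

// $matches holds five entries: 0, 'year', 1, 'number', 2.
echo count($matches) . ' entries' . PHP_EOL;

// Keep only the entries whose key is a string, i.e. the named groups.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

foreach ($named as $name => $value) {
    echo $name . ' = ' . $value . PHP_EOL;
}

// Prints:
// 5 entries
// year = 2026
// number = 001
```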

Match all occurrences

Use preg_match_all() when a string may contain many matches.

PHP example
<?php

declare(strict_types=1);

$text = 'Orders PFZ-2026-001 and PFZ-2026-002 are ready.';

preg_match_all('/PFZ-\d{4}-\d{3}/', $text, $matches);

echo implode(', ', $matches[0]) . PHP_EOL;

// Prints:
// PFZ-2026-001, PFZ-2026-002

This approach is common in import tools, log scanning, and support scripts.
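
When each match has parts of its own, the PREG_SET_ORDER flag groups the results per match, which combines well with named captures. The following is a minimal sketch with a hypothetical input string; it also checks the return value, which is the number of matches or false on error.

```php
<?php

declare(strict_types=1);

$text = 'Orders PFZ-2026-001 and PFZ-2027-014 are ready.';

$count = preg_match_all(
    '/PFZ-(?<year>\d{4})-(?<number>\d{3})/',
    $text,
    $matches,
    PREG_SET_ORDER
);

if ($count === false) {
    throw new RuntimeException(preg_last_error_msg());
}

echo $count . ' found' . PHP_EOL;

// With PREG_SET_ORDER, each element of $matches is one complete match.
foreach ($matches as $match) {
    echo $match[0] . ' => year ' . $match['year'] . ', number ' . $match['number'] . PHP_EOL;
}

// Prints:
// 2 found
// PFZ-2026-001 => year 2026, number 001
// PFZ-2027-014 => year 2027, number 014
```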

Replace controlled patterns

Use preg_replace() for pattern-based cleanup.

PHP example
<?php

declare(strict_types=1);

$title = '  New   product   launch  ';
$normalised = trim(preg_replace('/\s+/', ' ', $title) ?? '');

echo $normalised . PHP_EOL;

// Prints:
// New product launch

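preg_replace() returns null on error. The ?? '' fallback above hides that, which is acceptable for a fixed pattern and trusted input. Check for null explicitly when the pattern or the subject comes from outside your code.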
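
A stricter variant might fail loudly instead of falling back to an empty string. The collapseWhitespace() helper below is hypothetical, and the u modifier is added here only so that the failure can be reproduced with an invalid UTF-8 byte.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: collapses runs of whitespace and refuses to hide errors.
function collapseWhitespace(string $value): string
{
    $result = preg_replace('/\s+/u', ' ', $value);

    if ($result === null) {
        throw new RuntimeException('Whitespace cleanup failed: ' . preg_last_error_msg());
    }

    return trim($result);
}

echo collapseWhitespace('  New   product   launch  ') . PHP_EOL;

try {
    // "\xFF" is not valid UTF-8, so preg_replace() returns null here.
    collapseWhitespace("broken \xFF bytes");
} catch (RuntimeException $exception) {
    echo 'cleanup failed' . PHP_EOL;
}

// Prints:
// New product launch
// cleanup failed
```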

Escape user-provided text

If a user-provided value is part of a regex, escape it with preg_quote().

PHP example
<?php

declare(strict_types=1);

$needle = 'price (GBP)';
$text = 'The field price (GBP) is required.';
$pattern = '/' . preg_quote($needle, '/') . '/';

echo preg_match($pattern, $text) === 1 ? 'found' : 'missing';
echo PHP_EOL;

// Prints:
// found

Without escaping, characters such as (, ), ., +, and ? change the meaning of the regex. The second argument to preg_quote() names the delimiter so that it is escaped as well.
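
To see the effect, the sketch below builds the same pattern with and without preg_quote(). The variable names $raw and $quoted are illustrative only.

```php
<?php

declare(strict_types=1);

$needle = 'price (GBP)';
$text = 'The field price (GBP) is required.';

// Unescaped: the parentheses become a capture group, so the pattern
// looks for "price GBP" and never sees the literal brackets.
$raw = '/' . $needle . '/';

// Escaped: the parentheses are matched literally.
$quoted = '/' . preg_quote($needle, '/') . '/';

echo $quoted . PHP_EOL;
echo (preg_match($raw, $text) === 1 ? 'raw: found' : 'raw: missing') . PHP_EOL;
echo (preg_match($quoted, $text) === 1 ? 'quoted: found' : 'quoted: missing') . PHP_EOL;

// Prints:
// /price \(GBP\)/
// raw: missing
// quoted: found
```

Here the unescaped pattern merely fails to match. With other input it could be an invalid pattern or match text the user never intended.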

Use Unicode mode when needed

The u modifier tells PCRE to treat both the pattern and the subject as UTF-8.

PHP example
<?php

declare(strict_types=1);

$name = 'Nia Stone';

echo preg_match('/^[\p{L}\s]+$/u', $name) === 1 ? 'letters and spaces' : 'invalid';
echo PHP_EOL;

// Prints:
// letters and spaces

Unicode-aware patterns are important for names, addresses, and international text. They still need product decisions about which characters are allowed.
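
The difference only shows up with non-ASCII input. The sketch below uses a hypothetical name, "Zoë Müller", written with \u{...} escapes so that it does not depend on the encoding of the source file, and compares an ASCII-only class with a Unicode-aware one.

```php
<?php

declare(strict_types=1);

// Hypothetical name containing two non-ASCII letters (U+00EB and U+00FC).
$name = "Zo\u{EB} M\u{FC}ller";

$asciiOnly = preg_match('/^[A-Za-z\s]+$/', $name) === 1;
$unicodeAware = preg_match('/^[\p{L}\s]+$/u', $name) === 1;

echo 'ASCII class: ' . ($asciiOnly ? 'valid' : 'invalid') . PHP_EOL;
echo 'Unicode class: ' . ($unicodeAware ? 'valid' : 'invalid') . PHP_EOL;

// With u, the dot matches one character; without it, one byte.
echo 'characters: ' . preg_match_all('/./u', $name) . PHP_EOL;
echo 'bytes: ' . preg_match_all('/./', $name) . PHP_EOL;

// Prints:
// ASCII class: invalid
// Unicode class: valid
// characters: 10
// bytes: 12
```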

What to remember

Regex is powerful, but maintainability matters. Anchor validation patterns, use named captures, escape user-provided fragments, compare preg_match() results strictly, and prefer ordinary string functions when the rule is simple.

Practice

Task: Parse order references

Write a small parser for order references.

Requirements

  • Use declare(strict_types=1).
  • Accept references in the exact format PFZ-YYYY-NNN.
  • Use anchors so the whole string must match.
  • Use named captures for year and number.
  • Return an array with the keys year (an integer) and number (a string, so leading zeros survive).
  • Print the parsed values for one valid reference.
  • Show one invalid reference by catching the exception.
  • Include the expected output as comments in the same PHP code block.

The parser should reject strings that only contain a valid-looking reference as a substring.

Solution
PHP example
<?php

declare(strict_types=1);

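/**
 * @return array{year: int, number: string}
 * @throws InvalidArgumentException When the reference is not in the PFZ-YYYY-NNN format.
 */
function parseOrderReference(string $reference): array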
{
    // The D modifier stops $ from matching before a trailing newline.
    $matched = preg_match('/^PFZ-(?<year>\d{4})-(?<number>\d{3})$/D', $reference, $matches);

    if ($matched !== 1) {
        throw new InvalidArgumentException('Invalid order reference.');
    }

    return [
        'year' => (int) $matches['year'],
        'number' => $matches['number'],
    ];
}

$parsed = parseOrderReference('PFZ-2026-001');

echo $parsed['year'] . ' / ' . $parsed['number'] . PHP_EOL;

try {
    parseOrderReference('Order PFZ-2026-001');
} catch (InvalidArgumentException $exception) {
    echo $exception->getMessage() . PHP_EOL;
}

// Prints:
// 2026 / 001
// Invalid order reference.

The regex is anchored, so it validates the whole reference rather than finding a matching substring. Named captures make the returned data obvious in review.