Regular Expressions
Regular expressions are patterns for matching, extracting, and replacing text. In PHP you use them through the preg_* functions, which are built on the PCRE (Perl Compatible Regular Expressions) library.
Use regex when the rule is genuinely pattern-based: references, slugs, route parameters, log lines, identifiers, simple tokens, and controlled text cleanup. Do not reach for regex when a normal string function is clearer, and do not use regex to parse full HTML, XML, or programming languages.
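As a rough illustration of that trade-off, a fixed prefix or substring check is usually clearer without a pattern. This sketch assumes PHP 8.0 or later, where str_starts_with() and str_contains() exist.

```php
<?php
declare(strict_types=1);

// Assumes PHP 8.0 or later for str_starts_with() and str_contains().
$reference = 'PFZ-2026-001';

// A fixed prefix check reads clearly with a string function.
$hasPrefix = str_starts_with($reference, 'PFZ-');

// The regex version works too, but the reader has to decode the pattern.
$hasPrefixViaRegex = preg_match('/^PFZ-/', $reference) === 1;

// A fixed substring search needs no pattern either.
$mentionsYear = str_contains($reference, '2026');

echo var_export($hasPrefix, true) . ' '
    . var_export($hasPrefixViaRegex, true) . ' '
    . var_export($mentionsYear, true) . PHP_EOL;

// Prints:
// true true true
```

Both prefix checks give the same answer; the string function simply states the intent directly.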
Match a pattern
preg_match() returns 1 for a match, 0 for no match, and false for an error.
<?php
declare(strict_types=1);
$subject = 'Order #PFZ-2026-001';
if (preg_match('/PFZ-\d{4}-\d{3}/', $subject, $matches) === 1) {
    echo $matches[0] . PHP_EOL;
}
// Prints:
// PFZ-2026-001
Always compare the result strictly. A regex error and a clean "no match" are different outcomes.
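A minimal sketch of keeping those outcomes apart, using a hypothetical helper named matchesPattern() that returns a bool for match or no match and throws on a regex error. It assumes PHP 8.0 or later for preg_last_error_msg(), and uses invalid UTF-8 with the u modifier as a simple way to trigger an error.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true for a match, false for no match, exception on a regex error.
function matchesPattern(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // preg_last_error_msg() is available from PHP 8.0.
        throw new RuntimeException('Regex error: ' . preg_last_error_msg());
    }

    return $result === 1;
}

echo matchesPattern('/PFZ-\d{4}-\d{3}/', 'Order #PFZ-2026-001') ? 'match' : 'no match';
echo PHP_EOL;
echo matchesPattern('/PFZ-\d{4}-\d{3}/', 'No reference here') ? 'match' : 'no match';
echo PHP_EOL;

try {
    // "\xFF" is not valid UTF-8, so the u modifier makes preg_match() return false.
    matchesPattern('/PFZ/u', "Order \xFF");
} catch (RuntimeException $exception) {
    echo $exception->getMessage() . PHP_EOL;
}

// Prints:
// match
// no match
// followed by "Regex error: " and PCRE's description of the failure.
```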
Anchor validation patterns
Use anchors when the whole value must match the rule.
<?php
declare(strict_types=1);
function isValidOrderReference(string $reference): bool
{
    return preg_match('/^PFZ-\d{4}-\d{3}$/', $reference) === 1;
}
echo isValidOrderReference('PFZ-2026-001') ? 'valid' : 'invalid';
echo PHP_EOL;
// Prints:
// valid
Without ^ and $, the regex could match a valid-looking substring inside a larger invalid value.
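One caveat: in PCRE, $ also matches just before a final newline, so a value such as "PFZ-2026-001\n" passes a ^...$ pattern. A minimal sketch comparing $ with the \z anchor and the D modifier, both of which only accept the true end of the string:

```php
<?php
declare(strict_types=1);

$withNewline = "PFZ-2026-001\n";

// $ also matches just before a final newline, so this still counts as a match.
echo preg_match('/^PFZ-\d{4}-\d{3}$/', $withNewline) . PHP_EOL;

// \z only matches at the very end of the subject.
echo preg_match('/^PFZ-\d{4}-\d{3}\z/', $withNewline) . PHP_EOL;

// The D modifier makes $ behave like \z.
echo preg_match('/^PFZ-\d{4}-\d{3}$/D', $withNewline) . PHP_EOL;

// Prints:
// 1
// 0
// 0
```

Whether a trailing newline matters depends on where the value comes from; input read line by line from files or sockets is a common source.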
Use named captures for readable extraction
Named captures give each group a label, so the extracted parts explain themselves.
<?php
declare(strict_types=1);
$reference = 'PFZ-2026-001';
if (preg_match('/^PFZ-(?<year>\d{4})-(?<number>\d{3})$/', $reference, $matches) !== 1) {
    throw new InvalidArgumentException('Invalid order reference.');
}
echo $matches['year'] . ' / ' . $matches['number'] . PHP_EOL;
// Prints:
// 2026 / 001
This is better than relying on $matches[1] and $matches[2] in code that someone else must maintain.
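Note that PHP stores each named group twice in $matches: once under its name and once under its position. A small sketch showing the keys, and one way to keep only the named entries:

```php
<?php
declare(strict_types=1);

$reference = 'PFZ-2026-001';
preg_match('/^PFZ-(?<year>\d{4})-(?<number>\d{3})$/', $reference, $matches);

// Each named group appears twice: once by name and once by position.
echo implode(', ', array_keys($matches)) . PHP_EOL;

// Keep only the string keys if you want a clean name => value map.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
echo json_encode($named) . PHP_EOL;

// Prints:
// 0, year, 1, number, 2
// {"year":"2026","number":"001"}
```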
Match all occurrences
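Use preg_match_all() when a string may contain many matches. It returns the number of matches (or false on error), and by default $matches[0] lists every full match.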
<?php
declare(strict_types=1);
$text = 'Orders PFZ-2026-001 and PFZ-2026-002 are ready.';
preg_match_all('/PFZ-\d{4}-\d{3}/', $text, $matches);
echo implode(', ', $matches[0]) . PHP_EOL;
// Prints:
// PFZ-2026-001, PFZ-2026-002
This shape is common in import tools, log scanning, and support scripts.
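When each match has several parts, the PREG_SET_ORDER flag groups the results per match instead of per capture group, which pairs well with named captures. A sketch, using a made-up second reference for illustration:

```php
<?php
declare(strict_types=1);

$text = 'Orders PFZ-2026-001 and PFZ-2025-014 are ready.';

// PREG_SET_ORDER: $matches[0] is the first match, $matches[1] the second, and so on.
$count = preg_match_all('/PFZ-(?<year>\d{4})-(?<number>\d{3})/', $text, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo $match['year'] . ' / ' . $match['number'] . PHP_EOL;
}

echo $count . ' references found' . PHP_EOL;

// Prints:
// 2026 / 001
// 2025 / 014
// 2 references found
```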
Replace controlled patterns
Use preg_replace() for pattern-based cleanup.
<?php
declare(strict_types=1);
$title = "  New   product\tlaunch\n";
$normalised = trim(preg_replace('/\s+/', ' ', $title) ?? '');
echo $normalised . PHP_EOL;
// Prints:
// New product launch
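preg_replace() returns null when the regex engine fails, for example on invalid UTF-8 in u mode or when a backtracking limit is hit, and this can happen even with a fixed pattern. The ?? '' fallback above turns that failure into an empty string; where silently losing the text would be a problem, check for null and report preg_last_error_msg() instead.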
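A minimal sketch of handling the null explicitly, using a hypothetical helper named collapseWhitespace(). It assumes PHP 8.0 or later for preg_last_error_msg() and adds the u modifier so that invalid UTF-8 input produces an error instead of being processed byte by byte.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: collapses whitespace and turns a regex error into an exception.
function collapseWhitespace(string $value): string
{
    $result = preg_replace('/\s+/u', ' ', $value);

    if ($result === null) {
        // preg_last_error_msg() is available from PHP 8.0.
        throw new RuntimeException('Regex error: ' . preg_last_error_msg());
    }

    return trim($result);
}

echo collapseWhitespace("  New   product\tlaunch\n") . PHP_EOL;

try {
    // Invalid UTF-8 plus the u modifier makes preg_replace() return null.
    collapseWhitespace("New \xFF launch");
} catch (RuntimeException $exception) {
    echo 'Could not normalise the title.' . PHP_EOL;
}

// Prints:
// New product launch
// Could not normalise the title.
```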
Escape user-provided text
If a user-provided value is part of a regex, escape it with preg_quote().
<?php
declare(strict_types=1);
$needle = 'price (GBP)';
$text = 'The field price (GBP) is required.';
$pattern = '/' . preg_quote($needle, '/') . '/';
echo preg_match($pattern, $text) === 1 ? 'found' : 'missing';
echo PHP_EOL;
// Prints:
// found
Without escaping, characters such as (, ), ., +, and ? change the meaning of the regex.
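To see the difference, compare the same needle with and without preg_quote(). Unescaped, the parentheses become a capture group, so the pattern looks for "price GBP" and misses the literal text.

```php
<?php
declare(strict_types=1);

$needle = 'price (GBP)';
$text = 'The field price (GBP) is required.';

// Unescaped, the parentheses form a capture group, so the pattern matches "price GBP".
$unescaped = '/' . $needle . '/';

// Escaped, every special character (and the / delimiter) is matched literally.
$escaped = '/' . preg_quote($needle, '/') . '/';

echo $escaped . PHP_EOL;
echo preg_match($unescaped, $text) === 1 ? 'found' : 'missing';
echo PHP_EOL;
echo preg_match($escaped, $text) === 1 ? 'found' : 'missing';
echo PHP_EOL;

// Prints:
// /price \(GBP\)/
// missing
// found
```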
Use Unicode mode when needed
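The u modifier tells PCRE to treat both the pattern and the subject as UTF-8, so constructs such as \p{L} and the dot work on whole characters rather than single bytes. If the subject is not valid UTF-8, the match fails and preg_match() returns false.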
<?php
declare(strict_types=1);
$name = 'Nia Stone';
echo preg_match('/^[\p{L}\s]+$/u', $name) === 1 ? 'letters only' : 'invalid';
echo PHP_EOL;
// Prints:
// letters only
Unicode-aware patterns are important for names, addresses, and international text. They still need product decisions about which characters are allowed.
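A small sketch of both points: the dot counts bytes without u and characters with it, and the letters-only rule from above accepts accented letters while rejecting an apostrophe. The sample names are illustrative only.

```php
<?php
declare(strict_types=1);

// "\u{00EB}" is ë, which takes two bytes in UTF-8.
$name = "Zo\u{00EB}";

// Without u, . matches single bytes; with u, it matches whole characters.
echo preg_match_all('/./', $name) . PHP_EOL;
echo preg_match_all('/./u', $name) . PHP_EOL;

// The letters-only rule accepts accented letters but rejects an apostrophe.
echo preg_match('/^[\p{L}\s]+$/u', "Zo\u{00EB} Stone") . PHP_EOL;
echo preg_match('/^[\p{L}\s]+$/u', "Nia O'Brien") . PHP_EOL;

// Prints:
// 4
// 3
// 1
// 0
```

The last line is exactly the kind of product decision mentioned above: whether names with apostrophes or hyphens should be allowed is a rule choice, not a regex question.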
What to remember
Regex is powerful, but maintainability matters. Anchor validation patterns, use named captures, escape user-provided fragments, compare preg_match() results strictly, and prefer ordinary string functions when the rule is simple.
Practice
Task: Parse order references
Write a small parser for order references.
Requirements
- Use declare(strict_types=1);
- Accept references in the exact format PFZ-YYYY-NNN.
- Use anchors so the whole string must match.
- Use named captures for year and number.
- Return a typed array containing year as an integer and number as a string.
- Print the parsed values for one valid reference.
- Show one invalid reference by catching the exception.
- Include the expected output as comments in the same PHP code block.
The parser should reject strings that only contain a valid-looking reference as a substring.
Solution
<?php
declare(strict_types=1);
/**
 * @return array{year: int, number: string}
 */
function parseOrderReference(string $reference): array
{
    $matched = preg_match('/^PFZ-(?<year>\d{4})-(?<number>\d{3})$/', $reference, $matches);

    if ($matched !== 1) {
        throw new InvalidArgumentException('Invalid order reference.');
    }

    return [
        'year' => (int) $matches['year'],
        'number' => $matches['number'],
    ];
}

$parsed = parseOrderReference('PFZ-2026-001');
echo $parsed['year'] . ' / ' . $parsed['number'] . PHP_EOL;

try {
    parseOrderReference('Order PFZ-2026-001');
} catch (InvalidArgumentException $exception) {
    echo $exception->getMessage() . PHP_EOL;
}
// Prints:
// 2026 / 001
// Invalid order reference.
The regex is anchored, so it validates the whole reference rather than finding a matching substring. Named captures make the returned data obvious in review.
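If you also want to reject a trailing newline (which $ accepts by default), one option is the D modifier. The sketch below uses a hypothetical parseOrderReferenceStrict() variant and runs it over a few edge cases; json_encode() is only there to make the newline visible in the output.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stricter variant of the parser above.
 * The D modifier stops $ from accepting a trailing newline.
 *
 * @return array{year: int, number: string}
 */
function parseOrderReferenceStrict(string $reference): array
{
    if (preg_match('/^PFZ-(?<year>\d{4})-(?<number>\d{3})$/D', $reference, $matches) !== 1) {
        throw new InvalidArgumentException('Invalid order reference.');
    }

    return [
        'year' => (int) $matches['year'],
        'number' => $matches['number'],
    ];
}

$candidates = ['PFZ-2026-001', 'Order PFZ-2026-001', "PFZ-2026-001\n", 'pfz-2026-001', 'PFZ-26-001'];

foreach ($candidates as $candidate) {
    try {
        $parsed = parseOrderReferenceStrict($candidate);
        echo json_encode($candidate) . ' => ' . $parsed['year'] . ' / ' . $parsed['number'] . PHP_EOL;
    } catch (InvalidArgumentException $exception) {
        echo json_encode($candidate) . ' => rejected' . PHP_EOL;
    }
}

// Prints:
// "PFZ-2026-001" => 2026 / 001
// "Order PFZ-2026-001" => rejected
// "PFZ-2026-001\n" => rejected
// "pfz-2026-001" => rejected
// "PFZ-26-001" => rejected
```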