PHP regular expressions are designed to find, extract or replace substrings within strings.
PHP string functions can do this but regular expressions allow much more sophisticated pattern matching.
Note: If a string function will do what you need, use it instead of a regular expression, since regular expressions take much more processing.
Although PHP has several sets of functions that use regular expressions, only the POSIX functions are covered here. These functions were deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so the ereg() examples below need an earlier version of PHP.
The ereg() Function
The ereg() function is case-sensitive; eregi() is the case-insensitive version.
The function returns true if the supplied pattern is found in the source string.
Simple Example of ereg()
<?php
// This returns true so prints out 1
$match = ereg ('dog', 'My dog digs for bones.');
echo ($match);
?>
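The difference between the case-sensitive and case-insensitive versions can be sketched as follows. Because ereg() and eregi() are unavailable from PHP 7.0 onward, this sketch assumes preg_match() as a stand-in, with the pattern wrapped in / delimiters and the i modifier playing the role of eregi(). That substitution is an assumption made here so the code runs on current PHP.

```php
<?php
// Assumption: preg_match() stands in for ereg()/eregi().
// preg_match() returns 1 on a match and 0 on no match.
$sentence = 'My Dog digs for bones.';

// Case-sensitive, like ereg(): 'dog' does not match 'Dog'
$sensitive = preg_match('/dog/', $sentence);

// Case-insensitive, like eregi(): the i modifier ignores case
$insensitive = preg_match('/dog/i', $sentence);

echo $sensitive, ' ', $insensitive; // prints 0 1
?>
```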
Wildcards
The . character is a wildcard that matches any single character in a pattern.
Example of ereg() with Wildcard
<?php
// This returns true so prints out 1
$match = ereg ('d.g', 'My dog digs for bones.');
echo ($match);
// This also returns true so prints out 1
$match = ereg ('d.g', 'My cat digs for bones.');
echo ($match);
?>
If you wish to match the . character itself, escape it with the \ character. The \ character must itself be escaped if it is used in the pattern.
Escape sequences within regular expressions follow the same rules as with strings, so are covered in the Strings Tutorial. There are also some tips on using escape sequences at the end of this tutorial.
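A short sketch of the difference between an unescaped and an escaped dot. It assumes preg_match() in place of ereg(), which no longer exists in PHP 7.0 and later; the sample strings are made up for illustration.

```php
<?php
// Assumption: preg_match() stands in for ereg().
// Unescaped . is a wildcard, so it also matches the x in 'wwwxdomain'
$wildcard = preg_match('/www.domain/', 'wwwxdomain');   // 1

// Escaped \. matches only a literal dot
$literal = preg_match('/www\.domain/', 'wwwxdomain');   // 0
$dotted  = preg_match('/www\.domain/', 'www.domain');   // 1

// A literal backslash must be escaped twice in a single quoted string:
// once for PHP and once for the regular expression
$backslash = preg_match('/C:\\\\temp/', 'C:\temp');     // 1

echo $wildcard, $literal, $dotted, $backslash; // prints 1011
?>
```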
Character Lists
Instead of using a wildcard to match any character, a list of characters inside brackets can be specified within a pattern.
Example of ereg() with a Character List
<?php
// This returns true for dag, deg, dig, dog and dug
$match = ereg ('d[aeiou]g', 'My dog digs.');
?>
Specifying a Range of Characters
A range of characters can be tested for by placing the - character between the first and last character in the range, e.g. [a-z] means all lowercase letters from a to z.
Note: If the - character is used at the beginning or the end of the character list then it's treated like any other character e.g. [-09] matches characters -, 0, or 9.
Example of ereg() with Character Lists
<?php
// This returns true for any string containing a number from '10' to '29'
$match = ereg ('[1-2][0-9]', '13');
?>
The ^ not Operator
A character list can instead specify the characters that must not match, by using the ^ not operator as the first character in the list.
The pattern "[^A]" will match any character except A.
Example of ereg() with ^ not
<?php
/* This returns true for any 3 letter word that
starts with p and ends with g - except pig */
$match = ereg ('p[^i]g', $word);
?>
Note: If the ^ character is used anywhere within the character list, except as the first character, then it's treated like any other character e.g. [i^] matches characters i and ^.
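The rules for ranges, the - character and the ^ character inside a list can be checked together in one sketch. It assumes preg_match() as a replacement for ereg(), which was removed in PHP 7.0; the test strings are illustrative only.

```php
<?php
// Assumption: preg_match() stands in for ereg().
// [1-2][0-9] needs two adjacent digits: 1 or 2, then 0 to 9
$range      = preg_match('/[1-2][0-9]/', '13');  // 1
$outOfRange = preg_match('/[1-2][0-9]/', '35');  // 0

// A leading - is literal, so [-09] matches -, 0 or 9
$hyphen = preg_match('/[-09]/', 'a-b');          // 1

// [^i] is negated: p, then any character except i, then g
$pug = preg_match('/p[^i]g/', 'pug');            // 1
$pig = preg_match('/p[^i]g/', 'pig');            // 0

// ^ is literal when it is not first, so [i^] matches i or ^
$caret = preg_match('/[i^]/', '2^3');            // 1
?>
```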
Using Anchors
A regular expression can specify that the pattern occurs at the start or the end of a source string.
The ^ character anchors the pattern to the start of the source string.
The $ character anchors the pattern to the end of the source string.
Example of ereg() with anchors
<?php
// This returns true if string starts with www.
$match = ereg ('^www\.', 'www.domain.com');
// This returns true if string ends with .com
$match = ereg ('\.com$', 'www.domain.com');
?>
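Without anchors a pattern matches anywhere in the source string, so using both anchors together is a common way to test the string as a whole. The sketch below assumes preg_match() in place of ereg() (removed in PHP 7.0), and the sample strings are hypothetical.

```php
<?php
// Assumption: preg_match() stands in for ereg().
// ^ only matches at the start of the source string
$start    = preg_match('/^www\./', 'www.domain.com');      // 1
$notStart = preg_match('/^www\./', 'ftp.www.domain.com');  // 0

// $ only matches at the end of the source string
$end = preg_match('/\.com$/', 'www.domain.com');           // 1

// With both anchors the pattern must span the string: two digits, 10 to 29
$whole   = preg_match('/^[1-2][0-9]$/', '13');             // 1
$tooLong = preg_match('/^[1-2][0-9]$/', '130');            // 0
?>
```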
Repeating Characters
PHP provides three operators in regular expressions that match a character repeated anywhere from zero to many times.
The ? Operator
This matches zero or one occurrence of the preceding character, e.g. ereg ('dro?p', $word) returns true for drp or drop but false for droop.
The * Operator
This matches zero to many occurrences of the preceding character, e.g. ereg ('dro*p', $word) returns true for drp, drop, droop, drooop etc.
The + Operator
This matches one to many occurrences of the preceding character, e.g. ereg ('dro+p', $word) returns false for drp but true for drop, droop, drooop etc.
The ?, * and + operators can also be used with a wildcard or a list of characters.
Example of ereg() with anchors and wildcard
<?php
/* This returns true if string starts with www.
and ends with .com */
$match = ereg ('^www\..*\.com$', 'www.foobar.com');
?>
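The behaviour of the three repetition operators can be compared side by side. This is a minimal sketch that assumes preg_match() as a stand-in for ereg(), which is not available in PHP 7.0 and later; the variable names are illustrative.

```php
<?php
// Assumption: preg_match() stands in for ereg().
$patterns = array('?' => '/dro?p/', '*' => '/dro*p/', '+' => '/dro+p/');
$words    = array('drp', 'drop', 'droop');

// Test every word against every operator (1 = match, 0 = no match)
$results = array();
foreach ($patterns as $operator => $pattern) {
    foreach ($words as $word) {
        $results[$operator][$word] = preg_match($pattern, $word);
    }
}

// Each row lists the results for drp, drop and droop in that order
foreach ($results as $operator => $row) {
    echo $operator, ': ', implode(' ', $row), "\n";
}
// prints:
// ?: 1 1 0
// *: 1 1 1
// +: 0 1 1
?>
```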
Repeating Groups of Characters
The ?, * and + operators can be used on character groups as well as individual characters by the use of parentheses.
Example of Groups of Characters
<?php
// This returns true for abc, abcabc, abcabcabc etc.
$match = ereg ('(abc)+', $string);
?>
The | or Operator
The | or operator allows alternatives within a pattern.
Example of the | or Operator
<?php
/* This returns true only if $sentence starts
with Dear Sir, or Dear Madame, */
$match = ereg ('^Dear (Sir|Madame),', $sentence);
?>
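The parentheses around the alternatives matter, because | otherwise splits the whole pattern in two. The following sketch illustrates this, assuming preg_match() in place of ereg() (removed in PHP 7.0) and using made-up sentences.

```php
<?php
// Assumption: preg_match() stands in for ereg().
$pattern = '/^Dear (Sir|Madame),/';

$sir      = preg_match($pattern, 'Dear Sir, thank you.');       // 1
$madame   = preg_match($pattern, 'Dear Madame, thank you.');    // 1
$customer = preg_match($pattern, 'Dear Customer, thank you.');  // 0

// Without parentheses the alternatives are '^Dear Sir' and 'Madame,'
// so a string that merely contains 'Madame,' also matches
$ungrouped = preg_match('/^Dear Sir|Madame,/', 'Hello Madame, welcome.'); // 1
?>
```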
The {} Braces Syntax
A fixed number of occurrences of characters from a list can be specified in braces, e.g. '[1-5]{2}' returns true for a string containing two consecutive digits in the range 1 through to 5.
You can also specify a minimum and maximum number of occurrences within the braces, e.g. '[1-6]{2,4}' returns true for a string containing two to four consecutive digits in the range 1 through to 6.
Example of ereg() with braces
<?php
// This returns true
$match = ereg ('[1-5]{2,4}', '12345678');
// This returns false
$match = ereg ('[1-5]{2,4}', '9');
?>
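The first call above returns true even though the source string has eight digits, because the pattern only needs to match somewhere within it. Adding anchors makes the braces limit the length of the whole string. This sketch assumes preg_match() as a substitute for ereg(), which was removed in PHP 7.0.

```php
<?php
// Assumption: preg_match() stands in for ereg().
// Unanchored: '1234' inside the longer string is enough for a match
$anywhere = preg_match('/[1-5]{2,4}/', '12345678');    // 1

// Anchored: the whole string must be two to four digits from 1 to 5
$tooLong  = preg_match('/^[1-5]{2,4}$/', '12345678');  // 0
$fits     = preg_match('/^[1-5]{2,4}$/', '1234');      // 1
$tooShort = preg_match('/^[1-5]{2,4}$/', '9');         // 0

// A single number in the braces fixes the count exactly
$exact    = preg_match('/^[1-5]{2}$/', '35');          // 1
$badDigit = preg_match('/^[1-5]{2}$/', '36');          // 0
?>
```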
Escape Sequences
One of the biggest problems when using regular expressions is readability. The problem is compounded when characters within patterns are the same as the special characters used as operators.
To avoid these characters being interpreted as operators, they must be escaped and this makes patterns less readable.
Using single quoted strings instead of double quoted strings for regular expressions can make things a bit easier, since single quoted strings require fewer escape sequences.
Another way to avoid unreadable escape sequences is to place characters with special meanings within a list where they don't need to be escaped.
Using Lists to Escape Special Characters
<?php
// ?, * and + need to be escaped
$match = ereg ('\?\*\+', 'Operators ?*+');
// But in a list they don't
$match = ereg ('[?*+]', 'Operators ?*+');
?>
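The $ character shows why the choice of quotes matters, since it is special both to PHP double quoted strings and to regular expressions. The sketch below assumes preg_match() in place of ereg() (unavailable from PHP 7.0) and uses a made-up price string.

```php
<?php
// Assumption: preg_match() stands in for ereg().
$text = 'Cost: $5';

// Single quotes: \$ reaches the pattern intact and matches a literal $
$single = preg_match('/\$5/', $text);           // 1

// Double quotes: PHP turns \$ into a bare $, which the pattern
// then treats as the end anchor, so nothing matches
$double = preg_match("/\$5/", $text);           // 0

// Double quotes need an extra escaped backslash to get the same pattern
$doubleEscaped = preg_match("/\\\$5/", $text);  // 1

// Inside a list the $ needs no escaping at all
$list = preg_match('/[$]5/', $text);            // 1
?>
```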
The Optional Array() Argument
The ereg() function allows an optional array variable to be passed as a third argument (arrays are covered in the Arrays Tutorial).
If a match is found then the array is populated with the parts of the source string that matched. PHP versions up to 4.1.0 always fill exactly ten elements: the complete match plus nine substrings.
The complete matched text is placed in the first element, and each matched substring is placed in one of the following elements in turn.
The substrings to be matched are arranged in groups, contained within parentheses, within the pattern.
If no matches are found then ereg() returns FALSE.
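How the array is filled can be sketched with a short pattern containing two groups. This sketch assumes preg_match() as a stand-in for ereg(), which was removed in PHP 7.0; preg_match() fills its third argument in the same order. The sentence and the variable name $parts are made up for illustration.

```php
<?php
// Assumption: preg_match() stands in for ereg().
// Two groups: the hours and the minutes either side of the colon
$found = preg_match('/([0-9]+):([0-9]+)/', 'Starts at 09:30 today', $parts);

// $parts[0] holds the text that matched, not the whole source string
// $parts[1] and $parts[2] hold the first and second groups
if ($found) {
    echo $parts[0], ' ', $parts[1], ' ', $parts[2]; // prints 09:30 09 30
}
?>
```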
Using ereg() with Optional Array Argument
<?php
/* This takes an ISO format date (YYYY-MM-DD)
and splits the year, month and day into an array
then prints it out in DD/MM/YYYY format */