Pattern matching with PHP: Part V

We have already looked at the sophisticated features Perl-compatible regular expressions (PCRE) can bring to PHP developers. Now we will finish off with the final significant feature, assertions, and look at the functions provided in the PCRE extension.

Assertions

Assertions allow you to test for a sub-pattern without actually consuming that sub-pattern. For example, say you wanted to match the pattern 'end' only if it was followed by the HTML tag '</p>'. We could do this using two capturing sub-patterns in a PCRE: '/(end)(<\/p>)/', but the overall match would then include the tag as well.

Using assertions, however, we can prevent the PCRE matching routine from consuming the '</p>' portion of the string. To do so, we use a look ahead assertion: '/(end)(?=<\/p>)/'. The syntax '?=' tells the PCRE extension that the following pattern is a look ahead assertion.
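To make the difference concrete, here is a minimal sketch that runs both patterns against the same string and prints the overall match. The sample string and the variable names $consumed and $asserted are chosen for illustration only.

```php
<?php
$str = "<p>Find the word at the end</p>";

// Capturing version: the closing tag becomes part of the overall match.
preg_match('/(end)(<\/p>)/', $str, $consumed);

// Look ahead version: the closing tag must follow, but is not consumed.
preg_match('/(end)(?=<\/p>)/', $str, $asserted);

echo "Without assertion: {$consumed[0]}\n"; // end</p>
echo "With look ahead:   {$asserted[0]}\n"; // end
```

Element 0 of the match array always holds the text matched by the whole pattern, which is why it is the easiest place to see what was consumed.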

Complementing this is the look behind assertion, in which the sub-pattern is matched only if it is preceded by another pattern. Say we wanted to match 'item' only if it was preceded by 'news '. The following PCRE could be used: '/(?<=news )(item)/'. Here, the syntax '?<=' tells the PCRE extension that the following pattern is a look behind assertion.
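As a quick check of the look behind pattern, the sketch below tries it against one string that contains 'news item' and one that does not. The second sample string is made up for illustration.

```php
<?php
$pattern = '/(?<=news )(item)/';

// 'item' is preceded by 'news ', so the pattern matches.
$hit = preg_match($pattern, "Yet another news item.", $m);

// 'item' is preceded by 'menu ', so the assertion fails.
$miss = preg_match($pattern, "A menu item.");

echo "First string:  $hit (matched '{$m[0]}')\n"; // 1, matched 'item'
echo "Second string: $miss\n";                    // 0
```

Note that the matched text is just 'item': the 'news ' prefix is checked but not included.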

Sub-patterns are also permitted in assertions. For example, say you want to match the pattern 'end' if it is followed by '</p>' or '.' (a full stop). The following PCRE could be used:

'/(end)(?=(<\/p>|\.))/'.

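Sub-patterns can be used in a similar way in look behind assertions, with one restriction in the PCRE versions traditionally bundled with PHP: a look behind must match text of a fixed length, so the alternatives inside a nested sub-pattern must all be the same length. For example, '/(?<=(news|review) )(item)/' is rejected, since 'news' and 'review' differ in length. Alternatives written at the top level of the assertion, as in '/(?<=news |review )(item)/', may each have their own fixed length.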
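One way around the length restriction, relying on PCRE's documented rule that top-level alternatives inside a look behind may each have their own fixed length, is to move the alternation out of the nested group. The sketch below is an illustration with made-up sample strings, not the only possible rewrite.

```php
<?php
// 'news ' (5 characters) and 'review ' (7 characters) are separate
// top-level alternatives, each of fixed length.
$pattern = '/(?<=news |review )(item)/';

$results = [];
foreach (["news item", "review item", "menu item"] as $subject) {
    $results[$subject] = preg_match($pattern, $subject);
}

print_r($results); // news item => 1, review item => 1, menu item => 0
```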

Another feature of assertions is negation. A negative assertion allows you to match a pattern only when it is not preceded or not followed by another pattern. To match a pattern 'end' not followed by '</p>', the following PCRE could be used: '/(end)(?!<\/p>)/'. Notice that the normal look ahead assertion syntax is '?=' whereas the negative look ahead assertion syntax is '?!'. Likewise, the syntax for negative look behind assertions is '?<!'.
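The following sketch exercises both negative forms. The sample strings and variable names are assumptions made for the illustration.

```php
<?php
// Negative look ahead: 'end' must NOT be followed by '</p>'.
$notBeforeTag = '/(end)(?!<\/p>)/';
$a = preg_match($notBeforeTag, "the end</p>"); // 0: the tag follows
$b = preg_match($notBeforeTag, "the end.");    // 1: a full stop follows

// Negative look behind: 'item' must NOT be preceded by 'news '.
$notAfterNews = '/(?<!news )(item)/';
$c = preg_match($notAfterNews, "news item");   // 0: 'news ' precedes
$d = preg_match($notAfterNews, "menu item");   // 1: 'menu ' precedes

echo "$a $b $c $d\n"; // 0 1 0 1
```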

PCRE functions

The PCRE extension provides functions similar to those you would be familiar with from the POSIX regular expression extension. For matching, use preg_match(), and for substitution, use preg_replace(). Consider the following examples:

01 <?php
02 $str = "<p>Find the word at the end</p>";
03 $regs = array();
04 if (preg_match('/(end)(?=(<\/p>|\.))/', $str, $regs)) {
05     echo "Word is '{$regs[1]}' followed by '{$regs[2]}'\n";
06 }
07 ?>

In this example, we match the string $str with the pattern on line 04. The pattern uses the look ahead assertion discussed above. As with the POSIX extension, we are able to store the matched sub-patterns in $regs as an array and report to the user on our matches.

01 <?php
02 $str = "Yet another news item.";
03 $str = preg_replace('/(?<=news )(item)/i', "article", $str);
04 echo $str;
05 ?>

In this example, we replace a matched pattern in the string $str. We match any instance of 'item' preceded by 'news '. Notice that only the sub-pattern 'item' is replaced, not the pattern in the look behind assertion.

The PCRE extension also comes with useful functions missing from the POSIX extension. The preg_grep() function can be used to match patterns in array data. In this sense, it treats each array element as a line of a file and behaves like the popular UNIX tool 'grep'.

01 <?php
02 $file = file('test.html');
03 $lines = preg_grep('/<h1>.*<\/h1>/iU', $file);
04 print_r($lines);
05 ?>

On line 02 of this example we read the contents of the file test.html into an array $file, one line per element of the array. On line 03, we select the lines that contain '<h1>' followed by '</h1>'. The 'i' modifier makes the match case-insensitive, so '<H1>' is also found, and the 'U' modifier makes '.*' non-greedy. preg_grep() preserves the keys of the input array, so the keys of $lines identify the lines of test.html which matched our pattern, counting from zero.
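The sketch below shows how the keys behave, using an in-memory array as a stand-in for file('test.html') so that it runs without any file on disk. The sample lines are invented, and the PREG_GREP_INVERT call is an extra illustration of a flag the function accepts.

```php
<?php
// Stand-in for file('test.html'): one array element per line.
$file = [
    "<html>\n",
    "<H1>First heading</H1>\n",
    "<p>Some text</p>\n",
    "<h1>Second</h1> and <h1>third</h1>\n",
];

$pattern = '/<h1>.*<\/h1>/iU';

// Keys of the input array are preserved in the result.
$lines = preg_grep($pattern, $file);
print_r(array_keys($lines)); // 1 and 3

// PREG_GREP_INVERT returns the elements that do NOT match.
$others = preg_grep($pattern, $file, PREG_GREP_INVERT);
print_r(array_keys($others)); // 0 and 2
```

Because the keys come straight from the input array, add one to a key to get a conventional line number.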

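Another invaluable function is preg_quote(). This function escapes any PCRE meta characters in the string passed as its first argument, preparing the string to be used as a literal inside a pattern. The optional second argument names the pattern delimiter so that it is escaped as well. For example, preg_quote('</h1>', '/') returns '\<\/h1\>': the angle brackets are escaped because they are meta characters, and the slash because it is the delimiter.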
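A short sketch of why quoting matters when a pattern is built from arbitrary text. The strings 'a.b' and 'axb' and the variable names are hypothetical and used only for the demonstration.

```php
<?php
$needle = 'a.b'; // hypothetical literal text to search for

// Unquoted, the '.' is a meta character and matches any character.
$loose = preg_match('/' . $needle . '/', 'axb');

// Quoted, the '.' only matches a literal full stop.
$strict = preg_match('/' . preg_quote($needle, '/') . '/', 'axb');

echo "Unquoted: $loose, quoted: $strict\n"; // Unquoted: 1, quoted: 0

// Passing the delimiter lets a string containing '/' be used safely.
$tag = preg_quote('</h1>', '/');
$count = preg_match_all('/' . $tag . '/i', "<h1>One</h1><H1>Two</H1>");
echo "Closing tags found: $count\n"; // 2
```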

Readers interested in extending their PCRE and PHP skills should investigate the full range of functionality available in the extension, at www.php.net/manual/en/ref.pcre.php.

Gavin Sherry

PC World