Adam Lundrigan | Blog

Dec/11

5

PHP Tip-of-the-Day™: Use { and } as PCRE delimiters

This is mentioned in the PHP Manual (here), but unfortunately they don’t go into any detail as to why you would choose to use bracket-style delimiters for your PCRE expressions. Here’s an example of why bracket delimiters very nearly == epicness:

Say, for instance, we want to rewrite URLs using patterns such as this:

$find = 'http://([^.]+).someurl.com/(.*)';
$replace = 'http://$1.newurl.com/newfolder/$2';

If we’re not careful, and use a ‘/’ delimiter without escaping, we fail:

$search_string = 'http://mysub.someurl.com/this/nifty/page.html';
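// This will fail (Unknown modifier '/'), as '/' is the delimiter
preg_replace("/{$find}/", $replace, $search_string);
// ...we have to escape every '/' in the pattern instead...yuck
// (preg_quote() is no help here: it also escapes (, ), [, ], ^, +, * and .,
// which turns the whole pattern into literal text)
preg_replace("/" . str_replace('/', '\/', $find) . "/", $replace, $search_string);
// Returns: http://mysub.newurl.com/newfolder/this/nifty/page.html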
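To see the difference between quoting a whole pattern and escaping only its delimiter, here is a minimal sketch. The variable names ($literal, $escaped and friends) are just for illustration, and the plain str_replace() assumes the pattern does not already contain a backslash-escaped '/'.

```php
<?php
$find    = 'http://([^.]+).someurl.com/(.*)';
$subject = 'http://mysub.someurl.com/this/nifty/page.html';

// preg_quote() treats $find as literal text, so the groups stop being groups.
$literal         = '/' . preg_quote($find, '/') . '/';
$literalHitsUrl  = preg_match($literal, $subject); // the real URL no longer matches
$literalHitsSelf = preg_match($literal, $find);    // only the pattern text itself matches

// Escaping just the delimiter keeps the regex meaning intact.
$escaped   = '/' . str_replace('/', '\/', $find) . '/';
$rewritten = preg_replace($escaped, 'http://$1.newurl.com/newfolder/$2', $subject);
```

So preg_quote() is the right tool when the injected text should be matched character for character, and the wrong one when the injected text is itself a pattern.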

We could use # as the delimiter, but again if we’re not careful and our pattern happens to contain an anchor reference, we fail. (A # in $search_string is harmless; only the pattern is scanned for delimiters.)

$find = 'http://([^.]+).someurl.com/(.*)#foo';
$search_string = 'http://mysub.someurl.com/this/nifty/page.html#foo';
// This will fail (Unknown modifier 'f'), as '#' is the delimiter
preg_replace("#{$find}#", $replace, $search_string);
// ...we have to escape the '#' in the pattern instead...yuck
preg_replace("#" . str_replace('#', '\#', $find) . "#", $replace, $search_string);
// Returns: http://mysub.newurl.com/newfolder/this/nifty/page.html
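A quick sanity check of where the # actually matters, as a sketch: the first call has a # only in the subject, the second has one inside the pattern. The short 'page.html#foo' pattern is made up for the demo, and the @ is there only to silence the warning.

```php
<?php
$find    = 'http://([^.]+).someurl.com/(.*)';
$replace = 'http://$1.newurl.com/newfolder/$2';

// '#' appears only in the subject: the pattern compiles and the anchor survives.
$ok = preg_replace("#{$find}#", $replace, 'http://mysub.someurl.com/page.html#foo');

// '#' appears inside the pattern: PHP ends the pattern at the first inner '#'
// and rejects the leftover "foo#" as modifiers, so the call returns null.
$broken = @preg_replace('#page.html#foo#', 'x', 'page.html#foo');
```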

However, the bracketed delimiter setup is unique: brackets appearing in the pattern body don’t need to be escaped! So, we could do something like this to extract a simple method body:

$search_string = 'function foo() { echo "bar!"; }';
preg_match_all("{{(.*)}}s", $search_string, $matches);
// Expression:  { { (.*) } }s
//              ^ open expression
//                ^ match opening brace in subject
//                  ^^^ match any character any number of times
//                       ^ match closing brace in subject
//                         ^ close expression
//                          ^ s modifier (PCRE_DOTALL): . also matches newlines
//
// $matches is now an array containing the method body!:
// Array
// (
//     [0] => Array
//         (
//             [0] => { echo "bar!"; }
//         )
// 
//     [1] => Array
//         (
//             [0] =>  echo "bar!";
//         )
// 
// )
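Applied back to the URL rewrite from earlier, brace delimiters mean neither / nor # needs any attention. A minimal sketch, reusing the $find and $replace values from above ($subject, $rewritten and $quantifierHit are names picked for the example):

```php
<?php
$find    = 'http://([^.]+).someurl.com/(.*)';
$replace = 'http://$1.newurl.com/newfolder/$2';
$subject = 'http://mysub.someurl.com/this/nifty/page.html#foo';

// Neither '/' nor '#' is special to a brace-delimited pattern,
// so $find goes in untouched.
$rewritten = preg_replace('{' . $find . '}', $replace, $subject);
// $rewritten: http://mysub.newurl.com/newfolder/this/nifty/page.html#foo

// A balanced quantifier such as {2,3} also passes through without escaping.
$quantifierHit = preg_match('{a{2,3}}', 'caaat'); // 1
```

Concatenating with single quotes sidesteps any confusion with PHP's own "{$var}" interpolation syntax inside double-quoted strings.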

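Neat, eh!? That statement, however, has one caveat…and I carefully crafted my first example to avoid that pitfall: the expression will fail unless the braces inside the pattern are balanced, meaning every { is closed by a matching } in properly nested order. PHP finds the closing delimiter by tracking nesting depth, so it doesn’t matter what a brace means to the regex or whether it sits inside a character class: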

// Attempt to use } in match (2 open, 3 closed) = fail (Unknown modifier '}')
preg_match_all("{{[^}]*}}s", $search_string, $matches);

// ...this is OK (3 open, 3 closed), but doesn't match our previously-used $search_string
preg_match_all("{{{[^}]*}}s", $search_string, $matches);

// ...so escaping the unbalanced } fixes the first failure above
// (preg_quote() would also escape [, ^, ] and *, leaving only a literal match):
preg_match_all("{{[^\\}]*}}s", $search_string, $matches);

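So, that’s unfortunate…we still have to escape braces by hand if there is a chance the injected expression could contain unbalanced {s and }s. preg_quote only helps when the injected text is meant to be matched literally, since it escapes every regex metacharacter along with the braces.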
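One way to guard against this before injecting a pattern is to run the same kind of depth count yourself. The helper below is hypothetical (braces_balanced is not a PHP built-in) and is only a sketch: it assumes PHP locates the closing delimiter by tracking nesting depth and skipping backslash-escaped characters, which is consistent with the failures shown above.

```php
<?php
// Hypothetical helper: true when $body can sit between { and } delimiters as-is.
function braces_balanced(string $body): bool
{
    $depth = 0;
    $len   = strlen($body);
    for ($i = 0; $i < $len; $i++) {
        $ch = $body[$i];
        if ($ch === '\\') {
            $i++; // a backslash-escaped character is never counted
        } elseif ($ch === '{') {
            $depth++;
        } elseif ($ch === '}') {
            if (--$depth < 0) {
                return false; // this } would be taken as the closing delimiter
            }
        }
    }
    return $depth === 0; // leftover { would swallow the real closing delimiter
}
```

If the check fails, the options are to escape the offending brace with a backslash or to fall back to a delimiter that the pattern does not contain.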
