PHP Tip-of-the-Day™: Use { and } as PCRE delimiters
Posted by Adam Lundrigan in Uncategorized
This is mentioned in the PHP Manual, but unfortunately it doesn’t go into any detail as to why you would choose to use bracket-style delimiters for your PCRE expressions. Here’s an example of why bracket delimiters very nearly == epicness:
Say, for instance, we want to rewrite URLs using patterns such as this:
    $find = 'http://([^.]+).someurl.com/(.*)';
    $replace = 'http://$1.newurl.com/newfolder/$2';
If we’re not careful, and use a ‘/’ delimiter without escaping, we fail:
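    $search_string = 'http://mysub.someurl.com/this/nifty/page.html';
    // This will fail, as '/' is the delimiter
    preg_replace("/{$find}/", $replace, $search_string);
    // ...we have to escape every '/' in the pattern instead...yuck
    // (preg_quote($find, '/') won't do here: it also escapes the regex
    // metacharacters, which turns the capture groups into literal text)
    preg_replace("/" . str_replace('/', '\/', $find) . "/", $replace, $search_string);
    // Returns: http://mysub.newurl.com/newfolder/this/nifty/page.html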
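For comparison, here is a minimal sketch of what the two escaping approaches do to the pattern. It reuses the $find and $replace values from above; the other variable names are only for illustration.

```php
<?php
$find    = 'http://([^.]+).someurl.com/(.*)';
$replace = 'http://$1.newurl.com/newfolder/$2';
$subject = 'http://mysub.someurl.com/this/nifty/page.html';

// preg_quote() escapes every regex metacharacter, not just the delimiter,
// so the pattern only matches the literal text "http://([^.]+).someurl.com/(.*)".
$quotedPattern = '/' . preg_quote($find, '/') . '/';
$resultQuoted  = preg_replace($quotedPattern, $replace, $subject);

// Escaping only the delimiter keeps the capture groups working.
$escapedPattern = '/' . str_replace('/', '\/', $find) . '/';
$resultEscaped  = preg_replace($escapedPattern, $replace, $subject);

var_dump($resultQuoted === $subject); // bool(true): nothing matched, subject unchanged
echo $resultEscaped, "\n";            // http://mysub.newurl.com/newfolder/this/nifty/page.html
```

preg_quote() is the right tool when the injected text should be matched literally; for an injected regex, only the delimiter itself needs a backslash.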
We could use # as the delimiter, but again, if we’re not careful and our $find pattern happens to contain a #, we fail. (A # in $search_string is harmless, since only the pattern is scanned for delimiters.) Say the pattern stops at the anchor reference:
    $find = 'http://([^.]+).someurl.com/([^#]*)';
    $search_string = 'http://mysub.someurl.com/this/nifty/page.html#foo';
    // This will fail, as '#' is the delimiter
    preg_replace("#{$find}#", $replace, $search_string);
    // ...we have to escape the '#' in the pattern instead...yuck
    preg_replace("#" . str_replace('#', '\#', $find) . "#", $replace, $search_string);
    // Returns: http://mysub.newurl.com/newfolder/this/nifty/page.html#foo
However, the bracketed delimiter setup is unique: brackets appearing in the pattern body don’t need to be escaped! So, we could do something like this to extract a simple method body:
    $search_string = 'function foo() { echo "bar!"; }';
    preg_match_all("{{(.*)}}s", $search_string, $matches);
    // Expression: {{(.*)}}s
    //   {     open expression
    //   {     match opening brace in subject
    //   (.*)  match any character any number of times
    //   }     match closing brace in subject
    //   }     close expression
    //   s     PCRE_DOTALL modifier: '.' also matches newlines
    //
    // $matches is now an array containing the method body!:
    // Array
    // (
    //     [0] => Array
    //         (
    //             [0] => { echo "bar!"; }
    //         )
    //
    //     [1] => Array
    //         (
    //             [0] => echo "bar!";      (plus the surrounding spaces)
    //         )
    //
    // )
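The same delimiters also cover the URL rewrite from the start of the post. A minimal sketch, reusing the original $find and $replace (the $subject and $result names are only for illustration):

```php
<?php
$find    = 'http://([^.]+).someurl.com/(.*)';
$replace = 'http://$1.newurl.com/newfolder/$2';
$subject = 'http://mysub.someurl.com/this/nifty/page.html#foo';

// Neither '/' nor '#' is a delimiter here, so the pattern needs no escaping.
// Single-quoted concatenation avoids confusion with "{$var}" interpolation.
$result = preg_replace('{' . $find . '}', $replace, $subject);

echo $result, "\n"; // http://mysub.newurl.com/newfolder/this/nifty/page.html#foo
```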
Neat, eh!? That statement, however, has one caveat…and I carefully crafted my first example to avoid it: the expression will fail if the braces in the pattern body are not balanced. PHP finds the closing delimiter by counting nested braces, so a stray } ends the expression early and a stray { leaves it unterminated:
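    // Attempt to use } in match (2 open, 3 closed) = fail (Unknown modifier '}')
    preg_match_all("{{[^}]*}}s", $search_string, $matches);
    // ...this one compiles (3 open, 3 closed), but it now requires "{{" in the
    // subject, so it doesn't match our previously-used $search_string
    preg_match_all("{{{[^}]*}}s", $search_string, $matches);
    // ...preg_quote() also avoids the error, but it escapes every metacharacter,
    // so the pattern would only match the literal text "{[^}]*}"
    preg_match_all("{" . preg_quote('{[^}]*}', '{') . "}s", $search_string, $matches);
    // ...so escaping just the unbalanced brace is what fixes the first failure above:
    preg_match_all("{{[^\\}]*}}s", $search_string, $matches);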
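So, that’s unfortunate…we still have to escape by hand if there is a chance the injected expression could contain unbalanced {s and }s: put a backslash in front of the stray brace if the injected text is a regex, or run it through preg_quote if it should be matched literally.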
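One way to check an injected expression before wrapping it is to count braces the same way. The sketch below is a hypothetical helper (bracesAreBalanced is not a PHP function), and it assumes PHP skips backslash-escaped characters and tracks nesting depth when it looks for the closing delimiter, which matches the failures shown above.

```php
<?php
/**
 * Hypothetical helper: reports whether $body can be wrapped in { } delimiters
 * as-is. Assumption: backslash-escaped characters are skipped and brace
 * nesting depth decides where the expression ends.
 */
function bracesAreBalanced(string $body): bool
{
    $depth  = 0;
    $length = strlen($body);
    for ($i = 0; $i < $length; $i++) {
        $char = $body[$i];
        if ($char === '\\') {
            $i++; // skip the escaped character
        } elseif ($char === '{') {
            $depth++;
        } elseif ($char === '}') {
            if (--$depth < 0) {
                return false; // this brace would close the expression early
            }
        }
    }
    return $depth === 0; // a leftover '{' would leave the expression open
}

var_dump(bracesAreBalanced('{(.*)}'));   // bool(true)
var_dump(bracesAreBalanced('{[^}]*}'));  // bool(false): the } inside [^}] is unbalanced
var_dump(bracesAreBalanced('{[^\}]*}')); // bool(true): the escaped brace is skipped

// A balanced body can be wrapped directly.
$body = '{[^\}]*}';
preg_match('{' . $body . '}s', 'function foo() { echo "bar!"; }', $match);
echo $match[0], "\n"; // { echo "bar!"; }
```

When the check fails, the options are the ones described above: escape the stray brace, or pick a delimiter that does not appear in the pattern.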

