Documentation on preg_replace
preg_replace = Perform a regular expression search and replace
Searches subject for matches to pattern and replaces them with replacement.
Usage, params, and more on preg_replace
mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )
pattern
The pattern to search for. It can be either a string or an array with strings. Several PCRE modifiers are also available.
replacement
The string or an array with strings to replace. If this parameter is a string and the pattern parameter is an array, all patterns will be replaced by that string. If both pattern and replacement parameters are arrays, each pattern will be replaced by the replacement counterpart. If there are fewer elements in the replacement array than in the pattern array, any extra patterns will be replaced by an empty string.
replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. Every such reference will be replaced by the text captured by the n'th parenthesized pattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing subpattern. To use backslash in replacement, it must be doubled ("\\\\" PHP string).
When working with a replacement pattern where a backreference is immediately followed by another number (i.e.: placing a literal number immediately after a matched pattern), you cannot use the familiar \\1 notation for your backreference. \\11, for example, would confuse preg_replace() since it does not know whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use ${1}1. This creates an isolated $1 backreference, leaving the 1 as a literal.
When using the deprecated e modifier, this function escapes some characters (namely ', ", \ and NULL) in the strings that replace the backreferences. This is done to ensure that no syntax errors arise from backreference usage with either single or double quotes (e.g. 'strlen(\'$1\')+strlen("$2")'). Make sure you are aware of PHP's string syntax to know exactly how the interpreted string will look.
subject
The string or an array with strings to search and replace. If subject is an array, then the search and replace is performed on every entry of subject, and the return value is an array as well.
limit
The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no limit).
count
If specified, this variable will be filled with the number of replacements done.
preg_replace() returns an array if the subject parameter is an array, or a string otherwise.
As of PHP 5.5.0, an E_DEPRECATED level error is emitted when passing in the "\e" modifier. As of PHP 7.0.0, an E_WARNING is emitted in this case and the "\e" modifier has no effect.
Notes and warnings on preg_replace
Basic example of how to use: preg_replace
Example #1 Using backreferences followed by numeric literals
<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>
The above example will output:
April1,2003
Example #2 Using indexed arrays with preg_replace()
<?php
$string = 'The quick brown fox jumps over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
echo preg_replace($patterns, $replacements, $string);
?>
The above example will output:
The bear black slow jumps over the lazy dog.
By ksorting patterns and replacements, we should get what we wanted.
<?php
ksort($patterns);
ksort($replacements);
echo preg_replace($patterns, $replacements, $string);
?>
The above example will output:
The slow black bear jumps over the lazy dog.
Example #3 Replacing several values
<?php
$patterns = array ('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/',
'/^\s*{(\w+)}\s*=/');
$replace = array ('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
?>
The above example will output:
$startDate = 5/27/1999
Example #4 Strip whitespace
This example strips excess whitespace from a string.
<?php
$str = 'foo   o';
$str = preg_replace('/\s\s+/', ' ', $str);
// This will be 'foo o' now
echo $str;
?>
Example #5 Using the count parameter
<?php
$count = 0;
echo preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1 , $count);
echo $count; //3
?>
The above example will output:
xp***to 3
Other code examples of preg_replace being used
Post slug generator, for creating clean urls from titles.
It works with many languages.
<?php
function remove_accent($str)
{
$a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'IJ', 'ij', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ʼn', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
$b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'l', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
return str_replace($a, $b, $str);
}
function post_slug($str)
{
return strtolower(preg_replace(array('/[^a-zA-Z0-9 -]/', '/[ -]+/', '/^-|-$/'),
array('', '-', ''), remove_accent($str)));
}
?>
Example: post_slug(' -Lo#&@rem IPSUM //dolor-/sit - amet-/-consectetur! 12 -- ')
will output: lorem-ipsum-dolor-sit-amet-consectetur-12
Wasted several hours because of this:
<?php
$str='It&#039;s a string with HTML entities';
preg_replace('~&#(\d+);~e', 'code2utf($1)', $str);
?>
This code is meant to convert numeric HTML entities to UTF-8, and it does, with one exception: it mishandles codes starting with &#0.
The reason is that code2utf is called with the leading zero, exactly as the pattern matched it: code2utf(039).
And it does matter! PHP treats 039 as an octal number.
Try <?php print(011); ?>
Solution:
<?php preg_replace('~&#0*(\d+);~e', 'code2utf($1)', $str); ?>
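Because the e modifier has no effect as of PHP 7.0.0, the same conversion is usually written with preg_replace_callback() today. The following is a minimal sketch, not the original author's code: decode_numeric_entities() is a hypothetical name, and html_entity_decode() stands in for the code2utf() helper, which the note does not show. Casting the captured digits with (int) parses them as decimal, so leading zeros are harmless.

```php
<?php
// Hypothetical helper; stands in for the /e-based call in the note above.
function decode_numeric_entities(string $str): string
{
    return preg_replace_callback(
        '~&#(\d+);~',
        function (array $m): string {
            // (int) "039" is decimal 39, so the leading zero is dropped safely.
            $code = (int) $m[1];
            // Rebuild a normalized entity and let PHP decode it to UTF-8.
            return html_entity_decode("&#{$code};", ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $str
    );
}

echo decode_numeric_entities('It&#039;s a string with HTML entities'), "\n";
```

The callback receives the match array directly, so no code is built from matched text and no quoting rules apply to the captured digits.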
If you have issues where preg_replace returns an empty string, please take a look at these two ini parameters:
pcre.backtrack_limit
pcre.recursion_limit
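pcre.recursion_limit defaults to 100000; pcre.backtrack_limit defaulted to 100000 before PHP 5.3.7 and defaults to 1000000 since. If your buffer is larger than this, look to increase these two values.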
<?php
//:::replace with anything that you can do with searched string:::
//Marcin Majchrzak
//pixaltic.com
$c = "2 4 8";
echo ($c); //display:2 4 8
$cp = "/(\d)\s(\d)\s(\d)/e"; //pattern
$cr = "'\\3*\\2+\\1='.(('\\3')*('\\2')+('\\1'))"; //replacement
$c = preg_replace($cp, $cr, $c);
echo ($c); //display:8*4+2=34
?>
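The e modifier used above no longer works as of PHP 7.0.0. As a rough sketch of how the same idea might look with preg_replace_callback(), assuming the same "2 4 8" input, the arithmetic can be done in a closure instead of evaluated replacement code:

```php
<?php
// Same computation as the /e example, done in a callback instead of eval'd code.
$c = preg_replace_callback(
    '/(\d)\s(\d)\s(\d)/',
    function (array $m): string {
        // $m[1], $m[2], $m[3] hold the three captured digits as strings.
        $value = (int) $m[3] * (int) $m[2] + (int) $m[1];
        return "{$m[3]}*{$m[2]}+{$m[1]}={$value}";
    },
    '2 4 8'
);
echo $c, "\n"; // 8*4+2=34
```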
If you want to replace only the n-th occurrence of $pattern, you can use this function:
<?php
function preg_replace_nth($pattern, $replacement, $subject, $nth=1) {
return preg_replace_callback($pattern,
function($found) use (&$pattern, &$replacement, &$nth) {
$nth--;
if ($nth==0) return preg_replace($pattern, $replacement, reset($found) );
return reset($found);
}, $subject,$nth );
}
echo preg_replace_nth("/(\w+)\|/", '${1} is the 4th|', "|aa|b|cc|dd|e|ff|gg|kkk|", 4);
?>
This outputs: |aa|b|cc|dd is the 4th|e|ff|gg|kkk|
Backreferences are accepted in $replacement.
If there's a chance your replacement text contains any strings such as "$0.95", you'll need to escape those $n backreferences:
<?php
function escape_backreference($x)
{
return preg_replace('/\$(\d)/', '\\\$$1', $x);
}
?>
This function will strip all the HTML-like content in a string.
I know you can find a lot of similar content on the web, but this one is simple, fast and robust. Don't simply rely on built-in functions like strip_tags(); they don't work as well here.
Be careful, however: this is not proper validation of a string. You should use additional functions like mysql_real_escape_string and filter_var, as well as custom tests, before putting a submission into your database.
<?php
$html = <<<END
<div id="function.preg-split" class="refentry"> Bonjour1 \t
<div class="refnamediv"> Bonjour2 \t
<h1 class="refname">Bonjour3 \t</h1>
<h1 class=""">Bonjour4 \t</h1>
<h1 class="*%1">Bonjour5 \t</h1>
<body>Bonjour6 \t<//body>>
</ body>Bonjour7 \t<//// body>>
<
a href="image.php" alt="trans" / >
some leftover text...
< DIV class=noCompliant style = "text-align:left;" >
... and some other ...
< dIv > < empty> </ empty>
<p> This is yet another text <br >
that wasn't <b>compliant</b> too... <br />
</p>
<div class="noClass" > this one is better but we don't care anyway </div ><P>
<input type= "text" name ='my "name' value = "nothin really." readonly>
end of paragraph </p> </Div> </div> some trailing text
END;
// This echoes correctly all the text that is not inside HTML tags
$html_reg = '/<+\s*\/*\s*([A-Z][A-Z0-9]*)\b[^>]*\/*\s*>+/i';
echo htmlentities( preg_replace( $html_reg, '', $html ) );
// This extracts only a small portion of the text
echo htmlentities(strip_tags($html));
?>
There seems to be some unexpected behavior when using the /m modifier when the line terminators are win32 or mac format.
If you have a string like below, and try to replace dots, the regex won't replace correctly:
<?php
$s = "Testing, testing.\r\n"
. "Another testing line.\r\n"
. "Testing almost done.";
echo preg_replace('/\.$/m', '.@', $s); // only last . replaced
?>
The /m modifier doesn't seem to work properly when CRLFs or CRs are used. Make sure to convert line endings to LFs (*nix format) in your input string.
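The likely cause is that, with the default newline convention, $ in /m mode matches only directly before an LF, so a dot followed by CR is not at the end of a line as far as the pattern is concerned. If converting the line endings is not an option, one possible workaround (a sketch, assuming the same $s as above) is to allow an optional CR in a lookahead:

```php
<?php
$s = "Testing, testing.\r\n"
   . "Another testing line.\r\n"
   . "Testing almost done.";
// Allow an optional CR between the dot and the end of the line.
// The lookahead does not consume the CR, so the line endings are left untouched.
$fixed = preg_replace('/\.(?=\r?$)/m', '.@', $s);
echo $fixed, "\n";
```

With this pattern all three dots are replaced, and the CRLF terminators stay in place.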
A variable can handle a huge quantity of data but preg_replace can't.
Example :
<?php
$url = "ANY URL WITH LOTS OF DATA";
// We get all the data into $data
$data = file_get_contents($url);
// We just want to keep the content of <head>
$head = preg_replace("#(.*)<head>(.*?)</head>(.*)#is", '$2', $data);
?>
$head can have the desired content, or be empty, depends on the length of $data.
For this application, just add :
$data = substr($data, 0, 4096);
before using preg_replace, and it will work fine.
preg_replace (and other preg-functions) return null instead of a string when encountering problems you probably did not think about!
-------------------------
It may not be obvious to everybody that the function returns NULL if an error of any kind occurs. An error I stumble upon quite often is the backtracking limit:
http://de.php.net/manual/de/pcre.configuration.php#ini.pcre.backtrack-limit
When parsing HTML documents, you will encounter documents longer than 100,000 characters, which can cause certain regular expressions to fail because of the backtracking limit mentioned above.
An ungreedy regular expression ("U", http://de.php.net/manual/de/reference.pcre.pattern.modifiers.php) often does the job, but still: sometimes you just need a greedy regular expression working on long strings ...
Since an unhandled return value of NULL usually causes follow-on errors in the application, with unwanted and unforeseen consequences, I found the following solution quite helpful; it at least saves the application from crashing:
<?php
$string_after = preg_replace( '/some_regexp/', "replacement", $string_before );
// if some error occurred we go on working with the unchanged original string
if (PREG_NO_ERROR !== preg_last_error())
{
$string_after = $string_before;
// put email-sending or a log-message here
} //if
// free memory
unset( $string_before );
?>
You may or should also put a log-message or the sending of an email into the if-condition in order to get informed, once, one of your regular-expressions does not have the effect you desired it to have.
There seems to be some confusion over how greediness works. For those familiar with Regular Expressions in other languages, particularly Perl: it works like you would expect, and as documented. Greedy by default, un-greedy if you follow a quantifier with a question mark.
There is a PHP/PCRE-specific U pattern modifier that flips the greediness, so that quantifiers are by default un-greedy, and become greedy if you follow the quantifier with a question mark: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
To make things clear, a series of examples:
<?php
$preview = "a bunch of stuff <code>this that</code> and more stuff <code>with a second code block</code> then extra at the end";
$preview_default = preg_replace('/<code>(.*)<\/code>/is', "<code class=\"prettyprint\">$1</code>", $preview);
$preview_manually_ungreedy = preg_replace('/<code>(.*?)<\/code>/is', "<code class=\"prettyprint\">$1</code>", $preview);
$preview_U_default = preg_replace('/<code>(.*)<\/code>/isU', "<code class=\"prettyprint\">$1</code>", $preview);
$preview_U_manually_greedy = preg_replace('/<code>(.*?)<\/code>/isU', "<code class=\"prettyprint\">$1</code>", $preview);
echo "Default, no ?: $preview_default\n";
echo "Default, with ?: $preview_manually_ungreedy\n";
echo "U flag, no ?: $preview_U_default\n";
echo "U flag, with ?: $preview_U_manually_greedy\n";
?>
Results in this:
Default, no ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code>with a second code block</code> then extra at the end
Default, with ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code class="prettyprint">with a second code block</code> then extra at the end
U flag, no ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code class="prettyprint">with a second code block</code> then extra at the end
U flag, with ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code>with a second code block</code> then extra at the end
As expected: greedy by default, ? inverts it to ungreedy. With the U flag, un-greedy by default, ? makes it greedy.
If you use functions like scandir with user input and protect against "../" with preg_replace, make sure you run it repeatedly until preg_match no longer finds a match, because otherwise the following can happen.
If a user gives the path:
"./....//....//....//....//....//....//....//"
then your script detects every "../" and removes it, leaving:
"./../../../../../../../"
which probably goes back far enough to reach the root.
I just found this vulnerability in an old script of mine, written several years ago.
Always do:
<?php
while( preg_match( [expression], $input ) )
{
$input = preg_replace( [expression], "", $input );
}
?>
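As a concrete version of that loop, here is a minimal sketch that uses the count parameter to detect when nothing is left to remove. The function name strip_parent_refs() and the pattern are illustrative assumptions, not part of the original note; for real path checks, resolving the path with realpath() and comparing it against an allowed base directory is generally more reliable than stripping substrings.

```php
<?php
// Hypothetical helper: remove "../" repeatedly until a pass makes no replacement.
function strip_parent_refs(string $path): string
{
    do {
        // $count receives the number of replacements made in this pass.
        $path = preg_replace('~\.\./~', '', $path, -1, $count);
    } while ($count > 0);

    return $path;
}

echo strip_parent_refs('./....//....//'), "\n";
```

A single pass over './....//....//' would leave './../../'; the loop keeps going until the string is stable.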
Below is a function for converting Hebrew final characters to their
normal equivalents should they appear in the middle of a word.
The \b assertion does not treat Hebrew letters as part of a word,
so I had to work around that limitation.
<?php
$text="עברית מבולגנת";
function hebrewNotWordEndSwitch ($from, $to, $text) {
$text=
preg_replace('/'.$from.'([א-ת])/u','$2'.$to.'$1',$text);
return $text;
}
do {
$text_before=$text;
$text=hebrewNotWordEndSwitch("ך","כ",$text);
$text=hebrewNotWordEndSwitch("ם","מ",$text);
$text=hebrewNotWordEndSwitch("ן","נ",$text);
$text=hebrewNotWordEndSwitch("ף","פ",$text);
$text=hebrewNotWordEndSwitch("ץ","צ",$text);
} while ( $text_before!=$text );
print $text; // עברית מסודרת!
?>
The do-while is necessary for multiple instances of letters, such
as "אנני" which would start off as "אןןי". Note that there's still the
problem of acronyms with gershiim but that's not a difficult one
to solve. The code is in use at http://gibberish.co.il which you can
use to translate wrongly-encoded Hebrew, transliterate it, and perform some
other Hebrew-related functions.
To ensure that there will be no regular characters at the end of a
word, just convert all regular characters to their final forms, then
run this function. Enjoy!
If you would like to remove a tag along with the text inside it then use the following code.
<?php
preg_replace('/(<tag>.+?)+(<\/tag>)/i', '', $string);
?>
example
<?php $string='<span class="normalprice">55 PKR</span>'; ?>
<?php
$string = preg_replace('/(<span class="normalprice">.+?)+(<\/span>)/i', '', $string);
?>
This results in an empty string.
<?php
$string='My String <span class="normalprice">55 PKR</span>';
$string = preg_replace('/(<span class="normalprice">.+?)+(<\/span>)/i', '', $string);
?>
This results in "My String " (note the trailing space).
To split Pascal/CamelCase into Title Case (for example, converting descriptive class names for use in human-readable frontends), you can use the below function:
<?php
function expandCamelCase($source) {
return preg_replace('/(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])/', ' $1', $source);
}
?>
Before:
ExpandCamelCaseAPIDescriptorPHP5_3_4Version3_21Beta
After:
Expand Camel Case API Descriptor PHP 5_3_4 Version 3_21 Beta
To convert a string to an SEO-friendly form, do this:
<?php
$realname = "This is the string to be made SEO friendly!";
$seoname = preg_replace('/\%/',' percentage',$realname);
$seoname = preg_replace('/\@/',' at ',$seoname);
$seoname = preg_replace('/\&/',' and ',$seoname);
$seoname = preg_replace('/\s[\s]+/','-',$seoname); // Strip off multiple spaces
$seoname = preg_replace('/[\s\W]+/','-',$seoname); // Strip off spaces and non-alpha-numeric
$seoname = preg_replace('/^[\-]+/','',$seoname); // Strip off the starting hyphens
$seoname = preg_replace('/[\-]+$/','',$seoname); // Strip off the ending hyphens
$seoname = strtolower($seoname);
echo $seoname;
?>
This will print: this-is-the-string-to-be-made-seo-friendly
I use this to prevent users from overdoing repeated text. The following function only allows 3 identical characters at a time and also takes care of repetitions with whitespace added.
This means that 'haaaaaaleluuuujaaaaa' becomes 'haaaleluuujaaa' and 'I am c o o o o o o l' becomes 'I am c o o o l'
<?php
//Example of user input
$str = "aaaaaaaaaaabbccccccccaaaaad d d d d d d ddde''''''''''''";
function stripRepeat($str) {
//Do not allow repeated whitespace
$str = preg_replace("/(\s){2,}/",'$1',$str);
//Result: aaaaaaaaaaabbccccccccaaaaad d d d d d d ddde''''''''''''
//Do not allow more than 3 identical characters separated by any whitespace
$str = preg_replace('{( ?.)\1{3,}}','$1$1$1',$str);
//Final result: aaabbcccaaad d d ddde'''
return $str;
}
?>
To prevent any repetitions of characters, you only need this:
<?php
$str = preg_replace('{(.)\1+}','$1',$str);
//Result: abcad d d d d d d de'
?>
If you want to keep specific tags without allowing dangerous attributes, you can replace them with a custom placeholder before using strip_tags().
For example if you want to keep <p>:
<?php
$text = "<p>hello world</p><script>alert('hacked')</script>";
$text = str_replace("<p>", "[[[temp-tag=p]]]", $text);
$text = str_replace("</p>", "[[[temp-tag=/p]]]", $text);
$text = strip_tags($text);
$text = str_replace("[[[temp-tag=p]]]", "<p>", $text);
$text = str_replace("[[[temp-tag=/p]]]", "</p>", $text);
echo $text; // displays <p>hello world</p>alert('hacked')
?>
If you wish to allow specific tag attributes, use preg_replace() like so:
<?php
$text = preg_replace('/<(p class=".*?")>/', '[[[temp-tag=$1]]]', $text);
?>
Be careful doing this with href, though. Make sure the attribute doesn't call JavaScript.
<?php
$text = preg_replace('/href="\s*javascript:.*?"/i', "", $text);
$text = preg_replace('/<(a href=".*?")>/', '[[[temp-tag=$1]]]', $text);
?>
It may be useful to note that if you pass an associative array as the $subject parameter, the keys are preserved.
<?php
$replaced = preg_replace('/foo/', 'bar', ['first' => 'foobar', 'second' => 'barfoo']);
// $replaced is now ['first' => 'barbar', 'second' => 'barbar'].
?>
Matching substrings where the match can exist at the end of the string was non-intuitive to me.
I found this because:
strtotime() interprets 'mon' as 'Monday', but Postgres uses interval types that return short names by default, e.g. interval '1 month' returns as '1 mon'.
I used something like this:
$str = "mon month monday Mon Monday Month MONTH MON";
$strMonth = preg_replace('~(mon)([^\w]|$)~i', '$1th$2', $str);
echo "$str\n$strMonth\n";
//to output:
mon month monday Mon Monday Month MONTH MON
month month monday Month Monday Month MONTH MONth
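A shorter way to express "mon as a whole word" is the \b word-boundary assertion, which is zero-width and also matches at the end of the string. This is only a sketch of an alternative, assuming the same $str as above; it is not the original poster's approach:

```php
<?php
$str = "mon month monday Mon Monday Month MONTH MON";
// \b matches between a word character and a non-word character or string edge.
// $0 is the whole match, so the original letter case is kept.
$strMonth = preg_replace('~\bmon\b~i', '$0th', $str);
echo "$str\n$strMonth\n";
```

Because nothing after "mon" is consumed, there is no need to capture and re-insert the following character.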
[Editor's note: in this case it would be wise to rely on the preg_quote() function instead which was added for this specific purpose]
If your replacement string contains a dollar sign or a backslash, it may accidentally turn into a backreference. This will fix it.
I want to replace 'text' with '$12345' but this becomes a backreference to $12 (which doesn't exist) and then it prints the remaining '34'. The function down below will return a string that escapes the backreferences.
OUTPUT:
string(8) "some 345"
string(11) "some \12345"
string(8) "some 345"
string(11) "some $12345"
<?php
$a = 'some text';
// Either of these will backreference and fail
$b1 = '\12345'; // Should be '\\12345' to avoid backreference
$b2 = '$12345'; // Should be '\$12345' to avoid backreference
$d = array($b1, $b2);
foreach ($d as $b) {
$result1 = preg_replace('#(text)#', $b, $a); // Fails
var_dump($result1);
$result2 = preg_replace('#(text)#', preg_escape_back($b), $a); // Succeeds
var_dump($result2);
}
// Escape backreferences from string for use with regex
function preg_escape_back($string) {
// Replace $ with \$ and \ with \\
$string = preg_replace('#(?<!\\\\)(\\$|\\\\)#', '\\\\$1', $string);
return $string;
}
?>
Replacement of line numbers, with replacement limit per line.
Solution that worked for me.
I have a file of tasks, each starting with a number. Only the leading number should be removed, because the text that follows contains plenty of other numbers that must be left alone.
56 Patient A of 46 years suffering ... ...
57 Newborn of 26 weeks was ...
58 Jane, having age 18 years recollects onsets of ...
...
587 Patient of 70 years ...
etc.
<?php
// Array obtained from file
$array = file($file, true);
// Decompile array with foreach loop
foreach($array as $value)
{
// Take away numbers 100-999
// Starting from biggest
//
// % Delimiter
// ^ Make match from beginning of line
// [0-9] Range of numbers
// {3} Repetition count for the digit range (for three-digit numbers)
//
if(preg_match('%^[0-9]{3}%', $value))
{
// Reassign the modified copy to $value
$value = preg_replace('%^[0-9]{3}%', '-HERE WAS XXX NUMBER-', $value, 1);
}
// Take away numbers 10-99
elseif(preg_match('%^[0-9]{2}%', $value)) {
$value = preg_replace('%^[0-9]{2}%', '-HERE WAS XX NUMBER-', $value, 1);
}
// Take away numbers 0-9
elseif(preg_match('%^[0-9]%', $value)) {
$value = preg_replace('%^[0-9]%', '-HERE WAS X NUMBER-', $value, 1);
}
// Build array back
$arr[] = array($value);
}
?>
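If the placeholder text does not need to vary with the number of digits, the three branches can probably be collapsed into a single anchored pattern. The sketch below uses hypothetical sample lines in place of the file contents and simply removes the leading number; because ^ anchors the match to the start of each string, numbers later in the text are never touched.

```php
<?php
// Sample lines standing in for the file contents (hypothetical data).
$lines = [
    "56 Patient A of 46 years suffering ...",
    "587 Patient of 70 years ...",
];
// ^\d+ matches only the digits at the start of each string; later numbers stay.
// preg_replace accepts an array subject and returns an array.
$stripped = preg_replace('/^\d+\s*/', '', $lines);
print_r($stripped);
```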
An alternative to the method suggested by sheri: by default '$' matches only before an LF (or at the very end of the string), and the example given is a single string whose lines end in CRLF. Instead of relying on '$', match the line terminator explicitly.
Try:
<?php
// Following is 1 string containing 3 lines
$s = "Testing, testing.\r\n"
. "Another testing line.\r\n"
. "Testing almost done.";
echo preg_replace('/\.\\r\\n/m', '@\r\n', $s);
?>
This results in the string:
Testing, testing@\r\nAnother testing line@\r\nTesting almost done.
String to filename:
<?php
function string_to_filename($word) {
$tmp = preg_replace('/^\W+|\W+$/', '', $word); // remove all non-alphanumeric chars at begin & end of string
$tmp = preg_replace('/\s+/', '_', $tmp); // compress internal whitespace and replace with _
return strtolower(preg_replace('/[^\w-]/', '', $tmp)); // remove all non-alphanumeric chars except _ and -
}
?>
Returns a usable & readable filename.
A simple BB like thing..
<?php
function AddBB($var) {
$search = array(
'/\[b\](.*?)\[\/b\]/is',
'/\[i\](.*?)\[\/i\]/is',
'/\[u\](.*?)\[\/u\]/is',
'/\[img\](.*?)\[\/img\]/is',
'/\[url\](.*?)\[\/url\]/is',
'/\[url\=(.*?)\](.*?)\[\/url\]/is'
);
$replace = array(
'<strong>$1</strong>',
'<em>$1</em>',
'<u>$1</u>',
'<img src="$1" />',
'<a href="$1">$1</a>',
'<a href="$1">$2</a>'
);
$var = preg_replace ($search, $replace, $var);
return $var;
}
?>
<?php
//Be careful with UTF-8: even with Unicode and UTF-8 support enabled, a pretty odd bug occurs depending on your operating system
$str = "Hi, my name is Arié!<br />";
echo preg_replace('#\bArié\b#u', 'Gontran', $str);
//on windows system, output is "Hi, my name is Gontran<br />"
//on unix system, output is "Hi, my name is Arié<br />"
echo preg_replace('#\bArié(|\b)#u', 'Gontran', $str);
//on windows and unix system, output is "Hi, my name is Gontran<br />"
?>
Take care when you try to strip whitespaces out of an UTF-8 text. Using something like:
<?php
$text = preg_replace( "{\s+}", ' ', $text );
?>
broke the letter à in my case: it is encoded as hex c3 a0, and a0 is treated as whitespace. So use
<?php
$text = preg_replace( "{[ \t]+}", ' ', $text );
?>
to strip all spaces and tabs, or better, use a multibyte function like mb_ereg_replace.
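Another option, when the text is known to be valid UTF-8, is the u modifier, which makes PCRE treat pattern and subject as UTF-8 so that \s is tested against whole characters rather than single bytes. A minimal sketch, assuming the source file itself is saved as UTF-8 (note that with /u, preg_replace returns NULL if the subject is not valid UTF-8):

```php
<?php
$text = "voilà   \t tout";
// With /u the subject is read as UTF-8 code points, not single bytes,
// so the a0 byte inside "à" is never matched on its own.
$clean = preg_replace('/\s+/u', ' ', $text);
echo $clean, "\n";
```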
<?php
$converted =
array(
//3 of special chars
'/(;)/ie',
'/(#)/ie',
'/(&)/ie',
//MySQL reserved words!
//Check mysql website!
'/(ACTION)/ie', '/(ADD)/ie', '/(ALL)/ie', '/(ALTER)/ie', '/(ANALYZE)/ie', '/(AND)/ie', '/(AS)/ie', '/(ASC)/ie',
//remaining of special chars
'/(<)/ie', '/(>)/ie', '/(\.)/ie', '/(,)/ie', '/(\?)/ie', '/(`)/ie', '/(!)/ie', '/(@)/ie', '/(\$)/ie', '/(%)/ie', '/(\^)/ie', '/(\*)/ie', '/(\()/ie', '/(\))/ie', '/(_)/ie', '/(-)/ie', '/(\+)/ie',
'/(=)/ie', '/(\/)/ie', '/(\|)/ie', '/(\\\)/ie', "/(')/ie", '/(")/ie', '/(:)/'
);
$input_text = preg_replace($converted, "UTF_to_Unicode('\\1')", $text);
function UTF_to_Unicode($data){
//return $data;
}
?>
The above example is useful for filtering input data before saving it into a MySQL database. It does not need to be decoded again; just use UTF-8 as the charset.
Please note the escaping of special characters between the delimiters.
For filename tidying I prefer to only ALLOW certain characters rather than converting particular ones that we want to exclude. To this end I use ...
<?php
$allowed = "/[^a-z0-9\\040\\.\\-\\_\\\\]/i";
$str = preg_replace($allowed, "", $str);
?>
Allows letters a-z, digits, space (\\040), dot (\\.), hyphen (\\-), underscore (\\_) and backslash (\\\\); everything else is removed from the string.
Hi.
Not sure if this will be a great help to anyone out there, but I thought I'd post it just in case.
I was having an issue with a project that relied on $_SERVER['REQUEST_URI']. Obviously this wasn't working on IIS
(I am using mod_rewrite in Apache to call up pages from a database, and IIS doesn't set REQUEST_URI). So I knocked up this simple little preg_replace to use the query string set by IIS when redirecting to a PHP error page.
<?php
//My little IIS hack :)
if(!isset($_SERVER['REQUEST_URI'])){
$_SERVER['REQUEST_URI'] = preg_replace( '/404;([a-zA-Z]+:\/\/)(.*?)\//i', "/" , $_SERVER['QUERY_STRING'] );
}
?>
Hope this helps someone else out there trying to do the same thing :)
From what I can see, the problem is that if you go straight ahead and substitute all 'A's with 'T's, you can't tell for sure which 'T's to substitute with 'A's afterwards. This can be solved by first replacing all 'A's with another character (for instance '_' or whatever you like), then replacing all 'T's with 'A's, and then replacing all '_'s (or whatever character you chose) with 'T's:
<?php
$dna = "AGTCTGCCCTAG";
echo str_replace(array("A","G","C","T","_","-"), array("_","-","G","A","T","C"), $dna); //output will be TCAGACGGGATC
?>
Although I don't know how transliteration in perl works (though I remember that is kind of similar to the UNIX command "tr") I would suggest following function for "switching" single chars:
<?php
function switch_chars($subject,$switch_table,$unused_char="_") {
foreach ( $switch_table as $_1 => $_2 ) {
$subject = str_replace($_1,$unused_char,$subject);
$subject = str_replace($_2,$_1,$subject);
$subject = str_replace($unused_char,$_2,$subject);
}
return $subject;
}
echo switch_chars("AGTCTGCCCTAG", array("A"=>"T","G"=>"C")); //output will be TCAGACGGGATC
?>
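For single-character swaps like this, PHP's built-in strtr() may be the closest match to the UNIX tr behaviour mentioned above: it maps each character in one pass, so a character that has already been replaced is not replaced again and no placeholder is needed. A short sketch using the same DNA string:

```php
<?php
$dna = "AGTCTGCCCTAG";
// Each character of 'AGCT' maps to the character at the same position in 'TCGA'.
$complement = strtr($dna, 'AGCT', 'TCGA');
echo $complement, "\n"; // TCAGACGGGATC
```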
Also worth noting is that you can use array_keys()/array_values() with preg_replace like:
<?php
$subs = array(
'/\[b\](.+)\[\/b\]/Ui' => '<strong>$1</strong>',
'/_(.+)_/Ui' => '<em>$1</em>',
// ... further pattern => replacement pairs ...
);
$raw_text = '[b]this is bold[/b] and this is _italic!_';
$bb_text = preg_replace(array_keys($subs), array_values($subs), $raw_text);
?>