Documentation on mb_split
mb_split = Split multibyte string using regular expression
Usage, params, and more on mb_split
array mb_split ( string $pattern , string $string [, int $limit = -1 ] )
pattern
The regular expression pattern.
string
The string being split.
limit
If the optional parameter limit is specified, the string is split into at most limit elements.
The result as an array.
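As a rough illustration of the parameters above, the following sketch splits a UTF-8 string with and without limit. The sample string is made up for the example, and the sketch assumes the mbstring extension is loaded and that UTF-8 is the intended regex encoding.

```php
<?php
// Assumption: both the pattern and the subject are UTF-8.
mb_regex_encoding('UTF-8');

// Sample data, invented for this example.
$cities = '東京、大阪、京都、福岡';

// No limit: split at every match of the pattern.
$all = mb_split('、', $cities);
print_r($all);

// limit = 2: at most two elements; the last one holds the unsplit remainder.
$firstAndRest = mb_split('、', $cities, 2);
print_r($firstAndRest);
```

With a limit, the final element keeps any remaining delimiters, much like explode() does.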
Notes and warnings on mb_split
Other code examples of mb_split being used
I figure most people will want a simple way to break up a multibyte string into its individual characters. Here's a function I'm using to do that. Change UTF-8 to your chosen encoding.
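<?php
function mbStringToArray ($string) {
    $array = array(); // so that an empty string returns an empty array
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen) {
        // Take the first character, then drop it from the string.
        $array[] = mb_substr($string, 0, 1, "UTF-8");
        $string = mb_substr($string, 1, $strlen, "UTF-8");
        $strlen = mb_strlen($string, "UTF-8");
    }
    return $array;
}
?>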
In addition to Sezer Yalcin's tip.
This function splits a multibyte string into an array of characters. Comparable to str_split().
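<?php
# PHP 7.4+ ships a built-in mb_str_split(); only define this fallback on older versions.
if ( !function_exists('mb_str_split') ) {
    function mb_str_split( $string ) {
        # Split at all positions not after the start: ^
        # and not before the end: $
        return preg_split('/(?<!^)(?!$)/u', $string );
    }
}
$string = '火车票';
$charlist = mb_str_split( $string );
print_r( $charlist );
?>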
# Prints:
Array
(
[0] => 火
[1] => 车
[2] => 票
)
The $pattern argument doesn't use /pattern/ delimiters, unlike other regex functions such as preg_match.
<?php
# Works. No slashes around the pattern.
print_r( mb_split("\s", "hello world") );
# Array (
# [0] => hello
# [1] => world
# )

# Doesn't work: the slashes are treated as literal characters to match.
print_r( mb_split("/\s/", "hello world") );
# Array (
# [0] => hello world
# )
?>
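To make the difference concrete, here is a small sketch that runs the same whitespace pattern through mb_split (no delimiters) and preg_split (delimiters plus the u modifier). The sample string is made up, and UTF-8 is assumed as the regex encoding.

```php
<?php
// Assumption: the subject string is UTF-8.
mb_regex_encoding('UTF-8');

// Sample text, invented for this example.
$text = "héllo wörld\tfoo";

// mb_split takes the bare pattern.
$viaMb = mb_split('\s+', $text);

// preg_split needs delimiters; the u modifier enables UTF-8 mode.
$viaPreg = preg_split('/\s+/u', $text);

// Both calls produce the same three pieces.
var_dump($viaMb === $viaPreg);
```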
A (simpler) way to extract all characters from a UTF-8 string into an array with a single call to a built-in function:
<?php
$str = 'Ма-
руся';
print_r(preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY));
?>
Output:
Array
(
[0] => М
[1] => а
[2] => -
[3] =>
[4] => р
[5] => у
[6] => с
[7] => я
)
Another way to str_split a multibyte string:
<?php
$s = 'әӘөүҗңһ';
// Convert to UTF-16, where every character of this sample string takes exactly 2 bytes.
//$temp_s = iconv('UTF-8', 'UTF-16', $s);
$temp_s = mb_convert_encoding($s, 'UTF-16', 'UTF-8');
// 4 bytes per chunk means two 2-byte characters per element; use 2 for one character each.
$temp_a = str_split($temp_s, 4);
$temp_a_len = count($temp_a);
for ($i = 0; $i < $temp_a_len; $i++) {
    //$temp_a[$i] = iconv('UTF-16', 'UTF-8', $temp_a[$i]);
    $temp_a[$i] = mb_convert_encoding($temp_a[$i], 'UTF-8', 'UTF-16');
}
echo('<pre>');
print_r($temp_a);
echo('</pre>');
// It is also possible to work directly in UTF-16:
define('SLS', mb_convert_encoding('/', 'UTF-16', 'UTF-8'));
$temp_s = mb_convert_encoding($s, 'UTF-16', 'UTF-8');
$temp_a = str_split($temp_s, 4);
$temp_s = implode(SLS, $temp_a);
$temp_s = mb_convert_encoding($temp_s, 'UTF-8', 'UTF-16');
echo($temp_s);
?>
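A variant of the same fixed-width idea, sketched with UTF-32 instead of UTF-16: in UTF-32 every code point occupies exactly 4 bytes, so a 4-byte str_split yields one character per element, including characters outside the Basic Multilingual Plane. The helper name splitCharsViaUtf32 is hypothetical and not part of mbstring; the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper, named for this example only.
function splitCharsViaUtf32(string $s): array {
    // str_split('') behaves differently across PHP versions, so handle it up front.
    if ($s === '') {
        return [];
    }
    // Every code point is exactly 4 bytes in UTF-32.
    $utf32 = mb_convert_encoding($s, 'UTF-32BE', 'UTF-8');
    $chunks = str_split($utf32, 4);
    // Convert each 4-byte chunk back to UTF-8.
    return array_map(
        function ($chunk) {
            return mb_convert_encoding($chunk, 'UTF-8', 'UTF-32BE');
        },
        $chunks
    );
}

print_r(splitCharsViaUtf32('әӘ😀'));
```

This splits by code point, so a base letter followed by a combining mark still ends up in two separate elements.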