Dodgy Foreign Characters


monger
21-07-2008, 10:52 AM
Č

Right, that letter stores and displays here OK. Why can't I get it to store in my MySQL database? It converts it to a '?'.

monger
21-07-2008, 10:59 AM
It's U+010C, Latin Capital Letter C with Caron.

monger
21-07-2008, 11:01 AM
I'm assuming I could just convert and store it as some entity and then switch back on display, but is there a simpler way to do this? Or if that is the method, surely someone has already written a script covering all the dodgy foreign letters I might need.

Blak
21-07-2008, 12:04 PM
You need to change your database character encoding to UTF-8 (go for the utf8_unicode_ci collation if you don't know which one).

Your current encoding is probably something along the lines of latin1* or ascii, which don't support the vast majority of fancy characters, such as letters with accents / other modifiers or curly quote marks.
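
A minimal sketch of that effect, assuming PHP with the mbstring extension available. ISO-8859-1 stands in here for MySQL's latin1 (an approximation), and the variable names are made up for illustration. Converting a UTF-8 string into an encoding that has no slot for Č leaves a '?' in its place:

```php
<?php
// "\xC4\x8C" is the UTF-8 byte sequence for Č (U+010C);
// "\xC3\xA1" is the UTF-8 byte sequence for á (U+00E1).
$utf8Name = "\xC4\x8Cesk\xC3\xA1 Republika"; // "Česká Republika"

// Convert to ISO-8859-1. á exists there (byte 0xE1), but Č does not,
// so mbstring substitutes its default replacement character, '?'.
$latin1Name = mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8');

// Show the first byte of the converted string.
echo $latin1Name[0], "\n"; // prints "?"
```

The accented á survives because latin1 has a code for it; only characters outside that set are lost.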

monger
21-07-2008, 12:13 PM
Tried utf8_unicode_ci but it still converts to a ?. Anything else I need to do?

Changed the table collation and the whole DB collation and it's still the same.


Warning: #1265 Data truncated for column 'country_name' at row 1
SQL query:
UPDATE `db_hitbags`.`fuji_countries` SET `country_name` = 'Česká Republika' WHERE `fuji_countries`.`country_id` =9 LIMIT 1 ;

Beanz
21-07-2008, 12:22 PM
Thread fails to deliver expected BNP/Daily Mail anti-immigration rant.

Blak
21-07-2008, 12:30 PM
Are you connecting to your MySQL server using the same character set and collation? If the connection uses a different one, the data gets converted on the way in and out, and any characters the connection can't represent are turned into ?s.

Is the application that fetches the data set up to handle Unicode?
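
One way to pin the connection character set from PHP, sketched under assumptions: the host, database name and credentials below are hypothetical, buildUtf8Dsn() is an illustrative helper, and the thread never says which MySQL extension the application uses. With PDO the charset can go in the DSN (PHP honours it from 5.3.6 onward); the rough equivalent elsewhere is issuing SET NAMES utf8 right after connecting.

```php
<?php
/**
 * Build a PDO MySQL DSN that asks for a UTF-8 connection.
 * Hypothetical helper, not part of any library.
 */
function buildUtf8Dsn(string $host, string $dbName): string
{
    // charset=utf8 makes the client/server connection use UTF-8,
    // so text is not converted through latin1 on the way in or out.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8', $host, $dbName);
}

$dsn = buildUtf8Dsn('localhost', 'example_db');
echo $dsn, "\n";

// Usage (needs a real server, so left commented out):
// $pdo = new PDO($dsn, 'example_user', 'example_password');
```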

Colio
21-07-2008, 12:31 PM
monger obviously has very tight borders allowing no-foreign material through.

monger
21-07-2008, 01:49 PM
Not sure how to check this. I'm using phpMyAdmin.

Screenshot attached if it means anything to you.

Blak
21-07-2008, 02:55 PM
Your connection collation (just below the database ch****t) is UTF8, so that's fine.

Just noticed the error at the bottom: "The mbstring PHP extension was not found and you seem to be using a multibyte ch****t..". You should get that sorted, as mbstring enables PHP to handle multibyte strings (which is what the extended characters in UTF-8 are: for some characters you need more than one byte to store them, and I would wager Č is one of them).


edit - rofflechops at filtering chAR SEt.
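
A quick sketch of what "multibyte" means for Č, assuming PHP with mbstring loaded (variable names are illustrative). The character is written as an explicit byte escape so the result does not depend on how the script file is saved:

```php
<?php
// Č (U+010C) encoded as UTF-8 is the two bytes 0xC4 0x8C.
$cCaron = "\xC4\x8C";

$byteLength = strlen($cCaron);             // counts bytes
$charLength = mb_strlen($cCaron, 'UTF-8'); // counts characters
$hexBytes   = bin2hex($cCaron);            // raw bytes as hex

printf("bytes: %d, characters: %d, hex: %s\n", $byteLength, $charLength, $hexBytes);
// prints "bytes: 2, characters: 1, hex: c48c"
```

Any function that works byte by byte will see two separate values here rather than one letter.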

olobley
21-07-2008, 03:00 PM
Try again now Mong, I've added the multibyte dll to the PHP environment.

monger
21-07-2008, 03:13 PM
cheers guys. storing in the DB fine now.

Now i just gotta get my php to display it correctly.

See drop down box on:
http://fuji.mhorner.co.uk/warranty/?c=ez

When I enable this line it looks even worse:
header('Content-Type: text/html; char set=utf-8'); (with no space between char and set)
link below shows what it looks like when this line is active
http://fuji.mhorner.co.uk/warranty/?c=ez&header-char-set=1

defo showing right in DB though
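
A sketch of the display side, assuming the text really does arrive from MySQL as UTF-8 (the thread does not confirm this). renderCountryOption() is a hypothetical helper, not code from the site. If the page is declared as UTF-8 and the strings are UTF-8, the characters can be printed as they are, with only the HTML-special ones escaped:

```php
<?php
/**
 * Escape a UTF-8 string for HTML and wrap it in an <option> element.
 * Hypothetical helper for illustration only.
 */
function renderCountryOption(string $countryName): string
{
    // Telling htmlspecialchars() the input is UTF-8 keeps multibyte
    // characters such as Č intact and escapes only & < > " '.
    $escaped = htmlspecialchars($countryName, ENT_QUOTES, 'UTF-8');
    return '<option>' . $escaped . '</option>';
}

// Declare the page encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderCountryOption("\xC4\x8Cesk\xC3\xA1 Republika"), "\n"; // "Česká Republika"
```

If some strings on the page are still latin1, declaring UTF-8 would garble those instead, which might be one reason a page looks worse after the header is added.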

Blak
21-07-2008, 03:38 PM
You'll have to convert the special characters to their decimal entity equivalents (you can't just use htmlentities(), as Č doesn't have a named entity the way " has &quot;).

This function, nabbed off php.net, should do the trick.

function special2decimalentity($string) {
    if (strlen($string) == 0) { return $string; }

    // Split the string into single bytes (the pattern has no /u modifier)
    $string = preg_split("//", $string, -1, PREG_SPLIT_NO_EMPTY);

    // Replace every byte above 127 with a decimal entity
    $count = count($string);
    for ($i = 0; $i < $count; $i++) {
        $dec = ord($string[$i]);
        if ($dec > 127) { $string[$i] = '&#' . $dec . ';'; }
    }

    return implode('', $string);
}


Or, more efficient but a little lazier (fine unless you have any more special characters)...
Use mb_ereg_replace (http://uk.php.net/manual/en/function.mb-ereg-replace.php) to replace Č with its numeric entity.
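
A minimal sketch of that single-character replacement. Note the swap: it uses plain str_replace() rather than mb_ereg_replace(), which is safe here because the search string is the complete UTF-8 byte sequence for Č. replaceCCaron() is a made-up name for illustration.

```php
<?php
/**
 * Replace every Č (U+010C) with its decimal entity, &#268;.
 * Hypothetical helper; handles this one character only.
 */
function replaceCCaron(string $text): string
{
    // "\xC4\x8C" is Č in UTF-8; 268 is 0x10C in decimal.
    return str_replace("\xC4\x8C", '&#268;', $text);
}

echo replaceCCaron("\xC4\x8Cesk\xC3\xA1 Republika"), "\n";
```

Every other non-ASCII character (the á, for instance) is left untouched, so this only helps while the list of awkward letters stays short.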

monger
21-07-2008, 03:49 PM
Thanks Blak. OK, that works for the characters that went wrong after adding the header('Content-Type: text/html; char set=utf-8'); line, but the bloody Č is still wrong .....


I would just accept it doesn't work if it wasn't for the fact that this site is displaying it fine :)

monger
21-07-2008, 04:11 PM
Ĉ - the manual way...

& # 2 6 4 ; with no spaces


Fine for this bit, but not going to help me when I start to let users type stuff in themselves :P I'll cross that bridge when I come to it :)

Blak
21-07-2008, 04:24 PM
oh tits, that was rather silly of me... preg_split would only work for single-byte characters. For multibyte stuff as well, try:

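function special2decimalentity($string) {
    if (strlen($string) == 0) { return $string; }

    // Peel off one character at a time with mb_substr().
    // mb_strlen() is called without an encoding argument here,
    // so it counts using mbstring's internal encoding.
    $strlen = mb_strlen($string);
    while ($strlen) {
        $array[] = mb_substr($string, 0, 1, "UTF-8");
        $string = mb_substr($string, 1, $strlen, "UTF-8");
        $strlen = mb_strlen($string);
    }

    // ord() looks only at the first byte of each element
    $count = count($array);
    for ($i = 0; $i < $count; $i++) {
        $dec = ord($array[$i]);
        if ($dec > 127) { $array[$i] = '&#' . $dec . ';'; }
    }

    return implode('', $array);
}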

monger
21-07-2008, 04:29 PM
nope. broke it more. and i aint gonna claim to understand any of that :)

it hasnt affected the Ĉ, and also seems to have deleted the next two characters after any characters it has affected

Blak
21-07-2008, 04:38 PM
Yeah tbh i'm poking around in the dark when it comes to mbstring stuff in php...
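
For reference, a sketch of the conversion both attempts above were aiming for, assuming UTF-8 input and the mbstring extension. It is not the function from php.net: it swaps the hand-rolled loop for mb_encode_numericentity(), and specialToDecimalEntity() is an illustrative name. The loops go wrong because ord() returns the value of a single byte, not the code point of a multibyte character.

```php
<?php
/**
 * Convert every non-ASCII character in a UTF-8 string to a decimal
 * HTML entity. Illustrative helper built on mbstring.
 */
function specialToDecimalEntity(string $string): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // This covers everything from U+0080 up to U+10FFFF unchanged.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($string, $convmap, 'UTF-8');
}

echo specialToDecimalEntity("\xC4\x8Cesk\xC3\xA1 Republika"), "\n";
// prints "&#268;esk&#225; Republika"
```

Because it works on code points rather than bytes, the same call would also cover text that users type in themselves, as long as it reaches PHP as UTF-8.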

Beanz
21-07-2008, 04:43 PM
Just tell them that it's spelt "Czech Republic" and be done with it

monger
21-07-2008, 05:05 PM
Rasher:
any chance of getting a function in the forum that makes it so Beanz cant post on your thread ? :P

Colio
21-07-2008, 05:11 PM
Just tell them that it's spelt "Czech Republic" and be done with it
.