
$4
What does 'transliteration of foreign characters' mean in UTF8?

I am at a new place and they are trying to explain to me some of the decisions they made in the past. One person writes:

we tried to use Full-Text-Search on mySQL, which is only available with myISAM. However, because of the lack of transliteration of foreign characters (ñ->n) we end up using Sphinx (http://www.sphinxsearch.com/)


I am suspicious of this. They originally had their database set to a Latin character encoding, which they changed, but the character encoding of the returned data was still Latin until I got them to change it. Now, finally, all of the tables are defined with a UTF8 character encoding, and the data is returned as UTF8.

Would "transliteration of foreign characters" even be an issue if they had encoded their database tables as UTF8 from the start? I mean, what is a foreign character? There are no characters that are foreign to UTF8, yes?

Lawrence Krubner | 07/14/10 at 11:36am


(1) Possible Answers Submitted...

  • Last edited:
    07/14/10
    9:29pm
    Jarret Minkler says:

    Some, but not the ones the notes mentioned. I think what they meant was this: if the user types a Spanish n with the tilde (ñ), the database couldn't find it, or maybe vice versa. What they want is for the user to be able to enter any characters and have them match the transliterated version that is stored in the DB, so that

    años matches up with anos, or vice versa

    Storing it in UTF8 with the tilde mark won't exactly work if the user just enters "anos" without the tilde.

    Previous versions of this answer: 07/14/10 at 11:53am | 07/14/10 at 12:02pm

    • 07/14/10 9:29pm

      Lawrence Krubner says:

      K, I think you are right. I completely misunderstood what they were saying.
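
For anyone landing here later: the matching described in the answer (años vs. anos) can be sketched in a few lines of Python. This is only an illustration of accent folding in general, not how MySQL or Sphinx implement it. The function names fold_accents and matches are made up for this example, and the approach (Unicode NFD decomposition, then dropping combining marks) is an assumption about what "transliteration" meant in the original note.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining marks removed, e.g. 'años' -> 'anos'."""
    # NFD splits a precomposed letter such as 'ñ' into a base letter 'n'
    # followed by a combining mark (U+0303, combining tilde).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep the base letters and drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, stored: str) -> bool:
    """Compare two strings after accent folding, ignoring case."""
    return fold_accents(query).casefold() == fold_accents(stored).casefold()


print(fold_accents("años"))    # anos
print(matches("anos", "años"))  # True
# The encoding is not the issue: both strings are valid UTF-8,
# they are simply different byte sequences.
print("años".encode("utf-8") == "anos".encode("utf-8"))  # False
```

The last line is the point of the original question: no character is foreign to UTF8, but storing ñ correctly does not make it compare equal to n. Something in the search layer has to decide that the two should match.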

This question has expired.





Current status of this question: Completed