unicode question

Discussion in 'Programmer Misc' started by Paul Fredlein, Dec 28, 2008.

  1. Hi,

    I have a mysql database which returns, to my Cocoa app, simplified
    Chinese characters using PHP.

    The database seems to store the Chinese as double byte chars which is a
    hassle to get them in there in the first place as everything seems to
    use some kind of unicode set.

    Anyway, so far it all seems to work but I'm convinced it's just good
    luck not good management. What should I use throughout to make sure it
    doesn't break:-

    utf8_unicode_ci
    gb2312

    or what?

    Any help appreciated.

    Paul
     
    Paul Fredlein, Dec 28, 2008
    #1

  2. Simon Slavin Guest

    On 28/12/2008, Paul Fredlein wrote in message
    <1isohs6.5nfxss13aqht8N%>:

    > I have a mysql database which returns, to my Cocoa app, simplified
    > Chinese characters using PHP.


    Fine.

    > The database seems to store the Chinese as double byte chars which is a
    > hassle to get them in there in the first place as everything seems to
    > use some kind of unicode set.


    You can govern which character set each table uses for storage. The
    default has changed several times over the last few versions, so you
    can't assume which character set a given table uses, but it's not hard
    to find out:

    http://dev.mysql.com/doc/refman/5.1/en/charset.html
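    A minimal sketch of that check, written in Python purely for
    illustration (the stack in this thread is PHP and Cocoa). It only
    builds the SQL text and does not connect to anything. The schema name
    "mydb" is hypothetical, the %s placeholder assumes a DB-API driver
    that uses the "format" parameter style, and the information_schema
    tables it reads assume MySQL 5.0 or later.

```python
# Builds (but does not run) a query that reports each table's character set.
# "mydb" is a hypothetical schema name used only for this example.

TABLE_CHARSET_SQL = (
    "SELECT t.TABLE_NAME, c.CHARACTER_SET_NAME, t.TABLE_COLLATION "
    "FROM information_schema.TABLES AS t "
    "JOIN information_schema.COLLATION_CHARACTER_SET_APPLICABILITY AS c "
    "ON c.COLLATION_NAME = t.TABLE_COLLATION "
    "WHERE t.TABLE_SCHEMA = %s"
)


def table_charset_query(schema):
    """Return (sql, params) in the shape a DB-API cursor.execute() expects."""
    return TABLE_CHARSET_SQL, (schema,)


sql, params = table_charset_query("mydb")
print(sql)
print(params)
```

    For a single table, running SHOW CREATE TABLE on it reports the same
    information as part of the table definition.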

    Since you're a Mac user who uses MySQL I recommend the excellent
    CocoaMySQL which makes finding this out and adjusting it extremely easy.

    My recommendation is to use Unicode for everything -- either UTF-8 or
    UTF-16, though UTF-8 is the practical choice here since it is what
    MySQL's utf8 character set stores. I know it's not what you're used to
    but it's what the industry is moving to and it will simplify things in
    the long run. PHP itself treats strings as plain bytes, but its
    mbstring and iconv extensions will convert to or from other encodings
    when you really need to.
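    To illustrate why a setup can appear to work "by luck", here is a
    small sketch in Python (used only for illustration). It assumes one
    layer wrongly treats UTF-8 bytes as latin-1; the sample text and
    variable names are made up for the example.

```python
# Sample text: two simplified Chinese characters (U+4E2D, U+6587).
text = "\u4e2d\u6587"

# Correctly encoded as UTF-8, each character takes three bytes.
utf8_bytes = text.encode("utf-8")

# A layer that wrongly assumes latin-1 accepts every byte without error,
# but now sees six meaningless "characters" instead of two.
misread = utf8_bytes.decode("latin-1")

# If the same wrong assumption is applied on the way back out, the two
# mistakes cancel and the original text reappears.
recovered = misread.encode("latin-1").decode("utf-8")

print(len(text), len(utf8_bytes), len(misread))  # 2 6 6
print(misread == text)                           # False
print(recovered == text)                         # True
```

    The round trip succeeds, but anything done in the middle layer
    (length limits, sorting, searching) operates on the wrong characters,
    which is why declaring one encoding at every step is safer.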

    > Anyway, so far it all seems to work but I'm convinced it's just good
    > luck not good management. What should I use throughout to make sure it
    > doesn't break:-
    >
    > utf8_unicode_ci
    > gb2312


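    Use the top one. Strictly speaking, utf8_unicode_ci is a collation (the
    rules for comparing and sorting text) that belongs to MySQL's utf8
    character set, which is the 'UTF-8' I referred to above.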
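    As a rough sketch (Python, for illustration only), the character set
    can be read off a MySQL collation name, since collation names begin
    with the name of their character set. The helper names and the table
    name "phrases" are hypothetical, and the statements are only built
    here, not executed.

```python
def charset_of(collation):
    """MySQL collation names start with the name of their character set."""
    return collation.split("_", 1)[0]


def conversion_statements(table, collation):
    """Build SQL to convert one table and to set the connection encoding.

    `table` is interpolated directly, so it must be a trusted identifier.
    """
    charset = charset_of(collation)
    return [
        # Re-encodes the table's text columns; back up the data first.
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} "
        f"COLLATE {collation}",
        # Tells the server which encoding this client connection uses.
        f"SET NAMES '{charset}' COLLATE '{collation}'",
    ]


statements = conversion_statements("phrases", "utf8_unicode_ci")
for statement in statements:
    print(statement)
```

    Note that converting only helps if the existing data is correctly
    labelled; bytes stored under the wrong character set need to be
    repaired separately.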

    'gb2312' and other legacy code pages like it are rarely needed any
    more. Unicode covers the characters of all those character sets in a
    single one. Using Unicode means you never have to switch character
    sets just because a user has decided to use a different script
    (Arabic, Tibetan, or whatever) that you've never worried about before.
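    The difference in coverage can be checked directly. A small sketch
    in Python (for illustration only; the sample words and the helper
    name are made up for the example):

```python
def encodable(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "Chinese": "\u4e2d\u6587",        # two simplified Chinese characters
    "Arabic": "\u0633\u0644\u0627\u0645",  # four Arabic letters
    "Tibetan": "\u0f56\u0f7c\u0f51",  # three Tibetan code points
}

for name, sample in samples.items():
    print(name, encodable(sample, "gb2312"), encodable(sample, "utf-8"))
```

    Only the Chinese sample fits in GB2312, while all three encode as
    UTF-8.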

    Simon.
    --
    http://www.hearsay.demon.co.uk
     
    Simon Slavin, Dec 30, 2008
    #2
