Sophie

Sophie

distrib > Mandriva > 2010.0 > x86_64 > by-pkgid > 3f284ad159764cba3dedf5c92c7ab124 > files > 276

mysql-doc-5.1.40-1mdv2010.0.x86_64.rpm

<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><title>Chapter 9. Internationalization and Localization</title><link rel="stylesheet" href="mysql-html.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.69.1"><link rel="start" href="index.html" title="MySQL 5.1 Reference Manual"><link rel="up" href="index.html" title="MySQL 5.1 Reference Manual"><link rel="prev" href="language-structure.html" title="Chapter 8. Language Structure"><link rel="next" href="data-types.html" title="Chapter 10. Data Types"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 9. Internationalization and Localization</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="language-structure.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="data-types.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="internationalization-localization"></a>Chapter 9. Internationalization and Localization</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="internationalization-localization.html#charset">9.1. Character Set Support</a></span></dt><dd><dl><dt><span class="section"><a href="internationalization-localization.html#charset-general">9.1.1. Character Sets and Collations in General</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-mysql">9.1.2. Character Sets and Collations in MySQL</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-syntax">9.1.3. Specifying Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-connection">9.1.4. Connection Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-applications">9.1.5. Configuring the Character Set and Collation for Applications</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-errors">9.1.6. Character Set for Error Messages</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collations">9.1.7. Collation Issues</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-repertoire">9.1.8. String Repertoire</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-operations">9.1.9. Operations Affected by Character Set Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-unicode">9.1.10. Unicode Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-metadata">9.1.11. UTF-8 for Metadata</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-conversion">9.1.12. Column Character Set Conversion</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-charsets">9.1.13. Character Sets and Collations That MySQL Supports</a></span></dt></dl></dd><dt><span class="section"><a href="internationalization-localization.html#charset-configuration">9.2. The Character Set Used for Data and Sorting</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#error-message-language">9.3. Setting the Error Message Language</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-character-set">9.4. Adding a New Character Set</a></span></dt><dd><dl><dt><span class="section"><a href="internationalization-localization.html#character-arrays">9.4.1. The Character Definition Arrays</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#string-collating">9.4.2. String Collating Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#multi-byte-characters">9.4.3. Multi-Byte Character Support</a></span></dt></dl></dd><dt><span class="section"><a href="internationalization-localization.html#adding-collation">9.5. How to Add a New Collation to a Character Set</a></span></dt><dd><dl><dt><span class="section"><a href="internationalization-localization.html#charset-collation-implementations">9.5.1. Collation Implementation Types</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-collation-choosing-id">9.5.2. Choosing a Collation ID</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-collation-simple-8bit">9.5.3. Adding a Simple Collation to an 8-Bit Character Set</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-collation-unicode-uca">9.5.4. Adding a UCA Collation to a Unicode Character Set</a></span></dt></dl></dd><dt><span class="section"><a href="internationalization-localization.html#problems-with-character-sets">9.6. Problems With Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#time-zone-support">9.7. MySQL Server Time Zone Support</a></span></dt><dd><dl><dt><span class="section"><a href="internationalization-localization.html#time-zone-upgrades">9.7.1. Staying Current with Time Zone Changes</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#time-zone-leap-seconds">9.7.2. Time Zone Leap Second Support</a></span></dt></dl></dd><dt><span class="section"><a href="internationalization-localization.html#locale-support">9.8. MySQL Server Locale Support</a></span></dt></dl></div><a class="indexterm" name="id4308669"></a><a class="indexterm" name="id4308678"></a><p>
    This chapter covers issues of internationalization (MySQL's
    capabilities for adapting to local use) and localization (selecting
    particular local conventions):
  </p><div class="itemizedlist"><ul type="disc"><li><p>
        MySQL support for character sets in SQL statements.
      </p></li><li><p>
        How to configure the server to support different character sets.
      </p></li><li><p>
        Selecting the language for error messages.
      </p></li><li><p>
        How to set the server's time zone and enable per-connection time
        zone support.
      </p></li><li><p>
        Selecting the locale for day and month names.
      </p></li></ul></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="charset"></a>9.1. Character Set Support</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-general">9.1.1. Character Sets and Collations in General</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-mysql">9.1.2. Character Sets and Collations in MySQL</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-syntax">9.1.3. Specifying Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-connection">9.1.4. Connection Character Sets and Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-applications">9.1.5. Configuring the Character Set and Collation for Applications</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-errors">9.1.6. Character Set for Error Messages</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collations">9.1.7. Collation Issues</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-repertoire">9.1.8. String Repertoire</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-operations">9.1.9. Operations Affected by Character Set Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-unicode">9.1.10. Unicode Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-metadata">9.1.11. UTF-8 for Metadata</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-conversion">9.1.12. Column Character Set Conversion</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-charsets">9.1.13. Character Sets and Collations That MySQL Supports</a></span></dt></dl></div><a class="indexterm" name="id4308738"></a><a class="indexterm" name="id4308747"></a><a class="indexterm" name="id4308756"></a><a class="indexterm" name="id4308765"></a><p>
      MySQL includes character set support that enables you to store
      data using a variety of character sets and perform comparisons
      according to a variety of collations. You can specify character
      sets at the server, database, table, and column level. MySQL
      supports the use of character sets for the
      <code class="literal">MyISAM</code>, <code class="literal">MEMORY</code>,
      <code class="literal">NDBCLUSTER</code>, and
      <code class="literal">InnoDB</code> storage engines.
    </p><p>
      This chapter discusses the following topics:
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          What are character sets and collations?
        </p></li><li><p>
          The multiple-level default system for character set assignment
        </p></li><li><p>
          Syntax for specifying character sets and collations
        </p></li><li><p>
          Affected functions and operations
        </p></li><li><p>
          Unicode support
        </p></li><li><p>
          The character sets and collations that are available, with
          notes
        </p></li></ul></div><p>
      Character set issues affect not only data storage, but also
      communication between client programs and the MySQL server. If you
      want the client program to communicate with the server using a
      character set different from the default, you'll need to indicate
      which one. For example, to use the <code class="literal">utf8</code> Unicode
      character set, issue this statement after connecting to the
      server:
    </p><pre class="programlisting">SET NAMES 'utf8';
</pre><p>
      For more information about configuring character sets for
      application use and character set-related issues in client/server
      communication, see <a href="internationalization-localization.html#charset-applications" title="9.1.5. Configuring the Character Set and Collation for Applications">Section 9.1.5, “Configuring the Character Set and Collation for Applications”</a>, and
      <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.
    </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-general"></a>9.1.1. Character Sets and Collations in General</h3></div></div></div><p>
        A <em class="firstterm">character set</em> is a set of symbols and
        encodings. A <em class="firstterm">collation</em> is a set of rules
        for comparing characters in a character set. Let's make the
        distinction clear with an example of an imaginary character set.
      </p><p>
        Suppose that we have an alphabet with four letters:
        “<span class="quote"><code class="literal">A</code></span>”,
        “<span class="quote"><code class="literal">B</code></span>”,
        “<span class="quote"><code class="literal">a</code></span>”,
        “<span class="quote"><code class="literal">b</code></span>”. We give each letter a
        number: “<span class="quote"><code class="literal">A</code></span>” = 0,
        “<span class="quote"><code class="literal">B</code></span>” = 1,
        “<span class="quote"><code class="literal">a</code></span>” = 2,
        “<span class="quote"><code class="literal">b</code></span>” = 3. The letter
        “<span class="quote"><code class="literal">A</code></span>” is a symbol, the number 0 is
        the <span class="bold"><strong>encoding</strong></span> for
        “<span class="quote"><code class="literal">A</code></span>”, and the combination of all
        four letters and their encodings is a
        <span class="bold"><strong>character set</strong></span>.
      </p><p>
        Suppose that we want to compare two string values,
        “<span class="quote"><code class="literal">A</code></span>” and
        “<span class="quote"><code class="literal">B</code></span>”. The simplest way to do this
        is to look at the encodings: 0 for
        “<span class="quote"><code class="literal">A</code></span>” and 1 for
        “<span class="quote"><code class="literal">B</code></span>”. Because 0 is less than 1,
        we say “<span class="quote"><code class="literal">A</code></span>” is less than
        “<span class="quote"><code class="literal">B</code></span>”. What we've just done is
        apply a collation to our character set. The collation is a set
        of rules (only one rule in this case): “<span class="quote">compare the
        encodings.</span>” We call this simplest of all possible
        collations a <em class="firstterm">binary</em> collation.
      </p><p>
        But what if we want to say that the lowercase and uppercase
        letters are equivalent? Then we would have at least two rules:
        (1) treat the lowercase letters
        “<span class="quote"><code class="literal">a</code></span>” and
        “<span class="quote"><code class="literal">b</code></span>” as equivalent to
        “<span class="quote"><code class="literal">A</code></span>” and
        “<span class="quote"><code class="literal">B</code></span>”; (2) then compare the
        encodings. We call this a
        <em class="firstterm">case-insensitive</em> collation. It is a
        little more complex than a binary collation.
      </p><p>
        In real life, most character sets have many characters: not just
        “<span class="quote"><code class="literal">A</code></span>” and
        “<span class="quote"><code class="literal">B</code></span>” but whole alphabets,
        sometimes multiple alphabets or eastern writing systems with
        thousands of characters, along with many special symbols and
        punctuation marks. Also in real life, most collations have many
        rules, not just for whether to distinguish lettercase, but also
        for whether to distinguish accents (an “<span class="quote">accent</span>” is
        a mark attached to a character as in German
        “<span class="quote"><code class="literal">Ö</code></span>”), and for
        multiple-character mappings (such as the rule that
        “<span class="quote"><code class="literal">Ö</code></span>” =
        “<span class="quote"><code class="literal">OE</code></span>” in one of the two German
        collations).
      </p><p>
        MySQL can do these things for you:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            Store strings using a variety of character sets
          </p></li><li><p>
            Compare strings using a variety of collations
          </p></li><li><p>
            Mix strings with different character sets or collations in
            the same server, the same database, or even the same table
          </p></li><li><p>
            Allow specification of character set and collation at any
            level
          </p></li></ul></div><p>
        In these respects, MySQL is far ahead of most other database
        management systems. However, to use these features effectively,
        you need to know what character sets and collations are
        available, how to change the defaults, and how they affect the
        behavior of string operators and functions.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-mysql"></a>9.1.2. Character Sets and Collations in MySQL</h3></div></div></div><p>
        The MySQL server can support multiple character sets. To list
        the available character sets, use the <a href="sql-syntax.html#show-character-set" title="12.5.5.4. SHOW CHARACTER SET Syntax"><code class="literal">SHOW
        CHARACTER SET</code></a> statement. A partial listing follows.
        For more complete information, see
        <a href="internationalization-localization.html#charset-charsets" title="9.1.13. Character Sets and Collations That MySQL Supports">Section 9.1.13, “Character Sets and Collations That MySQL Supports”</a>.
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW CHARACTER SET;</code></strong>
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
...
</pre><p>
        Any given character set always has at least one collation. It
        may have several collations. To list the collations for a
        character set, use the <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW
        COLLATION</code></a> statement. For example, to see the
        collations for the <code class="literal">latin1</code> (cp1252 West
        European) character set, use this statement to find those
        collation names that begin with <code class="literal">latin1</code>:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW COLLATION LIKE 'latin1%';</code></strong>
+---------------------+---------+----+---------+----------+---------+
| Collation           | Charset | Id | Default | Compiled | Sortlen |
+---------------------+---------+----+---------+----------+---------+
| latin1_german1_ci   | latin1  |  5 |         |          |       0 |
| latin1_swedish_ci   | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci    | latin1  | 15 |         |          |       0 |
| latin1_german2_ci   | latin1  | 31 |         | Yes      |       2 |
| latin1_bin          | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci   | latin1  | 48 |         |          |       0 |
| latin1_general_cs   | latin1  | 49 |         |          |       0 |
| latin1_spanish_ci   | latin1  | 94 |         |          |       0 |
+---------------------+---------+----+---------+----------+---------+
</pre><p>
        The <code class="literal">latin1</code> collations have the following
        meanings.
      </p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td><span class="bold"><strong>Collation</strong></span></td><td><span class="bold"><strong>Meaning</strong></span></td></tr><tr><td><code class="literal">latin1_german1_ci</code></td><td>German DIN-1</td></tr><tr><td><code class="literal">latin1_swedish_ci</code></td><td>Swedish/Finnish</td></tr><tr><td><code class="literal">latin1_danish_ci</code></td><td>Danish/Norwegian</td></tr><tr><td><code class="literal">latin1_german2_ci</code></td><td>German DIN-2</td></tr><tr><td><code class="literal">latin1_bin</code></td><td>Binary according to <code class="literal">latin1</code> encoding</td></tr><tr><td><code class="literal">latin1_general_ci</code></td><td>Multilingual (Western European)</td></tr><tr><td><code class="literal">latin1_general_cs</code></td><td>Multilingual (ISO Western European), case sensitive</td></tr><tr><td><code class="literal">latin1_spanish_ci</code></td><td>Modern Spanish</td></tr></tbody></table></div><p>
        Collations have these general characteristics:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            Two different character sets cannot have the same collation.
          </p></li><li><p>
            Each character set has one collation that is the
            <span class="emphasis"><em>default collation</em></span>. For example, the
            default collation for <code class="literal">latin1</code> is
            <code class="literal">latin1_swedish_ci</code>. The output for
            <a href="sql-syntax.html#show-character-set" title="12.5.5.4. SHOW CHARACTER SET Syntax"><code class="literal">SHOW CHARACTER SET</code></a> indicates
            which collation is the default for each displayed character
            set.
          </p></li><li><p>
            There is a convention for collation names: They start with
            the name of the character set with which they are
            associated, they usually include a language name, and they
            end with <code class="literal">_ci</code> (case insensitive),
            <code class="literal">_cs</code> (case sensitive), or
            <code class="literal">_bin</code> (binary).
          </p></li></ul></div><p>
        In cases where a character set has multiple collations, it might
        not be clear which collation is most suitable for a given
        application. To avoid choosing the wrong collation, it can be
        helpful to perform some comparisons with representative data
        values to make sure that a given collation sorts values the way
        you expect.
      </p><p>
        <a href="http://www.collation-charts.org/" target="_top">Collation-Charts.Org</a>
        is a useful site for information that shows how one collation
        compares to another.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-syntax"></a>9.1.3. Specifying Character Sets and Collations</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-server">9.1.3.1. Server Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-database">9.1.3.2. Database Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-table">9.1.3.3. Table Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-column">9.1.3.4. Column Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-literal">9.1.3.5. Character String Literal Character Set and Collation</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-national">9.1.3.6. National Character Set</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-examples">9.1.3.7. Examples of Character Set and Collation Assignment</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-compatibility">9.1.3.8. Compatibility with Other DBMSs</a></span></dt></dl></div><p>
        There are default settings for character sets and collations at
        four levels: server, database, table, and column. The
        description in the following sections may appear complex, but it
        has been found in practice that multiple-level defaulting leads
        to natural and obvious results.
      </p><p>
        <code class="literal">CHARACTER SET</code> is used in clauses that specify
        a character set. <code class="literal">CHARSET</code> can be used as a
        synonym for <code class="literal">CHARACTER SET</code>.
      </p><p>
        Character set issues affect not only data storage, but also
        communication between client programs and the MySQL server. If
        you want the client program to communicate with the server using
        a character set different from the default, you'll need to
        indicate which one. For example, to use the
        <code class="literal">utf8</code> Unicode character set, issue this
        statement after connecting to the server:
      </p><pre class="programlisting">SET NAMES 'utf8';
</pre><p>
        For more information about character set-related issues in
        client/server communication, see
        <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.
      </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-server"></a>9.1.3.1. Server Character Set and Collation</h4></div></div></div><p>
          MySQL Server has a server character set and a server
          collation. These can be set at server startup on the command
          line or in an option file and changed at runtime.
        </p><p>
          Initially, the server character set and collation depend on
          the options that you use when you start
          <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a>. You can use
          <a href="server-administration.html#option_mysqld_character-set-server"><code class="option">--character-set-server</code></a> for the
          character set. Along with it, you can add
          <a href="server-administration.html#option_mysqld_collation-server"><code class="option">--collation-server</code></a> for the
          collation. If you don't specify a character set, that is the
          same as saying
          <a href="server-administration.html#option_mysqld_character-set-server"><code class="option">--character-set-server=latin1</code></a>.
          If you specify only a character set (for example,
          <code class="literal">latin1</code>) but not a collation, that is the
          same as saying
          <a href="server-administration.html#option_mysqld_character-set-server"><code class="option">--character-set-server=latin1</code></a>
          <a href="server-administration.html#option_mysqld_collation-server"><code class="option">--collation-server=latin1_swedish_ci</code></a>
          because <code class="literal">latin1_swedish_ci</code> is the default
          collation for <code class="literal">latin1</code>. Therefore, the
          following three commands all have the same effect:
        </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>mysqld</code></strong>
shell&gt; <strong class="userinput"><code>mysqld --character-set-server=latin1</code></strong>
shell&gt; <strong class="userinput"><code>mysqld --character-set-server=latin1 \</code></strong>
           <strong class="userinput"><code>--collation-server=latin1_swedish_ci</code></strong>
</pre><p>
          One way to change the settings is by recompiling. If you want
          to change the default server character set and collation when
          building from sources, use:
          <a href="installing.html#option_configure_with-charset"><code class="option">--with-charset</code></a> and
          <a href="installing.html#option_configure_with-collation"><code class="option">--with-collation</code></a> as
          arguments for <span><strong class="command">configure</strong></span>. For example:
        </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>./configure --with-charset=latin1</code></strong>
</pre><p>
          Or:
        </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>./configure --with-charset=latin1 \</code></strong>
           <strong class="userinput"><code>--with-collation=latin1_german1_ci</code></strong>
</pre><p>
          Both <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> and
          <span><strong class="command">configure</strong></span> verify that the character
          set/collation combination is valid. If not, each program
          displays an error message and terminates.
        </p><p>
          The server character set and collation are used as default
          values if the database character set and collation are not
          specified in <a href="sql-syntax.html#create-database" title="12.1.10. CREATE DATABASE Syntax"><code class="literal">CREATE DATABASE</code></a>
          statements. They have no other purpose.
        </p><p>
          The current server character set and collation can be
          determined from the values of the
          <a href="server-administration.html#sysvar_character_set_server"><code class="literal">character_set_server</code></a> and
          <a href="server-administration.html#sysvar_collation_server"><code class="literal">collation_server</code></a> system
          variables. These variables can be changed at runtime.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-database"></a>9.1.3.2. Database Character Set and Collation</h4></div></div></div><p>
          Every database has a database character set and a database
          collation. The <a href="sql-syntax.html#create-database" title="12.1.10. CREATE DATABASE Syntax"><code class="literal">CREATE DATABASE</code></a>
          and <a href="sql-syntax.html#alter-database" title="12.1.1. ALTER DATABASE Syntax"><code class="literal">ALTER DATABASE</code></a> statements
          have optional clauses for specifying the database character
          set and collation:
        </p><pre class="programlisting">CREATE DATABASE <em class="replaceable"><code>db_name</code></em>
    [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>]
    [[DEFAULT] COLLATE <em class="replaceable"><code>collation_name</code></em>]

ALTER DATABASE <em class="replaceable"><code>db_name</code></em>
    [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>]
    [[DEFAULT] COLLATE <em class="replaceable"><code>collation_name</code></em>]
</pre><p>
          The keyword <code class="literal">SCHEMA</code> can be used instead of
          <code class="literal">DATABASE</code>.
        </p><p>
          All database options are stored in a text file named
          <code class="filename">db.opt</code> that can be found in the database
          directory.
        </p><p>
          The <code class="literal">CHARACTER SET</code> and
          <code class="literal">COLLATE</code> clauses make it possible to create
          databases with different character sets and collations on the
          same MySQL server.
        </p><p>
          Example:
        </p><pre class="programlisting">CREATE DATABASE <em class="replaceable"><code>db_name</code></em> CHARACTER SET latin1 COLLATE latin1_swedish_ci;
</pre><p>
          MySQL chooses the database character set and database
          collation in the following manner:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              If both <code class="literal">CHARACTER SET
              <em class="replaceable"><code>X</code></em></code> and
              <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code>
              are specified, character set <em class="replaceable"><code>X</code></em>
              and collation <em class="replaceable"><code>Y</code></em> are used.
            </p></li><li><p>
              If <code class="literal">CHARACTER SET
              <em class="replaceable"><code>X</code></em></code> is specified
              without <code class="literal">COLLATE</code>, character set
              <em class="replaceable"><code>X</code></em> and its default collation are
              used. To see the default collation for each character set,
              use the <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW COLLATION</code></a>
              statement.
            </p></li><li><p>
              If <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code>
              is specified without <code class="literal">CHARACTER SET</code>, the
              character set associated with <em class="replaceable"><code>Y</code></em>
              and collation <em class="replaceable"><code>Y</code></em> are used.
            </p></li><li><p>
              Otherwise, the server character set and server collation
              are used.
            </p></li></ul></div><p>
          The database character set and collation are used as default
          values for table definitions if the table character set and
          collation are not specified in <a href="sql-syntax.html#create-table" title="12.1.17. CREATE TABLE Syntax"><code class="literal">CREATE
          TABLE</code></a> statements. The database character set also is
          used by <a href="sql-syntax.html#load-data" title="12.2.6. LOAD DATA INFILE
      Syntax"><code class="literal">LOAD DATA
          INFILE</code></a>. The character set and collation have no
          other purposes.
        </p><p>
          The character set and collation for the default database can
          be determined from the values of the
          <a href="server-administration.html#sysvar_character_set_database"><code class="literal">character_set_database</code></a> and
          <a href="server-administration.html#sysvar_collation_database"><code class="literal">collation_database</code></a> system
          variables. The server sets these variables whenever the
          default database changes. If there is no default database, the
          variables have the same value as the corresponding
          server-level system variables,
          <a href="server-administration.html#sysvar_character_set_server"><code class="literal">character_set_server</code></a> and
          <a href="server-administration.html#sysvar_collation_server"><code class="literal">collation_server</code></a>.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-table"></a>9.1.3.3. Table Character Set and Collation</h4></div></div></div><p>
          Every table has a table character set and a table collation.
          The <a href="sql-syntax.html#create-table" title="12.1.17. CREATE TABLE Syntax"><code class="literal">CREATE TABLE</code></a> and
          <a href="sql-syntax.html#alter-table" title="12.1.7. ALTER TABLE Syntax"><code class="literal">ALTER TABLE</code></a> statements have
          optional clauses for specifying the table character set and
          collation:
        </p><pre class="programlisting">CREATE TABLE <em class="replaceable"><code>tbl_name</code></em> (<em class="replaceable"><code>column_list</code></em>)
    [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>]
    [COLLATE <em class="replaceable"><code>collation_name</code></em>]]

ALTER TABLE <em class="replaceable"><code>tbl_name</code></em>
    [[DEFAULT] CHARACTER SET <em class="replaceable"><code>charset_name</code></em>]
    [COLLATE <em class="replaceable"><code>collation_name</code></em>]
</pre><p>
          Example:
        </p><pre class="programlisting">CREATE TABLE t1 ( ... )
CHARACTER SET latin1 COLLATE latin1_danish_ci;
</pre><p>
          MySQL chooses the table character set and collation in the
          following manner:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              If both <code class="literal">CHARACTER SET
              <em class="replaceable"><code>X</code></em></code> and
              <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code>
              are specified, character set <em class="replaceable"><code>X</code></em>
              and collation <em class="replaceable"><code>Y</code></em> are used.
            </p></li><li><p>
              If <code class="literal">CHARACTER SET
              <em class="replaceable"><code>X</code></em></code> is specified
              without <code class="literal">COLLATE</code>, character set
              <em class="replaceable"><code>X</code></em> and its default collation are
              used. To see the default collation for each character set,
              use the <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW COLLATION</code></a>
              statement.
            </p></li><li><p>
              If <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code>
              is specified without <code class="literal">CHARACTER SET</code>, the
              character set associated with <em class="replaceable"><code>Y</code></em>
              and collation <em class="replaceable"><code>Y</code></em> are used.
            </p></li><li><p>
              Otherwise, the database character set and collation are
              used.
            </p></li></ul></div><p>
          The table character set and collation are used as default
          values for column definitions if the column character set and
          collation are not specified in individual column definitions.
          The table character set and collation are MySQL extensions;
          there are no such things in standard SQL.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-column"></a>9.1.3.4. Column Character Set and Collation</h4></div></div></div><p>
          Every “<span class="quote">character</span>” column (that is, a column of
          type <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">CHAR</code></a>,
          <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">VARCHAR</code></a>, or
          <a href="data-types.html#blob" title="10.4.3. The BLOB and
        TEXT Types"><code class="literal">TEXT</code></a>) has a column character
          set and a column collation. Column definition syntax for
          <a href="sql-syntax.html#create-table" title="12.1.17. CREATE TABLE Syntax"><code class="literal">CREATE TABLE</code></a> and
          <a href="sql-syntax.html#alter-table" title="12.1.7. ALTER TABLE Syntax"><code class="literal">ALTER TABLE</code></a> has optional
          clauses for specifying the column character set and collation:
        </p><pre class="programlisting"><em class="replaceable"><code>col_name</code></em> {CHAR | VARCHAR | TEXT} (<em class="replaceable"><code>col_length</code></em>)
    [CHARACTER SET <em class="replaceable"><code>charset_name</code></em>]
    [COLLATE <em class="replaceable"><code>collation_name</code></em>]
</pre><p>
          These clauses can also be used for
          <a href="data-types.html#enum" title="10.4.4. The ENUM Type"><code class="literal">ENUM</code></a> and
          <a href="data-types.html#set" title="10.4.5. The SET Type"><code class="literal">SET</code></a> columns:
        </p><pre class="programlisting"><em class="replaceable"><code>col_name</code></em> {ENUM | SET} (<em class="replaceable"><code>val_list</code></em>)
    [CHARACTER SET <em class="replaceable"><code>charset_name</code></em>]
    [COLLATE <em class="replaceable"><code>collation_name</code></em>]
</pre><p>
          Examples:
        </p><pre class="programlisting">CREATE TABLE t1
(
    col1 VARCHAR(5)
      CHARACTER SET latin1
      COLLATE latin1_german1_ci
);

ALTER TABLE t1 MODIFY
    col1 VARCHAR(5)
      CHARACTER SET latin1
      COLLATE latin1_swedish_ci;
</pre><p>
          MySQL chooses the column character set and collation in the
          following manner:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              If both <code class="literal">CHARACTER SET
              <em class="replaceable"><code>X</code></em></code> and
              <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code>
              are specified, character set <em class="replaceable"><code>X</code></em>
              and collation <em class="replaceable"><code>Y</code></em> are used.
            </p><pre class="programlisting">CREATE TABLE t1
(
    col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci
) CHARACTER SET latin1 COLLATE latin1_bin;
</pre><p>
              The character set and collation are specified for the
              column, so they are used. The column has character set
              <code class="literal">utf8</code> and collation
              <code class="literal">utf8_unicode_ci</code>.
            </p></li><li><p>
              If <code class="literal">CHARACTER SET
              <em class="replaceable"><code>X</code></em></code> is specified
              without <code class="literal">COLLATE</code>, character set
              <em class="replaceable"><code>X</code></em> and its default collation are
              used.
            </p><pre class="programlisting">CREATE TABLE t1
(
    col1 CHAR(10) CHARACTER SET utf8
) CHARACTER SET latin1 COLLATE latin1_bin;
</pre><p>
              The character set is specified for the column, but the
              collation is not. The column has character set
              <code class="literal">utf8</code> and the default collation for
              <code class="literal">utf8</code>, which is
              <code class="literal">utf8_general_ci</code>. To see the default
              collation for each character set, use the
              <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW COLLATION</code></a> statement.
            </p></li><li><p>
              If <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code>
              is specified without <code class="literal">CHARACTER SET</code>, the
              character set associated with <em class="replaceable"><code>Y</code></em>
              and collation <em class="replaceable"><code>Y</code></em> are used.
            </p><pre class="programlisting">CREATE TABLE t1
(
    col1 CHAR(10) COLLATE utf8_polish_ci
) CHARACTER SET latin1 COLLATE latin1_bin;
</pre><p>
              The collation is specified for the column, but the
              character set is not. The column has collation
              <code class="literal">utf8_polish_ci</code> and the character set is
              the one associated with the collation, which is
              <code class="literal">utf8</code>.
            </p></li><li><p>
              Otherwise, the table character set and collation are used.
            </p><pre class="programlisting">CREATE TABLE t1
(
    col1 CHAR(10)
) CHARACTER SET latin1 COLLATE latin1_bin;
</pre><p>
              Neither the character set nor collation are specified for
              the column, so the table defaults are used. The column has
              character set <code class="literal">latin1</code> and collation
              <code class="literal">latin1_bin</code>.
            </p></li></ul></div><p>
          The <code class="literal">CHARACTER SET</code> and
          <code class="literal">COLLATE</code> clauses are standard SQL.
        </p><p>
          If you use <a href="sql-syntax.html#alter-table" title="12.1.7. ALTER TABLE Syntax"><code class="literal">ALTER TABLE</code></a> to
          convert a column from one character set to another, MySQL
          attempts to map the data values, but if the character sets are
          incompatible, there may be data loss.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-literal"></a>9.1.3.5. Character String Literal Character Set and Collation</h4></div></div></div><p>
          Every character string literal has a character set and a
          collation.
        </p><p>
          A character string literal may have an optional character set
          introducer and <code class="literal">COLLATE</code> clause:
        </p><a class="indexterm" name="id4310649"></a><a class="indexterm" name="id4310662"></a><pre class="programlisting">[_<em class="replaceable"><code>charset_name</code></em>]'<em class="replaceable"><code>string</code></em>' [COLLATE <em class="replaceable"><code>collation_name</code></em>]
</pre><p>
          Examples:
        </p><pre class="programlisting">SELECT '<em class="replaceable"><code>string</code></em>';
SELECT _latin1'<em class="replaceable"><code>string</code></em>';
SELECT _latin1'<em class="replaceable"><code>string</code></em>' COLLATE latin1_danish_ci;
</pre><p>
          For the simple statement <code class="literal">SELECT
          '<em class="replaceable"><code>string</code></em>'</code>, the string has
          the character set and collation defined by the
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> and
          <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> system
          variables.
        </p><p>
          The
          <code class="literal">_<em class="replaceable"><code>charset_name</code></em></code>
          expression is formally called an
          <span class="emphasis"><em>introducer</em></span>. It tells the parser,
          “<span class="quote">the string that is about to follow uses character set
          <em class="replaceable"><code>X</code></em>.</span>” Because this has
          confused people in the past, we emphasize that an introducer
          does not change the string to the introducer character set
          like <a href="functions.html#function_convert"><code class="literal">CONVERT()</code></a> would do. It
          does not change the string's value, although padding may
          occur. The introducer is just a signal. An introducer is also
          legal before standard hex literal and numeric hex literal
          notation
          (<code class="literal">x'<em class="replaceable"><code>literal</code></em>'</code> and
          <code class="literal">0x<em class="replaceable"><code>nnnn</code></em></code>), or
          before bit-field literal notation
          (<code class="literal">b'<em class="replaceable"><code>literal</code></em>'</code> and
          <code class="literal">0b<em class="replaceable"><code>nnnn</code></em></code>).
        </p><p>
          Examples:
        </p><pre class="programlisting">SELECT _latin1 x'AABBCC';
SELECT _latin1 0xAABBCC;
SELECT _latin1 b'1100011';
SELECT _latin1 0b1100011;
</pre><p>
          MySQL determines a literal's character set and collation in
          the following manner:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              If both <em class="replaceable"><code>_X</code></em> and <code class="literal">COLLATE
              <em class="replaceable"><code>Y</code></em></code> are specified,
              character set <em class="replaceable"><code>X</code></em> and collation
              <em class="replaceable"><code>Y</code></em> are used.
            </p></li><li><p>
              If <em class="replaceable"><code>_X</code></em> is specified but
              <code class="literal">COLLATE</code> is not specified, character set
              <em class="replaceable"><code>X</code></em> and its default collation are
              used. To see the default collation for each character set,
              use the <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW COLLATION</code></a>
              statement.
            </p></li><li><p>
              Otherwise, the character set and collation given by the
              <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>
              and <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a>
              system variables are used.
            </p></li></ul></div><p>
          Examples:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              A string with <code class="literal">latin1</code> character set and
              <code class="literal">latin1_german1_ci</code> collation:
            </p><pre class="programlisting">SELECT _latin1'Müller' COLLATE latin1_german1_ci;
</pre></li><li><p>
              A string with <code class="literal">latin1</code> character set and
              its default collation (that is,
              <code class="literal">latin1_swedish_ci</code>):
            </p><pre class="programlisting">SELECT _latin1'Müller';
</pre></li><li><p>
              A string with the connection default character set and
              collation:
            </p><pre class="programlisting">SELECT 'Müller';
</pre></li></ul></div><p>
          Character set introducers and the <code class="literal">COLLATE</code>
          clause are implemented according to standard SQL
          specifications.
        </p><p>
          An introducer indicates the character set for the following
          string, but does not change now how the parser performs escape
          processing within the string. Escapes are always interpreted
          by the parser according to the character set given by
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>.
        </p><p>
          The following examples show that escape processing occurs
          using
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> even
          in the presence of an introducer. The examples use
          <code class="literal">SET NAMES</code> (which changes
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>, as
          discussed in <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>), and
          display the resulting strings using the
          <a href="functions.html#function_hex"><code class="literal">HEX()</code></a> function so that the
          exact string contents can be seen.
        </p><p>
          Example 1:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES latin1;</code></strong>
Query OK, 0 rows affected (0.01 sec)

mysql&gt; <strong class="userinput"><code>SELECT HEX('à\n'), HEX(_sjis'à\n');</code></strong>
+------------+-----------------+
| HEX('à\n') | HEX(_sjis'à\n') |
+------------+-----------------+
| E00A       | E00A            |
+------------+-----------------+
1 row in set (0.00 sec)
</pre><p>
          Here, “<span class="quote"><code class="literal">à</code></span>” (hex value
          <code class="literal">E0</code>) is followed by
          “<span class="quote"><code class="literal">\n</code></span>”, the escape sequence for
          newline. The escape sequence is interpreted using the
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>
          value of <code class="literal">latin1</code> to produce a literal
          newline (hex value <code class="literal">0A</code>). This happens even
          for the second string. That is, the introducer of
          <code class="literal">_sjis</code> does not affect the parser's escape
          processing.
        </p><p>
          Example 2:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES sjis;</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT HEX('à\n'), HEX(_latin1'à\n');</code></strong>
+------------+-------------------+
| HEX('à\n') | HEX(_latin1'à\n') |
+------------+-------------------+
| E05C6E     | E05C6E            |
+------------+-------------------+
1 row in set (0.04 sec)
</pre><p>
          Here,
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> is
          <code class="literal">sjis</code>, a character set in which the sequence
          of “<span class="quote"><code class="literal">à</code></span>” followed by
          “<span class="quote"><code class="literal">\</code></span>” (hex values
          <code class="literal">05</code> and <code class="literal">5C</code>) is a valid
          multi-byte character. Hence, the first two bytes of the string
          are interpreted as a single <code class="literal">sjis</code> character,
          and the “<span class="quote"><code class="literal">\</code></span>” is not interpreted
          as an escape character. The following
          “<span class="quote"><code class="literal">n</code></span>” (hex value
          <code class="literal">6E</code>) is not interpreted as part of an escape
          sequence. This is true even for the second string; the
          introducer of <code class="literal">_latin1</code> does not affect
          escape processing.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-national"></a>9.1.3.6. National Character Set</h4></div></div></div><p>
          Standard SQL defines <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">NCHAR</code></a> or
          <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">NATIONAL CHAR</code></a> as a way to
          indicate that a <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">CHAR</code></a> column
          should use some predefined character set. MySQL
          5.1 uses <code class="literal">utf8</code> as this
          predefined character set. For example, these data type
          declarations are equivalent:
        </p><pre class="programlisting">CHAR(10) CHARACTER SET utf8
NATIONAL CHARACTER(10)
NCHAR(10)
</pre><p>
          As are these:
        </p><pre class="programlisting">VARCHAR(10) CHARACTER SET utf8
NATIONAL VARCHAR(10)
NCHAR VARCHAR(10)
NATIONAL CHARACTER VARYING(10)
NATIONAL CHAR VARYING(10)
</pre><p>
          You can use
          <code class="literal">N'<em class="replaceable"><code>literal</code></em>'</code> (or
          <code class="literal">n'<em class="replaceable"><code>literal</code></em>'</code>) to
          create a string in the national character set. These
          statements are equivalent:
        </p><pre class="programlisting">SELECT N'some text';
SELECT n'some text';
SELECT _utf8'some text';
</pre><p>
          For information on upgrading character sets to MySQL
          5.1 from versions prior to 4.1, see the
          <em class="citetitle">MySQL 3.23, 4.0, 4.1 Reference Manual</em>.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-examples"></a>9.1.3.7. Examples of Character Set and Collation Assignment</h4></div></div></div><p>
          The following examples show how MySQL determines default
          character set and collation values.
        </p><p>
          <span class="bold"><strong>Example 1: Table and Column
          Definition</strong></span>
        </p><pre class="programlisting">CREATE TABLE t1
(
    c1 CHAR(10) CHARACTER SET latin1 COLLATE latin1_german1_ci
) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;
</pre><p>
          Here we have a column with a <code class="literal">latin1</code>
          character set and a <code class="literal">latin1_german1_ci</code>
          collation. The definition is explicit, so that is
          straightforward. Notice that there is no problem with storing
          a <code class="literal">latin1</code> column in a
          <code class="literal">latin2</code> table.
        </p><p>
          <span class="bold"><strong>Example 2: Table and Column
          Definition</strong></span>
        </p><pre class="programlisting">CREATE TABLE t1
(
    c1 CHAR(10) CHARACTER SET latin1
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
</pre><p>
          This time we have a column with a <code class="literal">latin1</code>
          character set and a default collation. Although it might seem
          natural, the default collation is not taken from the table
          level. Instead, because the default collation for
          <code class="literal">latin1</code> is always
          <code class="literal">latin1_swedish_ci</code>, column
          <code class="literal">c1</code> has a collation of
          <code class="literal">latin1_swedish_ci</code> (not
          <code class="literal">latin1_danish_ci</code>).
        </p><p>
          <span class="bold"><strong>Example 3: Table and Column
          Definition</strong></span>
        </p><pre class="programlisting">CREATE TABLE t1
(
    c1 CHAR(10)
) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;
</pre><p>
          We have a column with a default character set and a default
          collation. In this circumstance, MySQL checks the table level
          to determine the column character set and collation.
          Consequently, the character set for column
          <code class="literal">c1</code> is <code class="literal">latin1</code> and its
          collation is <code class="literal">latin1_danish_ci</code>.
        </p><p>
          <span class="bold"><strong>Example 4: Database, Table, and Column
          Definition</strong></span>
        </p><pre class="programlisting">CREATE DATABASE d1
    DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;
USE d1;
CREATE TABLE t1
(
    c1 CHAR(10)
);
</pre><p>
          We create a column without specifying its character set and
          collation. We're also not specifying a character set and a
          collation at the table level. In this circumstance, MySQL
          checks the database level to determine the table settings,
          which thereafter become the column settings.) Consequently,
          the character set for column <code class="literal">c1</code> is
          <code class="literal">latin2</code> and its collation is
          <code class="literal">latin2_czech_ci</code>.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-compatibility"></a>9.1.3.8. Compatibility with Other DBMSs</h4></div></div></div><p>
          For MaxDB compatibility these two statements are the same:
        </p><pre class="programlisting">CREATE TABLE t1 (f1 CHAR(<em class="replaceable"><code>N</code></em>) UNICODE);
CREATE TABLE t1 (f1 CHAR(<em class="replaceable"><code>N</code></em>) CHARACTER SET ucs2);
</pre></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-connection"></a>9.1.4. Connection Character Sets and Collations</h3></div></div></div><a class="indexterm" name="id4311535"></a><a class="indexterm" name="id4311547"></a><p>
        Several character set and collation system variables relate to a
        client's interaction with the server. Some of these have been
        mentioned in earlier sections:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            The server character set and collation can be determined
            from the values of the
            <a href="server-administration.html#sysvar_character_set_server"><code class="literal">character_set_server</code></a> and
            <a href="server-administration.html#sysvar_collation_server"><code class="literal">collation_server</code></a> system
            variables.
          </p></li><li><p>
            The character set and collation of the default database can
            be determined from the values of the
            <a href="server-administration.html#sysvar_character_set_database"><code class="literal">character_set_database</code></a> and
            <a href="server-administration.html#sysvar_collation_database"><code class="literal">collation_database</code></a> system
            variables.
          </p></li></ul></div><p>
        Additional character set and collation system variables are
        involved in handling traffic for the connection between a client
        and the server. Every client has connection-related character
        set and collation system variables.
      </p><p>
        Consider what a “<span class="quote">connection</span>” is: It is what you
        make when you connect to the server. The client sends SQL
        statements, such as queries, over the connection to the server.
        The server sends responses, such as result sets or error
        messages, over the connection back to the client. This leads to
        several questions about character set and collation handling for
        client connections, each of which can be answered in terms of
        system variables:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            What character set is the statement in when it leaves the
            client?
          </p><p>
            The server takes the
            <a href="server-administration.html#sysvar_character_set_client"><code class="literal">character_set_client</code></a> system
            variable to be the character set in which statements are
            sent by the client.
          </p></li><li><p>
            What character set should the server translate a statement
            to after receiving it?
          </p><p>
            For this, the server uses the
            <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>
            and <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a>
            system variables. It converts statements sent by the client
            from <a href="server-administration.html#sysvar_character_set_client"><code class="literal">character_set_client</code></a>
            to <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>
            (except for string literals that have an introducer such as
            <code class="literal">_latin1</code> or <code class="literal">_utf8</code>).
            <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> is
            important for comparisons of literal strings. For
            comparisons of strings with column values,
            <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> does
            not matter because columns have their own collation, which
            has a higher collation precedence.
          </p></li><li><p>
            What character set should the server translate to before
            shipping result sets back to the client?
          </p><p>
            The <a href="server-administration.html#sysvar_character_set_results"><code class="literal">character_set_results</code></a>
            system variable indicates the character set in which the
            server returns query results to the client. This includes
            result data such as column values, and result metadata such
            as column names.
          </p></li></ul></div><p>
        Clients can fine-tune the settings for these variables, or
        depend on the defaults (in which case, you can skip the rest of
        this section). If you do not use the defaults, you must change
        the character settings <span class="emphasis"><em>for each connection to the
        server.</em></span>
      </p><p>
        There are two statements that affect the connection-related
        character set variables as a group:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            <code class="literal">SET NAMES '<em class="replaceable"><code>charset_name</code></em>'
            [COLLATE
            '<em class="replaceable"><code>collation_name</code></em>']</code>
          </p><p>
            <code class="literal">SET NAMES</code> indicates what character set
            the client will use to send SQL statements to the server.
            Thus, <code class="literal">SET NAMES 'cp1251'</code> tells the
            server, “<span class="quote">future incoming messages from this client are
            in character set <code class="literal">cp1251</code>.</span>” It also
            specifies the character set that the server should use for
            sending results back to the client. (For example, it
            indicates what character set to use for column values if you
            use a <a href="sql-syntax.html#select" title="12.2.8. SELECT Syntax"><code class="literal">SELECT</code></a> statement.)
          </p><p>
            A <code class="literal">SET NAMES
            '<em class="replaceable"><code>x</code></em>'</code> statement is
            equivalent to these three statements:
          </p><pre class="programlisting">SET character_set_client = <em class="replaceable"><code>x</code></em>;
SET character_set_results = <em class="replaceable"><code>x</code></em>;
SET character_set_connection = <em class="replaceable"><code>x</code></em>;
</pre><p>
            Setting each of these character set variables also sets its
            corresponding collation variable to the default correlation
            for the character set. For example, setting
            <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> to
            <em class="replaceable"><code>x</code></em> also sets
            <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> to the
            default collation for <em class="replaceable"><code>x</code></em>. It is
            not necessary to set that collation explicitly. To specify a
            particular collation for the character sets, use the
            optional <code class="literal">COLLATE</code> clause:
          </p><pre class="programlisting">SET NAMES '<em class="replaceable"><code>charset_name</code></em>' COLLATE '<em class="replaceable"><code>collation_name</code></em>'
</pre></li><li><p>
            <code class="literal">SET CHARACTER SET
            <em class="replaceable"><code>charset_name</code></em></code>
          </p><p>
            <code class="literal">SET CHARACTER SET</code> is similar to
            <code class="literal">SET NAMES</code> but sets
            <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>
            and <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> to
            <a href="server-administration.html#sysvar_character_set_database"><code class="literal">character_set_database</code></a> and
            <a href="server-administration.html#sysvar_collation_database"><code class="literal">collation_database</code></a>. A
            <code class="literal">SET CHARACTER SET
            <em class="replaceable"><code>x</code></em></code> statement is
            equivalent to these three statements:
          </p><pre class="programlisting">SET character_set_client = <em class="replaceable"><code>x</code></em>;
SET character_set_results = <em class="replaceable"><code>x</code></em>;
SET collation_connection = @@collation_database;
</pre><p>
            Setting
            <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> also
            sets
            <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> to
            the character set associated with the collation (equivalent
            to executing <code class="literal">SET character_set_connection =
            @@character_set_database</code>). It is not necessary to
            set
            <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a>
            explicitly.
          </p></li></ul></div><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
          <code class="literal">ucs2</code> cannot be used as a client character
          set, which means that it does not work for <code class="literal">SET
          NAMES</code> or <code class="literal">SET CHARACTER SET</code>.
        </p></div><p>
        The MySQL client programs <code class="literal">mysql</code>,
        <code class="literal">mysqladmin</code>, <code class="literal">mysqlcheck</code>,
        <code class="literal">mysqlimport</code>, and <code class="literal">mysqlshow</code>
        determine the default character set to use as follows:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            In the absence of other information, the programs use the
            compiled-in default character set, usually
            <code class="literal">latin1</code>.
          </p></li><li><p>
            The programs support a
            <a href="programs.html#option_mysql_default-character-set"><code class="option">--default-character-set</code></a>
            option, which enables users to specify the character set
            explicitly to override whatever default the client otherwise
            determines.
          </p></li></ul></div><p>
        When a client connects to the server, it sends the name of the
        character set that it wants to use. The server uses the name to
        set the <a href="server-administration.html#sysvar_character_set_client"><code class="literal">character_set_client</code></a>,
        <a href="server-administration.html#sysvar_character_set_results"><code class="literal">character_set_results</code></a>, and
        <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> system
        variables. In effect, the server performs a <code class="literal">SET
        NAMES</code> operation using the character set name.
      </p><p>
        With the <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> client, if you want to use a
        character set different from the default, you could explicitly
        execute <code class="literal">SET NAMES</code> every time you start up.
        However, to accomplish the same result more easily, you can add
        the <a href="programs.html#option_mysql_default-character-set"><code class="option">--default-character-set</code></a> option
        setting to your <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> command line or in your
        option file. For example, the following option file setting
        changes the three connection-related character set variables set
        to <code class="literal">koi8r</code> each time you invoke
        <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a>:
      </p><pre class="programlisting">[mysql]
default-character-set=koi8r
</pre><p>
        If you are using the <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> client with
        auto-reconnect enabled (which is not recommended), it is
        preferable to use the <code class="literal">charset</code> command rather
        than <code class="literal">SET NAMES</code>. For example:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>charset utf8</code></strong>
Charset changed
</pre><p>
        The <code class="literal">charset</code> command issues a <code class="literal">SET
        NAMES</code> statement, and also changes the default
        character set that <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> uses when it
        reconnects after the connection has dropped.
      </p><p>
        Example: Suppose that <code class="literal">column1</code> is defined as
        <code class="literal">CHAR(5) CHARACTER SET latin2</code>. If you do not
        say <code class="literal">SET NAMES</code> or <code class="literal">SET CHARACTER
        SET</code>, then for <code class="literal">SELECT column1 FROM
        t</code>, the server sends back all the values for
        <code class="literal">column1</code> using the character set that the
        client specified when it connected. On the other hand, if you
        say <code class="literal">SET NAMES 'latin1'</code> or <code class="literal">SET
        CHARACTER SET latin1</code> before issuing the
        <a href="sql-syntax.html#select" title="12.2.8. SELECT Syntax"><code class="literal">SELECT</code></a> statement, the server
        converts the <code class="literal">latin2</code> values to
        <code class="literal">latin1</code> just before sending results back.
        Conversion may be lossy if there are characters that are not in
        both character sets.
      </p><p>
        If you do not want the server to perform any conversion of
        result sets or error messages, set
        <a href="server-administration.html#sysvar_character_set_results"><code class="literal">character_set_results</code></a> to
        <code class="literal">NULL</code> or <code class="literal">binary</code>:
      </p><pre class="programlisting">SET character_set_results = NULL;
</pre><p>
        To see the values of the character set and collation system
        variables that apply to your connection, use these statements:
      </p><pre class="programlisting">SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
</pre><p>
        You must also consider the environment within which your MySQL
        applications execute. See
        <a href="internationalization-localization.html#charset-applications" title="9.1.5. Configuring the Character Set and Collation for Applications">Section 9.1.5, “Configuring the Character Set and Collation for Applications”</a>.
      </p><p>
        For more information about character sets and error messages,
        see <a href="internationalization-localization.html#charset-errors" title="9.1.6. Character Set for Error Messages">Section 9.1.6, “Character Set for Error Messages”</a>.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-applications"></a>9.1.5. Configuring the Character Set and Collation for Applications</h3></div></div></div><a class="indexterm" name="id4312379"></a><p>
        For applications that store data using the default MySQL
        character set and collation (<code class="literal">latin1</code>,
        <code class="literal">latin1_swedish_ci</code>), no special configuration
        should be needed. If applications require data storage using a
        different character set or collation, you can configure
        character set information several ways:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            Specify character settings per database. For example,
            applications that use one database might require
            <code class="literal">utf8</code>, whereas applications that use
            another database might require <code class="literal">sjis</code>.
          </p></li><li><p>
            Specify character settings at server startup. This causes
            the server to use the given settings for all applications
            that do not make other arrangements.
          </p></li><li><p>
            Specify character settings at configuration time, if you
            build MySQL from source. This causes the server to use the
            given settings for all applications, without having to
            specify them at server startup.
          </p></li></ul></div><p>
        When different applications require different character
        settings, the per-database technique provides a good deal of
        flexibility. If most or all applications use the same character
        set, specifying character settings at server startup or
        configuration time may be most convenient.
      </p><p>
        For the per-database or server-startup techniques, the settings
        control the character set for data storage. Applications must
        also tell the server which character set to use for
        client/server communications, as described in the following
        instructions.
      </p><p>
        The examples shown here assume use of the
        <code class="literal">utf8</code> character set and
        <code class="literal">utf8_general_ci</code> collation.
      </p><p>
        <span class="bold"><strong>Specify character settings per
        database.</strong></span> To create a database such that its tables
        will use a given default character set and collation for data
        storage, use a <a href="sql-syntax.html#create-database" title="12.1.10. CREATE DATABASE Syntax"><code class="literal">CREATE DATABASE</code></a>
        statement like this:
      </p><pre class="programlisting">CREATE DATABASE mydb
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;
</pre><p>
        Tables created in the database will use <code class="literal">utf8</code>
        and <code class="literal">utf8_general_ci</code> by default for any
        character columns.
      </p><p>
        Applications that use the database should also configure their
        connection to the server each time they connect. This can be
        done by executing a <code class="literal">SET NAMES 'utf8'</code>
        statement after connecting. The statement can be used regardless
        of connection method: The <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> client, PHP
        scripts, and so forth.
      </p><p>
        In some cases, it may be possible to configure the connection to
        use the desired character set some other way. For example, for
        connections made using <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a>, you can specify
        the <a href="programs.html#option_mysql_default-character-set"><code class="option">--default-character-set=utf8</code></a>
        command-line option to achieve the same effect as <code class="literal">SET
        NAMES 'utf8'</code>.
      </p><p>
        For more information about configuring client connections, see
        <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.
      </p><p>
        <span class="bold"><strong>Specify character settings at server
        startup.</strong></span> To select a character set and collation at
        server startup, use the
        <a href="server-administration.html#option_mysqld_character-set-server"><code class="option">--character-set-server</code></a> and
        <a href="server-administration.html#option_mysqld_collation-server"><code class="option">--collation-server</code></a> options. For
        example, to specify the options in an option file, include these
        lines:
      </p><pre class="programlisting">[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
</pre><p>
        These settings apply server-wide and apply as the defaults for
        databases created by any application, and for tables created in
        those databases.
      </p><p>
        It is still necessary for applications to configure their
        connection using <code class="literal">SET NAMES</code> or equivalent
        after they connect, as described previously. You might be
        tempted to start the server with the
        <a href="server-administration.html#sysvar_init_connect"><code class="option">--init_connect="SET NAMES 'utf8'"</code></a>
        option to cause <code class="literal">SET NAMES</code> to be executed
        automatically for each client that connects. However, this will
        yield inconsistent results because the
        <a href="server-administration.html#sysvar_init_connect"><code class="literal">init_connect</code></a> value is not
        executed for users who have the
        <a href="server-administration.html#priv_super"><code class="literal">SUPER</code></a> privilege.
      </p><p>
        <span class="bold"><strong>Specify character settings at MySQL
        configuration time.</strong></span> To select a character set and
        collation when you configure and build MySQL from source, use
        the <a href="installing.html#option_configure_with-charset"><code class="option">--with-charset</code></a> and
        <a href="installing.html#option_configure_with-collation"><code class="option">--with-collation</code></a> options:
      </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>./configure --with-charset=utf8 --with-collation=utf8_general_ci</code></strong>
</pre><p>
        The resulting server uses <code class="literal">utf8</code> and
        <code class="literal">utf8_general_ci</code> as the default for databases
        and tables and for client connections. It is unnecessary to use
        <a href="server-administration.html#option_mysqld_character-set-server"><code class="option">--character-set-server</code></a> and
        <a href="server-administration.html#option_mysqld_collation-server"><code class="option">--collation-server</code></a> to specify
        those defaults at server startup. It is also unnecessary for
        applications to configure their connection using <code class="literal">SET
        NAMES</code> or equivalent after they connect to the server.
      </p><p>
        Regardless of how you configure the MySQL character set for
        application use, you must also consider the environment within
        which those applications execute. If you will send statements
        using UTF-8 text taken from a file that you create in an editor,
        you should edit the file with the locale of your environment set
        to UTF-8 so that the file's encoding is correct and so that the
        operating system handles it correctly. If you use the
        <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> client from within a terminal window,
        the window must be configured to use UTF-8 or characters may not
        display properly. For a script that executes in a Web
        environment, the script must handle character encoding properly
        for its interaction with the MySQL server, and it must generate
        pages that correctly indicate the encoding so that browsers know
        how to display the content of the pages. For example, you can
        include this <code class="literal">&lt;meta&gt;</code> tag within your
        <code class="literal">&lt;head&gt;</code> element:
      </p><pre class="programlisting">&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8" /&gt;
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-errors"></a>9.1.6. Character Set for Error Messages</h3></div></div></div><p>
        This section describes how the server uses character sets for
        constructing error messages and returning them to clients. For
        information about the language of error messages (rather than
        the character set), see
        <a href="internationalization-localization.html#error-message-language" title="9.3. Setting the Error Message Language">Section 9.3, “Setting the Error Message Language”</a>.
      </p><p>
        In MySQL 5.1, the server constructs error messages
        and returns them to clients as follows:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            The message template has the character set associated with
            the error message language. For example, English, Korean,
            and Russian messages use <code class="literal">latin1</code>,
            <code class="literal">euckr</code>, and <code class="literal">koi8r</code>,
            respectively.
          </p></li><li><p>
            Parameters in the message template are replaced with values
            that apply to a specific error occurrence. These parameters
            use their own character set. Identifiers such as table or
            column names use UTF-8. Data values retain their character
            set. For example, in the following duplicate-key message,
            <code class="literal">'<em class="replaceable"><code>xxx</code></em>'</code> has the
            character set of the table column associated with key 1:
          </p><pre class="programlisting">Duplicate entry '<em class="replaceable"><code>xxx</code></em>' for key1
</pre></li></ul></div><p>
        The preceding method of error-message construction can result in
        messages that contain a mix of character sets unless all items
        involved contain only ASCII characters. This issue is resolved
        in MySQL 5.5, in which error messages are constructed internally
        within the server using UTF-8 and returned to the client in the
        character set specified by the
        <a href="server-administration.html#sysvar_character_set_results"><code class="literal">character_set_results</code></a> system
        variable.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-collations"></a>9.1.7. Collation Issues</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-collate">9.1.7.1. Using <code class="literal">COLLATE</code> in SQL Statements</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collate-precedence">9.1.7.2. <code class="literal">COLLATE</code> Clause Precedence</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-binary-op">9.1.7.3. <code class="literal">BINARY</code> Operator</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-binary-collations">9.1.7.4. The <code class="literal">_bin</code> and <code class="literal">binary</code> Collations</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collate-tricky">9.1.7.5. Special Cases Where Collation Determination Is Tricky</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collation-charset">9.1.7.6. Collations Must Be for the Right Character Set</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-collation-effect">9.1.7.7. Examples of the Effect of Collation</a></span></dt></dl></div><p>
        The following sections discuss various aspects of character set
        collations.
      </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collate"></a>9.1.7.1. Using <code class="literal">COLLATE</code> in SQL Statements</h4></div></div></div><p>
          With the <code class="literal">COLLATE</code> clause, you can override
          whatever the default collation is for a comparison.
          <code class="literal">COLLATE</code> may be used in various parts of SQL
          statements. Here are some examples:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              With <code class="literal">ORDER BY</code>:
            </p><pre class="programlisting">SELECT k
FROM t1
ORDER BY k COLLATE latin1_german2_ci;
</pre></li><li><p>
              With <code class="literal">AS</code>:
            </p><pre class="programlisting">SELECT k COLLATE latin1_german2_ci AS k1
FROM t1
ORDER BY k1;
</pre></li><li><p>
              With <code class="literal">GROUP BY</code>:
            </p><pre class="programlisting">SELECT k
FROM t1
GROUP BY k COLLATE latin1_german2_ci;
</pre></li><li><p>
              With aggregate functions:
            </p><pre class="programlisting">SELECT MAX(k COLLATE latin1_german2_ci)
FROM t1;
</pre></li><li><p>
              With <code class="literal">DISTINCT</code>:
            </p><pre class="programlisting">SELECT DISTINCT k COLLATE latin1_german2_ci
FROM t1;
</pre></li><li><p>
              With <code class="literal">WHERE</code>:
            </p><pre class="programlisting">     SELECT *
     FROM t1
     WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k;
</pre><pre class="programlisting">     SELECT *
     FROM t1
     WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci;
</pre></li><li><p>
              With <code class="literal">HAVING</code>:
            </p><pre class="programlisting">SELECT k
FROM t1
GROUP BY k
HAVING k = _latin1 'Müller' COLLATE latin1_german2_ci;
</pre></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collate-precedence"></a>9.1.7.2. <code class="literal">COLLATE</code> Clause Precedence</h4></div></div></div><p>
          The <code class="literal">COLLATE</code> clause has high precedence
          (higher than <a href="functions.html#operator_or"><code class="literal">||</code></a>),
          so the following two expressions are equivalent:
        </p><pre class="programlisting">x || y COLLATE z
x || (y COLLATE z)
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-binary-op"></a>9.1.7.3. <code class="literal">BINARY</code> Operator</h4></div></div></div><p>
          The <a href="functions.html#operator_binary"><code class="literal">BINARY</code></a> operator casts the
          string following it to a binary string. This is an easy way to
          force a comparison to be done byte by byte rather than
          character by character. <a href="functions.html#operator_binary"><code class="literal">BINARY</code></a>
          also causes trailing spaces to be significant.
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT 'a' = 'A';</code></strong>
        -&gt; 1
mysql&gt; <strong class="userinput"><code>SELECT BINARY 'a' = 'A';</code></strong>
        -&gt; 0
mysql&gt; <strong class="userinput"><code>SELECT 'a' = 'a ';</code></strong>
        -&gt; 1
mysql&gt; <strong class="userinput"><code>SELECT BINARY 'a' = 'a ';</code></strong>
        -&gt; 0
</pre><p>
          <code class="literal">BINARY <em class="replaceable"><code>str</code></em></code> is
          shorthand for
          <a href="functions.html#function_cast"><code class="literal">CAST(<em class="replaceable"><code>str</code></em> AS
          BINARY)</code></a>.
        </p><p>
          The <code class="literal">BINARY</code> attribute in character column
          definitions has a different effect. A character column defined
          with the <code class="literal">BINARY</code> attribute is assigned the
          binary collation of the column's character set. Every
          character set has a binary collation. For example, the binary
          collation for the <code class="literal">latin1</code> character set is
          <code class="literal">latin1_bin</code>, so if the table default
          character set is <code class="literal">latin1</code>, these two column
          definitions are equivalent:
        </p><pre class="programlisting">CHAR(10) BINARY
CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin
</pre><p>
          The effect of <code class="literal">BINARY</code> as a column attribute
          differs from its effect prior to MySQL 4.1. Formerly,
          <code class="literal">BINARY</code> resulted in a column that was
          treated as a binary string. A binary string is a string of
          bytes that has no character set or collation, which differs
          from a nonbinary character string that has a binary collation.
          For both types of strings, comparisons are based on the
          numeric values of the string unit, but for nonbinary strings
          the unit is the character and some character sets allow
          multi-byte characters. <a href="data-types.html#binary-varbinary" title="10.4.2. The BINARY and
        VARBINARY Types">Section 10.4.2, “The <code class="literal">BINARY</code> and
        <code class="literal">VARBINARY</code> Types”</a>.
        </p><p>
          The use of <code class="literal">CHARACTER SET binary</code> in the
          definition of a <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">CHAR</code></a>,
          <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">VARCHAR</code></a>, or
          <a href="data-types.html#blob" title="10.4.3. The BLOB and
        TEXT Types"><code class="literal">TEXT</code></a> column causes the column
          to be treated as a binary data type. For example, the
          following pairs of definitions are equivalent:
        </p><pre class="programlisting">CHAR(10) CHARACTER SET binary
BINARY(10)

VARCHAR(10) CHARACTER SET binary
VARBINARY(10)

TEXT CHARACTER SET binary
BLOB
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-binary-collations"></a>9.1.7.4. The <code class="literal">_bin</code> and <code class="literal">binary</code> Collations</h4></div></div></div><p>
          This section describes how <code class="literal">_bin</code> collations
          for nonbinary strings differ from the
          <code class="literal">binary</code> “<span class="quote">collation</span>” for binary
          strings.
        </p><p>
          Nonbinary strings (as stored in the
          <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">CHAR</code></a>,
          <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">VARCHAR</code></a>, and
          <a href="data-types.html#blob" title="10.4.3. The BLOB and
        TEXT Types"><code class="literal">TEXT</code></a> data types) have a
          character set and collation. A given character set can have
          several collations, each of which defines a particular sorting
          and comparison order for the characters in the set. One of
          these is the binary collation for the character set, indicated
          by a <code class="literal">_bin</code> suffix in the collation name. For
          example, <code class="literal">latin1</code> and <code class="literal">utf8</code>
          have binary collations named <code class="literal">latin1_bin</code> and
          <code class="literal">utf8_bin</code>.
        </p><p>
          Binary strings (as stored in the
          <a href="data-types.html#binary-varbinary" title="10.4.2. The BINARY and
        VARBINARY Types"><code class="literal">BINARY</code></a>,
          <a href="data-types.html#binary-varbinary" title="10.4.2. The BINARY and
        VARBINARY Types"><code class="literal">VARBINARY</code></a>, and
          <a href="data-types.html#blob" title="10.4.3. The BLOB and
        TEXT Types"><code class="literal">BLOB</code></a> data types) have no
          character set or collation in the sense that nonbinary strings
          do. (Applied to a binary string, the
          <code class="literal">CHARSET()</code> and
          <code class="literal">COLLATION()</code> functions both return a value
          of <code class="literal">binary</code>.) Binary strings are sequences of
          bytes and the numeric values of those bytes determine sort
          order.
        </p><p>
          The <code class="literal">_bin</code> collations differ from the
          <code class="literal">binary</code> collation in several respects.
        </p><p>
          <span class="bold"><strong>The unit for sorting and
          comparison.</strong></span> Binary strings are sequences of bytes.
          Sorting and comparison is always based on numeric byte values.
          Nonbinary strings are sequences of characters, which might be
          multi-byte. Collations for nonbinary strings define an
          ordering of the character values for sorting and comparison.
          For the <code class="literal">_bin</code> collation, this ordering is
          based solely on numeric values of the characters (which is
          similar to ordering for binary strings except that a
          <code class="literal">_bin</code> collation must take into account that
          a character might contain multiple bytes). For other
          collations, character ordering might take additional factors
          such as lettercase into account.
        </p><p>
          <span class="bold"><strong>Character set conversion.</strong></span> A
          nonbinary string has a character set and is converted to
          another character set in many cases, even when the string has
          a <code class="literal">_bin</code> collation:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              When assigning column values from another column that has
              a different character set:
            </p><pre class="programlisting">UPDATE t1 SET utf8_bin_column=latin1_column;
INSERT INTO t1 (latin1_column) SELECT utf8_bin_column FROM t2;
</pre></li><li><p>
              When assigning column values for
              <a href="sql-syntax.html#insert" title="12.2.5. INSERT Syntax"><code class="literal">INSERT</code></a> or
              <a href="sql-syntax.html#update" title="12.2.11. UPDATE Syntax"><code class="literal">UPDATE</code></a> using a string
              literal:
            </p><pre class="programlisting">SET NAMES latin1;
INSERT INTO t1 (utf8_bin_column) VALUES ('string-in-latin1');
</pre></li><li><p>
              When sending results from the server to a client:
            </p><pre class="programlisting">SET NAMES latin1;
SELECT utf8_bin_column FROM t2;
</pre></li></ul></div><p>
          For binary string columns, no conversion occurs. For the
          preceding cases, the string value is copied byte-wise.
        </p><p>
          <span class="bold"><strong>Lettercase conversion.</strong></span>
          Collations provide information about lettercase of characters,
          so characters in a nonbinary string can be converted from one
          lettercase to another, even for <code class="literal">_bin</code>
          collations that ignore lettercase for ordering:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES latin1 COLLATE latin1_bin;</code></strong>
Query OK, 0 rows affected (0.02 sec)

mysql&gt; <strong class="userinput"><code>SELECT LOWER('aA'), UPPER('zZ');</code></strong>
+-------------+-------------+
| LOWER('aA') | UPPER('zZ') |
+-------------+-------------+
| aa          | ZZ          |
+-------------+-------------+
1 row in set (0.13 sec)
</pre><p>
          The concept of lettercase does not apply to bytes in a binary
          string. To perform lettercase conversion, the string must be
          converted to a nonbinary string:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES binary;</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT LOWER('aA'), LOWER(CONVERT('aA' USING latin1));</code></strong>
+-------------+-----------------------------------+
| LOWER('aA') | LOWER(CONVERT('aA' USING latin1)) |
+-------------+-----------------------------------+
| aA          | aa                                |
+-------------+-----------------------------------+
1 row in set (0.00 sec)
</pre><p>
          <span class="bold"><strong>Trailing space handling in
          comparisons.</strong></span> Nonbinary strings have
          <code class="literal">PADSPACE</code> behavior for all collations,
          including <code class="literal">_bin</code> collations. Trailing spaces
          are insignificant in comparisons:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES utf8 COLLATE utf8_bin;</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT 'a ' = 'a';</code></strong>
+------------+
| 'a ' = 'a' |
+------------+
|          1 |
+------------+
1 row in set (0.00 sec)
</pre><p>
          For binary strings, all characters are significant in
          comparisons, including trailing spaces:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES binary;</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT 'a ' = 'a';</code></strong>
+------------+
| 'a ' = 'a' |
+------------+
|          0 |
+------------+
1 row in set (0.00 sec)
</pre><p>
          <span class="bold"><strong>Trailing space handling for inserts and
          retrievals.</strong></span>
          <code class="literal">CHAR(<em class="replaceable"><code>N</code></em>)</code> columns
          store nonbinary strings. Values shorter than
          <em class="replaceable"><code>N</code></em> characters are extended with
          spaces on insertion. For retrieval, trailing spaces are
          removed.
        </p><p>
          <code class="literal">BINARY(<em class="replaceable"><code>N</code></em>)</code>
          columns store binary strings. Values shorter than
          <em class="replaceable"><code>N</code></em> bytes are extended with
          <code class="literal">0x00</code> bytes on insertion. For retrieval,
          nothing is removed; a value of the declared length is always
          returned.
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>CREATE TABLE t1 (</code></strong>
    -&gt; <strong class="userinput"><code>  a CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin,</code></strong>
    -&gt; <strong class="userinput"><code>  b BINARY(10)</code></strong>
    -&gt; <strong class="userinput"><code>);</code></strong>
Query OK, 0 rows affected (0.09 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO t1 VALUES ('a','a');</code></strong>
Query OK, 1 row affected (0.01 sec)

mysql&gt; <strong class="userinput"><code>SELECT HEX(a), HEX(b) FROM t1;</code></strong>
+--------+----------------------+
| HEX(a) | HEX(b)               |
+--------+----------------------+
| 61     | 61000000000000000000 |
+--------+----------------------+
1 row in set (0.04 sec)
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collate-tricky"></a>9.1.7.5. Special Cases Where Collation Determination Is Tricky</h4></div></div></div><p>
          In the great majority of statements, it is obvious what
          collation MySQL uses to resolve a comparison operation. For
          example, in the following cases, it should be clear that the
          collation is the collation of column <code class="literal">x</code>:
        </p><pre class="programlisting">SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
</pre><p>
          However, when multiple operands are involved, there can be
          ambiguity. For example:
        </p><pre class="programlisting">SELECT x FROM T WHERE x = 'Y';
</pre><p>
          Should this query use the collation of the column
          <code class="literal">x</code>, or of the string literal
          <code class="literal">'Y'</code>?
        </p><p>
          Standard SQL resolves such questions using what used to be
          called “<span class="quote">coercibility</span>” rules. Basically, this
          means: Both <code class="literal">x</code> and <code class="literal">'Y'</code>
          have collations, so which collation takes precedence? This can
          be difficult to resolve, but the following rules cover most
          situations:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              An explicit <code class="literal">COLLATE</code> clause has a
              coercibility of 0. (Not coercible at all.)
            </p></li><li><p>
              The concatenation of two strings with different collations
              has a coercibility of 1.
            </p></li><li><p>
              The collation of a column or a stored routine parameter or
              local variable has a coercibility of 2.
            </p></li><li><p>
              A “<span class="quote">system constant</span>” (the string returned by
              functions such as <a href="functions.html#function_user"><code class="literal">USER()</code></a> or
              <a href="functions.html#function_version"><code class="literal">VERSION()</code></a>) has a
              coercibility of 3.
            </p></li><li><p>
              A literal's collation has a coercibility of 4.
            </p></li><li><p>
              <code class="literal">NULL</code> or an expression that is derived
              from <code class="literal">NULL</code> has a coercibility of 5.
            </p></li></ul></div><p>
          The preceding coercibility values are current for MySQL
          5.1.
        </p><p>
          Those rules resolve ambiguities in the following manner:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              Use the collation with the lowest coercibility value.
            </p></li><li><p>
              If both sides have the same coercibility, then:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  If both sides are Unicode, or both sides are not
                  Unicode, it is an error.
                </p></li><li><p>
                  If one of the sides has a Unicode character set, and
                  another side has a non-Unicode character set, the side
                  with Unicode character set wins, and automatic
                  character set conversion is applied to the non-Unicode
                  side. For example, the following statement will not
                  return an error:
                </p><pre class="programlisting">SELECT CONCAT(utf8_column, latin1_column) FROM t1;
</pre><p>
                  It will return a result, and the character set of the
                  result will be <code class="literal">utf8</code>. The collation
                  of the result will be the collation of
                  <code class="literal">utf8_column</code>. Values of
                  <code class="literal">latin1_column</code> will be automatically
                  converted to <code class="literal">utf8</code> before
                  concatenating.
                </p></li><li><p>
                  For an operation with operands from the same character
                  set but that mix a <code class="literal">_bin</code> collation
                  and a <code class="literal">_ci</code> or <code class="literal">_cs</code>
                  collation, the <code class="literal">_bin</code> collation is
                  used. This is similar to how operations that mix
                  nonbinary and binary strings evaluate the operands as
                  binary strings, except that it is for collations
                  rather than data types.
                </p></li></ul></div></li></ul></div><p>
          Although automatic conversion is not in the SQL standard, the
          SQL standard document does say that every character set is (in
          terms of supported characters) a “<span class="quote">subset</span>” of
          Unicode. Because it is a well-known principle that “<span class="quote">what
          applies to a superset can apply to a subset,</span>” we
          believe that a collation for Unicode can apply for comparisons
          with non-Unicode strings.
        </p><p>
          Examples:
        </p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">column1 = 'A'</code></td><td>Use collation of <code class="literal">column1</code></td></tr><tr><td><code class="literal">column1 = 'A' COLLATE x</code></td><td>Use collation of <code class="literal">'A' COLLATE x</code></td></tr><tr><td><code class="literal">column1 COLLATE x = 'A' COLLATE y</code></td><td>Error</td></tr></tbody></table></div><p>
          The <a href="functions.html#function_coercibility"><code class="literal">COERCIBILITY()</code></a> function can
          be used to determine the coercibility of a string expression:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);</code></strong>
        -&gt; 0
mysql&gt; <strong class="userinput"><code>SELECT COERCIBILITY(VERSION());</code></strong>
        -&gt; 3
mysql&gt; <strong class="userinput"><code>SELECT COERCIBILITY('A');</code></strong>
        -&gt; 4
</pre><p>
          See <a href="functions.html#information-functions" title="11.11.3. Information Functions">Section 11.11.3, “Information Functions”</a>.
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collation-charset"></a>9.1.7.6. Collations Must Be for the Right Character Set</h4></div></div></div><p>
          Each character set has one or more collations, but each
          collation is associated with one and only one character set.
          Therefore, the following statement causes an error message
          because the <code class="literal">latin2_bin</code> collation is not
          legal with the <code class="literal">latin1</code> character set:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT _latin1 'x' COLLATE latin2_bin;</code></strong>
ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid
for CHARACTER SET 'latin1'
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-collation-effect"></a>9.1.7.7. Examples of the Effect of Collation</h4></div></div></div><p>
          <span class="bold"><strong>Example 1: Sorting German
          Umlauts</strong></span>
        </p><p>
          Suppose that column <code class="literal">X</code> in table
          <code class="literal">T</code> has these <code class="literal">latin1</code>
          column values:
        </p><pre class="programlisting">Muffler
Müller
MX Systems
MySQL
</pre><p>
          Suppose also that the column values are retrieved using the
          following statement:
        </p><pre class="programlisting">SELECT X FROM T ORDER BY X COLLATE <em class="replaceable"><code>collation_name</code></em>;
</pre><p>
          The following table shows the resulting order of the values if
          we use <code class="literal">ORDER BY</code> with different collations.
        </p><div class="informaltable"><table border="1"><colgroup><col><col><col></colgroup><tbody><tr><td><code class="literal">latin1_swedish_ci</code></td><td><code class="literal">latin1_german1_ci</code></td><td><code class="literal">latin1_german2_ci</code></td></tr><tr><td>Muffler</td><td>Muffler</td><td>Müller</td></tr><tr><td>MX Systems</td><td>Müller</td><td>Muffler</td></tr><tr><td>Müller</td><td>MX Systems</td><td>MX Systems</td></tr><tr><td>MySQL</td><td>MySQL</td><td>MySQL</td></tr></tbody></table></div><p>
          The character that causes the different sort orders in this
          example is the U with two dots over it
          (<code class="literal">ü</code>), which the Germans call
          “<span class="quote">U-umlaut.</span>”
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              The first column shows the result of the
              <a href="sql-syntax.html#select" title="12.2.8. SELECT Syntax"><code class="literal">SELECT</code></a> using the
              Swedish/Finnish collating rule, which says that U-umlaut
              sorts with Y.
            </p></li><li><p>
              The second column shows the result of the
              <a href="sql-syntax.html#select" title="12.2.8. SELECT Syntax"><code class="literal">SELECT</code></a> using the German
              DIN-1 rule, which says that U-umlaut sorts with U.
            </p></li><li><p>
              The third column shows the result of the
              <a href="sql-syntax.html#select" title="12.2.8. SELECT Syntax"><code class="literal">SELECT</code></a> using the German
              DIN-2 rule, which says that U-umlaut sorts with UE.
            </p></li></ul></div><p>
          <span class="bold"><strong>Example 2: Searching for German
          Umlauts</strong></span>
        </p><p>
          Suppose that you have three tables that differ only by the
          character set and collation used:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>CREATE TABLE german1 (</code></strong>
    -&gt; <strong class="userinput"><code>  c CHAR(10)</code></strong>
    -&gt; <strong class="userinput"><code>) CHARACTER SET latin1 COLLATE latin1_german1_ci;</code></strong>
mysql&gt; <strong class="userinput"><code>CREATE TABLE german2 (</code></strong>
    -&gt; <strong class="userinput"><code>  c CHAR(10)</code></strong>
    -&gt; <strong class="userinput"><code>) CHARACTER SET latin1 COLLATE latin1_german2_ci;</code></strong>
mysql&gt; <strong class="userinput"><code>CREATE TABLE germanutf8 (</code></strong>
    -&gt; <strong class="userinput"><code>  c CHAR(10)</code></strong>
    -&gt; <strong class="userinput"><code>) CHARACTER SET utf8 COLLATE utf8_unicode_ci;</code></strong>
</pre><p>
          Each table contains two records:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>INSERT INTO german1 VALUES ('Bar'), ('Bär');</code></strong>
mysql&gt; <strong class="userinput"><code>INSERT INTO german2 VALUES ('Bar'), ('Bär');</code></strong>
mysql&gt; <strong class="userinput"><code>INSERT INTO germanutf8 VALUES ('Bar'), ('Bär');</code></strong>
</pre><p>
          Two of the above collations have an <code class="literal">A = Ä</code>
          equality, and one has no such equality
          (<code class="literal">latin1_german2_ci</code>). For that reason,
          you'll get these results in comparisons:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT * FROM german1 WHERE c = 'Bär';</code></strong>
+------+
| c    |
+------+
| Bar  |
| Bär  |
+------+
mysql&gt; <strong class="userinput"><code>SELECT * FROM german2 WHERE c = 'Bär';</code></strong>
+------+
| c    |
+------+
| Bär  |
+------+
mysql&gt; <strong class="userinput"><code>SELECT * FROM germanutf8 WHERE c = 'Bär';</code></strong>
+------+
| c    |
+------+
| Bar  |
| Bär  |
+------+
</pre><p>
          This is not a bug but rather a consequence of the sorting that
          <code class="literal">latin1_german1_ci</code> or
          <code class="literal">utf8_unicode_ci</code> do (the sorting shown is
          done according to the German DIN 5007 standard).
        </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-repertoire"></a>9.1.8. String Repertoire</h3></div></div></div><p>
        The <em class="firstterm">repertoire</em> of a character set is the
        collection of characters in the set.
      </p><p>
        As of MySQL 5.1.21, string expressions have a repertoire
        attribute, which can have two values:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            <code class="literal">ASCII</code>: The expression can contain only
            characters in the Unicode range <code class="literal">U+0000</code> to
            <code class="literal">U+007F</code>.
          </p></li><li><p>
            <code class="literal">UNICODE</code>: The expression can contain
            characters in the Unicode range <code class="literal">U+0000</code> to
            <code class="literal">U+FFFF</code>.
          </p></li></ul></div><p>
        The <code class="literal">ASCII</code> range is a subset of
        <code class="literal">UNICODE</code> range, so a string with
        <code class="literal">ASCII</code> repertoire can be converted safely
        without loss of information to the character set of any string
        with <code class="literal">UNICODE</code> repertoire or to a character set
        that is a superset of <code class="literal">ASCII</code>. (All MySQL
        character sets are supersets of <code class="literal">ASCII</code> with
        the exception of <code class="literal">swe7</code>, which reuses some
        punctuation characters for Swedish accented characters.) The use
        of repertoire enables character set conversion in expressions
        for many cases where MySQL would otherwise return an
        “<span class="quote">illegal mix of collations</span>” error.
      </p><p>
        The following discussion provides examples of expressions and
        their repertoires, and describes how the use of repertoire
        changes string expression evaluation:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            The repertoire for string constants depends on string
            content:
          </p><pre class="programlisting">SET NAMES utf8; SELECT 'abc';
SELECT _utf8'def';
SELECT N'MySQL';
</pre><p>
            Although the character set is <code class="literal">utf8</code> in
            each of the preceding cases, the strings do not actually
            contain any characters outside the ASCII range, so their
            repertoire is <code class="literal">ASCII</code> rather than
            <code class="literal">UNICODE</code>.
          </p></li><li><p>
            Columns having the <code class="literal">ascii</code> character set
            have <code class="literal">ASCII</code> repertoire because of their
            character set. In the following table, <code class="literal">c1</code>
            has <code class="literal">ASCII</code> repertoire:
          </p><pre class="programlisting">CREATE TABLE t1 (c1 CHAR(1) CHARACTER SET ascii);
</pre><p>
            The following example illustrates how repertoire enables a
            result to be determined in a case where an error occurs
            without repertoire:
          </p><pre class="programlisting">CREATE TABLE t1 (
  c1 CHAR(1) CHARACTER SET latin1,
  c2 CHAR(1) CHARACTER SET ascii
);
INSERT INTO t1 VALUES ('a','b');
SELECT CONCAT(c1,c2) FROM t1;
</pre><p>
            Without repertoire, this error occurs:
          </p><pre class="programlisting">ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (ascii_general_ci,IMPLICIT) for operation 'concat'
</pre><p>
            Using repertoire, subset to superset
            (<code class="literal">ascii</code> to <code class="literal">latin1</code>)
            conversion can occur and a result is returned:
          </p><pre class="programlisting">+---------------+
| CONCAT(c1,c2) |
+---------------+
| ab            |
+---------------+
</pre></li><li><p>
            Functions with one string argument inherit the repertoire of
            their argument. The result of
            <a href="functions.html#function_upper"><code class="literal">UPPER(_utf8'<em class="replaceable"><code>abc</code></em>')</code></a>
            has <code class="literal">ASCII</code> repertoire, because its
            argument has <code class="literal">ASCII</code> repertoire.
          </p></li><li><p>
            For functions that return a string but do not have string
            arguments and use
            <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> as
            the result character set, the result repertoire is
            <code class="literal">ASCII</code> if
            <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> is
            <code class="literal">ascii</code>, and <code class="literal">UNICODE</code>
            otherwise:
          </p><pre class="programlisting">FORMAT(<em class="replaceable"><code>numeric_column</code></em>, 4);
</pre><p>
            Use of repertoire changes how MySQL evaluates the following
            example:
          </p><pre class="programlisting">SET NAMES ascii;
CREATE TABLE t1 (a INT, b VARCHAR(10) CHARACTER SET latin1);
INSERT INTO t1 VALUES (1,'b');
SELECT CONCAT(FORMAT(a, 4), b) FROM t1;
</pre><p>
            Without repertoire, this error occurs:
          </p><pre class="programlisting">ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,COERCIBLE)
and (latin1_swedish_ci,IMPLICIT) for operation 'concat'
</pre><p>
            With repertoire, a result is returned:
          </p><pre class="programlisting">+-------------------------+
| CONCAT(FORMAT(a, 4), b) |
+-------------------------+
| 1.0000b                 |
+-------------------------+
</pre></li><li><p>
            Functions with two or more string arguments use the
            “<span class="quote">widest</span>” argument repertoire for the result
            repertoire (<code class="literal">UNICODE</code> is wider than
            <code class="literal">ASCII</code>). Consider the following
            <a href="functions.html#function_concat"><code class="literal">CONCAT()</code></a> calls:
          </p><pre class="programlisting">CONCAT(_ucs2 0x0041, _ucs2 0x0042)
CONCAT(_ucs2 0x0041, _ucs2 0x00C2)
</pre><p>
            For the first call, the repertoire is
            <code class="literal">ASCII</code> because both arguments are within
            the range of the <code class="literal">ascii</code> character set. For
            the second call, the repertoire is
            <code class="literal">UNICODE</code> because the second argument is
            outside the <code class="literal">ascii</code> character set range.
          </p></li><li><p>
            The repertoire for function return values is determined
            based only on the repertoire of the arguments that affect
            the result's character set and collation.
          </p><pre class="programlisting">IF(column1 &lt; column2, 'smaller', 'greater')
</pre><p>
            The result repertoire is <code class="literal">ASCII</code> because
            the two string arguments (the second argument and the third
            argument) both have <code class="literal">ASCII</code> repertoire. The
            first argument does not matter for the result repertoire,
            even if the expression uses string values.
          </p></li></ul></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-operations"></a>9.1.9. Operations Affected by Character Set Support</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-result">9.1.9.1. Result Strings</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-convert">9.1.9.2. <code class="literal">CONVERT()</code> and
          <code class="literal">CAST()</code></a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-show">9.1.9.3. <code class="literal">SHOW</code> Statements and
          <code class="literal">INFORMATION_SCHEMA</code></a></span></dt></dl></div><p>
        This section describes operations that take character set
        information into account.
      </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-result"></a>9.1.9.1. Result Strings</h4></div></div></div><p>
          MySQL has many operators and functions that return a string.
          This section answers the question: What is the character set
          and collation of such a string?
        </p><p>
          For simple functions that take string input and return a
          string result as output, the output's character set and
          collation are the same as those of the principal input value.
          For example,
          <a href="functions.html#function_upper"><code class="literal">UPPER(<em class="replaceable"><code>X</code></em>)</code></a>
          returns a string whose character string and collation are the
          same as that of <em class="replaceable"><code>X</code></em>. The same applies
          for <a href="functions.html#function_instr"><code class="literal">INSTR()</code></a>,
          <a href="functions.html#function_lcase"><code class="literal">LCASE()</code></a>,
          <a href="functions.html#function_lower"><code class="literal">LOWER()</code></a>,
          <a href="functions.html#function_ltrim"><code class="literal">LTRIM()</code></a>,
          <a href="functions.html#function_mid"><code class="literal">MID()</code></a>,
          <a href="functions.html#function_repeat"><code class="literal">REPEAT()</code></a>,
          <a href="functions.html#function_replace"><code class="literal">REPLACE()</code></a>,
          <a href="functions.html#function_reverse"><code class="literal">REVERSE()</code></a>,
          <a href="functions.html#function_right"><code class="literal">RIGHT()</code></a>,
          <a href="functions.html#function_rpad"><code class="literal">RPAD()</code></a>,
          <a href="functions.html#function_rtrim"><code class="literal">RTRIM()</code></a>,
          <a href="functions.html#function_soundex"><code class="literal">SOUNDEX()</code></a>,
          <a href="functions.html#function_substring"><code class="literal">SUBSTRING()</code></a>,
          <a href="functions.html#function_trim"><code class="literal">TRIM()</code></a>,
          <a href="functions.html#function_ucase"><code class="literal">UCASE()</code></a>, and
          <a href="functions.html#function_upper"><code class="literal">UPPER()</code></a>.
        </p><p>
          Note: The <a href="functions.html#function_replace"><code class="literal">REPLACE()</code></a> function,
          unlike all other functions, always ignores the collation of
          the string input and performs a case-sensitive comparison.
        </p><p>
          If a string input or function result is a binary string, the
          string has no character set or collation. This can be checked
          by using the <a href="functions.html#function_charset"><code class="literal">CHARSET()</code></a> and
          <a href="functions.html#function_collation"><code class="literal">COLLATION()</code></a> functions, both of
          which return <code class="literal">binary</code> to indicate that their
          argument is a binary string:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT CHARSET(BINARY 'a'), COLLATION(BINARY 'a');</code></strong>
+---------------------+-----------------------+
| CHARSET(BINARY 'a') | COLLATION(BINARY 'a') |
+---------------------+-----------------------+
| binary              | binary                |
+---------------------+-----------------------+
</pre><p>
          For operations that combine multiple string inputs and return
          a single string output, the “<span class="quote">aggregation rules</span>”
          of standard SQL apply for determining the collation of the
          result:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              If an explicit <code class="literal">COLLATE
              <em class="replaceable"><code>X</code></em></code> occurs, use
              <em class="replaceable"><code>X</code></em>.
            </p></li><li><p>
              If explicit <code class="literal">COLLATE
              <em class="replaceable"><code>X</code></em></code> and
              <code class="literal">COLLATE <em class="replaceable"><code>Y</code></em></code>
              occur, raise an error.
            </p></li><li><p>
              Otherwise, if all collations are
              <em class="replaceable"><code>X</code></em>, use
              <em class="replaceable"><code>X</code></em>.
            </p></li><li><p>
              Otherwise, the result has no collation.
            </p></li></ul></div><p>
          For example, with <code class="literal">CASE ... WHEN a THEN b WHEN b THEN
          c COLLATE <em class="replaceable"><code>X</code></em> END</code>, the
          resulting collation is <em class="replaceable"><code>X</code></em>. The same
          applies for <a href="sql-syntax.html#union" title="12.2.8.3. UNION Syntax"><code class="literal">UNION</code></a>,
          <a href="functions.html#operator_or"><code class="literal">||</code></a>,
          <a href="functions.html#function_concat"><code class="literal">CONCAT()</code></a>,
          <a href="functions.html#function_elt"><code class="literal">ELT()</code></a>,
          <a href="functions.html#function_greatest"><code class="literal">GREATEST()</code></a>,
          <a href="functions.html#function_if"><code class="literal">IF()</code></a>, and
          <a href="functions.html#function_least"><code class="literal">LEAST()</code></a>.
        </p><p>
          For operations that convert to character data, the character
          set and collation of the strings that result from the
          operations are defined by the
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> and
          <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> system
          variables. This applies only to
          <a href="functions.html#function_cast"><code class="literal">CAST()</code></a>,
          <a href="functions.html#function_conv"><code class="literal">CONV()</code></a>,
          <a href="functions.html#function_format"><code class="literal">FORMAT()</code></a>,
          <a href="functions.html#function_hex"><code class="literal">HEX()</code></a>, and
          <a href="functions.html#function_space"><code class="literal">SPACE()</code></a>.
        </p><p>
          If you are uncertain about the character set or collation of
          the result returned by a string function, you can use the
          <a href="functions.html#function_charset"><code class="literal">CHARSET()</code></a> or
          <a href="functions.html#function_collation"><code class="literal">COLLATION()</code></a> function to find
          out:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT USER(), CHARSET(USER()), COLLATION(USER());</code></strong>
+----------------+-----------------+-------------------+
| USER()         | CHARSET(USER()) | COLLATION(USER()) |
+----------------+-----------------+-------------------+
| test@localhost | utf8            | utf8_general_ci   |
+----------------+-----------------+-------------------+
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-convert"></a>9.1.9.2. <code class="literal">CONVERT()</code> and
          <code class="literal">CAST()</code></h4></div></div></div><p>
          <a href="functions.html#function_convert"><code class="literal">CONVERT()</code></a> provides a way to
          convert data between different character sets. The syntax is:
        </p><pre class="programlisting">CONVERT(<em class="replaceable"><code>expr</code></em> USING <em class="replaceable"><code>transcoding_name</code></em>)
</pre><p>
          In MySQL, transcoding names are the same as the corresponding
          character set names.
        </p><p>
          Examples:
        </p><pre class="programlisting">SELECT CONVERT(_latin1'Müller' USING utf8);
INSERT INTO utf8table (utf8column)
    SELECT CONVERT(latin1field USING utf8) FROM latin1table;
</pre><p>
          <a href="functions.html#function_convert"><code class="literal">CONVERT(... USING ...)</code></a> is
          implemented according to the standard SQL specification.
        </p><p>
          You may also use <a href="functions.html#function_cast"><code class="literal">CAST()</code></a> to
          convert a string to a different character set. The syntax is:
        </p><pre class="programlisting">CAST(<em class="replaceable"><code>character_string</code></em> AS <em class="replaceable"><code>character_data_type</code></em> CHARACTER SET <em class="replaceable"><code>charset_name</code></em>)
</pre><p>
          Example:
        </p><pre class="programlisting">SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8);
</pre><p>
          If you use <a href="functions.html#function_cast"><code class="literal">CAST()</code></a> without
          specifying <code class="literal">CHARACTER SET</code>, the resulting
          character set and collation are defined by the
          <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> and
          <a href="server-administration.html#sysvar_collation_connection"><code class="literal">collation_connection</code></a> system
          variables. If you use <a href="functions.html#function_cast"><code class="literal">CAST()</code></a>
          with <code class="literal">CHARACTER SET X</code>, the resulting
          character set and collation are <code class="literal">X</code> and the
          default collation of <code class="literal">X</code>.
        </p><p>
          You may not use a <code class="literal">COLLATE</code> clause inside a
          <a href="functions.html#function_cast"><code class="literal">CAST()</code></a>, but you may use it
          outside. That is, <a href="functions.html#function_cast"><code class="literal">CAST(... COLLATE
          ...)</code></a> is illegal, but <a href="functions.html#function_cast"><code class="literal">CAST(...)
          COLLATE ...</code></a> is legal.
        </p><p>
          Example:
        </p><pre class="programlisting">SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin;
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-show"></a>9.1.9.3. <code class="literal">SHOW</code> Statements and
          <code class="literal">INFORMATION_SCHEMA</code></h4></div></div></div><p>
          Several <a href="sql-syntax.html#show" title="12.5.5. SHOW Syntax"><code class="literal">SHOW</code></a> statements provide
          additional character set information. These include
          <a href="sql-syntax.html#show-character-set" title="12.5.5.4. SHOW CHARACTER SET Syntax"><code class="literal">SHOW CHARACTER SET</code></a>,
          <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW COLLATION</code></a>,
          <a href="sql-syntax.html#show-create-database" title="12.5.5.8. SHOW CREATE DATABASE Syntax"><code class="literal">SHOW CREATE DATABASE</code></a>,
          <a href="sql-syntax.html#show-create-table" title="12.5.5.12. SHOW CREATE TABLE Syntax"><code class="literal">SHOW CREATE TABLE</code></a> and
          <a href="sql-syntax.html#show-columns" title="12.5.5.6. SHOW COLUMNS Syntax"><code class="literal">SHOW COLUMNS</code></a>. These statements
          are described here briefly. For more information, see
          <a href="sql-syntax.html#show" title="12.5.5. SHOW Syntax">Section 12.5.5, “<code class="literal">SHOW</code> Syntax”</a>.
        </p><p>
          <code class="literal">INFORMATION_SCHEMA</code> has several tables that
          contain information similar to that displayed by the
          <a href="sql-syntax.html#show" title="12.5.5. SHOW Syntax"><code class="literal">SHOW</code></a> statements. For example,
          the <a href="information-schema.html#character-sets-table" title="20.9. The INFORMATION_SCHEMA CHARACTER_SETS Table"><code class="literal">CHARACTER_SETS</code></a> and
          <a href="information-schema.html#collations-table" title="20.10. The INFORMATION_SCHEMA COLLATIONS Table"><code class="literal">COLLATIONS</code></a> tables contain the
          information displayed by <a href="sql-syntax.html#show-character-set" title="12.5.5.4. SHOW CHARACTER SET Syntax"><code class="literal">SHOW CHARACTER
          SET</code></a> and <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW
          COLLATION</code></a>. See <a href="information-schema.html" title="Chapter 20. INFORMATION_SCHEMA Tables">Chapter 20, <i><code class="literal">INFORMATION_SCHEMA</code> Tables</i></a>.
        </p><p>
          The <a href="sql-syntax.html#show-character-set" title="12.5.5.4. SHOW CHARACTER SET Syntax"><code class="literal">SHOW CHARACTER SET</code></a> command
          shows all available character sets. It takes an optional
          <a href="functions.html#operator_like"><code class="literal">LIKE</code></a> clause that indicates which
          character set names to match. For example:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW CHARACTER SET LIKE 'latin%';</code></strong>
+---------+-----------------------------+-------------------+--------+
| Charset | Description                 | Default collation | Maxlen |
+---------+-----------------------------+-------------------+--------+
| latin1  | cp1252 West European        | latin1_swedish_ci |      1 |
| latin2  | ISO 8859-2 Central European | latin2_general_ci |      1 |
| latin5  | ISO 8859-9 Turkish          | latin5_turkish_ci |      1 |
| latin7  | ISO 8859-13 Baltic          | latin7_general_ci |      1 |
+---------+-----------------------------+-------------------+--------+
</pre><p>
          The output from <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW COLLATION</code></a>
          includes all available character sets. It takes an optional
          <a href="functions.html#operator_like"><code class="literal">LIKE</code></a> clause that indicates which
          collation names to match. For example:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW COLLATION LIKE 'latin1%';</code></strong>
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         |          |       0 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       0 |
| latin1_danish_ci  | latin1  | 15 |         |          |       0 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       0 |
| latin1_general_ci | latin1  | 48 |         |          |       0 |
| latin1_general_cs | latin1  | 49 |         |          |       0 |
| latin1_spanish_ci | latin1  | 94 |         |          |       0 |
+-------------------+---------+----+---------+----------+---------+
</pre><p>
          <a href="sql-syntax.html#show-create-database" title="12.5.5.8. SHOW CREATE DATABASE Syntax"><code class="literal">SHOW CREATE DATABASE</code></a> displays
          the <a href="sql-syntax.html#create-database" title="12.1.10. CREATE DATABASE Syntax"><code class="literal">CREATE DATABASE</code></a> statement
          that creates a given database:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW CREATE DATABASE test;</code></strong>
+----------+-----------------------------------------------------------------+
| Database | Create Database                                                 |
+----------+-----------------------------------------------------------------+
| test     | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+
</pre><p>
          If no <code class="literal">COLLATE</code> clause is shown, the default
          collation for the character set applies.
        </p><p>
          <a href="sql-syntax.html#show-create-table" title="12.5.5.12. SHOW CREATE TABLE Syntax"><code class="literal">SHOW CREATE TABLE</code></a> is similar,
          but displays the <a href="sql-syntax.html#create-table" title="12.1.17. CREATE TABLE Syntax"><code class="literal">CREATE TABLE</code></a>
          statement to create a given table. The column definitions
          indicate any character set specifications, and the table
          options include character set information.
        </p><p>
          The <a href="sql-syntax.html#show-columns" title="12.5.5.6. SHOW COLUMNS Syntax"><code class="literal">SHOW COLUMNS</code></a> statement
          displays the collations of a table's columns when invoked as
          <a href="sql-syntax.html#show-columns" title="12.5.5.6. SHOW COLUMNS Syntax"><code class="literal">SHOW FULL
          COLUMNS</code></a>. Columns with
          <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">CHAR</code></a>,
          <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">VARCHAR</code></a>, or
          <a href="data-types.html#blob" title="10.4.3. The BLOB and
        TEXT Types"><code class="literal">TEXT</code></a> data types have
          collations. Numeric and other noncharacter types have no
          collation (indicated by <code class="literal">NULL</code> as the
          <code class="literal">Collation</code> value). For example:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW FULL COLUMNS FROM person\G</code></strong>
*************************** 1. row ***************************
     Field: id
      Type: smallint(5) unsigned
 Collation: NULL
      Null: NO
       Key: PRI
   Default: NULL
     Extra: auto_increment
Privileges: select,insert,update,references
   Comment:
*************************** 2. row ***************************
     Field: name
      Type: char(60)
 Collation: latin1_swedish_ci
      Null: NO
       Key:
   Default:
     Extra:
Privileges: select,insert,update,references
   Comment:

</pre><p>
          The character set is not part of the display but is implied by
          the collation name.
        </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-unicode"></a>9.1.10. Unicode Support</h3></div></div></div><p>
        MySQL 5.1 supports two character sets for storing
        Unicode data:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            <code class="literal">ucs2</code>, the UCS-2 encoding of the Unicode
            character set using 16 bits per character
          </p></li><li><p>
            <code class="literal">utf8</code>, a UTF-8 encoding of the Unicode
            character set using one to three bytes per character
          </p></li></ul></div><p>
        These two character sets support the characters from the Basic
        Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters
        have these characteristics:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            Their code values are between 0 and 65535 (or
            <code class="literal">U+0000</code> .. <code class="literal">U+FFFF</code>)
          </p></li><li><p>
            They can be encoded with a fixed 16-bit word, as in
            <code class="literal">ucs2</code>
          </p></li><li><p>
            They can be encoded with 8, 16, or 24 bits, as in
            <code class="literal">utf8</code>
          </p></li><li><p>
            They are sufficient for almost all characters in major
            languages
          </p></li></ul></div><p>
        The <code class="literal">ucs2</code> and <code class="literal">utf8</code>
        character sets do not support supplementary characters that lie
        outside the BMP.
      </p><p>
        A similar set of collations is available for each Unicode
        character set. For example, each has a Danish collation, the
        names of which are <code class="literal">ucs2_danish_ci</code> and
        <code class="literal">utf8_danish_ci</code>. All Unicode collations are
        listed at <a href="internationalization-localization.html#charset-unicode-sets" title="9.1.13.1. Unicode Character Sets">Section 9.1.13.1, “Unicode Character Sets”</a>.
      </p><p>
        In UCS-2, every character is represented by a two-byte Unicode
        code with the most significant byte first. For example:
        <code class="literal">LATIN CAPITAL LETTER A</code> has the code
        <code class="literal">0x0041</code> and it is stored as a two-byte
        sequence: <code class="literal">0x00 0x41</code>. <code class="literal">CYRILLIC SMALL
        LETTER YERU</code> (Unicode <code class="literal">0x044B</code>) is
        stored as a two-byte sequence: <code class="literal">0x04 0x4B</code>. For
        Unicode characters and their codes, please refer to the
        <a href="http://www.unicode.org/" target="_top">Unicode Home Page</a>.
      </p><p>
        The MySQL implementation of UCS-2 stores characters in
        big-endian byte order and does not use a byte order mark (BOM)
        at the beginning of UCS-2 values. Other database systems might
        use little-endian byte order or a BOM, in which case, conversion
        of UCS-2 values will need to be performed when transferring data
        between those systems and MySQL.
      </p><p>
        UTF-8 (Unicode Transformation Format with 8-bit units) is an
        alternative way to store Unicode data. It is implemented
        according to RFC 3629. RFC 3629 describes encoding sequences
        that take from one to four bytes. Currently, MySQL support for
        UTF-8 does not include four-byte sequences. (An older standard
        for UTF-8 encoding is given by RFC 2279, which describes UTF-8
        sequences that take from one to six bytes. RFC 3629 renders RFC
        2279 obsolete; for this reason, sequences with five and six
        bytes are no longer used.)
      </p><p>
        The idea of UTF-8 is that various Unicode characters are encoded
        using byte sequences of different lengths:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            Basic Latin letters, digits, and punctuation signs use one
            byte.
          </p></li><li><p>
            Most European and Middle East script letters fit into a
            two-byte sequence: extended Latin letters (with tilde,
            macron, acute, grave and other accents), Cyrillic, Greek,
            Armenian, Hebrew, Arabic, Syriac, and others.
          </p></li><li><p>
            Korean, Chinese, and Japanese ideographs use three-byte
            sequences.
          </p></li></ul></div><p>
        MySQL uses no BOM for UTF-8 values.
      </p><p>
        <span class="bold"><strong>Tip</strong></span>: To save space with UTF-8,
        use <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">VARCHAR</code></a> instead of
        <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">CHAR</code></a>. Otherwise, MySQL must
        reserve three bytes for each character in a <code class="literal">CHAR
        CHARACTER SET utf8</code> column because that is the maximum
        possible length. For example, MySQL must reserve 30 bytes for a
        <code class="literal">CHAR(10) CHARACTER SET utf8</code> column.
      </p><p>
        UCS-2 cannot be used as a client character set, which means that
        <code class="literal">SET NAMES 'ucs2'</code> does not work. (See
        <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.)
      </p><p>
        Client applications that need to communicate with the server
        using Unicode should set the client character set accordingly;
        for example, by issuing a <code class="literal">SET NAMES 'utf8'</code>
        statement. <code class="literal">ucs2</code> cannot be used as a client
        character set, which means that it does not work for
        <code class="literal">SET NAMES</code> or <code class="literal">SET CHARACTER
        SET</code>. (See <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.)
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-metadata"></a>9.1.11. UTF-8 for Metadata</h3></div></div></div><p>
        <em class="firstterm">Metadata</em> is “<span class="quote">the data about the
        data.</span>” Anything that <span class="emphasis"><em>describes</em></span> the
        database — as opposed to being the
        <span class="emphasis"><em>contents</em></span> of the database — is
        metadata. Thus column names, database names, user names, version
        names, and most of the string results from
        <a href="sql-syntax.html#show" title="12.5.5. SHOW Syntax"><code class="literal">SHOW</code></a> are metadata. This is also
        true of the contents of tables in
        <code class="literal">INFORMATION_SCHEMA</code>, because those tables by
        definition contain information about database objects.
      </p><p>
        Representation of metadata must satisfy these requirements:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            All metadata must be in the same character set. Otherwise,
            neither the <a href="sql-syntax.html#show" title="12.5.5. SHOW Syntax"><code class="literal">SHOW</code></a> commands nor
            <a href="sql-syntax.html#select" title="12.2.8. SELECT Syntax"><code class="literal">SELECT</code></a> statements for tables
            in <code class="literal">INFORMATION_SCHEMA</code> would work properly
            because different rows in the same column of the results of
            these operations would be in different character sets.
          </p></li><li><p>
            Metadata must include all characters in all languages.
            Otherwise, users would not be able to name columns and
            tables using their own languages.
          </p></li></ul></div><p>
        To satisfy both requirements, MySQL stores metadata in a Unicode
        character set, namely UTF-8. This does not cause any disruption
        if you never use accented or non-Latin characters. But if you
        do, you should be aware that metadata is in UTF-8.
      </p><p>
        The metadata requirements mean that the return values of the
        <a href="functions.html#function_user"><code class="literal">USER()</code></a>,
        <a href="functions.html#function_current-user"><code class="literal">CURRENT_USER()</code></a>,
        <a href="functions.html#function_session-user"><code class="literal">SESSION_USER()</code></a>,
        <a href="functions.html#function_system-user"><code class="literal">SYSTEM_USER()</code></a>,
        <a href="functions.html#function_database"><code class="literal">DATABASE()</code></a>, and
        <a href="functions.html#function_version"><code class="literal">VERSION()</code></a> functions have the
        UTF-8 character set by default.
      </p><p>
        The server sets the
        <a href="server-administration.html#sysvar_character_set_system"><code class="literal">character_set_system</code></a> system
        variable to the name of the metadata character set:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW VARIABLES LIKE 'character_set_system';</code></strong>
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| character_set_system | utf8  |
+----------------------+-------+
</pre><p>
        Storage of metadata using Unicode does <span class="emphasis"><em>not</em></span>
        mean that the server returns headers of columns and the results
        of <a href="sql-syntax.html#describe" title="12.3.1. DESCRIBE Syntax"><code class="literal">DESCRIBE</code></a> functions in the
        <a href="server-administration.html#sysvar_character_set_system"><code class="literal">character_set_system</code></a> character
        set by default. When you use <code class="literal">SELECT column1 FROM
        t</code>, the name <code class="literal">column1</code> itself is
        returned from the server to the client in the character set
        determined by the value of the
        <a href="server-administration.html#sysvar_character_set_results"><code class="literal">character_set_results</code></a> system
        variable, which has a default value of
        <code class="literal">latin1</code>. If you want the server to pass
        metadata results back in a different character set, use the
        <code class="literal">SET NAMES</code> statement to force the server to
        perform character set conversion. <code class="literal">SET NAMES</code>
        sets the <a href="server-administration.html#sysvar_character_set_results"><code class="literal">character_set_results</code></a>
        and other related system variables. (See
        <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.) Alternatively, a client
        program can perform the conversion after receiving the result
        from the server. It is more efficient for the client perform the
        conversion, but this option is not always available for all
        clients.
      </p><p>
        If <a href="server-administration.html#sysvar_character_set_results"><code class="literal">character_set_results</code></a> is set
        to <code class="literal">NULL</code>, no conversion is performed and the
        server returns metadata using its original character set (the
        set indicated by
        <a href="server-administration.html#sysvar_character_set_system"><code class="literal">character_set_system</code></a>).
      </p><p>
        Error messages returned from the server to the client are
        converted to the client character set automatically, as with
        metadata.
      </p><p>
        If you are using (for example) the
        <a href="functions.html#function_user"><code class="literal">USER()</code></a> function for comparison or
        assignment within a single statement, don't worry. MySQL
        performs some automatic conversion for you.
      </p><pre class="programlisting">SELECT * FROM t1 WHERE USER() = latin1_column;
</pre><p>
        This works because the contents of
        <code class="literal">latin1_column</code> are automatically converted to
        UTF-8 before the comparison.
      </p><pre class="programlisting">INSERT INTO t1 (latin1_column) SELECT USER();
</pre><p>
        This works because the contents of
        <a href="functions.html#function_user"><code class="literal">USER()</code></a> are automatically
        converted to <code class="literal">latin1</code> before the assignment.
      </p><p>
        Although automatic conversion is not in the SQL standard, the
        SQL standard document does say that every character set is (in
        terms of supported characters) a “<span class="quote">subset</span>” of
        Unicode. Because it is a well-known principle that “<span class="quote">what
        applies to a superset can apply to a subset,</span>” we believe
        that a collation for Unicode can apply for comparisons with
        non-Unicode strings. For more information about coercion of
        strings, see <a href="internationalization-localization.html#charset-collate-tricky" title="9.1.7.5. Special Cases Where Collation Determination Is Tricky">Section 9.1.7.5, “Special Cases Where Collation Determination Is Tricky”</a>.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-conversion"></a>9.1.12. Column Character Set Conversion</h3></div></div></div><p>
        To convert a binary or nonbinary string column to use a
        particular character set, use <a href="sql-syntax.html#alter-table" title="12.1.7. ALTER TABLE Syntax"><code class="literal">ALTER
        TABLE</code></a>. For successful conversion to occur, one of the
        following conditions must apply:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            If the column has a binary data type
            (<a href="data-types.html#binary-varbinary" title="10.4.2. The BINARY and
        VARBINARY Types"><code class="literal">BINARY</code></a>,
            <a href="data-types.html#binary-varbinary" title="10.4.2. The BINARY and
        VARBINARY Types"><code class="literal">VARBINARY</code></a>,
            <a href="data-types.html#blob" title="10.4.3. The BLOB and
        TEXT Types"><code class="literal">BLOB</code></a>), all the values that it
            contains must be encoded using a single character set (the
            character set you're converting the column to). If you use a
            binary column to store information in multiple character
            sets, MySQL has no way to know which values use which
            character set and cannot convert the data properly.
          </p></li><li><p>
            If the column has a nonbinary data type
            (<a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">CHAR</code></a>,
            <a href="data-types.html#char" title="10.4.1. The CHAR and
        VARCHAR Types"><code class="literal">VARCHAR</code></a>,
            <a href="data-types.html#blob" title="10.4.3. The BLOB and
        TEXT Types"><code class="literal">TEXT</code></a>), its contents should be
            encoded in the column's character set, not some other
            character set. If the contents are encoded in a different
            character set, you can convert the column to use a binary
            data type first, and then to a nonbinary column with the
            desired character set.
          </p></li></ul></div><p>
        Suppose that a table <code class="literal">t</code> has a binary column
        named <code class="literal">col1</code> defined as
        <code class="literal">VARBINARY(50)</code>. Assuming that the information
        in the column is encoded using a single character set, you can
        convert it to a nonbinary column that has that character set.
        For example, if <code class="literal">col1</code> contains binary data
        representing characters in the <code class="literal">greek</code>
        character set, you can convert it as follows:
      </p><pre class="programlisting">ALTER TABLE t MODIFY col1 VARCHAR(50) CHARACTER SET greek;
</pre><p>
        If your original column has a type of
        <code class="literal">BINARY(50)</code>, you could convert it to
        <code class="literal">CHAR(50)</code>, but the resulting values will be
        padded with <code class="literal">0x00</code> bytes at the end, which may
        be undesirable. To remove these bytes, use the
        <a href="functions.html#function_trim"><code class="literal">TRIM()</code></a> function:
      </p><pre class="programlisting">UPDATE t SET col1 = TRIM(TRAILING 0x00 FROM col1);
</pre><p>
        Suppose that table <code class="literal">t</code> has a nonbinary column
        named <code class="literal">col1</code> defined as <code class="literal">CHAR(50)
        CHARACTER SET latin1</code> but you want to convert it to use
        <code class="literal">utf8</code> so that you can store values from many
        languages. The following statement accomplishes this:
      </p><pre class="programlisting">ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
</pre><p>
        Conversion may be lossy if the column contains characters that
        are not in both character sets.
      </p><p>
        A special case occurs if you have old tables from MySQL 4.0 or
        earlier where a nonbinary column contains values that actually
        are encoded in a character set different from the server's
        default character set. For example, an application might have
        stored <code class="literal">sjis</code> values in a column, even though
        MySQL's default character set was <code class="literal">latin1</code>. It
        is possible to convert the column to use the proper character
        set but an additional step is required. Suppose that the
        server's default character set was <code class="literal">latin1</code> and
        <code class="literal">col1</code> is defined as
        <code class="literal">CHAR(50)</code> but its contents are
        <code class="literal">sjis</code> values. The first step is to convert the
        column to a binary data type, which removes the existing
        character set information without performing any character
        conversion:
      </p><pre class="programlisting">ALTER TABLE t MODIFY col1 BLOB;
</pre><p>
        The next step is to convert the column to a nonbinary data type
        with the proper character set:
      </p><pre class="programlisting">ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET sjis;
</pre><p>
        This procedure requires that the table not have been modified
        already with statements such as
        <a href="sql-syntax.html#insert" title="12.2.5. INSERT Syntax"><code class="literal">INSERT</code></a> or
        <a href="sql-syntax.html#update" title="12.2.11. UPDATE Syntax"><code class="literal">UPDATE</code></a> after an upgrade to MySQL
        4.1 or later. In that case, MySQL would store new values in the
        column using <code class="literal">latin1</code>, and the column will
        contain a mix of <code class="literal">sjis</code> and
        <code class="literal">latin1</code> values and cannot be converted
        properly.
      </p><p>
        If you specified attributes when creating a column initially,
        you should also specify them when altering the table with
        <a href="sql-syntax.html#alter-table" title="12.1.7. ALTER TABLE Syntax"><code class="literal">ALTER TABLE</code></a>. For example, if you
        specified <code class="literal">NOT NULL</code> and an explicit
        <code class="literal">DEFAULT</code> value, you should also provide them
        in the <a href="sql-syntax.html#alter-table" title="12.1.7. ALTER TABLE Syntax"><code class="literal">ALTER TABLE</code></a> statement.
        Otherwise, the resulting column definition will not include
        those attributes.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-charsets"></a>9.1.13. Character Sets and Collations That MySQL Supports</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-unicode-sets">9.1.13.1. Unicode Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-we-sets">9.1.13.2. West European Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-ce-sets">9.1.13.3. Central European Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-se-me-sets">9.1.13.4. South European and Middle East Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-baltic-sets">9.1.13.5. Baltic Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-cyrillic-sets">9.1.13.6. Cyrillic Character Sets</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#charset-asian-sets">9.1.13.7. Asian Character Sets</a></span></dt></dl></div><p>
        MySQL supports 70+ collations for 30+ character sets. This
        section indicates which character sets MySQL supports. There is
        one subsection for each group of related character sets. For
        each character set, the allowable collations are listed.
      </p><p>
        You can always list the available character sets and their
        default collations with the <a href="sql-syntax.html#show-character-set" title="12.5.5.4. SHOW CHARACTER SET Syntax"><code class="literal">SHOW CHARACTER
        SET</code></a> statement:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW CHARACTER SET;</code></strong>
+----------+-----------------------------+---------------------+
| Charset  | Description                 | Default collation   |
+----------+-----------------------------+---------------------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |
| dec8     | DEC West European           | dec8_swedish_ci     |
| cp850    | DOS West European           | cp850_general_ci    |
| hp8      | HP West European            | hp8_english_ci      |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |
| latin1   | cp1252 West European        | latin1_swedish_ci   |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |
| ascii    | US ASCII                    | ascii_general_ci    |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |
| cp1250   | Windows Central European    | cp1250_general_ci   |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |
| cp866    | DOS Russian                 | cp866_general_ci    |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |
| macce    | Mac Central European        | macce_general_ci    |
| macroman | Mac West European           | macroman_general_ci |
| cp852    | DOS Central European        | cp852_general_ci    |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |
| cp1256   | Windows Arabic              | cp1256_general_ci   |
| cp1257   | Windows Baltic              | cp1257_general_ci   |
| binary   | Binary pseudo charset       | binary              |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |
+----------+-----------------------------+---------------------+
</pre><p>
        In cases where a character set has multiple collations, it might
        not be clear which collation is most suitable for a given
        application. To avoid choosing the wrong collation, it can be
        helpful to perform some comparisons with representative data
        values to make sure that a given collation sorts values the way
        you expect.
      </p><p>
        <a href="http://www.collation-charts.org/" target="_top">Collation-Charts.Org</a>
        is a useful site for information that shows how one collation
        compares to another.
      </p><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-unicode-sets"></a>9.1.13.1. Unicode Character Sets</h4></div></div></div><p>
          MySQL 5.1 has two Unicode character sets:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal">ucs2</code>, the UCS-2 encoding of the Unicode
              character set using 16 bits per character
            </p></li><li><p>
              <code class="literal">utf8</code>, a UTF-8 encoding of the Unicode
              character set using one to three bytes per character
            </p></li></ul></div><p>
          You can store text in about 650 languages using these
          character sets. This section lists the collations available
          for each Unicode character set. For general information about
          the character sets, see <a href="internationalization-localization.html#charset-unicode" title="9.1.10. Unicode Support">Section 9.1.10, “Unicode Support”</a>.
        </p><p>
          A similar set of collations is available for each Unicode
          character set. These are shown in the following list, where
          <em class="replaceable"><code>xxx</code></em> represents the character set
          name. For example,
          <code class="literal"><em class="replaceable"><code>xxx</code></em>_danish_ci</code>
          represents the Danish collations, the specific names of which
          are <code class="literal">ucs2_danish_ci</code> and
          <code class="literal">utf8_danish_ci</code>.
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_bin</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_czech_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_danish_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_esperanto_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_estonian_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_general_ci</code>
              (default)
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_hungarian_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_icelandic_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_latvian_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_lithuanian_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_persian_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_polish_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_roman_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_romanian_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_slovak_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_slovenian_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_spanish2_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_spanish_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_swedish_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_turkish_ci</code>
            </p></li><li><p>
              <code class="literal"><em class="replaceable"><code>xxx</code></em>_unicode_ci</code>
            </p></li></ul></div><p>
          The
          <code class="literal"><em class="replaceable"><code>xxx</code></em>_hungarian_ci</code>
          collations were added in MySQL 5.1.5.
        </p><a class="indexterm" name="id4317522"></a><p>
          MySQL implements the
          <code class="literal"><em class="replaceable"><code>xxx</code></em>_unicode_ci</code>
          collations according to the Unicode Collation Algorithm (UCA)
          described at
          <a href="http://www.unicode.org/reports/tr10/" target="_top">http://www.unicode.org/reports/tr10/</a>. The
          collation uses the version-4.0.0 UCA weight keys:
          <a href="http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt" target="_top">http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt</a>.
          Currently, the
          <code class="literal"><em class="replaceable"><code>xxx</code></em>_unicode_ci</code>
          collations have only partial support for the Unicode Collation
          Algorithm. Some characters are not supported yet. Also,
          combining marks are not fully supported. This affects
          primarily Vietnamese, Yoruba, and some smaller languages such
          as Navajo. The following discussion uses
          <code class="literal">utf8_unicode_ci</code> for concreteness.
        </p><p>
          For any Unicode character set, operations performed using the
          <code class="literal">_general_ci</code> collation are faster than those
          for the <code class="literal">_unicode_ci</code> collation. For example,
          comparisons for the <code class="literal">utf8_general_ci</code>
          collation are faster, but slightly less correct, than
          comparisons for <code class="literal">utf8_unicode_ci</code>. The reason
          for this is that <code class="literal">utf8_unicode_ci</code> supports
          mappings such as expansions; that is, when one character
          compares as equal to combinations of other characters. For
          example, in German and some other languages
          “<span class="quote"><code class="literal">ß</code></span>” is equal to
          “<span class="quote"><code class="literal">ss</code></span>”.
          <code class="literal">utf8_unicode_ci</code> also supports contractions
          and ignorable characters. <code class="literal">utf8_general_ci</code>
          is a legacy collation that does not support expansions,
          contractions, or ignorable characters. It can make only
          one-to-one comparisons between characters.
        </p><p>
          To further illustrate, the following equalities hold in both
          <code class="literal">utf8_general_ci</code> and
          <code class="literal">utf8_unicode_ci</code> (for the effect this has in
          comparisons or when doing searches, see
          <a href="internationalization-localization.html#charset-collation-effect" title="9.1.7.7. Examples of the Effect of Collation">Section 9.1.7.7, “Examples of the Effect of Collation”</a>):
        </p><pre class="programlisting">Ä = A
Ö = O
Ü = U
</pre><p>
          A difference between the collations is that this is true for
          <code class="literal">utf8_general_ci</code>:
        </p><pre class="programlisting">ß = s
</pre><p>
          Whereas this is true for <code class="literal">utf8_unicode_ci</code>:
        </p><pre class="programlisting">ß = ss
</pre><p>
          MySQL implements language-specific collations for the
          <code class="literal">utf8</code> character set only if the ordering
          with <code class="literal">utf8_unicode_ci</code> does not work well for
          a language. For example, <code class="literal">utf8_unicode_ci</code>
          works fine for German and French, so there is no need to
          create special <code class="literal">utf8</code> collations for these
          two languages.
        </p><p>
          <code class="literal">utf8_general_ci</code> also is satisfactory for
          both German and French, except that
          “<span class="quote"><code class="literal">ß</code></span>” is equal to
          “<span class="quote"><code class="literal">s</code></span>”, and not to
          “<span class="quote"><code class="literal">ss</code></span>”. If this is acceptable
          for your application, then you should use
          <code class="literal">utf8_general_ci</code> because it is faster.
          Otherwise, use <code class="literal">utf8_unicode_ci</code> because it
          is more accurate.
        </p><p>
          <code class="literal">utf8_swedish_ci</code>, like other
          <code class="literal">utf8</code> language-specific collations, is
          derived from <code class="literal">utf8_unicode_ci</code> with
          additional language rules. For example, in Swedish, the
          following relationship holds, which is not something expected
          by a German or French speaker:
        </p><pre class="programlisting">Ü = Y &lt; Ö
</pre><p>
          The
          <code class="literal"><em class="replaceable"><code>xxx</code></em>_spanish_ci</code>
          and
          <code class="literal"><em class="replaceable"><code>xxx</code></em>_spanish2_ci</code>
          collations correspond to modern Spanish and traditional
          Spanish, respectively. In both collations,
          “<span class="quote"><code class="literal">ñ</code></span>” (n-tilde) is a separate
          letter between “<span class="quote"><code class="literal">n</code></span>” and
          “<span class="quote"><code class="literal">o</code></span>”. In addition, for
          traditional Spanish, “<span class="quote"><code class="literal">ch</code></span>” is a
          separate letter between “<span class="quote"><code class="literal">c</code></span>”
          and “<span class="quote"><code class="literal">d</code></span>”, and
          “<span class="quote"><code class="literal">ll</code></span>” is a separate letter
          between “<span class="quote"><code class="literal">l</code></span>” and
          “<span class="quote"><code class="literal">m</code></span>”
        </p><p>
          In the
          <code class="literal"><em class="replaceable"><code>xxx</code></em>_roman_ci</code>
          collations, <code class="literal">I</code> and <code class="literal">J</code>
          compare as equal, and <code class="literal">U</code> and
          <code class="literal">V</code> compare as equal.
        </p><p>
          For additional information about Unicode collations in MySQL,
          see Collation-Charts.Org
          (<a href="http://www.collation-charts.org/mysql60/by-charset.html#utf8" target="_top">utf8</a>).
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-we-sets"></a>9.1.13.2. West European Character Sets</h4></div></div></div><p>
          Western European character sets cover most West European
          languages, such as French, Spanish, Catalan, Basque,
          Portuguese, Italian, Albanian, Dutch, German, Danish, Swedish,
          Norwegian, Finnish, Faroese, Icelandic, Irish, Scottish, and
          English.
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal">ascii</code> (US ASCII) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">ascii_bin</code>
                </p></li><li><p>
                  <code class="literal">ascii_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">cp850</code> (DOS West European) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp850_bin</code>
                </p></li><li><p>
                  <code class="literal">cp850_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">dec8</code> (DEC Western European) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">dec8_bin</code>
                </p></li><li><p>
                  <code class="literal">dec8_swedish_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">hp8</code> (HP Western European) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">hp8_bin</code>
                </p></li><li><p>
                  <code class="literal">hp8_english_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">latin1</code> (cp1252 West European)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">latin1_bin</code>
                </p></li><li><p>
                  <code class="literal">latin1_danish_ci</code>
                </p></li><li><p>
                  <code class="literal">latin1_general_ci</code>
                </p></li><li><p>
                  <code class="literal">latin1_general_cs</code>
                </p></li><li><p>
                  <code class="literal">latin1_german1_ci</code>
                </p></li><li><p>
                  <code class="literal">latin1_german2_ci</code>
                </p></li><li><p>
                  <code class="literal">latin1_spanish_ci</code>
                </p></li><li><p>
                  <code class="literal">latin1_swedish_ci</code> (default)
                </p></li></ul></div><p>
              <code class="literal">latin1</code> is the default character set.
              MySQL's <code class="literal">latin1</code> is the same as the
              Windows <code class="literal">cp1252</code> character set. This
              means it is the same as the official <code class="literal">ISO
              8859-1</code> or IANA (Internet Assigned Numbers
              Authority) <code class="literal">latin1</code>, except that IANA
              <code class="literal">latin1</code> treats the code points between
              <code class="literal">0x80</code> and <code class="literal">0x9f</code> as
              “<span class="quote">undefined,</span>” whereas
              <code class="literal">cp1252</code>, and therefore MySQL's
              <code class="literal">latin1</code>, assign characters for those
              positions. For example, <code class="literal">0x80</code> is the
              Euro sign. For the “<span class="quote">undefined</span>” entries in
              <code class="literal">cp1252</code>, MySQL translates
              <code class="literal">0x81</code> to Unicode
              <code class="literal">0x0081</code>, <code class="literal">0x8d</code> to
              <code class="literal">0x008d</code>, <code class="literal">0x8f</code> to
              <code class="literal">0x008f</code>, <code class="literal">0x90</code> to
              <code class="literal">0x0090</code>, and <code class="literal">0x9d</code> to
              <code class="literal">0x009d</code>.
            </p><p>
              The <code class="literal">latin1_swedish_ci</code> collation is the
              default that probably is used by the majority of MySQL
              customers. Although it is frequently said that it is based
              on the Swedish/Finnish collation rules, there are Swedes
              and Finns who disagree with this statement.
            </p><p>
              The <code class="literal">latin1_german1_ci</code> and
              <code class="literal">latin1_german2_ci</code> collations are based
              on the DIN-1 and DIN-2 standards, where DIN stands for
              <span class="foreignphrase"><em class="foreignphrase">Deutsches Institut für
              Normung</em></span> (the German equivalent of ANSI).
              DIN-1 is called the “<span class="quote">dictionary collation</span>”
              and DIN-2 is called the “<span class="quote">phone book
              collation.</span>” For an example of the effect this has
              in comparisons or when doing searches, see
              <a href="internationalization-localization.html#charset-collation-effect" title="9.1.7.7. Examples of the Effect of Collation">Section 9.1.7.7, “Examples of the Effect of Collation”</a>.
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">latin1_german1_ci</code> (dictionary)
                  rules:
                </p><pre class="programlisting">Ä = A
Ö = O
Ü = U
ß = s
</pre></li><li><p>
                  <code class="literal">latin1_german2_ci</code> (phone-book)
                  rules:
                </p><pre class="programlisting">Ä = AE
Ö = OE
Ü = UE
ß = ss
</pre></li></ul></div><p>
              For an example of the effect this has in comparisons or
              when doing searches, see
              <a href="internationalization-localization.html#charset-collation-effect" title="9.1.7.7. Examples of the Effect of Collation">Section 9.1.7.7, “Examples of the Effect of Collation”</a>.
            </p><p>
              In the <code class="literal">latin1_spanish_ci</code> collation,
              “<span class="quote"><code class="literal">ñ</code></span>” (n-tilde) is a
              separate letter between
              “<span class="quote"><code class="literal">n</code></span>” and
              “<span class="quote"><code class="literal">o</code></span>”.
            </p></li><li><p>
              <code class="literal">macroman</code> (Mac West European)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">macroman_bin</code>
                </p></li><li><p>
                  <code class="literal">macroman_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">swe7</code> (7bit Swedish) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">swe7_bin</code>
                </p></li><li><p>
                  <code class="literal">swe7_swedish_ci</code> (default)
                </p></li></ul></div></li></ul></div><p>
          For additional information about Western European collations
          in MySQL, see Collation-Charts.Org
          (<a href="http://www.collation-charts.org/mysql60/by-charset.html#ascii" target="_top">ascii</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#cp850" target="_top">cp850</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#dec8" target="_top">dec8</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#hp8" target="_top">hp8</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#latin1" target="_top">latin1</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#macroman" target="_top">macroman</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#swe7" target="_top">swe7</a>).
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-ce-sets"></a>9.1.13.3. Central European Character Sets</h4></div></div></div><p>
          MySQL provides some support for character sets used in the
          Czech Republic, Slovakia, Hungary, Romania, Slovenia, Croatia,
          Poland, and Serbia (Latin).
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal">cp1250</code> (Windows Central European)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp1250_bin</code>
                </p></li><li><p>
                  <code class="literal">cp1250_croatian_ci</code>
                </p></li><li><p>
                  <code class="literal">cp1250_czech_cs</code>
                </p></li><li><p>
                  <code class="literal">cp1250_general_ci</code> (default)
                </p></li><li><p>
                  <code class="literal">cp1250_polish_ci</code>
                </p></li></ul></div></li><li><p>
              <code class="literal">cp852</code> (DOS Central European)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp852_bin</code>
                </p></li><li><p>
                  <code class="literal">cp852_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">keybcs2</code> (DOS Kamenicky Czech-Slovak)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">keybcs2_bin</code>
                </p></li><li><p>
                  <code class="literal">keybcs2_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">latin2</code> (ISO 8859-2 Central European)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">latin2_bin</code>
                </p></li><li><p>
                  <code class="literal">latin2_croatian_ci</code>
                </p></li><li><p>
                  <code class="literal">latin2_czech_cs</code>
                </p></li><li><p>
                  <code class="literal">latin2_general_ci</code> (default)
                </p></li><li><p>
                  <code class="literal">latin2_hungarian_ci</code>
                </p></li></ul></div></li><li><p>
              <code class="literal">macce</code> (Mac Central European)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">macce_bin</code>
                </p></li><li><p>
                  <code class="literal">macce_general_ci</code> (default)
                </p></li></ul></div></li></ul></div><p>
          For additional information about Central European collations
          in MySQL, see Collation-Charts.Org
          (<a href="http://www.collation-charts.org/mysql60/by-charset.html#cp1250" target="_top">cp1250</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#cp852" target="_top">cp852</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#keybcs2" target="_top">keybcs2</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#latin2" target="_top">latin2</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#macce" target="_top">macce</a>).
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-se-me-sets"></a>9.1.13.4. South European and Middle East Character Sets</h4></div></div></div><p>
          South European and Middle Eastern character sets supported by
          MySQL include Armenian, Arabic, Georgian, Greek, Hebrew, and
          Turkish.
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal">armscii8</code> (ARMSCII-8 Armenian)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">armscii8_bin</code>
                </p></li><li><p>
                  <code class="literal">armscii8_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">cp1256</code> (Windows Arabic) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp1256_bin</code>
                </p></li><li><p>
                  <code class="literal">cp1256_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">geostd8</code> (GEOSTD8 Georgian) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">geostd8_bin</code>
                </p></li><li><p>
                  <code class="literal">geostd8_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">greek</code> (ISO 8859-7 Greek) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">greek_bin</code>
                </p></li><li><p>
                  <code class="literal">greek_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">hebrew</code> (ISO 8859-8 Hebrew) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">hebrew_bin</code>
                </p></li><li><p>
                  <code class="literal">hebrew_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">latin5</code> (ISO 8859-9 Turkish) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">latin5_bin</code>
                </p></li><li><p>
                  <code class="literal">latin5_turkish_ci</code> (default)
                </p></li></ul></div></li></ul></div><p>
          For additional information about South European and Middle
          Eastern collations in MySQL, see Collation-Charts.Org
          (<a href="http://www.collation-charts.org/mysql60/by-charset.html#armscii8" target="_top">armscii8</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#cp1256" target="_top">cp1256</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#geostd8" target="_top">geostd8</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#greek" target="_top">greek</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#hebrew" target="_top">hebrew</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#latin5" target="_top">latin5</a>).
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-baltic-sets"></a>9.1.13.5. Baltic Character Sets</h4></div></div></div><p>
          The Baltic character sets cover Estonian, Latvian, and
          Lithuanian languages.
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal">cp1257</code> (Windows Baltic) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp1257_bin</code>
                </p></li><li><p>
                  <code class="literal">cp1257_general_ci</code> (default)
                </p></li><li><p>
                  <code class="literal">cp1257_lithuanian_ci</code>
                </p></li></ul></div></li><li><p>
              <code class="literal">latin7</code> (ISO 8859-13 Baltic) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">latin7_bin</code>
                </p></li><li><p>
                  <code class="literal">latin7_estonian_cs</code>
                </p></li><li><p>
                  <code class="literal">latin7_general_ci</code> (default)
                </p></li><li><p>
                  <code class="literal">latin7_general_cs</code>
                </p></li></ul></div></li></ul></div><p>
          For additional information about Baltic collations in MySQL,
          see Collation-Charts.Org
          (<a href="http://www.collation-charts.org/mysql60/by-charset.html#cp1257" target="_top">cp1257</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#latin7" target="_top">latin7</a>).
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-cyrillic-sets"></a>9.1.13.6. Cyrillic Character Sets</h4></div></div></div><p>
          The Cyrillic character sets and collations are for use with
          Belarusian, Bulgarian, Russian, Ukrainian, and Serbian
          (Cyrillic) languages.
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal">cp1251</code> (Windows Cyrillic) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp1251_bin</code>
                </p></li><li><p>
                  <code class="literal">cp1251_bulgarian_ci</code>
                </p></li><li><p>
                  <code class="literal">cp1251_general_ci</code> (default)
                </p></li><li><p>
                  <code class="literal">cp1251_general_cs</code>
                </p></li><li><p>
                  <code class="literal">cp1251_ukrainian_ci</code>
                </p></li></ul></div></li><li><p>
              <code class="literal">cp866</code> (DOS Russian) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp866_bin</code>
                </p></li><li><p>
                  <code class="literal">cp866_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">koi8r</code> (KOI8-R Relcom Russian)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">koi8r_bin</code>
                </p></li><li><p>
                  <code class="literal">koi8r_general_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">koi8u</code> (KOI8-U Ukrainian) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">koi8u_bin</code>
                </p></li><li><p>
                  <code class="literal">koi8u_general_ci</code> (default)
                </p></li></ul></div></li></ul></div><p>
          For additional information about Cyrillic collations in MySQL,
          see Collation-Charts.Org
          (<a href="http://www.collation-charts.org/mysql60/by-charset.html#cp1251" target="_top">cp1251</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#cp866" target="_top">cp866</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#koi8r" target="_top">koi8r</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#koi8u" target="_top">koi8u</a>).
          ).
        </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="charset-asian-sets"></a>9.1.13.7. Asian Character Sets</h4></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-cp932">9.1.13.7.1. The <code class="literal">cp932</code> Character Set</a></span></dt></dl></div><p>
          The Asian character sets that we support include Chinese,
          Japanese, Korean, and Thai. These can be complicated. For
          example, the Chinese sets must allow for thousands of
          different characters. See <a href="internationalization-localization.html#charset-cp932" title="9.1.13.7.1. The cp932 Character Set">Section 9.1.13.7.1, “The <code class="literal">cp932</code> Character Set”</a>, for
          additional information about the <code class="literal">cp932</code> and
          <code class="literal">sjis</code> character sets.
        </p><p>
          For answers to some common questions and problems relating
          support for Asian character sets in MySQL, see
          <a href="faqs.html#faqs-cjk" title="A.11. MySQL 5.1 FAQ — MySQL Chinese, Japanese, and Korean
      Character Sets">Section A.11, “MySQL 5.1 FAQ — MySQL Chinese, Japanese, and Korean
      Character Sets”</a>.
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              <code class="literal">big5</code> (Big5 Traditional Chinese)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">big5_bin</code>
                </p></li><li><p>
                  <code class="literal">big5_chinese_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">cp932</code> (SJIS for Windows Japanese)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">cp932_bin</code>
                </p></li><li><p>
                  <code class="literal">cp932_japanese_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">eucjpms</code> (UJIS for Windows Japanese)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">eucjpms_bin</code>
                </p></li><li><p>
                  <code class="literal">eucjpms_japanese_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">euckr</code> (EUC-KR Korean) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">euckr_bin</code>
                </p></li><li><p>
                  <code class="literal">euckr_korean_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">gb2312</code> (GB2312 Simplified Chinese)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">gb2312_bin</code>
                </p></li><li><p>
                  <code class="literal">gb2312_chinese_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">gbk</code> (GBK Simplified Chinese)
              collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">gbk_bin</code>
                </p></li><li><p>
                  <code class="literal">gbk_chinese_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">sjis</code> (Shift-JIS Japanese) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">sjis_bin</code>
                </p></li><li><p>
                  <code class="literal">sjis_japanese_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">tis620</code> (TIS620 Thai) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">tis620_bin</code>
                </p></li><li><p>
                  <code class="literal">tis620_thai_ci</code> (default)
                </p></li></ul></div></li><li><p>
              <code class="literal">ujis</code> (EUC-JP Japanese) collations:
            </p><div class="itemizedlist"><ul type="circle"><li><p>
                  <code class="literal">ujis_bin</code>
                </p></li><li><p>
                  <code class="literal">ujis_japanese_ci</code> (default)
                </p></li></ul></div></li></ul></div><p>
          The <code class="literal">big5_chinese_ci</code> collation sorts on
          number of strokes.
        </p><p>
          For additional information about Asian collations in MySQL,
          see Collation-Charts.Org
          (<a href="http://www.collation-charts.org/mysql60/by-charset.html#big5" target="_top">big5</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#cp932" target="_top">cp932</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#eucjpms" target="_top">eucjpms</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#euckr" target="_top">euckr</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#gb2312" target="_top">gb2312</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#gbk" target="_top">gbk</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#sjis" target="_top">sjis</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#tis620" target="_top">tis620</a>,
          <a href="http://www.collation-charts.org/mysql60/by-charset.html#ujis" target="_top">ujis</a>).
        </p><div class="section" lang="en"><div class="titlepage"><div><div><h5 class="title"><a name="charset-cp932"></a>9.1.13.7.1. The <code class="literal">cp932</code> Character Set</h5></div></div></div><p>
            <span class="bold"><strong>Why is <code class="literal">cp932</code>
            needed?</strong></span>
          </p><p>
            In MySQL, the <code class="literal">sjis</code> character set
            corresponds to the <code class="literal">Shift_JIS</code> character
            set defined by IANA, which supports JIS X0201 and JIS X0208
            characters. (See
            <a href="http://www.iana.org/assignments/character-sets" target="_top">http://www.iana.org/assignments/character-sets</a>.)
          </p><p>
            However, the meaning of “<span class="quote">SHIFT JIS</span>” as a
            descriptive term has become very vague and it often includes
            the extensions to <code class="literal">Shift_JIS</code> that are
            defined by various vendors.
          </p><p>
            For example, “<span class="quote">SHIFT JIS</span>” used in Japanese
            Windows environments is a Microsoft extension of
            <code class="literal">Shift_JIS</code> and its exact name is
            <code class="literal">Microsoft Windows Codepage : 932</code> or
            <code class="literal">cp932</code>. In addition to the characters
            supported by <code class="literal">Shift_JIS</code>,
            <code class="literal">cp932</code> supports extension characters such
            as NEC special characters, NEC selected — IBM extended
            characters, and IBM extended characters.
          </p><p>
            Many Japanese users have experienced problems using these
            extension characters. These problems stem from the following
            factors:
          </p><div class="itemizedlist"><ul type="disc"><li><p>
                MySQL automatically converts character sets.
              </p></li><li><p>
                Character sets are converted via Unicode
                (<code class="literal">ucs2</code>).
              </p></li><li><p>
                The <code class="literal">sjis</code> character set does not
                support the conversion of these extension characters.
              </p></li><li><p>
                There are several conversion rules from so-called
                “<span class="quote">SHIFT JIS</span>” to Unicode, and some characters
                are converted to Unicode differently depending on the
                conversion rule. MySQL supports only one of these rules
                (described later).
              </p></li></ul></div><p>
            The MySQL <code class="literal">cp932</code> character set is designed
            to solve these problems.
          </p><p>
            Because MySQL supports character set conversion, it is
            important to separate IANA <code class="literal">Shift_JIS</code> and
            <code class="literal">cp932</code> into two different character sets
            because they provide different conversion rules.
          </p><p>
            <span class="bold"><strong>How does <code class="literal">cp932</code>
            differ from <code class="literal">sjis</code>?</strong></span>
          </p><p>
            The <code class="literal">cp932</code> character set differs from
            <code class="literal">sjis</code> in the following ways:
          </p><div class="itemizedlist"><ul type="disc"><li><p>
                <code class="literal">cp932</code> supports NEC special
                characters, NEC selected — IBM extended
                characters, and IBM selected characters.
              </p></li><li><p>
                Some <code class="literal">cp932</code> characters have two
                different code points, both of which convert to the same
                Unicode code point. When converting from Unicode back to
                <code class="literal">cp932</code>, one of the code points must be
                selected. For this “<span class="quote">round trip conversion,</span>”
                the rule recommended by Microsoft is used. (See
                <a href="http://support.microsoft.com/kb/170559/EN-US/" target="_top">http://support.microsoft.com/kb/170559/EN-US/</a>.)
              </p><p>
                The conversion rule works like this:
              </p><div class="itemizedlist"><ul type="circle"><li><p>
                    If the character is in both JIS X 0208 and NEC
                    special characters, use the code point of JIS X
                    0208.
                  </p></li><li><p>
                    If the character is in both NEC special characters
                    and IBM selected characters, use the code point of
                    NEC special characters.
                  </p></li><li><p>
                    If the character is in both IBM selected characters
                    and NEC selected — IBM extended characters,
                    use the code point of IBM extended characters.
                  </p></li></ul></div><p>
                The table shown at
                <a href="http://www.microsoft.com/globaldev/reference/dbcs/932.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932.htm</a>
                provides information about the Unicode values of
                <code class="literal">cp932</code> characters. For
                <code class="literal">cp932</code> table entries with characters
                under which a four-digit number appears, the number
                represents the corresponding Unicode
                (<code class="literal">ucs2</code>) encoding. For table entries
                with an underlined two-digit value appears, there is a
                range of <code class="literal">cp932</code> character values that
                begin with those two digits. Clicking such a table entry
                takes you to a page that displays the Unicode value for
                each of the <code class="literal">cp932</code> characters that
                begin with those digits.
              </p><p>
                The following links are of special interest. They
                correspond to the encodings for the following sets of
                characters:
              </p><div class="itemizedlist"><ul type="circle"><li><p>
                    NEC special characters:
                  </p><pre class="programlisting"><a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_87.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_87.htm</a>
</pre></li><li><p>
                    NEC selected — IBM extended characters:
                  </p><pre class="programlisting"><a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_ED.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_ED.htm</a>
<a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_EE.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_EE.htm</a>
</pre></li><li><p>
                    IBM selected characters:
                  </p><pre class="programlisting"><a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_FA.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_FA.htm</a>
<a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_FB.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_FB.htm</a>
<a href="http://www.microsoft.com/globaldev/reference/dbcs/932/932_FC.htm" target="_top">http://www.microsoft.com/globaldev/reference/dbcs/932/932_FC.htm</a>
</pre></li></ul></div></li><li><p>
                <code class="literal">cp932</code> supports conversion of
                user-defined characters in combination with
                <code class="literal">eucjpms</code>, and solves the problems with
                <code class="literal">sjis</code>/<code class="literal">ujis</code>
                conversion. For details, please refer to
                <a href="http://www.opengroup.or.jp/jvc/cde/sjis-euc-e.html" target="_top">http://www.opengroup.or.jp/jvc/cde/sjis-euc-e.html</a>.
              </p></li></ul></div><p>
            For some characters, conversion to and from
            <code class="literal">ucs2</code> is different for
            <code class="literal">sjis</code> and <code class="literal">cp932</code>. The
            following tables illustrate these differences.
          </p><p>
            Conversion to <code class="literal">ucs2</code>:
          </p><div class="informaltable"><table border="1"><colgroup><col><col><col></colgroup><tbody><tr><td><span class="bold"><strong><code class="literal">sjis</code>/<code class="literal">cp932</code>
                    Value</strong></span></td><td><span class="bold"><strong><code class="literal">sjis</code> -&gt;
                    <code class="literal">ucs2</code> Conversion</strong></span></td><td><span class="bold"><strong><code class="literal">cp932</code> -&gt;
                    <code class="literal">ucs2</code> Conversion</strong></span></td></tr><tr><td>5C</td><td>005C</td><td>005C</td></tr><tr><td>7E</td><td>007E</td><td>007E</td></tr><tr><td>815C</td><td>2015</td><td>2015</td></tr><tr><td>815F</td><td>005C</td><td>FF3C</td></tr><tr><td>8160</td><td>301C</td><td>FF5E</td></tr><tr><td>8161</td><td>2016</td><td>2225</td></tr><tr><td>817C</td><td>2212</td><td>FF0D</td></tr><tr><td>8191</td><td>00A2</td><td>FFE0</td></tr><tr><td>8192</td><td>00A3</td><td>FFE1</td></tr><tr><td>81CA</td><td>00AC</td><td>FFE2</td></tr></tbody></table></div><p>
            Conversion from <code class="literal">ucs2</code>:
          </p><div class="informaltable"><table border="1"><colgroup><col><col><col></colgroup><tbody><tr><td><span class="bold"><strong><code class="literal">ucs2</code> value</strong></span></td><td><span class="bold"><strong><code class="literal">ucs2</code> -&gt;
                    <code class="literal">sjis</code> Conversion</strong></span></td><td><span class="bold"><strong><code class="literal">ucs2</code> -&gt;
                    <code class="literal">cp932</code> Conversion</strong></span></td></tr><tr><td>005C</td><td>815F</td><td>5C</td></tr><tr><td>007E</td><td>7E</td><td>7E</td></tr><tr><td>00A2</td><td>8191</td><td>3F</td></tr><tr><td>00A3</td><td>8192</td><td>3F</td></tr><tr><td>00AC</td><td>81CA</td><td>3F</td></tr><tr><td>2015</td><td>815C</td><td>815C</td></tr><tr><td>2016</td><td>8161</td><td>3F</td></tr><tr><td>2212</td><td>817C</td><td>3F</td></tr><tr><td>2225</td><td>3F</td><td>8161</td></tr><tr><td>301C</td><td>8160</td><td>3F</td></tr><tr><td>FF0D</td><td>3F</td><td>817C</td></tr><tr><td>FF3C</td><td>3F</td><td>815F</td></tr><tr><td>FF5E</td><td>3F</td><td>8160</td></tr><tr><td>FFE0</td><td>3F</td><td>8191</td></tr><tr><td>FFE1</td><td>3F</td><td>8192</td></tr><tr><td>FFE2</td><td>3F</td><td>81CA</td></tr></tbody></table></div><p>
            Users of any Japanese character sets should be aware that
            using
            <a href="server-administration.html#option_mysqld_character-set-client-handshake"><code class="option">--character-set-client-handshake</code></a>
            (or
            <a href="server-administration.html#option_mysqld_character-set-client-handshake"><code class="option">--skip-character-set-client-handshake</code></a>)
            has an important effect. See
            <a href="server-administration.html#server-options" title="5.1.2. Server Command Options">Section 5.1.2, “Server Command Options”</a>.
          </p></div></div></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="charset-configuration"></a>9.2. The Character Set Used for Data and Sorting</h2></div></div></div><a class="indexterm" name="id4321213"></a><a class="indexterm" name="id4321222"></a><a class="indexterm" name="id4321234"></a><p>
      The character set determines what characters are allowed in
      identifiers. The collation determines how strings are sorted by
      the <code class="literal">ORDER BY</code> and <code class="literal">GROUP BY</code>
      clauses of the <a href="sql-syntax.html#select" title="12.2.8. SELECT Syntax"><code class="literal">SELECT</code></a> statement.
    </p><p>
      By default, MySQL uses the <code class="literal">latin1</code> (cp1252 West
      European) character set and the
      <code class="literal">latin1_swedish_ci</code> collation that sorts
      according to Swedish/Finnish rules. These defaults are suitable
      for the United States and most of Western Europe.
    </p><p>
      All MySQL binary distributions are compiled with
      <a href="installing.html#option_configure_with-extra-charsets"><code class="option">--with-extra-charsets=complex</code></a>.
      This adds code to all standard programs that enables them to
      handle <code class="literal">latin1</code> and all multi-byte character sets
      within the binary. Other character sets are loaded from a
      character-set definition file when needed.
    </p><p>
      You can change the default server character set and collation with
      the <a href="server-administration.html#option_mysqld_character-set-server"><code class="option">--character-set-server</code></a> and
      <a href="server-administration.html#option_mysqld_collation-server"><code class="option">--collation-server</code></a> options when you
      start the server. The collation must be a legal collation for the
      default character set. (Use the <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW
      COLLATION</code></a> statement to determine which collations are
      available for each character set.) See
      <a href="server-administration.html#server-options" title="5.1.2. Server Command Options">Section 5.1.2, “Server Command Options”</a>.
    </p><p>
      The character sets available depend on the
      <a href="installing.html#option_configure_with-charset"><code class="option">--with-charset=<em class="replaceable"><code>charset_name</code></em></code></a>
      and
      <a href="installing.html#option_configure_with-extra-charsets"><code class="option">--with-extra-charsets=<em class="replaceable"><code>list-of-charsets</code></em>
      | complex | all | none</code></a> options to
      <span><strong class="command">configure</strong></span>, and the character set configuration
      files listed in
      <code class="filename"><em class="replaceable"><code>SHAREDIR</code></em>/charsets/Index</code>.
      See <a href="installing.html#configure-options" title="2.3.2. Typical configure Options">Section 2.3.2, “Typical <span><strong class="command">configure</strong></span> Options”</a>.
    </p><p>
      When a client connects to the server, it sends the name of the
      character set that it wants to use. The server uses the name to
      set various character set system variables for the connection. For
      more information, see <a href="internationalization-localization.html#charset-connection" title="9.1.4. Connection Character Sets and Collations">Section 9.1.4, “Connection Character Sets and Collations”</a>.
    </p><p>
      Within C programs, you should use
      <a href="connectors-apis.html#mysql-real-escape-string" title="21.9.3.53. mysql_real_escape_string()"><code class="literal">mysql_real_escape_string()</code></a> when
      escaping strings for an SQL query.
      <a href="connectors-apis.html#mysql-real-escape-string" title="21.9.3.53. mysql_real_escape_string()"><code class="literal">mysql_real_escape_string()</code></a> is
      identical to the old
      <a href="connectors-apis.html#mysql-escape-string" title="21.9.3.16. mysql_escape_string()"><code class="literal">mysql_escape_string()</code></a> function,
      except that it takes the <code class="literal">MYSQL</code> connection
      handler as the first parameter so that the appropriate character
      set can be taken into account when escaping characters.
    </p><p>
      If the client is compiled with paths that differ from where the
      server is installed and the user who configured MySQL did not
      include all character sets in the MySQL binary, you must tell the
      client where it can find the additional character sets it needs if
      the server runs with a different character set from the client.
      You can do this by specifying a
      <code class="option">--character-sets-dir</code> option to indicate the path
      to the directory in which the dynamic MySQL character sets are
      stored. For example, you can put the following in an option file:
    </p><pre class="programlisting">[client]
character-sets-dir=/usr/local/mysql/share/mysql/charsets
</pre><p>
      You can force the client to use specific character set as follows:
    </p><pre class="programlisting">[client]
default-character-set=<em class="replaceable"><code>charset_name</code></em>
</pre><p>
      This is normally unnecessary, however.
    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="error-message-language"></a>9.3. Setting the Error Message Language</h2></div></div></div><a class="indexterm" name="id4321469"></a><a class="indexterm" name="id4321481"></a><a class="indexterm" name="id4321494"></a><a class="indexterm" name="id4321506"></a><p>
      By default, <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> produces error messages in
      English, but they can also be displayed in any of several other
      languages: Czech, Danish, Dutch, Estonian, French, German, Greek,
      Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny,
      Polish, Portuguese, Romanian, Russian, Slovak, Spanish, or
      Swedish.
    </p><p>
      You can select which language the server uses for error messages
      using the instructions in this section.
    </p><p>
      To start <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> with a particular language for
      error messages, use the <a href="server-administration.html#option_mysqld_language"><code class="option">--language</code></a>
      or <code class="option">-L</code> option. The option value can be a language
      name or the full path to the error message file. For example:
    </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>mysqld --language=swedish</code></strong>
</pre><p>
      Or:
    </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>mysqld --language=/usr/local/share/swedish</code></strong>
</pre><p>
      The language name should be specified in lowercase.
    </p><p>
      By default, the language files are located in the
      <code class="filename">share/mysql/<em class="replaceable"><code>LANGUAGE</code></em></code>
      directory under the MySQL base directory.
    </p><p>
      For information about changing the character set for error
      messages (rather than the language), see
      <a href="internationalization-localization.html#charset-errors" title="9.1.6. Character Set for Error Messages">Section 9.1.6, “Character Set for Error Messages”</a>.
    </p><p>
      You can change the content of the error messages produced by the
      server using the instructions in the MySQL Internals manual,
      available at
      <a href="http://forge.mysql.com/wiki/MySQL_Internals_Error_Messages" target="_top">http://forge.mysql.com/wiki/MySQL_Internals_Error_Messages</a>.
      If you do change the content of error messages, remember to repeat
      your changes after each upgrade to a newer version of MySQL.
    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="adding-character-set"></a>9.4. Adding a New Character Set</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#character-arrays">9.4.1. The Character Definition Arrays</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#string-collating">9.4.2. String Collating Support</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#multi-byte-characters">9.4.3. Multi-Byte Character Support</a></span></dt></dl></div><a class="indexterm" name="id4321630"></a><a class="indexterm" name="id4321643"></a><p>
      This section discusses the procedure for adding a new character
      set to MySQL. You must have a MySQL source distribution to use
      these instructions. The proper procedure depends on whether the
      character set is simple or complex:
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          If the character set does not need to use special string
          collating routines for sorting and does not need multi-byte
          character support, it is simple.
        </p></li><li><p>
          If the character set needs either of those features, it is
          complex.
        </p></li></ul></div><p>
      For example, <code class="literal">greek</code> and <code class="literal">swe7</code>
      are simple character sets, whereas <code class="literal">big5</code> and
      <code class="literal">czech</code> are complex character sets.
    </p><p>
      In the following instructions, <em class="replaceable"><code>MYSET</code></em>
      represents the name of the character set that you want to add.
    </p><div class="orderedlist"><ol type="1"><li><p>
          Add a <code class="literal">&lt;charset&gt;</code> element for
          <em class="replaceable"><code>MYSET</code></em> to the
          <code class="filename">sql/share/charsets/Index.xml</code> file. Use
          the existing contents in the file as a guide to adding new
          contents.
        </p><p>
          The <code class="literal">&lt;charset&gt;</code> element must list all
          the collations for the character set. These must include at
          least a binary collation and a default collation. The default
          collation is usually named using a suffix of
          <code class="literal">general_ci</code> (general, case insensitive). It
          is possible for the binary collation to be the default
          collation, but usually they are different. The default
          collation should have a <code class="literal">primary</code> flag. The
          binary collation should have a <code class="literal">binary</code> flag.
        </p><p>
          You must assign a unique ID number to each collation, chosen
          from the range 1 to 254. To find the maximum of the currently
          used collation IDs, use this query:
        </p><pre class="programlisting">SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;
</pre></li><li><p>
          This step depends on whether you are adding a simple or
          complex character set. A simple character set requires only a
          configuration file, whereas a complex character set requires C
          source file that defines collation functions, multi-byte
          functions, or both.
        </p><p>
          For a simple character set, create a configuration file,
          <code class="filename"><em class="replaceable"><code>MYSET</code></em>.xml</code>,
          that describes the character set properties. Create this file
          in the <code class="filename">sql/share/charsets</code> directory. (You
          can use a copy of <code class="filename">latin1.xml</code> as the basis
          for this file.) The syntax for the file is very simple:
        </p><div class="itemizedlist"><ul type="disc"><li><p>
              Comments are written as ordinary XML comments
              (<code class="literal">&lt;!-- <em class="replaceable"><code>text</code></em>
              --&gt;</code>).
            </p></li><li><p>
              Words within <code class="literal">&lt;map&gt;</code> array elements
              are separated by arbitrary amounts of whitespace.
            </p></li><li><p>
              Each word within <code class="literal">&lt;map&gt;</code> array
              elements must be a number in hexadecimal format.
            </p></li><li><p>
              The <code class="literal">&lt;map&gt;</code> array element for the
              <code class="literal">&lt;ctype&gt;</code> element has 257 words.
              The other <code class="literal">&lt;map&gt;</code> array elements
              after that have 256 words. See
              <a href="internationalization-localization.html#character-arrays" title="9.4.1. The Character Definition Arrays">Section 9.4.1, “The Character Definition Arrays”</a>.
            </p></li><li><p>
              For each collation listed in the
              <code class="literal">&lt;charset&gt;</code> element for the
              character set in <code class="filename">Index.xml</code>,
              <code class="filename"><em class="replaceable"><code>MYSET</code></em>.xml</code>
              must contain a <code class="literal">&lt;collation&gt;</code>
              element that defines the character ordering.
            </p></li></ul></div><p>
          For a complex character set, create a C source file that
          describes the character set properties and defines the support
          routines necessary to properly perform operations on the
          character set:
        </p><div class="orderedlist"><ol type="a"><li><p>
              Create the file
              <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>
              in the <code class="filename">strings</code> directory. Look at one
              of the existing <code class="filename">ctype-*.c</code> files (such
              as <code class="filename">ctype-big5.c</code>) to see what needs to
              be defined. The arrays in your file must have names like
              <code class="literal">ctype_<em class="replaceable"><code>MYSET</code></em></code>,
              <code class="literal">to_lower_<em class="replaceable"><code>MYSET</code></em></code>,
              and so on. These correspond to the arrays for a simple
              character set. See <a href="internationalization-localization.html#character-arrays" title="9.4.1. The Character Definition Arrays">Section 9.4.1, “The Character Definition Arrays”</a>.
            </p></li><li><p>
              For each collation listed in the
              <code class="literal">&lt;charset&gt;</code> element for the
              character set in <code class="filename">Index.xml</code>, the
              <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>
              file must provide an implementation of the collation.
            </p></li><li><p>
              If you need string collating functions, see
              <a href="internationalization-localization.html#string-collating" title="9.4.2. String Collating Support">Section 9.4.2, “String Collating Support”</a>.
            </p></li><li><p>
              If you need multi-byte character support, see
              <a href="internationalization-localization.html#multi-byte-characters" title="9.4.3. Multi-Byte Character Support">Section 9.4.3, “Multi-Byte Character Support”</a>.
            </p></li></ol></div></li><li><p>
          Follow these steps to modify the configuration information.
          Use the existing configuration information as a guide to
          adding information for <em class="replaceable"><code>MYSYS</code></em>. The
          example here assumes that the character set has default and
          binary collations, but more lines will be needed if
          <em class="replaceable"><code>MYSET</code></em> has additional collations.
        </p><div class="orderedlist"><ol type="a"><li><p>
              Edit <code class="filename">mysys/charset-def.c</code>, and
              “<span class="quote">register</span>” the collations for the new
              character set.
            </p><p>
              Add these lines to the “<span class="quote">declaration</span>” section:
            </p><pre class="programlisting">#ifdef HAVE_CHARSET_<em class="replaceable"><code>MYSET</code></em>
extern CHARSET_INFO my_charset_<em class="replaceable"><code>MYSET</code></em>_general_ci;
extern CHARSET_INFO my_charset_<em class="replaceable"><code>MYSET</code></em>_bin;
#endif
</pre><p>
              Add these lines to the “<span class="quote">registration</span>”
              section:
            </p><pre class="programlisting">#ifdef HAVE_CHARSET_<em class="replaceable"><code>MYSET</code></em>
  add_compiled_collation(&amp;my_charset_<em class="replaceable"><code>MYSET</code></em>_general_ci);
  add_compiled_collation(&amp;my_charset_<em class="replaceable"><code>MYSET</code></em>_bin);
#endif
</pre></li><li><p>
              If the character set uses
              <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>,
              edit <code class="filename">strings/Makefile.am</code> and add
              <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>
              to each definition of the <code class="literal">CSRCS</code>
              variable, and to the <code class="literal">EXTRA_DIST</code>
              variable.
            </p></li><li><p>
              If the character set uses
              <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>,
              edit <code class="filename">libmysql/Makefile.shared</code> and add
              <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.lo</code>
              to the <code class="literal">mystringsobjects</code> definition.
            </p></li><li><p>
              Edit
              <code class="filename">config/ac-macros/character_sets.m4</code>:
            </p><div class="orderedlist"><ol type="i"><li><p>
                  Add <em class="replaceable"><code>MYSET</code></em> to one of the
                  <code class="literal">define(CHARSETS_AVAILABLE...)</code> lines
                  in alphabetic order.
                </p></li><li><p>
                  Add <em class="replaceable"><code>MYSET</code></em> to
                  <code class="literal">CHARSETS_COMPLEX</code>. This is needed
                  even for simple character sets, or
                  <span><strong class="command">configure</strong></span> will not recognize
                  <a href="installing.html#option_configure_with-charset"><code class="option">--with-charset=<em class="replaceable"><code>MYSET</code></em></code></a>.
                </p></li><li><p>
                  Add <em class="replaceable"><code>MYSET</code></em> to the first
                  <code class="literal">case</code> control structure. Omit the
                  <code class="literal">USE_MB</code> and
                  <code class="literal">USE_MB_IDENT</code> lines for 8-bit
                  character sets.
                </p><pre class="programlisting"><em class="replaceable"><code>MYSET</code></em>)
  AC_DEFINE(HAVE_CHARSET_<em class="replaceable"><code>MYSET</code></em>, 1, [Define to enable charset <em class="replaceable"><code>MYSET</code></em>])
  AC_DEFINE([USE_MB], 1, [Use multi-byte character routines])
  AC_DEFINE(USE_MB_IDENT, 1)
  ;;
</pre></li><li><p>
                  Add <em class="replaceable"><code>MYSET</code></em> to the second
                  <code class="literal">case</code> control structure:
                </p><pre class="programlisting"><em class="replaceable"><code>MYSET</code></em>)
  default_charset_default_collation="<em class="replaceable"><code>MYSET</code></em>_general_ci"
  default_charset_collations="<em class="replaceable"><code>MYSET</code></em>_general_ci <em class="replaceable"><code>MYSET</code></em>_bin"
  ;;
</pre></li></ol></div></li></ol></div></li><li><p>
          Reconfigure, recompile, and test.
        </p></li></ol></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="character-arrays"></a>9.4.1. The Character Definition Arrays</h3></div></div></div><p>
        Each simple character set has a configuration file located in
        the <code class="filename">sql/share/charsets</code> directory. The file
        is named
        <code class="filename"><em class="replaceable"><code>MYSET</code></em>.xml</code>. It
        uses <code class="literal">&lt;map&gt;</code> array elements to list
        character set properties. <code class="literal">&lt;map&gt;</code>
        elements appear within these elements:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            <code class="literal">&lt;ctype&gt;</code> defines attributes for each
            character
          </p></li><li><p>
            <code class="literal">&lt;lower&gt;</code> and
            <code class="literal">&lt;upper&gt;</code> list the lowercase and
            uppercase characters
          </p></li><li><p>
            <code class="literal">&lt;unicode&gt;</code> maps 8-bit character
            values to Unicode values
          </p></li><li><p>
            <code class="literal">&lt;collation&gt;</code> elements indicate
            character ordering for comparisons and sorts, one element
            per collation (binary collations need no
            <code class="literal">&lt;map&gt;</code> element because the character
            codes themselves provide the ordering)
          </p></li></ul></div><p>
        For a complex character set as implemented in a
        <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>
        file in the <code class="filename">strings</code> directory, there are
        corresponding arrays:
        <code class="literal">ctype_<em class="replaceable"><code>MYSET</code></em>[]</code>,
        <code class="literal">to_lower_<em class="replaceable"><code>MYSET</code></em>[]</code>,
        and so forth. Not every complex character set has all of the
        arrays. See the existing <code class="filename">ctype-*.c</code> files
        for examples. See the <code class="filename">CHARSET_INFO.txt</code> file
        in the <code class="filename">strings</code> directory for additional
        information.
      </p><p>
        The <code class="literal">ctype</code> array is indexed by character value
        + 1 and has 257 elements. This is an old legacy convention for
        handling <code class="literal">EOF</code>. The other arrays are indexed by
        character value and have 256 elements.
      </p><p>
        <code class="literal">ctype</code> array elements are bit values. Each
        element describes the attributes of a single character in the
        character set. Each attribute is associated with a bitmask, as
        defined in <code class="filename">include/m_ctype.h</code>:
      </p><pre class="programlisting">#define _MY_U   01      /* Upper case */
#define _MY_L   02      /* Lower case */
#define _MY_NMR 04      /* Numeral (digit) */
#define _MY_SPC 010     /* Spacing character */
#define _MY_PNT 020     /* Punctuation */
#define _MY_CTR 040     /* Control character */
#define _MY_B   0100    /* Blank */
#define _MY_X   0200    /* heXadecimal digit */
</pre><p>
        The <code class="literal">ctype</code> value for a given character should
        be the union of the applicable bitmask values that describe the
        character. For example, <code class="literal">'A'</code> is an uppercase
        character (<code class="literal">_MY_U</code>) as well as a hexadecimal
        digit (<code class="literal">_MY_X</code>), so its
        <code class="literal">ctype</code> value should be defined like this:
      </p><pre class="programlisting">ctype['A'+1] = _MY_U | _MY_X = 01 | 0200 = 0201
</pre><p>
        The bitmask values in <code class="filename">m_ctype.h</code> are octal
        values, but the elements of the <code class="literal">ctype</code> array
        in <code class="filename"><em class="replaceable"><code>MYSET</code></em>.xml</code>
        should be written as hexadecimal values.
      </p><p>
        The <code class="literal">lower</code> and <code class="literal">upper</code> arrays
        hold the lowercase and uppercase characters corresponding to
        each member of the character set. For example:
      </p><pre class="programlisting">lower['A'] should contain 'a'
upper['a'] should contain 'A'
</pre><p>
        Each <code class="literal">collation</code> array is a map indicating how
        characters should be ordered for comparison and sorting
        purposes. MySQL sorts characters based on the values of this
        information. In some cases, this is the same as the
        <code class="literal">upper</code> array, which means that sorting is
        case-insensitive. For more complicated sorting rules (for
        complex character sets), see the discussion of string collating
        in <a href="internationalization-localization.html#string-collating" title="9.4.2. String Collating Support">Section 9.4.2, “String Collating Support”</a>.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="string-collating"></a>9.4.2. String Collating Support</h3></div></div></div><a class="indexterm" name="id4322720"></a><a class="indexterm" name="id4322732"></a><p>
        For simple character sets, sorting rules are specified in the
        <code class="filename"><em class="replaceable"><code>MYSET</code></em>.xml</code>
        configuration file using <code class="literal">&lt;map&gt;</code> array
        elements within <code class="literal">&lt;collation&gt;</code> elements.
        If the sorting rules for your language are too complex to be
        handled with simple arrays, you need to define string collating
        functions in the
        <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>
        source file in the <code class="filename">strings</code> directory.
      </p><p>
        The existing character sets provide the best documentation and
        examples to show how these functions are implemented. Look at
        the <code class="filename">ctype-*.c</code> files in the
        <code class="filename">strings</code> directory, such as the files for
        the <code class="literal">big5</code>, <code class="literal">czech</code>,
        <code class="literal">gbk</code>, <code class="literal">sjis</code>, and
        <code class="literal">tis160</code> character sets. Take a look at the
        <code class="literal">MY_COLLATION_HANDLER</code> structures to see how
        they are used, and see the <code class="filename">CHARSET_INFO.txt</code>
        file in the <code class="filename">strings</code> directory for
        additional information.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="multi-byte-characters"></a>9.4.3. Multi-Byte Character Support</h3></div></div></div><a class="indexterm" name="id4322857"></a><a class="indexterm" name="id4322870"></a><p>
        If you want to add support for a new character set that includes
        multi-byte characters, you need to use multi-byte character
        functions in the
        <code class="filename">ctype-<em class="replaceable"><code>MYSET</code></em>.c</code>
        source file in the <code class="filename">strings</code> directory.
      </p><p>
        The existing character sets provide the best documentation and
        examples to show how these functions are implemented. Look at
        the <code class="filename">ctype-*.c</code> files in the
        <code class="filename">strings</code> directory, such as the files for
        the <code class="literal">euc_kr</code>, <code class="literal">gb2312</code>,
        <code class="literal">gbk</code>, <code class="literal">sjis</code>, and
        <code class="literal">ujis</code> character sets. Take a look at the
        <code class="literal">MY_CHARSET_HANDLER</code> structures to see how they
        are used, and see the <code class="filename">CHARSET_INFO.txt</code> file
        in the <code class="filename">strings</code> directory for additional
        information.
      </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="adding-collation"></a>9.5. How to Add a New Collation to a Character Set</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#charset-collation-implementations">9.5.1. Collation Implementation Types</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-collation-choosing-id">9.5.2. Choosing a Collation ID</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-collation-simple-8bit">9.5.3. Adding a Simple Collation to an 8-Bit Character Set</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#adding-collation-unicode-uca">9.5.4. Adding a UCA Collation to a Unicode Character Set</a></span></dt></dl></div><a class="indexterm" name="id4322974"></a><p>
      A collation is a set of rules that defines how to compare and sort
      character strings. Each collation in MySQL belongs to a single
      character set. Every character set has at least one collation, and
      most have two or more collations.
    </p><p>
      A collation orders characters based on weights. Each character in
      a character set maps to a weight. Characters with equal weights
      compare as equal, and characters with unequal weights compare
      according to the relative magnitude of their weights.
    </p><p>
      MySQL supports several collation implementations, as discussed in
      <a href="internationalization-localization.html#charset-collation-implementations" title="9.5.1. Collation Implementation Types">Section 9.5.1, “Collation Implementation Types”</a>. Some of these
      can be added to MySQL without recompiling:
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          Simple collations for 8-bit character sets
        </p></li><li><p>
          UCA-based collations for Unicode character sets
        </p></li><li><p>
          Binary (<code class="literal"><em class="replaceable"><code>xxx</code></em>_bin</code>)
          collations
        </p></li></ul></div><p>
      The following discussion describes how to add collations of the
      first two types to existing character sets. All existing character
      sets already have a binary collation, so there is no need here to
      describe how to add one.
    </p><p>
      Summary of the procedure for adding a new collation:
    </p><div class="orderedlist"><ol type="1"><li><p>
          Choose a collation ID
        </p></li><li><p>
          Add configuration information that names the collation and
          describes the character-ordering rules
        </p></li><li><p>
          Restart the server
        </p></li><li><p>
          Verify that the collation is present
        </p></li></ol></div><p>
      The instructions here cover only collations that can be added
      without recompiling MySQL. To add a collation that does require
      recompiling (as implemented by means of functions in a C source
      file), use the instructions in
      <a href="internationalization-localization.html#adding-character-set" title="9.4. Adding a New Character Set">Section 9.4, “Adding a New Character Set”</a>. However, instead of adding
      all the information required for a complete character set, just
      modify the appropriate files for an existing character set. That
      is, based on what is already present for the character set's
      current collations, add new data structures, functions, and
      configuration information for the new collation. For an example,
      see the MySQL Blog article in the following list of additional
      resources.
    </p><p>
      <span class="bold"><strong>Additional resources</strong></span>
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          The Unicode Collation Algorithm (UCA) specification:
          <a href="http://www.unicode.org/reports/tr10/" target="_top">http://www.unicode.org/reports/tr10/</a>
        </p></li><li><p>
          The Locale Data Markup Language (LDML) specification:
          <a href="http://www.unicode.org/reports/tr35/" target="_top">http://www.unicode.org/reports/tr35/</a>
        </p></li><li><p>
          MySQL University session “<span class="quote">How to Add a
          Collation</span>”:
          <a href="http://forge.mysql.com/wiki/How_to_Add_a_Collation" target="_top">http://forge.mysql.com/wiki/How_to_Add_a_Collation</a>
        </p></li><li><p>
          MySQL Blog article “<span class="quote">Instructions for adding a new
          Unicode collation</span>”:
          <a href="http://blogs.mysql.com/peterg/2008/05/19/instructions-for-adding-a-new-unicode-collation/" target="_top">http://blogs.mysql.com/peterg/2008/05/19/instructions-for-adding-a-new-unicode-collation/</a>
        </p></li></ul></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="charset-collation-implementations"></a>9.5.1. Collation Implementation Types</h3></div></div></div><p>
        MySQL implements several types of collations:
      </p><p>
        <span class="bold"><strong>Simple collations for 8-bit character
        sets</strong></span>
      </p><p>
        This kind of collation is implemented using an array of 256
        weights that defines a one-to-one mapping from character codes
        to weights. <code class="literal">latin1_swedish_ci</code> is an example.
        It is a case-insensitive collation, so the uppercase and
        lowercase versions of a character have the same weights and they
        compare as equal.
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT 'a' = 'A';</code></strong>
+-----------+
| 'a' = 'A' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)
</pre><p>
        <span class="bold"><strong>Complex collations for 8-bit character
        sets</strong></span>
      </p><p>
        This kind of collation is implemented using functions in a C
        source file that define how to order characters, as described in
        <a href="internationalization-localization.html#adding-character-set" title="9.4. Adding a New Character Set">Section 9.4, “Adding a New Character Set”</a>.
      </p><p>
        <span class="bold"><strong>Collations for non-Unicode multi-byte
        character sets</strong></span>
      </p><p>
        For this type of collation, 8-bit (single-byte) and multi-byte
        characters are handled differently. For 8-bit characters,
        character codes map to weights in case-insensitive fashion. (For
        example, the single-byte characters <code class="literal">'a'</code> and
        <code class="literal">'A'</code> both have a weight of
        <code class="literal">0x41</code>.) For multi-byte characters, there are
        two types of relationship between character codes and weights:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            Weights equal character codes.
            <code class="literal">sjis_japanese_ci</code> is an example of this
            kind of collation. The multi-byte character
            <code class="literal">'ぢ'</code> has a character code of
            <code class="literal">0x82C0</code>, and the weight is also
            <code class="literal">0x82C0</code>.
          </p></li><li><p>
            Character codes map one-to-one to weights, but a code is not
            necessarily equal to the weight.
            <code class="literal">gbk_chinese_ci</code> is an example of this kind
            of collation. The multi-byte character
            <code class="literal">'膰'</code> has a character code of
            <code class="literal">0x81B0</code> but a weight of
            <code class="literal">0xC286</code>.
          </p></li></ul></div><p>
        <span class="bold"><strong>Collations for Unicode multi-byte
        character sets</strong></span>
      </p><p>
        Some of these collations are based on the Unicode Collation
        Algorithm (UCA), others are not.
      </p><p>
        Non-UCA collations have a one-to-one mapping from character code
        to weight. In MySQL, such collations are case insensitive and
        accent insensitive. <code class="literal">utf8_general_ci</code> is an
        example: <code class="literal">'a'</code>, <code class="literal">'A'</code>,
        <code class="literal">'À'</code>, and <code class="literal">'á'</code> each have
        different character codes but all have a weight of
        <code class="literal">0x0041</code> and compare as equal.
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES 'utf8' COLLATE 'utf8_general_ci';</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';</code></strong>
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
|         1 |         1 |         1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
</pre><p>
        UCA-based collations in MySQL have these properties:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            If a character has weights, each weight uses 2 bytes (16
            bits)
          </p></li><li><p>
            A character may have zero weights (or an empty weight). In
            this case, the character is ignorable. Example: "U+0000
            NULL" does not have a weight and is ignorable.
          </p></li><li><p>
            A character may have one weight. Example:
            <code class="literal">'a'</code> has a weight of
            <code class="literal">0x0E33</code>.
          </p></li><li><p>
            A character may have many weights. This is an expansion.
            Example: The German letter <code class="literal">'ß'</code> (SZ
            LEAGUE, or SHARP S) has a weight of
            <code class="literal">0x0FEA0FEA</code>.
          </p></li><li><p>
            Many characters may have one weight. This is a contraction.
            Example: <code class="literal">'ch'</code> is a single letter in Czech
            and has a weight of <code class="literal">0x0EE2</code>.
          </p></li></ul></div><p>
        A many-characters-to-many-weights mapping is also possible (this
        is contraction with expansion), but is not supported by MySQL.
      </p><p>
        <span class="bold"><strong>Miscellaneous collations</strong></span>
      </p><p>
        There are also a few collations that do not fall into any of the
        previous categories.
      </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="adding-collation-choosing-id"></a>9.5.2. Choosing a Collation ID</h3></div></div></div><p>
        Each collation must have a unique ID. To add a new collation,
        you must choose an ID value that is not currently used. The
        value must be in the range from 1 to 254. The collation ID that
        you choose will show up in these contexts:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            The <code class="literal">Id</code> column of
            <a href="sql-syntax.html#show-collation" title="12.5.5.5. SHOW COLLATION Syntax"><code class="literal">SHOW COLLATION</code></a> output
          </p></li><li><p>
            The <code class="literal">ID</code> column of the
            <a href="information-schema.html#collations-table" title="20.10. The INFORMATION_SCHEMA COLLATIONS Table"><code class="literal">INFORMATION_SCHEMA.COLLATIONS</code></a>
            table
          </p></li><li><p>
            The <code class="literal">charsetnr</code> member of the
            <code class="literal">MYSQL_FIELD</code> C API data structure
          </p></li><li><p>
            The <code class="literal">number</code> member of the
            <code class="literal">MY_CHARSET_INFO</code> data structure returned
            by the
            <a href="connectors-apis.html#mysql-get-character-set-info" title="21.9.3.26. mysql_get_character_set_info()"><code class="literal">mysql_get_character_set_info()</code></a>
            C API function
          </p></li></ul></div><p>
        To determine the largest currently used ID, issue the following
        statement:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;</code></strong>
+---------+
| MAX(ID) |
+---------+
|     210 |
+---------+
</pre><p>
        For the output just shown, you could choose an ID higher than
        210 for the new collation.
      </p><p>
        To display a list of all currently used IDs, issue this
        statement:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT ID FROM INFORMATION_SCHEMA.COLLATIONS ORDER BY ID;</code></strong>
+-----+
| ID  |
+-----+
|   1 |
|   2 |
| ... |
|  52 |
|  53 |
|  57 |
|  58 |
| ... |
|  98 |
|  99 |
| 128 |
| 129 |
| ... |
| 210 |
+-----+
</pre><p>
        In this case, you can either choose an unused ID from within the
        current range of IDs, or choose an ID that is higher than the
        current maximum ID. For example, in the output just shown, there
        are unused IDs between 53 and 57, and between 99 and 128. Or you
        could choose an ID higher than 210.
      </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
          If you upgrade MySQL, you may find that the collation ID you
          choose has been assigned to a collation included in the new
          MySQL distribution. In this case, you will need to choose a
          new value for your own collation.
        </p><p>
          In addition, before upgrading, you should save the
          configuration files that you change. If you upgrade in place,
          the process will replace the your modified files.
        </p></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="adding-collation-simple-8bit"></a>9.5.3. Adding a Simple Collation to an 8-Bit Character Set</h3></div></div></div><p>
        To add a simple collation for an 8-bit character set without
        recompiling MySQL, use the following procedure. The example adds
        a collation named <code class="literal">latin1_test_ci</code> to the
        <code class="literal">latin1</code> character set.
      </p><div class="orderedlist"><ol type="1"><li><p>
            Choose a collation ID, as shown in
            <a href="internationalization-localization.html#adding-collation-choosing-id" title="9.5.2. Choosing a Collation ID">Section 9.5.2, “Choosing a Collation ID”</a>. The
            following steps use an ID of 56.
          </p></li><li><p>
            You will need to modify the <code class="literal">Index.xml</code> and
            <code class="literal">latin1.xml</code> configuration files. These
            files will be located in the directory named by the
            <a href="server-administration.html#sysvar_character_sets_dir"><code class="literal">character_sets_dir</code></a> system
            variable. You can check the variable value as follows,
            although the path name might be different on your system:
          </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW VARIABLES LIKE 'character_sets_dir';</code></strong>
+--------------------+-----------------------------------------+
| Variable_name      | Value                                   |
+--------------------+-----------------------------------------+
| character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
+--------------------+-----------------------------------------+
</pre></li><li><p>
            Choose a name for the collation and list it in the
            <code class="filename">Index.xml</code> file. Find the
            <code class="literal">&lt;charset&gt;</code> element for the character
            set to which the collation is being added, and add a
            <code class="literal">&lt;collation&gt;</code> element that indicates
            the collation name and ID. For example:
          </p><pre class="programlisting">&lt;charset name="latin1"&gt;
  ...
  &lt;!-- associate collation name with its ID --&gt;
  &lt;collation name="latin1_test_ci" id="56"/&gt;
  ...
&lt;/charset&gt;
</pre></li><li><p>
            In the <code class="filename">latin1.xml</code> configuration file,
            add a <code class="literal">&lt;collation&gt;</code> element that
            names the collation and that contains a
            <code class="literal">&lt;map&gt;</code> element that defines a
            character code-to-weight mapping table. Each word within the
            <code class="literal">&lt;map&gt;</code> element must be a number in
            hexadecimal format.
          </p><pre class="programlisting">&lt;collation name="latin1_test_ci"&gt;
&lt;map&gt;
 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
 60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
 50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
 41 41 41 41 5B 5D 5B 43 45 45 45 45 49 49 49 49
 44 4E 4F 4F 4F 4F 5C D7 5C 55 55 55 59 59 DE DF
 41 41 41 41 5B 5D 5B 43 45 45 45 45 49 49 49 49
 44 4E 4F 4F 4F 4F 5C F7 5C 55 55 55 59 59 DE FF
&lt;/map&gt;
&lt;/collation&gt;
</pre></li><li><p>
            Restart the server and use this statement to verify that the
            collation is present:
          </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW COLLATION LIKE 'latin1_test_ci';</code></strong>
+----------------+---------+----+---------+----------+---------+
| Collation      | Charset | Id | Default | Compiled | Sortlen |
+----------------+---------+----+---------+----------+---------+
| latin1_test_ci | latin1  | 56 |         |          |       1 |
+----------------+---------+----+---------+----------+---------+
</pre></li></ol></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="adding-collation-unicode-uca"></a>9.5.4. Adding a UCA Collation to a Unicode Character Set</h3></div></div></div><p>
        UCA collations for Unicode character sets can be added to MySQL
        without recompiling by using a subset of the Locale Data Markup
        Language (LDML), which is available at
        <a href="http://www.unicode.org/reports/tr35/" target="_top">http://www.unicode.org/reports/tr35/</a>. In
        5.1, this method of adding collations is supported
        as of MySQL 5.1.20. With this method, you begin with an existing
        “<span class="quote">base</span>” collation. Then you describe the new
        collation in terms of how it differs from the base collation,
        rather than defining the entire collation. The following table
        lists the base collations for the Unicode character sets.
      </p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td><span class="bold"><strong>Character Set</strong></span></td><td><span class="bold"><strong>Base Collation</strong></span></td></tr><tr><td><code class="literal">utf8</code></td><td><code class="literal">utf8_unicode_ci</code></td></tr><tr><td><code class="literal">ucs2</code></td><td><code class="literal">ucs2_unicode_ci</code></td></tr></tbody></table></div><p>
        The following brief summary describes the LDML characteristics
        required for understanding the procedure for adding a collation
        given later in this section:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            LDML has reset rules and shift rules.
          </p></li><li><p>
            Characters named in these rules can be written in
            <code class="literal">\u<em class="replaceable"><code>nnnn</code></em></code> format,
            where <em class="replaceable"><code>nnnn</code></em> is the hexadecimal
            Unicode code point value. Basic Latin letters
            <code class="literal">A-Z</code> and <code class="literal">a-z</code> can also
            be written literally (this is a MySQL limitation; the LDML
            specification allows literal non-Latin1 characters in the
            rules). Only characters in the Basic Multilingual Plane can
            be specified. This notation does not apply to characters
            outside the BMP range of <code class="literal">0000</code> to
            <code class="literal">FFFF</code>.
          </p></li><li><p>
            A reset rule does not specify any ordering in and of itself.
            Instead, it “<span class="quote">resets</span>” the ordering for
            subsequent shift rules to cause them to be taken in relation
            to a given character. Either of the following rules resets
            subsequent shift rules to be taken in relation to the letter
            <code class="literal">'A'</code>:
          </p><pre class="programlisting">&lt;reset&gt;A&lt;/reset&gt;

&lt;reset&gt;\u0041&lt;/reset&gt;
</pre></li><li><p>
            Shift rules define primary, secondary, and tertiary
            differences of a character from another character. They are
            specified using <code class="literal">&lt;p&gt;</code>,
            <code class="literal">&lt;s&gt;</code>, and
            <code class="literal">&lt;t&gt;</code> elements. Either of the
            following rules specifies a primary shift rule for the
            <code class="literal">'G'</code> character:
          </p><pre class="programlisting">&lt;p&gt;G&lt;/p&gt;

&lt;p&gt;\u0047&lt;/p&gt;
</pre><div class="itemizedlist"><ul type="circle"><li><p>
                Use primary differences to distinguish separate letters.
              </p></li><li><p>
                Use secondary differences to distinguish accent
                variations.
              </p></li><li><p>
                Use tertiary differences to distinguish lettercase
                variations.
              </p></li></ul></div></li></ul></div><p>
        To add a UCA collation for a Unicode character set without
        recompiling MySQL, use the following procedure. The example adds
        a collation named <code class="literal">utf8_phone_ci</code> to the
        <code class="literal">utf8</code> character set. The collation is designed
        for a scenario involving a Web application for which users post
        their names and phone numbers. Phone numbers can be given in
        very different formats:
      </p><pre class="programlisting">+7-12345-67
+7-12-345-67
+7 12 345 67
+7 (12) 345 67
+71234567
</pre><p>
        The problem raised by dealing with these kinds of values is that
        the varying allowable formats make searching for a specific
        phone number very difficult. The solution is to define a new
        collation that reorders punctuation characters, making them
        ignorable.
      </p><div class="orderedlist"><ol type="1"><li><p>
            Choose a collation ID, as shown in
            <a href="internationalization-localization.html#adding-collation-choosing-id" title="9.5.2. Choosing a Collation ID">Section 9.5.2, “Choosing a Collation ID”</a>. The
            following steps use an ID of 252.
          </p></li><li><p>
            You will need to modify the <code class="literal">Index.xml</code>
            configuration file. This file will be located in the
            directory named by the
            <a href="server-administration.html#sysvar_character_sets_dir"><code class="literal">character_sets_dir</code></a> system
            variable. You can check the variable value as follows,
            although the path name might be different on your system:
          </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW VARIABLES LIKE 'character_sets_dir';</code></strong>
+--------------------+-----------------------------------------+
| Variable_name      | Value                                   |
+--------------------+-----------------------------------------+
| character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
+--------------------+-----------------------------------------+
</pre></li><li><p>
            Choose a name for the collation and list it in the
            <code class="filename">Index.xml</code> file. In addition, you'll
            need to provide the collation ordering rules. Find the
            <code class="literal">&lt;charset&gt;</code> element for the character
            set to which the collation is being added, and add a
            <code class="literal">&lt;collation&gt;</code> element that indicates
            the collation name and ID. Within the
            <code class="literal">&lt;collation&gt;</code> element, provide a
            <code class="literal">&lt;rules&gt;</code> element containing the
            ordering rules:
          </p><pre class="programlisting">&lt;charset name="utf8"&gt;
  ...
  &lt;!-- associate collation name with its ID --&gt;
  &lt;collation name="utf8_phone_ci" id="252"&gt;
    &lt;rules&gt;
      &lt;reset&gt;\u0000&lt;/reset&gt;
        &lt;s&gt;\u0020&lt;/s&gt; &lt;!-- space --&gt;
        &lt;s&gt;\u0028&lt;/s&gt; &lt;!-- left parenthesis --&gt;
        &lt;s&gt;\u0029&lt;/s&gt; &lt;!-- right parenthesis --&gt;
        &lt;s&gt;\u002B&lt;/s&gt; &lt;!-- plus --&gt;
        &lt;s&gt;\u002D&lt;/s&gt; &lt;!-- hyphen --&gt;
    &lt;/rules&gt;
  &lt;/collation&gt;
  ...
&lt;/charset&gt;
</pre></li><li><p>
            If you want a similar collation for other Unicode character
            sets, add other <code class="literal">&lt;collation&gt;</code>
            elements. For example, to define
            <code class="literal">ucs2_phone_ci</code>, add a
            <code class="literal">&lt;collation&gt;</code> element to the
            <code class="literal">&lt;charset name="ucs2"&gt;</code> element.
            Remember that each collation must have its own unique ID.
          </p></li><li><p>
            Restart the server and use this statement to verify that the
            collation is present:
          </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SHOW COLLATION LIKE 'utf8_phone_ci';</code></strong>
+---------------+---------+-----+---------+----------+---------+
| Collation     | Charset | Id  | Default | Compiled | Sortlen |
+---------------+---------+-----+---------+----------+---------+
| utf8_phone_ci | utf8    | 252 |         |          |       8 |
+---------------+---------+-----+---------+----------+---------+
</pre></li></ol></div><p>
        Now we can test the collation to make sure that it has the
        desired properties.
      </p><p>
        Create a table containing some sample phone numbers using the
        new collation:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>CREATE TABLE phonebook (</code></strong>
    -&gt; <strong class="userinput"><code>  name VARCHAR(64),</code></strong>
    -&gt; <strong class="userinput"><code>  phone VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_phone_ci</code></strong>
    -&gt; <strong class="userinput"><code>);</code></strong>
Query OK, 0 rows affected (0.09 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO phonebook VALUES ('Svoj','+7 912 800 80 02');</code></strong>
Query OK, 1 row affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO phonebook VALUES ('Hf','+7 (912) 800 80 04');</code></strong>
Query OK, 1 row affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO phonebook VALUES ('Bar','+7-912-800-80-01');</code></strong>
Query OK, 1 row affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO phonebook VALUES ('Ramil','(7912) 800 80 03');</code></strong>
Query OK, 1 row affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO phonebook VALUES ('Sanja','+380 (912) 8008005');</code></strong>
Query OK, 1 row affected (0.00 sec)
</pre><p>
        Run some queries to see whether the ignored punctuation
        characters are in fact ignored for sorting and comparisons:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT * FROM phonebook ORDER BY phone;</code></strong>
+-------+--------------------+
| name  | phone              |
+-------+--------------------+
| Sanja | +380 (912) 8008005 |
| Bar   | +7-912-800-80-01   |
| Svoj  | +7 912 800 80 02   |
| Ramil | (7912) 800 80 03   |
| Hf    | +7 (912) 800 80 04 |
+-------+--------------------+
5 rows in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT * FROM phonebook WHERE phone='+7(912)800-80-01';</code></strong>
+------+------------------+
| name | phone            |
+------+------------------+
| Bar  | +7-912-800-80-01 |
+------+------------------+
1 row in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT * FROM phonebook WHERE phone='79128008001';</code></strong>
+------+------------------+
| name | phone            |
+------+------------------+
| Bar  | +7-912-800-80-01 |
+------+------------------+
1 row in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT * FROM phonebook WHERE phone='7 9 1 2 8 0 0 8 0 0 1';</code></strong>
+------+------------------+
| name | phone            |
+------+------------------+
| Bar  | +7-912-800-80-01 |
+------+------------------+
1 row in set (0.00 sec)
</pre></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="problems-with-character-sets"></a>9.6. Problems With Character Sets</h2></div></div></div><p>
      If you try to use a character set that is not compiled into your
      binary, you might run into the following problems:
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          Your program uses an incorrect path to determine where the
          character sets are stored (which is typically the
          <code class="filename">share/mysql/charsets</code> or
          <code class="filename">share/charsets</code> directory under the MySQL
          installation directory). This can be fixed by using the
          <code class="option">--character-sets-dir</code> option when you run the
          program in question. For example, to specify a directory to be
          used by MySQL client programs, list it in the
          <code class="literal">[client]</code> group of your option file. The
          examples given here show what the setting might look like for
          Unix or Windows, respectively:
        </p><pre class="programlisting">[client]
character-sets-dir=/usr/local/mysql/share/mysql/charsets

[client]
character-sets-dir="C:/Program Files/MySQL/MySQL Server 5.1/share/charsets"
</pre></li><li><p>
          The character set is a complex character set that cannot be
          loaded dynamically. In this case, you must recompile the
          program with support for the character set.
        </p><p>
          For Unicode character sets, you can define collations without
          recompiling by using LDML notation. See
          <a href="internationalization-localization.html#adding-collation-unicode-uca" title="9.5.4. Adding a UCA Collation to a Unicode Character Set">Section 9.5.4, “Adding a UCA Collation to a Unicode Character Set”</a>.
        </p></li><li><p>
          The character set is a dynamic character set, but you do not
          have a configuration file for it. In this case, you should
          install the configuration file for the character set from a
          new MySQL distribution.
        </p></li><li><p>
          If your character set index file does not contain the name for
          the character set, your program displays an error message. The
          file is named <code class="filename">Index.xml</code> and the message
          is:
        </p><pre class="programlisting">Character set '<em class="replaceable"><code>charset_name</code></em>' is not a compiled character set and is not
specified in the '/usr/share/mysql/charsets/Index.xml' file
</pre><p>
          To solve this problem, you should either get a new index file
          or manually add the name of any missing character sets to the
          current file.
        </p></li></ul></div><p>
      For <code class="literal">MyISAM</code> tables, you can check the character
      set name and number for a table with <a href="programs.html#myisamchk" title="4.6.3. myisamchk — MyISAM Table-Maintenance Utility"><span><strong class="command">myisamchk -dvv
      <em class="replaceable"><code>tbl_name</code></em></strong></span></a>.
    </p></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="time-zone-support"></a>9.7. MySQL Server Time Zone Support</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="internationalization-localization.html#time-zone-upgrades">9.7.1. Staying Current with Time Zone Changes</a></span></dt><dt><span class="section"><a href="internationalization-localization.html#time-zone-leap-seconds">9.7.2. Time Zone Leap Second Support</a></span></dt></dl></div><a class="indexterm" name="id4324579"></a><p>
      The MySQL server maintains several time zone settings:
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          The system time zone. When the server starts, it attempts to
          determine the time zone of the host machine and uses it to set
          the <a href="server-administration.html#sysvar_system_time_zone"><code class="literal">system_time_zone</code></a> system
          variable. The value does not change thereafter.
        </p><p>
          You can set the system time zone for MySQL Server at startup
          with the
          <a href="programs.html#option_mysqld_safe_timezone"><code class="option">--timezone=<em class="replaceable"><code>timezone_name</code></em></code></a>
          option to <a href="programs.html#mysqld-safe" title="4.3.2. mysqld_safe — MySQL Server Startup Script"><span><strong class="command">mysqld_safe</strong></span></a>. You can also set it
          by setting the <code class="literal">TZ</code> environment variable
          before you start <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a>. The allowable
          values for <a href="programs.html#option_mysqld_safe_timezone"><code class="option">--timezone</code></a> or
          <code class="literal">TZ</code> are system-dependent. Consult your
          operating system documentation to see what values are
          acceptable.
        </p></li><li><p>
          The server's current time zone. The global
          <a href="server-administration.html#sysvar_time_zone"><code class="literal">time_zone</code></a> system variable
          indicates the time zone the server currently is operating in.
          The initial value for
          <a href="server-administration.html#sysvar_time_zone"><code class="literal">time_zone</code></a> is
          <code class="literal">'SYSTEM'</code>, which indicates that the server
          time zone is the same as the system time zone.
        </p><p>
          The initial global server time zone value can be specified
          explicitly at startup with the
          <a href="server-administration.html#option_mysqld_default-time-zone"><code class="option">--default-time-zone=<em class="replaceable"><code>timezone</code></em></code></a>
          option on the command line, or you can use the following line
          in an option file:
        </p><pre class="programlisting">default-time-zone='<em class="replaceable"><code>timezone</code></em>'
</pre><p>
          If you have the <a href="server-administration.html#priv_super"><code class="literal">SUPER</code></a>
          privilege, you can set the global server time zone value at
          runtime with this statement:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET GLOBAL time_zone = <em class="replaceable"><code>timezone</code></em>;</code></strong>
</pre></li><li><p>
          Per-connection time zones. Each client that connects has its
          own time zone setting, given by the session
          <a href="server-administration.html#sysvar_time_zone"><code class="literal">time_zone</code></a> variable.
          Initially, the session variable takes its value from the
          global <a href="server-administration.html#sysvar_time_zone"><code class="literal">time_zone</code></a> variable,
          but the client can change its own time zone with this
          statement:
        </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET time_zone = <em class="replaceable"><code>timezone</code></em>;</code></strong>
</pre></li></ul></div><p>
      The current session time zone setting affects display and storage
      of time values that are zone-sensitive. This includes the values
      displayed by functions such as
      <a href="functions.html#function_now"><code class="literal">NOW()</code></a> or
      <a href="functions.html#function_curtime"><code class="literal">CURTIME()</code></a>, and values stored in and
      retrieved from <a href="data-types.html#datetime" title="10.3.1. The DATETIME,
        DATE, and
        TIMESTAMP Types"><code class="literal">TIMESTAMP</code></a> columns.
      Values for <a href="data-types.html#datetime" title="10.3.1. The DATETIME,
        DATE, and
        TIMESTAMP Types"><code class="literal">TIMESTAMP</code></a> columns are
      converted from the current time zone to UTC for storage, and from
      UTC to the current time zone for retrieval.
    </p><p>
      The current time zone setting does not affect values displayed by
      functions such as <a href="functions.html#function_utc-timestamp"><code class="literal">UTC_TIMESTAMP()</code></a>
      or values in <a href="data-types.html#datetime" title="10.3.1. The DATETIME,
        DATE, and
        TIMESTAMP Types"><code class="literal">DATE</code></a>,
      <a href="data-types.html#time" title="10.3.2. The TIME Type"><code class="literal">TIME</code></a>, or
      <a href="data-types.html#datetime" title="10.3.1. The DATETIME,
        DATE, and
        TIMESTAMP Types"><code class="literal">DATETIME</code></a> columns. Nor are values in
      those data types stored in UTC; the time zone applies for them
      only when converting from <code class="literal">TIMESTAMP</code> values. If
      you want locale-specific arithmetic for
      <a href="data-types.html#datetime" title="10.3.1. The DATETIME,
        DATE, and
        TIMESTAMP Types"><code class="literal">DATE</code></a>,
      <a href="data-types.html#time" title="10.3.2. The TIME Type"><code class="literal">TIME</code></a>, or
      <a href="data-types.html#datetime" title="10.3.1. The DATETIME,
        DATE, and
        TIMESTAMP Types"><code class="literal">DATETIME</code></a> values, convert them to
      UTC, perform the arithmetic, and then convert back.
    </p><p>
      The current values of the global and client-specific time zones
      can be retrieved like this:
    </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT @@global.time_zone, @@session.time_zone;</code></strong>
</pre><p>
      <em class="replaceable"><code>timezone</code></em> values can be given in several
      formats, none of which are case sensitive:
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          The value <code class="literal">'SYSTEM'</code> indicates that the time
          zone should be the same as the system time zone.
        </p></li><li><p>
          The value can be given as a string indicating an offset from
          UTC, such as <code class="literal">'+10:00'</code> or
          <code class="literal">'-6:00'</code>.
        </p></li><li><p>
          The value can be given as a named time zone, such as
          <code class="literal">'Europe/Helsinki'</code>,
          <code class="literal">'US/Eastern'</code>, or <code class="literal">'MET'</code>.
          Named time zones can be used only if the time zone information
          tables in the <code class="literal">mysql</code> database have been
          created and populated.
        </p></li></ul></div><p>
      The MySQL installation procedure creates the time zone tables in
      the <code class="literal">mysql</code> database, but does not load them. You
      must do so manually using the following instructions. (If you are
      upgrading to MySQL 4.1.3 or later from an earlier version, you can
      create the tables by upgrading your <code class="literal">mysql</code>
      database. Use the instructions in <a href="programs.html#mysql-upgrade" title="4.4.8. mysql_upgrade — Check Tables for MySQL Upgrade">Section 4.4.8, “<span><strong class="command">mysql_upgrade</strong></span> — Check Tables for MySQL Upgrade”</a>.
      After creating the tables, you can load them.)
    </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>
        Loading the time zone information is not necessarily a one-time
        operation because the information changes occasionally. For
        example, the rules for Daylight Saving Time in the United
        States, Mexico, and parts of Canada changed in 2007. When such
        changes occur, applications that use the old rules become out of
        date and you may find it necessary to reload the time zone
        tables to keep the information used by your MySQL server
        current. See the notes at the end of this section.
      </p></div><p>
      If your system has its own <em class="firstterm">zoneinfo</em>
      database (the set of files describing time zones), you should use
      the <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a> program for filling the
      time zone tables. Examples of such systems are Linux, FreeBSD, Sun
      Solaris, and Mac OS X. One likely location for these files is the
      <code class="filename">/usr/share/zoneinfo</code> directory. If your system
      does not have a zoneinfo database, you can use the downloadable
      package described later in this section.
    </p><p>
      The <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a> program is used to load
      the time zone tables. On the command line, pass the zoneinfo
      directory path name to <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a> and
      send the output into the <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> program. For
      example:
    </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql</code></strong>
</pre><p>
      <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a> reads your system's time
      zone files and generates SQL statements from them.
      <a href="programs.html#mysql" title="4.5.1. mysql — The MySQL Command-Line Tool"><span><strong class="command">mysql</strong></span></a> processes those statements to load the
      time zone tables.
    </p><p>
      <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a> also can be used to load a
      single time zone file or to generate leap second information:
    </p><div class="itemizedlist"><ul type="disc"><li><p>
          To load a single time zone file
          <em class="replaceable"><code>tz_file</code></em> that corresponds to a time
          zone name <em class="replaceable"><code>tz_name</code></em>, invoke
          <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a> like this:
        </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>mysql_tzinfo_to_sql <em class="replaceable"><code>tz_file</code></em> <em class="replaceable"><code>tz_name</code></em> | mysql -u root mysql</code></strong>
</pre><p>
          With this approach, you must execute a separate command to
          load the time zone file for each named zone that the server
          needs to know about.
        </p></li><li><p>
          If your time zone needs to account for leap seconds,
          initialize the leap second information like this, where
          <em class="replaceable"><code>tz_file</code></em> is the name of your time
          zone file:
        </p><pre class="programlisting">shell&gt; <strong class="userinput"><code>mysql_tzinfo_to_sql --leap <em class="replaceable"><code>tz_file</code></em> | mysql -u root mysql</code></strong>
</pre></li><li><p>
          After running <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a>, it is
          best to restart the server so that it does not continue to use
          any previously cached time zone data.
        </p></li></ul></div><p>
      If your system is one that has no zoneinfo database (for example,
      Windows or HP-UX), you can use the package of pre-built time zone
      tables that is available for download at the MySQL Developer Zone:
    </p><pre class="programlisting"><a href="http://dev.mysql.com/downloads/timezones.html" target="_top">http://dev.mysql.com/downloads/timezones.html</a>
</pre><p>
      This time zone package contains <code class="filename">.frm</code>,
      <code class="filename">.MYD</code>, and <code class="filename">.MYI</code> files for
      the <code class="literal">MyISAM</code> time zone tables. These tables
      should be part of the <code class="literal">mysql</code> database, so you
      should place the files in the <code class="filename">mysql</code>
      subdirectory of your MySQL server's data directory. The server
      should be stopped while you do this and restarted afterward.
    </p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Warning</h3><p>
        Do not use the downloadable package if your system has a
        zoneinfo database. Use the
        <a href="programs.html#mysql-tzinfo-to-sql" title="4.4.7. mysql_tzinfo_to_sql — Load the Time Zone Tables"><span><strong class="command">mysql_tzinfo_to_sql</strong></span></a> utility instead.
        Otherwise, you may cause a difference in datetime handling
        between MySQL and other applications on your system.
      </p></div><p>
      For information about time zone settings in replication setup,
      please see <a href="replication.html#replication-features" title="16.3.1. Replication Features and Issues">Section 16.3.1, “Replication Features and Issues”</a>.
    </p><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="time-zone-upgrades"></a>9.7.1. Staying Current with Time Zone Changes</h3></div></div></div><a class="indexterm" name="id4325270"></a><p>
        As mentioned earlier, when the time zone rules change,
        applications that use the old rules become out of date. To stay
        current, it is necessary to make sure that your system uses
        current time zone information is used. For MySQL, there are two
        factors to consider in staying current:
      </p><div class="itemizedlist"><ul type="disc"><li><p>
            The operating system time affects the value that the MySQL
            server uses for times if its time zone is set to
            <code class="literal">SYSTEM</code>. Make sure that your operating
            system is using the latest time zone information. For most
            operating systems, the latest update or service pack
            prepares your system for the time changes. Check the Web
            site for your operating system vendor for an update that
            addresses the time changes.
          </p></li><li><p>
            If you replace the system's
            <code class="filename">/etc/localtime</code> timezone file with a
            version that uses rules differing from those in effect at
            <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> startup, you should restart
            <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> so that it uses the updated rules.
            Otherwise, <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> might not notice when
            the system changes its time.
          </p></li><li><p>
            If you use named time zones with MySQL, make sure that the
            time zone tables in the <code class="literal">mysql</code> database
            are up to date. If your system has its own zoneinfo
            database, you should reload the MySQL time zone tables
            whenever the zoneinfo database is updated, using the
            instructions given earlier in this section. For systems that
            do not have their own zoneinfo database, check the MySQL
            Developer Zone for updates. When a new update is available,
            download it and use it to replace your current time zone
            tables. <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> caches time zone
            information that it looks up, so after replacing the time
            zone tables, you should restart <a href="programs.html#mysqld" title="4.3.1. mysqld — The MySQL Server"><span><strong class="command">mysqld</strong></span></a> to
            make sure that it does not continue to serve outdated time
            zone data.
          </p></li></ul></div><p>
        If you are uncertain whether named time zones are available, for
        use either as the server's time zone setting or by clients that
        set their own time zone, check whether your time zone tables are
        empty. The following query determines whether the table that
        contains time zone names has any rows:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT COUNT(*) FROM mysql.time_zone_name;</code></strong>
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
</pre><p>
        A count of zero indicates that the table is empty. In this case,
        no one can be using named time zones, and you don't need to
        update the tables. A count greater than zero indicates that the
        table is not empty and that its contents are available to be
        used for named time zone support. In this case, you should be
        sure to reload your time zone tables so that anyone who uses
        named time zones will get correct query results.
      </p><p>
        To check whether your MySQL installation is updated properly for
        a change in Daylight Saving Time rules, use a test like the one
        following. The example uses values that are appropriate for the
        2007 DST 1-hour change that occurs in the United States on March
        11 at 2 a.m.
      </p><p>
        The test uses these two queries:
      </p><pre class="programlisting">SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');
SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');
</pre><p>
        The two time values indicate the times at which the DST change
        occurs, and the use of named time zones requires that the time
        zone tables be used. The desired result is that both queries
        return the same result (the input time, converted to the
        equivalent value in the 'US/Central' time zone).
      </p><p>
        Before updating the time zone tables, you would see an incorrect
        result like this:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');</code></strong>
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 01:00:00                                        |
+------------------------------------------------------------+

mysql&gt; <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');</code></strong>
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 02:00:00                                        |
+------------------------------------------------------------+
</pre><p>
        After updating the tables, you should see the correct result:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central');</code></strong>
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 01:00:00                                        |
+------------------------------------------------------------+

mysql&gt; <strong class="userinput"><code>SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');</code></strong>
+------------------------------------------------------------+
| CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') |
+------------------------------------------------------------+
| 2007-03-11 01:00:00                                        |
+------------------------------------------------------------+
</pre></div><div class="section" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="time-zone-leap-seconds"></a>9.7.2. Time Zone Leap Second Support</h3></div></div></div><a class="indexterm" name="id4325496"></a><p>
        Before MySQL 5.1.31, if the operating system is configured to
        return leap seconds from OS time calls or if the MySQL server
        uses a time zone definition that has leap seconds, functions
        such as <a href="functions.html#function_now"><code class="literal">NOW()</code></a> could return a
        value having a time part that ends with
        <code class="literal">:59:60</code> or <code class="literal">:59:61</code>. If such
        values are inserted into a table, they would be dumped as is by
        <a href="programs.html#mysqldump" title="4.5.4. mysqldump — A Database Backup Program"><span><strong class="command">mysqldump</strong></span></a> but considered invalid when
        reloaded, leading to backup/restore problems.
      </p><p>
        As of MySQL 5.1.31, leap second values are returned with a time
        part that ends with <code class="literal">:59:59</code>. This means that a
        function such as <a href="functions.html#function_now"><code class="literal">NOW()</code></a> can return
        the same value for two or three consecutive seconds during the
        leap second. It remains true that literal temporal values having
        a time part that ends with <code class="literal">:59:60</code> or
        <code class="literal">:59:61</code> are considered invalid.
      </p><p>
        If it is necessary to search for
        <a href="data-types.html#datetime" title="10.3.1. The DATETIME,
        DATE, and
        TIMESTAMP Types"><code class="literal">TIMESTAMP</code></a> values one second
        before the leap second, anomalous results may be obtained if you
        use a comparison with <code class="literal">'YYYY-MM-DD hh:mm:ss'</code>
        values:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>CREATE TABLE t1 (a INT, ts TIMESTAMP DEFAULT NOW(), PRIMARY KEY (ts));</code></strong>
Query OK, 0 rows affected (0.11 sec)

mysql&gt; <strong class="userinput"><code># Simulate NOW() = '2009-01-01 02:59:59'</code></strong>
mysql&gt; <strong class="userinput"><code>SET timestamp = 1230768022;</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO t1 (a) VALUES (1);</code></strong>
Query OK, 1 row affected (0.07 sec)

mysql&gt; <strong class="userinput"><code># Simulate NOW() = '2009-01-01 02:59:60'</code></strong>
mysql&gt; <strong class="userinput"><code>SET timestamp = 1230768023;</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>INSERT INTO t1 (a) VALUES (2);</code></strong>
Query OK, 1 row affected (0.02 sec)

mysql&gt; <strong class="userinput"><code>SELECT * FROM t1;</code></strong>
+------+---------------------+
| a    | ts                  |
+------+---------------------+
|    1 | 2008-12-31 18:00:22 |
|    2 | 2008-12-31 18:00:23 |
+------+---------------------+
2 rows in set (0.02 sec)

mysql&gt; <strong class="userinput"><code>SELECT * FROM t1 WHERE ts = '2009-01-01 02:59:59';</code></strong>
Empty set (0.03 sec)
</pre><p>
        To work around this, you can use a comparison based on the UTC
        value actually stored in column, which has the leap second
        correction applied:
      </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SELECT * FROM t1 WHERE UNIX_TIMESTAMP(ts) = 1230768023;</code></strong>
+------+---------------------+
| a    | ts                  |
+------+---------------------+
|    2 | 2008-12-31 18:00:23 |
+------+---------------------+
1 row in set (0.02 sec)
</pre></div></div><div class="section" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="locale-support"></a>9.8. MySQL Server Locale Support</h2></div></div></div><p>
      Beginning with MySQL 5.1.12, the locale indicated by the
      <a href="server-administration.html#sysvar_lc_time_names"><code class="literal">lc_time_names</code></a> system variable
      controls the language used to display day and month names and
      abbreviations. This variable affects the output from the
      <a href="functions.html#function_date-format"><code class="literal">DATE_FORMAT()</code></a>,
      <a href="functions.html#function_dayname"><code class="literal">DAYNAME()</code></a> and
      <a href="functions.html#function_monthname"><code class="literal">MONTHNAME()</code></a> functions.
    </p><p>
      Locale names have language and region subtags listed by IANA
      (<a href="http://www.iana.org/assignments/language-subtag-registry" target="_top">http://www.iana.org/assignments/language-subtag-registry</a>)
      such as <code class="literal">'ja_JP'</code> or <code class="literal">'pt_BR'</code>.
      The default value is <code class="literal">'en_US'</code> regardless of your
      system's locale setting, but you can set the value at server
      startup or set the <code class="literal">GLOBAL</code> value if you have the
      <a href="server-administration.html#priv_super"><code class="literal">SUPER</code></a> privilege. Any client can
      examine the value of
      <a href="server-administration.html#sysvar_lc_time_names"><code class="literal">lc_time_names</code></a> or set its
      <code class="literal">SESSION</code> value to affect the locale for its own
      connection.
    </p><pre class="programlisting">mysql&gt; <strong class="userinput"><code>SET NAMES 'utf8';</code></strong>
Query OK, 0 rows affected (0.09 sec)

mysql&gt; <strong class="userinput"><code>SELECT @@lc_time_names;</code></strong>
+-----------------+
| @@lc_time_names |
+-----------------+
| en_US           |
+-----------------+
1 row in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');</code></strong>
+-----------------------+-------------------------+
| DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') |
+-----------------------+-------------------------+
| Friday                | January                 |
+-----------------------+-------------------------+
1 row in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');</code></strong>
+-----------------------------------------+
| DATE_FORMAT('2010-01-01','%W %a %M %b') |
+-----------------------------------------+
| Friday Fri January Jan                  |
+-----------------------------------------+
1 row in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SET lc_time_names = 'es_MX';</code></strong>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT @@lc_time_names;</code></strong>
+-----------------+
| @@lc_time_names |
+-----------------+
| es_MX           |
+-----------------+
1 row in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');</code></strong>
+-----------------------+-------------------------+
| DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') |
+-----------------------+-------------------------+
| viernes               | enero                   |
+-----------------------+-------------------------+
1 row in set (0.00 sec)

mysql&gt; <strong class="userinput"><code>SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');</code></strong>
+-----------------------------------------+
| DATE_FORMAT('2010-01-01','%W %a %M %b') |
+-----------------------------------------+
| viernes vie enero ene                   |
+-----------------------------------------+
1 row in set (0.00 sec)
</pre><p>
      The day or month name for each of the affected functions is
      converted from <code class="literal">utf8</code> to the character set
      indicated by the
      <a href="server-administration.html#sysvar_character_set_connection"><code class="literal">character_set_connection</code></a> system
      variable.
    </p><p>
      <a href="server-administration.html#sysvar_lc_time_names"><code class="literal">lc_time_names</code></a> may be set to any
      of the following locale values.
    </p><div class="informaltable"><table border="1"><colgroup><col><col></colgroup><tbody><tr><td><code class="literal">ar_AE</code>: Arabic - United Arab Emirates</td><td><code class="literal">ar_BH</code>: Arabic - Bahrain</td></tr><tr><td><code class="literal">ar_DZ</code>: Arabic - Algeria</td><td><code class="literal">ar_EG</code>: Arabic - Egypt</td></tr><tr><td><code class="literal">ar_IN</code>: Arabic - India</td><td><code class="literal">ar_IQ</code>: Arabic - Iraq</td></tr><tr><td><code class="literal">ar_JO</code>: Arabic - Jordan</td><td><code class="literal">ar_KW</code>: Arabic - Kuwait</td></tr><tr><td><code class="literal">ar_LB</code>: Arabic - Lebanon</td><td><code class="literal">ar_LY</code>: Arabic - Libya</td></tr><tr><td><code class="literal">ar_MA</code>: Arabic - Morocco</td><td><code class="literal">ar_OM</code>: Arabic - Oman</td></tr><tr><td><code class="literal">ar_QA</code>: Arabic - Qatar</td><td><code class="literal">ar_SA</code>: Arabic - Saudi Arabia</td></tr><tr><td><code class="literal">ar_SD</code>: Arabic - Sudan</td><td><code class="literal">ar_SY</code>: Arabic - Syria</td></tr><tr><td><code class="literal">ar_TN</code>: Arabic - Tunisia</td><td><code class="literal">ar_YE</code>: Arabic - Yemen</td></tr><tr><td><code class="literal">be_BY</code>: Belarusian - Belarus</td><td><code class="literal">bg_BG</code>: Bulgarian - Bulgaria</td></tr><tr><td><code class="literal">ca_ES</code>: Catalan - Spain</td><td><code class="literal">cs_CZ</code>: Czech - Czech Republic</td></tr><tr><td><code class="literal">da_DK</code>: Danish - Denmark</td><td><code class="literal">de_AT</code>: German - Austria</td></tr><tr><td><code class="literal">de_BE</code>: German - Belgium</td><td><code class="literal">de_CH</code>: German - Switzerland</td></tr><tr><td><code class="literal">de_DE</code>: German - Germany</td><td><code class="literal">de_LU</code>: German - Luxembourg</td></tr><tr><td><code class="literal">EE</code>: Estonian - Estonia</td><td><code class="literal">en_AU</code>: English - Australia</td></tr><tr><td><code class="literal">en_CA</code>: English - Canada</td><td><code class="literal">en_GB</code>: English - United Kingdom</td></tr><tr><td><code class="literal">en_IN</code>: English - India</td><td><code class="literal">en_NZ</code>: English - New Zealand</td></tr><tr><td><code class="literal">en_PH</code>: English - Philippines</td><td><code class="literal">en_US</code>: English - United States</td></tr><tr><td><code class="literal">en_ZA</code>: English - South Africa</td><td><code class="literal">en_ZW</code>: English - Zimbabwe</td></tr><tr><td><code class="literal">es_AR</code>: Spanish - Argentina</td><td><code class="literal">es_BO</code>: Spanish - Bolivia</td></tr><tr><td><code class="literal">es_CL</code>: Spanish - Chile</td><td><code class="literal">es_CO</code>: Spanish - Columbia</td></tr><tr><td><code class="literal">es_CR</code>: Spanish - Costa Rica</td><td><code class="literal">es_DO</code>: Spanish - Dominican Republic</td></tr><tr><td><code class="literal">es_EC</code>: Spanish - Ecuador</td><td><code class="literal">es_ES</code>: Spanish - Spain</td></tr><tr><td><code class="literal">es_GT</code>: Spanish - Guatemala</td><td><code class="literal">es_HN</code>: Spanish - Honduras</td></tr><tr><td><code class="literal">es_MX</code>: Spanish - Mexico</td><td><code class="literal">es_NI</code>: Spanish - Nicaragua</td></tr><tr><td><code class="literal">es_PA</code>: Spanish - Panama</td><td><code class="literal">es_PE</code>: Spanish - Peru</td></tr><tr><td><code class="literal">es_PR</code>: Spanish - Puerto Rico</td><td><code class="literal">es_PY</code>: Spanish - Paraguay</td></tr><tr><td><code class="literal">es_SV</code>: Spanish - El Salvador</td><td><code class="literal">es_US</code>: Spanish - United States</td></tr><tr><td><code class="literal">es_UY</code>: Spanish - Uruguay</td><td><code class="literal">es_VE</code>: Spanish - Venezuela</td></tr><tr><td><code class="literal">eu_ES</code>: Basque - Basque</td><td><code class="literal">fi_FI</code>: Finnish - Finland</td></tr><tr><td><code class="literal">fo_FO</code>: Faroese - Faroe Islands</td><td><code class="literal">fr_BE</code>: French - Belgium</td></tr><tr><td><code class="literal">fr_CA</code>: French - Canada</td><td><code class="literal">fr_CH</code>: French - Switzerland</td></tr><tr><td><code class="literal">fr_FR</code>: French - France</td><td><code class="literal">fr_LU</code>: French - Luxembourg</td></tr><tr><td><code class="literal">gl_ES</code>: Galician - Spain</td><td><code class="literal">gu_IN</code>: Gujarati - India</td></tr><tr><td><code class="literal">he_IL</code>: Hebrew - Israel</td><td><code class="literal">hi_IN</code>: Hindi - India</td></tr><tr><td><code class="literal">hr_HR</code>: Croatian - Croatia</td><td><code class="literal">hu_HU</code>: Hungarian - Hungary</td></tr><tr><td><code class="literal">id_ID</code>: Indonesian - Indonesia</td><td><code class="literal">is_IS</code>: Icelandic - Iceland</td></tr><tr><td><code class="literal">it_CH</code>: Italian - Switzerland</td><td><code class="literal">it_IT</code>: Italian - Italy</td></tr><tr><td><code class="literal">ja_JP</code>: Japanese - Japan</td><td><code class="literal">ko_KR</code>: Korean - Republic of Korea</td></tr><tr><td><code class="literal">lt_LT</code>: Lithuanian - Lithuania</td><td><code class="literal">lv_LV</code>: Latvian - Latvia</td></tr><tr><td><code class="literal">mk_MK</code>: Macedonian - FYROM</td><td><code class="literal">mn_MN</code>: Mongolia - Mongolian</td></tr><tr><td><code class="literal">ms_MY</code>: Malay - Malaysia</td><td><code class="literal">nb_NO</code>: Norwegian(Bokmål) - Norway</td></tr><tr><td><code class="literal">nl_BE</code>: Dutch - Belgium</td><td><code class="literal">nl_NL</code>: Dutch - The Netherlands</td></tr><tr><td><code class="literal">no_NO</code>: Norwegian - Norway</td><td><code class="literal">pl_PL</code>: Polish - Poland</td></tr><tr><td><code class="literal">pt_BR</code>: Portugese - Brazil</td><td><code class="literal">pt_PT</code>: Portugese - Portugal</td></tr><tr><td><code class="literal">ro_RO</code>: Romanian - Romania</td><td><code class="literal">ru_RU</code>: Russian - Russia</td></tr><tr><td><code class="literal">ru_UA</code>: Russian - Ukraine</td><td><code class="literal">sk_SK</code>: Slovak - Slovakia</td></tr><tr><td><code class="literal">sl_SI</code>: Slovenian - Slovenia</td><td><code class="literal">sq_AL</code>: Albanian - Albania</td></tr><tr><td><code class="literal">sr_YU</code>: Serbian - Yugoslavia</td><td><code class="literal">sv_FI</code>: Swedish - Finland</td></tr><tr><td><code class="literal">sv_SE</code>: Swedish - Sweden</td><td><code class="literal">ta_IN</code>: Tamil - India</td></tr><tr><td><code class="literal">te_IN</code>: Telugu - India</td><td><code class="literal">th_TH</code>: Thai - Thailand</td></tr><tr><td><code class="literal">tr_TR</code>: Turkish - Turkey</td><td><code class="literal">uk_UA</code>: Ukrainian - Ukraine</td></tr><tr><td><code class="literal">ur_PK</code>: Urdu - Pakistan</td><td><code class="literal">vi_VN</code>: Vietnamese - Viet Nam</td></tr><tr><td><code class="literal">zh_CN</code>: Chinese - China</td><td><code class="literal">zh_HK</code>: Chinese - Hong Kong</td></tr><tr><td><code class="literal">zh_TW</code>: Chinese - Taiwan Province of China</td><td> </td></tr></tbody></table></div><p>
      <a href="server-administration.html#sysvar_lc_time_names"><code class="literal">lc_time_names</code></a> currently does not
      affect the <a href="functions.html#function_str-to-date"><code class="literal">STR_TO_DATE()</code></a> or
      <a href="functions.html#function_get-format"><code class="literal">GET_FORMAT()</code></a> function.
    </p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="language-structure.html">Prev</a> </td><td width="20%" align="center"> </td><td width="40%" align="right"> <a accesskey="n" href="data-types.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 8. Language Structure </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 10. Data Types</td></tr></table></div></body></html>