<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML ><HEAD ><TITLE >XML Type</TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK REV="MADE" HREF="mailto:pgsql-docs@postgresql.org"><LINK REL="HOME" TITLE="PostgreSQL 8.3.11 Documentation" HREF="index.html"><LINK REL="UP" TITLE="Data Types" HREF="datatype.html"><LINK REL="PREVIOUS" TITLE="UUID Type" HREF="datatype-uuid.html"><LINK REL="NEXT" TITLE="Arrays" HREF="arrays.html"><LINK REL="STYLESHEET" TYPE="text/css" HREF="stylesheet.css"><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"><META NAME="creation" CONTENT="2010-05-14T03:43:41"></HEAD ><BODY CLASS="SECT1" ><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="5" ALIGN="center" VALIGN="bottom" >PostgreSQL 8.3.11 Documentation</TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A HREF="datatype-uuid.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A HREF="datatype.html" >Fast Backward</A ></TD ><TD WIDTH="60%" ALIGN="center" VALIGN="bottom" >Chapter 8. Data Types</TD ><TD WIDTH="10%" ALIGN="right" VALIGN="top" ><A HREF="datatype.html" >Fast Forward</A ></TD ><TD WIDTH="10%" ALIGN="right" VALIGN="top" ><A HREF="arrays.html" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="DATATYPE-XML" >8.13. <ACRONYM CLASS="ACRONYM" >XML</ACRONYM > Type</A ></H1 ><A NAME="AEN5941" ></A ><P > The data type <TT CLASS="TYPE" >xml</TT > can be used to store XML data. Its advantage over storing XML data in a <TT CLASS="TYPE" >text</TT > field is that it checks the input values for well-formedness, and there are support functions to perform type-safe operations on it; see <A HREF="functions-xml.html" >Section 9.14</A >. Use of this data type requires the installation to have been built with <TT CLASS="COMMAND" >configure --with-libxml</TT >. </P ><P > The <TT CLASS="TYPE" >xml</TT > type can store well-formed <SPAN CLASS="QUOTE" >"documents"</SPAN >, as defined by the XML standard, as well as <SPAN CLASS="QUOTE" >"content"</SPAN > fragments, which are defined by the production <TT CLASS="LITERAL" >XMLDecl? content</TT > in the XML standard. Roughly, this means that content fragments can have more than one top-level element or character node. The expression <TT CLASS="LITERAL" ><TT CLASS="REPLACEABLE" ><I >xmlvalue</I ></TT > IS DOCUMENT</TT > can be used to evaluate whether a particular <TT CLASS="TYPE" >xml</TT > value is a full document or only a content fragment. </P ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN5956" >8.13.1. Creating XML Values</A ></H2 ><P > To produce a value of type <TT CLASS="TYPE" >xml</TT > from character data, use the function <CODE CLASS="FUNCTION" >xmlparse</CODE >:<A NAME="AEN5961" ></A > </P><PRE CLASS="SYNOPSIS" >XMLPARSE ( { DOCUMENT | CONTENT } <TT CLASS="REPLACEABLE" ><I >value</I ></TT >)</PRE ><P> Examples: </P><PRE CLASS="PROGRAMLISTING" >XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>') XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')</PRE ><P> While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes: </P><PRE CLASS="PROGRAMLISTING" >xml '<foo>bar</foo>' '<foo>bar</foo>'::xml</PRE ><P> can also be used. </P ><P > The <TT CLASS="TYPE" >xml</TT > type does not validate its input values against a possibly included document type declaration (DTD).<A NAME="AEN5969" ></A > </P ><P > The inverse operation, producing character string type values from <TT CLASS="TYPE" >xml</TT >, uses the function <CODE CLASS="FUNCTION" >xmlserialize</CODE >:<A NAME="AEN5974" ></A > </P><PRE CLASS="SYNOPSIS" >XMLSERIALIZE ( { DOCUMENT | CONTENT } <TT CLASS="REPLACEABLE" ><I >value</I ></TT > AS <TT CLASS="REPLACEABLE" ><I >type</I ></TT > )</PRE ><P> <TT CLASS="REPLACEABLE" ><I >type</I ></TT > can be one of <TT CLASS="TYPE" >character</TT >, <TT CLASS="TYPE" >character varying</TT >, or <TT CLASS="TYPE" >text</TT > (or an alias name for those). Again, according to the SQL standard, this is the only way to convert between type <TT CLASS="TYPE" >xml</TT > and character types, but PostgreSQL also allows you to simply cast the value. </P ><P > When character string values are cast to or from type <TT CLASS="TYPE" >xml</TT > without going through <TT CLASS="TYPE" >XMLPARSE</TT > or <TT CLASS="TYPE" >XMLSERIALIZE</TT >, respectively, the choice of <TT CLASS="LITERAL" >DOCUMENT</TT > versus <TT CLASS="LITERAL" >CONTENT</TT > is determined by the <SPAN CLASS="QUOTE" >"XML option"</SPAN > <A NAME="AEN5991" ></A > session configuration parameter, which can be set using the standard command </P><PRE CLASS="SYNOPSIS" >SET XML OPTION { DOCUMENT | CONTENT };</PRE ><P> or the more PostgreSQL-like syntax </P><PRE CLASS="SYNOPSIS" >SET xmloption TO { DOCUMENT | CONTENT };</PRE ><P> The default is <TT CLASS="LITERAL" >CONTENT</TT >, so all forms of XML data are allowed. </P ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN5996" >8.13.2. Encoding Handling</A ></H2 ><P > Care must be taken when dealing with multiple character encodings on the client, server, and in the XML data passed through them. When using the text mode to pass queries to the server and query results to the client (which is the normal mode), PostgreSQL converts all character data passed between the client and the server and vice versa to the character encoding of the respective end; see <A HREF="multibyte.html" >Section 22.2</A >. This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in XML data might become invalid as the character data is converted to other encodings while travelling between client and server, while the embedded encoding declaration is not changed. To cope with this behavior, an encoding declaration contained in a character string presented for input to the <TT CLASS="TYPE" >xml</TT > type is <SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >ignored</I ></SPAN >, and the content is always assumed to be in the current server encoding. Consequently, for correct processing, such character strings of XML data must be sent off from the client in the current client encoding. It is the responsibility of the client to either convert the document to the current client encoding before sending it off to the server or to adjust the client encoding appropriately. On output, values of type <TT CLASS="TYPE" >xml</TT > will not have an encoding declaration, and clients must assume that the data is in the current client encoding. </P ><P > When using the binary mode to pass query parameters to the server and query results back to the client, no character set conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16 at all). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted. </P ><P > Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8. </P ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN6005" >8.13.3. Accessing XML Values</A ></H2 ><P > The <TT CLASS="TYPE" >xml</TT > data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an <TT CLASS="TYPE" >xml</TT > column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method. </P ><P > Since there are no comparison operators for the <TT CLASS="TYPE" >xml</TT > data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds would be casting the expression to a character string type and indexing that, or indexing an XPath expression. The actual query would of course have to be adjusted to search by the indexed expression. </P ><P > The text-search functionality in PostgreSQL could also be used to speed up full-document searches in XML data. The necessary preprocessing support is, however, not available in the PostgreSQL distribution in this release. </P ></DIV ></DIV ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="datatype-uuid.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.html" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="arrays.html" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><ACRONYM CLASS="ACRONYM" >UUID</ACRONYM > Type</TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="datatype.html" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >Arrays</TD ></TR ></TABLE ></DIV ></BODY ></HTML >