XML-DBMS, Version 1.01
Java Packages for Transferring Data between XML Documents and Relational Databases
1.0 Introduction
XML-DBMS is a set of Java packages for transferring data between XML documents and relational databases. Programmers use these packages to build systems that transfer data; a sample application can be run from the command line to transfer data between a database and an XML file.
XML-DBMS preserves the hierarchical structure of an XML document, as well as the data (character data and attribute values) in that document. If requested, it also preserves the order in which the children at a given level in the hierarchy appear. (For many data-centric applications, such order is not important and the code runs faster without it.)
Because XML-DBMS seeks to transfer data, not documents, it does not preserve document type declarations, nor does it preserve physical structure such as entity use, CDATA sections, or document encodings. In particular, it does not attempt to implement a document management system on top of a relational database.
For a general discussion of XML and databases, see XML and Databases.
2.0 Getting Started
2.1 System Requirements
To run XML-DBMS, you need the following software:
-
XML-DBMS
Download: http://www.rpbourret.com/xmldbms/index.htm
-
JDK (Java Development Kit) 1.1.x or 1.2.x
Download: http://java.sun.com/products/jdk/1.1/index.html
-
A relational database such as Oracle, DB2, Informix, MS Access, or MySQL
You will need to buy this on your own or find a free relational database on the Web. Note that if you are thinking about using XML-DBMS, you probably already have a relational database.
-
A JDBC driver for your database
Most relational databases are shipped with JDBC and ODBC drivers. If you have an ODBC driver but not a JDBC driver, you can use an JDBC-ODBC bridge as your JDBC driver. This converts JDBC calls to ODBC calls. An experimental (and therefore somewhat buggy) JDBC-ODBC bridge is shipped with the JDK; JDBC-ODBC bridges are available from other companies as well. Note that the quality of JDBC drivers varies considerably, so if one JDBC driver does not work, you can try another. For a list of available JDBC drivers, see:
http://industry.java.sun.com/products/jdbc/drivers
-
An XML parser written in Java
XML parsers are available from many companies, such as Oracle, Sun, IBM, and Microsoft (DataChannel). In addition, Open Source parsers are available from many organizations and individuals, such as James Clark, OpenXML, and Apache. For a partial list of available parsers, see:
http://www.xmlsoftware.com/parsers/
-
A DOM level 1 implementation written in Java
Most XML parsers include an implementation of the DOM (Document Object Model). Thus, if your parser already supports the DOM, you don't need to download a separate DOM implementation. If your parser does not support the DOM, you can download a separate DOM implementation, such as that from Docuverse:
http://www.docuverse.com/
-
SAX (Simple API for XML) version 1
SAX is an event-based parser API. Virtually all parsers include an implementation of SAX; if yours doesn't, you should get a parser that does. For more information about SAX, see:
http://www.megginson.com/SAX/index.html
XML-DBMS has been run with products from numerous companies, including the following. Note that inclusion on this list does not imply full testing -- simply that some part of XML-DBMS has been run successfully with the product. Similarly, products not on the list should still be useable by XML-DBMS. This is because XML-DBMS rarely strays from the core features of Java, SAX, DOM, and JDBC and therefore should work with most implementations of these specifications.
- JDK: 1.1.7, 1.1.8, 1.2.1
- XML Parsers: DataChannel, IBM, OpenXML, Oracle (versions 1 and 2), Sun, Xerces
- JDBC Drivers: Sun JDBC-ODBC Bridge, Easysoft JDBC-ODBC Bridge, JDBC Driver from Oracle
- Databases: Microsoft Access, Oracle, MS SQL Server[1], Sybase SQL Server[1], MySQL, PrimeBase
[1] See the bug "DBMSToDOM does not work with SQL Server" in Version 1.x bugs.
2.2 Downloading and Installing XML-DBMS
To download the current version of XML-DBMS, see:
To install XML-DBMS:
- Unzip the downloaded file with a tool such as jar or WinZip.
- Add xmldbms.jar to your CLASSPATH. For example:
c:\Java\xmldbms\xmldbms.jar;c:\Java\jdk1.2.1\bin;c:\Java\xerces-1_1_1\xerces.jar;.\
2.3 Using XML-DBMS in Your Application
XML-DBMS is middleware -- that is, it sits between your application and a database. Because of this, you need to write your own application to use it. For example, you might use the following code to transfer the data in the sales_in.xml document to the database according to the mapping defined in sales.map. This uses the Map, DOMToDBMS, and DocInfo classes from XML-DBMS.
// Use a user-defined function to create a map object. Map map = createMap("sales.map", conn1); // Use a user-defined function to create a DOM tree over sales_in.xml Document doc = openDocument("sales_in.xml"); // Create a new DOMToDBMS object and store the data. DOMToDBMS domToDBMS = new DOMToDBMS(map); DocInfo docInfo = domToDBMS.storeDocument(doc);
XML-DBMS includes four sample programs (Transfer, TransferResultSet, GenerateMap, and ConvertSchema) to show you how to use different parts of the system. For most people, the best way learn XML-DBMS is to read the documentation suggested for beginners, then start playing with the Transfer sample and the sample data.
2.4 XML-DBMS Documentation
XML-DBMS comes with a complete set of documentation. Beginners should start with the following section of this document:
- An Object View of an XML Document
- Map Objects and Map Factories
- The XML-DBMS Mapping Language (sections 4.3.1 to 4.3.6 only)
- Transferring Data between XML Documents and the Database
- The Transfer sample application
- The sample data
More advanced users will also want to read the following:
- The XML-DBMS Mapping Language (sections 4.3.7 to 4.3.9)
- Additional Programming Considerations
- The GenerateMap, ConvertSchema, and TransferResultSet sample applications
- The comments in the XML-DBMS mapping language DTD
- Comments in the code. (See src.zip)
- XML-DBMS API documentation
- The presentation "Mapping DTDs to Databases", available from http://www.rpbourret.com/xml/index.htm
2.5 XML-DBMS Source Code
The source code for XML-DBMS is shipped with the product. You can find it in the src.zip file in the directory you installed XML-DBMS in.
3.0 An Object View of an XML Document
XML-DBMS views an XML document as a tree of objects and then uses an object-relational mapping to map these objects to a relational database. In this view, element types generally correspond to classes and attributes and PCDATA correspond to properties. Child element types are generally viewed as pointed-to classes; that is, an interclass relationship exists between the classes corresponding to parent and child element types.
For example, consider the following XML document:
<Orders> <SalesOrder SONumber="12345"> <Customer CustNumber="543"> <CustName>ABC Industries</CustName> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> </Customer> <OrderDate>981215</OrderDate> <Line LineNumber="1"> <Part PartNumber="123"> <Description> <P><B>Turkey wrench:</B><BR /> Stainless steel, one-piece construction, lifetime guarantee.</P> </Description> <Price>9.95</Price> </Part> <Quantity>10</Quantity> </Line> <Line LineNumber="2"> <Part PartNumber="456"> <Description> <P><B>Stuffing separator:<B><BR /> Aluminum, one-year guarantee.</P> </Description> <Price>13.27</Price> </Part> <Quantity>5</Quantity> </Line> </SalesOrder> </Orders>
This could be viewed as a tree of objects from five classes -- Orders, SalesOrder, Customer, Line, and Part -- as shown in the following diagram:
Orders | SalesOrder / | \ Customer Line Line | | Part Part
It is important to understand that the tree of objects is not the Document Object Model (DOM). The reason for this is that the DOM models the document itself and not the data in the document. For example, the above XML document would form the following DOM tree:
Element | Element _____/ / \ \_____ / / \ \ Attr Element Element Element etc.
The view of element types as classes is not absolute: element types can also be viewed as properties of their parent element type-as-class. This is most useful when an element type contains only PCDATA. For example, the OrderDate element is best viewed as a property of the SalesOrder class.
However, it is useful in other cases as well. For example, consider the Description element type, which contains a description written in XHTML. Although this description has subelements such as <B> and <P>, these subelements cannot be meaningfully interpreted on their own. Therefore, it makes more sense to view the element type as a single value (property) of the Part class rather than as a separate class.
For a more complete introduction to the object-relational mapping used by XML-DBMS, see the "Object-Based Mappings" section of the presentation "Mapping DTDs to Databases". This is available from http://www.rpbourret.com/xml/index.htm.
4.0 Mapping XML Documents to the Database
The user specifies how element types, attributes, and PCDATA are viewed, as well as how to map this view to the database. This information is contained in a Map object, which is created by a map factory. The most common way to do this is to create a map document with the XML-DBMS mapping language and pass this document to the MapFactory_MapDocument class.
4.1 Map Objects
A Map object describes the object view of the element types, attributes, and PCDATA in an XML document and how to map this view to the database. Map objects are opaque -- you can think of them as the compiled form of a map document.
To use a Map object, you first construct it with a map factory and then pass it to one of the data transfer classes (DOMToDBMS or DBMSToDOM). For most uses, you never need to call any of the methods on a Map object. For example, the following code calls a user-defined function to create a Map from a map document and passes it to a DOMToDBMS object. Later, when the DOMToDBMS.storeDocument() method is called, the data will be transferred according to this Map.
// Use a user-defined function that calls a map // factory to create a map. (See section 4.2.) map = createMap("sales.map", conn1); // Set the Map on the DOMToDBMS object. domToDBMS.setMap(map);
A Map object can be used multiple times. For example, suppose that a program expects to store four different types of XML documents in the database. It can create the Maps for each of these documents at start-up, then, as it receives documents to process, pass the appropriate Map to the DOMToDBMS object.
Warning: If it is possible for multiple threads to share the same object -- I don't know how multi-threading works in Java, so this may not be relevant -- then multiple threads should not share the same Map object. The reason for this is that a Map contains a reference to a Connection object and the data transfer classes (DBMSToDOM and DOMToDBMS) commit transactions on this object. Since Connection.commit() commits all statements open on a given Connection, a commit executed in one data transfer object will commit statements being used by all other data transfer objects sharing the same Map/Connection. This is unlikely to be the desired behavior.
4.2 Map Factories
Currently, XML-DBMS has a two map factories: one to create Map objects from map documents and one to create Map objects from DTDs and schema documents.
4.2.1 MapFactory_MapDocument
The MapFactory_MapDocument class creates Map objects from map documents -- you can think of it as a compiler for map documents. It is the most commonly used map factory. For example, the following code creates a Map object from the sales.map map document:
// Instantiate a new map factory from a database connection // and a SAX parser. factory = new MapFactory_MapDocument(conn, parser); // Create a Map from sales.map. map = factory.createMap(new InputSource(new FileReader("sales.map")));
4.2.2 MapFactory_DTD
The MapFactory_DTD class creates Map objects from DTDs and XML schema documents. This factory is designed primarily for use as a tool to help build mapping documents. For example, the following code creates a Map object from product.dtd and then serializes that Map object to a file.
// Instantiate a new map factory and create a map. factory = new MapFactory_DTD(); src = new InputSource("file://c:/java/de/tudarmstadt/ito/xmldbms/samples/product.dtd"); map = factory.createMapFromDTD(src, MapFactory_DTD.DTD_EXTERNAL, true, null);
// Open a FileOutputStream and serialize the Map to that stream. mapFile = new FileOutputStream("c:\java\de\tudarmstadt\ito\xmldbms\samples\product.map"); map.serialize(mapFile); mapFile.close();
Maps created by MapFactory_DTD cannot be used to transfer data until the Map.setConnection method has been called to specify a JDBC Connection.
MapFactory_DTD supports DTDs in two forms: either an external subset -- that is, a stand-alone DTD file -- or an XML document containing an internal subset, reference to an external subset, or both. It also supports XML schemas written in DDML (Data Definition Markup Language). If you need to use another schema language, such as the W3C's XML Schema language, DCD (Document Content Description for XML), SOX (Schema for Object-Oriented XML), or XML-Data Reduced, you will need to write a conversion module similar to de.tudarmstadt.ito.schemas.converters.DDMLToDTD.
4.3 The XML-DBMS Mapping Language
The XML-DBMS mapping language is a simple, XML-based language that describes both how to construct an object view for an XML document and how to map this view to a relational schema. We will introduce the main parts of the language in a series of examples. These use the XML documents and tables described in sections A.1, "Sales Sample" and A.2, "Product Description Sample".
For complete information about the XML-DBMS mapping language, see the mapping language DTD.
4.3.1 Table and Column Names
IMPORTANT: Table and column names in the map document must exactly match the names stored in the database. A frequent mistake is to use a different case in the map document than is used in the database. This happens because some databases (notably Oracle) convert table and column names in a CREATE TABLE statement to all upper or all lower case.
For example, suppose you use the name Sales in a CREATE TABLE statement. The database might store this name as SALES, in which case you must use the name SALES in your map document. If you use the name Sales, you will get an error similar to the following:
de.tudarmstadt.ito.xmldbms.InvalidMapException: Table not found: Sales at de.tudarmstadt.ito.xmldbms.Map.addColumnMetadata(Map.java:1711) at de.tudarmstadt.ito.xmldbms.Map.getTableMetadata(Map.java:1587) at de.tudarmstadt.ito.xmldbms.Map.initColumnMetadata(Map.java:229) at de.tudarmstadt.ito.xmldbms.mapfactories.MapFactory_MapDocument.createMap (MapFactory_MapDocument.java:335) at Transfer.createMap(Transfer.java:186) at Transfer.toDBMS(Transfer.java:124) at Transfer.main(Transfer.java:91)
To find out how your database stores names, see the documentation for your database.
4.3.2 Mapping Classes (Element Types) to Tables
Element types with element content are usually viewed as classes and mapped to a table. For example, the following declares the SalesOrder element type to be a class and maps it to the Sales table:
<ClassMap> <ElementType Name="SalesOrder"/> <ToClassTable> <Table Name="Sales"/> </ToClassTable> ...property maps... ...related class maps... </ClassMap>
The ClassMap element contains all of the information needed to map a single class (element type), including the table to which the class is mapped, the maps for each property in the class, a list of related classes, and a list of passed-through child classes.
The ElementType element identifies the element type (class) being mapped and the ToClassTable element gives the name of the table to which the class is mapped.
4.3.3 Mapping Properties (Attributes and Element Types) to Columns
Single-valued attributes and element types with PCDATA-only content are usually viewed as properties and mapped to columns. For example, the following declares the SONumber attribute and the OrderDate element type (when SalesOrder is its parent) to be properties and maps them to the Number and Date columns, respectively. These maps are nested inside the class map for SalesOrder.
<PropertyMap> <Attribute Name="SONumber"/> <ToColumn> <Column Name="Number"/> </ToColumn> </PropertyMap> <PropertyMap> <ElementType Name="OrderDate"/> <ToColumn> <Column Name="Date"/> </ToColumn> </PropertyMap>
The Attribute and ElementType elements identify the properties being mapped and the ToColumn elements state that they are being mapped to columns. These columns are understood to be in the table to which the class (SalesOrder) is mapped.
4.3.4 Mapping Inter-Class Relationships (Element Hierarchy)
When a child element type is viewed as a class, its relationship with its parent element type must be stated in the map of the parent class. For example, the following declares that Line is related to the SalesOrder class. This map is nested inside the class map for SalesOrder; the actual mapping of the Line class (element type) is separate.
<RelatedClass KeyInParentTable="Candidate"> <ElementType Name="Line"/> <CandidateKey Generate="No"> <Column Name="Number"/> <CandidateKey/> <ForeignKey> <Column Name="SONumber"/> </ForeignKey> <OrderColumn Name="Number" Generate="No"/> </RelatedClass>
The ElementType element gives the name of the related class and the KeyInParentTable attribute states that the candidate key used to join the tables is in the parent (Sales) table. CandidateKey and ForeignKey give the columns in these keys, which must match in number and type. The Generate attribute of CandidateKey tells the system whether to generate the key. This allows us to preserve keys that have business meaning and generate object identifiers when no such keys exist. In this case, we do not generate the key because we have mapped the SONumber attribute of the SalesOrder element type to the candidate key column (Sales.Number).
The (optional) OrderColumn element gives the name of the column that contains information about the order in which Line elements appear in the SalesOrder element. Because this column must appear in the table on the "many" side of the relationship, Number refers to the Lines.Number column, not the Sales.Number column. The Generate attribute of the OrderColumn element tells the system whether to generate the order value. In this case, we do not generate the order value because we will separately map the LineNumber attribute of the Line element type to the order column (Lines.Number).
4.3.5 Eliminating Unwanted Root Elements
Root elements sometimes exist only because XML requires a single root element. For example, in our sales order language, we would like to store multiple sales orders in a single document. To do this, we need the Orders element to encapsulate multiple SalesOrder elements. However, there is no structure in the database corresponding to the Orders element and we would like to eliminate it. For example, the following states that the Orders element type is to be ignored.
<IgnoreRoot> <ElementType Name="Orders"/> <PseudoRoot> <ElementType Name="SalesOrder"/> <CandidateKey Generate="No"> <Column Name="Number"/> </CandidateKey> </PseudoRoot> </IgnoreRoot>
The first ElementType element gives the element type to be ignored. The PseudoRoot element introduces an element type (SalesOrder) to serve as a root in its place; there can be multiple pseudo-roots. The (optional) CandidateKey element gives the key to be used when retrieving data from the database; not shown is an option OrderColumn element that gives the order in which the SalesOrder elements are to be retrieved.
Ignored root elements are reconstructed when retrieving data from the database.
4.3.6 Mapping Mixed Content
Mixed content consists of both PCDATA and elements, such as in our document language. The order in which the PCDATA and elements appear is usually important, so we usually need to keep order information for the PCDATA as well as each element. For example, the following maps the Name element type to the Name column in the Product table and stores system-generated order information in the NameOrder column; this map is nested inside the class map for the Product element type.
<PropertyMap> <ElementType Name="Name"/> <ToColumn> <Column Name="Name"/> </ToColumn> <OrderColumn Name="NameOrder" Generate="Yes"/> </PropertyMap>
Because PCDATA can occur multiple times in mixed content, it is usually mapped to a separate table. For example, the following maps the PCDATA from the Para element type to the ParaPCDATA table; this map is nested inside the class map for the Para element type.
<PropertyMap> <PCDATA/> <ToPropertyTable KeyInParentTable="Candidate"> <Table Name="ParaPCDATA"/> <CandidateKey Generate="Yes"> <Column Name="ParaID"/> </CandidateKey> <ForeignKey> <Column Name="ParaFK"/> </ForeignKey> <Column Name="ParaPCDATA"/> <OrderColumn Name="ParaPCDATAOrder" Generate="Yes"/> </ToPropertyTable> </PropertyMap>
The ToPropertyTable element states that the table contains only property values, not a class. In addition to giving the candidate and foreign keys needed to retrieve PCDATA values from the table, we give the names of the columns (ParaPCDATA and ParaPCDATAOrder) in which the PCDATA and order information are stored. Notice that we ask the system to generate both the candidate key (ParaID) and the order information; this is because the document does not contain this information. For more information about generating keys, see section 6.3, "Key (Object ID) Generators".
As you may have noticed, the document language requires more tables and more columns per property than the sales order language. This is because the document language is an example of a document-centric language, while the sales language is an example of a data-centric language.
Document-centric languages are used to create documents for human consumption, such as books, email, and advertisements. They are characterized by less predictable structures, coarser-grained data, and large amounts of mixed content and the order in which sibling elements and PCDATA occurs is usually significant. Because order is usually signficant and element types-as-properties and PCDATA generally can occur multiple times in their parent (thus requiring storage in separate tables), document-centric languages require a more complex structure in the database.
Data-centric languages tend to describe discrete pieces of data and are typically used to transfer data between applications and data stores. They are characterized by fairly regular structure, fine-grained data (the smallest independent unit of data is usually at the attribute or PCDATA-only element level), and little or no mixed content. The order in which sibling elements and PCDATA occurs is usually not significant. Because of their regular structure and the unimportance of order, data-centric languages require a less complex structure in the database.
Although XML-DBMS and relational databases can be used to store documents written in document- centric languages, they are better suited to storing the regular structure encountered in documents written in data-centric languages.4.3.7 Using Namespaces
Namespaces are supported through Namespace elements, which declare the prefixes and URIs used in the Name attributes of ElementType and Attribute elements. (Note that these prefixes are separate from those declared with xmlns attributes.) For example, suppose the sales order language has a namespace URI of http://ito.tu-darmstadt.de/xmldbms/sales. The map document might contain the following Namespace element, which states that the sales prefix is used in the map document to identify element types and attributes from this namespace.
<Namespace Prefix="sales" URI="http://ito.tu-darmstadt.de/xmldbms/sales"/>
Thus, when mapping the SalesOrder element type, the following reference is used:
<ElementType Name="sales:SalesOrder"/>
As with namespaces in XML documents, unprefixed attribute names referenced in the Name attribute of the Attribute element type do not belong to any XML namespace. (For those of you who are confused by this statement, remember that such attribute names must be unique within their element type; this is a requirement imposed by the XML specification and has nothing to do with XML namespaces.) For example, in the following class map, the SONumber attribute is assumed to belong to the SalesOrder element type; it does not belong to any XML namespace.
<ClassMap> <ElementType Name="sales:SalesOrder"/> <ToClassTable> <Table Name="Sales"/> </ToClassTable> <PropertyMap> <Attribute Name="SONumber"/> <ToColumn> <Column Name="Number"/> </ToColumn> </PropertyMap> </ClassMap>
Prefixes used in the map document do not need to match those used in instance documents. All that is important is that the namespace URIs are the same. Currently, Namespace elements do not support empty prefixes; that is, you cannot declare a namespace URI that will be associated with unprefixed element type and attribute names in the map document.
Whether a document using namespaces can actually be processed depends on the DOM implementation being used. For more information, see section 6.1, "XML Namespaces".
4.3.8 Handling Null Values
A null value is a value that simply isn't there. This is very different from a value of 0 (for numbers) or zero length (for a string). For example, suppose you have data collected from a weather station. If the thermometer isn't working, a null value is stored in the database rather than a 0, which would mean something different altogether.
XML also supports the concept of null data through optional element types and attributes. If the value of an optional element type or attribute is null, it simply isn't included in the document. As with databases, empty elements or attributes containing zero length strings are not null: their value is a zero-length string.
In spite of this definition of null values, it is quite likely that XML documents will use empty (zero-length) strings to represent null values. Because of this, the EmptyStringIsNull element can be used to state how empty strings are treated. If it is present, empty strings are treated in the same way as null values. If it is absent, empty strings are treated as strings. For example, the following states that empty strings should be treated as nulls.
<EmptyStringIsNull/>
The EmptyStringIsNull element is nested inside the Options element. Note that it applies only to element types and attributes mapped as propertys. An empty element-as-class with no attributes results in a row of all NULLs in the database.
4.3.9 Date, Time, and Timestamp Formats
Because XML documents are international, it is likely that you will encounter a variety of date, time, and timestamp formats. You can specify the formats to use with the DateTimeFormats element, which contains an optional Locale element and a Patterns element that specifies the actual formatting patterns to use. For example, the following specifies that dates use the "dd.MM.yy" format (e.g. 29.10.58), times use the "HH:mm" format (e.g. 18:37), and timestamps use the "MMM d, yyyy h:mm a" (e.g. February 9, 1962 6:35 AM).
<DateTimeFormats> <Patterns Date="HH:mm" Time="HH:mm" Timestamp="MMM d, yyyy h:mm"/> </DateTimeFormats>
Like EmptyStringIsNull, the DateTimeFormats element is nested inside the Options element. The formats used are defined in the java.text.DateFormat and java.text.SimpleDateFormat classes.
5.0 Transferring Data between XML Documents and the Database
XML-DBMS has two classes for transferring data between XML documents and the database: DOMToDBMS transfers data from XML documents to the database and DBMSToDOM transfers data in the opposite direction. Both classes use DOM trees as intermediate forms of the XML document.
(SAXToDBMS and DBMSToSAX classes are planned for a future release of XML-DBMS. These should help solve some of the scalability problems encountered by using DOM trees.)
5.1 Transferring Data to the Database
The DOMToDBMS class transfers data from a DOM tree to the database according to a given Map. For example, the following code transfers data from the sales_in.xml document to the database according to the Map object created from sales.map:
// Use a user-defined function to create a map. map = createMap("sales.map", conn1); // Use a user-defined function to create a DOM tree over sales_in.xml doc = openDocument("sales_in.xml"); // Create a new DOMToDBMS object and store the data. domToDBMS = new DOMToDBMS(map); docInfo = domToDBMS.storeDocument(doc);
Information about how to retrieve the data at a later point in time is returned in a DocumentInfo object, which is just a list of table names, key column names, key values, and order column names.
5.2 Transferring Data to an XML Document
The DBMSToDOM class transfers data from the database to a DOM tree according to a given Map. For example, the following code transfers data for sales order number 123 from the Sales table to the sales_out.xml document according to the Map object created from sales.map:
// Use a user-defined function to create a map. map = createMap("sales.map", conn); // Create a new DBMSToDOM object. dbmsToDOM = new DBMSToDOM(map, new DF_Oracle()); // Create a key and retrieve the data. key = {new Integer(123)}; doc = dbmsToDOM.retrieveDocument("Sales", key);
Note that the DBMSToDOM object is created with a DocumentFactory (DF_Oracle) that can create Documents for Oracle's implementation of the DOM. For more information, see section 6.2, "DOM Document Factories".
The DBMSToDOM class has four different retrieveDocument methods. In addition to the method shown above, there are methods that accept arrays of tables and keys, a DocumentInfo object, and a JDBC ResultSet object as arguments. In the latter case, the Map object must map an element type to the table named "Result Set".
If the data specified by the parameters of a retrieveDocument method contains more than one row, the Map object must specify an ignored root type. Otherwise, DBMSToDOM will attempt to add more than one root element to the document, resulting in a DOMException. (Note that this does not include rows of data retrieved from subordinate tables.)
6.0 Additional Programming Considerations
6.1 XML Namespaces
The DOM specification level 1, which was the only level available when XML-DBMS was written, does not define how namespaces are supported. Thus, some DOM implementations defined methods for retrieving various information about the namespace used by a given Node. The NameQualifier interface encapsulates a subset of this information. It uses the following definitions:
- Local name
- The unprefixed name of a node.
- Prefixed name
- The prefixed name of a node. If there is no namespace URI, the prefixed name is the same as the local name.
- Qualified name
- The namespace URI, plus a caret (^), plus the local name. If there is no namespace URI, the qualified name is the same as the local name.
For example:
<foo:element1 xmlns="http://foo"> Local name: "element1" Prefixed name: "foo:element1" Qualified name: "http://foo^element1" <element2> Local name: "element2" Prefixed name: "element2" Qualified name: "element2"
To use namespaces, the DOM implementation must support namespaces and the Map object must declare the namespace URI (if any) of each mapped element type and attribute (see section 4.3.7, "Using Namespaces"). If a DOM implementation does not support namespaces, then the element type and attribute names in the Map object must exactly match the names returned by the DOM's implementation Node.getNodeName(). Usually, this will be the prefixed name.
When transferring data from an XML document to the database, the caller must pass an object that implements the NameQualifier interface to the DOMToDBMS object. For example, the following code passes a NameQualifier for Oracle's DOM implementation:
domToDBMS.setNameQualifier(new NQ_Oracle());
The NameQualifier may also be set in the constructor and no NameQualifier is needed if neither the XML document nor the Map uses namespaces. The de.tudarmstadt.ito.domutils package includes implementations of NameQualifier for DataChannel (Microsoft), IBM, Oracle (versions 1 and 2), and Sun, as well as a generic implementation of NameQualifier for all DOM implementations that support level 2 of the specification. As of this writing, Docuverse and OpenXML do not support namespaces. However, you should check whether a newer version of either implemenation does; implementing NameQualifier yourself is trivial.
When transferring data from the database to an XML document, the caller must choose how namespaces will be used. DOM level 1 does not support setting the namespace or prefix of an element or attribute. Thus, the caller can choose whether element and attribute names are prefixed according to the namespace prefixes in the Map or no prefixes are used at all.
Prefixing the element and attribute names in the returned DOM tree is useful if the DOM tree is to be serialized as XML. However, it will probably cause problems if the DOM tree is to be used directly. The reason for this is that the DOM implementation will not correctly recognize and return the unprefixed name, the namespace URI, or the qualified name. By default, prefixes are not used. The following code shows how to request that prefixes be used:
dbmsToDOM.usePrefixes(true);
(DOM level 2 does support setting namespaces on elements or attributes; support for this will probably be added in the future.)
6.2 DOM Document Factories
There is no standard way to create an empty DOM Document. Thus, we have encapsulated this functionality in the DocumentFactory interface. When transferring data from the database to an XML document, an object implementing this interface must be passed to the DBMSToDOM object. For example, the following code uses the DocumentFactory for Oracle:
dbmsToDOM.setDocumentFactory(new DF_Oracle());
The DocumentFactory may also be set in the constructor. The de.tudarmstadt.ito.domutils package contains implementations of DocumentFactory for the DataChannel (Microsoft), Docuverse, IBM, OpenXML, Oracle (versions 1 and 2), Sun, and Xerces DOM implementations. Be sure to check that these implementations match the version of the implementation you are using. If not, you may need to implement DocumentFactory yourself; doing so is trivial.
6.3 Key (Object ID) Generators
If DOMToDBMS needs to generate key values, such as when mapping mixed content to the database, the calling application must provide an object that implements the KeyGenerator interface. DOMToDBMS calls a method on this object to get unique key values (object IDs). You can write your on object to implement KeyGenerator -- for example, you might write a key generator that generates GUIDs (UUIDs) or timestamps -- or you can use KeyGeneratorImpl, which is a sample implementation of KeyGenerator.
KeyGeneratorImpl generates unique 4-byte integers based on a value stored in a special table. Before you use KeyGeneratorImpl, you must:
- Create a table named XMLDBMSKey with a single INTEGER column named HighKey.
- Add a single row to this table with HighKey set to 0.
The following code shows how an application uses KeyGeneratorImpl. Note that the KeyGeneratorImpl object and the DOMToDBMS object use different connections to the same database. This is because each commits transactions at different times and using the same connection for both objects would lead to statements being committed prematurely.
// Use a user-defined function to create a map. Map map = createMap("document.map", conn1); // Use a user-defined function to create a DOM tree over document_in.xml doc = openDocument("document_in.xml"); // Instantiate KeyGeneratorImpl and initialize it with a Connection. keyGenerator = new KeyGeneratorImpl(); keyGenerator.initialize(conn2); // Create a new DOMToDBMS object and set the KeyGenerator. domToDBMS = new DOMToDBMS(map); domToDBMS.setKeyGenerator(keyGenerator); // Store the data. docInfo = domToDBMS.storeDocument(doc);
6.4 Tips and Tricks
Here is a short list of ways that might help your code run faster:
- Reuse Map objects if you need to transfer data from multiple documents to the database or vice versa. For details, see section 4.1, "Map Objects".
- Use a pool of SELECT statements when transferring data from the database to an XML file. For details, see the comments in the code for Map.checkInSelectStatement.
- Close your prepared statements before discarding a Map object. For details, see Map.closeStatements().
6.5 Classes Not for General Use
The de.tudarmstadt.ito.xmldbms package contains a number of public classes that are not for general use. That is, programmers using XML-DBMS do not need to instantiate or call methods on these classes. These classes are used to map XML document structures to database structures and are public so that they can be used by map factories, which are in a different package.
The not-for-general-use mapping classes are:
ClassMap Column ColumnMap LinkInfo MapOptions OrderInfo PropertyMap RelatedClassMap RootClassMap RootTableMap Table TableMap
A special case is the Map class. For programmers using XML-DBMS, this is generally treated as an opaque object. That is, the programmer gets a Map object from a map factory and passes it to DOMToDBMS or DBMSToDOM (usually) without calling any methods on it. In spite of this, the Map object does have public methods that some (but not all) XML-DBMS programmers use, such as methods to serialize the map to an OutputStream and to get CREATE TABLE statements. Although many variables in the Map class are public, programmers should never need to access them.
It is possible for programmers to directly create objects in the mapping classes, but it is strongly recommended that a map factory be used instead. Note that DOMToDBMS and DBMSToDOM largely assume that the objects in these classes have been created correctly, so using incorrectly constructed objects has unpredictable results. However, should a programmer be brave (foolish?) enough to construct these objects by hand, a slightly simplified hierarchy of them is as follows:
Map Table (array of) TableMap (array of) RootClassMap (hashtable of) ClassMap PropertyMap (hashtable of) RelatedClassMap (hashtable of) ClassMap... LinkInfo OrderInfo LinkInfo OrderInfo RootTableMap (hashtable of) TableMap Table Column (array of) ColumnMap (array of) Column TableMap... (array of)
7.0 Samples
XML-DBMS comes with four samples, Transfer, TransferResultSet, GenerateMap, and ConvertSchema. These can be found in the samples subdirectory.
Applications that use XML-DBMS usually use a specific XML parser, DOM implementation, and JDBC driver. The sample programs are no different. They use the Oracle 2.x parser and DOM implementation and the JDBC-ODBC Bridge from Sun. Thus, if you want to run the samples or use them as a starting point for your application, you must either install this software on your system or modify the sample code.
If you decide to change the code (most people do), here is a list of things you might need to change. These are clearly labeled in the samples and generally separated from the rest of the code. Note that all samples come with commented-out code to use the Xerces and Sun parsers and DOM implementations.
- Database URL
- JDBC driver class
- NameQualifier implementation (see section 6.1, "XML Namespaces")
- DocumentFactory implementation (see section 6.2, "DOM Document Factories")
- Code to instantiate a SAX parser, get a DOM tree from an XML document, and serialize a DOM tree
XML-DBMS also comes with two sets of sample data that you can use with the sample applications. This is described in Appendix A, "Sample Documents and Data".
7.1 Transfer
Transfer is a simple command-line application that transfers data between an XML file and the database according to a particular map document. It shows how to use the MapFactory_MapDocument, DOMToDBMS, DBMSToDOM, and Map classes.
When transferring data from an XML document to the database, use the command:
java Transfer -todbms <map-file> <xml-file>
For example, to transfer data from the sample file document_in.xml to the database according the map document document.map, use the command:
java Transfer -todbms document.map document_in.xml
When transferring data from the database to an XML document, use the command:
java Transfer -toxml <map-file> <xml-file> <table-name> <key-value>...
where <key-value> is one or more values in a single key. (There are multiple values only if the key is multi-part.) For example, to transfer data for sales order number 123 from the Sales table to the file sales_out.xml according to the map document sales.map, use the command:
java Transfer -toxml sales.map sales_out.xml Sales 123
Before you use the Transfer sample, you must:
- Create an ODBC data source named "xmldbms" that points to your database.
- Create the tables that you want to transfer data to/from. If you are using the sample data, see the table schemas in Appendix A, "Sample Documents and Data".
- Create and initialize the XMLDBMSKey table. For more information, see section 6.3, "Key (Object ID) Generators".
7.2 TransferResultSet (Advanced)
Like Transfer, TransferResultSet is a simple command-line application that transfers data from the database to an XML document according to a particular map. Unlike Transfer, it gets its data from a result set specified by the user, rather than a root table name and key value. It shows how to use the DBMSToDOM.retrieveDocument(ResultSet) method, as well as the MapFactory_MapDocument and Map classes.
To use TransferResultSet, use the command:
java TransferResultSet {-t <table-name> | -s <SELECT-statement>} <map-file> <xml-file>
Before you use the Transfer sample, you must:
- Create an ODBC data source named "xmldbms" that points to your database.
- Create the tables that you want to transfer data to/from. If you are using the sample data, see the table schemas in Appendix A, "Sample Documents and Data".
7.2.1 Transferring Data from a Single Table
To transfer data from a result set created over a single table, use the -t option. For example, to transfer all the data from the Sales table to the sales_out.xml file, use the following command. The sales_rs1.map document shows how to map the SalesOrder element type to the result set created over the table using the special name "Result Set".
java TransferResultSet -t Sales sales_rs1.map sales_out.xml
7.2.2 Transferring Data from a SELECT Statement
To transfer data from a result set created from a SELECT statement, use the -s option; the SELECT statement must be enclosed in quotes. For example, to transfer data from the result set created by the Microsoft Access-specific SELECT statement:
SELECT Sales.CustNumber, Lines.Part, Sales.Number, Lines.SONumber, Sales.Date, Lines.Number AS LineNumber FROM Sales INNER JOIN Lines ON Sales.Number = Lines.SONumber
to the sales_out.xml file, use the following command:
java TransferResultSet -s "SELECT Sales.CustNumber, Lines.Part, Sales.Number, Lines.SONumber, Sales.Date, Lines.Number AS LineNumber FROM Sales INNER JOIN Lines ON Sales.Number = Lines.SONumber" sales_rs2.map sales_out.xml
The sales_rs2.map document shows how to map the SalesOrder element type to the special name "Result Set", as well as how to use columns renamed in the select list -- Lines.Number is renamed as LineNumber in this case. Note that it does not map all columns in the result set.
Because this SELECT statement joins two tables, rows can contain duplicate information in some columns. For example, if a sales order contains multiple lines, there will be more than one row with the same sales order number. (Each row will contain a different line number.) Because of this, the information is duplicated in the XML document as well, not nested as one might expect. This is an inherent limitation of creating XML documents from result sets and is easily avoided by using one of the other forms of the DBMSToDOM.retrieveDocument() method, all of which perform joins themselves, or as shown in the following section.
7.2.3 Transferring Nested Data from a SELECT Statement
To avoid the problems shown in the previous section, you need to do two things: use a SELECT statement over a single table and use a map document with RelatedClass elements. Because the SELECT statement is created over a single table, there is no duplicate data. Because the map document uses RelatedClass elements, DBMSToDOM does the necessary joins to retrieve the related data and nest it in the XML document.
For example, to transfer data from the result set created by the SELECT statement:
SELECT * FROM Sales WHERE CustNumber='543'
to the sales_out.xml file, use the following command.
java TransferResultSet -s "SELECT * FROM Sales WHERE CustNumber='543'" sales_rs3.map sales_out.xml
The sales_rs3.map document is identical to sales.map, except that it maps the SalesOrder element type to "Result Set" instead of "Sales":
The inclusion of related classes in the map document does not cause any problems for DBMSToDOM. This is because it starts its processing from a single root result set. With the other forms of retrieveDocument(), it creates this result set itself, such as by creating a result set using a table name and key value passed by the application. The only difference with retrieveDocument (ResultSet) is that the application, not DBMSToDOM, creates the root result set. One consequence of this is that applications can do more complicated SELECTs over a given table when they use retrieveDocument(ResultSet) than when they use other forms of retrieveDocument().
7.3 GenerateMap
GenerateMap is a simple command-line application that generates a map and a set of CREATE TABLE statements from a DTD, an XML document containing or referring to a DTD, or a DDML schema document. The map is saved in a document with the .map extension and the CREATE TABLE statements are saved in a document with the .sql extension. It shows how to use the MapFactory_DTD and Map classes.
To run GenerateMap, use the command:
java GenerateMap <DTD or XML document>
For example, to generate a map from the document DTD, use the command:
java GenerateMap document.dtd
The GenerateMap application requires an ODBC data source named "xmldbms" and an ODBC driver for that database. It does not require that the database contain any tables -- it simply needs to retrieve information from the database about how to construct the CREATE TABLE statements.
7.4 ConvertSchema
ConvertSchema is a simple command-line application that converts schema documents to DTDs and vice versa. Currently, only DDML-to-DTD and DTD-to-DDML are supported, but writing converters to other schema languages is relatively easy -- a half day to day per direction. Although this sample has nothing to do with databases, it does show the capabilities of some of the schema conversion classes (SubsetToDTD, DDMLToDTD, DTDToDDML, and DTD), which might be useful in other applications.
To convert a schema document to a DTD or vice versa, use the command:
java ConvertSchema <schema-file>
For example, to convert the document.ddm DDML document to a DTD, use the command:
java ConvertSchema document.ddm
To convert a DTD to convert the document.dtd DTD to a DDML document, use the command:
java ConvertSchema document.dtd
8.0 Licensing and Support
8.1 Licensing
XML-DBMS, along with its source code, is freely available for use in both commercial and non- commercial settings. It is not copyrighted and has absolutely no warranty. Use it as you will.
8.2 Support
If you have questions about how to use XML-DBMS, the first thing you should do is read the documentation and look at the sample programs. If you want to check for known bugs, you have three options. First, check the XML-DBMS bug database on SourceForge. Second, look at the list of known bugs and proposed enhancements shipped with the product. Third, look at the archives of the XML-DBMS mailing list (see below), which lists many known bugs and frequently asked questions.
If you still have a question, you can send email to the mailing list at [email protected]. Although you must be a member of the mailing list to post messages, anyone can read the messages in the archives. You can join the mailing list and read the archives at:
http://groups.yahoo.com/group/xml-dbms
8.3 How can I help?
XML-DBMS is an ongoing, Open Source project and depends on users for feedback, bug reports, and new code. The best way to help out is to join the mailing list and actively participate in discussions, find and fix bugs, suggest new features, and submit changes you have made for inclusion in future releases. For the latest news, check the XML-DBMS home page at:
http://www.rpbourret.com/xmldbms/index.htm
or the XML-DBMS project page on SourceForge:
https://sourceforge.net/projects/xmldbms/
8.4 Thanks
Thanks to all those who have given me feedback and sent in bug reports. Special thanks to the following for their helpful comments, suggestions for new features, code, and testing.
- Alf Hogemark
- Ivana Tzenova
- Iztok Kucan
- Matthias Pfisterer
- Omar Lopez Ruiz
- Peter Adolphs
- Phil Friedman
- Richard Sullivan
- Roland Stengel
- Ruben Lainez
Appendix A: Sample Documents and Tables
XML-DBMS comes with two sets of sample XML documents: a sales order and a product description. The sample documents and the tables needed to store these documents according to the shipped map documents are described in this section.
A.1 Sales Sample
The sales sample shows a simple sales order written as an XML document. This is an example of a data-centric document. (For more information, see section 3.1 of the article "XML and Databases".) The sample comes with several different map documents:
- sales.map maps the complete document
- sales_ns.map maps the form of the document that uses XML namespaces
- sales_rs1.map maps the Sales table for output with the TransferResultSet sample
- sales_rs2.map maps a join of the Sales and Lines tables for output with the TransferResultSet sample
- sales_rs3.map maps the Sales table and all child tables for nested output with the TransferResultSet sample
A.1.1 Sample Sales Document
The sales sample comes with two forms of the sales order document: sales_in.xml, which does not use XML namespaces, and sales_in_ns.xml, which does use XML namespaces. sales_in.xml is shown below.
<Orders> <SalesOrder SONumber="12345"> <Customer CustNumber="543"> <CustName>ABC Industries</CustName> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> </Customer> <OrderDate>981215</OrderDate> <Line LineNumber="1"> <Part PartNumber="123"> <Description> <P><B>Turkey wrench:</B><BR /> Stainless steel, one-piece construction, lifetime guarantee.</P> </Description> <Price>9.95</Price> </Part> <Quantity>10</Quantity> </Line> <Line LineNumber="2"> <Part PartNumber="456"> <Description> <P><B>Stuffing separator:<B><BR /> Aluminum, one-year guarantee.</P> </Description> <Price>13.27</Price> </Part> <Quantity>5</Quantity> </Line> </SalesOrder> </Orders>
A.1.2 Tables for Sales Sample
The following tables are needed to store the sales_in.xml document, as it is mapped with sales.map.
Sales | Lines | ||||
---|---|---|---|---|---|
Number | VARCHAR(10) | SONumber | VARCHAR(10) | ||
CustNumber | VARCHAR(10) | Number | INTEGER | ||
Date | DATE | Part | VARCHAR(10) | ||
Quantity | INTEGER |
Customers | Parts | ||||
---|---|---|---|---|---|
Number | VARCHAR(10) | Number | VARCHAR(10) | ||
Name | VARCHAR(40) | Description | VARCHAR(255) | ||
Street | VARCHAR(50) | Price | DOUBLE | ||
City | VARCHAR(50) | ||||
State | VARCHAR(2) | ||||
PostalCode | VARCHAR(10) |
A.2 Product Description Sample
The document sample shows a simple product description written in XML. This is an example of a document-centric document. (For more information, see section 3.2 of the article "XML and Databases".) The sample comes with a single map document (product.map), as well as a DTD (product.dtd). The latter can be used with the GenerateMap sample.
A.2.1 Sample Product Description Document
There is a single form of the product description document (product_in.xml), which is shown below.
<!DOCTYPE Product SYSTEM "product.dtd"> <Product> <Name>XML-DBMS</Name> <Developer>Ronald Bourret, Technical University of Darmstadt</Developer> <Summary>Java packages for transferring data between XML documents and relational databases</Summary> <Description> <Para>XML-DBMS is a set of Java packages for transferring data between XML documents and relational databases. It views the XML document as a tree of objects in which element types are generally viewed as classes and attributes and PCDATA as properties of those classes. It then uses an object- relational mapping to map these objects to the database. An XML-based mapping language is used to define the view and map it to the database.</Para> <Para>You can:</Para> <List> <Item><Link URL="readme.html">Read more about XML-DBMS</Link></Item> <Item><Link URL="XMLAndDatabases.htm">Read more about databases and XML</Link></Item> <Item><Link URL="xmldbms.dtd">View the mapping language DTD</Link></Item> <Item><Link URL="xmldbms.zip">Download XML-DBMS</Link></Item> </List> <Para>XML-DBMS, along with its source code, is freely available for use in both commercial and non-commercial settings.</Para> </Description>
A.2.2 Tables for Product Description Sample
The following tables are needed to store the product_in.xml document, as it is mapped with product.map.
Product | Link | ||||
---|---|---|---|---|---|
ProductPK | INTEGER | ParaFK | INTEGER | ||
ProductOrder | INTEGER | ItemFK | INTEGER | ||
Name | VARCHAR(50) | LinkOrder | INTEGER | ||
NameOrder | INTEGER | URL | VARCHAR(255) | ||
Developer | VARCHAR(100) | LinkPCDATA | VARCHAR(255) | ||
DeveloperOrder | INTEGER | ||||
Summary | VARCHAR(255) | ||||
SummaryOrder | INTEGER |
Description | List | ||||
---|---|---|---|---|---|
ProductFK | INTEGER | DescriptionFK | INTEGER | ||
DescriptionPK | INTEGER | ListPK | INTEGER | ||
DescriptionOrder | INTEGER | ListOrder | INTEGER |
Para | ParaPCDATA | ||||
---|---|---|---|---|---|
DescriptionFK | INTEGER | ParaFK | INTEGER | ||
ParaPK | INTEGER | ParaPCDATA | VARCHAR(1023) or LONGVARCHAR | ||
ParaOrder | INTEGER | ParaPCDATAOrder | INTEGER |
Item | ItemPCDATA | ||||
---|---|---|---|---|---|
ListFK | INTEGER | ItemFK | INTEGER | ||
ItemPK | INTEGER | ItemPCDATA | VARCHAR(255) | ||
ItemOrder | INTEGER | ItemPCDATAOrder | INTEGER |