ShreX: Getting Started

About the Guide

This guide provides information about using ShreX.  A hands-on tutorial is given to help you to start using the GUI.
ShreX is a tool developed for proof of concept.  There is no gurantee that the system will function properly in the scenarios other than described in the example.  Users are welcome to further explore the features, fix bugs, and enhance its functionality through working with its source code.
 

1.Example

Throughout the guide, we use the following IMDB example to illustrate how to work with ShreX GUI.

XML Schema:
<?xml version="1.0"?> 
<schema xmlns="http://www.w3.org/2001/XMLSchema" 
        targetNamespace="http://www.cse.ogi.edu/test" 
        elementFormDefault="qualified" 
        xmlns:test="http://www.cse.ogi.edu/test"
        xmlns:shrex="http://www.cse.ogi.edu/shrex"> 

  <element name="IMDB" type="test:IMDBType"/>

  <complexType name="IMDBType"> 
    <sequence>
      <element name="SHOW" type="test:SHOWType"
        minOccurs="0"
        maxOccurs="unbounded"/>
    </sequence>
  </complexType>

  <complexType name="SHOWType">
    <sequence> 
      <element name="TITLE" type="integer"/>
      <element name="YEAR" type="integer"/>
      <choice>
        <element name="BOXOFFICE" type="integer"/>
        <element name="SEASONS" type="string"/>
      </choice>
      <element name="REVIEW" type="integer" minOccurs="0" maxOccurs="unbounded"/>
    </sequence> 
  </complexType> 
</schema>

XML Document:
<IMDB   xmlns="http://www.cse.ogi.edu/test"
        xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance">

<SHOW>
<TITLE>1</TITLE>
<YEAR>2</YEAR>
<BOXOFFICE>3</BOXOFFICE>
<REVIEW>4</REVIEW>
<REVIEW>5</REVIEW>
</SHOW>

<SHOW>
<TITLE>6</TITLE>
<YEAR>7</YEAR>
<BOXOFFICE>8</BOXOFFICE>
<REVIEW>9</REVIEW>
<REVIEW>10</REVIEW>
<REVIEW>11</REVIEW>
</SHOW>

<SHOW>
<TITLE>12</TITLE>
<YEAR>13</YEAR>
<SEASONS>S14</SEASONS>
<REVIEW>15</REVIEW>
<REVIEW>16</REVIEW>
<REVIEW>17</REVIEW>
</SHOW>

<SHOW>
<TITLE>18</TITLE>
<YEAR>19</YEAR>
<SEASONS>S20</SEASONS>
<REVIEW>21</REVIEW>
</SHOW>

</IMDB>


 

2. Working with ShreX: a tutorial

This section went through an example to demonstrate how to customize XML-to-relational mapping, load generated schema, shred conforming documents into relational database, and query the stored XML data using XPath.  The tutorial is based on the Unix environment.  Operations in Windows should be similar.
 

2.1 Install and Launch ShreX GUI

The steps to create install and launch ShreX GUI are as follows:
  1. unzip shrex.zip to an install directory $INSTALL_DIR$

  2. >cd $INSTALL_DIR$
    >unzip shrex.zip
  3. compile the source code ( assume ant is installed on your machine)

  4. >cd $INSTALL_DIR$/shrex
    >ant
  5. start the GUI

  6. >cd $INSTALL_DIR$/shrex
    >java -cp ./lib/xml-apis.jar:./lib/xercesImpl.jar:./lib/shrex.jar:./lib/commons-jxpath-1.1.jar gui.XSGui &
The welcome page of the GUI is shown in Figure 1.
 
 


Figure 1: Welcome Page

2.2 XML Schema Mapping

2.2.1 Open a Schema

To open a schema, click File->Open schema, then choose data/demo/imdd.xsd
The left pane shows the schema tree, the right pane has five tab panels.

2.2.2 View Schema Information

Click the XML Schema tab.  Select an element/attribute in the schema tree.  The corresponding information is displayed in the right pane.  As shown in Figure 2, the informaiton about Title element inside Show is display in the right pane.


Figure 2: Schema Browser

2.2.3 View XML-to-Relational Mapping

The Mappings pane shows the correspondence between XML schema and relational schema.  Figure 3 displays that the Title element inside Show is mapped to field TITLE in the table IMDB_SHOW.


Figure 3: Mapping Browser

2.2.4 View Generated Relational Schema

All the generated relational tables are listed in the Relational Schema Pane.  View details about a table by clicking its name.


Figure 4: Generated Relational Schema Browser

2.2.5 Customize the Schema

By default, the opened schema imdb.xsd is mapped to three relational tables.
 
CREATE TABLE IMDB
(
    id_ VARCHAR(128) NOT NULL,
    PRIMARY KEY (id_)
);

CREATE TABLE IMDB_SHOW
(
    id_ VARCHAR(128) NOT NULL,
    pid VARCHAR(128) NOT NULL,
    TITLE NUM(12) NOT NULL,
    YEAR NUM(12) NOT NULL,
    BOXOFFICE NUM(12),
    SEASONS VARCHAR(128),
    PRIMARY KEY (id_),
    FOREIGN KEY (pid) REFERENCES IMDB
);

CREATE TABLE IMDB_SHOW_REVIEW
(
    id_ VARCHAR(128) NOT NULL,
    pid VARCHAR(128) NOT NULL,
    REVIEW NUM(12),
    PRIMARY KEY (id_),
    FOREIGN KEY (pid) REFERENCES IMDB_SHOW
);

We may customize the mapping in various ways. Go to the Mapping Editor panel.  Made the following changes
as shown in Figure 4:

    <element name="TITLE" type="integer" shrex:outline="true"/>
    <element name="YEAR" type="integer" shrex:fieldname="ShowYear"/>

Click  to "save and reload" schema.

Come back to the Mapping pane.  The element Title is now mapped to a table, and the element Year is mapped to a field with different name as shown in Figure 6 and 7.


Figure 5: Customize through Editor

Figure 6: Mapping information for element Title after customization

Figure 7: Mapping information for element Year after customization








We can also map the entire document into a single edge table.  Made the following changes:

    <element name="IMDB" type="test:IMDBType" shrex:edgemapping="true"/>

Click  to "save and reload" schema.

Come back to the Relational Schema Pane.  There will be only one table generated as shown in Figure 8.


Figure 8: Edge Mapping Table

2.2.6 Load the Schema to Database

The steps to load the generated relational table schema to a database is as follows.  Here we assume you load the original schema.  You can also tried to load the customized schema.
  1. Click File->Load Table Schema.
  2. Choose the "default" connection and click OK.  The default database connection is defined in ShreX configuration file.  In the distribution, the default connection is set to "File" database, which is a simulated database for demo purpose.  It does not have query capability.
  3. After seeing the verificaiton message, click OK.

  4. Click Database->Connections.  In the top left pane, the list of connections is shown.  There is only the default filed-based connection available.  The tables in the default connection are shown in the bottom left pane.  Since we have not loaded any document, the table only shows headers in the right pane (See Figure 9).



Figure 9: Connection page after loading the schema

2.3 Load XML Document file->load docuemnt

The steps to load a document conforming to the XML schema is as follows:
  1. Click File->Load Document.
  2. You will be prompted with a file chooser.  Select data/demo/imdb.xml
  3. Choose the "default" connection and click OK.
  4. After seeing the verificaiton message, click OK.
  5. Click Database->Connections.  You should be able to see the loaded tuples in the tables (See Figure 10).



Figure 10: Connection Page After Loading the document

2.4 Query XML Document

ShreX supports a subset of XPath expression query.  In the GUI, if users click on the schema tree in the left pane, the path for the selected node automatically appears in the XPath input field, so users can modify the path to make a query.  For example,if user click the element "/IMDB/SHOW/TITLE" in the schema tree, the path appears in the right pane immediately.

There are two buttons in the right pane.  Clicking on the "translate"    button displays the generate SQL query.  Clicking on the "evaluate" button, users will be prompted to select a DB connection, then the query result will be displayed.  The translated SQL for "/IMDB/SHOW/TITLE" is shown in Figure 11.
Since we are using the simulated file-based database, we could not actually perform the translated SQL query.
Users might try it with a real database.
 


 

3. Relational Database Plugin

ShreX can work with all relational databases through a common interface (See Figure 12).
There are three implementation coming with the distribution: MySQL, DB2 and Oracle.  Users who wants to use other databases need to provide their own implementation for the interface "edu.ogi.cse.xs.database.DBConnection".


Figure 13: Database Interface


We show how to connect to a MySQL database.  The steps to use DB2 or Oracle are similar.
 

Start GUI with MySQL JDBC lib in the Classpath

>java -cp ./lib/xml-apis.jar:./lib/xercesImpl.jar:./lib/shrex.jar:./lib/commons-jxpath-1.1.jar:./lib/mysqljava.jar gui.XSGui &

Test MySQL Connection

It is important to make sure that a mysql connection is available.
There are two ways to test a connection.
  1. Use MySQL shell.  Enter the shell by providing host, user and password informaiton.
  2. Use Java client program to connect to MySQL through JDBC. run

  3. java -cp ./lib/mysqljava.jar:./lib/xs.jar  cmd.TestMysqlConnection URL USERNAME PASSWORD
    e.g.
    java -cp ./lib/mysqljava.jar:./lib/xs.jar  cmd.TestMysqlConnection jdbc:mysql://localhost/xs mysql welcome

Add MySQL Connection

Before loading relational schemas to MySQL database, users need to add a connection.

   click menu "Database"->"Add a Connection"

 you will see a pop-up panel.  Fill value for each field:

Click "add", you will see a new item "mydemo" added to left pane.  Click "mydemo", if there are any tables in the database, they should show up.
 

4. Related Documents

For more information about ShreX, please see:
  1. ShreX: A Comprehensive Solution to the XML-to-Relational Mapping Problem
      (by Sihem Amer-Yahia, Fang Du and Juliana Freire )
      In Proceedings of WIDM, 2004.
  2. ShreX: Managing XML Documents in Relational Databases
      (by Fang Du, Sihem Amer-Yahia and Juliana Freire )
      In Proceedings of VLDB Demonstration, 2004.
  3. Java Doc
  4. Customization Example