Thursday, 24 May 2012

MARKLOGIC: How to use MLSQL Xquery Library for Marklogic


MLSQL: An XQuery Library for Relational Database Access
From Jason Hunter

Introduction

MLSQL is an open source XQuery library (written by Jason Hunter of MarkLogic and Ryan Grimm of O'Reilly Media) that allows easy access to relational database systems from within the MarkLogic environment. MLSQL lets you execute arbitrary SQL commands against any relational database (MySQL, Oracle, DB2, etc) and it captures the results as XML for processing within the MarkLogic environment. Using the library enables XQuery applications to leverage a relational database without having to resort to Java or C# glue code to manage the interaction.
This article introduces MLSQL, its architecture and API, and demonstrates a few real life applications built on the library.

Architecture

MLSQL consists of a couple hundred lines of XQuery and a few hundred lines of Java servlet code. The XQuery uses xdmp:http-post() to send the SQL query to the servlet, and the servlet uses JDBC to pass the query to the relational database. Any database with JDBC support will do. The servlet then returns the result set or update details as XML over the wire to the XQuery environment.
The servlet holds the database credentials, connection details, and manages a connection pool for efficiency. The XQuery can pass "bind parameters" to the database as part of the query for efficiency and safety. The diagram below shows the architecture. 

Installation and configuration details follow at the end of this article.

Credentials and Security

For simplicity in MLSQL 1.0 we've opted to encapsulate the database details inside the servlet. An alternate design would allow the XQuery to pass the details, such as in the query string. If you're interested in this, talk to us. Note that for database security, you should restrict access to the MLSQL web service to only the trusted MarkLogic host. Tools like tcpwrappers makes this easy.

MySpace Begins from here

First of all I want to thank Jason Hunter, developer of this technology. One of the below reasons you may need MLSQL in your Application,
·      If you have one application in Marklogic and another one in Relational Database for same purpose. So you need synchronization between these two databases.

Installation

For installation you need following applications in your machine,
  •         JDK (1.6 or any advanced version)
  •          Tomcat Server(6.0 or any advanced version)
  •          Marklogic Server 5.0
  •          MySql Database( or any relational database)
  •          Winant (ant builder software)
Prior to this installation you should have,
  •        In Marklogic
a.       Your application Database(TestDatabase)
b.      Application Server(TestHttpServer)
  •         In MySql
a.       Your application Database(TestDatabase)
b.      Table which you want to update from ML(TestTable)
·         Download MLSQL source from
Step 1
                Copy the C:\Folderlocation\mlsql-src-1.1\client\ sql.xqy library module to your XQuery code(Root directory which mentioned in application server). This usually requires copying the file into your HTTP server root directory, but depending on configuration might require loading into the server's modules database.
Step 2
Installing servlet code,
  • Run the “ant” command in the command prompt by using following steps
  • Go to extracted folder location (i.e. C:\Folderlocation\mlsql-src-1.1\ )
  • Now type “ant” and press enter
  • Now you can get see one additional folder named “C:\Folderlocation\mlsql-src-1.1\ buildtmp
  •  Copy “C:\Folderlocation\mlsql-src-1.1\ buildtmp\mlsql” and paste it into “C:\Tomcat60\webapps”.
Step 3
  • Provide database credentials and connection details.
  • The web application includes a web.xml file with init parameters the servlet will read in order to connect to the database. Adjust these as appropriate for your system.
  • You'll need to set the JDBC driver, connection URL, username, and password. The web.xml file includes instructions.
Step 4
    Make sure to add your JDBC driver JAR files to the WEB-INF/lib directory. Driver libraries aren't usually licensed for redistribution, so you'll have to obtain them on your own.
Step 5 
     test the servlet's URL. Try making a basic web request to the MLSQL servlet. You'll know it's listening and you've found it when you get a <sql:exception> talking about "Premature end of file". That means you didn't provide a query, but at least you found the servlet.
Step 6
  •  Test the XQuery connection. Write a basic query like this:
import module namespace sql  = "http://xqdev.com/sql" at "sql.xqy"
sql:execute("select 1 as test", "http://localhost:8080/mlsql", ())
  •  You might need to adjust the "at" depending on where you placed the library module. If it works you'll see this result:
<sql:result xml:lang="en" xmlns:sql="http://xqdev.com/sql">
  <sql:meta/>
  <sql:tuple>
    <test>1</test>
  </sql:tuple>
</sql:result>

 What Next?

 Now you can View, Create, Update and Delete records from your Relational database by using “MLSQL Module”. I will share the sample code for this operations one by one before that I want to mention some details on sample code
·         Localhost – indicates your local machine details
·     Import Module – I have the sql.xqy file in the same directory(Note: You might need to  adjust the "at" depending on where you placed the library module)
·         Test – Relational DB Table Name
·      In sql:execute(SQLQuery,URL,Options) function – Maintain following parameters for    this function
o   SQLQuery – SQL query for your requirement
o   URL – Pointing the Servlet which is placed in the Tomcat server
o   Options – The third argument to sql:execute() is an XML options node that can include various things including a list of bind parameter values.

View records query

                 Using this XQuery you can view all of your records from Relational DB. Note: If you go for large table you have to face performance issues.
Sample.xqy
import module namespace sql  = "http://xqdev.com/sql" at "sql.xqy"
sql:execute("select  * from test", "http://localhost:8080/mlsql", ())
Results
<sql:result xmlns:sql="http://xqdev.com/sql">
  <sql:meta/>
  <sql:tuple>
    <id>1</id>
    <fname>Jason</fname>
    <lname>Hunter</lname>
    <age>32</age>
  </sql:tuple>
  <sql:tuple>
    <id>1235</id>
    <fname>Ryan</fname>
    <lname>Grimm</lname>
    <age null="true"></age>
  </sql:tuple>
</sql:result>
 
  
Once you got this result in Marklogic you can reconstruct the Result XML as per your Requirement and you can change the SQL Query parameter also.

Bind Variables with your query

                 By using the third parameter of sql:execute() you can bind variables with you SQL query.
Sample.xqy
import module namespace sql  = "http://xqdev.com/sql" at "sql.xqy"
sql:execute("select  * from test where id = ?", "http://localhost:8080/mlsql", sql:params(1235) )
Results
<sql:result xmlns:sql="http://xqdev.com/sql">
  <sql:meta/>
  <sql:tuple>
    <id>1235</id>
    <fname>Ryan</fname>
    <lname>Grimm</lname>
    <age null="true"></age>
  </sql:tuple>
</sql:result>
Note

         Above SQL query returns record which has the id = "1235".

Performance

In simple testing on a laptop, the MLSQL servlet was able to push about 1,000 simple tuples per second toward the client. We anticipate that's fast enough for most SQL queries you will write from XQuery. Just don't try to pull a whole table into the XQuery environment and process it with XPath. That's why SQL has WHERE clauses.

Wednesday, 23 May 2012

xsl:for-each-group @group-adjacent

Developed By:- Shailendra Srivastava


INPUT:-

<Contraindications>They need to know if you have any of these
  conditions:<br/>&#x2022;diabetes<br/>&#x2022;glaucoma<br/>&#x2022;head trauma<br/>&#x2022;heart
  disease<br/>&#x2022;high blood pressure<br/>&#x2022;if you have taken an MAOI like Carbex,
  Eldepryl, Marplan, Nardil, or Parnate in last 14 days<br/>&#x2022;lung or breathing disease,
  like asthma<br/>&#x2022;stomach or intestinal problems<br/>&#x2022;trouble passing
  urine<br/>&#x2022;an unusual or allergic reaction to codeine, hydrocodone, pseudoephedrine,
  other medicines, foods, dyes, or preservatives&#x2022;breast-feeding</Contraindications>

Expected Output:-

<?xml version="1.0" encoding="UTF-8"?>
<div id="div.0030" class="section_contraindications">
  <h1 id="h1.0015" class="section-title">What should I tell my health care provider before I take
    this medicine?</h1>
  <p id="p.0035" class="default">They need to know if you have any of these conditions:</p>
  <ul id="ul.0005" class="default">
    <li id="li.0005">diabetes</li>
    <li id="li.0010">glaucoma</li>
    <li id="li.0015">head trauma</li>
    <li id="li.0020">heart disease</li>
    <li id="li.0025">high blood pressure</li>
    <li id="li.0030">if you have taken an MAOI like Carbex, Eldepryl, Marplan, Nardil, or Parnate in
      last 14 days</li>
    <li id="li.0035">lung or breathing disease, like asthma</li>
    <li id="li.0040">stomach or intestinal problems</li>
    <li id="li.0045">trouble passing urine</li>
    <li id="li.0050">an unusual or allergic reaction to codeine, hydrocodone, pseudoephedrine, other
      medicines, foods, dyes, or preservativesbreast-feeding</li>
  </ul>
</div>


XSLT Code:-

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:pe="vishnu.com">
  
  <xsl:output method="xml" indent="no" use-character-maps="entity"/>
  
  <xsl:character-map name="entity"> 
    <xsl:output-character character="&#x2022;" string=""/> 
  </xsl:character-map>
  
  <xsl:template match="pe:Contraindications"> 
    <div id="div.0030" class="section_contraindications"> 
      <xsl:for-each-group select="child::node()" 
        group-adjacent="boolean(self::br or self::text()[not(local-name(preceding-sibling::node()[1]) = 'BR')])">
        <xsl:choose> 
          <xsl:when test="current-grouping-key()"> 
            <ul id="ul.0005" class="default"> 
              <xsl:for-each select="current-group()[self::text()]"> 
                <li> 
                  <xsl:attribute name="id"> 
                    <xsl:variable name="li-att1"> 
                      <xsl:number format="1" count="br" 
                        from="pe:Contraindications" level="any"/> 
                    </xsl:variable> 
                    <xsl:variable name="li-att2"> 
                      <xsl:value-of select="$li-att1 * 5"/> 
                    </xsl:variable> 
                    <xsl:variable name="li-att3"> 
                      <xsl:choose> 
                        <xsl:when test="$li-att2 = 5"> 
                          <xsl:value-of select="concat('li.000', $li-att2)" 
                          /> 
                        </xsl:when> 
                        <xsl:when test="$li-att2 &lt; 100"> 
                          <xsl:value-of select="concat('li.00', $li-att2)"/> 
                        </xsl:when> 
                        <xsl:when test="$li-att2 &lt; 1000"> 
                          <xsl:value-of select="concat('li.0', $li-att2)"/> 
                        </xsl:when> 
                        <xsl:otherwise> 
                          <xsl:value-of select="concat('li.', $li-att2)"/> 
                        </xsl:otherwise> 
                      </xsl:choose> 
                    </xsl:variable> 
                    <xsl:value-of select="$li-att3"/> 
                  </xsl:attribute> 
                  <xsl:copy/> 
                </li> 
              </xsl:for-each> 
            </ul> 
          </xsl:when> 
          <xsl:otherwise> 
            <xsl:apply-templates select="current-group()"/> 
          </xsl:otherwise> 
        </xsl:choose> 
      </xsl:for-each-group> 
    </div> 
  </xsl:template>
  
  <xsl:template match="U"> 
    <h1 id="h1.0015" class="section-title"> 
      <xsl:apply-templates/> 
    </h1> 
  </xsl:template>
  
  <xsl:template match="node()[name(preceding-sibling::node()[1]) = 'BR']"> 
    <xsl:if test="self::text()"> 
      <p id="p.0035" class="default"> 
        <xsl:value-of select="."/> 
      </p> 
    </xsl:if> 
  </xsl:template>
  
</xsl:stylesheet>


Thursday, 17 May 2012

MarkLogic University announced their first e-learning offering!

Introduction to MarkLogic Server is a free 10-minute overview of the features of the MarkLogic Server, a high-level overview of the product architecture, and several demos of how the Server is being used in real applications.

Monday, 14 May 2012

The White House is Spending Big Money on Big Data


Logo of the United States White House, especia...
 (Photo credit: Wikipedia)
It’s typical in an election year to see an administration spend money on new initiatives. A new cost cutting initiative unveiled back in March has generally gone un-noticed by the main stream technology media. Called the “Big Data Research and Development Initiative” the program is focused on improving the U.S. Federal governments ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some the Nation’s most  pressing challenges.
The program includes several federal agencies including NSF, HHS/NIH,DOE, DOD, DARPA and USGS who pledge more than $200 million in new commitments that they promise will greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge  volumes of digital data.
In a statement  Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy said “In the same way that past Federal investments in information-technology R&D led to  dramatic advances in supercomputing and the creation of the Internet, the initiative we  are launching today promises to transform our ability to use Big Data for scientific  discovery, environmental and biomedical research, education, and national security.”
One of the more interesting aspects of this project is the use of public cloud infrastructure, as in cloud computing services provided by the private industry. Confusing I know. A great example of this plan in action is The National Institutes of Health who announced that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is  now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the  equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs  – the current 1000 Genomes Project data set is a prime example of big data, where  data sets become so massive that few researchers have the computing power to make  best use of them. AWS is storing the 1000 Genomes Project as a publicly available  data set for free and researchers only will pay for the computing services that they use.
According to a recent article on genengnews.com this is part of a larger strategy to reduce the number of federal data centers from the current  3,133 data centers sliced by “at least 1,200” by 2015, representing a roughly 40% cutback at a $5 billion savings. This also extends the work started with the administration’s Cloud First Policy outlined last year as part of The White House’s Federal Cloud Computing Strategy.
In a world that is more dependant on data then ever before the stakes are high and so is the money. It will be interesting to follow this initiative over the coming months.