BigData Technologies: May 2012

Thursday, 24 May 2012

MARKLOGIC: How to use MLSQL Xquery Library for Marklogic

MLSQL: An XQuery Library for Relational Database Access

From Jason Hunter

Introduction

MLSQL is an open source XQuery library (written by Jason Hunter of MarkLogic and Ryan Grimm of O'Reilly Media) that allows easy access to relational database systems from within the MarkLogic environment. MLSQL lets you execute arbitrary SQL commands against any relational database (MySQL, Oracle, DB2, etc) and it captures the results as XML for processing within the MarkLogic environment. Using the library enables XQuery applications to leverage a relational database without having to resort to Java or C# glue code to manage the interaction.

This article introduces MLSQL, its architecture and API, and demonstrates a few real life applications built on the library.

Architecture

MLSQL consists of a couple hundred lines of XQuery and a few hundred lines of Java servlet code. The XQuery uses xdmp:http-post() to send the SQL query to the servlet, and the servlet uses JDBC to pass the query to the relational database. Any database with JDBC support will do. The servlet then returns the result set or update details as XML over the wire to the XQuery environment.

The servlet holds the database credentials, connection details, and manages a connection pool for efficiency. The XQuery can pass "bind parameters" to the database as part of the query for efficiency and safety. The diagram below shows the architecture.

Installation and configuration details follow at the end of this article.

Credentials and Security

For simplicity in MLSQL 1.0 we've opted to encapsulate the database details inside the servlet. An alternate design would allow the XQuery to pass the details, such as in the query string. If you're interested in this, talk to us. Note that for database security, you should restrict access to the MLSQL web service to only the trusted MarkLogic host. Tools like tcpwrappers makes this easy.

MySpace Begins from here

First of all I want to thank Jason Hunter, developer of this technology. One of the below reasons you may need MLSQL in your Application,

· If you have one application in Marklogic and another one in Relational Database for same purpose. So you need synchronization between these two databases.

Installation

For installation you need following applications in your machine,

JDK (1.6 or any advanced version)
Tomcat Server(6.0 or any advanced version)
Marklogic Server 5.0
MySql Database( or any relational database)
Winant (ant builder software)

Prior to this installation you should have,

In Marklogic

a. Your application Database(TestDatabase)

b. Application Server(TestHttpServer)

In MySql

a. Your application Database(TestDatabase)

b. Table which you want to update from ML(TestTable)

· Download MLSQL source from

http://community.marklogic.com/download/binaries/code/mlsql/releases/mlsql-src-1.1.zip

Step 1

Copy the C:\Folderlocation\mlsql-src-1.1\client\ sql.xqy library module to your XQuery code(Root directory which mentioned in application server). This usually requires copying the file into your HTTP server root directory, but depending on configuration might require loading into the server's modules database.

Step 2

Installing servlet code,

Run the “ant” command in the command prompt by using following steps

Go to extracted folder location (i.e. C:\Folderlocation\mlsql-src-1.1\ )

Now type “ant” and press enter

Now you can get see one additional folder named “C:\Folderlocation\mlsql-src-1.1\ buildtmp”

Copy “C:\Folderlocation\mlsql-src-1.1\ buildtmp\mlsql” and paste it into “C:\Tomcat60\webapps”.

Step 3

Provide database credentials and connection details.

The web application includes a web.xml file with init parameters the servlet will read in order to connect to the database. Adjust these as appropriate for your system.

You'll need to set the JDBC driver, connection URL, username, and password. The web.xml file includes instructions.

Step 4

Make sure to add your JDBC driver JAR files to the WEB-INF/lib directory. Driver libraries aren't usually licensed for redistribution, so you'll have to obtain them on your own.

Step 5

test the servlet's URL. Try making a basic web request to the MLSQL servlet. You'll know it's listening and you've found it when you get a <sql:exception> talking about "Premature end of file". That means you didn't provide a query, but at least you found the servlet.

Step 6

Test the XQuery connection. Write a basic query like this:

import module namespace sql = "http://xqdev.com/sql" at "sql.xqy"

sql:execute("select 1 as test", "http://localhost:8080/mlsql", ())

You might need to adjust the "at" depending on where you placed the library module. If it works you'll see this result:

<sql:result xml:lang="en" xmlns:sql="http://xqdev.com/sql">

  <sql:meta/>

  <sql:tuple>

    <test>1</test>

  </sql:tuple>

</sql:result>

What Next?

Now you can View, Create, Update and Delete records from your Relational database by using “MLSQL Module”. I will share the sample code for this operations one by one before that I want to mention some details on sample code

· Localhost – indicates your local machine details

· Import Module – I have the sql.xqy file in the same directory(Note: You might need to adjust the "at" depending on where you placed the library module)

· Test – Relational DB Table Name

· In sql:execute(SQLQuery,URL,Options) function – Maintain following parameters for this function

o SQLQuery – SQL query for your requirement

o URL – Pointing the Servlet which is placed in the Tomcat server

o Options – The third argument to sql:execute() is an XML options node that can include various things including a list of bind parameter values.

View records query

Using this XQuery you can view all of your records from Relational DB. Note: If you go for large table you have to face performance issues.

Sample.xqy

import module namespace sql = "http://xqdev.com/sql" at "sql.xqy"

sql:execute("select * from test", "http://localhost:8080/mlsql", ())

Results

<sql:result xmlns:sql="http://xqdev.com/sql">

  <sql:meta/>

  <sql:tuple>

    <id>1</id>

    <fname>Jason</fname>

    <lname>Hunter</lname>

    <age>32</age>

  </sql:tuple>

  <sql:tuple>

    <id>1235</id>

    <fname>Ryan</fname>

    <lname>Grimm</lname>

    <age null="true"></age>

  </sql:tuple>

</sql:result>

Once you got this result in Marklogic you can reconstruct the Result XML as per your Requirement and you can change the SQL Query parameter also.

Bind Variables with your query

By using the third parameter of sql:execute() you can bind variables with you SQL query.

Sample.xqy

import module namespace sql = "http://xqdev.com/sql" at "sql.xqy"

sql:execute("select * from test where id = ?", "http://localhost:8080/mlsql", sql:params(1235) )

Results

<sql:result xmlns:sql="http://xqdev.com/sql">

  <sql:meta/>

  <sql:tuple>

    <id>1235</id>

    <fname>Ryan</fname>

    <lname>Grimm</lname>

    <age null="true"></age>

  </sql:tuple>

</sql:result>

Note

Above SQL query returns record which has the id = "1235".

Performance

In simple testing on a laptop, the MLSQL servlet was able to push about 1,000 simple tuples per second toward the client. We anticipate that's fast enough for most SQL queries you will write from XQuery. Just don't try to pull a whole table into the XQuery environment and process it with XPath. That's why SQL has WHERE clauses.

Wednesday, 23 May 2012

xsl:for-each-group @group-adjacent

Developed By:- Shailendra Srivastava

INPUT:-

<Contraindications>They need to know if you have any of these
conditions:<br/>•diabetes<br/>•glaucoma<br/>•head trauma<br/>•heart
disease<br/>•high blood pressure<br/>•if you have taken an MAOI like Carbex,
Eldepryl, Marplan, Nardil, or Parnate in last 14 days<br/>•lung or breathing disease,
like asthma<br/>•stomach or intestinal problems<br/>•trouble passing
urine<br/>•an unusual or allergic reaction to codeine, hydrocodone, pseudoephedrine,
other medicines, foods, dyes, or preservatives•breast-feeding</Contraindications>

Expected Output:-

<?xml version="1.0" encoding="UTF-8"?>
<div id="div.0030" class="section_contraindications">
<h1 id="h1.0015" class="section-title">What should I tell my health care provider before I take
this medicine?</h1>
<p id="p.0035" class="default">They need to know if you have any of these conditions:</p>
<ul id="ul.0005" class="default">
<li id="li.0005">diabetes</li>
<li id="li.0010">glaucoma</li>
<li id="li.0015">head trauma</li>
<li id="li.0020">heart disease</li>
<li id="li.0025">high blood pressure</li>
<li id="li.0030">if you have taken an MAOI like Carbex, Eldepryl, Marplan, Nardil, or Parnate in
last 14 days</li>
<li id="li.0035">lung or breathing disease, like asthma</li>
<li id="li.0040">stomach or intestinal problems</li>
<li id="li.0045">trouble passing urine</li>
<li id="li.0050">an unusual or allergic reaction to codeine, hydrocodone, pseudoephedrine, other
medicines, foods, dyes, or preservativesbreast-feeding</li>
</ul>
</div>

XSLT Code:-

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:pe="vishnu.com">

<xsl:output method="xml" indent="no" use-character-maps="entity"/>

<xsl:character-map name="entity">

<xsl:output-character character="•" string=""/>

</xsl:character-map>

<xsl:template match="pe:Contraindications">

<xsl:for-each-group select="child::node()"

group-adjacent="boolean(self::br or self::text()[not(local-name(preceding-sibling::node()[1]) = 'BR')])">

<xsl:choose>

<xsl:when test="current-grouping-key()">

<xsl:for-each select="current-group()[self::text()]">

<li>

<xsl:attribute name="id">

<xsl:variable name="li-att1">

<xsl:number format="1" count="br"

from="pe:Contraindications" level="any"/>

</xsl:variable>

<xsl:variable name="li-att2">

<xsl:value-of select="$li-att1 * 5"/>

</xsl:variable>

<xsl:variable name="li-att3">

<xsl:choose>

<xsl:when test="$li-att2 = 5">

<xsl:value-of select="concat('li.000', $li-att2)"

</xsl:when>

<xsl:when test="$li-att2 < 100">

<xsl:value-of select="concat('li.00', $li-att2)"/>

</xsl:when>

<xsl:when test="$li-att2 < 1000">

<xsl:value-of select="concat('li.0', $li-att2)"/>

</xsl:when>

<xsl:otherwise>

<xsl:value-of select="concat('li.', $li-att2)"/>

</xsl:otherwise>

</xsl:choose>

</xsl:variable>

<xsl:value-of select="$li-att3"/>

</xsl:attribute>

<xsl:copy/>

</li>

</xsl:for-each>

</ul>

</xsl:when>

<xsl:otherwise>

<xsl:apply-templates select="current-group()"/>

</xsl:otherwise>

</xsl:choose>

</xsl:for-each-group>

</div>

</xsl:template>

<xsl:template match="U">

<xsl:apply-templates/>

</h1>

</xsl:template>

<xsl:template match="node()[name(preceding-sibling::node()[1]) = 'BR']">

<xsl:if test="self::text()">

<xsl:value-of select="."/>

</p>

</xsl:if>

</xsl:template>

</xsl:stylesheet>

Thursday, 17 May 2012

MarkLogic University announced their first e-learning offering!

Introduction to MarkLogic Server is a free 10-minute overview of the features of the MarkLogic Server, a high-level overview of the product architecture, and several demos of how the Server is being used in real applications.

info.marklogic.com info.marklogic.com

Monday, 14 May 2012

The White House is Spending Big Money on Big Data

(Photo credit: Wikipedia)

It’s typical in an election year to see an administration spend money on new initiatives. A new cost cutting initiative unveiled back in March has generally gone un-noticed by the main stream technology media. Called the “Big Data Research and Development Initiative” the program is focused on improving the U.S. Federal governments ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some the Nation’s most pressing challenges.

The program includes several federal agencies including NSF, HHS/NIH,DOE, DOD, DARPA and USGS who pledge more than $200 million in new commitments that they promise will greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.

In a statement Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy said “In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.”

One of the more interesting aspects of this project is the use of public cloud infrastructure, as in cloud computing services provided by the private industry. Confusing I know. A great example of this plan in action is The National Institutes of Health who announced that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publicly available data set for free and researchers only will pay for the computing services that they use.

According to a recent article on genengnews.com this is part of a larger strategy to reduce the number of federal data centers from the current 3,133 data centers sliced by “at least 1,200” by 2015, representing a roughly 40% cutback at a $5 billion savings. This also extends the work started with the administration’s Cloud First Policy outlined last year as part of The White House’s Federal Cloud Computing Strategy.

In a world that is more dependant on data then ever before the stakes are high and so is the money. It will be interesting to follow this initiative over the coming months.

BigData Technologies

Thursday, 24 May 2012

MARKLOGIC: How to use MLSQL Xquery Library for Marklogic

Introduction

Architecture

Credentials and Security

MySpace Begins from here

Installation

What Next?

View records query

Bind Variables with your query

Performance

Wednesday, 23 May 2012

xsl:for-each-group @group-adjacent

Thursday, 17 May 2012

MarkLogic University announced their first e-learning offering!

Introduction to MarkLogic Server is a free 10-minute overview of the features of the MarkLogic Server, a high-level overview of the product architecture, and several demos of how the Server is being used in real applications.

info.marklogic.com info.marklogic.com

Monday, 14 May 2012

The White House is Spending Big Money on Big Data

About Author

Blog Archive

Follow to Learn

Thursday, 24 May 2012

MARKLOGIC: How to use MLSQL Xquery Library for Marklogic

Introduction

Architecture

Credentials and Security

MySpace Begins from here

Installation

What Next?

View records query

Bind Variables with your query

Performance

Wednesday, 23 May 2012

xsl:for-each-group @group-adjacent

Thursday, 17 May 2012

MarkLogic University announced their first e-learning offering!

Introduction to MarkLogic Server is a free 10-minute overview of the features of the MarkLogic Server, a high-level overview of the product architecture, and several demos of how the Server is being used in real applications.

info.marklogic.com info.marklogic.com

Monday, 14 May 2012

The White House is Spending Big Money on Big Data

Subscribe To

Follow to Learn