XML-XDV - the XML Data Visualization framework
==============================================

XML::XDV is a framework to analyze, filter and evaluate scientific data. It
is especially well suited to visualize large quantities of measurement or
simulation results in handy diagrams. A user provides her data as a set of
XML files and specifies which data sets she wants to see against which
parameters, and XML::XDV serves her with a CSV file, a nicely formatted table
or, preferably, a diagram in Encapsulated PostScript format ready to be
included in a scientific paper. XML::XDV sports
a flexible query definition language, several templates to alleviate the pain
of writing queries, a statistical analyzer module and highly tweakable output
engines to generate tables or scientific plots.


INSTALLATION

To install this module, unpack the tarball, change into the top-level source
directory of the distribution and type the following:

    perl Makefile.PL
    make
    make install

Issue the last command as root. If installing as an ordinary user, use 

    perl Makefile.PL PREFIX=my_prefix
    make
    make install
    
and do not forget to add the library directory under `my_prefix' to PERL5LIB.
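
As an illustration, a per-user installation could be made visible to Perl
roughly as sketched below. The prefix `$HOME/xdv' is hypothetical, and the
`lib/perl5' subdirectory is an assumption: the exact location depends on your
Perl version and configuration, so check where `make install' actually put
the modules.

```shell
# Hypothetical prefix: substitute the value you passed as PREFIX.
my_prefix="$HOME/xdv"

# Assumption: the modules were installed under lib/perl5 below the prefix.
# Prepend that directory, keeping any PERL5LIB entries that are already set.
export PERL5LIB="$my_prefix/lib/perl5${PERL5LIB:+:$PERL5LIB}"
```

Put the `export' line into your shell startup file to make the setting
permanent.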


DEPENDENCIES

This program has only been tested on linux-x86 and linux-powerpc, but there
is no particular reason why it should not work on Windows as well.

This program requires the following prerequisites to work:

    - first of all, you need a fairly recent Perl. But you do have Perl
      installed, don't you? If not, download version 5.6.1 or later from:

		http://perl.org

    - XML::XDV uses `libxml2' to parse XML files, so you must have this
      installed. See 

		http://xmlsoft.org

    - to create fancy diagrams, XML::XDV relies on the GnuPlot scientific
      plotting tool. Install at least version 3.8 from here:

		http://www.gnuplot.info

    - we also need a number of additional Perl modules to do the heavy
      lifting, particularly:

		- XML::LibXML
		- XML::LibXML::Common
		- XML::LibXML::XPathContext
		- IO::All
		- List::MoreUtils
		- Text::Table
		- Text::Reform
		- Chart::Graph
		- Template
		- Statistics::PointEstimation

      These can all be downloaded from CPAN:

	        http://cpan.org

      but it is strongly recommended to use Perl's built-in capabilities
      to install CPAN modules instead. Become root and type

		perl -MCPAN -e shell

      and, after answering the configuration questions, type

		install XML::LibXML

      This will download all the modules needed by `XML::LibXML', configure,
      compile, test and install the module. Do this for all the modules
      listed above. If in trouble, consult the manual page CPAN(3perl).
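
Before installing anything, it may help to see which of the modules listed
above are still missing. The sketch below is not part of the distribution;
the helper name `missing_modules' is made up for this example, and it relies
only on the fact that `perl -MSome::Module -e 1' fails when the module cannot
be loaded.

```shell
# missing_modules (hypothetical helper): print every Perl module given as
# an argument that the current perl cannot load.
missing_modules() {
    for m in "$@"; do
        perl -M"$m" -e 1 2>/dev/null || echo "$m"
    done
}

# The prerequisites named in this README.
missing_modules XML::LibXML XML::LibXML::Common XML::LibXML::XPathContext \
    IO::All List::MoreUtils Text::Table Text::Reform Chart::Graph \
    Template Statistics::PointEstimation
```

Every name printed is a module you still need to install. No output means all
of them load.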


USAGE

Suppose you have the following XML file summarizing the results of a
theoretical measurement (see the `new_test/result.xml' file that comes with
the distribution).

    <?xml version="1.0"?>
    <Samples>
      <Sample>
        <A>1</A>
        <B>aaa</B>
        <C>12</C>
        <D>34</D>
      </Sample>
      <Sample>
        <A>3</A>
        <B>bbb</B>
        <C>13</C>
        <D>14</D>
      </Sample>
      <Sample>
        <A>2</A>
        <B>aa</B>
        <C>4333</C>
        <D>14</D>
      </Sample>
    </Samples>

Your XML files might contain any kind of data (strings, numbers, etc.),
either as the content of the XML nodes:
    
    <C>12</C>
    
or as the value of a special XML attribute named `Value':
    
    <C Value="12"/>

Also, you need not put all your data into one XML file; data can be
scattered over multiple input files as well.
    
Here is how you obtain a nice table showing how the parameter `D' varies with
parameter `A' using the simple 2D query template `XY':
    
    xdvproc -d 8 -f "name=XY, root=/Samples/Sample, x=A, y=D, driver=table" \
        new_test/result.xml

which gives you
    
    +=========+==========+
    | 1.00000 | 34.00000 |
    +---------+----------+
    | 2.00000 | 14.00000 |
    +---------+----------+
    | 3.00000 | 14.00000 |
    +---------+----------+

Here, the command line parameter `-d 8' suppresses all debugging output
(strongly recommended). Observe that the data to be evaluated is specified
in the form of XPath expressions. You can use any valid XPath expression
accepted by `libxml2', which restricts you to XPath 1.0 for the moment, but
this may easily change in the future.
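
If you are unsure whether libxml2 accepts an expression, you can try it
outside XML::XDV first. The sketch below uses `xmllint', the command line
tool shipped with libxml2 (not with XML::XDV), on a scratch copy of the
sample data. The helper name `xpath_value' is made up for this example, and
reading `x' and `y' as paths relative to each `root' node is an assumption
about how the template combines them.

```shell
# Scratch copy of the sample data so the sketch is self-contained.
xml="$(mktemp -d)/result.xml"
cat > "$xml" <<'EOF'
<?xml version="1.0"?>
<Samples>
  <Sample><A>1</A><B>aaa</B><C>12</C><D>34</D></Sample>
  <Sample><A>3</A><B>bbb</B><C>13</C><D>14</D></Sample>
  <Sample><A>2</A><B>aa</B><C>4333</C><D>14</D></Sample>
</Samples>
EOF

# xpath_value (hypothetical helper): print the result of an XPath 1.0
# expression ($1) as evaluated by libxml2 on the file $2.
xpath_value() {
    xmllint --xpath "$1" "$2"
}

xpath_value "count(/Samples/Sample)" "$xml"          # number of samples
xpath_value "string(/Samples/Sample[A=2]/D)" "$xml"  # D of the sample with A=2
```

For the sample data the two calls print 3 and 14, in line with the table
above.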

To produce a GnuPlot chart, set the `driver' template parameter to `gnuplot':

    xdvproc -d 8 -f "name=XY, root=/Samples/Sample, x=A, y=D, driver=gnuplot" \
        new_test/result.xml

This will create an Encapsulated PostScript file `D.eps' in your temporary
directory. You can specify the name in an additional template parameter, as
the `file' setting does here:

    xdvproc -d 8 \
        -f "name=XY, root=/Samples/Sample, x=A, y=D, driver=gnuplot, file=/tmp/my_file.eps" \
        new_test/result.xml

Note that you can also create CSV files to be fed to, for instance, MS Excel.
Consult the manual page XML::XDV::Template::XY(3pm) for the meaning of the
template parameters, and see the directory `lib/XML/XDV/Template' for the
templates actually available in the distribution.

Unfortunately, templates considerably restrict how flexibly you can specify
your queries. Therefore, XML::XDV sports a featureful domain-specific query
language (actually, a special XML file, a so-called stylesheet), which
allows you to specify your queries quite liberally. Under the hood, query
templates (like the `XY' template we used above) are just fancy ways to
create query stylesheets. Here is the stylesheet the above template produces
(see `new_test/query.xdv' in the distribution):

    <?xml version="1.0" encoding="UTF-8"?>
    <xdv:stylesheet version="1.0" xmlns:xdv="http://lendulet.tmit.bme.hu/~retvari/XSV">
      <xdv:graph driver="table" style="pretty"/>
      <xdv:aggregate query="/Samples/Sample" priority="1" method="average">
        <xdv:select query="A" type="independent"/>
        <xdv:select query="D" type="dependent"/>
      </xdv:aggregate>
    </xdv:stylesheet>

You can process this query with the command:

    xdvproc -d 8 new_test/query.xdv new_test/result.xml

If you only want to evaluate the samples for which the parameter `B' contains
the string `aa' and omit all the other ones, use the `filter' query
expression:

    <?xml version="1.0" encoding="UTF-8"?>
    <xdv:stylesheet version="1.0" xmlns:xdv="http://lendulet.tmit.bme.hu/~retvari/XSV">
      <xdv:graph driver="table" style="pretty"/>
      <xdv:aggregate query="/Samples/Sample" priority="1" method="average">
        <xdv:filter query="contains(B, 'aa')"/>
        <xdv:select query="A" type="independent"/>
        <xdv:select query="D" type="dependent"/>
      </xdv:aggregate>
    </xdv:stylesheet>

Again, you can use the full power of XPath to formulate your filter expressions.
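
To preview which samples a filter would keep, you can evaluate the same
expression as an XPath predicate with `xmllint', the command line tool
shipped with libxml2. This is only a sketch: the helper name
`matching_samples' is made up for this example, and it assumes that the
filter is evaluated relative to each node selected by the enclosing
`xdv:aggregate' query.

```shell
# Scratch copy of the sample data so the sketch is self-contained.
xml="$(mktemp -d)/result.xml"
cat > "$xml" <<'EOF'
<?xml version="1.0"?>
<Samples>
  <Sample><A>1</A><B>aaa</B><C>12</C><D>34</D></Sample>
  <Sample><A>3</A><B>bbb</B><C>13</C><D>14</D></Sample>
  <Sample><A>2</A><B>aa</B><C>4333</C><D>14</D></Sample>
</Samples>
EOF

# matching_samples (hypothetical helper): count the /Samples/Sample nodes
# in file $2 that satisfy the filter expression $1.
matching_samples() {
    xmllint --xpath "count(/Samples/Sample[$1])" "$2"
}

matching_samples "contains(B, 'aa')" "$xml"
```

For the sample data this prints 2: the samples with B equal to `aaa' and `aa'
pass the filter, while the one with `bbb' is dropped.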

Brief info on the usage of `xdvproc' can be obtained by issuing:

    xdvproc --help

Documentation for the many remaining capabilities of `xdvproc' (statistical
analysis, batch queries, etc.) is yet to be written at the moment... :-(
Also, a full rewrite in Haskell is on its way. Contact me at

    retvari ---- TMIT ---- BME ---- HU

for further help.


COPYRIGHT AND LICENCE

Copyright (C) 2004-2007 Gabor Retvari. All rights reserved.  This program is
free software; you can redistribute it and/or modify it under the same terms as
Perl itself.


