An Architecture for Archiving and Post-Processing Large, Distributed, Scientific Data Using SQL/MED and XML

Papiani, Mark; Wason, Jasmin L.; Nicole, Denis A.

doi:10.1007/3-540-46439-5_31

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1777)

Included in the following conference series:

International Conference on Extending Database Technology

Abstract

We have developed a Web-based architecture and user interface for archiving and manipulating results of numerical simulations being generated by the UK Turbulence Consortium on the United Kingdom’s new national scientific supercomputing resource. These simulations produce large datasets, requiring Web-based mechanisms for storage, searching and retrieval of simulation results in the hundreds of gigabytes range. We demonstrate that the new DATALINK type, defined in the draft SQL Management of External Data Standard, which facilitates database management of distributed external data, can help to overcome problems associated with limited bandwidth. We show that a database can meet the apparently divergent requirements of storing both the relatively small simulation result metadata, and the large result files, in a unified way, whilst maintaining database security, recovery and integrity. By managing data in this distributed way, the system allows post-processing of archived simulation results to be performed directly without the cost of having to rematerialise to files. This distribution also reduces access bottlenecks and processor loading. We also show that separating the user interface specification from the user interface processing can provide a number of advantages. We provide a tool to generate automatically a default user interface specification, in the form of an XML document, for a given database. The XML document can be customised to change the appearance of the interface. Our architecture can archive not only data in a distributed fashion, but also applications. These applications are loosely coupled to the datasets (in a many-to-many relationship) via XML defined interfaces. They provide reusable server-side post-processing operations such as data reduction and visualisation.
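
The idea of generating a default user interface specification from a database can be illustrated with a minimal sketch. It is not the authors' tool: it assumes an in-memory SQLite database in place of the archive's DBMS, a hypothetical `simulation_result` table, a plain text column `result_file_url` standing in for an SQL/MED DATALINK column (SQLite has no DATALINK type), and invented XML element names (`interface`, `table`, `column`).

```python
import sqlite3
import xml.etree.ElementTree as ET


def default_ui_spec(conn: sqlite3.Connection) -> ET.Element:
    """Build a default UI specification from the database schema.

    The element and attribute names are hypothetical; the paper's abstract
    does not give the actual XML vocabulary.
    """
    root = ET.Element("interface")
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        table_el = ET.SubElement(root, "table", name=table_name, label=table_name)
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, col_name, col_type, _notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            ET.SubElement(
                table_el, "column",
                name=col_name, type=col_type, label=col_name, visible="true",
            )
    return root


# Hypothetical schema: small metadata columns plus a URL that points at a
# large result file held outside the database (a stand-in for DATALINK).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE simulation_result (
        id INTEGER PRIMARY KEY,
        title TEXT,
        grid_points INTEGER,
        result_file_url TEXT
    )
    """
)

spec = default_ui_spec(conn)

# Customising the specification changes presentation only, not the database.
spec.find("table").set("label", "Simulation results")

print(ET.tostring(spec, encoding="unicode"))
```

Keeping the specification as a separate XML document means the interface code only has to read it, so relabelling or hiding a column is an edit to the document and needs no change to the processing code or the schema.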

Author information

Authors and Affiliations

Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK
Mark Papiani, Jasmin L. Wason & Denis A. Nicole

Editor information

Editors and Affiliations

Computer Science Department, University of California, Los Angeles, CA, 90095, USA
Carlo Zaniolo
Computer Science Department, University of Karlsruhe, P.O. Box 6980, 76128, Karlsruhe, Germany
Peter C. Lockemann
University of Konstanz, P.O. Box D188, 78457, Konstanz, Germany
Marc H. Scholl & Torsten Grust

About this paper

Cite this paper

Papiani, M., Wason, J.L., Nicole, D.A. (2000). An Architecture for Archiving and Post-Processing Large, Distributed, Scientific Data Using SQL/MED and XML. In: Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T. (eds) Advances in Database Technology — EDBT 2000. EDBT 2000. Lecture Notes in Computer Science, vol 1777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46439-5_31

DOI: https://doi.org/10.1007/3-540-46439-5_31
Published: 24 March 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67227-2
Online ISBN: 978-3-540-46439-6
eBook Packages: Springer Book Archive
