Infrastructure

The proteomic localization of CmPI/Cm4 depends on three different projects:

Bio Data WareHouse (BioDWH) (also at: SourceForge),
DAWIS-M.D. (based on BioDWH),
ANDCell.

[Figure: workflow from the JIB German-Russian paper]

BioDWH

BioDWH is a novel bioinformatics data warehouse software kit that integrates biological information from multiple public life science data sources into a local database management system. It stands out from other approaches by providing up-to-date integrated knowledge, platform and database independence as well as high usability and customization.

This open source software can be used as a general infrastructure for integrative bioinformatics research and development. The advantages of the approach are realized by using a Java-based system architecture and object-relational mapping (ORM) technology.

BioDWH is implemented in Java and uses a relational database management system as its backend, e.g., Oracle or MySQL. It provides an easy-to-use Java application for parsing the source data and loading it into the data warehouse. Several ready-to-use parsers for popular life science information systems are already available, including UniProt, KEGG, OMIM, GO, Enzyme, BRENDA, PDB, MINT, SCOP, EMBL-Bank, and PubChem. Furthermore, an XML-configurable monitor for data source updates is part of the system. For status requests to the data warehouse, we have developed a graphical user interface that works with every web browser.

Hibernate, a well-engineered object-relational mapping tool, is used as the persistence layer. It performs well and is independent of database vendors such as MySQL or Oracle. Additionally, the Hibernate framework fits well into the Java-based infrastructure of the data warehouse. A Java interface, together with object-relational mapping via Hibernate persistence or the Java Persistence API (JPA), constitutes a simple plug-in architecture for integrating new parsers.
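The plug-in idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual BioDWH code: the names `SourceParser`, `KeyValueFlatfileParser` and `ParserRegistry`, as well as the toy flatfile layout, are hypothetical. The persistence step through Hibernate or JPA is left out so that the example has no external dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract (NOT the real BioDWH interface):
// every parser names its data source and turns flatfile lines into records.
interface SourceParser {
    String sourceName();
    List<Map<String, String>> parse(List<String> flatfileLines);
}

// Toy parser for an assumed "KEY   value" layout in which a line
// starting with "//" terminates the current record.
class KeyValueFlatfileParser implements SourceParser {
    @Override
    public String sourceName() {
        return "ExampleSource";
    }

    @Override
    public List<Map<String, String>> parse(List<String> flatfileLines) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String line : flatfileLines) {
            if (line.startsWith("//")) {
                if (!current.isEmpty()) {
                    records.add(current);
                }
                current = new LinkedHashMap<>();
                continue;
            }
            // Split into the key and the rest of the line.
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                current.put(parts[0], parts[1]);
            }
        }
        if (!current.isEmpty()) {
            records.add(current); // last record without a terminator line
        }
        return records;
    }
}

// Hypothetical registry: adding a new data source means registering one more parser.
class ParserRegistry {
    private final Map<String, SourceParser> parsers = new LinkedHashMap<>();

    void register(SourceParser parser) {
        parsers.put(parser.sourceName(), parser);
    }

    SourceParser lookup(String sourceName) {
        return parsers.get(sourceName);
    }
}

class ParserPluginDemo {
    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register(new KeyValueFlatfileParser());
        List<String> lines = Arrays.asList(
                "ID   P00001", "DE   Example protein", "//", "ID   P00002", "//");
        System.out.println(registry.lookup("ExampleSource").parse(lines));
    }
}
```

In a real warehouse the parsed records would presumably be mapped entity objects handed to the persistence layer; plain maps are used here only to keep the sketch self-contained.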

The features of BioDWH are accessible through a graphical user interface. It is used to configure the monitor and the parsers for the different public life science data sources as well as the local database management system. XML-based configuration files specify the parameters for accessing and downloading the flatfiles from the original data sources, and control how the downloaded files are subsequently extracted for integration into the data warehouse.
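To make the role of such configuration files more concrete, the following sketch reads a per-source configuration with the standard Java XML API. The element and attribute names (`source`, `downloadUrl`, `targetDirectory`, `extract`), the class `SourceConfig` and the example URL are assumptions made for illustration; the real BioDWH configuration schema may look different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical settings for one data source; field names are illustrative only.
class SourceConfig {
    final String name;
    final String downloadUrl;
    final String targetDirectory;
    final boolean extractAfterDownload;

    SourceConfig(String name, String downloadUrl, String targetDirectory,
                 boolean extractAfterDownload) {
        this.name = name;
        this.downloadUrl = downloadUrl;
        this.targetDirectory = targetDirectory;
        this.extractAfterDownload = extractAfterDownload;
    }

    // Parses an XML string that follows the assumed layout shown in the demo below.
    static SourceConfig fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            return new SourceConfig(
                    root.getAttribute("name"),
                    childText(root, "downloadUrl"),
                    childText(root, "targetDirectory"),
                    Boolean.parseBoolean(childText(root, "extract")));
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid source configuration", e);
        }
    }

    // Returns the trimmed text of the first element with the given tag name.
    private static String childText(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        if (node == null) {
            throw new IllegalArgumentException("Missing element: " + tag);
        }
        return node.getTextContent().trim();
    }
}

class SourceConfigDemo {
    public static void main(String[] args) {
        String xml = "<source name=\"ExampleSource\">"
                + "<downloadUrl>ftp://example.org/pub/example.dat.gz</downloadUrl>"
                + "<targetDirectory>/tmp/example</targetDirectory>"
                + "<extract>true</extract>"
                + "</source>";
        SourceConfig config = SourceConfig.fromXml(xml);
        System.out.println(config.name + " -> " + config.downloadUrl
                + " (extract: " + config.extractAfterDownload + ")");
    }
}
```

Keeping these parameters in a file rather than in code means that a changed download location can be handled without recompiling the parser.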

T. Töpel, B. Kormeier, A. Klassen, R. Hofestädt: BioDWH: A data warehouse kit for life science data integration. Journal of Integrative Bioinformatics (JIB), 5(2):93, 2008.
http://journal.imbio.de/article.php?aid=93

DAWIS-M.D.

DAWIS-M.D. is a publicly available web-based system that integrates data from 11 different biomedical databases and can be connected to other analysis and visualization tools by web services. The advantages of the DAWIS-M.D. application are its usability, performance, high level of platform independence and wide range of life sciences information and biological knowledge.

DAWIS-M.D. is a platform-independent data warehouse approach for metabolic data. The name is an acronym for Data Warehouse Information System for Metabolic Data. The information system contains information from 11 different databases. The following data sources are integrated into the data warehouse: BRENDA, EMBL, HPRD, KEGG, OMIM, SCOP, Transfac, Transpath, ENZYME, GO and UniProt. The data in DAWIS-M.D. is divided into 12 biological domains, which can be accessed via the web-based graphical user interface (GUI). The application provides search forms for the biological domains Compound, Disease, Drug, Transcription Factor, Enzyme, Gene, Glycan, Gene Ontology, Pathway, Protein, Reaction and Reaction Pair.

The wide range of biomedical information in DAWIS-M.D. supports scientists in understanding complex biological systems and their properties. In addition, DAWIS-M.D. identifies relationships and interactions spanning multiple biological domains and is able to display this information. Biological networks can be drawn manually by scientists or generated automatically by using the DAWIS-M.D. data warehouse.

The networks are displayed in the VANESA software application and can be edited and extended by the scientist. All information and biological knowledge in VANESA is provided by the DAWIS-M.D. data warehouse, which is connected to the VANESA network editor via a web service. It is easy for the scientist to find specific information and insert it into the VANESA network editor, because DAWIS-M.D. provides an easy-to-use "remote control" to transfer meta information between the two tools. Furthermore, the system is independent of the underlying relational database management system (RDBMS). This was realized by using object-relational mapping (ORM) techniques.

K. Hippe, B. Kormeier, T. Töpel, S. Janowski, R. Hofestädt: DAWIS-M.D. - A Data Warehouse System for Metabolic Data. GI Jahrestagung, (2) 2010: 720-725.
http://subs.emis.de/LNI/Proceedings/Proceedings176/736.pdf

ANDCell

ANDCell is developed at the Institute of Cytology and Genetics, Novosibirsk. The original version of this partly commercial database contains a set of different databases which can be used to generate molecular/genetic networks. For CmPI, a subset of this database is used. This subset is based exclusively on PubMed abstracts which correlate proteins and their synonyms with cytological localizations.
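The kind of correlation described above, proteins and their synonyms linked to cytological localizations, can be pictured with a small lookup structure. The following Java sketch is purely illustrative and says nothing about how ANDCell is actually implemented: the class `LocalizationIndex`, its methods and the sample entries are assumptions. Each call to `addEvidence` stands for one abstract that mentions a protein, under any of its names, together with a localization.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory index: synonym -> canonical protein -> localization counts.
class LocalizationIndex {
    private final Map<String, String> synonymToProtein = new HashMap<>();
    private final Map<String, Map<String, Integer>> evidence = new HashMap<>();

    // Registers a synonym for a canonical protein name (both become resolvable).
    void addSynonym(String protein, String synonym) {
        synonymToProtein.put(normalize(protein), protein);
        synonymToProtein.put(normalize(synonym), protein);
    }

    // Records one piece of evidence, e.g. one abstract mentioning both terms.
    void addEvidence(String proteinOrSynonym, String localization) {
        String protein = resolve(proteinOrSynonym);
        evidence.computeIfAbsent(protein, key -> new TreeMap<>())
                .merge(localization, 1, Integer::sum);
    }

    // Returns localization -> number of supporting evidence entries.
    Map<String, Integer> localizationsOf(String proteinOrSynonym) {
        return evidence.getOrDefault(resolve(proteinOrSynonym), Collections.emptyMap());
    }

    // Unknown names are kept as given; known names map to the canonical protein.
    private String resolve(String name) {
        return synonymToProtein.getOrDefault(normalize(name), name);
    }

    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}

class LocalizationIndexDemo {
    public static void main(String[] args) {
        LocalizationIndex index = new LocalizationIndex();
        index.addSynonym("MPDZ", "MUPP1");            // sample entry for illustration
        index.addEvidence("MUPP1", "tight junction"); // made-up evidence counts
        index.addEvidence("mpdz", "tight junction");
        index.addEvidence("MPDZ", "cytoplasm");
        System.out.println(index.localizationsOf("Mupp1"));
    }
}
```

Resolving synonyms before counting is the design choice that matters here: evidence found under different names of the same protein ends up in one entry.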

O. A. Podkolodnaya, E. E. Yarkova, P. S. Demenkov, O. S. Konovalova, V. A. Ivanisenko, N. A. Kolchanov: “Application of the ANDCell Computer System to Reconstruction and Analysis of Associative Networks Describing Potential Relationships Between Myopia and Glaucoma.” Russian Journal of Genetics: Applied Research 1(1) 2011: 21–28.

CmPI Workflow

The complete workflow is described in the following freely available publication:

B. Sommer, E. S. Tiys, B. Kormeier, K. Hippe, S. J. Janowski, T. V. Ivanisenko, A. O. Bragin, A. Patrizio, P. S. Demenkov, A. V. Kochetov, V. A. Ivanisenko, N. A. Kolchanov, R. Hofestädt: “Visualization and Analysis of a Cardio Vascular Disease- and MUPP1-related Biological Network Combining Text Mining and Data Warehouse Approaches.” Journal of Integrative Bioinformatics 7(1) 2010: 148.
http://journal.imbio.de/article.php?aid=148

Details: Written by bjoern; Category: Cm4 PathwayIntegration; Published: 06 November 2013