Main Page
The open-source CDK-Taverna 2.0 project, based on the "Pipelining Technology" idea, tries to combine the advantages of software libraries and sophisticated information systems.
The project combines several other open-source projects:
- The Chemistry Development Kit (CDK) as a basic chemo-/bioinformatics library[2][3].
- The Waikato Environment for Knowledge Analysis (WEKA) machine learning library[4].
About The CDK-Taverna 2.0 Project
Pipelining or workflow tools allow for the Lego™-like, graphical assembly of I/O modules and algorithms into a complex workflow that can be easily deployed, modified and tested without the hassle of implementing it as a monolithic application.
The CDK-Taverna project aims at building an open-source pipelining solution by combining different open-source projects such as Taverna[1], the Chemistry Development Kit (CDK)[2][3] and Bioclipse[5]. A first integrated version of CDK-Taverna was recently released to the public[6].
Current development in CDK-Taverna refactors all workers (activities) as well as the complete setup on the basis of Taverna 2.2 and CDK 1.3.7, which themselves introduce major improvements to the whole platform. In addition, the CDK is enhanced with specific functions and options for reaction enumeration based on a reaction template and corresponding reactant libraries. Reaction enumeration supports combinatorial chemistry approaches in the drug discovery process of the pharmaceutical industry. The CDK enhancements are applied and illustrated by corresponding CDK-Taverna workflows.
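The combinatorial idea behind reaction enumeration can be illustrated with a minimal Python sketch. It is not the CDK implementation: real enumeration works on molecular structures and a mapped reaction template, whereas here the reactants are opaque labels and the template is a plain format string. The function name `enumerate_reactions`, the labels and the template text are all assumptions made for illustration.

```python
from itertools import product


def enumerate_reactions(template, *libraries):
    """Apply a text template to every combination of one reactant per library.

    Hypothetical helper: stands in for applying a reaction template to
    reactant libraries; no chemistry is performed.
    """
    return [template.format(*combo) for combo in product(*libraries)]


# Two illustrative reactant libraries (labels only, not real structures).
acids = ["acid_1", "acid_2"]
amines = ["amine_1", "amine_2", "amine_3"]

products = enumerate_reactions("{0} + {1} -> amide({0}, {1})", acids, amines)

for line in products:
    print(line)
```

With two reactants in the first library and three in the second, the sketch yields 2 × 3 = 6 combinations, which shows why reactant libraries of realistic size quickly lead to large product sets.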
Who Did It
CDK-Taverna was originally conceived by Christoph Steinbeck(1) and Achim Zielesny(2). It was developed by Thomas Kuhn and co-developed by Egon Willighagen(3). The ported and extended CDK-Taverna 2.0 version for Taverna 2.2 is currently being developed by Andreas Truszkowski(1,2), with support from Christoph Steinbeck(1), Achim Zielesny(2) and Egon Willighagen(3).
- (1) Chemoinformatics and Metabolism, European Bioinformatics Institute (EBI), Cambridge, UK
- (2) University of Applied Sciences of Gelsenkirchen, Institute for Bioinformatics and Chemoinformatics, Recklinghausen, Germany
- (3) Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
CDK-Taverna 2.0 Features
There are more than 180 activities in preparation:
- Input: MDL Molfiles, MDL SDFiles, RXN files, SMILES, CML files, JChemPaint structures
- Output: MDL Molfiles, MDL SDFiles, RXN files, SMILES, CML files, JPEG, PNG, PDF, JChemPaint structures
- Iterative file reading: MDL SDFiles, RXN files (supports large file sizes)
- Filter: Substructures, salts, atom types
- Reaction enumeration with advanced options
- Calculation of more than 90 QSAR descriptors
- ART-2a Classification
- Weka clustering algorithms
- Weka regression algorithms
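Iterative file reading keeps memory use low because only one record is held at a time. The following Python sketch shows the general idea for SD files, where each record ends with a line containing `$$$$`. It is not the CDK-Taverna reader; the function name `iter_sdf_records` is hypothetical and the demo text is placeholder content rather than valid Molfile blocks.

```python
import io
from typing import Iterator, List, TextIO


def iter_sdf_records(handle: TextIO) -> Iterator[str]:
    """Yield SD file records one by one; a '$$$$' line terminates a record."""
    lines: List[str] = []
    for line in handle:
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)
    # Yield trailing text that lacks a terminator, unless it is blank.
    if any(entry.strip() for entry in lines):
        yield "".join(lines)


# Placeholder data: two records with dummy content instead of real Molfiles.
demo = io.StringIO("mol A\nblock A\n$$$$\nmol B\nblock B\n$$$$\n")
records = list(iter_sdf_records(demo))
print(len(records))
```

In practice the generator would be consumed record by record (for example in a `for` loop over an open file) instead of being collected into a list, so that the whole file never has to fit into memory.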
For an overview and descriptions of the available activities click here.
CDK-Taverna 2.0 works on Microsoft Windows and Mac OS X (32-bit and 64-bit) and will be released soon.
Workflows
Workflows tackle the complexity of scientific experiments and applications. They give scientists the power to process their experimental data without any programming knowledge. Various types of tasks can be solved by simple modules that are combined to build powerful processing networks. Each module delivers only a small fragment of functionality, but combined they are able to perform increasingly complex tasks. The possibilities for such workflows are almost infinite.
A workflow consists of a sequence of connected steps. These steps are composed of three layers:
- The input layer
- The processing layer
- The output layer
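The three layers can be sketched in a few lines of Python. This is a conceptual illustration only and does not use the Taverna or CDK-Taverna API: all function names are hypothetical, the input is hard-coded instead of being read from a file, and the "salt filter" is a crude stand-in that only checks for the `.` character that separates disconnected components in SMILES.

```python
from typing import Callable, Iterable, List


def input_layer() -> List[str]:
    """Input layer: provide SMILES strings (hard-coded for this sketch)."""
    return ["CCO", "[Na+].[Cl-]", "c1ccccc1", "CC(=O)O"]


def drop_multi_component(smiles: Iterable[str]) -> List[str]:
    """Processing step: drop entries containing the '.' component separator."""
    return [s for s in smiles if "." not in s]


def sort_by_length(smiles: Iterable[str]) -> List[str]:
    """Processing step: order entries by string length."""
    return sorted(smiles, key=len)


def output_layer(smiles: Iterable[str]) -> str:
    """Output layer: render the result as one entry per line."""
    return "\n".join(smiles)


def run_workflow(
    source: Callable[[], List[str]],
    steps: List[Callable[[Iterable[str]], List[str]]],
    sink: Callable[[Iterable[str]], str],
) -> str:
    """Connect the layers: feed the input through each step, then to the output."""
    data = source()
    for step in steps:
        data = step(data)
    return sink(data)


result = run_workflow(input_layer, [drop_multi_component, sort_by_length], output_layer)
print(result)
```

Each module does very little on its own; the workflow is defined by the order in which the modules are connected, so steps can be added, removed or rearranged without touching the others.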
CDK-Taverna 2.0 workflow examples can be found here.
Getting Started with CDK-Taverna 2.0
- How to install the CDK-Taverna 2.0 plugin in Taverna
- How to install the CDK-Taverna 2.0 sources and the developer workbench
- How to run the CDK-Taverna 2.0 developer workbench
- How to write your own CDK-Taverna 2.0 activity
- How to create your first CDK-Taverna 2.0 workflow
- Full descriptions of available activities
Example Workflows
- Reaction Enumeration Workflows
- QSAR Workflows
- ART-2a Clustering Workflows
- WEKA Clustering Workflows
- Miscellaneous Workflows
Getting Involved
People interested in contributing to the development of the CDK-Taverna 2.0 plugin should take a closer look at the Getting Started section of this page.
References
- [1] Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR, Wipat A, Li P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 2004, 20(17):3045-3054.
- [2] Steinbeck C, Han YQ, Kuhn S, Horlacher O, Luttmann E, Willighagen E: The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences 2003, 43(2):493-500.
- [3] Steinbeck C, Hoppe C, Kuhn S, Guha R, Willighagen EL: Recent Developments of The Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Current Pharmaceutical Design 2006, 12(17):2111-2120.
- [4] Frank E, Hall M, Holmes G, Kirkby R, Pfahringer B, Witten IH, et al.: Weka - a machine learning workbench for data mining. In Data Mining and Knowledge Discovery Handbook. Springer; 2010:1269-1277. Retrieved May 10, 2011, from http://www.springerlink.com/index/V2U27361775071V7.pdf.
- [5] Spjuth O, Helmus T, Willighagen EL, Kuhn S, Eklund M, Steinbeck C, Wikberg JE: Bioclipse: An open rich client workbench for chemo- and bioinformatics. BMC Bioinformatics 2007, 8:59.
- [6] Kuhn T, Willighagen EL, Zielesny A, Steinbeck C: CDK-Taverna: an open workflow environment for cheminformatics. BMC Bioinformatics 2010, 11:159.
WebLinks
Contact
For further questions, feel free to contact us at the CDK-Taverna mailing list:
https://lists.sourceforge.net/lists/listinfo/cdk-taverna