Full Descriptions Of Available Activities

From CDK-Taverna 2.0 Wiki

This page describes all activities available in the CDK-Taverna 2.0 project and gives a short introduction to their use.

I/O

The I/O folder provides basic activities for reading and writing different types of data from and to hard-disk. They all have an input port to specify the files to read from or the destination to write the data to. File writers that can handle iteratively incoming data can be configured to write either one file per iteration or a single file for all iterations. Every file writer provides a list of all written files.

Draft of the SD-file reader and SD-file writer activities and the configuration panel of the writer activity.
Note: With the "One file per iteration" checkbox it is possible to decide whether to write one file per iteration or one single file for all iterations.

ARFF File Reader

Reads ARFF files from hard-disk. Sets the last attribute as the class attribute when its name is "Class".

ARFF File Writer

Writes ARFF files to hard-disk.

CML Chem File Reader

Reads CML Chem Files from hard-disk.

CML Chem File Writer

Writes CML Chem Files to hard-disk.

CSV File Reader

Reads CSV Files from hard-disk.

CSV File Writer

Writes CSV Files to hard-disk. Specially designed to write CSV data coming from QSAR vector data.

MDL SDFile Reader

Reads MDL SDFiles from hard-disk.

MDL SDFile Writer

Writes MDL SDFiles to hard-disk.

MDL Mol File Reader

Reads MDL Mol Files from hard-disk.

MDL Mol File Writer

Writes MDL Mol Files to hard-disk.

MDL RXN File Reader

Reads MDL RXN Files from hard-disk.

Multi MDL RXN File Reader

Reads Multi MDL RXN Files from hard-disk. This format is not a native MDL format. It is made up of a list of MDL RXN strings separated by a "$$$$" delimiter string.
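
As a rough illustration of this layout, the following Python sketch splits the text of such a file into its individual RXN blocks. The function name split_multi_rxn is hypothetical and not part of CDK-Taverna; the sketch only relies on the "$$$$" delimiter described above.

```python
def split_multi_rxn(text, delimiter="$$$$"):
    """Split multi-RXN text into individual RXN blocks (illustrative only)."""
    blocks = []
    for chunk in text.split(delimiter):
        chunk = chunk.strip("\r\n")  # drop the line breaks around the delimiter
        if chunk.strip():            # skip empty chunks, e.g. after the last delimiter
            blocks.append(chunk)
    return blocks


# Toy input: two placeholder reactions, each followed by the delimiter.
sample = "$RXN\nfirst reaction\n$$$$\n$RXN\nsecond reaction\n$$$$\n"
reactions = split_multi_rxn(sample)
print(len(reactions))
```

Each returned block could then be handed to an ordinary MDL RXN parser.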

MDL RXN File Writer

Writes MDL RXN Files to hard-disk.

SMILES File Reader

Reads SMILES Files from hard-disk.

SMILES File Writer

Writes SMILES Files to hard-disk.

Text File Writer

Writes incoming string data to hard-disk.

XRFF File Reader

Reads XRFF files from hard-disk.

XRFF File Writer

Writes XRFF files to hard-disk.

Iterative I/O

The iterative I/O folder provides the ability to handle huge files by reading them iteratively. The activities have to be configured like the basic input activities. They all have an input port to specify the files to read the data from. Additionally, you can adjust the number of elements read per iteration through the second port.

Draft of the iterative SD-File reader activity.
Note: To avoid out-of-memory errors, uncheck the In-memory storage checkbox. This is only necessary if the data caching feature of the plugin is disabled.

Consume State

This activity is needed for the iterative loop reader activities. You have to connect an activity to the "state" output port of the loop activities because otherwise the port will not be evaluated. For an example, have a look at the Loop SDFile Reader activity.

DataCollectorAcceptor/DataCollectorEmitter

These two activities can only be used in combination with each other. The acceptor activity caches all the data coming from an iterative source. Afterwards, the emitter activity reads the cached data at once and provides all of it in a single invocation to the subsequent workflow.

Example workflow to show the usage of the DataCollectorAcceptor and DataCollectorEmitter activity.
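
The accept-then-emit pattern can be pictured with a small Python sketch. The class DataCollector and its methods are hypothetical stand-ins for the two activities, not the plugin's actual implementation; they only mirror the behaviour described above (cache per iteration, release everything in one go).

```python
class DataCollector:
    """Hypothetical stand-in for the acceptor/emitter pair."""

    def __init__(self):
        self._cache = []

    def accept(self, batch):
        # Called once per iteration: store the incoming items.
        self._cache.extend(batch)

    def emit(self):
        # Called once after the last iteration: hand over everything at once
        # and leave the cache empty.
        items, self._cache = self._cache, []
        return items


collector = DataCollector()
for batch in (["a", "b"], ["c"], ["d", "e"]):  # three iterations of a source
    collector.accept(batch)
all_items = collector.emit()
```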

Iterative RXN File Reader

Iterative file reader for MDL RXN files. The provided file chooser has multi file selection enabled.

Iterative Multi RXN File Reader

Iterative file reader for multi MDL RXN files. This format is not a native MDL format. It is made up of a list of MDL RXN strings separated by a "$$$$" delimiter string.

Iterative SDFile Reader

Iterative file reader for MDL SDFiles.
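
To give an idea of what reading a fixed number of elements per iteration means, here is a minimal Python sketch. It is not the activity's code: the function iter_sd_batches and its batch_size parameter are hypothetical, and the sketch assumes that every record ends with the usual "$$$$" line of the SD format.

```python
import io


def iter_sd_batches(handle, batch_size):
    """Yield lists of at most batch_size SD records from an open text handle."""
    batch, record = [], []
    for line in handle:
        record.append(line)
        if line.strip() == "$$$$":  # end-of-record marker in SD files
            batch.append("".join(record))
            record = []
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller batch


# Toy input: five placeholder records instead of real molecule blocks.
sd_text = "".join(f"mol{i}\nplaceholder block\nM  END\n$$$$\n" for i in range(5))
batches = list(iter_sd_batches(io.StringIO(sd_text), batch_size=2))
print([len(b) for b in batches])
```

Because the function is a generator, only one batch is held in memory at a time, which is the point of the iterative readers.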

Loop SDFile Reader

Iterative file reader for MDL SDFiles. Unlike the Iterative RXN/SD File readers, the whole nested workflow is executed before the next iteration step starts.

Example workflow to show the usage of the Loop SDFile Reader Activity.
To configure the loop condition go to the "1 Details" tab and press "2 Add looping" in the advanced menu. Afterwards set the looping condition under point 3 and confirm by pressing the "4 OK" button.

Loop RXN File Reader

Iterative file reader for MDL RXN files. Unlike the Iterative RXN/SD File readers, the whole nested workflow is executed before the next iteration step starts. The configuration process is the same as for the Loop SDFile Reader activity.

String Converter

These activities convert string data to the data format used within the CDK-Taverna 2.0 project and back. They do not need to be configured.

CML String to Structures Converter

Converts given CML strings to CDK-Taverna 2.0 structure objects.

Structure to CML String Converter

Converts given CDK-Taverna 2.0 structure objects to CML strings.

MDL Mol String to Structures Converter

Converts given MDL Mol File strings to CDK-Taverna 2.0 structure objects.

Structure to MDL Mol String Converter

Converts given CDK-Taverna 2.0 structure objects to MDL Mol File strings.

MDL SDFile String to Structures Converter

Converts given MDL SDFile strings to CDK-Taverna 2.0 structure objects.

Structure to SDFile String Converter

Converts given CDK-Taverna 2.0 structure objects to MDL SDFile strings.

MDL RXN String to Reaction Converter

Converts given MDL RXN strings to CDK-Taverna 2.0 reaction objects.

Reaction to MDL RXN String Converter

Converts given CDK-Taverna 2.0 reaction objects to MDL RXN strings.

SMILES String to Structure Converter

Converts given SMILES strings to CDK-Taverna 2.0 structure objects.

Structure to SMILES String Converter

Converts given CDK-Taverna 2.0 structure objects to SMILES strings.

Filter

This folder provides different filter activities.

Atom Type Filter

Filters out molecules that contain atom types unknown to the CDK.

Doublets Filter

Filters out duplicate structures.

Rule Of Five Filter

Filters out structures which fail the Rule Of Five.
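
For orientation, the commonly cited Lipinski thresholds can be written down as a short Python sketch. It works on precomputed property values rather than on structures, the function names are hypothetical, and how many violations the activity itself tolerates is not stated on this page, so max_violations is an assumed parameter.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count violations of Lipinski's Rule of Five for precomputed properties."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        log_p > 5,          # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_rule_of_five(props, max_violations=0):
    # max_violations is an assumption; adjust it to the desired strictness.
    return rule_of_five_violations(**props) <= max_violations


# Made-up property values, for illustration only.
small = {"mol_weight": 180.0, "log_p": 1.2, "h_donors": 1, "h_acceptors": 4}
greasy = {"mol_weight": 720.0, "log_p": 7.5, "h_donors": 2, "h_acceptors": 6}
kept = [p for p in (small, greasy) if passes_rule_of_five(p)]
```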

Isomorphism

This folder provides activities for basic isomorphism mapping between molecular structures.

Isomorphism Tester

Performs a test whether two structures are structurally identical.

Subgraphisomorphism Filter

Filters structures by whether or not they contain the query subgraph structure.

JChemPaint

JChemPaint is an editor and viewer for 2D chemical structures. The activity can be used to edit chemical structures at runtime.

Configuration panel of the JChemPaint Activity containing the structure editor.

Miscellaneous

This folder contains several activities that do not fit into any of the other categories.

Add Explicit Hydrogens

Adds explicit hydrogens to the structures.

Add Implicit Hydrogens

Adds implicit hydrogens to the structures.

Hueckel Aromaticity Detector

Detects the aromaticity based on the Hueckel 4n+2 pi-electrons rule applied to isolated ring systems.
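
The counting part of the rule is simple enough to state in a few lines of Python. This is only a sketch of the 4n+2 test on an already known pi-electron count; perceiving ring systems and counting their pi electrons, which is the real work done by the CDK, is not covered, and the function name is hypothetical.

```python
def satisfies_hueckel(pi_electrons):
    """Return True if the electron count has the form 4n + 2 with n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Pi-electron counts of some well-known ring systems.
benzene, cyclobutadiene, naphthalene = 6, 4, 10
print(satisfies_hueckel(benzene), satisfies_hueckel(cyclobutadiene))
```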

Reaction Reactant Splitter

Splits a given reaction into its reactants.

Reaction Splitter

Sorts given reactions according to their number of reactants.

Tag Molecules With UUID

Tags given molecules with a UUID.

UUID Generator

Generates UUIDs.
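
What tagging with a UUID amounts to can be sketched with Python's standard uuid module. The molecule records are plain dictionaries here, the property key "uuid" is hypothetical, and random (version 4) UUIDs are an assumption, since this page does not say which UUID variant the activities produce.

```python
import uuid


def tag_with_uuid(molecules, key="uuid"):
    """Attach a fresh random UUID string to every molecule record."""
    for molecule in molecules:
        molecule[key] = str(uuid.uuid4())
    return molecules


molecules = tag_with_uuid([{"name": "mol-1"}, {"name": "mol-2"}])
print(molecules[0]["uuid"])
```

The identifier stays attached to the molecule, so later results (for example cluster memberships) can be traced back to the structure they belong to.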

Modelling

2D-Coordinates Generator

Generates 2D-Coordinates for molecules.

Molecule Curation

This folder provides activities to curate compound libraries.

Curate Strange Elements

Sorts out molecules that contain atom types other than the following: C, H, N, O, P, S, Cl, F, As, Se, Br, I, B.
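
The element check itself boils down to a set comparison, as the following Python sketch shows. It operates on lists of element symbols instead of real structure objects, and the function names are hypothetical.

```python
ALLOWED_ELEMENTS = {"C", "H", "N", "O", "P", "S", "Cl", "F", "As", "Se", "Br", "I", "B"}


def has_only_allowed_elements(element_symbols):
    """True if every symbol belongs to the allowed element list."""
    return set(element_symbols) <= ALLOWED_ELEMENTS


def curate(molecules):
    """Split (name, symbols) pairs into kept and sorted-out molecule names."""
    kept, sorted_out = [], []
    for name, symbols in molecules:
        target = kept if has_only_allowed_elements(symbols) else sorted_out
        target.append(name)
    return kept, sorted_out


kept, sorted_out = curate([
    ("ethanol", ["C", "C", "O", "H", "H", "H", "H", "H", "H"]),
    ("ferrocene-like", ["Fe", "C", "C", "H", "H"]),
])
```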

Molecule Connectivity Checker

Sorts out counter ions from given structures. The maximum size for ions is definable.

Remove Sugar Groups

Reaction Enumerator

The enumeration activity generates a virtual compound library based on a generic reaction and its associated reactant lists.
Example workflows can be found here.

Reaction enumeration example.

Reaction Enumerator

The Reaction Enumerator activity.

Configuration panel of the Reaction Enumerator Activity.
  • Variable Region Checker: Gives the user the possibility to define variable regions. This makes the generic template reactions much more flexible.
Currently available variable regions.
  • Multi match checker: Checks whether the reactant matches the reaction template more than once, so that every possible result reaction can be enumerated.
Multi match example

Reaction Enumerator Subgraph Filter

Specially designed subgraph filter for the Reaction Enumerator activity.

Renderer

These activities visualize given structures or reactions as images in JPG, PNG or PDF format. They all have an input port to specify the destination for writing the data.

Draft of the Write Molecule As JPG activity.

Write Molecule As JPG

Renders given structures into JPG image files.

Write Molecule As PNG

Renders given structures into PNG image files.

Write Molecule As PDF

Renders given structures into a single PDF file.

Write Reaction As PDF

Renders given reactions into a single PDF file.

QSAR

This folder contains activities for the calculation and the processing of QSAR descriptor results.
Example workflows can be found here.
Note: It is strongly recommended to write one file per iteration during iterative workflows. When the workflow is finished merge the CSVs with the Merge CSVs To QSAR Vector activity.

Calculate QSAR Vector Statistics

Evaluates some statistics about the calculated QSAR descriptor values and shows the ratio between calculated and not calculated QSAR descriptor values.

Create Fingerprint Item List From QSAR Vector

Creates a fingerprint item list from a given QSAR Vector. This list is necessary for using the ART-2a classification activity.

CSV To QSAR Vector

CSV file reader that converts the CSV data into a QSAR Vector.

Merge CSVs To QSAR Vector

Multi CSV file reader which merges the different CSV files into one QSAR Vector.

Curate QSAR Vector

Removes descriptor values that could not be calculated from the given QSAR Vector and removes columns whose minimum and maximum values are identical. You can choose between three curation methods:

  1. Dynamic curation between rows and columns: Tries to maximize the number of remaining descriptor values. This curation type is intermediate between curation types 2 and 3.
  2. Curate only columns: Discards the columns that contain descriptor values which were not calculated.
  3. Curate only rows: Discards the rows (molecules) that contain descriptor values which were not calculated.

Additionally, you can choose whether columns whose descriptor values do not differ between minimum and maximum should be discarded.
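
The column-only and row-only methods, together with the min/max option, can be sketched in plain Python. The representation is an assumption: a QSAR Vector is modelled as a dictionary that maps a molecule ID to a dictionary of descriptor values, with None standing for a value that was not calculated. The function names and the toy values are hypothetical, and the dynamic method is left out because its algorithm is not described here.

```python
def curate_columns(table):
    """Drop every descriptor (column) that has a missing value in any row."""
    names = list(next(iter(table.values()), {}))
    keep = [n for n in names if all(row[n] is not None for row in table.values())]
    return {mol: {n: row[n] for n in keep} for mol, row in table.items()}


def curate_rows(table):
    """Drop every molecule (row) that has at least one missing value."""
    return {mol: row for mol, row in table.items()
            if all(v is not None for v in row.values())}


def drop_constant_columns(table):
    """Drop descriptors whose minimum and maximum are equal (missing values ignored)."""
    names = list(next(iter(table.values()), {}))
    keep = []
    for n in names:
        values = [row[n] for row in table.values() if row[n] is not None]
        if values and min(values) != max(values):
            keep.append(n)
    return {mol: {n: row[n] for n in keep} for mol, row in table.items()}


# Toy values, not real descriptor results.
qsar = {
    "mol-1": {"Weight": 180.0, "XLogP": 1.2, "AtomCount": 9},
    "mol-2": {"Weight": 94.0, "XLogP": None, "AtomCount": 9},
    "mol-3": {"Weight": 46.0, "XLogP": -0.1, "AtomCount": 9},
}
by_columns = curate_columns(qsar)
by_rows = curate_rows(qsar)
varying = drop_constant_columns(qsar)
```

In this toy table, column curation keeps all three molecules but loses XLogP, while row curation keeps all descriptors but loses mol-2. This is the trade-off the dynamic method tries to balance.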

Configuration panel of the Curate QSAR Vector Activity.

Merge QSAR Vectors

Merges the given QSAR Vectors into one resulting QSAR Vector, which contains a minimum subset of the existing descriptors. The number of QSAR Vectors to merge is configurable.
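
One plausible reading of "minimum subset" is that only descriptors present in every input vector survive the merge. The Python sketch below implements that reading; it is an interpretation, not a statement about the activity's code. Vectors are again modelled as dictionaries from molecule ID to descriptor values, and the function name is hypothetical.

```python
def merge_qsar_vectors(*vectors):
    """Merge vectors, keeping only descriptors present in every input row."""
    common = None
    for vector in vectors:
        for row in vector.values():
            names = set(row)
            common = names if common is None else common & names
    common = sorted(common or [])
    merged = {}
    for vector in vectors:
        for mol, row in vector.items():
            merged[mol] = {n: row[n] for n in common}
    return merged


# Toy values, not real descriptor results.
first = {"m1": {"Weight": 1.0, "TPSA": 2.0}}
second = {"m2": {"Weight": 3.0, "XLogP": 0.5}}
merged = merge_qsar_vectors(first, second)
```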

QSAR Descriptor

This activity combines the power of all QSAR descriptors in one single activity. You can choose all available descriptors to be calculated at once.

Configuration panel of the QSAR Descriptor Activity.

QSAR Descriptor Threaded (Experimental)

This activity is based on the QSAR Descriptor activity but can use multithreading for the QSAR descriptor calculations. You can set the number of threads in the configuration panel.
Note: It is tagged as experimental because the CDK is not explicitly thread safe.

QSAR Vector Generator

Extracts the QSAR descriptor values from structures and generates a QSAR Vector.

QSAR Vector To CSV

Generates a CSV string from a QSAR Vector.

QSAR Descriptors

List of all available QSAR descriptors. Read all about their chemical properties in the technical literature or in the CDK JavaDoc. These activities do not need to be configured.

Atomic QSAR Descriptors

Atomic QSAR Descriptors JavaDoc

  • AtomDegree
  • AtomHybridization
  • AtomHybridizationVSEPR
  • AtomValence
  • BondsToAtom
  • CovalentRadius
  • DistanceToAtom
  • EffectiveAtomPolarizability
  • InductiveAtomicHardness
  • InductiveAtomicSoftness
  • IPAtomicHOSE
  • IPAtomicLearning
  • IsProtonInAromaticSystem
  • IsProtonInConjugatedPiSystem
  • PartialPiCharge
  • PartialSigmaCharge
  • PartialTChargeMMFF94
  • PartialTChargePEOE
  • PeriodicTablePosition
  • PiElectronegativity
  • ProtonAffinityHOSE
  • ProtonTotalPartialCharge
  • SigmaElectronegativity
  • StabilizationPlusCharge
  • VdWRadius

Atomic Proton QSAR Descriptors

Atomic Proton QSAR Descriptors JavaDoc

  • RDFProton_G3R
  • RDFProton_GDR
  • RDFProton_GHR
  • RDFProton_GHR_topol
  • RDFProton_GSR

Atompair QSAR Descriptors

Atompair QSAR Descriptors JavaDoc

  • PiContactDetection

Bond QSAR Descriptors

Bond QSAR Descriptors JavaDoc

  • AtomicNumberDifference
  • BondPartialPiCharge
  • BondPartialSigmaCharge
  • BondPartialTCharge
  • BondSigmaElectronegativity
  • IPBondLearning

Molecular QSAR Descriptors

Molecular QSAR Descriptors JavaDoc

  • AcidicGroupCount
  • ALOGP
  • AminoAcidCount
  • APol
  • AromaticAtomsCount
  • AromaticBondsCount
  • AtomCount
  • Autocorrelation_Charge
  • Autocorrelation_Mass
  • Autocorrelation_Polarizability
  • BasicGroupCount
  • BCUT
  • BondCount
  • BPol
  • CarbonTypes
  • ChiChain
  • ChiCluster
  • ChiPath
  • ChiPathCluster
  • CPSA
  • EcentricConnectivityIndex
  • FMF
  • FragmentComplexity
  • GravitationalIndex
  • HBondAcceptorCount
  • HBondDonorCount
  • HybridizationRatio
  • IPMolecularLearning
  • KappaShapeIndices
  • KierHallSmarts
  • LargestChain
  • LargestPiSystem
  • LengthOverBreadth
  • LongestAliphaticChain
  • MannholdLogP
  • MDE
  • MomentOfInertia
  • PetitjeanNumber
  • PetitjeanShapeIndex
  • RotatableBondsCount
  • RuleOfFive
  • TPSA
  • VAdjMa
  • Weight
  • WeightedPath
  • WHIM
  • WienerNumbers
  • XLogP
  • ZagrebIndex

Protein QSAR Descriptors

Protein QSAR Descriptors JavaDoc

  • TaeAminoAcid

ART-2a Clustering

This folder provides activities for the classification of input data. The used algorithm is the ART-2a classification algorithm.
Example workflows can be found here.

ART-2a Clusterer

This activity implements the ART-2a classification algorithm. There are six parameters to configure:

  1. Number of classifications: Determines the number of classifications within the interval between the lower and upper vigilance parameter limits.
  2. The upper vigilance limit. The vigilance parameter determines the number of resulting classes. The higher the vigilance parameter, the higher the number of resulting classes.
  3. The lower vigilance limit.
  4. The maximum classification time.
  5. Scale fingerprint items to values between 0.0 and 1.0.
  6. The output directory of the classification result files.
Configuration panel of the ART-2a Clusterer Activity.
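
Two of these parameters lend themselves to a short Python sketch: the vigilance values derived from parameters 1 to 3, and the scaling of parameter 5. Both functions are hypothetical. In particular, spacing the vigilance values evenly between the two limits and scaling every fingerprint component separately by min-max are assumptions, since this page does not spell out either detail.

```python
def vigilance_values(lower, upper, count):
    """Return count vigilance values spread evenly from lower to upper (assumed spacing)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [lower]
    step = (upper - lower) / (count - 1)
    return [lower + i * step for i in range(count)]


def scale_columns(items):
    """Min-max scale every component (column) of the items to [0.0, 1.0]."""
    columns = list(zip(*items))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        # A constant column has span 0 and is mapped to 0.0.
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(item, lows, spans)]
        for item in items
    ]


vigilances = vigilance_values(0.0, 1.0, 5)
scaled = scale_columns([[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]])
```

One classification run would then be carried out per vigilance value, with higher values yielding more classes.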

ART-2a result As PDF

Visualizes the results of an ART-2a clustering in a PDF file. The output directory is the same as for the ART-2a Clusterer activity results.

ART-2a result As PDF File Reader

Has the same functionality as the ART-2a result As PDF activity, but the ART-2a results can be chosen directly from hard-disk. The output directory is the directory of the input files.

ART-2a Result Considering Different Origins As PDF

This activity visualizes the fraction of each origin in the resulting classes so that it is possible to determine the similarity between different compound sources. Equal fractions within the classes indicate a high similarity between the sources. The output directory is the same as for the ART-2a Clusterer activity results.
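
The fractions that such a plot is built from can be computed as in the following Python sketch. The input format, a list of (class, origin) pairs, and the function name are assumptions made for the example.

```python
from collections import Counter, defaultdict


def origin_fractions(assignments):
    """Map each class to the fraction of its members coming from each origin.

    assignments: iterable of (class_id, origin) pairs.
    """
    counts = defaultdict(Counter)
    for class_id, origin in assignments:
        counts[class_id][origin] += 1
    return {
        class_id: {o: n / sum(c.values()) for o, n in c.items()}
        for class_id, c in counts.items()
    }


# Toy assignment of six molecules from two sources to two classes.
fractions = origin_fractions([
    (0, "library A"), (0, "library B"),
    (1, "library A"), (1, "library A"), (1, "library B"), (1, "library B"),
])
```

Here both classes contain the two sources in equal parts, which is the situation described above as a sign of similar sources.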

Weka Clustering

This folder provides activities for clustering and result visualisation. It uses the Weka Machine Learning library. All result activities use the same output directory as the Weka clustering worker.
Example workflows can be found here.

Weka Clustering

This activity implements the clustering algorithms from the Weka Machine Learning library. It provides the following algorithms:

  1. EM (Expectation Maximisation)
  2. Farthest First
  3. Hierarchical Clustering
  4. Simple K Means
  5. X Means
Configuration panel of the weka clustering activity

Setup instructions for the activity:

  1. Choose clustering algorithm from list.
  2. Click the configure button.
  3. Configure the chosen clusterer and press OK.
  4. Add job to job list.
  5. Choose the output directory.

Finally apply the settings and close the configuration panel. To add more than one job to the list repeat steps 1-4.

Create Weka Dataset From QSAR Vector

Converts a QSAR vector to a weka dataset.

Extract Clustering Result As CSV

Writes statistics and UUID cluster memberships into files.

Extract Clustering Result As PDF

Visualizes the clustering result as PDF.

Clustering Result Considering Different Origins As PDF

Visualizes the clustering result from different origins and shows the ratio of the sources in the different clusters. The file is saved as a PDF file. The activity uses the same output directory as the Weka clustering worker.

Generate Silhouette Plot From Clustering Result As CSV

Performs a silhouette analysis of the given clustering result and saves the result as a CSV file.

Generate Silhouette Plot From Clustering Result As PDF

Performs a silhouette analysis of the given clustering result and visualizes the results as a chart in a PDF file.
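
For readers unfamiliar with the method: the silhouette coefficient of a data point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of its own cluster and b(i) is the smallest mean distance to the members of any other cluster. The Python sketch below computes it for a toy data set. It assumes Euclidean distance and sets s(i) to 0.0 for single-member clusters; it is an illustration of the standard definition, not of the activity's code.

```python
import math


def silhouette_values(points, labels):
    """Return the silhouette coefficient s(i) for every point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    result = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            result.append(0.0)  # convention for single-member clusters
            continue
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        result.append((b - a) / denom if denom else 0.0)
    return result


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = silhouette_values(points, [0, 0, 1, 1])
with_singleton = silhouette_values(points, [0, 1, 1, 1])
```

Values close to 1 indicate that a point sits well inside its cluster, values near 0 that it lies between clusters, and negative values that it is probably assigned to the wrong cluster.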

Split Molecules Into Clusters

Splits the given molecules into their assigned clusters. A single MDL SDFile is created for every cluster. The ID Cluster CSV file can be generated with the Extract Clustering Result As CSV activity.

WEKA Regression

This folder provides activities for regression problems and result visualization and evaluation. The activities use the Weka Machine Learning library algorithms.

Weka Regression

This activity implements the regression algorithms from the Weka Machine Learning library. It provides the following algorithms:

  1. Multiple Linear regression
  2. Three-layer perceptron-type neural networks
  3. Support Vector Machines
  4. M5P Regression Trees
Configuration panel for the weka regression activity

Create Weka Regression Dataset

This activity creates a regression dataset from a basic Weka dataset, e.g. one provided by the Create Weka Dataset From QSAR Vector activity. The first attribute of the dataset has to be a UUID followed by n numeric data attributes. The last field is the class field and has to be numeric and named "Class". The ID Class CSV File consists of two columns. The first column contains the UUID and the second column the numeric class value.

The Create Weka Regression Dataset activity.
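
The layout described above can be illustrated with a small Python sketch that joins an ID Class CSV onto a dataset. The dataset is modelled as a dictionary with "attributes" and "rows", the function names are hypothetical, and a comma-separated file without a header line is an assumption, since the exact CSV dialect is not specified here.

```python
import csv
import io


def read_id_class_csv(handle):
    """Read a two-column CSV of (UUID, numeric class value) into a dict."""
    return {row[0]: float(row[1]) for row in csv.reader(handle) if row}


def attach_class(dataset, id_to_class):
    """Append the class value as the last field, named "Class", to every row."""
    header = dataset["attributes"] + ["Class"]
    rows = [row + [id_to_class[row[0]]] for row in dataset["rows"]]
    return {"attributes": header, "rows": rows}


# Toy data: the first attribute is the UUID, followed by numeric attributes.
dataset = {
    "attributes": ["UUID", "Weight", "XLogP"],
    "rows": [["id-1", 180.0, 1.2], ["id-2", 94.0, 1.5]],
}
id_class_csv = io.StringIO("id-1,1.5\nid-2,2.0\n")
regression_set = attach_class(dataset, read_id_class_csv(id_class_csv))
```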

Split Dataset Into Train-/Testset

This activity splits a given dataset into a trainset and a testset. There are three algorithms available:

  1. Random:
    Composes the sets randomly.
  2. Cluster Representatives:
    Uses the simple KMeans clusterer to assemble the sets.
  3. Single Global Max:
    Also uses the simple KMeans clusterer to assemble the sets. Afterwards, however, the algorithm tries to optimize the sets by moving the worst datapoint from the testset into the trainset. This step is performed for a certain number of iterations. Within every iteration a classification step is performed to determine the worst described datapoint in the testset. The blacklisting should usually be enabled because the algorithm is very prone to oscillation and the blacklisting suppresses this behaviour.
The configuration panel for the Split Dataset Into Train-/Testset activity.
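
The simplest of the three methods, the random split, can be sketched in Python as follows. The function name, the train_fraction parameter and the fixed seed are hypothetical choices for the example; the two KMeans-based methods are not covered.

```python
import random


def random_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and cut them into a trainset and a testset."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


trainset, testset = random_split(range(10))
print(len(trainset), len(testset))
```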

GA Attribute Selection

The activity uses a genetic algorithm to find an optimized set of attributes.

The configuration panel for the GA Attribute Selection activity.

Heuristic Attribute Selection

The activity tries to sort the attributes according to their relevance for the underlying machine learning problem. The algorithm evaluates the performance of every attribute and leaves out the worst. This step is repeated until only one attribute remains.
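
The elimination loop can be sketched in Python as a backward elimination. How the activity measures the performance of an attribute is not described on this page, so the sketch takes a caller-supplied score function and treats the attribute whose removal leaves the best score as the worst one. Both the function name and that criterion are assumptions.

```python
def rank_attributes(attributes, score):
    """Return attributes ordered from least to most relevant.

    score(subset) is a caller-supplied function; higher means a better model.
    """
    remaining = list(attributes)
    eliminated = []
    while len(remaining) > 1:
        # The "worst" attribute is the one whose removal leaves the best score.
        worst = max(remaining, key=lambda a: score([r for r in remaining if r != a]))
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated + remaining


# Toy score: every attribute contributes a fixed, made-up amount.
weights = {"a": 0.1, "b": 0.7, "c": 0.3}
ranking = rank_attributes(weights, lambda subset: sum(weights[x] for x in subset))
print(ranking)
```

In practice the score function would train and evaluate a regression model on the given attribute subset, which makes every elimination round considerably more expensive than in this toy example.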

Evaluate Regression Results as PDF

This activity produces a PDF containing different plots and statistics characterising the used machine learning model.

Contact

For further questions, feel free to contact us at the CDK-Taverna mailing list:
https://lists.sourceforge.net/lists/listinfo/cdk-taverna
