GPFlow

Gardens Point Flow (GPFlow) is an intuitive workflow environment that supports biologists in their research. The workflow wraps legacy tools and presents them to biologists through a high-level, interactive, web-based frontend. The workflow backend is realized by a commercial-grade workflow engine (Windows Workflow Foundation). The goal of GPFlow is flexibility and simplicity.


A larger view of our GPFlow demo can be found here.


Collections

GPFlow implements a simple but powerful data channeling model which allows the user to perform experiments that iterate over collections of input values, form new data values by aggregating disparate values, and form collections by partitioning subsets of the accumulated corpus of results. Single-valued workflows may be automatically lifted to operate over collections without altering the workflow topology and without requiring the user to insert explicit iteration operators.
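The idea of lifting a single-valued workflow step to collections can be illustrated with a minimal Python sketch. This is not GPFlow code: the `lift` helper and the `gc_content` step are hypothetical names introduced here, and the sketch assumes that a collection is represented as a plain list.

```python
from typing import Any, Callable


def lift(step: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a single-valued step so that it also accepts a list of values.

    Hypothetical helper: the step itself is unchanged, and the caller does
    not write an explicit loop.
    """
    def lifted(value: Any) -> Any:
        if isinstance(value, list):
            # Collection input: apply the step to each element (a map).
            return [step(v) for v in value]
        # Single input: behave exactly like the original step.
        return step(value)
    return lifted


def gc_content(seq: str) -> float:
    """Hypothetical single-valued step: fraction of G or C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


lifted_gc = lift(gc_content)
single = lifted_gc("GGCC")                   # one value in, one value out
many = lifted_gc(["GGCC", "ATGC", "ATAT"])   # a collection in, a collection out
```

The same step definition serves both calls, which is the sense in which the workflow topology stays unaltered.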

The GPFlow runtime system enables collection processing with operations inspired by the map and reduce operations found in many functional programming languages. Where collections of input values are supplied, the workflow is executed over the Cartesian product of the supplied input collections. Input combinations are managed by an automatic correlation method which ensures that the structural integrity of the workflow is maintained. The potentially explosive cardinality of the resulting output collections is dealt with by key-slice aggregation: a single collection of values is partitioned to form distinct subsets in a manner analogous to the SQL GROUP BY clause. Each resulting subset may then be processed as a whole by collection-enabled components.
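A rough Python sketch of the two ideas above, execution over a Cartesian product followed by a group-by style partition, might look as follows. The function names (`run_over_product`, `key_slice`, `count_motif`) are hypothetical and chosen for illustration only. Keeping each result together with the input values that produced it is an assumption made here so the results can be partitioned later; it is not a description of GPFlow's actual correlation method.

```python
from collections import defaultdict
from itertools import product


def run_over_product(step, **inputs):
    """Run `step` once for every combination of the supplied input collections.

    Each result is stored with the inputs that produced it (an assumption
    of this sketch) so that it can be grouped afterwards.
    """
    names = list(inputs)
    records = []
    for combo in product(*(inputs[name] for name in names)):
        params = dict(zip(names, combo))
        records.append({**params, "result": step(**params)})
    return records


def key_slice(records, key):
    """Partition records into subsets sharing the same value of `key`,
    in the spirit of an SQL GROUP BY clause."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def count_motif(sequence, motif):
    """Hypothetical single-valued step: non-overlapping motif occurrences."""
    return sequence.count(motif)


records = run_over_product(
    count_motif,
    sequence=["ATGATG", "GGGATG"],
    motif=["ATG", "GG"],
)
by_motif = key_slice(records, "motif")

# Each subset is then processed as a whole (a reduce), here by summing.
totals = {motif: sum(r["result"] for r in group)
          for motif, group in by_motif.items()}
```

Two sequences and two motifs yield four executions; grouping by motif reduces them to one aggregated value per motif.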


Workflow Smarts

A major feature of eResearch is the volume of data that may be incorporated. This volume is an advantage only when workable data management and processing methods are in place; without them it becomes a liability. Incorporating both traditional and semantic smart system technologies into a workflow management system is intended to provide a way to integrate various data sources while keeping an overall real-world goal to guide the process. Specifically, in the case of the Microsoft QUT eResearch Centre, semantic concepts will form the basis of higher-level processing in order to increase the productivity of scientists using the system, while traditional smart system optimisations will be used to filter and order results according to their relevance.

The goal of eResearch is also to provide access to the results of investigations in order to increase the level of available knowledge. Integrating semantic features into a workflow system lets scientists link their data into the wider body of scientific knowledge according to the relevant real-world concepts. Their proposed publications may also be reviewed more easily and efficiently by their peers when the semantic features of the publication are available to the reviewer.


Annotation Workflow

GPFlow is built to benefit the end user; to that end, users will drive the development of workflows. The annotation workflow is a system built atop many open, publicly available third-party annotation tools. What distinguishes our system from others is its focus on user customisation of the resultant feature predictions.

The central goal of the workflow is the ability to execute several analysis tools in parallel, aggregate their results, and present an interactive annotation selector for a particular locus. It is envisaged that support for tools beyond coding, tRNA and rRNA identification can be added transparently to further extend the utility of the annotation system.
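As an illustration only, the pattern of running several tools in parallel and aggregating their predictions could be sketched in Python as below. The three tool functions are hypothetical stand-ins that return fixed values; they do not wrap any real annotation program, and the `(start, end, label)` tuple format is an assumption of this sketch rather than GPFlow's data model.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for external annotation tools. Each takes a
# sequence and returns a list of (start, end, label) predictions.
def find_coding(seq):
    return [(0, 90, "CDS")]


def find_trna(seq):
    return [(120, 195, "tRNA")]


def find_rrna(seq):
    return []


# Supporting a further tool only requires a new entry here.
TOOLS = {"coding": find_coding, "tRNA": find_trna, "rRNA": find_rrna}


def annotate(seq, tools=TOOLS):
    """Run every tool concurrently and merge the predictions into one list."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(tool, seq) for name, tool in tools.items()}
        predictions = []
        for name, future in futures.items():
            for start, end, label in future.result():
                predictions.append(
                    {"tool": name, "start": start, "end": end, "label": label}
                )
    return sorted(predictions, key=lambda p: p["start"])


def candidates_at(predictions, position):
    """Predictions covering a position: the options a selector could offer."""
    return [p for p in predictions if p["start"] <= position < p["end"]]


predictions = annotate("ACGT" * 60)
options = candidates_at(predictions, 150)
```

Because the tools are looked up from a mapping, the aggregation and selection steps do not change when a tool is added, which is one way to read "added transparently".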
