Example Projects Using Kepler
The Kepler community is a continually growing group of scientists and engineers working together to facilitate and streamline scientific analysis and data management. Dozens of projects are currently using Kepler to solve a variety of technical challenges, and we continue to update this page to highlight some of these projects.*
Indigo DataCloud: Scientific Computing Platform as a Service INDIGO-DataCloud develops an open source data and computing platform targeted at scientific communities, deployable on a wide range of hardware and provisioned over hybrid (private or public) e-infrastructures. By filling existing gaps at the PaaS and SaaS levels, INDIGO-DataCloud helps developers, resource providers, e-infrastructures, and scientific communities overcome current challenges in cloud computing, storage, and networking, and it delivers a set of ready-to-use components for these purposes.
bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data The bioKepler project builds a Kepler module for executing bioinformatics tools using distributed execution patterns. Once customized, these components can be executed on multiple distributed platforms, including various cloud and grid computing platforms. In addition, bioKepler delivers virtual machines that include a Kepler engine and all of the bioinformatics tools and applications distributed with bioKepler. bioKepler is a module distributed on top of the core Kepler scientific workflow system. More information can be found at http://www.biokepler.org/
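As a rough illustration of the distributed execution pattern described above, the sketch below maps a command-line bioinformatics tool over many input files in parallel. It is not bioKepler code: the tool name (blastn), the input directory, and the thread count are hypothetical stand-ins for what bioKepler's execution actors would parameterize.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

// Minimal illustration of a "distributed execution" pattern:
// run one external bioinformatics tool over many inputs in parallel.
public class ParallelToolRunner {
    public static void main(String[] args) throws Exception {
        // Hypothetical inputs; workflow actors would receive these as tokens.
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> dir =
                Files.newDirectoryStream(Paths.get("samples"), "*.fasta")) {
            dir.forEach(inputs::add);
        }

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (Path input : inputs) {
            results.add(pool.submit(() -> {
                // "blastn" stands in for any command-line tool.
                Process p = new ProcessBuilder("blastn", "-query", input.toString(),
                        "-out", input + ".hits").inheritIO().start();
                return p.waitFor();  // exit code of the tool
            }));
        }
        for (Future<Integer> r : results) {
            System.out.println("exit code: " + r.get());
        }
        pool.shutdown();
    }
}
```

On a real grid or cloud platform the thread pool would be replaced by a job scheduler or MapReduce-style engine, but the map-a-tool-over-inputs structure is the same.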
WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires The WIFIRE CI (cyberinfrastructure) builds an integrated system for wildfire analysis, with specific regard to changing urban dynamics and climate. The system integrates networked observations, such as heterogeneous satellite data and real-time remote sensor data, with computational techniques in signal processing, visualization, modeling, and data assimilation to provide a scalable method to monitor phenomena such as weather patterns that can help predict a wildfire's rate of spread. Kepler scientific workflows are used in WIFIRE as an integrative distributed programming model; they simplify the implementation of engineering modules for data-driven simulation, prediction, and visualization while allowing integration with large-scale computing facilities. More information on WIFIRE can be found at http://wifire.ucsd.edu/
CAMERA: Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis The CAMERA project serves the needs of the microbial ecology research community, and of other scientists using metagenomics data, by creating a rich, distinctive data repository and a bioinformatics tools resource that addresses many of the unique challenges of metagenomic analysis. Kepler scientific workflows are used to launch data analysis tools. Workflows are configurable analysis packages that can be applied to data within the CAMERA workspace or to data uploaded from the local system. More information on CAMERA workflows can be found at http://camera.calit2.net/workflows.shtm
NBCR: National Biomedical Computation Resource NBCR enables biomedical scientists to address the challenge of integrating detailed structural measurements from diverse scales of biological organization, ranging from molecules to organ systems, in order to gain a quantitative understanding of biological function and phenotypes. Predictive multi-scale models and the project's driving biological research problems together address issues in modeling sub-cellular biophysics, building molecular modeling tools to accelerate discovery, and defining tools for patient-specific multi-scale modeling. Kepler is used in NBCR to provide more complete computer- and data-aided workflows that make sense to biological scientists, hide as far as possible the shifting sands of computing infrastructure as it evolves over years and decades, and are fundamentally sound from the standpoint of scientific reproducibility. More information on NBCR workflows can be found at http://nbcr.ucsd.edu/wordpress2/?page_id=3357
EGI-InSPIRE The EC EGI-InSPIRE project (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe) is a collaborative effort involving more than 50 institutions in over 40 countries. Its mission is to establish a sustainable European Grid Infrastructure (EGI). The project supports many scientific communities, and among the many services it exposes to users, Kepler-based services are provided in the field of scientific workflows. Kepler (and its suites, including Serpens) is used by several applications in the fields of nuclear fusion and astrophysics to manage workflow runs on the Grid. More information about EGI-InSPIRE can be found at https://www.egi.eu/about/egi-inspire/.
PLGrid Plus The PLGrid Plus project serves scientific communities, enabling extensive cooperation among them in research activities in the area of e-Science. It creates computing environments, so-called domain grids: solutions, services, and extended infrastructure (including software) tailored to the needs of different groups of scientists. Two of the domains, AstroGrid-PL (https://astrogrid-pl.org) and Nanotechnology, provide services based on Kepler (and its suites, including Serpens). Kepler scientific workflows support several applications, including various astronomical data reduction scenarios that integrate different technologies and services (iRODS, Virtual Observatory standards, Grid middleware) and ring analysis workflows used to study materials. More information about PLGrid Plus can be found at http://www.plgrid.pl/en/.
Science Pipes Science Pipes allows anyone to access, analyze, and visualize the huge volume of primary biodiversity data currently available online. This site provides access to powerful scientific analyses and workflows through an intuitive, rich web interface based on the visual programming paradigm, similar to Yahoo Pipes. Analyses and visualizations are authored in an open, collaborative environment which allows existing analyses and visualizations to be shared, modified, repurposed, and enhanced. Behind the scenes, Science Pipes is based on the Kepler scientific workflow software which is used by professional researchers for analysis and modeling. More information about Science Pipes can be found at http://sciencepipes.org.
FilteredPush Continuous Quality Control Network Integration The aggregation of rapidly increasing quantities of species-occurrence data from a large number of distributed sources can greatly benefit many areas of biological research, such as taxonomy, modeling species distributions, and assessing the effects of climate change on biological diversity. There are three critical issues with data in current distributed networks of species-occurrence data, as in all scientific data: correcting errors, maintaining currency, and assessing fitness for use. The FilteredPush project will build a continuous quality control system for such distributed heterogeneous data sets. The Filtered Push Continuous Quality Control (FPCQC) software integrates the Kepler workflow system as a means for assessing fitness for use, while also providing quality control facilities to Kepler users. For both Kepler and other FPCQC analysis engines, the comparison of data from different sets can rely on standards and ontologies to ensure meaningful interpretation of the results based on them. More information about FilteredPush can be found at http://wiki.filteredpush.org/wiki/FilteredPush.
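To give a feel for what "assessing fitness for use" can mean in practice, the sketch below runs a few simple checks over one species-occurrence record. The field names (decimalLatitude, decimalLongitude, eventDate) come from the Darwin Core standard; the checks themselves are illustrative and are not FilteredPush's actual rule set.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.*;

// Illustrative quality-control checks over Darwin Core-style
// species-occurrence fields.
public class OccurrenceQC {
    static List<String> check(Map<String, String> record) {
        List<String> issues = new ArrayList<>();
        try {
            double lat = Double.parseDouble(record.getOrDefault("decimalLatitude", ""));
            double lon = Double.parseDouble(record.getOrDefault("decimalLongitude", ""));
            if (lat < -90 || lat > 90) issues.add("latitude out of range");
            if (lon < -180 || lon > 180) issues.add("longitude out of range");
            if (lat == 0 && lon == 0) issues.add("suspicious (0, 0) coordinates");
        } catch (NumberFormatException e) {
            issues.add("coordinates missing or non-numeric");
        }
        try {
            LocalDate.parse(record.getOrDefault("eventDate", ""));
        } catch (DateTimeParseException e) {
            issues.add("eventDate missing or not ISO-8601");
        }
        return issues;
    }

    public static void main(String[] args) {
        Map<String, String> rec = Map.of(
                "decimalLatitude", "95.2",   // deliberately invalid
                "decimalLongitude", "-71.1",
                "eventDate", "2012-06-15");
        System.out.println(check(rec));      // prints [latitude out of range]
    }
}
```

In a continuous QC network, checks like these would run as workflow steps over incoming records, with flagged results annotated and pushed back to data providers.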
NIF: The Neuroscience Information Framework The Neuroscience Information Framework is a dynamic inventory of Web-based neuroscience resources: data, materials, and tools accessible via any computer connected to the Internet. An initiative of the NIH Blueprint for Neuroscience Research, NIF advances neuroscience research by enabling discovery of and access to public research data and tools worldwide through an open source, networked environment. NIF uses Kepler workflows to provide an accumulated view of neuroscience data, e.g., the “Brain data flow”, which presents users with categorized information and heat maps of sources that have information on various brain regions. More information on NIF can be found at https://www.neuinfo.org/.
CommDyn: Community Dynamics Toolbox Building on available long-term observations and advanced information technology, CommDyn develops a toolbox for automating the process of analyzing community change. Long-term data sets will be used to demonstrate data and system accessibility and functionality, and through implementation of new metrics we will gain insights into what drives change in ecological communities on a continental scale. Data will be accessed via the DataONE portal and the LTER Network Information System using the Ecological Metadata Language, and analyzed with R routines in Kepler workflows. More information on CommDyn can be found at https://projects.ecoinformatics.org/ecoinfo/projects/commdyn.
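Because CommDyn pulls its inputs through DataONE, a short sketch of discovering EML-described datasets may help. It queries DataONE's public Solr search endpoint over HTTP; the endpoint path, field names, and the query itself are our assumptions based on the DataONE coordinating-node API and should be checked against current documentation.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.*;
import java.nio.charset.StandardCharsets;

// Sketch: find EML-described datasets via DataONE's Solr search endpoint
// (endpoint and fields assumed; verify against the DataONE CN REST docs).
public class DataOneSearch {
    public static void main(String[] args) throws Exception {
        String q = URLEncoder.encode(
                "formatId:eml* AND keywords:\"community dynamics\"",
                StandardCharsets.UTF_8);
        URI uri = URI.create(
                "https://cn.dataone.org/cn/v2/query/solr/?q=" + q
                + "&fl=identifier,title&rows=5&wt=json");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // Each hit's identifier can then be resolved to the data object and
        // handed to an analysis step (in CommDyn's case, R routines in Kepler).
        System.out.println(response.body());
    }
}
```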
pPOD The pPOD team at UC Davis recently announced a preview release of a Kepler extension that provides new actors and tools to create and manage phylogenetic analyses. This extension was developed as part of a 3-year National Science Foundation grant to address informatics challenges faced by researchers funded by the AToL (Assembling the Tree of Life) initiative. The pPOD extension includes actors that will enable AToL teams to automate phylogenetic analyses as well as reliably record and later reconstruct how results were obtained from primary observations.
Real-time Environment for Analytical Processing (REAP) REAP project investigators are combining the real-time data grid being constructed through other projects (Data Turbine, OPeNDAP, EarthGrid) and the Kepler scientific workflow system to provide a framework for designing and executing scientific workflows that use sensor data. To this end, project collaborators are extending Kepler to access sensor data in workflows, monitor, inspect and control sensor networks, and simulate the design of new sensor networks.
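As a generic picture of what "using sensor data in workflows" involves, the sketch below streams simulated samples from a producer thread into a consuming step. It is only a stand-in: in REAP the source would be a real stream (e.g., DataTurbine or OPeNDAP) and the consumer a Kepler actor.

```java
import java.util.concurrent.*;

// Generic illustration of streaming sensor samples into a workflow step:
// the producer thread stands in for a DataTurbine/OPeNDAP source, and the
// consumer stands in for a downstream workflow actor.
public class SensorStreamDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Double> channel = new ArrayBlockingQueue<>(64);

        Thread sensor = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    channel.put(20.0 + Math.random());  // simulated reading
                    Thread.sleep(100);                  // sampling interval
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sensor.start();

        // The "workflow" side: consume samples as they arrive.
        for (int i = 0; i < 10; i++) {
            double sample = channel.take();
            System.out.printf("sample %d: %.2f%n", i, sample);
        }
        sensor.join();
    }
}
```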
SANParks: Managing Wildlife Populations Kruger National Park (KNP), in collaboration with the National Center for Ecological Analysis and Synthesis (NCEAS), is developing a system that uses Kepler workflows to facilitate conservation management analysis. Funded by the Andrew W. Mellon Foundation, this workflow-based solution is being adopted by the twenty-two South African National Parks (SANParks) to greatly improve the adaptive management of the park system.
Scientific Data Management Center, Scientific Process Automation (SDM Center, SPA) Science in many disciplines increasingly requires data-intensive and compute-intensive information technology (IT) solutions for scientific discovery. Scientific applications with these requirements range from understanding biological processes at the sub-cellular level of “molecular machines” (as is common, e.g., in genomics and proteomics) to simulating nuclear fusion reactions and supernova explosions in astrophysics. A practical bottleneck to more effective use of available computational and data resources often lies in the IT knowledge of the end user, in the design of resource access and processing steps, and in the corresponding execution environments, i.e., in the scientific workflow environment of end-user scientists. The goal of the Kepler/SPA thrust of the SDM Center is to provide solutions and products for effective and efficient modeling, design, configurability, execution, and reuse of scientific workflows.
SEEK: Science Environment for Ecological Knowledge SEEK was a five-year initiative designed to create cyberinfrastructure for ecological, environmental, and biodiversity research and to educate the ecological community about ecoinformatics. SEEK participants built an integrated data grid (EcoGrid) for accessing a wide variety of ecological and biodiversity data, and analytical tools (Kepler) for efficiently utilizing these data stores to advance ecological and biodiversity science. An intelligent middleware system (SMS) was built to facilitate integration and synthesis of data and models within these systems. More information on SEEK can be found at http://seek.ecoinformatics.org
COast-to-Mountain Environmental Transect Project (COMET) The COMET Project is funded by the NSF Cyberinfrastructure for Environmental Observatories: Prototype Systems to Address Cross-Cutting Needs (CEO:P) initiative. The goal of the project is to develop a cyberinfrastructure prototype to facilitate the study of the way in which multiple environmental factors, including climate variability, affect major ecosystems along an elevation gradient from coastal California to the summit of the Sierra Nevada. Understanding the coupling between the strength of the California upwelling system and terrestrial ecosystem carbon exchange is the central scientific question. Additional scientific goals are to better understand the way in which atmospheric dust is transported to Lake Tahoe and to examine carbon flux in the coastal zone as moderated by upwelling processes. The geographic context is one in which there is a diversity of ecosystems that are believed to be sensitive to climatological changes. The dispersion and complexity of the data needed to answer the scientific questions motivate the development of a state-of-the-art cyberinfrastructure to facilitate the scientific research. This cyberinfrastructure will be based around the integration of access to distributed and varied data collections and sensor data streams, semantic registration of data, models and analysis tools, semantically-aware data query mechanisms, and an orchestration system for advanced scientific workflows. Access to this cyberinfrastructure will be provided through a Web-based portal.
Kepler/Clotho Integration Clotho (http://www.clothocad.org) is a design environment for synthetic biological systems. The Kepler/Clotho Integration consists of actors that wrap Clotho operations for the physical assembly of DNA sequences. The Kepler/Clotho Integration work is available via http://sourceforge.net/projects/keplerclotho. The Clotho/Spectacles/Kepler toolset earned the Illinois team a tie for "Best Software Tool" in the 2009 International Genetically Engineered Machine (iGEM) competition.
Ptolemy II Ptolemy II (http://ptolemy.eecs.berkeley.edu) provides the core execution engine used in Kepler. For details, see "What is the relationship between Kepler and Ptolemy?" Kepler is the largest user of Ptolemy II, and the Kepler community provides bug reports, bug fixes, and new development ideas to Ptolemy II.
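To make the engine relationship concrete, here is a minimal sketch of building and running a model directly against the Ptolemy II Java API, the same engine Kepler embeds. The classes shown are from the standard Ptolemy II library; verify package paths and behavior against the release you use.

```java
import ptolemy.actor.Manager;
import ptolemy.actor.TypedCompositeActor;
import ptolemy.actor.lib.Ramp;
import ptolemy.actor.lib.gui.Display;
import ptolemy.domains.sdf.kernel.SDFDirector;

// Build a tiny dataflow model (Ramp -> Display) in code and execute it
// with the same Ptolemy II machinery that Kepler drives from its GUI.
public class MinimalPtolemyModel {
    public static void main(String[] args) throws Exception {
        TypedCompositeActor top = new TypedCompositeActor();
        top.setName("top");

        // The director defines the model of computation; SDF fires the
        // actors a fixed number of iterations.
        SDFDirector director = new SDFDirector(top, "director");
        director.iterations.setExpression("5");

        Ramp ramp = new Ramp(top, "ramp");            // emits 0, 1, 2, ...
        Display display = new Display(top, "display");
        top.connect(ramp.output, display.input);

        Manager manager = new Manager(top.workspace(), "manager");
        top.setManager(manager);
        manager.execute();                            // run the model
    }
}
```

In Kepler terms, the director, actors, and relations assembled here in Java are exactly what a workflow author composes visually on the Kepler canvas.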
Other projects that utilized Kepler in the past include:
- National Center for Computational Sciences (NCCS)
- Hydrant
- Nimrod/K
- ChIP-chip
- Cyberinfrastructure for Phylogenetic Research (CIPRes)
- Biodiversity Analysis Pipeline (BAP)
- Long Term Ecological Research Network (LTER)
- ITER
- Spatially Oriented Rule Based System for a Resource and Production Management of Raw Bio-Materials (RAPR)
- the Geosciences Network (GEON II)
* More organizations working with Kepler: the National Center for Ecological Analysis and Synthesis (NCEAS), the San Diego Supercomputer Center (SDSC), the Monash eScience and Grid Engineering Laboratory, and the Poznan Supercomputing and Networking Center.