Example Projects Using Kepler
The Kepler community is a continually growing group of scientists and engineers working together to facilitate and streamline scientific analysis and data management. Dozens of projects are currently using Kepler to solve a variety of technical challenges, and we continue to update this page to highlight some of these projects.*
Indigo DataCloud: Scientific Computing Platform as a Service
INDIGO-DataCloud develops an open source data and computing platform targeted at scientific communities, deployable on multiple hardware and provisioned over hybrid, private, or public e-infrastructures. By filling existing gaps at the PaaS and SaaS levels, INDIGO-DataCloud will help developers, resource providers, e-infrastructures, and scientific communities overcome current challenges in the areas of Cloud computing, storage, and networking.
INDIGO ready-to-use components can be grouped as:
- User-oriented access services (User Interfaces, Mobile Applications, Scientific Portals)
- Optimized exploitation of resources across multiple Cloud infrastructures
- Seamless and integrated access to geographically distributed data
- Improved functionalities in the popular Cloud frameworks OpenNebula and OpenStack
bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data
The bioKepler project builds a Kepler module for executing bioinformatics tools using distributed execution patterns. Once customized, these components can be executed on multiple distributed platforms, including various Cloud and Grid computing platforms. In addition, bioKepler delivers virtual machines that include a Kepler engine and all of the bioinformatics tools and applications distributed in bioKepler.
bioKepler is a module distributed on top of the core Kepler scientific workflow system. More information can be found at http://www.biokepler.org/
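On remote resources, workflows like these are typically launched headlessly through Kepler's command line rather than the GUI. Below is a minimal sketch of composing such an invocation in Python; the `-runwf` and `-nogui` flags follow Kepler's documented command-line options, while the workflow file and parameter names are purely hypothetical.

```python
def kepler_command(workflow, params):
    """Build the argument list for a headless Kepler run."""
    # -runwf runs a workflow; -nogui suppresses the graphical interface
    cmd = ["kepler.sh", "-runwf", "-nogui"]
    # workflow parameters are passed on the command line as "-name value"
    for name, value in params.items():
        cmd += ["-" + name, str(value)]
    # the workflow description (MoML XML file) comes last
    cmd.append(workflow)
    return cmd
```

The resulting list could then be handed to `subprocess.run` on a machine where Kepler is installed.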
WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires
The WIFIRE CI (cyberinfrastructure) builds an integrated system
for wildfire analysis, with specific regard to changing urban dynamics
and climate. The system integrates networked observations such as
heterogeneous satellite data and real-time remote sensor data, with
computational techniques in signal processing, visualization, modeling,
and data assimilation to provide a scalable method to monitor such
phenomena as weather patterns that can help predict a wildfire's rate of
spread.
Kepler scientific workflows are used in WIFIRE as an integrative distributed programming model; they simplify the implementation of engineering modules for data-driven simulation, prediction, and visualization while allowing integration with large-scale computing facilities. More information on WIFIRE can be found at http://wifire.ucsd.edu/
CAMERA: Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis
The CAMERA project serves the needs of the microbial ecology research community, and other scientists using metagenomics data, by creating a rich, distinctive data repository and a bioinformatics tools resource that address many of the unique challenges of metagenomic analysis.
Kepler scientific workflows are used to launch data analysis tools. Workflows are configurable analysis packages that can be applied to data within the CAMERA workspace or to data uploaded from the local system. More information on CAMERA workflows can be found at http://camera.calit2.net/workflows.shtm
NBCR: National Biomedical Computation Resource
NBCR enables biomedical scientists to address the challenge of
integrating detailed structural measurements from diverse scales of
biological organization that range from molecules to organ systems in
order to gain quantitative understanding of biological function and
phenotypes. Predictive multi-scale models and our driving biological
research problems together address issues in modeling of sub-cellular
biophysics, building molecular modeling tools to accelerate discovery,
and defining tools for patient-specific multi-scale modeling.
Kepler is used in NBCR to provide more complete computer- and data-aided workflows that make sense to biological scientists, hide to the extent possible the shifting sands of computing infrastructure as it evolves over years and decades, and are fundamentally sound from the viewpoint of scientific reproducibility. More information on NBCR workflows can be found at http://nbcr.ucsd.edu/wordpress2/?page_id=3357
EGI-InSPIRE
The EC EGI-InSPIRE project (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe) is a collaborative effort involving more than 50 institutions in over 40 countries. Its mission is to establish a sustainable European Grid Infrastructure (EGI). The project provides support for many scientific communities; among the many services exposed to users are Kepler-based services in the field of scientific workflows.
Kepler (and its suites, including Serpens) is used by several applications in the fields of Nuclear Fusion and Astrophysics to manage workflow runs on the Grid. More information about EGI-InSPIRE can be found at https://www.egi.eu/about/egi-inspire/.
PLGrid Plus
The PLGrid Plus project serves scientific communities by enabling extensive cooperation among them in research activities in the area of e-Science. It creates computing environments – so-called domain grids – i.e., solutions, services, and extended infrastructure (including software) tailored to the needs of different groups of scientists. Two of the domains, AstroGrid-PL (https://astrogrid-pl.org) and Nanotechnology, provide services based on Kepler (and its suites, including Serpens).
Kepler scientific workflows support several applications, including various astronomical data reduction workflow scenarios that integrate different technologies and services (iRODS, Virtual Observatory standards, Grid middleware) and ring analysis workflows used to study materials. More information about PLGrid Plus can be found at http://www.plgrid.pl/en/.
Science Pipes
Science Pipes allows anyone to access, analyze, and visualize the
huge volume of primary biodiversity data currently available online.
This site provides access to powerful scientific analyses and workflows
through an intuitive, rich web interface based on the visual programming
paradigm, similar to Yahoo Pipes. Analyses and visualizations are
authored in an open, collaborative environment which allows existing
analyses and visualizations to be shared, modified, repurposed, and
enhanced.
Behind the scenes, Science Pipes is based on the Kepler scientific workflow software which is used by professional researchers for analysis and modeling. More information about Science Pipes can be found at http://sciencepipes.org.
FilteredPush Continuous Quality Control Network Integration
The aggregation of rapidly increasing quantities of species-occurrence data from a large number of distributed sources can greatly benefit many biological research areas, such as taxonomy, modeling species distributions, and assessing the effects of climate change on biological diversity. There are three critical issues with data in current distributed networks of species-occurrence data, as in all scientific data: correcting errors, maintaining currency, and assessing fitness for use. The FilteredPush project will build a continuous quality control system for such distributed heterogeneous data sets.
The Filtered Push Continuous Quality Control (FPCQC) software integrates the Kepler workflow system as a means of assessing fitness for use, and also provides quality control facilities to Kepler users. For both Kepler and other FPCQC analysis engines, the comparison of data from different sets can rely on standards and ontologies to ensure meaningful interpretation of results based on them. More information about FilteredPush can be found at http://wiki.filteredpush.org/wiki/FilteredPush.
NIF: The Neuroscience Information Framework
The Neuroscience Information Framework is a dynamic inventory of
Web-based neuroscience resources: data, materials, and tools accessible
via any computer connected to the Internet. An initiative of the NIH
Blueprint for Neuroscience Research, NIF advances neuroscience research
by enabling discovery and access to public research data and tools
worldwide through an open source, networked environment.
NIF utilizes Kepler workflows to provide an accumulated view of neuroscience data, e.g., the “Brain data flow”, which presents users with categorized information and heat maps about sources that have information on various brain regions. More information on NIF can be found at https://www.neuinfo.org/.
CommDyn: Community Dynamics Toolbox
Building on available long-term observations and advanced
information technology, CommDyn develops a toolbox for automating the
process of analyzing community change. Long-term data sets will be used
to demonstrate data and system accessibility and functionality, and
through implementation of new metrics we will gain insights into what
drives change in ecological communities on a continental scale. Data
will be accessed via the DataONE portal and the LTER Network Information
System using the Ecological Metadata Language, and analyzed with R
routines in Kepler workflows. More information on CommDyn can be found at https://projects.ecoinformatics.org/ecoinfo/projects/commdyn.
pPOD
The pPOD team at UC Davis recently announced a preview release of a Kepler extension that provides new actors and tools to create and manage phylogenetic analyses. This extension was developed as part of a 3-year National Science Foundation grant to address informatics challenges faced by researchers funded by the AToL (Assembling the Tree of Life) initiative. The pPOD extension includes actors that will enable AToL teams to automate phylogenetic analyses as well as reliably record and later reconstruct how results were obtained from primary observations.
Real-time Environment for Analytical Processing (REAP)
REAP project investigators are combining the real-time data grid being constructed through other projects (Data Turbine, OPeNDAP, EarthGrid) and the Kepler scientific workflow system to provide a framework for designing and executing scientific workflows that use sensor data. To this end, project collaborators are extending Kepler to access sensor data in workflows, monitor, inspect and control sensor networks, and simulate the design of new sensor networks.
SANParks: Managing Wildlife Populations
Kruger National Park (KNP), in collaboration with the National Center for Ecological Analysis and Synthesis (NCEAS), is developing a system that uses Kepler workflows to facilitate conservation management analysis. Funded by the Andrew W. Mellon Foundation, this workflow-based solution is being adopted by the twenty-two South African National Parks (SANParks) to greatly improve the adaptive management of the park system.
Scientific Data Management Center, Scientific Process Automation (SDM Center, SPA)
Science in many disciplines increasingly requires data-intensive and compute-intensive information technology (IT) solutions for scientific discovery. Scientific applications with these requirements range from the understanding of biological processes at the sub-cellular level of “molecular machines” (as is common, e.g., in genomics and proteomics), to the level of simulating nuclear fusion reactions and supernova explosions in astrophysics. A practical bottleneck for more effective use of available computational and data resources is often in the IT knowledge of the end-user; in the design of resource access and use of processes; and the corresponding execution environments, i.e., in the scientific workflow environment of end user scientists. The goal of the Kepler/SPA thrust of the SDM Center is to provide solutions and products for effective and efficient modeling, design, configurability, execution, and reuse of scientific workflows.
SEEK: Science Environment for Ecological Knowledge
SEEK was a five-year initiative designed to create cyberinfrastructure for ecological, environmental, and biodiversity research and to educate the ecological community about ecoinformatics. SEEK participants built an integrated data grid (EcoGrid) for accessing a wide variety of ecological and biodiversity data, and analytical tools (Kepler) for efficiently utilizing these data stores to advance ecological and biodiversity science. An intelligent middleware system (SMS) was built to facilitate integration and synthesis of data and models within these systems.
More information on SEEK can be found at http://seek.ecoinformatics.org
COast-to-Mountain Environmental Transect Project (COMET)
The COMET Project (for COast-to-Mountain Environmental Transect) is funded by the NSF Cyberinfrastructure for Environmental Observatories: Prototype Systems to Address Cross-Cutting Needs (CEO:P) initiative. The goal of the project is to develop a cyberinfrastructure prototype to facilitate the study of the way in which multiple environmental factors, including climate variability, affect major ecosystems along an elevation gradient from coastal California to the summit of the Sierra Nevada. An understanding of the coupling between the strength of the California upwelling system and terrestrial ecosystem carbon exchange is the central scientific question. Additional scientific goals are to better understand the way in which atmospheric dust is transported to Lake Tahoe and an examination of carbon flux in the coastal zone as moderated by upwelling processes. The geographic context is one in which there is a diversity of ecosystems that are believed to be sensitive to climatological changes.
The dispersion and complexity of the data needed to answer the scientific questions motivate the development of a state-of-the-art cyberinfrastructure to facilitate the scientific research. This cyberinfrastructure will be based around the integration of access to distributed and varied data collections and sensor data streams, semantic registration of data, models and analysis tools, semantically-aware data query mechanisms, and an orchestration system for advanced scientific workflows. Access to this cyberinfrastructure will be provided through a Web-based portal.
Other projects that utilized Kepler in the past include:
- National Center for Computational Sciences (NCCS)
- Hydrant
- Nimrod/K
- ChIP-chip
- Cyberinfrastructure for Phylogenetic Research (CIPRes)
- Biodiversity Analysis Pipeline (BAP)
- Long Term Ecological Research Network (LTER), ITER
- Spatially Oriented Rule Based System for a Resource and Production Management of Raw Bio-Materials (RAPR)
- the Geosciences Network (GEON II)
* More organizations working with Kepler: National Center for Ecological Analysis and Synthesis (NCEAS), San Diego Supercomputer Center (SDSC), Monash eScience and Grid Engineering Laboratory and Poznan Supercomputing and Networking Center.