
Extension Framework

Requirements document for the Kepler extension framework system. This system will be designed, developed, and maintained by the Framework Team. It will allow Kepler modules to be created, packaged, versioned, configured, loaded, and executed.

 

Use Cases

 

UC-1) Scientist downloads a distribution that includes a suite of actors meant to be used together.

UC-2) Scientist downloads an extension that provides additional or alternative system-wide capabilities (e.g., a provenance viewer, an alternative data store, new GUI components, etc.).

UC-3) Scientist archives a workflow in a shareable package that includes the workflow specification, the parameter values and data bindings associated with the run, and, optionally, the input data sets, output data sets, and detailed dependencies between the data items.

UC-4) Scientist shares a version of a workflow with other scientists.  Downloading of workflow dependencies is handled automatically.

UC-5) Scientist reruns a workflow obtained from someone else as a workflow archive.

 

UC-6) Actor Developer creates, packages, and versions a new component or collection of components, including the necessary code, jar files, documentation, actor icons, and ontology extensions.

 

UC-7) Actor Developer or Extender declares dependencies of an extension on 3rd-party applications, interpreters, and libraries (e.g., R, Matlab, Perl) and on particular add-on packages for those 3rd-party systems (e.g., a CPAN module for Perl).

 

UC-8) Extender declares extension points within their extension.


UC-9) Extender creates a variant of an existing bundle, assigning a distinct ID to it to differentiate it from the original.

UC-10) Core Developer creates extension points within the base system.

UC-11) Core Developer integrates customized base system bundles into the next release of the base system.

UC-12) Scientist enables, disables, or configures downloaded extensions within an installation of Kepler.

System functions

 The following are detailed descriptions of the functions of the Extension Framework.  A system function is something the system does, as opposed to constraints on how the system must do it (the non-functional requirements), and things the user does (the use cases).

F-1)  Package a workflow specification for archival and sharing with others as a workflow archive [is there a better word for this?].  The archive can optionally include:

  • The actor packages the workflow depends on.
  • The non-actor extension packages the workflow depends on.
  • 3rd-party libraries, applications, and native libraries the actor packages and system extensions included in the workflow archive depend on.

A workflow archive that includes all of these items is called a self-contained workflow archive.
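
As an illustration of F-1, the following minimal Python sketch models a workflow archive and tests whether it is self-contained. The class name, field names, and package IDs are hypothetical and are not part of any Kepler API; the sketch only assumes that each dependency can be named by an ID.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkflowArchive:
    """Hypothetical model of a workflow archive (F-1)."""

    workflow_spec: str
    # IDs of everything the workflow depends on, by kind.
    actor_packages: set[str] = field(default_factory=set)
    extensions: set[str] = field(default_factory=set)
    third_party: set[str] = field(default_factory=set)
    # IDs of the dependencies that are physically included in the archive.
    included: set[str] = field(default_factory=set)

    def missing(self) -> set[str]:
        """Return the dependencies that are needed but not included."""
        needed = self.actor_packages | self.extensions | self.third_party
        return needed - self.included

    def is_self_contained(self) -> bool:
        """True when every dependency is included in the archive."""
        return not self.missing()
```

An archive that omits some dependencies is still valid under F-1; it simply relies on the recipient obtaining the missing items by other means.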

 

F-2) Package a workflow run for archival and sharing with others as a run archive.  A run archive must include:

  • A workflow archive that may or may not be fully self-contained.
  • The data bindings and parameter values for the archived run of the workflow.  Data bindings may be by reference.

A run archive may optionally include:

  • The input data sets for the archived run (included by value rather than by reference).
  • The output data sets produced by that run (included by value rather than by reference).
  • Some or all of the intermediate data produced during the run but not explicitly output by the workflow.
  •  The detailed dependencies of the output data on input and intermediate data for the run (data provenance).

A run archive that includes all of the above is called a self-contained run archive.
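
The split between required and optional contents in F-2 can be sketched as a small Python data structure. All names here are hypothetical. The sketch reads "all of the above" literally, as "every optional item is present"; whether the embedded workflow archive must itself be self-contained is not stated in this document, so the sketch does not check it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RunArchive:
    """Hypothetical model of a run archive (F-2)."""

    # Required contents.
    workflow_archive: str  # reference to the packaged workflow archive
    bindings: dict         # parameter values and data bindings (may be references)
    # Optional contents; None means "not included by value".
    input_data: Optional[dict] = None
    output_data: Optional[dict] = None
    intermediate_data: Optional[dict] = None
    provenance: Optional[list] = None

    def is_self_contained(self) -> bool:
        """True when every optional item is present in the archive."""
        optional_parts = (
            self.input_data,
            self.output_data,
            self.intermediate_data,
            self.provenance,
        )
        return all(part is not None for part in optional_parts)
```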

F-3)  Create an ad hoc Kepler distribution that bundles one or more self-contained workflow archives along with a distribution of Kepler in which the included workflows can run.  An ad hoc distribution of Kepler can be used to:

  • Share fully configured distributions of Kepler with potential new users, and create system and workflow demos.
  • Rapidly deploy Kepler on a large number of machines, e.g. in a classroom setting.
  • Distribute the minimal system that must be installed on each node when running the bundled workflow(s) on a cluster.

F-4)  Download all dependencies of a particular actor package or system extension.
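
Downloading "all dependencies" implies following dependencies of dependencies. A minimal Python sketch of that traversal follows, assuming a hypothetical table that maps each package ID to the IDs it directly declares; the function name and the IDs are illustrative only, and the actual download step is left out.

```python
from __future__ import annotations

from collections import deque


def all_dependencies(root: str, declared: dict[str, list[str]]) -> list[str]:
    """Return every direct and transitive dependency of root, in breadth-first order."""
    seen = {root}
    order: list[str] = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        # Packages with no entry in the table are treated as having no dependencies.
        for dep in declared.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order
```

For example, if `actors` declares `r-bridge` and `core`, and `r-bridge` declares `R`, the result for `actors` lists all three.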

 

Non-functional requirements

(see definitions of functional and non-functional requirements)

 

1. The Kepler base system (including the kernel, the standard extensions, and the desktop GUI) should run in a reasonable amount of memory with large models containing many actors (e.g., today 2 GB of RAM and 500 actor instances). This size limit does not include the memory occupied by data the workflow operates on.

2. The Kepler base system startup time should be no more than 15 seconds the first time it is run and no more than 6 seconds thereafter. The framework likely will need to support lazy loading and caching of configurations and metadata to meet this requirement.
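
One way to picture the lazy loading and caching mentioned in requirement 2 is the Python sketch below. It is not a proposed design: the class and the `read_metadata` callable are hypothetical stand-ins for whatever expensive read the framework would defer until a bundle is first used.

```python
from functools import cached_property


class LazyBundle:
    """Hypothetical bundle handle that defers reading its metadata."""

    def __init__(self, bundle_id, read_metadata):
        self.bundle_id = bundle_id
        # Hypothetical callable that performs the expensive read.
        self._read_metadata = read_metadata

    @cached_property
    def metadata(self):
        # Runs on first access only; the result is then cached on the instance.
        return self._read_metadata(self.bundle_id)
```

Creating many `LazyBundle` objects at startup costs little, because no metadata is read until some code asks for it.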


4. Bundles should be able to include, access, and load Java classes, Java jars, native code libraries (with platform-specific variants kept distinct), and any other resources, such as data or images, on which an actor or other extension depends.  This includes actor icons and new ontologies or ontology extensions for actor classification.

5. The framework should support the use of different versions of the same dependency in two concurrently loaded bundles (e.g., Bundle A depends on xerces-1.5, while Bundle B depends on xerces-2.0).

7. The framework should support the creation of actor packages without requiring refactoring of the source code for the actors.  (Adapting existing non-actor code for packaging in a bundle may require refactoring.)

8. All software artifacts other than explicitly OS-specific code should run under recent versions of MacOS X, Win32, and Linux.

10. Bundles and components should always be identifiable by a globally unique identifier in URN form.
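
For readers unfamiliar with URNs, the general shape is `urn:<namespace>:<namespace-specific string>`. The Python sketch below performs a simplified syntax check along those lines. It is deliberately looser than the full URN grammar, the function name is hypothetical, and the example identifiers in the usage note are illustrative rather than identifiers used by Kepler.

```python
import re

# Simplified URN shape: "urn:", a 2-32 character namespace identifier made of
# letters, digits, and hyphens (not starting or ending with a hyphen), then a
# non-empty namespace-specific string without whitespace.
_URN = re.compile(r"urn:([a-z0-9][a-z0-9-]{0,30}[a-z0-9]):(\S+)", re.IGNORECASE)


def parse_urn(text):
    """Split a URN into (namespace, specific string) or raise ValueError."""
    match = _URN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a URN: {text!r}")
    # The namespace identifier is case-insensitive, so normalize it.
    return match.group(1).lower(), match.group(2)
```

With this sketch, `parse_urn("urn:lsid:example.org:bundle:42:1")` yields the namespace `lsid` and the remainder as the specific string, while a bare name such as `bundle-42` is rejected.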

11. The extension point system should be able to differentiate required and optional extension points and cardinality of extension points.
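
Requirement 11 can be read as two checks per extension point: whether at least one extension must be present, and how many extensions are allowed. The Python sketch below shows one hypothetical way to express that; the class, its fields, and the example point names are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtensionPoint:
    """Hypothetical declaration of an extension point."""

    name: str
    required: bool = False                # must at least one extension be plugged in?
    max_extensions: Optional[int] = None  # None means no upper bound

    def problems(self, contribution_count: int) -> list[str]:
        """Return a message for each rule the given number of extensions violates."""
        found = []
        if self.required and contribution_count == 0:
            found.append(f"{self.name}: required extension point has no extension")
        if self.max_extensions is not None and contribution_count > self.max_extensions:
            found.append(
                f"{self.name}: {contribution_count} extensions exceed "
                f"the maximum of {self.max_extensions}"
            )
        return found
```

A required point that accepts exactly one extension would be declared as `ExtensionPoint("data-store", required=True, max_extensions=1)`, while an optional, unbounded point needs only a name.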

12. The framework should allow for grouping any combination of bundles that satisfy their dependencies.

13. The extension system should support multiple configurations (e.g., what is currently stored in ~/.kepler) for a single installation of Kepler.

14.  Downloading an extension, or loading an extension into Kepler, should provide the opportunity for downloading all dependencies automatically.  It should not be fully automatic by default.

15.  It must be possible for extensions and actor packages to be released on schedules independent of the release schedule for Kepler.

16. An Extender should be able to create alternative versions of standard and non-standard extensions, overriding the implementation of, or adding capabilities to, the existing extensions.

 17.  The development overhead for creating and managing bundle definitions should be small relative to the work required to develop the extension.  Efficient bundle management should not depend on tools available only in particular development environments.

18. Creating bundles should be automatable, e.g., similar to the way KAR files can currently be created automatically using targets in our build system.

19.  It should be easy for us to create and exercise bundles from within our system tests.

20. Scientists should not need to understand how the extension framework works or be aware of any of the underlying framework technologies, file formats, or standards it employs.

21.  Instances of the same class constructed in two different bundles at run-time should be recognized as being instantiated from the same class and be compatible with methods in the other bundle.

22. The system should not require that actor package or system extension bundles include the source code for those actors or extensions.

Limitations

 

L-1) Dependencies between extensions (and other packagings) may not be cyclic.  If two packages depend on each other, then they must be merged into a single package.
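
Limitation L-1 implies that the framework, or a packaging tool, needs some way to detect cyclic dependencies. A minimal Python sketch using the standard library's `graphlib` module (Python 3.9 or later) follows; the function name and the package IDs are hypothetical, and the dependency table format is an assumption.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(dependencies):
    """Return one dependency cycle as a list of package IDs, or None if there is none.

    dependencies maps each package ID to the IDs it depends on.
    """
    try:
        # prepare() raises CycleError when the graph contains a cycle.
        TopologicalSorter(dependencies).prepare()
    except CycleError as err:
        # The second argument lists the nodes of the cycle; the first and
        # last entries are the same node.
        return list(err.args[1])
    return None
```

If `find_cycle` returns a cycle, L-1 says the packages in it would have to be merged into a single package.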

L-2) Dependencies must always be unambiguously versioned.

L-3) Bundles must always be unambiguously versioned.

Assumed functions of other subsystems

AF-1)  A distributed workflow execution framework in Kepler will allow workflows to be distributed to remote systems, cluster nodes, etc.  This framework will depend on the packaging mechanism provided by the extension framework for transporting code and resources to these nodes.

 AF-2) A repository will allow scientists to discover, upload, and download actor packages and system extensions.

AF-3) There will be a system, somewhere, that provides instances of the Extension Framework with unique IDs to assign to newly created bundles.

 

 

 

 

 

Glossary of Terms

Also see: Framework Glossary

 

Term: Definition
Core Developer: Someone who is responsible for developing and maintaining the base system and for defining and maintaining base system extension points. (Inherits Extender use cases.)
Extender: Someone who is developing bundles to add functionality to the base system by extending predefined extension points. (Inherits Actor Developer use cases.)
Actor Developer: Someone who is developing new components for use with Kepler. (Inherits Scientist use cases.)
Scientist: Someone who is building workflows using existing components.

 
