Kepler job manager
Features of a proposed job manager to be used as a common back end to web-based UIs for Kepler.
Overview
Despite the diversity of requirements for web-based UIs to Kepler, there appears to be a shared need for a long-running, stateful job manager to serve as their back end. This job manager would run multiple instances of Kepler as needed to handle concurrent workflow execution requests, and would enable web UIs to stage, start, abort, and monitor runs.
We have begun defining the job manager system in terms of the messages one would expect a web UI to send to the job manager.
Messages to the job manager from a web-based UI
- Send authentication credentials.
- Store a workflow specification (or possibly a self-contained KAR file including required .class files, jars, etc.).
- Stage data and parameter values for a run of a workflow (may be called multiple times before starting a workflow).
- Specify resources for a staged run (e.g., whether the workflow should run in its own JVM or inside the application server).
- Start a staged workflow run.
- Get a handle to the input manager for a running workflow.
- Get a handle to the output manager for a running workflow.
- Request run status (is it running, how long has it been running, etc.).
- Request a list of running jobs (those visible given the user's privileges).
- Pause a running workflow run.
- Resume a workflow run.
- Stop a workflow run.
- Forcibly terminate and clean up a workflow run.
- Request callbacks for events (e.g., run completion, exceptions, run status changes, any state updates resulting from other messages).
- Get a list of workflow run products after a run completes.
- Get a run product after a run completes.
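The stage, start, pause, resume, stop, and terminate messages above imply a lifecycle for each run. The following is a minimal sketch of that lifecycle in Java, not a description of any existing Kepler API. The names RunState and WorkflowRun, the set of states, and the rule that forcible termination is accepted from any state other than TERMINATED are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical run states; these names are illustrative and not taken from Kepler.
enum RunState { STAGED, RUNNING, PAUSED, STOPPED, TERMINATED }

// Hypothetical model of one workflow run as a job manager might track it.
final class WorkflowRun {
    private final Map<String, String> parameters = new LinkedHashMap<>();
    private final List<Consumer<RunState>> listeners = new ArrayList<>();
    private RunState state = RunState.STAGED;

    RunState state() {
        return state;
    }

    Map<String, String> parameters() {
        return Collections.unmodifiableMap(parameters);
    }

    // "Stage data and parameter values": may be called repeatedly, but only before start.
    void stage(String name, String value) {
        require(EnumSet.of(RunState.STAGED), "stage");
        parameters.put(name, value);
    }

    // "Request callbacks for events": each listener is told about every state change.
    void onStateChange(Consumer<RunState> listener) {
        listeners.add(listener);
    }

    void start()  { move(EnumSet.of(RunState.STAGED), RunState.RUNNING, "start"); }
    void pause()  { move(EnumSet.of(RunState.RUNNING), RunState.PAUSED, "pause"); }
    void resume() { move(EnumSet.of(RunState.PAUSED), RunState.RUNNING, "resume"); }
    void stop()   { move(EnumSet.of(RunState.RUNNING, RunState.PAUSED), RunState.STOPPED, "stop"); }

    // "Forcibly terminate": assumed here to be legal from every state except TERMINATED.
    void terminate() {
        move(EnumSet.complementOf(EnumSet.of(RunState.TERMINATED)), RunState.TERMINATED, "terminate");
    }

    // Reject a message that is not valid in the current state.
    private void require(Set<RunState> allowed, String action) {
        if (!allowed.contains(state)) {
            throw new IllegalStateException("cannot " + action + " a run in state " + state);
        }
    }

    // Validate the transition, apply it, then notify listeners.
    private void move(Set<RunState> allowed, RunState next, String action) {
        require(allowed, action);
        state = next;
        for (Consumer<RunState> listener : listeners) {
            listener.accept(next);
        }
    }
}
```

In this sketch a web UI would call stage one or more times, then start, and would learn of later transitions through the listener registered with onStateChange. A message sent in the wrong state (for example, resume on a stopped run) is rejected with an IllegalStateException rather than silently ignored, which is one possible design choice among several.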
Issues
- Should it be possible to upload more than a .moml file (e.g., .class files) to the job manager? What are the implications for application server security, user administration, etc.?
- If we store staged and output data as files behind the application server, where do they go and what persistence and security issues result?
- Is the job manager useful for a detachable desktop client as well?
- Could the job manager be used in distributed execution contexts?