
Distributed Kepler Peer-to-Peer

STATUS:

The content on this page is outdated. In particular, we no longer plan to use JXTA if we implement a P2P system. Other work has been done using Jini to implement ideas very similar to those described here. The page is archived for reference only. For more information about current work, please contact the Distributed Execution Interest Group.

This is a DRAFT DESIGN DOCUMENT and does not reflect functionality as it currently exists in Kepler. Comments and feedback are appreciated.

Overview

Our goals for a distributed Kepler include the ability to easily form ad-hoc networks of cooperating Kepler instances, which we will call a "Kepler Grid", or "kgrid". Each kgrid can have access constraints and allows Kepler models or submodels to be run on participating instances. The idea is that users running Kepler instances can create a new kgrid, choose who is allowed to connect to that kgrid, and join existing kgrids. Any Kepler instance should be able to join multiple kgrids at one time. Once a Kepler instance has joined a kgrid, it can configure one or more subcomponents of a workflow to be distributed across the nodes of that kgrid.

The Kepler Grid page contains a related discussion of requirements for grid-based workflow execution.

Much of the infrastructure for building distributed workflows has been discussed in the excellent whitepaper:

  • Y. Zhao and T. H. Feng. Distributed Execution of Ptolemy Models.
In that paper they argue that a separate agent should be used for communication among distributed Ptolemy instances. They identify at least four components of an agent (PtAgent) that should be present: a Discovery service, a Launcher service, a Relay service, and a Logging service. The Relay service is primarily responsible for transferring data to submodels executing on other nodes, while the other services help establish the communication channels. We believe that this type of system can be effectively implemented as a peer-to-peer system and that JXTA can provide all of the needed infrastructure in a relatively easy-to-use package.
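
The paper does not prescribe a concrete API, but the four services might be sketched as Java interfaces along the following lines (all type and method names here are illustrative placeholders, not taken from the paper):

    // Hypothetical sketch of the four PtAgent services identified by Zhao
    // and Feng. All names are illustrative placeholders.
    interface Discovery {
        /** Find peers in the kgrid that could host a submodel. */
        java.util.List<String> findAvailablePeers();
    }

    interface Launcher {
        /** Start a submodel on the named remote peer. */
        void launch(String peerId, byte[] submodel);
    }

    interface Relay {
        /** Forward a data token produced on one peer to a port on another. */
        void relay(String targetPeerId, String portName, byte[] token);
    }

    interface Logging {
        /** Record a distributed-execution event for later inspection. */
        void log(String peerId, String event);
    }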

 

Overview of Peer-to-Peer (P2P) Systems Using JXTA

A peer-to-peer network is one in which many or all of the participating hosts act as both client and server in the communication system, forming a distributed network. JXTA is one framework for implementing peer-to-peer systems. The JXTA website states: "JXTA peers create a virtual network where any peer can interact with other peers and resources directly even when some of the peers and resources are behind firewalls and NATs or are on different network transports."

The JXTA framework provides:

Peers
A networked device that implements one or more JXTA protocols, operates independently and asynchronously from other peers, and has a unique ID
Peer Groups
A self-organizing group of peers with a commonly agreed-upon set of services
Pipes
Asynchronous, unidirectional communication links between peers
Messages
Objects sent between peers

See the JXTA Programmer's Guide for an excellent applied overview of these concepts and how they can be implemented.
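
As a concrete starting point, the sketch below shows roughly how a Kepler instance might join the JXTA network using the JXSE NetworkManager class; the peer name and cache directory are placeholder values:

    import java.io.File;
    import net.jxta.peergroup.PeerGroup;
    import net.jxta.platform.NetworkManager;

    // Minimal sketch: start JXTA as an ordinary edge peer and join the
    // default net peer group. "kepler-node-1" and ".jxta-cache" are
    // placeholder values.
    public class JxtaBootstrap {
        public static void main(String[] args) throws Exception {
            NetworkManager manager = new NetworkManager(
                    NetworkManager.ConfigMode.EDGE,
                    "kepler-node-1",
                    new File(".jxta-cache").toURI());
            PeerGroup netPeerGroup = manager.startNetwork();
            System.out.println("Joined as peer: " + netPeerGroup.getPeerName());
            manager.stopNetwork();
        }
    }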

 

Applying JXTA systems to Kepler

 

JXTA Peer Groups as the basis for kgrids

To form a new collaborative kgrid, users need to be able to create groups on the fly. JXTA Peer Groups are well-suited for this role because they can be created by any peer, and the peer creating a particular peer group can control its membership policies. Peer Groups can have authentication requirements, which allows kgrids to be created that grant access only to certain members. By creating a membership policy framework based on GSI, we can use distinguished names to specify participating peers and proxy certificates for authentication. This means that any given Kepler instance is controlled by a single user who can be authenticated using GSI. Each peer Kepler instance is thus associated with one authenticated user (at a time), and any user can have multiple Kepler instances (peers) running. We will therefore need a mechanism that lets a peer (Kepler instance) join a kgrid (peer group) if and only if it is run by a user who is on the membership list for that kgrid.
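
A minimal sketch of the membership rule itself, independent of any JXTA plumbing, might look like the following. The class and method names are hypothetical; a real implementation would plug into JXTA's membership machinery and verify the user's proxy certificate rather than trusting a bare DN string:

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical membership rule: a peer may join a kgrid iff the GSI
    // distinguished name of the user running that Kepler instance is on
    // the kgrid's membership list.
    public class KgridMembershipPolicy {
        private final Set<String> allowedDns = new HashSet<String>();

        public void addMember(String distinguishedName) {
            allowedDns.add(distinguishedName);
        }

        /** True iff the authenticated user's DN is on the membership list. */
        public boolean mayJoin(String authenticatedUserDn) {
            return allowedDns.contains(authenticatedUserDn);
        }
    }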

 

JXTA pipes as the basis for message and data exchange

Once a kgrid is established with a set of authenticated peers, we need a way of passing messages among the Kepler instances. JXTA pipes are made for exactly this purpose. The types of messages we will want to pass include the following (a sketch of one such message follows the list):
  • Queries and responses for metadata (e.g., load, capabilities)
  • Requests and responses to move workflows and workflow components as .ksw files
  • Data-flow messages in executing workflows
  • Probably others
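
As an illustration, a metadata query could be packaged as a JXTA Message like this; the element names ("kgrid:type", "kgrid:query") are conventions we would define for this design, not part of JXTA:

    import net.jxta.endpoint.Message;
    import net.jxta.endpoint.StringMessageElement;

    // Sketch: build a metadata query as a JXTA Message. The element names
    // are our own (assumed) conventions, not defined by JXTA.
    public class MetadataQuery {
        public static Message build() {
            Message msg = new Message();
            msg.addMessageElement(
                    new StringMessageElement("kgrid:type", "metadata-query", null));
            msg.addMessageElement(
                    new StringMessageElement("kgrid:query", "load,capabilities", null));
            return msg;
        }
    }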

 

JXTA-enabled director

In the Zhao and Feng paper, the movement of data from one node to another is managed by the PtAgent Relay in concert with a RelayProxy that executes within the workflow itself; thus, actors are used for communication among models. However, since Directors are already responsible for moving data from one actor to another, we believe that a JXTA-enabled director can perform this relay function more effectively than actor relay proxies can. This point, however, needs to be discussed more thoroughly.

In implementing a JXTA-enabled director, we envision developing a general-purpose JXTA service that can be used to transfer data from one peer to another. For any given workflow there will be a 'master peer' (the Kepler instance that originates the computation) and one or more 'collaborating peers' (the Kepler instances that execute subworkflows for the master). This setup suggests that individual workflow directors should themselves be peers, so that direct communication can be established between the directors in a workflow.
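
One way such a director might relay tokens is sketched below. The helper assumes an OutputPipe to the collaborating peer has already been resolved (for example, via PipeService.createOutputPipe from a shared pipe advertisement), and token serialization is reduced to a string for illustration:

    import java.io.IOException;
    import net.jxta.endpoint.Message;
    import net.jxta.endpoint.StringMessageElement;
    import net.jxta.pipe.OutputPipe;

    // Hypothetical relay helper for a JXTA-enabled director: forwards a
    // token bound for a port on a collaborating peer over an
    // already-resolved pipe.
    public class DirectorRelay {
        private final OutputPipe pipe;

        public DirectorRelay(OutputPipe pipe) {
            this.pipe = pipe;
        }

        /** Forward one serialized token destined for the named remote port. */
        public void relayToken(String remotePortName, String serializedToken)
                throws IOException {
            Message msg = new Message();
            msg.addMessageElement(
                    new StringMessageElement("kgrid:port", remotePortName, null));
            msg.addMessageElement(
                    new StringMessageElement("kgrid:token", serializedToken, null));
            pipe.send(msg);
        }
    }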

 

Staging Kepler workflows to kgrid nodes

We plan to use .ksw files to move whole workflows or workflow components to collaborating peers. In the initial implementation we will assume that all peers have all possible actor capabilities installed. In later revisions, a staging operation will allow the master peer to send needed workflow components to the collaborating peers before workflow execution begins.
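
The staging step might look roughly like the following sketch, which reads a .ksw archive and ships it to a collaborating peer in a single JXTA message. Element names are again our own convention, and a real implementation would likely chunk large archives rather than sending them whole:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import net.jxta.endpoint.ByteArrayMessageElement;
    import net.jxta.endpoint.Message;
    import net.jxta.endpoint.StringMessageElement;
    import net.jxta.pipe.OutputPipe;

    // Sketch: send a .ksw archive to a collaborating peer as one message.
    // Large archives would realistically be chunked rather than sent whole.
    public class KswStager {
        public static void stage(OutputPipe pipe, File kswFile) throws IOException {
            byte[] bytes = new byte[(int) kswFile.length()];
            FileInputStream in = new FileInputStream(kswFile);
            try {
                int off = 0;
                while (off < bytes.length) {
                    int n = in.read(bytes, off, bytes.length - off);
                    if (n < 0) throw new IOException("unexpected end of file");
                    off += n;
                }
            } finally {
                in.close();
            }
            Message msg = new Message();
            msg.addMessageElement(
                    new StringMessageElement("kgrid:type", "stage-ksw", null));
            msg.addMessageElement(
                    new StringMessageElement("kgrid:filename", kswFile.getName(), null));
            msg.addMessageElement(
                    new ByteArrayMessageElement("kgrid:ksw", null, bytes, null));
            pipe.send(msg);
        }
    }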

 

Using peer-to-peer Kepler on clusters

Cluster systems are increasingly common in departments and labs around the country, and the Kepler peer-to-peer system should be easy to deploy on these clusters. For this to happen, a user controlling a cluster should be able to automate the installation of Kepler on each node of the cluster, remotely (not manually) start those Kepler instances, and instruct each instance to join one or more kgrids. The user can then control this cluster of Kepler instances from any node that is also a member of the kgrid, allowing jobs to be distributed across all of the nodes.

 

Other models

Rather than having an entire Kepler instance act as a peer in a kgrid, one could envision individual workflows, or even individual workflow components, acting as peers. These finer-grained approaches should be thought through and examined.

 

Graphical User Interface

 

Peer-network configuration

We need a GUI component that allows users to configure their Kepler instance to participate in kgrids. This will likely be a dialog box that:
  • lists the kgrids that the node has currently joined
  • allows the user to discover kgrids that exist but that they have not yet joined
    • for kgrids that they do not have access to, allows the user to request access from the kgrid manager
  • allows the user to create a new kgrid, including specifying membership rules (probably launches a separate dialog)

 

Specifying components to parallelize

The Kepler UI needs to be modified to allow the user to select workflows and workflow components to be run on a set of kgrid nodes. In the simplest case, many parts of a workflow are iterative parameter sweeps in which the same subworkflow is run with different input data and the runs are independent. In this simple case the subworkflow can be distributed across the nodes of a kgrid simply by sending the subworkflow to the kgrid and mediating the data flow. In some cases, such as the Map actor, the iteration is explicitly parallelizable, so we may be able to detect these cases through workflow analysis and distribute them automatically; for now, however, the UI should allow specific workflow subsets to be selected for remote execution. From a UI perspective, this may be as simple as selecting an actor or composite actor and toggling a "Run on KGrid" option, which may pull up a dialog to select the kgrid to use (assuming multiple kgrids are available).
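
For the independent-runs case, the distribution logic itself is simple; the sketch below assigns sweep runs to peers round-robin. PeerHandle and runSubworkflow are made-up placeholders for whatever remote-execution call the kgrid ultimately provides:

    import java.util.List;

    // Hypothetical sketch: distribute the independent runs of a parameter
    // sweep across collaborating peers. PeerHandle is a placeholder type.
    public class SweepDistributor {
        public interface PeerHandle {
            void runSubworkflow(byte[] subworkflowKsw, String parameterValue);
        }

        public static void distribute(List<PeerHandle> peers, byte[] subworkflowKsw,
                                      List<String> sweepValues) {
            // Runs are independent, so any assignment works; round-robin
            // is simplest.
            for (int i = 0; i < sweepValues.size(); i++) {
                PeerHandle peer = peers.get(i % peers.size());
                peer.runSubworkflow(subworkflowKsw, sweepValues.get(i));
            }
        }
    }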
