Master-Slave 2.0 RoadMap
This page describes the roadmap for the Master-Slave 2.0 release, including its objective, concepts, main functionalities, targeted detailed capabilities, and limitations / future work.
Objective
The objective of the Master-Slave Distributed Execution framework in Kepler is to enable Kepler users to harness the power of multiple computing nodes for computations that would be difficult, time-consuming, or impossible to accomplish on a single node.
Master-Slave 2.0 will combine improvements to the workflow distribution capability itself (e.g., usability and provenance support) with the new features of the upcoming Kepler 2.0 release (e.g., the module structure and the new configuration system). Master-Slave 2.0 will be released as a separate suite after Kepler 2.0.
Concepts
- Master: The master is always the machine from which you start the execution of the workflow.
- Slave: A slave is any remote machine (one or more) that accepts incoming connections from the master and executes all or part of your workflow (see the RMI sketch after this list).
- EcoGrid: The EcoGrid is designed as a series of Grid-service programming interfaces that provide seamless access to data via standardized service APIs. In the current Master-Slave architecture, EcoGrid is used to register slaves and to authenticate users.
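The master/slave interaction is currently built on Java RMI (see Limitations - Future Work below). The sketch below only illustrates that style of interaction; the interface, method, and registry names are assumptions, not the actual Master-Slave 2.0 API.

    import java.rmi.Naming;
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Illustrative only: the interface and method names below are assumptions,
    // not the actual Master-Slave 2.0 API.
    interface SlaveService extends Remote {
        // Execute a serialized sub-workflow on the slave and return the results.
        byte[] execute(byte[] subWorkflow) throws RemoteException;
    }

    class MasterSide {
        public static void main(String[] args) throws Exception {
            // The master looks up a slave that has registered itself in an RMI
            // registry and asks it to run part of the workflow.
            SlaveService slave =
                    (SlaveService) Naming.lookup("rmi://slave-host:1099/SlaveService");
            byte[] results = slave.execute(new byte[0]); // placeholder sub-workflow
            System.out.println("Received " + results.length + " result bytes");
        }
    }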
Main Functionalities
- Actor level distributed execution: Through the DistributedCompositeActor (DCA), the tasks inside this composite actor are distributed from the master to the configured slaves and executed there. The results are returned to the master.
- Slave Registration: Slaves can register themselves with EcoGrid so that masters can discover and use them. Users can also add available slaves manually.
- Access Control: When the slave service is started, it can be configured with the users who are allowed or denied access to that slave. Even if a user is allowed to access a slave, he/she still needs to be authenticated by EcoGrid before using it. After these two steps, the user receives a credential for the slave, which is valid for 24 hours by default; each individual use of the slave expires after one hour by default (see the sketch after this list).
- Provenance Support: Provenance information for the execution of distributed sub-workflows on the slaves will be integrated on the master side.
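A minimal sketch of the default expiry rules described under Access Control, assuming the stated lifetimes of 24 hours for a credential and one hour for each use of a slave. The class and method names are hypothetical and do not come from the Kepler code base.

    import java.time.Duration;
    import java.time.Instant;

    // Hypothetical sketch of the default expiry rules: a credential is valid
    // for 24 hours, and each individual use of a slave expires after one hour.
    class SlaveAccess {
        static final Duration CREDENTIAL_LIFETIME = Duration.ofHours(24);
        static final Duration SESSION_LIFETIME = Duration.ofHours(1);

        final Instant credentialIssuedAt;
        final Instant sessionStartedAt;

        SlaveAccess(Instant credentialIssuedAt, Instant sessionStartedAt) {
            this.credentialIssuedAt = credentialIssuedAt;
            this.sessionStartedAt = sessionStartedAt;
        }

        boolean credentialValid(Instant now) {
            return Duration.between(credentialIssuedAt, now)
                           .compareTo(CREDENTIAL_LIFETIME) < 0;
        }

        boolean sessionValid(Instant now) {
            return Duration.between(sessionStartedAt, now)
                           .compareTo(SESSION_LIFETIME) < 0;
        }

        // A request is served only if both the credential and the current
        // session are still within their default lifetimes.
        boolean mayUseSlave(Instant now) {
            return credentialValid(now) && sessionValid(now);
        }
    }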
Targeted Detailed Capabilities
- Usability
- Users can easily start/stop a slave without needing to know technical details such as RMI and Ant.
- Users can easily turn an ordinary composite actor into a distributed one and vice versa.
- Configuration
- Users can easily configure how their slaves are used, e.g., which users are allowed and whether the slave is registered with EcoGrid.
- Security
- LDAP-based user authentication is supported. By default, users must be authenticated by the Kepler authentication service before they can use slaves (see the LDAP sketch after this list).
- User access control is configurable for each slave.
- (Open Issue) Permissions for the command-line actor and scripting actors.
- Slave start and stop commands.
- Provenance
- (Open Issue) Whether provenance information should be recorded on the slave side or on the master side. In either case, this data will be integrated on the master side after workflow execution.
- Data Transfer
- Automatically transfer string and file data from master to slave.
- Automatically transfer string and file data from slave back to master.
- Master-Slave Mapping
- One master can utilize multiple slaves.
- One slave can serve multiple masters.
- One workflow can contain multiple Distributed Composite Actors.
- Distributed Composite Actors can be nested.
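As an illustration of the LDAP-based authentication listed under Security, the sketch below performs a simple LDAP bind through JNDI. The server URL and DN pattern are placeholders, and the actual Kepler authentication service may work differently.

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingException;
    import javax.naming.directory.InitialDirContext;

    // Illustrative LDAP bind via JNDI. The URL and DN pattern below are
    // placeholders; Kepler's authentication service may differ.
    class LdapAuthSketch {
        static boolean authenticate(String uid, String password) {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://ldap.example.org:389");
            env.put(Context.SECURITY_AUTHENTICATION, "simple");
            env.put(Context.SECURITY_PRINCIPAL, "uid=" + uid + ",ou=people,dc=example,dc=org");
            env.put(Context.SECURITY_CREDENTIALS, password);
            try {
                // A successful bind means the credentials were accepted.
                new InitialDirContext(env).close();
                return true;
            } catch (NamingException e) {
                return false;
            }
        }
    }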
Limitations - Future Work
- Extensible Data Transfer Mechanism: Data transfer should be transparent and independent of any particular infrastructure so that it can work with other infrastructures. The current implementation is tightly coupled with SRB, which supports the local, http, ftp, srb, and irods file protocols; GridFTP is not supported.
- Extensible Master-Slave Invocation Mechanism: The current implementation is tightly coupled with the Java RMI infrastructure.
- Peer-to-peer data transfer among multiple Slaves: Distributed composite actors are independent of each other; a distributed composite actor does not know whether its downstream actors are also distributed composite actors, nor does it know their addresses. Consequently, output data generated on a slave is automatically transferred only back to the master, not to other slaves. Using a DistributedDirector might enable this capability.
- Distributed Director: A Distributed Director may replace the Distributed Composite Actor to allow a normal workflow (one without Distributed Composite Actors) to be distributed. Given the set of available slaves, the Distributed Director might be able to schedule the actors on the slaves (optimally) and enable peer-to-peer data transfer among multiple slaves.
- Dynamically Loading Actors to Slave Sides: Kepler has to be installed on both the master and the slaves beforehand, and the actors used in a sub-workflow have to be deployed on the slaves that will execute it. If an actor in the sub-workflow is not available on a slave, execution fails with an exception, because an actor (its class file and dependent jar files) cannot be loaded automatically from the master onto the slaves. The Module Manager may help, but it works at module granularity, not actor granularity (see the class-loading sketch below).
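To make the last limitation concrete, the sketch below shows the general Java mechanism (a URLClassLoader) by which a slave could, in principle, fetch an actor's classes from the master at run time. This is not part of the current implementation; the URL and class name are placeholders.

    import java.net.URL;
    import java.net.URLClassLoader;

    // Illustrative only: the general mechanism a slave would need in order to
    // load an actor's classes shipped from the master at run time. The URL and
    // actor class name are placeholders; this is not part of Master-Slave 2.0.
    class RemoteActorLoader {
        public static void main(String[] args) throws Exception {
            URL actorJar = new URL("http://master-host/actors/MyActor.jar");
            try (URLClassLoader loader =
                         new URLClassLoader(new URL[] { actorJar },
                                            RemoteActorLoader.class.getClassLoader())) {
                // Load the actor class by name from the jar provided by the master.
                Class<?> actorClass = loader.loadClass("org.example.MyActor");
                System.out.println("Loaded actor class: " + actorClass.getName());
            }
        }
    }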