
Configuring and Starting Hadoop, Stratosphere and Spark Servers for bioKepler or DDP

This page describes how to configure and run the Hadoop, Stratosphere, and Spark servers separately (distributed mode) included with the bioKepler and DDP suites.

 

Hadoop

Hadoop module 1.1

Hadoop module 1.1 is based on Hadoop 2.2.0. The bioKepler and DDP suites include the binaries, libraries, and configuration files necessary to run a Hadoop server. These files are located in $HOME/KeplerData/workflows/module/hadoop-1.1.0/tools.

The following steps describe how to configure and start the Hadoop server included in the bioKepler/DDP suites. If Hadoop does not start, look at the log files in logs/.

Linux and Mac

  1. Set JAVA_HOME in etc/hadoop/hadoop-env.sh to where Java is installed on your computer.
  2. Make sure all files in bin/, sbin/, and etc/ are executable:

     chmod a+x bin/* sbin/* etc/*
  3. Before starting Hadoop for the first time, format the namenode:

     bin/format-namenode.sh
  4. Start Hadoop:

     bin/start-hadoop.sh

     Under Mac OS X, if start-hadoop.sh fails with "localhost: ssh: connect to host localhost port 22: Connection refused", open System Preferences, select Sharing, and enable Remote Login.
  5. Stop Hadoop:

     bin/stop-hadoop.sh
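Taken together, a typical first run on Linux or Mac looks like the sketch below. The tools path comes from this page; the JAVA_HOME value is only an example and must point at your own Java install.

```shell
# First-run sequence for the bundled Hadoop 1.1 module (sketch).
# JAVA_HOME below is an example path; adjust it, or edit
# etc/hadoop/hadoop-env.sh instead.
cd $HOME/KeplerData/workflows/module/hadoop-1.1.0/tools
export JAVA_HOME=/usr/lib/jvm/default-java

chmod a+x bin/* sbin/* etc/*   # make the scripts executable
bin/format-namenode.sh         # first start only
bin/start-hadoop.sh            # start the daemons

# ... run workflows ...

bin/stop-hadoop.sh             # shut down when finished
```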

Windows

NOTE: The Hadoop module has not been well tested on Windows.
 
  1. Download and install Cygwin, including the OpenSSH and OpenSSL packages, in "C:\cygwin", and add the Cygwin directories to your PATH environment variable:

     c:\cygwin\bin;c:\cygwin\usr\bin
  2. Hadoop requires password-less SSH access to manage its nodes. Set up authorization keys to be used by Hadoop when ssh'ing to localhost:

     ssh-keygen -t rsa -P ""
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. Set the JAVA_HOME parameter in conf/hadoop-env.sh (line 8) to where Java is installed on your computer.

  4. Set the Cygwin path translation in bin/hadoop-config.sh (line 184):

     # cygwin path translation
     if $cygwin; then
         JAVA_HOME=`cygpath -w "$JAVA_HOME"`
         CLASSPATH=`cygpath -wp "$CLASSPATH"`
         HADOOP_HOME=`cygpath -w "$HADOOP_HOME"`
         HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
         JAVA_LIBRARY_PATH=`cygpath -w "$JAVA_LIBRARY_PATH"`
         TOOL_PATH=`cygpath -wp "$TOOL_PATH"`
     fi
  5. Before starting Hadoop for the first time, format the namenode:

     ./bin/format-namenode.sh
  6. Start Hadoop:

     bin/start-hadoop.sh
  7. Stop Hadoop:

     bin/stop-hadoop.sh
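Because the start script uses ssh to launch the daemons, it is worth confirming that password-less SSH actually works before formatting and starting. A quick check, assuming sshd is running:

```shell
# Verify password-less SSH to localhost (sketch).
# BatchMode=yes makes ssh fail immediately instead of prompting
# for a password, so a failure here means the keys are not set up.
ssh -o BatchMode=yes localhost "echo SSH OK"
```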
     

Hadoop module 1.0

The bioKepler and DDP suites include the binaries, libraries, and configuration files necessary to run a Hadoop server. These files are located in $HOME/KeplerData/workflows/module/hadoop-1.0.0/tools.

The following steps describe how to configure and start the Hadoop server included in the bioKepler/DDP suites. If Hadoop does not start, look at the log files in logs/.

Linux and Mac

  1. Set JAVA_HOME in conf/hadoop-env.sh to where Java is installed on your computer.
  2. Make sure all files in bin/ and conf/hadoop-env.sh are executable.
  3. Before starting Hadoop for the first time, format the namenode:

     echo "Y" | bin/hadoop namenode -format
  4. Start Hadoop:

     bin/start-hadoop.sh
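Once start-hadoop.sh finishes, the jps tool shipped with the JDK can confirm that the daemons are actually up. The exact process list depends on your configuration, but for a Hadoop 1.0-style setup you would typically expect names such as NameNode, DataNode, JobTracker, and TaskTracker:

```shell
# List running Hadoop JVMs by class name (sketch).
jps | egrep 'NameNode|DataNode|JobTracker|TaskTracker'
```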

Windows

NOTE: The Hadoop module has not been well tested on Windows.
 
  1. Download and install Cygwin, including the OpenSSH and OpenSSL packages, in "C:\cygwin", and add the Cygwin directories to your PATH environment variable:

     c:\cygwin\bin;c:\cygwin\usr\bin
  2. Hadoop requires password-less SSH access to manage its nodes. Set up authorization keys to be used by Hadoop when ssh'ing to localhost:

     ssh-keygen -t rsa -P ""
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. Set the JAVA_HOME parameter in conf/hadoop-env.sh (line 8) to where Java is installed on your computer.

  4. Set the Cygwin path translation in bin/hadoop-config.sh (line 184):

     # cygwin path translation
     if $cygwin; then
         JAVA_HOME=`cygpath -w "$JAVA_HOME"`
         CLASSPATH=`cygpath -wp "$CLASSPATH"`
         HADOOP_HOME=`cygpath -w "$HADOOP_HOME"`
         HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
         JAVA_LIBRARY_PATH=`cygpath -w "$JAVA_LIBRARY_PATH"`
         TOOL_PATH=`cygpath -wp "$TOOL_PATH"`
     fi
  5. Before starting Hadoop for the first time, format the namenode:

     ./bin/hadoop namenode -format
  6. Start Hadoop:

     bin/start-hadoop.sh
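If Hadoop fails to start, the per-daemon log files under logs/ are the first place to look. A quick scan might look like the following sketch (the exact log file names vary with your user name and host):

```shell
# Show the tail of each daemon log and search for errors (sketch).
tail -n 20 logs/*.log
grep -i -e "exception" -e "error" logs/*.log | tail -n 20
```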
     

Stratosphere

The bioKepler and DDP suites include the binaries, libraries, and configuration files necessary to run a Stratosphere server. These files are located in $HOME/KeplerData/workflows/module/stratosphere-1.2.0/tools.

The following steps describe how to configure and start the Stratosphere server included in the bioKepler/DDP suites. If Stratosphere does not start, look at the log files in logs/.

Linux and Mac

  1. Make sure all files in bin/ are executable.
  2. Start Stratosphere:

     bin/start-local.sh
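To confirm that the local instance came up, you can check for the JobManager process and probe its web frontend. This is a sketch: 8081 is the usual JobManager web interface port, but your conf/ settings may differ.

```shell
# Check that the Stratosphere JobManager JVM is running (sketch).
jps | grep -i jobmanager

# Probe the web frontend; adjust the port to match your conf/ if needed.
curl -s http://localhost:8081/ > /dev/null && echo "JobManager web UI is up"
```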
 

Windows

  1. Set JAVA_HOME in bin/nephele-config.sh (line 52) to where Java is installed on your computer.

  2. Set NEPHELE_JM_CLASSPATH, NEPHELE_CONF_DIR, and log_setting in bin/nephele-jobmanager.sh (lines 117 and 124):

     NEPHELE_JM_CLASSPATH=`cygpath -wp $NEPHELE_JM_CLASSPATH`
     log_setting="-Dlog.file="$log" -Dlog4j.configuration="$NEPHELE_CONF_DIR"/log4j.properties"
  3. Start Stratosphere:

     bin/start-local.sh

Spark 

Linux and Mac

The assembly JAR necessary to start the Spark Master is not included with Kepler since it is over 100 MB. To build this JAR, follow these steps:

  1. Download the source code for Spark 1.1.0.
  2. Extract the source:

     tar xzpf spark-1.1.0.tgz
  3. Build the assembly JAR:

     cd spark-1.1.0
     SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

     Note: If the command fails with errors such as "[error] Server access Error: Too many open files url", the cause is probably a network problem while downloading dependencies. Re-run the command.
  4. Copy the assembly JAR to KeplerData:

     mkdir -p $HOME/KeplerData/workflows/module/spark/tools/assembly/target/scala-2.10
     cp assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop2.2.0.jar $HOME/KeplerData/workflows/module/spark/tools/assembly/target/scala-2.10/
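The destination layout from the copy step can be sanity-checked with a short script. The sketch below substitutes a throwaway directory for $HOME and an empty stand-in file for the (large) real JAR, so it is safe to try anywhere:

```shell
# Demonstrate the expected destination layout with a disposable
# directory standing in for $HOME (sketch).
DEMO_HOME=$(mktemp -d)
TARGET="$DEMO_HOME/KeplerData/workflows/module/spark/tools/assembly/target/scala-2.10"

mkdir -p "$TARGET"
# Empty stand-in for the real assembly JAR; Kepler expects exactly
# this file name under this path.
touch "$TARGET/spark-assembly-1.1.0-hadoop2.2.0.jar"

ls "$TARGET"
```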