
Configuring and Starting Hadoop, Stratosphere and Spark Servers for bioKepler or DDP

This page describes how to configure and run the Hadoop, Stratosphere, and Spark servers separately (distributed mode) included with the bioKepler and DDP suites.

 

Hadoop

Hadoop module 1.1

Hadoop module 1.1 is based on Hadoop 2.2.0. The bioKepler and DDP suites include the binaries, libraries, and configuration files necessary to run a Hadoop server. These files are located in $HOME/KeplerData/workflows/module/hadoop-1.1.0/tools.

The following steps describe how to configure and start the Hadoop server included in the bioKepler/DDP suites. If Hadoop does not start, look at the log files in logs/.

Linux and Mac

  1. Set JAVA_HOME in etc/hadoop/hadoop-env.sh to where Java is installed on your computer.
  2. Make sure all files in bin/, sbin/, and etc/ are executable:

     chmod a+x bin/* sbin/* etc/*
  3. Before starting Hadoop for the first time, format the namenode:

     bin/format-namenode.sh
  4. Start Hadoop:

     bin/start-hadoop.sh

     Under Mac OS X, if start-hadoop.sh fails with "localhost: ssh: connect to host localhost port 22: Connection refused", open System Preferences, select Sharing, and enable Remote Login.
  5. Stop Hadoop:

     bin/stop-hadoop.sh
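Taken together, a typical first run on Linux or Mac looks like the sketch below. The tools path comes from this page; the JAVA_HOME value is only an example and must point at your own Java install.

```shell
# First-run sequence for the bundled Hadoop 1.1 module (sketch).
# JAVA_HOME below is an example path; adjust it, or edit
# etc/hadoop/hadoop-env.sh instead.
cd $HOME/KeplerData/workflows/module/hadoop-1.1.0/tools
export JAVA_HOME=/usr/lib/jvm/default-java

chmod a+x bin/* sbin/* etc/*   # make the scripts executable
bin/format-namenode.sh         # first start only
bin/start-hadoop.sh            # start the daemons

# ... run workflows ...

bin/stop-hadoop.sh             # shut down when finished
```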

Windows

NOTE: The Hadoop module has not been well tested on Windows.
 
  1. Download and install Cygwin, including the OpenSSH and OpenSSL packages, in "C:\cygwin", and add the Cygwin directories to your PATH environment variable:

     c:\cygwin\bin;c:\cygwin\usr\bin
  2. Hadoop requires password-less SSH access to manage its nodes. Set up authorization keys to be used by Hadoop when ssh'ing to localhost:

     ssh-keygen -t rsa -P ""
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. Set the JAVA_HOME parameter in conf/hadoop-env.sh (line 8) to where Java is installed on your computer.

  4. Set the Cygwin path translation in bin/hadoop-config.sh (line 184):

     # cygwin path translation
     if $cygwin; then
         JAVA_HOME=`cygpath -w "$JAVA_HOME"`
         CLASSPATH=`cygpath -wp "$CLASSPATH"`
         HADOOP_HOME=`cygpath -w "$HADOOP_HOME"`
         HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
         JAVA_LIBRARY_PATH=`cygpath -w "$JAVA_LIBRARY_PATH"`
         TOOL_PATH=`cygpath -wp "$TOOL_PATH"`
     fi
  5. Before starting Hadoop for the first time, format the namenode:

     ./bin/format-namenode.sh
  6. Start Hadoop:

     bin/start-hadoop.sh
  7. Stop Hadoop:

     bin/stop-hadoop.sh
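Because the start script uses ssh to launch the daemons, it is worth confirming that password-less SSH actually works before formatting and starting. A quick check, assuming sshd is running:

```shell
# Verify password-less SSH to localhost (sketch).
# BatchMode=yes makes ssh fail immediately instead of prompting
# for a password, so a failure here means the keys are not set up.
ssh -o BatchMode=yes localhost "echo SSH OK"
```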
     

Hadoop module 1.0

The bioKepler and DDP suites include the binaries, libraries, and configuration files necessary to run a Hadoop server. These files are located in $HOME/KeplerData/workflows/module/hadoop-1.0.0/tools.

The following steps describe how to configure and start the Hadoop server included in the bioKepler/DDP suites. If Hadoop does not start, look at the log files in logs/.

Linux and Mac

  1. Set JAVA_HOME in conf/hadoop-env.sh to where Java is installed on your computer.
  2. Make sure all files in bin/ and conf/hadoop-env.sh are executable.
  3. Before starting Hadoop for the first time, format the namenode:

     echo "Y" | bin/hadoop namenode -format
  4. Start Hadoop:

     bin/start-hadoop.sh
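Once start-hadoop.sh finishes, the jps tool shipped with the JDK can confirm that the daemons are actually up. The exact process list depends on your configuration, but for a Hadoop 1.0-style setup you would typically expect names such as NameNode, DataNode, JobTracker, and TaskTracker:

```shell
# List running Hadoop JVMs by class name (sketch).
jps | egrep 'NameNode|DataNode|JobTracker|TaskTracker'
```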

Windows

NOTE: The Hadoop module has not been well tested on Windows.
 
  1. Download and install Cygwin, including the OpenSSH and OpenSSL packages, in "C:\cygwin", and add the Cygwin directories to your PATH environment variable:

     c:\cygwin\bin;c:\cygwin\usr\bin
  2. Hadoop requires password-less SSH access to manage its nodes. Set up authorization keys to be used by Hadoop when ssh'ing to localhost:

     ssh-keygen -t rsa -P ""
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. Set the JAVA_HOME parameter in conf/hadoop-env.sh (line 8) to where Java is installed on your computer.

  4. Set the Cygwin path translation in bin/hadoop-config.sh (line 184):

     # cygwin path translation
     if $cygwin; then
         JAVA_HOME=`cygpath -w "$JAVA_HOME"`
         CLASSPATH=`cygpath -wp "$CLASSPATH"`
         HADOOP_HOME=`cygpath -w "$HADOOP_HOME"`
         HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
         JAVA_LIBRARY_PATH=`cygpath -w "$JAVA_LIBRARY_PATH"`
         TOOL_PATH=`cygpath -wp "$TOOL_PATH"`
     fi
  5. Before starting Hadoop for the first time, format the namenode:

     ./bin/hadoop namenode -format
  6. Start Hadoop:

     bin/start-hadoop.sh
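If Hadoop fails to start, the per-daemon log files under logs/ are the first place to look. A quick scan might look like the following sketch (the exact log file names vary with your user name and host):

```shell
# Show the tail of each daemon log and search for errors (sketch).
tail -n 20 logs/*.log
grep -i -e "exception" -e "error" logs/*.log | tail -n 20
```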
     

Stratosphere

The bioKepler and DDP suites include the binaries, libraries, and configuration files necessary to run a Stratosphere server. These files are located in $HOME/KeplerData/workflows/module/stratosphere-1.2.0/tools.

The following steps describe how to configure and start the Stratosphere server included in the bioKepler/DDP suites. If Stratosphere does not start, look at the log files in logs/.

Linux and Mac

  1. Make sure all files in bin/ are executable.
  2. Start Stratosphere:

     bin/start-local.sh
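To confirm that the local instance came up, you can check for the JobManager process and probe its web frontend. This is a sketch: 8081 is the usual JobManager web interface port, but your conf/ settings may differ.

```shell
# Check that the Stratosphere JobManager JVM is running (sketch).
jps | grep -i jobmanager

# Probe the web frontend; adjust the port to match your conf/ if needed.
curl -s http://localhost:8081/ > /dev/null && echo "JobManager web UI is up"
```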
 

Windows

  1. Set JAVA_HOME in bin/nephele-config.sh (line 52) to where Java is installed on your computer.

  2. Set NEPHELE_JM_CLASSPATH, NEPHELE_CONF_DIR, and log_setting in bin/nephele-jobmanager.sh (lines 117 and 124):

     NEPHELE_JM_CLASSPATH=`cygpath -wp $NEPHELE_JM_CLASSPATH`
     log_setting="-Dlog.file="$log" -Dlog4j.configuration="$NEPHELE_CONF_DIR"/log4j.properties"
  3. Start Stratosphere:

     bin/start-local.sh

Spark 

Linux and Mac

The assembly JAR necessary to start the Spark Master is not included with Kepler since it is over 100 MB. To build this JAR, follow these steps:

  1. Download the source code for Spark 1.1.0.
  2. Extract the source:

     tar xzpf spark-1.1.0.tgz
  3. Build the assembly JAR:

     cd spark-1.1.0
     SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

     Note: If the command fails with errors such as "[error] Server access Error: Too many open files url", the cause is probably a network problem while downloading dependencies. Re-run the command.
  4. Copy the assembly JAR to KeplerData:

     mkdir -p $HOME/KeplerData/workflows/module/spark/tools/assembly/target/scala-2.10
     cp assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop2.2.0.jar $HOME/KeplerData/workflows/module/spark/tools/assembly/target/scala-2.10/
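The destination layout from the copy step can be sanity-checked with a short script. The sketch below substitutes a throwaway directory for $HOME and an empty stand-in file for the (large) real JAR, so it is safe to try anywhere:

```shell
# Demonstrate the expected destination layout with a disposable
# directory standing in for $HOME (sketch).
DEMO_HOME=$(mktemp -d)
TARGET="$DEMO_HOME/KeplerData/workflows/module/spark/tools/assembly/target/scala-2.10"

mkdir -p "$TARGET"
# Empty stand-in for the real assembly JAR; Kepler expects exactly
# this file name under this path.
touch "$TARGET/spark-assembly-1.1.0-hadoop2.2.0.jar"

ls "$TARGET"
```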