Saturday, July 23, 2016

HADOOP SINGLE NODE INSTALLATION ON UBUNTU 14.04 (CS6712 - GRID AND CLOUD LABORATORY - ANNA UNIVERSITY 2013 Regulation)



PREREQUISITES

* Java (version 1.6.0 or above) should be installed.
[ If Java is not installed, you can try either of these methods:
Method 1: To install the OpenJDK JDK and JRE 8, use (replace 8 with the version you want, such as 7 or 6):
                          sudo apt-get install openjdk-8-jdk
Method 2: If you instead want to install the official Oracle JDK and JRE through apt-get, then run (you can replace the 8 with other versions such as 9 or 7):
                          sudo add-apt-repository ppa:webupd8team/java
                          sudo apt-get update
                          sudo apt-get install oracle-java8-installer
To automatically set up the Java 7 environment variables JAVA_HOME and PATH:
            sudo apt-get install oracle-java7-set-default ]
* SSH should be installed and sshd must be running.
            [ If ssh is not installed, you can run the following command to install it:
                                    sudo apt-get install openssh-server
               Check ssh using the following commands after installing:
                                    which ssh
                                    output should be /usr/bin/ssh
                                    which sshd
                                    output should be /usr/sbin/sshd ]
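If sshd is installed but not running, it can be checked and started with the usual Ubuntu 14.04 service commands:
                                    sudo service ssh status
                                    sudo service ssh start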

HADOOP USER CREATION


user@node:~$ sudo addgroup hadoop
[sudo] password for user:
Adding group `hadoop' (GID 1001) ...
Done.
user@node:~$ sudo adduser --ingroup hadoop hdpuser
Adding user `hdpuser' ...
Adding new user `hdpuser' (1001) with group `hadoop' ...
Creating home directory `/home/hdpuser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdpuser
Enter the new value, or press ENTER for the default
            Full Name []:
            Room Number []:
            Work Phone []:
            Home Phone []:
            Other []:
Is the information correct? [Y/n]

SWITCH TO SUPER USER TO ADD HADOOP USER TO SUDOERS GROUP


Switch to root user                 -       su root
Add the hadoop user to the sudoers list by adding the below entry in the file /etc/sudoers
            hdpuser ALL=(ALL:ALL) ALL
                        (under      # User privilege specification
                                                  root    ALL=(ALL:ALL) ALL )
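A safer alternative to editing /etc/sudoers directly is visudo, which validates the file's syntax before saving (a syntax error in /etc/sudoers can lock you out of sudo entirely). As root, simply run:
            visudo
then add the hdpuser line under the "# User privilege specification" section as shown above.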

VERIFY JAVA INSTALLATION

Switch to the hadoop user          -      su hdpuser

hdpuser@node:~$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

hdpuser@node:~$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072      auto mode
  1            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1071      manual mode
* 2            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072      manual mode

Press enter to keep the current choice[*], or type selection number: 
hdpuser@node:~$

UPDATE JAVA VARIABLES IN THE ~/.BASHRC FILE

Add the below entries in the ~/.bashrc file

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin

Source the .bashrc file using the command
source ~/.bashrc
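To confirm the variables took effect in the current shell, you can check:
            echo $JAVA_HOME
            which java
The first should print /usr/lib/jvm/java-7-oracle, and the second should resolve to a java binary on the updated PATH.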

VERIFY SSH INSTALLATION

hdpuser@node:~$ which ssh
/usr/bin/ssh
hdpuser@node:~$ which sshd
/usr/sbin/sshd


SSH KEY GENERATION

hdpuser@node:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdpuser/.ssh/id_rsa):
Created directory '/home/hdpuser/.ssh'.
Your identification has been saved in /home/hdpuser/.ssh/id_rsa.
Your public key has been saved in /home/hdpuser/.ssh/id_rsa.pub.
The key fingerprint is:
da:4c:9a:89:bb:02:ac:7e:00:70:16:11:bc:fa:49:5e hdpuser@node
The key's randomart image is:
+--[ RSA 2048]----+
| .++             |
|. +              |
|.o .             |
|. .              |
|o.      S        |
|oo. E. O         |
|.=.o. = o        |
|. =. .           |
|....o.           |
+-----------------+

hdpuser@node:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
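It is also worth tightening the permissions on the authorized_keys file and verifying that passwordless login works before moving on, since Hadoop's start scripts use ssh to launch the daemons:
            chmod 0600 ~/.ssh/authorized_keys
            ssh localhost
Accept the host key fingerprint on the first connection, then type exit to return to the original shell.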

DOWNLOADING AND INSTALLING HADOOP


[ Hadoop can be downloaded using the below link if you don't have the package in your system:
   wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz ]

hdpuser@node:~$ cd /home/user/Documents/
hdpuser@node:/home/user/Documents$ sudo mv hadoop-2.6.0.tar.gz /usr/local/
[sudo] password for hdpuser:
hdpuser@node:/home/user/Documents$ cd /usr/local/
hdpuser@node:/usr/local$ sudo tar xvzf hadoop-2.6.0.tar.gz
hdpuser@node:/usr/local$ sudo chown -R hdpuser:hadoop hadoop-2.6.0
hdpuser@node:/usr/local$ sudo ln -s hadoop-2.6.0 hadoop

Add the below entries in the ~/.bashrc file and source the .bashrc file (the PATH entry is needed for the hadoop command below to be found):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

hdpuser@node:/usr/local$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar

SETTING UP HADOOP ENVIRONMENT VARIABLES


   You can set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.
        export JAVA_HOME=/usr/lib/jvm/java-7-oracle
        export HADOOP_HOME=/usr/local/hadoop
        export HADOOP_MAPRED_HOME=$HADOOP_HOME
        export HADOOP_COMMON_HOME=$HADOOP_HOME
        export HADOOP_HDFS_HOME=$HADOOP_HOME
        export YARN_HOME=$HADOOP_HOME
        export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
        export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
        export HADOOP_INSTALL=$HADOOP_HOME
  Now apply all the changes into the current running system.
            $ source ~/.bashrc

HADOOP CONFIGURATION

        Next we need to configure some of the Hadoop files, namely:
                                            hadoop-env.sh
                                            core-site.xml
                                            hdfs-site.xml
                                            mapred-site.xml
These files are located in $HADOOP_HOME/etc/hadoop

        hadoop-env.sh

                                In this file, add the following line to define the Java home
                                                export JAVA_HOME=/usr/lib/jvm/java-7-oracle
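If you prefer to make this change from the command line, a one-liner such as the following should work (assuming hadoop-env.sh still contains its stock "export JAVA_HOME=..." line):
                                sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-oracle|' $HADOOP_HOME/etc/hadoop/hadoop-env.sh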

        mapred-site.xml

                                This file may not be present with this name. In that case we need to first copy it from the template file:
                                cp mapred-site.xml.template mapred-site.xml
                                Then add the following property within the <configuration> tags:
            <property>
              <name>mapred.job.tracker</name>
              <value>localhost:54311</value>
              <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
              </description>
            </property>
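Note: mapred.job.tracker is the Hadoop 1.x JobTracker setting. Since this guide starts the YARN daemons later, you may also want to add the standard Hadoop 2.x property that tells MapReduce to run on YARN:
            <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
            </property>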


        core-site.xml

                    Add the following property within the <configuration> tags:
                        <property>
                          <name>fs.default.name</name>
                          <value>hdfs://localhost:54310</value>
                          <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
                        </property>
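Note: fs.default.name is deprecated in Hadoop 2.x (though still honored); the current name for the same property is fs.defaultFS:
                        <property>
                          <name>fs.defaultFS</name>
                          <value>hdfs://localhost:54310</value>
                        </property>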


        hdfs-site.xml

                                We need to create a couple of directories that will be used by the namenode and the datanode in the Hadoop cluster.
                                $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
                                $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
                                    $ sudo chown -R hdpuser:hadoop /usr/local/hadoop_store

        Next we add the following properties within the <configuration> tags:
                        <property>
                         <name>dfs.replication</name>
                         <value>1</value>
                         <description>Default block replication.
                         The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
                         </description>
                        </property>
                        <property>
                          <name>dfs.namenode.name.dir</name>
                          <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
                        </property>
                        <property>
                         <name>dfs.datanode.data.dir</name>
                         <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
                        </property>

FORMATTING THE NAMENODE

        Once the Hadoop configuration is over, we need to format the NameNode.
                    The HDFS filesystem can be formatted with the following command:
                                hadoop namenode -format
                    (On Hadoop 2.x, hdfs namenode -format is the preferred equivalent.)
                    The NameNode should be successfully formatted before proceeding further; look for a message saying the storage directory has been successfully formatted.


START THE HADOOP DAEMONS

        Next we need to start the Hadoop Daemons which run as individual Java services.
        Hadoop provides a set of scripts to start and stop the Daemons.
        To start the DFS Daemons, issue the following command in the terminal:
                                start-dfs.sh
        To start the Yarn Daemons, issue the following command in the terminal:
                                start-yarn.sh
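The matching scripts stop the daemons when you are done:
                                stop-dfs.sh
                                stop-yarn.sh
(Alternatively, start-all.sh and stop-all.sh start and stop everything at once, though they are marked deprecated in Hadoop 2.x.)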


VERIFYING HADOOP INSTALLATION

        Hadoop installation can be verified by checking if all the Daemons are running successfully.
        Since all the Daemons are Java processes, issue the following command on the terminal:
                                $ jps
        It should list the following processes (along with Jps itself):
                                NameNode
                                SecondaryNameNode
                                DataNode
                                NodeManager
                                ResourceManager
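As a further smoke test, you can create the hdpuser home directory in HDFS and list the filesystem root:
                                hdfs dfs -mkdir -p /user/hdpuser
                                hdfs dfs -ls /
If both commands succeed without errors, HDFS is up and writable.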

HADOOP WEB INTERFACES

        The Hadoop NameNode and ResourceManager can be monitored using their web interfaces.
        These are typically used by Hadoop administrators.
                                For NameNode:
                                            http://localhost:50070
                                For ResourceManager:
                                            http://localhost:8088
                                For Secondary NameNode:
                                            http://localhost:50090
                                For DataNode:
                                            http://localhost:50075
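If you are working on a headless machine, the same interfaces can be probed from the terminal; an HTTP 200 response indicates the daemon's web UI is up:
                                            curl -sI http://localhost:50070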








