Saturday, July 23, 2016

HADOOP SINGLE NODE INSTALLATION ON UBUNTU 14.04 (CS6712 - GRID AND CLOUD LABORATORY - ANNA UNIVERSITY 2013 Regulation)



PREREQUISITES

* Java (version 1.6.0 or above) should be installed.
[ If Java is not installed, you can try either of these methods:
Method 1: To install the OpenJDK JDK and JRE 8, use (replace 8 with the version you want, such as 7 or 6):
                          sudo apt-get install openjdk-8-jdk
Method 2: If you instead want to install the official Oracle JDK and JRE through apt-get, then run (you can replace the 8 with other versions such as 9 or 7):
                          sudo add-apt-repository ppa:webupd8team/java
                          sudo apt-get update
                          sudo apt-get install oracle-java8-installer
To automatically set up the Java 7 environment variables JAVA_HOME and PATH:
            sudo apt-get install oracle-java7-set-default ]
* SSH should be installed and sshd must be running.
            [ If ssh is not installed, you can run the following command to install it:
                                    sudo apt-get install openssh-server
               Check ssh using the following commands after installing:
                                    which ssh
                                    output should be /usr/bin/ssh
                                    which sshd
                                    output should be /usr/sbin/sshd ]
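If sshd is installed but not running, it can be checked and started with the usual Ubuntu 14.04 service commands:
                                    sudo service ssh status
                                    sudo service ssh start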

HADOOP USER CREATION


user@node:~$ sudo addgroup hadoop
[sudo] password for user:
Adding group `hadoop' (GID 1001) ...
Done.
user@node:~$ sudo adduser --ingroup hadoop hdpuser
Adding user `hdpuser' ...
Adding new user `hdpuser' (1001) with group `hadoop' ...
Creating home directory `/home/hdpuser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdpuser
Enter the new value, or press ENTER for the default
            Full Name []:
            Room Number []:
            Work Phone []:
            Home Phone []:
            Other []:
Is the information correct? [Y/n]

SWITCH TO SUPER USER TO ADD HADOOP USER TO SUDOERS GROUP


Switch to root user                 -       su root
Add the hadoop user to the sudoers list by adding the below entry in the file /etc/sudoers
            hdpuser ALL=(ALL:ALL) ALL
                        (under      # User privilege specification
                                                  root    ALL=(ALL:ALL) ALL )
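A safer alternative to editing /etc/sudoers directly is visudo, which validates the file's syntax before saving (a syntax error in /etc/sudoers can lock you out of sudo entirely). As root, simply run:
            visudo
then add the hdpuser line under the "# User privilege specification" section as shown above.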

VERIFY JAVA INSTALLATION

Switch to the hadoop user          -      su hdpuser

hdpuser@node:~$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

hdpuser@node:~$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072      auto mode
  1            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1071      manual mode
* 2            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072      manual mode

Press enter to keep the current choice[*], or type selection number: 
hdpuser@node:~$

UPDATE JAVA VARIABLES IN THE ~/.BASHRC FILE

Add the below entries in the ~/.bashrc file

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin

Source the .bashrc file using the command
source ~/.bashrc
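To confirm the variables took effect in the current shell, you can check:
            echo $JAVA_HOME
            which java
The first should print /usr/lib/jvm/java-7-oracle, and the second should resolve to a java binary on the updated PATH.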

VERIFY SSH INSTALLATION

hdpuser@node:~$ which ssh
/usr/bin/ssh
hdpuser@node:~$ which sshd
/usr/sbin/sshd


SSH KEY GENERATION

hdpuser@node:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdpuser/.ssh/id_rsa):
Created directory '/home/hdpuser/.ssh'.
Your identification has been saved in /home/hdpuser/.ssh/id_rsa.
Your public key has been saved in /home/hdpuser/.ssh/id_rsa.pub.
The key fingerprint is:
da:4c:9a:89:bb:02:ac:7e:00:70:16:11:bc:fa:49:5e hdpuser@node
The key's randomart image is:
+--[ RSA 2048]----+
| .++             |
|. +              |
|.o .             |
|. .              |
|o.      S        |
|oo. E. O         |
|.=.o. = o        |
|. =. .           |
|....o.           |
+-----------------+

hdpuser@node:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
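It is also worth tightening the permissions on the authorized_keys file and verifying that passwordless login works before moving on, since Hadoop's start scripts use ssh to launch the daemons:
            chmod 0600 ~/.ssh/authorized_keys
            ssh localhost
Accept the host key fingerprint on the first connection, then type exit to return to the original shell.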

DOWNLOADING AND INSTALLING HADOOP


[ Hadoop can be downloaded using the below link if you don't have the package in your system:
   wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz ]

hdpuser@node:~$ cd /home/user/Documents/
hdpuser@node:/home/user/Documents$ sudo mv hadoop-2.6.0.tar.gz /usr/local/
[sudo] password for hdpuser:
hdpuser@node:/home/user/Documents$ cd /usr/local/
hdpuser@node:/usr/local$ sudo tar xvzf hadoop-2.6.0.tar.gz
hdpuser@node:/usr/local$ sudo chown -R hdpuser:hadoop hadoop-2.6.0
hdpuser@node:/usr/local$ sudo ln -s hadoop-2.6.0 hadoop

Add the below entries in the ~/.bashrc file and source the .bashrc file (the PATH entry is needed for the hadoop command below to be found):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

hdpuser@node:/usr/local$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar

SETTING UP HADOOP ENVIRONMENT VARIABLES


   You can set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.
        export JAVA_HOME=/usr/lib/jvm/java-7-oracle
        export HADOOP_HOME=/usr/local/hadoop
        export HADOOP_MAPRED_HOME=$HADOOP_HOME
        export HADOOP_COMMON_HOME=$HADOOP_HOME
        export HADOOP_HDFS_HOME=$HADOOP_HOME
        export YARN_HOME=$HADOOP_HOME
        export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
        export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
        export HADOOP_INSTALL=$HADOOP_HOME
  Now apply all the changes into the current running system.
            $ source ~/.bashrc

HADOOP CONFIGURATION

        Next we need to configure some of the Hadoop files, namely:
                                            hadoop-env.sh
                                            core-site.xml
                                            hdfs-site.xml
                                            mapred-site.xml
These files are located in $HADOOP_HOME/etc/hadoop

        hadoop-env.sh

                                In this file, add the following line to define the Java home
                                                export JAVA_HOME=/usr/lib/jvm/java-7-oracle
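If you prefer to make this change from the command line, a one-liner such as the following should work (assuming hadoop-env.sh still contains its stock "export JAVA_HOME=..." line):
                                sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-oracle|' $HADOOP_HOME/etc/hadoop/hadoop-env.sh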

        mapred-site.xml

                                This file may not be present with this name. In that case we need to first copy it from the template file:
                                cp mapred-site.xml.template mapred-site.xml
                                Then add the following property within the <configuration> tags:
            <property>
              <name>mapred.job.tracker</name>
              <value>localhost:54311</value>
              <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
              </description>
            </property>
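Note: mapred.job.tracker is the Hadoop 1.x JobTracker setting. Since this guide starts the YARN daemons later, you may also want to add the standard Hadoop 2.x property that tells MapReduce to run on YARN:
            <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
            </property>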


        core-site.xml

                    Add the following property within the <configuration> tags:
                        <property>
                          <name>fs.default.name</name>
                          <value>hdfs://localhost:54310</value>
                          <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
                        </property>
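Note: fs.default.name is deprecated in Hadoop 2.x (though still honored); the current name for the same property is fs.defaultFS:
                        <property>
                          <name>fs.defaultFS</name>
                          <value>hdfs://localhost:54310</value>
                        </property>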


        hdfs-site.xml

                                We need to create a couple of directories that will be used by the namenode and the datanode in the Hadoop cluster.
                                $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
                                $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
                                    $ sudo chown -R hdpuser:hadoop /usr/local/hadoop_store

        Next we add the following properties within the <configuration> tags:
                        <property>
                         <name>dfs.replication</name>
                         <value>1</value>
                         <description>Default block replication.
                         The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
                         </description>
                        </property>
                        <property>
                          <name>dfs.namenode.name.dir</name>
                          <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
                        </property>
                        <property>
                         <name>dfs.datanode.data.dir</name>
                         <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
                        </property>

FORMATTING THE NAMENODE

        Once the Hadoop configuration is over, we need to format the NameNode.
                    The HDFS filesystem can be formatted with the following command:
                                hadoop namenode -format
                    (On Hadoop 2.x, hdfs namenode -format is the preferred equivalent.)
                    The NameNode should be successfully formatted before proceeding further; look for a message saying the storage directory has been successfully formatted.


START THE HADOOP DAEMONS

        Next we need to start the Hadoop Daemons which run as individual Java services.
        Hadoop provides a set of scripts to start and stop the Daemons.
        To start the DFS Daemons, issue the following command in the terminal:
                                start-dfs.sh
        To start the Yarn Daemons, issue the following command in the terminal:
                                start-yarn.sh
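The matching scripts stop the daemons when you are done:
                                stop-dfs.sh
                                stop-yarn.sh
(Alternatively, start-all.sh and stop-all.sh start and stop everything at once, though they are marked deprecated in Hadoop 2.x.)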


VERIFYING HADOOP INSTALLATION

        Hadoop installation can be verified by checking if all the Daemons are running successfully.
        Since all the Daemons are Java processes, issue the following command on the terminal:
                                $ jps
        It should list the following processes (along with Jps itself):
                                NameNode
                                SecondaryNameNode
                                DataNode
                                NodeManager
                                ResourceManager
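As a further smoke test, you can create the hdpuser home directory in HDFS and list the filesystem root:
                                hdfs dfs -mkdir -p /user/hdpuser
                                hdfs dfs -ls /
If both commands succeed without errors, HDFS is up and writable.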

HADOOP WEB INTERFACES

        The Hadoop NameNode and ResourceManager can be monitored using their web interfaces.
        These are typically used by Hadoop administrators.
                                For NameNode:
                                            http://localhost:50070
                                For ResourceManager:
                                            http://localhost:8088
                                For Secondary NameNode:
                                            http://localhost:50090
                                For DataNode:
                                            http://localhost:50075
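If you are working on a headless machine, the same interfaces can be probed from the terminal; an HTTP 200 response indicates the daemon's web UI is up:
                                            curl -sI http://localhost:50070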








