HADOOP SINGLE NODE INSTALLATION ON UBUNTU 14.04
PREREQUISITES
* Java (version 1.6.0 or above) should be installed
[ If Java is not installed, you can try either of these methods to install it.
Method 1: To install the OpenJDK JDK and JRE 8, use (replace 8 with the version you want, such as 7 or 6):
sudo apt-get install openjdk-8-jdk
Method 2: If you instead want to install the official Oracle JDK and JRE through apt-get, then run (you can replace the 8 with other versions such as 9 or 7):
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
To automatically set up the Java 7 environment variables JAVA_HOME and PATH:
sudo apt-get install oracle-java7-set-default ]
* SSH should be installed and sshd must be running.
[ If SSH is not installed, you can run the following command to install it:
sudo apt-get install openssh-server
Check SSH using the following commands after installing:
which ssh
output should be /usr/bin/ssh
which sshd
output should be /usr/sbin/sshd ]
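The two checks above only confirm that the binaries exist. To confirm the sshd daemon is actually running on Ubuntu 14.04, one option (a suggested extra check, not part of the original steps) is:
sudo service ssh status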
HADOOP USER CREATION
user@node:~$ sudo addgroup hadoop
[sudo] password for user:
Adding group `hadoop' (GID 1001) ...
Done.
user@node:~$ sudo adduser --ingroup hadoop hdpuser
Adding user `hdpuser' ...
Adding new user `hdpuser' (1001) with group `hadoop' ...
Creating home directory `/home/hdpuser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdpuser
Enter the new value, or press ENTER for the default
        Full Name []:
        Room Number []:
        Work Phone []:
        Home Phone []:
        Other []:
Is the information correct? [Y/n]
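As a quick sanity check (not in the original steps), you can confirm the user landed in the right group:
id hdpuser
output should show the hadoop group, e.g. something like uid=1001(hdpuser) gid=1001(hadoop) groups=1001(hadoop)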
SWITCH TO SUPER USER TO ADD HADOOP USER TO SUDOERS GROUP
Switch to the root user - su root
Add the hadoop user to the sudoers list by adding the below entry in the file /etc/sudoers:
hdpuser ALL=(ALL:ALL) ALL
(under # User privilege specification
root ALL=(ALL:ALL) ALL )
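Note that a syntax error in /etc/sudoers can lock you out of sudo entirely; visudo validates the file before saving, so it is the safer way to make this edit (a suggestion, not part of the original steps):
su root
visudo
(then add hdpuser ALL=(ALL:ALL) ALL under # User privilege specification)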
VERIFY JAVA INSTALLATION
Switch to the hadoop user - su hdpuser
hdpuser@node:~$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
hdpuser@node:~$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                             Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072      auto mode
  1            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1071      manual mode
* 2            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072      manual mode

Press enter to keep the current choice[*], or type selection number:
hdpuser@node:~$
UPDATE JAVA VARIABLES IN THE ~/.BASHRC FILE
Add the below entries in the ~/.bashrc file:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin
Source the .bashrc file using the command:
source ~/.bashrc
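To confirm the variables took effect in the current shell (a quick check, not in the original):
echo $JAVA_HOME
output should be /usr/lib/jvm/java-7-oracle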
VERIFY SSH INSTALLATION
hdpuser@node:~$ which ssh
/usr/bin/ssh
hdpuser@node:~$ which sshd
/usr/sbin/sshd
SSH KEY GENERATION
hdpuser@node:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdpuser/.ssh/id_rsa):
Created directory '/home/hdpuser/.ssh'.
Your identification has been saved in /home/hdpuser/.ssh/id_rsa.
Your public key has been saved in /home/hdpuser/.ssh/id_rsa.pub.
The key fingerprint is:
da:4c:9a:89:bb:02:ac:7e:00:70:16:11:bc:fa:49:5e hdpuser@node
The key's randomart image is:
+--[ RSA 2048]----+
| .++             |
|. +              |
|.o .             |
|. .              |
|o.      S        |
|oo. E.  O        |
|.=.o. = o        |
|. =. .           |
|....o.           |
+-----------------+
hdpuser@node:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
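Before moving on, it is worth confirming that passwordless SSH to localhost actually works, since the Hadoop start scripts rely on it (a suggested check, not in the original; the chmod is only needed if the file permissions are too open):
chmod 600 ~/.ssh/authorized_keys
ssh localhost
exit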
DOWNLOADING AND INSTALLING HADOOP
[ Hadoop can be downloaded using the below link if you don't have the package in your system:
wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz ]
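[ Optionally, you can sanity-check the downloaded tarball before extracting it; Apache publishes checksums alongside each release, so compare the output of the command below against the value published for hadoop-2.6.0 (an optional step, not in the original):
md5sum hadoop-2.6.0.tar.gz ]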
hdpuser@node:~$ cd /home/user/Documents/
hdpuser@node:/home/user/Documents$ sudo mv hadoop-2.6.0.tar.gz /usr/local/
[sudo] password for hdpuser:
hdpuser@node:/home/user/Documents$ cd /usr/local/
hdpuser@node:/usr/local$ sudo tar xvzf hadoop-2.6.0.tar.gz
hdpuser@node:/usr/local$ sudo chown -R hdpuser:hadoop hadoop-2.6.0
hdpuser@node:/usr/local$ sudo ln -s hadoop-2.6.0 hadoop
Add the below entry in the ~/.bashrc file and source it. (The hadoop command below resolves once $HADOOP_HOME/bin is on your PATH; the full set of variables, including PATH, is added in the next section.)
export HADOOP_HOME=/usr/local/hadoop
hdpuser@node:/usr/local$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
SETTING UP HADOOP ENVIRONMENT VARIABLES
— You can set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
— Now apply all the changes to the current running system.
$ source ~/.bashrc
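— To confirm the shell now resolves the Hadoop binaries (a quick check, not in the original):
$ echo $HADOOP_HOME
/usr/local/hadoop
$ which hadoop
/usr/local/hadoop/bin/hadoop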
HADOOP CONFIGURATION
— Next we need to configure some of the Hadoop files, namely:
— hadoop-env.sh
— core-site.xml
— hdfs-site.xml
— mapred-site.xml
These files are located in $HADOOP_HOME/etc/hadoop.
— hadoop-env.sh
— In this file, add the following line to define the Java home:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
— mapred-site.xml
— This file may not be present under that name. In that case we first need to copy it from the template file:
— cp mapred-site.xml.template mapred-site.xml
— Then add the following property within the <configuration> tags:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
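— Note: mapred.job.tracker is a Hadoop 1.x-era property. Since this guide later starts the YARN daemons, you may instead want MapReduce jobs to run on YARN; the property commonly used for that in Hadoop 2.x is the following (an alternative shown for reference, not part of the original walkthrough):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>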
— core-site.xml
— Add the following property within the <configuration> tags:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
— hdfs-site.xml
— We need to create a couple of directories that will be used by the namenode and the datanode in the Hadoop cluster:
— $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
— $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
— $ sudo chown -R hdpuser:hadoop /usr/local/hadoop_store
— Next we add the following properties within the <configuration> tags:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
FORMATTING THE NAMENODE
— Once the Hadoop configuration is done, we need to format the Namenode.
— The Hadoop file system can be formatted with the following command:
— hadoop namenode -format
— The Namenode must be successfully formatted before proceeding further.
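— In Hadoop 2.x the hadoop namenode command still works but is deprecated in favour of the hdfs command, so you may see a deprecation warning; the equivalent command is:
— hdfs namenode -format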
START THE HADOOP DAEMONS
— Next we need to start the Hadoop Daemons, which run as individual Java services.
— Hadoop provides a set of scripts to start and stop the Daemons.
— To start the DFS Daemons, issue the following command in the terminal:
— start-dfs.sh
— To start the Yarn Daemons, issue the following command in the terminal:
— start-yarn.sh
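— To shut the cluster down later, the matching stop scripts live in the same sbin directory (not part of the original steps):
— stop-yarn.sh
— stop-dfs.sh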
VERIFYING HADOOP INSTALLATION
— Hadoop installation can be verified by checking whether all the Daemons are running successfully.
— Since all the Daemons are Java processes, issue the following command in the terminal:
— $ jps
— It should list the following processes:
— NameNode
— SecondaryNameNode
— DataNode
— NodeManager
— ResourceManager
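— As a further smoke test (not in the original), you can exercise HDFS itself once the daemons are up; the directory name here is arbitrary:
— $ hdfs dfs -mkdir -p /user/hdpuser
— $ hdfs dfs -ls /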
HADOOP WEB INTERFACES
— Hadoop NameNode and ResourceManager can be monitored using their web interfaces.
— These are usually used by Hadoop administrators.
— For NameNode: http://localhost:50070/ (default port)
— For ResourceManager: http://localhost:8088/ (default port)
— For Secondary NameNode: http://localhost:50090/ (default port)
— For DataNode: http://localhost:50075/ (default port)
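— A quick terminal check that the NameNode UI is responding (assuming curl is installed and the default port above):
— $ curl -sI http://localhost:50070/ | head -n 1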