Saturday, 30 April 2016

Apache Hadoop installation on AWS, Red Hat Linux

Apache Hadoop installation

prerequisite :
1. An AWS account.

Steps :

1) Update the packages on all instances.
cmd : sudo yum update

2) Install wget.
cmd : sudo yum install wget

3) Install JDK 1.7 (OpenJDK).
ref :
1) Oracle
2) Red Hat

cmd : sudo yum install java-1.7.0-openjdk-devel

a) Check the installed Java version.
cmd : java -version

b) Find the path of the Java binary.
cmd : readlink -f $(which java)

c) Set the Java path in Linux.
Open the profile for editing:
cmd : sudo vi /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/
export JRE_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/jre
export PATH=$PATH:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/bin:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/jre/bin
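After saving the profile, the same exports can be applied to the current shell and verified; a minimal sketch using the OpenJDK path from above (the exact build number may differ on your instance):

```shell
# Apply the profile settings to the current shell (path from step b above;
# adjust to the OpenJDK build actually installed on your instance).
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/
export JRE_HOME=${JAVA_HOME}jre
export PATH=$PATH:${JAVA_HOME}bin:${JRE_HOME}/bin

# Confirm both variables are set before moving on.
echo "JAVA_HOME=$JAVA_HOME"
echo "JRE_HOME=$JRE_HOME"
```

Alternatively, `source /etc/profile` reloads the file in the current session; it is also read automatically on the next login.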

4) Install Hadoop 2.7.

ref : Hadoop releases
a) Download the tarball.
cmd : sudo wget http://a.mbbsindia.com/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
b) Extract the archive.

cmd : sudo tar -xvf hadoop-2.7.2.tar.gz

c) Move the extracted directory to /usr/local.
cmd : sudo mv hadoop-2.7.2 /usr/local/

5) Configure environment variables for Hadoop in RHEL.
a) Open the profile for editing.
cmd : sudo vi /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/
export HADOOP_HOME=/usr/local/hadoop-2.7.2
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
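As a quick sanity check, the three exports above can be run in the current shell to confirm the Hadoop bin directory lands at the front of the PATH (a sketch; paths assume Hadoop was moved to /usr/local in step 4):

```shell
# Environment variables from /etc/profile
# (JDK path from step 3, Hadoop path from step 4).
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/
export HADOOP_HOME=/usr/local/hadoop-2.7.2
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

# The hadoop bin directory should now appear first on the PATH.
echo "$PATH"
```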

b) Configure hadoop-env.sh and set the Java path.

b.1) Open hadoop-env.sh in the Hadoop configuration directory:

cmd : sudo vi /usr/local/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.101-2.6.6.1.el7_2.x86_64/

c) Edit core-site.xml.
cmd : sudo vi /usr/local/hadoop-2.7.2/etc/hadoop/core-site.xml

Add the following property to the configuration.

note :

1) Open port 9000 in the security group in AWS.
2) Allow access from a CIDR block or a security group ID.

<configuration>
<property>
  <name>fs.default.name</name>
    <value>hdfs://'your ip':9000</value>
</property>
</configuration>
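A note on the property name: fs.default.name still works in Hadoop 2.7 but has been deprecated since Hadoop 2.x in favor of fs.defaultFS. An equivalent core-site.xml using the newer name (same placeholder for the IP):

```xml
<configuration>
<property>
  <!-- fs.defaultFS is the non-deprecated replacement for fs.default.name -->
  <name>fs.defaultFS</name>
    <value>hdfs://'your ip':9000</value>
</property>
</configuration>
```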

d) Create mapred-site.xml from the template and edit it (Hadoop reads mapred-site.xml, not the .template file).

cmd : sudo cp mapred-site.xml.template mapred-site.xml
cmd : sudo vi mapred-site.xml

Add the following property.

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>


e) Edit hdfs-site.xml (in the same directory).
cmd : sudo vi hdfs-site.xml

Add the following properties.

<configuration>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
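The namenode and datanode directories referenced above are not created automatically, so it helps to make them before formatting. A hedged sketch: HDFS_BASE here stands in for /home/hadoop/hadoopdata from the config (on the instance you would create that path with sudo and chown it to the user running Hadoop):

```shell
# Create the NameNode and DataNode storage directories from hdfs-site.xml.
# HDFS_BASE is a stand-in for /home/hadoop/hadoopdata used in the config above.
HDFS_BASE="${HDFS_BASE:-$(mktemp -d)}"
mkdir -p "$HDFS_BASE/hdfs/namenode" "$HDFS_BASE/hdfs/datanode"
ls "$HDFS_BASE/hdfs"
```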

6) Format the NameNode.

Go to bin/ inside the Hadoop directory.

cmd : hdfs namenode -format


7) Start the Hadoop cluster:
start the NameNode daemon and the DataNode daemon.
cmd : sbin/start-dfs.sh


REF :

online resource :
1) HUE
2) Demo 
