With the project investigation work winding down recently, I have had time to organize the investigations done earlier. To make them easier to archive, I am writing them down here, in the hope that they will be useful to those who come after.
Background
To gather performance data on querying and security for data on an HBase cluster, we need to set up an HBase cluster and run some simple tests.
Role assignment
+---------+--------+-----------+---------------+
| machine | Hadoop | ZooKeeper | HBase         |
+---------+--------+-----------+---------------+
| sv004   | Master | leader    | HMaster       |
+---------+--------+-----------+---------------+
| sv001   | Slave1 | follower  | HRegionServer |
+---------+--------+-----------+---------------+
| sv002   | Slave2 | follower  | HRegionServer |
+---------+--------+-----------+---------------+
| sv003   | Slave3 | follower  | HRegionServer |
+---------+--------+-----------+---------------+

The goal of this test is to measure how region movement affects query performance when a RegionServer fails, so to keep things simple the cluster runs only one HMaster. The drawback is that if the HMaster fails, the whole environment becomes unusable and everything must be restarted. To avoid this, the usual recommendation is to run at least two HMasters, one active and one standby.
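For reference, HBase makes the active/standby setup straightforward: any host listed in conf/backup-masters gets a standby HMaster when the cluster starts. A minimal sketch, assuming $HBASE_HOME points at the HBase install and sv001 is chosen as the standby (both are assumptions, not part of this setup):

# Hosts listed here get a standby HMaster when start-hbase.sh runs.
echo "sv001" > $HBASE_HOME/conf/backup-masters

# Alternatively, start one by hand on the chosen node; it waits in
# standby via ZooKeeper until the active master fails.
$HBASE_HOME/bin/hbase-daemon.sh start master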
The virtual machines are listed below:
172.28.157.1 sv001
172.28.157.2 sv002
172.28.157.3 sv003
172.28.157.4 sv004
Setting up the Hadoop YARN cluster
The artifact used this time is hadoop-2.5.2.tar.gz.
Procedure
- Download hadoop-2.5.2.tar.gz; see the official Hadoop site for details
- Extract the archive
tar -zxvf hadoop-2.5.2.tar.gz
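So that the $HADOOP_HOME commands used below resolve, it helps to export the usual environment variables on every node first; a minimal sketch, assuming the archive was unpacked to /home/project/hadoop-2.5.2 (the path used in the configs below):

# Append to ~/.bashrc (or /etc/profile) on each node, then source it.
export JAVA_HOME=/usr/java/jdk1.7.0_67
export HADOOP_HOME=/home/project/hadoop-2.5.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin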
- Edit the configuration files (in Hadoop 2.x they live under etc/hadoop/, not conf/)
1. hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67

Adjust this to the actual JDK path on your virtual machines.
2. core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sv004:9000</value>
  </property>
</configuration>
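fs.defaultFS is the NameNode RPC endpoint that every client and DataNode will connect to. The value Hadoop actually picks up can be checked without starting any daemon (just a sanity check, not a required step):

# Print the NameNode URI resolved from core-site.xml; expect hdfs://sv004:9000.
$HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS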
3. hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/project/hadoop-2.5.2/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/project/hadoop-2.5.2/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
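The name and data directories configured above should exist with the right ownership before the first start; creating them up front avoids permission surprises (a sketch, run on the nodes indicated in the comments):

# On sv004, the NameNode metadata directory:
mkdir -p /home/project/hadoop-2.5.2/hdfs/name
# On sv001-sv003, the DataNode block storage directory:
mkdir -p /home/project/hadoop-2.5.2/hdfs/data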
4. mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>sv004:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>sv004:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/home/project/hadoop-2.5.2/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/home/project/hadoop-2.5.2/done</value>
  </property>
</configuration>
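Note that the JobHistory server configured here is not launched by start-all.sh; it has its own daemon script, run on sv004:

# Start the MapReduce JobHistory server (RPC on sv004:10020, web UI on sv004:19888).
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver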
5. yarn-site.xml (property names are case-sensitive, so the prefix must be the lowercase yarn.; also, since Hadoop 2.2 the shuffle aux-service is named mapreduce_shuffle)
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>sv004</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>sv004:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>sv004:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>sv004:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>sv004:18041</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>sv004:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/project/hadoop-2.5.2/mynode/my</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/project/hadoop-2.5.2/mynode/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
</configuration>
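Once the cluster is running, whether all three NodeManagers registered with the ResourceManager can be checked from any node:

# List registered NodeManagers; sv001-sv003 should appear in RUNNING state.
$HADOOP_HOME/bin/yarn node -list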
6. slaves
sv001
sv002
sv003
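Every node needs an identical copy of this configuration. A common approach is to finish editing on sv004 and then push the directory out; a sketch, assuming the passwordless SSH set up in the next step:

# Push the finished configuration from sv004 to each slave.
for h in sv001 sv002 sv003; do
  scp -r $HADOOP_HOME/etc/hadoop/ $h:$HADOOP_HOME/etc/
done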
- Passwordless SSH setup
Passwordless login itself is not described here, since there are plenty of write-ups online; a minimal sketch follows anyway.
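Roughly, assuming the same login user on all four machines:

# Generate a passphrase-less key pair once, on sv004.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Install the public key on every node (including sv004 itself).
for h in sv001 sv002 sv003 sv004; do
  ssh-copy-id $h
done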
Then append the following entries to /etc/hosts on every node so the host names resolve:
172.28.157.1 sv001
172.28.157.2 sv002
172.28.157.3 sv003
172.28.157.4 sv004
Startup procedure
- $HADOOP_HOME/bin/hdfs namenode -format (the older hadoop namenode -format form still works in 2.x but is deprecated)
- $HADOOP_HOME/sbin/start-all.sh
- $HADOOP_HOME/bin/hdfs dfsadmin -report (status check)
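Besides dfsadmin -report, a quick jps on each node confirms that the expected daemons came up:

# On sv004: expect NameNode, SecondaryNameNode and ResourceManager
# (plus JobHistoryServer if it was started separately).
jps
# On sv001-sv003: expect DataNode and NodeManager.
ssh sv001 jps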
Shutdown
- $HADOOP_HOME/sbin/stop-all.sh