paulwong

Install Hadoop in the AWS cloud

  1. get the Whirr tar file
    wget http://www.eu.apache.org/dist/whirr/stable/whirr-0.8.2.tar.gz
  2. untar the Whirr tar file and change into the extracted directory
    tar -vxf whirr-0.8.2.tar.gz
    cd whirr-0.8.2
  3. create the credentials file
    mkdir ~/.whirr
    cp conf/credentials.sample ~/.whirr/credentials
  4. add the following content to the credentials file
    # Set cloud provider connection details
    PROVIDER=aws-ec2
    IDENTITY=<AWS Access Key ID>
    CREDENTIAL=<AWS Secret Access Key>
  5. generate an RSA key pair with an empty passphrase
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  6. create a hadoop.properties file and add the following content
    whirr.cluster-name=whirrhadoopcluster
    whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,2 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
    whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
    whirr.hadoop.version=1.0.2
    whirr.aws-ec2-spot-price=0.08
  7. launch the Hadoop cluster
    bin/whirr launch-cluster --config hadoop.properties
  8. launch the proxy
    cd ~/.whirr/whirrhadoopcluster/
    ./hadoop-proxy.sh
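  9. add rules to iptables that allow access from any address (0.0.0.0/0) to ports 50030 (JobTracker web UI) and 50070 (NameNode web UI)
    0.0.0.0/0 50030
    0.0.0.0/0 50070
  10. check the web UIs in the browser
    http://<aws-public-dns>:50030
    http://<aws-public-dns>:50070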
  11. add the following line to /etc/profile
    export HADOOP_CONF_DIR=~/.whirr/whirrhadoopcluster/
  12. check that Hadoop works
    hadoop fs -ls /
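
Steps 3, 4 and 6 can be scripted so the two config files are always written the same way. The following is only a sketch: write_whirr_config is a made-up helper name, the chmod is an extra precaution that is not part of the steps above, and the key values passed in the demo are placeholders.

```shell
#!/bin/sh
# Sketch only: write the credentials and hadoop.properties files from
# steps 3, 4 and 6. write_whirr_config is a hypothetical helper name.

write_whirr_config() {
    target_dir="$1"   # directory to write into, e.g. "$HOME/.whirr"
    access_key="$2"   # AWS Access Key ID
    secret_key="$3"   # AWS Secret Access Key

    mkdir -p "$target_dir"

    # Step 4: cloud provider connection details.
    cat > "$target_dir/credentials" <<EOF
PROVIDER=aws-ec2
IDENTITY=$access_key
CREDENTIAL=$secret_key
EOF
    # Assumption, not in the original steps: keep the secret key private.
    chmod 600 "$target_dir/credentials"

    # Step 6: cluster definition. The heredoc delimiter is quoted so that
    # ${sys:user.home} is written literally instead of expanded by the shell.
    cat > "$target_dir/hadoop.properties" <<'EOF'
whirr.cluster-name=whirrhadoopcluster
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,2 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
whirr.hadoop.version=1.0.2
whirr.aws-ec2-spot-price=0.08
EOF
}

# Demo: write into a throwaway directory and show the generated properties.
demo_dir=$(mktemp -d)
write_whirr_config "$demo_dir" "EXAMPLE_ACCESS_KEY" "EXAMPLE_SECRET_KEY"
cat "$demo_dir/hadoop.properties"
```

To use it for real, the target would be ~/.whirr for the credentials, and the generated hadoop.properties would be the file passed to bin/whirr launch-cluster --config in step 7.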
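
Step 9 only lists a source range and a port for each rule. One possible reading is an ACCEPT rule for TCP on the INPUT chain, but the chain, protocol and target are assumptions here, since the steps above do not spell them out. The sketch below just prints the candidate commands so they can be reviewed before being run as root; print_ui_rules is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: print (do not execute) iptables commands for the web UI ports.
# Assumption: each rule in step 9 is a TCP ACCEPT rule on the INPUT chain.
# print_ui_rules is a hypothetical helper name.

print_ui_rules() {
    source_cidr="$1"   # source range, e.g. 0.0.0.0/0
    shift
    # Remaining arguments are the ports to open.
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp -s $source_cidr --dport $port -j ACCEPT"
    done
}

# Ports from step 9: 50030 and 50070.
print_ui_rules 0.0.0.0/0 50030 50070
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed lines can be pasted into a root shell once they look right.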

posted on 2013-09-08 13:45 by paulwong. Categories: HADOOP, AWS

