1. Set up ssh
Install the SSH-related software packages:
Copy the code as follows: sudo apt-get install openssh-client openssh-server
Then start or stop sshd with either of the following commands:
Copy the code as follows: sudo /etc/init.d/ssh start|stop
sudo service ssh start|stop
If sshd is successfully started, we can see results similar to the following:
Copy the code as follows: $ ps -e | grep ssh
2766 ? 00:00:00 ssh-agent
10558 ? 00:00:00 sshd
At this point, running the following ssh command to log in to the machine will still prompt for a password:
Copy the code as follows: ssh localhost
Now all we have to do is make it require no password:
Copy the code as follows: $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa # generate an RSA key with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
That's it. If it still doesn't work, the permissions on the key file may be set incorrectly.
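A typical fix is sketched below (hedged: it assumes the default key locations used above; sshd rejects keys whose files or containing directory are writable by others):
Copy the code as follows:
$ chmod 700 ~/.ssh # the .ssh directory must be accessible only by its owner
$ chmod 600 ~/.ssh/authorized_keys # the key list must not be writable by others
$ ssh localhost # should now log in without asking for a password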
2. Configure Hadoop
Unpack hadoop-1.2.1 to ~/, create the directory hadoop-env under ~/, and then create the following directory structure under hadoop-env:
├── dfs
│ ├── checkpoint1
│ ├── data1
│ ├── data2
│ └── name1
└── test
└── input
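The layout above can be created in one go; a short sketch, assuming the release tarball is named hadoop-1.2.1.tar.gz and sits in the home directory:
Copy the code as follows:
$ cd ~
$ tar -xzf hadoop-1.2.1.tar.gz # unpack the release to ~/hadoop-1.2.1
$ mkdir -p ~/hadoop-env/dfs/{checkpoint1,data1,data2,name1}
$ mkdir -p ~/hadoop-env/test/input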
Configuration file hadoop-1.2.1/conf/core-site.xml:
Copy the code as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<final>true</final>
</property>
</configuration>
fs.default.name specifies the HDFS URI. If no port is given in the value, the default is 8020.
Configuration file hadoop-1.2.1/conf/hdfs-site.xml:
Copy the code as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>~/hadoop-env/dfs/name1</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>~/hadoop-env/dfs/data1,~/hadoop-env/dfs/data2</value>
<final>true</final>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>~/hadoop-env/dfs/checkpoint1</value>
<final>true</final>
</property>
</configuration>
dfs.name.dir specifies the directory where the namenode stores its metadata; multiple directories may be given, separated by commas. dfs.data.dir specifies the directories where the datanode stores its data blocks; again, several comma-separated directories may be given. fs.checkpoint.dir specifies the directory where the secondary namenode stores its checkpoints.
Configuration file hadoop-1.2.1/conf/mapred-site.xml:
Copy the code as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<final>true</final>
</property>
</configuration>
3. Test
First format HDFS:
Copy the code as follows: ./hadoop-1.2.1/bin/hadoop namenode -format
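If the format succeeds, the directory configured in dfs.name.dir now holds the initial filesystem image (files such as fsimage, edits and VERSION under current/); a quick sanity check, assuming the layout set up above:
Copy the code as follows:
$ ls ~/hadoop-env/dfs/name1/current # should list the freshly created metadata files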
Start HDFS and MapReduce processes:
Copy the code as follows:
$ ./hadoop-1.2.1/bin/start-dfs.sh
$ ./hadoop-1.2.1/bin/start-mapred.sh
If an error such as "localhost: Error: JAVA_HOME is not set." is reported during startup, export JAVA_HOME in the ./hadoop-1.2.1/conf/hadoop-env.sh file, for example:
export JAVA_HOME=~/jdk1.7.0_25
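If you are not sure where the JDK is installed, one way to locate it (a sketch, assuming java is already on the PATH; strip the trailing /jre/bin/java or /bin/java from the printed path to get JAVA_HOME):
Copy the code as follows:
$ readlink -f $(which java) # prints the real path of the java binary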
To check whether the daemons started successfully: the first method is the jps command (Java Virtual Machine Process Status Tool), which should show output similar to the following:
Copy the code as follows:
$ jps
13592 DataNode
13728 SecondaryNameNode
13837 JobTracker
12864 NameNode
13955 TaskTracker
16069 Jps
The second method is to open http://localhost:50030 in a browser to view the JobTracker and http://localhost:50070 to view the NameNode. If your browser goes through a proxy (for example, circumvention software), these local addresses may fail to load; the simplest fix is to turn the circumvention software off. Another way to check is to look at the log files.
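For the log-file route, each daemon writes to the logs/ directory of the installation; a sketch, assuming Hadoop's default hadoop-<user>-<daemon>-<hostname>.log naming:
Copy the code as follows:
$ ls hadoop-1.2.1/logs/
$ tail -n 50 hadoop-1.2.1/logs/hadoop-$USER-namenode-*.log # startup errors show up here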
Now we create two files in the ~/hadoop-env/test/input directory:
Copy the code as follows:
$ echo "hello world" > test1.txt
$ echo "hi,world" > test2.txt
Import these two files into HDFS:
Copy the code as follows:
./hadoop-1.2.1/bin/hadoop dfs -put hadoop-env/test/input/ /test
Check:
Copy the code as follows:
$ ./hadoop-1.2.1/bin/hadoop dfs -ls /
Found 2 items
drwxr-xr-x - user supergroup 0 2013-10-22 22:07 /test
drwxr-xr-x - user supergroup 0 2013-10-22 21:58 /tmp
$ ./hadoop-1.2.1/bin/hadoop dfs -ls /test
Found 2 items
-rw-r--r-- 3 user supergroup 12 2013-10-22 22:07 /test/test1.txt
-rw-r--r-- 3 user supergroup 9 2013-10-22 22:07 /test/test2.txt
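As an optional end-to-end check, you can run the wordcount example that ships with the release against the uploaded files; a sketch, assuming the stock examples jar hadoop-examples-1.2.1.jar and a fresh /test-out output directory:
Copy the code as follows:
$ ./hadoop-1.2.1/bin/hadoop jar hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount /test /test-out
$ ./hadoop-1.2.1/bin/hadoop dfs -cat /test-out/part-* # per-word counts from test1.txt and test2.txt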
OK, the configuration is complete.
Note: The system used in this article is Linux Mint 15 (64-bit), and the Hadoop version is 1.2.1.