1. Set up ssh
Install the SSH-related software packages:
Copy the code as follows: sudo apt-get install openssh-client openssh-server
Then start or stop sshd with either of the following commands:
Copy the code as follows: sudo /etc/init.d/ssh start|stop
sudo service ssh start|stop
If sshd is successfully started, we can see results similar to the following:
Copy the code as follows: $ ps -e | grep ssh
2766 ? 00:00:00 ssh-agent
10558 ? 00:00:00 sshd
At this point, running the following ssh command to log in to the machine will still prompt for a password:
Copy the code as follows: ssh localhost
Now all we have to do is make it require no password:
Copy the code as follows: $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa # generate an RSA key with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
That's it. If it still doesn't work, the permissions on the key file may be set incorrectly.
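A typical fix is sketched below (hedged: it assumes the default key locations used above; sshd rejects keys whose files or containing directory are writable by others):
Copy the code as follows:
$ chmod 700 ~/.ssh # the .ssh directory must be accessible only by its owner
$ chmod 600 ~/.ssh/authorized_keys # the key list must not be writable by others
$ ssh localhost # should now log in without asking for a password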
2. Configure Hadoop
Unpack hadoop-1.2.1 to ~/, create the directory hadoop-env under ~/, and then create the following directory structure under hadoop-env:
├── dfs
│ ├── checkpoint1
│ ├── data1
│ ├── data2
│ └── name1
└── test
└── input
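The layout above can be created in one go; a short sketch, assuming the release tarball is named hadoop-1.2.1.tar.gz and sits in the home directory:
Copy the code as follows:
$ cd ~
$ tar -xzf hadoop-1.2.1.tar.gz # unpack the release to ~/hadoop-1.2.1
$ mkdir -p ~/hadoop-env/dfs/{checkpoint1,data1,data2,name1}
$ mkdir -p ~/hadoop-env/test/input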
Configuration file hadoop-1.2.1/conf/core-site.xml:
Copy the code as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<final>true</final>
</property>
</configuration>
fs.default.name specifies the HDFS URI. If no port is given in the value, the default is 8020.
Configuration file hadoop-1.2.1/conf/hdfs-site.xml:
Copy the code as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>~/hadoop-env/dfs/name1</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>~/hadoop-env/dfs/data1,~/hadoop-env/dfs/data2</value>
<final>true</final>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>~/hadoop-env/dfs/checkpoint1</value>
<final>true</final>
</property>
</configuration>
dfs.name.dir specifies the directory where the namenode stores its metadata; multiple directories may be given, separated by commas. dfs.data.dir specifies the directories where the datanode stores its data blocks; again, several comma-separated directories may be given. fs.checkpoint.dir specifies the directory where the secondary namenode stores its checkpoints.
Configuration file hadoop-1.2.1/conf/mapred-site.xml:
Copy the code as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<final>true</final>
</property>
</configuration>
3. Test
First format HDFS:
Copy the code as follows: ./hadoop-1.2.1/bin/hadoop namenode -format
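If the format succeeds, the directory configured in dfs.name.dir now holds the initial filesystem image (files such as fsimage, edits and VERSION under current/); a quick sanity check, assuming the layout set up above:
Copy the code as follows:
$ ls ~/hadoop-env/dfs/name1/current # should list the freshly created metadata files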
Start HDFS and MapReduce processes:
Copy the code as follows:
$ ./hadoop-1.2.1/bin/start-dfs.sh
$ ./hadoop-1.2.1/bin/start-mapred.sh
If an error such as "localhost: Error: JAVA_HOME is not set." is reported during startup, export JAVA_HOME in the ./hadoop-1.2.1/conf/hadoop-env.sh file, for example:
export JAVA_HOME=~/jdk1.7.0_25
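If you are not sure where the JDK is installed, one way to locate it (a sketch, assuming java is already on the PATH; strip the trailing /jre/bin/java or /bin/java from the printed path to get JAVA_HOME):
Copy the code as follows:
$ readlink -f $(which java) # prints the real path of the java binary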
To check whether the daemons started successfully: the first method is the jps command (Java Virtual Machine Process Status Tool), which should show output similar to the following:
Copy the code as follows:
$ jps
13592 DataNode
13728 SecondaryNameNode
13837 JobTracker
12864 NameNode
13955 TaskTracker
16069 Jps
The second method is to open http://localhost:50030 in a browser to view the JobTracker and http://localhost:50070 to view the NameNode. If your browser goes through a proxy (for example, circumvention software), these local addresses may fail to load; the simplest fix is to turn the circumvention software off. Another way to check is to look at the log files.
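For the log-file route, each daemon writes to the logs/ directory of the installation; a sketch, assuming Hadoop's default hadoop-<user>-<daemon>-<hostname>.log naming:
Copy the code as follows:
$ ls hadoop-1.2.1/logs/
$ tail -n 50 hadoop-1.2.1/logs/hadoop-$USER-namenode-*.log # startup errors show up here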
Now we create two files in the ~/hadoop-env/test/input directory:
Copy the code as follows:
$ echo "hello world" > test1.txt
$ echo "hi,world" > test2.txt
Import these two files into HDFS:
Copy the code as follows:
./hadoop-1.2.1/bin/hadoop dfs -put hadoop-env/test/input/ /test
Check:
Copy the code as follows:
$ ./hadoop-1.2.1/bin/hadoop dfs -ls /
Found 2 items
drwxr-xr-x - user supergroup 0 2013-10-22 22:07 /test
drwxr-xr-x - user supergroup 0 2013-10-22 21:58 /tmp
$ ./hadoop-1.2.1/bin/hadoop dfs -ls /test
Found 2 items
-rw-r--r-- 3 user supergroup 12 2013-10-22 22:07 /test/test1.txt
-rw-r--r-- 3 user supergroup 9 2013-10-22 22:07 /test/test2.txt
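As an optional end-to-end check, you can run the wordcount example that ships with the release against the uploaded files; a sketch, assuming the stock examples jar hadoop-examples-1.2.1.jar and a fresh /test-out output directory:
Copy the code as follows:
$ ./hadoop-1.2.1/bin/hadoop jar hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount /test /test-out
$ ./hadoop-1.2.1/bin/hadoop dfs -cat /test-out/part-* # per-word counts from test1.txt and test2.txt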
OK, the configuration is complete.
Note: The system used in this article is Linux Mint 15 (64-bit), and the Hadoop version is 1.2.1.