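Here are the steps to install Apache Mahout and run it on data stored in HDFS:

1. Download the latest package from http://mirror.nexcess.net/apache/mahout/

2. Extract the package.

3. Create a directory in HDFS and put your data into it:
   <hadoop_directory>/bin/hdfs dfs -mkdir /mahout_data
   <hadoop_directory>/bin/hdfs dfs -put /home/Hadoop/data/mydata.txt /mahout_data/

4. Convert the input to SequenceFile format with Mahout. Note that seqdirectory does not cluster anything by itself; it only turns the text files in the input directory into the SequenceFiles that Mahout's clustering jobs read:
   <mahout_directory>/bin/mahout seqdirectory -i hdfs://localhost:9000/mahout_data/ -o hdfs://localhost:9000/clustered_data/

5. The output SequenceFiles will be in the /clustered_data directory on HDFS.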
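Steps 3 to 5 can be collected into one small script. This is only a sketch under some assumptions: the HADOOP_HOME and MAHOUT_HOME defaults are placeholder install locations, not paths from the steps above, and the `run` helper and DRY_RUN switch are illustrative additions. By default the script only prints the commands; set DRY_RUN=0 to actually execute them against a running HDFS at localhost:9000. Download and extraction are left out because the archive name depends on the version you pick.

```shell
#!/bin/sh
# Sketch of steps 3-5. The two install paths are assumptions; adjust them.
set -e

HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"   # assumed Hadoop install directory
MAHOUT_HOME="${MAHOUT_HOME:-/usr/local/mahout}"   # assumed Mahout install directory
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands only, 0 = execute them

# Hypothetical helper: print each command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 3: create the HDFS directory and upload the local data file.
run "$HADOOP_HOME/bin/hdfs" dfs -mkdir -p /mahout_data
run "$HADOOP_HOME/bin/hdfs" dfs -put /home/Hadoop/data/mydata.txt /mahout_data/

# Step 4: convert the uploaded text into SequenceFiles.
run "$MAHOUT_HOME/bin/mahout" seqdirectory \
    -i hdfs://localhost:9000/mahout_data/ \
    -o hdfs://localhost:9000/clustered_data/

# Step 5: list the output directory to confirm the files were written.
run "$HADOOP_HOME/bin/hdfs" dfs -ls /clustered_data/
```

Running it once in the default dry-run mode is a cheap way to check the paths before touching the cluster.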