Hello data scientists,
This is a quick guide to installing Apache Spark on your local machine. I found the documentation on the website a little confusing.
1. Download the Apache Spark tar file from http://spark.apache.org/downloads.html. [Choose any version you like from the dropdown menu. I recommend 1.3.1 or above.]
2. Extract the tar file into your home directory.
3. Open your terminal and go to the Spark directory by running cd spark-1.3.1 [assuming you are in your home directory and downloaded version 1.3.1].
4. Now build Spark by running:
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
5. The whole build takes at least 10 minutes to complete.
6. When the build completes, Maven should report BUILD SUCCESS at the end of its output.
7. Now run the bundled SparkPi example:
./bin/run-example SparkPi 10
8. You should see log output reporting that the job has finished, which means you have Spark running successfully 🙂
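The example's log output is quite noisy, so it can help to filter it down to the one line that matters. SparkPi prints its estimate on a line beginning with "Pi is roughly". Here is a minimal sketch, assuming you are in the Spark directory; the helper name pi_line is my own and not part of Spark:

```shell
# Hypothetical helper (not part of Spark): keep only the result line
# from whatever log text arrives on stdin.
pi_line() {
    grep "Pi is roughly"
}

# Run the example only if the launcher script exists in the current directory.
if [ -x ./bin/run-example ]; then
    # Spark writes its log to stderr, so merge it into stdout before filtering.
    ./bin/run-example SparkPi 10 2>&1 | pi_line
fi
```

If a "Pi is roughly" line is printed, the job ran to completion. The estimate itself varies a little from run to run.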
Note: I am assuming you have Java installed properly on your machine. This is very important.
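Before starting the build, you can quickly confirm that Java is visible from your terminal. This is only a rough sketch; the helper name have_cmd is my own, and it only checks that a java command is on your PATH, not that the version is suitable for your Spark release:

```shell
# Hypothetical helper: succeeds only if the given command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # Prints the installed Java version.
    java -version
else
    echo "Java not found; install a JDK before building Spark" >&2
fi
```

If the version is printed, you should be fine to continue with the steps above.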