Running an Apache Spark standalone cluster on one machine

Posted by ChenRiang on August 25, 2021

Apache Spark is a complex framework that provides parallelized, in-memory data processing. Often during development we just want to focus on the business logic without having to worry about infrastructure. In this blog post, we will look at how to run an Apache Spark standalone cluster on a single machine.

  1. Download Apache Spark from the official downloads page and extract it.
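
     For example, assuming you downloaded the 3.1.2 release built against Hadoop 3.2 (adjust the file name to match whichever package you picked):

    # Extract the downloaded archive and move into the Spark directory
    tar -xzf spark-3.1.2-bin-hadoop3.2.tgz
    cd spark-3.1.2-bin-hadoop3.2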

  2. In the Spark directory, run the following command to start the Spark master.

    ./sbin/start-master.sh
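
     Once the master is up, it also serves a web UI, by default on port 8080; a quick sanity check (assuming the default port is free on your machine):

    # The master UI page lists the spark://<hostname>:7077 URL that the
    # worker and spark-submit will connect to in the next steps.
    curl http://localhost:8080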
    
  3. In another terminal, start a Spark worker and register it with the master.

    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://`hostname`:7077
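
     By default the worker offers all available cores and most of the machine's memory to the cluster. To cap its resources, the Worker class accepts --cores and --memory flags; the values below are only an illustration:

    # Start a worker limited to 2 cores and 2 GB of memory
    ./bin/spark-class org.apache.spark.deploy.worker.Worker \
       --cores 2 \
       --memory 2g \
       spark://`hostname`:7077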
    
  4. Submit a Spark job.

    ./bin/spark-submit \
       --class <your main class> \
       --master spark://`hostname`:7077 \
       <your jar file> 
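
     If you don't have a jar of your own yet, the SparkPi example bundled with the distribution makes a handy smoke test (the exact jar name depends on the Spark and Scala versions you downloaded):

    # Compute an approximation of pi on the standalone cluster
    ./bin/spark-submit \
       --class org.apache.spark.examples.SparkPi \
       --master spark://`hostname`:7077 \
       ./examples/jars/spark-examples_2.12-3.1.2.jar 100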
    
  5. To stop the Spark worker, simply press Ctrl-C in its terminal.

  6. To stop the Spark master, run the following command.

    ./sbin/stop-master.sh
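
     The steps above run the worker in the foreground. Alternatively, recent releases (Spark 3.1 and later) ship sbin/start-worker.sh and sbin/stop-worker.sh (named start-slave.sh / stop-slave.sh in older versions), which run the worker as a background daemon instead of holding a terminal:

    # Run the worker as a daemon and stop it without Ctrl-C
    ./sbin/start-worker.sh spark://`hostname`:7077
    ./sbin/stop-worker.sh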