A Spark application can be submitted in two different ways: client mode and cluster mode.
Client Deploy Mode in Spark:
In client mode, the Spark driver runs on the machine from which the job is submitted. Client mode is mainly used for interactive and debugging purposes. Note that in client mode only the driver runs locally; all tasks run on the cluster worker nodes.
The default deployment mode is client mode.
In client mode, if the user session running spark-submit terminates, the application also terminates with a failed status.
Client mode is not used for production jobs; it is used for testing and development purposes.
Driver logs are accessible from the local machine itself.
spark-submit --deploy-mode client --driver-memory xxxx ......
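For example, a minimal client-mode submission might look like the sketch below. It assumes a YARN cluster; the class name, jar path, and memory values are placeholders, not values from this article.

# Illustrative client-mode submission (class name, jar path, and sizes are hypothetical)
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 4g \
  --executor-memory 4g \
  --class com.example.MyApp \
  /path/to/my-app.jar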
Cluster Deploy Mode in Spark:
In cluster deploy mode, the driver program is launched on one of the available nodes in the cluster. Cluster mode is mostly used for large data sets where the job takes minutes or hours to complete. Because the Spark driver starts on one of the worker machines, the user who submits the application can walk away after initiating it or continue with other work. In other words, it works on a fire-and-forget model.
If the job is going to run for a long time and we don't want to wait for the result, we can submit it in cluster mode; once the job is submitted, the client machine doesn't need to stay online.
The Spark driver runs on one of the worker nodes within the cluster, which reduces the data-movement overhead between the submitting machine and the cluster.
On a Cloudera (YARN-managed) cluster, the driver logs are not on the submitting machine, so you should use yarn commands to access them.
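For example, once you have the application ID (printed in the spark-submit output or visible in the YARN ResourceManager UI), the aggregated logs, including the driver's, can typically be fetched with the YARN CLI; the application ID below is a placeholder.

# Fetch logs for a completed application (application ID is illustrative)
yarn logs -applicationId application_1234567890123_0001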
Cluster mode also greatly reduces the chance of job failure caused by the submitting machine disconnecting or going offline.
spark-submit --deploy-mode cluster --driver-memory xxxx ........
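A cluster-mode submission looks similar; the sketch below again assumes YARN, and the class name, jar path, memory values, and executor count are hypothetical.

# Illustrative cluster-mode submission (class name, jar path, and sizes are hypothetical)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 4g \
  --num-executors 10 \
  --class com.example.MyApp \
  /path/to/my-app.jar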