Spark Interview Questions for Intermediates
1. How do you programmatically specify a schema for a DataFrame?

A DataFrame can be created programmatically in three steps:

1. Create an RDD of Rows from the original RDD.
2. Create the schema, represented by a StructType, matching the structure of the Rows in the RDD created in step 1.
3. Apply the schema to the RDD of Rows via the createDataFrame method provided by SparkSession.

2. Which transformation returns a new DStream by selecting only those records of the source DStream for which the function returns true?

1. map(func)
2. transform(func)
3. filter(func)
4. count()

The correct answer is 3) filter(func).

3. Does Apache Spark provide checkpoints?

Yes, Apache Spark provides an API for adding and managing checkpoints. Checkpointing makes streaming applications resilient to failures: it saves the data and metadata to a checkpoint directory, and in case of a failure, Spark can recover this data and resume from where it stopped.

...
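The three schema steps from question 1 can be sketched in Scala as follows. The column names and sample data are illustrative only, not taken from any real dataset:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder
  .appName("SchemaExample")   // illustrative app name
  .master("local[*]")
  .getOrCreate()

// Step 1: create an RDD of Rows (here from an in-memory collection)
val rowRDD = spark.sparkContext
  .parallelize(Seq("Alice,29", "Bob,35"))
  .map(_.split(","))
  .map(attrs => Row(attrs(0), attrs(1).trim.toInt))

// Step 2: define the schema as a StructType matching the Row structure
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

// Step 3: apply the schema to the RDD of Rows via createDataFrame
val df = spark.createDataFrame(rowRDD, schema)
df.printSchema()
```

This approach is useful when the schema is not known until runtime, for example when column names arrive as strings from a configuration file.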
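A minimal sketch of enabling checkpointing in a Spark Streaming application, as described in question 3. The checkpoint directory path, batch interval, and app name are assumptions for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Build a fresh context and register the checkpoint directory.
// This function is only invoked when no checkpoint data exists yet.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("CheckpointExample").setMaster("local[2]")
  val ssc = new StreamingContext(conf, Seconds(10))  // illustrative batch interval
  ssc.checkpoint("/tmp/spark-checkpoints")           // illustrative directory
  ssc
}

// On restart after a failure, getOrCreate rebuilds the context from the
// saved checkpoint data instead of calling createContext again.
val ssc = StreamingContext.getOrCreate("/tmp/spark-checkpoints", createContext _)
ssc.start()
ssc.awaitTermination()
```

In production the checkpoint directory would point at fault-tolerant storage such as HDFS or S3 rather than a local path, so that a restarted driver on another node can read it.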