Course Name :- Spark Fundamentals I
Module 1 :- Introduction to Spark
Question 1 : What gives Spark its speed advantage for complex applications?
- Spark can cover a wide range of workloads under one system
- Various libraries provide Spark with additional functionality
- Spark extends the MapReduce model
- Spark makes extensive use of in-memory computations
- All of the above
Question 2 : For what purpose would an Engineer use Spark? Select all that apply.
- Analyzing data to obtain insights
- Programming with Spark’s API
- Transforming data into a useable form for analysis
- Developing a data processing system
- Tuning an application for a business use case
Question 3 : Which of the following statements are true of the Resilient Distributed Dataset (RDD)? Select all that apply.
- There are three types of RDD operations.
- RDDs allow Spark to reconstruct transformations
- RDDs only add a small amount of code due to tight integration
- RDD action operations do not return a value
- RDD is a distributed collection of elements parallelized across the cluster.
Module 2 :- Resilient Distributed Dataset and Dataframes
Question 1 : Which of the following methods can be used to create a Resilient Distributed Dataset (RDD)? Select all that apply.
- Creating a directed acyclic graph (DAG)
- Parallelizing an existing Spark collection
- Referencing a Hadoop-supported dataset
- Using data that resides in Spark
- Transforming an existing RDD to form a new one
Question 2 : What happens when an action is executed?
- The driver sends code to be executed on each block
- Executors prepare the data for operation in parallel
- A cache is created for storing partial results in memory
- Data is partitioned into different blocks across the cluster
- All of the above
Question 3 : Which of the following statements are true of RDD persistence? Select all that apply.
- Persistence through caching provides fault tolerance
- Future actions can be performed significantly faster
- Each partition is replicated on two cluster nodes
- RDD persistence always improves space efficiency
- By default, objects that are too big for memory are stored on the disk
Module 3 :- Spark Application Programming
Question 1 : What is SparkContext?
- A tool for linking to nodes
- A tool that provides fault tolerance
- A programming language for applications
- The built-in shell for the Spark engine
- An object that represents the connection to a Spark cluster
Question 2 : Which of the following methods can be used to pass functions to Spark? Select all that apply.
- Transformations and actions
- Passing by reference
- Static methods in a global singleton
- Import statements
- Anonymous function syntax
Question 3 : Which of the following is a main component of a Spark application’s source code?
- Import statements
- Business Logic
- SparkContext object
- Transformations and actions
- All of the above
Module 4 :- Introduction to the Spark Libraries
Question 1 : Which of the following is NOT an example of a Spark library?
- MLlib
- Hive
- Spark SQL
- GraphX
- Spark Streaming
Question 2 : From which of the following sources can Spark Streaming receive data? Select all that apply.
- Kafka
- JSON
- Parquet
- HDFS
- Hive
Question 3 : In Spark Streaming, processing begins immediately when an element of the application is executed. True or false?
- True
- False
Module 5 :- Spark Configuration, Monitoring and Tuning
Question 1 : Which of the following are main components of a Spark cluster? Select all that apply.
- Driver Program
- SparkContext
- Cluster Manager
- Worker node
- Cache
Question 2 : What are the main locations for Spark configuration? Select all that apply.
- The SparkConf object
- The Spark Shell
- Executor Processes
- Environment variables
- Logging properties
Question 3 : Which of the following techniques can improve Spark performance? Select all that apply.
- Scheduler Configuration
- Memory Tuning
- Data Serialization
- Using Broadcast variables
- Using nested structures
Spark Fundamentals I Cognitive Class Final Exam Answers :-
Question 1 : Which of the following is a type of Spark RDD operation? Select all that apply.
- Parallelization
- Action
- Persistence
- Transformation
- Evaluation
Question 2 : Spark must be installed and run on top of a Hadoop cluster. True or false?
- True
- False
Question 3 : Which of the following operations will work improperly when using a Combiner?
- Average
- Maximum
- Minimum
- Count
- All of the above operations will work properly
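The combiner question above comes down to whether an operation can be applied to partial results and then merged. A minimal plain-Python sketch (not Spark or Hadoop code; the partition contents are made-up values) shows that averaging per-partition averages gives a different number from the true average, while maximum and count merge cleanly. Carrying (sum, count) pairs is one common way to make an average combinable.

```python
from statistics import mean

# Hypothetical map-side partitions holding the values for one key.
partitions = [[1, 2, 3], [10]]

# Reference result: the average over all values, with no combiner involved.
true_mean = mean(v for part in partitions for v in part)

# Naive combiner: average each partition, then average those averages.
# This weights the single value 10 as heavily as the three values 1, 2, 3.
naive_mean = mean(mean(part) for part in partitions)

# Combinable form: each partition emits (sum, count); the pairs add up safely.
partials = [(sum(part), len(part)) for part in partitions]
total, count = map(sum, zip(*partials))
combined_mean = total / count

# Maximum and count give the same answer whether or not they are pre-combined.
combined_max = max(max(part) for part in partitions)
combined_count = sum(len(part) for part in partitions)
```

Here `naive_mean` differs from `true_mean`, while `combined_mean`, `combined_max`, and `combined_count` match the results computed over the full data.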
Question 4 : Spark supports which of the following libraries?
- Spark SQL
- MLlib
- GraphX
- Spark Streaming
- All of the above
Question 5 : Spark supports which of the following programming languages?
- Scala, Perl, Java
- Scala, Java, C++, Python, Perl
- Scala, Python, Java, R
- Java and Scala
- C++ and Python
Question 6 : A transformation is evaluated immediately. True or false?
- True
- False
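As a rough analogy for the question above, Python's built-in `map` is also lazy: it builds a recipe and runs nothing until a consumer asks for results. The sketch below is plain Python, not the Spark API; the `double` function and the `calls` log are hypothetical names used only to make the evaluation order visible.

```python
calls = []  # records every value the function is actually applied to

def double(x):
    calls.append(x)
    return 2 * x

data = [1, 2, 3]

# Analogous to a transformation: this only describes the work to be done.
doubled = map(double, data)
calls_before = list(calls)   # nothing has been computed yet

# Analogous to an action: consuming the result forces the computation.
result = sum(doubled)
calls_after = list(calls)
```

`calls_before` is empty because defining the mapping did not run `double`; only the final `sum` triggers it.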
Question 7 : Which storage level does the cache() function use?
- MEMORY_ONLY
- MEMORY_ONLY_SER
- MEMORY_AND_DISK
- MEMORY_AND_DISK_SER
Question 8 : Which of the following statements does NOT describe accumulators?
- They can only be added through an associative operation
- Programmers can extend them beyond numeric types
- They can only be read by the driver
- They are read-only
- They implement counters and sums
Question 9 : You must explicitly initialize the SparkContext when creating a Spark application. True or false?
- True
- False
Question 10 : The “local” parameter can be used to specify the number of cores to use for the application. True or false?
- True
- False
Question 11 : Spark applications can ONLY be packaged using one specific build tool. True or false?
- True
- False
Question 12 : Which of the following parameters of the “spark-submit” script determine where the application will run?
- --master
- --conf
- --class
- --deploy-mode
- None of the above
Question 13 : Which of the following is NOT supported as a cluster manager?
- Mesos
- Spark
- YARN
- Helix
- All of the above are supported
Question 14 : Spark SQL allows relational queries to be expressed in which of the following?
- Scala, SQL, and HiveQL
- Scala and HiveQL
- Scala and SQL
- SQL only
- HiveQL only
Question 15 : Spark Streaming processes live streaming data in real-time. True or false?
- True
- False
Question 16 : The MLlib library contains which of the following algorithms?
- Classification
- Regression
- Clustering
- Dimensionality Reduction
- All of the above
Question 17 : What is the purpose of the GraphX library?
- To create a visual representation of the data
- To generate data-parallel models
- To create a visual representation of a directed acyclic graph (DAG)
- To perform graph-parallel computations
- To convert from data-parallel to graph-parallel algorithms
Question 18 : Which list describes the correct order of precedence for Spark configuration, from highest to lowest?
- Flags passed to spark-submit, values in spark-defaults.conf, properties set on SparkConf
- Properties set on SparkConf, values in spark-defaults.conf, flags passed to spark-submit
- Values in spark-defaults.conf, properties set on SparkConf, flags passed to spark-submit
- Properties set on SparkConf, flags passed to spark-submit, values in spark-defaults.conf
- Values in spark-defaults.conf, flags passed to spark-submit, properties set on SparkConf
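The Spark configuration documentation states that properties set directly on SparkConf take highest precedence, followed by flags passed to spark-submit, followed by values in spark-defaults.conf. That lookup order can be modelled in plain Python with `ChainMap`; this is only an illustration of the precedence rule, not how Spark resolves settings internally, and the property values shown are made up.

```python
from collections import ChainMap

# Hypothetical settings from each source (the values are illustrative only).
spark_defaults_conf = {"spark.master": "local[2]", "spark.executor.memory": "1g"}
submit_flags = {"spark.master": "yarn"}
spark_conf_in_code = {"spark.executor.memory": "4g"}

# ChainMap searches its mappings left to right, so list the sources
# from highest precedence to lowest.
effective = ChainMap(spark_conf_in_code, submit_flags, spark_defaults_conf)
resolved = dict(effective)
```

In `resolved`, `spark.executor.memory` comes from the in-code SparkConf stand-in and `spark.master` comes from the spark-submit stand-in, with the defaults file used only for keys that no higher source sets.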
Question 19 : Spark monitoring can be performed with external tools. True or false?
- True
- False
Question 20 : Which serialization libraries are supported in Spark? Select all that apply.
- Apache Avro
- Java Serialization
- Protocol Buffers
- Kryo Serialization
- TPL