Saturday, July 27, 2024

Spark Fundamentals I Cognitive Class Exam Answers:-

Course Name :- Spark Fundamentals I

Module 1 :- Introduction to Spark

Question 1 : What gives Spark its speed advantage for complex applications?

  • Spark can cover a wide range of workloads under one system
  • Various libraries provide Spark with additional functionality
  • Spark extends the MapReduce model
  • Spark makes extensive use of in-memory computations
  • All of the above

Question 2 : For what purpose would an Engineer use Spark? Select all that apply.

  • Analyzing data to obtain insights
  • Programming with Spark’s API
  • Transforming data into a useable form for analysis
  • Developing a data processing system
  • Tuning an application for a business use case

Question 3 : Which of the following statements are true of the Resilient Distributed Dataset (RDD)? Select all that apply.

  • There are three types of RDD operations.
  • RDDs allow Spark to reconstruct transformations
  • RDDs only add a small amount of code due to tight integration
  • RDD action operations do not return a value
  • RDD is a distributed collection of elements parallelized across the cluster.

Module 2 :- Resilient Distributed Dataset and DataFrames

Question 1 : Which of the following methods can be used to create a Resilient Distributed Dataset (RDD)? Select all that apply.

  • Creating a directed acyclic graph (DAG)
  • Parallelizing an existing Spark collection
  • Referencing a Hadoop-supported dataset
  • Using data that resides in Spark
  • Transforming an existing RDD to form a new one

Question 2 : What happens when an action is executed?

  • The driver sends code to be executed on each block
  • Executors prepare the data for operation in parallel
  • A cache is created for storing partial results in memory
  • Data is partitioned into different blocks across the cluster
  • All of the above

Question 3 : Which of the following statements are true of RDD persistence? Select all that apply.

  • Persistence through caching provides fault tolerance
  • Future actions can be performed significantly faster
  • Each partition is replicated on two cluster nodes
  • RDD persistence always improves space efficiency
  • By default, objects that are too big for memory are stored on the disk

Module 3 :- Spark Application Programming

Question 1 : What is SparkContext?

  • A tool for linking to nodes
  • A tool that provides fault tolerance
  • A programming language for applications
  • The built-in shell for the Spark engine
  • An object that represents the connection to a Spark cluster

Question 2 : Which of the following methods can be used to pass functions to Spark? Select all that apply.

  • Transformations and actions
  • Passing by reference
  • Static methods in a global singleton
  • Import statements
  • Anonymous function syntax

Question 3 : Which of the following is a main component of a Spark application’s source code?

  • Import statements
  • Business Logic
  • SparkContext object
  • Transformations and actions
  • All of the above

Module 4 :- Introduction to the Spark Libraries

Question 1 : Which of the following is NOT an example of a Spark library?

  • MLlib
  • Hive
  • Spark SQL
  • GraphX
  • Spark Streaming

Question 2 : From which of the following sources can Spark Streaming receive data? Select all that apply.

  • Kafka
  • JSON
  • Parquet
  • HDFS
  • Hive

Question 3 : In Spark Streaming, processing begins immediately when an element of the application is executed. True or false?

  • True
  • False

Module 5 :- Spark Configuration, Monitoring and Tuning

Question 1 : Which of the following are main components of a Spark cluster? Select all that apply.

  • Driver Program
  • SparkContext
  • Cluster Manager
  • Worker node
  • Cache

Question 2 : What are the main locations for Spark configuration? Select all that apply.

  • The SparkConf object
  • The Spark Shell
  • Executor Processes
  • Environment variables
  • Logging properties

Question 3 : Which of the following techniques can improve Spark performance? Select all that apply.

  • Scheduler Configuration
  • Memory Tuning
  • Data Serialization
  • Using Broadcast variables
  • Using nested structures

Spark Fundamentals I Cognitive Class Final Exam Answers:-

Question 1 : Which of the following is a type of Spark RDD operation? Select all that apply.

  • Parallelization
  • Action
  • Persistence
  • Transformation
  • Evaluation

Question 2 : Spark must be installed and run on top of a Hadoop cluster. True or false?

  • True
  • False

Question 3 : Which of the following operations will work improperly when using a Combiner?

  • Average
  • Maximum
  • Minimum
  • Count
  • All of the above operations will work properly
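A combiner pre-aggregates each mapper's output before the reduce step, so it is only safe for operations whose partial results can be merged without changing the final answer. The sketch below is plain Python, not Hadoop or Spark code; the partition contents and the helper name `combine_then_reduce` are made up for illustration.

```python
from statistics import mean

# Hypothetical data: two mapper outputs of unequal size.
partitions = [[1, 2, 3, 4], [10]]
all_values = [v for part in partitions for v in part]


def combine_then_reduce(parts, op):
    """Apply op inside each partition (combiner), then to the partial results (reducer)."""
    partials = [op(part) for part in parts]
    return op(partials)


# Maximum and minimum give the same answer with or without a combiner.
max_ok = combine_then_reduce(partitions, max) == max(all_values)
min_ok = combine_then_reduce(partitions, min) == min(all_values)

# Count works when the reducer sums the per-partition counts.
count_with_combiner = sum(len(part) for part in partitions)

# Average does not: the mean of per-partition means ignores partition sizes.
naive_average = combine_then_reduce(partitions, mean)
true_average = mean(all_values)

# A common fix is to carry (sum, count) pairs and divide only at the end.
pairs = [(sum(part), len(part)) for part in partitions]
total, count = map(sum, zip(*pairs))
fixed_average = total / count
```

With these values the naive average differs from the true one, while the (sum, count) variant matches it.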

Question 4 : Spark supports which of the following libraries?

  • Spark SQL
  • MLlib
  • GraphX
  • Spark Streaming
  • All of the above

Question 5 : Spark supports which of the following programming languages?

  • Scala, Perl, Java
  • Scala, Java, C++, Python, Perl
  • Scala, Python, Java, R
  • Java and Scala
  • C++ and Python

Question 6 : A transformation is evaluated immediately. True or false?

  • True
  • False
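Spark records transformations and only computes them when an action asks for a result. The toy class below models that behaviour in plain Python; `LazyDataset` is a hypothetical stand-in written for this example, not part of any Spark API.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, actions run them."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # recorded lineage of pending transformations

    def map(self, fn):
        # Transformation: returns a new dataset, computes nothing yet.
        return LazyDataset(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyDataset(self._data, self._steps + (("filter", pred),))

    def collect(self):
        # Action: replays the recorded steps and returns a result.
        items = self._data
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


calls = []  # records every time the mapped function actually runs


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset([1, 2, 3]).map(double).filter(lambda x: x > 2)
calls_before_action = len(calls)  # still zero: nothing has run yet
result = ds.collect()             # the action triggers the work
calls_after_action = len(calls)
```

Because the steps are kept as a lineage, the same recorded chain could be replayed to rebuild a lost result, which is the idea behind RDD fault tolerance.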

Question 7 : Which storage level does the cache() function use?

  • MEMORY_ONLY
  • MEMORY_ONLY_SER
  • MEMORY_AND_DISK
  • MEMORY_AND_DISK_SER

Question 8 : Which of the following statements does NOT describe accumulators?

  • They can only be added through an associative operation
  • Programmers can extend them beyond numeric types
  • They can only be read by the driver
  • They are read-only
  • They implement counters and sums

Question 9 : You must explicitly initialize the SparkContext when creating a Spark application. True or false?

  • True
  • False

Question 10 : The “local” parameter can be used to specify the number of cores to use for the application. True or false?

  • True
  • False
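In Spark's master URL, `local` means one worker thread, `local[K]` means K worker threads, and `local[*]` means as many threads as there are logical cores. The helper below is a hypothetical sketch that interprets those three forms in plain Python; Spark does its own parsing internally, and the function name `local_thread_count` is an assumption made for this example.

```python
import os
import re


def local_thread_count(master):
    """Return the worker-thread count implied by a local master URL."""
    if master == "local":
        return 1  # plain "local" means a single worker thread
    match = re.fullmatch(r"local\[(\d+|\*)\]", master)
    if match is None:
        raise ValueError(f"not a local master URL: {master!r}")
    value = match.group(1)
    # "*" asks for as many threads as there are logical cores.
    return (os.cpu_count() or 1) if value == "*" else int(value)


single = local_thread_count("local")
four = local_thread_count("local[4]")
all_cores = local_thread_count("local[*]")
```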

Question 11 : Spark applications can ONLY be packaged using one, specific build tool. True or false?

  • True
  • False

Question 12 : Which of the following parameters of the “spark-submit” script determine where the application will run?

  • --master
  • --conf
  • --class
  • --deploy-mode
  • None of the above

Question 13 : Which of the following is NOT supported as a cluster manager?

  • Mesos
  • Spark
  • YARN
  • Helix
  • All of the above are supported

Question 14 : Spark SQL allows relational queries to be expressed in which of the following?

  • Scala, SQL, and HiveQL
  • Scala and HiveQL
  • Scala and SQL
  • SQL only
  • HiveQL only

Question 15 : Spark Streaming processes live streaming data in real time. True or false?

  • True
  • False

Question 16 : The MLlib library contains which of the following algorithms?

  • Classification
  • Regression
  • Clustering
  • Dimensionality Reduction
  • All of the above

Question 17 : What is the purpose of the GraphX library?

  • To create a visual representation of the data
  • To generate data-parallel models
  • To create a visual representation of a directed acyclic graph (DAG)
  • To perform graph-parallel computations
  • To convert from data-parallel to graph-parallel algorithms

Question 18 : Which list describes the correct order of precedence for Spark configuration, from highest to lowest?

  • Flags passed to spark-submit, values in spark-defaults.conf, properties set on SparkConf
  • Properties set on SparkConf, values in spark-defaults.conf, flags passed to spark-submit
  • Values in spark-defaults.conf, properties set on SparkConf, flags passed to spark-submit
  • Properties set on SparkConf, flags passed to spark-submit, values in spark-defaults.conf
  • Values in spark-defaults.conf, flags passed to spark-submit, properties set on SparkConf
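The Spark configuration documentation states that properties set directly on SparkConf take highest precedence, then flags passed to spark-submit, then values in spark-defaults.conf. The sketch below models that merge order with plain dictionaries; `effective_config` and the sample values are hypothetical and only illustrate the override order.

```python
def effective_config(defaults_file, submit_flags, spark_conf):
    """Merge three config sources; later sources override earlier ones."""
    merged = {}
    # Ordered from lowest to highest precedence.
    for source in (defaults_file, submit_flags, spark_conf):
        merged.update(source)
    return merged


# Hypothetical values for illustration only.
defaults_file = {
    "spark.executor.memory": "1g",
    "spark.master": "local",
    "spark.app.name": "from-file",
}
submit_flags = {"spark.executor.memory": "2g", "spark.master": "yarn"}
spark_conf = {"spark.executor.memory": "4g"}

config = effective_config(defaults_file, submit_flags, spark_conf)
```

Here the executor memory comes from the SparkConf value, the master from the spark-submit flag, and the application name from the defaults file, since nothing higher overrides it.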

Question 19 : Spark monitoring can be performed with external tools. True or false?

  • True
  • False

Question 20 : Which serialization libraries are supported in Spark? Select all that apply.

  • Apache Avro
  • Java Serialization
  • Protocol Buffers
  • Kryo Serialization
  • TPL
