Course Name: Spark Overview for Scala Analytics
Spark Overview for Scala Analytics (Cognitive Class) Final Exam Answers:
Question 1. Which language is not supported by Spark?
- SQL
- Scala
- Java
- C
- Python
Question 2. What does RDD stand for?
- REPL Definition and Description
- Resilient Distributed Dataset
- Reader Distribution Defined
- Resilient Documented DataFrame
- Read, Distribute, Delete
Question 3. The Spark Web Console is used to:
- Edit Spark code
- Integrate Spark with third-party tools
- Examine data produced by Spark jobs
- Monitor running Spark jobs
- Submit Spark jobs
Question 4. The RDD flatMap method does what?
- Transforms each input record to zero or more output records
- Transforms each input record to a new output record
- Combines all records into a value
- Reads a data source
- None of the above
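For reference, flatMap emits zero or more output records per input record, while map emits exactly one. A minimal sketch, assuming a live SparkContext named sc:

```scala
// flatMap: each input line yields zero or more words.
val lines = sc.parallelize(Seq("hello spark", "", "scala analytics"))
val words = lines.flatMap(line => line.split("\\s+").filter(_.nonEmpty))
words.collect() // Array(hello, spark, scala, analytics)

// map, by contrast, yields exactly one record per input record.
val lengths = lines.map(_.length) // Array(11, 0, 15)
```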
Question 5. Shuffling is used to:
- Move data between stages
- Decide where partitions are written to disk
- Move tasks to the appropriate nodes in a cluster
- Sort data when that’s requested
- All of the above
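A shuffle moves data between stages; any wide operation such as reduceByKey forces one. A small sketch, assuming a SparkContext sc:

```scala
val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

// reduceByKey repartitions records by key, shuffling data across
// the stage boundary before summing the values for each key.
val counts = pairs.reduceByKey(_ + _)
counts.collect() // Array((a,2), (b,1))
```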
Question 6. Transformation methods have one or more of the following characteristics:
- One and only one record is output for each input record
- Lazy (delayed) evaluation
- Their results are cached in memory
- Eager (immediate) evaluation
- None of the above
Question 7. Action methods have one or more of the following characteristics:
- Eager (immediate) evaluation
- Return a new RDD
- Do not support type inference
- Must be the first methods in a sequence of methods
- All of the above
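To make the distinction in Questions 6 and 7 concrete: transformations are lazily evaluated and each returns a new RDD, while actions are eager and trigger the actual computation. A sketch, assuming a SparkContext sc:

```scala
val numbers = sc.parallelize(1 to 1000000)

// Transformations: lazy; nothing runs yet, each call returns a new RDD.
val doubled = numbers.map(_ * 2)
val evens   = doubled.filter(_ % 4 == 0)

// Action: eager; this call actually runs the job and returns a value.
val total = evens.count()
```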
Question 8. The sequence of transformation and action method calls:
- Is run in parallel for each data partition
- Starts with some data and returns or outputs new data
- Is decomposed into stages
- Forms a directed, acyclic graph
- All of the above
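The chain of transformations forms a directed acyclic graph that Spark decomposes into stages at shuffle boundaries. You can inspect the lineage with toDebugString, as in this sketch (the input path is hypothetical, and sc is an existing SparkContext):

```scala
val wordCounts = sc.textFile("README.md")   // hypothetical input file
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)                       // shuffle: a new stage begins here

// Prints the DAG lineage; the indentation marks the stage boundaries.
println(wordCounts.toDebugString)
```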
Question 9. The Inverted Index computes what?
- The records sorted descending by a key
- The minimum, maximum, and average counts for words in the corpus
- Output records with words as keys and document ids and counts as values
- A table of contents for a corpus of documents
- All of the above
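An inverted index maps each word to the documents it appears in, with counts. A minimal sketch over (docId, text) pairs, assuming a SparkContext sc:

```scala
val docs = sc.parallelize(Seq(
  ("doc1", "spark scala spark"),
  ("doc2", "scala analytics")
))

val inverted = docs
  .flatMap { case (id, text) =>
    text.split("\\s+").map(word => ((word, id), 1))
  }
  .reduceByKey(_ + _)                           // count per (word, docId)
  .map { case ((word, id), n) => (word, (id, n)) }
  .groupByKey()                                 // word -> all (docId, count)

inverted.collect().foreach(println)
// roughly: (spark, [(doc1,2)]), (scala, [(doc1,1), (doc2,1)]), ...
```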
Question 10. Broadcast variables are used for what?
- To send all RDD data to the tasks
- Print messages to the Spark web console
- Send metrics to a monitoring tool
- Share read-only data with all tasks in an efficient way
- None of the above
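Broadcast variables ship read-only data to each executor once, rather than with every task. A sketch, assuming a SparkContext sc:

```scala
// A small lookup table shared read-only with all tasks.
val countryNames = sc.broadcast(Map("US" -> "United States", "FR" -> "France"))

val codes = sc.parallelize(Seq("US", "FR", "US"))
val names = codes.map(code => countryNames.value.getOrElse(code, "unknown"))
names.collect() // Array(United States, France, United States)
```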
Question 11. Accumulators are used for what?
- Collect the results of the Spark job
- Aggregate extra data across all tasks
- Manage streams in Spark Streaming
- Send metrics to a monitoring tool
- All of the above
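Accumulators aggregate extra data, such as counters or metrics, across all tasks as a side channel. A sketch using Spark 2.x's longAccumulator (Spark 1.x used sc.accumulator instead), assuming a SparkContext sc:

```scala
val badRecords = sc.longAccumulator("bad records")

val parsed = sc.parallelize(Seq("1", "2", "oops", "4")).flatMap { s =>
  try Some(s.toInt)
  catch { case _: NumberFormatException => badRecords.add(1); None }
}

parsed.count()             // an action forces evaluation
println(badRecords.value)  // 1
```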
Question 12. DataFrames have one or more of the following characteristics:
- Support for SQL queries
- Handle data when its structure is known and consistent
- Excellent runtime performance
- Support HIVE integration
- All of the above
Question 13. DataFrames support the following operations:
- Non-equi joins
- Reduce
- Delete
- Group by
- All of the above
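To illustrate Questions 12 and 13 together: DataFrames handle data with a known, consistent structure and support SQL-style operations such as group by and joins (including non-equi joins). A sketch, assuming a Spark 2.x SparkSession named spark; the Spark 1.x releases this course targets offered the same operations via SQLContext:

```scala
import spark.implicits._

val people = Seq(("alice", 34), ("bob", 21), ("carol", 34)).toDF("name", "age")

// Group by: count people per age.
people.groupBy($"age").count().show()

// Joins need not be equi-joins; any boolean join expression works.
val limits = Seq(("adult", 21)).toDF("label", "minAge")
people.join(limits, people("age") > limits("minAge")).show()
```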
Question 14. If I have a DataFrame "person" with a field "age", which of the following expressions can never be used to reference that field?
- "age"
- $"age"
- person($"age")
- person("age")
- All of the above are valid
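The column-reference syntaxes from Question 14, shown side by side. Assuming a SparkSession spark with implicits imported and an illustrative DataFrame person:

```scala
import spark.implicits._
val person = Seq(("alice", 34), ("bob", 21)).toDF("name", "age")

person.select("age")          // plain string column name
person.select($"age")         // Column via the $ interpolator
person.select(person("age"))  // Column looked up on the DataFrame
// person($"age") does not compile: apply takes a String, not a Column.
```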
Question 15. If I want to write a SQL query over a DataFrame, I have to call the following method first:
- persist
- write
- registerTempTable
- map
- None of the above
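A DataFrame must be registered as a table before SQL can be run over it. A sketch, assuming a Spark 1.x SQLContext named sqlContext and the people DataFrame from the earlier sketch; in Spark 2.x, createOrReplaceTempView supersedes registerTempTable:

```scala
people.registerTempTable("people")           // Spark 1.x, as in the course
// people.createOrReplaceTempView("people")  // Spark 2.x equivalent

val adults = sqlContext.sql("SELECT name, age FROM people WHERE age > 21")
adults.show()
```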
Question 16. Which one of the following kinds of joins is not supported?
- Left outer join
- Inner join
- Right outer join
- Left semijoin
- All are supported
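All four join types in Question 16 are supported. A sketch using the join-expression form, assuming two DataFrames left and right that share an id column:

```scala
val keyMatch = left("id") === right("id")

left.join(right, keyMatch, "inner")
left.join(right, keyMatch, "left_outer")
left.join(right, keyMatch, "right_outer")
left.join(right, keyMatch, "leftsemi")   // left semijoin
```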
Question 17. The DataFrame expression persons.select($"age").where($"age" > 21) returns:
- A DataFrame
- A Scala Vector[Int]
- A ResultSet
- An RDD
- None of the above
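Both select and where are transformations that return another DataFrame, so the expression in Question 17 can be chained further or executed with an action. A sketch, assuming a DataFrame named persons with a numeric age column and implicits in scope:

```scala
val adults = persons.select($"age").where($"age" > 21)  // still a DataFrame

adults.printSchema()  // the schema is known without running the query
adults.show()         // an action finally executes it
```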
Question 18. In Hive, an external table has the property:
- It’s data is not managed by Hive
- It’s format is defined elsewhere
- It’s schema is defined elsewhere
- It is visible to all users of Hive
- All of the above
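With an external table, Hive tracks only the metadata; the data lives at a location Hive does not manage, so dropping the table leaves the files in place. A HiveQL sketch issued through a HiveContext (the table name and path are illustrative):

```scala
hiveContext.sql("""
  CREATE EXTERNAL TABLE logs (ts STRING, msg STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/logs'  -- the data stays here even if the table is dropped
""")
```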
Question 19. In Spark Streaming, a DStream is:
- A fixed-sized batch of incoming data
- A connector to a socket
- A sequence of RDDs
- A collection of DataFrames
- None of the above
Question 20. The batch interval:
- starts at a user-specified value and adjusts in response to load
- is the number of events to capture per batch
- is the size of each data “chunk” returned by a DataFrame query
- is determined dynamically by Spark
- is the number of seconds to capture data per batch
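Questions 19 and 20 together: a DStream is a sequence of RDDs, one per batch, and the batch interval is the fixed, user-specified number of seconds of data captured per batch. A sketch, assuming an existing SparkConf named conf and a local socket source:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Batch interval: capture 5 seconds of data per batch (fixed, user-chosen).
val ssc = new StreamingContext(conf, Seconds(5))

// A DStream: a sequence of RDDs, one RDD per 5-second batch.
val lines = ssc.socketTextStream("localhost", 9999)
lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _).print()

ssc.start()
ssc.awaitTermination()
```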