Thursday, September 12, 2024

Spark Overview for Scala Analytics: Cognitive Class Exam Answers

Course Name: Spark Overview for Scala Analytics

Spark Overview for Scala Analytics: Cognitive Class Final Exam Answers

Question 1. Which language is not supported by Spark?

  • SQL
  • Scala
  • Java
  • C
  • Python

Question 2. What does RDD stand for?

  • REPL Definition and Description
  • Resilient Distributed Dataset
  • Reader Distribution Defined
  • Resilient Documented DataFrame
  • Read, Distribute, Delete

Question 3. The Spark Web Console is used to:

  • Edit Spark code
  • Integrate Spark with third-party tools
  • Examine data produced by Spark jobs
  • Monitor running Spark jobs
  • Submit Spark jobs

Question 4. The RDD flatMap method does what?

  • Transforms each input record to zero or more output records
  • Transforms each input record to a new output record
  • Combines all records into a value
  • Reads a data source
  • None of the above
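
As a rough illustration of the difference between map and flatMap asked about in Question 4, here is a minimal sketch using plain Scala collections rather than a Spark RDD (the RDD API offers methods with the same names). The sample lines and the helper `words` are assumptions made up for this example.

```scala
object FlatMapSketch {
  // Hypothetical sample input: three text lines, one of them empty.
  val lines: List[String] = List("to be or", "", "not to be")

  // Splits a line into words; an empty line yields no words at all.
  def words(line: String): List[String] =
    line.split(" ").filter(_.nonEmpty).toList

  // map: exactly one output element per input element (here, a word count).
  val lengths: List[Int] = lines.map(line => words(line).size)

  // flatMap: zero or more output elements per input element, flattened into one list.
  val allWords: List[String] = lines.flatMap(words)

  def main(args: Array[String]): Unit = {
    println(lengths)  // List(3, 0, 3)
    println(allWords) // List(to, be, or, not, to, be)
  }
}
```

Note that the empty line still produces one element under map, but contributes nothing under flatMap.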

Question 5. Shuffling is used to:

  • Move data between stages
  • Decide where partitions are written to disk
  • Move tasks to the appropriate nodes in a cluster
  • Sort data when that’s requested
  • All of the above

Question 6. Transformation methods have one or more of the following characteristics:

  • One and only one record is output for each input record
  • Lazy (delayed) evaluation
  • Their results are cached in memory
  • Eager (immediate) evaluation
  • None of the above

Question 7. Action methods have one or more of the following characteristics:

  • Eager (immediate) evaluation
  • Return a new RDD
  • Do not support type inference
  • Must be the first methods in a sequence of methods
  • All of the above
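
The lazy versus eager distinction in Questions 6 and 7 can be sketched without Spark. The example below uses a Scala collection view as a stand-in for a transformation and `sum` as a stand-in for an action; this is only an analogy, and the names `calls`, `double`, and `run` are hypothetical.

```scala
object LazyEvalSketch {
  // Counts how many times the mapping function has actually run.
  var calls: Int = 0

  def double(x: Int): Int = { calls += 1; x * 2 }

  // Returns (calls before forcing, result, calls after forcing).
  def run(): (Int, Int, Int) = {
    calls = 0
    // Analogous to a transformation: only describes the computation.
    val pipeline = List(1, 2, 3).view.map(double)
    val before = calls
    // Analogous to an action: forces evaluation and returns a plain value.
    val total = pipeline.sum
    (before, total, calls)
  }

  def main(args: Array[String]): Unit =
    println(run()) // (0,12,3)
}
```

The mapping function is not called when the pipeline is defined, only when the final value is requested.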

Question 8. The sequence of transformation and action method calls:

  • Is run in parallel for each data partition
  • Starts with some data and returns or outputs new data
  • Is decomposed into stages
  • Forms a directed, acyclic graph
  • All of the above

Question 9. The Inverted Index computes what?

  • The records sorted descending by a key
  • The minimum, maximum, and average counts for words in the corpus
  • Output records with words as keys and document ids and counts as values
  • A table of contents for a corpus of documents
  • All of the above
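
To make the idea of an inverted index in Question 9 concrete, here is a small sketch in plain Scala collections, not the course's Spark implementation. It maps each word to the documents that contain it, together with a per-document count. The corpus, the document ids, and the function name `build` are assumptions for illustration.

```scala
object InvertedIndexSketch {
  // Hypothetical corpus: document id -> document text.
  val corpus: Map[String, String] = Map(
    "doc1" -> "spark makes big data simple",
    "doc2" -> "big data big insight"
  )

  // Builds word -> List((docId, count)), with each list sorted by document id.
  def build(docs: Map[String, String]): Map[String, List[(String, Int)]] = {
    // One (word, docId) pair per word occurrence.
    val pairs: List[(String, String)] = for {
      (id, text) <- docs.toList
      word <- text.toLowerCase.split("\\s+").toList
      if word.nonEmpty
    } yield (word, id)

    pairs
      .groupBy { case (word, _) => word }
      .map { case (word, occurrences) =>
        val counts = occurrences
          .groupBy { case (_, id) => id }
          .map { case (id, hits) => (id, hits.size) }
          .toList
          .sortBy(_._1)
        word -> counts
      }
  }

  val index: Map[String, List[(String, Int)]] = build(corpus)

  def main(args: Array[String]): Unit =
    index.toList.sortBy(_._1).foreach(println)
}
```

With this sample corpus, the entry for "big" lists doc1 with a count of 1 and doc2 with a count of 2.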

Question 10. Broadcast variables are used for what?

  • To send all RDD data to the tasks
  • Print messages to the Spark web console
  • Send metrics to a monitoring tool
  • Share read-only data with all tasks in an efficient way
  • None of the above

Question 11. Accumulators are used for what?

  • Collect the results of the Spark job
  • Aggregate extra data across all tasks
  • Manage streams in Spark Streaming
  • Send metrics to a monitoring tool
  • All of the above

Question 12. DataFrames have one or more of the following characteristics:

  • Support for SQL queries
  • Handle data when its structure is known and consistent
  • Excellent runtime performance
  • Support Hive integration
  • All of the above

Question 13. DataFrames support the following operations:

  • Non-equi joins
  • Reduce
  • Delete
  • Group by
  • All of the above

Question 14. If I have a DataFrame "person" with a field "age", which of the following expressions can never be used to reference that field?

  • "age"
  • $"age"
  • person($"age")
  • person("age")
  • All of the above are valid

Question 15. If I want to write a SQL query over a DataFrame, I have to call the following method first:

  • persist
  • write
  • registerTempTable
  • map
  • None of the above

Question 16. Which one of the following kinds of joins is not supported?

  • Left outer join
  • Inner join
  • Right outer join
  • Left semi join
  • All are supported

Question 17. The DataFrame expression persons.select($"age").where($"age" > 21) returns:

  • A DataFrame
  • A Scala Vector[Int]
  • A ResultSet
  • An RDD
  • None of the above

Question 18. In Hive, an external table has the property:

  • Its data is not managed by Hive
  • Its format is defined elsewhere
  • Its schema is defined elsewhere
  • It is visible to all users of Hive
  • All of the above

Question 19. In Spark Streaming, a DStream is:

  • A fixed-sized batch of incoming data
  • A connector to a socket
  • A sequence of RDDs
  • A collection of DataFrames
  • None of the above

Question 20. The batch interval:

  • starts at a user-specified value and adjusts in response to load
  • is the number of events to capture per batch
  • is the size of each data “chunk” returned by a DataFrame query
  • is determined dynamically by Spark
  • is the number of seconds to capture data per batch
