Course Name: MapReduce and YARN
Module 1: Introduction to MapReduce and YARN
Question 1: Which phase of MapReduce is optional?
- Shuffle
- Reduce
- Combiner
- Map
Question 2: Which node is responsible for assigning (key, value) pairs to different reducers?
- Shuffle node
- Reducer node
- Combiner node
- Mapper node
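To illustrate how (key, value) pairs can be routed to reducers, the sketch below mimics hash partitioning: the key is hashed and taken modulo the number of reducers, so every occurrence of a key goes to the same reducer. The function names `java_string_hash` and `partition` are hypothetical, and the formula is an assumption modeled on the commonly described default hash partitioner, not code from the course. In Hadoop, partitioning is applied to map output as it is written.

```python
def java_string_hash(s):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer.

    Assumes all characters are in the Basic Multilingual Plane, so that
    Python code points match Java's UTF-16 code units.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    # Reinterpret the unsigned 32-bit value as signed
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key, num_reducers):
    """Pick a reducer index for a key (hypothetical helper).

    Masking with 0x7FFFFFFF clears the sign bit so the result is never negative.
    """
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


if __name__ == "__main__":
    for key in ["apple", "banana", "apple", "cherry"]:
        print(key, "-> reducer", partition(key, 3))
```

Both occurrences of "apple" map to the same reducer index, which is what lets a single reducer see all values for a key.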
Question 3 : Where are the output files of the Reducer task stored?
- A data warehouse
- Hadoop FS
- Within the Reducer node
- Linux FS
Module 2: Limitations of Hadoop v1 and MapReduce v1
Question 1 : What is an issue or limitation of the original MapReduce v1 paradigm?
- It’s not scalable
- It only has one TaskTracker
- It only supports Parquet file types
- It only has one JobTracker
Question 2 : How is YARN an improvement over the MapReduce v1 paradigm?
- It’s completely open source
- It splits the JobTracker into two processes: ResourceManager and ApplicationMaster
- It reduces multi-tenancy to improve performance
- It splits the TaskTracker into two processes: ResourceManager and ApplicationMaster
Question 3: Existing applications can run on YARN without recompilation. True or False?
- True
- False
Module 3: The Architecture of YARN
Question 1 : The main change from Hadoop v1 to Hadoop v2 was the consolidation of both resource management and job processing. True or False?
- True
- False
Question 2 : The NodeManager is a more generic and efficient version of the TaskTracker. True or False?
- True
- False
Question 3 : A new ApplicationMaster is launched for each job and ends when the job completes. True or False?
- True
- False
MapReduce and YARN Final Exam
Question 1. Which of the following is the correct sequence of MapReduce flow?
- Reduce -> Combine -> Map
- Combine -> Reduce -> Map
- Map -> Reduce -> Combine
- Map -> Combine -> Reduce
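The ordering of the phases can be seen in a small in-memory word count. This is only a sketch under simplifying assumptions (no cluster, no files, one combiner pass per input split); the function names are hypothetical and do not correspond to Hadoop APIs.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Pre-aggregate the pairs produced by one split, before any data is moved."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pairs):
    """Group values by key and sort by key, as reducers would receive them."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Sum the (already partially combined) counts for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    combined = []
    for split in splits:  # each split stands in for one map task
        mapped = []
        for line in split:
            mapped.extend(map_phase(line))
        combined.extend(combine(mapped))
    return reduce_phase(shuffle(combined))


if __name__ == "__main__":
    print(word_count([["a b a"], ["b c"]]))
```

The combiner here reuses the same summing logic as the reducer, which works because addition is associative and commutative.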
Question 2. Which of the following can be used to control the number of part files in a MapReduce program’s output directory?
- Shuffle parameters
- Number of Reducers
- Counter
- Number of Mappers
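As a rough illustration of how task counts relate to output files, the sketch below lists the file names one would expect in a job's output directory. It assumes the `part-r-NNNNN` / `part-m-NNNNN` naming used by the newer MapReduce API, with one file per reduce task, or one per map task in a map-only job. The helper `expected_part_files` is hypothetical.

```python
def expected_part_files(num_reducers, num_map_tasks=1):
    """Return the part-file names expected in the output directory.

    Assumption: each reduce task writes exactly one part-r file; with zero
    reducers (a map-only job) each map task writes one part-m file instead.
    """
    if num_reducers == 0:
        return ["part-m-%05d" % i for i in range(num_map_tasks)]
    return ["part-r-%05d" % i for i in range(num_reducers)]


if __name__ == "__main__":
    print(expected_part_files(3))
    print(expected_part_files(0, num_map_tasks=2))
```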
Question 3: Which of the following operations will work improperly when using a Combiner?
- Average
- Maximum
- Count
- Minimum
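A small numeric check shows how these operations behave when partial results are computed per split and then merged. The sketch uses made-up data and hypothetical helper names. It compares a naive "average of averages" with the true average, and shows one common workaround: carrying (sum, count) pairs through the combine step.

```python
def mean(values):
    return sum(values) / len(values)


def naive_mean_of_means(splits):
    """Average each split locally, then average those averages."""
    return mean([mean(split) for split in splits])


def combined_mean(splits):
    """Carry (sum, count) partials through the combine step instead of means."""
    partials = [(sum(split), len(split)) for split in splits]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    splits = [[1, 2, 3], [10]]  # two splits of unequal size
    everything = [v for split in splits for v in split]

    print("true mean:       ", mean(everything))
    print("mean of means:   ", naive_mean_of_means(splits))
    print("(sum, count) fix:", combined_mean(splits))

    # Maximum, minimum, and count merge cleanly from per-split partials.
    print(max(max(s) for s in splits) == max(everything))
    print(min(min(s) for s in splits) == min(everything))
    print(sum(len(s) for s in splits) == len(everything))
```

The naive version gives every split equal weight regardless of how many records it holds, so it only matches the true average when all splits are the same size.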
Question 4 : Which of the following is true about MapReduce?
- Compression of input files is optional.
- Output from the Map phase is replicated.
- The programmer must write the Map code, the Shuffle code, and the Reduce code.
- MapReduce programs must be written in Java.
Question 5: Input data to MapReduce is record-oriented and blocks of data contain the same number of full records. True or False?
- False.
- True.
Question 6: Which statement is true about the Reduce phase of MapReduce?
- Output results are sent to the client program.
- Data arrives from the Shuffle phase already sorted by key.
- The Reducer phase sums up the values associated with each key.
- Each Reduce task processes all the data for one key only.
Question 7 : Which statement is true about MapReduce v2 (MRv2) and YARN?
- Containers are used instead of slots in MRv1, and can be used with either Map or Reduce tasks in MRv2.
- There is one JobTracker in the cluster.
- MapReduce jobs written in Java for MRv1 never require recompilation.
- Each job has an ApplicationManager that obtains Container IDs from the NodeManager.
Question 8 : With YARN, long-running jobs acquire and retain fixed-size containers before execution starts. True or False?
- False.
- True.
Question 9 : Which of the following statements is true?
- The NameNode in Hadoop 2 is fully fault-tolerant, whereas in Hadoop 1 it was a single point of failure.
- The NodeManager in Hadoop 2 replaces the TaskTracker in Hadoop 1.
- YARN requires a minimum of two nodes, one master and one slave, to run.
- Both MapReduce and YARN can scale to any cluster size.
Question 10: The command hadoop classpath provides the CLASSPATH needed for compiling Java programs written for MapReduce or YARN. True or False?
- False.
- True.
Question 11: Which statement is true about MapReduce’s use of replication in HDFS?
- Only one copy of each replicated block is processed by MapReduce in normal operation.
- Speculative execution is normally performed on all copies of each “split.”
- Each DataNode uses RAID to store its data.
- Multiple copies of each record are kept on each node.
Question 12 : On which file system (FS) is the output of a Mapper task stored?
- Linux FS, and it is replicated 3 times.
- HDFS, and it is replicated 3 times.
- Linux FS, but it is not replicated.
- HDFS, but it is not replicated.
Question 13 : Which of the following statements is true?
- You can set the number of Reducers.
- The Shuffle phase is optional.
- You can set the number of Mappers and the number of Reducers.
- The number of Combiners is the same as the number of Reducers.
- You can set the number of Mappers.
Question 14 : What will a Hadoop job do if you try to run it with an output directory that is already present?
- It will create new files, but with a different suffix.
- It will create another directory to store the output.
- It will erase all files in that directory before running.
- It will not run.
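The kind of pre-flight check involved can be sketched on a local file system. This is a simulation only: `check_output_dir` is a hypothetical helper that mirrors the idea of validating the output path before any task starts, and it uses Python's `FileExistsError` rather than any Hadoop exception class.

```python
import os
import tempfile


def check_output_dir(path):
    """Refuse to proceed if the output path already exists.

    The check happens before any work is done, so existing results are
    never overwritten or mixed with new ones.
    """
    if os.path.exists(path):
        raise FileExistsError("Output directory %s already exists" % path)
    return path


if __name__ == "__main__":
    existing = tempfile.mkdtemp()  # stands in for a leftover output directory
    try:
        check_output_dir(existing)
    except FileExistsError as err:
        print("job rejected:", err)

    fresh = os.path.join(existing, "run-2")  # a path that does not exist yet
    print("job may start, output goes to:", check_output_dir(fresh))
```

In practice this is why job scripts often remove or rename the previous output directory before resubmitting.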
Question 15 : What are the main components of the ResourceManager in YARN? Select two.
- Scheduler
- JobTracker
- DataManager
- HDFS
- ApplicationsManager