Course Name: Hadoop 101
Module 1. Introduction to Hadoop
Question 1. Hadoop is designed for Online Transaction Processing (OLTP). True or False?
- True
- False
Question 2. When is Hadoop useful for an application?
- When all of the application data is unstructured
- When work can be parallelized
- When the application requires low latency data access
- When random data access is required
Question 3. With the help of InfoSphere Streams, Hadoop can be used with data-at-rest as well as data-in-motion. True or False?
- True
- False
Module 2. Hadoop Architecture & HDFS
Question 1. Network bandwidth between any two nodes in the same rack is greater than bandwidth between two nodes on different racks. True or False?
- True
- False
Question 2. Hadoop works best on a large data set. True or False?
- True
- False
Question 3. HDFS is a fully POSIX-compliant file system. True or False?
- True
- False
Module 3. Hadoop Administration
Question 1. You can add or remove nodes from the open source Apache Ambari console. True or False?
- True
- False
Question 2. It is recommended that you start all of the services in Ambari in order to speed up communications. True or False?
- True
- False
Question 3. To remove a node using Ambari, you must first remove all of the services using that node. True or False?
- True
- False
Module 4. Hadoop Components
Question 1. The output of the shuffle operation goes into the mapper before going into the reducer. True or False?
- True
- False
Question 2. What is true about Pig and Hive in relation to the Hadoop ecosystem?
- HiveQL requires that you create the data flow
- PigLatin requires that the data have a schema
- Fewer lines of code are required compared to a Java program
- All of the above
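
For context on the "fewer lines of code" option: even a trivial MapReduce job in Java needs a full driver class before any logic runs, whereas the equivalent Pig or Hive script is only a handful of lines. Below is a minimal sketch of a standard word-count driver using the org.apache.hadoop.mapreduce API; the WordCountMapper and WordCountReducer classes are assumed to be defined elsewhere (a matching mapper sketch appears under final exam Question 3 below).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver boilerplate for a word-count job; WordCountMapper and
// WordCountReducer are assumed to exist elsewhere in the project.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);    // assumed mapper class
        job.setReducerClass(WordCountReducer.class);  // assumed reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A Pig or Hive equivalent of this whole job (load, group, count, store) typically fits in three or four script lines, which is the point the question is testing.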
Question 3. Which of the following tools is designed to move data to and from a relational database?
- Pig
- Flume
- Oozie
- Sqoop
Hadoop 101 Cognitive Class Final Exam Answers:
Question 1. HDFS is designed for:
- Large files, streaming data access, and commodity hardware
- Large files, low latency data access, and commodity hardware
- Large files, streaming data access, and high-end hardware
- Small files, streaming data access, and commodity hardware
- None of the options is correct
Question 2. The Hadoop Distributed File System (HDFS) is the only distributed file system supported by Hadoop. True or False?
- True
- False
Question 3. The input to a mapper takes the form <k1, v1>. What form does the mapper’s output take?
- <list(k2), v2>
- list(<k2, v2>)
- <k2, list(v2)>
- <k1, v1>
- None of the options is correct
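
To make the <k1, v1> to list(<k2, v2>) shape concrete, here is a minimal word-count mapper sketch using Hadoop's org.apache.hadoop.mapreduce.Mapper API. Each map() call receives one input pair (byte offset, line of text) and may emit zero or more output pairs via context.write().

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input pair <k1, v1> = <byte offset, line of text>;
// output is a list of <k2, v2> = <word, 1> pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);  // one <k2, v2> pair per word
            }
        }
    }
}
```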
Question 4. What is Flume?
- A service for moving large amounts of data around a cluster soon after the data is produced.
- A distributed file system.
- A programming language that translates high-level queries into map tasks and reduce tasks.
- A platform for executing MapReduce jobs.
- None of the options is correct
Question 5. What is the purpose of the shuffle operation in Hadoop MapReduce?
- To pre-sort the data before it enters each mapper node.
- To distribute input splits among mapper nodes.
- To transfer each mapper’s output to the appropriate reducer node based on a partitioning function.
- To randomly distribute mapper output among reducer nodes.
- None of the options is correct
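
The "partitioning function" in the correct option is pluggable. As an illustrative sketch (not Hadoop's own HashPartitioner source), the default behavior is equivalent to hashing the key modulo the number of reduce tasks, which guarantees every occurrence of a key reaches the same reducer:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of hash partitioning: all values for the same key land on
// the same reducer, which is what makes the shuffle useful.
public class SketchPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask the sign bit so the result is always non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

A custom class like this would be wired in with job.setPartitionerClass(SketchPartitioner.class).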
Question 6. Which of the following is a duty of the DataNodes in HDFS?
- Control the execution of an individual map task or a reduce task.
- Maintain the file system tree and metadata for all files and directories.
- Manage the file system namespace.
- Store and retrieve blocks when told to by clients or the NameNode.
- None of the options is correct
Question 7. Which of the following is a duty of the NameNode in HDFS?
- Control the MapReduce job from end-to-end
- Maintain the file system tree and metadata for all files and directories
- Store the block data
- Transfer block data from the data nodes to the clients
- None of the options is correct
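
The division of labor in Questions 6 and 7 shows up directly in the HDFS client API: the client asks the NameNode for metadata (the file system tree and block locations), then streams the block bytes from DataNodes. A minimal read sketch, assuming a reachable cluster; the URI and path below are placeholders, not real values:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS would normally come from core-site.xml;
        // this URI is a placeholder, not a real cluster address.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        FileSystem fs = FileSystem.get(conf);

        // open() asks the NameNode for block locations (metadata);
        // the returned stream reads the actual bytes from DataNodes.
        Path path = new Path("/data/example.txt");  // assumed path
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```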
Question 8. Which component determines the specific nodes that a MapReduce task will run on?
- The NameNode
- The JobTracker
- The TaskTrackers
- The JobClient
- None of the options is correct
Question 9. Which of the following characteristics is common to Pig, Hive, and Jaql?
- All translate high-level languages to MapReduce jobs
- All operate on JSON data structures
- All are data flow languages
- All support random reads/writes
- None of the options is correct
Question 10. Which of the following is NOT an open source project related to Hadoop?
- Pig
- UIMA
- Jackal
- Avro
- Lucene
Question 11. During the replication process, a block of data is written to all specified DataNodes in parallel. True or False?
- True
- False
Question 12. With IBM BigInsights, Hadoop components can be started and stopped from a command line and from the Ambari Console. True or False?
- True
- False
Question 13. When loading data into HDFS, data is held at the NameNode until the block is filled and then the data is sent to a DataNode. True or False?
- True
- False
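
Question 13 is getting at the actual write path: the client asks the NameNode where to place each block, then streams the data directly to a pipeline of DataNodes; the NameNode never buffers the data itself. A minimal write sketch, with the same placeholder URI and path caveat as the read sketch above:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");  // placeholder URI
        FileSystem fs = FileSystem.get(conf);

        // create() registers the file with the NameNode; the bytes
        // written to the stream go to DataNodes, not the NameNode.
        Path path = new Path("/data/output.txt");  // assumed path
        try (FSDataOutputStream out = fs.create(path)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```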
Question 14. Which of the following is true about HDFS federation?
- Uses JournalNodes to decide the active NameNode
- Allows non-Hadoop programs to access data in HDFS
- Allows multiple NameNodes with their own namespaces to share a pool of DataNodes
- Implements a resource manager external to all Hadoop frameworks
Question 15. Which of the following is true about Hadoop high availability?
- Uses JournalNodes to decide the active NameNode
- Allows non-Hadoop programs to access data in HDFS
- Allows multiple NameNodes with their own namespaces to share a pool of DataNodes
- Implements a resource manager external to all Hadoop frameworks
Question 16. Which of the following is true about YARN?
- Uses JournalNodes to decide the active NameNode
- Allows non-Hadoop programs to access data in HDFS
- Allows multiple NameNodes with their own namespaces to share a pool of DataNodes
- Implements a resource manager external to all Hadoop frameworks
Question 17. Which of the following statements is true?
- Hadoop is good for OLTP, DSS, and big data
- Hadoop includes open source components and closed source components
- Hadoop is a new technology designed to replace relational databases
- All of the options are correct
- None of the options is correct
Question 18. In which of these scenarios should Hadoop be used?
- Processing billions of email messages to perform text analytics
- Obtaining stock price trends on a per-minute basis
- Processing weather sensor information to predict a hurricane path
- Analyzing vital signs of a baby in real time
- None of the options is correct