Course Name:- Data Science with Open Data
Module 1:- What is Data Science?
Question 1: What is Data Science?
- It uses less complex statistics and generally tries to identify patterns that can improve an organization.
- It is a set of instructions we give a computer so it can take values and manipulate them into a usable form.
- Data science is focused on creating understanding among messy and disparate data.
- Mathematically, it is the average difference between individual values and the mean for the set of values.
- This machine learning method uses a line of branching questions or observations about a given data set to predict a target value.
Question 2: The Hadoop Distributed File System (HDFS) allows for storage, retrieval, and analysis of very large data sets using distributed hardware. True or False?
- False
- True
Question 3: Which definition(s) of Open Data are true? Select all that apply.
- Open Data is defined as structured data that is machine-readable, freely shared, used and built on without restrictions
- Open Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet.
- When using Open Data, there should be no discrimination against fields of endeavor or against persons or groups.
- Open data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
- Open Data is free for academics only
Module 2:- Up and Running with R
Question 1: RStudio is an IDE for R. What is an IDE?
- integrated development enterprise
- integrated docker environment
- integrated development environment
- initiated development environment
- none of the above
Question 2: In R, the # symbol precedes a comment. True/False
- False
- True
Question 3: R is case-sensitive. True/False
- False
- True
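As a minimal sketch of the two R facts asked about in Questions 2 and 3 above (the variable names are made up for illustration and do not come from the course labs):

```r
# Everything after '#' on a line is a comment and is ignored by R.
energy <- 10   # lower-case name
Energy <- 20   # a different object, because R is case-sensitive

total <- energy + Energy
print(total)

# A name that differs only in case has not been defined by this script.
print(exists("ENERGY"))
```

Because `energy` and `Energy` are separate objects, assigning to one leaves the other unchanged.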
Module 3:- The National Energy Board of Canada (NEB)
Question 1: The Open Data sets used in the labs are from the NEB. What is the NEB?
- National Electronic Body of Canada
- National Electronic Bureau of Canada
- National Energy Board of Canada
- National Energy Bureau of Canada
- None of the above
Question 2: ‘Exploring Canada’s Energy Future’ is an interactive tool that allows users to visualize, download, and share the data behind the National Energy Board’s (NEB) series of long-term energy outlooks, Energy Futures. True/False?
- False
- True
Question 3: The total energy used in the four sectors of Canada’s economy: residential, commercial, industrial and transportation is described in the NEB Open Data set as ________________?
- Oil Production
- Electricity Generation
- Natural Gas Production
- Total Demand
- None of the above
Module 4:- Intro to Data Analysis
Question 1: Analyze the Energy Futures Data Visualizations and derive insights from the data. Which of the following statements is true? (View by Region, GW.h, 2018, Scenario – Reference)
- By 2040, Coal Energy will be replaced by Hydro and Natural Gas in British Columbia
- By 2040, Coal Energy will be the second biggest source of Electricity in Ontario
- By 2040, Coal Energy will be the second biggest source of Electricity in Quebec
- By 2040, Coal Energy will be replaced by Solar/Wind/Geothermal and Natural Gas in Alberta
- By 2040, Coal Energy will be the second biggest source of Electricity in British Columbia
Question 2: Analyze the Energy Futures Data Visualizations and derive insights from the data. The trend for Alberta’s Oil Production declines significantly from 2005 to 2040 (kB/d, 2018, Scenario – Reference). True/False?
- False
- True
Question 3: A useful way to understand a large data set and distil insights from data visualizations is ___________?
- A cube approach
- A curved approach
- A circular approach
- An internal approach
- A Pyramid approach
Module 5:- Data Visualization and Analysis with Open Data
Question 1: When we explore a data set to find insights, it is useful to use visualizations as human beings are good at _________?
- looking into the future
- typing fast
- detecting large numbers
- spotting visual differences
- none of the above
Question 2: Looking at multiple visualizations can help us find data insights and spot notable differences in different data sets. True/False?
- False
- True
Question 3: Drilling down into data can help us find data insights. True/False?
- False
- True
Data Science with Open Data Cognitive Class Final Exam Answers:-
Question 1: What is Open Data?
- Open Data is data that is defined as structured but not machine-readable.
- Open Data is data that may be analyzed computationally to reveal patterns but is never shared.
- Open Data is defined as structured data that is machine-readable, freely shared, used and built on without restrictions.
- Open Data is data that does not have a pre-defined data model or is not organized in a pre-defined manner and it is not freely shared.
- Open Data is data that may be analyzed computationally to reveal patterns and is built with restrictions.
Question 2: RStudio is available as stand-alone software or in the cloud. True or False?
- False
- True
Question 3: RStudio is an IDE – IDE stands for?
- Interactive Decoding Environment
- Integrated Decoding Environment
- Interactive Development Environment
- Integrated Development Environment
- None of the above
Question 4: In this course, we accessed RStudio using RStudio Cloud and IBM’s labs.cognitiveclass.ai. True or False?
- False
- True
Question 5: The National Energy Board of Canada (NEB) regulates _____________.
- (1) The construction and operation of international power lines and designated inter-provincial power lines.
- (2) Imports of natural gas and exports of crude oil, natural gas liquids, natural gas, refined petroleum products, and electricity.
- (3) Oil and gas exploration and production activities in specified areas that are not regulated under joint federal/provincial accords.
- (4) The construction, operation, and abandonment of pipelines that cross international borders or provincial boundaries, as well as the related pipeline tolls and tariffs.
***[Select all that are true]***
- (1) only
- (1) and (2) only
- (1), (2) and (3) only
- (1), (2), (3) and (4) are true
- None of the above are true.
Question 6: The Minto Pyramid Principle refers to a process for organizing your thinking so that it jumps easily off the page to lodge in a reader’s mind. It notes that people ideally work out their thinking by creating pyramids of ideas. Select the 3 steps involved in the concept.
- Grouping together low-level facts they see as similar
- Numbering the main ideas
- Listing all the thoughts
- Drawing an insight from having seen the similarity
- Forming a new grouping of related insights, etc.
Question 7: R is case sensitive. True or False?
- False
- True
Question 8: To run a command in R, place your cursor anywhere on the line you wish to run and hit the ‘tab + enter’ buttons (Windows) or ‘tab + enter’ (Mac). True or False?
- False
- True
Question 9: Which R package can we use to add compelling color schemes to our visualizations?
- shiny
- RColorBrewer
- stringr
- dplyr
- None of these packages.
Question 10: We can use the “read.csv” command to load a CSV file into the R programming environment. True or False?
- False
- True
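A minimal sketch of `read.csv` in use, assuming a small CSV written to a temporary file. The column names and values below are made-up placeholders, not NEB data:

```r
# Write a tiny, made-up CSV to a temporary file (illustrative values only).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("region,source,value",
    "AB,Coal,1",
    "QC,Hydro,2"),
  csv_path
)

# read.csv() treats the first line as the header by default
# and returns a data frame.
demo <- read.csv(csv_path, stringsAsFactors = FALSE)
str(demo)
```

In the labs, the file path would point at the downloaded data set rather than at a temporary file.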
Question 11: Which R command can we use to change the orientation of a chart?
- coord_flip()
- ggthemes()
- stringr()
- dplyr()
- None of these commands.
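A rough sketch of `coord_flip()` from ggplot2, which swaps the x and y axes of a chart. The data frame `demo_bars` is a hypothetical placeholder rather than NEB data, and the plot is drawn only if ggplot2 is already installed:

```r
# Placeholder data for illustration only (not NEB figures).
demo_bars <- data.frame(
  source = c("Hydro", "Wind", "Coal"),
  value  = c(3, 2, 1)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Vertical bars by default ...
  upright <- ggplot(demo_bars, aes(x = source, y = value)) + geom_col()
  # ... and horizontal bars once the coordinates are flipped.
  flipped <- upright + coord_flip()
  print(flipped)
} else {
  message("ggplot2 is not installed, so no chart was drawn")
}
```

Flipping is often handy when category labels are long and would overlap along the x axis.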
Question 12: Which R command covered in this course can we use to find out more about a function?
- by running the function preceded by ‘/’, e.g., “/subset”
- by running the function preceded by ‘#’, e.g., “#subset”
- by running the function preceded by ‘==’, e.g., “==subset”
- by running the function preceded by ‘?’, e.g., “?subset”
- None of these commands will provide more information about a function.
Question 13: When you run Rscript for the first time, you will need to install packages – these are predefined groups of code that perform common instructions. The command to install the ‘ggplot2’ package is________________________.
- packages.install(ggplot2)
- packages.install.ggplot2
- install.packages("ggplot2")
- install.packages(ggplot2)
- None of these commands.
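A minimal sketch of checking for a package before installing it. The `needs_install` flag is a hypothetical helper name, and the install call is left commented out because it needs internet access and a configured CRAN mirror:

```r
pkg <- "ggplot2"

# TRUE when the package cannot be found in the current library paths.
needs_install <- !requireNamespace(pkg, quietly = TRUE)

if (needs_install) {
  # The package name is passed as a quoted string.
  # install.packages("ggplot2")
  message("ggplot2 is missing; run install.packages(\"ggplot2\")")
} else {
  message("ggplot2 is already installed")
}
```

Installation is needed once per R library; after that, `library(ggplot2)` loads the package in each new session.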
Question 14: In 2005, Wind is the biggest source of energy for Quebec (QC). True or False?
- False
- True
Question 15: From analyzing the NEB Open Data, we learn that__________________.
- Over time, Canada expects to replace Hydro with Coal sources of electricity.
- Over time, Canada expects to replace coal with solarWindGeoThermal sources of electricity.
- Canada’s electricity will all be imported over time.
- Canada’s electricity generation is decreasing over time.
- Over time, Canada expects to replace Hydro with Coal sources of electricity.