Friday , April 19 2024

Machine Learning with Python Cognitive Class Exam Answers:-

Course Name:- Machine Learning with Python

Module 1. Machine Learning

Question 1. Machine Learning uses algorithms that can learn from data without relying on explicitly programmed methods.

  • True
  • False

Question 2. Which are the two types of supervised learning techniques?

  • Classification and Clustering
  • Classification and K-Means
  • Regression and Clustering
  • Regression and Partitioning
  • Classification and Regression

Question 3. Which of the following statements best describes the Python scikit library?

  • A library for scientific and high-performance computation.
  • A collection of algorithms and tools for machine learning.
  • A popular plotting package that provides 2D plotting as well as 3D plotting.
  • A library that provides high-performance, easy to use data structures.
  • A collection of numerical algorithms and domain-specific toolboxes.

Module 2. Regression

Question 1. Training and testing on the same dataset might have a high training accuracy, but its out-of-sample accuracy might be low.

  • True
  • False

Question 2. If the correlation coefficient is 0.7 or lower, it may be appropriate to fit a non-linear regression.

  • True
  • False

Question 3. When should we use Multiple Linear Regression?

  • When we would like to identify the strength of the effect that the independent variables have on a dependent variable.
  • When there are multiple dependent variables.

Module 3. Classification

Question 1. In K-Nearest Neighbors, which of the following is true?

  • A very high value of K (e.g., K = 100) produces an overly generalized model, while a very low value of K (e.g., K = 1) produces a highly complex model.
  • A very high value of K (e.g., K = 100) produces a model that is better than a very low value of K (e.g., K = 1).
  • A very high value of K (e.g., K = 100) produces a highly complex model, while a very low value of K (e.g., K = 1) produces an overly generalized model.
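The effect of K can be seen on a toy dataset. The following is a minimal sketch in plain Python, not the course's implementation: the function name `knn_predict` and the sample points are hypothetical, and ties between classes are not handled specially.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort the training points by Euclidean distance to the query and keep k.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical data: a "red" group near (1, 1) containing one stray "blue"
# point, and a larger "blue" group near (5, 5).
train = [
    ((1.0, 1.0), "red"), ((1.2, 0.9), "red"), ((0.9, 1.1), "red"),
    ((1.05, 1.0), "blue"),
    ((5.0, 5.0), "blue"), ((5.2, 5.1), "blue"),
    ((4.9, 5.0), "blue"), ((5.1, 4.8), "blue"),
]
query = (1.04, 1.0)

print(knn_predict(train, query, k=1))  # blue: follows the single stray point
print(knn_predict(train, query, k=3))  # red: the local majority
print(knn_predict(train, query, k=8))  # blue: the global majority class
```

With K = 1 the prediction follows one noisy neighbor, and with K equal to the dataset size it ignores the query's location entirely and returns the overall majority class.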

Question 2. A classifier with lower log loss has better accuracy.

  • True
  • False

Question 3. When building a decision tree, we want to split the nodes in a way that decreases entropy and increases information gain.

  • True
  • False

Module 4. Clustering

Question 1. Which one is NOT TRUE about k-means clustering?

  • K-means divides the data into non-overlapping clusters without any cluster internal structure.
  • The objective of k-means is to form clusters in such a way that similar samples go into a cluster and dissimilar samples fall into different clusters.
  • As k-means is an iterative algorithm, it guarantees that it will always converge to the global optimum.

Question 2. Customer segmentation is a supervised way of clustering data based on the similarity of customers to each other.

  • True
  • False

Question 3. How is a center point (centroid) picked for each cluster in k-means?

  • We can randomly choose some observations out of the dataset and use these observations as the initial means.
  • We can select the centroid through correlation analysis.
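Randomly choosing observations as the initial means, as the first option describes, is one common way to initialize k-means. A minimal sketch, assuming the data is a list of coordinate tuples; the function name `pick_initial_centroids` and the sample points are hypothetical:

```python
import random

def pick_initial_centroids(points, k, seed=None):
    """Choose k distinct observations from the dataset as initial centroids."""
    rng = random.Random(seed)  # a seed makes the random choice reproducible
    return rng.sample(points, k)

# Hypothetical 2-D observations.
points = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)]
centroids = pick_initial_centroids(points, k=2, seed=0)
print(centroids)
```

Because the starting centroids are random, different runs can converge to different local optima, which is why k-means is often run several times with different seeds.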

Module 5. Recommender System

Question 1. Collaborative filtering is based on relationships between products and people’s rating patterns.

  • True
  • False

Question 2. Which one is TRUE about content-based recommendation systems?

  • Content-based recommendation system tries to recommend items to the users based on their profile.
  • In content-based approach, the recommendation process is based on similarity of users.
  • In content-based recommender systems, similarity of users should be measured based on the similarity of the actions of users.

Question 3. Which one is correct about user-based and item-based collaborative filtering?

  • In the item-based approach, the recommendation is based on the profile of a user that shows interest in a specific item.
  • In the user-based approach, the recommendation is based on users of the same neighborhood, with whom he/she shares common preferences.

Machine Learning with Python Cognitive Class Final Exam Answers:-

Question 1. You can define Jaccard as the size of the intersection divided by the size of the union of two label sets.

  • True
  • False
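The definition in this question translates directly into set operations. A minimal sketch in plain Python; the function name `jaccard_index` is hypothetical, and returning 1.0 for two empty sets is an assumed convention rather than part of the definition:

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(union)

print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 0.5 (2 shared items, 4 in total)
```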

Question 2. When building a decision tree, we want to split the nodes in a way that increases entropy and decreases information gain.

  • True
  • False

Question 3. Which of the following statements are true? (Select all that apply.)

  • K needs to be initialized in K-Nearest Neighbor.
  • Supervised learning works on labelled data.
  • A high value of K in KNN creates a model that is over-fit.
  • KNN takes a bunch of unlabelled points and uses them to predict unknown points.
  • Unsupervised learning works on unlabelled data.

Question 4. To calculate a model’s accuracy using the test set, you pass the test set to your model to predict the class labels, and then compare the predicted values with actual values.

  • True
  • False
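The procedure described in this question can be sketched as follows, assuming the model's predictions are already available as a list. The function name `accuracy` and the label values are hypothetical:

```python
def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the actual label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)

y_test = [1, 0, 1, 1, 0]  # actual labels of the test set
y_hat = [1, 0, 0, 1, 0]   # labels predicted by a model
print(accuracy(y_test, y_hat))  # 0.8 (4 of 5 predictions match)
```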

Question 5. Which is the definition of entropy?

  • The purity of each node in a decision tree.
  • Information collected that can increase the level of certainty in a particular prediction.
  • The information that is used to randomly select a subset of data.
  • The amount of information disorder in the data.
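For a decision-tree node, entropy is usually computed as the Shannon entropy of the class labels in that node. A minimal sketch using base-2 logarithms; the function name `entropy` and the labels are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    counts = Counter(labels)
    # Sum p * log2(1/p) over the classes that are present.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(entropy(["yes", "yes", "no", "no"]))    # 1.0: an evenly mixed node
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0: a pure node
```

A pure node has zero entropy, and an evenly split two-class node has the maximum of one bit.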

Question 6. Which of the following is true about hierarchical linkages?

  • Average linkage is the average distance of each point in one cluster to every point in another cluster.
  • Complete linkage is the shortest distance between a point in two clusters.
  • Centroid linkage is the distance between two randomly generated centroids in two clusters.
  • Single linkage is the distance between any points in two clusters.
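For comparison with the options above, here is a minimal sketch of three standard linkage criteria computed from all pairwise distances between two clusters. The function names and sample clusters are hypothetical, and Euclidean distance is assumed:

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance from every point in one cluster to every point in the other."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))  # closest pair of points

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))  # farthest pair of points

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)  # mean over all pairs

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(a, b))    # 2.0
print(complete_linkage(a, b))  # 5.0
print(average_linkage(a, b))   # 3.5
```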

Question 7. The goal of regression is to build a model to accurately predict the continuous value of a dependent variable for an unknown case.

  • True
  • False

Question 8. Which of the following statements are true about linear regression? (Select all that apply)

  • With linear regression, you can fit a line through the data.
  • y = a + b·x1 is the equation for a straight line which can be used to predict the continuous value y.
  • In y = θ^T.X, θ is the feature set and X is the “weight vector” or “confidences of the equation”, with both of these terms used interchangeably.

Question 9. The sigmoid function is the main part of logistic regression, where the sigmoid of θ^T.X gives us the probability of a point belonging to a class, instead of the value of y directly.

  • True
  • False
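The computation described in this question can be sketched in plain Python. The function names `sigmoid` and `predict_proba` and the parameter values are hypothetical, and the branch on the sign of z is only there to avoid overflow for large negative inputs:

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe for negative z, where exp(-z) could overflow
    return e / (1.0 + e)

def predict_proba(theta, x):
    """Probability of the positive class: sigmoid of the dot product theta^T x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

print(predict_proba([0.0, 0.0], [1.0, 2.0]))  # 0.5: zero weights give no preference
```

A class label is then typically obtained by thresholding this probability, for example at 0.5.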

Question 10. In comparison to supervised learning, unsupervised learning has:

  • Fewer tests (evaluation approaches)
  • More models
  • A better, controlled environment
  • More tests (evaluation approaches), but fewer models

Question 11. The points that are classified by Density-Based Clustering and do not belong to any cluster are outliers.

  • True
  • False

Question 12. Which of the following is false about Simple Linear Regression?

  • It does not require tuning parameters.
  • It is highly interpretable.
  • It is fast.
  • It is used for finding outliers.

Question 13. Which one of the following statements is the most accurate?

  • Machine Learning is the branch of AI that covers the statistical and learning part of artificial intelligence.
  • Deep Learning is a branch of Artificial Intelligence where computers learn by being explicitly programmed.
  • Artificial Intelligence is a branch of Machine Learning that covers the statistical part of Deep Learning.
  • Artificial Intelligence is the branch of Deep Learning that allows us to create models.

Question 14. Which of the following are types of supervised learning?

  • Classification
  • Regression
  • KNN
  • K-Means
  • Clustering

Question 15. A bottom-up version of hierarchical clustering is known as divisive clustering. It is a more popular method than the Agglomerative method.

  • True
  • False

Question 16. Select all the true statements related to Hierarchical clustering and K-Means:

  • Hierarchical clustering does not require the number of clusters to be specified.
  • Hierarchical clustering always generates different clusters, whereas k-Means returns the same clusters each time it is run.
  • K-Means is more efficient than Hierarchical clustering for large datasets.

Question 17. What is a content-based recommendation system?

  • Content-based recommendation system tries to recommend items to the users based on their profile built upon their preferences and taste.
  • Content-based recommendation system tries to recommend items based on similarity among items.
  • Content-based recommendation system tries to recommend items based on the similarity of users when buying, watching, or enjoying something.

Question 18. Before running Agglomerative clustering, you need to compute a distance/proximity matrix, which is an n by n table of all distances between each data point in each cluster of your dataset.

  • True
  • False

Question 19. Which of the following statements are true about DBSCAN? (Select all that apply.)

  • DBSCAN can be used when examining spatial data.
  • DBSCAN can be applied to tasks with arbitrary shaped clusters, or clusters within clusters.
  • DBSCAN is a hierarchical algorithm that finds core and border points.
  • DBSCAN can find any arbitrary shaped cluster without getting affected by noise.

Question 20. In recommender systems, a “cold start” happens when you have a large dataset of users who have rated only a limited number of items.

  • True
  • False
