Sunday, December 22, 2024

Spark MLlib Cognitive Class Exam Answers

Course Name: Spark MLlib

Module 1. Spark MLlib Data Types

Question 1. Sparse data generally contains many non-zero values and few zero values.

  • True
  • False
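The idea behind sparse storage can be sketched in plain Python (this is not the Spark API): only the positions and values of the non-zero entries are kept, together with the vector's overall size, which pays off when most entries are zero. The helper name to_sparse is hypothetical; in PySpark, the corresponding object is created with Vectors.sparse(size, indices, values) from pyspark.mllib.linalg.

```python
def to_sparse(dense):
    """Return (size, indices, values), keeping only the non-zero entries.

    Hypothetical helper illustrating sparse storage; not part of Spark.
    """
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


# A mostly-zero vector: sparse storage keeps 2 of the 6 entries.
size, indices, values = to_sparse([0.0, 3.0, 0.0, 0.0, 5.0, 0.0])
print(size, indices, values)  # 6 [1, 4] [3.0, 5.0]
```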

Question 2. Local matrices are generally stored in distributed systems and rarely on single machines.

  • True
  • False

Question 3. Which of the following are distributed matrices?

  • RowMatrix
  • ColumnMatrix
  • CoordinateMatrix
  • SphericalMatrix
  • RowMatrix and CoordinateMatrix
  • All of the Above
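For reference, the distributed matrix types provided in pyspark.mllib.linalg.distributed are RowMatrix, IndexedRowMatrix, CoordinateMatrix and BlockMatrix. A CoordinateMatrix stores one (row, column, value) entry per element, while a RowMatrix stores the data row by row. The single-machine sketch below, using a hypothetical coo_to_rows helper, shows how the same entries look in each layout; it does not run on a cluster.

```python
def coo_to_rows(entries, num_rows, num_cols):
    """Turn (row, col, value) triples into a list of dense rows.

    Hypothetical helper: contrasts the coordinate layout (one entry per
    stored element) with a row-oriented layout. Not part of Spark.
    """
    rows = [[0.0] * num_cols for _ in range(num_rows)]
    for i, j, value in entries:
        rows[i][j] = value
    return rows


entries = [(0, 0, 1.0), (1, 2, 2.0), (2, 1, 3.0)]
for row in coo_to_rows(entries, num_rows=3, num_cols=3):
    print(row)
```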

Module 2. Review Algorithms

Question 1. Logistic Regression is an algorithm used for predicting numerical values.

  • True
  • False
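Logistic regression is a classification method: it passes a linear score through the sigmoid function to get a probability between 0 and 1, then compares that probability with a threshold to choose one of two classes. The sketch below uses made-up weights rather than a trained model, and the sigmoid and predict helpers are hypothetical; 0.5 is used as the threshold because it is the common default for binary classification.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1 without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features, threshold=0.5):
    """Return (probability, class label) for one feature vector.

    Hypothetical helper; the weights and bias are illustrative, not trained.
    """
    score = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(score)
    return probability, 1 if probability >= threshold else 0


print(predict([2.0, -1.0], 0.0, [1.0, 1.0]))
```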

Question 2. The SVM algorithm maximizes the margins between the generated hyperplane and two clusters of data.

  • True
  • False
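In the standard hard-margin formulation, a linear SVM separates two classes with a hyperplane w.x + b = 0 and maximizes the distance 2 / ||w|| between the two supporting hyperplanes w.x + b = +1 and w.x + b = -1, so maximizing the margin is equivalent to minimizing ||w||. The short sketch below only evaluates that formula; it does not train an SVM, and margin_width is a hypothetical helper.

```python
import math


def margin_width(weights):
    """Distance 2 / ||w|| between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    norm = math.sqrt(sum(w * w for w in weights))
    return 2.0 / norm


print(margin_width([3.0, 4.0]))  # 0.4
```

A smaller weight vector (for example [0.3, 0.4]) gives a wider margin, which is why SVM training penalizes large weights.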

Question 3. Which of the following is true about Gaussian Mixture Clustering?

  • The closer a data point is to a particular centroid, the more likely that data point is to be clustered with that centroid.
  • The Gaussian of a centroid determines the probability that a data point is clustered with that centroid.
  • The probability of a data point being clustered with a centroid is a function of distance from the point to the centroid.
  • Gaussian Mixture Clustering uses multiple centroids to cluster data points.
  • All of the Above
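A Gaussian mixture gives each point a probability of membership in every component, computed from the component's weight and its Gaussian density at that point; points near a component's mean usually get a higher membership, although the spread of each Gaussian also matters. The 1-D sketch below illustrates that calculation with made-up parameters. In Spark MLlib the comparable per-component memberships come from GaussianMixtureModel.predictSoft; the gaussian_pdf and soft_assign functions here are hypothetical stand-ins, not that API.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))


def soft_assign(x, weights, means, stds):
    """Return the probability that x belongs to each Gaussian component.

    Each component's weighted density is normalized so the memberships sum to 1.
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [value / total for value in weighted]


# Two illustrative components; the mixture weights themselves sum to 1.
weights, means, stds = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
print(soft_assign(1.0, weights, means, stds))  # most membership goes to the first component
```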

Module 3. Spark MLlib Decision Trees and Random Forests

Question 1. Which of the following is a stopping parameter in a Decision Tree?

  • The number of nodes in the tree reaches a specific value.
  • The depth of the tree reaches a specific value.
  • The breadth of the tree reaches a specific value.
  • All of the Above

Question 2. When using a regression type of Decision Tree or Random Forest, the value for impurity can be measured as either ‘entropy’ or ‘variance’.

  • True
  • False
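In Spark MLlib's DecisionTree and RandomForest, classification trees use "gini" or "entropy" as the impurity measure, while regression trees use "variance". The plain-Python functions below are hypothetical helpers, not Spark code; they compute each measure so the difference between classification and regression impurity is easy to see.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def variance(values):
    """Population variance of numeric targets (the regression impurity)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


print(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]), variance([1.0, 3.0]))  # 0.5 1.0 1.0
```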

Question 3. In a Random Forest, featureSubsetStrategy is considered a stopping parameter, but not a tunable parameter.

  • True
  • False

Module 4. Spark MLlib Clustering

Question 1. In Spark MLlib, the initialization mode for the K-Means training method is called:

  • k-means--
  • k-means++
  • k-means||
  • k-means

Question 2. In K-Means, the “runs” parameter determines the number of data points allowed in each cluster.

  • True
  • False

Question 3. In Gaussian Mixture Clustering, the sum of all values outputted from the “weights” function must equal 1.

  • True
  • False

Spark MLlib Cognitive Class Final Exam Answers

Question 1. In Gaussian Mixture Clustering, the predictSoft function provides membership values from the top three Gaussians only.

  • True
  • False

Question 2. In Decision Trees, what is true about the size of a dataset?

  • Large datasets create “bins” on splits, which can be specified with the maxBins parameter.
  • Large datasets sort feature values, then use the ordered values as split calculations.
  • Small datasets create split candidates based on quantile calculations on a sample of the data.
  • Small datasets split on random values for the feature.

Question 3. A Logistic Regression algorithm is ineffective as a binary response predictor.

  • True
  • False

Question 4. What is the Row Pointer for a Matrix with the following Row Indices: [5, 1 | 6 | 2, 8, 10]?

  • [1, 6]
  • [0, 2, 3, 6]
  • [0, 2, 3, 5]
  • [2, 3]
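Spark stores local sparse matrices in compressed sparse column (CSC) format, SparseMatrix(numRows, numCols, colPtrs, rowIndices, values). The pointer array starts at 0, and each following entry is the running total of stored entries up to and including that column, so column k's row indices sit at positions pointer[k] through pointer[k+1] - 1. Reading the bars in the question as column boundaries (an assumption about the notation), the arithmetic can be sketched as follows; pointer_array is a hypothetical helper.

```python
from itertools import accumulate


def pointer_array(grouped_indices):
    """Build a CSC-style pointer array from per-column lists of row indices.

    The array starts at 0; entry k+1 is the number of stored entries in
    columns 0..k, so column k occupies positions pointer[k]:pointer[k+1].
    """
    return [0] + list(accumulate(len(col) for col in grouped_indices))


# The groups from the question: [5, 1 | 6 | 2, 8, 10]
print(pointer_array([[5, 1], [6], [2, 8, 10]]))  # [0, 2, 3, 6]
```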

Question 5. For multiclass classification, try to use (M-1) Decision Tree split candidates whenever possible.

  • True
  • False

Question 6. In a Decision Tree, choosing a very large maxDepth value can:

  • Increase accuracy
  • Increase the risk of overfitting to the training set
  • Increase the cost of training
  • All of the Above
  • Increase the risk of overfitting and increase the cost of training

Question 7. In Gaussian Mixture Clustering, a large value returned from the weights function represents a large precedence of that Gaussian.

  • True
  • False

Question 8. Increasing the value of epsilon when creating the K-Means Clustering model can:

  • Decrease training cost and decrease the number of iterations that the model undergoes
  • Decrease training cost and increase the number of iterations that the model undergoes
  • Increase training cost and decrease the number of iterations that the model undergoes
  • Increase training cost and increase the number of iterations that the model undergoes
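In Spark MLlib's KMeans.train, epsilon is the distance threshold used to decide whether the cluster centers have converged. The plain-Python Lloyd's-algorithm sketch below uses the same idea (stop once no center moves farther than epsilon) to show how a larger threshold ends training after fewer iterations, at the cost of centers that may not have fully settled. It is a hypothetical single-machine illustration with made-up data, not Spark's implementation.

```python
def kmeans_1d(points, centers, epsilon, max_iterations=100):
    """Lloyd's k-means on 1-D data; stop when every center moves <= epsilon.

    Returns (final_centers, iterations_run). Hypothetical helper, not Spark code.
    """
    centers = list(centers)
    for iteration in range(1, max_iterations + 1):
        # Assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [sum(c) / len(c) if c else old for c, old in zip(clusters, centers)]
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= epsilon:
            return centers, iteration
    return centers, max_iterations


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
for eps in (0.0, 6.0):
    centers, iterations = kmeans_1d(points, [1.0, 2.0], epsilon=eps)
    print(eps, centers, iterations)
```

With epsilon=0.0 the loop runs until the centers stop moving (three iterations for this data), while epsilon=6.0 stops after the first update, before the centers reach the two natural groups.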

Question 9. In order to train a machine learning model in Spark MLlib, the dataset must be in the form of a(n):

  • Python List
  • Textfile
  • CSV file
  • RDD

Question 10. What is true about Dense and Sparse Vectors?

  • A Dense Vector can be created using a csc_matrix, and a Sparse Vector can be created using a Python List.
  • A Dense Vector can be created using a SciPy csc_matrix, and a Sparse Vector can be created using a SciPy NumPy Array.
  • A Dense Vector can be created using a Python List, and a Sparse Vector can be created using a SciPy csc_matrix.
  • A Dense Vector can be created using a SciPy NumPy Array, and a Sparse Vector can be created using a Python List.

Question 11. In a Decision Tree, increasing the maxBins parameter allows for more split candidates.

  • True
  • False
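Spark's decision tree guide describes how, for large distributed datasets, MLlib computes approximate split candidates for each continuous feature from quantiles over a sampled fraction of the data, and the resulting ordered splits form "bins" whose maximum number is set by maxBins. The sketch below is a simplified single-machine version of that idea (exact quantiles over all values, no sampling); split_candidates is a hypothetical helper, not Spark code.

```python
def split_candidates(values, max_bins):
    """Return up to max_bins - 1 thresholds taken at evenly spaced quantiles.

    Simplified illustration: uses every value (no sampling) and removes
    duplicate thresholds, so repetitive data can yield fewer candidates.
    """
    ordered = sorted(values)
    if not ordered:
        return []
    n = len(ordered)
    positions = [b * n // max_bins for b in range(1, max_bins)]
    return sorted({ordered[p] for p in positions})


feature = [float(v) for v in range(100)]
print(split_candidates(feature, 4))   # 3 thresholds
print(split_candidates(feature, 10))  # 9 thresholds
```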

Question 12. In classification models, the value for the numClasses parameter does not depend on the data, and can change to increase model accuracy.

  • True
  • False

Question 13. What is true about Labeled Points?

  • A – A labeled point is used with supervised machine learning, and can be made using a dense local vector.
  • B – A labeled point is used with unsupervised machine learning, and can be made using a dense local vector.
  • C – A labeled point is used with supervised machine learning, and can be made using a sparse local vector.
  • D – A labeled point is used with unsupervised machine learning, and can be made using a sparse local vector.
  • All of the Above
  • A and C only

Question 14. In the Gaussian Mixture Clustering model, the convergenceTol value is a stopping parameter that can be tuned, similar to epsilon in k-means clustering.

  • True
  • False

Question 15. In Gaussian Mixture Clustering, the “Gaussians” function outputs the coordinates of the largest Gaussian, as well as the standard deviation for each Gaussian in the mixture.

  • True
  • False

Question 16. What is true about the maxDepth parameter for Random Forests?

  • A large maxDepth value is preferred since tree averaging yields a decrease in overall bias.
  • A large maxDepth value is preferred since tree averaging yields a decrease in overall variance.
  • A large maxDepth value is preferred since tree averaging yields an increase in overall bias.
  • A large maxDepth value is preferred since tree averaging yields an increase in overall variance.
