Q: Which of the following statements correctly describe key aspects of k-means? Select all that apply.
- The clustering process has four steps that repeat until the model disperses evenly.
- Poor clustering is caused by local minima, which means there is not an appropriate distance between clusters.
- K-means groups unlabeled data into k clusters based on their similarities.
- K-means organizes data by creating a logical scheme to make sense of it.
Q: A data professional chooses the number of centroids to use in a
k-means model and places them in the data space. Which step of the
model-creation process is the data professional working in?
- Step one
- Step two
- Step three
- Step four
Q: Fill in the blank: To evaluate the intracluster space in a
k-means model, a data professional uses the inertia metric. This is the _____
of the squared distances between each observation and its nearest centroid.
- ratio
- difference
- average
- sum
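
Note: the inertia definition in the question above can be checked by hand on a tiny dataset. The following is a minimal sketch in plain Python, not a library implementation; the function name `inertia` and the sample points and centroids are hypothetical and chosen only for illustration.

```python
def inertia(points, centroids):
    """Add up, over all observations, the squared Euclidean distance
    from the observation to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared distance from p to every centroid; keep the smallest.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
        )
    return total


# Hypothetical data: two tight pairs of points, one centroid per pair.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

print(inertia(points, centroids))  # 4.0
```

Each of the four points lies exactly 1 unit from its nearest centroid, so each contributes a squared distance of 1 to the total.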
Q: A data analyst creates a k-means model. They observe a silhouette
coefficient with a value of zero. What conclusion should they draw in
this scenario?
- The observation is on the boundary between clusters.
- The observation may be in the wrong cluster.
- The observation is suitably within its own cluster and well separated from other clusters.
- The observation is in an appropriate cluster.
Q: Which Python function fits a k-means model for multiple values of k
by calculating the inertia for each value, appending it to a list, and
returning that list?
- k-means inertia
- silhouette score
- labels
- cluster_image
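
Note: a helper of the kind described in the question above might look roughly like the sketch below. It is an assumption-laden illustration in plain Python rather than the course's or any library's code: `fit_kmeans`, `inertias_for_k`, the `seed` parameter, and the sample points are all hypothetical names and values, and the clustering routine is a bare-bones version of k-means with no safeguards against poor starting positions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_kmeans(points, k, seed=0, max_iter=100):
    """Hypothetical minimal k-means; returns (centroids, inertia)."""
    rng = random.Random(seed)
    # Place k starting centroids by sampling k of the observations.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign each observation to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned observations
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


def inertias_for_k(points, k_values, seed=0):
    """Fit one model per value of k and collect each model's inertia."""
    inertias = []
    for k in k_values:
        _, inertia = fit_kmeans(points, k, seed=seed)
        inertias.append(inertia)
    return inertias


# Hypothetical data: four distinct observations.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(inertias_for_k(points, [1, 2, 3, 4]))
```

For this data the first entry (k = 1) is 104.0 and the last entry (k = 4, one centroid per point) is 0.0; the values in between depend on the random starting centroids. The returned list is what an elbow plot would draw against the values of k.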
Q: Which of the following statements accurately describe the elbow
method? Select all that apply.
- With k-means models, the elbow method is used to find all similar values of k.
- The model that will provide the most meaningful clustering of data has inertia that is dropping significantly with added clusters.
- The elbow method helps data professionals decide which clustering gives the most meaningful model.
- The elbow method uses a line plot to visually compare the inertias of different models.
Q: Which of the following statements correctly describe key aspects of
k-means? Select all that apply.
- The value of k is a standard that never changes.
- K-means is an unsupervised partitioning algorithm.
- To avoid poor clustering, data professionals run a k-means model with different starting positions for the centroids.
- K-means clusters are defined by a central point, called a centroid.
Q: A junior data analyst building a k-means model recalculates the
centroid of each cluster. Which step of the model-creation process are they
working in?
- Step one
- Step two
- Step three
- Step four
Q: Which Python function would a data professional use to compare the
inertias of multiple k values?
- k-means inertia
- labels
- silhouette score
- cluster_image
Q: Which of the following statements accurately describe the elbow
method? Select all that apply.
- When using the elbow method, data professionals aim to find the smoothest part of the curve.
- The elbow method uses a line plot to visually compare the inertias of different models.
- There is not always an obvious elbow.
- The sharpest bend in the curve is usually the model that will provide the most meaningful clustering of data.
Q: A data analytics team building a k-means model assigns each data
point to its nearest centroid. Which step of the model-creation process are
they working in?
- Step one
- Step two
- Step three
- Step four
Q: Fill in the blank: In order to evaluate the _____ space in a k-means
model, a data professional uses the inertia metric. This is the sum of the
squared distances between each observation and its nearest centroid.
- intracluster
- midpoint
- converged
- intercluster
Q: Which of the following statements correctly describe key aspects of
k-means? Select all that apply.
- K-means is a supervised partitioning algorithm.
- K-means organizes unlabeled data into clusters.
- The position of the k-means centroid is the center of the cluster, also known as the mathematical mean.
- The k-means clustering process has four steps that repeat until the model converges.
Q: Fill in the blank: In order to evaluate the intracluster space in a
k-means model, a data professional uses the _____ metric. This is the sum of
the squared distances between each observation and its nearest centroid.
- spread
- inertia
- convergence
- silhouette score
Q: A junior data professional creates a k-means model. They observe a
silhouette coefficient with a value close to negative one. What
conclusion should they draw in this scenario?
- The observation is in the correct cluster.
- The observation is on the boundary between clusters.
- The observation is suitably within its own cluster and well separated from other clusters.
- The observation may be in the wrong cluster.
Q: True or false: When using k-means, the value of k is always the
same, no matter how many clusters are necessary for a project.
- True
- False
Q: What are the characteristics of an effective clustering model?
Select all that apply.
- The clusters are overlapping.
- The clusters are clearly identifiable.
- Within each cluster (intracluster), the points are close to each other.
- Between clusters (intercluster), there is lots of empty space.
Q: Fill in the blank: Silhouette score is the _____ of the silhouette
coefficients of all the observations in a model.
- value
- sum
- range
- mean
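
Note: the relationship between per-observation silhouette coefficients and the overall silhouette score can be sketched in plain Python. This is an illustration under stated assumptions, not a library implementation: it uses the standard formula (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the mean distance to the points of the nearest other cluster. It assumes at least two clusters and no duplicate points, and it assigns 0 to a point that is alone in its cluster. The function names and the sample data are hypothetical.

```python
import math


def silhouette_coefficients(points, labels):
    """Silhouette coefficient of every observation: (b - a) / max(a, b)."""
    coeffs = []
    cluster_ids = set(labels)
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster,
        # leaving p itself out of its own cluster.
        mean_dist = {}
        for c in cluster_ids:
            dists = [
                math.dist(p, q)
                for j, q in enumerate(points)
                if labels[j] == c and j != i
            ]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist:
            coeffs.append(0.0)  # assumption: a singleton cluster scores 0
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        coeffs.append((b - a) / max(a, b))
    return coeffs


def silhouette_score(points, labels):
    """Aggregate the per-observation coefficients into one score."""
    coeffs = silhouette_coefficients(points, labels)
    return sum(coeffs) / len(coeffs)


# Hypothetical one-dimensional data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]

# Sensible labels: every coefficient is close to +1.
print(silhouette_score(points, [0, 0, 1, 1]))

# Third point placed in the far cluster: its coefficient is negative.
print(silhouette_coefficients(points, [0, 0, 0, 1])[2])
```

With the sensible labels the score is roughly 0.9. In the second labeling, the point at 10.0 sits much closer to the other cluster than to its own, which is what a strongly negative coefficient indicates.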