Q: A junior data analyst uses tree-based learning for a sales and marketing
project. Currently, they are interested in the section of the tree that
represents where the first decision is made. What are they examining?
- Branches
- Leaves
- Roots
- Splits
Explanation: They are examining the root. The root node of a decision tree is the point where the first decision is made, and it is from this node that the tree branches out into its decision nodes and leaves.
Q: What are some disadvantages of decision trees? Select all that apply.
- Preparing data to train a decision tree is a complex process involving
significant preprocessing.
- Decision trees require assumptions regarding the distribution of
underlying data.
- Decision trees can be particularly susceptible to overfitting.
- When new data is introduced, decision trees can be less effective at
prediction.
Explanation: Susceptibility to overfitting, and the weaker predictions that result when new data is introduced, are genuine disadvantages of decision trees, although overfitting is not exclusive to them and applies to a wide variety of machine learning models. The statement about distributional assumptions does not apply: compared with models such as linear regression, which assumes linearity, homoscedasticity, and normality of residuals, decision trees make no significant assumptions about the distribution of the underlying data.
Q: Which section of a decision tree is where the final prediction is
made?
- Decision node
- Split
- Leaf node
- Root node
Explanation: The leaf node is the part of a decision tree where the final prediction is made.
Q: In a decision tree ensemble model, which hyperparameter controls how
many decision trees the model will build for its ensemble?
- max_features
- max_depth
- n_trees
- n_estimators
Explanation: In a decision tree ensemble model, the n_estimators hyperparameter determines how many decision trees the model will build for its ensemble.
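As a rough illustration (the model, data, and values here are toy examples, not part of the question), n_estimators might be set like this in scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data purely for illustration
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# n_estimators controls how many decision trees the ensemble builds
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
print(len(rf.estimators_))  # 100 fitted trees
```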
Q: What process uses different “folds” (portions) of the data to train
and evaluate a model across several iterations?
- Grid search
- Model validation
- Cross validation
- Proportional verification
Explanation: Cross-validation is the practice of using different "folds" (portions) of the data to train and evaluate a model across several iterations.
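For instance, a minimal scikit-learn sketch of 5-fold cross-validation (toy data, illustrative settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 5-fold cross-validation: each fold serves once as the evaluation set
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())
```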
Q: Which of the following statements correctly describe ensemble
learning? Select all that apply.
- When building an ensemble using different types of models, each should
be trained on completely different data.
- Predictions using an ensemble of models can be accurate even when the
individual models are barely more accurate than a random guess.
- Ensemble learning involves aggregating the outputs of multiple models to
make a final prediction.
- If a base learner’s prediction is only slightly better than a random
guess, it is called a “weak learner.”
Explanation: The first statement, "when building an ensemble using different types of models, each should be trained on completely different data," is not accurate. Ensemble learning allows models to be trained on the same data, on different subsets of the data, or on distinct features of the data; each model does not need to be trained on wholly separate data.
Q: Fill in the blank: A random forest is an ensemble of decision-tree
_____ that are trained on bootstrapped data.
- Statements
- Observations
- base learners
- variables
Explanation: A random forest is an ensemble of decision-tree base learners, each trained on a bootstrapped sample of the data.
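As an illustrative sketch (parameter values are arbitrary), the bootstrapping behind each tree can be made explicit in scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=1)

# Each tree (base learner) is trained on a bootstrap sample of the rows;
# max_samples here only makes the sampling visible, it is not a recommendation
rf = RandomForestClassifier(n_estimators=50, bootstrap=True, max_samples=0.8,
                            random_state=1)
rf.fit(X, y)
```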
Q: What are some benefits of boosting? Select all that apply.
- Boosting is the most interpretable model methodology.
- Boosting is a powerful predictive methodology.
- Boosting can handle both numeric and categorical features.
- Boosting does not require the data to be scaled.
Explanation: The statement that "boosting is the most interpretable model methodology" is not accurate: like other ensemble methods, boosting is generally less interpretable than simpler models. The statement that "boosting does not require the data to be scaled" also depends on the particular implementation and the kind of boosting algorithm used. Although tree-based boosting techniques (for example, gradient boosting) do not need scaling, other boosting algorithms can benefit from scaled data.
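As a minimal sketch of tree-based boosting on unscaled data (toy dataset, arbitrary hyperparameters; note that scikit-learn's implementation still expects categorical features to be encoded as numbers):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=6, random_state=2)

# Tree-based boosting: raw, unscaled numeric features are fine as-is
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=2)
gbm.fit(X, y)
print(gbm.score(X, y))
```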
Q: Which of the following statements correctly describe gradient
boosting? Select all that apply.
- Gradient boosting machines cannot perform classification tasks.
- Gradient boosting machines have many hyperparameters.
- Gradient boosting machines do not give coefficients or directionality
for their individual features.
- Gradient boosting machines are often called black-box models because
their predictions can be difficult to explain.
Explanation: Gradient boosting machines can perform classification tasks; in fact, they are used frequently for both regression and classification problems. Although they do not provide coefficients the way linear models do, gradient boosting machines can report feature importance scores, which reflect how relevant each feature is to the model's predictions. So, while they are considered relatively opaque, they do offer some insight into feature importance.
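For example, a small scikit-learn sketch of inspecting those importance scores (the data and feature names are made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=5, random_state=3)
gbm = GradientBoostingClassifier(random_state=3).fit(X, y)

# No coefficients or directionality, but relative importance scores are available
for name, score in zip([f"feature_{i}" for i in range(5)], gbm.feature_importances_):
    print(name, round(score, 3))
```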
Q: A data professional uses tree-based learning for an operations
project. Currently, they are interested in the nodes at which the trees split.
What type of nodes do they examine?
Explanation: In tree-based learning, the nodes at which the trees split are called decision nodes (also known as internal nodes). At each of these nodes, a decision is made based on the value of a feature; they mark the points in the tree where choices occur.
Q: What are some benefits of decision trees? Select all that apply.
- When working with decision trees, overfitting is unlikely.
- When preparing data to train a decision tree, very little preprocessing
is required.
- Decision trees enable data professionals to make predictions about
future events based on currently available information.
- Decision trees require no assumptions regarding the distribution of
underlying data.
Explanation: The statement "When working with decision trees, overfitting is unlikely" is not accurate; decision trees may overfit if they are not appropriately limited or pruned. The statement that "very little preprocessing is required when preparing data to train a decision tree" is also generally inaccurate: certain preprocessing, such as handling missing values, encoding categorical variables, or scaling numeric features, may be required depending on the data and the particular decision tree implementation.
Q: In a decision tree, what type(s) of nodes can decision nodes point
to? Select all that apply.
- Split
- Root node
- Leaf node
- Decision node
Explanation: Decision nodes can point to other decision nodes, which branch further based on decision criteria, or to leaf nodes, which are the terminal nodes where predictions are made.
Q: In a decision tree model, which hyperparameter sets the threshold
below which nodes become leaves?
- Min child weight
- Min samples tree
- Min samples split
- Min samples leaf
Explanation: The min_samples_leaf hyperparameter ("Min samples leaf" or min_samples_leaf, depending on the implementation) sets the threshold below which nodes become leaves in a decision tree model. It controls the minimum number of samples that must be present in a leaf node: a candidate split that would leave a child node with fewer samples than this threshold is not made, and the node becomes a leaf rather than being divided further.
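A minimal scikit-learn sketch (the value 10 is arbitrary, chosen only to show the parameter):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=4)

# min_samples_leaf: every leaf must contain at least this many samples,
# so candidate splits that would create smaller leaves are rejected
tree = DecisionTreeClassifier(min_samples_leaf=10, random_state=4)
tree.fit(X, y)
```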
Q: When might you use a separate validation dataset? Select all that
apply.
- If you have very little data.
- If you want to choose the specific samples used to validate the
model.
- If you have a very large amount of data.
- If you want to compare different model scores to choose a champion
before predicting on test holdout data.
Explanation: A separate validation dataset is most often used to analyze and compare the performance of several models on data that was not used for training or testing. This helps you pick the best-performing model (the champion), which can then be carried forward for further evaluation or deployment. A separate validation set is also appropriate when you want control over the specific samples used to validate the model.
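One common way to carve out such a set, sketched with scikit-learn (the split sizes are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=5)

# First hold out the test set, then split the remainder into train/validation
X_tr, X_test, y_tr, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
X_train, X_val, y_train, y_val = train_test_split(X_tr, y_tr, test_size=0.25,
                                                  random_state=5)

# Models are compared on (X_val, y_val); the champion is scored once on (X_test, y_test)
```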
Q: What tool is used to confirm that a model achieves its intended
purpose by systematically checking combinations of hyperparameters to identify
which set produces the best results, based on the selected metric?
- GridSearchCV
- Model validation
- Cross validation
- Hyperparameter verification
Explanation: GridSearchCV (grid search with cross-validation) is the tool that checks combinations of hyperparameters to determine which set produces the best results based on the selected metric. GridSearchCV performs an exhaustive search over the specified hyperparameter values of an estimator and evaluates each combination with cross-validation to find the best one.
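As a small sketch (the grid and scoring metric are placeholders, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=6)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

# Tries every combination in the grid, scoring each with 5-fold cross-validation
search = GridSearchCV(RandomForestClassifier(random_state=6), param_grid,
                      scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```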
Q: Which of the following statements correctly describe ensemble
learning? Select all that apply.
- If a base learner’s prediction is equally effective as a random guess,
it is a strong learner.
- It’s possible to use the same methodology for each contributing model,
as long as there are numerous base learners.
- Ensemble learning involves building multiple models.
- It’s possible to use very different methodologies for each contributing
model.
Explanation: Ensemble learning does involve building multiple models and combining their predictions to improve overall accuracy. It is also possible to use very different methodologies for each contributing model: ensemble learning allows a wide variety of models or techniques to be combined, which can lead to better generalization and robustness.
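For instance, a hedged sketch of mixing very different methodologies in one ensemble using scikit-learn's VotingClassifier (the chosen models are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=7)

# Three different methodologies contribute to one aggregated prediction
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=7)),
], voting="hard")
ensemble.fit(X, y)
```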
Q: Which of the following statements correctly describe gradient
boosting? Select all that apply.
- Gradient boosting machines build models in parallel.
- Gradient boosting machines tell you the coefficients for each feature.
- Gradient boosting machines work well with missing data.
- Gradient boosting machines do not require the data to be scaled.
Explanation: Unlike some other algorithms, gradient boosting machines can accommodate missing data by handling it directly in the tree-building process, which sets them apart from algorithms that require imputation. Gradient boosting machines also do not require the data to be scaled: tree-based implementations in particular, such as XGBoost or LightGBM, do not need the input features to be scaled, whereas some other techniques do require feature scaling.
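As one illustration (assuming a recent scikit-learn release where HistGradientBoostingClassifier is available directly; the injected NaNs are artificial):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=8)
X[::10, 0] = np.nan  # artificially introduce missing values in one feature

# Histogram-based gradient boosting routes NaN values natively while building trees;
# no imputation or feature scaling is required
model = HistGradientBoostingClassifier(random_state=8)
model.fit(X, y)
print(model.score(X, y))
```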
Q: Which of the following statements accurately describe decision
trees? Select all that apply.
- Decision trees are equally effective at predicting both existing and new
data.
- Decision trees work by sorting data.
- Decision trees require no assumptions regarding the distribution of
underlying data.
- Decision trees are susceptible to overfitting.
Explanation: Decision trees make no assumptions about the distribution of the underlying data, which is useful in a variety of circumstances. Without appropriate pruning or constraints, decision trees can overfit the training data.
Q: What is the only section of a decision tree that contains no
predecessors?
- Leaf node
- Root node
- Decision node
- Split based on what will provide the most predictive power.
Explanation: The root node is the only part of a decision tree that has no predecessors. It is the point at which the decision tree begins, so no nodes come before it.
Q: In a decision tree, nodes are where decisions are made, and they are
connected by edges.
Explanation: Nodes are the points in a decision tree where decisions are made based on feature values. Edges are the branches that connect the nodes according to the decision criteria.