Q: Which of the following statements accurately describe feature
engineering? Select all that apply.
- Feature transformation involves selecting the features in the data that
contribute the most to predicting the response variable.
- Feature engineering involves selecting, transforming, or extracting
elements from within raw data.
- In feature engineering, a data professional may use their practical,
statistical, and data science knowledge.
- Feature extraction involves taking multiple features to create a new one
that will improve the accuracy of the algorithm.
Explanation: Feature engineering is the practice of selecting, transforming, or extracting features from raw data to improve a model's performance, and it draws on domain knowledge as well as statistical and data science skills. Feature transformation refers to changing the scale or distribution of features (for example, normalization or logarithmic transformation); it does not mean picking the most predictive features, which is feature selection. Feature extraction is a specialized technique that creates new features from existing ones, often via dimensionality reduction methods such as principal component analysis (PCA), and it is only one component of the broader practice of feature engineering.
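To make the three categories concrete, here is a minimal sketch in Python. The toy data and column names (income, tenure, churned) are illustrative assumptions, not an example from the course:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: two numeric features and a binary target.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=1, size=200),  # right-skewed feature
    "tenure": rng.integers(1, 60, size=200),
    "churned": rng.integers(0, 2, size=200),
})

# Feature transformation: change a feature's scale or distribution (log transform).
df["log_income"] = np.log(df["income"])

# Feature extraction: derive a new feature from existing ones (here, via PCA).
pca_feature = PCA(n_components=1).fit_transform(df[["log_income", "tenure"]])

# Feature selection: keep the features that contribute most to predicting the target.
selector = SelectKBest(score_func=f_classif, k=1)
best_feature = selector.fit_transform(df[["log_income", "tenure"]], df["churned"])
```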
Q: A data professional resolves a class imbalance in a very large
dataset. They alter the majority class by using fewer of the original data
points in order to produce a split that is more even. What does this scenario
describe?
- Upsampling
- Merging
- Downsampling
- Smoothing
Explanation: Downsampling describes a data professional adjusting the majority class by using fewer of the original data points to correct an uneven class distribution. Reducing the number of majority-class instances produces a more even split across the classes, which is typically done to improve the performance of machine learning models that would otherwise be skewed toward the majority class when the imbalance is severe.
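A minimal sketch of downsampling with pandas, assuming a hypothetical DataFrame with a 90/10 split on a target column:

```python
import pandas as pd

# Hypothetical imbalanced data: 90 rows of class 0 and 10 rows of class 1.
df = pd.DataFrame({"target": [0] * 90 + [1] * 10, "feature": range(100)})

majority = df[df["target"] == 0]
minority = df[df["target"] == 1]

# Downsample the majority class to the size of the minority class.
majority_downsampled = majority.sample(n=len(minority), random_state=42)

balanced = pd.concat([majority_downsampled, minority])
print(balanced["target"].value_counts())  # 10 rows of each class: an even split
```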
Q: Fill in the blank: Customer churn is the business term that describes
how many customers stop _____ and at what rate this occurs.
- researching a company’s offerings
- using a product or service
- sharing feedback with a company
- reviewing items online
Explanation: In business, the term "customer churn" refers to how many customers stop using a product or service and the rate at which this occurs.
Q: Naive Bayes is a supervised classification technique that assumes
independence among predictors. What is the meaning of this concept?
- The value of a predictor variable on a given class is dependent upon the
values of other predictors.
- The value of a predictor variable on a given class is measured by the
values of other predictors.
- The value of a predictor variable on a given class is equal to the
values of other predictors.
- The value of a predictor variable on a given class is not affected by
the values of other predictors.
Explanation: In Naive Bayes classification, the assumption of independence among predictors (features) means that, given the class, each predictor contributes to the probability of that class independently of the other predictors. This simplifying assumption lets Naive Bayes calculate probabilities efficiently and make predictions based on the presence or absence of individual features.
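As a rough illustration with scikit-learn's GaussianNB and hypothetical toy data, the model treats each feature's contribution to a class as independent of the other features once the class is given:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two features and binary class labels.
X = np.array([[1.0, 20.0], [1.2, 22.0], [3.5, 80.0], [3.7, 85.0]])
y = np.array([0, 0, 1, 1])

model = GaussianNB().fit(X, y)

# Because of the independence assumption, the class-conditional likelihood
# factorizes into one distribution per feature per class.
print(model.predict([[1.1, 21.0]]))        # predicted class for a new point
print(model.predict_proba([[3.6, 82.0]]))  # posterior probability for each class
```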
Q: Fill in the blank: When using a scaler to _____ the columns in a
dataset using MinMaxScaler, a data professional must fit the scaler to the
training data and transform both the training data and the test data using that
same scaler.
- customize
- filter
- sort
- normalize
Explanation: To normalize the columns in a dataset using MinMaxScaler, a data professional must first fit the scaler to the training data and then transform both the training data and the test data using that same scaler; fitting on the training data alone prevents information from the test set from leaking into the model. Normalization with MinMaxScaler puts all features on the same scale, normally between 0 and 1, which many machine learning algorithms need in order to perform well.
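A minimal sketch of this fit-on-train, transform-both pattern with scikit-learn's MinMaxScaler, using hypothetical arrays:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix split into training and test sets.
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same fitted scaler on test data

print(X_train_scaled.ravel())  # [0.   0.5  1. ]
print(X_test_scaled.ravel())   # [0.25 1.5 ] -- test values can fall outside 0-1
```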
Q: A data professional evaluates a model’s performance and considers how
it can be improved. Which PACE stage does this scenario describe?
- Analyze
- Plan
- Construct
- Execute
Explanation: The scenario described corresponds to the Execute stage of the PACE framework, in which a data professional evaluates the model's performance, shares results, and considers ways in which the model can be improved.
Q: In the model-development process, which type of feature is useful by
itself because it contains information that will be useful when forecasting the
target?
- Redundant
- Irrelevant
- Predictive
- Interactive
Explanation: A predictive feature is useful by itself because it contains information that directly contributes to forecasting the target variable.
Q: Fill in the blank: Log normalization is useful when working with a
model that cannot manage continuous variables with _____ distributions.
- Binomial
- probability
- normal
- skewed
Explanation: Log normalization, also known as log transformation, is commonly applied to continuous variables with skewed distributions, such as right-skewed (positively skewed) data. Taking the logarithm of these variables makes their distribution more symmetrical, which better satisfies the requirements of models, such as linear regression and logistic regression, that cannot manage continuous variables with skewed distributions.
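A minimal sketch with hypothetical right-skewed data, showing how the log transform reduces skewness (scipy.stats.skew measures the asymmetry):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
values = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # right-skewed sample

log_values = np.log(values)  # log transform (use np.log1p if zeros are possible)

print(round(skew(values), 2))      # strongly positive skew for the raw data
print(round(skew(log_values), 2))  # close to 0 after the log transform
```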
Q: A data professional discovers that the dataset they are working with
contains a class imbalance. The majority class comprises 90% of the data and
the minority class comprises 10% of the data. Which of the following statements
best describes the impact of this class imbalance?
- Major issues should not arise if the majority class makes up 10% or less
of the dataset.
- Major issues should not arise because the data has a 50-50 split of
outcomes.
- Major issues will arise if the data professional decides to rebalance
the dataset.
- Major issues will arise because the majority class makes up 90% or more
of the dataset.
Explanation: When the majority class accounts for 90% of the dataset and the minority class for only 10%, significant problems can arise in the model's performance and interpretation. This calls for careful consideration and possibly techniques to address the imbalance, such as resampling methods (for example, oversampling the minority class or undersampling the majority class) and evaluation metrics that remain informative under imbalance (for example, precision, recall, and F1-score).
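A minimal sketch with hypothetical labels, showing why accuracy alone is misleading under a 90/10 split and why precision, recall, and F1-score are more informative:

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground truth with a 90/10 class split.
y_true = np.array([0] * 90 + [1] * 10)

# A "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.9, despite never detecting the minority class
print(classification_report(y_true, y_pred, zero_division=0))
# Recall and F1 for class 1 are 0.0, exposing the problem that accuracy hides.
```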
Q: Fill in the blank: Customer churn is a business term that describes
how many customers stop _____ and at what rate this occurs.
- writing positive reviews about a company
- doing business with a company
- returning items to a company
- contacting a company’s customer relations department
Explanation: In business, the term "customer churn" refers to how many customers stop doing business with a company and the rate at which this occurs.
Q: What does Bayes’s theorem enable data professionals to calculate?
- Data accuracy
- Posterior probability
- Causation
- Margin of error
Explanation: Bayes' theorem, a fundamental theorem in probability theory named after the mathematician Thomas Bayes, describes the probability of an event based on prior knowledge of conditions that might be related to it. It enables data professionals to calculate a posterior probability: the prior probability of an event updated in light of new data (the likelihood).
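A minimal numeric sketch of Bayes' theorem, using made-up values for the prior, the likelihood, and the false-positive rate:

```python
# Hypothetical values: P(event), P(evidence | event), P(evidence | no event).
prior = 0.01           # P(event)
likelihood = 0.95      # P(evidence | event)
false_positive = 0.10  # P(evidence | no event)

# Total probability of observing the evidence.
evidence = likelihood * prior + false_positive * (1 - prior)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = likelihood * prior / evidence
print(round(posterior, 3))  # about 0.088 -- the prior updated by the new evidence
```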
Q: Fill in the blank: When normalizing the columns in a dataset using
MinMaxScaler, the columns’ maximum value scales to one, and the minimum value
scales to _____. Everything else falls somewhere in between.
Explanation: When normalizing the columns in a dataset using MinMaxScaler, each column's maximum value scales to one and its minimum value scales to zero. Every other value falls somewhere in between, which ensures that all features share the same scale between 0 and 1.
Q: In the model-development process, which type of feature is not
useful by itself for predicting the target variable, but becomes predictive in
conjunction with other features?
- Predictive
- Irrelevant
- Redundant
- Interactive
Explanation: Interactive features are not useful on their own, but they increase the model's predictive power when examined in conjunction with other features. They often arise from combining or modifying variables that individually carry little predictive value but together contribute considerably to predicting the target.
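A minimal sketch of building an interaction feature with pandas and scikit-learn, using hypothetical columns (ad_spend and email_open_rate) purely for illustration:

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical marketing data.
df = pd.DataFrame({
    "ad_spend": [100, 200, 300, 400],
    "email_open_rate": [0.10, 0.50, 0.20, 0.60],
})

# Interaction feature: the product of two existing features.
df["spend_x_open_rate"] = df["ad_spend"] * df["email_open_rate"]

# scikit-learn can also generate interaction terms automatically.
interactions = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_with_interactions = interactions.fit_transform(df[["ad_spend", "email_open_rate"]])
```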
Q: Bayes’s theorem enables data professionals to calculate
posterior probability for a data project. What does posterior probability
describe?
- The likelihood of an event occurring after taking into consideration all
new, relevant observations and information
- The likelihood of an event occurring after taking into consideration
only the most suitable observations and information
- The likelihood of an event occurring based upon only observations and
information that align with current hypotheses
- The likelihood of an event occurring based upon the observations and
information that were available at the start of the data project
Explanation: Posterior probability is the likelihood of an event occurring after all new, relevant observations and information have been taken into account. Under Bayes' theorem, it combines the prior probability of the hypothesis with the likelihood of the observed data given that hypothesis. These updated probabilities are central to Bayesian inference and to decision-making processes in statistics and data science.
Q: A data professional assesses a business need in order to determine
what type of model is best suited to a project. Which PACE stage does this
scenario describe?
- Analyze
- Construct
- Execute
- Plan
Explanation: This scenario describes the Plan stage of the PACE framework, in which a data professional determines the nature of the business problem, gathers requirements, and devises a strategy to meet the company's needs. Examples include understanding the project's goals, defining its scope, and identifying the modeling approaches or algorithms best suited to the project.
Q: Fill in the blank: Log normalization involves taking the log of a
_____ feature and making the data more effective for modeling.
- Skewed
- continuous
- normal
- probable
Explanation: Log normalization, also known as log transformation, is commonly applied to features with skewed distributions (such as right-skewed, or positively skewed, data). Taking the logarithm of a skewed feature makes its distribution more symmetrical, which better matches the requirements of modeling approaches such as linear regression and logistic regression and can improve both model performance and interpretability.
Q: Fill in the blank: Log normalization involves reducing _____ in
order to make the data more effective for modeling.
- Probability
- skew
- continuity
- normality
Explanation: Log normalization, also known as log transformation, reduces the skew of data that follows a skewed distribution. Skewness describes the asymmetry of the probability distribution of a real-valued random variable. Taking the logarithm of skewed data makes its distribution more symmetrical, which benefits modeling approaches that assume normality or require normally distributed data for optimal performance.
Q: In the model-development process, which type of feature does not
contain any useful information for predicting the target variable?
- Predictive
- Irrelevant
- Conducive
- Relevant
Explanation: Irrelevant features do not contain any useful information for predicting the target variable, and they can introduce noise into the model. Identifying and removing them during feature selection is essential for improving the model's performance.
Q: Which of the following statements accurately describe feature
engineering? Select all that apply.
- Feature engineering does not involve using a data professional’s
statistical knowledge.
- Feature engineering may involve transforming the properties of raw
data.
- In feature engineering, feature selection involves choosing the features
in the data that contribute the most to predicting the response variable.
- In feature engineering, feature extraction involves taking multiple
features to create a new one that will improve the accuracy of the
algorithm.
Explanation: Feature engineering is the process of selecting, transforming, or creating features from raw data to enhance a model's performance, and it may involve transforming the properties of raw data using a data professional's practical, statistical, and data science knowledge. Within feature engineering, feature selection means choosing the features that contribute the most to predicting the response variable, and feature extraction means combining existing features to create a new one that will improve the accuracy of the algorithm.
Q: Which of the following statements accurately describe the general
categories of feature engineering? Select all that apply.
- Feature selection involves taking multiple features to create a new one
that will improve the accuracy of the algorithm.
- Feature extraction involves choosing the features in the data that
contribute the most to predicting the response variable.
- Feature transformation involves modifying existing features in a way
that improves accuracy when training a model.
- The three general categories of feature engineering are selection,
extraction, and transformation.
Explanation: The three general categories of feature engineering are selection, extraction, and transformation. Feature transformation involves modifying or scaling existing features to make them better suited for modeling, for example through normalization, standardization, or mathematical adjustments such as a log transformation, to improve the distribution of a feature and its relationship with the target variable. The first two statements swap the definitions: feature selection means choosing the features that contribute the most to predicting the response variable, while feature extraction means creating a new feature from multiple existing ones.
Q: A data professional works with a dataset for a project with their
company’s human resources team. They discover that the dataset has a predictor
variable that contains more instances of one outcome than another. What will
occur as a result of this scenario?
- Class imbalance
- Inconsistent data
- Incompatibility
- Redundancy
Explanation: Class imbalance occurs when one class (outcome) of a variable is considerably more common than another. This can affect the performance of machine learning models, particularly those that are sensitive to the class distribution. Common problems arising from class imbalance include biased models that favor the majority class, decreased predictive performance for the minority class, and misleading evaluation metrics such as accuracy.
Q: A data professional examines a dataset to reveal key details about
the data that will help inform the plans for building a model. Which PACE stage
does this scenario describe?
- Execute
- Plan
- Construct
- Analyze
Explanation: This scenario corresponds to the Analyze stage of the PACE framework, in which a data professional examines a dataset to reveal key details that will help inform the plans for building a model.
Q: Fill in the blank: When normalizing the columns in a dataset using
MinMaxScaler, the columns’ maximum value scales to _____, and the minimum value
scales to zero. Everything else falls somewhere in between.
Explanation: When normalizing the columns in a dataset, MinMaxScaler scales each column's maximum value to one and its minimum value to zero, with the remaining values falling somewhere in between. This ensures that all features are scaled to the same range between 0 and 1.
Q: Fill in the blank: Customer _____ is the business term that
describes how many customers stop using a product or service, or stop doing
business with a company altogether, and at what rate this occurs.
- Churn
- exchange
- retention
- transfer
Explanation: In business, the term "customer churn" refers to how many customers stop using a product or service, or stop doing business with a company altogether, and the rate at which this occurs.
Q: Fill in the blank: Naive Bayes is a supervised classification
technique that is based on Bayes’ Theorem, with an assumption of _____ among
predictors.
- Interdependence
- even distribution
- clear hierarchy
- independence
Explanation: Naive Bayes is a supervised classification technique based on Bayes' Theorem that assumes the predictors are independent of one another.