Q: Which of the following statements accurately describe feature engineering? Select all that apply.
- Feature transformation involves selecting the features in the data that contribute the most to predicting the response variable.
- Feature engineering involves selecting, transforming, or extracting elements from within raw data.
- In feature engineering, a data professional may use their practical, statistical, and data science knowledge.
- Feature extraction involves taking multiple features to create a new one that will improve the accuracy of the algorithm.
Q: A data professional resolves a class imbalance in a very large
dataset. They alter the majority class by using fewer of the original data
points in order to produce a split that is more even. What does this scenario
describe?
- Upsampling
- Merging
- Downsampling
- Smoothing
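
The scenario above (using fewer majority-class points to even out the split) can be sketched in a few lines. This is a minimal illustration, assuming a binary label and rows stored as dictionaries; the function name `downsample_majority`, the `label` key, and the toy data are hypothetical and not taken from the course.

```python
import random
from collections import Counter


def downsample_majority(rows, label_key="label", seed=0):
    """Randomly keep only as many majority-class rows as there are minority rows.

    Assumes exactly two classes are present in `rows`.
    """
    counts = Counter(row[label_key] for row in rows)
    (majority, _), (minority, minority_count) = counts.most_common(2)

    majority_rows = [r for r in rows if r[label_key] == majority]
    minority_rows = [r for r in rows if r[label_key] == minority]

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept_majority = rng.sample(majority_rows, minority_count)

    balanced = kept_majority + minority_rows
    rng.shuffle(balanced)
    return balanced


# Toy data: 90 rows of class 0 and 10 rows of class 1.
rows = [{"id": i, "label": 0} for i in range(90)]
rows += [{"id": i, "label": 1} for i in range(90, 100)]

balanced = downsample_majority(rows)
balanced_counts = Counter(r["label"] for r in balanced)
```

The minority rows are left untouched; only the majority class is reduced, which is why this approach is usually reserved for datasets large enough to spare the discarded rows.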
Q: Fill in the blank: Customer churn is the business term that describes
how many customers stop _____ and at what rate this occurs.
- researching a company’s offerings
- using a product or service
- sharing feedback with a company
- reviewing items online
Q: Naive Bayes is a supervised classification technique that assumes
independence among predictors. What is the meaning of this concept?
- The value of a predictor variable on a given class is dependent upon the values of other predictors.
- The value of a predictor variable on a given class is measured by the values of other predictors.
- The value of a predictor variable on a given class is equal to the values of other predictors.
- The value of a predictor variable on a given class is not affected by the values of other predictors.
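
To make the independence assumption concrete, here is a rough sketch of how a Naive Bayes score can be built from per-feature counts. It assumes categorical features and applies no smoothing; the helper names `train_naive_bayes` and `score` and the weather-style toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and, per (class, feature position), the feature values seen."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts


def score(row, label, class_counts, feature_counts):
    """Unnormalized posterior: P(class) times the product of P(feature | class)."""
    total = sum(class_counts.values())
    result = class_counts[label] / total  # prior P(class)
    for i, value in enumerate(row):
        # Independence assumption: each factor depends only on the class,
        # never on the values of the other features in the row.
        result *= feature_counts[(label, i)][value] / class_counts[label]
    return result


rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
    ("rainy", "hot"), ("sunny", "hot"), ("rainy", "mild"),
]
labels = ["no", "yes", "yes", "no", "no", "yes"]

class_counts, feature_counts = train_naive_bayes(rows, labels)
score_yes = score(("sunny", "mild"), "yes", class_counts, feature_counts)
score_no = score(("sunny", "mild"), "no", class_counts, feature_counts)
```

Because no smoothing is applied in this sketch, a feature value never seen with a class drives that class's score to zero; practical implementations typically add a smoothing term.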
Q: Fill in the blank: When using a scaler to _____ the columns in a
dataset using MinMaxScaler, a data professional must fit the scaler to the
training data and transform both the training data and the test data using that
same scaler.
- customize
- filter
- sort
- normalize
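
The fit-on-train, transform-both workflow described in the question can be sketched without any library. The class below is a hand-rolled stand-in for a single numeric column, not scikit-learn's `MinMaxScaler` itself; the name `SimpleMinMaxScaler` and the sample values are assumptions made for illustration.

```python
class SimpleMinMaxScaler:
    """Hypothetical min-max scaler for one numeric column."""

    def fit(self, values):
        # Learn the range from the data passed in (the training data only).
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        if span == 0:
            # Constant column: map everything to 0 to avoid dividing by zero.
            return [0.0 for _ in values]
        return [(v - self.min_) / span for v in values]


train = [10.0, 20.0, 30.0]
test = [15.0, 40.0]

scaler = SimpleMinMaxScaler().fit(train)  # fit on the training data only
train_scaled = scaler.transform(train)    # training min -> 0, training max -> 1
test_scaled = scaler.transform(test)      # reuse the same learned min and max
```

Since the range is learned from the training data alone, a test value outside that range (40.0 here) lands outside the 0 to 1 interval in this sketch. Fitting a second scaler on the test data would hide that and leak test information into preprocessing.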
Q: A data professional evaluates a model’s performance and considers how
it can be improved. Which PACE stage does this scenario describe?
- Analyze
- Plan
- Construct
- Execute
Q: In the model-development process, which type of feature is useful by
itself because it contains information that will be useful when forecasting the
target?
- Redundant
- Irrelevant
- Predictive
- Interactive
Q: Fill in the blank: Log normalization is useful when working with a
model that cannot manage continuous variables with _____ distributions.
- binomial
- probability
- normal
- skewed
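
As a small illustration of what the log transform does to a long-tailed feature, the sketch below compares a moment-based skewness measure before and after taking the natural log. The `skewness` helper and the `incomes` values are made up for this example, and the log requires strictly positive values.

```python
import math


def skewness(values):
    """Population skewness: third central moment over the variance to the 3/2 power."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed feature: most values are small, a few are very large.
incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]

# The log compresses large values far more than small ones.
log_incomes = [math.log(v) for v in incomes]

skew_before = skewness(incomes)
skew_after = skewness(log_incomes)
```

For this sample, `skew_after` is smaller than `skew_before`: the transform pulls the long right tail in toward the rest of the data.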
Q: A data professional discovers that the dataset they are working with
contains a class imbalance. The majority class comprises 90% of the data and
the minority class comprises 10% of the data. Which of the following statements
best describes the impact of this class imbalance?
- Major issues should not arise if the majority class makes up 10% or less of the dataset.
- Major issues should not arise because the data has a 50-50 split of outcomes.
- Major issues will arise if the data professional decides to rebalance the dataset.
- Major issues will arise because the majority class makes up 90% or more of the dataset.
Q: Fill in the blank: Customer churn is a business term that describes
how many customers stop _____ and at what rate this occurs.
- writing positive reviews about a company
- doing business with a company
- returning items to a company
- contacting a company’s customer relations department
Q: What does Bayes’s theorem enable data professionals to calculate?
- Data accuracy
- Posterior probability
- Causation
- Margin of error
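
A short worked example may help here. The function below applies Bayes's theorem to a binary hypothesis; the parameter names and the numbers (a 1% prior, a 90% likelihood of the evidence when the hypothesis is true, and a 5% false positive rate) are illustrative assumptions.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given evidence E.

    prior               -- P(H)
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Total probability of observing the evidence under either outcome.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
```

With these inputs the result is 0.009 / 0.0585, roughly 0.154: even strong evidence leaves the updated probability modest when the prior is small.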
Q: Fill in the blank: When normalizing the columns in a dataset using
MinMaxScaler, the columns’ maximum value scales to one, and the minimum value
scales to _____. Everything else falls somewhere in between.
- 0.5
- -1
- 0.1
- 0
Q: In the model-development process, which type of feature is not
useful by itself for predicting the target variable, but becomes predictive in
conjunction with other features?
- Predictive
- Irrelevant
- Redundant
- Interactive
Q: Bayes’s theorem enables data professionals to calculate
posterior probability for a data project. What does posterior probability
describe?
- The likelihood of an event occurring after taking into consideration all new, relevant observations and information
- The likelihood of an event occurring after taking into consideration only the most suitable observations and information
- The likelihood of an event occurring based upon only observations and information that align with current hypotheses
- The likelihood of an event occurring based upon the observations and information that were available at the start of the data project
Q: A data professional assesses a business need in order to determine
what type of model is best suited to a project. Which PACE stage does this
scenario describe?
- Analyze
- Construct
- Execute
- Plan
Q: Fill in the blank: Log normalization involves taking the log of a
_____ feature and making the data more effective for modeling.
- skewed
- continuous
- normal
- probable
Q: Fill in the blank: Log normalization involves reducing _____ in
order to make the data more effective for modeling.
- probability
- skew
- continuity
- normality
Q: In the model-development process, which type of feature does not
contain any useful information for predicting the target variable?
- Predictive
- Irrelevant
- Conducive
- Relevant
Q: Which of the following statements accurately describe feature
engineering? Select all that apply.
- Feature engineering does not involve using a data professional’s statistical knowledge.
- Feature engineering may involve transforming the properties of raw data.
- In feature engineering, feature selection involves choosing the features in the data that contribute the most to predicting the response variable.
- In feature engineering, feature extraction involves taking multiple features to create a new one that will improve the accuracy of the algorithm.
Q: Which of the following statements accurately describe the general
categories of feature engineering? Select all that apply.
- Feature selection involves taking multiple features to create a new one that will improve the accuracy of the algorithm.
- Feature extraction involves choosing the features in the data that contribute the most to predicting the response variable.
- Feature transformation involves modifying existing features in a way that improves accuracy when training a model.
- The three general categories of feature engineering are selection, extraction, and transformation.
Q: A data professional works with a dataset for a project with their
company’s human resources team. They discover that the dataset has a predictor
variable that contains more instances of one outcome than another. What will
occur as a result of this scenario?
- Class imbalance
- Inconsistent data
- Incompatibility
- Redundancy
Q: A data professional examines a dataset to reveal key details about
the data that will help inform the plans for building a model. Which PACE stage
does this scenario describe?
- Execute
- Plan
- Construct
- Analyze
Q: Fill in the blank: When normalizing the columns in a dataset using
MinMaxScaler, the columns’ maximum value scales to _____, and the minimum value
scales to zero. Everything else falls somewhere in between.
- 10
- 0.5
- 100
- 1
Q: Fill in the blank: Customer _____ is the business term that
describes how many customers stop using a product or service, or stop doing
business with a company altogether, and at what rate this occurs.
- churn
- exchange
- retention
- transfer
Q: Fill in the blank: Naive Bayes is a supervised classification
technique that is based on Bayes’s theorem, with an assumption of _____ among
predictors.
- interdependence
- even distribution
- clear hierarchy
- independence