Module 1: Introduction to Statistics

Q: A community college wants to improve student engagement with their new class schedule. They send a text alert to all students with a link to the same webpage. But half of the students get a text with information about the professors, and half get a text with information about newly available class times. What does this scenario describe?

Time series analysis
Hypothesis testing
Regression analysis
A/B testing

Explanation: This scenario describes A/B testing. Different groups of students receive different versions of the community college's text alert: one version is about the instructors, and the other is about the class times. The college then analyzes the results to determine which version generates more engagement (for example, more clicks on the link or more interaction with the site).
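A/B testing depends on splitting the audience at random, so that the two groups differ only in which version of the message they receive. The following is a minimal sketch of that assignment step, assuming students are identified by integer IDs; the helper name split_into_groups and the variant labels are hypothetical, not part of any specific tool.

```python
import random

def split_into_groups(student_ids, seed=0):
    """Hypothetical helper: randomly assign each student to variant A or B, half and half."""
    ids = list(student_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)           # random order removes any systematic pattern in the IDs
    half = len(ids) // 2
    return {"A_professors": ids[:half], "B_class_times": ids[half:]}

groups = split_into_groups(range(1, 11))
print(groups)
```

Shuffling and then cutting the list in half guarantees equal group sizes; flipping a coin per student would also be random but could produce unequal groups.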

Q: Which of the following statements correctly describe key elements of inferential statistics? Select all that apply.

Sample size has minimal impact on the validity of test results.
A statistical population may refer to people, objects, or events.
Data professionals use inferential statistics to predict behaviors.
A sample is a subset of the larger population.

Explanation: In inferential statistics, a population is the complete group of people, objects, or events that you want to study or draw conclusions about. A sample is a subset of that population, ideally one that is representative of it. Data professionals use inferential statistics to make predictions, such as predicting behaviors, or to draw conclusions about a population based on data collected from a sample. The first statement is false: sample size has a major effect on the validity of test results.
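To make the population/sample relationship concrete, this sketch builds a hypothetical population of simulated values, draws a sample from it, and uses the sample mean to estimate the population mean. The data are randomly generated for illustration only; in real work the population value is usually unknown, which is why the sample is used.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: simulated weekly study hours for 10,000 students.
population = [rng.gauss(10, 3) for _ in range(10_000)]

# A sample is a subset of the population, drawn here without replacement.
sample = rng.sample(population, 200)

parameter = statistics.mean(population)  # population mean (usually unknown in practice)
statistic = statistics.mean(sample)      # sample mean, used to estimate the parameter

print(f"population mean: {parameter:.2f}")
print(f"sample mean:     {statistic:.2f}")
```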

Q: A data team at a high-tech manufacturer wants to better understand customer purchases of webcams over the past five years. Their dataset contains about 3.5 million rows of data about different customers and webcam products. The data team uses summary statistics to better understand the data. What does this scenario describe?

Inferential statistics
Statistical significance
Confidence intervals
Descriptive statistics

Explanation: Descriptive statistics summarize and describe the main features of a dataset. They include measures such as the mean, median, mode, range, and standard deviation, along with other summary metrics that provide insight into the data.

Q: Fill in the blank: A _____ is a characteristic of a population.

sample
parameter
measure
range

Explanation: In statistics, a parameter is a numerical summary of a population. It describes a feature of the entire population, such as the population mean, the population standard deviation, or a population proportion.

Q: A data professional working at an online store analyzes data for a monthly business intelligence report. They calculate the average time customers spend on the store’s website. What descriptive statistic are they using?

Range
Mean
Mode
Standard deviation

Explanation: The mean, often called the average, is calculated by adding all the values in a dataset and dividing the total by the number of values. It describes the central tendency of a dataset by measuring a typical value.

Q: A data professional works with the following dataset: 2, 2, 4, 7, 10. What is the mean of the dataset?
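Worked out by hand, the mean is (2 + 2 + 4 + 7 + 10) / 5 = 25 / 5 = 5. The short sketch below does the same arithmetic in Python and cross-checks it with the standard library's statistics.mean.

```python
import statistics

data = [2, 2, 4, 7, 10]

# Mean = sum of the values divided by how many values there are.
mean_by_hand = sum(data) / len(data)
mean_by_library = statistics.mean(data)

print(mean_by_hand, mean_by_library)
```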

Q: What concept best describes the standard deviation, variance, and range?

Measures of central tendency
Measures of frequency
Measures of dispersion
Measures of position

Explanation: All three are measures of dispersion, which describe how spread out the data are. Standard deviation measures the typical distance of data points from the mean. Variance is the average of the squared deviations from the mean. Range is the difference between the highest and lowest values in the dataset, a simple way to gauge dispersion.
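Each of these three measures can be computed in a few lines. This sketch uses the population formulas (dividing by n), matching the "average of the squared deviations" wording above; note that sample variance and sample standard deviation divide by n - 1 instead, which is what statistics.variance and pandas std() use. The helper name dispersion is hypothetical.

```python
import math

def dispersion(values):
    """Hypothetical helper: return (range, population variance, population standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / n   # average squared deviation from the mean
    std_dev = math.sqrt(variance)            # square root puts it back in the original units
    value_range = max(values) - min(values)  # highest value minus lowest value
    return value_range, variance, std_dev

r, var, sd = dispersion([2, 2, 4, 7, 10])
print(r, var, round(sd, 3))
```

For the dataset 2, 2, 4, 7, 10 the squared deviations from the mean of 5 are 9, 9, 1, 4, and 25, so the population variance is 48 / 5 = 9.6.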

Q: A data professional is analyzing wind speed data. Their dataset includes daily speeds in miles per hour over six months: 1, 8, 9, 14, 22, 28, 35, 46, 55, 60, 71. What is the range of their dataset?

31.7
28
70
349
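The range is the highest value minus the lowest: 71 - 1 = 70. The other answer choices appear to match different summaries of the same data (the total, the median, and the rounded mean), which this sketch prints side by side using only the standard library.

```python
import statistics

wind_mph = [1, 8, 9, 14, 22, 28, 35, 46, 55, 60, 71]

# Range: highest value minus lowest value.
value_range = max(wind_mph) - min(wind_mph)

# Other summaries of the same data, for comparison with the answer choices.
print("range: ", value_range)
print("total: ", sum(wind_mph))
print("median:", statistics.median(wind_mph))
print("mean:  ", round(statistics.mean(wind_mph), 1))
```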

Q: A data professional is analyzing data about annual work income in dollars. They divide the data into quartiles: Q1 = $40,000, Q2 = $55,000, Q3 = $70,000. What percentage of the values in their dataset are above $70,000?

5%
50%
25%
75%
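Q3 is the 75th percentile, so about 25% of the values lie above it. This sketch checks that on a hypothetical, evenly spaced set of 100 incomes (not the question's data) using statistics.quantiles, which uses the "exclusive" method by default; other tools may place the cut points slightly differently, and with small or heavily tied datasets the share above Q3 is only approximately 25%.

```python
import statistics

# Hypothetical incomes: 100 evenly spaced values from $20,000 to $119,000.
incomes = list(range(20_000, 120_000, 1_000))

q1, q2, q3 = statistics.quantiles(incomes, n=4)  # the three quartile cut points
share_above_q3 = sum(1 for x in incomes if x > q3) / len(incomes)
print(q3, share_above_q3)
```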

Q: If you apply the describe() function to numerical data, the results will include which of the following descriptive statistics? Select all that apply.

Range
Median
Mean
Standard deviation

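Explanation: When applied to numerical data, the pandas describe() method returns the count, mean, standard deviation (std), minimum (min), 25th percentile, 50th percentile (which is the median), 75th percentile, and maximum (max). It does not report the range directly, but the range can be calculated from its output as max minus min.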
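A short example, assuming pandas is installed, shows what describe() returns for a numeric Series and how to derive the range from it. Note that pandas std uses the sample formula (dividing by n - 1).

```python
import pandas as pd

data = pd.Series([2, 2, 4, 7, 10])
summary = data.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# describe() has no "range" row, but max minus min gives it.
value_range = summary["max"] - summary["min"]
print("range:", value_range)
```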

Q: A grocery delivery business wants to improve customer response rates for their company’s monthly postcard mailer. They send a postcard with the same information to all customers. But half of the customers get a headline about faster delivery speeds, and half get a headline about more delivery drivers available in their area. What does this scenario describe?

Regression analysis
Hypothesis testing
Time series analysis
A/B testing

Explanation: A/B testing, also known as split testing, compares two versions of something (here, postcard headlines) to determine which one works better. Each version goes to a different group of customers (half get one headline and half get the other), and the responses are compared to see which version produces better results (here, higher customer response rates).
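Comparing the two versions starts with a response rate per headline. The counts in this sketch are made up for illustration. A higher rate alone does not prove one headline is better; deciding whether the difference is larger than chance would require a hypothesis test.

```python
# Hypothetical results for each postcard headline (not real data).
results = {
    "faster_delivery": {"sent": 5_000, "responses": 260},
    "more_drivers": {"sent": 5_000, "responses": 215},
}

# Response rate = responses / postcards sent, computed per version.
rates = {headline: r["responses"] / r["sent"] for headline, r in results.items()}
for headline, rate in rates.items():
    print(f"{headline}: {rate:.1%}")

best = max(rates, key=rates.get)
print("higher response rate:", best)
```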

Q: Fill in the blank: A characteristic of a _____ is a parameter.

sample
measure
range
population

Explanation: In statistics, a population is the complete group of people, objects, or events that you want to draw conclusions about. A parameter is a numerical summary of a population. It describes a feature of the entire population, such as the population mean, the population standard deviation, or a population proportion.

Q: A data analytics team collects responses from a customer satisfaction survey that asked customers to rate their experience from 1 to 10. The analytics team arranges the values in the dataset from worst (1) to best (10). Then, they identify the middle value. What descriptive statistic are they using?

Mode
Minimum
Mean
Median

Explanation: The median is the middle value of a dataset whose values are sorted in ascending or descending order, so it splits the dataset into two equal halves. When there is an odd number of observations, the median is the middle value. When there is an even number of observations, the median is the average of the two middle values.

Q: A data professional works with the following dataset: 2, 2, 4, 7, 10. What is the median of the dataset?

Explanation: To find the median, first sort the values in ascending order: 2, 2, 4, 7, 10. There are five observations, an odd number, so the median is the middle (third) value: 4.
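The odd and even cases described above translate directly into code. This is a minimal sketch with a hypothetical helper name; in practice Python's statistics.median does the same job.

```python
def median(values):
    """Hypothetical helper: middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(median([2, 2, 4, 7, 10]))      # odd number of values
print(median([2, 2, 4, 7, 10, 12]))  # even number of values: (4 + 7) / 2
```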

Q: A data professional is analyzing weather data. Their dataset includes daily rainfall in inches for the previous five days: 1, 2.4, 3.2, 5, 2.8. What is the range of their dataset?

Explanation: The range is the difference between the highest and lowest values in the dataset: 5 - 1 = 4 inches.

Q: A data professional is analyzing data about annual work income in dollars. They divide the data into quartiles: Q1 = $40,000, Q2 = $55,000, Q3 = $70,000. What value is the 50th percentile of their dataset?

$30,000
$40,000
$55,000
$70,000

Explanation: The second quartile (Q2) is the 50th percentile, which is also the median, so the 50th percentile of this dataset is $55,000. About half of the annual incomes in the dataset are below $55,000 and about half are above it.
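The sketch below checks that Q2 and the median coincide, using a small hypothetical set of incomes chosen so that statistics.quantiles reproduces the question's quartiles ($40,000, $55,000, $70,000). Both Q2 and the median come out to 55,000 for this data.

```python
import statistics

# Hypothetical incomes chosen so the quartiles match the question.
incomes = [32_000, 40_000, 48_000, 55_000, 61_000, 70_000, 88_000]

q1, q2, q3 = statistics.quantiles(incomes, n=4)  # default "exclusive" method
print(q1, q2, q3)
print(statistics.median(incomes))                # the 50th percentile
```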

Q: Which of the following statements correctly describe key elements of inferential statistics? Select all that apply.

Sample size has minimal effect on the validity of test results.
Data professionals use inferential statistics to predict behaviors.
The dataset that a sample is drawn from is called the population.
A sample can be used to draw conclusions about an entire population.

Explanation: Inferential statistics let data professionals make predictions or draw conclusions about an entire population based on data from a sample of that population. By studying a sample that is representative of the population, they can infer the population's characteristics or behaviors. The statement that sample size has minimal effect on validity is false: sample size is very important in inferential statistics. In general, larger samples give more accurate estimates of population parameters and narrower margins of error.
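One way to see why sample size matters is to draw many samples of a given size and measure how much their means vary. This simulation uses a hypothetical, randomly generated population with standard deviation 10; theory predicts the spread of sample means (the standard error) is about 10 / sqrt(n), roughly 3.2 for n = 10 and 0.32 for n = 1,000. The helper name is hypothetical.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 100,000 simulated values with mean 50 and standard deviation 10.
population = [rng.gauss(50, 10) for _ in range(100_000)]

def spread_of_sample_means(n, trials=300):
    """Hypothetical helper: standard deviation of the sample mean across repeated samples of size n."""
    means = [sum(s) / n for s in (rng.sample(population, n) for _ in range(trials))]
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(1_000)
print(f"n=10:   {small:.2f}")
print(f"n=1000: {large:.2f}")
```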

Q: A data professional working for a water conservancy researches household water usage in a large city. Their dataset contains about 800,000 rows of data capturing how much water each household uses in a month. The data professional creates visualizations to quickly understand the data and create a summary for stakeholders. What does this scenario describe?

Statistical significance
Confidence intervals
Inferential statistics
Descriptive statistics

Explanation: Descriptive statistics summarize and describe the main characteristics of a dataset. Along with graphical representations such as histograms, box plots, and scatter plots, they include measures such as the mean, median, mode, range, variance, and standard deviation.
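A histogram is built from counts of values that fall into equal-width bins. This sketch computes those counts for a few hypothetical monthly water-usage values and prints a text bar for each bin; a plotting library would draw the same counts as bars. The helper name and the data are hypothetical, and bins with no values are simply omitted here.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Hypothetical helper: count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical monthly household water usage, in gallons.
usage = [3200, 4100, 4500, 5200, 5600, 5900, 6100, 7800, 8300, 12500]
bin_width = 2000

for start, count in histogram_counts(usage, bin_width).items():
    print(f"{start:>6}-{start + bin_width - 1:<6} {'#' * count}")
```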

Q: A company conducts an employee satisfaction survey. Employees rate their work experience as unacceptable, average, good, or excellent. The most frequently occurring value in the survey is excellent. What descriptive statistics concept best describes this value?

Standard deviation
Mode
Median
Mean

Explanation: The mode is the value that occurs most often in a dataset. In other words, it is the category or value with the highest frequency. Unlike the mean, the mode can be used with categorical data such as these survey ratings.
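Python's statistics module computes the mode for categorical values as well as numbers. The responses below are hypothetical; statistics.multimode is useful when several categories tie for the highest frequency.

```python
import statistics

# Hypothetical survey responses (categorical data).
ratings = ["good", "excellent", "average", "excellent", "unacceptable", "excellent", "good"]

print(statistics.mode(ratings))       # most frequent single value
print(statistics.multimode(ratings))  # all values tied for most frequent
```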

Q: Which of the following descriptive statistics are measures of dispersion? Select all that apply.

Percentile
Standard deviation
Variance
Range

Explanation: Standard deviation measures how far data points typically fall from the mean. Variance is the average of the squared deviations from the mean, another measure of how spread out the values are. Range is the difference between the highest and lowest values in the dataset, a simple measure of dispersion. A percentile is a measure of position, not dispersion.

Q: A data professional is analyzing data about annual work income in dollars. They divide the data into quartiles: Q1 = $40,000, Q2 = $55,000, Q3 = $70,000. What is the interquartile range, or IQR, of their dataset?

$15,000
$30,000
$40,000
$55,000
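The interquartile range is Q3 minus Q1, the spread of the middle 50% of the data: $70,000 - $40,000 = $30,000. The sketch computes it from the given quartiles and again from a small hypothetical dataset whose quartiles match the question.

```python
import statistics

# From the quartiles given in the question.
q1, q3 = 40_000, 70_000
iqr = q3 - q1  # spread of the middle 50% of the data
print(iqr)

# The same result from hypothetical raw data whose quartiles match the question.
incomes = [32_000, 40_000, 48_000, 55_000, 61_000, 70_000, 88_000]
cuts = statistics.quantiles(incomes, n=4)  # [Q1, Q2, Q3]
print(cuts[2] - cuts[0])
```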

Q: A data team at a car dealership wants to improve open rates for their company’s weekly email campaign. They send two versions of the weekly email. Half of the customers get a subject line about new car colors, and half get a subject line about new car interiors. What does this scenario describe?

Hypothesis testing
Regression analysis
A/B testing
Time series analysis

Explanation: A/B testing, also called split testing, compares two versions of something (here, email subject lines) to find out which one performs better. Each version is sent to a different group of customers, and the responses (here, open rates) are compared to determine which version produces the better outcome.
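To judge whether one open rate is genuinely higher rather than higher by chance, A/B results are often checked with a hypothesis test. This sketch uses one common choice, a two-sided two-proportion z-test with a pooled standard error, which assumes reasonably large groups; the function name and the counts are hypothetical.

```python
import math

def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Hypothetical helper: two-sided z-test for a difference between two open rates."""
    p_a, p_b = opens_a / sent_a, opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)  # open rate if both versions were the same
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: subject line about colors (A) vs. interiors (B).
z, p = two_proportion_z_test(opens_a=1_150, sent_a=5_000, opens_b=1_050, sent_b=5_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts the open rates are 23% and 21%, and the small p-value would suggest the difference is unlikely to be due to chance alone.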

Q: Fill in the blank: A _____ is a characteristic of a sample.

range
parameter
measure
statistic

Explanation: A sample is a subset of a population selected for data collection. A statistic is a numerical summary or measure calculated from a sample, and it is typically used to estimate an unknown population parameter.
