Introduction to Statistics Final Milestone
Discipline: Statistics
Type of Paper: Question-Answer
Academic Level: Undergrad. (yrs 3-4)
Paper Format: APA
Question
James participated in an archery competition. He was
allowed four attempts and was supposed to hit the bullseye in the center of the
board.
If the figure shows the positions of James' arrows, which of the following
would best classify the arrangement of arrows?
·
High accuracy and high precision
·
Low accuracy and low precision
·
High accuracy and low precision
·
Low accuracy and high precision
RATIONALE
The arrows are close to the center so they are accurate and they
are also close to one another, so they are precise as well.
CONCEPT
Accuracy and Precision in Measurements
2
The formula for the standard deviation of a sample is:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Select the true statement for the following data set that has a
mean of 8:
4, 6, 6, 6, 9, 9, 12, 12
Answer choices are rounded to the hundredths place.
·
The variance is 2.98 and the standard deviation is 8.86.
·
The variance is 8.86 and the standard deviation is 2.98.
·
The variance is 7.50 and the standard deviation is 2.98.
·
The variance is 8.86 and the standard deviation is 7.50.
RATIONALE
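We can first calculate the variance of the data, $s^2$, by using the part of the formula under the
square root:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(4-8)^2 + 3(6-8)^2 + 2(9-8)^2 + 2(12-8)^2}{8 - 1} = \frac{62}{7} \approx 8.86$$
Next, we can find the standard deviation, $s$, by simply taking the square root of the
variance:
$$s = \sqrt{s^2} = \sqrt{\frac{62}{7}} \approx 2.98$$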
We can also use the statistical functions in Excel to quickly find the variance
and standard deviation.
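The same check can be done outside Excel. The following is a minimal sketch, assuming Python's standard `statistics` module is used instead; the variable names are illustrative and not part of the original problem.

```python
import statistics

# Data set from the question; its mean is 8.
data = [4, 6, 6, 6, 9, 9, 12, 12]

# statistics.variance and statistics.stdev use the sample (n - 1) formulas.
sample_variance = statistics.variance(data)
sample_std_dev = statistics.stdev(data)

print(round(sample_variance, 2))  # 8.86
print(round(sample_std_dev, 2))   # 2.98
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1, so they would not match the answer choices here.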
CONCEPT
3
What is the probability of drawing a red card or a queen from a
standard deck of 52 cards?
·
·
·
·
RATIONALE
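Since it is possible for a card to be both red and a queen,
these two events are overlapping. We can use the following formula:
$$P(\text{red or queen}) = P(\text{red}) + P(\text{queen}) - P(\text{red and queen})$$
In a standard deck of cards, half of the 52 cards are red,
so $P(\text{red}) = \frac{26}{52}$. There is a total of 4 queens, so $P(\text{queen}) = \frac{4}{52}$.
Of the 4 queens, 2 are red and 2 are black, so $P(\text{red and queen}) = \frac{2}{52}$. Therefore:
$$P(\text{red or queen}) = \frac{26}{52} + \frac{4}{52} - \frac{2}{52} = \frac{28}{52} = \frac{7}{13}$$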
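The counts in this rationale can be checked with exact fractions. This is a small sketch, assuming Python's standard `fractions` module; the variable names are illustrative only.

```python
from fractions import Fraction

deck_size = 52
p_red = Fraction(26, deck_size)           # half the deck is red
p_queen = Fraction(4, deck_size)          # four queens in total
p_red_and_queen = Fraction(2, deck_size)  # two of the queens are red

# Addition rule for overlapping events: subtract the overlap once.
p_red_or_queen = p_red + p_queen - p_red_and_queen

print(p_red_or_queen)  # 7/13
```

Subtracting the overlap matters because the two red queens would otherwise be counted twice.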
CONCEPT
"Either/Or" Probability for Overlapping Events
4
In which of these cases should the median be used?
·
When data has no outliers
·
When the data has nominal values
·
When the data has small variance
·
When the data has extreme values
RATIONALE
Since the mean uses the actual values in the data, it is the measure most
affected by outliers and skewness. So, we only want to use the mean as a
measure of center when the data is symmetric. When the data is skewed
or has extreme values, the median is a better measure since it is not as
sensitive to these values.
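To illustrate how one extreme value moves the mean but barely moves the median, here is a minimal sketch. The numbers are hypothetical and chosen only for illustration; they do not come from the question.

```python
import statistics

# Hypothetical values, invented for this illustration.
typical = [30, 32, 35, 36, 40]
with_extreme_value = typical + [400]

print(statistics.mean(typical), statistics.median(typical))
# 34.6 35

print(statistics.mean(with_extreme_value), statistics.median(with_extreme_value))
# 95.5 35.5
```

The single value of 400 nearly triples the mean, while the median changes by only half a unit.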
CONCEPT
5
Eric is randomly drawing cards from a deck of 52. He first
draws a red card, places it back in the deck, shuffles the deck, and then draws
another card.
What is the probability of drawing a red card, placing it back in
the deck, and drawing another red card? Answer choices are in the form of
a percentage, rounded to the nearest whole number.
·
25%
·
13%
·
4%
·
22%
RATIONALE
Since Eric puts the card back and re-shuffles, the two events
(first draw and second draw) are independent of each other. To find the
probability of red on the first draw and second draw, we can use the following
formula:
$$P(\text{red and red}) = P(\text{red}) \cdot P(\text{red}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} = 25\%$$
Note that the probability of drawing a red card is $\frac{26}{52}$ or $\frac{1}{2}$ for each event.
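The multiplication for two independent draws can be verified with a short sketch, again assuming Python's `fractions` module and illustrative variable names.

```python
from fractions import Fraction

# 26 of the 52 cards are red; replacement keeps this the same on both draws.
p_red = Fraction(26, 52)

# Multiplication rule for independent events.
p_red_then_red = p_red * p_red

print(p_red_then_red)                  # 1/4
print(f"{float(p_red_then_red):.0%}")  # 25%
```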
CONCEPT
"And" Probability for Independent Events
6
A coin is tossed 50 times, and the number of times heads
comes up is counted.
Which of the following statements about the distributions of
counts and proportions is FALSE?
·
The count of getting heads from a sample proportion of
size 20 can be approximated with a normal distribution.
·
The distribution of the count of getting heads can be
approximated with a normal distribution.
·
The distribution of the count of getting tails can be
approximated with a normal distribution.
·
The count of getting heads is a binomial distribution.
RATIONALE
If we look at the counts of successes and failures (2 outcomes) from a
large population, this is called a binomial distribution, not a normal
distribution.
CONCEPT
Distribution of Sample Proportions
7
Select the correct statement regarding experiments.
·
A researcher can carefully control the explanatory
variables but not observe human responses.
·
A researcher can carefully control the explanatory
variables and observe human responses.
·
A researcher can observe the explanatory variables but not
control human responses.
·
A researcher can ignore explanatory variables and observe
human responses.
RATIONALE
The defining feature of an experimental setting is that the researcher
can control the setting and apply some treatment to observe how it affects an
outcome of interest. The responses of the participants are not controlled
by the researcher.
CONCEPT
Observational Studies and Experiments
8
The blood bank at a hospital has 1,200 units of blood, of
which 37% are of blood group B+. A clinical researcher randomly
selects 300 units of blood and finds that 33% of those are of blood group B+.
To test his result, he randomly selects 200 units of blood and finds that 40%
of those are of blood group B+.
Which of the following is the reason there is a difference between
the two percentages found by the researcher?
·
Both samples suffered from non-response bias.
·
The samples were not random samples.
·
The sample sizes were both too small.
·
Random error; the numbers were different due to
variability inherent in sampling.
RATIONALE
When sampling, there is always some variability that
occurs. So, although the sample values are different, since they were
randomly chosen, the differences are simply due to the variability that comes
from sampling and not due to some systematic bias.
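Sampling variability can be seen in a small simulation. The sketch below assumes the blood bank holds exactly 444 B+ units (37% of 1,200) and draws simple random samples without replacement; the function name and the seed are illustrative choices, not part of the original problem.

```python
import random

# 1,200 units, 37% of which are B+ (444 units), as in the question.
units = ["B+"] * 444 + ["other"] * 756

def sample_percent_b_positive(sample_size, rng):
    """Draw a random sample without replacement; return the percent that are B+."""
    sample = rng.sample(units, sample_size)
    return 100 * sample.count("B+") / sample_size

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
for size in (300, 200):
    print(size, round(sample_percent_b_positive(size, rng), 1))
```

Repeated runs with different seeds should give percentages that scatter around 37%, which is the kind of random error the rationale describes.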
CONCEPT
9
A survey result shows that cell phone usage among
teenagers rose from 63% in 2006 to 71% in 2008.
Of the following choices, which statement about cell phone use
among teenagers is true?
·
Cell phone usage rose by 11.2 percentage points.
·
Cell phone usage rose by 8%.
·
Cell phone usage rose by 12.7%.
·
Cell phone usage rose by 12.7 percentage points.
RATIONALE
We can note that the absolute difference between 2006 and 2008
is from 63% to 71%, or 8 percentage points.
To get the percent difference we take the absolute difference and divide
by the initial value:
$$\frac{71 - 63}{63} = \frac{8}{63} \approx 0.127 = 12.7\%$$
So we can say cell phone usage rose by 12.7%.
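The distinction between percentage points and percent change can be made concrete with a short calculation. This is a minimal sketch with illustrative variable names.

```python
usage_2006 = 63  # percent of teenagers using cell phones in 2006
usage_2008 = 71  # percent of teenagers using cell phones in 2008

# Absolute change, measured in percentage points.
point_change = usage_2008 - usage_2006

# Relative change, measured as a percent of the starting value.
relative_change = point_change / usage_2006 * 100

print(point_change)               # 8
print(round(relative_change, 1))  # 12.7
```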
CONCEPT
Using Percentages in Statistics
10
Which of the following data types will be continuous?
·
The number of children younger than ten that visited a
planetarium last week
·
The total weight of apples harvested in the farm in a
season
·
The number of cars in 100 households
·
The letter grades students received on a class quiz
RATIONALE
The total weight of apples can take on any value and is
therefore continuous. The other measures can only take on a limited
number of values and are discrete.
CONCEPT
11
Which of the following is NOT a guideline for establishing
causality?
·
Look for cases where correlation exists between the
variables of a scatterplot.
·
Keep all variables the same to get duplicate results.
·
Take into consideration all the other possible causes.
·
Perform a randomized, controlled experiment.
RATIONALE
For causality, the association should be something we observe in
slightly varied conditions. So if all variables and conditions are the
same, this is not a way to support causality.
CONCEPT
12
Select the statement that correctly describes a normal
distribution.
·
It is a negatively skewed distribution, as the extreme
values are less than the median.
·
It is a uniform distribution, as all of the values have
equal frequency.
·
It is a symmetric distribution, as the mean and the median
are the same.
·
It is a positively skewed distribution, as the extreme
values are greater than the median.
RATIONALE
A normal distribution is a bell-shaped and symmetric
distribution with a single smooth peak at its center. Because it is
symmetric, the mean and median are the same.
CONCEPT
13
A shoe retailer decides to record the styles and sizes of
shoes that his customers choose. He records this data for an entire year by
keeping track of his customers' purchases.
Which statement accurately describes the type of data the shoe
retailer is collecting?
·
The shoe retailer is receiving raw data on shoe sizes and
styles from nearby shoe companies.
·
The shoe retailer is gathering available data because
customers tell him which shoe sizes and styles they prefer.
·
The shoe retailer is receiving available data on shoe
sizes and styles from nearby shoe companies.
·
The shoe retailer is gathering raw data because he is
recording shoe sizes and styles by himself.
RATIONALE
Since the retailer is gathering the data himself, this would be
an example of raw data.
CONCEPT
14
Rhonda is wondering if there is an association between the
number of hours she studies per week and the number of semester credits she is
enrolled in. The information is shown in the table below.
If Rhonda is taking four credits for the fall semester, how many hours per week
will she study?
·
2
·
4
·
8
·
5
RATIONALE
If we use the scatterplot and note the value above 4 credit
hours on the horizontal axis, we find this value is also 4 on the vertical
axis. She should expect to study 4 hours.
CONCEPT
15
What value of z* should be used
to construct an 88% confidence interval of a population mean? Answer choices
are rounded to the thousandths place.
·
1.555
·
1.645
·
1.175
·
1.220
RATIONALE
Using the z-chart to construct an 88% CI, this means that there
is 6% for each tail. The lower tail would be at 0.06 and the upper tail
would be at (1 - 0.06) or 0.94. The closest to 0.94 on the z-table is
between 0.9394 and 0.9406.
0.9394 corresponds with a z-score of 1.55.
0.9406 corresponds with a z-score of 1.56.
Taking the average of these two scores, we get a z-score of
1.555.
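The table lookup can be cross-checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

confidence_level = 0.88
tail_area = (1 - confidence_level) / 2  # 0.06 in each tail

# z* is the z-score with cumulative area 0.94 to its left.
z_star = NormalDist().inv_cdf(1 - tail_area)

print(round(z_star, 3))  # 1.555
```

The exact value is slightly below 1.555, which is why the z-table places it between 1.55 and 1.56.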
CONCEPT
16
A credit card company surveys 125 of its customers to ask
about satisfaction with customer service. The results of the survey, divided by
gender, are shown below.
                       | Males | Females |
Extremely Satisfied    | 25    | 7       |
Satisfied              | 21    | 13      |
Neutral                | 13    | 16      |
Dissatisfied           | 9     | 14      |
Extremely Dissatisfied | 2     | 5       |
If you were to choose a female from the group, what is the
probability that she is satisfied with the company's customer service? Answer
choices are rounded to the hundredths place.
·
0.38
·
0.62
·
0.24
·
0.13
RATIONALE
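The probability of a person being "satisfied" given
she is a female is a conditional probability. We can use the following
formula:
$$P(\text{satisfied} \mid \text{female}) = \frac{P(\text{satisfied and female})}{P(\text{female})} = \frac{13/125}{55/125} = \frac{13}{55} \approx 0.24$$
Remember, to find the total number of females, we need to add
all values in this column: 7 + 13 + 16 + 14 + 5 = 55.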
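The conditional probability can also be read directly off the contingency table. The following sketch stores the survey counts in a dictionary; the data structure and names are illustrative choices.

```python
# Survey counts from the table: satisfaction level by gender.
counts = {
    "Extremely Satisfied": {"Males": 25, "Females": 7},
    "Satisfied": {"Males": 21, "Females": 13},
    "Neutral": {"Males": 13, "Females": 16},
    "Dissatisfied": {"Males": 9, "Females": 14},
    "Extremely Dissatisfied": {"Males": 2, "Females": 5},
}

# Conditioning on "female" restricts attention to the Females column.
total_females = sum(row["Females"] for row in counts.values())
p_satisfied_given_female = counts["Satisfied"]["Females"] / total_females

print(total_females)                       # 55
print(round(p_satisfied_given_female, 2))  # 0.24
```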
CONCEPT
Conditional Probability and Contingency Tables
17
Joe is measuring the widths of doors
he bought to install in an apartment complex. He measured 72 doors and found a
mean width of 36.1 inches with a standard deviation of 0.3 inches. To test
if the doors differ significantly from the standard industry width of 36
inches, he computes a z-statistic.
What is the value of Joe's z-test statistic?
·
2.83
·
-1.81
·
1.81
·
-2.83
RATIONALE
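If we first note the denominator of the z-statistic, the standard error, it is:
$$\frac{s}{\sqrt{n}} = \frac{0.3}{\sqrt{72}} \approx 0.03536$$
Then, getting the z-score we can note it is:
$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{36.1 - 36}{0.03536} \approx 2.83$$
This tells us that 36.1 is 2.83 standard errors (standard deviations of the
sample mean) above the value of 36.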
Note that when you round some values you may get slightly
different results, but the results should be relatively close to this final
calculated value.
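Carrying full precision through the calculation avoids the rounding differences mentioned above. This is a minimal sketch using Python's `math` module, with illustrative variable names.

```python
import math

sample_mean = 36.1        # mean width of the measured doors, in inches
hypothesized_mean = 36.0  # standard industry width, in inches
std_dev = 0.3             # standard deviation, in inches
n = 72                    # number of doors measured

# Standard error of the mean, then the z-test statistic.
standard_error = std_dev / math.sqrt(n)
z = (sample_mean - hypothesized_mean) / standard_error

print(round(standard_error, 5))  # 0.03536
print(round(z, 2))               # 2.83
```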
CONCEPT
18
The scatterplot below charts the performance of an
electric motor.
Which answer choice correctly indicates the explanatory variable and the response
variable of the scatterplot?
·
Explanatory variable: Rotation
Response variable: Voltage
·
Explanatory variable: Rotation
Response variable: Electric motor
·
Explanatory variable: Voltage
Response variable: Electric motor
·
Explanatory variable: Voltage
Response variable: Rotation
RATIONALE
The explanatory variable is what is along the horizontal axis,
which is voltage. The response variable is along the vertical axis, which
is speed of rotation.
CONCEPT
Explanatory and Response Variables
19
Which of the following situations describes a continuous distribution?
·
A probability distribution of the average time it takes
employees to drive to work.
·
A probability distribution showing the number of pages
employees read during the workday.
·
A probability distribution showing the number of minutes
employees spend at lunch.
·
A probability distribution of the workers who arrive late
to work each day.
RATIONALE
For a distribution to be continuous, there must be an infinite
number of possibilities. Since we are measuring the time to drive to
work, there are an infinite number of values we might observe, for example: 2
hours, 30 minutes, 40 seconds, etc.
CONCEPT
20
A market research company conducted a survey of two groups
of students from different schools. They found that students from school A
spent an average of 90 minutes studying daily, while the students from school B
spent an average of 75 minutes daily.
They want to find
out if the difference in the mean times spent studying by the students of the
two schools is statistically significant.
Which of the following sets shows the correct null hypothesis and
alternative hypothesis?
·
Null Hypothesis: There is no difference in the mean
times spent by the schools' students.
Alternative Hypothesis: There is at least some difference in the mean times
spent by the schools' students.
·
Null Hypothesis: School B students spend more time
studying than School A.
Alternative Hypothesis: The difference in the mean times spent by the schools'
students is 15 minutes.
·
Null Hypothesis: The difference in the mean times spent by
the schools' students is 15 minutes.
Alternative Hypothesis: There is no difference in the mean times spent by the
schools' students.
·
Null Hypothesis: There is at least some difference in the
mean times spent by the schools' students.
Alternative Hypothesis: The students from school B spend more time studying
than the students from school A.
RATIONALE
Recall that the null hypothesis is always a statement of no difference.
So the null hypothesis (H₀) is that the mean time studying
for group A = mean for group B. This would indicate no difference between
the two groups.
The alternative hypothesis (Hₐ) is that there is a difference in
the mean study time between the two groups.
CONCEPT
21
Jerry, Stein, Johnson, and Mary had a competition to see
who could profit the most off of their odd jobs during the summer. They
discussed their earnings on the first day of school. Afterward, each of them
decided to make bar graphs to plot the different amounts they earned.
Who made the above graph, and why?
·
Mary, because she wanted to make the amount made by each
person appear reasonably close.
·
Jerry, because he wanted to accurately show the amount
made by each person.
·
Johnson, because he wanted to make the amount made by each
person appear very different.
·
Stein, because he wanted to make it look like he earned
significantly more than the others.
RATIONALE
Since there was a competition, the person who most likely made
this graph would want to represent themselves favorably. Since Stein has
the highest earnings, it would probably be Stein.
CONCEPT
22
Ralph records the time it takes for each of his classmates
to run around the track one time. As he analyzes the data on the graph, he
notices very little variation between his classmates’ times.
Which component of data analysis is Ralph observing?
·
The overall spread of the data
·
The center of the data set
·
An outlier in the data set
·
The overall shape of the data
RATIONALE
Since Ralph is looking at the variation of data, this is
examining the spread of the data.
CONCEPT
23
Kyle was trying to decide which type of soda to restock
based on popularity: regular cola or diet cola. After studying the data, he
noticed that he sold less diet cola than regular cola on both weekdays and
weekends. However, after combing through his entire sales records, he found
that he actually sold more diet cola than regular cola.
Which paradox had Kyle encountered?
·
False Negative
·
Simpson's Paradox
·
Benford's Law
·
False Positive
RATIONALE
This is an example of Simpson's paradox, which is when the overall
trend is not the same as the trend seen within smaller groups. Since the
sale of diet cola is larger overall but this trend changes when looking at
weekends and weekdays separately, this is a reversal of the trend.
CONCEPT
24
Select the statement that correctly describes a Type II
error.
·
A Type II error occurs when the null hypothesis is
rejected when it is actually false.
·
A Type II error occurs when the null hypothesis is
accepted when it is actually true.
·
A Type II error occurs when the null hypothesis is
rejected when it is actually true.
·
A Type II error occurs when the null hypothesis is
accepted when it is actually false.
RATIONALE
Recall a Type II error is when we incorrectly accept (fail to reject) a false
null hypothesis. In this case, we should have rejected H₀ and concluded there is
evidence that Hₐ is correct.
CONCEPT
25
Jesse takes two data points from the weight and feed cost
data set to calculate a slope, or average rate of change. A rat weighs 3.5
pounds and costs $4.50 per week to feed, while a Beagle weighs 30 pounds and
costs $9.20 per week to feed.
Using weight as the explanatory variable, what is the slope of the
line between these two points? Answer choices are rounded to the nearest
hundredth.
·
$0.18 / lb.
·
$0.31 / lb.
·
$5.64 / lb.
·
$1.60 / lb.
RATIONALE
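In order to get the slope, we can use the formula:
$$\text{slope} = \frac{y_2 - y_1}{x_2 - x_1}$$
Using the information provided, the two points are: (3.5
lb., $4.50) and (30 lb., $9.20). We can note that:
$$\text{slope} = \frac{9.20 - 4.50}{30 - 3.5} = \frac{4.70}{26.5} \approx 0.18 \text{ dollars per lb.}$$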
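As a quick check of the slope between the two points, here is a minimal sketch. The tuple layout (weight in pounds, weekly feed cost in dollars) and the names are illustrative.

```python
# Each point is (weight in lb., weekly feed cost in dollars).
rat = (3.5, 4.50)
beagle = (30.0, 9.20)

# Slope = change in cost / change in weight, with weight as the explanatory variable.
slope = (beagle[1] - rat[1]) / (beagle[0] - rat[0])

print(round(slope, 2))  # 0.18
```

Swapping the roles of the two variables would instead give pounds per dollar, which is why the choice of explanatory variable matters.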