This is “Large Sample Tests for a Population Proportion”, section 8.5 from the book Beginning Statistics (v. 1.0).
Both the critical value approach and the p-value approach can be applied to test hypotheses about a population proportion $p$. The null hypothesis will have the form ${H}_{0}:p={p}_{0}$ for some specific number ${p}_{0}$ between 0 and 1. The alternative hypothesis will be one of the three inequalities $p<{p}_{0}$, $p>{p}_{0}$, or $p\ne {p}_{0}$ for the same number ${p}_{0}$ that appears in the null hypothesis.
The information in Section 6.3 "The Sample Proportion" in Chapter 6 "Sampling Distributions" gives the following formula for the test statistic and its distribution. In the formula, ${p}_{0}$ is the numerical value of $p$ that appears in the two hypotheses, ${q}_{0}=1-{p}_{0}$, $\widehat{p}$ is the sample proportion, and $n$ is the sample size. Remember that the condition that the sample be large is not that $n$ be at least 30 but that the interval
$$\left[\widehat{p}-3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}},\ \widehat{p}+3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}\right]$$lie wholly within the interval $\left[0,1\right].$
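The standardized test statistic is
$$Z=\frac{\widehat{p}-{p}_{0}}{\sqrt{\frac{{p}_{0}\,{q}_{0}}{n}}}$$
The test statistic has the standard normal distribution.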
The distribution of the standardized test statistic and the corresponding rejection region for each form of the alternative hypothesis (left-tailed, right-tailed, or two-tailed) are shown in Figure 8.14 "Distribution of the Standardized Test Statistic and the Rejection Region".
Figure 8.14 Distribution of the Standardized Test Statistic and the Rejection Region
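The large-sample condition and the standardized test statistic described above can be sketched in a few lines of Python. This is only an illustration: the function names `sample_is_large` and `z_statistic` are hypothetical, and the numbers in the usage lines are made up to show that the condition depends on both $\widehat{p}$ and $n$, not on $n\ge 30$ alone.

```python
from math import sqrt

def sample_is_large(p_hat: float, n: int) -> bool:
    """Return True when [p_hat - 3*se, p_hat + 3*se] lies wholly within [0, 1]."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    return 0 <= p_hat - 3 * se and p_hat + 3 * se <= 1

def z_statistic(p_hat: float, p0: float, n: int) -> float:
    """Standardized test statistic Z = (p_hat - p0) / sqrt(p0 * q0 / n)."""
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)

# Illustrative values (not from the text): the same sample proportion
# fails the condition with n = 50 but passes with n = 5000.
print(sample_is_large(0.02, 50))    # False
print(sample_is_large(0.02, 5000))  # True
```

Note that the check uses $\widehat{p}$ in the standard error, while the test statistic uses the hypothesized value ${p}_{0}$, as in the formulas above.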
A soft drink maker claims that a majority of adults prefer its leading beverage over that of its main competitor. To test this claim, 500 randomly selected people were given the two beverages in random order to taste. Among them, 270 preferred the soft drink maker’s brand, 211 preferred the competitor’s brand, and 19 could not make up their minds. Determine whether there is sufficient evidence, at the 5% level of significance, to support the soft drink maker’s claim against the default that the population is evenly split in its preference.
Solution:
We will use the critical value approach to perform the test. The same test will be performed using the p-value approach in Note 8.49 "Example 14".
We must check that the sample is sufficiently large to validly perform the test. Since $\widehat{p}=270/500=0.54$,
$$\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}=\sqrt{\frac{\left(0.54\right)\left(0.46\right)}{500}}\approx 0.02$$
hence
$$\begin{array}{l}\left[\widehat{p}-3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}},\ \widehat{p}+3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}\right]\\ =\left[0.54-\left(3\right)\left(0.02\right),\ 0.54+\left(3\right)\left(0.02\right)\right]\\ =\left[0.48,\ 0.60\right]\subset \left[0,1\right]\end{array}$$
so the sample is sufficiently large.
Step 1. The relevant test is
$$\begin{array}{rcll}{H}_{0}:p& =& 0.50& \\ \text{vs. }{H}_{a}:p& >& 0.50& @\ \alpha =0.05\end{array}$$
where $p$ denotes the proportion of all adults who prefer the company’s beverage over that of its competitor.
Step 2. The test statistic is
$$Z=\frac{\widehat{p}-{p}_{0}}{\sqrt{\frac{{p}_{0}\text{\hspace{0.17em}}{q}_{0}}{n}}}$$and has the standard normal distribution.
Step 3. The value of the test statistic is
$$Z=\frac{\widehat{p}-{p}_{0}}{\sqrt{\frac{{p}_{0}\,{q}_{0}}{n}}}=\frac{0.54-0.50}{\sqrt{\frac{\left(0.50\right)\left(0.50\right)}{500}}}=1.789$$
Step 4. Since the symbol in ${H}_{a}$ is “>”, this is a right-tailed test, so there is a single critical value, ${z}_{\alpha }={z}_{0.05}=1.645$. The rejection region is $\left[1.645,\infty \right)$.
Step 5. As shown in Figure 8.15 "Rejection Region and Test Statistic for Note 8.47 "Example 12"", the test statistic falls in the rejection region. The decision is to reject ${H}_{0}$. In the context of the problem our conclusion is:
The data provide sufficient evidence, at the 5% level of significance, to conclude that a majority of adults prefer the company’s beverage to that of its competitor.
Figure 8.15 Rejection Region and Test Statistic for Note 8.47 "Example 12"
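As a cross-check on the arithmetic in this example, the critical value approach can be reproduced with Python's standard library. This is a minimal sketch rather than part of the original solution; the variable names are illustrative, and `statistics.NormalDist` (Python 3.8 or later) stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist

# Data from the example: 270 of 500 prefer the brand; H0: p = 0.50, alpha = 0.05.
n, successes = 500, 270
p0, alpha = 0.50, 0.05

p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Right-tailed test: reject H0 when z is at least the critical value z_alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject = z >= z_alpha

print(f"Z = {z:.3f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
# Z = 1.789, critical value = 1.645, reject H0: True
```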
Globally, the long-term proportion of newborns who are male is 51.46%. A researcher believes that the proportion of boys at birth changes under severe economic conditions. To test this belief, randomly selected birth records of 5,000 babies born during a period of economic recession were examined. It was found in the sample that 52.55% of the newborns were boys. Determine whether there is sufficient evidence, at the 10% level of significance, to support the researcher’s belief.
Solution:
We will use the critical value approach to perform the test. The same test will be performed using the p-value approach in Note 8.50 "Example 15".
The sample is sufficiently large to validly perform the test since
$$\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}=\sqrt{\frac{\left(0.5255\right)\left(0.4745\right)}{5000}}\approx 0.01$$hence
$$\begin{array}{l}\left[\widehat{p}-3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}},\ \widehat{p}+3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}\right]\\ =\left[0.5255-0.03,\ 0.5255+0.03\right]\\ =\left[0.4955,\ 0.5555\right]\subset \left[0,1\right]\end{array}$$
Step 1. Let $p$ be the true proportion of boys among all newborns during the recession period. The burden of proof is to show that severe economic conditions change it from the historic long-term value of 0.5146 rather than to show that it stays the same, so the hypothesis test is
$$\begin{array}{rcll}{H}_{0}:p& =& 0.5146& \\ \text{vs. }{H}_{a}:p& \ne & 0.5146& @\ \alpha =0.10\end{array}$$
Step 2. The test statistic is
$$Z=\frac{\widehat{p}-{p}_{0}}{\sqrt{\frac{{p}_{0}\text{\hspace{0.17em}}{q}_{0}}{n}}}$$and has the standard normal distribution.
Step 3. The value of the test statistic is
$$Z=\frac{\widehat{p}-{p}_{0}}{\sqrt{\frac{{p}_{0}\,{q}_{0}}{n}}}=\frac{0.5255-0.5146}{\sqrt{\frac{\left(0.5146\right)\left(0.4854\right)}{5000}}}=1.542$$
Step 4. Since the symbol in ${H}_{a}$ is “≠”, this is a two-tailed test, so there are two critical values, $\pm {z}_{\alpha /2}=\pm {z}_{0.05}=\pm 1.645$. The rejection region is $\left(-\infty ,-1.645\right]\cup \left[1.645,\infty \right)$.
Step 5. As shown in Figure 8.16 "Rejection Region and Test Statistic for Note 8.48 "Example 13"", the test statistic does not fall in the rejection region. The decision is not to reject ${H}_{0}$. In the context of the problem our conclusion is:
The data do not provide sufficient evidence, at the 10% level of significance, to conclude that the proportion of newborns who are male differs from the historic proportion in times of economic recession.
Figure 8.16 Rejection Region and Test Statistic for Note 8.48 "Example 13"
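The rejection region depends on the form of the alternative hypothesis, which the following sketch makes explicit. The helper `critical_value_test` and its `tail` labels (`"left"`, `"right"`, `"two"`) are hypothetical names chosen for this illustration; the data are those of the example above.

```python
from math import sqrt
from statistics import NormalDist

def critical_value_test(p_hat, p0, n, alpha, tail):
    """Return (z, reject) for a large-sample test of H0: p = p0.

    tail: "left" for Ha: p < p0, "right" for Ha: p > p0, "two" for Ha: p != p0.
    """
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if tail == "left":
        reject = z <= std_normal.inv_cdf(alpha)            # z <= -z_alpha
    elif tail == "right":
        reject = z >= std_normal.inv_cdf(1 - alpha)        # z >= z_alpha
    elif tail == "two":
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)  # |z| >= z_{alpha/2}
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, reject

# Birth-record example: p_hat = 0.5255, p0 = 0.5146, n = 5000, alpha = 0.10.
z, reject = critical_value_test(0.5255, 0.5146, 5000, 0.10, "two")
print(f"Z = {z:.3f}, reject H0: {reject}")
# Z = 1.542, reject H0: False
```

In the two-tailed case the significance level is split evenly between the tails, which is why the critical value is computed from $\alpha /2$ rather than $\alpha$.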
Perform the test of Note 8.47 "Example 12" using the p-value approach.
Solution:
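We already know that the sample size is sufficiently large to validly perform the test.
Steps 1 through 3 are the same as in Note 8.47 "Example 12"; the value of the test statistic is $Z=1.789$.
Step 4. Rounding the test statistic to 1.79, the p-value is the area of the right tail of the standard normal distribution that it cuts off: $P\left(Z\ge 1.79\right)=1-0.9633=0.0367$.
Step 5. Since the p-value 0.0367 is less than $\alpha =0.05$, the decision is to reject ${H}_{0}$.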
Figure 8.17 P-Value for Note 8.49 "Example 14"
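For a right-tailed test the p-value is the area under the standard normal curve to the right of the observed test statistic. The sketch below computes it for the soft drink data using `statistics.NormalDist`; the variable names are illustrative. Software evaluates the tail area at the unrounded statistic, so the result can differ in the last digit from a table lookup at a rounded value of $Z$.

```python
from math import sqrt
from statistics import NormalDist

# Soft drink example: 270 of 500, H0: p = 0.50 vs. Ha: p > 0.50, alpha = 0.05.
p_hat, p0, n, alpha = 270 / 500, 0.50, 500, 0.05
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Right-tailed p-value: P(Z >= z) = 1 - Phi(z).
p_value = 1 - NormalDist().cdf(z)

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The p-value comes out at roughly 0.037, which is below 0.05, so the decision agrees with the critical value approach.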
Perform the test of Note 8.48 "Example 13" using the p-value approach.
Solution:
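We already know that the sample size is sufficiently large to validly perform the test.
Steps 1 through 3 are the same as in Note 8.48 "Example 13"; the value of the test statistic is $Z=1.542$.
Step 4. Rounding the test statistic to 1.54, and noting that the test is two-tailed, the p-value is double the area of the right tail of the standard normal distribution that it cuts off: $2\left(1-0.9382\right)=0.1236$.
Step 5. Since the p-value 0.1236 is greater than $\alpha =0.10$, the decision is not to reject ${H}_{0}$.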
Figure 8.18 P-Value for Note 8.50 "Example 15"
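The p-value calculation differs by the form of the alternative hypothesis: one tail for a one-sided test, double the tail for a two-sided test. A possible sketch, with the hypothetical helper `p_value_for` and the same illustrative `tail` labels (`"left"`, `"right"`, `"two"`), applied to the birth-record data:

```python
from math import sqrt
from statistics import NormalDist

def p_value_for(z: float, tail: str) -> float:
    """Observed significance of a standard normal test statistic z."""
    phi = NormalDist().cdf
    if tail == "left":
        return phi(z)                  # area to the left of z
    if tail == "right":
        return 1 - phi(z)              # area to the right of z
    if tail == "two":
        return 2 * (1 - phi(abs(z)))   # double the area beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Birth-record example: H0: p = 0.5146 vs. Ha: p != 0.5146, alpha = 0.10.
p_hat, p0, n, alpha = 0.5255, 0.5146, 5000, 0.10
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p = p_value_for(z, "two")

print(f"p-value = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

Here the p-value is a little over 0.12, which exceeds 0.10, so the null hypothesis is not rejected.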
On all exercises for this section you may assume that the sample is sufficiently large for the relevant test to be validly performed.
Compute the value of the test statistic for each test using the information given.
Compute the value of the test statistic for each test using the information given.
For each part of Exercise 1 construct the rejection region for the test for $\alpha =0.05$ and make the decision based on your answer to that part of the exercise.
For each part of Exercise 2 construct the rejection region for the test for $\alpha =0.05$ and make the decision based on your answer to that part of the exercise.
For each part of Exercise 1 compute the observed significance (p-value) of the test and compare it to $\alpha =0.05$ in order to make the decision by the p-value approach to hypothesis testing.
For each part of Exercise 2 compute the observed significance (p-value) of the test and compare it to $\alpha =0.05$ in order to make the decision by the p-value approach to hypothesis testing.
Perform the indicated test of hypotheses using the critical value approach.
Perform the indicated test of hypotheses using the critical value approach.
Perform the indicated test of hypotheses using the p-value approach.
Perform the indicated test of hypotheses using the p-value approach.
Five years ago 3.9% of children in a certain region lived with someone other than a parent. A sociologist wishes to test whether the current proportion is different. Perform the relevant test at the 5% level of significance using the following data: in a random sample of 2,759 children, 119 lived with someone other than a parent.
The government of a particular country reports its literacy rate as 52%. A nongovernmental organization believes it to be less. The organization takes a random sample of 600 inhabitants and obtains a literacy rate of 42%. Perform the relevant test at the 0.5% (one-half of 1%) level of significance.
Two years ago 72% of households in a certain county regularly participated in recycling household waste. The county government wishes to investigate whether that proportion has increased after an intensive campaign promoting recycling. In a survey of 900 households, 674 regularly participate in recycling. Perform the relevant test at the 10% level of significance.
Prior to a special advertising campaign, 23% of all adults recognized a particular company’s logo. At the close of the campaign the marketing department commissioned a survey in which 311 of 1,200 randomly selected adults recognized the logo. Determine, at the 1% level of significance, whether the data provide sufficient evidence to conclude that more than 23% of all adults now recognize the company’s logo.
A report five years ago stated that 35.5% of all state-owned bridges in a particular state were “deficient.” An advocacy group took a random sample of 100 state-owned bridges in the state and found 33 to be currently rated as being “deficient.” Test whether the current proportion of bridges in such condition is 35.5% versus the alternative that it is different from 35.5%, at the 10% level of significance.
In the previous year the proportion of deposits in checking accounts at a certain bank that were made electronically was 45%. The bank wishes to determine if the proportion is higher this year. It examined 20,000 deposit records and found that 9,217 were electronic. Determine, at the 1% level of significance, whether the data provide sufficient evidence to conclude that more than 45% of all deposits to checking accounts are now being made electronically.
According to the Federal Poverty Measure 12% of the U.S. population lives in poverty. The governor of a certain state believes that the proportion there is lower. In a sample of size 1,550, 163 were impoverished according to the federal measure.
An insurance company states that it settles 85% of all life insurance claims within 30 days. A consumer group asks the state insurance commission to investigate. In a sample of 250 life insurance claims, 203 were settled within 30 days.
A special interest group asserts that 90% of all smokers began smoking before age 18. In a sample of 850 smokers, 687 began smoking before age 18.
In the past, 68% of a garage’s business was with former patrons. The owner of the garage samples 200 repair invoices and finds that for only 114 of them the patron was a repeat customer.
A rule of thumb is that for working individuals one-quarter of household income should be spent on housing. A financial advisor believes that the average proportion of income spent on housing is more than 0.25. In a sample of 30 households, the mean proportion of household income spent on housing was 0.285 with a standard deviation of 0.063. Perform the relevant test of hypotheses at the 1% level of significance. Hint: This exercise could have been presented in an earlier section.
Ice cream is legally required to contain at least 10% milk fat by weight. The manufacturer of an economy ice cream wishes to be close to the legal limit, hence produces its ice cream with a target proportion of 0.106 milk fat. A sample of five containers yielded a mean proportion of 0.094 milk fat with standard deviation 0.002. Test the null hypothesis that the mean proportion of milk fat in all containers is 0.106 against the alternative that it is less than 0.106, at the 10% level of significance. Assume that the proportion of milk fat in containers is normally distributed. Hint: This exercise could have been presented in an earlier section.
Large Data Sets 4 and 4A list the results of 500 tosses of a die. Let p denote the proportion of all tosses of this die that would result in a five. Use the sample data to test the hypothesis that p is different from 1/6, at the 20% level of significance.
http://www.gone.2012books.lardbucket.org/sites/all/files/data4.xls
http://www.gone.2012books.lardbucket.org/sites/all/files/data4A.xls
Large Data Set 6 records results of a random survey of 200 voters in each of two regions, in which they were asked to express whether they prefer Candidate A for a U.S. Senate seat or prefer some other candidate. Use the full data set (400 observations) to test the hypothesis that the proportion p of all voters who prefer Candidate A exceeds 0.35. Test at the 10% level of significance.
http://www.gone.2012books.lardbucket.org/sites/all/files/data6.xls
Lines 2 through 536 in Large Data Set 11 are a sample of 535 real estate sales in a certain region in 2008. Those that were foreclosure sales are identified with a 1 in the second column. Use these data to test, at the 10% level of significance, the hypothesis that the proportion $p$ of all real estate sales in this region in 2008 that were foreclosure sales was less than 25%. (The null hypothesis is ${H}_{0}:p=0.25.$)
http://www.gone.2012books.lardbucket.org/sites/all/files/data11.xls
Lines 537 through 1106 in Large Data Set 11 are a sample of 570 real estate sales in a certain region in 2010. Those that were foreclosure sales are identified with a 1 in the second column. Use these data to test, at the 5% level of significance, the hypothesis that the proportion $p$ of all real estate sales in this region in 2010 that were foreclosure sales was greater than 23%. (The null hypothesis is ${H}_{0}:p=0.23.$)
http://www.gone.2012books.lardbucket.org/sites/all/files/data11.xls
$Z=1.11$, ${z}_{0.025}=1.96$, do not reject ${H}_{0}$.
$Z=1.93$, ${z}_{0.10}=1.28$, reject ${H}_{0}$.
$Z=-0.523$, $\pm {z}_{0.05}=\pm 1.645$, do not reject ${H}_{0}$.
$Z=3.04$, ${z}_{0.01}=2.33$, reject ${H}_{0}$.
${H}_{0}:p=1/6$ vs. ${H}_{a}:p\ne 1/6.$ Test Statistic: $Z=-0.76.$ Rejection Region: $\left(-\infty ,-1.28\right]\cup \left[1.28,\infty \right).$ Decision: Fail to reject ${H}_{0}$.
${H}_{0}:p=0.25$ vs. ${H}_{a}:p<0.25.$ Test Statistic: $Z=-1.17.$ Rejection Region: $\left(-\infty ,-1.28\right].$ Decision: Fail to reject ${H}_{0}$.