This is “Large Sample Estimation of a Population Proportion”, section 7.3 from the book Beginning Statistics (v. 1.0).
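We know from Section 6.3 "The Sample Proportion" in Chapter 6 "Sampling Distributions" the mean, standard deviation, and sampling distribution of the sample proportion $\widehat{p}$, so the ideas of the previous two sections can be applied to produce a confidence interval for a population proportion. Here is the formula. Provided the sample is large, a $100(1-\alpha)\%$ confidence interval for a population proportion $p$ is
$$\widehat{p}\pm z_{\alpha/2}\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}$$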
A sample is large if the interval $\left[p-3\,\sigma_{\widehat{P}},\ p+3\,\sigma_{\widehat{P}}\right]$ lies wholly within the interval $\left[0,1\right].$
In actual practice the value of $p$ is not known, hence neither is $\sigma_{\widehat{P}}.$ In that case we substitute the known quantity $\widehat{p}$ for $p$ in making the check; this means checking that the interval
$$\left[\widehat{p}-3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}},\ \widehat{p}+3\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}\right]$$
lies wholly within the interval $\left[0,1\right].$
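The large-sample check described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_large_sample` and the sample counts used below are assumptions chosen for the example, not part of the text.

```python
from math import sqrt

def is_large_sample(x, n):
    """Return True if [p_hat - 3*se, p_hat + 3*se] lies wholly within [0, 1].

    x is the number of sample elements with the characteristic of interest,
    n is the sample size (hypothetical helper, not from the text).
    """
    p_hat = x / n                        # sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard deviation of p_hat
    return p_hat - 3 * se >= 0 and p_hat + 3 * se <= 1

# Illustrative counts only: 30 of 100 passes the check, 2 of 100 does not.
print(is_large_sample(30, 100))
print(is_large_sample(2, 100))
```

The second call fails the check because the sample proportion is so close to 0 that the lower endpoint of the interval falls below 0.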
To estimate the proportion of students at a large college who are female, a random sample of 120 students is selected. There are 69 female students in the sample. Construct a 90% confidence interval for the proportion of all students at the college who are female.
Solution:
The proportion of students in the sample who are female is $\widehat{p}=69/120=0.575.$
Confidence level 90% means that $\alpha =1-0.90=0.10$ so $\alpha /2=0.05.$ From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.05}=1.645.$
Thus
$$\widehat{p}\pm z_{\alpha /2}\sqrt{\frac{\widehat{p}\left(1-\widehat{p}\right)}{n}}=0.575\pm 1.645\sqrt{\frac{\left(0.575\right)\left(0.425\right)}{120}}=0.575\pm 0.074$$
One may be 90% confident that the true proportion of all students at the college who are female is contained in the interval $\left(0.575-0.074,\ 0.575+0.074\right)=\left(0.501,\ 0.649\right).$
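As a rough cross-check, the computation in this example can be reproduced with Python's standard library. The function name `proportion_ci` is an assumption made for this sketch, and the critical value is taken from `statistics.NormalDist` rather than from the printed table, so it is slightly more precise than 1.645.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence):
    """Large-sample confidence interval for a population proportion.

    Hypothetical helper: x successes in a sample of size n,
    confidence given as a decimal such as 0.90.
    """
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)       # error bound
    return p_hat - margin, p_hat + margin

# Data from the example: 69 female students in a sample of 120.
low, high = proportion_ci(69, 120, 0.90)
print(f"({low:.3f}, {high:.3f})")  # prints (0.501, 0.649)
```

The sketch does not perform the large-sample check itself; that condition should be verified separately before the interval is trusted.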
Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 90% confidence interval for the population proportion.
Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 95% confidence interval for the population proportion.
Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 98% confidence interval for the population proportion.
Information about a random sample is given. Verify that the sample is large enough to use it to construct a confidence interval for the population proportion. Then construct a 99.5% confidence interval for the population proportion.
In a random sample of size 1,100, 338 have the characteristic of interest.
In a random sample of size 2,400, 420 have the characteristic of interest.
A security feature on some web pages is graphic representations of words that are readable by human beings but not machines. When a certain design format was tested on 450 subjects, by having them attempt to read ten disguised words, 448 subjects could read all the words.
In a random sample of 900 adults, 42 defined themselves as vegetarians.
In a random sample of 250 employed people, 61 said that they bring work home with them at least occasionally.
In a random sample of 1,250 household moves, 822 were moves to a location within the same county as the original residence.
In a random sample of 12,447 hip replacement or revision surgery procedures nationwide, 162 patients developed a surgical site infection.
In a certain region prepackaged products labeled 500 g must contain on average at least 500 grams of the product, and at least 90% of all packages must weigh at least 490 grams. In a random sample of 300 packages, 288 weighed at least 490 grams.
A survey of 50 randomly selected adults in a small town asked them their opinion on a proposed “no cruising” restriction late at night. Responses were coded 1 for in favor, 0 for indifferent, and 2 for opposed, with the results shown in the table.
$$\begin{array}{cccccccccc}1& 0& 2& 0& 1& 0& 0& 1& 1& 2\\ 0& 2& 0& 0& 0& 1& 0& 2& 0& 0\\ 0& 2& 1& 2& 0& 0& 0& 2& 0& 1\\ 0& 2& 0& 2& 0& 1& 0& 0& 2& 0\\ 1& 0& 0& 1& 2& 0& 0& 2& 1& 2\end{array}$$
To try to understand the reason for returned goods, the manager of a store examines the records on 40 products that were returned in the last year. Reasons were coded by 1 for “defective,” 2 for “unsatisfactory,” and 0 for all other reasons, with the results shown in the table.
$$\begin{array}{cccccccccc}0& 2& 0& 0& 0& 0& 0& 2& 0& 0\\ 0& 0& 0& 0& 0& 0& 0& 0& 0& 2\\ 0& 0& 2& 0& 0& 0& 0& 2& 0& 0\\ 0& 0& 0& 0& 0& 1& 0& 0& 0& 0\end{array}$$
In order to estimate the proportion of entering students who graduate within six years, the administration at a state university examined the records of 600 randomly selected students who entered the university six years ago, and found that 312 had graduated.
In a random sample of 2,300 mortgages taken out in a certain region last year, 187 were adjustable-rate mortgages.
In a research study in cattle breeding, 159 of 273 cows in several herds that were in estrus were detected by means of an intensive once a day, one-hour observation of the herds in early morning.
A survey of 21,250 households concerning telephone service gave the results shown in the table.
| | Landline | No Landline |
|---|---|---|
| Cell phone | 12,474 | 5,844 |
| No cell phone | 2,529 | 403 |
In a random sample of 900 adults, 42 defined themselves as vegetarians. Of these 42, 29 were women.
A random sample of 185 college soccer players who had suffered injuries that resulted in loss of playing time was made with the results shown in the table. Injuries are classified according to severity of the injury and the condition under which it was sustained.
| | Minor | Moderate | Serious |
|---|---|---|---|
| Practice | 48 | 20 | 6 |
| Game | 62 | 32 | 17 |
The body mass index (BMI) was measured in 1,200 randomly selected adults, with the results shown in the table.
| | BMI under 18.5 | BMI 18.5–25 | BMI over 25 |
|---|---|---|---|
| Men | 36 | 165 | 315 |
| Women | 75 | 274 | 335 |
Confidence intervals constructed using the formula in this section often do not do as well as expected unless n is quite large, especially when the true population proportion is close to either 0 or 1. In such cases a better result is obtained by adding two successes and two failures to the actual data and then computing the confidence interval. This is the same as using the formula
$$\tilde{p}\pm z_{\alpha /2}\sqrt{\frac{\tilde{p}\left(1-\tilde{p}\right)}{\tilde{n}}}\quad \text{where}\quad \tilde{p}=\frac{x+2}{n+4}\ \text{and}\ \tilde{n}=n+4$$
Suppose that in a random sample of 600 households, 12 had no telephone service of any kind. Use the adjusted confidence interval procedure just described to form a 99.9% confidence interval for the proportion of all households that have no telephone service of any kind.
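The adjusted procedure (add two successes and two failures, then apply the usual formula) might be sketched in Python as follows. The function name `adjusted_proportion_ci` and the counts in the usage line are assumptions for illustration only; they are not the data of any exercise here.

```python
from math import sqrt
from statistics import NormalDist

def adjusted_proportion_ci(x, n, confidence):
    """Adjusted interval: add two successes and two failures to the data.

    Hypothetical helper: x successes in a sample of size n,
    confidence given as a decimal such as 0.95.
    """
    n_tilde = n + 4                       # adjusted sample size
    p_tilde = (x + 2) / n_tilde           # adjusted sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - margin, p_tilde + margin

# Illustrative counts only: 2 successes in a sample of 100, 95% confidence.
low, high = adjusted_proportion_ci(2, 100, 0.95)
print(low, high)
```

Note that the interval is centered at $\tilde{p}$ rather than at $\widehat{p}$, which is what pulls it away from 0 or 1 when the observed proportion is extreme.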
Large Data Sets 4 and 4A list the results of 500 tosses of a die. Let p denote the proportion of all tosses of this die that would result in a four. Use the sample data to construct a 90% confidence interval for p.
http://www.gone.2012books.lardbucket.org/sites/all/files/data4.xls
http://www.gone.2012books.lardbucket.org/sites/all/files/data4A.xls
Large Data Set 6 records results of a random survey of 200 voters in each of two regions, in which they were asked to express whether they prefer Candidate A for a U.S. Senate seat or prefer some other candidate. Use the full data set (400 observations) to construct a 98% confidence interval for the proportion p of all voters who prefer Candidate A.
http://www.gone.2012books.lardbucket.org/sites/all/files/data6.xls
Lines 2 through 536 in Large Data Set 11 are a sample of 535 real estate sales in a certain region in 2008. Those that were foreclosure sales are identified with a 1 in the second column.
http://www.gone.2012books.lardbucket.org/sites/all/files/data11.xls
Lines 537 through 1106 in Large Data Set 11 are a sample of 570 real estate sales in a certain region in 2010. Those that were foreclosure sales are identified with a 1 in the second column.
http://www.gone.2012books.lardbucket.org/sites/all/files/data11.xls
$\widehat{p}\pm 3\sqrt{\frac{\widehat{p}\widehat{q}}{n}}=0.31\pm 0.04$
and
$\left[0.27,\ 0.35\right]\subset \left[0,1\right]$
$\widehat{p}\pm 3\sqrt{\frac{\widehat{p}\widehat{q}}{n}}=0.69\pm 0.21$
and
$\left[0.48,\ 0.90\right]\subset \left[0,1\right]$
$\left(0.1368,\ 0.1912\right)$