Chi-Square Goodness-of-Fit Test

Tests on the Distribution Shape of Continuous Data

ROBERT H. RIFFENBURGH , in Statistics in Medicine (Second Edition), 2006

Example Posed: Is the Distribution of Ages of the 301 Urology Patients Normal?

We ask if the data are just normal, not drawn from a specific normal with theoretically given μ and σ; therefore, we use m = 66.76 and s = 8.10 from DB1. With a sample this large, we can use the chi-square goodness-of-fit test.

Method: Chi-square Goodness-of-Fit Test of Normality

The hypotheses are H0: The distribution from which the sample was drawn is normal, or alternatively is normal with parameters μ and σ; and the two-tailed H1: The distribution is different. Choose α. Look up the critical χ² in Table III (see Tables of Probability Distributions).

1

We define the data intervals, say, k in number, as we would were we to form a histogram of the data. We form a blank table in the format of Table 20.4.

Table 20.4. Format for Table of Values Required to Compute the Chi-square Goodness-of-Fit Statistic

Interval Standard normal z to end of interval P Expected frequencies (ei ) Observed frequencies (ni )
: : : : :
: : : : :
2

To use a normal probability table, we standardize the ends of the intervals by subtracting the mean and dividing by the standard deviation. The "expected" normal is usually specified by the sample m and s, although it could be specified by a theoretical μ and σ.

3

To relate areas under the normal curve to the intervals, we find the area to the end of an interval from a table of normal probabilities, such as Table I, and subtract the area to the end of the preceding interval.

4

To find the frequencies expected from a normal fit (name them ei ), we multiply the normal probabilities for each interval by the total number of data n.

5

We tally the number of data falling into each interval and enter the tally numbers in the table. Name these numbers ni.

6

Calculate a χ² value [the pattern is similar to Eq. (6.2)] using Eq. (20.2):

(20.2) \(\chi^2 = \sum \frac{(n_i - e_i)^2}{e_i} = \sum \frac{n_i^2}{e_i} - n,\)

where the first form is easier to understand conceptually, and the second form is easier to compute.
7

If the calculated χ² is greater than the critical χ², reject H0; otherwise, accept H0.
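Steps 1–7 translate directly into a few lines of NumPy/SciPy. The sketch below is a minimal illustration of the method, not the DB1 analysis itself: the ages are simulated stand-ins and the interval boundaries are an illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ages = rng.normal(66.76, 8.10, 301)    # hypothetical stand-in for the 301 ages

m, s = ages.mean(), ages.std(ddof=1)   # step 2: sample m and s

cuts = np.arange(50, 85, 5)            # step 1: interior interval boundaries
idx = np.searchsorted(cuts, ages, side="right")      # interval index per datum
n_i = np.bincount(idx, minlength=len(cuts) + 1)      # step 5: observed n_i

z = (cuts - m) / s                                   # step 2: standardized ends
p = np.diff(np.concatenate([[0.0], stats.norm.cdf(z), [1.0]]))  # step 3
e_i = p * len(ages)                                  # step 4: expected e_i

chi2 = ((n_i - e_i) ** 2 / e_i).sum()                # Eq. (20.2), first form
chi2_alt = (n_i ** 2 / e_i).sum() - len(ages)        # Eq. (20.2), second form

df = len(p) - 3          # k intervals, minus 1, minus 2 estimated parameters
crit = stats.chi2.ppf(0.95, df)                      # step 7, alpha = 0.05
print(chi2, chi2_alt, crit, chi2 > crit)
```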

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780120887705500605

Inferential Statistics IV: Choosing a Hypothesis Test

Andrew P. King , Robert J. Eckersley , in Statistics for Biomedical Engineers and Scientists, 2019

7.3.5 Chi-Square Test for Normality

The Shapiro–Wilk test was designed for use with small sample sizes. Although there is no hard-and-fast rule, a rule of thumb is that it is suitable when dealing with sample sizes of 50 or less. One alternative to the Shapiro–Wilk test that may be more powerful for larger sample sizes is the χ² test. In Section 6.5, we saw the use of the χ² test for testing hypotheses about categorical data. A slightly different form of the test can be used to test the goodness-of-fit of a sample against any expected distribution. We will demonstrate its use for testing against a normal distribution.

To illustrate the use of the χ 2 test, we introduce a new case study. A team of biomedical engineers has developed a technique for automatically estimating gestational age from a magnetic resonance (MR) scan of a fetus. They have tested their technique on 300 fetal MR scans for which the "gold standard" gestational age was known. Based on these data, they have computed the errors in gestational age estimation for their technique. These errors are summarized in Table 7.1. To perform further statistical analysis on the error figures, the team would like to know if their data are normally distributed or not. We will work to a 95% degree of confidence.

Table 7.1. Errors in MR-based gestational age estimation for 300 fetuses.

Error in gestational age estimation (days) Number of cases
Less than −10 2
Between −10 and −5 36
Between −5 and 0 123
Between 0 and 5 97
Between 5 and 10 38
More than 10 4

The null hypothesis for the χ² goodness-of-fit test is that there is no significant difference between the sample data and the expected distribution (in this case, a normal distribution). The alternative hypothesis is that there is a difference. To decide whether or not we can reject the null hypothesis, we need to compute the χ² test statistic Calc χ². In a similar way to the χ² tests that we saw in Section 6.5, the χ² goodness-of-fit test computes the test statistic by comparing observed frequencies with expected frequencies. This time, we compare observed frequencies of sample values within particular ranges (or bins) with those that would be expected if the sample were from a normal distribution with the same mean and standard deviation as the sample. This comparison is summarized in Table 7.2. The first column shows the bins (or ranges) of errors used (i.e. the same as in Table 7.1). The second column shows the probabilities of sample values from these bins (assuming that the sample was normally distributed). These probabilities can be computed from the areas under a normal distribution with the same mean and standard deviation as the sample (e.g. see Fig. 4.5B). Based on these probabilities and the sample size, we can compute expected frequencies E for each bin. These values are shown in the third column of the table. For example, the value for the <−10 bin is 3.39, which is equal to the probability 0.0113 multiplied by the sample size 300. The fourth column shows the observed frequencies O, which are reproduced from Table 7.1. Finally, the fifth column shows the χ² statistic for each row. This is computed as the square of the difference between the observed and expected frequencies divided by the expected frequency. The final test statistic, Calc χ², is the sum of all of these χ² statistics. Note that this is the same formula as we used for the χ² tests that we saw in Section 6.5, that is, Eq. (6.4), which is reproduced here for convenience:

Table 7.2. Computation of Calc χ² for the gestational age error data.

Gest. age error  Prob.   E               O         (O − E)²/E
<−10             0.0113  3.39 } 38.09    2 } 38    0.0002
⩾−10 and <−5     0.1157  34.7 }          36 }
⩾−5 and <0       0.3723  111.7           123       1.14
⩾0 and <5        0.373   111.9           97        1.98
⩾5 and <10       0.1163  34.89 } 38.31   38 } 42   0.36
⩾10              0.0114  3.42 }          4 }
Totals:          1.0     300.0           300       Calc χ² = 3.48

\(\chi^2 = \sum \frac{(O - E)^2}{E}.\)

Note that the frequencies for the first two bins (<−10, ⩾−10 and <−5) have been combined. We should always do this when the frequency of any bin is less than or equal to 5. In our case, the first bin (<−10) has both observed and expected frequencies that are less than or equal to 5. Therefore, we have to combine the first two bins to ensure that both expected and observed frequencies are greater than 5. We perform the same combination for the last two bins for the same reason.

After we have calculated our test statistic, we simply compare it to a critical value from Table A.5. To look up the critical value, we must know the number of degrees of freedom of the test. For a χ² goodness-of-fit test, the number of degrees of freedom is the number of bins minus 3. We subtract 3 because we already know that the sums of the expected and observed frequencies are the same, and we also know the mean and standard deviation of the distribution. For our example, we have 4 bins (after the first two and the last two have been combined). Therefore, we have 4 − 3 = 1 degree of freedom. From Table A.5 we see that our critical value Tab χ² for a 0.05 significance level (i.e. 95% confidence) is equal to 3.841. Because Calc χ² = 3.48 is not bigger than Tab χ² = 3.841, we do not reject the null hypothesis that the data are from a normal distribution.
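For readers who want to verify the arithmetic, here is a minimal Python sketch that reproduces the Table 7.2 computation from the published probabilities and observed counts; the bin-combining step mirrors the text.

```python
probs = [0.0113, 0.1157, 0.3723, 0.373, 0.1163, 0.0114]
obs = [2, 36, 123, 97, 38, 4]
n = 300

exp = [p * n for p in probs]    # expected frequencies E = probability x 300

# Combine the first two and the last two bins so that no frequency is <= 5.
exp = [exp[0] + exp[1], exp[2], exp[3], exp[4] + exp[5]]
obs = [obs[0] + obs[1], obs[2], obs[3], obs[4] + obs[5]]

calc_chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
print(round(calc_chi2, 2))      # about 3.48, below Tab chi2 = 3.841 at 1 df
```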

The Intuition: The Chi-Square Goodness-of-Fit Test

The intuition behind the χ² goodness-of-fit test is similar to that described for the χ² tests that we saw in Section 6.5. The χ² test statistic has a known distribution, as shown in Fig. 6.4. The null hypothesis of normality is rejected when the statistic is large enough to go beyond the critical χ² value Tab χ² for the given significance level, that is, when it becomes unlikely that we would get a value this large or larger by chance.

Activity 7.4

Resting heart rate data (in beats per minute, bpm) have been gathered from a cohort of 350 volunteers. The data are summarized in the table and histogram below.

[Histogram of the heart rate data]

Heart rate Probability Expected frequency, E Observed frequency, O (O − E)²/E
<60 0.0096 3
60–65 0.0238 12
65–70 0.0594 19
70–75 0.115 39
75–80 0.1726 55
80–85 0.2009 75
85–90 0.1814 67
90–95 0.127 47
95–100 0.0689 16
100–105 0.029 12
105–110 0.0095 3
>110 0.0029 2
Totals: 1.0 350 Calc χ² =

The mean heart rate is 82.9 bpm, and the standard deviation is 9.8 bpm. The table also shows the probabilities of the different heart rate bins based on a normal distribution with the same mean and standard deviation. The expected frequency and χ² test statistic columns have been left blank for you to fill in.

Use the values in the table to perform the χ² test by hand to test if the heart rate data come from a normal distribution.
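If you want to check your hand computation afterwards, here is a sketch using SciPy's chisquare function; the tail-bin combining and the ddof argument (two estimated parameters) follow the rules described in Section 7.3.5.

```python
from scipy import stats

probs = [0.0096, 0.0238, 0.0594, 0.115, 0.1726, 0.2009,
         0.1814, 0.127, 0.0689, 0.029, 0.0095, 0.0029]
obs = [3, 12, 19, 39, 55, 75, 67, 47, 16, 12, 3, 2]
n = 350

exp = [p * n for p in probs]    # expected frequency = probability x 350

# Combine bins at each tail until every expected frequency exceeds 5
# (the first two bins, and the last three).
exp = [exp[0] + exp[1]] + exp[2:9] + [exp[9] + exp[10] + exp[11]]
obs = [obs[0] + obs[1]] + obs[2:9] + [obs[9] + obs[10] + obs[11]]

# ddof=2 because the mean and standard deviation were estimated from
# the sample, giving df = (number of bins) - 3.
stat, p_value = stats.chisquare(obs, f_exp=exp, ddof=2)
print(stat, p_value)
```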

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780081029398000165

Analysis of Categorical Data

Ronald N. Forthofer , ... Mike Hernandez , in Biostatistics (Second Edition), 2007

Conclusion

In this chapter, we introduced another nonparametric test — the chi-square goodness-of-fit test — and showed its use with one- and two-way contingency tables. We also showed two related methods — comparison of two binomial proportions and the calculation of the odds ratio — for determining, at some significance level, whether or not there is a relation between two discrete variables with two levels each. The odds ratio is of particular interest as it is used extensively in epidemiologic research. We also presented the extension of the goodness-of-fit test for no interaction to r by c contingency tables. Another test shown was the trend test, and it is of interest because it has a greater chance of detecting a linear relationship between a nominal and an ordinal variable than does the general chi-square test for no interaction. We also showed different ways for testing the hypothesis of no relationship between two discrete variables with two levels each in the matched-pairs situation. The Cochran-Mantel-Haenszel test and estimate of the common odds ratio were introduced for multiple 2 by 2 contingency tables. These procedures are also used extensively by epidemiologists. In the next chapter, we conclude the material on nonparametric procedures with the presentation of several nonparametric methods for the analysis of survival data.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123694928500157

GOODNESS OF FIT TESTS AND CATEGORICAL DATA ANALYSIS

Sheldon M. Ross , in Introduction to Probability and Statistics for Engineers and Scientists (Fourth Edition), 2009

Publisher Summary

This chapter is concerned with goodness-of-fit tests that can be used to test whether a proposed model is consistent with data. It presents the classical chi-square goodness-of-fit test and applies it to test for independence in contingency tables. The chapter also introduces the Kolmogorov–Smirnov procedure for testing whether data come from a specified continuous probability distribution. The classical approach to obtaining a goodness-of-fit test of a null hypothesis that a sample has a specified probability distribution is to partition the possible values of the random variables into a finite number of regions. The situation where each member of a population is classified according to two distinct characteristics is also considered, and it is shown how to use this analysis to test the hypothesis that the characteristics of a randomly chosen member of the population are independent. The chapter also provides practice exercises and solved examples.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123704832000163

Tests on Variability and Distributions

R.H. Riffenburgh , in Statistics in Medicine (Third Edition), 2012

Example Posed: Is the Distribution of Ages of the 301 Prostate Patients Normal?

We ask if the data are just normal, not drawn from a specific normal with theoretically given μ and σ, so we use m = 66.76 and s = 8.10 from DB1. With this large a sample, we can use the chi-square goodness-of-fit test.

Method: Chi-Square Goodness-of-Fit Test of Normality

The hypotheses are H0: the distribution from which the sample was drawn is normal, or alternatively is normal with parameters μ and σ; and (two-tailed) H1: the distribution is different. Choose α. Look up the critical χ² in Table III.

1.

We define the data intervals, say k in number, as we would were we to form a histogram of the data. We form a blank table in the format of Table 14.10.

Table 14.10. Format for Table of Values Required to Compute the Chi-square Goodness-of-Fit Statistic

Interval Standard Normal z to End of Interval Probability Expected Frequencies (e i ) Observed Frequencies (n i )
: : : : :
: : : : :
2.

To use a normal probability table, we standardize the ends of the intervals by subtracting the mean and dividing by the standard deviation. The "expected" normal is usually specified by the sample m and s, although it could be specified by a theoretical μ and σ.

3.

To relate areas under the normal curve to the intervals, we find the area to the end of an interval from a table of normal probabilities, such as Table I, and subtract the area to the end of the preceding interval.

4.

To find the frequencies expected from a normal fit (name them e_i), we multiply the normal probabilities for each interval by the total number of data n.

5.

We tally the number of data falling into each interval and enter the tally numbers in the table. Name these numbers n_i.

6.

Calculate a χ² value [the pattern is similar to Eq. (9.2)] using Eq. (14.7):

(14.7) \(\chi^2 = \sum \frac{(n_i - e_i)^2}{e_i} = \sum \frac{n_i^2}{e_i} - n,\)

where the first form is easier to understand conceptually and the second is easier to compute.

7.

If the calculated χ² is greater than the critical χ², reject H0; otherwise, do not reject normality.
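As a quick numerical check of the identity in Eq. (14.7), the following sketch evaluates both forms on made-up frequencies; the two agree whenever the expected frequencies sum to n, as they do by construction in step 4.

```python
# Made-up observed and expected frequencies with sum(n_i) = sum(e_i) = 220.
n_i = [12, 30, 55, 60, 42, 21]
e_i = [10.5, 31.2, 57.8, 58.1, 40.9, 21.5]
n = sum(n_i)

form1 = sum((o - e) ** 2 / e for o, e in zip(n_i, e_i))   # conceptual form
form2 = sum(o ** 2 / e for o, e in zip(n_i, e_i)) - n     # computational form
print(form1, form2)   # identical up to floating-point rounding
```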

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123848642000147

Survival, Logistic Regression, and Cox Regression

R.H. Riffenburgh , in Statistics in Medicine (Third Edition), 2012

Example Posed: Survival of Men versus Women with Diabetes

Figure 23.3 superposes the survival curves for 319 men (Figure 23.1) and 370 women with diabetes mellitus during the 1980–1990 decade.76 We see a difference by inspection, but is this difference significant?

Figure 23.3. Survival curves for 319 men and 370 women in Rochester, MN, having adult-onset diabetes mellitus who were older than 45 years at onset during the decade 1980–1990.

Method for Testing the Difference of Two Survival Curves

One statistical procedure that answers this question is the log-rank test. This test uses a chi-square statistic based on the difference between the observed survival and the survival that would be expected if the curves were not different, in the same way that a chi-square goodness-of-fit test [Eq. (14.7)] uses the sum of squares of weighted differences between the observed and expected curves. However, the log-rank test's χ² is more complicated to calculate. It uses matrix algebra, multiplying vectors of differences for the time periods by the matrix of variances and covariances. A statistical software package should be used for this calculation. The result of the calculation is a χ² statistic, which may be compared with a χ² critical value from Table III (see Tables of Probability Distributions) for df = number of survival curves − 1. When two curves, as in Figure 23.3, are tested, df = 1. If the calculated statistic is greater than the critical value, the curves are significantly different.
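As one illustration of using a statistical software package for this calculation, here is a sketch based on the open-source Python package lifelines; the survival times, censoring indicators, and rates below are synthetic stand-ins, not the Rochester data.

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(1)
t_men = rng.exponential(8.0, 319)      # hypothetical years of follow-up
t_women = rng.exponential(10.0, 370)
d_men = rng.random(319) < 0.7          # True = death observed, False = censored
d_women = rng.random(370) < 0.7

res = logrank_test(t_men, t_women,
                   event_observed_A=d_men, event_observed_B=d_women)
print(res.test_statistic, res.p_value)   # chi-square with df = 2 - 1 = 1
```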

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123848642000238

Data Science: Theory and Applications

Sunil Mathur , in Handbook of Statistics, 2021

3 One-sample methods

In big data, many times a single stream of data is collected on a unit of interest. The data might be collected in batches, periodically, in near-real time, or in real time. As the analysis of big data evolves, it is necessary to build models that integrate different component models to serve particular applications. In the case of streaming data in real time, the processing of real-time data may be followed by batch processing, which gives rise to categories of data corresponding to the batches. Some of the tests available in the literature are based on the empirical distribution function. The empirical distribution function is an estimate of the population distribution function, and it scales to big data. It is defined as the proportion of sample observations that are less than or equal to x, for all real numbers x.

We consider the classical one-sample location problem with univariate data. Let x1, x2, …, xn be an independent random sample of size n from a continuous distribution with distribution function F. Let the hypothesized cumulative distribution function be denoted by F0(x) and the empirical distribution function by Sn(x) for all x. The hypothesis to be tested is H0: F = F0 vs Ha: F ≠ F0. If the null hypothesis is true, then the difference between Sn(x) and F0(x) must be close to zero.

Thus, for large n, the test statistic

(1) \(D_n = \sup_x |S_n(x) - F_0(x)|,\)

will have a value close to zero under the null hypothesis.

The test statistic Dn, called the Kolmogorov-Smirnov one-sample statistic (Gibbons and Chakraborti, 2014), does not depend on the population distribution function provided that the distribution function is continuous, and hence Dn is a distribution-free test statistic. The goodness-of-fit test for one sample was proposed by Kolmogorov (1933); the Kolmogorov-Smirnov test for two samples was proposed by Smirnov (1939).

Here we define the order statistics \(X_{(0)} = -\infty\) and \(X_{(n+1)} = \infty\), and

(2) \(S_n(x) = \frac{i}{n}, \quad \text{for } X_{(i)} \le x < X_{(i+1)}, \quad i = 0, 1, \ldots, n.\)

The probability distribution of the test statistic does not depend on the distribution function F(x), provided F(x) is continuous. The asymptotic distribution of \(\sqrt{n}\,D_n\) is the Kolmogorov distribution.

The exact sampling distribution of the Kolmogorov-Smirnov test statistic is known, whereas the distribution of the chi-square goodness-of-fit statistic is only approximately chi-square for finite n. Moreover, the chi-square goodness-of-fit test requires that the expected number of observations in each cell be greater than five, while the Kolmogorov-Smirnov statistic does not require this condition. On the other hand, the asymptotic distribution of the chi-square goodness-of-fit statistic does not require that the population distribution be continuous, whereas the exact distribution of the Kolmogorov-Smirnov statistic does require that F(x) be continuous. The power of the chi-square test depends on the number of classes or groups used.
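A short Python sketch (SciPy assumed) contrasts the two tests on one simulated sample; the data, the fully specified null distribution, and the cell boundaries are all illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 1000)                 # simulated sample

# Kolmogorov-Smirnov against a fully specified F_0 (standard normal):
# no binning is needed.
ks_stat, ks_p = stats.kstest(x, "norm")

# Chi-square needs cells; these boundaries keep expected counts above five.
cuts = np.linspace(-2, 2, 9)               # interior cell boundaries
obs = np.bincount(np.searchsorted(cuts, x), minlength=len(cuts) + 1)
p = np.diff(np.concatenate([[0.0], stats.norm.cdf(cuts), [1.0]]))
chi2_stat, chi2_p = stats.chisquare(obs, f_exp=p * len(x))

print(ks_stat, ks_p, chi2_stat, chi2_p)
```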

The Wilcoxon signed-rank test (Wilcoxon, 1945) requires that the parent population be symmetric. When data are collected in batches, the data might be symmetric at some points, particularly in the case of seasonal and periodic data. Let us consider a random sample X1, X2, …, Xn from a continuous cdf F which is symmetric about its median M. The null hypothesis can be stated as

(3) \(H_0: M = M_0.\)

The alternative hypotheses can be postulated accordingly. We notice that the differences Di = Xi − M0 are symmetrically distributed about zero, and hence, under the null hypothesis, the number of positive differences is expected to equal the number of negative differences. The ranks of the absolute differences |D1|, |D2|, …, |Dn| are denoted by Rank(·). Then, the test statistics can be defined as

(4) \(T^+ = \sum_{i=1}^{n} a_i\, \mathrm{Rank}(|D_i|),\)

(5) \(T^- = \sum_{i=1}^{n} (1 - a_i)\, \mathrm{Rank}(|D_i|),\)

where

(6) \(a_i = \begin{cases} 1, & \text{if } D_i > 0, \\ 0, & \text{if } D_i < 0. \end{cases}\)

Since the indicator variables a_i are independent and identically distributed Bernoulli variates with P(a_i = 1) = P(a_i = 0) = 1/2, under the null hypothesis we have

(7) \(E(T^+ \mid H_0) = \sum_{i=1}^{n} E(a_i)\, \mathrm{Rank}(|D_i|) = \frac{n(n+1)}{4},\)

and

(8) \(\mathrm{Var}(T^+ \mid H_0) = \sum_{i=1}^{n} \mathrm{var}(a_i)\, \mathrm{Rank}(|D_i|)^2 = \frac{n(n+1)(2n+1)}{24}.\)

Another common representation for the test statistic T + is given as follows.

(9) \(T^+ = \sum_{1 \le i \le j \le n} T_{ij},\)

where

(10) \(T_{ij} = \begin{cases} 1, & \text{if } D_i + D_j > 0, \\ 0, & \text{otherwise.} \end{cases}\)

Similar expressions can be derived for T^−. The paired-samples case can be handled via the differences X1 − Y1, X2 − Y2, …, Xn − Yn of a random sample of n pairs (X1, Y1), (X2, Y2), …, (Xn, Yn). These differences are treated as a single sample, and the one-sample test procedure is applied. The null hypothesis to be tested is

\(H_0: M = M_0,\)

where M0 is the median of the differences X1 − Y1, X2 − Y2, …, Xn − Yn. These differences are treated as a single sample with hypothetical median M0. Then, the Wilcoxon signed-rank method described above for a single sample can be applied to test the null hypothesis that the median of the differences is M0.

A good test must not only be fast to compute but must also be able to uncover information hidden in big data. The Wilcoxon signed-rank test fulfills that requirement; however, several other tests available in the literature are competitors of the Wilcoxon signed-rank test.
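As an illustration, the following Python sketch computes T⁺ as in Eqs. (4)–(6), applies the moments of Eqs. (7) and (8) for a large-sample normal approximation, and checks the result against SciPy's implementation; the sample and M0 are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(5.3, 1.0, 40)     # hypothetical batch of measurements
m0 = 5.0                         # hypothesized median

d = x - m0
ranks = stats.rankdata(np.abs(d))          # Rank(|D_i|)
t_plus = ranks[d > 0].sum()                # Eq. (4): ranks of positive D_i

n = len(d)
mean_t = n * (n + 1) / 4                   # Eq. (7)
var_t = n * (n + 1) * (2 * n + 1) / 24     # Eq. (8)
z = (t_plus - mean_t) / np.sqrt(var_t)     # large-sample normal approximation

stat, p = stats.wilcoxon(d)                # SciPy reports min(T+, T-)
print(t_plus, z, stat, p)
```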

Chattopadhyay and Mukhopadhyay (2019) used a kernel of degree k (> 1) to develop a one-sample nonparametric test. Define the kernel (k = 2):

(11) \(\psi_2(X_i, X_j) = \begin{cases} 1, & \text{if } (X_i + X_j)/2 \ge 0, \\ 0, & \text{if } (X_i + X_j)/2 < 0. \end{cases}\)

This kernel corresponds to a U-statistic of degree 2:

(12) \(S_n(2) = \binom{n}{2}^{-1} \sum_{1 \le i_1 < i_2 \le n} \psi_2(X_{i_1}, X_{i_2}).\)

The sign test and the Mann-Whitney test involve U-statistics with symmetric kernels of degree one and (one, one), respectively.

A general test statistic (Chattopadhyay and Mukhopadhyay, 2019) based on a kernel of degree k (< n) can be defined as:

(13) \(S_n(k) = \binom{n}{k}^{-1} \sum_{1 \le i_1 < \cdots < i_k \le n} \psi_k(X_{i_1}, \ldots, X_{i_k}),\)

where

(14) \(\psi_k(X_{i_1}, \ldots, X_{i_k}) = \begin{cases} 1, & \text{if } \bar{X}_{i_k} \ge 0, \\ 0, & \text{if } \bar{X}_{i_k} < 0, \end{cases}\)

and \(\bar{X}_{i_k} = \frac{1}{k} \sum_{j=1}^{k} X_{i_j}\), for \(1 \le i_1 < \cdots < i_k \le n\) and n > k.

The null hypothesis, as given by statement (3), can be tested at a level α using the following criterion:

Reject the null hypothesis if \(S_n(k) > c_\alpha\), where \(P_{H_0}(S_n(k) > c_\alpha) \le \alpha\) and \(c_\alpha\) is the critical value.
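A brute-force sketch of S_n(k) from Eqs. (13) and (14) for small n and k, with made-up data centered per the null M0 = 0; it simply computes the proportion of k-subsets whose mean is nonnegative.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.3, 1.0, 25)     # hypothetical sample, shifted under H_a

def s_n(x, k):
    # Proportion of k-subsets whose mean is nonnegative (psi_k = 1).
    subsets = list(combinations(x, k))
    return sum(np.mean(s) >= 0 for s in subsets) / len(subsets)

print(s_n(x, 2), s_n(x, 3))      # compare against a critical value c_alpha
```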

In big data scenarios, face recognition has received significant attention owing to increased security concerns at public places such as airports and rail stations. A single sample is generally obtained from an ID card or e-passport, captured in a very stable environment, while probe images are captured in a highly unstable environment, usually from surveillance cameras. The probe images may include noise, blur, arbitrary pose, and varying illumination, which makes comparison with the standard database image, and hence recognition, difficult. The performance of available computational methods for face recognition based on principal component analysis, linear discriminant analysis, sparse representation, kernel methods, and the like, however, is heavily influenced by the number of training samples per person.

Since there is only one sample available for such problems, we try to increase the number of samples artificially using synthetic sample generation from a 3D model of the available image. The new dataset with multiple artificially generated samples can be used as a gallery set, while the probe set contains images from surveillance cameras in an unconstrained environment. One can then select 2D facial points and the landmark points in the 3D model in the gallery set and find median points at each landmark point. Similarly, select those points in the probe set and run a one-sample test, such as that of Eq. (9), at each landmark point. Larger similarities at the landmark points indicate similarity between the gallery and probe images.

The problem generally arises when the interest is in identifying an individual at busy public places such as airports, train stations, and public meeting places. That involves matching the gallery set against a probe set containing millions of images. To accomplish that task quickly and efficiently, one can set up three layers of batch processing. The first layer eliminates the data showing more than two standard deviations of variation in the major landmarks; thus, probe data with too wide or too small a face are eliminated. In the second layer, finer landmarks such as ear length and width and nose length are matched, and data showing more than two standard deviations of variation in those landmarks are eliminated. The remaining data are then far easier to handle with the batch processing method (Fig. 2).


Fig. 2. Converting 2D facial model to 3D model and increasing virtual sample size for gallery data.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/S0169716120300481

Analysis of variance

Kandethody M. Ramachandran , Chris P. Tsokos , in Mathematical Statistics with Applications in R (Third Edition), 2021

9.3.2 Testing the assumptions for one-way analysis of variance

The randomness assumption could be tested using the Wald–Wolfowitz test (see Project 12B). The assumption of independence of the samples is hard to test without knowing how the data are collected and should be addressed during collection of data in the design stage. Normality can be tested (this should be performed separately for each sample, not for the total data set) using probability plots or other tests such as the chi-square goodness-of-fit test. ANOVA is fairly robust against violation of this assumption if the sample sizes are equal. Also, if the sample sizes are fairly large, the central limit theorem helps. The presence of outliers is likely to increase the sample variance, thus decreasing the value of the F-statistic for ANOVA, which will result in a lower power of the test. Box plots or probability plots could be used to identify the outliers. If the normality test fails, transforming the data (see Section 14.4.2) or a nonparametric test such as the Kruskal–Wallis test described in Section 12.5.1 may be more appropriate. If the sizes of all the samples are equal, ANOVA is mostly robust for violation of homogeneity of the variances. A rule of thumb used for robustness of this condition is that the ratio of the largest sample variance s2 to the smallest sample variance s2 should be no more than 3:1. Another popular rule of thumb used in one-way ANOVA to verify the requirement of equality of variances is that the largest sample standard deviation be no larger than two times the smallest sample standard deviation. Graphically, side-by-side box plots of the samples can also reveal a lack of homogeneity of variances if some box plots are much longer than others (see Fig. 9.3E). For a significance test on the homogeneity of variances (Levene's test), refer to Section 14.4.3. If these tests reveal that the variances are different, then the populations are different, despite what ANOVA concludes about differences of the means. But this itself is significant, because it shows that the treatments had an effect.
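As a concrete illustration, the sketch below (Python with SciPy assumed) applies these rules of thumb and a per-sample normality test to the noise-level data of Example 9.3.2, which follows; note that the variance checks flag the larger spread of the medium-size group that is also visible in Fig. 9.3E.

```python
import numpy as np
from scipy import stats

groups = [np.array([820, 820, 825, 835, 825]),   # small
          np.array([840, 825, 815, 855, 840]),   # medium
          np.array([785, 775, 770, 760, 770])]   # large

variances = [g.var(ddof=1) for g in groups]
sds = [np.sqrt(v) for v in variances]
print(max(variances) / min(variances) <= 3)      # 3:1 variance-ratio rule
print(max(sds) <= 2 * min(sds))                  # largest sd <= 2 x smallest

for g in groups:                                 # normality, one sample at a time
    w, pval = stats.shapiro(g)                   # small n: Shapiro-Wilk
    print(pval)
```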

Example 9.3.2

In order to study the effect of automobile size on noise pollution, the following data are randomly chosen from the air pollution data (source: A.Y. Lewin and M.F. Shakun, Policy Sciences: Methodology and Cases, Pergamon Press, 1976, p. 313). The automobiles are categorized as small, medium, and large, and noise level readings (in decibels) are given in Table 9.5.

Table 9.5. Size of Automobile and Noise Level (Decibels).

Size of automobile
Small Medium Large
Noise level (dB) 820 840 785
820 825 775
825 815 770
835 855 760
825 840 770

At the α   =   0.05 level of significance, test for equality of population mean noise levels for different sizes of the automobiles. Comment on the assumptions.

Solution

Let μ1, μ2, and μ3 be population mean noise levels for small, medium, and large automobiles, respectively. First, we test for the assumptions. Using Minitab to perform runs tests for each of the samples, we can justify the assumption of randomness of the sample values. A normality test for each column gives the graphs shown in Figs. 9.3A–9.3C, through which we can reasonably assume normality. Because the sample sizes are equal, we will use the one-way ANOVA method to analyze these data.

Figure 9.3A. Normal plot for noise level of small automobiles.

Figure 9.3B. Normal plot for noise level of medium-sized automobiles.

Figure 9.3C. Normal plot for noise level of large automobiles.

Fig. 9.3D indicates that the relative positions of the sample means are different, and Fig. 9.3E (Minitab steps for creating side-by-side box plots are given at the end of Example 9.7.1 ) gives an indication of within-group variations; perhaps the group 2 (medium size) variance is larger. Now, we will do the analytic testing.

Figure 9.3D. Mean decibel levels for three sizes of automobiles.

Figure 9.3E. Side-by-side box plots for decibel levels for three sizes of automobiles.

We test:

\(H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{versus} \quad H_a: \text{at least two of the } \mu_i \text{ are different.}\)

Here, k = 3, n1 = 5, n2 = 5, n3 = 5, and N = n1 + n2 + n3 = 15.

Also,

T_i: 4125, 4175, 3860
n_i: 5, 5, 5
Mean \(\bar{T}_i\): 825, 835, 772

In the following calculations, for convenience we will approximate all values to the nearest integer:

\(CM = \frac{\left(\sum_i \sum_j y_{ij}\right)^2}{N} = \frac{(12{,}160)^2}{15} = 9{,}857{,}707, \qquad \text{Total } SS = \sum_i \sum_j y_{ij}^2 - CM = 12{,}893, \qquad SST = \sum_i \frac{T_i^2}{n_i} - CM = 11{,}463,\)

Hence,

\(MST = \frac{SST}{k-1} = \frac{11{,}463}{2} = 5732,\)

and

\(MSE = \frac{SSE}{N-k} = \frac{1430}{12} = 119.\)

The test statistic is:

\(F = \frac{MST}{MSE} = \frac{5732}{119} = 48.10.\)

From the table, we get F0.05,2,12  =   3.89. Because the test statistic falls in the rejection region, we reject at α =  0.05 the null hypothesis that the mean noise levels are the same. We conclude that the size of the automobile does affect the mean noise level.
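The same analysis can be reproduced in one call with SciPy; the data are those of Table 9.5, and the resulting F statistic agrees with the hand computation up to rounding.

```python
from scipy import stats

small = [820, 820, 825, 835, 825]
medium = [840, 825, 815, 855, 840]
large = [785, 775, 770, 760, 770]

f_stat, p_value = stats.f_oneway(small, medium, large)
print(f_stat, p_value)   # F of about 48.1, far beyond F(0.05, 2, 12) = 3.89
```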

It should be noted that the alternative hypothesis Ha in this section covers a wide range of situations, from the case where all but one of the population means are equal to the case where they are all different. Hence, with such an alternative, if the samples lead us to reject the null hypothesis, we are left with a lot of unsettled questions about the means of the k populations. Such follow-up analyses are called post hoc tests. This problem of multiple comparisons is the topic of Section 9.8.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128178157000099

A Historical Account

N. Balakrishnan , ... M.S Nikulin , in Chi-Squared Goodness of Fit Tests with Applications, 2013

During the last 30 years, much work has been done on the classical chi-squared tests and on proposing some very original modifications (Nikulin and Voinov, 2006; Voinov and Nikulin, 2011). Bhalerao et al. (1980) noted that the limiting distribution of Wald-type modifications of the Pearson-Fisher test does not depend on the generalized minimum chi-squared procedure used, but its power may depend on it. A numerical example for the negative binomial distribution was considered for illustration. Moore and Stubblebine (1981) generalized the NRR statistic to test for two-dimensional circular normality (see also Follmann, 1996). It is usually supposed that observations are realizations of independent and identically distributed (i.i.d.) random variables. Gleser and Moore (1983) showed "that if the observations are in fact a stationary process satisfying a positive dependence condition, the test (such as chi-squared) will reject a true null hypothesis too often." Guenther (1977) and Drost (1988) considered the problem of approximation of power and sample size selection for multinomial tests. Drost (1989) also introduced a generalized chi-square goodness of fit test, which is a subclass of the Moore-Spruill class, for location-scale families when the number of equiprobable cells tends to infinity (see also Osius, 1985). He recommended a large number of classes for heavy-tailed alternatives. Heckman (1984) and Andrews (1988) discussed the theory and applications of chi-squared tests for models with covariates. Hall (1985) proposed a chi-squared test for uniformity based on overlapping cells. He showed that the statistic modified in such a manner is able to detect alternatives that are \(n^{-1/2}\) distant from the null hypothesis. Loukas and Kemp (1986) studied applications of Pearson's test for bivariate discrete distributions. Kocherlakota and Kocherlakota (1986) suggested goodness of fit tests for discrete distributions based on probability generating functions. Habib and Thomas (1986) and Bagdonavičius and Nikulin (2011) suggested modified chi-squared tests for randomly censored data. Hjort (1990), by using Wald's approach for time-continuous survival data, proposed a new class of goodness of fit tests based on cumulative hazard rates, which work well even when no censoring is present. Nikulin and Solev (1999) presented a chi-squared goodness of fit test for doubly censored data. Singh (1986) proposed a modification of the Pearson-Fisher test based on collapsing some cells. Akritas (1988) (see also Hollander and Peña, 1992; Peña, 1998a,b) proposed modified chi-squared tests when data can be subject to random censoring. Cressie and Read (1984) (see also the exhaustive review of Cressie and Read (1989)) introduced the family of power divergence statistics of the form

(1.9) \(2nI^{\lambda} = \frac{2}{\lambda(\lambda+1)} \sum_{i=1}^{k} X_i \left[ \left( \frac{X_i}{np_i} \right)^{\lambda} - 1 \right], \quad \lambda \in \mathbb{R}^1.\)

Interested readers may refer to the book by Pardo (2006) for an elaborate treatment of statistical inferential techniques based on divergence measures. Pearson's X² statistic (λ = 1), the log-likelihood ratio statistic (λ → 0), the Freeman-Tukey statistic (λ = −1/2), the modified log-likelihood ratio statistic (λ = −1), and the Neyman modified X² statistic (λ = −2) are all particular cases of (1.9). As a compromise between Pearson's X² and the likelihood ratio statistic, Cressie and Read (1984) suggested a new goodness of fit test with λ = 2/3. Read (1984) performed exact power comparisons of different tests from that family for symmetric null hypotheses under specified alternatives. Moore (1986) wrote "for general alternatives, we recommend that the Pearson X² statistic be employed in practice when a choice is made among the statistics 2nI^λ." A comparative simulation study of some tests from the power divergence family in (1.9) was performed by Koehler and Gan (1990). Karagrigoriou and Mattheou (2010) (see also the references therein) suggested a generalization of measures of divergence that includes as particular cases many other previously considered measures.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123971944000016

Chapter Summaries

ROBERT H. RIFFENBURGH , in Statistics in Medicine (Second Edition), 2006

FORMULAS FOR TESTS OF DISTRIBUTION SHAPE FROM CHAPTER 20

TESTS OF NORMALITY OF A DISTRIBUTION

Details of the Kolmogorov–Smirnov (KS) and chi-square goodness-of-fit methods are given next. (Use the Shapiro–Wilk test only with a statistical software package.)

Table 20.1. Partial Guide to Selecting a Test of Normality of a Distribution

Prefer less conservative test Prefer more conservative test
Small sample (5–50) Shapiro–Wilk test Kolmogorov–Smirnov test (one-sample form)
Medium to large sample (>50) Shapiro–Wilk test Chi-square goodness-of-fit test

KOLMOGOROV–SMIRNOV TEST (ONE-SAMPLE FORM)

Format of Table Providing Calculations Required for the Kolmogorov–Smirnov Test of Normality
Data x k F n (x) z F e (x) | F n (x) – F e (x)|
. : : : : : :
: : : : : : :

Hypotheses are H0: Sample distribution not different from specified normal distribution; and H1: Sample distribution is different. Choose α.

(1)

Arrange the n sample values in ascending order.

(2)

Let x denote the sample value each time it changes. Write x-values in a column in order.

(3)

Let k denote the number of sample members less than x. Write k-values next to x-values.

(4)

List each F n (x) = k/n in the next column corresponding to the associated x.

(5)

List z = (x − μ)/σ for each x to test against an a priori distribution, or z = (x − m)/s for each x to test for a normal shape but not a particular normal.

(6)

For each z, list the expected Fe(x) as the area under the normal distribution to the left of z. (This area can be found from software, from a very complete normal table, or by interpolation from Table I.)

(7)

List next to these the differences |Fn(x) − Fe(x)|.

(8)

The test statistic is the largest of these differences, say, L.

(9)

Calculate the critical value. For a 5% α, use \(1.36/\sqrt{n} - 1/(4.5n)\). Critical values for α = 1% and 10% are given in the text. If L is greater than the critical value, reject H0; otherwise, do not reject H0.
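A minimal Python sketch of steps 1–9 for the one-sample form, using invented data and the 5% critical-value formula from step 9; it tests for normal shape using the sample m and s.

```python
import numpy as np
from scipy import stats

# Step 1: invented sample, arranged in ascending order.
x = np.sort(np.array([4.1, 4.7, 5.0, 5.3, 5.9, 6.2, 6.8, 7.4]))
n = len(x)

f_n = np.arange(n) / n              # step 4: F_n(x) = k/n, k values below x
m, s = x.mean(), x.std(ddof=1)
z = (x - m) / s                     # step 5: normal shape, not a specific normal
f_e = stats.norm.cdf(z)             # step 6: expected F_e(x)

L = np.max(np.abs(f_n - f_e))       # steps 7-8: largest difference
crit = 1.36 / np.sqrt(n) - 1 / (4.5 * n)   # step 9: 5% critical value
print(L, crit, L > crit)
```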

LARGE-SAMPLE TEST OF NORMALITY OF A DISTRIBUTION: CHI-SQUARE GOODNESS-OF-FIT TEST

Table 20.4. Format for Table of Values Required to Compute the Chi-square Goodness-of-Fit Statistic

Interval Standard normal z to end of interval P Expected frequencies (ei ) Observed frequencies (n i )
1.

Choose α. Define k data intervals as in forming a histogram of the data.

2.

Standardize interval ends by subtracting mean and dividing by standard deviation.

3.

Find the area under the normal curve to the end of an interval from a table of normal probabilities and subtract the area to the end of the preceding interval.

4.

Multiply normal probabilities by n to find expected frequencies ei .

5.

Tally the number of data ni in each interval.

6.

Calculate the χ² value: \(\chi^2 = \sum \frac{n_i^2}{e_i} - n.\)

7.

Find the critical χ² from Table III (see Tables of Probability Distributions). If the calculated χ² is greater than the critical χ², reject H0; otherwise, do not reject H0.

TEST OF EQUALITY OF TWO DISTRIBUTIONS (KOLMOGOROV–SMIRNOV TEST)

H0: The population distributions from which the samples arose are not different; H1: They are different. The sample sizes are n 1 and n 2; n 1 is the larger sample. Choose α.

1.

Combine the two data sets and list in ascending order.

Table 20.8. Format for Table of Data and Calculations Required for the Two-Sample Kolmogorov-Smirnov Test of Equality of Distributions

Ordered data k 1 k 2 F1 F2 |F1 – F2|
2.

For each sample 1 datum different from the datum above it, list for k1 the number of data in sample 1 preceding it. Repeat the process for sample 2, listing entries for k2.

3.

For every k1, list F1 = k1/n1. Repeat for sample 2. In all blanks, list the F from the line above.

4.

List |F1 − F2| for every datum.

5.

The test statistic is the largest of these differences; call it L.

6.

Calculate the critical value. For a 5% α, it is \(1.36 \sqrt{\frac{n_1 + n_2}{n_1 n_2}}\); for 1% and 10%, see the text.

If L is greater than the critical value, reject H0; otherwise, do not reject H0.
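For the two-sample form, SciPy's ks_2samp carries out steps 1–5 directly; the sketch below uses invented samples and the 5% critical value from step 6.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, 60)     # sample 1 (n1 = 60)
b = rng.normal(0.4, 1.0, 45)     # sample 2 (n2 = 45)

L, p = stats.ks_2samp(a, b)      # largest |F1 - F2| and its p-value
crit = 1.36 * np.sqrt((60 + 45) / (60 * 45))   # step 6: 5% critical value
print(L, crit, L > crit, p)
```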

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780120887705500678