This work was created by Dr Jamie Love and is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Lesson Eleven
The Chi-Square is a "Ratio Ruling"

by Dr Jamie Love © 2002 - 2010

In a perfect world, without the complexities of statistics, we would expect a "large family" of 160 offspring from our dihybrid cross to have 90 plants that produce green, inflated pods; 30 plants that produce green, constricted pods; 30 plants that produce yellow, inflated pods and 10 plants that produce yellow, constricted pods.

Sadly, we do not live in such a simple world and statistics come into play. The ratios might be slightly "skewed" - a math term for "off" - due to the number of individuals you collected or just "bad luck". The ratios we expect from a dihybrid cross are not always what we get in the experiment. One way to combat the problem of statistics is to use statistics!

Let's take a step back and look at some of Mendel's original work with monohybrids because they are an easier place to start. You will recall that Mendel did a lot of monohybrid experiments and collected a lot of data from a lot of plants and crosses. Here's one of those sets of data that I showed you earlier.
P = smooth seeds crossed with wrinkled seeds
F1 = all smooth seeds (so smooth is dominant and wrinkled is recessive)
F2 = 5,474 smooth seeds and 1,850 wrinkled seeds is a ratio of 2.96 : 1

Mendel and the Punnett square tell us that we should have a ratio of 3 : 1 not 2.96 : 1! So is Mendel wrong? Is the Punnett square wrong? Is our entire understanding wrong?!
That depends upon how different the actual, observed numbers are from the calculated, expected numbers. But how close is close enough? Is 2.96 : 1 close enough to 3 : 1 that we should accept Mendel's ideas? Some folks would argue, "Well, Mendel says it should be 3 : 1 and it is not 3 : 1, so Mendel is wrong!" But someone else would argue, "Hey, lighten up! I think 2.96 : 1 is close enough to 3 : 1 so I will not reject Mendel's ideas."

Mendel wasn't bothered by the fact that his data was a little off because he knew that, statistically speaking, he was "within bounds". But what are those bounds and how do you calculate them? That's when Mendel fell back on his knowledge of math and showed that these tiny differences were not significant enough to cause him to throw it all away.

Statisticians use the chi-square (abbreviated χ²) test for exactly this job, and so will you.
The chi-square test, or simply the "chi-square", measures the significance of the data in comparison with what you expect to get. This statistical test is useful in many problems in genetics and other sciences, so it is important that you learn how to "do the chi-square". The beauty of the χ² is that it only requires that you know the number of individuals observed in each category and what numbers you expected them to be.

Actually, χ² is pretty easy and to some folks it's even obvious!
I will walk you through the entire process shortly but first let me tell you what we are going to do. [Some folks find this "word version" a nice introduction to the "math version", while others say it just scares them! I hope it doesn't scare you. Please read it as an introduction to the ideas because you will see the math soon enough.]

In chi-square analysis you compare the number of individuals of a certain phenotype (or anything else) that you have found in the experiment to the number you expected to have. That is, you find the difference between the observed and the expected by simply subtracting one from the other. [It doesn't matter which one you subtract from which - all you want is the difference.]
Then, just to make that difference bigger and to make it always a positive number, you square it (multiply it by itself). That gives you the "squared difference".
Then you divide the "squared difference" by what you expected in the first place in order to give you a "squared difference per expected" for that group. [This step brings the numbers into a reasonable zone to work with but the reason you do it has to do with the theory of statistics and I won't go into that!]
Naturally, you have to take into account all the different types, and you do that by adding together these "squared differences per expected" values. The final sum (of the "squared differences per expected") gives you a number called the χ². (Scared yet?)
By the way, chi (χ) is the Greek letter for "c" which mathematicians often use as an abbreviation for "comparisons". The chi-square (χ²) is a "comparison squared".

OK, let's look at those F2s again.
F2 = 5,474 smooth seeds and 1,850 wrinkled seeds for a ratio of 2.96 : 1
That ratio is convenient but we don't use the ratio in our calculations of χ². Instead we use the "raw numbers" of the data and compare them to the "raw numbers" we expected.
So, if Mendel's 3 : 1 ratio is correct, how many smooth seeds would you expect and how many wrinkled seeds would you expect in the F2s?
Grab your calculator, a pencil and a sheet of paper because here we go!

Step 1: calculate the EXPECTED number of each type.
To do that you must first add together both seed types and when you do that you will see that you start with a total population of 7,324 seeds. [That's 5,474 smooth seeds plus 1,850 wrinkled seeds equals 7,324 seeds total. 5,474 + 1,850 = 7,324 but don't take my word for it - check it with your calculator!]
Of those 7,324 seeds you expected a quarter of them (1 in 4) to be wrinkled. [That's because the wrinkled seeds were the "1" in the 3 : 1 ratio and a 3 : 1 ratio is represented as fractions of ¾ and ¼.]
So, how many of those 7,324 seeds should be wrinkled?
Just divide 7,324 by 4 (the same as multiplying 7,324 by ¼) to get 1,831 wrinkled seeds expected.
Notice that Mendel got 1,850 wrinkled seeds in the experiment. Is that significant? We'll see. (That's what the chi-square is all about!)
Now, how many smooth seeds should you expect from the total of 7,324 seeds?
Well, you expect three times as many smooth as wrinkled so simply multiply 1,831 by 3 to get 5,493. That means you expected to get 5,493 smooth seeds.
Does that make sense? Let's check it to make sure. We expected 5,493 smooth seeds and 1,831 wrinkled seeds. That's a total of 7,324 seeds and that is exactly the total number we are working with. Another way to check your math is to notice that 1,831/7,324 = 0.25 (which is exactly ¼), the correct fraction of wrinkled seeds expected in the total population. [You can check it again by looking at the fraction of smooth seeds. That's 5,493/7,324 = 0.75, which is ¾, the fraction of smooth seeds expected.]
By the way, sometimes you will get a fraction and you might think "I've gone wrong - you cannot have a fraction of a plant!". Well, you are right that you cannot have a fraction of an individual but when we do this mathematical analysis it is acceptable to get fractions.
OK, step one is complete!
We expected 5,493 smooth seeds and 1,831 wrinkled seeds.
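
If you would rather let a computer do Step 1, the same arithmetic might look like this in Python. Treat it as a minimal sketch rather than part of the course: the variable names are invented here, and the only numbers used are Mendel's counts and the 3 : 1 ratio from above.

```python
# Step 1 as a sketch: expected counts under a 3 : 1 ratio.
observed = {"smooth": 5474, "wrinkled": 1850}   # Mendel's F2 counts from above
ratio = {"smooth": 3, "wrinkled": 1}            # the 3 : 1 hypothesis

total = sum(observed.values())                  # all seeds counted
parts = sum(ratio.values())                     # 3 + 1 = 4 parts
expected = {kind: total * ratio[kind] / parts for kind in observed}

print(total)      # 7324
print(expected)   # {'smooth': 5493.0, 'wrinkled': 1831.0}
```

The expected numbers add back up to the total, which is the same check you did by hand.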

Step 2: calculate the "SQUARE OF THE DIFFERENCE PER EXPECTED".
Let's do the smooth seeds first. We observed 5,474 but expected 5,493. That's a DIFFERENCE of 19. [5,474 - 5,493 = -19, but we can ignore the minus sign.]
Next we SQUARE THE DIFFERENCE. 19 x 19 (or 19²) = 361.
Next we find the SQUARE OF THE DIFFERENCE PER EXPECTED by dividing that number (361) by the total number of smooth seeds we expected to see (5,493). That is 361/5,493 = 0.066 (rounded to three decimal places is good enough for us). So the "squared differences per expected" of the smooth seeds = 0.066.
Let's do the wrinkled the same way. We observed 1,850 wrinkled seeds but expected 1,831. That's a difference of 19 (1,850 - 1,831 = 19) again. [It doesn't always work that way, unless you are working with an experiment with only two outcomes, like this one. A dihybrid cross has four outcomes and is more complicated.] So you square the difference to get 361. You then divide it by the expected number of wrinkled seeds (NOT the expected number of smooth seeds - a common mistake) so that is 361/1,831 = 0.197 (rounded to three decimal places, like before).
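
Step 2 can be sketched the same way. Again this is only an illustration with made-up variable names, assuming the expected numbers from Step 1 (5,493 smooth and 1,831 wrinkled).

```python
# Step 2 as a sketch: the "squared difference per expected" for each category.
observed = {"smooth": 5474, "wrinkled": 1850}
expected = {"smooth": 5493, "wrinkled": 1831}   # from Step 1

per_category = {}
for kind in observed:
    difference = observed[kind] - expected[kind]   # -19 for smooth, +19 for wrinkled
    squared = difference ** 2                      # 361 either way; the sign vanishes
    per_category[kind] = squared / expected[kind]  # divide by THIS category's expected

for kind, value in per_category.items():
    print(kind, round(value, 3))
# smooth 0.066
# wrinkled 0.197
```

Notice that each category is divided by its own expected number, which is how the sketch avoids the common mistake mentioned above.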

Step 3: congratulate yourself for having gotten through the toughest part!

Step 4: SUM (ADD up) the "squared differences per expected" from all the categories.
In this case there are only two categories so there are only two values to add. Add the value you calculated for the smooth (0.066) to the value you calculated for the wrinkled (0.197) to get 0.263.

This experiment has a chi-square equal to 0.263 (χ² = 0.263).
There. That wasn't too difficult, was it? You found the chi-square value for this set of data. Now all we have to do is ...

Step 5: COMPARE our chi-square value to the value in a chi-square significance table and determine if our value is significant.
The chi-square significance table has been developed by statisticians. These tables come in all shapes and sizes depending upon how exact you want to be and how many categories you are dealing with. For our work we want to know if these results pass a significance level of 5%. (This is a fairly good level of significance and is often used as a "cut-off" in experiments like these.)

Chi Square Significance Table

Degrees of Freedom     5% Significance Level
1                      3.84
2                      5.99
3                      7.81
4                      9.49

OK, what does this mean? What is this "degrees of freedom" stuff?

The simple answer is that your degrees of freedom are one less than the number of categories you have to work with. [The complicated answer is that degrees of freedom are the number of values that can be freely assigned while the total is left unchanged. Don't worry about it.]
We have two categories, smooth and wrinkled, so we have one degree of freedom, and you see from this table that with one degree of freedom we are allowed a chi-square as large as 3.84 before the difference between what we observed and what we expected counts as significant at the 5% level. That is, we would have to get a chi-square value over 3.84 before we would say that our results were so far from a 3 : 1 ratio that we would have to reject that ratio (and Mendel's explanation of how he got that ratio). Or, to put that another way, if the 3 : 1 ratio really is true, chance alone would push the chi-square above 3.84 only 5% of the time. Our χ² = 0.263 is far below that cut-off, so the small deviation we observed is just the sort of thing chance produces and we have no reason to reject the 3 : 1 ratio.
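
The look-up and the decision in Step 5 can be written down as a short rule. The sketch below is only an illustration: `CRITICAL_5_PERCENT` simply copies the four values from the table, and the helper name `reject_ratio` is hypothetical.

```python
# 5% cut-off values copied from the table, keyed by degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}

def reject_ratio(chi_square, categories):
    """Return True when chi_square is too large to accept the expected ratio."""
    degrees_of_freedom = categories - 1     # one less than the number of categories
    return chi_square > CRITICAL_5_PERCENT[degrees_of_freedom]

# Mendel's smooth/wrinkled seeds: two categories, chi-square of 0.263.
print(reject_ratio(0.263, categories=2))    # False
```

A result of `False` means the chi-square did not climb over the cut-off, so the 3 : 1 ratio stands.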

The chi-square is a kind of "mathematical judge" of probabilities.
There are other "mathematical judges" used in other areas of science but the chi-square is the only one we will use in this course.

Let's think a bit more about the chi-square and what it has told us.
Imagine that Mendel's work had come out with exactly the ratio we expected. That is, imagine Mendel observed in this experiment 5,493 smooth seeds and 1,831 wrinkled seeds. That is an exact 3 : 1 ratio.
Let's do a quick chi-square on that imaginary result.
Looking first at the smooth seeds we would see that the difference between the observed and expected is zero! (That's because 5,493 - 5,493 = 0.) When we square zero we still get zero. If we divide zero by the expected value we get zero!
The same happens when we calculate the values for the wrinkled seeds too.
Now we would add those two values together (because they are the "squared differences per expected") to get a final χ² = 0.
In other words, when the chi-square equals zero the experimental results are in exactly the ratio expected! [This rarely happens.]
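
To watch the chi-square grow as the observed numbers drift away from the expected ones, you could try a small experiment like the one sketched below. It is an illustration under stated assumptions: a shift of 0 is the imaginary perfect result and a shift of 19 is what Mendel actually observed, but the shifts of 50 and 100 are hypothetical values chosen only to show the trend.

```python
def chi_square(observed, expected):
    """Add up the squared difference per expected for every category."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [5493, 1831]    # smooth, wrinkled under an exact 3 : 1 ratio

# Move `shift` seeds from the smooth pile to the wrinkled pile.
# 0 = perfect result, 19 = Mendel's data, 50 and 100 = HYPOTHETICAL examples.
for shift in (0, 19, 50, 100):
    observed = [5493 - shift, 1831 + shift]
    print(shift, round(chi_square(observed, expected), 3))
```

The perfect result gives exactly zero, and the value climbs quickly as the shift grows because the difference is squared.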

Conversely, the farther the chi-square gets from zero the less likely the ratio "rule" is being followed. If the chi-square had been 1.9 (instead of 0.263) we would have been less confident but still within the 5% significance range. (Right?)

As a matter of fact, we could have gotten a chi-square value as high as 3.84 and still feel that we were close enough to the 3 : 1 ratio to not be worried. If the 3 : 1 ratio really applies, the chance of getting a chi-square of 3.84 or more purely by chance (by "accident") is 5%. But if our chi-square value was larger than 3.84 we would be drifting into doubt. If the chi-square were 13.4 we would not feel at all comfortable and would have a good reason to suspect that the 3 : 1 ratio did not apply. With a larger and more detailed table we could even see to what level our confidence had dropped!
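
A "larger table" is really just a list of probabilities, and for one degree of freedom they can be worked out directly. The sketch below leans on a standard identity that holds for ONE degree of freedom only: the chance of a chi-square larger than x is erfc(√(x/2)), where `math.erfc` comes from Python's standard library. The function name `p_value_one_df` is made up for this illustration.

```python
import math

def p_value_one_df(chi_square):
    """Chance of a chi-square at least this large when the expected ratio is true.

    Valid for ONE degree of freedom only (two categories).
    """
    return math.erfc(math.sqrt(chi_square / 2))

for value in (0.263, 1.9, 3.84, 13.4):
    print(value, round(p_value_one_df(value), 4))
```

The value for 3.84 comes out at almost exactly 0.05, which is where the 5% cut-off in the table comes from. For 0.263 it is roughly 0.6, so a deviation that small turns up by chance more often than not.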

Here's another set of results from Mendel's monohybrid cross experiments. Let's do the chi-square analysis of it.
Here I'll condense the "steps". You'll see it flows a little bit better and there is less "hand holding" or explanation.
P = green seeds crossed with yellow seeds
F1 = all yellow seeds (So which color is dominant? I hope you agree that yellow dominates green seeds.)
F2 = 6,022 yellow seeds and 2,001 green seeds
Is this close enough to the 3 : 1 ratio we expect?

First, calculate the expected number of each type.
You have a total of 8,023 seeds (6,022 yellow + 2,001 green = 8,023).
The green seeds should (are expected to) make up a quarter of that population, so dividing 8,023 by 4 gives you 2,005.75. That means you expected 2,005.75 green seeds. [It's nonsense to think in terms of ¾ of a seed but for the chi-square it's OK to continue with these silly fractions.]
The yellow seeds should make up the rest of the sample so we can find their number by subtracting 2,005.75 from the total 8,023 to get 6,017.25. [Or you could have multiplied 8,023 by ¾ and get the same number. It's a good idea to try it both ways to make sure you haven't made an error.]
So, from the total of 8,023 seeds you expected 6,017.25 to be yellow and 2,005.75 to be green.

Second, calculate the "squared differences per expected".
Let's do the greens first. You expected 2,005.75 but observed 2,001 and that is a difference of 4.75 (2,005.75 - 2,001 = 4.75). When you square that number you get 22.56. When you divide it by the number you expected (2,005.75) you get 0.011 (to three decimal places).
Now the yellows. You expected 6,017.25 but observed 6,022 and that is a difference of 4.75 (6,017.25 - 6,022 = -4.75, but we can ignore the sign). When you square that number you get 22.56. Now divide it (22.56) by the number you expected (6,017.25) to get 0.004 (to three decimal places).

Notice that, because we are working with only two categories, the "squared differences" are the same in both groups (22.56) because the differences are the same size. (They MUST be the same if there are only two groups! Think about it.) However, the "squared differences per expected" are different because we have different expectations for the two groups (6,017.25 to be yellow but 2,005.75 to be green) so we divide by different numbers. I point this out because it can be used to highlight two of the most common mistakes in doing the chi-square. Your "squared differences" in an experiment with only two categories (one degree of freedom) must be the same - if they are not you made a math error. However, your "squared differences per expected" will not be the same unless you expected the same number for each group (a 1 : 1 ratio) or you made the common mistake of dividing both groups by the same number. Watch your numbers and pay attention.
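
Those two checks are easy to automate. Here is a minimal sketch using the yellow and green counts from this example; the variable names are invented for the illustration, and the `assert` lines simply stop the program if a check fails.

```python
observed = {"yellow": 6022, "green": 2001}
total = sum(observed.values())                       # 8,023 seeds
expected = {"yellow": total * 3 / 4, "green": total / 4}

differences = {k: observed[k] - expected[k] for k in observed}
squared = {k: differences[k] ** 2 for k in observed}
per_expected = {k: squared[k] / expected[k] for k in observed}

# Check 1: with two categories the differences cancel, so the squares match.
assert squared["yellow"] == squared["green"]

# Check 2: the divisors differ, so the final values differ (unless testing 1 : 1).
assert per_expected["yellow"] != per_expected["green"]

print(differences)    # {'yellow': 4.75, 'green': -4.75}
print(squared)        # {'yellow': 22.5625, 'green': 22.5625}
```

The differences are equal and opposite because whatever one category gains the other must lose when the total is fixed.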

Third, sum (add up) the "squared differences per expected" from all the categories.
That's 0.011 + 0.004 = 0.015 so your χ² = 0.015.
Wow, that's even better than before but let's look at the table just to make sure. We are still working with only one degree of freedom. (Right?)

Obviously, the ratio observed in this experiment (6,022 yellow : 2,001 green or 3.01 : 1) is not so far off from the 3 : 1 ratio as to cause concern.

Perhaps you found it difficult to follow through all those steps without a simple "formula". This is a good time to present the formula in order to show you what you have been doing and to help you in the future.

χ² = Σ [(O - E)²/E]
"O" is the number observed and "E" is the number expected.
The part within the brackets, (O - E)²/E, is the procedure you use to find the difference, (O - E), then square it, (O - E)², and then divide by the number expected, (O - E)²/E. That's what you do for each category (smooth and wrinkled, green and yellow, etc.).
The symbol "Σ" is called "sigma" and is used throughout math to mean "sum". Here it tells you to add together (sum) the values you calculated for each category.
Some people enjoy equations and some people are panicked by them! Try to get used to understanding and using this chi-square equation. I will not expect you to memorize the equation, but I will expect you to "do the chi-square" and this formula will be useful to help you through all those steps.
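
If equations panic you, it may help to see that the whole formula fits in one line of Python. This is a sketch, not an official part of the course: the function name `chi_square` is my own choice, and it assumes you have already worked out the expected numbers.

```python
def chi_square(observed, expected):
    """chi-square = sum over all categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Smooth vs wrinkled seeds, expected 3 : 1.
print(round(chi_square([5474, 1850], [5493, 1831]), 3))        # 0.263
# Yellow vs green seeds, expected 3 : 1.
print(round(chi_square([6022, 2001], [6017.25, 2005.75]), 3))  # 0.015
```

The `sum(...)` does the job of the sigma, and the expression inside it is the part within the brackets.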

Let's do another chi-square (Ugh!) with some other values.
Let's assume the results of a cross were
3,087 yellow seeds and 2,937 green seeds.
Is this close enough to a 3 : 1 ratio? (It doesn't look like it to me but let's do the chi-square to find out.)

What would be the ratio if it were exactly 3 : 1?
There are 6,024 seeds in total. (That's 3,087 + 2,937 = 6,024 total.)
If the 3 : 1 ratio applies then one quarter of them should be green. That means 1,506 should be green (6,024/4 = 1,506) and the rest, 4,518, should be yellow. (That's also 6,024 x ¾ = 4,518 yellows.)
OK. Let's do the greens first. That's (O - E)²/E = (2,937 - 1,506)²/1,506 = 1,359.7 [Run that through your calculator to be sure you can do it.]
The yellows will be (O - E)²/E = (3,087 - 4,518)²/4,518 = 453.2.

Now add them together (that's what Σ means) to get a χ² = 1,812.9.

Wow! Our calculated chi-square is far above the 3.84 cut-off, so these experimental results are well outside the acceptable range for a 3 : 1 ratio and we reject the idea that they represent a 3 : 1 ratio. Either these offspring are NOT the result of a monohybrid cross or Mendel was wrong!

Hmmm. Look at that data again.
3,087 yellow seeds and 2,937 green seeds
Are they close to any other ratios you've seen? What ratio are they close to and how would you test the ratio to see if it is close enough?
I hope you decided that the ratio of 3,087 yellow seeds and 2,937 green seeds is close to a 1 : 1 ratio. Actually the observed ratio is 1.05 : 1 but is that close enough? Maybe. Maybe not. Whenever you are confronted with a problem asking you if the ratio is close enough, think about using the chi-square.
You can use the chi-square to test ANY ratios you have in mind. All you need to have is a hunch of what the ratio should be and then use chi-square to see if the experimental data is close enough. So, let's do another chi-square on that same data (3,087 yellow seeds and 2,937 green seeds) and see if it is close enough to a 1 : 1 ratio!

First, what would be the "perfect" 1 : 1 ratio among this group of seeds?
There is a total of 6,024 seeds (still) so a 1 : 1 ratio should show us 3,012 yellow seeds and 3,012 green seeds.
OK, that's what we expected. Now let's do the chi-square.
Let's do the greens first. That's (O - E)²/E = (2,937 - 3,012)²/3,012 = 1.868.
The yellows will be (O - E)²/E = (3,087 - 3,012)²/3,012 = 1.868 (again).
[In this example the "squared differences per expected" should be the same because here you expect the same number in each category of this two category puzzle. ONLY when you have a 1 : 1 ratio to test on a two category chi-square will you get equal "squared differences per expected".]
Adding them together gives me the χ² = 3.736 and I compare that with the values in the table.

I see that my new χ², using a 1 : 1 ratio, is low enough to stay under the 5% cut-off. (This χ² is less than 3.84.) Therefore, the results of this experiment are far from being 3 : 1 but they are an acceptable fit to 1 : 1. I think they are really 1 : 1!
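
Testing one set of data against more than one "hunch" is a natural job for a loop. The sketch below assumes nothing beyond the counts and ratios already used in this lesson; the names `chi_square_for_ratio` and `CUTOFF_ONE_DF` are hypothetical. Because the computer adds the unrounded values, its last decimal place can differ slightly from a hand calculation.

```python
def chi_square_for_ratio(observed, ratio):
    """Chi-square of observed counts against a hypothesised ratio."""
    total = sum(observed)
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [3087, 2937]      # yellow, green
CUTOFF_ONE_DF = 3.84         # 5% level, one degree of freedom (from the table)

for ratio in ((3, 1), (1, 1)):
    value = chi_square_for_ratio(observed, ratio)
    verdict = "reject" if value > CUTOFF_ONE_DF else "do not reject"
    print(ratio, round(value, 3), verdict)
```

The 3 : 1 hunch is rejected by a huge margin, while the 1 : 1 hunch stays just under the cut-off.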

Surprised? Well, you shouldn't be. First off, the observed ratios look closer to 1 : 1 than to 3 : 1. (Right?) Second, I didn't tell you that this experiment was an F2 population of seeds. (Did I?) Indeed, I made these numbers up to represent the results you would get from a test cross where the unknown genotype turns out to be a heterozygote. In other words, this is an acceptable ratio if one parent was ss and the other parent was Ss.

The chi-square is used whenever you want to compare the observed results to the ones you would expect from a certain ratio. That ratio could be 1 : 1 or 3 : 1 or even (oh, no!) 9 : 3 : 3 : 1. That's right! You can use the chi-square to determine if a dihybrid cross is producing offspring in acceptable ratios. Note, however, that with four categories (instead of two) you have three degrees of freedom and twice as many calculations to do.
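
To give you a feel for the four category case, here is a sketch of a dihybrid chi-square. Please note the assumption loudly: the observed counts below are HYPOTHETICAL numbers invented for this illustration (a family of 160 plants), not results from any real cross. The cut-off values are copied from the table in this lesson.

```python
CRITICAL_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}   # from the table

phenotypes = ["green inflated", "green constricted",
              "yellow inflated", "yellow constricted"]
ratio = [9, 3, 3, 1]
observed = [94, 27, 31, 8]     # HYPOTHETICAL counts, invented for illustration

total = sum(observed)                                   # 160 plants
expected = [total * r / sum(ratio) for r in ratio]      # 90, 30, 30, 10
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)
degrees_of_freedom = len(observed) - 1                  # 4 categories -> 3

for name, o, e, t in zip(phenotypes, observed, expected, terms):
    print(f"{name:20s} O={o:3d} E={e:5.1f} term={t:.3f}")
print("chi-square =", round(chi_square, 3))
print("reject 9:3:3:1?", chi_square > CRITICAL_5_PERCENT[degrees_of_freedom])
```

With four categories the differences no longer have to match each other, so every term really does need its own calculation, and the total is compared with 7.81 instead of 3.84.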

Do the Chi-Square Workshop (Workshop Three) now and then do the SAQs for this lesson so you will get plenty of practice with the chi-square.

