Comparing Normal Distributions & Histograms

Hi there! I’m having difficulty answering questions when they ask us to compare a normal distribution with a histogram (e.g. 3bii 2016).

Shape: I understand we discuss bell-curved and symmetrical for the normal; and asymmetrical (left or right skew) for the histogram.

Centre: The normal is unimodal, and the mean = mode = median. The histogram may or may not be bimodal, and then we say if the median and mean are left/right of the centre. How do we know if mean and median are left or right of the centre? Do we calculate them somehow?

Spread: Find and compare the range (don’t normal distributions technically extend forever? Where does the graph ‘start’ and ‘stop’?) If we were to calculate standard deviation to show range, how could we do this?

Proportions: What does this mean?

Thank you, any help is appreciated!!

Hello! Welcome back :slight_smile:

Shape: I understand we discuss bell-curved and symmetrical for the normal; and asymmetrical (left or right skew) for the histogram.

That is correct. For the histogram you have to say that it is skewed to the right (where the “tail” is).

Centre: The normal is unimodal, and the mean = mode = median. The histogram may or may not be bimodal, and then we say if the median and mean are left/right of the centre. How do we know if mean and median are left or right of the centre? Do we calculate them somehow?

  1. You described the first graph correctly.
  2. For the second graph you should mention that it is also unimodal. If it were a bimodal histogram it would look like this:
    image
    You should be able to see two obvious peaks. Because the histogram in the question doesn’t have them, you need to say that it is unimodal.
    You can see the mode visually (the highest peak). The median and mean are often not obvious on a histogram, so you have to calculate or estimate them. For this example you can estimate that about 54 bottles (20 from 298-300g and about 34 from 300-302g) weigh below the 302-304g interval, about 54 bottles (28 from 304-306g, 15 from 306-308g, 8 from 308-310g and 3 from 310-312g) weigh above it, and 42 bottles weigh between 302g and 304g. So you can conclude that the median weight of the bottles is between 302g and 304g. You can also mention that the mean is located to the left of the center as the graph is skewed to the right.
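The count-up above can be sketched in a few lines of Python. The class boundaries and frequencies are the ones read off the histogram in the answer (the 34 is itself an estimate). The variable names and the midpoint-based mean are illustrative assumptions, not something the marking schedule asks for.

```python
# Frequencies read off the histogram (the 34 is an estimate, as noted above).
classes = [(298, 300), (300, 302), (302, 304), (304, 306),
           (306, 308), (308, 310), (310, 312)]
counts = [20, 34, 42, 28, 15, 8, 3]

total = sum(counts)  # 150 bottles

# Median class: the first class where the running total reaches half the data.
running = 0
median_class = None
for bounds, count in zip(classes, counts):
    running += count
    if running >= total / 2:
        median_class = bounds
        break

# Rough mean: treat every bottle as sitting at the midpoint of its class.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean_estimate = sum(m * c for m, c in zip(midpoints, counts)) / total

print(median_class)             # (302, 304)
print(round(mean_estimate, 1))  # 303.3
```

Both values sit below 305g, the midpoint of the histogram's 298-312g range, which is what you would expect when the bulk of the bottles is at the lighter end.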

Spread: Find and compare the range (don’t normal distributions technically extend forever? Where does the graph ‘start’ and ‘stop’?)

You are correct: the theoretical graph does go on forever, asymptotically approaching the x-axis. However, in a practical situation we can accept that very close to zero is “about zero”, and that is where the graph “starts” and “ends”. The graph shows values of 296g to 324g as the “start” and “end”, so you can use these numbers to find the range. Your answer should state a range of about 28g. That will indicate the infinite nature of the distribution graph but demonstrate that you can still use this graph for practical purposes.

If we were to calculate standard deviation to show range, how could we do this?

You learn in this course that in order to find the standard deviation you need to use the probability of a value being within a certain range. You don’t have this information, so you can’t use this method. But if you really want to, you can estimate it. You know that most of the data (99.7%) is located within +/- 3 standard deviations of the center. That’s about +/- 14g, so your standard deviation should be around 4.67g (14 ÷ 3). But this again relies on assuming that the range (most of your data) lies between 296g and 324g.

Proportions: What does this mean?

The proportional distribution is worth mentioning if you are aiming for excellence, as you were asked to compare the graphs. The obvious mismatch is that in your histogram a reasonably large proportion of the bottles weigh between 298g and 300g (20 out of 150), compared to the first graph, where only a very small proportion of the bottles have this weight (it is very close to the left “end” of the graph). You may also notice that on the histogram a very small proportion of the bottles weigh between 310g and 312g (3 out of 150), while on the first graph this weight is very close to the center of the diagram and is therefore very common.
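To put rough numbers on that mismatch, here is a minimal sketch. It assumes the normal curve is centred at 310g (the midpoint of 296g and 324g) with a standard deviation of 14/3 g. Both are estimates read off the graph as described earlier in this answer, not values given in the exam. `normal_cdf` and `normal_proportion` are hypothetical helper names.

```python
import math

def normal_cdf(x, mean, sd):
    """Proportion of a normal distribution lying below x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def normal_proportion(lo, hi, mean, sd):
    """Proportion of a normal distribution lying between lo and hi."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)

# Assumed model parameters (estimated from the graph, not given in the exam).
MEAN = 310      # midpoint of 296 g and 324 g
SD = 14 / 3     # +/- 3 SD spans about +/- 14 g

TOTAL = 150     # bottles in the histogram
for lo, hi, count in [(298, 300, 20), (310, 312, 3)]:
    observed = count / TOTAL
    expected = normal_proportion(lo, hi, MEAN, SD)
    print(f"{lo}-{hi} g: histogram {observed:.3f}, normal model {expected:.3f}")
```

Under these assumptions the model puts roughly 1% of bottles in 298-300g against about 13% in the histogram, and roughly 17% in 310-312g against 2% in the histogram. In an exam you would describe this from the graphs rather than calculate it.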

I hope that helps. Please, ask if you need more clarification.

Thanks so much! (to clarify, I wrote bimodal/unimodal because I was saying what you might write in general for any question depending on the histogram, not specific to the 2016 exam)
So I gather that the mean will move away from the skew (so right skew = left of centre, left skew = right of centre)
Can you estimate a standard deviation for the histogram too? I suppose technically, since it’s not normally distributed, it doesn’t have a standard deviation; but could you for comparison’s sake? The 2017 exam had a similar question and the assessment schedule shows:

image

For your normal shape you can assume the range is about 550g (from 3775g to 4325g), so the standard deviation would be about 92g (550 ÷ 6). That is a good measure of the spread: you can tell that about 68% of the fish weigh between 3958g and 4142g.

For skewed shapes (like the histogram) it is better to use the IQR. However, if you want to compare SDs you certainly can, by dividing the range by 6 (99.7% of the data is within +/- 3 SD). The range on the histogram is 4200 - 3850 = 350g, and if we divide 350g by 6 we get about 58g, which is much lower than the 92g from the normal distribution graph.
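The range-divided-by-6 rule of thumb used above can be checked with a few lines of Python. The end points are the ones read off the two graphs in this answer. `sd_from_range` is a hypothetical helper name, and the rule itself is only a rough estimate, especially for the skewed histogram.

```python
def sd_from_range(low, high):
    """Rough SD estimate: assume the visible range covers about +/- 3 SD."""
    return (high - low) / 6

# Normal curve: read as running from about 3775 g to 4325 g.
normal_sd = sd_from_range(3775, 4325)
normal_centre = (3775 + 4325) / 2

# Histogram: read as running from 3850 g to 4200 g.
hist_sd = sd_from_range(3850, 4200)

print(round(normal_sd), round(hist_sd))  # 92 58

# About 68% of a normal distribution lies within 1 SD of the centre.
print(round(normal_centre - normal_sd), round(normal_centre + normal_sd))  # 3958 4142
```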

You can’t take a precise median, LQ or UQ from the histogram as you are given just the intervals, so you are not expected to give precise answers. “Median above 4050g” is a good answer: 22 salmon weigh below 4050g and 28 above 4050g, so you can assume that the median is slightly above 4050g, but you can’t give a precise answer from the histogram. The LQ would be just below 4000g and the UQ just above 4100g, so the IQR (middle 50%) would be just above 100g.

Any of the statements above, with the supporting numbers, would be enough to earn you a “t” grade for this question.