Grouped Scatterplots

oliviaeiffe · 25 August 2020 09:31

I have to describe the Trend/Association and Strength of a scatter plot that has grouped data. How can I describe this?
Thanks

NewtonsApple · 25 August 2020 10:27

Hi Olivia,

I am just a student like you, but I will try my best to help.
To describe the trend and strength, make sure you refer to the correlation coefficient. For example, if your correlation coefficient is close to 1, you can say there is a strong positive linear relationship. If it is close to -1, there is a strong negative linear relationship. The closer it is to 0, the weaker the linear relationship, so a value partway between 0 and 1 (or between 0 and -1) suggests a moderate relationship.
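If it helps to see where the correlation coefficient comes from, here is a minimal sketch in plain Python. The data (`hours`, `score`) is made up for illustration, and the 0.7 and 0.4 cut-offs in `describe` are only assumptions, not official thresholds, so check what your teacher or marking schedule expects.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Rough wording for r; the 0.7 and 0.4 cut-offs are illustrative assumptions."""
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

# Made-up example data: hours studied and test score
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]

r = pearson_r(hours, score)
print(round(r, 2), describe(r))
```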

Look at outliers and discuss whether these outliers are reasonable and what may have caused them. How would your data look different if they were removed?
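One way to check the effect of an outlier is to work out the correlation coefficient with and without it. This is only a sketch with made-up numbers (the last point is the outlier), but it shows how much a single point can change things.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: the first six points lie on a line, the last one is an outlier
x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 4, 6, 8, 10, 12, 0]

r_all = pearson_r(x, y)                # outlier included
r_trimmed = pearson_r(x[:-1], y[:-1])  # outlier removed

print("with outlier:   ", round(r_all, 2))
print("without outlier:", round(r_trimmed, 2))
```

If you do remove a point in your own report, say so and explain why it is reasonable to treat it as an outlier.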

Another point when talking about strength is where the bulk of the data lies. If it is clustered around the regression line at the start but spreads out towards the end, you should definitely mention that.
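A rough way to back this up with numbers is to fit a line and compare how far the points sit from it in the lower and upper halves of the x-values. The sketch below uses made-up data and a simple least-squares line; splitting the data in half is just an assumption to keep the example short.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_abs_residual(xs, ys, slope, intercept):
    """Average vertical distance of the points from the line."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)

# Made-up data: tight around the line at first, more scattered later
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.1, 7.9, 12.0, 9.5, 17.0, 13.0]

slope, intercept = fit_line(x, y)
half = len(x) // 2
spread_low = mean_abs_residual(x[:half], y[:half], slope, intercept)
spread_high = mean_abs_residual(x[half:], y[half:], slope, intercept)

print("average distance from line, lower half:", round(spread_low, 2))
print("average distance from line, upper half:", round(spread_high, 2))
```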

Anything that is particularly striking when observing the data is worth mentioning. Remember to always relate this to your research.

Correlation versus causation is definitely worth mentioning: a strong correlation on its own does not show that one variable causes the other.

I did this internal in term 1 and that is all I could remember. Sorry if this is stuff you already knew!

MinEDSupport · 15 September 2020 02:35

Hi Olivia
At present the discussion forums are closed; however, we can provide you with an answer. Please pop back to the discussions next week when they reopen.

Thanks Newton’s Apple for your response too!

This depends on whether you are talking about data that has been grouped and then graphed, or data that falls into groups once it has been graphed. Ideally, you shouldn’t pre-group your data if you want to graph it and explore the relationship. When data is grouped, some of the variation is inevitably hidden, so the graph may either strengthen or hide the true relationship. When you graph grouped data, every value in an interval is likely to be plotted at a single value, most likely the midpoint of the interval but possibly one of its ends, so the points end up stacked on top of each other.
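To see what pre-grouping does to the x-values, here is a small sketch. The ages and the 10-year interval width are made up, and `to_midpoint` is just a hypothetical helper that replaces each value with the midpoint of its interval.

```python
def to_midpoint(value, width=10):
    """Replace a value with the midpoint of the interval it falls in."""
    lower = (value // width) * width
    return lower + width / 2

# Made-up raw ages, then the same ages grouped into 10-year intervals
ages = [21, 24, 28, 33, 35, 39, 42, 47]
grouped = [to_midpoint(a) for a in ages]

print(grouped)
print(len(set(ages)), "distinct x-values before grouping,",
      len(set(grouped)), "after")
```

Eight different ages collapse onto three x-positions, which is the stacking described above.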
If you are talking about the data clustering in the graph to provide different groups, then look and see if the groups themselves individually have a trend/association and talk about them, and then overall whether there is a trend/association for the entire set of data. For example, it might be that each individual group looks like just a circle without a trend (or only a weak relationship), but overall the data is showing an increasing trend. When you put on the trendline/line of best fit, look at how well it actually reflects the plotted data. Note that the line needs to fit the data, not the data fitting to the line! If the line passes closely through much of the data, no matter how it is grouped, then it is likely that you have a strong relationship.
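As a rough illustration of groups with no trend of their own but an overall increasing trend, the sketch below uses three made-up clusters (A, B and C) and compares the correlation coefficient inside each cluster with the one for all the data together.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up clusters: each one is a small square of points with no trend inside it
groups = {
    "A": [(1, 1), (1, 2), (2, 1), (2, 2)],
    "B": [(5, 5), (5, 6), (6, 5), (6, 6)],
    "C": [(9, 9), (9, 10), (10, 9), (10, 10)],
}

# Correlation within each cluster
within = {}
for name, points in groups.items():
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    within[name] = pearson_r(xs, ys)
    print("group", name, "r =", round(within[name], 2))

# Correlation for all the points together
all_points = [p for points in groups.values() for p in points]
overall = pearson_r([p[0] for p in all_points], [p[1] for p in all_points])
print("overall r =", round(overall, 2))
```

In a report you would describe both: little or no relationship within each group, but a strong positive relationship across the whole data set.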