Anonymous asked in Science & Mathematics › Mathematics · 4 weeks ago

# Math: need help (Data Management)?

Mr. Kartye wants to investigate if student performance on the MDM 4U final exam is related to time spent studying for the exam. He gathered the following sample data:

| Hours studying | Exam mark |
|---|---|
| 10 | 78 |
| 15 | 85 |
| 21 | 96 |
| 6 | 75 |
| 18 | 84 |
| 20 | 45 |
| 12 | 82 |

Using Fathom, Mr. Kartye generated the scatter plot shown at the right. Unfortunately, it does not show what he wanted. For example, r² = 0.0039, meaning there is essentially no linear relationship (correlation) between exam marks and hours spent studying. Furthermore, the line of best fit, y_p = -0.197x + 81, has a negative slope; in other words, the more you study, the lower your exam mark. Mr. Kartye decides he must have done something wrong or not taken something into consideration. Suggest how Mr. Kartye should modify his data set. Implement your suggestion to determine a new line of best fit and coefficient of determination.
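For reference, here is a minimal Python sketch of the least-squares line and r², the same quantities Fathom reports. It is not Fathom's own code, and `fit_line` is a hypothetical helper name. Run on the seven points exactly as listed in the table, it gives roughly y = -0.133x + 79.8 with r² ≈ 0.0021. That is a little different from the quoted -0.197 and 0.0039, but the conclusion is the same: a tiny negative slope and essentially no linear relationship.

```python
# Data from the table: hours studied (x) and exam mark (y)
hours = [10, 15, 21, 6, 18, 20, 12]
marks = [78, 85, 96, 75, 84, 45, 82]


def fit_line(xs, ys):
    """Least-squares line y = slope*x + intercept, plus r^2 (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination = square of the correlation coefficient r
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r2 = fit_line(hours, marks)
print(f"y = {slope:.3f}x + {intercept:.1f}, r^2 = {r2:.4f}")  # y = -0.133x + 79.8, r^2 = 0.0021
```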

• 4 weeks ago

I don't recall the coefficient of determination being covered in the MDM4U curriculum...

Just remove the outliers. They have a massive impact on the linear regression line, especially when the data set is this small.

To find outliers, compute the IQR (interquartile range), which is Q3 - Q1, where Q1 and Q3 are the first and third quartiles. Multiply the IQR by 1.5, then subtract that amount from Q1 and add it to Q3. Any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is an outlier and should be removed.
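The 1.5 × IQR rule can be sketched in Python as follows. The helper names `median` and `iqr_outliers` are hypothetical, and the quartiles use one common textbook convention, the median of each half with the overall median left out when n is odd. Other quartile conventions can move the fences slightly. With this convention the exam marks give fences of 60 and 100, so the mark of 45 is flagged, while the hours studied have no outliers.

```python
def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def iqr_outliers(data):
    """Return (outliers, (low_fence, high_fence)) using the 1.5*IQR rule.

    Assumption: Q1/Q3 are medians of the lower/upper halves, excluding
    the overall median when the number of values is odd.
    """
    s = sorted(data)
    n = len(s)
    lower_half = s[: n // 2]
    upper_half = s[(n + 1) // 2 :]
    q1 = median(lower_half)
    q3 = median(upper_half)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return outliers, (low_fence, high_fence)


marks = [78, 85, 96, 75, 84, 45, 82]
hours = [10, 15, 21, 6, 18, 20, 12]
print(iqr_outliers(marks))  # ([45], (60.0, 100.0))
print(iqr_outliers(hours))  # ([], (-5.0, 35.0))
```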

• 4 weeks ago

One person studied 20 hours but got a 45. While I think it would be wrong to modify his data set just to get the conclusion he wants, he should look into why that score is so far off. Maybe the student filled in the answer sheet incorrectly, the grader made a mistake, the student had to leave early and didn't finish the test, they stayed up studying so long that they fell asleep during the exam, or they claimed to have studied 20 hours but really didn't.

If you think that data point is a genuine anomaly, you could run the numbers again without it, but you should give an explanation of why that point was treated as an anomaly.
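Rerunning the numbers without the 20-hour, 45-mark point can be sketched in Python with a plain least-squares fit (`fit_line` is a hypothetical helper, not a Fathom function). On the six remaining points the slope turns positive, about 1.23 marks per extra hour of study, and r² rises to about 0.86. As noted above, dropping the point is only defensible if there is a documented reason to treat it as an anomaly.

```python
hours = [10, 15, 21, 6, 18, 20, 12]
marks = [78, 85, 96, 75, 84, 45, 82]


def fit_line(xs, ys):
    """Least-squares line y = slope*x + intercept, plus r^2 (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Drop only the point judged anomalous: 20 hours studied, mark of 45
kept = [(x, y) for x, y in zip(hours, marks) if (x, y) != (20, 45)]
xs = [x for x, _ in kept]
ys = [y for _, y in kept]

slope, intercept, r2 = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.1f}, r^2 = {r2:.3f}")  # y = 1.230x + 66.5, r^2 = 0.858
```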