Friday, May 8, 2009

Letter Grades

In the US, final course grades are expressed in letters (the schemes I've seen are A/B/C, and A/A-/B+/B/B-). This leaves one with the interesting task of drawing boundaries between students (a task best performed at 2am the night before moving out of your apartment).

I am not comfortable with adding different components of the grade (like the midterm and final), as the average and standard deviation are vastly different. Is there a good statistical approach to this problem, with some convincing theory behind it? (Being at IBM Almaden, I should automatically go for rank aggregation, I guess... But there seems to be a lot of noise in the rank, since there are clusters of people with similar scores on each exam.)
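
(Since the post mentions rank aggregation: here is a minimal sketch in Python of the simplest variant, summing per-exam ranks, with made-up scores. The post doesn't commit to any rule; note how students clustered within a few points of each other still get distinct ranks, which is exactly the noise complained about above.)

    import numpy as np

    # Made-up scores for five students on two exams.
    midterm = np.array([55, 40, 35, 50, 45])
    final   = np.array([90, 70, 50, 75, 65])

    def ranks(x):
        # Rank 0 = lowest score; near-ties within a cluster are forced
        # into an arbitrary order, so the ranks themselves are noisy.
        return np.argsort(np.argsort(x))

    # Borda-style aggregation: sort students by the sum of their ranks.
    total = ranks(midterm) + ranks(final)
    print(np.argsort(-total))  # student indices, best first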

Fortunately, if you only give one midterm (and assign low weight to the homework), you can plot the results in 2D. Unfortunately, you may end up with something like this:

[scatter plot of midterm vs. final scores]

Oh well... Everybody knows grades are somewhat random.

13 comments:

JeffE said...

I am not comfortable with adding different components of the grade (like the midterm and final), as the average and standard deviation are vastly different.

Huh? I don't understand your objection. I'm sure different exam problems also had very different means and variances, but you had no problem adding those.

Mihai said...

I'm sure different exam problems also had very different means and variances, but you had no problem adding those.
The different problems on an exam share resources (the duration of the exam). Students are optimizing their total score on the exam subject to the constraints — thus, the individual points on a problem are not a meaningful measure.

On the other hand, two exams are supposed to be two independent experiments...

Anonymous said...

Isn't adding independent random variables better than adding dependent random variables? Not sure I understand the reasoning.

Mihai said...

Isn't adding independent random variables better than adding dependent random variables?
Yes, that's why two exams are better than one. But how do you aggregate them? Imagine on one exam the average was 40 and the standard deviation was 10, while on another the average was 70 and the standard deviation 20. Should you just add them up? Solid performance on the first exam could be wiped out by randomness on the second...
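
(A minimal sketch in Python of one common fix: convert each exam to z-scores before adding, so the exam with standard deviation 20 doesn't dominate the sum. The scores are made up to roughly match the numbers above; this isn't a rule the post commits to.)

    import numpy as np

    # Made-up scores: exam 1 has mean ~40, std ~10; exam 2 has mean ~70, std ~20.
    exam1 = np.array([55.0, 40.0, 30.0, 45.0, 30.0])
    exam2 = np.array([95.0, 70.0, 40.0, 85.0, 60.0])

    def zscore(x):
        # Rescale so each exam has mean 0 and standard deviation 1,
        # giving the two exams equal influence on the total.
        return (x - x.mean()) / x.std()

    total = zscore(exam1) + zscore(exam2)
    print(np.argsort(-total))  # students ordered from best to worst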

Anonymous said...

I'm surprised you don't need to specify the rules for generating the final grade at the start of the course. Where I'm teaching, you really need to do this. Otherwise you get a lot of "grading rules not clear" comments in the questionnaire that students fill in at the end of every course, and that's not good.

Anonymous said...

There is definitely a problem when adding scores of one exam with a very low mean and low standard deviation, and another with a high mean and high standard deviation.

The problem is not with adding scores, it is with your exam design. Well-designed exams will have similar means and standard deviations.

Mihai said...

The problem is not with adding scores, it is with your exam design.
Well, if you can design an exam that always has mean 50 and standard deviation 20, you don't need a curve anyway. You can just announce at the beginning that 70+ is A, 50+ is A-, etc.

Anonymous said...

Of course you can't design exams with exactly the same means and standard deviations.

But you can (and should) get close enough.

Mihai said...

Argh. The answer to the scientific question "how should we treat clavicle fractures" is not "teach the patients not to break their bones." Of course we should also teach the patients, even if we find a very good cure.

Anonymous said...

The difference is that in this case you control both the exams, and the grade assignment methodology. (And doctors don't control patients of course.) So yes, the answer is to devise suitable exams. :)

Alan Fekete said...

This discussion seems to ignore the formative aspect of assessment before the end of semester (e.g. midterm or homeworks). These are set to teach the students how they are doing. One of the best ways to send a clear message is to set/grade each piece of work to the same standard (not to the same curve, necessarily; one hopes the students improve their performance following feedback from the formative assessment). So the teacher should decide what type of work will get an A, and give an A on each homework or midterm for work of that type.

What then should be used to decide the overall, summative, assessment? Essentially it should just be the performance on the final exam (if a student does badly at first, but learns from it, they shouldn't lose out on their overall grade). However, this is emotionally too harsh (it raises the exam stress, which damages the performance of some students); also, many students need the incentive of having early tasks count, or they won't put in the effort that helps them learn. So we give some weight to the early formative tasks, but it should be as small as possible in cases where the student has improved. If students/organisation are flexible enough to deal with a formula beyond a simple sum, one arrangement that meets these constraints is 50%·final + 50%·max(final, earlier tasks).
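
(A minimal sketch of Alan's rule in Python; the example scores are made up.)

    def overall(final, earlier):
        # 50% final + 50% max(final, earlier tasks): an early stumble is
        # forgiven if the final shows improvement, but early work still
        # provides a safety net against a weak final.
        return 0.5 * final + 0.5 * max(final, earlier)

    print(overall(80, 60))  # improved student: 80.0, early stumble forgiven
    print(overall(60, 80))  # weak final: 70.0, early work still counts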

Dan Spielman said...

When adding grades, I like to re-scale them in a continuous, monotone increasing way, so that medians line up. I also require that zero maps to zero, and I usually map the max to the max.

Splines don't do this nicely, but the pchip routine in Matlab, which provides a "Piecewise Cubic Hermite Interpolating Polynomial", does a good job. I haven't yet derived this from a statistical theory, but if you require the function to have nice derivatives, you won't get something too different from this.
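
(A minimal sketch of Dan's remapping in Python, using scipy's PchipInterpolator, the analogue of Matlab's pchip. It assumes exams are out of 100 and that the common target median is 50; both are made-up choices, not anything Dan specifies.)

    import numpy as np
    from scipy.interpolate import PchipInterpolator

    scores = np.array([35.0, 42.0, 48.0, 55.0, 70.0])  # made-up exam scores
    med = np.median(scores)

    # Monotone map through (0, 0), (median, 50), (100, 100): pchip preserves
    # the monotonicity of the knots, so the relative order of students is kept,
    # zero maps to zero, and the max of the scale maps to the max.
    remap = PchipInterpolator([0.0, med, 100.0], [0.0, 50.0, 100.0])
    print(remap(scores))  # rescaled scores, now with median 50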

Writing this was much more fun than reading reviews of FOCS papers!

Anonymous said...

How is it going with the grading of FOCS submissions? Are you done with it? BTW, what is the status of the FOCS results, and when do you expect to announce them?