An invitation to ungrade

Some thoughts about the role of grades in higher education and my attempts to get rid of them

What is this about?

I have been actively involved with teaching and grading in higher education for almost 15 years, first as a TA and more recently as an instructor. Although some degree of dissatisfaction with the process of grading has been present from the beginning, it’s only in the last year that I’ve started questioning the presence of grades in our pedagogical practices. Before that, I simply took them as a given.

I’ll have more to say about the challenges of grading below. For now, I’ll just remark that the process can be stressful and annoying, while its results are usually frustrating and at best underwhelming. Since I started designing my own courses, I have found myself absorbed by the “practical” questions that come with making tests and rubrics, and I have become less and less convinced that this was helping my students become better professionals, let alone better people, as a result. And I resented the antagonistic relationship that grades built between me and my students.

Hence it was a very pleasant surprise to find the collaborative book “Ungrading” displayed on a table at Caltech’s CTLO. I couldn’t help but start reading it on the spot, and what I found inside was captivating. It turned out that I was not the only person troubled by grades. More importantly: there was a way out. The book, as its title suggests, is an invitation to get rid of grades, at least as much as we can given institutional constraints.

What follows is a very idiosyncratic summary of a few points made in the book, followed by my experiences implementing a form of ungrading in my courses at Caltech. It’s not meant as a lesson on how things should be done, but as an invitation to question some received practices and to explore a vast world of alternatives.

Why do we grade?

In the “received” pedagogical practice, grades seem to be important as a form of:

Assessment: they quantify how well students have learned
Feedback: they give students information that they can use to improve
Credentialing: they certify that students possess a certain level of skills

However:

Assessment does not have to be graded.
- Moreover, the act of measuring disturbs the process being measured (“Learning for assessment”). Graded assessments just make this effect more pronounced.
Grades are not good feedback.
- By displacing the focus to extrinsic, ego-centered motivation, grades diminish the effect of task-oriented feedback.
Grades convey incomplete and ambiguous information as credentials.

Disadvantages and challenges of grading

Before reading my views on this, I’d advise you to dig deeper into your own experience with grades. Think about your grade-related memories, first as a student in school or college, and later as a grader. For instance, you could pause reading this article, take a piece of paper, and list the first ten memories that come to your mind concerning grading and grades. Ten is a good number to force you to go beyond your recent experiences and maybe recover some of your childhood experiences as a “victim” of grades. (This activity was suggested by Laura Gibbs in Ungrading.)

Here are some issues that you might have identified:

Grades are not as informative as they seem. Learners might have very different beginning baselines, and they might have failed a graded assignment for very different reasons (negligence, lack of knowledge, anxiety, health issues, life events…).
Grading flattens nuances in students’ understanding and difficulties. This is related to the previous point. As a letter or number, a grade conceals the true reasons behind students’ successes or failures.
Grading requires uniformity. We pretend that a test is “fair” because it’s the same for everybody, but some students might already know the material while for others everything is new and puzzling (e.g. students coming from wealthy vs disadvantaged backgrounds in first-year calculus courses). The grades tell us little about the intellectual or moral qualities of the students.
Grades put the focus on extrinsic motivation and crowd out intrinsic motivation. This is well documented in the psychological literature but still poorly known by college educators. The evidence is discussed in detail in Punished by Rewards by Alfie Kohn; some of his points are summarized here.
When comments are accompanied by grades, students tend to neglect the comments. Butler and Nisan (1986) established that nonthreatening, task-oriented evaluations (comments) enhance intrinsic motivation and lead to improvements in learning outcomes, while non-receipt of feedback or receipt of “controlling normative grades” undermine intrinsic motivation and impact performance negatively. Moreover, the two interfere: comments with grades are not better than grades alone. (You can read more about this here.)
Grades become the main focus of interactions between teachers, students and institutions. Disputes on deadlines, points subtracted, content that goes into the tests… you name it.
Students fixate on grades (rather than learning), which may lead to cheating, corner cutting, gaming the system, a misplaced focus on accumulating points… Then institutions take actions aimed only at preventing these behaviors, irrespective of learning.
Grades are inconsistent, subjective, somewhat random and arbitrary (even with rubrics).
Students have trauma related to grades, or concerns that are unrelated to learning. At the very least, we have to keep this in mind when defining our grading policies.

Are grades good credentials?

I don’t want to talk much about this, but I wouldn’t take for granted that grades are a good mechanism to select or rank students in academic or nonacademic settings. Of course, they give some information, but only in conjunction with more qualitative indicators.

Some years ago, the NYT published an interview with Laszlo Bock, senior vice president of people operations at Google. Here’s an extract concerning grades:

Q. Other insights from the data you’ve gathered about Google employees?

A. One of the things we’ve seen from all our data crunching is that G.P.A.’s are worthless as a criteria for hiring, and test scores are worthless — no correlation at all except for brand-new college grads, where there’s a slight correlation. Google famously used to ask everyone for a transcript and G.P.A.’s and test scores, but we don’t anymore, unless you’re just a few years out of school. We found that they don’t predict anything.

[…]

Q. Can you elaborate a bit more on the lack of correlation?

A. […] I think academic environments are artificial environments. People who succeed there are sort of finely trained, they’re conditioned to succeed in that environment. One of my own frustrations when I was in college and grad school is that you knew the professor was looking for a specific answer. You could figure that out, but it’s much more interesting to solve problems where there isn’t an obvious answer. You want people who like figuring out stuff where there is no obvious answer.

This should be taken with a pinch of salt: it might be that grades are bad predictors only conditioned on the fact that one is exclusively looking at excellent applicants with relatively good grades (this was a point raised by Mikhail Belkin). Nevertheless, the observation that grades train people to think and succeed in artificially controlled environments is valid, and we should think about ways of introducing the open-endedness of real life in the classroom.

Towards a pedagogy of equality and autonomy

Katopodis and Davidson, in an essay called “Contract grading and peer review” that you’ll find in the Ungrading book, say

“A pedagogy of equality aims to support and inspire the greatest possible student success, creativity, individuality, and achievement, rather than more traditional hierarchies organized around a priori standards of selectivity, credentialing, standardization, ranking, and the status quo.”

Even if you don’t fully agree with this quote, take it as an invitation to disentangle teaching from credentialing. Fostering qualities such as creativity, individuality and achievement is properly in the domain of education, while grades and degrees are credentials. These credentials are by no means necessary: grades are a relatively recent invention (going back to the end of the 18th century), whereas formal instruction can be traced back to the ancient world. What is more, grades can be detrimental to education and its more natural goals.

One could argue that grades belong to a coercive and factory-like model of education, and indeed they were massively adopted during the first half of the 20th century, when many seminal ideas of “management” became widespread. But besides this question of origin, it is clear that grades establish a barrier between students and teachers, and put the focus on fear, coercion, and competitiveness. They are examples of ego-centered interventions that negatively impact intrinsic motivation.

On the contrary, a feeling of autonomy increases intrinsic motivation. It gives students a space to explore what is meaningful and valuable to them. It also helps students develop their own standards of self-scrutiny, a fundamental skill for professional life.

But you’re probably wondering: how can autonomy enter the pedagogical process? I think that we must give students the freedom to explore their own interests, set their own goals, choose the references that are most helpful to them, and, most importantly, make mistakes without being “punished” for them. Mistakes are fundamental for learning! Our role as teachers is to mentor students, so that they can make sound, informed choices.

A possible alternative: labor-based grading

Katopodis and Davidson recognize that besides the ideal of a pedagogy of equality, students also want “something more concrete than a sense of their own learning: they want some formal, institutional recognition of the effort they have invested in their learning” (especially if they are paying expensive tuition fees!). Their answer to this conundrum is a combination of contract grading and peer evaluations, which provide a meaningful, documentable and responsible form of credentialed credit.

Labor-based grading contracts are an alternative assessment model that shifts the focus from the quality of assignments to the effort and process involved in completing them. Under this scheme, the instructor and students agree at the start of the course on a contract that outlines the quantity and type of work necessary to achieve a certain grade. Possible criteria are number of assignments to complete, attendance, participation, and adherence to deadlines. The approach is described in detail in the book “Labor-based grading contracts” by Asao Inoue.

A grade in this labor-based approach is based purely on the quantity of labor. Among the reported benefits, its advocates mention reduced performance anxiety, increased focus on learning, and better equity. Quality is part of the picture, but purely through feedback.

There is a recent and a priori unrelated study by Koedinger et al., “An astonishing regularity in student learning rate” (PNAS, 2023), which I think reinforces the case for labor-based grading contracts. Here’s an extract:

Prior research, often using self-report data, hypothesizes that the path to expertise requires extensive practice and that different learners acquire competence at different rates […] Students do need extensive practice, about seven opportunities per component of knowledge. Students do not show substantial differences in their rate of learning. […] Despite being in the same course, students’ initial performance varies substantially from about 55% correct for those in the lower half to 75% for those in the upper half. In contrast, and much to our surprise, we found students to be astonishingly similar in estimated learning rate, typically increasing by about 0.1 log odds or 2.5% in accuracy per opportunity.

I encourage you to read the study. For the purpose of assessing and grading, this is telling us that if all students engage the same number of times with each learning component of our course, then they will all learn a similar amount even if the “outcomes” are obviously not the same. If as an educator you only look at the outcomes, and you only praise and value the students at the top, you’re probably just reinforcing the unequal distribution in exposure to the material that the students came with. The students who are behind probably already come with a “learned hopelessness”, a belief that they are dumber, and that it’s not worth investing much effort because they won’t succeed anyway; in the end you’re just reinforcing that belief.
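To make the arithmetic in the extract concrete, here is a minimal sketch of what “0.1 log odds per opportunity” means for two students who start at 55% and 75% accuracy. It assumes the same constant rate for both and seven practice opportunities (the figures quoted above). The function names are my own, and this is a toy illustration, not the statistical model used in the study.

```python
import math


def logit(p):
    """Convert an accuracy in (0, 1) to log odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log odds back to an accuracy in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def accuracy_after(p0, opportunities, rate=0.1):
    """Accuracy after a number of practice opportunities.

    Assumption: every opportunity adds the same `rate` (in log odds),
    regardless of the starting accuracy `p0`.
    """
    return inv_logit(logit(p0) + rate * opportunities)


if __name__ == "__main__":
    # Starting accuracies taken from the extract (lower and upper half).
    for p0 in (0.55, 0.75):
        trajectory = [round(accuracy_after(p0, n), 3) for n in range(8)]
        print(f"start {p0:.2f}: {trajectory}")
```

Under these assumptions, after seven opportunities the first student reaches about 71% and the second about 86%. Both gained exactly the same 0.7 in log odds, yet a grade based on the outcome alone would only register that one of them is still behind.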

Although my own approach to ungrading is different, this contract-based method was an important inspiration for me and I think it’s fair to mention it. I was not fully satisfied with it because I wanted to center the narrative on learning (and on reflecting critically about the process of learning) rather than on the quantification of labor.

My method: “All-feedback-no-grades” (until the very end)

I want to describe here the way I implemented a form of ungrading in my course Ma140a Probability during Winter 2024. This is an upper-division undergraduate course on measure-theoretic probability, taken by a very heterogeneous population of undergraduates and graduate students. I designed the course mainly with seniors in mind, but several freshmen insisted on taking the course even after I clarified many times that I’d review measure theory very hastily and that the course required mathematical maturity (there are no formal prerequisites). A couple of graduate students from applied sciences also took the course. This heterogeneity was an important motivator to adopt an autonomy-based approach that could accommodate the different backgrounds and interests.

My version of the course put special emphasis on active learning. We had three one-hour lectures per week (Monday-Wednesday-Friday). Mondays were mainly expository: I introduced the main concepts and theorems for the current week, trying to show the “big picture” instead of getting caught in technical details. On Wednesdays we generally had collaborative problem-solving sessions, in which I divided students into groups of two or three people that worked on a list of proposed problems. On Fridays, we had more “interactive” sessions, where I addressed the main difficulties that arose on Wednesday, answered student questions, and sometimes presented more advanced topics.

At the end of the first week of classes, the students submitted a beginning-of-term reflection. They had to explore the course contents beforehand and then reflect on the articulation between this content and the learning goals that I proposed as an instructor in the syllabus. They also had to reflect on their career goals and come up with personalized learning goals that responded to them.

At the end of weeks 2-10, the students submitted written solutions for the two most challenging problems that they solved during that week. These could be some of the problems we discussed together on Wednesday as well as other problems that I proposed online or that they found in books. The solutions were commented on by the TA and by peers (using Canvas’s FERPA-compliant peer-review function). Besides the problem solutions, students submitted a weekly report where they documented their work, described their learning process, justified their learning decisions, and reflected on the feedback received.

During midterm week and at the very end of the course, students submitted a portfolio instead of the usual weekly report. This portfolio was a cumulative self-evaluation, where students examined all their written submissions, reflected on their improvements and on how much they met the learning goals (external and personal), and gave themselves a grade. We discussed in class some of the criteria that were relevant for this final grade; these included time spent, learning progress, quantity and quality of solutions to problems, presence in class and in collaborative problem-solving sessions, and contributions to peers’ learning. They were free to combine these criteria in a way that made sense to them.

After reviewing the portfolios, I scheduled a 6-8 minute meeting with each student to talk about what they wrote and be sure that we agreed on the grade. I had told the students that I reserved the right to disagree with the grade they gave themselves. But I seldom did: students were very serious in the process of grading themselves, they provided good justifications for the grade they chose and were generally very aware of how well or badly they were doing in comparison to their peers (the collaborative sessions certainly helped in defining a common standard). When I proposed changes, they always involved one “step” (e.g. A- to A, etc.). Two students dropped the course when they felt they couldn’t keep up with its pace. The final grades were distributed between B- and A+, although most non-As were obtained by the freshmen, which, according to Caltech’s policies, counted as pass-fail on their transcripts.

Although it was important to me that the final grades made some sense and reflected learning and effort in a meaningful way, these final grades are definitely not the main point of this exercise. Please read the feedback given by the students (in the next section) to get an idea of the positive effects of ungrading.

Here you can find some of the materials that I used. They might be helpful if you want to implement something similar.

The last two are based on similar questionnaires used by Susan D. Blum. See her essay “Just one change (just kidding): ungrading and its necessary accompaniments” in Ungrading. I took the motto “all-feedback-no-grades” from Laura Gibbs’s essay “Let’s talk about grading” in the same book.

Feedback received from the students

Finally, I want to share with you some of the feedback that I got from the students of Ma140a Probability during Winter 2024. These fragments are taken from answers to the question “What was your initial reaction to the lack of focus on grades? How do you feel about it now?” which was included in the midterm and final self-evaluation questionnaires. Although this is obviously a selection, I’ll add that every single student had a positive opinion about the system.

“[The lack of emphasis on grades has] allowed for more flexibility in my work schedule as well as allowing me to attempt more difficult problems without fear of my grade dropping as a result.”
“My initial reaction was worry. I was worried that without a focus on grades, I would end up learning less. I thought I would have to work extra hard to make sure I do work without the added pressure of grades. Now, I am quite surprised by how much I have learned and how I have not felt like I have needed more motivation to work. I have felt quite motivated to learn the topics. I think somehow the combination of working with others in class and reflecting seriously about my progress has been sufficient to keep me on track”
“I think it is very conducive to a better learning environment. The motivation for me is now to clarify and understand the workshop problems and problems from outside class. This is a better motivator than grades.”
“I feel that I have learned a lot about probability theory in this course. In particular, I find that I have been going out of my way to learn as much as I can about the topics, especially when it’s confusing. I think that this has a lot to do with the grade scheme of the course allowing my time to be spent worrying about understanding instead of grades.”
“My initial reaction was a tentative optimism since I thought some people may not capitalize on the opportunity to learn more without the stress of grades. In my experience thus far, I see that it is hard not to pursue an understanding of the content under this grading scheme.”
“I was excited to see how it would play out. I think I feel much more relaxed about assignments, and thus more keen to understand them properly rather than just complete them. I also think that having many problems, but choosing only two to submit is ideal. It allows me to pick what I want to understand and doesn’t pressure me to do hard problems I feel aren’t interesting, or easy problems I feel would be time consuming without benefit. The problems I choose to work on tend to be at the right level so I can learn something new, and I consistently have been every week”

Conclusion

I hope this blog post has motivated you to think about why we grade and how we could do things differently. I simply wanted to report some of my thoughts and experiences. I’m not claiming that the method presented here is a general magic formula that would work in every case. Each course is different and comes with its own challenges. But whatever you end up doing in your courses, I hope you do it with a fuller understanding of its pedagogical implications. I hope we break the cycle of treating our students the way we were treated just because that’s supposedly the way things are meant to be. If we want to see a broader change, we have to start today in our classrooms.