The End of Grading

How the irrational mathematics of measuring, ranking, and rating distort the value of stuff, work, people—everything.
Video: Jacqui VanLiew; Getty Images

On March 14 at 1:59 (3.14159), people all over the world eat pies, run circles, crash computers, and generally act irrationally—all in celebration of everyone’s favorite irrational number. Pi Day was born at San Francisco’s Exploratorium in 1988 but has since colonized the globe, recognized by both UNESCO and the US House of Representatives. Ubiquitous in equations, pi begat Pilish, a constrained language fun for writing poems and stories (“how I want a drink—alcoholic of course” uses the first eight digits); 3.14 is also, incidentally, Einstein’s birthday. I’m not the only person with a cat named Pi.
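For the curious, the Pilish rule is easy to check by machine. The sketch below is only an illustration: the function name `pilish_digits` is made up for this example, and it assumes the usual Pilish convention that each word's letter count gives a digit, with a ten-letter word standing for 0. It reads off the digits spelled by the drink mnemonic and compares them with the start of pi.

```python
import re

# The first fifteen digits of pi, written without the decimal point.
PI_PREFIX = "314159265358979"


def pilish_digits(text):
    """Return the digit string spelled by the word lengths in `text`.

    Assumed convention: a word of n letters stands for the digit n,
    a ten-letter word stands for 0, and a longer word contributes
    its length written out (11 letters gives "11").
    """
    words = re.findall(r"[A-Za-z]+", text)  # punctuation is ignored
    return "".join("0" if len(w) == 10 else str(len(w)) for w in words)


phrase = "how I want a drink, alcoholic of course"
digits = pilish_digits(phrase)

print(digits)
print(PI_PREFIX.startswith(digits))
```

Each word contributes one digit here, so the phrase yields as many digits of pi as it has words.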

But we’ve got pi all wrong. It’s not really a number at all. It’s a relationship—between the diameter of a circle and its circumference. Its richness only becomes irrational when shoehorned into ill-fitting numbers (think of Cinderella’s slipper), shattering its beauty and burying its meaning. Numbers aren’t pi’s native language. We shouldn’t be surprised that its essence gets lost in translation.

I was struck by this recently when faced with the horror of grading—slotting students into numbers. My fellow instructors at the University of Washington, where I teach in the honors program, feel the same dread. More irrational even than pi, assessing people amounts to quantifying a relationship between unknown, usually unknowable things. Every measurement, the mathematician Paul Lockhart reminds us in his book Measurement, is a comparison: “We are comparing the thing we are measuring to the thing we are measuring it with.”

What thing do we use to measure undergraduates? What aspects can be compared? Quality or quantity? Originality or effort? Participation or progress? Apples and oranges at best. Closer to bananas and elephants. Even quantitative tests mark, at most, a comparison between what the test-maker thought the student should know and the effectiveness of instruction. Grades become the permanent records of these passing encounters.

And how do we grade the grader? When a physicist friend found out that a first-year Harvard student he knew—a math star in high school—got an F in physics, he said: “Harvard should be ashamed of itself.” A Harvard grad himself, he believed that schools fail students far more often than students fail schools. Some STEM profs, I’m told, tell the class at the outset that half of them will fail. I give those teachers an F.

I’m not alone in my discomfort with the irrational business of ranking, rating, and grading. The deans of Yale’s and Harvard’s law schools recently withdrew their schools from the rankings of US News & World Report, followed by Harvard Medical School and scores of others. “Rankings cannot meaningfully reflect … educational excellence,” Harvard Medical School dean George Q. Daley explained. Rankings lead schools to falsify data and make policies designed to raise rankings rather than to serve “nobler objectives.”

The very thing that’s been eating education is now devouring everything else. My doctor recently urged me to get an expensive diagnostic test because it “makes our numbers look good.” Her nurse asked me to rank my pain on a totem pole of emojis. Then after the visit, to rate my experience. The numbers are all irrational. And rather like the never-ending digits of pi, there seems to be no end to them.

We do our best to brush off the annoying swarm of asks that shadow every restaurant meal, plumber visit, plane ride, pleading for points, stars, likes, thumbs-up (or middle fingers), if only because they’re nibbling away at our sanity.

The true cost, however, is more than irritation. Misunderstanding measurement misunderstands understanding itself. The ubiquitous, incessant surveying smothers knowledge with noise, drowns out the information we actually need for finding out how things work, what’s going on, what we’re doing, what actually matters. 

For starters, we should suspect any measurement that doesn’t acknowledge the “what compared to what.” Counting the number of Covid deaths without comparing it to the prevalence of the virus in a population gives us no clue to its fatality, to how many people recover or linger in “long Covid,” or even which variants are “trending.” We can’t know those numbers, since no one’s counting anymore. Denominators have disappeared again.

Or take a simpler case: You can measure the length of a rug by comparing it to marks on a tape measure as long as someone—e.g., the National Institute of Standards and Technology (NIST)—keeps track of what’s a foot (so to speak). One type of foot was deemed obsolete on the stroke of midnight on January 1, 2023. The standard “international foot” is 0.3048 meters, though it’s actually measured in wavelengths of light. Whatever the version, “foot” refers to a known relation—like diameters to circumferences, or space to time. All things considered, it’s solid.

Most measurements, in contrast, are “impossible,” Lockhart writes. “It is only the simplest objects we have any hope of measuring.”

And nothing we measure is simple, for the simple reason that everything is connected to everything else, and any single measure contains plethoras of players, a cosmos of considerations. Consider, for example, the trouble physicists had understanding “motion” before they grasped its complexity. It wasn’t a thing so much as a family of, well, moving parts: velocity, acceleration, momentum, force. 

Like everyone else, I’m constantly assessing my status, how I am measuring up. Against a younger self? Against other people my age? Against some societal expectation? Evaluated by my chronological age? My biological age? In a recent dance class, I measured myself against the other students and graded myself at the bottom. I asked the teacher if I was in over my head: “Oh, you’re better than last time,” she said. A low bar, indeed. Regression to the mean tells me that if I was the worst in the class the first week, there’s only one way to go: up! The improvement my teacher saw was simple probability.
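A toy simulation makes the point concrete. It is a sketch under loud assumptions, not a model of any real class: every student gets a fixed "skill" plus independent luck each week, both drawn from a standard normal distribution, and the function name `worst_student_change` and its parameters are invented for this illustration. Nobody's skill changes between weeks, yet whoever scored lowest in week one tends to score higher in week two.

```python
import random


def worst_student_change(n_students=20, n_classes=2000, seed=0):
    """Average week-two gain of whoever scored lowest in week one.

    Assumption: score = fixed skill + fresh luck each week. Skill never
    changes, so any average gain is regression to the mean, not learning.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    total_change = 0.0
    for _ in range(n_classes):
        skill = [rng.gauss(0, 1) for _ in range(n_students)]
        week1 = [s + rng.gauss(0, 1) for s in skill]
        week2 = [s + rng.gauss(0, 1) for s in skill]
        # Index of the student at the bottom of the class in week one.
        worst = min(range(n_students), key=week1.__getitem__)
        total_change += week2[worst] - week1[worst]
    return total_change / n_classes


average_change = worst_student_change()
print(average_change)
```

The average comes out positive because the week-one loser was, on balance, both less skilled and unluckier than usual, and only the bad luck fails to repeat.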

But I wonder: If a student starts the term at the top and turns in mediocre work at mid-term, do I mark him down? Do I excessively reward the middling student who later turns in A-plus work? In all probability, probably.

Aging accelerates the urge, and maybe the need, for appraisal. Some people track their waistlines, some count steps, others seem glued to their portfolios. Many compare themselves to others. I think this is pointless, since we know the feeling of “wellness,” like wealth, is relative, and personal. Friends who hang out with rich people feel much poorer than I do.

Science and tech, in contrast, obsess over getting measurement right. NIST exacts precision from everything under the sun (including sunlight): rocket thrust and alcohol level, smoke alarms and earthquake probability, the spiciness of food and the price of peanut butter. It sets standards for fathoms, furlongs, leagues, acres, and, of course, the foot.

An uncertainty “budget” accompanies every measurement, accounting for uncertainties in elastic deformation or instrument geometry or reproducibility. Sometimes authors get creative. Perhaps the best uncertainty statement ever written, according to a source at NIST, was from a Dr. C. H. Meyers, reporting on his measurements of the heat capacity of ammonia: “We think our reported value is good to 1 part in 10 000: we are willing to bet our own money at even odds that it is correct to 2 parts in 10 000. Furthermore, if by any chance our value is shown to be in error by more than 1 part in 1000, we are prepared to eat the apparatus and drink the ammonia.”

The biggest issue NIST faces, as I do with grading, is noise—the thick dusting of extraneous stuff that clouds my view, messes with judgments. It’s not possible to separate students’ performance in class from everything else going on in their lives, including whether they’ve slept or eaten. Hungry judges, unable to separate their stomachs from serious decisions, are less likely to grant parole than judges who’ve just had lunch. How can you ever hope to (to use my favorite math term) deconvolute?

Measurement must, at times, suffice as a substitute for understanding, but measurers must fess up to this fudging up-front—as did the inimitable Richard Feynman. Physicists, he famously groused, should be “utterly ashamed of the way they take energy and measure it in a host of different ways, with different names.” They have rules for calculating quantities, so they know that energy is conserved. And yet: “In physics today, we have no knowledge of what energy is. We do not have a picture that energy comes in little blobs of a definite amount … As far as we know there are no real units, no little ball bearings. It is abstract, purely mathematical that there is a number such that whenever you calculate it it does not change. I could not interpret it any better than that.”

Fifty years later, not much has changed. The reason, according to one present-day physicist, is that energy is fundamentally about relationships—just like pi! “A relation cannot be held in the hand without noticing that it consists of multiple component parts.”

In fact, the very act of precise measurement, at the subatomic level, can destroy the thing you’re trying to measure—as spelled out in Heisenberg’s infamous Uncertainty Principle. It also happens in the so-called real world all the time. A dance instructor might break down a turn, say, into smaller parts. Midway through, you “freeze,” so he can check that everyone’s on the right (or left) foot. The problem is, freezing stops the motion; once it stops, it’s no longer a turn.

I’d wager that most measurements are in some way destructive—though I won’t eat my computer if I’m wrong.

Major publishing houses, according to a recent column in The Wall Street Journal, have become so concentrated on “measurable success” that they pour resources into big bets on same-old same-old, destroying the mid-list, which sits below the A-list but above the backlist. Like the middle class, it was once a majority, the space where most authors spent the bulk of their writerly lives.

Occasionally, a corporation notices that measures aren’t revealing what they need to know, may even have led them astray. The CEO of Mattel blamed a financial slump on creative doldrums he attributed to “a fixation on the numbers. The company was being driven by spreadsheets and checklists … We weren’t really asking ourselves, ‘Are we making good toys?’”

Of course, most things we measure are proxies—concrete substitutes for the elusive stuff we want to know. Analytics software can tell you how many people shared your post. It can’t tell you whether you’ve made an impact, gained trust, changed anyone’s mind.

A colleague explained the perils of evaluating success in public health outreach, especially in the often backwater villages where she works. You can count the number of flyers you hand out, but are they read? Used for toilet paper (also useful)? Did the malaria nets you distributed actually control the disease? In one case my colleague knew of, the nets were (also) used as wedding veils. They were pretty, she said; it made sense. They had to measure something because the donors wanted numbers. Even though everyone knew that no one knew what the numbers really meant.

Employers have taken to using the time remote workers spend messing with computer mice as a proxy for productivity—implying that the longer they work the harder they work, and the better the result. Timed tests, on the other hand, reward speed—a proxy for smarts. Neither stands to reason, but time is simple to measure, a handy substitute. Critics who initially scoffed at a piece of AI art thought better of it once they were told the humongous number of hours the creator had put in. Do I judge students by the time I think they spent on assignments? Probably.

And what of “wasted” time? What’s the grade for that? My physicist friend talked about a hike he took to a gorgeous view atop a mountain, along the way stumbling on hidden caves and trickling streams and fairy slippers. Later he dragged his friends up with him; they had to see the view at a certain time, with no chance to linger. His friends got cranky and, he suspected, would be ill-inclined to take such a trip on their own. As a physics teacher, he too felt pressure to cover the material, get to the end, no matter how much huffing and puffing it takes, no matter the lost opportunities for delight in discoveries only possible by stepping off the path.

We worry that time in class digressing is “wasted.” And yet we know that “wasted time” can be the most productive.

The dumbest thing we measure with time spent is age. Some people want to squeeze out (or in) every possible second, live to be 100 or more. Count me and most “seniors” I know out. “Longevity,” the aging Fran remarks in Margaret Drabble’s new novel The Dark Flood Rises, “has fucked up our pensions, our work-life balance, our health services, our housing, our happiness. It’s fucked up old age itself.”

Years lived seems a silly measure of life. It compares life to nonlife, years lived to years not lived, and what, pray tell, is that? You can count the years between birth and death, but the years before and after spool off to infinity. Just like pi.

A friend pesters me about what “wisdom” I’ve acquired now that I’m officially old. I don’t know what to measure that with. I have tried to let go of comparing myself with anyone else, even with my prior selves—in other words, let go of measuring. I don’t particularly like the signs I see in the mirror. But what do I expect to see? Someone younger? I’d make do with someone more familiar.

I do find comfort, if not exactly wisdom, in my old pal pi. And to be clear: Pi is a Real number, which is a real thing; it’s also transcendental, certainly to me and worlds of others.

Pi tells us: It’s not the numbers, stupid. What matters are relationships. If relationships often seem irrational, so be it. Rationality is overrated.

Please do rate your experience reading this. On a scale of 1 to pi. Have fun with that.