The Morrissey had the right melodrama in his limbs, and his voice was strong and pained. I was at Gramercy Theatre in Manhattan to see a Smiths tribute band. I tried to get Morrissey’s acid yodel in my throat, to sing along. I am human and I need to be loved / just like everybody else does. But it didn’t feel right to copy a copy.
Most tribute bands don’t practice outright impersonation, so the way this fake-Smiths singer captured everything about Morrissey was messing with my mind. I’d hoped to be able to savor the music’s maudlin glory without the headache of the flesh-and-blood Morrissey, who seems to have aligned himself with white supremacists. The contempt in Morrissey’s lyrics and politics was presumably not native to Seanissey, as the tribute singer called himself. Seanissey’s performance probably didn’t, as they say, “come from a bad place”—or a misanthropic place, or a far-right place, or even a vegan one.
What place did it come from? I’ve had this no-there-there anxiety with ChatGPT dozens of times. When it uses idioms like “in my life”—when it doesn’t have a life—I go cold. Likewise, to invest into Seanissey, a gentle Manhattanite who happened to sing and dance as Moz did, the passions that were first aroused in me by the Smiths 30 years ago felt like a bad emotional bet.
Maybe AI that aims to seem human is best understood as a tribute act. A tribute to human neediness, caprice, bitterness, love, all the stuff we mortals do best. All that stuff at which machines typically draw a blank. But humans have a dread fear of nonhumans passing as the real thing—replicants, lizard people, robots with skin. An entity that feigns human emotions is arguably a worse object of affection than a cold, computational device that doesn’t emote at all.
When I got home, stuck in an uncanny valley scored with Smiths Muzak, there was an email from Andrew Goff, widely considered the greatest Diplomacy player of all time.
This lifted my spirits. Diplomacy, a 69-year-old American strategy game, is, by many estimates, the most human game ever imagined. Mechanically, it’s simple: Seven players compete to control supply centers on a map, and a player wins by controlling more than half of these centers. But it’s played almost entirely in a series of conversations, often complex and impassioned ones. Agony and ecstasy—Moz-like agony and ecstasy, no less—commonly enter the negotiations. In the live game, players are known to yell, end friendships, throw the game, or simply sit by themselves and sob.
With his various punk haircuts and black plugs in his earlobes, Goff is a Smiths fan, and he even looks a bit like the band’s late bassist, Andy Rourke. To my amazement, Goff once named a Diplomacy board “Girlfriend in a Coma.” Forever crisscrossing the world for tournaments and his corporate job, Goff comes across as more gregarious than most elite players of board games.
Goff is also known for a brilliantly subversive, kill-’em-with-kindness style of gameplay. As Siobhan Nolen, the former president of the North American Diplomacy Federation, put it, “It hurts less to lose against somebody like Andrew.” In Diplomacy, players are sometimes forced to choose which attacks on their territory to repel and which to surrender to. Players often let Goff roll his forces in because they know that he, unlike many others, won’t be a dick about it.
There are excellent Diplomacy players who rage and issue threats, hollow and otherwise: “If you backstab me, I will throw the game.” Goff is not one of these. Even his breakup notes are masterpieces of directness and decency. “Apologies, Turkey! I decided it was in my best interest to work with Russia now. I hope there are no hard feelings.” In his congeniality is also empathy. “I genuinely feel bad for players when they get beaten, even if it is me beating them,” Goff told me. I believed him.
The email was about Cicero, a Diplomacy-playing AI that Goff helped create for Meta AI. Last fall, Cicero managed to best Goff in several games, sometimes partnering with weaker players to bring him down. Noam Brown and Adam Lerer, who were part of the immense team of experts in game theory, natural language processing, and Diplomacy that created the AI, both say that Cicero is the most humanlike AI they’ve ever created. Lerer, who now works at DeepMind, goes further: Cicero may be the most humanlike AI on earth.
Could Cicero even be conscious? “A threshold for determining AI consciousness is whether the program is capable of outwitting humans at Diplomacy,” wrote Irish Diplomacy champion Conor Kostick in The Art of Correspondence in the Game of Diplomacy, in 2015.
Cicero is also something of a Goff tribute band. It plays the same magnanimous game Goff does. In one memorable showdown, Lerer told me, Cicero played Russia and allied with a human who played Austria. Throughout the game, Lerer said, Cicero was “really nice and helpful to Austria, although it maneuvered in its discussions with other players to make sure Austria was weakened and eventually lost. But at the end of the game [the human playing] Austria was overflowing with praise for Cicero, saying they really liked working with it and were happy it was winning.”
In general, grandmasters who lose to AIs take it hard. “I lost my fighting spirit,” Garry Kasparov said in 1997, after losing at chess to Deep Blue. “I am speechless,” said Lee Se-dol in 2016, after losing at Go to AlphaGo. Goff seemed to be the opposite. He was revitalized, he said. “Diplomacy has a reputation for being a game of lies, but at the highest level it is anything but that. Having that affirmed by an AI was a delight.”
This filled me with relief. Maybe AI will just amplify what’s best about humans. Maybe AI will become a buoyant tribute band for our entire species. Maybe AI will be a delight—and a force humans will be content to lose to. We’ll go down in peace. We really liked working with you, robots, and are happy you are winning.
Diplomacy was created in the 1950s by Allan B. Calhamer, a Harvard student who was studying European history with Sidney Bradshaw Fay, an eminent historian. Fay’s 1928 book, The Origins of the World War, suggested a compelling puzzle: Could World War I have been prevented with better diplomacy?
Calhamer’s game is traditionally played over a 1901 map of Europe, Ottoman Turkey, and North Africa. Players get to taste the thrill of 20th-century empire building without all the blood, subjugation, and genocide. They get so much authority over Western civ, in fact, that modern players sometimes cosplay as kaisers and czars.
Though the board resembles Risk, Diplomacy gameplay is more like Survivor. Everyone takes their turn at a kind of tribal council, but the action happens in the negotiations between turns. Another analogue for Diplomacy might be The Bachelor.
Historically, Diplomacy has been known as a game for snakes, and a pastime of figures like JFK, Henry Kissinger, Walter Cronkite, and Sam Bankman-Fried. But Cicero, which plays a non-zero-sum version of the game that incentivizes collaboration, is not snaky. Mike Lewis of the Meta team says Cicero uses dialog only “to establish trust and coordinate actions with other players”—never to troll, destabilize, or vindictively betray. What’s more, as Lewis said on social media, “It’s designed to never intentionally backstab.” Like a canny Bachelor contestant, Cicero can persuade another human to pair up with it.
Cicero integrates a large language model with algorithms that allow it to plan moves by inferring other players’ beliefs and intentions from the way they converse. It then produces normal-sounding dialog to propose and plan mutually beneficial moves. Across 40 blitz games in an anonymous online Diplomacy league, Cicero, according to Meta, achieved more than twice the average score of the human players. Over 72 hours of play that involved sending 5,277 natural language messages, Cicero ranked in the top 10 percent of participants who played more than one game.
When Cicero wins, Goff told me, there is no gloating, “no ‘Haha, you loser’ talk.” Instead, “the talk is much more, ‘Your position isn’t great, but we all have games like that sometimes.’”
Diplomacy is a niche pursuit. It’s nowhere near as venerable a game as chess or Go. And it’s never been seen as a universal intelligence test; instead, it’s a hobby of amateur historians. Since 1976, the game has been published by Avalon Hill, a label that is to strategy games what Rough Trade Records is to indie rock. Diplomacy is so new that it’s not yet in the public domain, that stately arcade where chess and Go have acquired millions upon millions of adherents who have collectively developed those beautiful games in tandem with our human brains. By contrast, Diplomacy is just getting started. It was dubbed “the board game of the alpha nerds” by Grantland in 2014.
I guess I could call myself a Diplomacy mom. When my son was in middle school, he and his friends played weekend-long Diplomacy games in my apartment. We set up the august map on a dining table hauled into the living room, served soda in brandy snifters, and burned a candle that smelled like pipe tobacco. For the boys’ tête-à-têtes, we arranged folding chairs in the bedrooms. When possible, I eavesdropped on their preadolescent plans for the future of Europe.
To my surprise, these conversations about what seemed like a technical enterprise used overwrought idioms like “I swear on my life” and “I mean it from my heart.” Most curiously—and chronically—the boys used the phrase “I love,” as in, “I love this move.” Cicero talks that way too.
As I kept an eye on the two-day games, each of the seven kids seemed to pass through a spectrum of emotional and physical states. They became, by turns, sleepy, spacey, hungry, paranoid, hubristic, defeatist, craven, and profoundly bored. One or two quit. I loved the rhythm of their emotions as they ebbed and flowed. At least two styles of (mostly ineffective) play are named for the emotionalism that characterizes them: Revenge-Seeking and Carebear.
You don’t teach an AI to play a particular game to improve that game; you teach an AI to play a game to improve artificial intelligence. Diplomacy is a great proving ground for natural language processing. If Meta had simply been trying to build an AI that uses words to make humans do things, it couldn’t have found a better way to train it than with zillions of games of Diplomacy.
Still, the online games Cicero plays are markedly different from the fervent games I used to host. For one, they’re short—around 90 minutes each. Solo victories, which usually require stinging betrayals, are rare. And communication is more utilitarian than passionate.
Human: A little nervous with all your fleets there. Going to put MAO (Mid-Atlantic Ocean fleets) in SPA-SC (Spain’s south coast).
Cicero: I get it, I really do. Don’t worry. I’m going to move Tunis to Ionian. I have no reason to attack you now. I’m going after Turkey. I can totally understand your nervousness.
Human: Thanks—I’ll leave SPA-NC at once after.
Cicero: Awesome! Thanks so much for understanding.
When I first heard about Cicero, I assumed it would bomb at Diplomacy because no one wants to buddy up with a heartless machine that’s trying to sound human. If players knew Cicero was an AI, I reasoned, it wouldn’t stand a chance. Humans would join forces and clobber it just to avenge our species. Kostick, for his part, said he’d be more willing to stab a bot since a bot has no feelings to hurt.
But as I thought about it more deeply, I realized that partnerships form for non-affective reasons too. Someone who excels at strategic reasoning could actually be a solid sidekick. Perhaps a little R2-D2 could win me over as an ally, not with human kindness but by sharing my reading of a situation and presenting me with elegant, data-driven options for how to address it.
When I asked Lerer about my R2-D2 idea, he concurred. “I actually think a human that used Cicero as an assistant to develop tactical and strategic plans, but who could navigate some of the human aspects better than Cicero—such as when it is safe to lie, or how to avoid irritating an ally—would be super strong.”
Cicero definitely says “Awesome!” too much. But it can be especially irritating in that signature AI way: It sometimes hallucinates. It proposes illegal moves. Worse yet, it denies saying something it just said. Faced with these glitches, Cicero’s human opponents would sometimes get mad. But they didn’t guess it was an AI. They thought it was drunk. And perhaps these personality glitches are a small price to pay for the bot’s deep reserves of raw intelligence and foresight.
If Cicero’s aura of “understanding” is, behind the scenes, just another algorithmic operation, sometimes an alignment in perception is all it takes to build a bond. I see, given the way your position often plays out, why you’d be nervous about those fleets. Or, outside of Diplomacy: I understand, since living alone diminishes your mood, why you’d want to have a roommate. When the stock customer service moves—“I can understand why you’re frustrated”—figured into Cicero’s dialog, they had a pleasing effect. No wonder moral philosophies of AI lean heavily on the buzzword alignment. When two minds’ perceptions of a third thing line up, we might call that congruity the cognitive equivalent of love.
All the same, I wasn’t seduced. To me, Cicero sounded like one of those considerate, practical, honest spouses—the kind of uncomplicated partner that die-hard Smiths fans, in it for the passion, sometimes wish they could be satisfied with. But if Cicero’s gameplay was going to be more pragmatic than tender, it was still going to have to use the language of the heart for purposes of persuasion. “Run away with me” is a better pitch than “Let’s save money by filing a joint tax return.”
For Cicero to learn the subtleties of engaging humans emotionally, it couldn’t train by “self-play” alone. It couldn’t be left in a corner, playing Diplomacy against itself, churning through an infinite number of games, assuming perfect rationality in all robot players and generating intellectual capital in the onanistic way a bitcoin miner generates currency. Self-play works well to learn a finite, two-person, zero-sum game like chess. But in a game that involves both competing and cooperating with fickle humans, a self-playing agent runs the risk of converging to “a policy that is incompatible with human norms and expectations,” as a paper about Cicero in Science puts it. It would alienate itself. In this way, too, Cicero is like a human. When it plays only with itself all day every day, it can become too weird to play with others.
When Noam Brown explained to me how he and his team trained Cicero, he emphasized the metagame problem. The metagame of Diplomacy (or jackstraws, Scrabble, bowling, etc.) can be seen as its place in the world. Why play this game? Why here and why now? Is it a test of raw intelligence, social skills, physical prowess, aesthetic refinement, cunning? You might play Wordle, say, because your friends do, or it relaxes you, or it’s rumored to stave off aging. An AI that’s programmed to play Wordle just to win is playing a different metagame.
Brown and the Cicero team needed to be sure that their AI and the human players saw themselves as playing the same game. This is trickier than it sounds. Metagames can change very suddenly, and as Thomas Kuhn wrote of paradigm shifts, they can change for sociological reasons, cultural reasons, aesthetic reasons, or no apparent reason at all. Human reasons, then.
In early seasons of Survivor, Brown told me, participants saw themselves as pursuing social goals they collectively deemed important, while ignoring openings for strategic derring-do that, for later players, became the heart of the game. “It’s not that one game is right or wrong,” Brown said. “But if early-season players of Survivor were to play a modern Survivor game, they’d lose.” (Even a social phenomenon like motherhood might have a metagame. A good mother in one era is a bad one in the next.)
The metagame of Diplomacy has likewise changed. In its first postwar decades, players were keen to try their hand at the kind of grand European diplomacy that their forebears had so catastrophically failed at. These early players made beautiful, idealistic speeches, often invoking pacifism. (Diplomacy, paradoxically, is a war game without bloodshed; the goal is to occupy centers, not blow people up.) But because they also had to execute tactical goals that were at odds with idealistic rhetoric, and because the game was usually played winner-takes-all (“to 18”), they were frequently obliged to lie. Thus: stabbing.
But then, as statecraft in the real world came to favor game theory over traditional diplomacy, the metagame likewise shifted. Online players were no longer calling one another into solaria or billiards rooms to speechify about making the world safe for democracy. Games became shorter. Communication got blunter. Where someone playing Diplomacy by mail in the 1960s might have worked Iago-like angles to turn players against one another, a modern player might just text “CON-BUL?” (For “Constantinople to Bulgaria?”)
This is the current Diplomacy metagame. Game theory calculations undergird most utterances, and even humans communicate in code. Lerer joked that in modern-day online Diplomacy, even human players wouldn’t pass the Turing test. Before Cicero, it seems, humans had already started playing like AIs. Perhaps, for an AI to win at Diplomacy, Diplomacy had to become a less human game.
Kostick, who won a European grand prix Diplomacy event in 2000 and was on the Irish team that took the Diplomacy National World Cup in 2012, misses the old style of gameplay. “The whole purpose of Allan Calhamer’s design of the game,” he told me, “is to create a dynamic where the players all fear a stab and yet must deploy a stab or a lie to be the only person to reach 18.”
Kostick believes that while he “would have been delighted with the practical results of Cicero’s website play,” Meta’s project misses the mark. Cicero’s glitches, Kostick believes, would make it easy to outwit with spam and contradictory inputs. Moreover, in Kostick’s opinion, Cicero doesn’t play real Diplomacy. In the online blitz, low-stab game Cicero does play, the deck is stacked in its favor, because players don’t have to lie, which Cicero does badly. (As Lerer told me, “Cicero didn’t really understand the long-term cost of lying, so we ended up mostly making it not lie.”) Kostick believes Cicero’s metagame is off because it “never knowingly advocates to a human a set of moves that it knows are not in the human’s best interest.” Stabbing, Kostick believes, is integral to the game. “A Diplomacy player who never stabs is like a grandmaster at chess who never checkmates.”
With some trepidation, I mentioned Kostick’s complaint to Goff.
Unsurprisingly, Goff scoffed. He thinks it’s Kostick and his generation who misunderstand the game and give it its unfair reputation for duplicity. “Cicero does stab, just rarely,” Goff said. “I reject outright that [compelling players to stab] was Calhamer’s intent.”
I could tell we were in metagame territory when Goff and Kostick began arguing about the intent of the game’s creator, as if they were a couple of biblical scholars or constitutional originalists. For good measure, Goff bolstered his case by citing an axiom from high-level theory and invoking an elite consensus.
“Regardless of Calhamer’s intent, game theory says, ‘Don’t lie,’” he told me. “This is not controversial among any of the top 20 players in the world.”
For one person or another to claim that their metagame is the “real” one—because the founder wanted it that way, or all the best people agree, or universal academic theory says x or y—is a very human way to try to manage a destabilizing paradigm shift. But, to follow Kuhn, such shifts are actually caused when enough people or players happen to “align” with one vision of reality. Whether you share that vision is contingent on all the vagaries of existence, including your age and temperament and ideology. (Kostick, an anarchist, tends to be suspicious of everything Meta does; Goff, a CFO of a global content company, believes clear, non-duplicitous communications can advance social justice.)
Maybe someday around the Diplomacy board at my place, Kostick, who is 59, and Goff, who is 45, will light up some chocolate cigarettes and align on what to do with Austria or Turkey. As for the present, they weren’t even aligned on chess. “Grandmasters in chess never checkmate,” Goff told me.
This one I resolved on my own. Chess grandmasters have, in various epochs, played all the way through to the checkmate, rather than ending the game when an opponent resigns early to save face. There are still times when a checkmate is so beautiful that both players want to see it come to fruition. But Goff is right. Today, it’s rare to unheard-of for a grandmaster to checkmate.
But it’s an aesthetic matter, playing to the checkmate. Just like speechifying and stabbing and being so nice that people don’t mind if you beat them. An absolutist like Morrissey might say that indie rock must always be played one way, or that Britain is, at its heart, this way or that. But it doesn’t matter. Metagames change. Only humans, in all our caprice, grounded in all of our competing and cooperating supply centers, decide which games are worth playing and how to play them—and why.
I couldn’t get over what a pleasant person Goff is. He seemed to like Cicero, even as it had beaten him. Cicero, Goff mused, played “at a very high standard indeed.” And it didn’t just defeat him, he allowed; “a few times it absolutely humiliated me, including one where it guided a beginner player to work together to beat me up.”
So here’s the rare AI story that doesn’t end with an existential reckoning for humankind, I thought. We’re not staring into an abyss. Bots like Cicero are going to understand our wants and needs and align with our distinctive worldviews. We will form buddy-movie partnerships that will let us drink from their massive processing power with a spoonful of sugary natural language. And if forced at the end of the road to decide whether to lose to obnoxious humans or gracious bots, we won’t give it a thought. We’ll change our wills, leave them all we have, and let them roll their upbeat tanks right over our houses.
But had I been played by Goff’s affability, as so many have before me? I wondered one last time if he might, just might, be faking his insouciance about Cicero. Once again he set me straight: “I probably had a winning record against it over the life of the experiment,” he said.
So he’d actually won. That was why he didn’t mind. Then he added, of course graciously, “It was a close-run thing.”
Artwork in collaboration between the illustrator, Sienna O’Rourke, and Midjourney AI.
This article appears in the October 2023 issue.