How ‘radioactive data’ could help reveal malicious AIs

10 Views


Bit by bit, text generated by artificial intelligence is creeping into the mainstream. This week brought news that the venerable consumer tech site CNET, where I worked from 2012 to 2013, has been using “automation technology” to publish at least 73 explainers on financial topics since November. While the site has refused to answer any questions, it’s hardly the first news organization to explore replacing human labor with robots: the Associated Press has been publishing automated stories since 2014.

This week the New York Times’ Cade Metz profiled Character A.I., a website that lets you interact with chatbots that mimic countless real people and fictional characters. The site launched last summer and for the moment leans heavily on entertainment uses — offering carousels of conversations with anime stars, video game characters, and the My Little Pony universe. But there are hints of more serious business, with bots that will instruct you in new languages, help you with creative writing projects, and teach you history.

All of these projects rely on the suspension of disbelief. When you read an article like “What Is Zelle and How Does It Work?,” the text offers no clear evidence that it was generated using predictive text. (The fine print under the CNET Money byline says only that “this article was assisted by an AI engine and reviewed, fact-checked and edited by our editorial staff”; the editor’s byline appears as well.) And in this case, that probably doesn’t matter: this article was created not out for traditional editorial reasons but because it satisfies a popular Google search; CNET sells ads on the page, which it generated for pennies, and pockets the difference.

What if it did have ulterior motives, though?

Over time, we should expect more consumer websites to feature this kind of “gray” material: good-enough AI writing, lightly reviewed (but not always) by human editors, will take over as much of digital publishing as readers will tolerate. Sometimes the true author will be disclosed; other times it will be hidden.

The quiet spread of AI kudzu vines across CNET is a grim development for journalism, as more of the work once reserved for entry-level writers building their resumes is swiftly automated away. The content, though, is essentially benign: it answers reader questions accurately and efficiently, with no ulterior motives beyond serving a few affiliate links.

What if it did have ulterior motives, though? That’s the question at the heart of a fascinating new paper I read this week, which offers a comprehensive analysis of how AI-generated text can and almost certainly will be used to spread propaganda and other influence operations — and offers some thoughtful ideas on what governments, AI developers, and tech platforms might do about it.

The paper is “Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations,” and it was written as a collaboration between Georgetown University’s Center for Security and Emerging Technology, and Stanford Internet Observatory, and AI.

The emergence of global-scale social networks during the past decade offered state actors a rich new canvas on which they could attempt to shape public opinion. Most famously, Russia used its troll army to create thousands of fake Americans on Facebook, Instagram, and other platforms and pit them against real ones in an effort to swing the 2016 election to Donald Trump. There’s a real debate about how effective that campaign — and influence operations in general — really are. But for years now, I’ve been following the story of how how technology empowers these kinds of attacks.

The paper suggests that AI has the potential to make these attacks much more effective — in part by making them invisible. Here are Josh A. Goldstein, Girish Sastry, Micah Musser, Renée DiResta, Matthew Gentzel, and Katerina Sedova:

The potential of language models to rival human-written content at low cost suggests that these models—like any powerful technology—may provide distinct advantages to propagandists who choose to use them. These advantages could expand access to a greater number of actors, enable new tactics of influence, and make a campaign’s messaging far more tailored and potentially effective. 

Influence operations can take many shapes; most such operations are conducted by governments on their own citizens. Typically they seek to deflect criticism and cast the ruling party in a positive light; they can also advocate for or against policies, or attempt to shift opinion about allies or rivals. Other times, as in the Russia case, influence operations seek to destabilize adversaries.

And even in the best of times, they can be hard to detect. As the authors note, “Identifying these inauthentic accounts often relies on subtle cues: a misused idiom, a repeated grammatical error, or even the use of a backtick (`) where an authentic speaker would use an apostrophe (‘).”

That will likely attract more kinds of adversaries to consider using AI to wage influence operations

In the coming months and years, tools like OpenAI’s ChatGPT are expected to become more widely available, better at their tasks, and cheaper to use. The paper’s authors say that will likely attract more kinds of adversaries to consider using AI to wage influence operations, starting with state-level actors but soon trickling down to wealthy people and eventually average citizens.

What kind of AI attacks are they on the lookout for? Here are some proposed by the authors:

  • Automated “spear phishing” campaigns, personalized with AI, designed to get you to reveal confidential information.
  • Deepfakes for attacking your reputation.
  • Deploying bots to social networks to make personalized threads and apply social pressure.
  • Using AI to generate false and misleading claims, and refining the AI to become more effective over time based on which falsehoods generate most engagement on social platforms.

The worst case scenario might look something like what the researcher Aviv Ovadya has called the infocalypse: an internet where ubiquitous synthetic media reduces societal trust to near zero, as no one is ever sure who created what they are looking at or why.

So what to do about it? There are four places that various parties can intervene, the authors say.

We can regulate how AI models are designed, and who has access to the hardware necessary to build them. We can regulate who gets access to the models. Platforms can develop tools to identify AI influence operations and stop their spread. And industry, civil society, and journalists can promote media literacy and build counter-AI tools that attempt to identify AI text.

All of these solutions come with important tradeoffs, which the authors detail. The paper runs to 71 pages and is worth reading in full for anyone with an interest in AI or platform integrity.

One because it’s so straightforward and necessary; the other because it kind of blew my mind

But I want to highlight two proposed solutions to the next generation of influence operations: one because it’s so straightforward and necessary; the other because it kind of blew my mind.

Start with the straightforward. There’s one respect in which 2022 influence operations look identical to 2016: the bad guys still need platforms to spread their message. And platforms have become much more sophisticated since then at identifying and removing influence networks.

AI introduces some difficult new policy questions for social platforms, though. They’re unlikely to ban posting text generated by AI, since there are so many valid and creative uses for it. But there are trickier questions, too, the paper’s authors note: “Should posts determined to have been authored by an AI be flagged? If platforms know that certain external sites host AI-generated content—especially content of a political nature—without disclosing it as such, might that be in itself sufficient grounds to block links to those sites?”

If I worked on platform integrity, I’d start a working group to begin talking about these questions now.

But as the authors note, it’s often difficult or even impossible to identify text that has been generated by an AI. (This seems to be especially true with shorter texts.) And so the authors suggest that platforms like Facebook collaborate with AI developers like OpenAI. Imagine that OpenAI stored every output from ChatGPT for some period of time, and allowed Facebook to flag suspected inauthentic content and check it against OpenAI’s database.

“This type of collaboration could have follow-on benefits,” the authors write. “Once an AI company ascertains that a user is reposting outputs to social media, they can work with platforms to determine if other content generated by that user has been reposted to other social media platforms, potentially catching other coordinated inauthentic accounts that the platforms may initially have missed.”

That seems smart to me — and I hope conversations like this are already taking place.

But we’re going to need more tools to understand where text has been generated, the authors write. And that brings us to my favorite term of the year so far: “radioactive data.”

In the subfield of computer vision, researchers at Meta have demonstrated that images produced by AI models can be identified as AI-generated if they are trained on “radioactive data”—that is, images that have been imperceptibly altered to slightly distort the training process. This detection is possible even when as little as 1% of a model’s training data is radioactive and even when the visual outputs of the model look virtually identical to normal images. It may be possible to build language models that produce more detectable outputs by similarly training them on radioactive data; however, this possibility has not been extensively explored, and the approach may ultimately not work.

No one is sure exactly how (or if) this would work; it’s much easier to alter an image imperceptibly than it is text. But the basic idea would be to “require proliferators to engage in secretive posting of large amounts of content online,” they write, in hopes that models trained on it would produce text that could be traced back to those “radioactive” posts.

This nuke-the-web plan “raises strong ethical concerns”

If by now you’re thinking “that’s bonkers,” you’re not alone. Among other things, the authors note, this nuke-the-web plan “raises strong ethical concerns regarding the authority of any government or company to deliberately reshape the internet so drastically.” And even if someone did go to those lengths, they write, “it is unclear whether this retraining would result in more detectable outputs, and thus detectable influence operations.”

Still, the time to be having these conversations is now. Even if you think the ultimate threat posed to society by disinformation was overstated over the past half-decade or so, there’s no guarantee that the AI-powered version of it won’t pose a real threat. Hopefully we won’t have to make the web “radioactive” to save it. But as this paper makes clear, some heavy-handed measures might very well prove necessary.

TAGS

COMMENTS

Wordpress (0)
error: Content is protected!!