How do you get an AI model to confess what’s inside?
Since ChatGPT became an instant hit roughly two years ago, tech companies around the world have rushed to release AI products while the public is still in awe of AI’s seemingly radical potential to enhance their daily lives.
But at the same time, governments around the world have warned that it can be hard to predict how the rapid popularization of AI could harm society. Novel uses could suddenly debut and displace workers, fuel disinformation, stifle competition, or threaten national security—and those are just some of the obvious potential harms.
While governments scramble to establish systems to detect harmful applications—ideally before AI models are deployed—some of the earliest lawsuits over ChatGPT show just how hard it is for the public to crack open an AI model and find evidence of harms once a model is released into the wild. That task is seemingly only made harder by an increasingly thirsty AI industry intent on shielding models from competitors to maximize profits from emerging capabilities.
The less the public knows, the harder—and seemingly more expensive—it is to hold companies accountable for irresponsible AI releases. This fall, ChatGPT-maker OpenAI was even accused of trying to profit off discovery by seeking to charge litigants retail prices to inspect AI models alleged to be causing harms.
In a lawsuit raised by The New York Times over copyright concerns, OpenAI suggested the same model inspection protocol used in a similar lawsuit raised by book authors.
Under that protocol, the NYT could hire an expert to review highly confidential OpenAI technical materials “on a secure computer in a secured room without Internet access or network access to other computers at a secure location” of OpenAI’s choosing. In this closed-off arena, the expert would have limited time and limited queries to try to get the AI model to confess what’s inside.
The NYT seemingly had few concerns about the actual inspection process but balked at OpenAI’s proposed protocol, which capped the queries its expert could make through an application programming interface (API) at $15,000 worth of retail credits. Once litigants hit that cap, OpenAI suggested that the parties split the costs of the remaining queries, charging the NYT and co-plaintiffs half-retail prices to finish the rest of their discovery.
In September, the NYT told the court that the parties had reached an “impasse” over this protocol, alleging that “OpenAI seeks to hide its infringement by professing an undue—yet unquantified—’expense.'” According to the NYT, plaintiffs would need $800,000 worth of retail credits to seek the evidence they need to prove their case, but there’s allegedly no way it would actually cost OpenAI that much.
“OpenAI has refused to state what its actual costs would be, and instead improperly focuses on what it charges its customers for retail services as part of its (for profit) business,” the NYT claimed in a court filing.
In its defense, OpenAI has said that setting the initial cap is necessary to reduce the burden on OpenAI and prevent an NYT fishing expedition. The ChatGPT maker alleged that plaintiffs “are requesting hundreds of thousands of dollars of credits to run an arbitrary and unsubstantiated—and likely unnecessary—number of searches on OpenAI’s models, all at OpenAI’s expense.”
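For a sense of the arithmetic behind those dueling dollar figures, here is a rough Python sketch of how per-query API charges accumulate against a credit cap. The per-token prices and token counts below are illustrative assumptions, not OpenAI’s actual rates or the parties’ actual usage:

```python
# Rough illustration of how retail API pricing eats into an inspection budget.
# All prices and token counts are hypothetical placeholders, not actual figures
# from the case or from OpenAI's price list.

PRICE_PER_M_INPUT_TOKENS = 10.00   # assumed retail price, USD per million input tokens
PRICE_PER_M_OUTPUT_TOKENS = 30.00  # assumed retail price, USD per million output tokens

INPUT_TOKENS_PER_QUERY = 2_000     # assumed prompt size (e.g., a long article excerpt)
OUTPUT_TOKENS_PER_QUERY = 1_000    # assumed completion size


def cost_per_query() -> float:
    """Retail cost of a single inspection query under the assumptions above."""
    return (INPUT_TOKENS_PER_QUERY * PRICE_PER_M_INPUT_TOKENS
            + OUTPUT_TOKENS_PER_QUERY * PRICE_PER_M_OUTPUT_TOKENS) / 1_000_000


def queries_within_budget(budget_usd: float) -> int:
    """How many such queries fit inside a given credit cap."""
    return int(budget_usd // cost_per_query())


if __name__ == "__main__":
    print(f"Cost per query: ${cost_per_query():.4f}")
    print(f"Queries within a $15,000 cap: {queries_within_budget(15_000):,}")
    print(f"Queries within an $800,000 request: {queries_within_budget(800_000):,}")
```

Under those placeholder numbers, a single query costs about a nickel at retail, so the dispute is less about any one search than about how many searches an expert needs—and whether retail pricing reflects what those searches actually cost OpenAI to run.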
How this court debate resolves could have implications for future cases where the public seeks to inspect models alleged to cause harms. If the court agrees that OpenAI can charge retail prices for model inspection, it could deter lawsuits from any plaintiffs who can’t afford to pay an AI expert or commercial prices for model inspection.
Lucas Hansen, co-founder of CivAI—a company that seeks to enhance public awareness of what AI can actually do—told Ars that a lot of inspection can probably be done on public models. But public models are often fine-tuned, perhaps censoring certain queries and making it harder to find information that a model was trained on—which is precisely what the NYT is trying to do. By gaining API access to the original models instead, litigants could have an easier time finding evidence to prove alleged harms.
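As a rough illustration of what that kind of API-based evidence gathering can look like, the sketch below prompts a model with the opening of an article and measures how closely its continuation matches the original text. It uses the OpenAI Python client for concreteness, but the model name, sample text, and interpretation of the score are assumptions for illustration—not the inspection protocol actually proposed in the case:

```python
# Minimal sketch of a memorization probe: feed a model the start of an article
# and check how closely its continuation tracks the real text.
# The model name, sample text, and any threshold are illustrative assumptions only.
from difflib import SequenceMatcher

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def probe_memorization(prefix: str, original_continuation: str,
                       model: str = "gpt-4o") -> float:
    """Return a 0-1 similarity score between the model's continuation and the original."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,      # make the output as deterministic as possible
        max_tokens=200,     # cap the length of the continuation
        messages=[{
            "role": "user",
            "content": f"Continue this article verbatim:\n\n{prefix}",
        }],
    )
    completion = response.choices[0].message.content or ""
    return SequenceMatcher(None, completion, original_continuation).ratio()


if __name__ == "__main__":
    # Hypothetical inputs; a real inspection would loop over many works.
    prefix = "The first few hundred words of a hypothetical article go here..."
    original = "...followed by the next few hundred words of the same article."
    score = probe_memorization(prefix, original)
    print(f"Similarity to original continuation: {score:.2f}")
```

A high similarity score would suggest the model is reproducing memorized training text, though whether any given score amounts to legal evidence of infringement is a question for the court, not the script.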
It’s unclear exactly what it costs OpenAI to provide that level of access. Hansen told Ars that the cost of training and experimenting with models “dwarfs” the cost of running them to provide full-capability solutions. Developers have noted in forums that the costs of API queries quickly add up, with one claiming OpenAI’s pricing is “killing the motivation to work with the APIs.”
The NYT’s lawyers and OpenAI declined to comment on the ongoing litigation.
US hurdles for AI safety testing
Of course, OpenAI is not the only AI company facing lawsuits over popular products. Artists have sued makers of image generators for allegedly threatening their livelihoods, and several chatbots have been accused of defamation. Other emerging harms include very visible examples—like explicit AI deepfakes, harming everyone from celebrities like Taylor Swift to middle schoolers—as well as underreported harms, like allegedly biased HR software.
A recent Gallup survey suggests that Americans are more trusting of AI than ever but still twice as likely to believe AI does “more harm than good” as to believe the benefits outweigh the harms. Hansen’s CivAI creates demos and interactive software for education campaigns that help the public understand firsthand the real dangers of AI. He told Ars that while it’s hard for outsiders to trust a study from “some random organization doing really technical work” to expose harms, CivAI provides a controlled way for people to see for themselves how AI systems can be misused.
“It’s easier for people to trust the results, because they can do it themselves,” Hansen told Ars.
Hansen also advises lawmakers grappling with AI risks. In February, CivAI joined the Artificial Intelligence Safety Institute Consortium—a group including Fortune 500 companies, government agencies, nonprofits, and academic research teams that helps advise the US AI Safety Institute (AISI). But so far, Hansen said, CivAI has not been very active in that consortium beyond scheduling a talk to share demos.
The AISI is supposed to protect the US from risky AI models by conducting safety testing to detect harms before models are deployed. Testing should “address risks to human rights, civil rights, and civil liberties, such as those related to privacy, discrimination and bias, freedom of expression, and the safety of individuals and groups,” President Joe Biden said in a national security memo last month, stressing that safety testing is critical to supporting unrivaled AI innovation.
“For the United States to benefit maximally from AI, Americans must know when they can trust systems to perform safely and reliably,” Biden said.
But the AISI’s safety testing is voluntary, and while companies like OpenAI and Anthropic have agreed to the voluntary testing, not every company has. Hansen is worried that AISI is under-resourced and under-budgeted to achieve its broad goals of safeguarding America from untold AI harms.
“The AI Safety Institute predicted that they’ll need about $50 million in funding, and that was before the National Security memo, and it does not seem like they’re going to be getting that at all,” Hansen told Ars.
Biden had $50 million budgeted for AISI in 2025, but Donald Trump has threatened to dismantle Biden’s AI safety plan upon taking office.
The AISI was probably never going to be funded well enough to detect and deter all AI harms, but with its future unclear, even the limited safety testing the US had planned could be stalled at a time when the AI industry continues moving full speed ahead.
That could largely leave the public at the mercy of AI companies’ internal safety testing. As frontier models from big companies will likely remain under society’s microscope, OpenAI has promised to increase investments in safety testing and help establish industry-leading safety standards.
According to OpenAI, that effort includes making models safer over time, less prone to producing harmful outputs, even with jailbreaks. But OpenAI has a lot of work to do in that area, as Hansen told Ars that he has a “standard jailbreak” for OpenAI’s most popular release, ChatGPT, “that almost always works” to produce harmful outputs.
The AISI did not respond to Ars’ request to comment.
NYT “nowhere near done” inspecting OpenAI models
For the public, who often become guinea pigs when AI acts unpredictably, risks remain, as the NYT case suggests that the costs of fighting AI companies could go up while technical hiccups could delay resolutions. Last week, an OpenAI filing showed that NYT’s attempts to inspect pre-training data in a “very, very tightly controlled environment” like the one recommended for model inspection were allegedly continuously disrupted.
“The process has not gone smoothly, and they are running into a variety of obstacles to, and obstructions of, their review,” the court filing describing NYT’s position said. “These severe and repeated technical issues have made it impossible to effectively and efficiently search across OpenAI’s training datasets in order to ascertain the full scope of OpenAI’s infringement. In the first week of the inspection alone, Plaintiffs experienced nearly a dozen disruptions to the inspection environment, which resulted in many hours when News Plaintiffs had no access to the training datasets and no ability to run continuous searches.”
OpenAI was additionally accused of refusing to install software the litigants needed and randomly shutting down ongoing searches. Frustrated after more than 27 days of inspecting data and getting “nowhere near done,” the NYT keeps pushing the court to order OpenAI to provide the data instead. In response, OpenAI said plaintiffs’ concerns were either “resolved” or discussions remained “ongoing,” suggesting there was no need for the court to intervene.
So far, the NYT claims that it has found millions of plaintiffs’ works in the ChatGPT pre-training data but has been unable to confirm the full extent of the alleged infringement due to the technical difficulties. Meanwhile, costs keep accruing in every direction.
“While News Plaintiffs continue to bear the burden and expense of examining the training datasets, their requests with respect to the inspection environment would be significantly reduced if OpenAI admitted that they trained their models on all, or the vast majority, of News Plaintiffs’ copyrighted content,” the court filing said.