OpenAI’s claim that NYT “hacked” ChatGPT is “irrelevant” and “false,” NYT says.
Late Monday, The New York Times responded to OpenAI’s claims that the newspaper “hacked” ChatGPT to “set up” a lawsuit against the leading AI company.
“OpenAI is wrong,” The Times repeatedly argued in a court filing opposing OpenAI’s motion to dismiss the NYT’s lawsuit accusing OpenAI and Microsoft of copyright infringement. “OpenAI’s attention-grabbing claim that The Times ‘hacked’ its products is as irrelevant as it is false.”
OpenAI had argued that NYT allegedly made “tens of thousands of attempts to generate” supposedly “highly anomalous results” showing that ChatGPT would produce excerpts of NYT articles. The NYT’s allegedly deceptive prompts—such as repeatedly asking ChatGPT, “what’s the next sentence?”—targeted “two uncommon and unintended phenomena” from both its developer tools and ChatGPT: training data regurgitation and model hallucination. OpenAI considers both “a bug” that the company says it intends to fix. OpenAI claimed no ordinary user would use ChatGPT this way.
But while defending tactics used to prompt ChatGPT to spout memorized training data—including more than 100 NYT articles—NYT pointed to ChatGPT users who have frequently used the tool to generate entire articles to bypass paywalls.
According to the filing, NYT still has no idea how many of its articles were used to train GPT-3 and OpenAI’s subsequent AI models, or which specific articles were used, because OpenAI has “not publicly disclosed the makeup of the datasets used to train” its AI models. Rather than setting up a lawsuit, NYT said it was prompting ChatGPT to gather evidence in an attempt to track the full extent of the tool’s copyright infringement.
To figure out if ChatGPT was infringing its copyrights on certain articles, NYT “elicited examples of memorization by prompting GPT-4 with the first few words or sentences of Times articles,” the court filing said.
OpenAI had tried to argue that “in the real world, people do not use ChatGPT or any other OpenAI product” to generate precise text from articles behind paywalls. But the “use of ChatGPT to bypass paywalls” is “widely reported,” NYT argued.
“In OpenAI’s telling, The Times engaged in wrongdoing by detecting OpenAI’s theft of The Times’s own copyrighted content,” NYT’s court filing said. “OpenAI’s true grievance is not about how The Times conducted its investigation, but instead what that investigation exposed: that Defendants built their products by copying The Times’s content on an unprecedented scale—a fact that OpenAI does not, and cannot, dispute.”
NYT declined Ars’ request to comment. OpenAI did not immediately respond to Ars’ request to comment.
ChatGPT users bypassing paywalls
According to the NYT’s court filing, ChatGPT outputs initially only infringed copyright by “showing copies and/or derivatives of Times works that were copied to build the model.” But then, in May 2023, a “Browse By Bing” plug-in was introduced to ChatGPT that “enabled ChatGPT to retrieve content beyond what was included in the underlying model’s training dataset,” infringing copyright by “showing synthetic search results that paraphrase Times works retrieved and copied in response to user search queries in real time.”
This feature enabled ChatGPT users to bypass paywalls and access more recent content from outlets like NYT, which caused OpenAI to temporarily disable “Browse By Bing” last July.
“We’ve learned that the browsing beta can occasionally display content in ways we don’t want, e.g. if a user specifically asks for a URL’s full text, it may inadvertently fulfill this request,” OpenAI’s help page said. “We are temporarily disabling Browse while we fix this.”
OpenAI’s decision to disable this feature riled some users who were using ChatGPT to bypass paywalls. In a ChatGPT subreddit, thousands took notice of a post calling attention to the unintended feature, commenting “Wow, so useful!” and joking, “Enjoy it while it lasts.”
On an OpenAI community page, one paid ChatGPT user complained that OpenAI is “working against the paid users of ChatGPT Plus. This time they’re taking away Browsing, because it reads the content of a site that the user asks for? Please, that’s what I pay for Plus for.”
“I know it’s no use complaining, because OpenAI is going to increasingly ‘castrate’ ChatGPT 4,” the ChatGPT user continued, “but there’s my rant.”
NYT argued that public reports of users turning to ChatGPT to bypass paywalls “contradict OpenAI’s contention that its products have not been used to serve up paywall-protected content, underscoring the need for discovery” in the lawsuit, rather than dismissal.
NYT: OpenAI ignored infringing content
But perhaps most damaging to OpenAI’s defense against direct copyright infringement, NYT contacted OpenAI in April 2023—just before OpenAI rolled out ChatGPT’s “Browse By Bing” plug-in—“to inform them that their tools infringed its copyrighted works,” the court filing said.
Rather than acknowledge this attempt to inform OpenAI of allegedly infringing content, “OpenAI ignores these allegations” in its motion to dismiss, NYT argued, “claiming instead that The Times only alleges ‘generalized knowledge of the possibility of infringement.'”
To support its direct infringement claim against OpenAI, NYT cited the landmark intellectual property case where Napster was held liable for infringement by its users, which found that “if a computer system operator learns of specific infringing material available on his system [i.e., copyrighted works] and fails to purge such material from the system, the operator knows of and contributes to direct infringement.”
“This is precisely what happened here, where The Times informed OpenAI that its models were generating infringing outputs of Times works,” NYT argued.
NYT is hoping that a US district court in New York will deny OpenAI’s motion to dismiss “in its entirety,” claiming that all of OpenAI’s arguments fail, or else allow any dismissed claims to be amended.
Currently, OpenAI is facing several lawsuits accusing the AI maker of infringing copyrights both when training AI tools and when producing ChatGPT outputs. The AI giant maintains that it expects to defeat all claims, seemingly building its defense in each subsequent case on arguments that succeeded in prior litigation.
For example, OpenAI has argued that NYT’s suit involves “an identical set of allegations” as book authors whose claims were largely rejected last month. But NYT argued that its case substantially differs from suits raised by book authors, which a judge said lacked sufficient evidence to support some claims.
Unlike book authors, who alleged that ChatGPT removed copyright management information (CMI) without providing examples of infringing outputs, the NYT provided receipts, NYT argued. And while book authors “focused their claim on how the removal of CMI could induce third parties to infringe,” NYT is arguing that “removal of CMI ‘facilitates’ or ‘conceals’ OpenAI’s own infringement.”
NYT wants a court to not only award damages for profits lost due to ChatGPT’s alleged infringement but also to order a permanent injunction stopping ChatGPT from infringing. A win for NYT could force OpenAI to wipe ChatGPT and start over. That could perhaps spur OpenAI to build a new AI model based on licensed content—since OpenAI said earlier this year it would be “impossible” to create useful AI models without copyrighted content—which would ensure publishers like NYT always get paid for training data.