
OpenAI says it has found evidence that Chinese artificial intelligence start-up DeepSeek used the US company’s proprietary models to train its own open-source competitor, as concerns grow over a potential breach of intellectual property.

The San Francisco-based ChatGPT maker told the Financial Times it had seen some evidence of “distillation”, which it suspects to be from DeepSeek.

The technique is used by developers to obtain better performance on smaller models by using outputs from larger, more capable ones, allowing them to achieve similar results on specific tasks at a much lower cost.
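As a rough illustration of the idea only, and not of how OpenAI, DeepSeek or any other lab actually trains its systems, the toy sketch below has a small "student" model learn purely from the outputs of a "teacher". Everything in it is hypothetical: the teacher is a fixed formula standing in for a large model, and the names `teacher`, `student` and `distill` are invented for this example.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def teacher(x):
    # Hypothetical stand-in for a large, expensive model: a fixed rule
    # that returns a probability for input x.
    return sigmoid(3.0 * x - 1.0)


def student(x, w, b):
    # The small model: one weight and one bias, both learned.
    return sigmoid(w * x + b)


def distill(num_samples=200, epochs=2000, lr=1.0, seed=0):
    """Fit the student using only the teacher's outputs as training targets."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(num_samples)]
    # Step 1: query the teacher and record its "soft" answers.
    soft_labels = [teacher(x) for x in xs]
    # Step 2: train the student to reproduce those answers by gradient
    # descent on the cross-entropy between student and teacher outputs.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, target in zip(xs, soft_labels):
            error = student(x, w, b) - target
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / num_samples
        b -= lr * grad_b / num_samples
    return w, b


if __name__ == "__main__":
    w, b = distill()
    for x in (-1.0, 0.0, 1.0):
        print(x, round(teacher(x), 3), round(student(x, w, b), 3))
```

The student never sees how the teacher works internally. It sees only the teacher's answers, which is why the practice is hard to detect or prevent when a model is offered through a public interface.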

Distillation is a common practice in the industry, but the concern is that DeepSeek may be doing it to build its own rival model, which would breach OpenAI’s terms of service.

“The issue is when you [take it out of the platform and] are doing it to create your own model for your own purposes,” said one person close to OpenAI.

OpenAI declined to comment further or provide details of its evidence. Its terms of service state users cannot “copy” any of its services or “use output to develop models that compete with OpenAI”.

DeepSeek’s release of its R1 reasoning model has surprised markets, as well as investors and technology companies in Silicon Valley. Its built-on-a-shoestring models have attained high rankings and results comparable to those of leading US models.

Shares in Nvidia fell 17 per cent on Monday, wiping $589bn off its market value, on fears that big investments in its expensive AI hardware might not be needed. They recovered by 9 per cent on Tuesday, along with other tech stocks.

OpenAI and its partner Microsoft investigated accounts believed to be DeepSeek’s last year that were using OpenAI’s application programming interface, or API, and blocked their access on suspicion of distillation that violated the terms of service, another person with direct knowledge added. These investigations were first reported by Bloomberg.

Microsoft declined to comment and OpenAI did not immediately respond to a request for comment on this detail. DeepSeek did not respond to a request for comment. China is shut for the lunar new year holiday.

Earlier, President Donald Trump’s AI and crypto tsar David Sacks said “it is possible” that IP theft had occurred.

“There’s a technique in AI called distillation . . . when one model learns from another model [and] kind of sucks the knowledge out of the parent model,” Sacks told Fox News on Tuesday.

“And there’s substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI models, and I don’t think OpenAI is very happy about this,” Sacks added, although he did not provide evidence.

DeepSeek said it used just 2,048 Nvidia H800 graphics cards and spent $5.6mn to train its V3 model with 671bn parameters, a fraction of what OpenAI and Google spent to train comparably sized models. Some experts said the model generated responses that indicated it had been trained on outputs from OpenAI’s GPT-4, which would violate OpenAI’s terms of service.

Industry insiders say that it is common practice for AI labs in China and the US to use outputs from companies such as OpenAI, which have invested in hiring people to teach their models how to produce responses that sound more human. This is expensive and labour-intensive, and smaller players often piggyback off this work, say the insiders.

“It is a very common practice for start-ups and academics to use outputs from human-aligned commercial LLMs, like ChatGPT, to train another model,” said Ritwik Gupta, a PhD candidate in AI at the University of California, Berkeley.

“That means you get this human feedback step for free. It is not surprising to me that DeepSeek supposedly would be doing the same. If they were, stopping this practice precisely may be difficult,” he added.

The practice highlights the difficulty for companies keen to protect their technical edge. “We know [China]-based companies — and others — are constantly trying to distil the models of leading US AI companies,” OpenAI said in its latest statement.

It added: “We engage in countermeasures to protect our IP, including a careful process for which frontier capabilities to include in released models, and believe . . . it is critically important that we are working closely with the US government to best protect the most capable models from efforts by adversaries and competitors to take US technology.”

OpenAI is battling allegations of its own copyright infringement from newspapers and content creators, including lawsuits from The New York Times and prominent authors, who accuse the company of training its models on their articles and books without permission.

