The Risks of AI Training and Its Implications for Open-Source Software

The discourse surrounding artificial intelligence (AI) frequently centers on its potential implications for human safety, often influenced by popular culture references such as the “Terminator” film series. However, a less prominent yet crucial aspect pertains to the manner in which AI “learns” (AI training) and subsequently generates content. This process involves the utilization of information repositories as a source for generative AI algorithms, which then transform this information into content for users. A notable controversy arises from the potential infringement on intellectual property rights of the original information owners by these generative AI algorithms.

This legal dispute was adjudicated in a U.S. court case titled Thomson Reuters v. Ross Intelligence. Thomson Reuters, the proprietor of Westlaw, a legal research platform, initiated legal action against Ross Intelligence, a legal AI startup, for alleged infringement of its proprietary headnotes. Headnotes refer to summaries of legal rulings organized according to the Key Number System. Ross Intelligence had sought to license the data but was refused, compelling it to procure training materials from a third-party service, LegalEase. These materials, known as “Bulk Memos,” were derived from Westlaw’s headnotes, forming the foundation of Ross’s AI-driven legal research tool. 

In his deliberations, the presiding judge determined that Westlaw’s headnotes and Key Number System were sufficiently original to qualify for copyright protection, rejecting Ross’s contention that they lacked originality. The judge identified 2,243 instances in which Ross’s training data closely mirrored Westlaw’s content and dismissed Ross’s argument that AI training constituted fair use. The court further held that Ross’s use of Westlaw’s content was not sufficiently transformative: it served the same function, namely legal research, and competed directly with Westlaw, thereby adversely impacting its market position.

Thinking About Open-Source Software…

This case prompts concerns regarding the implications of artificial intelligence (AI) in the context of open-source software (OSS). It is estimated that up to 3% of existing open-source software has been automatically generated by AI. This raises questions about the ethical and legal implications of AI-generated software, particularly regarding respect for licensing terms. For instance, code licensed under the MIT license requires attribution to the original developers. If an AI system reproduces this code, or generates derivative code from it, and incorporates it into another project without attributing the original developers, the license terms are breached. Such a breach can lead to severe legal, financial, and even commercial consequences.
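To make the attribution requirement concrete, the sketch below shows a minimal check of the kind a compliance process might run: it flags source files that contain the MIT license permission text but lack a copyright notice. This is a simplified, hypothetical Python example (the marker string, file pattern, and function name are illustrative assumptions); real license scanners perform far more sophisticated matching.

```python
import re
from pathlib import Path

# The MIT license grant begins with this characteristic phrase.
MIT_MARKER = "Permission is hereby granted, free of charge"

# A compliant MIT notice also carries a copyright line, e.g. "Copyright (c) 2024 ..."
COPYRIGHT_RE = re.compile(r"copyright \(c\) \d{4}", re.IGNORECASE)

def missing_attribution(root: str) -> list[str]:
    """Return paths of .py files that embed MIT license text
    but omit the required copyright (attribution) line."""
    flagged = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        if MIT_MARKER in text and not COPYRIGHT_RE.search(text):
            flagged.append(str(path))
    return flagged
```

A heuristic like this cannot prove compliance, but it can surface candidates for human review, which is precisely the role of a periodic audit.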

For all of these reasons, companies deploying or selling AI-generated software solutions must implement strict OSS compliance checks to avoid legal exposure. One such control is periodic auditing. Fossity's innovative method assesses the degree of risk regarding licensing issues and vulnerabilities within 24 hours. Contact us here for more information.