The non-profit organization Epoch AI has drawn criticism because its AI mathematics benchmark, FrontierMath, received funding from OpenAI that was not disclosed in a timely manner. The incident raises questions about disclosure, conflicts of interest, and the objectivity of benchmarking, and it has attracted wide attention from academics and the public. This article reviews what happened and considers its potential impact.
Epoch AI, a non-profit that develops mathematical benchmarks for AI, announced on December 20 that OpenAI had funded FrontierMath, a benchmark designed to test the mathematical capabilities of AI systems. OpenAI also used the benchmark to showcase o3, its upcoming flagship AI model.
An Epoch AI contractor who posts on the forum LessWrong under the name "Meemi" said that many contributors to FrontierMath were unaware of OpenAI's funding until it was made public. Meemi wrote: "There is a lack of transparency in communication on this point. In my opinion, Epoch AI should have disclosed the funding from OpenAI in advance, and contractors should also know that their work may be used for capability assessment before they decide whether to participate in the benchmark's development." On social media, some users expressed concern that such secrecy could damage FrontierMath's reputation as an objective benchmark.
Beyond funding FrontierMath, OpenAI also has access to many of the benchmark's problems and solutions, which Epoch AI did not disclose before December 20. Carina Hong, a doctoral student in mathematics at Stanford University, said on social media that OpenAI received privileged access to FrontierMath through its arrangement with Epoch AI, which made some contributors unhappy. "Six mathematicians who have made significant contributions to the FrontierMath benchmark confirmed that they did not know that OpenAI would have exclusive access to this benchmark and that others would not be able to access it," Hong said. She added that, after learning of the arrangement, most of them said they might not have participated in the project had they known about it earlier.
Tamay Besiroglu, associate director of Epoch AI, said that although the organization had not been transparent enough, he believes the integrity of FrontierMath has not been compromised. He admitted that Epoch AI had made communication errors and had failed to inform contributors of OpenAI's involvement in advance. Besiroglu said that while OpenAI has access to FrontierMath, the two parties have a "verbal agreement" that OpenAI will not use the benchmark's problem set to train its AI. Epoch AI also maintains a "separate holdout set" so that FrontierMath results can be verified independently. Elliot Glazer, Epoch AI's lead mathematician, noted on Reddit that Epoch AI has not yet independently verified OpenAI's o3 results on FrontierMath. He believes OpenAI's score is credible but cannot confirm it until an independent evaluation is completed.
The Epoch AI incident highlights the importance of transparency and disclosure in research collaborations. Future AI benchmark projects will need clearer funding agreements and better communication with contributors if they are to remain objective and impartial and keep the trust of the academic community. These safeguards support the healthy development of the AI field.