Anthropic has announced an ambitious plan to fund the development of new AI model benchmarks. The program aims to improve how AI model performance and impact are assessed, with particular attention to AI safety and societal impact. The initiative is meant to address shortcomings in existing benchmarks, such as their failure to capture real-world usage scenarios and to measure what they claim to measure. Anthropic is calling for more challenging benchmarks that evaluate model capabilities in areas such as cyberattacks, weapons development, and information manipulation, as well as evaluations that explore AI's potential in scientific research, multilingual communication, and other fields. Such benchmarks would provide a more comprehensive picture of AI's capabilities and risks and advance the field of AI safety.
Anthropic will provide financial support to third-party organizations to encourage the development of more effective evaluation methods. The program reflects Anthropic's stated commitment to strengthening the broader AI safety field and to making comprehensive AI evaluation an industry standard. However, given Anthropic's own commercial interests, the objectivity and impartiality of the plan warrant scrutiny. Ultimately, the program's success will depend on the level of funding and staffing it receives, as well as the degree of cooperation from other relevant organizations.