Downcodes editor's report: Baichuan Intelligence and Tianjin University have jointly developed an agent framework called Sibyl System, which took first place on the leaderboard of GAIA, a benchmark launched jointly by Meta, Hugging Face and AutoGPT. GAIA evaluates how well an agent designs and executes solutions to complex tasks. Its questions are close to real-world application scenarios and are extremely challenging for AI models. The result marks a notable advance for China's AI technology in complex task processing.
GAIA, the benchmark on which Sibyl System ranked first, was proposed by Meta, Hugging Face and AutoGPT in November 2023. It mainly evaluates an agent's ability to plan and carry out complex tasks. The benchmark exposes capability gaps in existing models and points to directions for improving both models and agents.
GAIA's test questions are close to real-world problems and require an AI system to combine reasoning, multimodal understanding (text, images, audio and video), web browsing and tool use. The questions are easy for humans to understand but extremely challenging for models: in testing, GPT-4 reached a success rate of only 15%, while human respondents achieved 92%. Solving them often takes long chains of reasoning and considerable time, across multiple steps and tools.
Design features of the Sibyl System framework include:
A human-like browser interface replaces retrieval-augmented generation (RAG).
Question answering replaces dialogue: stateless question-answering functions simplify the system architecture.
Only two general-purpose tools are used, a web browser and a Python environment, which reduces dependence on specialized tools.
To move from System 1 to System 2 thinking, a "jury" mechanism is introduced: multiple agents debate to criticize and correct their own answers, drawing on information in the global workspace to improve the accuracy of responses.
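The stateless question-answering and "jury" ideas above can be illustrated with a minimal Python sketch. It is not the Sibyl System implementation: the names `QAFunction` and `jury_answer` are hypothetical, the jurors are stub functions rather than LLM calls, and a simple majority vote stands in for the multi-agent debate described in the report.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical type: a stateless QA function maps (question, context) to an
# answer and keeps no conversation history between calls.
QAFunction = Callable[[str, str], str]


def jury_answer(question: str, context: str, jurors: List[QAFunction]) -> str:
    """Ask every juror independently and return the most common answer.

    Majority voting is a simplification used here for illustration; the
    report describes a debate with self-criticism and correction.
    """
    if not jurors:
        raise ValueError("at least one juror is required")
    votes = [juror(question, context) for juror in jurors]
    answer, _count = Counter(votes).most_common(1)[0]
    return answer


# Stub jurors standing in for LLM-backed agents (assumption for the example).
def juror_a(question: str, context: str) -> str:
    return "4"


def juror_b(question: str, context: str) -> str:
    return "5"


def juror_c(question: str, context: str) -> str:
    return "4"


final = jury_answer("What is 2 + 2?", "", [juror_a, juror_b, juror_c])
```

Because each juror is a pure function of its inputs, any juror can be replaced or re-run in isolation, which is one way a stateless design can make debugging easier.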
Sibyl System is a simple but powerful agent framework built on large language models that solves complex reasoning problems with only a small number of tools. By introducing a global workspace, a multi-agent mechanism and a browser-based general channel for acquiring information, it reduces system complexity while extending the complexity of the problems it can solve, moving the model from "fast thinking" to "slow thinking". Sibyl System is also scalable and easy to debug. It can readily stand in for the agent module in applications built on other models, improving their capabilities.
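As a rough illustration of the global workspace idea, the sketch below shows a shared store that collects observations from the two general-purpose tools so that later steps can read everything gathered so far. All names (`GlobalWorkspace`, `gather`, the stub tools) are hypothetical and the tools return canned strings; the actual data structures in Sibyl System may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GlobalWorkspace:
    """Hypothetical shared store of notes that every step can read."""

    notes: List[str] = field(default_factory=list)

    def add(self, source: str, note: str) -> None:
        # Tag each note with the tool that produced it.
        self.notes.append(f"[{source}] {note}")

    def render(self) -> str:
        # Flatten the notes into one context string for a stateless QA call.
        return "\n".join(self.notes)


def gather(question: str,
           tools: Dict[str, Callable[[str], str]],
           workspace: GlobalWorkspace) -> str:
    """Run each tool on the question and record its output in the workspace."""
    for name, tool in tools.items():
        workspace.add(name, tool(question))
    return workspace.render()


# Stub tools standing in for a real web browser and Python environment.
def fake_browser(question: str) -> str:
    return f"page text about: {question}"


def fake_python(question: str) -> str:
    return "computed value: 42"


workspace = GlobalWorkspace()
context = gather("example question",
                 {"browser": fake_browser, "python": fake_python},
                 workspace)
```

The rendered `context` string is what a stateless question-answering function would receive as input, so no hidden conversation state is needed between steps.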
Technical report: https://arxiv.org/pdf/2407.10718
The success of the Sibyl System framework demonstrates the strength of Baichuan Intelligence and Tianjin University in artificial intelligence, and it offers valuable experience and a reference point for the design and development of future agent frameworks. I believe that in the near future we will see more innovative applications built on the Sibyl System framework, pushing artificial intelligence technology further forward.