Researchers at the University of Maryland have developed a new attack technique called BEAST that can induce harmful responses from large language models (LLMs) within one minute, with a success rate of 89%. The attack runs on relatively modest hardware, a single Nvidia RTX A6000 GPU with 48GB of memory, and can target commercial models such as OpenAI's GPT-4 without requiring full access to the model's internals. The speed and efficiency of BEAST show how readily an LLM's guardrails can be bypassed, highlighting serious security vulnerabilities in current models and posing a real challenge for AI safety.
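For readers who want a concrete sense of how a fast, gradient-free jailbreak of this kind can work, the sketch below outlines a beam-search-style adversarial suffix search, the general family of method BEAST is reported to use: candidate suffix tokens are sampled from the model's own next-token distribution and kept if they raise the likelihood of a targeted (e.g. affirmative) response. The model name, scoring function, and hyperparameters here are illustrative assumptions, not the authors' implementation, and running it requires a GPU comparable to the one described above.

```python
# Hypothetical sketch of a beam-search-style, gradient-free adversarial suffix
# search (the general family of attack BEAST is reported to belong to).
# Model name, scoring, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "lmsys/vicuna-7b-v1.5"  # assumption: any open causal LM could be used
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16).to(device)
model.eval()


def target_log_prob(prompt_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Log-probability the model assigns to a target continuation
    (e.g. an affirmative response) given the current adversarial prompt."""
    input_ids = torch.cat([prompt_ids, target_ids]).unsqueeze(0).to(device)
    with torch.no_grad():
        logits = model(input_ids).logits[0]
    log_probs = torch.log_softmax(logits.float(), dim=-1)
    offset = prompt_ids.shape[-1]
    total = 0.0
    for i, tok in enumerate(target_ids):
        # logits at position (offset + i - 1) predict the token at (offset + i)
        total += log_probs[offset + i - 1, tok].item()
    return total


def beast_style_search(prompt: str, target: str, beam_width: int = 5,
                       candidates_per_beam: int = 10, suffix_len: int = 20):
    """Greedy beam search that appends adversarial suffix tokens, sampled from
    the model's own next-token distribution, to maximize the target's likelihood."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids[0]
    target_ids = tokenizer(target, add_special_tokens=False, return_tensors="pt").input_ids[0]

    beams = [(prompt_ids, target_log_prob(prompt_ids, target_ids))]
    for _ in range(suffix_len):
        expanded = []
        for ids, _ in beams:
            with torch.no_grad():
                logits = model(ids.unsqueeze(0).to(device)).logits[0, -1]
            probs = torch.softmax(logits.float(), dim=-1)
            # sample candidate suffix tokens from the model's own distribution
            cand_tokens = torch.multinomial(probs, candidates_per_beam)
            for tok in cand_tokens:
                new_ids = torch.cat([ids, tok.view(1).cpu()])
                expanded.append((new_ids, target_log_prob(new_ids, target_ids)))
        # keep only the highest-scoring beams for the next round
        expanded.sort(key=lambda x: x[1], reverse=True)
        beams = expanded[:beam_width]

    best_ids, best_score = beams[0]
    return tokenizer.decode(best_ids), best_score
```

Because the search only reads the model's output probabilities and never computes gradients, each step is cheap, which is consistent with the reported ability to find working adversarial prompts within roughly one GPU-minute.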
The emergence of BEAST has sounded an alarm for AI security: security research on large language models needs to be strengthened, and more effective defense mechanisms must be developed to counter such threats and keep the development of artificial intelligence on a healthy footing. Going forward, stronger defense techniques and stricter security standards will be key to the field's development.