Large language models (LLMs) hold great promise for many language-based tasks but can also produce harmful or incorrect content. Traditionally, these problems are identified and fixed through red-teaming, in which human testers craft prompts that elicit unwanted model responses. This process is expensive and time-consuming, and while recent attempts to automate it with reinforcement learning have shown promise, the prompts they generate tend to cover only a narrow slice of possible failure cases, limiting their effectiveness. Our research introduces curiosity-driven red-teaming (CRT), which uses curiosity-driven exploration to generate a broader range of test cases. CRT produces novel and diverse prompts, often exceeding the effectiveness of current methods, and can even uncover prompts that elicit toxic responses from advanced models. However, CRT depends on novelty rewards whose weight must be carefully tuned. To address this, we propose Extrinsic-Intrinsic Policy Optimization (EIPO), a reinforcement learning approach that automatically adjusts the importance of the intrinsic reward: it suppresses exploration when it is unnecessary and amplifies it when it is needed, ensuring effective exploration without manual tuning and yielding consistent performance gains across tasks. By integrating EIPO, our CRT method further strengthens automated red-teaming, offering a more robust way to test LLMs and underscoring the value of curiosity-driven exploration for LLM safety.
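To make these two ideas concrete, the sketch below illustrates, in highly simplified form, how a red-teaming reward might combine an extrinsic attack score (e.g., the toxicity of the target model's response) with an intrinsic novelty bonus, and how the intrinsic weight might be adjusted automatically instead of being hand-tuned. The function names, the cosine-similarity novelty measure, and the threshold-based weight update are illustrative assumptions for exposition, not the exact CRT or EIPO formulations.

```python
import numpy as np


def novelty_bonus(prompt_embedding, past_embeddings, eps=1e-8):
    """Intrinsic reward: higher when the new prompt differs from past prompts.

    Illustrative measure: 1 minus the maximum cosine similarity to
    previously generated prompt embeddings.
    """
    if len(past_embeddings) == 0:
        return 1.0
    p = prompt_embedding / (np.linalg.norm(prompt_embedding) + eps)
    past = np.stack(past_embeddings)
    past = past / (np.linalg.norm(past, axis=1, keepdims=True) + eps)
    max_sim = float(np.max(past @ p))
    return 1.0 - max_sim


def combined_reward(toxicity_score, prompt_embedding, past_embeddings, beta):
    """Extrinsic attack reward plus a beta-weighted curiosity (novelty) bonus."""
    return toxicity_score + beta * novelty_bonus(prompt_embedding, past_embeddings)


def update_beta(beta, extrinsic_return, best_extrinsic_return, step=0.05):
    """EIPO-style idea, heavily simplified: shrink the intrinsic weight when
    exploration is no longer improving the extrinsic objective, and grow it
    when extrinsic progress stalls."""
    if extrinsic_return >= best_extrinsic_return:
        return max(0.0, beta - step)   # exploitation is working: explore less
    return beta + step                 # progress has stalled: explore more
```

The intent of the sketch is only to show the trade-off being automated: rather than fixing the novelty weight `beta` by hand, the training loop would update it based on whether exploration is still paying off in extrinsic reward.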