Dark Side of the Moon Kimi multi-modal image understanding model API released - AI article

Author：Eve Cole Update Time：2025-01-28 16:32:01

Beijing Dark Side of the Moon Technology Co., Ltd. has released a new multi-modal image understanding model moonshot-v1-vision-preview. This model is an extension of the moonshot-v1 series and significantly improves Kimi's ability to understand image information. The Vision model has powerful image recognition capabilities and can accurately distinguish subtle differences, such as easily distinguishing very similar pictures of blueberry muffins and Chihuahuas. In addition, it also performs well in OCR text recognition, and can accurately recognize various documents including scrawled handwriting, such as receipts and express delivery orders. This model supports a variety of features, such as multi-round dialogue, streaming output, etc., providing users with a more convenient and efficient experience.

On January 15, 2025, Beijing Dark Side of the Moon Technology Co., Ltd. announced the official release of the new multi-modal image understanding model moonshot-v1-vision-preview. This model improves the multi-modal capabilities of the moonshot-v1 model series and helps Kimi Understand the world better.

The Vision model has powerful image recognition capabilities and can accurately identify complex details and nuances in images, whether it is food or animals, and can distinguish similar but not identical objects. For example, faced with 16 similar pictures of blueberry muffins and Chihuahuas that are difficult for the human eye to distinguish, the Vision model can accurately distinguish and identify them.

The Vision model also has the country's leading advanced image recognition capabilities and performs well in OCR text recognition and image understanding scenarios. It is more accurate than ordinary document scanning and OCR recognition software, and can recognize scrawled handwritten content such as receipts and express delivery orders.

微信截图_20250115135433.png

The Vision vision model supports multiple rounds of dialogue, streaming output, tool calling, JSON Mode, Partial Mode and other features, but it does not currently support online search. It does not support the creation of Context Cache with image content, but it supports the use of successfully created Cache calls. The Vision model does not support images in URL format and currently only supports base64-encoded image content.

Model billing

Model billing unit price moonshot-v1-8k-vision-preview1M tokens¥12.00moonshot-v1-32k-vision-preview1M tokens¥24.00moonshot-v1-128k-vision-preview1M tokens¥60.00

The release of the moonshot-v1-vision-preview model marks a new breakthrough made by Beijing Dark Side of the Moon Technology Co., Ltd. in the field of multi-modal artificial intelligence and provides a new direction for the development of image understanding technology. Its powerful performance and rich functions give it broad application prospects in many application scenarios, and it is worth looking forward to its future development and application.