Large vision-language models (LVLMs) have made significant progress in image understanding, but hallucination has become a bottleneck in their development. To address this problem, the Future Life Laboratory team at Taotian Group proposed a new method called Token Preference Optimization (TPO), which introduces a self-calibrated visual-anchored reward mechanism to strengthen the model's reliance on visual information and thereby reduce the probability of hallucination. The core of TPO is the automatic generation of token-level reward signals: it avoids tedious manual annotation by assigning each token a reward that reflects its dependence on visual information, improving model performance.
TPO's key innovation is its automated token-level reward signal. The method automatically identifies visually anchored tokens in preference data, avoiding tedious fine-grained manual annotation, and during training assigns each token a reward that reflects its dependence on visual information. This self-calibrated, visually anchored reward is designed to optimize the model's reliance on visual evidence, effectively mitigating hallucinations.
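The article does not give the reward formula, but the idea of a self-calibrated, visually anchored token reward can be sketched as follows. This is a minimal illustration, not TPO's actual implementation: the function `visual_anchor_rewards` and its sigmoid calibration are assumptions, based on the intuition that a token depends on the image if its log-probability drops when the image is corrupted.

```python
import numpy as np

def visual_anchor_rewards(logp_clean_image, logp_noised_image):
    """Hypothetical token-level reward: tokens whose log-probability drops
    sharply when the input image is corrupted depend more on visual
    evidence, so they receive a larger reward."""
    gap = np.asarray(logp_clean_image) - np.asarray(logp_noised_image)
    # Self-calibration sketch: squash the logit gap into (0, 1) so that
    # more visually dependent tokens get weights closer to 1.
    return 1.0 / (1.0 + np.exp(-gap))

# Toy example: per-token log-probs under the clean vs. the noised image.
clean  = [-0.2, -1.0, -0.3]
noised = [-0.2, -3.0, -0.4]   # the second token collapses without the image
rewards = visual_anchor_rewards(clean, noised)
```

In this toy run the second token, whose probability collapses once the image is noised, receives the largest reward, which is the behavior the article describes: rewarding tokens in proportion to their dependence on visual information.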
Experiments show that models trained with TPO significantly outperform traditional methods on multiple evaluation benchmarks. The gains are largest on more complex tasks, where the generated answers increasingly rely on image content rather than the language model's prior knowledge. This progress not only improves the model's understanding but also provides a basis for further research.
In addition, the team ran ablation experiments on TPO's parameter settings and found that tuning the number of noise-addition steps and the reward-distribution strategy can further improve performance, pointing to directions for future research on and application of large vision-language models.
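The "noise-addition steps" parameter in the ablation can be pictured as progressively corrupting the image used for the reward comparison. The sketch below is a hypothetical illustration under assumed design choices (the function name and the simple additive Gaussian schedule are not taken from the paper):

```python
import numpy as np

def add_noise_steps(image, num_steps, sigma=0.1, seed=0):
    """Hypothetical progressive noising: each step adds Gaussian noise,
    so more steps yield a more heavily corrupted view of the image."""
    rng = np.random.default_rng(seed)
    noised = image.astype(float).copy()
    for _ in range(num_steps):
        noised += rng.normal(0.0, sigma, size=image.shape)
    return noised

# More steps -> stronger corruption of the visual input.
img = np.zeros((4, 4))
light = add_noise_steps(img, num_steps=1)
heavy = add_noise_steps(img, num_steps=50)
```

The number of steps then becomes a tunable knob, matching the article's observation that the choice of noising schedule affects the quality of the resulting token rewards.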
In short, Taotian's work offers a new approach to multimodal alignment and advances the application of AI in everyday life and consumer scenarios. By mitigating the hallucination problem of large vision-language models, TPO improves their reliability and accuracy, lays a foundation for applying AI in real-world settings, and contributes to the further development of multimodal technology.