GPT-4o and Sonnet-3.5 fail simple vision tests: are VLMs actually "blind"?
The Downcodes editor takes a look at the "truth" about visual language models (VLMs). Although VLMs are claimed to "understand" images, they perform poorly on simple visual tests, with an average accuracy of only 56.20%. Even the be
2024-12-08