Amazon researchers have released Shopping MMLU, a multi-task online shopping benchmark built on real Amazon data that aims to comprehensively evaluate the potential of large language models (LLMs) as general shopping assistants. The benchmark contains 57 tasks covering four major skills: concept understanding, knowledge reasoning, user behavior alignment, and multilingual ability. It examines whether AI assistants can understand user needs and provide precise service the way a human shopping guide would. By testing more than 20 AI models, Shopping MMLU shows that online shopping is inherently a multi-task learning problem and points out the challenges existing models face in this domain, such as overfitting caused by instruction fine-tuning and the difficulty of few-shot learning.
Machine learning has long been embedded in online services, and online shopping is one of its most successful application areas. In recent years, it has been applied to a range of online shopping tasks, such as handling user queries, modeling browsing records, analyzing reviews, and extracting product attributes. To advance these methods, many benchmarks have emerged that aim to lower the barrier for researchers and engineers to develop and evaluate new solutions on real online shopping tasks.
However, existing models and benchmarks are usually tailored to specific tasks and cannot fully capture the complexity of online shopping. With their multi-task and few-shot learning abilities, large language models (LLMs) could transform the online shopping experience by reducing task-specific engineering work and offering interactive dialogue. Despite this potential, LLMs also face challenges unique to online shopping, such as domain-specific shopping concepts, implicit knowledge, and heterogeneous user behavior.
To address these challenges, Amazon researchers proposed Shopping MMLU, a multi-task online shopping benchmark built on real Amazon data. Shopping MMLU contains 57 tasks covering four major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multilingual ability. This breadth lets it comprehensively evaluate the potential of LLMs as general shopping assistants.
Shopping MMLU is no ordinary "exam". Its 57 tasks are drawn from real Amazon shopping data. Put simply, it tests whether an AI assistant can understand your needs and help you find the products you want, the way a human shopping guide would.
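To make the "57 tasks across four skills" structure concrete, here is a minimal Python sketch of how per-task scores could be rolled up into per-skill scores and one overall number. The task names, the scores, and the choice of macro-averaging over skills are all assumptions made for illustration; they are not taken from the Shopping MMLU evaluation code, which may aggregate results differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task scores for a single model. The task names and
# numbers are invented for illustration; only the four skill names
# follow the article.
task_results = [
    {"task": "attribute_naming", "skill": "concept_understanding", "score": 0.50},
    {"task": "query_rewriting", "skill": "concept_understanding", "score": 0.70},
    {"task": "numeric_reasoning", "skill": "knowledge_reasoning", "score": 0.40},
    {"task": "next_item_prediction", "skill": "user_behavior_alignment", "score": 0.30},
    {"task": "cross_lingual_matching", "skill": "multilingual_ability", "score": 0.60},
]


def skill_averages(results):
    """Average the task scores within each skill."""
    by_skill = defaultdict(list)
    for row in results:
        by_skill[row["skill"]].append(row["score"])
    return {skill: mean(scores) for skill, scores in by_skill.items()}


def overall_average(results):
    """Macro-average over skills, so a skill with many tasks does not dominate.

    This is an assumed aggregation rule, not the benchmark's official one.
    """
    return mean(skill_averages(results).values())


if __name__ == "__main__":
    for skill, score in skill_averages(task_results).items():
        print(f"{skill}: {score:.2f}")
    print(f"overall: {overall_average(task_results):.2f}")
```

Averaging per skill first is one reasonable design choice when skills contain different numbers of tasks; a plain average over all 57 tasks would be an equally simple alternative.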
Amazon's researchers tested more than 20 existing AI models on Shopping MMLU and found the following:
Well-known AI models such as Claude 3 Sonnet and ChatGPT perform very well and sit in the top tier. Open-source models, however, are catching up and show strong momentum in challenging the established leaders.
The Shopping MMLU results also reveal an interesting phenomenon: online shopping is in fact a multi-task learning problem. In other words, an AI assistant needs to master a variety of skills at the same time to do the job well.
More surprisingly, AI models that perform well in general domains also perform well on online shopping tasks. This suggests that AI assistants can transfer general knowledge to a specific domain and pick up new skills quickly.
Of course, AI assistants are not perfect from the start. The researchers found that some commonly used training methods, such as instruction fine-tuning (IFT), can cause models to overfit in some cases and hurt their performance.
Few-shot learning is another major challenge. When facing a new task, an AI assistant needs to learn from only a handful of examples rather than always relying on large amounts of training data.
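For readers unfamiliar with the term, few-shot learning for an LLM usually means placing a few worked examples directly in the prompt instead of retraining the model. The Python sketch below shows one simple way to assemble such a prompt. The function name, the "Input:/Output:" layout, and the example products are assumptions for illustration; the prompt formats actually used in Shopping MMLU may differ.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt with an instruction, k worked examples, and a new query.

    The "Input:/Output:" layout is an assumed convention, not a format
    prescribed by Shopping MMLU.
    """
    parts = [instruction.strip(), ""]
    for example in examples:
        parts.append(f"Input: {example['input']}")
        parts.append(f"Output: {example['output']}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


if __name__ == "__main__":
    # Invented examples for a hypothetical product-category task.
    demo_examples = [
        {"input": "wireless noise-cancelling headphones", "output": "Electronics"},
        {"input": "stainless steel frying pan", "output": "Kitchen"},
    ]
    print(
        build_few_shot_prompt(
            "Assign each product to a category.",
            demo_examples,
            "trail running shoes",
        )
    )
```

With two examples this is a "2-shot" prompt; passing an empty list yields a zero-shot prompt with the same layout.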
In short, Amazon's Shopping MMLU benchmark points out a direction for the development of AI assistants. In the future, we can look forward to smarter, more user-friendly online shopping assistants that make shopping more convenient and more enjoyable.
The researchers also found several details worth noting:
Shopping MMLU is more complex and more challenging than other existing online shopping datasets for AI.
Domain-specific instruction fine-tuning does not always help; it is effective only on strong models that already have a large amount of general knowledge.
Even the most advanced AI models currently perform worse on some online shopping tasks than algorithms designed specifically for those tasks.
These results show that building a truly capable online shopping AI assistant still has a long way to go. Future research directions include developing more effective training methods, building more diverse online shopping datasets, and combining AI models with task-specific algorithms to create more powerful hybrid systems.
Finally, the researchers candidly point out some limitations of the study:
The data in Shopping MMLU comes mainly from Amazon and may not fully represent user behavior on other e-commerce platforms.
Although the researchers did their best to avoid them, the data in Shopping MMLU may still contain some errors.
All in all, Amazon's research opens the door to an era of intelligent shopping. In the near future, online shopping AI assistants may well become an indispensable part of our lives.
Paper: https://arxiv.org/pdf/2410.20745
Data and evaluation code:
https://github.com/kl4805/shoppingmmlu
KDD Cup 2024 workshop and winning team solutions:
https://amazon-kddcup24.github.io/
Leaderboard:
https://huggingface.co/spaces/kl4805/shopping_mmlu_leaderboard
The Shopping MMLU benchmark launched by Amazon gives a clearer picture of where large language models stand in online shopping today and where they are headed. The research offers a valuable reference for improving AI models and points toward a better online shopping experience for users, suggesting that a smarter and more convenient era of shopping is on the way.