Alibaba's latest AI data science assistant, DS Assistant, aims to simplify and accelerate the data science process. It automates the workflow from data exploration to model evaluation, so that even users without a strong data science background can use it. DS Assistant is built on Alibaba's open source Modelscope-Agent framework, which offers a rich tool ecosystem and a flexible module design, supports access to mainstream open source models, and provides RAG components. Its core advantage is the automated workflow: users only need to state their requirements, and DS Assistant carries out the individual steps automatically, lowering the barrier to entry for data science.
Recently, Alibaba launched an AI data science assistant called DS Assistant, which can automate the entire process from data exploration to model evaluation, making data science work easier and more efficient.
DS Assistant is built on the Modelscope-Agent framework, which Alibaba has open sourced and which offers a rich tool ecosystem and a flexible module design. With the launch of DS Assistant, even users without a deep data science background can tackle complex data science problems.
The core strength of DS Assistant is its automated workflow. Users only need to provide requirements, and DS Assistant can automatically perform steps such as exploratory data analysis, data preprocessing, feature engineering, model training and evaluation. This process not only improves work efficiency, but also lowers the threshold for data science work.
The Modelscope-Agent framework underpins DS Assistant. It has the following characteristics:
Supports access to mainstream open source models through tools such as vLLM and Ollama;
Provides RAG components for quick access to a knowledge base;
Offers a rich tool ecosystem, supporting Modelscope community models and LangChain tools.
DS Assistant adopts the emerging plan-and-execute framework to efficiently complete complex tasks through clear planning and execution steps. Its workflow includes task planning, sub-task scheduling, task execution and result integration, which greatly improves the efficiency and controllability of task execution.
In terms of system architecture, DS Assistant consists of four main modules: DS Assistant itself serves as the system brain and is responsible for overall scheduling; the Plan module generates the task list and sorts it topologically; the Execution module carries out each task and saves the results; and the Memory management module records the intermediate execution results of tasks.
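The interplay of these modules can be illustrated with a minimal Python sketch. It is not the Modelscope-Agent implementation: the names Task, plan, schedule, execute and run are hypothetical, the planner returns a fixed task list instead of calling a model, and the executor only returns a placeholder string. What it does show is how a task list with dependencies can be sorted topologically and how each task can read earlier results from a shared memory.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    # Hypothetical task record: an id, an instruction, and the ids it depends on.
    task_id: str
    instruction: str
    depends_on: list[str] = field(default_factory=list)


def plan(requirement: str) -> list[Task]:
    # Stand-in for the Plan module. A real planner would ask a model to
    # break the requirement into tasks; here the list is fixed and
    # deliberately out of order.
    return [
        Task("train", "train a model", ["features"]),
        Task("explore", "exploratory data analysis"),
        Task("evaluate", "evaluate the model", ["train"]),
        Task("preprocess", "clean the data", ["explore"]),
        Task("features", "feature engineering", ["preprocess"]),
    ]


def schedule(tasks: list[Task]) -> list[Task]:
    # Topological sort: every task comes after all tasks it depends on.
    graph = {t.task_id: set(t.depends_on) for t in tasks}
    by_id = {t.task_id: t for t in tasks}
    return [by_id[i] for i in TopologicalSorter(graph).static_order()]


def execute(task: Task, memory: dict[str, str]) -> str:
    # Stand-in for the Execution module: it only looks up upstream results
    # in memory and returns a placeholder instead of running generated code.
    upstream = [memory[d] for d in task.depends_on]
    return f"{task.instruction} (used {len(upstream)} upstream result(s))"


def run(requirement: str) -> dict[str, str]:
    # Stand-in for the overall scheduler: plan, order, execute, record.
    memory: dict[str, str] = {}
    for task in schedule(plan(requirement)):
        memory[task.task_id] = execute(task, memory)
    return memory


if __name__ == "__main__":
    for task_id, result in run("predict the target column").items():
        print(task_id, "->", result)
```

Because the memory dictionary is filled in execution order, its keys also double as a simple processing record of the run.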
In a practical case, DS Assistant was successfully applied to the ICR - Identifying Age-Related Conditions competition task on Kaggle. Through automated data processing and analysis, DS Assistant not only improved the success rate of task execution, but also generated detailed processing records for users.
The effectiveness of DS Assistant was evaluated on ML-Benchmark along three dimensions: Normalized Performance Score (NPS), total time, and total number of tokens. On some complex data science tasks, DS Assistant achieved better results than the open source state of the art (SOTA).
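The article does not spell out how NPS is computed. A minimal sketch, assuming the definition commonly used with ML-Benchmark: a metric where larger is better (such as accuracy or F1) is used as is, and a metric where smaller is better (such as RMSE) is mapped to 1 / (1 + s), so that every task is scored on a scale where higher means better. The function name and the sample values below are hypothetical.

```python
def normalized_performance_score(score: float, smaller_is_better: bool) -> float:
    # Assumed definition: 1 / (1 + s) for error-style metrics,
    # the raw score for metrics where larger is already better.
    if smaller_is_better:
        return 1.0 / (1.0 + score)
    return score


if __name__ == "__main__":
    # Hypothetical metric values, for illustration only.
    print(normalized_performance_score(0.86, smaller_is_better=False))
    print(normalized_performance_score(1.0, smaller_is_better=True))
```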
The application value of DS Assistant lies in:
For users who are not familiar with the data analysis process, DS Assistant offers a quick way to understand the approach and key technical points of data processing;
For users who understand the data analysis process, DS Assistant provides a detailed description of the processing methods, which can serve as a reference and point of comparison for their own experiments;
For all users, DS Assistant can automatically and quickly produce a deeper understanding of the file at hand.
In the future, DS Assistant will be optimized in three directions to further enhance the user experience: improving the task execution success rate, supporting conversational, interactive task advancement, and supporting batch processing of multiple files for the same task.
This innovative tool from Alibaba not only lowers the entry barrier to data science, but also provides data scientists with a powerful automated assistant, heralding new changes in the field of data science.
Official repository: https://github.com/modelscope/modelscope-agent/blob/master/examples/agents/data_science_assistant.ipynb
Reference: https://blog.langchain.dev/planning-agents/
All in all, DS Assistant has brought significant efficiency improvements and convenience to the field of data science with its automated processes and powerful Modelscope-Agent framework, and has huge potential for future development. It is not only a powerful assistant for data scientists, but also opens the door to data science for more people.