Large-scale, Informative, and Diverse Multi-round Dialogue Data, and Models
UltraLM • Data Explorer • Nomic AI Atlas Explorer • Data Release • Construction Process • Paper
UltraLM is a series of chat language models trained on UltraChat. Currently, we have released the 13B version, which ranks #1 among open-source models and ranks #4 among all models on AlpacaEval Leaderboard (June 28, 2023). UltraLM-13B is based upon LLaMA-13B and supported by BMTrain in the training process.
Model | Link | Version |
---|---|---|
UltraLM-13B | Huggingface Repo | v1.0 |
UltraLM-65B | Huggingface Repo | v1.0 |
UltraLM-13B | Huggingface Repo | v2.0 |
UltraRM-13B | Huggingface Repo | v1.0 |
UltraCM-13B | Huggingface Repo | v1.0 |
Run /UltraLM/recover.sh to obtain the final weights of the recovered model.
Update /UltraLM/chat_cli.sh with your path and run it to start chatting!
Note: Different hyper-parameters or system prompts will affect the outputs. You can refer to /UltraLM/inference_cli.py for our default setting.
We report three evaluations in this section: AlpacaEval from Stanford, Evol-instruct from Microsoft's WizardLM, and our curated evaluation set. Evaluations of modern LLMs may be biased and affected by many factors, so we are also actively working on more comprehensive evaluation methods.
AlpacaEval is a leaderboard specifically designed for evaluating LLMs. The leaderboard is based on the win rate against text-davinci-003, automatically evaluated by GPT-4.
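To make the win-rate metric concrete, here is a minimal sketch of how per-instruction verdicts could be aggregated into a single percentage. It is not the AlpacaEval implementation: the function name `win_rate`, the verdict labels, and the choice to count a tie as half a win are all assumptions made for illustration, and the sample verdicts are made up.

```python
def win_rate(judgments):
    """Return the percentage of comparisons won by the evaluated model.

    `judgments` holds one verdict per instruction: "model", "baseline",
    or "tie". Counting a tie as half a win is an assumption of this sketch.
    """
    if not judgments:
        raise ValueError("no judgments to aggregate")
    score = 0.0
    for verdict in judgments:
        if verdict == "model":
            score += 1.0
        elif verdict == "tie":
            score += 0.5
        elif verdict != "baseline":
            raise ValueError(f"unexpected verdict: {verdict!r}")
    return 100.0 * score / len(judgments)


# Made-up verdicts for illustration only; these are not real evaluation results.
sample = ["model", "model", "baseline", "tie"]
print(f"win rate: {win_rate(sample):.1f}%")
```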
The Evol-instruct dataset is constructed with an evolutionary strategy that rewrites instructions over multiple rounds to obtain instructions at different complexity levels. The benchmark is developed by the WizardLM project, another excellent chat language model!
We curate an evaluation set encompassing the Vicuna Benchmark and an additional 300 questions and instructions generated by GPT-4. The questions/instructions cover a wide range of topics, including commonsense, world knowledge, professional knowledge (specifically physics and biology), mathematics, and writing tasks at different levels of difficulty. We use GPT-4 for evaluation. Here is the dataset.
Example A: Question of Common Knowledge
User: Which animal is known as man's best friend?
UltraLM: There are many animals that humans enjoy as companion animals, but one of the absolute standout answers to the question "which animal is known as a man's best friend?" is the domestically-bred dog. Domesticated dogs have been by humans' sides for tens of thousands of years, providing faithful companionship and help; guarding people and property; and lending emotional support in everything from long, lonely days to military deployments. Dogs come in an almost mind-boggling variety of breeds of various sizes, with different energy levels, personalities, and temperaments, meaning there's almost certainly a pup out there who will be a perfect fit for whoever needs it most.
Example B: Commonsense Reasoning
User: Which weighs more: a pound of feathers or a pound of bricks?
UltraLM: A pound of feathers and a pound of bricks both weigh exactly one pound. This is because one pound is a unit of weight, just as a foot is a unit of length or a second is a unit of time. We use the pound (or other unit of weight) to describe the weight of different objects, regardless of what that object is made of. So the answer to your question is that a pound of feathers and a pound of bricks both weigh the same amount.
Example C: Reasoning
User: If a farmer has 17 sheep and all but 9 die, how many are left?
UltraLM: The total number of sheep that are left is 9.
Example D: Reasoning
User: If a red house is made from red bricks, and a blue house is made from blue bricks, what is a greenhouse made from?
UltraLM: A greenhouse is not made from green bricks, but rather from materials such as glass or plastic that allow light to pass through and create a controlled environment for plants to grow. Greenhouses can come in a variety of colors, including green, but the color of the building itself does not affect the plants growing inside.
Example E: Creation
User: Write a poem about electromagnetism.
UltraLM: Electromagnetism is a force we can't ignore,
It's the reason for many things we adore.
In a magnetic field, a wire will glow,
And the electrons will align, both old and new.
Electricity and magnetism are hand in hand,
One creates a field, the other has a brand.
The magnetic field can push and pull,
On a wire with current, it will react in full.
The attraction and repulsion it brings,
Are the reason for many things.
From motors to generators, and much more in between,
Electromagnetism is at the root of what it means.
It's a force that's all around us,
From the compass to the light switch.
With its power, it can do much,
From a simple needle to a rocket's thrust.
It's a force we can't deny,
For it's a part of our lives.
Electromagnetism is something to admire,
For its strength and its might, it has no equal, it's sure.
Example F: Creation
User: Write a short story about the first second of the universe.
UltraLM: In the first second of the universe, darkness reigned supreme. There was no light, no sound, no shape, no substance. Nothing yet was, but there was an intense power, a raw energy waiting to be unleashed. Within infinity, the first second held the potential for creation in its grasp. And then, with a mighty bang, a bright point of light emerged into existence. In it, the most fundamental particles were born, their forms yet unknown as they frenetically collided in a chaotic dance of creation. Within this primordial light, every force of nature and every component of life was stirred from pure energy to physical reality. The first atoms quickly assembled, and from these, stars and galaxies began to form. Their fusion creating a hot environment that pushed back the very void, shaping the cosmos through expanding and cooling waves. In this cosmic first, no structure of experience, not even time or space, could establish its rule. For an instant, the entirety of the universe existed in an embryonic nexus, a cosmic singularity from which all future existence originated. It was a microcosm forging the macrocosm of which it ultimately becomes a part, its moment of birth a metaphor for the cycles of creation that permeate the multiverse(s). From that single point of light, an infinite universe unfolded. For centuries, eons ahead, and forevermore, it remained the most foundational blink of its unceasing journey into the depths of the Great Beyond. The first second is the beginning of everything.
This project aims to construct open-source, large-scale, and multi-round dialogue data powered by Turbo APIs to facilitate the construction of powerful language models with general conversational capability. In consideration of factors such as safeguarding privacy, we do not directly use any data available on the Internet as prompts.
Disclaimer: Although the process of building UltraChat does NOT involve any publicly available benchmark data, scaling to a certain extent may still result in some overlap in some evaluation benchmarks. We would like to emphasize again that all the data is automatically generated (including the instructions and responses), and we do not insert any open benchmark data. For example, UltraChat was released (April, 2023) earlier than Alpaca Eval (May, 2023). We encourage users to closely monitor such phenomena, while we are also actively considering how to evaluate LLMs more properly.
The dataset is intended solely for research and educational purposes and should not be construed as reflecting the opinions or views of the creators, owners, or contributors of this dataset. It is distributed under the MIT license.
Explore the data before downloading, or use Atlas explorer.
Direct Download links:
Each line in the downloaded data file is a JSON dict containing the data id and the dialogue data in a list format. Below is an example line.
{
"id": "0",
"data": [
"How can cross training benefit groups like runners, swimmers, or weightlifters?",
"Cross training can benefit groups like runners, swimmers, or weightlifters in the following ways: ...",
"That makes sense. I've been wanting to improve my running time, but I never thought about incorporating strength training. Do you have any recommendations for specific exercises?",
"Sure, here are some strength training exercises that can benefit runners: ...",
"Hmm, I'm not really a fan of weightlifting though. Can I incorporate other forms of exercise into my routine to improve my running time?",
"Yes, absolutely! ...",
"..."
]
}
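As a minimal sketch of how such a line might be consumed, the snippet below parses one JSON line and pairs each entry in `data` with a speaker role. It assumes that turns strictly alternate, starting with the user, which matches the example above but is not stated as a guarantee. The function name `parse_dialogue` and the role labels are illustrative and not part of the released code.

```python
import json


def parse_dialogue(line):
    """Parse one line of the data file into (id, [(role, text), ...]).

    Assumption: entries in "data" alternate user, assistant, user, ...
    """
    record = json.loads(line)
    turns = []
    for index, text in enumerate(record["data"]):
        role = "user" if index % 2 == 0 else "assistant"
        turns.append((role, text))
    return record["id"], turns


# A shortened version of the example line, serialized as a single JSON line.
example_line = json.dumps({
    "id": "0",
    "data": [
        "How can cross training benefit groups like runners, swimmers, or weightlifters?",
        "Cross training can benefit groups like runners, swimmers, or weightlifters in the following ways: ...",
    ],
})

dialogue_id, turns = parse_dialogue(example_line)
for role, text in turns:
    print(f"[{dialogue_id}] {role}: {text}")
```

To read a whole file, the same function could be applied to each non-empty line in turn.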
We provide training code in ./src/ to fine-tune LLaMA on UltraChat (we do not distribute the LLaMA weights). Training is accelerated by BMTrain.
Download the released data and put it under ./data
Run train_bm.py, for example:
WANDB_MODE="offline" torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:50003 train_bm.py --tensorboard ./ultrachat_llama_tb_2 --save_step 5000 --logging_step 100
We also provide a training script to fine-tune GPT-J on UltraChat in ./src/train_legacy/, which is implemented with OpenPrompt.
Download the released data and put it under ./data
Run accelerate launch train.py to start training.
The general idea of UltraChat is to use separate LLMs to generate opening lines, simulate users, and respond to queries. Each sector of UltraChat has its own challenges and requires particular strategy designs. We will specify the construction process once a sector of UltraChat is released.
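The alternation between a simulated user and a responder can be pictured with a short sketch. This is not the project's construction code: `build_dialogue`, `simulate_user`, `respond`, and `num_rounds` are hypothetical names, and the toy functions below stand in for what would in practice be calls to separate LLMs.

```python
def build_dialogue(opening_line, simulate_user, respond, num_rounds=3):
    """Alternate between a responder and a simulated user.

    All arguments are hypothetical stand-ins: in practice each callable
    would wrap a call to a separate LLM, and the opening line would
    itself be generated by a model.
    """
    dialogue = [opening_line]
    for round_index in range(num_rounds):
        dialogue.append(respond(dialogue))  # assistant answers the history so far
        if round_index < num_rounds - 1:
            dialogue.append(simulate_user(dialogue))  # simulated user follows up
    return dialogue


# Toy stand-ins so the sketch runs without any model or API access.
def toy_respond(history):
    return f"(answer to: {history[-1]})"


def toy_user(history):
    return f"(follow-up question {len(history) // 2})"


dialogue = build_dialogue("How do greenhouses work?", toy_user, toy_respond, num_rounds=2)
for turn in dialogue:
    print(turn)
```

The resulting list has the same flat, alternating layout as the `data` field in the released files, with user turns at even positions and responses at odd positions.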
Feel free to cite the repo if you think UltraChat is useful.
@article{ding2023enhancing,
title={Enhancing Chat Language Models by Scaling High-quality Instructional Conversations},
author={Ding, Ning and Chen, Yulin and Xu, Bokai and Qin, Yujia and Zheng, Zhi and Hu, Shengding and Liu, Zhiyuan and Sun, Maosong and Zhou, Bowen},
journal={arXiv preprint arXiv:2305.14233},
year={2023}
}