Research workshop on large language models - The Summer of Language Models 21
At the moment we have 2 code repos:
Currently, the most active segments of this repo are:
We have READMEs for specific aspects, such as:
While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc summarizing the most important findings: Lessons learned
You can watch the training logs live by running this `tail -f`-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/;
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
```
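If you prefer something more readable than the one-liner, here is a minimal Python sketch of the same polling approach (an illustration, not part of the repo), assuming the `requests` library is installed: it checks the remote file size with a HEAD request and fetches only the newly appended bytes via an HTTP Range request, every 5 minutes like the script above.

```python
import sys
import time

import requests  # assumed dependency: pip install requests


def tail_remote(url: str, interval: int = 300) -> None:
    """Print new bytes appended to a remote file, polling every `interval` seconds."""
    seen = 0
    while True:
        # HEAD request (following redirects) to learn the current size of the file
        head = requests.head(url, allow_redirects=True)
        size = int(head.headers.get("content-length", 0))
        if size > seen:
            # Fetch only the new tail; HTTP byte ranges are inclusive, hence size - 1
            resp = requests.get(url, headers={"Range": f"bytes={seen}-{size - 1}"})
            sys.stdout.write(resp.text)
            sys.stdout.flush()
            seen = size
        time.sleep(interval)


if __name__ == "__main__":
    tail_remote("https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt")
```

The same sketch works for the other log URLs further down; only the URL argument changes.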
Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:
Size | 1B3 | 760M | 350M | 125M |
---|---|---|---|---|
C4 + low warmup | a | b | c | |
OSCAR + low warmup | f | | | |
C4 + high warmup | e | | | |
OSCAR + high warmup | d (current baseline) | g | h | i |
Pile + high warmup | m | j | k | l |
104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities
You can watch the training logs live by running this `tail -f`-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/;
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9
```
This is the current main training:
tr11-176B-ml
You can watch the training logs live by running this `tail -f`-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s;
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt
```