The VoxBlink2 dataset is a large-scale speaker recognition dataset with 100K+ speakers collected from the YouTube platform. This repository provides guidelines for building the corpus and the related resources needed to reproduce the results in our article. For more details, please see the citation below. If you find this repository helpful to your research, don't forget to give us a star ⭐.
Start by obtaining the resource files and decompressing the tar files:
tar -zxvf spk_info.tar.gz
tar -zxvf vb2_meta.tar.gz
tar -zxvf asr_res.tar.gz
The file structure is summarized as follows:

```
|---- data
|     |---- ossi              # [Folder] evaluation protocols for open-set speaker identification
|     |---- test_vox          # [Folder] evaluation protocols for speaker verification
|     |---- spk2videos        # [spk,video1,video2,...]
|---- ckpt                    # checkpoints for evaluation
|     |---- ecapatdnn         # [Folder]
|     |---- resnet34          # [Folder]
|     |---- resnet100         # [Folder]
|     |---- resnet293         # [Folder]
|     |---- face_model        # [Folder]
|---- spk_info                # video tags of speakers
|     |---- id000000
|     |---- id000001
|     |---- ...
|---- asr_res                 # ASR annotations by Whisper
|     |---- id000000
|     |---- id000001
|     |---- ...
|---- meta                    # timestamps for video/audio cropping
|     |---- id000000          # spkid
|     |     |---- DwgYRqnQZHM # videoid
|     |     |     |---- 00000.txt  # uttid
|     |     |     |---- ...
|     |     |---- ...
|     |---- ...
|---- face_id                 # face identification modules
|     |---- api.py            # corresponding inference functions
|     |---- arcface.py        # corresponding model definitions
|     |---- README.md
|     |---- test.py           # test script
|---- ossi                    # open-set speaker identification recipe
|     |---- eval.py           # recipe for evaluating open-set speaker identification
|     |---- utils.py
|     |---- example.npy       # e.g. ResNet34-based embeddings for evaluating OSSI
|---- audio_cropper.py        # extract audio-only segments by timestamps from downloaded audios
|---- video_cropper.py        # extract audio-visual segments by timestamps from downloaded videos
|---- downloader.py           # script for downloading videos
|---- LICENSE                 # license
|---- README.md
|---- requirement.txt
```
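As a quick sanity check after decompressing, the speaker-to-video mapping in data/spk2videos can be loaded with a few lines of Python. This is a minimal sketch, assuming each line is comma-separated as suggested by the [spk,video1,video2,...] comment above; adjust the delimiter if your copy differs.

```python
# Minimal sketch: load data/spk2videos into a {speaker_id: [video_ids]} dict.
# Assumes comma-separated lines of the form "spk,video1,video2,..."; adjust if needed.
def load_spk2videos(path="data/spk2videos"):
    spk2videos = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            fields = line.strip().split(",")
            if not fields or not fields[0]:
                continue
            spk2videos[fields[0]] = fields[1:]
    return spk2videos

if __name__ == "__main__":
    mapping = load_spk2videos()
    print(f"{len(mapping)} speakers; first entry: {next(iter(mapping.items()))}")
```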
The following procedures show how to construct your own copy of VoxBlink2.
Install ffmpeg:
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install ffmpeg
Install the Python dependencies:
pip install -r requirements.txt
Download videos
We provide two alternatives: downloading audio-visual segments or audio-only segments. The downloader also uses multi-threading to speed up the process.
For Audio-Visual
python downloader.py --base_dir ${BASE_DIR} --num_workers 4 --mode video
For Audio-Only
python downloader.py --base_dir ${BASE_DIR} --num_workers 4 --mode audio
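downloader.py is the supported entry point; the sketch below only illustrates the general idea of fetching the listed videos by their YouTube IDs with a worker pool, and it assumes yt-dlp as the backend, which this README does not specify.

```python
# Illustrative sketch only: downloader.py is the supported entry point.
# Assumes yt-dlp as the download backend (an assumption, not stated in this README).
from concurrent.futures import ThreadPoolExecutor
import yt_dlp

def download_one(video_id, out_dir="downloads"):
    opts = {
        "format": "bestaudio/best",           # use "bestvideo+bestaudio" for audio-visual data
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([f"https://www.youtube.com/watch?v={video_id}"])

video_ids = ["DwgYRqnQZHM"]                   # e.g. taken from data/spk2videos
with ThreadPoolExecutor(max_workers=4) as pool:   # mirrors --num_workers 4
    list(pool.map(download_one, video_ids))
```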
Crop Audio/Videos
For Audio-Visual
python cropper_video.py --save_dir_audio ${SAVE_PATH_AUDIO} --save_dir_video ${SAVE_PATH_VIDEO} --timestamp_path meta --video_root=${BASE_DIR} --num_workers 4
For Audio-Only
python cropper_audio.py --save_dir ${SAVE_PATH_AUDIO} --timestamp_path meta --audio_root=${BASE_DIR} --num_workers 4
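The cropper scripts handle segmentation for the whole corpus; the sketch below only illustrates how a single segment could be cut with ffmpeg given a start and end time in seconds. The exact per-utterance timestamp format under meta/ and the target audio format are defined by the cropper scripts, so treat the values here as assumptions.

```python
# Minimal sketch: cut one audio segment with ffmpeg given start/end times in seconds.
# The real per-utterance timestamp format under meta/ is defined by the cropper scripts.
import os
import subprocess

def crop_segment(src, dst, start, end):
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-ss", str(start), "-to", str(end),
         "-ar", "16000", "-ac", "1",          # 16 kHz mono output (assumed target format)
         dst],
        check=True,
    )

crop_segment("downloads/DwgYRqnQZHM.wav",
             "segments/id000000_DwgYRqnQZHM_00000.wav",
             12.3, 15.8)
```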
We provide simple scripts for our face identification model, which is used in curating VoxBlink2. For more details, please refer to the face_id folder.
We provide simple scripts for automatic speaker verification (ASV) evaluation: just execute run_eval.sh in the asv folder. For more details, please refer to the asv folder.
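run_eval.sh is the supported recipe; as a rough illustration of what ASV evaluation boils down to, the sketch below computes an equal error rate (EER) from a toy list of trial labels and similarity scores.

```python
# Illustrative sketch only: run_eval.sh in the asv folder is the supported recipe.
# Computes an equal error rate (EER) from verification trials, assuming a list of
# (label, score) pairs where label is 1 for target and 0 for non-target trials.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))     # operating point where FAR ≈ FRR
    return (fpr[idx] + fnr[idx]) / 2

labels = np.array([1, 1, 0, 0, 1, 0])                      # toy trial labels
scores = np.array([0.82, 0.65, 0.31, 0.44, 0.71, 0.28])    # e.g. cosine similarities
print(f"EER = {compute_eer(labels, scores):.2%}")
```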
We provide simple scripts for evaluating our proposed task, Open-Set Speaker Identification (OSSI): just execute run_eval_ossi.sh in the ossi folder. For more details, please refer to the ossi folder.
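run_eval_ossi.sh and ossi/eval.py are the supported recipe; the sketch below only illustrates the basic open-set decision rule: score a probe embedding against the enrolled gallery and reject it as unknown when the best similarity falls below a threshold. The threshold and the random embeddings here are toy assumptions.

```python
# Illustrative sketch only: ossi/eval.py and run_eval_ossi.sh are the supported recipe.
# Basic open-set decision rule: accept the best-scoring enrolled speaker if the
# cosine similarity clears a threshold, otherwise label the probe as "unknown".
import numpy as np

def identify(probe, gallery, threshold=0.4):
    """gallery: {speaker_id: embedding}; probe: embedding vector."""
    ids = list(gallery)
    embs = np.stack([gallery[i] for i in ids])
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    probe = probe / np.linalg.norm(probe)
    sims = embs @ probe                        # cosine similarities to each enrolled speaker
    best = int(np.argmax(sims))
    return (ids[best], sims[best]) if sims[best] >= threshold else ("unknown", sims[best])

rng = np.random.default_rng(0)
gallery = {f"id{i:06d}": rng.normal(size=256) for i in range(3)}   # toy enrolled embeddings
print(identify(rng.normal(size=256), gallery))
```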
The dataset is licensed under the CC BY-NC-SA 4.0 license. This means that you can share and adapt the dataset for non-commercial purposes as long as you provide appropriate attribution and distribute your contributions under the same license. Detailed terms can be found here.
Important Note: Our released dataset only contains annotation data, including the YouTube links, timestamps, and speaker labels. We do not release audio or visual data, and it is the user's responsibility to decide whether and how to download the video data and whether their intended use of the downloaded data is legal in their country. For YouTube users with concerns regarding their videos' inclusion in our dataset, please contact us via e-mail: [email protected] or [email protected].
Please cite the paper below if you make use of the dataset:
@misc{lin2024voxblink2100kspeakerrecognition,
      title={VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark},
      author={Yuke Lin and Ming Cheng and Fulin Zhang and Yingying Gao and Shilei Zhang and Ming Li},
      year={2024},
      eprint={2407.11510},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2407.11510},
}