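ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Our series of works: [MMStar] [ShareGPT4V] [ShareGPT4Omni]

Official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

The following video gives a clear introduction to ShareGPT4Video: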

demo_clip_v2.mp4

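Authors: Lin Chen*, Xilin Wei*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang
Institutes: University of Science and Technology of China; The Chinese University of Hong Kong; Peking University; Shanghai AI Laboratory
Resources: [Paper] [Project Page] [ShareGPT4Video Dataset] [Colab]
Models: [ShareGPT4Video-8B] [ShareCaptioner-Video]
Demos: [ShareGPT4Video-8B] [ShareCaptioner-Video]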

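Highlights

A large-scale, highly descriptive video-text dataset: 40K video captions generated by GPT4-Vision and around 400K implicit video split captions.
A general video captioner for videos of various durations, resolutions, and aspect ratios. It approaches the captioning capability of GPT4-Vision and offers two inference modes, one targeting quality and one targeting efficiency.
A superior large video-language model, ShareGPT4Video-8B, trained in 5 hours on 8xA100 GPUs.
Improved text-to-video performance using high-quality video captions generated by our ShareCaptioner-Video. Thanks to Open-Sora-Plan.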

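News

[2024/10/1] ShareGPT4Video was accepted to the NeurIPS 2024 D&B track!

[2024/7/1] The batch-inference code for ShareCaptioner-Video is now available!

[2024/6/11] The web demo and local demo of ShareCaptioner-Video are now available!

[2024/6/11] The web demo and local demo of ShareGPT4Video-8B are now available!

[2024/6/7] Our paper was featured in HuggingFace Daily Papers and ranked 1st on June 7.

[2024/5/27] The ShareGPT4Video-8B model is released!

[2024/5/26] The ShareGPT4Video dataset and project page are released!

TODO

Training code for ShareGPT4Video-8B
Batch inference code for ShareCaptioner-Video
Web demo and local demo for ShareCaptioner-Video
Web demo and local demo for ShareGPT4Video-8B
Checkpoints of ShareGPT4Video-8B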

Quick Usage

You can chat with the ShareGPT4Video model about your own video using the following command:

python run.py --model-path Lin-Chen/sharegpt4video-8b --video examples/yoga.mp4 --query "Describe this video in detail."

Or you can build a local demo to try ShareGPT4Video-8B with the following command:

python app.py

You can build a local demo to try ShareCaptioner-Video with the following commands:

cd captioner
python app.py
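
When the chat command above is launched from a script, passing the arguments as a list avoids shell-quoting problems with a multi-word query. The following is a minimal sketch, assuming `run.py` accepts the three flags exactly as shown in that command; the helper name `build_chat_command` is hypothetical and not part of the repository.

```python
import shlex
import sys


def build_chat_command(video, query, model_path="Lin-Chen/sharegpt4video-8b"):
    """Return the argv list for run.py (hypothetical helper, not part of the repo)."""
    return [
        sys.executable, "run.py",
        "--model-path", model_path,
        "--video", video,
        "--query", query,  # a single list element, so spaces in the query survive
    ]


if __name__ == "__main__":
    cmd = build_chat_command("examples/yoga.mp4", "Describe this video in detail.")
    # shlex.join renders the equivalent, correctly quoted shell command.
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Using `sys.executable` rather than a bare `python` keeps the call inside the currently active environment.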

Install

git clone https://github.com/ShareGPT4Omni/ShareGPT4Video
conda create -n share4video python=3.10 -y
conda activate share4video

cd ShareGPT4Video
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
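
After the steps above, a quick check can confirm that the active interpreter and packages match what was installed. This is a minimal sketch: the function name `check_environment` is hypothetical, and it assumes the `flash-attn` package is imported under the module name `flash_attn`.

```python
import importlib.util
import sys


def check_environment(required_python=(3, 10), modules=("flash_attn",)):
    """Report whether the interpreter and listed modules match the install steps."""
    report = {"python_ok": sys.version_info[:2] == required_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing or mismatched'}")
```

Run it inside the activated `share4video` environment; a `missing or mismatched` line points at the step to repeat.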

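Train

To validate that high-quality video captions improve the comprehension capability of LVLMs, we choose the VideoLLaVA and LLaMA-VID models as our baselines. The SFT data used for both models is the LLaVA-mix665K image data plus the VideoChatGPT-100K video data. We replace 28K caption samples in VideoChatGPT-100K with 28K high-quality caption samples from ShareGPT4Video. Below we take VideoLLaVA as an example.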
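
The data swap described above can be checked with simple arithmetic. This sketch uses only the sizes quoted in the paragraph, in thousands of samples; the function and key names are illustrative.

```python
def sft_mixture(image_k=665, video_chat_k=100, replaced_k=28):
    """Return the SFT mixture sizes (in thousands of samples) after the swap."""
    if replaced_k > video_chat_k:
        raise ValueError("cannot replace more samples than the video set holds")
    mixture = {
        "llava_mix_image": image_k,
        "video_chatgpt": video_chat_k - replaced_k,  # samples kept from VideoChatGPT
        "sharegpt4video": replaced_k,                # samples swapped in
    }
    mixture["total"] = sum(mixture.values())
    return mixture


if __name__ == "__main__":
    print(sft_mixture())
```

With the default values this yields 72K VideoChatGPT samples and 28K ShareGPT4Video samples, for a total of 765K, so the swap leaves the overall size of the mixture unchanged.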

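First, follow the instructions in VideoLLaVA to prepare the images and videos, then download the 28K videos used in ShareGPT4Video from HuggingFace (only bdd100k, ego4d, and panda are involved).

Finally, specify the llava_v1_5_mix665k_with_video_chatgpt72k_share4video28k.json file in finetune.sh to run SFT and reproduce the results in the paper.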
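
Before launching SFT, it can help to confirm that the annotation file loads and contains both image and video samples. The sketch below assumes a LLaVA-style layout, that is, a JSON list of records in which image samples carry an `image` key and video samples a `video` key. That layout is an assumption and should be checked against the actual file; `count_modalities` is a hypothetical helper.

```python
import json
from collections import Counter


def count_modalities(path):
    """Count records by modality in a JSON list of sample dicts (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    counts = Counter()
    for rec in records:
        if "video" in rec:
            counts["video"] += 1
        elif "image" in rec:
            counts["image"] += 1
        else:
            counts["text_only"] += 1
    return dict(counts)
```

Calling `count_modalities("llava_v1_5_mix665k_with_video_chatgpt72k_share4video28k.json")` from the directory that holds the file returns a dictionary of counts per modality, which can be compared against the mixture sizes stated above.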

Citation

If you find our work helpful for your research, please consider giving a star and a citation:

@article{chen2024sharegpt4video,
  title={ShareGPT4Video: Improving Video Understanding and Generation with Better Captions},
  author={Chen, Lin and Wei, Xilin and Li, Jinsong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Chen, Zehui and Duan, Haodong and Lin, Bin and Tang, Zhenyu and others},
  journal={arXiv preprint arXiv:2406.04325},
  year={2024}
}

@article{chen2023sharegpt4v,
  title={ShareGPT4V: Improving Large Multi-Modal Models with Better Captions},
  author={Chen, Lin and Li, Jinsong and Dong, Xiaoyi and Zhang, Pan and He, Conghui and Wang, Jiaqi and Zhao, Feng and Lin, Dahua},
  journal={arXiv preprint arXiv:2311.12793},
  year={2023}
}

@article{chen2024we,
  title={Are We on the Right Way for Evaluating Large Vision-Language Models?},
  author={Chen, Lin and Li, Jinsong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Chen, Zehui and Duan, Haodong and Wang, Jiaqi and Qiao, Yu and Lin, Dahua and others},
  journal={arXiv preprint arXiv:2403.20330},
  year={2024}
}