Awesome 3D Gaussian Splatting Resources
A curated list of papers and open-source resources focused on 3D Gaussian Splatting, intended to keep pace with the surge of research expected in the coming months. If you have any additions or suggestions, please contribute. Additional resources such as blog posts, videos, etc. are also welcome.
Table of contents
- Seminal Paper introducing 3D Gaussian Splatting
- 3D Object Detection
- Autonomous Driving
- Avatars
- Classic Work
- Compression
- Diffusion
- Dynamics and Deformation
- Editing
- Language Embedding
- Mesh Extraction and Physics
- Miscellaneous
- Regularization and Optimization
- Rendering
- Reviews
- SLAM
- Sparse
- Navigation and Autonomous Driving
- Poses
- Large-Scale
- Open Source Implementations
- Reference
- Unofficial Implementations
- 2D Gaussian Splatting
- Game Engines
- Viewers
- Utilities
- Tutorials
- Frameworks
- Other
- Blog Posts
- Tutorial Videos
- Credits
Update Log:
October 24, 2024:
October 16, 2024:
September 7, 2024:
May 10, 2024:
- Added 18 papers: Z-Splat, Dual-Camera, StylizedGS, Hash3D, Revisiting Densification, Gaussian Pancakes, 3D-aware Deformable Gaussians, SpikeNVS, Zero-shot PC Completion, SplatPose, DreamScene360, RealmDreamer, Gaussian-LIC, Reinforcement Learning with GGS, GoMAvatar, OccGaussian, LoopGaussian, and a review
April 11, 2024:
- Code release of latentSplat
April 9, 2024:
- Added 1 paper: EgoLifter
April 8, 2024:
- Added 3 papers: Robust Gaussian Splatting, SC4D, and MM-Gaussian
April 5, 2024:
- Added 5 papers: Surface Reconstruction, TCLC-GS, GaSpCT, OmniGS, and Per-Gaussian Embedding
- Fixes
April 2, 2024:
- Added 11 papers: HO, SGD, HGS, Snap-it, InstantSplat, 3DGSR, MM3DGS, HAHA, CityGaussian, Mirror-3DGS, and Feature Splatting
March 30, 2024:
- Added 8 papers: Modeling Uncertainty, GRM, Gamba, CoherentGS, TOGS, SA-GS, and GaussianCube
March 27, 2024:
- Added another implementation: 360-gaussian-splatting
- Added CVPR '24 labels
- Added 5 papers: Comp4D, DreamPolisher, DN-Splatter, 2D GS, and Octree-GS
March 26, 2024:
- Added 13 papers: latentSplat, GS on the Move, RadSplat, Mini-Splatting, SyncTweedies, HAC, STAG4D, EndoGSLAM, Pixel-GS, Semantic Gaussians, Gaussian in the Wild, CG-SLAM, and GSDF
March 24, 2024:
- Added paper: Gaussian Frosting
March 20, 2024:
- Added 4 papers: GVGEN, HUGS, RGBD GS-ICP SLAM, and High-Fidelity SLAM
March 19, 2024:
- Added Pointrix
- Added 3DGS tutorial by the original authors
- Added GauStudio
- Added 23 papers: Touch-GS, GGRt, FDGaussian, SWAG, Den-SOFT, Gaussian-Flow, View-Consistent 3D Editing, BAGS, GeoGaussian, GS-Pose, Analytic-Splatting, Seamless 3D Maps, Texture-GS, Recent Advances in 3DGS, Compact 3DGS for Dense Visual SLAM, BrightDreamer, 3DGS-Reloc, Beyond Uncertainty, Motion-Aware 3DGS, Fed3DGS, GaussNav, 3DGS-Calib, and NEDS-SLAM
March 17, 2024:
- Updated repo name and link for 3DGS.cpp (formerly VulkanSplatting)
March 16, 2024:
- SplatTV
- Added 6 papers: GaussianGrasper, New Splitting Algorithm, Controllable Text-to-3D Generation, Spring-Mass 3DGS, Hyper-3DGS, and DreamScene
March 14, 2024:
- Added 6 papers: SemGauss, StyleGaussian, Gaussian Splatting in Style, GaussCtrl, GaussianImage, and RAIN-GS
March 8, 2024:
- Tutorial: How to capture images for 3DGS
- Added 6 papers: SplattingAvatar, DNGaussian, Radiative Gaussians, BAGS, GSEdit, and ManiGaussian
March 8, 2024:
March 6, 2024:
- Added 1 paper: Splat-Nav
March 5, 2024:
- Added 1 paper: 3DGStream
- Code release
- Added new viewer
March 2, 2024:
- Added 1 paper: 3D Gaussian Model for Animation and Texturing
- New section: Courses that also teach 3DGS
February 28, 2024:
February 27, 2024:
- Added 2 papers: Spec-Gaussian and GEA
- SC-GS code released
February 24, 2024:
- Added 2 papers: Identifying Unnecessary Gaussians and GaussianPro
February 23, 2024:
- Corrected authors and updated abstract for EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
February 21, 2024:
- Added one paper: Reshaping SLAM: A Survey
February 20, 2024:
- GaussianObject code released
- Added one paper: GaussianHair
February 19, 2024:
- Added blog post: NeRFs vs. 3DGS
February 16, 2024:
- Added 2 papers: IM-3D and GES
- GaMeS code released
February 14, 2024:
- Added viewer: VulkanSplatting - a cross-platform, high-performance 3DGS renderer in C++ and Vulkan Compute
February 13, 2024:
- Code release: (January 16, 2024) Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
- Added 3 papers: 3DGala, ImplicitDeepfake, and 3D Gaussians as a New Vision Era
February 9, 2024:
- Added 1 paper: HeadStudio
February 8, 2024:
- Added 3 papers: Rig3DGS, Mesh-based GS, and LGM
February 6, 2024:
- Added 2 papers: SGS-SLAM and 4D Gaussian Splatting
February 5, 2024:
- Moved SWAGS to the Dynamics and Deformation section
- Added 2 papers: GaussianObject and GaMeS
- GS++ renamed to Optimal Projection
February 2, 2024:
- Added 6 papers: VR-GS, Segment Anything, Gaussian Splashing, GS++, 360-GS, and StopThePop
- TRIPS code release
January 30, 2024:
- Code change: GaussianAvatars code went private
January 29, 2024:
- Added 2 papers: LIV-GaussMap and TIP-Editor
January 26, 2024:
- Paper withdrawn: Animatable 3D Gaussians for High-fidelity Synthesis of Human Motions
- Added 3 papers: EndoGaussians, PSAvatar, and GauU-Scene
January 25, 2024:
- Added viewer: Splatapult - a 3D Gaussian splatting renderer in C++ and OpenGL that works with OpenXR for tethered VR
January 24, 2024:
- Added utility: GSOPs (Gaussian Splat Operators) for SideFX Houdini
- Code release: GaussianAvatars
January 23, 2024:
- Added 3 papers: Amortized Gen3D, Deformable Endoscopic Tissues, Fast Dynamic 3D Object Generation
- Code release: Animatable Avatars, Compressed 3D Gaussians, GaussianAvatar
January 13, 2024:
- Added 4 papers: CoSSegGaussians, TRIPS, Gaussian Shadow Casting for Neural Characters, and DISTWAR
January 9, 2024:
- Added 1 paper: A Survey on 3D Gaussian Splatting (the first survey)
January 8, 2024:
- Added 4 papers: SWAGS (a 2023 paper that I forgot to add earlier), the first review paper, Compressed 3DGS, and an application paper on characterizing satellite geometry
January 7, 2024:
- 1 open-source implementation: taichi-splatting - originally derived from Taichi 3D Gaussian Splatting, with significant reorganization and changes
January 5, 2024:
- Added 3 papers: FMGS, PEGASUS, and Repaint123
January 2, 2024:
- Added 1 paper: Street Gaussians
January 2, 2024:
- Updated the Deblurring Gaussians paper link
- SAGA code released
- Added 2 papers from 2023: Text2Immersion and 2D-Guided 3DG Segmentation
- Mathematical supplement for the gsplat library
- Added years to the categories
- GSM code released
December 29, 2023:
- Added 1 paper (apparently missed earlier): Gaussian-Head-Avatar
- Added blog post on head avatars
December 29, 2023:
- Added 3 papers: DreamGaussian4D, 4DGen, and Spacetime Gaussian
December 27, 2023:
- Added 3 papers: LangSplat, Deformable 3DGS, and Human101
- Added blog post: Comprehensive Review of 3DGS
December 25, 2023:
- Code released for An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes
- GPS-Gaussian code released
December 24, 2023:
- Added 2 papers: Self-Organizing Gaussian Grids and Gaussian Splitting
- Added a repo that improves Gaussian rendering to model more complex scenes
December 21, 2023:
- Added 3 papers: Splatter Image, pixelSplat, and Align Your Gaussians
- Gaussian Grouping code released
December 19, 2023:
- Added 2 papers: GAvatar and GauFRe
December 18, 2023:
- Added utility: SpectacularAI - conversion scripts for different 3DGS conventions
- SuGaR code released
December 16, 2023:
- Added WebGL viewer 3: Gauzilla
December 15, 2023:
- Added 4 papers: DrivingGaussian, iComMa, Triplane, and 3DGS-Avatar
- Relightable Gaussians code released
December 13, 2023:
- Added 5 papers: Gaussian-SLAM, CoGS, ASH, CF-GS, and Photo-SLAM
December 11, 2023:
- Added 2 papers: Gaussian Splatting SLAM and Denoising Scores for 3D Generation
- ScaffoldGS code released
December 8, 2023:
- Added 2 papers: EAGLES and MonoGaussianAvatar
December 7, 2023:
- LucidDreamer code released
- Added 9 papers: GauHuman, HeadGaS, HiFi4G, Gaussian-Flow, Feature-3DGS, Gaussian-Avatar, FlashAvatar, Relightable, and Deblurring Gaussians
December 5, 2023:
- Added 9 papers: NeuSG, GaussianHead, GaussianAvatars, GPS-Gaussian, Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction, SplaTAM, MANUS, Segment Any, and Language Embedded 3D Gaussians
December 4, 2023:
- Added 8 papers: Gaussian Grouping, MD Splatting, DynMF, Scaffold-GS, SparseGS, FSGS, Control4D, and SC-GS
December 1, 2023:
- Added 4 papers: Compact3D, GaussianShader, Periodic Vibration Gaussian, and Gaussian Shell Maps for Efficient 3D Human Generation
- Created a table of contents for each category and added line breaks
November 30, 2023:
- Added Unreal game engine implementation
- Added 5 papers: LightGaussian, FisherRF, HUGS, HumanGaussian, CG3D, and Multi Scale 3DGS
November 29, 2023:
- Added two papers: Point'n Move and IR-GS
November 28, 2023:
- Added five papers: GaussianEditor, Relightable Gaussians, GART, Mip-Splatting, HumanGaussian
November 27, 2023:
- Added two papers: Gaussian Editing and Compact 3D Gaussians
November 25, 2023:
- Added the Animatable Gaussians project (paper not yet released)
November 22, 2023:
- Added 3 new GS papers: Animatable, Depth-Regularized, and Monocular/Multi-view 3DGS
- Added some classic papers
- Added another GS paper called LucidDreamer
November 21, 2023:
- Added 3 new GS papers: GaussianDiffusion, LucidDreamer, PhysGaussian
- Added 2 more GS papers: SuGaR, PhysGaussian
November 21, 2023:
November 17, 2023:
- Added PlayCanvas implementation to the Game Engines section
November 16, 2023:
- Deformable 3D Gaussians code released
- Added the Drivable 3D Gaussian Avatars paper
November 8, 2023:
- Some notes on 3DGS implementation and a discussion of a universal format
November 4, 2023:
- Added 2D Gaussian splatting
- Added a very detailed (technical) blog post explaining 3D Gaussian splatting
October 28, 2023:
- Added Utilities section
- Added 3DGS Converter, for editing 3DGS .ply files in CloudCompare, to Utilities
- Added Kapture (for Bundler to Colmap model conversion) and the Kapture image cropping script, with conversion instructions, to Utilities
October 23, 2023:
- Added Python WebGL viewer 2
- Added an introductory video blog on Gaussian splatting (and the Unity viewer)
October 21, 2023:
- Added Python OpenGL viewer
- Added TypeScript WebGPU viewer
October 20, 2023:
- Made abstracts readable (removed hyphenation)
- Added Windows tutorial
- Other minor text fixes
- Added Jupyter notebook viewer
October 19, 2023:
- Added GitHub page link for Real-time Photorealistic Dynamic Scene Representation
- Reordered headings
- Added other unofficial implementations
- Moved Nerfstudio gsplat and fast: C++/CUDA to Unofficial Implementations
- Added Nerfstudio, Blender, WebRTC, iOS, and Metal viewers
October 17, 2023:
- GaussianDreamer code released
- Added Real-time Photorealistic Dynamic Scene Representation
October 16, 2023:
- Added the Deformable 3D Gaussians paper
- Dynamic 3D Gaussians code released
October 15, 2023:
- Initial list with the first 6 papers
Seminal Paper introducing 3D Gaussian Splatting:
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Authors: Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis
Abstract
Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space. Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene. Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
Paper (Low Resolution) | Paper (High Resolution) | Project Page | Code | Short Presentation | Explanation Video
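The abstract above mentions optimizing an anisotropic covariance per Gaussian. As a minimal sketch, assuming the commonly described parameterization in which each Gaussian stores a scale vector and a rotation quaternion and the covariance is built as Σ = R S Sᵀ Rᵀ (this factorization is not stated in the abstract, and the function names below are hypothetical), the construction could look like this in plain Python:

```python
import math

def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix (list of rows)."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)  # normalize so R is a valid rotation
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Assumed parameterization: Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotation(quat)
    # M = R S: column j of R is multiplied by scale[j]
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, which is symmetric and positive semi-definite by construction
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Usage: an axis-aligned Gaussian, then the same Gaussian rotated 90 degrees about z
sigma_axis = covariance((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0))
half = math.sqrt(0.5)
sigma_rot = covariance((1.0, 2.0, 3.0), (half, 0.0, 0.0, half))
```

Building the covariance from a rotation and a scale keeps it symmetric and positive semi-definite for any parameter values, which is convenient when the parameters are updated by gradient descent.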
3D Object Detection:
2024:
1. 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
Authors: Yang Cao, Yuanliang Jv, Dan Xu
Abstract
Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3D object detection through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations with faster rendering. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS primarily relies on 2D pixel-level supervision, resulting in an unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD. (ii) Excessive background blobs: 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, which negatively affects detection. To tackle challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution that incorporates 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background (see Fig. 1). To address challenge (ii), we propose a Box-Focused Sampling strategy that uses 2D boxes to generate an object probability distribution in 3D space, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from the proposed Boundary Guidance and Box-Focused Sampling, our final method, 3DGS-DET, achieves significant improvements (+5.6 on mAP@0.25, +3.7 on mAP@0.5) over our basic pipeline version, without introducing any additional learnable parameters. Furthermore, 3DGS-DET significantly outperforms the state-of-the-art NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and an impressive +31.5 on mAP@0.25 for the ARKITScenes dataset. Code and models are publicly available at: https://github.com/yangcaoai/3DGS-DET
Paper | Code (not yet)
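To make the idea of probabilistic sampling guided by 2D boxes more concrete, here is a rough sketch. It is not the authors' implementation: it assumes each blob has already been projected to a pixel, assigns a hypothetical keep-probability depending on whether that pixel falls inside any 2D box (`p_in` and `p_out` are made-up values), and then keeps each blob with that probability.

```python
import random

def object_probability(pixels, boxes, p_in=0.9, p_out=0.1):
    """Hypothetical rule: higher keep-probability for blobs projecting inside a 2D box.

    pixels: list of (u, v) projected blob centers (the projection is assumed given)
    boxes:  list of (u_min, v_min, u_max, v_max) 2D boxes
    """
    probs = []
    for u, v in pixels:
        inside = any(u0 <= u <= u1 and v0 <= v <= v1 for u0, v0, u1, v1 in boxes)
        probs.append(p_in if inside else p_out)
    return probs

def sample_blobs(blobs, probs, seed=0):
    """Keep each blob independently with its probability (seeded for repeatability)."""
    rng = random.Random(seed)
    return [b for b, p in zip(blobs, probs) if rng.random() < p]

# Usage: two blobs project inside the box, one projects onto the background
pixels = [(10, 10), (12, 15), (200, 200)]
boxes = [(0, 0, 50, 50)]
probs = object_probability(pixels, boxes)
kept = sample_blobs(["blob_a", "blob_b", "blob_c"], probs)
```

Sampling by probability instead of hard filtering keeps some background context while thinning out most background blobs; how the real method derives its 3D probabilities is described in the paper, not here.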
Autonomous Driving:
2024:
1. Street Gaussians for Modeling Dynamic Urban Scenes
Authors: Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng
Abstract
This paper aims to tackle the problem of modeling dynamic urban street scenes from monocular videos. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, significant limitations are their slow training and rendering speed, coupled with the critical need for high precision in tracked vehicle poses. We introduce Street Gaussians, a new explicit scene representation that tackles all these limitations. Specifically, the dynamic urban street is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground object vehicles, each object point cloud is optimized with optimizable tracked poses, along with a dynamic spherical harmonics model for the dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn allows for scene editing operations and rendering at 133 FPS (1066×1600 resolution) within half an hour of training. The proposed method is evaluated on multiple challenging benchmarks, including the KITTI and Waymo Open datasets. Experiments show that the proposed method consistently outperforms state-of-the-art methods across all datasets. Furthermore, the proposed representation delivers performance on par with that achieved using precise ground-truth poses, despite relying only on poses from an off-the-shelf tracker.
Paper | Project Page | Code (not yet)
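The composition of foreground vehicles and background described above can be illustrated with a small sketch. It assumes, for simplicity, that each tracked pose is a yaw rotation about the vertical axis plus a translation, and that only the Gaussian centers are moved; the function names and data layout are hypothetical and do not come from the paper's code.

```python
import math

def object_to_world(points, yaw, translation):
    """Apply a rigid pose: rotate about the z-axis by `yaw`, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]

def compose_scene(background, objects, poses):
    """Concatenate static background points with each object's posed points.

    objects: dict name -> list of object-local (x, y, z) Gaussian centers
    poses:   dict name -> (yaw, (tx, ty, tz)) tracked pose for the current frame
    """
    scene = list(background)
    for name, local_points in objects.items():
        yaw, translation = poses[name]
        scene.extend(object_to_world(local_points, yaw, translation))
    return scene

# Usage: one background point and one car whose local point sits 1 m ahead of its origin
background = [(0.0, 0.0, 0.0)]
objects = {"car_0": [(1.0, 0.0, 0.0)]}
poses = {"car_0": (math.pi / 2, (10.0, 0.0, 0.0))}
scene = compose_scene(background, objects, poses)
```

Because the objects live in their own local frames, editing a scene (moving or removing a vehicle) amounts to changing or dropping an entry in `poses`. A full implementation would also rotate each Gaussian's covariance by the same pose.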
2. TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes
Authors: Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
Abstract
Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes the capabilities of LiDAR data but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel-view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data to enrich the properties of 3D Gaussians for splatting. The properties of the 3D Gaussians are not only initialized in alignment with the 3D mesh, which provides more complete 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and the nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Using a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS at a resolution of 1920x1280 (Waymo) and 120 FPS at a resolution of 1600x900 (nuScenes) in urban scenarios.
Paper
3. OmniRe: Omni Urban Scene Reconstruction
Authors: Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang
Abstract
We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipeline for dynamic urban scene reconstruction. To that end, we propose a comprehensive 3DGS framework for driving scenes, named OmniRe, that allows for accurate, full-length reconstruction of diverse dynamic objects in a driving log. OmniRe builds dynamic neural scene graphs based on Gaussian representations and constructs multiple local canonical spaces that model various dynamic actors, including vehicles, pedestrians, and cyclists, among many others. This capability is unmatched by existing methods. OmniRe allows us to holistically reconstruct the different objects present in the scene, subsequently enabling the simulation of reconstructed scenarios with all actors participating in real time (~60 Hz). Extensive evaluations on the Waymo dataset show that our approach outperforms prior state-of-the-art methods quantitatively and qualitatively by a large margin. We believe our work fills a critical gap in driving reconstruction.
Paper | Project Page | Code
2023:
1. [CVPR '24] DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Authors: Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Abstract
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing each object and restoring their accurate positions and occlusion relationships within the scene. We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes in greater detail and maintain panoramic consistency. DrivingGaussian outperforms existing methods in driving scene reconstruction and enables photorealistic surround-view synthesis with high fidelity and multi-camera consistency.
Paper | Project Page | Code (not yet)
2. [CVPR '24] HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Authors: Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao
Abstract
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that uses 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. Our approach can render new viewpoints in real time, yields 2D and 3D semantic information with high accuracy, and reconstructs dynamic scenes, even in scenarios where 3D bounding box detections are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach.
Paper | Project Page | Code
Avatars:
2024:
1. GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting
Authors: Mengtian Li, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang
Abstract
In this work, we propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting. Compared with costly neural radiance based models, 3D Gaussian Splatting has recently demonstrated great performance in terms of training time and rendering quality. However, applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details. To address these challenges, our method considers explicit pose-guided deformation to associate dynamic Gaussians across the canonical space and the observation space; introducing a physically-based prior with regularized transformations helps mitigate ambiguity between the two spaces. During the training process, we further propose a pose refinement strategy that updates the pose regression to compensate for inaccurate initial estimation, and a split-with-scale mechanism that enhances the density of the regressed point clouds. The experiments validate that our method can achieve state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies, along with explicit geometry reconstruction.
Paper
2. PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Creation with 3D Gaussian Splatting
Authors: Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, Kanglin Liu
Abstract
Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult, and existing methods have to trade off between speed and quality. 3DMM-based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussians have been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussians to head avatar creation remains a major challenge, since it is difficult for 3D Gaussians to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that uses discrete geometric primitives to create a parametric morphable shape model and employs 3D Gaussians for fine detail representation and high-fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM), which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes, enabling the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to use 3D Gaussians for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects, and that the avatars can be animated in real time (≥ 25 fps at a resolution of 512 × 512).
Paper
3. Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos
Authors: Alfredo Rivero, ShahRukh Athar, Zhixin Shu, Dimitris Samaras
Abstract
Creating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. The recent development of 3D Gaussian Splatting (3DGS) has shown improvements in rendering quality and training efficiency. However, it remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them to the 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method that is guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments.
Paper | Project Page
4. HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Authors: Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang
Abstract
Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising outcomes obtained through 2D diffusion priors in recent works, current methods face challenges in achieving high-quality and animated avatars effectively. In this paper, we present HeadStudio, a novel framework that uses 3D Gaussian splatting to generate realistic and animated avatars from text prompts. Our method drives 3D Gaussians semantically to create a flexible and achievable appearance through the intermediate FLAME representation. Specifically, we incorporate FLAME into both the 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, which drives 3D Gaussian points by rigging each point to a FLAME mesh; 2) FLAME-based score distillation sampling, which uses a FLAME-based fine-grained control signal to guide score distillation from the text prompt. Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting visually appealing appearances. The avatars are capable of rendering high-quality real-time (≥ 40 fps) novel views at a resolution of 1024. They can be smoothly controlled by real-world speech and video. We hope that HeadStudio can advance digital avatar creation and that the present method can be widely applied across various domains.
Paper | Project Page | Code (not yet)
5. ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting
Authors: Georgii Stanishevskii, Jakub Steczkiewicz, Tomasz Szczepanik, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek
Abstract
Numerous emerging deep-learning techniques have had a substantial impact on computer graphics. Among the most promising breakthroughs are the recent rise of Neural Radiance Fields (NeRFs) and Gaussian Splatting (GS). NeRFs encode an object's shape and color in neural network weights using a handful of images with known camera positions to generate novel views. In contrast, GS provides accelerated training and inference without a decrease in rendering quality by encoding the object's characteristics in a collection of Gaussian distributions. These two techniques have found many use cases in spatial computing and other domains. On the other hand, the emergence of deepfake methods has sparked considerable controversy. Such techniques can take the form of artificial-intelligence-generated videos that closely mimic authentic footage. Using generative models, they can modify facial features, enabling the creation of altered identities or facial expressions that exhibit a remarkably realistic resemblance to a real person. Despite these controversies, deepfakes can offer a next-generation solution for avatar creation and gaming when of desirable quality. To that end, we show how to combine all these emerging technologies to obtain a more plausible outcome. Our ImplicitDeepfake uses a classical deepfake algorithm to modify all training images separately and then trains NeRF and GS on the modified faces. Such a relatively simple strategy can produce plausible 3D deepfake-based avatars.
Paper | Code (not yet)
6. GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians
Authors: Haimin Luo, Min Ouyang, Zijun Zhao, Suyi Jiang, Longwen Zhang, Qixuan Zhang, Wei Yang, Lan Xu, Jingyi Yu
Abstract
Hairstyle reflects culture and ethnicity at first glance. In the digital era, a variety of realistic human hairstyles is also critical to high-fidelity digital human assets for beauty and inclusivity. Yet realistic hair modeling and real-time rendering for animation is a formidable challenge due to the sheer number of strands, complicated geometric structures, and sophisticated interaction with light. This paper presents GaussianHair, a novel explicit hair representation. It enables comprehensive modeling of hair geometry and appearance from images, fostering innovative illumination effects and dynamic animation capabilities. At the heart of GaussianHair is the novel concept of representing each hair strand as a sequence of connected cylindrical 3D Gaussian primitives. This approach not only retains the hair's geometric structure and appearance but also allows for efficient rasterization onto a 2D image plane, facilitating differentiable volumetric rendering. We further enhance this model with the "GaussianHair Scattering Model", adept at recreating the slender structure of hair strands and accurately capturing their local diffuse color in uniform lighting. Through extensive experiments, we substantiate that GaussianHair achieves breakthroughs in both geometric and appearance fidelity, transcending the limitations encountered in state-of-the-art methods for hair reconstruction. Beyond representation, GaussianHair extends to support editing, relighting, and dynamic rendering of hair, offering seamless integration with conventional CG pipeline workflows. Complementing these advancements, we have compiled an extensive dataset of real human hair, each with meticulously detailed strand geometry, to propel further research in this field.
Paper
7. GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos
Authors : Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang
Abstract
In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions and accurately aligning 3D Gaussians with human skin surfaces. The key contributions of this paper are twofold. First, we introduce a pose refinement technique that improves hand and foot pose accuracy by aligning normal maps and silhouettes. Precise pose is crucial for correct shape and appearance reconstruction. Second, we address the problems of unbalanced aggregation and initialization bias, which previously diminished the quality of 3D Gaussian avatars, through a novel surface-guided re-initialization method that ensures accurate alignment of 3D Gaussian points with avatar surfaces. Experimental results demonstrate that the proposed method achieves high-fidelity and vivid 3D Gaussian avatar reconstruction. Extensive experimental analyses validate the performance qualitatively and quantitatively, demonstrating state-of-the-art performance in photo-realistic novel view synthesis while offering fine-grained control over the human body and hand pose. - Paper | Project Page | Code (not yet)
8. [CVPR '24] SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Authors : Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang
Abstract
We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders at over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans in which the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by the mesh, which makes the representation compatible with various animation techniques, such as skeletal animation and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets. - Paper | Project Page | Code | Short Presentation
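As an illustration of the mesh-embedded parameterization described in the SplattingAvatar abstract, the sketch below places one Gaussian center from barycentric coordinates plus a displacement along the interpolated vertex normal. The function names, the sample triangle, and the scalar `displacement` are assumptions made for this example; the released code may organize this differently.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def barycentric_blend(bary, a, b, c):
    """Weighted sum of three 3D vectors with barycentric weights (u, v, w)."""
    u, v, w = bary
    return tuple(u * a[i] + v * b[i] + w * c[i] for i in range(3))

def embedded_position(bary, verts, normals, displacement):
    """Point on the triangle plus an offset along the interpolated (Phong) normal."""
    p = barycentric_blend(bary, *verts)
    n = normalize(barycentric_blend(bary, *normals))
    return tuple(p[i] + displacement * n[i] for i in range(3))

# Hypothetical triangle in the z = 0 plane with all vertex normals pointing up.
verts = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
normals = ((0.0, 0.0, 1.0),) * 3
center = embedded_position((0.25, 0.25, 0.5), verts, normals, 0.1)
```

Because the center is defined relative to the triangle, moving the vertices moves the Gaussian with them, which is how a mesh animation can drive the splats.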
9. SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface
Authors : Jiahao Luo, Jing Liu, James Davis
Abstract
We present SplatFace, a novel Gaussian splatting framework designed for 3D human face reconstruction without relying on accurate pre-determined geometry. Our method is designed to deliver both high-quality novel view rendering and accurate 3D mesh reconstructions at the same time. We incorporate a generic 3D Morphable Model (3DMM) to provide a surface geometric structure, making it possible to reconstruct faces from a limited set of input images. We introduce a joint optimization strategy that refines both the Gaussians and the morphable surface through a synergistic non-rigid alignment process. A novel distance metric, splat-to-surface, is proposed to improve alignment by considering both the Gaussian position and covariance. The surface information is also used to incorporate a world-space densification process, resulting in superior reconstruction quality. Our experimental analysis demonstrates that the proposed method is competitive with other Gaussian splatting techniques in novel view synthesis and with other 3D reconstruction methods in producing 3D face meshes of high geometric precision. - Paper
10. HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
Authors : David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue
Abstract
We present HAHA, a novel approach for generating animatable human avatars from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high-fidelity rendering. We demonstrate its efficiency in animating and rendering full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, such as hair and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar and in fewer rendering artifacts. It also allows us to handle the animation of small body parts such as fingers, which are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method reaches reconstruction quality on par with the state of the art on SnapshotPeople while using fewer than a third of the Gaussians. HAHA outperforms the previous state of the art on novel poses from X-Humans both quantitatively and qualitatively. - Paper
11. [CVPRW '24] Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks
Authors : Florian Barthel, Arian Beckmann, Wieland Morgenstern, Anna Hilsmann, Peter Eisert
Abstract
NeRF-based 3D-aware Generative Adversarial Networks (GANs) such as EG3D or GIRAFFE have shown very high rendering quality under large representational variety. However, rendering with Neural Radiance Fields poses several challenges for most 3D applications. First, the significant computational demands of NeRF rendering preclude its use on low-power devices such as mobile phones and VR/AR headsets. Second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware GANs with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we can integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. Additionally, our approach allows for high-resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes. - Paper | Project Page | Code
12. GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Authors : Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang
Abstract
We introduce GoMAvatar, a novel approach for real-time, high-quality human modeling. GoMAvatar takes a single monocular video as input to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while integrating seamlessly with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh representation, a hybrid 3D model that combines the rendering quality and speed of Gaussian splatting with the geometry modeling and compatibility of deformable meshes. We evaluate GoMAvatar on ZJU-MoCap data and various YouTube videos. GoMAvatar matches or surpasses current monocular human modeling algorithms in rendering quality and significantly outperforms them in computational efficiency (43 FPS) while being memory-efficient (3.63 MB per subject). - Paper | Project Page | Code
13. OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering
Authors : Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu
Abstract
Rendering dynamic 3D humans from monocular videos is crucial for applications such as virtual reality and digital entertainment. Most methods assume the person is in an unobstructed scene, while in real-life scenarios various objects may occlude body parts. Previous methods use NeRF for surface rendering to recover the occluded areas, but they require more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian, based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings at up to 160 FPS from occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform an occlusion feature query at occluded regions, where the aggregated pixel-aligned feature is extracted to compensate for the missing information. We then use a Gaussian Feature MLP to further process the feature, along with occlusion-aware loss functions to better perceive the occluded area. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method. We also improve training and inference speeds by 250x and 800x, respectively. - Paper
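The figures quoted in the OccGaussian abstract can be cross-checked with simple arithmetic: multiplying the reported training time and per-frame render time by the claimed speedups should land near the baseline costs described earlier in the abstract. The snippet only rearranges numbers from the abstract, and it assumes the speedups are measured against that NeRF-based baseline.

```python
# All four inputs are taken from the abstract.
train_minutes = 6
render_fps = 160
train_speedup = 250
render_speedup = 800

# Assumption: speedups are relative to the NeRF-based baseline.
baseline_train_hours = train_minutes * train_speedup / 60
baseline_seconds_per_frame = render_speedup / render_fps

print(f"implied baseline training time: {baseline_train_hours:.1f} h")
print(f"implied baseline render time: {baseline_seconds_per_frame:.1f} s/frame")
```

The implied values (25.0 h and 5.0 s per frame) agree with the "more than one day" and "several seconds" quoted for the earlier approach.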
14. [CVPR '24] Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
Authors : Inhee Lee, Byungjun Kim, Hanbyul Joo
Abstract
In this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation, which lets us conveniently compose and render them together. In particular, we address scenarios with severely limited and sparse observations in 3D human reconstruction, a common challenge encountered in the real world. To tackle this challenge, we introduce a novel approach that optimizes the 3D-GS representation in a canonical space by fusing the sparse cues in the common space, where we leverage a pre-trained 2D diffusion model to synthesize unseen views while keeping consistency with the observed 2D appearances. We demonstrate that our method can reconstruct high-quality 3D humans in various challenging examples, in the presence of occlusion, image crops, few-shot, and extremely sparse observations. After reconstruction, our method can not only render the scene from any novel view at arbitrary time instances, but also edit the 3D scene by removing individual humans or applying different motions to each human. Through various experiments, we demonstrate the quality and efficiency of our method over existing alternative approaches. - Paper | Project Page | Code
15. [NeurIPS '24] Generalizable and Animatable Gaussian Head Avatar
Authors : Xuangeng Chu, Tatsuya Harada
Abstract
In this paper, we propose the Generalizable and Animatable Gaussian head Avatar (GAGAvatar). Existing methods rely on neural radiance fields, leading to heavy rendering consumption. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without identity-specific optimization and perform reenactment rendering at real-time speeds. Experiments show that our method outperforms previous methods in terms of reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance applications of digital avatars. - Paper | Project Page | Code
16. [SIGGRAPH Asia '24] DualGS: Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
Authors : Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu
Abstract
Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between the digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impede broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to represent motion and appearance separately, using corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, requiring only approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to watch musicians in performance and feel the rhythm of the notes at the performers' fingertips. - Paper | Project Page | Short Presentation | Dataset
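To put the DualGS storage figures in perspective, the short calculation below derives the implied uncompressed size per frame and a rough streaming bitrate. The compression ratio and per-frame size come from the abstract; the playback rate of 30 frames per second and the decimal unit convention (1 MB = 1000 KB) are assumptions made only for this estimate.

```python
compressed_kb_per_frame = 350   # from the abstract (approximate)
compression_ratio = 120         # from the abstract (stated as "up to")
assumed_fps = 30                # assumption: not stated in the abstract

# Implied uncompressed size, using 1 MB = 1000 KB.
raw_mb_per_frame = compressed_kb_per_frame * compression_ratio / 1000

# Rough bitrate for streaming the compressed frames at the assumed rate.
stream_mbit_per_s = compressed_kb_per_frame * assumed_fps * 8 / 1000

print(f"implied raw size: {raw_mb_per_frame:.1f} MB/frame")
print(f"streaming bitrate at {assumed_fps} fps: {stream_mbit_per_s:.1f} Mbit/s")
```

Since the ratio is quoted as an upper bound, the raw size of about 42 MB per frame should be read as a best-case estimate rather than a measured value.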
17. [SIGGRAPH Asia '24] V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
Authors : Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, Lan Xu
Abstract
Experiencing high-fidelity volumetric video as seamlessly as 2D video is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, which facilitates the use of hardware video codecs. Additionally, we propose a two-stage training strategy that reduces storage requirements while training rapidly. The first stage employs hash encoding and a shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine-tunes the other Gaussian attributes using a residual entropy loss and a temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. We also designed a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3, which outperforms other methods by enabling high-quality rendering and streaming on common devices, something not seen before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. The source code is available from the project page. - Paper | Project Page | Short Presentation
2023:
1. Drivable 3D Gaussian Avatars
Authors : Wojciech Zielonka, Timur Bagautdinov, Shunsuke Saito, Michael Zollhöfer, Justus Thies, Javier Romero
Abstract
We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time frame rates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data. - Paper | Project Page | Short Presentation
2. SplatArmor: Articulated Gaussian Splatting for Animatable Humans from Monocular RGB Videos
Authors : Rohit Jena, Ganesh Subramanian Iyer, Siddharth Choudhary, Brandon Smith, Pratik Chaudhari, James Gee
Abstract
We propose SplatArmor, a novel approach for recovering detailed and animatable human models by 'armoring' a parameterized body model with 3D Gaussians. Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependent effects, we introduce an SE(3) field, which allows us to capture both the location and the anisotropy of the Gaussians. Furthermore, we propose the use of a neural color field to provide color regularization and 3D supervision for the precise positioning of these Gaussians. We show that Gaussian splatting provides an interesting alternative to neural-rendering-based methods by leveraging a rasterization primitive without facing the non-differentiability and optimization challenges typically encountered in such approaches. The rasterization paradigm allows us to leverage forward skinning, and it does not suffer from the ambiguities associated with inverse skinning and warping. We show compelling results on the ZJU MoCap and People Snapshot datasets, which underscore the effectiveness of our method for controllable human synthesis. - Paper | Project Page | Code (not yet)
3. [CVPR '24] Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Authors : Zhe Li, Zerong Zheng, Lizhen Wang, Yebin Liu
Abstract
Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos and then parameterize the template on two canonical Gaussian maps, front and back, where each pixel represents a 3D Gaussian. The learned template adapts to the worn garments, which allows modeling looser clothes such as dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization to novel poses. Overall, our method can create lifelike avatars with dynamic, realistic, and generalized appearances. Experiments show that our method outperforms other state-of-the-art approaches. - Paper | Project Page | Code
4. [CVPR '24] GART: Gaussian Articulated Template Models
Authors : Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis
Abstract
We introduce the Gaussian Articulated Template Model (GART), an explicit, efficient, and expressive representation for capturing and rendering non-rigid articulated subjects from monocular videos. GART uses a mixture of moving 3D Gaussians to explicitly approximate the geometry and appearance of a deformable subject. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning. GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses at more than 150 FPS. - Paper | Project Page | Code | Short Presentation
5. [CVPR '24] Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Authors : Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero
Abstract
This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While the classical approaches to modeling and rendering virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real time, and their quality degrades when the character is animated with body poses that differ from the training observations. We propose an animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of Gaussian primitives in a canonical space, which is deformed with a coarse-to-fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (HuGS) model end to end from multi-view observations, and we evaluate it against state-of-the-art approaches for novel pose synthesis of clothed bodies. Our method achieves a 1.5 dB PSNR improvement over the state of the art on the THuman4 dataset while being able to render in real time (80 FPS at 512x512 resolution). - Paper | Project Page | Short Presentation
6. [CVPR '24] HUGS: Human Gaussian Splats
Authors : Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
Abstract
Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS), which represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of frames (50-100), and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We use the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g., cloth, hair), we allow the 3D Gaussians to deviate from the human body model. Using 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of the human and novel view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality at a rendering speed of 60 FPS while being ~100x faster to train than previous work. - Paper | Project Page | Code (not yet)
7. [CVPR '24] Gaussian Shell Maps for Efficient 3D Human Generation
Authors : Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein
Abstract
Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering GAN training and requiring multi-view-inconsistent 2D upsamplers. Here, we introduce Gaussian Shell Maps (GSMs) as a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives. In this setting, a CNN generates a 3D texture stack with features that are mapped to the shells. The latter represent inflated and deflated versions of a template surface of a digital human in a canonical body pose. Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells, whose attributes are encoded in the texture features. These Gaussians are rendered efficiently and differentiably. The ability to articulate the shells is important during GAN training and, at inference time, for deforming a body into arbitrary user-defined poses. Our efficient rendering scheme bypasses the need for view-inconsistent upsamplers and achieves high-quality multi-view-consistent renderings at a native resolution of 512 × 512 pixels. We demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion. - Paper | Project Page | Code
8. GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation
Authors : Jie Wang, Jiu-Cheng Xie, Xianyan Li, Chi-Man Pun, Feng Xu, Hao Gao
Abstract
Constructing vivid 3D head avatars for given subjects and realizing a series of animations on them is valuable yet challenging. This paper presents GaussianHead, which models the actional human head with anisotropic 3D Gaussians. In our framework, a motion deformation field and a multi-resolution tri-plane are constructed to deal with the head's dynamic geometry and complex texture, respectively. Notably, we impose an exclusive derivation scheme on each Gaussian, which generates multiple doppelgangers of it through a set of learnable parameters for position transformation. With this design, we can compactly and accurately encode the appearance information of Gaussians, even those fitting particular components of the head with sophisticated structures. In addition, an inherited derivation strategy for newly added Gaussians is adopted to accelerate training. Extensive experiments show that our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel view synthesis tasks. - Paper | Project Page | Code
9. [CVPR '24] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Authors : Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner
Abstract
We introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing precise animation control via the underlying parametric model, e.g., through expression transfer from a driving sequence or by manually changing the morphable model parameters. We parameterize each splat by the local coordinate frame of a triangle and optimize for an explicit displacement offset to obtain a more accurate geometric representation. During avatar reconstruction, we jointly optimize the morphable model parameters and the Gaussian splat parameters in an end-to-end fashion. We demonstrate the animation capabilities of our photorealistic avatar in several challenging scenarios. For instance, we show reenactments from a driving video, where our method outperforms existing works by a significant margin. - Paper | Project Page | Code | Short Presentation
10. [CVPR '24] GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Authors : Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu
Abstract
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in real time. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods, which require per-subject optimization, we introduce Gaussian parameter maps defined on the source views. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module that lifts the 2D parameter maps to 3D space. The proposed framework is fully differentiable, and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceptional rendering speed. - Paper | Project Page | Code | Short Presentation
11. GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Authors : Shoukang Hu, Ziwei Liu
Abstract
We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1~2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks demanding hours of training and seconds of rendering per frame. Specifically, GauHuman encodes Gaussian Splatting in the canonical space and transforms 3D Gaussians from canonical space to posed space with linear blend skinning (LBS), in which effective pose and LBS refinement modules are designed to learn fine details of 3D humans under negligible computational cost. Moreover, to enable fast optimization of GauHuman, we initialize and prune 3D Gaussians with 3D human prior, while splitting/cloning via KL divergence guidance, along with a novel merge operation for further speeding up. Extensive experiments on ZJU_Mocap and MonoCap datasets demonstrate that GauHuman achieves state-of-the-art performance quantitatively and qualitatively with fast training and real-time rendering speed. Notably, without sacrificing rendering quality, GauHuman can quickly model the 3D human performer with ~13k 3D Gaussians. - Paper | Project Page | Code | Short Presentation
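The canonical-to-posed transform mentioned in the GauHuman abstract is standard linear blend skinning. A minimal sketch for a single Gaussian center follows; the two bones, their weights, and all function names are hypothetical, and the paper's pose and LBS refinement modules are not modeled here.

```python
def blend_transforms(weights, transforms):
    """Weighted sum of 3x4 bone transforms [R | t] (linear blend skinning)."""
    return [[sum(w * T[r][c] for w, T in zip(weights, transforms))
             for c in range(4)] for r in range(3)]

def apply_transform(T, p):
    """Apply a 3x4 transform to a 3D point."""
    return tuple(T[r][0] * p[0] + T[r][1] * p[1] + T[r][2] * p[2] + T[r][3]
                 for r in range(3))

# Two hypothetical bones: identity, and a pure translation by +2 along x.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shift_x = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]

canonical_center = (0.0, 1.0, 0.0)
weights = (0.75, 0.25)  # skinning weights, chosen to sum to 1

posed_center = apply_transform(
    blend_transforms(weights, (identity, shift_x)), canonical_center)
```

With these weights the center moves a quarter of the way toward the translated bone. In a full pipeline the rotational part of the blended transform would typically also rotate the Gaussian's covariance; that step is omitted to keep the example short.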
12. HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Authors : Helisa Dhamo, Yinyu Nie, Arthur Moreau, Jifei Song, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero
Abstract
3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by the advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, the first model to use 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper we introduce a hybrid model that extends the explicit representation from 3DGS with a base of learnable latent features, which can be linearly blended with low-dimensional parameters from parametric head models to obtain expression-dependent final color and opacity values. We demonstrate that HeadGaS delivers state-of-the-art results in real-time inference frame rates, which surpasses baselines by up to ~2dB, while accelerating rendering speed by over 10x. - Paper
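The linear blending step described in the HeadGaS abstract can be pictured as a weighted sum of per-Gaussian latent feature vectors, with the expression parameters acting as weights. The sketch below is only an assumed reading of that sentence: the basis size, the feature dimension, and the names are hypothetical, and the step that turns the blended feature into color and opacity is left out.

```python
def blend_features(expression, basis):
    """Linear blend: sum over k of expression[k] * basis[k]."""
    dim = len(basis[0])
    return [sum(e * row[d] for e, row in zip(expression, basis))
            for d in range(dim)]

# Hypothetical latent basis for one Gaussian: 2 basis vectors of dimension 3.
basis = [[1.0, 0.0, 2.0],
         [0.0, 4.0, 2.0]]

# Hypothetical low-dimensional expression parameters, one weight per basis vector.
expression = [0.5, 0.25]

feature = blend_features(expression, basis)
```

Changing only `expression` changes the blended feature while the learned basis stays fixed, which is what makes the resulting appearance expression-dependent.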
13. [CVPR '24] HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Authors : Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu
Abstract
We have recently seen tremendous progress in photo-real human modeling and rendering. Yet, efficiently rendering realistic human performance and integrating it into the rasterization pipeline remains challenging. In this paper, we present HiFi4G, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking, achieving a compact and compression-friendly representation. We first propose a dual-graph mechanism to obtain motion priors, with a coarse deformation graph for effective initialization and a fine-grained Gaussian graph to enforce subsequent constraints. Then, we utilize a 4D Gaussian optimization scheme with adaptive spatial-temporal regularizers to effectively balance the non-rigid prior and Gaussian updating. We also present a companion compression scheme with residual compensation for immersive experiences on various platforms. It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame. Extensive experiments demonstrate the effectiveness of our approach, which significantly outperforms existing approaches in terms of optimization speed, rendering quality, and storage overhead. - Paper | Project Page | Short Presentation | Dataset
14. [CVPR '24] GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Authors : Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, Liqiang Nie
Abstract
We present GaussianAvatar, an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. We start by introducing animatable 3D Gaussians to explicitly represent humans in various poses and clothing styles. Such an explicit and animatable representation can fuse 3D appearances more efficiently and consistently from 2D observations. Our representation is further augmented with dynamic properties to support pose-dependent appearance modeling, where a dynamic appearance network along with an optimizable feature tensor is designed to learn the motion-to-appearance mapping. Moreover, by leveraging the differentiable motion condition, our method enables a joint optimization of motions and appearances during avatar modeling, which helps to tackle the long-standing issue of inaccurate motion estimation in monocular settings. The efficacy of GaussianAvatar is validated on both the public dataset and our collected dataset, demonstrating its superior performances in terms of appearance quality and rendering efficiency. - Paper | Project Page | Code | Short Presentation
15. [CVPR '24] FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding
Authors: Jun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang
Abstract
We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that could reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn extra spatial offset to model non-surface regions and subtle facial details. While full use of geometric priors can capture high-frequency facial details and preserve exaggerated expressions, proper initialization can help reduce the number of Gaussians, thus enabling super-fast rendering speed. Extensive experimental results demonstrate that FlashAvatar outperforms existing works regarding visual quality and personalized details and is almost an order of magnitude faster in rendering speed.
- Paper | Project Page | Code
16. [CVPR '24] Relightable Gaussian Codec Avatars
Authors: Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam
Abstract
The fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with spatially all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit under both point light and continuous illumination. We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars.
- Paper | Project Page
17. MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar
Authors: Yufan Chen, Lizhen Wang, Qijing Li, Hongjiang Xiao, Shengping Zhang, Hongxun Yao, Yebin Liu
Abstract
The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent advancements in head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representation have been exploited for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the extensive quantity of points involved, and the last ones suffer from limitations in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that harnesses 3D Gaussian point representation coupled with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. These points exhibit movement with a Gaussian deformation field in alignment with the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity combined with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results among previous methods.
- Paper | Project Page | Code (not yet) | Short Presentation
18. [CVPR '24] ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Authors: Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann
Abstract
Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real-time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.
- Paper | Project Page | Code (not yet) | Short Presentation
19. [CVPR '24] 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Authors: Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang
Abstract
We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.
- Paper | Project Page | Code | Short Presentation
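As a rough illustration of what "animating" an explicit Gaussian can involve, the sketch below poses one canonical Gaussian with blended bone transforms (linear blend skinning). It is not taken from the 3DGS-Avatar code: the function name `skin_gaussian`, the argument layout, and the use of plain linear blending are assumptions made for this example, and the paper's learned non-rigid deformation network is not modeled.

```python
import numpy as np

def skin_gaussian(mean, cov, weights, bone_transforms):
    """Pose one Gaussian with blended bone transforms (hypothetical helper).

    mean: (3,) canonical mean
    cov: (3, 3) canonical covariance
    weights: (B,) skinning weights that sum to 1
    bone_transforms: (B, 4, 4) homogeneous bone transforms
    """
    # Weighted sum of the bone matrices gives one 4x4 transform.
    blended = np.tensordot(weights, bone_transforms, axes=1)
    A, t = blended[:3, :3], blended[:3, 3]
    posed_mean = A @ mean + t
    # A covariance transforms as A * Sigma * A^T under a linear map A.
    posed_cov = A @ cov @ A.T
    return posed_mean, posed_cov
```

Note that a blend of several rotations is generally not itself a rotation, so real pipelines often treat the rotation part separately; this sketch keeps the blended linear map as-is for brevity.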
20. [CVPR '24] GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Authors: Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal
Abstract
Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.
- Paper | Project Page | Short Presentation
21. Deformable 3D Gaussian Splatting for Animatable Human Avatars
Authors: HyunJun Jung, Nikolas Brasch, Jifei Song, Eduardo Perez-Pellitero, Yiren Zhou, Zhihao Li, Nassir Navab, Benjamin Busam
Abstract
Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: A first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatars learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually.
- Paper
22. Human101: Training 100+FPS Human Gaussians in 100s from 1 View
Authors: Mingwei Li, Jiachen Tao, Zongxin Yang, Yi Yang
Abstract
Reconstructing the human body from single-view videos plays a pivotal role in the virtual reality domain. One prevalent application scenario necessitates the rapid reconstruction of high-fidelity 3D digital humans while simultaneously ensuring real-time rendering and interaction. Existing methods often struggle to fulfill both requirements. In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS. Our method leverages the strengths of 3D Gaussian Splatting, which provides an explicit and efficient representation of 3D humans. Standing apart from prior NeRF-based pipelines, Human101 ingeniously applies a Human-centric Forward Gaussian Animation method to deform the parameters of 3D Gaussians, thereby enhancing rendering speed (i.e., rendering 1024-resolution images at an impressive 60+ FPS and rendering 512-resolution images at 100+ FPS). Experimental results indicate that our approach substantially eclipses current methods, clocking up to a 10 times surge in frames per second and delivering comparable or superior rendering quality.
- Paper | Project Page | Code (not yet)
23. [CVPR '24] Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Authors: Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, Yebin Liu
Abstract
Creating high-fidelity 3D head avatars has always been a research hotspot, but there remains a great challenge under lightweight sparse view setups. In this paper, we propose Gaussian Head Avatar represented by controllable 3D Gaussians for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions.
- Paper | Project Page | Code | Short Presentation
24. HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
Authors: Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu
Abstract
Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat that predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis. Project page: https://humansplat.github.io/.
- Paper | Project Page
Classic work:
1. A Generalization of Algebraic Surface Drawing
Authors: James F. Blinn
Comment: First paper rendering 3D Gaussians.
Abstract
The mathematical description of three-dimensional surfaces usually falls into one of two classifications: parametric and implicit. An implicit surface is defined to be all points which satisfy some equation F (x, y, z) = 0. This form is ideally suited for image space shaded picture drawing; the pixel coordinates are substituted for x and y, and the equation is solved for z. Algorithms for drawing such objects have been developed primarily for first- and second-order polynomial functions, a subcategory known as algebraic surfaces. This paper presents a new algorithm applicable to other functional forms, in particular to the summation of several Gaussian density distributions. The algorithm was created to model electron density maps of molecular structures, but it can be used for other artistically interesting shapes.
- Paper
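The idea in this abstract, an implicit surface F(x, y, z) = 0 built from a sum of Gaussian densities and solved for z at each pixel, can be sketched numerically as follows. This is a minimal illustration and not Blinn's algorithm: the isotropic blob form b * exp(-a * r^2), the parameter names, and the march-then-bisect root search are all assumptions chosen for clarity.

```python
import numpy as np

def density(p, centers, a, b):
    """Sum of isotropic Gaussian blobs: D(p) = sum_i b_i * exp(-a_i * |p - c_i|^2)."""
    d2 = np.sum((centers - p) ** 2, axis=1)
    return float(np.sum(b * np.exp(-a * d2)))

def solve_z(x, y, centers, a, b, threshold, z_near=5.0, z_far=-5.0, steps=2000):
    """Find the first z along the ray (x, y fixed) where D - threshold crosses zero.

    Marches from z_near toward z_far, then refines the crossing by bisection.
    Returns None if the ray never enters the surface.
    """
    def f(z):
        return density(np.array([x, y, z]), centers, a, b) - threshold

    zs = np.linspace(z_near, z_far, steps)
    prev = f(zs[0])
    for z0, z1 in zip(zs[:-1], zs[1:]):
        cur = f(z1)
        if prev < 0.0 <= cur:
            outside, inside = z0, z1
            for _ in range(40):
                mid = 0.5 * (outside + inside)
                if f(mid) < 0.0:
                    outside = mid
                else:
                    inside = mid
            return 0.5 * (outside + inside)
        prev = cur
    return None
```

For a single unit blob at the origin with threshold exp(-1), the surface is the unit sphere, so the ray through the pixel center should hit it at z = 1.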
2. Approximate Differentiable Rendering with Algebraic Surfaces
Authors: Leonid Keselman and Martial Hebert
Comment: First paper to do differentiable rendering optimization of 3D Gaussians.
Abstract
Differentiable renderers provide a direct mathematical link between an object's 3D representation and images of that object. In this work, we develop an approximate differentiable renderer for a compact, interpretable representation, which we call Fuzzy Metaballs. Our approximate renderer focuses on rendering shapes via depth maps and silhouettes. It sacrifices fidelity for utility, producing fast runtimes and high-quality gradient information that can be used to solve vision tasks. Compared to mesh-based differentiable renderers, our method has forward passes that are 5x faster and backwards passes that are 30x faster. The depth maps and silhouette images generated by our method are smooth and defined everywhere. In our evaluation of differentiable renderers for pose estimation, we show that our method is the only one comparable to classic techniques. In shape from silhouette, our method performs well using only gradient descent and a per-pixel loss, without any surrogate losses or regularization. These reconstructions work well even on natural video sequences with segmentation artifacts.
- Paper | Project Page | Code | Short Presentation
3. Unbiased Gradient Estimation for Differentiable Surface Splatting via Poisson Sampling
Authors: Jan U. Müller, Michael Weinmann, Reinhard Klein
Comment: Builds 2D screen-space Gaussians from underlying 3D representations.
Abstract
We propose an efficient and GPU-accelerated sampling framework which enables unbiased gradient approximation for differentiable point cloud rendering based on surface splatting. Our framework models the contribution of a point to the rendered image as a probability distribution. We derive an unbiased approximative gradient for the rendering function within this model. To efficiently evaluate the proposed sample estimate, we introduce a tree-based data-structure which employs multi-pole methods to draw samples in near linear time. Our gradient estimator allows us to avoid regularization required by previous methods, leading to a more faithful shape recovery from images. Furthermore, we validate that these improvements are applicable to real-world applications by refining the camera poses and point cloud obtained from a real-time SLAM system. Finally, employing our framework in a neural rendering setting optimizes both the point cloud and network parameters, highlighting the framework's ability to enhance data driven approaches.
- Paper | Code
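One common way to obtain a 2D screen-space Gaussian from a 3D one is the local affine approximation used in EWA-style splatting and in 3DGS: the 2D covariance is J W Sigma W^T J^T, where W is the world-to-camera rotation and J is the Jacobian of the perspective projection at the Gaussian's center. The sketch below illustrates that formula only; it is not this paper's surface-splatting pipeline, and the pinhole model and the name `project_covariance` are assumptions.

```python
import numpy as np

def project_covariance(cov_world, mean_world, R_cam, t_cam, fx, fy):
    """Project a 3D covariance to a 2D screen-space covariance (pinhole camera).

    R_cam, t_cam map world points to camera space: p_cam = R_cam @ p_world + t_cam.
    fx, fy are focal lengths in pixels. Assumes the center has positive depth.
    """
    x, y, z = R_cam @ mean_world + t_cam
    # Jacobian of (fx * x / z, fy * y / z) with respect to (x, y, z).
    J = np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])
    return J @ R_cam @ cov_world @ R_cam.T @ J.T
```

With an identity camera, a unit covariance at depth 2, and a focal length of 100 pixels, the result is an isotropic 2D Gaussian with variance (100 / 2)^2 = 2500 square pixels.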
4. Generating and Real-Time Rendering of Clouds
Authors: Petr Man
Comment: Splatting of anisotropic Gaussians. Basically a non-differentiable implementation of 3DGS.
Abstract
This paper presents a method for generation and real-time rendering of static clouds. Perlin noise function generates three dimensional map of a cloud. We also present a two-pass rendering algorithm that performs physically based approximation. In the first preprocessed phase it computes multiple forward scattering. In the second phase first order anisotropic scattering at runtime is evaluated. The generated map is stored as voxels and is unsuitable for the real-time rendering. We introduce a more suitable inner representation of cloud that approximates the original map and contains much less information. The cloud is then represented by a set of metaballs (spheres) with parameters such as center positions, radii and density values. The main contribution of this paper is to propose a method, that transforms the original cloud map to the inner representation. This method uses the Radial Basis Function (RBF) neural network.
- Paper
Compression:
2024:
1. [I3D '24] Reducing the Memory Footprint of 3D Gaussian Splatting
Authors: Panagiotis Papantonakis, Georgios Kopanas, Bernhard Kerbl, Alexandre Lanvin, George Drettakis
Abstract
3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and realtime rendering; unfortunately, the memory requirements of this method for storing and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and finally a codebook-based quantization method, together with a half-float representation for further memory reduction. Taken together, these three components result in a ×27 reduction in overall size on disk on the standard datasets we tested, along with a ×1.7 speedup in rendering speed. We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device (see Fig. 1).
- Paper | Project Page | Code (not yet)
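The three storage factors named in this abstract (primitive count, number of spherical-harmonics coefficients, and numeric precision) can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes the usual per-Gaussian attribute layout of position (3), scale (3), rotation quaternion (4), opacity (1), and RGB spherical harmonics with (degree + 1)^2 coefficients per channel; it ignores file headers and any extra fields, and the helper names are made up for this example.

```python
def values_per_gaussian(sh_degree):
    """Number of scalar values stored per Gaussian under the assumed layout."""
    sh_coeffs = 3 * (sh_degree + 1) ** 2  # RGB, (degree + 1)^2 coefficients each
    return 3 + 3 + 4 + 1 + sh_coeffs      # position, scale, rotation, opacity, SH

def megabytes(num_gaussians, sh_degree, bytes_per_value=4):
    """Uncompressed size in MB (10^6 bytes) for a scene of num_gaussians primitives."""
    return num_gaussians * values_per_gaussian(sh_degree) * bytes_per_value / 1e6

print(values_per_gaussian(3))        # 59 values at SH degree 3
print(megabytes(1_000_000, 3))       # 236.0 MB at 32-bit floats
print(megabytes(1_000_000, 3, 2))    # 118.0 MB at half precision
print(megabytes(500_000, 0, 2))      # 14.0 MB: half the count, degree 0, half floats
```

Under these assumptions, 48 of the 59 values per Gaussian are spherical-harmonics coefficients, which is why lowering the SH degree where it is not needed has such a large effect on size.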
2. [CVPR '24] Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Authors: Simon Niedermayr, Josef Stumpfegger, Rüdiger Westermann
Abstract
Recently, high-fidelity scene reconstruction with an optimized 3D Gaussian splat representation has been introduced for novel view synthesis from sparse image sets. Making such representations suitable for applications like network streaming and rendering on low-power devices requires significantly reduced memory consumption as well as improved rendering efficiency. We propose a compressed 3D Gaussian splat representation that utilizes sensitivity-aware vector clustering with quantization-aware training to compress directional colors and Gaussian parameters. The learned codebooks have low bitrates and achieve a compression rate of up to 31× on real-world scenes with only minimal degradation of visual quality. We demonstrate that the compressed splat representation can be efficiently rendered with hardware rasterization on lightweight GPUs at up to 4× higher framerates than reported via an optimized GPU compute pipeline. Extensive experiments across multiple datasets demonstrate the robustness and rendering speed of the proposed approach.
- Paper | Project Page | Code
3. HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Authors: Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin
Abstract
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the pioneer to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction of over 75× compared to vanilla 3DGS, while simultaneously improving fidelity, and achieving over 11× size reduction over SOTA 3DGS compression approach Scaffold-GS.
- Paper | Project Page | Code
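The entropy-coding step mentioned in this abstract relies on a standard idea: if a quantized value q is modeled by a Gaussian with predicted mean and scale, its probability is the Gaussian mass inside the quantization bin, and its ideal code length is the negative base-2 logarithm of that mass. The sketch below shows only this generic calculation; it is not HAC's context model, and the function names and the probability floor are assumptions.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def estimated_bits(q, mu, sigma, step=1.0):
    """Ideal code length in bits for a value quantized to q with bin width `step`."""
    p = gaussian_cdf(q + step / 2.0, mu, sigma) - gaussian_cdf(q - step / 2.0, mu, sigma)
    # Floor the probability so very unlikely symbols do not produce log(0).
    return -math.log2(max(p, 1e-12))
```

A value near the predicted mean costs few bits and an outlier costs many, so a context model that predicts the mean and scale well directly reduces the bitstream size.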
4. [ECCV '24] End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Authors: Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen
Abstract
3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible and continuous rate control. RDO-Gaussian addresses two main issues that exist in current schemes: 1) Different from prior endeavors that minimize the rate under the fixed distortion, we introduce dynamic pruning and entropy-constrained vector quantization (ECVQ) that optimize the rate and distortion at the same time. 2) Previous works treat the colors of each Gaussian equally, while we model the colors of different regions and materials with learnable numbers of parameters. We verify our method on both real and synthetic scenes, showcasing that RDO-Gaussian greatly reduces the size of 3D Gaussian over 40×, and surpasses existing methods in rate-distortion performance.
- Paper | Project Page | Code
5. 3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods
Authors: Milena T. Bagdasarian, Paul Knoll, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern
Abstract
We present a work-in-progress survey on 3D Gaussian Splatting compression methods, focusing on their statistical performance across various benchmarks. This survey aims to facilitate comparability by summarizing key statistics of different compression approaches in a tabulated format. The datasets evaluated include TanksAndTemples, MipNeRF360, DeepBlending, and SyntheticNeRF. For each method, we report the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and the resultant size in megabytes (MB), as provided by the respective authors. This is an ongoing, open project, and we invite contributions from the research community as GitHub issues or pull requests. Please visit http://wm.github.io/3dgs-compression-survey/ for more information and a sortable version of the table.
- Paper | Project Page
6. LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming
Authors: Yuang Shi, Simone Gasparini, Géraldine Morin, Wei Tsang Ooi
Abstract
The rise of Extended Reality (XR) requires efficient streaming of 3D online worlds, challenging current 3DGS representations to adapt to bandwidth-constrained environments. We propose LapisGS, a layered 3DGS that supports adaptive streaming and progressive rendering. Our method constructs a layered structure for cumulative representation, incorporates dynamic opacity optimization to maintain visual fidelity, and utilizes occupancy maps to efficiently manage Gaussian splats. This proposed model offers a progressive representation supporting a continuous rendering quality adapted for bandwidth-aware streaming. Extensive experiments validate the effectiveness of our approach in balancing visual fidelity with the compactness of the model, with up to 50.71% improvement in SSIM, 286.53% improvement in LPIPS, and 318.41% reduction in model size, and shows its potential for bandwidth-adapted 3D streaming and rendering applications.
- Paper | Project Page
7. Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation
Authors: Minye Wu, Tinne Tuytelaars
Abstract
Recent advancements in photo-realistic novel view synthesis have been significantly driven by Gaussian Splatting (3DGS). Nevertheless, the explicit nature of 3DGS data entails considerable storage requirements, highlighting a pressing need for more efficient data representations. To address this, we present Implicit Gaussian Splatting (IGS), an innovative hybrid model that integrates explicit point clouds with implicit feature embeddings through a multi-level tri-plane architecture. This architecture features 2D feature grids at various resolutions across different levels, facilitating continuous spatial domain representation and enhancing spatial correlations among Gaussian primitives. Building upon this foundation, we introduce a level-based progressive training scheme, which incorporates explicit spatial regularization. This method capitalizes on spatial correlations to enhance both the rendering quality and the compactness of the IGS representation. Furthermore, we propose a novel compression pipeline tailored for both point clouds and 2D feature grids, considering the entropy variations across different levels. Extensive experimental evaluations demonstrate that our algorithm can deliver high-quality rendering using only a few MBs, effectively balancing storage efficiency and rendering fidelity, and yielding results that are competitive with the state-of-the-art.
- Paper | Code
2023:
1. LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS
Authors: Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
Abstract
Recent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting come with a substantial storage overhead caused by growing the SfM points to millions, often demanding gigabyte-level disk space for a single unbounded scene, posing significant scalability challenges and hindering the splatting efficiency. To address this challenge, we introduce LightGaussian, a novel method designed to transform 3D Gaussians into a more efficient and compact format. Drawing inspiration from the concept of Network Pruning, LightGaussian identifies Gaussians that are insignificant in contributing to the scene reconstruction and adopts a pruning and recovery process, effectively reducing redundancy in Gaussian counts while preserving visual effects. Additionally, LightGaussian employs distillation and pseudo-view augmentation to distill spherical harmonics to a lower degree, allowing knowledge transfer to more compact representations while maintaining reflectance. Furthermore, we propose a hybrid scheme, VecTree Quantization, to quantize all attributes, resulting in lower bitwidth representations with minimal accuracy losses. In summary, LightGaussian achieves an averaged compression rate over 15x while boosting the FPS from 139 to 215, enabling an efficient representation of complex scenes on Mip-NeRF 360, Tank and Temple datasets.
- Paper | Project Page | Code | Short Presentation
2. Compact3D: Compressing Gaussian Splat Radiance Field Models with Vector Quantization
Authors: KL Navaneet, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash
Abstract
3D Gaussian Splatting is a new method for modeling and rendering 3D radiance fields that achieves much faster learning and rendering time compared to SOTA NeRF methods. However, it comes with a drawback in the much larger storage demand compared to NeRF methods since it needs to store the parameters for several 3D Gaussians. We notice that many Gaussians may share similar parameters, so we introduce a simple vector quantization method based on k-means algorithm to quantize the Gaussian parameters. Then, we store the small codebook along with the index of the code for each Gaussian. Moreover, we compress the indices further by sorting them and using a method similar to run-length encoding. We do extensive experiments on standard benchmarks as well as a new benchmark which is an order of magnitude larger than the standard benchmarks. We show that our simple yet effective method can reduce the storage cost for the original 3D Gaussian Splatting method by a factor of almost 20× with a very small drop in the quality of rendered images.
- Paper | Code
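The pipeline described in this abstract (k-means codebook, one index per Gaussian, then sorting and run-length coding the indices) can be sketched in a few lines of NumPy. This is an illustrative toy and not the Compact3D implementation: the plain Lloyd iterations, the random initialization, and the helper names `kmeans`, `nearest_code`, and `run_length_encode` are assumptions, and the input is assumed to be a non-empty float array.

```python
import numpy as np

def nearest_code(x, codebook):
    """Index of the closest codebook entry for every row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (codebook, index per row)."""
    rng = np.random.default_rng(seed)
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        idx = nearest_code(x, codebook)
        for j in range(k):
            members = x[idx == j]
            if len(members) > 0:  # keep the old code if a cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook, nearest_code(x, codebook)

def run_length_encode(sorted_idx):
    """Encode a sorted, non-empty index array as (values, run lengths)."""
    starts = np.concatenate(([0], np.flatnonzero(np.diff(sorted_idx)) + 1))
    lengths = np.diff(np.concatenate((starts, [len(sorted_idx)])))
    return sorted_idx[starts], lengths

# Toy usage: 200 random 4-D "parameter vectors" quantized to 8 codes.
params = np.random.default_rng(1).normal(size=(200, 4))
codebook, idx = kmeans(params, k=8)
order = np.argsort(idx, kind="stable")   # reorder Gaussians so equal indices are adjacent
values, lengths = run_length_encode(idx[order])
reconstructed = codebook[np.repeat(values, lengths)]  # approximates params[order]
```

Sorting is harmless here because a set of Gaussians has no inherent order, so after reordering, the index stream collapses to at most k (value, run length) pairs.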
3. [CVPR '24] Compact 3D Gaussian Representation for Radiance Field
Authors: Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park
Abstract
Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussian by vector quantization. In our extensive experiments, we consistently show over 10× reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.
Paper | Project Page | Code
4. [ECCV '24] Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Authors: Wieland Morgenstern, Florian Barthel, Anna Hilsmann, Peter Eisert
Abstract
3D Gaussian Splatting has recently emerged as a highly promising technique for modeling of static 3D scenes. In contrast to Neural Radiance Fields, it utilizes efficient rasterization allowing for very fast rendering at high quality. However, the storage size is significantly higher, which hinders practical deployment, e.g., on resource constrained devices. In this paper, we introduce a compact scene representation organizing the parameters of 3D Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a drastic reduction in storage requirements without compromising visual quality during rendering. Central to our idea is the explicit exploitation of perceptual redundancies present in natural scenes. In essence, the inherent nature of a scene allows for numerous permutations of Gaussian parameters to equivalently represent it. To this end, we propose a novel highly parallel algorithm that regularly arranges the high-dimensional Gaussian parameters into a 2D grid while preserving their neighborhood structure. During training, we further enforce local smoothness between the sorted parameters in the grid. The uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless integration with established renderers. Our method achieves a reduction factor of 17x to 42x in size for complex scenes with no increase in training time, marking a substantial leap forward in the domain of 3D scene distribution and consumption.
Paper | Project Page | Code
Diffusion:
2024:
1. AGG: Amortized Generative 3D Gaussians for Single Image to 3D
Authors: Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat
Abstract
Given the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to their superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation. 3D Gaussian splatting approaches for image to 3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster.
Paper | Project Page | Short Presentation
2. Fast Dynamic 3D Object Generation from a Single-view Video
Authors: Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang
Abstract
Generating dynamic three-dimensional (3D) objects from a single-view video is challenging due to the lack of 4D labeled data. Existing methods extend text-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling, but they are slow and expensive to scale (e.g., 150 minutes per object) due to the need for back-propagating the information-limited supervision signals through a large pretrained model. To address this limitation, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data to directly train a novel 4D Gaussian splatting model with explicit point cloud geometry, enabling real-time rendering under continuous camera trajectories. Extensive experiments on synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed when compared to prior art alternatives while preserving the same level of innovative view synthesis quality. For example, Efficient4D takes only 14 minutes to model a dynamic object.
Paper | Project Page | Code | Short Presentation
3. GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting
Authors: Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods.
Paper | Project Page | Code | Short Presentation
4. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Authors: Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
Abstract
3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-view Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: (1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. (2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.
Paper | Project Page | Code
5. GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Authors: Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Abstract
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an object-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene.
Paper | Project Page | Code (not yet)
6. IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
Authors: Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
Abstract
Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In this paper, we further explore the design space of text-to-3D models. We significantly improve multi-view generation by considering video instead of image generators. Combined with a 3D reconstruction algorithm which, by using Gaussian splatting, can optimize a robust image-based loss, we directly produce high-quality 3D outputs from the generated views. Our new method, IM-3D, reduces the number of evaluations of the 2D generator network 10-100x, resulting in a much more efficient pipeline, better quality, fewer geometric inconsistencies, and higher yield of usable 3D assets.
Paper
7. Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
Authors: Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
Abstract
While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.
Paper | Project Page | Code
8. Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph
Authors: Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao
Abstract
Text-to-3D generation represents an exciting field that has seen rapid advancements, facilitating the transformation of textual descriptions into detailed 3D models. However, current progress often neglects the intricate high-order correlation of geometry and texture within 3D objects, leading to challenges such as over-smoothness, over-saturation and the Janus problem. In this work, we propose a method named "3D Gaussian Generation via Hypergraph (Hyper-3DG)", designed to capture the sophisticated high-order correlations present within 3D objects. Our framework is anchored by a well-established mainflow and an essential module, named "Geometry and Texture Hypergraph Refiner (HGRefiner)". This module not only refines the representation of 3D Gaussians but also accelerates the update process of these 3D Gaussians by conducting the Patch-3DGS Hypergraph Learning on both explicit attributes and latent visual features. Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation. Extensive experimentation has shown that our proposed method significantly enhances the quality of 3D generation while incurring no additional computational overhead for the underlying framework.
Paper | Code (not yet)
9. DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Authors: Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik Hang Lee, Pengyuan Zhou
Abstract
Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors, increasingly capturing the attention of both academic and industry circles. Despite significant progress, current methods still struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework that leverages Formation Pattern Sampling (FPS) for core structuring, augmented with strategic camera sampling and supported by holistic object-environment integration to overcome these hurdles. FPS, guided by the formation patterns of 3D objects, employs multi-timesteps sampling to quickly form semantically rich, high-quality representations, uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. The camera sampling strategy incorporates a progressive three-stage approach, specifically designed for both indoor and outdoor settings, to effectively ensure scene-wide 3D consistency. DreamScene enhances scene editing flexibility by combining objects and environments, enabling targeted adjustments. Extensive experiments showcase DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications.
Paper | Project Page | Code (not yet)
10. FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
Authors: Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
Abstract
Reconstructing detailed 3D objects from single-view images remains a challenging task due to the limited information available. In this paper, we introduce FDGaussian, a novel two-stage framework for single-image 3D reconstruction. Recent methods typically utilize pre-trained 2D diffusion models to generate plausible novel views from the input image, yet they encounter issues with either multi-view inconsistency or lack of geometric fidelity. To overcome these challenges, we propose an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, enabling the generation of consistent multi-view images. Moreover, we further accelerate the state-of-the-art Gaussian Splatting by incorporating epipolar attention to fuse images from different viewpoints. We demonstrate that FDGaussian generates images with high consistency across different views and reconstructs high-quality 3D objects, both qualitatively and quantitatively.
Paper | Project Page
11. BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors
Authors: Tingyang Zhang, Qingzhe Gao, Weiyu Li, Libin Liu, Baoquan Chen
Abstract
Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for training and rendering. This limitation restricts the practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present the rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating its superior performance compared to the current state-of-the-art methods.
Paper
12. BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis
Authors: Lutao Jiang, Lin Wang
Abstract
Text-to-3D synthesis has recently seen intriguing advances by combining the text-to-image models with 3D representation methods, e.g., Gaussian Splatting (GS), via Score Distillation Sampling (SDS). However, a hurdle of existing methods is the low efficiency, per-prompt optimization for a single 3D object. Therefore, it is imperative for a paradigm shift from per-prompt optimization to one-stage generation for any unseen text prompts, which yet remains challenging. A hurdle is how to directly generate a set of millions of 3D Gaussians to represent a 3D object. This paper presents BrightDreamer, an end-to-end single-stage approach that can achieve generalizable and fast (77 ms) text-to-3D generation. Our key idea is to formulate the generation process as estimating the 3D deformation from an anchor shape with predefined positions. For this, we first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions, used as the centers (one attribute) of 3D Gaussians. To estimate the other four attributes (i.e., scaling, rotation, opacity, and SH coefficient), we then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object. The center of each Gaussian enables us to transform the triplane feature into the four attributes. The generated 3D Gaussians can be finally rendered at 705 frames per second. Extensive experiments demonstrate the superiority of our method over existing methods. Also, BrightDreamer possesses a strong semantic understanding capability even for complex text prompts.
Paper | Project Page | Code
13. GVGEN: Text-to-3D Generation with Volumetric Representation
Authors: Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He
Abstract
In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques: (1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points as a structured form GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed (∼7 seconds), effectively striking a balance between quality and efficiency.
Paper | Project Page | Code (not yet)
14. SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
Authors: Jaihoon Kim, Juil Koo, Kyeongmin Yeo, Minhyuk Sung
Abstract
We introduce a general framework for generating diverse visual content, including ambiguous images, panorama images, mesh textures, and Gaussian splat textures, by synchronizing multiple diffusion processes. We present an exhaustive investigation into all possible scenarios for synchronizing multiple diffusion processes through a canonical space and analyze their characteristics across applications. In doing so, we reveal a previously unexplored case: averaging the outputs of Tweedie's formula while conducting denoising in multiple instance spaces. This case also provides the best quality with the widest applicability to downstream tasks. We name this case SyncTweedies. In our experiments generating the aforementioned visual content, we demonstrate the superior quality of generation by SyncTweedies compared to other synchronization methods, optimization-based and iterative-update-based methods.
Paper | Project Page | Code (not yet)
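The case named in the SyncTweedies abstract (averaging Tweedie-formula outputs across instance spaces through a canonical space) can be illustrated with a toy sketch. Everything concrete here is an assumption for illustration rather than the authors' code: the canonical space is a 1-D signal, each instance space is a cropped window of it (loosely like overlapping panorama views), the DDPM-style parameterisation with `alpha_bar` is assumed, and the zero-valued `eps_views` stand in for a real noise predictor.

```python
import math

def tweedie_x0(x_t, eps_hat, alpha_bar):
    """Tweedie's formula under a DDPM parameterisation: clean-sample estimate."""
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - r * e) / s for x, e in zip(x_t, eps_hat)]

def synchronize(x0_views, windows, canonical_len):
    """Average per-view x0 estimates in the canonical space, then project back."""
    total = [0.0] * canonical_len
    count = [0] * canonical_len
    for x0, (start, _stop) in zip(x0_views, windows):
        for offset, value in enumerate(x0):
            total[start + offset] += value
            count[start + offset] += 1
    canonical = [t / c if c else 0.0 for t, c in zip(total, count)]
    # Each instance space reads back its own window of the averaged signal.
    return [canonical[start:stop] for start, stop in windows]

# Two instance spaces covering a canonical signal of length 6, overlapping on [2, 4).
windows = [(0, 4), (2, 6)]
alpha_bar = 0.5
x_t_views = [[1.0] * 4, [3.0] * 4]
eps_views = [[0.0] * 4, [0.0] * 4]  # hypothetical noise predictions

x0_views = [tweedie_x0(x, e, alpha_bar) for x, e in zip(x_t_views, eps_views)]
synced = synchronize(x0_views, windows, canonical_len=6)
```

After synchronization, both views agree on the overlapping region, and each view would continue its own denoising step from the synchronized estimate.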
15. STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Authors: Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao
Abstract
Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from 3D generation techniques, we utilize a multi-view diffusion model to initialize multi-view images anchoring on the input video frames, where the video can be either real-world captured or generated by a video diffusion model. To ensure the temporal consistency of the multi-view sequence initialization, we introduce a simple yet effective fusion strategy to leverage the first frame as a temporal anchor in the self-attention computation. With the almost consistent multi-view sequences, we then apply the score distillation sampling to optimize the 4D Gaussian point cloud. The 4D Gaussian splatting is specially crafted for the generation task, where an adaptive densification strategy is proposed to mitigate the unstable Gaussian gradient for robust optimization. Notably, the proposed pipeline does not require any pre-training or fine-tuning of diffusion networks, offering a more accessible and practical solution for the 4D generation task. Extensive experiments demonstrate that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness, setting a new state-of-the-art for 4D generation from diverse inputs, including text, image, and video.
Paper | Project Page | Code | Short Presentation
16. Comp4D: LLM-Guided Compositional 4D Scene Generation
Authors: Dejia Xu, Hanwen Liang, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Plataniotis, Zhangyang Wang
Abstract
Recent advancements in diffusion models for 2D and 3D content creation have sparked a surge of interest in generating 4D content. However, the scarcity of 3D scene datasets constrains current methodologies to primarily object-centric generation. To overcome this limitation, we present Comp4D, a novel framework for Compositional 4D Generation. Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately. Utilizing Large Language Models (LLMs), the framework begins by decomposing an input text prompt into distinct entities and maps out their trajectories. It then constructs the compositional 4D scene by accurately positioning these objects along their designated paths. To refine the scene, our method employs a compositional score distillation technique guided by the pre-defined trajectories, utilizing pre-trained diffusion models across text-to-image, text-to-video, and text-to-3D domains. Extensive experiments demonstrate our outstanding 4D content creation capability compared to prior arts, showcasing superior visual quality, motion fidelity, and enhanced object interactions.
Paper | Project Page | Code (not yet) | Short Presentation
17. DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Authors: Yuanze Lin, Ronald Clark, Philip Torr
Abstract
We present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods has been promising, prevailing methods often fail to ensure view-consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects, aligning closely with the semantics of the textual instructions.
Paper | Project Page | Code (not yet)
18. SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Authors: Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai
Abstract
Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate the shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions.
Paper | Project Page | Code (not yet) | Short Presentation
19. Hash3D: Training-free Acceleration for 3D Generation
Authors: Xingyi Yang, Xinchao Wang
Abstract
The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models. Despite this progress, the cumbersome optimization process per se presents a critical hurdle to efficiency. In this paper, we introduce Hash3D, a universal acceleration for 3D generation without model training. Central to Hash3D is the insight that feature-map redundancy is prevalent in images rendered from camera positions and diffusion time-steps in close proximity. By effectively hashing and reusing these feature maps across neighboring timesteps and camera angles, Hash3D substantially prevents redundant calculations, thus accelerating the diffusion model's inference in 3D generation tasks. We achieve this through an adaptive grid-based hashing. Surprisingly, this feature-sharing mechanism not only speeds up the generation but also enhances the smoothness and view consistency of the synthesized 3D objects. Our experiments covering 5 text-to-3D and 3 image-to-3D models demonstrate Hash3D's versatility to speed up optimization, enhancing efficiency by 1.3 to 4 times. Additionally, Hash3D's integration with 3D Gaussian splatting largely speeds up 3D model creation, reducing text-to-3D processing to about 10 minutes and image-to-3D conversion to roughly 30 seconds.
Paper | Project Page | Code
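The idea of hashing and reusing feature maps across nearby camera angles and timesteps can be sketched as a simple grid-keyed cache. This is a simplified illustration under stated assumptions, not the Hash3D implementation: the class name `FeatureCache`, the fixed cell sizes (the abstract describes an adaptive grid), and the `expensive_features` stand-in for a diffusion-model forward pass are all hypothetical.

```python
class FeatureCache:
    """Reuse features for queries that fall into the same (angle, timestep) grid cell."""

    def __init__(self, angle_cell_deg=10.0, timestep_cell=50):
        self.angle_cell_deg = angle_cell_deg
        self.timestep_cell = timestep_cell
        self.table = {}
        self.hits = 0
        self.misses = 0

    def _key(self, azimuth_deg, elevation_deg, timestep):
        # Quantize the camera pose and timestep onto a fixed grid.
        return (
            int((azimuth_deg % 360.0) // self.angle_cell_deg),
            int((elevation_deg + 90.0) // self.angle_cell_deg),
            timestep // self.timestep_cell,
        )

    def get_or_compute(self, azimuth_deg, elevation_deg, timestep, compute):
        key = self._key(azimuth_deg, elevation_deg, timestep)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        self.misses += 1
        feature = compute()
        self.table[key] = feature
        return feature

calls = []

def expensive_features():
    """Hypothetical stand-in for an expensive feature-map computation."""
    calls.append(1)
    return [0.1, 0.2, 0.3]

cache = FeatureCache()
a = cache.get_or_compute(12.0, 5.0, 510, expensive_features)  # miss: computed
b = cache.get_or_compute(14.0, 6.0, 520, expensive_features)  # same cell: reused
c = cache.get_or_compute(95.0, 5.0, 510, expensive_features)  # new cell: computed
```

In this toy run, the second query lands in the same grid cell as the first and skips the expensive call, which is the source of the speed-up the abstract describes.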
20. Zero-shot Point Cloud Completion Via 2D Priors
Authors: Tianxin Huang, Zhiwen Yan, Yuyang Zhao, Gim Hee Lee
Abstract
3D point cloud completion is designed to recover complete shapes from partially observed point clouds. Conventional completion methods typically depend on extensive point cloud data for training, with their effectiveness often constrained to object categories similar to those seen during training. In contrast, we propose a zero-shot framework aimed at completing partially observed point clouds across any unseen categories. Leveraging point rendering via Gaussian Splatting, we develop techniques of Point Cloud Colorization and Zero-shot Fractal Completion that utilize 2D priors from pre-trained diffusion models to infer missing regions. Experimental results on both synthetic and real-world scanned point clouds demonstrate that our approach outperforms existing methods in completing a variety of objects without any requirement for specific training data.
Paper
21. [ECCV '24] DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Authors : Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360° scene generation pipeline that facilitates the creation of comprehensive 360° scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360° perspective, providing an enhanced immersive experience over existing techniques. - Paper | Project Page | Code | Short Presentation
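The step of lifting a panoramic image with depth into a point cloud can be illustrated with a minimal sketch. It assumes an equirectangular panorama, a y-up coordinate frame with the image center looking along +z, and a depth value measured along the viewing ray. The function name `pano_pixel_to_point` is hypothetical, and the paper's depth alignment and global optimization are not reproduced here.

```python
import math


def pano_pixel_to_point(u, v, width, height, depth):
    """Map a continuous equirectangular pixel coordinate to a 3D point.

    Assumptions (not taken from the paper): longitude spans [-pi, pi] across
    the image width, latitude spans [pi/2, -pi/2] from top to bottom, and
    depth is the distance from the camera center along the ray.
    """
    longitude = (u / width) * 2.0 * math.pi - math.pi
    latitude = math.pi / 2.0 - (v / height) * math.pi
    direction = (
        math.cos(latitude) * math.sin(longitude),  # x: right
        math.sin(latitude),                        # y: up
        math.cos(latitude) * math.cos(longitude),  # z: forward
    )
    return tuple(depth * d for d in direction)


# The image center looks straight ahead along +z.
center = pano_pixel_to_point(512, 256, 1024, 512, depth=2.0)
# Three quarters of the way across the width looks along +x.
right = pano_pixel_to_point(768, 256, 1024, 512, depth=2.0)
```

Points produced this way could serve as initial Gaussian centers, which matches the role the abstract gives to the point cloud.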
22. RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
Authors : Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
Abstract
We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image. - Paper | Project Page | Code (not yet)
23. GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
Authors : Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo
Abstract
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. Existing radiance representations either require an implicit feature decoder, which significantly degrades the modeling power of the representation, or are spatially unstructured, making them difficult to integrate with mainstream 3D diffusion methods. We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting using a fixed number of free Gaussians, and then rearranging these Gaussians into a predefined voxel grid via Optimal Transport. Since GaussianCube is a structured grid representation, it allows us to use standard 3D U-Net as our backbone in diffusion modeling without elaborate designs. More importantly, the high-accuracy fitting of the Gaussians allows us to achieve a high-quality representation with orders of magnitude fewer parameters than previous structured representations for comparable quality, ranging from one to two orders of magnitude. The compactness of GaussianCube greatly eases the difficulty of 3D generative modeling. Extensive experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D synthesis all show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a highly accurate and versatile radiance representation for 3D generative modeling. - Paper | Project Page | Code
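The rearrangement of a fixed number of Gaussians into a voxel grid can be illustrated as an assignment problem: when the number of Gaussians equals the number of voxels and all masses are uniform, optimal transport reduces to finding a one-to-one matching of minimal total cost. The sketch below is a toy under those assumptions. The function `assign_to_voxels` is hypothetical, the cost is squared Euclidean distance between Gaussian centers and voxel centers, and the brute-force search is only feasible for a handful of points; it is not the authors' solver.

```python
from itertools import permutations


def sq_dist(p, q):
    # Squared Euclidean distance between two 3D points.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_to_voxels(positions, voxel_centers):
    """Return (assignment, cost), where assignment[i] is the voxel index of Gaussian i.

    Exhaustive search over all one-to-one matchings; only suitable for tiny inputs.
    """
    n = len(positions)
    assert n == len(voxel_centers), "expects as many Gaussians as voxels"
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(sq_dist(positions[i], voxel_centers[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


gaussian_positions = [(0.9, 0.1, 0.0), (0.1, 0.2, 0.0), (0.8, 0.9, 0.0), (0.2, 0.7, 0.0)]
voxel_centers = [(0.25, 0.25, 0.0), (0.75, 0.25, 0.0), (0.25, 0.75, 0.0), (0.75, 0.75, 0.0)]
assignment, cost = assign_to_voxels(gaussian_positions, voxel_centers)
```

After the matching, each voxel holds exactly one Gaussian, which is what lets a standard 3D U-Net operate on the resulting grid.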
24. 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Authors : Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee
Abstract
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal deformation based on the canonical representation to capture dynamic interactions in the reference video. The pipeline facilitates the generation of dynamic scenes with enhanced photorealism and structural integrity, viewable from multiple perspectives, thereby setting a new standard in 4D scene generation. - Paper | Project Page | Code (not yet)
2023:
1. [CVPR '24] Text-to-3D using Gaussian Splatting
Authors : Zilong Chen, Feng Wang, Huaping Liu
Abstract
In this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of 3D prior and proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting the explicit nature that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. - Paper | Project Page | Code | Short Presentation | Explanation Video
2. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Authors : Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng
Abstract
Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods. - Paper | Project Page | Code | Explanation Video
3. GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Authors : Taoran Yi, Jiemin Fang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Tian Qi, Xinggang Wang
Abstract
In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but the 3D consistency is hard to guarantee. This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation. A fast 3D generation framework, named as GaussianDreamer, is proposed, where the 3D diffusion model provides point cloud priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our GaussianDreamer can generate a high-quality 3D instance within 25 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time. - Paper | Project Page | Code
4. GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise
Authors : Xinhai Li, Huaibin Wang, Kuo-Kun Tseng
Abstract
Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of NeRF and 2D diffusion models frequently yields oversaturated images, posing severe limitations on downstream industrial applications due to the constraints of pixelwise rendering method. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text to 3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We ingeniously devise an efficient method to generate noise that produces Gaussian noise from diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes. - Paper
5. [CVPR '24] LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Authors : Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen
Abstract
The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While recent advancements in text-to-3D generation have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). This paper identifies a notable deficiency in SDS, that it brings inconsistent and low-quality updating direction for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency. - Paper | Code
6. LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Authors : Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee
Abstract
With the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily due to their training strategies using 3D scan datasets that are far from the real world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by fully leveraging the power of existing large-scale diffusion-based generative model. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of point cloud to the desired view and provide the projection as a guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly-detailed compared to the previous 3D scene generation methods, with no constraint on domain of the target scene. - Paper | Project Page | Code
7. [CVPR '24] HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Authors : Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu
Abstract
Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. The floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios. - Paper | Project Page | Code | Short Presentation
8. CG3D: Compositional Generation for Text-to-3D
Authors : Alexander Vilesov, Pradyumna Chari, Achuta Kadambi
Abstract
With the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi-object scenes, (ii) inability to textually control multi-object configurations, and (iii) physically realistic scene composition. In this work, we propose CG3D, a method for compositionally generating scalable 3D assets that resolves these constraints. We find that explicit Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes. By utilizing a guidance framework built around this explicit representation, we show state of the art results, capable of even exceeding the guiding diffusion model in terms of object combinations and physics accuracy. - Paper | Project Page | Short Presentation
9. Learn to Optimize Denoising Scores for 3D Generation - A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting
Authors : Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu and Guosheng Lin
Abstract
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-caliber results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. Leveraging the different learnable parameters of the diffusion prior, our approach offers multiple configurations, affording various trade-offs between performance and implementation complexity. Notably, our experimental results demonstrate that our method markedly surpasses existing techniques, establishing new state-of-the-art in the realm of text-to-3D generation. Furthermore, our approach exhibits impressive performance on both NeRF and the newly introduced 3D Gaussian Splatting backbones. Additionally, our framework yields insightful contributions to the understanding of recent score distillation methods, such as the VSD and DDS loss. - Paper | Project Page | Code
10. [CVPR '24] Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Authors : Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. - Paper | Project Page
11. DreamGaussian4D: Generative 4D Gaussian Splatting
Authors : Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu
Abstract
Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting compared with implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines. - Paper | Project Page | Code
12. 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
Authors : Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei
Abstract
Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs, they incur significant time and effort in prompt engineering through trial and error. This work introduces 4DGen, a novel, holistic framework for grounded 4D content generation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content. Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos), thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution supervision through rendering during training, thereby facilitating high-quality 4D generation. Additionally, we employ spatial-temporal pseudo labels on anchor frames, along with seamless consistency priors implemented through 3D-aware score distillation sampling and smoothness regularizations. Compared to existing baselines, our approach yields competitive results in faithfully reconstructing input signals and realistically inferring renderings from novel viewpoints and timesteps. Most importantly, our method supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods. - Paper | Project Page | Code | Short Presentation
13. Text2Immersion: Generative Immersive Scene with 3D Gaussian
Authors : Hao Ouyang, Kathryn Heal, Stephen Lombardi, Tiancheng Sun
Abstract
We introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts. Our proposed pipeline initiates by progressively generating a Gaussian cloud using pre-trained 2D diffusion and depth estimation models. This is followed by a refining stage on the Gaussian cloud, interpolating and refining it to enhance the details of the generated scene. Distinct from prevalent methods that focus on single object or indoor scenes, or employ zoom-out trajectories, our approach generates diverse scenes with various objects, even extending to the creation of imaginary scenes. Consequently, Text2Immersion can have wide-ranging implications for various applications such as virtual reality, game development, and automated content creation. Extensive evaluations demonstrate that our system surpasses other methods in rendering quality and diversity, further progressing towards text-driven 3D scene generation. - Paper | Project Page | Code (not yet)
14. Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
Authors : Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
Abstract
Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the slow generation speed. To address these deficiencies, we present Repaint123 to alleviate multi-view bias as well as texture degradation and speed up the generation process. The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency. We further propose visibility-aware adaptive repainting strength for overlap regions to enhance the generated image quality in the repainting process. The generated high-quality and multi-view consistent images enable the use of simple Mean Square Error (MSE) loss for fast 3D content generation. We conduct extensive experiments and show that our method has a superior ability to generate high-quality 3D content with multi-view consistency and fine textures in 2 minutes from scratch. - Paper | Project Page | Code (not yet)
Dynamics and Deformation:
2024:
1. 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
Authors : Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen
Abstract
We consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or capturing high-fidelity renderings. In this paper, we introduce 4D Gaussian Splatting (4DGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally compose dynamic 3D Gaussians and can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DGS demonstrates powerful capabilities for modeling complicated dynamics and fine details, especially for scenes with abrupt motions. We further implement our temporal slicing and splatting techniques in a highly optimized CUDA acceleration framework, achieving real-time inference rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DGS, which consistently outperforms existing methods both quantitatively and qualitatively. - Paper
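Temporally slicing a 4D XYZT Gaussian can be illustrated with standard multivariate Gaussian conditioning: fixing the time coordinate yields a 3D Gaussian whose mean shifts with the space-time covariance, and whose contribution is scaled by the marginal density in time. The sketch below assumes a plain mean and 4x4 covariance stored as nested lists. The function name `slice_4d_gaussian` is hypothetical, and the paper's own parameterization and CUDA implementation are not reproduced.

```python
import math


def slice_4d_gaussian(mean4, cov4, t):
    """Condition a 4D (x, y, z, t) Gaussian on a timestamp t.

    Returns the 3D mean, the 3x3 covariance, and an unnormalized temporal
    weight exp(-0.5 * (t - mu_t)^2 / var_t). This is textbook Gaussian
    conditioning, used here as an illustration only.
    """
    mu_t = mean4[3]
    var_t = cov4[3][3]
    cov_xyz_t = [cov4[i][3] for i in range(3)]  # space-time cross covariance
    dt = t - mu_t

    # Conditional mean: mu_xyz + Sigma_xyz,t * Sigma_tt^-1 * (t - mu_t)
    mean3 = [mean4[i] + cov_xyz_t[i] * dt / var_t for i in range(3)]
    # Conditional covariance: Sigma_xyz - Sigma_xyz,t * Sigma_tt^-1 * Sigma_t,xyz
    cov3 = [
        [cov4[i][j] - cov_xyz_t[i] * cov_xyz_t[j] / var_t for j in range(3)]
        for i in range(3)
    ]
    weight = math.exp(-0.5 * dt * dt / var_t)
    return mean3, cov3, weight


# A Gaussian centered at the origin at t = 0.5 that drifts along +x over time.
mean4 = [0.0, 0.0, 0.0, 0.5]
cov4 = [
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.1, 0.0, 0.0, 0.04],
]
mean3, cov3, weight = slice_4d_gaussian(mean4, cov4, t=0.7)
```

In this example the sliced Gaussian has moved along +x and become narrower in x, while the weight fades as the query time moves away from the Gaussian's temporal center.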
2. GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Authors : Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann
Abstract
Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to handle with existing methods. The common color drifting issue that happens in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. - Paper | Project Page | Code (not yet) | Short Presentation
3. Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction
Authors : Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li
Abstract
3D Gaussian Splatting (3DGS) has become an emerging tool for dynamic scene reconstruction. However, existing methods focus mainly on extending static 3DGS into a time-variant representation, while overlooking the rich motion information carried by 2D observations, thus suffering from performance degradation and model redundancy. To address the above problem, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS. Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow. Then a novel flow augmentation method is introduced with additional insights into uncertainty and loss collaboration. Moreover, for the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed. We conduct extensive experiments on both multi-view and monocular scenes to verify the merits of our work. Compared with the baselines, our method shows significant superiority in both rendering quality and efficiency. - Paper
4. Bridging 3D Gaussian and Mesh for Freeview Video Rendering
Authors : Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao
Abstract
This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the fuzzy geometry. Meanwhile, the point-based splatting (e.g., the 3D Gaussian Splatting) method usually produces artifacts or blurry pixels in the area with smooth geometry and sharp textures. As a result, it is difficult, if not impossible, to represent the complex and dynamic scene with a single type of primitive. To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes. Given a sequence of tracked mesh as initialization, our goal is to simultaneously optimize the mesh geometry, color texture, opacity maps, a set of 3D Gaussians, and the deformation field. At a specific time, we perform α-blending on the RGB and opacity values based on the merged and re-ordered z-buffers from mesh and 3D Gaussian rasterizations. This produces the final rendering, which is supervised by the ground-truth image. Experiments demonstrate that our approach adapts the appropriate type of primitives to represent the different parts of the dynamic scene and outperforms all the baseline methods in both quantitative and qualitative comparisons without losing render speed. - Paper
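The α-blending over merged, depth-ordered fragments mentioned in the GauMesh abstract can be illustrated for a single pixel. The sketch below assumes each fragment is a `(depth, rgb, alpha)` tuple and uses ordinary front-to-back compositing. The function name `composite_pixel` and the fragment layout are hypothetical, and this is not the authors' rasterizer.

```python
def composite_pixel(mesh_fragments, gaussian_fragments):
    """Blend mesh and Gaussian fragments for one pixel, nearest first.

    Each fragment is (depth, (r, g, b), alpha). Returns (rgb, opacity).
    """
    # Merge both sources and re-order them by depth (smaller is closer).
    fragments = sorted(mesh_fragments + gaussian_fragments, key=lambda f: f[0])

    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for _, rgb, alpha in fragments:
        weight = transmittance * alpha
        color = [c + weight * channel for c, channel in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, 1.0 - transmittance


# A semi-transparent red Gaussian in front of an opaque blue mesh surface,
# with a green Gaussian hidden behind the mesh.
mesh = [(2.0, (0.0, 0.0, 1.0), 1.0)]
gaussians = [(1.0, (1.0, 0.0, 0.0), 0.5), (3.0, (0.0, 1.0, 0.0), 0.8)]
rgb, opacity = composite_pixel(mesh, gaussians)
```

Because the opaque mesh fragment drives the transmittance to zero, the green Gaussian behind it contributes nothing, which is the occlusion behavior that a merged depth ordering is meant to provide.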
5. [ECCV '24] Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
Authors : Jeongmin Bae*, Seoha Kim*, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames for representing a dynamic scene. However, previous works fail to accurately reconstruct complex dynamic scenes. We attribute the failure to the design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce a local smoothness regularization for per-Gaussian embedding to improve the details in dynamic regions. - Paper | Project Page | Code
6. DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Authors : Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
Abstract
Existing VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be inferred by re-rendering the background to match the video frames. For the object motion, we first model the object-centric deformation of the objects by leveraging rendering losses and multi-view generative priors in an object-centric frame, then optimize object-centric to world-frame transformations by comparing the rendered outputs against the perceived pixel and optical flow. Finally, we recompose the background and objects and optimize for relative object scales using monocular depth prediction guidance. We show extensive results on the challenging DAVIS, Kubric, and self-captured videos, detail some limitations, and provide future directions. Besides 4D scene generation, our results show that DreamScene4D enables accurate 2D point motion tracking by projecting the inferred 3D trajectories to 2D, while never explicitly trained to do so. - Paper | Project Page | Code (not yet)
7. [CVPR '24] 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
Authors : Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai
Abstract
In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting provides a new representation of the 3D scene, building upon which the 3D geometry could be exploited in learning the complex 3D deformation. Specifically, the scenes are represented as a collection of 3D Gaussians, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them in learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. Extensive experimental results on both synthetic and real datasets prove the superiority of our solution, which achieves new state-of-the-art performance. - Paper | Project Page | Code (not yet)
8. MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos
Authors : Qingming Liu*, Yuan Liu*, Jiepeng Wang, Xianqiang Lv, Peng Wang, Wenping Wang, Junhui Hou†
Abstract
In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms baseline methods by a significant margin.
- Paper | Project Page | Code (not yet)
9. [ECCVW '24] Optimizing Dynamic NeRF and 3DGS with No Video Synchronization
Authors : Seoha Kim*, Jeongmin Bae*, Youngsik Yun, HyunSeung Son, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
Recent advancements in 4D scene reconstruction using dynamic NeRF and 3DGS have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct the dynamic scenes and struggle to fit even the training views in unsynchronized settings. It happens because they employ a single latent embedding for a frame, while the multi-view images at the same frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with the field. By design, our method is applicable for various baselines, even regardless of the types of radiance fields. We conduct experiments on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance of our method. - Paper
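To make the idea of a per-video time offset concrete, here is a toy sketch in which a single camera's offset is recovered by gradient descent. It deliberately simplifies the abstract: the "field" is a fixed, known 1D function (`scene_state`, a hypothetical stand-in) rather than a radiance field optimized jointly with the offsets, and all names and values are illustrative.

```python
import numpy as np

def scene_state(t):
    """Stand-in for the dynamic field: a 1D trajectory over time."""
    return np.sin(t)

def scene_velocity(t):
    """Time derivative of the stand-in field."""
    return np.cos(t)

def fit_time_offset(frame_times, observations, steps=200, lr=0.5):
    """Gradient descent on one camera's time offset with the field held fixed."""
    offset = 0.0
    for _ in range(steps):
        residual = scene_state(frame_times + offset) - observations
        # d/d(offset) of the mean squared residual.
        grad = np.mean(2.0 * residual * scene_velocity(frame_times + offset))
        offset -= lr * grad
    return offset

# Nominal frame timestamps; the camera actually fired 0.3 time units late.
frame_times = np.linspace(0.0, 2.0 * np.pi, 50)
true_offset = 0.3
observations = scene_state(frame_times + true_offset)
estimated_offset = fit_time_offset(frame_times, observations)
```

In a full pipeline the same offset parameter would simply be added to the timestamp fed to the dynamic model, so it receives gradients from the rendering loss like any other parameter.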
10. [NeurIPS '24] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
Authors : Ruijie Zhu*, Yanzhe Liang*, Hanzhi Chang, Jiacheng Deng, Jiahao Lu, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang
Abstract
Dynamic scene reconstruction is a long-term challenge in the field of 3D vision. Recently, the emergence of 3D Gaussian Splatting has provided new insights into this problem. Although subsequent efforts rapidly extend static 3D Gaussian to dynamic scenes, they often lack explicit constraints on object motion, leading to optimization difficulties and performance degradation. To address the above issues, we propose a novel deformable 3D Gaussian splatting framework called MotionGS, which explores explicit motion priors to guide the deformation of 3D Gaussians. Specifically, we first introduce an optical flow decoupling module that decouples optical flow into camera flow and motion flow, corresponding to camera movement and object motion respectively. Then the motion flow can effectively constrain the deformation of 3D Gaussians, thus simulating the motion of dynamic objects. Additionally, a camera pose refinement module is proposed to alternately optimize 3D Gaussians and camera poses, mitigating the impact of inaccurate camera poses. Extensive experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods and exhibits significant superiority in both qualitative and quantitative results. - Paper | Project Page | Code (not yet)
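One common way to split optical flow into a camera part and an object part is to reproject each pixel with its depth and the relative camera pose, then subtract. The sketch below illustrates that generic recipe; it is an assumption about how such a decoupling could look, not the MotionGS module itself, and the intrinsics, depth map, and flow values are made up. Here `R` and `t` map frame-1 camera coordinates to frame-2 camera coordinates.

```python
import numpy as np

def camera_flow(depth, K, R, t):
    """Flow induced purely by camera motion for a static scene."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T          # back-project pixels to unit-depth rays
    points = rays * depth[..., None]         # 3D points in the frame-1 camera
    points2 = points @ R.T + t               # same points in the frame-2 camera
    proj = points2 @ K.T
    uv2 = proj[..., :2] / proj[..., 2:3]     # perspective divide
    return uv2 - pix[..., :2]

# Hypothetical inputs: constant depth, pure sideways camera translation.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])

cam_flow = camera_flow(depth, K, R, t)
optical_flow = np.tile(np.array([7.0, 1.0]), (48, 64, 1))  # e.g. from a flow network
motion_flow = optical_flow - cam_flow  # residual attributed to object motion
```

With these values the camera alone explains a 5-pixel horizontal shift everywhere, so the remaining (2, 1) pixels per frame are what would be used to constrain Gaussian motion.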
11. [ECCV '24] DGD: Dynamic 3D Gaussians Distillation
Authors : Isaac Labe*, Noam Issachar*, Itai Lang, Sagie Benaim
Abstract
We tackle the task of learning dynamic 3D semantic radiance fields given a single monocular video as input. Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics. This enables the segmentation and tracking of a diverse set of 3D semantic entities, specified using a simple and intuitive interface that includes a user click or a text prompt. To this end, we present DGD, a unified 3D representation for both the appearance and semantics of a dynamic 3D scene, building upon the recently proposed dynamic 3D Gaussians representation. Our representation is optimized over time with both color and semantic information. Key to our method is the joint optimization of the appearance and semantic attributes, which jointly affect the geometric properties of the scene. We evaluate our approach in its ability to enable dense semantic 3D object tracking and demonstrate high-quality results that are fast to render, for a diverse set of scenes. - Paper | Project Page | Code | Short Presentation
12. [NeurIPS '24] Fully Explicit Dynamic Gaussian Splatting
Authors : Junoh Lee, Changyeon Won, HyunJun Jung, Inhwan Bae, Hae-Gon Jeon
Abstract
3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging dense 3D prior and explicit representations. Unfortunately, the benefits of the prior and representation do not involve novel view synthesis for dynamic motions. Ironically, this is because the main barrier is the reliance on them, which requires increasing training and rendering times to account for dynamic motions. In this paper, we design Explicit 4D Gaussian Splatting (Ex4DGS). Our key idea is to first separate static and dynamic Gaussians during training, and to explicitly sample positions and rotations of the dynamic Gaussians at sparse timestamps. The sampled positions and rotations are then interpolated to represent both spatially and temporally continuous motions of objects in dynamic scenes as well as reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improves Ex4DGS's convergence. We initially train Ex4DGS using short timestamps and progressively extend timestamps, which makes it work well with a few point clouds. The point-backtracking is used to quantify the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate the state-of-the-art rendering quality from our method, achieving fast rendering of 62 fps on a single 2080Ti GPU. - Paper | Project Page | Code
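Storing positions and rotations only at sparse timestamps and interpolating in between can be sketched as below. The choice of linear interpolation for positions and spherical linear interpolation (slerp) for unit quaternions is an assumption made for brevity; the abstract does not specify the interpolation scheme, and the keyframe values are hypothetical.

```python
import numpy as np

def slerp(q0, q1, u):
    """Spherical linear interpolation between two quaternions (w, x, y, z)."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + u * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - u) * theta) * q0 + np.sin(u * theta) * q1) / np.sin(theta)

def sample_pose(key_times, key_pos, key_rot, t):
    """Interpolate one Gaussian's position and rotation at time t."""
    i = np.searchsorted(key_times, t, side="right") - 1
    i = int(np.clip(i, 0, len(key_times) - 2))
    u = (t - key_times[i]) / (key_times[i + 1] - key_times[i])
    pos = (1.0 - u) * key_pos[i] + u * key_pos[i + 1]
    rot = slerp(key_rot[i], key_rot[i + 1], u)
    return pos, rot

# Two hypothetical keyframes: move 2 units along x while turning 90 degrees about z.
key_times = np.array([0.0, 1.0])
key_pos = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
key_rot = np.array([[1.0, 0.0, 0.0, 0.0],
                    [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]])
mid_pos, mid_rot = sample_pose(key_times, key_pos, key_rot, 0.5)
```

Halfway between the keyframes the Gaussian sits at x = 1 and is rotated by 45 degrees about z, which is what a temporally continuous motion should give.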
13. [3DV '25] EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
Authors : Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang
Abstract
Human activities are inherently complex, often involving numerous object interactions. To better understand these activities, it is crucial to model their interactions with the environment captured through dynamic changes. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand human-object interactions in 3D environments. However, most existing methods for human activity modeling neglect the dynamic interactions with objects, resulting in only static representations. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We leverage the uniquely discrete nature of Gaussian Splatting and segment dynamic interactions from the background, with both having explicit representations. Our approach employs a clip-level online learning pipeline that leverages the dynamic nature of human activities, allowing us to reconstruct the temporal evolution of the scene in chronological order and track rigid object motion. EgoGaussian shows significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art. We also qualitatively demonstrate the high quality of the reconstructed models. - Paper | Project Page | Code (not yet) | Short Presentation
14. 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement
Authors : Ziqi Lu, Jianbo Ye, John Leonard
Abstract
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D changes. Our method can detect changes in cluttered environments using sparse post-change images within as little as 18s, using as few as a single new image. It does not rely on depth input, user instructions, object classes, or object models; an object is recognized simply if it has been re-arranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at https://github.com/520xyxyzq/3DGS-CD. - Paper | Code (not yet)
2023:
1. [3DV '24] Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
Authors : Jonathon Luiten, Georgios Kopanas, Bastian Leibe, Deva Ramanan
Abstract
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model dynamic scenes, we allow Gaussians to move and rotate over time while enforcing that they have persistent color, opacity, and size. By regularizing Gaussians' motion and rotation with local rigidity constraints, we show that our Dynamic 3D Gaussians correctly model the same area of physical space over time, including the rotation of that space. Dense 6-DOF tracking and dynamic reconstruction emerge naturally from persistent dynamic view synthesis, without requiring any correspondence or flow as input. We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing. - Paper | Project Page | Code | Explanation Video
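A local rigidity constraint of the kind the abstract mentions can be written as a penalty on how much each Gaussian's neighbors move relative to it once its own rotation is factored out. The following is a hedged sketch: the exact loss form, the neighbor lists, and the unit weights are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def local_rigidity_loss(mu_prev, mu_cur, R_prev, R_cur, neighbors, weights):
    """Mean weighted deviation of neighbor offsets from a locally rigid motion."""
    total, count = 0.0, 0
    for i, nbrs in enumerate(neighbors):
        for k, j in enumerate(nbrs):
            offset_prev = mu_prev[j] - mu_prev[i]
            offset_cur = mu_cur[j] - mu_cur[i]
            # Express the current offset in the previous frame of Gaussian i.
            predicted = R_prev[i] @ R_cur[i].T @ offset_cur
            total += weights[i][k] * np.linalg.norm(offset_prev - predicted)
            count += 1
    return total / max(count, 1)

# Hypothetical scene: four Gaussians undergoing one shared rigid motion.
rng = np.random.default_rng(0)
mu_prev = rng.normal(size=(4, 3))
angle = np.pi / 6
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
mu_cur = mu_prev @ Rz.T + np.array([0.5, 0.0, 0.0])
R_prev = np.stack([np.eye(3)] * 4)
R_cur = np.stack([Rz] * 4)
neighbors = [[1, 2], [0, 3], [0, 3], [1, 2]]
weights = [[1.0, 1.0]] * 4

rigid_loss = local_rigidity_loss(mu_prev, mu_cur, R_prev, R_cur, neighbors, weights)

# Pull one Gaussian away from the rigid motion: the penalty becomes positive.
mu_bent = mu_cur.copy()
mu_bent[0] += np.array([0.0, 0.0, 0.3])
bent_loss = local_rigidity_loss(mu_prev, mu_bent, R_prev, R_cur, neighbors, weights)
```

A perfectly rigid motion incurs no penalty, while a Gaussian that drifts away from its neighbors is penalized, which is what encourages each Gaussian to keep tracking the same piece of physical space.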
2. [CVPR '24] Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Authors : Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, Xiaogang Jin
Abstract
Implicit neural representation has opened up new avenues for dynamic scene reconstruction and rendering. Nonetheless, state-of-the-art methods of dynamic neural rendering rely heavily on these implicit representations, which frequently struggle with accurately capturing the intricate details of objects in the scene. Furthermore, implicit methods struggle to achieve real-time rendering in general dynamic scenes, limiting their use in a wide range of tasks. To address the issues, we propose a deformable 3D Gaussian Splatting method that reconstructs scenes using explicit 3D Gaussians and learns Gaussians in canonical space with a deformation field to model monocular dynamic scenes. We also introduce a smoothing training mechanism with no extra overhead to mitigate the impact of inaccurate poses in real datasets on the smoothness of time interpolation tasks. Through differential Gaussian rasterization, the deformable 3D Gaussians not only achieve higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time synthesis, and real-time rendering. - Paper | Project Page | Code
3. [CVPR '24] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Authors : Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang
Abstract
Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to maintain. We introduce the 4D Gaussian Splatting (4D-GS) to achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency. An efficient deformation field is constructed to model both Gaussian motions and shape deformations. Different adjacent Gaussians are connected via a HexPlane to produce more accurate position and shape deformations. Our 4D-GS method achieves real-time rendering under high resolutions, 70 FPS at an 800×800 resolution on an RTX 3090 GPU, while maintaining comparable or higher quality than previous state-of-the-art methods. - Paper | Project Page | Code
4. Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
Authors : Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, Li Zhang
Abstract
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advancements in neural implicit models, limitations persist: (i) Inadequate Scene Structure: Existing methods struggle to reveal the spatial and temporal structure of dynamic scenes from directly learning the complex 6D plenoptic function. (ii) Scaling Deformation Modeling: Explicitly modeling scene element deformation becomes impractical for complex dynamics. To address these issues, we consider the spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics. This approach offers simplicity, flexibility for variable-length video and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency. - Paper | Code
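To render a 4D (space plus time) Gaussian at a given moment, one natural step is to condition it on the query time, which yields an ordinary 3D Gaussian plus a temporal weight. The sketch below applies the standard conditional-Gaussian formulas; treating this as the slicing step is an assumption about how such a rendering routine could work, and the mean and covariance values are made up.

```python
import numpy as np

def slice_4d_gaussian(mean4, cov4, t):
    """Condition a 4D Gaussian over (x, y, z, t) on a query time t."""
    mu_x, mu_t = mean4[:3], mean4[3]
    S_xx = cov4[:3, :3]      # spatial covariance
    S_xt = cov4[:3, 3]       # space-time cross covariance
    S_tt = cov4[3, 3]        # temporal variance
    mean3 = mu_x + S_xt * (t - mu_t) / S_tt
    cov3 = S_xx - np.outer(S_xt, S_xt) / S_tt
    # Unnormalized marginal density in time: fades the Gaussian in and out.
    weight = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt)
    return mean3, cov3, weight

# Hypothetical primitive: centered at t = 0.5, drifting along x over time.
mean4 = np.array([0.0, 0.0, 0.0, 0.5])
cov4 = np.diag([1.0, 1.0, 1.0, 0.04])
cov4[0, 3] = cov4[3, 0] = 0.1

mean3, cov3, weight = slice_4d_gaussian(mean4, cov4, 0.7)
```

The space-time cross covariance is what makes the sliced center move linearly with time, so a single 4D primitive already encodes a short motion segment.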
5. [ECCV '24] A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Authors : Kai Katsumata, Duc Minh Vo, Hideki Nakayama
Abstract
In novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerous observations per time step, due to the onus of storing 3D Gaussian parameters per time step. In this study, we present an efficient 3D Gaussian representation tailored for dynamic scenes in which we define positions and rotations as functions of time while leaving other time-invariant properties of the static 3D Gaussian unchanged. Notably, our representation reduces memory usage, which is consistent regardless of the input sequence length. Additionally, it mitigates the risk of overfitting observed frames by accounting for temporal changes. The optimization of our Gaussian representation based on image and flow reconstruction results in a powerful framework for dynamic scene view synthesis in both monocular and multi-view cases. We obtain the highest rendering speed of 118 frames per second (FPS) at a resolution of 1352×1014 with a single GPU, showing the practical usability and effectiveness of our proposed method in dynamic scene rendering scenarios. - Paper | Project Page | Code
6. DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Authors : Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis
Abstract
Accurately and efficiently modeling dynamic scenes and motions is a challenging task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework consisting of a tiny set of learned basis queried only in time allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while at the same time, requiring only double the storage compared to static scenes. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene leading to effective and fast optimization. This is done by binding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, independently control them, and generate novel motion combinations that have never been seen before. We can reach state-of-the-art render quality within just 5 minutes of training and in less than half an hour, we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in monocular and multi-view scenarios. - Paper | Project Page | Code (not yet)
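The motion factorization described in the DynMF abstract, where every point shares a small set of basis trajectories and only owns its mixing coefficients, can be illustrated as follows. The two analytic basis functions stand in for the learned, time-queried basis and are purely hypothetical, as are the coefficient values.

```python
import numpy as np

def basis_trajectories(t):
    """Stand-in for a learned basis queried only in time: shape (B, 3)."""
    return np.array([
        [t, 0.0, 0.0],                        # steady drift along x
        [0.0, np.sin(2.0 * np.pi * t), 0.0],  # oscillation along y
    ])

def positions_at(x0, coeffs, t):
    """Per-point displacement is a coefficient-weighted sum of shared bases."""
    return x0 + coeffs @ basis_trajectories(t)

# Three hypothetical points with sparse motion coefficients, shape (N, B).
x0 = np.zeros((3, 3))
coeffs = np.array([[1.0, 0.0],   # follows only the drift
                   [0.0, 1.0],   # follows only the oscillation
                   [0.0, 0.0]])  # static point
moved = positions_at(x0, coeffs, 0.25)
```

Because the coefficients are sparse, zeroing one column switches off one motion for the whole scene, which is the kind of independent control the abstract refers to.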
7. [CVPR '24] Control4D: Efficient 4D Portrait Editing with Text
Authors : Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu
Abstract
We introduce Control4D, an innovative framework for editing dynamic 4D portraits using text instructions. Our method addresses the prevalent challenges in 4D editing, notably the inefficiencies of existing 4D representations and the inconsistent editing effect caused by diffusion-based editors. We first propose GaussianPlanes, a novel 4D representation that makes Gaussian Splatting more structured by applying plane-based decomposition in 3D space and time. This enhances both efficiency and robustness in 4D editing. Furthermore, we propose to leverage a 4D generator to learn a more continuous generation space from inconsistent edited images produced by the diffusion-based editor, which effectively improves the consistency and quality of 4D editing. Comprehensive evaluation demonstrates the superiority of Control4D, including significantly reduced training time, high-quality rendering, and spatial-temporal consistency in 4D portrait editing. - Paper | Project Page | Code (not yet)
8. [CVPR '24] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Authors : Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
Abstract
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6 DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of 3D Gaussians. We employ a deformation MLP to predict time-varying 6 DoF transformations for each control point, which reduces learning complexities, enhances learning abilities, and facilitates obtaining temporal and spatial coherent motion patterns. Then, we jointly learn the 3D Gaussians, the canonical space locations of control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the principle of as rigid as possible is developed to enforce spatial continuity and local rigidity of learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method can enable user-controlled motion editing while retaining high-fidelity appearances. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserved motion editing applications. - Paper | Project Page | Code
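Interpolating a dense motion field from a few control-point transformations resembles linear blend skinning. The sketch below uses fixed Gaussian-kernel weights over the k nearest control points; the abstract says the weights are learned, so the kernel, the radius, and every value here are simplifying assumptions.

```python
import numpy as np

def blend_control_points(x, ctrl_pos, ctrl_R, ctrl_T, k=2, radius=1.0):
    """Move point x by blending the rigid transforms of its k nearest control points."""
    d2 = np.sum((ctrl_pos - x) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]
    w = np.exp(-d2[nearest] / (2.0 * radius ** 2))  # assumed RBF weights
    w = w / w.sum()
    moved = np.zeros(3)
    for wi, j in zip(w, nearest):
        # Rotate about the control point, then apply its translation.
        moved += wi * (ctrl_R[j] @ (x - ctrl_pos[j]) + ctrl_pos[j] + ctrl_T[j])
    return moved

# Two hypothetical control points: one static, one lifted by 2 units along y.
ctrl_pos = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
ctrl_R = np.stack([np.eye(3), np.eye(3)])
ctrl_T = np.array([[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]])

x = np.array([1.0, 0.0, 0.0])  # a Gaussian center halfway between them
x_moved = blend_control_points(x, ctrl_pos, ctrl_R, ctrl_T)
```

A Gaussian halfway between the two control points inherits half of the lift, which is why editing a handful of control points deforms the dense Gaussians smoothly.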
9. [CVPR '24] Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Authors : Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen
Abstract
Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then is used as regularization for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects, maintaining 3D consistency across novel views. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues. - Paper
10. [CVPR '24] Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Authors : Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao
Abstract
We introduce Gaussian-Flow, a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos. In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS). Specifically, a novel Dual-Domain Deformation Model (DDDM) is proposed to explicitly model attribute deformations of each Gaussian point, where the time-dependent residual of each attribute is captured by a polynomial fitting in the time domain, and a Fourier series fitting in the frequency domain. The proposed DDDM is capable of modeling complex scene deformations across long video footage, eliminating the need for training separate 3DGS for each frame or introducing an additional implicit neural field to model 3D dynamics. Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction. Our proposed approach showcases a substantial efficiency improvement, achieving a 5× faster training speed compared to the per-frame 3DGS modeling. In addition, quantitative results demonstrate that the proposed Gaussian-Flow significantly outperforms previous leading methods in novel view rendering quality. - Paper | Project Page | Code (not yet)
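A time-dependent residual built from a polynomial term plus a Fourier series, as the DDDM description suggests, might be evaluated like this. The parameterization (polynomial powers starting at 1, integer frequencies on a normalized time axis) and the coefficient values are assumptions chosen for illustration.

```python
import numpy as np

def dddm_residual(t, poly, fourier_sin, fourier_cos):
    """Residual of a D-dimensional attribute at normalized time t in [0, 1].

    poly:        (D, N) polynomial coefficients for t^1 .. t^N
    fourier_sin: (D, L) sine coefficients for frequencies 1 .. L
    fourier_cos: (D, L) cosine coefficients for frequencies 1 .. L
    """
    powers = t ** np.arange(1, poly.shape[1] + 1)
    phases = 2.0 * np.pi * np.arange(1, fourier_sin.shape[1] + 1) * t
    return poly @ powers + fourier_sin @ np.sin(phases) + fourier_cos @ np.cos(phases)

# Hypothetical Gaussian position with a linear drift in x and a wobble in y.
base_position = np.array([1.0, 2.0, 3.0])
poly = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
fourier_sin = np.array([[0.0], [0.5], [0.0]])
fourier_cos = np.zeros((3, 1))

position = base_position + dddm_residual(0.25, poly, fourier_sin, fourier_cos)
```

The polynomial part captures slow trends and the Fourier part captures periodic motion, and both are cheap closed-form evaluations per Gaussian rather than a network query.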
11. [CVPR '24] CoGS: Controllable Gaussian Splatting
Authors : Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni
Abstract
Capturing and re-animating the 3D structure of articulated objects presents significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons. Firstly, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras, and secondly, the lack of controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting, that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS using both synthetic and real-world datasets that include dynamic objects that differ in degree of difficulty. In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity. - Paper | Project Page | Code (not yet)
12. GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
Authors : Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao
Abstract
We propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering. - Paper | Project Page | Short Presentation
13. [CVPR '24] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Authors : Zhan Li, Zhang Chen, Zhong Li, Yi Xu
Abstract
Novel view synthesis of dynamic scenes has been an intriguing yet challenging problem. Despite recent advancements, simultaneously achieving high-resolution photorealistic results, real-time rendering, and compact storage remains a formidable task. To address these challenges, we propose Spacetime Gaussian Feature Splatting as a novel dynamic scene representation, composed of three pivotal components. First, we formulate expressive Spacetime Gaussians by enhancing 3D Gaussians with temporal opacity and parametric motion/rotation. This enables Spacetime Gaussians to capture static, dynamic, as well as transient content within a scene. Second, we introduce splatted feature rendering, which replaces spherical harmonics with neural features. These features facilitate the modeling of view- and time-dependent appearance while maintaining small size. Third, we leverage the guidance of training error and coarse depth to sample new Gaussians in areas that are challenging to converge with existing pipelines. Experiments on several established real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and speed, while retaining compact storage. At 8K resolution, our lite-version model can render at 60 FPS on an Nvidia RTX 4090 GPU. - Paper | Project Page | Code | - Short Presentation
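Editor's sketch of the first component above (temporal opacity and parametric motion). The abstract does not give the functional forms, so the Gaussian-shaped falloff in time and the polynomial trajectory below are assumptions chosen for illustration, and every name (`temporal_opacity`, `position_at`, `sharpness`, `coeffs`) is hypothetical rather than taken from the authors' code.

```python
import math
from typing import Sequence, Tuple


def temporal_opacity(t: float, base_opacity: float,
                     center_time: float, sharpness: float) -> float:
    """Opacity that peaks at center_time and fades away from it.

    Assumed form: base_opacity * exp(-sharpness * (t - center_time)^2).
    A large sharpness models transient content; 0 gives a static Gaussian.
    """
    return base_opacity * math.exp(-sharpness * (t - center_time) ** 2)


def position_at(t: float, center_time: float,
                coeffs: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Polynomial trajectory: coeffs[p] is the 3-vector multiplying (t - center_time)^p."""
    dt = t - center_time
    return tuple(
        sum(c[axis] * dt ** power for power, c in enumerate(coeffs))
        for axis in range(3)
    )


# One Gaussian that is fully visible at t = 0.5 and drifts along +x.
opacity_now = temporal_opacity(0.5, 0.8, 0.5, 10.0)
opacity_later = temporal_opacity(1.5, 0.8, 0.5, 10.0)
start = position_at(0.5, 0.5, [(1.0, 2.0, 3.0), (1.0, 0.0, 0.0)])
moved = position_at(1.5, 0.5, [(1.0, 2.0, 3.0), (1.0, 0.0, 0.0)])
```

With a single-entry `coeffs` and `sharpness = 0`, both functions reduce to an ordinary static Gaussian, which is how one representation can cover static, dynamic, and transient content.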
14. MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes
Authors : Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, Jeffrey Ichnowski
Abstract
Accurate 3D tracking in highly deformable scenes with occlusions and shadows can facilitate new applications in robotics, augmented reality, and generative AI. However, tracking under these conditions is extremely challenging due to the ambiguity that arises with large deformations, shadows, and occlusions. We introduce MD-Splatting, an approach for simultaneous 3D tracking and novel view synthesis, using video captures of a dynamic scene from various camera poses. MD-Splatting builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel view synthesis. MD-Splatting learns a deformation function to project a set of Gaussians with non-metric, thus canonical, properties into metric space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on local rigidity, conservation of momentum, and isometry, which leads to trajectories with smaller trajectory errors. MD-Splatting achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. Compared to state-of-the-art, we improve 3D tracking by an average of 23.9%, while simultaneously achieving high-quality novel view synthesis. With sufficient texture such as in scene 6, MD-Splatting achieves a median tracking error of 3.39 mm on a cloth of 1 x 1 meters in size. - Paper | Project Page | Code (not yet)
15. [ECCV'24] SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Authors : Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero
Abstract
Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time in our dynamic interactive viewer. - Paper
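Editor's sketch of the idea of choosing window sizes from scene motion. The abstract does not state the sampling rule, so the rule below (close a window once accumulated motion reaches a budget, or once a maximum length is hit) is an assumption for illustration only. The names `partition_windows`, `motion_budget`, and `max_len` are hypothetical, and the windows here do not overlap, which may differ from the paper.

```python
from typing import List, Sequence, Tuple


def partition_windows(frame_motion: Sequence[float],
                      motion_budget: float,
                      max_len: int) -> List[Tuple[int, int]]:
    """Split frames into half-open windows [start, end).

    frame_motion[i] is any scalar motion estimate for frame i (for example a
    mean optical-flow magnitude). High-motion stretches get short windows and
    near-static stretches get long ones.
    """
    windows: List[Tuple[int, int]] = []
    start = 0
    accumulated = 0.0
    for i, motion in enumerate(frame_motion):
        accumulated += motion
        length = i - start + 1
        if accumulated >= motion_budget or length == max_len:
            windows.append((start, i + 1))
            start = i + 1
            accumulated = 0.0
    if start < len(frame_motion):
        # Flush the trailing frames that never reached the budget.
        windows.append((start, len(frame_motion)))
    return windows


windows = partition_windows([0.1, 0.1, 0.1, 1.0, 1.0, 0.1],
                            motion_budget=0.5, max_len=4)
```

Each resulting window would then get its own dynamic model, which is what lets the canonical representation change over a long sequence.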
16. [CVPR '24] 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Authors : Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing
Abstract
Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods. - Paper | Project Page | Code (not yet) | - 3DGStream Viewer
Editing:
2024:
1. Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
Authors : Myrna C. Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue
Abstract
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting on it the Gaussians before α blending their color. Following this example, we train a model to include also a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors; and to generate 2D segmentation masks, by projecting the Gaussians on a plane and α blending over their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art. Code and trained models will be released upon acceptance. - Paper
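Editor's sketch of α blending over per-Gaussian feature vectors for a single pixel, using the standard front-to-back compositing rule from 3D Gaussian Splatting. It assumes the Gaussians covering the pixel are already projected and depth-sorted, and that `alpha` is each Gaussian's effective opacity at that pixel. The function name and the toy inputs are hypothetical, not from the paper's code.

```python
from typing import List, Sequence, Tuple


def alpha_blend_features(
    splats: Sequence[Tuple[float, Sequence[float]]]
) -> List[float]:
    """Composite (alpha, feature) pairs, ordered near to far, into one pixel feature.

    Each Gaussian contributes alpha_i * T_i * feature_i, where T_i is the
    transmittance left over after all nearer Gaussians.
    """
    if not splats:
        return []
    dim = len(splats[0][1])
    blended = [0.0] * dim
    transmittance = 1.0
    for alpha, feature in splats:
        weight = alpha * transmittance
        for k in range(dim):
            blended[k] += weight * feature[k]
        transmittance *= 1.0 - alpha
    return blended


# Two half-opaque Gaussians carrying one-hot segmentation features.
pixel_feature = alpha_blend_features([(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])])
```

Swapping the feature vectors for RGB colors gives the usual color rendering, which is why the same rasterizer can produce both images and segmentation maps.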
2. CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians
Authors : Bin Dou, Tianyu Zhang, Yongjia Ma, Zhaohui Wang, Zejian Yuan
Abstract
We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a method for compact 3D-consistent scene segmentation at fast rendering speed with only RGB images input. Previous NeRF-based 3D segmentation methods have relied on implicit or voxel neural scene representation and ray-marching volume rendering which are time consuming. Recent 3D Gaussian Splatting significantly improves the rendering speed, however, existing Gaussians-based segmentation methods (e.g., Gaussian Grouping) fail to provide compact segmentation masks especially in zero-shot segmentation, which is mainly caused by the lack of robustness and compactness for straightforwardly assigning learnable parameters to each Gaussian when encountering inconsistent 2D machine-generated labels. Our method aims to achieve compact and reliable zero-shot scene segmentation swiftly by mapping fused spatial and semantically meaningful features for each Gaussian point with a shallow decoding network. Specifically, our method firstly optimizes Gaussian points' position, covariance and color attributes under the supervision of RGB images. After Gaussian Locating, we distill multi-scale DINO features extracted from images through unprojection to each Gaussian, which is then incorporated with spatial features from the fast point features processing network, i.e., RandLA-Net. Then the shallow decoding MLP is applied to the multi-scale fused features to obtain compact segmentation. Experimental results show that our model can perform high-quality zero-shot scene segmentation, as our model outperforms other segmentation methods on both semantic and panoptic segmentation tasks, while consuming approximately only 10% of the segmenting time of NeRF-based segmentation. - Paper | Project Page | Code (not yet)
3. TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Authors : Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan
Abstract
Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIP-Editor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality, and the alignment to the prompts, qualitatively and quantitatively. - Paper | Project Page
4. Segment Anything in 3D Gaussians
Authors : Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang
Abstract
3D Gaussian Splatting has emerged as an alternative 3D representation of Neural Radiance Fields (NeRFs), benefiting from its high-quality rendering results and real-time rendering speed. Considering the 3D Gaussian representation remains unparsed, it is necessary first to execute object segmentation within this domain. Subsequently, scene editing and collision detection can be performed, proving vital to a multitude of applications, such as virtual reality (VR), augmented reality (AR), game/movie production, etc. In this paper, we propose a novel approach to achieve object segmentation in 3D Gaussian via an interactive procedure without any training process and learned parameters. We refer to the proposed method as SA-GS, for Segment Anything in 3D Gaussians. Given a set of clicked points in a single input view, SA-GS can generalize SAM to achieve 3D consistent segmentation via the proposed multi-view mask generation and view-wise label assignment methods. We also propose a cross-view label-voting approach to assign labels from different views. In addition, in order to address the boundary roughness issue of segmented objects resulting from the non-negligible spatial sizes of 3D Gaussian located at the boundary, SA-GS incorporates the simple but effective Gaussian Decomposition scheme. Extensive experiments demonstrate that SA-GS achieves high-quality 3D segmentation results, which can also be easily applied for scene editing and collision detection tasks. - Paper
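Editor's sketch of cross-view label voting in its simplest form: each view proposes a label for the Gaussians it sees, and every Gaussian keeps the label proposed most often. The abstract does not specify the voting rule, so plain majority voting is an assumption here, and `vote_labels` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def vote_labels(per_view_labels: List[Dict[int, Hashable]]) -> Dict[int, Hashable]:
    """Majority vote per Gaussian.

    per_view_labels[v] maps a Gaussian id to the label assigned in view v;
    Gaussians not visible in a view are simply absent from that dict.
    """
    votes: Dict[int, Counter] = {}
    for view in per_view_labels:
        for gaussian_id, label in view.items():
            votes.setdefault(gaussian_id, Counter())[label] += 1
    return {gid: counts.most_common(1)[0][0] for gid, counts in votes.items()}


views = [
    {0: "object", 1: "background"},
    {0: "object", 1: "object"},
    {0: "background", 1: "background"},
]
labels = vote_labels(views)
```

Voting across views lets a wrong mask in one view be outvoted by the others, which is the point of aggregating labels in 3D instead of trusting any single 2D mask.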
5. GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting
Authors : Francesco Palandra, Andrea Sanchietti, Daniele Baieri, Emanuele Rodolà
Abstract
We present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models. Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware. We tackle the problem by leveraging Gaussian splatting to represent 3D scenes, and we optimize the model while progressively varying the image supervision by means of a pretrained image-based diffusion model. The input object may be given as a 3D triangular mesh, or directly provided as Gaussians from a generative model such as DreamGaussian. GSEdit ensures consistency across different viewpoints, maintaining the integrity of the original object's information. Compared to previously proposed methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. Our editing process is refined via the application of the SDS loss, ensuring that our edits are both precise and accurate. Our comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following the given textual instructions while preserving their coherence and detail. - Paper
6. GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Authors : Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu
Abstract
We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS). Our method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works. It leads to faster editing as well as higher visual quality. This is achieved by two terms: (a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps; (b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing to several reference views through self and cross-view attention between images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods. - Paper
7. View-Consistent 3D Editing with Gaussian Splatting
Authors : Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang
Abstract
The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. - Paper
8. Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Authors : Antoine Guédon, Vincent Lepetit
Abstract
We propose Gaussian Frosting, a novel mesh-based representation for high-quality rendering and editing of complex 3D effects in real-time. Our approach builds on the recent 3D Gaussian Splatting framework, which optimizes a set of 3D Gaussians to approximate a radiance field from images. We propose first extracting a base mesh from Gaussians during optimization, then building and refining an adaptive layer of Gaussians with a variable thickness around the mesh to better capture the fine details and volumetric effects near the surface, such as hair or grass. We call this layer Gaussian Frosting, as it resembles a coating of frosting on a cake. The fuzzier the material, the thicker the frosting. We also introduce a parameterization of the Gaussians to enforce them to stay inside the frosting layer and automatically adjust their parameters when deforming, rescaling, editing or animating the mesh. Our representation allows for efficient rendering using Gaussian splatting, as well as editing and animation by modifying the base mesh. We demonstrate the effectiveness of our method on various synthetic and real scenes, and show that it outperforms existing surface-based approaches. We will release our code and a web-based viewer as additional contributions. - Paper | Project Page | Code (not yet) | - Short Presentation
9. Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Authors : Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li
Abstract
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Previous approaches have adopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, without the additional training required by NeRFs. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. We explore several applications of Semantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0% mAcc improvement over prior open-vocabulary scene understanding counterparts; object part segmentation, scene editing, and spatial-temporal segmentation with better qualitative results over 2D and 3D baselines, highlighting its versatility and effectiveness on supporting diverse downstream tasks. - Paper | Project Page | Code (not yet)
10. EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Authors : Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney
Abstract
In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in ego-centric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets which shows the promise of the method for 3D egocentric perception at scale. - Paper | Project Page | Code (not yet)
11. InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
Authors : Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao
Abstract
3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from large-scale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion. - Paper | Project Page | Code
12. Gaga: Group Any Gaussians via 3D-aware Memory Bank
Authors : Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract
We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Contrasted to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot segmentation models, significantly enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as scene understanding and manipulation. - Paper | Project Page | Code
13. [CVPR W'24] ICE-G: Image Conditional Editing of 3D Gaussian Splats
Authors : Vishnu Jaganathan, Hannah Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira
Abstract
Recently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner. These edited views act as an updated dataset to further train and re-style the 3D scene. The end-result is therefore an edited 3D model. Our framework enables a wide variety of editing tasks such as manual local edits, correspondence based style transfer from any example image, and a combination of different styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique works for other methods such as NeRFs as well. We show through multiple examples that our method produces higher quality results while offering fine grained control of editing. - Paper | Project Page | - Short Presentation
14. Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks
Authors : Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar
Abstract
In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we found that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics. - Preprint | Project Page | Code (Segmentation)
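Editor's sketch of masked-gradient voting under one simplifying assumption: in standard alpha compositing a pixel color is the sum of `weight_i * color_i` with `weight_i = alpha_i * T_i`, so the gradient of a pixel with respect to a Gaussian's color is just that weight. The sketch therefore takes those per-pixel weights as given and tallies them inside and outside a 2D mask. The decision rule (inside vote greater than outside vote) and all names are hypothetical, not the authors' exact formulation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def segment_by_masked_votes(
    contributions: Dict[Pixel, List[Tuple[int, float]]],
    mask: Set[Pixel],
) -> Set[int]:
    """Return ids of Gaussians whose in-mask vote exceeds their out-of-mask vote.

    contributions[pixel] lists (gaussian_id, weight) pairs, where weight stands
    in for the gradient of that pixel with respect to the Gaussian's color.
    """
    inside: Dict[int, float] = defaultdict(float)
    outside: Dict[int, float] = defaultdict(float)
    for pixel, splats in contributions.items():
        bucket = inside if pixel in mask else outside
        for gaussian_id, weight in splats:
            bucket[gaussian_id] += weight
    return {gid for gid in inside if inside[gid] > outside[gid]}


contributions = {
    (0, 0): [(0, 0.8), (1, 0.1)],
    (0, 1): [(1, 0.7)],
    (1, 1): [(0, 0.1), (2, 0.9)],
}
selected = segment_by_masked_votes(contributions, mask={(0, 0), (1, 1)})
```

Gaussians that receive almost no vote from any pixel contribute little to any rendering, which hints at why the same quantities can also guide pruning.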
2023:
1. [CVPR '24] GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Authors : Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin
Abstract
3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation technique. GaussianEditor enhances precision and control in editing through our proposed Gaussian Semantic Tracing, which traces the editing target throughout the training process. Additionally, we propose hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. - Paper | Project Page | Code | - Short Presentation
2. [CVPR '24] GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Authors : Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian
Abstract
Recently, impressive results have been achieved in 3D scene editing with text instructions based on a 2D diffusion model. However, current diffusion models primarily generate images by predicting noise in the latent space, and the editing is usually applied to the whole image, which makes it challenging to perform delicate, especially localized, editing for 3D scenes. Inspired by recent 3D Gaussian splatting, we propose a systematic framework, named GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text instructions. Benefiting from the explicit property of 3D Gaussians, we design a series of techniques to achieve delicate editing. Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians. The Gaussian RoI is further used to control the editing process. Our framework can achieve more delicate and precise editing of 3D scenes than previous methods while enjoying much faster training speed, i.e., within 20 minutes on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF (45 minutes -- 2 hours). - Paper | Project Page | Code (not yet) | - Short Presentation
3. Point'n Move: Interactive Scene Object Manipulation on Gaussian Splatting Radiance Fields
Authors : Jiajun Huang, Hongchuan Yu
Abstract
We propose Point'n Move, a method that achieves interactive scene object manipulation with exposed region inpainting. Interactivity here further comes from intuitive object selection and real-time editing. To achieve this, we adopt Gaussian Splatting Radiance Field as the scene representation and fully leverage its explicit nature and speed advantage. Its explicit representation formulation allows us to devise a 2D prompt points to 3D mask dual-stage self-prompting segmentation algorithm, perform mask refinement and merging, minimize change as well as provide good initialization for scene inpainting and perform editing in real-time without per-editing training, all of which leads to superior quality and performance. We test our method by performing editing on both forward-facing and 360 scenes. We also compare our method against existing scene object removal methods, showing superior quality despite being more capable and having a speed advantage. - Paper
4. [ECCV'24] Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Authors : Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke
Abstract
The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during the differentiable rendering by leveraging the 2D mask predictions by SAM, along with introduced 3D spatial consistency regularization. Compared to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization and scene recomposition. - Paper | Code
5. Segment Any 3D Gaussians
Authors : Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract
Interactive 3D segmentation in radiance fields is an appealing task owing to its importance in 3D scene understanding and manipulation. However, existing methods face challenges in either achieving fine-grained, multi-granularity segmentation or contending with substantial computational overhead, inhibiting real-time interaction. In this paper, we introduce Segment Any 3D GAussians (SAGA), a novel 3D interactive segmentation approach that seamlessly blends a 2D segmentation foundation model with 3D Gaussian Splatting (3DGS), a recent breakthrough of radiance fields. SAGA efficiently embeds multi-granularity 2D segmentation results generated by the segmentation foundation model into 3D Gaussian point features through well-designed contrastive training. Evaluation on existing benchmarks demonstrates that SAGA can achieve competitive performance with state-of-the-art methods. Moreover, SAGA achieves multi-granularity segmentation and accommodates various prompts, including points, scribbles, and 2D masks. Notably, SAGA can finish the 3D segmentation within milliseconds, achieving nearly 1000× acceleration compared to previous SOTA. - Paper | Project Page | Code
6. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. - Paper | Project Page | Code | - Short Presentation
7. 2D-Guided 3D Gaussian Segmentation
Authors : Kun Lan, Haoran Li, Haolin Shi, Wenjun Wu, Yong Liao, Lin Wang, Pengyuan Zhou
Abstract
Recently, 3D Gaussian, as an explicit 3D representation method, has demonstrated strong competitiveness over NeRF (Neural Radiance Fields) in terms of expressing complex scenes and training duration. These advantages signal a wide range of applications for 3D Gaussians in 3D understanding and editing. Meanwhile, the segmentation of 3D Gaussians is still in its infancy. The existing segmentation methods are not only cumbersome but also incapable of segmenting multiple objects simultaneously in a short amount of time. In response, this paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision. This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information, while nearest neighbor clustering and statistical filtering refine the segmentation results. Experiments show that our concise method can achieve comparable performances on mIOU and mAcc for multi-object segmentation as previous single-object segmentation methods. - Paper
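Several entries in this section refine per-Gaussian labels with a spatial cleanup step; the last abstract mentions nearest neighbor clustering and statistical filtering. As a rough illustration only, the sketch below applies a generic statistical outlier filter to the centers of Gaussians that share one label. The function name `statistical_filter`, the parameters `k` and `std_ratio`, and the toy coordinates are all assumptions made for this example and are not taken from any of the listed papers.

```python
import math
from statistics import mean, pstdev


def statistical_filter(points, k=3, std_ratio=1.0):
    """Return indices of points whose mean k-NN distance is not an outlier.

    Generic statistical outlier removal (an assumption for illustration,
    not the procedure of any specific paper in this list).
    """
    if len(points) <= k:
        return list(range(len(points)))
    mean_knn = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point; fine for a toy example.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(mean(dists[:k]))
    # Keep points whose mean k-NN distance is within std_ratio standard
    # deviations of the average over all points.
    threshold = mean(mean_knn) + std_ratio * pstdev(mean_knn)
    return [i for i, d in enumerate(mean_knn) if d <= threshold]


# Hypothetical centers of Gaussians that all received the same object label.
centers = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (0.1, 0.1, 0.0), (0.0, 0.0, 0.1), (5.0, 5.0, 5.0),
]
kept = statistical_filter(centers, k=3, std_ratio=1.0)
print(kept)
```

In this toy input the far-away center at index 5 is dropped and the compact cluster is kept. A real pipeline would likely use a spatial index instead of the brute-force distance loop.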
Language Embedding:
2024:
1. [IROS '24] Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot
Authors : Justin Yu, Kush Hari, Kishore Srinivas, Karim El-Refai, Adam Rashid, Chung Min Kim, Justin Kerr, Richard Cheng, Muhammad Zubair Irshad, Ashwin Balakrishna, Thomas Kollar, Ken Goldberg
Abstract
Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy. - Paper | Project Page
2. [CVPR '24] Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Authors : Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, Shao-Hua Guan
Abstract
Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. - Paper | Project Page | Code
3. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. - Paper | Project Page | Code | - Short Presentation
4. [CVPR '24] LangSplat: 3D Language Gaussian Splatting
Authors : Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister
Abstract
Human lives in a 3D world and commonly uses natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat advances the field by utilizing a collection of 3D Gaussians, each encoding language features distilled from CLIP, to represent the language field. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features on the scene-specific latent space, thereby alleviating substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating the need for extensively querying the language field across various scales and the regularization of DINO features. Extensive experiments on open-vocabulary 3D object localization and semantic segmentation demonstrate that LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin. Notably, LangSplat is extremely efficient, achieving a {speed} × speedup compared to LERF at the resolution of 1440 × 1080. - Paper | Project Page | Code | - Short Presentation
5. SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting
Authors : Xinyi Liu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi
Abstract
Many recent developments for robots to represent environments have focused on photorealistic reconstructions. This paper particularly focuses on generating sequences of images from the photorealistic Gaussian Splatting models, that match instructions that are given by user-inputted language. We contribute a novel framework, SplaTraj, which formulates the generation of images within photorealistic environment representations as a continuous-time trajectory optimization problem. Costs are designed so that a camera following the trajectory poses will smoothly traverse through the environment and render the specified spatial information in a photogenic manner. This is achieved by querying a photorealistic representation with language embedding to isolate regions that correspond to the user-specified inputs. These regions are then projected to the camera's view as it moves over time and a cost is constructed. We can then apply gradient-based optimization and differentiate through the rendering to optimize the trajectory for the defined cost. The resulting trajectory moves to photogenically view each of the specified objects. We empirically evaluate our approach on a suite of environments and instructions, and demonstrate the quality of generated image sequences. - Paper | Code (not yet) | - Short Presentation
6. FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Authors : Xingxing Zuo, Pouya Samangouei, Yunwen Zhou, Yan Di, Mingyang Li
Abstract
Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that makes the rendered feature distance of same semantic entities close, following the pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection, despite that we are 851× faster for inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. - Paper
7. Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
Authors : Hyunjee Lee*, Youngsik Yun*, Jeongmin Bae, Seoha Kim, Youngjung Uh
Abstract
Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS as follows. 1) We directly supervise the 3D points to train the language embedding field. It achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. - Paper | Project Page | Code (not yet)
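The entries in this section share one basic idea: each Gaussian carries a language feature, and an open-vocabulary query compares a text embedding against those features. The sketch below is a minimal, generic version of that comparison using cosine similarity. The three-dimensional toy vectors stand in for real embeddings (for example from CLIP), and `query_gaussians` and its `threshold` are hypothetical names chosen for this example, not the API of any listed project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query_gaussians(features, text_embedding, threshold=0.8):
    """Score every per-Gaussian feature against the text embedding.

    Returns the indices whose score reaches the (assumed) threshold,
    together with the full list of scores.
    """
    scores = [cosine_similarity(f, text_embedding) for f in features]
    selected = [i for i, s in enumerate(scores) if s >= threshold]
    return selected, scores


# Toy stand-ins for per-Gaussian language features and a text embedding.
gaussian_features = [
    (1.0, 0.0, 0.0),
    (0.9, 0.1, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
text_embedding = (1.0, 0.0, 0.0)

selected, scores = query_gaussians(gaussian_features, text_embedding)
print(selected)
```

The listed methods differ mainly in how the features are stored and learned (quantization, scene-wise autoencoders, hash encodings, direct 3D supervision); the final query step is typically some form of similarity scoring like the one sketched here.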
Mesh Extraction and Physics:
2024:
1. Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting
Authors : Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang
Abstract
We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to Gaussian shader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. - Paper | Project Page | Code (not yet) | - Short Presentation
2. GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting
Authors : Joanna Waczyńska, Piotr Borycki, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek
Abstract
In recent years, a range of neural network-based methods for image rendering have been introduced. For instance, widely-researched neural radiance fields (NeRF) rely on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning due to the necessity to condition several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, a hybrid of mesh and a Gaussian distribution, that pins all Gaussian splats on the object surface (mesh). The unique contribution of our methods is defining Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain high-quality renders in the real-time generation of high-quality views. Furthermore, we demonstrate that in the absence of a predefined mesh, it is possible to fine-tune the initial mesh during the learning process. - Paper | Code
3. Mesh-based Gaussian Splatting for Real-time Large-scale Deformation
Authors : Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai
Abstract
Neural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in the real-time fashion. Gaussian Splatting (GS) has recently become a promising method with explicit geometry for representing static scenes and facilitating high-quality and real-time synthesis of novel views. However, it cannot be easily deformed due to the use of discrete Gaussians and lack of explicit topology. To address this, we develop a novel GS-based method that enables interactive deformation. Our key idea is to design an innovative mesh-based GS representation, which is integrated into Gaussian learning and manipulation. 3D Gaussians are defined over an explicit mesh, and they are bound with each other: the rendering of 3D Gaussians guides the mesh face split for adaptive refinement, and the mesh face split directs the splitting of 3D Gaussians. Moreover, the explicit mesh constraints help regularize the Gaussian distribution, suppressing poor-quality Gaussians (e.g., misaligned Gaussians, long-narrow shaped Gaussians), thus enhancing visual quality and avoiding artifacts during deformation. Based on this representation, we further introduce a large-scale Gaussian deformation technique to enable deformable GS, which alters the parameters of 3D Gaussians according to the manipulation of the associated mesh. Our method benefits from existing mesh deformation datasets for more realistic data-driven Gaussian deformation. Extensive experiments show that our approach achieves high-quality reconstruction and effective deformation, while maintaining the promising rendering results at a high frame rate (65 FPS on average). - Paper
4. Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
Authors : Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li
Abstract
Reconstructing and simulating elastic objects from visual observations is crucial for applications in computer vision and robotics. Existing methods, such as 3D Gaussians, provide modeling for 3D appearance and geometry but lack the ability to simulate physical properties or optimize parameters for heterogeneous objects. We propose Spring-Gaus, a novel framework that integrates 3D Gaussians with physics-based simulation for reconstructing and simulating elastic objects from multi-view videos. Our method utilizes a 3D Spring-Mass model, enabling the optimization of physical parameters at the individual point level while decoupling the learning of physics and appearance. This approach achieves great sample efficiency, enhances generalization, and reduces sensitivity to the distribution of simulation particles. We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects. This includes future prediction and simulation under varying initial states and environmental parameters. - Paper | Project Page | Code (not yet)
5. Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
Authors : Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang
Abstract
3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namely Texture-GS, to disentangle the appearance from the geometry by representing it as a 2D texture mapped onto the 3D surface, thereby facilitating appearance editing. Technically, the disentanglement is achieved by our proposed texture mapping module, which consists of a UV mapping MLP to learn the UV coordinates for the 3D Gaussian centers, a local Taylor expansion of the MLP to efficiently approximate the UV coordinates for the ray-Gaussian intersections, and a learnable texture to capture the fine-grained appearance. Extensive experiments on the DTU dataset demonstrate that our method not only facilitates high-fidelity appearance editing but also achieves real-time rendering on consumer-level devices, e.g., a single RTX 2080 Ti GPU. - Paper | Project Page | Code (not yet)
6. DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
Authors : Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala
Abstract
3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints during optimization. We extend 3D Gaussian splatting with depth and normal cues to tackle challenging indoor datasets and showcase techniques for efficient mesh extraction, an important downstream application. Specifically, we regularize the optimization procedure with depth information, enforce local smoothness of nearby Gaussians, and use the geometry of the 3D Gaussians supervised by normal cues to achieve better alignment with the true scene geometry. We improve depth estimation and novel view synthesis results over baselines and show how this simple yet effective regularization technique can be used to directly extract meshes from the Gaussian representation yielding more physically accurate reconstructions on indoor scenes. - Paper | Code | Project Page
7. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
Authors : Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao
Abstract
3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-accurate 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering. - Paper | Project Page | Code | - Short Presentation
7.1 Unofficial Implementation and Specification
Authors : Yunzhou Song, Zixuan Lin, Yexin Zhang
Code
8. Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
Authors : Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang
Abstract
Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, that enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline, to illustrate the challenge and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language. - Paper | Project Page | Code (not yet)
9. [ECCV '24] GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Authors : Yaniv Wolf, Amit Bracha, Ron Kimmel
Abstract
Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Instead of extracting the geometry of the scene directly from the Gaussian properties, we extract the geometry through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while only requiring a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results. - Paper | Project Page | Code
10. RaDe-GS: Rasterizing Depth in Gaussian Splatting
Authors : Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan
Abstract
Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. Our approach achieves a Chamfer distance error comparable to Neuralangelo [Li et al. 2023] on the DTU dataset and similar training and rendering time as traditional Gaussian Splatting on the Tanks & Temples dataset. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods. - Paper | Project Page | Code (not yet)
11. Trim 3D Gaussian Splatting for Accurate Geometry Representation
Authors : Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang
Abstract
In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in previous arts. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. - Paper | Project Page | Code
12. Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting
Authors : Junha Hyung, Susung Hong, Sungwon Hwang, Jaeseong Lee, Jaegul Choo, Jin-Hwa Kim
Abstract
3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals, due to the Gaussians converging into anisotropic Gaussians with one dominant variance. We propose using effective rank analysis to examine the shape statistics of 3D Gaussian primitives, and identify the Gaussians indeed converge into needle-like shapes with the effective rank 1. To address this, we introduce effective rank as a regularization, which constrains the structure of the Gaussians. Our new regularization method enhances normal and geometry reconstruction while reducing needle-like artifacts. The approach can be integrated as an add-on module to other 3DGS variants, improving their quality without compromising visual fidelity. - Paper | Project Page | Code (not yet)
13. CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Authors : Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, Zhaoxiang Zhang
Abstract
Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis. However, accurately representing surfaces, especially in large and complex scenarios, remains a significant challenge due to the unstructured nature of 3DGS. In this paper, we present CityGaussianV2, a novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency. Building on the favorable generalization capabilities of 2D Gaussian Splatting (2DGS), we address its convergence and scalability issues. Specifically, we implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. To scale up, we introduce an elongation filter that mitigates Gaussian count explosion caused by 2DGS degeneration. Furthermore, we optimize the CityGaussian pipeline for parallel training, achieving up to 10x compression, at least 25% savings in training time, and a 50% decrease in memory usage. We also established standard geometry benchmarks under large-scale scenes. Experimental results demonstrate that our method strikes a promising balance between visual quality, geometric accuracy, as well as storage and training costs. - Paper | Project Page | Code (Coming Soon)
14. [CoRL '24] Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision
Authors : Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell, Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
Abstract
We introduce Cloth-Splatting, a method for estimating 3D states of cloth from RGB images through a prediction-update framework. Cloth-Splatting leverages an action-conditioned dynamics model for predicting future states and uses 3D Gaussian Splatting to update the predicted states. Our key insight is that coupling a 3D mesh-based representation with Gaussian Splatting allows us to define a differentiable map between the cloth's state space and the image space. This enables the use of gradient-based optimization techniques to refine inaccurate state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting not only improves state estimation accuracy over current baselines but also reduces convergence time by ~85%. - Paper | Project Page | Code
2023:
1. [CVPR '24] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Authors : Tianyi Xie, Zeshun Zong, Yuxin Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang
Abstract
We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, "cage meshes," or any other geometry embedding, highlighting the principle of "what you see is what you simulate (WS²)." Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. - Paper | Project Page | Code | Short Presentation
2. [CVPR '24] SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Authors : Antoine Guédon, Vincent Lepetit
Abstract
We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D Gaussians as these Gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the Gaussians to align well with the surface of the scene. We then introduce a method that exploits this alignment to sample points on the real surface of the scene and extract a mesh from the Gaussians using Poisson reconstruction, which is fast, scalable, and preserves details, in contrast to the Marching Cubes algorithm usually applied to extract meshes from Neural SDFs. Finally, we introduce an optional refinement strategy that binds Gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional software by manipulating the mesh instead of the Gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality. - Paper | Project Page | Code | Short Presentation
3. NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance
Authors : Hanlin Chen, Chen Li, Gim Hee Lee
Abstract
Existing neural implicit surface reconstruction methods have achieved impressive performance in multi-view 3D reconstruction by leveraging explicit geometry priors such as depth maps or point clouds as regularization. However, the reconstruction results still lack fine details because of the over-smoothed depth map or sparse point cloud. In this work, we propose a neural implicit surface reconstruction pipeline with guidance from 3D Gaussian Splatting to recover highly detailed surfaces. The advantage of 3D Gaussian Splatting is that it can generate dense point clouds with detailed structure. Nonetheless, a naive adoption of 3D Gaussian Splatting can fail since the generated points are the centers of 3D Gaussians that do not necessarily lie on the surface. We thus introduce a scale regularizer to pull the centers close to the surface by enforcing the 3D Gaussians to be extremely thin. Moreover, we propose to refine the point cloud from 3D Gaussian Splatting with the normal priors from the surface predicted by neural implicit models instead of using a fixed set of points as guidance. Consequently, the quality of surface reconstruction improves from the guidance of the more accurate 3D Gaussian Splatting. By jointly optimizing the 3D Gaussian Splatting and the neural implicit model, our approach benefits from both representations and generates complete surfaces with intricate details. Experiments on Tanks and Temples verify the effectiveness of our proposed method. - Paper
Misc:
2024:
1. Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting
Authors : Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White
Abstract
The accelerating deployment of spacecraft in orbit has generated interest in on-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. This requires robust characterization of the target's geometry. In this article, we present an approach for mapping geometries of satellites on orbit based on 3D Gaussian Splatting that can run on computing resources available on current spaceflight hardware. We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up under several realistic lighting and motion conditions. Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms. Such on-board capabilities are critical to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks. - Paper
2. TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering
Authors : Linus Franke, Darius Rückert, Laura Fink, Marc Stamminger
Abstract
Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [Rückert et al. 2022] can accommodate crisper images, but the neural reconstruction network decreases performance, it grapples with temporal instability and it is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. - Paper | Project Page | Code
3. EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
Authors : Lingting Zhu, Zhao Wang, Jiahao Cui, Zhenchao Jin, Guying Lin, Lequan Yu
Abstract
Surgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks. Inspired by 3D Gaussian Splatting, a recent trending 3D representation, we present EndoGS, applying Gaussian Splatting for deformable endoscopic tissue reconstruction. Specifically, our approach incorporates deformation fields to handle dynamic scenes, depth-guided supervision with spatial-temporal weight masks to optimize 3D targets with tool occlusion from a single viewpoint, and surface-aligned regularization terms to capture much better geometry. As a result, EndoGS reconstructs and renders high-quality deformable endoscopic tissues from a single-viewpoint video, estimated depth maps, and labeled tool masks. Experiments on DaVinci robotic surgery videos demonstrate that EndoGS achieves superior rendering quality. - Paper | Code
4. EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction
Authors : Yifan Liu, Chenxin Li, Chen Yang, Yixuan Yuan
Abstract
Reconstructing deformable tissues from endoscopic stereo videos is essential in many downstream surgical applications. However, existing methods suffer from slow inference speed, which greatly limits their practical use. In this paper, we introduce EndoGaussian, a real-time surgical scene reconstruction framework that builds on 3D Gaussian Splatting. Our framework represents dynamic surgical scenes as canonical Gaussians and a time-dependent deformation field, which predicts Gaussian deformations at novel timestamps. Due to the efficient Gaussian representation and parallel rendering pipeline, our framework significantly accelerates the rendering speed compared to previous methods. In addition, we design the deformation field as the combination of a lightweight encoding voxel and an extremely tiny MLP, allowing for efficient Gaussian tracking with a minor rendering burden. Furthermore, we design a holistic Gaussian initialization method to fully leverage the surface distribution prior, achieved by searching informative points from across the input image sequence. Experiments on public endoscope datasets demonstrate that our method can achieve real-time rendering speed (195 FPS real-time, 100× gain) while maintaining the state-of-the-art reconstruction quality (35.925 PSNR) and the fastest training speed (within 2 min/scene), showing significant promise for intraoperative surgery applications. - Paper | Project Page | Code
5. GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting
Authors : Butian Xiong, Zhuo Li, Zhen Li
Abstract
We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. This dataset offers a unique blend of urban and academic environments for advanced spatial analysis and covers more than 1.5 km². Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combining multi-modal information. - Paper
6. LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering
Authors : Sheng Hong, Junjie He, Xinhu Zheng, Hesheng Wang, Hao Fang, Kangcheng Liu, Chunran Zheng, Shaojie Shen
Abstract
We introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multi-modal sensor fused mapping system that builds on the differentiable surface splatting to improve the mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion. This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initial poses for surface Gaussian scenes are obtained using a LiDAR-inertial system with size-adaptive voxels. Then, we optimized and refined the Gaussians by visual-derived photometric gradients to optimize the quality and density of LiDAR measurements. Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes, bolstering structure construction through LiDAR and facilitating real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes potentially for digital twins and virtual reality while also holding potential applicability in real-time SLAM and robotics domains. - Paper | Code (not yet)
7. VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
Authors : Ying Jiang, Chang Yu, Tianyi Xie, Xuan Li, Yutao Feng, Huamin Wang, Minchen Li, Henry Lau, Feng Gao, Yin Yang, Chenfanfu Jiang
Abstract
As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. - Paper | Project Page
8. Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps
Authors : Timothy Chen, Ola Shorinwa, Weijia Zeng, Joseph Bruno, Philip Dames, Mac Schwager
Abstract
We present Splat-Nav, a navigation pipeline that consists of a real-time safe planning module and a robust state estimation module designed to operate in the Gaussian Splatting (GSplat) environment representation, a popular emerging 3D scene representation from computer vision. We formulate rigorous collision constraints that can be computed quickly to build a guaranteed-safe polytope corridor through the map. We then optimize a B-spline trajectory through this corridor. We also develop a real-time, robust state estimation module by interpreting the GSplat representation as a point cloud. The module enables the robot to localize its global pose with zero prior knowledge from RGB-D images using point cloud alignment, and then track its own pose as it moves through the scene from RGB images using image-to-point cloud localization. We also incorporate semantics into the GSplat in order to obtain better images for localization. All of these modules operate mainly on CPU, freeing up GPU resources for tasks like real-time scene reconstruction. We demonstrate the safety and robustness of our pipeline in both simulation and hardware, where we show re-planning at 5 Hz and pose estimation at 20 Hz, an order of magnitude faster than Neural Radiance Field (NeRF)-based navigation methods, thereby enabling real-time navigation. - Paper
9. Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
Authors : Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille
Abstract
X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% training time and over 73x inference speed. The application on sparse-view CT reconstruction also reveals the practical values of our method. - Paper
10. ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Authors : Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang
Abstract
Performing language-conditioned robotic manipulation tasks in unstructured environments is highly demanded for general intelligent robots. Conventional robotic manipulation methods usually learn semantic representation of the observation for action prediction, which ignores the scene-level spatiotemporal dynamics for human goal completion. In this paper, we propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction. Specifically, we first formulate the dynamic Gaussian Splatting framework that infers the semantics propagation in the Gaussian embedding space, where the semantic representation is leveraged to predict the optimal robot action. Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction. We evaluate our ManiGaussian on 10 RLBench tasks with 166 variations, and the results demonstrate our framework can outperform the state-of-the-art methods by 13.1% in average success rate. - Paper | Project Page | Code
11. GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Authors : Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang
Abstract
Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3× lower GPU memory usage and 5× faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 1000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. - Paper
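To make the "8 parameters" and "accumulated summation" ideas in this abstract concrete, here is a minimal sketch of evaluating one pixel as a plain sum over 2D Gaussians. The 8-value layout (2 for position, 3 for a Cholesky factor of the covariance, 3 for an RGB weight) and the function name `render_pixel` are assumptions made for illustration. This is not the authors' implementation.

```python
import math

def render_pixel(x, y, gaussians):
    """Sum the weighted colors of all 2D Gaussians at pixel (x, y).

    Each Gaussian is an 8-tuple (mx, my, l11, l21, l22, r, g, b): a 2D mean,
    a lower-triangular Cholesky factor L of the covariance (Sigma = L L^T),
    and an RGB weight. This layout is an assumption for illustration.
    """
    out = [0.0, 0.0, 0.0]
    for mx, my, l11, l21, l22, r, g, b in gaussians:
        dx, dy = x - mx, y - my
        # Solve L z = d by forward substitution; then d^T Sigma^-1 d = |z|^2.
        z1 = dx / l11
        z2 = (dy - l21 * z1) / l22
        weight = math.exp(-0.5 * (z1 * z1 + z2 * z2))
        out[0] += r * weight
        out[1] += g * weight
        out[2] += b * weight
    return out

# Two unit-covariance Gaussians: a red one at (1, 1) and a blue one at (3, 1).
gaussians = [
    (1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0),
    (3.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0),
]
center = render_pixel(1.0, 1.0, gaussians)
print(center)
```

Because the contributions are simply added, this sketch needs no depth ordering or alpha compositing, and each pixel can be evaluated independently.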
12. GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
Authors : Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang
Abstract
Constructing a 3D scene capable of accommodating open-ended language queries is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g., NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present the GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. - Paper | Code (not yet)
13. Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience
Authors : Xiaohang Yu, Zhengxian Yang, Shi Pan, Yuqi Han, Haoxiang Wang, Jun Zhang, Shi Yan, Borong Lin, Lei Yang, Tao Yu, Lu Fang
Abstract
We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRNet, NeRF, and 3D Gaussian Splatting. More importantly, the collected dataset, which is much denser than existing datasets, may also inspire space-oriented light field reconstruction, which is potentially different from object-centric 3D reconstruction, for immersive VR/AR experiences. We utilized a total of 40 GoPro 10 cameras, capturing images of 5k resolution. The number of photos captured for each scene is no less than 1000, and the average density (view number within a unit sphere) is 134.68. It is also worth noting that our system is capable of efficiently capturing large outdoor scenes. Addressing the current lack of large-space and dense light field datasets, we made efforts to include elements such as sky, reflections, lights and shadows that are of interest to researchers in the field of 3D reconstruction during the data capture process. Finally, we validated the effectiveness of our provided dataset on three popular algorithms and also integrated the reconstructed 3DGS results into the Unity engine, demonstrating the potential of utilizing our datasets to enhance the realism of virtual reality (VR) and create feasible interactive spaces. - Paper
14. Modeling uncertainty for Gaussian Splatting
Authors : Luca Savant, Diego Valsesia, Enrico Magli
Abstract
We present Stochastic Gaussian Splatting (SGS): the first framework for uncertainty estimation using Gaussian Splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of Neural Radiance Fields (NeRF). However, contrary to the latter, it still lacks the ability to provide information about the confidence associated with its outputs. To address this limitation, in this paper, we introduce a Variational Inference-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. Additionally, we introduce the Area Under Sparsification Error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on the LLFF dataset demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications. - Paper
15. TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
Authors : Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu
Abstract
Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined. This enables us to render the 2D DSA image at that specific moment. Additionally, we introduced a Smooth loss term in the loss function to mitigate overfitting issues that may arise in the model when dealing with sparse view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. - Paper
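The per-Gaussian opacity offset table described in this abstract can be pictured with a small lookup sketch. It assumes evenly spaced time stamps over a normalized time range, linear interpolation between neighboring table entries, and clamping of the result to [0, 1]. The abstract does not specify these details, and `opacity_at` is a hypothetical name.

```python
def opacity_at(base_opacity, offset_table, t):
    """Opacity of one Gaussian at normalized time t in [0, 1].

    offset_table holds offsets at evenly spaced time stamps; linear
    interpolation and clamping to [0, 1] are assumptions of this sketch.
    """
    if len(offset_table) < 2:
        raise ValueError("need at least two table entries")
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(offset_table) - 1)
    # Index of the left neighbor, kept in range so lo + 1 is always valid.
    lo = min(int(pos), len(offset_table) - 2)
    frac = pos - lo
    offset = (1.0 - frac) * offset_table[lo] + frac * offset_table[lo + 1]
    return min(max(base_opacity + offset, 0.0), 1.0)

table = [0.0, 0.4, 0.2]  # offsets at t = 0, 0.5 and 1
early = opacity_at(0.1, table, 0.25)  # halfway between the first two entries
peak = opacity_at(0.1, table, 0.5)
late = opacity_at(0.1, table, 1.0)
print(early, peak, late)
```

Rendering a frame at time `t` would then use the interpolated opacity of every Gaussian in place of a single static value.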
16. GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis
Authors : Emmanouil Nikolakakis, Utkarsh Gupta, Jonathan Vengosh, Justin Bui, Razvan Marinescu
Abstract
We present GaSpCT, a novel view synthesis and 3D scene representation method used to generate novel projection views for Computed Tomography (CT) scans. We adapt the Gaussian Splatting framework to enable novel view synthesis in CT based on limited sets of 2D image projections and without the need for Structure from Motion (SfM) methodologies. Therefore, we reduce the total scanning duration and the amount of radiation dose the patient receives during the scan. We adapted the loss function to our use-case by encouraging a stronger background and foreground distinction using two sparsity promoting regularizers: a beta loss and a total variation (TV) loss. Finally, we initialize the Gaussian locations across the 3D space using a uniform prior distribution of where the brain's positioning would be expected to be within the field of view. We evaluate the performance of our model using brain CT scans from the Parkinson's Progression Markers Initiative (PPMI) dataset and demonstrate that the rendered novel views closely match the original projection views of the simulated scan, and have better performance than other implicit 3D scene representation methodologies. Furthermore, we empirically observe reduced training time compared to neural network based image synthesis for sparse-view CT image reconstruction. Finally, the memory requirements of the Gaussian Splatting representations are reduced by 17% compared to the equivalent voxel grid image representations. - Paper
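For readers unfamiliar with the total variation (TV) regularizer named in this abstract, the sketch below computes the standard anisotropic TV of a small 2D image: the sum of absolute differences between vertically and horizontally adjacent pixels. Whether the paper uses this exact variant or normalization is not stated in the abstract, so treat it as a generic illustration. `total_variation` is a hypothetical name.

```python
def total_variation(image):
    """Anisotropic total variation of a 2D image given as a list of rows."""
    height, width = len(image), len(image[0])
    tv = 0.0
    for i in range(height):
        for j in range(width):
            if i + 1 < height:  # vertical neighbor
                tv += abs(image[i + 1][j] - image[i][j])
            if j + 1 < width:   # horizontal neighbor
                tv += abs(image[i][j + 1] - image[i][j])
    return tv

flat = total_variation([[1.0, 1.0], [1.0, 1.0]])  # constant image
edge = total_variation([[0.0, 1.0], [0.0, 1.0]])  # one vertical edge
print(flat, edge)
```

A constant image has zero TV, while every intensity jump adds to it, which is why adding such a term to a loss favors piecewise-smooth renderings.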
17. Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
Authors : Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla
Abstract
Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view (360° viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance). - Paper
18. Dual-Camera Smooth Zoom on Mobile Phones
Authors: Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo
Abstract
When zooming between dual cameras on a mobile, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ) to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground-truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task. - Paper
19. Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction
Authors: Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano
Abstract
Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with an 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100X faster rendering and more than 10X shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer. - Paper
20. SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera
Authors: Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang
Abstract
One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects with backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors. We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. - Paper
21. [CVPR '24] SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection
Authors: Mathis Kruse, Marco Rudolph, Dominik Woiwode, Bodo Rosenhahn
Abstract
Detecting anomalies in images has become a well-explored problem in both academia and industry. State-of-the-art algorithms are able to detect defects in increasingly difficult settings and data modalities. However, most current methods are not suited to address 3D objects captured from differing poses. While solutions using Neural Radiance Fields (NeRFs) have been proposed, they suffer from excessive computation requirements, which hinder real-world usability. For this reason, we propose the novel 3D Gaussian splatting-based framework SplatPose which, given multi-view images of a 3D object, accurately estimates the pose of unseen views in a differentiable manner, and detects anomalies in them. We achieve state-of-the-art results in both training and inference speed, and detection performance, even when using less training data than competing methods. We thoroughly evaluate our framework using the recently proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly detection (MAD) data set. - Paper | Code
22. Reinforcement Learning with Generalizable Gaussian Splatting
Authors: Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu
Abstract
An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations contain several drawbacks. They either cannot describe complex local geometries, cannot generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL. - Paper
23. DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark
Authors: Tianyi Zhang, Kaining Huang, Weiming Zhi, Matthew Johnson-Roberson
Abstract
Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments. - Paper | Code | Short Presentation | Short Presentation (Bilibili)
24. Adversarial Generation of Hierarchical Gaussians for 3D Generative Model
Authors: Sangeek Hyun, Jae-Pil Heo
Abstract
Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that ours achieves a significantly faster rendering speed (100x) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability. - Paper | Project Page
25. Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting
Authors: Matthew Strong, Boshu Lei, Aiden Swann, Wen Jiang, Kostas Daniilidis, Monroe Kennedy III
Abstract
We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in both a photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes. - Paper | Project Page | Code
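The Next Best Sense abstract above refers to a Pearson depth loss. A minimal sketch of such a loss follows, assuming the usual form of one minus the Pearson correlation between rendered and reference depth. The function name `pearson_depth_loss` and the `eps` stabilizer are illustrative choices, not the authors' code.

```python
import numpy as np

def pearson_depth_loss(rendered: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Return 1 - Pearson correlation between two depth maps (hypothetical helper)."""
    r = rendered.ravel() - rendered.mean()    # centre the rendered depth
    m = reference.ravel() - reference.mean()  # centre the reference depth
    corr = (r @ m) / (np.linalg.norm(r) * np.linalg.norm(m) + eps)
    return float(1.0 - corr)

rendered = np.linspace(1.0, 5.0, 12).reshape(3, 4)
# The loss ignores any positive scale and offset between the two depth maps.
loss_affine = pearson_depth_loss(rendered, 3.0 * rendered + 2.0)
# Perfectly anti-correlated depth gives the maximum loss of about 2.
loss_flipped = pearson_depth_loss(rendered, -rendered)
```

Because the correlation is invariant to scale and offset, a loss of this kind can compare rendered depth against a monocular depth estimate whose absolute scale is unknown.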
26. Radiance Fields for Robotic Teleoperation
Authors: Maximum Wilder-Smith, Vaishakh Patil, Marco Hutter
Abstract
Radiance field methods such as Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS) have revolutionized graphics and novel view synthesis. Their ability to synthesize new viewpoints with photo-realistic quality, as well as capture complex volumetric and specular scenes, makes them an ideal visualization for robotic teleoperation setups. Direct camera teleoperation provides high-fidelity operation at the cost of maneuverability, while reconstruction-based approaches offer controllable scenes with lower fidelity. With this in mind, we propose replacing the traditional reconstruction-visualization components of the robotic teleoperation pipeline with online Radiance Fields, offering highly maneuverable scenes with photorealistic quality. As such, there are three main contributions to state of the art: (1) online training of Radiance Fields using live data from multiple cameras, (2) support for a variety of radiance methods including NeRF and 3DGS, (3) visualization suite for these methods including a virtual reality scene. To enable seamless integration with existing setups, these components were tested with multiple robots in multiple configurations and were displayed using traditional tools as well as the VR headset. The results across methods and robots were compared quantitatively to a baseline of mesh reconstruction, and a user study was conducted to compare the different visualization methods. - Paper | Project Page | Code
2023:
1. [ECCV '24] FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information
Authors: Wen Jiang, Boshu Lei, Kostas Daniilidis
Abstract
This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the limited availability of 2D images poses uncertainties stemming from occlusions, depth ambiguities, and imaging errors. Efficiently selecting informative views becomes crucial, and quantifying NeRF model uncertainty presents intricate challenges. Existing approaches either depend on model architecture or are based on assumptions regarding density distributions that are not generally applicable. By leveraging Fisher Information, we efficiently quantify observed information within Radiance Fields without ground truth data. This can be used for the next best view selection and pixel-wise uncertainty quantification. Our method overcomes existing limitations on model architecture and effectiveness, achieving state-of-the-art results in both view selection and uncertainty quantification, demonstrating its potential to advance the field of Radiance Fields. Our method with the 3D Gaussian Splatting backend could perform view selections at 70 fps. - Paper | Project Page | Code
2. Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering
Authors: Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, Li Zhang
Abstract
Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent representation learning with sparse training data, we introduce a novel flow-based temporal smoothing mechanism and a position-aware adaptive control strategy. Extensive experiments on Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes. Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits 50/6000-fold acceleration in training/rendering over the best alternative. - Paper | Project Page | Code (not yet)
3. MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians
Authors: Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar
Abstract
Understanding how we grasp objects with our hands has important applications in areas like robotics and mixed reality. However, this challenging problem requires accurate modeling of the contact between hands and objects. To capture grasps, existing methods use skeletons, meshes, or parametric models that can cause misalignments resulting in inaccurate contacts. We present MANUS, a method for Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians. We build a novel articulated 3D Gaussians representation that extends 3D Gaussian splatting for high-fidelity representation of articulating hands. Since our representation uses Gaussian primitives, it enables us to efficiently and accurately estimate contacts between the hand and the object. For the most accurate results, our method requires tens of camera views that current datasets do not provide. We therefore build MANUS-Grasps, a new dataset that contains hand-object grasps viewed from 53 cameras across 30+ scenes, 3 subjects, and comprising over 7M frames. In addition to extensive qualitative results, we also show that our method outperforms others on a quantitative contact evaluation method that uses paint transfer from the object to the hand. - Paper
4. [CVPR '24] Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Authors: Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, Song-Hai Zhang
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality compared to explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. - Paper | Project Page | Code
5. Mathematical Supplement for the gsplat Library
Authors: Vickie Ye, Angjoo Kanazawa
Abstract
This report provides the mathematical details of the gsplat library, a modular toolbox for efficient differentiable Gaussian splatting, as proposed by Kerbl et al. It provides a self-contained reference for the computations involved in the forward and backward passes of differentiable Gaussian splatting. To facilitate practical usage and development, we provide a user-friendly Python API that exposes each component of the forward and backward passes in rasterization of [gsplat](https://github.com/nerfstudio-project/gsplat). - Paper
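To give a flavour of the forward-pass math that the gsplat report covers, the sketch below projects a 3D Gaussian covariance to a 2D image-space covariance using the standard local affine (EWA-style) approximation, Σ' = J R Σ Rᵀ Jᵀ. It is plain NumPy and does not use the gsplat API. The function name `project_covariance` and its argument names are hypothetical.

```python
import numpy as np

def project_covariance(mean_cam, cov_world, R_wc, fx, fy):
    """Project a 3D covariance to 2D with a first-order perspective approximation.

    mean_cam : Gaussian mean in camera coordinates (x, y, z), with z > 0
    cov_world: 3x3 covariance in world coordinates
    R_wc     : 3x3 world-to-camera rotation
    fx, fy   : focal lengths in pixels
    """
    x, y, z = mean_cam
    # Jacobian of the pinhole projection (fx * x / z, fy * y / z) at the mean.
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    T = J @ R_wc
    return T @ cov_world @ T.T

# An isotropic Gaussian on the optical axis stays isotropic, scaled by (f / z)^2.
cov2d = project_covariance(np.array([0.0, 0.0, 2.0]), 0.04 * np.eye(3),
                           np.eye(3), fx=100.0, fy=100.0)
```

In this example the 3D variance 0.04 is multiplied by (100 / 2)² = 2500, so the projected covariance is 100 times the 2x2 identity.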
6. PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation
Authors: Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae
- Paper | Project Page | Code (not yet)
Regularization and Optimization:
2024:
1. DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines
Authors: Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Nandita Vijaykumar
Abstract
Differentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (e.g., 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high quality photo-realistic imagery at high speeds from these learned 3D models. These methods have been demonstrated to be very promising, providing state-of-the-art quality for many important tasks. However, training a model to represent a scene is still a time-consuming task even when using powerful GPUs. In this work, we observe that the gradient computation phase during training is a significant bottleneck on GPUs due to the large number of atomic operations that need to be processed. These atomic operations overwhelm atomic units in the L2 partitions causing stalls. To address this challenge, we leverage the observations that during the gradient computation: (1) for most warps, all threads atomically update the same memory locations; and (2) warps generate varying amounts of atomic traffic (since some threads may be inactive). We propose DISTWAR, a software approach to accelerate atomic operations based on two key ideas: First, we enable warp-level reduction of threads at the SM sub-cores using registers to leverage the locality in intra-warp atomic updates. Second, we distribute the atomic computation between the warp-level reduction at the SM and the L2 atomic units to increase the throughput of atomic computation. Warps with many threads performing atomic updates to the same memory locations are scheduled at the SM, and the rest using L2 atomic units. We implement DISTWAR using existing warp-level primitives. We evaluate DISTWAR on widely used raster-based differentiable rendering workloads. We demonstrate significant speedups of 2.44x on average (up to 5.7x). - Paper
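The first DISTWAR idea, warp-level reduction before issuing atomics, can be illustrated with a toy Python model that only counts atomic operations. This is a simulation under simplifying assumptions (one update per thread, a warp of 32 threads) and not the authors' CUDA implementation. It does not model the second idea of distributing work between the SM and the L2 atomic units. The name `count_atomics` is hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as on current NVIDIA GPUs

def count_atomics(addresses, warp_size=WARP_SIZE):
    """Count atomic operations with and without warp-level reduction.

    addresses: one target memory address per thread, or None if the thread is inactive.
    Returns (naive, reduced): naive issues one atomic per active thread, reduced
    issues one atomic per distinct address within each warp.
    """
    naive = sum(a is not None for a in addresses)
    reduced = 0
    for start in range(0, len(addresses), warp_size):
        warp = addresses[start:start + warp_size]
        reduced += len({a for a in warp if a is not None})
    return naive, reduced

# Warp 0: all 32 threads update address 7. Warp 1: 16 threads update address 9, 16 are idle.
addresses = [7] * 32 + [9] * 16 + [None] * 16
naive, reduced = count_atomics(addresses)
```

When every active thread in a warp targets the same location, as the abstract reports for most warps, the reduction collapses the whole warp's traffic into a single atomic.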
2. [CVPR '24] FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Authors: Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing
Abstract
3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently. - Paper
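The frequency-space discrepancy described in the FreGS abstract can be sketched with NumPy's FFT. The version below compares the complex spectra of a rendered image and its target inside a circular low-pass band whose radius would grow during training. This is an illustrative simplification. The function names, the circular mask, and the use of a plain spectral difference are assumptions, and the paper's actual loss may differ.

```python
import numpy as np

def lowpass_mask(h: int, w: int, radius: float) -> np.ndarray:
    """Binary mask keeping frequencies within `radius` (in cycles per pixel) of DC."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return (np.sqrt(fx**2 + fy**2) <= radius).astype(float)

def frequency_loss(rendered: np.ndarray, target: np.ndarray, radius: float) -> float:
    """Mean spectral discrepancy restricted to the low-pass band (hypothetical loss)."""
    mask = lowpass_mask(*rendered.shape, radius)
    diff = np.fft.fft2(rendered) - np.fft.fft2(target)
    return float(np.mean(mask * np.abs(diff)))

rng = np.random.default_rng(0)
target = rng.random((16, 16))
noisy = target + rng.normal(scale=0.1, size=target.shape)
loss_same = frequency_loss(target, target, radius=1.0)
loss_dc = frequency_loss(target + 0.5, target, radius=0.0)  # only the DC term is kept
loss_low = frequency_loss(noisy, target, radius=0.1)        # early, coarse stage
loss_all = frequency_loss(noisy, target, radius=1.0)        # late stage, full spectrum
```

Widening the radius over the course of training adds higher-frequency terms to the loss, which mirrors the coarse-to-fine schedule the abstract describes.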
3. RAIN-GS: Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting
Authors: Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim
Abstract
3D Gaussian splatting (3DGS) has recently demonstrated impressive capabilities in real-time novel view synthesis and 3D reconstruction. However, 3DGS heavily depends on the accurate initialization derived from Structure-from-Motion (SfM) methods. When trained with randomly initialized point clouds, 3DGS often fails to maintain its ability to produce high-quality images, undergoing large performance drops of 4-5 dB in PSNR in general. Through extensive analysis of SfM initialization in the frequency domain and analysis of a 1D regression task with multiple 1D Gaussians, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate INitialization Constraint for 3D Gaussian Splatting) that successfully trains 3D Gaussians from randomly initialized point clouds. We show the effectiveness of our strategy through quantitative and qualitative comparisons on standard datasets, largely improving the performance in all settings. - Paper | Project Page | Code
4. A New Split Algorithm for 3D Gaussian Splatting
Authors: Qiyuan Feng, Gengchen Cao, Haoxiang Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu
Abstract
3D Gaussian splatting models, as a novel explicit 3D representation, have been applied in many domains recently, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to flatten large untextured regions, yielding a very sparse point cloud. These problems are caused by the non-uniform nature of 3D Gaussian splatting models, so in this paper, we propose a new 3D Gaussian splitting algorithm, which can produce a more uniform and surface-bounded 3D Gaussian splatting model. Our algorithm splits an N-dimensional Gaussian into two N-dimensional Gaussians. It ensures consistency of mathematical characteristics and similarity of appearance, allowing resulting 3D Gaussian splatting models to be more uniform and a better fit to the underlying surface, and thus more suitable for explicit editing, point cloud extraction and other tasks. Meanwhile, our 3D Gaussian splitting approach has a very simple closed-form solution, making it readily applicable to any 3D Gaussian model. - Paper
5. Revising Densification in Gaussian Splatting
Authors: Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Abstract
In this paper, we address the limitations of Adaptive Density Control (ADC) in 3D Gaussian Splatting (3DGS), a scene representation method achieving high-quality, photorealistic results for novel view synthesis. ADC has been introduced for automatic 3D point primitive management, controlling densification and pruning, however, with certain limitations in the densification logic. Our main contribution is a more principled, pixel-error driven formulation for density control in 3DGS, leveraging an auxiliary, per-pixel error function as the criterion for densification. We further introduce a mechanism to control the total number of primitives generated per scene and correct a bias in the current opacity handling strategy of ADC during cloning operations. Our approach leads to consistent quality improvements across a variety of benchmark scenes, without sacrificing the method's efficiency. - Paper
2023:
1. [CVPRW '24] Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images
Authors: Jaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee
Abstract
In this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide to mitigate overfitting. We obtain the depth map using a pre-trained monocular depth estimation model and align the scale and offset using sparse COLMAP feature points. The adjusted depth aids in the color-based optimization of 3D Gaussian splatting, mitigating floating artifacts, and ensuring adherence to geometric constraints. We verify the proposed method on the NeRF-LLFF dataset with varying small numbers of images. Our approach demonstrates robust geometry compared to the original method that relies solely on images. - Paper | Project Page | Code
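The scale-and-offset alignment mentioned in the abstract above can be illustrated with an ordinary least-squares fit: find `scale` and `offset` such that `scale * mono_depth + offset` matches the sparse depths at the feature locations. This is a sketch under the assumption of a plain linear fit. The paper's actual procedure may differ, and the function name and pixel-coordinate convention here are hypothetical.

```python
import numpy as np

def align_depth(mono_depth: np.ndarray, sparse_uv: np.ndarray, sparse_depth: np.ndarray):
    """Fit scale and offset so the monocular depth matches sparse depth samples.

    mono_depth  : (H, W) relative depth from a monocular estimator
    sparse_uv   : (N, 2) integer pixel coordinates (u = column, v = row)
    sparse_depth: (N,) metric depths at those pixels, e.g. from SfM points
    """
    d = mono_depth[sparse_uv[:, 1], sparse_uv[:, 0]]
    A = np.stack([d, np.ones_like(d)], axis=1)  # design matrix for [scale, offset]
    (scale, offset), *_ = np.linalg.lstsq(A, sparse_depth, rcond=None)
    return scale * mono_depth + offset, scale, offset

# Synthetic check: the sparse depths are an exact affine function of the mono depth.
rng = np.random.default_rng(0)
mono = rng.random((16, 16)) + 0.1
uv = rng.integers(0, 16, size=(20, 2))
sparse = 2.0 * mono[uv[:, 1], uv[:, 0]] + 0.5
aligned, scale, offset = align_depth(mono, uv, sparse)
```

With real data the sparse points are noisy, so a robust variant (for example, discarding outliers before fitting) may be preferable to this plain fit.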
2. EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Authors: Sharath Girish, Kamal Gupta, Abhinav Shrivastava
Abstract
Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach results in scene representations with fewer Gaussians and quantized representations, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce memory by more than an order of magnitude all while maintaining the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x less memory and faster training/inference speed. - Paper | Project Page | Code
3. [CVPR '24] COLMAP-Free 3D Gaussian Splatting
Authors: Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang
Abstract
While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes. - Paper | Project Page | Code (not yet) | Short Presentation
4. iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching
Authors : Yuan Sun, Xuan Wang, Yunfan Zhang, Jie Zhang, Caigui Jiang, Yu Guo, Fei Wang
Abstract
We present a method named iComMa to address the 6D pose estimation problem in computer vision. The conventional pose estimation methods typically rely on the target's CAD model or necessitate specific network training tailored to particular object classes. Some existing methods address mesh-free 6D pose estimation by employing the inversion of a Neural Radiance Field (NeRF), aiming to overcome the aforementioned constraints. However, it still suffers from adverse initializations. By contrast, we model the pose estimation as the problem of inverting the 3D Gaussian Splatting (3DGS) with both the comparing and matching loss. In detail, a render-and-compare strategy is adopted for the precise estimation of poses. Additionally, a matching module is designed to enhance the model's robustness against adverse initializations by minimizing the distances between 2D keypoints. This framework systematically incorporates the distinctive characteristics and inherent rationale of render-and-compare and matching-based approaches. This comprehensive consideration equips the framework to effectively address a broader range of intricate and challenging scenarios, including instances with substantial angular deviations, all while maintaining a high level of prediction accuracy. Experimental results demonstrate the superior precision and robustness of our proposed jointly optimized framework when evaluated on synthetic and complex real-world data in challenging scenarios. - Paper | Code
Rendering:
2024:
1. [CVPR '24] Gaussian Shadow Casting for Neural Characters
Authors : Luis Bolanos, Shih-Yang Su, Helge Rhodin
Abstract
Neural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density proxy that replaces sampling with a simple analytic formula. It supports dynamic motion and is tailored for shadow computation, thereby avoiding the affine projection approximation and sorting required by the closely related Gaussian splatting. Combined with a deferred neural rendering model, our Gaussian shadows enable Lambertian shading and shadow casting with minimal overhead. We demonstrate improved reconstructions, with better separation of albedo, shading, and shadows in challenging outdoor scenes with direct sun light and hard shadows. Our method is able to optimize the light direction without any input from the user. As a result, novel poses have fewer shadow artifacts and relighting in novel scenes is more realistic compared to the state-of-the-art methods, providing new ways to pose neural characters in novel environments, increasing their applicability. - Paper
2. Optimal Projection for 3D Gaussian Splatting
Authors : Letian Huang, Jiayang Bai, Jie Guo, Yanwen Guo
Abstract
3D Gaussian Splatting has garnered extensive attention and application in real-time neural rendering. Concurrently, concerns have been raised about the limitations of this technology in aspects such as point cloud storage, performance, and robustness in sparse viewpoints, leading to various improvements. However, there has been a notable lack of attention to the projection errors introduced by the local affine approximation inherent in the splatting itself, and the consequential impact of these errors on the quality of photo-realistic rendering. This paper addresses the projection error function of 3D Gaussian Splatting, commencing with the residual error from the first-order Taylor expansion of the projection function ϕ. The analysis establishes a correlation between the error and the Gaussian mean position. Subsequently, leveraging function optimization theory, this paper analyzes the function's minima to provide an optimal projection strategy for Gaussian Splatting referred to as Optimal Gaussian Splatting. Experimental validation further confirms that this projection methodology reduces artifacts, resulting in a more convincingly realistic rendering. - Paper
3. 360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming
Authors : Jiayang Bai, Letian Huang, Jie Guo, Wen Gong, Yuanqi Li, Yanwen Guo
Abstract
3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of 360° images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (e.g., walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel 360° Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians. We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios. - Paper
4. StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering
Authors : Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, Markus Steinberger
Abstract
Gaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussian to 2D splats with a single view-space depth introduces popping and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort proves excessively costly compared to a global sort operation. In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead. Our software rasterizer effectively eliminates popping artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. Simultaneously, our method mitigates the potential for cheating view-dependent effects with popping, ensuring a more authentic representation. Despite the elimination of cheating, our approach achieves comparable quantitative results for test images, while increasing the consistency for novel view synthesis in motion. Due to its design, our hierarchical approach is only 4% slower on average than the original Gaussian Splatting. Notably, enforcing consistency enables a reduction in the number of Gaussians by approximately half with nearly identical quality and view-consistency. Consequently, rendering performance is nearly doubled, making our approach 1.6x faster than the original Gaussian Splatting, with a 50% reduction in memory requirements. - Paper | Project Page | Code | - Short Presentation
5. [CVPR '24] GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Authors : Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
Abstract
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both principled 1D setup and realistic 3D scenes. It is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting natural-occurring signals (e.g., squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%. - Paper | Project Page | Code | - Presentation
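As a rough 1D illustration of why a generalized exponential can fit sharp edges with fewer primitives, the sketch below evaluates the standard generalized exponential (generalized normal) form `A * exp(-(|x - mu| / alpha) ** beta)`. This functional form, the parameter names, and the sample values are assumptions for illustration and are not taken from the GES code base. With `beta = 2` and `alpha = sigma * sqrt(2)` it reduces to a Gaussian; larger `beta` gives a flatter top and a steeper falloff, closer to a square pulse.

```python
import math


def generalized_exponential(x, mu=0.0, alpha=1.0, beta=2.0, amplitude=1.0):
    """Evaluate A * exp(-(|x - mu| / alpha) ** beta) (assumed GEF form)."""
    return amplitude * math.exp(-((abs(x - mu) / alpha) ** beta))


def gaussian(x, mu=0.0, sigma=1.0):
    """Unnormalized 1D Gaussian, used only as a reference."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Illustrative values (hypothetical, not from the paper).
sigma = 0.7
alpha = sigma * math.sqrt(2.0)  # makes beta = 2 coincide with the Gaussian

for beta in (1.0, 2.0, 8.0):
    inside = generalized_exponential(0.5 * alpha, alpha=alpha, beta=beta)
    outside = generalized_exponential(1.5 * alpha, alpha=alpha, beta=beta)
    # A sharp-edged profile keeps `inside` near 1 and `outside` near 0.
    print(f"beta={beta}: inside={inside:.4f} outside={outside:.2e}")
```

A single shape parameter `beta` therefore trades the low-pass Gaussian profile for an edge-like one, which is the intuition behind needing fewer splitting operations.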
6. Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting
Authors : Joongho Jo, Hyeongwon Kim, Jongsun Park
Abstract
3D Gaussian splatting (3D-GS) is a new rendering approach that outperforms the neural radiance field (NeRF) in terms of both speed and image quality. 3D-GS represents 3D scenes by utilizing millions of 3D Gaussians and projects these Gaussians onto the 2D image plane for rendering. However, during the rendering process, a substantial number of unnecessary 3D Gaussians exist for the current view direction, resulting in significant computation costs associated with their identification. In this paper, we propose a computational reduction technique that quickly identifies unnecessary 3D Gaussians in real-time for rendering the current view without compromising image quality. This is accomplished through the offline clustering of 3D Gaussians that are close in distance, followed by the projection of these clusters onto a 2D image plane during runtime. Additionally, we analyze the bottleneck associated with the proposed technique when executed on GPUs and propose an efficient hardware architecture that seamlessly supports the proposed scheme. For the Mip-NeRF360 dataset, the proposed technique excludes 63% of 3D Gaussians on average before the 2D image projection, which reduces the overall rendering computation by almost 38.3% without sacrificing peak-signal-to-noise-ratio (PSNR). The proposed accelerator also achieves a speedup of 10.7x compared to a GPU. - Paper
7. GaussianPro: 3D Gaussian Splatting with Progressive Propagation
Authors : Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen
Abstract
The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling with large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always fail to produce enough points in these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR. - Paper | Project Page | Code
8. Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting
Authors : Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin
Abstract
The recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components. This issue stems from the limited ability of spherical harmonics (SH) to represent high-frequency information. To overcome this challenge, we introduce Spec-Gaussian, an approach that utilizes an anisotropic spherical Gaussian (ASG) appearance field instead of SH for modeling the view-dependent appearance of each 3D Gaussian. Additionally, we have developed a coarse-to-fine training strategy to improve learning efficiency and eliminate floaters caused by overfitting in real-world scenes. Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality. Thanks to ASG, we have significantly improved the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians. This improvement extends the applicability of 3D GS to handle intricate scenarios with specular and anisotropic surfaces. - Paper
9. [CVPR '24] VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Authors : Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang
Abstract
Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering. - Paper | Project Page | Code
10. 3D Gaussian Model for Animation and Texturing
Authors : Xiangzhi Eric Wang, Zackary PT Sin
Abstract
3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We propose a modeling that is analogous to typical 3D models, which we call 3D Gaussian Model (3DGM); it provides a manipulatable proxy for novel animation and texture transfer. By binding the 3D Gaussians in texture space and re-projecting them back to world space through implicit shell mapping, we show how our 3D modeling can serve as a valid rendering methodology for interactive applications. It is further noted that recently, 3D mesh reconstruction works have been able to produce high-quality mesh for rendering. Our work, on the other hand, only requires an approximated geometry for rendering an object in high fidelity. Applicationwise, we will show that our proxy-based 3DGM is capable of driving novel animation without animated training data and texture transferring via UV mapping of the 3D Gaussians. We believe the result indicates the potential of our work for enabling interactive applications for 3D Gaussian Splatting. - Paper
11. BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Authors : Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa
Abstract
Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacities such that a 3D-consistent and high quality scene can be reconstructed despite image-wise blur. Specifically, we model blur by estimating per-pixel convolution kernels from a Blur Proposal Network (BPN). BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. Additionally, BPN also proposes a quality-assessing mask, which indicates regions where blur occur. Finally, we introduce a coarse-to-fine kernel optimization scheme; this optimization scheme is fast and avoids sub-optimal solutions due to a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion on blurry images. We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometry, while significantly improving upon existing approaches. - Paper | Code
12. StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting
Authors : Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu
Abstract
We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency. - Paper | Project Page | Code
13. Gaussian Splatting in Style
Authors : Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Tarun Yenamandra, Daniel Cremers
Abstract
Scene stylization extends the work of neural style transfer to three spatial dimensions. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across a multi-view setting. A vast majority of the previous works achieve this by optimizing the scene with a specific style image. In contrast, we propose a novel architecture trained on a collection of style images, that at test time produces high quality stylized novel views. Our work builds up on the framework of 3D Gaussian splatting. For a given scene, we take the pretrained Gaussians and process them using a multi resolution hash grid and a tiny MLP to obtain the conditional stylised views. The explicit nature of 3D Gaussians give us inherent advantages over NeRF-based methods including geometric consistency, along with having a fast training and rendering regime. This enables our method to be useful for vast practical use cases such as in augmented or virtual reality applications. Through our experiments, we show our methods achieve state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data. - Paper
14. BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
Authors : Lingzhe Zhao, Peng Wang, Peidong Liu
Abstract
While neural rendering has demonstrated impressive capabilities in 3D scene reconstruction and novel view synthesis, it heavily relies on high-quality sharp images and accurate camera poses. Numerous approaches have been proposed to train Neural Radiance Fields (NeRF) with motion-blurred images, commonly encountered in real-world scenarios such as low-light or long-exposure conditions. However, the implicit representation of NeRF struggles to accurately recover intricate details from severely motion-blurred images and cannot achieve real-time rendering. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction and real-time rendering by explicitly optimizing point clouds into 3D Gaussians. In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction. Our method models the physical image formation process of motion-blurred images and jointly learns the parameters of Gaussians while recovering camera motion trajectories during exposure time. In our experiments, we demonstrate that BAD-Gaussians not only achieves superior rendering quality compared to previous state-of-the-art deblur neural rendering methods on both synthetic and real datasets but also enables real-time rendering capabilities. - Paper | Project Page | Code
15. SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Authors : Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou
Abstract
Implicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend over 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to seize photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works achieving state-of-the-art results with improved efficiency. - Paper
16. GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Authors : Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari
Abstract
During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue, we propose a novel approach called GeoGaussian. Based on the smoothly connected areas observed from point clouds, this method introduces a novel pipeline to initialize thin Gaussians aligned with the surfaces, where the characteristic can be transferred to new generations through a carefully designed densification strategy. Finally, the pipeline ensures that the scene's geometry and texture are maintained through constrained optimization processes with explicit geometry constraints. Benefiting from the proposed architecture, the generative ability of 3D Gaussians is enhanced, especially in structured regions. Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction, as evaluated qualitatively and quantitatively on public datasets. - Paper
17. Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Authors : Zhihao Liang, Qi Zhang, Wenbo Hu, Ying Feng, Lei Zhu, Kui Jia
Abstract
The 3D Gaussian Splatting (3DGS) gained its popularity recently by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) in a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation in the two-dimensional pixel shading, and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D-pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area to participate in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to the changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability that gives more details and better fidelity. - Paper
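The "integral by subtracting CDFs" idea can be sketched in one dimension as follows. This is a minimal sketch, not the paper's implementation: it swaps in the well-known plain logistic approximation of the standard normal CDF, `1 / (1 + exp(-1.702 * x))`, rather than the paper's conditioned logistic function, and the names `pixel_integral`, `u`, `mu`, and `sigma` are hypothetical. The pixel is treated as the unit-wide window `[u - 0.5, u + 0.5]`.

```python
import math


def logistic_cdf(x):
    """Plain logistic approximation of the standard normal CDF (assumed stand-in)."""
    return 1.0 / (1.0 + math.exp(-1.702 * x))


def exact_cdf(x):
    """Exact standard normal CDF via the error function, for comparison."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pixel_integral(u, mu, sigma, cdf=logistic_cdf):
    """Integral of a normalized 1D Gaussian over the pixel window [u - 0.5, u + 0.5].

    Computed as CDF(upper edge) - CDF(lower edge) in standardized coordinates.
    """
    upper = (u + 0.5 - mu) / sigma
    lower = (u - 0.5 - mu) / sigma
    return cdf(upper) - cdf(lower)


# Illustrative values (hypothetical): a Gaussian centred slightly off the pixel centre.
approx = pixel_integral(0.0, mu=0.3, sigma=0.8)
exact = pixel_integral(0.0, mu=0.3, sigma=0.8, cdf=exact_cdf)
print(f"logistic approximation: {approx:.4f}, exact: {exact:.4f}")
```

Because the response depends on the window edges rather than on a single sample at the pixel centre, it changes when the pixel footprint changes, which is the property the abstract relies on for anti-aliasing.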
18. Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
Authors : Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin
Abstract
High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on detailed modelling of the physical image formation process and utilizes velocities estimated using visual-inertial odometry (VIO). Camera poses are considered non-static during the exposure time of a single image frame and camera poses are further optimized in the reconstruction process. We formulate a differentiable rendering pipeline that leverages screen space approximation to efficiently incorporate rolling-shutter and motion blur effects into the 3DGS framework. Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods, thereby advancing 3DGS in naturalistic settings. - Paper | Code | Project Page
19. RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
Authors : Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari
Abstract
Recent advances in view synthesis and real-time rendering have achieved photorealistic quality at impressive rendering speeds. While Radiance Field-based methods achieve state-of-the-art quality in challenging scenarios such as in-the-wild captures and large-scale scenes, they often suffer from excessively high compute requirements linked to volumetric rendering. Gaussian Splatting-based methods, on the other hand, rely on rasterization and naturally achieve real-time rendering but suffer from brittle optimization heuristics that underperform on more challenging scenes. In this work, we present RadSplat, a lightweight method for robust real-time rendering of complex scenes. Our main contributions are threefold. First, we use radiance fields as a prior and supervision signal for optimizing point-based scene representations, leading to improved quality and more robust optimization. Next, we develop a novel pruning technique reducing the overall point count while maintaining high quality, leading to smaller and more compact scene representations with faster inference speeds. Finally, we propose a novel test-time filtering approach that further accelerates rendering and allows to scale to larger, house-sized scenes. We find that our method enables state-of-the-art synthesis of complex captures at 900+ FPS. - Paper | Project Page
20. Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Authors : Guangchi Fang, Bing Wang
Abstract
In this study, we explore the challenge of efficiently representing scenes with a constrained number of Gaussians. Our analysis shifts from traditional graphics and 2D computer vision to the perspective of point clouds, highlighting the inefficient spatial distribution of Gaussian representation as a key limitation in model performance. To address this, we introduce strategies for densification including blur split and depth reinitialization, and simplification through Gaussian binarization and sampling. These techniques reorganize the spatial positions of the Gaussians, resulting in significant improvements across various datasets and benchmarks in terms of rendering quality, resource consumption, and storage compression. Our proposed Mini-Splatting method integrates seamlessly with the original rasterization pipeline, providing a strong baseline for future research in Gaussian-Splatting-based works. - Paper
21. Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Authors : Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao
Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the average gradient magnitude of points from observable views, thereby failing to grow for large Gaussians that are observable for many viewpoints while many of them are only covered in the boundaries. To this end, we propose a novel method, named Pixel-GS, to take into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as the weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive experiments both qualitatively and quantitatively demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, on the challenging Mip-NeRF 360 and Tanks & Temples datasets. - Paper
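The difference between the two growth conditions can be sketched with a toy example. This is an illustrative sketch under stated assumptions, not the Pixel-GS code: the function names, the per-view gradient magnitudes, the pixel counts, and the threshold value are all hypothetical, and the camera-distance gradient scaling mentioned in the abstract is omitted.

```python
def plain_average(grad_norms):
    """Average gradient magnitude over the views that observe the Gaussian."""
    return sum(grad_norms) / len(grad_norms)


def pixel_weighted_average(grad_norms, pixel_counts):
    """Average gradient magnitude weighted by the pixels the Gaussian covers per view."""
    total_pixels = sum(pixel_counts)
    if total_pixels == 0:
        return 0.0
    weighted_sum = sum(g * m for g, m in zip(grad_norms, pixel_counts))
    return weighted_sum / total_pixels


# Hypothetical large Gaussian: one view sees it fully (400 px, strong gradient),
# five views only clip its boundary (4 px each, tiny gradient).
grad_norms = [6e-4] + [1e-5] * 5
pixel_counts = [400] + [4] * 5
threshold = 2e-4  # illustrative densification threshold

plain = plain_average(grad_norms)
weighted = pixel_weighted_average(grad_norms, pixel_counts)
print(f"plain average:    {plain:.2e} -> grow: {plain > threshold}")
print(f"pixel-weighted:   {weighted:.2e} -> grow: {weighted > threshold}")
```

In this toy case the boundary views drag the plain average below the threshold, while the pixel-weighted average stays above it, so the large Gaussian would be selected for growth.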
22. Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
Authors : Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang
Abstract
Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique appearance of each tiny point in a scene is determined by its independent intrinsic material attributes and the varying environmental impacts it receives. Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separated intrinsic and dynamic appearance feature for each point, capturing the unchanged scene appearance along with dynamic variation like illumination and weather. Additionally, an adaptive sampling strategy is presented to allow each Gaussian point to focus on the local and detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. More experiments have demonstrated better reconstruction quality and details of GS-W compared to previous methods, with a 1000× increase in rendering speed. - Paper | Project Page | Code (not yet) | Short Presentation
23. GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction
Authors : Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, Bo Dai
Abstract
Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural implicit surfaces is sparked from the success of neural rendering. Current works either constrain the distribution of density fields or the shape of primitives, resulting in degraded rendering quality and flaws on the learned scene surfaces. The efficacy of such methods is limited by the inherent constraints of the chosen neural representation, which struggles to capture fine surface details, especially for larger, more intricate scenes. To address these issues, we introduce GSDF, a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting (3DGS) representation with neural Signed Distance Fields (SDF). The core idea is to leverage and enhance the strengths of each branch while alleviating their limitation through mutual guidance and joint supervision. We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions, and in the meantime benefits 3DGS rendering with structures that are more aligned with the underlying geometry. - Paper | Project Page | Code (not yet)
24. Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
Authors : Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai
Abstract
The recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularly noticeable in zoom-out views and can lead to inconsistent rendering speeds in scenes with varying details. Moreover, it often struggles to capture the corresponding level of details at different scales with its heuristic density control operation. Inspired by the Level-of-Detail (LOD) techniques, we introduce Octree-GS, featuring an LOD-structured 3D Gaussian approach supporting level-of-detail decomposition for scene representation that contributes to the final rendering results. Our model dynamically selects the appropriate level from the set of multi-resolution anchor points, ensuring consistent rendering performance with adaptive LOD adjustments while maintaining high-fidelity rendering results. - Paper | Project Page | Code (not yet)
25. SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
Authors : Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao
Abstract
In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting needs modifying the training procedure of Gaussian splatting, our method functions at test-time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of testing frequency can effectively match the Gaussian scale, thus making the Gaussian primitive distribution remain consistent across different testing frequencies. When scale inconsistency is eliminated, sampling rates smaller than the scene frequency result in conventional jaggedness, and we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, which significantly improves anti-aliasing performance over vanilla Gaussian Splatting. Through extensive experiments using various settings and both bounded and unbounded scenes, we show SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. - Paper | Project Page | Code
26. Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Authors : Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison
Abstract
Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis. Our method optimises 3D Gaussian primitives to accurately model the object's geometry at points of contact. By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map. Touch is particularly useful when considering non-Lambertian objects (e.g., shiny or reflective surfaces) since contemporary methods tend to fail to reconstruct with fidelity specular highlights. By combining vision and tactile sensing, we achieve more accurate geometry reconstructions with fewer images than prior methods. We conduct evaluation on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach, offering significant improvements in reconstruction quality. - Paper
27. 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting
Authors : Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
Abstract
In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. First, we introduce a differentiable SDF-to-opacity transformation function that converts SDF values into corresponding Gaussians' opacities. This function connects the SDF and 3D Gaussians, allowing for unified optimization and enforcing surface constraints on the 3D Gaussians. During learning, optimizing the 3D Gaussians provides supervisory signals for SDF learning, enabling the reconstruction of intricate details. However, this only provides sparse supervisory signals to the SDF at locations occupied by Gaussians, which is insufficient for learning a continuous SDF. Then, to address this limitation, we incorporate volumetric rendering and align the rendered geometric attributes (depth, normal) with those derived from 3D Gaussians. This consistency regularization introduces supervisory signals to locations not covered by discrete 3D Gaussians, effectively eliminating redundant surfaces outside the Gaussian sampling range. Our extensive experimental results demonstrate that our 3DGSR method enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS. Besides, our method competes favorably with leading surface reconstruction techniques while offering a more efficient learning process and much better rendering qualities. - Paper | Code (not yet)
28. Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting
Authors : Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma
Abstract
3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. - Paper
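The "mirrored viewpoint" mentioned in the Mirror-3DGS abstract relies on plane mirror imaging, which can be sketched in a few lines of Python. This is a geometric illustration only, assuming the mirror plane is already known as a unit normal `n` and offset `d` with `n·x + d = 0`; how the paper estimates or parameterizes the plane is not stated in the abstract, and the function names here are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def reflect_point(p, n, d):
    """Reflect point p across the plane n.x + d = 0 (n must be unit length)."""
    signed_dist = dot(n, p) + d
    return tuple(pi - 2.0 * signed_dist * ni for pi, ni in zip(p, n))

def reflect_direction(v, n):
    """Reflect a direction vector v across a plane with unit normal n."""
    s = dot(n, v)
    return tuple(vi - 2.0 * s * ni for vi, ni in zip(v, n))

# Assumed example: mirror in the plane z = 0, camera above it looking down.
mirror_normal = (0.0, 0.0, 1.0)
mirror_offset = 0.0
camera_position = (1.0, 2.0, 3.0)
camera_forward = (0.0, 0.0, -1.0)

mirrored_position = reflect_point(camera_position, mirror_normal, mirror_offset)
mirrored_forward = reflect_direction(camera_forward, mirror_normal)
print(mirrored_position, mirrored_forward)
```

The mirrored camera sits behind the mirror at the same distance as the real camera and looks back through it. Reflecting twice returns the original pose, which is a convenient sanity check.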
29. OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images
Authors : Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng
Abstract
Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. To benefit the research community, the code will be made publicly available once the paper is published. - Paper
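As a rough illustration of the equirectangular screen space mentioned in the OmniGS abstract, the sketch below projects a single 3D point (for example, a Gaussian center) to pixel coordinates. It covers only the point projection; the paper's rasterizer also derives how each Gaussian's footprint maps to the screen, which is not reproduced here. The axis convention (x right, y down, z forward), the pixel mapping, and the function name are assumptions, since conventions differ between codebases.

```python
from math import atan2, asin, pi, sqrt

def project_equirectangular(point, width, height):
    """Map a camera-space 3D point to (u, v) on an equirectangular image.

    Assumed convention: x right, y down, z forward; longitude 0 at +z maps
    to the image center column, latitude 0 maps to the center row.
    """
    x, y, z = point
    r = sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the camera center")
    longitude = atan2(x, z)                          # range (-pi, pi]
    latitude = asin(max(-1.0, min(1.0, y / r)))      # range [-pi/2, pi/2]
    u = (longitude / (2.0 * pi) + 0.5) * width
    v = (latitude / pi + 0.5) * height
    return u, v

# A point straight ahead lands at the image center.
center_u, center_v = project_equirectangular((0.0, 0.0, 1.0), 2048, 1024)
print(center_u, center_v)
```

Unlike a perspective projection, this mapping is defined for points behind the camera as well, which is why no cube-map or tangent-plane split is needed for the projection step itself.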
30. Robust Gaussian Splatting
Authors : François Darmon, Lorenzo Porzi, Samuel Rota-Bulò, Peter Kontschieder
Abstract
In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we propose mechanisms for defocus blur compensation and for addressing color inconsistencies caused by ambient light, shadows, or due to camera-related factors like varying white balancing settings. Our proposed solutions integrate in a seamless way with the 3DGS formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines. - Paper
31. DeblurGS: Gaussian Splatting for Camera Motion Blur
Authors : Jeongtaek Oh, Jaeyoung Chung, Dongwoo Lee, Kyoung Mu Lee
Abstract
Although significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to real-world applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. To address this challenge, we propose DeblurGS, a method to optimize sharp 3D Gaussian Splatting from motion-blurred images, even with the noisy camera pose initialization. We restore a fine-grained sharp scene by leveraging the remarkable reconstruction capability of 3D Gaussian Splatting. Our approach estimates the 6-Degree-of-Freedom camera motion for each blurry observation and synthesizes corresponding blurry renderings for the optimization process. Furthermore, we propose Gaussian Densification Annealing strategy to prevent the generation of inaccurate Gaussians at erroneous locations during the early training stages when camera motion is still imprecise. Comprehensive experiments demonstrate that our DeblurGS achieves state-of-the-art performance in deblurring and novel view synthesis for real-world and synthetic benchmark datasets, as well as field-captured blurry smartphone videos. - Paper
32. StylizedGS: Controllable Stylization for 3D Gaussian Splatting
Authors : Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao
Abstract
With the rapid development of XR, 3D generation and editing are becoming more and more important, among which, stylization is an important tool of 3D appearance editing. It can achieve consistent 3D artistic stylization given a single reference style image and thus is a user-friendly editing way. However, recent NeRF-based 3D stylization methods face efficiency issues that affect the actual user experience and the implicit nature limits its ability to transfer the geometric pattern styles. Additionally, the ability for artists to exert flexible control over stylized scenes is considered highly desirable, fostering an environment conducive to creative exploration. In this paper, we introduce StylizedGS, a 3D neural style transfer framework with adaptable control over perceptual factors based on 3D Gaussian Splatting (3DGS) representation. The 3DGS brings the benefits of high efficiency. We propose a GS filter to eliminate floaters in the reconstruction which affects the stylization effects before stylization. Then the nearest neighbor-based style loss is introduced to achieve stylization by fine-tuning the geometry and color parameters of 3DGS, while a depth preservation loss with other regularizations is proposed to prevent the tampering of geometry content. Moreover, facilitated by specially designed losses, StylizedGS enables users to control color, stylized scale and regions during the stylization to possess customized capabilities. Our method can attain high-quality stylization results characterized by faithful brushstrokes and geometric consistency with flexible controls. Extensive experiments across various scenes and styles demonstrate the effectiveness and efficiency of our method concerning both stylization quality and inference FPS. - Paper
33. LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field
Authors : Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He
Abstract
Cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussian to project it into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D Cinemagraph that exhibits natural and seamlessly loopable dynamics. Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. - Paper | Project Page | Code
34. GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting
Authors : Junseo Lee, Seokwon Lee, Jungi Lee, Junyong Park, Jaewoong Sim
Abstract
This paper presents GSCore, a hardware acceleration unit that efficiently executes the rendering pipeline of 3D Gaussian Splatting with algorithmic optimizations. GSCore builds on the observations from an in-depth analysis of Gaussian-based radiance field rendering to enhance computational efficiency and bring the technique to wide adoption. In doing so, we present several optimization techniques, Gaussian shape-aware intersection test, hierarchical sorting, and subtile skipping, all of which are synergistically integrated with GSCore. We implement the hardware design of GSCore, synthesize it using a commercial 28nm technology, and evaluate the performance across a range of synthetic and real-world scenes with varying image resolutions. Our evaluation results show that GSCore achieves a 15.86× speedup on average over the mobile consumer GPU with a substantially smaller area and lower energy consumption. - Paper | Short Presentation
2023:
1. Mip-Splatting: Alias-free 3D Gaussian Splatting
Authors : Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger
Abstract
Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our comprehensive evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach. - Paper | Project Page | Code
2. Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing
Authors : Jian Gao, Chun Gu, Youtian Lin, Hao Zhu, Xun Cao, Li Zhang, Yao Yao
Abstract
We present a novel differentiable point-based rendering framework for material and lighting decomposition from multi-view images, enabling editing, ray-tracing, and real-time relighting of the 3D point cloud. Specifically, a 3D scene is represented as a set of relightable 3D Gaussian points, where each point is additionally associated with a normal direction, BRDF parameters, and incident lights from different directions. To achieve robust lighting estimation, we further divide incident lights of each point into global and local components, as well as view-dependent visibilities. The 3D scene is optimized through the 3D Gaussian Splatting technique while BRDF and lighting are decomposed by physically-based differentiable rendering. Moreover, we introduce an innovative point-based ray-tracing approach based on the bounding volume hierarchy for efficient visibility baking, enabling real-time rendering and relighting of 3D Gaussian points with accurate shadow effects. Extensive experiments demonstrate improved BRDF estimation and novel view rendering results compared to state-of-the-art material estimation approaches. Our framework showcases the potential to revolutionize the mesh-based graphics pipeline with a relightable, traceable, and editable rendering pipeline solely based on point cloud. - Paper | Project Page | Code
3. [CVPR '24] GS-IR: 3D Gaussian Splatting for Inverse Rendering
Authors : Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia
Abstract
We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g., NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not support producing plausible normal natively; 2) forward mapping (e.g., rasterization and splatting) cannot trace the occlusion like backward mapping (e.g., ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes. - Paper | Project Page | Code (not yet)
4. [CVPR '24] Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Authors : Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee
Abstract
3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite its high rendering quality and speed at high resolutions, they both deteriorate drastically when rendered at lower resolutions or from far away camera position. During low resolution or far away rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian and leads to aliasing effect. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13%-66% PSNR and 160%-2400% rendering speed improvement at 4×-128× scale rendering on Mip-NeRF360 dataset compared to the single scale 3D Gaussian splatting. - Paper
5. [CVPR '24] GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Authors : Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, Yuexin Ma
Abstract
The advent of neural 3D Gaussians has recently brought about a revolution in the field of neural rendering, facilitating the generation of high-quality renderings at real-time speeds. However, the explicit and discrete representation encounters challenges when applied to scenes featuring reflective surfaces. In this paper, we present GaussianShader, a novel method that applies a simplified shading function on 3D Gaussians to enhance the neural rendering in scenes with reflective surfaces while preserving the training and rendering efficiency. The main challenge in applying the shading function lies in the accurate normal estimation on discrete 3D Gaussians. Specifically, we proposed a novel normal estimation framework based on the shortest axis directions of 3D Gaussians with a delicately designed loss to make the consistency between the normals and the geometries of Gaussian spheres. Experiments show that GaussianShader strikes a commendable balance between efficiency and visual quality. Our method surpasses Gaussian Splatting in PSNR on specular object datasets, exhibiting an improvement of 1.57dB. When compared to prior works handling reflective surfaces, such as Ref-NeRF, our optimization time is significantly accelerated (23h vs. 0.58h). Please click on our project website to see more results. - Paper | Project Page | Code
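The "shortest axis" idea in the GaussianShader abstract can be shown with a small Python sketch: take the axis of the Gaussian with the smallest scale as a candidate normal. This is only a starting point under stated assumptions, not the paper's full normal estimation framework, which also involves a dedicated loss. The data layout (a 3x3 rotation matrix as nested tuples whose columns are the Gaussian's axes), the flip toward the viewer, and the function name are all hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def shortest_axis_normal(rotation, scales, view_dir):
    """Return the Gaussian's shortest axis, oriented to face the viewer.

    rotation: 3x3 matrix (rows of tuples) whose columns are the local axes.
    scales:   per-axis extents (sx, sy, sz) of the Gaussian.
    view_dir: unit direction from the camera toward the Gaussian.
    """
    k = min(range(3), key=lambda i: scales[i])        # index of thinnest axis
    axis = tuple(rotation[row][k] for row in range(3))
    # Assumed convention: a visible surface normal points back at the camera.
    if dot(axis, view_dir) > 0.0:
        axis = tuple(-a for a in axis)
    return axis

# Assumed example: an axis-aligned, disc-like Gaussian that is thin along y,
# viewed by a camera looking along +y.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
normal = shortest_axis_normal(identity, (0.5, 0.02, 0.3), (0.0, 1.0, 0.0))
print(normal)
```

The intuition is that a Gaussian lying on a surface tends to flatten along the surface normal, so its thinnest axis is a reasonable normal estimate.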
6. [CVPR '24] Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Authors : Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai
Abstract
Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved the state-of-the-art rendering quality and speed combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less area and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed. - Paper | Project Page | Code
7. Deblurring 3D Gaussian Splatting
Authors : Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park
Abstract
Recent studies in Radiance Fields have paved the robust way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately 3D Gaussians splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to the lens defocusing, object motion, and camera shake, and it inevitably intervenes in clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. - Paper | Project Page | Code (not yet)
8. GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization
Authors : Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jingtuo Liu, Liangjun Zhang, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang
Abstract
This paper presents GIR, a 3D Gaussian Inverse Rendering method for relightable scene factorization. Compared to existing methods leveraging discrete meshes or neural implicit fields for inverse rendering, our method utilizes 3D Gaussians to estimate the material properties, illumination, and geometry of an object from multi-view images. Our study is motivated by the evidence showing that 3D Gaussian is a more promising backbone than neural fields in terms of performance, versatility, and efficiency. In this paper, we aim to answer the question: "How can 3D Gaussian be applied to improve the performance of inverse rendering?" To address the complexity of estimating normals based on discrete and often inhomogeneously distributed 3D Gaussian representations, we proposed an efficient self-regularization method that facilitates the modeling of surface normals without the need for additional supervision. To reconstruct indirect illumination, we propose an approach that simulates ray tracing. Extensive experiments demonstrate our proposed GIR's superior performance over existing methods across multiple tasks on a variety of widely used datasets in inverse rendering. This substantiates its efficacy and broad applicability, highlighting its potential as an influential tool in relighting and reconstruction. - Paper | Project Page | Code (not yet)
9. Gaussian Splatting with NeRF-based Color and Opacity
Authors : Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek
Abstract
Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding its versatility. In contrast, Gaussian Splatting (GS) offers similar rendering quality with faster training and inference as it does not need neural networks to work. We encode information about the 3D objects in the set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS are difficult to condition since they usually require around a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model that uses GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e., means of Gaussian), shape (i.e., covariance of Gaussian), color and opacity, and neural network, which takes parameters of Gaussian and viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects. - Paper | Code
Review:
2024:
1. Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human
Authors : Song Bai, Jie Li
Abstract
While AI-generated text and 2D images continue to expand their territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since 2023, a large number of research papers have emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity in stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models like SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. The advancements in neural network-based 3D storing and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have accelerated the efficiency and realism of neural rendered models. Furthermore, the multimodality capabilities of large language models have enabled language inputs to transcend into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half of 2023. It will begin by discussing the AI generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future. - Paper
2. A Survey on 3D Gaussian Splatting
Authors : Guikun Chen, Wenguan Wang
Abstract
3D Gaussian splatting (3D GS) has recently emerged as a transformative technique in the explicit radiance field and computer graphics landscape. This innovative approach, characterized by the utilization of millions of 3D Gaussians, represents a significant departure from the neural radiance field (NeRF) methodologies, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representations and differentiable rendering algorithms, not only promises real-time rendering capabilities but also introduces unprecedented levels of control and editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In the present paper, we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the advent of 3D GS, setting the stage for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By facilitating real-time performance, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation. - Paper
3. 3D Gaussian as a New Vision Era: A Survey
Authors : Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, Ying He
Abstract
3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural networks, such as Neural Radiance Fields (NeRF). This technique has found diverse applications in areas such as robotics, urban mapping, autonomous navigation, and virtual reality/augmented reality, to name just a few. Given the growing popularity and expanding research in 3D Gaussian Splatting, this paper presents a comprehensive survey of relevant papers from the past year. We organize the survey into taxonomies based on characteristics and applications, providing an introduction to the theoretical underpinnings of 3D Gaussian Splatting. Our goal through this survey is to acquaint new researchers with 3D Gaussian Splatting, serve as a valuable reference for seminal works in the field, and inspire future research directions, as discussed in our concluding section. - Paper
4. Neural Fields in Robotics: A Survey
Authors : Muhammad Zubair Irshad, Mauro Comi, Yen-Chen Lin, Nick Heppert, Abhinav Valada, Zsolt Kira, Rares Ambrus, Jonathan Tremblay
Abstract
Neural Fields have emerged as a transformative approach for 3D scene representation in computer vision and robotics, enabling accurate inference of geometry, 3D semantics, and dynamics from posed 2D data. Leveraging differentiable rendering, Neural Fields encompass both continuous implicit and explicit neural representations enabling high-fidelity 3D reconstruction, integration of multi-modal sensor data, and generation of novel viewpoints. This survey explores their applications in robotics, emphasizing their potential to enhance perception, planning, and control. Their compactness, memory efficiency, and differentiability, along with seamless integration with foundation and generative models, make them ideal for real-time applications, improving robot adaptability and decision-making. This paper provides a thorough review of Neural Fields in robotics, categorizing applications across various domains and evaluating their strengths and limitations, based on over 200 papers. First, we present four key Neural Fields frameworks: Occupancy Networks, Signed Distance Fields, Neural Radiance Fields, and Gaussian Splatting. Second, we detail Neural Fields' applications in five major robotics domains: pose estimation, manipulation, navigation, physics, and autonomous driving, highlighting key works and discussing takeaways and open challenges. Finally, we outline the current limitations of Neural Fields in robotics and propose promising directions for future research. - Paper
5. How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey
Authors : Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi
Abstract
Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) representations. Recognizing the growing body of research and the absence of a comprehensive survey on the topic, this paper aims to provide the first comprehensive overview of SLAM progress through the lens of the latest advancements in radiance fields. It sheds light on the background, evolutionary path, inherent strengths and limitations, and serves as a fundamental reference to highlight the dynamic progress and specific challenges. - Paper
6. Recent Advances in 3D Gaussian Splatting
Authors : Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao
Abstract
The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations like Neural Radiance Fields (NeRF) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian Splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished by rasterizing Gaussian ellipsoids into images. Apart from the fast rendering speed, the explicit representation of 3D Gaussian Splatting facilitates editing tasks like dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid change and growing number of works in this field, we present a literature review of recent 3D Gaussian Splatting methods, which can be roughly classified into 3D reconstruction, 3D editing, and other downstream applications by functionality. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian Splatting are also illustrated for a better understanding of this technique. This survey aims to help beginners get into this field quickly and provide experienced researchers with a comprehensive overview, which can stimulate the future development of the 3D Gaussian Splatting representation. - Paper
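For readers new to the rendering formulation mentioned in this abstract, the following is a minimal sketch of front-to-back alpha blending for a single pixel, the compositing step commonly used when rasterizing depth-sorted splats. It is an illustration only and is not taken from the survey: the function name `composite_pixel` and its inputs are hypothetical, and the per-pixel opacities are assumed to have been evaluated already from each projected Gaussian.

```python
import numpy as np

def composite_pixel(colors, alphas):
    """Front-to-back alpha blending of depth-sorted splats for one pixel.

    colors: (N, 3) RGB value per splat, sorted nearest first.
    alphas: (N,) effective opacity of each splat at this pixel, in [0, 1].
    Returns the blended RGB value and the per-splat blending weights.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance in front of splat i: product of (1 - alpha_j) for j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ colors, weights
```

With a half-transparent red splat in front of a half-transparent green one, the red splat contributes a weight of 0.5 and the green one 0.25, because only half of the light passes the first splat.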
7. Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
Authors : Anurag Dalal, Daniel Hagen, Kjell G. Robbersmyr, Kristian Muri Knausgård
Abstract
Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting. - Paper
SLAM:
2024:
1. SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Authors : Mingrui Li, Shuhong Liu, Heng Zhou
Abstract
Semantic understanding plays a crucial role in Dense Simultaneous Localization and Mapping (SLAM), facilitating comprehensive scene interpretation. Recent advancements that integrate Gaussian Splatting into SLAM systems have demonstrated its effectiveness in generating high-quality renderings through the use of explicit 3D Gaussian representations. Building on this progress, we propose SGS-SLAM, the first semantic dense visual SLAM system grounded in 3D Gaussians, which provides precise 3D semantic segmentation alongside high-fidelity reconstructions. Specifically, we propose to employ multi-channel optimization during the mapping process, integrating appearance, geometric, and semantic constraints with key-frame optimization to enhance reconstruction quality. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and semantic segmentation, outperforming existing methods while preserving real-time rendering ability. - Paper
2. SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM
Authors : Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang
Abstract
We propose SemGauss-SLAM, the first semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering in real-time. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift and improve reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to more robust tracking and consistent mapping. Our SemGauss-SLAM method demonstrates superior performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in novel-view semantic synthesis and 3D semantic mapping. - Paper
3. Compact 3D Gaussian Splatting For Dense Visual SLAM
Authors : Tianchen Deng, Yaohui Chen, Leyan Zhang, Jianfei Yang, Shenghai Yuan, Danwei Wang, Weidong Chen
Abstract
Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address the limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then we observe that the covariance matrices (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, i.e., the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation. - Paper
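The codebook idea in this abstract can be pictured as vector quantization: many similar geometry vectors are replaced by an index into a small shared table. The sketch below uses plain k-means as a stand-in and is not the authors' implementation. The function name `build_codebook`, the feature layout (for example scales and a rotation per Gaussian) and the iteration count are all assumptions made for illustration.

```python
import numpy as np

def build_codebook(features, num_codes, iters=10, seed=0):
    """Plain k-means over per-Gaussian geometry vectors (hypothetical layout).

    features: (N, D) array, one geometry vector per Gaussian.
    Returns the (num_codes, D) codebook and an (N,) array of code indices.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    start = rng.choice(len(features), size=num_codes, replace=False)
    codebook = features[start].copy()
    for _ in range(iters):
        # Assign every vector to its nearest code.
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        indices = dists.argmin(axis=1)
        # Move each code to the mean of its members; empty codes stay put.
        for k in range(num_codes):
            members = features[indices == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    # Final assignment against the updated codebook.
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return codebook, dists.argmin(axis=1)
```

After this step each Gaussian stores one small integer instead of a full geometry vector, and `codebook[indices]` gives the approximate reconstruction. The saving grows with the number of Gaussians that share a code.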
4. NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting
Authors : Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie
Abstract
We propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. Additionally, we employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, mitigating the burden of excessive memory consumption. Furthermore, we leverage the advantage of 3D Gaussian splatting, which enables efficient and differentiable novel view rendering, and propose a Virtual Camera View Pruning method to eliminate outlier GS points, thereby effectively enhancing the quality of scene representations. Our NEDS-SLAM method demonstrates competitive performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in 3D dense semantic mapping. - Paper
5. High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization
Authors : Shuo Sun, Malcolm Mielle, Achim J. Lilienthal, Martin Magnusson
Abstract
We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed Gaussian Splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM. - Paper
6. RGBD GS-ICP SLAM
Authors : Seongbo Ha, Jiung Yeon, Hyeonwoo Yu
Abstract
Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing an incredibly fast speed up to 107 FPS (for the entire system) and superior quality of the reconstructed map. - Paper | Code | - Short Presentation
7. EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting
Authors : Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, Wei Shen
Abstract
Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Mapping) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperative applications among endoscopic surgeries. In this paper, we introduce EndoGSLAM, an efficient SLAM approach for endoscopic surgeries, which integrates streamlined Gaussian representation and differentiable rasterization to facilitate over 100 fps rendering speed during online camera tracking and tissue reconstruction. Extensive experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches, showing tremendous potential for endoscopic surgeries. - Paper | Project Page | Code (not yet)
8. CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Authors : Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui
Abstract
Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available. - Paper | Project Page | Code (not yet)
9. MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Authors : Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu
Abstract
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. - Paper | Project Page | Code (not yet)
10. Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting
Authors : Xiaolei Lang, Laijian Li, Hang Zhang, Feng Xiong, Mu Xu, Yong Liu, Xingxing Zuo, Jiajun Lv
Abstract
We present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems. - Paper
11. RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting
Authors : Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou
Abstract
We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors, and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and otherwise unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy. - Paper | Project Page | Code
12. [3DV '25] LoopSplat: Loop Closure by Registering 3D Gaussian Splats
Authors : Liyuan Zhu, Yue Li, Erik Sandström, Shengyu Huang, Konrad Schindler, Iro Armeni
Abstract
Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM. - Paper | Project Page | Code
13. MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation
Authors : Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu
Abstract
Emerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios like low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e., MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either a neural radiance fields or a Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns 3D scene representation and estimates the cameras' local trajectory during exposure time, enabling proactive compensation for motion blur caused by camera movement. In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach. - Paper | Project Page | Code (not yet)
2023:
1. [CVPR '24] GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Authors : Chi Yan, Delin Qu, Dong Wang, Dan Xu, Zhigang Wang, Bin Zhao, Xuelong Li
Abstract
In this paper, we introduce GS-SLAM, the first method to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussians in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. The source code will be released upon acceptance. - Paper
2. [CVPR '24] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Authors : Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
Abstract
Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations, including fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2× state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches, while allowing real-time rendering of a high-resolution dense 3D map. - Paper | Project Page | Code | - Explanation Video
3. [CVPR '24] Gaussian Splatting SLAM
Authors : Hidenobu Matsuki, Riku Murai, Paul H.J. Kelly, Andrew J. Davison
Abstract
We present the first application of 3D Gaussian Splatting to incremental 3D reconstruction using a single moving monocular or RGB-D camera. Our Simultaneous Localisation and Mapping (SLAM) method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation, but also reconstruction of tiny and even transparent objects. - Paper | Project Page | Code | - Short Presentation
4. Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Authors : Vladimir Yugay, Yue Li, Theo Gevers, Martin R. Oswald
Abstract
We present the first neural RGBD SLAM method capable of photorealistically reconstructing real-world scenes. Despite modern SLAM methods achieving impressive results on synthetic datasets, they still struggle with real-world datasets. Our approach utilizes 3D Gaussians as a primary unit for our scene representation to overcome the limitations of the previous methods. We observe that classical 3D Gaussians are hard to use in a monocular setup: they can't encode accurate geometry and are hard to optimize with single-view sequential supervision. By extending classical 3D Gaussians to encode geometry, and designing a novel scene representation and the means to grow and optimize it, we propose a SLAM system capable of reconstructing and rendering real-world datasets without compromising on speed and efficiency. We show that Gaussian-SLAM can reconstruct and photorealistically render real-world scenes. We evaluate our method on common synthetic and real-world datasets and compare it against other state-of-the-art SLAM methods. Finally, we demonstrate that the final 3D scene representation that we obtain can be rendered in real time thanks to the efficient Gaussian Splatting rendering. - Paper | Project Page | Code | - Short Presentation
5. [CVPR '24] Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Authors : Huajian Huang, Longwei Li, Hui Cheng, Sai-Kit Yeung
Abstract
The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, e.g., PSNR is 30% higher and rendering speed is hundreds of times faster in the Replica dataset. Moreover, Photo-SLAM can run at real-time speed using an embedded platform such as Jetson AGX Orin, showing the potential of robotics applications. - Paper | Project Page | Code
Sparse:
2024:
1. [CVPR '24] DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Authors : Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu
Abstract
Radiance fields have demonstrated impressive performance in synthesizing novel views from sparse input views, yet prevailing methods suffer from high training costs and slow inference speed. This paper introduces DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields, offering real-time and high-quality few-shot novel view synthesis at low costs. Our motivation stems from the highly efficient representation and surprising quality of the recent 3D Gaussian Splatting, although it suffers geometry degradation when input views decrease. In the Gaussian radiance fields, we find that this degradation in scene geometry is primarily linked to the positioning of Gaussian primitives and can be mitigated by depth constraints. Consequently, we propose a Hard and Soft Depth Regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine detailed geometry reshaping, we introduce Global-Local Depth Normalization, enhancing the focus on small local depth changes. Extensive experiments on LLFF, DTU, and Blender datasets demonstrate that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results with significantly reduced memory cost, a 25× reduction in training time, and over 3000× faster rendering speed. - Paper | Project Page | Code | - Short Presentation
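The abstract does not give formulas for Global-Local Depth Normalization, so the following is only a sketch of one plausible reading: a depth patch is centered on its own mean and then divided either by its own standard deviation (local) or by a standard deviation taken from a larger region such as the whole image (global). The function name `normalize_patch` and its parameters are hypothetical and are not taken from the DNGaussian code.

```python
import math

def normalize_patch(patch, scale_std=None, eps=1e-6):
    """Center a flat list of depth values on its mean, then rescale.

    scale_std=None  -> divide by the patch's own std (local normalization).
    scale_std=value -> divide by a supplied std, e.g. that of the whole
                       image (global normalization). Both are assumptions
                       made for illustration.
    """
    mean = sum(patch) / len(patch)
    if scale_std is None:
        var = sum((d - mean) ** 2 for d in patch) / len(patch)
        scale_std = math.sqrt(var)
    return [(d - mean) / (scale_std + eps) for d in patch]

# A nearly flat patch: depth differences of one part in a thousand.
patch = [1.000, 1.001, 1.002]
local_out = normalize_patch(patch)                   # small changes amplified
global_out = normalize_patch(patch, scale_std=1.0)   # original scale preserved
```

Under this reading, the local variant stretches tiny depth differences to roughly unit scale, which is one way a loss could be made to focus on small local depth changes.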
2. Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting
Authors : Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III
Abstract
In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in a few-view scene syntheses on opaque as well as on reflective and transparent objects. - Paper | Project Page
3. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Authors : Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
Abstract
We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussian Splatting models via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses 10× fewer parameters and infers more than 2× faster while providing higher appearance and geometry quality as well as better cross-dataset generalization. - Paper | Project Page | Code
4. latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Authors : Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen
Abstract
We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not enable fast inference of high resolution novel views due to slow volume rendering, or are limited to interpolation of close input views, even in simpler settings with a single central object, where 360-degree generalization is possible. In this work, we combine a regression-based approach with a generative model, moving towards both of these capabilities within the same method, trained purely on readily available real video data. The core of our method are variational 3D Gaussians, a representation that efficiently encodes varying uncertainty within a latent space consisting of 3D feature Gaussians. From these Gaussians, specific instances can be sampled and rendered via efficient Gaussian splatting and a fast, generative decoder network. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data. - Paper | Project Page | Code
5. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Authors : Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein
Abstract
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. - Paper | Project Page | Code
6. Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Authors : Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
Abstract
We tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed, approximately 0.6 second on a single NVIDIA A100 GPU. - Paper
7. CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Authors : Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari
Abstract
The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes. - Paper | Project Page
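For readers unfamiliar with the total variation loss mentioned in the CoherentGS abstract, here is a minimal sketch of the standard anisotropic form on a 2D grid. The abstract does not say which quantity the paper regularizes or how the term is weighted, so the choice of a plain 2D list (for example, a depth map) and the function name `total_variation` are assumptions for illustration only.

```python
def total_variation(grid):
    """Anisotropic total variation of a 2D grid (list of equal-length rows).

    Sums the absolute differences between each cell and its right and
    bottom neighbours. Smooth grids score low; noisy grids score high.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(grid[r][c + 1] - grid[r][c])
            if r + 1 < rows:  # vertical neighbour
                tv += abs(grid[r + 1][c] - grid[r][c])
    return tv

flat = [[1.0, 1.0], [1.0, 1.0]]    # perfectly smooth
edge = [[0.0, 1.0], [0.0, 1.0]]    # one vertical edge
spike = [[0.0, 5.0], [0.0, 0.0]]   # isolated outlier
```

Minimizing such a term penalizes isolated outliers more than clean edges, which fits the abstract's goal of keeping neighbouring Gaussians from moving independently.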
8. InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds
Authors : Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang
Abstract
While novel view synthesis (NVS) has made substantial progress in 3D computer vision, it typically requires an initial estimation of camera intrinsics and extrinsics from dense viewpoints. This pre-processing is usually conducted via a Structure-from-Motion (SfM) pipeline, a procedure that can be slow and unreliable, particularly in sparse-view scenarios with insufficient matched features for accurate reconstruction. In this work, we integrate the strengths of point-based representations (e.g., 3D Gaussian Splatting, 3D-GS) with end-to-end dense stereo models (DUSt3R) to tackle the complex yet unresolved issues in NVS under unconstrained settings, which encompasses pose-free and sparse view challenges. Our framework, InstantSplat, unifies dense stereo priors with 3D-GS to build 3D Gaussians of large-scale scenes from sparse-view and pose-free images in less than 1 minute. Specifically, InstantSplat comprises a Coarse Geometric Initialization (CGI) module that swiftly establishes a preliminary scene structure and camera parameters across all training views, utilizing globally-aligned 3D point maps derived from a pre-trained dense stereo pipeline. This is followed by the Fast 3D-Gaussian Optimization (F-3DGO) module, which jointly optimizes the 3D Gaussian attributes and the initialized poses with pose regularization. Experiments conducted on the large-scale outdoor Tanks & Temples datasets demonstrate that InstantSplat significantly improves SSIM (by 32%) while concurrently reducing Absolute Trajectory Error (ATE) by 80%. These establish InstantSplat as a viable solution for scenarios involving pose-free and sparse-view conditions. - Paper | Project Page | Code | - Explanation Video
9. Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
Authors : Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen
Abstract
We aim to tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diffusion models can strongly improve the reconstruction of a scene with low-cost fine-tuning. Specifically, we present SparseSplat360 (Sp2360), a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views. Due to superior training and rendering speeds, we use an explicit scene representation in the form of 3D Gaussians over NeRF-based implicit representations. We propose an iterative update strategy to fuse generated pseudo novel views with existing 3D Gaussians fitted to the initial sparse inputs. As a result, we obtain a multi-view consistent scene representation with details coherent with the observed inputs. Our evaluation on the challenging Mip-NeRF360 dataset shows that our proposed 2D to 3D distillation algorithm considerably improves the performance of a regularized version of 3DGS adapted to a sparse-view setting and outperforms existing sparse-view reconstruction methods in 360 scene reconstruction. Qualitatively, our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail. - Paper | Code (not yet)
2023:
1. SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting
Authors : Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi
Abstract
The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views is reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360 scenes from sparse training views. We find that using naive depth priors is not sufficient and integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by up to 30.5% and NeRF-based methods by up to 15.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost. - Paper | Project Page | Code (not yet)
2. FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
Authors : Zehao Zhu, Zhiwen Fan, Yifan Jiang, Zhangyang Wang
Abstract
Novel view synthesis from limited observations remains an important and persistent task. However, high efficiency in existing NeRF-based few-shot view synthesis is often compromised to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time and photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse initialized SfM points with a thoughtfully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations, subsequently infilling local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator within the Gaussians optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from sparse points observed from limited input viewpoints, our FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender. - Paper | Project Page | Code
3. [CVPR '24] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
Authors : David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann
Abstract
We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field. - Paper | Project Page | Code
4. [CVPR '24] Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Authors : Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi
Abstract
We introduce the Splatter Image, an ultra-fast approach for monocular 3D object reconstruction which operates at 38 FPS. Splatter Image is based on Gaussian Splatting, which has recently brought real-time rendering, fast training, and excellent scaling to multi-view reconstruction. For the first time, we apply Gaussian Splatting in a monocular reconstruction setting. Our approach is learning-based, and, at test time, reconstruction only requires the feed-forward evaluation of a neural network. The main innovation of Splatter Image is the surprisingly straightforward design: it uses a 2D image-to-image network to map the input image to one 3D Gaussian per pixel. The resulting Gaussians thus have the form of an image, the Splatter Image. We further extend the method to incorporate more than one image as input, which we do by adding cross-view attention. Owing to the speed of the renderer (588 FPS), we can use a single GPU for training while generating entire images at each iteration in order to optimize perceptual metrics like LPIPS. On standard benchmarks, we demonstrate not only fast reconstruction but also better results than recent and much more expensive baselines in terms of PSNR, LPIPS, and other metrics. - Paper | Project Page | Code | - Short Presentation
Navigation:
2024:
1. GaussNav: Gaussian Splatting for Visual Navigation
Authors : Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
Abstract
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors. Existing map-based navigation methods largely adopt the representation form of Bird's Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene. To address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for the IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects. Our GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. - Paper | Project Page | Code
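The SPL figures quoted in the GaussNav abstract refer to Success weighted by Path Length. As a reading aid, here is a small sketch of the commonly used definition, in which each successful episode contributes the ratio of the shortest path length to the longer of the shortest path and the path the agent actually took. The function name `spl` and the tuple layout are illustrative assumptions; this is not the GaussNav evaluation code.

```python
def spl(episodes):
    """Success weighted by Path Length, averaged over episodes.

    episodes: list of (success, shortest_path_length, agent_path_length).
    Failed episodes contribute 0; successful ones contribute
    shortest / max(agent, shortest), so detours lower the score.
    """
    if not episodes:
        return 0.0
    total = 0.0
    for success, shortest, taken in episodes:
        denom = max(taken, shortest)
        if success and denom > 0:
            total += shortest / denom
    return total / len(episodes)

runs = [
    (True, 10.0, 10.0),   # optimal path: contributes 1.0
    (True, 10.0, 20.0),   # twice as long as needed: contributes 0.5
    (False, 10.0, 5.0),   # failure: contributes 0.0
]
score = spl(runs)
```

Because the metric is an average over all episodes, both a higher success rate and shorter paths raise the score.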
2. 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization
Authors : Peng Jiang, Gaurav Pandey, Srikanth Saripalli
Abstract
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset. - Paper
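The normalized cross-correlation (NCC) named in the 3DGS-ReLoc abstract can be illustrated with a short sketch. It shows the textbook zero-mean NCC between two equally sized patches given as flat lists of intensities; how the paper applies it to query and rendered images (windowing, search strategy, thresholds) is not described in the abstract, and the function name `ncc` is a hypothetical one.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches.

    Returns a value in [-1, 1]. Returns 0.0 when either patch is constant,
    since the correlation is undefined in that case.
    """
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0

query = [10.0, 20.0, 30.0, 40.0]
brighter = [2.0 * v + 5.0 for v in query]   # same pattern, new gain and offset
inverted = list(reversed(query))            # opposite pattern
```

Since the score ignores gain and offset, a rendered patch can match a query patch even when exposure differs between the two images.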
3. Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF
Authors : Guangyi Liu, Wen Jiang, Boshu Lei, Vivek Pandey, Kostas Daniilidis, Nader Motee
Abstract
This work proposes a novel approach to bolster both the robot's risk assessment and safety measures while deepening its understanding of 3D scenes, which is achieved by leveraging Radiance Field (RF) models and 3D Gaussian Splatting. To further enhance these capabilities, we incorporate additional sampled views from the environment with the RF model. One of our key contributions is the introduction of Risk-aware Environment Masking (RaEM), which prioritizes crucial information by selecting the next-best-view that maximizes the expected information gain. This targeted approach aims to minimize uncertainties surrounding the robot's path and enhance the safety of its navigation. Our method offers a dual benefit: improved robot safety and increased efficiency in risk-aware 3D scene reconstruction and understanding. Extensive experiments in real-world scenarios demonstrate the effectiveness of our proposed approach, highlighting its potential to establish a robust and safety-focused framework for active robot exploration and 3D scene understanding. - Paper
4. 3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration
Authors : Quentin Herau, Moussab Bennehar, Arthur Moreau, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux
Abstract
Reliable multimodal sensor fusion algorithms require accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high computational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new rendering approach to achieve faster multi-sensor calibration. We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset. - Paper
5. HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Authors : Zhuopeng Li, Yilin Zhang, Chenming Wu, Jianke Zhu, Liangjun Zhang
Abstract
The rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, the previous 3DGS-based methods have limitations in urban scenes due to reliance on initial Structure-from-Motion (SfM) points and difficulties in rendering distant, sky and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependency on SfM point initialization, allowing for rendering of urban scenes, and incorporates the Point Densitification to enhance rendering quality in problematic regions during training. Furthermore, we introduce Gaussian Direction Encoding as an alternative for spherical harmonics in the rendering pipeline, which enables view-dependent color representation. To account for multi-camera systems, we introduce neural warping to enhance object consistency across different cameras. Experimental results on widely used autonomous driving datasets demonstrate that HO-Gaussian achieves photo-realistic rendering in real-time on multi-camera urban datasets. - Paper
6. SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
Authors : Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun
Abstract
Novel View Synthesis (NVS) for street scenes plays a critical role in autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although thrilling progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as condition, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. Then we apply the Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models, and demonstrate its advance in rendering images from broader views. - Paper
7. BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting
Authors : Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang
Abstract
Image-goal navigation enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely heavily on data and computationally expensive learning-based approaches or lack efficiency in complex environments due to insufficient exploration strategies. To address these limitations, we propose