Awesome 3D Gaussian Splatting Resources
A curated list of papers and open-source resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months. If you have any additions or suggestions, feel free to contribute. Additional resources like blog posts, videos, etc. are also welcome.
Table of Contents
- 3D Object Detection
- Autonomous Driving
- Avatars
- Classic work
- Compression
- Diffusion
- Dynamics and Deformation
- Editing
- Language Embedding
- Mesh Extraction and Physics
- Misc
- Regularization and Optimization
- Rendering
- Reviews
- SLAM
- Sparse
- Navigation and Autonomous Driving
- Poses
- Large-Scale
- Open Source Implementations
- Reference
- Unofficial Implementations
- 2D Gaussian Splatting
- Game Engines
- Viewers
- Utilities
- Tutorials
- Frameworks
- Other
Update Log:
October 24, 2024
October 16, 2024
September 7, 2024
May 10, 2024
- Added 18 papers: Z-Splat, Dual-Camera, StylizedGS, Hash3D, Revisiting Densification, Gaussian Pancakes, 3D-aware Deformable Gaussians, SpikeNVS, Zero-shot PC Completion, SplatPose, DreamScene360, RealmDreamer, Gaussian-ILC, Reinforcement Learning with GGS, GoMAvatar, OccGaussian, LoopGaussian, and a review
April 11, 2024
April 9, 2024
April 8, 2024
- Added 3 papers: Robust Gaussian Splatting, SC4D, and MM-Gaussian
April 5, 2024
- Added 5 papers: Surface Reconstruction, TCLC-GS, GaSpCT, OmniGS, and Per-Gaussian Embedding
- Fixes
April 2, 2024
- Added 11 papers: HO, SGD, HGS, Snap-it, InstantSplat, 3DGSR, MM3DGS, HAHA, CityGaussian, Mirror-3DGS, and Feature Splatting
March 30, 2024
- Added 8 papers: Modeling Uncertainty, GRM, Gamba, CoherentGS, TOGS, SA-GS, and GaussianCube
March 27, 2024
- Added another implementation: 360-gaussian-splatting
- Added CVPR '24 labels
- Added 5 papers: Comp4D, DreamPolisher, DN-Splatter, 2D GS, and Octree-GS
March 26, 2024
- Added 13 papers: latentSplat, GS on the Move, RadSplat, Mini-Splatting, SyncTweedies, HAC, STAG4D, EndoGSLAM, Pixel-GS, Semantic Gaussians, Gaussian in the Wild, CG-SLAM, and GSDF
March 24, 2024:
March 20, 2024:
- Added 4 papers: GVGEN, HUGS, RGBD GS-ICP SLAM, and High-Fidelity SLAM
March 19, 2024:
- Added Pointrix
- Added a 3DGS tutorial by the original authors
- Added GauStudio
- Added 23 papers: Touch-GS, GGRt, FDGaussian, SWAG, Den-SOFT, Gaussian-Flow, View-Consistent 3D Editing, BAGS, GeoGaussian, GS-Pose, Analytic-Splatting, Seamless 3D Maps, Texture-GS, Recent Advances in 3DGS, Compact 3DGS for Dense Visual SLAM, BrightDreamer, 3DGS-Reloc, Beyond Uncertainty, Motion-aware 3DGS, Fed3DGS, GaussNav, 3DGS-Calib, and NEDS-SLAM
March 17, 2024:
- Updated repository name and link for 3DGS.cpp (formerly VulkanSplatting)
March 16, 2024:
- SplatTV
- Added 6 papers: GaussianGrasper, a new split algorithm, Controllable Text-to-3D Generation, Spring-Mass 3DGS, Hyper-3DGS, and DreamScene
March 14, 2024:
- Added 6 papers: SemGauss, StyleGaussian, Gaussian Splatting in Style, GaussCtrl, GaussianImage, and RAIN-GS
March 8, 2024:
- Tutorial: how to capture images for 3DGS
- Added 6 papers: SplattingAvatar, DNGaussian, Radiative Gaussians, BAGS, GSEdit, and ManiGaussian
March 8, 2024:
March 6, 2024:
March 5, 2024:
- Added 1 paper: 3DGStream
- Code releases
- Added a new viewer
March 2, 2024:
- Added 1 paper: 3D Gaussian Model for Animation and Texturing
- New section: courses that also teach 3DGS.
February 28, 2024:
February 27, 2024:
- Added 2 papers: Spec-Gaussian and GEA
- SC-GS code released
February 24, 2024:
- Added 2 papers: Identifying Unnecessary Gaussians and GaussianPro
February 23, 2024:
- Corrected the authors and updated the abstract for EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
February 21, 2024:
February 20, 2024:
- GaussianObject code released
- Added 1 paper: GaussianHair
February 19, 2024:
February 16, 2024:
- Added 2 papers: IM-3D and GES
- GaMeS code released
February 14, 2024:
- Added viewer: VulkanSplatting - a cross-platform, high-performance 3DGS renderer in C++ and Vulkan Compute
February 13, 2024:
- Code release: (January 16, 2024) Real-Time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
- Added 3 papers: 3DGala, ImplicitDeepFake, and 3D Gaussians as a New Vision Era.
February 9, 2024:
February 8, 2024:
- Added 3 papers: Rig3DGS, Mesh-based GS, and LGM
February 6, 2024:
- Added 2 papers: SGS-SLAM and 4D Gaussian Splatting
February 5, 2024:
- Moved SWAGS to the Dynamics and Deformation section
- Added 2 papers: GaussianObject and GaMeS
- GS++ was renamed to Optimal Projection
February 2, 2024:
- Added 6 papers: VR-GS, Segment Anything, Gaussian Splashing, GS++, 360-GS, and StopThePop
- TRIPS code released
January 30, 2024:
- Code change: the GaussianAvatars code went private
January 29, 2024:
- Added 2 papers: LIV-GaussMap and TIP-Editor
January 26, 2024:
- Removed a withdrawn paper: Animatable 3D Gaussians for High-fidelity Synthesis of Human Motions
- Added 3 papers: EndoGaussians, PSAvatar, and GauU-Scene
January 25, 2024:
- Added viewer: Splatapult - a 3D Gaussian Splatting renderer in C++ and OpenGL, works with OpenXR for tethered VR
January 24, 2024:
- Added utility: GSOPs (Gaussian Splat Operators) for SideFX Houdini
- Code release: GaussianAvatars
January 23, 2024:
- Added 3 papers: Amortized Gen3D, Deformable Endoscopic Tissues, and Fast Dynamic 3D Object Generation
- Code releases: Animatable Avatars, Compressed 3D Gaussians, and GaussianAvatar
January 13, 2024:
- Added 4 papers: CoSSegGaussians, TRIPS, Gaussian Shadow Casting for Neural Characters, and DISTWAR
January 9, 2024:
- Added 1 paper: A Survey on 3D Gaussian Splatting (the first survey)
January 8, 2024:
- Added 4 papers: SWAGS (a 2023 paper previously forgotten), the first review paper, Compressed 3DGS, and an application paper on characterizing satellite geometry.
January 7, 2024:
- 1 open source implementation: taichi-splatting - work derived from the original Taichi 3D Gaussian Splatting, with significant restructuring and changes.
January 5, 2024:
- Added 3 papers: FMGS, PEGASUS, and Repaint123.
January 2, 2024:
- Added 1 paper: Street Gaussians.
January 2, 2024:
- Updated the link to the Deblurring Gaussians paper.
- SAGA code released.
- Added 2 papers from 2023: Text2Immersion and 2D-Guided 3DG Segmentation.
- Mathematical supplement for the gsplat lib.
- Added years to the categories.
- GSM code released.
December 29, 2023:
- Added 1 paper (apparently missed earlier): Gaussian-Head-Avatar.
- Added a blog post on head avatars.
December 29, 2023:
- Added 3 papers: DreamGaussian4D, 4DGen, and Spacetime Gaussian.
December 27, 2023:
- Added 3 papers: LangSplat, Deformable 3DGS, and Human101.
- Added blog post: a comprehensive review of 3DGS.
December 25, 2023:
- Code released for Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes.
- GPS-Gaussian code released.
December 24, 2023:
- Added 2 papers: Self-Organizing Gaussian Grids and Gaussian Splitting.
- Added a repository for enhancing Gaussian rendering to model more complex scenes.
December 21, 2023:
- Added 3 papers: Splatter Image, pixelSplat, and Align Your Gaussians.
- Gaussian Grouping code released.
December 19, 2023:
- Added 2 papers: GAvatar and GauFRe.
December 18, 2023:
- Added utility: SpectacularAI - conversion scripts between different 3DGS conventions.
- SuGaR code released.
December 16, 2023:
- Added WebGL viewer 3: Gauzilla.
December 15, 2023:
- Added 4 papers: DrivingGaussian, iComMa, Triplane, and 3DGS-Avatar.
- Relightable Gaussians code released.
December 13, 2023:
- Added 5 papers: Gaussian-SLAM, CoGS, ASH, CF-GS, and Photo-SLAM.
December 11, 2023:
- Added 2 papers: Gaussian Splatting SLAM and Denoising Scores for 3D Generation.
- ScaffoldGS code released.
December 8, 2023:
- Added 2 papers: EAGLES and MonoGaussianAvatar.
December 7, 2023:
- LucidDreamer code released.
- Added 9 papers: Gaussian-Flow, Feature-3DGS, GaussianAvatar, FlashAvatar, Relightable, and Deblurring Gaussians.
December 5, 2023:
- Added 9 papers: NeuSG, GaussianHead, GaussianAvatars, GPS-Gaussian, Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction, SplaTAM, MANUS, Segment Any 3D Gaussians, and Language Embedded 3D Gaussians.
December 4, 2023:
- Added 8 papers: Gaussian Grouping, MD Splatting, DynMF, Scaffold-GS, SparseGS, FSGS, Control4D, and SC-GS.
December 1, 2023:
- Added 4 papers: Compact3D, GaussianShader, Periodic Vibration Gaussian, and Gaussian Shell Maps for Efficient 3D Human Generation.
- Organized the table of contents by category and added line breaks.
November 30, 2023:
- Added an Unreal game engine implementation.
- Added 5 papers: LightGaussian, FisherRF, HUGS, HumanGaussian, CG3D, and Multi-Scale 3DGS.
November 29, 2023:
- Added 2 papers: Point and Move and IR-GS.
November 28, 2023:
- Added 5 papers: GaussianEditor, Relightable Gaussians, GART, Mip-Splatting, and HumanGaussian.
November 27, 2023:
- Added 2 papers: Gaussian Editing and Compact 3D Gaussians.
November 25, 2023:
- Added the Animatable Gaussians project (paper not yet released).
November 22, 2023:
- Added 3 new GS papers: Animatable, Depth-Regularized, and Monocular/Multi-view 3DGS.
- Added some classic papers.
- Added another GS paper: LucidDreamer.
November 21, 2023:
- Added 3 new GS papers: GaussianDiffusion, LucidDreamer, and PhysGaussian.
- Added 2 GS papers: SuGaR and PhysGaussian.
November 21, 2023:
November 17, 2023:
- Added a PlayCanvas implementation to the Game Engines section.
November 16, 2023:
- Deformable 3D Gaussians code released.
- Added the Drivable 3D Gaussian Avatars paper.
November 8, 2023:
- Some notes on 3DGS implementations and the universal-format discussion.
November 4, 2023:
- Added 2D Gaussian Splatting.
- Added a very detailed (technical) blog post explaining 3D Gaussian Splatting.
October 28, 2023:
- Added the Utilities section.
- Added a 3DGS converter for editing 3DGS .ply files in CloudCompare to the utilities.
- Added Kapture (for conversion from Bundler to COLMAP models) and a Kapture image cropping script, with conversion instructions, to the utilities.
October 23, 2023:
- Added Python WebGL viewer 2.
- Added an introductory video blog on Gaussian Splatting (and the Unity viewer).
October 21, 2023:
- Added a Python OpenGL viewer.
- Added a TypeScript WebGPU viewer.
October 20, 2023:
- Made the abstracts readable (removed hyphenations).
- Added a Windows tutorial.
- Other minor text fixes.
- Added a Jupyter notebook viewer.
October 19, 2023:
- Added a GitHub page link for Real-Time Photorealistic Dynamic Scene Representation.
- Reordered the headings.
- Added other unofficial implementations.
- Moved Nerfstudio gsplat and fast: C++/CUDA to unofficial implementations.
- Added Nerfstudio, Blender, WebRTC, iOS, and Metal viewers.
October 17, 2023:
- GaussianDreamer code released.
- Added Real-Time Photorealistic Dynamic Scene Representation.
October 16, 2023:
- Added the Deformable 3D Gaussians paper.
- Dynamic 3D Gaussians code released.
October 15, 2023:
- Initial list with the first 6 papers.
Seminal paper introducing 3D Gaussian Splatting:
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Authors : Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis
Abstract
Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space. Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene. Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets. Paper (Low Resolution) | Paper (High Resolution) | Project Page | Code | Short Presentation | Explanation Video
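At the pixel level, the method reduces to two small pieces of math: each 3D Gaussian's covariance Σ is projected into screen space with the local affine approximation Σ' = J W Σ Wᵀ Jᵀ, and the depth-sorted splats are alpha-blended front to back. Below is a minimal NumPy sketch of just that math (illustrative names are ours; this is not the paper's tile-based CUDA rasterizer):

```python
import numpy as np

def project_gaussian(mean3d, cov3d, W, t, fx, fy):
    """Project one 3D Gaussian to screen space (EWA-style approximation).

    mean3d: (3,) world-space center; cov3d: (3,3) world-space covariance.
    W, t: camera rotation (3,3) and translation (3,); fx, fy: focal lengths.
    """
    x, y, z = W @ mean3d + t                     # camera-space center
    # Jacobian of the perspective projection, evaluated at the center
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    cov2d = J @ W @ cov3d @ W.T @ J.T            # Σ' = J W Σ Wᵀ Jᵀ
    return np.array([fx * x / z, fy * y / z]), cov2d

def composite(pixel, splats):
    """Front-to-back alpha blending of depth-sorted splats at one pixel.

    splats: iterable of (mean2d, cov2d, color, opacity), nearest first.
    """
    C, T = np.zeros(3), 1.0                      # color, transmittance
    for mean2d, cov2d, color, opacity in splats:
        d = pixel - mean2d
        alpha = opacity * np.exp(-0.5 * d @ np.linalg.inv(cov2d) @ d)
        C += T * alpha * color
        T *= 1.0 - alpha
        if T < 1e-4:                             # early termination
            break
    return C
```

The actual renderer sorts Gaussians once per 16×16 tile and runs this loop per pixel on the GPU; the sketch only shows the per-pixel arithmetic.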
3D Object Detection
2024
1. 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
Authors : Yang Cao, Yuanliang Jv, Dan Xu
Abstract
Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3D object detection through view-synthesis representations. However, NeRF faces inherent limitations: (i) it has limited representational capacity for 3DOD due to its implicit nature, and (ii) it suffers from slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations with faster rendering. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs - 3DGS primarily relies on 2D pixel-level supervision, resulting in an unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD; (ii) Excessive background blobs - 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, negatively affecting detection. To tackle challenge (i), we leverage the fact that the 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution that incorporates 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background (see Fig. 1). To address challenge (ii), we propose a Box-Focused Sampling strategy that uses 2D boxes to generate object probability distributions in 3D space, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from the proposed Boundary Guidance and Box-Focused Sampling, our final method, 3DGS-DET, achieves significant improvements (+5.6 on mAP@0.25, +3.7 on mAP@0.5) over the basic pipeline version, without introducing any additional learnable parameters. Furthermore, 3DGS-DET significantly outperforms the state-of-the-art NeRF-based method NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 on the ScanNet dataset, and an impressive +31.5 on mAP@0.25 on the ARKITScenes dataset. Code and models are publicly available at https://github.com/yangcaoai/3DGS-DET. Paper | Code (not yet)
Autonomous Driving:
2024:
1. Street Gaussians for Modeling Dynamic Urban Scenes
Authors : Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng
Abstract
This paper aims to tackle the problem of modeling dynamic urban street scenes from monocular videos. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, significant limitations remain: slow training and rendering speed, together with the critical need for high precision in the tracked vehicle poses. We introduce Street Gaussians, a new explicit scene representation that tackles all these limitations. Specifically, dynamic urban streets are represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground object vehicles, each object point cloud is optimized with optimizable tracked poses, along with a dynamic spherical harmonics model for the dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn enables scene editing operations and rendering at 133 FPS (1066×1600 resolution) within 30 minutes of training. The proposed method is evaluated on several challenging benchmarks, including the KITTI and Waymo Open datasets. Experiments show that the proposed method consistently outperforms state-of-the-art methods across all datasets. Furthermore, the proposed representation delivers performance on par with that achieved using precise ground-truth poses, despite relying only on poses from an off-the-shelf tracker. Paper | Project Page | Code (not yet)
2. TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes
Authors : Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jan Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
Abstract
Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data, to enrich the properties of the 3D Gaussians for splatting. The 3D Gaussians' properties are not only initialized in alignment with the 3D mesh, which provides more complete 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by teaching a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and the nuScenes dataset validate our method's state-of-the-art (SOTA) performance. Utilizing a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS at 1920×1280 resolution (Waymo) and 120 FPS at 1600×900 resolution (nuScenes) in urban scenarios. Paper
3. OmniRe: Omni Urban Scene Reconstruction
Authors : Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang
Abstract
We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipeline for dynamic urban scene reconstruction. To that end, we propose a comprehensive 3DGS framework for driving scenes, named OmniRe, that allows for accurate, full-length reconstruction of diverse dynamic objects in a driving log. OmniRe builds dynamic neural scene graphs based on Gaussian representations and constructs multiple local canonical spaces that model various dynamic actors, including vehicles, pedestrians, and cyclists, among many others. This capability is unmatched by existing methods. OmniRe allows us to holistically reconstruct the different objects present in the scene, subsequently enabling the simulation of reconstructed scenarios with all actors participating in real-time (~60 Hz). Extensive evaluations on the Waymo dataset show that our approach outperforms prior state-of-the-art methods quantitatively and qualitatively by a large margin. We believe our work fills a critical gap in driving reconstruction. Paper | Project Page | Code
2023:
1. [CVPR '24] DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Authors : Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Abstract
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing each object and restoring their accurate positions and occlusion relationships within the scene. We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater detail and maintain panoramic consistency. DrivingGaussian outperforms existing methods in dynamic driving scene reconstruction and enables photorealistic surround-view synthesis with high fidelity and multi-camera consistency. Paper | Project Page | Code (not yet)
2. [CVPR '24] HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Authors : Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao
Abstract
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy, and to reconstruct dynamic scenes, even in scenarios where 3D bounding box detection is highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach. Paper | Project Page | Code
Avatars:
2024:
1. GaussianBody: Clothed Human Reconstruction via 3D Gaussian Splatting
Authors : Mengtian Li, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang
Abstract
In this work, we propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting. Compared with costly neural-radiance-based models, 3D Gaussian Splatting has recently demonstrated great performance in terms of training time and rendering quality. However, applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details. To address these challenges, our method considers explicit pose-guided deformation to associate dynamic Gaussians across the canonical space and the observation space, and introduces a physically-based prior with regularized transformations that helps mitigate the ambiguity between the two spaces. During the training process, we further propose a pose refinement strategy that updates the pose regression to compensate for inaccurate initial estimation, and a split-with-scale mechanism to enhance the density of the regressed point clouds. Experiments validate that our method can achieve state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies, along with explicit geometry reconstruction. Paper
2. PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Creation with 3D Gaussian Splatting
Authors : Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, Kanglin Liu
Abstract
Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult, and existing methods have to trade off between speed and quality. 3DMM-based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussians have been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussians to head avatar creation remains a major challenge, since it is difficult for 3D Gaussians to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitives to create a parametric morphable shape model and employs 3D Gaussians for fine detail representation and high-fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM), which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surface as well as off the mesh, enabling the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussians for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and that the avatars can be animated in real-time (≥ 25 fps at a resolution of 512 × 512). Paper
3. Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos
Authors : Alfredo Rivero, ShahRukh Athar, Zhixin Shu, Dimitris Samaras
Abstract
Creating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. The recent development of 3D Gaussian Splatting (3DGS) has shown improvements in rendering quality and training efficiency. However, it still remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them to 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method that is guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments. Paper | Project Page
4. HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Authors : Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang
Abstract
Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising results achieved with 2D diffusion priors in recent works, current methods struggle to create high-quality and animatable avatars effectively. In this paper, we present HeadStudio, a novel framework that utilizes 3D Gaussian Splatting to generate realistic and animatable avatars from text prompts. Our method drives 3D Gaussians semantically to create a flexible and achievable appearance through the intermediate FLAME representation. Specifically, we incorporate FLAME into both the 3D representation and the score distillation: 1) FLAME-based 3D Gaussian Splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh; 2) FLAME-based score distillation sampling, utilizing FLAME-based fine-grained control signals to guide score distillation from the text prompt. Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting visually appealing appearances. The avatars are capable of rendering high-quality real-time (≥ 40 fps) novel views at a resolution of 1024. They can be smoothly controlled by real-world speech and video. We hope that HeadStudio can advance digital avatar creation and that the present method can be widely applied across various domains. Paper | Project Page | Code (not yet)
5. ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting
Authors : Georgii Stanishevskii, Jakub Steczkiewicz, Tomasz Szczepanik, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek
Abstract
Numerous emerging deep-learning techniques have had a substantial impact on computer graphics. Among the most promising breakthroughs are the recent rise of Neural Radiance Fields (NeRFs) and Gaussian Splatting (GS). NeRFs encode the object's shape and color in neural network weights, using a handful of images with known camera positions to generate novel views. In contrast, GS provides accelerated training and inference without a decrease in rendering quality by encoding the object's characteristics in a collection of Gaussian distributions. These two techniques have found many use cases in spatial computing and other domains. On the other hand, the emergence of deepfake methods has sparked considerable controversy. Such techniques can take the form of artificial-intelligence-generated videos that closely mimic authentic footage. Using generative models, they can modify facial features, enabling the creation of altered identities or facial expressions that exhibit a remarkably realistic appearance of a real person. Despite these controversies, deepfakes can offer a next-generation solution for avatar creation and gaming when of desirable quality. To that end, we show how to combine all these emerging technologies to obtain a more plausible outcome. Our ImplicitDeepfake uses a classical deepfake algorithm to modify all training images separately and then trains NeRF and GS on the modified faces. This relatively simple strategy can produce plausible 3D deepfake-based avatars. Paper | Code (not yet)
6. GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians
Authors : Haimin Luo, Min Ouyang, Zijun Zhao, Suyi Jiang, Longwen Zhang, Qixuan Zhang, Wei Yang, Lan Xu, Jingyi Yu
Abstract
Hairstyle reflects culture and ethnicity at first glance. In the digital era, various realistic human hairstyles are also critical to high-fidelity digital human assets for beauty and inclusivity. Yet, realistic hair modeling and real-time rendering for animation is a formidable challenge due to the sheer number of strands, the complicated structure of the geometry, and the sophisticated interaction with light. This paper presents GaussianHair, a novel explicit hair representation. It enables comprehensive modeling of hair geometry and appearance from images, fostering innovative illumination effects and dynamic animation capabilities. At the heart of GaussianHair is the novel concept of representing each hair strand as a sequence of connected cylindrical 3D Gaussian primitives. This approach not only retains the hair's geometric structure and appearance but also allows for efficient rasterization onto a 2D image plane, facilitating differentiable volumetric rendering. We further enhance this model with the "GaussianHair Scattering Model", adept at recreating the slender structure of hair strands and accurately capturing their local diffuse color in uniform lighting. Through extensive experiments, we substantiate that GaussianHair achieves breakthroughs in both geometric and appearance fidelity, transcending the limitations encountered in state-of-the-art methods for hair reconstruction. Beyond representation, GaussianHair extends to support editing, relighting, and dynamic rendering of hair, offering seamless integration with conventional CG pipeline workflows. Complementing these advancements, we have compiled an extensive dataset of real human hair, each with meticulously detailed strand geometry, to propel further research in this field. Paper
7. GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos
Authors : Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang
Abstract
In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions and aligning 3D Gaussians with human skin surfaces accurately. The key contributions of this paper are twofold. Firstly, we introduce a pose refinement technique to improve hand and foot pose accuracy by aligning normal maps and silhouettes. Precise pose is crucial for correct shape and appearance reconstruction. Secondly, we address the problems of unbalanced aggregation and initialization bias that previously diminished the quality of 3D Gaussian avatars, through a novel surface-guided re-initialization method that ensures accurate alignment of 3D Gaussian points with avatar surfaces. Experimental results demonstrate that our proposed method achieves high-fidelity and vivid 3D Gaussian avatar reconstruction. Extensive experimental analyses validate the performance qualitatively and quantitatively, demonstrating that it achieves state-of-the-art performance in photo-realistic novel view synthesis while offering fine-grained control over the human body and hand poses. Paper | Project Page | Code (not yet)
8. SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Authors : Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang
Abstract
We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by the mesh, which empowers its compatibility with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets. Paper | Project Page | Code | Short Presentation
9. SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface
Authors : Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang
Abstract
We present SplatFace, a novel Gaussian splatting framework designed for 3D human face reconstruction without reliance on accurate pre-determined geometry. Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions. We incorporate a generic 3D Morphable Model (3DMM) to provide a surface geometric structure, making it possible to reconstruct faces with a limited set of input images. We introduce a joint optimization strategy that refines both the Gaussians and the morphable surface through a synergistic non-rigid alignment process. A novel distance metric, splat-to-surface, is proposed to improve alignment by considering both the Gaussian position and covariance. The surface information is also utilized to incorporate a world-space densification process, resulting in superior reconstruction quality. Our experimental analysis demonstrates that the proposed method is competitive with both other Gaussian splatting techniques in novel view synthesis and other 3D reconstruction methods in producing 3D face meshes with high geometric precision. Paper
10. HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
Authors : Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang
Abstract
We present HAHA, a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high-fidelity rendering. We demonstrate its efficiency to animate and render full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, such as hair and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar, and reduced rendering artifacts. This also allows us to handle the animation of small body parts such as fingers that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method demonstrates on-par reconstruction quality to the state-of-the-art on SnapshotPeople, while using less than a third of the Gaussians. HAHA outperforms previous state-of-the-art on novel poses from X-Humans both quantitatively and qualitatively. Paper
11. Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks
Authors : Florian Barthel, Arian Beckmann, Wieland Morgenstern, Anna Hilsmann, Peter Eisert
Abstract
NeRF-based 3D-aware Generative Adversarial Networks like EG3D or GIRAFFE have shown very high rendering quality under large representational variety. However, rendering with Neural Radiance Fields poses challenges for most 3D applications: First, the significant computational demands of NeRF rendering preclude its use on low-power devices, such as mobiles and VR/AR headsets. Second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes, such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware Generative Adversarial Networks with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we can integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. Additionally, our approach allows for high-resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes. Paper | Project Page | Code
12. GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Authors : Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang
Abstract
We introduce GoMAvatar, a novel approach for real-time, memory-efficient, high-quality animatable human modeling. GoMAvatar takes as input a single monocular video to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while seamlessly integrating with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh representation, a hybrid 3D model combining the rendering quality and speed of Gaussian splatting with the geometry modeling and compatibility of deformable meshes. We assess GoMAvatar on ZJU-MoCap data and various YouTube videos. GoMAvatar matches or surpasses current monocular human modeling algorithms in rendering quality and significantly outperforms them in computational efficiency (43 FPS) while being memory-efficient (3.63 MB per subject). Paper | Project Page | Code
13. OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering
Authors : Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu
Abstract
Rendering dynamic 3D humans from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the people are in an unobstructed scene, while various objects may cause the occlusion of body parts in real-life scenarios. Previous methods utilizing NeRF for surface rendering to recover the occluded areas require more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian, based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes a 3D Gaussian distribution in the canonical space, and we perform occlusion feature queries at occluded regions, where aggregated pixel-aligned features are extracted to compensate for the missing information. Then we use a Gaussian Feature MLP to further process the features, along with occlusion-aware loss functions to better perceive the occluded area. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared to state-of-the-art methods. And we improve training and inference speeds by 250x and 800x, respectively. Paper
14. [CVPR '24] Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
Authors : Inhee Lee, Byungjun Kim, Hanbyul Joo
Abstract
In this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation, enabling us to conveniently and efficiently compose and render them together. In particular, we address the scenarios with severely limited and sparse observations in 3D human reconstruction, a common challenge encountered in the real world. To tackle this challenge, we introduce a novel approach to optimize the 3D-GS representation in a canonical space by fusing the sparse cues in the common space, while keeping consistency with the observed 2D appearances. We demonstrate that our method can reconstruct high-quality animatable 3D humans in various challenging examples, in the presence of occlusion, image crops, few-shot settings, and extremely sparse observations. After reconstruction, our method is capable of not only rendering the scene from any novel viewpoint at arbitrary time instances, but also editing the 3D scene by removing individual humans or applying different motions to each human. Through various experiments, we demonstrate the quality and efficiency of our method over alternative existing approaches. Paper | Project Page | Code
15. [NeurIPS '24] Generalizable and Animatable Gaussian Head Avatar
Authors : Xuangeng Chu, Tatsuya Harada
Abstract
In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction. Existing methods rely on neural radiance fields, leading to heavy rendering consumption and low reenactment speeds. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without specific optimizations and perform reenactment rendering at real-time speeds. Experiments show that our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance the development of digital avatars. Paper | Project Page | Code
16. Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
Authors : Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu
Abstract
Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impede broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, requiring only approximately 350 KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips. Paper | Project Page | Short Presentation | Dataset
17. V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
Authors : Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, Lan Xu
Abstract
Experiencing high-fidelity volumetric video as seamlessly as 2D videos is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy to reduce storage requirements with rapid training speed. The first stage employs hash encoding and a shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine-tunes other Gaussian attributes using residual entropy loss and temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. Meanwhile, we designed a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3, outperforming other methods by enabling high-quality rendering and streaming on common devices, which is unseen before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Our project page with source code is available at this https URL. Paper | Project Page | Short Presentation
2023:
1. Drivable 3D Gaussian Avatars
Authors : Wojciech Zielonka, Timur Bagautdinov, Shunsuke Saito, Michael Zollhöfer, Justus Thies, Javier Romero
Abstract
We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data. Paper | Project Page | Short Presentation
2. SplatArmor: Articulated Gaussian Splatting for Animatable Humans from Monocular RGB Videos
Authors : Rohit Jena, Ganesh Subramanian Iyer, Siddharth Choudhary, Brandon Smith, Pratik Chaudhari, James Gee
Abstract
We propose SplatArmor, a novel approach for recovering detailed and animatable human models by 'armoring' a parameterized body model with 3D Gaussians. Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependent effects, we introduce an SE(3) field, which allows us to capture both the location and anisotropy of the Gaussians. Furthermore, we propose the use of a neural color field to provide color regularization and 3D supervision for the precise positioning of these Gaussians. We show that Gaussian splatting provides an interesting alternative to neural-rendering-based methods by leveraging a rasterization primitive, without facing the non-differentiability and optimization challenges typically faced by such approaches. The rasterization paradigm allows us to leverage forward skinning, and does not suffer from the ambiguities associated with inverse skinning and warping. We show compelling results on the ZJU MoCap and People Snapshot datasets, which underscore the effectiveness of our method for controllable human synthesis. Paper | Project Page | Code (not yet)
3. [CVPR '24] Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Authors : Zhe Li, Zerong Zheng, Lizhen Wang, Yebin Liu
Abstract
Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front and back canonical Gaussian maps where each pixel represents a 3D Gaussian. The learned template is adaptive to the wearing garments for modeling looser clothes like dresses. Such a template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization given novel poses. Overall, our method can create lifelike avatars with dynamic, realistic, and generalized appearances. Experiments show that our method outperforms other state-of-the-art approaches. Paper | Project Page | Code
4. [CVPR '24] GART: Gaussian Articulated Template Models
Authors : Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis
Abstract
We introduce Gaussian Articulated Template Model (GART), an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning, while further generalizing to more complex non-rigid deformations with novel latent bones. GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses at faster than 150 FPS. Paper | Project Page | Code | Short Presentation
5. Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Authors : Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero
Abstract
This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While the classical approaches to model and render virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real-time, and their quality degrades when the character is animated with body poses different from the training observations. We propose an animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of Gaussian primitives in a canonical space, which is deformed with a coarse-to-fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (HuGS) model in an end-to-end fashion from multi-view observations, and evaluate it against the state-of-the-art approaches for novel pose synthesis of clothed bodies. Our method achieves a 1.5 dB PSNR improvement over the state-of-the-art on the THuman4 dataset while being able to render in real-time (80 fps at 512x512 resolution). Paper | Project Page | Short Presentation
6. [CVPR '24] HUGS: Human Gaussian Splats
Authors : Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
Abstract
Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We utilize the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g. cloth, hair), we allow the 3D Gaussians to deviate from the human body model. Utilizing 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of humans and novel view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train than previous work. Paper | Project Page | Code (not yet)
7. [CVPR '24] Gaussian Shell Maps for Efficient 3D Human Generation
Authors : Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein
Abstract
Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering the GAN training and requiring multi-view-inconsistent 2D upsamplers. Here, we introduce Gaussian Shell Maps (GSMs) as a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives using an articulable multi-shell-based scaffold. In this setting, a CNN generates a 3D texture stack with features that are mapped to the shells. The latter represent inflated and deflated versions of a template surface of a digital human in a canonical body pose. Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features. These Gaussians are efficiently and differentiably rendered. The ability to articulate the shells is important during GAN training and, at inference time, to deform a body into arbitrary user-defined poses. Our efficient rendering scheme bypasses the need for view-inconsistent upsamplers and achieves high-quality multi-view consistent renderings at a native resolution of 512 × 512 pixels. We demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion. Paper | Project Page | Code
8. GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation
Authors : Jie Wang, Jiu-Cheng Xie, Xianyan Li, Chi-Man Pun, Feng Xu, Hao Gao
Abstract
Constructing vivid 3D head avatars for given subjects and realizing a series of animations on them is valuable yet challenging. This paper presents GaussianHead, which models the actional human head with anisotropic 3D Gaussians. In our framework, a motion deformation field and a multi-resolution tri-plane are constructed to deal with the head's dynamic geometry and complex texture, respectively. Notably, we impose an exclusive derivation scheme on each Gaussian, which generates its multiple doppelgangers through a set of learnable parameters for position transformation. With this design, we can compactly and accurately encode the appearance information of Gaussians, even those fitting the head's particular components with sophisticated structures. In addition, an inherited derivation strategy for newly added Gaussians is adopted to facilitate training acceleration. Extensive experiments show that our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel view synthesis tasks. Paper | Project Page | Code
9. [CVPR '24] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Authors : Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner
Abstract
We introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing for precise animation control via the underlying parametric model, e.g., through expression transfer from a driving sequence or by manually changing the morphable model parameters. We parameterize each splat by a local coordinate frame of a triangle and optimize for explicit displacement offsets to obtain a more accurate geometric representation. During avatar reconstruction, we jointly optimize the morphable model parameters and Gaussian splat parameters in an end-to-end fashion. We demonstrate the animation capabilities of our photorealistic avatar in several challenging scenarios. For instance, we show reenactments from a driving video, where our method outperforms existing works by a significant margin. Paper | Project Page | Code | Short Presentation
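A pattern shared by the mesh-rigged avatars above (GaussianAvatars, SplattingAvatar): each splat stores its mean and rotation in the local frame of a parent triangle, so it automatically follows the morphable model's animation. Below is a simplified sketch of that binding under our own frame and scale conventions (GaussianAvatars additionally optimizes per-splat offsets and uses its own scale definition):

```python
import numpy as np

def triangle_frame(v0, v1, v2):
    """Local frame of a triangle: centroid origin, edge/normal axes, edge scale."""
    origin = (v0 + v1 + v2) / 3.0
    e = v1 - v0
    x = e / np.linalg.norm(e)            # tangent along the first edge
    z = np.cross(e, v2 - v0)
    z = z / np.linalg.norm(z)            # triangle normal
    y = np.cross(z, x)
    R = np.stack([x, y, z], axis=1)      # columns are the frame axes
    return origin, R, np.linalg.norm(e)  # scale tracks triangle stretching

def splat_to_world(local_mean, local_rot, v0, v1, v2):
    """Map a splat parameterized in its parent triangle's frame to world space."""
    origin, R, s = triangle_frame(v0, v1, v2)
    return origin + s * (R @ local_mean), R @ local_rot
```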
10. [CVPR '24] GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Authors : Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu
Abstract
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and regress directly Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module that lifts the 2D parameter maps to 3D space. The proposed framework is fully differentiable, and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed. Paper | Project Page | Code | Short Presentation
11. GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Authors : Shoukang Hu, Ziwei Liu
Abstract
We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1-2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks demanding hours of training and seconds of rendering per frame. Specifically, GauHuman encodes Gaussian Splatting in the canonical space and transforms 3D Gaussians from canonical space to posed space with linear blend skinning (LBS), in which effective pose and LBS refinement modules are designed to learn fine details of 3D humans under negligible computational cost. Moreover, to enable fast optimization of GauHuman, we initialize and prune 3D Gaussians with a 3D human prior, while splitting/cloning via KL divergence guidance, along with a novel merge operation for further speeding up. Extensive experiments on the ZJU_MoCap and MonoCap datasets demonstrate that GauHuman achieves state-of-the-art performance quantitatively and qualitatively with fast training and real-time rendering speed. Notably, without sacrificing rendering quality, GauHuman can fast model the 3D human performer with ~13k 3D Gaussians. Paper | Project Page | Code | Short Presentation
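Many of the body-avatar methods above (GauHuman, HuGS, Animatable Gaussians, 3DGS-Avatar) share the same core step: Gaussians live in a canonical space and are warped into the posed space by linear blend skinning, with the covariance rotated along. A minimal NumPy sketch under that reading (array layout and names are ours):

```python
import numpy as np

def lbs_warp_gaussians(means_c, rots_c, skin_weights, bone_transforms):
    """Warp canonical 3D Gaussians into posed space with linear blend skinning.

    means_c:         (N, 3)    canonical Gaussian centers
    rots_c:          (N, 3, 3) rotation factors of the canonical covariances
    skin_weights:    (N, B)    per-Gaussian skinning weights (rows sum to 1)
    bone_transforms: (B, 4, 4) rigid bone transforms for the target pose
    """
    # Blend the bone transforms per Gaussian: T_i = sum_b w_ib * G_b
    T = np.einsum('nb,bij->nij', skin_weights, bone_transforms)
    R, t = T[:, :3, :3], T[:, :3, 3]
    means_p = np.einsum('nij,nj->ni', R, means_c) + t
    # Since Σ = R_c S Sᵀ R_cᵀ transforms as Σ' = R Σ Rᵀ, rotating the
    # rotation factor is sufficient.
    return means_p, R @ rots_c
```

The refinement modules these papers add (pose correction, non-rigid offsets) sit on top of exactly this warp.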
12. HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Authors : Helisa Dhamo, Yinyu Nie, Arthur Moreau, Jifei Song, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero
Abstract
3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by the advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, the first model to use 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper we introduce a hybrid model that extends the explicit representation from 3DGS with a base of learnable latent features, which can be linearly blended with low-dimensional parameters from parametric head models to obtain expression-dependent final color and opacity values. We demonstrate that HeadGaS delivers state-of-the-art results in real-time inference frame rates, which surpasses baselines by up to ~2dB, while accelerating rendering speed by over x10. Paper
13. [CVPR '24] HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Authors : Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu
Abstract
We have recently seen tremendous progress in photo-real human modeling and rendering. Yet, efficiently rendering realistic human performance and integrating it into the rasterization pipeline remains challenging. In this paper, we present HiFi4G, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking, achieving a compact and compression-friendly representation. We first propose a dual-graph mechanism to obtain motion priors, with a coarse deformation graph for effective initialization and a fine-grained Gaussian graph to enforce subsequent constraints. Then, we utilize a 4D Gaussian optimization scheme with adaptive spatial-temporal regularizers to effectively balance the non-rigid prior and Gaussian updating. We also present a companion compression scheme with residual compensation for immersive experiences on various platforms. It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame. Extensive experiments demonstrate the effectiveness of our approach, which significantly outperforms existing approaches in terms of optimization speed, rendering quality, and storage overhead. Paper | Project Page | Short Presentation | Dataset
14. [CVPR '24] GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Authors : Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, Liqiang Nie
Abstract
We present GaussianAvatar, an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. We start by introducing animatable 3D Gaussians to explicitly represent humans in various poses and clothing styles. Such an explicit and animatable representation can fuse 3D appearances more efficiently and consistently from 2D observations. Our representation is further augmented with dynamic properties to support pose-dependent appearance modeling, where a dynamic appearance network along with an optimizable feature tensor is designed to learn the motion-to-appearance mapping. Moreover, by leveraging the differentiable motion condition, our method enables a joint optimization of motions and appearances during avatar modeling, which helps to tackle the long-standing issue of inaccurate motion estimation in monocular settings. The efficacy of GaussianAvatar is validated on both the public dataset and our collected dataset, demonstrating its superior performances in terms of appearance quality and rendering efficiency. Paper | Project Page | Code | Short Presentation
15. [CVPR '24] FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding
Authors : Jun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang
Abstract
We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that could reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn extra spatial offset to model non-surface regions and subtle facial details. While full use of geometric priors can capture high-frequency facial details and preserve exaggerated expressions, proper initialization can help reduce the number of Gaussians, thus enabling super-fast rendering speed. Extensive experimental results demonstrate that FlashAvatar outperforms existing works regarding visual quality and personalized details and is almost an order of magnitude faster in rendering speed. Paper | Project Page | Code
16. [CVPR '24] Relightable Gaussian Codec Avatars
Authors : Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam
Abstract
The fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with spatially all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit under both point light and continuous illumination. We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars. Paper | Project Page
17. MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar
Authors : Yufan Chen, Lizhen Wang, Qijing Li, Hongjiang Xiao, Shengping Zhang, Hongxun Yao, Yebin Liu
Abstract
The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent advancements in head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representation have been exploited for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the extensive quantity of points involved, and the last ones suffer from limitations in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that harnesses 3D Gaussian point representation coupled with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. These points exhibit movement with a Gaussian deformation field in alignment with the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity combined with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results among previous methods. Paper | Project Page | Code (not yet) | Short Presentation
18. [CVPR '24] ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Authors : Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann
Abstract
Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real-time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods. Paper | Project Page | Code (not yet) | Short Presentation
19. [CVPR '24] 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Authors : Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang
Abstract
We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively. Paper | Project Page | Code | Short Presentation
20. [CVPR '24] GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Authors : Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal
Abstract
Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (eg, flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (eg, colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution. Paper | Project Page | Short Presentation
21. Deformable 3D Gaussian Splatting for Animatable Human Avatars
Authors : HyunJun Jung, Nikolas Brasch, Jifei Song, Eduardo Perez-Pellitero, Yiren Zhou, Zhihao Li, Nassir Navab, Benjamin Busam
Abstract
Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: A first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatars learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually. Paper
22. Human101: Training 100+FPS Human Gaussians in 100s from 1 View
Authors : Mingwei Li, Jiachen Tao, Zongxin Yang, Yi Yang
Abstract
Reconstructing the human body from single-view videos plays a pivotal role in the virtual reality domain. One prevalent application scenario necessitates the rapid reconstruction of high-fidelity 3D digital humans while simultaneously ensuring real-time rendering and interaction. Existing methods often struggle to fulfill both requirements. In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS. Our method leverages the strengths of 3D Gaussian Splatting, which provides an explicit and efficient representation of 3D humans. Standing apart from prior NeRF-based pipelines, Human101 ingeniously applies a Human-centric Forward Gaussian Animation method to deform the parameters of 3D Gaussians, thereby enhancing rendering speed (ie, rendering 1024-resolution images at an impressive 60+ FPS and rendering 512-resolution images at 100+ FPS). Experimental results indicate that our approach substantially eclipses current methods, clocking up to a 10 times surge in frames per second and delivering comparable or superior rendering quality. Paper | Project Page | Code (not yet)
23. [CVPR '24] Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Authors : Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, Yebin Liu
Abstract
Creating high-fidelity 3D head avatars has always been a research hotspot, but there remains a great challenge under lightweight sparse view setups. In this paper, we propose Gaussian Head Avatar represented by controllable 3D Gaussians for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions. Paper | Project Page | Code | Short Presentation
24. HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
Authors : Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu
Abstract
Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat that predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis. Project page: https://humansplat.github.io/. Paper | Project Page
Classic work:
1. A Generalization of Algebraic Surface Drawing
Authors : James F. Blinn
Comment: First paper rendering 3D gaussians.
Abstract
The mathematical description of three-dimensional surfaces usually falls into one of two classifications: parametric and implicit. An implicit surface is defined to be all points which satisfy some equation F (x, y, z) = 0. This form is ideally suited for image space shaded picture drawing; the pixel coordinates are substituted for x and y, and the equation is solved for z. Algorithms for drawing such objects have been developed primarily for first- and second-order polynomial functions, a subcategory known as algebraic surfaces. This paper presents a new algorithm applicable to other functional forms, in particular to the summation of several Gaussian density distributions. The algorithm was created to model electron density maps of molecular structures, but it can be used for other artistically interesting shapes. Paper
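To make Blinn's construction concrete: the density field is a sum of isotropic Gaussians, and the drawn surface is a level set of that sum. A tiny sketch (parameter names are ours):

```python
import numpy as np

def blob_field(p, centers, heights, widths, threshold=1.0):
    """Blinn-style 'blobby' implicit field.

    F(p) = sum_i h_i * exp(-w_i * ||p - c_i||^2) - threshold
    The surface is the zero set F(p) = 0; ray marching or marching cubes
    can extract and shade it.
    """
    d2 = np.sum((centers - p) ** 2, axis=1)   # squared distances to centers
    return np.sum(heights * np.exp(-widths * d2)) - threshold

# Two overlapping blobs merge smoothly: the midpoint is still "inside".
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(blob_field(np.array([0.5, 0.0, 0.0]), centers,
                 heights=np.array([2.0, 2.0]),
                 widths=np.array([4.0, 4.0])) > 0)    # True
```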
2. Approximate Differentiable Rendering with Algebraic Surfaces
Authors : Leonid Keselman and Martial Hebert
Comment: First paper to do differentiable rendering optimization of 3D gaussians.
Abstract
Differentiable renderers provide a direct mathematical link between an object's 3D representation and images of that object. In this work, we develop an approximate differentiable renderer for a compact, interpretable representation, which we call Fuzzy Metaballs. Our approximate renderer focuses on rendering shapes via depth maps and silhouettes. It sacrifices fidelity for utility, producing fast runtimes and high-quality gradient information that can be used to solve vision tasks. Compared to mesh-based differentiable renderers, our method has forward passes that are 5x faster and backwards passes that are 30x faster. The depth maps and silhouette images generated by our method are smooth and defined everywhere. In our evaluation of differentiable renderers for pose estimation, we show that our method is the only one comparable to classic techniques. In shape from silhouette, our method performs well using only gradient descent and a per-pixel loss, without any surrogate losses or regularization. These reconstructions work well even on natural video sequences with segmentation artifacts. Paper | Project Page | Code | Short Presentation
3. Unbiased Gradient Estimation for Differentiable Surface Splatting via Poisson Sampling
Authors : Jan U. Müller, Michael Weinmann, Reinhard Klein
Comment: Builds 2D screen-space Gaussians from underlying 3D representations.
Abstract
We propose an efficient and GPU-accelerated sampling framework which enables unbiased gradient approximation for differentiable point cloud rendering based on surface splatting. Our framework models the contribution of a point to the rendered image as a probability distribution. We derive an unbiased approximative gradient for the rendering function within this model. To efficiently evaluate the proposed sample estimate, we introduce a tree-based data structure which employs multipole methods to draw samples in near linear time. Our gradient estimator allows us to avoid regularization required by previous methods, leading to a more faithful shape recovery from images. Furthermore, we validate that these improvements are applicable to real-world applications by refining the camera poses and point cloud obtained from a real-time SLAM system. Finally, employing our framework in a neural rendering setting optimizes both the point cloud and network parameters, highlighting the framework's ability to enhance data-driven approaches. Paper | Code
4. Generating and Real-Time Rendering of Clouds
Authors : Petr Man
Comment: Splatting of anisotropic Gaussians. Basically a non-differentiable implementation of 3DGS.
Abstract
This paper presents a method for generation and real-time rendering of static clouds. A Perlin noise function generates a three-dimensional map of a cloud. We also present a two-pass rendering algorithm that performs a physically based approximation. In the first, preprocessing phase, it computes multiple forward scattering. In the second phase, first-order anisotropic scattering is evaluated at runtime. The generated map is stored as voxels and is unsuitable for real-time rendering. We introduce a more suitable inner representation of the cloud that approximates the original map and contains much less information. The cloud is then represented by a set of metaballs (spheres) with parameters such as center positions, radii and density values. The main contribution of this paper is to propose a method that transforms the original cloud map to the inner representation. This method uses a Radial Basis Function (RBF) neural network. Paper
Compression:
2024:
1. [I3D '24] Reducing the Memory Footprint of 3D Gaussian Splatting
Authors : Panagiotis Papantonakis, Georgios Kopanas, Bernhard Kerbl, Alexandre Lanvin, George Drettakis
Abstract
3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and real-time rendering; unfortunately, the memory requirements of this method for storage and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and finally a codebook-based quantization method, together with a half-float representation for further memory reduction. Taken together, these three components result in a 27× reduction in overall size on disk on the standard datasets we tested, along with a 1.7× speedup in rendering speed. We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device (see Fig. 1). Paper | Project Page | Code (not yet)
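The third ingredient, codebook quantization combined with half-float storage, is easy to picture. Below is a minimal sketch assuming scikit-learn's KMeans as the codebook learner and a toy spherical-harmonics array; the paper's pruning and adaptive-SH components are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
sh_coeffs = rng.normal(size=(10_000, 45)).astype(np.float32)  # per-Gaussian SH

# Codebook quantization: store one small codebook plus one index per Gaussian.
kmeans = KMeans(n_clusters=256, n_init=4, random_state=0).fit(sh_coeffs)
codebook = kmeans.cluster_centers_.astype(np.float16)  # half-float storage
indices = kmeans.labels_.astype(np.uint8)              # 1 byte per Gaussian

raw_bytes = sh_coeffs.nbytes
compressed_bytes = codebook.nbytes + indices.nbytes
print(f"{raw_bytes / compressed_bytes:.1f}x smaller")
```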
2. [CVPR '24] Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Authors : Simon Niedermayr, Josef Stumpfegger, Rüdiger Westermann
Abstract
Recently, high-fidelity scene reconstruction with an optimized 3D Gaussian splat representation has been introduced for novel view synthesis from sparse image sets. Making such representations suitable for applications like network streaming and rendering on low-power devices requires significantly reduced memory consumption as well as improved rendering efficiency. We propose a compressed 3D Gaussian splat representation that utilizes sensitivity-aware vector clustering with quantization-aware training to compress directional colors and Gaussian parameters. The learned codebooks have low bitrates and achieve a compression rate of up to 31× on real-world scenes with only minimal degradation of visual quality. We demonstrate that the compressed splat representation can be efficiently rendered with hardware rasterization on lightweight GPUs at up to 4× higher framerates than reported via an optimized GPU compute pipeline. Extensive experiments across multiple datasets demonstrate the robustness and rendering speed of the proposed approach. Paper | Project Page | Code
3. HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Authors : Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin
Abstract
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial number of Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the first to explore context-based compression for the 3DGS representation, resulting in a remarkable size reduction of over 75× compared to vanilla 3DGS, while simultaneously improving fidelity, and achieving over 11× size reduction over the SOTA 3DGS compression approach Scaffold-GS. Paper | Project Page | Code
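The entropy-coding side of HAC hinges on predicting a Gaussian probability for each quantized attribute and paying -log2 P bits for it. A minimal sketch follows, assuming fixed (mu, sigma) predictions in place of the paper's hash-grid context model and a uniform quantization step.

```python
import numpy as np
from scipy.stats import norm

def estimated_bits(values, mu, sigma, step=0.1):
    """Bit cost of uniformly quantized values under a Gaussian probability
    model: P(q) = CDF(q + step/2) - CDF(q - step/2), bits = -log2 P."""
    q = np.round(values / step) * step
    p = norm.cdf(q + step / 2, mu, sigma) - norm.cdf(q - step / 2, mu, sigma)
    return -np.log2(np.maximum(p, 1e-12)).sum()

rng = np.random.default_rng(0)
attrs = rng.normal(loc=0.2, scale=0.5, size=10_000)
# A context model whose (mu, sigma) match the data yields fewer bits.
print(estimated_bits(attrs, mu=0.2, sigma=0.5))
print(estimated_bits(attrs, mu=0.0, sigma=2.0))  # poorer model, more bits
```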
4. [ECCV '24] End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Authors : Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen
Abstract
3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible and continuous rate control. RDO-Gaussian addresses two main issues that exist in current schemes: 1) Different from prior endeavors that minimize the rate under fixed distortion, we introduce dynamic pruning and entropy-constrained vector quantization (ECVQ) that optimize the rate and distortion at the same time. 2) Previous works treat the colors of each Gaussian equally, while we model the colors of different regions and materials with learnable numbers of parameters. We verify our method on both real and synthetic scenes, showcasing that RDO-Gaussian greatly reduces the size of 3D Gaussians by over 40×, and surpasses existing methods in rate-distortion performance. Paper | Project Page | Code
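The end-to-end RDO formulation boils down to minimizing distortion plus a Lagrange-weighted rate term, with the weight lambda tracing out a continuous rate-distortion curve. A minimal sketch, assuming a differentiable rate proxy and an L1 image loss; the paper's ECVQ and dynamic pruning machinery is omitted.

```python
import torch
import torch.nn.functional as F

def rd_loss(rendered, target, rate_bits, lam=1e-4):
    """End-to-end rate-distortion objective: distortion + lambda * rate.
    Sweeping `lam` moves along the rate-distortion curve."""
    distortion = F.l1_loss(rendered, target)
    return distortion + lam * rate_bits

rendered = torch.rand(3, 64, 64, requires_grad=True)   # stand-in for a render
target = torch.rand(3, 64, 64)
rate_bits = torch.tensor(12345.0, requires_grad=True)  # stand-in rate estimate
loss = rd_loss(rendered, target, rate_bits)
loss.backward()  # gradients flow to both the scene and the rate model
```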
5. 3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods
Authors : Milena T. Bagdasarian, Paul Knoll, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern
Abstract
We present a work-in-progress survey on 3D Gaussian Splatting compression methods, focusing on their statistical performance across various benchmarks. This survey aims to facilitate comparability by summarizing key statistics of different compression approaches in a tabulated format. The datasets evaluated include TanksAndTemples, MipNeRF360, DeepBlending, and SyntheticNeRF. For each method, we report the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and the resultant size in megabytes (MB), as provided by the respective authors. This is an ongoing, open project, and we invite contributions from the research community as GitHub issues or pull requests. Please visit http://wm.github.io/3dgs-compression-survey/ for more information and a sortable version of the table. Paper | Project Page
6. LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming
Authors : Yuang Shi, Simone Gasparini, Géraldine Morin, Wei Tsang Ooi
Abstract
The rise of Extended Reality (XR) requires efficient streaming of 3D online worlds, challenging current 3DGS representations to adapt to bandwidth-constrained environments. We propose LapisGS, a layered 3DGS that supports adaptive streaming and progressive rendering. Our method constructs a layered structure for cumulative representation, incorporates dynamic opacity optimization to maintain visual fidelity, and utilizes occupancy maps to efficiently manage Gaussian splats. This proposed model offers a progressive representation supporting a continuous rendering quality adapted for bandwidth-aware streaming. Extensive experiments validate the effectiveness of our approach in balancing visual fidelity with the compactness of the model, with up to 50.71% improvement in SSIM, 286.53% improvement in LPIPS, and 318.41% reduction in model size, and show its potential for bandwidth-adapted 3D streaming and rendering applications. Paper | Project Page
7. Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation
Authors : Minye Wu, Tinne Tuytelaars
Abstract
Recent advancements in photo-realistic novel view synthesis have been significantly driven by Gaussian Splatting (3DGS). Nevertheless, the explicit nature of 3DGS data entails considerable storage requirements, highlighting a pressing need for more efficient data representations. To address this, we present Implicit Gaussian Splatting (IGS), an innovative hybrid model that integrates explicit point clouds with implicit feature embeddings through a multi-level tri-plane architecture. This architecture features 2D feature grids at various resolutions across different levels, facilitating continuous spatial domain representation and enhancing spatial correlations among Gaussian primitives. Building upon this foundation, we introduce a level-based progressive training scheme, which incorporates explicit spatial regularization. This method capitalizes on spatial correlations to enhance both the rendering quality and the compactness of the IGS representation. Furthermore, we propose a novel compression pipeline tailored for both point clouds and 2D feature grids, considering the entropy variations across different levels. Extensive experimental evaluations demonstrate that our algorithm can deliver high-quality rendering using only a few MBs, effectively balancing storage efficiency and rendering fidelity, and yielding results that are competitive with the state-of-the-art. Paper | Code
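The tri-plane lookup at the heart of such hybrid models projects a 3D query onto three axis-aligned feature planes and sums the bilinear samples. Below is a minimal single-level sketch using PyTorch's grid_sample; the paper's multi-level design and entropy-aware compression are omitted, and all shapes are toy values.

```python
import torch
import torch.nn.functional as F

planes = torch.randn(3, 1, 16, 64, 64)  # XY, XZ, YZ feature grids (C = 16)

def triplane_features(xyz):
    """Sample and sum features from the three axis-aligned planes;
    xyz is assumed to be normalized to [-1, 1]."""
    coords = [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]]
    feats = []
    for plane, uv in zip(planes, coords):
        grid = uv.view(1, -1, 1, 2)                         # (1, N, 1, 2)
        f = F.grid_sample(plane, grid, align_corners=True)  # (1, C, N, 1)
        feats.append(f[0, :, :, 0].T)                       # (N, C)
    return sum(feats)  # one (N, 16) feature vector per query point

pts = torch.rand(5, 3) * 2 - 1
print(triplane_features(pts).shape)
```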
2023:
1. LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS
Authors : Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
Abstract
Recent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting come with a substantial storage overhead caused by growing the SfM points to millions, often demanding gigabyte-level disk space for a single unbounded scene, posing significant scalability challenges and hindering the splatting efficiency. To address this challenge, we introduce LightGaussian, a novel method designed to transform 3D Gaussians into a more efficient and compact format. Drawing inspiration from the concept of Network Pruning, LightGaussian identifies Gaussians that are insignificant in contributing to the scene reconstruction and adopts a pruning and recovery process, effectively reducing redundancy in Gaussian counts while preserving visual effects. Additionally, LightGaussian employs distillation and pseudo-view augmentation to distill spherical harmonics to a lower degree, allowing knowledge transfer to more compact representations while maintaining reflectance. Furthermore, we propose a hybrid scheme, VecTree Quantization, to quantize all attributes, resulting in lower bitwidth representations with minimal accuracy losses. In summary, LightGaussian achieves an averaged compression rate over 15x while boosting the FPS from 139 to 215, enabling an efficient representation of complex scenes on the Mip-NeRF 360 and Tanks and Temples datasets. Paper | Project Page | Code | Short Presentation
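The pruning step can be pictured as ranking Gaussians by a global significance score and keeping only the top fraction. A minimal sketch, where opacity times volume stands in for the paper's actual ray-contribution statistic:

```python
import numpy as np

def prune_gaussians(opacity, scales, keep_ratio=0.34):
    """Rank Gaussians by a crude global significance score and keep the top
    fraction, mimicking the prune-and-recover idea."""
    significance = opacity * np.prod(scales, axis=1)  # opacity x volume proxy
    k = int(len(opacity) * keep_ratio)
    return np.argsort(significance)[-k:]

rng = np.random.default_rng(0)
opacity = rng.uniform(size=100_000)
scales = rng.uniform(0.01, 0.1, size=(100_000, 3))
keep = prune_gaussians(opacity, scales)
print(f"kept {len(keep)} of {len(opacity)} Gaussians")
```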
2. Compact3D: Compressing Gaussian Splat Radiance Field Models with Vector Quantization
Authors : KL Navaneet, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash
Abstract
3D Gaussian Splatting is a new method for modeling and rendering 3D radiance fields that achieves much faster learning and rendering time compared to SOTA NeRF methods. However, it comes with a drawback in the much larger storage demand compared to NeRF methods since it needs to store the parameters for several 3D Gaussians. We notice that many Gaussians may share similar parameters, so we introduce a simple vector quantization method based on the k-means algorithm to quantize the Gaussian parameters. Then, we store the small codebook along with the index of the code for each Gaussian. Moreover, we compress the indices further by sorting them and using a method similar to run-length encoding. We do extensive experiments on standard benchmarks as well as a new benchmark which is an order of magnitude larger than the standard benchmarks. We show that our simple yet effective method can reduce the storage cost for the original 3D Gaussian Splatting method by a factor of almost 20× with a very small drop in the quality of rendered images. Paper | Code
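The index-compression step is straightforward to sketch: after k-means assigns each Gaussian a codebook index, sorting groups identical codes into long runs that run-length encoding stores cheaply. A minimal illustration, which may differ in detail from the paper's exact scheme:

```python
import numpy as np

def run_length_encode(indices):
    """Sort codebook indices, then store (value, run_length) pairs.
    Sorting groups identical codes so runs become long and cheap.
    The permutation must be stored too (or the Gaussians reordered)."""
    order = np.argsort(indices, kind="stable")
    s = indices[order]
    boundaries = np.flatnonzero(np.diff(s)) + 1
    starts = np.concatenate(([0], boundaries))
    values = s[starts]
    lengths = np.diff(np.concatenate((starts, [len(s)])))
    return values, lengths, order

rng = np.random.default_rng(0)
indices = rng.integers(0, 4096, size=1_000_000)
values, lengths, order = run_length_encode(indices)
print(len(values), "runs for", len(indices), "Gaussians")
```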
3. [CVPR '24] Compact 3D Gaussian Representation for Radiance Field
Authors : Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park
Abstract
Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussians by vector quantization. In our extensive experiments, we consistently show over 10× reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Paper | Project Page | Code
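The learnable mask strategy can be approximated with a per-Gaussian logit, a straight-through binary mask, and a sparsity penalty. A minimal sketch with illustrative names, not the paper's API:

```python
import torch

mask_logits = torch.nn.Parameter(torch.ones(100_000))  # one logit per Gaussian

def hard_mask(logits):
    """Binary keep/drop mask with straight-through gradients."""
    soft = torch.sigmoid(logits)
    hard = (soft > 0.5).float()
    return hard + soft - soft.detach()  # forward: hard, backward: soft

m = hard_mask(mask_logits)
opacity = torch.rand(100_000)
render_proxy = (m * opacity).mean()           # stand-in for the rendering loss
sparsity = torch.sigmoid(mask_logits).mean()  # pushes the Gaussian count down
loss = render_proxy + 0.01 * sparsity
loss.backward()
print(int(m.sum().item()), "Gaussians kept")
```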
4. [ECCV '24] Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Authors : Wieland Morgenstern, Florian Barthel, Anna Hilsmann, Peter Eisert
Abstract
3D Gaussian Splatting has recently emerged as a highly promising technique for modeling of static 3D scenes. In contrast to Neural Radiance Fields, it utilizes efficient rasterization allowing for very fast rendering at high quality. However, the storage size is significantly higher, which hinders practical deployment, e.g., on resource-constrained devices. In this paper, we introduce a compact scene representation organizing the parameters of 3D Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a drastic reduction in storage requirements without compromising visual quality during rendering. Central to our idea is the explicit exploitation of perceptual redundancies present in natural scenes. In essence, the inherent nature of a scene allows for numerous permutations of Gaussian parameters to equivalently represent it. To this end, we propose a novel highly parallel algorithm that regularly arranges the high-dimensional Gaussian parameters into a 2D grid while preserving their neighborhood structure. During training, we further enforce local smoothness between the sorted parameters in the grid. The uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless integration with established renderers. Our method achieves a reduction factor of 17x to 42x in size for complex scenes with no increase in training time, marking a substantial leap forward in the domain of 3D scene distribution and consumption. Paper | Project Page | Code
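Once the parameters are sorted into a 2D grid, the training-time smoothness term simply penalizes differences between grid neighbors. A minimal sketch, assuming the grid is already arranged; the parallel sorting algorithm itself is not reproduced:

```python
import torch

def local_smoothness(grid):
    """Penalize differences between 2D-grid neighbors so nearby cells hold
    similar Gaussian parameters, which makes the grid compress well."""
    dx = (grid[:, 1:, :] - grid[:, :-1, :]).abs().mean()
    dy = (grid[1:, :, :] - grid[:-1, :, :]).abs().mean()
    return dx + dy

grid = torch.rand(256, 256, 59, requires_grad=True)  # 59 attributes per cell
loss = local_smoothness(grid)
loss.backward()
print(float(loss))
```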
Diffusion:
2024:
1. AGG: Amortized Generative 3D Gaussians for Single Image to 3D
Authors : Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat
Abstract
Given the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to its superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation. 3D Gaussian splatting approaches for image to 3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster. Paper | Project Page | Short Presentation
2. Fast Dynamic 3D Object Generation from a Single-view Video
Authors : Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang
Abstract
Generating dynamic three-dimensional (3D) object from a single-view video is challenging due to the lack of 4D labeled data. Existing methods extend text-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling, but they are slow and expensive to scale (e.g., 150 minutes per object) due to the need for back-propagating the information-limited supervision signals through a large pretrained model. To address this limitation, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data to directly train a novel 4D Gaussian splatting model with explicit point cloud geometry, enabling real-time rendering under continuous camera trajectories. Extensive experiments on synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed when compared to prior art alternatives while preserving the same level of novel view synthesis quality. For example, Efficient4D takes only 14 minutes to model a dynamic object. Paper | Project Page | Code | Short Presentation
3. GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting
Authors : Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods. Paper | Project Page | Code | Short Presentation
4. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Authors : Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
Abstract
3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-view Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: (1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. (2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation. Paper | Project Page | Code
5. GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Authors : Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Abstract
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an object-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. Paper | Project Page | Code (not yet)
6. IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
Authors : Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
Abstract
Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In this paper, we further explore the design space of text-to-3D models. We significantly improve multi-view generation by considering video instead of image generators. Combined with a 3D reconstruction algorithm which, by using Gaussian splatting, can optimize a robust image-based loss, we directly produce high-quality 3D outputs from the generated views. Our new method, IM-3D, reduces the number of evaluations of the 2D generator network 10-100x, resulting in a much more efficient pipeline, better quality, fewer geometric inconsistencies, and higher yield of usable 3D assets. Paper
7. Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
Authors : Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
Abstract
While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce a multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. 2) We propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and the score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content. Paper | Project Page | Code
8. Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph
Authors : Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao
Abstract
Text-to-3D generation represents an exciting field that has seen rapid advancements, facilitating the transformation of textual descriptions into detailed 3D models. However, current progress often neglects the intricate high-order correlation of geometry and texture within 3D objects, leading to challenges such as over-smoothness, over-saturation and the Janus problem. In this work, we propose a method named "3D Gaussian Generation via Hypergraph (Hyper-3DG)", designed to capture the sophisticated high-order correlations present within 3D objects. Our framework is anchored by a well-established mainflow and an essential module, named "Geometry and Texture Hypergraph Refiner (HGRefiner)". This module not only refines the representation of 3D Gaussians but also accelerates the update process of these 3D Gaussians by conducting the Patch-3DGS Hypergraph Learning on both explicit attributes and latent visual features. Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation. Extensive experimentation has shown that our proposed method significantly enhances the quality of 3D generation while incurring no additional computational overhead for the underlying framework. Paper | Code (not yet)
9. DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Authors : Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik Hang Lee, Pengyuan Zhou
Abstract
Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors, increasingly capturing the attention of both academic and industry circles. Despite significant progress, current methods still struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework that leverages Formation Pattern Sampling (FPS) for core structuring, augmented with a strategic camera sampling and supported by holistic object-environment integration to overcome these hurdles. FPS, guided by the formation patterns of 3D objects, employs multi-timestep sampling to quickly form semantically rich, high-quality representations, uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. The camera sampling strategy incorporates a progressive three-stage approach, specifically designed for both indoor and outdoor settings, to effectively ensure scene-wide 3D consistency. DreamScene enhances scene editing flexibility by combining objects and environments, enabling targeted adjustments. Extensive experiments showcase DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Paper | Project Page | Code (not yet)
10. FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
Authors : Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
Abstract
Reconstructing detailed 3D objects from single-view images remains a challenging task due to the limited information available. In this paper, we introduce FDGaussian, a novel two-stage framework for single-image 3D reconstruction. Recent methods typically utilize pre-trained 2D diffusion models to generate plausible novel views from the input image, yet they encounter issues with either multi-view inconsistency or lack of geometric fidelity. To overcome these challenges, we propose an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, enabling the generation of consistent multi-view images. Moreover, we further accelerate state-of-the-art Gaussian Splatting by incorporating epipolar attention to fuse images from different viewpoints. We demonstrate that FDGaussian generates images with high consistency across different views and reconstructs high-quality 3D objects, both qualitatively and quantitatively. Paper | Project Page
11. BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors
Authors : Tingyang Zhang, Qingzhe Gao, Weiyu Li, Libin Liu, Baoquan Chen
Abstract
Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcrafted creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for training and rendering. This limitation restricts the practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present a rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating its superior performance compared to the current state-of-the-art methods. Paper
12. BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis
Authors : Lutao Jiang, Lin Wang
Abstract
Text-to-3D synthesis has recently seen intriguing advances by combining the text-to-image models with 3D representation methods, e.g., Gaussian Splatting (GS), via Score Distillation Sampling (SDS). However, a hurdle of existing methods is their low efficiency: per-prompt optimization for a single 3D object. Therefore, it is imperative for a paradigm shift from per-prompt optimization to one-stage generation for any unseen text prompts, which yet remains challenging. A hurdle is how to directly generate a set of millions of 3D Gaussians to represent a 3D object. This paper presents BrightDreamer, an end-to-end single-stage approach that can achieve generalizable and fast (77 ms) text-to-3D generation. Our key idea is to formulate the generation process as estimating the 3D deformation from an anchor shape with predefined positions. For this, we first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions, used as the centers (one attribute) of 3D Gaussians. To estimate the other four attributes (i.e., scaling, rotation, opacity, and SH coefficient), we then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object. The center of each Gaussian enables us to transform the triplane feature into the four attributes. The generated 3D Gaussians can be finally rendered at 705 frames per second. Extensive experiments demonstrate the superiority of our method over existing methods. Also, BrightDreamer possesses a strong semantic understanding capability even for complex text prompts. Paper | Project Page | Code
13. GVGEN: Text-to-3D Generation with Volumetric Representation
Authors : Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He
Abstract
In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address the shortcomings of existing methods, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques: (1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points into a structured form, GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed (∼7 seconds), effectively striking a balance between quality and efficiency. Paper | Project Page | Code (not yet)
14. SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
Authors : Jaihoon Kim, Juil Koo, Kyeongmin Yeo, Minhyuk Sung
Abstract
We introduce a general framework for generating diverse visual content, including ambiguous images, panorama images, mesh textures, and Gaussian splat textures, by synchronizing multiple diffusion processes. We present an exhaustive investigation into all possible scenarios for synchronizing multiple diffusion processes through a canonical space and analyze their characteristics across applications. In doing so, we reveal a previously unexplored case: averaging the outputs of Tweedie's formula while conducting denoising in multiple instance spaces. This case also provides the best quality with the widest applicability to downstream tasks. We name this case SyncTweedies. In our experiments generating the visual content aforementioned, we demonstrate the superior quality of generation by SyncTweedies compared to other synchronization methods, optimization-based and iterative-update-based methods. Paper | Project Page | Code (not yet)
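The case the paper singles out, averaging Tweedie's formula outputs across instance spaces, is compact enough to sketch. Below is a toy version assuming an identity mapping between instance and canonical spaces and stub epsilon predictions in place of a trained diffusion model:

```python
import torch

def tweedie_x0(x_t, eps, alpha_bar_t):
    """Tweedie's formula: posterior-mean estimate of the clean sample."""
    return (x_t - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)

def synced_denoise_step(x_ts, eps_preds, alpha_bar_t):
    """Denoise several instance-space views, then synchronize by averaging
    their Tweedie outputs in the shared canonical space."""
    x0s = [tweedie_x0(x, e, alpha_bar_t) for x, e in zip(x_ts, eps_preds)]
    return torch.stack(x0s).mean(dim=0)  # one consistent canonical estimate

alpha_bar_t = torch.tensor(0.7)
x_ts = [torch.randn(3, 32, 32) for _ in range(4)]       # 4 instance views
eps_preds = [torch.randn(3, 32, 32) for _ in range(4)]  # stub network outputs
print(synced_denoise_step(x_ts, eps_preds, alpha_bar_t).shape)
```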
15. STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Authors : Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao
Abstract
Recent progress in pre-trained diffusion models and 3D generation have spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from 3D generation techniques, we utilize a multi-view diffusion model to initialize multi-view images anchoring on the input video frames, where the video can be either real-world captured or generated by a video diffusion model. To ensure the temporal consistency of the multi-view sequence initialization, we introduce a simple yet effective fusion strategy to leverage the first frame as a temporal anchor in the self-attention computation. With the almost consistent multi-view sequences, we then apply the score distillation sampling to optimize the 4D Gaussian point cloud. The 4D Gaussian splatting is specially crafted for the generation task, where an adaptive densification strategy is proposed to mitigate the unstable Gaussian gradient for robust optimization. Notably, the proposed pipeline does not require any pre-training or fine-tuning of diffusion networks, offering a more accessible and practical solution for the 4D generation task. Extensive experiments demonstrate that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness, setting a new state-of-the-art for 4D generation from diverse inputs, including text, image, and video. Paper | Project Page | Code | Short Presentation
16. Comp4D: LLM-Guided Compositional 4D Scene Generation
Authors : Dejia Xu, Hanwen Liang, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Plataniotis, Zhangyang Wang
Abstract
Recent advancements in diffusion models for 2D and 3D content creation have sparked a surge of interest in generating 4D content. However, the scarcity of 3D scene datasets constrains current methodologies to primarily object-centric generation. To overcome this limitation, we present Comp4D, a novel framework for Compositional 4D Generation. Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately. Utilizing Large Language Models (LLMs), the framework begins by decomposing an input text prompt into distinct entities and maps out their trajectories. It then constructs the compositional 4D scene by accurately positioning these objects along their designated paths. To refine the scene, our method employs a compositional score distillation technique guided by the pre-defined trajectories, utilizing pre-trained diffusion models across text-to-image, text-to-video, and text-to-3D domains. Extensive experiments demonstrate our outstanding 4D content creation capability compared to prior arts, showcasing superior visual quality, motion fidelity, and enhanced object interactions. Paper | Project Page | Code (not yet) | Short Presentation
17. DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Authors : Yuanze Lin, Ronald Clark, Philip Torr
Abstract
We present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods has been promising, prevailing methods often fail to ensure view-consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet-driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects, aligning closely with the semantics of the textual instructions. Paper | Project Page | Code (not yet)
18. SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Authors : Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai
Abstract
Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate the shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions. Paper | Project Page | Code (not yet) | Short Presentation
19. Hash3D: Training-free Acceleration for 3D Generation
Authors : Xingyi Yang, Xinchao Wang
Abstract
The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models. Despite this progress, the cumbersome optimization process per se presents a critical hurdle to efficiency. In this paper, we introduce Hash3D, a universal acceleration for 3D generation without model training. Central to Hash3D is the insight that feature-map redundancy is prevalent in images rendered from camera positions and diffusion time-steps in close proximity. By effectively hashing and reusing these feature maps across neighboring timesteps and camera angles, Hash3D substantially prevents redundant calculations, thus accelerating the diffusion model's inference in 3D generation tasks. We achieve this through an adaptive grid-based hashing. Surprisingly, this feature-sharing mechanism not only speeds up generation but also enhances the smoothness and view consistency of the synthesized 3D objects. Our experiments, covering 5 text-to-3D and 3 image-to-3D models, demonstrate Hash3D's versatility to speed up optimization, enhancing efficiency by 1.3 to 4 times. Additionally, Hash3D's integration with 3D Gaussian splatting largely speeds up 3D model creation, reducing text-to-3D processing to about 10 minutes and image-to-3D conversion to roughly 30 seconds. Paper | Project Page | Code
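The adaptive grid-based hashing amounts to bucketing nearby camera angles and diffusion timesteps into one cell and reusing the first feature map computed for that cell. A minimal sketch, with a stub feature function standing in for the diffusion model's feature maps; all names and bin sizes are illustrative:

```python
import math

cache = {}

def grid_key(azimuth, elevation, timestep, view_bin=10.0, t_bin=50):
    """Bucket nearby camera angles and diffusion timesteps into one cell."""
    return (math.floor(azimuth / view_bin),
            math.floor(elevation / view_bin),
            timestep // t_bin)

def features(azimuth, elevation, timestep, compute):
    key = grid_key(azimuth, elevation, timestep)
    if key not in cache:                    # only compute once per cell
        cache[key] = compute(azimuth, elevation, timestep)
    return cache[key]

expensive_calls = 0
def compute(*args):
    global expensive_calls
    expensive_calls += 1
    return sum(args)  # stand-in for a U-Net feature map

for az in range(0, 360, 2):                 # 180 nearby queries
    features(az, 15.0, timestep=400, compute=compute)
print("expensive calls:", expensive_calls)  # far fewer than 180
```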
20. Zero-shot Point Cloud Completion Via 2D Priors
Authors : Tianxin Huang, Zhiwen Yan, Yuyang Zhao, Gim Hee Lee
Abstract
3D point cloud completion is designed to recover complete shapes from partially observed point clouds. Conventional completion methods typically depend on extensive point cloud data for training, with their effectiveness often constrained to object categories similar to those seen during training. In contrast, we propose a zero-shot framework aimed at completing partially observed point clouds across any unseen categories. Leveraging point rendering via Gaussian Splatting, we develop techniques of Point Cloud Colorization and Zero-shot Fractal Completion that utilize 2D priors from pre-trained diffusion models to infer missing regions. Experimental results on both synthetic and real-world scanned point clouds demonstrate that our approach outperforms existing methods in completing a variety of objects without any requirement for specific training data. Paper
21. [ECCV '24] DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Authors : Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360° scene generation pipeline that facilitates the creation of comprehensive 360° scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360° perspective, providing an enhanced immersive experience over existing techniques. Paper | Project Page | Code | Short Presentation
22. RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
Authors : Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
Abstract
We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image. Paper | Project Page | Code (not yet)
23. GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
Authors : Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo
Abstract
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. Existing radiance representations either require an implicit feature decoder, which significantly degrades the modeling power of the representation, or are spatially unstructured, making them difficult to integrate with mainstream 3D diffusion methods. We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting using a fixed number of free Gaussians, and then rearranging these Gaussians into a predefined voxel grid via Optimal Transport. Since GaussianCube is a structured grid representation, it allows us to use a standard 3D U-Net as our backbone in diffusion modeling without elaborate designs. More importantly, the high-accuracy fitting of the Gaussians allows us to achieve a high-quality representation with one to two orders of magnitude fewer parameters than previous structured representations of comparable quality. The compactness of GaussianCube greatly eases the difficulty of 3D generative modeling. Extensive experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D synthesis all show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a highly accurate and versatile radiance representation for 3D generative modeling. Paper | Project Page | Code
24. 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Authors : Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee
Abstract
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal deformation based on the canonical representation to capture dynamic interactions in the reference video. The pipeline facilitates the generation of dynamic scenes with enhanced photorealism and structural integrity, viewable from multiple perspectives, thereby setting a new standard in 4D scene generation. Paper | Project Page | Code (not yet)
2023:
1. [CVPR '24] Text-to-3D using Gaussian Splatting
Authors : Zilong Chen, Feng Wang, Huaping Liu
Abstract
In this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of 3D prior and proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting the explicit nature that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. Paper | Project Page | Code | Short Presentation | Explanation Video
2. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Authors : Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng
Abstract
Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods. Paper | Project Page | Code | Explanation Video
3. GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Authors : Taoran Yi, Jiemin Fang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang
Abstract
In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but the 3D consistency is hard to guarantee. This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation. A fast 3D generation framework, named GaussianDreamer, is proposed, where the 3D diffusion model provides point cloud priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our GaussianDreamer can generate a high-quality 3D instance within 25 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time. Paper | Project Page | Code
4. GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise
Authors : Xinhai Li, Huaibin Wang, Kuo-Kun Tseng
Abstract
Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of NeRF and 2D diffusion models frequently yields oversaturated images, posing severe limitations on downstream industrial applications due to the constraints of the pixel-wise rendering method. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text-to-3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We ingeniously devise an efficient method to generate noise that produces Gaussian noise from diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes. Paper
5. [CVPR '24] LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Authors : Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen
Abstract
The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While recent advancements in text-to-3D generation have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). This paper identifies a notable deficiency in SDS, that it brings inconsistent and low-quality updating direction for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency. Paper | Code
6. LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Authors : Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee
Abstract
With the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily due to their training strategies using 3D scan datasets that are far from the real world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by fully leveraging the power of existing large-scale diffusion-based generative model. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of point cloud to the desired view and provide the projection as guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly-detailed compared to the previous 3D scene generation methods, with no constraint on domain of the target scene. Paper | Project Page | Code
7. [CVPR '24] HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Authors : Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu
Abstract
Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. The floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios. Paper | Project Page | Code | Short Presentation
8. CG3D: Compositional Generation for Text-to-3D
Authors : Alexander Vilesov, Pradyumna Chari, Achuta Kadambi
Abstract
With the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi-object scenes, (ii) inability to textually control multi-object configurations, and (iii) difficulty in achieving physically realistic scene composition. In this work, we propose CG3D, a method for compositionally generating scalable 3D assets that resolves these constraints. We find that explicit Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes. By utilizing a guidance framework built around this explicit representation, we show state of the art results, capable of even exceeding the guiding diffusion model in terms of object combinations and physics accuracy. Paper | Project Page | Short Presentation
9. Learn to Optimize Denoising Scores for 3D Generation - A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting
Authors : Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu and Guosheng Lin
Abstract
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-caliber results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. Leveraging the different learnable parameters of the diffusion prior, our approach offers multiple configurations, affording various trade-offs between performance and implementation complexity. Notably, our experimental results demonstrate that our method markedly surpasses existing techniques, establishing new state-of-the-art in the realm of text-to-3D generation. Furthermore, our approach exhibits impressive performance on both NeRF and the newly introduced 3D Gaussian Splatting backbones. Additionally, our framework yields insightful contributions to the understanding of recent score distillation methods, such as the VSD and DDS loss. Paper | Project Page | Code
10. [CVPR '24] Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Authors : Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. Paper | Project Page
11. DreamGaussian4D: Generative 4D Gaussian Splatting
Authors : Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu
Abstract
Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting compared with implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines. Paper | Project Page | Code
12. 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
Authors : Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei
Abstract
Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs, they incur significant time and effort in prompt engineering through trial and error. This work introduces 4DGen, a novel, holistic framework for grounded 4D content generation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content. Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos), thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution supervision through rendering during training, thereby facilitating high-quality 4D generation. Additionally, we employ spatial-temporal pseudo labels on anchor frames, along with seamless consistency priors implemented through 3D-aware score distillation sampling and smoothness regularizations. Compared to existing baselines, our approach yields competitive results in faithfully reconstructing input signals and realistically inferring renderings from novel viewpoints and timesteps. Most importantly, our method supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods. Paper | Project Page | Code | Short Presentation
13. Text2Immersion: Generative Immersive Scene with 3D Gaussian
Authors : Hao Ouyang, Kathryn Heal, Stephen Lombardi, Tiancheng Sun
Abstract
We introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts. Our proposed pipeline initiates by progressively generating a Gaussian cloud using pre-trained 2D diffusion and depth estimation models. This is followed by a refining stage on the Gaussian cloud, interpolating and refining it to enhance the details of the generated scene. Distinct from prevalent methods that focus on single object or indoor scenes, or employ zoom-out trajectories, our approach generates diverse scenes with various objects, even extending to the creation of imaginary scenes. Consequently, Text2Immersion can have wide-ranging implications for various applications such as virtual reality, game development, and automated content creation. Extensive evaluations demonstrate that our system surpasses other methods in rendering quality and diversity, further progressing towards text-driven 3D scene generation. Paper | Project Page | Code (not yet)
14. Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
Authors : Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
Abstract
Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the slow generation speed. To address these deficiencies, we present Repaint123 to alleviate multi-view bias as well as texture degradation and speed up the generation process. The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency. We further propose visibility-aware adaptive repainting strength for overlap regions to enhance the generated image quality in the repainting process. The generated high-quality and multi-view consistent images enable the use of simple Mean Square Error (MSE) loss for fast 3D content generation. We conduct extensive experiments and show that our method has a superior ability to generate high-quality 3D content with multi-view consistency and fine textures in 2 minutes from scratch. Paper | Project Page | Code (not yet)
Dynamics and Deformation:
2024:
1. 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
Authors : Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen
Abstract
We consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or capturing high-fidelity renderings. In this paper, we introduce 4D Gaussian Splatting (4DGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally compose dynamic 3D Gaussians and can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DGS demonstrates powerful capabilities for modeling complicated dynamics and fine details, especially for scenes with abrupt motions. We further implement our temporal slicing and splatting techniques in a highly optimized CUDA acceleration framework, achieving real-time inference rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DGS, which consistently outperforms existing methods both quantitatively and qualitatively. Paper
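The temporal slicing described above has a clean closed form: conditioning a 4D XYZT Gaussian on a time t yields a 3D Gaussian whose mean and covariance shift with t, while the marginal density in t attenuates its opacity. Below is a minimal NumPy sketch of that math, illustrative only and not the authors' CUDA implementation; all shapes and values are assumptions.

```python
import numpy as np

def slice_4d_gaussian(mu, cov, opacity, t):
    """mu: (4,) XYZT mean; cov: (4,4) covariance; returns 3D Gaussian params at time t."""
    mu_x, mu_t = mu[:3], mu[3]
    cov_xx = cov[:3, :3]          # spatial block
    cov_xt = cov[:3, 3]           # space-time cross terms
    cov_tt = cov[3, 3]            # temporal variance
    # Standard Gaussian conditioning: parameters of p(xyz | t)
    mu_cond = mu_x + cov_xt * (t - mu_t) / cov_tt
    cov_cond = cov_xx - np.outer(cov_xt, cov_xt) / cov_tt
    # The marginal density in t attenuates the Gaussian's opacity away from mu_t
    opacity_t = opacity * np.exp(-0.5 * (t - mu_t) ** 2 / cov_tt)
    return mu_cond, cov_cond, opacity_t

mu = np.array([0.0, 0.0, 0.0, 0.5])
cov = np.diag([0.1, 0.1, 0.1, 0.05])
cov[0, 3] = cov[3, 0] = 0.01      # space-time correlation: x drifts with time
print(slice_4d_gaussian(mu, cov, 1.0, t=0.6))
```

The nonzero space-time covariance entries are what make the sliced 3D Gaussian move as t advances, which is how a single 4D primitive represents motion.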
2. GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Authors : Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann
Abstract
Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard for existing methods to handle. The common color drifting issue that happens in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality in extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. Paper | Project Page | Code (not yet) | Short Presentation
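A hedged sketch of the core supervision idea: composite each Gaussian's 2D displacement between consecutive frames with the same per-pixel weights used for color, then compare the result against precomputed optical flow. The dense `alpha_weights` tensor is a hypothetical stand-in for the per-pixel contribution weights a rasterizer would provide.

```python
import torch

def gaussian_flow_loss(xy_t, xy_t1, alpha_weights, optical_flow):
    """
    xy_t, xy_t1:   (N, 2) projected Gaussian centers at frames t and t+1
    alpha_weights: (H, W, N) per-pixel compositing weights (alpha * transmittance)
    optical_flow:  (H, W, 2) flow from a pretrained estimator (e.g. RAFT)
    """
    displacement = xy_t1 - xy_t                                # (N, 2) per-Gaussian motion
    # Splat Gaussian dynamics into image space with the color compositing weights
    gaussian_flow = torch.einsum("hwn,nc->hwc", alpha_weights, displacement)
    return (gaussian_flow - optical_flow).abs().mean()         # L1 against optical flow
```

In practice the weights come out of a tile-based rasterizer rather than a dense (H, W, N) tensor, but the gradient path from the flow loss back to Gaussian positions is the same.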
3. Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction
Authors : Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li
Abstract
3D Gaussian Splatting (3DGS) has become an emerging tool for dynamic scene reconstruction. However, existing methods focus mainly on extending static 3DGS into a time-variant representation, while overlooking the rich motion information carried by 2D observations, thus suffering from performance degradation and model redundancy. To address the above problem, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS. Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow. Then a novel flow augmentation method is introduced with additional insights into uncertainty and loss collaboration. Moreover, for the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed. We conduct extensive experiments on both multi-view and monocular scenes to verify the merits of our work. Compared with the baselines, our method shows significant superiority in both rendering quality and efficiency. Paper
4. Bridging 3D Gaussian and Mesh for Freeview Video Rendering
Authors : Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao
Abstract
This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the fuzzy geometry. Meanwhile, the point-based splatting method (e.g., 3D Gaussian Splatting) usually produces artifacts or blurry pixels in areas with smooth geometry and sharp textures. As a result, it is difficult, if not impossible, to represent the complex and dynamic scene with a single type of primitive. To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes. Given a sequence of tracked meshes as initialization, our goal is to simultaneously optimize the mesh geometry, color texture, opacity maps, a set of 3D Gaussians, and the deformation field. At a specific time, we perform α-blending on the RGB and opacity values based on the merged and re-ordered z-buffers from mesh and 3D Gaussian rasterizations. This produces the final rendering, which is supervised by the ground-truth image. Experiments demonstrate that our approach adapts the appropriate type of primitives to represent the different parts of the dynamic scene and outperforms all the baseline methods in both quantitative and qualitative comparisons without losing render speed. Paper
5. [ECCV '24] Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
Authors : Jeongmin Bae*, Seoha Kim*, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames for representing a dynamic scene. However, previous works fail to accurately reconstruct complex dynamic scenes. We attribute the failure to the design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce a local smoothness regularization for per-Gaussian embedding to improve the details in dynamic regions. Paper | Project Page | Code
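A minimal PyTorch sketch of the formulation described above: deformation as a function of a learned per-Gaussian embedding and a temporal embedding, with separate coarse and fine heads for slow and fast motion. Layer sizes and the simple linear time encoding are assumptions, not the released code.

```python
import torch
import torch.nn as nn

class PerGaussianDeform(nn.Module):
    def __init__(self, num_gaussians, g_dim=32, t_dim=16, hidden=128):
        super().__init__()
        self.g_embed = nn.Embedding(num_gaussians, g_dim)   # learned per-Gaussian code
        self.t_embed = nn.Linear(1, t_dim)                  # simple temporal code
        def head():
            return nn.Sequential(
                nn.Linear(g_dim + t_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 3))                       # predicts a position offset
        self.coarse, self.fine = head(), head()             # slow vs. fast motion

    def forward(self, gaussian_ids, t):
        z = torch.cat([self.g_embed(gaussian_ids),
                       self.t_embed(t.expand(len(gaussian_ids), 1))], dim=-1)
        return self.coarse(z) + self.fine(z)                # total displacement

model = PerGaussianDeform(num_gaussians=100_000)
delta = model(torch.arange(8), torch.tensor([0.25]))        # (8, 3) offsets at t=0.25
```

The point of querying an embedding per Gaussian, rather than a field at the Gaussian's coordinate, is that two Gaussians at nearby positions can still deform independently.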
6. DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Authors : Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
Abstract
Existing VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be inferred by re-rendering the background to match the video frames. For the object motion, we first model the object-centric deformation of the objects by leveraging rendering losses and multi-view generative priors in an object-centric frame, then optimize object-centric to world-frame transformations by comparing the rendered outputs against the perceived pixel and optical flow. Finally, we recompose the background and objects and optimize for relative object scales using monocular depth prediction guidance. We show extensive results on the challenging DAVIS, Kubric, and self-captured videos, detail some limitations, and provide future directions. Besides 4D scene generation, our results show that DreamScene4D enables accurate 2D point motion tracking by projecting the inferred 3D trajectories to 2D, while never explicitly trained to do so. Paper | Project Page | Code (not yet)
7. [CVPR '24] 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
Authors : Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai
Abstract
In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting provides a new representation of the 3D scene, building upon which the 3D geometry could be exploited in learning the complex 3D deformation. Specifically, the scenes are represented as a collection of 3D Gaussians, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them in learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. Extensive experimental results on both synthetic and real datasets prove the superiority of our solution, which achieves new state-of-the-art performance. Paper | Project Page | Code (not yet)
8. MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos
Authors : Qingming Liu*, Yuan Liu*, Jiepeng Wang, Xianqiang Lv, Peng Wang, Wenping Wang, Junhui Hou†
Abstract
In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms baseline methods by a significant margin.
Paper | Project Page | Code (not yet)
9. [ECCVW '24] Optimizing Dynamic NeRF and 3DGS with No Video Synchronization
Authors : Seoha Kim*, Jeongmin Bae*, Youngsik Yun, HyunSeung Son, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
Recent advancements in 4D scene reconstruction using dynamic NeRF and 3DGS have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct the dynamic scenes and struggle to fit even the training views in unsynchronized settings. It happens because they employ a single latent embedding for a frame, while the multi-view images at the same frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with the field. By design, our method is applicable to various baselines, regardless of the type of radiance field. We conduct experiments on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance of our method. Paper
10. [NeurIPS '24] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
Authors : Ruijie Zhu*, Yanzhe Liang*, Hanzhi Chang, Jiacheng Deng, Jiahao Lu, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang
Abstract
Dynamic scene reconstruction is a long-term challenge in the field of 3D vision. Recently, the emergence of 3D Gaussian Splatting has provided new insights into this problem. Although subsequent efforts rapidly extend static 3D Gaussian to dynamic scenes, they often lack explicit constraints on object motion, leading to optimization difficulties and performance degradation. To address the above issues, we propose a novel deformable 3D Gaussian splatting framework called MotionGS, which explores explicit motion priors to guide the deformation of 3D Gaussians. Specifically, we first introduce an optical flow decoupling module that decouples optical flow into camera flow and motion flow, corresponding to camera movement and object motion respectively. Then the motion flow can effectively constrain the deformation of 3D Gaussians, thus simulating the motion of dynamic objects. Additionally, a camera pose refinement module is proposed to alternately optimize 3D Gaussians and camera poses, mitigating the impact of inaccurate camera poses. Extensive experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods and exhibits significant superiority in both qualitative and quantitative results. Paper | Project Page | Code (not yet)
11. [ECCV '24] DGD: Dynamic 3D Gaussians Distillation
Authors : Isaac Labe*, Noam Issachar*, Itai Lang, Sagie Benaim
Abstract
We tackle the task of learning dynamic 3D semantic radiance fields given a single monocular video as input. Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics. This enables the segmentation and tracking of a diverse set of 3D semantic entities, specified using a simple and intuitive interface that includes a user click or a text prompt. To this end, we present DGD, a unified 3D representation for both the appearance and semantics of a dynamic 3D scene, building upon the recently proposed dynamic 3D Gaussians representation. Our representation is optimized over time with both color and semantic information. Key to our method is the joint optimization of the appearance and semantic attributes, which jointly affect the geometric properties of the scene. We evaluate our approach in its ability to enable dense semantic 3D object tracking and demonstrate high-quality results that are fast to render, for a diverse set of scenes. Paper | Project Page | Code | Short Presentation
12. [NeurIPS '24] Fully Explicit Dynamic Gaussian Splatting
Authors : Junoh Lee, Changyeon Won, HyunJun Jung, Inhwan Bae, Hae-Gon Jeon
Abstract
3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging dense 3D prior and explicit representations. Unfortunately, the benefits of the prior and representation do not involve novel view synthesis for dynamic motions. Ironically, this is because the main barrier is the reliance on them, which requires increasing training and rendering times to account for dynamic motions. In this paper, we design Explicit 4D Gaussian Splatting (Ex4DGS). Our key idea is to first separate static and dynamic Gaussians during training, and to explicitly sample positions and rotations of the dynamic Gaussians at sparse timestamps. The sampled positions and rotations are then interpolated to represent both spatially and temporally continuous motions of objects in dynamic scenes as well as reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improves Ex4DGS's convergence. We initially train Ex4DGS using short timestamps and progressively extend timestamps, which makes it work well with a few point clouds. The point-backtracking is used to quantify the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate the state-of-the-art rendering quality from our method, achieving fast rendering of 62 fps on a single 2080Ti GPU. Paper | Project Page | Code
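The keyframe idea reduces to storing dynamic-Gaussian states only at sparse timestamps and interpolating between them. Here is a NumPy sketch with plain lerp for positions and normalized lerp for rotations; the paper's interpolation is learned, and consistent quaternion signs between adjacent keyframes are assumed.

```python
import numpy as np

def interpolate_keyframes(times, positions, quats, t):
    """times: (K,) sorted timestamps; positions: (K, N, 3); quats: (K, N, 4) unit quaternions."""
    i = np.searchsorted(times, t, side="right") - 1
    i = np.clip(i, 0, len(times) - 2)                       # clamp to a valid interval
    w = (t - times[i]) / (times[i + 1] - times[i])          # blend weight in [0, 1]
    pos = (1 - w) * positions[i] + w * positions[i + 1]     # linear position interpolation
    q = (1 - w) * quats[i] + w * quats[i + 1]               # normalized lerp (nlerp)
    q /= np.linalg.norm(q, axis=-1, keepdims=True)
    return pos, q
```

Only the dynamic subset of Gaussians pays this per-timestamp storage cost; static Gaussians keep a single state, which is where the efficiency comes from.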
13. [3DV '25] EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
Authors : Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang
Abstract
Human activities are inherently complex, often involving numerous object interactions. To better understand these activities, it is crucial to model their interactions with the environment captured through dynamic changes. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand human-object interactions in 3D environments. However, most existing methods for human activity modeling neglect the dynamic interactions with objects, resulting in only static representations. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We leverage the uniquely discrete nature of Gaussian Splatting and segment dynamic interactions from the background, with both having explicit representations. Our approach employs a clip-level online learning pipeline that leverages the dynamic nature of human activities, allowing us to reconstruct the temporal evolution of the scene in chronological order and track rigid object motion. EgoGaussian shows significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art. We also qualitatively demonstrate the high quality of the reconstructed models. Paper | Project Page | Code (not yet) | Short Presentation
14. 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement
Authors : Ziqi Lu, Jianbo Ye, John Leonard
Abstract
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D changes. Our method can detect changes in cluttered environments using sparse post-change images in as little as 18 seconds, using as few as a single new image. It does not rely on depth input, user instructions, object classes, or object models -- an object is recognized simply if it has been rearranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at https://github.com/520xyxyzq/3DGS-CD. Paper | Code (not yet)
2023:
1. [3DV '24] Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
Authors : Jonathon Luiten, Georgios Kopanas, Bastian Leibe, Deva Ramanan
Abstract
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model dynamic scenes, we allow Gaussians to move and rotate over time while enforcing that they have persistent color, opacity, and size. By regularizing Gaussians' motion and rotation with local rigidity constraints, we show that our Dynamic 3D Gaussians correctly model the same area of physical space over time, including the rotation of that space. Dense 6-DOF tracking and dynamic reconstruction emerge naturally from persistent dynamic view synthesis, without requiring any correspondence or flow as input. We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing. Paper | Project Page | Code | Explanation Video
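A sketch of a local-rigidity regularizer in the spirit the abstract describes: the offset from a Gaussian to each neighbor, rotated by that Gaussian's frame change between frames, should match the current offset. This is a hedged reconstruction, not the paper's exact loss; shapes and the precomputed neighbor index are illustrative assumptions.

```python
import torch

def rigidity_loss(x_prev, x_curr, R_prev, R_curr, nbr_idx):
    """x_*: (N, 3) Gaussian centers; R_*: (N, 3, 3) rotations; nbr_idx: (N, K) neighbor indices."""
    off_prev = x_prev[nbr_idx] - x_prev[:, None]            # (N, K, 3) offsets to neighbors
    off_curr = x_curr[nbr_idx] - x_curr[:, None]
    dR = R_curr @ R_prev.transpose(1, 2)                    # per-Gaussian frame change
    pred = torch.einsum("nij,nkj->nki", dR, off_prev)       # rotate previous offsets forward
    return (pred - off_curr).norm(dim=-1).mean()            # penalize non-rigid motion
```

Applied over k-nearest neighbors with distance-based weighting, a term like this is what lets the Gaussians double as dense 6-DOF tracks.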
2. [CVPR '24] Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Authors : Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, Xiaogang Jin
Abstract
Implicit neural representation has opened up new avenues for dynamic scene reconstruction and rendering. Nonetheless, state-of-the-art methods of dynamic neural rendering rely heavily on these implicit representations, which frequently struggle with accurately capturing the intricate details of objects in the scene. Furthermore, implicit methods struggle to achieve real-time rendering in general dynamic scenes, limiting their use in a wide range of tasks. To address the issues, we propose a deformable 3D Gaussian Splatting method that reconstructs scenes using explicit 3D Gaussians and learns Gaussians in canonical space with a deformation field to model monocular dynamic scenes. We also introduce a smoothing training mechanism with no extra overhead to mitigate the impact of inaccurate poses in real datasets on the smoothness of time interpolation tasks. Through differentiable Gaussian rasterization, the deformable 3D Gaussians not only achieve higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time synthesis, and real-time rendering. Paper | Project Page | Code
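A hedged PyTorch sketch of the canonical-space setup: a deformation MLP maps a positionally encoded canonical Gaussian center plus a time value to offsets in position, rotation, and scale. Encoding frequencies and layer widths are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

def posenc(x, freqs=6):
    # NeRF-style positional encoding: [sin(2^k * pi * x), cos(2^k * pi * x)]
    bands = 2.0 ** torch.arange(freqs) * torch.pi
    ang = x[..., None] * bands                              # (..., D, freqs)
    return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1).flatten(-2)

class DeformField(nn.Module):
    def __init__(self, freqs=6, hidden=256):
        super().__init__()
        in_dim = (3 + 1) * freqs * 2                        # encoded xyz + encoded t
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 3))                   # d_xyz, d_rotation, d_scale

    def forward(self, xyz, t):
        x = torch.cat([posenc(xyz), posenc(t.expand(xyz.shape[0], 1))], dim=-1)
        return self.net(x).split([3, 4, 3], dim=-1)

field = DeformField()
d_xyz, d_rot, d_scale = field(torch.rand(1024, 3), torch.tensor([0.5]))
```

The offsets are added to the canonical Gaussian parameters before rasterization, so gradients from the image loss flow through both the MLP and the canonical Gaussians.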
3. [CVPR '24] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Authors : Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang
Abstract
Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to maintain. We introduce the 4D Gaussian Splatting (4D-GS) to achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency. An efficient deformation field is constructed to model both Gaussian motions and shape deformations. Different adjacent Gaussians are connected via a HexPlane to produce more accurate position and shape deformations. Our 4D-GS method achieves real-time rendering under high resolutions, 70 FPS at an 800×800 resolution on an RTX 3090 GPU, while maintaining comparable or higher quality than previous state-of-the-art methods. Paper | Project Page | Code
4. Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
Authors : Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, Li Zhang
Abstract
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advancements in neural implicit models, limitations persist: (i) Inadequate Scene Structure: Existing methods struggle to reveal the spatial and temporal structure of dynamic scenes from directly learning the complex 6D plenoptic function. (ii) Scaling Deformation Modeling: Explicitly modeling scene element deformation becomes impractical for complex dynamics. To address these issues, we consider the spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics. This approach offers simplicity, flexibility for variable-length video and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency. Paper | Code
5. [ECCV '24] A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Authors : Kai Katsumata, Duc Minh Vo, Hideki Nakayama
Abstract
In novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerous observations per time step, due to the onus of storing 3D Gaussian parameters per time step. In this study, we present an efficient 3D Gaussian representation tailored for dynamic scenes in which we define positions and rotations as functions of time while leaving other time-invariant properties of the static 3D Gaussian unchanged. Notably, our representation reduces memory usage, which is consistent regardless of the input sequence length. Additionally, it mitigates the risk of overfitting observed frames by accounting for temporal changes. The optimization of our Gaussian representation based on image and flow reconstruction results in a powerful framework for dynamic scene view synthesis in both monocular and multi-view cases. We obtain the highest rendering speed of 118 frames per second (FPS) at a resolution of 1352×1014 with a single GPU, showing the practical usability and effectiveness of our proposed method in dynamic scene rendering scenarios. Paper | Project Page | Code
6. DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Authors : Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis
Abstract
Accurately and efficiently modeling dynamic scenes and motions is a challenging task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework consisting of a tiny set of learned basis queried only in time allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while at the same time, requiring only double the storage compared to static scenes. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene leading to effective and fast optimization. This is done by binding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, independently control them, and generate novel motion combinations that have never been seen before. We can reach state-of-the-art render quality within just 5 minutes of training and in less than half an hour, we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in monocular and multi-view scenarios. Paper | Project Page | Code (not yet)
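A minimal PyTorch sketch of the motion factorization described above: every point's displacement is a weighted sum of a small set of shared basis trajectories queried only in time, with an L1 penalty encouraging sparse coefficients. All sizes are assumptions.

```python
import torch
import torch.nn as nn

class MotionBasis(nn.Module):
    def __init__(self, num_points, num_bases=16, hidden=64):
        super().__init__()
        self.coeffs = nn.Parameter(torch.zeros(num_points, num_bases))  # per-point mixing
        self.basis = nn.Sequential(                 # tiny MLP queried only in time
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bases * 3))

    def forward(self, t):
        traj = self.basis(t.view(1, 1)).view(-1, 3)        # (num_bases, 3) basis offsets at t
        return self.coeffs @ traj                          # (num_points, 3) displacements

    def sparsity_loss(self):
        return self.coeffs.abs().mean()                    # encourages few active bases/point

mf = MotionBasis(num_points=50_000)
offsets = mf(torch.tensor([0.5]))                          # displacements at t = 0.5
loss = mf.sparsity_loss()
```

Because the basis is shared, editing or zeroing one basis trajectory moves every point bound to it, which is what enables the motion disentanglement the abstract mentions.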
7. [CVPR '24] Control4D: Efficient 4D Portrait Editing with Text
Authors : Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu
Abstract
We introduce Control4D, an innovative framework for editing dynamic 4D portraits using text instructions. Our method addresses the prevalent challenges in 4D editing, notably the inefficiencies of existing 4D representations and the inconsistent editing effect caused by diffusion-based editors. We first propose GaussianPlanes, a novel 4D representation that makes Gaussian Splatting more structured by applying plane-based decomposition in 3D space and time. This enhances both efficiency and robustness in 4D editing. Furthermore, we propose to leverage a 4D generator to learn a more continuous generation space from inconsistent edited images produced by the diffusion-based editor, which effectively improves the consistency and quality of 4D editing. Comprehensive evaluation demonstrates the superiority of Control4D, including significantly reduced training time, high-quality rendering, and spatial-temporal consistency in 4D portrait editing. Paper | Project Page | Code (not yet)
8. [CVPR '24] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Authors : Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
Abstract
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6 DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of 3D Gaussians. We employ a deformation MLP to predict time-varying 6 DoF transformations for each control point, which reduces learning complexities, enhances learning abilities, and facilitates obtaining temporal and spatial coherent motion patterns. Then, we jointly learn the 3D Gaussians, the canonical space locations of control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the as-rigid-as-possible principle is developed to enforce spatial continuity and local rigidity of learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method can enable user-controlled motion editing while retaining high-fidelity appearances. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserved motion editing applications. Paper | Project Page | Code
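A NumPy sketch of the sparse-control-point motion field: each Gaussian blends, LBS-style, the rigid transforms of its K nearest control points using normalized distance-based weights. The deformation MLP that would predict `ctrl_R`/`ctrl_t` per timestep is elided here, and the Gaussian-kernel weighting is an assumption.

```python
import numpy as np

def blend_transforms(xyz, ctrl_xyz, ctrl_R, ctrl_t, K=4, sigma=0.1):
    """xyz: (N, 3) Gaussian centers; ctrl_xyz: (M, 3); ctrl_R: (M, 3, 3); ctrl_t: (M, 3)."""
    d2 = ((xyz[:, None] - ctrl_xyz[None]) ** 2).sum(-1)     # (N, M) squared distances
    knn = np.argsort(d2, axis=1)[:, :K]                     # K nearest control points
    w = np.exp(-np.take_along_axis(d2, knn, 1) / (2 * sigma ** 2))
    w /= w.sum(1, keepdims=True)                            # normalized interpolation weights
    out = np.zeros_like(xyz)
    for n in range(len(xyz)):
        for j, m in enumerate(knn[n]):                      # blend the K rigid motions
            local = ctrl_R[m] @ (xyz[n] - ctrl_xyz[m]) + ctrl_xyz[m] + ctrl_t[m]
            out[n] += w[n, j] * local
    return out
```

Keeping the control set small is what makes the motion both compact and directly editable: dragging a handful of control points re-poses all the Gaussians bound to them.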
9. [CVPR '24] Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Authors : Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen
Abstract
Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then is used as regularization for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects, maintaining 3D consistency across novel views. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues. Paper
10. [CVPR '24] Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Authors : Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao
Abstract
We introduce Gaussian-Flow, a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos. In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS). Specifically, a novel Dual-Domain Deformation Model (DDDM) is proposed to explicitly model attribute deformations of each Gaussian point, where the time-dependent residual of each attribute is captured by a polynomial fitting in the time domain, and a Fourier series fitting in the frequency domain. The proposed DDDM is capable of modeling complex scene deformations across long video footage, eliminating the need for training separate 3DGS for each frame or introducing an additional implicit neural field to model 3D dynamics. Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction. Our proposed approach showcases a substantial efficiency improvement, achieving a 5× faster training speed compared to the per-frame 3DGS modeling. In addition, quantitative results demonstrate that the proposed Gaussian-Flow significantly outperforms previous leading methods in novel view rendering quality. Paper | Project Page | Code (not yet)
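The Dual-Domain Deformation Model is concrete enough to sketch: the time-dependent residual of each Gaussian attribute is a low-order polynomial plus a truncated Fourier series. A NumPy illustration with made-up orders and coefficients:

```python
import numpy as np

def dddm_residual(t, poly_coeffs, fourier_coeffs):
    """poly_coeffs: (P+1,) lowest order first; fourier_coeffs: (L, 2) of (a_l, b_l) pairs."""
    residual = np.polyval(poly_coeffs[::-1], t)              # time-domain polynomial part
    for l, (a, b) in enumerate(fourier_coeffs, start=1):     # frequency-domain part
        residual += a * np.sin(2 * np.pi * l * t) + b * np.cos(2 * np.pi * l * t)
    return residual

# e.g. x(t) = x0 + dddm_residual(t, ...), applied per attribute and per Gaussian
print(dddm_residual(0.3, np.array([0.0, 0.5]), np.array([[0.1, 0.0]])))
```

The polynomial captures smooth drift while the Fourier terms capture periodic motion, and because both are closed-form in t, evaluating a frame costs no network inference.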
11. [CVPR '24] CoGS: Controllable Gaussian Splatting
Authors : Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni
Abstract
Capturing and re-animating the 3D structure of articulated objects present significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons: first, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras, and second, they lack controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting, that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS using both synthetic and real-world datasets that include dynamic objects that differ in degree of difficulty. In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity. Paper | Project Page | Code (not yet)
12. GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
Authors : Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao
Abstract
We propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering. Paper | Project Page | Short Presentation
13. [CVPR '24] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Authors : Zhan Li, Zhang Chen, Zhong Li, Yi Xu
Abstract
Novel view synthesis of dynamic scenes has been an intriguing yet challenging problem. Despite recent advancements, simultaneously achieving high-resolution photorealistic results, real-time rendering, and compact storage remains a formidable task. To address these challenges, we propose Spacetime Gaussian Feature Splatting as a novel dynamic scene representation, composed of three pivotal components. First, we formulate expressive Spacetime Gaussians by enhancing 3D Gaussians with temporal opacity and parametric motion/rotation. This enables Spacetime Gaussians to capture static, dynamic, as well as transient content within a scene. Second, we introduce splatted feature rendering, which replaces spherical harmonics with neural features. These features facilitate the modeling of view- and time-dependent appearance while maintaining small size. Third, we leverage the guidance of training error and coarse depth to sample new Gaussians in areas that are challenging to converge with existing pipelines. Experiments on several established real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and speed, while retaining compact storage. At 8K resolution, our lite-version model can render at 60 FPS on an Nvidia RTX 4090 GPU. Paper | Project Page | Code | Short Presentation
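A small sketch of the two parametric ingredients named in the abstract, under assumed functional forms: a 1D Gaussian temporal opacity that fades each Spacetime Gaussian in and out, and a polynomial motion trajectory evaluated per frame.

```python
import numpy as np

def temporal_opacity(sigma_s, mu_t, s_t, t):
    # 1D Gaussian falloff in time around the primitive's temporal center mu_t;
    # sigma_s is the base spatial opacity, s_t controls how long it stays visible
    return sigma_s * np.exp(-s_t * (t - mu_t) ** 2)

def position_at(t, motion_poly):
    # motion_poly: (D+1, 3) polynomial coefficients, lowest order first
    powers = t ** np.arange(len(motion_poly))   # [1, t, t^2, ...]
    return powers @ motion_poly                 # (3,) position at time t

# e.g. a Gaussian visible around t = 0.5 drifting along +x
print(temporal_opacity(0.9, 0.5, 50.0, 0.55), position_at(0.55, np.array([[0.0, 0, 0], [1.0, 0, 0]])))
```

Transient content is handled for free: a Gaussian with a narrow temporal window simply contributes nothing outside its lifespan.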
14. MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes
Authors : Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, Jeffrey Ichnowski
Abstract
Accurate 3D tracking in highly deformable scenes with occlusions and shadows can facilitate new applications in robotics, augmented reality, and generative AI. However, tracking under these conditions is extremely challenging due to the ambiguity that arises with large deformations, shadows, and occlusions. We introduce MD-Splatting, an approach for simultaneous 3D tracking and novel view synthesis, using video captures of a dynamic scene from various camera poses. MD-Splatting builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel view synthesis. MD-Splatting learns a deformation function to project a set of Gaussians with non-metric, thus canonical, properties into metric space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on local rigidity, conservation of momentum, and isometry, which leads to trajectories with smaller trajectory errors. MD-Splatting achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. Compared to state-of-the-art, we improve 3D tracking by an average of 23.9%, while simultaneously achieving high-quality novel view synthesis. With sufficient texture such as in scene 6, MD-Splatting achieves a median tracking error of 3.39 mm on a cloth 1 × 1 m in size. Paper | Project Page | Code (not yet)
15. [ECCV'24] SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Authors : Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero
Abstract
Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time in our dynamic interactive viewer. Paper
16. [CVPR '24] 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Authors : Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing
Abstract
Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods. Paper | Project Page | Code (not yet) | 3DGStream Viewer
Editing:
2024:
1. Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
Authors : Myrna C. Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue
Abstract
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto it before α-blending their color. Following this example, we train a model that also includes a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors; and to generate 2D segmentation masks, by projecting the Gaussians onto a plane and α-blending over their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks and still learn to generate segmentation masks consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art. Code and trained models will be released upon acceptance. Paper
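The per-pixel feature compositing described above mirrors how 3DGS composites color. A minimal sketch for one ray, assuming depth-sorted Gaussians and precomputed per-pixel alphas (all names are illustrative):

```python
import numpy as np

def alpha_blend_features(features, alphas):
    """Front-to-back alpha compositing of per-Gaussian feature vectors
    along one ray, analogous to 3DGS color blending.

    features : (N, D) segmentation feature per depth-sorted Gaussian
    alphas   : (N,)   per-Gaussian alpha after 2D projection
    """
    transmittance = 1.0
    out = np.zeros(features.shape[1])
    for f, a in zip(features, alphas):
        out += transmittance * a * f   # contribution of this Gaussian
        transmittance *= (1.0 - a)     # light remaining for those behind
    return out

feats = np.random.rand(4, 16)               # 4 Gaussians, 16-D features
alphas = np.array([0.3, 0.5, 0.2, 0.9])
pixel_feature = alpha_blend_features(feats, alphas)  # feature "image" pixel
```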
2. CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians
Authors : Bin Dou, Tianyu Zhang, Yongjia Ma, Zhaohui Wang, Zejian Yuan
Abstract
We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a method for compact 3D-consistent scene segmentation at fast rendering speed with only RGB images as input. Previous NeRF-based 3D segmentation methods have relied on implicit or voxel neural scene representations and ray-marching volume rendering, which are time-consuming. Recent 3D Gaussian Splatting significantly improves rendering speed; however, existing Gaussian-based segmentation methods (e.g., Gaussian Grouping) fail to provide compact segmentation masks, especially in zero-shot segmentation. This is mainly caused by the lack of robustness and compactness when straightforwardly assigning learnable parameters to each Gaussian in the face of inconsistent 2D machine-generated labels. Our method aims to achieve compact and reliable zero-shot scene segmentation swiftly by mapping fused spatial and semantically meaningful features for each Gaussian point with a shallow decoding network. Specifically, our method first optimizes the Gaussian points' position, covariance, and color attributes under the supervision of RGB images. After Gaussian locating, we distill multi-scale DINO features extracted from images through unprojection to each Gaussian, which are then incorporated with spatial features from a fast point-feature processing network, i.e., RandLA-Net. The shallow decoding MLP is then applied to the multi-scale fused features to obtain compact segmentation. Experimental results show that our model can perform high-quality zero-shot scene segmentation, outperforming other segmentation methods on both semantic and panoptic segmentation tasks while consuming only about 10% of the segmentation time of NeRF-based segmentation. Paper | Project Page | Code (not yet)
3. TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Authors : Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan
Abstract
Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIP-Editor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality and alignment to the prompts, both qualitatively and quantitatively. Paper | Project Page
4. Segment Anything in 3D Gaussians
Authors : Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang
Abstract
3D Gaussian Splatting has emerged as an alternative 3D representation to Neural Radiance Fields (NeRFs), benefiting from its high-quality rendering results and real-time rendering speed. Considering that the 3D Gaussian representation remains unparsed, it is necessary first to execute object segmentation within this domain. Subsequently, scene editing and collision detection can be performed, proving vital to a multitude of applications, such as virtual reality (VR), augmented reality (AR), game/movie production, etc. In this paper, we propose a novel approach to achieve object segmentation in 3D Gaussians via an interactive procedure without any training process or learned parameters. We refer to the proposed method as SA-GS, for Segment Anything in 3D Gaussians. Given a set of clicked points in a single input view, SA-GS can generalize SAM to achieve 3D-consistent segmentation via the proposed multi-view mask generation and view-wise label assignment methods. We also propose a cross-view label-voting approach to assign labels from different views. In addition, in order to address the boundary roughness issue of segmented objects resulting from the non-negligible spatial sizes of 3D Gaussians located at the boundary, SA-GS incorporates a simple but effective Gaussian Decomposition scheme. Extensive experiments demonstrate that SA-GS achieves high-quality 3D segmentation results, which can also be easily applied for scene editing and collision detection tasks. Paper
5. GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting
Authors : Francesco Palandra, Andrea Sanchietti, Daniele Baieri, Emanuele Rodolà
Abstract
We present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models. Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware. We tackle the problem by leveraging Gaussian splatting to represent 3D scenes, and we optimize the model while progressively varying the image supervision by means of a pretrained image-based diffusion model. The input object may be given as a 3D triangular mesh, or directly provided as Gaussians from a generative model such as DreamGaussian. GSEdit ensures consistency across different viewpoints, maintaining the integrity of the original object's information. Compared to previously proposed methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. Our editing process is refined via the application of the SDS loss, ensuring that our edits are both precise and accurate. Our comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following the given textual instructions while preserving their coherence and detail. Paper
6. GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Authors : Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu
Abstract
We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by 3D Gaussian Splatting (3DGS). Our method first renders a collection of images using the 3DGS and edits them with a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works. It leads to faster editing as well as higher visual quality. This is achieved by two terms: (a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps, and (b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing on several reference views through self- and cross-view attention between the images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods. Paper
7. View-Consistent 3D Editing with Gaussian Splatting
Authors : Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang
Abstract
The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. Paper
8. Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Authors : Antoine Guédon, Vincent Lepetit
Abstract
We propose Gaussian Frosting, a novel mesh-based representation for high-quality rendering and editing of complex 3D effects in real-time. Our approach builds on the recent 3D Gaussian Splatting framework, which optimizes a set of 3D Gaussians to approximate a radiance field from images. We propose first extracting a base mesh from Gaussians during optimization, then building and refining an adaptive layer of Gaussians with a variable thickness around the mesh to better capture the fine details and volumetric effects near the surface, such as hair or grass. We call this layer Gaussian Frosting, as it resembles a coating of frosting on a cake. The fuzzier the material, the thicker the frosting. We also introduce a parameterization of the Gaussians to enforce them to stay inside the frosting layer and automatically adjust their parameters when deforming, rescaling, editing or animating the mesh. Our representation allows for efficient rendering using Gaussian splatting, as well as editing and animation by modifying the base mesh. We demonstrate the effectiveness of our method on various synthetic and real scenes, and show that it outperforms existing surface-based approaches. We will release our code and a web-based viewer as additional contributions. Paper | Project Page | Code (not yet) | Short Presentation
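A minimal sketch of the kind of constrained parameterization the abstract hints at: a Gaussian center expressed as a point on a mesh face plus a bounded offset along the face normal, so it necessarily stays inside the shell and follows mesh edits. The shell bounds and function names are assumptions, not the paper's exact scheme:

```python
import numpy as np

def frosting_center(v0, v1, v2, bary, u, t_in, t_out):
    """Place a Gaussian center inside a 'frosting' shell around a face.

    bary        : (3,) barycentric coordinates on the face (sum to 1)
    u in [0, 1] : normalized offset across the shell thickness
    t_in, t_out : inner/outer shell bounds at this face (signed distances)
    """
    p = bary[0] * v0 + bary[1] * v1 + bary[2] * v2   # point on the face
    n = np.cross(v1 - v0, v2 - v0)
    n /= np.linalg.norm(n)                            # unit face normal
    return p + (t_in + u * (t_out - t_in)) * n        # stays inside the shell

# Deforming the vertices moves the Gaussian with the mesh automatically.
c = frosting_center(np.zeros(3), np.array([1.0, 0, 0]), np.array([0, 1.0, 0]),
                    bary=np.array([0.3, 0.3, 0.4]), u=0.5, t_in=-0.02, t_out=0.05)
```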
9. Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Authors : Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li
Abstract
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Previous approaches have adopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, without the additional training required by NeRFs. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. We explore several applications of Semantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0% mAcc improvement over prior open-vocabulary scene understanding counterparts; object part segmentation, scene editing, and spatial-temporal segmentation with better qualitative results over 2D and 3D baselines, highlighting its versatility and effectiveness in supporting diverse downstream tasks. Paper | Project Page | Code (not yet)
10. EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Authors : Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney
Abstract
In this paper, we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in egocentric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets, which shows the promise of the method for 3D egocentric perception at scale. Paper | Project Page | Code (not yet)
11. InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
Authors : Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao
Abstract
3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from large-scale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion. Paper | Project Page | Code
12. Gaga: Group Any Gaussians via 3D-aware Memory Bank
Authors : Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract
We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. In contrast to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot segmentation models, significantly enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as scene understanding and manipulation. Paper | Project Page | Code
13. [CVPR W'24] ICE-G: Image Conditional Editing of 3D Gaussian Splats
Authors : Vishnu Jaganathan, Hannah Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira
Abstract
Recently, many techniques have emerged to create high-quality 3D assets and scenes. When it comes to editing these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner. These edited views act as an updated dataset to further train and re-style the 3D scene. The end result is therefore an edited 3D model. Our framework enables a wide variety of editing tasks such as manual local edits, correspondence-based style transfer from any example image, and a combination of different styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique works for other methods such as NeRFs as well. We show through multiple examples that our method produces higher quality results while offering fine-grained control of editing. Paper | Project Page | Short Presentation
14. Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks
Authors : Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar
Abstract
In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we found that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics. Preprint | Project Page | Code (Segmentation)
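The masked-gradient voting idea can be illustrated end-to-end with a toy differentiable "renderer" in PyTorch, where a weight matrix stands in for the rasterizer's per-pixel blending weights; this is a simplified stand-in, not the authors' pipeline:

```python
import torch

# Toy stand-in renderer: weights (P, N) play the role of per-pixel
# blending weights over N Gaussians; P is the number of pixels.
N, P = 5, 8
colors = torch.rand(N, 3, requires_grad=True)   # per-Gaussian color
weights = torch.rand(P, N)
image = weights @ colors                        # (P, 3) "rendered" pixels

mask = torch.zeros(P)
mask[:4] = 1.0                                  # 2D mask selecting some pixels

# Gradients flow only from masked pixels back to the Gaussians.
(image.sum(dim=1) * mask).sum().backward()

votes = colors.grad.abs().sum(dim=1)            # per-Gaussian vote strength
selected = votes > votes.mean()                 # Gaussians assigned to the mask
```

The same accumulated gradient magnitudes can serve as a pruning signal: Gaussians that never receive gradient from any mask contribute little and can be dropped.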
2023:
1. [CVPR '24] GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Authors : Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin
Abstract
3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation technique. GaussianEditor enhances precision and control in editing through our proposed Gaussian Semantic Tracing, which traces the editing target throughout the training process. Additionally, we propose hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Paper | Project Page | Code | Short Presentation
2. [CVPR '24] GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Authors : Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian
Abstract
Recently, impressive results have been achieved in 3D scene editing with text instructions based on a 2D diffusion model. However, current diffusion models primarily generate images by predicting noise in the latent space, and the editing is usually applied to the whole image, which makes it challenging to perform delicate, especially localized, editing for 3D scenes. Inspired by recent 3D Gaussian splatting, we propose a systematic framework, named GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text instructions. Benefiting from the explicit property of 3D Gaussians, we design a series of techniques to achieve delicate editing. Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians. The Gaussian RoI is further used to control the editing process. Our framework can achieve more delicate and precise editing of 3D scenes than previous methods while enjoying much faster training speed, i.e., within 20 minutes on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF (45 minutes to 2 hours). Paper | Project Page | Code (not yet) | Short Presentation
3. Point'n Move: Interactive Scene Object Manipulation on Gaussian Splatting Radiance Fields
Authors : Jiajun Huang, Hongchuan Yu
Abstract
We propose Point'n Move, a method that achieves interactive scene object manipulation with exposed region inpainting. Interactivity here further comes from intuitive object selection and real-time editing. To achieve this, we adopt the Gaussian Splatting Radiance Field as the scene representation and fully leverage its explicit nature and speed advantage. Its explicit representation allows us to devise a dual-stage self-prompting segmentation algorithm that lifts 2D prompt points to 3D masks, perform mask refinement and merging, minimize change, provide good initialization for scene inpainting, and perform editing in real time without per-editing training, all leading to superior quality and performance. We test our method by performing editing on both forward-facing and 360 scenes. We also compare our method against existing scene object removal methods, showing superior quality despite being more capable and having a speed advantage. Paper
4. [ECCV'24] Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Authors : Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke
Abstract
The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of 3D scenes. However, it is solely concentrated on appearance and geometry modeling, while lacking fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during the differentiable rendering by leveraging the 2D mask predictions by SAM, along with introduced 3D spatial consistency regularization. Compared to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization and scene recomposition. Paper | Code
5. Segment Any 3D Gaussians
Authors : Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract
Interactive 3D segmentation in radiance fields is an appealing task given its importance in 3D scene understanding and manipulation. However, existing methods face challenges in either achieving fine-grained, multi-granularity segmentation or contending with substantial computational overhead, inhibiting real-time interaction. In this paper, we introduce Segment Any 3D GAussians (SAGA), a novel 3D interactive segmentation approach that seamlessly blends a 2D segmentation foundation model with 3D Gaussian Splatting (3DGS), a recent breakthrough of radiance fields. SAGA efficiently embeds multi-granularity 2D segmentation results generated by the segmentation foundation model into 3D Gaussian point features through well-designed contrastive training. Evaluation on existing benchmarks demonstrates that SAGA can achieve competitive performance with state-of-the-art methods. Moreover, SAGA achieves multi-granularity segmentation and accommodates various prompts, including points, scribbles, and 2D masks. Notably, SAGA can finish the 3D segmentation within milliseconds, achieving nearly 1000× acceleration compared to previous SOTA. Paper | Project Page | Code
6. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Paper | Project Page | Code | Short Presentation
7. 2D-Guided 3D Gaussian Segmentation
Authors : Kun Lan, Haoran Li, Haolin Shi, Wenjun Wu, Yong Liao, Lin Wang, Pengyuan Zhou
Abstract
Recently, 3D Gaussian, as an explicit 3D representation method, has demonstrated strong competitiveness over NeRF (Neural Radiance Fields) in terms of expressing complex scenes and training duration. These advantages signal a wide range of applications for 3D Gaussians in 3D understanding and editing. Meanwhile, the segmentation of 3D Gaussians is still in its infancy. The existing segmentation methods are not only cumbersome but also incapable of segmenting multiple objects simultaneously in a short amount of time. In response, this paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision. This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information, while nearest neighbor clustering and statistical filtering refine the segmentation results. Experiments show that our concise method can achieve mIoU and mAcc for multi-object segmentation comparable to previous single-object segmentation methods. Paper
Language Embedding:
2024:
1. [IROS '24] Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot
Authors : Justin Yu, Kush Hari, Kishore Srinivas, Karim El-Refai, Adam Rashid, Chung Min Kim, Justin Kerr, Richard Cheng, Muhammad Zubair Irshad, Ashwin Balakrishna, Thomas Kollar, Ken Goldberg
Abstract
Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy. Paper | Project Page
2. [CVPR '24] Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Authors : Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, Shao-Hua Guan
Abstract
Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Paper | Project Page | Code
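A minimal sketch of the memory-saving intuition behind such a quantization scheme: nearest-neighbor vector quantization, so each Gaussian stores a small codebook index instead of a full language embedding. The codebook size and dimensions are illustrative assumptions; the paper's scheme is more elaborate:

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature to its nearest codebook entry. Each Gaussian then
    stores one small integer index instead of a full embedding vector."""
    # Squared Euclidean distance between every feature and codebook entry.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]          # indices + dequantized features

feats = np.random.rand(1000, 64)       # e.g., compressed per-Gaussian features
codebook = np.random.rand(32, 64)      # learned codebook (random here)
idx, recon = quantize(feats, codebook)
# Storage per Gaussian drops from 64 floats to a single index.
```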
3. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Paper | Project Page | Code | Short Presentation
4. [CVPR '24] LangSplat: 3D Language Gaussian Splatting
Authors : Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister
Abstract
Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat advances the field by utilizing a collection of 3D Gaussians, each encoding language features distilled from CLIP, to represent the language field. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features on the scene-specific latent space, thereby alleviating substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating the need for extensively querying the language field across various scales and the regularization of DINO features. Extensive experiments on open-vocabulary 3D object localization and semantic segmentation demonstrate that LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin. Notably, LangSplat is extremely efficient, achieving a {speed} × speedup compared to LERF at the resolution of 1440 × 1080. Paper | Project Page | Code | Short Presentation
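The scene-wise autoencoder idea can be sketched in a few lines of PyTorch: compress high-dimensional CLIP-like features into a handful of latent channels that each Gaussian can carry, and decode only at query time. The dimensions and training loop below are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Scene-wise autoencoder: D-dim language features -> d-dim latents.
D, d = 512, 3
encoder = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, d))
decoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, D))

clip_feats = torch.randn(4096, D)   # stand-in for extracted 2D features
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

for _ in range(100):                # tiny reconstruction-loss training loop
    recon = decoder(encoder(clip_feats))
    loss = (recon - clip_feats).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Gaussians then store only d-dim latents; decode back to D dims at query time.
latents = encoder(clip_feats).detach()
```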
5. SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting
Authors : Xinyi Liu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi
Abstract
Many recent developments for robots to represent environments have focused on photorealistic reconstructions. This paper particularly focuses on generating sequences of images from photorealistic Gaussian Splatting models that match instructions given in user-input language. We contribute a novel framework, SplaTraj, which formulates the generation of images within photorealistic environment representations as a continuous-time trajectory optimization problem. Costs are designed so that a camera following the trajectory poses will smoothly traverse through the environment and render the specified spatial information in a photogenic manner. This is achieved by querying a photorealistic representation with language embedding to isolate regions that correspond to the user-specified inputs. These regions are then projected to the camera's view as it moves over time and a cost is constructed. We can then apply gradient-based optimization and differentiate through the rendering to optimize the trajectory for the defined cost. The resulting trajectory moves to photogenically view each of the specified objects. We empirically evaluate our approach on a suite of environments and instructions, and demonstrate the quality of generated image sequences. Paper | Code (not yet) | Short Presentation
6. FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Authors : Xingxing Zuo, Pouya Samangouei, Yunwen Zhou, Yan Di, Mingyang Li
Abstract
Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that makes the rendered feature distance of the same semantic entities close, following the pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection while being 851× faster for inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. Paper
7. Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
Authors : Hyunjee Lee*, Youngsik Yun*, Jeongmin Bae, Seoha Kim, Youngjung Uh
Abstract
Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS as follows. 1) We directly supervise the 3D points to train the language embedding field. It achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. Paper | Project Page | Code (not yet)
Mesh Extraction and Physics:
2024:
1. Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting
Authors : Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang
Abstract
We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to GaussianShader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noise that arises from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. Paper | Project Page | Code (not yet) | Short Presentation
2. GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting
Authors : Joanna Waczyńska, Piotr Borycki, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek
Abstract
In recent years, a range of neural network-based methods for image rendering have been introduced. For instance, widely-researched neural radiance fields (NeRF) rely on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, enabling fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning, due to the necessity of conditioning several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, a hybrid of a mesh and Gaussian distributions, which pins all Gaussian splats to the object surface (mesh). The unique contribution of our method is defining Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain high-quality renders in the real-time generation of high-quality views. Furthermore, we demonstrate that in the absence of a predefined mesh, it is possible to fine-tune the initial mesh during the learning process. Paper | Code
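A minimal sketch of deriving a Gaussian from a mesh face, so that editing the vertices automatically moves and reshapes the splat; this parameterization is illustrative, not the paper's exact formulation:

```python
import numpy as np

def face_gaussian(v0, v1, v2, scale=0.5):
    """Derive a Gaussian from a triangle: mean at the centroid, covariance
    spanned by the edge directions and nearly flat along the face normal.
    Illustrative parameterization only."""
    mean = (v0 + v1 + v2) / 3.0
    e1, e2 = v1 - v0, v2 - v0
    n = np.cross(e1, e2)
    basis = np.stack([e1, e2, n])              # rows span face + normal
    S = np.diag([scale, scale, 1e-4])          # almost no extent along normal
    cov = basis.T @ S @ basis                  # 3x3 covariance tied to the face
    return mean, cov

mean, cov = face_gaussian(np.zeros(3), np.array([1.0, 0, 0]), np.array([0, 1.0, 0]))
# Moving v0/v1/v2 (e.g., during animation) re-derives mean and cov for free.
```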
3. Mesh-based Gaussian Splatting for Real-time Large-scale Deformation
Authors : Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai
Abstract
Neural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and for generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in real time. Gaussian Splatting (GS) has recently become a promising method with explicit geometry for representing static scenes and facilitating high-quality and real-time synthesis of novel views. However, it cannot be easily deformed due to the use of discrete Gaussians and the lack of explicit topology. To address this, we develop a novel GS-based method that enables interactive deformation. Our key idea is to design an innovative mesh-based GS representation, which is integrated into Gaussian learning and manipulation. 3D Gaussians are defined over an explicit mesh, and they are bound with each other: the rendering of 3D Gaussians guides the mesh face split for adaptive refinement, and the mesh face split directs the splitting of 3D Gaussians. Moreover, the explicit mesh constraints help regularize the Gaussian distribution, suppressing poor-quality Gaussians (e.g., misaligned Gaussians, long and narrow Gaussians), thus enhancing visual quality and avoiding artifacts during deformation. Based on this representation, we further introduce a large-scale Gaussian deformation technique to enable deformable GS, which alters the parameters of 3D Gaussians according to the manipulation of the associated mesh. Our method benefits from existing mesh deformation datasets for more realistic data-driven Gaussian deformation. Extensive experiments show that our approach achieves high-quality reconstruction and effective deformation, while maintaining promising rendering results at a high frame rate (65 FPS on average). Paper
4. Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
Authors : Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li
Abstract
Reconstructing and simulating elastic objects from visual observations is crucial for applications in computer vision and robotics. Existing methods, such as 3D Gaussians, provide modeling for 3D appearance and geometry but lack the ability to simulate physical properties or optimize parameters for heterogeneous objects. We propose Spring-Gaus, a novel framework that integrates 3D Gaussians with physics-based simulation for reconstructing and simulating elastic objects from multi-view videos. Our method utilizes a 3D Spring-Mass model, enabling the optimization of physical parameters at the individual point level while decoupling the learning of physics and appearance. This approach achieves great sample efficiency, enhances generalization, and reduces sensitivity to the distribution of simulation particles. We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects. This includes future prediction and simulation under varying initial states and environmental parameters. Paper | Project Page | Code (not yet)
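The coupling to a classic spring-mass simulator can be sketched as follows; this is a generic semi-implicit Euler step under Hooke's law, illustrative of the kind of simulator Spring-Gaus optimizes through, not the authors' code:

```python
import numpy as np

def spring_mass_step(x, v, edges, rest_len, k, mass, dt,
                     g=np.array([0.0, -9.8, 0.0])):
    """One semi-implicit Euler step of a 3D spring-mass system.

    x, v     : (N, 3) positions and velocities
    edges    : list of (i, j) spring endpoints; rest_len: rest lengths
    k, mass  : spring stiffness and per-particle mass (scalars here)
    """
    f = np.tile(g * mass, (len(x), 1))            # gravity on every particle
    for (i, j), L0 in zip(edges, rest_len):
        d = x[j] - x[i]
        L = np.linalg.norm(d) + 1e-9
        fs = k * (L - L0) * d / L                 # Hooke's law along the spring
        f[i] += fs
        f[j] -= fs
    v = v + dt * f / mass                         # update velocity first...
    return x + dt * v, v                          # ...then position

x = np.array([[0.0, 0, 0], [1.0, 0, 0]])
v = np.zeros_like(x)
x, v = spring_mass_step(x, v, edges=[(0, 1)], rest_len=[1.0],
                        k=50.0, mass=0.1, dt=1e-3)
```

In the paper's setting, the stiffness-like parameters would be made learnable and fit so that the simulated Gaussians reproduce the observed multi-view videos.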
5. Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
Authors : Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang
Abstract
3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namely Texture-GS, to disentangle the appearance from the geometry by representing it as a 2D texture mapped onto the 3D surface, thereby facilitating appearance editing. Technically, the disentanglement is achieved by our proposed texture mapping module, which consists of a UV mapping MLP to learn the UV coordinates for the 3D Gaussian centers, a local Taylor expansion of the MLP to efficiently approximate the UV coordinates for the ray-Gaussian intersections, and a learnable texture to capture the fine-grained appearance. Extensive experiments on the DTU dataset demonstrate that our method not only facilitates high-fidelity appearance editing but also achieves real-time rendering on consumer-level devices, e.g., a single RTX 2080 Ti GPU. Paper | Project Page | Code (not yet)
6. DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
Authors : Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala
Abstract
3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints during optimization. We extend 3D Gaussian splatting with depth and normal cues to tackle challenging indoor datasets and showcase techniques for efficient mesh extraction, an important downstream application. Specifically, we regularize the optimization procedure with depth information, enforce local smoothness of nearby Gaussians, and use the geometry of the 3D Gaussians supervised by normal cues to achieve better alignment with the true scene geometry. We improve depth estimation and novel view synthesis results over baselines and show how this simple yet effective regularization technique can be used to directly extract meshes from the Gaussian representation, yielding more physically accurate reconstructions on indoor scenes. Paper | Code | Project Page
7. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
Authors : Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao
Abstract
3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-accurate 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering. Paper | Project Page | Code | Short Presentation
7.1 Unofficial Implementation and Specification
Authors : Yunzhou Song, Zixuan Lin, Yexin Zhang
Code
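The geometric core of 2D Gaussian Splatting's perspective-accurate splatting (entry 7 above) is an exact ray/splat intersection: each 2D Gaussian is an oriented planar disk, so a ray is intersected with the disk's plane and the Gaussian weight is evaluated at the hit point rather than via a screen-space approximation. A minimal sketch (names illustrative):

```python
import numpy as np

def ray_disk_intersection(o, d, center, n):
    """Exact ray/plane intersection for an oriented 2D Gaussian disk.

    o, d   : ray origin and direction
    center : disk center; n : unit disk normal
    Returns the hit point, or None if the ray misses (parallel / behind).
    """
    denom = np.dot(n, d)
    if abs(denom) < 1e-8:
        return None                       # ray parallel to the disk plane
    t = np.dot(n, center - o) / denom
    return o + t * d if t > 0 else None

hit = ray_disk_intersection(o=np.zeros(3), d=np.array([0, 0, 1.0]),
                            center=np.array([0, 0, 2.0]), n=np.array([0, 0, 1.0]))
# The Gaussian weight is then evaluated in the disk's local (u, v) frame at `hit`.
```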
8. Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
Authors : Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang
Abstract
Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, which enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline, to illustrate the challenge and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language. Paper | Project Page | Code (not yet)
9. [ECCV '24] GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Authors : Yaniv Wolf, Amit Bracha, Ron Kimmel
Abstract
Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Instead of extracting the geometry of the scene directly from the Gaussian properties, we extract the geometry through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while only requiring a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results. Paper | Project Page | Code
10. RaDe-GS: Rasterizing Depth in Gaussian Splatting
Authors : Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan
Abstract
Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. Our approach achieves a Chamfer distance error comparable to Neuralangelo [Li et al. 2023] on the DTU dataset and similar training and rendering time as traditional Gaussian Splatting on the Tanks & Temples dataset. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods. Paper | Project Page | Code (not yet)
11. Trim 3D Gaussian Splatting for Accurate Geometry Representation
Authors : Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang
Abstract
In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore, the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in previous arts. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. Paper | Project Page | Code
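A minimal sketch of the trimming idea, assuming per-Gaussian contributions have already been accumulated during rendering (the paper's actual contribution measure and trimming schedule may differ):

```python
import torch

def trim_gaussians(contrib, params, keep_ratio=0.9):
    """Drop the lowest-contribution Gaussians.

    contrib: (N,) accumulated alpha-blending weights of each Gaussian,
             summed over all training pixels it touched (a simple proxy
             for the paper's contribution measure).
    params:  dict of per-Gaussian tensors, each of shape (N, ...).
    """
    n_keep = int(keep_ratio * contrib.numel())
    keep = torch.topk(contrib, n_keep).indices
    return {name: p[keep] for name, p in params.items()}

# Usage after a rendering pass that accumulated `contrib`:
# params = trim_gaussians(contrib, {"means": means, "scales": scales,
#                                   "opacity": opacity}, keep_ratio=0.9)
```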
12. Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting
Authors : Junha Hyung, Susung Hong, Sungwon Hwang, Jaeseong Lee, Jaegul Choo, Jin-Hwa Kim
Abstract
3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals, due to the Gaussians converging into anisotropic Gaussians with one dominant variance. We propose using effective rank analysis to examine the shape statistics of 3D Gaussian primitives, and identify that the Gaussians indeed converge into needle-like shapes with an effective rank of 1. To address this, we introduce effective rank as a regularization, which constrains the structure of the Gaussians. Our new regularization method enhances normal and geometry reconstruction while reducing needle-like artifacts. The approach can be integrated as an add-on module to other 3DGS variants, improving their quality without compromising visual fidelity. Paper | Project Page | Code (not yet)
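The effective rank itself is easy to compute from the per-axis scales: it is the exponential of the entropy of the normalized covariance eigenvalues, ranging from 1 (needle) to 3 (isotropic). A minimal sketch, with a hinge-style penalty as one plausible (not necessarily the paper's exact) loss form:

```python
import torch

def effective_rank(scales, eps=1e-8):
    """Effective rank of each 3D Gaussian.

    scales: (N, 3) per-axis standard deviations; the covariance
    eigenvalues are their squares. erank = exp(entropy of the
    normalized eigenvalues), in [1, 3].
    """
    lam = scales ** 2
    p = lam / (lam.sum(dim=-1, keepdim=True) + eps)
    entropy = -(p * torch.log(p + eps)).sum(dim=-1)
    return torch.exp(entropy)

def erank_regularizer(scales, min_erank=2.0):
    # Penalize Gaussians collapsing toward effective rank 1
    # (illustrative hinge form; the paper's exact loss may differ).
    return torch.relu(min_erank - effective_rank(scales)).mean()
```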
13. CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Authors : Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, Zhaoxiang Zhang
Abstract
Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis. However, accurately representing surfaces, especially in large and complex scenarios, remains a significant challenge due to the unstructured nature of 3DGS. In this paper, we present CityGaussianV2, a novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency. Building on the favorable generalization capabilities of 2D Gaussian Splatting (2DGS), we address its convergence and scalability issues. Specifically, we implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. To scale up, we introduce an elongation filter that mitigates Gaussian count explosion caused by 2DGS degeneration. Furthermore, we optimize the CityGaussian pipeline for parallel training, achieving up to 10x compression, at least 25% savings in training time, and a 50% decrease in memory usage. We also establish standard geometry benchmarks for large-scale scenes. Experimental results demonstrate that our method strikes a promising balance among visual quality, geometric accuracy, and storage and training costs. Paper | Project Page | Code (Coming Soon)
14. [CoRL '24] Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision
Authors : Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell, Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
Abstract
We introduce Cloth-Splatting, a method for estimating 3D states of cloth from RGB images through a prediction-update framework. Cloth-Splatting leverages an action-conditioned dynamics model for predicting future states and uses 3D Gaussian Splatting to update the predicted states. Our key insight is that coupling a 3D mesh-based representation with Gaussian Splatting allows us to define a differentiable map between the cloth's state space and the image space. This enables the use of gradient-based optimization techniques to refine inaccurate state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting not only improves state estimation accuracy over current baselines but also reduces convergence time by ~85%. Paper | Project Page | Code
2023:
1. [CVPR '24] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Authors : Tianyi Xie, Zeshun Zong, Yuxin Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang
Abstract
We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, "cage meshes," or any other geometry embedding, highlighting the principle of "what you see is what you simulate (WS2)." Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. Paper | Project Page | Code | Short Presentation
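The core kinematic coupling is compact: each Gaussian kernel is advected with its particle, and its covariance is deformed by the particle's deformation gradient, so the splats stretch and shear with the simulated continuum. A schematic sketch, assuming an MPM solver supplies the deformation gradients and velocities:

```python
import torch

def deform_gaussians(means, covs, F, velocities, dt):
    """One schematic PhysGaussian-style kinematics step.

    means:      (N, 3) Gaussian centers.
    covs:       (N, 3, 3) covariances.
    F:          (N, 3, 3) per-particle deformation gradients, assumed
                provided by an MPM solver (not implemented here).
    velocities: (N, 3) particle velocities from the same solver.
    """
    new_means = means + dt * velocities               # advect centers
    new_covs = F @ covs @ F.transpose(-1, -2)         # Sigma' = F Sigma F^T
    return new_means, new_covs
```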
2. [CVPR '24] SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Authors : Antoine Guédon, Vincent Lepetit
Abstract
We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D Gaussians, as these Gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the Gaussians to align well with the surface of the scene. We then introduce a method that exploits this alignment to sample points on the real surface of the scene and extract a mesh from the Gaussians using Poisson reconstruction, which is fast, scalable, and preserves details, in contrast to the Marching Cubes algorithm usually applied to extract meshes from Neural SDFs. Finally, we introduce an optional refinement strategy that binds Gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional software by manipulating the mesh instead of the Gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality. Paper | Project Page | Code | Short Presentation
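The Poisson step is standard and can be reproduced with off-the-shelf tooling. A minimal sketch using Open3D, assuming surface points and normals have already been sampled from the regularized Gaussians:

```python
import numpy as np
import open3d as o3d

def poisson_mesh_from_samples(points, normals, depth=9):
    """Extract a mesh from surface samples (illustrative; SuGaR uses its
    own sampling of the regularized Gaussians before this step).

    points, normals: (N, 3) float arrays of surface samples and normals.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.normals = o3d.utility.Vector3dVector(normals)
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=depth)
    # Trim low-density vertices that Poisson tends to hallucinate.
    d = np.asarray(densities)
    mesh.remove_vertices_by_mask(d < np.quantile(d, 0.05))
    return mesh
```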
3. NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance
Authors : Hanlin Chen, Chen Li, Gim Hee Lee
Abstract
Existing neural implicit surface reconstruction methods have achieved impressive performance in multi-view 3D reconstruction by leveraging explicit geometry priors such as depth maps or point clouds as regularization. However, the reconstruction results still lack fine details because of the over-smoothed depth map or sparse point cloud. In this work, we propose a neural implicit surface reconstruction pipeline with guidance from 3D Gaussian Splatting to recover highly detailed surfaces. The advantage of 3D Gaussian Splatting is that it can generate dense point clouds with detailed structure. Nonetheless, a naive adoption of 3D Gaussian Splatting can fail since the generated points are the centers of 3D Gaussians that do not necessarily lie on the surface. We thus introduce a scale regularizer to pull the centers close to the surface by enforcing the 3D Gaussians to be extremely thin. Moreover, we propose to refine the point cloud from 3D Gaussian Splatting with the normal priors from the surface predicted by neural implicit models instead of using a fixed set of points as guidance. Consequently, the quality of surface reconstruction improves with the guidance of the more accurate 3D Gaussian Splatting. By jointly optimizing the 3D Gaussian Splatting and the neural implicit model, our approach benefits from both representations and generates complete surfaces with intricate details. Experiments on Tanks and Temples verify the effectiveness of our proposed method. Paper
Others:
2024:
1. Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting
Authors : Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White
Abstract
The accelerating deployment of spacecraft in orbit has generated interest in on-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. This requires robust characterization of the target's geometry. In this article, we present an approach for mapping geometries of satellites on orbit based on 3D Gaussian Splatting that can run on computing resources available on current spaceflight hardware. We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up under several realistic lighting and motion conditions. Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms. Such on-board capabilities are critical to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks. Paper
2. TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering
Authors : Linus Franke, Darius Rückert, Laura Fink, Marc Stamminger
Abstract
Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [Rückert et al. 2022] can accommodate crisper images, but its neural reconstruction network reduces performance, it grapples with temporal instability, and it cannot effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. Paper | Project Page | Code
3. EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
Authors : Lingting Zhu, Zhao Wang, Jiahao Cui, Zhenchao Jin, Guying Lin, Lequan Yu
Abstract
Surgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks. Inspired by 3D Gaussian Splatting, a recent trending 3D representation, we present EndoGS, applying Gaussian Splatting for deformable endoscopic tissue reconstruction. Specifically, our approach incorporates deformation fields to handle dynamic scenes, depth-guided supervision with spatial-temporal weight masks to optimize 3D targets with tool occlusion from a single viewpoint, and surface-aligned regularization terms to capture better geometry. As a result, EndoGS reconstructs and renders high-quality deformable endoscopic tissues from a single-viewpoint video, estimated depth maps, and labeled tool masks. Experiments on DaVinci robotic surgery videos demonstrate that EndoGS achieves superior rendering quality. Paper | Code
4. EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction
Authors : Yifan Liu, Chenxin Li, Chen Yang, Yixuan Yuan
Abstract
Reconstructing deformable tissues from endoscopic stereo videos is essential in many downstream surgical applications. However, existing methods suffer from slow inference speed, which greatly limits their practical use. In this paper, we introduce EndoGaussian, a real-time surgical scene reconstruction framework that builds on 3D Gaussian Splatting. Our framework represents dynamic surgical scenes as canonical Gaussians and a time-dependent deformation field, which predicts Gaussian deformations at novel timestamps. Due to the efficient Gaussian representation and parallel rendering pipeline, our framework significantly accelerates the rendering speed compared to previous methods. In addition, we design the deformation field as the combination of a lightweight encoding voxel and an extremely tiny MLP, allowing for efficient Gaussian tracking with a minor rendering burden. Furthermore, we design a holistic Gaussian initialization method to fully leverage the surface distribution prior, achieved by searching informative points from across the input image sequence. Experiments on public endoscope datasets demonstrate that our method can achieve real-time rendering speed (195 FPS real-time, 100× gain) while maintaining the state-of-the-art reconstruction quality (35.925 PSNR) and the fastest training speed (within 2 min/scene), showing significant promise for intraoperative surgery applications. Paper | Project Page | Code
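A toy version of such a deformation field, pairing a low-resolution feature voxel grid with a tiny MLP that predicts per-Gaussian offsets (sizes and layout are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformationField(nn.Module):
    """Toy voxel-encoding + tiny-MLP deformation field (EndoGaussian-style;
    all sizes are illustrative assumptions)."""
    def __init__(self, res=32, feat=8):
        super().__init__()
        self.voxels = nn.Parameter(torch.zeros(1, feat, res, res, res))
        self.mlp = nn.Sequential(nn.Linear(feat + 1, 32), nn.ReLU(),
                                 nn.Linear(32, 3))  # predicts delta-xyz

    def forward(self, xyz, t):
        # xyz: (N, 3) canonical Gaussian centers in [-1, 1]; t: scalar time.
        grid = xyz.view(1, -1, 1, 1, 3)             # sample voxel features
        f = F.grid_sample(self.voxels, grid, align_corners=True)
        f = f.view(self.voxels.shape[1], -1).t()    # (N, feat)
        tt = torch.full((xyz.shape[0], 1), float(t))
        return self.mlp(torch.cat([f, tt], dim=1))  # (N, 3) offsets
```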
5. GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting
Authors : Butian Xiong, Zhuo Li, Zhen Li
Abstract
We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. Covering more than 1.5 km², the dataset offers a unique blend of urban and academic environments for advanced spatial analysis. Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combining multi-modal information. Paper
6. LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering
Authors : Sheng Hong, Junjie He, Xinhu Zheng, Hesheng Wang, Hao Fang, Kangcheng Liu, Chunran Zheng, Shaojie Shen
Abstract
We introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multi-modal sensor-fused mapping system that builds on differentiable surface splatting to improve mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion. This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initial poses for surface Gaussian scenes are obtained using a LiDAR-inertial system with size-adaptive voxels. The Gaussians are then optimized and refined using visually derived photometric gradients to improve the quality and density of the LiDAR measurements. Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes, bolstering structure construction through LiDAR and facilitating real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes, potentially for digital twins and virtual reality, while also holding potential applicability in real-time SLAM and robotics domains. Paper | Code (not yet)
7. VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
Authors : Ying Jiang, Chang Yu, Tianyi Xie, Xuan Li, Yutao Feng, Huamin Wang, Minchen Li, Henry Lau, Feng Gao, Yin Yang, Chenfanfu Jiang
Abstract
As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. Paper | Project Page
8. Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps
Authors : Timothy Chen, Ola Shorinwa, Weijia Zeng, Joseph Bruno, Philip Dames, Mac Schwager
Abstract
We present Splat-Nav, a navigation pipeline that consists of a real-time safe planning module and a robust state estimation module designed to operate in the Gaussian Splatting (GSplat) environment representation, a popular emerging 3D scene representation from computer vision. We formulate rigorous collision constraints that can be computed quickly to build a guaranteed-safe polytope corridor through the map. We then optimize a B-spline trajectory through this corridor. We also develop a real-time, robust state estimation module by interpreting the GSplat representation as a point cloud. The module enables the robot to localize its global pose with zero prior knowledge from RGB-D images using point cloud alignment, and then track its own pose as it moves through the scene from RGB images using image-to-point cloud localization. We also incorporate semantics into the GSplat in order to obtain better images for localization. All of these modules operate mainly on CPU, freeing up GPU resources for tasks like real-time scene reconstruction. We demonstrate the safety and robustness of our pipeline in both simulation and hardware, where we show re-planning at 5 Hz and pose estimation at 20 Hz, an order of magnitude faster than Neural Radiance Field (NeRF)-based navigation methods, thereby enabling real-time navigation. Paper
9. Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
Authors : Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille
Abstract
X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% of the training time and over 73× faster inference speed. The application on sparse-view CT reconstruction also reveals the practical values of our method. Paper
10. ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Authors : Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang
Abstract
Performing language-conditioned robotic manipulation tasks in unstructured environments is highly demanded for general intelligent robots. Conventional robotic manipulation methods usually learn semantic representation of the observation for action prediction, which ignores the scene-level spatiotemporal dynamics for human goal completion. In this paper, we propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction. Specifically, we first formulate the dynamic Gaussian Splatting framework that infers the semantics propagation in the Gaussian embedding space, where the semantic representation is leveraged to predict the optimal robot action. Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction. We evaluate our ManiGaussian on 10 RLBench tasks with 166 variations, and the results demonstrate our framework can outperform the state-of-the-art methods by 13.1% in average success rate. Paper | Project Page | Code
11. GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Authors : Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang
Abstract
Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussians to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3× lower GPU memory usage and 5× faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate an existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 1000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. Paper
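The representation is simple enough to sketch end to end. Below is a dense, illustration-only PyTorch version of the accumulated-summation renderer with the paper's 8 parameters per Gaussian (2 position, 3 covariance via a Cholesky factor, 3 color); note the dense evaluation is O(HW·N) and only practical for tiny images:

```python
import torch

def render_2d_gaussians(xy, chol, color, H, W):
    """Accumulated-summation rendering of N 2D Gaussians.

    xy:    (N, 2) centers in pixel coordinates.
    chol:  (N, 3) lower-triangular Cholesky entries (l11, l21, l22) of
           the 2x2 covariance; with color (3), that is 8 params/Gaussian.
    color: (N, 3) RGB weights. Contributions are simply summed - no
           sorting or alpha compositing, which is what makes it fast.
    """
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys], -1).float().reshape(-1, 2)     # (HW, 2)
    L = torch.zeros(xy.shape[0], 2, 2)
    L[:, 0, 0], L[:, 1, 0], L[:, 1, 1] = chol[:, 0], chol[:, 1], chol[:, 2]
    cov_inv = torch.inverse(L @ L.transpose(1, 2) + 1e-6 * torch.eye(2))
    d = pix[:, None, :] - xy[None, :, :]                       # (HW, N, 2)
    maha = torch.einsum("pni,nij,pnj->pn", d, cov_inv, d)
    w = torch.exp(-0.5 * maha)                                 # (HW, N)
    return (w @ color).reshape(H, W, 3)                        # summed image
```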
12. GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
Authors : Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang
Abstract
Constructing a 3D scene capable of accommodating open-ended language queries, is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g., NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present the GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. Paper | Code (not yet)
13. Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience
Authors : Xiaohang Yu, Zhengxian Yang, Shi Pan, Yuqi Han, Haoxiang Wang, Jun Zhang, Shi Yan, Borong Lin, Lei Yang, Tao Yu, Lu Fang
Abstract
We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRnet, NeRF, and 3D Gaussian Splatting. More importantly, the collected dataset, which is much denser than existing datasets, may also inspire space-oriented light field reconstruction, which is potentially different from object-centric 3D reconstruction, for immersive VR/AR experiences. We utilized a total of 40 GoPro 10 cameras, capturing images of 5k resolution. The number of photos captured for each scene is no less than 1000, and the average density (view number within a unit sphere) is 134.68. It is also worth noting that our system is capable of efficiently capturing large outdoor scenes. Addressing the current lack of large-space and dense light field datasets, we made efforts to include elements such as sky, reflections, lights and shadows that are of interest to researchers in the field of 3D reconstruction during the data capture process. Finally, we validated the effectiveness of our provided dataset on three popular algorithms and also integrated the reconstructed 3DGS results into the Unity engine, demonstrating the potential of utilizing our datasets to enhance the realism of virtual reality (VR) and create feasible interactive spaces. Paper
14. Modeling uncertainty for Gaussian Splatting
Authors : Luca Savant, Diego Valsesia, Enrico Magli
Abstract
We present Stochastic Gaussian Splatting (SGS): the first framework for uncertainty estimation using Gaussian Splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of Neural Radiance Fields (NeRF). However, contrary to the latter, it still lacks the ability to provide information about the confidence associated with its outputs. To address this limitation, in this paper, we introduce a Variational Inference-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. Additionally, we introduce the Area Under Sparsification Error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on the LLFF dataset demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications. Paper
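For reference, the AUSE metric itself is straightforward to implement. One common variant (not necessarily the paper's exact normalization) as a NumPy sketch:

```python
import numpy as np

def ause(errors, uncertainties, steps=20):
    """Area Under the Sparsification Error curve (illustrative variant).

    Repeatedly remove the pixels the model is least confident about and
    track the mean error of what remains; an oracle removes by true error
    instead. AUSE is the area between the two curves - 0 means the
    uncertainty perfectly ranks the errors.
    """
    err = errors.ravel()
    unc = uncertainties.ravel()
    by_unc = err[np.argsort(-unc)]   # drop most-uncertain pixels first
    by_err = err[np.argsort(-err)]   # oracle: drop largest errors first
    fracs = np.linspace(0.0, 0.99, steps)
    gap = []
    for f in fracs:
        k = int(f * err.size)
        gap.append(by_unc[k:].mean() - by_err[k:].mean())
    return np.trapz(gap, fracs)
```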
15. TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
Authors : Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu
Abstract
Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined. This enables us to render the 2D DSA image at that specific moment. Additionally, we introduce a smoothness loss term in the loss function to mitigate overfitting issues that may arise in the model when dealing with sparse view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. Paper
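The temporal mechanism reduces to a per-Gaussian table lookup with linear interpolation. A schematic sketch (names and clamping are illustrative, not the paper's code):

```python
import torch

def opacity_at_time(base_opacity, offset_table, t):
    """TOGS-style temporal opacity (schematic).

    base_opacity: (N,) static opacity of each Gaussian.
    offset_table: (N, K) learned opacity offsets at K uniformly spaced
                  timestamps covering the DSA sequence.
    t: scalar in [0, 1]; we linearly interpolate between the two
       neighboring table entries to get the offset at time t.
    """
    K = offset_table.shape[1]
    x = t * (K - 1)
    i0 = int(x)
    i1 = min(i0 + 1, K - 1)
    w = x - i0
    offset = (1 - w) * offset_table[:, i0] + w * offset_table[:, i1]
    return (base_opacity + offset).clamp(0.0, 1.0)
```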
16. GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis
Authors : Emmanouil Nikolakakis, Utkarsh Gupta, Jonathan Vengosh, Justin Bui, Razvan Marinescu
Abstract
We present GaSpCT, a novel view synthesis and 3D scene representation method used to generate novel projection views for Computer Tomography (CT) scans. We adapt the Gaussian Splatting framework to enable novel view synthesis in CT based on limited sets of 2D image projections and without the need for Structure from Motion (SfM) methodologies. Therefore, we reduce the total scanning duration and the amount of radiation dose the patient receives during the scan. We adapted the loss function to our use-case by encouraging a stronger background and foreground distinction using two sparsity promoting regularizers: a beta loss and a total variation (TV) loss. Finally, we initialize the Gaussian locations across the 3D space using a uniform prior distribution of where the brain's positioning would be expected to be within the field of view. We evaluate the performance of our model using brain CT scans from the Parkinson's Progression Markers Initiative (PPMI) dataset and demonstrate that the rendered novel views closely match the original projection views of the simulated scan, and have better performance than other implicit 3D scene representation methodologies. Furthermore, we empirically observe reduced training time compared to neural network based image synthesis for sparse-view CT image reconstruction. Finally, the memory requirements of the Gaussian Splatting representations are reduced by 17% compared to the equivalent voxel grid image representations. Paper
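The two regularizers are standard and easy to state. A minimal PyTorch sketch of one common beta-prior form and a total-variation term (the paper's exact formulation and weighting may differ):

```python
import torch

def beta_loss(opacity, eps=1e-4):
    # Minimizing log(o) + log(1 - o) pushes opacities toward 0 or 1,
    # sharpening the background/foreground split (a standard beta-prior
    # form; assumed here, not taken from the paper's code).
    return (torch.log(opacity + eps) + torch.log(1 - opacity + eps)).mean()

def tv_loss(img):
    # Total variation on a rendered projection (H, W) or (H, W, C):
    # penalizes differences between neighboring pixels.
    dh = (img[1:, :] - img[:-1, :]).abs().mean()
    dw = (img[:, 1:] - img[:, :-1]).abs().mean()
    return dh + dw
```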
17. Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
Authors : Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla
Abstract
Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view (360° viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance). Paper
18. Dual-Camera Smooth Zoom on Mobile Phones
Authors : Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo
Abstract
When zooming between dual cameras on a mobile phone, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground-truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task. Paper
19. Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction
Authors : Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano
Abstract
Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with an 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100× faster rendering and more than 10× shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer. Paper
20. SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera
Authors : Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang
Abstract
One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of the temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects and backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors. We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. Paper
21. [CVPR '24] SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection
Authors : Mathis Kruse, Marco Rudolph, Dominik Woiwode, Bodo Rosenhahn
Abstract
Detecting anomalies in images has become a well-explored problem in both academia and industry. State-of-the-art algorithms are able to detect defects in increasingly difficult settings and data modalities. However, most current methods are not suited to address 3D objects captured from differing poses. While solutions using Neural Radiance Fields (NeRFs) have been proposed, they suffer from excessive computation requirements, which hinder real-world usability. For this reason, we propose the novel 3D Gaussian splatting-based framework SplatPose which, given multi-view images of a 3D object, accurately estimates the pose of unseen views in a differentiable manner, and detects anomalies in them. We achieve state-of-the-art results in both training and inference speed, and detection performance, even when using less training data than competing methods. We thoroughly evaluate our framework using the recently proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly detection (MAD) data set. Paper | Code
22. Reinforcement Learning with Generalizable Gaussian Splatting
Authors : Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu
Abstract
An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations contain several drawbacks. They either cannot describe complex local geometries, fail to generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL. Paper
23. DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark
Authors : Tianyi Zhang, Kaining Huang, Weiming Zhi, Matthew Johnson-Roberson
Abstract
Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments. Paper | Code | Short Presentation | Short Presentation (Bilibili)
24. Adversarial Generation of Hierarchical Gaussians for 3D Generative Model
Authors : Sangeek Hyun, Jae-Pil Heo
Abstract
Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussians as a 3D representation for 3D GANs by leveraging their efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that our method achieves a significantly faster rendering speed (×100) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability. Paper | Project Page
25. Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting
Authors : Matthew Strong, Boshu Lei, Aiden Swann, Wen Jiang, Kostas Daniilidis, Monroe Kennedy III
Abstract
We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in a manner that is both photorealistic and geometrically accurate. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2), which we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, demonstrating both qualitative and quantitative improvements on challenging robot scenes. Paper | Project Page | Code
26. Radiance Fields for Robotic Teleoperation
Authors : Maximum Wilder-Smith, Vaishakh Patil, Marco Hutter
Abstract
Radiance field methods such as Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS) have revolutionized graphics and novel view synthesis. Their ability to synthesize new viewpoints with photo-realistic quality, as well as capture complex volumetric and specular scenes, makes them an ideal visualization for robotic teleoperation setups. Direct camera teleoperation provides high-fidelity operation at the cost of maneuverability, while reconstruction-based approaches offer controllable scenes with lower fidelity. With this in mind, we propose replacing the traditional reconstruction-visualization components of the robotic teleoperation pipeline with online Radiance Fields, offering highly maneuverable scenes with photorealistic quality. We make three main contributions: (1) online training of Radiance Fields using live data from multiple cameras, (2) support for a variety of radiance methods including NeRF and 3DGS, and (3) a visualization suite for these methods including a virtual reality scene. To enable seamless integration with existing setups, these components were tested with multiple robots in multiple configurations and were displayed using traditional tools as well as a VR headset. The results across methods and robots were compared quantitatively to a baseline of mesh reconstruction, and a user study was conducted to compare the different visualization methods. Paper | Project Page | Code
2023:
1. [ECCV '24] FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information
Authors : Wen Jiang, Boshu Lei, Kostas Daniilidis
Abstract
This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the limited availability of 2D images poses uncertainties stemming from occlusions, depth ambiguities, and imaging errors. Efficiently selecting informative views becomes crucial, and quantifying NeRF model uncertainty presents intricate challenges. Existing approaches either depend on model architecture or are based on assumptions regarding density distributions that are not generally applicable. By leveraging Fisher Information, we efficiently quantify observed information within Radiance Fields without ground truth data. This can be used for the next best view selection and pixel-wise uncertainty quantification. Our method overcomes existing limitations on model architecture and effectiveness, achieving state-of-the-art results in both view selection and uncertainty quantification, demonstrating its potential to advance the field of Radiance Fields. Our method with the 3D Gaussian Splatting backend could perform view selections at 70 fps. Paper | Project Page | Code
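The view score boils down to gradient magnitudes: FisherRF ranks candidate views by the information their pixels carry about the model parameters, which relates to the diagonal of JᵀJ. A simplified single-view sketch that estimates trace(JᵀJ) with one Hutchinson probe; `render_fn` is a hypothetical differentiable renderer, not the authors' API:

```python
import torch

def fisher_view_score(render_fn, params, candidate_pose):
    """Score a candidate view by approximate Fisher information.

    params: flat tensor of model parameters; render_fn(params, pose)
    returns a differentiable image. With probe v ~ N(0, I),
    E[|J^T v|^2] = trace(J^T J), a cheap randomized estimate.
    """
    params = params.detach().clone().requires_grad_(True)
    img = render_fn(params, candidate_pose)
    probe = torch.randn_like(img)            # random probe vector v
    (img * probe).sum().backward()           # grad = J^T v
    return (params.grad ** 2).sum().item()   # estimates trace(J^T J)
```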
2. Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering
Authors : Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, Li Zhang
Abstract
Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent representation learning with sparse training data, we introduce a novel flow-based temporal smoothing mechanism and a position-aware adaptive control strategy. Extensive experiments on Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes. Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits 50/6000-fold acceleration in training/rendering over the best alternative. Paper | Project Page | Code (not yet)
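The vibration idea can be summarized in a few lines: each Gaussian oscillates around a rest position and is most visible near its life peak. A schematic sketch with illustrative parameter names (the paper's exact parameterization differs in details):

```python
import torch

def pvg_state(mu, A, opacity_peak, tau, beta, t, cycle=0.2):
    """Schematic Periodic Vibration Gaussian state at time t.

    mu:           (N, 3) rest positions.
    A:            (N, 3) vibration directions/amplitudes.
    opacity_peak: (N,) peak opacities.
    tau:          (N,) life peaks; beta: (N,) lifespans.
    t:            scalar time; cycle: vibration period (assumed).
    """
    pos = mu + A * torch.sin(2 * torch.pi * (t - tau)[:, None] / cycle)
    opa = opacity_peak * torch.exp(-0.5 * ((t - tau) / beta) ** 2)
    return pos, opa
```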
3. MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians
Authors : Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar
Abstract
Understanding how we grasp objects with our hands has important applications in areas like robotics and mixed reality. However, this challenging problem requires accurate modeling of the contact between hands and objects. To capture grasps, existing methods use skeletons, meshes, or parametric models that can cause misalignments resulting in inaccurate contacts. We present MANUS, a method for Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians. We build a novel articulated 3D Gaussians representation that extends 3D Gaussian splatting for high-fidelity representation of articulating hands. Since our representation uses Gaussian primitives, it enables us to efficiently and accurately estimate contacts between the hand and the object. For the most accurate results, our method requires tens of camera views that current datasets do not provide. We therefore build MANUS-Grasps, a new dataset that contains hand-object grasps viewed from 53 cameras across 30+ scenes and 3 subjects, comprising over 7M frames. In addition to extensive qualitative results, we also show that our method outperforms others on a quantitative contact evaluation method that uses paint transfer from the object to the hand. Paper
4. [CVPR '24] Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Authors : Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, Song-Hai Zhang
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed than implicit representations while delivering rendering quality superior to explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. Paper | Project Page | Code
5. Mathematical Supplement for the gsplat Library
Authors : Vickie Ye, Angjoo Kanazawa
Abstract
This report provides the mathematical details of the gsplat library, a modular toolbox for efficient differentiable Gaussian splatting, as proposed by Kerbl et al. It provides a self-contained reference for the computations involved in the forward and backward passes of differentiable Gaussian splatting. To facilitate practical usage and development, we provide a user-friendly Python API that exposes each component of the forward and backward passes in rasterization of [gsplat](https://github.com/nerfstudio-project/gsplat). Paper
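The centerpiece of that math is the EWA projection of a 3D covariance to screen space, Σ' = J W Σ Wᵀ Jᵀ. A NumPy sketch of that single step (standalone illustration; the supplement and library implement it with their own conventions):

```python
import numpy as np

def project_covariance(t, cov_w, W, fx, fy):
    """Project a 3D Gaussian covariance to a 2x2 screen-space covariance.

    t:     (3,) camera-space mean (after applying the extrinsics).
    cov_w: (3, 3) world-space covariance.
    W:     (3, 3) world-to-camera rotation.
    fx, fy: focal lengths in pixels.
    """
    # Jacobian of the perspective projection, linearized at t.
    J = np.array([[fx / t[2], 0.0, -fx * t[0] / t[2] ** 2],
                  [0.0, fy / t[2], -fy * t[1] / t[2] ** 2]])
    cov_cam = W @ cov_w @ W.T        # rotate covariance into camera frame
    return J @ cov_cam @ J.T         # Sigma' = J W Sigma W^T J^T
```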
6. PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation
Authors : Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae
Abstract
PEGASUS is a physically enhanced Gaussian Splatting simulation system for generating 6DOF object pose estimation datasets, composing Gaussian Splatting reconstructions of environments and objects into physically plausible synthetic scenes. Paper | Project Page | Code (not yet)
Regularization and Optimization:
2024:
1. DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines
Authors : Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Nandita Vijaykumar
Abstract
Differentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (e.g., 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high-quality, photo-realistic imagery at high speeds from these learned 3D models. These methods have been demonstrated to be very promising, providing state-of-the-art quality for many important tasks. However, training a model to represent a scene is still a time-consuming task even when using powerful GPUs. In this work, we observe that the gradient computation phase during training is a significant bottleneck on GPUs due to the large number of atomic operations that need to be processed. These atomic operations overwhelm atomic units in the L2 partitions, causing stalls. To address this challenge, we leverage the observations that during the gradient computation: (1) for most warps, all threads atomically update the same memory locations; and (2) warps generate varying amounts of atomic traffic (since some threads may be inactive). We propose DISTWAR, a software approach to accelerate atomic operations based on two key ideas: First, we enable warp-level reduction of threads at the SM sub-cores using registers to leverage the locality in intra-warp atomic updates. Second, we distribute the atomic computation between the warp-level reduction at the SM and the L2 atomic units to increase the throughput of atomic computation. Warps with many threads performing atomic updates to the same memory locations are scheduled at the SM, and the rest use L2 atomic units. We implement DISTWAR using existing warp-level primitives. We evaluate DISTWAR on widely used raster-based differentiable rendering workloads. We demonstrate significant speedups of 2.44x on average (up to 5.7x). Paper
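DISTWAR's key observation, that updates targeting the same address can be reduced before they ever reach an atomic unit, can be illustrated outside CUDA. A toy numpy analogy of the pre-reduction idea (my construction, not the paper's warp-level implementation):

```python
import numpy as np

# Gradient accumulation in splatting: many "threads" (rows) atomically
# add into the same per-Gaussian slots. The idea, in miniature: reduce
# updates that share a destination first, then issue one add per group.
num_gaussians = 4
dest = np.array([0, 0, 0, 2, 2, 3])           # destination Gaussian per update
grads = np.array([0.1, 0.2, 0.3, 1.0, 1.0, 5.0])

# Naive: one scatter-add per update (6 "atomic" operations).
acc_naive = np.zeros(num_gaussians)
np.add.at(acc_naive, dest, grads)

# Pre-reduced: sum within each destination group first (3 adds).
uniq, inv = np.unique(dest, return_inverse=True)
partial = np.bincount(inv, weights=grads)
acc_reduced = np.zeros(num_gaussians)
acc_reduced[uniq] += partial

assert np.allclose(acc_naive, acc_reduced)  # same result, far fewer "atomics"
```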
2. [CVPR '24] FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Authors : Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing
Abstract
3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples, and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently. Paper
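FreGS builds on two standard ingredients: band-splitting an image in Fourier space and penalizing the gap between the spectra of the render and the ground truth. A sketch of both in numpy (the paper's exact filters and progressive annealing schedule are not reproduced here):

```python
import numpy as np

def band_split(img, cutoff=0.1):
    """Split an image into low/high frequency bands with an ideal filter
    in Fourier space; a simplified stand-in for the low-pass/high-pass
    extraction described in the abstract."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    mask = np.hypot(yy / h, xx / w) <= cutoff
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    return low, img - low

def spectrum_discrepancy(rendered, target):
    """L1 gap between amplitude spectra: the kind of frequency-space
    objective a progressive regularizer can anneal from low to high."""
    return np.abs(np.abs(np.fft.fft2(rendered)) - np.abs(np.fft.fft2(target))).mean()

img = np.random.rand(64, 64)
low, high = band_split(img)
print(spectrum_discrepancy(low, img))  # discrepancy due to the missing high band
```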
3. RAIN-GS: Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting
Authors : Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim
Abstract
3D Gaussian splatting (3DGS) has recently demonstrated impressive capabilities in real-time novel view synthesis and 3D reconstruction. However, 3DGS heavily depends on the accurate initialization derived from Structure-from-Motion (SfM) methods. When trained with randomly initialized point clouds, 3DGS often fails to maintain its ability to produce high-quality images, generally undergoing large performance drops of 4-5 dB in PSNR. Through extensive analysis of SfM initialization in the frequency domain and analysis of a 1D regression task with multiple 1D Gaussians, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate INitialization Constraint for 3D Gaussian Splatting) that successfully trains 3D Gaussians from randomly initialized point clouds. We show the effectiveness of our strategy through quantitative and qualitative comparisons on standard datasets, largely improving the performance in all settings. Paper | Project Page | Code
4. A New Split Algorithm for 3D Gaussian Splatting
Authors : Qiyuan Feng, Gengchen Cao, Haoxiang Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu
Abstract
3D Gaussian splatting models, as a novel explicit 3D representation, have been applied in many domains recently, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to flatten large untextured regions, yielding a very sparse point cloud. These problems are caused by the non-uniform nature of 3D Gaussian splatting models, so in this paper, we propose a new 3D Gaussian splitting algorithm, which can produce a more uniform and surface-bounded 3D Gaussian splatting model. Our algorithm splits an N-dimensional Gaussian into two N-dimensional Gaussians. It ensures consistency of mathematical characteristics and similarity of appearance, allowing resulting 3D Gaussian splatting models to be more uniform and a better fit to the underlying surface, and thus more suitable for explicit editing, point cloud extraction and other tasks. Meanwhile, our 3D Gaussian splitting approach has a very simple closed-form solution, making it readily applicable to any 3D Gaussian model. Paper
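Any such split must preserve the mixture's first two moments so the model's appearance stays consistent. A generic moment-matching split along the dominant axis, sketched in numpy (the paper's closed-form solution may differ in its details):

```python
import numpy as np

def split_gaussian(mu, Sigma, frac=0.5):
    """Split N(mu, Sigma) into two equal-weight Gaussians whose mixture
    preserves the first two moments. Children are offset along the
    dominant principal direction; their shared covariance is shrunk so
    the mixture covariance still equals Sigma (PSD for frac < 1)."""
    lam, V = np.linalg.eigh(Sigma)
    v = V[:, -1]                        # dominant principal direction
    a = np.sqrt(lam[-1] * frac) * v     # offset of the two children
    Sigma_child = Sigma - np.outer(a, a)
    return (mu + a, Sigma_child), (mu - a, Sigma_child)

(mu1, S1), (mu2, S2) = split_gaussian(np.zeros(3), np.diag([4.0, 1.0, 0.25]))
# Mixture moments match the parent:
assert np.allclose(0.5 * (mu1 + mu2), np.zeros(3))
mix_cov = 0.5 * (S1 + np.outer(mu1, mu1)) + 0.5 * (S2 + np.outer(mu2, mu2))
assert np.allclose(mix_cov, np.diag([4.0, 1.0, 0.25]))
```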
5. Revising Densification in Gaussian Splatting
Authors : Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Abstract
In this paper, we address the limitations of Adaptive Density Control (ADC) in 3D Gaussian Splatting (3DGS), a scene representation method achieving high-quality, photorealistic results for novel view synthesis. ADC was introduced for automatic 3D point primitive management, controlling densification and pruning; however, it has certain limitations in the densification logic. Our main contribution is a more principled, pixel-error driven formulation for density control in 3DGS, leveraging an auxiliary, per-pixel error function as the criterion for densification. We further introduce a mechanism to control the total number of primitives generated per scene and correct a bias in the current opacity handling strategy of ADC during cloning operations. Our approach leads to consistent quality improvements across a variety of benchmark scenes, without sacrificing the method's efficiency. Paper
2023:
1. [CVPRW '24] Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images
Authors : Jaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee
Abstract
In this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide to mitigate overfitting. We obtain the depth map using a pre-trained monocular depth estimation model and align its scale and offset using sparse COLMAP feature points. The adjusted depth aids in the color-based optimization of 3D Gaussian splatting, mitigating floating artifacts, and ensuring adherence to geometric constraints. We verify the proposed method on the NeRF-LLFF dataset with varying numbers of few images. Our approach demonstrates robust geometry compared to the original method that relies solely on images. Paper | Project Page | Code
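The scale-and-offset alignment of monocular depth to sparse SfM points is a small least-squares problem. A minimal numpy sketch of that step, using synthetic values:

```python
import numpy as np

def align_depth(mono_depth, sparse_depth):
    """Fit scale s and offset t so that s * mono + t matches sparse
    COLMAP depths in the least-squares sense; a minimal version of the
    alignment step described in the abstract."""
    A = np.stack([mono_depth, np.ones_like(mono_depth)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, sparse_depth, rcond=None)
    return s, t

# Monocular depth is typically correct only up to an affine ambiguity:
mono = np.array([1.0, 2.0, 3.0, 4.0])
colmap = 2.5 * mono + 0.3          # depths at matched feature points
s, t = align_depth(mono, colmap)
print(s, t)                        # ~2.5, ~0.3
```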
2. EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Authors : Sharath Girish, Kamal Gupta, Abhinav Shrivastava
Abstract
Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach results in scene representations with fewer Gaussians and quantized representations, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce memory by more than an order of magnitude, all while maintaining the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes, preserving visual quality while consuming 10-20x less memory and achieving faster training/inference speeds. Paper | Project Page | Code
3. [CVPR '24] COLMAP-Free 3D Gaussian Splatting
Authors : Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang
Abstract
While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the set of 3D Gaussians by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes. Paper | Project Page | Code (not yet) | Short Presentation
4. iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching
Authors : Yuan Sun, Xuan Wang, Yunfan Zhang, Jie Zhang, Caigui Jiang, Yu Guo, Fei Wang
Abstract
We present a method named iComMa to address the 6D pose estimation problem in computer vision. The conventional pose estimation methods typically rely on the target's CAD model or necessitate specific network training tailored to particular object classes. Some existing methods address mesh-free 6D pose estimation by employing the inversion of a Neural Radiance Field (NeRF), aiming to overcome the aforementioned constraints. However, it still suffers from adverse initializations. By contrast, we model pose estimation as the problem of inverting 3D Gaussian Splatting (3DGS) with both comparing and matching losses. In detail, a render-and-compare strategy is adopted for the precise estimation of poses. Additionally, a matching module is designed to enhance the model's robustness against adverse initializations by minimizing the distances between 2D keypoints. This framework systematically incorporates the distinctive characteristics and inherent rationale of render-and-compare and matching-based approaches. This comprehensive consideration equips the framework to effectively address a broader range of intricate and challenging scenarios, including instances with substantial angular deviations, all while maintaining a high level of prediction accuracy. Experimental results demonstrate the superior precision and robustness of our proposed jointly optimized framework when evaluated on synthetic and complex real-world data in challenging scenarios. Paper | Code
Representation:
2024:
1. [CVPR '24] Gaussian Shadow Casting for Neural Characters
Authors : Luis Bolanos, Shih-Yang Su, Helge Rhodin
Abstract
Neural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density proxy that replaces sampling with a simple analytic formula. It supports dynamic motion and is tailored for shadow computation, thereby avoiding the affine projection approximation and sorting required by the closely related Gaussian splatting. Combined with a deferred neural rendering model, our Gaussian shadows enable Lambertian shading and shadow casting with minimal overhead. We demonstrate improved reconstructions, with better separation of albedo, shading, and shadows in challenging outdoor scenes with direct sunlight and hard shadows. Our method is able to optimize the light direction without any input from the user. As a result, novel poses have fewer shadow artifacts and relighting in novel scenes is more realistic compared to the state-of-the-art methods, providing new ways to pose neural characters in novel environments, increasing their applicability. Paper
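The "simple analytic formula" that replaces secondary-ray sampling comes down to the closed-form line integral of a Gaussian density along a ray. A numpy sketch with a quadrature sanity check (illustrative only; the paper's shading model builds more on top):

```python
import numpy as np

def gaussian_ray_integral(o, d, mu, Sigma, amp=1.0):
    """Closed-form line integral of a 3D Gaussian density along the ray
    o + t*d (t over the whole real line). An analytic expression like
    this lets a Gaussian density proxy replace ray-marched samples."""
    P = np.linalg.inv(Sigma)
    delta = o - mu
    a = d @ P @ d
    b = d @ P @ delta
    c = delta @ P @ delta
    return amp * np.sqrt(2.0 * np.pi / a) * np.exp(-0.5 * (c - b * b / a))

# Sanity check against brute-force quadrature:
o, d = np.array([0.0, 0.0, -5.0]), np.array([0.0, 0.0, 1.0])
mu, Sigma = np.zeros(3), np.diag([0.5, 0.5, 0.25])
t = np.linspace(-20.0, 20.0, 40001)
pts = o[None, :] + t[:, None] * d[None, :]
q = np.einsum('ni,ij,nj->n', pts - mu, np.linalg.inv(Sigma), pts - mu)
numeric = np.exp(-0.5 * q).sum() * (t[1] - t[0])
print(gaussian_ray_integral(o, d, mu, Sigma), numeric)  # both ~1.2533
```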
2. Optimal Projection for 3D Gaussian Splatting
Authors : Letian Huang, Jiayang Bai, Jie Guo, Yanwen Guo
Abstract
3D Gaussian Splatting has garnered extensive attention and application in real-time neural rendering. Concurrently, concerns have been raised about the limitations of this technology in aspects such as point cloud storage, performance, and robustness in sparse viewpoints, leading to various improvements. However, there has been a notable lack of attention to the projection errors introduced by the local affine approximation inherent in the splatting itself, and the consequential impact of these errors on the quality of photo-realistic rendering. This paper addresses the projection error function of 3D Gaussian Splatting, commencing with the residual error from the first-order Taylor expansion of the projection function ϕ. The analysis establishes a correlation between the error and the Gaussian mean position. Subsequently, leveraging function optimization theory, this paper analyzes the function's minima to provide an optimal projection strategy for Gaussian Splatting, referred to as Optimal Gaussian Splatting. Experimental validation further confirms that this projection methodology reduces artifacts, resulting in a more convincingly realistic rendering. Paper
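In standard notation, the residual in question is the first-order Taylor remainder of the projection function (a paraphrase of the setup, not the paper's derivation):

```latex
\phi(\mathbf{x}) = \underbrace{\phi(\boldsymbol{\mu}) + \mathbf{J}_{\phi}(\boldsymbol{\mu})\,(\mathbf{x} - \boldsymbol{\mu})}_{\text{local affine approximation used by splatting}} + R(\mathbf{x}),
\qquad \lVert R(\mathbf{x}) \rVert = O\!\left( \lVert \mathbf{x} - \boldsymbol{\mu} \rVert^{2} \right)
```

Per the abstract, the magnitude of this remainder correlates with the Gaussian's mean position, and the proposed optimal projection follows from analyzing the minima of the resulting error function.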
3. 360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming
Authors : Letian Huang, Jiayang Bai, Jie Guo, Yanwen Guo
Abstract
3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of 360° images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (e.g., walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel 360° Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians. We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios. Paper
4. StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering
Authors : Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, Markus Steinberger
Abstract
Gaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussians to 2D splats with a single view-space depth introduces popping and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort proves excessively costly compared to a global sort operation. In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead. Our software rasterizer effectively eliminates popping artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. Simultaneously, our method mitigates the potential for cheating view-dependent effects with popping, ensuring a more authentic representation. Despite the elimination of cheating, our approach achieves comparable quantitative results for test images, while increasing the consistency for novel view synthesis in motion. Due to its design, our hierarchical approach is only 4% slower on average than the original Gaussian Splatting. Notably, enforcing consistency enables a reduction in the number of Gaussians by approximately half with nearly identical quality and view-consistency. Consequently, rendering performance is nearly doubled, making our approach 1.6x faster than the original Gaussian Splatting, with a 50% reduction in memory requirements. Paper | Project Page | Code | Short Presentation
5. [CVPR '24] GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Authors : Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
Abstract
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs the Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both a principled 1D setup and realistic 3D scenes. It is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting naturally occurring signals (e.g., squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%. Paper | Project Page | Code | Presentation
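The Generalized Exponential Function adds a single shape parameter β to the Gaussian kernel: β = 2 recovers a Gaussian, while larger β flattens the top and sharpens the falloff, which is what lets fewer primitives fit hard edges. A small numpy illustration:

```python
import numpy as np

def gef(x, mu=0.0, alpha=1.0, beta=2.0, amp=1.0):
    """Generalized Exponential Function. beta = 2 recovers a Gaussian
    (up to the variance parameterization); larger beta gives flatter
    tops and sharper falloff, i.e. more edge-like profiles."""
    return amp * np.exp(-((np.abs(x - mu) / alpha) ** beta))

x = np.linspace(-3, 3, 7)
print(gef(x, beta=2.0))   # smooth, Gaussian-like profile
print(gef(x, beta=8.0))   # near box-like: sharp edges from one primitive
```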
6. Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting
Authors : Joongho Jo, Hyeongwon Kim, Jongsun Park
Abstract
3D Gaussian splatting (3D-GS) is a new rendering approach that outperforms the neural radiance field (NeRF) in terms of both speed and image quality. 3D-GS represents 3D scenes by utilizing millions of 3D Gaussians and projects these Gaussians onto the 2D image plane for rendering. However, during the rendering process, a substantial number of unnecessary 3D Gaussians exist for the current view direction, resulting in significant computation costs associated with their identification. In this paper, we propose a computational reduction technique that quickly identifies unnecessary 3D Gaussians in real-time for rendering the current view without compromising image quality. This is accomplished through the offline clustering of 3D Gaussians that are close in distance, followed by the projection of these clusters onto a 2D image plane during runtime. Additionally, we analyze the bottleneck associated with the proposed technique when executed on GPUs and propose an efficient hardware architecture that seamlessly supports the proposed scheme. For the Mip-NeRF360 dataset, the proposed technique excludes 63% of 3D Gaussians on average before the 2D image projection, which reduces the overall rendering computation by almost 38.3% without sacrificing peak signal-to-noise ratio (PSNR). The proposed accelerator also achieves a speedup of 10.7x compared to a GPU. Paper
7. GaussianPro: 3D Gaussian Splatting with Progressive Propagation
Authors : Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen
Abstract
The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always fail to produce enough points on these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR. Paper | Project Page | Code
8. Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting
Authors : Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin
Abstract
The recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components. This issue stems from the limited ability of spherical harmonics (SH) to represent high-frequency information. To overcome this challenge, we introduce Spec-Gaussian, an approach that utilizes an anisotropic spherical Gaussian (ASG) appearance field instead of SH for modeling the view-dependent appearance of each 3D Gaussian. Additionally, we have developed a coarse-to-fine training strategy to improve learning efficiency and eliminate floaters caused by overfitting in real-world scenes. Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality. Thanks to ASG, we have significantly improved the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians. This improvement extends the applicability of 3D-GS to handle intricate scenarios with specular and anisotropic surfaces. Paper
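Anisotropic spherical Gaussians have a known closed form (Xu et al. 2013); the sketch below evaluates one such lobe for a view direction, which is the kind of quantity an ASG appearance field would predict per 3D Gaussian (an assumption-laden illustration, not the paper's implementation):

```python
import numpy as np

def asg(v, frame, lam, mu_sharp, amp=1.0):
    """Evaluate an anisotropic spherical Gaussian lobe for unit view
    direction v. `frame` = (x, y, z): orthonormal tangent, bitangent,
    and lobe axes; lam and mu_sharp set the sharpness along x and y,
    giving the anisotropy that plain SH struggles to capture."""
    x, y, z = frame
    smooth = np.maximum(v @ z, 0.0)   # clamped-cosine lobe term
    return amp * smooth * np.exp(-lam * (v @ x) ** 2 - mu_sharp * (v @ y) ** 2)

frame = (np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([0, 0, 1.0]))
print(asg(np.array([0.0, 0.0, 1.0]), frame, lam=5.0, mu_sharp=1.0))  # on-lobe
print(asg(np.array([0.6, 0.0, 0.8]), frame, lam=5.0, mu_sharp=1.0))  # anisotropic falloff
```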
9. [CVPR '24] VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Authors : Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang
Abstract
Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering. Paper | Project Page | Code
10. 3D Gaussian Model for Animation and Texturing
Authors : Xiangzhi Eric Wang, Zackary PT Sin
Abstract
3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We propose a model analogous to typical 3D models, which we call the 3D Gaussian Model (3DGM); it provides a manipulatable proxy for novel animation and texture transfer. By binding the 3D Gaussians in texture space and re-projecting them back to world space through implicit shell mapping, we show how our 3D modeling can serve as a valid rendering methodology for interactive applications. It is further noted that recently, 3D mesh reconstruction works have been able to produce high-quality meshes for rendering. Our work, on the other hand, only requires an approximated geometry for rendering an object in high fidelity. Application-wise, we show that our proxy-based 3DGM is capable of driving novel animation without animated training data and of transferring textures via UV mapping of the 3D Gaussians. We believe the results indicate the potential of our work for enabling interactive applications for 3D Gaussian Splatting. Paper
11. BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Authors : Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa
Abstract
Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various kinds of image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacities such that a 3D-consistent and high quality scene can be reconstructed despite image-wise blur. Specifically, we model blur by estimating per-pixel convolution kernels from a Blur Proposal Network (BPN). BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. Additionally, BPN also proposes a quality-assessing mask, which indicates regions where blur occurs. Finally, we introduce a coarse-to-fine kernel optimization scheme; this optimization scheme is fast and avoids sub-optimal solutions due to a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion on blurry images. We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometry, while significantly improving upon existing approaches. Paper | Code
12. StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting
Authors : Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu
Abstract
We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency. Paper | Project Page | Code
13. Gaussian Splatting in Style
Authors : Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Tarun Yenamandra, Daniel Cremers
Abstract
Scene stylization extends the work of neural style transfer to three spatial dimensions. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across a multi-view setting. A vast majority of the previous works achieve this by optimizing the scene with a specific style image. In contrast, we propose a novel architecture trained on a collection of style images that, at test time, produces high quality stylized novel views. Our work builds upon the framework of 3D Gaussian splatting. For a given scene, we take the pretrained Gaussians and process them using a multi-resolution hash grid and a tiny MLP to obtain the conditional stylized views. The explicit nature of 3D Gaussians gives us inherent advantages over NeRF-based methods, including geometric consistency, along with a fast training and rendering regime. This enables our method to be useful for vast practical use cases such as augmented or virtual reality applications. Through our experiments, we show our method achieves state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data. Paper
14. BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
Authors : Lingzhe Zhao, Peng Wang, Peidong Liu
Abstract
While neural rendering has demonstrated impressive capabilities in 3D scene reconstruction and novel view synthesis, it heavily relies on high-quality sharp images and accurate camera poses. Numerous approaches have been proposed to train Neural Radiance Fields (NeRF) with motion-blurred images, commonly encountered in real-world scenarios such as low-light or long-exposure conditions. However, the implicit representation of NeRF struggles to accurately recover intricate details from severely motion-blurred images and cannot achieve real-time rendering. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction and real-time rendering by explicitly optimizing point clouds into 3D Gaussians. In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction. Our method models the physical image formation process of motion-blurred images and jointly learns the parameters of Gaussians while recovering camera motion trajectories during exposure time. In our experiments, we demonstrate that BAD-Gaussians not only achieves superior rendering quality compared to previous state-of-the-art deblur neural rendering methods on both synthetic and real datasets but also enables real-time rendering capabilities. Paper | Project Page | Code
15. SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Authors : Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou
Abstract
Implicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to capture photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works, achieving state-of-the-art results with improved efficiency. Paper
16. GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Authors : Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari
Abstract
During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue, we propose a novel approach called GeoGaussian. Based on the smoothly connected areas observed from point clouds, this method introduces a novel pipeline to initialize thin Gaussians aligned with the surfaces, whose characteristics can be transferred to new generations of Gaussians through a carefully designed densification strategy. Finally, the pipeline ensures that the scene's geometry and texture are maintained through constrained optimization processes with explicit geometry constraints. Benefiting from the proposed architecture, the generative ability of 3D Gaussians is enhanced, especially in structured regions. Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction, as evaluated qualitatively and quantitatively on public datasets. Paper
17. Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Authors : Zhihao Liang, Qi Zhang, Wenbo Hu, Ying Feng, Lei Zhu, Kui Jia
Abstract
3D Gaussian Splatting (3DGS) has recently gained popularity by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) in a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation in the two-dimensional pixel shading, and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D-pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area to participate in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to the changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability that gives more details and better fidelity. Paper
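The central trick is to replace the point sample at a pixel center with an integral over the pixel window, computed as a difference of approximated CDFs. A 1D numpy sketch using the common logistic approximation Φ(x) ≈ sigmoid(1.702x) (the paper conditions its approximation further):

```python
import numpy as np
from math import erf

def logistic_cdf(x, sigma=1.0, k=1.702):
    """Logistic approximation of the Gaussian CDF, Phi(x/sigma) ~
    sigmoid(1.702 * x / sigma); a stand-in for the conditioned
    closed form the paper derives."""
    return 1.0 / (1.0 + np.exp(-k * x / sigma))

def pixel_response(center, sigma):
    """Integral of a 1D Gaussian over a unit pixel window at `center`,
    via CDF differences, instead of a single point sample."""
    return logistic_cdf(center + 0.5, sigma) - logistic_cdf(center - 0.5, sigma)

def exact_response(center, sigma):
    a = (center - 0.5) / (sigma * np.sqrt(2.0))
    b = (center + 0.5) / (sigma * np.sqrt(2.0))
    return 0.5 * (erf(b) - erf(a))

for s in (0.3, 1.0, 3.0):  # approximation tracks the exact integral
    print(s, pixel_response(0.2, s), exact_response(0.2, s))
```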
18. Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
Authors : Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin
Abstract
High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on detailed modelling of the physical image formation process and utilizes velocities estimated using visual-inertial odometry (VIO). Camera poses are considered non-static during the exposure time of a single image frame and camera poses are further optimized in the reconstruction process. We formulate a differentiable rendering pipeline that leverages screen space approximation to efficiently incorporate rolling-shutter and motion blur effects into the 3DGS framework. Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods, thereby advancing 3DGS in naturalistic settings. Paper | Code | Project Page
19. RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
Authors : Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari
Abstract
Recent advances in view synthesis and real-time rendering have achieved photorealistic quality at impressive rendering speeds. While Radiance Field-based methods achieve state-of-the-art quality in challenging scenarios such as in-the-wild captures and large-scale scenes, they often suffer from excessively high compute requirements linked to volumetric rendering. Gaussian Splatting-based methods, on the other hand, rely on rasterization and naturally achieve real-time rendering but suffer from brittle optimization heuristics that underperform on more challenging scenes. In this work, we present RadSplat, a lightweight method for robust real-time rendering of complex scenes. Our main contributions are threefold. First, we use radiance fields as a prior and supervision signal for optimizing point-based scene representations, leading to improved quality and more robust optimization. Next, we develop a novel pruning technique reducing the overall point count while maintaining high quality, leading to smaller and more compact scene representations with faster inference speeds. Finally, we propose a novel test-time filtering approach that further accelerates rendering and allows scaling to larger, house-sized scenes. We find that our method enables state-of-the-art synthesis of complex captures at 900+ FPS. Paper | Project Page
20. Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Authors : Guangchi Fang, Bing Wang
Abstract
In this study, we explore the challenge of efficiently representing scenes with a constrained number of Gaussians. Our analysis shifts from traditional graphics and 2D computer vision to the perspective of point clouds, highlighting the inefficient spatial distribution of Gaussian representation as a key limitation in model performance. To address this, we introduce strategies for densification including blur split and depth reinitialization, and simplification through Gaussian binarization and sampling. These techniques reorganize the spatial positions of the Gaussians, resulting in significant improvements across various datasets and benchmarks in terms of rendering quality, resource consumption, and storage compression. Our proposed Mini-Splatting method integrates seamlessly with the original rasterization pipeline, providing a strong baseline for future research in Gaussian-Splatting-based works. Paper
21. Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Authors : Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao
Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the average gradient magnitude of points from observable views, thereby failing to grow large Gaussians that are observable from many viewpoints but only covered at the boundaries in most of them. To this end, we propose a novel method, named Pixel-GS, to take into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as the weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive experiments both qualitatively and quantitatively demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, on the challenging Mip-NeRF 360 and Tanks & Temples datasets. Paper
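The criterion change is easy to state: weight each view's gradient by the number of pixels the Gaussian covers in that view instead of averaging views uniformly. A toy numpy sketch of why this matters for large Gaussians (illustrative numbers only):

```python
import numpy as np

def growth_score(view_grads, pixel_counts):
    """Pixel-aware gradient averaging: weight each view's gradient
    magnitude by the number of pixels the Gaussian covers there,
    rather than taking a plain mean over views."""
    w = np.asarray(pixel_counts, dtype=float)
    g = np.asarray(view_grads, dtype=float)
    return (w * g).sum() / w.sum()

# A large Gaussian seen by many views, mostly at its low-gradient fringe:
grads  = np.array([0.002, 0.003, 0.002, 0.150])   # per-view gradient magnitude
pixels = np.array([4,     5,     3,     2000])    # pixels covered per view
print(grads.mean())                 # plain mean is diluted by fringe views
print(growth_score(grads, pixels))  # pixel-weighted score favors densification
```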
22. Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
Authors : Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang
Abstract
Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique appearance of each tiny point in a scene is determined by its independent intrinsic material attributes and the varying environmental impacts it receives. Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separated intrinsic and dynamic appearance features for each point, capturing the unchanged scene appearance along with dynamic variation like illumination and weather. Additionally, an adaptive sampling strategy is presented to allow each Gaussian point to focus on the local and detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. Experiments demonstrate that GS-W achieves better reconstruction quality and finer details than previous methods, with a 1000× increase in rendering speed. Paper | Project Page | Code (not yet) | Short Presentation
23. GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction
Authors : Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, Bo Dai
Abstract
Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural implicit surfaces was sparked by the success of neural rendering. Current works either constrain the distribution of density fields or the shape of primitives, resulting in degraded rendering quality and flaws on the learned scene surfaces. The efficacy of such methods is limited by the inherent constraints of the chosen neural representation, which struggles to capture fine surface details, especially for larger, more intricate scenes. To address these issues, we introduce GSDF, a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting (3DGS) representation with neural Signed Distance Fields (SDF). The core idea is to leverage and enhance the strengths of each branch while alleviating their limitations through mutual guidance and joint supervision. We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions, and at the same time benefits 3DGS rendering with structures that are more aligned with the underlying geometry. Paper | Project Page | Code (not yet)
24. Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
Authors : Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai
Abstract
The recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularly noticeable in zoom-out views and can lead to inconsistent rendering speeds in scenes with varying details. Moreover, it often struggles to capture the corresponding level of details at different scales with its heuristic density control operation. Inspired by the Level-of-Detail (LOD) techniques, we introduce Octree-GS, featuring an LOD-structured 3D Gaussian approach supporting level-of-detail decomposition for scene representation that contributes to the final rendering results. Our model dynamically selects the appropriate level from the set of multi-resolution anchor points, ensuring consistent rendering performance with adaptive LOD adjustments while maintaining high-fidelity rendering results. Paper | Project Page | Code (not yet)
25. SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
Authors : Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao
Abstract
In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting requires modifying the training procedure of Gaussian splatting, our method functions at test-time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of testing frequency can effectively match the Gaussian scale, thus making the Gaussian primitive distribution remain consistent across different testing frequencies. When scale inconsistency is eliminated, sampling rates smaller than the scene frequency result in conventional jaggedness, and we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, which significantly improves anti-aliasing performance over vanilla Gaussian Splatting. Through extensive experiments using various settings and both bounded and unbounded scenes, we show SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. Paper | Project Page | Code
26. Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Authors : Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison
Abstract
Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis. Our method optimises 3D Gaussian primitives to accurately model the object's geometry at points of contact. By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map. Touch is particularly useful when considering non-Lambertian objects (e.g., shiny or reflective surfaces), since contemporary methods tend to fail to faithfully reconstruct specular highlights. By combining vision and tactile sensing, we achieve more accurate geometry reconstructions with fewer images than prior methods. We conduct evaluations on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach, offering significant improvements in reconstruction quality. Paper
27. 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting
Authors : Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison
Abstract
In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. First, we introduce a differentiable SDF-to-opacity transformation function that converts SDF values into corresponding Gaussians' opacities. This function connects the SDF and 3D Gaussians, allowing for unified optimization and enforcing surface constraints on the 3D Gaussians. During learning, optimizing the 3D Gaussians provides supervisory signals for SDF learning, enabling the reconstruction of intricate details. However, this only provides sparse supervisory signals to the SDF at locations occupied by Gaussians, which is insufficient for learning a continuous SDF. Then, to address this limitation, we incorporate volumetric rendering and align the rendered geometric attributes (depth, normal) with those derived from 3D Gaussians. This consistency regularization introduces supervisory signals to locations not covered by discrete 3D Gaussians, effectively eliminating redundant surfaces outside the Gaussian sampling range. Our extensive experimental results demonstrate that our 3DGSR method enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS. Moreover, our method competes favorably with leading surface reconstruction techniques while offering a more efficient learning process and much better rendering quality. Paper | Code (not yet)
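The SDF-to-opacity transform only needs to be differentiable and peaked on the zero level set so that opacity concentrates at the surface. One plausible form, sketched in numpy (an assumed stand-in; the paper's exact transform may differ):

```python
import numpy as np

def sdf_to_opacity(s, w=0.05):
    """A differentiable SDF-to-opacity transform: opacity peaks on the
    zero level set and decays with |SDF|, so Gaussians far from the
    implicit surface fade out. Generic bump, not the paper's formula."""
    return np.exp(-0.5 * (s / w) ** 2)

# Opacity is ~1 exactly on the surface and drops off symmetrically:
print(sdf_to_opacity(np.array([-0.2, -0.05, 0.0, 0.05, 0.2])))
```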
28. Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting
Authors : Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma
Abstract
3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. Paper
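Plane-mirror imaging yields the mirrored viewpoint in closed form via a Householder reflection. A numpy sketch of that geometric step (handedness bookkeeping and mirror-region masking are omitted):

```python
import numpy as np

def mirror_camera(c, R, n, d):
    """Reflect a camera (center c, camera-to-world rotation R) across
    the mirror plane {x : n.x = d}, with n unit-length. Note that the
    reflection flips handedness, which a full renderer must account
    for; this sketch only computes the mirrored pose."""
    H = np.eye(3) - 2.0 * np.outer(n, n)   # Householder reflection
    c_m = c - 2.0 * (n @ c - d) * n        # reflect the camera center
    R_m = H @ R                            # reflect each camera axis
    return c_m, R_m

n = np.array([0.0, 0.0, 1.0])              # mirror: the z = 0 plane
c_m, R_m = mirror_camera(np.array([0.2, 0.0, 1.5]), np.eye(3), n, d=0.0)
print(c_m)  # [0.2, 0.0, -1.5]: the virtual camera behind the mirror
```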
29. OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images
Authors : Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma
Abstract
Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. To benefit the research community, the code will be made publicly available once the paper is published. Paper
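The spherical camera model at the heart of such a rasterizer maps view directions to equirectangular pixels. A minimal numpy sketch of one common convention for that mapping (not the paper's code):

```python
import numpy as np

def dir_to_equirect(d, width, height):
    """Map a view direction to equirectangular pixel coordinates via
    longitude/latitude: the screen space onto which an omnidirectional
    rasterizer splats. Conventions for axes and image origin vary."""
    d = d / np.linalg.norm(d)
    lon = np.arctan2(d[0], d[2])                # [-pi, pi]
    lat = np.arcsin(np.clip(d[1], -1.0, 1.0))   # [-pi/2, pi/2]
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (lat / np.pi + 0.5) * height
    return u, v

print(dir_to_equirect(np.array([0.0, 0.0, 1.0]), 2048, 1024))  # image center
print(dir_to_equirect(np.array([1.0, 0.0, 0.0]), 2048, 1024))  # quarter turn right
```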
30. Robust Gaussian Splatting
Authors : François Darmon, Lorenzo Porzi, Samuel Rota-Bulò, Peter Kontschieder
Abstract
In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we propose mechanisms for defocus blur compensation and for addressing color inconsistencies caused by ambient light, shadows, or camera-related factors like varying white balance settings. Our proposed solutions integrate seamlessly with the 3DGS formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines. Paper
31. DeblurGS: Gaussian Splatting for Camera Motion Blur
Authors : Jeongtaek Oh, Jaeyoung Chung, Dongwoo Lee, Kyoung Mu Lee
Abstract
Although significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to real-world applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. To address this challenge, we propose DeblurGS, a method to optimize sharp 3D Gaussian Splatting from motion-blurred images, even with the noisy camera pose initialization. We restore a fine-grained sharp scene by leveraging the remarkable reconstruction capability of 3D Gaussian Splatting. Our approach estimates the 6-Degree-of-Freedom camera motion for each blurry observation and synthesizes corresponding blurry renderings for the optimization process. Furthermore, we propose a Gaussian Densification Annealing strategy to prevent the generation of inaccurate Gaussians at erroneous locations during the early training stages when camera motion is still imprecise. Comprehensive experiments demonstrate that our DeblurGS achieves state-of-the-art performance in deblurring and novel view synthesis for real-world and synthetic benchmark datasets, as well as field-captured blurry smartphone videos. Paper
32. StylizedGS: Controllable Stylization for 3D Gaussian Splatting
Authors : Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao
Abstract
With the rapid development of XR, 3D generation and editing are becoming more and more important, among which stylization is an important tool for 3D appearance editing. It can achieve consistent 3D artistic stylization given a single reference style image and is thus a user-friendly way to edit. However, recent NeRF-based 3D stylization methods face efficiency issues that hurt the actual user experience, and their implicit nature limits the ability to transfer geometric pattern styles. Additionally, the ability for artists to exert flexible control over stylized scenes is considered highly desirable, fostering an environment conducive to creative exploration. In this paper, we introduce StylizedGS, a 3D neural style transfer framework with adaptable control over perceptual factors based on the 3D Gaussian Splatting (3DGS) representation. The 3DGS brings the benefits of high efficiency. Before stylization, we apply a GS filter to eliminate floaters in the reconstruction that would otherwise degrade the stylization results. A nearest neighbor-based style loss is then introduced to achieve stylization by fine-tuning the geometry and color parameters of 3DGS, while a depth preservation loss with other regularizations is proposed to prevent tampering with the geometric content. Moreover, facilitated by specially designed losses, StylizedGS enables users to control color, stylized scale, and regions during stylization for customized results. Our method can attain high-quality stylization results characterized by faithful brushstrokes and geometric consistency with flexible controls. Extensive experiments across various scenes and styles demonstrate the effectiveness and efficiency of our method concerning both stylization quality and inference FPS. Paper
33. LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field
Authors : Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He
Abstract
Cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussians to project them into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D Cinemagraph that exhibits natural and seamlessly loopable dynamics. Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. Paper | Project Page | Code
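A common way to realize the "bidirectional animation" the abstract mentions is to advect points both forward and backward through the (time-frozen) Eulerian velocity field and crossfade the two passes so the last frame matches the first. A sketch under those assumptions; `velocity_at(p)` is a hypothetical lookup into the estimated motion field, and the specific blending is not claimed to be the paper's:

```python
def loop_positions(p0, velocity_at, T: float, dt: float):
    """Produce seamlessly looping Gaussian-center trajectories by crossfading
    a forward and a backward advection pass through the velocity field."""
    n_frames = int(T / dt)
    fwd, bwd = [p0], [p0]
    for _ in range(n_frames):
        fwd.append(fwd[-1] + velocity_at(fwd[-1]) * dt)   # forward Euler step
        bwd.append(bwd[-1] - velocity_at(bwd[-1]) * dt)   # reverse-time pass
    frames = []
    for i in range(n_frames + 1):
        w = i / n_frames                                  # crossfade weight
        frames.append((1 - w) * fwd[i] + w * bwd[n_frames - i])
    return frames  # frames[0] == frames[-1] == p0, so the clip loops
```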
34. GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting
Authors : Junseo Lee, Seokwon Lee, Jungi Lee, Junyong Park, Jaewoong Sim
Abstract
This paper presents GSCore, a hardware acceleration unit that efficiently executes the rendering pipeline of 3D Gaussian Splatting with algorithmic optimizations. GSCore builds on the observations from an in-depth analysis of Gaussian-based radiance field rendering to enhance computational efficiency and bring the technique to wider adoption. In doing so, we present several optimization techniques: Gaussian shape-aware intersection testing, hierarchical sorting, and subtile skipping, all of which are synergistically integrated with GSCore. We implement the hardware design of GSCore, synthesize it using a commercial 28nm technology, and evaluate the performance across a range of synthetic and real-world scenes with varying image resolutions. Our evaluation results show that GSCore achieves a 15.86× speedup on average over the mobile consumer GPU with a substantially smaller area and lower energy consumption. Paper | Short Presentation
2023:
1. Mip-Splatting: Alias-free 3D Gaussian Splatting
Authors : Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger
Abstract
Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing the focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our comprehensive evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach. Paper | Project Page | Code
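Both filters have a compact closed form because Gaussians are closed under convolution: low-pass filtering a Gaussian simply adds covariance. In notation adapted for this summary (the symbols are ours, not necessarily the paper's):

```latex
% 3D smoothing filter: widen each primitive according to the maximal
% sampling rate \hat{\nu}_k that the training views induce on Gaussian k
\Sigma_k' = \Sigma_k + \left(\frac{s}{\hat{\nu}_k}\right)^{2} I

% 2D Mip filter: replace screen-space dilation by convolution with a
% pixel-sized Gaussian that approximates a one-pixel box filter
\Sigma_{2\mathrm{D}}' = \Sigma_{2\mathrm{D}} + \sigma_{\mathrm{pix}}^{2} I
```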
2. Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing
Authors : Jian Gao, Chun Gu, Youtian Lin, Hao Zhu, Xun Cao, Li Zhang, Yao Yao
Abstract
We present a novel differentiable point-based rendering framework for material and lighting decomposition from multi-view images, enabling editing, ray-tracing, and real-time relighting of the 3D point cloud. Specifically, a 3D scene is represented as a set of relightable 3D Gaussian points, where each point is additionally associated with a normal direction, BRDF parameters, and incident lights from different directions. To achieve robust lighting estimation, we further divide incident lights of each point into global and local components, as well as view-dependent visibilities. The 3D scene is optimized through the 3D Gaussian Splatting technique while BRDF and lighting are decomposed by physically-based differentiable rendering. Moreover, we introduce an innovative point-based ray-tracing approach based on the bounding volume hierarchy for efficient visibility baking, enabling real-time rendering and relighting of 3D Gaussian points with accurate shadow effects. Extensive experiments demonstrate improved BRDF estimation and novel view rendering results compared to state-of-the-art material estimation approaches. Our framework showcases the potential to revolutionize the mesh-based graphics pipeline with a relightable, traceable, and editable rendering pipeline solely based on point clouds. Paper | Project Page | Code
3. [CVPR '24] GS-IR: 3D Gaussian Splatting for Inverse Rendering
Authors : Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia
Abstract
We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g., NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not natively support producing plausible normals; 2) forward mapping (e.g., rasterization and splatting) cannot trace occlusion like backward mapping (e.g., ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes. Paper | Project Page | Code (not yet)
4. [CVPR '24] Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Authors : Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee
Abstract
3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite their high rendering quality and speed at high resolutions, both deteriorate drastically when rendered at lower resolutions or from far-away camera positions. During low-resolution or far-away rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian, which leads to aliasing effects. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13%-66% PSNR and 160%-2400% rendering speed improvement at 4×-128× scale rendering on the Mip-NeRF360 dataset compared to single-scale 3D Gaussian splatting. Paper
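A Nyquist-style level-selection rule in the spirit of the abstract might look like the sketch below: render from the finest scale whose Gaussians are still at least about one pixel wide, so no primitive falls below the sampling limit. The data layout (`level_min_sizes`, level 0 = finest) and the threshold are assumptions, not the paper's exact scheme:

```python
def select_level(pixel_size_world: float, level_min_sizes: list) -> int:
    """Pick the finest Gaussian scale level whose smallest primitive still
    covers at least one output pixel; coarser levels are used as the camera
    moves away and the per-pixel world-space footprint grows."""
    for i, min_size in enumerate(level_min_sizes):
        if min_size >= pixel_size_world:
            return i          # finest level that stays above the pixel size
    return len(level_min_sizes) - 1   # fall back to the coarsest level
```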
5. [CVPR '24] GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Authors : Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, Yuexin Ma
Abstract
The advent of neural 3D Gaussians has recently brought about a revolution in the field of neural rendering, facilitating the generation of high-quality renderings at real-time speeds. However, the explicit and discrete representation encounters challenges when applied to scenes featuring reflective surfaces. In this paper, we present GaussianShader, a novel method that applies a simplified shading function on 3D Gaussians to enhance the neural rendering in scenes with reflective surfaces while preserving the training and rendering efficiency. The main challenge in applying the shading function lies in the accurate normal estimation on discrete 3D Gaussians. Specifically, we propose a novel normal estimation framework based on the shortest axis directions of 3D Gaussians with a delicately designed loss to enforce consistency between the normals and the geometries of the Gaussian spheres. Experiments show that GaussianShader strikes a commendable balance between efficiency and visual quality. Our method surpasses Gaussian Splatting in PSNR on specular object datasets, exhibiting an improvement of 1.57dB. When compared to prior works handling reflective surfaces, such as Ref-NeRF, our optimization time is significantly accelerated (23h vs. 0.58h). Please click on our project website to see more results. Paper | Project Page | Code
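The shortest-axis intuition is straightforward to sketch: a flattened Gaussian's smallest principal axis approximates the local surface normal. A minimal PyTorch illustration of that geometric idea (the sign handling, toward-camera orientation, is an assumption and not claimed to be the paper's exact scheme):

```python
import torch

def shortest_axis_normals(rotations: torch.Tensor, scales: torch.Tensor,
                          view_dirs: torch.Tensor) -> torch.Tensor:
    """Estimate per-Gaussian normals as the shortest principal axis of each
    covariance, oriented toward the camera.

    rotations: (N, 3, 3) rotation matrices, columns = principal axes
    scales:    (N, 3) per-axis scales
    view_dirs: (N, 3) unit vectors from each Gaussian toward the camera
    """
    idx = scales.argmin(dim=-1)                          # flattest direction
    normals = rotations[torch.arange(len(idx)), :, idx]  # pick that column
    flip = (normals * view_dirs).sum(-1, keepdim=True) < 0
    return torch.where(flip, -normals, normals)          # face the camera
```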
6. [CVPR '24] Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Authors : Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai
Abstract
Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method achieves state-of-the-art rendering quality and speed by combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less areas, and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed. Paper | Project Page | Code
7. Deblurring 3D Gaussian Splatting
Authors : Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park
Abstract
Recent studies in Radiance Fields have paved a robust way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately, a 3D Gaussian splatting-based approach has been proposed to model 3D scenes, and it achieves remarkable visual quality while rendering images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to lens defocus, object motion, and camera shake, and it inevitably intervenes in clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, Deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While Deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. Paper | Project Page | Code (not yet)
8. GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization
Authors : Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jingtuo Liu, Liangjun Zhang, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang
Abstract
This paper presents GIR, a 3D Gaussian Inverse Rendering method for relightable scene factorization. Compared to existing methods leveraging discrete meshes or neural implicit fields for inverse rendering, our method utilizes 3D Gaussians to estimate the material properties, illumination, and geometry of an object from multi-view images. Our study is motivated by the evidence showing that 3D Gaussian is a more promising backbone than neural fields in terms of performance, versatility, and efficiency. In this paper, we aim to answer the question: "How can 3D Gaussian be applied to improve the performance of inverse rendering?" To address the complexity of estimating normals based on discrete and often inhomogeneously distributed 3D Gaussian representations, we propose an efficient self-regularization method that facilitates the modeling of surface normals without the need for additional supervision. To reconstruct indirect illumination, we propose an approach that simulates ray tracing. Extensive experiments demonstrate our proposed GIR's superior performance over existing methods across multiple tasks on a variety of widely used datasets in inverse rendering. This substantiates its efficacy and broad applicability, highlighting its potential as an influential tool in relighting and reconstruction. Paper | Project Page | Code (not yet)
9. Gaussian Splatting with NeRF-based Color and Opacity
Authors : Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek
Abstract
Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding its versatility. In contrast, Gaussian Splatting (GS) offers similar rendering quality with faster training and inference, as it does not need neural networks to work. It encodes information about the 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS is difficult to condition since it usually requires around a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model that uses a GS representation of the 3D object's shape and a NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e., Gaussian means), shape (i.e., Gaussian covariances), color, and opacity, and a neural network that takes Gaussian parameters and the viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects. Paper | Code
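The hybrid idea, explicit Gaussian geometry plus a learned view-dependent appearance head, can be sketched as a tiny network mapping (Gaussian parameters, view direction) to color and opacity. Layer sizes and the input dimensionality are assumptions for illustration:

```python
import torch
import torch.nn as nn

class GaussianColorField(nn.Module):
    """Minimal sketch: keep Gaussian positions/covariances explicit, but let a
    small MLP produce view-dependent RGB and opacity per Gaussian."""

    def __init__(self, gaussian_dim: int = 10, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(gaussian_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                 # 3 for RGB + 1 for opacity
        )

    def forward(self, gauss_params: torch.Tensor, view_dir: torch.Tensor):
        out = self.net(torch.cat([gauss_params, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])         # keep colors in [0, 1]
        opacity = torch.sigmoid(out[..., 3:])     # keep opacity in [0, 1]
        return rgb, opacity
```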
Reviews:
2024:
1. Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human
Authors : Song Bai, Jie Li
Abstract
While AI-generated text and 2D images continue to expand their territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since 2023, an abundance of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity in stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models like SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. The advancements in neural network-based 3D storage and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have accelerated the efficiency and realism of neural rendered models. Furthermore, the multimodality capabilities of large language models have enabled language inputs to be translated into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half of 2023. It will begin by discussing the AI generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future. Paper
2. A Survey on 3D Gaussian Splatting
Authors : Guikun Chen, Wenguan Wang
Abstract
3D Gaussian splatting (3D GS) has recently emerged as a transformative technique in the explicit radiance field and computer graphics landscape. This innovative approach, characterized by the utilization of millions of 3D Gaussians, represents a significant departure from the neural radiance field (NeRF) methodologies, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representations and differentiable rendering algorithms, not only promises real-time rendering capabilities but also introduces unprecedented levels of control and editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In the present paper, we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the advent of 3D GS, setting the stage for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By facilitating real-time performance, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation. Paper
3. 3D Gaussian as a New Vision Era: A Survey
Authors : Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, Ying He
Abstract
3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural networks, such as Neural Radiance Fields (NeRF). This technique has found diverse applications in areas such as robotics, urban mapping, autonomous navigation, and virtual reality/augmented reality, just to name a few. Given the growing popularity and expanding research in 3D Gaussian Splatting, this paper presents a comprehensive survey of relevant papers from the past year. We organize the survey into taxonomies based on characteristics and applications, providing an introduction to the theoretical underpinnings of 3D Gaussian Splatting. Our goal through this survey is to acquaint new researchers with 3D Gaussian Splatting, serve as a valuable reference for seminal works in the field, and inspire future research directions, as discussed in our concluding section. Paper
4. Neural Fields in Robotics: A Survey
Authors : Muhammad Zubair Irshad, Mauro Comi, Yen-Chen Lin, Nick Heppert, Abhinav Valada, Zsolt Kira, Rares Ambrus, Jonathan Tremblay
Abstract
Neural Fields have emerged as a transformative approach for 3D scene representation in computer vision and robotics, enabling accurate inference of geometry, 3D semantics, and dynamics from posed 2D data. Leveraging differentiable rendering, Neural Fields encompass both continuous implicit and explicit neural representations enabling high-fidelity 3D reconstruction, integration of multi-modal sensor data, and generation of novel viewpoints. This survey explores their applications in robotics, emphasizing their potential to enhance perception, planning, and control. Their compactness, memory efficiency, and differentiability, along with seamless integration with foundation and generative models, make them ideal for real-time applications, improving robot adaptability and decision-making. This paper provides a thorough review of Neural Fields in robotics, categorizing applications across various domains and evaluating their strengths and limitations, based on over 200 papers. First, we present four key Neural Fields frameworks: Occupancy Networks, Signed Distance Fields, Neural Radiance Fields, and Gaussian Splatting. Second, we detail Neural Fields' applications in five major robotics domains: pose estimation, manipulation, navigation, physics, and autonomous driving, highlighting key works and discussing takeaways and open challenges. Finally, we outline the current limitations of Neural Fields in robotics and propose promising directions for future research. Paper
5. How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey
Authors : Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi
Abstract
Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) representations. Recognizing the growing body of research and the absence of a comprehensive survey on the topic, this paper aims to provide the first comprehensive overview of SLAM progress through the lens of the latest advancements in radiance fields. It sheds light on the background, evolutionary path, inherent strengths and limitations, and serves as a fundamental reference to highlight the dynamic progress and specific challenges. Paper
6. Recent Advances in 3D Gaussian Splatting
Authors : Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao
Abstract
The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations like Neural Radiance Fields (NeRF) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian Splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished by rasterizing Gaussian ellipsoids into images. Apart from the fast rendering speed, the explicit representation of 3D Gaussian Splatting facilitates editing tasks like dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid change and growing number of works in this field, we present a literature review of recent 3D Gaussian Splatting methods, which can be roughly classified into 3D reconstruction, 3D editing, and other downstream applications by functionality. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian Splatting are also illustrated for a better understanding of this technique. This survey aims to help beginners get into this field quickly and provide experienced researchers with a comprehensive overview, which can stimulate the future development of the 3D Gaussian Splatting representation. Paper
7. Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
Authors : Anurag Dalal, Daniel Hagen, Kjell G. Robbersmyr, Kristian Muri Knausgård
Abstract
Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting. Paper
SLAM:
2024:
1. SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Authors : Mingrui Li, Shuhong Liu, Heng Zhou
Abstract
Semantic understanding plays a crucial role in Dense Simultaneous Localization and Mapping (SLAM), facilitating comprehensive scene interpretation. Recent advancements that integrate Gaussian Splatting into SLAM systems have demonstrated its effectiveness in generating high-quality renderings through the use of explicit 3D Gaussian representations. Building on this progress, we propose SGS-SLAM, the first semantic dense visual SLAM system grounded in 3D Gaussians, which provides precise 3D semantic segmentation alongside high-fidelity reconstructions. Specifically, we propose to employ multi-channel optimization during the mapping process, integrating appearance, geometric, and semantic constraints with key-frame optimization to enhance reconstruction quality. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and semantic segmentation, outperforming existing methods while preserving real-time rendering ability. Paper
2. SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM
Authors : Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang
Abstract
We propose SemGauss-SLAM, the first semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering in real-time. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift and improve reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to more robust tracking and consistent mapping. Our SemGauss-SLAM method demonstrates superior performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in novel-view semantic synthesis and 3D semantic mapping. Paper
3. Compact 3D Gaussian Splatting For Dense Visual SLAM
Authors : Tianchen Deng, Yaohui Chen, Leyan Zhang, Jianfei Yang, Shenghai Yuan, Danwei Wang, Weidong Chen
Abstract
Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address the limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then we observe that the covariance matrices (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, i.e., the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation. Paper
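A geometry codebook of the kind the abstract describes can be built with plain vector quantization: cluster the per-Gaussian covariance parameters, then store one centroid per cluster plus a small index per Gaussian. A toy k-means sketch under those assumptions (cluster count, iteration count, and the parameterization of "geometry" are illustrative, not the paper's):

```python
import torch

def build_geometry_codebook(cov_params: torch.Tensor, k: int = 256,
                            iters: int = 10):
    """cov_params: (N, D) per-Gaussian geometry vectors (e.g., scale+rotation).
    Returns (k, D) codebook centroids and an (N,) index per Gaussian, so each
    Gaussian stores log2(k) bits of geometry instead of D floats."""
    centroids = cov_params[torch.randperm(len(cov_params))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(cov_params, centroids).argmin(dim=1)
        for c in range(k):
            members = cov_params[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(dim=0)   # recenter each code
    return centroids, assign
```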
4. NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting
Authors : Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie
Abstract
We propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from the pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. Additionally, we employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, mitigating the burden of excessive memory consumption. Furthermore, we leverage the advantage of 3D Gaussian splatting, which enables efficient and differentiable novel view rendering, and propose a Virtual Camera View Pruning method to eliminate outlier GS points, thereby effectively enhancing the quality of scene representations. Our NEDS-SLAM method demonstrates competitive performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in 3D dense semantic mapping. Paper
5. High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization
Authors : Shuo Sun, Malcolm Mielle, Achim J. Lilienthal, Martin Magnusson
Abstract
We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed Gaussian Splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM. Paper
6. RGBD GS-ICP SLAM
Authors : Seongbo Ha, Jiung Yeon, Hyeonwoo Yu
Abstract
Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing incredibly fast speeds of up to 107 FPS (for the entire system) and superior quality of the reconstructed map. Paper | Code | Short Presentation
7. EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting
Authors : Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, Wei Shen
Abstract
Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Mapping) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperative applications among endoscopic surgeries. In this paper, we introduce EndoGSLAM, an efficient SLAM approach for endoscopic surgeries, which integrates streamlined Gaussian representation and differentiable rasterization to facilitate over 100 fps rendering speed during online camera tracking and tissue reconstruction. Extensive experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches, showing tremendous potential for endoscopic surgeries. Paper | Project Page | Code (not yet)
8. CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Authors : Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui
Abstract
Recently, neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available. Paper | Project Page | Code (not yet)
9. MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Authors : Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu
Abstract
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Paper | Project Page | Code (not yet)
10. Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting
Authors : Xiaolei Lang, Laijian Li, Hang Zhang, Feng Xiong, Mu Xu, Yong Liu, Xingxing Zuo, Jiajun Lv
Abstract
We present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems. Paper
11. RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting
Authors : Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou
Abstract
We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors, and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and the rest are deemed unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy. Paper | Project Page | Code
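The per-frame spawning rule, add Gaussians only where the current map fails, is easy to express as a pixel mask. A sketch under stated assumptions: the renderer marks unseen pixels as non-finite depth, images are (H, W, 3) / (H, W) tensors, and the thresholds are illustrative rather than the paper's values:

```python
import torch

def pixels_needing_gaussians(rendered_rgb, gt_rgb, rendered_depth, gt_depth,
                             color_thresh: float = 0.1,
                             depth_thresh: float = 0.05):
    """Boolean (H, W) mask of pixels where new Gaussians should be spawned:
    newly observed regions, large color errors, or large depth errors."""
    newly_observed = ~torch.isfinite(rendered_depth)   # nothing rendered yet
    color_err = (rendered_rgb - gt_rgb).abs().mean(dim=-1) > color_thresh
    depth_err = (rendered_depth - gt_depth).abs() > depth_thresh
    return newly_observed | color_err | depth_err
```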
12. [3DV '25] LoopSplat: Loop Closure by Registering 3D Gaussian Splats
Authors : Liyuan Zhu, Yue Li, Erik Sandström, Shengyu Huang, Konrad Schindler, Iro Armeni
Abstract
Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM. Paper | Project Page | Code
13. MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation
Authors : Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu
Abstract
Emerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios like low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e., MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either neural radiance fields or Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns 3D scene representation and estimates the cameras' local trajectory during exposure time, enabling proactive compensation for motion blur caused by camera movement. In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach. Paper | Project Page | Code (not yet)
2023:
1. [CVPR '24] GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Authors : Chi Yan, Delin Qu, Dong Wang, Dan Xu, Zhigang Wang, Bin Zhao, Xuelong Li
Abstract
In this paper, we introduce GS-SLAM, which first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussians in order to efficiently reconstruct new observed scene geometry and improve the mapping of previously observed areas. This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesizing a static object as in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. The source code will be released upon acceptance. Paper
2. [CVPR '24] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Authors : Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
Abstract
Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations, including fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2× state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches, while allowing real-time rendering of a high-resolution dense 3D map. Paper | Project Page | Code | Explanation Video
3. [CVPR '24] Gaussian Splatting SLAM
Authors : Hidenobu Matsuki, Riku Murai, Paul HJ Kelly, Andrew J. Davison
Abstract
We present the first application of 3D Gaussian Splatting to incremental 3D reconstruction using a single moving monocular or RGB-D camera. Our Simultaneous Localisation and Mapping (SLAM) method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation, but also reconstruction of tiny and even transparent objects. Paper | Project Page | Code | Short Presentation
4. Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Authors : Vladimir Yugay, Yue Li, Theo Gevers, Martin R. Oswald
Abstract
We present the first neural RGBD SLAM method capable of photorealistically reconstructing real-world scenes. Despite modern SLAM methods achieving impressive results on synthetic datasets, they still struggle with real-world datasets. Our approach utilizes 3D Gaussians as a primary unit for our scene representation to overcome the limitations of the previous methods. We observe that classical 3D Gaussians are hard to use in a monocular setup: they can't encode accurate geometry and are hard to optimize with single-view sequential supervision. By extending classical 3D Gaussians to encode geometry, and designing a novel scene representation and the means to grow and optimize it, we propose a SLAM system capable of reconstructing and rendering real-world datasets without compromising on speed and efficiency. We show that Gaussian-SLAM can reconstruct and photorealistically render real-world scenes. We evaluate our method on common synthetic and real-world datasets and compare it against other state-of-the-art SLAM methods. Finally, we demonstrate that the final 3D scene representation that we obtain can be rendered in real time thanks to the efficient Gaussian Splatting rendering. Paper | Project Page | Code | Short Presentation
5. [CVPR '24] Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Authors : Huajian Huang, Longwei Li, Hui Cheng, Sai-Kit Yeung
Abstract
The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, e.g., PSNR is 30% higher and rendering speed is hundreds of times faster on the Replica dataset. Moreover, Photo-SLAM can run at real-time speed using an embedded platform such as Jetson AGX Orin, showing the potential of robotics applications. Paper | Project Page | Code
Sparse:
2024:
1. [CVPR '24] DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Authors : Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu
Abstract
Radiance fields have demonstrated impressive performance in synthesizing novel views from sparse input views, yet prevailing methods suffer from high training costs and slow inference speed. This paper introduces DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields, offering real-time and high-quality few-shot novel view synthesis at low costs. Our motivation stems from the highly efficient representation and surprising quality of the recent 3D Gaussian Splatting, despite the geometry degradation it encounters as input views decrease. In the Gaussian radiance fields, we find this degradation in scene geometry primarily linked to the positioning of Gaussian primitives, and that it can be mitigated by depth constraints. Consequently, we propose a Hard and Soft Depth Regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine detailed geometry reshaping, we introduce Global-Local Depth Normalization, enhancing the focus on small local depth changes. Extensive experiments on LLFF, DTU, and Blender datasets demonstrate that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results with significantly reduced memory cost, a 25× reduction in training time, and over 3000× faster rendering speed. Paper | Project Page | Code | Short Presentation
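The "local" half of the normalization can be sketched as a patch-wise standardization of the depth map: subtracting each patch's mean and dividing by its spread makes the loss sensitive to small local depth changes rather than absolute scale. A PyTorch sketch in that spirit (the patch size, and assuming H and W divide evenly by it, are illustrative choices, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def local_depth_normalize(depth: torch.Tensor, patch: int = 8,
                          eps: float = 1e-6) -> torch.Tensor:
    """Patch-wise depth normalization. depth: (1, 1, H, W) tensor with H, W
    divisible by `patch`."""
    mean = F.avg_pool2d(depth, patch)                    # per-patch mean
    sq_mean = F.avg_pool2d(depth ** 2, patch)
    std = (sq_mean - mean ** 2).clamp_min(0).sqrt()      # per-patch spread
    mean = F.interpolate(mean, size=depth.shape[-2:], mode="nearest")
    std = F.interpolate(std, size=depth.shape[-2:], mode="nearest")
    return (depth - mean) / (std + eps)                  # scale-free residual
```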
2. Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting
Authors : Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III
Abstract
In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two-stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in few-view scene synthesis on opaque as well as reflective and transparent objects. Paper | Project Page
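A variance-weighted depth loss of the kind the abstract names can be written in one line: pixels where the fused touch+vision depth is uncertain (high variance) contribute less. The inverse-variance weighting below is the standard form of such a loss, offered as a sketch rather than the paper's exact term:

```python
import torch

def variance_weighted_depth_loss(pred_depth: torch.Tensor,
                                 fused_depth: torch.Tensor,
                                 fused_var: torch.Tensor,
                                 eps: float = 1e-6) -> torch.Tensor:
    """Down-weight depth supervision where the fused depth is uncertain."""
    return (((pred_depth - fused_depth) ** 2) / (fused_var + eps)).mean()
```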
3. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Authors : Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
Abstract
We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussian Splatting models via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses 10× fewer parameters and infers more than 2× faster while providing higher appearance and geometry quality as well as better cross-dataset generalization. Paper | Project Page | Code
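The plane-sweep cost volume works by testing a set of depth hypotheses: for each one, warp the source-view features into the reference view and record how well they match. A heavily simplified sketch of that idea; `warp_to_ref(src_feat, d)` is a hypothetical homography-warping helper, and the dot-product score and softmax readout are common choices rather than MVSplat's exact design:

```python
import torch

def build_cost_volume(ref_feat: torch.Tensor, src_feat: torch.Tensor,
                      depth_planes, warp_to_ref) -> torch.Tensor:
    """ref_feat, src_feat: (C, H, W) feature maps from two views.
    Returns a (D, H, W) per-pixel distribution over depth hypotheses."""
    costs = []
    for d in depth_planes:
        warped = warp_to_ref(src_feat, d)          # source features at depth d
        score = (ref_feat * warped).sum(dim=0)     # cross-view correlation
        costs.append(score / ref_feat.shape[0])    # normalize by channels
    volume = torch.stack(costs)                    # (D, H, W) cost volume
    return volume.softmax(dim=0)                   # peak = likely true depth
```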
4. latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Authors : Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen
Abstract
We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not enable fast inference of high resolution novel views due to slow volume rendering, or are limited to interpolation of close input views, even in simpler settings with a single central object, where 360-degree generalization is possible. In this work, we combine a regression-based approach with a generative model, moving towards both of these capabilities within the same method, trained purely on readily available real video data. The core of our method is variational 3D Gaussians, a representation that efficiently encodes varying uncertainty within a latent space consisting of 3D feature Gaussians. From these Gaussians, specific instances can be sampled and rendered via efficient Gaussian splatting and a fast, generative decoder network. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data. Paper | Project Page | Code
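Sampling a specific instance from the variational 3D Gaussians can be done with the standard VAE reparameterization before splatting and decoding. The sketch below assumes each feature Gaussian stores a mean and log-variance, which is an illustrative simplification of the paper's formulation:

```python
import torch

def sample_variational_features(mu, logvar):
    """
    Reparameterized sample of per-Gaussian latent features.
    mu, logvar: (N, C) mean and log-variance of N 3D feature Gaussians;
    sampled features are then splatted and decoded by the 2D generator.
    """
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
```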
5. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Authors : Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein
Abstract
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Paper | Project Page | Code
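Unprojecting pixel-aligned Gaussians reduces to lifting each pixel's predicted depth through the camera intrinsics and pose. The sketch below shows only that geometric step, with shapes and variable names chosen for illustration (the GRM transformer also predicts the remaining Gaussian attributes):

```python
import torch

def unproject_pixel_gaussians(depth, K, c2w):
    """
    Lift per-pixel depths to 3D Gaussian centers.
    depth: (H, W) predicted depth; K: (3, 3) intrinsics; c2w: (4, 4) pose.
    """
    h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()  # (H, W, 3)
    cam = (torch.linalg.inv(K) @ pix.reshape(-1, 3).T).T * depth.reshape(-1, 1)
    return (c2w[:3, :3] @ cam.T).T + c2w[:3, 3]  # (H*W, 3) centers in world space
```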
6. Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Authors : Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
Abstract
We tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed, approximately 0.6 seconds on a single NVIDIA A100 GPU. Paper
7. CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Authors : Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari
Abstract
The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud-like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes. Paper | Project Page
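Of the constraints mentioned, the total variation term is the simplest to write down: it penalizes differences between neighboring values so that the structured Gaussians move coherently rather than independently. A minimal sketch applied to a per-view depth (or offset) map; applying it per channel to other 2D-structured Gaussian attributes would look the same:

```python
import torch

def tv_loss(x):
    """Total variation over the last two (spatial) dims of x, e.g. (B, H, W)."""
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return dh + dw
```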
8. InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds
Authors : Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang
Abstract
While novel view synthesis (NVS) has made substantial progress in 3D computer vision, it typically requires an initial estimation of camera intrinsics and extrinsics from dense viewpoints. This pre-processing is usually conducted via a Structure-from-Motion (SfM) pipeline, a procedure that can be slow and unreliable, particularly in sparse-view scenarios with insufficient matched features for accurate reconstruction. In this work, we integrate the strengths of point-based representations (e.g., 3D Gaussian Splatting, 3D-GS) with end-to-end dense stereo models (DUSt3R) to tackle the complex yet unresolved issues in NVS under unconstrained settings, which encompass pose-free and sparse-view challenges. Our framework, InstantSplat, unifies dense stereo priors with 3D-GS to build 3D Gaussians of large-scale scenes from sparse-view, pose-free images in less than 1 minute. Specifically, InstantSplat comprises a Coarse Geometric Initialization (CGI) module that swiftly establishes a preliminary scene structure and camera parameters across all training views, utilizing globally-aligned 3D point maps derived from a pre-trained dense stereo pipeline. This is followed by the Fast 3D-Gaussian Optimization (F-3DGO) module, which jointly optimizes the 3D Gaussian attributes and the initialized poses with pose regularization. Experiments conducted on the large-scale outdoor Tanks & Temples dataset demonstrate that InstantSplat significantly improves SSIM (by 32%) while concurrently reducing Absolute Trajectory Error (ATE) by 80%. These results establish InstantSplat as a viable solution for scenarios involving pose-free and sparse-view conditions. Paper | Project Page | Code | Explanation Video
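The F-3DGO stage can be sketched as a standard joint optimization loop: a photometric loss through a differentiable renderer plus a penalty keeping the refined poses near the coarse initialization. Everything here — `render`, the `gaussians` module, the pose parameterization, and the weights — is a hypothetical stand-in, not InstantSplat's code.

```python
import torch

def joint_optimize(render, gaussians, init_poses, images,
                   steps=500, lambda_pose=0.1, lr=1e-3):
    """Jointly refine Gaussian attributes and camera poses (illustrative)."""
    poses = init_poses.clone().requires_grad_(True)   # learnable per-view poses
    opt = torch.optim.Adam(list(gaussians.parameters()) + [poses], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Photometric loss over all training views.
        loss = sum((render(gaussians, pose) - img).abs().mean()
                   for pose, img in zip(poses, images))
        # Pose regularization: stay close to the coarse initialization.
        loss = loss + lambda_pose * (poses - init_poses).pow(2).mean()
        loss.backward()
        opt.step()
    return gaussians, poses.detach()
```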
9. Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
Authors : Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen
Abstract
We aim to tackle sparse-view reconstruction of a 360° 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diffusion models can strongly improve the reconstruction of a scene with low-cost fine-tuning. Specifically, we present SparseSplat360 (Sp2360), a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views. Due to superior training and rendering speeds, we use an explicit scene representation in the form of 3D Gaussians over NeRF-based implicit representations. We propose an iterative update strategy to fuse generated pseudo novel views with existing 3D Gaussians fitted to the initial sparse inputs. As a result, we obtain a multi-view consistent scene representation with details coherent with the observed inputs. Our evaluation on the challenging Mip-NeRF360 dataset shows that our proposed 2D to 3D distillation algorithm considerably improves the performance of a regularized version of 3DGS adapted to a sparse-view setting and outperforms existing sparse-view reconstruction methods in 360 scene reconstruction. Qualitatively, our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail. Paper | Code (not yet)
2023:
1. SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting
Authors : Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi
Abstract
The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few-shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views is reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360 scenes from sparse training views. We find that using naive depth priors is not sufficient and integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by up to 30.5% and NeRF-based methods by up to 15.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost. Paper | Project Page | Code (not yet)
2. FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
Authors : Zehao Zhu, Zhiwen Fan, Yifan Jiang, Zhangyang Wang
Abstract
Novel view synthesis from limited observations remains an important and persistent task. However, high efficiency in existing NeRF-based few-shot view synthesis is often compromised to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time and photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse initialized SfM points with a thoughtfully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations, subsequently infilling local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator within the Gaussian optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from sparse points observed from limited input viewpoints, our FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender. Paper | Project Page | Code
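Gaussian Unpooling can be pictured as proximity-based densification: each Gaussian spawns new ones toward its nearest neighbors, growing coverage outward from the sparse SfM points. The sketch below places new centers at neighbor midpoints; the neighbor count and placement rule are illustrative assumptions rather than FSGS's exact procedure.

```python
import torch

def gaussian_unpooling(centers, k=3):
    """
    Densify sparse Gaussian centers by spawning a new Gaussian at the
    midpoint to each of the k nearest neighbors. centers: (N, 3), N > k.
    """
    d = torch.cdist(centers, centers)             # (N, N) pairwise distances
    d.fill_diagonal_(float("inf"))                # ignore self-matches
    knn = d.topk(k, largest=False).indices        # (N, k) neighbor indices
    mids = 0.5 * (centers.unsqueeze(1) + centers[knn])  # (N, k, 3) midpoints
    return torch.cat([centers, mids.reshape(-1, 3)], dim=0)
```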
3. [CVPR '24] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
Authors : David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann
Abstract
We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field. Paper | Project Page | Code
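The key trick — sampling Gaussian means from a dense depth distribution while keeping gradients — can be approximated with a Gumbel-softmax relaxation over depth buckets. Note the paper uses its own reparameterization rather than Gumbel-softmax, so the sketch below is a stand-in to convey the idea, not pixelSplat's method:

```python
import torch
import torch.nn.functional as F

def sample_gaussian_depths(depth_logits, depth_candidates, tau=1.0):
    """
    Differentiable per-pixel depth sampling.
    depth_logits:     (B, D, H, W) scores over D depth buckets.
    depth_candidates: (D,) metric depth of each bucket.
    """
    w = F.gumbel_softmax(depth_logits, tau=tau, hard=False, dim=1)
    return (w * depth_candidates.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W)
```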
4. [CVPR '24] Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Authors : Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi
Abstract
We introduce the Splatter Image, an ultra-fast approach for monocular 3D object reconstruction which operates at 38 FPS. Splatter Image is based on Gaussian Splatting, which has recently brought real-time rendering, fast training, and excellent scaling to multi-view reconstruction. For the first time, we apply Gaussian Splatting in a monocular reconstruction setting. Our approach is learning-based, and, at test time, reconstruction only requires the feed-forward evaluation of a neural network. The main innovation of Splatter Image is the surprisingly straightforward design: it uses a 2D image-to-image network to map the input image to one 3D Gaussian per pixel. The resulting Gaussians thus have the form of an image, the Splatter Image. We further extend the method to incorporate more than one image as input, which we do by adding cross-view attention. Owing to the speed of the renderer (588 FPS), we can use a single GPU for training while generating entire images at each iteration in order to optimize perceptual metrics like LPIPS. On standard benchmarks, we demonstrate not only fast reconstruction but also better results than recent and much more expensive baselines in terms of PSNR, LPIPS, and other metrics. Paper | Project Page | Code | Short Presentation
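The "one Gaussian per pixel" design is easy to sketch: a 2D network head emits a fixed number of channels per pixel, which are split and activated into Gaussian attributes. The channel layout and activations below are assumptions for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplatterHead(nn.Module):
    """Maps image features to one 3D Gaussian per pixel (a 'splatter image')."""
    def __init__(self, in_ch=64):
        super().__init__()
        # 3 mean + 1 opacity + 3 scale + 4 rotation (quaternion) + 3 color = 14
        self.head = nn.Conv2d(in_ch, 14, kernel_size=1)

    def forward(self, feats):                       # feats: (B, in_ch, H, W)
        mean, opa, scale, rot, rgb = self.head(feats).split([3, 1, 3, 4, 3], dim=1)
        return {
            "mean": mean,                           # 3D position per pixel
            "opacity": torch.sigmoid(opa),
            "scale": torch.exp(scale),              # strictly positive scales
            "rotation": F.normalize(rot, dim=1),    # unit quaternion
            "rgb": torch.sigmoid(rgb),
        }
```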
Navigation:
2024:
1. GaussNav: Gaussian Splatting for Visual Navigation
Authors : Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
Abstract
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors. Existing map-based navigation methods largely adopt the representation form of Bird's Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene. To address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for the IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects. Our GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. Paper | Project Page | Code
2. 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization
Authors : Peng Jiang, Gaurav Pandey, Srikanth Saripalli
Abstract
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset. Paper
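The relocalization loop pairs a global image-similarity search with a local geometric refinement. Below is a compact sketch of both pieces, using plain NumPy for the NCC score and OpenCV's RANSAC PnP for the pose refinement; the inputs (`pts3d`, `pts2d`, `K`) are assumed to come from feature matching between the query image and the rendered view.

```python
import cv2
import numpy as np

def ncc(query, rendered, eps=1e-8):
    """NCC score between the query image and a render from the 3DGS map."""
    q = (query - query.mean()) / (query.std() + eps)
    r = (rendered - rendered.mean()) / (rendered.std() + eps)
    return float((q * r).mean())

def refine_pose(pts3d, pts2d, K):
    """Refine the camera pose from 2D-3D matches via RANSAC PnP."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32), pts2d.astype(np.float32), K, None)
    return ok, rvec, tvec
```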
3. Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF
Authors : Guangyi Liu, Wen Jiang, Boshu Lei, Vivek Pandey, Kostas Daniilidis, Nader Motee
Abstract
This work proposes a novel approach that bolsters both the robot's risk assessment and safety measures while deepening its understanding of 3D scenes, achieved by leveraging Radiance Field (RF) models and 3D Gaussian Splatting. To further enhance these capabilities, we incorporate additional sampled views from the environment with the RF model. One of our key contributions is the introduction of Risk-aware Environment Masking (RaEM), which prioritizes crucial information by selecting the next-best-view that maximizes the expected information gain. This targeted approach aims to minimize uncertainties surrounding the robot's path and enhance the safety of its navigation. Our method offers a dual benefit: improved robot safety and increased efficiency in risk-aware 3D scene reconstruction and understanding. Extensive experiments in real-world scenarios demonstrate the effectiveness of our proposed approach, highlighting its potential to establish a robust and safety-focused framework for active robot exploration and 3D scene understanding. Paper
4. 3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration
Authors : Quentin Herau, Moussab Bennehar, Arthur Moreau, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux
Abstract
Reliable multimodal sensor fusion algorithms require accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high computational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new rendering approach to achieve faster multi-sensor calibration. We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset. Paper
5. HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Authors : Zhuopeng Li, Yilin Zhang, Chenming Wu, Jianke Zhu, Liangjun Zhang
Abstract
The rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, previous 3DGS-based methods have limitations in urban scenes due to reliance on initial Structure-from-Motion (SfM) points and difficulties in rendering distant, sky, and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependency on SfM point initialization, allowing for rendering of urban scenes, and incorporates point densification to enhance rendering quality in problematic regions during training. Furthermore, we introduce Gaussian Direction Encoding as an alternative to spherical harmonics in the rendering pipeline, which enables view-dependent color representation. To account for multi-camera systems, we introduce neural warping to enhance object consistency across different cameras. Experimental results on widely used autonomous driving datasets demonstrate that HO-Gaussian achieves photo-realistic rendering in real-time on multi-camera urban datasets. Paper
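Gaussian Direction Encoding replaces per-Gaussian spherical-harmonic coefficients with a learned view-dependent color function. The sketch below uses a sinusoidal direction encoding fed to a small MLP; the encoding, layer sizes, and feature dimension are assumptions standing in for the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DirectionColorHead(nn.Module):
    """View-dependent color from a per-Gaussian feature and the view direction."""
    def __init__(self, feat_dim=32, n_freq=4):
        super().__init__()
        self.n_freq = n_freq
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3 * 2 * n_freq, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid())

    def encode(self, dirs):                 # dirs: (N, 3) unit view directions
        freqs = 2.0 ** torch.arange(self.n_freq, device=dirs.device)
        ang = dirs.unsqueeze(-1) * freqs    # (N, 3, n_freq)
        enc = torch.cat([ang.sin(), ang.cos()], dim=-1)
        return enc.reshape(dirs.shape[0], -1)

    def forward(self, gauss_feat, dirs):    # gauss_feat: (N, feat_dim)
        return self.mlp(torch.cat([gauss_feat, self.encode(dirs)], dim=-1))
```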
6. SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
Authors : Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun
Abstract
Novel View Synthesis (NVS) for street scenes plays a critical role in autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although exciting progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as conditions, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. Then we apply the Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models, and demonstrate its advantage in rendering images from broader views. Paper
7. BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting
Authors : Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang
Abstract
Image-goal navigation enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely heavily on data and computationally expensive learning-based approaches or lack efficiency in complex environments due to insufficient exploration strategies. To address these limitations, we propose …