很棒的 3D 高斯潑濺(3D Gaussian Splatting)資源
專注於 3D 高斯潑濺的論文和開源資源精選列表,旨在跟上未來幾個月預期的研究激增。如果您有任何補充或建議,請隨時貢獻。也歡迎其他資源,如部落格文章、影片等。
目錄
- 3D 物體檢測
- 自動駕駛
- 頭像
- 經典之作
- 壓縮
- 擴散
- 動力學和變形
- 編輯
- 語言嵌入
- 網格提取和物理
- 雜項
- 正規化和優化
- 渲染
- 綜述
- SLAM
- 稀疏
- 導航與自動駕駛
- 姿勢
- 大規模
- 開源實現
- 參考
- 非官方實作
- 二維高斯潑濺
- 遊戲引擎
- 檢視器
- 實用程式
- 教學
- 框架
- 其他
更新日誌:
2024 年 10 月 24 日
2024 年 10 月 16 日
2024 年 9 月 7 日
2024 年 5 月 10 日
- 新增了18 篇論文:Z-Splat、Dual-Camera、StylizedGS、Hash3D、Revisiting Densification、Gaussian Pancakes、3D-aware Deformable Gaussians、SpikeNVS、零樣本PC 完成、SplatPose、DreamScene360、Realm-Dmer、Reuss GGS 、GoMAvatar、OccGaussian、LoopGaussian、評論
2024 年 4 月 11 日
2024 年 4 月 9 日
2024 年 4 月 8 日
- 新增了 3 篇論文:Robust Gaussian Splatting、SC4D 和 MM-Gaussian
2024 年 4 月 5 日
- 新增了 5 篇論文:Surface Reconstruction、TCLC-GS、GaSpCT、OmniGS 和 Per-Gaussian Embedding,
- 修復
2024 年 4 月 2 日
- 新增了 11 篇論文:HO、SGD、HGS、Snap-it、InstantSplat、3DGSR、MM3DGS、HAHA、CityGaussain、Mirror-3DGS 和 Feature Splatting
2024 年 3 月 30 日
- 新增了 8 篇論文:建模不確定性、GRM、Gamba、CoherentGS、TOGS、SA-GS 和 GaussianCube
2024 年 3 月 27 日
- 新增了其他實作:360-gaussian-splatting
- 新增了 CVPR '24 標籤
- 新增了 5 篇論文:Comp4D、DreamPolisher、DN-Splatter、2D GS 和 Octree-GS
2024 年 3 月 26 日
- 新增了 13 篇論文:latentSplat、GS on the Move、RadSplat、Mini-Splatting、SyncTweedies、HAC、STAG4D、EndoGSLAM、Pixel-GS、Semantic Gaussians、Gaussian in the Wild、CG-SLAM 和 GSDF
2024 年 3 月 24 日:
2024 年 3 月 20 日:
- 新增了 4 篇論文:GVGEN、HUGS、RGBD GS-ICP SLAM 和 High-Fidelity SLAM
2024 年 3 月 19 日:
- 新增了點面
- 新增了原作者的 3DGS 教程
- 添加了GauStudio
- 新增了23 篇論文:Touch-GS、GGRt、FDGaussian、SWAG、Den-SOFT、Gaussian-Flow、View-Consistent 3D Editing、BAGS、GeoGaussian、GS-Pose、Analytic-Splatting、Seamless 3D Maps、Texture-GS、 Recent Advances 3DGS、用於密集視覺 SLAM 的緊湊型 3DGS、BrightDreamer、3DGS-Reloc、Beyond Uncertainty、運動感知 3DGS、Fed3DGS、GaussNav、3DGS-Calib 和 NEDS-SLAM
2024 年 3 月 17 日:
- 更新 3DGS.cpp 的儲存庫名稱和連結(最初為 VulkanSplatting)
2024 年 3 月 16 日:
- 新增了 splaTV
- 新增了 6 篇論文:GaussianGrasper、新的分割演算法、Controllable Text-to-3D Generation、Spring-Mass 3DGS、Hyper-3DGS 和 DreamScene
2024 年 3 月 14 日:
- 新增了 6 篇論文:SemGauss、StyleGaussian、Gaussian Splatting in Style、GaussCtrl、GaussianImage 和 RAIN-GS
2024 年 3 月 8 日:
- 教學:如何捕捉 3DGS 影像
- 新增了 6 篇論文:SplattingAvatar、DNGaussian、Radiative Gaussians、BAGS、GSEdit 和 ManiGaussian
2024 年 3 月 8 日:
2024 年 3 月 6 日:
2024 年 3 月 5 日:
- 新增了 1 篇論文:3DGStream
- 程式碼發布
- 新增了新檢視器
2024 年 3 月 2 日:
- 新增了 1 篇論文:動畫和紋理的 3D 高斯模型
- 新部分:同時教授 3DGS 的課程。
2024 年 2 月 28 日:
2024 年 2 月 27 日:
- 新增了 2 篇論文:Spec-Gaussian 和 GEA
- SC-GS 程式碼發布
2024 年 2 月 24 日:
- 新增了 2 篇論文:識別不必要的高斯和 Gaussian Pro
2024 年 2 月 23 日:
- 更正了 EndoGS 的作者並更新了摘要:利用高斯濺射進行可變形內視鏡組織重建
2024 年 2 月 21 日:
2024 年 2 月 20 日:
- GaussianObject程式碼發布
- 新增了一篇論文:GaussianHair
2024 年 2 月 19 日:
2024 年 2 月 16 日:
- 新增了 2 篇論文:IM-3D 和 GES
- GaMeS程式碼發布
2024 年 2 月 14 日:
- 新增了檢視器:VulkanSplatting - C++ 和 Vulkan Compute 中的跨平台高效能 3DGS 渲染器
2024 年 2 月 13 日:
- 程式碼發布:(2024 年 1 月 16 日)使用 4D 高斯潑濺進行即時真實感動態場景表示和渲染
- 新增了 3 篇論文:3DGala、ImplicitDeepFake 和 3D Gaussians as a New Vision Era。
2024 年 2 月 9 日:
2024 年 2 月 8 日:
- 新增了 3 篇論文:Rig3DGS、Mesh-based GS 和 LGM
2024 年 2 月 6 日:
- 新增了 2 篇論文:SGS-SLAM 和 4D Gaussian Splatting
2024 年 2 月 5 日:
- 將 SWAGS 移至動力學和變形部分
- 新增了 2 篇論文:GaussianObject 和 GaMeS
- GS++ 更名為最佳投影
2024 年 2 月 2 日:
- 新增了 6 篇論文:VR-GS、Segment Anything、Gaussian Splashing、GS++、360-GS 和 StopThePop
- TRIPS 代碼發布
2024 年 1 月 30 日:
- 程式碼變更:GaussianAvatars 程式碼更改為私有
2024 年 1 月 29 日:
- 新增了 2 篇論文:LIV-GaussMap 和 TIP-Editor
2024 年 1 月 26 日:
- 刪除撤回論文:用於高保真人體運動合成的可動畫 3D 高斯
- 新增了 3 篇論文:EndoGaussians、PSAvatar 和 GauU-Scene
2024 年 1 月 25 日:
- 新增了檢視器:Splatapult - C++ 和 OpenGL 中的 3d 高斯噴射渲染器,可與 OpenXR 配合使用以實現線上 VR
2024 年 1 月 24 日:
- 新增實用程式:SideFX Houdini 的 GSOP(高斯 Splat 運算子)
- 程式碼發布:GaussianAvatars
2024 年 1 月 23 日:
- 新增了 3 篇論文:Amortized Gen3D、Deformable Endoscopic Tissues、Fast Dynamic 3D Object Generation
- 程式碼發布:動畫化身、壓縮 3D 高斯、GaussianAvatar
2024 年 1 月 13 日:
- 新增了 4 篇論文:CoSSegGaussians、TRIPS、Gaussian Shadow Casting for Neural Characters 和 DISTWAR
2024 年 1 月 9 日:
- 新增 1 篇論文:A Survey on 3D Gaussian Splatting(第一次調查)
2024 年 1 月 8 日:
- 新增了 4 篇論文:SWAGS(添加了 2023 年的論文,我之前忘記添加了)、第一篇評論論文、壓縮的 3DGS 以及表徵衛星幾何的應用論文。
2024 年 1 月 7 日:
- 1 開源實作:taichi-splatting - 工作最初源自於 Taichi 3D Gaussian Splatting,並進行了重大的重新組織和更改。
2024 年 1 月 5 日:
- 新增了 3 篇論文:FMGS、PEGASUS 和 Repaint123。
2024 年 1 月 2 日:
2024 年 1 月 2 日:
- 更新了去模糊高斯論文連結。
- SAGA程式碼發布。
- 新增了 2023 年的 2 篇論文:Text2Immersion 和 2D-Guided 3DG Segmentation。
- gsplat lib 的數學補充。
- 在類別中新增年份。
- GSM 代碼發布。
2023 年 12 月 29 日:
- 新增了 1 篇論文(顯然之前漏掉了一篇):Gaussian-Head-Avatar。
- 新增了部落格文章頭像。
2023 年 12 月 29 日:
- 新增了 3 篇論文:DreamGaussian4D、4DGen 和 Spacetime Gaussian。
2023 年 12 月 27 日:
- 新增了 3 篇論文:LangSplat、Deformable 3DGS 和 Human101。
- 新增了部落格文章:3DGS 的綜合回顧。
2023 年 12 月 25 日:
- 發布了單目/多視圖動態場景代碼的高效 3D 高斯表示。
- GPS-高斯代碼發布。
2023 年 12 月 24 日:
- 新增了 2 篇論文:自組織高斯網格和高斯分裂。
- 新增了用於增強高斯渲染以建模更複雜場景的儲存庫。
2023 年 12 月 21 日:
- 新增了 3 篇論文:Splatter Image、pixelSplat 和align your gaussians。
- 高斯分組代碼發布。
2023 年 12 月 19 日:
- 新增了 2 篇論文:GAvatar 和 GauFRe。
2023 年 12 月 18 日:
- 新增了實用程式:SpectacularAI - 不同 3DGS 約定的轉換腳本。
- SuGaR 程式碼發布。
2023 年 12 月 16 日:
- 新增了 WebGL 檢視器 3:Gauzilla。
2023 年 12 月 15 日:
- 新增了 4 篇論文:DrivingGaussian、iComMa、Triplane 和 3DGS-Avatar。
- Relightable 高斯代碼發布。
2023 年 12 月 13 日:
- 新增了 5 篇論文:Gaussian-SLAM、CoGS、ASH、CF-GS 和 Photo-SLAM。
2023 年 12 月 11 日:
- 新增了 2 篇論文:Gaussian Splatting SLAM 和 3D Generation 的去噪分數。
- ScaffoldGS 程式碼已發布。
2023 年 12 月 8 日:
- 新增了 2 篇論文:EAGLES 和 MonoGaussianAvatar。
2023 年 12 月 7 日:
- LucidDreamer 程式碼已發布。
- 新增了 9 篇論文:GauHuman、HeadGaS、HiFi4G、Gaussian-Flow、Feature-3DGS、Gaussian-Avatar、FlashAvatar、Relightable 和 Deblurring Gaussians。
2023 年 12 月 5 日:
- 新增了 9 篇論文:NeuSG、GaussianHead、GaussianAvatars、GPS-Gaussian、用於單眼非剛性物件重建的神經參數高斯、SplaTAM、MANUS、Segment Any 和語言嵌入 3D 高斯。
2023 年 12 月 4 日:
- 新增了 8 篇論文:Gaussian Grouping、MD Splatting、DynMF、Scaffold-GS、SparseGS、FSGS、Control4D 和 SC-GS。
2023 年 12 月 1 日:
- 新增了 4 篇論文:Compact3D、GaussianShader、Periodic Vibration Gaussian 和 Gaussian Shell Maps for Efficient 3D Human Generation。
- 為每個類別建立了目錄並新增了換行符。
2023 年 11 月 30 日:
- 增加了虛幻遊戲引擎實現。
- 新增了 6 篇論文:LightGaussian、FisherRF、HUGS、HumanGaussian、CG3D 和 Multi Scale 3DGS。
2023 年 11 月 29 日:
- 新增了兩篇論文:Point and Move 和 IR-GS。
2023 年 11 月 28 日:
- 新增了五篇論文:GaussianEditor、Relightable Gaussians、GART、Mip-Splatting、HumanGaussian。
2023 年 11 月 27 日:
- 新增了兩篇論文:Gaussian Editing 和 Compact 3D Gaussians。
2023 年 11 月 25 日:
2023 年 11 月 22 日:
- 新增了 3 篇新的 GS 論文:Animatable、Depth-Regularized 和單目/多視圖 3DGS。
- 添加了一些經典論文。
- 另一篇 GS 論文,也稱為 LucidDreamer。
2023 年 11 月 21 日:
- 新增了 3 篇新的 GS 論文:GaussianDiffusion、LucidDreamer、PhysGaussian。
- 新增 2 篇 GS 論文:SuGaR、PhysGaussian。
2023 年 11 月 21 日:
2023 年 11 月 17 日:
- 將 PlayCanvas 實作新增至遊戲引擎部分。
2023 年 11 月 16 日:
- 發布可變形 3D 高斯代碼。
- 添加了可駕駛的 3D 高斯頭像紙。
2023 年 11 月 8 日:
- 關於 3DGS 實作和通用(universal)格式討論的一些註解。
2023 年 11 月 4 日:
- 增加了 2D 高斯潑濺。
- 新增了非常詳細的(技術)部落格文章,解釋 3D 高斯潑濺。
2023 年 10 月 28 日:
- 新增了實用程式部分。
- 新增了 3DGS 轉換器,用於在 Cloud Compare to Utilities 中編輯 3DGS .ply 檔案。
- 新增了 Kapture(用於捆綁器到 colmap 模型轉換)和 Kapture 映像裁剪器腳本,以及實用程式的轉換說明。
2023 年 10 月 23 日:
- 新增了 python WebGL 檢視器 2。
- 新增了高斯潑濺(和 Unity 檢視器)視訊部落格的介紹。
2023 年 10 月 21 日:
- 新增了 python OpenGL 檢視器。
- 新增了 typescript WebGPU 檢視器。
2023 年 10 月 20 日:
- 使摘要可讀(刪除連字符)。
- 新增了 Windows 教學。
- 其他小的文字修復。
- 新增了 Jupyter 筆記本檢視器。
2023 年 10 月 19 日:
- 新增了用於即時真實感動態場景表示的 Github 頁面連結。
- 重新排列標題。
- 添加了其他非官方實作。
- 將 Nerfstudio gsplat 和 fast: C++/CUDA 移至非官方實作。
- 新增了 Nerfstudio、Blender、WebRTC、iOS 和 Metal 檢視器。
2023 年 10 月 17 日:
- GaussianDreamer 程式碼發布。
- 新增了即時真實感動態場景表示。
2023 年 10 月 16 日:
- 添加了可變形 3D 高斯紙。
- 動態 3D 高斯代碼發布。
2023 年 10 月 15 日:
- 包含前 6 篇論文的初始清單。
介紹 3D 高斯分佈的開創性論文:
用於即時輻射場渲染的 3D 高斯噴射
作者:Bernhard Kerbl、Georgios Kopanas、Thomas Leimkühler、George Drettakis
抽象的
輻射場方法最近徹底改變了用多張照片或影片捕捉的場景的新穎視圖合成。然而,要實現高視覺品質仍然需要訓練和渲染成本高昂的神經網絡,而最近更快的方法不可避免地會犧牲速度來換取品質。對於無界且完整的場景(而不是孤立的物件)和1080p解析度渲染,目前沒有方法可以實現即時顯示速率。我們引入了三個關鍵要素,使我們能夠在保持有競爭力的訓練時間的同時實現最先進的視覺質量,並且重要的是允許在1080p 分辨率下進行高質量實時(≥ 30 fps)新視圖合成。首先,從相機校準期間產生的稀疏點開始,我們用 3D 高斯表示場景,保留連續體積輻射場的所需屬性以進行場景優化,同時避免在空白空間中進行不必要的計算;其次,我們對 3D 高斯進行交錯優化/密度控制,特別是優化各向異性協方差以實現場景的準確表示;第三,我們開發了一種快速可見性感知渲染演算法,該演算法支援各向異性潑濺,既加速訓練又允許即時渲染。我們在幾個已建立的資料集上展示了最先進的視覺品質和即時渲染。 ?紙質(低解析度)| ?紙張(高解析度)|項目頁面|代碼| ?簡短介紹 | ?解說視頻
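下面是一段依上述摘要與論文公開數學形式整理的極簡 NumPy 示意(非官方實作;`covariance_3d`、`project_covariance` 等名稱為本文自行假設),展示三個核心步驟:由尺度與旋轉組成 3D 共變異數 Σ = R S Sᵀ Rᵀ、在假設共變異數已轉到相機座標系後以透視投影的雅可比得到 2D 共變異數,以及由前到後的 α 混合。球諧顏色、分塊光柵化與梯度計算等關鍵工程細節皆省略。

```python
import numpy as np

def covariance_3d(scale, quat):
    """由尺度與旋轉(四元數 w, x, y, z)組合 3D 共變異數:Sigma = R S S^T R^T。"""
    w, x, y, z = quat / np.linalg.norm(quat)
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

def project_covariance(cov_cam, mean_cam, fx, fy):
    """假設 cov_cam 已旋轉到相機座標系,用透視投影的雅可比 J 得到 2D 共變異數 J Sigma J^T。"""
    x, y, z = mean_cam
    J = np.array([
        [fx / z, 0.0,    -fx * x / z ** 2],
        [0.0,    fy / z, -fy * y / z ** 2],
    ])
    return J @ cov_cam @ J.T

def alpha_blend(colors, alphas):
    """依深度由前到後合成:C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)。"""
    C = np.zeros(3)
    T = 1.0  # 透射率
    for c, a in zip(colors, alphas):
        C += T * a * np.asarray(c, dtype=float)
        T *= 1.0 - a
        if T < 1e-4:  # 提前終止
            break
    return C
```

實際的訓練與即時渲染請以官方 CUDA 實作或下文列出的開源實作為準。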
3D 物體檢測
2024年
1. 3DGS-DET:透過邊界引導和框聚焦採樣增強 3D 高斯潑濺,以實現 3D 物體檢測
作者:曹陽、吉元良、徐丹
抽象的
神經輻射場 (NeRF) 廣泛用於新穎視圖合成,並已適用於 3D 物件偵測 (3DOD),為透過視圖合成表示進行 3D 物件偵測提供了一種有前途的方法。然而,NeRF 面臨著固有的限制:(i) 由於其隱式性質,它對 3DOD 的表示能力有限;(ii) 渲染速度慢。最近,3D 高斯分佈 (3DGS) 作為一種顯式 3D 表示形式出現,它透過更快的渲染功能解決了這些限制。受這些優點的啟發,本文首次將 3DGS 引入 3DOD,確定了兩個主要挑戰:(i)高斯斑點的空間分佈不明確 - 3DGS 主要依賴於 2D 像素級監督,導致高斯斑點的 3D 空間分佈不清晰物體和背景的區分度差,阻礙了3DOD; (ii) 過多的背景斑點-2D 影像通常包含大量背景像素,導緻密集重建的 3DGS 中含有許多代表背景的雜訊高斯斑點,對偵測產生負面影響。為了應對挑戰(i),我們利用3DGS 重建源自2D 影像的事實,並透過結合2D 邊界引導提出了一種優雅而有效的解決方案,以顯著增強高斯斑點的空間分佈,從而使物體和物體之間的區分更加清晰。為了解決挑戰(ii),我們提出了一種以框為中心的採樣策略,使用2D 框生成3D 空間中的物件機率分佈,從而允許在3D 中進行有效的機率採樣以保留更多物件斑點並減少嘈雜的背景斑點。受益於所提出的邊界引導和框聚焦採樣,我們的最終方法3DGS-DET 比我們的基本管道版本實現了顯著改進([email protected] 上+5.6,[email protected] 上+3.7),而無需引入任何額外的可學習參數。此外,3DGS-DET 顯著優於最先進的基於NeRF 的方法NeRF-Det,在ScanNet 資料集的[email protected] 上實現了+6.6 的改進,在[email protected] 上實現了+8.1 的改進,並且在ScanNet 資料集上實現了令人印象深刻的+31.5 的改進。程式碼和模型可公開取得:https://github.com/yangcaoai/3DGS-DET。 ?紙|代碼(還沒有)
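下面以一段假設性的 NumPy 草圖說明「以框為中心的取樣」的直覺(非論文實作:論文是由 2D 框推導 3D 空間中的物件機率分佈,此處為求精簡直接以 3D 框近似;`box_focused_sampling` 等名稱為本文假設):落在框內的高斯以較高機率保留,框外多為背景或雜訊的高斯以較低機率保留。

```python
import numpy as np

def box_focused_sampling(centers, boxes_3d, p_in=0.9, p_out=0.1, seed=0):
    """
    centers: (N, 3) 高斯中心;boxes_3d: [(min_xyz, max_xyz), ...]。
    回傳布林遮罩,表示哪些高斯被保留。
    """
    rng = np.random.default_rng(seed)
    in_any = np.zeros(len(centers), dtype=bool)
    for lo, hi in boxes_3d:
        in_any |= np.all((centers >= lo) & (centers <= hi), axis=1)
    prob = np.where(in_any, p_in, p_out)   # 框內高機率、框外低機率
    return rng.random(len(centers)) < prob
```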
自動駕駛:
2024 年:
1. 用於動態城市場景建模的街道高斯
作者:嚴雲志、林浩桐、週晨旭、王偉傑、孫海洋、詹坤、郎賢鵬、週曉偉、彭思達
抽象的
本文旨在解決利用單眼影片對動態城市街道場景進行建模的問題。最近的方法透過將履帶式車輛姿態與動畫車輛相結合來擴展 NeRF,從而實現動態城市街道場景的照片級真實感視圖合成。然而,其顯著的限制是訓練和渲染速度慢,加上對追蹤車輛姿態的高精度的迫切需求。我們引入了街道高斯,這是一種新的明確場景表示,可以解決所有這些限制。具體來說,動態城市街道被表示為一組配備語義邏輯和 3D 高斯的點雲,每個點雲與前景車輛或背景相關聯。為了對前景物體車輛的動力學進行建模,每個物體點雲都通過可優化的追蹤姿勢以及動態外觀的動態球諧函數模型進行了優化。明確表示允許輕鬆組合物件車輛和背景,從而允許在訓練半小時內以 133 FPS(1066×1600 解析度)進行場景編輯操作和渲染。所提出的方法在多個具有挑戰性的基準上進行了評估,包括 KITTI 和 Waymo Open 資料集。實驗表明,所提出的方法在所有數據集上始終優於最先進的方法。此外,儘管僅依賴現成的追蹤器的姿勢,但所提出的表示提供的性能與使用精確的地面真實姿勢所實現的性能相當。 ?紙|項目頁面|代碼(還沒有)
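摘要中「每個前景車輛在自身規範空間建模、再以可優化的追蹤姿態組回場景」的組合方式,可用下面的概念性草圖理解(非官方程式碼,`compose_frame` 為本文假設的名稱;實際方法還包含動態球諧外觀與 LiDAR 先驗等,此處未涵蓋):

```python
import numpy as np

def compose_frame(background, objects, frame_poses):
    """
    background: [(mu, cov, color), ...] 世界座標下的靜態背景高斯
    objects: {obj_id: [(mu, cov, color), ...]} 各前景車輛在規範空間中的高斯
    frame_poses: {obj_id: (R, t)} 該幀(可優化的)追蹤姿態
    回傳本幀要一起光柵化的所有世界座標高斯。
    """
    scene = list(background)
    for obj_id, gaussians in objects.items():
        R, t = frame_poses[obj_id]
        for mu, cov, color in gaussians:
            scene.append((R @ mu + t, R @ cov @ R.T, color))  # 剛體變換平均值與共變異數
    return scene
```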
2. TCLC-GS:用於周圍自動駕駛場景的緊密耦合雷射雷達相機高斯潑濺
作者:趙程、孫蘇、王若愚、郭玉良、萬軍軍、黃週、黃新宇、陳英傑、劉韌
抽象的
大多數針對城市場景的基於 3D 高斯分佈 (3D-GS) 的方法直接使用 3D LiDAR 點初始化 3D 高斯,這不僅沒有充分利用 LiDAR 數據功能,而且還忽略了將 LiDAR 與相機數據融合的潛在優勢。在本文中,我們設計了一種新型緊密耦合雷射雷達-相機高斯散射(TCLC-GS),以充分利用雷射雷達和相機感測器的綜合優勢,實現快速、高品質的3D 重建和新穎的視圖RGB/深度合成。 TCLC-GS 設計了從 LiDAR 相機資料派生的混合顯式(彩色 3D 網格)和隱式(分層八叉樹特徵)3D 表示,以豐富 3D 高斯分佈的屬性。 3D Gaussian 的屬性不僅根據 3D 網格進行初始化,提供更完整的 3D 形狀和顏色訊息,而且還透過檢索的八叉樹隱式特徵賦予更廣泛的上下文資訊。在高斯潑濺優化過程中,3D 網格提供密集的深度資訊作為監督,透過學習穩健的幾何形狀來增強訓練過程。對 Waymo 開放資料集和 nuScenes 資料集進行的綜合評估驗證了我們的方法的最先進 (SOTA) 效能。利用單一 NVIDIA RTX 3090 Ti,我們的方法示範了快速訓練,並在城市場景中以 90 FPS、解析度 1920x1280 (Waymo) 和 120 FPS、解析度 1600x900 (nuScenes) 實現即時 RGB 和深度渲染。 ?紙
3. OmniRe:全方位城市場景重建
作者:陳子宇、楊家偉、黃家輝、Riccardo de Lutio、Janick Martinez Esturo、Boris Ivanovic、Or Litany、Zan Gojcic、Sanja Fidler、Marco Pavone、李松、Yue Wang
抽象的
我們推出 OmniRe,這是一種根據裝置日誌高效重建高保真動態城市場景的整體方法。最近使用神經輻射場或高斯濺射對駕駛序列進行建模的方法已經證明了重建具有挑戰性的動態場景的潛力,但經常忽略行人和其他非車輛動態參與者,阻礙了動態城市場景重建的完整管道。為此,我們提出了一個用於駕駛場景的全面 3DGS 框架,名為 OmniRe,它允許對駕駛日誌中的各種動態物件進行準確、完整的重建。 OmniRe 基於高斯表示建立動態神經場景圖,並建立多個局部規範空間來模擬各種動態參與者,包括車輛、行人和騎自行車的人等。這種能力是現有方法無法比擬的。 OmniRe 使我們能夠整體重建場景中存在的不同對象,隨後能夠模擬所有參與者即時參與的重建場景(~60Hz)。對 Waymo 資料集的廣泛評估表明,我們的方法在數量和質量上都遠遠優於先前最先進的方法。我們相信我們的工作填補了推動重建的關鍵空白。 ?紙|項目頁面|程式碼
2023 年:
1. [CVPR '24] DrivingGaussian:用於周圍動態自動駕駛場景的複合高斯潑濺
作者:週曉宇、林志偉、單曉軍、王永濤、孫德慶、楊明軒
抽象的
我們推出 DrivingGaussian,這是一個針對動態自動駕駛場景的高效且有效的框架。對於具有移動物體的複雜場景,我們首先使用增量靜態 3D 高斯對整個場景的靜態背景進行順序漸進建模。然後,我們利用複合動態高斯圖來處理多個移動對象,單獨重建每個對象並恢復它們在場景中的準確位置和遮蔽關係。我們進一步使用 LiDAR 先驗進行高斯散射來重建具有更多細節的場景並保持全景一致性。 DrivingGaussian 在駕駛場景重建方面優於現有方法,並能夠實現具有高保真度和多攝影機一致性的逼真環景合成。 ?紙|項目頁面|代碼(還沒有)
2. [CVPR '24] HUGS:透過高斯潑濺理解整體城市 3D 場景
作者:週宏宇、邵家豪、徐璐、白東風、邱偉超、劉冰冰、王悅、Andreas Geiger、廖依依
抽象的
基於 RGB 影像的城市場景的整體理解是一個具有挑戰性但重要的問題。它包括理解幾何和外觀,以實現新穎的視圖合成、解析語義標籤和追蹤移動物件。儘管取得了相當大的進展,但現有方法通常側重於該任務的特定方面,並且需要額外的輸入,例如 LiDAR 掃描或手動註釋的 3D 邊界框。在本文中,我們介紹了一種利用 3D 高斯分佈進行整體城市場景理解的新穎管道。我們的主要思想涉及使用靜態和動態 3D 高斯的組合來聯合優化幾何、外觀、語義和運動,其中移動物體的姿勢透過物理約束進行正則化。我們的方法能夠即時渲染新視點,產生高精度的 2D 和 3D 語義訊息,並重建動態場景,即使在 3D 邊界框檢測雜訊很高的情況下也是如此。 KITTI、KITTI-360 和 Virtual KITTI 2 上的實驗結果證明了我們方法的有效性。 ?紙|項目頁面|程式碼
頭像:
2024 年:
1. GaussianBody:透過 3d 高斯潑濺重建穿著衣服的人體
作者:李夢甜、姚聖祥、謝志峰、陳克宇、蔣玉剛
抽象的
在這項工作中,我們提出了一種基於 3D Gaussian Splatting 的新型服裝人體重建方法,稱為 GaussianBody。與昂貴的基於神經輻射的模型相比,3D 高斯分佈最近在訓練時間和渲染品質方面表現出了出色的性能。然而,由於複雜的非剛性變形和豐富的布料細節,將靜態 3D 高斯潑濺模型應用於動態人體重建問題並非易事。為了解決這些挑戰,我們的方法考慮顯式姿勢引導變形來關聯規範空間和觀察空間中的動態高斯,引入基於物理的先驗和正則化變換有助於減輕兩個空間之間的模糊性。在訓練過程中,我們進一步提出了一種姿態細化策略來更新姿態迴歸,以補償不準確的初始估計,並提出一種尺度分割機制來增強迴歸點雲的密度。實驗驗證了我們的方法可以實現最先進的真實感小說視圖渲染結果,具有動態穿著人體的高品質細節,以及顯式幾何重建。 ?紙
2. PSAvatar:基於點的可變形形狀模型,用於透過 3D 高斯潑濺創建即時頭部頭像
作者:趙忠遠、鮑振宇、李慶、邱國平、劉康林
抽象的
儘管取得了很大進展,但要實現即時高保真頭部頭像動畫仍然很困難,現有方法必須在速度和品質之間進行權衡。基於 3DMM 的方法通常無法對眼鏡和髮型等非臉部結構進行建模,而神經隱式模型則有變形不靈活和渲染效率低下的問題。儘管3D高斯已被證明在幾何表示和輻射場重建方面具有良好的能力,但將3D高斯應用於頭部頭像創建仍然是一個重大挑戰,因為3D高斯很難對因姿勢和表情變化而引起的頭部形狀變化進行建模。在本文中,我們介紹了PSAvatar,這是一種用於創建動畫頭部頭像的新穎框架,它利用離散幾何基元創建參數化可變形形狀模型,並採用3D 高斯進行精細細節表示和高保真度渲染。參數化可變形形狀模型是基於點的可變形形狀模型(PMSM),它使用點而不是網格進行 3D 表示,以實現增強的表示靈活性。 PMSM 首先透過在表面和網格外進行採樣,將 FLAME 網格轉換為點,不僅可以重建表面結構,還可以重建複雜的幾何形狀,例如眼鏡和髮型。透過以綜合分析的方式將這些點與頭部形狀對齊,PMSM 使得利用 3D 高斯進行精細細節表示和外觀建模成為可能,從而能夠創建高保真化身。我們證明 PSAvatar 可以重建各種主體的高保真頭部頭像,並且頭像可以即時動畫(≥ 25 fps,解析度為 512 × 512)。 ?紙
3. Rig3DGS:從休閒單眼影片創建可控制肖像
作者:阿爾弗雷多·裡韋羅、沙魯克·阿塔爾、舒志新、迪米特里斯·薩馬拉斯
抽象的
從休閒智慧型手機影片中創建可控的 3D 人物肖像是非常理想的,因為它們在 AR/VR 應用中具有巨大的價值。 3D Gaussian Splatting (3DGS) 的最新發展顯示出渲染品質和訓練效率的提升。然而,從單視圖捕獲中準確建模和分離頭部運動和臉部表情以實現高品質渲染仍然是一個挑戰。在本文中,我們引入 Rig3DGS 來應對這項挑戰。我們在規範空間中使用一組 3D 高斯函數來表示整個場景,包括動態主題。使用一組控制訊號(例如頭部姿勢和表情),我們將它們轉換到具有學習變形的 3D 空間,以產生所需的渲染。我們的關鍵創新是精心設計的變形方法,以源自 3D 可變形模型的可學習先驗為指導。這種方法在訓練中非常高效,並且在控制各種捕捉的面部表情、頭部位置和視圖合成方面非常有效。我們透過廣泛的定量和定性實驗證明了所學變形的有效性。 ?紙|專案頁面
4. HeadStudio:使用 3D 高斯潑濺將文字傳送到可動畫化的頭部頭像
作者:周正林、馬凡、範赫赫、楊易
抽象的
長期以來,根據文字提示創建數位化身一直是一項令人嚮往但具有挑戰性的任務。儘管在最近的工作中透過 2D 擴散先驗獲得了有希望的結果,但當前的方法在有效實現高品質和動畫化身方面面臨著挑戰。在本文中,我們介紹了 HeadStudio,這是一種新穎的框架,它利用 3D 高斯噴射從文字提示生成逼真的動畫化身。我們的方法在語義上驅動 3D 高斯,透過中間 FLAME 表示創建靈活且可實現的外觀。具體來說,我們將 FLAME 合併到 3D 表示和分數蒸餾:1)基於 FLAME 的 3D 高斯潑濺,透過將每個點綁定到 FLAME 網格來驅動 3D 高斯點。 2)基於FLAME的樂譜蒸餾採樣,利用基於FLAME的細粒度控制訊號從文本提示中指導樂譜蒸餾。大量的實驗證明了 HeadStudio 在根據文字提示產生可動畫化身、展現視覺上吸引人的外觀方面的功效。虛擬人物能夠以 1024 解析度渲染高品質即時(≥40 fps)新穎的視圖。我們希望 HeadStudio 能夠推進數位化身的創建,並且本方法可以廣泛應用於各個領域。 ?紙|項目頁面|代碼(還沒有)
5. ImplicitDeepfake:使用 NeRF 和高斯潑濺透過隱式 Deepfake 生成進行合理的換臉
作者:Georgii Stanishevskii、Jakub Steczkiewicz、Tomasz Szczepanik、Sławomir Tadeja、Jacek Tabor、Przemysław Spurek
抽象的
許多新興的深度學習技術對電腦圖形學產生了重大影響。最有希望的突破是最近興起的神經輻射場(NeRF)和高斯散射(GS)。 NeRF 使用少量已知相機位置的影像在神經網路權重中對物件的形狀和顏色進行編碼,以產生新穎的視圖。相較之下,GS 透過將物件的特徵編碼在高斯分佈集合中,提供加速訓練和推理,而不會降低渲染品質。這兩種技術已在空間計算和其他領域找到了許多用例。另一方面,deepfake方法的出現引發了相當大的爭議。此類技術可以採用人工智慧生成的視訊形式,非常模仿真實的鏡頭。使用生成模型,他們可以修改面部特徵,從而能夠創造改變的身份或面部表情,從而展現出與真人極其逼真的外觀。儘管存在這些爭議,但 Deepfake 可以在品質理想的情況下為頭像創建和遊戲提供下一代解決方案。為此,我們展示瞭如何結合所有這些新興技術以獲得更合理的結果。我們的ImplicitDeepfake1使用經典的deepfake演算法分別修改所有訓練影像,然後在修改後的臉部上訓練NeRF和GS。這種相對簡單的策略可以產生可信的基於深度偽造的 3D 化身。 ?紙|代碼(還沒有)
6. GaussianHair:使用光感知高斯進行頭髮建模和渲染
作者:羅海民、歐陽敏、趙子君、姜素義、張龍文、張啟軒、楊偉、徐蘭、於靜怡
抽象的
髮型乍看之下反映了文化和種族。在數位時代,各種逼真的人類髮型對於高保真數位人類資產的美觀性和包容性也至關重要。然而,由於頭髮數量龐大、幾何結構複雜以及與光線的複雜交互,逼真的頭髮建模和動畫即時渲染是一項艱鉅的挑戰。本文提出了 GaussianHair,一種新穎的顯式頭髮表示。它可以根據圖像對頭髮幾何形狀和外觀進行全面建模,從而促進創新的照明效果和動態動畫功能。 GaussianHair 的核心是一個新穎的概念,即將每根髮絲表示為一系列相連的圓柱形 3D 高斯基元。這種方法不僅保留了頭髮的幾何結構和外觀,而且還允許在 2D 影像平面上進行有效的光柵化,從而促進可微分體積渲染。我們透過「GaussianHair Scattering Model」進一步增強了該模型,擅長重建髮絲的細長結構,並在均勻照明下準確捕捉其局部漫反射顏色。透過大量的實驗,我們證實 GaussianHair 在幾何和外觀保真度方面都取得了突破,超越了最先進的頭髮重建方法所遇到的限制。除了表示之外,GaussianHair 還支援頭髮的編輯、重新照明和動態渲染,提供與傳統 CG 管道工作流程的無縫整合。為了補充這些進步,我們編制了一個廣泛的真實人類頭髮資料集,每個資料集都具有細緻的髮絲幾何形狀,以推動該領域的進一步研究。 ?紙
7. GVA:從單眼視訊重建生動的 3D 高斯頭像
作者:劉新奇、吳晨明、劉嘉倫、劉星、吳金波、趙晨、馮浩成、丁二瑞、王京東
抽象的
在本文中,我們提出了一種新穎的方法,有助於從單眼視訊輸入 (GVA) 創建生動的 3D 高斯頭像。我們的創新在於解決提供高保真人體重建並將 3D 高斯與人體皮膚表面準確對齊的複雜挑戰。本文的主要貢獻是雙重的。首先,我們介紹一種姿勢細化技術,透過對齊法線貼圖和輪廓來提高手部和腳部姿勢的準確性。精確的姿勢對於正確的形狀和外觀重建至關重要。其次,我們透過一種新穎的表面引導重新初始化方法來解決先前降低 3D 高斯化身品質的不平衡聚合和初始化偏差問題,該方法可確保 3D 高斯點與化身表面的精確對齊。實驗結果表明,我們提出的方法實現了高保真、生動的 3D 高斯頭像重建。大量的實驗分析定性和定量地驗證了其性能,證明它在逼真的新穎視圖合成方面實現了最先進的性能,同時提供了對人體和手部姿勢的細粒度控制。 ?紙|項目頁面|代碼(還沒有)
8. [CVPR '24] SplattingAvatar:具有網格嵌入式高斯潑濺的逼真即時人體頭像
作者:邵志靜、王兆龍、李壯、王多屯、林相如、張宇、範明明、王澤宇
抽象的
我們推出了 SplattingAvatar,這是一種逼真人類化身的混合 3D 表示:將高斯潑濺嵌入三角形網格之上,在現代 GPU 上渲染超過 300 FPS,在行動裝置上超過 30 FPS。我們以顯式的網格幾何與高斯潑濺的隱式外觀建模,來解耦虛擬人的運動與外觀。高斯由重心座標與沿 Phong 曲面的位移定義在三角形網格上。我們擴展了 lifted optimization,使高斯參數能在沿三角形網格「行走」的同時被聯合最佳化。SplattingAvatar 是虛擬人的混合表示:網格負責低頻運動與表面變形,高斯負責高頻幾何與細節外觀。與依賴基於 MLP 的線性混合蒙皮(LBS)場來驅動運動的現有變形方法不同,我們直接以網格控制高斯的旋轉與平移,因此可與骨骼動畫、blendshape 與網格編輯等各種動畫技術相容。SplattingAvatar 可由單眼影片訓練全身與頭部頭像,在多個資料集上展現最先進的渲染品質。 ?紙|項目頁面|代碼| ?簡短介紹
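摘要所述「以重心座標與沿法線的位移把高斯嵌在三角形網格上(Phong 曲面)」的想法,可用下面這段簡化示意理解(非官方實作;`embedded_gaussian_center` 為本文假設的名稱,實際系統還會同時最佳化高斯的其餘屬性並支援跨三角形行走):

```python
import numpy as np

def embedded_gaussian_center(tri_verts, tri_normals, bary, disp):
    """
    tri_verts: (3, 3) 三角形頂點;tri_normals: (3, 3) 頂點法線;
    bary: (3,) 重心座標;disp: 沿法線的純量位移。
    回傳嵌在網格上的高斯中心(Phong 曲面近似)。
    """
    p = bary @ tri_verts                 # 重心插值得到的表面點
    n = bary @ tri_normals
    n = n / np.linalg.norm(n)            # 插值後重新正規化的法線
    return p + disp * n

# 當網格因蒙皮或 blendshape 動畫而改變 tri_verts / tri_normals 時,
# 高斯中心會跟著曲面移動;(bary, disp) 本身則是被最佳化的參數。
```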
9. SplatFace:利用可最佳化表面的高斯 Splat 面重建
作者:邵志靜、王兆龍、李壯、王多屯、林相如、張宇、範明明、王澤宇
抽象的
我們提出了 SplatFace,一種新穎的高斯噴射框架,專為 3D 人臉重建而設計,不依賴精確的預先確定的幾何形狀。我們的方法旨在同時提供高品質的新穎視圖渲染和準確的 3D 網格重建。我們採用通用 3D 可變形模型 (3DMM) 來提供表面幾何結構,從而可以使用有限的輸入影像集重建臉部。我們引入了一種聯合優化策略,透過協同非剛性對齊過程來細化高斯和可變形表面。提出了一種新穎的距離度量,splat-to-surface,透過考慮高斯位置和協方差來改善對齊。表面資訊也用於合併世界空間緻密化過程,從而實現卓越的重建品質。我們的實驗分析表明,所提出的方法在新穎的視圖合成中與其他高斯噴射技術以及在生成具有高幾何精度的 3D 面網格方面與其他 3D 重建方法相比都具有競爭力。 ?紙
10. HAHA:具有紋理網格先驗的高度鉸接高斯人體頭像
作者:邵志靜、王兆龍、李壯、王多屯、林相如、張宇、範明明、王澤宇
抽象的
我們提出了 HAHA - 一種從單眼輸入影片產生可動畫人類頭像的新方法。所提出的方法依賴於學習高斯噴射和紋理網格的使用之間的權衡,以實現高效和高保真度的渲染。我們展示了它透過 SMPL-X 參數模型控制的全身人體化身動畫和渲染的效率。我們的模型學會僅在 SMPL-X 網格中所需的區域應用高斯潑濺,例如頭髮和網格外的衣服。這導致使用最少數量的高斯來表示完整的頭像,並減少渲染偽影。這使我們能夠處理傳統上被忽視的身體小部位(例如手指)的動畫。我們在兩個開放資料集:SnapshotPeople 和 X-Humans 上證明了我們的方法的有效性。我們的方法在 SnapshotPeople 上展示了與最先進的重建品質相當的質量,同時使用了不到三分之一的高斯模型。 HAHA 在 X 人類的新穎姿勢上無論從數量或質量上都優於之前的最先進技術。 ?紙
11. [CVPRW '24] 用於 3D 感知生成對抗網路的高斯潑濺解碼器
作者:弗洛里安·巴特爾、阿里安·貝克曼、維蘭德·摩根斯坦、安娜·希爾斯曼、彼得·艾塞特
抽象的
基於 NeRF 的 3D 感知生成對抗網絡(例如 EG3D 或 GIRAFFE)在大量表徵多樣性下表現出了非常高的渲染品質。然而,使用神經輻射場進行渲染給大多數 3D 應用帶來了一些挑戰:首先,NeRF 渲染的巨大運算需求阻礙了其在低功耗設備上的使用,例如手機和 VR/AR 耳機。其次,基於神經網路的隱式表示很難融入顯式 3D 場景,例如 VR 環境或電玩遊戲。 3D 高斯潑濺 (3DGS) 透過提供可在高幀速率下高效渲染的顯式 3D 表示來克服這些限制。在這項工作中,我們提出了一種新穎的方法,它將基於 NeRF 的 3D 感知生成對抗網路的高渲染品質與 3DGS 的靈活性和計算優勢相結合。透過訓練將隱式 NeRF 表示映射到顯式 3D Gaussian Splatting 屬性的解碼器,我們可以首次將 3D GAN 的表示多樣性和質量整合到 3D Gaussian Splatting 的生態系統中。此外,我們的方法允許使用 3D 高斯潑濺場景進行高解析度 GAN 反轉和即時 GAN 編輯。 ?紙|項目頁面|程式碼
12. GoMAvatar:使用 Gaussians-on-Mesh 從單眼影片進行高效的可動畫人體建模
作者:文靜、趙曉明、任中正、Alexander G. Schwing、王申龍
抽象的
我們推出 GoMAvatar,這是一種即時、記憶體高效、高品質的可動畫人體建模的新穎方法。 GoMAvatar 將單一單眼視訊作為輸入來創建數位化身,能夠以新的姿勢重新表達並從新的視角進行即時渲染,同時與基於光柵化的圖形管道無縫整合。我們方法的核心是網格上的高斯表示,這是一種混合 3D 模型,將高斯噴射的渲染品質和速度與幾何建模和可變形網格的兼容性相結合。我們在 ZJU-MoCap 數據和各種 YouTube 影片上評估 GoMAvatar。 GoMAvatar 在渲染品質方面匹配或超越當前的單目人體建模演算法,並在計算效率 (43 FPS) 方面顯著優於它們,同時記憶體效率較高(每個物件 3.63 MB)。 ?紙|項目頁面|程式碼
13. OccGaussian:用於遮擋人體渲染的 3D 高斯潑濺
作者:葉景瑞、張宗凱、蔣玉嬌、廖慶民、楊文明、盧宗慶
抽象的
從單眼影片渲染動態 3D 人體對於虛擬實境和數位娛樂等各種應用至關重要。大多數方法都假設人們處於無障礙場景中,而在現實生活場景中,各種物體可能會導致身體部位的遮蔽。先前的方法利用NeRF進行表面渲染來恢復被遮蔽的區域,但需要一天以上的訓練時間和幾秒鐘的渲染時間,無法滿足即時互動應用的要求。為了解決這些問題,我們提出了基於 3D Gaussian Splatting 的 OccGaussian,它可以在 6 分鐘內完成訓練,並在輸入被遮蔽的情況下產生高達 160 FPS 的高品質人體渲染。 OccGaussian在規範空間中初始化3D高斯分佈,我們在遮蔽區域執行遮蔽特徵查詢,提取聚合的像素對齊特徵以補償遺失的資訊。然後,我們使用高斯特徵 MLP 進一步處理特徵以及遮蔽感知損失函數,以更好地感知遮蔽區域。在模擬和現實世界的遮蔽中進行的大量實驗表明,與最先進的方法相比,我們的方法實現了可比甚至更優越的性能。我們將訓練和推理速度分別提高了 250 倍和 800 倍。 ?紙
14. [CVPR '24] 猜猜看看不見的東西:從部分 2D 瞥見動態 3D 場景重建
作者:李仁熙、金秉俊、朱韓星
抽象的
在本文中,我們提出了一種根據單眼視訊輸入以 3D 形式重建世界和多個動態人類的方法。作為一個關鍵想法,我們透過最近出現的 3D 高斯潑濺 (3D-GS) 表示來表示世界和多個人類,從而能夠方便有效地組合和渲染它們。特別是,我們解決了 3D 人體重建中觀察嚴重受限和稀疏的場景,這是現實世界中遇到的常見挑戰。為了應對這項挑戰,我們引入了一種新穎的方法,透過融合公共空間中的稀疏線索來優化規範空間中的3D-GS 表示,其中我們利用預先訓練的2D 擴散模型來合成看不見的視圖,同時保持與觀察到的二維外觀。我們證明了我們的方法可以在各種具有挑戰性的示例中,在存在遮擋、圖像裁剪、少量鏡頭和極其稀疏的觀察的情況下重建高品質的可動畫 3D 人體。重建後,我們的方法不僅能夠在任意時間實例以任何新穎的視圖渲染場景,還能夠透過刪除個體人物或對每個人應用不同的運動來編輯 3D 場景。透過各種實驗,我們證明了我們的方法相對於現有替代方法的品質和效率。 ?紙|項目頁面|程式碼
15. [NeurIPS '24] 可泛化且可動畫的高斯頭部頭像
作者:褚軒庚、原田達也
抽象的
在本文中,我們提出了可泛化且可動畫的高斯頭部頭像(GAGAvatar),用於一次性可動畫頭部頭像重建。現有方法依賴神經輻射場,導致大量渲染消耗和低重演速度。為了解決這些限制,我們在單次前向傳播中從單一影像產生 3D 高斯參數。我們工作的關鍵創新是提出的雙提升方法,該方法可產生捕捉身份和臉部細節的高保真 3D 高斯圖像。此外,我們利用全域影像特徵和 3D 變形模型來建立 3D 高斯模型來控製表達。訓練後,我們的模型無需特定優化即可重建看不見的身份,並以即時速度執行重演渲染。實驗表明,與先前的方法相比,我們的方法在重建品質和表達準確性方面表現出優越的性能。我們相信我們的方法可以為數位化身的未來研究和先進應用建立新的基準。 ?紙|項目頁面|程式碼
16. [SIGGRAPH Asia'24] DualGS:用於沉浸式以人為中心的體積視頻的魯棒雙高斯潑濺
作者:姜宇恆、沈哲浩、於紅、郭成成、吳以澤、張英亮、於靜怡、徐蘭
抽象的
體積影片代表了視覺媒體的變革性進步,使用戶能夠自由地瀏覽沉浸式虛擬體驗,並縮小數位世界與現實世界之間的差距。然而,需要大量的手動幹預來穩定網格序列,並且在現有工作流程中產生過大的資產,阻礙了更廣泛的採用。在本文中,我們提出了一種新穎的基於高斯的方法,稱為 textit{DualGS},用於以出色的壓縮比實時、高保真地回放複雜的人類表現。我們在 DualGS 中的關鍵思想是使用相應的皮膚和關節高斯分別表示運動和外觀。這種明確的解纏結可以顯著減少運動冗餘並增強時間相干性。我們首先初始化 DualGS 並將皮膚高斯錨定到第一幀的聯合高斯。隨後,我們採用從粗到細的訓練策略來進行逐幀人類表現建模。它包括用於整體運動預測的粗對準階段以及用於魯棒追蹤和高保真渲染的細粒度優化。為了將體積視訊無縫整合到 VR 環境中,我們使用熵編碼有效地壓縮運動,並使用編解碼器壓縮和持久碼本來有效壓縮外觀。我們的方法實現了高達 120 倍的壓縮比,每幀僅需要約 350KB 的儲存空間。我們透過 VR 耳機上的逼真自由觀看體驗展示了我們的表現效果,使用戶能夠身臨其境地觀看音樂家的表演,並感受表演者指尖的音符節奏。 ?紙|項目頁面| ?簡短介紹 |數據集
17. [SIGGRAPH Asia'24] V^3:透過可串流的 2D 動態高斯在手機上查看體積視頻
作者:王鵬浩、張志瑞、王廖、姚開心、謝思源、於靜怡、吳敏業、徐蘭
抽象的
體驗像 2D 影片一樣無縫的高保真體積影片是一個長期的夢想。然而,目前的動態 3DGS 方法儘管渲染品質很高,但由於運算和頻寬限制,在行動裝置上進行串流傳輸時面臨挑戰。在本文中,我們介紹了 V^3(查看體積視訊),這是一種透過動態高斯流實現高品質移動渲染的新穎方法。我們的關鍵創新是將動態 3DGS 視為 2D 視頻,從而促進硬體視頻編解碼器的使用。此外,我們提出了一種兩階段訓練策略,以快速訓練速度來減少儲存需求。第一階段採用雜湊編碼和淺層 MLP 來學習運動,然後透過剪枝減少高斯數量以滿足串流要求,而第二階段使用殘差熵損失和時間損失微調其他高斯屬性以提高時間連續性。這種策略將運動和外觀分開,保持高渲染品質和緊湊的儲存要求。同時,我們設計了一個多平台播放器來解碼和渲染 2D 高斯影片。大量實驗證明了 V^3 的有效性,透過在常見設備上實現高品質渲染和串流傳輸,其性能優於其他方法,這是以前從未見過的。作為第一個在行動裝置上串流動態高斯的播放器,我們的配套播放器為用戶提供了前所未有的體積視訊體驗,包括平滑滾動和即時共享。我們的專案頁面及其原始程式碼可透過此 https URL 取得。 ?紙|項目頁面| ?簡短介紹
2023 年:
1. 可駕駛的3D高斯化身
作者:Wojciech Zielonka、Timur Bagautdinov、Shunsuke Saito、Michael Zollhöfer、Justus Thies、Javier Romero
抽象的
我們推出了可駕駛 3D 高斯化身 (D3GA),這是第一個用高斯圖形渲染的人體 3D 可控模型。目前逼真的可駕駛化身需要訓練期間準確的 3D 配準,測試期間的密集輸入影像,或兩者兼而有之。基於神經輻射場的那些對於遠端呈現應用來說也往往慢得令人望而卻步。這項工作使用最近提出的 3D 高斯潑濺 (3DGS) 技術,使用密集校準的多視圖視訊作為輸入,以即時幀速率渲染逼真的人體。為了使這些基元變形,我們放棄了常用的線性混合蒙皮 (LBS) 點變形方法,並使用經典的體積變形方法:籠變形。考慮到它們的尺寸較小,我們用關節角度和關鍵點驅動這些變形,這更適合通訊應用。當使用相同的訓練和測試數據時,我們對九個具有不同體型、衣服和動作的受試者進行的實驗獲得了比最先進的方法更高品質的結果。 ?紙|項目頁面| ?簡短介紹
2. SplatArmor:透過單眼 RGB 影片為可動畫人類進行鉸接高斯潑濺
作者:Rohit Jena、Ganesh Subramanian Iyer、Siddharth Choudhary、Brandon Smith、Pratik Chaudhari、James Gee
抽象的
我們提出了 SplatArmor,這是一種透過 3D 高斯「裝甲」參數化身體模型來恢復詳細且可動畫的人體模型的新穎方法。我們的方法將人類表示為規範空間內的一組 3D 高斯,其清晰度是透過將底層 SMPL 幾何體的蒙皮擴展到規範空間中的任意位置來定義的。為了解釋與姿態相關的效應,我們引入了 SE(3) 場,它使我們能夠捕捉高斯分佈的位置和各向異性。此外,我們建議使用神經顏色場來提供顏色正則化和 3D 監督,以實現這些高斯的精確定位。我們表明,高斯潑濺透過利用光柵化基元,為基於神經渲染的方法提供了有趣的替代方案,而無需面臨此類方法中通常面臨的任何不可微性和最佳化挑戰。光柵化範例使我們能夠利用正向蒙皮,並且不會受到與反向蒙皮和扭曲相關的模糊性的影響。我們在 ZJU MoCap 和 People Snapshot 資料集上展示了令人信服的結果,這強調了我們的可控人體合成方法的有效性。 ?紙|項目頁面|代碼(還沒有)
3. [CVPR '24] 可動畫高斯:學習姿勢相關高斯圖以進行高保真人體頭像建模
作者:李喆、鄭澤榮、王麗珍、劉業斌
抽象的
從 RGB 影片中建模可動畫化的人類頭像是一個長期存在且具有挑戰性的問題。最近的工作通常採用基於 MLP 的神經輻射場 (NeRF) 來表示 3D 人體,但純 MLP 仍然很難回歸與姿勢相關的服裝細節。為此,我們引入了可動畫高斯 (Animatable Gaussians),這是一種新的化身表示形式,利用強大的 2D CNN 和 3D 高斯潑濺來創建高保真化身。為了將 3D 高斯與可動畫化的頭像關聯起來,我們從輸入影片中學習參數化模板,然後在兩個前後規範高斯圖上參數化該模板,其中每個像素代表一個 3D 高斯。學習到的模板可以適應穿著的服裝,用於對裙子等寬鬆的衣服進行建模。這種模板引導的 2D 參數化使我們能夠採用強大的基於 StyleGAN 的 CNN 來學習姿勢相關的高斯圖,以對詳細的動態外觀進行建模。此外,我們引入了一種姿勢投影策略,以便在給定新姿勢的情況下更好地泛化。總的來說,我們的方法可以創建具有動態、真實和通用外觀的逼真頭像。實驗表明,我們的方法優於其他最先進的方法。 ?紙|項目頁面|程式碼
4. [CVPR '24] GART:高斯鉸接模板模型
作者:雷家輝、王玉甫、Georgios Pavlakos、劉令傑、Kostas Daniilidis
抽象的
我們引入高斯鉸接模板模型 GART,這是一種明確、高效且富有表現力的表示,用於從單目影片擷取和渲染非剛性鉸接主題。 GART 利用移動 3D 高斯的混合來明確近似可變形主體的幾何形狀和外觀。它利用先驗分類模板模型(SMPL、SMAL 等)和可學習的前向蒙皮,同時進一步推廣到具有新型潛在骨骼的更複雜的非剛性變形。 GART 可以透過單眼視訊的可微分渲染在幾秒或幾分鐘內重建,並以超過 150 fps 的速度渲染新穎的姿勢。 ?紙|項目頁面|代碼| ?簡短介紹
5. [CVPR '24] Human Gaussian Splatting:可動畫化身的即時渲染
作者:亞瑟·莫羅、宋繼飛、Helisa Dhamo、理查德·肖、周伊人、愛德華多·佩雷斯-佩利特羅
抽象的
這項工作解決了從多視圖視訊中學習的逼真人體頭像的即時渲染問題。雖然建模和渲染虛擬人類的經典方法通常使用紋理網格,但最近的研究開發了神經身體表示,可以實現令人印象深刻的視覺品質。然而,這些模型很難即時渲染,並且當角色的身體姿勢與訓練觀察不同時,它們的品質會下降。我們提出了一種基於 3D 高斯分佈的可動畫人體模型,該模型最近已成為神經輻射場的非常有效的替代方案。身體由規範空間中的一組高斯基元表示,該空間透過結合前向蒙皮和局部非剛性細化的從粗到細的方法進行變形。我們描述瞭如何從多視圖觀察中以端到端的方式學習人類高斯潑濺(HuGS)模型,並根據穿著身體的新姿勢合成的最先進方法對其進行評估。我們的方法比 THuman4 資料集上最先進的方法實現了 1.5 dB PSNR 改進,同時能夠即時渲染(512x512 解析度為 80 fps)。 ?紙|項目頁面| ?簡短介紹
6. [CVPR '24] HUGS:人類高斯斑點
作者:Muhammed Kocabas、Jen-Hao Rick Chang、James Gabriel、Oncel Tuzel、Anurag Ranjan
抽象的
神經渲染的最新進展將訓練和渲染時間提高了幾個數量級。雖然這些方法展示了最先進的品質和速度,但它們是為靜態場景的攝影測量而設計的,並不能很好地推廣到環境中自由移動的人類。在這項工作中,我們引入了人類高斯分佈 (HUGS),它使用 3D 高斯分佈 (3DGS) 來表示可動畫的人體和場景。我們的方法只需要一個少量(50-100)幀的單眼視頻,它就能在 30 分鐘內自動學習將靜態場景和完全可動畫化的人類頭像分開。我們利用 SMPL 身體模型來初始化人類高斯模型。為了捕捉 SMPL 未建模的細節(例如布料、頭髮),我們允許 3D 高斯偏離人體模型。將 3D 高斯函數用於動畫人物會帶來新的挑戰,包括在表達高斯函數時產生的偽影。我們建議聯合優化線性混合蒙皮權重,以協調動畫期間各個高斯的運動。我們的方法能夠實現人類的新穎姿勢合成以及人類和場景的新穎視圖合成。我們以 60 FPS 的渲染速度實現了最先進的渲染質量,同時訓練速度比之前的工作快約 100 倍。 ?紙|項目頁面|代碼(還沒有)
7. [CVPR '24] 用於高效 3D 人類生成的高斯殼圖
作者:Rameen Abdal、王一凡、石子凡、徐英浩、Ryan Po、鄺正飛、陳啟峰、Dit-Yan Yeung、Gordon Wetzstein
抽象的
高效產生 3D 數位人類對於虛擬實境、社群媒體和電影製作等多個行業都很重要。 3D 生成對抗網路 (GAN) 已經證明了生成資產的最先進 (SOTA) 品質和多樣性。然而,目前的 3D GAN 架構通常依賴體積表示,而渲染速度很慢,阻礙了 GAN 訓練並需要多視圖不一致的 2D 上採樣器。在這裡,我們引入高斯殼圖 (GSM) 作為一個框架,該框架使用可清晰表達的基於多殼的支架將 SOTA 生成器網路架構與新興的 3D 高斯渲染基元連接起來。在此設定中,CNN 產生 3D 紋理堆疊,其中的特徵映射到殼。後者代表了處於規範身體姿勢的數位人模板表面的充氣和放氣版本。我們不是直接對殼進行光柵化,而是在殼上對 3D 高斯進行取樣,其屬性被編碼在紋理特徵中。這些高斯函數被高效且可微分地渲染。在 GAN 訓練期間以及在推理時將身體變形為任意使用者定義的姿勢時,連接外殼的能力非常重要。我們的高效渲染方案無需視圖不一致的上採樣器,並以 512 × 512 像素的原始解析度實現高品質的多視圖一致渲染。我們證明,當在單視圖資料集(包括 SHHQ 和 DeepFashion)上進行訓練時,GSM 可以成功生成 3D 人體。 ?紙|項目頁面|程式碼
8. GaussianHead:具有可學習高斯推導的高保真頭部頭像
作者:王傑、謝久成、李賢彥、潘志文、徐峰、高浩
抽象的
為給定的主題建立生動的 3D 頭部頭像並在其上實現一系列動畫既有價值又具有挑戰性。本文介紹了 GaussianHead,它使用各向異性 3D 高斯模型對動作人體頭部進行建模。在我們的框架中,分別建構了運動變形場和多重解析度三平面來處理頭部的動態幾何形狀和複雜紋理。值得注意的是,我們對每個高斯函數施加了獨特的推導方案,該方案透過一組用於位置變換的可學習參數來產生其多個分身。透過這種設計,我們可以緊湊而準確地編碼高斯的外觀訊息,甚至是那些具有複雜結構的頭部特定組件。此外,對新加入的高斯模型採用繼承推導策略,以促進訓練加速。大量的實驗表明,我們的方法可以產生高保真渲染,在重建、跨身份重演和新穎的視圖合成任務方面優於最先進的方法。 ?紙|項目頁面|程式碼
9. [CVPR '24] GaussianAvatars: Rigged 3D Gaussian 的真實感頭部頭像
作者:錢深漢、Tobias Kirschstein、Liam Schoneveld、Davide Davoli、Simon Giebenhain、Matthias Nießner
抽象的
我們引入了 GaussianAvatars,這是一種創建逼真頭部頭像的新方法,在表情、姿勢和視角方面完全可控。核心思想是基於 3D 高斯圖的動態 3D 表示,該圖被綁定到參數化可變形臉部模型。這種組合促進了照片級真實感渲染,同時允許透過底層參數模型進行精確的動畫控制,例如,透過來自驅動序列的表達傳輸或透過手動更改可變形模型參數。我們透過三角形的局部座標系對每個板進行參數化,並優化顯式位移偏移以獲得更準確的幾何表示。在頭像重建過程中,我們以端到端的方式聯合優化可變形模型參數和高斯分佈參數。我們在幾個具有挑戰性的場景中展示了逼真頭像的動畫功能。例如,我們展示了駕駛影片的重演,我們的方法明顯優於現有的作品。 ?紙|項目頁面|代碼| ?簡短介紹
10. [CVPR '24] GPS-Gaussian:用於即時人類小說視圖合成的可泛化像素級 3D 高斯潑濺
作者:鄭順源、周博耀、邵睿智、劉伯寧、張昇平、聶立強、劉業斌
抽象的
我們提出了一種稱為 GPS-高斯的新方法,用於即時合成角色的新穎視圖。所提出的方法能夠在稀疏視圖相機設定下實現 2K 解析度渲染。與需要針對每個主題進行最佳化的原始高斯潑濺或神經隱式渲染方法不同,我們引入了在來源視圖上定義的高斯參數圖,並直接回歸高斯潑濺屬性,以實現即時新穎的視圖合成,而無需任何微調或最佳化。為此,我們在大量人體掃描資料上訓練高斯參數回歸模組,並結合深度估計模組將 2D 參數圖提升到 3D 空間。所提出的框架是完全可微的,並且在多個數據集上進行的實驗表明,我們的方法優於最先進的方法,同時實現了超快的渲染速度。 ?紙|項目頁面|代碼| ?簡短介紹
11. GauHuman:來自單目人類影片的鉸接高斯潑濺
作者:胡壽康、劉子偉
抽象的
我們提出了GauHuman,一種採用高斯潑濺法的3D 人體模型,可實現快速訓練(1~2 分鐘)和實時渲染(高達189 FPS),而現有的基於NeRF 的隱式表示建模框架需要數小時的訓練和數秒的時間每幀渲染的數量。具體來說,GauHuman 在規範空間中對高斯潑濺進行編碼,並透過線性混合蒙皮(LBS) 將3D 高斯從規範空間轉換到姿勢空間,其中有效的姿勢和LBS 細化模組旨在以可忽略的計算成本學習3D 人體的精細細節。此外,為了實現 GauHuman 的快速優化,我們使用 3D 人類先驗來初始化和修剪 3D 高斯,同時透過 KL 散度指導進行分裂/克隆,以及一種新穎的合併操作以進一步加速。在 ZJU_Mocap 和 MonoCap 資料集上的大量實驗表明,GauHuman 透過快速訓練和即時渲染速度,在定量和定性上實現了最先進的性能。值得注意的是,在不犧牲渲染品質的情況下,GauHuman 可以使用約 13k 3D 高斯快速對 3D 人類表演者進行建模。 ?紙|項目頁面|代碼| ?簡短介紹
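摘要中「在規範空間編碼高斯、以 LBS 變換到姿勢空間」這一步,可用以下簡化草圖理解(假設性示意,非官方實作;`lbs_transform_gaussian` 為本文假設的名稱,略去了論文中的姿態/LBS 細化模組、KL 散度引導的分裂與克隆以及合併操作):

```python
import numpy as np

def lbs_transform_gaussian(mu_c, cov_c, skin_w, joint_R, joint_t):
    """
    以線性混合蒙皮 (LBS) 把規範空間的高斯 (mu_c, cov_c) 變換到姿勢空間。
    skin_w: (K,) 蒙皮權重;joint_R: (K, 3, 3);joint_t: (K, 3)。
    A = sum_k w_k [R_k | t_k](混合後的仿射變換,為 LBS 的標準近似)。
    """
    A_R = np.einsum('k,kij->ij', skin_w, joint_R)  # 混合後的線性部分
    A_t = skin_w @ joint_t                         # 混合後的平移
    mu_p = A_R @ mu_c + A_t
    cov_p = A_R @ cov_c @ A_R.T                    # 共變異數隨線性部分變換
    return mu_p, cov_p
```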
12. HeadGaS:透過 3D 高斯潑濺實現即時動畫頭部頭像
作者:Helisa Dhamo、聶銀宇、Arthur Moreau、宋繼飛、Richard Shaw、週逸仁、Eduardo Pérez-Pellitero
抽象的
過去幾年,3D 頭部動畫在品質和運行時間方面取得了重大改進,特別是在可微分渲染和神經輻射領域的進步的推動下。即時渲染是現實應用程式非常渴望的目標。我們提出了 HeadGaS,這是第一個使用 3D 高斯 Splats (3DGS) 進行 3D 頭部重建和動畫的模型。在本文中,我們介紹了一種混合模型,該模型以可學習的潛在特徵為基礎擴展了3DGS 的明確表示,該模型可以與參數化頭部模型中的低維參數線性混合,以獲得與表達相關的最終顏色和不透明度值。我們證明,HeadGaS 在即時推理幀速率方面提供了最先進的結果,其超出基線高達約 2dB,同時將渲染速度加快了 10 倍以上。 ?紙
13. [CVPR '24] HiFi4G:透過緊湊高斯潑濺進行高保真人類表現渲染
作者:姜宇恆、沈哲浩、王鵬浩、蘇卓、於紅、張英亮、於靜怡、徐蘭
抽象的
我們最近看到了照片級真實人體建模和渲染方面的巨大進步。然而,有效地渲染真實的人類表現並將其整合到光柵化管道中仍然具有挑戰性。在本文中,我們提出了 HiFi4G,這是一種基於高斯的明確而緊湊的方法,用於從密集的鏡頭中進行高保真人類表演渲染。我們的核心直覺是將 3D 高斯表示與非剛性追蹤結合起來,實現緊湊且易於壓縮的表示。我們首先提出一種雙圖機制來獲得運動先驗,使用粗變形圖進行有效初始化,並使用細粒度高斯圖來強制後續約束。然後,我們利用具有自適應時空正則化器的 4D 高斯優化方案來有效平衡非剛性先驗和高斯更新。我們還提出了一種配套壓縮方案,具有殘差補償功能,可在各種平台上提供沉浸式體驗。它實現了大約 25 倍的大幅壓縮率,每幀儲存空間不到 2MB。大量的實驗證明了我們方法的有效性,該方法在優化速度、渲染品質和儲存開銷方面顯著優於現有方法。 ?紙|項目頁面| ?簡短介紹 |數據集
14. [CVPR '24] GaussianAvatar:透過可動畫 3D 高斯從單一影片實現逼真的人類頭像建模
作者:胡亮曉、張宏文、張宇翔、周博耀、劉伯寧、張昇平、聶立強
抽象的
我們推出 GaussianAvatar,這是一種從單一影片創建具有動態 3D 外觀的逼真人類頭像的有效方法。我們首先引入可動畫化的 3D 高斯函數來明確表示各種姿勢和服裝風格的人類。這種明確且可動畫化的表示可以更有效、更一致地融合 2D 觀察中的 3D 外觀。我們的表示進一步增強了動態屬性,以支援依賴姿勢的外觀建模,其中動態外觀網路和可優化的特徵張量被設計為學習運動到外觀的映射。此外,透過利用可區分的運動條件,我們的方法可以在阿凡達建模過程中對運動和外觀進行聯合優化,這有助於解決單眼環境中長期存在的運動估計的長期問題。 GaussianAvatar的功效在公共資料集和我們收集的資料集上都得到了驗證,證明了其在外觀品質和渲染效率方面的優越性能。 ?紙|項目頁面|代碼| ?簡短的演示
15. FlashAvatar
作者:Jun Xiang,Xuan Gao,Yudong Guo,Juyong Zhang
抽象的
我們提出了Flashavatar,這是一種小說且輕巧的3D動畫化身表示,可以在幾分鐘內從短的單眼視頻序列中重建數位化身,並在消費級GPU上以300fps的形式在300fps上呈現高保真的照片真實影像。為了實現這一目標,我們維持一個均勻的3D高斯田地,該場嵌入了參數面模型表面,並學習額外的空間偏移,以模擬非表面區域和微妙的臉部細節。雖然充分使用幾何先驗可以捕捉高頻面部細節並保留誇張的表情,但適當的初始化可以幫助減少高斯人的數量,從而實現超快速的渲染速度。廣泛的實驗結果表明,Flashavatar的表現優於有關視覺品質和個性化細節的現有作品,並且在渲染速度方面幾乎更快。 ?紙|項目頁面|程式碼
16. Relightable Gaussian Codec Avatars
作者:Shunsuke Saito,Gabriel Schwartz,Tomas Simon,Junxuan Li,Giljoo Nam
抽象的
重新照明的保真度受到幾何和外觀表示的限制。對於幾何形狀,網格和體積方法都難以建模複雜的結構(例如3D頭髮幾何形狀)。對於外觀,現有的重新模型的忠誠度有限,通常太慢,無法在高解析度連續環境中即時渲染。在這項工作中,我們介紹了可重新的高斯編解碼化身,這是一種建立高保真可靠的頭部化身的方法,可以動畫以產生新穎的表情。我們基於3D高斯人的幾何模型可以捕捉3D一致的亞毫米細節,例如髮束和動態臉部序列上的毛孔。為了以統一的方式支持人頭,皮膚和頭髮等人頭的各種材料,我們提出了一種基於可學習的輻射轉移的新型可重新外觀模型。加上瀰漫組件的全球照明式球形諧波,我們使用球形高斯人透過空間全頻反射實現即時重新重新重新重新重新重新獲得。此外觀模型可以在點光和連續照明下有效地恢復。我們進一步提高了眼部反射的保真度,並透過引入可靠的明確眼模型來實現明確的注視控制。我們的方法在不損害即時效能的情況下優於現有方法。我們還展示了在束縛的消費者VR耳機上對化身的即時重新確認,並展示了我們的化身的效率和忠誠度。 ?紙|專案頁面
17. MonoGaussianAvatar:基於單眼高斯點的頭部頭像
作者:Yufan Chen,Lizhen Wang,Qijing Li,Hongjiang Xiao,Shengping Zhang,Hongxun Yao,Yebin Liu
抽象的
從單眼肖像影片序列重建的動畫照片真實的頭像化身是彌合虛擬世界和現實世界之間差距的關鍵步驟。針對這項正在進行的研究,已經利用了頭像頭像技術的最新進展,包括顯式3D可形態網格(3DMM),點雲和神經隱式表示。但是,基於3DMM的方法受其固定拓撲結構的約束,基於點的方法由於涉及的積分大量而承受著沉重的訓練負擔,而最後的方法則遭受了變形靈活性和渲染效率的限制。為了應對這些挑戰,我們提出了Monogaussianavatar(基於高斯點的頭部Avatar),這是一種新穎的方法,它利用3D高斯點表示以及高斯變形場,以從單面肖像視頻中學習顯式頭部化身。我們用高斯點定義頭像,其特徵是適應性形狀,從而使拓撲拓撲。這些點表現出具有高斯變形場與目標姿勢和人類表達的一致性運動,從而促進有效的變形。此外,高斯點具有可控的形狀,尺寸,顏色和不透明度,並結合高斯的碎屑,可以進行有效的訓練和渲染。實驗證明了我們方法的出色性能,該方法在以前的方法中實現了最先進的結果。 ?紙|項目頁面|代碼(尚未)| ?簡短的演示
18. ASH:用於高效逼真人體渲染的可動畫高斯潑濺
作者:Haokai Pang,Heming Zhu,Adam Kortylewski,Christian Theobalt,Marc Habermann
抽象的
即時渲染逼真且可控制的人類頭像是電腦視覺和圖形學的基石。儘管神經隱式渲染的最新進展已解除了對數位化身的前所未有的光真相,但僅在靜態場景中證明了即時效能。為了解決這個問題,我們提出了Ash,這是一種可動畫的高斯分裂方法,用於即時對動態人類的逼真渲染。我們將穿衣服的人類參數化為動畫3D高斯人,可以有效地將其散佈到圖像空間中以產生最終的渲染。但是,在3D空間中,天真地學習高斯參數在計算上構成了嚴重的挑戰。取而代之的是,我們將高斯人連接到可變形的字元模型上,並在2D紋理空間中學習其參數,這允許利用高效的2D卷積體系結構,這些卷積體系結構可以輕鬆地使用所需數量的高斯人進行擴展。我們在可控制的化身上使用競爭方法進行基準灰分,這表明我們的方法比現有的即時方法大幅度優於現有的即時方法,並且比離線方法顯示出可比甚至更好的結果。 ?紙|項目頁面|代碼(尚未)| ?簡短的演示
19. 3DGS-Avatar:透過可變形 3D 高斯潑濺實現可動畫頭像
作者:Zhiyin Qian,Shaofei Wang,Marko Mihajlovic,Andreas Geiger,Siyu Tang
抽象的
我們介紹了一種方法,該方法使用3D高斯脫落(3DGS)從單眼影片中創建動畫的人類化身。基於神經輻射場(NERF)的現有方法實現了高品質的新穎視圖/新穎置態影像合成,但通常需要訓練數天,並且在推理時間非常緩慢。最近,該社區探索了快速的網格結構,以有效地訓練衣服的化身。儘管在訓練方面非常迅速,但這些方法幾乎無法達到約15 fps的互動式渲染幀速率。在本文中,我們使用3D高斯分裂並學習非剛性變形網路來重建可動的衣服的人體化身,可以在30分鐘內進行訓練,並在實時幀速率(50+ FPS)下進行渲染。鑑於我們代表的明確性質,我們進一步引入了高斯均值向量和協方差矩陣的可能性適當化,從而增強了我們對高度明顯的未見姿勢的模型的概括。實驗結果表明,與單眼輸入的動畫化頭像創造的最新方法相比,我們的方法具有可比性甚至更好的性能,而訓練和推理的速度分別為400倍和250倍。 ?紙|項目頁面|代碼| ?簡短的演示
20. GAvatar:具有隱式網格學習的可動畫 3D 高斯頭像
作者:Ye Yuan,Xueting Li,Yangyi Huang,Shalini de Mello,Koki Nagano,Jan Kautz,Umar Iqbal
抽象的
高斯脫落已成為強大的3D表示,它利用了顯式(網格)和隱式(NERF)3D表示的優勢。在本文中,我們試圖利用高斯脫落來從文字描述中產生可逼真的動畫化頭像,以解決基於網格或基於NERF的表示所施加的限制(例如,靈活性和效率)。但是,高斯分裂的幼稚應用不能產生高品質的動畫化身,而學習不穩定性;它也無法捕捉精細的頭像幾何形狀,並且經常導致身體部位退化。為了解決這些問題,我們首先提出了一個基於原始的3D高斯表示,在姿勢驅動的原語內定義了高斯人以促進動畫。其次,為了穩定和攤銷數百萬高斯人的學習,我們建議使用神經隱式領域來預測高斯屬性(例如顏色)。最後,為了捕捉精細的頭像幾何形狀並提取詳細的網格,我們為3D高斯人提出了一種新型的基於SDF的隱式網格學習方法,該方法使基本的幾何形狀正常,並提取了高度細緻的紋理網格。我們提出的方法Gavatar可以只使用文字提示來實現大規模生成的動畫化身。 Gavatar在外觀和幾何質量方面都顯著超過了現有的方法,並以1K分辨率達到了極快的渲染(100 fps)。 ?紙|項目頁面| ?簡短的演示
21. ParDy-Human(參數化動態人體頭像)
作者:Hyunjun Jung,Nikolas Brasch,Jifei Song,Eduardo Perez-Pellitero,Yiren Zhou,Zhihao LI,Nassir Navab,Benjamin Busam
抽象的
神經輻射場的最新進展使新的視圖綜合了動態設定中的照片真實影像,可以將其應用於人類動畫的場景。但是,通常使用隱式骨幹來建立準確的模型,但是需要許多輸入視圖和其他註釋,例如人類掩模,紫外線圖和深度圖。在這項工作中,我們提出了Pardy-Human(參數化的動態人頭像),這是一種完全明確的方法,可以從單一單眼序列中建立數位化頭像。 Pardy-Human將參數驅動的動力學引入3D高斯裂片,其中3D高斯人被人姿勢模型變形以使化身動畫。我們的方法由兩個部分組成:第一個模組,該模組根據SMPL頂點和連續的模組變形了規範3D高斯,該模組進一步採用了其設計的關節編碼,並預測每個高斯變形,以處理SMPL Vertex變形以外的動態。然後,影像透過柵格製劑合成。 Pardy-Human構成了現實動態人體化身的明確模型,該模型需要更少的訓練觀點和圖像。我們的頭像學習沒有其他註釋,例如面具,並且可以透過可變背景進行培訓,同時即使在消費者硬體上也可以有效地推斷出全解析度影像。我們提供實驗證據表明,帕迪人類的表現優於ZJU-MOCAP和THUMAN4.0資料集最先進的方法。 ?紙
22. Human101:100 秒內從單視角訓練 100+ FPS 的人體高斯
作者:Mingwei Li,Jichen Tao,Zongxin Yang,Yi Yang
抽象的
透過單一影片重建人體在虛擬實境領域中扮演關鍵角色。一個普遍的應用程式場景需要快速重建高保真3D數位人類,同時確保即時渲染和互動。現有的方法通常難以滿足這兩個要求。在本文中,我們介紹了Human101,這是一個新穎的框架,旨在透過在100秒內訓練3D Gaussians從1影片中產生高保真動態3D人體重建,並在100+ fps中渲染。我們的方法利用了3D高斯碎片的優勢,這提供了3D人類的明確有效表示。 Human101與先前基於NERF的管道分開,巧妙地應用了一種以人為中心的前向高斯動畫方法來變形3D高斯的參數,從而提高了渲染速度(即,在令人印象深刻的60+ fps上渲染1024個解析度的影像,並呈現512--渲染512---- 100+ fps的解析度影像)。實驗結果表明,我們的方法基本上黯然失色,每秒的幀速度高達10倍,並提供了可比或出色的渲染品質。 ?紙|項目頁面|代碼(還沒有)
23. Gaussian Head Avatar:透過動態高斯實現超高保真頭部頭像
作者:Yuelang Xu,Benwang Chen,Zhe Li,Hongwen Zhang,Lizhen Wang,Zerong Zheng,Yebin Liu
抽象的
創建高保真3D Head Avatars一直是研究熱點,但是在輕巧的稀疏視圖設定下仍然存在巨大的挑戰。在本文中,我們提出了高斯頭像,由可控制的3D高斯代表高保真頭像頭像建模。我們優化了中性的3D高斯和完全學習的基於MLP的變形場,以捕捉複雜表達式。這兩個部分彼此受益,因此我們的方法可以在確保表達精度的同時對細粒的細節進行建模。此外,我們設計了一個精心設計的幾何引導初始化策略,基於隱式SDF和深度訓練程序的穩定性和收斂性。實驗表明,我們的方法的表現優於其他最先進的稀疏視圖方法,即使在誇張的表達式下,也以2K解析度達到了超高保真渲染品質。 ?紙|項目頁面| |代碼| ?簡短的演示
24. HumanSplat:具有結構先驗的可泛化單圖人體高斯潑濺
作者:Panwang Pan,Zhuo SU,Chenguo Lin,Zhen Fan,Yongjie Zhang,Zeming Li,Tingting Shen,Yadong Mu,Yadong Mu,Yebin Liu
抽象的
儘管最新的高保真人類重建技術取得了進步,但對密集捕獲的圖像或耗時的 /演算法優化的要求顯著阻礙了它們在更廣泛的情況下的應用。為了解決這些問題,我們介紹了人類平層,以預測任何人類從單一輸入影像中以可概括的方式從任何人中的3D高斯裂片。特別是,HumanSplat包括一個2D多視圖擴散模型和具有人類結構先驗的潛在重建變壓器,這些變壓器熟練地將幾何學先驗和語義特徵整合在統一的框架中。結合人類語義資訊的層次損失旨在實現高保真紋理建模,並更好地限制估計的多個視圖。對標準基準和野外影像進行的全面實驗表明,HumanSplat超過了實現影像逼真的新型視圖合成的現有最新方法。專案頁面:https://humansplat.github.io/。 ?紙|專案頁面
經典作品:
1. A Generalization of Algebraic Surface Drawing(代數曲面繪製的推廣)
作者:James F. Blinn
評論:第一篇渲染 3D 高斯的論文。
抽象的
三維表面的數學描述通常屬於兩個分類之一:參數和隱式。隱式表面定義為滿足某些方程式f(x,y,z)= 0的所有點。像素座標被取代為X和Y,並為z求解方程式。用於繪製此類物件的演算法主要是針對一階和二階多項式函數(稱為代數表面的子類別)開發的。本文提出了一種適用於其他功能形式的新演算法,特別是多個高斯密度分佈的總和。該演算法的創建是為了建模分子結構的電子密度圖,但可以用於其他藝術有趣的形狀。 ?紙
2. Approximate Differentiable Rendering with Algebraic Surfaces(以代數曲面近似可微分渲染)
作者:Leonid Keselman和Martial Hebert
評論:第一篇對 3D 高斯進行渲染優化的論文。
抽象的
可區分的渲染器在物件的3D表示與該物件的圖像之間提供了直接的數學連結。在這項工作中,我們為緊湊的,可解釋的表示形式開發了一個近似可區分的渲染器,我們稱之為模糊的metaballs。我們的大約渲染器著重於透過深度圖和輪廓渲染形狀。它犧牲了為實用程式提供忠誠,產生快速運行時間和可用於解決視力任務的高品質梯度資訊。與基於網格的可區分渲染器相比,我們的方法的正向通過速度更快5倍,向後傳球速度快30倍。我們方法產生的深度圖和輪廓影像在任何地方都平滑且定義。在對可區分渲染器進行姿勢估計的評估時,我們表明我們的方法是唯一一種與經典技術相媲美的方法。在Silhouette的形狀上,我們的方法僅使用梯度下降和每像素損失,而沒有任何替代損失或正則化。這些重建即使在具有分割工件的自然視訊序列上也很好地工作。 ?紙|項目頁面|代碼| ?簡短的演示
3. Unbiased Gradient Estimation for Differentiable Surface Splatting via Poisson Sampling
作者:JanU.Müller,Michael Weinmann,Reinhard Klein
評論:從底層 3D 表示建構 2D 螢幕空間高斯。
抽象的
我們提出了一個有效且GPU加速的採樣框架,該框架可以基於表面裂紋,實現無偏梯度的近似值,以實現可區分的點雲渲染。我們的框架將點對渲染圖像作為機率分佈的貢獻進行了建模。我們在此模型中為渲染函數得出無偏的近似梯度。為了有效評估所提出的樣品估計,我們引入了基於樹的資料結構,該資料結構採用多極方法在幾乎線性時間內繪製樣品。我們的梯度估計器使我們能夠避免以前的方法需要正規化,從而從圖像中恢復了更忠實的形狀。此外,我們透過完善從即時大滿貫系統獲得的相機姿勢和點雲來驗證這些改進適用於現實世界應用程式。最後,在神經渲染設定中採用我們的框架優化了點雲和網路參數,突顯了此框架增強資料驅動方法的能力。 ?紙本代碼
4. Generating and Real-Time Rendering of Clouds(雲的生成與即時渲染)
作者:PETR MAN
評論:各向異性高斯的潑濺;基本上是 3DGS 的不可微分實作。
抽象的
本文提出了一種產生和即時渲染的方法。 Perlin噪音函數產生雲的三維圖。我們還提出了一種兩次渲染演算法,該演算法是基於物理近似的。在第一個預處理階段中,它計算多個正向散射。在第二階段,一階在運行時處於各向異性散射。產生的地圖儲存為體素,不適合即時渲染。我們引入了更合適的雲內表示,該表示近似原始地圖,並包含更少的資訊。然後,雲由一組具有中心位置,半徑和密度值等參數的Metaballs(球形)表示。本文的主要貢獻是提出一種方法,該方法將原始雲圖將其轉換為內部表示。此方法使用徑向基底函數(RBF)神經網路。 ?紙
壓縮:
2024 年:
1. Reducing the Memory Footprint of 3D Gaussian Splatting(降低 3D 高斯潑濺的記憶體佔用)
作者:Panagiotis Papantonakis,Georgios Kopanas,Bernhard Kerbl,Alexandre Lanvin,George Drettakis
抽象的
3D高斯脫落為新穎的視圖綜合提供了出色的視覺質量,並透過快速的訓練和即時渲染。不幸的是,該方法的儲存和傳輸方法的記憶體要求不合理。我們首先分析原因,確定可以減少儲存的三個主要領域:用於表示場景的3D高斯原始數量,用於表示方向輻射的球形諧波的係數數量以及儲存所需的精度高斯原始屬性。我們為每個問題提供了一個解決方案。首先,我們提出了一種有效的,解決的原始修剪方法,將原始計數減少了一半。其次,我們引入了一種自適應調整方法,以選擇用於表示每個高斯原始原始性的定向輻射的係數數量,最後是一種基於密碼的量化方法,以及半循環表示,以進一步減少記憶體。綜上所述,這三個元件會導致我們測試的標準資料集上的磁碟上的整體尺寸降低,以及渲染速度的X1.7加速度。我們在標準資料集上演示了我們的方法,並展示了我們的解決方案在使用行動裝置上使用該方法時如何顯著減少下載時間(請參見圖1)。 ?紙|項目頁面|代碼(還沒有)
2. Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis(用於加速新視圖合成的壓縮 3D 高斯潑濺)
作者:Simon Niedermayr,Josef Stumpfegger,RüdigerWestermann
抽象的
最近,從稀疏影像集中引入了具有最佳化3D高斯SPLAT表示的高保真場景重建,以進行新的視圖合成。製作適合網路串流和低功率設備渲染等應用的表示形式,需要大幅減少記憶體消耗以及提高渲染效率。我們提出了一個壓縮的3D高斯sprat表示,該表示利用敏感性感知的向量聚類和量化感知訓練來壓縮方向顏色和高斯參數。博學的代碼手冊的比特率很低,在現實世界中的壓縮率最高為31倍,而視覺品質的降低僅最小。我們證明,與經過優化的GPU Compute Pipeline報道的輕量級GPU上的硬體柵格性高達4倍的幀速率可以有效地呈現壓縮的SPLAT表示形式。多個資料集的廣泛實驗證明了所提出方法的穩健性和渲染速度。 ?紙|項目頁面|程式碼
3. HAC:用於 3D 高斯潑濺壓縮的雜湊網格輔助上下文
作者:Yihang Chen,Qianyi Wu,Jianfei Cai,Mehrtash Harandi,Weiyao Lin
抽象的
3D高斯脫落(3DGS)已成為新型視圖合成的有前途的框架,以高忠誠度具有快速渲染速度。但是,實質的高斯及其相關屬性需要有效的壓縮技術。然而,高斯雲(或我們論文中的錨)的稀疏和無組織的性質給壓縮帶來了挑戰。為了解決這個問題,我們利用無組織的錨和結構化的哈希網格之間的關係,利用它們的互資訊進行上下文建模,並提出了一個高度緊湊的3DGS表示的哈希網格輔助上下文(HAC)框架。我們的方法引入了二元哈希網格,以建立連續的空間一致性,使我們能夠透過精心設計的上下文模型揭示錨固固有的空間關係。為了促進熵編碼,我們利用高斯分佈來準確估計每個量化屬性的機率,其中提出了一個自適應量化模組以實現這些屬性的高度精確量化以改善保真度恢復。此外,我們結合了一種自適應掩蔽策略,以消除無效的高斯和錨。重要的是,我們的工作是探索基於上下文的3DG表示形式壓縮的先驅,與Vanilla 3DG相比,尺寸顯著降低了75倍以上,同時提高了忠誠度,並提高了11倍以上的尺寸,超過11倍SOTA 3DGS 3DGS壓縮進場助手caffold cacaffold scapfold scapfold scapfold scapfold scacfold scacfold cackaffold降低-gs。 ?紙|項目頁面|程式碼
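摘要提到「利用高斯分佈準確估計每個量化屬性的機率」以利熵編碼;這類熵模型的常見寫法如下(通用示意,非 HAC 原始碼;`estimated_bits` 為本文假設的名稱,量化步長假設為 1):

```python
from math import erf, sqrt, log2

def gaussian_cdf(x, mu, sigma):
    """標準高斯累積分佈函數(以 erf 表示)。"""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def estimated_bits(q, mu, sigma):
    """以高斯機率模型估計量化值 q 的編碼成本:p = CDF(q+0.5) - CDF(q-0.5),bits = -log2(p)。"""
    p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
    return -log2(max(p, 1e-9))  # 數值下限,避免 log(0)
```

訓練時把這個估計的位元數加進損失,即可在渲染品質與碼率之間做權衡。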
4. 端到端率失真優化的 3D 高斯表示(RDO-Gaussian)
作者:Henan Wang,Hanxin Zhu,Tianyu HE,Runsen Feng,Jiajun Deng,Jiang Bian,Zhibo Chen
抽象的
3D高斯脫落(3DG)已成為一種新興技術,在3D表示和影像渲染方面具有巨大的潛力。但是,3DG的大量儲存開銷極大地阻礙了其實際應用。在這項工作中,我們將緊湊的3D高斯學習作為端到端率延伸優化(RDO)問題,並提出可以實現靈活且連續的速率控制的RDO-Gaussian。 RDO-Gaussian解決了目前方案中存在的兩個主要問題:1)與先前的努力不同,以最大程度地減少固定失真下的速率,我們引入了動態修剪和熵約束的向量量化(ECVQ),以優化同一速率和失真時間。 2)先前的作品平均處理每個高斯的顏色,而我們對不同區域和材料的顏色進行建模,並具有可學習的參數數量。我們在真實和合成場景上驗證了我們的方法,展示了RDO-Gaussian大大降低了40倍以上的3D高斯的大小,並且超過了速率降低性能的現有方法。 ?紙|項目頁面|程式碼
5. 3DGS.zip:3D 高斯潑濺壓縮方法綜述
作者:Milena T. Bagdasarian,Paul Knoll,Florian Barthel,Anna Hilsmann,Peter Eisert,Wieland Morgenstern
抽象的
我們對3D高斯脫落壓縮方法進行了進行中的調查,重點介紹了它們在各種基準測試中的統計性能。該調查旨在透過總結清單格式的不同壓縮方法的關鍵統計數據來促進可比較性。評估的資料集包括Tanksandtplass,Mipnerf360,Deep -Bluending和Syntheticnerf。對於每種方法,我們報告了峰值信噪比(PSNR),結構相似性指數(SSIM),學習的知覺圖像貼片相似性(LPIPS)和兆位元組(MB)的結果大小,如各自的作者所提供。這是一個正在進行的開放項目,我們將研究社區的捐款作為GitHub發行或拉動請求。請造訪http://wm.github.io/3dgs-compression-survey/有關表格的更多資訊和可排序的版本。 ?紙|專案頁面
6. LapisGS:用於自適應串流的分層漸進式 3D 高斯潑濺
作者:Yuang Shi,Simone Gasparini,GéraldineMorin,Wei Tsang ooi,
抽象的
擴展現實(XR)的興起需要有效的3D線上世界流,挑戰當前的3DGS表示,以適應頻寬受限的環境。我們提出了Lapisgs,這是一種支援自適應流和漸進渲染的分層3DG。我們的方法建立了累積表示形式的分層結構,結合了動態不透明度最佳化以維持視覺保真度,並利用佔用圖有效地管理高斯夾層。該提出的模型提供了一種漸進的表示,支援為頻寬感知流的連續渲染品質。廣泛的實驗驗證了我們方法在平衡視覺保真度與模型的緊湊性方面的有效性,SSIM提高了50.71%,LPIPS提高了286.53%,模型尺寸減少了318.41%,並顯示了頻寬適應的潛力3D流和渲染應用程式。 ?紙|專案頁面
7. IGS:具有高效多層級三平面表示的隱式高斯潑濺
作者:Minye Wu,Tinne Tuytelaars
抽象的
高斯碎片(3DGS)顯著地驅動了照片現實的小說視圖合成的最新進展。然而,3DGS資料的明確性質需要大量的儲存要求,突顯了對更有效的資料表示的緊迫需求。為了解決這個問題,我們提出了隱式高斯脫落(IGS),這是一種創新的混合模型,透過多層三平面體系結構將顯式點雲與隱式特徵嵌入式整合在一起。此體系結構具有不同層級的各種解析度的2D特徵網格,從而促進了連續的空間域表示並增強高斯基原始人之間的空間相關性。在這個基礎的基礎上,我們引入了一種基於級別的漸進培訓計劃,該計劃結合了明確的空間正則化。此方法利用了空間相關性,以增強IGS表示的渲染品質和緊湊性。此外,我們提出了一條針對點雲和2D特徵網格量身定制的新型壓縮管道,考慮到不同等級的熵變化。廣泛的實驗評估表明,我們的演算法只能使用幾個MBS提供高品質的渲染,從而有效地平衡了儲存效率和呈現忠誠度,並產生了與先進的藝術品競爭的結果。 ?紙|程式碼
2023 年:
1. LightGaussian:無界 3D 高斯壓縮,15 倍縮減與 200+ FPS
作者:Zhiwen Fan,Kevin Wang,Kairun Wen,Zehao Zhu,Dejia Xu,Zhangyang Wang
抽象的
即時神經渲染的最新進步使用基於點的技術為廣泛採用3D表示鋪平了道路。然而,諸如3D高斯碎片之類的基礎方法伴隨著由SFM點增加到數百萬人引起的大量存儲開銷,通常要求千兆字節級的磁碟空間用於單個無限的場景,帶來了重大的可擴展性挑戰,並阻礙了拆卸效率。為了應對這項挑戰,我們引入了Lightgaussian,這是一種新型方法,旨在將3D高斯人轉變為更高效,更緊湊的格式。 Lightgaussian從網路修剪的概念中汲取靈感,識別出對現場重建無關緊要的高斯人,並採用修剪和恢復過程,有效地減少了高斯計數的冗餘,同時保留了視覺效果。此外,Lightgauss還採用蒸餾和偽視圖的增強來使球形諧波在較低程度上蒸餾,從而使知識傳遞到更緊湊的表示,同時保持反射率。此外,我們提出了一種混合方案,即vectree量化,以量化所有屬性,從而導致較低的位元寬度表示,且精度損失最小。總而言之,Lightgaussian實現了15倍以上的平均壓縮率,同時將FPS從139提高到215,從而有效地表示了MIP-NERF 360,TANK和TEMPLE DATASET上的複雜場景。 ?紙|項目頁面|代碼| ?簡短的演示
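摘要中「找出對重建不重要的高斯並加以修剪」的想法,可用下面的簡化示意理解(假設性草圖:LightGaussian 實際使用跨訓練視角累積的全域貢獻分數,這裡僅以「不透明度 × 體積」近似;`prune_by_significance` 為本文假設的名稱):

```python
import numpy as np

def prune_by_significance(opacity, scales, keep_ratio=0.34):
    """
    以簡化的全域重要性分數排序並修剪高斯。
    opacity: (N,);scales: (N, 3)。回傳保留下來的索引(已排序)。
    """
    volume = np.prod(scales, axis=1)      # 以尺度乘積近似體積
    score = opacity * volume
    k = int(len(score) * keep_ratio)
    keep = np.argsort(score)[::-1][:k]    # 取分數最高的 k 個
    return np.sort(keep)
```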
2. Compact3D:以向量量化壓縮高斯潑濺輻射場模型
作者:KL Navaneet,Kossar Pourahmadi Meibodi,Soroush Abbasi Koohpayegani,Hamed Pirsiavash
抽象的
與SOTA NERF方法相比,3D高斯脫落是一種建模和渲染3D輻射場的新方法,它可以實現更快的學習和渲染時間。但是,與NERF方法相比,它在更大的儲存需求中存在缺點,因為它需要儲存幾個3D高斯人的參數。我們注意到許多高斯人可能共享相似的參數,因此我們基於 kmeans演算法引入了一種簡單的向量量化方法來量化高斯參數。然後,我們將小密碼簿以及每個高斯的程式碼索引以及程式碼索引一起儲存。此外,我們透過對索引進行排序和使用類似於運行長度編碼的方法來進一步壓縮索引。我們對標準基準和新基準進行了廣泛的實驗,該實驗比標準基準大的數量級。我們表明,我們簡單而有效的方法可以將原始3D高斯分裂方法的儲存成本降低幾乎20倍,而渲染影像的品質下降很小。 ?紙|程式碼
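摘要所述「以 k-means 向量量化共享相似參數的高斯、只儲存小型碼本與索引」可用下列簡化示意理解(非官方實作;`kmeans_codebook` 為本文假設的名稱):

```python
import numpy as np

def kmeans_codebook(params, k=4096, iters=10, seed=0):
    """
    對高斯屬性(例如共變異數或顏色參數)做簡化的 k-means 向量量化:
    回傳 codebook (k, D) 與每個高斯對應的索引 (N,)。示意用,未針對數百萬點的規模最佳化。
    """
    rng = np.random.default_rng(seed)
    codebook = params[rng.choice(len(params), size=k, replace=False)].copy()
    for _ in range(iters):
        # 指派:計算到每個碼字的平方距離,取最近者
        d2 = (params ** 2).sum(1, keepdims=True) - 2 * params @ codebook.T + (codebook ** 2).sum(1)
        idx = d2.argmin(1)
        # 更新:每個碼字改為其成員的平均
        for j in range(k):
            members = params[idx == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook, idx
```

儲存時只需保留小型碼本與每個高斯的索引;如摘要所述,將索引排序後再做類似游程編碼的處理可進一步壓縮。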
3. Compact 3D Gaussian Representation for Radiance Field(用於輻射場的緊湊 3D 高斯表示)
作者:Joo Chan Lee,Daniel Rho,Xiangyu Sun,Jong Hwan Ko,Eunbyung Park
抽象的
神經輻射場(NERFS)在捕捉具有高保真度的複雜3D場景方面具有巨大的潛力。 However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussisan-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussian by vector quantization. In our extensive experiments, we consistently show over 10× reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. ?紙|項目頁面|程式碼
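摘要中「可學習遮罩以大幅減少高斯數量」的想法,可用下列 PyTorch 草圖理解(假設性示意,非官方實作;`LearnableMask` 與門檻值 0.01 皆為本文假設,實際論文將遮罩同時作用於尺度與不透明度並搭配稀疏化正則項):

```python
import torch

class LearnableMask(torch.nn.Module):
    """為每個高斯學習遮罩參數 m:前向用硬性二值遮罩,反向用直通估計讓梯度可回傳。"""

    def __init__(self, num_gaussians, init=2.0):
        super().__init__()
        self.m = torch.nn.Parameter(torch.full((num_gaussians,), init))

    def forward(self, opacity, threshold=0.01):
        soft = torch.sigmoid(self.m)
        hard = (soft > threshold).float()
        mask = hard + soft - soft.detach()          # 直通估計 (straight-through)
        return opacity * mask.unsqueeze(-1), mask

# 訓練時可在損失中加入 λ * mask.mean() 之類的稀疏化項,促使模型丟棄貢獻小的高斯。
```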
4. [ECCV '24] Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Authors : Wieland Morgenstern, Florian Barthel, Anna Hilsmann, Peter Eisert
抽象的
3D Gaussian Splatting has recently emerged as a highly promising technique for modeling of static 3D scenes. In contrast to Neural Radiance Fields, it utilizes efficient rasterization allowing for very fast rendering at high-quality. However, the storage size is significantly higher, which hinders practical deployment, eg on resource constrained devices. In this paper, we introduce a compact scene representation organizing the parameters of 3D Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a drastic reduction in storage requirements without compromising visual quality during rendering. Central to our idea is the explicit exploitation of perceptual redundancies present in natural scenes. In essence, the inherent nature of a scene allows for numerous permutations of Gaussian parameters to equivalently represent it. To this end, we propose a novel highly parallel algorithm that regularly arranges the high-dimensional Gaussian parameters into a 2D grid while preserving their neighborhood structure. During training, we further enforce local smoothness between the sorted parameters in the grid. The uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless integration with established renderers. Our method achieves a reduction factor of 17x to 42x in size for complex scenes with no increase in training time, marking a substantial leap forward in the domain of 3D scene distribution and consumption. ?紙|項目頁面|程式碼
擴散:
2024 年:
1. AGG: Amortized Generative 3D Gaussians for Single Image to 3D
Authors : Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat
抽象的
Given the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to its superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation. 3D Gaussian splatting approaches for image to 3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster. ?紙| Project Page| ?簡短的演示
2. Fast Dynamic 3D Object Generation from a Single-view Video
Authors : Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang
抽象的
Generating dynamic three-dimensional (3D) object from a single-view video is challenging due to the lack of 4D labeled data. Existing methods extend text-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling, but they are slow and expensive to scale (eg, 150 minutes per object) due to the need for back-propagating the information-limited supervision signals through a large pretrained model. To address this limitation, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data to directly train a novel 4D Gaussian splatting model with explicit point cloud geometry, enabling real-time rendering under continuous camera trajectories. Extensive experiments on synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed when compared to prior art alternatives while preserving the same level of innovative view synthesis quality. For example, Efficient4D takes only 14 minutes to model a dynamic object. ?紙|項目頁面|代碼| ?簡短的演示
3. GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting
Authors : Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
抽象的
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2)由於視圖覆蓋不足,部分省略或高度壓縮物件資訊。 To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation.然後,我們基於擴散模型建立高斯修復模型來補充遺漏的物件訊息,其中高斯被進一步細化。我們設計了一種自生成策略來獲取圖像對來訓練修復模型。 Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods. ?紙|項目頁面|代碼| ?簡短的演示
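摘要提到以視覺外殼(visual hull)為稀疏視角注入結構先驗;其基本想法可用下面的草圖理解(假設性示意,非官方實作;`visual_hull_filter` 為本文假設的名稱):只保留在所有視角都投影到前景遮罩內的候選 3D 點。

```python
import numpy as np

def visual_hull_filter(points, masks, projections):
    """
    points: (N, 3) 候選 3D 點;masks: 每個視角的 (H, W) 布林前景遮罩;
    projections: 每個視角的 3x4 投影矩陣。回傳通過所有視角檢查的點。
    """
    keep = np.ones(len(points), dtype=bool)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    for mask, P in zip(masks, projections):
        uvw = homo @ P.T                       # 投影到影像平面
        uv = uvw[:, :2] / uvw[:, 2:3]
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        in_mask = np.zeros(len(points), dtype=bool)
        in_mask[inside] = mask[v[inside], u[inside]]
        keep &= in_mask                        # 任一視角落在遮罩外就剔除
    return points[keep]
```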
4. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Authors : Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-view Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: (1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. (2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation. ?紙|項目頁面|程式碼
5. GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Authors : Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
抽象的
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an object-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interples object ed from the LLMs to align with the generated scene 。 Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. ?紙|項目頁面|代碼(還沒有)
6. IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
Authors : Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
抽象的
Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In this paper, we further explore the design space of text-to-3D models. We significantly improve multi-view generation by considering video instead of image generators. Combined with a 3D reconstruction algorithm which, by using Gaussian splatting, can optimize a robust image-based loss, we directly produce high-quality 3D outputs from the generated views. Our new method, IM-3D, reduces the number of evaluations of the 2D generator network 10-100x, resulting in a much more efficient pipeline, better quality, fewer geometric inconsistencies, and higher yield of usable 3D assets. ?紙
7. Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
Authors : Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
抽象的
雖然文字到 3D 和圖像到 3D 生成任務受到了相當多的關注,但它們之間的一個重要但尚未充分探索的領域是可控文本到 3D 生成,我們在這項工作中主要關注這一點。 To address this task, 1) we introduce ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps.我們的創新在於引入了一個調節模組,該模組使用局部和全局嵌入來控制基礎擴散模型,這些嵌入是根據輸入條件影像和相機姿勢計算得出的。經過訓練後,MVControl 能夠為基於最佳化的 3D 生成提供 3D 擴散指導。 And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm.基於我們的 MVControl 架構,我們採用獨特的混合擴散引導方法來指導最佳化過程。為了追求效率,我們採用 3D 高斯作為表示,而不是常用的隱式表示法。我們也率先使用 SuGaR,這是一種將高斯函數綁定到網格三角形面的混合表示形式。這種方法緩解了 3D 高斯幾何形狀較差的問題,並能夠在網格上直接雕刻細粒度幾何形狀。大量實驗表明,我們的方法實現了穩健的泛化,並能夠可控地產生高品質的 3D 內容。 ?紙|項目頁面|程式碼
8. Hyper-3DG:Text-to-3D Gaussian Generation via Hypergraph
Authors : Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao
抽象的
Text-to-3D generation represents an exciting field that has seen rapid advancements, facilitating the transformation of textual descriptions into detailed 3D models. However, current progress often neglects the intricate high-order correlation of geometry and texture within 3D objects, leading to challenges such as over-smoothness, over-saturation and the Janus problem. In this work, we propose a method named ``3D Gaussian Generation via Hypergraph (Hyper-3DG)'', designed to capture the sophisticated high-order correlations present within 3D objects. Our framework is anchored by a well-established mainflow and an essential module, named ``Geometry and Texture Hypergraph Refiner (HGRefiner)''. This module not only refines the representation of 3D Gaussians but also accelerates the update process of these 3D Gaussians by conducting the Patch-3DGS Hypergraph Learning on both explicit attributes and latent visual features. Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation. Extensive experimentation has shown that our proposed method significantly enhances the quality of 3D generation while incurring no additional computational overhead for the underlying framework. ?紙|代碼(還沒有)
9. DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Authors : Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik Hang Lee, Pengyuan Zhou
抽象的
Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors, increasingly capturing the attention of both academic and industry circles. Despite significant progress, current methods still struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework that leverages Formation Pattern Sampling (FPS) for core structuring, augmented with a strategic intera sampling and這些障礙。 FPS, guided by the formation patterns of 3D objects, employs multi-timesteps sampling to quickly form semantically rich, high-quality representations, uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. The camera sampling strategy incorporates a progressive three-stage approach, specifically designed for both indoor and outdoor settings, to effectively ensure scene-wide 3D consistency. DreamScene enhances scene editing flexibility by combining objects and environments, enabling targeted adjustments. Extensive experiments showcase DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. ?紙|項目頁面|代碼(還沒有)
10. FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
Authors : Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
抽象的
Reconstructing detailed 3D objects from single-view images remains a challenging task due to the limited information available. In this paper, we introduce FDGaussian, a novel two-stage framework for single-image 3D reconstruction. Recent methods typically utilize pre-trained 2D diffusion models to generate plausible novel views from the input image, yet they encounter issues with either multi-view inconsistency or lack of geometric fidelity. To overcome these challenges, we propose an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, enabling the generation of consistent multi-view images. Moreover, we further accelerate the state-of-the-art Gaussian Splatting incorporating epipolar attention to fuse images from different viewpoints. We demonstrate that FDGaussian generates images with high consistency across different views and reconstructs high-quality 3D objects, both qualitatively and quantitatively. ?紙|專案頁面
11. BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors
Authors : Tingyang Zhang, Qingzhe Gao, Weiyu Li, Libin Liu, Baoquan Chen
抽象的
Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for training and rendering. This limitation restricts the practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present the rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating its superior performance compared to the current state-of-the-art methods. ?紙
12. BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis
Authors : Lutao Jiang, Lin Wang
抽象的
Text-to-3D synthesis has recently seen intriguing advances by combining the text-to-image models with 3D representation methods, eg, Gaussian Splatting (GS), via Score Distillation Sampling (SDS). However, a hurdle of existing methods is the low efficiency, per-prompt optimization for a single 3D object. Therefore, it is imperative for a paradigm shift from per-prompt optimization to one-stage generation for any unseen text prompts, which yet remains challenging. A hurdle is how to directly generate a set of millions of 3D Gaussians to represent a 3D object. This paper presents BrightDreamer, an end-to-end single-stage approach that can achieve generalizable and fast (77 ms) text-to-3D generation. Our key idea is to formulate the generation process as estimating the 3D deformation from an anchor shape with predefined positions. For this, we first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions, used as the centers (one attribute) of 3D Gaussians. To estimate the other four attributes (ie, scaling, rotation, opacity, and SH coefficient), we then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object. The center of each Gaussian enables us to transform the triplane feature into the four attributes. The generated 3D Gaussians can be finally rendered at 705 frames per second. Extensive experiments demonstrate the superiority of our method over existing methods. Also, BrightDreamer possesses a strong semantic understanding capability even for complex text prompts. ?紙|項目頁面|程式碼
13. GVGEN: Text-to-3D Generation with Volumetric Representation
Authors : Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He
抽象的
In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques:(1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points as a structured form GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed (∼7 seconds), effectively striking a balance between quality and efficiency. ?紙|項目頁面|代碼(還沒有)
14. SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
Authors : Jaihoon Kim, Juil Koo, Kyeongmin Yeo, Minhyuk Sung
Abstract
We introduce a general framework for generating diverse visual content, including ambiguous images, panorama images, mesh textures, and Gaussian splat textures, by synchronizing multiple diffusion processes. We present exhaustive investigation into all possible scenarios for synchronizing multiple diffusion processes through a canonical space and analyze their characteristics across applications. In doing so, we reveal a previously unexplored case: averaging the outputs of Tweedie's formula while conducting denoising in multiple instance spaces. This case also provides the best quality with the widest applicability to downstream tasks. We name this case SyncTweedies. In our experiments generating visual content aforementioned, we demonstrate the superior quality of generation by SyncTweedies compared to other synchronization methods, optimization-based and iterative-update-based methods. ?紙|項目頁面|代碼(還沒有)
15. STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Authors : Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao
Abstract
Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from 3D generation techniques, we utilize a multi-view diffusion model to initialize multi-view images anchoring on the input video frames, where the video can be either real-world captured or generated by a video diffusion model. To ensure the temporal consistency of the multi-view sequence initialization, we introduce a simple yet effective fusion strategy to leverage the first frame as a temporal anchor in the self-attention computation. With the almost consistent multi-view sequences, we then apply the score distillation sampling to optimize the 4D Gaussian point cloud. The 4D Gaussian splatting is specially crafted for the generation task, where an adaptive densification strategy is proposed to mitigate the unstable Gaussian gradient for robust optimization. Notably, the proposed pipeline does not require any pre-training or fine-tuning of diffusion networks, offering a more accessible and practical solution for the 4D generation task. Extensive experiments demonstrate that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness, setting a new state-of-the-art for 4D generation from diverse inputs, including text, image, and video. ?紙|項目頁面|代碼| ?簡短的演示
16. Comp4D: LLM-Guided Compositional 4D Scene Generation
Authors : Dejia Xu, Hanwen Liang, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Plataniotis, Zhangyang Wang
Abstract
Recent advancements in diffusion models for 2D and 3D content creation have sparked a surge of interest in generating 4D content. However, the scarcity of 3D scene datasets constrains current methodologies to primarily object-centric generation. To overcome this limitation, we present Comp4D, a novel framework for Compositional 4D Generation. Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately. Utilizing Large Language Models (LLMs), the framework begins by decomposing an input text prompt into distinct entities and maps out their trajectories. It then constructs the compositional 4D scene by accurately positioning these objects along their designated paths. To refine the scene, our method employs a compositional score distillation technique guided by the pre-defined trajectories, utilizing pre-trained diffusion models across text-to-image, text-to-video, and text-to-3D domains. Extensive experiments demonstrate our outstanding 4D content creation capability compared to prior arts, showcasing superior visual quality, motion fidelity, and enhanced object interactions. ?紙|項目頁面| Code (not yet) | ?簡短的演示
17. DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Authors : Yuanze Lin, Ronald Clark, Philip Torr
Abstract
We present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods have been promising, prevailing methods often fail to ensure view-consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects, aligning closely with the semantics of the textual instructions. ?紙|項目頁面|代碼(還沒有)
18. SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Authors : Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai
Abstract
Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions. ?紙|項目頁面| Code (not yet) | ?簡短的演示
19. Hash3D: Training-free Acceleration for 3D Generation
Authors : Xingyi Yang, Xinchao Wang
Abstract
The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models. Despite this progress, the cumbersome optimization process per se presents a critical hurdle to efficiency. In this paper, we introduce Hash3D, a universal acceleration for 3D generation without model training. Central to Hash3D is the insight that feature-map redundancy is prevalent in images rendered from camera positions and diffusion time-steps in close proximity. By effectively hashing and reusing these feature maps across neighboring timesteps and camera angles, Hash3D substantially prevents redundant calculations, thus accelerating the diffusion model's inference in 3D generation tasks. We achieve this through an adaptive grid-based hashing. Surprisingly, this feature-sharing mechanism not only speeds up the generation but also enhances the smoothness and view consistency of the synthesized 3D objects. Our experiments, covering 5 text-to-3D and 3 image-to-3D models, demonstrate Hash3D's versatility in speeding up optimization, enhancing efficiency by 1.3 to 4 times. Additionally, Hash3D's integration with 3D Gaussian splatting largely speeds up 3D model creation, reducing text-to-3D processing to about 10 minutes and image-to-3D conversion to roughly 30 seconds. ?紙|項目頁面|程式碼
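As a rough illustration of the grid-based caching idea described above, the sketch below memoizes an expensive feature computation under a key built from quantized camera angles and the denoising timestep. The class name, bin sizes, and the `compute_fn` callback are hypothetical; Hash3D's actual adaptive grid and hashing scheme are more involved.

```python
class FeatureHashCache:
    """Toy grid-based cache: reuse diffusion feature maps computed for
    nearby camera angles and denoising timesteps (hypothetical interface)."""

    def __init__(self, angle_bin_deg=10.0, t_bin=50):
        self.angle_bin = angle_bin_deg
        self.t_bin = t_bin
        self.store = {}

    def _key(self, azimuth_deg, elevation_deg, timestep):
        # Quantize the query so that close-by views / timesteps collide.
        return (
            int(azimuth_deg // self.angle_bin),
            int(elevation_deg // self.angle_bin),
            int(timestep // self.t_bin),
        )

    def get_or_compute(self, azimuth_deg, elevation_deg, timestep, compute_fn):
        key = self._key(azimuth_deg, elevation_deg, timestep)
        if key not in self.store:      # cache miss: run the expensive computation
            self.store[key] = compute_fn()
        return self.store[key]         # cache hit: reuse the stored feature map

# Usage (illustrative): feats = cache.get_or_compute(az, el, t, lambda: run_unet_block(x, t))
```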
20. Zero-shot Point Cloud Completion Via 2D Priors
Authors : Tianxin Huang, Zhiwen Yan, Yuyang Zhao, Gim Hee Lee
Abstract
3D point cloud completion is designed to recover complete shapes from partially observed point clouds. Conventional completion methods typically depend on extensive point cloud data for training, with their effectiveness often constrained to object categories similar to those seen during training. In contrast, we propose a zero-shot framework aimed at completing partially observed point clouds across any unseen categories. Leveraging point rendering via Gaussian Splatting, we develop techniques of Point Cloud Colorization and Zero-shot Fractal Completion that utilize 2D priors from pre-trained diffusion models to infer missing regions. Experimental results on both synthetic and real-world scanned point clouds demonstrate that our approach outperforms existing methods in completing a variety of objects without any requirement for specific training data. ?紙
21. [ECCV '24] DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Authors : Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360∘ scene generation pipeline that facilitates the creation of comprehensive 360∘ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. To address invisible regions inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of the Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360∘ perspective, providing an enhanced immersive experience over existing techniques. ?紙|項目頁面|代碼| ?簡短的演示
22. RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
Authors : Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
Abstract
We introduce RealmDreamer, a technique for generating general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image. ?紙|項目頁面|代碼(還沒有)
23. GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
Authors : Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo
Abstract
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. Existing radiance representations either require an implicit feature decoder, which significantly degrades the modeling power of the representation, or are spatially unstructured, making them difficult to integrate with mainstream 3D diffusion methods. We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting using a fixed number of free Gaussians, and then rearranging these Gaussians into a predefined voxel grid via Optimal Transport. Since GaussianCube is a structured grid representation, it allows us to use standard 3D U-Net as our backbone in diffusion modeling without elaborate designs. More importantly, the high-accuracy fitting of the Gaussians allows us to achieve a high-quality representation with orders of magnitude fewer parameters than previous structured representations for comparable quality, ranging from one to two orders of magnitude. The compactness of GaussianCube greatly eases the difficulty of 3D generative modeling. Extensive experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D synthesis all show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a highly accurate and versatile radiance representation for 3D generative modeling. ?紙|項目頁面|程式碼
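The rearrangement step can be pictured as a balanced assignment between fitted Gaussians and voxel cells. Below is a minimal sketch that uses a linear assignment solver in place of the paper's Optimal Transport formulation; it assumes exactly `grid_res**3` Gaussians and uses the squared distance between Gaussian centers and voxel centers as the cost. Function and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_gaussians_to_grid(centers, grid_res=4, bound=1.0):
    """Assign N = grid_res**3 Gaussian centers to voxel cells by solving a
    linear assignment problem (a simple stand-in for the OT rearrangement)."""
    # Voxel-center coordinates of a grid_res^3 lattice inside [-bound, bound]^3.
    axis = (np.arange(grid_res) + 0.5) / grid_res * 2 * bound - bound
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    voxel_centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)

    # Cost: squared Euclidean distance between every Gaussian and every voxel.
    cost = ((centers[:, None, :] - voxel_centers[None, :, :]) ** 2).sum(-1)
    gauss_idx, voxel_idx = linear_sum_assignment(cost)  # one Gaussian per voxel

    order = np.empty(len(centers), dtype=int)
    order[voxel_idx] = gauss_idx          # order[v] = index of the Gaussian stored in voxel v
    return order

# Usage (illustrative): order = assign_gaussians_to_grid(np.random.randn(64, 3), grid_res=4)
```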
24. 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Authors : Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee
Abstract
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal deformation based on the canonical representation to capture dynamic interactions in the reference video. The pipeline facilitates the generation of dynamic scenes with enhanced photorealism and structural integrity, viewable from multiple perspectives, thereby setting a new standard in 4D scene generation. ?紙|項目頁面|代碼(還沒有)
2023:
1. [CVPR '24] Text-to-3D using Gaussian Splatting
Authors : Zilong Chen, Feng Wang, Huaping Liu
Abstract
In this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of 3D priors and proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting its explicit nature that enables the incorporation of 3D priors. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. ?紙|項目頁面|代碼| ?簡短介紹 | ?解說視頻
2. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Authors : Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng
Abstract
Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods. ?紙|項目頁面|代碼| ?解說視頻
3. GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Authors : Taoran Yi1, Jiemin Fang, Guanjun Wu1, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Tian Qi, Xinggang Wang
Abstract
In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but the 3D consistency is hard to guarantee. This paper attempts to bridge the power of the two types of diffusion models via the recent explicit and efficient 3D Gaussian Splatting representation. A fast 3D generation framework, named as GaussianDreamer, is proposed, where the 3D diffusion model provides point cloud priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our GaussianDreamer can generate a high-quality 3D instance within 25 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time. ?紙|項目頁面|程式碼
4. GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise
Authors : Xinhai Li, Huaibin Wang, Kuo-Kun Tseng
Abstract
Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of NeRF and 2D diffusion models frequently yields oversaturated images, placing severe limitations on downstream industrial applications due to the constraints of pixelwise rendering methods. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text-to-3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We ingeniously devise an efficient method to generate noise that produces Gaussian noise from diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes. ?紙
5. [CVPR '24] LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Authors : Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen
Abstract
The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While recent advancements in text-to-3D generation have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). This paper identifies a notable deficiency in SDS, that it brings inconsistent and low-quality updating direction for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency. ?紙|程式碼
6. LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Authors : Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee
Abstract
With the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to specific domains, primarily due to their training strategies using 3D scan datasets that are far from the real world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by fully leveraging the power of existing large-scale diffusion-based generative model. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene. ?紙|項目頁面|程式碼
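The "lift inpainted images to 3D with estimated depth maps" step amounts to standard pinhole back-projection. Below is a minimal sketch, assuming a z-depth map, intrinsics `K`, and a camera-to-world matrix `c2w` (all names illustrative, not the paper's code):

```python
import numpy as np

def lift_image_to_points(depth, K, c2w):
    """Back-project an (H, W) z-depth map into world-space 3D points, as used
    when turning an inpainted view plus estimated depth into new scene points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixels
    rays_cam = (np.linalg.inv(K) @ pix.T).T                          # camera-space directions
    pts_cam = rays_cam * depth.reshape(-1, 1)                        # scale by depth
    pts_world = (c2w[:3, :3] @ pts_cam.T).T + c2w[:3, 3]             # camera-to-world transform
    return pts_world
```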
7. [CVPR '24] HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Authors : Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu
Abstract
Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. Floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios. ?紙|項目頁面|代碼| ?簡短的演示
8. CG3D: Compositional Generation for Text-to-3D
Authors : Alexander Vilesov, Pradyumna Chari, Achuta Kadambi
Abstract
With the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi-object scenes, (ii) inability to textually control multi-object configurations, and (iii) physically realistic scene composition. In this work, we propose CG3D, a method for compositionally generating scalable 3D assets that resolves these constraints. We find that explicit Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes. By utilizing a guidance framework built around this explicit representation, we show state of the art results, capable of even exceeding the guiding diffusion model in terms of object combinations and physics accuracy. ?紙|項目頁面| | ?簡短的演示
9. Learn to Optimize Denoising Scores for 3D Generation - A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting
Authors : Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu and Guosheng Lin
Abstract
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-caliber results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. Leveraging the different learnable parameters of the diffusion prior, our approach offers multiple configurations, affording various trade-offs between performance and implementation complexity. Notably, our experimental results demonstrate that our method markedly surpasses existing techniques, establishing new state-of-the-art in the realm of text-to-3D generation. Furthermore, our approach exhibits impressive performance on both NeRF and the newly introduced 3D Gaussian Splatting backbones. Additionally, our framework yields insightful contributions to the understanding of recent score distillation methods, such as the VSD and DDS loss. ?紙|項目頁面|程式碼
10. [CVPR '24] Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Authors : Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. ?紙|專案頁面
11. DreamGaussian4D: Generative 4D Gaussian Splatting
Authors : Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu
Abstract
Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on the 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting compared with implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines. ?紙|項目頁面|程式碼
12. 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
Authors : Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei
Abstract
Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs, they incur significant time and effort in prompt engineering through trial and error. This work introduces 4DGen, a novel, holistic framework for grounded 4D content generation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content. Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos), thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution supervision through rendering during training, thereby facilitating high-quality 4D generation. Additionally, we employ spatial-temporal pseudo labels on anchor frames, along with seamless consistency priors implemented through 3D-aware score distillation sampling and smoothness regularizations. Compared to existing baselines, our approach yields competitive results in faithfully reconstructing input signals and realistically inferring renderings from novel viewpoints and timesteps. Most importantly, our method supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods. ?紙|項目頁面|代碼| ?簡短的演示
13. Text2Immersion: Generative Immersive Scene with 3D Gaussian
Authors : Hao Ouyang, Kathryn Heal, Stephen Lombardi, Tiancheng Sun
Abstract
We introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts. Our proposed pipeline initiates by progressively generating a Gaussian cloud using pre-trained 2D diffusion and depth estimation models. This is followed by a refining stage on the Gaussian cloud, interpolating and refining it to enhance the details of the generated scene. Distinct from prevalent methods that focus on single object or indoor scenes, or employ zoom-out trajectories, our approach generates diverse scenes with various objects, even extending to the creation of imaginary scenes. Consequently, Text2Immersion can have wide-ranging implications for various applications such as virtual reality, game development, and automated content creation. Extensive evaluations demonstrate that our system surpasses other methods in rendering quality and diversity, further progressing towards text-driven 3D scene generation. ?紙|項目頁面|代碼(還沒有)
14. Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
Authors : Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
Abstract
Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the slow generation speed. To address these deficiencies, we present Repaint123 to alleviate multi-view bias as well as texture degradation and speed up the generation process. The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency. We further propose visibility-aware adaptive repainting strength for overlap regions to enhance the generated image quality in the repainting process. The generated high-quality and multi-view consistent images enable the use of simple Mean Square Error (MSE) loss for fast 3D content generation. We conduct extensive experiments and show that our method has a superior ability to generate high-quality 3D content with multi-view consistency and fine textures in 2 minutes from scratch. ?紙|項目頁面|代碼(還沒有)
Dynamics and Deformation:
2024:
1. 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
Authors : Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen
Abstract
We consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or capturing high-fidelity renderings. In this paper, we introduce 4D Gaussian Splatting (4DGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally compose dynamic 3D Gaussians and can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DGS demonstrates powerful capabilities for modeling complicated dynamics and fine details, especially for scenes with abrupt motions. We further implement our temporal slicing and splatting techniques in a highly optimized CUDA acceleration framework, achieving real-time inference rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DGS, which consistently outperforms existing methods both quantitatively and qualitatively. ?紙
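Temporal slicing of an XYZT Gaussian can be read as conditioning a 4D Gaussian on time: the result is a 3D Gaussian whose mean drifts linearly with t and whose opacity can be modulated by the marginal temporal density. The sketch below implements that standard conditioning identity; the paper's exact parameterization and opacity handling may differ.

```python
import numpy as np

def slice_4d_gaussian(mu, cov, t):
    """Condition a 4D (x, y, z, t) Gaussian on time t: returns the 3D mean,
    3D covariance, and an (unnormalized) temporal weight for opacity modulation."""
    mu_xyz, mu_t = mu[:3], mu[3]
    S_xx = cov[:3, :3]          # spatial block
    S_xt = cov[:3, 3]           # space-time covariance
    s_tt = cov[3, 3]            # temporal variance

    mean_3d = mu_xyz + S_xt / s_tt * (t - mu_t)        # conditional mean
    cov_3d = S_xx - np.outer(S_xt, S_xt) / s_tt        # conditional covariance
    w_t = np.exp(-0.5 * (t - mu_t) ** 2 / s_tt)        # marginal temporal density
    return mean_3d, cov_3d, w_t
```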
2. GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Authors : Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann
Abstract
Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard for existing methods to handle. The common color drifting issue that happens in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. ?紙|項目頁面| Code (not yet) | ?簡短的演示
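One way to picture the flow supervision is: splat each Gaussian's 2D displacement into the image with the same blending weights used for color, then compare the result to an off-the-shelf optical flow estimate. The toy sketch below assumes precomputed per-pixel contributor indices and weights (hypothetical tensors), which is a simplification of the paper's differentiable splatting.

```python
import torch

def gaussian_flow_loss(centers_t0, centers_t1, weights, contrib_idx, optical_flow):
    """Toy flow supervision: per pixel, blend the 2D displacement of the
    Gaussians that contributed to it and compare against optical flow.

    centers_t0/centers_t1: (G, 2) projected Gaussian centers at frames t and t+1
    weights:               (P, K) alpha-blending weights per pixel
    contrib_idx:           (P, K) indices of the K contributing Gaussians per pixel
    optical_flow:          (P, 2) reference flow from an off-the-shelf estimator
    """
    disp = centers_t1 - centers_t0                                    # (G, 2) per-Gaussian motion
    pixel_flow = (weights[..., None] * disp[contrib_idx]).sum(dim=1)  # (P, 2) blended flow
    return torch.nn.functional.l1_loss(pixel_flow, optical_flow)
```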
3. Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction
Authors : Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li
Abstract
3D Gaussian Splatting (3DGS) has become an emerging tool for dynamic scene reconstruction. However, existing methods focus mainly on extending static 3DGS into a time-variant representation, while overlooking the rich motion information carried by 2D observations, thus suffering from performance degradation and model redundancy. To address the above issues, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS. Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow. Then a novel flow augmentation method is introduced with additional insights into uncertainty and loss collaboration. Moreover, for the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed. We conduct extensive experiments on both multi-view and monocular scenes to verify the merits of our work. Compared with the baselines, our method shows significant superiority in both rendering quality and efficiency. ?紙
4. Bridging 3D Gaussian and Mesh for Freeview Video Rendering
Authors : Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao
Abstract
This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the fuzzy geometry. Meanwhile, the point-based splatting (eg the 3D Gaussian Splatting) method usually produces artifacts or blurry pixels in the area with smooth geometry and sharp textures. As a result, it is difficult, even not impossible, to represent the complex and dynamic scene with a single type of primitive. To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes. Given a sequence of tracked mesh as initialization, our goal is to simultaneously optimize the mesh geometry, color texture, opacity maps, a set of 3D Gaussians, and the deformation field. At a specific time, we perform α-blending on the RGB and opacity values based on the merged and re-ordered z-buffers from mesh and 3D Gaussian rasterizations. This produces the final rendering, which is supervised by the ground-truth image. Experiments demonstrate that our approach adapts the appropriate type of primitives to represent the different parts of the dynamic scene and outperforms all the baseline methods in both quantitative and qualitative comparisons without losing render speed. ?紙
5. [ECCV '24] Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
Authors : Jeongmin Bae*, Seoha Kim*, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames for representing a dynamic scene. However, previous works fail to accurately reconstruct complex dynamic scenes. We attribute the failure to the design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce a local smoothness regularization for per-Gaussian embedding to improve the details in dynamic regions. ?紙|項目頁面|程式碼
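Below is a minimal sketch of a deformation head driven by per-Gaussian and temporal embeddings rather than coordinates. The class name, embedding dimensions, and layer sizes are made up, and the paper's coarse/fine decomposition and local smoothness regularization are omitted.

```python
import torch
import torch.nn as nn

class EmbeddingDeformation(nn.Module):
    """Minimal deformation head: each Gaussian owns a learnable embedding,
    concatenated with a temporal embedding and decoded to per-Gaussian offsets."""

    def __init__(self, num_gaussians, g_dim=32, t_dim=16, hidden=128):
        super().__init__()
        self.g_embed = nn.Embedding(num_gaussians, g_dim)
        self.t_embed = nn.Sequential(nn.Linear(1, t_dim), nn.ReLU())
        self.mlp = nn.Sequential(
            nn.Linear(g_dim + t_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 3),   # delta position, rotation (quaternion), scale
        )

    def forward(self, gauss_ids, t):
        # gauss_ids: (N,) long tensor of Gaussian indices; t: scalar time in [0, 1]
        te = self.t_embed(torch.full((gauss_ids.shape[0], 1), float(t)))
        out = self.mlp(torch.cat([self.g_embed(gauss_ids), te], dim=-1))
        return out[:, :3], out[:, 3:7], out[:, 7:]   # d_xyz, d_rot, d_scale
```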
6. DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Authors : Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
Abstract
Existing VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be inferred by re-rendering the background to match the video frames. For the object motion, we first model the object-centric deformation of the objects by leveraging rendering losses and multi-view generative priors in an object-centric frame, then optimize object-centric to world-frame transformations by comparing the rendered outputs against the perceived pixel and optical flow. Finally, we recompose the background and objects and optimize for relative object scales using monocular depth prediction guidance. We show extensive results on the challenging DAVIS, Kubric, and self-captured videos, detail some limitations, and provide future directions. Besides 4D scene generation, our results show that DreamScene4D enables accurate 2D point motion tracking by projecting the inferred 3D trajectories to 2D, while never explicitly trained to do so. ?紙|項目頁面|代碼(還沒有)
7. [CVPR '24] 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
Authors : Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai
Abstract
In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting provides a new representation of the 3D scene, building upon which the 3D geometry could be exploited in learning the complex 3D deformation. Specifically, the scenes are represented as a collection of 3D Gaussian, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them in learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. Extensive experimental results on both synthetic and real datasets prove the superiority of our solution, which achieves new state-of-the-art performance. ?紙|項目頁面|代碼(還沒有)
8. MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos
Authors : Qingming Liu*, Yuan Liu*, Jiepeng Wang, Xianqiang Lv,Peng Wang, Wenping Wang, Junhui Hou†,
Abstract
In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms baseline methods by a significant margin.
?紙|項目頁面|代碼(還沒有)
9. [ECCVW '24] Optimizing Dynamic NeRF and 3DGS with No Video Synchronization
Authors : Seoha Kim*, Jeongmin Bae*, Youngsik Yun, HyunSeung Son, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
Recent advancements in 4D scene reconstruction using dynamic NeRF and 3DGS have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct the dynamic scenes and struggle to fit even the training views in unsynchronized settings. It happens because they employ a single latent embedding for a frame, while the multi-view images at the same frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with the field. By design, our method is applicable for various baselines, even regardless of the types of radiance fields. We conduct experiments on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance of our method. ?紙
10. [NeurIPS '24] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
Authors : Ruijie Zhu*, Yanzhe Liang*, Hanzhi Chang, Jiacheng Deng, Jiahao Lu, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang
Abstract
Dynamic scene reconstruction is a long-term challenge in the field of 3D vision. Recently, the emergence of 3D Gaussian Splatting has provided new insights into this problem. Although subsequent efforts rapidly extend static 3D Gaussian to dynamic scenes, they often lack explicit constraints on object motion, leading to optimization difficulties and performance degradation. To address the above issues, we propose a novel deformable 3D Gaussian splatting framework called MotionGS, which explores explicit motion priors to guide the deformation of 3D Gaussians. Specifically, we first introduce an optical flow decoupling module that decouples optical flow into camera flow and motion flow, corresponding to camera movement and object motion respectively. Then the motion flow can effectively constrain the deformation of 3D Gaussians, thus simulating the motion of dynamic objects. Additionally, a camera pose refinement module is proposed to alternately optimize 3D Gaussians and camera poses, mitigating the impact of inaccurate camera poses. Extensive experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods and exhibits significant superiority in both qualitative and quantitative results. ?紙|項目頁面|代碼(還沒有)
11. [ECCV '24] DGD: Dynamic 3D Gaussians Distillation
Authors : Isaac Labe*, Noam Issachar*, Itai Lang, Sagie Benaim
Abstract
We tackle the task of learning dynamic 3D semantic radiance fields given a single monocular video as input. Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics. This enables the segmentation and tracking of a diverse set of 3D semantic entities, specified using a simple and intuitive interface that includes a user click or a text prompt. To this end, we present DGD, a unified 3D representation for both the appearance and semantics of a dynamic 3D scene, building upon the recently proposed dynamic 3D Gaussians representation. Our representation is optimized over time with both color and semantic information. Key to our method is the joint optimization of the appearance and semantic attributes, which jointly affect the geometric properties of the scene. We evaluate our approach in its ability to enable dense semantic 3D object tracking and demonstrate high-quality results that are fast to render, for a diverse set of scenes. ?紙|項目頁面|代碼| ?簡短的演示
12. [NeurIPS '24] Fully Explicit Dynamic Gaussian Splatting
Authors : Junoh Lee, Changyeon Won, HyunJun Jung, Inhwan Bae, Hae-Gon Jeon
Abstract
3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging dense 3D prior and explicit representations. Unfortunately, the benefits of the prior and representation do not involve novel view synthesis for dynamic motions. Ironically, this is because the main barrier is the reliance on them, which requires increasing training and rendering times to account for dynamic motions. In this paper, we design Explicit 4D Gaussian Splatting (Ex4DGS). Our key idea is to first separate static and dynamic Gaussians during training, and to explicitly sample positions and rotations of the dynamic Gaussians at sparse timestamps. The sampled positions and rotations are then interpolated to represent both spatially and temporally continuous motions of objects in dynamic scenes as well as reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improves Ex4DGS's convergence. We initially train Ex4DGS using short timestamps and progressively extend timestamps, which makes it work well with a few point clouds. The point-backtracking is used to quantify the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate the state-of-the-art rendering quality from our method, achieving fast rendering of 62 fps on a single 2080Ti GPU. ?紙|項目頁面|程式碼
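The core "sample sparse keyframes, interpolate in between" idea can be illustrated with plain linear interpolation for positions and slerp for rotations. This is only a stand-in under those assumptions; the paper uses its own interpolation scheme.

```python
import numpy as np

def slerp(q0, q1, a):
    """Spherical interpolation between two unit quaternions."""
    d = np.clip(np.dot(q0, q1), -1.0, 1.0)
    if d < 0:                      # take the short arc
        q1, d = -q1, -d
    if d > 0.9995:                 # nearly parallel: fall back to lerp
        q = (1 - a) * q0 + a * q1
        return q / np.linalg.norm(q)
    theta = np.arccos(d)
    return (np.sin((1 - a) * theta) * q0 + np.sin(a * theta) * q1) / np.sin(theta)

def interpolate_keyframes(times, positions, rotations, t):
    """Continuous pose at time t from sparse keyframes (linear xyz, slerp quaternion)."""
    i = np.clip(np.searchsorted(times, t) - 1, 0, len(times) - 2)
    a = (t - times[i]) / (times[i + 1] - times[i])
    pos = (1 - a) * positions[i] + a * positions[i + 1]
    rot = slerp(rotations[i], rotations[i + 1], a)
    return pos, rot
```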
13. [3DV '25] EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
Authors : Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang
Abstract
Human activities are inherently complex, often involving numerous object interactions. To better understand these activities, it is crucial to model their interactions with the environment captured through dynamic changes. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand human-object interactions in 3D environments. However, most existing methods for human activity modeling neglect the dynamic interactions with objects, resulting in only static representations. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We leverage the uniquely discrete nature of Gaussian Splatting and segment dynamic interactions from the background, with both having explicit representations. Our approach employs a clip-level online learning pipeline that leverages the dynamic nature of human activities, allowing us to reconstruct the temporal evolution of the scene in chronological order and track rigid object motion. EgoGaussian shows significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art. We also qualitatively demonstrate the high quality of the reconstructed models. ?紙|項目頁面| Code (not yet) | ?簡短的演示
14. 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement
Authors : Ziqi Lu, Jianbo Ye, John Leonard
Abstract
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D changes. Our method can detect changes in cluttered environments using sparse post-change images within as little as 18s, using as few as a single new image. It does not rely on depth input, user instructions, object classes, or object models -- An object is recognized simply if it has been re-arranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at https://github.com/520xyxyzq/3DGS-CD. ?紙|代碼(還沒有)
2023:
1. [3DV '24] Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
Authors : Jonathon Luiten, Georgios Kopanas, Bastian Leibe, Deva Ramanan
Abstract
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model dynamic scenes, we allow Gaussians to move and rotate over time while enforcing that they have persistent color, opacity, and size. By regularizing Gaussians' motion and rotation with local rigidity constraints, we show that our Dynamic 3D Gaussians correctly model the same area of physical space over time, including the rotation of that space. Dense 6-DOF tracking and dynamic reconstruction emerges naturally from persistent dynamic view synthesis, without requiring any correspondence or flow as input. We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing. ?紙|項目頁面|代碼| ?解說視頻
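The local-rigidity idea can be sketched as: in each Gaussian's own rotating frame, the offsets to its nearest neighbors should stay constant between consecutive frames. A toy version with precomputed k-NN indices follows; tensor shapes and the exact weighting are assumptions, not the paper's implementation.

```python
import torch

def local_rigidity_loss(xyz_prev, xyz_curr, rot_prev, rot_curr, knn_idx):
    """Toy local-rigidity term: a Gaussian's neighbors should move as if
    rigidly attached to it between consecutive frames.

    xyz_*: (G, 3) centers, rot_*: (G, 3, 3) rotation matrices, knn_idx: (G, K)
    """
    # Neighbor offsets expressed in each Gaussian's local frame, per frame.
    off_prev = xyz_prev[knn_idx] - xyz_prev[:, None, :]          # (G, K, 3)
    off_curr = xyz_curr[knn_idx] - xyz_curr[:, None, :]
    local_prev = torch.einsum('gij,gkj->gki', rot_prev.transpose(1, 2), off_prev)
    local_curr = torch.einsum('gij,gkj->gki', rot_curr.transpose(1, 2), off_curr)
    return (local_prev - local_curr).norm(dim=-1).mean()
```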
2. [CVPR '24] Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Authors : Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, Xiaogang Jin
Abstract
Implicit neural representation has opened up new avenues for dynamic scene reconstruction and rendering. Nonetheless, state-of-the-art methods of dynamic neural rendering rely heavily on these implicit representations, which frequently struggle with accurately capturing the intricate details of objects in the scene. Furthermore, implicit methods struggle to achieve real-time rendering in general dynamic scenes, limiting their use in a wide range of tasks. To address the issues, we propose a deformable 3D Gaussians Splatting method that reconstructs scenes using explicit 3D Gaussians and learns Gaussians in canonical space with a deformation field to model monocular dynamic scenes. We also introduced a smoothing training mechanism with no extra overhead to mitigate the impact of inaccurate poses in real datasets on the smoothness of time interpolation tasks. Through differential gaussian rasterization, the deformable 3D Gaussians not only achieve higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time synthesis, and real-time rendering. ?紙|項目頁面|程式碼
3. [CVPR '24] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Authors : Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Tian Qi, Xinggang Wang
Abstract
Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to maintain. We introduce the 4D Gaussian Splatting (4D-GS) to achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency. An efficient deformation field is constructed to model both Gaussian motions and shape deformations. Different adjacent Gaussians are connected via a HexPlane to produce more accurate position and shape deformations. Our 4D-GS method achieves real-time rendering under high resolutions, 70 FPS at a 800×800 resolution on an RTX 3090 GPU, while maintaining comparable or higher quality than previous state-of-the-art method. ?紙|項目頁面|程式碼
4. Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
Authors : Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, Li Zhang
Abstract
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advancements in neural implicit models, limitations persist: (i) Inadequate Scene Structure: existing methods struggle to reveal the spatial and temporal structure of dynamic scenes by directly learning the complex 6D plenoptic function. (ii) Scaling Deformation Modeling: explicitly modeling scene element deformation becomes impractical for complex dynamics. To address these issues, we consider the spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple, consisting of 4D Gaussians parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolving appearance represented by the coefficients of 4D spherical harmonics. This approach offers simplicity, flexibility for variable-length videos and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency. ?紙|程式碼
5. [ECCV '24] A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Authors : Kai Katsumata, Duc Minh Vo, Hideki Nakayama
Abstract
In novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerous observations per time step, due to the onus of storing 3D Gaussian parameters per time step. In this study, we present an efficient 3D Gaussian representation tailored for dynamic scenes in which we define positions and rotations as functions of time while leaving other time-invariant properties of the static 3D Gaussian unchanged. Notably, our representation reduces memory usage, which is consistent regardless of the input sequence length. Additionally, it mitigates the risk of overfitting observed frames by accounting for temporal changes. The optimization of our Gaussian representation based on image and flow reconstruction results in a powerful framework for dynamic scene view synthesis in both monocular and multi-view cases. We obtain the highest rendering speed of 118 frames per second (FPS) at a resolution of 1352×1014 with a single GPU, showing the practical usability and effectiveness of our proposed method in dynamic scene rendering scenarios. ?紙|項目頁面|程式碼
6. DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Authors : Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis
Abstract
Accurately and efficiently modeling dynamic scenes and motions is a challenging task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework consisting of a tiny set of learned basis queried only in time allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while at the same time, requiring only double the storage compared to static scenes. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene leading to effective and fast optimization. This is done by binding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, independently control them, and generate novel motion combinations that have never been seen before. We can reach state-of-the-art render quality within just 5 minutes of training and in less than half an hour, we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in monocular and multi-view scenarios. ?紙|項目頁面|代碼(還沒有)
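A toy version of the motion factorization: a tiny time-conditioned network emits a handful of basis displacements, each point mixes them with learnable coefficients, and an L1 penalty on the coefficients encourages sparsity. Sizes, layer choices, and names are illustrative only.

```python
import torch
import torch.nn as nn

class MotionBasis(nn.Module):
    """Toy motion factorization: each point's trajectory is a weighted sum of
    a small set of global basis trajectories queried only at time t."""

    def __init__(self, num_points, num_basis=16, hidden=64):
        super().__init__()
        self.coeff = nn.Parameter(torch.zeros(num_points, num_basis))  # per-point weights
        self.basis = nn.Sequential(            # maps time -> num_basis 3D displacements
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_basis * 3),
        )
        self.num_basis = num_basis

    def forward(self, t):
        b = self.basis(torch.tensor([[float(t)]])).view(self.num_basis, 3)
        return self.coeff @ b                  # (num_points, 3) displacements at time t

    def sparsity_loss(self):
        return self.coeff.abs().mean()         # encourage few active trajectories per point
```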
7. [CVPR '24] Control4D: Efficient 4D Portrait Editing with Text
Authors : Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu
Abstract
We introduce Control4D, an innovative framework for editing dynamic 4D portraits using text instructions. Our method addresses the prevalent challenges in 4D editing, notably the inefficiencies of existing 4D representations and the inconsistent editing effect caused by diffusion-based editors. We first propose GaussianPlanes, a novel 4D representation that makes Gaussian Splatting more structured by applying plane-based decomposition in 3D space and time. This enhances both efficiency and robustness in 4D editing. Furthermore, we propose to leverage a 4D generator to learn a more continuous generation space from inconsistent edited images produced by the diffusion-based editor, which effectively improves the consistency and quality of 4D editing. Comprehensive evaluation demonstrates the superiority of Control4D, including significantly reduced training time, high-quality rendering, and spatial-temporal consistency in 4D portrait editing. ?紙|項目頁面|代碼(還沒有)
8. [CVPR '24] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Authors : Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
Abstract
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6-DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of the 3D Gaussians. We employ a deformation MLP to predict time-varying 6-DoF transformations for each control point, which reduces learning complexity, enhances learning ability, and facilitates obtaining temporally and spatially coherent motion patterns. We then jointly learn the 3D Gaussians, the canonical space locations of the control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the locations and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the as-rigid-as-possible principle is developed to enhance the spatial continuity and local rigidity of the learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method enables user-controlled motion editing while retaining high-fidelity appearance. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserving motion editing applications. Paper | Project Page | Code
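The deformation described above is skinning-like: each Gaussian centre is warped by the time-varying rigid transforms of its nearest control points, mixed with local interpolation weights. A minimal PyTorch sketch under those assumptions (precomputed k-NN indices, RBF-style weights, and transforms supplied by an external deformation MLP); none of the names come from the paper's code.

```python
import torch

def blend_control_point_transforms(centers, ctrl_pos, R, tvec, knn_idx, sigma=0.1):
    """
    centers:  (N, 3) canonical Gaussian centers
    ctrl_pos: (M, 3) canonical control-point positions
    R:        (M, 3, 3) per-control-point rotations at time t (from a deformation MLP)
    tvec:     (M, 3) per-control-point translations at time t
    knn_idx:  (N, K) indices of the K nearest control points of each center
    """
    neigh = ctrl_pos[knn_idx]                              # (N, K, 3)
    d2 = ((centers[:, None, :] - neigh) ** 2).sum(-1)      # (N, K) squared distances
    w = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)      # RBF-style blend weights

    Rn = R[knn_idx]                                        # (N, K, 3, 3)
    tn = tvec[knn_idx]                                     # (N, K, 3)
    local = centers[:, None, :] - neigh                    # offsets in each control frame
    warped = torch.einsum('nkij,nkj->nki', Rn, local) + neigh + tn
    return (w[..., None] * warped).sum(dim=1)              # (N, 3) deformed centers
```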
9. [CVPR '24] Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Authors : Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen
Abstract
Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then serves as regularization for the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects, maintaining 3D consistency across novel views. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues. Paper
10. [CVPR '24] Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Authors : Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao
Abstract
We introduce Gaussian-Flow, a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos. In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS). Specifically, a novel Dual-Domain Deformation Model (DDDM) is proposed to explicitly model attribute deformations of each Gaussian point, where the time-dependent residual of each attribute is captured by a polynomial fitting in the time domain and a Fourier series fitting in the frequency domain. The proposed DDDM is capable of modeling complex scene deformations across long video footage, eliminating the need to train a separate 3DGS for each frame or to introduce an additional implicit neural field to model 3D dynamics. Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction. Our proposed approach showcases a substantial efficiency improvement, achieving a 5× faster training speed compared to the per-frame 3DGS modeling. In addition, quantitative results demonstrate that the proposed Gaussian-Flow significantly outperforms previous leading methods in novel view rendering quality. Paper | Project Page | Code (not yet)
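The dual-domain residual above is essentially a per-attribute curve fit over time: a low-order polynomial plus a short Fourier series. A minimal PyTorch sketch of such a residual term follows, with illustrative orders and parameter names that are not taken from the paper.

```python
import torch
import torch.nn as nn

class DualDomainResidual(nn.Module):
    """Time-dependent residual for one Gaussian attribute (e.g. position)."""

    def __init__(self, num_points: int, dim: int, poly_order: int = 3, num_freqs: int = 4):
        super().__init__()
        self.poly = nn.Parameter(torch.zeros(num_points, dim, poly_order))   # time-domain fit
        self.sin_c = nn.Parameter(torch.zeros(num_points, dim, num_freqs))   # frequency-domain fit
        self.cos_c = nn.Parameter(torch.zeros(num_points, dim, num_freqs))

    def forward(self, t: torch.Tensor):
        # t: scalar tensor normalised to [0, 1]; returns an (N, dim) residual
        # that is added to the static attribute value.
        powers = torch.stack([t ** (i + 1) for i in range(self.poly.shape[-1])])       # (P,)
        freqs = torch.arange(1, self.sin_c.shape[-1] + 1, device=t.device) * 2 * torch.pi
        residual = (self.poly * powers).sum(-1)
        residual = residual + (self.sin_c * torch.sin(freqs * t)).sum(-1)
        residual = residual + (self.cos_c * torch.cos(freqs * t)).sum(-1)
        return residual
```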
11. [CVPR '24] CoGS: Controllable Gaussian Splatting
Authors : Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni
Abstract
Capturing and re-animating the 3D structure of articulated objects present significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons. Firstly, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras, and secondly, the lack of controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting, that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS using both synthetic and real-world datasets that include dynamic objects that differ in degree of difficulty. In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity. ?紙|項目頁面|代碼(還沒有)
12. GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
Authors : Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao
Abstract
We propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering. ?紙|項目頁面| ?簡短的演示
13. [CVPR '24] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Authors : Zhan Li, Zhang Chen, Zhong Li, Yi Xu
Abstract
Novel view synthesis of dynamic scenes has been an intriguing yet challenging problem. Despite recent advancements, simultaneously achieving high-resolution photorealistic results, real-time rendering, and compact storage remains a formidable task. To address these challenges, we propose Spacetime Gaussian Feature Splatting as a novel dynamic scene representation, composed of three pivotal components. First, we formulate expressive Spacetime Gaussians by enhancing 3D Gaussians with temporal opacity and parametric motion/rotation. This enables Spacetime Gaussians to capture static, dynamic, as well as transient content within a scene. Second, we introduce splatted feature rendering, which replaces spherical harmonics with neural features. These features facilitate the modeling of view- and time-dependent appearance while maintaining small size. Third, we leverage the guidance of training error and coarse depth to sample new Gaussians in areas that are challenging to converge with existing pipelines. Experiments on several established real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and speed, while retaining compact storage. At 8K resolution, our lite-version model can render at 60 FPS on an Nvidia RTX 4090 GPU. Paper | Project Page | Code | Short Presentation
14. MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes
Authors : Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, Jeffrey Ichnowski
Abstract
Accurate 3D tracking in highly deformable scenes with occlusions and shadows can facilitate new applications in robotics, augmented reality, and generative AI. However, tracking under these conditions is extremely challenging due to the ambiguity that arises with large deformations, shadows, and occlusions. We introduce MD-Splatting, an approach for simultaneous 3D tracking and novel view synthesis, using video captures of a dynamic scene from various camera poses. MD-Splatting builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel view synthesis. MD-Splatting learns a deformation function to project a set of Gaussians with non-metric, thus canonical, properties into metric space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on local rigidity, conservation of momentum, and isometry, which leads to trajectories with smaller trajectory errors. MD-Splatting achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. Compared to state-of-the-art, we improve 3D tracking by an average of 23.9%, while simultaneously achieving high-quality novel view synthesis. With sufficient texture such as in scene 6, MD-Splatting achieves a median tracking error of 3.39 mm on a cloth of 1 x 1 meters in size. Paper | Project Page | Code (not yet)
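One of the physics-inspired regularizers mentioned above can be pictured as an isometry penalty: distances between neighbouring Gaussians should stay close to their canonical values as the scene deforms. A minimal PyTorch sketch of such a term, with neighbour indices assumed precomputed; this illustrates the general idea rather than the paper's exact loss.

```python
import torch

def isometry_loss(pos_t, pos_canonical, knn_idx):
    """
    pos_t:         (N, 3) deformed Gaussian centers at time t
    pos_canonical: (N, 3) canonical Gaussian centers
    knn_idx:       (N, K) indices of each point's K nearest canonical neighbours
    """
    d_t = (pos_t[:, None, :] - pos_t[knn_idx]).norm(dim=-1)                  # (N, K) current distances
    d_0 = (pos_canonical[:, None, :] - pos_canonical[knn_idx]).norm(dim=-1)  # (N, K) canonical distances
    return (d_t - d_0).abs().mean()                                          # penalise stretching/compression
```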
15. [ECCV'24] SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Authors : Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero
Abstract
Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time in our dynamic interactive viewer. Paper
16. [CVPR '24] 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Authors : Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing
Abstract
Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods. Paper | Project Page | Code (not yet) | 3DGStream Viewer
Editing:
2024:
1. Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
Authors : Myrna C. Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue
Abstract
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting on it the Gaussians before α blending their color. Following this example, we train a model to include also a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors; and to generate 2D segmentation masks, by projecting the Gaussians on a plane and α blending over their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art. Code and trained models will be released upon acceptance. Paper
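Rendering a per-Gaussian segmentation feature works like rendering colour: the features of the Gaussians covering a pixel are alpha-composited front to back. A minimal PyTorch sketch of that compositing step, assuming the rasterizer already supplies the sorted per-pixel Gaussians and their alphas:

```python
import torch

def composite_features(features, alphas):
    """
    features: (M, D) feature vectors of the Gaussians overlapping one pixel,
              sorted front to back
    alphas:   (M,) their opacities after evaluating the 2D Gaussian at the pixel
    """
    # Transmittance before each Gaussian: product of (1 - alpha) of everything in front.
    transmittance = torch.cumprod(
        torch.cat([alphas.new_ones(1), 1.0 - alphas[:-1]]), dim=0)   # (M,)
    weights = alphas * transmittance                                  # (M,) blending weights
    return (weights[:, None] * features).sum(dim=0)                   # (D,) pixel feature
```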
2. CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians
Authors : Bin Dou, Tianyu Zhang, Yongjia Ma, Zhaohui Wang, Zejian Yuan
Abstract
We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a method for compact 3D-consistent scene segmentation at fast rendering speed with only RGB images input. Previous NeRF-based 3D segmentation methods have relied on implicit or voxel neural scene representation and ray-marching volume rendering which are time-consuming. Recent 3D Gaussian Splatting significantly improves the rendering speed; however, existing Gaussian-based segmentation methods (e.g., Gaussian Grouping) fail to provide compact segmentation masks, especially in zero-shot segmentation, which is mainly caused by the lack of robustness and compactness of straightforwardly assigning learnable parameters to each Gaussian when encountering inconsistent 2D machine-generated labels. Our method aims to achieve compact and reliable zero-shot scene segmentation swiftly by mapping fused spatial and semantically meaningful features for each Gaussian point with a shallow decoding network. Specifically, our method first optimizes the Gaussian points' position, covariance and color attributes under the supervision of RGB images. After Gaussian locating, we distill multi-scale DINO features extracted from images to each Gaussian through unprojection, which are then incorporated with spatial features from the fast point features processing network, i.e., RandLA-Net. Then the shallow decoding MLP is applied to the multi-scale fused features to obtain compact segmentation. Experimental results show that our model can perform high-quality zero-shot scene segmentation, as our model outperforms other segmentation methods on both semantic and panoptic segmentation tasks, meanwhile consuming approximately only 10% of the segmenting time compared to NeRF-based segmentation. Paper | Project Page | Code (not yet)
3. TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Authors : Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan
Abstract
Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIPEditor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIPEditor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality, and the alignment to the prompts, qualitatively and quantitatively. ?紙|專案頁面
4. Segment Anything in 3D Gaussians
Authors : Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang
Abstract
3D Gaussian Splatting has emerged as an alternative 3D representation of Neural Radiance Fields (NeRFs), benefiting from its high-quality rendering results and real-time rendering speed. Considering the 3D Gaussian representation remains unparsed, it is necessary first to execute object segmentation within this domain. Subsequently, scene editing and collision detection can be performed, proving vital to a multitude of applications, such as virtual reality (VR), augmented reality (AR), game/movie production, etc. In this paper, we propose a novel approach to achieve object segmentation in 3D Gaussian via an interactive procedure without any training process and learned parameters. We refer to the proposed method as SA-GS, for Segment Anything in 3D Gaussians. Given a set of clicked points in a single input view, SA-GS can generalize SAM to achieve 3D consistent segmentation via the proposed multi-view mask generation and view-wise label assignment methods. We also propose a cross-view label-voting approach to assign labels from different views. In addition, in order to address the boundary roughness issue of segmented objects resulting from the non-negligible spatial sizes of 3D Gaussian located at the boundary, SA-GS incorporates the simple but effective Gaussian Decomposition scheme. Extensive experiments demonstrate that SA-GS achieves high-quality 3D segmentation results, which can also be easily applied for scene editing and collision detection tasks. ?紙
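One plausible reading of the cross-view label-voting step described above is a simple per-Gaussian majority vote over the labels assigned by each view's 2D masks. A minimal NumPy sketch under that assumption; the data layout and function name are illustrative, not from the paper.

```python
import numpy as np

def vote_labels(num_gaussians, num_labels, per_view_assignments):
    """
    per_view_assignments: iterable of (gaussian_ids, labels) pairs, one per view,
    where labels[i] is the 2D mask label that view assigns to gaussian_ids[i].
    Returns the majority label per Gaussian.
    """
    votes = np.zeros((num_gaussians, num_labels), dtype=np.int64)
    for gaussian_ids, labels in per_view_assignments:
        np.add.at(votes, (gaussian_ids, labels), 1)   # accumulate one vote per observation
    return votes.argmax(axis=1)
```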
5. GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting
Authors : Francesco Palandra, Andrea Sanchietti, Daniele Baieri, Emanuele Rodolà
Abstract
We present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models. Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware. We tackle the problem by leveraging Gaussian splatting to represent 3D scenes, and we optimize the model while progressively varying the image supervision by means of a pretrained image-based diffusion model. The input object may be given as a 3D triangular mesh, or directly provided as Gaussians from a generative model such as DreamGaussian. GSEdit ensures consistency across different viewpoints, maintaining the integrity of the original object's information. Compared to previously proposed methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. Our editing process is refined via the application of the SDS loss, ensuring that our edits are both precise and accurate. Our comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following the given textual instructions while preserving their coherence and detail. ?紙
6. GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Authors : Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu
Abstract
We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS). Our method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works. This leads to faster editing speed and higher visual quality. This is achieved by the two terms: (a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps. (b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing to several reference views through self and cross-view attention between images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods. Paper
7. View-Consistent 3D Editing with Gaussian Splatting
Authors : Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang
Abstract
The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. ?紙
8. Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Authors : Antoine Guédon, Vincent Lepetit
Abstract
We propose Gaussian Frosting, a novel mesh-based representation for high-quality rendering and editing of complex 3D effects in real time. Our approach builds upon the recent 3D Gaussian Splatting framework, which optimizes a set of 3D Gaussians to approximate a radiance field from images. We propose first extracting a base mesh from Gaussians during optimization, then building and refining an adaptive layer of Gaussians with a variable thickness around the mesh to better capture the fine details and volumetric effects near the surface, such as hair or grass. We call this layer Gaussian Frosting, as it resembles a coating of frosting on a cake; the fuzzier the material, the thicker the frosting. We also introduce a parameterization of the Gaussians to enforce them to stay inside the frosting layer and have their parameters automatically adjusted when deforming, rescaling, editing or animating the mesh. Our representation allows for efficient rendering using Gaussian splatting, as well as editing and animation by modifying the base mesh. We demonstrate the effectiveness of our method on various synthetic and real scenes, and show that it outperforms existing surface-based approaches. We will release our code and a web-based viewer as additional contributions. Paper | Project Page | Code (not yet) | Short Presentation
9. Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Authors : Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li
Abstract
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Previous approaches have adopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, without the additional training required by NeRFs. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. We explore several applications of Semantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0% mAcc improvement over prior open-vocabulary scene understanding counterparts; object part segmentation, scene editing, and spatial-temporal segmentation with better qualitative results over 2D and 3D baselines, highlighting its versatility and effectiveness on supporting diverse downstream tasks. Paper | Project Page | Code (not yet)
10. EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Authors : Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney
Abstract
In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in ego-centric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets which shows the promise of the method for 3D egocentric perception at scale. ?紙|項目頁面|代碼(還沒有)
11. InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
Authors : Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao
Abstract
3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from largescale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion. ?紙|項目頁面|程式碼
12. Gaga: Group Any Gaussians via 3D-aware Memory Bank
Authors : Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract
We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Contrasted to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot segmentation models, significantly enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as scene understanding and manipulation. ?紙|項目頁面|程式碼
13. [CVPR W'24] ICE-G: Image Conditional Editing of 3D Gaussian Splats
Authors : Vishnu Jaganathan, Hannah Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira
Abstract
Recently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner. These edited views act as an updated dataset to further train and re-style the 3D scene. The end-result is therefore an edited 3D model. Our framework enables a wide variety of editing tasks such as manual local edits, correspondence based style transfer from any example image, and a combination of different styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique works for other methods such as NeRFs as well. We show through multiple examples that our method produces higher quality results while offering fine grained control of editing. ?紙|項目頁面| ?簡短的演示
14. Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks
Authors : Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar
Abstract
In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we found that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics. Preprint | Project Page | Code (Segmentation)
2023:
1. [CVPR '24] GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Authors : Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin
Abstract
3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Fields (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation technique. GaussianEditor enhances precision and control in editing through our proposed Gaussian Semantic Tracing, which traces the editing target throughout the training process. Additionally, we propose hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Paper | Project Page | Code | Short Presentation
2. [CVPR '24] GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Authors : Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian
Abstract
Recently, impressive results have been achieved in 3D scene editing with text instructions based on a 2D diffusion model. However, current diffusion models primarily generate images by predicting noise in the latent space, and the editing is usually applied to the whole image, which makes it challenging to perform delicate, especially localized, editing for 3D scenes. Inspired by recent 3D Gaussian splatting, we propose a systematic framework, named GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text instructions. Benefiting from the explicit property of 3D Gaussians, we design a series of techniques to achieve delicate editing. Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians. The Gaussian RoI is further used to control the editing process. Our framework can achieve more delicate and precise editing of 3D scenes than previous methods while enjoying much faster training speed, i.e., within 20 minutes on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF (45 minutes -- 2 hours). Paper | Project Page | Code (not yet) | Short Presentation
3. Point'n Move: Interactive Scene Object Manipulation on Gaussian Splatting Radiance Fields
Authors : Jiajun Huang, Hongchuan Yu
Abstract
We propose Point'n Move, a method that achieves interactive scene object manipulation with exposed region inpainting. Interactivity here further comes from intuitive object selection and real-time editing. To achieve this, we adopt Gaussian Splatting Radiance Field as the scene representation and fully leverage its explicit nature and speed advantage. Its explicit representation formulation allows us to devise a 2D prompt points to 3D mask dual-stage self-prompting segmentation algorithm, perform mask refinement and merging, minimize change as well as provide good initialization for scene inpainting and perform editing in real-time without per-editing training, all leads to superior quality and performance. We test our method by performing editing on both forward-facing and 360 scenes. We also compare our method against existing scene object removal methods, showing superior quality despite being more capable and having a speed advantage. ?紙
4. [ECCV'24] Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Authors : Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke
Abstract
The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of relying on expensive 3D labels, we supervise the Identity Encodings during the differentiable rendering by leveraging the 2D mask predictions from SAM, along with an introduced 3D spatial consistency regularization. Compared to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization and scene recomposition. Paper | Code
5. Segment Any 3D Gaussians
Authors : Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract
Interactive 3D segmentation in radiance fields is an appealing task owing to its importance in 3D scene understanding and manipulation. However, existing methods face challenges in either achieving fine-grained, multi-granularity segmentation or contending with substantial computational overhead, inhibiting real-time interaction. In this paper, we introduce Segment Any 3D GAussians (SAGA), a novel 3D interactive segmentation approach that seamlessly blends a 2D segmentation foundation model with 3D Gaussian Splatting (3DGS), a recent breakthrough of radiance fields. SAGA efficiently embeds multi-granularity 2D segmentation results generated by the segmentation foundation model into 3D Gaussian point features through well-designed contrastive training. Evaluation on existing benchmarks demonstrates that SAGA can achieve competitive performance with state-of-the-art methods. Moreover, SAGA achieves multi-granularity segmentation and accommodates various prompts, including points, scribbles, and 2D masks. Notably, SAGA can finish the 3D segmentation within milliseconds, achieving nearly 1000× acceleration compared to the previous SOTA. Paper | Project Page | Code
6. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance Fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Paper | Project Page | Code | Short Presentation
7. 2D-Guided 3D Gaussian Segmentation
Authors : Kun Lan, Haoran Li, Haolin Shi, Wenjun Wu, Yong Liao, Lin Wang, Pengyuan Zhou
Abstract
Recently, 3D Gaussian, as an explicit 3D representation method, has demonstrated strong competitiveness over NeRF (Neural Radiance Fields) in terms of expressing complex scenes and training duration. These advantages signal a wide range of applications for 3D Gaussians in 3D understanding and editing. Meanwhile, the segmentation of 3D Gaussians is still in its infancy. The existing segmentation methods are not only cumbersome but also incapable of segmenting multiple objects simultaneously in a short amount of time. In response, this paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision. This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information, while nearest neighbor clustering and statistical filtering refine the segmentation results. Experiments show that our concise method can achieve comparable performances on mIOU and mAcc for multi-object segmentation as previous single-object segmentation methods. ?紙
Language Embedding:
2024:
1. [IROS '24] Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot
Authors : Justin Yu, Kush Hari, Kishore Srinivas, Karim El-Refai, Adam Rashid, Chung Min Kim, Justin Kerr, Richard Cheng, Muhammad Zubair Irshad, Ashwin Balakrishna, Thomas Kollar, Ken Goldberg
Abstract
Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scenes to assess how well LEGS captures semantics. We compare LEGS to LERF and find that while both systems have comparable object-query success rates, LEGS trains over 3.5× faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and that LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy. Paper | Project Page
2. [CVPR '24] Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Authors : Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, Shao-Hua Guan
Abstract
Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. ?紙|項目頁面|程式碼
3. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance Fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Paper | Project Page | Code | Short Presentation
4. [CVPR '24] LangSplat: 3D Language Gaussian Splatting
Authors : Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister
Abstract
Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat advances the field by utilizing a collection of 3D Gaussians, each encoding language features distilled from CLIP, to represent the language field. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features on the scene-specific latent space, thereby alleviating the substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating the need for extensively querying the language field across various scales and the regularization of DINO features. Extensive experiments on open-vocabulary 3D object localization and semantic segmentation demonstrate that LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin. Notably, LangSplat is extremely efficient, achieving a {speed} × speedup compared to LERF at the resolution of 1440 × 1080. Paper | Project Page | Code | Short Presentation
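The scene-wise language autoencoder mentioned above compresses high-dimensional CLIP features into a small per-Gaussian code that is cheap to splat and can be decoded back to CLIP space for querying. A minimal PyTorch sketch with illustrative dimensions; the released code may differ.

```python
import torch
import torch.nn as nn

class LanguageAutoencoder(nn.Module):
    """Compress CLIP features to a low-dimensional code stored on each Gaussian."""

    def __init__(self, clip_dim: int = 512, latent_dim: int = 3, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(clip_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, clip_dim))

    def forward(self, clip_features: torch.Tensor):
        z = self.encoder(clip_features)    # small code splatted per Gaussian
        recon = self.decoder(z)            # decoded back to CLIP space for open-vocabulary queries
        return z, recon

# Typical usage on one scene's extracted CLIP features:
#   z, recon = model(clip_feats)
#   loss = (recon - clip_feats).pow(2).mean()
```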
5. SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting
Authors : Xinyi Liu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi
Abstract
Many recent developments in robot environment representation have centered on photorealistic reconstruction. This paper focuses specifically on generating sequences of images from a photorealistic Gaussian Splatting model that match instructions given by a user in natural language. We contribute a novel framework, SplaTraj, which formulates image generation within photorealistic environment representations as a continuous-time trajectory optimization problem. Costs are designed so that a camera following the trajectory poses smoothly traverses the environment and renders the specified spatial information in a photogenic manner. This is achieved by querying the photorealistic representation with language embeddings to isolate regions that correspond to the user-specified inputs. These regions are then projected into the camera's view as it moves over time, and a cost is constructed. We can then apply gradient-based optimization and differentiate through the rendering to optimize the trajectory for the defined cost. The resulting trajectory moves to photogenically view each of the specified objects. We empirically evaluate our method on a suite of environments and instructions and demonstrate the quality of the generated image sequences. Paper | Code (not yet) | Short Presentation
6. FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Authors : Xingxing Zuo, Pouya Samangouei, Yunwen Zhou, Yan Di, Mingyang Li
Abstract
Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that makes the rendered feature distance of same semantic entities close, following the pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection, despite that we are 851× faster for inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. Paper
7. Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
Authors : Hyunjee Lee*, Youngsik Yun*, Jeongmin Bae, Seoha Kim, Youngjung Uh
Abstract
Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS as follows. 1) We directly supervise the 3D points to train the language embedding field. It achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. ?紙|項目頁面|代碼(還沒有)
Mesh Extraction and Physics:
2024:
1. Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting
Authors : Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang
Abstract
We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to Gaussian shader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. ?紙|項目頁面| Code (not yet) | ?簡短的演示
2. GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting
Authors : Joanna Waczyńska, Piotr Borycki, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek
Abstract
In recent years, a range of neural network-based methods for image rendering have been introduced. For instance, widely-researched neural radiance fields (NeRF) rely on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning due to the necessity to condition several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, a hybrid of mesh and a Gaussian distribution, that pins all Gaussian splats on the object surface (mesh). The unique contribution of our method is defining Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain high-quality renders in the real-time generation of high-quality views. Furthermore, we demonstrate that in the absence of a predefined mesh, it is possible to fine-tune the initial mesh during the learning process. Paper | Code
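Pinning a splat to a mesh face means its centre and orientation can be derived from the face itself, so deforming the mesh moves the Gaussian automatically. A minimal NumPy sketch of that derivation, assuming barycentric coordinates and in-plane scales are the learned per-splat parameters; the parameterization details here are illustrative, not the paper's exact formulation.

```python
import numpy as np

def gaussian_from_face(v0, v1, v2, bary, scale2d, thickness=1e-4):
    """v0, v1, v2: (3,) triangle vertices; bary: (3,) barycentric coords summing to 1."""
    center = bary[0] * v0 + bary[1] * v1 + bary[2] * v2   # splat centre on the face

    e1 = v1 - v0
    normal = np.cross(e1, v2 - v0)
    normal /= np.linalg.norm(normal)
    t1 = e1 / np.linalg.norm(e1)                 # first tangent direction
    t2 = np.cross(normal, t1)                    # second tangent, orthogonal to both
    rotation = np.stack([t1, t2, normal], axis=1)  # columns = local splat axes

    # Nearly flat along the face normal; in-plane extent given by the learned scales.
    scales = np.array([scale2d[0], scale2d[1], thickness])
    return center, rotation, scales
```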
3. Mesh-based Gaussian Splatting for Real-time Large-scale Deformation
Authors : Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai
Abstract
Neural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in real time. Gaussian Splatting (GS) has recently become a promising method with explicit geometry for representing static scenes and facilitating high-quality and real-time synthesis of novel views. However, it cannot be easily deformed due to the use of discrete Gaussians and the lack of explicit topology. To address this, we develop a novel GS-based method that enables interactive deformation. Our key idea is to design an innovative mesh-based GS representation, which is integrated into Gaussian learning and manipulation. 3D Gaussians are defined over an explicit mesh, and they are bound with each other: the rendering of 3D Gaussians guides the mesh face split for adaptive refinement, and the mesh face split directs the splitting of 3D Gaussians. Moreover, the explicit mesh constraints help regularize the Gaussian distribution, suppressing poor-quality Gaussians (e.g., misaligned Gaussians, long-narrow shaped Gaussians), thus enhancing visual quality and avoiding artifacts during deformation. Based on this representation, we further introduce a large-scale Gaussian deformation technique to enable deformable GS, which alters the parameters of 3D Gaussians according to the manipulation of the associated mesh. Our method benefits from existing mesh deformation datasets for more realistic data-driven Gaussian deformation. Extensive experiments show that our approach achieves high-quality reconstruction and effective deformation, while maintaining promising rendering results at a high frame rate (65 FPS on average). Paper
4. Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
Authors : Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li
Abstract
Reconstructing and simulating elastic objects from visual observations is crucial for applications in computer vision and robotics. Existing methods, such as 3D Gaussians, provide modeling for 3D appearance and geometry but lack the ability to simulate physical properties or optimize parameters for heterogeneous objects. We propose Spring-Gaus, a novel framework that integrates 3D Gaussians with physics-based simulation for reconstructing and simulating elastic objects from multi-view videos. Our method utilizes a 3D Spring-Mass model, enabling the optimization of physical parameters at the individual point level while decoupling the learning of physics and appearance. This approach achieves great sample efficiency, enhances generalization, and reduces sensitivity to the distribution of simulation particles. We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects. This includes future prediction and simulation under varying initial states and environmental parameters. ?紙|項目頁面|代碼(還沒有)
5. Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
Authors : Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang
Abstract
3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namely Texture-GS, to disentangle the appearance from the geometry by representing it as a 2D texture mapped onto the 3D surface, thereby facilitating appearance editing. Technically, the disentanglement is achieved by our proposed texture mapping module, which consists of a UV mapping MLP to learn the UV coordinates for the 3D Gaussian centers, a local Taylor expansion of the MLP to efficiently approximate the UV coordinates for the ray-Gaussian intersections, and a learnable texture to capture the fine-grained appearance. Extensive experiments on the DTU dataset demonstrate that our method not only facilitates high-fidelity appearance editing but also achieves real-time rendering on consumer-level devices, eg a single RTX 2080 Ti GPU. ?紙|項目頁面|代碼(還沒有)
6. DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
Authors : Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala
Abstract
3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints during optimization. We extend 3D Gaussian splatting with depth and normal cues to tackle challenging indoor datasets and showcase techniques for efficient mesh extraction, an important downstream application. Specifically, we regularize the optimization procedure with depth information, enforce local smoothness of nearby Gaussians, and use the geometry of the 3D Gaussians supervised by normal cues to achieve better alignment with the true scene geometry. We improve depth estimation and novel view synthesis results over baselines and show how this simple yet effective regularization technique can be used to directly extract meshes from the Gaussian representation yielding more physically accurate reconstructions on indoor scenes. ?紙|代碼|專案頁面
7. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
Authors : Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao
Abstract
3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-accurate 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering. ?紙|項目頁面|代碼| ?簡短的演示
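The central primitive in 2DGS is an oriented planar Gaussian disk evaluated where a camera ray meets its plane. The snippet below is a minimal NumPy illustration of that idea (ray-plane intersection followed by a 2D Gaussian weight in the disk's tangent frame). It is a simplified sketch, not the paper's perspective-accurate rasterizer, and all variable names are made up for the example.

```python
import numpy as np

def splat_weight(ray_o, ray_d, center, t_u, t_v, s_u, s_v):
    """Evaluate a 2D oriented Gaussian disk at its intersection with a ray.

    center: disk center, t_u/t_v: orthonormal tangent axes spanning the disk plane,
    s_u/s_v: standard deviations along those axes.
    Returns (weight, depth), or (0, inf) if the ray misses or is parallel to the plane.
    """
    n = np.cross(t_u, t_v)                       # disk normal
    denom = np.dot(n, ray_d)
    if abs(denom) < 1e-8:
        return 0.0, np.inf
    t = np.dot(n, center - ray_o) / denom        # ray-plane intersection depth
    if t <= 0:
        return 0.0, np.inf
    p = ray_o + t * ray_d
    # Local (u, v) coordinates of the hit point in the disk's tangent frame
    u = np.dot(p - center, t_u)
    v = np.dot(p - center, t_v)
    weight = np.exp(-0.5 * ((u / s_u) ** 2 + (v / s_v) ** 2))
    return weight, t

# Example: a disk facing the camera, hit by a ray from the origin.
w, depth = splat_weight(ray_o=np.zeros(3), ray_d=np.array([0.0, 0.0, 1.0]),
                        center=np.array([0.1, 0.0, 2.0]),
                        t_u=np.array([1.0, 0.0, 0.0]), t_v=np.array([0.0, 1.0, 0.0]),
                        s_u=0.3, s_v=0.2)
```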
7.1 Unofficial Implementation and Specification
Authors : Yunzhou Song, Zixuan Lin, Yexin Zhang
Code
8. Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
Authors : Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang
Abstract
Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, that enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline, to illustrate the challenge and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language. ?紙|項目頁面|代碼(還沒有)
9. [ECCV '24] GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Authors : Yaniv Wolf, Amit Bracha, Ron Kimmel
Abstract
Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the scene's geometry directly from the Gaussian attributes remains a challenge, as these are optimized with a photometric loss. While some concurrent works attempt to add geometric constraints during the Gaussian optimization, they still produce noisy, unrealistic surfaces. We propose a novel approach that bridges the gap between the noisy 3DGS representation and a smooth 3D mesh representation by injecting real-world knowledge into the depth extraction process. Instead of extracting the scene's geometry directly from the Gaussian attributes, we extract it through a pre-trained stereo-matching model. We render stereo-aligned image pairs corresponding to the original training poses, feed these pairs into the stereo model to obtain depth profiles, and finally fuse all profiles together into a single mesh. Compared to other surface reconstruction methods for Gaussian Splatting, the resulting reconstruction is smoother, more accurate, and shows more intricate details, while requiring only a small overhead on top of the fairly short 3DGS optimization. We extensively tested the proposed method on in-the-wild scenes captured with a smartphone, demonstrating its superior reconstruction capabilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results. Paper | Project Page | Code
10. RaDe-GS: Rasterizing Depth in Gaussian Splatting
Authors : Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan
Abstract
Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. Our approach achieves a Chamfer distance error comparable to NeuraLangelo[Li et al. 2023] on the DTU dataset and similar training and rendering time as traditional Gaussian Splatting on the Tanks & Temples dataset. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods. ?紙|項目頁面|代碼(還沒有)
11. Trim 3D Gaussian Splatting for Accurate Geometry Representation
Authors : Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang
Abstract
In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in previous arts. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. ?紙|項目頁面|程式碼
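TrimGS's core operation is removing Gaussians whose accumulated contribution is low. The following is a rough NumPy sketch of that idea, assuming contributions have already been accumulated (for example, alpha-blending weights summed over training views). The keep ratio, field names, and selection rule are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def trim_gaussians(params, contribution, keep_ratio=0.9):
    """Keep the Gaussians that jointly account for `keep_ratio` of the total contribution.

    params: dict of per-Gaussian arrays (e.g., 'means' (N,3), 'scales' (N,3), 'opacities' (N,)).
    contribution: (N,) accumulated contribution per Gaussian (e.g., blending weights summed
    over all training rays). Returns the trimmed parameter dict and the kept indices.
    """
    order = np.argsort(contribution)[::-1]                 # most important first
    cum = np.cumsum(contribution[order])
    cutoff = np.searchsorted(cum, keep_ratio * cum[-1]) + 1
    keep = np.sort(order[:cutoff])                         # restore original ordering
    trimmed = {k: v[keep] for k, v in params.items()}
    return trimmed, keep

# Toy usage with random Gaussians and a heavy-tailed contribution distribution.
rng = np.random.default_rng(0)
params = {"means": rng.normal(size=(1000, 3)),
          "scales": rng.uniform(0.01, 0.1, size=(1000, 3)),
          "opacities": rng.uniform(size=1000)}
contribution = rng.gamma(shape=0.5, size=1000)
trimmed, keep = trim_gaussians(params, contribution, keep_ratio=0.9)
```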
12. Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting
Authors : Junha Hyung, Susung Hong, Sungwon Hwang, Jaeseong Lee, Jaegul Choo, Jin-Hwa Kim
Abstract
3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals, due to the Gaussians converging into anisotropic Gaussians with one dominant variance. We propose using effective rank analysis to examine the shape statistics of 3D Gaussian primitives, and identify the Gaussians indeed converge into needle-like shapes with the effective rank 1. To address this, we introduce effective rank as a regularization, which constrains the structure of the Gaussians. Our new regularization method enhances normal and geometry reconstruction while reducing needle-like artifacts. The approach can be integrated as an add-on module to other 3DGS variants, improving their quality without compromising visual fidelity. ?紙|項目頁面|代碼(還沒有)
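Effective rank is commonly defined (Roy and Vetterli, 2007) as the exponential of the Shannon entropy of the normalized eigenvalue spectrum, which for a 3D Gaussian can be computed directly from its scale parameters. The snippet below shows that computation plus a simple penalty discouraging needle-like shapes (effective rank near 1); the penalty form is an illustrative stand-in, not necessarily the paper's exact regularizer.

```python
import numpy as np

def effective_rank(scales, eps=1e-12):
    """Effective rank of 3D Gaussians from their per-axis scales.

    scales: (N, 3) standard deviations; the covariance eigenvalues are scales**2.
    erank = exp(H(p)), where p is the normalized eigenvalue distribution.
    A needle-like Gaussian (one dominant axis) gives erank close to 1,
    a disk gives roughly 2, and an isotropic ball gives 3.
    """
    lam = scales ** 2
    p = lam / (lam.sum(axis=1, keepdims=True) + eps)
    entropy = -(p * np.log(p + eps)).sum(axis=1)
    return np.exp(entropy)

def erank_regularizer(scales, target=2.0):
    """Illustrative penalty pushing effective rank away from 1 (toward disk/ball shapes)."""
    return np.mean(np.maximum(0.0, target - effective_rank(scales)))

scales = np.array([[0.30, 0.01, 0.01],   # needle: erank ~ 1
                   [0.10, 0.10, 0.01],   # disk:   erank ~ 2
                   [0.10, 0.10, 0.10]])  # ball:   erank = 3
print(effective_rank(scales), erank_regularizer(scales))
```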
13. CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Authors : Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, Zhaoxiang Zhang
Abstract
Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis. However, accurately representing surfaces, especially in large and complex scenarios, remains a significant challenge due to the unstructured nature of 3DGS. In this paper, we present CityGaussianV2, a novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency. Building on the favorable generalization capabilities of 2D Gaussian Splatting (2DGS), we address its convergence and scalability issues. Specifically, we implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. To scale up, we introduce an elongation filter that mitigates Gaussian count explosion caused by 2DGS degeneration. Furthermore, we optimize the CityGaussian pipeline for parallel training, achieving up to 10x compression, at least 25% savings in training time, and a 50% decrease in memory usage. We also established standard geometry benchmarks under large-scale scenes. Experimental results demonstrate that our method strikes a promising balance between visual quality, geometric accuracy, as well as storage and training costs. ?紙|項目頁面|程式碼(即將推出)
14. [CoRL '24] Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision
Authors : Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell, Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
Abstract
We introduce Cloth-Splatting, a method for estimating the 3D state of cloth from RGB images through a prediction-update framework. Cloth-Splatting leverages an action-conditioned dynamics model to predict future states and uses 3D Gaussian Splatting to update the predicted states. Our key insight is that coupling a 3D mesh-based representation with Gaussian Splatting allows us to define a differentiable map between the cloth's state space and the image space. This enables the use of gradient-based optimization techniques to refine inaccurate state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting not only improves state estimation accuracy over current baselines but also reduces convergence time by ~85%. Paper | Project Page | Code
2023:
1. [CVPR '24] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Authors : Tianyi Xie, Zeshun Zong, Yuxin Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang
Abstract
We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, "cage meshes," or any other geometry embedding, highlighting the principle of "what you see is what you simulate (WS2)." Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. ?紙|項目頁面|代碼| ?簡短的演示
2. [CVPR '24] SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Authors : Antoine Guédon, Vincent Lepetit
Abstract
We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D Gaussians, as these Gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the Gaussians to align well with the surface of the scene. We then introduce a method that exploits this alignment to sample points on the real surface of the scene and extract a mesh from the Gaussians using Poisson reconstruction, in contrast to the approaches usually applied to extract meshes from neural SDFs. Finally, we introduce an optional refinement strategy that binds Gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional software by manipulating the mesh instead of the Gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality. Paper | Project Page | Code | Short Presentation
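As a rough illustration of the Poisson-reconstruction step described above, the sketch below extracts a mesh from a point cloud of Gaussian centers using Open3D. This is a heavily simplified stand-in, not SuGaR's surface-point sampling or refinement: it assumes `centers` is an (N, 3) array of Gaussian means and uses normals estimated from the points rather than the ones SuGaR derives from its regularized Gaussians.

```python
import numpy as np
import open3d as o3d

def poisson_mesh_from_centers(centers, depth=9, density_quantile=0.05):
    """Very rough mesh extraction: Poisson reconstruction over Gaussian centers.

    centers: (N, 3) array of Gaussian means. Normals are estimated from the
    point cloud itself, which is only a crude proxy for surface-aligned Gaussians.
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(centers)
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(k=30)
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=depth)
    # Drop low-density vertices, which typically correspond to hallucinated surface.
    keep = np.asarray(densities) > np.quantile(np.asarray(densities), density_quantile)
    mesh.remove_vertices_by_mask(~keep)
    return mesh

# Usage (assuming `centers` comes from a trained 3DGS model):
# mesh = poisson_mesh_from_centers(centers)
# o3d.io.write_triangle_mesh("extracted_mesh.ply", mesh)
```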
3. NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance
Authors : Hanlin Chen, Chen Li, Gim Hee Lee
Abstract
Existing neural implicit surface reconstruction methods have achieved impressive performance in multi-view 3D reconstruction by leveraging explicit geometry priors such as depth maps or point clouds as regularization. However, the reconstruction results still lack fine details because of the over-smoothed depth map or sparse point cloud. In this work, we propose a neural implicit surface reconstruction pipeline with guidance from 3D Gaussian Splatting to recover highly detailed surfaces. The advantage of 3D Gaussian Splatting is that it can generate dense point clouds with detailed structure. Nonetheless, a naive adoption of 3D Gaussian Splatting can fail since the generated points are the centers of 3D Gaussians that do not necessarily lie on the surface. We thus introduce a scale regularizer to pull the centers close to the surface by enforcing the 3D Gaussians to be extremely thin. Moreover, we propose to refine the point cloud from 3D Gaussians Splatting with the normal priors from the surface predicted by neural implicit models instead of using a fixed set of points as guidance. Consequently, the quality of surface reconstruction improves from the guidance of the more accurate 3D Gaussian splatting. By jointly optimizing the 3D Gaussian Splatting and the neural implicit model, our approach benefits from both representations and generates complete surfaces with intricate details. Experiments on Tanks and Temples verify the effectiveness of our proposed method. ?紙
Miscellaneous:
2024:
1. Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting
Authors : Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White
Abstract
The accelerating deployment of spacecraft in orbit has generated interest in on-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. This requires robust characterization of the target's geometry. In this article, we present an approach for mapping geometries of satellites on orbit based on 3D Gaussian Splatting that can run on computing resources available on current spaceflight hardware. We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up under several realistic lighting and motion conditions. Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms. Such on-board capabilities are critical to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks. Paper
2. TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering
Authors : Linus Franke, Darius Rückert, Laura Fink, Marc Stamminger
Abstract
Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, also latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [Rückert et al. 2022] can accommodate crisper images, but the neural reconstruction network decreases performance, it grapples with temporal instability and it is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrate that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. ?紙|項目頁面|程式碼
3. EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
Authors : Lingting Zhu, Zhao Wang, Jiahao Cui, Zhenchao Jin, Guying Lin, Lequan Yu
Abstract
Surgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks. Inspired by 3D Gaussian Splatting, a recent trending 3D representation, we present EndoGS, applying Gaussian Splatting for deformable endoscopic tissue reconstruction. Specifically, our approach incorporates deformation fields to handle dynamic scenes, depth-guided supervision with spatial-temporal weight masks to optimize 3D targets with tool occlusion from a single viewpoint, and surface-aligned regularization terms to capture the much better geometry. As a result, EndoGS reconstructs and renders high-quality deformable endoscopic tissues from a single-viewpoint video, estimated depth maps, and labeled tool masks. Experiments on DaVinci robotic surgery videos demonstrate that EndoGS achieves superior rendering quality. ?紙|程式碼
4. EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction
Authors : Yifan Liu, Chenxin Li, Chen Yang, Yixuan Yuan
Abstract
Reconstructing deformable tissues from endoscopic stereo videos is essential in many downstream surgical applications. However, existing methods suffer from slow inference speed, which greatly limits their practical use. In this paper, we introduce EndoGaussian, a real-time surgical scene reconstruction framework that builds on 3D Gaussian Splatting. Our framework represents dynamic surgical scenes as canonical Gaussians and a time-dependent deformation field, which predicts Gaussian deformations at novel timestamps. Due to the efficient Gaussian representation and parallel rendering pipeline, our framework significantly accelerates the rendering speed compared to previous methods. In addition, we design the deformation field as the combination of a lightweight encoding voxel and an extremely tiny MLP, allowing for efficient Gaussian tracking with a minor rendering burden. Furthermore, we design a holistic Gaussian initialization method to fully leverage the surface distribution prior, achieved by searching informative points from across the input image sequence. Experiments on public endoscope datasets demonstrate that our method can achieve real-time rendering speed (195 FPS real-time, 100× gain) while maintaining the state-of-the-art reconstruction quality (35.925 PSNR) and the fastest training speed (within 2 min/scene), showing significant promise for intraoperative surgery applications. ?紙|項目頁面|程式碼
5. GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting
Authors : Butian Xiong, Zhuo Li, Zhen Li
Abstract
We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. This dataset offers a unique blend of urban and academic environments for advanced spatial analysis and covers more than 1.5 km². Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combining multi-modal information. Paper
6. LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering
Authors : Sheng Hong, Junjie He, Xinhu Zheng, Hesheng Wang, Hao Fang, Kangcheng Liu, Chunran Zheng, Shaojie Shen
Abstract
We introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multi-modal sensor fused mapping system that builds on differentiable surface splatting to improve mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion. This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initial poses for surface Gaussian scenes are obtained using a LiDAR-inertial system with size-adaptive voxels. Then, we optimize and refine the Gaussians using visually derived photometric gradients to improve the quality and density of the LiDAR measurements. Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes, bolstering structure construction through LiDAR and facilitating real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes potentially for digital twins and virtual reality while also holding potential applicability in real-time SLAM and robotics domains. Paper | Code (not yet)
7. VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
Authors : Ying Jiang, Chang Yu, Tianyi Xie, Xuan Li, Yutao Feng, Huamin Wang, Minchen Li, Henry Lau, Feng Gao, Yin Yang, Chenfanfu Jiang
Abstract
As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. ?紙|專案頁面
8. Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps
Authors : Timothy Chen, Ola Shorinwa, Weijia Zeng, Joseph Bruno, Philip Dames, Mac Schwager
Abstract
We present Splat-Nav, a navigation pipeline that consists of a real-time safe planning module and a robust state estimation module designed to operate in the Gaussian Splatting (GSplat) environment representation, a popular emerging 3D scene representation from computer vision. We formulate rigorous collision constraints that can be computed quickly to build a guaranteed-safe polytope corridor through the map. We then optimize a B-spline trajectory through this corridor. We also develop a real-time, robust state estimation module by interpreting the GSplat representation as a point cloud. The module enables the robot to localize its global pose with zero prior knowledge from RGB-D images using point cloud alignment, and then track its own pose as it moves through the scene from RGB images using image-to-point cloud localization. We also incorporate semantics into the GSplat in order to obtain better images for localization. All of these modules operate mainly on CPU, freeing up GPU resources for tasks like real-time scene reconstruction. We demonstrate the safety and robustness of our pipeline in both simulation and hardware, where we show re-planning at 5 Hz and pose estimation at 20 Hz, an order of magnitude faster than Neural Radiance Field (NeRF)-based navigation methods, thereby enabling real-time navigation. ?紙
9. Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
Authors : Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille
Abstract
X-rays are widely used in transmission imaging because of their stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, inspired by the isotropic nature of X-ray imaging, we redesign a radiative Gaussian point cloud model. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% training time and over 73x inference speed. The application on sparse-view CT reconstruction also reveals the practical values of our method. Paper
10. ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Authors : Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang
Abstract
Performing language-conditioned robotic manipulation tasks in unstructured environments is highly demanded for general intelligent robots. Conventional robotic manipulation methods usually learn semantic representation of the observation for action prediction, which ignores the scene-level spatiotemporal dynamics for human goal completion. In this paper, we propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction. Specifically, we first formulate the dynamic Gaussian Splatting framework that infers the semantics propagation in the Gaussian embedding space, where the semantic representation is leveraged to predict the optimal robot action. Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction. We evaluate our ManiGaussian on 10 RLBench tasks with 166 variations, and the results demonstrate our framework can outperform the state-of-the-art methods by 13.1% in average success rate. ?紙|項目頁面|程式碼
11. GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Authors : Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang
Abstract
Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3× lower GPU memory usage and 5× faster fitting time not only rivals INRs (eg, WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 1000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. ?紙
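To make the "accumulated summation" rendering idea concrete, here is a tiny NumPy sketch that rasterizes a handful of 2D Gaussians into an image by simply summing their color contributions, with no depth sorting or alpha compositing. It is an illustrative toy with made-up parameters, not the paper's CUDA renderer.

```python
import numpy as np

def render_2d_gaussians(means, covs, colors, height, width):
    """Accumulated-summation rendering of N 2D Gaussians.

    means: (N, 2) pixel-space centers, covs: (N, 2, 2) covariances,
    colors: (N, 3) RGB weights. Every Gaussian simply adds its weighted color,
    so the result is order-independent (no sorting, no alpha blending).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs, ys], axis=-1).astype(np.float64)      # (H, W, 2)
    img = np.zeros((height, width, 3))
    for mu, cov, col in zip(means, covs, colors):
        d = pix - mu                                           # (H, W, 2)
        inv = np.linalg.inv(cov)
        # Mahalanobis distance d^T Sigma^{-1} d evaluated per pixel
        m = np.einsum('hwi,ij,hwj->hw', d, inv, d)
        img += np.exp(-0.5 * m)[..., None] * col
    return np.clip(img, 0.0, 1.0)

# Toy usage: three Gaussians on a 64x64 canvas.
means = np.array([[16.0, 16.0], [48.0, 20.0], [32.0, 48.0]])
covs = np.array([np.diag([40.0, 10.0]), np.diag([15.0, 15.0]),
                 [[30.0, 12.0], [12.0, 20.0]]])
colors = np.array([[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]])
img = render_2d_gaussians(means, covs, colors, 64, 64)
```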
12. GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
Authors : Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang
Abstract
Constructing a 3D scene capable of accommodating open-ended language queries, is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (eg NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present the GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. ?紙|代碼(還沒有)
13. Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience
Authors : Xiaohang Yu, Zhengxian Yang, Shi Pan, Yuqi Han, Haoxiang Wang, Jun Zhang, Shi Yan, Borong Lin, Lei Yang, Tao Yu, Lu Fang
Abstract
We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRnet, NeRF, and 3D Gaussian splatting. More importantly, the collected dataset, which is much denser than existing datasets, may also inspire space-oriented light field reconstruction, which is potentially different from object-centric 3D reconstruction, for immersive VR/AR experiences. We utilized a total of 40 GoPro 10 cameras, capturing images of 5k resolution. The number of photos captured for each scene is no less than 1000, and the average density (view number within a unit sphere) is 134.68. It is also worth noting that our system is capable of efficiently capturing large outdoor scenes. Addressing the current lack of large-space and dense light field datasets, we made efforts to include elements such as sky, reflections, lights and shadows that are of interest to researchers in the field of 3D reconstruction during the data capture process. Finally, we validated the effectiveness of our provided dataset on three popular algorithms and also integrated the reconstructed 3DGS results into the Unity engine, demonstrating the potential of utilizing our datasets to enhance the realism of virtual reality (VR) and create feasible interactive spaces. Paper
14. Modeling uncertainty for Gaussian Splatting
Authors : Luca Savant, Diego Valsesia, Enrico Magli
Abstract
We present Stochastic Gaussian Splatting (SGS): the first framework for uncertainty estimation using Gaussian Splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of Neural Radiance Fields (NeRF). However, contrary to the latter, it still lacks the ability to provide information about the confidence associated with their outputs. To address this limitation, in this paper, we introduce a Variational Inference-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. Additionally, we introduce the Area Under Sparsification Error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on the LLFF dataset demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications. ?紙
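The Area Under the Sparsification Error (AUSE) mentioned above compares how prediction error drops when pixels are removed in order of predicted uncertainty versus in order of true error. Below is a small NumPy sketch of that metric under the common convention of integrating the gap between the two sparsification curves; the fraction grid and the lack of normalization are arbitrary choices for the example, not necessarily the paper's exact protocol.

```python
import numpy as np

def ause(errors, uncertainties, fractions=np.linspace(0.0, 0.99, 100)):
    """Area Under the Sparsification Error curve (lower is better).

    errors: (N,) per-pixel absolute errors, uncertainties: (N,) predicted uncertainty.
    For each removal fraction we drop the most-uncertain pixels (sparsification curve)
    or the largest-error pixels (oracle curve) and average the remaining error;
    AUSE integrates the gap between the two curves.
    """
    n = len(errors)
    by_unc = errors[np.argsort(uncertainties)]     # ascending uncertainty: keep the prefix
    by_err = errors[np.argsort(errors)]            # ascending error: oracle keeps the prefix
    spars, oracle = [], []
    for f in fractions:
        keep = max(1, int(round((1.0 - f) * n)))
        spars.append(by_unc[:keep].mean())
        oracle.append(by_err[:keep].mean())
    gap = np.array(spars) - np.array(oracle)
    return np.trapz(gap, fractions)

# Toy usage: uncertainty only loosely correlated with the true error.
rng = np.random.default_rng(0)
errors = np.abs(rng.normal(size=10_000))
uncertainties = errors + rng.normal(scale=0.5, size=10_000)
print(ause(errors, uncertainties))
```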
15. TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
Authors : Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu
Abstract
Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined. This enables us to render the 2D DSA image at that specific moment. Additionally, we introduced a Smooth loss term in the loss function to mitigate overfitting issues that may arise in the model when dealing with sparse view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. ?紙
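The key mechanism in TOGS is a per-Gaussian table of opacity offsets over time, interpolated at the query timestamp. Here is a minimal NumPy sketch of that lookup, assuming a table with uniformly spaced keyframes and linear interpolation; the table layout, clamping, and names are assumptions made for illustration.

```python
import numpy as np

def opacity_at_time(base_opacity, offset_table, t, t_min=0.0, t_max=1.0):
    """Interpolate per-Gaussian opacity offsets at time t.

    base_opacity: (N,) static opacities, offset_table: (N, K) learned offsets
    at K uniformly spaced keyframes spanning [t_min, t_max].
    """
    n, k = offset_table.shape
    # Continuous keyframe index for time t, clamped to the table range
    u = np.clip((t - t_min) / (t_max - t_min), 0.0, 1.0) * (k - 1)
    i0 = int(np.floor(u))
    i1 = min(i0 + 1, k - 1)
    w = u - i0
    offset = (1.0 - w) * offset_table[:, i0] + w * offset_table[:, i1]
    return np.clip(base_opacity + offset, 0.0, 1.0)

# Toy usage: 4 Gaussians with 8 keyframes of opacity offsets.
rng = np.random.default_rng(0)
base = rng.uniform(0.3, 0.9, size=4)
table = rng.normal(scale=0.2, size=(4, 8))
print(opacity_at_time(base, table, t=0.37))
```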
16. GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis
Authors : Emmanouil Nikolakakis, Utkarsh Gupta, Jonathan Vengosh, Justin Bui, Razvan Marinescu
Abstract
We present GaSpCT, a novel view synthesis and 3D scene representation method used to generate novel projection views for Computed Tomography (CT) scans. We adapt the Gaussian Splatting framework to enable novel view synthesis in CT based on limited sets of 2D image projections and without the need for Structure from Motion (SfM) methodologies. Therefore, we reduce the total scanning duration and the amount of radiation dose the patient receives during the scan. We adapted the loss function to our use-case by encouraging a stronger background and foreground distinction using two sparsity promoting regularizers: a beta loss and a total variation (TV) loss. Finally, we initialize the Gaussian locations across the 3D space using a uniform prior distribution of where the brain's positioning would be expected to be within the field of view. We evaluate the performance of our model using brain CT scans from the Parkinson's Progression Markers Initiative (PPMI) dataset and demonstrate that the rendered novel views closely match the original projection views. Furthermore, we empirically observe reduced training time compared to neural network based image synthesis for sparse-view CT image reconstruction. Finally, the memory requirements of the Gaussian Splatting representations are reduced by 17% compared to the equivalent voxel grid image representations. Paper
17. Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
Authors : Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla
Abstract
Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view (360∘ viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance). ?紙
18. Dual-Camera Smooth Zoom on Mobile Phones
Authors : Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo
Abstract
When zooming between dual cameras on a mobile, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, ie, dual-camera smooth zoom (DCSZ) to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground-truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on DCSZ task. ?紙
19. Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction
Authors : Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano
Abstract
Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with a 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100X faster rendering and more than 10X shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer. ?紙
20. SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera
Authors : Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang
Abstract
One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, Conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects with backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors. We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. ?紙
21. [CVPR '24] SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection
Authors : Mathis Kruse, Marco Rudolph, Dominik Woiwode, Bodo Rosenhahn
Abstract
Detecting anomalies in images has become a well-explored problem in both academia and industry. State-of-the-art algorithms are able to detect defects in increasingly difficult settings and data modalities. However, most current methods are not suited to address 3D objects captured from differing poses. While solutions using Neural Radiance Fields (NeRFs) have been proposed, they suffer from excessive computation requirements, which hinder real-world usability. For this reason, we propose the novel 3D Gaussian splatting-based framework SplatPose which, given multi-view images of a 3D object, accurately estimates the pose of unseen views in a differentiable manner, and detects anomalies in them. We achieve state-of-the-art results in both training and inference speed, and detection performance, even when using less training data than competing methods. We thoroughly evaluate our framework using the recently proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly detection (MAD) data set. ?紙|程式碼
22. Reinforcement Learning with Generalizable Gaussian Splatting
Authors : Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu
Abstract
An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit representations of the environment, such as images, points, voxels, and neural radiance fields. However, these representations have several drawbacks: they either cannot describe complex local geometries, fail to generalize well to unseen scenes, or require precise foreground masks. Moreover, implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to serve as the representation for RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL. Paper
23. DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark
Authors : Tianyi Zhang, Kaining Huang, Weiming Zhi, Matthew Johnson-Roberson
Abstract
Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments. ?紙|代碼| ?簡短介紹 | ? Short Presentation (Bilibili)
24. Adversarial Generation of Hierarchical Gaussians for 3d Generative Model
Authors : Sangeek Hyun, Jae-Pil Heo
Abstract
Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that ours achieves a significantly faster rendering speed (x100) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability. ?紙|專案頁面
25. Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting
Authors : Matthew Strong, Boshu Lei, Aiden Swann, Wen Jiang, Kostas Daniilidis, Monroe Kennedy III
Abstract
We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in a both photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes. ?紙|項目頁面|程式碼
26. Radiance Fields for Robotic Teleoperation
Authors : Maximum Wilder-Smith, Vaishakh Patil, Marco Hutter
Abstract
Radiance field methods such as Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS), have revolutionized graphics and novel view synthesis. Their ability to synthesize new viewpoints with photo-realistic quality, as well as capture complex volumetric and specular scenes, makes them an ideal visualization for robotic teleoperation setups. Direct camera teleoperation provides high-fidelity operation at the cost of maneuverability, while reconstruction-based approaches offer controllable scenes with lower fidelity. With this in mind, we propose replacing the traditional reconstruction-visualization components of the robotic teleoperation pipeline with online Radiance Fields, offering highly maneuverable scenes with photorealistic quality. As such, there are three main contributions to state of the art: (1) online training of Radiance Fields using live data from multiple cameras, (2) support for a variety of radiance methods including NeRF and 3DGS, (3) visualization suite for these methods including a virtual reality scene. To enable seamless integration with existing setups, these components were tested with multiple robots in multiple configurations and were displayed using traditional tools as well as the VR headset. The results across methods and robots were compared quantitatively to a baseline of mesh reconstruction, and a user study was conducted to compare the different visualization methods. ?紙|項目頁面|程式碼
2023:
1. [ECCV '24] FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information
Authors : Wen Jiang, Boshu Lei, Kostas Daniilidis
Abstract
This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the limited availability of 2D images poses uncertainties stemming from occlusions, depth ambiguities, and imaging errors. Efficiently selecting informative views becomes crucial, and quantifying NeRF model uncertainty presents intricate challenges. Existing approaches either depend on model architecture or are based on assumptions regarding density distributions that are not generally applicable. By leveraging Fisher Information, we efficiently quantify observed information within Radiance Fields without ground truth data. This can be used for the next best view selection and pixel-wise uncertainty quantification. Our method overcomes existing limitations on model architecture and effectiveness, achieving state-of-the-art results in both view selection and uncertainty quantification, demonstrating its potential to advance the field of Radiance Fields. Our method with the 3D Gaussian Splatting backend could perform view selections at 70 fps. ?紙|項目頁面|程式碼
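As a conceptual illustration of information-driven view selection, the sketch below greedily picks views for a linear-Gaussian model: each candidate view contributes J^T J to the information matrix, and the greedy step chooses the view with the largest log-determinant gain. This is a toy under synthetic assumptions, not the paper's rendering-based Hessian or its actual selection criterion.

```python
import numpy as np

def greedy_view_selection(jacobians, n_select, prior=1e-3):
    """Greedily pick views that maximize the gain in log det of the information matrix.

    jacobians: list of (M_i, P) Jacobians of observations w.r.t. the P model
    parameters, one per candidate view (synthetic here). Returns selected indices.
    """
    p = jacobians[0].shape[1]
    info = prior * np.eye(p)                       # prior keeps the matrix invertible
    selected = []
    remaining = set(range(len(jacobians)))
    for _ in range(n_select):
        base = np.linalg.slogdet(info)[1]
        gains = {}
        for i in remaining:
            cand = info + jacobians[i].T @ jacobians[i]
            gains[i] = np.linalg.slogdet(cand)[1] - base
        best = max(gains, key=gains.get)           # view with the largest information gain
        info += jacobians[best].T @ jacobians[best]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 20 candidate views, each observing a random subset of 50 parameters.
rng = np.random.default_rng(0)
views = [rng.normal(size=(rng.integers(5, 30), 50)) for _ in range(20)]
print(greedy_view_selection(views, n_select=3))
```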
2. Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering
Authors : Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, Li Zhang
Abstract
Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent representation learning with sparse training data, we introduce a novel flow-based temporal smoothing mechanism and a position-aware adaptive control strategy. Extensive experiments on Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes. Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits 50/6000-fold acceleration in training/rendering over the best alternative. ?紙|項目頁面|代碼(還沒有)
3. MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians
Authors : Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar
Abstract
Understanding how we grasp objects with our hands has important applications in areas like robotics and mixed reality. However, this challenging problem requires accurate modeling of the contact between hands and objects. To capture grasps, existing methods use skeletons, meshes, or parametric models that can cause misalignments resulting in inaccurate contacts. We present MANUS, a method for Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians. We build a novel articulated 3D Gaussians representation that extends 3D Gaussian splatting for high-fidelity representation of articulating hands. Since our representation uses Gaussian primitives, it enables us to efficiently and accurately estimate contacts between the hand and the object. For the most accurate results, our method requires tens of camera views that current datasets do not provide. We therefore build MANUS-Grasps, a new dataset that contains hand-object grasps viewed from 53 cameras across 30+ scenes, 3 subjects, and comprising over 7M frames. In addition to extensive qualitative results, we also show that our method outperforms others on a quantitative contact evaluation method that uses paint transfer from the object to the hand. ?紙
4. [CVPR '24] Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Authors : Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, Song-Hai Zhang
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. ?紙|項目頁面|程式碼
5. Mathematical Supplement for the gsplat Library
Authors : Vickie Ye, Angjoo Kanazawa
Abstract
This report provides the mathematical details of the gsplat library, a modular toolbox for efficient differentiable Gaussian splatting, as proposed by Kerbl et al. It provides a self-contained reference for the computations involved in the forward and backward passes of differentiable Gaussian splatting. To facilitate practical usage and development, we provide a user friendly Python API that exposes each component of the forward and backward passes in rasterization of [gsplat](https://github.com/nerfstudio-project/gsplat). ?紙
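For orientation, the central forward-pass computation the report documents is the EWA-style projection of a world-space Gaussian (mean μ, covariance Σ) into screen space using the world-to-camera rotation and the perspective Jacobian; this is a condensed sketch, and the report also derives the corresponding gradients for the backward pass:

```latex
% Camera-space mean t and projected 2D covariance \Sigma' of a 3D Gaussian.
t = R_{cw}\,\mu + t_{cw},
\qquad
J = \begin{pmatrix} f_x / t_z & 0 & -f_x t_x / t_z^2 \\ 0 & f_y / t_z & -f_y t_y / t_z^2 \end{pmatrix},
\qquad
\Sigma' = J\, R_{cw}\, \Sigma\, R_{cw}^{\top}\, J^{\top}
```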
6. PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation
Authors : Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae
Abstract
? Paper | Project Page | Code (not yet)
Regularization and Optimization:
2024:
1. DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines
Authors : Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Nandita Vijaykumar
Abstract
Differentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (eg 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high quality photo-realistic imagery at high speeds from these learned 3D models. These methods have been demonstrated to be very promising, providing state-of-art quality for many important tasks. However, training a model to represent a scene is still a time-consuming task even when using powerful GPUs. In this work, we observe that the gradient computation phase during training is a significant bottleneck on GPUs due to the large number of atomic operations that need to be processed. These atomic operations overwhelm atomic units in the L2 partitions causing stalls. To address this challenge, we leverage the observations that during the gradient computation: (1) for most warps, all threads atomically update the same memory locations; and (2) warps generate varying amounts of atomic traffic (since some threads may be inactive). We propose DISTWAR, a software-approach to accelerate atomic operations based on two key ideas: First, we enable warp-level reduction of threads at the SM sub-cores using registers to leverage the locality in intra-warp atomic updates. Second, we distribute the atomic computation between the warp-level reduction at the SM and the L2 atomic units to increase the throughput of atomic computation. Warps with many threads performing atomic updates to the same memory locations are scheduled at the SM, and the rest using L2 atomic units. We implement DISTWAR using existing warp-level primitives. We evaluate DISTWAR on widely used raster-based differentiable rendering workloads. We demonstrate significant speedups of 2.44x on average (up to 5.7x). ?紙
2. [CVPR '24] FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Authors : Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing
Abstract
3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (eg, Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently. ?紙
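A minimal sketch of the general idea: compute a discrepancy between the rendered and ground-truth images inside a low- or high-frequency band obtained with FFT masks. FreGS additionally regularizes both amplitude and phase with an annealed band and couples this to densification, so the function name, the purely amplitude-based loss, and the schedule hinted at below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.fft

def frequency_band_loss(rendered, target, radius, low_pass=True):
    """Toy frequency-space discrepancy between a rendered and a ground-truth image.

    rendered, target: (H, W) grayscale tensors in [0, 1].
    radius: cutoff radius (in frequency bins) of the circular low-/high-pass mask.
    """
    H, W = rendered.shape
    fr = torch.fft.fftshift(torch.fft.fft2(rendered))
    ft = torch.fft.fftshift(torch.fft.fft2(target))

    # Circular mask around the (shifted) zero frequency.
    yy, xx = torch.meshgrid(
        torch.arange(H) - H // 2, torch.arange(W) - W // 2, indexing="ij"
    )
    dist = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
    mask = (dist <= radius) if low_pass else (dist > radius)

    # Penalize amplitude differences inside the selected band.
    return (fr.abs() - ft.abs())[mask].abs().mean()

# A coarse-to-fine schedule would start with a small radius (low frequencies only)
# and gradually enlarge it as training progresses:
# loss = frequency_band_loss(rendered, target, radius=epoch_dependent_radius)
```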
3. RAIN-GS: Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting
Authors : Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim
Abstract
3D Gaussian splatting (3DGS) has recently demonstrated impressive capabilities in real-time novel view synthesis and 3D reconstruction. However, 3DGS heavily depends on the accurate initialization derived from Structure-from-Motion (SfM) methods. When trained with randomly initialized point clouds, 3DGS often fails to maintain its ability to produce high-quality images, undergoing large performance drops of 4-5 dB in PSNR in general. Through extensive analysis of SfM initialization in the frequency domain and analysis of a 1D regression task with multiple 1D Gaussians, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate INitialization constraint for 3D Gaussian Splatting), which successfully trains 3D Gaussians from randomly initialized point clouds. We show the effectiveness of our strategy through quantitative and qualitative comparisons on standard datasets, largely improving the performance in all settings. ? Paper | Project Page | Code
4. A New Split Algorithm for 3D Gaussian Splatting
Authors : Qiyuan Feng, Gengchen Cao, Haoxiang Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu
Abstract
As a novel explicit 3D representation, the 3D Gaussian Splatting model has recently been applied in many domains, such as explicit geometric editing and geometry generation, and progress has been rapid. However, due to its mixed scales and cluttered shapes, a 3D Gaussian Splatting model can produce blurred or needle-like artifacts near the surface. At the same time, it tends to flatten large texture-less regions, yielding a very sparse point cloud. These problems stem from the non-uniform nature of the 3D Gaussian Splatting model, so in this paper we propose a new 3D Gaussian splitting algorithm, which can produce a more uniform and surface-bounded 3D Gaussian Splatting model. Our algorithm splits an N-dimensional Gaussian into two N-dimensional Gaussians. It guarantees consistency of mathematical characteristics and similarity of appearance, making the resulting 3D Gaussian Splatting model more uniform and a better fit to the underlying surface, and thus better suited to tasks such as explicit editing and point cloud extraction. Meanwhile, our 3D Gaussian splitting approach has a very simple closed-form solution, making it readily applicable to any 3D Gaussian model. ? Paper
5. Revising Densification in Gaussian Splatting
Authors : Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Abstract
In this paper, we address the limitations of Adaptive Density Control (ADC) in 3D Gaussian Splatting (3DGS), a scene representation method achieving high-quality, photorealistic results for novel view synthesis. ADC has been introduced for automatic 3D point primitive management, controlling densification and pruning, however, with certain limitations in the densification logic. Our main contribution is a more principled, pixel-error driven formulation for density control in 3DGS, leveraging an auxiliary, per-pixel error function as the criterion for densification. We further introduce a mechanism to control the total number of primitives generated per scene and correct a bias in the current opacity handling strategy of ADC during cloning operations. Our approach leads to consistent quality improvements across a variety of benchmark scenes, without sacrificing the method's efficiency. ?紙
2023:
1. [CVPRW '24] Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images
Authors : Jaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee
Abstract
In this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide to mitigate overfitting. We obtained the depth map using a pre-trained monocular depth estimation model and aligning the scale and offset using sparse COLMAP feature points. The adjusted depth aids in the color-based optimization of 3D Gaussian splatting, mitigating floating artifacts, and ensuring adherence to geometric constraints. We verify the proposed method on the NeRF-LLFF dataset with varying numbers of few images. Our approach demonstrates robust geometry compared to the original method that relies solely on images. ?紙|項目頁面|程式碼
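The scale-and-offset alignment of a monocular depth map against sparse COLMAP depths can be done with a simple least-squares fit. This is a minimal sketch under assumed array shapes; the paper's actual alignment and loss weighting may differ.

```python
import numpy as np

def align_depth(mono_depth, sparse_depth, mask):
    """Fit scale s and offset b so that s * mono_depth + b matches sparse SfM depth.

    mono_depth: (H, W) relative depth from a monocular estimator.
    sparse_depth: (H, W) metric depth at COLMAP feature points (0 elsewhere).
    mask: (H, W) boolean array, True where sparse_depth is valid.
    """
    x = mono_depth[mask].reshape(-1)
    y = sparse_depth[mask].reshape(-1)
    A = np.stack([x, np.ones_like(x)], axis=1)         # design matrix [x, 1]
    (s, b), *_ = np.linalg.lstsq(A, y, rcond=None)     # least-squares fit
    return s * mono_depth + b

# The aligned dense depth can then serve as a geometric prior
# (e.g. an extra depth loss) while optimizing the Gaussians.
```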
2. EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Authors : Sharath Girish, Kamal Gupta, Abhinav Shrivastava
Abstract
Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach results in scene representations with fewer Gaussians and quantized representations, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce memory by more than an order of magnitude all while maintaining the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x less memory and faster training/inference speed. ?紙|項目頁面|程式碼
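The quantized-embedding idea can be illustrated with a generic low-bit latent trained via a straight-through estimator. This is a stand-in sketch, not the paper's codec: EAGLES quantizes specific per-Gaussian attributes and uses its own decoder, and the class below is a hypothetical example.

```python
import torch

class QuantizedLatent(torch.nn.Module):
    """Per-Gaussian attribute stored as a low-bit latent with a straight-through
    estimator, a generic stand-in for the quantized-embeddings idea."""

    def __init__(self, n_gaussians, dim, n_levels=256):
        super().__init__()
        self.latent = torch.nn.Parameter(torch.zeros(n_gaussians, dim))
        self.n_levels = n_levels

    def forward(self):
        x = torch.sigmoid(self.latent)                       # keep values in [0, 1]
        q = torch.round(x * (self.n_levels - 1)) / (self.n_levels - 1)
        return x + (q - x).detach()                          # straight-through gradient

# attrs = QuantizedLatent(n_gaussians=100_000, dim=3)()      # e.g. a quantized color
```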
3. [CVPR '24] COLMAP-Free 3D Gaussian Splatting
Authors : Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang
Abstract
While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes. ?紙|項目頁面| Code (not yet) | ?簡短的演示
4. iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching
Authors : Yuan Sun, Xuan Wang, Yunfan Zhang, Jie Zhang, Caigui Jiang, Yu Guo, Fei Wang
Abstract
We present a method named iComMa to address the 6D pose estimation problem in computer vision. The conventional pose estimation methods typically rely on the target's CAD model or necessitate specific network training tailored to particular object classes. Some existing methods address mesh-free 6D pose estimation by employing the inversion of a Neural Radiance Field (NeRF), aiming to overcome the aforementioned constraints. However, it still suffers from adverse initializations. By contrast, we model the pose estimation as the problem of inverting the 3D Gaussian Splatting (3DGS) with both the comparing and matching loss. In detail, a render-and-compare strategy is adopted for the precise estimation of poses. Additionally, a matching module is designed to enhance the model's robustness against adverse initializations by minimizing the distances between 2D keypoints. This framework systematically incorporates the distinctive characteristics and inherent rationale of render-and-compare and matching-based approaches. This comprehensive consideration equips the framework to effectively address a broader range of intricate and challenging scenarios, including instances with substantial angular deviations, all while maintaining a high level of prediction accuracy. Experimental results demonstrate the superior precision and robustness of our proposed jointly optimized framework when evaluated on synthetic and complex real-world data in challenging scenarios. ?紙|程式碼
Rendering:
2024:
1. [CVPR '24] Gaussian Shadow Casting for Neural Characters
Authors : Luis Bolanos, Shih-Yang Su, Helge Rhodin
Abstract
Neural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density proxy that replaces sampling with a simple analytic formula. It supports dynamic motion and is tailored for shadow computation, thereby avoiding the affine projection approximation and sorting required by the closely related Gaussian splatting. Combined with a deferred neural rendering model, our Gaussian shadows enable Lambertian shading and shadow casting with minimal overhead. We demonstrate improved reconstructions, with better separation of albedo, shading, and shadows in challenging outdoor scenes with direct sun light and hard shadows. Our method is able to optimize the light direction without any input from the user. As a result, novel poses have fewer shadow artifacts and relighting in novel scenes is more realistic compared to the state-of-the-art methods, providing new ways to pose neural characters in novel environments, increasing their applicability. ?紙
2. Optimal Projection for 3D Gaussian Splatting
Authors : Letian Huang, Jiayang Bai, Jie Guo, Yanwen Guo
Abstract
3D Gaussian Splatting has garnered extensive attention and application in real-time neural rendering. Concurrently, concerns have been raised about the limitations of this technology in aspects such as point cloud storage, performance, and robustness in sparse viewpoints, leading to various improvements. However, there has been a notable lack of attention to the projection errors introduced by the local affine approximation inherent in the splatting itself, and the consequential impact of these errors on the quality of photo-realistic rendering. This paper addresses the projection error function of 3D Gaussian Splatting, commencing with the residual error from the first-order Taylor expansion of the projection function ϕ. The analysis establishes a correlation between the error and the Gaussian mean position. Subsequently, leveraging function optimization theory, this paper analyzes the function's minima to provide an optimal projection strategy for Gaussian Splatting, referred to as Optimal Gaussian Splatting. Experimental validation further confirms that this projection methodology reduces artifacts, resulting in a more convincingly realistic rendering. ? Paper
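For context, the error analyzed here is the residual of the first-order Taylor (local affine) approximation of the projection ϕ around the Gaussian mean μ:

```latex
% Local affine approximation of the projection \phi around the mean \mu,
% and the residual that an optimal projection strategy seeks to minimize.
\phi(x) \approx \phi(\mu) + J_{\phi}(\mu)\,(x - \mu),
\qquad
e(x) = \phi(x) - \phi(\mu) - J_{\phi}(\mu)\,(x - \mu)
```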
3. 360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming
Authors : Letian Huang, Jiayang Bai, Jie Guo, Yanwen Guo
Abstract
3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of 360∘ images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (eg, walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel 360∘ Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians. We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios. ?紙
4. StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering
Authors : Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, Markus Steinberger
Abstract
Gaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussian to 2D splats with a single view-space depth introduces popping and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort proves excessively costly compared to a global sort operation. In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead. Our software rasterizer effectively eliminates popping artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. Simultaneously, our method mitigates the potential for cheating view-dependent effects with popping, ensuring a more authentic representation. Despite the elimination of cheating, our approach achieves comparable quantitative results for test images, while increasing the consistency for novel view synthesis in motion. Due to its design, our hierarchical approach is only 4% slower on average than the original Gaussian Splatting. Notably, enforcing consistency enables a reduction in the number of Gaussians by approximately half with nearly identical quality and view-consistency. Consequently, rendering performance is nearly doubled, making our approach 1.6x faster than the original Gaussian Splatting, with a 50% reduction in memory requirements. ?紙|項目頁面|代碼| ?簡短的演示
5. [CVPR '24] GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Authors : Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
Abstract
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both principled 1D setup and realistic 3D scenes. It is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting natural-occurring signals (eg squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%. ?紙|項目頁面|代碼| ?推介會
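A small numerical sketch of the Generalized Exponential Function underlying GES: β = 2 recovers a Gaussian, while larger β produces flatter tops and sharper falloff. The full method also adds a frequency-modulated loss, which is not shown here.

```python
import numpy as np

def generalized_exponential(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Exponential Function (GEF): exp(-(|x - mu| / alpha) ** beta).

    beta = 2 gives an (unnormalized) Gaussian; beta > 2 approaches a box shape,
    which is what lets GES represent sharp edges with fewer primitives.
    """
    return np.exp(-((np.abs(x - mu) / alpha) ** beta))

x = np.linspace(-3, 3, 7)
print(generalized_exponential(x, beta=2.0))   # Gaussian-shaped profile
print(generalized_exponential(x, beta=8.0))   # near box-shaped profile
```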
6. Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting
Authors : Joongho Jo, Hyeongwon Kim, Jongsun Park
Abstract
3D Gaussian splatting (3D-GS) is a new rendering approach that outperforms the neural radiance field (NeRF) in terms of both speed and image quality. 3D-GS represents 3D scenes by utilizing millions of 3D Gaussians and projects these Gaussians onto the 2D image plane for rendering. However, during the rendering process, a substantial number of unnecessary 3D Gaussians exist for the current view direction, resulting in significant computation costs associated with their identification. In this paper, we propose a computational reduction technique that quickly identifies unnecessary 3D Gaussians in real-time for rendering the current view without compromising image quality. This is accomplished through the offline clustering of 3D Gaussians that are close in distance, followed by the projection of these clusters onto a 2D image plane during runtime. Additionally, we analyze the bottleneck associated with the proposed technique when executed on GPUs and propose an efficient hardware architecture that seamlessly supports the proposed scheme. For the Mip-NeRF360 dataset, the proposed technique excludes 63% of 3D Gaussians on average before the 2D image projection, which reduces the overall rendering computation by almost 38.3% without sacrificing peak-signal-to-noise-ratio (PSNR). The proposed accelerator also achieves a speedup of 10.7x compared to a GPU. ?紙
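A rough software-only sketch of the cluster-then-cull idea: group nearby Gaussians offline, then at render time project only the cluster representatives and discard clusters that fall outside the view. The clustering method, margin, and visibility test below are illustrative assumptions and do not reflect the paper's exact algorithm or its hardware design.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_gaussians(centers, n_clusters=4096, seed=0):
    """Offline step: group spatially close Gaussians (centers: (N, 3))."""
    km = KMeans(n_clusters=n_clusters, n_init=1, random_state=seed).fit(centers)
    return km.labels_, km.cluster_centers_

def visible_clusters(cluster_centers, view_proj, margin=1.2):
    """Runtime step: project only the cluster representatives and keep clusters
    whose representative lands roughly inside the clip volume.

    view_proj: (4, 4) view-projection matrix; margin loosens the test so that
    Gaussians near the frustum border are not culled too aggressively.
    """
    homo = np.concatenate([cluster_centers, np.ones((len(cluster_centers), 1))], axis=1)
    clip = homo @ view_proj.T
    ndc = clip[:, :3] / np.clip(clip[:, 3:4], 1e-6, None)
    in_front = clip[:, 3] > 0
    inside = np.all(np.abs(ndc[:, :2]) <= margin, axis=1)
    return in_front & inside

# labels, reps = cluster_gaussians(gaussian_centers)
# keep = visible_clusters(reps, view_proj)
# active = np.isin(labels, np.nonzero(keep)[0])   # Gaussians passed to rasterization
```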
7. GaussianPro: 3D Gaussian Splatting with Progressive Propagation
Authors : Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen
Abstract
The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality rendering at real-time speed. However, 3DGS heavily relies on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When applied to large-scale scenes that inevitably contain texture-less surfaces, SfM techniques always fail to produce enough points on these surfaces and cannot provide a good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality rendering. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR. ? Paper | Project Page | Code
8. Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting
Authors : Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin
Abstract
Recent advances in 3D Gaussian Splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its outstanding rendering quality and performance on standard datasets, 3D-GS frequently struggles to accurately model specular and anisotropic components. This issue stems from the limited ability of spherical harmonics (SH) to represent high-frequency information. To overcome this challenge, we introduce Spec-Gaussian, an approach that utilizes an anisotropic spherical Gaussian (ASG) appearance field instead of SH to model the view-dependent appearance of each 3D Gaussian. Additionally, we develop a coarse-to-fine training strategy to improve learning efficiency and eliminate floaters caused by overfitting in real-world scenes. Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality. Thanks to ASG, we significantly improve the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians. This improvement extends the applicability of 3D-GS to handle intricate scenes with specular and anisotropic surfaces. ? Paper
9. [CVPR '24] VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Authors : Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang
Abstract
Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering. ?紙|項目頁面|程式碼
10. 3D Gaussian Model for Animation and Texturing
Authors : Xiangzhi Eric Wang, Zackary PT Sin
Abstract
3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We propose a modeling that is analogous to typical 3D models, which we call 3D Gaussian Model (3DGM); it provides a manipulatable proxy for novel animation and texture transfer. By binding the 3D Gaussians in texture space and re-projecting them back to world space through implicit shell mapping, we show how our 3D modeling can serve as a valid rendering methodology for interactive applications. It is further noted that recently, 3D mesh reconstruction works have been able to produce high-quality mesh for rendering. Our work, on the other hand, only requires an approximated geometry for rendering an object in high fidelity. Applicationwise, we will show that our proxy-based 3DGM is capable of driving novel animation without animated training data and texture transferring via UV mapping of the 3D Gaussians. We believe the result indicates the potential of our work for enabling interactive applications for 3D Gaussian Splatting. ?紙
11. BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Authors : Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa
Abstract
Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacity such that a 3D-consistent and high-quality scene can be reconstructed despite image blur. Specifically, we model blur by estimating per-pixel convolution kernels with a Blur Proposal Network (BPN). The BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. In addition, the BPN also produces a quality-assessment mask that indicates regions where blur occurs. Finally, we introduce a coarse-to-fine kernel optimization scheme; this optimization scheme is fast and avoids sub-optimal solutions caused by a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion on blurry images. We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometries, while significantly improving upon existing approaches. ? Paper | Code
12. StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting
Authors : Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu
Abstract
We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency. ?紙|項目頁面|程式碼
13. Gaussian Splatting in Style
Authors : Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Tarun Yenamandra, Daniel Cremers
Abstract
Scene stylization extends the work of neural style transfer to three spatial dimensions. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across a multi-view setting. A vast majority of the previous works achieve this by optimizing the scene with a specific style image. In contrast, we propose a novel architecture trained on a collection of style images, that at test time produces high quality stylized novel views. Our work builds up on the framework of 3D Gaussian splatting. For a given scene, we take the pretrained Gaussians and process them using a multi resolution hash grid and a tiny MLP to obtain the conditional stylised views. The explicit nature of 3D Gaussians give us inherent advantages over NeRF-based methods including geometric consistency, along with having a fast training and rendering regime. This enables our method to be useful for vast practical use cases such as in augmented or virtual reality applications. Through our experiments, we show our methods achieve state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data. ?紙
14. BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
Authors : Lingzhe Zhao, Peng Wang, Peidong Liu
Abstract
While neural rendering has demonstrated impressive capabilities in 3D scene reconstruction and novel view synthesis, it relies heavily on high-quality sharp images and accurate camera poses. Numerous approaches have been proposed to train Neural Radiance Fields (NeRF) with motion-blurred images, which are frequently encountered in real-world scenarios such as low-light or long-exposure conditions. However, the implicit representation of NeRF struggles to accurately recover intricate details from severely motion-blurred images and cannot achieve real-time rendering. In contrast, recent advances in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction and real-time rendering by explicitly optimizing point clouds as 3D Gaussians. In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severely motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction. Our method models the physical image formation process of motion-blurred images and jointly learns the parameters of the Gaussians while recovering camera motion trajectories during exposure time. In our experiments, we demonstrate that BAD-Gaussians not only achieves superior rendering quality compared to previous state-of-the-art deblur neural rendering methods on both synthetic and real datasets, but also enables real-time rendering capabilities. ? Paper | Project Page | Code
15. SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Authors : Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou
Abstract
Implicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend over 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to seize photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works achieving state-of-the-art results with improved efficiency. ?紙
16. GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Authors : Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari
Abstract
During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate considerably from the training viewpoints. To mitigate this issue, we propose a novel approach called GeoGaussian. Based on the smoothly connected areas observed from point clouds, this method introduces a novel pipeline to initialize thin Gaussians aligned with the surfaces, whose characteristics can be transferred to new generations through a carefully designed densification strategy. Finally, the pipeline ensures that the scene's geometry and texture are maintained through constrained optimization processes with explicit geometry constraints. Benefiting from the proposed architecture, the generative ability of 3D Gaussians is enhanced, especially in structured regions. Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction, as evaluated qualitatively and quantitatively on public datasets. ? Paper
17. Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Authors : Zhihao Liang, Qi Zhang, Wenbo Hu, Ying Feng, Lei Zhu, Kui Jia
Abstract
The 3D Gaussian Splatting (3DGS) gained its popularity recently by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) in a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation in the two-dimensional pixel shading, and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D-pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area to participate in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to the changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability that gives more details and better fidelity. ?紙
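A toy 1D sketch of the core trick: approximate the Gaussian CDF with a logistic function and obtain the pixel-window response as a difference of CDFs, instead of sampling the density at the pixel center. The constant 1.702 is the classic logistic approximation of the normal CDF; the paper conditions its logistic approximation on the Gaussian parameters differently, so treat this as illustrative.

```python
import numpy as np
from scipy.special import erf

def gauss_cdf(x, sigma):
    """Exact CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + erf(x / (np.sqrt(2.0) * sigma)))

def logistic_cdf(x, sigma, k=1.702):
    """Logistic approximation of the Gaussian CDF (cheap, closed form)."""
    return 1.0 / (1.0 + np.exp(-k * x / sigma))

def pixel_response(center_offset, sigma, cdf=logistic_cdf):
    """Integral of a 1D Gaussian over a unit pixel window whose center is
    `center_offset` pixels from the Gaussian mean, as a difference of CDFs."""
    return cdf(center_offset + 0.5, sigma) - cdf(center_offset - 0.5, sigma)

offsets = np.linspace(-3, 3, 7)
print(pixel_response(offsets, 0.4, cdf=gauss_cdf))     # exact window integral
print(pixel_response(offsets, 0.4, cdf=logistic_cdf))  # logistic approximation
```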
18. Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
Authors : Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin
Abstract
High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on detailed modelling of the physical image formation process and utilizes velocities estimated using visual-inertial odometry (VIO). Camera poses are considered non-static during the exposure time of a single image frame and camera poses are further optimized in the reconstruction process. We formulate a differentiable rendering pipeline that leverages screen space approximation to efficiently incorporate rolling-shutter and motion blur effects into the 3DGS framework. Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods, thereby advancing 3DGS in naturalistic settings. ?紙|代碼|專案頁面
19. RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
Authors : Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari
Abstract
Recent advances in view synthesis and real-time rendering have achieved photorealistic quality at impressive rendering speeds. While Radiance Field-based methods achieve state-of-the-art quality in challenging scenarios such as in-the-wild captures and large-scale scenes, they often suffer from excessively high compute requirements linked to volumetric rendering. Gaussian Splatting-based methods, on the other hand, rely on rasterization and naturally achieve real-time rendering but suffer from brittle optimization heuristics that underperform on more challenging scenes. In this work, we present RadSplat, a lightweight method for robust real-time rendering of complex scenes. Our main contributions are threefold. First, we use radiance fields as a prior and supervision signal for optimizing point-based scene representations, leading to improved quality and more robust optimization. Next, we develop a novel pruning technique reducing the overall point count while maintaining high quality, leading to smaller and more compact scene representations with faster inference speeds. Finally, we propose a novel test-time filtering approach that further accelerates rendering and allows to scale to larger, house-sized scenes. We find that our method enables state-of-the-art synthesis of complex captures at 900+ FPS. ? Paper | Project Page
20. Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Authors : Guangchi Fang, Bing Wang
Abstract
In this study, we explore the challenge of efficiently representing scenes with a constrained number of Gaussians. Our analysis shifts from traditional graphics and 2D computer vision to the perspective of point clouds, highlighting the inefficient spatial distribution of Gaussian representation as a key limitation in model performance. To address this, we introduce strategies for densification including blur split and depth reinitialization, and simplification through Gaussian binarization and sampling. These techniques reorganize the spatial positions of the Gaussians, resulting in significant improvements across various datasets and benchmarks in terms of rendering quality, resource consumption, and storage compression. Our proposed Mini-Splatting method integrates seamlessly with the original rasterization pipeline, providing a strong baseline for future research in Gaussian-Splatting-based works. ?紙
21. Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Authors : Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao
Abstract
3D Gaussian Splatting (3DGS) has exhibited impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS, which only considers the average gradient magnitude of points from observable views and thus fails to grow large Gaussians that are observable from many viewpoints but are mostly covered only at their boundaries. To this end, we propose a novel method, named Pixel-GS, that takes into account the number of pixels covered by each Gaussian in every view when computing the growth condition. We treat the covered pixel counts as weights to dynamically average the gradients from different views, so that the growth of large Gaussians can be promoted. As a result, points within areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. Moreover, we propose a simple yet effective strategy of scaling the gradient field according to the distance to the camera, in order to suppress the growth of floaters near the camera. Extensive qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed on the challenging Mip-NeRF 360 and Tanks & Temples datasets. ? Paper
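The change to the growth (densification) criterion can be illustrated with a pixel-count-weighted average of per-view gradient magnitudes; the variable names and numbers below are illustrative, not taken from the released implementation.

```python
import numpy as np

def densification_score(view_grads, pixels_covered, plain_mean=False):
    """Per-Gaussian growth score.

    view_grads: (V,) positional-gradient magnitude of one Gaussian in each view.
    pixels_covered: (V,) number of pixels the Gaussian touches in each view.

    Vanilla 3DGS averages the gradients uniformly over views; the Pixel-GS idea is
    to weight each view by the number of covered pixels, so large Gaussians seen
    by many views (but mostly at their borders) can still exceed the threshold.
    """
    if plain_mean:
        return view_grads.mean()
    return (view_grads * pixels_covered).sum() / np.clip(pixels_covered.sum(), 1, None)

grads = np.array([0.02, 0.01, 0.20])   # one view sees the Gaussian's interior ...
pix   = np.array([4,    6,    500])    # ... and covers far more pixels
print(densification_score(grads, pix, plain_mean=True))   # ~0.077 (uniform average)
print(densification_score(grads, pix))                    # ~0.196 -> more likely to split
```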
22. Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
Authors : Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang
Abstract
Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique appearance of each tiny point in a scene is determined by its independent intrinsic material attributes and the varying environmental impacts it receives. Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separated intrinsic and dynamic appearance feature for each point, capturing the unchanged scene appearance along with dynamic variation like illumination and weather. Additionally, an adaptive sampling strategy is presented to allow each Gaussian point to focus on the local and detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. More experiments have demonstrated better reconstruction quality and details of GS-W compared to previous methods, with a 1000× increase in rendering speed. ?紙|項目頁面| Code (not yet) | ?簡短的演示
23. GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction
Authors : Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, Bo Dai
Abstract
Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural implicit surfaces is sparked from the success of neural rendering. Current works either constrain the distribution of density fields or the shape of primitives, resulting in degraded rendering quality and flaws on the learned scene surfaces. The efficacy of such methods is limited by the inherent constraints of the chosen neural representation, which struggles to capture fine surface details, especially for larger, more intricate scenes. To address these issues, we introduce GSDF, a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting (3DGS) representation with neural Signed Distance Fields (SDF). The core idea is to leverage and enhance the strengths of each branch while alleviating their limitation through mutual guidance and joint supervision. We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions, and at the meantime benefits 3DGS rendering with structures that are more aligned with the underlying geometry. ?紙|項目頁面|代碼(還沒有)
24. Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
Authors : Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai
Abstract
The recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularly noticeable in zoom-out views and can lead to inconsistent rendering speeds in scenes with varying details. Moreover, it often struggles to capture the corresponding level of details at different scales with its heuristic density control operation. Inspired by the Level-of-Detail (LOD) techniques, we introduce Octree-GS, featuring an LOD-structured 3D Gaussian approach supporting level-of-detail decomposition for scene representation that contributes to the final rendering results. Our model dynamically selects the appropriate level from the set of multi-resolution anchor points, ensuring consistent rendering performance with adaptive LOD adjustments while maintaining high-fidelity rendering results. ?紙|項目頁面|代碼(還沒有)
25. SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
Authors : Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao
Abstract
In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting needs to modify the training procedure of Gaussian splatting, our method functions at test time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply a 2D scale-adaptive filter to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue with 3D smoothing and 2D Mip filters, which are unfortunately not aware of the test frequency. In this work, we show that a 2D scale-adaptive filter that is aware of the test frequency can effectively match the Gaussian scales, so that the Gaussian primitive distribution remains consistent across different test frequencies. When the scale inconsistency is eliminated, a sampling rate smaller than the scene frequency still causes conventional jaggedness, and we propose to integrate the projected 2D Gaussians within each pixel during testing. This integration is effectively a limiting case of super-sampling and significantly improves anti-aliasing performance over vanilla Gaussian splatting. Through extensive experiments using various settings and on both bounded and unbounded scenes, we show that SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. ? Paper | Project Page | Code
26. Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Authors : Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison
Abstract
Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis. Our method optimises 3D Gaussian primitives to accurately model the object's geometry at points of contact. By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map. Touch is particularly useful when considering non-Lambertian objects (eg shiny or reflective surfaces) since contemporary methods tend to fail to reconstruct with fidelity specular highlights. By combining vision and tactile sensing, we achieve more accurate geometry reconstructions with fewer images than prior methods. We conduct evaluation on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach, offering significant improvements in reconstruction quality. ?紙
27. 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting
Authors : Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison
Abstract
In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, which allows accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is to incorporate an implicit signed distance field (SDF) within the 3D Gaussians so that they are aligned and jointly optimized. First, we introduce a differentiable SDF-to-opacity transformation function that converts SDF values into the corresponding Gaussian opacities. This function connects the SDF and the 3D Gaussians, allowing unified optimization and enforcing surface constraints on the 3D Gaussians. During learning, optimizing the 3D Gaussians provides supervisory signals for SDF learning, enabling the reconstruction of intricate details. However, this only provides sparse supervisory signals to the SDF at locations occupied by Gaussians, which is insufficient for learning a continuous SDF. To address this limitation, we incorporate volumetric rendering and align the rendered geometric attributes (depth, normal) with those derived from the 3D Gaussians. This consistency regularization introduces supervisory signals at locations not covered by discrete 3D Gaussians, effectively eliminating redundant surfaces outside the Gaussian sampling range. Our extensive experimental results demonstrate that 3DGSR enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS. Moreover, our method competes favorably with leading surface reconstruction techniques while offering a more efficient learning process and much better rendering quality. ? Paper | Code (not yet)
28. Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting
Authors : Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma
Abstract
3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. ?紙
29. OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images
Authors : Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma
Abstract
Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. To benefit the research community, the code will be made publicly available once the paper is published. ?紙
30. Robust Gaussian Splatting
Authors : François Darmon, Lorenzo Porzi, Samuel Rota-Bulò, Peter Kontschieder
Abstract
In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we propose mechanisms for defocus blur compensation and for addressing color in-consistencies caused by ambient light, shadows, or due to camera-related factors like varying white balancing settings. Our proposed solutions integrate in a seamless way with the 3DGS formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines. ?紙
31. DeblurGS: Gaussian Splatting for Camera Motion Blur
Authors : Jeongtaek Oh, Jaeyoung Chung, Dongwoo Lee, Kyoung Mu Lee
Abstract
Although significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to realworld applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. To address this challenge, we propose DeblurGS, a method to optimize sharp 3D Gaussian Splatting from motion-blurred images, even with the noisy camera pose initialization. We restore a fine-grained sharp scene by leveraging the remarkable reconstruction capability of 3D Gaussian Splatting. Our approach estimates the 6-Degree-of-Freedom camera motion for each blurry observation and synthesizes corresponding blurry renderings for the optimization process. Furthermore, we propose Gaussian Densification Annealing strategy to prevent the generation of inaccurate Gaussians at erroneous locations during the early training stages when camera motion is still imprecise. Comprehensive experiments demonstrate that our DeblurGS achieves state-of-the-art performance in deblurring and novel view synthesis for real-world and synthetic benchmark datasets, as well as field-captured blurry smartphone videos. ?紙
32. StylizedGS: Controllable Stylization for 3D Gaussian Splatting
Authors : Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao
Abstract
With the rapid development of XR, 3D generation and editing are becoming more and more important, among which, stylization is an important tool of 3D appearance editing. It can achieve consistent 3D artistic stylization given a single reference style image and thus is a user-friendly editing way. However, recent NeRF-based 3D stylization methods face efficiency issues that affect the actual user experience and the implicit nature limits its ability to transfer the geometric pattern styles. Additionally, the ability for artists to exert flexible control over stylized scenes is considered highly desirable, fostering an environment conducive to creative exploration. In this paper, we introduce StylizedGS, a 3D neural style transfer framework with adaptable control over perceptual factors based on 3D Gaussian Splatting (3DGS) representation. The 3DGS brings the benefits of high efficiency. We propose a GS filter to eliminate floaters in the reconstruction which affects the stylization effects before stylization. Then the nearest neighbor-based style loss is introduced to achieve stylization by fine-tuning the geometry and color parameters of 3DGS, while a depth preservation loss with other regularizations is proposed to prevent the tampering of geometry content. Moreover, facilitated by specially designed losses, StylizedGS enables users to control color, stylized scale and regions during the stylization to possess customized capabilities. Our method can attain high-quality stylization results characterized by faithful brushstrokes and geometric consistency with flexible controls. Extensive experiments across various scenes and styles demonstrate the effectiveness and efficiency of our method concerning both stylization quality and inference FPS. ?紙
33. LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field
Authors : Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He
Abstract
Cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes,incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussian to project it into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D Cinemagraph that exhibits natural and seamlessly loopable dynamics. Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. ?紙|項目頁面|程式碼
34. GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting
Authors : Junseo Lee, Seokwon Lee, Jungi Lee, Junyong Park, Jaewoong Sim
Abstract
This paper presents GSCore, a hardware acceleration unit that efficiently executes the rendering pipeline of 3D Gaussian Splatting with algorithmic optimizations. GSCore builds on the observations from an in-depth analysis of Gaussian-based radiance field rendering to enhance computational efficiency and bring the technique to wide adoption. In doing so, we present several optimization techniques, Gaussian shape-aware intersection test, hierarchical sorting, and subtile skipping, all of which are synergistically integrated with GSCore. We implement the hardware design of GSCore, synthesize it using a commercial 28nm technology, and evaluate the performance across a range of synthetic and real-world scenes with varying image resolutions. Our evaluation results show that GSCore achieves a 15.86× speedup on average over the mobile consumer GPU with a substantially smaller area and lower energy consumption. ?紙| ?簡短的演示
2023:
1. Mip-Splatting Alias-free 3D Gaussian Splatting
Authors : Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger
Abstract
Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, eg, by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our comprehensive evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach. ?紙|項目頁面|程式碼
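A minimal sketch of the 3D smoothing filter idea described in the abstract above, assuming per-Gaussian world-space covariances and simple pinhole cameras. The hyper-parameter `s`, the camera dictionary layout, and the opacity rescaling are assumptions inferred from the abstract, not the authors' code:

```python
import numpy as np

def max_sampling_rate(center, cameras):
    """Largest screen-space sampling rate (focal / depth) among the training
    cameras that see this Gaussian; this bounds how much high-frequency
    detail the primitive can legitimately carry."""
    rates = []
    for cam in cameras:  # cam = {"R": 3x3 rotation, "t": 3-vector, "focal": float} (assumed layout)
        depth = (cam["R"] @ center + cam["t"])[2]
        if depth > 0.0:
            rates.append(cam["focal"] / depth)
    return max(rates) if rates else 1.0

def apply_3d_smoothing(cov, opacity, nu_max, s=0.2):
    """Convolve the Gaussian with an isotropic low-pass filter whose variance
    is tied to the maximal sampling rate, and rescale opacity so the splat's
    overall contribution stays comparable."""
    cov_smoothed = cov + (s / nu_max) ** 2 * np.eye(3)
    opacity_scaled = opacity * np.sqrt(np.linalg.det(cov) / np.linalg.det(cov_smoothed))
    return cov_smoothed, opacity_scaled
```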
2. Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing
Authors : Jian Gao, Chun Gu, Youtian Lin, Hao Zhu, Xun Cao, Li Zhang, Yao Yao
Abstract
We present a novel differentiable point-based rendering framework for material and lighting decomposition from multi-view images, enabling editing, ray tracing, and real-time relighting of the 3D point cloud. Specifically, a 3D scene is represented as a set of relightable 3D Gaussian points, where each point is additionally associated with a normal direction, BRDF parameters, and incident light from different directions. To achieve robust lighting estimation, we further divide the incident light of each point into global and local components, as well as view-dependent visibilities. The 3D scene is optimized through the 3D Gaussian Splatting technique, while BRDF and lighting are decomposed by physically-based differentiable rendering. Moreover, we introduce an innovative point-based ray-tracing approach based on a bounding volume hierarchy for efficient visibility baking, enabling real-time rendering and relighting of 3D Gaussian points with accurate shadow effects. Extensive experiments demonstrate improved BRDF estimation and novel view rendering results compared to state-of-the-art material estimation approaches. Our framework showcases the potential to revolutionize the mesh-based graphics pipeline with a relightable, traceable, and editable rendering pipeline based solely on point clouds. Paper | Project Page | Code
3. [CVPR '24] GS-IR: 3D Gaussian Splatting for Inverse Rendering
Authors : Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia
Abstract
We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (eg NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not support producing plausible normal natively; 2) forward mapping (eg rasterization and splatting) cannot trace the occlusion like backward mapping (eg ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes. ?紙|項目頁面|代碼(還沒有)
4. [CVPR '24] Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Authors : Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee
Abstract
3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite its high rendering quality and speed at high resolutions, they both deteriorate drastically when rendered at lower resolutions or from far away camera position. During low resolution or far away rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian and leads to aliasing effect. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13%-66% PSNR and 160%-2400% rendering speed improvement at 4×-128× scale rendering on Mip-NeRF360 dataset compared to the single scale 3D Gaussian splatting. ?紙
5. [CVPR '24] GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Authors : Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, Yuexin Ma
Abstract
The advent of neural 3D Gaussians has recently brought about a revolution in the field of neural rendering, facilitating the generation of high-quality renderings at real-time speeds. However, the explicit and discrete representation encounters challenges when applied to scenes featuring reflective surfaces. In this paper, we present GaussianShader, a novel method that applies a simplified shading function on 3D Gaussians to enhance the neural rendering in scenes with reflective surfaces while preserving the training and rendering efficiency. The main challenge in applying the shading function lies in the accurate normal estimation on discrete 3D Gaussians. Specifically, we proposed a novel normal estimation framework based on the shortest axis directions of 3D Gaussians with a delicately designed loss to make the consistency between the normals and the geometries of Gaussian spheres. Experiments show that GaussianShader strikes a commendable balance between efficiency and visual quality. Our method surpasses Gaussian Splatting in PSNR on specular object datasets, exhibiting an improvement of 1.57dB. When compared to prior works handling reflective surfaces, such as Ref-NeRF, our optimization time is significantly accelerated (23h vs. 0.58h). Please click on our project website to see more results. ?紙|項目頁面|程式碼
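A toy illustration of the shortest-axis normal idea mentioned in the abstract above; the function name and the camera-facing sign flip are assumptions for the sketch, not the paper's implementation:

```python
import numpy as np

def shortest_axis_normal(rotation, scales, mean, cam_center):
    """Take the ellipsoid axis with the smallest scale as the normal candidate
    and orient it toward the camera. `rotation` is the Gaussian's 3x3 rotation
    matrix (columns = principal axes), `scales` its three axis lengths."""
    normal = rotation[:, np.argmin(scales)]
    if np.dot(normal, cam_center - mean) < 0.0:  # flip to face the viewer
        normal = -normal
    return normal / np.linalg.norm(normal)
```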
6. [CVPR '24] Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Authors : Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai
Abstract
Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved the state-of-the-art rendering quality and speed combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less areas and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed. Paper | Project Page | Code
7. Deblurring 3D Gaussian Splatting
Authors : Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park
Abstract
Recent studies in radiance fields have paved the way for novel view synthesis with their photorealistic rendering quality. However, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately 3D Gaussians splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to the lens defocusing, object motion, and camera shake, and it inevitably intervenes in clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. Paper | Project Page | Code (not yet)
8. GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization
Authors : Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jingtuo Liu, Liangjun Zhang, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang
Abstract
This paper presents GIR, a 3D Gaussian Inverse Rendering method for relightable scene factorization. Compared to existing methods leveraging discrete meshes or neural implicit fields for inverse rendering, our method utilizes 3D Gaussians to estimate the material properties, illumination, and geometry of an object from multi-view images. Our study is motivated by the evidence showing that 3D Gaussian is a more promising backbone than neural fields in terms of performance, versatility, and efficiency. In this paper, we aim to answer the question: "How can 3D Gaussian be applied to improve the performance of inverse rendering?" To address the complexity of estimating normals based on discrete and often in-homogeneous distributed 3D Gaussian representations, we proposed an efficient self-regularization method that facilitates the modeling of surface normals without the need for additional supervision. To reconstruct indirect illumination, we propose an approach that simulates ray tracing. Extensive experiments demonstrate our proposed GIR's superior performance over existing methods across multiple tasks on a variety of widely used datasets in inverse rendering. This substantiates its efficacy and broad applicability, highlighting its potential as an influential tool in relighting and reconstruction. ?紙|項目頁面|代碼(還沒有)
9. Gaussian Splatting with NeRF-based Color and Opacity
Authors : Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek
Abstract
Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers similar rendering quality with faster training and inference, as it does not need neural networks to work. We encode information about the 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS is difficult to condition, since it usually requires around one hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model that uses a GS representation of the 3D object's shape together with a NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. covariance of Gaussian), color and opacity, and a neural network which takes parameters of Gaussian and viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and the transparency of 3D objects. Paper | Code
Reviews:
2024:
1. Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human
Authors : Song Bai, Jie Li
Abstract
While AI-generated text and 2D images continue to expand its territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since the year 2023 an abundant amount of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity in stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models like SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. The advancements in neural network-based 3D storing and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have accelerated the efficiency and realism of neural rendered models. Furthermore, the multimodality capabilities of large language models have enabled language inputs to transcend into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half year of 2023. It will begin by discussing the AI generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future. ?紙
2. A Survey on 3D Gaussian Splatting
Authors : Guikun Chen, Wenguan Wang
Abstract
3D Gaussian splatting (3D GS) has recently emerged as a transformative technique in the explicit radiance field and computer graphics landscape. This innovative approach, characterized by the utilization of millions of 3D Gaussians, represents a significant departure from the neural radiance field (NeRF) methodologies, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representation and differentiable rendering algorithm, not only promises real-time rendering capability but also introduces unprecedented levels of control and editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In this paper we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the emergence of 3D GS, laying the groundwork for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS: by enabling real-time performance, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representations. Paper
3. 3D Gaussian as a New Vision Era: A Survey
Authors : Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, Ying He
Abstract
3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural networks, such as Neural Radiance Fields (NeRF). This technique has found diverse applications in areas such as robotics, urban mapping, autonomous navigation, and virtual reality/augmented reality, just name a few. Given the growing popularity and expanding research in 3D Gaussian Splatting, this paper presents a comprehensive survey of relevant papers from the past year. We organize the survey into taxonomies based on characteristics and applications, providing an introduction to the theoretical underpinnings of 3D Gaussian Splatting. Our goal through this survey is to acquaint new researchers with 3D Gaussian Splatting, serve as a valuable reference for seminal works in the field, and inspire future research directions, as discussed in our concluding section. ?紙
4. Neural Fields in Robotics: A Survey
Authors : Muhammad Zubair Irshad, Mauro Comi, Yen-Chen Lin, Nick Heppert, Abhinav Valada, Zsolt Kira, Rares Ambrus, Johnathan Trembley
Abstract
Neural Fields have emerged as a transformative approach for 3D scene representation in computer vision and robotics, enabling accurate inference of geometry, 3D semantics, and dynamics from 2D data. Leveraging differentiable rendering, Neural Fields encompass both continuous implicit and explicit neural representations enabling high-fidelity 3D reconstruction, integration of multi-modal sensor data, and generation of novel viewpoints. This survey explores their applications in robotics, emphasizing their potential to enhance perception, planning, and control. Their compactness, memory efficiency, and differentiability, along with seamless integration with foundation and generative models, make them ideal for real-time applications, improving robot adaptability and decision-making. This paper provides a thorough review of Neural Fields in robotics, categorizing applications across various domains and evaluating their strengths and limitations, based on over 200 papers. First, we present four key Neural Fields frameworks: Occupancy Networks, Signed Distance Fields, Neural Radiance Fields, and Gaussian Splatting. Second, we detail Neural Fields' applications in five major robotics domains: pose estimation, manipulation, navigation, physics, and autonomous driving, highlighting key works and discussing takeaways and open challenges. Finally, we outline the current limitations of Neural Fields in robotics and propose promising directions for future research. Paper
5. How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey
Authors : Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi
Abstract
Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) representations. Recognizing the growing body of research and the absence of a comprehensive survey on the topic, this paper aims to provide the first comprehensive overview of SLAM progress through the lens of the latest advancements in radiance fields. It sheds light on the background, evolutionary path, inherent strengths and limitations, and serves as a fundamental reference to highlight the dynamic progress and specific challenges. ?紙
6. Recent Advances in 3D Gaussian Splatting
Authors : Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao
Abstract
The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations such as Neural Radiance Fields (NeRF), which represent a 3D scene with position- and viewpoint-conditioned neural networks, 3D Gaussian Splatting models the scene with a set of Gaussian ellipsoids, so that efficient rendering can be accomplished by rasterizing the ellipsoids into images. Apart from the fast rendering speed, the explicit representation of 3D Gaussian Splatting also facilitates editing tasks such as dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid change and growing number of works in this field, we present a literature review of recent 3D Gaussian Splatting methods, which can be roughly classified by functionality into 3D reconstruction, 3D editing, and other downstream applications. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian Splatting are also illustrated for a better understanding of this technique. This survey aims to help beginners get into this field quickly and to provide experienced researchers with a comprehensive overview, which can stimulate future development of the 3D Gaussian Splatting representation. Paper
7. Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
Authors : Anurag Dalal, Daniel Hagen, Kjell G. Robbersmyr, Kristian Muri Knausgård
Abstract
Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting. ?紙
SLAM:
2024:
1. SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Authors : Mingrui Li, Shuhong Liu, Heng Zhou
Abstract
Semantic understanding plays a crucial role in Dense Simultaneous Localization and Mapping (SLAM), facilitating comprehensive scene interpretation. Recent advancements that integrate Gaussian Splatting into SLAM systems have demonstrated its effectiveness in generating high-quality renderings through the use of explicit 3D Gaussian representations. Building on this progress, we propose SGS-SLAM, the first semantic dense visual SLAM system grounded in 3D Gaussians, which provides precise 3D semantic segmentation alongside high-fidelity reconstructions. Specifically, we propose to employ multi-channel optimization during the mapping process, integrating appearance, geometric, and semantic constraints with key-frame optimization to enhance reconstruction quality. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and semantic segmentation, outperforming existing methods meanwhile preserving real-time rendering ability. ?紙
2. SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM
Authors : Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang
Abstract
We propose SemGauss-SLAM, the first semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering in real-time. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift and improve reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to more robust tracking and consistent mapping. Our SemGauss-SLAM method demonstrates superior performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in novel-view semantic synthesis and 3D semantic mapping. ?紙
3. Compact 3D Gaussian Splatting For Dense Visual SLAM
Authors : Tianchen Deng, Yaohui Chen, Leyan Zhang, Jianfei Yang, Shenghai Yuan, Danwei Wang, Weidong Chen
Abstract
Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address the limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then we observe that the covariance matrix (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, ie, the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation. ?紙
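The geometry codebook described above can be pictured as clustering the nearly identical (scale, rotation) parameters and storing small indices instead of full floats. The sketch below uses k-means purely as an illustration; the codebook size, feature layout, and clustering method are assumptions, not the paper's exact choices:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_geometry_codebook(scales, rotations, codebook_size=64):
    """scales: (N, 3) axis scales, rotations: (N, 4) quaternions. Returns a
    small codebook of representative geometric attributes plus per-Gaussian
    indices, so each Gaussian stores a 2-byte index rather than 7 floats."""
    feats = np.concatenate([scales, rotations], axis=1)
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(feats)
    return km.cluster_centers_, km.labels_.astype(np.uint16)

def decode_geometry(codebook, indices):
    """Recover per-Gaussian (scale, rotation) parameters from the codebook."""
    decoded = codebook[indices]
    return decoded[:, :3], decoded[:, 3:]
```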
4. NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting
Authors : Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie
Abstract
We propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. Additionally, we employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, mitigating the burden of excessive memory consumption. Furthermore, we leverage the advantage of 3D Gaussian splatting, which enables efficient and differentiable novel view rendering, and propose a Virtual Camera View Pruning method to eliminate outlier GS points, thereby effectively enhancing the quality of scene representations. Our NEDS-SLAM method demonstrates competitive performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in 3D dense semantic mapping. ?紙
5. High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization
Authors : Shuo Sun, Malcolm Mielle, Achim J. Lilienthal, Martin Magnusson
Abstract
We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed gaussian splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM. ?紙
6. RGBD GS-ICP SLAM
Authors : Seongbo Ha, Jiung Yeon, Hyeonwoo Yu
Abstract
Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing an incredibly fast speed up to 107 FPS (for the entire system) and superior quality of the reconstructed map. ?紙|代碼| ?簡短的演示
7. EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting
Authors : Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, Wei Shen
Abstract
Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Mapping) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperative applications among endoscopic surgeries. In this paper, we introduce EndoGSLAM, an efficient SLAM approach for endoscopic surgeries, which integrates streamlined Gaussian representation and differentiable rasterization to facilitate over 100 fps rendering speed during online camera tracking and tissue reconstructing. Extensive experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches, showing tremendous potential for endoscopic surgeries ?紙|項目頁面|代碼(還沒有)
8. CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Authors : Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui
Abstract
Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, ie, CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available. ?紙|項目頁面|代碼(還沒有)
9. MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Authors : Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu
Abstract
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. ?紙|項目頁面|代碼(還沒有)
10. Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting
Authors : Xiaolei Lang, Laijian Li, Hang Zhang, Feng Xiong, Mu Xu, Yong Liu, Xingxing Zuo, Jiajun Lv
Abstract
We present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems. ?紙
11. RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting
Authors : Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou
Abstract
We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors, and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and otherwise unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and in real-time camera tracking accuracy. Paper | Project Page | Code
12. [3DV '25] LoopSplat: Loop Closure by Registering 3D Gaussian Splats
Authors : Liyuan Zhu, Yue Li, Erik Sandström, Shengyu Huang, Konrad Schindler, Iro Armeni
Abstract
Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM. ?紙|項目頁面|程式碼
13. MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation
Authors : Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu
Abstract
Emerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios like low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e. MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either a neural radiance field or a Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns the 3D scene representation and estimates the camera's local trajectory during exposure time, enabling active compensation for motion blur caused by camera movement. In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach. Paper | Project Page | Code (not yet)
2023:
1. [CVPR '24] GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Authors : Chi Yan, Delin Qu, Dong Wang, Dan Xu, Zhigang Wang, Bin Zhao, Xuelong Li
Abstract
In this paper, we introduce GS-SLAM, which first utilizes a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussians in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential to extend the 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object as in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize the camera pose, resulting in reduced runtime and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. The source code will be released upon acceptance. Paper
2. [CVPR '24] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Authors : Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
Abstract
Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations, including fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2× state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches, while allowing real-time rendering of a high-resolution dense 3D map. Paper | Project Page | Code | Explainer Video
3. [CVPR '24] Gaussian Splatting SLAM
Authors : Hidenobu Matsuki, Riku Murai, Paul HJ Kelly, Andrew J. Davison
Abstract
We present the first application of 3D Gaussian Splatting to incremental 3D reconstruction using a single moving monocular or RGB-D camera. Our Simultaneous Localisation and Mapping (SLAM) method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation, but also reconstruction of tiny and even transparent objects. Paper | Project Page | Code | Short Presentation
4. Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Authors : Vladimir Yugay, Yue Li, Theo Gevers, Martin R. Oswald
Abstract
We present the first neural RGBD SLAM method capable of photorealistically reconstructing real-world scenes. Despite modern SLAM methods achieving impressive results on synthetic datasets, they still struggle with real-world datasets. Our approach utilizes 3D Gaussians as a primary unit for our scene representation to overcome the limitations of the previous methods. We observe that classical 3D Gaussians are hard to use in a monocular setup: they can't encode accurate geometry and are hard to optimize with single-view sequential supervision. By extending classical 3D Gaussians to encode geometry, and designing a novel scene representation and the means to grow, and optimize it, we propose a SLAM system capable of reconstructing and rendering real-world datasets without compromising on speed and efficiency. We show that Gaussian-SLAM can reconstruct and photorealistically render real-world scenes. We evaluate our method on common synthetic and real-world datasets and compare it against other state-of-the-art SLAM methods. Finally, we demonstrate, that the final 3D scene representation that we obtain can be rendered in Real-time thanks to the efficient Gaussian Splatting rendering. ?紙|項目頁面|代碼| ?簡短的演示
5. [CVPR '24] Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Authors : Huajian Huang, Longwei Li, Hui Cheng, Sai-Kit Yeung
Abstract
The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, eg, PSNR is 30% higher and rendering speed is hundreds of times faster in the Replica dataset. Moreover, the Photo-SLAM can run at real-time speed using an embedded platform such as Jetson AGX Orin, showing the potential of robotics applications. ?紙|項目頁面|程式碼
Sparse:
2024:
1. [CVPR '24] DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Authors : Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu
Abstract
Radiance fields have demonstrated impressive performance in synthesizing novel views from sparse input views, yet prevailing methods suffer from high training costs and slow inference speed. This paper introduces DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields, offering real-time and high-quality few-shot novel view synthesis at low cost. Our motivation stems from the highly efficient representation and surprising quality of the recent 3D Gaussian Splatting, despite the geometry degradation it encounters when input views decrease. In Gaussian radiance fields, we find that this degradation in scene geometry is primarily linked to the positioning of the Gaussian primitives and can be mitigated by depth constraints. Consequently, we propose a Hard and Soft Depth Regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine detailed geometry reshaping, we introduce Global-Local Depth Normalization, enhancing the focus on small local depth changes. Extensive experiments on the LLFF, DTU, and Blender datasets show that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results with significantly reduced memory cost, a 25× reduction in training time, and over 3000× faster rendering speed. Paper | Project Page | Code | Short Presentation
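A rough sketch of how a local depth normalization term in the spirit of the one named above could look: depth patches are normalized to zero mean and unit scale before comparison, so the loss emphasizes small local depth changes rather than the unknown absolute scale of monocular depth. Patch size, norm choice, and function names are assumptions, not the paper's definition:

```python
import torch

def local_normalized_depth_loss(rendered_depth, mono_depth, patch=8, eps=1e-6):
    """rendered_depth, mono_depth: (H, W) tensors with H, W divisible by `patch`.
    Each patch is normalized independently, removing per-patch scale and shift
    so supervision focuses on local geometry."""
    def patchwise_normalize(d):
        p = d.unfold(0, patch, patch).unfold(1, patch, patch)  # (H/ps, W/ps, ps, ps)
        mean = p.mean(dim=(-2, -1), keepdim=True)
        std = p.std(dim=(-2, -1), keepdim=True) + eps
        return (p - mean) / std
    return (patchwise_normalize(rendered_depth) -
            patchwise_normalize(mono_depth)).abs().mean()
```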
2. Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting
Authors : Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III
Abstract
In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in few-view scene synthesis, including on opaque as well as transparent objects. Paper | Project Page
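A hedged sketch of what a variance-weighted depth loss of the kind named above might look like, assuming the fused depth and uncertainty maps from the abstract are available as tensors; the exact weighting scheme is an assumption:

```python
import torch

def variance_weighted_depth_loss(rendered_depth, fused_depth, fused_var, eps=1e-6):
    """Down-weight depth supervision where the fused touch/vision estimate is
    uncertain: pixels with high variance contribute little to the loss."""
    weights = 1.0 / (fused_var + eps)
    return (weights * (rendered_depth - fused_depth) ** 2).mean()
```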
3. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Authors : Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
Abstract
We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussian Splatting models via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses 10× fewer parameters and infers more than 2× faster while providing higher appearance and geometry quality as well as better cross-dataset generalization. ?紙|項目頁面|程式碼
4. latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Authors : Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen
Abstract
We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a lightweight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not enable fast inference of high resolution novel views due to slow volume rendering, or are limited to interpolation of close input views, even in simpler settings where broader generalization is not possible. In this work, we combine a regression-based approach with a generative model, moving towards both of these capabilities within the same method, trained purely on readily available real video data. The core of our method are variational 3D Gaussians, a representation that efficiently encodes varying uncertainty within a latent space consisting of 3D feature Gaussians. From these Gaussians, specific instances can be sampled and rendered via efficient Gaussian splatting and a fast, generative decoder network. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data. Paper | Project Page | Code
5. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Authors : Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein
Abstract
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Paper | Project Page | Code
6. Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Authors : Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
Abstract
We tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed, approximately 0.6 second on a single NVIDIA A100 GPU. ?紙
7. CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Authors : Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari
Abstract
The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud like representation quickly overfits to the more challenging setup of extremely sparse input images (eg, 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constraint the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes. ?紙|專案頁面
8. InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds
Authors : Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang
Abstract
While novel view synthesis (NVS) has made substantial progress in 3D computer vision, it typically requires an initial estimation of camera intrinsics and extrinsics from dense viewpoints. This pre-processing is usually conducted via a Structure-from-Motion (SfM) pipeline, a procedure that can be slow and unreliable, particularly in sparse-view scenarios with insufficient matched features for accurate reconstruction. In this work, we integrate the strengths of point-based representations (eg, 3D Gaussian Splatting, 3D-GS) with end-to-end dense stereo models (DUSt3R) to tackle the complex yet unresolved issues in NVS under unconstrained settings, which encompasses pose-free and sparse view challenges. Our framework, InstantSplat, unifies dense stereo priors with 3D-GS to build 3D Gaussians of large-scale scenes from sparseview & pose-free images in less than 1 minute. Specifically, InstantSplat comprises a Coarse Geometric Initialization (CGI) module that swiftly establishes a preliminary scene structure and camera parameters across all training views, utilizing globally-aligned 3D point maps derived from a pre-trained dense stereo pipeline. This is followed by the Fast 3D-Gaussian Optimization (F-3DGO) module, which jointly optimizes the 3D Gaussian attributes and the initialized poses with pose regularization. Experiments conducted on the large-scale outdoor Tanks & Temples datasets demonstrate that InstantSplat significantly improves SSIM (by 32%) while concurrently reducing Absolute Trajectory Error (ATE) by 80%. These establish InstantSplat as a viable solution for scenarios involving posefree and sparse-view conditions. ?紙|項目頁面|代碼| ?解說視頻
9. Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
Authors : Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen
Abstract
We aim to tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diffusion models can strongly improve the reconstruction of a scene with low-cost fine-tuning. Specifically, we present SparseSplat360 (Sp2360), a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views. Due to superior training and rendering speeds, we use an explicit scene representation in the form of 3D Gaussians over NeRF-based implicit representations. We propose an iterative update strategy to fuse generated pseudo novel views with existing 3D Gaussians fitted to the initial sparse inputs. As a result, we obtain a multi-view consistent scene representation with details coherent with the observed inputs. Our evaluation on the challenging Mip-NeRF360 dataset shows that our proposed 2D to 3D distillation algorithm considerably improves the performance of a regularized version of 3DGS adapted to a sparse-view setting and outperforms existing sparse-view reconstruction methods in 360 scene reconstruction. Qualitatively, our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail. ?紙|代碼(還沒有)
2023:
1. SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting
Authors : Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi
Abstract
The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views are reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360 scenes from sparse training views. We find that using naive depth priors is not sufficient and integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by up to 30.5% and NeRF-based methods by up to 15.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost. ?紙|項目頁面|代碼(還沒有)
2. FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
Authors : Zehao Zhu, Zhiwen Fan, Yifan Jiang, Zhangyang Wang
Abstract
Novel view synthesis from limited observations remains an important and persistent task. However, high efficiency in existing NeRF-based few-shot view synthesis is often compromised to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time and photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse initialized SfM points with a thoughtfully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations, subsequently infilling local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator within the Gaussians optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from sparse points observed from limited input viewpoints, our FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender ?紙|項目頁面|程式碼
3. [CVPR '24] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
Authors : David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann
Abstract
We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude. Paper | Project Page | Code
4. [CVPR '24] Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Authors : Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi
Abstract
We introduce the Splatter Image, an ultra-fast approach for monocular 3D object reconstruction which operates at 38 FPS. Splatter Image is based on Gaussian Splatting, which has recently brought real-time rendering, fast training, and excellent scaling to multi-view reconstruction. For the first time, we apply Gaussian Splatting in a monocular reconstruction setting. Our approach is learning-based, and, at test time, reconstruction only requires the feed-forward evaluation of a neural network. The main innovation of Splatter Image is the surprisingly straightforward design: it uses a 2D image-to-image network to map the input image to one 3D Gaussian per pixel. The resulting Gaussians thus have the form of an image, the Splatter Image. We further extend the method to incorporate more than one image as input, which we do by adding cross-view attention. Owing to the speed of the renderer (588 FPS), we can use a single GPU for training while generating entire images at each iteration in order to optimize perceptual metrics like LPIPS. On standard benchmarks, we demonstrate not only fast reconstruction but also better results than recent and much more expensive baselines in terms of PSNR, LPIPS, and other metrics. Paper | Project Page | Code | Short Presentation
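For the Splatter Image entry just above, the core idea (a 2D image-to-image network that outputs one 3D Gaussian per pixel, so the Gaussian parameters themselves form an image) is simple enough to sketch directly. The PyTorch snippet below is only an illustrative stand-in with an assumed channel layout, toy convolutional head, and assumed activations; it is not the authors' architecture.

```python
# Illustrative PyTorch sketch of per-pixel 3D Gaussian prediction in the spirit
# of Splatter Image: an image-to-image network emits one Gaussian per pixel.
# The channel layout and activations below are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerPixelGaussianHead(nn.Module):
    # 3 (position) + 1 (opacity) + 3 (scale) + 4 (rotation quaternion) + 3 (RGB) = 14 channels
    N_PARAMS = 14

    def __init__(self, in_ch: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, self.N_PARAMS, 1),
        )

    def forward(self, image: torch.Tensor) -> dict:
        # image: (B, 3, H, W) -> B * H * W Gaussians, stored as an H x W "splatter image"
        feat = self.net(image)
        b, _, h, w = feat.shape
        xyz, opacity, scale, rot, rgb = torch.split(feat, [3, 1, 3, 4, 3], dim=1)

        def flat(t: torch.Tensor) -> torch.Tensor:
            # (B, C, H, W) -> (B, H*W, C): one parameter row per pixel / Gaussian
            return t.permute(0, 2, 3, 1).reshape(b, h * w, -1)

        return {
            "xyz": flat(xyz),                              # raw per-pixel positions / offsets
            "opacity": torch.sigmoid(flat(opacity)),       # keep opacities in (0, 1)
            "scale": torch.exp(flat(scale)),               # keep scales positive
            "rotation": F.normalize(flat(rot), dim=-1),    # unit quaternions
            "rgb": torch.sigmoid(flat(rgb)),
        }


# Usage: a single 128x128 image yields 16,384 Gaussians in one feed-forward pass.
gaussians = PerPixelGaussianHead()(torch.rand(1, 3, 128, 128))
print({k: v.shape for k, v in gaussians.items()})
```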
Navigation:
2024:
1. GaussNav: Gaussian Splatting for Visual Navigation
Authors : Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
Abstract
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors. Existing map-based navigation methods largely adopt the representation form of Bird's Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene. To address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for the IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects. Our GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. Paper | Project Page | Code
2. 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization
Authors : Peng Jiang, Gaurav Pandey, Srikanth Saripalli
Abstract
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset. Paper
3. Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF
Authors : Guangyi Liu, Wen Jiang, Boshu Lei, Vivek Pandey, Kostas Daniilidis, Nader Motee
Abstract
This work proposes a novel approach to bolster both the robot's risk assessment and safety measures while deepening its understanding of 3D scenes, which is achieved by leveraging Radiance Field (RF) models and 3D Gaussian Splatting. To further enhance these capabilities, we incorporate additional sampled views from the environment with the RF model. One of our key contributions is the introduction of Risk-aware Environment Masking (RaEM), which prioritizes crucial information by selecting the next-best-view that maximizes the expected information gain. This targeted approach aims to minimize uncertainties surrounding the robot's path and enhance the safety of its navigation. Our method offers a dual benefit: improved robot safety and increased efficiency in risk-aware 3D scene reconstruction and understanding. Extensive experiments in real-world scenarios demonstrate the effectiveness of our proposed approach, highlighting its potential to establish a robust and safety-focused framework for active robot exploration and 3D scene understanding. Paper
4. 3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration
Authors : Quentin Herau, Moussab Bennehar, Arthur Moreau, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux
Abstract
Reliable multimodal sensor fusion algorithms require accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high computational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new rendering approach to achieve faster multi-sensor calibration. We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset. Paper
5. HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Authors : Zhuopeng Li, Yilin Zhang, Chenming Wu, Jianke Zhu, Liangjun Zhang
Abstract
The rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, previous 3DGS-based methods have limitations in urban scenes due to reliance on initial Structure-from-Motion (SfM) points and difficulties in rendering distant, sky, and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependence on SfM point initialization, allowing for the rendering of urban scenes, and incorporates point densification to enhance rendering quality in problematic regions during training. Furthermore, we introduce a Gaussian direction encoding as an alternative to spherical harmonics in the rendering pipeline, enabling view-dependent color representation. To handle multi-camera systems, we introduce neural warping to enhance object consistency across different cameras. Experimental results on widely used autonomous driving datasets demonstrate that HO-Gaussian achieves photo-realistic rendering in real-time on multi-camera urban datasets. Paper
6. SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
Authors : Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun
Abstract
Novel View Synthesis (NVS) for street scenes plays a critical role in autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although thrilling progress has been made, current methods struggle to maintain rendering quality for street scenes at viewpoints that deviate significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as conditions, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. Then we apply the Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models, and demonstrate its advance in rendering images from broader views. Paper
7. BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting
Authors : Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang
Abstract
Image-goal navigation enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely heavily on data- and compute-intensive learning-based approaches or lack efficiency in complex environments due to insufficient exploration strategies. To address these limitations, we propose