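Awesome 3D Gaussian Splatting Resources
A curated list of papers and open-source resources focused on 3D Gaussian Splatting, intended to keep pace with the surge of research expected in the coming months. If you have any additions or suggestions, feel free to contribute. Other resources such as blog posts and videos are also welcome.
Table of contents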
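- 3D Object Detection
- Autonomous Driving
- Avatars
- Classic Work
- Compression
- Diffusion
- Dynamics and Deformation
- Editing
- Language Embedding
- Mesh Extraction and Physics
- Misc
- Regularization and Optimization
- Rendering
- Reviews
- SLAM
- Sparse
- Navigation and Autonomous Driving
- Poses
- Large-Scale
- Open Source Implementations
- Reference
- Unofficial Implementations
- 2D Gaussian Splatting
- Game Engines
- Viewers
- Utilities
- Tutorials
- Frameworks
- Other
Update Log: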
Oct 24, 2024
Oct 16, 2024
Sep 7, 2024
May 10, 2024
- Added 18 papers: Z-Splat, Dual-Camera, StylizedGS, Hash3D, Revisiting Densification, Gaussian Pancakes, 3D-aware Deformable Gaussians, SpikeNVS, Zero-shot PC Completion, SplatPose, DreamScene360, RealmDreamer, Gaussian-ILC, Reinforcement Learning with GGS, GoMAvatar, OccGaussian, LoopGaussian, and a review
Apr 11, 2024
Apr 9, 2024
Apr 8, 2024
- Added 3 papers: Robust Gaussian Splatting, SC4D, and MM-Gaussian
Apr 5, 2024
- Added 5 papers: Surface Reconstruction, TCLC-GS, GaSpCT, OmniGS, and Per-Gaussian Embedding
- Fixes
Apr 2, 2024
- Added 11 papers: HO, SGD, HGS, Snap-it, InstantSplat, 3DGSR, MM3DGS, HAHA, CityGaussian, Mirror-3DGS, and Feature Splatting
Mar 30, 2024
- Added 8 papers: Modeling Uncertainty, GRM, Gamba, CoherentGS, TOGS, SA-GS, and GaussianCube
Mar 27, 2024
- Added other implementation: 360-gaussian-splatting
- Added CVPR '24 tags
- Added 5 papers: Comp4D, DreamPolisher, DN-Splatter, 2D GS, and Octree-GS
Mar 26, 2024
- Added 13 papers: latentSplat, GS on the Move, RadSplat, Mini-Splatting, SyncTweedies, HAC, STAG4D, EndoGSLAM, Pixel-GS, Semantic Gaussians, Gaussian in the Wild, CG-SLAM, and GSDF
Mar 24, 2024:
Mar 20, 2024:
- Added 4 papers: GVGEN, HUGS, RGBD GS-ICP SLAM, and High-Fidelity SLAM
Mar 19, 2024:
- 添加了点面
- Added the 3DGS tutorial by the original authors
- Added GauStudio
- Added 23 papers: Touch-GS, GGRt, FDGaussian, SWAG, Den-SOFT, Gaussian-Flow, View-Consistent 3D Editing, BAGS, GeoGaussian, GS-Pose, Analytic-Splatting, Seamless 3D Maps, Texture-GS, Recent Advances 3DGS, Compact 3DGS for Dense Visual SLAM, BrightDreamer, 3DGS-Reloc, Beyond Uncertainty, Motion-aware 3DGS, Fed3DGS, GaussNav, 3DGS-Calib, and NEDS-SLAM
Mar 17, 2024:
- Updated the repository name and link for 3DGS.cpp (originally VulkanSplatting)
Mar 16, 2024:
- Splat TV
- Added 6 papers: GaussianGrasper, New Split Algorithm, Controllable Text-to-3D Generation, Spring-Mass 3DGS, Hyper-3DGS, and DreamScene
Mar 14, 2024:
- Added 6 papers: SemGauss, StyleGaussian, Gaussian Splatting in Style, GaussCtrl, GaussianImage, and RAIN-GS
Mar 8, 2024:
- Tutorial: how to capture images for 3DGS
- Added 6 papers: SplattingAvatar, DNGaussian, Radiative Gaussians, BAGS, GSEdit, and ManiGaussian
Mar 8, 2024:
Mar 6, 2024:
Mar 5, 2024:
- Added 1 paper: 3DGStream
- Code release
- Added new viewer
Mar 2, 2024:
- Added 1 paper: 3D Gaussian Model for Animation and Texturing
- New section: courses that also teach 3DGS.
Feb 28, 2024:
Feb 27, 2024:
- Added 2 papers: Spec-Gaussian and GEA
- SC-GS code release
Feb 24, 2024:
- Added 2 papers: Identifying Unnecessary Gaussians and GaussianPro
Feb 23, 2024:
- Corrected the authors and updated the abstract of EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
Feb 21, 2024:
Feb 20, 2024:
- GaussianObject code release
- Added 1 paper: GaussianHair
Feb 19, 2024:
Feb 16, 2024:
- Added 2 papers: IM-3D and GES
- GaMeS code release
Feb 14, 2024:
- Added viewer: VulkanSplatting - a cross-platform, high-performance 3DGS renderer in C++ and Vulkan Compute
Feb 13, 2024:
- Code release: (Jan 16, 2024) Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
- Added 3 papers: 3DGala, ImplicitDeepFake, and 3D Gaussians as a New Vision Era.
Feb 9, 2024:
Feb 8, 2024:
- Added 3 papers: Rig3DGS, Mesh-based GS, and LGM
Feb 6, 2024:
- Added 2 papers: SGS-SLAM and 4D Gaussian Splatting
Feb 5, 2024:
- Moved SWAGS to the Dynamics and Deformation section
- Added 2 papers: GaussianObject and GaMeS
- Renamed GS++ to Optimal Projection
Feb 2, 2024:
- Added 6 papers: VR-GS, Segment Anything, Gaussian Splashing, GS++, 360-GS, and StopThePop
- TRIPS code release
Jan 30, 2024:
- Code change: GaussianAvatars code made private
Jan 29, 2024:
- Added 2 papers: LIV-GaussMap and TIP-Editor
Jan 26, 2024:
- Removed withdrawn paper: Animatable 3D Gaussians for High-fidelity Synthesis of Human Motions
- Added 3 papers: EndoGaussians, PSAvatar, and GauU-Scene
Jan 25, 2024:
- Added viewer: Splatapult - a 3D Gaussian splatting renderer in C++ and OpenGL that works with OpenXR for tethered VR
Jan 24, 2024:
- Added utility: GSOPs (Gaussian Splat Operators) for SideFX Houdini
- Code release: GaussianAvatars
Jan 23, 2024:
- Added 3 papers: Amortized Gen3D, Deformable Endoscopic Tissues, and Fast Dynamic 3D Object Generation
- Code release: Animatable Avatars, Compact 3D Gaussian, GaussianAvatar
Jan 13, 2024:
- Added 4 papers: CoSSegGaussians, TRIPS, Gaussian Shadow Casting for Neural Characters, and DISTWAR
Jan 9, 2024:
- Added 1 paper: A Survey on 3D Gaussian Splatting (first survey)
Jan 8, 2024:
- Added 4 papers: SWAGS (a 2023 paper I had forgotten to add), the first review paper, Compressed 3DGS, and an application paper on characterizing satellite geometry.
Jan 7, 2024:
- 1 open-source implementation: taichi-splatting - originally derived from Taichi 3D Gaussian Splatting, with significant reorganization and changes.
Jan 5, 2024:
- Added 3 papers: FMGS, PEGASUS, and Repaint123.
Jan 2, 2024:
Jan 2, 2024:
- Updated the Deblurring Gaussians paper link.
- SAGA code release.
- Added 2 papers from 2023: Text2Immersion and 2D-Guided 3DG Segmentation.
- Added the mathematical supplement from the gsplat library.
- Added years to the categories.
- GSM code release.
Dec 29, 2023:
- Added 1 paper (apparently missed earlier): Gaussian-Head-Avatar.
- Added a blog post on avatars.
Dec 29, 2023:
- Added 3 papers: DreamGaussian4D, 4DGen, and Spacetime Gaussian.
Dec 27, 2023:
- Added 3 papers: LangSplat, Deformable 3DGS, and Human101.
- Added blog post: a comprehensive review of 3DGS.
Dec 25, 2023:
- Code released for An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes.
- GPS-Gaussian code release.
Dec 24, 2023:
- Added 2 papers: Self-Organizing Gaussian Grids and Gaussian Splitting.
- Added a repository for augmenting Gaussian rendering to model more complex scenes.
Dec 21, 2023:
- Added 3 papers: Splatter Image, pixelSplat, and Align Your Gaussians.
- Gaussian Grouping code release.
Dec 19, 2023:
- Added 2 papers: GAvatar and GauFRe.
Dec 18, 2023:
- Added utility: SpectacularAI - conversion scripts for different 3DGS conventions.
- SuGaR code release.
Dec 16, 2023:
- Added WebGL viewer 3: Gauzilla.
Dec 15, 2023:
- Added 4 papers: DrivingGaussian, iComMa, Triplane, and 3DGS-Avatar.
- Relightable Gaussians code release.
Dec 13, 2023:
- Added 5 papers: Gaussian-SLAM, CoGS, ASH, CF-GS, and Photo-SLAM.
Dec 11, 2023:
- Added 2 papers: Gaussian Splatting SLAM and Denoising Score for 3D Generation.
- ScaffoldGS code released.
Dec 8, 2023:
- Added 2 papers: EAGLES and MonoGaussianAvatar.
Dec 7, 2023:
- LucidDreamer code released.
- Added 9 papers: GauHuman, HeadGaS, HiFi4G, Gaussian-Flow, Feature-3DGS, Gaussian-Avatar, FlashAvatar, Relightable, and Deblurring Gaussians.
Dec 5, 2023:
- Added 9 papers: NeuSG, GaussianHead, GaussianAvatars, GPS-Gaussian, Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction, SplaTAM, MANUS, Segment Any, and Language Embedded 3D Gaussians.
Dec 4, 2023:
- Added 8 papers: Gaussian Grouping, MD Splatting, DynMF, Scaffold-GS, SparseGS, FSGS, Control4D, and SC-GS.
Dec 1, 2023:
- Added 4 papers: Compact3D, GaussianShader, Periodic Vibration Gaussian, and Gaussian Shell Maps for Efficient 3D Human Generation.
- Created a table of contents for each category and added line breaks.
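Nov 30, 2023:
- Added Unreal game engine implementation.
- Added 5 papers: LightGaussian, FisherRF, HUGS, HumanGaussian, CG3D, and Multi Scale 3DGS.
Nov 29, 2023:
- Added 2 papers: Point and Move and IR-GS.
Nov 28, 2023:
- Added 5 papers: GaussianEditor, Relightable Gaussians, GART, Mip-Splatting, and HumanGaussian.
Nov 27, 2023:
- Added 2 papers: Gaussian Editing and Compact 3D Gaussians.
Nov 25, 2023:
Nov 22, 2023:
- Added 3 new GS papers: Animatable, Depth-Regularized, and Monocular/Multi-view 3DGS.
- Added some classic papers.
- Added another GS paper, also called LucidDreamer.
Nov 21, 2023:
- Added 3 new GS papers: GaussianDiffusion, LucidDreamer, and PhysGaussian.
- Added 2 new GS papers: SuGaR and PhysGaussian.
Nov 21, 2023:
Nov 17, 2023:
- Added the PlayCanvas implementation to the Game Engines section.
Nov 16, 2023:
- Deformable 3D Gaussians code release.
- Added the Drivable 3D Gaussian Avatars paper.
Nov 8, 2023:
- Some notes on 3DGS implementations and the universal format discussion.
Nov 4, 2023:
- Added 2D Gaussian Splatting.
- Added a very detailed (technical) blog post explaining 3D Gaussian Splatting.
Oct 28, 2023:
- Added the Utilities section.
- Added the 3DGS Converter, for editing 3DGS .ply files in Cloud Compare, to Utilities.
- Added Kapture (for Bundler to COLMAP model conversion) and the Kapture image cropper script, with conversion instructions, to Utilities.
Oct 23, 2023:
- Added Python WebGL viewer 2.
- Added an introductory video blog on Gaussian Splatting (and the Unity viewer).
Oct 21, 2023:
- Added Python OpenGL viewer.
- Added TypeScript WebGPU viewer.
Oct 20, 2023:
- Made the abstracts readable (removed hyphenation).
- Added Windows tutorial.
- Other minor text fixes.
- Added Jupyter notebook viewer.
Oct 19, 2023:
- Added the GitHub page link for Real-time Photorealistic Dynamic Scene Representation.
- Rearranged the headings.
- Added other unofficial implementations.
- Moved Nerfstudio gsplat and fast: C++/CUDA to Unofficial Implementations.
- Added Nerfstudio, Blender, WebRTC, iOS, and Metal viewers.
Oct 17, 2023:
- GaussianDreamer code release.
- Added Real-time Photorealistic Dynamic Scene Representation.
Oct 16, 2023:
- Added the Deformable 3D Gaussians paper.
- Dynamic 3D Gaussians code release.
Oct 15, 2023: Initial list with the first 6 papers.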
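Seminal paper introducing 3D Gaussian Splatting:
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Authors: Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis
Abstract
Radiance field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off quality for speed. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space. Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene. Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets. Paper (Low Resolution) | Paper (High Resolution) | Project Page | Code | Short Presentation | Explanation Video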
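The "anisotropic covariance" mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly used parameterization in which each Gaussian stores a per-axis scale and a rotation quaternion, and the covariance is assembled as Σ = R S Sᵀ Rᵀ so that it stays positive semi-definite during optimization. The function names `quat_to_rotmat` and `covariance` are illustrative choices, not taken from any particular codebase.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (normalizes first)."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Assemble Sigma = R S S^T R^T from per-axis scales and a rotation quaternion."""
    R = quat_to_rotmat(quat)
    # M = R @ diag(scale): column j of R is multiplied by scale[j]
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M @ M^T, symmetric by construction
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Axis-aligned Gaussian: variances are the squared scales.
sigma_axis = covariance((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0))

# Same Gaussian rotated 90 degrees about z: the x and y variances swap.
half = math.pi / 4
sigma_rot = covariance((1.0, 2.0, 3.0), (math.cos(half), 0.0, 0.0, math.sin(half)))
```

Factoring the covariance this way means the optimizer can update scales and rotations freely without ever producing an invalid covariance matrix.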
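3D Object Detection
2024:
1. 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
Authors: Yang Cao, Yuanliang Jv, Dan Xu
Abstract
Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3D object detection through view-synthesis representation. However, NeRF faces inherent limitations: (i) it has limited representational capacity for 3DOD due to its implicit nature, and (ii) it suffers from slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations with faster rendering. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS primarily relies on 2D pixel-level supervision, resulting in an unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD; (ii) Excessive background blobs: 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, negatively affecting detection. To tackle challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution by incorporating 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background (see Fig. 1). To address challenge (ii), we propose a Box-Focused Sampling strategy using 2D boxes to generate an object probability distribution in 3D space, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from the proposed Boundary Guidance and Box-Focused Sampling, our final method, 3DGS-DET, achieves significant improvements over our basic pipeline version (+5.6 on mAP@0.25, +3.7 on mAP@0.5) without introducing any additional learnable parameters. Furthermore, 3DGS-DET significantly outperforms the state-of-the-art NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and an impressive +31.5 on mAP@0.25 for the ARKITScenes dataset. Code and models are publicly available at: https://github.com/yangcaoai/3DGS-DET. Paper | Code (not yet)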
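Autonomous Driving:
2024:
1. Street Gaussians for Modeling Dynamic Urban Scenes
Authors: Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng
Abstract
This paper aims to tackle the problem of modeling dynamic urban street scenes from monocular videos. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, their significant limitations are slow training and rendering speed, coupled with the critical need for high precision in tracked vehicle poses. We introduce Street Gaussians, a new explicit scene representation that tackles all these limitations. Specifically, the dynamic urban street is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground object vehicles, each object point cloud is optimized with optimizable tracked poses, along with a dynamic spherical harmonics model for the dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn allows for scene editing operations and rendering at 133 FPS (1066×1600 resolution) within half an hour of training. The proposed method is evaluated on multiple challenging benchmarks, including the KITTI and Waymo Open datasets. Experiments show that the proposed method consistently outperforms state-of-the-art methods across all datasets. Furthermore, the proposed representation delivers performance on par with that achieved using precise ground-truth poses, despite relying only on poses from an off-the-shelf tracker. Paper | Project Page | Code (not yet)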
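The composition of object vehicles and background described above can be sketched as a rigid transform from each object's local frame into the world frame. This is a minimal illustration under the assumption that a tracked pose is given as a rotation matrix `R` and a translation `t`; the helper names are hypothetical and the sketch is not the authors' implementation.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def object_to_world(mean_obj, R, t):
    """Move a Gaussian mean from object space to world space: x_w = R x_o + t."""
    rotated = mat_vec(R, mean_obj)
    return [rotated[i] + t[i] for i in range(3)]

def rotate_covariance(cov_obj, R):
    """Rotate a Gaussian covariance into world space: Sigma_w = R Sigma_o R^T."""
    RC = [[sum(R[i][k] * cov_obj[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[sum(RC[i][k] * R[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Hypothetical tracked pose: 90 degree yaw plus a 10 m offset along x.
R_yaw90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t_offset = [10, 0, 0]

mean_world = object_to_world([1, 0, 0], R_yaw90, t_offset)
cov_world = rotate_covariance([[4, 0, 0], [0, 1, 0], [0, 0, 9]], R_yaw90)
```

Because every Gaussian is explicit, editing a scene (moving or removing a vehicle) reduces to changing `R` and `t` for that object's Gaussians or dropping them before rendering.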
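2. TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes
Authors: Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
Abstract
Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes the capabilities of LiDAR data but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel-view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data to enrich the properties of 3D Gaussians for splatting. The properties of the 3D Gaussians are not only initialized in alignment with the 3D mesh, which provides more complete 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and the nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Using a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS at a resolution of 1920x1280 (Waymo) and 120 FPS at a resolution of 1600x900 (nuScenes) in urban scenarios. Paper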
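3. OmniRe: Omni Urban Scene Reconstruction
Authors: Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang
Abstract
We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipeline for dynamic urban scene reconstruction. To that end, we propose a comprehensive 3DGS framework for driving scenes, named OmniRe, that allows for accurate, full-length reconstruction of diverse dynamic objects in a driving log. OmniRe builds dynamic neural scene graphs based on Gaussian representations and constructs multiple local canonical spaces that model various dynamic actors, including vehicles, pedestrians, and cyclists, among many others. This capability is unmatched by existing methods. OmniRe allows us to holistically reconstruct the different objects present in the scene, subsequently enabling the simulation of reconstructed scenarios with all actors participating in real time (~60 Hz). Extensive evaluations on the Waymo dataset show that our approach outperforms prior state-of-the-art methods quantitatively and qualitatively by a large margin. We believe our work fills a critical gap in driving reconstruction. Paper | Project Page | Code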
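2023:
1. [CVPR '24] DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Authors: Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Abstract
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing each object and restoring their accurate positions and occlusion relationships within the scene. We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater detail and maintain panoramic consistency. DrivingGaussian outperforms existing methods in driving scene reconstruction and enables photorealistic surround-view synthesis with high fidelity and multi-camera consistency. Paper | Project Page | Code (not yet)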
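2. [CVPR '24] HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Authors: Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao
Abstract
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel-view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where the poses of moving objects are regularized via physical constraints. Our approach can render new viewpoints in real time, yielding 2D and 3D semantic information with high accuracy, and reconstruct dynamic scenes, even in scenarios where 3D bounding box detections are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach. Paper | Project Page | Code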
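Avatars:
2024:
1. GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting
Authors: Mengtian Li, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang
Abstract
In this work, we propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting. Compared with costly neural radiance-based models, 3D Gaussian Splatting has recently demonstrated great performance in terms of training time and rendering quality. However, applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details. To address these challenges, our method uses explicit pose-guided deformation to associate dynamic Gaussians across the canonical space and the observation space; introducing a physically based prior with regularized transformations helps mitigate ambiguity between the two spaces. During training, we further propose a pose refinement strategy that updates the pose regression to compensate for inaccurate initial estimation, and a split-with-scale mechanism that increases the density of the regressed point cloud. Experiments validate that our method achieves state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies, along with explicit geometry reconstruction. Paper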
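2. PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Creation with 3D Gaussian Splatting
Authors: Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, Kanglin Liu
Abstract
Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult, and existing methods have to trade off between speed and quality. 3DMM-based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussians have been shown to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussians to head avatar creation remains a major challenge, since it is difficult for 3D Gaussians to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that uses discrete geometric primitives to create a parametric morphable shape model and employs 3D Gaussians for fine detail representation and high-fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM), which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes, which enables the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to use 3D Gaussians for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects, and that the avatars can be animated in real time (≥ 25 fps at a resolution of 512 × 512). Paper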
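3. Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos
Authors: Alfredo Rivero, ShahRukh Athar, Zhixin Shu, Dimitris Samaras
Abstract
Creating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. Recent developments in 3D Gaussian Splatting (3DGS) have shown improvements in rendering quality and training efficiency. However, it remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them into 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments. Paper | Project Page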
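4. HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Authors: Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang
Abstract
Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising results obtained through 2D diffusion priors in recent works, current methods struggle to achieve high-quality and animated avatars effectively. In this paper, we present HeadStudio, a novel framework that uses 3D Gaussian Splatting to generate realistic and animated avatars from text prompts. Our method drives 3D Gaussians semantically to create a flexible and achievable appearance through the intermediate FLAME representation. Specifically, we incorporate FLAME into both the 3D representation and score distillation: 1) FLAME-based 3D Gaussian Splatting, which drives 3D Gaussian points by rigging each point to a FLAME mesh; 2) FLAME-based score distillation sampling, which uses a FLAME-based fine-grained control signal to guide score distillation from the text prompt. Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts that exhibit visually appealing appearances. The avatars are capable of rendering high-quality real-time (≥ 40 fps) novel views at a resolution of 1024. They can be smoothly controlled by real-world speech and video. We hope that HeadStudio can advance digital avatar creation and that the present method can be widely applied across various domains. Paper | Project Page | Code (not yet)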
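5. ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting
Authors: Georgii Stanishevskii, Jakub Steczkiewicz, Tomasz Szczepanik, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek
Abstract
Numerous emerging deep-learning techniques have had a substantial impact on computer graphics. Among the most promising breakthroughs are the recent rise of Neural Radiance Fields (NeRFs) and Gaussian Splatting (GS). NeRFs encode an object's shape and color in neural network weights using a handful of images with known camera positions to generate novel views. In contrast, GS provides accelerated training and inference without a decrease in rendering quality by encoding the object's characteristics in a collection of Gaussian distributions. These two techniques have found many use cases in spatial computing and other domains. On the other hand, the emergence of deepfake methods has sparked considerable controversy. Such techniques can take the form of AI-generated videos that closely mimic authentic footage. Using generative models, they can modify facial features, enabling the creation of altered identities or facial expressions that exhibit a remarkably realistic appearance of a real person. Despite these controversies, deepfakes can offer a next-generation solution for avatar creation and gaming when of desirable quality. To that end, we show how to combine all these emerging technologies to obtain a more plausible outcome. Our ImplicitDeepfake uses the classical deepfake algorithm to modify all training images separately and then trains NeRF and GS on the modified faces. Such relatively simple strategies can produce plausible 3D deepfake-based avatars. Paper | Code (not yet)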
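6. GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians
Authors: Haimin Luo, Min Ouyang, Zijun Zhao, Suyi Jiang, Longwen Zhang, Qixuan Zhang, Wei Yang, Lan Xu, Jingyi Yu
Abstract
Hairstyle reflects culture and ethnicity at first glance. In the digital era, various realistic human hairstyles are also critical to high-fidelity digital human assets for beauty and inclusivity. Yet realistic hair modeling and real-time rendering for animation is a formidable challenge due to the sheer number of strands, their complicated geometric structure, and their sophisticated interaction with light. This paper presents GaussianHair, a novel explicit hair representation. It enables comprehensive modeling of hair geometry and appearance from images, fostering innovative illumination effects and dynamic animation capabilities. At the heart of GaussianHair is the novel concept of representing each hair strand as a sequence of connected cylindrical 3D Gaussian primitives. This approach not only retains the hair's geometric structure and appearance but also allows for efficient rasterization onto a 2D image plane, facilitating differentiable volumetric rendering. We further enhance this model with the "GaussianHair Scattering Model", adept at recreating the slender structure of hair strands and accurately capturing their local diffuse color under uniform lighting. Through extensive experiments, we substantiate that GaussianHair achieves breakthroughs in both geometric and appearance fidelity, transcending the limitations encountered in state-of-the-art methods for hair reconstruction. Beyond representation, GaussianHair extends to support editing, relighting, and dynamic rendering of hair, offering seamless integration with conventional CG pipeline workflows. Complementing these advancements, we have compiled an extensive dataset of real human hair, each with meticulously detailed strand geometry, to propel further research in this field. Paper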
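7. GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos
Authors: Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang
Abstract
In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions and aligning 3D Gaussians with human skin surfaces accurately. The key contributions of this paper are twofold. First, we introduce a pose refinement technique that improves hand and foot pose accuracy by aligning normal maps and silhouettes. Precise pose is crucial for correct shape and appearance reconstruction. Second, we address the problems of unbalanced aggregation and initialization bias that previously diminished the quality of 3D Gaussian avatars, through a novel surface-guided re-initialization method that ensures accurate alignment of 3D Gaussian points with avatar surfaces. Experimental results demonstrate that our proposed method achieves high-fidelity and vivid 3D Gaussian avatar reconstruction. Extensive experimental analyses validate the performance qualitatively and quantitatively, demonstrating that it achieves state-of-the-art performance in photo-realistic novel-view synthesis while offering fine-grained control over the human body and hand pose. Paper | Project Page | Code (not yet)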
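8. [CVPR '24] SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Authors: Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang
Abstract
We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders at 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans in which the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by the mesh, which makes it compatible with various animation techniques, such as mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets. Paper | Project Page | Code | Short Presentation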
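The mesh embedding described above (barycentric coordinates plus a displacement on a triangle) can be sketched as follows. This is a simplified illustration, assuming the displacement is applied along the barycentrically interpolated vertex normal, which is how a Phong surface is usually evaluated; `embed_on_triangle` and its parameters are hypothetical names, not the paper's API.

```python
import math

def embed_on_triangle(verts, normals, bary, displacement):
    """Place a point on a triangle from barycentric coordinates (u, v, w),
    then offset it along the interpolated (Phong-style) normal."""
    u, v, w = bary
    # Barycentric interpolation of the three vertex positions
    point = [u * verts[0][k] + v * verts[1][k] + w * verts[2][k] for k in range(3)]
    # Barycentric interpolation of the three vertex normals, renormalized
    n = [u * normals[0][k] + v * normals[1][k] + w * normals[2][k] for k in range(3)]
    length = math.sqrt(sum(c * c for c in n))
    n = [c / length for c in n]
    return [point[k] + displacement * n[k] for k in range(3)]

# A triangle in the z = 0 plane with all vertex normals pointing up.
tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
nrm = [(0.0, 0.0, 1.0)] * 3

# Gaussian center at the centroid, lifted 0.5 units off the surface.
center = embed_on_triangle(tri, nrm, (1 / 3, 1 / 3, 1 / 3), 0.5)
```

When the mesh is animated, re-evaluating the same `(bary, displacement)` pair on the deformed triangle moves the Gaussian with the surface, which is why no separate skinning network is needed in this kind of representation.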
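9. SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface
Authors: Jiahao Luo, Jing Liu, James Davis
Abstract
We present SplatFace, a novel Gaussian Splatting framework designed for 3D human face reconstruction without reliance on accurate pre-determined geometry. Our method is designed to simultaneously deliver both high-quality novel-view rendering and accurate 3D mesh reconstructions. We incorporate a generic 3D Morphable Model (3DMM) to provide a surface geometric structure, making it possible to reconstruct faces from a limited set of input images. We introduce a joint optimization strategy that refines both the Gaussians and the morphable surface through a synergistic non-rigid alignment process. A novel distance metric, splat-to-surface, is proposed to improve alignment by considering both the Gaussian position and covariance. The surface information is also used to incorporate a world-space densification process, resulting in superior reconstruction quality. Our experimental analysis demonstrates that the proposed method is competitive with both other Gaussian Splatting techniques in novel-view synthesis and other 3D reconstruction methods in producing 3D face meshes with high geometric precision. Paper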
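10. HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior
Authors: David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue
Abstract
We present HAHA, a novel approach for generating animatable human avatars from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian Splatting and a textured mesh for efficient and high-fidelity rendering. We demonstrate its efficiency in animating and rendering full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian Splatting only in areas of the SMPL-X mesh where it is necessary, such as hair and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar and in reduced rendering artifacts. It also allows us to handle the animation of small body parts, such as fingers, that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method achieves reconstruction quality on par with the state of the art on SnapshotPeople while using less than a third of the Gaussians. HAHA outperforms the previous state of the art on novel poses from X-Humans both quantitatively and qualitatively. Paper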
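11. [CVPRW '24] Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks
Authors: Florian Barthel, Arian Beckmann, Wieland Morgenstern, Anna Hilsmann, Peter Eisert
Abstract
NeRF-based 3D-aware Generative Adversarial Networks (GANs) such as EG3D or GIRAFFE have shown very high rendering quality under large representational variety. However, rendering with Neural Radiance Fields poses several challenges for most 3D applications. First, the significant computational demands of NeRF rendering preclude its use on low-power devices, such as mobiles and VR/AR headsets. Second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes, such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware GANs with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we can integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. Additionally, our approach allows for high-resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes. Paper | Project Page | Code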
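12. GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Authors: Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang
Abstract
We introduce GoMAvatar, a novel approach for real-time, memory-efficient, high-quality animatable human modeling. GoMAvatar takes as input a single monocular video to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while seamlessly integrating with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh representation, a hybrid 3D model combining the rendering quality and speed of Gaussian Splatting with the geometry modeling and compatibility of deformable meshes. We assess GoMAvatar on ZJU-MoCap data and various YouTube videos. GoMAvatar matches or surpasses current monocular human modeling algorithms in rendering quality and significantly outperforms them in computational efficiency (43 FPS) while being memory-efficient (3.63 MB per subject). Paper | Project Page | Code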
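13. OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering
Authors: Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu
Abstract
Rendering dynamic 3D humans from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the person is in an unobstructed scene, while in real-life scenarios various objects may occlude body parts. Previous methods used NeRF surface rendering to recover the occluded areas, but they require more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian, based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings at up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform an occlusion feature query at occluded regions, extracting the aggregated pixel-aligned feature to compensate for the missing information. We then use a Gaussian Feature MLP to further process the feature, along with occlusion-aware loss functions, to better perceive the occluded area. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared with the state-of-the-art method. We also improve training and inference speeds by 250x and 800x, respectively. Paper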
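14. [CVPR '24] Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
Authors: Inhee Lee, Byungjun Kim, Hanbyul Joo
Abstract
In this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation, enabling us to conveniently and efficiently compose and render them together. In particular, we address scenarios with severely limited and sparse observations in 3D human reconstruction, a common challenge encountered in the real world. To tackle this challenge, we introduce a novel approach to optimize the 3D-GS representation in a canonical space by fusing the sparse cues in the common space, where we leverage a pre-trained 2D diffusion model to synthesize unseen views while keeping consistency with the observed 2D appearances. We demonstrate that our method can reconstruct high-quality animatable 3D humans in various challenging examples, in the presence of occlusion, image crops, few-shot, and extremely sparse observations. After reconstruction, our method is capable of not only rendering the scene from any novel view at arbitrary time instances, but also editing the 3D scene by removing individual humans or applying different motions to each human. Through various experiments, we demonstrate the quality and efficiency of our method over alternative existing approaches. Paper | Project Page | Code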
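15. [NeurIPS '24] Generalizable and Animatable Gaussian Head Avatar
Authors: Xuangeng Chu, Tatsuya Harada
Abstract
In this paper, we propose the Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction. Existing methods rely on neural radiance fields, leading to heavy rendering consumption and low reenactment speeds. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without specific optimization and perform reenactment rendering at real-time speeds. Experiments show that our method exhibits superior performance compared with previous methods in terms of reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance applications of digital avatars. Paper | Project Page | Code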
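16. [SIGGRAPH Asia '24] DualGS: Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos
Authors: Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu
Abstract
Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between the digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impede broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, requiring only approximately 350 KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips. Paper | Project Page | Short Presentation | Dataset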
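To put the reported storage numbers in perspective, the short calculation below combines the two figures quoted in the abstract (a compression ratio of up to 120x and roughly 350 KB per frame). It is only a back-of-the-envelope sketch: it assumes both figures apply to the same frame, and the 30 fps playback rate is a hypothetical value chosen for illustration, not one stated in the abstract.

```python
def uncompressed_size_kb(compressed_kb, ratio):
    """Size before compression implied by a compressed size and a compression ratio."""
    return compressed_kb * ratio

def stream_rate_mb_per_s(frame_kb, fps):
    """Sustained data rate needed to play back frames of a given size (1 MB = 1000 KB)."""
    return frame_kb * fps / 1000.0

# Figures quoted in the abstract.
frame_kb = 350
ratio = 120

raw_kb = uncompressed_size_kb(frame_kb, ratio)      # implied raw size per frame, in KB
rate = stream_rate_mb_per_s(frame_kb, fps=30)       # fps=30 is an assumed playback rate
```

Under these assumptions a frame would occupy about 42 MB before compression, and streaming the compressed sequence at 30 fps would need roughly 10.5 MB/s.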
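17. [SIGGRAPH Asia '24] V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians
Authors: Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, Lan Xu
Abstract
Experiencing high-fidelity volumetric video as seamlessly as 2D video is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy to reduce storage requirements with rapid training speed. The first stage employs hash encoding and a shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine-tunes other Gaussian attributes using a residual entropy loss and a temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. Meanwhile, we designed a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3, which outperforms other methods by enabling high-quality rendering and streaming on common devices, which has not been seen before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Source code is available from the project page. Paper | Project Page | Short Presentation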
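2023:
1. Drivable 3D Gaussian Avatars
Authors: Wojciech Zielonka, Timur Bagautdinov, Shunsuke Saito, Michael Zollhöfer, Justus Thies, Javier Romero
Abstract
We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time frame rates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data. Paper | Project Page | Short Presentation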
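2. SplatArmor: Articulated Gaussian Splatting for Animatable Humans from Monocular RGB Videos
Authors: Rohit Jena, Ganesh Subramanian Iyer, Siddharth Choudhary, Brandon Smith, Pratik Chaudhari, James Gee
Abstract
We propose SplatArmor, a novel approach for recovering detailed and animatable human models by "armoring" a parameterized body model with 3D Gaussians. Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependent effects, we introduce an SE(3) field, which allows us to capture both the location and the anisotropy of the Gaussians. Furthermore, we propose the use of a neural color field to provide color regularization and 3D supervision for the precise positioning of these Gaussians. We show that Gaussian Splatting provides an interesting alternative to neural rendering based methods by leveraging a rasterization primitive without facing any of the non-differentiability and optimization challenges typically faced in such approaches. The rasterization paradigm allows us to leverage forward skinning, and does not suffer from the ambiguities associated with inverse skinning and warping. We show compelling results on the ZJU MoCap and People Snapshot datasets, which underline the effectiveness of our method for controllable human synthesis. Paper | Project Page | Code (not yet)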
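3. [CVPR '24] Animatable Gaussians: Learning Pose-dependent Gaussian Maps
Authors: Zhe Li, Zerong Zheng, Lizhen Wang, Yebin Liu
Abstract
Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian Splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front and back canonical Gaussian maps, where each pixel represents a 3D Gaussian. The learned template is adaptive to the worn garments, allowing it to model looser clothes such as dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization to novel poses. Overall, our method can create lifelike avatars with dynamic, realistic, and generalized appearances. Experiments show that our method outperforms other state-of-the-art approaches. Paper | Project Page | Code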
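4. [CVPR '24] GART: Gaussian Articulated Template Models
Authors: Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis
Abstract
We introduce the Gaussian Articulated Template Model (GART), an explicit, efficient, and expressive representation for capturing and rendering non-rigid articulated subjects from monocular videos. GART uses a mixture of moving 3D Gaussians to explicitly approximate the geometry and appearance of a deformable subject. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning, while further generalizing to more complex non-rigid deformations with novel latent bones. GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses faster than 150 fps. Paper | Project Page | Code | Short Presentation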
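5. [CVPR '24] Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Authors: Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero
Abstract
This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While the classical approaches to modeling and rendering virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real time, and their quality degrades when the character is animated with body poses different from the training observations. We propose an animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of Gaussian primitives in a canonical space, which is deformed with a coarse-to-fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (HuGS) model in an end-to-end fashion from multi-view observations, and evaluate it against state-of-the-art approaches for novel-pose synthesis of clothed bodies. Our method achieves a 1.5 dB PSNR improvement over the state of the art on the THuman4 dataset while being able to render in real time (80 fps at 512x512 resolution). Paper | Project Page | Short Presentation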
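To make the reported 1.5 dB PSNR gain concrete, the sketch below uses the standard PSNR definition, PSNR = 10 log10(MAX² / MSE), to convert a dB gain into the corresponding reduction in mean squared error. The function names and the example MSE value are illustrative and not taken from the paper.

```python
import math

def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE must shrink to gain `gain_db` dB of PSNR."""
    return 10.0 ** (-gain_db / 10.0)

baseline_mse = 0.01                        # illustrative value, gives 20 dB
ratio = mse_ratio_for_gain(1.5)            # about 0.708
improved = psnr(baseline_mse * ratio)      # baseline PSNR plus 1.5 dB
```

In other words, a 1.5 dB improvement corresponds to cutting the mean squared error by roughly 29%, regardless of the baseline.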
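6. [CVPR '24] HUGS: Human Gaussian Splats
Authors: Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
Abstract
Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS), which represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We use the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g., cloth, hair), we allow the 3D Gaussians to deviate from the human body model. Using 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of the human and novel-view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality at a rendering speed of 60 FPS while being ~100x faster to train than previous work. Paper | Project Page | Code (not yet)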
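7. [CVPR '24] Gaussian Shell Maps for Efficient 3D Human Generation
Authors: Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein
Abstract
Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering GAN training and requiring multi-view-inconsistent 2D upsamplers. Here, we introduce Gaussian Shell Maps (GSMs) as a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives using an articulable multi-shell-based scaffold. In this setting, a CNN generates a 3D texture stack with features that are mapped to the shells. The latter represent inflated and deflated versions of a template surface of a digital human in a canonical body pose. Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features. These Gaussians are efficiently rendered. The ability to articulate the shells is important during GAN training and, at inference time, to deform a body into arbitrary user-defined poses. Our efficient rendering scheme bypasses the need for view-inconsistent upsamplers and achieves high-quality multi-view-consistent renderings at a native resolution of 512×512 pixels. We demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion. Paper | Project Page | Code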
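8. GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation
Authors: Jie Wang, Jiu-Cheng Xie, Xianyan Li, Chi-Man Pun, Feng Xu, Hao Gao
Abstract
Constructing vivid 3D head avatars for given subjects and realizing a series of animations on them is valuable yet challenging. This paper presents GaussianHead, which models the actional human head with anisotropic 3D Gaussians. In our framework, a motion deformation field and a multi-resolution tri-plane are constructed to deal with the head's dynamic geometry and complex texture, respectively. Notably, we impose an exclusive derivation scheme on each Gaussian, which generates its multiple doppelgangers through a set of learnable parameters for position transformation. With this design, we can compactly and accurately encode the appearance information of Gaussians, even those fitting the head's particular components with sophisticated structures. In addition, an inherited derivation strategy for newly added Gaussians is adopted to facilitate training acceleration. Extensive experiments show that our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel-view synthesis tasks. Paper | Project Page | Code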
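9. [CVPR '24] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Authors: Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner
Abstract
We introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing for precise animation control via the underlying parametric model, e.g., through expression transfer from a driving sequence or by manually changing the morphable model parameters. We parameterize each splat by a local coordinate frame of a triangle and optimize for explicit displacement offsets to obtain a more accurate geometric representation. During avatar reconstruction, we jointly optimize for the morphable model parameters and Gaussian splat parameters in an end-to-end fashion. We demonstrate the animation capabilities of our photorealistic avatar in several challenging scenarios. For instance, we show reenactments from a driving video, where our method outperforms existing works by a significant margin. Paper | Project Page | Code | Short Presentation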
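10. [CVPR '24] GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Authors: Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu
Abstract
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in real time. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that require per-subject optimization, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian Splatting properties for instant novel-view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable, and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed. Paper | Project Page | Code | Short Presentation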
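11. GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Authors: Shoukang Hu, Ziwei Liu
Abstract
We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1-2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modeling frameworks that demand hours of training and seconds of rendering per frame. Specifically, GauHuman encodes Gaussian Splatting in the canonical space and transforms 3D Gaussians from the canonical space to the posed space with linear blend skinning (LBS), in which effective pose and LBS refinement modules are designed to learn fine details of 3D humans at negligible computational cost. Moreover, to enable fast optimization of GauHuman, we initialize and prune 3D Gaussians with a 3D human prior, while splitting/cloning via KL divergence guidance, along with a novel merge operation for further speedup. Extensive experiments on the ZJU_Mocap and MonoCap datasets demonstrate that GauHuman achieves state-of-the-art performance quantitatively and qualitatively with fast training and real-time rendering speed. Notably, without sacrificing rendering quality, GauHuman can quickly model a 3D human performer with ~13k 3D Gaussians. Paper | Project Page | Code | Short Presentation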
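The canonical-to-posed transformation by linear blend skinning (LBS) mentioned above can be sketched for a single Gaussian center. This is a minimal illustration of standard LBS, x' = Σ_k w_k (R_k x + t_k), and not the authors' code; the function name and the toy two-bone setup are hypothetical, and a full pipeline would also transform each Gaussian's covariance with the blended rotation.

```python
def lbs_transform(point, weights, rotations, translations):
    """Blend per-bone rigid transforms of a canonical-space point.

    weights:      one skinning weight per bone, summing to 1
    rotations:    one 3x3 rotation matrix per bone
    translations: one 3-vector per bone
    """
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, rotations, translations):
        for i in range(3):
            # w * (R @ point + t), accumulated over bones
            out[i] += w * (sum(R[i][j] * point[j] for j in range(3)) + t[i])
    return out

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two bones: one stays put, one shifts by 2 along x; the point is
# influenced equally by both, so it moves by 1 along x.
posed = lbs_transform(
    point=[1.0, 1.0, 1.0],
    weights=[0.5, 0.5],
    rotations=[identity, identity],
    translations=[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
)
```

Refinement modules of the kind the abstract describes would typically adjust the pose parameters and the weights fed into a function like this one, rather than replace the blending itself.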
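12. HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Authors: Helisa Dhamo, Yinyu Nie, Arthur Moreau, Jifei Song, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero
Abstract
3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, the first model to use 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper, we introduce a hybrid model that extends the explicit representation from 3DGS with a base of learnable latent features, which can be linearly blended with low-dimensional parameters from parametric head models to obtain expression-dependent final color and opacity values. We demonstrate that HeadGaS delivers state-of-the-art results at real-time inference frame rates, surpassing baselines by up to ~2 dB while accelerating rendering speed by over 10x. Paper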
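13. [CVPR '24] HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Authors: Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu
Abstract
We have recently seen tremendous progress in photo-real human modeling and rendering. Yet efficiently rendering realistic human performance and integrating it into the rasterization pipeline remains challenging. In this paper, we present HiFi4G, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking, achieving a compact and compression-friendly representation. We first propose a dual-graph mechanism to obtain motion priors, with a coarse deformation graph for effective initialization and a fine-grained Gaussian graph to enforce subsequent constraints. Then, we use a 4D Gaussian optimization scheme with adaptive spatial-temporal regularizers to effectively balance the non-rigid prior and Gaussian updating. We also present a companion compression scheme with residual compensation for immersive experiences on various platforms. It achieves a substantial compression rate of approximately 25 times, with less than 2 MB of storage per frame. Extensive experiments demonstrate the effectiveness of our approach, which significantly outperforms existing approaches in terms of optimization speed, rendering quality, and storage overhead. Paper | Project Page | Short Presentation | Dataset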
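14. [CVPR '24] GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Authors: Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, Liqiang Nie
Abstract
We present GaussianAvatar, an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. We start by introducing animatable 3D Gaussians to explicitly represent humans in various poses and clothing styles. Such an explicit and animatable representation can fuse 3D appearances more efficiently and consistently from 2D observations. Our representation is further augmented with dynamic properties to support pose-dependent appearance modeling, where a dynamic appearance network along with an optimizable feature tensor is designed to learn the motion-to-appearance mapping. Moreover, by leveraging the differentiable motion condition, our method enables a joint optimization of motions and appearances during avatar modeling, which helps to tackle the long-standing issue of inaccurate motion estimation in monocular settings. The efficacy of GaussianAvatar is validated on both the public dataset and our collected dataset, demonstrating its superior performance in terms of appearance quality and rendering efficiency. Paper | Project Page | Code | Short Presentation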
15. [CVPR '24] FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding
Authors: Jun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang
Abstract
We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that could reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn extra spatial offset to model non-surface regions and subtle facial details. While full use of geometric priors can capture high-frequency facial details and preserve exaggerated expressions, proper initialization can help reduce the number of Gaussians, thus enabling super-fast rendering speed. Extensive experimental results demonstrate that FlashAvatar outperforms existing works regarding visual quality and personalized details and is almost an order of magnitude faster in rendering speed. Paper | Project Page | Code
16. [CVPR '24] Relightable Gaussian Codec Avatars
Authors: Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam
Abstract
The fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with spatially all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit under both point light and continuous illumination. We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars. Paper | Project Page
17. MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar
Authors: Yufan Chen, Lizhen Wang, Qijing Li, Hongjiang Xiao, Shengping Zhang, Hongxun Yao, Yebin Liu
Abstract
The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent advancements in head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representation have been exploited for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the extensive quantity of points involved, and the last ones suffer from limitations in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that harnesses 3D Gaussian point representation coupled with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. These points exhibit movement with a Gaussian deformation field in alignment with the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity combined with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results among previous methods. Paper | Project Page | Code (not yet) | Short Presentation
18. [CVPR '24] ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Authors: Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann
Abstract
Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in computer vision and graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real-time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods. Paper | Project Page | Code (not yet) | Short Presentation
19. [CVPR '24] 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Authors : Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang
Abstract
We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively. Paper | Project Page | Code | Short Presentation
20. [CVPR '24] GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Authors : Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal
Abstract
Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution. Paper | Project Page | Short Presentation
21. Deformable 3D Gaussian Splatting for Animatable Human Avatars
Authors : HyunJun Jung, Nikolas Brasch, Jifei Song, Eduardo Perez-Pellitero, Yiren Zhou, Zhihao Li, Nassir Navab, Benjamin Busam
Abstract
Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: A first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatars learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually. Paper
22. Human101: Training 100+FPS Human Gaussians in 100s from 1 View
Authors : Mingwei Li, Jiachen Tao, Zongxin Yang, Yi Yang
Abstract
Reconstructing the human body from single-view videos plays a pivotal role in the virtual reality domain. One prevalent application scenario necessitates the rapid reconstruction of high-fidelity 3D digital humans while simultaneously ensuring real-time rendering and interaction. Existing methods often struggle to fulfill both requirements. In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS. Our method leverages the strengths of 3D Gaussian Splatting, which provides an explicit and efficient representation of 3D humans. Standing apart from prior NeRF-based pipelines, Human101 ingeniously applies a Human-centric Forward Gaussian Animation method to deform the parameters of 3D Gaussians, thereby enhancing rendering speed (i.e., rendering 1024-resolution images at an impressive 60+ FPS and rendering 512-resolution images at 100+ FPS). Experimental results indicate that our approach substantially eclipses current methods, clocking up to a 10 times surge in frames per second and delivering comparable or superior rendering quality. Paper | Project Page | Code (not yet)
23. [CVPR '24] Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Authors : Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, Yebin Liu
Abstract
Creating high-fidelity 3D head avatars has always been a research hotspot, but there remains a great challenge under lightweight sparse view setups. In this paper, we propose Gaussian Head Avatar represented by controllable 3D Gaussians for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions. Paper | Project Page | Code | Short Presentation
24. HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
Authors : Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu
Abstract
Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat that predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis. Project page: https://humansplat.github.io/. Paper | Project Page
Classic work:
1. A Generalization of Algebraic Surface Drawing
Authors : James F. Blinn
Comment: First paper rendering 3D Gaussians.
Abstract
The mathematical description of three-dimensional surfaces usually falls into one of two classifications: parametric and implicit. An implicit surface is defined to be all points which satisfy some equation F(x, y, z) = 0. This form is ideally suited for image space shaded picture drawing; the pixel coordinates are substituted for x and y, and the equation is solved for z. Algorithms for drawing such objects have been developed primarily for first- and second-order polynomial functions, a subcategory known as algebraic surfaces. This paper presents a new algorithm applicable to other functional forms, in particular to the summation of several Gaussian density distributions. The algorithm was created to model electron density maps of molecular structures, but it can be used for other artistically interesting shapes. Paper
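The idea in this abstract (substitute the pixel coordinates for x and y, then solve F(x, y, z) = 0 for z) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the blob parameters, the threshold, and the names `field` and `solve_z` are hypothetical, and the root search uses plain ray marching plus bisection as a stand-in rather than the solver described in the paper.

```python
import math

# Each blob: (center_x, center_y, center_z, strength b, falloff a).
# Illustrative values only, not taken from the paper.
BLOBS = [(0.0, 0.0, 0.0, 1.0, 1.0), (1.2, 0.0, 0.0, 1.0, 1.0)]
THRESHOLD = 0.5  # hypothetical iso-level T


def field(x, y, z, blobs=BLOBS, threshold=THRESHOLD):
    """F(x, y, z) = sum_i b_i * exp(-a_i * r_i^2) - T; the surface is F = 0."""
    total = 0.0
    for cx, cy, cz, b, a in blobs:
        r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        total += b * math.exp(-a * r2)
    return total - threshold


def solve_z(x, y, z_near=-5.0, z_far=5.0, steps=200, iters=40):
    """Return the first z along the pixel's ray where F crosses zero, or None."""
    dz = (z_far - z_near) / steps
    prev_z, prev_f = z_near, field(x, y, z_near)
    for i in range(1, steps + 1):
        z = z_near + i * dz
        f = field(x, y, z)
        if prev_f < 0.0 <= f:  # stepped from outside to inside: root is bracketed
            lo, hi = prev_z, z
            for _ in range(iters):  # refine the bracket by bisection
                mid = 0.5 * (lo + hi)
                if field(x, y, mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        prev_z, prev_f = z, f
    return None  # the ray never enters the surface
```

Calling `solve_z(0.0, 0.0)` returns the depth of the front surface under that pixel, while a pixel far from both blobs, such as `solve_z(10.0, 10.0)`, returns `None`.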
2. Approximate Differentiable Rendering with Algebraic Surfaces
Authors : Leonid Keselman and Martial Hebert
Comment: First paper to do differentiable rendering optimization of 3D Gaussians.
Abstract
Differentiable renderers provide a direct mathematical link between an object's 3D representation and images of that object. In this work, we develop an approximate differentiable renderer for a compact, interpretable representation, which we call Fuzzy Metaballs. Our approximate renderer focuses on rendering shapes via depth maps and silhouettes. It sacrifices fidelity for utility, producing fast runtimes and high-quality gradient information that can be used to solve vision tasks. Compared to mesh-based differentiable renderers, our method has forward passes that are 5x faster and backwards passes that are 30x faster. The depth maps and silhouette images generated by our method are smooth and defined everywhere. In our evaluation of differentiable renderers for pose estimation, we show that our method is the only one comparable to classic techniques. In shape from silhouette, our method performs well using only gradient descent and a per-pixel loss, without any surrogate losses or regularization. These reconstructions work well even on natural video sequences with segmentation artifacts. Paper | Project Page | Code | Short Presentation
3. Unbiased Gradient Estimation for Differentiable Surface Splatting via Poisson Sampling
Authors : Jan U. Müller, Michael Weinmann, Reinhard Klein
Comment: Builds 2D screen-space gaussians from underlying 3D representations.
Abstract
We propose an efficient and GPU-accelerated sampling framework which enables unbiased gradient approximation for differentiable point cloud rendering based on surface splatting. Our framework models the contribution of a point to the rendered image as a probability distribution. We derive an unbiased approximative gradient for the rendering function within this model. To efficiently evaluate the proposed sample estimate, we introduce a tree-based data-structure which employs multi-pole methods to draw samples in near linear time. Our gradient estimator allows us to avoid regularization required by previous methods, leading to a more faithful shape recovery from images. Furthermore, we validate that these improvements are applicable to real-world applications by refining the camera poses and point cloud obtained from a real-time SLAM system. Finally, employing our framework in a neural rendering setting optimizes both the point cloud and network parameters, highlighting the framework's ability to enhance data driven approaches. Paper | Code
4. Generating and Real-Time Rendering of Clouds
Authors : Petr Man
Comment: Splatting of anisotropic gaussians. Basically a non-differentiable implementation of 3DGS.
Abstract
This paper presents a method for generation and real-time rendering of static clouds. Perlin noise function generates three dimensional map of a cloud. We also present a two-pass rendering algorithm that performs physically based approximation. In the first preprocessed phase it computes multiple forward scattering. In the second phase first order anisotropic scattering at runtime is evaluated. The generated map is stored as voxels and is unsuitable for the real-time rendering. We introduce a more suitable inner representation of cloud that approximates the original map and contains much less information. The cloud is then represented by a set of metaballs (spheres) with parameters such as center positions, radii and density values. The main contribution of this paper is to propose a method, that transforms the original cloud map to the inner representation. This method uses the Radial Basis Function (RBF) neural network. Paper
Compression:
2024:
1. [I3D '24] Reducing the Memory Footprint of 3D Gaussian Splatting
Authors : Panagiotis Papantonakis, Georgios Kopanas, Bernhard Kerbl, Alexandre Lanvin, George Drettakis
Abstract
3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and realtime rendering; unfortunately, the memory requirements of this method for storing and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and finally a codebook-based quantization method, together with a half-float representation for further memory reduction. Taken together, these three components result in a ×27 reduction in overall size on disk on the standard datasets we tested, along with a ×1.7 speedup in rendering speed. We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device (see Fig. 1). Paper | Project Page | Code (not yet)
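Of the components listed in this abstract, the half-float representation is the easiest to show in isolation. The sketch below only illustrates what storing attributes as 16-bit floats costs in precision; the attribute values and the helper names `pack_attributes` and `unpack_attributes` are hypothetical and are not taken from the paper's code.

```python
import struct


def pack_attributes(values, half=True):
    """Serialize floats as little-endian float16 ('e') or float32 ('f')."""
    fmt = "<%d%s" % (len(values), "e" if half else "f")
    return struct.pack(fmt, *values)


def unpack_attributes(blob, half=True):
    """Inverse of pack_attributes; returns a list of Python floats."""
    size = 2 if half else 4
    fmt = "<%d%s" % (len(blob) // size, "e" if half else "f")
    return list(struct.unpack(fmt, blob))


# Hypothetical per-Gaussian attribute values (e.g. opacity, scale, rotation entries).
attrs = [0.731, -1.25, 0.004, 12.5, 0.9999]

full = pack_attributes(attrs, half=False)   # 4 bytes per value
small = pack_attributes(attrs, half=True)   # 2 bytes per value
restored = unpack_attributes(small, half=True)

# Largest relative error introduced by the float16 round trip.
max_rel_err = max(abs(a - b) / abs(a) for a, b in zip(attrs, restored))
```

The half-precision buffer is exactly half the size of the single-precision one. A float16 keeps about 11 significant bits, so the round-trip error stays small for values in its normal range, and values beyond roughly 65504 cannot be stored at all.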
2. [CVPR '24] Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Authors : Simon Niedermayr, Josef Stumpfegger, Rüdiger Westermann
Abstract
Recently, high-fidelity scene reconstruction with an optimized 3D Gaussian splat representation has been introduced for novel view synthesis from sparse image sets. Making such representations suitable for applications like network streaming and rendering on low-power devices requires significantly reduced memory consumption as well as improved rendering efficiency. We propose a compressed 3D Gaussian splat representation that utilizes sensitivity-aware vector clustering with quantization-aware training to compress directional colors and Gaussian parameters. The learned codebooks have low bitrates and achieve a compression rate of up to 31× on real-world scenes with only minimal degradation of visual quality. We demonstrate that the compressed splat representation can be efficiently rendered with hardware rasterization on lightweight GPUs at up to 4× higher framerates than reported via an optimized GPU compute pipeline. Extensive experiments across multiple datasets demonstrate the robustness and rendering speed of the proposed approach. Paper | Project Page | Code
3. HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Authors : Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin
Abstract
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the pioneer to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction of over 75× compared to vanilla 3DGS, while simultaneously improving fidelity, and achieving over 11× size reduction over SOTA 3DGS compression approach Scaffold-GS. Paper | Project Page | Code
4. [ECCV '24] End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Authors : Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen
Abstract
3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible and continuous rate control. RDO-Gaussian addresses two main issues that exist in current schemes: 1) Different from prior endeavors that minimize the rate under the fixed distortion, we introduce dynamic pruning and entropy-constrained vector quantization (ECVQ) that optimize the rate and distortion at the same time. 2) Previous works treat the colors of each Gaussian equally, while we model the colors of different regions and materials with learnable numbers of parameters. We verify our method on both real and synthetic scenes, showcasing that RDO-Gaussian greatly reduces the size of 3D Gaussian over 40×, and surpasses existing methods in rate-distortion performance. Paper | Project Page | Code
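For readers unfamiliar with entropy-constrained vector quantization, the textbook assignment rule picks the codeword that minimizes distortion plus a weighted rate term instead of distortion alone. The sketch below shows only that generic rule; the codebook, the probabilities, the weight `lam`, and the function name `ecvq_assign` are hypothetical, and the paper's end-to-end training procedure is not reproduced here.

```python
import math


def ecvq_assign(x, codebook, probs, lam):
    """Pick the codeword index minimizing distortion + lam * rate.

    distortion = squared Euclidean distance from x to the codeword
    rate       = -log2(p_k), the ideal code length in bits for index k
    """
    best_k, best_cost = -1, float("inf")
    for k, (c, p) in enumerate(zip(codebook, probs)):
        dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        cost = dist + lam * (-math.log2(p))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Toy setup: codeword 0 is frequent (cheap to code), codeword 1 is rare.
codebook = [(0.0, 0.0), (1.0, 1.0)]
probs = [0.9, 0.1]
x = (0.6, 0.6)

nearest_only = ecvq_assign(x, codebook, probs, lam=0.0)  # pure distortion
rate_aware = ecvq_assign(x, codebook, probs, lam=1.0)    # rate is penalized
```

With `lam=0.0` the rule reduces to nearest-neighbor quantization and selects the closer, rarer codeword. Raising `lam` shifts the choice toward the cheaper, more probable codeword, which is the trade-off a rate-distortion objective controls.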
5. 3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods
Authors : Milena T. Bagdasarian, Paul Knoll, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern
Abstract
We present a work-in-progress survey on 3D Gaussian Splatting compression methods, focusing on their statistical performance across various benchmarks. This survey aims to facilitate comparability by summarizing key statistics of different compression approaches in a tabulated format. The datasets evaluated include TanksAndTemples, MipNeRF360, DeepBlending, and SyntheticNeRF. For each method, we report the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and the resultant size in megabytes (MB), as provided by the respective authors. This is an ongoing, open project, and we invite contributions from the research community as GitHub issues or pull requests. Please visit http://wm.github.io/3dgs-compression-survey/ for more information and a sortable version of the table. Paper | Project Page
6. LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming
Authors : Yuang Shi, Simone Gasparini, Géraldine Morin, Wei Tsang Ooi
Abstract
The rise of Extended Reality (XR) requires efficient streaming of 3D online worlds, challenging current 3DGS representations to adapt to bandwidth-constrained environments. We propose LapisGS, a layered 3DGS that supports adaptive streaming and progressive rendering. Our method constructs a layered structure for cumulative representation, incorporates dynamic opacity optimization to maintain visual fidelity, and utilizes occupancy maps to efficiently manage Gaussian splats. This proposed model offers a progressive representation supporting a continuous rendering quality adapted for bandwidth-aware streaming. Extensive experiments validate the effectiveness of our approach in balancing visual fidelity with the compactness of the model, with up to 50.71% improvement in SSIM, 286.53% improvement in LPIPS, and 318.41% reduction in model size, and shows its potential for bandwidth-adapted 3D streaming and rendering applications. Paper | Project Page
7. Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation
Authors : Minye Wu, Tinne Tuytelaars
Abstract
Recent advancements in photo-realistic novel view synthesis have been significantly driven by Gaussian Splatting (3DGS). Nevertheless, the explicit nature of 3DGS data entails considerable storage requirements, highlighting a pressing need for more efficient data representations. To address this, we present Implicit Gaussian Splatting (IGS), an innovative hybrid model that integrates explicit point clouds with implicit feature embeddings through a multi-level tri-plane architecture. This architecture features 2D feature grids at various resolutions across different levels, facilitating continuous spatial domain representation and enhancing spatial correlations among Gaussian primitives. Building upon this foundation, we introduce a level-based progressive training scheme, which incorporates explicit spatial regularization. This method capitalizes on spatial correlations to enhance both the rendering quality and the compactness of the IGS representation. Furthermore, we propose a novel compression pipeline tailored for both point clouds and 2D feature grids, considering the entropy variations across different levels. Extensive experimental evaluations demonstrate that our algorithm can deliver high-quality rendering using only a few MBs, effectively balancing storage efficiency and rendering fidelity, and yielding results that are competitive with the state-of-the-art. Paper | Code
2023:
1. LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS
Authors : Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
Abstract
Recent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting come with a substantial storage overhead caused by growing the SfM points to millions, often demanding gigabyte-level disk space for a single unbounded scene, posing significant scalability challenges and hindering the splatting efficiency. To address this challenge, we introduce LightGaussian, a novel method designed to transform 3D Gaussians into a more efficient and compact format. Drawing inspiration from the concept of Network Pruning, LightGaussian identifies Gaussians that are insignificant in contributing to the scene reconstruction and adopts a pruning and recovery process, effectively reducing redundancy in Gaussian counts while preserving visual effects. Additionally, LightGaussian employs distillation and pseudo-view augmentation to distill spherical harmonics to a lower degree, allowing knowledge transfer to more compact representations while maintaining reflectance. Furthermore, we propose a hybrid scheme, VecTree Quantization, to quantize all attributes, resulting in lower bitwidth representations with minimal accuracy losses. In summary, LightGaussian achieves an averaged compression rate over 15x while boosting the FPS from 139 to 215, enabling an efficient representation of complex scenes on Mip-NeRF 360, Tank and Temple datasets. Paper | Project Page | Code | Short Presentation
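The pruning step described here, ranking Gaussians by how much they contribute and dropping the least significant ones, can be sketched in a few lines. The score used below (opacity times the number of pixels a Gaussian touches) is a stand-in chosen for illustration and is an assumption, not the significance measure defined in the paper; the recovery (fine-tuning) step is omitted, and the names `significance` and `prune_by_significance` are hypothetical.

```python
def significance(opacity, hit_count):
    """Hypothetical contribution score: opacity weighted by pixel hits."""
    return opacity * hit_count


def prune_by_significance(scores, prune_ratio):
    """Return sorted indices of the Gaussians kept after pruning.

    The lowest-scoring int(len(scores) * prune_ratio) entries are dropped.
    """
    n_prune = int(len(scores) * prune_ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[n_prune:])


# Toy scene: (opacity, number of pixels hit) per Gaussian.
gaussians = [(0.9, 120), (0.05, 3), (0.5, 40), (0.2, 1), (0.8, 300)]
scores = [significance(o, h) for o, h in gaussians]

kept = prune_by_significance(scores, prune_ratio=0.5)
```

In this toy scene the two Gaussians with near-zero scores are removed and the three high-contribution ones are kept. A real pipeline would follow this with further optimization of the survivors to recover quality.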
2. Compact3D: Compressing Gaussian Splat Radiance Field Models with Vector Quantization
Authors : KL Navaneet, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash
Abstract
3D Gaussian Splatting is a new method for modeling and rendering 3D radiance fields that achieves much faster learning and rendering time compared to SOTA NeRF methods. However, it comes with a drawback in the much larger storage demand compared to NeRF methods since it needs to store the parameters for several 3D Gaussians. We notice that many Gaussians may share similar parameters, so we introduce a simple vector quantization method based on k-means algorithm to quantize the Gaussian parameters. Then, we store the small codebook along with the index of the code for each Gaussian. Moreover, we compress the indices further by sorting them and using a method similar to run-length encoding. We do extensive experiments on standard benchmarks as well as a new benchmark which is an order of magnitude larger than the standard benchmarks. We show that our simple yet effective method can reduce the storage cost for the original 3D Gaussian Splatting method by a factor of almost 20× with a very small drop in the quality of rendered images. Paper | Code
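The pipeline in this abstract (k-means codebook, one index per Gaussian, then sorted indices compressed with a run-length style scheme) can be sketched with the standard library alone. This is a minimal illustration on toy 2D vectors, assuming plain Lloyd's k-means and a simple `[index, count]` run format; the function names and data are hypothetical, and the authors' implementation may differ in every detail.

```python
import random


def nearest(p, codebook):
    """Index of the codebook entry closest to p (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, codebook[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns (codebook, index per point)."""
    rng = random.Random(seed)
    codebook = rng.sample(points, k)
    for _ in range(iters):
        assign = [nearest(p, codebook) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster goes empty
                codebook[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Final assignment against the final codebook.
    return codebook, [nearest(p, codebook) for p in points]


def run_length_encode(sorted_indices):
    """Collapse a sorted index list into [index, count] runs."""
    runs = []
    for idx in sorted_indices:
        if runs and runs[-1][0] == idx:
            runs[-1][1] += 1
        else:
            runs.append([idx, 1])
    return runs


# Toy "Gaussian parameter" vectors forming two obvious groups.
points = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 5.0)]
codebook, assign = kmeans(points, k=2)
runs = run_length_encode(sorted(assign))
```

After quantization, each vector is replaced by a small integer index into `codebook`. Sorting the indices means reordering the Gaussians, which is harmless as long as the other attributes are reordered consistently, and it turns the index list into a few long runs.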
3. [CVPR '24] Compact 3D Gaussian Representation for Radiance Field
Authors : Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park
Abstract
Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussian by vector quantization. In our extensive experiments, we consistently show over 10× reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Paper | Project Page | Code
4. [ECCV '24] Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Authors : Wieland Morgenstern, Florian Barthel, Anna Hilsmann, Peter Eisert
Abstract
3D Gaussian Splatting has recently emerged as a highly promising technique for modeling of static 3D scenes. In contrast to Neural Radiance Fields, it utilizes efficient rasterization allowing for very fast rendering at high-quality. However, the storage size is significantly higher, which hinders practical deployment, e.g., on resource constrained devices. In this paper, we introduce a compact scene representation organizing the parameters of 3D Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a drastic reduction in storage requirements without compromising visual quality during rendering. Central to our idea is the explicit exploitation of perceptual redundancies present in natural scenes. In essence, the inherent nature of a scene allows for numerous permutations of Gaussian parameters to equivalently represent it. To this end, we propose a novel highly parallel algorithm that regularly arranges the high-dimensional Gaussian parameters into a 2D grid while preserving their neighborhood structure. During training, we further enforce local smoothness between the sorted parameters in the grid. The uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless integration with established renderers. Our method achieves a reduction factor of 17x to 42x in size for complex scenes with no increase in training time, marking a substantial leap forward in the domain of 3D scene distribution and consumption. Paper | Project Page | Code
Diffusion:
2024:
1. AGG: Amortized Generative 3D Gaussians for Single Image to 3D
Authors : Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat
Abstract
Given the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to its superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation. 3D Gaussian splatting approaches for image to 3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster. Paper | Project Page | Short Presentation
2. Fast Dynamic 3D Object Generation from a Single-view Video
Authors : Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang
Abstract
Generating dynamic three-dimensional (3D) object from a single-view video is challenging due to the lack of 4D labeled data. Existing methods extend text-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling, but they are slow and expensive to scale (e.g., 150 minutes per object) due to the need for back-propagating the information-limited supervision signals through a large pretrained model. To address this limitation, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data to directly train a novel 4D Gaussian splatting model with explicit point cloud geometry, enabling real-time rendering under continuous camera trajectories. Extensive experiments on synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed when compared to prior art alternatives while preserving the same level of innovative view synthesis quality. For example, Efficient4D takes only 14 minutes to model a dynamic object. Paper | Project Page | Code | Short Presentation
3. GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting
Authors : Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods. Paper | Project Page | Code | Short Presentation
4. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Authors : Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
Abstract
3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-view Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: (1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. (2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation. Paper | Project Page | Code
5. GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Authors : Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Abstract
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an object-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. Paper | Project Page | Code (not yet)
6. IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
Authors : Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
Abstract
Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In this paper, we further explore the design space of text-to-3D models. We significantly improve multi-view generation by considering video instead of image generators. Combined with a 3D reconstruction algorithm which, by using Gaussian splatting, can optimize a robust image-based loss, we directly produce high-quality 3D outputs from the generated views. Our new method, IM-3D, reduces the number of evaluations of the 2D generator network 10-100x, resulting in a much more efficient pipeline, better quality, fewer geometric inconsistencies, and higher yield of usable 3D assets. Paper
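Since many entries in this section are defined relative to Score Distillation Sampling, a minimal sketch of one SDS gradient evaluation may help. It assumes the commonly stated form of the gradient, w(t) * (predicted noise - injected noise), taken with respect to the rendered image. The names `sds_gradient` and `make_oracle` are hypothetical, and the "noise predictor" is a toy stand-in that knows the clean target rather than a real diffusion model.

```python
import math
import random


def sds_gradient(image, predict_noise, alpha_bar_t, weight=1.0, rng=random):
    """Return a per-pixel SDS-style gradient for a rendered image (flat list of floats)."""
    # Inject Gaussian noise: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps.
    eps = [rng.gauss(0.0, 1.0) for _ in image]
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * x + b * e for x, e in zip(image, eps)]
    # The gradient is w(t) * (eps_hat - eps); no back-propagation through the predictor.
    eps_hat = predict_noise(x_t)
    return [weight * (eh - e) for eh, e in zip(eps_hat, eps)]


def make_oracle(target, alpha_bar_t):
    """Toy predictor (assumption): recovers the noise that would explain x_t given a known target."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return lambda x_t: [(xt - a * tg) / b for xt, tg in zip(x_t, target)]
```

With this toy predictor the gradient reduces to sqrt(alpha_bar / (1 - alpha_bar)) * (image - target), so gradient descent on the rendered pixels moves them toward the target. In a real pipeline each such step costs one evaluation of the 2D generator network.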
7. Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
Authors : Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
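Abstract
While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content. Paper | Project Page | Code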
8. Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph
Authors : Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao
Abstract
Text-to-3D generation represents an exciting field that has seen rapid advancements, facilitating the transformation of textual descriptions into detailed 3D models. However, current progress often neglects the intricate high-order correlation of geometry and texture within 3D objects, leading to challenges such as over-smoothness, over-saturation and the Janus problem. In this work, we propose a method named "3D Gaussian Generation via Hypergraph (Hyper-3DG)", designed to capture the sophisticated high-order correlations present within 3D objects. Our framework is anchored by a well-established mainflow and an essential module, named "Geometry and Texture Hypergraph Refiner (HGRefiner)". This module not only refines the representation of 3D Gaussians but also accelerates the update process of these 3D Gaussians by conducting the Patch-3DGS Hypergraph Learning on both explicit attributes and latent visual features. Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation. Extensive experimentation has shown that our proposed method significantly enhances the quality of 3D generation while incurring no additional computational overhead for the underlying framework. Paper | Code (not yet)
9. DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Authors : Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik Hang Lee, Pengyuan Zhou
Abstract
Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors, increasingly capturing the attention of both academic and industry circles. Despite significant progress, current methods still struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework that leverages Formation Pattern Sampling (FPS) for core structuring, augmented with a strategic camera sampling and supported by holistic object-environment integration to overcome these hurdles. FPS, guided by the formation patterns of 3D objects, employs multi-timesteps sampling to quickly form semantically rich, high-quality representations, uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. The camera sampling strategy incorporates a progressive three-stage approach, specifically designed for both indoor and outdoor settings, to effectively ensure scene-wide 3D consistency. DreamScene enhances scene editing flexibility by combining objects and environments, enabling targeted adjustments. Extensive experiments showcase DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Paper | Project Page | Code (not yet)
10. FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
Authors : Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
Abstract
Reconstructing detailed 3D objects from single-view images remains a challenging task due to the limited information available. In this paper, we introduce FDGaussian, a novel two-stage framework for single-image 3D reconstruction. Recent methods typically utilize pre-trained 2D diffusion models to generate plausible novel views from the input image, yet they encounter issues with either multi-view inconsistency or lack of geometric fidelity. To overcome these challenges, we propose an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, enabling the generation of consistent multi-view images. Moreover, we further accelerate the state-of-the-art Gaussian Splatting by incorporating epipolar attention to fuse images from different viewpoints. We demonstrate that FDGaussian generates images with high consistency across different views and reconstructs high-quality 3D objects, both qualitatively and quantitatively. Paper | Project Page
11. BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors
Authors : Tingyang Zhang, Qingzhe Gao, Weiyu Li, Libin Liu, Baoquan Chen
Abstract
Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for training and rendering. This limitation restricts the practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present the rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating its superior performance compared to the current state-of-the-art methods. Paper
12. BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis
Authors : Lutao Jiang, Lin Wang
Abstract
Text-to-3D synthesis has recently seen intriguing advances by combining the text-to-image models with 3D representation methods, e.g., Gaussian Splatting (GS), via Score Distillation Sampling (SDS). However, a hurdle of existing methods is the low efficiency, per-prompt optimization for a single 3D object. Therefore, it is imperative for a paradigm shift from per-prompt optimization to one-stage generation for any unseen text prompts, which yet remains challenging. A further hurdle is how to directly generate a set of millions of 3D Gaussians to represent a 3D object. This paper presents BrightDreamer, an end-to-end single-stage approach that can achieve generalizable and fast (77 ms) text-to-3D generation. Our key idea is to formulate the generation process as estimating the 3D deformation from an anchor shape with predefined positions. For this, we first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions, used as the centers (one attribute) of 3D Gaussians. To estimate the other four attributes (i.e., scaling, rotation, opacity, and SH coefficient), we then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object. The center of each Gaussian enables us to transform the triplane feature into the four attributes. The generated 3D Gaussians can be finally rendered at 705 frames per second. Extensive experiments demonstrate the superiority of our method over existing methods. Also, BrightDreamer possesses a strong semantic understanding capability even for complex text prompts. Paper | Project Page | Code
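The step in which a Gaussian center is used to read features from a triplane can be sketched as below. This is a generic triplane lookup, not BrightDreamer's actual code: the nearest-neighbour sampling, the summation of the three plane features, and the names `sample_plane` and `triplane_feature` are all assumptions made for illustration. Decoding the feature into scaling, rotation, opacity, and SH coefficients is omitted.

```python
def sample_plane(plane, u, v):
    """Nearest-neighbour lookup on a square feature plane; u and v lie in [-1, 1]."""
    res = len(plane)
    # Map [-1, 1] to integer texel indices and clamp them to the grid.
    i = min(res - 1, max(0, int((u + 1.0) / 2.0 * res)))
    j = min(res - 1, max(0, int((v + 1.0) / 2.0 * res)))
    return plane[i][j]


def triplane_feature(center, plane_xy, plane_xz, plane_yz):
    """Project a Gaussian center onto the three axis-aligned planes and fuse the features."""
    x, y, z = center
    feats = (
        sample_plane(plane_xy, x, y),
        sample_plane(plane_xz, x, z),
        sample_plane(plane_yz, y, z),
    )
    # Fuse by channel-wise summation (one common choice; concatenation is another).
    return [sum(channel) for channel in zip(*feats)]
```

A small decoder network would then map each fused feature vector to the remaining per-Gaussian attributes.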
13. GVGEN: Text-to-3D Generation with Volumetric Representation
Authors : Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He
Abstract
In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. This paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques: (1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points as a structured form GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed (∼7 seconds), effectively striking a balance between quality and efficiency. Paper | Project Page | Code (not yet)
14. SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
Authors : Jaihoon Kim, Juil Koo, Kyeongmin Yeo, Minhyuk Sung
Abstract
We introduce a general framework for generating diverse visual content, including ambiguous images, panorama images, mesh textures, and Gaussian splat textures, by synchronizing multiple diffusion processes. We present an exhaustive investigation into all possible scenarios for synchronizing multiple diffusion processes through a canonical space and analyze their characteristics across applications. In doing so, we reveal a previously unexplored case: averaging the outputs of Tweedie's formula while conducting denoising in multiple instance spaces. This case also provides the best quality with the widest applicability to downstream tasks. We name this case SyncTweedies. In our experiments generating the aforementioned visual content, we demonstrate the superior quality of generation by SyncTweedies compared to other synchronization methods, optimization-based and iterative-update-based methods. Paper | Project Page | Code (not yet)
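The phrase "averaging the outputs of Tweedie's formula" can be made concrete with a small sketch. It assumes the standard DDPM form of Tweedie's formula, x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar), and, as a simplifying assumption, treats the mapping between each instance space and the canonical space as the identity. The function names are hypothetical.

```python
import math


def tweedie_x0(x_t, eps_hat, alpha_bar_t):
    """Estimate the clean sample from a noisy one and a noise prediction."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps_hat)]


def synchronized_x0(instances, alpha_bar_t):
    """Average the per-instance Tweedie estimates in a shared (canonical) space.

    instances: list of (x_t, eps_hat) pairs, one pair per instance space.
    """
    estimates = [tweedie_x0(x_t, eps_hat, alpha_bar_t) for x_t, eps_hat in instances]
    n = len(estimates)
    return [sum(values) / n for values in zip(*estimates)]
```

In an actual application each instance estimate would first be unprojected into the canonical space (for example a texture or panorama), averaged there, and projected back before the next denoising step.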
15. STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Authors : Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao
Abstract
Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from 3D generation techniques, we utilize a multi-view diffusion model to initialize multi-view images anchoring on the input video frames, where the video can be either real-world captured or generated by a video diffusion model. To ensure the temporal consistency of the multi-view sequence initialization, we introduce a simple yet effective fusion strategy to leverage the first frame as a temporal anchor in the self-attention computation. With the almost consistent multi-view sequences, we then apply the score distillation sampling to optimize the 4D Gaussian point cloud. The 4D Gaussian splatting is specially crafted for the generation task, where an adaptive densification strategy is proposed to mitigate the unstable Gaussian gradient for robust optimization. Notably, the proposed pipeline does not require any pre-training or fine-tuning of diffusion networks, offering a more accessible and practical solution for the 4D generation task. Extensive experiments demonstrate that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness, setting a new state-of-the-art for 4D generation from diverse inputs, including text, image, and video. Paper | Project Page | Code | Short Presentation
16. Comp4D: LLM-Guided Compositional 4D Scene Generation
Authors : Dejia Xu, Hanwen Liang, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Plataniotis, Zhangyang Wang
Abstract
Recent advancements in diffusion models for 2D and 3D content creation have sparked a surge of interest in generating 4D content. However, the scarcity of 3D scene datasets constrains current methodologies to primarily object-centric generation. To overcome this limitation, we present Comp4D, a novel framework for Compositional 4D Generation. Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately. Utilizing Large Language Models (LLMs), the framework begins by decomposing an input text prompt into distinct entities and maps out their trajectories. It then constructs the compositional 4D scene by accurately positioning these objects along their designated paths. To refine the scene, our method employs a compositional score distillation technique guided by the pre-defined trajectories, utilizing pre-trained diffusion models across text-to-image, text-to-video, and text-to-3D domains. Extensive experiments demonstrate our outstanding 4D content creation capability compared to prior arts, showcasing superior visual quality, motion fidelity, and enhanced object interactions. Paper | Project Page | Code (not yet) | Short Presentation
17. DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Authors : Yuanze Lin, Ronald Clark, Philip Torr
Abstract
We present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods has been promising, prevailing methods often fail to ensure view-consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects, aligning closely with the semantics of the textual instructions. Paper | Project Page | Code (not yet)
18. SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Authors : Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai
Abstract
Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions. Paper | Project Page | Code (not yet) | Short Presentation
19. Hash3D: Training-free Acceleration for 3D Generation
Authors : Xingyi Yang, Xinchao Wang
Abstract
The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models. Despite this progress, the cumbersome optimization process per se presents a critical hurdle to efficiency. In this paper, we introduce Hash3D, a universal acceleration for 3D generation without model training. Central to Hash3D is the insight that feature-map redundancy is prevalent in images rendered from camera positions and diffusion time-steps in close proximity. By effectively hashing and reusing these feature maps across neighboring timesteps and camera angles, Hash3D substantially prevents redundant calculations, thus accelerating the diffusion model's inference in 3D generation tasks. We achieve this through an adaptive grid-based hashing. Surprisingly, this feature-sharing mechanism not only speeds up the generation but also enhances the smoothness and view consistency of the synthesized 3D objects. Our experiments, covering 5 text-to-3D and 3 image-to-3D models, demonstrate Hash3D's versatility to speed up optimization, enhancing efficiency by 1.3 to 4 times. Additionally, Hash3D's integration with 3D Gaussian splatting largely speeds up 3D model creation, reducing text-to-3D processing to about 10 minutes and image-to-3D conversion to roughly 30 seconds. Paper | Project Page | Code
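The idea of hashing and reusing feature maps across nearby camera poses and timesteps can be illustrated with a toy cache. This sketch uses a fixed grid, whereas the abstract describes an adaptive one, and it returns the cached value as-is rather than blending neighbours. The class name `FeatureCache`, the bin widths, and the `compute` callback are all hypothetical.

```python
class FeatureCache:
    """Toy grid-hash cache keyed by camera pose and diffusion timestep."""

    def __init__(self, angle_bin_deg=10.0, timestep_bin=50):
        self.angle_bin_deg = angle_bin_deg  # hypothetical bin width for angles
        self.timestep_bin = timestep_bin    # hypothetical bin width for timesteps
        self.store = {}
        self.hits = 0

    def key(self, azimuth_deg, elevation_deg, timestep):
        # Quantize each coordinate so that nearby queries share a grid cell.
        return (
            int((azimuth_deg % 360.0) // self.angle_bin_deg),
            int((elevation_deg + 90.0) // self.angle_bin_deg),
            int(timestep) // self.timestep_bin,
        )

    def get_or_compute(self, azimuth_deg, elevation_deg, timestep, compute):
        k = self.key(azimuth_deg, elevation_deg, timestep)
        if k in self.store:
            self.hits += 1         # reuse features cached for a nearby query
            return self.store[k]
        self.store[k] = compute()  # cache miss: run the expensive network once
        return self.store[k]
```

Two queries such as (azimuth 12°, elevation 5°, t = 510) and (azimuth 18°, elevation 8°, t = 540) fall into the same cell under these bin widths, so the second one skips the expensive computation.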
20. Zero-shot Point Cloud Completion Via 2D Priors
Authors : Tianxin Huang, Zhiwen Yan, Yuyang Zhao, Gim Hee Lee
Abstract
3D point cloud completion is designed to recover complete shapes from partially observed point clouds. Conventional completion methods typically depend on extensive point cloud data for training, with their effectiveness often constrained to object categories similar to those seen during training. In contrast, we propose a zero-shot framework aimed at completing partially observed point clouds across any unseen categories. Leveraging point rendering via Gaussian Splatting, we develop techniques of Point Cloud Colorization and Zero-shot Fractal Completion that utilize 2D priors from pre-trained diffusion models to infer missing regions. Experimental results on both synthetic and real-world scanned point clouds demonstrate that our approach outperforms existing methods in completing a variety of objects without any requirement for specific training data. Paper
21. [ECCV '24] DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Authors : Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi
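Abstract
The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360° scene generation pipeline that facilitates the creation of comprehensive 360° scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360° perspective, providing an enhanced immersive experience over existing techniques. Paper | Project Page | Code | Short Presentation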
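Lifting a panoramic pixel with an estimated depth to a 3D point, which is how a point cloud for initializing Gaussian centroids could be obtained, can be sketched as follows. The equirectangular convention used here (longitude spanning the image width, latitude spanning the height, y up, z forward at the image center) and the function name `panorama_to_point` are assumptions; the paper's depth alignment and global optimization are not shown.

```python
import math


def panorama_to_point(u, v, depth, width, height):
    """Unproject an equirectangular pixel (u, v) at the given depth to a 3D point."""
    # Longitude in [-pi, pi) across the width, latitude in [-pi/2, pi/2] down the height.
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    # Unit viewing direction on the sphere, scaled by the depth along the ray.
    direction = (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )
    return tuple(depth * c for c in direction)
```

Applying this to every pixel of the panorama and its depth map yields one 3D point per pixel, all at the estimated distance from the camera center.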
22. RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
Authors : Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
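Abstract
We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image. Paper | Project Page | Code (not yet)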
23. GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
Authors : Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo
Abstract
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling. Existing radiance representations either require an implicit feature decoder, which significantly degrades the modeling power of the representation, or are spatially unstructured, making them difficult to integrate with mainstream 3D diffusion methods. We derive GaussianCube by first using a novel densification-constrained Gaussian fitting algorithm, which yields high-accuracy fitting using a fixed number of free Gaussians, and then rearranging these Gaussians into a predefined voxel grid via Optimal Transport. Since GaussianCube is a structured grid representation, it allows us to use standard 3D U-Net as our backbone in diffusion modeling without elaborate designs. More importantly, the high-accuracy fitting of the Gaussians allows us to achieve a high-quality representation with one to two orders of magnitude fewer parameters than previous structured representations of comparable quality. The compactness of GaussianCube greatly eases the difficulty of 3D generative modeling. Extensive experiments conducted on unconditional and class-conditioned object generation, digital avatar creation, and text-to-3D synthesis all show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a highly accurate and versatile radiance representation for 3D generative modeling. Paper | Project Page | Code
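Rearranging a fixed number of Gaussians into a voxel grid via Optimal Transport can be illustrated on a tiny example. With equally many Gaussians and voxels of uniform weight, the transport problem reduces to a one-to-one assignment, which the sketch below solves by brute force over permutations. This only works for a handful of points and is not the solver used by the authors; the squared-distance cost and the name `assign_to_voxels` are assumptions.

```python
from itertools import permutations


def assign_to_voxels(positions, voxel_centers):
    """Return a tuple where entry g is the voxel index assigned to Gaussian g.

    Minimizes the total squared distance between Gaussians and their voxels.
    Brute force: only feasible for very small inputs.
    """
    def total_cost(perm):
        return sum(
            sum((p - c) ** 2 for p, c in zip(positions[g], voxel_centers[v]))
            for g, v in enumerate(perm)
        )

    return min(permutations(range(len(voxel_centers))), key=total_cost)
```

Once every Gaussian owns exactly one voxel, its attributes can be stored at that grid location, which gives the regular tensor layout that a standard 3D U-Net expects.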
24. 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Authors : Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee
Abstract
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal deformation based on the canonical representation to capture dynamic interactions in the reference video. The pipeline facilitates the generation of dynamic scenes with enhanced photorealism and structural integrity, viewable from multiple perspectives, thereby setting a new standard in 4D scene generation. Paper | Project Page | Code (not yet)
2023:
1. [CVPR '24] Text-to-3D using Gaussian Splatting
Authors : Zilong Chen, Feng Wang, Huaping Liu
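Abstract
In this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of a 3D prior and a proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting its explicit nature, which enables the incorporation of a 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. Paper | Project Page | Code | Short Presentation | Explanation Video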
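One plausible reading of "compactness-based densification" is to treat each Gaussian as a sphere and fill the empty space between neighbouring spheres. The sketch below follows that reading and should not be taken as the authors' implementation: the brute-force neighbour search, the choice to center the new Gaussian in the gap with half the gap as its radius, and the name `compactness_densify` are all assumptions.

```python
import math


def compactness_densify(centers, radii, k=2):
    """Return (center, radius) pairs for Gaussians added into gaps between neighbours."""
    added, seen = [], set()
    for i, ci in enumerate(centers):
        # Brute-force k nearest neighbours (a KD-tree would be used at scale).
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.dist(ci, centers[j]),
        )
        for j in others[:k]:
            pair = (min(i, j), max(i, j))
            d = math.dist(ci, centers[j])
            gap = d - radii[i] - radii[j]
            if gap <= 0.0 or pair in seen:
                continue  # spheres already touch or overlap, or the pair was handled
            seen.add(pair)
            # Center the new Gaussian in the gap; its radius is half the gap.
            t = (radii[i] + gap / 2.0) / d
            center = tuple(a + t * (b - a) for a, b in zip(ci, centers[j]))
            added.append((center, gap / 2.0))
    return added
```

Two unit-radius Gaussians whose centers are 4 units apart leave a gap of 2, so a single Gaussian of radius 1 is inserted midway between them; overlapping neighbours produce no new Gaussians.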
2. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Authors : Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng
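Abstract
Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods. Paper | Project Page | Code | Explanation Video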
3. GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Authors : Taoran Yi, Jiemin Fang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang
Abstract
In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but the 3D consistency is hard to guarantee. This paper attempts to bridge the power of the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation. A fast 3D generation framework, named GaussianDreamer, is proposed, where the 3D diffusion model provides point cloud priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our GaussianDreamer can generate a high-quality 3D instance within 25 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time. Paper | Project Page | Code
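Noisy point growing and color perturbation could look roughly like the following sketch: candidate points are sampled inside the bounding box of the prior point cloud, kept only if they lie close to an existing point, and given that point's color plus a small random offset. The acceptance radius, the uniform jitter range, the brute-force nearest-neighbour search, and the name `grow_noisy_points` are illustrative assumptions rather than the paper's settings.

```python
import math
import random


def grow_noisy_points(points, colors, n_candidates, max_dist, color_jitter, rng):
    """Return (position, color) pairs for new points grown around a prior point cloud."""
    # Axis-aligned bounding box of the prior point cloud.
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    grown = []
    for _ in range(n_candidates):
        cand = tuple(rng.uniform(lo[a], hi[a]) for a in range(3))
        # Nearest prior point by brute force (a KD-tree would be used at scale).
        nearest = min(range(len(points)), key=lambda i: math.dist(cand, points[i]))
        if math.dist(cand, points[nearest]) > max_dist:
            continue  # too far from the surface prior, discard
        # Copy the neighbour's color with a small random perturbation, clamped to [0, 1].
        color = tuple(
            min(1.0, max(0.0, c + rng.uniform(0.0, color_jitter)))
            for c in colors[nearest]
        )
        grown.append((cand, color))
    return grown
```

The grown points are merged with the prior points before the Gaussians are initialized, which densifies the starting geometry without moving far from the 3D prior.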
4. GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise
Authors : Xinhai Li, Huaibin Wang, Kuo-Kun Tseng
Abstract
Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of NeRF and 2D diffusion models frequently yields oversaturated images, posing severe limitations on downstream industrial applications due to the constraints of pixelwise rendering methods. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text-to-3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We ingeniously devise an efficient method to generate noise that produces Gaussian noise from diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes. Paper
5. [CVPR '24] LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Authors : Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen
Abstract
The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While recent advancements in text-to-3D generation have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). This paper identifies a notable deficiency in SDS, that it brings inconsistent and low-quality updating direction for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency.
Paper | Code
6. LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Authors : Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee
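Abstract
With the widespread use of VR devices and content, demand for 3D scene generation techniques has become more common. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily because their training strategies rely on 3D scan datasets that are far from the real world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by fully leveraging the power of existing large-scale diffusion-based generative model. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene.
Paper | Project Page | Code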
7. [CVPR '24] HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Authors : Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu
Abstract
Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. The floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios.
Paper | Project Page | Code | Short Presentation
8. CG3D: Compositional Generation for Text-to-3D
Authors : Alexander Vilesov, Pradyumna Chari, Achuta Kadambi
Abstract
With the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi-object scenes, (ii) inability to textually control multi-object configurations, and (iii) physically realistic scene composition. In this work, we propose CG3D, a method for compositionally generating scalable 3D assets that resolves these constraints. We find that explicit Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes. By utilizing a guidance framework built around this explicit representation, we show state of the art results, capable of even exceeding the guiding diffusion model in terms of object combinations and physics accuracy.
Paper | Project Page | Short Presentation
9. Learn to Optimize Denoising Scores for 3D Generation - A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting
Authors : Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu and Guosheng Lin
Abstract
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-caliber results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. Leveraging the different learnable parameters of the diffusion prior, our approach offers multiple configurations, affording various trade-offs between performance and implementation complexity. Notably, our experimental results demonstrate that our method markedly surpasses existing techniques, establishing new state-of-the-art in the realm of text-to-3D generation. Furthermore, our approach exhibits impressive performance on both NeRF and the newly introduced 3D Gaussian Splatting backbones. Additionally, our framework yields insightful contributions to the understanding of recent score distillation methods, such as the VSD and DDS loss.
Paper | Project Page | Code
10. [CVPR '24] Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Authors : Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques.
Paper | Project Page
11. DreamGaussian4D: Generative 4D Gaussian Splatting
Authors : Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu
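Abstract
Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on a 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting compared with implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines.
Paper | Project Page | Code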
12. 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
Authors : Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei
Abstract
Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs, they incur significant time and effort in prompt engineering through trial and error. This work introduces 4DGen, a novel, holistic framework for grounded 4D content generation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content. Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos), thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution supervision through rendering during training, thereby facilitating high-quality 4D generation. Additionally, we employ spatial-temporal pseudo labels on anchor frames, along with seamless consistency priors implemented through 3D-aware score distillation sampling and smoothness regularizations. Compared to existing baselines, our approach yields competitive results in faithfully reconstructing input signals and realistically inferring renderings from novel viewpoints and timesteps. Most importantly, our method supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods.
Paper | Project Page | Code | Short Presentation
13. Text2Immersion: Generative Immersive Scene with 3D Gaussian
Authors : Hao Ouyang, Kathryn Heal, Stephen Lombardi, Tiancheng Sun
Abstract
We introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts. Our proposed pipeline initiates by progressively generating a Gaussian cloud using pre-trained 2D diffusion and depth estimation models. This is followed by a refining stage on the Gaussian cloud, interpolating and refining it to enhance the details of the generated scene. Distinct from prevalent methods that focus on single object or indoor scenes, or employ zoom-out trajectories, our approach generates diverse scenes with various objects, even extending to the creation of imaginary scenes. Consequently, Text2Immersion can have wide-ranging implications for various applications such as virtual reality, game development, and automated content creation. Extensive evaluations demonstrate that our system surpasses other methods in rendering quality and diversity, further progressing towards text-driven 3D scene generation.
Paper | Project Page | Code (not yet)
14. Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
Authors : Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
Abstract
Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the slow generation speed. To address these deficiencies, we present Repaint123 to alleviate multi-view bias as well as texture degradation and speed up the generation process. The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency. We further propose visibility-aware adaptive repainting strength for overlap regions to enhance the generated image quality in the repainting process. The generated high-quality and multi-view consistent images enable the use of simple Mean Square Error (MSE) loss for fast 3D content generation. We conduct extensive experiments and show that our method has a superior ability to generate high-quality 3D content with multi-view consistency and fine textures in 2 minutes from scratch.
Paper | Project Page | Code (not yet)
Dynamics and Deformation:
2024:
1. 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
Authors : Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen
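Abstract
We consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or capturing high-fidelity renderings. In this paper, we introduce 4D Gaussian Splatting (4DGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally compose dynamic 3D Gaussians and can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DGS demonstrates powerful capabilities for modeling complicated dynamics and fine details, especially for scenes with abrupt motions. We further implement our temporal slicing and splatting techniques in a highly optimized CUDA acceleration framework, achieving real-time inference rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DGS, which consistently outperforms existing methods both quantitatively and qualitatively.
Paper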
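As a rough illustration of what "temporally slicing" a 4D XYZT Gaussian can mean, the sketch below conditions a 4D Gaussian on a fixed time using the standard Gaussian conditioning identity. It is not taken from the paper's CUDA implementation; the function name `slice_4d_gaussian`, the plain nested-list covariance layout, and the use of the temporal marginal as an opacity weight are assumptions made for this example.

```python
import math

def slice_4d_gaussian(mean, cov, t):
    """Condition a 4D (x, y, z, t) Gaussian on a fixed time t.

    mean: list of 4 floats (x, y, z, t).
    cov:  4x4 symmetric covariance as nested lists.
    Returns (mean_3d, cov_3d, weight), where weight is the unnormalized
    temporal marginal evaluated at t (assumed here to scale opacity).
    """
    mu_t = mean[3]
    var_t = cov[3][3]
    cross = [cov[i][3] for i in range(3)]  # Cov(xyz, t)
    dt = t - mu_t

    # Conditional mean: mu_xyz + Cov(xyz, t) / Var(t) * (t - mu_t)
    mean_3d = [mean[i] + cross[i] * dt / var_t for i in range(3)]

    # Conditional covariance: Cov(xyz) - Cov(xyz, t) Cov(t, xyz) / Var(t)
    cov_3d = [
        [cov[i][j] - cross[i] * cross[j] / var_t for j in range(3)]
        for i in range(3)
    ]

    # Gaussians far from t in time contribute little to this frame.
    weight = math.exp(-0.5 * dt * dt / var_t)
    return mean_3d, cov_3d, weight


# Hypothetical Gaussian whose x position is correlated with time,
# so its 3D slice drifts along x as t moves away from the temporal mean.
example_mean = [0.0, 0.0, 0.0, 0.5]
example_cov = [
    [1.0, 0.0, 0.0, 0.2],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.2, 0.0, 0.0, 0.1],
]
sliced_mean, sliced_cov, sliced_weight = slice_4d_gaussian(example_mean, example_cov, 0.6)
```

The resulting 3D mean and covariance can then be handed to an ordinary 3D splatting step, which is one way to read the abstract's claim that the slices "naturally compose dynamic 3D Gaussians".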
2. GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Authors : Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann
Abstract
Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to be handled by existing methods. The common color drifting issue that happens in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis.
Paper | Project Page | Code (not yet) | Short Presentation
3. Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction
Authors : Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li
Abstract
3D Gaussian Splatting (3DGS) has become an emerging tool for dynamic scene reconstruction. However, existing methods focus mainly on extending static 3DGS into a time-variant representation, while overlooking the rich motion information carried by 2D observations, thus suffering from performance degradation and model redundancy. To address the above problem, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS. Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow. Then a novel flow augmentation method is introduced with additional insights into uncertainty and loss collaboration. Moreover, for the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed. We conduct extensive experiments on both multi-view and monocular scenes to verify the merits of our work. Compared with the baselines, our method shows significant superiority in both rendering quality and efficiency.
Paper
4. Bridging 3D Gaussian and Mesh for Freeview Video Rendering
Authors : Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao
Abstract
This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the fuzzy geometry. Meanwhile, the point-based splatting (e.g., 3D Gaussian Splatting) method usually produces artifacts or blurry pixels in the area with smooth geometry and sharp textures. As a result, it is difficult, if not impossible, to represent the complex and dynamic scene with a single type of primitive. To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes. Given a sequence of tracked mesh as initialization, our goal is to simultaneously optimize the mesh geometry, color texture, opacity maps, a set of 3D Gaussians, and the deformation field. At a specific time, we perform α-blending on the RGB and opacity values based on the merged and re-ordered z-buffers from mesh and 3D Gaussian rasterizations. This produces the final rendering, which is supervised by the ground-truth image. Experiments demonstrate that our approach adapts the appropriate type of primitives to represent the different parts of the dynamic scene and outperforms all the baseline methods in both quantitative and qualitative comparisons without losing render speed.
Paper
5. [ECCV '24] Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
Authors : Jeongmin Bae*, Seoha Kim*, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames for representing a dynamic scene. However, previous works fail to accurately reconstruct complex dynamic scenes. We attribute the failure to the design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce a local smoothness regularization for per-Gaussian embedding to improve the details in dynamic regions.
Paper | Project Page | Code
6. DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Authors : Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
Abstract
Existing VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be inferred by re-rendering the background to match the video frames. For the object motion, we first model the object-centric deformation of the objects by leveraging rendering losses and multi-view generative priors in an object-centric frame, then optimize object-centric to world-frame transformations by comparing the rendered outputs against the perceived pixel and optical flow. Finally, we recompose the background and objects and optimize for relative object scales using monocular depth prediction guidance. We show extensive results on the challenging DAVIS, Kubric, and self-captured videos, detail some limitations, and provide future directions. Besides 4D scene generation, our results show that DreamScene4D enables accurate 2D point motion tracking by projecting the inferred 3D trajectories to 2D, while never explicitly trained to do so.
Paper | Project Page | Code (not yet)
7. [CVPR '24] 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
Authors : Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai
Abstract
In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting provides a new representation of the 3D scene, building upon which the 3D geometry could be exploited in learning the complex 3D deformation. Specifically, the scenes are represented as a collection of 3D Gaussians, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them in learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. Extensive experimental results on both synthetic and real datasets prove the superiority of our solution, which achieves new state-of-the-art performance.
Paper | Project Page | Code (not yet)
8. MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos
Authors : Qingming Liu*, Yuan Liu*, Jiepeng Wang, Xianqiang Lv, Peng Wang, Wenping Wang, Junhui Hou†
Abstract
In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms baseline methods by a significant margin.
Paper | Project Page | Code (not yet)
9. [ECCVW '24] Optimizing Dynamic NeRF and 3DGS with No Video Synchronization
Authors : Seoha Kim*, Jeongmin Bae*, Youngsik Yun, HyunSeung Son, Hahyun Lee, Gun Bang, Youngjung Uh
Abstract
Recent advancements in 4D scene reconstruction using dynamic NeRF and 3DGS have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct the dynamic scenes and struggle to fit even the training views in unsynchronized settings. It happens because they employ a single latent embedding for a frame, while the multi-view images at the same frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with the field. By design, our method is applicable for various baselines, even regardless of the types of radiance fields. We conduct experiments on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance of our method.
Paper
10. [NeurIPS '24] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
Authors : Ruijie Zhu*, Yanzhe Liang*, Hanzhi Chang, Jiacheng Deng, Jiahao Lu, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang
Abstract
Dynamic scene reconstruction is a long-term challenge in the field of 3D vision. Recently, the emergence of 3D Gaussian Splatting has provided new insights into this problem. Although subsequent efforts rapidly extend static 3D Gaussian to dynamic scenes, they often lack explicit constraints on object motion, leading to optimization difficulties and performance degradation. To address the above issues, we propose a novel deformable 3D Gaussian splatting framework called MotionGS, which explores explicit motion priors to guide the deformation of 3D Gaussians. Specifically, we first introduce an optical flow decoupling module that decouples optical flow into camera flow and motion flow, corresponding to camera movement and object motion respectively. Then the motion flow can effectively constrain the deformation of 3D Gaussians, thus simulating the motion of dynamic objects. Additionally, a camera pose refinement module is proposed to alternately optimize 3D Gaussians and camera poses, mitigating the impact of inaccurate camera poses. Extensive experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods and exhibits significant superiority in both qualitative and quantitative results.
Paper | Project Page | Code (not yet)
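The split of optical flow into camera flow and motion flow described above can be sketched with plain pinhole geometry: reproject a pixel with known depth through the relative camera pose to get the flow a static point would show, then subtract that from the observed optical flow. This is a generic illustration, not MotionGS's actual module; the function names, the `(fx, fy, cx, cy)` intrinsics tuple, and the per-pixel scalar interface are assumptions for the example.

```python
def camera_flow(u, v, depth, intrinsics, rotation, translation):
    """Flow at pixel (u, v) caused only by camera motion, assuming the
    underlying 3D point is static.

    intrinsics:  (fx, fy, cx, cy) of a pinhole camera.
    rotation:    3x3 nested list, frame-1 camera to frame-2 camera.
    translation: 3 floats, same transform.
    """
    fx, fy, cx, cy = intrinsics

    # Back-project the pixel into the first camera's 3D frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth

    # Move the point into the second camera's frame.
    p = [
        rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
        for i in range(3)
    ]

    # Project into the second image and take the pixel displacement.
    u2 = fx * p[0] / p[2] + cx
    v2 = fy * p[1] / p[2] + cy
    return (u2 - u, v2 - v)


def motion_flow(optical_flow, cam_flow):
    """Residual flow attributed to object motion."""
    return (optical_flow[0] - cam_flow[0], optical_flow[1] - cam_flow[1])


# Hypothetical values: camera translates along x, no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cam = camera_flow(50.0, 50.0, 2.0, (100.0, 100.0, 50.0, 50.0), identity, [0.1, 0.0, 0.0])
residual = motion_flow((7.0, 1.0), cam)
```

In a training loop, the residual would be the signal compared against the projected displacement of the Gaussians, while the camera flow depends only on depth and pose.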
11. [ECCV '24] DGD: Dynamic 3D Gaussians Distillation
Authors : Isaac Labe*, Noam Issachar*, Itai Lang, Sagie Benaim
Abstract
We tackle the task of learning dynamic 3D semantic radiance fields given a single monocular video as input. Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics. This enables the segmentation and tracking of a diverse set of 3D semantic entities, specified using a simple and intuitive interface that includes a user click or a text prompt. To this end, we present DGD, a unified 3D representation for both the appearance and semantics of a dynamic 3D scene, building upon the recently proposed dynamic 3D Gaussians representation. Our representation is optimized over time with both color and semantic information. Key to our method is the joint optimization of the appearance and semantic attributes, which jointly affect the geometric properties of the scene. We evaluate our approach in its ability to enable dense semantic 3D object tracking and demonstrate high-quality results that are fast to render, for a diverse set of scenes.
Paper | Project Page | Code | Short Presentation
12. [NeurIPS '24] Fully Explicit Dynamic Gaussian Splatting
Authors : Junoh Lee, Changyeon Won, HyunJun Jung, Inhwan Bae, Hae-Gon Jeon
Abstract
3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging dense 3D prior and explicit representations. Unfortunately, the benefits of the prior and representation do not involve novel view synthesis for dynamic motions. Ironically, this is because the main barrier is the reliance on them, which requires increasing training and rendering times to account for dynamic motions. In this paper, we design Explicit 4D Gaussian Splatting (Ex4DGS). Our key idea is to firstly separate static and dynamic Gaussians during training, and to explicitly sample positions and rotations of the dynamic Gaussians at sparse timestamps. The sampled positions and rotations are then interpolated to represent both spatially and temporally continuous motions of objects in dynamic scenes as well as reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improves Ex4DGS's convergence. We initially train Ex4DGS using short timestamps and progressively extend timestamps, which makes it work well with a few point clouds. The point-backtracking is used to quantify the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate the state-of-the-art rendering quality from our method, achieving fast rendering of 62 fps on a single 2080Ti GPU.
Paper | Project Page | Code
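To make the idea of sampling positions and rotations at sparse timestamps and interpolating between them concrete, here is a minimal sketch. The abstract does not say which interpolants are used, so this example assumes linear interpolation for positions and spherical linear interpolation (slerp) for unit quaternions in `(w, x, y, z)` order; the function names and keyframe layout are hypothetical.

```python
import math

def lerp(p0, p1, a):
    """Linear interpolation between two vectors."""
    return [x0 + a * (x1 - x0) for x0, x1 in zip(p0, p1)]

def slerp(q0, q1, a):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(x * y for x, y in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: plain lerp avoids dividing by ~0.
        q = lerp(q0, q1, a)
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - a) * theta) / math.sin(theta)
        s1 = math.sin(a * theta) / math.sin(theta)
        q = [s0 * x + s1 * y for x, y in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def sample_pose(key_times, key_positions, key_rotations, t):
    """Pose of one dynamic Gaussian at time t from sparse keyframes.

    key_times must be sorted ascending; t is clamped to their range.
    """
    if t <= key_times[0]:
        return key_positions[0], key_rotations[0]
    if t >= key_times[-1]:
        return key_positions[-1], key_rotations[-1]
    for i in range(len(key_times) - 1):
        t0, t1 = key_times[i], key_times[i + 1]
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return (
                lerp(key_positions[i], key_positions[i + 1], a),
                slerp(key_rotations[i], key_rotations[i + 1], a),
            )


# Hypothetical keyframes: move 2 units along x while turning 90 degrees about z.
half = math.sqrt(0.5)
position, rotation = sample_pose(
    [0.0, 1.0],
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0, 0.0], [half, 0.0, 0.0, half]],
    0.5,
)
```

Storing only keyframes means per-frame cost is a lookup plus one interpolation per dynamic Gaussian, which fits the abstract's point about reducing computational cost; static Gaussians would skip this step entirely.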
13. [3DV '25] EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
Authors : Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang
Abstract
Human activities are inherently complex, often involving numerous object interactions. To better understand these activities, it is crucial to model their interactions with the environment captured through dynamic changes. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand human-object interactions in 3D environments. However, most existing methods for human activity modeling neglect the dynamic interactions with objects, resulting in only static representations. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We leverage the uniquely discrete nature of Gaussian Splatting and segment dynamic interactions from the background, with both having explicit representations. Our approach employs a clip-level online learning pipeline that leverages the dynamic nature of human activities, allowing us to reconstruct the temporal evolution of the scene in chronological order and track rigid object motion. EgoGaussian shows significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art. We also qualitatively demonstrate the high quality of the reconstructed models.
Paper | Project Page | Code (not yet) | Short Presentation
14. 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement
Authors : Ziqi Lu, Jianbo Ye, John Leonard
Abstract
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D changes. Our method can detect changes in cluttered environments using sparse post-change images within as little as 18s, using as few as a single new image. It does not rely on depth input, user instructions, object classes, or object models; an object is recognized simply if it has been re-arranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at https://github.com/520xyxyzq/3DGS-CD.
Paper | Code (not yet)
2023 年:
1. [3DV '24] Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
Authors : Jonathon Luiten, Georgios Kopanas, Bastian Leibe, Deva Ramanan
Abstract
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model dynamic scenes, we allow Gaussians to move and rotate over time while enforcing that they have persistent color, opacity, and size. By regularizing Gaussians' motion and rotation with local rigidity constraints, we show that our Dynamic 3D Gaussians correctly model the same area of physical space over time, including the rotation of that space. Dense 6-DOF tracking and dynamic reconstruction emerge naturally from persistent dynamic view synthesis, without requiring any correspondence or flow as input. We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing. Paper | Project Page | Code | Explanation Video
2. [CVPR '24] Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Authors : Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, Xiaogang Jin
Abstract
Implicit neural representation has opened up new avenues for dynamic scene reconstruction and rendering. Nonetheless, state-of-the-art methods of dynamic neural rendering rely heavily on these implicit representations, which frequently struggle with accurately capturing the intricate details of objects in the scene. Furthermore, implicit methods struggle to achieve real-time rendering in general dynamic scenes, limiting their use in a wide range of tasks. To address these issues, we propose a deformable 3D Gaussian Splatting method that reconstructs scenes using explicit 3D Gaussians and learns Gaussians in canonical space with a deformation field to model monocular dynamic scenes. We also introduce a smoothing training mechanism with no extra overhead to mitigate the impact of inaccurate poses in real datasets on the smoothness of time interpolation tasks. Through differential Gaussian rasterization, the deformable 3D Gaussians achieve not only higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time synthesis, and real-time rendering. Paper | Project Page | Code
3. [CVPR '24] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Authors : Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Tian Qi, Xinggang Wang
Abstract
Representing and rendering dynamic scenes has been an important but challenging task. In particular, it is usually hard to maintain high efficiency while accurately modeling complex motions. We introduce 4D Gaussian Splatting (4D-GS) to achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency. An efficient deformation field is constructed to model both Gaussian motions and shape deformations. Different adjacent Gaussians are connected via a HexPlane to produce more accurate position and shape deformations. Our 4D-GS method achieves real-time rendering under high resolutions, 70 FPS at an 800×800 resolution on an RTX 3090 GPU, while maintaining comparable or higher quality than previous state-of-the-art methods. Paper | Project Page | Code
4. Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
Authors : Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, Li Zhang
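Abstract
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advancements in neural implicit models, limitations persist: (i) Inadequate scene structure: existing methods struggle to reveal the spatial and temporal structure of dynamic scenes from directly learning the complex 6D plenoptic function. (ii) Scaling deformation modeling: explicitly modeling scene element deformation becomes impractical for complex dynamics. To address these issues, we consider the spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple, consisting of 4D Gaussians parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficients of 4D spherindrical harmonics. This approach offers simplicity, flexibility for variable-length video and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency. Paper | Code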
5. [ECCV '24] A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Authors : Kai Katsumata, Duc Minh Vo, Hideki Nakayama
Abstract
In novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerous observations per time step, due to the onus of storing 3D Gaussian parameters per time step. In this study, we present an efficient 3D Gaussian representation tailored for dynamic scenes in which we define positions and rotations as functions of time while leaving other time-invariant properties of the static 3D Gaussian unchanged. Notably, our representation reduces memory usage, which is consistent regardless of the input sequence length. Additionally, it mitigates the risk of overfitting observed frames by accounting for temporal changes. The optimization of our Gaussian representation based on image and flow reconstruction results in a powerful framework for dynamic scene view synthesis in both monocular and multi-view cases. We obtain the highest rendering speed of 118 frames per second (FPS) at a resolution of 1352×1014 with a single GPU, showing the practical usability and effectiveness of our proposed method in dynamic scene rendering scenarios. Paper | Project Page | Code
6. DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Authors : Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis
Abstract
Accurately and efficiently modeling dynamic scenes and motions is considered a challenging task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework, consisting of a tiny set of learned basis queried only in time, allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while at the same time requiring only double the storage compared to static scenes. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene, leading to effective and fast optimization. This is done by binding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, independently control them, and generate novel motion combinations that have never been seen before. We can reach state-of-the-art render quality within just 5 minutes of training, and in less than half an hour we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in monocular and multi-view scenarios. Paper | Project Page | Code (not yet)
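The factorization described in the DynMF abstract can be illustrated with a minimal sketch: each point's displacement at time t is a weighted sum of a few basis trajectories shared by all points. The sinusoidal `basis_trajectories` below is a hypothetical stand-in for the learned, time-queried basis, and the function names and coefficient values are assumptions for illustration rather than the authors' implementation.

```python
import math

def basis_trajectories(t, num_basis):
    """Return `num_basis` 3D offsets at time t (hypothetical fixed basis, not learned)."""
    return [
        (math.sin((k + 1) * t), math.cos((k + 1) * t) - 1.0, 0.0)
        for k in range(num_basis)
    ]

def point_position(canonical_xyz, coeffs, t):
    """Canonical position plus a per-point weighted sum of the shared basis offsets."""
    basis = basis_trajectories(t, len(coeffs))
    return tuple(
        canonical_xyz[d] + sum(c * b[d] for c, b in zip(coeffs, basis))
        for d in range(3)
    )

# A sparse coefficient vector ties this point to a single basis trajectory.
print(point_position((1.0, 2.0, 3.0), [1.0, 0.0, 0.0], math.pi / 2))
```

Because every point shares the same basis, only the small per-point coefficient vector has to be stored on top of the static attributes, and sparse coefficients make it possible to toggle individual motions on or off.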
7. [CVPR '24] Control4D: Efficient 4D Portrait Editing with Text
Authors : Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu
Abstract
We introduce Control4D, an innovative framework for editing dynamic 4D portraits using text instructions. Our method addresses the prevalent challenges in 4D editing, notably the inefficiencies of existing 4D representations and the inconsistent editing effect caused by diffusion-based editors. We first propose GaussianPlanes, a novel 4D representation that makes Gaussian Splatting more structured by applying plane-based decomposition in 3D space and time. This enhances both efficiency and robustness in 4D editing. Furthermore, we propose to leverage a 4D generator to learn a more continuous generation space from inconsistent edited images produced by the diffusion-based editor, which effectively improves the consistency and quality of 4D editing. Comprehensive evaluation demonstrates the superiority of Control4D, including significantly reduced training time, high-quality rendering, and spatial-temporal consistency in 4D portrait editing. Paper | Project Page | Code (not yet)
8. [CVPR '24] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Authors : Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
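Abstract
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6 DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of 3D Gaussians. We employ a deformation MLP to predict time-varying 6 DoF transformations for each control point, which reduces learning complexity, enhances learning ability, and facilitates obtaining temporally and spatially coherent motion patterns. Then, we jointly learn the 3D Gaussians, the canonical space locations of control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the as-rigid-as-possible principle is developed to enforce spatial continuity and local rigidity of learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method can enable user-controlled motion editing while retaining high-fidelity appearances. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserved motion editing applications. Paper | Project Page | Code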
9. [CVPR '24] Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Authors : Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen
Abstract
Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then is used as regularization for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects, maintaining 3D consistency across novel views. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues. Paper
10. [CVPR '24] Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Authors : Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao
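Abstract
We introduce Gaussian-Flow, a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos. In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS). Specifically, a novel Dual-Domain Deformation Model (DDDM) is proposed to explicitly model attribute deformations of each Gaussian point, where the time-dependent residual of each attribute is captured by a polynomial fitting in the time domain and a Fourier series fitting in the frequency domain. The proposed DDDM is capable of modeling complex scene deformations across long video footage, eliminating the need for training a separate 3DGS for each frame or introducing an additional implicit neural field to model 3D dynamics. Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction. Our proposed approach showcases a substantial efficiency improvement, achieving a 5× faster training speed compared to the per-frame 3DGS modeling. In addition, quantitative results demonstrate that the proposed Gaussian-Flow significantly outperforms previous leading methods in novel view rendering quality. Paper | Project Page | Code (not yet)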
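As a rough illustration of the dual-domain idea in the Gaussian-Flow abstract, the sketch below evaluates one scalar attribute at time t as a base value plus a polynomial term and a Fourier-series term. The coefficient layout, the unit base frequency, and the names `dddm_residual` and `attribute_at` are assumptions made for this example; the abstract does not specify them.

```python
import math

def dddm_residual(t, poly_coeffs, fourier_coeffs):
    """Time-dependent residual: polynomial term plus Fourier-series term.

    poly_coeffs[n] multiplies t**(n + 1); fourier_coeffs[k] is a (sin, cos)
    amplitude pair for frequency k + 1. Both layouts are illustrative assumptions.
    """
    poly = sum(a * t ** (n + 1) for n, a in enumerate(poly_coeffs))
    fourier = sum(
        s * math.sin(2 * math.pi * (k + 1) * t)
        + c * math.cos(2 * math.pi * (k + 1) * t)
        for k, (s, c) in enumerate(fourier_coeffs)
    )
    return poly + fourier

def attribute_at(t, base_value, poly_coeffs, fourier_coeffs):
    """Attribute value at time t: static base value plus the fitted residual."""
    return base_value + dddm_residual(t, poly_coeffs, fourier_coeffs)

# One attribute with a quadratic drift and a single periodic component.
print(attribute_at(0.5, 1.0, [2.0, 3.0], [(0.1, 0.0)]))
```

The polynomial part is suited to smooth drift, while the Fourier part covers periodic motion; each Gaussian attribute only needs a handful of coefficients rather than a separate value per frame.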
11. [CVPR '24] CoGS: Controllable Gaussian Splatting
Authors : Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni
Abstract
Capturing and re-animating the 3D structure of articulated objects presents significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons. Firstly, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras, and secondly, they lack controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting, that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS using both synthetic and real-world datasets that include dynamic objects that differ in degree of difficulty. In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity. Paper | Project Page | Code (not yet)
12. GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
Authors : Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao
Abstract
We propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering. Paper | Project Page | Short Presentation
13. [CVPR '24] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Authors : Zhan Li, Zhang Chen, Zhong Li, Yi Xu
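Abstract
Novel view synthesis of dynamic scenes has been an intriguing yet challenging problem. Despite recent advancements, simultaneously achieving high-resolution photorealistic results, real-time rendering, and compact storage remains a formidable task. To address these challenges, we propose Spacetime Gaussian Feature Splatting as a novel dynamic scene representation, composed of three pivotal components. First, we formulate expressive Spacetime Gaussians by enhancing 3D Gaussians with temporal opacity and parametric motion/rotation. This enables Spacetime Gaussians to capture static, dynamic, as well as transient content within a scene. Second, we introduce splatted feature rendering, which replaces spherical harmonics with neural features. These features facilitate the modeling of view- and time-dependent appearance while maintaining small size. Third, we leverage the guidance of training error and coarse depth to sample new Gaussians in areas that are challenging to converge with existing pipelines. Experiments on several established real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and speed, while retaining compact storage. At 8K resolution, our lite-version model can render at 60 FPS on an Nvidia RTX 4090 GPU. Paper | Project Page | Code | Short Presentation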
14. MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes
Authors : Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, Jeffrey Ichnowski
Abstract
Accurate 3D tracking in highly deformable scenes with occlusions and shadows can facilitate new applications in robotics, augmented reality, and generative AI. However, tracking under these conditions is extremely challenging due to the ambiguity that arises with large deformations, shadows, and occlusions. We introduce MD-Splatting, an approach for simultaneous 3D tracking and novel view synthesis, using video captures of a dynamic scene from various camera poses. MD-Splatting builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel view synthesis. MD-Splatting learns a deformation function to project a set of Gaussians with non-metric, thus canonical, properties into metric space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on local rigidity, conservation of momentum, and isometry, which leads to trajectories with smaller trajectory errors. MD-Splatting achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. Compared to state-of-the-art, we improve 3D tracking by an average of 23.9%, while simultaneously achieving high-quality novel view synthesis. With sufficient texture such as in scene 6, MD-Splatting achieves a median tracking error of 3.39 mm on a cloth of 1 x 1 meters in size. Paper | Project Page | Code (not yet)
15. [ECCV '24] SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Authors : Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero
Abstract
Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time in our dynamic interactive viewer. Paper
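The sliding-window partitioning mentioned in the SWinGS abstract can be pictured with a small sketch that splits a frame sequence into consecutive windows, each of which would get its own dynamic model. The fixed `window_size` is a simplifying assumption: the abstract describes an adaptive strategy that chooses window sizes from scene motion, which is not modeled here, and `split_into_windows` is a hypothetical helper name.

```python
def split_into_windows(num_frames, window_size):
    """Partition frame indices [0, num_frames) into consecutive half-open windows."""
    if num_frames < 0 or window_size <= 0:
        raise ValueError("num_frames must be >= 0 and window_size must be > 0")
    return [
        (start, min(start + window_size, num_frames))
        for start in range(0, num_frames, window_size)
    ]

# Each (start, end) pair marks the frames one per-window model would cover.
for start, end in split_into_windows(10, 4):
    print(f"train window on frames {start}..{end - 1}")
```

Smaller windows keep each model's deformation task easy at the cost of training more models, which is the overhead-versus-quality trade-off the abstract refers to.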
16. [CVPR '24] 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Authors : Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing
Abstract
Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods. Paper | Project Page | Code (not yet) | 3DGStream Viewer
Editing:
2024:
1. Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation
Authors : Myrna C. Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue
Abstract
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto it before α blending their color. Following this example, we train a model to also include a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors, and to generate 2D segmentation masks, by projecting the Gaussians on a plane and α blending over their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art. Code and trained models will be released upon acceptance. Paper
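The α blending of per-Gaussian segmentation features mentioned above follows the same front-to-back compositing used for color in 3D Gaussian Splatting. The sketch below shows it for a single pixel, assuming the Gaussians have already been projected, depth-sorted, and reduced to a per-pixel opacity; `alpha_blend` and its list-based inputs are illustrative assumptions, not the authors' code.

```python
def alpha_blend(features, alphas):
    """Front-to-back compositing of per-Gaussian feature vectors for one pixel.

    features: equal-length feature vectors, sorted nearest Gaussian first.
    alphas: per-Gaussian opacity in [0, 1] at this pixel, in the same order.
    """
    if not features:
        return []
    out = [0.0] * len(features[0])
    transmittance = 1.0  # fraction of the pixel not yet covered by nearer Gaussians
    for feat, alpha in zip(features, alphas):
        weight = alpha * transmittance
        for i, value in enumerate(feat):
            out[i] += weight * value
        transmittance *= 1.0 - alpha
    return out

# Two Gaussians with one-hot features: the nearer one dominates the blend.
print(alpha_blend([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]))
```

Swapping color vectors for segmentation feature vectors leaves the compositing unchanged, which is why a rendered 2D feature map can be supervised with 2D masks.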
2. CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians
Authors : Bin Dou, Tianyu Zhang, Yongjia Ma, Zhaohui Wang, Zejian Yuan
Abstract
We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a method for compact 3D-consistent scene segmentation at fast rendering speed with only RGB images as input. Previous NeRF-based 3D segmentation methods have relied on implicit or voxel neural scene representation and ray-marching volume rendering, which are time consuming. Recent 3D Gaussian Splatting significantly improves the rendering speed; however, existing Gaussian-based segmentation methods (e.g., Gaussian Grouping) fail to provide compact segmentation masks, especially in zero-shot segmentation, which is mainly caused by the lack of robustness and compactness of straightforwardly assigning learnable parameters to each Gaussian when encountering inconsistent 2D machine-generated labels. Our method aims to achieve compact and reliable zero-shot scene segmentation swiftly by mapping fused spatial and semantically meaningful features for each Gaussian point with a shallow decoding network. Specifically, our method first optimizes the Gaussian points' position, covariance, and color attributes under the supervision of RGB images. After Gaussian Locating, we distill multi-scale DINO features extracted from images through unprojection to each Gaussian, which are then combined with spatial features from the fast point-feature processing network, i.e., RandLA-Net. Then the shallow decoding MLP is applied to the multi-scale fused features to obtain compact segmentation. Experimental results show that our model can perform high-quality zero-shot scene segmentation, as it outperforms other segmentation methods on both semantic and panoptic segmentation tasks while consuming only approximately 10% of the segmentation time of NeRF-based segmentation. Paper | Project Page | Code (not yet)
3. TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Authors : Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan
Abstract
Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIP-Editor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality and in alignment to the prompts, qualitatively and quantitatively. Paper | Project Page
4. Segment Anything in 3D Gaussians
Authors : Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang
Abstract
3D Gaussian Splatting has emerged as an alternative 3D representation to Neural Radiance Fields (NeRFs), benefiting from its high-quality rendering results and real-time rendering speed. Considering the 3D Gaussian representation remains unparsed, it is necessary first to execute object segmentation within this domain. Subsequently, scene editing and collision detection can be performed, proving vital to a multitude of applications, such as virtual reality (VR), augmented reality (AR), game/movie production, etc. In this paper, we propose a novel approach to achieve object segmentation in 3D Gaussians via an interactive procedure without any training process or learned parameters. We refer to the proposed method as SA-GS, for Segment Anything in 3D Gaussians. Given a set of clicked points in a single input view, SA-GS can generalize SAM to achieve 3D consistent segmentation via the proposed multi-view mask generation and view-wise label assignment methods. We also propose a cross-view label-voting approach to assign labels from different views. In addition, in order to address the boundary roughness issue of segmented objects resulting from the non-negligible spatial sizes of 3D Gaussians located at the boundary, SA-GS incorporates a simple but effective Gaussian Decomposition scheme. Extensive experiments demonstrate that SA-GS achieves high-quality 3D segmentation results, which can also be easily applied for scene editing and collision detection tasks. Paper
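One plausible reading of the cross-view label voting described above is a per-Gaussian majority vote over the labels assigned in each view. The sketch below implements that reading; the majority rule, the dictionary-based input format, and the name `vote_labels` are assumptions for illustration, since the abstract does not spell out the exact voting scheme.

```python
from collections import Counter

def vote_labels(per_view_labels):
    """Pick, for every Gaussian, the label it received in the most views.

    per_view_labels: list of dicts, one per view, mapping a Gaussian id to the
    label assigned in that view (Gaussians not visible in a view are absent).
    """
    votes = {}
    for view in per_view_labels:
        for gaussian_id, label in view.items():
            votes.setdefault(gaussian_id, Counter())[label] += 1
    return {gid: counter.most_common(1)[0][0] for gid, counter in votes.items()}

# Gaussian 0 is labeled as the object (1) in two of three views.
views = [{0: 1, 1: 0}, {0: 1}, {0: 0, 1: 0}]
print(vote_labels(views))
```

Aggregating over views in this way lets a mask that is wrong in one view be outvoted by the others, which is what makes the final 3D labels consistent.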
5. GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting
Authors : Francesco Palandra, Andrea Sanchietti, Daniele Baieri, Emanuele Rodolà
Abstract
We present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models. Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware. We tackle the problem by leveraging Gaussian splatting to represent 3D scenes, and we optimize the model while progressively varying the image supervision by means of a pretrained image-based diffusion model. The input object may be given as a 3D triangular mesh, or directly provided as Gaussians from a generative model such as DreamGaussian. GSEdit ensures consistency across different viewpoints, maintaining the integrity of the original object's information. Compared to previously proposed methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. Our editing process is refined via the application of the SDS loss, ensuring that our edits are both precise and accurate. Our comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following the given textual instructions while preserving their coherence and detail. Paper
6. GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Authors : Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu
Abstract
We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by 3D Gaussian Splatting (3DGS). Our method first renders a collection of images using the 3DGS and edits them using a pre-trained 2D diffusion model (ControlNet) based on the input prompt; the edited images are then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works. It leads to faster editing as well as higher visual quality. This is achieved by two terms: (a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps, and (b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing on several reference views through self and cross-view attention between images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods. Paper
7. View-Consistent 3D Editing with Gaussian Splatting
Authors : Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang
Abstract
The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. Paper
8. Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Authors : Antoine Guédon, Vincent Lepetit
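Abstract
We propose Gaussian Frosting, a novel mesh-based representation for high-quality rendering and editing of complex 3D effects in real-time. Our approach builds on the recent 3D Gaussian Splatting framework, which optimizes a set of 3D Gaussians to approximate a radiance field from images. We propose first extracting a base mesh from Gaussians during optimization, then building and refining an adaptive layer of Gaussians with a variable thickness around the mesh to better capture the fine details and volumetric effects near the surface, such as hair or grass. We call this layer Gaussian Frosting, as it resembles a coating of frosting on a cake. The fuzzier the material, the thicker the frosting. We also introduce a parameterization of the Gaussians to force them to stay inside the frosting layer and automatically adjust their parameters when deforming, rescaling, editing, or animating the mesh. Our representation allows for efficient rendering using Gaussian splatting, as well as editing and animation by modifying the base mesh. We demonstrate the effectiveness of our method on various synthetic and real scenes, and show that it outperforms existing surface-based approaches. We will release our code and a web-based viewer as additional contributions. Paper | Project Page | Code (not yet) | Short Presentation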
9. Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Authors : Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li
Abstract
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Previous approaches have adopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, without the additional training required by NeRFs. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. We explore several applications of Semantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0% mAcc improvement over prior open-vocabulary scene understanding counterparts; object part segmentation, scene editing, and spatial-temporal segmentation with better qualitative results over 2D and 3D baselines, highlighting its versatility and effectiveness on supporting diverse downstream tasks. Paper | Project Page | Code (not yet)
10. EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Authors : Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney
Abstract
In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in ego-centric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets which shows the promise of the method for 3D egocentric perception at scale. Paper | Project Page | Code (not yet)
11. InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
Authors : Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao
Abstract
3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from large-scale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion. Paper | Project Page | Code
12. Gaga: Group Any Gaussians via 3D-aware Memory Bank
Authors : Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract
We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Contrasted to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot segmentation models, significantly enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as scene understanding and manipulation. Paper | Project Page | Code
13. [CVPR W'24] ICE-G: Image Conditional Editing of 3D Gaussian Splats
Authors : Vishnu Jaganathan, Hannah Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira
Abstract
Recently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner. These edited views act as an updated dataset to further train and re-style the 3D scene. The end-result is therefore an edited 3D model. Our framework enables a wide variety of editing tasks such as manual local edits, correspondence based style transfer from any example image, and a combination of different styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique works for other methods such as NeRFs as well. We show through multiple examples that our method produces higher quality results while offering fine grained control of editing. Paper | Project Page | Short Presentation
14. Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks
Authors : Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar
Abstract
In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we found that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics. Preprint | Project Page | Code (Segmentation)
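The voting idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the `contributions` triples are a hypothetical stand-in for per-pixel gradient magnitudes, and the function name and `threshold` parameter are assumptions made for illustration.

```python
def vote_segment(contributions, mask, threshold=0.5):
    """Select Gaussians whose votes fall mostly inside a 2D mask.

    contributions: iterable of (gaussian_id, pixel_index, weight) triples;
        the weight is a hypothetical stand-in for a gradient magnitude.
    mask: sequence of 0/1 values, one per pixel.
    Returns the set of Gaussian ids whose in-mask share of the total
    vote weight exceeds `threshold`.
    """
    inside = {}
    total = {}
    for gid, pix, weight in contributions:
        total[gid] = total.get(gid, 0.0) + weight
        if mask[pix]:
            # Only votes from masked pixels count toward the object.
            inside[gid] = inside.get(gid, 0.0) + weight
    return {
        gid
        for gid, t in total.items()
        if t > 0 and inside.get(gid, 0.0) / t > threshold
    }


# Two Gaussians, two pixels; pixel 0 is inside the mask.
votes = [(0, 0, 1.0), (0, 1, 0.2), (1, 1, 0.9), (1, 0, 0.1)]
selected = vote_segment(votes, mask=[1, 0])
```

Here Gaussian 0 receives most of its weight from the masked pixel and is selected, while Gaussian 1 is not.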
2023:
1. [CVPR '24] GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Authors : Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin
Abstract
3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation technique. GaussianEditor enhances precision and control in editing through our proposed Gaussian Semantic Tracing, which traces the editing target throughout the training process. Additionally, we propose hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Paper | Project Page | Code | Short Presentation
2. [CVPR '24] GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Authors : Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian
Abstract
Recently, impressive results have been achieved in 3D scene editing with text instructions based on a 2D diffusion model. However, current diffusion models primarily generate images by predicting noise in the latent space, and the editing is usually applied to the whole image, which makes it challenging to perform delicate, especially localized, editing for 3D scenes. Inspired by recent 3D Gaussian splatting, we propose a systematic framework, named GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text instructions. Benefiting from the explicit property of 3D Gaussians, we design a series of techniques to achieve delicate editing. Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians. The Gaussian RoI is further used to control the editing process. Our framework can achieve more delicate and precise editing of 3D scenes than previous methods while enjoying much faster training speed, i.e., within 20 minutes on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF (45 minutes -- 2 hours). Paper | Project Page | Code (not yet) | Short Presentation
3. Point'n Move: Interactive Scene Object Manipulation on Gaussian Splatting Radiance Fields
Authors : Jiajun Huang, Hongchuan Yu
Abstract
We propose Point'n Move, a method that achieves interactive scene object manipulation with exposed region inpainting. Interactivity here further comes from intuitive object selection and real-time editing. To achieve this, we adopt Gaussian Splatting Radiance Field as the scene representation and fully leverage its explicit nature and speed advantage. Its explicit representation formulation allows us to devise a 2D prompt points to 3D mask dual-stage self-prompting segmentation algorithm, perform mask refinement and merging, minimize change as well as provide good initialization for scene inpainting and perform editing in real-time without per-editing training, all of which leads to superior quality and performance. We test our method by performing editing on both forward-facing and 360 scenes. We also compare our method against existing scene object removal methods, showing superior quality despite being more capable and having a speed advantage. Paper
4. [ECCV'24] Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Authors : Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke
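Abstract
The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during differentiable rendering by leveraging the 2D mask predictions of SAM, along with an introduced 3D spatial consistency regularization. Compared to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization and scene recomposition. Paper | Code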
5. Segment Any 3D Gaussians
Authors : Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian
Abstract
Interactive 3D segmentation in radiance fields is an appealing task since its importance in 3D scene understanding and manipulation. However, existing methods face challenges in either achieving fine-grained, multi-granularity segmentation or contending with substantial computational overhead, inhibiting real-time interaction. In this paper, we introduce Segment Any 3D GAussians (SAGA), a novel 3D interactive segmentation approach that seamlessly blends a 2D segmentation foundation model with 3D Gaussian Splatting (3DGS), a recent breakthrough of radiance fields. SAGA efficiently embeds multi-granularity 2D segmentation results generated by the segmentation foundation model into 3D Gaussian point features through well-designed contrastive training. Evaluation on existing benchmarks demonstrates that SAGA can achieve competitive performance with state-of-the-art methods. Moreover, SAGA achieves multi-granularity segmentation and accommodates various prompts, including points, scribbles, and 2D masks. Notably, SAGA can finish the 3D segmentation within milliseconds, achieving nearly 1000× acceleration compared to previous SOTA. Paper | Project Page | Code
6. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance Fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Paper | Project Page | Code | Short Presentation
7. 2D-Guided 3D Gaussian Segmentation
Authors : Kun Lan, Haoran Li, Haolin Shi, Wenjun Wu, Yong Liao, Lin Wang, Pengyuan Zhou
Abstract
Recently, 3D Gaussian, as an explicit 3D representation method, has demonstrated strong competitiveness over NeRF (Neural Radiance Fields) in terms of expressing complex scenes and training duration. These advantages signal a wide range of applications for 3D Gaussians in 3D understanding and editing. Meanwhile, the segmentation of 3D Gaussians is still in its infancy. The existing segmentation methods are not only cumbersome but also incapable of segmenting multiple objects simultaneously in a short amount of time. In response, this paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision. This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information, while nearest neighbor clustering and statistical filtering refine the segmentation results. Experiments show that our concise method can achieve comparable performances on mIOU and mAcc for multi-object segmentation as previous single-object segmentation methods. Paper
Language Embedding:
2024:
1. [IROS '24] Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot
Authors : Justin Yu, Kush Hari, Kishore Srinivas, Karim El-Refai, Adam Rashid, Chung Min Kim, Justin Kerr, Richard Cheng, Muhammad Zubair Irshad, Ashwin Balakrishna, Thomas Kollar, Ken Goldberg
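Abstract
Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy. Paper | Project Page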
2. [CVPR '24] Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Authors : Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, Shao-Hua Guan
Abstract
Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Paper | Project Page | Code
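To see why quantization reduces memory, consider generic nearest-codebook vector quantization: each Gaussian stores one small integer index instead of a high-dimensional feature. The sketch below shows that general technique only; it is not the paper's dedicated scheme, and `quantize` and the toy codebook are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Storing one integer per Gaussian instead of a full feature vector is
    what saves memory; the codebook is shared across the whole scene.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices, codebook):
    """Recover the approximate features from the stored indices."""
    return [codebook[i] for i in indices]


# Toy 2-D features and a 2-entry codebook (both hypothetical).
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.4, 0.4]]
indices = quantize(features, codebook)
```

In practice the codebook would be learned from the data rather than fixed by hand.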
3. [CVPR '24] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Authors : Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi
Abstract
3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance Fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Paper | Project Page | Code | Short Presentation
4. [CVPR '24] LangSplat: 3D Language Gaussian Splatting
Authors : Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister
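Abstract
Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat advances the field by utilizing a collection of 3D Gaussians, each encoding language features distilled from CLIP, to represent the language field. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features on the scene-specific latent space, thereby alleviating the substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating the need for extensively querying the language field across various scales and for the regularization of DINO features. Extensive experiments on open-vocabulary 3D object localization and semantic segmentation demonstrate that LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin. Notably, LangSplat is extremely efficient, achieving a {speed} × speedup compared to LERF at the resolution of 1440 × 1080. Paper | Project Page | Code | Short Presentation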
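An open-vocabulary query against per-Gaussian language features can be pictured as ranking the Gaussians by similarity to a text embedding. The sketch below uses plain cosine similarity on toy vectors. It is not LangSplat's pipeline, which renders features and works in an autoencoder latent space; the function names and embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_gaussians(features, text_embedding, top_k=1):
    """Return indices of the top_k Gaussians most similar to the query."""
    scored = [(cosine(f, text_embedding), i) for i, f in enumerate(features)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:top_k]]


# Toy 2-D "language features" for three Gaussians and one text query.
features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = query_gaussians(features, text_embedding=[0.0, 1.0], top_k=2)
```

With a real model, `text_embedding` would come from the same encoder family that produced the per-Gaussian features, so that the two live in a comparable space.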
5. SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting
Authors : Xinyi Liu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi
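Abstract
Many recent developments for robots to represent environments have focused on photorealistic reconstructions. This paper particularly focuses on generating sequences of images from photorealistic Gaussian Splatting models that match instructions given by user-inputted language. We contribute a novel framework, SplaTraj, which formulates the generation of images within photorealistic environment representations as a continuous-time trajectory optimization problem. The costs are designed so that a camera following the trajectory poses will smoothly traverse the environment and render the specified spatial information in a photogenic manner. This is achieved by querying a photorealistic representation with language embedding to isolate regions that correspond to the user-specified inputs. These regions are then projected to the camera's view as it moves over time and a cost is constructed. We can then apply gradient-based optimization and differentiate through the rendering to optimize the trajectory for the defined cost. The resulting trajectory moves to photogenically view each of the specified objects. We empirically evaluate our approach on a suite of environments and instructions, and demonstrate the quality of generated image sequences. Paper | Code (not yet) | Short Presentation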
6. FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Authors : Xingxing Zuo, Pouya Samangouei, Yunwen Zhou, Yan Di, Mingyang Li
Abstract
Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present Foundation Model Embedded 3D Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that makes the rendered feature distance of same semantic entities close, following the pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection, despite that we are 851× faster for inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. Paper
7. Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
Authors : Hyunjee Lee*, Youngsik Yun*, Jeongmin Bae, Seoha Kim, Youngjung Uh
Abstract
Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS as follows. 1) We directly supervise the 3D points to train the language embedding field. It achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. Paper | Project Page | Code (not yet)
Mesh Extraction and Physics:
2024:
1. Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting
Authors : Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang
Abstract
We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to Gaussian shader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. Paper | Project Page | Code (not yet) | Short Presentation
2. GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting
Authors : Joanna Waczyńska, Piotr Borycki, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek
Abstract
In recent years, a range of neural network-based methods for image rendering have been introduced. For instance, widely-researched neural radiance fields (NeRF) rely on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning due to the necessity to condition several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, a hybrid of mesh and a Gaussian distribution, that pins all Gaussian splats on the object surface (mesh). The unique contribution of our methods is defining Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain high-quality renders in the real-time generation of high-quality views. Furthermore, we demonstrate that in the absence of a predefined mesh, it is possible to fine-tune the initial mesh during the learning process. Paper | Code
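One way to picture a splat "defined by its location on the mesh" is to store fixed barycentric weights for a Gaussian's center relative to a triangle, then recompute the center from the current vertex positions. The sketch below makes that assumption and covers the position only, not scale or rotation. The function name is hypothetical and this is not presented as the GaMeS implementation.

```python
def gaussian_center(triangle, bary):
    """Center of a Gaussian pinned to a mesh face.

    triangle: three (x, y, z) vertex positions of the face.
    bary: three barycentric weights summing to 1, fixed per Gaussian.
    The center is the weighted sum of the vertices, so moving the
    vertices moves the Gaussian without re-optimizing it.
    """
    return tuple(
        sum(weight * vertex[axis] for weight, vertex in zip(bary, triangle))
        for axis in range(3)
    )


bary = (1 / 3, 1 / 3, 1 / 3)  # centroid of the face
rest_face = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved_face = [(x, y, z + 1.0) for x, y, z in rest_face]  # animate the mesh

center_rest = gaussian_center(rest_face, bary)
center_moved = gaussian_center(moved_face, bary)  # follows the face in z
```

Because the weights stay fixed, any mesh animation carries the Gaussian along with the face.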
3. Mesh-based Gaussian Splatting for Real-time Large-scale Deformation
Authors : Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai
Abstract
Neural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in a real-time fashion. Gaussian Splatting (GS) has recently become a promising method with explicit geometry for representing static scenes and facilitating high-quality and real-time synthesis of novel views. However, it cannot be easily deformed due to the use of discrete Gaussians and lack of explicit topology. To address this, we develop a novel GS-based method that enables interactive deformation. Our key idea is to design an innovative mesh-based GS representation, which is integrated into Gaussian learning and manipulation. 3D Gaussians are defined over an explicit mesh, and they are bound with each other: the rendering of 3D Gaussians guides the mesh face split for adaptive refinement, and the mesh face split directs the splitting of 3D Gaussians. Moreover, the explicit mesh constraints help regularize the Gaussian distribution, suppressing poor-quality Gaussians (e.g., misaligned Gaussians, long-narrow shaped Gaussians), thus enhancing visual quality and avoiding artifacts during deformation. Based on this representation, we further introduce a large-scale Gaussian deformation technique to enable deformable GS, which alters the parameters of 3D Gaussians according to the manipulation of the associated mesh. Our method benefits from existing mesh deformation datasets for more realistic data-driven Gaussian deformation. Extensive experiments show that our approach achieves high-quality reconstruction and effective deformation, while maintaining the promising rendering results at a high frame rate (65 FPS on average). Paper
4. Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
Authors : Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li
Abstract
Reconstructing and simulating elastic objects from visual observations is crucial for applications in computer vision and robotics. Existing methods, such as 3D Gaussians, provide modeling for 3D appearance and geometry but lack the ability to simulate physical properties or optimize parameters for heterogeneous objects. We propose Spring-Gaus, a novel framework that integrates 3D Gaussians with physics-based simulation for reconstructing and simulating elastic objects from multi-view videos. Our method utilizes a 3D Spring-Mass model, enabling the optimization of physical parameters at the individual point level while decoupling the learning of physics and appearance. This approach achieves great sample efficiency, enhances generalization, and reduces sensitivity to the distribution of simulation particles. We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects. This includes future prediction and simulation under varying initial states and environmental parameters. Paper | Project Page | Code (not yet)
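For readers unfamiliar with spring-mass models, the sketch below advances a set of point masses connected by Hookean springs by one semi-implicit Euler step. It is a textbook formulation under simplifying assumptions (uniform stiffness and mass, no damping or gravity), not the Spring-Gaus simulator; all names and parameter values are illustrative.

```python
import math


def spring_step(pos, vel, springs, rest, k, mass, dt):
    """One semi-implicit Euler step of a 3D spring-mass system.

    pos, vel: lists of [x, y, z] per particle.
    springs: list of (i, j) particle index pairs.
    rest: rest length of each spring, same order as `springs`.
    k, mass, dt: stiffness, per-particle mass, and time step.
    """
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for (i, j), rest_len in zip(springs, rest):
        delta = [pos[j][a] - pos[i][a] for a in range(3)]
        length = math.sqrt(sum(c * c for c in delta))
        if length == 0.0:
            continue  # direction undefined for coincident particles
        magnitude = k * (length - rest_len)  # Hooke's law
        for a in range(3):
            forces[i][a] += magnitude * delta[a] / length
            forces[j][a] -= magnitude * delta[a] / length
    # Update velocities first, then positions with the new velocities.
    new_vel = [[v[a] + dt * f[a] / mass for a in range(3)]
               for v, f in zip(vel, forces)]
    new_pos = [[p[a] + dt * nv[a] for a in range(3)]
               for p, nv in zip(pos, new_vel)]
    return new_pos, new_vel


# Two particles joined by a stretched spring (rest length 1, current 2).
pos = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pos, vel = spring_step(pos, vel, springs=[(0, 1)], rest=[1.0],
                       k=10.0, mass=1.0, dt=0.1)
```

After one step the particles have moved toward each other, as expected for a stretched spring.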
5. Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
Authors : Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang
Abstract
3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namely Texture-GS, to disentangle the appearance from the geometry by representing it as a 2D texture mapped onto the 3D surface, thereby facilitating appearance editing. Technically, the disentanglement is achieved by our proposed texture mapping module, which consists of a UV mapping MLP to learn the UV coordinates for the 3D Gaussian centers, a local Taylor expansion of the MLP to efficiently approximate the UV coordinates for the ray-Gaussian intersections, and a learnable texture to capture the fine-grained appearance. Extensive experiments on the DTU dataset demonstrate that our method not only facilitates high-fidelity appearance editing but also achieves real-time rendering on consumer-level devices, e.g., a single RTX 2080 Ti GPU. Paper | Project Page | Code (not yet)
6. DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
Authors : Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala
Abstract
3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints during optimization. We extend 3D Gaussian splatting with depth and normal cues to tackle challenging indoor datasets and showcase techniques for efficient mesh extraction, an important downstream application. Specifically, we regularize the optimization procedure with depth information, enforce local smoothness of nearby Gaussians, and use the geometry of the 3D Gaussians supervised by normal cues to achieve better alignment with the true scene geometry. We improve depth estimation and novel view synthesis results over baselines and show how this simple yet effective regularization technique can be used to directly extract meshes from the Gaussian representation yielding more physically accurate reconstructions on indoor scenes. Paper | Code | Project Page
7. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
Authors : Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao
Abstract
3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-accurate 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering. Paper | Project Page | Code | Short Presentation
7.1 Unofficial Implementation and Specification
Authors : Yunzhou Song, Zixuan Lin, Yexin Zhang
Code
8. Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
Authors : Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang
Abstract
Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, that enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline, to illustrate the challenge and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language. Paper | Project Page | Code (not yet)
9. [ECCV '24] GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Authors : Yaniv Wolf, Amit Bracha, Ron Kimmel
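Abstract
Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Instead of extracting the geometry of the scene directly from the Gaussian properties, we extract the geometry through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while only requiring a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results. Paper | Project Page | Code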
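As a small illustration of the depth-extraction step in the GS2Mesh abstract, the sketch below converts a disparity map from a rectified stereo pair into depth using the standard relation depth = focal length × baseline / disparity. The function name and the sample values are hypothetical, and the pre-trained stereo model that would produce the disparities is not part of this sketch.

```python
def disparity_to_depth(disparity_px, focal_px, baseline):
    """Per-pixel depth for a rectified stereo pair.

    disparity_px: 2D list of disparities in pixels.
    focal_px: focal length in pixels; baseline: camera separation in scene units.
    Pixels with non-positive disparity have no valid depth and map to None.
    """
    return [
        [focal_px * baseline / d if d > 0 else None for d in row]
        for row in disparity_px
    ]

# Hypothetical values: two valid pixels and one invalid pixel.
depth = disparity_to_depth([[10.0, 25.0, 0.0]], focal_px=500.0, baseline=0.1)
```

Because the stereo pairs are rendered from the splatting model, the baseline is a free parameter chosen by whoever renders them rather than a property of the capture rig.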
10. RaDe-GS: Rasterizing Depth in Gaussian Splatting
Authors : Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan
Abstract
Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. Our approach achieves a Chamfer distance error comparable to Neuralangelo [Li et al. 2023] on the DTU dataset and similar training and rendering time as traditional Gaussian Splatting on the Tanks & Temples dataset. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods. Paper | Project Page | Code (not yet)
11. Trim 3D Gaussian Splatting for Accurate Geometry Representation
Authors : Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang
Abstract
In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in previous arts. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. Paper | Project Page | Code
12. Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting
Authors : Junha Hyung, Susung Hong, Sungwon Hwang, Jaeseong Lee, Jaegul Choo, Jin-Hwa Kim
Abstract
3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals, due to the Gaussians converging into anisotropic Gaussians with one dominant variance. We propose using effective rank analysis to examine the shape statistics of 3D Gaussian primitives, and identify the Gaussians indeed converge into needle-like shapes with the effective rank 1. To address this, we introduce effective rank as a regularization, which constrains the structure of the Gaussians. Our new regularization method enhances normal and geometry reconstruction while reducing needle-like artifacts. The approach can be integrated as an add-on module to other 3DGS variants, improving their quality without compromising visual fidelity. Paper | Project Page | Code (not yet)
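To make the notion of effective rank concrete, here is a minimal sketch that computes it for a single 3D Gaussian as the exponential of the Shannon entropy of its normalized covariance spectrum. Treating the squared axis scales as the spectrum is an assumption of this sketch, and the paper's exact formulation and regularization term are not reproduced here.

```python
import math

def effective_rank(scales):
    """Effective rank of a 3D Gaussian given its three axis scales (std devs)."""
    # Assumption: the covariance eigenvalues are the squared scales.
    eig = [s * s for s in scales]
    total = sum(eig)
    probs = [e / total for e in eig]
    # Shannon entropy of the normalized spectrum; terms with p == 0 contribute 0.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)

needle = effective_rank([1.0, 1e-3, 1e-3])  # one dominant axis
disk = effective_rank([1.0, 1.0, 1e-3])     # two dominant axes
ball = effective_rank([1.0, 1.0, 1.0])      # isotropic
```

Under this definition a needle-like Gaussian scores close to 1, a flat disk close to 2, and an isotropic blob close to 3, so a penalty that pushes the value away from 1 discourages needles.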
13. CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Authors : Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, Zhaoxiang Zhang
Abstract
Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis. However, accurately representing surfaces, especially in large and complex scenarios, remains a significant challenge due to the unstructured nature of 3DGS. In this paper, we present CityGaussianV2, a novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency. Building on the favorable generalization capabilities of 2D Gaussian Splatting (2DGS), we address its convergence and scalability issues. Specifically, we implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. To scale up, we introduce an elongation filter that mitigates Gaussian count explosion caused by 2DGS degeneration. Furthermore, we optimize the CityGaussian pipeline for parallel training, achieving up to 10x compression, at least 25% savings in training time, and a 50% decrease in memory usage. We also established standard geometry benchmarks under large-scale scenes. Experimental results demonstrate that our method strikes a promising balance between visual quality, geometric accuracy, as well as storage and training costs. Paper | Project Page | Code (coming soon)
14. [CoRL '24] Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision
Authors : Alberta Longhini, Marcel Büsching, Bardienus Pieter Duisterhof, Jens Lundell, Jeffrey Ichnowski, Mårten Björkman, Danica Kragic
Abstract
We introduce Cloth-Splatting, a method for estimating 3D states of cloth from RGB images through a prediction-update framework. Cloth-Splatting leverages an action-conditioned dynamics model for predicting future states and uses 3D Gaussian Splatting to update the predicted states. Our key insight is that coupling a 3D mesh-based representation with Gaussian Splatting allows us to define a differentiable map between the cloth's state space and the image space. This enables the use of gradient-based optimization techniques to refine inaccurate state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting not only improves state estimation accuracy over current baselines but also reduces convergence time by ~85%. Paper | Project Page | Code
2023:
1. [CVPR '24] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Authors : Tianyi Xie, Zeshun Zong, Yuxin Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang
Abstract
We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, "cage meshes," or any other geometry embedding, highlighting the principle of "what you see is what you simulate (WS²)." Our method demonstrates exceptional versatility across a wide variety of materials, including elastic entities, metals, non-Newtonian fluids, and granular materials, showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. Paper | Project Page | Code | Short Presentation
2. [CVPR '24] SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Authors : Antoine Guédon, Vincent Lepetit
Abstract
We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D Gaussians as these Gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the Gaussians to align well with the surface of the scene. We then introduce a method that exploits this alignment to sample points on the real surface of the scene and extract a mesh from the Gaussians using Poisson reconstruction, in contrast to the Marching Cubes algorithm usually applied to extract meshes from neural SDFs. Finally, we introduce an optional refinement strategy that binds Gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional software by manipulating the mesh instead of the Gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality. Paper | Project Page | Code | Short Presentation
3. NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance
Authors : Hanlin Chen, Chen Li, Gim Hee Lee
Abstract
Existing neural implicit surface reconstruction methods have achieved impressive performance in multi-view 3D reconstruction by leveraging explicit geometry priors such as depth maps or point clouds as regularization. However, the reconstruction results still lack fine details because of the over-smoothed depth map or sparse point cloud. In this work, we propose a neural implicit surface reconstruction pipeline with guidance from 3D Gaussian Splatting to recover highly detailed surfaces. The advantage of 3D Gaussian Splatting is that it can generate dense point clouds with detailed structure. Nonetheless, a naive adoption of 3D Gaussian Splatting can fail since the generated points are the centers of 3D Gaussians that do not necessarily lie on the surface. We thus introduce a scale regularizer to pull the centers close to the surface by enforcing the 3D Gaussians to be extremely thin. Moreover, we propose to refine the point cloud from 3D Gaussian Splatting with the normal priors from the surface predicted by neural implicit models instead of using a fixed set of points as guidance. Consequently, the quality of surface reconstruction improves from the guidance of the more accurate 3D Gaussian splatting. By jointly optimizing the 3D Gaussian Splatting and the neural implicit model, our approach benefits from both representations and generates complete surfaces with intricate details. Experiments on Tanks and Temples verify the effectiveness of our proposed method. Paper
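One plausible form of a scale regularizer that makes Gaussians "extremely thin" is to penalize the smallest of each Gaussian's three scales. The sketch below is an assumption for illustration only; the abstract does not give the exact loss, and the function name and sample scales are hypothetical.

```python
def scale_regularizer(scales_per_gaussian):
    """Mean of the smallest absolute scale over all Gaussians.

    Minimizing this value drives one axis of every Gaussian toward zero,
    flattening it into a thin disk whose center sits near the surface.
    """
    smallest = [min(abs(s) for s in scales) for scales in scales_per_gaussian]
    return sum(smallest) / len(smallest)

# Two hypothetical Gaussians with (sx, sy, sz) scales.
loss = scale_regularizer([[0.5, 0.4, 0.1], [0.3, 0.2, 0.02]])
```

In a training loop this term would be added to the photometric loss with a small weight, so that flattening does not dominate appearance fitting.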
Miscellaneous:
2024:
1. Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting
Authors : Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White
Abstract
The accelerating deployment of spacecraft in orbit has generated interest in on-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. This requires robust characterization of the target's geometry. In this article, we present an approach for mapping geometries of satellites on orbit based on 3D Gaussian Splatting that can run on computing resources available on current spaceflight hardware. We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up under several realistic lighting and motion conditions. Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms. Such on-board capabilities are critical to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks. Paper
2. TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering
Authors : Linus Franke, Darius Rückert, Laura Fink, Marc Stamminger
Abstract
Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [Rückert et al. 2022] can accommodate crisper images, but the neural reconstruction network decreases performance, it grapples with temporal instability and it is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. Paper | Project Page | Code
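The pyramid-layer selection and trilinear write described in the TRIPS abstract can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the mapping from projected size to a continuous level via log2, the pixel-center convention, and the function name are all assumptions.

```python
import math

def trilinear_write_weights(x, y, size_px, num_layers):
    """Weights for splatting one point into a screen-space image pyramid.

    Returns {(layer, ix, iy): weight}; the weights sum to 1. Layer k is
    assumed to have 1 / 2**k of the full resolution. Indices can fall outside
    the image near borders and would need clipping by the caller.
    """
    # Continuous pyramid level chosen from the projected point size.
    level = min(max(math.log2(max(size_px, 1.0)), 0.0), num_layers - 1)
    lo = int(math.floor(level))
    hi = min(lo + 1, num_layers - 1)
    t = level - lo
    weights = {}
    for layer, w_layer in ((lo, 1.0 - t), (hi, t)):
        if w_layer == 0.0:
            continue
        # Position in this layer's pixel grid (pixel centers at i + 0.5).
        lx, ly = x / 2 ** layer - 0.5, y / 2 ** layer - 0.5
        ix, iy = math.floor(lx), math.floor(ly)
        fx, fy = lx - ix, ly - iy
        # Bilinear weights within the layer, scaled by the layer weight.
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            for dy, wy in ((0, 1.0 - fy), (1, fy)):
                key = (layer, ix + dx, iy + dy)
                weights[key] = weights.get(key, 0.0) + w_layer * wx * wy
    return weights

w = trilinear_write_weights(10.3, 7.8, size_px=3.0, num_layers=4)
```

A point of any size therefore touches at most eight pyramid texels (four in each of two adjacent layers), which is what keeps the write cost independent of the point's projected size.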
3. EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
Authors : Lingting Zhu, Zhao Wang, Jiahao Cui, Zhenchao Jin, Guying Lin, Lequan Yu
Abstract
Surgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks. Inspired by 3D Gaussian Splatting, a recent trending 3D representation, we present EndoGS, applying Gaussian Splatting for deformable endoscopic tissue reconstruction. Specifically, our approach incorporates deformation fields to handle dynamic scenes, depth-guided supervision with spatial-temporal weight masks to optimize 3D targets with tool occlusion from a single viewpoint, and surface-aligned regularization terms to capture much better geometry. As a result, EndoGS reconstructs and renders high-quality deformable endoscopic tissues from a single-viewpoint video, estimated depth maps, and labeled tool masks. Experiments on DaVinci robotic surgery videos demonstrate that EndoGS achieves superior rendering quality. Paper | Code
4. EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction
Authors : Yifan Liu, Chenxin Li, Chen Yang, Yixuan Yuan
Abstract
Reconstructing deformable tissues from endoscopic stereo videos is essential in many downstream surgical applications. However, existing methods suffer from slow inference speed, which greatly limits their practical use. In this paper, we introduce EndoGaussian, a real-time surgical scene reconstruction framework that builds on 3D Gaussian Splatting. Our framework represents dynamic surgical scenes as canonical Gaussians and a time-dependent deformation field, which predicts Gaussian deformations at novel timestamps. Due to the efficient Gaussian representation and parallel rendering pipeline, our framework significantly accelerates the rendering speed compared to previous methods. In addition, we design the deformation field as the combination of a lightweight encoding voxel and an extremely tiny MLP, allowing for efficient Gaussian tracking with a minor rendering burden. Furthermore, we design a holistic Gaussian initialization method to fully leverage the surface distribution prior, achieved by searching informative points from across the input image sequence. Experiments on public endoscope datasets demonstrate that our method can achieve real-time rendering speed (195 FPS real-time, 100× gain) while maintaining the state-of-the-art reconstruction quality (35.925 PSNR) and the fastest training speed (within 2 min/scene), showing significant promise for intraoperative surgery applications. Paper | Project Page | Code
5. GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting
Authors : Butian Xiong, Zhuo Li, Zhen Li
Abstract
We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. This dataset offers a unique blend of urban and academic environments for advanced spatial analysis and covers more than 1.5 km². Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combining multi-modal information. Paper
6. LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering
Authors : Sheng Hong, Junjie He, Xinhu Zheng, Hesheng Wang, Hao Fang, Kangcheng Liu, Chunran Zheng, Shaojie Shen
Abstract
We introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multi-modal sensor fused mapping system that builds on the differentiable surface splatting to improve the mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion. This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initial poses for surface Gaussian scenes are obtained using a LiDAR-inertial system with size-adaptive voxels. Then, we optimized and refined the Gaussians by visual-derived photometric gradients to optimize the quality and density of LiDAR measurements. Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes, bolstering structure construction through LiDAR and facilitating real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes potentially for digital twins and virtual reality while also holding potential applicability in real-time SLAM and robotics domains. Paper | Code (not yet)
7. VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
Authors : Ying Jiang, Chang Yu, Tianyi Xie, Xuan Li, Yutao Feng, Huamin Wang, Minchen Li, Henry Lau, Feng Gao, Yin Yang, Chenfanfu Jiang
Abstract
As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. Paper | Project Page
8. Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps
Authors : Timothy Chen, Ola Shorinwa, Weijia Zeng, Joseph Bruno, Philip Dames, Mac Schwager
Abstract
We present Splat-Nav, a navigation pipeline that consists of a real-time safe planning module and a robust state estimation module designed to operate in the Gaussian Splatting (GSplat) environment representation, a popular emerging 3D scene representation from computer vision. We formulate rigorous collision constraints that can be computed quickly to build a guaranteed-safe polytope corridor through the map. We then optimize a B-spline trajectory through this corridor. We also develop a real-time, robust state estimation module by interpreting the GSplat representation as a point cloud. The module enables the robot to localize its global pose with zero prior knowledge from RGB-D images using point cloud alignment, and then track its own pose as it moves through the scene from RGB images using image-to-point cloud localization. We also incorporate semantics into the GSplat in order to obtain better images for localization. All of these modules operate mainly on CPU, freeing up GPU resources for tasks like real-time scene reconstruction. We demonstrate the safety and robustness of our pipeline in both simulation and hardware, where we show re-planning at 5 Hz and pose estimation at 20 Hz, an order of magnitude faster than Neural Radiance Field (NeRF)-based navigation methods, thereby enabling real-time navigation. Paper
9. Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
Authors : Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille
Abstract
X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% training time and over 73× inference speed. The application on sparse-view CT reconstruction also reveals the practical values of our method. Paper
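The two ingredients of the ACUI strategy named in the abstract, camera information derived from scanner parameters and uniform point sampling inside a cuboid, can be sketched in a few lines. The regular grid, the circular source trajectory in the z = 0 plane, and all names and values below are assumptions for illustration, not details taken from the paper.

```python
import math

def cuboid_grid(size_xyz, counts):
    """Evenly spaced points inside an origin-centered cuboid.

    size_xyz: cuboid edge lengths; counts: number of samples per axis.
    Points are placed at cell centers, so all lie strictly inside the cuboid.
    """
    axes = []
    for size, n in zip(size_xyz, counts):
        step = size / n
        axes.append([-size / 2 + (i + 0.5) * step for i in range(n)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]

def source_position(angle_deg, source_to_origin):
    """X-ray source position on a circle around the object (assumed geometry)."""
    a = math.radians(angle_deg)
    return (source_to_origin * math.cos(a), source_to_origin * math.sin(a), 0.0)

points = cuboid_grid((2.0, 2.0, 1.0), (4, 4, 2))  # hypothetical cuboid
camera = source_position(90.0, 5.0)               # hypothetical scanner angle
```

Since the initialization needs only the scanner geometry, it does not depend on a structure-from-motion step to supply initial points and poses.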
10. ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Authors : Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang
Abstract
Performing language-conditioned robotic manipulation tasks in unstructured environments is highly demanded for general intelligent robots. Conventional robotic manipulation methods usually learn semantic representation of the observation for action prediction, which ignores the scene-level spatiotemporal dynamics for human goal completion. In this paper, we propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction. Specifically, we first formulate the dynamic Gaussian Splatting framework that infers the semantics propagation in the Gaussian embedding space, where the semantic representation is leveraged to predict the optimal robot action. Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction. We evaluate our ManiGaussian on 10 RLBench tasks with 166 variations, and the results demonstrate our framework can outperform the state-of-the-art methods by 13.1% in average success rate. Paper | Project Page | Code
11. GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Authors : Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng, Jun Zhang
Abstract
Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3× lower GPU memory usage and 5× faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 1000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. Paper
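A minimal sketch of rendering by accumulated summation with 8-parameter 2D Gaussians is given below. It assumes the 8 parameters split into 2 for position, 3 for a Cholesky factor of the covariance, and 3 for color, with any opacity folded into the color; this parameterization and the function name are assumptions of the sketch, and the loops are written for clarity rather than speed.

```python
import math

def render(gaussians, width, height):
    """Sum the contributions of 2D Gaussians at every pixel.

    gaussians: list of (mx, my, l1, l2, l3, r, g, b), where the covariance is
    Sigma = L L^T with L = [[l1, 0], [l2, l3]] (l1 and l3 must be non-zero).
    Returns a height x width grid of [r, g, b] lists. No sorting or alpha
    compositing is involved, so the order of the Gaussians does not matter.
    """
    image = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for mx, my, l1, l2, l3, r, g, b in gaussians:
        # Entries of Sigma = [[a, c], [c, d]] and its determinant.
        a, c, d = l1 * l1, l1 * l2, l2 * l2 + l3 * l3
        det = a * d - c * c
        for y in range(height):
            for x in range(width):
                dx, dy = x - mx, y - my
                # Quadratic form (dx, dy) Sigma^{-1} (dx, dy)^T.
                q = (d * dx * dx - 2.0 * c * dx * dy + a * dy * dy) / det
                w = math.exp(-0.5 * q)
                pixel = image[y][x]
                pixel[0] += w * r
                pixel[1] += w * g
                pixel[2] += w * b
    return image

img = render([(2.0, 1.0, 1.0, 0.0, 1.0, 0.8, 0.4, 0.2)], width=5, height=3)
```

Dropping depth ordering is what distinguishes this from the alpha blending used in 3D Gaussian Splatting: each pixel is a plain weighted sum, which suits a single flat image.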
12. GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
Authors : Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang
Abstract
Constructing a 3D scene capable of accommodating open-ended language queries is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g., NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present the GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. Paper | Code (not yet)
13. Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience
Authors : Xiaohang Yu, Zhengxian Yang, Shi Pan, Yuqi Han, Haoxiang Wang, Jun Zhang, Shi Yan, Borong Lin, Lei Yang, Tao Yu, Lu Fang
Abstract
We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRNet, NeRF, and 3D Gaussian Splatting. More importantly, the collected dataset, which is much denser than existing datasets, may also inspire space-oriented light field reconstruction, which is potentially different from object-centric 3D reconstruction, for immersive VR/AR experiences. We utilized a total of 40 GoPro 10 cameras, capturing images of 5k resolution. The number of photos captured for each scene is no less than 1000, and the average density (view number within a unit sphere) is 134.68. It is also worth noting that our system is capable of efficiently capturing large outdoor scenes. Addressing the current lack of large-space and dense light field datasets, we made efforts to include elements such as sky, reflections, lights and shadows that are of interest to researchers in the field of 3D reconstruction during the data capture process. Finally, we validated the effectiveness of our provided dataset on three popular algorithms and also integrated the reconstructed 3DGS results into the Unity engine, demonstrating the potential of utilizing our datasets to enhance the realism of virtual reality (VR) and create feasible interactive spaces. Paper
14. Modeling uncertainty for Gaussian Splatting
Authors: Luca Savant, Diego Valsesia, Enrico Magli
Abstract
We present Stochastic Gaussian Splatting (SGS): the first framework for uncertainty estimation using Gaussian Splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of Neural Radiance Fields (NeRF). However, contrary to the latter, it still lacks the ability to provide information about the confidence associated with its outputs. To address this limitation, in this paper, we introduce a Variational Inference-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. Additionally, we introduce the Area Under Sparsification Error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on the LLFF dataset demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications. Paper
15. TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
Authors: Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu
Abstract
Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined. This enables us to render the 2D DSA image at that specific moment. Additionally, we introduce a smoothness loss term in the loss function to mitigate overfitting issues that may arise in the model when dealing with sparse view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. Paper
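The per-Gaussian opacity offset table described in the TOGS abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which interpolation scheme is used or how the offset is combined with the base opacity, so this sketch uses linear interpolation, an additive offset, and clamping to [0, 1]. The function names are hypothetical and are not taken from the TOGS code.

```python
from bisect import bisect_right


def interpolate_opacity_offset(table_times, table_offsets, t):
    """Linearly interpolate one Gaussian's opacity offset table at time t.

    table_times must be sorted in increasing order. Linear interpolation is
    an assumption; the abstract only says the table is interpolated.
    """
    if not table_times or len(table_times) != len(table_offsets):
        raise ValueError("table_times and table_offsets must be non-empty and equal length")
    # Outside the sampled range, hold the nearest table entry.
    if t <= table_times[0]:
        return table_offsets[0]
    if t >= table_times[-1]:
        return table_offsets[-1]
    hi = bisect_right(table_times, t)  # first entry strictly later than t
    lo = hi - 1
    t0, t1 = table_times[lo], table_times[hi]
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * table_offsets[lo] + w * table_offsets[hi]


def opacity_at(base_opacity, table_times, table_offsets, t):
    """Time-dependent opacity: base value plus interpolated offset (assumed additive)."""
    value = base_opacity + interpolate_opacity_offset(table_times, table_offsets, t)
    return min(1.0, max(0.0, value))  # keep the result a valid opacity
```

With a three-entry table at times 0, 1 and 2, querying halfway between two entries returns the average of their offsets, which is then added to the Gaussian's base opacity before rendering the frame for that moment.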
16. GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis
Authors: Emmanouil Nikolakakis, Utkarsh Gupta, Jonathan Vengosh, Justin Bui, Razvan Marinescu
Abstract
We present GaSpCT, a novel view synthesis and 3D scene representation method used to generate novel projection views for Computed Tomography (CT) scans. We adapt the Gaussian Splatting framework to enable novel view synthesis in CT based on limited sets of 2D image projections and without the need for Structure from Motion (SfM) methodologies. Therefore, we reduce the total scanning duration and the amount of radiation dose the patient receives during the scan. We adapted the loss function to our use-case by encouraging a stronger background and foreground distinction using two sparsity promoting regularizers: a beta loss and a total variation (TV) loss. Finally, we initialize the Gaussian locations across the 3D space using a uniform prior distribution of where the brain's positioning would be expected to be within the field of view. We evaluate the performance of our model using brain CT scans from the Parkinson's Progression Markers Initiative (PPMI) dataset and demonstrate that the rendered novel views closely match the original projection views of the simulated scan, and have better performance than other implicit 3D scene representation methodologies. Furthermore, we empirically observe reduced training time compared to neural network based image synthesis for sparse-view CT image reconstruction. Finally, the memory requirements of the Gaussian Splatting representations are reduced by 17% compared to the equivalent voxel grid image representations. Paper
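To make the total variation (TV) regularizer mentioned in the GaSpCT abstract concrete, here is a minimal sketch of an anisotropic TV penalty on a 2D image stored as a list of rows. This is a generic textbook form, not the GaSpCT implementation: the exact TV variant, its weighting, and the form of the accompanying beta loss are not given in the abstract, so only the TV term is sketched and the function name is hypothetical.

```python
def total_variation(image):
    """Anisotropic total variation of a 2D image (list of equal-length rows).

    Sums the absolute differences between each pixel and its right and
    lower neighbours. Flat regions contribute nothing, so minimizing this
    term favours piecewise-constant renderings with few transitions.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:  # vertical neighbour
                tv += abs(image[r + 1][c] - image[r][c])
    return tv
```

A constant image scores zero, a single clean edge scores in proportion to its length, and a checkerboard scores highest, which is why a term like this pushes background pixels toward a uniform value.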
17. Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
Authors: Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla
Abstract
Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view (360° viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance). Paper
18. Dual-Camera Smooth Zoom on Mobile Phones
Authors: Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo
Abstract
When zooming between dual cameras on a mobile phone, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground-truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task. Paper
19. Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction
Authors: Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano
Abstract
Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with an 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100X faster rendering and more than 10X shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer. Paper
20. SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera
Authors: Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang
Abstract
One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects with backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors. We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. Paper
21. [CVPR '24] SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection
Authors: Mathis Kruse, Marco Rudolph, Dominik Woiwode, Bodo Rosenhahn
Abstract
Detecting anomalies in images has become a well-explored problem in both academia and industry. State-of-the-art algorithms are able to detect defects in increasingly difficult settings and data modalities. However, most current methods are not suited to address 3D objects captured from differing poses. While solutions using Neural Radiance Fields (NeRFs) have been proposed, they suffer from excessive computation requirements, which hinder real-world usability. For this reason, we propose the novel 3D Gaussian splatting-based framework SplatPose which, given multi-view images of a 3D object, accurately estimates the pose of unseen views in a differentiable manner, and detects anomalies in them. We achieve state-of-the-art results in both training and inference speed, and detection performance, even when using less training data than competing methods. We thoroughly evaluate our framework using the recently proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly detection (MAD) data set. Paper | Code
22. Reinforcement Learning with Generalizable Gaussian Splatting
Authors: Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu
Abstract
An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations contain several drawbacks. They either cannot describe complex local geometries, cannot generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL. Paper
23. DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark
Authors: Tianyi Zhang, Kaining Huang, Weiming Zhi, Matthew Johnson-Roberson
Abstract
Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments. Paper | Code | Short Presentation | Short Presentation (Bilibili)
24. Adversarial Generation of Hierarchical Gaussians for 3D Generative Model
Authors: Sangeek Hyun, Jae-Pil Heo
Abstract
Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that ours achieves a significantly faster rendering speed (x100) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability. Paper | Project Page
25. Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting
Authors: Matthew Strong, Boshu Lei, Aiden Swann, Wen Jiang, Kostas Daniilidis, Monroe Kennedy III
Abstract
We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in both a photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes. Paper | Project Page | Code
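The "Pearson depth" loss named in the Next Best Sense abstract is not defined there. A common form of such a loss, assumed here, is one minus the Pearson correlation between rendered depth and a reference depth over the same pixels, which makes the penalty insensitive to the unknown scale and offset of the reference. The sketch below follows that assumption; the function names are hypothetical and the paper's actual formulation may differ.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def pearson_depth_loss(rendered_depth, reference_depth):
    """Assumed loss form: 0 when depths are perfectly correlated, 2 when anti-correlated."""
    return 1.0 - pearson_correlation(rendered_depth, reference_depth)
```

Because correlation ignores positive affine rescaling, a reference depth that is only known up to scale and shift (as with monocular depth predictors) still yields a loss near zero when the rendered depth has the right ordering and relative spacing.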
26. Radiance Fields for Robotic Teleoperation
Authors: Maximum Wilder-Smith, Vaishakh Patil, Marco Hutter
Abstract
Radiance field methods such as Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS) have revolutionized graphics and novel view synthesis. Their ability to synthesize new viewpoints with photo-realistic quality, as well as capture complex volumetric and specular scenes, makes them an ideal visualization for robotic teleoperation setups. Direct camera teleoperation provides high-fidelity operation at the cost of maneuverability, while reconstruction-based approaches offer controllable scenes with lower fidelity. With this in mind, we propose replacing the traditional reconstruction-visualization components of the robotic teleoperation pipeline with online Radiance Fields, offering highly maneuverable scenes with photorealistic quality. As such, there are three main contributions to state of the art: (1) online training of Radiance Fields using live data from multiple cameras, (2) support for a variety of radiance methods including NeRF and 3DGS, (3) visualization suite for these methods including a virtual reality scene. To enable seamless integration with existing setups, these components were tested with multiple robots in multiple configurations and were displayed using traditional tools as well as the VR headset. The results across methods and robots were compared quantitatively to a baseline of mesh reconstruction, and a user study was conducted to compare the different visualization methods. Paper | Project Page | Code
2023:
1. [ECCV '24] FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information
Authors: Wen Jiang, Boshu Lei, Kostas Daniilidis
Abstract
This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the limited availability of 2D images poses uncertainties stemming from occlusions, depth ambiguities, and imaging errors. Efficiently selecting informative views becomes crucial, and quantifying NeRF model uncertainty presents intricate challenges. Existing approaches either depend on model architecture or are based on assumptions regarding density distributions that are not generally applicable. By leveraging Fisher Information, we efficiently quantify observed information within Radiance Fields without ground truth data. This can be used for the next best view selection and pixel-wise uncertainty quantification. Our method overcomes existing limitations on model architecture and effectiveness, achieving state-of-the-art results in both view selection and uncertainty quantification, demonstrating its potential to advance the field of Radiance Fields. Our method with the 3D Gaussian Splatting backend could perform view selections at 70 fps. Paper | Project Page | Code
2. Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering
Authors: Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, Li Zhang
Abstract
Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent representation learning with sparse training data, we introduce a novel flow-based temporal smoothing mechanism and a position-aware adaptive control strategy. Extensive experiments on Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes. Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits 50/6000-fold acceleration in training/rendering over the best alternative. Paper | Project Page | Code (not yet)
3. MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians
Authors: Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar
Abstract
Understanding how we grasp objects with our hands has important applications in areas like robotics and mixed reality. However, this challenging problem requires accurate modeling of the contact between hands and objects. To capture grasps, existing methods use skeletons, meshes, or parametric models that can cause misalignments resulting in inaccurate contacts. We present MANUS, a method for Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians. We build a novel articulated 3D Gaussians representation that extends 3D Gaussian splatting for high-fidelity representation of articulating hands. Since our representation uses Gaussian primitives, it enables us to efficiently and accurately estimate contacts between the hand and the object. For the most accurate results, our method requires tens of camera views that current datasets do not provide. We therefore build MANUS-Grasps, a new dataset that contains hand-object grasps viewed from 53 cameras across 30+ scenes, 3 subjects, and comprising over 7M frames. In addition to extensive qualitative results, we also show that our method outperforms others on a quantitative contact evaluation method that uses paint transfer from the object to the hand. Paper
4. [CVPR '24] Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Authors: Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, Song-Hai Zhang
Abstract
Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering rendering quality superior to that of explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. Paper | Project Page | Code
5. Mathematical Supplement for the gsplat Library
Authors: Vickie Ye, Angjoo Kanazawa
Abstract
This report provides the mathematical details of the gsplat library, a modular toolbox for efficient differentiable Gaussian splatting, as proposed by Kerbl et al. It provides a self-contained reference for the computations involved in the forward and backward passes of differentiable Gaussian splatting. To facilitate practical usage and development, we provide a user-friendly Python API that exposes each component of the forward and backward passes in rasterization of [gsplat](https://github.com/nerfstudio-project/gsplat). Paper
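As one example of the forward-pass computations such a reference covers, the sketch below builds a 3D covariance from a rotation quaternion and per-axis scales, using the parameterization Σ = R S Sᵀ Rᵀ from Kerbl et al. It is plain Python written for illustration only: it does not call or mirror the gsplat API, and the function names and the (w, x, y, z) quaternion ordering are assumptions.

```python
import math


def quat_to_rotmat(quat):
    """Rotation matrix from a quaternion (w, x, y, z); the input is normalized first."""
    w, x, y, z = quat
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero quaternion has no rotation")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_3d(quat, scale):
    """Covariance Sigma = R S S^T R^T for rotation R and diagonal scale S."""
    rot = quat_to_rotmat(quat)
    # M = R @ diag(scale): scale each column of R by the matching axis length.
    m = [[rot[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M @ M^T, which is symmetric positive semi-definite by construction.
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
```

Storing a rotation and a scale instead of the six free entries of Σ keeps the covariance valid under gradient descent, since any quaternion and any scale vector map to a positive semi-definite matrix.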
6. PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation
Authors: Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae
Paper | Project Page | Code (not yet)
Regularization and Optimization:
2024:
1. DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines
Authors: Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Nandita Vijaykumar
Abstract
Differentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (e.g., 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high quality photo-realistic imagery at high speeds from these learned 3D models. These methods have been demonstrated to be very promising, providing state-of-the-art quality for many important tasks. However, training a model to represent a scene is still a time-consuming task even when using powerful GPUs. In this work, we observe that the gradient computation phase during training is a significant bottleneck on GPUs due to the large number of atomic operations that need to be processed. These atomic operations overwhelm atomic units in the L2 partitions, causing stalls. To address this challenge, we leverage the observations that during the gradient computation: (1) for most warps, all threads atomically update the same memory locations; and (2) warps generate varying amounts of atomic traffic (since some threads may be inactive). We propose DISTWAR, a software approach to accelerate atomic operations based on two key ideas: First, we enable warp-level reduction of threads at the SM sub-cores using registers to leverage the locality in intra-warp atomic updates. Second, we distribute the atomic computation between the warp-level reduction at the SM and the L2 atomic units to increase the throughput of atomic computation. Warps with many threads performing atomic updates to the same memory locations are scheduled at the SM, and the rest using L2 atomic units. We implement DISTWAR using existing warp-level primitives. We evaluate DISTWAR on widely used raster-based differentiable rendering workloads. We demonstrate significant speedups of 2.44x on average (up to 5.7x). Paper
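The two ideas in the DISTWAR abstract can be illustrated with a small CPU-side simulation. This is not the authors' implementation, which runs on the GPU with warp-level primitives; it is a Python model that only counts how many atomic updates would reach memory. The `threshold` parameter, the function name, and the representation of a warp as a pair of lists (with `None` marking an inactive thread) are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def accumulate_gradients(warps, threshold=8):
    """Simulate gradient accumulation and count atomic updates issued to memory.

    warps: list of (addresses, grads) pairs, one entry per thread in a warp.
    An address of None marks an inactive thread.
    threshold: hypothetical cut-off; if at least this many threads in a warp
    target one address, the warp reduces locally before touching memory.
    """
    memory = defaultdict(float)
    atomic_ops = 0
    for addresses, grads in warps:
        active = [(a, g) for a, g in zip(addresses, grads) if a is not None]
        if not active:
            continue
        counts = Counter(addr for addr, _ in active)
        if max(counts.values()) >= threshold:
            # Warp-level reduction: sum per address "in registers" first,
            # then issue a single atomic add per distinct address.
            partial = defaultdict(float)
            for addr, grad in active:
                partial[addr] += grad
            for addr, total in partial.items():
                memory[addr] += total
                atomic_ops += 1
        else:
            # Little intra-warp locality: send each update to the atomic unit.
            for addr, grad in active:
                memory[addr] += grad
                atomic_ops += 1
    return dict(memory), atomic_ops
```

For a warp whose 32 threads all update the same address, the local reduction turns 32 atomic updates into one, while a sparse warp with four scattered updates is left on the direct path. The accumulated values are the same either way; only the traffic to the atomic units changes.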
2. [CVPR '24] FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Authors: Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing
Abstract
3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently. Paper
3. RAIN-GS: Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting
Authors: Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim
Abstract
3D Gaussian splatting (3DGS) has recently demonstrated impressive capabilities in real-time novel view synthesis and 3D reconstruction. However, 3DGS heavily depends on the accurate initialization derived from Structure-from-Motion (SfM) methods. When trained with randomly initialized point clouds, 3DGS often fails to maintain its ability to produce high-quality images, undergoing large performance drops of 4-5 dB in PSNR in general. Through extensive analysis of SfM initialization in the frequency domain and analysis of a 1D regression task with multiple 1D Gaussians, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate INitialization Constraint for 3D Gaussian Splatting) that successfully trains 3D Gaussians from randomly initialized point clouds. We show the effectiveness of our strategy through quantitative and qualitative comparisons on standard datasets, largely improving the performance in all settings. Paper | Project Page | Code
4. A New Split Algorithm for 3D Gaussian Splatting
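Authors: Qiyuan Feng, Gengchen Cao, Haoxiang Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu
Abstract
3D Gaussian splatting models, as a novel explicit 3D representation, have recently been applied in many domains, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to flatten large untextured regions, yielding a very sparse point cloud. These problems are caused by the non-uniform nature of 3D Gaussian splatting models, so in this paper, we propose a new 3D Gaussian splitting algorithm, which can produce a more uniform and surface-bounded 3D Gaussian splatting model. Our algorithm splits an N-dimensional Gaussian into two N-dimensional Gaussians. It ensures consistency of mathematical characteristics and similarity of appearance, making the resulting 3D Gaussian splatting models more uniform and a better fit to the underlying surface, and thus more suitable for explicit editing, point cloud extraction and other tasks. Meanwhile, our 3D Gaussian splitting approach has a very simple closed-form solution, making it readily applicable to any 3D Gaussian model.
Paper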
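To make the idea of splitting one N-dimensional Gaussian into two more concrete, here is a minimal sketch of a generic moment-preserving split. It is not the closed-form solution from the paper, which the abstract does not spell out. The function `split_gaussian` and its spread parameter `a` are hypothetical; the sketch only relies on the standard fact that an equal-weight mixture of two Gaussians with means `mean ± d` and covariance `cov - d dᵀ` has the same mean and covariance as the original.

```python
import numpy as np

def split_gaussian(mean, cov, a=0.5):
    """Split one Gaussian into two equal-weight Gaussians along its
    principal axis so that the pair keeps the original mean and covariance.

    a in (0, 1) is a hypothetical spread parameter (not from the paper).
    """
    eigvals, eigvecs = np.linalg.eigh(cov)
    lam, v = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and its direction
    d = a * np.sqrt(lam) * v               # offset of each child mean
    child_cov = cov - np.outer(d, d)       # shrink only along v
    return (mean - d, child_cov), (mean + d, child_cov)

mean = np.array([1.0, 2.0, 3.0])
cov = np.diag([4.0, 1.0, 0.25])
(m1, c1), (m2, c2) = split_gaussian(mean, cov)

# Moments of the 50/50 mixture of the two children.
mix_mean = 0.5 * (m1 + m2)
mix_cov = (0.5 * (c1 + np.outer(m1 - mix_mean, m1 - mix_mean))
           + 0.5 * (c2 + np.outer(m2 - mix_mean, m2 - mix_mean)))
```

Because `a < 1`, the child covariance stays positive definite, and the mixture moments match the input, which is one simple way to keep "mathematical characteristics" consistent across a split.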
5. Revising Densification in Gaussian Splatting
Authors: Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Abstract
In this paper, we address the limitations of Adaptive Density Control (ADC) in 3D Gaussian Splatting (3DGS), a scene representation method achieving high-quality, photorealistic results for novel view synthesis. ADC has been introduced for automatic 3D point primitive management, controlling densification and pruning, however, with certain limitations in the densification logic. Our main contribution is a more principled, pixel-error driven formulation for density control in 3DGS, leveraging an auxiliary, per-pixel error function as the criterion for densification. We further introduce a mechanism to control the total number of primitives generated per scene and correct a bias in the current opacity handling strategy of ADC during cloning operations. Our approach leads to consistent quality improvements across a variety of benchmark scenes, without sacrificing the method's efficiency.
Paper
2023:
1. [CVPRW '24] Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images
Authors: Jaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee
Abstract
In this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide to mitigate overfitting. We obtained the depth map using a pre-trained monocular depth estimation model and aligning the scale and offset using sparse COLMAP feature points. The adjusted depth aids in the color-based optimization of 3D Gaussian splatting, mitigating floating artifacts, and ensuring adherence to geometric constraints. We verify the proposed method on the NeRF-LLFF dataset with varying numbers of few images. Our approach demonstrates robust geometry compared to the original method that relies solely on images.
Paper | Project Page | Code
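The scale-and-offset alignment mentioned in the abstract can be sketched as an ordinary least-squares fit. This is an illustrative assumption, not the authors' code: the function `align_depth`, the dense `valid` mask, and the choice to fit depth directly (rather than, say, inverse depth) are all hypothetical.

```python
import numpy as np

def align_depth(mono_depth, sparse_depth, valid):
    """Fit scale s and offset t so that s * mono_depth + t matches the
    sparse depths in a least-squares sense, then apply them everywhere.

    mono_depth:   dense (H, W) depth from a monocular estimator.
    sparse_depth: (H, W) array holding SfM depths where `valid` is True.
    """
    x = mono_depth[valid]
    y = sparse_depth[valid]
    A = np.stack([x, np.ones_like(x)], axis=1)   # columns: [depth, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * mono_depth + t, s, t

rng = np.random.default_rng(0)
true_depth = rng.uniform(1.0, 5.0, size=(8, 8))
mono = (true_depth - 0.3) / 2.0          # right shape, wrong scale and offset
valid = rng.random((8, 8)) < 0.25        # pretend only some pixels have SfM points
valid[0, 0] = valid[7, 7] = True         # guarantee at least two constraints
aligned, s, t = align_depth(mono, true_depth, valid)
```

In this synthetic setup the fit recovers the scale 2.0 and offset 0.3 that were used to distort the depth, so `aligned` matches `true_depth` at every pixel, not only at the sparse points.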
2. EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Authors: Sharath Girish, Kamal Gupta, Abhinav Shrivastava
Abstract
Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach results in scene representations with fewer Gaussians and quantized representations, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce memory by more than an order of magnitude all while maintaining the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x less memory and faster training/inference speed.
Paper | Project Page | Code
3. [CVPR '24] COLMAP-Free 3D Gaussian Splatting
Authors: Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang
Abstract
While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
Paper | Project Page | Code (not yet) | Short Presentation
4. iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching
Authors: Yuan Sun, Xuan Wang, Yunfan Zhang, Jie Zhang, Caigui Jiang, Yu Guo, Fei Wang
Abstract
We present a method named iComMa to address the 6D pose estimation problem in computer vision. The conventional pose estimation methods typically rely on the target's CAD model or necessitate specific network training tailored to particular object classes. Some existing methods address mesh-free 6D pose estimation by employing the inversion of a Neural Radiance Field (NeRF), aiming to overcome the aforementioned constraints. However, it still suffers from adverse initializations. By contrast, we model the pose estimation as the problem of inverting the 3D Gaussian Splatting (3DGS) with both the comparing and matching loss. In detail, a render-and-compare strategy is adopted for the precise estimation of poses. Additionally, a matching module is designed to enhance the model's robustness against adverse initializations by minimizing the distances between 2D keypoints. This framework systematically incorporates the distinctive characteristics and inherent rationale of render-and-compare and matching-based approaches. This comprehensive consideration equips the framework to effectively address a broader range of intricate and challenging scenarios, including instances with substantial angular deviations, all while maintaining a high level of prediction accuracy. Experimental results demonstrate the superior precision and robustness of our proposed jointly optimized framework when evaluated on synthetic and complex real-world data in challenging scenarios.
Paper | Code
Rendering:
2024:
1. [CVPR '24] Gaussian Shadow Casting for Neural Characters
Authors: Luis Bolanos, Shih-Yang Su, Helge Rhodin
Abstract
Neural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density proxy that replaces sampling with a simple analytic formula. It supports dynamic motion and is tailored for shadow computation, thereby avoiding the affine projection approximation and sorting required by the closely related Gaussian splatting. Combined with a deferred neural rendering model, our Gaussian shadows enable Lambertian shading and shadow casting with minimal overhead. We demonstrate improved reconstructions, with better separation of albedo, shading, and shadows in challenging outdoor scenes with direct sun light and hard shadows. Our method is able to optimize the light direction without any input from the user. As a result, novel poses have fewer shadow artifacts and relighting in novel scenes is more realistic compared to the state-of-the-art methods, providing new ways to pose neural characters in novel environments, increasing their applicability.
Paper
2. Optimal Projection for 3D Gaussian Splatting
Authors: Letian Huang, Jiayang Bai, Jie Guo, Yanwen Guo
Abstract
3D Gaussian Splatting has garnered extensive attention and application in real-time neural rendering. Concurrently, concerns have been raised about the limitations of this technology in aspects such as point cloud storage, performance, and robustness in sparse viewpoints, leading to various improvements. However, there has been a notable lack of attention to the projection errors introduced by the local affine approximation inherent in the splatting itself, and the consequential impact of these errors on the quality of photo-realistic rendering. This paper addresses the projection error function of 3D Gaussian Splatting, commencing with the residual error from the first-order Taylor expansion of the projection function ϕ. The analysis establishes a correlation between the error and the Gaussian mean position. Subsequently, leveraging function optimization theory, this paper analyzes the function's minima to provide an optimal projection strategy for Gaussian Splatting referred to as Optimal Gaussian Splatting. Experimental validation further confirms that this projection methodology reduces artifacts, resulting in a more convincingly realistic rendering.
Paper
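The residual of the first-order Taylor (local affine) approximation can be probed numerically. The sketch below is not the paper's error function or its optimal projection. It assumes a plain pinhole projection onto the plane z = 1, an isotropic Gaussian of hypothetical width `sigma`, and a Monte Carlo average. The helper names `project`, `jacobian`, and `mean_affine_error` are made up for this example.

```python
import numpy as np

def project(p):
    """Pinhole projection onto the plane z = 1."""
    return p[..., :2] / p[..., 2:3]

def jacobian(mu):
    """Jacobian of `project` evaluated at the 3D point mu."""
    x, y, z = mu
    return np.array([[1.0 / z, 0.0, -x / z**2],
                     [0.0, 1.0 / z, -y / z**2]])

def mean_affine_error(mu, sigma=0.1, n=20000, seed=0):
    """Average gap between the exact and the first-order projection for
    points sampled from an isotropic Gaussian around mu."""
    rng = np.random.default_rng(seed)
    pts = mu + sigma * rng.standard_normal((n, 3))
    exact = project(pts)
    affine = project(mu) + (pts - mu) @ jacobian(mu).T
    return float(np.linalg.norm(exact - affine, axis=1).mean())

err_center = mean_affine_error(np.array([0.0, 0.0, 5.0]))
err_offaxis = mean_affine_error(np.array([4.0, 0.0, 5.0]))
```

With the same depth and the same spread, the Gaussian whose mean sits far from the optical axis shows a larger average residual than the centered one, which is consistent with the abstract's statement that the error correlates with the Gaussian mean position.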
3. 360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming
Authors: Jiayang Bai, Letian Huang, Jie Guo, Wen Gong, Yuanqi Li, Yanwen Guo
Abstract
3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of 360° images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (e.g., walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel 360° Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians. We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios.
Paper
4. StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering
Authors: Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, Markus Steinberger
Abstract
Gaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussian to 2D splats with a single view-space depth introduces popping and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort proves excessively costly compared to a global sort operation. In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead. Our software rasterizer effectively eliminates popping artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. Simultaneously, our method mitigates the potential for cheating view-dependent effects with popping, ensuring a more authentic representation. Despite the elimination of cheating, our approach achieves comparable quantitative results for test images, while increasing the consistency for novel view synthesis in motion. Due to its design, our hierarchical approach is only 4% slower on average than the original Gaussian Splatting. Notably, enforcing consistency enables a reduction in the number of Gaussians by approximately half with nearly identical quality and view-consistency. Consequently, rendering performance is nearly doubled, making our approach 1.6x faster than the original Gaussian Splatting, with a 50% reduction in memory requirements.
Paper | Project Page | Code | Short Presentation
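Why the sort order matters can be seen from the blending equation alone. The toy example below is not the hierarchical rasterizer from the paper. It only shows that front-to-back alpha compositing is order-dependent, so a single depth per splat and a depth evaluated along one pixel's ray can yield different colors. The numeric depths are made up for illustration.

```python
import numpy as np

def composite(colors, alphas, depths):
    """Front-to-back alpha blending of splats sorted by the given depths."""
    order = np.argsort(depths)
    out, transmittance = np.zeros(3), 1.0
    for i in order:
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return out

colors = np.array([[1.0, 0.0, 0.0],   # red splat
                   [0.0, 0.0, 1.0]])  # blue splat
alphas = np.array([0.6, 0.6])

# One depth per splat (for example the depth of its center): red is in front.
global_depths = np.array([2.0, 2.1])
# Hypothetical depths along one particular pixel ray: blue is in front.
pixel_depths = np.array([2.2, 2.05])

pixel_global = composite(colors, alphas, global_depths)
pixel_exact = composite(colors, alphas, pixel_depths)
```

Here `pixel_global` is mostly red and `pixel_exact` is mostly blue. If the global order flips as the camera rotates, the pixel color jumps, which is the popping behavior the abstract refers to.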
5. [CVPR '24] GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Authors: Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi
Abstract
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both principled 1D setup and realistic 3D scenes. It is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting natural-occurring signals (e.g., squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.
Paper | Project Page | Code | Presentation
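A one-dimensional toy comparison helps show why a shape parameter is useful for sharp-edged signals. The parameterization below, `amplitude * exp(-(|x - mu| / alpha) ** beta)`, is the common generalized-exponential form and is assumed here for illustration. The abstract does not give the exact formula used in GES, and no fitting is performed: both curves use fixed, hand-picked parameters.

```python
import numpy as np

def gef(x, amplitude=1.0, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized exponential: beta = 2 gives a Gaussian-shaped bump,
    larger beta gives a flatter top and sharper edges."""
    return amplitude * np.exp(-(np.abs(x - mu) / alpha) ** beta)

x = np.linspace(-3.0, 3.0, 601)
box = (np.abs(x) <= 1.0).astype(float)      # a signal with sharp edges

# Mean squared error of each fixed curve against the box signal.
err_gaussian = float(np.mean((gef(x, beta=2.0) - box) ** 2))
err_sharp = float(np.mean((gef(x, beta=8.0) - box) ** 2))
```

With these settings the `beta = 8` curve tracks the box far more closely than the Gaussian-shaped `beta = 2` curve, so a single primitive covers an edge that would otherwise call for several Gaussians.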
6. Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting
Authors: Joongho Jo, Hyeongwon Kim, Jongsun Park
Abstract
3D Gaussian splatting (3D-GS) is a new rendering approach that outperforms the neural radiance field (NeRF) in terms of both speed and image quality. 3D-GS represents 3D scenes by utilizing millions of 3D Gaussians and projects these Gaussians onto the 2D image plane for rendering. However, during the rendering process, a substantial number of unnecessary 3D Gaussians exist for the current view direction, resulting in significant computation costs associated with their identification. In this paper, we propose a computational reduction technique that quickly identifies unnecessary 3D Gaussians in real-time for rendering the current view without compromising image quality. This is accomplished through the offline clustering of 3D Gaussians that are close in distance, followed by the projection of these clusters onto a 2D image plane during runtime. Additionally, we analyze the bottleneck associated with the proposed technique when executed on GPUs and propose an efficient hardware architecture that seamlessly supports the proposed scheme. For the Mip-NeRF360 dataset, the proposed technique excludes 63% of 3D Gaussians on average before the 2D image projection, which reduces the overall rendering computation by almost 38.3% without sacrificing peak-signal-to-noise-ratio (PSNR). The proposed accelerator also achieves a speedup of 10.7x compared to a GPU.
Paper
7. GaussianPro: 3D Gaussian Splatting with Progressive Propagation
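Authors: Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen
Abstract
The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always fail to produce enough points on these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR.
Paper | Project Page | Code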
8. Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting
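Authors: Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin
Abstract
The recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components. This issue stems from the limited ability of spherical harmonics (SH) to represent high-frequency information. To overcome this challenge, we introduce Spec-Gaussian, an approach that utilizes an anisotropic spherical Gaussian (ASG) appearance field instead of SH for modeling the view-dependent appearance of each 3D Gaussian. Additionally, we have developed a coarse-to-fine training strategy to improve learning efficiency and eliminate floaters caused by overfitting in real-world scenes. Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality. Thanks to ASG, we have significantly improved the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians. This improvement extends the applicability of 3D-GS to handle intricate scenarios with specular and anisotropic surfaces.
Paper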
9. [CVPR '24] VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Authors: Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang
Abstract
Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering.
Paper | Project Page | Code
10. 3D Gaussian Model for Animation and Texturing
Authors: Xiangzhi Eric Wang, Zackary P. T. Sin
Abstract
3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We propose a modeling that is analogous to typical 3D models, which we call 3D Gaussian Model (3DGM); it provides a manipulatable proxy for novel animation and texture transfer. By binding the 3D Gaussians in texture space and re-projecting them back to world space through implicit shell mapping, we show how our 3D modeling can serve as a valid rendering methodology for interactive applications. It is further noted that recently, 3D mesh reconstruction works have been able to produce high-quality mesh for rendering. Our work, on the other hand, only requires an approximated geometry for rendering an object in high fidelity. Applicationwise, we will show that our proxy-based 3DGM is capable of driving novel animation without animated training data and texture transferring via UV mapping of the 3D Gaussians. We believe the result indicates the potential of our work for enabling interactive applications for 3D Gaussian Splatting.
Paper
11. BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
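Authors: Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa
Abstract
Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacities such that a 3D-consistent and high-quality scene can be reconstructed despite image-wise blur. Specifically, we model blur by estimating per-pixel convolution kernels from a Blur Proposal Network (BPN). BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. Additionally, BPN also proposes a quality-assessing mask, which indicates regions where blur occurs. Finally, we introduce a coarse-to-fine kernel optimization scheme; this optimization scheme is fast and avoids sub-optimal solutions due to a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion on blurry images. We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometry, while significantly improving upon existing approaches.
Paper | Code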
12. StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting
Authors: Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu
Abstract
We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency.
Paper | Project Page | Code
13. Gaussian Splatting in Style
Authors: Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Tarun Yenamandra, Daniel Cremers
Abstract
Scene stylization extends the work of neural style transfer to three spatial dimensions. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across a multi-view setting. A vast majority of the previous works achieve this by optimizing the scene with a specific style image. In contrast, we propose a novel architecture trained on a collection of style images, that at test time produces high quality stylized novel views. Our work builds up on the framework of 3D Gaussian splatting. For a given scene, we take the pretrained Gaussians and process them using a multi resolution hash grid and a tiny MLP to obtain the conditional stylised views. The explicit nature of 3D Gaussians gives us inherent advantages over NeRF-based methods including geometric consistency, along with having a fast training and rendering regime. This enables our method to be useful for vast practical use cases such as in augmented or virtual reality applications. Through our experiments, we show our methods achieve state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data.
Paper
14. BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
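Authors: Lingzhe Zhao, Peng Wang, Peidong Liu
Abstract
While neural rendering has demonstrated impressive capabilities in 3D scene reconstruction and novel view synthesis, it heavily relies on high-quality sharp images and accurate camera poses. Numerous approaches have been proposed to train Neural Radiance Fields (NeRF) with motion-blurred images, which are commonly encountered in real-world scenarios such as low-light or long-exposure conditions. However, the implicit representation of NeRF struggles to accurately recover intricate details from severely motion-blurred images and cannot achieve real-time rendering. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction and real-time rendering by explicitly optimizing point clouds as 3D Gaussians. In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction. Our method models the physical image formation process of motion-blurred images and jointly learns the parameters of Gaussians while recovering camera motion trajectories during exposure time. In our experiments, we demonstrate that BAD-Gaussians not only achieves superior rendering quality compared to previous state-of-the-art deblur neural rendering methods on both synthetic and real datasets but also enables real-time rendering capabilities.
Paper | Project Page | Code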
15. SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Authors: Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou
Abstract
Implicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend over 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to seize photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works achieving state-of-the-art results with improved efficiency.
Paper
16. GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
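Authors: Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari
Abstract
During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue, we propose a novel approach called GeoGaussian. Based on the smoothly connected areas observed from point clouds, this method introduces a novel pipeline to initialize thin Gaussians aligned with the surfaces, where the characteristic can be transferred to new generations through a carefully designed densification strategy. Finally, the pipeline ensures that the scene's geometry and texture are maintained through constrained optimization processes with explicit geometry constraints. Benefiting from the proposed architecture, the generative ability of 3D Gaussians is enhanced, especially in structured regions. Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction, as evaluated qualitatively and quantitatively on public datasets.
Paper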
17. Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Authors: Zhihao Liang, Qi Zhang, Wenbo Hu, Ying Feng, Lei Zhu, Kui Jia
Abstract
The 3D Gaussian Splatting (3DGS) gained its popularity recently by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) in a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation in the two-dimensional pixel shading, and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D-pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area to participate in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to the changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability that gives more details and better fidelity.
Paper
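The one-dimensional step described in the abstract, approximating the Gaussian CDF with a logistic-type function and integrating over a pixel window by subtracting two CDF values, can be sketched as follows. The cubic-logistic coefficients are a commonly cited fit to the standard normal CDF. Whether the paper uses exactly these values is an assumption, and the helper names are hypothetical.

```python
import math

def logistic_cdf(x):
    """Logistic-type approximation of the standard normal CDF.
    The coefficients are a commonly cited fit, assumed here for illustration."""
    return 1.0 / (1.0 + math.exp(-(1.5976 * x + 0.07056 * x**3)))

def exact_cdf(x):
    """Standard normal CDF via the error function, used as a reference."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pixel_integral(center, mu, sigma, cdf, width=1.0):
    """Integral of the density N(mu, sigma^2) over one pixel window,
    computed as the difference of two CDF values."""
    lo = (center - width / 2 - mu) / sigma
    hi = (center + width / 2 - mu) / sigma
    return cdf(hi) - cdf(lo)

approx = pixel_integral(center=3.0, mu=2.6, sigma=0.8, cdf=logistic_cdf)
exact = pixel_integral(center=3.0, mu=2.6, sigma=0.8, cdf=exact_cdf)
```

Unlike evaluating the Gaussian at the pixel center, this window integral changes with `width`, which is what makes the response sensitive to the pixel footprint at different resolutions.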
18. Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
Authors: Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin
Abstract
High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on detailed modelling of the physical image formation process and utilizes velocities estimated using visual-inertial odometry (VIO). Camera poses are considered non-static during the exposure time of a single image frame and camera poses are further optimized in the reconstruction process. We formulate a differentiable rendering pipeline that leverages screen space approximation to efficiently incorporate rolling-shutter and motion blur effects into the 3DGS framework. Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods, thereby advancing 3DGS in naturalistic settings.
Paper | Code | Project Page
19. RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
Authors: Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari
Abstract
Recent advances in view synthesis and real-time rendering have achieved photorealistic quality at impressive rendering speeds. While Radiance Field-based methods achieve state-of-the-art quality in challenging scenarios such as in-the-wild captures and large-scale scenes, they often suffer from excessively high compute requirements linked to volumetric rendering. Gaussian Splatting-based methods, on the other hand, rely on rasterization and naturally achieve real-time rendering but suffer from brittle optimization heuristics that underperform on more challenging scenes. In this work, we present RadSplat, a lightweight method for robust real-time rendering of complex scenes. Our main contributions are threefold. First, we use radiance fields as a prior and supervision signal for optimizing point-based scene representations, leading to improved quality and more robust optimization. Next, we develop a novel pruning technique reducing the overall point count while maintaining high quality, leading to smaller and more compact scene representations with faster inference speeds. Finally, we propose a novel test-time filtering approach that further accelerates rendering and allows to scale to larger, house-sized scenes. We find that our method enables state-of-the-art synthesis of complex captures at 900+ FPS.
Paper | Project Page
20. Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Authors: Guangchi Fang, Bing Wang
Abstract
In this study, we explore the challenge of efficiently representing scenes with a constrained number of Gaussians. Our analysis shifts from traditional graphics and 2D computer vision to the perspective of point clouds, highlighting the inefficient spatial distribution of Gaussian representation as a key limitation in model performance. To address this, we introduce strategies for densification including blur split and depth reinitialization, and simplification through Gaussian binarization and sampling. These techniques reorganize the spatial positions of the Gaussians, resulting in significant improvements across various datasets and benchmarks in terms of rendering quality, resource consumption, and storage compression. Our proposed Mini-Splatting method integrates seamlessly with the original rasterization pipeline, providing a strong baseline for future research in Gaussian-Splatting-based works.
Paper
21. Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Authors : Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao
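Abstract
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS, which only considers the average gradient magnitude of points from observable views, thereby failing to grow large Gaussians that are observable from many viewpoints while many of those views only cover them at the boundaries. To this end, we propose a novel method, named Pixel-GS, that takes into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be promoted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed on the challenging Mip-NeRF 360 and Tanks & Temples datasets. Paper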
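The difference between the plain averaged growth condition and the pixel-weighted one described in the Pixel-GS abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the sample numbers, and the threshold value are all hypothetical, chosen only to show how weighting by covered pixels changes the decision for a large Gaussian.

```python
def average_gradient(grad_norms):
    """Unweighted mean of per-view gradient magnitudes (plain averaged criterion)."""
    return sum(grad_norms) / len(grad_norms)


def pixel_weighted_gradient(grad_norms, pixel_counts):
    """Mean of per-view gradient magnitudes weighted by covered pixel count."""
    total = sum(pixel_counts)
    if total == 0:
        return 0.0  # Gaussian covers no pixels in any view
    return sum(g * m for g, m in zip(grad_norms, pixel_counts)) / total


def should_grow(value, threshold=0.0002):
    """Growth test; the threshold here is an illustrative assumption."""
    return value > threshold


# One view sees the Gaussian fully (400 pixels, large gradient);
# three views only clip its boundary (5 pixels each, tiny gradient).
grad_norms = [0.0005, 0.00005, 0.00005, 0.00005]
pixel_counts = [400, 5, 5, 5]

plain = average_gradient(grad_norms)
weighted = pixel_weighted_gradient(grad_norms, pixel_counts)
print(should_grow(plain), should_grow(weighted))
```

In this toy case the boundary-only views drag the plain average below the threshold, while the pixel-weighted average stays above it, which is the behaviour the abstract attributes to the method.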
22. Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
Authors : Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang
Abstract
Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique appearance of each tiny point in a scene is determined by its independent intrinsic material attributes and the varying environmental impacts it receives. Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separated intrinsic and dynamic appearance features for each point, capturing the unchanged scene appearance along with dynamic variation like illumination and weather. Additionally, an adaptive sampling strategy is presented to allow each Gaussian point to focus on the local and detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. More experiments have demonstrated better reconstruction quality and details of GS-W compared to previous methods, with a 1000× increase in rendering speed. Paper | Project Page | Code (not yet) | Short Presentation
23. GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction
Authors : Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, Bo Dai
Abstract
Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural implicit surfaces is sparked from the success of neural rendering. Current works either constrain the distribution of density fields or the shape of primitives, resulting in degraded rendering quality and flaws on the learned scene surfaces. The efficacy of such methods is limited by the inherent constraints of the chosen neural representation, which struggles to capture fine surface details, especially for larger, more intricate scenes. To address these issues, we introduce GSDF, a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting (3DGS) representation with neural Signed Distance Fields (SDF). The core idea is to leverage and enhance the strengths of each branch while alleviating their limitations through mutual guidance and joint supervision. We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions, and in the meantime benefits 3DGS rendering with structures that are more aligned with the underlying geometry. Paper | Project Page | Code (not yet)
24. Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
Authors : Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai
Abstract
The recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularly noticeable in zoom-out views and can lead to inconsistent rendering speeds in scenes with varying details. Moreover, it often struggles to capture the corresponding level of details at different scales with its heuristic density control operation. Inspired by the Level-of-Detail (LOD) techniques, we introduce Octree-GS, featuring an LOD-structured 3D Gaussian approach supporting level-of-detail decomposition for scene representation that contributes to the final rendering results. Our model dynamically selects the appropriate level from the set of multi-resolution anchor points, ensuring consistent rendering performance with adaptive LOD adjustments while maintaining high-fidelity rendering results. Paper | Project Page | Code (not yet)
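As an illustration of selecting a level of detail from multi-resolution anchors, the sketch below maps camera distance to an LOD index with a clamped base-2 logarithm. The abstract does not state the selection rule, so this formula, the function names, and the distance bounds are assumptions for illustration, not the Octree-GS implementation.

```python
import math


def num_levels(d_min, d_max):
    """Number of LOD levels spanning the observed distance range (assumed rule)."""
    return int(round(math.log2(d_max / d_min))) + 1


def select_level(distance, d_max, levels):
    """Map a camera-to-anchor distance to an LOD index.

    Far anchors get level 0 (coarsest); each halving of the distance
    unlocks one finer level, clamped to [0, levels - 1].
    """
    raw = math.log2(d_max / distance)
    return int(min(max(raw, 0.0), levels - 1))


levels = num_levels(d_min=1.0, d_max=16.0)
for d in (100.0, 16.0, 4.0, 0.5):
    print(d, select_level(d, d_max=16.0, levels=levels))
```

Under this assumed rule, anchors whose level is at most the selected index would be rendered, so distant views touch only coarse anchors and the per-frame primitive count stays roughly bounded.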
25. SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
Authors : Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao
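Abstract
In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting requires modifying the training procedure of Gaussian splatting, our method functions at test time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of the testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of the testing frequency can effectively match the Gaussian scale, thus keeping the Gaussian primitive distribution consistent across different testing frequencies. When scale inconsistency is eliminated, sampling rates smaller than the scene frequency result in conventional jaggedness, and we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, which significantly improves the anti-aliasing performance over vanilla Gaussian Splatting. Through extensive experiments using various settings and both bounded and unbounded scenes, we show that SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. Paper | Project Page | Code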
26. Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Authors : Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison
Abstract
Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis. Our method optimises 3D Gaussian primitives to accurately model the object's geometry at points of contact. By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map. Touch is particularly useful when considering non-Lambertian objects (e.g. shiny or reflective surfaces) since contemporary methods tend to fail to reconstruct specular highlights with fidelity. By combining vision and tactile sensing, we achieve more accurate geometry reconstructions with fewer images than prior methods. We conduct evaluation on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach, offering significant improvements in reconstruction quality. Paper
27. 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting
Authors : Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
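Abstract
In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. First, we introduce a differentiable SDF-to-opacity transformation function that converts SDF values into corresponding Gaussian opacities. This function connects the SDF and 3D Gaussians, allowing for unified optimization and enforcing surface constraints on the 3D Gaussians. During learning, optimizing the 3D Gaussians provides supervisory signals for SDF learning, enabling the reconstruction of intricate details. However, this only provides sparse supervisory signals to the SDF at locations occupied by Gaussians, which is insufficient for learning a continuous SDF. Then, to address this limitation, we incorporate volumetric rendering and align the rendered geometric attributes (depth, normal) with those derived from 3D Gaussians. This consistency regularization introduces supervisory signals to locations not covered by discrete 3D Gaussians, effectively eliminating redundant surfaces outside the Gaussian sampling range. Our extensive experimental results demonstrate that our 3DGSR method enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS. Besides, our method competes favorably with leading surface reconstruction techniques while offering a more efficient learning process and better rendering quality. Paper | Code (not yet)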
28. Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting
Authors : Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma
Abstract
3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. Paper
29. OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images
Authors : Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng
Abstract
Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. To benefit the research community, the code will be made publicly available once the paper is published. Paper
30. Robust Gaussian Splatting
Authors : François Darmon, Lorenzo Porzi, Samuel Rota-Bulò, Peter Kontschieder
Abstract
In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we propose mechanisms for defocus blur compensation and for addressing color inconsistencies caused by ambient light, shadows, or camera-related factors like varying white balancing settings. Our proposed solutions integrate in a seamless way with the 3DGS formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines. Paper
31. DeblurGS: Gaussian Splatting for Camera Motion Blur
Authors : Jeongtaek Oh, Jaeyoung Chung, Dongwoo Lee, Kyoung Mu Lee
Abstract
Although significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to real-world applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. To address this challenge, we propose DeblurGS, a method to optimize sharp 3D Gaussian Splatting from motion-blurred images, even with noisy camera pose initialization. We restore a fine-grained sharp scene by leveraging the remarkable reconstruction capability of 3D Gaussian Splatting. Our approach estimates the 6-Degree-of-Freedom camera motion for each blurry observation and synthesizes corresponding blurry renderings for the optimization process. Furthermore, we propose a Gaussian Densification Annealing strategy to prevent the generation of inaccurate Gaussians at erroneous locations during the early training stages when camera motion is still imprecise. Comprehensive experiments demonstrate that our DeblurGS achieves state-of-the-art performance in deblurring and novel view synthesis for real-world and synthetic benchmark datasets, as well as field-captured blurry smartphone videos. Paper
32. StylizedGS: Controllable Stylization for 3D Gaussian Splatting
Authors : Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao
Abstract
With the rapid development of XR, 3D generation and editing are becoming more and more important, among which, stylization is an important tool of 3D appearance editing. It can achieve consistent 3D artistic stylization given a single reference style image and thus is a user-friendly editing way. However, recent NeRF-based 3D stylization methods face efficiency issues that affect the actual user experience and the implicit nature limits its ability to transfer the geometric pattern styles. Additionally, the ability for artists to exert flexible control over stylized scenes is considered highly desirable, fostering an environment conducive to creative exploration. In this paper, we introduce StylizedGS, a 3D neural style transfer framework with adaptable control over perceptual factors based on 3D Gaussian Splatting (3DGS) representation. The 3DGS brings the benefits of high efficiency. We propose a GS filter to eliminate floaters in the reconstruction which affects the stylization effects before stylization. Then the nearest neighbor-based style loss is introduced to achieve stylization by fine-tuning the geometry and color parameters of 3DGS, while a depth preservation loss with other regularizations is proposed to prevent the tampering of geometry content. Moreover, facilitated by specially designed losses, StylizedGS enables users to control color, stylized scale and regions during the stylization to possess customized capabilities. Our method can attain high-quality stylization results characterized by faithful brushstrokes and geometric consistency with flexible controls. Extensive experiments across various scenes and styles demonstrate the effectiveness and efficiency of our method concerning both stylization quality and inference FPS. Paper
33. LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field
Authors : Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He
Abstract
Cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussian to project it into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D Cinemagraph that exhibits natural and seamlessly loopable dynamics. Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. Paper | Project Page | Code
34. GSCore: Efficient Radiance Field Rendering via Architectural Support for 3D Gaussian Splatting
Authors : Junseo Lee, Seokwon Lee, Jungi Lee, Junyong Park, Jaewoong Sim
Abstract
This paper presents GSCore, a hardware acceleration unit that efficiently executes the rendering pipeline of 3D Gaussian Splatting with algorithmic optimizations. GSCore builds on the observations from an in-depth analysis of Gaussian-based radiance field rendering to enhance computational efficiency and bring the technique to wide adoption. In doing so, we present several optimization techniques, Gaussian shape-aware intersection test, hierarchical sorting, and subtile skipping, all of which are synergistically integrated with GSCore. We implement the hardware design of GSCore, synthesize it using a commercial 28nm technology, and evaluate the performance across a range of synthetic and real-world scenes with varying image resolutions. Our evaluation results show that GSCore achieves a 15.86× speedup on average over the mobile consumer GPU with a substantially smaller area and lower energy consumption. Paper | Short Presentation
2023:
1. Mip-Splatting: Alias-free 3D Gaussian Splatting
Authors : Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger
Abstract
Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our comprehensive evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach. Paper | Project Page | Code
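The contrast between a 2D dilation and a normalized 2D low-pass ("Mip"-style) filter can be sketched on a single projected 2×2 covariance. This is a simplified illustration, not the Mip-Splatting code: the filter variances (0.3 and 0.1), the function names, and the determinant-based opacity rescaling are assumptions used to show why an un-normalized dilation brightens tiny splats while a normalized filter preserves their total contribution.

```python
import math


def det2(cov):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    return cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]


def dilate(cov2d, s=0.3):
    """Dilation: enlarge the screen-space footprint, leave opacity unchanged."""
    (a, b), (c, d) = cov2d
    return ((a + s, b), (c, d + s)), 1.0


def mip_filter(cov2d, s=0.1):
    """Gaussian low-pass filter with energy-preserving opacity rescaling.

    Convolving two Gaussians adds their covariances; the scale factor
    sqrt(det(cov) / det(cov + s*I)) keeps the integrated weight constant.
    Assumes cov2d is positive semi-definite.
    """
    (a, b), (c, d) = cov2d
    filtered = ((a + s, b), (c, d + s))
    scale = math.sqrt(det2(cov2d) / det2(filtered))
    return filtered, scale


tiny = ((0.01, 0.0), (0.0, 0.01))    # sub-pixel splat (far away / zoomed out)
large = ((100.0, 0.0), (0.0, 100.0))  # splat covering many pixels

print(dilate(tiny)[1], mip_filter(tiny)[1])    # opacity scale: 1.0 vs about 0.09
print(dilate(large)[1], mip_filter(large)[1])  # both close to 1.0
```

For the sub-pixel splat the normalized filter attenuates opacity strongly, whereas dilation keeps it at full strength over an inflated footprint; for large splats the two behave almost identically.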
2. Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing
Authors : Jian Gao, Chun Gu, Youtian Lin, Hao Zhu, Xun Cao, Li Zhang, Yao Yao
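Abstract
We present a novel differentiable point-based rendering framework for material and lighting decomposition from multi-view images, enabling editing, ray tracing, and real-time relighting of the 3D point cloud. Specifically, a 3D scene is represented as a set of relightable 3D Gaussian points, where each point is additionally associated with a normal direction, BRDF parameters, and incident lights from different directions. To achieve robust lighting estimation, we further divide the incident lights of each point into global and local components, as well as view-dependent visibilities. The 3D scene is optimized through the 3D Gaussian Splatting technique while BRDF and lighting are decomposed by physically-based differentiable rendering. Moreover, we introduce an innovative point-based ray-tracing approach based on the bounding volume hierarchy for efficient visibility baking, enabling real-time rendering and relighting of 3D Gaussian points with accurate shadow effects. Extensive experiments demonstrate improved BRDF estimation and novel view rendering results compared to state-of-the-art material estimation approaches. Our framework showcases the potential to revolutionize the mesh-based graphics pipeline with a relightable, traceable, and editable rendering pipeline based solely on point clouds. Paper | Project Page | Code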
3. [CVPR '24] GS-IR: 3D Gaussian Splatting for Inverse Rendering
Authors : Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia
Abstract
We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g. NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not support producing plausible normals natively; 2) forward mapping (e.g. rasterization and splatting) cannot trace the occlusion like backward mapping (e.g. ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes. Paper | Project Page | Code (not yet)
4. [CVPR '24] Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Authors : Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee
Abstract
3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite their high rendering quality and speed at high resolutions, both deteriorate drastically when rendering at lower resolutions or from a faraway camera position. During low-resolution or faraway rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian, which leads to an aliasing effect. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13%-66% PSNR and 160%-2400% rendering speed improvement at 4×-128× scale rendering on the Mip-NeRF360 dataset compared to the single-scale 3D Gaussian splatting. Paper
5. [CVPR '24] GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Authors : Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, Yuexin Ma
Abstract
The advent of neural 3D Gaussians has recently brought about a revolution in the field of neural rendering, facilitating the generation of high-quality renderings at real-time speeds. However, the explicit and discrete representation encounters challenges when applied to scenes featuring reflective surfaces. In this paper, we present GaussianShader, a novel method that applies a simplified shading function on 3D Gaussians to enhance the neural rendering in scenes with reflective surfaces while preserving the training and rendering efficiency. The main challenge in applying the shading function lies in the accurate normal estimation on discrete 3D Gaussians. Specifically, we propose a novel normal estimation framework based on the shortest axis directions of 3D Gaussians, with a delicately designed loss that enforces consistency between the normals and the geometries of Gaussian spheres. Experiments show that GaussianShader strikes a commendable balance between efficiency and visual quality. Our method surpasses Gaussian Splatting in PSNR on specular object datasets, exhibiting an improvement of 1.57dB. When compared to prior works handling reflective surfaces, such as Ref-NeRF, our optimization time is significantly accelerated (23h vs. 0.58h). Please click on our project website to see more results. Paper | Project Page | Code
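The idea of reading a normal off the shortest axis of a Gaussian can be shown with a minimal sketch. It assumes each Gaussian is stored as a 3×3 rotation matrix whose columns are the principal axes plus three per-axis scales; the function name and the optional flip toward the camera are illustrative assumptions, and the learned refinement and consistency loss mentioned in the abstract are not modelled here.

```python
def shortest_axis_normal(rotation, scales, view_dir=None):
    """Return the principal axis with the smallest scale as a normal estimate.

    rotation: 3x3 nested list, columns are the Gaussian's principal axes.
    scales:   three per-axis scales (standard deviations).
    view_dir: optional direction from the camera to the Gaussian; if given,
              the normal is flipped so that it faces the camera (assumption).
    """
    k = min(range(3), key=lambda i: scales[i])
    normal = [rotation[row][k] for row in range(3)]
    if view_dir is not None:
        facing_away = sum(n * v for n, v in zip(normal, view_dir)) > 0
        if facing_away:
            normal = [-n for n in normal]
    return normal


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
flat_disc = (1.0, 0.5, 0.01)  # thin along the third axis

print(shortest_axis_normal(identity, flat_disc))
print(shortest_axis_normal(identity, flat_disc, view_dir=(0, 0, 1)))
```

A flattened Gaussian behaves like a small surface patch, so its thinnest direction is a natural first guess for the surface normal; the sign is ambiguous, which is why some view-dependent disambiguation is needed.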
6. [CVPR '24] Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Authors : Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai
Abstract
Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved the state-of-the-art rendering quality and speed combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less area and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed. Paper | Project Page | Code https://github.com/maturk/dn-splatter
7. Deblurring 3D Gaussian Splatting
Authors : Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park
Abstract
Recent studies in radiance fields have paved the way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately, a 3D Gaussian splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to lens defocusing, object motion, and camera shake, and it inevitably intervenes in clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. Paper | Project Page | Code (not yet)
8. GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization
Authors : Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jingtuo Liu, Liangjun Zhang, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang
Abstract
This paper presents GIR, a 3D Gaussian Inverse Rendering method for relightable scene factorization. Compared to existing methods leveraging discrete meshes or neural implicit fields for inverse rendering, our method utilizes 3D Gaussians to estimate the material properties, illumination, and geometry of an object from multi-view images. Our study is motivated by the evidence showing that 3D Gaussian is a more promising backbone than neural fields in terms of performance, versatility, and efficiency. In this paper, we aim to answer the question: "How can 3D Gaussian be applied to improve the performance of inverse rendering?" To address the complexity of estimating normals based on discrete and often inhomogeneously distributed 3D Gaussian representations, we propose an efficient self-regularization method that facilitates the modeling of surface normals without the need for additional supervision. To reconstruct indirect illumination, we propose an approach that simulates ray tracing. Extensive experiments demonstrate our proposed GIR's superior performance over existing methods across multiple tasks on a variety of widely used datasets in inverse rendering. This substantiates its efficacy and broad applicability, highlighting its potential as an influential tool in relighting and reconstruction. Paper | Project Page | Code (not yet)
9. Gaussian Splatting with NeRF-based Color and Opacity
Authors : Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek
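Abstract
Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers similar rendering quality with faster training and inference, as it does not need neural networks to work. We encode information about the 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS is difficult to condition since it usually requires around a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model that uses a GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. covariance of Gaussian), color and opacity, and a neural network, which takes parameters of Gaussian and viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects. Paper | Code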
Reviews:
2024:
1. Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human
Authors : Song Bai, Jie Li
Abstract
While AI-generated text and 2D images continue to expand their territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since the year 2023, an abundant amount of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity in stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models like SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. The advancements in neural network-based 3D storing and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have accelerated the efficiency and realism of neural rendered models. Furthermore, the multimodality capabilities of large language models have enabled language inputs to transcend into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half of 2023. It will begin by discussing the AI-generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future. Paper
2. A Survey on 3D Gaussian Splatting
Authors : Guikun Chen, Wenguan Wang
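Abstract
3D Gaussian splatting (3D GS) has recently emerged as a transformative technique in the explicit radiance field and computer graphics landscape. This innovative approach, characterized by the utilization of millions of 3D Gaussians, represents a significant departure from the neural radiance field (NeRF) methodologies, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representations and differentiable rendering algorithms, not only promises real-time rendering capabilities but also introduces unprecedented levels of control and editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In this paper, we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the advent of 3D GS, setting the stage for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By facilitating real-time performance, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation. Paper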
3. 3D Gaussian as a New Vision Era: A Survey
Authors : Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, Ying He
Abstract
3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural networks, such as Neural Radiance Fields (NeRF). This technique has found diverse applications in areas such as robotics, urban mapping, autonomous navigation, and virtual reality/augmented reality, just to name a few. Given the growing popularity and expanding research in 3D Gaussian Splatting, this paper presents a comprehensive survey of relevant papers from the past year. We organize the survey into taxonomies based on characteristics and applications, providing an introduction to the theoretical underpinnings of 3D Gaussian Splatting. Our goal through this survey is to acquaint new researchers with 3D Gaussian Splatting, serve as a valuable reference for seminal works in the field, and inspire future research directions, as discussed in our concluding section.
Paper
4. Neural Fields in Robotics: A Survey
Authors : Muhammad Zubair Irshad, Mauro Comi, Yen-Chen Lin, Nick Heppert, Abhinav Valada, Zsolt Kira, Rares Ambrus, Jonathan Tremblay
Abstract
Neural Fields have emerged as a transformative approach for 3D scene representation in computer vision and robotics, enabling accurate inference of geometry, 3D semantics, and dynamics from 2D data. Leveraging differentiable rendering, Neural Fields encompass both continuous implicit and explicit neural representations enabling high-fidelity 3D reconstruction, integration of multi-modal sensor data, and generation of novel viewpoints. This survey explores their applications in robotics, emphasizing their potential to enhance perception, planning, and control. Their compactness, memory efficiency, and differentiability, along with seamless integration with foundation and generative models, make them ideal for real-time applications, improving robot adaptability and decision-making. This paper provides a thorough review of Neural Fields in robotics, categorizing applications across various domains and evaluating their strengths and limitations, based on over 200 papers. First, we present four key Neural Fields frameworks: Occupancy Networks, Signed Distance Fields, Neural Radiance Fields, and Gaussian Splatting. Second, we detail Neural Fields' applications in five major robotics domains: pose estimation, manipulation, navigation, physics, and autonomous driving, highlighting key works and discussing takeaways and open challenges. Finally, we outline the current limitations of Neural Fields in robotics and propose promising directions for future research.
Paper
5. How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey
Authors : Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi
Abstract
Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) representations. Recognizing the growing body of research and the absence of a comprehensive survey on the topic, this paper aims to provide the first comprehensive overview of SLAM progress through the lens of the latest advancements in radiance fields. It sheds light on the background, evolutionary path, inherent strengths and limitations, and serves as a fundamental reference to highlight the dynamic progress and specific challenges.
Paper
6. Recent Advances in 3D Gaussian Splatting
Authors : Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao
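Abstract
The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations such as Neural Radiance Fields (NeRF), which represent a 3D scene with position- and viewpoint-conditioned neural networks, 3D Gaussian Splatting uses a set of Gaussian ellipsoids to model the scene, so that efficient rendering can be accomplished by rasterizing the Gaussian ellipsoids into images. Apart from the fast rendering speed, the explicit representation of 3D Gaussian Splatting facilitates editing tasks such as dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid change and growing number of works in this field, we present a literature review of recent 3D Gaussian Splatting methods, which can be roughly classified by functionality into 3D reconstruction, 3D editing, and other downstream applications. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian Splatting are also illustrated for a better understanding of this technique. This survey aims to help beginners get into the field quickly and to provide experienced researchers with a comprehensive overview, which can stimulate the future development of the 3D Gaussian Splatting representation.
Paper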
7. Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
Authors : Anurag Dalal, Daniel Hagen, Kjell G. Robbersmyr, Kristian Muri Knausgård
Abstract
Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting.
Paper
SLAM:
2024:
1. SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Authors : Mingrui Li, Shuhong Liu, Heng Zhou
Abstract
Semantic understanding plays a crucial role in Dense Simultaneous Localization and Mapping (SLAM), facilitating comprehensive scene interpretation. Recent advancements that integrate Gaussian Splatting into SLAM systems have demonstrated its effectiveness in generating high-quality renderings through the use of explicit 3D Gaussian representations. Building on this progress, we propose SGS-SLAM, the first semantic dense visual SLAM system grounded in 3D Gaussians, which provides precise 3D semantic segmentation alongside high-fidelity reconstructions. Specifically, we propose to employ multi-channel optimization during the mapping process, integrating appearance, geometric, and semantic constraints with key-frame optimization to enhance reconstruction quality. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and semantic segmentation, outperforming existing methods while preserving real-time rendering ability.
Paper
2. SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM
Authors : Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang
Abstract
We propose SemGauss-SLAM, the first semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering in real-time. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift and improve reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to more robust tracking and consistent mapping. Our SemGauss-SLAM method demonstrates superior performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in novel-view semantic synthesis and 3D semantic mapping.
Paper
3. Compact 3D Gaussian Splatting For Dense Visual SLAM
Authors : Tianchen Deng, Yaohui Chen, Leyan Zhang, Jianfei Yang, Shenghai Yuan, Danwei Wang, Weidong Chen
Abstract
Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address the limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then we observe that the covariance matrices (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, i.e., the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation.
Paper
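The geometry codebook idea in the abstract above can be illustrated with plain nearest-neighbour vector quantization. This is only a minimal sketch under stated assumptions: the function name `quantize_with_codebook`, the toy scale values, and the two-entry codebook are hypothetical, and the paper's actual codebook construction and training are not described in the abstract.

```python
import numpy as np

def quantize_with_codebook(params, codebook):
    """Map each row of `params` to its nearest codebook entry (squared L2)."""
    # Pairwise squared distances, shape (num_gaussians, codebook_size).
    diffs = params[:, None, :] - codebook[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    # Index of the closest codeword for every Gaussian.
    indices = sq_dists.argmin(axis=1)
    return indices, codebook[indices]

# Hypothetical toy data: 4 Gaussians with 3 scale parameters each, 2 codewords.
scales = np.array([[0.10, 0.10, 0.02],
                   [0.11, 0.09, 0.02],
                   [0.50, 0.40, 0.30],
                   [0.52, 0.41, 0.29]])
codebook = np.array([[0.10, 0.10, 0.02],
                     [0.50, 0.40, 0.30]])
indices, reconstructed = quantize_with_codebook(scales, codebook)
```

Storing one small integer index per Gaussian plus a shared codebook, instead of full per-Gaussian geometry parameters, is what makes this kind of representation compact when many ellipsoids have similar shapes.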
4. NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting
Authors : Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie
Abstract
We propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. Additionally, we employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, mitigating the burden of excessive memory consumption. Furthermore, we leverage the advantage of 3D Gaussian splatting, which enables efficient and differentiable novel view rendering, and propose a Virtual Camera View Pruning method to eliminate outlier GS points, thereby effectively enhancing the quality of scene representations. Our NEDS-SLAM method demonstrates competitive performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in 3D dense semantic mapping.
Paper
5. High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization
Authors : Shuo Sun, Malcolm Mielle, Achim J. Lilienthal, Martin Magnusson
Abstract
We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed Gaussian splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM.
Paper
6. RGBD GS-ICP SLAM
Authors : Seongbo Ha, Jiung Yeon, Hyeonwoo Yu
Abstract
Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing an incredibly fast speed up to 107 FPS (for the entire system) and superior quality of the reconstructed map.
Paper | Code | Short Presentation
7. EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting
Authors : Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, Wei Shen
Abstract
Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Mapping) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperative applications among endoscopic surgeries. In this paper, we introduce EndoGSLAM, an efficient SLAM approach for endoscopic surgeries, which integrates streamlined Gaussian representation and differentiable rasterization to facilitate over 100 fps rendering speed during online camera tracking and tissue reconstructing. Extensive experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches, showing tremendous potential for endoscopic surgeries.
Paper | Project Page | Code (not yet)
8. CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Authors : Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui
Abstract
Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available.
Paper | Project Page | Code (not yet)
9. MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Authors : Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu
Abstract
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map.
Paper | Project Page | Code (not yet)
10. Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting
Authors : Xiaolei Lang, Laijian Li, Hang Zhang, Feng Xiong, Mu Xu, Yong Liu, Xingxing Zuo, Jiajun Lv
Abstract
We present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems.
Paper
11. RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting
Authors : Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou
Abstract
We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors, and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and otherwise unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.
Paper | Project Page | Code
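The per-frame rule in the RTG-SLAM abstract above (add Gaussians for pixels that are newly observed, have large color errors, or have large depth errors) can be sketched as a boolean mask over the image. This is a rough illustration, not the authors' implementation: the function name, the use of rendered opacity to detect newly observed pixels, and all three thresholds are assumptions introduced here.

```python
import numpy as np

def pixels_needing_gaussians(rendered_alpha, rendered_rgb, rendered_depth,
                             observed_rgb, observed_depth,
                             alpha_thresh=0.5, color_thresh=0.1, depth_thresh=0.05):
    """Return a boolean mask of pixels where new Gaussians would be added.

    All thresholds are hypothetical values chosen for illustration.
    """
    # Ignore pixels where the depth sensor returned no measurement.
    valid = observed_depth > 0
    # Newly observed: the current map barely covers this pixel (assumption).
    newly_observed = rendered_alpha < alpha_thresh
    # Large color error: mean absolute difference over the RGB channels.
    large_color_error = np.abs(rendered_rgb - observed_rgb).mean(axis=-1) > color_thresh
    # Large depth error: absolute difference between rendered and sensed depth.
    large_depth_error = np.abs(rendered_depth - observed_depth) > depth_thresh
    return valid & (newly_observed | large_color_error | large_depth_error)

# Hypothetical 2x2 frame exercising each case.
rendered_alpha = np.array([[0.9, 0.1], [0.9, 0.9]])
rendered_rgb = np.full((2, 2, 3), 0.5)
observed_rgb = rendered_rgb.copy()
observed_rgb[1, 0] = [0.9, 0.9, 0.9]          # large color error at (1, 0)
rendered_depth = np.array([[1.0, 0.0], [1.0, 1.0]])
observed_depth = np.array([[1.0, 1.0], [1.0, 0.0]])  # (1, 1) has no sensor depth
mask = pixels_needing_gaussians(rendered_alpha, rendered_rgb, rendered_depth,
                                observed_rgb, observed_depth)
```

In this toy frame, pixel (0, 1) is selected because the map does not cover it, pixel (1, 0) because of its color error, and pixel (1, 1) is skipped because it has no valid depth measurement.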
12. [3DV '25] LoopSplat: Loop Closure by Registering 3D Gaussian Splats
Authors : Liyuan Zhu, Yue Li, Erik Sandström, Shengyu Huang, Konrad Schindler, Iro Armeni
Abstract
Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM.
Paper | Project Page | Code
13. MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation
Authors : Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu
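Abstract
Emerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios such as low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e., MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either a neural radiance field or a Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns the 3D scene representation and estimates the camera's local trajectory during exposure time, enabling proactive compensation for motion blur caused by camera movement. In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach.
Paper | Project Page | Code (not yet)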
2023:
1. [CVPR '24] GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Authors : Chi Yan, Delin Qu, Dong Wang, Dan Xu, Zhigang Wang, Bin Zhao, Xuelong Li
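Abstract
In this paper, we introduce GS-SLAM, which is the first to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussians in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential for extending the 3D Gaussian representation to reconstruct the whole scene rather than synthesizing a static object as in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize the camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. The source code will be released upon acceptance.
Paper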
2. [CVPR '24] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Authors : Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
Abstract
Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations, including fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2× state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches, while allowing real-time rendering of a high-resolution dense 3D map.
Paper | Project Page | Code | Explanation Video
3. [CVPR '24] Gaussian Splatting SLAM
Authors : Hidenobu Matsuki, Riku Murai, Paul H. J. Kelly, Andrew J. Davison
Abstract
We present the first application of 3D Gaussian Splatting to incremental 3D reconstruction using a single moving monocular or RGB-D camera. Our Simultaneous Localisation and Mapping (SLAM) method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation, but also reconstruction of tiny and even transparent objects.
Paper | Project Page | Code | Short Presentation
4. Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Authors : Vladimir Yugay, Yue Li, Theo Gevers, Martin R. Oswald
Abstract
We present the first neural RGBD SLAM method capable of photorealistically reconstructing real-world scenes. Despite modern SLAM methods achieving impressive results on synthetic datasets, they still struggle with real-world datasets. Our approach utilizes 3D Gaussians as a primary unit for our scene representation to overcome the limitations of the previous methods. We observe that classical 3D Gaussians are hard to use in a monocular setup: they can't encode accurate geometry and are hard to optimize with single-view sequential supervision. By extending classical 3D Gaussians to encode geometry, and designing a novel scene representation and the means to grow and optimize it, we propose a SLAM system capable of reconstructing and rendering real-world datasets without compromising on speed and efficiency. We show that Gaussian-SLAM can reconstruct and photorealistically render real-world scenes. We evaluate our method on common synthetic and real-world datasets and compare it against other state-of-the-art SLAM methods. Finally, we demonstrate that the final 3D scene representation that we obtain can be rendered in real time thanks to the efficient Gaussian Splatting rendering.
Paper | Project Page | Code | Short Presentation
5. [CVPR '24] Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Authors : Huajian Huang, Longwei Li, Hui Cheng, Sai-Kit Yeung
Abstract
The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, e.g., PSNR is 30% higher and rendering speed is hundreds of times faster in the Replica dataset. Moreover, the Photo-SLAM can run at real-time speed using an embedded platform such as Jetson AGX Orin, showing the potential of robotics applications.
Paper | Project Page | Code
Sparse:
2024:
1. [CVPR '24] DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Authors : Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu
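Abstract
Radiance fields have demonstrated impressive performance in synthesizing novel views from sparse input views, yet prevailing methods suffer from high training costs and slow inference speed. This paper introduces DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields, offering real-time and high-quality few-shot novel view synthesis at low cost. Our motivation stems from the highly efficient representation and surprising quality of the recent 3D Gaussian Splatting, although it encounters geometry degradation when the number of input views decreases. In Gaussian radiance fields, we find that this degradation in scene geometry is primarily linked to the positioning of Gaussian primitives and can be mitigated by depth constraints. Consequently, we propose a Hard and Soft Depth Regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine detailed geometry reshaping, we introduce Global-Local Depth Normalization, enhancing the focus on small local depth changes. Extensive experiments on the LLFF, DTU, and Blender datasets demonstrate that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results with significantly reduced memory cost, a 25× reduction in training time, and over 3000× faster rendering speed.
Paper | Project Page | Code | Short Presentation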
2. Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting
Authors : Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III
Abstract
In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two-stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in few-view scene synthesis on opaque as well as on reflective and transparent objects.
Paper | Project Page
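The variance weighted depth supervised loss named in the Touch-GS abstract above is not spelled out there. One common way to weight a depth residual by uncertainty is inverse-variance weighting, sketched below; treating the loss this way, along with the function name and the `eps` stabilizer, is an assumption made for illustration and may differ from the paper's formulation.

```python
import numpy as np

def variance_weighted_depth_loss(rendered_depth, fused_depth, depth_variance, eps=1e-6):
    """Mean squared depth residual, down-weighted where the fused depth is uncertain.

    Inverse-variance weighting is an assumed form, not taken from the paper.
    """
    # Confident pixels (small variance) get large weights; eps avoids division by zero.
    weights = 1.0 / (depth_variance + eps)
    return float(np.mean(weights * (rendered_depth - fused_depth) ** 2))

# Hypothetical values: same residual everywhere, different uncertainty.
rendered = np.array([1.0, 2.0])
fused = np.array([1.5, 2.5])
loss_mixed = variance_weighted_depth_loss(rendered, fused, np.array([0.01, 1.0]))
loss_uncertain = variance_weighted_depth_loss(rendered, fused, np.array([1.0, 1.0]))
```

With identical residuals, the loss is much larger when one pixel has low variance, so optimization is pulled toward matching depth where the fused touch and vision estimate is trusted most.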
3. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Authors : Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
Abstract
We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussian Splatting models via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses 10× fewer parameters and infers more than 2× faster while providing higher appearance and geometry quality as well as better cross-dataset generalization.
Paper | Project Page | Code
4. latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Authors : Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen
Abstract
We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a lightweight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not enable fast inference of high resolution novel views due to slow volume rendering, or are limited to interpolation of close input views, even in simpler settings with a single central object, where 360-degree generalization is possible. In this work, we combine a regression-based approach with a generative model, moving towards both of these capabilities within the same method, trained purely on readily available real video data. The core of our method are variational 3D Gaussians, a representation that efficiently encodes varying uncertainty within a latent space consisting of 3D feature Gaussians. From these Gaussians, specific instances can be sampled and rendered via efficient Gaussian splatting and a fast, generative decoder network. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data.
Paper | Project Page | Code
5. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Authors : Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein
Abstract
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1 s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Paper | Project Page | Code
6. Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Authors : Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang
Abstract
We tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed, approximately 0.6 seconds on a single NVIDIA A100 GPU. Paper
7. CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Authors : Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari
Abstract
The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud-like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes. Paper | Project Page
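For readers unfamiliar with the total variation term mentioned in the CoherentGS abstract, the following is a minimal sketch of an anisotropic total variation penalty on a 2D grid. The abstract does not say which quantity the loss is applied to or how it is weighted, so the choice of a per-pixel map (for example depth) and the function name `total_variation` are assumptions made purely for illustration.

```python
def total_variation(grid):
    """Anisotropic total variation of a 2D grid given as a list of equal-length rows.

    Sums the absolute differences between each cell and its right and
    lower neighbours. Illustrative only: not taken from the CoherentGS code.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbour
                total += abs(grid[r][c + 1] - grid[r][c])
            if r + 1 < rows:  # vertical neighbour
                total += abs(grid[r + 1][c] - grid[r][c])
    return total


flat_map = [[1.0, 1.0], [1.0, 1.0]]
rough_map = [[0.0, 1.0], [2.0, 4.0]]
print(total_variation(flat_map), total_variation(rough_map))
```

A constant map incurs no penalty, while abrupt changes between neighbouring pixels raise it, which is why such a term encourages smooth, coherent values.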
8. InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds
Authors : Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang
Abstract
While novel view synthesis (NVS) has made substantial progress in 3D computer vision, it typically requires an initial estimation of camera intrinsics and extrinsics from dense viewpoints. This pre-processing is usually conducted via a Structure-from-Motion (SfM) pipeline, a procedure that can be slow and unreliable, particularly in sparse-view scenarios with insufficient matched features for accurate reconstruction. In this work, we integrate the strengths of point-based representations (e.g., 3D Gaussian Splatting, 3D-GS) with end-to-end dense stereo models (DUSt3R) to tackle the complex yet unresolved issues in NVS under unconstrained settings, which encompasses pose-free and sparse view challenges. Our framework, InstantSplat, unifies dense stereo priors with 3D-GS to build 3D Gaussians of large-scale scenes from sparse-view and pose-free images in less than 1 minute. Specifically, InstantSplat comprises a Coarse Geometric Initialization (CGI) module that swiftly establishes a preliminary scene structure and camera parameters across all training views, utilizing globally-aligned 3D point maps derived from a pre-trained dense stereo pipeline. This is followed by the Fast 3D-Gaussian Optimization (F-3DGO) module, which jointly optimizes the 3D Gaussian attributes and the initialized poses with pose regularization. Experiments conducted on the large-scale outdoor Tanks & Temples datasets demonstrate that InstantSplat significantly improves SSIM (by 32%) while concurrently reducing Absolute Trajectory Error (ATE) by 80%. These establish InstantSplat as a viable solution for scenarios involving pose-free and sparse-view conditions. Paper | Project Page | Code | Explanation Video
9. Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
Authors : Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen
Abstract
We aim to tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diffusion models can strongly improve the reconstruction of a scene with low-cost fine-tuning. Specifically, we present SparseSplat360 (Sp2360), a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views. Due to superior training and rendering speeds, we use an explicit scene representation in the form of 3D Gaussians over NeRF-based implicit representations. We propose an iterative update strategy to fuse generated pseudo novel views with existing 3D Gaussians fitted to the initial sparse inputs. As a result, we obtain a multi-view consistent scene representation with details coherent with the observed inputs. Our evaluation on the challenging Mip-NeRF360 dataset shows that our proposed 2D to 3D distillation algorithm considerably improves the performance of a regularized version of 3DGS adapted to a sparse-view setting and outperforms existing sparse-view reconstruction methods in 360 scene reconstruction. Qualitatively, our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail. Paper | Code (not yet)
2023:
1. SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting
Authors : Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi
Abstract
The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few-shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views is reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360 scenes from sparse training views. We find that using naive depth priors is not sufficient and integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by up to 30.5% and NeRF-based methods by up to 15.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost. Paper | Project Page | Code (not yet)
2. FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
Authors : Zehao Zhu, Zhiwen Fan, Yifan Jiang, Zhangyang Wang
Abstract
Novel view synthesis from limited observations remains an important and persistent task. However, high efficiency in existing NeRF-based few-shot view synthesis is often compromised to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time and photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse initialized SfM points with a thoughtfully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations, subsequently infilling local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator within the Gaussians optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from sparse points observed from limited input viewpoints, our FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender. Paper | Project Page | Code
3. [CVPR '24] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
Authors : David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann
Abstract
We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field. Paper | Project Page | Code
4. [CVPR '24] Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Authors : Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi
Abstract
We introduce the Splatter Image, an ultra-fast approach for monocular 3D object reconstruction which operates at 38 FPS. Splatter Image is based on Gaussian Splatting, which has recently brought real-time rendering, fast training, and excellent scaling to multi-view reconstruction. For the first time, we apply Gaussian Splatting in a monocular reconstruction setting. Our approach is learning-based, and, at test time, reconstruction only requires the feed-forward evaluation of a neural network. The main innovation of Splatter Image is the surprisingly straightforward design: it uses a 2D image-to-image network to map the input image to one 3D Gaussian per pixel. The resulting Gaussians thus have the form of an image, the Splatter Image. We further extend the method to incorporate more than one image as input, which we do by adding cross-view attention. Owing to the speed of the renderer (588 FPS), we can use a single GPU for training while generating entire images at each iteration in order to optimize perceptual metrics like LPIPS. On standard benchmarks, we demonstrate not only fast reconstruction but also better results than recent and much more expensive baselines in terms of PSNR, LPIPS, and other metrics. Paper | Project Page | Code | Short Presentation
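The "one 3D Gaussian per pixel" idea can be pictured as unprojecting each pixel along its camera ray. The sketch below is an illustration under stated assumptions, not the Splatter Image implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a per-pixel depth measured along the camera z-axis, and an optional per-pixel 3D offset. It computes only the Gaussian centers; opacity, covariance, and color would be further per-pixel channels.

```python
def unproject_pixel(u, v, depth, fx, fy, cx, cy, offset=(0.0, 0.0, 0.0)):
    """Return the camera-space 3D center for pixel (u, v) at the given z-depth.

    Pinhole model (an assumption of this sketch): x = (u - cx) / fx * z.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x + offset[0], y + offset[1], depth + offset[2])


def gaussian_centers_from_depth(depth_map, fx, fy, cx, cy):
    """Map an H x W depth map (list of rows) to an H x W grid of 3D centers."""
    return [
        [unproject_pixel(u, v, d, fx, fy, cx, cy) for u, d in enumerate(row)]
        for v, row in enumerate(depth_map)
    ]


# Toy 2 x 2 depth map with unit focal length and the principal point at (0.5, 0.5).
centers = gaussian_centers_from_depth([[2.0, 2.0], [2.0, 2.0]], 1.0, 1.0, 0.5, 0.5)
print(centers[0][0], centers[1][1])
```

Because the output grid has the same layout as the input image, the set of Gaussians can itself be stored and processed as an image, which is the property the abstract highlights.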
Navigation:
2024:
1. GaussNav: Gaussian Splatting for Visual Navigation
Authors : Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li
Abstract
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors. Existing map-based navigation methods largely adopt the representation form of Bird's Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene. To address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for the IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects. Our GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. Paper | Project Page | Code
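As context for the SPL figures quoted above: Success weighted by Path Length is commonly defined as the mean over episodes of S_i · l_i / max(p_i, l_i), where S_i is a binary success flag, l_i the shortest-path length to the goal, and p_i the length of the path the agent actually took. The sketch below follows that common definition; the function name and the toy episodes are illustrative and not taken from the GaussNav code.

```python
def success_weighted_path_length(successes, shortest_lengths, path_lengths):
    """Mean of S_i * l_i / max(p_i, l_i) over all episodes.

    successes: 0/1 flags; shortest_lengths: l_i > 0; path_lengths: p_i >= 0.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if not successes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        total += float(success) * shortest / max(taken, shortest)
    return total / len(successes)


# Three toy episodes: a detour twice the optimal length, a failure, and an optimal run.
print(success_weighted_path_length([1, 0, 1], [10.0, 5.0, 4.0], [20.0, 5.0, 4.0]))
```

A failed episode contributes zero regardless of path length, and a successful one is discounted by how much longer the agent's path was than the shortest one.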
2. 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization
Authors : Peng Jiang, Gaurav Pandey, Srikanth Saripalli
Abstract
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset. Paper
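Normalized cross-correlation, which the abstract uses to compare a query image with a rendered one, can be sketched as follows. This is the standard zero-mean NCC on two equally sized patches flattened to lists of intensities; how 3DGS-ReLoc selects patches or searches over candidates is not described in the abstract and is not modelled here. The function name is illustrative.

```python
import math


def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean NCC between two equally sized sequences of pixel intensities.

    Returns a value in [-1, 1]. Returns 0.0 if either patch is constant,
    a convention chosen for this sketch because NCC is undefined there.
    """
    n = len(patch_a)
    if n == 0 or n != len(patch_b):
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    dev_a = [a - mean_a for a in patch_a]
    dev_b = [b - mean_b for b in patch_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0.0:
        return 0.0
    return numerator / denominator


query = [1.0, 2.0, 3.0, 4.0]
rendered = [12.0, 14.0, 16.0, 18.0]  # same pattern with different gain and bias
print(normalized_cross_correlation(query, rendered))
```

Because the means are subtracted and the result is divided by the patch norms, the score is unaffected by a positive brightness gain or bias between the two images, which is useful when comparing a real photo with a rendering.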
3. Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF
Authors : Guangyi Liu, Wen Jiang, Boshu Lei, Vivek Pandey, Kostas Daniilidis, Nader Motee
Abstract
This work proposes a novel approach to bolster both the robot's risk assessment and safety measures while deepening its understanding of 3D scenes, which is achieved by leveraging Radiance Field (RF) models and 3D Gaussian Splatting. To further enhance these capabilities, we incorporate additional sampled views from the environment with the RF model. One of our key contributions is the introduction of Risk-aware Environment Masking (RaEM), which prioritizes crucial information by selecting the next-best-view that maximizes the expected information gain. This targeted approach aims to minimize uncertainties surrounding the robot's path and enhance the safety of its navigation. Our method offers a dual benefit: improved robot safety and increased efficiency in risk-aware 3D scene reconstruction and understanding. Extensive experiments in real-world scenarios demonstrate the effectiveness of our proposed approach, highlighting its potential to establish a robust and safety-focused framework for active robot exploration and 3D scene understanding. Paper
4. 3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration
Authors : Quentin Herau, Moussab Bennehar, Arthur Moreau, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux
Abstract
Reliable multimodal sensor fusion algorithms require accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high computational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new rendering approach to achieve faster multi-sensor calibration. We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset. Paper
5. HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Authors : Zhuopeng Li, Yilin Zhang, Chenming Wu, Jianke Zhu, Liangjun Zhang
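Abstract
The rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, the previous 3DGS-based methods have limitations in urban scenes due to reliance on initial Structure-from-Motion (SfM) points and difficulties in rendering distant, sky and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependency on SfM point initialization, allowing for rendering of urban scenes, and incorporates point densification to enhance rendering quality in problematic regions during training. Furthermore, we introduce Gaussian Direction Encoding as an alternative to spherical harmonics in the rendering pipeline, which enables view-dependent color representation. To account for multi-camera systems, we introduce neural warping to enhance object consistency across different cameras. Experimental results on widely used autonomous driving datasets demonstrate that HO-Gaussian achieves photo-realistic rendering in real-time on multi-camera urban datasets. Paper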
6. SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
Authors : Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun
Abstract
Novel View Synthesis (NVS) for street scenes plays a critical role in autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although thrilling progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as a condition, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. Then we apply the Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models, and demonstrate its advance in rendering images from broader views. Paper
7. BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting
Authors : Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang
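Abstract
Image-goal navigation enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely heavily on data and computationally expensive learning-based approaches or lack efficiency in complex environments due to insufficient exploration strategies. To address these limitations, we propose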