Conference item
N2F2: hierarchical scene understanding with nested neural feature fields
- Abstract:
- Understanding complex scenes at multiple levels of abstraction remains a formidable challenge in computer vision. To address this, we introduce Nested Neural Feature Fields (N2F2), a novel approach that employs hierarchical supervision to learn a single feature field, wherein different dimensions within the same high-dimensional feature encode scene properties at varying granularities. Our method allows for a flexible definition of hierarchies, tailored to either the physical dimensions or semantics or both, thereby enabling a comprehensive and nuanced understanding of scenes. We leverage a 2D class-agnostic segmentation model to provide semantically meaningful pixel groupings at arbitrary scales in the image space, and query the CLIP vision-encoder to obtain language-aligned embeddings for each of these segments. Our proposed hierarchical supervision method then assigns different nested dimensions of the feature field to distill the CLIP embeddings using deferred volumetric rendering at varying physical scales, creating a coarse-to-fine representation. Extensive experiments show that our approach outperforms the state-of-the-art feature field distillation methods on tasks such as open-vocabulary 3D segmentation and localization, demonstrating the effectiveness of the learned nested feature field.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 4.6MB)
- Publisher copy:
- 10.1007/978-3-031-73202-7_12
Authors
+ European Research Council
- Funder identifier:
- https://ror.org/0472cxd90
- Grant:
- 101001212
+ Engineering and Physical Sciences Research Council
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/T028572/1
- Publisher:
- Springer
- Host title:
- Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LIX
- Pages:
- 197–214
- Series:
- Lecture Notes in Computer Science
- Series number:
- 15117
- Publication date:
- 2024-11-21
- Acceptance date:
- 2024-07-01
- Event title:
- 18th European Conference on Computer Vision (ECCV 2024)
- Event location:
- Milan, Italy
- Event website:
- https://eccv.ecva.net/
- Event start date:
- 2024-09-29
- Event end date:
- 2024-10-04
- DOI:
- 10.1007/978-3-031-73202-7_12
- EISSN:
- 1611-3349
- ISSN:
- 0302-9743
- EISBN:
- 978-3-031-73202-7
- ISBN:
- 978-3-031-73201-0
- Language:
- English
- Keywords:
- Pubs id:
- 2017721
- Local pid:
- pubs:2017721
- Deposit date:
- 2024-07-22
- ARK identifier:
Terms of use
- Copyright holder:
- Bhalgat et al.
- Copyright date:
- 2024
- Rights statement:
- © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
- Notes:
- This paper was presented at the 18th European Conference on Computer Vision (ECCV 2024), 29 September to 4 October 2024, Milan, Italy. This is the accepted manuscript version of the article. The final version is available online from Springer at https://dx.doi.org/10.1007/978-3-031-73202-7_12