Conference item
MOSE: a new dataset for video object segmentation in complex scenes
- Abstract:
- Video object segmentation (VOS) aims at segmenting a particular object throughout an entire video clip. State-of-the-art VOS methods have achieved excellent performance (e.g., over 90% J & F) on existing datasets. However, since the target objects in these datasets are usually relatively salient, dominant, and isolated, VOS in complex scenes has rarely been studied. To revisit VOS and make it more applicable in the real world, we collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex scenarios. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks. The most notable feature of the MOSE dataset is its complex scenes with crowded and occluded objects. The target objects in the videos are commonly occluded by others and disappear in some frames. To analyze MOSE, we benchmark 18 existing VOS methods under 4 different settings and conduct comprehensive comparisons. The experiments show that current VOS algorithms cannot perceive objects in complex scenes well. For example, under the semi-supervised VOS setting, the highest J & F achieved by existing state-of-the-art VOS methods is only 59.4% on MOSE, much lower than their ∼90% J & F on DAVIS. The results reveal that although excellent performance has been achieved on existing benchmarks, unresolved challenges remain in complex scenes, and more effort is needed to explore them in the future.
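The J & F score quoted in the abstract averages two per-frame measures: region similarity J (the Jaccard index, or IoU, between predicted and ground-truth masks) and contour accuracy F (an F-measure between predicted and ground-truth mask boundaries). The following is a minimal NumPy sketch of that metric as it is commonly defined for DAVIS-style evaluation, not the official MOSE or DAVIS evaluation code. The function names are hypothetical, and several simplifications are assumptions: a square tolerance window instead of a disk, a boundary tolerance of 0.008 × the image diagonal (a commonly used default), and the convention that two empty masks count as a perfect match.

```python
import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J: Jaccard index (IoU) between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: both masks empty is a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def _dilate_square(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2r+1)x(2r+1) square window via array shifts.

    Assumption: a square window stands in for the disk used by DAVIS tooling.
    Pixels outside the image are treated as False.
    """
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out |= padded[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
    return out


def _boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels that touch a background pixel in their 8-neighbourhood."""
    return mask & _dilate_square(~mask, 1)


def boundary_f(pred: np.ndarray, gt: np.ndarray, tol_ratio: float = 0.008) -> float:
    """F: boundary F-measure with a tolerance proportional to the image diagonal."""
    pred_b = _boundary(pred.astype(bool))
    gt_b = _boundary(gt.astype(bool))
    radius = max(1, int(np.ceil(tol_ratio * np.hypot(*pred.shape))))
    n_pred, n_gt = pred_b.sum(), gt_b.sum()
    if n_pred == 0 and n_gt == 0:
        return 1.0
    if n_pred == 0 or n_gt == 0:
        return 0.0
    # A predicted boundary pixel is correct if a GT boundary pixel lies within the window.
    precision = (pred_b & _dilate_square(gt_b, radius)).sum() / n_pred
    recall = (gt_b & _dilate_square(pred_b, radius)).sum() / n_gt
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def j_and_f(preds, gts):
    """Mean J, mean F, and their average (J & F) over a sequence of frames."""
    j = float(np.mean([region_similarity(p, g) for p, g in zip(preds, gts)]))
    f = float(np.mean([boundary_f(p, g) for p, g in zip(preds, gts)]))
    return j, f, (j + f) / 2
```

For a 100 × 100 frame, the boundary tolerance works out to ceil(0.008 × 141.4) = 2 pixels. This explains why F is more forgiving than J for small misalignments of the mask outline but still drops to zero when the predicted object is in the wrong place.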
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher copy:
- 10.1109/iccv51070.2023.01850
- Publisher:
- IEEE
- Host title:
- Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023)
- Pages:
- 20167-20177
- Publication date:
- 2024-01-15
- Acceptance date:
- 2023-07-14
- Event title:
- IEEE/CVF International Conference on Computer Vision (ICCV 2023)
- Event location:
- Paris, France
- Event website:
- https://iccv2023.thecvf.com/
- Event start date:
- 2023-10-02
- Event end date:
- 2023-10-06
- DOI:
- 10.1109/iccv51070.2023.01850
- EISSN:
- 2380-7504
- ISSN:
- 1550-5499
- EISBN:
- 9798350307184
- ISBN:
- 9798350307191
- Language:
- English
- Pubs id:
- 1700226
- Local pid:
- pubs:1700226
- Deposit date:
- 2024-03-20
Terms of use
- Copyright holder:
- IEEE
- Copyright date:
- 2023
- Rights statement:
- © 2023 IEEE
- Notes:
- This paper was presented at the IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2nd-6th October 2023, Paris, France. This is the accepted manuscript version of the article. The final version is available online from IEEE at: https://dx.doi.org/10.1109/iccv51070.2023.01850