Towards human-centric story understanding in video

Korbar, B

Abstract:: With endless amounts of data being uploaded every day, the potential for swift development of artificial intelligence has never been higher. Videos in particular contain a plethora of information for learning about the world. We can discern actions, interactions, movement patterns, speech, etc. But all too often, research tends to group and classify: a dog is a dog is a dog.

One of the challenges lies in transcending conventional class-based visual understanding and exploring the realm of instances. This thesis concerns itself with both named instances -- more specific than traditional classes -- and open-world, open-set instances -- more general than conventional class frameworks. In it, we discuss methods that address these challenges and could later serve as building blocks for holistic story understanding.

The thesis is structured in two broad themes: (1) identity-agnostic video understanding methods, and (2) personalisation of various video understanding tasks.

We first develop class-agnostic methods that support better tracking, re-identification, retrieval and semantic video processing. Our work demonstrates that localisation and re-identification of a person or an object in a video can be trained jointly, using semantically-initialised embeddings. Furthermore, we show that by designing a task-agnostic video sampler, we can increase the number of frames a large language model can process, allowing us to learn from progressively longer videos.

We then focus on making video-understanding tasks identity-dependent. We first design a method that tackles problems of compound retrieval, being able to jointly reason about 'who is doing what and where'. We then generalise this approach to work not only on humans, but on any arbitrary object. We show that large visual-language models can recognise a specific instance (e.g. 'my dog Chia') amongst a large corpus of images. Finally, we recognise that not only visual representations but also speech needs to be personalised. To this end, we present a method able to assign character names to speech segments even across multiple TV shows. Thus, we demonstrate crucial building blocks necessary for a more in-depth story understanding.

Cite this record

APA Style

Korbar, B. (2024). Towards human-centric story understanding in video [PhD thesis]. University of Oxford.

MLA Style

Korbar, B. Towards Human-Centric Story Understanding in Video. 2024. University of Oxford, PhD thesis.

Chicago Style

Korbar, B. 2024. “Towards Human-Centric Story Understanding in Video.” PhD thesis, University of Oxford.

Access Document

Files:: Korbar_2024_Towards_human-centric_story.pdf

(Preview, Dissemination version, pdf, 40.0MB, Terms of use)

Authors

Korbar, B

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Contributors

Andrew, Z

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Supervisor
ORCID:: 0000-0002-8945-8573

DOI:: 10.5287/ora-zrr7dx6pa
Type of award:: DPhil
Level of award:: Doctoral
Awarding institution:: University of Oxford

Language:: English
Keywords:: video understanding; deep learning
Subjects:: Information storage and retrieval systems--Engineering--Code words; Computer vision; Video understanding
Deposit date:: 2026-04-09
ARK identifier:: ark:/29072/ora_7d210c2a6c9a4e2486827b2ed63cd172

Terms of use

Copyright holder:: Bruno Korbar

Licence:: Terms and Conditions of Use for Oxford University Research Archive
