Conference item
A pipeline for the creation of multimodal corpora from YouTube videos
- Abstract:
- This paper introduces an open-source pipeline for the creation of multimodal corpora from YouTube videos. It minimizes storage and bandwidth requirements, because the videos themselves need not be downloaded and can remain on YouTube’s servers. It also minimizes processing requirements by using YouTube’s automatically generated subtitles, thus avoiding a computationally expensive automatic speech recognition processing step. The pipeline combines standard tools and provides as its output a corpus file in the industry-standard vertical format used by many corpus managers. It is straightforwardly extensible with the addition of further levels of annotation and can be adapted to languages other than English.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
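As a rough illustration of the output described in the abstract, the sketch below renders timed subtitle cues in vertical format: one token per line, with structures written as XML-like tags on their own lines. It is not the paper's implementation. The function name `cues_to_vertical`, the `<doc>` and `<cue>` structure names, their attributes, and the whitespace tokenization are all assumptions made for this example.

```python
from xml.sax.saxutils import quoteattr


def cues_to_vertical(video_id, cues):
    """Render (start, end, text) subtitle cues as vertical text.

    Hypothetical layout: a <doc> structure per video and a <cue>
    structure per subtitle cue; both names are illustrative only.
    """
    lines = [f"<doc id={quoteattr(video_id)}>"]
    for start, end, text in cues:
        # Naive whitespace tokenization (an assumption; a real pipeline
        # would normally use a proper tokenizer and may escape markup).
        tokens = text.split()
        if not tokens:
            continue  # skip cues that contain no tokens
        lines.append(f'<cue start="{start:.2f}" end="{end:.2f}">')
        lines.extend(tokens)  # one token per line
        lines.append("</cue>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [(0.0, 1.8, "hello and welcome"), (1.8, 3.1, "to the channel")]
    print(cues_to_vertical("VIDEO_ID", example), end="")
```

Keeping the cue timestamps as structure attributes is one way to let a corpus query be linked back to a position in the online video without storing the video itself.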
Access Document
- Publication website:
- https://aclanthology.org/2023.limo-1.1
Authors
- Publisher:
- Association for Computational Linguistics
- Host title:
- Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing
- Pages:
- 1–5
- Publication date:
- 2023-09-01
- Acceptance date:
- 2023-08-21
- Event title:
- Linguistic Insights from and for Multimodal Language Processing @KONVENS 2023 (LIMO 2023)
- Event location:
- Ingolstadt, Germany
- Event website:
- https://sites.google.com/view/limo2023/home
- Event start date:
- 2023-09-22
- Event end date:
- 2023-09-22
- ISBN:
- 979-8-89176-031-8
- Language:
- English
- Pubs id:
- 1560567
- Local pid:
- pubs:1560567
- Deposit date:
- 2023-11-09
Terms of use
- Copyright holder:
- Association for Computational Linguistics
- Copyright date:
- 2023
- Rights statement:
- © 2023 Association for Computational Linguistics
- Notes:
- This paper was presented at the workshop Linguistic Insights from and for Multimodal Language Processing @KONVENS 2023 (LIMO 2023), 22 September 2023, Ingolstadt, Germany.