Thesis
Deep learning sonographer visual attention
- Abstract:
Current automated fetal ultrasound (US) analysis methods are heavily influenced by the recent success of deep learning in computer vision. Models built on convolutional neural networks (CNNs) for fetal biometry plane detection have surpassed classic models built on hand-crafted features, but training such networks requires large datasets, especially sonographer annotations, which are usually unavailable in US image analysis. Meanwhile, sonographer visual attention has proven to be a strong prior for human interpretation of US video frames. This thesis explores the use of sonographer visual attention, in the form of gaze-tracking data, within deep learning frameworks to assist US image analysis tasks.
We created a single-sweep dataset of fetal abdominal videos with retrospective gaze tracking, then implemented deep learning frameworks that use gaze-tracking data to assist fetal biometry plane detection. We first developed a CNN called SonoEyeNet for standardized abdominal circumference plane (ACP) detection informed by sonographer visual attention. We demonstrate that, with the assistance of human visual attention information, ACP detection performance improves compared with models that do not use gaze information.
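A gaze-informed CNN needs gaze-tracking data in a spatial form it can consume. The sketch below is a minimal illustration, assuming fixations are given as (row, col) pixel coordinates and that each one is blurred with an isotropic Gaussian into a normalized attention map. The helpers `gaze_attention_map` and `attention_pool`, the `sigma` value and the attention-weighted pooling step are hypothetical and are not taken from SonoEyeNet itself.

```python
import math


def gaze_attention_map(fixations, height, width, sigma=2.0):
    """Render (row, col) gaze fixations as a 2D attention map summing to 1.

    Hypothetical helper: the Gaussian blur and sigma are assumptions.
    """
    amap = [[0.0] * width for _ in range(height)]
    for fr, fc in fixations:
        for r in range(height):
            for c in range(width):
                d2 = (r - fr) ** 2 + (c - fc) ** 2
                # Each fixation adds an isotropic Gaussian centred on it.
                amap[r][c] += math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in amap)
    if total > 0:  # leave an all-zero map when there are no fixations
        amap = [[v / total for v in row] for row in amap]
    return amap


def attention_pool(feature_map, amap):
    """Attention-weighted average of a single-channel feature map (hypothetical)."""
    return sum(f * a
               for f_row, a_row in zip(feature_map, amap)
               for f, a in zip(f_row, a_row))
```

For example, `gaze_attention_map([(2, 3)], height=5, width=6)` peaks at row 2, column 3. Because the map sums to 1, `attention_pool` is a weighted average that emphasizes regions the sonographer looked at; a real network would apply such weighting to learned multi-channel features rather than a single toy map.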
We extended this framework by proposing a novel multi-task CNN called Multi-task SonoEyeNet (MSEN) that learns, from sonographer gaze-tracking data, to generate clinically relevant spatial visual attention maps, and used the predicted visual attention maps to assist ACP detection. This framework expands the potential clinical usefulness of the previous one by eliminating the need for input gaze-tracking data during inference, without compromising ACP detection performance.
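A multi-task network of this kind is typically trained with a combined objective. The following sketch assumes, without confirmation from the abstract, a softmax cross-entropy term for plane classification plus a weighted Kullback-Leibler divergence between the gaze-derived target attention map and the predicted map. The weight `lam` and all function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true plane class."""
    return -math.log(softmax(logits)[label])


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for flattened attention maps that each sum to 1."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)


def multitask_loss(plane_logits, plane_label, target_map, predicted_map, lam=1.0):
    """Hypothetical two-task objective: plane detection + weighted attention loss."""
    detection = cross_entropy(plane_logits, plane_label)
    attention = kl_divergence(target_map, predicted_map)
    return detection + lam * attention
```

Under this assumed setup, gaze data only supervises the attention branch during training, so at inference the network needs just the video frame, which matches the stated goal of removing gaze input at inference time.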
With the availability of a novel dataset containing real-time screen recordings of US anomaly scans coupled with simultaneous gaze tracking, we further extended the CNN framework by introducing a bidirectional convolutional long short-term memory (LSTM) network as a recurrent module, both to model spatio-temporal visual attention and to detect all standard biometry planes of the fetal abdomen (ACP), head (HCP) and femur (FLP). We demonstrated that modeling spatio-temporal visual attention further improves standard biometry plane detection performance.
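The recurrent module can be illustrated with a simplified, single-channel convolutional LSTM run over the frame sequence in both directions. This is a sketch under stated assumptions: it omits the peephole terms of the original ConvLSTM formulation, uses 3x3 zero-padded convolutions with hand-set weights from the hypothetical `make_params`, and simply pairs the forward and backward hidden maps for each frame, since the abstract does not say how the thesis combines them.

```python
import math


def conv2d_same(x, k):
    """3x3 zero-padded 'same' convolution (cross-correlation, as in most
    deep learning libraries) of a single-channel 2D map."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += k[dr + 1][dc + 1] * x[rr][cc]
            out[r][c] = acc
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def convlstm_step(x, h, c, params):
    """One simplified ConvLSTM step (no peephole terms)."""
    def gate(name, act):
        zx = conv2d_same(x, params[name + "_x"])
        zh = conv2d_same(h, params[name + "_h"])
        bias = params[name + "_b"]
        return [[act(a + b + bias) for a, b in zip(row_x, row_h)]
                for row_x, row_h in zip(zx, zh)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell update
    c_new = [[fv * cv + iv * gv for fv, cv, iv, gv in zip(fr, cr, ir, gr)]
             for fr, cr, ir, gr in zip(f, c, i, g)]
    h_new = [[ov * math.tanh(cv) for ov, cv in zip(o_row, c_row)]
             for o_row, c_row in zip(o, c_new)]
    return h_new, c_new


def run_direction(frames, params):
    """Run the cell over frames in the given order, starting from a zero state."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    outputs = []
    for x in frames:
        h, c = convlstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


def bidirectional_convlstm(frames, fwd_params, bwd_params):
    """Pair each frame's forward and backward hidden-state maps."""
    fwd = run_direction(frames, fwd_params)
    bwd = run_direction(frames[::-1], bwd_params)[::-1]
    return list(zip(fwd, bwd))


def make_params(scale):
    """Hypothetical hand-set weights; a trained model would learn these."""
    kernel = [[scale] * 3 for _ in range(3)]
    params = {}
    for name in "ifog":
        params[name + "_x"] = kernel
        params[name + "_h"] = kernel
        params[name + "_b"] = 0.0
    return params
```

For example, `bidirectional_convlstm(frames, make_params(0.1), make_params(-0.1))` returns one (forward, backward) pair of hidden maps per frame. In a trained model the kernels would be learned, and the bidirectional pass lets each frame's representation depend on both earlier and later frames of the video.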
This work constitutes the first demonstration that learning sonographer visual attention from ultrasound video within a deep learning framework is an efficient way to assist other US image analysis tasks.
- Files:
- Preview, Dissemination version, pdf, 32.3MB
- Supplementary materials, zip, 251.7MB
Authors
Contributors
- Noble, J
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Engineering Science
- Role:
- Supervisor
- ORCID:
- 0000-0002-3060-3772
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2026-04-28
- ARK identifier:
Terms of use
- Copyright holder:
- Yifan Cai
- Copyright date:
- 2019