Deep learning for automatic segmentation of the nuclear envelope in electron microscopy data, trained with volunteer segmentations

Advancements in volume electron microscopy mean it is now possible to generate thousands of serial images at nanometre resolution overnight, yet the gold standard approach for data analysis remains manual segmentation by an expert microscopist, resulting in a critical research bottleneck. Although some machine learning approaches exist in this domain, we remain far from realizing the aspiration of a highly accurate, yet generic, automated analysis approach, with a major obstacle being lack of sufficient high‐quality ground‐truth data. To address this, we developed a novel citizen science project, Etch a Cell, to enable volunteers to manually segment the nuclear envelope (NE) of HeLa cells imaged with serial blockface scanning electron microscopy. We present our approach for aggregating multiple volunteer annotations to generate a high‐quality consensus segmentation and demonstrate that data produced exclusively by volunteers can be used to train a highly accurate machine learning algorithm for automatic segmentation of the NE, which we share here, in addition to our archived benchmark data.


| INTRODUCTION
Until recently, the study of cell morphology with electron microscopy (EM) was often restricted to qualitative illustration, as technological limitations prevented quantitative analysis of samples in three dimensions. Development of novel volume EM methodologies, including serial blockface scanning electron microscopy (SBF SEM) 1 and focused ion beam SEM (FIB SEM), 2 has enabled automated acquisition of images through greater depths at high resolution, 3 with one microscope able to generate hundreds of gigabytes of aligned serial images per day.
However, our ability to analyse these data has not seen comparable advancement; segmentation of EM images remains a difficult and time-consuming manual process. Hence, to fully realize the analytical potential of EM, there is a great need to develop fast, generalizable and accurate analysis solutions. Although some EM image analysis can be automated through application of methods such as machine learning, 4-8 these advances have mainly benefited specific domains such as connectomics, 9,10 where the segmentation problem is focused on tracing neurons and synapses in serial images from brain and nerves. This focus has generated a large amount of 'ground truth' data that have been successfully used in deep learning to generate algorithms to automate the task.

The Zooniverse Volunteer Community: This publication has been made possible by the participation of volunteers in the Etch A Cell project. Their contributions are acknowledged at https://www.zooniverse.org/projects/h-spiers/etch-a-cell/about/results.
The same cannot be said of cell biology, where the segmentation challenge is more diverse, encompassing common organelles such as the nucleus, nuclear envelope (NE), mitochondria, endoplasmic reticulum and endosomes, as well as rare or transient organelles such as autophagosomes, secretory granules and phase-separated entities. As in connectomics, the production of ground truth segmentations has, to date, relied on the effort of the expert EM community. At present rates of data generation, this community alone cannot produce sufficient ground truth segmentation data representative of the appearance of the full range of organelles across different experimental conditions and biological model systems. To enable data analysis at a scale beyond the capacity of the research community, we engaged a global community of volunteers through a novel online citizen science project, 'Etch a Cell' (https://www.zooniverse.org/projects/h-spiers/etch-a-cell), which asked members of the public to manually segment the NE. The NE was chosen as the target for volunteer segmentation because it is the most easily identifiable subcellular structure for which reliable automatic segmentation was not widely available.
The NE is a double lipid bilayer found in most eukaryotic cells, where it surrounds the nucleoplasm and encloses the genetic material of the cell. Alterations in the structure of the NE have been associated with disease, 11 including cancer 12,13 and nuclear laminopathies. 14 However, despite the clearly critical role of the NE in cell function, the nanoscale three-dimensional structure of this organelle has been poorly understood to date. In addition to its biological importance, segmentation of the NE is often a critical first step in the segmentation of a cell, as this structure provides important context for the three-dimensional spatial distribution of other organelles.
Here, we present our method for establishing a high-quality consensus segmentation from multiple volunteer annotations on the same image. We demonstrate that exclusively volunteer produced data can be used to train a machine learning model for highly accurate automatic segmentation of the NE. Finally, we present a novel multi-axis modification of our machine learning algorithm that resulted in a marked improvement in model performance. We share all benchmark data and algorithms produced for the use of the wider research community.

| Etch A Cell: An online citizen science project for NE segmentation
An online citizen science project, 'Etch A Cell' (EAC), was developed to enable large-scale segmentation of the NE in volume EM data through public engagement. Although online citizen science has previously been applied in similar contexts, 15,16 to our knowledge, this is the first application of non-expert, volunteer effort to the segmentation of organelles in EM data. To maximize the potential utility of the data produced for the research community, the commonly used HeLa cell line 17 was selected for analysis. A benchmark serial image data set was generated at 10-nm pixel resolution with SBF SEM (Figure 1 and Movie S1), and n = 18 cells were selected from this volume for volunteer segmentation (Table S1 details the unique cell ID assigned to each region of interest (ROI) and provides further descriptive information), resulting in a total of n = 4241 slices for project inclusion after volume pre-processing (Figure S1A-D and Section 4). Raw data have been made available via the EMPIAR repository (deposition ID: 137, accession code: EMPIAR-10094, https://www.ebi.ac.uk/pdbe/emdb/empiar/entry/10094/). Each volunteer segmentation could consist of an arbitrary number of lines, which were recorded as an array of x,y pairs (Section 4, Figure S1F).

| An overview of volunteer interaction with EAC
In total, n = 104 612 classifications were submitted by volunteers before the EAC workflow was deactivated on 1 August 2019. As classifications could be made by unregistered volunteers, it was not possible to establish precisely how many individuals contributed; however, classifications submitted by logged-in users were associated with n = 4749 user IDs and n = 9444 IP addresses, indicating that between 5000 and 10 000 individuals contributed. As is often observed for online citizen science projects, 18

| Forming consensus from multiple segmentations: aggregating volunteer annotations
To generate sufficiently high-quality data for downstream analyses, each individual slice within EAC was presented to multiple volunteers for segmentation. As expected, most volunteer segmentations were distributed on and around the NE (Figure 2A,C and Movie S2); however, distinct classes of segmentation error were observed, including 'graffiti' (Figure 2D, possibly produced as an inadvertent consequence of well-intended classroom-based engagement using this project), 'false-positive segmentation', in which non-NE pixels are segmented (Figure 2E), and 'false-negative segmentation', where NE pixels are missed (Figure 2F). Of these error classes, the graffiti class was comparatively rare (Movie S2).

FIGURE 1 Workflow for the acquisition and segmentation of serial EM images from benchmark samples. In this study we imaged resin-embedded HeLa cells at 10 nm pixel resolution, A, using SBF SEM, B. This produced an image stack, C, of 518 sections (50 nm thickness, 8192 × 8192 pixels, Movie S1) which were used to construct a 3D volume, D. ROIs from within this volume were segmented by both experts, E, and volunteers, F. Table S1 provides further information about the individual ROIs within the volume.
To remove outlying data and establish a 'consensus' segmentation for each slice, it was necessary to aggregate the multiple volunteer annotations. As the Freehand Drawing Tool was developed specifically for EAC, it was necessary to develop a novel aggregation approach. Because of the presence of noise, erroneous segmentations, and an unknown, variable number of line segments within the data, this was not trivial; hence, multiple novel aggregation approaches were developed and explored. Of these, the 'CRIA' algorithm was selected for our analytical pipeline, as it had a number of advantages compared with other approaches as shall be outlined.
Briefly, the CRIA algorithm involved the following steps: first, closed loops were formed from each individual volunteer segmentation (Figure 3A-C), which could consist of multiple separate lines (Figure S1F). The closed loops were produced by connecting the separate lines after ordering them to minimize the distances between them. Next, interior areas were generated from the closed loops (Figure 3B). The interior areas were overlaid to generate a height map, with the 'height' reflecting the level of agreement between the separate volunteer segmentations for a single slice (Figure 3D). The consensus segmentation was determined by taking a mean 'height' level; hence the resulting …

FIGURE 3 The CRIA algorithm was developed for the aggregation of multiple volunteer segmentations. In this algorithm, each individual volunteer segmentation, A, was converted into a closed loop, B. This procedure was performed for all the segmentations associated with each slice of the ROI, as can be seen stacked in C. The closed loops were converted to interior areas and stacked, D. A final, consensus segmentation was determined as the outline of all interior areas where half or more of the volunteer segmentations were in agreement, E. This generated a high-quality, volunteer-produced segmentation (5 μm scale bar), F. We show here the annotations and aggregation for slice number 150 from C001 (ROI 1656-6756-329). This process was applied to each slice of all n = 18 volunteer-segmented ROIs, allowing generation of a 3D reconstruction of each ROI, G (Movie S3).
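The line-ordering step can be illustrated with a short sketch. The source does not specify how the lines were ordered "by minimizing distances", so the greedy nearest-endpoint chaining below, and the helper name `chain_lines_into_loop`, are assumptions for illustration rather than the CRIA implementation. Each stroke is assumed to be a list of (x, y) points.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Line = List[Point]


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def chain_lines_into_loop(lines: List[Line]) -> List[Point]:
    """Greedily join one volunteer's separate strokes into a single closed loop.

    Hypothetical helper: starting from the first stroke, repeatedly append the
    remaining stroke whose nearer endpoint is closest to the current loop end,
    reversing that stroke if its last point is the nearer one.
    """
    remaining = [list(line) for line in lines if line]
    if not remaining:
        return []
    loop = remaining.pop(0)
    while remaining:
        tail = loop[-1]
        best_idx, best_reverse, best_d = 0, False, math.inf
        for i, line in enumerate(remaining):
            d_start, d_end = _dist(tail, line[0]), _dist(tail, line[-1])
            if d_start < best_d:
                best_idx, best_reverse, best_d = i, False, d_start
            if d_end < best_d:
                best_idx, best_reverse, best_d = i, True, d_end
        nxt = remaining.pop(best_idx)
        loop.extend(reversed(nxt) if best_reverse else nxt)
    if loop[0] != loop[-1]:
        loop.append(loop[0])  # close the loop back to its first point
    return loop


# Two strokes tracing the lower and upper edges of a box, the second drawn "backwards".
strokes = [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]]
print(chain_lines_into_loop(strokes))
```

A greedy chain is cheap and handles the common case of a few strokes around one nucleus; it can mis-order strokes when gaps are large relative to stroke spacing, which is one reason an aggregation step across volunteers is still needed.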

| Machine learning for NE segmentation
Aggregated volunteer NE segmentations were used to train a U-Net convolutional neural network (CNN) architecture 19,20 for automatic segmentation of the NE in SBF SEM data. Model performance was assessed through presenting the model with two previously unseen ROIs, and comparing the resulting predicted NE segmentations with 'ground truth' (Table S1 and Section 4 provide further information about ROIs used for model training, validation and testing).
Two complementary forms of 'ground truth' data were available: expert-generated segmentations (available at http://www.ebi.ac.uk/biostudies/files/S-BSST448/Expert) and aggregated volunteer segmentations (Section 4, Table S1 and Figure 4), providing a means to test two facets of model performance. In comparing the prediction with the aggregated volunteer data for each ROI, we were able to establish how well the model had learnt to perform the task of NE segmentation from the training data provided, which consisted exclusively of volunteer-produced segmentations. Comparing model performance to expert data enabled assessment of how well the model (and hence, indirectly, the volunteers) performed this task in comparison to experts.

FIGURE 4 Consensus volunteer and machine-predicted NE segmentations are high quality. Visual inspection reveals a high similarity between expert, A, aggregated volunteer, B, and machine-predicted, C, segmentations [shown for slice number 150 from C001 (ROI 1656-6756-329)], and a high degree of overlap of these segmentations with the NE. Segmentations from slices found at the top and bottom of the volume (D, E, F) showed greater segmentation variability due to the presence of NE islands and membrane parallel to the cutting plane, which make these regions more challenging to segment [shown for slice number 40 from C001 (ROI 1656-6756-329)]; 5 μm scale bar is shown on panel A. Despite this, 3D reconstruction of nuclei revealed a high similarity between expert, G, volunteer, H, and machine, I, segmented nuclei [shown for C001 (ROI 1656-6756-329)] (Movies S4 and S5). Automatic NE segmentation using our trained model applied to our full data volume captured many nuclei which had not previously been segmented through expert or volunteer effort, J (Movie S9), and the n = 18 nuclei previously segmented by volunteers, K (Movie S10). Machine-predicted segmentations were produced with TAP.
The model performed well when compared to aggregated volunteer data. The average Hausdorff distance (AHD) between the predicted segmentation and the aggregated volunteer segmentation was 1.638 pixels (corresponding to a distance of 16.377 nm) for C001, and 1.767 pixels (17.675 nm) for C006. The F-measure, recall and precision of the model were 0.700, 0.792, 0.628, respectively, for C001 and 0.687, 0.767, 0.621 for C006 (Table 1). Although these metrics may initially seem poor in comparison to similar previous work, 21,22 it should be emphasized that we are examining the overlap between lines (the NE) rather than areas (the nucleus). Hence, for easier comparison of our model with previously reported metrics, we also provide the F-measure for the nucleus area for a single slice within each ROI. As anticipated, this metric shows a much higher model performance of 0.995 (C001) and 0.991 (C006).
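To illustrate why an F-measure computed on a thin line (the NE) is much harsher than one computed on the enclosed area (the nucleus), the hypothetical sketch below scores a synthetic disc and its boundary ring against copies shifted by 2 pixels. The shapes, the 3-pixel ring width and the shift are arbitrary illustrative choices, not data from this study.

```python
import numpy as np


def f_measure(pred: np.ndarray, gt: np.ndarray):
    """Pixel-wise precision, recall and F-measure (F1) of two boolean masks."""
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def disc_and_ring(shape, centre, radius, ring_width):
    """A filled disc ('nucleus') and its outer band of width ring_width ('NE')."""
    rows, cols = np.indices(shape)
    d2 = (rows - centre[0]) ** 2 + (cols - centre[1]) ** 2
    disc = d2 <= radius ** 2
    ring = disc & (d2 > (radius - ring_width) ** 2)
    return disc, ring


shape = (128, 128)
gt_disc, gt_ring = disc_and_ring(shape, (64, 64), 40, 3)
pred_disc, pred_ring = disc_and_ring(shape, (64, 66), 40, 3)  # shifted 2 px in x

_, _, f_line = f_measure(pred_ring, gt_ring)
_, _, f_area = f_measure(pred_disc, gt_disc)
print(f"F-measure on NE-like ring:      {f_line:.3f}")
print(f"F-measure on nucleus-like disc: {f_area:.3f}")
```

A 2-pixel offset (20 nm at this pixel size) removes much of the overlap between two 3-pixel-wide rings but barely changes the overlap between the discs, which is the same direction of effect as the gap between the NE and nucleus-area metrics reported above.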
Because of differences in segmentation skill between volunteers and experts, we expected reduced performance from the model (trained exclusively on volunteer-produced data) when comparing it against expert-produced 'ground truth' data. We observed an AHD of 3.129 pixels (31.287 nm) for C001 and 3.890 pixels (38.904 nm) for C006 between the prediction and expert data. The F-measure, recall and precision were 0.340, 0.697, 0.225, respectively, for C001, and 0.375, 0.697, 0.256 for C006 (Table 1). Although most of these metrics indicate good model performance, the F-measure and precision warrant further explanation. These measures are particularly poor when comparing the model to the expert data because of an idiosyncrasy of the expert data: the expert-annotated NE is narrower (30 nm) than both the aggregated and the predicted NE (70 nm). Because the model assigns pixels as NE that were not annotated by the expert, the false-positive rate is seemingly inflated, which degrades precision and, in turn, the F-measure.
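The effect of the width mismatch can be checked with simple arithmetic. In an idealized case where a 7-pixel-wide (70 nm) prediction is perfectly centred on a 3-pixel-wide (30 nm) expert annotation, every expert pixel is recovered (recall = 1), but only 3 of every 7 predicted pixels can match, so precision is capped at 3/7 ≈ 0.43 and the F-measure at 0.6. The sketch below reproduces this on synthetic horizontal bands; it is an illustration only and ignores any misalignment, which would lower these values further.

```python
import numpy as np


def precision_recall_f(pred: np.ndarray, gt: np.ndarray):
    """Pixel-wise precision, recall and F-measure; assumes both masks are non-empty."""
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Synthetic horizontal "membrane" at 10 nm per pixel: 30 nm = 3 px, 70 nm = 7 px.
shape = (64, 100)
expert = np.zeros(shape, dtype=bool)
expert[30:33, :] = True      # 3-px-wide expert annotation (rows 30-32)
predicted = np.zeros(shape, dtype=bool)
predicted[28:35, :] = True   # 7-px-wide prediction centred on the same line (rows 28-34)

p, r, f = precision_recall_f(predicted, expert)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")  # precision=0.429 recall=1.000 F=0.600
```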
Unfortunately, it was not feasible to amend our expert ground truth, either by asking an expert to resegment the ROIs (this was not practical due to time constraints) or by dilating the width of the expert segmentation (which was not advisable, as it would introduce greater errors, e.g. incorrectly assigning cytoplasm pixels as NE). Despite this, upon visual inspection, the model performance was arguably superior to the expert segmentation, as more relevant pixels appeared to be assigned to the NE by the model (Figure 4); this raises questions regarding the legitimacy of 'ground truth' data produced by a single expert, as discussed later.

| Improved model performance with tri-axis prediction
Although the model performed well, expert visual inspection revealed some regions of under-segmentation (Figure S4A). These regions were not randomly distributed across the data but were instead localized to sites at the top and bottom of the volume (the highest and lowest z slices, Figure S4B,C), presumably due to the higher degree of visual ambiguity in these regions caused by the presence of a greater number of NE islands and by the membrane being oriented parallel to the SBF SEM imaging plane. To improve the automated segmentation, we sought to leverage additional information available in the volume. The data examined here were downscaled in the xy plane to 50 nm to be isotropic; it was therefore possible to transpose the stack and run the model along each axis (Figure S4D-F). This resulted in three orthogonal NE predictions, which were recombined to form a final segmentation, with pixels assigned to NE in all three predictions accepted (Figure S4G, Movies S6 and S7) and over-segmented pixels removed using connected components analysis (Figure S4H). Visual inspection revealed a markedly improved segmentation (Figure S4I); however, it was not possible to quantify this improvement due to a lack of appropriate ground truth data.
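The tri-axis procedure can be sketched as below, assuming an isotropic volume stored as a NumPy array in (z, y, x) order. `predict_slice` is a hypothetical stand-in for the trained U-Net (any function mapping a 2D slice to a binary mask); the released code in the project's GitHub repository may differ in detail. The phrase "pixels assigned to NE in all three predictions accepted" can be read either as a union or as an intersection of the three masks, so the sketch exposes both; a union would fit the stated aim of recovering under-segmented regions, with spurious pixels then removed by connected components analysis.

```python
from typing import Callable

import numpy as np


def tri_axis_prediction(
    volume: np.ndarray,
    predict_slice: Callable[[np.ndarray], np.ndarray],
    combine: str = "union",
) -> np.ndarray:
    """Run a 2D predictor along each axis of a (z, y, x) volume and merge the results."""
    per_axis = []
    for axis in range(3):
        # Bring the slicing axis to the front so we can iterate over 2D slices.
        slices = np.moveaxis(volume, axis, 0)
        pred = np.stack([np.asarray(predict_slice(s), dtype=bool) for s in slices], axis=0)
        # Move the axis back so all three predictions share the (z, y, x) layout.
        per_axis.append(np.moveaxis(pred, 0, axis))
    if combine == "union":
        return per_axis[0] | per_axis[1] | per_axis[2]
    if combine == "intersection":
        return per_axis[0] & per_axis[1] & per_axis[2]
    raise ValueError(f"unknown combine mode: {combine!r}")


# Toy predictor that only ever marks the first pixel of each 2D slice, so each
# axis contributes a different line through the origin of the volume.
def corner_only(s: np.ndarray) -> np.ndarray:
    m = np.zeros(s.shape, dtype=bool)
    m[0, 0] = True
    return m


vol = np.zeros((4, 5, 6))
print(tri_axis_prediction(vol, corner_only, "union").sum())         # 13 voxels
print(tri_axis_prediction(vol, corner_only, "intersection").sum())  # 1 voxel
```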
The tri-axis prediction (TAP) approach was applied to the entire volume (Figure 4, Movies S1, S8 and S9) and took a total of 48 minutes to produce NE predictions for all nuclei within the volume (Figure 4J), including the n = 18 nuclei already segmented with volunteer effort (Figure 4K and Movie S10). TAPs have been made available at http://www.ebi.ac.uk/biostudies/files/S-BSST448/Aggregations. Serendipitously, a cell within the volume was undergoing mitosis, allowing us to observe that our model performed well in this challenging context, in which the NE had partially broken down, despite not having been exposed to training data of this type (Figure 5A and Movie S11). This is in contrast to some other approaches for NE identification which rely on the presence of a clear boundary, such as flood- or marker-based watershed methods. 23 The ability of the algorithm to segment the disassembled mitotic NE is particularly surprising given that the NE effectively regresses to become ER during mammalian cell division. Further analysis of the features identified by the model may be useful in defining the transition of the NE to the ER and back during mitosis. TAP was also applied to an alternative region from the same resin-embedded sample imaged at higher resolution (5 nm) on the same microscope (which also contained both mitotic and interphase cells, Figure 5B and Movie S12), and to a HeLa cell from the same sample imaged by an alternative volume EM methodology (FIB SEM) (Figure 5C and Movie S13).

TABLE 1 CNN performance metrics. Model performance was assessed by comparing the predicted NE segmentation to ground truth for two ROIs [ROI 1656-6756-329 (cell ID = C001) and ROI 3624-2712-201 (cell ID = C006)]. We report multiple metrics of model performance (Section 4) against two complementary modes of 'ground truth' data available (aggregated volunteer data and expert-produced segmentations).
Visual inspection of these data sets showed good model performance, suggesting that the model generalizes to novel contexts; however, some erroneous over-segmented pixels can be observed, particularly in the peripheral ER and at the edges of lipid droplets bordering the nuclear region in mitotic cells (Figure 5B and Movie S12), indicating scope for future improvement.
To provide a comparison of the approach presented here with an

| DISCUSSION
We show here that volunteer effort through online citizen science can be effectively applied to the task of manual segmentation of organelles in electron micrographs, enabling data analysis at a scale not achievable by experts alone. We demonstrate the data produced is of sufficient quality for task automation through training a CNN capable of segmenting the NE at high accuracy. Although prior work has shown crowdsourced volunteer effort can be productively applied to comparable tasks, such as the marking of single particles from cryo-EM micrographs to generate 3D protein reconstructions, 15 and to the marking of whole cells, 16 to our knowledge this is the first study to demonstrate the ability of volunteers to effectively perform manual freehand segmentation of an organelle in volume EM data.
Such large-scale, systematic segmentation makes quantitative examination of organelle morphology feasible. This has the potential to drastically advance our understanding of NE morphology and function, in both normal and diseased states such as cancer 12,13 and nuclear laminopathies. 14 Yet, even with the collaboration of a community of citizen scientists, it will not be possible to segment data at a scale proportional to current data production rates, and this challenge will become greater with further technological advancement. Hence, we sought to automate NE segmentation using volunteer-produced segmentations as training data for a CNN, 19 resulting in a model able to segment the NE to a high standard in a matter of minutes, rather than the hours, days or weeks required for manual segmentation. Critically, our model was trained exclusively with volunteer segmentations and required no expert microscopist input or intervention.

FIGURE 5 Model shows high NE segmentation performance in novel contexts. Applying TAP to the full data volume allowed us to observe that the model performed well when segmenting the partially broken down NE of a mitotic cell, A (Movie S11). TAP also showed good performance when applied to the same resin-embedded sample (which also contained both mitotic and interphase cells) imaged at higher resolution (5 nm) on the same microscope, B (Movie S12), and when applied to a HeLa cell from the same sample imaged by an alternative volume EM methodology, FIB SEM, C (Movie S13). All scale bars are 5 μm.
Although our model performed surprisingly well when applied to data produced under different conditions, there was scope for improved performance. Hence, we remain far from our aspiration of a highly accurate, yet broadly applicable, approach for the automated analysis of microscopy data for a single organelle, let alone each feature of interest in every volume acquired. It is anticipated that future work in this arena may be accelerated through application of approaches including transfer learning 25 and multiclass predictive models. 26

… with green fluorescent protein) other study design adjustments will need to be explored, such as presenting correlative light images in conjunction with EM images to guide segmentation. Beyond study design modifications, in such challenging instances it may also be possible to achieve higher data quality by engaging a greater number of volunteers to provide segmentations for each image. Secondly, it is anticipated that the likely increased structural heterogeneity of future target organelles will require adjustment of our aggregation pipeline, for example, to enable the aggregation of many independent structures within a single ROI. However, preliminary work to aggregate mitochondrial segmentations indicates that this, while an important consideration, is not insurmountable: the analytical pipelines presented here may require modification, but they do not require complete reinvention.
Although challenging from a study design perspective, the possibility of designing a portfolio of projects of varying difficulty provides a rich opportunity to engage volunteers through serving a greater variety of skills and interests. Reassuringly, citizen scientists have proven capable of performing a growing array of challenging tasks, from identifying supernovae 27 to visually assessing the quality of brain registration in functional magnetic resonance imaging studies. 28 We have been astonished that it is possible to train non-experts to recognize and segment complex organelles in minutes with just an online tutorial. We therefore remain confident in the abilities of our volunteer community to successfully perform novel segmentation tasks.
The challenge of motivating increased engagement and high-quality contributions will become increasingly important as our repertoire of citizen science projects expands and diversifies. Manual segmentation is a challenging task requiring a large time investment. We must therefore continue to develop novel modes of engaging our community and work to reduce the effort required to segment each slice. Mechanisms to reduce volunteer effort per slice may include 'smart subject assignment' 29,30: the intelligent passing of slices of appropriate difficulty to volunteers. Furthermore, it may be possible to actively retire slices from the project once an acceptable segmentation quality has been achieved; this would allow volunteer effort to be reduced for 'easier' images and instead be applied to images of greater difficulty.
Incorporating 'computers in the loop' may provide additional mechanisms for reducing segmentation effort. Future pipelines may include presenting volunteers with predicted segmentations for correction, rather than asking for full segmentation. Feedback loops between computer prediction and crowd correction could enable real-time model refinement, improving predictions and therefore progressively reducing the need for volunteer correction, resulting in greater project efficiency. Predictive models need not be fully optimized to be useful; if a model is not yet able to accurately segment its target organelle, it may still provide valuable information that could be fruitfully leveraged, for example, the anticipated number of a particular organelle class and their approximate locations. Such information would provide a mechanism for assessing volunteer ability, segmentation quality and subject difficulty.
We have demonstrated that experts can be removed from the task of manual segmentation; however, researcher time remains necessary to build the infrastructure supporting this effort and to continually refine multiple aspects of the analytical pipelines underlying these studies. More critically, researcher effort is still needed to interpret and assess the quality of volunteer- or machine-produced segmentations. This is a particularly challenging component of this work, as the required quality of a 'final segmentation' is often intimately linked to the research question being addressed.
This relates to a ubiquitous challenge in this domain, that of finalizing segmentations and establishing 'truth': it can be difficult to definitively assign pixels of noisy and nuanced micrographs to different regions, and considerable inter-expert (and intra-expert) variation can exist. The potential for pixel-assignment disagreement raises an interesting possibility regarding the additional value of collectively producing segmentations: when multiple individuals, rather than a single expert, annotate each slice, it is possible to produce a level of confidence that each pixel belongs to a certain region, rather than simply a binary designation.
Such a segmentation-confidence map may better reflect the reality of cell morphology, in which some pixels may not definitively belong to a particular region. It may also provide insight, as regions of variable confidence could be of biological relevance; for example, we might expect nuclear pores to be assigned to the NE with lower confidence.
Future collaboration of the crowd and computing is poised to enable, for the first time, the large-scale, generalizable, yet accurate, quantification of multiple subcellular structures across many data modalities at the nanoscale.

… Figure 1 and Movie S1). One image was excluded from analysis due to a technical fault resulting in loss of the cut material and the production of a blank image.

| Expert production of ground truth data
Ground truth segmentations for two ROIs were obtained by manual annotation in the Amira software package. 33

… including the project workflow and supporting materials such as the project tutorial (Figure S2). Project development took approximately 6 months; during this time the project workflow was designed and all supporting materials were produced, including an in-depth tutorial to comprehensively explain the NE segmentation task (Figure S2). Prior to launch, the project was refined through a multi-step review process involving thorough assessment of the project by both Zooniverse volunteers and the Zooniverse research team. For key Zooniverse terms, refer to https://help.zooniverse.org/getting-started/glossary/.
Slices were embedded within the project 'workflow'. The 'workflow' of a Zooniverse citizen science project refers to the series of tasks a volunteer is asked to complete when presented with data in the project's classification interface.
In the EAC project workflow, upon being presented with one of the uploaded cell slices at random, volunteers were asked to perform the task of segmenting the NE using a Freehand Drawing Tool applied directly to the image in a web browser (Figure S1E). Upon submission of the classification, the individual lines drawn by the volunteers were recorded as arrays of x,y pairs defining a line path (Figure S1F).
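Because each submitted line is stored as an array of x,y pairs, comparing volunteer lines with pixel-based data requires rasterizing them. A minimal sketch follows, assuming x indexes image columns and y indexes rows (the export format itself is not described here); it samples each segment at roughly one-pixel steps and marks the nearest pixels. `rasterize_polyline` is a hypothetical helper, not part of the project code.

```python
import numpy as np


def rasterize_polyline(points, shape):
    """Draw a polyline given as [(x, y), ...] into a boolean mask of shape (rows, cols).

    Assumes x is the column index and y the row index; points outside the image
    are clipped rather than raising an error.
    """
    mask = np.zeros(shape, dtype=bool)
    pts = np.asarray(points, dtype=float)
    if len(pts) == 1:
        pts = np.vstack([pts, pts])  # treat a single click as a zero-length segment
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        # Enough samples that consecutive samples are at most one pixel apart.
        n = int(np.ceil(max(abs(x1 - x0), abs(y1 - y0)))) + 1
        xs = np.rint(np.linspace(x0, x1, n)).astype(int)
        ys = np.rint(np.linspace(y0, y1, n)).astype(int)
        keep = (xs >= 0) & (xs < shape[1]) & (ys >= 0) & (ys < shape[0])
        mask[ys[keep], xs[keep]] = True
    return mask


# An L-shaped stroke: right along the top row, then down column 4.
mask = rasterize_polyline([(0, 0), (4, 0), (4, 3)], shape=(5, 6))
print(mask.astype(int))
```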
To support volunteers in the task of NE segmentation, a detailed project tutorial was provided on the classification interface (Figure S2). To enable more accurate annotation, pan and zoom functionality was

| Data aggregation with CRIA
Multiple volunteer segmentations were produced for each slice uploaded to the EAC project. It was therefore necessary to remove outlying data and establish a 'consensus' segmentation for each slice.
The CRIA algorithm was developed to aggregate the volunteer segmentations. In this approach, first, each individual volunteer segmentation was converted into a closed loop. This procedure was performed for all segmentations associated with each slice of the ROI.
Next, these closed loops were converted to interior areas and stacked.
A consensus segmentation was determined by taking the outline of all interior areas where half or more of the volunteer segmentations were in agreement (Figure 3A-F).
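The fill-and-vote steps can be sketched with NumPy alone. The sketch assumes each volunteer's closed loop is a list of (x, y) vertices in pixel coordinates, fills it with the even-odd (ray-casting) rule, sums the filled masks into a height map, keeps pixels covered by at least half of the loops, and takes the consensus outline as the kept pixels that touch a non-kept 4-neighbour. The helper names and this outline definition are illustrative assumptions, not the published CRIA implementation.

```python
import numpy as np


def polygon_interior(loop, shape):
    """Boolean mask of pixels inside a closed polygon [(x, y), ...], even-odd rule."""
    rows, cols = np.indices(shape)
    inside = np.zeros(shape, dtype=bool)
    pts = [tuple(p) for p in loop]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        if y0 == y1:
            continue  # horizontal edges never cross a horizontal ray
        straddles = (y0 > rows) != (y1 > rows)
        x_cross = x0 + (rows - y0) * (x1 - x0) / (y1 - y0)
        inside ^= straddles & (cols < x_cross)  # toggle for each edge crossed
    return inside


def consensus(loops, shape):
    """Height map, half-or-more agreement mask and its 4-connected outline."""
    height = np.zeros(shape, dtype=np.int64)
    if not loops:
        empty = np.zeros(shape, dtype=bool)
        return height, empty, empty
    for loop in loops:
        height += polygon_interior(loop, shape)
    agreed = 2 * height >= len(loops)  # "half or more" of the volunteers
    p = np.pad(agreed, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return height, agreed, agreed & ~interior


# Two volunteers trace the same square; a third traces something unrelated.
square = [(2, 2), (8, 2), (8, 8), (2, 8)]
stray = [(20, 20), (25, 20), (25, 25), (20, 25)]
height, agreed, outline = consensus([square, square, stray], (30, 30))
print(height.max(), agreed.sum(), outline.sum())
```

With two of three volunteers agreeing on the square, only the square survives the vote; the stray loop, covered by one volunteer, falls below the half-or-more threshold.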

| Model performance metrics
A commonly applied approach to assess the quality of a model prediction against ground truth for image data is to directly map the pixels between the two images. We report the F-measure, which, for binary pixel masks, is equivalent to the Dice coefficient. This measures the coincidence of the predicted NE with the ground truth NE. The F-measure of the nucleus area (as opposed to the NE) is also reported to enable easier comparison with previous work. As this metric requires a closed area, it was computed on a single suitable slice near the centre of the cell. Finally, we also report the AHD, in both pixels and nm, between the predicted NE and the position of the ground truth NE. This metric takes an average of all minimal distances between pixels in the prediction (P) and ground truth (G):
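The AHD described above can be sketched as follows. The exact formula used in this study is not reproduced in this section; one common formulation, assumed here, averages the two directed mean distances, AHD(P, G) = ½[d(P, G) + d(G, P)], where d(A, B) is the mean, over pixels a in A, of the distance from a to its nearest pixel in B. Other definitions take the maximum of the two directed terms instead. The brute-force pairwise distances below are only suitable for small masks, and the 10 nm default pixel size comes from the imaging resolution stated in this paper.

```python
import numpy as np


def directed_mean_distance(a_pts: np.ndarray, b_pts: np.ndarray) -> float:
    """Mean over points in a_pts of the Euclidean distance to the nearest point in b_pts."""
    diff = a_pts[:, None, :] - b_pts[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    return float(dists.min(axis=1).mean())


def average_hausdorff(pred: np.ndarray, gt: np.ndarray, pixel_size_nm: float = 10.0):
    """Symmetric average Hausdorff distance between two boolean masks, in px and nm."""
    p = np.argwhere(pred).astype(float)
    g = np.argwhere(gt).astype(float)
    if len(p) == 0 or len(g) == 0:
        raise ValueError("both masks must contain at least one foreground pixel")
    ahd_px = 0.5 * (directed_mean_distance(p, g) + directed_mean_distance(g, p))
    return ahd_px, ahd_px * pixel_size_nm


# Two parallel one-pixel lines, two rows apart.
gt = np.zeros((10, 10), dtype=bool)
gt[5, :] = True
pred = np.zeros_like(gt)
pred[7, :] = True
print(average_hausdorff(pred, gt))  # (2.0, 20.0)
```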

| Tri-axis prediction
A multi-axis modification of our machine learning model was implemented to improve performance (Figure S4). Data were downscaled in the xy plane to 50 nm to be isotropic, and the stack was transposed to run the model over each axis (Movies S6 and S7). The resulting three orthogonal NE predictions were recombined to generate a final segmentation. All pixels assigned to NE in all three predictions were accepted, and connected components analysis was used to remove over-segmented pixels (Section 4). This approach was named 'TAP'. TAP results for each ROI have been made available at http://www.ebi.ac.uk/biostudies/files/S-BSST448/Predictions. TAP code is available at https://github.com/FrancisCrickInstitute/Etch-a-Cell-Nuclear-Envelope.
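With 10 nm pixels and 50 nm sections, making the volume isotropic means reducing x and y by a factor of 50 / 10 = 5. The resampling method is not stated; the sketch below assumes simple block averaging (mean pooling) and crops any remainder, so an 8192-pixel axis would become 8192 // 5 = 1638 pixels. `downscale_xy` is a hypothetical helper for illustration.

```python
import numpy as np


def downscale_xy(volume: np.ndarray, factor: int = 5) -> np.ndarray:
    """Block-average each (y, x) plane of a (z, y, x) volume by an integer factor.

    Rows and columns that do not fill a complete block are cropped; z is untouched.
    """
    z, y, x = volume.shape
    y_crop, x_crop = (y // factor) * factor, (x // factor) * factor
    v = volume[:, :y_crop, :x_crop].astype(np.float64)
    # Split y and x into (blocks, factor) and average over the within-block axes.
    return v.reshape(z, y_crop // factor, factor, x_crop // factor, factor).mean(axis=(2, 4))


# 10 nm pixels with 50 nm sections -> factor 5 gives ~50 nm isotropic voxels.
pixel_nm, section_nm = 10, 50
factor = section_nm // pixel_nm
stack = np.arange(2 * 10 * 15, dtype=np.float64).reshape(2, 10, 15)
small = downscale_xy(stack, factor)
print(small.shape)  # (2, 2, 3)
```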

| Post-processing of 3D volumes
Connected components analysis 35 was implemented as a post-processing step at multiple points within the analyses presented.
Within the single-axis implementation of the machine learning model, connected components analysis was used to remove objects below a threshold of 10 000 voxels. This threshold was selected because it removed small areas of erroneous over-segmented pixels while preserving legitimate membrane, owing to the membrane's comparatively large size. Connected components analysis was also used to isolate the predicted NE segmentation for the target nucleus within each ROI, discarding any predicted NE associated with peripheral cells that might be present. In the TAP modification of the machine learning model, connected components analysis was similarly used to remove over-segmented pixels by removing objects below a threshold of 10 000 voxels, and to identify and isolate the target nucleus within each ROI by selecting the largest connected component. The Python package scikit-image 36 was used to automate these aspects of the data analysis pipeline (https://github.com/FrancisCrickInstitute/Etch-a-Cell-Nuclear-Envelope). Where NE segmentations produced by TAP are presented for the whole volume (e.g. Figure 4J), objects below a more stringent threshold of 100 000 voxels were removed using MorphoLibJ. 23 3D renderings of segmentations were generated using Fiji's 3D Viewer plugin. 37
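The two connected-components operations described here, discarding objects below a voxel threshold and keeping the largest component, can be sketched in plain NumPy. This sketch assumes 6-connectivity, which the source does not specify; the actual pipeline used scikit-image, whose `measure.label` and `morphology.remove_small_objects` functions provide optimized equivalents. The pure-Python breadth-first search below is illustrative only and would be slow on full-size volumes.

```python
from collections import deque

import numpy as np

# 6-connectivity (face neighbours only) is an assumption for this sketch.
NEIGHBOUR_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def label_components(mask: np.ndarray):
    """Label 6-connected foreground components of a 3D boolean mask.

    Returns (labels, sizes): labels is 0 for background and k >= 1 for component k;
    sizes[k] is the voxel count of component k (sizes[0] is a placeholder of 0).
    """
    labels = np.zeros(mask.shape, dtype=np.int64)
    sizes = [0]
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already reached from an earlier seed
        current = len(sizes)
        labels[start] = current
        queue, count = deque([start]), 0
        while queue:
            z, y, x = queue.popleft()
            count += 1
            for dz, dy, dx in NEIGHBOUR_OFFSETS:
                nb = (z + dz, y + dy, x + dx)
                in_bounds = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if in_bounds and mask[nb] and not labels[nb]:
                    labels[nb] = current
                    queue.append(nb)
        sizes.append(count)
    return labels, np.asarray(sizes)


def remove_small_objects_3d(mask: np.ndarray, min_voxels: int) -> np.ndarray:
    """Keep only components with at least min_voxels voxels (e.g. 10 000 above)."""
    labels, sizes = label_components(mask)
    keep = sizes >= min_voxels
    keep[0] = False  # never keep background
    return keep[labels]


def largest_component(mask: np.ndarray) -> np.ndarray:
    """Keep only the largest component, e.g. to isolate the target nucleus."""
    labels, sizes = label_components(mask)
    if len(sizes) == 1:
        return np.zeros(mask.shape, dtype=bool)
    return labels == int(np.argmax(sizes[1:])) + 1


mask = np.zeros((10, 10, 10), dtype=bool)
mask[0:3, 0:3, 0:3] = True   # 27-voxel object
mask[6:8, 6:8, 6:8] = True   # 8-voxel object
mask[9, 0, 9] = True         # isolated voxel
print(remove_small_objects_3d(mask, 10).sum(), largest_component(mask).sum())  # 27 27
```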

| Edge detection
Edge detection was performed with Fiji's "Find Edges" function (based on a 2D 3 × 3 Sobel filter) after data rescaling and pre-processing with a median filter. Edge detection was applied to each slice of a single ROI (cell ID = C001, ROI 1656-6756-329; Movie S14). Edge detection code is available at https://github.com/FrancisCrickInstitute/Etch-a-Cell-Nuclear-Envelope.
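The edge-detection baseline can be approximated in NumPy as below. ImageJ/Fiji's "Find Edges" combines a horizontal and a vertical 3 × 3 Sobel kernel and reports the gradient magnitude, sqrt(gx² + gy²); the 3 × 3 median window and the edge-replicating border handling used here are assumptions, since the median filter size and rescaling parameters are not given. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median3x3(img: np.ndarray) -> np.ndarray:
    """3 x 3 median filter with edge-replicated borders (assumed window size)."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    return np.median(sliding_window_view(padded, (3, 3)), axis=(-2, -1))


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude from horizontal and vertical 3 x 3 Sobel kernels."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (rows, cols, 3, 3)
    kx = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=np.float64)
    ky = kx.T
    gx = (windows * kx).sum(axis=(-2, -1))
    gy = (windows * ky).sum(axis=(-2, -1))
    return np.hypot(gx, gy)


# A vertical step edge between columns 3 and 4.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
edges = sobel_magnitude(median3x3(step))
print(edges[0])  # strongest response at columns 3 and 4
```

Such a gradient filter responds to every strong intensity change, not only the NE, which is why it serves as a baseline rather than a segmentation in its own right.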

| Code availability
All assets relating to the analysis and training have been made available in public repositories, and a single automated pipeline for reproducing the work has been containerized using Docker to capture environment configurations. The portability of the containerized pipeline has been tested by running it on a public cloud instance (AWS). Further information regarding re-running the pipeline is provided in the README on GitHub. We provide both reproducibility instructions (using the original data) and instructions for applying the trained model to other data sets. Code is available at: https://github.com/FrancisCrickInstitute/Etch-a-Cell-Nuclear-Envelope.