1 Introduction

Quantitative brain analysis is crucial to better understand the human brain and to detect pathologies. However, whole brain segmentation remains a very challenging problem, mostly due to the high number of anatomical labels relative to the limited amount of available training data. Indeed, manual segmentation of the whole brain is a very tedious and difficult task, preventing the production of large annotated datasets. To address this problem, several methods have been proposed in the past years. One of the main references in the domain is the patch-based joint label fusion (JLF), which won the MICCAI challenge in 2012 [1]. More recently, deep learning (DL) methods have also been proposed. Due to limited GPU memory, the first attempts were based on patchwise strategies [2, 3] or on 2D (slice-by-slice) segmentation [4]. Last year, the first 3D fully convolutional network methods were proposed, using either a reduced input size (i.e., 128 × 128 × 128 voxels) [5] or the Spatially Localized Atlas Network Tiles (SLANT) strategy [6]. This latter framework divides the whole volume into overlapping sub-volumes (e.g., 27), each one being processed by a different U-Net [7]. The SLANT strategy addresses the memory problem and decomposes the complex problem of whole brain segmentation into simpler sub-problems, better suited to limited training data.

In this paper, we propose to extend this framework by using a much larger number of simpler U-Nets (i.e., 250). The main question addressed is the optimal organization of this large ensemble. To this end, we propose a new framework called AssemblyNet. Inspired by the decision-making processes developed by human societies to deal with complex problems, we decided to model a parliamentary system based on two separate assemblies. Such a bicameral (i.e., two-chamber) parliament has been adopted by many countries around the world. A bicameral system is usually composed of an upper and a lower chamber, each having its own independence to ensure the balance of power. However, an assembly may communicate its vote to the other for amendment. Such a parliamentary system is capable of dealing with complex decisions and of reaching a consensus quickly.

2 Methods

2.1 General Overview

In AssemblyNet, both assemblies are composed of U-Nets considered as “assembly members” (see Fig. 1). Each member represents one territory (i.e., brain area) in the final vote. To this end, we used spatially localized networks where each U-Net only processes a sub-volume of the global volume, as done in [6]. Sub-volumes overlap each other, so the final segmentation results from an overcomplete aggregation of local votes. A majority vote is used to obtain the global segmentation. Moreover, each member can share knowledge with its nearest neighbor in the assembly. In particular, we propose a novel nearest neighbor transfer learning strategy, where the weights of the spatially nearest U-Net are used to initialize the next U-Net. In addition, we also propose to use prior knowledge on the expected final decision, which can be viewed as the bill (i.e., draft law) submitted to an assembly for consideration. As prior knowledge, we decided to use non-linearly registered Atlas priors. Finally, we also propose to model the communication between both assemblies using an innovative strategy. In AssemblyNet, we use a multiscale cascade of assemblies where the first assembly produces a coarse decision at 2 × 2 × 2 mm. This coarse decision is transmitted to the second assembly for analysis at 1 × 1 × 1 mm. This amendment procedure is similar to an error correction or refinement step. After consideration by both assemblies, the bill under consideration becomes a law, which represents the final segmentation in our system.

Fig. 1. Illustration of the proposed AssemblyNet framework.

Our contributions are threefold: (i) the use of prior knowledge based on fast atlas registration, (ii) knowledge sharing between CNNs using nearest neighbor transfer learning and (iii) an iterative refinement process based on a multiscale cascade of assemblies.

2.2 Proposed Framework

Preprocessing: To homogenize input orientations and intensities, all the images are first preprocessed with the following steps: (i) denoising [8], (ii) inhomogeneity correction [9], (iii) affine registration into the MNI space (181 × 217 × 181 voxels at 1 × 1 × 1 mm) [10], (iv) tissue-based intensity normalization [11] and (v) brain extraction [12]. Afterwards, image intensities are centered and normalized within the brain mask.
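The final normalization step can be summarized by the following minimal sketch, which centers and scales the intensities using statistics computed inside the brain mask only; the function name and the epsilon guard are illustrative assumptions, not the authors' code.

```python
import numpy as np

def normalize_in_mask(image, brain_mask):
    """Zero-mean, unit-variance intensities, statistics taken inside the mask."""
    voxels = image[brain_mask > 0]
    return (image - voxels.mean()) / (voxels.std() + 1e-8)  # epsilon avoids /0
```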

Atlas Priors: To obtain prior knowledge on the expected result, we perform a non-linear registration of the MICCAI 2012 Multi-Atlas Labeling Challenge atlas to the subject under consideration.
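As a hedged illustration of how such warped priors can be produced once a dense deformation field is available, the sketch below resamples the atlas label map with nearest-neighbor interpolation so that discrete label values are preserved; the deformation itself is assumed to come from the registration method, and the function name is hypothetical.

```python
from scipy.ndimage import map_coordinates

def warp_atlas_labels(atlas_labels, sampling_coords):
    """sampling_coords: (3, X, Y, Z) array of atlas-space coordinates to sample,
    typically an identity grid plus the estimated deformation field.
    order=0 selects nearest-neighbor sampling, keeping labels discrete."""
    return map_coordinates(atlas_labels, sampling_coords, order=0)
```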

Assembly Description: Each assembly is composed of 125 U-Nets equally distributed in the MNI space (i.e., 5 along each of the x, y and z axes). Each 3D U-Net processes a sub-volume large enough to ensure at least 50% overlap between neighboring sub-volumes. At the end, a majority vote is used to aggregate the local votes, as sketched below.
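The tiling and the vote aggregation can be illustrated as follows; this is a sketch of the described scheme under assumed array shapes and helper names, not the authors' implementation.

```python
import numpy as np

def axis_offsets(dim, n=5, overlap=0.5):
    """Start positions and tile size so that n tiles cover `dim` voxels
    with at least `overlap` fractional overlap between neighbors."""
    size = int(np.ceil(dim / (1 + (n - 1) * (1 - overlap))))  # step <= size / 2
    starts = np.linspace(0, dim - size, n).round().astype(int)
    return starts, size

def majority_vote(local_maps, vol_shape, n_labels):
    """Aggregate overlapping local label maps by per-voxel vote counting.

    local_maps: iterable of ((x, y, z) start offset, 3D label sub-volume).
    """
    votes = np.zeros(tuple(vol_shape) + (n_labels,), dtype=np.uint8)
    for (x, y, z), labels in local_maps:
        ii, jj, kk = np.indices(labels.shape)
        votes[x + ii, y + jj, z + kk, labels] += 1  # one vote per local decision
    return votes.argmax(axis=-1)  # most voted label per voxel
```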

Nearest Neighbor Transfer Learning: To enable knowledge sharing between the U-Nets of an assembly, we propose a new transfer learning strategy where the weights of the nearest U-Net are used to initialize the next one. In practice, we only copy the weights of the descending path of the U-Net architecture. At the beginning, we train the first U-Net from scratch. Then, each U-Net of the first column is initialized with the weights of the previous U-Net. Once the first column is trained, each U-Net of the next column is initialized with the U-Net at the same position on the previous column, and so on. Finally, once the first 2D plane of U-Nets is trained, each U-Net of the next 2D plane is initialized with the U-Net at the same position on the previous plane, and so on.
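The resulting training order can be sketched as below; `build_unet`, `copy_encoder` and `train` are hypothetical placeholders standing in for the actual framework.

```python
def train_assembly(grid=(5, 5, 5)):
    """Train a 5x5x5 grid of U-Nets, initializing each encoder (descending
    path) from the spatially nearest, already trained neighbor."""
    models = {}
    for x in range(grid[0]):          # plane by plane
        for y in range(grid[1]):      # column by column within a plane
            for z in range(grid[2]):  # member by member within a column
                net = build_unet()
                if x > 0:             # later planes: same position, previous plane
                    copy_encoder(models[(x - 1, y, z)], net)
                elif y > 0:           # later columns: same position, previous column
                    copy_encoder(models[(x, y - 1, z)], net)
                elif z > 0:           # first column: previous member in the column
                    copy_encoder(models[(x, y, z - 1)], net)
                # (0, 0, 0) is trained from scratch
                train(net, subvolume=(x, y, z))
                models[(x, y, z)] = net
    return models
```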

Multiscale Cascade of Assemblies: To make our decision-making system faster and more robust, we decided to use a multiscale framework. The first assembly, at 2 × 2 × 2 mm, produces a coarse segmentation. Afterwards, this segmentation is up-sampled to 1 × 1 × 1 mm using nearest neighbor interpolation. The second assembly then estimates the final result at 1 × 1 × 1 mm.
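A minimal sketch of this hand-off, assuming the coarse output is a discrete label map, is given below; `order=0` selects nearest neighbor interpolation, which preserves label values.

```python
from scipy.ndimage import zoom

def upsample_coarse_labels(coarse_labels, factor=2):
    """Nearest-neighbor up-sampling (order=0) preserves discrete label values."""
    return zoom(coarse_labels, factor, order=0)
```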

3 Experiments

3.1 Datasets

Training Dataset: 45 T1w MRI from the OASIS dataset [13], manually labeled according to the BrainCOLOR protocol, were used for training. The selected images were the same as those used in [6]. All the images and manual segmentations are from Neuromorphometrics Inc. During our experiments, we used the 132 anatomical labels consistent across subjects (see [6]).

Testing Dataset: 19 T1w MRI, manually labeled according to the BrainCOLOR protocol, were used for testing. These MRI come from three different datasets: 5 from the OASIS dataset, one from the Colin27 cohort [14] and 13 from the CANDI database [15]. This testing dataset is the same as the one used in [6].

3.2 Implementation Details

Data Augmentation: First, the training images were flipped along the mid-sagittal plane in the MNI space. Then, we used MixUp data augmentation during training to reduce overfitting [16]. This method performs a linear interpolation of a random pair of training examples and of their corresponding labels.
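A minimal MixUp sketch is given below, assuming one-hot encoded label maps; the value of `alpha` is an illustrative choice, as the paper does not report it.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convex combination of two training examples and their one-hot labels."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```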

Training Framework: For all the networks, we used the U-Net architecture proposed in [6], but with a lower number of filters. Instead of using a basis of 32 filters of size 3 × 3 × 3 (32 for the first layer, 64 for the second and so on), we selected a basis of 24 filters of size 3 × 3 × 3, reducing the network size by 25%. Moreover, we used the same parameters for all the U-Nets: batch size = 1, optimizer = Adam, epochs = 100, loss = Dice and dropout = 0.5 after each block of the descending path. For the U-Nets of the first assembly at 2 × 2 × 2 mm, we used an input resolution of 32 × 48 × 32 voxels and 2 input channels (i.e., T1w and Atlas priors). For the U-Nets of the second assembly at 1 × 1 × 1 mm, we used an input resolution of 64 × 72 × 64 voxels and 3 input channels (i.e., T1w, Atlas priors and up-sampled coarse segmentation). In addition, to compensate for the small batch size, we performed model weight averaging. At the end of the 100 epochs, we performed an additional 20 epochs during which the model estimated at each epoch is averaged with the previous ones. Such averaging of model weights along the optimization trajectory leads to better generalization than conventional training [17]. Finally, we also performed dropout at test time [18]. For each U-Net, we generate 3 different outputs before averaging them. This method helps reduce the variance of the networks. As in [6], the experiments were done with an NVIDIA Titan Xp with 12 GB of memory, and thus processing times are comparable.
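The weight-averaging phase can be sketched as a running mean of the weights over the extra epochs; the sketch below follows the Keras `get_weights`/`set_weights` convention, and `train_one_epoch` is a schematic placeholder for the training loop.

```python
def weight_averaging_phase(model, train_one_epoch, n_avg=20):
    """Running mean of the model weights over the extra averaging epochs."""
    avg = [w.copy() for w in model.get_weights()]  # weights after epoch 100
    for k in range(1, n_avg + 1):
        train_one_epoch(model)                     # one more epoch of training
        avg = [a + (w - a) / (k + 1)               # incremental mean update
               for a, w in zip(avg, model.get_weights())]
    model.set_weights(avg)                         # deploy the averaged model
    return model
```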

Computational Time: The preprocessing steps take around 90 s. The non-linear registration of the atlas takes less than 5 s thanks to a deep learning framework similar to [19]. The first assembly at 2 × 2 × 2 mm requires 3 min to segment an image, while the second assembly at 1 × 1 × 1 mm requires 5 min. At the end, the final segmentation is registered back to the native space using the inverse affine transform estimated during preprocessing. This interpolation takes around 30 s. Therefore, the full AssemblyNet process takes around 10 min, including preprocessing and inverse registration back to the native space.

3.3 Validation Framework

First, for each testing subject, we estimated the average Dice coefficient over the 132 considered anatomical labels (without background) in the native space. Afterwards, we estimated the global mean Dice in % over the 19 testing images. In this study, we compared AssemblyNet with several state-of-the-art methods. First, the patch-based joint label fusion (JLF) [1] is used as reference. In addition, we included the U-Net [7], SLANT-8 and SLANT-27 methods as proposed in [6]. SLANT-8 is based on 8 U-Nets processing non-overlapping sub-volumes of 86 × 110 × 78 voxels, while SLANT-27 is based on 27 U-Nets processing overlapping sub-volumes of 96 × 128 × 88 voxels. All these methods were trained on the 45 training images described in Sect. 3.1. Finally, we included SLANT-27 trained on 5111 auxiliary images segmented using JLF and fine-tuned on the 45 training images. To our knowledge, this is the best published result for whole brain segmentation. For all these methods, we report the results published in [6].
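The evaluation metric can be sketched as follows: a per-label Dice coefficient averaged over the 132 labels (background excluded), then averaged over the test subjects; the function below is an illustration of the standard definition, not the authors' evaluation code.

```python
import numpy as np

def mean_dice(seg, gt, labels):
    """Average Dice over the given label set (background excluded)."""
    dices = []
    for lab in labels:
        s, g = (seg == lab), (gt == lab)
        denom = s.sum() + g.sum()
        if denom > 0:                 # skip labels absent from both volumes
            dices.append(2.0 * np.logical_and(s, g).sum() / denom)
    return float(np.mean(dices))
```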

4 Results

First, we evaluated the proposed contributions (see Table 1). Compared to the baseline results at 2 × 2 × 2 mm (Dice = 67.4%), the use of Atlas priors provided a gain of 0.5% in terms of mean Dice. Moreover, the combination of Atlas priors and transfer learning improved the baseline mean Dice by 0.7%. In addition, the multiscale cascade of assemblies increased the mean Dice by 1.6% compared to the assembly at 1 × 1 × 1 mm without multiscale cascade (Dice = 72.2%). Finally, AssemblyNet outperformed the mean Dice obtained with the baseline assembly at 2 × 2 × 2 mm by 8.7%.

Table 1. Evaluation of the proposed contributions. The mean Dice is evaluated on the 19 images of the test dataset in the native space for the 132 considered labels (without background). Testing time includes image preprocessing and registration back to the native space.

Afterwards, we compared AssemblyNet with state-of-the-art methods (see Table 2). When considering only methods trained with 45 images, AssemblyNet improved the mean Dice obtained with U-Net and SLANT-8 by 28%, with JLF by 15% and with SLANT-27 by 10%. AssemblyNet was also efficient in terms of training and testing times compared to SLANT-based methods. It has to be noted that the assembly at 2 × 2 × 2 mm outperformed all the methods except AssemblyNet, despite working at a lower resolution. Finally, compared to SLANT-27 trained over 5111 + 45 images, our method provided slightly better results (without library extension) while being faster to train and to execute. According to [6], their library extension required 21 CPU years to be completed. Consequently, such an approach is impractical, or very costly using a cloud-based solution. Therefore, all these results demonstrate that AssemblyNet is very efficient at dealing with limited training data and achieves accurate segmentation within practical training and testing times.

Table 2. Methods comparison on the 19 images of the testing dataset. The mean Dice is evaluated on the 132 considered labels (without background) in the native space.

Finally, we analyzed the performance of the methods according to the dataset. Mean Dice coefficients obtained on each testing dataset (i.e., OASIS, CANDI and Colin27) are provided in Table 3. As expected, all the methods performed better on adult scans from the OASIS dataset, since the training dataset comes from the same cohort. On this dataset, AssemblyNet outperformed all the methods. On child scans from the CANDI dataset, acquired with different protocols, we can note a dramatic drop in performance for all the methods except AssemblyNet and SLANT-27 trained on 5111 + 45 images. The auxiliary library used by SLANT-27 is based on 9 different databases (64 acquisition sites) and includes more than 1000 child scans, making this method able to segment child MRI. Using only 45 adult scans coming from a single acquisition site, AssemblyNet produced similar results on this dataset. This demonstrates the robustness of the proposed framework to unseen acquisition protocols and ages. Finally, on the high-resolution Colin27 image, AssemblyNet obtained the best segmentation accuracy. As for SLANT-27, one could expect segmentation improvements for AssemblyNet using library extension. We will investigate such a framework in future work.

Table 3. Methods comparison on the different testing datasets (5 adult scans from OASIS, 13 child scans from CANDI and the high-resolution Colin27 image based on an average of multiple scans).

Figure 2 shows segmentations in the native space for one subject from the OASIS dataset and one from the CANDI cohort. This figure shows the segmentation at 2 × 2 × 2 mm and the improvement obtained using the refinement at 1 × 1 × 1 mm performed by the second assembly.

Fig. 2. Examples of segmentation in the native space obtained for one test subject from OASIS (top) and one test subject from CANDI (bottom). Note the accuracy of the segmentation compared to the manual segmentation.

5 Conclusion

In this paper, we proposed to use a large number of CNNs to perform whole brain segmentation. We investigated how to organize this large ensemble of CNNs to quickly and accurately segment the brain. To this end, we designed a novel deep decision-making process called AssemblyNet, based on two assemblies of U-Nets. Our validation showed the very competitive results of AssemblyNet compared to state-of-the-art methods. We also demonstrated that AssemblyNet is very efficient at dealing with limited training data and achieves accurate segmentation within practical training and testing times.