Abstract
With the increasing need for multi-center magnetic resonance imaging studies, problems arise related to differences in hardware and software between centers. In particular, current algorithms for brain volume quantification are unreliable for the longitudinal assessment of volume changes in this setting. Most current methods attempt to mitigate this issue by regressing scanner- and/or center-effects out of the original data. In this work, we explore a novel approach to harmonize brain volume measurements by using only image descriptors. First, we explore the relationships between volumes and image descriptors. Then, we train a Relevance Vector Machine (RVM) model over a large multi-site dataset of healthy subjects to perform volume harmonization. Finally, we validate the method over two different datasets: (i) a subset of unseen healthy controls; and (ii) a test-retest dataset of multiple sclerosis (MS) patients. The method decreases scanner and center variability while preserving measurements that did not require correction in MS patient data. We show that image descriptors can be used as input to a machine learning algorithm to improve the reliability of longitudinal volumetric studies.
This project received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 765148.
1 Introduction
Large-scale multi-site studies are of extreme importance in neuroimaging, both for research purposes and in clinical practice. Such studies face several challenges due to hardware- or center-related variability. It is well known that scanner factors such as manufacturer, magnetic field strength and gradient non-linearity influence volume measurements [4, 14] obtained from structural Magnetic Resonance Imaging (MRI). At the image level, these factors are coupled with a high variability of intensities across patients and scanners, which can affect tasks like the segmentation of brain structures [16]. This effect is exemplified in Fig. 1, which shows three T1-weighted MR images of the same patient acquired on different scanners, together with their corresponding segmentations.
The need to address multi-scanner and -center data harmonization is evidenced in the follow-up of Multiple Sclerosis (MS) patients. These patients exhibit an increased rate of brain atrophy when compared to healthy subjects, which has been linked to impairment [1]. However, it has been suggested that brain atrophy can only be reliably estimated over periods of at least five years [2], due to the variability caused by scanner and center factors.
Besides image processing approaches that aim at matching image intensity distributions to provide a more consistent input to the segmentation method [11, 12], recent studies have focused on statistical harmonization of volumetric measurements based on scanner- or center-specific information. These methods generally apply regression techniques to correct measurements. Linear mixed-effects models using patient- and scanner-specific information as random effects were explored by [5] and [8]. Recently, [6] used an algorithm devised for genomics that extends the same type of model to account for site-specific factors. A data-driven approach based on independent component analysis was explored by [4], where correction was performed by selecting independent components related to scanning parameters. However, since these methods rely on scanner- or acquisition-specific information, they do not generalize and need to be adapted when used in new settings. Additionally, such information can be incomplete, especially in historical data. As such, it would be of interest to use information that is encoded in the images themselves, or that can be extracted from the volume quantification method, to build more robust and adaptive techniques.
To address these issues, in this paper we present a novel statistical harmonization approach based on image descriptors and a machine learning algorithm. We first explore the relations between image-extracted properties and brain volume measurements that we could further exploit for harmonization. We then train a machine learning algorithm based on Automatic Relevance Determination on healthy data to perform volumetric corrections. We validate the method on a set of unseen healthy controls, and finally test it on a test-retest dataset of MS patients.
2 Data
Healthy Subjects. This dataset comprises 1996 T1-weighted (T1w) MRI scans from healthy subjects. The data is a compilation of several public datasets, such as [3, 10], and some proprietary data. The overall set comprises data from several different centers and scanner types from the major vendors (Siemens, Philips, GE). Magnetic field strengths (1.5T or 3T) and T1w sequence types also vary. For most of the data we have information regarding age and sex of the subject, scanner type, magnetic field strength and additional acquisition parameters like echo time (TE) and repetition time (TR). For building and testing our model, we randomly divided the data into training (\(70 \%\)) and test sets (\(30 \%\)).
Patient Data. To further validate the approach we test it on a dataset of 10 MS patients, as detailed in [7]. Each patient was scanned twice on three different 3T scanners: Philips Achieva, Siemens Skyra and GE Discovery MR450w. An example is depicted in Fig. 1. We observed that one of the patients was an extreme case, showing severely enlarged ventricles. Given that volumetric measurements in such a case are error-prone and considered unreliable, this patient's data was discarded from further analysis.
2.1 Data Pre-processing and Feature Extraction
For each image we compute gray matter (GM) and white matter (WM) volumes using the well established atlas-based method described in [7]. Whole brain (WB) volume is then defined as the sum of WM and GM volumes.
We are interested in descriptors related to the T1w images that encode information about errors and bias in brain segmentations. Since the quality of a segmentation depends on a good registration to the atlas and is influenced by the contrast and noise present in an image, it is valuable to explore features that convey such information. We extract a total of 16 features of two main types: (i) Alignment information regarding the registration of the T1w image to the MNI atlas space, which includes decomposing the affine transformation (rotation angles, scale and shear factors in three directions), and measuring the similarity between the registered images using Normalized Mutual Information (NMI); and (ii) Contrast to Noise Ratio (CNR) between tissue types and between different brain structures (e.g., lobes and cerebellum). CNR is given by:
$$ CNR_{t_1, t_2} = \frac{\bar{I}_{t_1} - \bar{I}_{t_2}}{\sqrt{\sigma^2_{t_1} + \sigma^2_{t_2}}}, $$
where \(\bar{I_t}\) represents the mean intensity of some tissue or structure t and \(\sigma ^2_t\) is the variance of the image intensities across this structure. We compute \(CNR_{t_1, t_2}\) by taking \(t_1\) as the tissue or structure with higher average image intensity than \(t_2\).
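As an illustration, a minimal Python sketch of this computation is given below; the function name and interface are hypothetical, and the masks are assumed to come from the segmentation step.

```python
import numpy as np

def cnr(image, mask1, mask2):
    """Contrast-to-Noise Ratio between two tissues or structures.

    image : 3-D array of T1w intensities.
    mask1, mask2 : boolean arrays selecting the voxels of each tissue.
    Following the convention above, the brighter tissue is taken as t1,
    so the returned CNR is non-negative.
    """
    i1, i2 = image[mask1], image[mask2]
    if i1.mean() < i2.mean():  # enforce t1 = tissue with higher mean intensity
        i1, i2 = i2, i1
    return (i1.mean() - i2.mean()) / np.sqrt(i1.var() + i2.var())
```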
The brain volumes and some of the computed descriptors (e.g., CNR and NMI) are known to be age dependent [13]. As such, age is used as a feature at training time, but not at test time. For analysis and comparison, we age-detrend the volumes by subtracting an age-matched estimated median value. CNR and NMI are corrected by fitting a linear regressor to the data.
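A minimal sketch of this detrending, assuming volumes, descriptors and ages are available as NumPy arrays; the age-bin width used for the median estimate is a hypothetical choice, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def detrend_volume(volumes, ages, bin_width=5):
    """Subtract an age-matched median volume, estimated within age bins."""
    bins = (ages // bin_width).astype(int)
    medians = {b: np.median(volumes[bins == b]) for b in np.unique(bins)}
    return volumes - np.array([medians[b] for b in bins])

def detrend_feature(feature, ages):
    """Remove a linear age trend from a descriptor such as CNR or NMI."""
    reg = LinearRegression().fit(ages.reshape(-1, 1), feature)
    return feature - reg.predict(ages.reshape(-1, 1))
```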
3 The Relevance Vector Machine for Data Harmonization
To harmonize brain volumes, we subtract correction terms based on estimated variability trends from the original volumes. To determine the variation in the volumetric data that can be explained by the aforementioned image descriptors, we fit a linear model using the extracted features as independent variables. When choosing the model, we take a few important considerations into account: (i) we have 16 image-extracted features plus age; (ii) some of these are related only to one of the brain volumes; (iii) the features are not always unrelated; and (iv) we are interested in a probabilistic model, to capture uncertainty in our predictions. Given that standard generalized linear models do not address all these considerations, we investigate using a probabilistic machine learning technique.
The Relevance Vector Machine (RVM) is defined within a fully probabilistic framework and includes a mechanism of automatic relevance determination [9]. As described in [15], the model defines a conditional distribution for real-valued input-target vector pairs \(\{\mathbf {x}_n, t_n\}_{n=1}^{N}\), of the type: \(p(t_n|\mathbf {x}_n) = \mathcal {N}(t_n|y(\mathbf {x}_n), \beta ^{-1}),\) which specifies a Gaussian distribution over \(t_n\) with mean \(y(\mathbf {x}_n)\) and precision (inverse variance) \( \beta \). Here \(\{\mathbf {x}_n\}_{n=1}^{N}\) are the set of extracted features and \(\{t_n\}_{n=1}^{N}\) the corresponding volume measurements for each image n in a training dataset of size N. The function \(y(\mathbf {x})\) is given by a linear combination of basis functions \(\varvec{\varPhi }(\mathbf {x}) = (\varPhi _1(\mathbf {x}), \ldots , \varPhi _N(\mathbf {x}))\) with weights \(\mathbf {w} = (w_0, \ldots , w_N)^{\mathrm {T}}\):
$$ y(\mathbf {x}) = \sum _{i=1}^{N} w_i \varPhi _i(\mathbf {x}) + w_0, $$
where the \(i^{\mathrm {th}}\) basis function \(\varPhi _i(\mathbf {x}) \equiv K(\mathbf {x},\mathbf {x_i})\) is a kernel centered around the \(i^{\mathrm {th}}\) training sample. In order to avoid over-fitting due to the large number of parameters in the model, a zero-mean Gaussian prior probability distribution is defined over the weights \(\mathbf {w}\). Moreover, a separate hyperparameter \(\alpha _i\) is introduced for each individual weight \(w_i\), representing the precision of the corresponding weight:
$$ p(\mathbf {w}|\varvec {\alpha }) = \prod _{i=0}^{N} \mathcal {N}(w_i|0, \alpha _i^{-1}), $$
where \(\varvec{\alpha } = (\alpha _0, \ldots , \alpha _N)^{\mathrm {T}}\). Using the resulting model, relevance vector learning searches for the hyperparameters \(\varvec{\alpha }\) and \(\beta \) that maximize the marginal likelihood \(p(\mathbf {t}|\varvec{\alpha },\beta )\) of the training data, where \(\mathbf {t} = (t_1, \ldots , t_N)^{\mathrm {T}}\). Defining \(\varvec{\varPhi }\) as the \(N \times N \) matrix with \(\varvec{\varPhi }(\mathbf {x}_n)\) in \(n^{\mathrm {th}}\) row, the learning algorithm proceeds by iteratively updating \(\varvec{\alpha }\) and \(\beta \) as follows [15]:
where \(\varvec{\varSigma } = \left( \beta \varvec{\varPhi }^{\mathrm {T}} \varvec{\varPhi } + \mathrm {diag}(\varvec{\alpha })\right) ^{-1}\) and \(\varvec{\mu } = \beta \varvec{\varSigma } \varvec{\varPhi }^\mathrm {T} \mathbf {t}\) are the posterior covariance and mean for the weights, respectively. In practice, during re-estimation, many \(\alpha _i\)’s tend to infinity, which causes the posterior distributions of the corresponding weights to peak around zero. The basis functions associated with these do not influence the predictions and can be pruned out, resulting in a sparse model. The remaining training samples with non-zero weights are called relevance vectors.
Once the model is trained, we can take the set of descriptors \(\mathbf {x}_*\) of an unseen image and predict the corresponding volume from these descriptors alone using the posterior mean \(\varvec {\mu }\): \(y_* = \varvec{\phi }(\mathbf {x}_*) \varvec {\mu }\). Finally, we obtain a corrected volume \(y_{corr}\) by subtracting the estimated contribution of the image descriptors from the original volume \(y\): \( y_{corr} = y - \varvec{\phi }(\mathbf {x}_*)\varvec {\mu } \).
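Continuing the sketch above, the correction step amounts to evaluating the basis functions at \(\mathbf {x}_*\) and subtracting the predicted contribution; the helper below assumes the `mu` and `keep` returned by `fit_rvm`. For the linear case, `kernel = np.dot` would be one choice.

```python
def correct_volume(volume, x_star, X_train, mu, kernel, keep=None):
    """y_corr = y - phi(x_*) . mu, dropping pruned basis functions."""
    phi = np.concatenate(([1.0], [kernel(x_star, xi) for xi in X_train]))
    if keep is not None:  # restrict to the relevance vectors
        phi, mu = phi[keep], mu[keep]
    return volume - phi @ mu
```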
4 Results
4.1 Verification of Observable Correlations in Data
To verify whether there are correlations between image descriptors and the measured volumes, we built a cross-correlation map between these variables (see Fig. 2). Analysing these correlations reveals that image descriptors like NMI and CNR are related to scanner/acquisition specific features. These same image descriptors are in turn correlated to the brain volumes, as well as scale.
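A cross-correlation map of this kind can be produced in a few lines of Python; the file and column names below are hypothetical stand-ins for the actual features.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# One row per scan: volumes, image descriptors and acquisition parameters.
df = pd.read_csv("features_and_volumes.csv")  # hypothetical file
cols = ["GM", "WM", "WB", "NMI", "CNR_gm_wm", "scale_x", "TE", "TR"]
sns.heatmap(df[cols].corr(), annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.show()
```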
4.2 Harmonization of Healthy Population Data Based on RVM
We trained and tested the RVM method described in Sect. 3 for linear regression on the dataset of healthy subjects (Sect. 2). We used different kernel types (linear and Gaussian RBF) and searched for the model that best preserved biological information - namely age - while decreasing the scanner/center-specific variability. Thus, the model should decrease the global variance in the data while maintaining the original median of the population defined by the training set, since we build on the assumption that this sample contains enough variability to represent the heterogeneity of scanner and center effects. To evaluate the performance, we produced boxplots representing the distribution of the measured volumes for each scanner in the test set with at least ten subjects (see Fig. 3). We first removed the age dependency as estimated from the training set, so that variability due to age is not conflated with scanner effects. We compare medians and standard deviations, preferring values closer to zero, since these represent a decrease in variability while preserving the global trend. Table 1 presents the medians of these same age-detrended values. After correction, the distributions from different scanners become more similar. For both GM and WM the linear kernel produced a lower or comparable median and a decreased standard deviation. For WB we compared applying a linear kernel to summing the previously corrected WM and GM volumes, and verified that the latter option performed better.
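As a sketch of this evaluation, assuming a table of age-detrended test-set volumes with a scanner label (file and column names hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("test_set_volumes.csv")  # hypothetical file
counts = df["scanner"].value_counts()
big = df[df["scanner"].isin(counts[counts >= 10].index)]  # >= 10 subjects

# Medians near zero and small standard deviations indicate that scanner
# offsets were removed without shifting the population trend.
print(big.groupby("scanner")["GM_detrended"].agg(["median", "std"]))
big.boxplot(column="GM_detrended", by="scanner")
plt.show()
```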
4.3 Harmonization of Test-Retest Data
To further validate the method, we applied it to the test-retest dataset of MS patients described in Sect. 2. The results are summarized in Fig. 4, where the left side shows the absolute volumes before and after correction. First, we computed for each tissue type the differences between volumes from images acquired on the same scanner, which provides a measure of the intra-scanner error (Intra-SE). There is no significant difference between the original and corrected volumes for any of the tissue types (\(p>0.05\), paired t-test). Then we computed the difference between the averaged volumes of each scanner type and those of all the other scanner types. For WM there is a statistically significant difference (\(p = 2 \times 10^{-3}\), paired t-test) between the original and corrected volumes. For GM, the inter-scanner error (Inter-SE) of the original volumes was already very small, comparable to the Intra-SE. After applying the correction, there is no statistical difference between these volumes and the original ones, even though visually there is an increase in variability, which propagates to the WB volume.
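The statistical comparison is a standard paired t-test; a minimal sketch with synthetic stand-in numbers (not the paper's data) would be:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
# Synthetic per-patient intra-scanner volume differences (mL) before and
# after correction; the paper uses 9 MS patients, each scanned twice on
# three scanners.
intra_orig = rng.normal(1.0, 0.3, size=9)
intra_corr = intra_orig + rng.normal(0.0, 0.1, size=9)
stat, p = ttest_rel(intra_orig, intra_corr)
print(f"paired t-test: t = {stat:.2f}, p = {p:.3g}")
```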
5 Conclusions and Future Work
In this work we applied a relevance vector machine approach to estimate the amount of variability in the data that can be explained by variations in image descriptors. We observe that brain volumes depend strongly on the NMI metric computed after atlas registration, which we initially did not expect. NMI measures the quality of the non-rigid registration between the image and an atlas, a step required by the methodology we used. Since the final volumes depend on the quality of this registration, the correction effectively compensates for suboptimal segmentation results that derive from a poor registration step.
We demonstrate that it is possible to achieve a certain degree of harmonization of the data based only on image descriptors. To our knowledge, this is the first approach that does not rely on scanner-specific information to perform harmonization. We expect the current method to perform less accurately than more tailored methods, but to generalize better. A thorough comparison to such methods still needs to be performed, but is out of the scope of the current paper. This type of solution is interesting for large-scale statistics, and could potentially have a positive impact on longitudinal studies. Moreover, the proposed approach can deal with missing scanner/center information, a problem not addressed in previous works and very frequent in practice. Nevertheless, in the test-retest setting the inter-scanner error remains high compared to the measured intra-scanner error, which implies that the method does not yet provide a fully satisfactory correction for patient-specific use and should be further investigated.
Future steps include exploring more descriptive image features that are independent of the segmentation method used and that can encode the presence of geometric distortions and artifacts. For controlled environments it could be useful to couple general scanner-dependent information with the image descriptors. Additionally, we aim to extend the method to other brain structures of interest and to compare its performance on a controlled dataset to scanner-specific state-of-the-art methods. Finally, it is important to keep in mind that in a cross-sectional setting this type of correction does not replace the need for improved standardization at the image level.
References
Bermel, R.A., Bakshi, R.: The measurement and clinical relevance of brain atrophy in multiple sclerosis. Lancet Neurol. 5(2), 158–170 (2006)
Biberacher, V., et al.: Intra- and interscanner variability of magnetic resonance imaging based volumetry in multiple sclerosis. NeuroImage 142, 188–197 (2016). https://doi.org/10.1016/j.neuroimage.2016.07.035
Biomedical Image Analysis Group, Imperial College London: IXI dataset. https://brain-development.org/ixi-dataset/. Accessed 18 Mar 2019
Chen, J., et al.: Exploration of scanning effects in multi-site structural MRI studies. J. Neurosci. Methods 230, 37–50 (2014). https://doi.org/10.1016/j.jneumeth.2014.04.023
Chua, A.S., et al.: Handling changes in MRI acquisition parameters in modeling whole brain lesion volume and atrophy data in multiple sclerosis subjects: comparison of linear mixed-effect models. NeuroImage: Clin. 8, 606–610 (2015)
Fortin, J.P., et al.: Harmonization of cortical thickness measurements across scanners and sites. NeuroImage 167, 104–120 (2018). https://doi.org/10.1016/j.neuroimage.2017.11.024
Jain, S., et al.: Automatic segmentation and volumetry of multiple sclerosis brain lesions from MR images. NeuroImage: Clin. 8, 367–375 (2015). https://doi.org/10.1016/j.nicl.2015.05.003
Jones, B.C., et al.: Quantification of multiple-sclerosis-related brain atrophy in two heterogeneous MRI datasets using mixed-effects modeling. NeuroImage: Clin. 3, 171–179 (2013). https://doi.org/10.1016/j.nicl.2013.08.001
MacKay, D.J.C.: Bayesian interpolation. Neural Comput. 4(3), 415–447 (1992). https://doi.org/10.1162/neco.1992.4.3.415
Marcus, D.S., et al.: Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19(9), 1498–1507 (2007)
Nyúl, L.G., Udupa, J.K.: On standardizing the MR image intensity scale. Magn. Reson. Med. 42(6), 1072–1081 (1999)
Robitaille, N., Mouiha, A., Crépeault, B., Valdivia, F., Duchesne, S.: Tissue-based MRI intensity standardization: application to multicentric datasets. Int. J. Biomed. Imaging 2012, 1–11 (2012)
Salat, D., et al.: Age-associated alterations in cortical gray and white matter signal intensity and gray to white matter contrast. NeuroImage 48(1), 21–28 (2009)
Takao, H., Hayashi, N., Ohtomo, K.: Effect of scanner in longitudinal studies of brain volume changes. J. Magn. Reson. Imaging 34(2), 438–444 (2011). https://doi.org/10.1002/jmri.22636
Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1(3), 211–244 (2001). https://doi.org/10.1162/15324430152748236
Zhuge, Y., Udupa, J.K.: Intensity standardization simplifies brain MR image segmentation. Comput. Vis. Image Understand. 113(10), 1095–1103 (2009). https://doi.org/10.1016/J.CVIU.2009.06.003