Stability of MRI features

In the hospital, brain scans are usually assessed by the trained eye of a radiologist. In research, however, we use software to make measurements from brain scans. These measures can then be used in statistical and machine-learning analyses. This approach is called radiomics. In Poirot et al. 2022, we investigated whether these measures vary depending on the software used to outline the brain regions. We also checked whether this affects their usefulness.

Key takeaways

  • Small differences in how brain regions are outlined can affect the measures that we get and how accurate our predictions are.
  • The reliability of these numbers doesn’t seem to depend much on which part of the brain we’re looking at, but simpler features are usually more dependable.
  • Newer methods that use deep learning to outline brain regions might give us more accurate numbers.

Radiomics

The word radi-omics combines “radiology” and the suffix “-omics”. Radiology is the main medical specialty that uses medical imaging, and it relies heavily on the interpretation of images by medical experts. This is called a “qualitative” assessment. In contrast, the suffix “-omics” implies a large-scale, measurement-based approach, which is called a “quantitative” approach. Radiology is the backbone of image-based diagnosis and treatment guidance. Radiomics, in turn, has proven useful in several applications, primarily in the diagnosis and treatment of cancer.

Applications

Scientific studies have found that radiomics can be useful in understanding conditions like schizophrenia, ADHD, bipolar disorder, and depression. Radiomics is good at spotting small changes in brain scans that the naked eye can’t see. But because it is so sensitive, even small differences in how we measure brain properties can affect the results. This makes it hard to bring these applications into hospitals. Thus, the sensitivity of radiomics is both a blessing and a curse.

One source of problems for radiomics is disagreement about how brain regions are delineated. Delineating brain regions is called segmentation, and it is often performed using software. However, each software tool creates (slightly) different segmentations. For example, the animations below show the segmentations created by three different tools.

[Animations: front view and side view of the segmentations produced by three methods, showing their differences.]

As with human raters, each segmentation tool produces slightly different outlines of the same brain regions.

Q&A

Segmentation is one of several steps that take a brain image to statistical analysis, and it usually sits in the middle of this pipeline. Segmentation is followed by the extraction of measurements from the segmented regions. Extraction is in turn followed by statistical analysis or prediction using these measures. Our paper took a closer look at the disagreement among segmentation tools and at how it affected the radiomics analysis further down the line.
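To make the extraction step concrete, here is a minimal Python/NumPy sketch that turns one image and one segmentation mask into a handful of numbers. The function name `extract_features` and the chosen features (a volume and a few intensity statistics) are illustrative assumptions, not the software or feature set used in the paper. The toy example outlines the same synthetic image with two masks that differ by a single slice, mimicking two tools that disagree at a border.

```python
import numpy as np

def extract_features(image: np.ndarray, mask: np.ndarray,
                     voxel_volume_mm3: float = 1.0) -> dict:
    """Compute a few simple radiomics-style features for one segmented region.

    image: 3D array of MRI intensities.
    mask:  3D boolean array of the same shape; True marks voxels in the region.
    (Hypothetical helper for illustration only.)
    """
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    voxels = image[mask]  # intensities of the voxels inside the region
    if voxels.size == 0:
        raise ValueError("mask is empty")
    return {
        # Shape-type feature: region volume from the voxel count
        "volume_mm3": voxels.size * voxel_volume_mm3,
        # Intensity (first-order) features: ignore where the voxels are
        "mean": float(voxels.mean()),
        "std": float(voxels.std()),
        "p90": float(np.percentile(voxels, 90)),
    }

# Synthetic image (not real MRI data) outlined by two hypothetical tools
rng = np.random.default_rng(0)
image = rng.normal(100.0, 10.0, size=(20, 20, 20))
mask_a = np.zeros(image.shape, dtype=bool)
mask_a[5:15, 5:15, 5:15] = True   # tool A: a 10x10x10 block
mask_b = np.zeros(image.shape, dtype=bool)
mask_b[5:15, 5:15, 5:14] = True   # tool B: the same block, one slice thinner

feats_a = extract_features(image, mask_a)
feats_b = extract_features(image, mask_b)
for name in feats_a:
    print(f"{name:>10}: tool A = {feats_a[name]:8.2f}, tool B = {feats_b[name]:8.2f}")
```

Even this one-slice disagreement changes the volume by 10% and shifts the intensity statistics slightly, which is the kind of effect that then propagates into the analysis step.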

Segmentation: Which brain parts are hardest to agree on?

Smaller brain regions, which have a larger surface relative to their volume, suffer most from disagreement among segmentation tools. In the first figure of the published work, we showed how well the different tools agreed, on a scale from zero to one. Although some large structures agree very well, many smaller deep brain structures agree rather poorly. Researchers who are interested in these poorly agreeing regions may need to reconsider using radiomics, because their findings may be hard to reproduce.
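A common way to express agreement between two segmentations on a zero-to-one scale is the Dice similarity coefficient. We use Dice here as an assumed, illustrative metric; see the paper for the exact measure behind its first figure. The sketch also shows why small structures are hit hardest: it models two hypothetical tools whose outlines of a sphere differ by just one voxel of radius, and compares Dice for small and large spheres.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient of two boolean masks (0 = no overlap, 1 = identical)."""
    a = a.astype(bool)
    b = b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total

def sphere(radius: float, size: int = 64) -> np.ndarray:
    """Boolean mask of a sphere centred in a size x size x size grid."""
    z, y, x = np.indices((size, size, size)) - (size - 1) / 2.0
    return x**2 + y**2 + z**2 <= radius**2

# Hypothetical tool B always draws the structure one voxel smaller than tool A.
# The same one-voxel border disagreement costs small structures far more overlap.
for r in (3, 6, 12, 24):
    print(f"radius {r:2d} voxels: Dice = {dice(sphere(r), sphere(r - 1)):.2f}")
```

Because the disagreement lives at the boundary, its weight scales with surface area, while the overlap scales with volume; the surface-to-volume ratio of a sphere falls as 3/r, so small regions lose proportionally more.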

Extraction: Which measures are most dependable?

Once we have defined a brain region, there are many things we can measure about it: its brightness, its contrast, all kinds of properties of its shape, and even patterns within the region. There are so many that it is easiest to discuss them as seven groups. In order of complexity, these groups describe the shape of the brain region, the image intensity (also known as first-order features), and image intensity patterns, of which there are five groups. We found that more complex measures were also more affected by both (1) disagreement among segmentation tools and (2) scan-rescan variation (scanning the same participant twice).
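How much a feature changes between two segmentation tools, or between a scan and a rescan, is often summarised with the intraclass correlation coefficient (ICC). The sketch below implements one common form, the two-way random-effects, absolute-agreement, single-measurement ICC usually labelled ICC(2,1). We assume this form for illustration only; it is not necessarily the statistic used in the paper, and the data are synthetic.

```python
import numpy as np

def icc_2_1(y: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    y: array of shape (n_subjects, k_conditions), e.g. one feature measured on
       each subject's scan and rescan, or with k different segmentation tools.
    """
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)  # per-subject means
    col_means = y.mean(axis=0)  # per-condition (scan or tool) means
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Synthetic example: 30 hypothetical subjects, each measured twice
rng = np.random.default_rng(1)
true_values = rng.normal(50.0, 10.0, size=30)
stable = np.column_stack([true_values + rng.normal(0, 1, 30) for _ in range(2)])
unstable = np.column_stack([true_values + rng.normal(0, 10, 30) for _ in range(2)])
print(f"stable feature ICC:   {icc_2_1(stable):.2f}")
print(f"unstable feature ICC: {icc_2_1(unstable):.2f}")
```

An ICC near one means subjects keep their ranking and values across conditions; a feature whose measurement noise is comparable to the real differences between subjects drops towards 0.5 or lower.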

Prediction: Does it matter in the end?

Finally, we tested whether these changes affected how useful these measures were for prediction. The data set we used had previously been used to study changes in the brain after staying up all night. Healthy volunteers stayed at the hospital overnight and were deprived of sleep, while a control group slept as normal; all participants were scanned before and after the night. This earlier work had already shown significant short-term changes due to sleep deprivation, which is interesting in its own right.

In this work, we used the prediction of sleep deprivation as a test case. Given that a machine-learning model can identify the exhausted participants, would its performance differ substantially depending on which segmentation method was used? We created a deep-learning model to identify the sleep-deprived participants in the group. Interestingly, the model performed better when it used segmentations created by newer, deep learning-based methods than when it used segmentations from traditional methods. This suggests that these newer segmentation techniques might offer advantages.
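The comparison idea (train the same classifier on features derived from each segmentation method and compare accuracy) can be sketched in a few lines. The paper used a deep-learning model; the sketch below swaps in a much simpler nearest-centroid classifier with leave-one-out evaluation, and the function name and synthetic data are hypothetical. The printed numbers are illustrative only and say nothing about the study’s results.

```python
import numpy as np

def loo_nearest_centroid_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (hypothetical helper).

    X: (n_subjects, n_features) feature matrix.
    y: (n_subjects,) labels, e.g. 1 = sleep-deprived, 0 = control.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i           # hold out subject i
        X_train, y_train = X[train], y[train]
        # z-score using training data only, so the held-out subject does not leak in
        mu = X_train.mean(axis=0)
        sd = X_train.std(axis=0) + 1e-12
        Z_train = (X_train - mu) / sd
        z_test = (X[i] - mu) / sd
        centroids = {c: Z_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
        pred = min(centroids, key=lambda c: np.linalg.norm(z_test - centroids[c]))
        correct += int(pred == y[i])
    return correct / n

# Synthetic stand-in: the same group difference, measured with features from two
# hypothetical segmentation tools that add different amounts of noise
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 20)                      # 20 controls, 20 sleep-deprived
signal = y[:, None] * 1.0                      # true group difference of 1 unit
features_tool_a = signal + rng.normal(0, 0.5, (40, 5))
features_tool_b = signal + rng.normal(0, 2.0, (40, 5))
print(f"tool A accuracy: {loo_nearest_centroid_accuracy(features_tool_a, y):.2f}")
print(f"tool B accuracy: {loo_nearest_centroid_accuracy(features_tool_b, y):.2f}")
```

Keeping the classifier and evaluation fixed while changing only the segmentation source is what isolates the effect of segmentation on downstream performance.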

Conclusion

In conclusion, our investigation of how segmentation tools affect brain scan analysis underscores the need for consistency and precision in radiomics. Minor variations in segmentation can substantially influence the derived measures and affect the accuracy of predictions. However, segmentations from newer deep learning-based tools led to better predictions. Moving forward, a concerted effort towards standardization and the adoption of such methods may help unlock the full potential of radiomics in medical research and diagnosis.