Veronika Cheplygina - Shortcuts and other shortcomings in machine learning for medical imaging - Perspectives on Scientific Error 2024
For slides, see osf.io/ayfek/
The application of machine learning (ML) to medical imaging diagnosis has attracted a lot of attention in recent years, with numerous reports of algorithms recognising medical images more accurately than human experts (for an overview see Liu et al., 2019). Yet progress in clinical practice has not been proportional to these claims. For example, Roberts et al. (2021) found that none of 62 published studies on ML for COVID-19 had potential for clinical use. Reviews of other clinical applications of ML have similarly failed to find reliable published prediction models.
The increased popularity of ML in recent years is often explained by two developments. First, there are several large publicly available datasets. Second, open-source deep-learning toolboxes allow development of algorithms without specialised domain knowledge, bringing more researchers into the field. Despite these seemingly ideal conditions for reproducibility, the state of ML in medical imaging is not as positive as one might think. We outline various reasons for this in Varoquaux and Cheplygina (2022); here we highlight two.
One reason is that large sample sizes are not a panacea. There is a tendency to expect that a clinical task can be "solved" if the dataset is large enough. However, not all clinical tasks translate neatly into ML tasks. Furthermore, creating larger datasets often comes at the expense of quality, leading algorithms to learn spurious correlations or "shortcuts". For example, an algorithm might learn that if a patient's chest x-ray shows a drain (a treatment for a collapsed lung), then that patient likely has a collapsed lung (Oakden-Rayner, 2020). Similarly, our recent results (in preparation) show that lung diseases can be diagnosed with high accuracy even when the lungs themselves are hidden from the x-ray.
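The drain example can be sketched with a toy simulation. The data below is entirely synthetic (the disease prevalence, drain rates, and dataset sizes are illustrative assumptions, not figures from the talk): a "classifier" that only checks for the spurious drain feature looks accurate on data that shares the bias, but collapses on data where the correlation is broken.

```python
import random

def make_dataset(n, p_drain_given_disease, seed):
    # Synthetic records: (has_drain, has_disease) pairs.
    # Assumptions (hypothetical): 50% disease prevalence,
    # 5% of healthy patients also show a drain.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        has_disease = rng.random() < 0.5
        p_drain = p_drain_given_disease if has_disease else 0.05
        data.append((rng.random() < p_drain, has_disease))
    return data

def shortcut_classifier(has_drain):
    # The "shortcut": predict disease whenever a drain is visible,
    # ignoring the lungs entirely.
    return has_drain

def accuracy(data):
    return sum(shortcut_classifier(d) == y for d, y in data) / len(data)

# Training-like distribution: 90% of diseased patients have a drain.
biased = make_dataset(10_000, 0.9, seed=0)
# Deployment-like distribution: patients imaged before treatment, no drains.
shifted = make_dataset(10_000, 0.0, seed=1)

print(f"accuracy on biased data:  {accuracy(biased):.2f}")
print(f"accuracy on shifted data: {accuracy(shifted):.2f}")
```

On the biased data the shortcut scores well above 90%, while on the shifted data it drops below chance, even though nothing about the disease itself changed.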
A second reason is that the availability of data and code, plus the theoretical option to "infinitely" repeat experiments (for example, with different subsets of data, different initialization points of the algorithms, and so forth), creates an illusion of generalization. Since there are many degrees of freedom in how such repetitions can be done, researchers cannot, for practical reasons, explore them exhaustively, yet may be tempted to state their conclusions more generally than the experiments support.
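One facet of this can be seen in a minimal simulation (my own illustrative sketch, with hypothetical numbers): evaluating a classifier with no real skill on many small random test subsets produces a spread of accuracy estimates, and reporting a favourable run rather than the spread makes chance-level performance look like signal.

```python
import random

def evaluate_once(rng, n_test=50):
    # Toy evaluation: a chance-level classifier (true accuracy
    # exactly 0.5) scored on a random test subset of n_test cases.
    return sum(rng.random() < 0.5 for _ in range(n_test)) / n_test

rng = random.Random(42)
# 100 repetitions with different "splits" (here: fresh random draws).
runs = [evaluate_once(rng) for _ in range(100)]

mean_acc = sum(runs) / len(runs)
best_acc = max(runs)
print(f"mean over 100 runs: {mean_acc:.2f}")  # hovers near 0.50
print(f"best single run:    {best_acc:.2f}")  # noticeably higher
```

The mean stays near chance, but the best single run can exceed 60% accuracy purely from sampling noise; conclusions drawn from a favourable subset of repetitions generalize less than they appear to.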
In this talk I dive deeper into these problems and hopefully, with the help of the audience, also explore some solutions.