The use of artificial intelligence in medicine and healthcare is thought to be capable of freeing up large quantities of an expert’s time by undertaking exacting and laborious work, saving significant sums of money, improving diagnostic outcomes and democratising medicine by making the world’s experts available globally inside a computer.
I salute those companies that have been able to take a diagnostic product to market, but one wonders why, despite the huge amount of effort expended, there are so few successful Machine Learning-based software tools available to healthcare professionals today.
I run a company that develops tools used to support Machine Learning, with a particular interest in Medical Technology (MedTech). Last week, my colleague Lucille Valentine drew my attention to a paper recently published in Nature. The study looked at more than 300 Machine Learning models developed last year to support the diagnosis and treatment of Covid-19 and concluded that none of them was suitable for clinical use.
A great deal of effort is expended on what I think of as proof-of-concept AI models for MedTech. As a company with a deep interest in the medical deployment of AI, we are often introduced to medics and academics keen to show off their latest diagnostic. Indeed, I spent two and a bit years, from the beginning of 2018 to the start of the Covid pandemic, attempting to develop and then commercialise an AI diagnostic, in our case one that looked for defects in the back of the eye in OCT retinal scans.
I no longer need to be convinced that it is possible to encode the knowledge of a true expert into a Machine Learning (ML) model that can, under very tightly controlled and limited circumstances, accurately provide a useful insight to the medical practitioner. I have seen sufficient demonstrations from academic research groups and adventurous medics to know this to be true. It is telling, I feel, that most demonstrations are led by data scientists not practising doctors. The paucity of true domain experts in the sector is a critical omission addressed by my colleague Chas Nelson in his recent article.
There remains, however, a gulf between the creation of a MedTech ML model for demonstration purposes and the real-world checks and balances necessary for deployment into a medical setting. Globally, an emerging set of regulations and governance frameworks mandates the safety and effectiveness of AI in all manner of sectors, but particularly as applied to Machine Learning in medicine.
As a professional Engineer by training, I perhaps take a biased view, but I see the introduction of regulation and standards into the emerging world of ML MedTech as a wholly positive development. We all expect the products we use every day to be safe and effective. We presume that the cars we drive have brakes that work and are equipped with working lights and horns. As with every other technical development, the introduction of suitable standards heralds the mainstream adoption of that technology. The fundamental reason the AI models described in the published paper failed to be adopted is that their creators didn’t engage with the necessary standards and frameworks from the beginning of the project.
A senior partner in a Venture Capital firm, during our recent pitch, described the current approach to Machine Learning for MedTech as “salami-slicing”. He was referring to the tendency for ML models to be super-focused on one small diagnostic task. As a result, the economics of adopting the technology don’t stack up well. We know that for ML to be accurate, the scope of application for the tool has to be very tightly defined, and the data sent to the tool has to fall within very tight parameters to be valid. The data sets used to train ML models are already so vast and expensive to create, and the compute time necessary to train the AI so costly, that the only way a model can be developed is to limit its scope of application to something manageable and very tightly defined. But that scope is then so small that the number of potential users shrinks to the point where it is uneconomic to take the model through clinical trials.
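The point that inputs must stay within the model’s validated envelope can be made concrete with a small sketch. Everything here is hypothetical and illustrative: the field names, the OCT modality check and the numeric limits are assumptions, not taken from any real product or regulation.

```python
# Hypothetical sketch: a scope guard that rejects inputs outside the narrow
# envelope a MedTech ML model was validated on. All fields and limits are
# illustrative assumptions.
from dataclasses import dataclass


@dataclass
class ScanMetadata:
    modality: str          # e.g. "OCT"
    width_px: int
    height_px: int
    pixel_spacing_mm: float


# The envelope the model was (hypothetically) trained and validated against.
VALIDATED_SCOPE = {
    "modality": {"OCT"},
    "width_px": (496, 1024),
    "height_px": (496, 1024),
    "pixel_spacing_mm": (0.005, 0.02),
}


def in_scope(scan: ScanMetadata) -> bool:
    """Return True only if the scan falls inside the validated envelope."""
    if scan.modality not in VALIDATED_SCOPE["modality"]:
        return False
    for field, value in (
        ("width_px", scan.width_px),
        ("height_px", scan.height_px),
        ("pixel_spacing_mm", scan.pixel_spacing_mm),
    ):
        lo, hi = VALIDATED_SCOPE[field]
        if not lo <= value <= hi:
            return False
    return True


print(in_scope(ScanMetadata("OCT", 512, 512, 0.01)))     # inside the envelope
print(in_scope(ScanMetadata("fundus", 512, 512, 0.01)))  # wrong modality
```

In a deployed diagnostic, an out-of-scope input would be refused rather than scored, since the model’s accuracy outside its validated envelope is unknown.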
Data is the key to all machine learning, as it is to all computing and data science; the aphorism “garbage in, garbage out” has always applied. For Machine Learning in MedTech, I interpret this to mean that the quality of the AI model developed is directly dependent upon the quality of the data used to train it. It is worth remembering what we are attempting to achieve: to train a computer to mimic the experience of the world’s best clinicians. Training an AI with poor-quality data can only lead to failure.
As a company, we often receive email offers to support the development of AI from organisations around the world that appear unable to provide reliable provenance for the data to be used, or the qualifications of the people adding the annotations. For a Machine Learning model to be used in any regulated environment, it must be entirely clear where the data was obtained.
Guaranteeing the quality of the data annotations provided by a pool of expert medics might be sufficient to provide confidence in the effectiveness of an ML model, but if the data has been obtained in a biased or unethical way, many subsequent users of the AI will find it unacceptable. For example, if the source data is obtained from patients without their knowledge, so that their x-rays are used in an AI without their permission, many will feel that is unacceptable. It is also vital that the data used is unbiased: the health outcomes of minority and deprived groups are already demonstrably poorer than the average. Embedding structural healthcare disadvantages into an AI by excluding disadvantaged patients from the dataset surely has to be considered unethical.
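One simple, illustrative way to surface this kind of dataset bias is to compare each group’s share of the training data against its share of the wider patient population. This is a minimal sketch, not a complete fairness audit; the group labels, population shares and the 0.5 tolerance threshold are all assumptions chosen for the example.

```python
# Hypothetical sketch: flag demographic groups that are under-represented in a
# training set relative to the patient population they come from.
from collections import Counter


def underrepresented(dataset_groups, population_share, tolerance=0.5):
    """Return groups whose share of the dataset is less than `tolerance`
    times their share of the wider patient population."""
    counts = Counter(dataset_groups)
    total = sum(counts.values())
    flagged = []
    for group, pop_share in population_share.items():
        data_share = counts.get(group, 0) / total
        if data_share < tolerance * pop_share:
            flagged.append(group)
    return flagged


# Toy example: group "B" makes up 20% of the population but only 4% of the
# training data, so it is flagged for review.
dataset = ["A"] * 96 + ["B"] * 4
population = {"A": 0.8, "B": 0.2}
print(underrepresented(dataset, population))
```

A real audit would go much further, checking the model’s per-group performance as well as the raw data balance, but even a check this crude would catch the starkest exclusions.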
As the regulatory environment for AI develops, different regulators focus upon different concerns. My company’s core values are Fairness, Accountability, Transparency and Ethics, both for the behaviours that we try to promote inside the company and for the AI that we wish to help people build. By applying our core values, we are confident that we can support the development of ML that will comply with all the emerging regulatory requirements.
At its core, the adoption of Machine Learning AI depends upon the public’s trust in the AI to do the job it is asked to perform, within the limits that society decides are appropriate. In the UK, the enthusiasm for Nanotechnology created in the early 2000s was dented by the Prince of Wales’s no doubt well-intentioned description of nano-tech as “grey goo”. AI has suffered a series of “winters” during which investment and research dried up. Some commentators predict an upcoming winter for AI as the perceived hype around the sector fails to match the reality delivered. Certainly the Nature paper would appear to support at least the smell of autumn in the air for this generation of AI. However, we as a community of technologists should now be able to transcend the problems of poor data and a paucity of standards to create an era of trustworthy AI.
For too long, the world of Artificial Intelligence has been opaque to those without advanced Data Science qualifications. We need to enter a new age of understanding, which will lead to trust. If we build tools that help people create AI, understanding, and therefore trust, will grow. I am convinced that the leading edge of AI no longer lies in the development of ever more accurate ML models of the real world, but in the provision of easy-to-use tools for the creation of useful and trustworthy AI.
Big Tech is promoting the Democratization of AI by lowering the barriers to entry, increasing speed to market, reducing the skills necessary and thereby reducing the overall cost of adoption. But “democratization” can only succeed if the resulting products are trusted by users and regulators alike, for which they will need to be demonstrably safe and effective.
Our approach at gliff.ai is to ensure that the domain expert is at the heart of the Machine Learning process. We are building tools which value the expert’s time by making it as easy as possible to add the expert’s knowledge, or annotations, to medical images. gliff.ai is exploring ways to ensure data privacy, to permit teams of experts to collaborate in the development of ML, and to check that both the data sets used in training and the resulting inferred models are as free from bias as possible. gliff.ai’s work forms part of a movement towards MLOps, a contraction of Machine Learning Operations. In the same way that DevOps redefined software development to make it faster and more efficient, our ambition is to do the same for Machine Learning in the MedTech space.