Every day we get closer to being able to automate the document processing and administration that consume so many hours and can distract from treatment and research. As we can all attest, clinical trials are among the largest administrative endeavors in the wide world of business operations, so we have every reason to be enthusiastic about any technology that can reduce this effort while improving quality.
The trial master file (TMF) is a focus of at least some of this excitement. The TMF is one of several places where a lot of manual effort is required for the collection and classification of the documentation used to ensure GCP compliance. This effort is a time-consuming back-office requirement that is necessary for regulatory compliance during clinical research.
Manual Administration
During a study, documentation comes in from many clinical sites and usually many countries and in many languages. This documentation is either classified to the site’s metadata specifications or not classified at all. The documents are often scanned with handwritten notes and signatures. What we end up with is a mountain of documentation that must be identified and tagged with metadata for presentation to auditors during an agency inspection. The TMF Reference Model requires a very specific set of metadata and filing that is recommended by regulatory agencies as the accepted standard.
Opportunities for Automation
Challenges
The challenges are immense, however. Have you ever been asked to verify you are a human in an online form using reCAPTCHA? What you might not know is that when you type in the letters and numbers pictured, you are teaching machine algorithms to identify those letters and numbers. The technology is being fed with crowd-sourced information to make it smarter, but it is still far from perfect. Similarly, have you ever had difficulty reading someone’s handwriting? In many cases, advanced AI is not able to decipher that content any better than you.
What this all means is that real challenges exist with scan quality and handwriting legibility as they apply to extracting the metadata necessary for machine learning (ML) to do its job. These issues compound when factoring in different languages and the optical character recognition (OCR) necessary to extract text from a scan. When you are categorizing information inside the TMF, it is not enough to know that a document is classified a certain way. You also have to know the site location, the contacts involved, and the country it came from. You have to classify all this documentation in all these ways to be successful.
Approaches
Classification algorithms can generally verify that documentation is not duplicated. They can also use statistical analysis to compare document images and report how similar they are as a percentage. However, what is most interesting is the learning part of ML. The technology can be “trained” against a model, which means it can learn through processing training data. If a document type has been classified in a specific way many times, those examples can be sent through the ML algorithm so that it starts to learn the differences between document types. For established vendors like TransPerfect’s Trial Interactive, there is a lot of training data available: millions of classified documents can be sent through an ML algorithm to better train it. From there, one can start to apply predictive models: take the training data and statistical analysis for classification, compare a new document against them, and begin to predict what the document is and how to classify it within the eTMF.
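To make these ideas a little more concrete, here is a minimal Python sketch of the three steps described above: duplicate detection, a percentage similarity score, and a classifier that learns from already-classified documents. It is an illustration only, not how Trial Interactive or any other product implements them. It assumes the document text has already been extracted, compares text rather than images, and uses a small Naive Bayes model as a stand-in for a production ML algorithm. All function names, class names, and document-type labels are hypothetical.

```python
import hashlib
import math
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def is_exact_duplicate(a: bytes, b: bytes) -> bool:
    """Two files are exact duplicates when their SHA-256 digests match."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def percent_similar(text_a: str, text_b: str) -> float:
    """Rough similarity of two extracted texts, as a percentage (0-100)."""
    return 100.0 * SequenceMatcher(None, text_a, text_b).ratio()


class TinyNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # every word seen in training

    def train(self, text: str, label: str) -> None:
        """Learn from one document that a human has already classified."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text: str):
        """Return the most probable label, or None if nothing was trained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: text of documents that were already classified.
model = TinyNaiveBayes()
model.train("curriculum vitae education employment history investigator", "CV")
model.train("curriculum vitae medical license education", "CV")
model.train("statement of investigator form fda 1572 commitments", "Form FDA 1572")
model.train("form fda 1572 investigator name address irb", "Form FDA 1572")
predicted = model.predict("investigator curriculum vitae and education")
```

With millions of classified documents instead of four, the same pattern applies: every human classification becomes a training example, and the model's prediction becomes a filing suggestion.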
Once we identify where in the document we will find essential metadata, a variety of tools, including natural language processing (NLP) and zonal OCR, can extract the metadata to better classify the document. We can then run set comparisons to look at all the data collected from a document and compare it against what we know about an investigative site, allowing the identification of anomalies and possible issues.
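As a rough illustration of the extract-then-compare step, the Python sketch below pulls a few fields out of text that is assumed to have already been produced by OCR, then uses set comparisons to flag values that disagree with what is known about the site. The field names, patterns, and site record are hypothetical, and simple regular expressions stand in for real zonal OCR or NLP tooling.

```python
import re
from typing import Dict, Set

# Hypothetical patterns for where metadata appears in the OCR text.
FIELD_PATTERNS = {
    "site_id": re.compile(r"Site[ \t]*(?:ID|No\.?)[ \t]*[:#]?[ \t]*(\w+)", re.IGNORECASE),
    "investigator": re.compile(r"Investigator[ \t]*:[ \t]*([A-Za-z .'-]+)", re.IGNORECASE),
    "country": re.compile(r"Country[ \t]*:[ \t]*([A-Za-z ]+)", re.IGNORECASE),
}


def extract_metadata(ocr_text: str) -> Dict[str, str]:
    """Return every field whose pattern is found in the OCR text."""
    found = {}
    for field_name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[field_name] = match.group(1).strip()
    return found


def find_anomalies(extracted: Dict[str, str], site_record: Dict[str, str]) -> Dict[str, Set[str]]:
    """Compare extracted (field, value) pairs against the known site record."""
    extracted_pairs = {(k, v.lower()) for k, v in extracted.items()}
    known_pairs = {(k, v.lower()) for k, v in site_record.items()}
    # Fields present in both but with different values.
    mismatched = {k for k, _ in extracted_pairs - known_pairs if k in site_record}
    # Fields the site record has but the document did not yield.
    missing = set(site_record) - set(extracted)
    return {"mismatched": mismatched, "missing": missing}


ocr_text = "Site ID: 101\nInvestigator: Jane Smith\nCountry: Germany\n"
site_record = {"site_id": "101", "investigator": "Jane Smith", "country": "France"}
anomalies = find_anomalies(extract_metadata(ocr_text), site_record)
```

In this made-up example the country on the document does not match the site record, so the document would be flagged for a person to review rather than filed silently.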
It's Not Magic... Yet
As innovators and clinical team partners, we want operational leaders to be able to set realistic expectations as they relate to emerging technology. Artificial intelligence and machine learning, as they apply to clinical processes, and particularly the trial master file, have yet to overcome some real obstacles in their ability to reliably understand the information being fed into their algorithmic “brains.” Let’s face it, we all regularly see documentation that is borderline or sometimes completely unreadable, often laden with very idiosyncratic handwritten information. Bluntly speaking, ML is not ready to handle human nuance. The recommended approach is human-aided active learning, in which humans QC the machine-determined results and the ML model learns from each human decision. This allows the TMF to stay compliant while making the process much more efficient. Presently, we still need our human clinical professionals to break down documentation into patterns that algorithms will understand.
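One way to picture human-aided active learning is the small Python sketch below. It is a sketch under assumptions, not a description of any product: it assumes the model reports a confidence score with each prediction, and it adds a confidence threshold (our own simplification) so that only uncertain results go to a human, whose decision is then stored as a new training example. All names and the 0.90 threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Prediction:
    doc_id: str
    label: str         # document type suggested by the model
    confidence: float  # assumed to be a score between 0 and 1


@dataclass
class ReviewLoop:
    threshold: float = 0.90  # hypothetical cut-off for filing without review
    filed: Dict[str, str] = field(default_factory=dict)
    review_queue: List[Prediction] = field(default_factory=list)
    # (doc_id, human label, did the human agree with the model?)
    training_examples: List[Tuple[str, str, bool]] = field(default_factory=list)

    def route(self, prediction: Prediction) -> str:
        """File confident predictions; queue the rest for human QC."""
        if prediction.confidence >= self.threshold:
            self.filed[prediction.doc_id] = prediction.label
            return "auto"
        self.review_queue.append(prediction)
        return "human"

    def record_decision(self, prediction: Prediction, human_label: str) -> None:
        """File under the human's label and keep the decision for retraining."""
        self.review_queue.remove(prediction)
        self.filed[prediction.doc_id] = human_label
        agreed = human_label == prediction.label
        self.training_examples.append((prediction.doc_id, human_label, agreed))


loop = ReviewLoop()
loop.route(Prediction("doc-001", "CV", 0.97))   # confident: filed directly
uncertain = Prediction("doc-002", "CV", 0.55)
loop.route(uncertain)                           # uncertain: sent to a reviewer
loop.record_decision(uncertain, "Form FDA 1572")
```

The entries in `training_examples` are what would be fed back to the model on its next training run, which is where the "learning from each human decision" happens.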
The Auto-Filing TMF
The electric car company Tesla® has received a lot of press for claims of self-driving capability. CEO Elon Musk used SAE International’s “Levels of Driving Automation” classifications, published in 2014, to measure his company’s success, with the original goal of achieving Class 5, “steering wheel optional,” by 2018. While a laudable goal, it’s important to be skeptical about these kinds of claims. Just for fun, here is a depiction of the classifications for “Self-Filing TMF” based upon the SAE model:
CLASS - DESCRIPTION
0 - All manual processes. Teams of document classifiers. Lists of essential documents. Regular, internal quality review processes. Manual agency inspection.
1 - ("hands on"): Also exists now in Trial Interactive and other products. OCR and ML translations available. Document classification suggestions against essential documents. Some metadata extraction for critical documents such as the 1572 form.
2 - ("hands off"): The automated TMF can classify documentation by itself and can perform limited metadata extraction and verification steps. TMF actively anticipates documents through CTMS processes. Regulatory and QA must still monitor the TMF with regular quality reviews.
3 - ("eyes off"): The TMF self-processes documents and can handle situations that call for a response, like opening queries. However, the TMF does not really understand what essential documents are needed and cannot handle amendments and special situations very well. The TMF cannot audit itself yet.
4 - ("mind off"): Once the TMF is configured, the sponsor can safely turn their attention away from TMF tasks, i.e., they can focus on the clinical trial. Fully self-auditing; no attention is ever required for quality, i.e., the sponsor may safely ignore the TMF for normal trials. However, configuration is still required every time to properly set up the trial, train the models, etc.
5 - ("UI optional"): No human intervention is required at all. The TMF self-configures based on the protocol, self-files, self-processes, and self-audits.
I think we can agree that while Class 5 would be impressive, achieving Class 2 or Class 3 would provide most of the efficiencies while ensuring compliance and a high level of quality. Machine learning is not ready to take over document processing and, at this stage, should be seen as a helping hand for document specialists and TMF managers.
For more information on artificial intelligence and machine learning in the eTMF, visit us at https://www.trialinteractive.com/solutions/ai-and-etmf or contact us at info@trialInteractive.com.