We categorized essential clinical trial documents in the trial master file (TMF) by reviewing each document's TMF number, level, zone, section, artifact, sub-artifact, site, contacts, and additional key details. To do this we combined machine learning with near-duplicate detection (NDD), an algorithmic and statistical approach to measuring how closely two documents resemble each other.
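The source does not say which NDD algorithm was used. As a rough illustration of the general idea only, the sketch below measures document resemblance as the Jaccard similarity of overlapping word windows ("shingles") and assigns a document the label of its closest reference document. The function names, the reference texts, the labels, and the 0.3 threshold are all hypothetical choices made for this example.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    # \w+ matches Unicode word characters, so this works for any language
    # that separates words with spaces or punctuation.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, a value in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


def classify(doc: str, references: dict, threshold: float = 0.3):
    """Return (label, score) of the closest reference document.

    The label is None when no reference reaches the threshold.
    """
    best_label, best_score = None, 0.0
    for label, ref_text in references.items():
        score = resemblance(doc, ref_text)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical reference documents, one per category.
references = {
    "protocol_signature_page":
        "Protocol signature page for study ABC investigator signature and date",
    "monitoring_visit_report":
        "Monitoring visit report site visit date findings and follow up actions",
}

new_doc = "Protocol signature page for study XYZ investigator signature and date"
label, score = classify(new_doc, references)
print(label, round(score, 2))
```

An approach of this kind compares a new document against already-labeled examples instead of fitting a model first, and it treats text as sequences of tokens without relying on the vocabulary of any one language. Production systems typically replace the exact set comparison with hashing schemes so that large collections can be compared efficiently.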
CHALLENGE
Automatically classify:
- 125,000 documents
- from 5 countries
- in 7 languages
RESULTS
- No prior model training required
- Language independent
- 94% matching accuracy