Overview
A digital healthcare solutions company aimed to automate the manual claim assessment process for claims filed in hospitals and provider networks in the Malaysian market. The scope of this initiative involved encoding free-text descriptions in claim invoices into medically standardized categories to streamline the claim assessment process. The project targeted clinical mapping of approximately 15,000 invoices and over 2 million claim lines monthly, covering country, provider network, and hospital-level data.
Solution
Input Data and Preparation: The input data consisted of scanned and digitized invoices obtained through Optical Character Recognition (OCR), Electronic Data Interchange (EDI), and Customer Relationship Management (CRM) systems. Historical data included 25 million claim lines from 250,000 invoices sourced from OCR and EDI.
Analytics and Modelling:
- Prioritization: Descriptions were prioritized to ensure the maximum number of invoices were fully mapped.
- Annotation: Clinical professionals annotated the data to create training datasets.
- Modelling Techniques: The solution utilized a combination of semantic matching and deep learning algorithms to map and predict clinical categories. The semantic matching algorithm achieved 93% accuracy at a threshold of 0.967, while the deep learning algorithm achieved 76.3% accuracy.
- Scoring and Reporting: Scoring models were developed, and a reporting pipeline was created based on the scoring output. Power BI dashboards were implemented for data exploration, machine learning insights, and model monitoring.
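The combination of semantic matching and a learned model described above can be pictured as a two-stage cascade: accept the semantic match when its score clears the threshold, and otherwise defer to the classifier. The following is a minimal sketch under stated assumptions, not the project's actual implementation. The bag-of-words cosine similarity stands in for whatever semantic representation was really used, and `map_description`, `reference`, and `toy_fallback` are hypothetical names introduced for illustration. Only the 0.967 threshold comes from the case study.

```python
import math
from collections import Counter
from typing import Callable, Dict, Tuple

SIMILARITY_THRESHOLD = 0.967  # threshold reported in the case study


def vectorize(text: str) -> Counter:
    """Bag-of-words counts; an assumed stand-in for a real semantic embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def map_description(
    description: str,
    reference: Dict[str, str],
    fallback: Callable[[str], str],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Tuple[str, str]:
    """Return (category, source) for one free-text claim line.

    The closest annotated reference description wins if its score reaches
    the threshold; otherwise the fallback classifier decides.
    """
    query = vectorize(description)
    best_category, best_score = "", 0.0
    for ref_text, category in reference.items():
        score = cosine(query, vectorize(ref_text))
        if score > best_score:
            best_category, best_score = category, score
    if best_score >= threshold:
        return best_category, "semantic_match"
    return fallback(description), "classifier"


# Hypothetical annotated descriptions and a placeholder for the deep learning model.
reference = {
    "paracetamol 500mg tablet": "Medication",
    "chest x-ray": "Radiology",
}


def toy_fallback(description: str) -> str:
    """Placeholder; a trained classifier would be called here."""
    return "Unmapped"


print(map_description("Paracetamol 500mg tablet", reference, toy_fallback))
print(map_description("x-ray of chest", reference, toy_fallback))
```

A high threshold such as 0.967 means the semantic stage accepts only near-exact matches, which leaves the less certain descriptions to the classifier.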
ML Operations:
- Monitoring: Model performance was monitored on every run.
- Data Drift: Monthly production data was analyzed for data drift and its impact on model performance.
- Retraining: The models were retrained and refreshed whenever monitoring showed a drop in performance.
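One way to picture the drift check and retraining trigger is shown below. The case study does not say which drift statistic was used, so this sketch assumes the Population Stability Index (PSI) over a categorical field such as the predicted clinical category. The function names, the 0.2 drift level (a common rule of thumb), and the 0.02 accuracy-drop tolerance are all illustrative assumptions, not values from the project.

```python
import math
from collections import Counter
from typing import Sequence


def psi(baseline: Sequence[str], current: Sequence[str], eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of categorical values.

    Shares are floored at eps so that a value missing from one sample
    does not cause a division by zero or log(0).
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    base_counts, cur_counts = Counter(baseline), Counter(current)
    score = 0.0
    for value in set(base_counts) | set(cur_counts):
        p = max(base_counts[value] / len(baseline), eps)
        q = max(cur_counts[value] / len(current), eps)
        score += (q - p) * math.log(q / p)
    return score


def should_retrain(baseline_accuracy: float, current_accuracy: float,
                   max_drop: float = 0.02) -> bool:
    """Flag retraining when accuracy falls by more than max_drop (assumed tolerance)."""
    return baseline_accuracy - current_accuracy > max_drop


# Hypothetical monthly comparison of predicted categories.
training_month = ["Medication"] * 50 + ["Radiology"] * 50
production_month = ["Medication"] * 90 + ["Radiology"] * 10

drift = psi(training_month, production_month)
print(f"PSI = {drift:.3f}, drift flagged: {drift > 0.2}")
print("retrain:", should_retrain(baseline_accuracy=0.93, current_accuracy=0.90))
```

Keeping the drift score and the retraining decision separate mirrors the text: drift is analyzed for its impact, while retraining is driven by the observed performance drop.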
Output