Pre 2025

"Research is the distance between an idea and its realization."
- David Sarnoff
Research

Leveraging Language Models and Common Data Model To Unlock Real World

This project uses Natural Language Processing (NLP) and Large Language Models (LLMs) to extract clinical insights from unstructured Electronic Health Records (EHRs), with a focus on antibiotic prescription practices and treatment outcomes in India and the US.

Partnering with MyHealthcare (India’s first ABDHM-accredited health startup), the team works with structured and unstructured clinical data, including prescriptions, notes, and demographic details, and transforms them into standardized formats mapped to the OMOP Common Data Model. The initial focus is on respiratory infections and on identifying treatment patterns, particularly those related to antibiotic use, through a combination of AI pipelines, schema validation (via LinkML), and LLM fine-tuning. Challenges addressed include the lack of standard terminologies (e.g., the absence of RxNorm in Indian datasets), sparse symptom data, and unstructured prescriptions. A parallel goal is to compare healthcare delivery practices between India and the US using synthetic clinical notes, annotation-based model training, and cross-country EHR analysis.

Project Head(s)