The article “Foundation model of electronic medical records for adaptive risk estimation” proposes a new way of thinking about early warning and risk prediction in hospitals by treating the entire electronic health record as a sequence to be modeled, rather than a static table of variables. Renc and colleagues argue that conventional tools such as NEWS and MEWS, built on a handful of physiological parameters and fixed thresholds, are inherently limited in how they handle time, complexity and personalization in modern inpatient care (Renc et al., 2025). Using recent advances in transformer architectures and foundation models, they introduce ETHOS, a generative model of patient health timelines, and ARES, an adaptive risk estimation system that sits on top of ETHOS to deliver dynamic, explainable risk scores for critical outcomes such as hospital mortality, ICU admission and prolonged length of stay.
The paper starts from a familiar health-policy paradox: the United States spends nearly 18% of its GDP on health care, yet life expectancy and rates of preventable mortality are worse than in other high-income countries. Hospitals struggle to identify, early enough and reliably enough, which patients are headed toward deterioration, ICU transfer or extended stays, especially in crowded emergency departments and complex inpatient settings. Traditional early warning scores use a narrow time window, often the first few hours of admission or triage, and compress physiology into coarse risk bins. They ignore much of the longitudinal context contained in the EHR, and they do not update gracefully as new data arrive. The authors position ETHOS and ARES as an attempt to use the full richness of EHR data in a way analogous to how large language models use sequences of words: by modeling the entire trajectory, not just a snapshot, and by generating future “sentences” of clinical events rather than just predicting a label.
Technically, ETHOS is a transformer-based generative model trained on tokenized patient health timelines (PHTs). Each PHT is a chronological sequence derived from the MIMIC-IV v2.2 database and its emergency department extension, covering nearly 300,000 patients treated at Beth Israel Deaconess Medical Center between 2008 and 2019. After cleaning and tokenization, the authors obtain 285,622 distinct PHTs containing over 360 million tokens. Clinical events (admissions, diagnoses, procedures, medications, laboratory results, vital signs and static attributes such as age or sex) are converted into discrete tokens. Time is explicitly modeled through special “time-interval” tokens that capture gaps between events. Continuous values, such as lab results or age, are encoded using quantiles, while diagnosis, procedure and drug codes (ICD-10-CM/PCS, ATC) are represented hierarchically to exploit their structured semantics. This design allows the transformer to attend not only to what happened and when, but also to how specific concepts relate to each other in clinical coding systems.
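To make the tokenization idea concrete, here is a minimal sketch of how a few timestamped events could be turned into a token sequence with time-interval tokens, quantile-encoded lab values and hierarchical diagnosis codes. Everything here is illustrative and hypothetical: the `Event` record, the token spellings (`TI_5m_15m`, `LAB//…`, `ICD_CM//…`), the interval bucket edges and the three-level ICD split are assumptions for demonstration, not the vocabulary or code used by ETHOS.

```python
from bisect import bisect_right
from dataclasses import dataclass
from statistics import quantiles
from typing import Dict, List, Optional

# Hypothetical gap buckets as (upper bound in minutes, token). Gaps under
# 5 minutes emit no time token. These edges are illustrative only.
INTERVAL_BUCKETS = [
    (5, None),
    (15, "TI_5m_15m"),
    (60, "TI_15m_1h"),
    (360, "TI_1h_6h"),
    (1440, "TI_6h_1d"),
]


@dataclass
class Event:
    time_min: float               # minutes since the start of the timeline
    kind: str                     # e.g. "admission", "lab", "dx"
    code: str                     # e.g. "LACTATE" or an ICD-10-CM code
    value: Optional[float] = None


def fit_cutpoints(training_values: List[float], n_bins: int) -> List[float]:
    """Quantile cut points learned from training data (n_bins - 1 values)."""
    return quantiles(training_values, n=n_bins)


def quantile_token(value: float, cutpoints: List[float]) -> str:
    """Map a continuous value to 'Q1'..'Qn' by the quantile bin it falls in."""
    return f"Q{bisect_right(cutpoints, value) + 1}"


def interval_token(gap_min: float) -> Optional[str]:
    """Return the time-interval token for a gap, or None for very short gaps."""
    for upper, name in INTERVAL_BUCKETS:
        if gap_min < upper:
            return name
    return "TI_1d_plus"


def icd_tokens(code: str) -> List[str]:
    """Hierarchical tokens: chapter letter, 3-character category, full code."""
    return [f"ICD_CM//{code[0]}", f"ICD_CM//{code[:3]}", f"ICD_CM//{code}"]


def tokenize(events: List[Event], cutpoints: Dict[str, List[float]]) -> List[str]:
    """Convert timestamped events into one chronological token sequence."""
    tokens: List[str] = []
    prev_time: Optional[float] = None
    for ev in sorted(events, key=lambda e: e.time_min):
        if prev_time is not None:
            gap = interval_token(ev.time_min - prev_time)
            if gap is not None:
                tokens.append(gap)
        prev_time = ev.time_min
        if ev.kind == "lab" and ev.value is not None:
            tokens.append(f"LAB//{ev.code}")
            tokens.append(quantile_token(ev.value, cutpoints[ev.code]))
        elif ev.kind == "dx":
            tokens.extend(icd_tokens(ev.code))
        else:
            tokens.append(f"{ev.kind.upper()}//{ev.code}")
    return tokens


if __name__ == "__main__":
    events = [
        Event(0, "admission", "ED"),
        Event(10, "lab", "LACTATE", 3.1),
        Event(130, "dx", "I21.4"),
    ]
    print(tokenize(events, {"LACTATE": [1.0, 2.0, 4.0]}))
```

With these assumed cut points, the lactate value of 3.1 lands in the third bin and becomes `Q3`, and the 120-minute gap before the diagnosis becomes a single `TI_1h_6h` token. The point of the design is that the model never sees raw numbers or timestamps, only a finite vocabulary it can attend over.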
ETHOS is trained autoregressively, without task-specific labels, to predict the next token in the PHT given all previous tokens. In other words, it learns the joint distribution over entire patient trajectories. During inference, the model is used to simulate many possible future PHTs (fPHTs) for an individual patient, starting from their current timeline. For each simulated trajectory, the authors check whether target events – such as ICU admission, death, or a length of stay beyond the 90th percentile – occur, and at what simulated time. Monte Carlo sampling over at least 100 fPHTs per patient gives empirical probabilities for each event, along with confidence intervals that can be visualized as risk trajectories over time. This is the core statistical trick: risk is not a static output of a classifier but a probability estimated from a distribution of plausible futures induced by the foundation model.
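The Monte Carlo step can be sketched in a few lines: sample many futures, count how often the target event appears (optionally within a time horizon), and attach a confidence interval. In this sketch `toy_sampler` is a hypothetical random stand-in for ETHOS, not the model itself, and the Wilson score interval is an assumed choice; the summary does not state which interval method the authors use.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A simulated future: list of (hours_from_now, token) pairs.
Future = List[Tuple[float, str]]
# Stand-in for the generative model: given a timeline and an RNG, sample one future.
Sampler = Callable[[Sequence[str], random.Random], Future]


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (an assumed CI choice)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_risk(
    sampler: Sampler,
    timeline: Sequence[str],
    target: str,
    n_samples: int = 100,
    horizon_hours: Optional[float] = None,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Fraction of simulated futures in which `target` occurs within the horizon."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        future = sampler(timeline, rng)
        if any(tok == target and (horizon_hours is None or t <= horizon_hours)
               for t, tok in future):
            hits += 1
    low, high = wilson_interval(hits, n_samples)
    return hits / n_samples, low, high


def toy_sampler(timeline: Sequence[str], rng: random.Random) -> Future:
    """Hypothetical toy generator, NOT ETHOS: ICU risk is higher after a lactate test."""
    p_icu = 0.30 if "LAB//LACTATE" in timeline else 0.05
    t = 0.0
    future: Future = []
    if rng.random() < p_icu:
        t += rng.uniform(1, 120)
        future.append((t, "ICU_ADMISSION"))
    t += rng.uniform(24, 240)
    future.append((t, "DISCHARGE"))
    return future


if __name__ == "__main__":
    for horizon in (24, 72, None):
        p, lo, hi = estimate_risk(toy_sampler, ["LAB//LACTATE"], "ICU_ADMISSION",
                                  n_samples=100, horizon_hours=horizon)
        print(horizon, round(p, 2), (round(lo, 2), round(hi, 2)))
```

Calling `estimate_risk` at several horizons with the same seed reuses the same sampled futures, so the resulting probabilities form a consistent, non-decreasing risk curve over time, which is one simple way to obtain the kind of risk trajectory the paper visualizes.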
ARES, the Adaptive Risk Estimation System, is the clinical wrapper around ETHOS. It takes the simulated futures and converts them into ordinal risk levels from 1 to 5, defined over probability ranges (0–20%, 20–40%, and so on). Unlike classical tools, ARES is designed to update risk continuously from admission to discharge. For example, the timeline figure in the article shows a patient whose risk of ICU admission rises markedly around day 5 of hospitalization; the model subsequently “confirms” this when ICU admission occurs, then deactivates the ICU component and shifts its attention to the probability of prolonged stay beyond 10 or 15 days. This dynamic view allows the system to behave more like a continuously updating monitor than a one-shot screening tool.
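The mapping from probabilities to ARES levels, and the deactivation of an outcome once it has actually occurred, can be sketched as follows. The boundary convention (each band includes its lower edge, so exactly 20% is level 2) and the outcome names are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Dict, Set

# Assumed boundary convention: each band includes its lower edge,
# so 0.20 maps to level 2 and 1.00 maps to level 5.
LEVEL_EDGES = [0.2, 0.4, 0.6, 0.8]


def risk_level(p: float) -> int:
    """Map a probability to an ordinal ARES-style level from 1 to 5."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    return bisect_right(LEVEL_EDGES, p) + 1


def ares_snapshot(probs: Dict[str, float], observed: Set[str]) -> Dict[str, int]:
    """Risk levels for outcomes that have not yet occurred.

    Once an outcome (e.g. ICU admission) is observed in the real timeline,
    its component is deactivated and only the remaining outcomes are scored.
    """
    return {name: risk_level(p) for name, p in probs.items() if name not in observed}


if __name__ == "__main__":
    probs = {"icu_admission": 0.55, "stay_over_10d": 0.81}
    print(ares_snapshot(probs, set()))              # before ICU transfer
    print(ares_snapshot(probs, {"icu_admission"}))  # after ICU transfer
```

Using `bisect_right` on explicit edges avoids floating-point surprises that can arise from computing the level as `int(p * 5) + 1`.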
One of the distinctive contributions of the work is the personalized explainability module. Because ETHOS operates on tokens, the authors can ask, for any point in the timeline, which recent tokens most shifted the risk estimate up or down. In a detailed case example, they show how a lactate blood test raises the composite risk, but its quantile-encoded result leads the model to downgrade the risk again, and how subsequent procedure codes for endotracheal intubation and ventilatory support sharply increase ICU and mortality risk. These tokens can be color-highlighted over time, producing a visual narrative: “this lab test at this time pushed risk in this direction; this procedure then pushed it further.” Conceptually, this moves explainability away from global feature importance toward individualized trajectories, which is more aligned with how clinicians reason about patients.
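One simple way to ask “which token moved the risk” is to re-estimate risk after each token is appended to the timeline and record the change. The sketch below does exactly that; it is an assumed, simplified attribution scheme and may differ from the authors’ procedure. `toy_risk` is a hypothetical additive model standing in for ETHOS-based estimates; with the real model, each call would itself be a Monte Carlo estimate, so the deltas would carry sampling noise.

```python
from typing import Callable, List, Sequence, Tuple

RiskFn = Callable[[Sequence[str]], float]


def token_contributions(tokens: Sequence[str], risk_fn: RiskFn) -> List[Tuple[str, float]]:
    """Risk change caused by appending each token to the timeline prefix."""
    contributions: List[Tuple[str, float]] = []
    previous = risk_fn(tokens[:0])
    for i, tok in enumerate(tokens):
        current = risk_fn(tokens[: i + 1])
        contributions.append((tok, current - previous))
        previous = current
    return contributions


def top_drivers(contributions: List[Tuple[str, float]], k: int = 2) -> List[Tuple[str, float]]:
    """Tokens with the largest absolute effect on risk, up or down."""
    return sorted(contributions, key=lambda c: abs(c[1]), reverse=True)[:k]


# Hypothetical toy weights: a lactate order raises risk, a low-quantile result
# lowers it, and intubation/ventilation tokens raise it sharply.
TOY_WEIGHTS = {
    "LAB//LACTATE": 0.10,
    "Q2": -0.08,
    "PROC//INTUBATION": 0.25,
    "PROC//VENT_SUPPORT": 0.15,
}


def toy_risk(prefix: Sequence[str]) -> float:
    """Additive toy risk model clipped to [0, 1]; NOT ETHOS."""
    score = 0.10 + sum(TOY_WEIGHTS.get(t, 0.0) for t in prefix)
    return min(1.0, max(0.0, score))


if __name__ == "__main__":
    timeline = ["LAB//LACTATE", "Q2", "PROC//INTUBATION", "PROC//VENT_SUPPORT"]
    for tok, delta in token_contributions(timeline, toy_risk):
        print(f"{tok:20s} {delta:+.2f}")
```

The signed deltas are what a color-highlighted timeline would display: positive values in one color, negative values in another, with intensity proportional to magnitude.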
To evaluate ETHOS and ARES, the authors conduct two sets of benchmarks. First, they treat ARES outputs as early warning scores for four inpatient outcomes: hospital mortality, ICU admission, prolonged length of stay (>90th percentile) and a composite endpoint combining all three. Using predictions at the time of hospital admission, they compare ETHOS-based risk with MEDS-Tab, a strong baseline that aggregates time-series EHR data into tabular features for gradient-boosted trees. Across all tasks, ETHOS achieves higher AUROC values, with particularly pronounced gains for some racial subgroups such as Asian and Hispanic patients. Calibration curves indicate excellent to acceptable calibration, with Brier scores between 0.014 and 0.143 depending on outcome, suggesting that predicted probabilities can be interpreted meaningfully against clinical thresholds.
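For readers less familiar with the two metrics quoted here, the standard definitions are short enough to write out directly: the Brier score is the mean squared error between predicted probability and the 0/1 outcome, and AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. This is a plain reference sketch, not the authors’ evaluation code; in practice one would typically use `sklearn.metrics.brier_score_loss` and `sklearn.metrics.roc_auc_score`.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2).

    Quadratic pairwise version for clarity; rank-based methods scale better.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
    # A model that always predicts 0 for an outcome with 1% prevalence:
    print(brier_score([0.0] * 100, [1] + [0] * 99))
```

The second example shows why Brier scores are not directly comparable across outcomes: for a rare outcome such as in-hospital death, even an uninformative constant prediction scores about 0.01, so the low end of the reported range partly reflects outcome prevalence.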
Second, the authors benchmark ETHOS directly on emergency department tasks defined in prior work on the MIMIC-IV-ED benchmark: predicting hospitalization at triage, predicting a critical outcome (death or ICU transfer) within 12 hours of triage, and predicting ED re-presentation within 72 hours of discharge. Here, ETHOS is compared not only to MEDS-Tab and conventional machine-learning models (logistic regression, random forest, gradient boosting, neural networks) but also to standard clinical scores such as NEWS, NEWS2, MEWS, REMS, CART and ESI. On all three tasks, ETHOS achieves the best AUROC and strong precision–recall performance; across the outcomes reported in the paper, AUROC ranges from 0.740 for ED re-presentation to 0.936 for hospital mortality. This demonstrates that a single foundation model, trained in a label-free manner on tokenized trajectories, can match or surpass specialized models trained for each endpoint.
The discussion section emphasizes several conceptual advantages of the ARES framework. First, it enables risk estimation at arbitrary time points, based on whatever data are available, rather than locking clinicians into early fixed windows. This mitigates the “early warning paradox,” where models trained on historical data suggest interventions that were not actually available or realistic at the time in question. Second, because ETHOS simulates entire trajectories, it naturally supports composite outcomes and competing risks. For instance, once death occurs in a simulated timeline, the probability of a prolonged stay necessarily drops to zero, and the model learns these dependencies implicitly. Third, the modular design allows new clinical outcomes to be layered on without retraining ETHOS; once fPHTs are simulated, calculating the probability of an acute kidney injury token or a 30-day readmission token is a matter of counting occurrences, not re-fitting the core model.
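The “counting occurrences” argument and the handling of competing risks can be illustrated with a small sketch in which every outcome is just a predicate over a simulated future. The assumptions are loudly hypothetical: futures are lists of `(day, token)` pairs that end at either `DEATH` or `DISCHARGE`, prolonged stay is defined as discharge after a chosen threshold (10 days here), and `AKI` stands in for any newly added outcome token. Because a trajectory ending in death never reaches a discharge, it can never count as a prolonged stay, which mirrors the competing-risk behavior described above.

```python
from typing import Callable, Dict, List, Tuple

# A simulated future: (day, token) pairs ending in "DEATH" or "DISCHARGE".
Future = List[Tuple[float, str]]
Outcome = Callable[[Future], bool]


def has_token(token: str) -> Outcome:
    """Outcome that is true if the token appears anywhere in the future."""
    return lambda f: any(tok == token for _, tok in f)


def prolonged_stay(threshold_days: float) -> Outcome:
    """True only if the patient is discharged alive after the threshold."""
    return lambda f: any(tok == "DISCHARGE" and day > threshold_days for day, tok in f)


def any_of(*outcomes: Outcome) -> Outcome:
    """Composite outcome: true if any component outcome occurs."""
    return lambda f: any(o(f) for o in outcomes)


def outcome_probabilities(futures: List[Future], outcomes: Dict[str, Outcome]) -> Dict[str, float]:
    """Empirical probability of each outcome across the simulated futures."""
    n = len(futures)
    return {name: sum(o(f) for f in futures) / n for name, o in outcomes.items()}


mortality = has_token("DEATH")
icu = has_token("ICU_ADMISSION")
long_stay = prolonged_stay(10)

OUTCOMES: Dict[str, Outcome] = {
    "mortality": mortality,
    "icu": icu,
    "prolonged_stay": long_stay,
    "composite": any_of(mortality, icu, long_stay),
    "aki": has_token("AKI"),  # new outcome added without touching the generator
}

if __name__ == "__main__":
    futures = [
        [(2, "ICU_ADMISSION"), (6, "DEATH")],
        [(1, "AKI"), (12, "DISCHARGE")],
        [(3, "DISCHARGE")],
        [(5, "ICU_ADMISSION"), (20, "DISCHARGE")],
    ]
    print(outcome_probabilities(futures, OUTCOMES))
```

Adding another endpoint only means adding another predicate to `OUTCOMES`; the simulated futures, and the model that produced them, stay unchanged.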
At the same time, the authors are explicit about limitations. ETHOS and ARES are trained and evaluated exclusively on MIMIC-IV data from a single academic medical center. Differences in EHR systems, documentation practices, case mix and care pathways across institutions may limit out-of-the-box generalizability, and the authors have not yet performed extensive fairness audits across demographic groups beyond subgroup performance reporting. The current implementation deliberately excludes unstructured clinical text, which may carry critical information about social determinants, clinician impressions or nuanced symptoms. Integrating free text using clinical language models or multimodal fusion is identified as an important direction for future work. Moreover, although ETHOS is energy-efficient by LLM standards (training the ≈45M-parameter model on eight A100 GPUs consumed an estimated 220 kWh, far less than general-purpose language models), real-world deployment will still require attention to computational costs and infrastructure.
Another major limitation is that ARES has not yet been embedded in real clinical workflows. The authors have obtained informal feedback from emergency physicians on the clarity of risk trajectories and token-level explanations, and they outline plans for simulation studies in which clinicians “round” on de-identified cases using an ARES-powered mock chart. These studies will examine how clinicians interpret the risk levels, whether explanations are actually helpful or distracting, and how to display trajectories in a way that supports decisions without causing alarm fatigue. Ultimately, prospective evaluations that measure changes in outcomes, resource use and equity will be needed before health systems can justify integrating such a system into routine care.
The paper closes with a broader reflection on the role of EHR-based foundation models in health care. By treating longitudinal EHR data as a rich, heterogeneous sequence rather than a static feature table, ETHOS and ARES demonstrate a path toward general-purpose clinical prediction engines: models that can be re-queried for new endpoints, updated with new data streams, and coupled to dashboards for triage, resource planning, and quality improvement. The authors suggest concrete use cases such as dynamic ICU bed planning, early identification of patients with unexpectedly high-risk trajectories for targeted review, and support for research by automatically flagging patterns that merit deeper causal investigation. They also highlight the interoperability advantages of building ETHOS on the Medical Event Data Standard (MEDS), which should allow external groups to retrain or adapt the model on their own MEDS-formatted data while maintaining a common codebase.
In summary, this article is not just “yet another” machine-learning paper comparing AUCs; it is an attempt to define a foundation-model paradigm for EHRs analogous to what GPT-like models have become for text. ETHOS learns a generative model of patient trajectories, and ARES translates that model into dynamic, explainable risk scores that outperform traditional early warning systems on a range of outcomes, with promising calibration and subgroup performance (Renc et al., 2025). For researchers and health-system leaders concerned with patient safety, crowding and proactive resource management, the work provides both a technical blueprint and a conceptual frame: risk estimation as simulation over possible futures, rather than static classification of the present.
Reference: Renc, P., Grzeszczyk, M. K., Oufattole, N., Goode, D., Jia, Y., Bieganski, S., McDermott, M. B. A., Was, J., Samir, A. E., Cunningham, J. W., Bates, D. W., & Sitek, A. (2025). Foundation model of electronic medical records for adaptive risk estimation. GigaScience, 14, 1–12. https://doi.org/10.1093/gigascience/giaf107
