The article “Development of a Novel Artificial Intelligence Clinical Decision Support Tool for Hand Surgery: HandRAG” by Ozmen et al. (2025) introduces an approach to supporting clinical decision-making in hand surgery with artificial intelligence. The authors present HandRAG, which they describe as the first retrieval-augmented generation (RAG)-enhanced large language model (LLM) built and optimized specifically for hand surgery. Hand surgery demands the integration of advanced anatomical knowledge, personalized treatment considerations, and evolving operative techniques, so it calls for a tailored AI system that can contextualize a question and deliver accurate, evidence-based guidance. General-purpose LLMs often lack access to domain-specific literature and validation mechanisms, which makes them unreliable for specialty-specific applications. HandRAG was developed to bridge this gap by combining a large, curated dataset of hand surgery literature with modern language modeling tools.
To build HandRAG, the researchers collected 4,510 open-access, peer-reviewed publications on hand surgery published from 2000 to 2024. The texts were cleaned, segmented into smaller meaningful chunks, and converted into semantic embeddings with the OpenAI text-embedding-ada-002 model. The RAPTOR methodology was used to structure this content hierarchically, improving the system’s ability to retrieve relevant passages in response to clinical queries. As part of that structuring, UMAP reduced the dimensionality of the embeddings and Gaussian Mixture Models grouped semantically related chunks into clusters, which further improved the system’s precision. The knowledge base was stored in a vector database (Chroma), which allows rapid semantic retrieval. Text-based recommendations were generated by OpenAI’s o3-mini, a large language model selected for its reasoning capabilities.
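The chunk, embed, store, and retrieve steps described above can be illustrated with a minimal, self-contained Python sketch. It is not the authors’ code. A normalised term-frequency vector stands in for text-embedding-ada-002, a plain in-memory list stands in for Chroma, and the RAPTOR hierarchy, UMAP, and Gaussian Mixture clustering are left out entirely. Every function name and the two sample passages are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Invented sample passages; they are NOT taken from the HandRAG corpus.
SAMPLE_DOCS = {
    "paper_a": "Zone 2 flexor tendon repair is followed by early controlled mobilization.",
    "paper_b": "Dupuytren contracture may be treated with limited fasciectomy.",
}


def chunk_text(text, max_words=40):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: L2-normalised term frequencies."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two normalised sparse vectors (a dot product)."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def build_index(documents):
    """Return (source_id, chunk, vector) records: a stand-in for a vector store."""
    index = []
    for source_id, text in documents.items():
        for chunk in chunk_text(text):
            index.append((source_id, chunk, embed(chunk)))
    return index


def retrieve(index, query, k=2):
    """Return the k records whose vectors are most similar to the query."""
    q = embed(query)
    return sorted(index, key=lambda rec: cosine(q, rec[2]), reverse=True)[:k]
```

Calling `retrieve(build_index(SAMPLE_DOCS), "flexor tendon repair", k=1)` returns the `paper_a` record, because it is the only passage that shares terms with the query. A real dense embedding model would also match paraphrases that share no words, which this toy vector cannot do.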
When a clinician submits a query, HandRAG expands it into multiple sub-questions, retrieves the most relevant text segments from the database, and generates a comprehensive, evidence-grounded response with embedded citations. To assess performance, the authors used 15 clinically representative hand surgery questions covering scenarios such as tendon repair, fracture management, and soft tissue pathologies. The evaluation used G-Eval Correctness and Semantic Evaluation Metrics (SEM), which measure factual accuracy and semantic alignment between the generated response and the source documents. The model achieved a mean G-Eval score of 0.79 and a mean SEM score of 0.75, which the authors interpret as robust and reliable performance. Performance was highest in common clinical areas with well-established treatment protocols, such as zone 2 flexor tendon repairs and Dupuytren’s contracture management.
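The query flow described above (expand, retrieve, then generate with citations) can be sketched as follows. This is an illustrative outline under stated assumptions, not HandRAG’s implementation. The fixed templates in `expand_query` stand in for LLM-based expansion, `search` is any caller-supplied retrieval function, and the final call to a generation model is omitted. All names are hypothetical.

```python
def expand_query(question):
    """Stand-in for LLM query expansion: hypothetical fixed templates."""
    return [
        question,
        f"What are the indications relevant to: {question}",
        f"What outcomes or complications are reported for: {question}",
    ]


def gather_evidence(question, search, k_per_query=2):
    """Run each sub-question through `search` and merge hits without duplicates.

    `search(query, k)` is assumed to return (chunk_id, source, text) tuples.
    """
    seen, merged = set(), []
    for sub_question in expand_query(question):
        for chunk_id, source, text in search(sub_question, k_per_query):
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append((chunk_id, source, text))
    return merged


def build_prompt(question, evidence):
    """Number each passage so a generated answer can cite it as [n]."""
    lines = ["Answer using only the numbered passages and cite them as [n].", ""]
    for n, (_, source, text) in enumerate(evidence, start=1):
        lines.append(f"[{n}] ({source}) {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Numbering the passages in the prompt is one simple way to make each statement in a generated answer traceable to a source. How HandRAG actually formats its embedded citations is not specified here.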
The article emphasizes that HandRAG’s RAG architecture provides a significant advantage over conventional LLMs by reducing hallucinations and ensuring responses are traceable to specific literature. This is vital in medical contexts where incorrect or unverifiable outputs could lead to clinical errors. HandRAG’s ability to reference primary sources also supports its use as an educational tool, supplementing resident and fellow training in hand surgery. However, the authors acknowledge several limitations. These include the exclusion of proprietary and textbook content from the knowledge base, reliance on computational (rather than clinical) validation, the need for periodic updating of the literature base, and lack of direct comparison to commercial LLMs. Additionally, the model has not undergone regulatory approval processes such as those required by the FDA, and therefore should not be used in direct clinical practice without further validation.
Despite these limitations, the study concludes that HandRAG represents a meaningful step forward in the application of AI to surgical decision-making. By leveraging a well-structured, domain-specific literature corpus and integrating it with a sophisticated generation model, the system demonstrates strong potential to support evidence-based practice. HandRAG offers promising applications in both clinical and academic settings, particularly as AI continues to evolve within healthcare. The authors propose that future studies should focus on clinical integration, regulatory compliance, and real-world performance validation with surgical experts.
Reference (APA Style)
Ozmen, B. B., Singh, N., Shah, K., Berber, I., Singh, D., Pinsky, E., Rampazzo, A., & Schwarz, G. S. (2025). Development of a novel artificial intelligence clinical decision support tool for hand surgery: HandRAG. Journal of Hand and Microsurgery, 17, 100293. https://doi.org/10.1016/j.jham.2025.100293

Podcast Link: https://notebooklm.google.com/notebook/59bd753e-4a02-4872-a9e9-c9074576d879/audio
