Global Research Trends
Radiation Oncology
- December 2025 issue
[Radiother Oncol.] Improving mortality prediction after radiotherapy with large language model structuring of large-scale unstructured electronic health records
Yonsei University College of Medicine / Sangjoon Park, Hwa Kyung Byun*, Woong Sub Koom*
- Source
- Radiother Oncol.
- Publication date
- 2025 Oct:211:111052.
- Journal issue number
- Content
Abstract
Background and purpose: Avoiding unnecessary radiotherapy (RT) in patients with limited life expectancy requires accurate selection. Traditional survival models based on structured data often lack precision. Large language models (LLMs) offer a novel approach to structuring unstructured electronic health record (EHR) data, potentially improving survival predictions by integrating comprehensive clinical information.
Materials and methods: We analyzed structured and unstructured data from 34,276 RT-treated patients at Yonsei Cancer Center. An open-source LLM structured unstructured EHR data using single-shot learning. External validation included 852 patients from Yongin Severance Hospital. We compared the LLM's performance against a domain-specific medical LLM and a smaller variant. Survival prediction models using statistical, machine-learning, and deep-learning approaches incorporated both structured and LLM-structured data.
Results: The open-source LLM structured unstructured EHR data with 87.5 % accuracy, outperforming the domain-specific medical LLM (35.8 %). Larger LLMs were more effective in structuring clinically relevant features, such as general condition and disease extent, which correlated with survival. Incorporating LLM-structured features improved the deep learning model's C-index from 0.737 to 0.820 (internal validation) and from 0.779 to 0.842 (external validation). Risk stratification was also enhanced, with clearer differentiation among low-, intermediate-, and high-risk groups (p < 0.001). Additionally, models became more interpretable, as key LLM-structured features aligned with statistically significant predictors traditionally identified from structured data.
Conclusion: General-domain LLMs, despite not being fine-tuned for medical data, can effectively structure large-scale unstructured EHRs, significantly improving survival prediction accuracy and model interpretability. The RT-Surv framework highlights the potential of LLMs to enhance clinical decision-making and optimize RT treatment.
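
The C-index values reported in the Results measure how often a model ranks a pair of patients in the correct order of survival. A minimal sketch of Harrell's concordance index follows, assuming right-censored data and a risk score where a higher value means a worse prognosis. The function name and the simplified handling of tied times are illustrative choices, not details taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is comparable only if the shorter time is an observed death.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died first: correct ordering
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported improvement from 0.737 to 0.820 should be read.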

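[Figure 1] (A) Conventional approach: limited to structured data (vital signs, blood tests, etc.), or dependent on manual curation of unstructured data.
(B) RT-Surv framework: an open-source LLM automatically converts unstructured text into structured data, which is then integrated with the existing structured data to improve the predictive model's performance.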
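
The structuring step in panel (B) can be pictured as a single-example prompt followed by strict validation of the model's answer. The sketch below is an assumption-laden illustration: the field names, allowed values, example note, and prompt wording are all hypothetical (the source names only "general condition" and "disease extent" as examples of extracted features), and the call to the LLM itself is deliberately left out.

```python
import json

# Hypothetical schema: field names and allowed values are assumptions.
SCHEMA = {
    "general_condition": ["good", "fair", "poor", "unknown"],
    "disease_extent": ["localized", "regional", "metastatic", "unknown"],
}

# Hypothetical single worked example shown to the model (one-shot prompting).
ONE_SHOT_NOTE = "Pt ambulatory, eats well. CT: multiple liver and lung metastases."
ONE_SHOT_ANSWER = {"general_condition": "good", "disease_extent": "metastatic"}


def build_prompt(note):
    """Build a one-shot prompt asking for JSON restricted to the schema."""
    return (
        "Extract the fields below from the clinical note. Answer with JSON only.\n"
        f"Allowed values: {json.dumps(SCHEMA)}\n\n"
        f"Note: {ONE_SHOT_NOTE}\n"
        f"Answer: {json.dumps(ONE_SHOT_ANSWER)}\n\n"
        f"Note: {note}\n"
        "Answer:"
    )


def parse_response(text):
    """Turn raw model output into one structured row.

    Anything that is not valid JSON, or not an allowed value,
    falls back to "unknown" so downstream models see a fixed schema.
    """
    try:
        raw = json.loads(text)
    except json.JSONDecodeError:
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    return {
        field: raw.get(field) if raw.get(field) in allowed else "unknown"
        for field, allowed in SCHEMA.items()
    }
```

The validated dictionary is what would be joined with the existing structured variables, one row per patient, before the survival model is fitted.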

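[Figure 2] Comparison of risk-group stratification performance (structured data vs. LLM-augmented data)
Conventional approach (A, B): with structured data alone, the model fails to properly identify high-risk patients in external validation (B) (p > 0.05).
Proposed model (C, D): once the unstructured data structured by the LLM are added, the low-, intermediate-, and high-risk groups are all clearly separated, even in external validation (D) (p < 0.001).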
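
As a rough illustration of how three risk groups like those in Figure 2 can be formed from model output, the sketch below splits predicted risk scores at their tertiles. The tertile rule and the function name are assumptions; the source does not state how the cut-offs were chosen.

```python
from statistics import quantiles


def stratify_by_tertile(risk_scores):
    """Label each score as low, intermediate, or high risk.

    Assumption: cut-offs are the tertiles of the scores themselves.
    """
    low_cut, high_cut = quantiles(risk_scores, n=3)  # two cut points
    groups = []
    for score in risk_scores:
        if score <= low_cut:
            groups.append("low")
        elif score <= high_cut:
            groups.append("intermediate")
        else:
            groups.append("high")
    return groups
```

Whether the resulting groups truly differ in survival would then be checked with a test such as the log-rank test, which is not shown here.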
Affiliations
Sangjoon Park 1, Chan Woo Wee 2, Seo Hee Choi 2, Kyung Hwan Kim 2, Jee Suk Chang 2, Hong In Yoon 2, Ik Jae Lee 2, Yong Bae Kim 2, Jaeho Cho 2, Ki Chang Keum 2, Chang Geol Lee 2, Hwa Kyung Byun 3, Woong Sub Koom 4
1 Department of Radiation Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Republic of Korea; Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea.
2 Department of Radiation Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Republic of Korea.
3 Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea; Department of Radiation Oncology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea. Electronic address: HKBYUN05@yuhs.ac.
4 Department of Radiation Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, Republic of Korea. Electronic address: mdgold@yuhs.ac.
- Keywords
- Data structurization; Electronic health records; Large language models; Radiotherapy; Survival prediction.
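- Research overview
- This study uses large language models (LLMs), which have recently drawn considerable attention in medical AI, to convert large volumes of unstructured clinical text into structured data and thereby improve prognostic prediction. Conventional survival models have relied mainly on structured data such as height, weight, and blood test values, and so could not adequately capture a patient's specific clinical context (general condition, disease extent, and so on). The study used an open-source LLM (LLaMA-3), which can be deployed on-premises without sending data to external servers, to structure unstructured data such as medical records and imaging reports, and combined the result with existing structured data to improve the accuracy of 30-day mortality prediction after radiotherapy. Notably, it showed that a general-purpose open-source LLM can outperform a medical domain-specific model even without complex fine-tuning.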