Study finds demographic bias in AI models diagnosing skin diseases
An international research team led by Assistant Professor Zhiyu Wan discovered biases in large language models like ChatGPT-4 and LLaVA when diagnosing skin diseases from medical images. The study focused on melanoma, melanocytic nevi, and benign keratosis-like lesions, revealing differences in performance and fairness across sex and age groups.

An international research team led by Assistant Professor Zhiyu Wan from ShanghaiTech University has published findings in the journal Health Data Science that highlight biases in multimodal large language models (LLMs) such as ChatGPT-4 and LLaVA when diagnosing skin diseases from medical images. The study systematically evaluated these AI models across different sex and age groups.
Using approximately 10,000 dermatoscopic images, the study focused on three common skin diseases: melanoma, melanocytic nevi, and benign keratosis-like lesions. The results showed that ChatGPT-4 and LLaVA outperformed most traditional deep learning models overall. ChatGPT-4 was the fairer of the two across demographic groups, whereas LLaVA exhibited significant sex-related biases.
Dr. Wan emphasized, "While large language models like ChatGPT-4 and LLaVA demonstrate clear potential in dermatology, we must address the observed biases, particularly across sex and age groups, to ensure these technologies are safe and effective for all patients."
The team plans further research that incorporates additional demographic variables, such as skin tone, to evaluate the fairness and reliability of AI models in clinical scenarios more comprehensively. The findings offer guidance for developing more equitable and trustworthy medical AI systems.
Source: News-Medical.