BY NIKITA SIROTIN
Abstract:
Speech-based biomarkers have become a popular diagnostic tool for Alzheimer's disease due to their non-invasive nature and their sensitivity to early cognitive and motor changes. However, the clinical value of these biomarkers is reliant on the ability to communicate how and why speech patterns are associated with disease processes. A recent systematic review found that only 0.6% of related studies met the criteria to be classified as "interpretable models", making interpretability a critical barrier to clinical translation and a limitation of current work [1]. Our work therefore focuses on creating a model that acts as a translation bridge between computational speech analysis and clinical reasoning. Performance is not used as proof of clinical validity; instead, we examine how speech-derived explanations can support clinical understanding and trust. We outline conceptual frameworks for interpreting speech features that reflect cognitive and motor function, and we highlight challenges faced in real clinical cases. This enables clinicians to identify and discount confounding influences on model predictions, such as anxiety or underlying comorbid conditions.
Introduction:
Speech was selected because of its distinguishing position among biomedical signals: it is simultaneously a behavioral output, a cognitive process, and a motor act. Early changes in neural function often manifest as changes in speech, seen most clearly in disorders affecting language, memory, and executive control. These properties make speech an attractive diagnostic signal, especially in settings where traditional biomarkers are invasive, costly, or inaccessible.
Biochemistry conceptualizes disease as a disruption of interconnected biological systems, and interest has recently grown in diagnostic signals that capture system-level effects rather than isolated markers. Because speech naturally reflects neural processes, it can be treated as a downstream behavioral phenotype of neurological dysfunction. Interpretability is therefore essential for ensuring that computational predictions are tied to these mechanisms rather than remaining purely statistical outputs.
Interpretability is also central to clinical trust. Medical diagnosis requires physician accountability, and that accountability is constrained by the use of an opaque model. Models therefore need to allow clinicians to assess and contextualize algorithmic conclusions, acting as a supporting tool rather than replacing clinical judgement. This is particularly important in low-resource settings, where speech-based tools may serve as an alternative to imaging-based AI systems (such as those built on MRI); without visual or anatomical confirmation, the need for explainable reasoning is even greater. Speech can also be influenced by emotional state, cultural background, and other environmental factors, which can shift results, so clinicians need to be able to distinguish disease-related changes from confounding variability. Together, these factors place interpretability at the forefront of this study, serving as a foundational requirement for translating speech-based analysis into clinically meaningful and globally accessible diagnostics.
Key findings:
This model improves on previously reported AUC scores (area under the ROC curve, a measure of discriminative accuracy), reaching ≈0.95 compared with prior results of ≈0.76-0.94 [1]. Alzheimer's patients commonly showed overuse of vague third-person pronouns (the repeated use of "she" without stable referents), reduced noun specificity, and weakened semantic anchoring (consistent, referentially grounded word use) across sentences; a sketch of how such lexical features can be quantified follows below. These failures reflect a breakdown in lexical retrieval and working memory, consistent with the degradation of temporal and prefrontal networks in the brain. Moreover, token-level saliency maps (which show which words most strongly drive the model's reasoning; see the second sketch below) confirmed that the Transformer model prioritized these coherence disruptions rather than superficial stylistic cues. In contrast, convolutional models trained on acoustic data failed, indicating that early Alzheimer's speech impairment is primarily cognitive rather than motoric. By mapping computational features onto identifiable language failures, we establish speech as a downstream behavioral phenotype of neural system dysfunction. This gives clinicians the potential to discern disease-related patterns from irrelevant confounders, such as cultural variation, setting the model apart from others: a tool in diagnosis rather than a "black box" decision maker.
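To make the lexical markers above concrete, the following is a minimal sketch of how vague-pronoun overuse and noun specificity might be quantified from a transcript. It assumes spaCy's English pipeline; the pronoun set and the type-token-ratio proxy for noun specificity are illustrative assumptions, not the exact feature definitions used in this study.

```python
# Minimal sketch: lexical features of the kind described above.
# Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
# Feature definitions here are illustrative proxies, not the study's exact ones.
import spacy

nlp = spacy.load("en_core_web_sm")

VAGUE_PRONOUNS = {"she", "he", "it", "they", "this", "that"}  # assumed set

def lexical_profile(transcript: str) -> dict:
    doc = nlp(transcript)
    words = [t for t in doc if t.is_alpha]
    pronouns = [t for t in words if t.pos_ == "PRON"]
    vague = [t for t in pronouns if t.text.lower() in VAGUE_PRONOUNS]
    nouns = [t for t in words if t.pos_ in ("NOUN", "PROPN")]
    # Proxy for noun specificity: type-token ratio of noun lemmas.
    # Repetitive, generic noun use drives this score down.
    noun_ttr = len({t.lemma_.lower() for t in nouns}) / max(len(nouns), 1)
    return {
        "vague_pronoun_rate": len(vague) / max(len(words), 1),
        "pronoun_to_noun_ratio": len(pronouns) / max(len(nouns), 1),
        "noun_specificity": noun_ttr,
    }

print(lexical_profile("She went there, and she said she liked it."))
```

A transcript dominated by unanchored pronouns and few distinct nouns yields a high vague_pronoun_rate and a low noun_specificity, mirroring the pattern reported for Alzheimer's patients.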
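The saliency analysis can be sketched in a similar spirit. The example below computes gradient-times-input token saliency for a generic Hugging Face sequence-classification Transformer; the checkpoint name and the saliency variant are assumptions for illustration, not the study's exact model or attribution method.

```python
# Minimal sketch: gradient-x-input token saliency for a Transformer classifier.
# Assumes `pip install torch transformers`; the checkpoint is a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NAME = "distilbert-base-uncased"  # placeholder, not the study's model
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForSequenceClassification.from_pretrained(NAME)
model.eval()

def token_saliency(text: str):
    enc = tokenizer(text, return_tensors="pt")
    # Feed embeddings directly so gradients can be taken w.r.t. them.
    embeds = model.get_input_embeddings()(enc["input_ids"])
    embeds.retain_grad()
    logits = model(inputs_embeds=embeds,
                   attention_mask=enc["attention_mask"]).logits
    # Back-propagate from the predicted class's logit.
    logits[0, logits.argmax()].backward()
    # Gradient x input, summed over the embedding dimension.
    scores = (embeds.grad * embeds.detach()).sum(-1).squeeze(0).abs()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

for token, score in token_saliency("She said she saw her, and then she left."):
    print(f"{token:>10s}  {score:.4f}")
```

High-saliency tokens can then be surfaced to the clinician alongside the prediction, which is what allows coherence disruptions, rather than stylistic cues, to be verified as the basis of the model's decision.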
[1] Shankar, Ravi, Ziyu Goh, Fiona Devi, and Qian Xu. 2025. “A Systematic Review of Explainable Artificial Intelligence Methods for Speech-Based Cognitive Decline Detection.” npj Digital Medicine 8, no. 1 (Nov 26). https://doi.org/10.1038/s41746-025-02105-z

