Founded on research from Lund University

A decade of research with over 10 000 participants forms the foundation for the computational models we use to clinically assess mental health.

Selected Research

Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment

Narrative review published in Psychiatry Research 2024

Abstract
In this narrative review, we survey recent empirical evaluations of AI-based language assessments and present a case that the technology of large language models is poised to change standardized psychological assessment. Artificial intelligence has been undergoing a purported “paradigm shift” initiated by new machine learning models, large language models (e.g., BERT, LLaMA, and that behind ChatGPT). These models have led to unprecedented accuracy on most computerized language processing tasks, from web search to automatic machine translation and question answering, while their dialogue-based forms, like ChatGPT, have captured the interest of over a million users. The success of large language models is mostly attributed to their capability to numerically represent words in their context, long a weakness of previous attempts to automate psychological assessment from language. While potential applications for automated therapy are beginning to be studied on the heels of ChatGPT's success, here we present evidence suggesting that, with thorough validation of targeted deployment scenarios, AI's newest technology can move assessment away from rating scales and toward how people naturally communicate: in language.

An illustration of assessment characteristics, including information content, resolution, range, and multi-dimensionality

Semantic measures: Using natural language processing to measure, differentiate, and describe psychological constructs

Research article published in Psychological Methods 2019

Abstract
Psychological constructs, such as emotions, thoughts, and attitudes, are often measured by asking individuals to reply to questions using closed-ended numerical rating scales. However, when asking people about their state of mind in a natural context (“How are you?”), we receive open-ended answers using words (“Fine and happy!”) and not closed-ended answers using numbers (“7”) or categories (“A lot”). Nevertheless, to date it has been difficult to objectively quantify responses to open-ended questions. We develop an approach using open-ended questions in which the responses are analyzed using natural language processing (Latent Semantic Analysis). This approach of using open-ended, semantic questions is compared with traditional rating scales in nine studies (N = 92–854), including two different study paradigms. The first paradigm requires participants to describe psychological aspects of external stimuli (facial expressions), and the second paradigm involves asking participants to report their subjective well-being and mental health problems. The results demonstrate that the approach using semantic questions yields good statistical properties, with competitive or higher validity and reliability compared with corresponding numerical rating scales. As these semantic measures are based on natural language and measure, differentiate, and describe psychological constructs, they have the potential to complement and extend traditional rating scales.
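To make the mechanics concrete, here is a toy sketch of the core idea behind Latent Semantic Analysis: represent each open-ended answer in a document-term matrix, then use a truncated SVD to place answers in a low-dimensional semantic space where they can be compared. This is our illustration with made-up answers, not the studies' actual pipeline, which relies on semantic spaces trained on large corpora.

```r
# Toy illustration of the idea behind Latent Semantic Analysis (LSA);
# a real analysis uses a large corpus, weighting, and many dimensions.
answers <- c("fine and happy", "sad and tired", "happy and calm")
words <- unique(unlist(strsplit(answers, " ")))

# Document-term matrix: one row per answer, one column per word
dtm <- t(sapply(answers, function(a) as.integer(words %in% strsplit(a, " ")[[1]])))
colnames(dtm) <- words

# Truncated SVD places each answer in a low-dimensional semantic space
s <- svd(dtm)
k <- 2
semantic <- s$u[, 1:k] %*% diag(s$d[1:k])

# Cosine similarity between answers in that space
norms <- sqrt(rowSums(semantic^2))
round(semantic %*% t(semantic) / (norms %o% norms), 2)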


Natural language analyzed with AI-based transformers predict traditional subjective well-being measures approaching the theoretical upper limits in accuracy

Research article published in Scientific Reports 2022

Abstract
We show that using a recent breakthrough in artificial intelligence, transformers, psychological assessments from text responses can approach theoretical upper limits in accuracy, converging with standard psychological rating scales. Text responses use people's primary form of communication, natural language, and have been suggested as a more ecologically valid response format than the closed-ended rating scales that dominate social science. However, previous language analysis techniques left a gap between how accurately they converged with standard rating scales and how well rating scales converge with themselves, a theoretical upper limit in accuracy. Most recently, AI-based language analysis has gone through a transformation as nearly all of its applications, from web search to personalized assistants (e.g., Alexa and Siri), have shown unprecedented improvement by using transformers. We evaluate transformers for estimating psychological well-being from questionnaire text responses and descriptive word responses, and find accuracies converging with rating scales that approach the theoretical upper limits (Pearson r = 0.85, p < 0.001, N = 608; in line with most metrics of rating scale reliability). These findings suggest an avenue for modernizing the ubiquitous questionnaire and ultimately opening doors to a greater understanding of the human condition.
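One standard way to formalize the "theoretical upper limit" mentioned above is the classical correction for attenuation: two imperfect measures cannot correlate beyond the geometric mean of their reliabilities. The sketch below uses illustrative reliability values of our own choosing, not figures from the paper.

```r
# Illustration (our sketch, not code from the paper): classical test theory
# bounds the correlation between two imperfect measures by the geometric
# mean of their reliabilities: r_max = sqrt(r_xx * r_yy).
scale_reliability    <- 0.90  # assumed reliability of the rating scale
language_reliability <- 0.80  # assumed reliability of the text-based estimate

upper_limit <- sqrt(scale_reliability * language_reliability)
upper_limit  # ~0.85 with these illustrative values
```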

table presenting results from the article

The LEADING Guideline. Reporting Standards for Expert Panel, Best-Estimate Diagnosis, and Longitudinal Expert All Data (LEAD) Studies

Under review

Abstract
Accurate assessments of symptoms and diagnoses are essential for health research and clinical practice but face many challenges. The absence of a single error-free measure is currently addressed by assessment methods involving experts reviewing several sources of information to achieve a more accurate or best-estimate assessment. Three bodies of work spanning medicine, psychiatry, and psychology propose similar assessment methods: the Expert Panel, the Best-Estimate Diagnosis, and the Longitudinal Expert All Data (LEAD). However, the quality of such best-estimate assessments is typically very difficult to evaluate due to poor reporting of the assessment methods, and when the methods are reported, the reporting quality varies substantially. Here we tackle this gap by developing reporting guidelines for such studies, using a four-stage approach: 1) drafting reporting standards accompanied by rationales and empirical evidence, which were further developed with a patient organization for depression; 2) incorporating expert feedback through a two-round Delphi procedure; 3) refining the guideline based on an expert consensus meeting; and 4) testing the guideline by i) having two researchers test it and ii) using it to examine the extent to which previously published articles report the standards. The last step also demonstrates the need for the guideline: 18 to 58% (mean = 33%) of the standards were not reported across fifteen randomly selected studies. The LEADING guideline comprises 20 reporting standards related to four groups: the Longitudinal design; the Appropriate data; the Evaluation – experts, materials, and procedures; and the Validity group. We hope that the LEADING guideline will be useful in assisting researchers in planning, reporting, and evaluating research aiming to achieve best-estimate assessments.

overview of the criteria in the LEADING Statement

The Text-Package: An R-Package for Analyzing and Visualizing Human Language Using Natural Language Processing and Transformers

Tutorial paper published in Psychological Methods

Abstract
Natural language is the fundamental way individuals communicate their thoughts and emotions to others. Recent advances in Artificial Intelligence (AI), referred to as transformers, have resulted in large increases in performance on most tasks related to understanding natural language. This tutorial shows how to use these state-of-the-art AI techniques both in custom research analyses and in complete end-to-end analytic processes. We describe text, a software package that provides transformer-based techniques intended to be easily accessible to social scientists. The text-package is open source, written for the statistical programming language R, and free to use or alter. It comprises user-friendly functions for transforming text into numeric representations, which can then be used to examine relationships with other variables or to visualize statistically significant features of texts. Transformers can facilitate analyses of natural language for gaining psychological insights with unprecedented accuracy and provide a more detailed understanding of the human condition.
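The abstract above describes the package's workflow of turning text into numeric representations and relating them to other variables. Below is a minimal sketch of that workflow using the package's bundled example data; function names and return structures follow one version of the package and may differ in current releases.

```r
# A minimal sketch of the text-package workflow (one version of the API;
# installation also requires a Python backend for the transformer models).
library(text)

# Example data bundled with the package: word responses and rating scales
d <- Language_based_assessment_data_8

# 1. Transform text responses into transformer-based numeric representations
embeddings <- textEmbed(d["harmonywords"])

# 2. Cross-validated model relating the embeddings to a rating-scale score
model <- textTrain(embeddings$texts$harmonywords, d["hilstotal"])
model$results  # correlation between predicted and observed scores
```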

An Illustration of How Information is Connected Within the Output of Different Language Model Architectures

Freely Generated Word Responses Analyzed With Artificial Intelligence Predict Self-Reported Symptoms of Depression, Anxiety, and Worry

Research paper published in Frontiers in Psychology

Abstract
Background: Question-based computational language assessments (QCLA) of mental health, based on self-reported and freely generated word responses analyzed with artificial intelligence, are a potential complement to rating scales for identifying mental health issues. This study aimed to examine to what extent this method captures items related to the primary and secondary symptoms associated with Major Depressive Disorder (MDD) and Generalized Anxiety Disorder (GAD) described in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). We investigated whether the word responses that participants generated contained information about all, or some, of the criteria that define MDD and GAD, using symptom-based rating scales that are commonly used in clinical research and practice.
Method: Participants (N = 411) described their mental health with freely generated words and rating scales relating to depression and worry/anxiety. Word responses were quantified and analyzed using natural language processing and machine learning.
Results: The QCLA correlated significantly with the individual items connected to the DSM-5 diagnostic criteria of MDD (PHQ-9; Pearson’s r = 0.30–0.60, p < 0.001) and GAD (GAD-7; Pearson’s r = 0.41–0.52, p < 0.001; PSWQ-8; Spearman’s r = 0.52–0.63, p < 0.001) for respective rating scales. Items measuring primary criteria (cognitive and emotional aspects) yielded higher predictability than secondary criteria (behavioral aspects).
Conclusion: Together, these results suggest that QCLA may be able to complement rating scales in measuring mental health in clinical settings. The approach carries the potential to personalize assessments and contributes to the ongoing discussion regarding the diagnostic heterogeneity of depression.
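As a rough sketch of the item-level analysis reported above, language-based predictions can be correlated with each rating-scale item separately. The object names and simulated data below are hypothetical stand-ins, not the study's code or data.

```r
# Illustrative sketch only: 'predicted' and 'phq_items' are hypothetical
# stand-ins for the study's data (language-based scores and PHQ-9 items).
set.seed(1)
predicted <- rnorm(411)                            # model score per person
phq_items <- replicate(9, sample(0:3, 411, TRUE))  # 9 items scored 0-3

# One coefficient per symptom item (the study reports Pearson for PHQ-9
# and GAD-7, and Spearman for PSWQ-8)
round(apply(phq_items, 2, function(item) cor(predicted, item)), 2)
```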


Other relevant research from our team

Varadarajan, V., Sikström, S., Kjell, O. N. E., & Schwartz, H. A. (2024). ALBA: Adaptive Language-Based Assessments for Mental Health. Paper accepted at the computer science conference NAACL 2024 (Annual Conference of the North American Chapter of the Association for Computational Linguistics). https://aclanthology.org/2024.naacl-long.136/

Nilsson, A. H., Eichstaedt, J. C., Lomas, T., Schwartz, A., & Kjell, O. N. E. (2024). The Cantril Ladder elicits thoughts about power and wealth. Scientific Reports, 14(1), 2642. https://www.nature.com/articles/s41598-024-52939-y

Kjell, O. N. E., Kjell, K., & Schwartz, H. A. (2024). Beyond Rating Scales: Large Language Models are Poised to Transform Psychological Health Assessment. Psychiatry Research. https://doi.org/10.31234/osf.io/yfd8g

Sikström, S., Pålsson Höök, A., & Kjell, O. N. E. (2023). Precise language responses versus easy rating scales—Comparing respondents’ views with clinicians’ belief of the respondent’s views. PLOS ONE, 18(2), e0267995. https://doi.org/10.1371/journal.pone.0267995

Nilsson, A., Hellryd, E., & Kjell, O. N. E. (2022, June 24). Doing Well-Being: Self-Reported Activities Are Related to Subjective Well-Being. PLOS ONE. https://doi.org/10.1371/journal.pone.0270503

Kjell, O. N. E., Daukantaitė, D., & Sikström, S. (2021, May 11). Computational Language Assessments of Harmony in Life – not Satisfaction with Life or Rating Scales – Correlate with Cooperative Behaviours. Frontiers in Psychology, Topic: Semantic Algorithms in the Assessment of Attitudes and Personality. https://doi.org/10.3389/fpsyg.2021.601679

Kjell, O. N. E., & Diener, E. (2020, March 13). Abbreviated Three-Item Versions of the Satisfaction with Life Scale and the Harmony in Life Scale Yield Improved Psychometric Properties. Journal of Personality Assessment. https://doi.org/10.1080/00223891.2020.1737093

Ivtzan, I., Young, T., Lee, H. C., Lomas, T., Daukantaitė, D., & Kjell, O. N. E. (2017, September 17). Mindfulness based flourishing program: A cross-cultural study of Hong Kong Chinese and British participants. Journal of Happiness Studies. https://doi.org/10.1007/s10902-017-9919-1

Kjell, O. N. E., Daukantaitė, D., Hefferon, K., & Sikström, S. (2016). The Harmony in Life Scale complements the Satisfaction with Life Scale: Expanding the conceptualization of the cognitive component of subjective well-being. Social Indicators Research. https://link.springer.com/article/10.1007/s11205-015-0903-z

Garcia, D., Kjell, O. N. E., Sikström, S., & Archer, T. (2016, May 4). Using Language and Affective Profiles to Investigate Differences between Individuals. Clinical and Experimental Psychology. https://doi.org/10.4172/2471-2701.1000123

Garcia, D., Anckarsäter, H., Kjell, O. N. E., Archer, T., Rosenberg, P., Cloninger, C. R., & Sikström, S. (2015, September 8). Agentic, communal, and spiritual traits are related to the semantic representation of written narratives of positive and negative life events. Psychology of Well-Being. https://psywb.springeropen.com/articles/10.1186/s13612-015-0035-x

Kjell, O. N. E. (2011, September 1). Sustainable Well-Being: A Potential Synergy between Sustainability and Well-Being Research. Review of General Psychology, 15(3), 255–266. https://doi.org/10.1037/a0024603