TY - GEN
T1 - From Post to Personality
T2 - 34th ACM International Conference on Information and Knowledge Management, CIKM 2025
AU - Ma, Tian
AU - Feng, Kaiyu
AU - Rong, Yu
AU - Zhao, Kangfei
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/11/10
Y1 - 2025/11/10
N2 - Personality prediction from social media posts is a critical task with diverse applications in psychology and sociology. The Myers-Briggs Type Indicator (MBTI), a popular personality inventory, has traditionally been predicted using machine learning (ML) and deep learning (DL) techniques. Recently, the success of Large Language Models (LLMs) has revealed their strong potential for understanding and inferring personality traits from social media content. However, directly exploiting LLMs for MBTI prediction faces two key challenges: the hallucination problem inherent in LLMs and the naturally imbalanced distribution of MBTI types in the population. In this paper, we propose PostToPersonality (P2P), a novel LLM-based framework for predicting MBTI types from individuals' social media posts. Specifically, P2P leverages Retrieval-Augmented Generation with in-context learning to mitigate hallucination in LLMs. Furthermore, we fine-tune a pre-trained LLM to improve its specialization in MBTI understanding, using synthetic minority oversampling, which mitigates class imbalance by generating synthetic samples. Experiments conducted on a real-world social media dataset demonstrate that P2P achieves state-of-the-art performance compared with 10 ML/DL baselines.
KW - personality prediction
KW - social media analysis
UR - https://www.scopus.com/pages/publications/105023135756
U2 - 10.1145/3746252.3760813
DO - 10.1145/3746252.3760813
M3 - Conference contribution
AN - SCOPUS:105023135756
T3 - CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
SP - 5011
EP - 5015
BT - CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery, Inc
Y2 - 10 November 2025 through 14 November 2025
ER -