1 WeChat AI, Tencent 2 Xi'an Jiaotong University 3 Syracuse University 4 New Jersey Institute of Technology 5 Lehigh University * Indicates Equal Contribution ICML 2024
When reading a story, humans can quickly understand new fictional characters from only a few observations, mainly by drawing analogies to fictional and real people they already know. This reflects the few-shot and meta-learning essence of humans’ inference of characters’ mental states, i.e., theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP dataset in a realistic narrative understanding scenario, ToM-in-AMC. Our dataset consists of ~1,000 parsed movie scripts, each corresponding to a few-shot character understanding task that requires models to mimic humans’ ability to quickly digest characters from a few starting scenes of a new movie. We further propose a novel ToM prompting approach designed to explicitly assess the influence of multiple ToM dimensions. It surpasses existing baseline models, underscoring the significance of modeling multiple ToM dimensions for our task. Our extensive human study verifies that humans can solve our problem by inferring characters’ mental states based on movies they have previously seen. In comparison, all the AI systems lag >20% behind humans, highlighting a notable limitation in existing approaches’ ToM capabilities.
# Dataset Overview
# ToM Prompting Method (ToMPro)
# Experiments
Our dataset evaluates machines’ ToM in two settings:
- Transductive setting: the meta-model predicts with all characters’ previous scenes given directly as in-context examples;
- Inductive setting: the meta-model predicts with only a mental model of each character, generated from that character’s previous scenes.
The inductive setting is more stringent: it better isolates the effects of individual ToM dimensions, improves explainability, and mitigates shortcuts.
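The contrast between the two settings can be sketched as prompt construction. The templates and function names below are illustrative assumptions, not the paper's exact prompts: the transductive prompt conditions on raw previous scenes, while the inductive prompt conditions only on distilled mental models.

```python
# Minimal sketch (not the paper's exact prompts) contrasting the two settings.
# All templates and names here are illustrative assumptions.

def transductive_prompt(history_scenes, query_scene, candidates):
    """Condition directly on raw previous scenes as in-context examples."""
    examples = "\n".join(
        f"Scene: {s['text']}\nSpeaker: {s['character']}" for s in history_scenes
    )
    return (
        f"{examples}\n"
        f"Scene: {query_scene}\n"
        f"Which character ({', '.join(candidates)}) is the speaker?"
    )

def inductive_prompt(mental_models, query_scene, candidates):
    """Condition only on summarized mental models (e.g., beliefs, desires,
    intentions, emotions) distilled from the previous scenes; the raw
    scenes themselves are withheld."""
    profiles = "\n".join(f"{c}: {m}" for c, m in mental_models.items())
    return (
        f"Character mental models:\n{profiles}\n"
        f"Scene: {query_scene}\n"
        f"Which character ({', '.join(candidates)}) is the speaker?"
    )
```

Because the inductive prompt never sees the raw scenes, the model must rely on the ToM summaries alone, which is what makes this setting more stringent.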
## Main Results
## Analysis
The necessity of comprehensive ToM dimensions:
All ToM dimensions contribute to the improvement, affirming that our task necessitates a comprehensive understanding of ToM.
Desire and intention are the most crucial dimensions for our task, while emotion is the least crucial.
GPT-4’s memorization issue and the necessity of our perturbation setting:
We asked GPT-4 to identify characters solely from their names as options, without any historical context or character descriptions, which yielded an accuracy of 69.2%. This indicates that GPT-4 has been extensively exposed to the content of our movies during its training.
Figure 5 shows a significant gap between the perturbed and non-perturbed settings of GPT-4 ICL results. These results suggest that our perturbation setting effectively enhances the evaluation of ToM abilities by mitigating the impact of memorization.
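The perturbation idea can be illustrated with a small sketch. The placeholder scheme (`P1`, `P2`, …) and function below are assumptions for illustration, not the paper's exact procedure: each character name is mapped to a generic alias so a model cannot rely on memorized name-to-movie associations.

```python
import random

def perturb_names(script_text, characters, seed=0):
    """Replace each character name with a generic placeholder so that a
    model cannot exploit memorized associations between real names and
    movie content. The placeholder scheme here is an illustrative
    assumption, not the paper's exact one."""
    rng = random.Random(seed)
    shuffled = list(characters)
    rng.shuffle(shuffled)  # randomize which alias each character receives
    mapping = {name: f"P{i + 1}" for i, name in enumerate(shuffled)}
    for name, alias in mapping.items():
        script_text = script_text.replace(name, alias)
    return script_text, mapping
```

Evaluating on the perturbed script then measures whether the model actually infers mental states from the scenes, rather than recalling the original movie.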
# Citation
@inproceedings{yu2024few,
title = {Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind},
author = {Yu, Mo and Wang, Qiujing and Zhang, Shunchi and Sang, Yisi and Pu, Kangsheng and Wei, Zekai and Wang, Han and Xu, Liyan and Li, Jing and Yu, Yue and Zhou, Jie},
booktitle = {International Conference on Machine Learning},
pages = {57703--57729},
year = {2024},
organization = {PMLR}
}