Few-Shot Character Understanding in Movies
as an Assessment to Meta-Learning of Theory-of-Mind

1 WeChat AI, Tencent  2 Xi'an Jiaotong University
3 Syracuse University  4 New Jersey Institute of Technology  5 Lehigh University
* Indicates Equal Contribution
ICML 2024

# Abstract

When reading a story, humans can quickly understand new fictional characters from a few observations, mainly by drawing analogies to fictional and real people they already know. This reflects the few-shot and meta-learning essence of humans’ inference of characters’ mental states, i.e., theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP dataset in a realistic narrative understanding scenario, ToM-in-AMC. Our dataset consists of ~1,000 parsed movie scripts, each corresponding to a few-shot character understanding task that requires models to mimic humans’ ability to quickly digest characters from a few opening scenes of a new movie. We further propose a novel ToM prompting approach designed to explicitly assess the influence of multiple ToM dimensions. It surpasses existing baseline models, underscoring the significance of modeling multiple ToM dimensions for our task. Our extensive human study verifies that humans can solve our problem by inferring characters’ mental states based on movies they have previously seen. In comparison, all the AI systems lag >20% behind humans, highlighting a notable limitation in existing approaches’ ToM capabilities.

# Dataset Overview

Table 1: Movie genres in our ToM-in-AMC.
Table 2: Statistics of our ToM-in-AMC.

# ToM Prompting Method (ToMPro)

Figure 3: Our proposed ToMPro approach. The method first (a) generates character mental descriptions along multiple ToM dimensions based on input scenes; then (b) predicts the identities of a new testing scene with the generated descriptions.
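The two stages above can be sketched as a minimal prompting pipeline. This is an illustrative reconstruction, not the authors' exact implementation: the function names, prompt wording, and the `llm` callable (any text-in/text-out model interface) are all assumptions.

```python
# Hypothetical sketch of the two-stage ToMPro pipeline.
# `llm` is any callable mapping a prompt string to a completion string.

TOM_DIMENSIONS = ["belief", "intention", "desire", "emotion", "personality"]

def describe_character(llm, character, scenes):
    """Stage (a): summarize a character's mental state along each ToM
    dimension, based on the input scenes from the movie's opening."""
    prompt = (
        f"Scenes:\n{scenes}\n\n"
        f"Describe {character} along these theory-of-mind dimensions: "
        + ", ".join(TOM_DIMENSIONS) + "."
    )
    return llm(prompt)

def predict_identity(llm, test_scene, descriptions):
    """Stage (b): match the anonymized speaker in a new testing scene
    to one of the previously described characters."""
    profiles = "\n".join(f"- {name}: {desc}" for name, desc in descriptions.items())
    prompt = (
        f"Character profiles:\n{profiles}\n\n"
        f"New scene with anonymized speaker:\n{test_scene}\n\n"
        "Which character is the anonymized speaker? Answer with the name only."
    )
    return llm(prompt).strip()
```

In a real run, `llm` would wrap an API call to a model such as GPT-4; the stage-(a) descriptions for all candidate characters are cached and reused across the movie's test scenes.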

# Experiments

Our dataset evaluates machines’ ToM in two settings. The inductive setting is more stringent and has advantages in emphasizing the effects of various ToM dimensions, improving explainability, and mitigating shortcuts.

## Main Results

Table 3: Overall performance (%) on our ToM-in-AMC task.
(*) Evaluation was conducted on a subset of the dataset.
(†) Dataset released by Sang et al., 2022.
Table 4: Performance by difficulty level, measured by the number of speakers in a scene.

## Analysis

The necessity of comprehensive ToM dimensions:

Figure 4: Ablation of ToMPro on the 5 ToM dimensions.
  • All the ToM dimensions contribute to the improvement, affirming that our task necessitates a comprehensive understanding of ToM
  • Desire and intention are most crucial for our task, while emotion is the least crucial

GPT-4’s memorization issue and the necessity of our perturbation setting:

Figure 5: Effects of perturbation on GPT-4 ICL.
  • We asked GPT-4 to identify characters solely from their names given as options, without any historical context or character descriptions. It still achieved 69.2% accuracy, which indicates that GPT-4 has indeed been extensively exposed to the content of our movies during its training.
  • Figure 5 shows a significant gap between the perturbed and non-perturbed settings of GPT-4 ICL results. These results suggest that our perturbation setting effectively enhances the evaluation of ToM abilities by mitigating the impact of memorization.
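A minimal sketch of the kind of name perturbation described above: replacing character names with neutral placeholders so a model cannot rely on memorized name-identity associations. The function name and placeholder scheme (`P0`, `P1`, ...) are illustrative assumptions, not the dataset's exact procedure.

```python
import re

def perturb_names(scene, names):
    """Replace each listed character name in a scene with a neutral
    placeholder (P0, P1, ...), returning the perturbed scene and the
    name-to-placeholder mapping. Matching is case-sensitive here;
    a fuller implementation would also cover aliases and casing."""
    mapping = {name: f"P{i}" for i, name in enumerate(names)}
    for name, alias in mapping.items():
        scene = re.sub(rf"\b{re.escape(name)}\b", alias, scene)
    return scene, mapping
```

For example, `perturb_names("Rick talks to Louis.", ["Rick", "Louis"])` yields `"P0 talks to P1."`, removing the surface-name cue while preserving the scene's behavioral evidence.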

# Citation

@inproceedings{yu2024few,
  title        = {Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind},
  author       = {Yu, Mo and Wang, Qiujing and Zhang, Shunchi and Sang, Yisi and Pu, Kangsheng and Wei, Zekai and Wang, Han and Xu, Liyan and Li, Jing and Yu, Yue and Zhou, Jie},
  booktitle    = {International Conference on Machine Learning},
  pages        = {57703--57729},
  year         = {2024},
  organization = {PMLR}
}