Virtual Humans

Our work on virtual humans incorporates our efforts in modeling emotion, Theory of Mind, and dialog, both to better understand a human participant and to generate appropriate virtual human behavior.

Additionally, a key element of the work is a multi-year effort to study the impact of nonverbal behaviors in face-to-face interaction and to develop computational models that generate those behaviors.

Co-speech nonverbal behaviors convey a wide variety of meanings and play an important role in face-to-face human interactions, significantly impacting the addressee's engagement, recall, comprehension, and attitudes toward the speaker. Nonverbal behaviors also influence interactions between humans and embodied virtual agents, making the selection and animation of these behaviors a critical focus in the design of virtual agents. However, automating this process poses a significant challenge. Prior approaches to it range from fully data-driven techniques that struggle to produce contextually meaningful behaviors to more manual approaches that lack generalizability.

Our approaches to generating nonverbal behavior typically combine machine learning techniques with knowledge resources. We describe two of them here: SIMA and Cerebella.

SIMA (Socially Intelligent Multimodal Agent) is our most recent work. It currently models the selection of nonverbal behaviors, with a specific focus on gestures. To select gestures, it leverages Large Language Models to perform semantic and metaphoric analyses of the utterance and suggest meaningful, appropriate co-speech gestures. This gesture selection system is implemented within a virtual human framework, automating both the selection and the subsequent animation of the selected gestures for human-agent interactions.
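As a rough illustration of this idea (not SIMA's actual prompts or code), the sketch below asks an LLM to propose a gesture class and the word it should co-occur with for a given utterance. The gesture inventory and the call_llm helper are hypothetical stand-ins for whatever gesture set and LLM API a particular system uses.

```python
import json

# Hypothetical inventory of gesture classes an animation system might support.
GESTURE_CLASSES = ["beat", "deictic_you", "deictic_self", "metaphoric_container",
                   "metaphoric_progression", "iconic_size", "shrug"]

def build_prompt(utterance: str) -> str:
    """Ask the LLM for a semantic/metaphoric reading and a gesture suggestion."""
    return (
        "You select co-speech gestures for a virtual human.\n"
        f'Utterance: "{utterance}"\n'
        f"Allowed gesture classes: {', '.join(GESTURE_CLASSES)}\n"
        "Identify any metaphoric or emphasized content, then answer in JSON with "
        'keys "gesture", "anchor_word", and "rationale".'
    )

def select_gesture(utterance: str, call_llm) -> dict:
    """call_llm is a placeholder for whatever LLM client is in use (hosted or local)."""
    reply = call_llm(build_prompt(utterance))
    # e.g. {"gesture": "metaphoric_progression", "anchor_word": "forward", ...}
    return json.loads(reply)

# Usage (with a caller-supplied LLM wrapper):
# selection = select_gesture("We are moving the project forward.", my_llm)
```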

——

Cerebella more broadly automates the generation of all of a virtual human's physical behaviors in face-to-face dialog interaction, including the nonverbal behaviors accompanying the virtual human's dialog, responses to perceptual events, and listening behaviors. Modular processing pipelines transform the input into behavior schedules, written in the Behavior Markup Language (BML), that are then passed to a character animation system.
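For a concrete sense of the output format, the snippet below assembles a minimal BML-style behavior schedule in which one gesture is timed to a sync point embedded in the speech text. The element and attribute names follow the general shape of BML; the exact vocabulary here is illustrative rather than a copy of Cerebella's output.

```python
def make_bml(before: str, sync_word: str, after: str, gesture_lexeme: str = "BEAT") -> str:
    """Return a minimal behavior schedule: speech plus one gesture
    that starts at a sync point inside the utterance text."""
    return f"""<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <speech id="s1">
    <text>{before} <sync id="tm1"/>{sync_word} {after}</text>
  </speech>
  <gesture id="g1" lexeme="{gesture_lexeme}" start="s1:tm1"/>
</bml>"""

# Usage: print a schedule where a beat gesture lands on "forward".
print(make_bml("We need to move the project", "forward", "this quarter."))
```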

Designed as a highly flexible and extensible component, Cerebella realizes a robust process that supports a variety of use patterns. For example, to generate a character's nonverbal behavior for an utterance, Cerebella can take as input detailed information about the character's mental state (e.g., emotion, attitude) and communicative intent from a virtual human's models of emotion and social reasoning. In the absence of such information, Cerebella instead analyzes the virtual human's utterance text, syntactic structure, semantics, and voice prosody to infer it. These analyses rely on a collection of machine learning tools to assess syntactic structure and prosodic information, and on knowledge bases such as WordNet to assess semantic and metaphoric content.
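As a small example of the kind of lexical-semantic lookup such a knowledge base supports (not Cerebella's actual pipeline), the snippet below uses NLTK's WordNet interface to pull word senses and hypernyms, the sort of evidence a rule-based mapper could turn into gesture choices.

```python
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def semantic_profile(word: str, pos=wn.VERB):
    """Collect (sense name, definition, hypernyms) for each WordNet sense of a word."""
    profile = []
    for synset in wn.synsets(word, pos=pos):
        hypernyms = [h.name() for h in synset.hypernyms()]
        profile.append((synset.name(), synset.definition(), hypernyms))
    return profile

for name, definition, hypernyms in semantic_profile("grow"):
    print(name, "->", definition, "| hypernyms:", hypernyms)
```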

Cerebella has been used online to generate behavior in real time and offline to generate behavior schedules that are cached for later use. Offline use has also allowed Cerebella to be incorporated into behavior editors that support mixed-initiative, iterative design of behavior schedules with a human author, whereby Cerebella and the author iterate over a cycle of Cerebella generating a behavior schedule and the author modifying it.

The clip at the top right is an example of Cerebella performing nonverbal behavior generation using only analysis of the utterance text and audio. The clip below it illustrates Cerebella's ability to generate sequences of gestures semantically linked to each other and to the text.

References:

SIMA:

Laura Birka Hensel, Nutchanon Yongsatianchot, Parisa Torshizi, Elena Minucci, and Stacy Marsella, "Large language models in textual analysis for gesture selection", in Proceedings of the 25th International Conference on Multimodal Interaction, 2023, pp. 378-387.

Parisa Ghanad Torshizi, Laura Hensel, Ari Shapiro, and Stacy Marsella, "Large Language Models for Virtual Human Gesture Selection", in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2025.

Cerebella:

Stacy Marsella, Yuyu Xu, Margaux Lhommet, Andrew Feng, Stefan Scherer, and Ari Shapiro, "Virtual Character Performance From Speech", in Symposium on Computer Animation, July 2013. 

Use of LLMs in SIMA, AAMAS 2025

Cerebella generating from speech

Cerebella demonstrating co-articulation of sequences of gestures (ideational units)