Chaitanya Ahuja

IPA: /tʃeətənj/

Let every modality's voice be heard, sight be seen, and text be understood
Research Scientist
Meta AI


I am a Research Scientist at Meta AI working on Human-Centric Multimodal Machine Learning and Generative Modeling. Prior to that, I completed my PhD at the Language Technologies Institute at Carnegie Mellon University, where I was advised by Dr. Louis-Philippe Morency (LP) in the MultiComp Lab. My research focused on endowing agents and remote avatars with social intelligence by means of multimodal learning. One of the use cases where we extensively apply these technologies is computer animation. These directions have the potential to make a meaningful impact on remote communication, collaboration, education, and mental health in human-human and human-robot interaction, especially now that many social and work spaces are gradually moving online.

In the past, I have also interned at Facebook Reality Labs, working on the generation of nonverbal behaviours for a communicating avatar. As an undergraduate researcher at the Indian Institute of Technology (IIT), Kanpur, I worked with Dr. Rajesh Hegde on Spatial Audio and Speaker Diarization, and with Dr. Vinay Namboodiri on Video Summarization.

News

May 2022 Excited to join Meta AI as a Research Scientist
April 2022 Defended my PhD dissertation on Communication Beyond Words: Grounding Visual Body Motion with Language
April 2022 Humbled to be a Highlighted Reviewer at ICLR 2022
March 2022 Paper on Low-Resource Adaptation of Spatio-Temporal Crossmodal Generative Models accepted at CVPR 2022
May 2020 We are organizing the First Workshop on Crossmodal Social Animation at ICCV 2021. Consider submitting your work.

December 2020 Successfully proposed my thesis titled Communication Beyond Words: Grounding Visual Body Motion with Language
September 2020 Paper on Co-Speech Gesture Generation from Language accepted at Findings at EMNLP 2020
September 2020 Paper on Impact of Personality on Nonverbal Behaviours accepted at IVA 2020
August 2020 PATS (Pose-Audio-Transcripts-Style) Dataset released.
August 2020 Code for Style Transfer for Co-Speech Gesture Animation released.
July 2020 Paper on Style Transfer for Co-Speech Gesture Animation accepted at ECCV 2020
August 2019 Paper on Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations accepted at ICMI 2019.
August 2019 Honourable mention at the LTI SRS symposium for my talk on Natural Language Grounded Pose Forecasting
July 2019 Paper on Natural Language Grounded Pose Forecasting accepted at 3DV 2019
March 2018 Excited to work at Facebook Reality Labs in Summer'18
January 2018 Paper on Lattice Recurrent Units accepted at AAAI 2018
October 2017 Our survey on Multimodal Machine Learning is online

Book Chapters

1. Challenges and applications in multimodal machine learning
T. Baltrusaitis, C. Ahuja, and L. Morency
The Handbook of Multimodal-Multisensor Interfaces, 2018

Pre-prints

1. Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides
D. Lee, C. Ahuja, P. Liang, S. Natu, and L. Morency
Preprint 2022

Selected Publications

Google Scholar

11. Communication Beyond Words: Grounding Visual Body Motion with Language
C. Ahuja
PhD dissertation, Carnegie Mellon University, 2022

10. Low-Resource Adaptation of Spatio-Temporal Crossmodal Generative Models
C. Ahuja, D. Lee, and L. Morency
CVPR 2022

9. No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures
C. Ahuja, D. Lee, R. Ishii, and L. Morency
EMNLP Findings 2020

8. Impact of Personality on Nonverbal Behavior Generation
R. Ishii, C. Ahuja, Y. Nakano, and L. Morency
IVA 2020

7. Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional Mixture Approach
C. Ahuja, D. Lee, Y. Nakano, and L. Morency
ECCV 2020
Media Coverage: TechXplore

6. To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations
C. Ahuja, S. Ma, L. Morency, and Y. Sheikh
ICMI 2019

5. Language2Pose: Natural Language Grounded Pose Forecasting
C. Ahuja and L. Morency
3DV 2019
Media Coverage: Scientific American, Synced, Venture Beat

4. Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling
C. Ahuja and L. Morency
AAAI 2018

3. Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrusaitis, C. Ahuja, and L. Morency
TPAMI 2017

2. Fast modelling of pinna spectral notches from HRTFs using linear prediction residual cepstrum
C. Ahuja and R. Hegde
ICASSP 2014

1. Extraction of pinna spectral notches in the median plane of a virtual spherical microphone array
A. Sohni, C. Ahuja, and R. Hegde
HSCMA 2014

Resources

1. PATS Dataset: Pose, Audio, Transcripts and Style
C. Ahuja, D. Lee, Y. Nakano, and L. Morency


Education

Ph.D. in Language Technologies (4.02/4.00)
Carnegie Mellon University | Pittsburgh, PA
2015 - 2022
Thesis: Communication Beyond Words: Grounding Visual Body Motion with Language
Advisor: Louis-Philippe Morency

B.Tech. in Electrical Engineering (9.5/10)
Indian Institute of Technology | Kanpur, India
2011 - 2015
Advisors: Rajesh Hegde, Vinay P. Namboodiri

Academic Talks

Communication Beyond Words: Grounding Visual Body Motion with Spoken Language
KTH Stockholm, Online
April 2021
Learning Relationships between Spoken Language and Freeform Gestures
EMNLP 2020 Workshop on NLP Beyond Text, Online
November 2020
Style Transfer for Co-speech Gesture Generation
ECCV 2020, Online
September 2020
End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations
ACM International Conference on Multimodal Interaction, Suzhou, China
October 2019
Natural Language Grounded Pose Forecasting
LTI Student Research Symposium, Pittsburgh PA
August 2019

Student Mentorship

Dong Won Lee (CMU BS → CMU MS in Machine Learning): Self-supervised generative models.
Shradha Sehgal (IIIT Hyderabad B.Tech.): Evaluation of generative models.
Arvin Wu (CMU BS): Social intelligence benchmarking.
Nikitha Murikinati (CMU BS): Study of relationships between co-speech gestures and prosody.
Sharath Rao (CMU MS → PlayStation): Back-channel prediction in dyadic conversations.
Qingtao Hu (CMU MS → Amazon): Unsupervised disentanglement of style and content in images.
Anirudha Rayasam (CMU MS → Google): Language grounded pose forecasting.

Teaching Experience

Structured Prediction for Language and Other Discrete Data (CMU 11-763), Head TA Spring 2018
Multimodal Machine Learning (CMU 11-777), Head TA Spring 2017

Professional Activities and Service

Co-organizer: ICCV 2021 First Workshop on Crossmodal Social Animation 2021
Co-organizer: Multimodal Machine Learning Reading Group, CMU Spring 2020
Conference Program Committee: NeurIPS, SIGGRAPH, ICLR, ACL, EMNLP, ACM Multimedia, ICMI
Workshop Program Committee: NeurIPS workshop on Multimodal Machine Learning, ACL Workshop on Multimodal Language, NAACL-HLT Student Research Workshop, ICMI GENEA Workshop
Grant Reviewer: Army Research Office (ARO)
CMU Graduate Applicant Support Program Volunteer 2020
CMU AI Undergraduate Research Mentor 2020-21
CMU Graduate Student Association Representative for Language Technologies Institute 2017