Mixed-modal Language Modeling by Professor Luke Zettlemoyer
Abstract
Multimodal architectures are typically designed for specific modalities (image->text, text->image, text only, etc.). In this talk, I will present our recent work on a series of early fusion mixed-modal models with generalized architectures that can instead generate arbitrary mixed sequences of images and text. Such models can unlock fundamentally new multimodal chain-of-thought reasoning capabilities, as I will show through an early model for multimodal tool use, but determining the best mixed-modal architecture remains an open challenge. I will discuss and contrast two model architectures, Chameleon and Transfusion, that make very different assumptions about how to model mixed-modal data, and argue for moving from a tokenize-everything approach to newer models that are hybrids of autoregressive transformers and diffusion. I will also cover recent efforts to better understand how to more stably train such models at scale without excessive modality competition, using a mixture of transformers technique. Together, these advances lay a possible foundation for universal models that can understand and generate data in any modality, and I will also sketch some of the steps that we still need to take to reach this goal.
Biography
Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a Senior Research Director at Meta. His research interests lie at the intersection of natural language processing, machine learning, and decision making under uncertainty, with a recent emphasis on the science of training both text-based and multi-modal language models. Luke did postdoctoral research at the University of Edinburgh, earned his PhD at MIT, and was an undergraduate at NC State University. His honors include numerous paper awards, being named a Schmidt Sciences AI2050 Senior Fellow in 2025, being elected President of the Association for Computational Linguistics (ACL) in 2024, being named a Fellow of the ACL in 2022, winning the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2016, an Allen Distinguished Investigator Award in 2014, and a National Science Foundation (NSF) International Research Fellowship in 2009.