Visiting Researcher Talk: Dr. Xiaoxiao Li - 26 Jan 2026

26 Jan 2026, 3:00 PM - 4:00 PM
Audience: Current Students, Industry/Academic Partners

Talk Title:

Who Deserves Credit? Training Data Valuation For Modern Generative AI

Speaker:

Dr. Xiaoxiao Li

About the speaker:

Dr. Xiaoxiao Li is an Associate Professor in the Department of Electrical and Computer Engineering at the University of British Columbia, a Faculty Member at the Vector Institute, and a Visiting Faculty Member at Google. Dr. Li holds a Canada Research Chair (Tier II) in Responsible AI and is a Canada CIFAR AI Chair. Dr. Li's research aims to enhance the trustworthiness and efficiency of AI models, bridging the gap between cutting-edge AI research and practical real-world applications in areas such as healthcare. Dr. Li's current interests include mechanistic analysis of large language and vision-language models (LLMs/VLMs), developing hypothesis-driven evaluations, and advancing methodologies toward artificial general intelligence (AGI). Dr. Li has published over 50 papers at top ML/AI venues, including ICML, ICLR, NeurIPS, CVPR, ECCV, AAAI, and Nature Methods.

Description:

Quantifying the value of training data is a critical challenge for generative AI and large language models (LLMs). Traditional valuation methods are ill-suited to this new paradigm: they are computationally infeasible at this scale and were designed primarily for small-scale, discriminative models. This talk presents a unified toolkit that redefines data valuation for the modern AI stack. First, for general generative models, we introduce a model-agnostic and training-free framework that values data based on similarity matching. Next, for LLMs and vision-language models (VLMs), we show how leveraging token-level representations enables a highly efficient, forward-only valuation method that avoids costly retraining. Finally, we extend this token-level analysis to reinforcement learning, demonstrating how our valuation techniques can steer training dynamics to improve model performance and efficiency. Our methods provide a practical foundation for a more robust data economy, enabling intelligent data curation, equitable compensation, and the development of more transparent and efficient AI systems.
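
For attendees new to the topic, the general idea of valuing training data by similarity matching can be illustrated with a toy sketch. This is not the speaker's method: the function name `similarity_values`, the use of cosine similarity, and the averaging and normalization steps are all assumptions made here for illustration. It assumes each training item and each generated sample has already been mapped to an embedding vector by some encoder, and it requires no retraining of the generative model.

```python
import numpy as np


def similarity_values(train_emb, gen_emb, eps=1e-12):
    """Toy similarity-based data valuation (illustrative assumption only).

    train_emb: (n_train, d) embeddings of training items.
    gen_emb:   (n_gen, d) embeddings of generated samples.
    Returns an array of n_train non-negative scores that sum to 1.
    """
    train = np.asarray(train_emb, dtype=float)
    gen = np.asarray(gen_emb, dtype=float)

    # Normalize rows so that the dot product equals cosine similarity.
    train_n = train / (np.linalg.norm(train, axis=1, keepdims=True) + eps)
    gen_n = gen / (np.linalg.norm(gen, axis=1, keepdims=True) + eps)

    # sims[i, j] = cosine similarity of training item i and generated sample j.
    sims = train_n @ gen_n.T

    # Ignore negative similarity, then average over the generated samples.
    raw = np.clip(sims, 0.0, None).mean(axis=1)

    total = raw.sum()
    if total <= eps:
        # No training item resembles any output: split credit uniformly.
        return np.full(len(train), 1.0 / len(train))
    return raw / total


# Hypothetical 2-D embeddings, chosen only to make the behavior visible.
train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
generated = np.array([[2.0, 0.0], [3.0, 0.1]])
values = similarity_values(train, generated)
```

In this toy setup the first training item points in nearly the same direction as both generated samples, so it receives the largest share of credit, while the second item, which is nearly orthogonal to them, receives the smallest.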