AI-Powered Data Systems for Multimodal Analytics by Dr. Yiming Lin
Abstract
We live in a world overflowing with data, and the emergence of AI, particularly Large Language Models (LLMs), is revolutionizing data analytics. However, applying AI directly to massive and complex data is neither effective nor scalable.
In this talk, I share my work on building AI‑native systems to analyze multimodal data at scale, focusing on tables and complex documents. On the one hand, when analyzing tables, AI is often used to prepare data, for example by cleaning and enriching it, and this becomes prohibitively expensive at large data scales. I present a set of database techniques that support scalable AI computation without sacrificing accuracy.
On the other hand, when analyzing documents, current approaches typically treat them as plain text and ignore their underlying structure, which limits both accuracy and performance. To address this, I present our work on data structuring, which explores the varying degrees of structure in unstructured documents and uses that structure to optimize query processing for efficient document analytics.
Finally, I’ll share my vision for building data systems for multimodal analytics, including trustworthy systems, optimization with hardware, and co‑optimization across different data modalities.
Biography