Niagara Falls, Canada
July 15-19, 2024
Jointly with ICME 2024
The integration of Computer Vision (CV) and Natural Language Processing (NLP) is significantly transforming the field of AI. The combination of LLMs and CV is paving the way for AI systems that understand and generate multi-modal content. Text-guided multi-modal generation is one of the areas that has advanced significantly thanks to the evolution of both LLMs and CV, and the core challenge of text-guided generation is visual-language (VL) alignment.
This is the 1st TMMG workshop, held in conjunction with ICME 2024 in Niagara Falls, Canada. We aim to bring together researchers from the fields of image, video, and audio generation as well as NLP to facilitate discussion and progress at the intersection of LLMs/LMMs and multi-modal content generation.
We aim to invite a diverse set of experts to discuss their recent research results and future directions for text-guided multi-modal generation, with a particular focus on improving visual-language alignment in text-guided generation.
We warmly welcome contributions concerning text-guided generation, LLMs/LMMs for multi-modal generation, and visual-language alignment analysis. The topics of interest include (but are not limited to):
Paper Submission
Authors should prepare their manuscripts according to the ICME Guide for Authors, available under Author Information and Submission Instructions.
Submission address: https://cmt3.research.microsoft.com/ICME2024W
Workshop Track: TMMG
Submissions due: April 5, 2024
Acceptance Notification: April 25, 2024
Camera-ready: May 24, 2024
Workshop date: July 19, 2024
Keynote 1
Speaker: Dr. Zhengyuan Yang
Title: Multi-Modal Agents
Time: 8:10 – 8:40, July 19, 2024
Biography:
Zhengyuan Yang is a Senior Researcher at Microsoft. He received his Ph.D. in Computer Science from the University of Rochester, advised by Prof. Jiebo Luo, and his bachelor's degree from the University of Science and Technology of China. He has received the ACM SIGMM Award for Outstanding Ph.D. Thesis, a Twitch Research Fellowship, and the ICPR 2018 Best Industry Related Paper Award. His research interests lie at the intersection of computer vision and natural language processing, including multi-modal vision-language understanding and generation.
Keynote 2
Speaker: Prof. Siyu Huang
Title: Navigating the Latent Space of Image Synthesis
Time: 9:25 – 9:55, July 19, 2024
Biography:
Siyu Huang is an Assistant Professor at Clemson University. He received his B.E. and Ph.D. degrees in information and communication engineering from Zhejiang University, Hangzhou, China, in 2014 and 2019, respectively. He was a Postdoctoral Fellow in the John A. Paulson School of Engineering and Applied Sciences at Harvard University. Before that, he was a Visiting Scholar at the Language Technologies Institute in the School of Computer Science, Carnegie Mellon University in 2018, a Research Scientist at the Big Data Laboratory, Baidu Research from 2019 to 2021, and a Research Fellow in the School of Electrical and Electronic Engineering at Nanyang Technological University in 2021. He has published more than 20 papers in top-tier computer science journals and conferences. His research interests are primarily in computer vision, multimedia analysis, and generative models.
Keynote 3
Speaker: Amber (Yijia) Zheng
Title: Immunizing Text-to-Image Models Against Malicious Adaptation
Time: 10:55 – 11:25, July 19, 2024
Biography:
Amber (Yijia) Zheng is a Ph.D. student in Computer Science at Purdue University, advised by Prof. Raymond A. Yeh. She received her B.Sc. in Data Science from the School of Statistics and Management, Shanghai University of Finance and Economics, where she worked with Prof. Yixuan Qiu. Her research interests lie in developing algorithms and models for reliable and interpretable AI, with a particular focus on language and vision, generative AI, and attributing model behaviors across the ML pipeline.