Niagara Falls, Canada
July 15-19, 2024
Jointly with ICME 2024

Call for Papers


The integration of Computer Vision (CV) and Natural Language Processing (NLP) is significantly transforming the field of AI. The combination of Large Language Models (LLMs) and CV is paving the way for AI systems that understand and generate multi-modal content. Text-guided multi-modal generation is one of the areas that has advanced significantly thanks to the evolution of both LLMs and CV, and the core challenge of text-guided generation is visual-language (VL) alignment.

This is the 1st TMMG workshop, held in conjunction with ICME 2024 in Niagara Falls, Canada. We aim to bring together researchers from the fields of image, video, and audio generation as well as NLP to facilitate discussion and progress at the intersection of LLMs/LMMs and multi-modal content generation.

We aim to invite a diverse set of experts to discuss their recent research results and future directions for text-guided multi-modal generation, with a particular focus on improving visual-language alignment in text-guided generation.

We warmly welcome contributions concerning text-guided generation, LLMs/LMMs for multi-modal generation, and visual-language alignment analysis. The topics of interest include (but are not limited to):

  • Advances in text-guided image/video/audio/multi-modal generation
  • Visual-language alignment analysis
  • LLM/LMM and text-guided generation
  • Self-supervised learning with generative models
  • Adversarial attacks and defenses with generative models
  • Novel evaluation metrics and methods
  • Benchmark datasets
  • Ethical considerations and bias in text-guided visual generation
  • Augmented and virtual reality applications
  • Accessibility in multimedia content
  • Impact of multi-modal generation on media and journalism




Paper Submission

Authors should prepare their manuscripts according to the ICME Guide for Authors, available under Author Information and Submission Instructions.

Submission address: https://cmt3.research.microsoft.com/ICME2024W

Workshop Track: TMMG


Important Dates


Submissions due
April 5, 2024
Acceptance Notification
April 25, 2024
Camera-ready
May 24, 2024
Workshop date
July 19, 2024

Keynotes (1/3)


Keynote 1


Speaker:

Dr. Zhengyuan Yang

Title:

Multi-Modal Agents

Time:

8:10 – 8:40, July 19, 2024

Biography:

Zhengyuan Yang is a Senior Researcher at Microsoft. He received his Ph.D. in Computer Science from the University of Rochester, advised by Prof. Jiebo Luo, and his bachelor's degree from the University of Science and Technology of China. He has received the ACM SIGMM Award for Outstanding Ph.D. Thesis, the Twitch Research Fellowship, and the ICPR 2018 Best Industry Related Paper Award. His research interests lie at the intersection of computer vision and natural language processing, including multi-modal vision-language understanding and generation.

Keynotes (2/3)


Keynote 2


Speaker:

Prof. Siyu Huang

Title:

Navigating the Latent Space of Image Synthesis

Time:

9:25 – 9:55, July 19, 2024

Biography:

Siyu Huang is an Assistant Professor at Clemson University. He received his B.E. and Ph.D. degrees in information and communication engineering from Zhejiang University, Hangzhou, China, in 2014 and 2019, respectively. He was a Postdoctoral Fellow in the John A. Paulson School of Engineering and Applied Sciences at Harvard University. Before that, he was a Visiting Scholar at the Language Technologies Institute in the School of Computer Science, Carnegie Mellon University, in 2018, a Research Scientist at the Big Data Laboratory, Baidu Research, from 2019 to 2021, and a Research Fellow in the School of Electrical and Electronic Engineering at Nanyang Technological University in 2021. He has published more than 20 papers in top-tier computer science journals and conferences. His research interests lie primarily in computer vision, multimedia analysis, and generative models.

Keynotes (3/3)


Keynote 3


Speaker:

Amber (Yijia) Zheng

Title:

Immunizing Text-to-Image Models Against Malicious Adaptation

Time:

10:55 – 11:25, July 19, 2024

Biography:

Amber (Yijia) Zheng is a Ph.D. student in Computer Science at Purdue University, advised by Prof. Raymond A. Yeh. She received her B.Sc. in Data Science from the School of Statistics and Management, Shanghai University of Finance and Economics, where she worked with Prof. Yixuan Qiu. Her research interests lie in developing algorithms and models for reliable and interpretable AI, with a particular focus on language and vision, generative AI, and attributing model behaviors throughout the ML pipeline.

Conference Program


Organizing Team


Jie An

University of Rochester

jan6@cs.rochester.edu


Hang Hua

University of Rochester

hhua2@cs.rochester.edu


Hanjia Lyu

University of Rochester

hlyu5@ur.rochester.edu


Xin (Eric) Wang

University of California, Santa Cruz

xwang366@ucsc.edu


Zhe Lin

Adobe Research

zlin@adobe.com


Jiebo Luo

University of Rochester

jluo@cs.rochester.edu