Keynote Speakers

Keynote Speaker I 

Multimedia and Art in the Age of Creative AI

Prof. Ahmed Elgammal

Department of Computer Science, Rutgers University, USA


Advances in Artificial Intelligence are changing things around us. Are art and creativity immune from the perceived AI takeover? In this talk I will highlight some of the advances in the area of Artificial Intelligence and Art. I will argue that investigating perceptual and cognitive tasks related to human creativity in visual art is essential for advancing the fields of AI and multimedia systems. Conversely, I will discuss how AI can change the way we look at art and art history.


Dr. Ahmed Elgammal is a professor at the Department of Computer Science, Rutgers University. He is the founder and director of the Art and Artificial Intelligence Laboratory at Rutgers, which focuses on data science in the domain of digital humanities. He is also an Executive Council Faculty member at the Rutgers University Center for Cognitive Science. Prof. Elgammal has published over 160 peer-reviewed papers, book chapters, and books in the fields of computer vision, machine learning, and digital humanities. He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE). He received the National Science Foundation CAREER Award in 2006. Dr. Elgammal's recent research on knowledge discovery in art history received wide international media attention, including reports in the Washington Post, the New York Times, NBC News, the Daily Telegraph, Science News, and many others. Dr. Elgammal received his M.Sc. and Ph.D. degrees in computer science from the University of Maryland, College Park, in 2000 and 2002, respectively.


Keynote Speaker II 

Perception of Visual Sentiment: From Experimental Psychology to Computational Modeling

Prof. Mohan Kankanhalli

School of Computing, National University of Singapore


A picture is worth a thousand words. Visual representation is one of the dominant forms of social media. The emotions that viewers feel when observing visual content are often referred to as the content's visual sentiment. Analysis of visual sentiment has become increasingly important due to the huge volume of online visual data generated by users of social media. Automatic assessment of visual sentiment has many applications, such as monitoring the mood of the population on social media platforms (e.g., Twitter, Facebook), facilitating advertising, and understanding user behavior. However, in contrast to the extensive research on predicting textual sentiment, relatively little work has been done on sentiment analysis of visual content. Compared with textual sentiment, visual sentiment is more subjective and implicit. There exists a significant semantic gap between high-level visual perception and low-level computational attributes.

In this talk, we argue that these challenges can be addressed by incorporating findings from psychology and cognitive science. We will show that a deeper understanding of human perception helps create better computational models. To support that thesis, we will first briefly overview our human-centric research framework, which applies the paradigms and methodologies of experimental psychology to computer science: First, we collect visual data together with human perception responses through online or lab-controlled psychophysics studies. Then we use inferential statistics to analyze the psychophysics data and model human perception empirically. Finally, we design computational models based on the empirical findings.

We will present three works on visual sentiment from our lab, guided by this research framework. In our first work, we aim to understand human visual perception in a holistic way. We first fuse several partially overlapping datasets annotated with human emotion. We then build an empirical model of human visual perception, which suggests that six different types of visual perception (i.e., familiarity, aesthetics, dynamics, oddness, naturalness, spaciousness) significantly contribute to humans' positive sentiment (i.e., liking) toward a visual scene.

In our second work, we investigate the relation between human attention and visual sentiment. We build a unique emotional eye fixation dataset with object- and scene-level human annotations, and comprehensively explore how human attention is affected by the emotional properties of images. Further, we train a deep convolutional neural network for human attention prediction on our dataset. Results demonstrate that efficient encoding of image sentiment information helps boost its performance.

Our third work explores how human attention influences visual sentiment. We experimentally disentangle the effects of focal information and contextual information on human emotional reactions, and then incorporate the resulting insights into computational models. On two benchmark datasets, the proposed models demonstrate superior performance compared to state-of-the-art methods for visual sentiment prediction.

We will end with future research directions in visual sentiment analysis. Our studies highlight the importance of understanding human cognition for interpreting the latent sentiments behind visual scenes.


Mohan Kankanhalli is Provost’s Chair Professor of Computer Science at the National University of Singapore (NUS). He is also the Dean of NUS School of Computing. Before becoming the Dean in July 2016, he was the NUS Vice Provost (Graduate Education) during 2014-2016 and Associate Provost during 2011-2013. Mohan obtained his BTech from IIT Kharagpur and MS & PhD from the Rensselaer Polytechnic Institute.

His current research interests are in Multimedia Computing, Information Security & Privacy, Image/Video Processing, and Social Media Analysis. He directs the SeSaMe (Sensor-enhanced Social Media) Centre, which conducts fundamental research on social cyber-physical systems, with applications in social sensing, sensor analytics, and smart systems. He is on the editorial boards of several journals, including the ACM Transactions on Multimedia, the Springer Multimedia Systems Journal, the Pattern Recognition Journal, and IEEE Multimedia. He is a Fellow of IEEE.


Keynote Speaker III

Multimodal Social Signals Analysis

Prof. Nicu Sebe

Department of Information Engineering and Computer Science, University of Trento, Italy


Social perception is the main channel through which human beings access the social world and is one of the missing links in the communication between humans and computers. In this presentation I will describe our recent research in social signals analysis (e.g., head and body pose, camera-based heart rate estimation, etc.). I will concentrate on behavior modeling and recognition, with an emphasis on sensing and understanding users' interactive actions and intentions to achieve multimodal social interaction in natural settings, in particular pertaining to dynamic human face and body behavior in context-dependent situations (task, mood/affect). Perspectives on multisensory observation will also be addressed.


Nicu Sebe is a professor at the University of Trento, Italy, where he is the director of the Department of Information Engineering and Computer Science. He leads research in the areas of multimedia information retrieval and human-computer interaction in computer vision applications. He has been involved in organizing major conferences and workshops addressing the computer vision and human-centered aspects of multimedia information retrieval, including serving as General Co-Chair of the IEEE Automatic Face and Gesture Recognition Conference (FG 2008) and of the ACM International Conference on Image and Video Retrieval (CIVR) 2007 and 2010. He was a general chair of ACM Multimedia 2013 and ACM ICMR 2017 and a program chair of ACM Multimedia 2007 and 2011, ECCV 2016, and ICCV 2017. He is the program chair of ICPR 2020. Currently he is the ACM SIGMM vice chair. He is a Fellow of IAPR and a senior member of ACM and IEEE.