Assoc. Prof. Klaus Schoeffmann (University of Klagenfurt, Austria) & Prof. Cathal Gurrin (Dublin City University, Ireland)
Keynote Title
From Concepts to Embeddings: Charting the Use of AI in Digital Video and Lifelog Search over the Last Decade
Abstract: In the past decade, the field of interactive multimedia retrieval has undergone a transformative evolution driven by advances in artificial intelligence (AI). This keynote talk will explore the journey from early concept-based retrieval systems to the sophisticated embedding-based techniques that dominate the landscape today. By examining the progression of such AI-driven approaches at both the VBS (Video Browser Showdown) and the LSC (Lifelog Search Challenge), we will highlight the pivotal role of comparative benchmarking in accelerating innovation and establishing performance standards. We will also look forward to potential future developments in interactive multimedia retrieval benchmarking, including emerging trends, the integration of multimodal data, and the comparative benchmarking challenges that lie ahead for our community.
Prof. Huiyu Zhou, University of Leicester
Bio: Dr. Huiyu Zhou received a Bachelor of Engineering degree in Radio Technology from Huazhong University of Science and Technology, China, and a Master of Science degree in Biomedical Engineering from the University of Dundee, United Kingdom. He was awarded a Doctor of Philosophy degree in Computer Vision from Heriot-Watt University, Edinburgh, United Kingdom. Dr. Zhou is currently a full Professor in the School of Computing and Mathematical Sciences, University of Leicester, United Kingdom. He has published over 500 peer-reviewed papers in the field. His research work has been, and continues to be, supported by UK EPSRC, ESRC, AHRC, MRC, EU, Innovate UK, Royal Society, British Heart Foundation, Leverhulme Trust, Puffin Trust, Alzheimer’s Research UK, Invest NI and industry. Homepage: https://le.ac.uk/people/huiyu-zhou
Keynote Title
Video Understanding for Behavioural Analysis
Abstract: Video understanding has emerged as a powerful tool in behavioural analysis, offering innovative methodologies to capture and interpret complex behaviours from visual data. This talk explores the techniques used in video understanding, including machine learning, deep learning, and computer vision, and how they can address the key challenges in this field, such as accurately detecting and tracking multiple subjects, recognising subtle and nuanced behaviours, and managing large volumes of video data. The applications of video understanding extend across numerous sectors, from multimedia and healthcare to security, with the potential to revolutionise behavioural analysis and beyond.
Prof. Zhou will share his experience and insights in video understanding for behavioural analysis. He will present a case study on Parkinson’s disease (PD) diagnosis to demonstrate the capability of video understanding in healthcare. He will describe the methodologies developed to analyse behaviours in both animals (e.g., mice) and humans. These include pioneering techniques for detecting and tracking single and multiple mice, recognising individual and social behaviours, conducting comprehensive social behaviour analysis, and distinguishing between normal and PD-afflicted mice by examining their interactions and movements. He will conclude with a vision for the future of video understanding in behavioural analysis, along with an outline of anticipated advancements in technology and methodology, the potential for broader applications, and the ongoing research efforts aimed at overcoming current limitations.
Prof. Mark Plumbley, University of Surrey
Bio:
Prof. Mark Plumbley is Professor of Signal Processing at the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey in Guildford, UK. He is an expert on the analysis and processing of audio, using a wide range of signal processing and machine learning methods. He led the first international data challenge on the Detection and Classification of Acoustic Scenes and Events (DCASE), and is a co-editor of the book “Computational Analysis of Sound Scenes and Events” (Springer, 2018). He currently holds a 5-year EPSRC Fellowship, “AI for Sound”, on the automatic recognition of everyday sounds. He is a Member of the IEEE Signal Processing Society Technical Committee on Audio and Acoustic Signal Processing, and a Fellow of the IET and IEEE.
Keynote Title
Machine Learning for Everyday Sounds: Recognition, Captioning, Visualization, Separation and Generation of Audio
Abstract:
The last few years have seen a rapid increase in interest in machine learning for everyday sounds. Starting a decade ago with acoustic scene classification and sound event detection, the challenges and workshops on the Detection and Classification of Acoustic Scenes and Events (DCASE) have brought together researchers from academia and industry to establish a new research community. In this talk, I will highlight some of the recent work taking place in this area at the University of Surrey, including pretrained audio neural networks (PANNs), audio captioning, audio visualization, audio source separation and audio generation (AudioLDM). I will also touch on some cross-cutting issues, such as dataset collection and algorithm efficiency, and discuss how we might design future audio machine learning applications for the benefit of people and society.