From MPEG-4 to Deep Learning: NISS-CANSSI Webinar Explores Transformative Role of AI in Audio-Visual Healthcare Analytics

Event Page: NISS-CANSSI Collaborative Data Science Webinar: From MPEG-4 to Deep Learning: Transforming Audio-Visual Analytics for Healthcare and Beyond

Date: Thursday, May 8, 2025 at 1-2pm ET

The NISS-CANSSI Collaborative Data Science Webinar, From MPEG-4 to Deep Learning: Transforming Audio-Visual Analytics for Healthcare and Beyond, covered the evolution and challenges of audio and video compression, as well as advances in AI-driven analytics for visual data processing. The discussion included various applications of AI in healthcare, such as drug pill detection, face identification, and skin condition analysis using generative models. While highlighting the potential of AI in medical treatments and surgeries, the speakers also emphasized the importance of accurate data and cautioned that fully autonomous AI in healthcare is still a distant prospect. The speakers were Anand Paul, Associate Professor, Department of Biostatistics and Data Science, School of Public Health, Louisiana State University Health Sciences Center, and An-Chao Tsai, Ph.D., SMIEEE, Associate Professor, National Pingtung University, Taiwan. The session was moderated by Qingzhao Yu, Associate Dean for Research at the School of Public Health, Louisiana State University Health Sciences Center, New Orleans.

Audio and Video Compression Challenges

Anand and An-Chao discussed the historical challenges of audio and video compression, including storage and bandwidth limitations, potential data loss, and latency issues in real-time processing. They also noted the lack of efficient hardware for handling complex audio and video data streams. An-Chao introduced Anand as a professor at Louisiana State University Health Sciences Center specializing in big data analytics, AI, and machine learning, and noted that Anand has served as an editor for several prestigious journals and held key roles at academic conferences.

MPEG-1 to H.265 Compression Evolution

Anand discussed the evolution of audio and video compression formats, starting with MPEG-1 in 1991 and moving to more advanced standards like H.264 and H.265. He explained the basic architecture of these formats, including temporal models, motion estimation, and the use of discrete cosine transforms. Anand also highlighted the impact of the AlexNet paper, published in 2012, which introduced a convolutional neural network architecture for image classification and significantly improved performance on the ImageNet database.
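To make the compression pipeline concrete, the short Python sketch below applies the 2-D discrete cosine transform to a single 8x8 pixel block and coarsely quantizes the result, the core intra-coding step shared by MPEG-era codecs. It is an illustration assembled for this summary, not code from the talk; the block values and quantization step size are arbitrary.

```python
# Minimal sketch of the block transform step at the heart of MPEG/H.26x intra
# coding: an 8x8 pixel block is mapped to frequency coefficients with the 2-D
# DCT, then coarsely quantized so high-frequency detail can be dropped.
# Illustrative only; real codecs add prediction, zig-zag scanning, and entropy
# coding on top of this step.
import numpy as np

def dct_2d(block: np.ndarray) -> np.ndarray:
    """Type-II 2-D DCT of an 8x8 block, computed directly from the formula."""
    n = 8
    out = np.zeros((n, n))
    x = np.arange(n)
    for u in range(n):
        for v in range(n):
            cu = np.sqrt(1 / n) if u == 0 else np.sqrt(2 / n)
            cv = np.sqrt(1 / n) if v == 0 else np.sqrt(2 / n)
            basis = np.outer(np.cos((2 * x + 1) * u * np.pi / (2 * n)),
                             np.cos((2 * x + 1) * v * np.pi / (2 * n)))
            out[u, v] = cu * cv * np.sum(block * basis)
    return out

# A smooth 8x8 luminance block: most energy lands in a few low-frequency
# coefficients, which is what makes lossy compression effective.
block = np.outer(np.linspace(50, 200, 8), np.ones(8))
coeffs = dct_2d(block - 128)          # level-shift as in intra-coded blocks
quantized = np.round(coeffs / 16)     # coarse uniform quantization (toy step size)
print(np.count_nonzero(quantized), "of 64 coefficients survive quantization")
```

Because natural image blocks are smooth, most of the energy collapses into a few low-frequency coefficients; the standards Anand described layer motion estimation and entropy coding on top of this transform step.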

Transformer-Based Models in Audio/Video Analytics

Anand discussed the evolution of audio and video analytics, highlighting the shift from traditional convolutional neural networks (CNNs) to transformer-based models. He explained how these models work, focusing on the attention mechanism that allows them to capture specific features and actions. Anand also mentioned the use of GPUs in training these models and the increasing trend of applying deep learning across applications. He concluded by emphasizing the importance of these models for action recognition and their potential to reshape video processing and compression.
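As a rough illustration of the attention mechanism Anand described, the NumPy sketch below implements scaled dot-product self-attention over a handful of frame or patch embeddings. The shapes and random values are placeholders chosen for this summary, not anything from the presentation.

```python
# Minimal sketch of the scaled dot-product attention that transformer-based
# video models use to weight which frames (or patches) matter for each query.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # similarity of each query to each key
    weights = softmax(scores)          # attention weights sum to 1 per query
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 16))            # e.g. 6 frame/patch embeddings of dim 16
out, w = attention(tokens, tokens, tokens)   # self-attention over the sequence
print(out.shape, w.shape)                    # (6, 16) outputs, (6, 6) weight matrix
```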

Visual Data Analysis Model Performance

Anand discussed the performance of different models for visual data analysis, highlighting the advantages of a hybrid model that combines CNNs and transformers. He also presented a deep sparse capsule network for non-invasive blood glucose level estimation, which uses photoplethysmography (PPG) signals to classify diabetic and non-diabetic individuals. Anand compared his team's model with alternatives such as BiLSTM, decision trees, and XGBoost, noting that it performs better in several respects. He also introduced a federated learning system for medical emergencies, which uses a semi-supervised model to analyze patient data and alert healthcare providers.
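For readers unfamiliar with the federated setup, the toy Python sketch below shows federated averaging (FedAvg), the standard aggregation idea behind systems of this kind: each site fits a model on its own data and only the weights, never raw patient records, are sent to the server for averaging. The logistic-regression model, data, and hyperparameters here are hypothetical and are not the speakers' semi-supervised implementation.

```python
# Minimal sketch of federated averaging (FedAvg). Toy model and random data.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """A few steps of logistic-regression gradient descent on one site's data."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))        # predicted probability
        w -= lr * X.T @ (p - y) / len(y)    # gradient step
    return w

def fed_avg(site_weights, site_sizes):
    """Server step: average site models, weighted by local sample counts."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

rng = np.random.default_rng(1)
sites = [(rng.normal(size=(40, 4)), rng.integers(0, 2, 40)) for _ in range(3)]
global_w = np.zeros(4)
for round_ in range(10):                    # communication rounds
    updates = [local_update(global_w, X, y) for X, y in sites]
    global_w = fed_avg(updates, [len(y) for _, y in sites])
print("global weights after 10 rounds:", np.round(global_w, 3))
```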

AI Research on Drug Peel Detection

Anand and An-Chao discussed their research in artificial intelligence, focusing on drug pill detection and face identification. An-Chao presented a pill detection system using a CNN model, which achieved high accuracy in localizing and classifying the pills. They also discussed a face identification system, based on a single-stage headless model, that works in real time even when the subject is wearing a mask. The system was tested on various datasets and achieved high accuracy. An-Chao also mentioned a project involving the detection of black soldier flies, which he acknowledged is unusual but has promising applications.

Black Soldier Fly Larvae and YOLO-Based Detection

An-Chao discussed the use of black soldier fly larvae to process unwanted organic food waste, which the larvae digest and convert into fertilizer. He explained the stages of the insect's life cycle and the importance of controlling food quantity to prevent ammonia buildup. An-Chao then presented a YOLO-based detection architecture for identifying the larvae's growth stage, which was found to be more accurate and efficient than other models. The system is designed to be used with a cell phone, allowing users to photograph their larvae and receive an estimated count based on the size and depth of their rearing pool. Compared with other YOLO models, it delivered better performance with a smaller model size. The work is currently under review.
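A counting workflow of this kind can be sketched with an off-the-shelf detector. The example below assumes the open-source ultralytics package and a hypothetical custom-trained weights file (larvae.pt); it is not the authors' model or code, only an illustration of running a YOLO-style detector on a phone photo and counting confident detections.

```python
# Minimal sketch of running a YOLO-style detector on a phone photo to count
# larvae. Assumes the open-source `ultralytics` package; the weights file and
# image path are hypothetical placeholders.
from ultralytics import YOLO

model = YOLO("larvae.pt")                 # hypothetical weights trained on larva images
results = model("bin_photo.jpg")          # run detection on a photo of the rearing bin

boxes = results[0].boxes                  # detected bounding boxes
visible = int((boxes.conf > 0.5).sum())   # keep reasonably confident detections
print(f"larvae visible in frame: {visible}")

# Scaling the visible count to a whole-bin estimate from the pool's size and
# depth, as described in the talk, would be an additional calibration step.
```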

Generative AI in Dermatology Object Detection

An-Chao discussed the use of generative artificial intelligence for object detection and image synthesis in dermatology. His team collaborated with a dermatologist to develop a system that can detect melasma, a skin pigmentation condition, using a cell phone. The system generates images of the skin before and after treatment, helping to reduce medical disputes. An-Chao also explained the use of pix2pixHD, a generative adversarial model that pairs a generator with a discriminator. He demonstrated the system's ability to generate images of melasma and hemoglobin and compared it with other models such as CycleGAN; the results showed that the pix2pixHD outputs are very close to the original dataset.
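To show what pairing a generator with a discriminator means in practice, the PyTorch sketch below runs one training step of a toy paired image-to-image GAN. It is a drastically simplified stand-in assembled for this summary; pix2pixHD itself uses coarse-to-fine generators, multi-scale discriminators, and feature-matching losses, and the random tensors here are not dermatology data.

```python
# Minimal sketch of the paired image-to-image GAN idea: a generator maps an
# input image toward a paired target image while a discriminator judges
# (input, output) pairs. Tiny toy networks and random tensors only.
import torch
import torch.nn as nn

gen = nn.Sequential(                     # toy generator: input image -> output image
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
disc = nn.Sequential(                    # toy discriminator: (input, output) pair -> score map
    nn.Conv2d(6, 16, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 3, padding=1))

bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)

x = torch.rand(1, 3, 64, 64)             # "before" image (random stand-in)
y = torch.rand(1, 3, 64, 64)             # paired "after" image (random stand-in)

# Discriminator step: score the real pair high, the generated pair low.
fake = gen(x).detach()
d_loss = bce(disc(torch.cat([x, y], 1)), torch.ones(1, 1, 64, 64)) + \
         bce(disc(torch.cat([x, fake], 1)), torch.zeros(1, 1, 64, 64))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: fool the discriminator and stay close to the target image.
fake = gen(x)
g_loss = bce(disc(torch.cat([x, fake], 1)), torch.ones(1, 1, 64, 64)) + \
         nn.functional.l1_loss(fake, y)
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
print(f"d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")
```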

AI in Healthcare: Challenges and Future

In the closing discussion, An-Chao and Anand considered the potential of AI in healthcare, particularly in image analysis. Anand suggested that AI could assist in certain medical treatments and surgeries, but cautioned that fully autonomous AI in healthcare is still a long way off, possibly not arriving until 2050 or later. An-Chao highlighted the challenge of obtaining accurate, labeled data for training AI models, which is crucial to their effectiveness.

Acknowledgements and Recognition

The webinar was supported by the National Institute of Statistical Sciences (NISS) and the Canadian Statistical Sciences Institute (CANSSI). NISS and CANSSI extend their sincere thanks to the distinguished speakers, Dr. Anand Paul and Dr. An-Chao Tsai, for sharing their cutting-edge research and insights into the evolving role of AI in healthcare. Special appreciation goes to Dr. Qingzhao Yu for skillfully moderating the session and guiding the discussion. We are also grateful to CANSSI for its generous sponsorship and continued collaboration in promoting interdisciplinary data science research.

Thursday, May 8, 2025 by Megan Glenn