Session 3: AI and Health Data Science 

<<  Back to 2nd CANSSI-NISS Health Data Science Workshop Event Page

Session 3 Chair: Liqun Diao, University of Waterloo

PointNovo: Paving the Way for Personalized Cancer Vaccines with Deep Learning-Powered Peptide Sequencing

Speaker: Ali Ghodsi, University of Waterloo

Abstract: As we look forward to the next frontier in cancer treatment, personalized cancer vaccines emerge as a pivotal advancement. A critical step in crafting these vaccines lies in pinpointing specific markers, known as tumor-specific neoantigens, found on tumor cells. One effective technique to achieve this is de novo peptide sequencing, a process that deciphers protein fragments using mass spectrometry data. 

In this talk, we present PointNovo, a cutting-edge model for peptide sequencing. Unlike traditional methods, PointNovo captures the essence of a spectrum through direct representation as pairs of mass-to-charge ratio (m/z) and intensity. This approach circumvents the usual trade-offs between accuracy, speed, and memory. PointNovo offers a holistic training and prediction approach for peptide patterns by integrating an order-invariant network structure with recurrent neural networks. Preliminary tests spanning diverse datasets and species underscore the prowess of PointNovo. It surpasses established methods, boasting a 13.01-23.95% leap in accuracy at the peptide level.
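To make the idea of an order-invariant spectrum representation concrete, here is a minimal sketch in Python. It is an illustration only, not PointNovo's actual architecture: the function name `encode_spectrum`, the random weights, and the toy peak values are all assumptions made for this example. Each (m/z, intensity) pair passes through the same small layer, and a symmetric max-pool over peaks gives a fixed-length vector that does not depend on the order in which the peaks are listed.

```python
import numpy as np


def encode_spectrum(peaks, weights, bias):
    """Encode a spectrum given as an (n_peaks, 2) array of (m/z, intensity) pairs.

    The same ReLU layer is applied to every peak, then a max over peaks
    (a symmetric function) pools the result into one fixed-length vector.
    """
    hidden = np.maximum(peaks @ weights + bias, 0.0)  # shape: (n_peaks, n_features)
    return hidden.max(axis=0)                         # shape: (n_features,)


# Hypothetical, untrained parameters; a real model would learn these.
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 8))
bias = rng.normal(size=8)

# Toy spectrum: four made-up (m/z, intensity) peaks.
peaks = np.array([
    [147.11, 0.35],
    [276.16, 1.00],
    [389.24, 0.62],
    [502.33, 0.18],
])

# Shuffling the peaks should leave the encoding unchanged.
shuffled = peaks[rng.permutation(len(peaks))]
same = np.allclose(
    encode_spectrum(peaks, weights, bias),
    encode_spectrum(shuffled, weights, bias),
)
print("order-invariant:", same)
```

Because the pooling step collapses the peak dimension, spectra with different numbers of peaks map to vectors of the same length, which is what lets a downstream sequence model (such as a recurrent network) consume them without binning the m/z axis.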

Responsible Data Science and AI for Advancing Intelligent and Equitable Health

Speaker: Qi Long, Ph.D., University of Pennsylvania

Abstract: Rapid advances in technologies have enabled generation and collection of vast amounts of health data in research studies, from healthcare delivery, and from other real-world sources. While such data hold great promise for advancing intelligent and equitable health, they also present daunting analytical challenges. One notable example is the data from electronic health records (EHRs) that are recorded at irregular time intervals with varying frequencies and include structured data such as labs and vitals, codified data such as diagnosis and procedure codes, and unstructured data such as clinical notes and pathology reports. They are typically incomplete and fraught with other data errors and biases. What’s more, data gaps and errors in EHRs are often unequally distributed across patient groups: People with less access to care, often people of color or with lower socioeconomic status, tend to have more incomplete EHRs. In this talk, I will discuss these challenges and share my research group’s recent work on developing robust statistical and machine learning methods for addressing some of these challenges. Our research experience has demonstrated that a trans-disciplinary health data science approach that involves collaboration between statisticians, informaticians, computer scientists, and physician scientists can accelerate innovation in harnessing the transformative power of EHRs to tackle complex real-world problems and exert meaningful impact in medicine. To this end, I will also discuss some open questions that present opportunities for future research.

Robust Q-learning for Dynamic Treatment Regimes

Speaker: Rob Strawderman, University of Rochester

Abstract: Q-learning is a regression-based approach that is widely used to formalize the development of an optimal dynamic treatment strategy, but it is highly sensitive to the specification of the finite-dimensional working models used to estimate "treatment-free" nuisance parameters. Misspecification of these working models can lead to serious bias due to residual confounding and may result in suboptimal treatment strategies. We propose a robust Q-learning approach that allows such nuisance parameters to be estimated using data-adaptive techniques. The methodology and asymptotic results will be summarized, and we highlight the utility of the proposed methods through simulation. Time permitting, data from the "Extending Treatment Effectiveness of Naltrexone" multistage randomized trial will be used to illustrate the proposed methods.
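For readers new to the method, the following is a minimal sketch of standard two-stage Q-learning with linear working models, written in Python. It is not the robust variant proposed in the talk. The data-generating model in `simulate`, the variable names, and the choice of covariates are all assumptions made for illustration. The sketch fits the stage-2 regression, replaces each outcome with its predicted value under the best stage-2 treatment, and then regresses that pseudo-outcome on stage-1 information.

```python
import numpy as np


def fit_ols(design, response):
    """Ordinary least squares coefficients via a least-squares solve."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def simulate(n, seed=0):
    """Hypothetical two-stage randomized trial with binary treatments a1, a2."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)            # baseline covariate
    a1 = rng.integers(0, 2, size=n)    # stage-1 treatment, randomized 1:1
    x2 = rng.normal(size=n)            # intermediate covariate
    a2 = rng.integers(0, 2, size=n)    # stage-2 treatment, randomized 1:1
    # Assumed truth: a1 helps when x1 < 0.5, a2 helps when x2 > 0.
    y = x1 + x2 + a1 * (0.5 - x1) + a2 * x2 + rng.normal(size=n)
    return x1, a1, x2, a2, y


def q_learning(x1, a1, x2, a2, y):
    """Backward-induction Q-learning with linear working models."""
    ones = np.ones_like(y)

    # Stage 2: columns 0-4 form the "treatment-free" part, 5-6 the a2 contrast.
    d2 = np.column_stack([ones, x1, a1, a1 * x1, x2, a2, a2 * x2])
    beta2 = fit_ols(d2, y)
    treatment_free = d2[:, :5] @ beta2[:5]
    contrast2 = beta2[5] + beta2[6] * x2   # estimated gain from a2 = 1 vs 0

    # Pseudo-outcome: predicted outcome under the best stage-2 decision.
    pseudo = treatment_free + np.maximum(contrast2, 0.0)

    # Stage 1: regress the pseudo-outcome on stage-1 information.
    d1 = np.column_stack([ones, x1, a1, a1 * x1])
    beta1 = fit_ols(d1, pseudo)
    return beta1, beta2


if __name__ == "__main__":
    x1, a1, x2, a2, y = simulate(5000)
    beta1, beta2 = q_learning(x1, a1, x2, a2, y)
    # Estimated rules: treat at stage 2 if beta2[5] + beta2[6] * x2 > 0,
    # and at stage 1 if beta1[2] + beta1[3] * x1 > 0.
    print("stage-2 contrast coefficients:", beta2[5:])
    print("stage-1 contrast coefficients:", beta1[2:])
```

In this sketch the treatment-free part of the stage-2 model is correctly specified by construction. The sensitivity described in the abstract arises when that part is misspecified in observational or non-uniformly randomized data, which is the setting that motivates estimating it with data-adaptive techniques.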


Ensembling Imbalanced-Spatial-Structured Support Vector Machine 

Speaker: Grace Yi, Western U, Ontario

Abstract: Support Vector Machine (SVM) and its extensions have found wide applications across various domains. However, these methods often face challenges when dealing with imbalanced data that exhibit spatial association. To address this issue, we propose a new approach called the Ensembling Imbalanced-Spatial-Structured Support Vector Machine (EISS-SVM) method. Our method not only considers the relationship between the response and predictors but also takes into account the spatial correlation that may exist in imbalanced data. The EISS-SVM classifier is designed to encompass the conventional SVM as a special case, allowing for a seamless transition between the two methods. We demonstrate the satisfactory performance of the proposed method through simulation studies. Furthermore, we employ the method to effectively handle the imaging data derived from an ongoing prostate cancer imaging study conducted by the Prostate Cancer Imaging Team at the University of Western Ontario, Canada. This is joint work with Xin Liu, Wenqing He and  Glenn Bauman. 

About the Speakers

Ali Ghodsi, University of Waterloo

Professor Ghodsi's current research spans a broad range of AI topics, including machine learning, deep learning, and dimensionality reduction. He studies theoretical frameworks and develops new machine-learning algorithms for analyzing large-scale data sets, with applications in natural language processing, bioinformatics, and computer vision. Dr. Ghodsi's work has been published extensively in high-quality proceedings and journals. He is a co-author of "Elements of Dimensionality Reduction and Manifold Learning" (Springer) and of several US patents. His popular lectures on YouTube have more than one million views. His areas of expertise are machine learning, deep learning, computational statistics, dimensionality reduction, natural language processing, and bioinformatics.


Qi Long, Ph.D., University of Pennsylvania
Professor of Biostatistics in Biostatistics and Epidemiology

Dr. Long's research purposefully includes novel statistical and machine learning research and impactful biomedical research, each of which reinforces the other. Its thrust is to develop robust statistical and machine learning methods for advancing intelligent and equitable health and medicine. Specifically, he has developed methods for analysis of big health data (-omics, EHRs, and mHealth data), predictive modeling, missing data, causal inference, data privacy, data and algorithmic fairness, Bayesian methods and clinical trials. Dr. Long’s methodological research has been supported by the National Institutes of Health, the Patient-Centered Outcomes Research Institute, and the National Science Foundation. Dr. Long has directed the Statistical and Data Coordinating Center for national research networks and large-scale multi-site clinical studies—supervising a team of database administrators and programmers, application developers and statistical analysts. He currently co-directs (with Dr. Nicola Mason at Penn Vet) the Coordinating Center for the Premedical Cancer Immunotherapy Network for Canine Trials (PRECINCT), part of NCI’s Cancer Moonshot Initiative. Dr. Long is the founding Director of the Center for Cancer Data Science, and Associate Director for Cancer Informatics of the Penn Institute for Biomedical Informatics. He also directs the Biostatistics and Bioinformatics Core in the Abramson Cancer Center at the University of Pennsylvania. Dr. Long is an elected fellow of the American Association for the Advancement of Science (AAAS), elected fellow of the American Statistical Association (ASA), and elected member of the International Statistical Institute (ISI).


Rob Strawderman, University of Rochester

My major research interests fall broadly into the area of survival analysis. I am particularly interested in semiparametric methods for missing and censored data, especially recurrent events, as well as statistical learning methods for risk and outcome prediction in biomedical and public health applications involving time-to-event outcomes subject to right censoring and/or competing risks. Other more recent research interests include statistical and computational methods for high-dimensional data and variable selection, with applications to semiparametric modeling and inference for dynamic treatment regimes and mediation analysis. Recent collaborative research interests include applications in cancer, heart disease, neurology and psychiatry.


Grace Yi, Western U, Ontario

Grace Y. Yi is a professor at the University of Western Ontario, where she currently holds a Tier I Canada Research Chair in Data Science. She is recognized as one of the influential women in Statistics. Professor Yi's research interests focus on developing methodology to address various challenges concerning Data Science, public health, cancer research, epidemiological studies, environmental studies, and social science. Professor Yi's recent research has centered on investigating machine learning and statistical methods to tackle problems concerning imaging data, missing data, measurement error in variables, causal inference, high-dimensional data, survival data, and longitudinal data. Dr. Yi has served the profession in various capacities. She was the Editor-in-Chief of The Canadian Journal of Statistics (2016-2018), and is currently the Editor of the Statistical Methodology Section for The New England Journal of Statistics in Data Science. She was the President of the Biostatistics Section of the Statistical Society of Canada in 2016 and the Founder of the first chapter (Canada Chapter, established in 2012) of the International Chinese Statistical Association. She takes on the Presidency of the Statistical Society of Canada for the period of 2020-2022.
