Essential Data Science for Business: Predictive Analytics and Machine Learning

November 4, 2020 1-4 pm

The tutorials in this NISS series cover the top 10 analytics approaches used in business today. Students and faculty: these are perhaps the ten most important and practical topics that may not be covered in your program of study. (Review the Overview Presentation about all 10 sessions.)

Predictive Analytics and Machine Learning

Overview

Predictive analytics and machine learning cover a large and highly varied arsenal of techniques. To make our precious three hours together more productive and fun, we focus on three commonly used supervised machine learning techniques that complement multiple regression and logistic regression: Classification and Regression Trees (CART), and two ensemble methods, Gradient Boosting and Random Forests. These methods were developed to address the limitations of traditional statistical modeling and to make good predictions when data are extremely messy. In this tutorial, we first introduce classification and regression trees, the single decision tree methodology that is the foundation for the two ensemble methods. We discuss how to construct a decision tree with the binary recursive partitioning algorithm and how to use K-fold cross-validation to select the best tree. After that, we fit a classification tree model to the heart data available at https://archive.ics.uci.edu/ml/index.php using Minitab Statistical Software (MSS) and identify important predictors of heart disease. We then turn to gradient boosting and random forests to improve on the prediction accuracy of a single decision tree. By carefully working through the two ensemble methods in this data example, we expose the distinctive strengths of each. Even though gradient boosting often produces better prediction results, there is no absolute winner between the two. Towards the end of the tutorial, we fit the gradient boosted classification model and the random forest model to the heart data with MSS and compare the analytical results from the three methods.
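For readers who want a feel for binary recursive partitioning and K-fold cross-validation before the session, here is a minimal Python sketch. It is not the MSS CART implementation and does not use the UCI heart data: the eight (age, cholesterol) rows are made up for illustration, all function names are hypothetical, and tree depth stands in for the tree-size choice that cross-validation is meant to guide.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) for the binary
    split that minimizes the weighted Gini impurity of the two children."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(rows, labels, max_depth=3):
    """Recursively partition until a node is pure or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:  # no feature has two distinct values left
        return {"leaf": majority}
    _, j, t = split
    left = [i for i, row in enumerate(rows) if row[j] <= t]
    right = [i for i, row in enumerate(rows) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], max_depth - 1),
    }

def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

def cv_accuracy(rows, labels, max_depth, k):
    """K-fold cross-validated accuracy: each fold is held out once."""
    n = len(rows)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th row goes to this fold
        train_rows = [rows[i] for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        tree = build_tree(train_rows, train_labels, max_depth)
        correct += sum(predict(tree, rows[i]) == labels[i] for i in test_idx)
    return correct / n

# Made-up toy data: (age, cholesterol) and a 0/1 disease label.
rows = [(29, 180), (35, 190), (41, 210), (45, 250),
        (52, 230), (58, 270), (61, 240), (67, 290)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Compare candidate depths by their cross-validated accuracy.
for depth in (1, 2, 3):
    print(depth, cv_accuracy(rows, labels, max_depth=depth, k=4))
```

CART as usually described selects a tree by pruning a large tree and cross-validating the pruned candidates; limiting depth, as done here, is a simpler substitute chosen to keep the sketch short.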

Materials

A PowerPoint document for the presentation will provide clear instructions, interspersed with demonstrations of the examples on Minitab Statistical Software (MSS) using CART commands. Commands for gradient boosting and random forest models inside MSS will be demonstrated and made available later; participants with access to MSS can use the CART commands on the sample data, or their own data.

Instructor

Yanling Zuo (Senior Advisory Statistician at Minitab LLC)

Goals

NISS is interested in sharing knowledge.  To this end, these webinars have been geared to provide practical information that you can use tomorrow. Examples, projects and code sharing are a part of these sessions wherever possible.

Series Prerequisites

Participants require a working knowledge of probability distributions, statistical inference, statistical modeling and time series analysis as a prerequisite. Students who do not have this foundation or have not reviewed this material within the past couple of years will struggle with the concepts and methods that build on this foundation.

Registration

Select a registration/payment option above the 'Register for this Event' button ($35/session, or $250 for all 10 Essential Data Science for Business tutorial sessions). NISS Affiliates (https://www.niss.org/affiliates-list), please send an email to officeadmin@niss.org.



About the Instructor

Yanling Zuo, PhD, is Senior Advisory Statistician at Minitab LLC. She has been the lead statistical designer of Minitab Statistical Software (MSS) since 2007. In partnership with MSS product managers, Yanling has led the creation of statistical feature development plans for every MSS release since release 15. Yanling is an expert in statistical methodology. Since joining Minitab in 1998, she has designed or enhanced more than 60 new and existing statistical commands inside MSS. She has written or reviewed several dozen technical papers for MSS users. Yanling's current research interests are Statistical Computing for Large Data and Bayesian Hierarchical Modeling. Lately, while designing six tree-based commands inside MSS, she has been extremely impressed by the flexibility and power of tree-based methods and has started to do research in this area. Yanling has given invited presentations at the Joint Statistical Meetings, the Fall Technical Conference, and the Quality and Productivity Research Conference. She has published research results in peer-reviewed journals and a book. Yanling is a member of ASA and ASQ.


Host

National Institute of Statistical Sciences

Sponsor

National Institute of Statistical Sciences

Cost

$35 for this session; $250 for all 10 Data Science Sessions

Location

Online Tutorial
United States