Jim Harner Leads First Session Loaded with Methods and Examples Describing Data Science Workflows

Instructor Jim Harner (WVU) responds to a participant's question as he works through a k-means procedure using R and Spark.

October 7, 2020

This was the first of ten tutorials in the NISS Essential Data Science for Business series, which covers ten of the key analytics approaches used in business today!  Students and faculty, please note: these are perhaps the ten most important and practical topics that may not be covered in your program of study.  (Review the Overview Presentation about all 10 Sessions).

Dr. Jim Harner (Professor Emeritus of Statistics at West Virginia University) certainly came prepared with lots of examples, the data behind those examples, and the code he uses to analyze them!  If you were looking for a way to immerse yourself in the different ways you might go about managing your data, then this is a session that not only touches on many topics but is also backed by a host of relevant resources!

The Topics

Jim separated the 3-hour tutorial into three sections.  The first section reviewed fundamentals such as what workflows are, the implications of various data structures, and what basic functions do, and then moved on to an introduction to machine learning.

The second section of the tutorial focused on dealing with data.  Jim talked about and demonstrated various data transformations, as well as methods for data manipulation and data cleaning.  He then went on to implement these concepts using both Hadoop and Spark processing techniques.

In the third section, Jim’s comments and examples first focused on supervised learning, reviewing methods ranging from regression and regularization to tree models.  He followed this by talking about unsupervised learning with methods such as k-means.

Links to Resources

As you might imagine, this session was filled to the brim with details.  Some might say that you could have taken a week or more to touch on what was covered here.  However, not to worry.  As was commented in a post-session evaluation: “well written materials, Rmarkdown files,” “The materials are amazing, self-contained and useful for further learning.” “All the resources that were given to us beforehand as well as the books and websites to check out. I find very helpful as I am still learning how to use these tools.”

Access to Materials

If you were not able to attend this live session, you can still access a recording of the session along with links to all of the resources.  Use the Registration Option "Post Session Access" on the event webpage, pay the $35 fee, and NISS will provide you with access to the materials for this session.  Or register for the full series of ten tutorials, and we will provide the link as well.

What’s Up Next?

The next NISS Essential Data Science for Business tutorial is scheduled to take place on Wednesday, October 21, 2020.  Lee Wilkinson (H2O, and University of Illinois at Chicago) will be the instructor for the next topic, “Descriptive Analytics, Exploratory Data Analysis, and Data Visualization.”  Register today!

Further schedule dates/topics include:  

Thursday, October 8, 2020 by Glenn Johnson