What Should Be In the Modern Statistician’s Toolkit? (March 31, 2021)

Mine Cetinkaya-Rundel (University of Edinburgh, Duke University, RStudio) highlights a number of packages in the tidy R ecosystem.

Mine Cetinkaya-Rundel (University of Edinburgh, Duke University, RStudio) lists ways GitHub can be used to connect and collaborate with others.

Mine Cetinkaya-Rundel (University of Edinburgh, Duke University, RStudio) responds to questions relayed to her by moderator Analisa Flores (University of California, Riverside).

The session was organized and hosted by both the Academic Affiliates Committee and the NISS Graduate Student Network. Several earlier webinars focused on how to get a job in academia, industry or government, but this one took a different angle. If someone is preparing for a position in any of these sectors, what tools should a statistician have in their ‘back pocket’? What do these tools do, and why is it important to have these skills? These must have been good questions, because nearly 100 attendees logged into this session!

Mine Cetinkaya-Rundel (University of Edinburgh, Duke University, RStudio) was the main speaker. Her goal for this webinar was to give an overview of the tools she considers most valuable and to explain why it is worth investing the time to learn them. Most of her suggestions related to packages or capabilities in R, as this is her main area of expertise and R is used extensively by statisticians and data scientists.

Mine started with an overview of the R packages that make up tidyverse. She explained the basic functions tidyverse provides and used a simple example to show the advantage of moving from a typical nested programming approach to a piped one, which is easier to construct and read. Mine next demonstrated ggplot2 for data display. Using a simple example, she showed how easily a user can customize a plot. It was clear that she only scratched the surface of ggplot2’s capabilities. From there Mine walked through tidyr for manipulating data and tidymodels for modeling and machine learning, then offered a sample from the ecosystem of tidy packages that have been developed (and continue to be developed) around tidyverse.
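To make the nested-versus-piped contrast concrete, here is a minimal sketch in R. It is not the example Mine showed in the session: the built-in `mtcars` data set, the `hp > 100` filter and the `mean_mpg` column name are assumptions chosen purely for illustration, and the code assumes the dplyr package (part of tidyverse) is installed.

```r
# Illustrative only: data set, filter threshold and column names are
# assumptions, not taken from the webinar slides.
library(dplyr)

# Nested style: the steps must be read from the inside out.
nested <- arrange(
  summarise(
    group_by(filter(mtcars, hp > 100), cyl),
    mean_mpg = mean(mpg)
  ),
  desc(mean_mpg)
)

# Piped style: the same steps, written in the order they are performed.
piped <- mtcars %>%
  filter(hp > 100) %>%        # keep cars with more than 100 horsepower
  group_by(cyl) %>%           # group by number of cylinders
  summarise(mean_mpg = mean(mpg)) %>%  # average fuel economy per group
  arrange(desc(mean_mpg))     # sort from most to least efficient

# Check that both styles give the same result.
all.equal(nested, piped)
```

Both versions perform the same four steps. The piped form simply lets a reader follow them top to bottom, which is the readability gain described above.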

Mine’s next point for up-and-coming statisticians was the importance of communicating with others. She first commented on the value that rmarkdown and its companion packages add in organizing and sharing the work done within individual R projects. Being able to see and document clearly what is going on is critical not only for sharing but also for getting help when you are stuck. From there Mine reviewed Git and GitHub, presenting GitHub as a community to get to know and become a part of. She listed a number of examples of how its collaborative tools and repositories can be used to connect with this community of developers.

The final two slides of her presentation were filled with lists of connections and resources.  This certainly will help you get started building your own toolkit!

A good amount of time was given to answering questions, and moderator Analisa Flores (University of California, Riverside) was kept busy fielding questions from attendees and sharing them with Mine. Mine was very good at providing well-thought-out and succinct answers.

Questions included: “Based on your experience, which software is better for us for work in with big data processing/ ML algorithm/ data wrangling, Python or R? Why?”, “Is it important to be proficient in python if you are proficient in R? Is python a necessity to be professionally connected to machine learning research?”, “When choosing between tidyverse functions and functions like apply() that comes with basic R, do we need to think about computational speed? In other words, if I am working with a very big data set, should we use one approach over the other?”, and “What are some general themes modern statisticians should keep in mind when looking at career growth? Are there particular hard and/or soft skills to focus on beyond being able to communicate your analysis well?”  Great questions – and even better responses!  

If you are interested in maximizing the skills that you can bring to the table, either in your current position or for a position that you plan to apply for, a review of this session and an exploration of its many links to resources is essential. The recording of the session and the slides that Mine used are available below. The slides not only cover the key points of the talk but also include many links and suggestions for additional resources that should not be ignored!

Recording of the Session

Slides used by the Speaker

Mine Cetinkaya-Rundel (University of Edinburgh, Duke University, RStudio)

download pdf: "Toolkit for the Modern Statistician"

Wednesday, March 31, 2021 by Glenn Johnson