About 25 individuals, mostly from government agencies in and around Washington, D.C., recently came together for a two-day workshop taught by Dr. James Harner at the Bureau of Labor Statistics' conference and training center.
The workshop focuses on the data science process known as ETL: extracting data from source systems, transforming it into a tidy form, and loading it into distributed file systems, distributed data warehouses, and NoSQL databases. The workflow uses the SparkR and sparklyr packages as front ends to Spark from R. These packages also provide the interfaces for modeling big data using supervised learning methods for regression and classification. These methods and others are covered through a rich set of examples. An encapsulated computing environment containing R, Spark, Hadoop, etc. is available to attendees so they can interact with the code locally or in Amazon’s cloud.
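To make the extract, transform, and load steps concrete, here is a minimal sketch of the pattern. It is not taken from the workshop materials, which use R with SparkR and sparklyr against Spark. As assumptions for illustration only, it uses Python's standard library, a small made-up CSV as the source system, and an in-memory SQLite database standing in for a distributed store. The names `RAW_CSV`, `extract`, `transform`, `load`, `run_etl`, and the `measurements` table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical wide-format source data: one column per year.
RAW_CSV = """region,2016,2017
east,10,12
west,7,9
"""


def extract(text):
    """Extract: read the source rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: reshape wide rows into tidy (region, year, value) records."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key != "region":
                tidy.append((row["region"], int(key), float(value)))
    return tidy


def load(records, conn):
    """Load: write the tidy records into a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements "
        "(region TEXT, year INTEGER, value REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three steps in order and return the database connection."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    connection = run_etl()
    query = "SELECT region, year, value FROM measurements ORDER BY region, year"
    for record in connection.execute(query):
        print(record)
```

In this sketch the tidy form has one observation per row, so each region and year pair becomes its own record before loading. In a Spark setting the same three steps would run against distributed data rather than an in-memory table.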
"R & Spark: Tools for Data Science Workflows" continues to be one of NISS's more popular workshops, due in large part to the role that Dr. Harner has played as instructor. A couple of his recent workshop participants commented:
"The instructor did a great job of explaining the big picture of how workflows are becoming different, and tying that to concerns of statisticians."
"The ability of the instructor to provide an understanding of the complex interaction of software needed while presenting real examples to help solidify the run. The examples provided a wealth of useful information regarding easier ways to use R, etc., and lots, apparently, of experience-honed knowledge!"
To date, NISS has sponsored this workshop six times: at UC Riverside, with CANSSI in Toronto, Canada, with SAMSI in North Carolina, and three times in the DC area (twice at the Bureau of Labor Statistics and once at ASA Headquarters). So, where should we send Dr. Harner next?
If your institution is interested in hosting this workshop in your area, please contact NISS. If you have ideas for other workshop topics or instructors that would be of interest to NISS affiliates, let NISS know. Contact Randy Freret at firstname.lastname@example.org.