From Code to Products: Dr. Alex Reinhart Leads Insightful NISS Software Engineering Short Course for Data Scientists

Event Page: Short Course: Alex Reinhart: "From Code to Products - Software Engineering for Data Science" 

Date: Thursday, April 3, 2025 at 1-3pm ET 

On Thursday, April 3, 2025, from 1 to 3 p.m. ET, the NISS New Researchers Network hosted a virtual short course titled "From Code to Products: Software Engineering for Data Science," led by Dr. Alex Reinhart, assistant teaching professor at Carnegie Mellon University. The two-hour session, moderated by Jason Cho, covered foundational software engineering practices tailored to data science and statistical programming.

Dr. Reinhart opened the session by emphasizing the growing complexity of data analysis projects and the corresponding need for structured software engineering principles. Using real-world examples, including a large-scale survey data processing project, he illustrated the importance of modular code design. By breaking tasks into smaller, reusable components with clear inputs and outputs, teams can reduce redundancy, make code easier to maintain, and collaborate more effectively.
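As an illustration of the modular design described above, here is a minimal Python sketch. It is not taken from the course materials: the function names, the "region" and "score" fields, and the sample rows are all hypothetical, chosen only to show small components with clear inputs and outputs.

```python
from statistics import mean


def clean_responses(rows):
    """Drop rows with a missing score and convert remaining scores to float.

    Input: list of dicts with hypothetical "region" and "score" keys.
    Output: new list of dicts; the input list is not modified.
    """
    return [
        {**row, "score": float(row["score"])}
        for row in rows
        if row.get("score") not in (None, "")
    ]


def summarize_by_region(rows):
    """Return a dict mapping each region to its mean score."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["score"])
    return {region: mean(scores) for region, scores in groups.items()}


def run_pipeline(rows):
    """Compose the steps; each step can be tested and reused on its own."""
    return summarize_by_region(clean_responses(rows))


# Hypothetical survey rows, used only to exercise the pipeline.
sample_rows = [
    {"region": "A", "score": "1"},
    {"region": "A", "score": "3"},
    {"region": "B", "score": ""},
    {"region": "B", "score": "4"},
]
summary = run_pipeline(sample_rows)
```

Because each function takes plain data in and returns plain data out, a cleaning rule can change without touching the summary step, and either piece can be reused elsewhere.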

One key theme throughout the presentation was documentation. Dr. Reinhart stressed that clear, consistent documentation ensures that code remains usable over time, not just for team members but also for the original authors revisiting their work. He showcased documentation techniques in both R and Python, including roxygen2 and Python docstrings, and advocated for adherence to established style guides such as the tidyverse style guide (R) and PEP 8 (Python).
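For readers unfamiliar with Python docstrings, the sketch below shows one common way to document a function. The function itself (weighted_mean) is a hypothetical example rather than anything shown in the course, and the section layout inside the docstring is one widely used convention among several.

```python
def weighted_mean(values, weights):
    """Compute the weighted mean of a sequence of numbers.

    Parameters
    ----------
    values : sequence of float
        The observations to average.
    weights : sequence of float
        Non-negative weights, one per observation.

    Returns
    -------
    float
        The sum of values[i] * weights[i] divided by the sum of weights.

    Raises
    ------
    ValueError
        If the sequences differ in length or the weights sum to zero.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

The docstring is available at runtime through help(weighted_mean), so the explanation travels with the code rather than living in a separate file.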

Testing also played a central role in the course. Dr. Reinhart discussed unit testing, the philosophy of test-driven development, and the use of frameworks like pytest (Python) and testthat (R). He introduced generative testing as a way to manage the inherent randomness of stochastic functions and explained how frequent, automated testing helps prevent bugs and improves code quality. 
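To make the testing ideas concrete, here is a minimal sketch of a conventional unit test next to a hand-rolled generative test. It is an illustration under stated assumptions, not the course's own example: the standardize function is hypothetical, and the generative test uses only Python's standard random module with a fixed seed, whereas dedicated property-based testing libraries offer richer input generation. The test functions follow the test_ naming convention so that pytest would collect them.

```python
import math
import random
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("values must not all be equal")
    return [(v - mu) / sigma for v in values]


def test_standardize_known_case():
    # Unit test: one hand-picked input with an easily checked answer.
    result = standardize([1.0, 2.0, 3.0])
    assert math.isclose(result[1], 0.0, abs_tol=1e-12)
    assert math.isclose(result[0], -result[2])


def test_standardize_properties():
    # Generative test: many random inputs, checking properties that must
    # hold for every input instead of specific expected values.
    rng = random.Random(0)  # fixed seed keeps any failure reproducible
    for _ in range(200):
        n = rng.randint(2, 50)
        values = [rng.gauss(0, 10) for _ in range(n)]
        result = standardize(values)
        assert len(result) == n
        assert math.isclose(mean(result), 0.0, abs_tol=1e-9)
        assert math.isclose(pstdev(result), 1.0, rel_tol=1e-9)
```

Running pytest on a file containing these functions would execute both tests; the same command can be run automatically on every change.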

Another crucial skill covered was version control. Dr. Reinhart walked attendees through the use of Git and GitHub, demonstrating how to manage changes, collaborate through branching and merging, and maintain a project’s history with logical commits. He briefly introduced continuous integration systems, which automatically run tests and enforce software standards with each update. 

The talk concluded with a candid discussion about common mistakes data scientists make in large software projects. Drawing on lessons from the COVIDcast project, Dr. Reinhart emphasized the long-term value of documentation, testing, data dictionaries, and treating code as a product rather than just a tool to produce results. He also addressed how to cultivate a healthy coding culture: one that balances accountability with encouragement, supports continuous improvement, and favors root-cause analysis over blame.

Acknowledgment: 

NISS extends its sincere thanks to Dr. Alex Reinhart for his engaging and informative presentation, to moderator Jason Cho for guiding the session, and to the NISS New Researchers Network Committee for organizing this valuable professional development opportunity for early-career researchers and data scientists.

Post-event access:

The recording of this short course will not be made publicly available until a later date, as it was a paid event; however, all registrants have been provided with access to the recording as part of their registration. 

Thursday, April 3, 2025 by Megan Glenn