Software Analytics in Action: A Hands-on Tutorial on Analyzing and Modelling Software Data

Speaker: Dr. Chakkrit (Kla) Tantithamthavorn


Software analytics focuses on analyzing and modeling rich sources of software data using well-established data analytics techniques in order to glean actionable insights for improving development practices, productivity, and software quality. However, if care is not taken when analyzing and modeling software data, the predictions and insights derived from analytical models may be inaccurate and unreliable. The goal of this hands-on tutorial is to guide participants on how to (1) analyze software data using statistical techniques such as correlation analysis, hypothesis testing, effect size analysis, and multiple comparisons, (2) develop accurate, reliable, and reproducible analytical models, (3) interpret the models to uncover relationships and insights, and (4) recognize pitfalls associated with analytical techniques, with hands-on examples using real software data. R will be the primary programming language. Code samples will be available in a public GitHub repository. Participants will complete the exercises in RStudio.

Bio: Dr. Chakkrit (Kla) Tantithamthavorn is a lecturer in the Faculty of Information Technology, Monash University, Australia. He is an emerging expert in the areas of Explainable Software Analytics and Software Quality Management, having advanced the foundations of empirically grounded software quality theories and many intelligent technologies for software quality management. His research aims to develop technologies that enable software practitioners to produce the highest quality software systems at the lowest cost. Currently, his research focuses on inventing practical and explainable analytics to prevent future software defects. He is best known as a lead instructor at MSR Education 2019 on Guidelines and Pitfalls for Mining, Analyzing, Modelling, and Explaining Software Defects, and as the author of the ScottKnott ESD R package (i.e., a statistical mean comparison test), which has more than 8,000 downloads. More about him is available at