2021 ADC – Databases as Statistical Backend Computing
Professor Thomas Lumley, The University of Auckland
Statistical estimation and inference tend to combine simple matrix or table operations on large data sets with complex, task-specific calculations on the small summaries extracted by those operations. This division of labour matches the division between modern database systems, which handle the large operations efficiently, and statistical programming environments, which are flexible but inefficient. I will discuss the use of R and lazy data backends to combine rapid prototyping, flexible programming, and the ability to use large data sets. I will also talk about the use of statistical sampling theory to further reduce the computational burden.
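As a rough illustration of this division of labour, the sketch below lets a database compute a handful of sums over a large table in a single pass, then performs the small, task-specific calculation (a simple least-squares fit) on those five numbers. It is not taken from the talk: it uses Python's standard sqlite3 module as a stand-in for the R and lazy-backend tooling the talk describes, and the table name obs, the columns x and y, and the function names are all hypothetical.

```python
import random
import sqlite3


def fit_simple_regression(conn):
    """Fit y = a + b*x using only summaries computed inside the database."""
    # Large-data step: one pass over the (hypothetical) table `obs`,
    # done by the database engine rather than by the host language.
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
    ).fetchone()

    # Small-data step: closed-form least squares on five numbers.
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def demo():
    """Build an in-memory table of exactly linear synthetic data and fit it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y REAL)")
    rng = random.Random(0)
    xs = [rng.uniform(0.0, 10.0) for _ in range(1000)]
    # Synthetic data generated as y = 2 + 3x, purely for illustration.
    conn.executemany(
        "INSERT INTO obs VALUES (?, ?)", [(x, 2.0 + 3.0 * x) for x in xs]
    )
    result = fit_simple_regression(conn)
    conn.close()
    return result
```

Only the five aggregates ever leave the database, so the host language never needs to hold the full table in memory. The same pattern extends to multiple regression by having the database accumulate the cross-product matrix.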
Thomas Lumley is Professor of Biostatistics in the Department of Statistics at the University of Auckland. He obtained his PhD in Biostatistics from the University of Washington, Seattle. He has wide-ranging research interests in theoretical and applied biostatistics and in statistical computing, recently including large-scale genomics and estimation based on subsamples of large data sets. Thomas is a Fellow of the Royal Society of New Zealand and of the American Statistical Association, and has been an R Core developer since 1997.