Tukey ( Figure 1-1) called for a reformation of statistics in his seminal paper “The Future of Data Analysis”. This chapter focuses on the first step in any data science project: exploring the data.Įxploratory data analysis, or EDA, is a comparatively new area of statistics.Ĭlassical statistics focused almost exclusively on inference, a sometimes complex set of procedures for drawing conclusions about large populations based on small samples. The main goal of this book is to help illuminate these concepts and clarify their importance-or lack thereof-in the context of data science and big data. These and many other statistical concepts live largely in the recesses of data science. Introducing key ideas of experimental design and maximum likelihood estimation.
Fisher, in the early 20th century, was a leading pioneer of modern statistics, Modern statistics as a rigorous scientific discipline traces its roots back to the late 1800s and Francis Galton and Karl Pearson. In contrast to the purely theoretical nature of probability, statistics is an applied science concerned with analysis and modeling of data. Probability theory-the mathematical foundation for statistics-was developed in the 17th to 19th centuries based on work by Thomas Bayes, Pierre-Simon Laplace, and Carl Gauss. As a discipline, statistics has mostly developed in the past century.