I have found it very hard to work out what the term "data science" means, but I think I have solved it now.

The scales fell from my eyes after reading John Tukey's famous 1962 article "The future of data analysis":

http://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711

I found that article through David Donoho's 2015 commentary "50 years of data science":

http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf

I had also read and enjoyed the Berkeley textbook for the undergraduate course "Foundations of Data Science":

https://www.inferentialthinking.com/index.html

Here is the key quote from the Tukey article:

All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.

"Data analysis" makes a lot of sense as a term for a way of thinking about data that is not the same as statistics or computer science. Tukey makes the case that skill in data analysis is learned by experience. A skillful data analyst looks carefully at the data and uses its structure to think of algorithms that will lead to sound conclusions. These algorithms will often be tailored to the data and have properties that may, for now, be hard to explore with formal mathematical proof, but they will lead to new insight and, often, new mathematics.

I believe that many people talking about "data science" are not, in fact, talking about a new branch of scientific inquiry, but about an approach to data that Tukey would have recognized as "data analysis".

The Google+ URL for this post was https://plus.google.com/+MatthewBrett/posts/Xc3EaBEikTC

Category: G+ archive