I have found it very hard to work out the meaning behind the term
"data science", but I think I have solved it now.
The scales fell from my eyes after reading John Tukey's famous 1962
article "The future of data analysis":
http://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711
I reached that article from David Donoho's 2015 commentary "50 years of
data science":
http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf
I had also read and enjoyed the Berkeley textbook for the undergraduate
course "Foundations of Data Science":
https://www.inferentialthinking.com/index.html
Here is the key quote from the Tukey article:
"All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."
"Data analysis" makes a lot of sense as a term for a way of thinking
about data that is not the same as statistics or computer science.
Tukey makes the case that skill in data analysis is something learned by
experience. A skillful data analyst looks carefully at the data and uses
the structure of the data to think of algorithms that will lead to sound
conclusions. These algorithms will often be tailored to the data, and
have properties that may, for now, be hard to explore with formal
mathematical proof, but they will lead to new insight, and often, new
mathematics.
I believe that many people talking about "data science" are not, in
fact, talking about a new branch of scientific inquiry, but about an
approach to data that Tukey would have recognized as "data
analysis".
The Google+ URL for this post was
https://plus.google.com/+MatthewBrett/posts/Xc3EaBEikTC