Industry as the origin of "data science".

Leo Breiman was a statistician with a particular interest in algorithmic models. Among other things, he invented the random forest method. Breiman took a first degree in physics and a PhD in mathematics, then worked for seven years as an academic probabilist. He resigned his academic position and worked for a further 13 years as a freelance consultant in data analysis, before returning to academia as a statistician at UC Berkeley. In his 2001 paper "Statistical Modeling: The Two Cultures", he describes what he learned from his time as a consultant:

As I left consulting to go back to the university, these were the perceptions I had about working with data to find answers to problems:

(a) Focus on finding a good solution - that's what consultants get paid for.
(b) Live with the data before you plunge into modeling.
(c) Search for a model that gives a good solution, either algorithmic or data.
(d) Predictive accuracy on test sets is the criterion for how good the model is.
(e) Computers are an indispensable partner.

Leo Breiman (2001). "Statistical Modeling: The Two Cultures". Statistical Science 16(3), 199–231.

It seems to me that this is a manifesto for what is now called "data science". I wonder whether this is one case where industry has injected urgency and rigor into the process of analysis.

The Google+ URL for this post was https://plus.google.com/+MatthewBrett/posts/K9LxmTc66b5



