This post links to some evidence that data science in industry may be over the peak of the hype cycle.
Here is the hype cycle [1]:
We may be somewhere mid-way between the peak of inflated expectations and the trough of disillusionment.
Where is data science on the hype cycle? (Stefan Groschupf)
Stefan Groschupf is a “big data veteran” who co-founded and ran Datameer, a company that sells a data analytics platform for industry.
He has a Dataversity video blog post from June 2017 titled “Where is Data Science in the Hype Cycle?”
I think we’re over the peak. It’s very clear that a lot of companies realized that this hype of bringing in data scientist teams and building huge data scientist teams is not the solution for every problem. … We saw that everybody was very excited and data science solved every problem including printing money and making the dishes…
The credibility crisis in data science (Skipper Seabold)
Skipper Seabold is one of the main authors of the Python statsmodels package. He is now director of data science at Civis Analytics, a data science consultancy. He gave a talk at the 2018 PyData conference called “What’s the Science in Data Science?”, where his abstract starts:
We will:
- Understand “the credibility revolution in economics” and draw parallels to the current and coming credibility crisis in data science
He then did a March 2019 interview for the DataCamp podcast, about “The Credibility Crisis in Data Science”. His main point was that he was starting to hear that CEOs with data science teams were wondering what these teams were for:
One thing that I’ve heard just in working here at Civis is like you say from a CEO of a very large company that every one would know, if I mentioned who they were. I mean it’s basically saying if I have all these data scientists, I have hundreds of data scientists and I have no idea what the fuck they do all day. …
I think decision makers don’t know what data scientists are capable of. They don’t know how to communicate with them what the business needs are. And I don’t think data scientists necessarily know how to describe the value of what they are doing in terms that the business can understand. Or also even focus their efforts on things that will have an impact for the business, like going out and understanding what it is that will actually make a difference and then making sure you’re doing that.
No data science job titles by 2029 (Noah Gift)
See this February 2019 Forbes article.
The coming Trough of Disillusionment with data science job titles will be the following:
- Many data science teams have not delivered results that can be measured in ROI by executives.
- The excitement of AI and ML has temporary led people to ignore the basic question: What does a data scientist actually do?
- For complex data engineering tasks, you need five data engineers for every one data scientist.
- Automation is coming for many tasks data scientists perform, including machine learning. Every major cloud vendor has heavily invested in some type of AutoML initiative.
I doubt that the last claim is true, unless we somehow reach the surprising stage where it is no longer necessary to understand what the data means or how the analysis works.
Data science is different now (Vicki Boykis)
Vicki Boykis is a senior manager in data science and engineering at CapTech.
She wrote a February 2019 blog post to record her worry that the job title “data scientist” had been oversold, with the result that there are not enough jobs for people who have trained to be a “data scientist”.
She complains about the hype around “data science”:
Unfortunately, what has not changed is the mass media hype around the field of data science, which has trumpeted data scientist as the ‘sexiest career of the 21st century’ so many times, that there is now what I believe to be an important problem that we as a community need to talk about. That problem is an oversupply of junior data scientists hoping to enter the industry, and mismatched expectations on what they can hope to find once they do get that coveted title of “data scientist.”
Boykis goes on to list evidence and arguments that the market for people with the job title “data scientist” will shrink, and that there are too many people training to be a “data scientist”. She calls this a “data science supply bubble”.
This is purely anecdotal evidence, so take it with a large grain of salt. But, based on my own participation as a resume screener, mentor to data scientists leaving boot camps, interviewer, interviewee, and from conversations with friends and colleagues in similar positions, I’ve developed an intuition that the number of candidates per any given data science position, particularly at the entry level, has grown from 20 or so per slot, to 100 or more. I was talking to a friend recently who had to go through 500 resumes for a single opening.
Avoiding the data science hype bubble (Josh Poduska)
Josh Poduska is the chief data scientist at Domino Data Lab. He has a master’s in applied statistics from Cornell.
He wrote a blog post on Avoiding a Data Science Hype Bubble in June 2018.
It starts:
The noise around AI, data science, machine learning, and deep learning is reaching a fever pitch.
He complains that the field has made little effort to define terms like “AI”, “machine learning”, or “data science”.
This has consequences. Two consequences include the creation of a hype-bubble that leads to unrealistic expectations and an increasing inability to communicate, especially with non-data science colleagues. …
The frequent overuse of “AI” when referring to any solution that makes any kind of prediction has been a major cause of this hype. Because of frequent overuse, people instinctively associate data science projects with near perfect human-like autonomous solutions. Or, at a minimum, people perceive that data science can easily solve their specific predictive need, without any regard to whether their organizational data will support such a model.
He offers some sober definitions of “AI”, “machine learning” and “data science”, and concludes:
I think we can all agree that there is too much hype in our industry today, especially around AI. Each of us has seen how this hype limits real progress. I argue that a lot of the hype is from misuse of the terms of data science.
What should we do?
It is very surprising to me that a large proportion of people working in or around data science do not have a good definition for the term, and do not appear interested in finding one. It feels as if the standard answer to “What is data science?” or “What is AI?” is something close to “It’s whatever you want it to be.” This must be for the same reasons that Josh Poduska discusses: “data science” and “AI” are selling well, for now, so there is little apparent cause to slow down and consider whether we are heading in the right direction, and at the right speed. I think he’s right that if we don’t do some hard reflection, very soon, we are going to find ourselves doing a face-plant on gravel as we fall from the bursting bubble.
[1] Graphic by Jeremykemp at English Wikipedia, CC BY-SA 3.0.