Hadley Wickham gave a talk titled “You can’t do data science in a GUI”.

He summarizes his argument as:

“What I believe in, is code as the primary artifact from a data analysis”.

Programming languages let you express your analysis concisely in words that you, your readers, and the computer can all understand:

[programming languages] give you a language to express your ideas; they give you very few constraints, which makes life tough if you are learning, or if you only do data science occasionally, but the payoff for investing in a programming language, is you get this new language in which you can express your thoughts.

He goes on to emphasize the value of expressing your analysis in text:

Now the other thing that I think is really great about programming languages is that you interact with a programming language with code, and code is just text.

And there are two amazingly powerful workflows that text gives you.

The first workflow is copy and paste. … it is an incredibly powerful strategy to repeat yourself …

And the other great workflow it gets you is StackOverflow. Because code is just text, you can dump your error message, you can stick your error message into Google, and Google will lead you to StackOverflow, which will solve your problem. … Because code is just text this means you can put it in an email, like you can tweet it, you can Google it. …

There is also a great set of tools around the provenance of text.

Provenance tools for text allow your analysis to be:

  • Reproducible
  • Diffable
  • Readable
  • Open
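
To make the "diffable" point concrete, here is a minimal sketch using Python's standard-library difflib module. The two script versions and the file names analysis_v1.py and analysis_v2.py are invented for illustration and do not come from the talk; the scripts are only compared as text, never run.

```python
import difflib

# Two hypothetical versions of an analysis script (invented for illustration).
# They are treated purely as text; nothing inside them is executed.
old_script = """\
data = load("survey.csv")
clean = data.dropna()
print(clean.mean())
"""

new_script = """\
data = load("survey.csv")
clean = data.dropna(subset=["age"])
print(clean.mean())
"""


def diff_scripts(old: str, new: str) -> list[str]:
    """Return a unified diff of two script versions, one entry per line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="analysis_v1.py",  # hypothetical file name
            tofile="analysis_v2.py",    # hypothetical file name
            lineterm="",
        )
    )


changes = diff_scripts(old_script, new_script)
print("\n".join(changes))
```

Because the analysis is plain text, the single changed step shows up as one removed line and one added line. Version-control tools such as git apply the same idea across a whole project's history.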

At the end, someone asks him about visual pipelining tools for data analysis:

I really dislike those kind of like pipelining programs where you like solve a problem by like drawing dragging things together. And I think I dislike them because … what they try and sell you is that the hard part is typing the code.

The hard part is not typing the code, the hard part is figuring out which input should be connected to which output and which components you need …

They don’t make the problem that much easier, because you’ve still got all this flexibility, but because you don’t get code, you lose like all of the benefits of all the like code tools that software engineers have spent the last 50 years developing … they make the easy problem easier, and make the hard problem harder.
