The software carpentry movement has the aim of teaching scientists how to use
It is important to teach this so that we can produce work of high quality,
and so that our work is more likely to be transparent enough for someone else
to use it, and reproduce it.
Once we agree that it is useful to teach this, we still have to decide what we
are going to do about the following problem:
Difficult, tiring and confusing
Maintaining scientific software on your computer is a difficult, tiring and
confusing task. It needs a combination of commitment and skill in solving
problems where the solution may be hard to find and understand, and where
there is a great deal of conflicting advice.
Two different approaches to teaching
Approach 1: try to keep students away from the problem
Here, you design your course setup and course materials so that you avoid
confusion or ambiguity in installing or maintaining the software.
For example, you might make all the students use Jupyter notebooks hosted on a
central server, so they don't use their own computers at all, except as a
client via their web-browser.
You might use Wakari to do the same kind of thing.
Less comprehensively, you might insist or suggest that everyone use the same
Python distribution and packages, which will usually be Anaconda, so at least
everyone has roughly the same setup on their computer. These distributions
are usually easy to install and use, but they don't protect you from later
confusion and pain when you need non-default packages.
The advantage of this approach is that you get to spend your time teaching the
stuff you are interested in, instead of struggling with the complexity of
individual user installs.
The disadvantage is that, as soon as the student leaves the class and starts
on the road to maturity, they will hit the problems that you have shielded
them from. No matter what you told them in the class, they will conclude that
this work is much harder than you claimed, or that they are deficient and
should give up.
Approach 2: drop students in it and help them out again
The other approach is to treat the problem of installation and maintenance as
one that the students will have to learn to face. They need to live in this
world of confusion and ambiguity, and learn the skills to survive and
There are two obvious difficulties with this approach.
The first is that you will have to spend class time struggling with
installation and maintenance problems that might well be difficult, tiring
and confusing. The time that you give to this, you must take from other
The second is that it is hard to teach students these skills. It is even hard to
explain what those skills are.
What's the best way?
When we decide what to do, we have to agree on a goal. For example, your goal
might be to teach "data science". The students should
leave the class with a better understanding of issues in data science. If
that is your entire goal, then you may not worry that the students will soon
stop using the tools that you have taught them. No-one will blame you for
On the other hand, your goal might be more broad. You might want to make the
students better at data science in the long term - long after the students
have left your class. In that case, you will worry about students being
unable to continue using the tools after the end of the class.
I am personally more interested in the broader goal. That is, I would like to
teach in a way that makes it most likely that students will become mature
users of the tools they learn in class. I believe that forces me towards
approach 2. That is, I really do need to teach the students how to deal with the
difficult, tiring and confusing problem of maintaining their software.
How should we deal with the issues of lost class time, and teaching the
unteachable skill of learning within confusion? I believe the only way to do
this is to teach by example. That is, we have to give the students something
similar to standard installation advice - the kind of advice that we would
give our own graduate students getting started. Then we have to sit down with
the students who run into problems, and suffer with them for a while. We show
them how we try to solve the problem: we do our diagnostics, we check
Stack Overflow, we look at the command help and man pages, and we work it out
with them. Yes, this is very hard to do with large classes or MOOCs. With
smaller classes, or with a reasonable number of teaching assistants, I believe
this is practical.
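
To make "we do our diagnostics" a little more concrete, here is a minimal
sketch of the kind of first check I might run with a student: which Python are
we actually running, and where is it finding its packages? The function name
`describe_environment` and the package names in the example are my own
illustration, not part of any standard recipe, and a real session would go
wherever the error messages lead.

```python
import importlib.util
import platform
import sys


def describe_environment(package_names):
    """Collect basic facts about the running Python and some packages.

    `package_names` should be top-level import names (hypothetical examples
    below). A package that cannot be found is reported as None.
    """
    report = {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "packages": {},
    }
    for name in package_names:
        # find_spec locates the package without importing it.
        spec = importlib.util.find_spec(name)
        report["packages"][name] = spec.origin if spec is not None else None
    return report


if __name__ == "__main__":
    # Example package names only; substitute whatever the course uses.
    info = describe_environment(["numpy", "scipy", "matplotlib"])
    print("Python", info["python_version"], "at", info["executable"])
    print("Platform:", info["platform"])
    for name, location in info["packages"].items():
        print(f"{name}: {location or 'NOT FOUND'}")
```

Very often the answer is already visible here: the student is running a
different Python from the one they installed the package into.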
The advantage of doing this is that we show our students what real scientific
computing looks like. It isn't streamlined, smooth or easy; it is hard,
confusing and complex. We give our students our best if we teach them to be
comfortable and optimistic on this, our current frontier.
Don't agree? - we should test
I've asserted that approach 2 will cause more students to develop into mature
users of scientific computing.
Maybe that isn't true. Maybe, if we use approach 1 to the fullest extent,
students will get so excited about the possibilities of the tools that their
excitement will carry them over all later obstacles.
I think that won't happen, but I'm a scientist, I could be wrong, and that's
an empirical question.
What we should do is run a randomized controlled trial. Allocate half the
students to approach 1, and half to approach 2. Design an assessment of
their computing maturity and assess the students at 1 and 2 years after the
course. I predict that students taught with approach 2 will be doing better
on average. You might predict the opposite. Let's get data.
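
The allocation step is the easy part, and could be as simple as the sketch
below. It assumes we have a list of student identifiers (the ones here are
made up), and that a fixed random seed is acceptable so that the allocation
can be reproduced later. The function name `allocate` is my own. Designing
the assessment of computing maturity is the hard part, and this sketch says
nothing about that.

```python
import random


def allocate(student_ids, seed=None):
    """Randomly split students into two groups of (nearly) equal size.

    With an odd number of students, the second group gets the extra one.
    """
    rng = random.Random(seed)  # private generator, so the seed is local
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "approach_1": sorted(shuffled[:half]),
        "approach_2": sorted(shuffled[half:]),
    }


if __name__ == "__main__":
    # Hypothetical student identifiers, for illustration only.
    students = [f"student_{i:02d}" for i in range(1, 11)]
    groups = allocate(students, seed=2024)
    for approach, members in groups.items():
        print(approach, members)
```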