First Software Carpentry Workshop at the Charité

Berlin rails and TV tower

On four mornings during the last two weeks of May 2023, we ran a Software Carpentry Workshop covering the Unix Shell, Plotting and Programming with Python, and Version Control with Git lessons. This was the first in-person workshop since before the pandemic. It was also the first time I taught the workshop at the Charité - Universitätsmedizin Berlin.

We planned the workshop for 25 participants. Expecting some no-shows, we allowed 30 people to sign up. On each of the four mornings, between 19 and 22 people actually showed up. The workshop was only open to members of the Charité to simplify account management. Unsurprisingly, most participants came from a medicine, life sciences or health sciences background. Some had a computer science, mathematics or engineering background. Most participants were graduate students; some were staff or postdocs. We also had a few undergraduates. The majority of participants used Windows, some used macOS, and a few used Linux. Most had used the shell before, and some even used it on a daily basis. Similarly, most had also used Python or R before. This showed during the workshop: some participants clearly had a good idea about programming. Most had not used Git before the workshop.

The workshop participants had a very clear idea of what they wanted to gain from the workshop: they all saw the importance of programming and data science. They wanted to become more confident programmers and to learn more about best practices. Most wanted some Git training. Some participants wanted to find out more about our HPC cluster and, in particular, how to use it.

We scheduled the workshop over four mornings rather than two entire days. In the past we have found that it is easier for participants to schedule half-days than consecutive full days. It also avoids overloading the participants and gives us a little more time to cover the material. We created HPC accounts for all participants to allow them to use our JupyterHub. Many participants made use of the cluster, some used other Linux systems they had access to, and others used a local installation.

The first morning covered the Unix Shell. I do think the story of marine biologist Nelle Nemo is great (although I am never quite sure how to pronounce the name). As always, we did not manage to complete the entire lesson even though we had a little extra time. We did cover writing scripts but missed out the last section on finding things. I had the impression that the participants struggled most with this session. I think that is because the shell session covers a lot of different concepts, many of which, such as navigating directory trees, will be totally new. The difficulties might also reflect the fact that the Unix shell has been around for a very long time and has not changed all that much. This old and pretty raw computing technology is challenging. It is also still extremely useful.

Mornings 2 and 3 covered Plotting and Programming with Python. I really like this lesson since, by the end of the first Python morning, we got to do some fairly complicated things with pandas and produce nice plots. I think it is really good to demonstrate what you can do with relatively little knowledge of the language. There were quite a few aha moments when even the more experienced Python programmers learned some new tricks. During the second Python session we went back to basics and covered lists and flow control. We talked a little bit about documenting code using Sphinx, best practices and how to become a good programmer. Unfortunately, I think there are no shortcuts. Programming involves a lot of good taste, which you get through experience. With more experience you have seen more problems and more approaches to solving them. The trick is then to avoid bad habits and to look at how others have solved similar problems. I think one good reason to learn programming with Python is that there is usually only one good way of solving a particular task.

The last session covered Version Control with Git. I am always a little apprehensive about this session. It tends to end in total confusion for both the students and the trainer. I am also not particularly fond of the story of Dracula and Wolfman planning to go to Mars. I do very much like the last few sections on Open Science, Licensing and Citation. This time the session went surprisingly well and I had the feeling nobody was totally confused. I think it helped that we started fresh in the morning rather than on the last afternoon of a two-day workshop when everybody is quite tired. We managed to use the BIH GitLab service to collaborate on a test project. Personally, I think it would be better to start with the remote repository, since most new projects will be created using a web service such as GitLab or GitHub and then cloned.

Participants highly praised the Software Carpentry materials. I agree. The materials are excellent, well thought out and put together to give participants a good idea of how all this works and how it fits together. I really love the new Carpentries Workbench used to build the materials for the web. The pages look very snazzy and navigation works very well. Participants really appreciated that the material is available online and many of them worked through the material after the sessions had finished.

Quite a number of participants asked if we are going to provide an R session as well. Given that we are spreading the workshop over a number of mornings, I think we should be able to also provide two morning sessions of R. Maybe we will make the sessions individually bookable. Perhaps we can use a booking system to make this easier. I hate to say it, but I used a spreadsheet to manage sign-up for this workshop (although with a little help from awk). Another suggestion was to start off with a round of introductions and to use name tags. I think that is a great idea that we will pick up, given that one of our aims is to encourage a community around the HPC cluster. We also had requests for more advanced topics, such as packaging and publishing Python packages. There are some Carpentries lessons covering these topics being developed. A number of participants also asked for certificates of attendance. This is not something I had come across before when I was teaching in the UK. We will need to investigate how to usefully provide these certificates.
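Until a proper booking system is in place, individually bookable sessions could probably still be handled with a small script working on a spreadsheet export. The following is only a rough sketch in Python, not the awk commands used for this workshop: the column names `name` and `session`, the function `allocate` and the example data are all made up for illustration, and the default limit of 30 simply mirrors the number of sign-ups we allowed this time.

```python
import csv
import io
from collections import defaultdict

# Assumption: accept up to 30 sign-ups per session (25 planned seats plus
# some slack for no-shows, as for this workshop).
MAX_SIGNUPS = 30


def allocate(rows, limit=MAX_SIGNUPS):
    """Split sign-ups per session into accepted and waitlisted names.

    Rows are processed in order, so earlier sign-ups are accepted first.
    """
    accepted = defaultdict(list)
    waitlist = defaultdict(list)
    for row in rows:
        session = row["session"]
        # Accept while the session still has room, otherwise waitlist.
        target = accepted if len(accepted[session]) < limit else waitlist
        target[session].append(row["name"])
    return accepted, waitlist


# Made-up data standing in for a CSV export of the sign-up spreadsheet.
example = io.StringIO(
    "name,session\n"
    "Ada,python\n"
    "Grace,python\n"
    "Linus,git\n"
    "Alan,python\n"
)

# A tiny limit of 2 is used here only to make the waitlist visible.
accepted, waitlist = allocate(csv.DictReader(example), limit=2)
print(dict(accepted))
print(dict(waitlist))
```

With a real export, the `io.StringIO` object would be replaced by an open file, and the per-session lists could be written back out for the helpers.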

I would like to thank the helpers and the people who helped me organise the workshop, find the rather lovely seminar room and sort out refreshments for the breaks. I thoroughly enjoyed the course. It was good to teach again in person after the two long years of the pandemic.