This year, the German Research Software Engineers met at the KIT in Karlsruhe for their annual conference from the 25th to the 27th of February. The Software Engineering conference of the German Informatics Society ran in parallel, with some shared sessions. I arrived late on Monday evening but still managed to catch up with some colleagues from the Teaching RSE working group over a few beers.
Tuesday started with a session on the HPC Carpentry workshops. The workshop is currently in the incubation stage, but the Carpentries are pushing hard to get it added to their fully supported set of courses. The course can run on cloud infrastructure that gets provisioned for a particular course. However, it is generic enough to be useful for any actual cluster setup. By default it is designed for SLURM; other middleware can be taught by changing the values of some variables. The course covers basic HPC concepts and use, and it includes a data pipeline example using Snakemake. One big advantage of it becoming part of the standard offering is that the Carpentries will provide pre- and post-workshop surveys and their analysis.
I continued the day by joining the online Carpentries Community discussion on teaching AI tools in the Carpentries. We see generative AI as problematic but acknowledge that we need to mention it. I have elaborated on some of these issues in another blog post. One big issue that makes it very hard to incorporate AI tools into any curriculum is that they change so rapidly. There are also the usual concerns of resource usage, transparency, equitable access, copyright, bias and deskilling. I had a long conversation with Toby Hodges, the Director of Curriculum, who was also in the room at the conference.
I also joined some sessions on LLMs. LLM-based chatbots are linear: the chatbot uses the previous interactions, both input and output, to produce the next response. If a user is not happy with the direction of the session, they need to start a fresh session. A more complex GUI that doesn't follow the chat design may help expert users explore different branches of their interaction with the LLM. The linear chat interface is a deliberate design choice to make the interaction look more human.
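To make the difference between a linear chat and a branching interface a bit more concrete, here is a minimal sketch of how a conversation could be stored as a tree instead of a list. This is my own illustration, not something shown in the talk: the `Turn` class, its methods and the example messages are all hypothetical, and no actual LLM is called.

```python
class Turn:
    """One message in a conversation tree (hypothetical illustration)."""

    def __init__(self, role, text, parent=None):
        self.role = role          # "user" or "assistant"
        self.text = text
        self.parent = parent      # None for the first message
        self.children = []        # alternative continuations

    def reply(self, role, text):
        """Add a continuation below this turn and return it."""
        child = Turn(role, text, parent=self)
        self.children.append(child)
        return child

    def context(self):
        """Messages from the root down to this turn.

        In a linear chat this is the whole history; in a tree it is
        only the branch that leads to the current turn.
        """
        path = []
        node = self
        while node is not None:
            path.append((node.role, node.text))
            node = node.parent
        return list(reversed(path))


# Two branches that share the same first question and answer.
root = Turn("user", "How do I parallelise this loop?")
answer = root.reply("assistant", "You could use threads or MPI.")
branch_threads = answer.reply("user", "Show me the threaded version.")
branch_mpi = answer.reply("user", "Show me the MPI version.")
```

With this structure a user who dislikes one direction can go back to `answer` and try another follow-up, rather than starting a fresh session. Each branch only carries its own history as context.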
The next talk was on homomorphic encryption and what it can be used for. It allows computations to be carried out directly on encrypted data without having to decrypt the data. It sounds quite magical, but there are some drawbacks: it only supports 1D arrays with addition, multiplication and rotation operations, and it does not support any branches. It not only comes with a performance cost but also with a degradation of accuracy. Still, it may be useful for some use cases, for example where you cannot trust a remote service to keep your data safe. During the next talk and the poster session, another approach to federated encrypted computing was presented. The researchers implemented a pan-European study using this approach with medical data.
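To get a feeling for what programming with only addition, multiplication and rotation on 1D arrays looks like, here is a small plaintext sketch. Nothing in it is encrypted and it does not use any real homomorphic encryption library; the helper names (`add`, `mul`, `rotate`, `total`, `select`) are made up for illustration. It shows two common workarounds: summing all slots with rotate-and-add steps, and replacing an if/else by arithmetic with a 0/1 mask. The mask is assumed to be given; computing it from a comparison on encrypted data is a separate and harder problem.

```python
def add(x, y):
    """Slot-wise addition of two equally long vectors."""
    return [a + b for a, b in zip(x, y)]


def mul(x, y):
    """Slot-wise multiplication of two equally long vectors."""
    return [a * b for a, b in zip(x, y)]


def rotate(x, k):
    """Cyclically rotate the vector to the left by k slots."""
    k %= len(x)
    return x[k:] + x[:k]


def total(x):
    """Sum all slots using log2(n) rotate-and-add steps.

    Afterwards every slot holds the sum. Assumes the length is a
    power of two.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    step = 1
    while step < n:
        x = add(x, rotate(x, step))
        step *= 2
    return x


def select(mask, a, b):
    """Branch-free 'a if mask else b' per slot, for a 0/1 mask.

    Computes mask * a + (1 - mask) * b using only add and mul.
    """
    n = len(mask)
    inverse = add([1] * n, mul(mask, [-1] * n))
    return add(mul(mask, a), mul(inverse, b))


sums = total([1, 2, 3, 4])
picked = select([1, 0, 1, 0], [1, 2, 3, 4], [10, 20, 30, 40])
```

Here `sums` ends up as `[10, 10, 10, 10]` and `picked` as `[1, 20, 3, 40]`. Note that `select` always evaluates both alternatives, which is part of where the performance cost of branch-free code comes from.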
Tuesday evening was the conference dinner, followed by further visits to the pub.
Wednesday I mostly spent in workshops and birds-of-a-feather sessions on teaching RSEs, establishing RSE units and what these RSEs are anyway. Guideline papers seem to be very helpful to convince managers that RSE groups are needed.
On Thursday the RSE Unit at the University of Heidelberg introduced their offerings. Their activities are funded through projects. After an initial consultation they propose a project plan. The fees essentially cover the salary of the RSEs who work on the project. They have published a white paper detailing their RSE unit.
I accidentally ended up in a session on the data management system LinkAhead, formerly known as CaosDB. The open-source program maintains metadata in a database; the metadata can either be entered directly (via web interface or API) or collected by crawlers. Checksums for files are computed (the backend needs to have access to the files to compute the checksums). All data are version controlled. Users can then run queries to find data or do some computations. Custom scripts can be registered to display particular data types. The data files are not moved; they are kept on the file system. It is possible to move data (although that will lead to a recalculation of the checksum). The system features fine-grained access management using ACLs. Support for SSO is being worked on. The open-source project is supported by a commercial company that offers custom development, support, maintenance and workshops. The system might be quite interesting as an archive solution and/or data management platform.
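As a rough illustration of why the backend needs access to the files, here is a minimal sketch of computing a file checksum in Python. This is not LinkAhead's API or implementation; the function name `file_checksum` and the choice of SHA-256 are my own assumptions.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks.

    The algorithm is an assumption for this sketch; any name accepted
    by hashlib.new() works. Reading in chunks keeps memory use low
    for large data files.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The whole file has to be read to compute the digest, so whatever runs this needs read access to the data, and for large archives the computation is not free.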
The last session I attended was on the programming language Julia and how it can be used on HPC systems. Julia does seem quite impressive, and multithreaded programming, MPI and GPU computations look quite straightforward. I would really like to set aside some time to try it out.
The conference was very enjoyable. It was good to catch up with colleagues and to meet new people. It was particularly nice to see some people I knew from Edinburgh. As always, the atmosphere was very friendly and supportive.