Monday, 23 September 2019

Research Software Engineering Conference, Birmingham 2019

[Image: Binder starting up]

Day 1

The conference started with a keynote by Andy Stanford-Clark, IBM UK Chief Technology Officer, on IoT, AI and Quantum Computing. The talk was great fun, although I must say I am still not convinced by IoT and am somewhat worried by AI. The need to train neural networks properly to reduce bias was brought up, but all in all the talk was, unsurprisingly, optimistic. The third part, on quantum computing, was very interesting: IBM provides online access to its quantum computers.

After the coffee break I joined the session on reproducible software. Anna Krystalli from the University of Sheffield gave an excellent introduction to rrtools, an R package that helps scientists bundle a paper together with all its associated data and R scripts as a package, so that the paper can be regenerated. R seems to come with a lot of very handy tools that make this sort of workflow easy by providing templates and autogenerating much of the required infrastructure. R Markdown and bookdown were used to generate high-quality print outputs. drake, a make-like build system for R, was also mentioned. It would be very nice to have similar tools for Python.

The other talk in that session, on BioSimSpace, given by Lester Hedges from the University of Bristol, was also very interesting. BioSimSpace provides an abstract interface to various computational chemistry packages, which is quite neat. What really impressed me was that it also provides Jupyter notebook GUI elements that handle the inputs to a script with drop-downs, file uploads, etc. The really nifty thing is that these notebooks can be downloaded as Python scripts, in which the inputs are handled by the argparse module, so the scripts can be run from the command line. I think this is a really cool approach to bridging the notebook/command line gap, and it could be used for all sorts of applications.
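To make the notebook-to-command-line idea concrete, here is a minimal sketch of how widget-style inputs can map onto argparse arguments. It is not BioSimSpace's code or API. The argument names (`--input-file`, `--forcefield`, `--runtime`) and the force-field choices are hypothetical illustrations: a file upload becomes a path argument, a drop-down becomes a restricted `choices` list, and a number box becomes a typed argument with a default.

```python
import argparse

# Hypothetical drop-down options; illustrative only, not taken from BioSimSpace.
FORCEFIELDS = ["ff14SB", "ff99SB", "gaff"]


def build_parser():
    """Build a command-line parser mirroring a notebook's input widgets."""
    parser = argparse.ArgumentParser(
        description="Hypothetical simulation set-up script."
    )
    # File-upload widget -> required path argument
    parser.add_argument("--input-file", required=True,
                        help="structure file to load")
    # Drop-down widget -> argument restricted to a fixed set of choices
    parser.add_argument("--forcefield", choices=FORCEFIELDS, default="ff14SB",
                        help="force field to apply")
    # Numeric input widget -> typed argument with a default value
    parser.add_argument("--runtime", type=float, default=1.0,
                        help="simulation length in nanoseconds")
    return parser


def main(argv=None):
    """Parse the inputs (from sys.argv when argv is None) and report them."""
    args = build_parser().parse_args(argv)
    print(f"Loading {args.input_file} with {args.forcefield} "
          f"for {args.runtime} ns")
    return args
```

In a downloaded script, `main()` would be called under `if __name__ == "__main__":`, so something like `python setup_sim.py --input-file protein.pdb --forcefield gaff` (file names hypothetical) would run the same set-up as the notebook. An invalid drop-down value is rejected by argparse before any work starts.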

After lunch I attended the Revitalising Legacy Languages session. The first talk, by Chris MacMackin (UKAEA) on object-oriented Fortran, was excellent. I wish I could do some Fortran programming again, as it is really good fun. New to me was the second talk, on WebAssembly, by Drew Silcock from STFC. WebAssembly lets you take your favourite language, e.g. Python + NumPy + matplotlib (Pyodide), and compile it to a bytecode that can be executed within your browser. It can also do 3D visualisation in the browser on the client. This looks very cool and is a good way of avoiding having to write JavaScript. It might also be a good way of providing interactive web applications.

For the third talk I nipped over to the Useful Tools and Libraries session to see the talk on geopandas by Declan Valters (now at BGS). Declan presented a nice Python notebook demonstrating the features of geopandas.

The remainder of the afternoon was devoted to the RSE Society, lightning talks introducing the posters and a panel session on sharing RSE work across boundaries. It is clear that the RSE movement is very collaborative and that training is a large part of it.

Day 2

I spent most of the morning in the Citation and Software Discovery session. Alexander Konovalov from St Andrews has developed templates to help register software in PURE. PURE can import data from ORCID. The other talk I saw in this session (by Stephan Druskat, DLR) was very theoretical and involved constructing graphs of the relationships between authors, software revisions, dependencies and institutions. It was quite complicated, and I am not entirely sure how useful it is, apart from the point that dependencies should be cited properly.

In between the two talks I went to see a talk on the limitations of machine learning by Camilla Longden from Microsoft Research. This was a recurring theme. Bias in the training sets was discussed, as was the difficulty of interpreting the results. The tank story cropped up in a number of presentations, in two versions. In the first, the American military was training an AI to distinguish between American and Russian tanks. It turned out that the AI was really detecting how grainy the pictures were: the Russian ones were grainier, the American ones less so. The second version also involved the American military, this time trying to find tanks camouflaged in a wood. They used a training set of pictures of the wood with and without tanks. The system worked well on the training data but failed when tried with another data set: the AI had successfully figured out that it was nice and sunny when the tanks were present but overcast when they were absent. AI is a bit of a buzz technology: in many instances linear regression or decision trees are sufficient and easier to understand.
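As an illustration of why linear regression is easy to understand, here is a minimal ordinary-least-squares fit of y = a + b·x in plain Python. The whole fitted model is two numbers, an intercept and a slope, which can be inspected and explained directly. The function name `fit_line` is a hypothetical choice for this sketch, not from any talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # How x and y vary together
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

The result says "y starts at 1 and rises by 2 per unit of x", a statement anyone can check. A neural network trained on the same data offers no such direct reading, which is one reason it can silently learn the weather instead of the tanks.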

Next came the keynote by Ben Goldacre. The presentation was excellent, enthusiastic, entertaining and shocking. It was mostly about pharmaceutical trials and the fact that, more often than not, only the successful trials are reported. He also discussed sampling error and abuses of visualisation.

After lunch I attended the demonstration of nbfancy by Jack Betteridge and James Grant, both from the University of Bath. nbfancy can be used to annotate Jupyter notebooks to produce teaching materials in the style of Software Carpentry. Another tool, submitty, which marks submissions automatically, was demonstrated by Anastasis Georgoulas and David Perez-Suarez, both from UCL. submitty is a rather nifty web application that allows students to upload programming tasks, which are then tested automatically using predefined tests. The system supports anonymous submissions, multiple markers, extra manual marks and penalties for late submissions. One drawback of automatic marking is that assignments have to be designed slightly differently so that they can be marked automatically.
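The marking workflow described above (predefined tests plus a late penalty) can be sketched in a few lines. This is a hypothetical illustration, not submitty's implementation or configuration. It assumes an assignment that asks for a function `add(a, b)`, equal credit per passed test, and a penalty of 10% of the mark per started day late; all of these are made-up parameters.

```python
import math
from datetime import datetime, timedelta

# Hypothetical predefined tests for an assignment asking for add(a, b):
# each entry is (arguments, expected result).
TESTS = [((1, 2), 3), ((-1, 1), 0), ((10, 5), 15)]


def mark(submission, tests, submitted_at, deadline,
         penalty_per_day=0.1, max_mark=10.0):
    """Mark a submitted callable against predefined tests.

    Each passed test earns an equal share of max_mark; each started day
    late deducts penalty_per_day of the raw mark (an assumed policy).
    """
    passed = 0
    for args, expected in tests:
        try:
            if submission(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case simply scores nothing
    raw = max_mark * passed / len(tests)
    late = submitted_at - deadline
    if late > timedelta(0):
        days_late = math.ceil(late / timedelta(days=1))
        raw *= max(0.0, 1.0 - penalty_per_day * days_late)
    return round(raw, 2)


deadline = datetime(2019, 9, 24, 17, 0)
print(mark(lambda a, b: a + b, TESTS, deadline, deadline))  # 10.0
```

The sketch also shows the drawback mentioned in the talk: the assignment must pin down an exact function name and signature, so that a machine can call it and compare results.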

After the break I attended a demonstration of autograd, the automatic differentiation tool used by PyTorch. Douglas Finch from the School of GeoSciences, Edinburgh, presented his work on scraping DEFRA air-quality data and displaying it as interactive graphs using Django and Plotly. Finally, Mike Simpson from the University of Newcastle presented his work on visualising uncertainty. They use Blender and its Python API to automatically generate high-quality 3D visualisations from a 3D model of Newcastle and sensor data. Data is presented as glyphs (green, amber, red) and uncertainty as a sinusoidal border around the glyph: a higher-frequency sinusoid indicates more uncertainty. I was slightly irritated that the glyphs looked a bit like flowers. The Blender visualisation (on YouTube) was very nice, though.
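For readers who have not met automatic differentiation, here is a minimal reverse-mode sketch in plain Python, the same family of technique as PyTorch's autograd, though far simpler and not its actual implementation. The class name `Var` is a hypothetical choice. Each operation records its inputs and local derivatives; `backward()` then walks the recorded graph in reverse topological order, applying the chain rule.

```python
class Var:
    """A scalar that records how it was computed, for reverse-mode autodiff."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # parents: pairs of (input Var, local derivative d(self)/d(input))
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad for every node.

        Gradients are not reset first, so call this once per fresh graph.
        """
        order, seen = [], set()

        def visit(node):  # depth-first post-order = topological order
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # outputs before inputs
            for parent, local in node.parents:
                parent.grad += local * node.grad  # chain rule


x, y = Var(3.0), Var(2.0)
f = x * x * y + y  # f = x^2 y + y = 20
f.backward()
print(x.grad, y.grad)  # 12.0 10.0  (df/dx = 2xy, df/dy = x^2 + 1)
```

In PyTorch the same computation would use tensors created with `requires_grad=True` and a call to `backward()`, after which the gradients appear in each tensor's `.grad` attribute.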

Maggie Aderin-Pocock gave the after-dinner talk on having crazy dreams and being a space scientist. The presentation was very entertaining. I was particularly amused by the Clangers being the gateway drug to the harder stuff: Star Trek.

Day 3

The last day of the conference was dedicated to workshops. I attended the Binder workshop in the morning, during which we created a Binder cluster on the Microsoft Azure cloud using Kubernetes. I was interested both in how Kubernetes and Azure work and in what Binder looks like. Binder is a way of packaging Jupyter notebooks in a Docker container and running them in the cloud. The notebook and any data files are stored in a GitHub repository, together with a file describing the notebook's dependencies. Binder builds the image and deploys it on the cluster. The user gets a URL that can be shared. When someone connects to the URL, a new container instance is started, so every user gets their own. I presume there is a way of limiting resource usage. Binder looks quite useful for people who want to share live notebooks with others. It might be possible to extend the EDINA Noteable JupyterHub service to include a Binder service. There is also a public mybinder service that you can use for small notebooks.

In the afternoon I attended the modern C++ workshop, which introduced the latest features of the C++ standard library. There is some really cool stuff that is worth looking into once the features are supported by the compilers (it'll take years for g++ on Scientific Linux to catch up). For anyone interested in C++, the website cppreference.com was highly recommended.

Tuesday, 5 March 2019

AI Crystal Ball


I went to a talk by Howard Covington from the Alan Turing Institute on Glimpsing our AI Future at the Bayes Centre, University of Edinburgh. The talk was very interesting, albeit short on technical detail, and maybe somewhat worrying. This is a summary of what stuck in my mind, together with some of my own thoughts.

The talk started with an overview of the evolution of computer technology and the software advances made possible by those changes. The web allowed the development of search engines. Internet shopping followed, then messaging and social networks. The next big disruptive technology will be self-driving cars, powered by ever more sophisticated AIs. Covington reckons that over the next decade or so renewables will become much cheaper than fossil fuels and displace them.

The 2030s will see agriculture revolutionised: cultured meat grown in factories and vertical farms will totally change how we produce our food. In particular, leafy vegetables can be grown very efficiently indoors under LEDs that are powered by renewable electricity and optimised to emit light at the wavelengths preferred by the plants.

The 2040s will bring fully automated, autonomous factories. These factories need no human workers and can be reconfigured in software to produce different things. AIs will also revolutionise medicine, with applications ranging from expert systems used for diagnosis to surgical robots. Quantum computing will break current encryption systems, including blockchain.

These developments go hand in hand with pervasive sensor networks measuring all sorts of things. These sensors are used to optimise energy and food systems and to feed the AIs with information. They will also lead to deep surveillance. China is already using a social credit system which scores individuals according to their behaviour. Certain privileges (such as high-speed travel) are only available to people with a high enough score.

Covington pointed out that the truly big data companies are mostly based in the US and China; the only two European big data companies are SAP and ing. Covington reckons that the European attitude towards risk and investment makes it difficult to create new multi-billion-dollar companies. China is investing huge amounts of money in infrastructure across Eurasia through its Belt and Road Initiative. Our future looks distinctly Chinese.

I am very worried about the way software systems are put together, especially as they increase in complexity. Programming Sucks is an excellent article on software engineering (or the lack of it). Another aspect of our complex society is that we depend more and more on its frictionless operation. There is potential for things to go spectacularly wrong. It gets truly terrifying when we add the vulnerability of these critical infrastructures to cyber attacks, and the impact of such attacks will be amplified by the monoculture of systems. For example, the social credit system built by the Chinese will be a desirable target, as compromising it would allow the subversion of an entire continent. Distributed systems using blockchain technology will help until quantum computing comes along.

One aspect was not touched on at all: all these technologies reduce the need for human labour. What will all the people who are no longer required to work do? It is clear that this will hit factory-floor workers as well as delivery and taxi drivers. But it will also hit many white-collar workers whose jobs can be automated. As a society we need to put more effort into social jobs, such as looking after and teaching our kids, and looking after the vulnerable and the old.

Howard Covington's talk finished with a prediction of collapse in 2050 due to climate change. A member of the audience asked why there would be a collapse when these technologies should allow us to reduce greenhouse gas emissions. Covington's answer was that vested interests will get in the way.

In any case, it is clear that technological change will accelerate over the next decades and we will face huge changes.