AI and Programming

Poster advertising a debate on AI

I am running introductory courses on programming and digital literacy for students, both as one-off Software Carpentry workshops and, soon, as part of an international master's course. During the last few sessions, questions about using generative AI for programming have come up. The Carpentries have started to consult the community on using LLMs for teaching and have so far produced some blog posts (Part 1, Part 2 and The Ethics of Teaching LLMs). Meanwhile, I have also had a number of conversations with colleagues who are professional software developers about their thoughts on LLMs and programming. Personally, I have many reservations about the use of generative AI, but we have to engage with the topic as it is not going to go away. So here are my thoughts on using AI for software development and in particular for teaching software development to scientists. The ideas and thoughts laid out here form the basis of what I will tell the students in my courses.

My concerns about generative AI for software development fall into three categories: fundamental philosophical concerns, ethical concerns and practical concerns.

Fundamental Philosophical Concerns

Artificial Intelligence is currently a heavily hyped technology. It does not help that it is a very wide field encompassing many technologies and approaches. Here, I am talking specifically about Large Language Models (LLMs). These models are trained by feeding them large amounts of text, including source code. The resulting model consists of weights connecting the nodes of a deep neural network. An LLM does not constitute a model of the world which obeys the rules of physics. It is a statistical construct computing a likely sequence of words given an input sequence of words. There is a good chance that an LLM will come up with a reasonable answer to questions on common subjects that are well represented in the training material. Questions on more unusual concepts will likely produce less reliable answers.
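To illustrate what "computing a likely sequence of words" means, here is a deliberately toy sketch. The probability table is invented for illustration; a real LLM derives these probabilities from a deep neural network over a huge vocabulary and a much longer context.

```python
# Toy next-word prediction: repeatedly pick the most probable next token
# given the text so far. The probabilities below are made up.
toy_model = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "quantum": 0.1},
    ("cat", "sat"): {"on": 0.8, "quietly": 0.2},
    ("sat", "on"): {"the": 0.9, "a": 0.1},
}

def continue_text(tokens: list[str], steps: int) -> list[str]:
    for _ in range(steps):
        context = tuple(tokens[-2:])           # only the last two words as context
        candidates = toy_model.get(context)
        if candidates is None:                 # unseen context: no reliable answer
            break
        tokens.append(max(candidates, key=candidates.get))
    return tokens

print(continue_text(["the", "cat"], steps=3))  # ['the', 'cat', 'sat', 'on', 'the']
```

The point of the sketch is that nothing in it "understands" cats or sitting; it only reproduces statistical regularities of the training material.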

Natural language is very imprecise. Words can have multiple meanings. The meaning of a sentence depends on context. The meaning of a statement can change depending on who the recipient is and how it is said. Written communication (especially short messages) is notoriously fraught with misunderstandings because it lacks tone and expression. This ambiguity is also what makes literature interesting. Sometimes, however, we need to be very precise when we communicate. For those instances we use jargon, where words and phrases have exact meanings to avoid ambiguity. Each area of expertise uses its own jargon that practitioners need to learn in order to communicate. This is not meant to frustrate outsiders; it allows precise and concise communication. Programming languages are a special form of jargon. They instruct a computer exactly what to do. If a computer could interpret a program in multiple ways, there would be something wrong with the programming language. If the program does not do what the programmer intended, then the programmer made a mistake formulating the program.

The desire to instruct a computer in natural language comes from science fiction stories such as Star Trek or Star Wars. A computer would need a human's socialisation to deal with the ambiguities in our language (and even we often fail). Otherwise, we would need to limit the context and/or use jargon to be precise.

Another issue is that LLMs are based on language, so an LLM can only come up with things that can be expressed in that language. Human languages do not all describe the same things in the same amount of detail; for example, words for different types of snow are of little use to people living in the tropics. Furthermore, there are things which we struggle to express adequately in language. Human intelligence is not completely expressible in language.

Finally, can an LLM produce something truly new? It is a statistical model. It can recombine existing things, which in some instances might well be quite useful. But LLMs are based on texts created by humans. Expecting them to uncover some deeper truth, which would be necessary to produce something genuinely new, is a kind of modern, high-tech numerology.

Ethical Concerns

LLMs are trained using vast amounts of data that have been scraped from public sites on the internet. In most cases the original authors of the works were not asked whether they agreed to their works being used to train LLMs. This appropriated collective cultural output is used to enrich private entities. In the case of using LLMs for programming, one of the sources of training material is Stack Overflow, where thousands of volunteers offered their expertise to help other people. Training LLMs also requires a lot of human work, for example labelling training data or checking results. This work is usually done under bad working conditions for poor pay.

Should LLMs really work for programming tasks, there is a risk of deskilling the workers. Less expert workers can do the same work that experts do today, or the same experts are expected to do more work. The increased profits from the increased productivity go to the big corporations that have produced the LLMs.

In the context of scientific work, there is also the question of plagiarism. At what stage does LLM-generated code stop being your own code? Another question is who is responsible for any problems arising from malfunctioning code: is it the programmer who uses the tool or the maker of the tool?

Training LLMs uses vast amounts of resources: the energy to run the data centres, the minerals to build the hardware, and the water and energy for cooling. All of these are limited resources that we need to use wisely. Presumably, at some stage a model is fully trained and its training cost is fixed. The resource usage from inference, however, goes up as the services become more popular; an LLM query uses about ten times as much energy as a web search. I doubt that there are great energy production revolutions around the corner. I could imagine that LLMs can become more energy efficient, especially if they are trained for specific purposes. The current idea is to satisfy the huge energy demand of data centres using new nuclear power stations. These come with the usual problems: they require a finite resource, need lots of water for cooling and produce highly dangerous waste we don't know what to do with. Molten salt reactors have been talked about for decades without any currently running.

Practical Concerns

A software developer does not just type in code. A large amount of time is spent analysing the problem to be solved. The program code not only solves the problem at hand, it is also a concise description of the problem itself. Using an LLM to write code transforms the hard problem of programming into the even harder problem of debugging. In any case, the programmer needs to understand the code. This is true whether the code is written from scratch, copied from some other source such as Stack Overflow, or produced by an LLM. LLMs often produce overly complex, difficult-to-understand code. A developer under time pressure might well be tempted to just accept the LLM's solution without spending the time to analyse it.

For small problems, LLMs produce impressive solutions. However, these could equally well have been found using a simple web search. For more complex problems more context is required. This means giving the LLM access to the entire code base, which might raise worries about leaking information.

Undoubtedly, access to AI tools will cost money. This will further disadvantage underprivileged people.

A large aspect of software engineering is managing complexity. AI programming tools potentially allow us to produce more complex systems. When we stop understanding the system as a whole, we rely completely on the AI, which has strategic implications.

On the Plus Side

AI programming tools can be used to explain pieces of software code. This is particularly useful to developers moving into a field that is new to them. AI tools can also be used for code review. Obviously, it would be better if the code were reviewed by actual people, especially colleagues, as code review encourages knowledge transfer within a department.

Another use for AI tools is to produce scaffolding for new functions, together with documentation and tests, that then needs to be fleshed out by the developer.
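As a rough illustration, such scaffolding might look something like the sketch below. The function name, behaviour and test are invented for illustration; an actual tool's output will differ, and everything marked TODO is still the developer's job.

```python
# Hypothetical scaffold: a stub with a docstring and a matching test skeleton.

def moving_average(values: list[float], window: int) -> list[float]:
    """Return the moving average of `values` over a sliding `window`.

    TODO: decide how to handle windows larger than the input and how to
    treat the edges -- the scaffold leaves these decisions to the developer.
    """
    raise NotImplementedError


def test_moving_average_constant_input():
    # TODO: replace with real expectations once the behaviour is decided.
    assert moving_average([1.0, 1.0, 1.0], window=2) == [1.0, 1.0]
```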

Conclusion

AI tools for programming are here and available. People will want to use them. Like any tool, we need to figure out how to use them effectively. When I started programming, I had to do many things by hand from scratch. For example, I implemented an adaptive fourth-order Runge-Kutta algorithm in Fortran. Today, I just call a library. This gets something working much more quickly. Is it optimal? In many cases I do not care. I still need to know the limitations of the algorithm, or alternatively I need good test cases that convince me that my code is doing what I expect. Ideally, I should have both. Low-level complexity has also been hidden away in frameworks. For example, using SQLAlchemy in Python, you do not need to worry about the details of relational databases or write the SQL code to query them. The framework generates all the code for you, for many different databases. Sometimes, however, the code it produces is noticeably sub-optimal and you need to write the SQL by hand to get the required performance. The trick is to know when you need to do something different.
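As a minimal sketch of that point (the table and queries are made up for illustration, and it assumes SQLAlchemy 2.0 or later): the framework lets you work at the object level most of the time and only drop down to hand-written SQL when the generated query is not good enough.

```python
from sqlalchemy import create_engine, select, text
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class Measurement(Base):
    # Invented example table; the framework maps it to SQL for us.
    __tablename__ = "measurements"
    id: Mapped[int] = mapped_column(primary_key=True)
    station: Mapped[str]
    value: Mapped[float]


engine = create_engine("sqlite:///:memory:")  # works the same for other databases
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Measurement(station="A", value=1.5),
                     Measurement(station="B", value=2.5)])
    session.commit()

    # The framework writes the SQL for you...
    stmt = select(Measurement).where(Measurement.station == "A")
    print([(m.station, m.value) for m in session.scalars(stmt)])

    # ...but you can still hand-craft a query when performance demands it.
    rows = session.execute(
        text("SELECT station, AVG(value) FROM measurements GROUP BY station"))
    print(rows.all())
```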

Users of AI tools need to have an idea of how they work. They need to think about the ethical and legal implications. They also need to consider resource usage. Organisations need to provide guidelines on which AI tools are permissible to use and in which circumstances.

A developer I talked to told me that she is worried about getting too lazy, i.e. relying too much on AI tools. We need to be aware of this risk and sometimes confront our laziness.

The need for expert programmers will not go away. A learner should be exposed to some of the basics before turning to advanced tools so they can assess the performance of the advanced tool. They still need to think about and analyse the problem. In the end a human needs to decide whether what the software does is fit for purpose and solves the problem at hand adequately.

Programming is difficult. It requires us to carefully analyse a problem and then express this analysis in code. The former difficulty will not go away and might even get bigger if AI tools can really help us with the latter difficulty.

Further Reading