Ensuring the quality of AI-generated code

By Rafi Katanasho (pictured), APAC Chief Technology Officer at Dynatrace

 

Coding by humans is already far from 100% accurate. This is evident from the number of vulnerabilities that evade detection across the different stages of the software development lifecycle and only manifest at runtime. The prevalence of errors in production is also why runtime code inspection and application protection tooling exists.

But the software development space is undergoing a fundamental change. Code is no longer something created only by people; instead, it is becoming increasingly common to see individuals and teams explore the potential for Generative AI-enabled copilots to speed up coding tasks and get to the endgame – digital services delivery – much faster.

An early concern is what impact this will have on code quality. Will Artificial Intelligence (AI) augmentation lead to more or fewer code-level vulnerabilities being present at runtime?

There are already some clues to how this might play out. In short, the quality might get worse before it gets better.

A recent survey by Stack Overflow saw over one-third of developers report that code assistants “provide inaccurate information half of the time or more.” The same survey found between 23% and 29% of developers don’t trust “the output or answers” provided by the AI; the percentage varies depending on how widely the tools are used internally.

It’s important to note these results aren’t curbing enthusiasm for, or adoption of, AI coding assistants, nor are they impacting developer satisfaction with the tools. Coding teams understand the current software development process is not perfect either, and developers sense they are better off overall with an AI assistant than without one.

Instead, the challenge simply emphasises an immediately elevated role for runtime-based security controls in the majority of software engineering environments.

These controls provide important protection so that AI-generated or AI-augmented development efforts can proceed, knowing that any vulnerabilities they introduce – and that escape early detection – can still be caught and remediated.

Without such controls in place, Australian organisations that go down the AI path could be putting the reliability and security of their software at risk.

How AI is impacting code quality

When it comes to AI-generated code, the way copilots work is the main factor that could lead to problems with code quality – and, in turn, to reliability and security issues in the customer-facing digital experiences and applications that development teams contribute to.

AI models are trained on a corpus of publicly accessible code drawn – where licensing permits – from open source repositories, developer assistance communities such as Stack Overflow, and other sources across the web. The majority of this code today is created and maintained by humans. Over time, however, one might expect the web-based sources of code that the models ‘learn’ from to become more of a mix of human-created, AI-augmented and AI-generated code.

It’s unlikely that every example of AI-generated code will be vulnerability-free. Unlike a human reviewer, the AI cannot reliably determine whether the code snippets it cobbles together from various web-based sources, or takes its inspiration from, represent best practice or can be seamlessly joined together without creating additional bugs.
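
As a minimal, hypothetical illustration – not drawn from any particular copilot’s output – the Python sketch below shows the kind of subtle flaw that can result from stitching together plausible-looking snippets: the first function builds an SQL query by string interpolation and is open to injection, while the parameterised version is not.

import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Typical of snippets copied from web examples: the query is assembled by
    # string interpolation, so input such as "' OR '1'='1" changes the query's
    # meaning (SQL injection).
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The parameterised form keeps user input as data rather than SQL.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()

Both functions look equally reasonable at a glance, which is exactly why such flaws are easy to miss when reviewing generated output at speed.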

Time-poor human developers may not recognise these shortfalls in the AI-generated code they ask copilots to find or produce for them, or they may simply lack the bandwidth to comb every line of AI-generated output. After all, a key reason that they’re experimenting with AI copilots in the first place is to be able to move faster and reduce time-to-value.

And so, in the short term, the quality of the output of AI models is likely to decrease. As AI models ‘learn’ from other poorly composed AI-generated code that’s published online, they are likely to perpetuate the mistakes made by those models. That could lead to even more vulnerabilities being introduced into runtime environments, reinforcing the need for tooling at this stage of the software development process.

To be clear, this isn’t theoretical. A study by Cornell University researchers found a decline in the linguistic capabilities of a language model repeatedly trained on the output of its predecessors. There is also concrete evidence, from a GitClear study, that considerably more code in GitHub repositories now has to be updated less than a fortnight after it’s written; the study projects that the percentage of affected lines of code will “double in 2024 compared to its 2021, pre-AI baseline.”

Navigating a path forward

Organisations can address concerns about AI-generated code quality with a multi-pronged approach.

Where AI is actively generating code, organisations should first aim to automate inspection of libraries and first-party code, to detect code-level vulnerabilities that may be introduced into applications.
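
As a minimal sketch of what automating that inspection can look like in a build pipeline – assuming a Python codebase, a requirements.txt file and the open-source pip-audit scanner, all of which are illustrative choices rather than a prescribed stack – the script below fails the build when known-vulnerable dependencies are declared.

import json
import subprocess
import sys

def audit_dependencies(requirements_file: str = "requirements.txt") -> bool:
    # Run pip-audit against the declared dependencies; it exits non-zero when
    # known vulnerabilities are found.
    result = subprocess.run(
        ["pip-audit", "-r", requirements_file, "--format", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        print("No known vulnerabilities in declared dependencies.")
        return True

    # Field names below assume pip-audit's JSON report layout; fall back to the
    # raw output if parsing fails.
    try:
        report = json.loads(result.stdout)
        for dep in report.get("dependencies", []):
            for vuln in dep.get("vulns", []):
                print(f"{dep['name']} {dep['version']}: {vuln['id']}")
    except (json.JSONDecodeError, KeyError):
        print(result.stdout or result.stderr)
    return False

if __name__ == "__main__":
    sys.exit(0 if audit_dependencies() else 1)

The same gate applies whether a dependency was suggested by a copilot or added by hand, which is the point: the control sits in the pipeline, not in the head of the person (or model) writing the code.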

Then, organisations can enhance this posture by leveraging code-level insights and transaction analysis from observability data to automatically detect attacks on applications, even when those attacks attempt to exploit unknown weaknesses.

On top of rich information for further analysis of an attack, such a capability also gives developers detailed insights into the underlying vulnerability, such as its source code location, for fast remediation.
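
To make the principle concrete, here is a heavily simplified sketch of transaction-level inspection, assuming a basic request/parameter model and a handful of signature patterns invented purely for illustration; production runtime protection combines code-level context with transaction data rather than relying on pattern matching alone.

import re
from dataclasses import dataclass

# Illustrative signatures only; real runtime protection draws on far richer
# code-level and transactional context.
SUSPICIOUS_PATTERNS = [
    re.compile(r"('|\")\s*or\s+1\s*=\s*1", re.IGNORECASE),  # SQL injection probe
    re.compile(r"<script\b", re.IGNORECASE),                # reflected XSS attempt
    re.compile(r"\.\./\.\./"),                              # path traversal
]

@dataclass
class Finding:
    transaction: str
    parameter: str
    value: str
    pattern: str

def inspect_transaction(name: str, params: dict) -> list:
    # Flag request parameters that match known attack signatures so the
    # offending transaction can be traced back to the code that handled it.
    findings = []
    for key, value in params.items():
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern.search(value):
                findings.append(Finding(name, key, value, pattern.pattern))
    return findings

# Example: a login transaction carrying an injection probe is flagged for review.
print(inspect_transaction("POST /login", {"username": "admin' OR 1=1 --", "password": "x"}))

For developers, the value is less the detection itself than the link back to the responsible transaction and code path, which is what makes remediation fast.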

Those that get this right will find themselves at the leading edge of incorporating AI augmentation into their software engineering operations, able to deliver faster without compromising quality.