Smart Contract Auditing

Undergraduate research I did in Summer 2025

Background

In Summer 2025, I had an undergraduate research internship at my university. I was tasked with analyzing various tools used for "automated smart contract auditing" and identifying recurring themes and flaws in the technology. My results were turned into a poster and presented at my university's summer research symposium. My findings were to be used by one of the graduate students in the lab to develop a better tool that didn't suffer from the same issues as many of the existing tools.

What is "Automated Smart Contract Auditing?"

As a high-level overview: when people do transactions through blockchain technology (e.g., trading cryptocurrency), they need some way of ensuring that all parties involved actually give and receive what they agreed to at the start. Since one of the whole points of blockchain technology is that it's decentralized, there is no higher authority that can enforce the transactions and make sure people do what they're supposed to. As such, people rely on "smart contracts." These contracts are essentially pieces of software that handle the exchange of money. The only issue is that they're hard to test thoroughly in advance, and due to how the blockchain works, you can't modify them once they're in use. What's more, since just about anybody can write one of these smart contracts, and most people aren't cybersecurity professionals or software engineers, you can't exactly catch all the errors in all the contracts ahead of time. As a result, people have lost a LOT of money to bug-ridden smart contracts. With the rise of LLMs in recent years, researchers have begun investigating how LLMs can be used to automatically detect errors in these contracts. Depending on the tool, the LLM might just identify issues, explain issues, fix issues, or do some combination of those tasks.
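To make that last part concrete, here's a rough sketch of the "identify and explain issues" flow that most of the LLM-based tools follow. This is illustrative Python, not any specific tool's code: the prompt wording, the Vault contract, and the stubbed-out model call are all made up for the example.

```python
# Rough sketch of the core loop behind the LLM-based auditors: wrap the
# contract source in a prompt, send it to a model, and return the findings.

# A deliberately buggy example contract (classic reentrancy: the balance is
# only zeroed out AFTER the external call).
EXAMPLE_SOURCE = """
pragma solidity ^0.8.0;
contract Vault {
    mapping(address => uint256) public balances;
    function withdraw() external {
        (bool ok, ) = msg.sender.call{value: balances[msg.sender]}("");
        require(ok);
        balances[msg.sender] = 0;
    }
}
"""

AUDIT_PROMPT = """You are a smart contract auditor.
Review the following Solidity contract and list every vulnerability you find.
For each one, give the line number, a short name (e.g. "reentrancy"),
and a one-sentence explanation.

Contract:
{source}
"""

def audit_contract(source: str, call_llm) -> str:
    """Build the audit prompt and hand it to whatever model `call_llm` wraps.

    `call_llm` is a placeholder for the actual backend (an API, a local
    model, etc.); every tool I tested had its own version of this piece.
    """
    return call_llm(AUDIT_PROMPT.format(source=source))

if __name__ == "__main__":
    fake_model = lambda prompt: "(model response would go here)"  # stand-in so the sketch runs offline
    print(audit_contract(EXAMPLE_SOURCE, call_llm=fake_model))
```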

My work was to analyze some of the most commonly used tools in order to identify recurring themes and flaws, ultimately turning my results into a poster for presentation at a symposium. I was also tasked with comparing the LLM-based tools to more conventional non-LLM-based tools (also called "static tools").

The Poster

My poster for my undergraduate research internship.

Key Findings

During my research, I identified 3 recurring weaknesses in the tools I was tasked with investigating.

The first was that the LLM-based tools were bad. Like really, really bad. I don't mean to diss the hard work done by the researchers and companies who made the tools, but of the ones I was able to get access to, the results were nearly unusable. They would incorrectly detect errors, show little to no robustness against adversarial input (aka me adding an extra space or a random sentence into the code of the contract they were reviewing), and occasionally start speaking in what looked like Thai for a few sentences. The upside was that they could explain the issues they found and generalize across different smart contract versions, unlike the static tools, but ultimately those explanations didn't matter when the issues they identified didn't exist in the first place.
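For the curious, my "adversarial input" tests were roughly this simple. The sketch below is illustrative Python rather than my actual scripts: it drops a blank line and a throwaway comment into the contract, then checks whether the tool's findings change.

```python
import random

def perturb(source: str) -> str:
    """Make a semantically meaningless edit: an extra blank line and a
    junk comment inserted at a random spot in the contract."""
    lines = source.splitlines()
    spot = random.randrange(len(lines) + 1)
    lines.insert(spot, "// note to self: remember to water the plants")
    lines.insert(spot, "")
    return "\n".join(lines)

def is_robust(audit_tool, source: str, trials: int = 5) -> bool:
    """`audit_tool` maps contract source -> a set of finding names.

    A robust tool should report the same findings no matter how many of
    these trivial rewrites we throw at it."""
    baseline = audit_tool(source)
    return all(audit_tool(perturb(source)) == baseline for _ in range(trials))
```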

The second trend I noticed was that the LLM-based tools did not accept as diverse an array of inputs as the static tools did. Static tools could work with the original .sol file, the bytecode version of the contract, regular plaintext, or other niche representations of the smart contract. The LLM-based tools accepted almost exclusively the plaintext version. If LLM-based tools are ever going to replace the static tools, I recommended developing ones capable of accepting the contract in any of these forms, whether plaintext, the original .sol file, or bytecode.
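For reference, those different representations are easy to produce from a single contract. Here's a minimal sketch, assuming the solc compiler is installed and on your PATH ("Vault.sol" is a made-up file name):

```python
import subprocess
from pathlib import Path

sol_path = Path("Vault.sol")  # hypothetical contract file

# The plaintext / source form: the only thing most LLM-based tools accept.
plaintext = sol_path.read_text()

# The bytecode form: `solc --bin` compiles the contract and prints the
# deployment bytecode, which many static tools can analyze directly.
result = subprocess.run(
    ["solc", "--bin", str(sol_path)],
    capture_output=True, text=True, check=True,
)
# For a single-contract file, the hex string ends up on the last line of solc's output.
bytecode_hex = result.stdout.splitlines()[-1]

print(f"source: {len(plaintext)} chars, bytecode: {len(bytecode_hex) // 2} bytes")
```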

My last finding was one that I've come to realize is not exclusive to the smart contract auditing field. As it turns out, the datasets and models produced by many papers are not actually accessible. Many papers did not have any links to their code. Some provided links to sites like HuggingFace, but the models weren't publicly available. Others required huge installations totaling a few GB of data you didn't care about just to get access to the model you did care about. Without access to most of the datasets and models, you can't falsify or reproduce most of the results. I'd argue that this was a more pressing issue than the tools just being unusable. The graphs and tables and stellar metrics in the papers are useless if you can't build on the work behind them, and if nobody can build on the work that's already been done, it'll take that much longer for the field to make real progress.