Chair of Programming Languages and AI

Talk by Mihai Christodorescu (Google)

Challenges and Opportunities in LLM-based Program Analysis


On Friday, 28 June 2024, 11:00 - 12:00, Mihai Christodorescu of Google will give a talk on "Challenges and Opportunities in LLM-based Program Analysis" in Room 123, Oettingenstr. 67, as part of the ConVeY seminar series.


We posit that Large Language Models (LLMs) present a middle ground between security decisions based on empirical, reactive information and security decisions based on logic-based analyses. For example, understanding the security semantics of programs requires modeling the semantics of APIs exposed by the runtime system. Unfortunately, such APIs are not designed to be "semantically orthogonal" and often overlap by offering different performance points for the same functionality. This leaves it to the security mechanism to discover and account for API proxies, i.e., groups of APIs which together approximate the functionality of some other API. Lacking a complete view of the structure of the API-proxy relationship, current security mechanisms address it in an ad-hoc and reactive manner, updating the implementation of policies whenever new API proxies are discovered and abused by attackers. LLMs can help here by discovering API proxies for commonly used APIs.

Furthermore, LLMs may be able to produce program information similar to that of logic-based analyses, especially given their remarkable performance on code-comprehension tasks (including code completion and editing). By exploring to what degree auto-regressive models understand the semantics of programs, we discover that current models lack understanding of concepts such as data flow and control flow. Our Counterfactual Analysis for Programming Concept Predicates (CACP) is a testing framework to evaluate whether Large Code Models understand programming concepts and to help pinpoint specific limitations in LLMs' understanding of program semantics.


Dr. Mihai Christodorescu is a Research Scientist at Google, where he focuses on the security and privacy of mobile software. His research addresses fundamental computer-security and privacy problems by combining methods from multiple domains: programming languages, machine learning, behavioral modeling, and formal methods. Most recently, he has focused on translating progress in user authentication to software-service authentication and on designing cryptographic techniques that allow users to disclose their personal data in flexible ways. He received his Ph.D. in Computer Sciences from the University of Wisconsin–Madison in 2007. Dr. Christodorescu holds 25 patents and has published more than 35 papers at international conferences and in journals, including the IEEE Symposium on Security and Privacy (S&P), the ACM Conference on Computer and Communications Security (CCS), the USENIX Security Symposium, and the Annual Computer Security Applications Conference (ACSAC).