BSc/MSc Theses
This is a non-exhaustive list of thesis topics currently offered at PLAI; additional related topics may be available on request.
If you are interested in a topic for a BSc or MSc thesis, please contact the person mentioned, attach your transcript of records, and highlight all experience relevant to that topic. Also, please take note of the languages the person speaks.
Open
-
Motivation
The Rust compiler exposes internal APIs that allow researchers and tool developers to build analyses directly on top of the compiler's intermediate representations. A growing ecosystem of academic and industrial tools depends on these APIs: linters, static analyzers, verification frameworks, and custom compiler passes.
There is a catch: rustc's internal API is explicitly unstable and changes frequently, often with every release. A tool that compiles and runs correctly today may be broken six weeks later. For research tools in particular, which are often no longer maintained after publication, this means many tools quietly become unusable within months, and reproducing published results becomes increasingly difficult over time.
Research Direction
I am interested in building a tool that is able to do most of the work required to port rustc plugins to the newest compiler version. Concrete steps might include surveying existing work on API compatibility and automated migration (e.g., API diff tools, library migration studies), developing a taxonomy of the kinds of breaking changes rustc actually makes, and building a tool that detects and applies these changes to plugin codebases.
Prerequisites
- No fear of tackling topics that are yet to be solved
- Very good coding skills: most topics build upon large and complex software such as rustc, LLVM, ...
- Preferably fluent in both C/C++ and Rust
- Preferably some basic knowledge of static analysis, i.e., how to reason about code semantics without executing it (through courses like Program Analysis for Security or other compiler courses; SoSy Lab courses may also be beneficial)
-
Motivation
Memory-safety vulnerabilities — buffer overflows, use-after-free, dangling pointers — are among the most exploited bug classes in software security, enabling attacks that compromise critical infrastructure, expose personal data, and power state-level espionage tools. The root cause is well-understood: most of the world's foundational software (operating systems, browsers, cryptographic libraries) is written in C or C++, languages that prioritize performance over safety and predate the modern threat landscape.
Rust was designed to close this gap. Its type system and borrow checker statically rule out entire classes of memory-safety bugs — without a garbage collector. One promising path forward is automatic C-to-Rust translation, which could retire decades of unsafe legacy code without requiring a complete rewrite by hand.
The Problem: Translation Is Harder Than It Looks
Safe Rust is strictly less expressive than C: the set of memory-safe programs is a strict subset of all programs expressible in C, and the set of programs expressible in safe Rust is a strict subset of the memory-safe programs. This means translating C to safe Rust is not merely a syntactic transformation: it requires inferring properties and constraints the original C code never made explicit. Since memory safety of arbitrary programs is undecidable, fully automatic translation to safe Rust is undecidable as well.
Recent work combines classical program analysis with LLMs to make progress on this problem, exploiting LLMs' strength at code-level pattern recognition. But this raises new challenges:
- Correctness: the LLM may silently change the semantics of the code during translation.
- Cost: current LLM-based approaches can spend hundreds of dollars translating even modest programs.
Research Direction
This thesis explores how to make C-to-Rust translation more practical and trustworthy. Two directions are of particular interest: improving the translation quality of locally-deployable LLMs, e.g., through constrained decoding, and developing testing strategies that can assess translation correctness in the absence of a ground truth.
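The second direction, assessing correctness without ground truth, is commonly approached via differential testing: run the original program and its translation on the same inputs and compare observable behavior. A minimal sketch of the idea follows; the callables stand in for invocations of the compiled C and Rust binaries, and all names are illustrative assumptions, not part of any existing tool.

```python
import random

def differential_test(c_impl, rust_impl, gen_input, trials=1000, seed=0):
    """Compare two implementations on randomly generated inputs.

    c_impl / rust_impl: callables standing in for the original C program
    and its Rust translation (in practice, wrappers that run the compiled
    binaries and capture their output).
    gen_input: callable taking an RNG and producing one test input.
    Returns the inputs on which the two implementations disagree.
    """
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        x = gen_input(rng)
        if c_impl(x) != rust_impl(x):
            mismatches.append(x)
    return mismatches

# Toy stand-ins: a hypothetical "translation" that mishandles empty input.
original = lambda b: b[::-1]
translated = lambda b: b[::-1] if b else b"bug"
gen = lambda rng: bytes(rng.randrange(256) for _ in range(rng.randrange(4)))

bad = differential_test(original, translated, gen)  # collects the failing inputs
```

A real pipeline additionally needs input generation that respects the program's input format, crash and timeout handling, and a notion of observational equivalence coarser than byte-identical output.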
Prerequisites
- No fear of tackling topics that are yet to be solved
- Very good coding skills: most topics build upon large and complex software such as rustc, LLVM, ...
- Preferably fluent in both C/C++ and Rust
- Preferably some basic knowledge of static analysis, i.e., how to reason about code semantics without executing it (through courses like Program Analysis for Security or other compiler courses; SoSy Lab courses may also be beneficial)
-
BSc/MSc Privilege Pirates – LLM-Guided Automatic Exploitation of Message-Passing Vulnerabilities
This topic is of high interest to me!
Motivation
Browser extensions rely heavily on message passing to coordinate actions between background scripts, content scripts, and web pages. When extensions do not properly validate incoming messages, an attacker-controlled website can trigger privileged actions, access sensitive data, or escalate its privileges through the extension. These vulnerabilities are severe, yet many extensions still expose dangerous message handlers without proper security checks.
While previous work has focused on detecting message-passing vulnerabilities, much less attention has been given to exploiting them. Modern LLMs, combined with browser automation frameworks such as Selenium, Playwright, or Puppeteer, now enable a new class of tooling: one that not only identifies risky message handlers but also crafts real exploitation payloads, drives the browser, and verifies whether privilege escalation is possible.
This project aims to build the first prototype of an LLM-driven exploitation engine for browser extensions.
Your Part
- Build a pipeline that ingests an extension's source code and uses an LLM to:
- identify suspicious message handlers,
- infer expected message formats,
- generate candidate exploit payloads.
- Use automated browser instrumentation (Selenium, Playwright, or Puppeteer) to:
- load the target extension,
- simulate attacker websites,
- send LLM-generated exploit messages,
- observe whether privileged actions are triggered.
- Develop a feedback loop where runtime results refine LLM-generated payloads.
- Evaluate the system on known vulnerable extensions and measure:
- exploitation success rate,
- precision of LLM-generated payloads,
- false positives / false negatives.
- (Optional) Extend analysis to multiple browsers (Chrome, Edge, Firefox).
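The feedback-loop step above can be sketched as a search over LLM-proposed candidate field values. In the sketch below the oracle is a stand-in for the instrumented browser, and the handler behavior, field names, and values are all hypothetical:

```python
import itertools

def refine_payloads(base_payload, field_candidates, oracle, max_attempts=100):
    """Enumerate candidate payloads and return the first one the oracle
    accepts, together with the number of attempts used.

    base_payload: dict with fixed message fields (e.g. the handler's type tag).
    field_candidates: dict mapping field name -> candidate values, e.g. as
        proposed by an LLM from the handler's source code.
    oracle: callable(payload) -> bool; in the real pipeline this would drive
        the browser and report whether the privileged action fired.
    """
    names = list(field_candidates)
    combos = itertools.product(*(field_candidates[n] for n in names))
    for i, combo in enumerate(combos, 1):
        payload = dict(base_payload, **dict(zip(names, combo)))
        if oracle(payload):
            return payload, i
        if i >= max_attempts:
            break
    return None, max_attempts

# Toy oracle standing in for the instrumented browser: this hypothetical
# handler only reacts to type == "exec" with a string command.
oracle = lambda p: p.get("type") == "exec" and isinstance(p.get("cmd"), str)
found, attempts = refine_payloads(
    {"type": "exec"},
    {"cmd": [None, 42, "open_settings"]},
    oracle,
)
```

In practice the loop would feed runtime observations (console errors, triggered API calls) back to the LLM to propose better candidates, rather than enumerating a fixed list.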
Prerequisites
- Strong JavaScript and HTML skills.
- Good understanding of browser extension architecture.
- Basic experience with Python or Node.js.
- Interest in security, reverse engineering, or automated exploitation.
- (Optional) Prior exposure to LLM APIs, prompt engineering, and browser automation frameworks such as Selenium/Playwright/Puppeteer.
Related Work
-
BSc/MSc Browser Sentinel - Behavioral Profiling of Browser Extensions Using LLMs
I intend to get a publication out of this!
Motivation
Browser extensions enhance user experience but can also introduce significant security and privacy risks. They have access to sensitive data, powerful APIs, and system resources, making them a potential vector for malicious behavior. Current behavioral analysis approaches rely on permissions, API usage, or manually annotated characteristics. These methods are limited in capturing semantic intent — what the extension actually does. Large Language Models (LLMs) are capable of understanding both code and natural language, allowing them to generate semantic descriptions of extension behavior. By comparing such descriptions across versions, we can detect anomalies, suspicious behavior, or subtle changes in functionality that might indicate vulnerabilities or malicious activity.
Your Part
- Dataset Creation: Collect multiple versions of Chrome extensions, including known benign and malicious updates.
- LLM Behavior Descriptions: Use an LLM to generate semantic behavioral summaries for each extension version (e.g., “tracks user activity,” “injects ads,” “scrapes form data”).
- Semantic Signatures: Transform LLM-generated descriptions into machine-readable vectors (embeddings).
- Change Detection: Compare semantic signatures across versions to identify:
- Unexpected new capabilities
- Stealthy malicious behavior
- Shifts in privacy posture
- Evaluation: Validate against known malicious extensions, or use case studies from large-scale datasets to assess detection accuracy and utility.
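The change-detection step can be sketched as a similarity comparison over the embedded signatures. The embedding vectors and the threshold below are illustrative assumptions; real signatures would come from an embedding model applied to the LLM summaries:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flag_drift(signatures, threshold=0.8):
    """Flag consecutive version pairs whose semantic signatures diverge.

    signatures: list of (version, vector) in release order, where each
    vector embeds the LLM-generated behavior summary of that version.
    Returns the version pairs whose similarity drops below the threshold.
    """
    flagged = []
    for (v1, e1), (v2, e2) in zip(signatures, signatures[1:]):
        if cosine(e1, e2) < threshold:
            flagged.append((v1, v2))
    return flagged

# Toy embeddings: version 2.0 shifts sharply, as if a new behavior
# (e.g. "scrapes form data") started dominating the summary.
sigs = [("1.0", [1.0, 0.0]), ("1.1", [0.9, 0.1]), ("2.0", [0.1, 1.0])]
suspicious = flag_drift(sigs)
```

Flagged pairs would then be handed to a human analyst or a second LLM pass that explains what changed between the two summaries.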
Prerequisites
- Strong JavaScript and HTML knowledge
- Familiarity with LLM APIs and prompting
- Basic understanding of embeddings / representation learning
- Interest in security analysis of browser extensions
Product
- A fully automated method to generate semantic fingerprints of browser extensions.
- Ability to detect behavioral changes over time, improving security and privacy monitoring.
- Potential publication combining LLM reasoning, security analysis, and behavioral profiling.
Related Work
-
I intend to get a publication out of this!
Motivation
Fuzzing has long been established as the state of the art in dynamic software analysis but still faces significant challenges when it comes to fuzzing programs that consume highly structured input. Grammars have been proposed as one solution to the underlying problem, enabling the bulk generation of structured input data. Unfortunately, even a perfectly specified grammar offers little benefit if the input generator falls short at producing effective, deep-reaching seeds. Hence, we need input generators that maximize grammar coverage and structural depth while adhering to the grammar, in order to increase the exploration of deep program states when fuzzing.
Your Part
- Conduct a systematic analysis of state-of-the-art grammar-based fuzzing approaches, with a particular focus on the design and capabilities of their input generators
- Identify the strengths and limitations of existing generators with respect to coverage depth, structural diversity, and seed effectiveness
- Design and implement a unified input generation framework that integrates complementary features from multiple grammar-based fuzzers
- Perform an ablation study showing the impact of your generator's features on a set of relevant real-world targets using FuzzBench, and measure improvements in code coverage and seed quality over other approaches
- Provide a detailed discussion of design trade-offs and practical recommendations for future grammar-based input generators
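A depth-aware grammar-based generator can be sketched in a few lines. The grammar encoding and the depth-budget policy below are illustrative assumptions, not taken from any of the fuzzers under study:

```python
import random

def generate(grammar, symbol, depth, rng):
    """Expand `symbol` with a remaining depth budget.

    grammar: dict mapping nonterminal -> list of alternatives, each a list
    of symbols; strings not in the grammar are terminals. While budget
    remains, alternatives are chosen randomly to explore structural
    diversity; once it runs out, the shortest alternative forces termination.
    """
    if symbol not in grammar:
        return symbol
    alts = grammar[symbol]
    alt = min(alts, key=len) if depth <= 0 else rng.choice(alts)
    return "".join(generate(grammar, s, depth - 1, rng) for s in alt)

# Toy arithmetic grammar.
GRAMMAR = {
    "expr": [["num"], ["expr", "+", "expr"], ["(", "expr", ")"]],
    "num": [["1"], ["2"]],
}

rng = random.Random(7)
seeds = [generate(GRAMMAR, "expr", depth=6, rng=rng) for _ in range(5)]
```

Real generators refine the two policies above, e.g. biasing alternative choice toward grammar rules not yet covered instead of picking uniformly at random.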
Prerequisites
- Solid programming skills in Python and experience with reading and understanding source code in both C and C++
- Basic understanding of fuzzers and rule-based grammars such as context-free grammars (CFGs)
- Interest in dynamic software analysis, finding vulnerabilities and possibly writing a workshop paper.
Related Work
-
I intend to get a publication out of this!
Motivation
Fuzzing has long been established as the state of the art in dynamic software analysis but still faces significant challenges when it comes to programs that consume highly structured input. Grammars have been proposed as one solution, enabling the bulk generation of structured input data. To this day, the open-source community has handwritten and carefully crafted grammars for a wide variety of file formats in several different grammar formats such as JSON, ANTLR4, and even Python. Unfortunately, grammar fuzzers usually consume only one or a subset of these formats. Translating grammars from one format into another would allow broader application of grammars across different fuzzers and thereby more diverse fuzzing campaigns with a wider variety of inputs.
Your Part
- Gather a large set of ANTLR4 and JSON grammars and craft a dataset. Label them and analyze the features they encode or fail to encode.
- Determine which formats are underrepresented in your dataset. Give an indication of whether this reflects specific challenges when writing grammars, and outline them clearly.
- Design and implement a grammar translation engine that transforms grammars between formats and use them for fuzzing.
- Carefully take into account that there might be a semantic gap between grammars and come up with a way of handling it.
- Evaluate the seed generation capabilities of your newly translated grammars and compare the seeds based on reachability, generation depth, and validity.
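As a minimal illustration of what a translation engine does, the sketch below converts a grammar in one JSON convention (the fuzzingbook style, with nonterminals in angle brackets) into BNF-like text. This convention is an assumption for illustration; real-world formats such as ANTLR4 add features (actions, predicates, lexer modes) that require exactly the semantic-gap handling mentioned above:

```python
import re

def json_to_bnf(grammar):
    """Translate a JSON-style grammar into BNF-like text.

    Assumes the fuzzingbook-style convention: a dict mapping "<nonterminal>"
    to a list of expansion strings in which nonterminals appear as <name>
    and everything else is literal terminal text.
    """
    lines = []
    for lhs, alternatives in grammar.items():
        rhs = []
        for alt in alternatives:
            # Split each expansion into nonterminal references and terminal runs.
            parts = re.split(r"(<[^<> ]+>)", alt)
            rhs.append(" ".join(
                p if p.startswith("<") else '"%s"' % p
                for p in parts if p
            ))
        lines.append("%s ::= %s" % (lhs, " | ".join(rhs)))
    return "\n".join(lines)

JSON_GRAMMAR = {
    "<start>": ["<digit><digit>"],
    "<digit>": ["0", "1"],
}
bnf = json_to_bnf(JSON_GRAMMAR)
```

The direction shown here (JSON to BNF) is the easy one; translating out of richer formats is where features that the target format cannot encode must be approximated or dropped explicitly.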
Prerequisites
- Programming skills in Python and familiarity with reading Rust or C++ code.
- Understanding of, or willingness to learn, how rule-based grammars such as context-free grammars (CFGs) work.
- Interest in fuzzing and possibly writing a workshop paper.
Related Work
-
This topic is of high interest to me!
Motivation
Browser extensions often rely on message passing to communicate between privileged and unprivileged components, such as background scripts and content scripts. This mechanism, if not properly validated, can introduce severe vulnerabilities that enable privilege escalation, unauthorized access, or data leakage. Understanding and detecting these message-passing flaws is essential to ensure the security of the browser ecosystem.
Existing research tools such as DoubleX and CoCo have proven effective in identifying message-passing vulnerabilities in Chrome extensions. However, their applicability to other ecosystems like Microsoft Edge and Mozilla Firefox remains underexplored. By adapting and evaluating these tools across multiple browsers, we can gain broader insights into cross-platform security issues and assess whether similar vulnerability patterns persist.
Your Part
- Apply DoubleX and CoCo to extensions from the Edge Add-ons Store and the Firefox Add-ons Store.
- Conduct a manual analysis of detected vulnerabilities to identify common patterns and unique findings.
- Optionally, design and evaluate an LLM-based approach to assist or automate vulnerability classification and explanation.
- Document and compare the effectiveness of DoubleX and CoCo across different browser environments.
Prerequisites
- Strong JavaScript and HTML knowledge is required for this topic.
- Understanding of browser extensions functionality.
- Familiarity with static analysis techniques.
Related Work
-
Motivation
Modern software systems often rely on complex and highly customized build configurations. These configurations define compilation steps, dependencies, flags, and environment-specific behavior. However, in many real-world scenarios, such as security analysis, legacy system maintenance, or binary analysis, the original build configuration is missing, incomplete, or unreliable. This makes it difficult to reproduce builds, analyze program behavior accurately, or apply static analysis tools effectively. Automatically reverse engineering build configurations from artifacts (i.e., binaries) would significantly improve reproducibility, security auditing, and program understanding.
Your Part
In this thesis, you will investigate techniques to automatically infer build configurations from compiled artifacts and partial project information. The project is structured as follows:
- Identification of Relevant Build Features: Together, we will identify which aspects of a build configuration (e.g., dependency structure, compiler flags, module boundaries) are most critical for downstream tasks such as binary patching or vulnerability detection.
- Design of Reverse Engineering Techniques: You will design and implement methods to reconstruct these features from available data sources.
- Optimization and Heuristics: Since multiple build configurations can produce similar artifacts, you will explore heuristics and optimization strategies to infer the most likely configuration. This may include:
  - Pattern-based inference
  - Constraint solving
  - Probabilistic ranking of candidate configurations
- Evaluation: You will evaluate your approach on open-source projects, measuring:
  - Accuracy of reconstructed configurations
  - Impact on downstream analyses (e.g., binary patching precision/performance)
  - Scalability to large codebases
Optionally, the project can be extended to explore more fundamental questions, such as whether certain build abstractions or modularization strategies make reverse engineering easier or harder.
Prerequisites
- Solid programming skills (language-agnostic; experience with systems-level concepts is a plus)
- Basic knowledge of compilers, program analysis, or reverse engineering is beneficial
- Interest in software analysis, tooling, or systems research
-
-
BSc/MSc Automatic Modularization of Software Projects for LLM-based Transpilation
Application: Please write an email describing your relevant experience and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample.
Motivation
- With the help of LLMs, more and more tools are being developed for transpiling software from old programming languages to newer ones, e.g., C-to-Rust transpilation.
- However, most LLM-based research prototypes only operate on code snippets, because the models' context size is too small.
Your Part
- Given an arbitrary context-size limitation of an LLM ($n$ characters), implement an algorithm to split the code base into small chunks of transpilable code, e.g., by computing a call graph and identifying strongly connected components among the program's functions.
- Challenge: how to split the code base into arbitrarily small chunks while ensuring that each chunk is still compilable?
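The strongly-connected-components step can be sketched with Tarjan's algorithm: mutually recursive functions land in the same component and must be transpiled together, and reverse topological order of the condensation yields a translation order in which each chunk's callees are already done. The call graph below is a hypothetical toy example:

```python
def sccs(call_graph):
    """Tarjan's algorithm: strongly connected components of a call graph
    (dict: function -> list of callees). Each returned component is a
    candidate transpilation chunk.
    """
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v); on_stack.add(v)
        for w in call_graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:        # back edge into the current SCC
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:         # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            out.append(sorted(comp))

    for v in call_graph:
        if v not in index:
            visit(v)
    return out

# Toy call graph: parse/emit are mutually recursive, main calls parse.
CG = {"main": ["parse"], "parse": ["emit"], "emit": ["parse"], "util": []}
chunks = sccs(CG)
```

A real modularizer must additionally attach the type definitions, globals, and headers each chunk references, which is where the "still compilable" challenge above comes in.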
Prerequisites
- Basic principles of Software Engineering
- Experience with the C programming language
- Lecture "Program Analysis for Security" or "Compiler-Design"
-
We are looking for a student to help develop a continuous integration (CI) pipeline for a binary patching tool. The project involves automating the build, test, and deployment processes to ensure reliability and efficiency.
Prerequisites
- Strong Python skills
- Comfortable with Bash scripting on Linux
-
Application: Please write an email describing your relevant experience and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample.
Motivation
- Rust is a modern programming language that offers strong performance with improved security. However, rewriting existing code bases by hand is too expensive.
- To enable widespread reuse of existing Rust libraries, we want to build an index of Rust libraries from https://crates.io/ that can act as replacements for widespread C libraries, e.g., the image crate as a replacement for libpng.
- This way, all C applications that depend on libpng might be made a little bit safer by swapping out libpng for Rust's image library.
Your Part
- Use a combination of LLMs and deterministic methods to implement a search for potential Rust replacements for a given C library.
- By iterating this search over multiple popular C libraries, build an index of potential Rust replacements.
- If possible, not only search for functionally equivalent Rust libraries but also assess other properties that would simplify the developer's work in the replacement process, e.g., potential API compatibility, build system compatibility, or the risk of including unstable third-party dependencies (see: https://tweedegolf.nl/en/blog/119/sudo-rs-depencencies-when-less-is-better)
- As a case study, take an example C application and one of its libraries with an adequate Rust replacement and evaluate the results for equivalence and performance.
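One deterministic piece of such a search is combining the assessed properties into a ranking. The sketch below is purely illustrative: the signal names, values, and weights are hypothetical assumptions, not an established metric.

```python
def rank_candidates(candidates, weights=None):
    """Rank candidate Rust replacement crates for a C library by combining
    several signals in [0, 1] into one score."""
    weights = weights or {"api_overlap": 0.5, "build_compat": 0.3, "dep_risk": 0.2}

    def score(c):
        # dep_risk is a penalty: fewer unstable dependencies -> higher score.
        return (weights["api_overlap"] * c["api_overlap"]
                + weights["build_compat"] * c["build_compat"]
                + weights["dep_risk"] * (1.0 - c["dep_risk"]))

    return sorted(candidates, key=score, reverse=True)

# Hypothetical signals for two candidate replacements for libpng.
ranked = rank_candidates([
    {"name": "png",   "api_overlap": 0.6, "build_compat": 0.7, "dep_risk": 0.2},
    {"name": "image", "api_overlap": 0.9, "build_compat": 0.6, "dep_risk": 0.4},
])
```

The LLM part of the pipeline would propose candidates and estimate signals such as API overlap; the deterministic part keeps the final ranking reproducible and auditable.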
-
BSc/MSc LLM-Based Detection of Message-Passing Vulnerabilities in Browser Extensions
This topic is of high interest to me!
Motivation
Browser extensions often rely on message passing to communicate between privileged and unprivileged components, such as background scripts and content scripts. This mechanism, if not properly validated, can introduce severe vulnerabilities that enable privilege escalation, unauthorized access, or data leakage. Understanding and detecting these message-passing flaws is essential to ensure the security of the browser ecosystem.
Large Language Models (LLMs) can analyze source code with contextual understanding, enabling them to identify complex security flaws that traditional static analysis may miss. By reasoning about message flows and privilege boundaries, LLMs can detect message-passing vulnerabilities that depend on subtle logic or inconsistent validation. This approach offers a promising new direction for automated vulnerability detection in browser extensions, combining semantic insight with scalability.
Your Part
- Identify browser extensions that use message-passing APIs and/or sensitive APIs.
- Prepare the source code for LLM-based analysis by structuring relevant files and metadata.
- Design LLM prompts for detecting message-passing vulnerabilities, inspired by DoubleX and CoCo.
- Run the LLM analysis on selected extensions and collect results.
- Validate the findings through manual inspection or comparison with existing tools.
Prerequisites
- Strong JavaScript and HTML knowledge is required for this topic.
- Understanding of browser extensions functionality.
- Interest in applying Large Language Models for code understanding.
Related Work
-
BSc/MSc Trust Me, Bro - An LLM-Guided Study of Data Collection in the Chrome Web Store
I intend to get a publication out of this!
Motivation
Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.
The paper Detection of Inconsistencies in Privacy Practices of Browser Extensions investigates how browser extensions handle user data and whether their behaviors align with their stated privacy policies. The authors develop an analysis framework to compare declared permissions, privacy policies, and actual data access patterns, uncovering discrepancies that indicate potential privacy violations. Their findings reveal that many extensions request excessive permissions or secretly leak user data, emphasizing the need for stricter enforcement of privacy policies in extension stores.
Your Part
Using the paper as inspiration, we want to study the collection of data types that were not covered in recent research, for example:
- Personal Identifiable Information
- Health Information
- Financial & Payment Information
- Authentication Information
- Personal Communications
Your experiments will be conducted on the current state of the Chrome Web Store.
How you address this problem is up to you. Possible approaches include:
- Leverage LLMs for natural language analysis of privacy policies
- Perform static analysis with the CodeQL framework
- Trigger behaviors with libraries for dynamic analysis, such as Selenium
Prerequisites
- Strong JavaScript and HTML knowledge is required for this topic.
- Understanding of browser extensions functionality.
- Interest in privacy.
Related Work
Detection of Inconsistencies in Privacy Practices of Browser Extensions
-
Motivation
Memory-safety vulnerabilities — buffer overflows, use-after-free, dangling pointers — are among the most exploited bug classes in software security, enabling attacks that compromise critical infrastructure, expose personal data, and power state-level espionage tools. The root cause is well-understood: most of the world's foundational software (operating systems, browsers, cryptographic libraries) is written in C or C++, languages that prioritize performance over safety and predate the modern threat landscape.
Rust was designed to close this gap. Its type system and borrow checker statically rule out entire classes of memory-safety bugs without requiring a runtime garbage collector. Organizations like Google, Microsoft, the Linux kernel project, and various government bodies are actively migrating security-critical code to Rust.
Migrating incrementally means Rust code routinely calls into existing C libraries via the Foreign Function Interface (FFI). Because the Rust compiler cannot reason about C code, all FFI calls are inherently unsafe. The standard practice is to write a safe wrapper: a Rust API that encapsulates the raw FFI calls and enforces the invariants the C library silently assumes. If this encapsulation is wrong, the wrapper is unsound: it exposes safe Rust code to undefined behavior without any unsafe block in sight. The Rust community treats soundness bugs as security vulnerabilities and tracks them in the RustSec advisory database.
Research Direction
I am interested in the automatic detection of soundness violations and bugs in Rust FFI wrappers using static analysis. This topic is closely related to my own research and open topics change frequently based on my own progress. If you are interested, I can send you the currently available topics.
Prerequisites
- No fear of tackling topics that are yet to be solved
- Very good coding skills: most topics will build upon large and complex software such as rustc, LLVM, ...
- Preferably fluent in both C/C++ and Rust
- Preferably some basic knowledge of static analysis, i.e., how to reason about code semantics without executing it (through courses like Program Analysis for Security or other compiler courses; SoSy Lab courses may also be beneficial)
-
BSc/MSc Extension Blueprint - LLM-Driven Reverse Engineering of Browser Extensions
I intend to get a publication out of this!
Motivation
Browser extensions are powerful but risky: with access to browsing data, cookies, network APIs, and storage, they can easily introduce security, privacy, and integrity issues. Recent advances in Large Language Models (LLMs) enable fully automated analysis of complex JavaScript codebases. There has been work already demonstrating that LLMs can reverse engineer real extensions, such as Honey, uncovering structural, behavioral, and security-sensitive features. This project extends that work to build a systematic, scalable, and reproducible analysis pipeline.
Your Part
- Extend the existing LLM-based analysis workflow to handle full-source reverse engineering of browser extensions.
- Design improved prompts, schemas, and automation for extracting structural, behavioral, and security-relevant information.
- Evaluate the system on multiple real extensions from the Chrome Web Store.
- Compare LLM-generated reports against manual inspection or ground truth to measure accuracy and limitations.
- (Optional) Explore techniques to detect suspicious patterns such as hidden monetization, data flows, or affiliate mechanisms.
Prerequisites
- Strong JavaScript and HTML knowledge.
- Basic understanding of browser extension architecture.
- Interest in security, program analysis, or reverse engineering.
- Willingness to work with LLM APIs, prompt engineering, and automation scripts.
- (Optional) Familiarity with static analysis tools or code summarization methods.
Related Work
- Exposing the Honey Influencer Scam
- Contact me for more related work!
-
Motivation
Taint analysis is a fundamental dataflow problem that can be used to identify security vulnerabilities such as SQL injection, XSS, insecure deserialization, and many others. To be useful and not overwhelm the user with too many false positives, the state of the art uses precise heap abstractions such as access graphs or access paths. You can think of an access path as a variable in the code followed by a chain of field references with a maximum length of k. These heap abstractions are additionally enriched with type information and other metadata.
One algorithm to solve these dataflow problems is IFDS. In particular, the ability to solve two dependent problems in cooperation, i.e., resolving aliases on demand for the taint analysis, allows IFDS-based taint analysis to discover non-trivial dataflows while remaining reasonably precise. However, IFDS has a worst-case complexity of O(|E| |D|^3), with D being the dataflow domain, and the access-path domain can grow quite large, making it impossible to analyze complex applications in a reasonable timeframe.
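The k-limited access-path abstraction mentioned above can be sketched as follows; the tuple representation and the "*" truncation marker are illustrative, not FlowDroid's actual data structures:

```python
K = 2  # maximum access-path length

def extend(path, field, k=K):
    """Append a field dereference to an access path, truncating at length k.

    An access path is modeled as (base_variable, fields). Truncation keeps
    the abstract domain finite: a path longer than k is over-approximated
    by its k-prefix, with "*" marking that any suffix may follow.
    """
    base, fields = path
    fields = fields + (field,)
    if len(fields) > k:
        return (base, fields[:k] + ("*",))
    return (base, fields)

p = ("x", ())
p = extend(p, "f")   # x.f
p = extend(p, "g")   # x.f.g
p = extend(p, "h")   # would be x.f.g.h; over-approximated
```

The "*" is where imprecision enters: choosing a larger k buys precision but enlarges the domain D, which feeds directly into the |D|^3 factor of the IFDS complexity.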
IDE is a generalization of IFDS, originally developed to solve map domains such as constant propagation (Var -> Value), and allows splitting a powerset domain into a dataflow domain and a value domain. The worst-case complexity of IDE is the same as for IFDS.
A recent paper (Oct 2024) shows that it is possible to split the access-path heap model into a dataflow domain of local variables and a value domain of field (de-)references, leading to speedups of 200x(!) on average compared to the equivalent IFDS formulation. However, they base their work on an artifact from 2016 and did not publish their code; moreover, they consider alias analysis orthogonal to the taint analysis problem and use a previously published alias analysis that still operates on an expensive access-graph domain to resolve aliases.
Your Part
First, we want to reproduce the work on top of an up-to-date version of Soot and FlowDroid, because we are highly interested in open-sourcing the more scalable analysis. Second, we have the idea of a cooperative alias analysis in IDE. IDE's phase 1 is basically IFDS, so we think it is possible to exchange dataflow facts in phase 1. Also, FlowDroid's call-site matching during the path-building stage has already proven that a context-free language can be solved across different analysis directions. Thus, the research question is whether it is possible to extend the CFL grammar, i.e., the value domain, to also solve aliases on demand and asynchronously with multiple IDE solvers, as is done with IFDS. To the best of our knowledge, there is no paper yet that uses multiple IDE solvers in cooperation, so this is something completely new to work on!
Prerequisites
- You will implement the solution on top of a large Java codebase, so proficiency in Java is beneficial.
- Furthermore, this topic requires you to design a complex static analysis that is supposed to analyze real-world Java applications, which requires good knowledge about the Java language semantics.
- Knowledge about static analysis and context-free languages obtained from courses like Principles of Compiler Design or Program Analysis for Security is also beneficial.
Related Work
- Boosting the Performance of Alias-Aware IFDS Analysis with CFL-Based Environment Transformers
- IFDS-based taint analysis: FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps, GitHub
- Independent Alias Analysis used by the paper: Boomerang: Demand-Driven Flow- and Context-Sensitive Pointer Analysis for Java, GitHub
-
BSc/MSc Truth in Code - Using LLMs to Verify the Consistency of GitHub Commits
I intend to get a publication out of this!
Motivation
Commit messages are essential for understanding the intent behind code changes and are a key part of software development history. They help collaborators and future developers grasp the reasoning behind a change without reading the entire diff. High-quality, accurate messages improve maintainability, support debugging, and enhance code review processes.
Large Language Models (LLMs) can understand both natural language and source code, making them a promising tool for assessing whether commit messages align with the corresponding code changes. By comparing the semantics of the message and the diff, LLMs can flag misleading or suspicious commits. This could support both security (e.g., catching stealthy malicious code) and developer productivity (e.g., catching mismatches or errors in intent).
Your Part
Build a dataset of commits with messages and diffs, labeled as matching or not. Prompt (or fine-tune) LLMs to classify these pairs or to generate explanations. Evaluation could include accuracy metrics and case studies of real-world suspicious commits from public repositories.
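As a baseline to compare the LLM against, message/diff consistency can be approximated lexically. This is a deliberately crude sketch; the stop-word list and the scoring are ad hoc assumptions, not an established metric:

```python
def consistency_score(message, diff):
    """Crude lexical baseline: fraction of commit-message content words
    that also occur in the diff text. An LLM should beat this easily,
    which is exactly what makes it a useful sanity-check baseline.
    """
    stop = {"the", "a", "an", "to", "of", "in", "for", "and", "fix", "fixes"}
    words = {w for w in message.lower().split() if w not in stop}
    if not words:
        return 0.0
    diff_text = diff.lower()
    return sum(w in diff_text for w in words) / len(words)

# A plausible message vs. a misleading one for the same diff.
diff = "- timeout = 10\n+ timeout = 30"
good = consistency_score("increase timeout value", diff)
bad = consistency_score("update readme typo", diff)
```

A lexical baseline like this also highlights the hard cases the thesis should focus on: semantically matching pairs that share no vocabulary, and misleading messages that reuse identifiers from the diff.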
Prerequisites
- Familiarity with Git and GitHub
- Basic knowledge of NLP or deep learning
- Experience with LLM APIs or related frameworks
Related Work
In Progress
-
BSc/MSc Deferring Flow-Sensitivity of Alias-Aware IFDS Problems to the Path Reconstruction Phase
-
BSc Security ShowCase: Designing, Attacking, and Defending an RFID-Based Smart Lock
-
MSc Generalizing Code Representations for Binary Code Similarity Detection Using Function Names
-
MSc LLM Automated black-box adversarial prompting for large language models
-
MSc Automated black-box adversarial prompting for large language models for Code
-
BSc Analyzing the Migration from Manifest V2 to V3 in Chrome Extensions: Impacts, Challenges, and Automation
-
MSc Comparison of Explainability Methods for Deep Learning-Based Binary Code Models
-
MSc You're not Firmware! You're just a child! An Analysis of Firmware Transplantation Approaches
-
BSc I Know What You Compared Last Night: A Contemporary Analysis of the Timing Side Channel in `memcmp()`
Last update 06.05.2026