I'm a Researcher on the AI Red Team at Microsoft, where I study safety and security vulnerabilities in generative AI systems. In doing so, I hope to both mitigate AI risks in the short term and build more robust systems in the long term. I became interested in safety and alignment while working on my master's thesis, which focused on incorporating physics-based constraints into neural networks.
In my free time, I enjoy running, biking, and being outside. I love learning and keep a (neglected) page of math notes where I explain important topics in my own words. You can find my resume here, and feel free to reach out!
A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks
B Bullwinkel, M Russinovich, A Salem, S Zanella-Beguelin, D Jones, G Severi, E Kim, K Hines, A Minnich, Y Zunger, R Shankar Siva Kumar
ICML Workshop on Data in Generative Models (DIG-BUGS), 2025
Arxiv
Steering Language Model Refusal with Sparse Autoencoders
K O'Brien, D Majercak, X Fernandes, R Edgar, B Bullwinkel, J Chen, H Nori, D Carignan, E Horvitz, F Poursabzi-Sangdeh
ICML Workshop on Actionable Interpretability (AIW), 2025
Arxiv
Poster
A Systemization of Security Vulnerabilities in Computer Use Agents
D Jones, G Severi, M Pouliot, G Lopez, J de Gruyter, S Zanella-Beguelin, J Song, B Bullwinkel, P Cortez, A Minnich
ICML Workshop on Computer Use Agents, 2025
Arxiv
Lessons From Red Teaming 100 Generative AI Products
B Bullwinkel et al.
Microsoft BlueHat, 2024
NeurIPS Workshop on Red Teaming GenAI, 2024
Blog
Arxiv
eBook
Talk
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
B Bullwinkel et al.
Arxiv, 2024
Arxiv
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI Systems
B Bullwinkel et al.
Conference on Applied Machine Learning in Information Security (CAMLIS), 2024
Arxiv
Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations
Z Ma, S Su, N Zhao, L Bieske, B Bullwinkel, Y Zhang, S Yang, Z Luo, S Li, G Liao, B Wang, J Gao, Z Wen, C Bruderlein, W Pan
ICML Workshop on the Next Generation of AI Safety (NextGenAISafety), 2024
Arxiv
Poster
Transfer Learning with Physics-Informed Neural Networks for Efficient Simulation of Branched Flows
R Pellegrin, B Bullwinkel, M Mattheakis, P Protopapas
NeurIPS Workshop on Machine Learning and the Physical Sciences, 2022
Arxiv
Poster
DEQGAN: Learning the Loss Function for PINNs with Generative Adversarial Networks
B Bullwinkel, D Randle, P Protopapas, D Sondak
ICML Workshop on AI for Science (AI4Science), 2022
Arxiv
Poster
Evaluating the Fairness Impact of Differentially Private Synthetic Data
B Bullwinkel, K Grabarz, L Ke, S Gong, C Tanner, J Allen
ICML Workshop on Theory and Practice of Differential Privacy (TPDP), 2022
Arxiv
Poster
Azure/PyRIT
Active contributor to PyRIT, an open source framework that empowers security professionals and machine learning engineers to proactively find risks in their generative AI systems.
Repo
DEQGAN
Co-created a Python package that implements DEQGAN, an unsupervised generative adversarial network method for solving ordinary and partial differential equations.
Repo
Marble Groceries
Developed an iOS app (now defunct) that helped users understand the environmental impact of their grocery purchases by scanning product barcodes.
Article
Classifying the Sounds of NYC
Trained and tuned a variety of models to classify audio clips from the UrbanSound8K dataset, recorded around New York City, into ten classes.
Report
Notebook
Modeling ASA Section Membership
Constructed binary response generalized linear models to predict whether or not members of the American Statistical Association belonged to at least one section.
Report
Code
Woof Woof! Computer Vision & NLP App for Austin Pets Alive
Built a web app that lets users "chat" with dogs in the Austin Pets Alive animal shelter and search for visually similar ones using NLP and computer vision models.
Code
Wildfire Risk Prediction & Response Optimization
Trained tree-based classifiers on weather data to predict wildfire risk in California counties and used mixed-integer programming to determine the optimal assignment of firefighters across the state.
Report
Code
DreamDiff Python Package
Worked in a team of three to develop a Python package that implements forward-mode automatic differentiation (AD), root-finding, optimization, and quadratic spline interpolation.
Repo
PyPI
Analysis of Wildfires, Air Quality, and Public Health
Conducted time series analysis in R to link spikes in PM2.5 concentration to specific wildfire events in California and used major axis regression to explore correlations between air quality and public health outcomes.
Slides
Code
Modeling Electricity Consumption in the US
Built linear regression models to predict household electricity consumption in the US from various residential characteristics.
Report
Code
Early Epidemiological Model Parameters for COVID-19
Modeled early-stage COVID-19 case data in mainland China using systems of ordinary differential equations.
Slides
Code