I am a research scientist at Goodfire working on new techniques for auditing LLMs, with a special focus on addressing evaluation awareness.
I am a fourth-year PhD student at NYU (currently on leave), where I study scaling laws and phase transitions in neural networks and diffusion models with Arthur Jacot and Eric Vanden-Eijnden. I received my B.S. in Mathematics from Stanford University.
Discovering Undesired Rare Behaviors via Model Diff Amplification
S. Aranguri, T. McGrath
Used by Anthropic to evaluate Claude Sonnet 4.5; see the system card (page 95)
Verbalized Eval Awareness Inflates Measured Safety
S. Aranguri, J. Bloom
Predicting Rare LLM Failures with 30x Fewer Rollouts
F. Pernice, S. Aranguri
Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training
F. Xiao, S. Aranguri
SAE on activation differences
S. Aranguri, J. Drori, N. Nanda
Inference-Time Toxicity Mitigation in Protein Language Models via Logit-Diff Amplification
M. Burda, S. Aranguri, I. Arcuschin, E. Ferrante
ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design
Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models
S. Aranguri, F. Insulla
ICLR 2025 Deep Generative Model in Machine Learning Workshop and Frontiers in Probabilistic Inference Workshop
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Z. Tu, S. Aranguri, A. Jacot
NeurIPS 2024
Optimizing Noise Schedules of Generative Models in High Dimensions
S. Aranguri, G. Biroli, M. Mézard, E. Vanden-Eijnden
Untangling planar graphs and curves by staying positive
S. Aranguri, H. Chang, D. Fridman
ACM-SIAM Symposium on Discrete Algorithms 2022