I am a research scientist at Goodfire working on new techniques for auditing LLMs, with a special focus on addressing evaluation awareness.
I am a fourth-year PhD student at NYU (currently on leave), where I study scaling laws and phase transitions in neural networks and diffusion models with Arthur Jacot and Eric Vanden-Eijnden. I received my B.S. in Mathematics from Stanford University.
Discovering Undesired Rare Behaviors via Model Diff Amplification
S. Aranguri, T. McGrath
Used by Anthropic to evaluate Claude Sonnet 4.5; see the system card (page 95)
Verbalized Eval Awareness Inflates Measured Safety
S. Aranguri, J. Bloom
Predicting Rare LLM Failures with 30x Fewer Rollouts
F. Pernice, S. Aranguri
Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training
F. Xiao, S. Aranguri
SAE on activation differences
S. Aranguri, J. Drori, N. Nanda
Inference-Time Toxicity Mitigation in Protein Language Models via Logit-Diff Amplification
M. Burda, S. Aranguri, I. Arcuschin, E. Ferrante
ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design
Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models
S. Aranguri, F. Insulla
ICLR 2025 Deep Generative Model in Machine Learning Workshop and Frontiers in Probabilistic Inference Workshop
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Z. Tu, S. Aranguri, A. Jacot
NeurIPS 2024
Optimizing Noise Schedules of Generative Models in High Dimensions
S. Aranguri, G. Biroli, M. Mézard, E. Vanden-Eijnden
Untangling planar graphs and curves by staying positive
S. Aranguri, H. Chang, D. Fridman
ACM-SIAM Symposium on Discrete Algorithms 2022