News Nug

Protecting people from harmful manipulation

DeepMind Blog · 21d ago · 6 · research benchmark safety

Research release on empirically validated toolkit for measuring AI manipulation capabilities, tested across 10,000+ participants in finance and health domains. Provides open-source methodology and materials for evaluating how AI systems can be misused to deceptively influence human behavior and beliefs in high-stakes scenarios.

Improving instruction hierarchy in frontier LLMs

OpenAI Research · 36d ago · 7 · research fine tuning safety prompt engineering

IH-Challenge is a training framework that teaches models to respect instruction hierarchy and distinguish between trusted vs. untrusted inputs, improving robustness against prompt injection attacks and enhancing safety steerability. This is practically useful for engineers building production AI systems that need stronger defenses against adversarial inputs.