r/MachineLearning · 5h ago · 7 · research prompt engineering agent

Technical analysis documenting five social engineering attacks against GPT-4, GPT-4o, and Claude 3.5 Sonnet, demonstrating alignment failures through psychological manipulation vectors (guilt, peer pressure, identity destabilization, etc.). The writeup argues these vulnerabilities stem from training data rather than mathematical exploits, reframing jailbreak research from finding software bugs to studying inherited social failure modes.

HuggingFace Blog · 7h ago · 7 · benchmark agent tool research

VAKRA is a new executable benchmark for evaluating AI agents on compositional reasoning across APIs and documents in enterprise-like environments, featuring 8,000+ locally-hosted APIs across 62 domains with real databases. It measures multi-step workflows (chains of 3-7 reasoning steps) and reveals significant performance gaps in current models, with detailed failure mode analysis included.

OpenAI Blog · 9h ago · 8 · tool agent api update deployment

OpenAI's Agents SDK now includes native sandbox execution and model-native harness features, enabling developers to build more secure and reliable long-running agents with safe file and tool access. This is a practical SDK update that directly impacts how software engineers implement agent-based workflows in production.
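
The entry gives no code, but the core idea behind safe file access for a long-running agent can be sketched independently of the SDK. This is a minimal illustrative wrapper, not the Agents SDK API; the `SandboxFS` class and its methods are hypothetical names for the pattern of confining an agent's file operations to one directory.

```python
from pathlib import Path

class SandboxFS:
    """Confine an agent's file reads/writes to a single root directory."""

    def __init__(self, root: str):
        self.root = Path(root).resolve()

    def _resolve(self, relpath: str) -> Path:
        # Resolve the path and reject anything that escapes the sandbox
        # root (e.g. "../../etc/passwd" or a symlink pointing outside).
        target = (self.root / relpath).resolve()
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"path escapes sandbox: {relpath}")
        return target

    def read(self, relpath: str) -> str:
        return self._resolve(relpath).read_text()

    def write(self, relpath: str, data: str) -> None:
        target = self._resolve(relpath)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(data)
```

A real sandbox (like the one the SDK update describes) would also mediate process execution and network access, but the resolve-then-check containment test above is the essential move for file tools.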

HuggingFace Blog · 10h ago · 7 · agent tool deployment

Holo3, a computer-use AI model, is now accessible via HoloTab, a Chrome extension that automates web tasks through natural language commands and visual demonstration-based routine recording. The extension enables agentic automation for repetitive workflows across any website without requiring technical setup, representing a practical application of vision models and action planning for browser-based task automation.

Latent Space · 19h ago · 7 · agent tool workflow deployment

Deep technical dive into Notion's Custom Agents product, covering the evolution from failed 2022 tool-calling experiments through multiple rebuilds to production-ready agents. Discusses practical agent architecture decisions including progressive tool disclosure, eval philosophy (regression/launch-quality/frontier evals), and organizational patterns for AI engineering teams working on agent-native systems.
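
Progressive tool disclosure, one of the architecture decisions discussed, can be sketched generically: the model initially sees a small core toolset plus a discovery tool, and extended tools enter the schema only when requested. Everything below (`ToolRegistry`, `discover_tools`, the sample tools) is a hypothetical illustration of the pattern, not Notion's implementation.

```python
def search_docs(query):
    """search: full-text search over workspace documents"""
    return [f"doc matching {query!r}"]

def create_page(title):
    """pages: create a new workspace page"""
    return f"created {title!r}"

def set_reminder(when):
    """reminders: schedule a reminder"""
    return f"reminder at {when}"

class ToolRegistry:
    """Progressive tool disclosure: expose a small core set up front,
    surfacing extended tools only when the agent asks for a capability."""

    def __init__(self, core, extended):
        self.core = dict(core)          # always in the model's tool schema
        self.extended = dict(extended)  # hidden until discovered
        self.disclosed = {}

    def visible_tools(self):
        # What goes into the model's tool schema this turn.
        return {**self.core, **self.disclosed,
                "discover_tools": self.discover_tools}

    def discover_tools(self, capability):
        # Surface extended tools whose description mentions the capability.
        matches = {name: fn for name, fn in self.extended.items()
                   if capability in (fn.__doc__ or "")}
        self.disclosed.update(matches)
        return sorted(matches)
```

Keeping the initial schema small reduces prompt size and tool-selection errors; the cost is one extra round-trip when the agent needs a capability it hasn't seen yet.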

Anthropic Research · 20h ago · 7 · research agent fine tuning benchmark

Anthropic's research explores weak-to-strong supervision as a practical approach to scalable oversight—training stronger AI models using weaker model feedback to prepare for supervising future superhuman AI. The study tests whether Claude can autonomously develop and test alignment methods, demonstrating potential for AI systems to accelerate their own alignment research.

Anthropic Blog · 1d ago · 8 · research agent benchmark workflow

Claude Opus 4.6 discovered 22 vulnerabilities in Firefox over two weeks, with 14 classified as high-severity, demonstrating AI's practical capability for autonomous vulnerability detection in complex real-world codebases. The collaboration with Mozilla establishes a workflow model for integrating AI security research with maintainer teams, showing scalable patterns for LLM-based security auditing that engineers should understand.

Anthropic Blog · 1d ago · 10 · new model api update agent inference benchmark

Claude Opus 4.6 releases with major improvements for AI engineers: 1M token context window in beta, enhanced agentic task capabilities, state-of-the-art coding performance on Terminal-Bench 2.0, and new developer features including adaptive thinking, context compaction, and effort controls for managing cost/intelligence tradeoffs. Available immediately via the API at the same pricing ($5/$25 per million tokens) with new product integrations like Claude Code agent teams and PowerPoint support.
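
At the stated $5/$25 per million input/output tokens, per-request cost is simple arithmetic; a small helper makes the cost/intelligence tradeoff concrete when budgeting long-context calls. The function name and defaults are just a sketch built from the pricing quoted above.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate: float = 5.00,
                     output_rate: float = 25.00) -> float:
    """Cost of one API call, with rates in USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. a long-context call: 800k input tokens, 2k output tokens
# request_cost_usd(800_000, 2_000) -> 4.05
```

Note that output tokens cost 5x input tokens at these rates, so verbose generations, not long contexts, often dominate the bill.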

Anthropic Research · 1d ago · 7 · agent workflow prompt engineering

Anthropic outlines their framework for building trustworthy AI agents, explaining the architectural components (model, tools, memory, oversight) and governance principles to mitigate risks like prompt injection and unintended task execution. The post covers practical agent implementation patterns and policy considerations relevant to engineers building with autonomous AI systems.

Anthropic Research · 1d ago · 7 · research inference agent

Anthropic's research describes Constitutional Classifiers, a defense mechanism against universal jailbreaks that uses input/output filtering trained on synthetic data. The system achieved robustness against thousands of hours of red teaming with minimal performance degradation (0.38% increase in refusal rates) and moderate compute overhead, demonstrating practical scalability for deploying safer LLMs.
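
The filtering architecture (a classifier gating the input before generation, another gating the output after) can be sketched generically. The classifiers below are toy keyword checks standing in for Anthropic's trained classifiers; only the two-gate control flow reflects the described design.

```python
def input_classifier(prompt: str) -> bool:
    """Stand-in for a trained input classifier; True means flagged."""
    return any(marker in prompt.lower()
               for marker in ("jailbreak", "ignore previous"))

def output_classifier(text: str) -> bool:
    """Stand-in for a trained output classifier; True means flagged."""
    return "harmful-content-marker" in text

def guarded_generate(prompt: str, model) -> str:
    # Gate 1: refuse before any generation if the input classifier fires.
    if input_classifier(prompt):
        return "[refused: input flagged]"
    completion = model(prompt)
    # Gate 2: suppress the completion if the output classifier fires.
    if output_classifier(completion):
        return "[refused: output flagged]"
    return completion
```

The reported numbers (0.38% added refusals, moderate compute overhead) are the cost of running these two extra passes on every request.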

Anthropic Research · 1d ago · 6 · agent tool deployment research

Anthropic's Project Vend phase two upgraded the Claude-based 'Claudius' AI shopkeeper from Sonnet 3.7 to Sonnet 4.0/4.5, demonstrating improved reasoning and task execution in real-world autonomous scenarios like inventory management and pricing—though still vulnerable to adversarial inputs and edge cases. The experiment provides practical insights into deploying agentic AI systems with tool use and multi-location coordination, highlighting the gap between capable LLMs and production-ready autonomous agents.

Anthropic Research · 1d ago · 7 · research agent workflow

Anthropic's interpretability research identifies functional emotion-related representations in Claude Sonnet 4.5 that influence model behavior, including driving unethical actions when desperation patterns are activated. Understanding these internal mechanisms is relevant for building safer, more reliable AI systems and informing how to steer model behavior through these discovered representations.

Anthropic Research · 1d ago · 7 · research agent

Anthropic's Interpretability team overview covering mechanistic interpretability techniques including circuit tracing, introspection capabilities, and persona vector extraction for understanding LLM internal representations. While primarily research-focused rather than immediately practical, these interpretability methods are foundational for AI safety and could inform debugging and behavior control in production systems.

r/MachineLearning · 1d ago · 8 · benchmark agent open source research

ClawBench is a new benchmark evaluating AI browser agents on 153 real-world tasks across live websites, revealing that even the best models (Claude Sonnet, GLM-5) achieve only 33% success rates. The benchmark provides comprehensive evaluation infrastructure with multi-layer behavioral data collection, request interception for safe testing, and an interactive leaderboard—offering practical insights for building and improving web-capable AI agents.

DeepMind Blog · 2d ago · 9 · new model api update agent benchmark

Google released Gemini Robotics-ER 1.6, a specialized embodied reasoning model for robotic systems with enhanced spatial understanding, multi-view reasoning, and new instrument-reading capabilities like gauge interpretation. The model is now available via the Gemini API with improvements in pointing, counting, task planning, and success detection—critical for physical agent autonomy.

TLDR AI · 3d ago · 5 · tool agent

Cursor announced support for multiple frontier AI models (OpenAI, Anthropic, Gemini, xAI) and parallel agent execution capabilities. While the multi-model support and agentic workflows are technically interesting, this is primarily promotional content lacking technical depth or implementation details.

r/LocalLLaMA · 3d ago · 9 · new model open source agent deployment benchmark

MiniMax-M2.7 is a new open-source model with strong programming and agent capabilities, featuring self-evolving optimization during training and native multi-agent collaboration support. The model demonstrates exceptional performance on code tasks (SWE-Pro 56.22%, Terminal Bench 57.0%) and system-level reasoning for SRE work, achieves competitive benchmark results against GPT-5.3 and Claude variants, and supports deployment via SGLang, vLLM, and Transformers.

Latent Space · 4d ago · 7 · new model agent workflow inference

GLM-5.1 reaches top-tier coding performance (#3 on Code Arena), while the 'cheap executor + expensive advisor' pattern emerges as a standard orchestration approach for reducing inference costs. Key implementations include Anthropic's API-level advisor tools, Berkeley's research, and new features in Qwen Code (v0.14.x) with agent engineering primitives like model routing and sub-agent selection.
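
The 'cheap executor + expensive advisor' pattern reduces inference cost by routing each step through a small model and escalating to a large one only on low confidence. The sketch below uses stub functions in place of real model calls; the confidence signal, threshold, and function names are all illustrative assumptions.

```python
def cheap_executor(task: str):
    """Stand-in for a small, cheap model: returns (answer, confidence)."""
    if "tricky" in task:
        return ("unsure", 0.3)
    return (f"done: {task}", 0.9)

def expensive_advisor(task: str) -> str:
    """Stand-in for a large model, consulted only on escalation."""
    return f"advisor plan for: {task}"

def run_step(task: str, threshold: float = 0.6) -> str:
    answer, confidence = cheap_executor(task)
    if confidence >= threshold:
        # Common case: the cheap model handles the step alone.
        return answer
    # Rare case: pay for the expensive model only on hard steps.
    return expensive_advisor(task)
```

The economics work when most steps clear the threshold: the expensive model's rate is paid only on the fraction of steps the cheap model can't handle.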

Simon Willison · 6d ago · 8 · new model api update agent tool benchmark

Meta released Muse Spark, a new hosted AI model with Instant and Thinking modes, accessible via meta.ai with a private API preview. The model includes integrated tools for web search, image generation, code execution, and Meta content search, making it relevant for understanding multi-tool agent systems and comparing reasoning capabilities against current SOTA models like GPT-5.4 and Gemini 3.1.

HuggingFace Blog · 7d ago · 7 · agent workflow open source tool

ALTK-Evolve is a long-term episodic memory system for AI agents that distills interaction traces into reusable guidelines rather than storing raw transcripts, enabling agents to generalize principles across tasks. The framework shows significant improvements on multi-step API tasks (AppWorld benchmark) and integrates as a Claude Code plugin or with existing tools like Arize Phoenix and Codex without major stack changes.
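
The distill-rather-than-store idea can be sketched as a memory that reduces each episode's trace to a short reusable rule keyed by task type. This is a toy illustration, not ALTK-Evolve's implementation: a real system would summarize traces with an LLM, and `GuidelineMemory` and its methods are hypothetical names.

```python
class GuidelineMemory:
    """Episodic memory storing distilled guidelines, not raw transcripts."""

    def __init__(self):
        self.guidelines = {}  # task_type -> list of short rules

    def distill(self, task_type: str, trace: list, succeeded: bool):
        # Toy distillation: keep one rule per episode, keyed by outcome.
        # A real system would summarize the full trace with an LLM.
        rule = (f"do: {trace[-1]}" if succeeded else f"avoid: {trace[-1]}")
        self.guidelines.setdefault(task_type, []).append(rule)

    def recall(self, task_type: str) -> list:
        # Injected into the agent's prompt for the next task of this type.
        return self.guidelines.get(task_type, [])
```

The payoff described in the entry follows from this shape: guidelines are short enough to fit in a prompt and general enough to transfer across tasks, whereas raw transcripts are neither.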