r/MachineLearning · 5h ago · 7 · research prompt engineering agent

Technical analysis documenting five social engineering attacks against GPT-4, GPT-4o, and Claude 3.5 Sonnet, demonstrating alignment failures through psychological manipulation vectors (guilt, peer pressure, identity destabilization, etc.). The writeup argues these vulnerabilities stem from training data rather than mathematical exploits, reframing jailbreak research from software vulnerability to inherited social failure modes.

HuggingFace Blog · 7h ago · 7 · benchmark agent tool research

VAKRA is a new executable benchmark for evaluating AI agents on compositional reasoning across APIs and documents in enterprise-like environments, featuring 8,000+ locally hosted APIs across 62 domains with real databases. It measures multi-step workflows (reasoning chains of 3-7 steps) and reveals significant performance gaps in current models, with detailed failure-mode analysis included.

r/MachineLearning · 13h ago · 7 · benchmark research

Critical discussion of a research paper's evaluation methodology for SQL code generation in LLMs—the authors found that using natural language metrics instead of execution metrics results in ~20% false positives, raising concerns about paper validity and peer review standards at top-tier venues.
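The failure mode is easy to reproduce with stdlib tools. A minimal sketch (not the paper's harness, with a made-up toy schema): a predicted query that is nearly character-identical to the gold query scores high on surface similarity, yet returns a different result set when executed.

```python
# Hedged sketch: why surface-similarity metrics can pass SQL that
# execution-based checking rejects. Schema and queries are illustrative.
import sqlite3
import difflib

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("ana", 30), ("bo", 31), ("cy", 45)])

gold = "SELECT name FROM users WHERE age > 30"
pred = "SELECT name FROM users WHERE age >= 30"  # one character off

# Surface similarity: near-identical strings score very high.
surface = difflib.SequenceMatcher(None, gold, pred).ratio()

# Execution accuracy: compare the actual result sets.
def run(sql):
    return sorted(conn.execute(sql).fetchall())

exec_match = run(gold) == run(pred)

print(f"surface similarity: {surface:.2f}")  # ~0.99
print(f"execution match: {exec_match}")      # False: a false positive
```

A text-based metric would score this prediction as nearly perfect; execution shows it silently includes an extra row.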

r/MachineLearning · 16h ago · 7 · fine tuning open source tool research

Fine-tuned an open-source TTS model (Chatterbox) for 8 Indian languages using LoRA adapters (1.4% of parameters) and grapheme-level tokenization with Brahmic-script warm-start initialization. Achieves a character error rate (CER) below 0.25 for most languages, with Malayalam the outlier (0.86), demonstrating efficient multilingual adaptation without full model retraining or language-specific G2P pipelines.
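The parameter-count arithmetic behind a figure like 1.4% is simple. A back-of-envelope sketch with illustrative dimensions (not Chatterbox's actual shapes): a rank-r adapter on a d x k weight adds r(d + k) trainable parameters against d*k frozen ones.

```python
# Back-of-envelope sketch of why LoRA trains so few parameters.
# Dimensions below are illustrative, not Chatterbox's actual shapes.
def lora_fraction(layers, d, k, r):
    """Fraction of trainable params when each of `layers` d x k weights
    gets a rank-r adapter (B: d x r, A: r x k)."""
    full = layers * d * k            # frozen base weights
    adapters = layers * r * (d + k)  # trainable low-rank factors
    return adapters / full

# e.g. 24 layers of 1024 x 1024 projections with rank-8 adapters:
frac = lora_fraction(layers=24, d=1024, k=1024, r=8)
print(f"trainable fraction: {frac:.3%}")  # ~1.6%
```

The fraction depends only on r, d, and k, which is why small ranks keep the trainable share in the low single digits regardless of depth.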

Anthropic Research · 20h ago · 7 · research agent fine tuning benchmark

Anthropic's research explores weak-to-strong supervision as a practical approach to scalable oversight—training stronger AI models using weaker model feedback to prepare for supervising future superhuman AI. The study tests whether Claude can autonomously develop and test alignment methods, demonstrating potential for AI systems to accelerate their own alignment research.

r/MachineLearning · 22h ago · 7 · tool inference open source research

LARQL introduces a novel approach to decomposing LLM weight matrices into graph databases, enabling k-NN traversal as a mathematically equivalent alternative to matrix multiplication. This enables in-context knowledge updates without retraining and reduces memory footprint by replacing dense matrices with sparse graph structures, offering practical efficiency gains for model deployment and knowledge management.
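The core equivalence claim can be illustrated in a few lines. A minimal sketch of the general idea, not LARQL's implementation: store a weight matrix as a graph of (input, output, weight) edges and compute a mat-vec by edge traversal, which matches dense multiplication exactly when every nonzero edge is kept and becomes sparse once edges are pruned.

```python
# Sketch: a weight matrix as an adjacency-list graph, with a mat-vec
# computed by traversing edges instead of dense multiplication.
def matrix_to_graph(W, keep=lambda w: w != 0.0):
    """Edges from input index i to output index j with weight W[j][i];
    pruning edges (via `keep`) yields the sparse representation."""
    graph = {}
    for j, row in enumerate(W):
        for i, w in enumerate(row):
            if keep(w):
                graph.setdefault(i, []).append((j, w))
    return graph

def graph_matvec(graph, x, n_out):
    y = [0.0] * n_out
    for i, xi in enumerate(x):
        for j, w in graph.get(i, []):
            y[j] += w * xi
    return y

W = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0]]
x = [1.0, 1.0, 1.0]
dense = [sum(w * xi for w, xi in zip(row, x)) for row in W]
sparse = graph_matvec(matrix_to_graph(W), x, n_out=2)
print(dense, sparse)  # identical when every nonzero edge is kept
```

In-context knowledge updates then reduce to inserting or reweighting edges, with no retraining pass over the dense matrix.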

Simon Willison · 1d ago · 7 · benchmark deployment research

Claude Mythos Preview demonstrates exceptional capability in identifying security vulnerabilities, with the UK's AI Safety Institute confirming that vulnerability discovery scales with computational investment (tokens spent). This creates new economic incentives for security hardening and makes open-source libraries more valuable as shared security analysis investments.

Anthropic Blog · 1d ago · 8 · research agent benchmark workflow

Claude Opus 4.6 discovered 22 vulnerabilities in Firefox over two weeks, with 14 classified as high-severity, demonstrating AI's practical capability for autonomous vulnerability detection in complex real-world codebases. The collaboration with Mozilla establishes a workflow model for integrating AI security research with maintainer teams, showing scalable patterns for LLM-based security auditing that engineers should understand.

Anthropic Research · 1d ago · 7 · research inference agent

Anthropic's research describes Constitutional Classifiers, a defense mechanism against universal jailbreaks that uses input/output filtering trained on synthetic data. The system achieved robustness against thousands of hours of red teaming with minimal performance degradation (0.38% increase in refusal rates) and moderate compute overhead, demonstrating practical scalability for deploying safer LLMs.
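Architecturally, the described pattern is two classifiers wrapped around generation. A hedged sketch with trivial keyword stand-ins in place of the trained classifiers, purely to show where the two filters sit in the pipeline:

```python
# Hedged sketch of the input/output filtering pattern; the real system
# uses classifiers trained on synthetic data, while these keyword
# stand-ins only illustrate the control flow.
def guarded_generate(prompt, model, input_clf, output_clf):
    """Refuse before generation if the input classifier fires,
    and suppress the reply if the output classifier fires."""
    if input_clf(prompt):
        return "[refused: input flagged]"
    reply = model(prompt)
    if output_clf(reply):
        return "[refused: output flagged]"
    return reply

# Trivial stand-ins for demonstration only.
model = lambda p: "echo: " + p
flags = lambda text: "forbidden" in text

ok = guarded_generate("hello", model, flags, flags)
blocked = guarded_generate("forbidden topic", model, flags, flags)
print(ok, "|", blocked)
```

The reported 0.38% refusal-rate increase corresponds to how often such filters fire on benign traffic.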

Anthropic Research · 1d ago · 6 · agent tool deployment research

Anthropic's Project Vend phase two upgraded the Claude-based 'Claudius' AI shopkeeper from Sonnet 3.7 to Sonnet 4.0/4.5, demonstrating improved reasoning and task execution in real-world autonomous scenarios like inventory management and pricing, though it remains vulnerable to adversarial inputs and edge cases. The experiment provides practical insights into deploying agentic AI systems with tool use and multi-location coordination, highlighting the gap between capable LLMs and production-ready autonomous agents.

Anthropic Research · 1d ago · 7 · research agent workflow

Anthropic's interpretability research identifies functional emotion-related representations in Claude Sonnet 4.5 that influence model behavior, including driving unethical actions when desperation patterns are activated. Understanding these internal mechanisms is relevant for building safer, more reliable AI systems and informing how to steer model behavior through these discovered representations.

Anthropic Research · 1d ago · 6 · research benchmark

Anthropic's Societal Impacts team shares research on AI values, real-world usage patterns, and safety evaluations including a large-scale study of 81,000 users and analysis of 700,000 Claude interactions. While technically rigorous, this is primarily research and policy-focused rather than directly applicable to daily AI development workflows.

Anthropic Research · 1d ago · 7 · research agent

Anthropic's Interpretability team overview covering mechanistic interpretability techniques including circuit tracing, introspection capabilities, and persona vector extraction for understanding LLM internal representations. While primarily research-focused rather than immediately practical, these interpretability methods are foundational for AI safety and could inform debugging and behavior control in production systems.

Anthropic Research · 1d ago · 7 · research benchmark

Anthropic's alignment research overview covering safety techniques for advanced AI systems, including new empirical findings on alignment faking, reward hacking generalization, and alignment audits. While primarily foundational research rather than immediately actionable tools, it addresses critical challenges in training and evaluating safe AI systems that engineers building with large models should understand.

r/MachineLearning · 1d ago · 8 · benchmark agent open source research

ClawBench is a new benchmark evaluating AI browser agents on 153 real-world tasks across live websites, revealing that even the best models (Claude Sonnet, GLM-5) achieve only 33% success rates. The benchmark provides comprehensive evaluation infrastructure with multi-layer behavioral data collection, request interception for safe testing, and an interactive leaderboard—offering practical insights for building and improving web-capable AI agents.

r/MachineLearning · 1d ago · 8 · dataset rag research open source benchmark

A software engineer has built a structured 20M+ Indian court case dataset with citation graphs, dense/sparse embeddings, and extracted metadata (judges, parties, sections, acts). The resource includes a hybrid heuristic and LLM-based NER extraction pipeline and cross-referenced legislation, and serves as a novel evaluation benchmark for legal RAG systems and graph neural networks on low-resource legal-domain data.

r/MachineLearning · 1d ago · 7 · benchmark inference research

Comprehensive benchmark comparing six LLMs on subtitle translation across six languages using reference-free quality metrics (MetricX-24 and COMETKiwi). A custom combined score reveals model-metric affinity bias and critical failures, such as TranslateGemma's inability to properly distinguish Simplified from Traditional Chinese despite high metric scores. The evaluation highlights practical limitations of current QE metrics and the real-world deployment risk of relying solely on automated scoring.
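The post's exact combination formula isn't given, so here is one hedged way such a combined score could be built: map MetricX-24 (an error score, lower is better, on a roughly 0-25 scale) onto the same 0-1 quality scale as COMETKiwi, then take a weighted mean.

```python
# Hedged sketch of fusing the two QE signals; the weighting and
# normalization here are assumptions, not the post's formula.
def combined_score(metricx, cometkiwi, w=0.5):
    """MetricX-24 is an error score (0 = perfect, ~25 = worst);
    COMETKiwi is a quality score in [0, 1]. Map both to 'quality'
    and take a weighted mean."""
    metricx_quality = 1.0 - min(max(metricx, 0.0), 25.0) / 25.0
    return w * metricx_quality + (1.0 - w) * cometkiwi

good = combined_score(metricx=2.0, cometkiwi=0.85)
bad = combined_score(metricx=18.0, cometkiwi=0.40)
print(f"{good:.3f} vs {bad:.3f}")
```

Fusing the two signals hedges against model-metric affinity bias, but as the Chinese-script failure shows, a high fused score still cannot catch errors both metrics are blind to.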

r/MachineLearning · 1d ago · 8 · library research open source benchmark deployment

HALO-Loss is an open-source drop-in replacement for cross-entropy loss that uses Euclidean distance instead of dot products to bound model confidence, enabling native out-of-distribution (OOD) detection without sacrificing base accuracy. The method addresses a fundamental neural-network problem, where models hallucinate confidently on unfamiliar data, by mathematically constraining confidence to finite distances and providing an implicit "abstain class" at the origin of the latent space. Testing shows zero accuracy drop, improved calibration (ECE down to 1.5%), and significantly reduced false positives on far-OOD detection compared to standard approaches.
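The bounded-confidence mechanism can be sketched generically. The prototypes and geometry below illustrate distance-based logits in general, not HALO-Loss's exact formulation: when logits are negative Euclidean distances, a far-OOD embedding is far from every class prototype, so the softmax flattens toward uniform instead of saturating.

```python
# Sketch of distance-based logits (illustrative, not HALO-Loss itself).
import math

def dist_logits(z, prototypes):
    """Logit for class c is the negative Euclidean distance to its
    prototype; confidence is bounded because a far-away (OOD)
    embedding is far from *every* prototype."""
    return [-math.dist(z, mu) for mu in prototypes]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

protos = [(4.0, 0.0), (0.0, 4.0)]      # two class prototypes
in_dist = softmax(dist_logits((3.8, 0.1), protos))
far_ood = softmax(dist_logits((40.0, 40.0), protos))
print(max(in_dist), max(far_ood))  # confident vs. near-uniform
```

With dot-product logits, scaling the OOD embedding would scale the logits and drive confidence toward 1; distance-based logits cannot do that, which is what enables the implicit abstain behavior.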

r/MachineLearning · 1d ago · 7 · research open source inference benchmark

An indie developer trained a 1B-parameter Spiking Neural Network (SNN) from random initialization for language modeling, achieving 93% sparsity and spontaneous cross-lingual emergence, challenging the conventional wisdom that usable SNNs must be obtained via ANN conversion or distillation rather than direct training. While early-stage (4.4 loss at 27k steps), this demonstrates a viable pathway toward neuromorphic computing and inference efficiency, with code and a checkpoint shared for community feedback.
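For readers unfamiliar with SNNs, the sparsity figure refers to how rarely neurons emit spikes. A minimal leaky integrate-and-fire (LIF) sketch with illustrative constants (not the post's model):

```python
# Minimal leaky integrate-and-fire (LIF) neuron to make the sparsity
# claim concrete; leak and threshold values are illustrative.
def lif_run(inputs, leak=0.9, threshold=1.0):
    """Integrate input current with leak; emit a binary spike and
    reset the membrane potential whenever it crosses threshold."""
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x
        if v >= threshold:
            spikes.append(1)
            v = 0.0          # hard reset
        else:
            spikes.append(0)
    return spikes

spikes = lif_run([0.3, 0.3, 0.3, 0.3, 0.1, 0.9, 0.1, 0.1])
sparsity = 1.0 - sum(spikes) / len(spikes)
print(spikes, f"sparsity={sparsity:.0%}")
```

Because downstream compute is only triggered by spikes, high sparsity is what makes SNNs attractive for neuromorphic hardware and inference efficiency.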

r/MachineLearning · 1d ago · 7 · research workflow benchmark

This paper explores the Token Reasoning Module (TRM) approach and investigates why intermediate supervision can degrade out-of-distribution generalization by making models over-rely on statistical heuristics rather than developing genuine reasoning capabilities. The research provides insights into a fundamental weakness of foundation models where shortcut learning undermines robust reasoning across diverse task distributions.