DeepMind Blog · 3h ago · 8 · new model api update inference

Gemini 3.1 Flash TTS, Google's latest text-to-speech model, introduces granular audio tags for precise vocal control across 70+ languages, with improved naturalness (an Elo score of 1,211 on benchmarks). Developers can now embed natural-language commands directly in text to control style, pacing, and delivery. All audio is watermarked with SynthID, and the model is available in Google AI Studio, Vertex AI, and Google Vids.

r/MachineLearning · 5h ago · 7 · research prompt engineering agent

Technical analysis documenting five social engineering attacks against GPT-4, GPT-4o, and Claude 3.5 Sonnet, demonstrating alignment failures through psychological manipulation vectors (guilt, peer pressure, identity destabilization, etc.). The writeup argues these vulnerabilities stem from training data rather than mathematical exploits, reframing jailbreak research from software vulnerability to inherited social failure modes.

HuggingFace Blog · 7h ago · 7 · benchmark agent tool research

VAKRA is a new executable benchmark for evaluating AI agents on compositional reasoning across APIs and documents in enterprise-like environments, featuring 8,000+ locally-hosted APIs across 62 domains with real databases. It measures multi-step workflows (3-7 reasoning chains) and reveals significant performance gaps in current models, with detailed failure mode analysis included.

OpenAI Blog · 9h ago · 8 · tool agent api update deployment

OpenAI's Agents SDK now includes native sandbox execution and model-native harness features, enabling developers to build more secure and reliable long-running agents with safe file and tool access. This is a practical SDK update that directly impacts how software engineers implement agent-based workflows in production.
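One common ingredient of sandboxed file access, restricting an agent's reads to an allowlisted directory, can be sketched with the standard library. This illustrates the general idea only; it is not the Agents SDK's actual mechanism, and the paths are invented.

```python
# Restrict an agent's file reads to a sandbox root, blocking
# '..' and symlink escapes via path resolution (Python 3.9+).
from pathlib import Path

SANDBOX_ROOT = Path("/tmp/agent_workspace").resolve()

def safe_read(path: str) -> str:
    """Read a file only if it resolves inside the sandbox root."""
    target = (SANDBOX_ROOT / path).resolve()
    if not target.is_relative_to(SANDBOX_ROOT):
        raise PermissionError(f"{path!r} escapes the sandbox")
    return target.read_text()
```

Real sandboxes also mediate process execution and network access, but path confinement of this kind is the usual first layer.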

HuggingFace Blog · 10h ago · 7 · agent tool deployment

Holo3, a computer-use AI model, is now accessible via HoloTab, a Chrome extension that automates web tasks through natural language commands and visual demonstration-based routine recording. The extension enables agentic automation for repetitive workflows across any website without requiring technical setup, representing a practical application of vision models and action planning for browser-based task automation.

r/MachineLearning · 10h ago · 7 · fine tuning benchmark inference workflow

An engineer implemented GRPO (a reinforcement-learning fine-tuning method) for summarization on a 3-node MLX cluster, combining length penalties with quality rewards (ROUGE-L) to reach average rollout lengths of ~64 tokens. The work demonstrates practical techniques for controlling output length while maintaining quality, using multi-axis LLM-as-a-judge evaluation (faithfulness, coverage, conciseness, clarity); next steps focus on isolating the reward function's impact and detecting reward gaming.
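A combined reward of this shape can be sketched in a few lines: a ROUGE-L quality term minus a penalty for exceeding a target length. The function names, the linear penalty form, and the `alpha` weight are assumptions for illustration, not the post's actual code.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two whitespace-tokenized strings."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def combined_reward(candidate, reference, target_len=64, alpha=0.01):
    """Quality reward minus a linear penalty for exceeding the target length."""
    over = max(0, len(candidate.split()) - target_len)
    return rouge_l_f1(candidate, reference) - alpha * over
```

Balancing `alpha` against the quality term is exactly where reward gaming tends to appear: a policy can learn to truncate aggressively rather than summarize well.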

r/MachineLearning · 13h ago · 7 · benchmark research

Critical discussion of a research paper's evaluation methodology for SQL code generation in LLMs: using natural-language similarity metrics instead of execution-based metrics yields ~20% false positives, raising concerns about the paper's validity and about peer-review standards at top-tier venues.
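The failure mode is easy to demonstrate: two queries that a surface-level text metric scores as nearly identical can return different results when executed. A minimal illustration with invented table data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("ana", 30), ("bo", 35), ("cy", 25)])

gold = "SELECT name FROM users WHERE age > 30"
pred = "SELECT name FROM users WHERE age >= 30"  # one character off

# A naive token-overlap metric scores this pair as nearly identical...
overlap = len(set(gold.split()) & set(pred.split())) / len(set(gold.split()))

# ...but executing both shows they are not equivalent.
gold_rows = conn.execute(gold).fetchall()
pred_rows = conn.execute(pred).fetchall()
print(overlap, gold_rows == pred_rows)  # high overlap, yet different results
```

Execution-based evaluation compares result sets rather than query text, which is why it catches false positives that string or n-gram metrics admit.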

r/MachineLearning · 16h ago · 7 · fine tuning open source tool research

Fine-tuned the open-source TTS model Chatterbox for 8 Indian languages using LoRA adapters (1.4% of parameters) and grapheme-level tokenization with Brahmic-script warm-start initialization. It achieves a character error rate (CER) below 0.25 for most languages except Malayalam (0.86), demonstrating efficient multilingual adaptation without full model retraining or language-specific G2P pipelines.
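The CER numbers quoted above are Levenshtein edit distance between predicted and reference transcripts, normalized by reference length. A self-contained sketch of the metric (not the post's evaluation code):

```python
def cer(pred: str, ref: str) -> float:
    """Character error rate: edit distance / reference length,
    computed with a single-row dynamic-programming table."""
    m, n = len(pred), len(ref)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (pred[i-1] != ref[j-1]))  # substitution
            prev = cur
    return dp[n] / max(n, 1)
```

Note that CER can exceed 1.0 when the prediction is much longer than the reference, which is consistent with the reported 0.86 for Malayalam still being a large error rate.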

Latent Space · 19h ago · 7 · agent tool workflow deployment

Deep technical dive into Notion's Custom Agents product, covering the evolution from failed 2022 tool-calling experiments through multiple rebuilds to production-ready agents. Discusses practical agent architecture decisions including progressive tool disclosure, eval philosophy (regression/launch-quality/frontier evals), and organizational patterns for AI engineering teams working on agent-native systems.
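"Progressive tool disclosure" can be sketched as a registry that only surfaces tools whose capability tags the current task has been granted, keeping the model's tool list (and prompt) small early in a session. The tool names and the tag-based gating rule here are invented for illustration, not Notion's implementation.

```python
# Hypothetical tool registry with capability tags.
TOOLS = {
    "search_pages":    {"tags": {"read"}},
    "read_page":       {"tags": {"read"}},
    "edit_page":       {"tags": {"write"}},
    "create_database": {"tags": {"write", "advanced"}},
}

def visible_tools(granted_tags):
    """Surface only tools whose tags are fully covered by the grant."""
    return sorted(name for name, spec in TOOLS.items()
                  if spec["tags"] <= granted_tags)

# Early in a session the agent sees read-only tools; write tools are
# disclosed only once the task requires them.
print(visible_tools({"read"}))
print(visible_tools({"read", "write"}))  # still hides "advanced" tools
```

The design choice is the same one the episode discusses: fewer visible tools means fewer wrong tool calls and a cheaper context, at the cost of an explicit escalation step.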

Anthropic Research · 20h ago · 7 · research agent fine tuning benchmark

Anthropic's research explores weak-to-strong supervision as a practical approach to scalable oversight—training stronger AI models using weaker model feedback to prepare for supervising future superhuman AI. The study tests whether Claude can autonomously develop and test alignment methods, demonstrating potential for AI systems to accelerate their own alignment research.

r/MachineLearning · 22h ago · 7 · tool inference open source research

LARQL introduces a novel approach that decomposes LLM weight matrices into graph databases, enabling k-NN traversal as a mathematically equivalent alternative to matrix multiplication. This supports in-context knowledge updates without retraining and reduces memory footprint by replacing dense matrices with sparse graph structures, offering practical efficiency gains for model deployment and knowledge management.
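The core equivalence, that a weight matrix stored as a graph of nonzero edges yields the same matrix-vector product as the dense form, can be shown in a few lines. This is illustrative only; LARQL's actual representation and k-NN traversal are not detailed in the summary.

```python
W = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 3.0, 4.0]]

# Graph form: only nonzero weights become edges (row -> col, weight).
edges = {(i, j): w for i, row in enumerate(W)
         for j, w in enumerate(row) if w != 0.0}

def matvec_dense(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matvec_graph(edges, x, n):
    out = [0.0] * n
    for (i, j), w in edges.items():   # traverse edges instead of all entries
        out[i] += w * x[j]
    return out

x = [1.0, 1.0, 2.0]
assert matvec_dense(W, x) == matvec_graph(edges, x, 3)
```

The memory claim follows directly: the graph stores only nonzero entries, so savings depend entirely on how sparse the decomposed matrices actually are.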

Simon Willison · 22h ago · 6 · new model fine tuning api update

OpenAI released GPT-5.4-Cyber, a fine-tuned variant optimized for defensive cybersecurity use cases, along with a Trusted Access for Cyber program using identity verification for reduced-friction access. The announcement emphasizes OpenAI's existing cybersecurity work and self-service verification, though premium tools still require application approval similar to competing offerings.

Simon Willison · 1d ago · 7 · benchmark deployment research

Claude Mythos Preview demonstrates exceptional capability in identifying security vulnerabilities, with the UK's AI Safety Institute confirming that vulnerability discovery scales with computational investment (tokens spent). This creates new economic incentives for security hardening and makes open-source libraries more valuable as shared security analysis investments.

The Batch · 1d ago · 7 · tool tutorial inference open source

SGLang is an open-source framework for efficient inference that supports both text and image generation workloads with optimized serving. This course provides practical training on deploying and optimizing models to reduce latency and computational cost, directly relevant for engineers building production AI systems.

Anthropic Blog · 1d ago · 8 · research agent benchmark workflow

Claude Opus 4.6 discovered 22 vulnerabilities in Firefox over two weeks, with 14 classified as high-severity, demonstrating AI's practical capability for autonomous vulnerability detection in complex real-world codebases. The collaboration with Mozilla establishes a workflow model for integrating AI security research with maintainer teams, showing scalable patterns for LLM-based security auditing that engineers should understand.