News Nug

TRL v1.0: Post-Training Library Built to Move with the Field

HuggingFace Blog · 15d ago · 8 · library fine tuning workflow research

TRL v1.0 introduces architectural lessons for building stable post-training libraries that can adapt as methods evolve from PPO to DPO to RLVR approaches. The library design prioritizes flexibility over fixed abstractions, recognizing that core concepts like reward models shift between being fundamental, optional, or reimagined as verifiers across different training paradigms.

Gemini 3.1 Flash Live: Making audio AI more natural and reliable

DeepMind Blog · 20d ago · 8 · new model api update agent

Google released Gemini 3.1 Flash Live, an improved real-time audio model with better precision, lower latency, and enhanced tonal understanding for voice-first applications. Available via Gemini Live API, it achieves 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge, enabling developers to build voice agents that handle complex tasks with natural dialogue in noisy environments.

mcp-brasil — MCP Server para 41 APIs públicas brasileiras

GitHub Trending AI · 20d ago · 8 · tool open source agent api update

An open-source MCP (Model Context Protocol) server that connects AI agents (Claude, GPT, Copilot) to 41 Brazilian government APIs covering economics, legislation, transparency, judiciary, elections, and more—38 APIs require no authentication. This is a practical tool for engineers building AI applications that need access to structured public sector data with ready-made integrations and natural language query capabilities.

Protecting people from harmful manipulation

DeepMind Blog · 21d ago · 6 · research benchmark safety

Research release on empirically validated toolkit for measuring AI manipulation capabilities, tested across 10,000+ participants in finance and health domains. Provides open-source methodology and materials for evaluating how AI systems can be misused to deceptively influence human behavior and beliefs in high-stakes scenarios.

Lyria 3 Pro: Create longer tracks in more

DeepMind Blog · 21d ago · 7 · new model api update tool

Google released Lyria 3 Pro, an advanced music generation model supporting 3-minute tracks with structural awareness (verses, choruses, bridges). The model is available across multiple platforms including Vertex AI, Gemini API, Google AI Studio, and consumer apps, enabling developers to integrate custom music generation at scale.

Inside our approach to the Model Spec

OpenAI Research · 21d ago · 6 · workflow api update

OpenAI published a Model Spec that documents expected behavior, safety constraints, and design principles for their AI models. This provides engineers with official guidance on model capabilities and limitations, useful for understanding how to work within OpenAI's systems and for designing similar frameworks in their own applications.

apfel — Apple Intelligence from the command line. On-device LLM via FoundationModels framework. No API keys, no cloud, no dependencies.

GitHub Trending AI · 22d ago · 8 · tool open source api update inference deployment

apfel is an open-source tool that exposes Apple's on-device foundation model through a CLI, OpenAI-compatible API server, and shell integration—enabling local LLM inference on Apple Silicon Macs with no cloud dependency, API keys, or per-token billing. It supports tool calling via Model Context Protocol (MCP), includes demo shell scripts for practical workflows, and manages a 4096-token context window automatically.

awesome-opensource-ai — Curated list of the best truly open-source AI projects, models, tools, and infrastructure.

GitHub Trending AI · 22d ago · 7 · tool open source library agent rag deployment

A curated directory of production-ready open-source AI tools and libraries organized by category (core frameworks, models, inference, agents, RAG, training, deployment, benchmarks, safety). Highlights practical CLI tools like PR-Agent, Gemini CLI, LLM, and Repomix that directly integrate AI into developer workflows.

A Visual Guide to Attention Variants in Modern LLMs

Ahead of AI · 24d ago · 8 · research tutorial open source

Comprehensive reference guide organizing 45+ LLM architectures with visual model cards and detailed explanations of attention variants (MHA, GQA, sliding window, etc.) used in modern models. Includes both a web gallery and printable poster, serving as a practical learning resource for understanding contemporary transformer architectures.

holaOS — The agent environment for long-horizon work, continuity, and self-evolution.

GitHub Trending AI · 24d ago · 7 · tool open source agent deployment

holaOS is an agent operating system framework that provides infrastructure for long-running AI agents with persistent memory, durable state, and continuity across executions rather than one-off tasks. The project includes a local desktop environment (Holaboss) with quick-start installation and integration points for coding agents like Claude, Cursor, and Windsurf.

awesome-free-llm-apis — Permanent Free LLM API List (API Keys) 😎🔑

GitHub Trending AI · 25d ago · 7 · api update tool inference

A curated resource listing LLM APIs with permanent free tiers for text inference, including first-party APIs from model trainers and third-party platforms hosting open-weight models. Covers rate limits, available regions, and notable models—useful reference for engineers exploring cost-free inference options during development and experimentation.

ai-engineering-from-scratch — Learn it. Build it. Ship it for others.

GitHub Trending AI · 28d ago · 7 · tutorial workflow agent open source

A comprehensive AI engineering curriculum spanning 260+ lessons across 20 phases (~290 hours) covering fundamentals from linear algebra to autonomous agent swarms in Python, TypeScript, Rust, and Julia. Each lesson produces reusable artifacts (prompts, skills, agents, MCP servers) that can be immediately integrated into AI coding workflows, with personalized learning paths based on existing ML/DL knowledge.

Measuring progress toward AGI: A cognitive framework

DeepMind Blog · 29d ago · 7 · benchmark research tool

Google DeepMind released a cognitive taxonomy framework for measuring AGI progress, grounded in psychology and neuroscience, identifying 10 key cognitive abilities. They're launching a $200K Kaggle hackathon where engineers can design evaluations for five priority abilities (learning, metacognition, attention, executive functions, social cognition) using their new Community Benchmarks platform to test against frontier models.

Improving instruction hierarchy in frontier LLMs

OpenAI Research · 36d ago · 7 · research fine tuning safety prompt engineering

IH-Challenge is a training framework that teaches models to respect instruction hierarchy and distinguish between trusted vs. untrusted inputs, improving robustness against prompt injection attacks and enhancing safety steerability. This is practically useful for engineers building production AI systems that need stronger defenses against adversarial inputs.

Reasoning models struggle to control their chains of thought, and that’s good

OpenAI Research · 41d ago · 7 · research prompt engineering agent

OpenAI presents CoT-Control, a technique for steering chain-of-thought reasoning in language models, revealing that current reasoning models have difficulty maintaining controlled thought processes. This research addresses interpretability and monitorability concerns, providing practical insights for building more controllable AI systems in production.

Gemini 3.1 Flash-Lite: Built for intelligence at scale

DeepMind Blog · 43d ago · 9 · new model api update inference

Google released Gemini 3.1 Flash-Lite, a new lightweight model optimized for high-volume production workloads at $0.25/1M input tokens and $1.50/1M output tokens. It delivers 2.5X faster time-to-first-token and 45% faster output speeds than 2.5 Flash while maintaining quality, making it ideal for real-time applications like translation, content moderation, UI generation, and agentic workflows at scale.

Nano Banana 2: Combining Pro capabilities with lightning-fast speed

DeepMind Blog · 48d ago · 7 · new model api update inference

Google DeepMind released Nano Banana 2 (Gemini 3.1 Flash Image), a new image generation model combining advanced reasoning and world knowledge with Flash-speed inference. The model is now available across Google products (Gemini app, Search) and offers improved subject consistency, photorealism, and instruction-following capabilities with reduced latency compared to the Pro version.

A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026

Ahead of AI · 49d ago · 8 · new model research benchmark

Comprehensive technical comparison of 10+ major open-weight LLM releases from January-March 2026, analyzing architectural innovations like mixture-of-experts, sliding window attention, QK-norm, and gating mechanisms across models from Arcee, Moonshot, Qwen, and others. Serves as a practical reference for understanding current design patterns and trade-offs in large model architecture.

Why we no longer evaluate SWE-bench Verified

OpenAI Research · 51d ago · 7 · benchmark research

Analysis reveals significant data contamination and training leakage issues in SWE-bench Verified, a widely-used benchmark for evaluating AI coding models, with recommendations to use SWE-bench Pro instead. This is technically important for engineers evaluating code generation models and understanding the reliability of current benchmarking standards.

Our First Proof submissions

OpenAI Research · 54d ago · 7 · benchmark research agent

Research team demonstrates AI model performance on expert-level mathematical proof problems from the First Proof challenge, providing insights into current capabilities and limitations of AI reasoning on formal mathematics. This benchmarking work is relevant for engineers building AI systems that require complex reasoning and problem-solving.