News Nug

Added 8 Indian languages to Chatterbox TTS via LoRA — 1.4% of parameters, no phoneme engineering [P]

r/MachineLearning · 16h ago · 7 · fine tuning open source tool research

Fine-tunes Chatterbox, an open-source TTS model, for 8 Indian languages using LoRA adapters (1.4% of parameters) and grapheme-level tokenization with Brahmic-script warm-start initialization. Character error rate (CER) is below 0.25 for most languages, with Malayalam (0.86) the exception, which suggests that efficient multilingual adaptation is possible without full model retraining or language-specific G2P pipelines.
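For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between a reference transcript and the recognized text, divided by the reference length. The sketch below is a minimal standard-library illustration of that definition, not the evaluation code used in the post; the function names `edit_distance` and `cer` are hypothetical. One assumption to note: Python iterates strings by code point, so for Brahmic scripts a single visible grapheme cluster may count as several characters.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per code point."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Because insertions are counted, CER can exceed 1.0 when the hypothesis is much longer than the reference, so a value such as 0.86 means the output differs from the reference almost as much as an empty transcript would.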

You can decompose models into a graph database [N]

r/MachineLearning · 22h ago · 7 · tool inference open source research

LARQL decomposes LLM weight matrices into graph databases, using k-NN traversal as what the project describes as a mathematically equivalent alternative to matrix multiplication. The approach is reported to allow in-context knowledge updates without retraining and to reduce memory footprint by replacing dense matrices with sparse graph structures, with potential efficiency gains for model deployment and knowledge management.

Business

The Batch · 1d ago · 7 · tool tutorial inference open source

SGLang is a framework for efficient inference that supports both text and image generation workloads. This course provides practical training on deploying and optimizing models, directly relevant to engineers looking to improve inference performance and reduce latency in production AI applications.

Data Points

The Batch · 1d ago · 7 · tool tutorial inference open source

SGLang is an open-source framework for efficient inference that supports both text and image generation with optimized serving capabilities. This course provides practical guidance on using SGLang to accelerate model inference, which is directly applicable for engineers building production AI systems.

ClawBench: Can AI Agents Complete Everyday Online Tasks? 153 tasks, 144 live websites, best model at 33.3% [R]

r/MachineLearning · 1d ago · 8 · benchmark agent open source research

ClawBench is a new benchmark that evaluates AI browser agents on 153 real-world tasks across 144 live websites; even the best models (Claude Sonnet, GLM-5) achieve only a 33.3% success rate. The benchmark provides evaluation infrastructure with multi-layer behavioral data collection, request interception for safe testing, and an interactive leaderboard, offering practical insights for building and improving web-capable AI agents.

baidu/ERNIE-Image · Hugging Face

r/LocalLLaMA · 1d ago · 8 · new model open source inference tool

Baidu released ERNIE-Image, an 8B-parameter open-weight text-to-image diffusion model with strong instruction-following and text-rendering capabilities, alongside ERNIE-Image-Turbo optimized for fast inference (8 steps). The model is available via Hugging Face with practical examples for integration into workflows.

20M+ Indian legal documents with citation graphs and vector embeddings – potential uses for legal NLP? [D]

r/MachineLearning · 1d ago · 8 · dataset rag research open source benchmark

A software engineer has built a structured dataset of 20M+ Indian court cases with citation graphs, dense and sparse embeddings, and extracted metadata (judges, parties, sections, acts). The resource includes a heuristic + LLM-based NER extraction pipeline and cross-referenced legislation, and could serve as an evaluation benchmark for legal RAG systems and graph neural networks on low-resource legal-domain data.

[AINews] Top Local Models List - April 2026

Latent Space · 1d ago · 6 · open source deployment benchmark

Community survey of popular open-weight models across local deployment use cases, highlighting Qwen 3.5, Gemma 4, DeepSeek V3.2, and others based on actual Reddit recommendations rather than benchmarks. Focuses on practical model selection for engineers building local inference systems, with specific callouts for coding (Qwen3-Coder-Next) and agentic workloads (MiniMax M2.5/M2.7).

"I don't know!": Teaching neural networks to abstain with the HALO-Loss. [R]

r/MachineLearning · 1d ago · 8 · library research open source benchmark deployment

HALO-Loss is an open-source drop-in replacement for cross-entropy that uses Euclidean distance instead of dot products to bound model confidence, enabling native out-of-distribution detection without sacrificing base accuracy. The method addresses a fundamental neural network problem, where models hallucinate on unfamiliar data, by mathematically constraining confidence to finite distances and providing an implicit "abstain class" at the origin of the latent space. Testing shows zero accuracy drop, improved calibration (ECE down to 1.5%), and significantly fewer false positives on far-OOD detection compared to standard approaches.
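To make the general idea concrete, here is a minimal sketch of distance-based logits with an extra prototype at the origin. It is not the HALO-Loss implementation and makes several assumptions that do not come from the post: class prototypes are fixed vectors, the logit is the negative squared Euclidean distance, and the helper names (`distance_logits`, `predict_or_abstain`) are hypothetical.

```python
import math


def distance_logits(z, prototypes):
    """Logit per prototype = negative squared Euclidean distance to it."""
    return [-sum((a - b) ** 2 for a, b in zip(z, p)) for p in prototypes]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(z, class_prototypes):
    """Return (label, probabilities); label is 'abstain' if the origin wins.

    Assumption: the abstain option is modeled as one more prototype
    placed at the origin of the embedding space.
    """
    origin = [0.0] * len(z)
    probs = softmax(distance_logits(z, class_prototypes + [origin]))
    best = max(range(len(probs)), key=probs.__getitem__)
    label = "abstain" if best == len(class_prototypes) else best
    return label, probs
```

With prototypes `[[3.0, 0.0], [0.0, 3.0]]`, an embedding near the first prototype is assigned class 0, while an embedding near the origin falls to the abstain option. This naive version only covers embeddings that land close to the origin; how the actual loss shapes the latent space during training is not shown here.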

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R]

r/MachineLearning · 1d ago · 7 · research open source inference benchmark

An indie developer trained a 1.088B-parameter Spiking Neural Network (SNN) from random initialization for language modeling, achieving 93% sparsity and spontaneous cross-lingual emergence, challenging the conventional wisdom that SNNs at this scale require ANN conversion or distillation. While early-stage (loss of 4.4 after 27k steps), this demonstrates a viable pathway for neuromorphic computing and inference efficiency, with code and checkpoint shared for community feedback.
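As a rough illustration of what a sparsity figure means for a spiking network, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron and reports the fraction of timesteps on which it stays silent. Everything here is an assumption for illustration: the post does not state which neuron model is used or exactly how its 93% figure is computed, and `lif_spikes`, `sparsity`, `threshold`, and `decay` are hypothetical names and parameters.

```python
def lif_spikes(inputs, threshold=1.0, decay=0.9):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    The membrane potential leaks by `decay` each step, accumulates the input,
    and emits a spike (1) with a hard reset when it reaches `threshold`.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay * v + x
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def sparsity(spikes):
    """Fraction of timesteps with no spike (assumed definition of sparsity)."""
    if not spikes:
        raise ValueError("spike train must be non-empty")
    return 1.0 - sum(spikes) / len(spikes)
```

Under this definition, a neuron that fires on 2 of 10 timesteps has a sparsity of 0.8. High sparsity matters for inference efficiency because silent neurons contribute no downstream computation on event-driven hardware.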

Exploring the new `servo` crate

Simon Willison · 2d ago · 7 · tool library open source

The Servo browser engine is now available on crates.io as an embeddable library, enabling Rust developers to integrate it into applications. The post demonstrates practical usage, including a CLI screenshot tool, and explores WebAssembly compilation possibilities, though compiling all of Servo to WebAssembly isn't feasible due to threading and dependency constraints.

Gemma 4 audio with MLX

Simon Willison · 2d ago · 7 · tutorial inference open source tool

Practical walkthrough of running local audio transcription with the Gemma 4 E2B model and the MLX framework on macOS via uv run. Demonstrates real-world inference with a 10GB model and shows actual transcription output with accuracy notes, useful for developers building local AI audio pipelines.

mtmd: add Gemma 4 audio conformer encoder support

r/LocalLLaMA · 3d ago · 7 · open source inference tool benchmark

This PR adds audio processing support to Gemma 4 models in llama.cpp using a USM-style Conformer encoder, with key fixes for CUDA/Vulkan/Metal backend compatibility. The implementation includes optimizations like replacing unsupported ops (ggml_roll → view+concat) and fixing contiguity issues that caused CPU fallbacks, achieving strong audio transcription results across different quantization levels and backends.

Minimax M2.7 Released

r/LocalLLaMA · 3d ago · 9 · new model open source agent deployment benchmark

MiniMax-M2.7 is a new open-source model with strong programming and agent capabilities, featuring self-evolving optimization during training and native multi-agent collaboration support. The model posts strong results on code tasks (SWE-Pro 56.22%, Terminal Bench 57.0%) and system-level reasoning for SRE work, and scores competitively against GPT-5.3 and Claude variants, while supporting deployment via SGLang, vLLM, and Transformers.

SQLite 3.53.0

Simon Willison · 3d ago · 5 · tool open source

The SQLite 3.53.0 release includes result formatting improvements via a new Query Results Formatter library, with a WebAssembly playground built using Claude Code. While SQLite is foundational infrastructure, this release focuses on general database improvements rather than AI-specific tooling or capabilities.

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

HuggingFace Blog · 6d ago · 8 · new model inference open source

Waypoint-1.5 is Overworld's improved real-time video world model now optimized for consumer hardware, running up to 720p/60fps on RTX 3090+ and 360p on broader gaming laptops/Apple Silicon. The model was trained on 100x more data than v1 with more efficient video modeling techniques, prioritizing interactive responsiveness and local deployment over pure visual fidelity.

ALTK‑Evolve: On‑the‑Job Learning for AI Agents

HuggingFace Blog · 7d ago · 7 · agent workflow open source tool

ALTK-Evolve is a long-term episodic memory system for AI agents that distills interaction traces into reusable guidelines rather than storing raw transcripts, enabling agents to generalize principles across tasks. The framework shows significant improvements on multi-step API tasks (AppWorld benchmark) and integrates as a Claude Code plugin or with existing tools like Arize Phoenix and Codex without major stack changes.

Safetensors is Joining the PyTorch Foundation

HuggingFace Blog · 7d ago · 7 · tool open source deployment

Safetensors, the secure model weight format that replaced pickle-based serialization, is moving to PyTorch Foundation governance to become truly community-owned while remaining the de facto standard for model distribution across Hugging Face Hub. The move enables vendor-neutral stewardship and potential integration into PyTorch core, with no breaking changes for existing users but clearer paths for community contributors.

GLM-5.1: Towards Long-Horizon Tasks

Simon Willison · 7d ago · 7 · new model open source benchmark

GLM-5.1, a 754B parameter open-weights model from Z.ai, demonstrates strong capabilities in multimodal generation and instruction-following, particularly for SVG/HTML creation tasks. The model can self-correct technical issues (CSS animations breaking SVG positioning) and generate well-structured code with detailed comments, making it worth testing for creative code generation workflows.

Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony

Latent Space · 8d ago · 7 · agent workflow prompt engineering open source

OpenAI's Ryan Lopopolo discusses 'Harness Engineering', a methodology for building AI-native software where agents operate autonomously with zero human-written code, using >1B tokens/day and extensive prompt engineering via Symphony (a multi-agent orchestration system). The approach shifts focus from prompt optimization to building proper context, structure, and observability so that agents function as full teammates rather than copilots.