A software engineer has built a structured dataset of 20M+ Indian court cases with citation graphs, dense and sparse embeddings, and extracted metadata (judges, parties, sections, acts). The resource includes a heuristic and LLM-based NER pipeline and cross-referenced legislation, and is positioned as a new evaluation benchmark for legal RAG systems and graph neural networks on low-resource legal-domain data.
r/MachineLearning · 1d ago · 8 · dataset, rag, research, open source, benchmark
HuggingFace Blog · 6d ago · 8 · tutorial, rag, library, inference
Practical guide to multimodal embedding and reranker models that extend traditional RAG pipelines to handle text, images, and other modalities in a shared embedding space. Covers model loading, encoding mixed-modality inputs, and computing cross-modal similarities with concrete code examples and performance considerations.
GitHub Trending AI · 22d ago · 7 · tool, open source, library, agent, rag, deployment
A curated directory of production-ready open-source AI tools and libraries organized by category (core frameworks, models, inference, agents, RAG, training, deployment, benchmarks, safety). Highlights practical CLI tools like PR-Agent, Gemini CLI, LLM, and Repomix that directly integrate AI into developer workflows.