model-quality

Here are 6 public repositories matching this topic...

encord-team / encord-active

The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.

python data-science data machine-learning computer-vision deep-learning data-validation annotations ml object-detection data-cleaning active-learning data-quality data-centric mlops noisy-labels model-quality label-errors label-quality

Updated May 23, 2025
Python

google / nitroml

NitroML is a modular, portable, and scalable model-quality benchmarking framework for Machine Learning and Automated Machine Learning (AutoML) pipelines.

python benchmarking machine-learning modular scale portable automl kubeflow tfx model-quality

Updated Mar 10, 2021
Jupyter Notebook

vivekkrishna / semantic-conflicts-benchmark

Benchmarking the ability of large language models to detect semantic conflicts across domains, documents, and evolving knowledge bases.

law science semantic benchmark philosophy artificial-intelligence teams software knowledge-base knowledge-management reasoning conflicts-detection model-quality llm agent-quality

Updated Apr 16, 2026
Python

WillLewis / atlas-agentic-fraud-lab

Adversarial Testing Lab for Agentic Safeguards (ATLAS). A synthetic multi-agent eval environment for adversarial fraud decisioning, inspired by Anthropic's Project Deal. It measures how model quality, tool access, and agent orchestration affect attack discovery and defensive recovery, using deterministic evals and realistic customer-friction limits.

model-quality adversarial-ml evolutionary-search anthropic agent-orchestration claude-code agent-evaluation