SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence

This summer, SandboxAQ released the Structurally Augmented IC50 Repository (SAIR), the largest dataset of co-folded 3D protein-ligand structures paired with experimentally measured IC₅₀ labels. By directly linking molecular structure to drug potency, SAIR addresses a longstanding scarcity of training data. The dataset is now available on Hugging Face, giving researchers open access, for the first time, to more than 5 million AI‑generated, high‑accuracy protein-ligand 3D structures, each paired with validated empirical binding potency data.
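
For readers less familiar with IC₅₀ as a potency label: it is the concentration of a compound needed to inhibit a target's activity by half, so a lower value means a more potent compound. Modeling work often uses the log-scaled form pIC₅₀ = −log₁₀(IC₅₀ in molar). The sketch below illustrates that standard conversion only. The function name and the example values are hypothetical, nanomolar input is an assumption, and nothing here reflects SAIR's actual schema or units.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar).

    Assumes the input is in nM; 1 nM = 1e-9 M, so pIC50 = 9 - log10(IC50_nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


# Hypothetical example values, not taken from SAIR.
for ic50 in (1.0, 100.0, 10_000.0):
    print(f"IC50 = {ic50:>8.1f} nM  ->  pIC50 = {ic50_nm_to_pic50(ic50):.2f}")
```

On this scale, each unit increase in pIC₅₀ corresponds to a tenfold drop in the concentration needed for half-maximal inhibition.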


SAIR is an open-sourced dataset and is publicly available for free under a permissive CC BY 4.0 license, making

 

 

 
