The Integrated Bioresource Database (Ethnic Medicine Intelligent Computing Benchmark Data Platform)
An offline knowledge base for network pharmacology research. It integrates 14 authoritative international biochemical and herbal databases, systematically maps the multi-layer ethnic herb–compound–protein target–disease associations, and incorporates ADMET and TCM syndrome–symptom dimensions.
Dataset at a Glance
4,510 manually curated herbs, about 70% of which have mapped compounds, each with a complete 53-column herbal monograph entry and ethnic medicine names.
A high-fidelity library of molecular entities, deduplicated by multi-level matching on InChIKey / PubChem CID / CAS, with SMILES and physicochemical properties.
Human protein targets keyed on UniProt / gene symbol, with assay-level activities (Ki / IC50 / pChEMBL).
Diseases aligned to ICD-11 / MeSH / OMIM and bridged across ontologies via MONDO, linking targets with chemical–disease evidence.
Compound–target associations deduplicated and aggregated across NPASS / ITCM / CMAUP / DrugBank; 22% carry standardized pChEMBL activity values.
14 international databases, including LOTUS, COCONUT, CMAUP, SymMap, ChEMBL, and CTD, merged through a unified, quality-controlled ETL pipeline.
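The multi-level deduplication on InChIKey / PubChem CID / CAS mentioned above can be sketched roughly as follows. This is a minimal illustration and not the platform's actual pipeline: the record field names (`inchikey`, `pubchem_cid`, `cas`) and the priority order are assumptions made for the example.

```python
from typing import Iterable

# Hypothetical identifier fields, tried in priority order (an assumption).
ID_FIELDS = ("inchikey", "pubchem_cid", "cas")


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Merge records that share an identifier, keeping first-seen values."""
    index: dict[tuple[str, str], int] = {}  # (field, value) -> position in merged
    merged: list[dict] = []
    for rec in records:
        # Normalized identifiers present on this record, highest priority first.
        keys = [(f, str(rec[f]).strip().upper()) for f in ID_FIELDS if rec.get(f)]
        # Reuse the entity matched by the highest-priority known identifier.
        hit = next((index[k] for k in keys if k in index), None)
        if hit is None:
            merged.append(dict(rec))
            hit = len(merged) - 1
        else:
            # Fill in only the fields the existing entity was missing.
            for field, value in rec.items():
                if value and not merged[hit].get(field):
                    merged[hit][field] = value
        for k in keys:
            index.setdefault(k, hit)
    return merged


if __name__ == "__main__":
    rows = [
        {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "pubchem_cid": 2244, "cas": None},
        {"inchikey": None, "pubchem_cid": 2244, "cas": "50-78-2"},
        {"inchikey": None, "pubchem_cid": None, "cas": "50-78-2"},
    ]
    print(len(deduplicate(rows)))  # the three aspirin rows collapse into one entity
```

The sketch deliberately stays simple: it does not resolve conflicting identifiers, and it does not join two already separate entities when a later record links them. A production pipeline would likely need a union-find structure and explicit conflict rules for those cases.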
About the Dataset
Structured, end-to-end data covering ethnic medicine from herbs to diseases.
Ethnic Libraries
4,510 herbs organized by ethnic medical system (Tibetan, Mongolian, Uyghur, Dai, Zhuang, Miao, Yi …), showing each system's characteristic usage and number of entries.
Open libraries →
Compounds
118,115 deeply deduplicated compounds with full identifiers (InChIKey, SMILES, PubChem CID) and multi-source, cross-database provenance tracking.
Search compounds →
Targets Map
A comprehensive protein-target mapping network that integrates assay-level quantitative activity data (Ki, IC50, EC50, Kd), standardized via UniProt.
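Elsewhere on this page, activities are reported on the pChEMBL scale, which ChEMBL defines as the negative base-10 logarithm of a molar Ki, IC50, EC50, Kd, or similar value. The short sketch below shows that conversion; how the platform itself standardizes units is not described here, so the unit table and the function name `to_pchembl` are assumptions for illustration only.

```python
import math

# Conversion factors from common concentration units to mol/L (assumed unit labels).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_pchembl(value: float, unit: str = "nM") -> float:
    """Return -log10 of the activity value expressed in mol/L."""
    if value <= 0:
        raise ValueError("activity value must be positive")
    if unit not in UNIT_TO_MOLAR:
        raise ValueError(f"unsupported unit: {unit}")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


if __name__ == "__main__":
    # An IC50 of 10 nM is 1e-8 mol/L, which corresponds to a pChEMBL of 8.
    print(round(to_pchembl(10, "nM"), 2))
```

Higher values mean higher potency, which is what makes Ki, IC50, EC50, and Kd measurements from different sources easier to compare on one axis.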
Browse targets →
Diseases & Syndromes
Standardized disease–syndrome mapping aligned to ICD-11, MeSH, and OMIM, linked to CTD chemical–disease evidence.
Browse diseases →
Network Pharmacology
Seed on a herb, compound, or disease to expand the herb → compound → target → disease mechanism network. Result size is capped server-side, and the graph can be exported as a PNG or an edge list.
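The seeded expansion with a size cap can be pictured as a bounded breadth-first traversal. The sketch below is only an illustration under stated assumptions: the toy `EDGES` data, the node names, and the `max_edges` parameter are hypothetical, and the actual cap value and server implementation are not given on this page.

```python
from collections import deque

# Hypothetical toy edges (herb -> compound -> target -> disease); not real data.
EDGES = {
    "herb_A": ["cmpd_1", "cmpd_2"],
    "cmpd_1": ["TGT_X"],
    "cmpd_2": ["TGT_X", "TGT_Y"],
    "TGT_X": ["disease_P"],
    "TGT_Y": ["disease_P", "disease_Q"],
}


def expand(seed: str, edges: dict, max_edges: int = 200):
    """Breadth-first expansion from a seed node.

    Returns (edge_list, truncated); truncated is True when the cap cut off
    at least one further edge.
    """
    seen = {seed}
    queue = deque([seed])
    edge_list = []
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if len(edge_list) >= max_edges:
                return edge_list, True
            edge_list.append((node, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return edge_list, False


if __name__ == "__main__":
    network, truncated = expand("herb_A", EDGES, max_edges=5)
    for source, target in network:
        print(source, "->", target)
    print("truncated:", truncated)
```

This version only follows edges in the forward direction, so it covers herb and compound seeds. Seeding on a disease would additionally need a reversed edge index.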
Open graph →
Analysis Tools
Analysis tools such as target reverse screening: target → candidate compounds, disease → reverse-screened herbs, and compound → target fishing. More tools are coming.
Open tools →
Changelog
- 2026-06-09
v1.0 release database rebuilt: 22 flat SQLite tables plus a CSV mirror from the same source; added CTD chemical–disease associations (compound_disease: 27,965) and gene–disease associations; included TCM syndrome / symptom dimensions.
- 2026-06-04
Release database backup snapshot taken (TCM-DB-v1.0-backup-20260604); full SQLite ↔ CSV consistency check completed.
- 2026-05
Herb list deduplicated and merged from 4,853 to 4,510 herbs; herb_id and cross-database bridges uniformly remapped.
- 2026-05
Integrated 14 offline data sources and completed cross-database normalization (InChIKey / UniProt / ICD-11 / MONDO).
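The SQLite ↔ CSV consistency check mentioned in the changelog could be approached along the following lines. This is a hedged sketch rather than the project's actual procedure: the helper name `table_matches_csv`, the example table `herb`, and its columns are hypothetical, and values are compared as strings, which may be too strict for floating-point columns.

```python
import csv
import io
import sqlite3
from collections import Counter


def table_matches_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> bool:
    """Compare a table and its CSV mirror as multisets of stringified rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    csv_rows = Counter(tuple(row) for row in reader)
    # Table and column names come from trusted release files in this sketch.
    cols = ", ".join(f'"{c}"' for c in header)
    cursor = conn.execute(f'SELECT {cols} FROM "{table}"')
    db_rows = Counter(
        tuple("" if v is None else str(v) for v in row) for row in cursor
    )
    return csv_rows == db_rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE herb (herb_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO herb VALUES (?, ?)", [(1, "herb one"), (2, "herb two")])
    mirror = "herb_id,name\n1,herb one\n2,herb two\n"
    print(table_matches_csv(conn, "herb", mirror))
```

Comparing multisets rather than ordered lists keeps the check independent of row order, while still catching missing, extra, or duplicated rows.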