Dataset metadata & provenance (lite) — v1.0.0
Released: 2026-02-02 · Hash: sha256:d3474f980bddd6311c740bf48abd61fbc62486b6539d575c2446d08d80cffb63
Identification & citation
| Criterion | Text |
|---|---|
| must MDP-01 | Provide a stable landing page for the dataset, ideally with a persistent identifier (e.g., DOI) and an explicit version. |
| must MDP-02 | Provide a recommended citation for the dataset including creators, year, title, version, and PID/URL. |
| should MDP-03 | List dataset creators/contributors, affiliation/organization, and a contact point for questions or issue reporting. |
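
As an illustration of MDP-02, the sketch below assembles a recommended citation from the five listed elements (creators, year, title, version, PID/URL). The `DatasetCitation` class, the `format_citation` helper, the output layout, and every example value (including the placeholder DOI) are assumptions made for this sketch; the checklist itself does not prescribe a citation style.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """The citation elements named in MDP-02 (hypothetical container)."""
    creators: list[str]
    year: int
    title: str
    version: str
    identifier: str  # PID or URL, for example a DOI link


def format_citation(c: DatasetCitation) -> str:
    """Join the MDP-02 elements into a single human-readable line."""
    creators = "; ".join(c.creators)
    return (
        f"{creators} ({c.year}). {c.title} "
        f"(Version {c.version}) [Data set]. {c.identifier}"
    )


# All values below are placeholders, not a real dataset.
example = DatasetCitation(
    creators=["Doe, J.", "Roe, R."],
    year=2026,
    title="Example Survey Dataset",
    version="1.0.0",
    identifier="https://doi.org/10.1234/example",
)
print(format_citation(example))
```

Keeping the elements in a structured record rather than a free-text string makes it easier to emit the same citation in other formats later.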
Structure & data dictionary
| Criterion | Text |
|---|---|
| must MDP-10 | Provide a file inventory describing each file (name, format, purpose) and how files relate (e.g., tables joined by keys). |
| must MDP-11 | Provide a data dictionary for variables/fields (name, description, type, units, value coding, missing-value codes). |
| should MDP-12 | Describe data collection context: sampling/eligibility, time coverage, geography/setting, instruments, and key procedures. |
| may MDP-13 | Describe identifier strategy (subject IDs, linkage keys) and any pseudonymization/de-identification steps that affect linkage. |
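
One possible way to check MDP-11 mechanically is to hold the data dictionary as a list of records and report which attributes each variable lacks. The key names in `REQUIRED_KEYS`, the `missing_dictionary_fields` helper, and the two sample variables are hypothetical; they mirror the attributes listed in MDP-11 but are not a mandated schema.

```python
# Attribute names are an assumption that mirrors the list in MDP-11.
REQUIRED_KEYS = ("name", "description", "type", "units", "coding", "missing_codes")


def missing_dictionary_fields(entries):
    """Return {variable name: [absent keys]} for incomplete entries."""
    problems = {}
    for i, entry in enumerate(entries):
        absent = [key for key in REQUIRED_KEYS if key not in entry]
        if absent:
            # Fall back to a positional label when even the name is absent.
            problems[entry.get("name", f"<entry {i}>")] = absent
    return problems


# Hypothetical variables: "age" is complete, "sex" lacks two attributes.
data_dictionary = [
    {"name": "age", "description": "Age at enrollment", "type": "integer",
     "units": "years", "coding": None, "missing_codes": [-9]},
    {"name": "sex", "description": "Self-reported sex", "type": "categorical",
     "coding": {"1": "female", "2": "male"}},
]
print(missing_dictionary_fields(data_dictionary))
# → {'sex': ['units', 'missing_codes']}
```

The check only tests for presence of each key, so a variable without units can still pass by recording an explicit `None`.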
Provenance & processing
| Criterion | Text |
|---|---|
| must MDP-20 | Describe the processing pipeline from raw sources to released files, including cleaning rules and transformations; link to scripts/notebooks where feasible. |
| must MDP-21 | Document external data sources and merges (source, version/date accessed, and license/terms), including join keys and match logic. |
| should MDP-22 | Record the software and environment used to produce the dataset (tools and versions) and flag any nondeterministic steps. |
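
For pipelines that run in Python, the tool and version record asked for in MDP-22 could be captured with the standard library alone, as sketched below. The `environment_record` helper and the package names passed to it are hypothetical; a real pipeline would list the packages it actually imports and would separately note nondeterministic steps such as unseeded random sampling.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, OS, and package versions as a JSON-serializable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Package names here are placeholders chosen for the demo.
record = environment_record(["pip", "a-package-that-is-not-installed"])
print(json.dumps(record, indent=2))
```

Writing this record to a file next to the released data lets a reader see which versions produced the outputs without rerunning anything.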
Access, licensing, governance
| Criterion | Text |
|---|---|
| must MDP-30 | State the license/usage terms and any access restrictions; if access is controlled, describe the request procedure and decision criteria. |
| must MDP-31 | Address sensitive data and governance: consent/ethics constraints, de-identification, and what cannot be shared (and why). |
Integrity & quality notes
| Criterion | Text |
|---|---|
| must MDP-40 | Provide data quality checks/validation rules and known issues/limitations; include recommended uses and warnings against misuse. |
| should MDP-41 | Provide file integrity checks (e.g., checksums) or an archive manifest to detect corruption and support auditability. |
| should MDP-42 | Include rich keywords and machine-readable metadata in the repository record (e.g., DataCite fields) to support discovery. |
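
The file integrity checks in MDP-41 can be sketched as a SHA-256 manifest that is built at release time and re-verified after download. The function names (`sha256_of`, `build_manifest`, `verify_manifest`), the in-memory manifest layout, and the demo file are assumptions for illustration; the checklist does not mandate a particular algorithm or manifest format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose digest has changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad


# Demo on a throwaway directory containing one hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.csv"
    data_file.write_text("id,value\n1,42\n")
    manifest = build_manifest(tmp)
    intact = verify_manifest(tmp, manifest)      # nothing changed yet
    data_file.write_text("id,value\n1,43\n")     # simulate corruption
    corrupted = verify_manifest(tmp, manifest)

print(intact, corrupted)
```

This verification flags changed or missing files only; files added after the manifest was built would need a separate comparison against the manifest keys.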