Dataset metadata & provenance (lite) — v1.0.0

Released: 2026-02-02 · Hash: sha256:d3474f980bddd6311c740bf48abd61fbc62486b6539d575c2446d08d80cffb63

Identification & citation
Criterion · Text
must MDP-01 · Provide a stable landing page for the dataset, ideally with a persistent identifier (e.g., DOI) and an explicit version.
must MDP-02 · Provide a recommended citation for the dataset, including creators, year, title, version, and PID/URL.
should MDP-03 · List the dataset's creators and contributors with their affiliations or organizations, plus a contact point for questions and issue reporting.
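As an illustration of MDP-02, the sketch below assembles a plain-text citation from the required elements and refuses to produce one when a required field is empty. The `DatasetRecord` class, its field names, the rule for abbreviating long creator lists, and the output layout are hypothetical choices for this example, not a prescribed citation style. A repository's own recommended format should take precedence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal record holding the MDP-01 to MDP-03 fields."""
    creators: List[str]
    year: int
    title: str
    version: str
    pid: str                 # DOI or stable landing-page URL (MDP-01)
    contact: str = ""        # MDP-03 contact point; not part of the citation


def format_citation(rec: DatasetRecord) -> str:
    """Build a plain-text citation: creators, year, title, version, PID/URL."""
    missing = [f for f in ("creators", "title", "version", "pid") if not getattr(rec, f)]
    if missing:
        raise ValueError(f"citation is missing required fields: {missing}")
    # Illustrative rule: name up to two creators, abbreviate longer lists.
    if len(rec.creators) <= 2:
        authors = " & ".join(rec.creators)
    else:
        authors = f"{rec.creators[0]} et al."
    # Turn a bare DOI (e.g. "10.1234/example") into a resolvable URL.
    pid = f"https://doi.org/{rec.pid}" if rec.pid.startswith("10.") else rec.pid
    return f"{authors} ({rec.year}). {rec.title} (Version {rec.version}) [Data set]. {pid}"
```

For example, `format_citation(DatasetRecord(["Doe, J.", "Roe, R."], 2026, "Example survey", "1.0.0", "10.1234/example"))`, using a placeholder DOI, returns `Doe, J. & Roe, R. (2026). Example survey (Version 1.0.0) [Data set]. https://doi.org/10.1234/example`.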
Structure & data dictionary
Criterion · Text
must MDP-10 · Provide a file inventory describing each file (name, format, purpose) and how files relate (e.g., tables joined by keys).
must MDP-11 · Provide a data dictionary for variables/fields (name, description, type, units, coding, missing value codes).
should MDP-12 · Describe data collection context: sampling/eligibility, time coverage, geography/setting, instruments, and key procedures.
may MDP-13 · Describe identifier strategy (subject IDs, linkage keys) and any pseudonymization/de-identification steps that affect linkage.
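A data dictionary (MDP-11) is most useful when it is also machine-checkable. The following is a minimal sketch, assuming one entry per column and string-valued raw cells as read from a CSV file. The `Variable` class, the helper functions, and the example variables `age` and `smoker` with their codes are hypothetical. Documented missing-value codes are accepted before any type or coding check, so they are never reported as errors.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """One data-dictionary entry: name, description, type, units, coding, missing codes."""
    name: str
    description: str
    dtype: type                                            # type the raw value must parse as
    units: str = ""
    coding: dict[str, str] = field(default_factory=dict)   # code -> label
    missing_codes: set[str] = field(default_factory=set)   # e.g. {"-99"}


def check_value(var: Variable, raw: str) -> Optional[str]:
    """Return an error message if `raw` violates the entry, else None."""
    if raw in var.missing_codes:
        return None  # documented missing value, not an error
    if var.coding:
        # Coded variables may only take documented codes.
        return None if raw in var.coding else f"{var.name}: undocumented code {raw!r}"
    try:
        var.dtype(raw)
    except ValueError:
        return f"{var.name}: {raw!r} is not {var.dtype.__name__}"
    return None


def validate_row(dictionary: list[Variable], row: dict[str, str]) -> list[str]:
    """Check one record against every dictionary entry."""
    errors = []
    for var in dictionary:
        if var.name not in row:
            errors.append(f"{var.name}: column missing")
            continue
        msg = check_value(var, row[var.name])
        if msg:
            errors.append(msg)
    return errors


# Hypothetical dictionary for illustration only.
DICTIONARY = [
    Variable("age", "Age at enrolment", int, units="years", missing_codes={"-99"}),
    Variable("smoker", "Smoking status", str,
             coding={"0": "never", "1": "former", "2": "current"}, missing_codes={"-9"}),
]
```

With this dictionary, `validate_row(DICTIONARY, {"age": "forty", "smoker": "3"})` returns `["age: 'forty' is not int", "smoker: undocumented code '3'"]`, while the documented missing codes `-99` and `-9` pass without error.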
Provenance & processing
Criterion · Text
must MDP-20 · Describe the processing pipeline from raw sources to released files, including cleaning rules and transformations; link to scripts/notebooks where feasible.
must MDP-21 · Document external data sources and merges (source, version/date accessed, and license/terms), including join keys and match logic.
should MDP-22 · Record the software and environment used to produce the dataset (tools and versions), and note any nondeterministic steps.
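Two of these items lend themselves to small machine-readable records: the match statistics for a merge (MDP-21) and the software environment (MDP-22). The sketch below uses only the Python standard library. The function names, the record keys, the example source label, and the assumption of exact matching on a single key are illustrative. A real pipeline would store such records alongside the released files.

```python
import json
import platform
from importlib import metadata
from typing import Iterable, Optional


def merge_report(left_keys: Iterable[str], right_keys: Iterable[str],
                 join_key: str, source: str) -> dict:
    """Summarize an exact-match join on one key (MDP-21)."""
    left, right = set(left_keys), set(right_keys)
    return {
        "source": source,                  # external source and version/date accessed
        "join_key": join_key,
        "match_logic": "exact match on join_key",
        "matched": len(left & right),
        "left_only": len(left - right),    # keys present only on the left side
        "right_only": len(right - left),   # keys present only on the right side
    }


def environment_record(packages: Iterable[str], seed: Optional[int] = None) -> dict:
    """Capture interpreter, OS, and package versions (MDP-22)."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "random_seed": seed,  # None means no random step was used
    }


if __name__ == "__main__":
    log = {
        "merges": [merge_report(["s1", "s2", "s3"], ["s2", "s3", "s4"],
                                "subject_id", "hypothetical registry, accessed 2026-01-15")],
        "environment": environment_record(["numpy", "pandas"], seed=12345),
    }
    print(json.dumps(log, indent=2))
```

Recording the random seed (or `None` when no randomness is involved) is one simple way to flag nondeterministic steps, and reporting unmatched counts on both sides of a join makes silent record loss visible.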
Access, licensing, governance
Criterion · Text
must MDP-30 · State the license/usage terms and any access restrictions; if controlled access, describe the request procedure and decision criteria.
must MDP-31 · Address sensitive data and governance: consent/ethics constraints, de-identification, and what cannot be shared (and why).
Integrity & quality notes
Criterion · Text
must MDP-40 · Provide data quality checks/validation rules and known issues/limitations; include recommended use and misuse warnings.
should MDP-41 · Provide file integrity checks (e.g., checksums) or an archive manifest to detect corruption and support auditability.
should MDP-42 · Include rich keywords and machine-readable metadata in the repository record (e.g., DataCite fields) to support discovery.
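For MDP-41, a checksum manifest can be generated at release time and re-verified by users after download. This sketch, assuming a release directory (possibly nested) and a hypothetical manifest file name `MANIFEST.sha256`, writes one SHA-256 digest per file and reports missing or altered files. The line layout (digest, two spaces, relative path) mirrors the text-mode output of GNU `sha256sum`, so for ordinary file names the manifest should also be checkable with `sha256sum -c MANIFEST.sha256` run from the release directory.

```python
import hashlib
from pathlib import Path
from typing import List

MANIFEST = "MANIFEST.sha256"  # hypothetical file name


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, manifest_name: str = MANIFEST) -> str:
    """Write '<digest>  <relative path>' for every file under root."""
    lines = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.name != manifest_name:
            lines.append(f"{sha256_file(p)}  {p.relative_to(root).as_posix()}")
    text = "\n".join(lines) + "\n"
    (root / manifest_name).write_text(text, encoding="utf-8")
    return text


def verify_manifest(root: Path, manifest_name: str = MANIFEST) -> List[str]:
    """Return a list of problems; an empty list means every file matches."""
    problems = []
    for line in (root / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(target) != digest:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Publishing the manifest's own digest on the landing page, as this release does for its metadata record, lets users confirm that the manifest itself was not altered.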