Documentation and metadata are distinct but related. Documentation is written to be read by humans, whereas metadata is structured primarily for machine processing. Metadata nevertheless also serves as a form of documentation.
Documentation | Metadata |
---|---|
Documentation explains how your data was created, its context, structure and content, and any manipulations carried out on it. All data collected or generated must be documented as part of your project. It is also advisable to document all file formats used. Data documentation is part of the data retention process. | Metadata is a set of structured information describing the characteristics of a data item or dataset. Its role is to facilitate data search, management and reuse. Metadata can be descriptive (e.g. title, author, project date), technical (e.g. formats used) or about usage and access rights (e.g. rights holders). We recommend creating metadata as early as possible. |
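
To make the three kinds of metadata concrete, here is a minimal sketch of a metadata record serialized as JSON. The field names and values are hypothetical and chosen only for illustration; a real project would normally follow the metadata schema required by its repository or discipline.

```python
import json

# Hypothetical metadata record: every field name and value is illustrative.
metadata = {
    # Descriptive metadata: what the dataset is and who made it
    "descriptive": {
        "title": "Example survey dataset",
        "author": "Jane Doe",
        "project_date": "2024-01-15",
    },
    # Technical metadata: the formats used
    "technical": {
        "formats": ["CSV", "PDF"],
    },
    # Usage and access rights metadata
    "rights": {
        "rights_holder": "Example University",
    },
}


def to_json(record: dict) -> str:
    """Serialize the record as machine-readable JSON."""
    return json.dumps(record, indent=2, sort_keys=True)


metadata_json = to_json(metadata)
print(metadata_json)
```

Keeping the record in a structured format such as JSON is what lets software search and process it, while the field names stay readable for humans.
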
Readme file | Codebook |
---|---|
A Readme file is a form of documentation containing metadata. It explains the content and structure of your dataset, and provides sufficient information for a potential user to determine whether or not the data is of interest to them. If your dataset requires a codebook, this can be included in the Readme file. You can also create secondary Readme files in sub-folders to document specific parts of your data. | A codebook is a document (usually a table) describing the variables present in a dataset. Its purpose is to record detailed information on each variable. The codebook generally contains the codes, symbols and abbreviations used in the files, a list of variables with their full names and definitions, units of measurement, data formats (e.g. YYYY-MM-DD) and how missing data is handled (e.g. the code used). |
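
As an illustration of the codebook described above, the following sketch builds a small codebook table and writes it out as CSV. The variables, units and missing-data codes are hypothetical; the columns simply mirror the elements listed above (variable name, full name, unit, format, missing-data code).

```python
import csv
import io

# Hypothetical variables: names, units and codes are illustrative only.
codebook = [
    {
        "variable": "visit_date",
        "full_name": "Date of visit",
        "unit": "",
        "format": "YYYY-MM-DD",
        "missing_code": "NA",
    },
    {
        "variable": "temp_c",
        "full_name": "Air temperature",
        "unit": "degrees Celsius",
        "format": "decimal",
        "missing_code": "-999",
    },
]


def codebook_to_csv(rows: list) -> str:
    """Return the codebook as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


codebook_csv = codebook_to_csv(codebook)
print(codebook_csv)
```

The resulting text could be saved as a separate file next to the data or pasted into the Readme file, depending on how the dataset is organized.
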