
Research Data Management (RDM): Documentation

One-stop shop for all things related to research data and how to manage your data throughout its entire lifecycle

Documentation

How Is Metadata Different from Data Documentation? 

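Metadata is a form of documentation that follows a particular standard agreed upon by a specific community or group. Data documentation is the broader term: it covers any information that helps others understand and use the data, whether or not it follows a standard.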

Providing documentation is a crucial part of the research process: for data to be used effectively and efficiently, certain facts about the data must be recorded. Relying on memory alone is not a good solution, especially in collaborative research environments.

Describing research protocols and documenting data through metadata, lab notebooks, instrument calibrations, methodology outlines or codebooks helps ensure that your data can be understood, validated and reproduced.

What to document?

To ensure maximum interoperability and reproducibility, data should be accompanied by full documentation.

Research project documentation
  • Rationale and context for data collection
  • Data collection methods
  • Structure and organization of data files
  • Data sources used (see citing data)
  • Data validation and quality assurance
  • Transformations of data from the raw data through analysis
  • Information on confidentiality, access and use conditions

Dataset documentation
  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data (may include computer code)
  • File format and software (including version) used

General aspects of your data that you should document

✦ Title - Name of the dataset or research project that produced it


✦ Creator - Names and addresses of the organizations or people who created the data; preferred format for personal names is surname first (e.g., Smith, Jane)


✦ Identifier - Unique number used to identify the data, even if it is just an internal project reference number


✦ Date - Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, such as maintenance cycle, update schedule; preferred format is yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range


✦ Method - How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook


✦ Processing - How the data have been altered or processed (e.g., normalized)


✦ Source - Citations to data derived from other sources, including details of where the source data is held and how it was accessed


✦ Funder - Organizations or agencies who funded the research


Adapted from the Curtin University Documenting Research library guide

Subject - Keywords or phrases describing the subject or content of the data


Place - All applicable physical locations


Language - All languages used in the dataset


Variable list - All variables in the data files, where applicable


✦ Code list - Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. "999 indicates a missing value in the data")

File inventory - All files associated with the project, including extensions (e.g. "NWPalaceTR.WRL", "stone.mov")


File formats - Formats of the data, e.g., FITS, SPSS, HTML, JPEG, etc.


File structure - Organisation of the data file(s) and layout of the variables, where applicable


Version - Unique date/time stamp and identifier for each version


Checksum - A digest value computed for each file that can be used to detect changes; if a recomputed digest differs from the stored digest, the file must have changed


Necessary software - Names of any special-purpose software packages required to create, view, analyze, or otherwise use the data

✦ Rights - Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data


✦ Access information - Where and how your data can be accessed by other researchers
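The file inventory and checksum elements above lend themselves to automation. The following is a minimal Python sketch, not a prescribed tool: it assumes SHA-256 as the digest algorithm and a CSV manifest stored outside the data folder, and the function names (`build_inventory`, `write_manifest`, `changed_files`) are hypothetical.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["file", "extension", "bytes", "modified", "sha256"]


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(folder: Path) -> list[dict[str, str]]:
    """List every file under `folder` with its extension, size, date and checksum."""
    rows = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(folder).as_posix(),
            "extension": path.suffix,
            "bytes": str(stat.st_size),
            "modified": modified.strftime("%Y-%m-%d"),  # yyyy-mm-dd
            "sha256": sha256_of(path),
        })
    return rows


def write_manifest(rows: list[dict[str, str]], manifest: Path) -> None:
    """Write the inventory as CSV; keep the manifest outside the inventoried folder."""
    with manifest.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def changed_files(folder: Path, manifest: Path) -> list[str]:
    """Recompute each digest; a missing file or a different digest means the file changed."""
    with manifest.open(newline="", encoding="utf-8") as handle:
        stored = {row["file"]: row["sha256"] for row in csv.DictReader(handle)}
    return [
        name
        for name, digest in stored.items()
        if not (folder / name).is_file() or sha256_of(folder / name) != digest
    ]
```

Calling `write_manifest(build_inventory(...), ...)` once when the data are finalised, and `changed_files(...)` later, flags edited or deleted files. Files added after the manifest was written are not reported by this sketch.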

How to record metadata

Metadata should be stored alongside your research data.

There are a number of ways you can add documentation to your data.

Embedded documentation

Information about a file or dataset can be included within the data or document itself. For digital datasets, this means that the documentation can be integrated into the data file(s), as a header or at specified locations in the file, or sit in separate files (for example, text files) stored with the data. Examples of embedded documentation include:

  • code, field and label descriptions
  • descriptive headers or summaries
  • recording information in the Document Properties function of a file (Microsoft Office)

Consider creating a README file that documents the contents, naming convention and structure of the files and folders, and the project they relate to, as well as the file formats and software needed to use the files. You can also document details of the licence or any restrictions placed on the data.

Supporting documentation

Supporting documentation is information in separate files that accompany the data to provide context, explanation, or instructions on confidentiality and data use or reuse. Examples of supporting documentation include:

  • Working papers or laboratory books
  • Questionnaires or interview guides
  • Final project reports and publications
  • Catalogue metadata

Supporting documentation should be structured so that it can be used to identify and locate the data via a web browser or web-based catalogue.
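One way to make documentation structured is to save a machine-readable record next to each data file. The sketch below assumes the 15 element names of the Dublin Core Metadata Element Set (version 1.1) as field names, with a simple JSON layout of its own; it is not a formal Dublin Core serialization, and the `.metadata.json` naming convention, the `write_sidecar` function and the example values are hypothetical.

```python
import json
from pathlib import Path

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def write_sidecar(data_file: Path, record: dict) -> Path:
    """Save `record` as JSON beside `data_file`, e.g. survey.csv -> survey.csv.metadata.json."""
    unknown = set(record) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar


# Hypothetical example record; replace every value with your own project's details.
EXAMPLE_RECORD = {
    "title": "Example survey dataset",
    "creator": ["Smith, Jane"],  # surname first
    "identifier": "PROJ-0001",  # internal project reference
    "date": "2021-06-30",  # yyyy-mm-dd
    "subject": ["soil moisture", "field survey"],
    "format": "text/csv",
    "rights": "CC BY 4.0",
}
```

Elements from this guide that have no Dublin Core 1.1 counterpart, such as Funder, would need a richer schema; the sketch rejects unknown keys rather than guessing where they belong.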


Forms of Documentation

README

A README file is a text file, located in a project-related folder, that describes the contents and structure of the folder and/or dataset and the project it relates to, so that a researcher can locate the information they need.

A README file provides information about a data file and helps ensure that the data can be correctly interpreted, both by yourself at a later date and by others when the data are shared or published.

It should describe the file formats and software needed to use the files, and can also record the licence and any restrictions placed on the data.

Cornell University has an excellent guide to writing README files.
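A README can start from a fixed template so that no element is forgotten. The following minimal Python sketch writes a plain-text skeleton; the section list is drawn from the elements described on this page, while the `README.txt` file name, the `TODO` placeholders and the function names are assumptions to adapt.

```python
from pathlib import Path

# Section headings drawn from the documentation elements on this page; adjust as needed.
README_SECTIONS = [
    "Title",
    "Creator(s) and contact details",
    "Key dates (yyyy-mm-dd)",
    "Methods and processing",
    "File inventory and folder structure",
    "File formats and software (including versions)",
    "Code list",
    "Licence and access conditions",
]


def readme_template(project: str) -> str:
    """Return a plain-text README skeleton with a TODO under each heading."""
    lines = [f"README for {project}", ""]
    for heading in README_SECTIONS:
        lines += [heading, "-" * len(heading), "TODO", ""]
    return "\n".join(lines)


def write_readme(folder: Path, project: str) -> Path:
    """Create README.txt in `folder`, refusing to overwrite an existing one."""
    readme = folder / "README.txt"
    with readme.open("x", encoding="utf-8") as handle:  # mode "x" fails if the file exists
        handle.write(readme_template(project))
    return readme
```

Plain text is used because it can be opened without special software; the exclusive-create mode protects a README that has already been filled in.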

Data dictionary/codebook

Also known as a codebook, a data dictionary defines and describes the elements of a dataset so that it can be understood and used at a later date. It informs the data user about the study, data file(s), variables, categories, etc., that make up a complete dataset.
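A data dictionary can also be kept in machine-readable form, so that the same definitions both document the data and drive its processing. In this minimal Python sketch the survey variables, the `Variable` class and the convention that code 999 means a missing value (echoing the code-list example earlier on this page) are all hypothetical.

```python
import csv
from dataclasses import dataclass, field
from io import StringIO


@dataclass
class Variable:
    name: str
    description: str
    dtype: str  # e.g. "integer", "text"
    unit: str = ""
    codes: dict[str, str] = field(default_factory=dict)  # code -> meaning


# Hypothetical codebook for a small survey file.
CODEBOOK = [
    Variable("id", "Respondent identifier", "integer"),
    Variable("age", "Age at time of interview", "integer", "years", {"999": "missing"}),
    Variable("smoker", "Current smoker", "integer",
             codes={"0": "no", "1": "yes", "999": "missing"}),
]


def codebook_to_csv(codebook: list[Variable]) -> str:
    """Render the codebook as CSV text, one row per variable."""
    out = StringIO()
    writer = csv.writer(out)
    writer.writerow(["variable", "description", "type", "unit", "codes"])
    for v in codebook:
        codes = "; ".join(f"{code} = {meaning}" for code, meaning in v.codes.items())
        writer.writerow([v.name, v.description, v.dtype, v.unit, codes])
    return out.getvalue()


def apply_missing_codes(row: dict, codebook: list[Variable]) -> dict:
    """Replace any value the codebook defines as 'missing' with None."""
    missing = {v.name: {c for c, m in v.codes.items() if m == "missing"} for v in codebook}
    return {k: (None if val in missing.get(k, set()) else val) for k, val in row.items()}
```

Each row read with `csv.DictReader` from the data file can be passed through `apply_missing_codes`, so a value such as 999 is never mistaken for a real age.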

The following tools can be useful for creating a codebook/data dictionary 

Protocol

A protocol describes the procedure(s) or method(s) used in the implementation of a research project or experiment. 

If you need to maintain protocols, you can use a tool like protocols.io, a collaborative platform and preprint server for methods and protocols.

Learn more about, and watch a seminar on, Protocols and Methods delivered by Harvard Biomedical Data Management.

Lab Notebook and Electronic Lab Notebook (ELN)

For research groups that use them, lab notebooks are often the primary record of the research process. They are used to document hypotheses, experiments, analyses, and interpretations of experiments.

In an electronic lab notebook (ELN), you can enter protocols, observations, notes, and other data using your computer or mobile device.

The following are some examples of tools that can be used as ELNs:

  • Evernote: A general-purpose note-taking app that lets you organize many types of data and sync them across devices.
  • Confluence: A popular documentation and sharing platform that can be configured to manage workflows.
  • Benchling: A cloud-based ELN platform, free for academic use, that also provides molecular biology tools.

From Knowledge clip: Data Documentation [Video], by UGent Open Science, 2021, Ghent University. (https://www.youtube.com/watch?v=7Ogbkx74Ym8). CC BY.

Charles Darwin University acknowledges the traditional custodians across the lands on which we live and work, and we pay our respects to Elders both past and present.
CRICOS Provider No: 00300K (NT/VIC) 03286A (NSW) • RTO Provider No: 0373 • ABN 54 093 513 649