Bootcamp: Organising and documenting data

Section 7 of 8

Image: Broomwicks by Dan Gregory

One of the chief aims of research data management is to make the research process as efficient as possible by reducing the time spent looking for things, which creates more time for important processes such as critical thinking, analysis, and writing.

At the start of a research project it's easy to believe that you'll remember what name you gave to a file and where you put it. However, as you create, gather, and manipulate more and more files, they can soon become disorganised. Searching for a data file that has been misnamed or stored in the wrong place is both frustrating and a waste of valuable time. In extreme cases, data can be rendered unusable if you fail to organise it properly. It is therefore important that, at the outset of a project, you and your colleagues decide on a consistent approach to naming and structuring files and directories so that you can all find and use them.
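
As an illustration of what a consistent naming approach can look like, here is a minimal Python sketch. The convention it uses (ISO date, project, description, two-digit version) is an assumption chosen for the example, not one prescribed by this course, and the function name build_filename and the sample project are hypothetical. Putting the date first in year-month-day form means that an ordinary alphabetical sort also lists the files in date order.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumeric characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(created: date, project: str, description: str,
                   version: int, ext: str) -> str:
    """Compose a name of the form YYYY-MM-DD_project_description_vNN.ext.

    This pattern is an assumed convention for illustration only;
    agree on your own pattern with your colleagues.
    """
    extension = ext.lstrip(".").lower()
    return (f"{created.isoformat()}_{slug(project)}_"
            f"{slug(description)}_v{version:02d}.{extension}")


# Hypothetical files, deliberately created out of date order.
names = [
    build_filename(date(2024, 11, 2), "Soil Survey", "Site B pH", 1, "csv"),
    build_filename(date(2024, 3, 5), "Soil Survey", "Site A pH", 2, ".CSV"),
    build_filename(date(2023, 12, 19), "Soil Survey", "Pilot readings", 1, "csv"),
]

# A plain alphabetical sort puts the names in chronological order.
for name in sorted(names):
    print(name)
```

Avoiding spaces and special characters, as the slug helper does here, is a design choice that tends to keep names portable across operating systems and scripts.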

As well as naming your files consistently, it helps to add documentation to the materials you create so that your data is usable and shareable. By adding this documentation, or metadata, you make sure your data can be interpreted not only by you and your colleagues, but also by others in the future.

It's best to document your data at the time it is created, rather than trying to remember details later. Documentation can be created in a number of ways: for instance, by including information in the data itself (e.g. a TIFF file can be a lot more than just an image; it can contain a lot of extra, written information too), maintaining a readme.txt file of basic information, maintaining a lab notebook or research journal, or keeping a database of information about your data with links to the actual files. When documenting your data, try to imagine the information that somebody else would need in order to understand and use this data in (say) twenty years' time.
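
One of the options above, a readme.txt of basic information, can be sketched in a few lines of Python. This is only an illustration: the function name write_readme, the "Field: value" layout, and all of the example values are assumptions made for the sketch, and the example writes to a temporary folder rather than a real project directory.

```python
import tempfile
from pathlib import Path


def write_readme(folder: Path, record: dict) -> Path:
    """Write one 'Field: value' line per entry to readme.txt in the given folder."""
    lines = [f"{field}: {value}" for field, value in record.items()]
    readme = folder / "readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Hypothetical example values; replace them with details of your own data.
record = {
    "Title": "Soil pH readings, pilot survey",
    "Date": "2023-12-19",
    "Creator": "A. Researcher",
    "Format": "CSV, UTF-8, one row per sample",
    "Rights": "Internal use only until publication",
    "Abbreviations": "pH = measured acidity; NA = sample not taken",
}

# Write into a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    readme_path = write_readme(Path(tmp), record)
    readme_text = readme_path.read_text(encoding="utf-8")

print(readme_text)
```

A plain-text file like this is a reasonable choice because it can be opened without specialist software, which matters if the data needs to be understood many years from now.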

There are various pieces of information you may wish to record, including the following:

  • Basic information like: title, date, creator, format, subject, rights, conditions of access
  • Explanations of any codes, classification systems or abbreviations you use
  • Details about how the data was created, analysed, anonymised, etc.
  • Information about the project and creators of the data
  • Anomalies and irregularities: e.g. exceptions, quirks or questionable results

1. Which of the following file naming conventions would you use if you wanted your files to appear in date order?
