Research Data Management

This guide helps researchers plan for the various stages of managing their research data, implement best practices for data management, and prepare the data management plans required with funding proposals.

Organization

File Formats


In planning a research project, it is important to consider which file formats you will use to store your data. The file formats in which you record, store, and transmit your data are a primary factor in whether the data can be used in the future. Because technology changes continually, researchers should plan for both hardware and software obsolescence.

In many cases, the file format you use will be dictated by the software you have access to or by the conventions of your discipline. For long-term preservation and ease of sharing, however, best practice is to convert the files to a format better suited to preservation once your project has ended.

Whenever possible, use uncompressed, unencrypted, non-proprietary (open) formats such as the following:

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoJSON, KML, NetCDF, GeoTIFF/TIFF, HDF-EOS
  • Moving images: MOV, AVI, MXF (eScholarship requires MP4 to embed)
  • Presentations: PDF
  • Sounds: WAV, AIFF, MXF (eScholarship requires MP3 to embed)
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, JPEG, PDF
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC
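
As a rough illustration of how the list above might be applied, the following Python sketch flags files whose extension is not among the preferred formats. The extension set is a partial, assumed mapping from the format names above to common file extensions, and the function name needs_conversion is hypothetical; adapt both to your discipline.

```python
from pathlib import Path

# Partial, assumed mapping of the formats listed above to common extensions.
PREFERRED_EXTENSIONS = {
    ".tar", ".gz", ".zip",            # containers
    ".xml", ".csv",                   # databases and tabular data
    ".shp", ".dbf", ".geojson", ".kml", ".nc",  # geospatial
    ".tif", ".tiff", ".jp2", ".jpg", ".jpeg",   # still images
    ".mov", ".avi", ".mxf",           # moving images
    ".wav", ".aiff",                  # sounds
    ".pdf", ".html", ".txt",          # text and presentations
    ".warc",                          # web archive
}


def needs_conversion(filename: str) -> bool:
    """Return True if the file's extension is not in the preferred set."""
    return Path(filename).suffix.lower() not in PREFERRED_EXTENSIONS


if __name__ == "__main__":
    for name in ["survey_results.xlsx", "survey_results.csv"]:
        status = "convert before archiving" if needs_conversion(name) else "ok"
        print(f"{name}: {status}")
```

A check like this only looks at the extension, so it cannot tell whether a file's contents are actually valid for that format.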

File Organization 


Before you begin your research, adopt a naming convention for your files and use it throughout your project. Document the naming convention you choose, most likely in your ReadMe.txt file, and make sure that you and your collaborators follow it.

Best practices for naming conventions include:  

  • Describe the contents of the file, but keep the name reasonably short. Avoid generic names (like draft.doc or final2.xls) that are hard to decipher and easy to overwrite.
  • Separate elements in a file name using underscores (_) or hyphens (-). Avoid using blank spaces in a file name. Use periods only to separate the file name from the file type extension (.txt, .jpg, etc.)
  • Use a consistent date convention, such as YYYY-MM-DD or YYMMDD, so that files sort chronologically.
  • Avoid special characters such as " / \ : * ? < > [ ] & and $. These characters have special meaning in software and operating systems and can cause trouble.
  • When using numbers, use leading zeros to make sure files sort in sequential order. Use 001, 002, ...020, 021 … instead of 1, 2… 20, 21…
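
The conventions above can be sketched in a few lines of Python. This is a minimal illustration, not a required tool: the function name build_file_name, the element order (date, description, sequence number), and the choice of hyphens inside the description are all assumptions made for the example.

```python
import re
from datetime import date

# Characters the guidance above says to avoid, plus whitespace and periods.
_UNSAFE = re.compile(r'["/\\:*?<>\[\]&$\s.]+')


def build_file_name(description: str, day: date, sequence: int, extension: str) -> str:
    """Compose a name from a YYYY-MM-DD date, a short description,
    and a zero-padded sequence number."""
    # Replace each run of unsafe characters with a single hyphen.
    slug = _UNSAFE.sub("-", description.strip()).strip("-").lower()
    # Pad the sequence number to three digits so files sort in order.
    return f"{day.isoformat()}_{slug}_{sequence:03d}.{extension.lstrip('.')}"


if __name__ == "__main__":
    print(build_file_name("Soil samples: site A", date(2024, 3, 5), 7, ".csv"))
    # prints 2024-03-05_soil-samples-site-a_007.csv
```

Whatever pattern you settle on, record it in your ReadMe.txt so collaborators can reproduce it by hand.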

Data Versioning 


Data versioning means saving a new copy of a file each time you make changes, so that you can retrieve a specific version later. Keeping multiple versions lets you decide later that you prefer an earlier one and revert to it immediately, instead of retracing your steps to recreate it.

A recommended naming structure for data versioning is as follows:

  • DataFileName_1.0 = original document
  • DataFileName_1.1 = original document with minor revisions
  • DataFileName_2.0 = document with substantial revisions
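
The numbering scheme above can be automated. The Python sketch below is one possible approach: it assumes the version suffix always has the form _MAJOR.MINOR at the end of the name, with no file extension after it, and the function name next_version is hypothetical.

```python
import re

# Matches names that end in a suffix such as _1.0 or _2.13.
_VERSION = re.compile(r"^(?P<stem>.+)_(?P<major>\d+)\.(?P<minor>\d+)$")


def next_version(name: str, substantial: bool = False) -> str:
    """Return the next versioned name.

    Minor revisions increment the second number; substantial revisions
    increment the first number and reset the second to zero.
    """
    match = _VERSION.match(name)
    if match is None:
        # No version suffix yet: treat this as the original document.
        return f"{name}_1.0"
    major, minor = int(match["major"]), int(match["minor"])
    if substantial:
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{match['stem']}_{major}.{minor}"


if __name__ == "__main__":
    print(next_version("DataFileName"))                        # DataFileName_1.0
    print(next_version("DataFileName_1.0"))                    # DataFileName_1.1
    print(next_version("DataFileName_1.1", substantial=True))  # DataFileName_2.0
```

The function only computes the new name; copying the file and keeping the earlier version untouched is still up to you.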
