File Formats
The file format you choose for your data is a primary factor in someone else's ability to access it in the future.
Think carefully about the best file format to manage, share, and preserve your data. Technology continually changes, and all contemporary hardware and software should be expected to become obsolete.
Consider how your data will be read if the software used to produce it becomes unavailable. Although any file format you choose today may become unreadable in the future, some formats are more likely to be readable than others.
Choosing good formats will improve the accessibility of your research and make it easier for you and future researchers to use or reuse the data on a wide range of computer systems, regardless of the software packages available.
When performing research, it's often necessary to use specialised and proprietary file formats. This may be for many reasons: your method of data analysis, the hardware used, the software available to you, or the need to meet discipline-specific standards. Regardless of these constraints, it's still important to make a conscious and informed decision when choosing file formats.
At a minimum you should consider:
At later stages of your research, such as when publishing traditional research outputs or making your data publicly available, you should consider transferring your data to a file format that can be utilised by people who may not have access to the exact suite of software you have.
Researchers may sometimes encounter situations where they absolutely must use a proprietary/discouraged file format. In this case, they should make every possible effort to provide a backup version of the file in a different format. They should also provide documentation explaining how to use the problematic format.
Formats likely to be accessible in the future are:
✦ Non-proprietary
✦ Open, with documented standards
✦ In common usage by the research community
✦ Using standard character encodings (e.g., ASCII, UTF-8)
✦ Uncompressed (space permitting)
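The character-encoding criterion above can be checked mechanically. The following is a minimal Python sketch, not a definitive validator: the function names `is_valid_utf8` and `file_is_utf8` are hypothetical, and it only tests whether the bytes decode as UTF-8 (of which ASCII is a subset). A file that passes may still contain text that was wrongly converted at an earlier stage.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (ASCII is a subset)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path: str) -> bool:
    """Read a whole file as bytes and test whether it is valid UTF-8."""
    return is_valid_utf8(Path(path).read_bytes())
```

Calling `file_is_utf8` on a text file before depositing it gives an early warning that the file was saved in a legacy encoding and may need to be re-exported.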
Examples of preferred format choices:
Image | JPEG, JPEG 2000, PNG, TIFF
Moving images | MOV, MPEG, AVI, MXF
Text | plain text (TXT), HTML, XML, PDF/A
Audio | AIFF, WAVE, MP3, MXF
Containers | TAR, GZIP, ZIP
Databases | XML, CSV
Statistics | ASCII, DTA, POR, SAS, SAV, R
Geospatial | SHP, DBF, GeoTIFF, NetCDF
Tabular data | CSV
Web archive | WARC
A list of recommended file formats
✦ The ETH Zurich library has a list of recommended file formats for data preservation.
✦ The UK Data Service Recommended File Formats table can help you use a file format best suited to long term accessibility.
Tabular data
Tabular data warrants special mention because it is so common across disciplines, most often in the form of Excel spreadsheets.
Favour open, low-tech formats
If you do your analysis in Excel, you should use the "Save As..." command to export your work to .csv format when you are done.
CSV (comma-separated values) files may not look as good as a native Excel file, but they have multiple advantages when preserving tabular data:
✦ They are simple: they can be opened and read even with a simple text editor.
✦ They are open: the development of software that can use them is not hindered by intellectual property.
✦ Being open also means they are not attached to a single software system and are compatible with many different options.
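To illustrate how simple and software-independent CSV is, the sketch below writes and reads a small table using only Python's standard `csv` module. The helper names `rows_to_csv_text` and `csv_text_to_rows` and the sample column titles are assumptions made for illustration; they are not part of any tool mentioned in this guide.

```python
import csv
import io


def rows_to_csv_text(header, rows):
    """Serialise a header and data rows to CSV text with '\n' line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text):
    """Parse CSV text back into a list of rows (every value is a string)."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample data: a value containing a comma is quoted automatically.
text = rows_to_csv_text(["site", "temp_c"], [["A", 21.5], ["B, north", 19]])
rows = csv_text_to_rows(text)
```

Note that every value comes back as text: CSV stores no data types, formulas, or formatting, which is part of what keeps it readable by so many programs. When writing to a file rather than a string, opening it with `open(path, "w", newline="", encoding="utf-8")` keeps the encoding explicit.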
Your spreadsheets will be easier to understand and export if you follow best practices when you set them up, such as:
✦ Don't put more than one table on a worksheet
✦ Include a header row with an understandable title for each column
✦ Create charts on new sheets; don't embed them in the worksheet with the data
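Some of these practices can be checked automatically once a sheet has been exported to CSV. The sketch below is one possible approach under stated assumptions: the function name `check_header` and its problem messages are hypothetical, and a row whose width differs from the header is treated only as a hint that a second table or stray notes may share the sheet.

```python
import csv
import io


def check_header(csv_text):
    """Return a list of problems found in CSV text exported from a worksheet."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return ["file is empty"]
    header = [name.strip() for name in rows[0]]
    problems = []
    if any(name == "" for name in header):
        problems.append("blank column title")
    if len(set(header)) != len(header):
        problems.append("duplicate column title")
    # Rows of a different width often signal a second table on the same sheet.
    if any(len(row) != len(header) for row in rows[1:] if row):
        problems.append("row width differs from header")
    return problems
```

An empty list means none of the three checks failed. It does not show that the column titles are understandable, which still needs a human reader.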
When this is not possible, favour the most popular file format
While Word or Excel files are not open or low-tech formats, their ubiquity means that they should remain readable in the foreseeable future. You might lose some of the formatting or formulas, but some sort of compatibility should remain.
Some formats that began as proprietary have since been standardised specifically for preservation purposes: PDF/A, an ISO-standardised subset of PDF, will stand the test of time better than the average PDF.
Whenever you use a specific format and software, you should always document the version of the software you used to create, use, and save the data.
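One lightweight way to record this is a small machine-readable note stored next to the data. The following Python sketch assumes a JSON sidecar file. The function name `environment_record` and the field names are hypothetical choices, not a standard, and the software name and version are supplied by you because they cannot be detected reliably.

```python
import json
import platform


def environment_record(tool_name, tool_version):
    """Build a dictionary describing the software used to produce a dataset."""
    return {
        "tool": tool_name,
        "tool_version": tool_version,
        # Details of the machine running this script, from the standard library.
        "python_version": platform.python_version(),
        "operating_system": platform.system(),
    }


def record_as_json(record):
    """Serialise the record as indented JSON text, ready to save beside the data."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Saving the output of `record_as_json` in a plain-text file alongside the dataset keeps the version information in an open format that can be read without the original software.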
Knowledge Clip: File Formats
From Knowledge clip: file formats [Video], by UGent Open Science, 2020, Ghent University. (https://www.youtube.com/watch?v=kxxlQnc8u1I). CC BY.