
Preferred File Formats for Electronic Records

The Archives can accept many electronic file formats. In the case of older or specialized formats, however, it may not be possible to preserve the complete functionality of individual files. Archival best practices suggest that successful long-term preservation is more likely when file formats have the following characteristics:

  • complete and open documentation
  • platform-independence
  • non-proprietary (vendor-independent)
  • no “lossy” or proprietary compression
  • no embedded files, programs or scripts
  • no full or partial encryption
  • no password protection
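
As a rough illustration of screening for the encryption characteristic, the sketch below inspects ZIP-based containers (formats such as .docx, .odt, and .epub are ZIP archives) and lists any entries flagged as encrypted. The function name `encrypted_entries` is hypothetical, and the check only covers ZIP-level encryption; formats may apply their own protection schemes that it does not see.

```python
import zipfile


def encrypted_entries(container):
    """Return the names of ZIP entries that are flagged as encrypted.

    `container` may be a file path or a binary file-like object.
    Assumption: the file is a ZIP-based container; other inputs raise
    zipfile.BadZipFile.
    """
    with zipfile.ZipFile(container) as archive:
        # Bit 0 of the general-purpose flag marks an encrypted entry.
        return [
            info.filename
            for info in archive.infolist()
            if info.flag_bits & 0x1
        ]
```

An empty list means no entry carries the ZIP encryption flag; it does not by itself show that the file is free of passwords or other protection.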

The following table identifies various formats with high, medium, or low probability for long-term preservation.

High-confidence formats are non-proprietary and open-source, are uncompressed or use lossless compression, and are self-documenting (i.e., instructions for rendering and viewing the file are either embedded within the file itself or freely available in multiple places online). These formats can usually be viewed using multiple programs or file viewers and are the most likely to remain readable in the future.

Medium-confidence formats are usually proprietary, undocumented, or reliant on lossy compression, or some combination of these. However, they are currently in widespread use (i.e., they are the most recent version of a particular format). These formats are usually supported by an ongoing business need to read or render them and may be viewable by universal file viewers, so they are likely to remain accessible in their native format over the short to medium term. These files can most likely be preserved for the long term, but they may require reformatting to a preservation format as time goes on.

Low-confidence formats are usually proprietary, undocumented, or reliant on lossy compression. They may be obsolete or nearly obsolete. These formats are not currently supported by most mainstream programs or file viewers and may not be viewable at all without specialized software or hardware environments. These files may be preservable over the long term, but they will need to be reformatted before they can be made accessible.

The Archives prefers to accept file formats with a greater probability for long-term preservation.

 

Document Type

High Confidence

Medium Confidence

Low Confidence

Text Documents

XML with standard DTD or link to Schema

HTML

PDF/A (ISO 19005-1) [.pdf]

Open Document Format [.odt]

Plain Text (Encoding: UTF-8 or ASCII)

PDF (other subtypes) [.pdf]

Office Open XML [.docx and .xlsx]

Rich Text Format [.rtf]

EPUB [.epub]

WordPerfect [.wpd]

Office 2003 and older [.doc and .xls]

All other text formats not listed here

E-mail

MBOX (entire folders or accounts) [.mbox]

Plain Text (UTF-8 or ASCII encoding)

EML [.eml]

Outlook individual email formats [.msg, .ost]

PDF Portfolio [.pdf]

HTML [.htm, .html]

Outlook Personal Storage Table files [.pst]

Opera Mail [.mbs]

Entourage [.rge]

All other email formats not listed here

Audio

WAV [.wav]

Audio Interchange File Format [.aif, .aiff]

 

MP3 [.mp3]

Advanced Audio Coding [.aac, .mp4, .m4a]

MIDI [.mid, .midi]

Ogg Vorbis [.ogg]

Free Lossless Audio Codec [.flac]

 

Windows Media Audio [.wma]

RealAudio [.ra, .rm, .ram]

Protected AAC [.m4p]

All other audio formats not listed here

Video

AVI (Uncompressed, motion JPEG) [.avi]

QuickTime Movie (Uncompressed, motion JPEG) [.mov]

MPEG-4 AVC [.mp4]

Windows Media Video [.wmv]

MPEG-2 (wrapped in AVI or MOV) [.avi, .mov]

MPEG-4 (wrapped in AVI or MOV) [.avi, .mov]

Protected MPEG-4 [.m4p]

RealVideo [.rv]

All other video formats not listed here

Presentation Files

Convert to PDF

Open Office [.sxi, .odp]

Office Open XML [.pptx]

PowerPoint 2003 or earlier [.ppt]

All other presentation formats not listed here

Raster Image

TIFF (Uncompressed) [.tif]

PDF/A or PDF/X (Graphic exchange format) [.pdf]

PNG [.png]

JPEG2000 (lossless) [.jp2]

TIFF (Compressed) [.tif]

GIF [.gif]

BMP [.bmp]

Adobe Digital Negative [.dng]

RAW digital camera images [.raw]

Photoshop images [.psd]

Encapsulated PostScript [.eps]

All other raster image formats not listed here

Vector Image

Scalable Vector Graphics [.svg]

AutoCAD Drawing Interchange Format [.dxf]

Computer Graphics Metafile [.cgm]

Adobe Illustrator [.ai]

Standard CAD drawing [.dwg]

CorelDraw [.cdr]

Windows Metafile [.wmf]

All other vector image formats not listed here

Data Sets

Comma Separated Values [.csv]

Tab Delimited [usually .txt]

NOTE: All data sets should be preserved with appropriate documentation in a high-confidence text format.

Open Office Format [.sxc, .ods]

Office Open XML [.xlsx]

dBASE [.dbf]

Excel 2003 or earlier [.xls]

Microsoft Access Database [.mdb]

Lotus 1-2-3 [.wks]

All other database/spreadsheet formats not listed here
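
As an illustration of how the table might be applied when screening incoming files, the sketch below maps the Audio rows to confidence levels by file extension. It is a minimal example under stated assumptions: the names `AUDIO_CONFIDENCE` and `audio_confidence` are hypothetical, the column boundaries are inferred from the layout of the list above, and `.flac` is assumed as the extension for the Free Lossless Audio Codec entry. A file extension is only a hint about a file's real format, so this is a first-pass screen and not a format identification.

```python
from pathlib import PurePath

# Extensions taken from the Audio rows of the table; the assignment to
# columns is inferred from the list layout and is an assumption.
AUDIO_CONFIDENCE = {
    "high": {".wav", ".aif", ".aiff"},
    "medium": {".mp3", ".aac", ".mp4", ".m4a", ".mid", ".midi", ".ogg", ".flac"},
    "low": {".wma", ".ra", ".rm", ".ram", ".m4p"},
}


def audio_confidence(filename: str) -> str:
    """Return 'high', 'medium', or 'low' for an audio file name."""
    extension = PurePath(filename).suffix.lower()
    for level, extensions in AUDIO_CONFIDENCE.items():
        if extension in extensions:
            return level
    # "All other audio formats not listed here" fall under low confidence.
    return "low"
```

For example, `audio_confidence("interview.WAV")` returns `"high"`, while an unlisted extension falls through to `"low"`.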