A Stable Package for Email in Multiple Formats


The Mailbag project consists of a draft specification and mailbagit, an open source tool, for preserving email archives using multiple formats, such as MBOX, PDF, and WARC.

Currently there is no single effective preservation format for email, so the Mailbag approach is to preserve multiple formats in a stable and computer-actionable package. MBOX or EML files provide structured access for computational use, PDF files preserve the document-like rendering of email well and provide easy dissemination, and web archives preserve the potential interactivity of email HTML and CSS, as well as embedded and linked Web content from external sources.

The Mailbag specification is an extension of the BagIt specification. A mailbag is a special type of “bag,” with designated storage for common email exports like MBOX or PST, PDF files, and Web Archives. Mailbags also contain specific metadata about their contents to make them computer actionable, as well as limited serialized email header data.
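To make the "bag" idea concrete, here is a minimal sketch of a BagIt-style package with one payload subdirectory per format, written with only the Python standard library. It is not mailbagit's implementation and it is not a conformant Mailbag: the subdirectory names in `FORMAT_DIRS` and the helper names `make_sketch_bag` and `verify_sketch_bag` are hypothetical, chosen for illustration, and the Mailbag-specific metadata files are omitted. Only the `bagit.txt` declaration, the `data/` payload directory, and the `manifest-sha256.txt` layout follow BagIt itself.

```python
import hashlib
from pathlib import Path
from typing import Mapping

# Hypothetical per-format payload directories; the actual names and any
# required metadata files are defined by the Mailbag specification.
FORMAT_DIRS = ("mbox", "pdf", "warc")


def make_sketch_bag(root: Path, files: Mapping[str, bytes]) -> Path:
    """Write a minimal BagIt-style bag under `root`.

    `files` maps payload-relative paths (e.g. "mbox/inbox.mbox") to bytes.
    """
    data = root / "data"
    for name in FORMAT_DIRS:
        (data / name).mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for rel, content in sorted(files.items()):
        target = data / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # BagIt manifest line: checksum, whitespace, path relative to the bag root
        manifest_lines.append(f"{digest}  data/{rel}")

    # Bag declaration required by BagIt
    (root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (root / "manifest-sha256.txt").write_text(
        "".join(line + "\n" for line in manifest_lines), encoding="utf-8"
    )
    return root


def verify_sketch_bag(root: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (root / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel = line.split(maxsplit=1)
        if hashlib.sha256((root / rel).read_bytes()).hexdigest() != digest:
            return False
    return True
```

Calling `make_sketch_bag(Path("my_bag"), {"mbox/inbox.mbox": b"..."})` and then `verify_sketch_bag(Path("my_bag"))` shows the property that makes a bag "stable": any later change to a payload file is detectable from the manifest.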

Many of the tools available for email processing are challenging for archivists to use. Email must also be processed soon after capture to ensure that content hosted on external servers is not lost. The mailbagit tool enables archivists to rapidly process email archives and package them into Mailbags. A basic graphical user interface (GUI) lowers the barrier for this work.

An overview diagram of a Mailbag and its use by the mailbagit tool.


This project was made possible by funding from the University of Illinois’s Email Archives: Building Capacity and Community Project.

We owe a lot to the hard work that goes towards developing and maintaining the libraries mailbagit uses to parse email formats and make bags. We’d like to thank these awesome projects, without which mailbagit wouldn’t be possible:

We’d also like to thank the RATOM project, whose documentation was super helpful in guiding us through some roadblocks.

Hosted by the M.E. Grenander Department of Special Collections & Archives, University at Albany, SUNY