Converting Mbox mailboxes to Maildir format

Overview

Maildir is a structure for directories of incoming mail messages. It solves the reliability problems that plague mbox files. A machine may crash while it is delivering a message. For mbox files this means that the message will be silently truncated. Even worse: if the message is truncated in the middle of a line, it will be silently joined to the next message. The mail transport agent will try again later to deliver the message, but it is unacceptable that a corrupted message should show up at all. In maildir, every message is guaranteed complete upon delivery.

A machine may have two programs simultaneously delivering mail to the same user. The mbox format requires the programs to update a single central file. If the programs do not use some locking mechanism, the central file will be corrupted. There are several mbox locking mechanisms, none of which work portably and reliably. In contrast, in maildir, no locks are ever necessary. Different delivery processes never touch the same file.

A user may try to delete messages from his mailbox at the same moment that the machine delivers a new message. For the mbox format, the user's mail-reading program must know what locking mechanism the mail-delivery programs use. In contrast, in maildir, any delivered message can be safely updated or deleted by a mail-reading program.

The Maildir Structure

A directory in maildir format has three subdirectories, all on the same filesystem: tmp, new, and cur.
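As a minimal sketch of this layout, the following Python function creates the three subdirectories under one parent, which keeps them on the same filesystem in the usual case. The function name make_maildir and the 0o700 mode are assumptions made for illustration, not part of the maildir description.

```python
import os


def make_maildir(path):
    """Create `path` with the three maildir subdirectories and return it."""
    for sub in ("tmp", "new", "cur"):
        # Mode 0o700 is an assumption: mail is normally private to its owner.
        os.makedirs(os.path.join(path, sub), mode=0o700, exist_ok=True)
    return path
```

Calling make_maildir on a directory that already is a maildir leaves it unchanged.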

Each file in new is a newly delivered mail message. The modification time of the file is the delivery date of the message. The message is delivered without an extra UUCP-style From_ line, without any >From quoting, and without an extra blank line at the end. The message is normally in RFC 822 format, starting with a Return-Path line and a Delivered-To line, but it could contain arbitrary binary data. It might not even end with a newline.

Files in cur are just like files in new. The big difference is that files in cur are no longer new mail: they have been seen by the user's mail-reading program.

How a Message is delivered

The tmp directory is used to ensure reliable delivery, as described below. A program delivers a mail message in six steps.

It chdir()'s to the maildir directory.

It stat()'s the name tmp/time.pid.host, where time is the number of seconds since the beginning of 1970 GMT, pid is the program's process ID, and host is the host name.

If stat() returned anything other than ENOENT, the program sleeps for two seconds, updates time, and tries the stat() again, a limited number of times.

The program creates tmp/time.pid.host.

The program NFS-writes the message to the file.

The program link()s the file to new/time.pid.host. At that instant the message has been successfully delivered.

The delivery program is required to start a 24-hour timer before creating tmp/time.pid.host, and to abort the delivery if the timer expires. Upon error, timeout, or normal completion, the delivery program may attempt to unlink() tmp/time.pid.host.

NFS-writing means (1) as usual, checking the number of bytes returned from each write() call; (2) calling fsync() and checking its return value; (3) calling close() and checking its return value. (Standard NFS implementations handle fsync() incorrectly but make up for it by abusing close().)
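The delivery steps and the NFS-writing rules above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the name deliver and its parameters are hypothetical, paths are joined instead of calling chdir(), the O_EXCL flag is an added safeguard, and the 24-hour timer is left out for brevity.

```python
import os
import socket
import time


def deliver(maildir, data, retries=5, pause=2.0):
    """Deliver the bytes in `data` into `maildir`; return the path in new/."""
    host = socket.gethostname()
    pid = os.getpid()
    for _ in range(retries):
        unique = "%d.%d.%s" % (int(time.time()), pid, host)
        tmp_path = os.path.join(maildir, "tmp", unique)
        try:
            os.stat(tmp_path)
        except FileNotFoundError:
            break  # ENOENT: the name is free
        except OSError:
            pass  # any other result counts as "not free"
        time.sleep(pause)  # wait, then retry with an updated time
    else:
        raise RuntimeError("no unused name found in tmp/")

    new_path = os.path.join(maildir, "new", unique)
    # O_EXCL is an extra safeguard added here; the text only says "creates".
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        try:
            view = memoryview(data)
            while view:
                written = os.write(fd, view)  # use the count from every write()
                view = view[written:]
            os.fsync(fd)  # raises OSError on failure
        finally:
            os.close(fd)  # raises OSError on failure
        os.link(tmp_path, new_path)  # the message is delivered at this instant
    finally:
        try:
            os.unlink(tmp_path)  # clean up tmp/ on error or normal completion
        except FileNotFoundError:
            pass
    return new_path
```

In Python, a failed fsync() or close() surfaces as an OSError, so "checking the return value" becomes letting the exception abort the delivery before link() is reached.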

How a Message is read

A mail reader operates as follows: It looks through the new directory for new messages. Say there is a new message, new/unique. The reader may freely display the contents of new/unique, delete new/unique, or rename new/unique as cur/unique.

The reader is also expected to look through the tmp directory and to clean up any old files found there. A file in tmp may be safely removed if it has not been accessed in 36 hours.

It is a good idea for readers to skip all filenames in new and cur starting with a dot. Other than this, readers should not attempt to parse filenames.
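A reader following these rules might look like the sketch below. The function names collect_new and clean_tmp are hypothetical, and the sketch always renames messages into cur, which is only one of the actions the text permits.

```python
import os
import time


def collect_new(maildir):
    """Move every message from new/ to cur/ and return the new paths."""
    moved = []
    new_dir = os.path.join(maildir, "new")
    for name in sorted(os.listdir(new_dir)):
        if name.startswith("."):
            continue  # skip dot files; do not parse names any further
        dst = os.path.join(maildir, "cur", name)
        os.rename(os.path.join(new_dir, name), dst)
        moved.append(dst)
    return moved


def clean_tmp(maildir, max_age=36 * 3600, now=None):
    """Remove files in tmp/ not accessed for `max_age` seconds."""
    now = time.time() if now is None else now
    removed = []
    tmp_dir = os.path.join(maildir, "tmp")
    for name in sorted(os.listdir(tmp_dir)):
        path = os.path.join(tmp_dir, name)
        try:
            if now - os.stat(path).st_atime > max_age:
                os.unlink(path)
                removed.append(name)
        except FileNotFoundError:
            pass  # another process removed the file first
    return removed
```

The cleanup uses the access time (st_atime) because the rule is stated in terms of when a file was last accessed, not when it was last modified.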

Environment Variables

Mail readers supporting maildir use the MAILDIR environment variable as the name of the user's primary mail directory.

Converting Mbox mailboxes to Maildir format

mb2md.pl (mb2md-3.10) converts not only mailbox files into a Maildir but also the /var/spool/mail/$USER mailspool file. It is smart enough not to transfer a dummy message such as the one UW IMAPD puts at the start of Mbox mailboxes, and you can add your own search terms to the script to make it ignore other forms of dummy first message.

Run this as the user of the mailboxes, not as root.

mb2md -h
mb2md -m [-d destdir]
mb2md -s sourcedir [-R|-f somefolder] [-d destdir] [-r strip_extension]

-m            If this is used, the source will be the single mailbox
              at /var/spool/mail/zahn for user zahn, and the
              destination mailbox will be the "destdir" mailbox
              itself.

-s sourcedir  Directory, relative to the user's home directory,
              where the "somefolder" directories are located.
              If the directory starts with a "/", it is
              taken as an absolute path, e.g. /mnt/oldmail/user

              or

              A single mbox file, which will be converted to
              the destdir.

-R            If defined, do not skip directories found in a mailbox
              directory, but run recursively into each of them,
              creating all wanted folders in Maildir.
              Incompatible with '-f'.

-f somefolder Directories, relative to "sourcedir", where the Mbox files
              are. All mailboxes in the "sourcedir"
              directory will be converted and placed in the
              "destdir" directory. (Typically the Inbox directory,
              which in this instance is also functioning as a
              folder for other mailboxes.)

              The "somefolder" directory name will be encoded
              into the new mailboxes' names.
              See the examples below.

              This does not save a UW IMAP dummy message found
              at the start of the Mbox file. Small changes
              in the code could adapt it to look for
              other distinctive patterns of dummy messages too.

              Don't let the source directory you give as "somefolder"
              contain any "."s in its name, unless you want to
              create subfolders from the IMAP user's point of
              view. See the example below.

              Incompatible with '-R'.

-d destdir    Directory where the Maildir format directories will
              be created. If not given, the destination will
              be ~/Maildir.
              Typically, this is what the IMAP server sees as the
              Inbox and the folder for all user mailboxes.
              If this begins with a '/', the path is considered to be
              absolute; otherwise it is relative to the user's
              home directory.

-r strip_ext  If defined, this extension will be stripped from
              the original mailbox file name before creating
              the corresponding maildir. The extension must be
              given without the leading dot (".").

We have a bunch of directories of Mbox mailboxes located at: /home/zahn/oldmail/

/home/zahn/oldmail/fffff
/home/zahn/oldmail/ggggg
/home/zahn/oldmail/xxx/aaaa
/home/zahn/oldmail/xxx/bbbb
/home/zahn/oldmail/xxx/cccc
/home/zahn/oldmail/xxx/dddd
/home/zahn/oldmail/yyyy/huey
/home/zahn/oldmail/yyyy/duey
/home/zahn/oldmail/yyyy/louie

With the UW IMAP server, fffff and ggggg would have appeared in the root of this mail server, along with the Inbox. aaaa, bbbb, etc. would have appeared in a folder called xxx under that root; xxx was just a folder, not a mailbox for storing messages.

We also have the mailspool Inbox at: /var/spool/mail/zahn

To convert these, as user zahn, we give the first command:

mb2md -m

Converting /var/spool/mail/zahn to maildir: /home/zahn/Maildir
Source Mbox is /var/spool/mail/zahn
Target Maildir is /home/zahn/Maildir
20 messages.

The main Maildir directory will be created if it does not exist. It has the following subdirectories:

/home/zahn/Maildir/tmp/
/home/zahn/Maildir/new/
/home/zahn/Maildir/cur/

Then the /var/spool/mail/zahn file is read, split into individual files, and written into /home/zahn/Maildir/new/.
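To illustrate what "read, split, and written into new/" amounts to, here is a sketch that uses Python's standard mailbox module. It is not how mb2md (a Perl script) is implemented; the function name convert_mbox is hypothetical, and unlike mb2md this sketch does not detect or skip the UW IMAP dummy first message.

```python
import mailbox


def convert_mbox(mbox_path, maildir_path):
    """Copy every message in `mbox_path` into `maildir_path`; return the count."""
    src = mailbox.mbox(mbox_path, create=False)
    # create=True makes the directory and its tmp/new/cur subdirectories
    # when they do not exist yet.
    dest = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in src:
            dest.add(message)  # written via tmp/, then placed in new/
            count += 1
    finally:
        dest.close()
        src.close()
    return count
```

A call such as convert_mbox("/var/spool/mail/zahn", "/home/zahn/Maildir") would mirror the first command above, using the paths from this example.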

Now we give the second command:

cd /home/zahn
mb2md -s oldmail -R

convertit(): Converting fffff in /home/zahn/oldmail/ to /home/zahn/Maildir/.fffff
destination = .fffff
Source Mbox is /home/zahn/oldmail//fffff
Target Maildir is /home/zahn/Maildir/.fffff
Dummy mail system first message detected and not saved.
.....
.....

This recursively reads all Mbox mailboxes and creates:

/home/zahn/Maildir/.fffff/
/home/zahn/Maildir/.ggggg/
/home/zahn/Maildir/.xxx/
/home/zahn/Maildir/.xxx.aaaa/
/home/zahn/Maildir/.xxx.bbbb/
/home/zahn/Maildir/.xxx.cccc/
/home/zahn/Maildir/.xxx.dddd/
/home/zahn/Maildir/.yyyy/
/home/zahn/Maildir/.yyyy.huey/
/home/zahn/Maildir/.yyyy.duey/
/home/zahn/Maildir/.yyyy.louie/
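The naming rule visible in this list (a leading dot, with each path component joined by a dot) can be sketched as follows. The function name folder_dir is hypothetical, and the rule is inferred from the listing here rather than taken from the mb2md source.

```python
def folder_dir(relative_path):
    """Map a path relative to the source directory to a Maildir folder name."""
    parts = [part for part in relative_path.split("/") if part]
    return "." + ".".join(parts)


print(folder_dir("fffff"))       # .fffff
print(folder_dir("xxx/aaaa"))    # .xxx.aaaa
print(folder_dir("yyyy/louie"))  # .yyyy.louie
```

Because the dot is the hierarchy separator, a dot inside a source name would show up as an extra folder level on the IMAP side.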

The result, from the IMAP client's point of view is:

Inbox -----------------
       |
       | fffff -----------
       | ggggg -----------
       |
       - xxx -------------
       |   | aaaa --------
       |   | bbbb --------
       |   | cccc --------
       |   | dddd --------
       |
       - yyyy ------------
            | huey -------
            | duey -------
            | louie ------

Note that ~/Maildir/.xxx/ and ~/Maildir/.yyyy/ appear as folders to the IMAP client. Without '-R', the conversion does not generate any Maildir directories of these names; they are simply elements of the names of other Maildir directories. If you used '-R', as above, they will be able to act as normal folders, containing messages AND folders.

If you want to convert mailboxes that came, for example, from a Windows box, then you might want to strip the extension from the mailbox name so that it won't create a subfolder in your mail client's view.

Example

You have several mailboxes named Trash.mbx, Sent.mbx, and Drafts.mbx. If you don't strip the extension "mbx", you will get the following hierarchy:

Inbox
     |
      - Trash
     |       | mbx
     |
      - Sent
     |       | mbx
     |
      - Drafts
             | mbx

This is more than ugly. To avoid it, just use:

mb2md -s oldmail -r mbx

Note: don't specify the dot! It will be stripped off automagically along with the extension.
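The effect of stripping the extension can be shown with a small sketch. Both helper names, strip_extension and client_view, are hypothetical, and the code only models the naming behaviour described here rather than reproducing mb2md's implementation.

```python
def strip_extension(name, ext):
    """Remove a trailing "." + ext from `name`; `ext` has no leading dot."""
    suffix = "." + ext
    if ext and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


def client_view(folder_dir):
    """Split a Maildir folder name into the levels an IMAP client shows."""
    return folder_dir.lstrip(".").split(".")


print(client_view(".Trash.mbx"))                            # ['Trash', 'mbx']
print(client_view("." + strip_extension("Trash.mbx", "mbx")))  # ['Trash']
```

Names that do not end in the given extension pass through strip_extension unchanged.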

Version: mb2md-3.10 (requires TimeDate!)