dctrud's Random Road

Occasional unimportant nonsense.

2015-10-21 - Fixing Email Archives

GMail Import Dates & MIME Multipart Problems

I have a collection of email that has followed me around since the mid-2000s, moving between web hosts, GMail, Outlook.com and iCloud as they've added useful features, as I've moved from Android to iOS, and so on. I've come across two problems as I've moved mail around over the years:

GMail Import Mangled Dates

When you move from another IMAP account into GMail, using the GMail online account import tool, everything looks great. Unfortunately, when the mail is later copied somewhere else (e.g. iCloud), the received dates may show up incorrectly.

To fix:

1) Download your email with Google Takeout, which gives you an mbox file containing all mail. Unfortunately this loses any folder structure, but never mind.

2) Examine the mbox file. The messages imported by GMail will have additional headers inserted at the top of each mail, e.g.

   From 1245753982836402098@xxx Sat Aug 25 12:06:17 +0000 2007
   Delivered-To: xxxxx@gmail.com
   Received: by with SMTP id l184csp149097iof;
	Mon, 21 Sep 2015 16:49:31 -0700 (PDT)
   X-Received: by with SMTP id t32mr30219550ioe.173.1442879371734;
	Mon, 21 Sep 2015 16:49:31 -0700 (PDT)
   Received: from 303668833448.apps.googleusercontent.com
	named unknown
	by gmailapi.google.com
	Mon, 21 Sep 2015 19:49:31 -0400
   Received: from web38814.mail.mud.yahoo.com (
	by spam2.34sp.com with SMTP; 25 Aug 2007 13:13:01 +0100

The original Received header (the last one in the example) shows the message was received on /25 Aug 2007/. Unfortunately, clients other than GMail will display the /21 Sep 2015/ date from the Received headers that GMail added during the import.

To fix this, remove the GMail-added headers from each message in the mbox file. This can be accomplished with some creative regex, e.g. in Sublime Text. The headers vary a little between the imports I've seen. The fixed mbox file can then be imported into a mail client.
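For anyone who would rather script this than use an editor, here is a minimal Python sketch of the same idea. It is not the regex I used, and it rests on assumptions: that the import-added headers are /Delivered-To/, /X-Received/, and any /Received/ header that either has no "from" clause or names one of the Google hosts seen in the example, and that they always sit in one run at the top of the message. The function names are made up for illustration. Since the headers vary between imports, check the pattern against your own mbox first.

```python
import re

# Assumption: a Received header was added by the import if it has no "from"
# clause or names one of the Google hosts seen in the example above.
GMAIL_RECEIVED = re.compile(
    rb"^\s*by\s|googleusercontent\.com|gmailapi\.google\.com", re.IGNORECASE
)


def split_headers(header_block: bytes) -> list[bytes]:
    """Group physical lines into logical headers (folded lines start with whitespace)."""
    headers: list[bytes] = []
    for line in header_block.splitlines(keepends=True):
        if line[:1] in (b" ", b"\t") and headers:
            headers[-1] += line  # continuation of the previous header
        else:
            headers.append(line)
    return headers


def is_gmail_import_header(header: bytes) -> bool:
    """Heuristic test for a header added during the import (see assumption above)."""
    name, _, value = header.partition(b":")
    name = name.strip().lower()
    if name in (b"delivered-to", b"x-received"):
        return True
    return name == b"received" and GMAIL_RECEIVED.search(value) is not None


def strip_gmail_import_headers(raw: bytes) -> bytes:
    """Drop the run of import-added headers at the top of one raw message."""
    # The header block ends at the first blank line.
    end = re.search(rb"\r?\n(?=\r?\n)", raw)
    head, rest = (raw[: end.end()], raw[end.end():]) if end else (raw, b"")
    headers = split_headers(head)

    # Keep the mbox "From " separator line if the message starts with one.
    kept = [headers.pop(0)] if headers and headers[0].startswith(b"From ") else []

    # Remove headers from the top until the first one that is not import-added.
    while headers and is_gmail_import_header(headers[0]):
        headers.pop(0)

    return b"".join(kept + headers) + rest
```

The function works on the raw bytes of a single message and leaves everything below the stripped headers byte-for-byte unchanged. To run it over a whole Takeout file, the standard library's mailbox.mbox class can supply each message's raw bytes via get_bytes().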

Fixing MIME errors

After using a number of different clients to move mail around over several years (Thunderbird, Outlook, OS X Mail, Windows Live Mail), I've often seen MIME multipart emails with HTML and attachments become broken. The message is moved or copied but no longer displays properly: I see the raw source of all the MIME parts, cannot view attachments, etc.

The problem turns out to be that somehow the clients or servers inserted a spurious /Content-Type/ header above the original /Content-Type/ header for the MIME mail. The added header prevents the message from being read correctly.

An example:

    From: xxxxxxx <xxxxxx@gmail.com>
    Sender: <xxxx@gmail.com>
    To: "xxxxxx"
    References: <df696fdc-b801-4be8-bcb0-f3d59169dcba@SwitchService>
    Date: Wed, 11 Sep 2013 16:19:50 -0500
    Message-ID: <06460AE9-3D64-4422-88A2-C05D869A22BC@xxxxxxxxx>
    MIME-Version: 1.0
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: 7bit
    X-Mailer: Microsoft Outlook 15.0
    Content-Language: en-us
    X-Google-Sender-Auth: 1IItZObPtrZmxbo-5dealV_naxQ
    Content-type: multipart/alternative;

    > This message is in MIME format. Since your mail reader does not understand
    this format, some or all of this message may not be legible.

    Content-type: text/plain;
    Content-transfer-encoding: 7bit

In an email client I see all of the raw source, starting from the original MIME Content-Type header...

    Content-type: multipart/alternative;

... which is being overridden by the header further up, which has crept in at some point:

    MIME-Version: 1.0
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: 7bit

Once this block is removed, the message is correctly recognized as MIME multipart and displays properly. This can also be fixed in an mbox file with some find and replace.
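The same fix can be sketched in Python instead of find and replace. This is an illustration under assumptions, not the exact procedure I followed: it treats any non-multipart /Content-Type/ header that appears above a multipart /Content-Type/ in the same header block as spurious, and removes it together with a /Content-Transfer-Encoding/ header directly beneath it. The function name is hypothetical. Unlike the manual fix, it leaves /MIME-Version/ in place, on the assumption that the message may not carry a second one.

```python
import re


def drop_spurious_content_type(raw: bytes) -> bytes:
    """Remove non-multipart Content-Type headers that sit above a multipart one."""
    # The header block ends at the first blank line.
    end = re.search(rb"\r?\n(?=\r?\n)", raw)
    head, rest = (raw[: end.end()], raw[end.end():]) if end else (raw, b"")

    # Group physical lines into logical headers (folded lines start with whitespace).
    headers: list[bytes] = []
    for line in head.splitlines(keepends=True):
        if line[:1] in (b" ", b"\t") and headers:
            headers[-1] += line
        else:
            headers.append(line)

    def name(header: bytes) -> bytes:
        return header.partition(b":")[0].strip().lower()

    def value(header: bytes) -> bytes:
        return header.partition(b":")[2].strip().lower()

    # Index of the first multipart Content-Type header, if there is one.
    multipart_at = next(
        (i for i, h in enumerate(headers)
         if name(h) == b"content-type" and value(h).startswith(b"multipart/")),
        None,
    )
    if multipart_at is None:
        return raw  # not a multipart message, nothing to repair

    # Assumption: any Content-Type above the multipart one is spurious, as is
    # a Content-Transfer-Encoding header directly following it.
    drop = set()
    for i in range(multipart_at):
        if name(headers[i]) == b"content-type":
            drop.add(i)
            if name(headers[i + 1]) == b"content-transfer-encoding":
                drop.add(i + 1)
    if not drop:
        return raw

    kept = [h for i, h in enumerate(headers) if i not in drop]
    return b"".join(kept) + rest
```

A quick way to check the result is to parse the message before and after with Python's email.message_from_bytes() and compare is_multipart(): the parser uses the first /Content-Type/ it finds, so the broken message reports False and the repaired one should report True.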
