Thursday, 1 March 2012


Contact! Populating the Outlook Address Book

A standalone tool for scanning Outlook emails and extracting names and addresses into the Address Book.


I recently formatted the hard disk of an ageing PC with an ancient Win-XP install, then installed a fresh copy of Windows. While this put a much-needed spring back in the machine's step, it was followed by the tedious process of reinstalling all third-party applications and copying back user data.

Among these was Microsoft's venerable Office 2003 software suite, notable as the last in the series to feature the traditional tool and menu bars (rather than the Ribbon UI of Office 2007 and later). The user had never populated the Outlook address book, instead (if unknowingly) relying on the NickName list, a file that Outlook automatically updates with names and addresses from the To: and CC/BCC: fields of outgoing messages.
This file, labelled <Outlook Profile Name>.NK2, resides in the %appdata%\Microsoft\Outlook folder – on the XP machine this resolved to:

c:\documents and settings\UserName\Application Data\Microsoft\Outlook\User Name.NK2

The size of the file copied from the previous install was about 400KB, and the user of the machine made regular use of the AutoCompletion feature that the NickName list facilitates. Unfortunately, it didn't survive the transfer. It's still not clear what went wrong, but when the NickName file was copied to the new Windows install, Outlook ignored it (or wouldn't work with it). Stories abound about NK2 file corruption (http://bit.ly/yMdivj), so perhaps this is to blame, although it went unnoticed during the previous incarnation of Windows/Outlook on the same PC.

So the user was faced with the onerous prospect of starting again with an empty NickName list, and slowly re-growing it over time.


Delphi to the rescue

I've been peripherally aware of the Office controls under the “Servers” tab in Delphi for a while, but I've never had cause to investigate further. This seemed like a good opportunity to see what was possible.

My intention was to bypass the NickName file completely. It isn't something you can manually update, there are no user tools for manipulating it, and it doesn't appear to have a very good reputation for reliability. Instead I wanted the user to become familiar with the Outlook Address Book.

This is a slightly confusing area because there are some related terms that are often used interchangeably, but mean different things. For example, some people talk about the “Address Book”, while others call the same thing the “Contact List”. Microsoft provides this definition (http://bit.ly/x0nRa4):

Think of the Address Book as a container of individual address books or address lists, such as the following:
  • Contacts - The address book where you keep your personal addresses. In the Address Book dialog box, Contacts appear under Outlook Address Book. Contacts are stored in a .pst data file and support Unicode.
  • Personal Address Book - The predecessor to Contacts in earlier versions of Outlook that uses older technology and doesn't support Unicode. The Personal Address Book is stored in a .pab data file.
  • LDAP Internet directories - Think of LDAP as a White Pages for the Internet.
  • Global Address List (GAL) - If your organization uses Microsoft Exchange Server e-mail accounts, the GAL displays names of people in your organization.
  • Third-party address books.

So the “Address Book” is an over-arching container for other lists that contain details of people you know about – and one of these is the Contact list represented as a folder under the “Folder List” pane in Outlook.

Although importing the old NickName list failed, copying all Personal Folders (.pst) files had worked without problems. So we had a store of all sent and received emails (the raw data for entries in our new Address Book, if you like) but no simple way of getting this information into the Address Book short of copying and pasting it all in, one entry at a time.

My intention then, using Delphi and COM, was to create a tool which would iterate over every email in the Personal Folders store extracting names and email addresses as it went, then import each of these values into the Contacts list.


The tools

Because the need was pressing, I had very little time to decide how to go about creating the tool. There are some useful tutorials and code snippets scattered about the web, but nothing that dealt with my specific requirements. In the event I went for a quick 'n' dirty solution using the Delphi MAPI header translations (MAPI definition: http://bit.ly/gBAAaY) from dimastr.com, home of Outlook Redemption (also written in Delphi): http://bit.ly/y8wbkw (see the “Extended MAPI headers for Borland Delphi” item).

As a result the tool eschews the Delphi-provided MS-Office Type Library files (Outlook2000.pas etc.), and instead uses OLEVariants and late-binding. At the time this was for no other reason than that in creating a few simple proof-of-concept apps, I found the late-binding technique quicker to get to grips with (perversely). But thinking further ahead, it would hopefully make the tool more portable to other Outlook versions.

There was another consideration that swayed the balance somewhat, which was that with the dimastr headers and late-binding, running the tool doesn't trigger Outlook's Security Alert (via the Object Model Guard) dialog (“Another process is trying to access e-mail addresses you have stored in Outlook”). Compiling against the Delphi Outlook Type Library files always did. I haven't looked at which classes or combinations of classes might be responsible for tripping the alert, so it may be that this is just as easy to mitigate in the latter instance.

Early-binding vs late-binding

http://bit.ly/zB0Z8P
http://bit.ly/A7G0O2


Exploring the UI

It's all very plain and utilitarian, but it does the job. The main form is divided into three vertically stacked components: a Listview at the top (labelled “1” in the image), which is used to display all names and email addresses found (plus an index, indicating the order in which they were discovered), a RichEdit in the middle (labelled “2” in the image) for giving feedback about any errors encountered during the scan, and at the bottom a panel with buttons (labelled “3”, “4”, and “5”) for controlling the application.

There's also a Statusbar at the foot of the main form, which gives a running count of the total number of names and addresses found, and the name of the folder currently being explored (when the application is in its “running” state).

Tool main form at the end of a scan (with email addresses blanked out)

The Listview can be sorted by Index, Sender name, or Sender email address (sorting is by Sender name in the screen shot).

Where a duplicate contact is found, it is ignored and reported in the RichEdit. A “duplicate” is any contact with the same Sender name and Sender email address as a previously seen contact (the comparison ignores case). Note that this means a contact with a different Sender name but a previously seen email address, or vice versa, is not considered a duplicate.
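As a rough illustration of that rule, here is a minimal console sketch of the same idea used in the code listing further down: a sorted TStringList keyed on the lower-cased address and name. The names ContactKey and RememberContact are made up for this example and are not part of the tool.

```pascal
program DupKeySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: one key per name/address pair, ignoring case
function ContactKey(const Name, Address: string): string;
begin
  Result := LowerCase(Address) + '=' + LowerCase(Name);
end;

// Hypothetical helper: returns True if the pair has not been seen before
// (and records it), False if it is a duplicate
function RememberContact(Seen: TStringList; const Name, Address: string): Boolean;
var
  Idx: Integer;
begin
  Result := not Seen.Find(ContactKey(Name, Address), Idx);
  if Result then
    Seen.Add(ContactKey(Name, Address));
end;

var
  Seen: TStringList;
begin
  Seen := TStringList.Create;
  try
    Seen.Sorted := True;            // Find requires a sorted list
    Seen.Duplicates := dupIgnore;
    WriteLn(RememberContact(Seen, 'Jane Doe', 'jane@example.com'));  // first sighting
    WriteLn(RememberContact(Seen, 'JANE DOE', 'Jane@Example.com'));  // same pair, different case
    WriteLn(RememberContact(Seen, 'J. Doe', 'jane@example.com'));    // same address, different name
    WriteLn(Seen.Count);
  finally
    Seen.Free;
  end;
end.
```

The first and third calls report a new contact, the second reports a duplicate, and the list ends up holding two keys.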

Where an invalid email address is found, it is also reported to the RichEdit and not inserted into the Address Book. Validation uses a web-procured algorithm that isn't quite RFC 5322 compliant (http://tools.ietf.org/html/rfc5322) but works in most instances. For example, it rejects addresses with a '+' symbol in the “local” part (the segment before the '@' sign), which is incorrect behaviour, but validation can be a tricky problem: http://bit.ly/16XkKb.
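For comparison, here is a small sketch of a checker that does accept '+' in the local part. It is not the algorithm the tool uses, and it is still far from full RFC 5322: quoted local parts and domain literals are not handled, hyphen placement in domain labels is not checked, and requiring a dot in the domain is my own assumption. The function names are hypothetical.

```pascal
program EmailCheckSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  ALNUM = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
  // Extra characters RFC 5322 permits in an unquoted local part
  LOCAL_EXTRA = '!#$%&''*+-/=?^_`{|}~';

// True if S is one or more non-empty runs of Allowed characters joined by
// single dots (no leading, trailing, or doubled dots)
function IsDotAtom(const S, Allowed: string): Boolean;
var
  i, PieceLen: Integer;
begin
  Result := False;
  PieceLen := 0;
  for i := 1 to Length(S) do
    if S[i] = '.' then
    begin
      if PieceLen = 0 then
        Exit;                 // leading dot or two dots in a row
      PieceLen := 0;
    end
    else if Pos(S[i], Allowed) > 0 then
      Inc(PieceLen)
    else
      Exit;                   // character not permitted
  Result := PieceLen > 0;     // rejects empty input and a trailing dot
end;

function IsPlausibleEmail(const Address: string): Boolean;
var
  AtPos: Integer;
  LocalPart, Domain: string;
begin
  Result := False;
  AtPos := Pos('@', Address);
  if AtPos = 0 then
    Exit;
  LocalPart := Copy(Address, 1, AtPos - 1);
  Domain := Copy(Address, AtPos + 1, MaxInt);
  if Pos('@', Domain) > 0 then
    Exit;                     // more than one '@'
  if Pos('.', Domain) = 0 then
    Exit;                     // assumption: require a dotted domain
  Result := IsDotAtom(LocalPart, ALNUM + LOCAL_EXTRA) and
            IsDotAtom(Domain, ALNUM + '-');
end;

begin
  WriteLn(IsPlausibleEmail('jane+lists@example.com'));
  WriteLn(IsPlausibleEmail('jane..doe@example.com'));
  WriteLn(IsPlausibleEmail('no-at-sign.example.com'));
end.
```

Only the first address passes; the second fails on the doubled dot and the third has no '@'.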

There are two buttons in the panel to the lower-left of the main form labelled “3” and “4” in the first screen shot.

Button “3” has the caption “Excluded Contacts”. Clicking this button displays a modal child form with a Memo component and two buttons of its own - “OK” and “Cancel”.
By entering names or email addresses into the Memo, one name or address per line, it is possible to build up a “block list” of contacts that will not be added to the Address Book if found. Wild card symbols are supported (“?” as a place-holder for an individual character, or “*” as a place-holder for multiple characters), so a single entry can be used to cover multiple contact variations, e.g., “John D*” would block any contacts with names including “John Doe”, “John Dean”, “John Dillinger” etc.

Excluded contacts form with two entries

Here we see two added “block list” entries, “*@hotmail.com” and “Amazon*”, which would prevent any contacts with a hotmail.com email address, or Sender name beginning “Amazon” from being added to the Address Book.
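The tool's own matching routine isn't shown in this post, so the following is only a sketch of how “?” and “*” block list entries could be tested against a name or address. WildMatch and IsExcluded are hypothetical names, and ignoring case is an assumption on my part.

```pascal
program BlockListSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// True if Text (from position TIdx) matches Pattern (from position PIdx).
// '?' stands for exactly one character, '*' for zero or more.
function WildMatch(const Pattern, Text: string; PIdx, TIdx: Integer): Boolean;
begin
  Result := False;
  while PIdx <= Length(Pattern) do
  begin
    if Pattern[PIdx] = '*' then
    begin
      // Let '*' absorb 0, 1, 2, ... characters until the rest of the pattern fits
      while TIdx <= Length(Text) + 1 do
      begin
        if WildMatch(Pattern, Text, PIdx + 1, TIdx) then
        begin
          Result := True;
          Exit;
        end;
        Inc(TIdx);
      end;
      Exit;  // no split worked
    end;
    if TIdx > Length(Text) then
      Exit;  // pattern expects more text
    if (Pattern[PIdx] <> '?') and (Pattern[PIdx] <> Text[TIdx]) then
      Exit;  // literal mismatch
    Inc(PIdx);
    Inc(TIdx);
  end;
  Result := TIdx > Length(Text);  // both pattern and text used up
end;

// True if Value matches any entry in BlockList (case is ignored by assumption)
function IsExcluded(const Value: string; BlockList: TStrings): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 0 to BlockList.Count - 1 do
    if WildMatch(LowerCase(BlockList[i]), LowerCase(Value), 1, 1) then
    begin
      Result := True;
      Exit;
    end;
end;

var
  BlockList: TStringList;
begin
  BlockList := TStringList.Create;
  try
    BlockList.Add('*@hotmail.com');
    BlockList.Add('Amazon*');
    WriteLn(IsExcluded('someone@hotmail.com', BlockList));
    WriteLn(IsExcluded('Amazon.co.uk', BlockList));
    WriteLn(IsExcluded('jane@example.com', BlockList));
  finally
    BlockList.Free;
  end;
end.
```

With the two entries from the screen shot, the first two values are excluded and the third is not.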

Button “4” has the caption “Excluded Folders”. Clicking this button will also show a modal child form, and like the “Excluded Contacts” form, this contains a Memo and “OK” and “Cancel” buttons. This form implements another sort of “block list”, this time for folders in the Outlook “Folder List” pane. By adding entries to the Memo – again, one entry per line – folders enumerated during the scan with a name matching a “block list” entry will not be searched for emails.
As above, wild card symbols can be used, so adding a folder name such as:

Personal Folders.I?box

would block email searching in folders named “Inbox” and “IMBox”.

Excluded folders form with 6 blocked items

Here we see six block list entries, preventing email searching in any of the folders “Deleted Items”, “Calendar”, “Contacts”, “Journal”, “Notes”, and “Tasks”. Folders must be “fully qualified”, that is, written with all ancestor folders as a preamble to the folder name, e.g., “Personal Folders.Deleted Items”.
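To make the “fully qualified” form concrete, here is a sketch that builds such a path by walking up a chain of parent folders and joining the names with dots. It uses a plain record (TFolderNode, a made-up type) in place of real Outlook folder objects, and it does an exact, case-insensitive lookup rather than wild card matching.

```pascal
program FolderPathSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for an Outlook folder: a name and a parent link
  PFolderNode = ^TFolderNode;
  TFolderNode = record
    Name: string;
    Parent: PFolderNode;
  end;

// Walk up the parent chain, prepending each ancestor's name with a '.'
function QualifiedPath(Node: PFolderNode): string;
begin
  Result := Node^.Name;
  Node := Node^.Parent;
  while Node <> nil do
  begin
    Result := Node^.Name + '.' + Result;
    Node := Node^.Parent;
  end;
end;

var
  Root, Inbox, Projects: TFolderNode;
  Blocked: TStringList;
begin
  Root.Name := 'Personal Folders';
  Root.Parent := nil;
  Inbox.Name := 'Inbox';
  Inbox.Parent := @Root;
  Projects.Name := 'Projects';
  Projects.Parent := @Inbox;

  Blocked := TStringList.Create;
  try
    Blocked.Add('Personal Folders.Deleted Items');
    Blocked.Add('Personal Folders.Inbox.Projects');

    WriteLn(QualifiedPath(@Projects));
    // TStringList.IndexOf ignores case unless CaseSensitive is set
    WriteLn(Blocked.IndexOf(QualifiedPath(@Projects)) >= 0);
    WriteLn(Blocked.IndexOf(QualifiedPath(@Inbox)) >= 0);
  finally
    Blocked.Free;
  end;
end.
```

Here the nested “Projects” folder is blocked because its full path is listed, while “Inbox” itself is still scanned.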

The block lists can be used in combination, to exclude entire folders and/or names and email addresses.

Button “5”, with the caption “Populate Outlook Contacts”, begins the scanning process. Below it, two buttons with the captions “Pause” and “Stop” are disabled when no scan is in progress, but become enabled once a scan is underway. The Pause button temporarily suspends a running scan (until it is pressed again); the Stop button halts a running scan.
Contacts are added to the Address Book as they are discovered, so stopping a running scan will prevent additional contacts from being added, but any found prior to that event will already have been inserted and saved.

Code Listing
Event handler for the "Populate Outlook Contacts" button click event

procedure TfrmMain.btnAddAddressesClick(Sender: TObject);
var
  OutlookApp, OLFolderList, OLFolder: OleVariant;
  OLContactList: Variant;
  slDupList: TStringList;
  TotalAddresses: Integer;
begin
  redtStatusMsgs.Clear;
  TotalAddresses := 0;
  SetControlsEnabled(Self, True, []);
  SetControlsEnabled(Self, False, ['btnPause', 'btnStop']);
  slDupList := nil;
  OutlookApp := Unassigned;
  MAPIInitialize(nil);
  try
    try
      slDupList := TStringList.Create;
      slDupList.Sorted := True;
      slDupList.Duplicates := dupIgnore;

      OutlookApp := CreateOleObject('Outlook.Application');
      if VarIsEmpty(OutlookApp) then begin
        StatusMessage('Creating Outlook object: aborting.', tstError);
        Exit;
      end;

      // Get the Items collection from the Contacts folder
      OLContactList := OutlookApp.GetNameSpace('MAPI').Folders('Personal Folders').Folders('Contacts').Items;
      if VarIsEmpty(OLContactList) then begin
        StatusMessage('Getting Contacts object: aborting.', tstError);
        Exit;
      end;

      // Iterate folders
      OLFolderList := OutlookApp.GetNameSpace('MAPI').Folders;
      if VarIsEmpty(OLFolderList) then begin
        StatusMessage('Getting Folders object: aborting.', tstError);
        Exit;
      end;
      // Get list of folders below the "root" folder (i.e. "Personal Folders")
      OLFolder := OLFolderList.GetFirst;

      IterateFolders(OLFolder, OLContactList, slDupList, TotalAddresses);

      redtStatusMsgs.Lines.Add('');
      StatusMessage('Total contacts added: ' + IntToStr(slDupList.Count), tstStatus);
    except
      on E: SysUtils.Exception do
        StatusMessage('(' + E.ClassName + ') ' +  E.Message, tstError);
    end;
  finally
    MAPIUninitialize;
    FreeAndNil(slDupList);
    // Release our reference to the Outlook automation object
    OutlookApp := Unassigned;
    SetControlsEnabled(Self, False, []);
    SetControlsEnabled(Self, True, ['btnPause', 'btnStop']);
  end;
end;

The main application loop

function TfrmMain.IterateFolders(const Folder: OleVariant; const OLContactList: Variant; const slDupeList: TStringList; var TotalAddresses: Integer): Boolean;
var
  OLFolderList, OLFolder, OLFolder2: OleVariant;
  OLItemList, OLItem, OLContact: Variant;
  sSender, sAddress, sPath: String;
  i, j, k: Integer;
  bDupe, bBadAddress, bExcludedContact: Boolean;
  intrfTemp: IInterface;
  liEntry: TListItem;
begin
  Result := False;
  try
    // Get list of folders below Folder
    OLFolderList := Folder.Folders;

    for i := 1 to OLFolderList.Count do begin
      // Get a MAPIFolder
      OLFolder := OLFolderList.Item(i);
      if VarIsEmpty(OLFolder) then begin
        StatusMessage('Getting Folder (' + IntToStr(i) + ') object: aborting.', tstError);
        Exit;
      end;

      // Print description
      stat1.Panels[1].Text := OLFolder.Name;

      // Check whether this is an "excluded" folder
      sPath := OLFolder.Name;
      OLFolder2 := OLFolderList.Parent;
      while (Pointer(IDispatch(OLFolder2.Parent)) <> nil) do begin
        // MAPIFolder
        if Supports(OLFolder2, StringToGUID('{00063006-0000-0000-C000-000000000046}'), intrfTemp) then
          sPath := OLFolder2.Name + '.' + sPath;
        OLFolder2 := OLFolder2.Parent;
      end;
      if IsExcludedFolder(sPath) then begin
        StatusMessage('(IGNORED) excluded folder: ' + sPath, tstStatus);
        Continue;
      end;

      // Recursively iterate through sub-folders
      if OLFolder.Folders.Count > 0 then begin
        Result := IterateFolders(OLFolder, OLContactList, slDupeList, TotalAddresses);
        if not Result then
          Exit;
      end;
      
      // Iterate over folder items (i.e., emails)
      OLItemList := OLFolder.Items;

      for j := 1 to OLItemList.Count do begin
        // Running total
        // Increment first so the status bar includes the current item
        Inc(TotalAddresses);
        stat1.Panels[3].Text := IntToStr(TotalAddresses);

        if j mod 50 = 0 then
          Application.ProcessMessages;

        while FPaused do begin
          Sleep(50);
          Application.ProcessMessages;
          if Application.Terminated or FStopped then
            Exit;
        end;

        if FStopped then
          Exit;

        // A _MailItem
        OLItem := OLItemList.Item(j);
        sSender := GetProperty(OLItem, PR_SENDER_NAME);
        sAddress := GetProperty(OLItem, PR_SENDER_EMAIL_ADDRESS);

        // Ignore duplicates, any messages without a valid email address, or any names or addresses in the "excluded" lists
        bBadAddress := False;
        bExcludedContact := False;
        bDupe := slDupeList.Find(LowerCase(sAddress) + '=' + LowerCase(sSender), k);
        if not bDupe then begin
          bExcludedContact := IsExcludedContact(sSender) or IsExcludedContact(sAddress);
          if not bExcludedContact then
            bBadAddress := not IsValidEmail(sAddress);
        end;

        if bDupe or bBadAddress or bExcludedContact then begin
          if bDupe then
            StatusMessage('(DUPLICATE) ' + sSender + ', ' + sAddress, tstStatus)
          else if bBadAddress then
            StatusMessage('(INVALID) ' + sSender + ', ' + sAddress, tstWarning)
          else
            StatusMessage('(EXCLUDED) ' + sSender + ', ' + sAddress, tstStatus);
          Continue;
        end;
        slDupeList.Add(LowerCase(sAddress) + '=' + LowerCase(sSender));

        // Print description
        liEntry := lv1.Items.Add;
        liEntry.Caption := IntToStr(lv1.Items.Count);
        liEntry.SubItems.Add(sSender);
        liEntry.SubItems.Add(sAddress);

        // The above presents a problem that can probably only be sorted out manually?
        // e.g., if two people with identical names are found and different email addresses
        // are they the same person with more than one address?
        // Or are they two separate individuals?
        //
        // If two different people share an email address?
        //
        // For simplicity's sake, every unique name/address combo is considered a separate contact
        OLContact := OLContactList.Add;
        OLContact.FullName := sSender;
        OLContact.Email1Address := sAddress;

        // Save the new record.
        OLContact.Save;
      end;
    end;
  except
    on E: SysUtils.Exception do
      StatusMessage('(' + E.ClassName + ') ' +  E.Message, tstError);
  end;
  Result := True;
end;


Conclusion

That's about all there is to it. At the end of a scan, the Contacts list of the Address Book will contain all names and email addresses found. Processing 5000 email addresses takes ~5 minutes on my 2.0GHz system, but involves a lot of disk activity, so this may be the chief bottleneck.
Be aware that Outlook will happily accept multiple identical Contact entries, and the tool makes no provision for entries that already exist in the Contacts list. We were starting from a “blank slate”, so dealing with this wasn't a requirement; if you are updating an existing Contacts list, this may be something you need to consider. At present I leave this as an exercise for the reader.

Companion Files:
Tool source code and compiled file (419 KB):
http://www.mediafire.com/?62cmin1khrr2z9p
