Friday, 30 March 2012

Digitally signing Delphi executables

 Background

From Windows XP Service Pack 2 onwards, any application downloaded with a mainstream web browser (Internet Explorer, Firefox, Opera) will display a Security Warning dialog when executed.




Zone Identifier

The warning dialog is shown when an Authenticode-compliant* download tool writes an Alternative Data Stream (so the the NTFS file system must be in use) for a binary file using the IAttachmentExecute interface
(http://msdn.microsoft.com/en-us/library/windows/desktop/bb776297%28v=vs.85%29.aspx).

The Save method of this interface is used to create a stream called “Zone.Identifier” and writes to it the values:

[ZoneTransfer]
ZoneID=3

– “3” indicating an Internet download source.

References:

http://bit.ly/GUqsok
http://bit.ly/HaCR0Y
http://bit.ly/GROWtT

A digitally signed application will be identified by its “Publisher” – usually a company name – and this will be listed towards the top of the dialog.
*Microsoft's “Introduction to Code Signing”
http://msdn.microsoft.com/en-us/library/ms537361%28VS.85%29.aspx

In the case of an unsigned application, Windows will automatically apply the disconcerting label “The publisher could not be verified. Are you sure you want to run this software?”, and “Unknown Publisher” in place of a company name.

 Executing an unsigned application


Running a signed application


From Windows Vista onwards (when User Account Control is enabled), an additional dialog is shown when Windows detects execution of an “Installer” under the following conditions:

Installer Detection only applies to:

  • 32 bit executables
  • Applications without a requestedExecutionLevel
  • Interactive processes running as a Standard User with UAC enabled

Before a 32 bit process is created, the following attributes are checked to determine whether it is an installer:

  • Filename includes keywords such as "install," "setup," and "update."
  • Keywords in the following Versioning Resource fields: Vendor, Company Name, Product Name, File Description, Original Filename, Internal Name, and Export Name.
  • Keywords in the side-by-side application manifest embedded in the executable.
  • Keywords in specific StringTable entries linked in the executable.
  • Key attributes in the resource file data linked in the executable.
  • Targeted sequences of bytes within the executable.

Note: The keywords and sequences of bytes were derived from common characteristics observed from various installer technologies

Reference: http://msdn.microsoft.com/en-us/library/aa905330.aspx

Again, a signed “Installer” will be identified by a company name as the “Verified publisher” of the application, and the dialog colored with prosaic hues. An unsigned “Installer” will list the Publisher as “Unknown” and be decorated with amber “hazard” colouring.

Running an unsigned "installer"


Running a signed "installer" 

This is all part of Microsoft's system for securing users from potentially malicious software where origin and integrity cannot be tested and confirmed (with a substantial recurring financial cost to software authors for participating in the system).

Newer versions of Visual Studio bundle all of the Microsoft tools necessary for digitally signing a binary file with that programming environment.
Delphi developers are required to source them elsewhere.


Getting the tools

For the purposes of this article, we're going to be generating our own private key and digital certificate for signing executables. Links at the end of the article expound upon these same ideas, detailing the steps involved in acquiring a certificate from a recognised Certificate Authority (CA).

There are limitations to a self-created certificate; as an individual you are not a part of the “chain of trust” that exists between the various issuing organisations, and as a result, any binary file signed with the certificate will not be validated by Windows (the “Unknown Publisher” message will still be shown).
However, it can be a useful for internal testing, or for preparing your build processes in anticipation of adding a CA-sourced certificate at some future date.

To generate our private key and signing certificate, four tools are needed:

makecert.exe
Cert2Spc.exe
pvkimprt.exe
signtool.exe

...and an additional DLL file, “capicom.dll”.

There are various ways of obtaining them (they used to be included in the .NET SDK, but this has since evolved into the “Windows SDK”), one option is to download the “Microsoft Windows SDK for Windows 7 and .NET Framework 4 (ISO)” - a 570MB package.
Another is the “Windows SDK for Windows Server 2008 and .NET Framework 3.5” - a whopping 1.3 GB download.

Windows SDK for Windows 7 and .NET Framework 4 (GRMSDK_EN_DVD.iso):

Windows SDK for Windows Server 2008 and .NET Framework 3.5 (6.0.6001.18000.367-KRMSDK_EN.iso):

In either case a tool capable of browsing ISOs – such as the indispensable freeware/open-source 7zip (http://www.7-zip.org/) - can be used to locate the files inside of them without any protracted CD-burning or other intermediate extraction.

If you opt for the former, open the ISO, then the Setup folder, then the WinSDKTools folder, and within that the cab1.cab file. Here you'll find 3 of the 4 tools above – extract these to a convenient location.

Within the cabinet the files are named:

WinSDK_makecert_exe_5D21BAF1_83A4_4E71_998E_FF39C36EA905_x86
WinSDK_Cert2Spc_Exe_910B78B5_FBE7_44CD_867D_0F90509859B0_x86
WinSDK_signtool_exe_B2E1011D_2F14_488D_A056_C5BD55106409_x86

Rename these to:

makecert.exe
Cert2Spc.Exe
signtool.exe

...respectively. Note, though, that the Windows 7/.NET 4 SDK does not include (as far as I can tell), the “capicom.dll” file, so this must be downloaded separately (via the webpage “Platform SDK Redistributable: CAPICOM”) – it's available (at slightly under 2MB) here:

If you choose the later option, open the ISO with a browsing tool (such as 7zip), open the Setup folder, locate the WinSDKTools-WinSDKTools-common.0.cab file, open this, and extract the four files:

capicom_dll.970E4F94_546F_49F3_BF1F_18BE6B938B02
Cert2Spc_exe.D7AE6AF7_EC98_4D5C_97F8_562E6B8AF64F
makecert_exe.E4279728_DED0_47AC_8B96_F1269703DEFB
signtool_exe.B68FF751_0B1A_4F33_B044_1871CB4B13CC

...then rename them to:

capicom.dll
Cert2Spc.exe
makecert.exe
signtool.exe

pvkimprt.exe is not available in either distribution so, also, must be downloaded separately. It can be found via the webpage “Office 2000 Tool: PVK Digital Certificate Files Importer” (http://www.microsoft.com/download/en/details.aspx?displaylang=EN&id=6563).
It's a self-extracting zip file, but again, a tool like 7zip can browse its content without having to execute it and step through the extraction wizard. When opened, extract the file “pvkimprt.exe” to a convenient location. This file is, in turn, a self-extracting zip. Open it with 7zip, and extract “pvkimprt.exe”.


Generating a certificate

With all of the required tools available, we can now generate our private key and code signing certificate.

Open a command prompt (type cmd in the Run or “Start Search” box, or use Start->All Programs/Programs->Accessories->Command Prompt), navigate to the directory containing the tools, and execute them in the following sequence (with all spaces and other punctuation as shown):
makecert.exe MyCertificateFile.cer -r -n "CN= MyCompanyName " -$ individual -sv MyPrivateKeyFile.pkv -pe -eku 1.3.6.1.5.5.7.3.3

This first step will produce a “self-signed” certificate. Substitute “MyCertificateFile.cer” with the name you want to give to the generated certificate (the name you choose is just a label, the value of the text is inconsequential to the outcome of later stages – in all instances it makes sense to retain the suggested extensions, though, for clarity); “MyCompanyName” with an appropriate value for your “Publisher” identity; “MyPrivateKeyFile.pkv” with a name to use for your private key store file.

For further information about the meaning of the command line switches added here, use the in-built help “makecert.exe /?”.

The string after the “eku” switch indicates that the file to create will be a “code signing” certificate.

When you run the tool, you will be prompted to enter a “Private Key Password” - you will need to provide this again in step 3, and each time you sign a file, so make a note of the password you choose to use here.
cert2spc.exe MyCertificateFile.cer MyCertificateFile.spc

Replace “MyCertificateFile.cer” with the name you chose to give your generated self-signed certificate from step 1, and choose a matching name (again, just for clarity – it doesn't have to match) to replace “MyCertificateFile.spc”. This will create a “Software Publisher Certificate” (SPC) from the “MyCertificateFile.cer” file – a certificate equivalent to one that would be issued from a Certificate Authority.
pvkimprt.exe -pfx MyCertificateFile.spc MyPrivateKeyFile.pkv

Replace “MyCertificateFile.spc”with the name you chose for your Software Publisher Certificate (.spc file) in step 2, and “MyPrivateKeyFile.pkv” with a name for the file to be created by Pvkimprt – a “Personal Information Exchange” (PFX) file – this is what will be used to for code signing.
Pkimprt is a GUI wizard. When you run it you'll first be asked to enter your “Private Key Password” - the password you selected for step 1.

None of the default wizard values need to be changed, so click through each successive form by clicking Next (x3), then on the “Password” form, enter your Private Key Password twice more, and click Next again.

On the “File to Export” form, click the Browse button, navigate to the directory containing the signing tools, and type a name for the PFX file – e.g., “MyPFXFile.pfx”, then click Next.
On the final wizard form, “Completing the Certificate Export Wizard”, click the Finish button to complete the procedure.

As a result of these steps, you should now have four new files labelled something like:

MyPrivateKeyFile.pkv
MyCertificateFile.cer
MyCertificateFile.spc
MyPFXFile.pfx


Code signing

To use our newly generated certificate, we need signtool.exe and a target file. Suppose we have a binary executable named “MyProgram.exe” - the syntax for signing this file would be:
signtool.exe sign /f MyPFXFile.pfx /p MyPrivateKeyPassword /v /t http://timestamp.verisign.com/scripts/timstamp.dll MyProgram.exe

- where “MyPrivateKeyPassword” is the password you chose in step 1 (note also that “timstamp.dll” is not a misspelling).


Automating the process

To make things easier and avoid having to re-type long commands in the case of a syntax error, it makes sense to string the tool commands together in a batch file. See the listing below for two batch files based upon the text of the steps listed above.

Generating a private key, self-signed certificate, and PFX file:

MakeCert.bat
@ECHO OFF

makecert.exe MyCertificateFile.cer -r -n "CN= MyCompanyName " -$ individual -sv MyPrivateKeyFile.pkv -pe -eku 1.3.6.1.5.5.7.3.3


cert2spc.exe MyCertificateFile.cer MyCertificateFile.spc

pvkimprt.exe -pfx myCertificateFile.spc myPrivateKeyFile.pkv


Signing a binary file:

SignBinary.bat
@ECHO OFF

signtool.exe sign /f MyPFXFile.pfx /p MyPrivateKeyPassword /v /t http://timestamp.verisign.com/scripts/timstamp.dll %1


To use the second batch file with a named binary, launch it with a parameter – the name of the target binary file, e.g., “SignBinary.bat MyProgram.exe”.


References





Code Signing for Developers – a Delphi-orientated article covering the same sort of territory, but addressing (in addition) sourcing and using a Certificate Authority issued certificate:
http://www.wiscocomputing.com/articles/code-signing.htm

A Microsoft article explaining different methods for sourcing a code signing certificate: http://technet.microsoft.com/en-us/library/cc732597%28v=ws.10%29.aspx#BKMK_Anchor3


How to create your own certificate for signing ClickOnce manifests and Strong Naming assemblies (highly recommended!):
http://www.uphillriver.com/CreateYourOwnCertificate.aspx

All About Authenticode (highly recommended!):
http://www.tech-pro.net/authenticode.html

Code Signing for Developers - An Authenticode How-To (highly recommended!):
http://www.tech-pro.net/code-signing-for-developers.html

Thursday, 8 March 2012

Snippets for the code bank pt. 2

A follow-up to last months post (“Snippets for the code bank” @ http://bit.ly/xwIV51), below is an assortment of functions and definitions from my Inno Setup/Rem Objects Pascal Script grab-bag. Some of them are inter-dependent, so it's important to be aware that because none are member methods, where Function A is used by Function B, Function A must be declared first.

Again, none of this is complicated or pioneering, but each function was designed to fill a specific need that existed at the time – one that isn't covered by the wide range of methods provided with the Pascal Script language library.

~
 
In all instances, where a call is made to a routine that is not part of the stock Pascal Script library, the listing for the routine can be found elsewhere in this or another of the "Code Bank" posts.

All references to "Common.iss" refer to the unit from which these functions are taken. Modify this in your own script to match the name of the unit in which you use it (e.g., for debugging purposes).


Stringlist routines

//------------------------------------------------------------------------------
// Delete duplicate strings from List
//
procedure RemoveDuplicates(const List: TStringList);
var
  TempList: TStringList;
  i, j: Integer;
begin
  if List = nil then
    Exit;

  TempList := TStringList.Create;
  try
    TempList.Sorted := True;
    i := 0;
    while i <= List.Count - 1 do begin
      if TempList.Find(List[i], j) then
        List.Delete(i)
      else begin
        TempList.Add(List[i]);
        Inc(i, 1);
      end;
    end;
  finally
    TempList.Free;
    TempList := nil;
  end;
end;

//------------------------------------------------------------------------------
// Explode a delimited string (S) into a list of strings using a separator (Sep)
// (n.b., Untested)
//
procedure StrToStrings(const S, Sep: String; const List: TStringList; const AllowBlank: Boolean);
var
  I, J, L, M: Integer;
  Left: String;
begin
  if List <> nil then
    List.Clear;

  if (List = nil) or (S = '') or (Sep = '') then
    Exit;

  L := Length(Sep);
  M := Length(S);
  I := 1;
  J := PosEx(Sep, S, 1);
  while (J > 0) do begin
    Left := Copy(S, I, J - I);
    if (Left <> '') or AllowBlank then
      List.Add(Left);
    I := J + L;
    J := PosEx(Sep, S, I);
  end;
  if I < M then begin
    Left := Copy(S, I, M - (I - 1));
    if (Left <> '') or AllowBlank then
      List.Add(Left);
  end;
end;


String search routines

//------------------------------------------------------------------------------
// Search for a String (Value) in an array of Strings (StrArray), return the
// array index if found, else -1
//
function StrIndex(const Value: String; const CaseSensitive: Boolean; const StrArray: array of String): Integer;
var
  i, Hi, Lo: Integer;
  Found: Boolean;
  SearchTerm: String;
begin
  Result := -1;
  Lo := 0;
  Hi := GetArrayLength(StrArray);
  Found := False;
  if not CaseSensitive then
    SearchTerm := LowerCase(Value)
  else
    SearchTerm := Value;

  for i := Lo to Hi - 1 do begin
    if CaseSensitive then
      Found := StrArray[i] = SearchTerm
    else
      Found := LowerCase(StrArray[i]) = SearchTerm;
    if Found then begin
      Result := i;
      Break;
    end;
  end;
end;

//------------------------------------------------------------------------------
// Get the index of SubStr in MainStr starting from a specified offset
//
function PosEx(const SubStr, MainStr: String; Offset: Integer): Integer;
var
  Loop, MStrLen, SStrLen: Integer;
  TmpStr: String;
begin
  Result := 0;
  MStrLen := Length(MainStr);
  SStrLen := Length(SubStr);

  if (SubStr = '') or (MainStr = '') or (Offset <= 0) or (Offset > (MStrLen - SStrLen)) then
    Exit;

  if Offset = 1 then
    Result := Pos(SubStr, MainStr)
  else begin
    Loop := Offset;
    while Loop <= MStrLen do begin
      if MainStr[Loop] = SubStr[1] then begin
        TmpStr := Copy(MainStr, Loop, SStrLen);
        if TmpStr = SubStr then begin
          Result := Loop;
          Break;
        end;
      end;
      Inc(Loop, 1);
    end;
  end;
end;

//------------------------------------------------------------------------------
// The same as Pos, but searching begins from the right of the string, rather
// than the left e.g., calling: RPos('C', 'ABCD') will return index 3, but the
// search begins at character 'D' in the string 'ABCD' and searches towards the
// beginning of the string.
//
function RPos(const Str1, Str2: String; const CaseInsensitive: Boolean): Integer;
var
  Str1Len, Str2Len: Integer;
  LStr1, LStr2: String;
begin
  Result := 0;
  Str1Len := Length(Str1);
  Str2Len := Length(Str2);
  if (Str1Len = 0) or (Str2Len = 0) or (Str1Len > Str2Len) then
    Exit;

  LStr1 := Str1;
  LStr2 := Str2;
  if CaseInsensitive then begin
    LStr1 := Lowercase(LStr1);
    LStr2 := Lowercase(LStr2);
  end;

  LStr1 := ReverseString(LStr1);
  LStr2 := ReverseString(LStr2);
  Result := Pos(LStr1, LStr2);
  if Result <> 0 then
    Result := Str2Len - (Result - 1) - (Str1Len - 1);
end;

var
  FTokenIndex: Integer;
  FTokenString: String;

//------------------------------------------------------------------------------
// Find the first character from offset 'Start' that is NOT a blank character
//
const
  TokenSeparator = ' ';

function FindNextNonBlank(const Value: String; const Start: Integer): Integer;
var
  i, ValLen: Integer;
begin
  Result := 0;
  ValLen := Length(Value);
  if (Value = '') or (Start <= 0) or (Start > ValLen) then
    Exit;

  i := Start;
  while (i <= ValLen) and (Value[i] = TokenSeparator) do
    Inc(i, 1);

  if (i = ValLen) and (Value[i] = TokenSeparator) then
    //
  else
    Result := i;
end;

//------------------------------------------------------------------------------
// Return a token from global variable 'FTokenString' - SetTokenString should be called ahead of
// this function to set the FTokenString value and reset the FTokenIndex variable.
// PeekToken does not update the FTokenIndex value
//
function PeekToken: String;
var
  i, j, k: Integer;
begin
  Result := '';
  if FTokenString = '' then
    Exit;

  // Find first non-space character
  i := FindNextNonBlank(FTokenString, 1);
  if i <= 0 then
    Exit;

  // If GetToken has been called previously, get index of last token
  if FTokenIndex > 0 then begin
    // Shouldn't need the 'against 0' test, but just in case user changed FTokenString...
    k := 0;
    while (i > 0) and (k < FTokenIndex) do begin
      Inc(k, 1);
      i := PosEx(TokenSeparator, FTokenString, i);
      if i > 0 then
        i := FindNextNonBlank(FTokenString, i + 1);
    end;

    // Something went wrong
    if k < FTokenIndex then
      Exit;
  end;

  j := i;
  if j > 0 then
    Inc(j, 1);
  j := PosEx(TokenSeparator, FTokenString, j);
  if (j <= 0) then begin
    if (i >= 1) and (i < Length(FTokenString)) then
      j := Length(FTokenString) + 1
    else
      Exit;
  end;

  Result := Copy(FTokenString, i, j - i);
end;

//------------------------------------------------------------------------------
// Return a token from FTokenString - SetTokenString should be called ahead of this function to
// set the FTokenString value and reset the FTokenIndex variable.
// GetToken can be called in a loop, e.g., until GetToken returns '' (blank). GetToken increments
// FTokenIndex by '1' after each successful call (so that it can 'carry on' from where it left
// off on a subsequent call)
//
function GetToken: String;
begin
  Result := PeekToken;
  if Result <> '' then
    Inc(FTokenIndex, 1);
end;

//------------------------------------------------------------------------------
procedure SetTokenString(const Value: String);
begin
  FTokenString := Value;
  FTokenIndex := 0;
end;


String sort routines

These first six functions are a collective; this is a Pascal Script translation of a Delphi
code sample originally provided at Experts-Exchange (http://bit.ly/yKBzd8), which
implements a pure-pascal algorithm for "natural order" string sorting (comparable
to the WinAPI function StrCmpLogicalW - available with Windows XP and later). See
function descriptions below for more details

//------------------------------------------------------------------------------
// See previous "Code Bank" posting for IsDigit
//
function NonNullChar(const AChar: Char): Boolean;
begin
  Result := not IsDigit(AChar) and (Ord(AChar) <> 0);
end;

//------------------------------------------------------------------------------
// Next three functions are for sorting Strings into 'natural' order, e.g.
// 1.txt, 2.txt, 3.txt, 123.txt, instead of
// 1.txt, 123.txt, 2.txt, 3.txt
//
function GetEndOfGroup(const Val: String; const Index: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := Index;
  if (Length(Val) <= 0) or (i > Length(Val)) then
    Exit;

  if IsDigit(Val[i]) then begin
    while (i <= Length(Val)) and IsDigit(Val[i]) do
      Inc(i, 1);
  end else begin
    while (i <= Length(Val)) and (NonNullChar(Val[i])) do
      Inc(i, 1);
  end;

  Result := i;
end;

//------------------------------------------------------------------------------
function CompareGroups(var S1Index, S2Index: Integer; const E1, E2: Integer; const S1, S2: String): Integer;
var
  sTemp1, sTemp2: String;
begin
  sTemp1 := Copy(S1, S1Index, E1 - S1Index);
  sTemp2 := Copy(S2, S2Index, E2 - S2Index);
  Result := CompareText(sTemp1, sTemp2);
  S1Index := E1;
  S2Index := E2;
end;

const
  A1_BEFORE_A   = True;
  A01_BEFORE_A1 = True;
  SWAP          = 1;
  NO_SWAP       = -1;

//------------------------------------------------------------------------------
// See also, StrCmpLogicalW - WinXP+ only, but 'better' sorting, ref:
// http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_23086281.html
//
function XPSortCompareStrings(const S1, S2: String): Integer;
var
  E1: Integer; // end of group in first string
  E2: Integer; // end of group in second string
  FallbackResult: Integer;
  i, j: Integer;
begin
  Result := 0;
  if (Length(S1) <= 0) or (Length(S2) <= 0) then
    Exit;
  i := 1;
  j := 1;

  repeat
    E1 := GetEndOfGroup(S1, i);
    E2 := GetEndOfGroup(S2, j);

    if (E1 = 0) then begin
      if (E2 = 0) then
        Break
      else begin
        if (A1_BEFORE_A) then
          Result := SWAP
        else
          Result := NO_SWAP;
      end;
    end else begin
      if (E2 = 0) then begin
        if (A1_BEFORE_A) then
          Result := NO_SWAP
        else
          Result := SWAP
      end else begin
        if (IsDigit(S1[i])) then
          if (IsDigit(S2[j])) then begin
            FallbackResult := (E2 - j) - (E1 - i);

            while (S1[i] = '0') do
              Inc(i, 1);

            while (S2[j] = '0') do
              Inc(j, 1);

            Result := (E1 - i) - (E2 - j);

            if (Result = 0) then
              Result := CompareGroups(i, j, E1, E2, S1, S2);

            if (Result = 0) then
              if (A01_BEFORE_A1) then
                Result := FallbackResult
              else
                Result := -FallbackResult;
          end else
            Result := NO_SWAP
        else begin
          if IsDigit(S2[i]) then
            Result := SWAP
          else
            Result := CompareGroups(i, j, E1, E2, S1, S2);
        end;
      end;
    end;
  until (Result <> 0);
end;

//------------------------------------------------------------------------------
// Next two functions are for sorting a TStringList into 'natural' order
//
procedure ExchangeItems(const Index1, Index2: Integer; const List: TStringList);
var
  sTemp: String;
  oTemp: TObject;
begin
  if not Assigned(List) or (Index1 < 0) or (Index1 > List.Count - 1) or (Index2 < 0) or (Index2 > List.Count - 1) then
    Exit;

  sTemp                := List[Index1];
  oTemp                := List.Objects[Index1];
  List[Index1]         := List[Index2];
  List.Objects[Index1] := List.Objects[Index2];
  List[Index2]         := sTemp;
  List.Objects[Index2] := oTemp;
end;

//------------------------------------------------------------------------------
procedure QuickSort(L, R: Integer; const List: TStringList);
var
  I, J, P: Integer;
begin
  if (List = nil) or (L < 0) or (R > List.Count - 1) or (L > R) then
    Exit;

  repeat
    I := L;
    J := R;
    P := (L + R) shr 1;
    repeat
      while XPSortCompareStrings(List[I], List[P]) < 0 do
        Inc(I, 1);
      while XPSortCompareStrings(List[J], List[P]) > 0 do
        Inc(J, -1);
      if I <= J then begin
        ExchangeItems(I, J, List);
        if P = I then
          P := J
        else if P = J then
          P := I;
        Inc(I, 1);
        Inc(J, -1);
      end;
    until I > J;
    if L < J then
      QuickSort(L, J, List);
    L := I;
  until I >= R;
end;


String test routines

//------------------------------------------------------------------------------
// Determines whether StrA occurs at the very end of StrB
function StrEndsWith(const StrA, StrB: String): Boolean;
var
  StringPos: Integer;
begin
  Result := False;
  StringPos := pos(StrA, StrB);
  if StringPos > 0 then
    Result := StringPos = Length(StrB) - (Length(StrA) - 1)
end;

//------------------------------------------------------------------------------
// Returns the length of a string less any line-break characters
function StringLength(const Value: String): Integer;
var
  CRLFLen, ValLen, i: Integer;
  CRLFStr: String;
begin
  Result := 0;
  if Value = '' then
    Exit;

  CRLFStr := constCRLF;
  CRLFLen := Length(constCRLF);
  ValLen := Length(Value);
  i := 1;
  Result := 0;
  while i < ValLen do begin
    if (Value[i] = CRLFStr[1]) and (Copy(Value, i, CRLFLen) = CRLFStr) then
      Inc(i, CRLFLen)
    else begin
      Inc(Result, 1);
      Inc(i, 1);
    end;
  end;

  Result := Max(Result, 1);
end;


String transformation routines

//------------------------------------------------------------------------------
// Flip the order of characters in a string
//
function ReverseString(const S: String): String;
var
  I, L: Integer;
begin
  L := Length(S);
  SetLength(Result, L);

  for I := 0 to L - 1 do
    Result[L - I] := S[I + 1];
end;

const
  constCRLF = #13#10;

//------------------------------------------------------------------------------
// Replace all occurrences of '%s' (string placeholder) with constCRLF
//
function CRLFFormat(const Value: String; const AddSpaces: Boolean): String;
var
  TmpStr: String;
  CRLFStr: String;
begin
  TmpStr := Value;
  CRLFStr := constCRLF;
  if AddSpaces then
    CRLFStr := constSP + CRLFStr + constSP;
  StringChangeEx(TmpStr, '%s', CRLFStr, False);
  Result := TmpStr;
end;

//-----------------------------------------------------------------------------------------------
// Append a String (StrToAdd) to another String (MainString) in a non-memory-fragmentary way (in theory)
// Note: LastRealIndex marks the last 'used' character in MainString, NOT the next free character
//
const
  constStrLenIncAmount = 500;
  
procedure EfficientAppend(var MainString: String; var LastRealIndex: Integer; const StrToAdd: String);
var
  i: Integer;
begin
  if Length(StrToAdd) <= 0 then
    Exit;
  if Length(MainString) < constStrLenIncAmount then
    SetLength(MainString, constStrLenIncAmount);
  if LastRealIndex > Length(MainString) then
    LastRealIndex := Length(MainString);

  if (LastRealIndex + Length(StrToAdd)) > Length(MainString) then begin
    if (LastRealIndex + Length(StrToAdd)) > (Length(MainString) + constStrLenIncAmount) then
      SetLength(MainString, Length(MainString) + Length(StrToAdd) + constStrLenIncAmount)
    else
      SetLength(MainString, Length(MainString) + constStrLenIncAmount);
  end;

  Inc(LastRealIndex, 1);
  for i := 1 to Length(StrToAdd) do begin
    MainString[LastRealIndex] := StrToAdd[i];
    if i < Length(StrToAdd) then
      Inc(LastRealIndex, 1);
  end;
end;

//-----------------------------------------------------------------------------------------------
// Concatonate a group of strings into a single string
//
function StringsToStr(const List: TStringList; const Sep: String): String;
var
  i, j: Integer;
begin
  Result := '';
  if (List = nil) or (List.Count < 0) or (Length(Sep) <= 0) then
    Exit;
    
  j := 0;
  for i := 0 to List.Count - 1 do begin
    if i < List.Count - 1 then
      EfficientAppend(Result, j, List[i] + Sep)
    else
      EfficientAppend(Result, j, List[i]);
  end;
    
  SetLength(Result, j);
end;

//-----------------------------------------------------------------------------------------------
// Format a string value so that text fits within the width of a Vista MessageDlg box (to prevent
// the OS from automatically line-breaking a string at awkward intervals). Note that any line-
// breaks already included in the text are recognised and retained, but must be separated from
// surrounding text with a space for and aft.
const
  MaxVistaMsgBoxChars = 60;

function FormatStringForVistaMsgBox(const Value: String): String;
var
  LValue, TempToken: String;
  LValLen, PeekTokenLen: Integer;
begin
  Result := '';
  if Value = '' then
    Exit;

  SetTokenString(Value);

  LValue := GetToken;
  while True do begin
    LValLen := StringLength(LValue);
    TempToken := PeekToken;
    PeekTokenLen := StringLength(TempToken);
    LValLen := LValLen + PeekTokenLen;
    if (LValLen > MaxVistaMsgBoxChars) or (TempToken = constCRLF) then begin
      Result := Result + LValue + constCRLF;
      if TempToken = constCRLF then
        GetToken;
      LValue := GetToken;
    end else begin
      if LValue <> constCRLF then
        LValue := LValue + constSP;
      LValue := LValue + GetToken;
    end;

    if PeekToken = '' then
      Break;
  end;

  Result := Result + LValue;
end;

Thursday, 1 March 2012


Contact! Populating the Outlook Address Book

A standalone tool for scanning Outlook emails and extracting names and addresses into the Address Book.


I recently formatted the hard disk of an ageing PC with an ancient Win-XP install, then re-added a fresh copy of Windows. While this put back much needed spring in the performance step of the machine, it was followed on by the tedious process of re-adding all third party applications and copying back user data.

Among these was Microsoft's venerable Office 2003 software suite, notable as the last in the series to feature the traditional tool and menu bars (rather than the Ribbon UI of Office 2007 and later). The user had never populated the Outlook address book; instead (if unknowingly) relying on the NickName list, a file automatically updated with names and addresses by Outlook from the To: and CC/BCC: fields of outgoing messages.
This file, labelled <Outlook Profile Name>.NK2, resides in the %appdata%\Microsoft\Outlook folder – on the XP machine this resolved to:

c:\documents and settings\UserName\Application Data\Microsoft\Outlook\User Name.NK2

The size of the file copied from the previous install was about 400KB, and the user of the machine made regular use of the AutoCompletion feature that the NickName list facilitates. Unfortunately, it didn't survive the transfer. It's still not clear what went wrong, but when the NickName file was copied to the new Windows install, Outlook ignored it (or wouldn't work with it). Stories abound about NK2 file corruption (http://bit.ly/yMdivj), so perhaps this is to blame, although it went unnoticed during the previous incarnation of Windows/Outlook on the same PC.

So the user was faced with the onerous prospect of starting again with an empty NickName list, and slowly re-growing it over time.


Delphi to the rescue

I've been peripherally aware of the Office controls under the “Servers” tab in Delphi for a while, but I've never had cause to investigate further. This seemed like a good opportunity to see what was possible.

My intention was to bypass the NickName file completely. It isn't something you can manually update, there are no user tools for manipulating it, and it doesn't appear to have a very good reputation for reliability. Instead I wanted the user to become familiar with the Outlook Address Book.

This is a slightly confusing area because there are some related terms that are often used interchangeably, but mean different things. For example, some people talk about the “Address Book”, while others call the same the “Contact List”. Microsoft provides this definition (http://bit.ly/x0nRa4):

Think of the Address Book as a container of individual address books or address lists, such as the following:
  • Contacts - The address book where you keep your personal addresses. In the Address Book dialog box, Contacts appear under Outlook Address Book. Contacts are stored in a .pst data file and support unicode.
  • Personal Address Book - The predecessor to Contacts in earlier versions of Outlook that uses older technology and doesn't support unicode. The Personal Address Book is stored in a .pab data file.
  • LDAP Internet directories - Think of LDAP as a White Pages for the Internet. lobal Address List (GAL) - If your organization uses Microsoft Exchange Server e-mail accounts, the GAL displays names of people in your organization.
  • Third-party address books.

So the “Address Book” is an over-arching container for other lists that contain details of people you know about – and one of these is the Contact list represented as a folder under the “Folder List” pane in Outlook.

Although importing the old NickName list failed, copying all Person Folder (.pst) files had worked without problems. So we had a store of all sent and received emails – the raw data for entries in our new Address Book if you like – but no simple way of getting this information into the Address Book short of copying and pasting it all in, one entry at a time.

My intention then, using Delphi and COM, was to create a tool which would iterate over every email in the Personal Folders store extracting names and email addresses as it went, then import each of these values into the Contacts list.


The tools

Because the need was pressing, I had very little time to decide how to go about creating the tool. There are some useful tutorials and code snippets scattered about the web, but nothing that dealt with my specific requirements. In the event I went for a quick 'n' dirty solution using Delphi MAPI (MAPI definition: http://bit.ly/gBAAaY) header translations from dimastr.com, home of Outlook Redemption (also written in Delphi) http://bit.ly/y8wbkw (see the “Extended MAPI headers for Borland Delphi” item).

As a result the tool eschews the Delphi-provided MS-Office Type Library files (Outlook2000.pas etc.), and instead uses OLEVariants and late-binding. At the time this was for no other reason than that in creating a few simple proof-of-concept apps, I found the late-binding technique quicker to get to grips with (perversely). But thinking further ahead, it would hopefully make the tool more portable to other Outlook versions.

There was another consideration that swayed the balance somewhat, which was that with the dimastr headers and late-binding, running the tool doesn't trigger Outlook's Security Alert (via the Object Model Guard) dialog (“Another process is trying to access e-mail addresses you have stored in Outlook”). Compiling against the Delphi Outlook Type Library files always did. I haven't looked at which classes or combinations of classes might be responsible for tripping the alert, so it may be that this is just as easy to mitigate in the latter instance.

Early-binding vs late-binding

http://bit.ly/zB0Z8P
http://bit.ly/A7G0O2


Exploring the UI

It's all very plain and utilitarian, but it does the job. The main form is divided into three vertically stacked components: a Listview at the top (labelled “1” in the image), which is used to display all names and email addresses found (plus an index, indicating the order in which they were discovered), a RichEdit in the middle (labelled “2” in the image) for giving feedback about any errors encountered during the scan, and at the bottom a panel with buttons (labelled “3”, “4”, and “5”) for controlling the application.

There's also a Statusbar at the foot of the main form, which gives a running count of the total number of names and addresses found, and the name of the folder currently being explored (when the application is in its “running” state).

Tool main form at the end of a scan (with email addresses blanked out)

The Listview can be sorted by Index, Sender name, or Sender email address (sorting is by Sender name in the screen shot).

Where a duplicate contact is found it is ignored and reported in the RichEdit (a “duplicate” is any contact with the same Sender name/Sender email address as a previously seen contact. Note that this means a contact with a different Sender name, but an email address found before – or visa versa - is not considered a duplicate).

Where an invalid email address is found (using a web procured algorithm that isn't quite RFC 5322 compliant http://tools.ietf.org/html/rfc5322 – but works in most instances – for example, it will reject email addresses with a '+' symbol in the “local” part of any address, the segment before the '@' sign, which is incorrect behaviour, but this can be a tricky problem: http://bit.ly/16XkKb), it is also reported to the RichEdit and not inserted into the Address Book.

There are two buttons in the panel to the lower-left of the main form labelled “3” and “4” in the first screen shot.

Button “3” has the caption “Excluded Contacts”. Clicking this button displays a modal child form with a Memo component and two buttons of its own - “OK” and “Cancel”.
By entering names or email addresses into the Memo, one name or address per line, it is possible to build up a “block list” of contacts that will not be added to the Address Book if found. Wild card symbols are supported (“?” as a place-holder for an individual character, or “*” as a place-holder for multiple characters), so a single entry can be used to cover multiple contact variations, e.g., “John D*” would block any contacts with names including “John Doe”, “John Dean”, “John Dillinger” etc.

Excluded contacts form with two entries

Here we see two added “block list” entries, “*@hotmail.com” and “Amazon*”, which would prevent any contacts with a hotmail.com email address, or Sender name beginning “Amazon” from being added to the Address Book.

Button “4” has the caption “Excluded Folders”. Clicking this button will also show a modal child form, and like the “Excluded Contacts” form, this contains a Memo and “OK” and “Cancel” buttons. This form implements another sort of “block list”, this time for folders in the Outlook “Folder List” pane. By adding entries to the Memo – again, one entry per line – folders enumerated during the scan with a name matching a “block list” entry will not be searched for emails.
As above, wild card symbols can be used, so adding a folder name such as:
Personal Folders.I?box
- would block email searching in folders named “Inbox” and “IMBox”.
Excluded folders form with 6 blocked items

Here we see six block list entries, preventing email searching in any of the folders “Deleted Items”, “Calendar”, “Contacts”, “Journal”, “Notes”, and “Tasks”. Folders must be “fully qualified”, that is, be written to include all ancestor folders in a pre-amble to the folder name, e.g., “Personal Folders.Delete Items”.

The block lists can be used in combination, to exclude entire folders and/or names and email addresses.

Button 5, with caption Populate Outlook Contacts, begins the scanning process. Below, two buttons with captions “Pause” and “Stop” are disabled when no scan is in progress, but become enabled once a scan is underway. The Pause button temporarily suspends a running scan (until it is pressed again); the Stop button halts a running scan.
Contacts are added to the Address Book as they are discovered, so Stop-ping a running scan will prevent additional contacts from being added, but any found prior to that event will already have been inserted and saved.

Code Listing
Event handler for the "Populate Outlook Contacts" button click event

procedure TfrmMain.btnAddAddressesClick(Sender: TObject);
var
  OutlookApp, OLFolderList, OLFolder: OleVariant;
  OLContactList: Variant;
  slDupList: TStringList;
  TotalAddresses: Integer;
begin
  redtStatusMsgs.Clear;
  TotalAddresses := 0;
  SetControlsEnabled(Self, True, []);
  SetControlsEnabled(Self, False, ['btnPause', 'btnStop']);
  slDupList := nil;
  OutlookApp := Unassigned;
  MAPIInitialize(nil);
  try
    try
      slDupList := TStringList.Create;
      slDupList.Sorted := True;
      slDupList.Duplicates := dupIgnore;

      OutlookApp := CreateOleObject('Outlook.Application');
      if VarIsEmpty(OutlookApp) then begin
        StatusMessage('Creating Outlook object: aborting.', tstError);
        Exit;
      end;

      // Get the Items collection from the Contacts folder
      OLContactList := OutlookApp.GetNameSpace('MAPI').Folders('Personal Folders').Folders('Contacts').Items;
      if VarIsEmpty(OLContactList) then begin
        StatusMessage('Getting Contacts object: aborting.', tstError);
        Exit;
      end;

      // Iterate folders
      OLFolderList := OutlookApp.GetNameSpace('MAPI').Folders;
      if VarIsEmpty(OLFolderList) then begin
        StatusMessage('Getting Folders object: aborting.', tstError);
        Exit;
      end;
      // Get list of folders below the "root" folder (i.e. "Personal Folders")
      OLFolder := OLFolderList.GetFirst;

      IterateFolders(OLFolder, OLContactList, slDupList, TotalAddresses);

      redtStatusMsgs.Lines.Add('');
      StatusMessage('Total contacts added: ' + IntToStr(slDupList.Count), tstStatus);
    except
      on E: SysUtils.Exception do
        StatusMessage('(' + E.ClassName + ') ' +  E.Message, tstError);
    end;
  finally
    MAPIUninitialize;
    FreeAndNil(slDupList);
    // Close Outlook
    OutlookApp := Unassigned;
    SetControlsEnabled(Self, False, []);
    SetControlsEnabled(Self, True, ['btnPause', 'btnStop']);
  end;
end;

The main application loop

function TfrmMain.IterateFolders(const Folder: OleVariant; const OLContactList: Variant; const slDupeList: TStringList; var TotalAddresses: Integer): Boolean;
var
  OLFolderList, OLFolder, OLFolder2: OleVariant;
  OLItemList, OLItem, OLContact: Variant;
  sSender, sAddress, sPath: String;
  i, j, k: Integer;
  bDupe, bBadAddress, bExcludedContact: Boolean;
  intrfTemp: IInterface;
  liEntry: TListItem;
begin
  Result := False;
  try
    // Get list of folders below Folder
    OLFolderList := Folder.Folders;

    for i := 1 to OLFolderList.Count do begin
      // Get a MAPIFolder
      OLFolder := OLFolderList.Item(i);
      if VarIsEmpty(OLFolder) then begin
        StatusMessage('Getting Folder (' + IntToStr(i) + ') object: aborting.', tstError);
        Exit;
      end;

      // Print description
      stat1.Panels[1].Text := OLFolder.Name;

      // Check whether this is an "excluded" folder
      sPath := OLFolder.Name;
      OLFolder2 := OLFolderList.Parent;
      while (Pointer(IDispatch(OLFolder2.Parent)) <> nil) do begin
        // MAPIFolder
        if Supports(OLFolder2, StringToGUID('{00063006-0000-0000-C000-000000000046}'), intrfTemp) then
          sPath := OLFolder2.Name + '.' + sPath;
        OLFolder2 := OLFolder2.Parent;
      end;
      if IsExcludedFolder(sPath) then begin
        StatusMessage('(IGNORED) excluded folder: ' + sPath, tstStatus);
        Continue;
      end;

      // Recursively iterate through sub-folders
      if OLFolder.Folders.Count > 0 then begin
        Result := IterateFolders(OLFolder, OLContactList, slDupeList, TotalAddresses);
        if not Result then
          Exit;
      end;
      
      // Iterate over folder items (i.e., emails)
      OLItemList := OLFolder.Items;

      for j := 1 to OLItemList.Count do begin
        // Running total
        stat1.Panels[3].Text := IntToStr(TotalAddresses);
        Inc(TotalAddresses);

        if j mod 50 = 0 then
          Application.ProcessMessages;

        while FPaused do begin
          Sleep(50);
          Application.ProcessMessages;
          if Application.Terminated or FStopped then
            Exit;
        end;

        if FStopped then
          Exit;

        // A _MailItem
        OLItem := OLItemList.Item(j);
        sSender := GetProperty(OLItem, PR_SENDER_NAME);
        sAddress := GetProperty(OLItem, PR_SENDER_EMAIL_ADDRESS);

        // Ignore duplicates, any messages without a valid email address, or any names or addresses in the "excluded" lists
        bBadAddress := False;
        bExcludedContact := False;
        bDupe := slDupeList.Find(LowerCase(sAddress) + '=' + LowerCase(sSender), k);
        if not bDupe then begin
          bExcludedContact := IsExcludedContact(sSender) or IsExcludedContact(sAddress);
          if not bExcludedContact then
            bBadAddress := not IsValidEmail(sAddress);
        end;

        if bDupe or bBadAddress or bExcludedContact then begin
          if bDupe then
            StatusMessage('(DUPLICATE) ' + sSender + ', ' + sAddress, tstStatus)
          else if bBadAddress then
            StatusMessage('(INVALID) ' + sSender + ', ' + sAddress, tstWarning)
          else
            StatusMessage('(EXCLUDED) ' + sSender + ', ' + sAddress, tstStatus);
          Continue;
        end;
        slDupeList.Add(LowerCase(sAddress) + '=' + LowerCase(sSender));

        // Print description
        liEntry := lv1.Items.Add;
        liEntry.Caption := IntToStr(lv1.Items.Count);
        liEntry.SubItems.Add(sSender);
        liEntry.SubItems.Add(sAddress);

        // The above presents a problem that can probably only be sorted out manually?
        // e.g., if two people with identical names are found and different email addresses
        // are they the same person with more than one address?
        // Or are they two separate individuals?
        //
        // If two different people share an email address?
        //
        // For simplicities sake, every unique name/address combo is considered a separate contact
        OLContact := OLContactList.Add;
        OLContact.FullName := sSender;
        OLContact.Email1Address := sAddress;

        // Save the new record.
        OLContact.Save;
      end;
    end;
  except
    on E: SysUtils.Exception do
      StatusMessage('(' + E.ClassName + ') ' +  E.Message, tstError);
  end;
  Result := True;
end;


Conclusion

That's about all there is to it. At the end of a scan – processing 5000 email addresses takes ~5 minutes on my 2.0GHz system, but involves a lot of disk activity, so this may be the chief bottleneck – the Contacts list of the Address Book will contain all names and email addresses found.
Be aware that Outlook will happily accept multiple identical Contact entries, and no provision has been made for this in the tool – we were starting from a “blank slate” so dealing with this wasn't a requirement; if you are updating an existing Contacts list, this may be something you need to consider. At present I leave this as an exercise for the reader.

Companion Files:
Tool source code and compiled file (419 KB):
http://www.mediafire.com/?62cmin1khrr2z9p