Assembling assemblies

Thomas Lee, October 7, 2009

In last month’s column, I presented an overview of .NET, looking at its structure and contents. This month, I want to look in more detail at .NET assemblies: what they are, why they matter and what you can do with them. I’ll be illustrating these concepts using some simple C# code and the tools supplied with both the .NET Framework and the .NET Framework SDK. First, though, an update on the current release status of the .NET Framework.

Between the time I submitted my first column and this one, Microsoft has released the latest version of the .NET Framework, .NET 2, along with Visual Studio 2005 and SQL Server 2005. Last month, I gave you some pointers to the relevant downloads, some of which may have now changed, so I’ve set up a new page at www.reskit.net, which contains all the URLs I’ll be referring to in each article. These URLs are updated whenever necessary, so they should be up to date. I’ll also be including additional pointers to more information, code snippets and anything else I can think of that you might find useful.

Assembly Basics

Before diving into how assemblies work, I think it’s useful to talk a little about how Windows loads programs and how .NET fits into the scheme of things. The Windows loader is an OS component that’s responsible for loading an executable program into memory and starting it running – a vital component that we tend to take for granted. In the Win32 environment, files that are executable – that is, EXE and DLL files – have a common internal format known as Portable Executable, or PE, format.

The ‘portable’ bit refers to the fact that at one time executable files could run on multiple platforms, so a common file format was needed. The PE format initially came from the VAX world, but was adopted for Windows NT. The basic PE format was first devised in the days of MS-DOS, more than 20 years ago, but has been modified over time, and for .NET and 64-bit Windows it’s undergone further modifications. (While most things have moved on, it’s interesting that the first two bytes of any PE file still contain the characters ‘MZ’ after Mark Zbikowski, one of the original DOS architects.)

PE files start off with a small section of MS-DOS executable code, which was useful in the early days of Windows: if you tried to run a Windows executable on a machine without Windows loaded, the program could display a message pointing out that Windows was required. For more information about the PE format, see msdn.microsoft.com.
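To see the ‘MZ’ signature for yourself, you can read the first two bytes of any EXE or DLL. The following is a minimal sketch rather than a full PE parser; the class name PeCheck and the method HasMzSignature are names I’ve made up for illustration, and the check only looks at the DOS signature, not the rest of the PE header.

```csharp
using System;
using System.IO;

public class PeCheck
{
    // Hypothetical helper: returns true if the file starts with the
    // two ASCII bytes 'M' and 'Z' (the MS-DOS executable signature).
    public static bool HasMzSignature(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            int first = fs.ReadByte();   // ReadByte returns -1 at end of file
            int second = fs.ReadByte();
            return first == 'M' && second == 'Z';
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: pecheck <file>");
            return;
        }
        Console.WriteLine("{0}: MZ signature {1}", args[0],
            HasMzSignature(args[0]) ? "found" : "not found");
    }
}
```

Point it at any executable on your system; a text file or an empty file should report that the signature was not found.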

With that bit of background in place, what exactly is an assembly and why does it play a key role in the .NET architecture? An assembly is a unit of program code that you build and deploy as a single unit – much like a traditional EXE or DLL file. The assembly is the key unit of deployment under .NET, and an assembly is also a security boundary, in that you can grant permissions at the level of the single assembly (I’ll be covering .NET security more fully in a later article or articles). Figure 1 shows the contents of a typical assembly, which includes:

Assembly metadata: information that describes the assembly in detail.

Type metadata: information about the types this assembly implements.

Code: this is Microsoft Intermediate Language (MSIL) code that implements the types defined in this assembly. MSIL code is compiled to native code by the Just-in-Time compiler whenever the application is run. As I’ll cover in a future article, you can also use the ngen.exe utility to precompile an assembly into native code that resides in the native image cache, a reserved area of the Global Assembly Cache (GAC).

There are, broadly speaking, two types of assembly: process assemblies and library assemblies. A process assembly, typically an EXE file, represents a process that will use classes defined in library assemblies. Unlike Win32, .NET does not employ the filename extension to determine whether the file is a process or library, which means a library may have either a DLL or an EXE extension.

An assembly can be made up of one or more files, which may include multiple code files and a separate manifest. For example, an assembly might contain some executable code plus some resources such as JPEG and BMP image files. Since you can use different languages to create individual code modules under .NET, you could in theory combine several separately compiled code modules to create just one assembly, but in practice this doesn’t happen, because Visual Studio only allows you to create an assembly made up of a single code module. If you build a .NET app, such as a console application or Windows app, you must have at least one assembly: your application. Your app may have functionality contained in separate DLLs – for performance reasons, to simplify development or for localisation – and in such cases you may have multiple assemblies to manage.

While I haven’t talked about ASP.NET in much detail yet, assemblies are also used there. In the case of ASP.NET, you’ll typically have an assembly for each page of your ASP.NET website, plus additional assemblies that perform common logic to be shared among the rest of the pages.

There are two tools you can use to look inside an assembly and view its contents, and it would be a good idea to get both of them and start looking inside some assemblies to get acquainted. The first tool is Microsoft’s Intermediate Language Disassembler, or ILDASM, contained in the .NET Software Development Kit (SDK), which you can download from msdn.microsoft.com. The second program is Reflector, a class browser for .NET components, which you can use to browse and search an assembly’s metadata, IL instructions, resources and XML documentation. Written by Lutz Roeder, Reflector is a free download. For more information about the tool and to download a copy, see www.aisto.com

Building Assemblies

Building an assembly is generally done with one of the .NET language compilers, and to illustrate the concept of assemblies and how they work I’ve created two simple C# programs. The first is called Client (client.exe), which makes use of an external DLL to perform a calculation and print the results. Here’s the source code for the client program:

using System;

public class Client
{
    public static void Main()
    {
        Console.WriteLine("In Client.exe");
        Maths m = new Maths();
        long r1 = 2;
        long r2 = 2;
        long r = m.Add(r1, r2);
        Console.WriteLine("{0} + {1} = {2}", r1, r2, r);
    }
}

The second component, maths.dll, implements the external Maths library used by client.exe, and its source looks like this:

using System;

public class Maths
{
    public long Add(long a, long b)
    {
        Console.WriteLine("In Maths.dll");
        return a + b;
    }
}

To build these two components, let’s use the C# compiler, csc.exe, that ships with the .NET Framework (of course, to do this you must first have the Framework installed). Once you’ve created the two source files client.cs and maths.cs, you compile them from the command line as follows:

[MSH] C:\demo>csc /t:library maths.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50215.44
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50215

[MSH] C:\demo>csc /r:maths.dll client.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50215.44
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50215

[MSH] C:\demo>ls

    Directory: FileSystem::C:\demo

Mode    LastWriteTime         Length  Name
----    -------------         ------  ----
-a---   12/2/2005  2:23 PM       309  client.cs
-a---   12/4/2005  5:15 PM      3584  Client.exe
-a---   12/2/2005  2:38 PM       153  maths.cs
-a---   12/4/2005  5:15 PM      3584  maths.dll

By compiling both programs, you’ve just created two assemblies: client.exe and maths.dll. If you run client.exe, you’ll see something like this:

[MSH] C:\demo>./client
In Client.exe
In Maths.dll
2 + 2 = 4

Building and using .NET assemblies isn’t much different from using non-managed applications – .NET applications run like all the other applications you’re used to, although you need the .NET Framework installed, both to compile and to run the program. Also, when you run the client program, the Framework needs to be able to find and load maths.dll. I’ll come back to the issue of how .NET finds components later in this article.

The Manifest

A key feature of assemblies is that they’re self-describing, through the use of a manifest. The manifest is part of every assembly’s metadata and describes precisely what that assembly contains. In the example above, there are two assemblies (client.exe and maths.dll) and therefore two manifests. For maths.dll, the manifest would show the exported Maths class, while for client.exe the manifest shows the Main routine is exported. The CLR uses the information in an assembly’s manifest to resolve references, to enforce binding to specific versions of an assembly and to ensure the integrity of a loaded assembly. The manifest contains at least the following:

Assembly name: a text string that names the assembly (by default, the filename without its extension).

Version number: expressed as four numbers (for example, 1.2.3.4) that represent, in order, the major version, minor version, build and revision numbers.

Culture: information about what language (aka culture) this assembly supports. This is used to create language-specific versions of an assembly, known as satellite assemblies.

Strong name information: the public key, if this assembly has been given a strong name.

File list: a list of the files contained in the assembly and a hash of each file (to detect changes to the files contained in the assembly).

Type information: metadata about the contents of the assembly. This information is used by the CLR at runtime to map references to a component onto the file containing that component (for example, mapping the reference to the Maths class within client.exe to the file maths.dll).

Information on referenced assemblies: a list of other assemblies used by this assembly.
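Much of the manifest information listed above can also be read from code through the System.Reflection namespace. The sketch below prints the name, version, culture and referenced assemblies of whichever assembly it’s compiled into; the class name ManifestInfo and the helper FormatVersion are illustrative names of my own, and the output will vary with your compiler and Framework version.

```csharp
using System;
using System.Reflection;

public class ManifestInfo
{
    // Hypothetical helper: spells out the four parts of a version number
    // in the order major, minor, build, revision.
    public static string FormatVersion(Version v)
    {
        return String.Format("major={0} minor={1} build={2} revision={3}",
            v.Major, v.Minor, v.Build, v.Revision);
    }

    public static void Main()
    {
        // Inspect the assembly that contains this code.
        Assembly asm = Assembly.GetExecutingAssembly();
        AssemblyName name = asm.GetName();

        Console.WriteLine("Name:    {0}", name.Name);
        Console.WriteLine("Version: {0}", FormatVersion(name.Version));

        // An empty culture name means the assembly is culture-neutral.
        string culture = name.CultureInfo == null ? "" : name.CultureInfo.Name;
        Console.WriteLine("Culture: {0}", culture.Length == 0 ? "neutral" : culture);

        // List the other assemblies this one refers to.
        foreach (AssemblyName reference in asm.GetReferencedAssemblies())
        {
            Console.WriteLine("References: {0}, Version={1}",
                reference.Name, reference.Version);
        }
    }
}
```

If you compile this without any version attribute, you should expect the version to come out as all zeros, since that’s the compiler’s default.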

To look inside an assembly and view its manifest, use ILDASM, as shown in Figure 2. As you can see, the manifest lists the external assemblies the client program uses (for example, mscorlib and maths) and defines the content of the assembly, including the version of the assembly and details of the actual runtime module (client.exe). The manifest for the maths DLL looks similar, although since it’s a DLL called by the client it contains no reference to the client, but it does contain a definition of the Maths class contained in maths.dll.

Identifying Assemblies

An important feature of any runtime framework is that it can find the various files that contain needed components. In traditional Win32 programming, a program is identified by name only, so when a client program calls an external DLL, that DLL is identified by only the name of its file. If your Win32 app calls a library routine that some other app’s installer later overwrites, your app may or may not work correctly – this crude feature of Win32 is not so fondly referred to in the business as ‘DLL Hell’.

The .NET Framework addresses this problem by introducing the concept of a ‘strong name’. In Win32, and by default in .NET as shown in the example above, assemblies are weakly named and just use the filename as a way of identifying a unit of code to load and execute. A strong name consists of four components: the assembly’s simple text name, its version, an optional culture name, and a public key with an associated digital signature. To create an assembly with a strong name, you have to carry out a few simple steps. First, you generate a key-pair using the strong name program sn.exe, which gets installed with the .NET SDK. In .NET Framework 1.0 and 1.1, the key size was fixed at 1,024 bits, but in .NET Framework 2 you can have longer keys.

To create a key-pair with sn.exe, use the -k switch, specifying an output filename and an optional key length, as follows:

[MSH] C:\demo> sn -k 2048 pcprokey.snk

Microsoft (R) .NET Framework Strong Name Utility Version 2.0.50727.42

Key pair written to pcprokey.snk

Once you’ve created the key-pair, you need to add attributes to the app’s source. Attributes are special code statements placed in an app’s source files that tell the .NET language compiler, among other things, how to create a strongly named assembly. For example, you can add the appropriate attribute for the version number to a copy of the maths.cs source file (saved as maths2.cs, the name used in the build command further on) like this:

using System;
using System.Reflection;

[assembly: AssemblyVersionAttribute("1.2.3.4")]

public class Maths
{
    public long Add(long a, long b)
    {
        Console.WriteLine("In strong named maths.dll");
        return a + b;
    }
}

Lastly, you need to recompile the library using the key you just generated, rebuild the client against it and then run the result. This looks like:

[MSH] C:\demo>csc /t:library /out:maths.dll maths2.cs /keyfile:pcprokey.snk
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.42
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727

[MSH] C:\demo>csc /r:maths.dll client.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.42
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727

[MSH] C:\demo>./Client.exe
In Client.exe
In strong named maths.dll
2 + 2 = 4
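A quick way to tell whether an assembly carries a strong name is to ask for its public key token, which is a short hash of the public key. The sketch below assumes that an empty token means a weakly named assembly; the class and method names are my own, and note that this only checks that a key is recorded in the assembly’s name, not that the signature is valid.

```csharp
using System;
using System.Reflection;

public class StrongNameCheck
{
    // Hypothetical helper: an assembly with no public key token
    // is treated as weakly named.
    public static bool IsStrongNamed(Assembly asm)
    {
        byte[] token = asm.GetName().GetPublicKeyToken();
        return token != null && token.Length > 0;
    }

    public static void Main()
    {
        // The core library that defines System.Object.
        Assembly core = typeof(object).Assembly;
        // The assembly this code is compiled into.
        Assembly self = Assembly.GetExecutingAssembly();

        Console.WriteLine("{0}: strong named = {1}",
            core.GetName().Name, IsStrongNamed(core));
        Console.WriteLine("{0}: strong named = {1}",
            self.GetName().Name, IsStrongNamed(self));
    }
}
```

Compiled without a key file, the program’s own assembly should report that it isn’t strong named; add /keyfile:pcprokey.snk to the csc command line and the answer should change.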

Finding Assemblies At Run Time

The CLR locates and binds to an assembly whenever one running assembly needs to use another, such as our client.exe app calling maths.dll. References are usually static, like client.exe’s reference to maths.dll, but .NET enables you to create dynamic references on-the-fly using its reflection capabilities. The CLR locates the needed assembly, using the following steps:

The CLR examines all applicable configuration files, including the app configuration file, publisher policy file and machine configuration file. I’ll cover these configuration files in next month’s column, as they’re important in deployment of apps.

The CLR then checks to see if the assembly has previously been loaded and, if so, uses the previously loaded assembly memory image.

The CLR checks the Global Assembly Cache (GAC) and, if the assembly is found in there, it’s used. I’ll be covering the GAC in the next column too.

Finally, the CLR locates the assembly, either by using the codebase directive specified in the app’s configuration file or via a process known as probing, described below.
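The same lookup applies to the dynamic references mentioned earlier, where code finds a type by name at run time instead of naming it at compile time. Here’s a minimal sketch of such a reference using reflection. To keep it self-contained I’ve put a cut-down Maths class in the same assembly; DynamicClient and AddDynamically are illustrative names of my own, and with a separate library you would obtain the assembly with something like Assembly.Load("maths") instead.

```csharp
using System;
using System.Reflection;

// Cut-down copy of the Maths class so the sketch compiles on its own.
public class Maths
{
    public long Add(long a, long b)
    {
        return a + b;
    }
}

public class DynamicClient
{
    // Hypothetical helper: looks up a type by name in the given assembly,
    // creates an instance and calls its Add method through reflection.
    public static long AddDynamically(Assembly asm, string typeName, long a, long b)
    {
        Type mathsType = asm.GetType(typeName, true);   // throws if not found
        object maths = Activator.CreateInstance(mathsType);
        MethodInfo add = mathsType.GetMethod("Add");
        return (long)add.Invoke(maths, new object[] { a, b });
    }

    public static void Main()
    {
        // Maths lives in this same assembly here, so no other file is needed.
        Assembly asm = Assembly.GetExecutingAssembly();
        Console.WriteLine("2 + 2 = {0}", AddDynamically(asm, "Maths", 2, 2));
    }
}
```

Nothing in DynamicClient mentions the Maths type directly at the call site, so the compiler never checks the call; a misspelt type or method name only shows up as an exception at run time.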

Probing for an assembly isn’t unlike the way Windows finds an app whose name you type at a DOS Command Prompt or into the Start/Run dialog box. The CLR first looks in the folder from which the app was launched: if not found there, the CLR then looks in the folders contained in the privatePath attribute of the <probing> element, contained in the app’s configuration file.
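As a rough illustration of the probing setting, here’s what an app configuration file for the client might look like. By convention the file sits next to the executable and takes its name plus .config (so client.exe.config here); the subfolder names bin and lib are purely my own assumptions for the example, and privatePath can only name folders beneath the app’s own folder.

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <!-- "bin" and "lib" are hypothetical subfolders of the app's folder;
           separate multiple folders with semicolons. -->
      <probing privatePath="bin;lib" />
    </assemblyBinding>
  </runtime>
</configuration>
```

With a file like this in place, you could move maths.dll into either subfolder and the client should still find it.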

.NET gives you considerable flexibility in deployment. To deploy ‘simple’ apps, put all the related assemblies (the app and all of its libraries) into the same folder and just Xcopy this folder over to the client system. As I’ll show you next month, for shared assemblies you can use the GAC. And you can use publisher policies to enable the CLR to redirect calls to updated versions of libraries.

In this article, we’ve seen what an assembly is, what it contains and how to create one. We’ve also examined how the .NET CLR locates and runs an assembly. Next month, I’ll be looking at how you deploy .NET apps and examining the role of the GAC and the use of publisher policies.