Translation: Professional C# 4.0 and .NET 4, Chapter 1: .NET Architecture (1)

The original text of the following material is in Professional C# 4.0 and .NET 4 -> Part 1: The C# Language -> Chapter 1: .NET Architecture
Original authors: Christian Nagel, Bill Evjen, Jay Glynn, Karli Watson, Morgan Skinner
Translation: Sam.Sha – ycoder.com

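Part 1: The C# Language

Chapter 1: .NET Architecture

In this chapter:

Compiling and running code that targets .NET
The advantages of Microsoft Intermediate Language
Value and reference types
Data typing
Understanding error handling and attributes
Assemblies, .NET base classes, and namespaces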
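Throughout this book, we emphasize that the C# language must be considered together with the .NET Framework rather than in isolation. The C# compiler specifically targets .NET, which means that all code written in C# always runs within the .NET Framework. This has two important consequences for the C# language:
1. The architecture and methodologies of C# reflect the underlying methodologies of .NET.
2. In many cases, specific language features of C# actually depend on features of .NET or of the .NET base classes.
Because of this dependence, it is important to understand the architecture and methodology of .NET before you begin C# programming. That is the purpose of this chapter.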

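The Relationship of C# to .NET

C# is a relatively new programming language and is significant in two respects:
1. It is specifically designed and targeted for use with Microsoft’s .NET Framework.
2. It is a language based on object-oriented design methodology, and when designing it Microsoft learned from the experience of all the other similar object-oriented languages.
One important thing to make clear is that C# is a language in its own right. Although it is designed to generate code that targets the .NET Framework, it is not itself part of .NET. Some features are supported by .NET but not by C#, and you might be surprised to learn that some features of C# are not supported by .NET (for example, some instances of operator overloading). However, because the C# language is intended for use with .NET, you need to understand this Framework if you want to develop applications in C# effectively. This chapter therefore takes some time to look at what .NET is like beneath the surface. Let’s get started.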

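The Common Language Runtime

Central to the .NET Framework is its runtime execution environment, known as the Common Language Runtime (CLR) or the .NET runtime. Code running under the control of the CLR is often termed managed code.
However, before it can be executed by the CLR, any source code that you write needs to be compiled. Compilation occurs in two steps:
1. Compilation of source code to Microsoft Intermediate Language (IL)
2. Compilation of IL to platform-specific code by the CLR
This two-stage compilation process is very important, because the existence of IL is the key to many of the benefits of .NET.
IL shares with Java byte code the idea that it is a low-level language with a simple syntax (based on numeric codes rather than text), which can be translated very quickly into native machine code. Having this well-defined universal syntax for code has significant advantages: platform independence, performance improvement, and language interoperability.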
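To make the phrase "based on numeric codes rather than text" concrete, the following minimal sketch uses reflection to read back the IL that the C# compiler produced for a small method. The class and method names (IlPeek, Add, GetAddIl) are made up for this illustration; the exact bytes printed depend on the compiler and its optimization settings.

```csharp
using System;
using System.Reflection;

public static class IlPeek
{
    // A trivial method whose compiled IL we want to inspect.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw IL of Add: a sequence of numeric opcodes, not text.
    public static byte[] GetAddIl()
    {
        MethodInfo method = typeof(IlPeek).GetMethod("Add");
        return method.GetMethodBody().GetILAsByteArray();
    }

    public static void Main()
    {
        // Prints the opcodes as hexadecimal bytes separated by dashes.
        Console.WriteLine(BitConverter.ToString(GetAddIl()));
    }
}
```

These bytes are the output of the first compilation step only; the CLR performs the second step, from IL to machine code, when Add is first called.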

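Platform Independence

First, platform independence means that the same file containing byte code instructions can be placed on any platform; at runtime, the final stage of compilation can then be easily accomplished so that the code runs on that particular platform. In other words, by compiling to IL you obtain platform independence for .NET, in much the same way as compiling to Java byte code gives Java platform independence.
Note that the platform independence of .NET is only theoretical at present, because a complete implementation of .NET is currently available only for Windows. However, partial implementations exist (for example, the Mono project, an effort to create an open source implementation of .NET; see www.go-mono.com).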

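Performance Improvement

Although we made comparisons with Java, IL is actually a bit more ambitious than Java byte code. IL is always Just-in-Time compiled (known as JIT compilation), whereas Java byte code was often interpreted. One of the disadvantages of Java was that the process of translating from Java byte code to native code at execution time resulted in a loss of performance (with the exception of cases where Java is JIT compiled on certain platforms).
Instead of compiling the entire application in one go (which could lead to a slow startup), the JIT compiler compiles each portion of code only as it is called, hence the name just-in-time. Once code has been compiled, the resulting native executable code is stored until the application exits, so it does not need to be recompiled the next time that portion of code runs. Microsoft argues that this process is more efficient than compiling the entire application at the start, because large portions of any application's code are likely never to be executed in a given run. With the JIT compiler, such code is never compiled.
This explains why you can expect execution of managed IL code to be almost as fast as executing native machine code. It does not yet explain why Microsoft expects a performance improvement. The reason is that, because the final stage of compilation takes place at runtime, the JIT compiler knows exactly what processor type the program is running on. This means that it can optimize the final executable code by taking advantage of any features or particular machine code instructions offered by that processor.
Traditional compilers also optimize code, but they can perform only optimizations that are independent of the particular processor the code will run on. This is because traditional compilers compile to native executable code before the software is shipped, which means the compiler does not know what type of processor the code will run on beyond basic generalities, such as whether it is an x86-compatible processor or an Alpha processor.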

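Language Interoperability

The use of IL not only enables platform independence; it also facilitates language interoperability. Simply put, you can compile to IL from one language, and the compiled code can then be called by code that was compiled to IL from another language.
You may be wondering which languages besides C# are interoperable with .NET. The following sections of the original text introduce the other commonly used languages that support .NET:
Visual Basic .NET, Visual C++, COM, and COM+. These are not covered further here; please refer to the original text.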

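Distinct Value and Reference Types

As with any programming language, IL provides a number of predefined primitive data types. One characteristic of IL, however, is that it makes a strong distinction between value types and reference types. Value types are those for which a variable directly stores its data, whereas reference types are those for which a variable stores the address at which the corresponding data can be found.

In C++ terms, using reference types is similar to accessing a variable through a pointer, whereas in Visual Basic the best analogy for reference types is objects, which in Visual Basic 6 are always accessed through references. IL also lays down specifications about data storage: instances of reference types are always stored in an area of memory known as the managed heap, whereas value types are normally stored on the stack (although if value types are declared as fields within reference types, they are stored inline on the heap). Chapter 2, “Core C#,” discusses the stack and the heap and how they work.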
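The practical difference shows up when a variable is copied. The sketch below is only an illustration; the type names PointValue and PointRef are invented for it. Assigning a struct copies the data, while assigning a class instance copies only the reference.

```csharp
using System;

// Value type: a variable of this type holds the data itself.
public struct PointValue { public int X; }

// Reference type: a variable of this type holds a reference to data on the managed heap.
public class PointRef { public int X; }

public static class ValueVsReference
{
    public static int CopyStruct()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // copies the data; a and b are now independent
        b.X = 99;
        return a.X;         // a is unchanged
    }

    public static int CopyClass()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;     // copies the reference; a and b refer to the same object
        b.X = 99;
        return a.X;         // the change made through b is visible through a
    }

    public static void Main()
    {
        Console.WriteLine(CopyStruct());
        Console.WriteLine(CopyClass());
    }
}
```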

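Strong Data Typing

One of the most important aspects of IL is that it is based on exceptionally strong data typing. This means that all variables are clearly marked as being of a particular, specific data type (there is no room in IL, for example, for the Variant data type recognized by Visual Basic and scripting languages). In particular, IL does not normally permit any operations that result in ambiguous data types.

For instance, Visual Basic 6 developers are used to passing variables around without worrying too much about their types, because Visual Basic 6 performs type conversion automatically. C++ developers are used to routinely casting pointers between different types. Being able to perform this kind of operation can be great for performance, but it breaks type safety. Hence, it is permitted only under certain circumstances in some of the languages that compile to managed code. Indeed, pointers (as opposed to references) are permitted only in marked blocks of code in C#, and not at all in Visual Basic (although they are allowed in managed C++). Using pointers in your code causes it to fail the memory type-safety checks performed by the CLR. Note that some languages compatible with .NET, such as Visual Basic 2010, still allow some laxity in typing, but that is possible only because the compilers behind the scenes ensure that type safety is enforced in the emitted IL.

Although enforcing type safety might initially appear to hurt performance, in many cases the benefits gained from the .NET services that rely on type safety far outweigh this performance loss. Such services include the following:

➤ Language interoperability

➤ Garbage collection

➤ Security

➤ Application domains

The following sections take a closer look at why strong data typing is particularly important for these features of .NET.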
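As a small illustration of what strong typing means at runtime, the sketch below tries to treat a string as an int. The class name TypeSafety is invented for this example. Rather than reinterpreting the memory, as an unchecked C++ pointer cast could, the CLR checks the actual type of the object and rejects the conversion.

```csharp
using System;

public static class TypeSafety
{
    // Returns true if the runtime refuses to treat a string as an int.
    public static bool CastFails()
    {
        object boxed = "not a number";   // the object's real type is System.String
        try
        {
            int n = (int)boxed;          // compiles, but the CLR checks the real type
            Console.WriteLine(n);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;                 // the mismatch is detected, not silently accepted
        }
    }

    public static void Main()
    {
        Console.WriteLine(CastFails());
    }
}
```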

Strong Data Typing as a Key to Language Interoperability

If a class is to derive from or contains instances of other classes, it needs to know about all the data types used by the other classes. This is why strong data typing is so important. Indeed, it is the absence of any agreed-on system for specifying this information in the past that has always been the real barrier to inheritance and interoperability across languages. This kind of information is simply not present in a standard executable file or DLL.

Suppose that one of the methods of a Visual Basic 2010 class is defined to return an Integer, one of the standard data types available in Visual Basic 2010. C# simply does not have any data type of that name. Clearly, you will be able to derive from the class, use this method, and use the return type from C# code only if the compiler knows how to map Visual Basic 2010’s Integer type to some known type that is defined in C#. So, how is this problem circumvented in .NET?

Common Type System

This data type problem is solved in .NET using the Common Type System (CTS). The CTS defines the predefined data types that are available in IL, so that all languages that target the .NET Framework will produce compiled code that is ultimately based on these types.

For the previous example, Visual Basic 2010’s Integer is actually a 32-bit signed integer, which maps exactly to the IL type known as Int32. Therefore, this will be the data type specified in the IL code. Because the C# compiler is aware of this type, there is no problem. At source code level, C# refers to Int32 with the keyword int, so the compiler will simply treat the Visual Basic 2010 method as if it returned an int.
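The mapping between the C# keyword and the underlying type can be checked directly from C#. This is a minimal sketch with an invented class name (CtsMapping); it only shows the C# side of the mapping, not the Visual Basic side.

```csharp
using System;

public static class CtsMapping
{
    // The C# keyword int and System.Int32 denote the very same type.
    public static bool IntIsInt32()
    {
        return typeof(int) == typeof(System.Int32);
    }

    // The name the runtime reports for the type behind the int keyword.
    public static string RuntimeNameOfInt()
    {
        return typeof(int).FullName;
    }

    // Size of the type in bits.
    public static int BitsInInt()
    {
        return sizeof(int) * 8;
    }

    public static void Main()
    {
        Console.WriteLine(IntIsInt32());
        Console.WriteLine(RuntimeNameOfInt());
        Console.WriteLine(BitsInInt());
    }
}
```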

The CTS does not specify merely primitive data types but a rich hierarchy of types, which includes well-defined points in the hierarchy at which code is permitted to define its own types. The hierarchical structure of the CTS reflects the single-inheritance object-oriented methodology of IL, and resembles Figure 1-1.

Figure 1-1

We will not list all the built-in value types here, because they are covered in detail in Chapter 3, “Objects and Types.” In C#, each predefined type recognized by the compiler maps onto one of the IL built-in types. The same is true in Visual Basic 2010.

Common Language Specification

The Common Language Specification (CLS) works with the CTS to ensure language interoperability. The CLS is a set of minimum standards that all compilers targeting .NET must support. Because IL is a very rich language, writers of most compilers will prefer to restrict the capabilities of a given compiler to support only a subset of the facilities offered by IL and the CTS. That is fine, as long as the compiler supports everything that is defined in the CLS.

For example, take case sensitivity. IL is case-sensitive. Developers who work with case-sensitive languages regularly take advantage of the flexibility that this case sensitivity gives them when selecting variable names. Visual Basic 2010, however, is not case-sensitive. The CLS works around this by indicating that CLS-compliant code should not expose any two names that differ only in their case. Therefore, Visual Basic 2010 code can work with CLS-compliant code.

This example shows that the CLS works in two ways.

1. Individual compilers do not have to be powerful enough to support the full features of .NET — this should encourage the development of compilers for other programming languages that target .NET.

2. If you restrict your classes to exposing only CLS-compliant features, then it guarantees that code written in any other compliant language can use your classes.

The beauty of this idea is that the restriction to using CLS-compliant features applies only to public and protected members of classes and public classes. Within the private implementations of your classes, you can write whatever non-CLS code you want, because code in other assemblies (units of managed code; see later in this chapter) cannot access this part of your code anyway.

We will not go into the details of the CLS specifications here. In general, the CLS will not affect your C# code very much because there are very few non-CLS-compliant features of C# anyway.

It is perfectly acceptable to write non-CLS-compliant code. However, if you do, the compiled IL code is not guaranteed to be fully language interoperable.
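As a rough sketch of how this looks in C#, the CLSCompliant attribute from the System namespace lets an assembly ask the compiler to check its public surface against the CLS. The Counter class and its members are invented for this illustration. Unsigned integer types such as uint are a common example of a feature that C# supports but the CLS does not, so the public member that exposes one is explicitly marked as non-compliant.

```csharp
using System;

// Ask the compiler to check the publicly visible parts of this assembly against the CLS.
[assembly: CLSCompliant(true)]

public class Counter
{
    // Private members are not subject to CLS rules, so an unsigned type is fine here.
    private ulong ticks;

    // int maps to System.Int32, which is CLS-compliant.
    public int Count { get; private set; }

    // uint (System.UInt32) is not part of the CLS, so this member is marked explicitly.
    [CLSCompliant(false)]
    public uint RawCount
    {
        get { return (uint)Count; }
    }

    public void Tick()
    {
        ticks++;
        Count++;
    }
}

public static class ClsDemo
{
    public static void Main()
    {
        Counter counter = new Counter();
        counter.Tick();
        Console.WriteLine(counter.Count);
    }
}
```

Code in another compliant language can rely on Count and Tick; whether it can use RawCount depends on that language's support for unsigned types.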

Garbage Collection

The garbage collector is .NET’s answer to memory management and in particular to the question of what to do about reclaiming memory that running applications ask for. Up until now, two techniques have been used on the Windows platform for de-allocating memory that processes have dynamically requested from the system:

➤ Make the application code do it all manually.

➤ Make objects maintain reference counts.

Having the application code responsible for de-allocating memory is the technique used by lower-level, high-performance languages such as C++. It is efficient, and it has the advantage that (in general) resources are never occupied for longer than necessary. The big disadvantage, however, is the frequency of bugs. Code that requests memory also should explicitly inform the system when it no longer requires that memory. However, it is easy to overlook this, resulting in memory leaks.

Although modern developer environments do provide tools to assist in detecting memory leaks, they remain difficult bugs to track down. That’s because they have no effect until so much memory has been leaked that Windows refuses to grant any more to the process. By this point, the entire computer may have appreciably slowed down due to the memory demands being made on it.

Maintaining reference counts is favored in COM. The idea is that each COM component maintains a count of how many clients are currently maintaining references to it. When this count falls to zero, the component can destroy itself and free up associated memory and resources. The problem with this is that it still relies on the good behavior of clients to notify the component that they have finished with it. It takes only one client not to do so, and the object sits in memory. In some ways, this is a potentially more serious problem than a simple C++-style memory leak because the COM object may exist in its own process, which means that it will never be removed by the system. (At least with C++ memory leaks, the system can reclaim all memory when the process terminates.)
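The weakness of reference counting can be sketched in a few lines. The class below is a hypothetical C# imitation of the idea, not real COM code; only the method names AddRef and Release are borrowed from COM's IUnknown interface.

```csharp
using System;

// Hypothetical reference-counted resource, imitating the COM idea in plain C#.
public class RefCountedResource
{
    private int refCount;

    public bool IsDestroyed { get; private set; }

    // A client calls this when it starts using the resource.
    public int AddRef()
    {
        return ++refCount;
    }

    // A client calls this when it has finished with the resource.
    public int Release()
    {
        if (--refCount == 0)
        {
            IsDestroyed = true;   // the last client has gone; free the resource
        }
        return refCount;
    }
}

public static class RefCountDemo
{
    // Both clients release, so the count reaches zero.
    public static bool WellBehavedClients()
    {
        RefCountedResource resource = new RefCountedResource();
        resource.AddRef();
        resource.AddRef();
        resource.Release();
        resource.Release();
        return resource.IsDestroyed;
    }

    // One client never calls Release, so the resource is never freed.
    public static bool ForgetfulClient()
    {
        RefCountedResource resource = new RefCountedResource();
        resource.AddRef();
        resource.AddRef();
        resource.Release();
        return resource.IsDestroyed;
    }

    public static void Main()
    {
        Console.WriteLine(WellBehavedClients());
        Console.WriteLine(ForgetfulClient());
    }
}
```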

The .NET runtime relies on the garbage collector instead. The purpose of this program is to clean up memory. The idea is that all dynamically requested memory is allocated on the heap (that is true for all languages, although in the case of .NET, the CLR maintains its own managed heap for .NET applications to use). Every so often, when .NET detects that the managed heap for a given process is becoming full and therefore needs tidying up, it calls the garbage collector. The garbage collector runs through variables currently in scope in your code, examining references to objects stored on the heap to identify which ones are accessible from your code, that is, which objects have references that refer to them. Any objects that are not referred to are deemed to be no longer accessible from your code and can therefore be removed. Java uses a system of garbage collection similar to this.

Garbage collection works in .NET because IL has been designed to facilitate the process. The principle requires that you cannot get references to existing objects other than by copying existing references and that IL be type safe. In this context, what we mean is that if any reference to an object exists, then there is sufficient information in the reference to exactly determine the type of the object.

It would not be possible to use the garbage collection mechanism with a language such as unmanaged C++, for example, because C++ allows pointers to be freely cast between types.

One important aspect of garbage collection is that it is not deterministic. In other words, you cannot guarantee when the garbage collector will be called; it will be called when the CLR decides that it is needed, though it is also possible to override this process and call up the garbage collector in your code.
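Calling up the garbage collector from code is done through the System.GC class. The sketch below, with an invented class name GcDemo, allocates some short-lived arrays, forces a collection, and reports how many collections the runtime counted in the meantime. It is meant only to show that the call exists; forcing collections is rarely a good idea in real applications.

```csharp
using System;

public static class GcDemo
{
    // Returns how many generation 0 collections happened during the call.
    public static int ForceCollection()
    {
        int before = GC.CollectionCount(0);

        // Create objects that become unreachable as soon as each iteration ends.
        for (int i = 0; i < 1000; i++)
        {
            byte[] garbage = new byte[1024];
            garbage[0] = 1;
        }

        GC.Collect();                    // ask the CLR to collect now
        GC.WaitForPendingFinalizers();   // let any finalizers finish

        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine(ForceCollection());
    }
}
```

The number printed can differ from run to run, because the CLR may also have collected on its own while the loop was allocating.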

Look to Chapter 13, “Memory Management and Pointers,” for more information on the garbage collection process.

Security

.NET can really excel in terms of complementing the security mechanisms provided by Windows because it can offer code-based security, whereas Windows really offers only role-based security.

Role-based security is based on the identity of the account under which the process is running (that is, who owns and is running the process). Code-based security, by contrast, is based on what the code actually does and on how much the code is trusted. Thanks to the strong type safety of IL, the CLR is able to inspect code before running it to determine required security permissions. .NET also offers a mechanism by which code can indicate in advance what security permissions it will require to run.

The importance of code-based security is that it reduces the risks associated with running code of dubious origin (such as code that you have downloaded from the Internet). For example, even if code is running under the administrator account, it is possible to use code-based security to indicate that that code should still not be permitted to perform certain types of operations that the administrator account would normally be allowed to do, such as read or write to environment variables, read or write to the registry, or access the .NET reflection features.

Security issues are covered in more depth in Chapter 21, “Security.”

Application Domains

Application domains are an important innovation in .NET and are designed to ease the overhead involved when running applications that need to be isolated from each other but that also need to be able to communicate with each other. The classic example of this is a web server application, which may be simultaneously responding to a number of browser requests. It will, therefore, probably have a number of instances of the component responsible for servicing those requests running simultaneously.

In pre-.NET days, the choice would be between allowing those instances to share a process (with the resultant risk of a problem in one running instance bringing the whole web site down) or isolating those instances in separate processes (with the associated performance overhead).

Up until now, the only means of isolating code has been through processes. When you start a new application, it runs within the context of a process. Windows isolates processes from each other through address spaces. The idea is that each process has available 4GB of virtual memory in which to store its data and executable code (4GB is for 32-bit systems; 64-bit systems use more memory). Windows imposes an extra level of indirection by which this virtual memory maps into a particular area of actual physical memory or disk space. Each process gets a different mapping, with no overlap between the actual physical memories that the blocks of virtual address space map to (see Figure 1-2).

In general, any process is able to access memory only by specifying an address in virtual memory; processes do not have direct access to physical memory. Hence, it is simply impossible for one process to access the memory allocated to another process. This provides an excellent guarantee that any badly behaved code will not be able to damage anything outside of its own address space. (Note that on Windows 95/98, these safeguards are not quite as thorough as they are on Windows NT/2000/XP/2003/Vista/7, so the theoretical possibility exists of applications crashing Windows by writing to inappropriate memory.)

Processes do not just serve as a way to isolate instances of running code from each other. On Windows NT/2000/XP/2003/Vista/7 systems, they also form the unit to which security privileges and permissions are assigned. Each process has its own security token, which indicates to Windows precisely what operations that process is permitted to do.

Although processes are great for security reasons, their big disadvantage is in the area of performance. Often, a number of processes will actually be working together, and therefore need to communicate with each other. The obvious example of this is where a process calls up a COM component, which is an executable and therefore is required to run in its own process. The same thing happens in COM when surrogates are used. Because processes cannot share any memory, a complex marshaling process must be used to copy data between the processes. This results in a very significant performance hit. If you need components to work together and do not want that performance hit, you must use DLL-based components and have everything running in the same address space — with the associated risk that a badly behaved component will bring everything else down.

Application domains are designed as a way of separating components without resulting in the performance problems associated with passing data between processes. The idea is that any one process is divided into a number of application domains. Each application domain roughly corresponds to a single application, and each thread of execution will be running in a particular application domain (see Figure 1-3).

If different executables are running in the same process space, then they are clearly able to easily share data, because, theoretically, they can directly see each other’s data. However, although this is possible in principle, the CLR makes sure that this does not happen in practice by inspecting the code for each running application to ensure that the code cannot stray outside of its own data areas. This looks, at first, like an almost impossible task to pull off — after all, how can you tell what the program is going to do without actually running it?

In fact, it is usually possible to do this because of the strong type safety of the IL. In most cases, unless code is using unsafe features such as pointers, the data types it is using will ensure that memory is not accessed inappropriately. For example, .NET array types perform bounds checking to ensure that no out-of-bounds array operations are permitted. If a running application does need to communicate or share data with other applications running in different application domains, it must do so by calling on .NET’s remoting services.

Code that has been verified to check that it cannot access data outside its application domain (other than through the explicit remoting mechanism) is said to be memory type safe. Such code can safely be run alongside other type-safe code in different application domains within the same process.
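The array bounds checking mentioned above is easy to observe. In this minimal sketch (the class name BoundsCheck is invented), writing one element past the end of an array does not touch neighboring memory; the CLR raises an IndexOutOfRangeException instead.

```csharp
using System;

public static class BoundsCheck
{
    // Returns true if the runtime rejects a write past the end of the array.
    public static bool OutOfBoundsIsRejected()
    {
        int[] data = new int[3];   // valid indexes are 0, 1, and 2
        int index = 3;             // one past the end

        try
        {
            data[index] = 42;      // checked by the CLR at runtime
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;           // no memory outside the array was written
        }
    }

    public static void Main()
    {
        Console.WriteLine(OutOfBoundsIsRejected());
    }
}
```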

Error Handling with Exceptions

The .NET Framework is designed to facilitate handling of error conditions using the same mechanism, based on exceptions, that is employed by Java and C++. C++ developers should note that because of IL’s stronger typing system, there is no performance penalty associated with the use of exceptions with IL in the way that there is in C++. Also, the finally block, which has long been on many C++ developers’ wish lists, is supported by .NET and by C#.

Exceptions are covered in detail in Chapter 15, “Errors and Exceptions.” Briefly, the idea is that certain areas of code are designated as exception handler routines, with each one able to deal with a particular error condition (for example, a file not being found, or being denied permission to perform some operation). These conditions can be defined as narrowly or as widely as you want. The exception architecture ensures that when an error condition occurs, execution can immediately jump to the exception handler routine that is most specifically geared to handle the exception condition in question.

The architecture of exception handling also provides a convenient means to pass an object containing precise details of the exception condition to an exception-handling routine. This object might include an appropriate message for the user and details of exactly where in the code the exception was detected.

Most exception-handling architecture, including the control of program flow when an exception occurs, is handled by the high-level languages (C#, Visual Basic 2010, C++), and is not supported by any special IL commands. C#, for example, handles exceptions using try{}, catch{}, and finally{} blocks of code. (For more details, see Chapter 15.)

What .NET does do, however, is provide the infrastructure to allow compilers that target .NET to support exception handling. In particular, it provides a set of .NET classes that can represent the exceptions, and the language interoperability to allow the thrown exception objects to be interpreted by the exception-handling code, regardless of what language the exception-handling code is written in. This language independence is absent from both the C++ and Java implementations of exception handling, although it is present to a limited extent in the COM mechanism for handling errors, which involves returning error codes from methods and passing error objects around. The fact that exceptions are handled consistently in different languages is a crucial aspect of facilitating multi-language development.
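The try, catch, and finally blocks described in this section can be sketched as follows. The class name ExceptionFlow and the log messages are invented for the illustration, and the exception is thrown by hand rather than by a real file operation. The method records the order in which the blocks run.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExceptionFlow
{
    // Returns the sequence of blocks that executed.
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        try
        {
            log.Add("try");
            throw new FileNotFoundException("demo: file not found");
        }
        catch (FileNotFoundException)   // the most specific handler is listed first
        {
            log.Add("catch FileNotFoundException");
        }
        catch (IOException)             // a wider handler for other I/O errors
        {
            log.Add("catch IOException");
        }
        finally
        {
            log.Add("finally");         // runs whether or not an exception occurred
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

Because FileNotFoundException derives from IOException, the wider handler would also match; execution goes to the first matching handler in source order, which is why the narrower one is listed first.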

Use of Attributes

Attributes are familiar to developers who use C++ to write COM components (through their use in Microsoft’s COM Interface Definition Language [IDL]). The initial idea of an attribute was that it provided extra information concerning some item in the program that could be used by the compiler.

Attributes are supported in .NET — and hence now by C++, C#, and Visual Basic 2010. What is, however, particularly innovative about attributes in .NET is that you can define your own custom attributes in your source code. These user-defined attributes will be placed with the metadata for the corresponding data types or methods. This can be useful for documentation purposes, in which they can be used in conjunction with reflection technology to perform programming tasks based on attributes. In addition, in common with the .NET philosophy of language independence, attributes can be defined in source code in one language and read by code that is written in another language.
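A custom attribute and its retrieval through reflection might look like the sketch below. The names ReviewedByAttribute, Invoice, and AttributeReader, and the reviewer value, are all invented for this illustration.

```csharp
using System;

// A user-defined attribute; its data is stored in the metadata of whatever it decorates.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class ReviewedByAttribute : Attribute
{
    public string Reviewer { get; private set; }

    public ReviewedByAttribute(string reviewer)
    {
        Reviewer = reviewer;
    }
}

// Applying the attribute; the "Attribute" suffix may be omitted here.
[ReviewedBy("sam")]
public class Invoice
{
}

public static class AttributeReader
{
    // Uses reflection to read the attribute back; returns null if it is absent.
    public static string ReviewerOf(Type type)
    {
        object[] found = type.GetCustomAttributes(typeof(ReviewedByAttribute), false);
        return found.Length == 0 ? null : ((ReviewedByAttribute)found[0]).Reviewer;
    }

    public static void Main()
    {
        Console.WriteLine(ReviewerOf(typeof(Invoice)));
    }
}
```

Because the attribute lives in the assembly's metadata, the reading code does not have to be written in the same language as the code that applied it.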

Attributes are covered in Chapter 14, “Reflection.”

