Is the goal of clean componentization defeated by sharing too much type information between libraries? Maybe you need efficient, strongly typed data storage, but updating your database schema every time the object model evolves would be too costly, so you'd rather infer the type schema at runtime? Do you need to deliver components that accept arbitrary user objects and handle them in some intelligent way? Would you like code compiled by others to be able to tell you programmatically what its types look like?
If you find yourself struggling to maintain strongly typed data structures while maximizing runtime flexibility, you'll probably want to consider reflection and how it can improve your software. In this column, I'll explore the System.Reflection namespace in the Microsoft .NET Framework and how it can benefit your development. I'll start with some simple examples and end with real-world serialization scenarios. Along the way, I'll show how reflection and CodeDom work together to handle runtime data efficiently.
Before I delve into System.Reflection, I'd like to discuss reflective programming in general. Reflection can be defined as any functionality provided by a programming system that allows programmers to inspect and manipulate code entities without prior knowledge of their identity or formal structure. There's a lot packed into that definition, so I'll work through it piece by piece.
First, what does reflection provide? What can you do with it? I tend to divide typical reflection-centric tasks into two categories: inspection and manipulation. Inspection involves analyzing objects and types to gather structured information about their definition and behavior. Apart from a few basic starting points, this is usually done without prior knowledge of them. (For example, in the .NET Framework everything inherits from System.Object, and a reference of type object is often the generic starting point for reflection.)
Manipulation uses the information gathered through inspection to invoke code dynamically, create new instances, or even restructure types and objects on the fly. One important point is that, on most systems, manipulating types and objects at runtime is slower than performing the equivalent operations statically in source code. This is a necessary trade-off given the dynamic nature of reflection, but there are many tips and best practices for optimizing reflection performance (see msdn.microsoft.com/msdnmag/issues/05/07/Reflection for more in-depth information on optimizing the use of reflection).
So what is the target of reflection? What does the programmer actually inspect and manipulate? In my definition of reflection I used the term "code entity" to emphasize the fact that, from the programmer's perspective, reflection techniques sometimes blur the line between traditional objects and types. For example, a typical reflection-centric task might be:
Start with a handle to object O and use reflection to obtain a handle to its associated definition, type T.
Examine type T and obtain a handle to its method M.
Invoke method M on another object O' (also of type T).
Note that I'm moving from one instance to its underlying type, from that type to a method, and then using the method's handle to call it on another instance; this is obviously something that cannot be expressed with traditional C# source code. After discussing the .NET Framework's System.Reflection below, I'll return to this scenario with a concrete example.
Some programming languages provide reflection natively through syntax, while other platforms and frameworks (such as the .NET Framework) provide it as a system library. However it is provided, the degree to which reflection techniques can be applied in a given situation is surprisingly involved. A programming system's ability to provide reflection depends on many factors: Does the programmer make good use of the programming language's features to express his concepts? Does the compiler embed enough structured information (metadata) in its output to facilitate later analysis and interpretation? Is there a runtime subsystem or host interpreter that digests that metadata? Does the platform library expose the results of that interpretation in a way that is useful to programmers?
If you have a complex, object-oriented type system in mind, but it appears in code only as simple, C-style functions with no formal data structures, then it is obviously impossible for your program to infer dynamically that the pointer in some variable v1 refers to an object instance of some type T. After all, type T is a concept in your head; it never appears explicitly in your program. But if you use a more expressive object-oriented language such as C# to capture the abstract structure of the program, introducing the concept of type T directly, then the compiler converts your idea into a form that can later be understood by the appropriate logic, such as that provided by the common language runtime (CLR) or some dynamic language interpreter.
Is reflection entirely a dynamic, runtime technology? Simply put, no. There are many points throughout the development and execution cycle at which reflection is available and useful to developers. Some programming languages are implemented by stand-alone compilers that convert high-level code directly into instructions the machine can understand. The output file contains only the compiled input, and at run time there is no supporting logic for accepting an opaque object and dynamically analyzing its definition. This is exactly the case with many traditional C compilers. Because there is so little supporting logic in the target executable, you can't do much dynamic reflection, but such compilers do provide static reflection from time to time; for example, the ubiquitous typeof operator allows programmers to examine type identifiers at compile time.
At the other end of the spectrum are interpreted programming languages, which always execute through a host process (scripting languages usually fall into this category). Since the complete definition of the program is available (as the input source code), along with the complete language implementation (as the interpreter itself), all the machinery needed to support self-analysis is in place. Such dynamic languages frequently provide comprehensive reflection capabilities, as well as a rich set of tools for dynamically analyzing and manipulating programs.
The .NET Framework CLR and its hosted languages, such as C#, sit in the middle. The compiler converts source code into IL and metadata, which are lower level, or less "logical," than the source code but still retain much of its abstract structure and type information. Once the CLR starts up and hosts such a program, the System.Reflection library in the base class library (BCL) can consume this information and answer questions about an object's type, the type's members, member signatures, and so on. In addition, it supports invocation, including late-bound calls.
Reflection in .NET
To take advantage of reflection when programming with the .NET Framework, you can use the System.Reflection namespace. This namespace provides classes that encapsulate many runtime concepts, such as assemblies, modules, types, methods, constructors, fields, and properties. The table in Figure 1 shows how the classes in System.Reflection map to their conceptual runtime counterparts.
Although important, System.Reflection.Assembly and System.Reflection.Module are primarily used to locate and load new code into the runtime. In this column, I will not discuss these parts and assume that all relevant code has already been loaded.
To inspect and manipulate loaded code, the typical pattern centers on System.Type. Typically, you start by obtaining a System.Type instance for the runtime type of interest (via Object.GetType). You can then use the various methods of System.Type to explore the type's definition and obtain instances of other System.Reflection classes. For example, if you are interested in a specific method, you can obtain a System.Reflection.MethodInfo instance for that method (perhaps through Type.GetMethod). Likewise, if you are interested in a field, you can obtain a System.Reflection.FieldInfo instance for that field (perhaps through Type.GetField).
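In code, the pattern looks roughly like the following minimal sketch; MyClass, MyMethod1, and myField are hypothetical names used only for illustration.

using System;
using System.Reflection;

class MyClass
{
    public int myField;
    public void MyMethod1() { }
}

class PatternDemo
{
    static void Main()
    {
        MyClass mc = new MyClass();

        // Start from the object, walk to its type, then to its members.
        Type t = mc.GetType();
        MethodInfo method = t.GetMethod("MyMethod1");
        FieldInfo field = t.GetField("myField");

        Console.WriteLine(t.Name);            // MyClass
        Console.WriteLine(method.Name);       // MyMethod1
        Console.WriteLine(field.FieldType);   // System.Int32
    }
}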
Once you have the reflection objects you need, you continue with inspection or manipulation as appropriate. For inspection, you use the various descriptive properties of the reflection classes to get the information you need (Is this a generic type? Is this an instance method?). For manipulation, you can dynamically invoke methods, create new objects by calling constructors, and so on.
Inspecting Types and Members
Let's jump into some code and explore basic inspection with reflection. I'll focus on type analysis: starting with an object, I'll retrieve its type and then examine a few interesting members (see Figure 2).
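Figure 2 isn't reproduced here, but the inspection code it describes looks roughly like the following sketch; MyClass and its members are hypothetical stand-ins for the types in the sample code.

using System;
using System.Reflection;

class MyClass
{
    private int myProperty;
    public int MyProperty { get { return myProperty; } }
    public void MyMethod1() { }
    public void MyMethod2() { }
    public static void MyStaticHelper() { }
}

class ReflectionDemo
{
    static void Main()
    {
        Console.WriteLine("Reflection Demo Example 1");

        object o = new MyClass();

        // Retrieve the type from the instance, then enumerate its methods and properties.
        Type t = o.GetType();
        Console.WriteLine("Type Name: {0}", t.Name);

        foreach (MethodInfo m in t.GetMethods())
            Console.WriteLine("Method Name: {0}", m.Name);

        foreach (PropertyInfo p in t.GetProperties())
            Console.WriteLine("Property Name: {0}", p.Name);
    }
}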
The first thing to note is that, at first glance, many more methods show up than the class definition would suggest. Where do these extra methods come from? Anyone versed in the .NET Framework object hierarchy will recognize them as methods inherited from the common base class, Object, itself. (In fact, I used Object.GetType to retrieve the type in the first place.) You can also see the getter function for the property. Now, what if you only want the methods explicitly defined on MyClass itself? In other words, how do you hide the inherited methods? Or what if you only want the explicitly defined instance methods?
A quick look at MSDN shows that what you want is the second overload of GetMethods, which accepts a BindingFlags parameter. By combining different values from the BindingFlags enumeration, you can have the function return only the desired subset of methods. Replace the GetMethods call with:
GetMethods(BindingFlags.Instance | BindingFlags.DeclaredOnly | BindingFlags.Public)
As a result, you get the following output (note that the static helper and the methods inherited from System.Object are gone):
Reflection Demo Example 1
Type Name: MyClass
Method Name: MyMethod1
Method Name: MyMethod2
Method Name: get_MyProperty
Property Name: MyProperty
What if you know the (fully qualified) type name and the member names beforehand? How do you get from a name to the corresponding Type? With the code in the first two examples, you already have the basic components of a primitive class browser: you can find a runtime entity by name and then enumerate its various related members.
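For instance, a lookup by name might look like the following sketch; "MyNamespace.MyClass" and "MyMethod1" are hypothetical names.

using System;
using System.Reflection;

class ClassBrowserSketch
{
    static void Main()
    {
        // Resolve the type from its fully qualified name. Type.GetType returns null if
        // the type can't be found in the calling assembly or mscorlib; types in other
        // assemblies need an assembly-qualified name.
        Type t = Type.GetType("MyNamespace.MyClass");
        if (t == null)
        {
            Console.WriteLine("Type not found.");
            return;
        }

        // Enumerate everything the type exposes publicly.
        foreach (MemberInfo member in t.GetMembers())
            Console.WriteLine("{0}: {1}", member.MemberType, member.Name);

        // Or go straight to a known member by name.
        MethodInfo m = t.GetMethod("MyMethod1");
        if (m != null)
            Console.WriteLine("Found method: {0}", m.Name);
    }
}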
Dynamically calling code
So far I have obtained handles to runtime objects (such as types and methods) for descriptive purposes only, such as printing their names. But how do you do more? How do you actually call a method?
The key points in this example are: first, a System.Type instance is retrieved from one instance of MyClass, mc1; then a MethodInfo instance is retrieved from that type; and finally, when the MethodInfo is invoked, it is bound to another MyClass instance, mc2, by passing that instance as the first parameter of the call.
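The sample code isn't shown here, but the sequence might look something like this sketch; MyClass and MyMethod1 are hypothetical stand-ins.

using System;
using System.Reflection;

class MyClass
{
    private readonly string name;
    public MyClass(string name) { this.name = name; }
    public void MyMethod1() { Console.WriteLine("MyMethod1 called on {0}", name); }
}

class InvokeDemo
{
    static void Main()
    {
        MyClass mc1 = new MyClass("mc1");
        MyClass mc2 = new MyClass("mc2");

        // Retrieve the type from one instance...
        Type t = mc1.GetType();
        MethodInfo method = t.GetMethod("MyMethod1");

        // ...and bind the call to a different instance by passing it as the first argument.
        method.Invoke(mc2, null);   // prints "MyMethod1 called on mc2"
    }
}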
As mentioned earlier, this example blurs the distinction between type and object that you would expect to see in source code. Logically, you retrieve a handle to a method and then call that method as if it belonged to a different object. For programmers familiar with functional languages this may be second nature; for programmers who know only C#, separating a method's definition from the instance it executes against may be less intuitive.
Putting it all together
So far I've discussed the basics of inspection and invocation; now I'll put them together in a concrete example. Imagine you want to deliver a library of static helper functions that must handle objects, but at design time you have no idea what the types of those objects will be! It's up to the function's caller to indicate how to extract meaningful information from them. The function will accept a collection of objects and a string describing a method; it will then iterate through the collection, call the named method on each object, and aggregate the return values with some function.
For this example I'm going to declare some constraints. First, the method described by the string parameter (which must be implemented by each object's underlying type) will accept no parameters and return an integer. The code will iterate through the collection of objects, call the specified method, and progressively compute the average of all the values. Finally, since this is not production code, I won't worry about parameter validation or integer overflow when summing.
Browsing the sample code, you can see that the contract between the main function and the static helper ComputeAverage doesn't rely on any type information beyond the common base class object itself. In other words, you can completely change the type and structure of the objects being passed in, and as long as you can still use a string to describe a method that returns an integer, ComputeAverage will work fine!
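The sample code isn't reproduced here, but a helper along these lines might look like the following sketch; the IntReturner types and the GetValue method are hypothetical stand-ins for the classes in the download.

using System;
using System.Collections.Generic;
using System.Reflection;

class IntReturner
{
    private readonly int value;
    public IntReturner(int value) { this.value = value; }
    public int GetValue() { return value; }
}

class SonOfIntReturner : IntReturner
{
    public SonOfIntReturner(int value) : base(value) { }
}

static class Helpers
{
    public static double ComputeAverage(IEnumerable<object> items, string methodName)
    {
        MethodInfo method = null;
        int sum = 0, count = 0;

        foreach (object item in items)
        {
            // Grab the MethodInfo from the first object and reuse it for the rest
            // (a simple example of MethodInfo caching, discussed below).
            if (method == null)
                method = item.GetType().GetMethod(methodName);

            sum += (int)method.Invoke(item, null);
            count++;
        }
        return count == 0 ? 0.0 : (double)sum / count;
    }
}

class Program
{
    static void Main()
    {
        List<object> items = new List<object>();
        items.Add(new IntReturner(5));
        items.Add(new SonOfIntReturner(7));
        Console.WriteLine(Helpers.ComputeAverage(items, "GetValue"));   // 6
    }
}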
A key issue lurking in this last example relates to the MethodInfo (and to reflection in general). Notice that in ComputeAverage's foreach loop, the code grabs a MethodInfo from only the first object in the collection and then binds it to the calls for all subsequent objects. As the code shows, this works fine; it's a simple example of MethodInfo caching. But there is a fundamental limitation here: a MethodInfo instance can only be invoked on objects whose type belongs to the same type hierarchy as the object it was retrieved from. It works in this case because the objects passed in are instances of IntReturner and SonOfIntReturner (which inherits from IntReturner).
The sample code also includes a class named EnemyOfIntReturner, which implements the same basic protocol as the other two classes but shares no common type with them. In other words, the interfaces are logically equivalent, but there is no overlap at the type level. To explore how MethodInfo behaves in this situation, try adding another object to the collection, obtained via "new EnemyOfIntReturner(10)", and run the example again. You will encounter an exception indicating that the MethodInfo cannot be used to invoke a method on that object, because the object has absolutely nothing to do with the type from which the MethodInfo was originally obtained (even though the method name and underlying protocol are equivalent). To make this code production-ready, you need to be prepared for this situation.
One possible solution is to analyze the types of all incoming objects yourself, keeping track of the type hierarchies they share (if any). If the next object's type doesn't belong to any known hierarchy, obtain and store a new MethodInfo for it. Another solution is to catch the TargetException and re-obtain a MethodInfo instance. Both approaches have their pros and cons. Joel Pobar wrote an excellent article on MethodInfo caching and reflection performance for the May 2007 issue of this magazine, which I highly recommend.
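A sketch of the first approach follows. For simplicity it caches one MethodInfo per concrete runtime type rather than per shared hierarchy, which achieves the same goal; the names are illustrative.

using System;
using System.Collections.Generic;
using System.Reflection;

static class PerTypeCaching
{
    public static double ComputeAverage(IEnumerable<object> items, string methodName)
    {
        Dictionary<Type, MethodInfo> cache = new Dictionary<Type, MethodInfo>();
        int sum = 0, count = 0;

        foreach (object item in items)
        {
            Type t = item.GetType();
            MethodInfo method;
            if (!cache.TryGetValue(t, out method))
            {
                // First time this exact type has been seen: look up and remember its method.
                method = t.GetMethod(methodName);
                cache[t] = method;
            }
            sum += (int)method.Invoke(item, null);
            count++;
        }
        return count == 0 ? 0.0 : (double)sum / count;
    }
}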
Hopefully this example demonstrates how adding reflection to an application or framework can provide more flexibility for future customization or extensibility. Admittedly, using reflection can be a bit cumbersome compared to the equivalent logic written natively in the programming language. If you feel that adding reflection-based late binding to your code is too burdensome for you or your clients (after all, their types and code have to be accounted for by your framework somehow), then you may want to strike a balance and offer only a moderate amount of this flexibility.
Efficient Type Handling for Serialization
Now that we've covered the basics of .NET reflection through several examples, let's look at a real-world situation. If your software interacts with other systems through Web services or other out-of-process remoting technologies, you've likely encountered serialization. Serialization essentially converts live, in-memory objects into a data format suitable for transmission over the wire or storage on disk.
The System.Xml.Serialization namespace in the .NET Framework provides a powerful serialization engine in XmlSerializer, which can take a managed object and convert it to XML (the XML data can later be converted back into a typed object instance, a process called deserialization). The XmlSerializer class is a powerful, enterprise-ready piece of software, and it should be your first choice if you face serialization problems in your own projects. But for educational purposes, let's explore how serialization (or any similar runtime type handling) might be implemented.
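For reference, typical XmlSerializer usage looks like the following sketch; the Address type here is a hypothetical stand-in with public fields and a default constructor.

using System;
using System.IO;
using System.Xml.Serialization;

public class Address
{
    public string Street;
    public string City;
}

class XmlSerializerDemo
{
    static void Main()
    {
        Address address = new Address();
        address.Street = "1 Microsoft Way";
        address.City = "Redmond";

        XmlSerializer serializer = new XmlSerializer(typeof(Address));

        // Serialize the object to XML text.
        StringWriter buffer = new StringWriter();
        serializer.Serialize(buffer, address);
        string xml = buffer.ToString();
        Console.WriteLine(xml);

        // Deserialize the XML back into a typed instance.
        Address roundTripped = (Address)serializer.Deserialize(new StringReader(xml));
        Console.WriteLine(roundTripped.Street);
    }
}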
Consider this: you are delivering a framework that takes object instances of arbitrary user types and converts them into some intelligent data format. For example, assume you have a memory-resident object of type Address, as shown below:
(pseudocode)
class Address
{
AddressID id;
String Street, City;
StateType State;
ZipCodeType ZipCode;
}
How do you generate an appropriate data representation for later use? Perhaps a simple text rendering will do the job:
Address: 123
Street: 1 Microsoft Way
City: Redmond
State: WA
Zip: 98052
If the type of the data to be converted is fully known in advance (for example, when you write the code yourself), things are simple:
foreach(Address a in AddressList)
{
Console.WriteLine("Address: {0}", a.ID);
Console.WriteLine("\tStreet: {0}", a.Street);
... // and so on
}
However, things get really interesting when you don't know in advance what data types you'll encounter at runtime. How do you write generic framework code like this?
MyFramework.TranslateObject(object input, MyOutputWriter output)
First, you need to decide which type members are interesting for serialization. You might capture only members of specific types, such as primitive system types, or you might provide a mechanism for type authors to indicate which members should be serialized, perhaps by marking type members with custom attributes.
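As a quick illustration of the opt-in approach, a marker attribute might look like the following sketch; the attribute and the Address fields are hypothetical.

using System;
using System.Reflection;

// A hypothetical marker attribute that type authors apply to members they want serialized.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
class SimpleSerializableAttribute : Attribute { }

class Address
{
    [SimpleSerializable] public string Street;
    [SimpleSerializable] public string City;
    public string InternalNotes;   // not marked, so a marker-aware serializer would skip it
}

static class MarkerScan
{
    // Lists the public fields a type author has opted in to serialization.
    public static void ListSerializableFields(Type t)
    {
        foreach (FieldInfo f in t.GetFields())
        {
            if (f.IsDefined(typeof(SimpleSerializableAttribute), false))
                Console.WriteLine(f.Name);
        }
    }
}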
Once you've decided which data structure members need to be converted, you then need to write the logic that enumerates them and retrieves their values from incoming objects. Reflection does the heavy lifting here, allowing you to query both the data structure and the data values.
For the sake of simplicity, let's design a lightweight conversion engine that takes an object, gets all of its public property values, converts them to strings by calling ToString directly, and then serializes the values. For a given object named "input", the algorithm is roughly as follows (a sketch of the corresponding code appears after the list):
Call input.GetType to retrieve a System.Type instance, which describes the underlying structure of input.
Use Type.GetProperties and the appropriate BindingFlags parameter to retrieve public properties as PropertyInfo instances.
Retrieve the properties as name/value pairs using PropertyInfo.Name and PropertyInfo.GetValue.
Call Object.ToString on each value to convert it (in basic fashion) to string format.
Pack the name of the object type and the collection of property names and string values into the correct serialization format.
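Here is a minimal sketch of this reflection-only approach, writing to the console instead of a real serialization format; the TranslateObject name is reused from the earlier pseudocode, and everything else is illustrative.

using System;
using System.Reflection;

static class NaiveTranslator
{
    public static void TranslateObject(object input)
    {
        // Step 1: get the runtime type of the incoming object.
        Type t = input.GetType();
        Console.WriteLine("{0}:", t.Name);

        // Step 2: retrieve the public instance properties.
        PropertyInfo[] props = t.GetProperties(BindingFlags.Instance | BindingFlags.Public);

        // Steps 3-5: read each property value, ToString it, and emit name/value pairs.
        foreach (PropertyInfo p in props)
        {
            if (p.GetIndexParameters().Length > 0)
                continue;   // skip indexers

            object value = p.GetValue(input, null);
            Console.WriteLine("\t{0}: {1}", p.Name, value == null ? "" : value.ToString());
        }
    }
}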
This algorithm simplifies things considerably, but it still captures the essence of taking a runtime data structure and turning it into self-describing data. There is a problem, though: performance. As mentioned before, reflection is costly for both type inspection and value retrieval, and in this example I perform a complete type analysis on every instance of the provided type.
What if you could somehow capture or preserve your understanding of a type's structure so that you could cheaply retrieve it later and efficiently handle new instances of that type; in other words, skip ahead to step #3 of the algorithm above? The good news is that the .NET Framework makes this possible. Once you understand a type's data structure, you can use CodeDom to dynamically generate code that binds to that structure. You can generate a helper assembly containing a helper class whose methods reference the incoming type and access its properties directly (like any other property access in managed code), so the type inspection cost is paid only once.
Now let me revise the algorithm. For each new type:
Get the System.Type instance corresponding to this type.
Use the various System.Type accessors to retrieve the schema (or at least the subset of the schema useful for serialization), such as property names, field names, etc.
Use the schema information to generate a helper assembly (via CodeDom) that binds to the new type and handles its instances efficiently.
Use code in a helper assembly to extract instance data.
Serialize data as needed.
For all subsequent data of a given type, you can skip ahead to step #4 and get a huge performance improvement over explicitly inspecting each instance. A rough sketch of the code-generation step follows.
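This is not the SimpleSerialization source (that comes next); it's a minimal, hypothetical sketch of generating a helper with CodeDom that returns a type's public property values as strings. It assumes the data type is public, lives in an assembly on disk, and has no indexers or null property values.

using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

static class HelperGenerator
{
    // Builds an in-memory assembly containing GeneratedHelpers.Helper.Extract(T input),
    // which returns the values of T's public properties as strings, in declaration order.
    public static MethodInfo GenerateExtractor(Type dataType)
    {
        CodeCompileUnit unit = new CodeCompileUnit();
        CodeNamespace ns = new CodeNamespace("GeneratedHelpers");
        unit.Namespaces.Add(ns);

        CodeTypeDeclaration cls = new CodeTypeDeclaration("Helper");
        cls.IsClass = true;
        ns.Types.Add(cls);

        CodeMemberMethod method = new CodeMemberMethod();
        method.Name = "Extract";
        method.Attributes = MemberAttributes.Public | MemberAttributes.Static;
        method.ReturnType = new CodeTypeReference(typeof(string[]));
        method.Parameters.Add(new CodeParameterDeclarationExpression(dataType, "input"));

        PropertyInfo[] props = dataType.GetProperties(BindingFlags.Instance | BindingFlags.Public);

        // Generates: return new string[] { input.Prop1.ToString(), input.Prop2.ToString(), ... };
        CodeExpression[] items = new CodeExpression[props.Length];
        for (int i = 0; i < props.Length; i++)
        {
            items[i] = new CodeMethodInvokeExpression(
                new CodePropertyReferenceExpression(
                    new CodeArgumentReferenceExpression("input"), props[i].Name),
                "ToString");
        }
        method.Statements.Add(new CodeMethodReturnStatement(
            new CodeArrayCreateExpression(typeof(string), items)));
        cls.Members.Add(method);

        // Compile the generated code into a temporary in-memory assembly.
        CompilerParameters options = new CompilerParameters();
        options.GenerateInMemory = true;
        options.ReferencedAssemblies.Add(dataType.Assembly.Location);

        CompilerResults results = new CSharpCodeProvider().CompileAssemblyFromDom(options, unit);
        if (results.Errors.HasErrors)
            throw new InvalidOperationException("Helper generation failed.");

        // The generated helper accesses properties directly, so the reflection cost is paid only here.
        return results.CompiledAssembly.GetType("GeneratedHelpers.Helper").GetMethod("Extract");
    }
}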
I developed a basic serialization library called SimpleSerialization that implements this algorithm using reflection and CodeDom (available in this column's code download). The main component is a class named SimpleSerializer, which the user constructs with a System.Type instance. In the constructor, the new SimpleSerializer instance analyzes the given type and generates a temporary assembly containing a helper class. The helper class is tightly bound to the given data type and handles instances as if you had written the code with complete prior knowledge of the type.
The SimpleSerializer class has the following layout:
class SimpleSerializer
{
public SimpleSerializer(Type dataType);
public void Serialize(object input, SimpleDataWriter writer);
}
Simple enough. The constructor does the heavy lifting: it uses reflection to analyze the type structure and then uses CodeDom to generate the helper assembly. The SimpleDataWriter class is just a data sink used to illustrate common serialization patterns.
To serialize a simple Address class instance, use the following pseudocode to complete the task:
SimpleSerializer mySerializer = new SimpleSerializer(typeof(Address));
SimpleDataWriter writer = new SimpleDataWriter();
mySerializer.Serialize(addressInstance, writer);
Conclusion
I highly recommend trying out the sample code yourself, especially the SimpleSerialization library. I've commented some of the more interesting parts of SimpleSerializer, which I hope will help. Of course, if you need robust serialization in production code, you should rely on the technologies provided in the .NET Framework, such as XmlSerializer. But if you find that you need to accept arbitrary types at runtime and handle them efficiently, I hope you'll consider the approach in my SimpleSerialization library as a solution.