Code generation in .NET with Roslyn (part 2)

5 July 2012
This post was originally published on the Softwire blog

This post is the second in a series on code generation on the .NET platform. In this post, we will take a closer look at Microsoft’s Community Technology Preview of ‘Roslyn’. In brief, this is a C# compiler implemented in managed code that exposes an API to let you hook into the compilation process.

Recap

In the previous post, I introduced Roslyn as an interesting tool for code generation, in particular when generating source code from other source code. Roslyn is especially appealing for this purpose because it provides a strongly-typed model for working with source code and (unlike most code generation approaches) allows you to use the same model for both input and output.

One of the common drawbacks of most other options for code generation (as discussed in the previous post) is the need to translate from one model to another and how clumsy this can be, particularly when you want to carry across some elements of the input source code without changing them. A detailed example of when you might need to do this is discussed in the appendix at the end of this post.

This kind of code generation should be easier to implement using Roslyn, since using the same model for input and output doesn’t force you to create your output code from scratch, but instead allows you to generate a modified version of your input code. The overall outline of the process is as follows:

1. Read in the source code of the original contract and get Roslyn to parse it into a strongly-typed syntax tree
2. Manipulate the bits of the syntax tree that we’re interested in, leaving the rest alone
3. Get Roslyn to write out code for the modified syntax tree
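
As a rough illustration of that outline, here is a minimal sketch. It is written against the released Microsoft.CodeAnalysis.CSharp NuGet package rather than the CTP discussed in this post, so names such as CSharpSyntaxTree.ParseText and ToFullString are that package's names, not the CTP's. The PipelineSketch class and its RenameInterfaces helper are hypothetical, invented for this example.

```csharp
// Assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package
// (the released Roslyn API, not the CTP API described in this post).
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class PipelineSketch
{
    // Hypothetical helper: appends "Async" to the name of every interface.
    public static string RenameInterfaces(string source)
    {
        // 1. Parse the source text into a strongly-typed syntax tree.
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        // 2. Replace only the nodes of interest; everything else is carried over.
        var interfaces = root.DescendantNodes().OfType<InterfaceDeclarationSyntax>();
        SyntaxNode newRoot = root.ReplaceNodes(
            interfaces,
            (original, rewritten) => rewritten.WithIdentifier(
                SyntaxFactory.Identifier(
                    rewritten.Identifier.LeadingTrivia,      // keep existing whitespace
                    rewritten.Identifier.Text + "Async",
                    rewritten.Identifier.TrailingTrivia)));

        // 3. Write the modified tree back out as source text.
        return newRoot.ToFullString();
    }
}
```

Calling PipelineSketch.RenameInterfaces("public interface IService1 { }") should return the same text with the interface renamed to IService1Async and every other character untouched.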

The rest of this post covers what I learned as I went about implementing a Roslyn-based code generator in this way.

Working with Roslyn

The current Roslyn release has some limitations (there’s a complete list of known issues on the MSDN forums, which has been updated for the second CTP). However, the API is supposed to be fairly stable, so anything learnt about it or written against it now should remain relevant for the final release. I found this to be the case when updating from the first CTP to the second. I did have to make some changes, but in most cases it was very easy to see how to translate my code to use the new API. None of the changes forced me to re-order things or otherwise alter the structure of my program; it was mostly just a matter of using slightly different methods and updating some property names.

Creating Syntax Trees & Nodes

You can parse a string into a syntax tree using the SyntaxTree.ParseCompilationUnit method. The API of Roslyn’s syntax tree model is reasonably discoverable. All the various syntax objects are created by static factory methods, which you can reach by typing ‘Syntax.’ and following IntelliSense. The factory methods have lots of parameters, but make heavy use of named parameters with default values, so you can get away without specifying all of them. However, you need to be careful about this…

One thing you’ll notice quite quickly is that the syntax tree model is designed to be able to represent broken code. This makes sense, since one of the intended uses of Roslyn is writing code fix-up tools like ReSharper. This does mean that if you want to transform from valid code to valid code, there’s quite a bit of additional complexity in the API that you might prefer not to have to think about (e.g. having to specify that you’d like separators between your parameters, or semicolons at the end of your interface method signatures, when anything else wouldn’t compile anyway). When you create a new syntax node without specifying some property of it, the API seems to be slightly inconsistent about whether the default behaviour is the most sensible one (i.e. whatever will compile).
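
To make the point about broken code concrete, here is a small sketch. It uses the released Microsoft.CodeAnalysis.CSharp package rather than the CTP, so the type and member names are assumptions relative to this post. The BrokenCodeSketch class is hypothetical.

```csharp
// Assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package
// (the released Roslyn API, not the CTP API described in this post).
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class BrokenCodeSketch
{
    // Reports whether the first method in the source lacks its semicolon.
    // The parser still produces a complete tree for such input; the absent
    // token is represented by a placeholder whose IsMissing flag is set.
    public static bool SemicolonIsMissing(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var method = root.DescendantNodes().OfType<MethodDeclarationSyntax>().First();
        return method.SemicolonToken.IsMissing;
    }
}
```

For example, SemicolonIsMissing("interface I { void M() }") should be true, while the same call with "void M();" should be false.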

Manipulating the model

Syntax trees and the nodes within them are immutable. Most syntax nodes have an Update method which doesn’t affect the original node, but returns a new one with your changes applied. For example, the original Roslyn documentation suggested something like this for renaming a namespace:

NamespaceDeclarationSyntax newNamespace =
    oldNamespace.Update(oldNamespace.NamespaceKeyword,
                        newNamespaceName,
                        oldNamespace.OpenBraceToken,
                        oldNamespace.Externs,
                        oldNamespace.Usings,
                        oldNamespace.Members,
                        oldNamespace.CloseBraceToken,
                        oldNamespace.SemicolonTokenOpt);

For some reason, these Update methods don’t make the same use of named parameters and default values that the factory methods do. This means that if you just want to change one property they’re rather pointlessly verbose (as in the above example). I previously wrote my own extension methods for this purpose, so the above would become:

// 'Update' in this case is an extension method
NamespaceDeclarationSyntax newNamespace =
    oldNamespace.Update(name: newNamespaceName);

However, with the second Roslyn CTP in June, Microsoft have addressed this in a slightly different way by adding lots of fluent-style methods for updating one property at a time, so the above would become:

NamespaceDeclarationSyntax newNamespace = oldNamespace.WithName(newNamespaceName);

You can create a new updated tree by calling Roslyn’s ReplaceNodes() extension method on the root node (which again doesn’t modify the original tree but returns a new tree with the specified nodes replaced), passing it the nodes you want to replace and a function for transforming each node. The signature for ReplaceNodes() looks like this:

public static TRoot ReplaceNodes<TRoot, TNode>(
        this TRoot root,
        IEnumerable<TNode> oldNodes,
        Func<TNode, TNode, SyntaxNode> computeReplacementNode)
    where TRoot : SyntaxNode where TNode : SyntaxNode;

It’s not immediately obvious why the computeReplacementNode function type has two input parameters of type TNode. In all the cases where I was using this method, Roslyn always passed in the exact same (i.e. reference equal) object for both. Fortunately, the second CTP has an extremely helpful FAQ with extensive code samples in the form of unit tests, which includes an example covering this method. It turns out that the first parameter represents the node before any replacements have been made, and the second parameter represents the original node with its descendants already replaced. The two arguments may often be the same if you’re only replacing one or two specific nodes, rather than performing a more general transformation on most or all of the nodes in the tree.
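
The difference between the two arguments only shows up when replaced nodes are nested inside each other. The sketch below is an illustration written against the released Microsoft.CodeAnalysis.CSharp package, not the CTP. The ReplaceNodesSketch class is hypothetical. It strips parentheses from an expression and records whether each callback received the same instance twice.

```csharp
// Assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package
// (the released Roslyn API, not the CTP API described in this post).
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ReplaceNodesSketch
{
    // Removes every pair of parentheses in the expression. For each callback,
    // 'sameInstance' records whether both arguments were the same object.
    public static string StripParentheses(string expressionText, List<bool> sameInstance)
    {
        ExpressionSyntax root = SyntaxFactory.ParseExpression(expressionText);
        var parens = root.DescendantNodes().OfType<ParenthesizedExpressionSyntax>();

        ExpressionSyntax result = root.ReplaceNodes(
            parens,
            (original, rewritten) =>
            {
                sameInstance.Add(ReferenceEquals(original, rewritten));
                // Use 'rewritten' so replacements already made inside it are kept.
                return rewritten.Expression;
            });

        return result.ToFullString();
    }
}
```

With the input "a + ((x))", the inner (x) has no replaced descendants, so both arguments should be the same object. The outer ((x)) contains the inner one, so its second argument should be a new node in which the inner parentheses are already gone. Returning original.Expression instead would throw that inner replacement away.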

If you want to add or remove nodes of a tree (rather than just updating them), this typically becomes a matter of updating a list property on the relevant parent node, again either using Update() or one of the new fluent-style “With-” methods.
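
As a sketch of what updating a list property looks like, the following uses the released Microsoft.CodeAnalysis.CSharp package (not the CTP), where an interface's members are exposed as a Members list and replaced with WithMembers. The MemberListSketch class is hypothetical, and NormalizeWhitespace is used here only to tidy the output.

```csharp
// Assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package
// (the released Roslyn API, not the CTP API described in this post).
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class MemberListSketch
{
    // Appends a member (given as source text) to the first interface found.
    public static string AddMember(string source, string memberText)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var iface = root.DescendantNodes().OfType<InterfaceDeclarationSyntax>().First();

        // Lists are immutable too: Add returns a new list containing the extra member.
        MemberDeclarationSyntax newMember = SyntaxFactory.ParseMemberDeclaration(memberText);
        var updated = iface.WithMembers(iface.Members.Add(newMember));

        return root.ReplaceNode(iface, updated).NormalizeWhitespace().ToFullString();
    }

    // Removes the first member of the first interface found.
    public static string RemoveFirstMember(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var iface = root.DescendantNodes().OfType<InterfaceDeclarationSyntax>().First();
        var updated = iface.WithMembers(iface.Members.RemoveAt(0));
        return root.ReplaceNode(iface, updated).NormalizeWhitespace().ToFullString();
    }
}
```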

Outputting source code

Finally, you can get back from a syntax tree to a string by calling GetText() on your new root node. However, you’ll definitely want to call Format() on the root node before you do so: this method is responsible for making sure the output code has sensible and consistent whitespace. Beware that this is more than just a matter of making your code look pretty: if you don’t call Format(), Roslyn might miss out vital things like spaces between parameter types and parameter names (think about it for a second…)! Note that you only need to call Format() on the outermost node: calling it on inner nodes first makes no difference to the result (and presumably costs some extra processing).

With the second CTP, the formatting methods have been moved out into the Roslyn.Services assembly, and the API has changed slightly so that Format() returns an IFormattingResult object rather than a new root node. This interface includes a new GetTextChanges method, which may be useful in some scenarios. However, if you just want the complete text for the newly formatted node, you can use the method chain .Format().GetFormattedRoot().GetText();
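
The missing-space problem is easy to reproduce. The sketch below uses the released Microsoft.CodeAnalysis.CSharp package rather than the CTP, and it swaps in that package's NormalizeWhitespace method as a stand-in for the Format() call described above (an assumption made so the example needs only the compiler package). The FormattingSketch class is hypothetical.

```csharp
// Assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package
// (the released Roslyn API, not the CTP API described in this post).
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class FormattingSketch
{
    // Builds the parameter 'int value' purely from factory calls, so no
    // whitespace is attached to any of its tokens.
    static ParameterSyntax BuildParameter()
    {
        return SyntaxFactory.Parameter(SyntaxFactory.Identifier("value"))
            .WithType(SyntaxFactory.PredefinedType(
                SyntaxFactory.Token(SyntaxKind.IntKeyword)));
    }

    // The raw text: the type and the name run together.
    public static string Unformatted()
    {
        return BuildParameter().ToFullString();
    }

    // The same node after whitespace has been normalised.
    public static string Formatted()
    {
        return BuildParameter().NormalizeWhitespace().ToFullString();
    }
}
```

Here Unformatted() yields "intvalue", which no longer parses as a parameter, while Formatted() yields "int value".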

Gotchas

I did encounter a few cases where I had to use the debugger to explore the original syntax tree and work out what was going on (although a couple of the samples included with the CTP are syntax visualization tools, which may also be useful for this purpose). For example, many SyntaxNode types have an Attributes property of type SyntaxList, which makes sense. What didn’t make sense to me initially was that each AttributeDeclarationSyntax in this list itself has a property called Attributes of type SeparatedSyntaxList. I eventually realised this was due to an esoteric feature of the C# language that allows you to add attributes to a class using a comma-separated [Attribute1, Attribute2, Attribute3] syntax, rather than the more usual practice of putting each attribute in its own pair of square brackets on a new line. I ended up just using attributeList.SelectMany(a => a.Attributes).
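
Here is a sketch of that flattening step. It is written against the released Microsoft.CodeAnalysis.CSharp package, in which the outer property is called AttributeLists and its elements are AttributeListSyntax (names that differ from the CTP ones mentioned above). The AttributeSketch class is hypothetical.

```csharp
// Assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package
// (the released Roslyn API, not the CTP API described in this post).
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class AttributeSketch
{
    // Returns the names of all attributes on the first type declaration,
    // regardless of how they are grouped into square brackets.
    public static List<string> AttributeNames(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var type = root.DescendantNodes().OfType<TypeDeclarationSyntax>().First();

        return type.AttributeLists                    // one entry per [...] group
            .SelectMany(list => list.Attributes)      // the attributes inside each group
            .Select(attribute => attribute.Name.ToString())
            .ToList();
    }
}
```

For the input "[A, B] [C] interface I { }" this returns A, B and C, even though the syntax tree keeps the two bracket groups separate.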

This is a good example of how Roslyn’s source code model differs significantly from the Reflection model (which is only concerned with compiled types, where the two cases above would appear identical) or even CodeDOM (which is a model of code rather than of compiled types, but is too general to ever differentiate between such minor syntax variations). This clearly illustrates how Roslyn gives you more control than something like CodeDOM, but at the cost of some additional complexity you have to bear in mind.

Wrapping up

Having successfully got Roslyn doing what I wanted it to, I thought it would be worth packaging up my Roslyn-based code generator into a neat Visual Studio extension because, after all, how hard could it be? We’ll find out in the next post…

Appendix: A real-world problem

The particular proof-of-concept I chose to implement for this exercise was a generator for creating an asynchronous WCF service interface from a synchronous version. This is a common requirement in order to make use of some clever WCF plumbing that allows you to implement a service contract consisting of simple, synchronous methods, and to consume that service via a contract containing corresponding asynchronous methods (which may be more appropriate for some clients or communication scenarios). An example of what these two contracts might look like follows:

[ServiceContract]
[XmlSerializerFormat]
public interface IService1
{
  [OperationContract]
  string GetData(int value);

  [OperationContract]
  OutputType GetComplexData(InputType value);
}

[ServiceContract]
[XmlSerializerFormat]
public interface IService1Async
{
  [OperationContract(AsyncPattern = true)]
  IAsyncResult BeginGetData(int value, AsyncCallback callback, object state);
  string EndGetData(IAsyncResult result);

  [OperationContract(AsyncPattern = true)]
  IAsyncResult BeginGetComplexData(InputType value,
                                   AsyncCallback callback, object state);
  OutputType EndGetComplexData(IAsyncResult result);
}

Even if you’re not familiar with WCF, you can see that the second interface contract is a lot more complex than the first and may be correspondingly more difficult to implement. You can also see that translating the first interface contract into the second is just a matter of following some simple rules and could be quite a mechanical process.

Aside:

In fact, this translation can be performed by svcutil.exe, a program that ships with the .NET framework, the main purpose of which is to generate code for WCF service clients. However, if the service and client are part of the same project and the same codebase, you don’t need to generate contracts on the client side (it’s also quite difficult to integrate svcutil into your build process if you want the client to automatically pick up changes to the service, since it requires the service code to be not only compiled but deployed and running). For these reasons a lot of people put their service contracts in a shared assembly and reference it directly from the client. However, this means you also lose the benefit of svcutil creating asynchronous versions of your service contracts.

What would be nice is to get an asynchronous version of your WCF contract that automatically stayed up-to-date at design time (without having to build and run your service). I’ve come across a few attempts to do this which have forgotten to carry across something from the original definition to the generated one (e.g. the XmlSerializerFormatAttribute on the contracts above), or have included a lot of complex code to handle all the fiddly edge-cases you have to worry about (e.g. writing out attribute parameters correctly) when attempting to faithfully reproduce all of these aspects of the original code. What you really want is a code generator that

1. Transforms each service method according to the rules for turning synchronous methods into asynchronous ones
2. Doesn’t alter any other part of the code, but automatically reproduces it in the generated contract

Writing a code generator using Roslyn allowed me to focus on the first point, while achieving the second point without even having to think about it.
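
To give a flavour of what such a generator might look like, here is a deliberately simplified sketch. It is not the code from my generator: it uses the released Microsoft.CodeAnalysis.CSharp package rather than the CTP, the AsyncContractSketch class and its helpers are hypothetical, and it makes several assumptions (only the first interface is processed, non-method members are dropped, out/ref parameters are ignored, and any existing arguments on [OperationContract] are overwritten).

```csharp
// Assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package
// (the released Roslyn API, not the CTP API described in this post).
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class AsyncContractSketch
{
    public static string Generate(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var iface = root.DescendantNodes().OfType<InterfaceDeclarationSyntax>().First();

        // Each synchronous method becomes a Begin/End pair.
        var members = new List<MemberDeclarationSyntax>();
        foreach (var method in iface.Members.OfType<MethodDeclarationSyntax>())
        {
            members.Add(BeginMethod(method));
            members.Add(EndMethod(method));
        }

        // Everything not touched here (such as the attributes on the
        // interface itself) is carried across unchanged.
        var asyncIface = iface
            .WithIdentifier(SyntaxFactory.Identifier(iface.Identifier.Text + "Async"))
            .WithMembers(SyntaxFactory.List(members));

        return root.ReplaceNode(iface, asyncIface).NormalizeWhitespace().ToFullString();
    }

    static MethodDeclarationSyntax BeginMethod(MethodDeclarationSyntax method)
    {
        // Simplification: replaces any existing [OperationContract] arguments.
        var contractAttributes = method.AttributeLists
            .SelectMany(list => list.Attributes)
            .Where(attribute => attribute.Name.ToString() == "OperationContract");

        return method
            .ReplaceNodes(contractAttributes, (original, rewritten) =>
                rewritten.WithArgumentList(
                    SyntaxFactory.ParseAttributeArgumentList("(AsyncPattern = true)")))
            .WithReturnType(SyntaxFactory.ParseTypeName("IAsyncResult"))
            .WithIdentifier(SyntaxFactory.Identifier("Begin" + method.Identifier.Text))
            .AddParameterListParameters(
                Parameter("AsyncCallback", "callback"),
                Parameter("object", "state"));
    }

    static MethodDeclarationSyntax EndMethod(MethodDeclarationSyntax method)
    {
        // Keeps the original return type; takes only the IAsyncResult.
        return method
            .WithAttributeLists(SyntaxFactory.List<AttributeListSyntax>())
            .WithIdentifier(SyntaxFactory.Identifier("End" + method.Identifier.Text))
            .WithParameterList(SyntaxFactory.ParameterList(
                SyntaxFactory.SingletonSeparatedList(Parameter("IAsyncResult", "result"))));
    }

    static ParameterSyntax Parameter(string type, string name)
    {
        return SyntaxFactory.Parameter(SyntaxFactory.Identifier(name))
            .WithType(SyntaxFactory.ParseTypeName(type));
    }
}
```

Passing the source of IService1 from above to AsyncContractSketch.Generate should produce an IService1Async interface with Begin/End pairs, with [ServiceContract] and [XmlSerializerFormat] reproduced without any code that mentions them.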