Yossi Dahan [BizTalk]


Thursday, August 23, 2007

Thoughts about static members and local caching in BizTalk Server

(sorry for the long post)

I've recently read Richard Seroter's very useful post about static objects in BizTalk Server and singleton classes, which also led me to Saravana Kumar's post about using static classes to implement caching.

Like most BizTalk developers - I assume - I use static methods quite often. They come in handy in environments like BizTalk Server because in our world we already have a good "container" for state - the orchestration class - and so quite often we simply need a piece of code to help us complete a very specific, atomic task that does not require separate state to be maintained.

Of course, having static, stateless methods also helps avoid certain serialisation issues we would have had to deal with had we used a class instance.

However, that is not exactly what Richard and Saravana are talking about. Both posts (although I believe they tackle this from slightly different perspectives) describe classes that maintain state but provide a static interface - both examples have static members to hold information.

While under some circumstances having static members is a necessity, or at least very useful, it introduces a fair amount of complexity into the design of the process - with the level of complexity varying depending on what is actually required of the static data.

As I believe Saravana described a more specific use of static members - local caching - I thought I'd tackle this one first. I share the view that while caching is often introduced to improve performance, it also often has exactly the opposite effect (or no effect at all). Saravana writes at the start of his post -

"There is no necessity to explain the importance of caching in any server based developments like BizTalk, ASP .NET etc. You need to plan early in your development cycle to cache resources in order to make most out of your limited server resources."

I think this is inaccurate, at least as I understand it, but I guess it really depends on what "resources" means in this context -

With a server application such as BizTalk Server, "limited server resources" frequently refers to the server's memory, in which case local caching of data will actually put more strain on this resource. It might improve performance, as it can save round trips to the database server, especially if the database server is particularly busy or the network is not as fast as it should be; or it might slow the server down, as less memory is available for it to do its job. Either way - if you include memory in your "resources" - local caching will hardly help you make the most of it.

In many cases going to SQL Server to do that lookup every single time ends up being faster (or at least not slower) than caching the data locally. The reason is that storing all this information in the server's memory consumes resources as well. If you have several hosts that use the data, you will need to cache it in each host (or AppDomain), consuming even more memory. Now - considering that having enough available memory is a key requirement for a server to run fast - this can easily get quite painful.

To make matters worse - if you have multiple servers in your farm (and you should have) - you are now repeating this on each server. More importantly, you now have to worry about making sure all the caches are ALWAYS in sync, otherwise you'll get unexpected results from your processes.

You also need to worry about having some mechanism that ensures the data is up to date: maybe you have to go and retrieve newer data periodically, after the data's expiry time, or any time an expired record is requested; maybe you need some mechanism to invalidate the data in the cache when someone or something updates the source of the data.
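To give a feel for how much machinery even the simplest of these mechanisms needs, here is a minimal sketch of a local cache with an expiry time and explicit invalidation. It is written in Python purely for illustration (a BizTalk helper would be .NET code), and every name in it (`ExpiringLookupCache`, `loader`, `ttl_seconds`, `clock`) is hypothetical rather than taken from Saravana's post.

```python
import threading
import time


class ExpiringLookupCache:
    """Hypothetical in-process cache with per-entry expiry and explicit invalidation."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader          # function that fetches a value from the real source
        self._ttl = ttl_seconds        # how long a cached value is trusted, in seconds
        self._clock = clock            # injectable so expiry can be exercised in tests
        self._lock = threading.Lock()  # several threads in one host may use the cache at once
        self._entries = {}             # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[1] > now:
                return entry[0]        # still fresh: no round trip to the source
        # Missing or expired: reload outside the lock so a slow load
        # does not block callers asking for other keys.
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Drop one entry, e.g. when the source data is known to have changed."""
        with self._lock:
            self._entries.pop(key, None)
```

Note what this sketch still does not do: it only covers one process, so every host on every server would hold its own copy, nothing keeps those copies in step with each other, and nothing calls `invalidate` unless you build that notification path as well.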

The list goes on and on, but the bottom line is that your code quickly becomes more complex - which means it is slower, harder to maintain and harder to fix - so it may not be as optimal as you want it to be, and you've also used a lot more memory.

Now, if we're talking about data coming from SQL Server, one has to remember that SQL Server, as far as I know, is quite smart about its own caching strategy, so if you're using the data frequently enough it will be cached anyway (and if not, you shouldn't be caching it anyway).

Again - I'm not saying that caching is always bad - but it is definitely something that needs to be justified before it is implemented, and so it should not necessarily be "planned early" but rather implemented when it becomes obvious that data access is indeed a problem.

If, for example, you have to work with a very busy SQL Server, or the network is not as fast as it should be, or the process of retrieving the data is quite long and complex and thus slow (probably worth revisiting in that case, but anyway) - then it might make perfect sense to cache the data. But if it's a simple lookup of some sort or something similar, more often than not caching will not bring a significant improvement, in my experience.

Richard's post, as I understand it, speaks much more generically about how to implement a singleton pattern to provide access to static data in a thread-safe manner. My thoughts around this are quite similar - if you have to have static members, this is probably the way to go, but it should be avoided whenever possible.
The singleton pattern is a very useful pattern, in particular when you need to share data between processes, for example between different instances of an orchestration, or between instances of a running pipeline (in a host).
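For readers who haven't seen the pattern, the general shape of a thread-safe singleton looks roughly like the sketch below. This is not Richard's implementation: it is a generic illustration in Python, with hypothetical names (`SharedSettings`, `instance`), and "one instance per process" here only loosely stands in for "one instance per AppDomain" in .NET.

```python
import threading


class SharedSettings:
    """Hypothetical holder for data shared by every caller in one process."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self):
        self._lock = threading.Lock()  # guards the shared dictionary
        self._values = {}

    @classmethod
    def instance(cls):
        # Double-checked locking: the lock is only taken while no instance exists yet.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def set(self, key, value):
        with self._lock:
            self._values[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._values.get(key, default)
```

Every caller that goes through `SharedSettings.instance()` sees the same object, so a value stored by one caller is visible to the next, but only within that one process.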

What I think was not clear from Richard's post (but it could well be just me) is when this is useful and when it is not. Richard clearly illustrated that, in BizTalk, the very limiting fact that the messaging engine and the orchestration engine are separate systems means there's no simple way to share data between them. So while technically, within any one sub-system (in an AppDomain, in a host), a singleton pattern can provide a good way to share data, the hosting model in BizTalk means that you cannot assume, for example, that information that exists in a receive pipeline will exist in the send pipeline, or that information that is available to one process will be available to another (as they may all be hosted by different instances).

What is worth stressing (and again, it might just be me being difficult) is that you don't have to introduce a singleton pattern every time you need to use static methods - if you don't have to guarantee that 100% of the time there will be only one instance of the class, you can safely, I think, simply use static members.
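As a rough illustration of that simpler alternative, the sketch below keeps shared data in class-level ("static") members behind a static interface, with no singleton instance at all. Again this is Python standing in for .NET, and `MessageCounter` and its methods are made-up names. Note that the lock is still needed: dropping the singleton does not remove the thread-safety concern.

```python
import threading


class MessageCounter:
    """Hypothetical class used only through its static interface; never instantiated."""

    _lock = threading.Lock()
    _counts = {}  # class-level ("static") member shared by all callers in the process

    @classmethod
    def increment(cls, message_type):
        # Read-modify-write must happen under the lock to stay correct across threads.
        with cls._lock:
            cls._counts[message_type] = cls._counts.get(message_type, 0) + 1
            return cls._counts[message_type]

    @classmethod
    def count(cls, message_type):
        with cls._lock:
            return cls._counts.get(message_type, 0)
```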

And more importantly - if you don't have static members, you don't need to worry about any of this. So while this pattern is very useful, it mostly demonstrates the complexities around keeping static/shared information in products like BizTalk Server: data that is not owned by the specific instance of the process, and so cannot be "managed" by it.

I'm not even going into the complexities around the persistence of the shared data: I don't believe static members will be persisted when an instance dehydrates, so what happens when the server/host is rebooted? What happens when BizTalk decides to free resources and remove assemblies from memory? And so on.

I guess the main point I'm trying to make, in what has probably become too long a post, is that processes manage state, and a lot of effort goes into that; anything outside that has to be maintained separately, which adds complexity and has to be justified.



