Yossi Dahan [BizTalk]


Thursday, April 19, 2007

Notes on streaming pipeline components

Working with BizTalk for several years now, I've had the chance to develop quite a few pipeline components, and, as you'd expect, more often than not I'm developing them in a streaming fashion.

A lot has been written on developing streaming pipeline components, so there's little point repeating it all here. The main point is to minimise the memory footprint of your pipeline component and, by doing the same in every component in the pipeline, to reduce the memory footprint of the pipeline as a whole.

Reducing memory consumption in server code is fundamental to any enterprise development: many instances of the code may run in parallel, so components that consume a lot of memory can quite easily “bring the server to its knees”.

So, I’m wrapping the message stream in my own (“eventing”) stream and implementing all the logic of my component in the events raised as the stream is read by someone else (ideally the messaging agent, as it reads the message to write it to the message box). In one tired moment this led me to think that bits of logic from different components in the same pipeline would run in parallel, as the events fire while the stream is being read.

I'll try to describe a simple scenario I used to test this -

I’ve created a test pipeline component that replaces the original stream with a stream that fires an event when the read method is called for the first time (“firstRead”).

The component returns the message with the wrapper stream to the pipeline immediately, but when the event fires, it sleeps for 5 seconds.

I’ve put 4 trace lines in the code - when the component is called and returns, and before and after the sleep.
I then created a receive pipeline that uses this component twice.
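
The test component described above can be sketched roughly as follows. This is a Python analogy and not the BizTalk API: a real component would be written in .NET against the BizTalk pipeline interfaces. The names `FirstReadStream` and `execute_component` are hypothetical, and whether the event fires before or after the wrapped stream is read is an assumption of this sketch.

```python
import io
import time


class FirstReadStream:
    """Wraps a readable stream and calls a handler once, during the first read."""

    def __init__(self, inner, on_first_read):
        self._inner = inner
        self._on_first_read = on_first_read
        self._fired = False

    def read(self, size=-1):
        data = self._inner.read(size)
        if not self._fired:
            # Assumption: the handler runs inside the first read call,
            # right after the wrapped stream has been read.
            self._fired = True
            self._on_first_read()
        return data


def execute_component(stream, trace, delay=5.0):
    """Hypothetical stand-in for the test component: wrap the stream, return at once."""
    trace.append("Enter Component")

    def on_first_read():
        trace.append("Start Sleep")
        time.sleep(delay)  # simulates hard-working code
        trace.append("End Sleep")

    wrapped = FirstReadStream(stream, on_first_read)
    trace.append("Leave Component")
    return wrapped


if __name__ == "__main__":
    trace = []
    wrapped = execute_component(io.BytesIO(b"<msg/>"), trace, delay=0.1)
    print(trace)    # only Enter/Leave so far: nothing has read the stream yet
    wrapped.read()  # stands in for the messaging agent reading the message
    print(trace)    # the two sleep lines now follow
```

The component itself does no work when it is called; all the work is deferred until whoever ends up holding the stream reads from it.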

What I expected to see is a trace like this:

1. Enter Component (as the first component in the pipeline is being called)
2. Leave Component (first component returns message to pipeline)
3. Enter Component (second component being called)
4. Leave Component (second component returns message to pipeline)
5. Start Sleep (component 1 first read event fires)
6. Start Sleep (component 2 first read event fires)
7. End Sleep (component 1 - sleep is over for first component)
8. End Sleep (component 2 - sleep is over for second component)

Notice that what I'm expecting here is that the sleeps of components 1 and 2, used to simulate hard-working code, happen in parallel.

However, the trace I actually got was this -

1. Enter Component
2. Leave Component
3. Enter Component
4. Leave Component
5. Start Sleep (component 1 first read event fires)
6. End Sleep (sleep is over for first component)
7. Start Sleep (component 2 first read event fires)
8. End Sleep (sleep is over for second component)

Spot the subtle(?) difference in lines 6 and 7.

The event raised in the 2nd instance of the component does not get executed until the 1st component finishes execution; this is because all the code runs on the same thread.
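
The same-thread behaviour can be reproduced with a small Python analogy (not BizTalk code; `FirstReadStream`, `make_component` and `run_pipeline` are hypothetical names). Two wrappers are chained and a single read, standing in for the messaging agent, drives both handlers. The sketch assumes each handler runs after its wrapped stream has been read, which gives the same order as the trace above; firing before the inner read would swap the two sleeps, but they would still run one after the other.

```python
import io
import time


class FirstReadStream:
    """Wraps a readable stream and calls a handler once, during the first read."""

    def __init__(self, inner, on_first_read):
        self._inner = inner
        self._on_first_read = on_first_read
        self._fired = False

    def read(self, size=-1):
        data = self._inner.read(size)  # the inner wrapper's handler runs in here
        if not self._fired:
            self._fired = True
            self._on_first_read()      # same thread, so this blocks the caller
        return data


def make_component(name, trace, delay):
    """Build a hypothetical component that wraps the stream and returns immediately."""
    def execute(stream):
        trace.append("Enter Component")

        def on_first_read():
            trace.append(f"Start Sleep ({name})")
            time.sleep(delay)
            trace.append(f"End Sleep ({name})")

        wrapped = FirstReadStream(stream, on_first_read)
        trace.append("Leave Component")
        return wrapped
    return execute


def run_pipeline(components, stream):
    """Call each component in turn, then read the final stream once."""
    for execute in components:
        stream = execute(stream)
    return stream.read()  # stands in for the messaging agent


def demo(delay=0.1):
    trace = []
    components = [make_component("component 1", trace, delay),
                  make_component("component 2", trace, delay)]
    started = time.perf_counter()
    body = run_pipeline(components, io.BytesIO(b"<msg/>"))
    elapsed = time.perf_counter() - started
    return trace, body, elapsed


if __name__ == "__main__":
    trace, body, elapsed = demo()
    for number, line in enumerate(trace, start=1):
        print(f"{number}. {line}")
    print(f"elapsed: {elapsed:.2f}s")  # roughly the sum of both delays
```

Because each `read` call is an ordinary nested method call, the second handler cannot start until the first has returned, and the total time is about the sum of the two sleeps.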

What this demonstrates is that while streaming pipeline components can significantly reduce the memory footprint of your pipeline, they cannot speed up processing by executing things in parallel. Not without explicit effort, at least.

As a side note, I’ll mention that, similarly, subscription evaluation in the message box does not happen until all the components have finished executing.
I believe this is because of the transaction used in receive pipelines, so that if the pipeline fails at any stage BizTalk will not process the message.

Theoretically, a way around this, if you really want parallelism in the pipeline execution, is, as with any code, to execute your code in a separate thread.

If I change the component to start a new thread when the event fires and sleep in the new thread, everything works much faster, including anything triggered by the arrival of the message in the message box (as neither the pipeline nor the messaging agent waits for the sleep to finish before moving on to its next task), but I understand this is not recommended.

To start with, you lose transactionality with BizTalk: if anything in the pipeline, or in the messaging agent's work with the message box, fails, you will not know about it and your code will still run. To make matters worse, the pipeline may then be executed again for the same message (be it because of adapter logic or an administrator's decision), so your code will execute again for the same message. In many cases this is unacceptable.

In addition, I am led to believe there is some optimisation happening in the messaging agent around threading, and starting your own threads may interfere with it.

So, bottom line: if you have to use threads in your pipelines to get parallelism, accept the risks you're taking and test all possible eventualities really carefully.
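
For reference, the threaded variant described above might look like this in the same Python analogy (hypothetical names throughout, not BizTalk code). The handler only starts a worker thread, so the read returns almost immediately and the two sleeps overlap. None of the caveats just mentioned are addressed here: the worker knows nothing about whether the surrounding pipeline later fails or is retried.

```python
import io
import threading
import time


class FirstReadStream:
    """Wraps a readable stream and calls a handler once, during the first read."""

    def __init__(self, inner, on_first_read):
        self._inner = inner
        self._on_first_read = on_first_read
        self._fired = False

    def read(self, size=-1):
        data = self._inner.read(size)
        if not self._fired:
            self._fired = True
            self._on_first_read()
        return data


def execute_threaded(stream, name, trace, delay, workers):
    """Hypothetical component whose first-read handler hands the work to a new thread."""
    def work():
        trace.append(f"Start Sleep ({name})")
        time.sleep(delay)  # simulates hard-working code
        trace.append(f"End Sleep ({name})")

    def on_first_read():
        worker = threading.Thread(target=work, name=name)
        workers.append(worker)
        worker.start()     # returns immediately; the reader is not blocked

    return FirstReadStream(stream, on_first_read)


def demo(delay=0.3):
    trace, workers = [], []
    stream = io.BytesIO(b"<msg/>")
    for name in ("component 1", "component 2"):
        stream = execute_threaded(stream, name, trace, delay, workers)

    started = time.perf_counter()
    body = stream.read()   # stands in for the messaging agent
    read_time = time.perf_counter() - started

    for worker in workers:  # only the demo waits; the "pipeline" has moved on
        worker.join()
    total_time = time.perf_counter() - started
    return trace, body, read_time, total_time


if __name__ == "__main__":
    trace, body, read_time, total_time = demo()
    print(trace)
    print(f"read returned after {read_time:.3f}s, workers done after {total_time:.3f}s")
```

The read completes long before the workers do, which is exactly why a failure or retry elsewhere in the pipeline goes unnoticed by the background work.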

There’s a little more to play around with, especially checking all of this in a send pipeline as well, as I expect some differences there. I’m not sure when I’ll get the chance to look at it; hopefully soon.



  • Indeed it looks like there is something different going on in a send pipeline with regard to threading. I made a pipeline component which can be used in a receive pipeline as well as in a send pipeline. This component creates a separate thread. I have no problem with receiving messages, but when it comes to sending, the send port service just hangs.

    By Anonymous, at 09/04/2009, 12:49

  • nicely done

    By Anonymous rajni, at 08/01/2011, 06:22  

  • If reading the message is required (in the component), you save reading time (the message is read once instead of three times).

    By Blogger shmuel.f, at 31/10/2011, 09:23  
