Yossi Dahan [BizTalk]


Monday, March 23, 2009

On Atomic Scope and Message Publishing

A few weeks back I worked on a process that looked something like this -

It was triggered by the scheduled task adapter and then used a SQL send port to call a stored procedure that returned a list of ‘things’.
It then needed to split the list into individual records and, for each ‘thing’, start a new, different, process through pub/sub (to avoid a binary dependency on the called process).

Fairly simple.

A lot has been said about the different ways to split messages, and I won’t repeat that discussion here; I would just say that initially I used a different approach – I used the SQL adapter in the initial, triggering, receive port and then used a receive pipeline, with an XmlDisassembler component, to split the incoming message so that each record was published individually, thus avoiding the need to have a ‘master process’. That backfired, though, in my case – I quickly realised I’d be choking the server with the number of messages published and needed a way to throttle the execution; I played a bit with host throttling but then came to the conclusion that the best approach for me would be to throttle in a process, which is what I’ve done.

And so - to make things interesting, and because I already had it all ready - I decided to use a call to a pipeline from my process to split the message.

The first thing I realised, trying to take that approach, was that I had to change the type of the response message received from the SQL port to XmlDocument (which is an approach I generally dislike – I’m a sucker for strongly-typed-everything). My schema was configured as an envelope so that, when I called the pipeline from my process, it would know how to split the message correctly; but when that schema was used in the SQL port, BizTalk split the message too early for me – I needed the whole message in the process first, which was no good to me. If, however, I removed the envelope definition from the schema, then when I called the pipeline directly from my process it wouldn’t know how to split the message, which was no good either; nor could I have two schemas (BizTalk, as we all know, doesn’t like that bit at all, not without even more configuration); XmlDocument it is.

It then came back to me (in the form of a compile-time error :-)) that the pipeline variable has to exist in an atomic scope, and so I added one to contain my pipeline variable; I then added the necessary loop, with its condition set to the GetNext() method of the pipeline output, and in each iteration constructed a message using the GetCurrent() method; all standard stuff.
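For what it’s worth, the shape of that pattern in XLANG/s expression code is roughly the following – a sketch only, with invented project, pipeline and message names, relying on the Microsoft.XLANGs.Pipeline API that BizTalk provides for calling pipelines from orchestrations (in that API the enumerator is advanced with MoveNext()):

```csharp
// Variable declared inside the atomic scope:
// Microsoft.XLANGs.Pipeline.ReceivePipelineOutputMessages pipelineOutput;

// Expression shape - run the receive pipeline over the
// untyped (XmlDocument) SQL response message:
pipelineOutput = Microsoft.XLANGs.Pipeline.XLANGPipelineManager.ExecuteReceivePipeline(
    typeof(MyProject.SplitThingsPipeline), msgSqlResponse);

// Loop shape condition - advance the enumerator over the
// debatched output messages:
pipelineOutput.MoveNext()

// Construct Message shape inside the loop - populate the
// per-record message from the enumerator's current position:
msgThing = null;
pipelineOutput.GetCurrent(msgThing);
```

Each iteration of the loop produces one debatched message, which can then be published from a send shape inside the same loop.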

I would then set some context properties to route my message correctly and allow me to correlate the responses (I used a scatter-gather pattern in my master process), and published it to the message box.
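Setting those routing and correlation context properties is just a couple of lines in a Message Assignment shape – something along these lines, with a made-up property schema purely for illustration:

```csharp
// Message Assignment shape - set properties the subscribing
// process and the response correlation rely on:
msgThing(MyProject.PropertySchema.BatchId) = strBatchId;
msgThing(MyProject.PropertySchema.ThingType) = "StandardThing";
```

A direct-bound send port then publishes msgThing to the message box, where the property-based subscriptions of the child processes pick it up.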

What I noticed when testing my shiny new process was that all those sub-processes that were meant to start as a result of the messages published in my loop were delayed by quite a few minutes (6-8), which seemed completely unreasonable; so I embarked on a troubleshooting exercise, which resulted in that big “I should have thought of that!” moment.

While the send shape in my loop successfully completed its act of publishing the message in each iteration, moving my loop on to the next message and so on, because it sat in an atomic scope BizTalk would not commit the newly published messages to the message box database – and so allow subscriptions to kick in – until the atomic scope finished; that is what allows it to roll back should something in the atomic scope fail.
What it meant for me, though, was that all the messages were still effectively published at once, which brought me back to square one (or minus one, actually, considering that the long delay caused by this approach left me even worse off than with my first debatch-in-pipeline approach).

And so I went back to the old and familiar approach of splitting the messages using xpath in the process, which allowed me to carefully control the publishing rate of messages for my process and throttle them as needed.
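That version replaces the pipeline call with plain xpath in the orchestration: count the records once, then extract one record per loop iteration. Schematically it looks like this (the element names are invented for the example):

```csharp
// Expression shape - how many records did the SP return?
recordCount = System.Convert.ToInt32(xpath(msgSqlResponse,
    "count(/*[local-name()='Things']/*[local-name()='Thing'])"));
index = 1;

// Loop shape condition:
index <= recordCount

// Construct Message shape inside the loop - pull out record
// number 'index'; a delay shape or other gating logic can sit
// in the same loop to control the publishing rate:
msgThing = xpath(msgSqlResponse, System.String.Format(
    "/*[local-name()='Things']/*[local-name()='Thing'][{0}]", index));
index = index + 1;
```

Because the loop sits in a plain (non-atomic) scope, each message is committed to the message box as it is sent, so the subscribing processes start immediately rather than waiting for the whole batch.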


4 Comments:

  • Never try to control flow or pace in adapters/pipelines, they are not meant for that. Instead, if you don't want to do splitting in BizTalk (which you shouldn't; it's very slow), you should consider SQL Server Integration Services (SSIS). This will allow you to use a database to control the flow and pace. For instance you can use SSIS to read a large list (from file or db), inserting rows at a steady pace into a table that is picked up by the orch through an SQL adapter. Also, if you like to control the maximum number of items being processed at one time, you can have SSIS "checking out" a specified number of items from the table, send it to BizTalk (i.e. through a WS), and wait for the orch to "check in" its item, before allowing new items to be sent. SSIS is generally known to be the most effective way of sending off large batches of work into BizTalk.

    A. Berg

    By Anonymous Anonymous, at 23/03/2009, 13:23  

  • Good lesson. I wonder why host throttling did not work for you, can you elaborate on that? Cause it's all game of trade-offs. What if your batch was very large (i.e. large records or huge number of small ones)? Getting it into orchestration and splitting with xpath wouldn't be so efficient either.

    Paul

    By Anonymous Paul, at 06/04/2009, 17:31  

Thanks Berg for the comment. I do like the SSIS approach, but in my case it was a bit of an overkill for my requirements (from an administration point of view), and I was quite happy to use the orchestration as the size of the messages was known and not too big, and performance was not a key factor; I would generally agree with you that messaging components should not be used to control pace (I don't agree with you regarding flow), but I generally dislike the word 'never' :-).

Paul - as I hinted before - I was quite happy to fall back to an orchestration solution when I couldn't configure the host to stay alive and well with throttling settings, but it may well be that I didn't spend enough time on it; it just wasn't a big enough issue for me, and in any case - A. Berg does have a good point! :-)

    By Blogger Yossi Dahan, at 13/04/2009, 08:01  

  • Good read! Did you ever find a solution to this problem? I'm facing the same exact situation and I was planning to do my debatching in a pipeline inside an orchestration... until I read your post. I need to do this because the xpath method (as your exposed solution here) uses too much memory.

    By Anonymous CG, at 08/04/2015, 01:51  
