Yossi Dahan [BizTalk]


Sunday, July 26, 2009

Parallel shape behaviour in BizTalk 2004 and >2006

‘Shiri’ had posted this question in the newsgroup -…

…After opening the orchestration
debugger in both BT 2004 and BT 2006 we've recognized a different behaviour:
2004 executes the first shape of every branch and then goes back to the
first branch of the parallel and executes the second shape, then the second branch,
etc. 2006 executes the first branch (first and second shape), then
goes to the second branch and executes its first and second shape, etc.
It seems that 2004 works more like multithreaded than 2006….

I vaguely remember reading/discussing this difference in the past, but – unfortunately - I can't remember the exact details; I've floated this question around again, and I believe (but treat this with care – this may well be at least somewhat inaccurate) that there has indeed been a change in the behaviour of the parallel shape in BizTalk 2006, here’s some background -

The parallel shape was never intended to provide ‘true’ parallelism. Microsoft has been fairly clear from the start that BizTalk will not process each branch on its own thread, which is what true parallelism would have required.

Darren Jefford explains the intention, and the expected behaviour, of the parallel shape very well in his book Professional BizTalk 2006, which I highly recommend. If you want a quick peek, you can read the relevant piece here. The key point he makes is that when thinking about the parallel shape you need to wear your business analyst hat, not your developer hat. The parallel shape effectively says: “I’ll run this code (= branch), but if I reach a point where I’m sitting idle waiting for something to happen (receive, delay or listen shapes), I will go and run that other code (= branch) in the meantime. When I’m through with that, or have reached a waiting point again, I will check whether I’m ready to process the rest of the first bit of code… and so on.”

This is not quite your techie run-things-in-parallel-on-multiple-threads approach, but – from my experience – it is more than enough (if you wanted to run things completely in parallel you could use BizTalk’s pub/sub, which would allow you to potentially get a lot more than just one thread – you might end up on a different machine altogether, for a price :-))

So, indeed, the parallel shape in BizTalk 2006 (and subsequent versions) behaves exactly as I understand it should, and as described in Darren’s book. This is consistent with Shiri’s observation: if you have three branches and none of them has a blocking shape, the leftmost branch will be executed completely, then the next one to its right, and then the rightmost branch. However, and this is a very important point to remember, Microsoft does not, to the best of my knowledge, guarantee any order of processing between the branches. The order might change in future versions, as indeed it did from 2004 to 2006, so all you can assume is that the branches may run in any arbitrary order, or indeed in parallel.

Back to the 2006 behaviour. Suppose, however, that you had a receive shape as the second shape in the leftmost branch. When BizTalk hit this shape, it would move on to execute the second branch while waiting for the message to be received. It would come back to the first branch at the earliest point at which two things had happened: 1. the message it was waiting for had been delivered, and 2. the currently executing branch had reached a point at which the engine can stop it and re-enter the first branch. That point would be a receive shape, a delay shape, a listen shape or the end of the branch.
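To make the 2006 behaviour concrete, here is a small toy model in Python. It is my own sketch under stated assumptions, not how the engine is implemented. Each branch is a generator, and a `yield` stands in for a blocking shape and names the message the branch waits for. The hypothetical `run_parallel` scheduler always resumes the leftmost branch that can run, and it only switches branches when the current one blocks or ends. As a further simplification, a message "arrives" only once every unfinished branch is blocked.

```python
from collections import deque


def run_parallel(branches, arrivals):
    """Toy model of the 2006 parallel shape (an assumption, not BizTalk code).

    branches: generator functions, leftmost branch first. Each records the
        shapes it executes in `trace` and yields a message name whenever it
        reaches a blocking shape (receive, delay or listen).
    arrivals: messages, in order; one is delivered each time every
        unfinished branch is blocked.
    """
    trace = []
    running = [branch(trace) for branch in branches]
    waiting = {}          # branch index -> message the branch is blocked on
    delivered = set()
    pending = deque(arrivals)
    done = set()
    while len(done) < len(running):
        # Candidates: branches that are unfinished and not still blocked.
        ready = [i for i in range(len(running))
                 if i not in done and (i not in waiting or waiting[i] in delivered)]
        if not ready:
            if not pending:
                raise RuntimeError("every branch is blocked and no message is due")
            delivered.add(pending.popleft())   # the awaited message arrives
            continue
        i = ready[0]                           # leftmost runnable branch wins
        waiting.pop(i, None)
        try:
            # Run branch i until its next blocking shape...
            waiting[i] = next(running[i])
        except StopIteration:
            # ...or to its end.
            done.add(i)
    return trace


def left(trace):
    trace.append("left: shape 1")
    yield "order"                    # receive shape waiting for a hypothetical 'order'
    trace.append("left: shape 3")    # runs once 'order' has been delivered


def middle(trace):
    trace.append("middle: shape 1")
    trace.append("middle: shape 2")
    yield from ()                    # no blocking shapes; this only makes it a generator


def right(trace):
    trace.append("right: shape 1")
    trace.append("right: shape 2")
    yield from ()


trace = run_parallel([left, middle, right], ["order"])
# left shape 1 runs, then all of middle and right, then the rest of left
```

When the branches have no blocking shapes, the model runs them strictly left to right. As noted above, though, that ordering is an observation about one version and not a guarantee worth coding against.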

So, if that’s the 2006 behaviour, was the behaviour in 2004 different? Yes, I believe it was. In 2004 the engine was, some would say, trying to be too clever: BizTalk 2004 would try to run branches on multiple threads if it could. Whether it could depended on several factors, not least the state of the thread pool at the time of evaluation. If it managed to do so, you would get code running truly in parallel, as Shiri observed, but there was no guarantee that this would be the case. In that sense BizTalk 2004 is less predictable than later versions, which is exactly the problem with this approach. Considering that this was never the intention to begin with, I can fully understand the decision to simplify the model in BizTalk 2006.
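For contrast, the 2004 approach described above can be modelled by handing every branch to a thread pool. This is a sketch under my own assumptions, not the 2004 engine's code, and `run_on_pool` and `make_branch` are hypothetical names. It shows why the outcome depended on the pool. With a single free worker, the branches run one after the other. With several, they may overlap, and the order of their shapes can change from run to run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_on_pool(branches, max_workers):
    """Hand each branch to a thread pool and let the pool decide whether they overlap.

    A toy model of the 2004 behaviour as described in this post (an assumption,
    not engine code).
    """
    trace = []
    lock = threading.Lock()

    def record(step):
        with lock:                   # branches may record from different threads
            trace.append(step)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(branch, record) for branch in branches]
        for future in futures:
            future.result()          # surface any exception raised inside a branch
    return trace


def make_branch(name, shapes=2):
    def branch(record):
        for n in range(1, shapes + 1):
            record(f"{name}: shape {n}")
    return branch


branches = [make_branch("left"), make_branch("middle"), make_branch("right")]
sequential = run_on_pool(branches, max_workers=1)   # one worker: strictly left to right
overlapping = run_on_pool(branches, max_workers=3)  # order may differ between runs
```

The only thing you can rely on in the multi-worker case is that every shape runs. This is exactly the unpredictability that the 2006 model removes.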


Wednesday, July 22, 2009

Integrating DotNetNuke and the Geneva Framework

Integrating the two was a somewhat painful exercise. Sure, it was not helped by the fact that I’m not really familiar with DNN, nor am I a web developer by any stretch of the imagination. But never mind that: DNN has certain ‘features’ that made getting it to play nicely with WIF somewhat ‘challenging’. Below are some points we encountered that are worth remembering/considering (in no particular order):

DNN will redirect to ErrorDisplay.aspx on most errors.
This means that once you’ve configured the web site with FAM and automatic passive redirects, most errors will send you in an endless redirect loop.
In my first few attempts to integrate the two, the aliases for my portals in the DNN database were only configured for ‘localhost’. My STS is on a remote machine (very important for testing federated identity scenarios, obviously), so I was now accessing my portal using the machine’s name and/or IP address, for which aliases had to be defined. This error was caught by DNN before the FAM had a chance to execute. As a result, the call to ErrorDisplay.aspx was made while the user was still unauthenticated, and that redirect no longer carried the security token. This caused a redirect back to the STS, and the infinite loop. To avoid that problem, I added a location element in the web.config for ErrorDisplay.aspx and set its authorisation settings to allow all users. This allowed the error to be displayed even though the user is not authenticated. That needs to be considered carefully, of course, but we’re not showing any sensitive details on our error page.
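The loop, and why exempting the error page breaks it, can be traced with a tiny state model. This is my own toy, not DNN or WIF code. The page names follow the post, "sts" stands for the remote STS, and `next_hop` and `browse` are hypothetical helpers. Each hop either redirects to another page, and the token survives only on the STS post-back, or serves a page.

```python
def next_hop(page, has_token, error_page_is_anonymous):
    """One hop of the sequence described above (a toy model, not DNN or WIF code).

    Returns (next_page, next_has_token), or None once a page is actually served.
    """
    if page == "sts":
        # The STS authenticates the user and posts the token back to the portal.
        return "portal", True
    if page == "portal":
        # The host is not a known portal alias, so DNN redirects to its error page
        # before the FAM can use the token; the redirect drops the token.
        return "ErrorDisplay.aspx", False
    # Otherwise page == "ErrorDisplay.aspx"; in this scenario has_token is always False here.
    if error_page_is_anonymous or has_token:
        return None                       # the error page is displayed
    return "sts", False                   # the FAM sends the anonymous user to the STS


def browse(error_page_is_anonymous, max_hops=10):
    """Follow redirects from the portal; return the pages visited and whether one was served."""
    page, has_token = "portal", False
    visited = [page]
    for _ in range(max_hops):
        hop = next_hop(page, has_token, error_page_is_anonymous)
        if hop is None:
            return visited, True
        page, has_token = hop
        visited.append(page)
    return visited, False                 # gave up: still bouncing between pages
```

With `browse(False)` the pages cycle portal, ErrorDisplay.aspx, sts, portal… until the hop limit. With `browse(True)`, which models the location exemption, the error page is served on the second hop.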

The membership module performs a lookup for the username in DNN’s membership database.
As we’ve moved the authentication work from DNN to the STS, it is now possible for an authenticated user not to exist in the DNN database. We already have a synchronisation process that keeps the two databases (our membership database, used by the STS, and the DNN membership database) aligned, as we needed that anyway, but there’s always the chance of things getting out of sync. Out of the box, the DNN membership module redirects the user back to the home page. In our case, because we’re using the FAM with automatic passive redirects, this enters an infinite redirect loop again, as the redirect to the home page loses the security token. So we’re looking to change the on-error redirect to an allowed page, which is not an easy task in DNN, it appears.


DNN can host several portals.
DNN supports hosting many portals in one application (/virtual directory), driven by database configuration. That means the FAM configuration, driven via web.config, does not fit very well, as it gives us a single realm/replyTo address.
There were two options to choose from. We could decide that the entire DNN instance is a single RP, which would mean the existing configuration mechanism would suffice, but it is a security risk: a user’s permissions cannot be checked at the STS at portal level, only at ‘DNN level’. The other option was to treat each portal independently. For this to work we had to set the realm and reply values in the request to the STS dynamically, as configuration alone was not enough. And so we extended the FAM by overriding OnRedirectingToIdentityProvider and setting the realm and reply properties of the SignInRequestMessage dynamically (based on the HttpRequest, in our case).


Setting the realm and replyTo dynamically, as described above, raised another challenge. The Geneva Framework expects you to specify the audienceUris you expect in the tokens received from the STS. Generally this setting lives in the web.config, but as our realm can be one of many values, we were faced with two options. We could list all possible audienceUris in the ‘allowed’ list, which has some security implications. Or we could provide a mechanism to evaluate each incoming request dynamically and check whether it’s allowed.

The problem with the first approach, out of the box, is that it means constantly updating DNN’s web.config with newly added portals’ URIs. This has two implications. First, whoever sets up a new portal (generally a DNN user) must have access to the config file, which is not quite what we had planned (or can live with). Second, whenever the web.config is edited, if we did allow that, the app domain is reloaded. This kicks users out of their sessions (or so I’m told) and makes the next call really slow as the application is recompiled. The outcome was clear: we must stay away from the web.config.

It appears that there are several ways around this:

  • You can have a single AudienceURI on the RP side, and code your STS to always return the same one (for all portals, regardless of the realm provided). You will need a way to find the audienceUri to use, as it’s no longer the realm from the request (which is what is generally used), but that’s possible through configuration. You are also introducing a risk, as DNN (the RP) will now accept tokens across portals, but that risk can be mitigated by DNN’s own authorisation.
  • You can load the audienceURIs section of the RP’s configuration from somewhere other than the web.config (a database table, for example). To do this, you would add a handler to the FederatedAuthentication.ServiceConfigurationCreated event in the FAM on the RP side, and set the audience URI for each portal alias in the DNN database. The best place to add the handler is the constructor of your custom FAM, because InitializeModule is called after the configuration has been loaded. In a sense this is the RP version of the previous option, as it will let any token access any allowed portal, even if the token was issued to one portal and the redirect goes to another. It does remove the need to edit the web.config. However, it does not remove the need to restart the app domain when changes have been made, as the call to the database only happens once, when the module is initialised.
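The limitation of the second option is easy to see in a small Python model, which uses an in-memory SQLite database in place of the DNN database. The `PortalAlias` table and `HTTPAlias` column are assumed stand-ins for DNN's alias storage, and the `http://<alias>/` URI format is also an assumption. `StartupAudienceList` is a hypothetical class that plays the role of the list filled in from the ServiceConfigurationCreated handler.

```python
import sqlite3


class StartupAudienceList:
    """Allowed audience URIs, read once from a database when the module starts.

    Hypothetical model of the ServiceConfigurationCreated approach: the table and
    column names and the 'http://<alias>/' URI format are assumptions.
    """

    def __init__(self, conn):
        rows = conn.execute("SELECT HTTPAlias FROM PortalAlias").fetchall()
        self.allowed = {f"http://{alias}/".lower() for (alias,) in rows}

    def is_allowed(self, audience_uri):
        return audience_uri.lower() in self.allowed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PortalAlias (HTTPAlias TEXT)")
conn.execute("INSERT INTO PortalAlias VALUES ('www.MyDomain.com/DNN/MyPortal')")
audiences = StartupAudienceList(conn)      # loaded once, like the module at start-up

# A portal added after start-up is invisible until the list is rebuilt,
# which in the real RP means an app domain restart.
conn.execute("INSERT INTO PortalAlias VALUES ('www.MyDomain.com/DNN/NewPortal')")
```

`audiences` keeps answering from its start-up snapshot. Only a freshly constructed list, the equivalent of a restarted app domain, sees NewPortal.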

Both the options above provide some answer to the first approach mentioned at the beginning of this section – allow access to all possible realms, without having to edit web.configs.

The two options below talk about how you could implement a more dynamic check – moving further away from the existing method of checking against a static list of audienceUris -

  • You can implement your own SAML security token handler (you might want to implement two, one that inherits from Saml2SecurityTokenHandler and another that inherits from Saml11SecurityTokenHandler) in which you override the ValidateConditions method. In ValidateConditions you would call the base ValidateConditions method, passing in false for the ‘enforceAudienceRestriction’ parameter, which ensures the configured audienceUris are not checked by the base method. You would then implement your own audienceUri validation, presumably against the DNN database; the conditions parameter passed to you will contain the audienceUri provided by the STS. You could use either code or configuration to set up your RP to use these token handlers instead of the built-in ones.
  • A slightly refactored version of the above is to wrap the required validation code in a custom SamlSecurityTokenRequirement class in which you override the ValidateAudienceRestriction method. Both Saml2SecurityTokenHandler and Saml11SecurityTokenHandler let you provide a custom SamlSecurityTokenRequirement to override the built-in logic, so you write the validation logic just once. You will need to replace the default requirement in the token handlers with yours, which is best done in the same ServiceConfigurationCreated event mentioned above.
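What distinguishes these last two options is that the audience check runs per token, against live data. Here is a minimal Python sketch of that check. It is not the WIF signature: `validate_audience_restriction`, `AudienceRestrictionError` and the `is_known_portal` callback are hypothetical, and the callback stands in for a query against the DNN alias table. It follows the usual SAML rule that the restriction is satisfied if any listed audience is acceptable. Rejecting a token with no audiences at all is my own choice for the sketch.

```python
from typing import Callable, Iterable


class AudienceRestrictionError(Exception):
    """Raised when none of a token's audiences belongs to a known portal."""


def validate_audience_restriction(token_audiences: Iterable[str],
                                  is_known_portal: Callable[[str], bool]) -> None:
    """Check a token's audiences at validation time instead of against a static list.

    Hypothetical sketch: `is_known_portal` is evaluated on every call, so portals
    added after start-up are accepted without restarting anything.
    """
    audiences = list(token_audiences)
    if not audiences:
        raise AudienceRestrictionError("token carries no audience restriction")
    if not any(is_known_portal(uri) for uri in audiences):
        raise AudienceRestrictionError(f"no known portal among {audiences}")


known_portals = {"http://www.mydomain.com/dnn/myportal/"}   # stands in for the DNN database


def is_known_portal(uri: str) -> bool:
    return uri.lower() in known_portals


validate_audience_restriction(["http://www.MyDomain.com/DNN/MyPortal/"], is_known_portal)

# A newly created portal is accepted straight away: no web.config edit, no restart.
known_portals.add("http://www.mydomain.com/dnn/newportal/")
validate_audience_restriction(["http://www.MyDomain.com/DNN/NewPortal/"], is_known_portal)
```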

DNN’s UrlRewriteModule will redirect any request under the DNN virtual directory to the DNN root, copying any path segments “lower” than the DNN virtual directory into the query string.

This means that, even without dynamically setting the realm and reply address as described above, out of the box you would simply get an infinite redirect due to lost cookies:
If you have a portal with the alias www.MyDomain.com/DNN/MyPortal and you set the realm and reply to this address, you get redirected to the STS and then back to the portal correctly; cookies will be set for the URL above.
However, the UrlRewriteModule will redirect the request to www.MyDomain.com/DNN/default.aspx?alias=MyPortal. The authentication cookies are stored for the original location, one or more virtual directories “lower” than the DNN root directory, so they cannot be found, and the user is redirected back to the STS, and so on…
The obvious solution is to set the cookie handler path in the configuration to the DNN virtual directory. It means that moving between portals would require obtaining a new token and a roundtrip to the STS, but it solves the circular redirect, which is better. In any case, this is not a likely scenario as far as real users are concerned, as they will normally work on a single portal.
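The reason the cookies "cannot be found" is the standard cookie path-match rule: the browser only sends a cookie when its path matches the request path. Below is that rule, as defined in RFC 6265 section 5.1.4, written in Python, with `path_matches` as an illustrative name. Run against the paths from the post, it shows why a cookie scoped to /DNN/MyPortal is not sent to /DNN/default.aspx, while a cookie scoped to /DNN is. Note that the query string (`?alias=MyPortal`) plays no part in path matching.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Cookie path-match rule from RFC 6265, section 5.1.4 (case-sensitive)."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # A prefix only counts if the cookie path ends with "/" or the request
        # path continues with "/" right after it.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


rewritten = "/DNN/default.aspx"          # where the UrlRewriteModule sends the user
portal_cookie_sent = path_matches(rewritten, "/DNN/MyPortal")   # cookie set under the portal
root_cookie_sent = path_matches(rewritten, "/DNN")              # cookie handler path = DNN vdir
```

`portal_cookie_sent` is False and `root_cookie_sent` is True, which is why moving the cookie handler path up to the DNN virtual directory breaks the loop.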


OK, possibly the easiest point we encountered. We needed to replace the logout functionality, which logs the user out of DNN locally, with the WS-Federation single sign-out supported by our STS. To do this, we simply replaced the code in Login.ascx.vb (in our case it lives in [path to root DNN website]\admin\skins). We changed the out-of-the-box redirect to a redirect to the STS with ?wa=wsignout1.0 in the query string.
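Building that sign-out redirect is a one-liner, but it is easy to get the parameter name or the encoding wrong. The sketch below uses Python's `urllib.parse.urlencode`. `wa=wsignout1.0` and the optional `wreply` come from the WS-Federation passive requestor profile. The STS address and reply URL are hypothetical, and the helper assumes the STS URL has no query string of its own.

```python
from typing import Optional
from urllib.parse import urlencode


def sign_out_url(sts_url: str, reply_url: Optional[str] = None) -> str:
    """WS-Federation sign-out redirect (assumes sts_url carries no query string)."""
    params = {"wa": "wsignout1.0"}
    if reply_url:
        params["wreply"] = reply_url      # optional: where the STS should send the user after sign-out
    return f"{sts_url}?{urlencode(params)}"


logout = sign_out_url("https://sts.example.com/", "http://www.MyDomain.com/DNN/")
```

`urlencode` percent-encodes the reply URL, so its own `:` and `/` characters do not break the outer query string.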


While I was going over most items in this list with Jon Simpson, the chief architect at one of the companies I work with, he raised an interesting point: why is it that every time the Geneva Framework gets upset, the user ends up in an endless redirect loop? (OK, he didn’t put it quite in those words, and his view is, naturally, limited to the bits he cares, or has to worry, about, but the man has a point.)

I did not buy into this initially, but I suspect there’s a lot of sense in expecting the framework to recognise that there’s an endless loop in place and display an error instead of keeping it going. We will certainly need to do something on our end, as an endless redirect is not acceptable, even if it is caused by a configuration error on our part. But this should be part of the framework, or so Jon says :-).
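One way an RP (or the framework) could recognise such a loop is to count how often the same client has been sent to the STS within a short window, and to show an error page once a threshold is reached. The Python sketch below shows the counting logic only. `RedirectLoopGuard`, its threshold and window values, and the idea of keying on a client id are all hypothetical. In a real RP the count would have to travel with the client, for example in a short-lived cookie, because each redirect is a separate HTTP request.

```python
import time
from typing import Callable, Dict, List


class RedirectLoopGuard:
    """Refuse further STS redirects for a client that has bounced too often recently.

    Hypothetical sketch of loop detection; thresholds and keying are assumptions.
    """

    def __init__(self, max_redirects: int = 3, window_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_redirects = max_redirects
        self.window = window_seconds
        self.clock = clock
        self.history: Dict[str, List[float]] = {}

    def allow_redirect(self, client_id: str) -> bool:
        now = self.clock()
        # Forget redirects that fall outside the time window.
        recent = [t for t in self.history.get(client_id, []) if now - t < self.window]
        if len(recent) >= self.max_redirects:
            self.history[client_id] = recent
            return False          # caller should render an error page instead
        recent.append(now)
        self.history[client_id] = recent
        return True
```

The injectable `clock` is there so the window can be tested without waiting. In an ASP.NET module the equivalent check would sit just before the redirect to the identity provider.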


Tuesday, July 21, 2009

New ‘Oslo’ CTP released, and so there’s a new version of BTSDF

Somewhat quietly Microsoft released another CTP of ‘Oslo’, which you can find on the Oslo dev centre; it is great to see the team release early and frequent drops of ‘Oslo’, especially given the impact ‘Oslo’ is likely to have on how we build software.

Having the chance to play with the bits so early, and provide feedback, which they seem to be very keen on receiving, is pretty awesome!

I’ve only had a brief play with it, and I guess that, to me, the greatest news is that, all the potential benefits aside, my existing code (BTSDF) still worked as is. Of course this is only temporary, as the team have made significant changes (read: improvements) to the API. They have kept backwards compatibility TEMPORARILY while they work to align all their existing code with the new model, and the rest of us worry about ours.

So next I needed to spend some time looking at the new release in detail and aligning my code with it, so that I can reap some of the benefits of the improvements made. But that’s a much better position to be in than “it’s all broken now and I need to figure out how to fix it”. Kudos, guys!

Paul Arundel was kind enough to give me a gentle nudge to start looking at these changes. It certainly took me longer than I would have liked to get around to it, thanks to other commitments (a recurring theme here recently), but slowly but surely I went through my code, and I am happy to say I’ve now published the necessary changes to CodePlex.

I was going to write a post on the changes required when moving to the new code, but Paul has already done so, and there’s little point in me saying pretty much the same things. Paul focuses only on the aspects relevant to his project, but as we both work in pretty much the same area, my words would have been almost identical.

There’s only one point I think is worth mentioning from my end, and it’s probably only me. Being able to walk the graph more easily and conveniently, and to access nodes by label (or by checking their brand), motivated me to work more on the grammar itself. To be more specific, it motivated me to work on the projection of the grammar, whereas previously I would just jump through whatever hoops were needed to parse whatever projection I got.
This is a very good thing, I think, although it is the grammar itself that really matters, as this is what users will see. The other side of this, though, is that it is still not quite possible to get the projection just right. I can’t get a projection that would work well with any M model for my domain, for instance, but I suspect the smart guys in Redmond are hard at work sorting this out for us…


Thursday, July 09, 2009

Geneva Framework Query String woes

I bumped into a situation where a query string passed from one RP to another RP was lost as the request went through the STS.

Looking through this carefully, I think I figured out why this happens. It turns out that the scope of the issue is slightly bigger (and therefore the title of my post is slightly misleading); see the details below. But first, here’s how it goes when everything goes well:

A user tries to access an RP URL with some query string parameters, and the RP configuration redirects the user to the STS, providing the realm URI and reply address. As these two values are generally retrieved from the RP configuration, they are unlikely to include the query string parameters.

The good news, though, is that the FAM keeps the original URL requested by the user (excluding the scheme and domain) in a context field (the ‘ru’ field) passed to the STS with the sign-in request.

Once the STS authenticates the user, it redirects the request to the reply address provided by the RP. This is NOT the original address the user requested (now in the context) but the reply value the RP provided in the request, and it will NOT contain the query string parameters.

The FAM, at the RP side, now extracts the URL the user requested from the context and does a second redirect to that URL – the one with the query string.
This time the local cookie is found, the user is authenticated, and the request can be processed by the RP code, where the query string is available to be read.
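The round trip of the requested URL can be sketched in Python with `urllib.parse`. `wa=wsignin1.0`, `wtrealm`, `wreply` and `wctx` are the WS-Federation sign-in parameters. The context format is simplified to just `ru=<path and query>`; the real FAM context carries more than that, so treat the format as an assumption. `sign_in_redirect` and `return_url` are hypothetical helpers, and the STS address is made up. The point is that the query string survives only inside the context, never in the reply address.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sign_in_redirect(sts_url, realm, reply, requested_url):
    """Simplified sign-in redirect: realm and reply come from configuration, while
    the originally requested path and query travel in the context as 'ru'."""
    parts = urlsplit(requested_url)
    ru = parts.path + ("?" + parts.query if parts.query else "")
    context = urlencode({"ru": ru})          # assumed, simplified context format
    return f"{sts_url}?" + urlencode({"wa": "wsignin1.0", "wtrealm": realm,
                                       "wreply": reply, "wctx": context})


def return_url(sign_in_url):
    """The path the RP redirects to once it has processed the STS response."""
    query = parse_qs(urlsplit(sign_in_url).query)
    return parse_qs(query["wctx"][0])["ru"][0]


url = sign_in_redirect("https://sts.example.com/",
                       "http://MyDomain.com/SomeApp/",
                       "http://MyDomain.com/SomeApp/SSO.aspx",
                       "http://MyDomain.com/SomeApp/Default.aspx?Something=SomethingElse")
```

`return_url(url)` gives back `/SomeApp/Default.aspx?Something=SomethingElse`, while the `wreply` value is still the bare SSO.aspx address.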


So, where is it broken? It appears that the problem arises when, after all of the above has happened, a request to the RP with some query string is redirected to the STS again for authentication, for some reason. When that request comes back from the STS to the RP, the SignInResponseMessage is ignored (as there’s already an authentication cookie), and that second redirect, to the URL in the context, does not happen.
In this case, the RP code that gets executed is the page defined by the reply URL in the configuration, which is unlikely to have the query string the user requested. In fact, and here’s the bigger scope, it may well be a completely different page from the one the user requested in the browser!


So the question is: why would I be redirected to the STS when there’s already an authentication cookie for this realm? After all, if there is a cookie I would not be redirected to the STS, right?! It turns out there are some ‘edge’ cases where this is possible; here’s one:

Step 1

1. Some link somewhere sends the user to the RP using domain name in the URL, with some query string parameters – http://MyDomain.com/SomeApp/Default.aspx?Something=SomethingElse

2. The RP redirects the request to the STS for authentication, providing the realm and reply from the configuration; let’s assume the reply address was set to http://MyDomain.com/SomeApp/SSO.aspx

3. The STS authenticates the user, and redirects back to the reply URL above

4. At the RP, the FAM processes the SignInResponseMessage, stores the authentication cookies for MyDomain.com/SomeApp and then redirects to the URL from the context, http://MyDomain.com/SomeApp/Default.aspx?Something=SomethingElse (note that SSO.aspx never gets executed, and that the query string is now available to Default.aspx)


Step 2

5. Now assume that another link somewhere sends the user to the same application, but for whatever reason it uses the server’s IP address instead of the domain name: http://<some IP address>/SomeApp/Default.aspx?Something=SomethingElse

6. The RP cannot find any authentication cookies for this URL (as it has the IP address and not the domain name), and so it too redirects the request to the STS for authentication. However, as it’s the same configuration, the STS URL, the realm and the reply address are the same as in Step 1, so the STS redirects straight back to the reply address, http://MyDomain.com/SomeApp/SSO.aspx

7. At the RP, as the URL is now the same as it was in Step 1, the FAM finds the previously set cookie and so ignores the SignInResponseMessage. This means that the redirect to the original URL the user requested does not happen, and so it is the page at the reply address, without any query string, that gets processed.


There is actually a much simpler way to reproduce the issue, albeit one that is less likely outside of testing: access the RP via its IP address while the reply address in the configuration is set using the domain name.

The cookie will be stored for the domain name. Any subsequent attempt to access the RP via the IP address with query string parameters will therefore lose those parameters, because the FAM does not perform the redirect.
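The steps above can be replayed with a toy Python model of the FAM's decision at the reply address. The model is my own simplification, not WIF code. `RelyingParty` is hypothetical, cookies are tracked per host (mirroring the browser keeping them separately for the domain name and the IP address), and the STS is assumed to post straight back to the configured reply URL. If a cookie already exists for the reply host, the sign-in response is ignored and the reply page runs. Otherwise the cookie is set and the user is sent on to the `ru` path.

```python
from urllib.parse import urlsplit


class RelyingParty:
    """Toy model of the FAM flow described above (an assumption, not WIF code)."""

    def __init__(self, reply_url):
        self.reply_url = reply_url
        self.cookie_hosts = set()            # hosts the browser holds an auth cookie for

    def sign_in_response(self, ru):
        """The STS posts back to the reply URL; returns the page that ends up running."""
        reply = urlsplit(self.reply_url)
        if reply.netloc in self.cookie_hosts:
            # Already authenticated here: the response is ignored, so no redirect to 'ru'.
            return self.reply_url
        self.cookie_hosts.add(reply.netloc)
        return f"{reply.scheme}://{reply.netloc}{ru}"

    def request(self, url):
        """A browser request; returns the page that finally runs."""
        parts = urlsplit(url)
        if parts.netloc in self.cookie_hosts:
            return url                       # cookie found: served directly
        ru = parts.path + ("?" + parts.query if parts.query else "")
        # No cookie for this host: off to the STS, which posts back to the reply URL.
        return self.sign_in_response(ru)


rp = RelyingParty("http://MyDomain.com/SomeApp/SSO.aspx")
step1 = rp.request("http://MyDomain.com/SomeApp/Default.aspx?Something=SomethingElse")
step2 = rp.request("http://10.0.0.5/SomeApp/Default.aspx?Something=SomethingElse")
```

`step1` lands on Default.aspx with its query string, while `step2` (the made-up IP address 10.0.0.5 standing in for `<some IP address>`) ends up on SSO.aspx with no query string, matching steps 5 to 7. Starting with a fresh `RelyingParty` and making the IP request twice reproduces the simpler scenario: the first call still reaches Default.aspx, and the second falls back to SSO.aspx.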
