Yossi Dahan [BizTalk]


Monday, May 05, 2008

Do we need schemas?

This is somewhat of a recurring theme with me recently, but I want to discuss the contents of the management database; more specifically I want to discuss the fact that schemas get deployed to it and that most other things deployed will have a strong dependency on schemas.

As schemas are always at the bottom of the dependency chain, this means that, on top of the expected difficulties of changing a schema and the impact on other systems, the actual act of deploying a new schema becomes a problem in itself.

At best this is simply an annoyance to a developer who needs to re-deploy his entire solution as the schema evolves through the development cycle (versioning is not applicable in this scenario);

At worst this is an operational nightmare if a solution has to be updated/patched/evolved where a good versioning story does not exist (as is all too often the case, not that versioning would have solved this all).

As we are forced to remove the entire solution and then re-deploy with the new schema, we can expect, from my experience, the process to take quite a while for large solutions, which may take the business offline for a couple of hours.

Taking the risk of making a point about something I don't know enough about - the internal behaviour of BizTalk server with regards to deployed schemas (but one could say this is often the case...) - I would argue that as far as I can tell, schemas are not actually used all that often by the runtime.

(and because I accept I could be completely wrong here, please do share any thoughts/ideas/comments/insights/whatever on the subject - put a comment on this post or email me if you prefer. I'd love to hear some feedback on this.)

Anyway - as I was saying -

When you define a message type you select the schema at design time, and the designer may refer to that schema to do various things: draw the map designer, check the validity of assignments in expression shapes, build IntelliSense. It would even check serialisation and de-serialisation attributes on classes against your schema when you try to assign a .NET class to a message in an expression shape. But as far as I'm aware, the schemas are rarely used by the runtime.

At runtime, when a message is received into an orchestration (and set to a pre-defined message type), its contents are not checked against the schema; nor does it get validated at the end of a transform or message assignment shape.

When you run a map you select a schema, but again - that map could well return something completely different; BizTalk couldn't care less.

When do I know schemas get used? In the pipelines. Sometimes.

If you're using the XmlDisassembler for example it would try to resolve the message type based on the message's root node and namespace, and then try to get the schema from the database.
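
As an illustration of that lookup key, here is a minimal Python sketch (not BizTalk code, and not how the XmlDisassembler is implemented) that derives a message type from a document's root node and namespace. The `message_type` helper and the sample namespace are hypothetical, and treating a namespace-less document as just its root name is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET


def message_type(xml_text: str) -> str:
    """Return 'namespace#rootName' for an XML document (illustrative helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes a qualified tag as '{namespace}localname'.
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    # Assumption: with no namespace, the type is just the root node name.
    return f"{namespace}#{local_name}" if namespace else local_name


sample = '<ns0:Order xmlns:ns0="http://example.org/orders"><Id>1</Id></ns0:Order>'
print(message_type(sample))
```

Note that nothing here touches the body of the message or any schema content; the key is computed from the root element alone.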

The disassembler may then use this schema to promote some properties; if configured, it may debatch the message according to the schema and possibly use it to validate the message. All are very valid usages for the schema, but they are not always used, and they require specific configuration, either in the schema at design time or in the pipeline component (or both).

Also, at least with regards to property promotion, all that gets used is a bunch of XPaths provided in an annotation in the schema, not the actual schema information.
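
To show why a list of paths is enough for promotion, here is a rough Python sketch under stated assumptions. The `PROMOTIONS` table is a hypothetical stand-in for the annotation, the property names are invented, and the paths use ElementTree's limited path syntax rather than the full XPath expressions a real schema annotation would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the promotion annotation: property name -> path.
# ElementTree supports only a small XPath subset, so these paths are
# written relative to the root element.
PROMOTIONS = {
    "OrderId": "Header/Id",
    "CustomerCode": "Header/Customer",
}


def promote(xml_text, promotions):
    """Copy the value found at each path into a context dictionary."""
    root = ET.fromstring(xml_text)
    context = {}
    for name, path in promotions.items():
        value = root.findtext(path)
        if value is not None:  # skip paths that select nothing
            context[name] = value
    return context


message = "<Order><Header><Id>42</Id><Customer>ACME</Customer></Header></Order>"
print(promote(message, PROMOTIONS))
```

The function never looks at element types, cardinality or any other schema definition; it only needs the paths.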

There are, of course, other cases where schemas are required - FlatFileDisassembler, XmlValidation, Xml and FlatFile Assemblers all need schemas for their work (to some extent at least) and definitely the design time environment uses them extensively, but what I'm arguing is - can we do without having to deploy schemas if they are not used?

BizTalk works in a late-binding fashion anyway, where assemblies and their contents are loaded from the GAC/database as needed (and may be unloaded after a period of not being used). Couldn't we get away with only deploying the schema when it is needed at runtime, and simply 'register' message types when it is not?
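
The 'register now, load later' idea could look something like the following Python sketch. It is purely a thought experiment: `SchemaRegistry`, its methods and the loader function are all hypothetical names, and nothing here reflects how BizTalk actually stores or loads schemas.

```python
class SchemaRegistry:
    """Knows every message type up front; loads schema content only on demand."""

    def __init__(self):
        self._loaders = {}  # message type -> function returning schema text
        self._cache = {}    # message type -> schema text already loaded

    def register(self, message_type, loader):
        # Registration records only the type name and how to fetch the schema.
        self._loaders[message_type] = loader

    def is_known(self, message_type):
        # Type checks need the name only, never the schema body.
        return message_type in self._loaders

    def get_schema(self, message_type):
        # Load on first use, then serve from the cache.
        if message_type not in self._cache:
            self._cache[message_type] = self._loaders[message_type]()
        return self._cache[message_type]


loads = []  # records each real load so the laziness is visible


def load_order_schema():
    loads.append("order")
    return "<xs:schema/>"  # placeholder for schema text read from an assembly


registry = SchemaRegistry()
registry.register("http://example.org/orders#Order", load_order_schema)
print(registry.is_known("http://example.org/orders#Order"), len(loads))
registry.get_schema("http://example.org/orders#Order")
registry.get_schema("http://example.org/orders#Order")
print(len(loads))
```

In this sketch the known-type check costs nothing in terms of schema loading, and the schema body is fetched once, the first time a component actually asks for it.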

In fact, even if a schema is needed at runtime, why does it need to exist in the database? How is it different from maps, pipelines and orchestrations, all of which are 'known' to the database but physically exist only in the GAC? (Well, that's not quite accurate: the orchestration's structure is stored as XML in the database, but that's so it can be displayed in HAT, and is possibly a bad design decision in its own right.)

I can't help thinking I'm missing something. I'm sure the people behind BizTalk's design gave this a lot of thought and found good justification for it. Can anyone comment on what that might be?

One argument could be that BizTalk wants to know which messages are 'supported' by the solution: just as a message arriving with no subscription is considered an error, a message arriving which is not of a known 'type' should be considered an error. But in a sense the two are the same, and in any case BizTalk is quite happy to support 'blob' messages through the use of passthrough pipelines and XmlDocument as a message type in orchestrations.


14 Comments:

  • Hi yossi,

    I had a situation the other day where I was creating a new message in a pipeline component, which then went via the message box to an orchestration.

    The orchestration port knows what type of message it is expecting.

    When I ran my scenario I was getting an error from the receive shape (if I remember correctly) in the orchestration, indicating that it had received a message which did not match the schema strong name context property.

    I'm assuming this means the orchestration is deserializing this message and checking it against a schema internally.

    Once I set the correct value in the context property, it worked fine.

    What does this mean? I guess in some scenarios the schema is important to orchestration execution internally.

    HTH
    Mike

    By Anonymous Mike Stephenson, at 05/05/2008, 23:35  

  • Thanks Mike

    But I wonder - from your comment it is not clear whether the engine actually checked the message content/structure AGAINST the schema, or simply checked that the message type is known (based on the message type identified in the pipeline, or even the root-node/namespace combination).

    I suspect all that happened is that the engine identified that the message received was not of the requested TYPE, but did not actually care about the contents of the message (which were examined in the pipeline). Am I wrong?

    I'm happy with the check for message type vs. subscription, but does it really need the entire schema in the database?

    By Blogger Yossi Dahan, at 05/05/2008, 23:47  

  • BTS does not need the schema most of the time. One purpose for schemas in my solutions is schema-level validation of incoming documents. I use a custom pipeline component to get a schema and validate documents against it... having the schema available in this case is very useful.

    Schemas are of course important for mapping, but there are interesting ways around that too, and you can actually get away without having a schema around at all (for mapping).

    Things change if you have mandatory elements (min=1 max=1) and try to de/serialize something that does not conform - but that is a very basic level of validation.

    For true (as in functioning) schema-level validation, you have to write your own pipeline.

    By Anonymous Erik, at 08/05/2008, 04:02  

  • Yes, I agree with all the things you said. However, we still need schemas.

    Note that BizTalk creates .NET data types (classes) based on these schemas, and if your input is XML, BizTalk tries to match the XML against the .NET class type generated from the schema. Sure, you can always declare all message types as XmlDocument, but how are you going to instantiate an orchestration based on the message type?

    Also, schemas provide a way to AUTOMATICALLY promote a node as a promoted property... which is used in correlation and all that good stuff.

    So, yeah...it may not seem much...i suppose it's a necessary evil.

    By Blogger Dexter Legaspi, at 09/05/2008, 17:06  

  • Thank you all for the comments, I'm very interested in this subject and am really happy to get (and share) as many opinions as possible.

    I don't think this is at all clear cut (not that my view matters).

    But - to be clear - I'm not arguing at all that schemas are not important or useful, nor do I deny that sometimes they are really needed and that BizTalk makes very good use of them.

    I'm definitely not arguing to make BizTalk any less strongly typed. I think that well-known message types are a fundamental concept in BizTalk that should be maintained.

    What I am trying to argue, however, is that often the actual schema content is not used by BizTalk and so might not be needed, which would save quite a bit of hassle around deployment.

    I think often it would be enough for BizTalk to map a root node-namespace combination to a message type (as it does), without needing the entire schema in the database.

    Isn't that what's happening in practice anyway?


    In other words - I think that, like pipeline components for example, schemas should be "known of" but only exist fully in the database if they are needed by the runtime (for property promotion, for example, or debatching, or validation when one chooses to have it). Maybe they could even simply be loaded from the assembly when needed at runtime, just as pipeline components and their configuration are (and cached, of course!).

    Does that not make sense at all?

    By Blogger Yossi Dahan, at 12/05/2008, 22:02  

  • That does make sense, an interesting discussion.
    I think BizTalk's deep deployment dependency on schemas is due to early design paradigms/decisions rather than what's necessary and sufficient to make the thing tick. The .NET assembly containing the text of the XSD could certainly be read from the GAC at runtime, but maybe the (message data) promoted properties must be deployed as a hard dependency because the subscriptions rely on them - it's better to make you undeploy your entire codebase to change the subscriptions than to allow them to change outside of BizTalk's control.

    Of course, the intended use case is not that you should be undeploying the entire stack; you should be deploying the next version side by side. And whilst you may not have planned a great versioning story up front, you were compelled to at least assign a strong name to your schema assembly, which allows you to take things forward for side-by-side deployment.

    By Blogger Ben Cops, at 18/06/2008, 19:36  

  • I do not know much, but it may be because the schemas can also be used in other applications to apply filters.
    Maybe because of that they cannot be loaded ONLY at runtime when required. Schemas are also required at configuration/design time.

    By Blogger Naushad Alam, at 29/12/2008, 11:51  

  • Hi Yossi,

    Thanks for sharing all your insight on BizTalk. I'm trying to read everything, but there is a lot ;)
    We recently ran into a situation where we need to use a pipeline component to transform an XML document to another XML type. Seems easy enough. Plus, we can just use a standard XSL file in whatever path we choose and essentially never need to redeploy. And then here comes the ugly monster that I think you are referring to...

    We also need to be able to convert flat files to an internal XML type. Instead of just using an XSL file, we need an actual schema, which, as far as I understand, MUST be deployed with the BizTalk solution. Now, whenever we have another customer with a new flat file schema, we have to redeploy our entire solution! And I agree, this is a bit absurd.

    Why can't we access a schema outside the solution? Or if it is possible, please share the technique.

    Pete

    By Anonymous Anonymous, at 13/02/2009, 21:57  

  • Hi Pete, and thanks for your comment.
    You say you need a pipeline component to transform the xml - are you using a pipeline component or a map (with custom xsl)?
    Either way, life is a bit easier in these cases, because you could have your schema in an assembly, deployed to BizTalk, without any dependencies (other than the map, obviously).
    In many cases, by separating artifacts into separate assemblies, you can achieve a good level of separation between parts of your solution that would allow you to partially undeploy it; it is much harder when orchestrations get involved.
    In your case - when you have a new customer with a new flat file format - could you not simply deploy an additional assembly with that schema and the relevant map (to some canonical format) and the pipeline configured with the disassembler?

    By Blogger Yossi Dahan, at 15/02/2009, 11:34  

  • Hi Yossi,

    I have a question about passing different files to BizTalk without having a schema. Is it possible? I'm thinking of having a custom component placed in the decode stage that would accept all kinds of files and move them to the specified location.

    Any help?
    Thanks

    By Blogger reese, at 06/01/2010, 08:43  

  • Hi Reese

    Generally it is best to post questions on the BizTalk newsgroup, as more people can answer them there.

    (Not that I mind anybody posting questions here...)

    To your question - there isn't any requirement to use a schema.

    The exact implementation details, however, will depend on your scenario.

    If you just want to move a file from one place to another, for example, two ports configured with passthrough pipelines will do the job nicely.

    If you need to run your message through an orchestration, make sure to set the message type to XmlDocument (even if the message is not xml), and it will handle it nicely.

    By Blogger Yossi Dahan, at 06/01/2010, 09:24  

  • Hi yossi,
    Thanks for the response. Can I have the link to the BizTalk newsgroup? Thanks.
    Actually the scenario would be for BizTalk to receive any kind of file: CSV, TXT, XML... then move it to a location and call a web service to notify of the upload. So I'm wondering, if I use message routing, where would I call the web service?
    Thanks so much.

    By Blogger reese, at 06/01/2010, 09:35  

  • Sure - it's on http://social.msdn.microsoft.com/Forums/en-US/biztalkgeneral/threads

    One option would be to have the send port that subscribes to the incoming file and drops it into the location.

    As for the notification - you have several options -

    1. You can have a second send port to call the web service. You will need to look into how to call a web service without an orchestration if you're using the SOAP adapter, but look on the web - it's more than possible.
    You will also need something in the port to create the actual notification, probably possible in a map, possibly easier in a pipeline component.

    2. An easier, but less efficient, option would be to have an orchestration subscribing to the file as well, pretty much ignoring the file but calling the web service.

    Both these options are relatively easy, but somewhat inefficient, as they both receive the same file when they don't actually need the file contents. So, option number 3 -

    In your receive pipeline, have a custom disassembler that returns the first message untouched, but also returns a second message, which is just the notification.
    Then the send port or orchestration (options 1 and 2) can subscribe to that notification message only.

    By Blogger Yossi Dahan, at 06/01/2010, 10:41  

  • thanks yossi for the information.
    I've tried using System.Xml.XmlDocument and it's now working. :) For the web service I'll try doing what you've said. But I'm also going to get the filename and pass that as a parameter to the web service. So maybe it will work in the expression shape, right? I'm not that sure about getting the filename.

    Thanks so much for the help. I really appreciate it.

    :)

    By Blogger reese, at 06/01/2010, 10:50  
