BizTalk 2006 : Pipeline Component Best Practices and Examples - Creating New Documents, Using BizTalk Streams

10/9/2012 8:59:36 PM

Creating New Documents

When you look at how a Disassembler component is structured, you are essentially building new documents that get submitted to the Messagebox. In our previous examples, we demonstrated the use of the SchemaWithNone and SchemaWithList objects as properties to allow users to choose what type of document should be accepted through the IProbeMessage interface. Taking this one step further, you could build a generic Disassembler component that lets users select what type of document they want to accept and gives them an interface to choose what type of document will be produced. The custom logic that extracts the values for the new document still has to be written, but at least the schema type will be available. But how do you actually create a new message? You know the schema type of the message, but how do you create a new XmlDocument with all the available nodes already inserted but empty?

There are two ways to accomplish this task: the right way and the not-so-right way. The not-so-right way is the simplest. What most people do is hard-code the XML for the new empty document in a string and assign it to a new XmlDocument object. This approach is cumbersome for a number of reasons, the most important being that if the structure of the message ever changes, the class will need to be recompiled. Another "wrong," but more correct, way is to load the XML from a configuration file at runtime or include it as a resource file that is imported when the assembly is loaded. This is still a pain, since you have to manually keep this XML in sync with the actual BizTalk schema.

A different way to do this is to use an undocumented API that allows you to create a new blank XmlDocument based solely on the class file that is generated when you create a new schema. Unfortunately, this class is unsupported and is not made public by the BizTalk product team. It does work well, but you need to think about the support implications of using this class in your solution. For most, this isn't an issue, as the other alternative is to create a schema walker class as documented at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxmlnet/html/xmlgen.asp. Our only issue with the schema walker is that a significant amount of code is required to implement it. Also, depending on how you create your schema, certain attributes and imports may not be picked up in the new empty document. We have also found a few compatibility issues between the documents that it generates and BizTalk's validation engine. In the end, the schema walker is a good solution if you are wary about using an undocumented class, but using the class that exists within the BizTalk Framework guarantees that your documents will match the schema within the engine and will validate properly.

The first thought that comes to many people's minds when they think about this example is "Okay, I have an external resource file that I need to keep in sync with the actual schema, but won't my code that uses the schema need to change anyway if I have a schema change?" The answer to this is maybe. In many cases, the type of code that creates new XML instances only uses certain fields. Often schema changes involve adding new elements to the schema, not removing them or changing element names. In this case, should the BizTalk schema be modified to include new elements, then no code needs modification, and new XML instances will be created with empty elements as you would expect. In the case where fields have been renamed or removed, you will need to determine whether your pipeline component has explicitly added values to those nodes via an XPath expression. If the component has, then you will need a code change.
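The point about XPath-based population can be illustrated with a minimal sketch. It uses only System.Xml; the Order structure, the XPath expressions, and the TrySetValue helper are hypothetical stand-ins, and a hand-written empty instance takes the place of one generated from a schema.

```vbnet
Imports System
Imports System.Xml

Module EmptyInstanceExample

    ' Sets the text of the first node matching the XPath expression.
    ' Returns False when no node matches, which is what happens after
    ' a schema change renames or removes the element.
    Public Function TrySetValue(ByVal doc As XmlDocument, ByVal xpath As String, _
                                ByVal value As String) As Boolean
        Dim node As XmlNode = doc.SelectSingleNode(xpath)
        If node Is Nothing Then
            Return False
        End If
        node.InnerText = value
        Return True
    End Function

    Sub Main()
        ' Stand-in for an empty instance (hypothetical structure)
        Dim doc As New XmlDocument()
        doc.LoadXml("<Order><OrderId /><Customer /><Notes /></Order>")

        ' An element the code knows about: populated
        Console.WriteLine(TrySetValue(doc, "/Order/OrderId", "12345"))      ' True
        ' An element that does not exist in this structure: reported, not thrown
        Console.WriteLine(TrySetValue(doc, "/Order/ShipDate", "2006-01-01")) ' False
    End Sub

End Module
```

Elements the code never touches (Customer and Notes here) stay empty, so adding elements to the schema does not affect this kind of code. Real BizTalk schemas usually declare a target namespace, in which case the XPath expressions need an XmlNamespaceManager or local-name() tests.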

To generate the new empty document, you need to create an instance of the following class: Microsoft.BizTalk.Component.Interop.DocumentSpec. This class is found in the Microsoft.BizTalk.Pipeline assembly.

An example method follows that can be used to create new documents based on the passed schema name.

Imports System.IO
Imports System.Text
Imports System.Xml
Imports Microsoft.BizTalk.Component.Interop
Imports Microsoft.BizTalk.ExplorerOM

Public Function CreateNewBTSDocument(ByVal schemaFullName As String) As XmlDocument
    Dim newDocument As XmlDocument = Nothing
    Dim catExplorer As New BtsCatalogExplorer
    Dim Schemas As SchemaCollection
    Dim myDocSpec As DocumentSpec = Nothing
    Dim mySchema As Schema
    Dim sbuilder As New StringBuilder()

    ' Connect to the management database to look up the deployed schema
    catExplorer.ConnectionString = "Integrated Security=SSPI; Persist Security Info=false; Server=(local); Database=BizTalkMgmtDb;"
    Schemas = catExplorer.Schemas
    mySchema = Schemas(schemaFullName)

    If Not (mySchema Is Nothing) Then
        myDocSpec = New DocumentSpec(schemaFullName, _
            mySchema.BtsAssembly.DisplayName)
        If Not (myDocSpec Is Nothing) Then
            Dim writer As New StringWriter(sbuilder)
            Try
                newDocument = New XmlDocument()
                'create and load the new instance into the return value
                newDocument.Load(myDocSpec.CreateXmlInstance(writer))
            Finally
                writer.Dispose()
            End Try
        End If
    End If

    ' Nothing is returned when the schema is not found in the catalog
    Return newDocument
End Function

Using BizTalk Streams

BizTalk Server 2004 and 2006 have been built to use streams as a key part of the products' architecture. A stream as a programming construct is a sequence of bytes with no fixed length. When you begin to read a stream, you have no idea how long it is or when it will end. The only control you have is over the size of the data you will read at any one time. So what does this have to do with good programming? It means that when you are dealing with extremely large amounts of data, if you use a stream, you don't need to load all of this data at once. It is almost like reading a book. You can't just read the entire book at once; you must read the pages one at a time. When reading a book, the amount of data you consume at one time is a page; the letters on the page represent bytes. You also don't know how big the book is until you finish the last page and see "The End" (unless you skip to the back of the book).
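Reading "one page at a time" can be sketched with nothing more than System.IO. The CountBytes helper and the 4,096-byte chunk size below are illustrative choices, not part of BizTalk.

```vbnet
Imports System
Imports System.IO

Module ChunkedReadExample

    ' Reads a stream in fixed-size chunks and returns the total number of
    ' bytes seen. Only one chunk is held in memory at any time.
    Public Function CountBytes(ByVal source As Stream, ByVal chunkSize As Integer) As Long
        Dim buffer(chunkSize - 1) As Byte
        Dim total As Long = 0
        Dim read As Integer = source.Read(buffer, 0, buffer.Length)
        While read > 0
            ' A real component would process buffer(0 .. read - 1) here
            total += read
            read = source.Read(buffer, 0, buffer.Length)
        End While
        Return total
    End Function

    Sub Main()
        ' A MemoryStream stands in for a large message body
        Using source As New MemoryStream(New Byte(9999) {})
            Console.WriteLine(CountBytes(source, 4096)) ' 10000
        End Using
    End Sub

End Module
```

The loop never asks the stream for its length; it only learns that the data has ended when Read returns 0, which is the "The End" page in the book analogy.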

In this way, streams make dealing with large amounts of data more manageable. If you have worked with BizTalk 2002 or earlier, you know that BizTalk would often produce "out of memory" exceptions when processing large XML documents. This was because BizTalk 2000 and 2002 used the XML DOM to parse and load XML documents. The DOM is not a streaming-based model; it requires you to load the entire document into memory before you can use it.
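The contrast with the DOM can be sketched with the forward-only XmlReader from System.Xml, which visits one node at a time instead of building the whole tree. The CountElements helper and the sample XML are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module StreamingXmlExample

    ' Counts elements with the given local name without building a DOM.
    ' Memory use stays roughly constant regardless of document size.
    Public Function CountElements(ByVal source As Stream, ByVal localName As String) As Integer
        Dim count As Integer = 0
        Using reader As XmlReader = XmlReader.Create(source)
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element AndAlso _
                   reader.LocalName = localName Then
                    count += 1
                End If
            End While
        End Using
        Return count
    End Function

    Sub Main()
        Dim xml As String = "<Orders><Order /><Order /><Order /></Orders>"
        Using source As New MemoryStream(Encoding.UTF8.GetBytes(xml))
            Console.WriteLine(CountElements(source, "Order")) ' 3
        End Using
    End Sub

End Module
```

Loading the same input with XmlDocument.Load would give the same answer, but only after the entire document had been materialized in memory.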

To support this paradigm, the BizTalk product team has included three classes that optimize how you can use streams in your pipeline components and allow you to do stream-based XPath queries. Each of these classes is explained in the following sections.

VirtualStream

Included in the BizTalk SDK under the \Program Files\Microsoft BizTalk Server 2006\SDK\Samples\Pipelines\ArbitraryXPathPropertyHandler directory is a class file called VirtualStream.cs. This class is an implementation that holds the data in memory up to a certain threshold (by default 4MB). The remaining data it keeps on disk in temporary files. The ArbitraryXPathPropertyHandler example in the SDK shows you an example of how to use this class.
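The idea of a memory threshold with disk overflow can be sketched as follows. This is not the SDK's VirtualStream: it is a simplified variant that moves everything to a temporary file once the threshold is crossed, rather than keeping the first part in memory. The BufferToSeekable name and the threshold values are hypothetical.

```vbnet
Imports System
Imports System.IO

Module OverflowBufferExample

    ' Copies source into a seekable stream: a MemoryStream while the data
    ' stays within thresholdBytes, a temporary file once it grows larger.
    Public Function BufferToSeekable(ByVal source As Stream, _
                                     ByVal thresholdBytes As Integer) As Stream
        Dim target As Stream = New MemoryStream()
        Dim buffer(4095) As Byte
        Dim read As Integer = source.Read(buffer, 0, buffer.Length)
        While read > 0
            If TypeOf target Is MemoryStream AndAlso _
               target.Length + read > thresholdBytes Then
                ' Spill to a temp file that is deleted when the stream closes
                Dim overflow As New FileStream(Path.GetTempFileName(), _
                    FileMode.Create, FileAccess.ReadWrite, FileShare.None, _
                    4096, FileOptions.DeleteOnClose)
                DirectCast(target, MemoryStream).WriteTo(overflow)
                target.Dispose()
                target = overflow
            End If
            target.Write(buffer, 0, read)
            read = source.Read(buffer, 0, buffer.Length)
        End While
        target.Position = 0
        Return target
    End Function

    Sub Main()
        Dim small As Stream = BufferToSeekable(New MemoryStream(New Byte(99) {}), 1024)
        Dim large As Stream = BufferToSeekable(New MemoryStream(New Byte(4999) {}), 1024)
        Console.WriteLine(TypeOf small Is MemoryStream) ' True
        Console.WriteLine(TypeOf large Is FileStream)   ' True
        Console.WriteLine(large.Length)                 ' 5000
        small.Dispose()
        large.Dispose()
    End Sub

End Module
```

Either way the caller gets back a seekable stream positioned at the start, while large messages do not have to live entirely in process memory.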

SeekableReadOnlyStream

SeekableReadOnlyStream is an implementation of a stream class that provides fast, read-only, seekable access to a stream. It is a wrapper class around a regular stream object and can be used in cases where the base stream object is not seekable and write access is not needed. An example of this class can be found in the \Program Files\Microsoft BizTalk Server 2006\SDK\Samples\Pipelines\Schema Resolver Component directory.

XPathReader

The XPathReader class lives in the Microsoft.BizTalk.XPathReader.dll assembly. This class provides XPath query access to a stream of XML. This is very advantageous, as it allows for very fast, read-only access to a stream of data via an XPath expression. Normally, XPath queries require the entire document to be loaded into memory, such as in an XmlDocument.

Using the XPathReader, you can load your document via the SeekableReadOnlyStream class mentioned previously, and then have this stream wrapped by an XmlTextReader. The net effect is that you have a stream-based XPath query that does not require the entire XML document to be loaded into memory. The following example shows how this can be implemented in a pipeline component. Note the use of the SeekableReadOnlyStream variable in the Execute method. This is the means by which you can have your stream of data be seekable and read-only, which improves the performance and usability of the pipeline component.

Imports System
Imports Microsoft.BizTalk.Component.Interop
Imports Microsoft.BizTalk.Message.Interop
Imports System.Collections
Imports Microsoft.BizTalk.XPath
Imports System.Xml
Imports System.IO
Imports Microsoft.Samples.BizTalk.Pipelines.CustomComponent

Namespace ABC.BizTalk.Pipelines.Components

 <ComponentCategory(CategoryTypes.CATID_PipelineComponent)> _
 <ComponentCategory(CategoryTypes.CATID_Any)> _
 Public Class PropPromoteComponent
   Implements IComponent
   Implements IComponentUI
   Implements IBaseComponent
   Implements IPersistPropertyBag

   ' Backing fields for the design-time properties
   Private _PropertyName As String
   Private _Namespace As String
   Private _XPath As String

 Public Property PropertyName() As String
   Get
     Return _PropertyName
   End Get
   Set
     _PropertyName = value
   End Set
  End Property

  ' Namespace is a reserved word in Visual Basic, so the name is escaped with brackets
  Public Property [Namespace]() As String
    Get
      Return _Namespace
    End Get
    Set
      _Namespace = value
    End Set
  End Property

  Public Property XPath() As String
    Get
      Return _XPath
    End Get
    Set
      _XPath = value
    End Set
  End Property

  Public Function Execute(ByVal ctx As IPipelineContext, ByVal msg As IBaseMessage) _
      As IBaseMessage Implements IComponent.Execute
   Dim xpathValue As Object = Nothing
   Dim outMessage As IBaseMessage = ctx.GetMessageFactory.CreateMessage
   Dim newBodyPart As IBaseMessagePart = ctx.GetMessageFactory.CreateMessagePart
   newBodyPart.PartProperties = msg.BodyPart.PartProperties
   ' Wrap the body stream so it can be rewound after the XPath query
   Dim stream As SeekableReadOnlyStream = _
       New SeekableReadOnlyStream(msg.BodyPart.GetOriginalDataStream)
   Dim val As Object = msg.Context.Read(PropertyName, [Namespace])
   If val Is Nothing Then
     Throw New ArgumentNullException(PropertyName)
   End If
   msg.Context.Promote(PropertyName, [Namespace], val)
   ' Register the XPath expression, then attach the collection to the reader
   Dim xpc As XPathCollection = New XPathCollection
   xpc.Add(Me.XPath)
   Dim xpr As XPathReader = New XPathReader(New XmlTextReader(stream), xpc)
   While xpr.ReadUntilMatch
     Dim index As Integer = 0
     While index < xpc.Count
       If xpr.Match(index) Then
         ' Keep the value of the matching node
         xpathValue = xpr.ReadString
       End If
       index += 1
     End While
   End While
   If xpathValue Is Nothing Then
     Throw New ArgumentNullException("xpathValue")
   End If
   msg.Context.Write("SomeValue", "http://ABC.BizTalk.Pipelines", xpathValue)
   ' Rewind so downstream components read the body from the beginning
   stream.Position = 0
   newBodyPart.Data = stream
   outMessage.Context = msg.Context
   CopyMessageParts(msg, outMessage, newBodyPart)
   Return outMessage
 End Function

 Public ReadOnly Property Icon() As IntPtr Implements IComponentUI.Icon
   Get
     Return IntPtr.Zero
   End Get
 End Property

 Public Function Validate(ByVal projectSystem As Object) As IEnumerator _
     Implements IComponentUI.Validate
   Return Nothing
 End Function

 Public ReadOnly Property Description() As String Implements IBaseComponent.Description
   Get
     Return "Description"
   End Get
 End Property

 Public ReadOnly Property Name() As String Implements IBaseComponent.Name
   Get
     Return "Property Promote"
   End Get
 End Property

 Public ReadOnly Property Version() As String Implements IBaseComponent.Version
   Get
     Return "1"
   End Get
 End Property

 Public Sub GetClassID(ByRef classID As Guid) Implements IPersistPropertyBag.GetClassID
   Dim g As Guid = New Guid("FE537918-327B-4a0c-9ED7-E1B993B7897E")
   classID = g
 End Sub

 Public Sub InitNew() Implements IPersistPropertyBag.InitNew
   Throw New Exception("The method or operation is not implemented.")
 End Sub

 Public Sub Load(ByVal propertyBag As IPropertyBag, ByVal errorLog As Integer) _
     Implements IPersistPropertyBag.Load
   Dim prop As Object = Nothing
   Dim nm As Object = Nothing
   Dim xp As Object = Nothing
   Try
     propertyBag.Read("Namespace", nm, 0)
     propertyBag.Read("PropertyName", prop, 0)
     propertyBag.Read("XPATH", xp, 0)
   Catch
     ' A failed read is ignored; whatever was read so far is still applied below
   Finally
     If Not (prop Is Nothing) Then
       PropertyName = prop.ToString
     End If
     If Not (nm Is Nothing) Then
       [Namespace] = nm.ToString
     End If
     If Not (xp Is Nothing) Then
       XPath = xp.ToString
     End If
   End Try
 End Sub

 Public Sub Save(ByVal propertyBag As IPropertyBag, ByVal clearDirty As Boolean, _
     ByVal saveAllProperties As Boolean) Implements IPersistPropertyBag.Save
   ' IPropertyBag.Write takes the value by reference, so copy to Object variables first
   Dim prop As Object = PropertyName
   Dim nm As Object = [Namespace]
   Dim xp As Object = XPath
   propertyBag.Write("PropertyName", prop)
   propertyBag.Write("Namespace", nm)
   propertyBag.Write("XPATH", xp)
 End Sub

 ' Copies every part to the destination message, swapping in the new body part
 Private Sub CopyMessageParts(ByVal sourceMessage As IBaseMessage, _
     ByVal destinationMessage As IBaseMessage, ByVal newBodyPart As IBaseMessagePart)
   Dim bodyPartName As String = sourceMessage.BodyPartName
   Dim c As Integer = 0
   While c < sourceMessage.PartCount
     Dim partName As String = Nothing
     Dim messagePart As IBaseMessagePart = _
         sourceMessage.GetPartByIndex(c, partName)
     If partName <> bodyPartName Then
       destinationMessage.AddPart(partName, messagePart, False)
     Else
       destinationMessage.AddPart(bodyPartName, newBodyPart, True)
     End If
     c += 1
   End While
 End Sub
 End Class
End Namespace
