
BizTalk 2006 : Dealing with Extremely Large Messages (part 1) - Large Message Decoding Component

10/9/2012 9:01:40 PM
A major problem that many have discovered is that accommodating extremely large (200MB+) files can be a serious performance bottleneck. The shame is that in many cases the documents being retrieved are simply going to be routed to another outbound destination. This is typical of an Enterprise Service Bus (ESB) architecture. In short, an ESB is software that links internal and partner systems to each other, which is essentially what BizTalk is designed to do out of the box. In these architectures, large files are generally routed through the ESB from an external party to an internal party, or from one internal system to another. Most of the time, the only logic that needs to be performed is routing logic. Often this logic can be expressed as a simple filter expression based on the default message context data, or by examining data elements within the message, promoting them, and then implementing content-based routing. In many cases, the actual content of the message body is irrelevant beyond extracting the properties to promote. The performance bottleneck comes into play when the entire file is received, parsed by the XMLReceive pipeline, and then stored in the Messagebox. If you have ever had to do this with a 200MB file, you know that even though it works, it has a nasty impact on CPU utilization on your BizTalk and SQL Server machines: CPU usage often goes to 100%, and system throughput essentially goes down the drain.

Now imagine having to process 10 or 20 of these per minute. The next problem is sending the file. The system takes the same performance hit all over again when the large file is read back out of SQL Server and handed to the BizTalk Endpoint Manager (EPM) to be sent. As common as this scenario is, it usually requires either significant hardware or a queuing mechanism that allows only a small number of files to be processed at a time.

BizTalk Server's native ability to understand and use streams offers a simple solution. The following examples show a decoding component that receives the incoming message, stores the data in a uniquely named file on disk, and replaces the contents of the IBaseMessagePart.Data property with the path to that file. The result is a message whose body contains only the path to the data file but whose message context is fully formed, so it can still be routed. The component also promotes a property that records the fact that this is a "large encoded message." This property lets you route every message encoded by this pipeline component to a particular send port whose pipeline contains the corresponding encoding component. The encoding component reads the path from the message body, opens a FileStream on the file stored on disk, sets the stream position to 0, and assigns the FileStream to the IBaseMessagePart.Data property. The BizTalk runtime then streams the file directly from disk, so the data never has to pass through the Messagebox. Performance improves greatly, and the CPU overhead on both the BizTalk Server host instance sending the file and the SQL Server hosting the Messagebox becomes minimal.
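The receive-side half of this approach (often called the claim check pattern) can be illustrated outside BizTalk. The Python below is a language-neutral sketch, not BizTalk API code: `decode_large_message`, the `folder` argument, and the 64 KB chunk size are hypothetical, and an in-memory stream stands in for IBaseMessagePart.Data. It copies the body to `<interchange_id>.msg` in fixed-size chunks and returns a tiny replacement body that holds only the path.

```python
import io
import os
import shutil
import tempfile
import uuid
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # hypothetical; copy in fixed-size chunks, never the whole body at once


def decode_large_message(body: BinaryIO, interchange_id: str, folder: str) -> io.BytesIO:
    """Stream `body` to <folder>/<interchange_id>.msg and return a small
    replacement body that contains only the UTF-8 encoded file path."""
    path = os.path.join(folder, interchange_id + ".msg")
    with open(path, "wb") as data_file:
        # copyfileobj reads and writes at most CHUNK_SIZE bytes at a time
        shutil.copyfileobj(body, data_file, CHUNK_SIZE)
    return io.BytesIO(path.encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        original = b"<Batch>" + b"x" * 1_000_000 + b"</Batch>"
        pointer = decode_large_message(io.BytesIO(original), str(uuid.uuid4()), folder)
        print(len(pointer.getvalue()), "byte body instead of", len(original))
```

The routed message stays small no matter how large the file is; only the context and the path travel through the engine.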

The partner to this is the sending component. In many scenarios, BizTalk is implemented as a routing engine or an Enterprise Service Bus, which is a fancy way of saying that BizTalk is responsible for moving data from one location within an organization to another. Often what needs to be moved is a large amount of data, either in binary format or in text files. This is frequently the case with payment or EDI-based systems, in which BizTalk moves the files to a legacy system that then processes them. In this scenario, the same performance problem occurs on the send side as on the receive side. To account for this, the examples also include a send-side pipeline component that hands the large file to the outbound adapter.
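The send-side half reverses the process: resolve the path held in the small body, then stream the stored file to the outbound destination. This is again a hypothetical Python sketch of the idea rather than the BizTalk encoding component; `send_large_file`, the chunk size, and the writable `destination` (standing in for the adapter's output) are assumptions. It deliberately leaves the file on disk, for the retry reasons discussed under "Cleaning Up Old Data."

```python
import io
import os
import tempfile
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size


def send_large_file(pointer_body: BinaryIO, destination: BinaryIO) -> int:
    """Read the file path from the pointer body, then copy the stored file
    to `destination` chunk by chunk. Returns the number of bytes sent."""
    path = pointer_body.read().decode("utf-8")
    sent = 0
    with open(path, "rb") as source:  # a freshly opened file starts at byte 0
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            destination.write(chunk)
            sent += len(chunk)
    # The data file is NOT deleted here: a failed send may be retried later.
    return sent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "example.msg")
        with open(path, "wb") as f:
            f.write(b"payload" * 10_000)
        out = io.BytesIO()
        print(send_large_file(io.BytesIO(path.encode("utf-8")), out), "bytes sent")
```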

Caveats and Gotchas

The solution outlined previously works very well as long as the issues described in the following sections are taken into account. Do not simply copy and paste the code into your project and leave it at that. This solution fundamentally alters some of the design principles of BizTalk Server, the most important being that the message data is no longer stored in the Messagebox. A quick list of the pros and cons follows:

  • Pros:

    • Provides extremely fast access for moving large messages

    • Simple to extend

    • Reusable across multiple receive locations

    • The message context can be routed to an orchestration, and the data can be read from disk

  • Cons:

    • No ability to apply a BizTalk map

    • No failover via Messagebox

    • Custom solution requiring support by developer

    • Requires a scheduled task to clean up old data

Redundancy, Failover, and High Availability

As stated earlier, the data for the large message is no longer stored in SQL Server. This is fundamentally different from how Microsoft designed the product. If the message data is important and the system is mission-critical and must handle failover and errors properly, make sure that the storage location for the external files is as robust as your SQL Server environment. Most architects in this situation simply create a share on the clustered SQL Server's shared disk array. The share is available to all BizTalk machines in the BizTalk Server Group, and because it lives on the shared array or storage area network (SAN), it should be as reliable as the SQL Server data files.

Dealing with Message Content and Metadata

A good rule of thumb for this type of solution is to avoid looking at the message data at all costs once the file has been received. Consider this: you have received your large file into BizTalk and need to process it through an orchestration for some additional logic. What happens? You need to write .NET components that read the file and parse it manually to get the data you need. In the worst case, you need to load the data into an XmlDocument (DOM) or something similar. This has serious performance implications and can negate the entire reason for the special large-file handling you are implementing.
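When you truly must read a value from the stored file, read it as a stream rather than loading a DOM. The sketch below uses Python's `xml.etree.ElementTree.iterparse` as a stand-in for a forward-only reader such as .NET's XmlReader; `find_first_text` and the sample element names are hypothetical. It stops as soon as the wanted element closes and clears finished elements, which keeps memory use far below that of a full DOM.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional


def find_first_text(xml_stream: BinaryIO, local_name: str) -> Optional[str]:
    """Return the text of the first element named `local_name` (namespace
    ignored), reading forward only instead of building a DOM."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Tags arrive as "{namespace}Name"; compare only the local part
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            return elem.text
        elem.clear()  # drop the finished element's children and text
    return None


if __name__ == "__main__":
    doc = (b'<ns0:Batch xmlns:ns0="http://example.org/batch">'
           b"<ns0:Header><ns0:Partner>ACME</ns0:Partner></ns0:Header>"
           b"<ns0:Item>1</ns0:Item></ns0:Batch>")
    print(find_first_text(io.BytesIO(doc), "Partner"))  # ACME
```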

If you know you will need data within an orchestration or for content-based routing (CBR), write the code that gathers this data in the receiving or sending pipeline component. If you can, open the large data file only while it is being processed in the pipeline. The best approach is to promote properties or create custom distinguished fields from code within the component itself; BizTalk can then access them with little performance overhead.
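One way to follow this advice is to compute routing values while the receive component is already copying the body to disk, so the file never has to be reopened. In this hypothetical Python sketch, `store_and_collect` and the property names `ByteCount` and `RecordCount` are assumptions (a record is taken to be one line); in BizTalk you would promote such values into the message context through your property schema.

```python
import io
import os
import tempfile
from typing import BinaryIO, Dict

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size


def store_and_collect(body: BinaryIO, path: str) -> Dict[str, int]:
    """Write `body` to `path` and, in the same pass, compute values that
    routing may need later: total bytes and number of lines (records)."""
    total_bytes = 0
    newline_count = 0
    last_byte = b""
    with open(path, "wb") as data_file:
        while True:
            chunk = body.read(CHUNK_SIZE)
            if not chunk:
                break
            data_file.write(chunk)
            total_bytes += len(chunk)
            newline_count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline is still a record
    record_count = newline_count + (1 if last_byte not in (b"", b"\n") else 0)
    return {"ByteCount": total_bytes, "RecordCount": record_count}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        body = io.BytesIO(b"INV001,100.00\nINV002,250.50\nINV003,75.25")
        print(store_and_collect(body, os.path.join(folder, "batch.msg")))
```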

Cleaning Up Old Data

If you read through the code in the section "Large Message Encoding Component (Send Side)," you will notice that it contains no code that deletes the message file from the server. There is a good reason for this. You might think it is safe to delete the file once the message has flowed through the send pipeline, but it is not. What about a send-side adapter error? If you were sending the file to an FTP server that was down, BizTalk would attempt to resend the message once the retry interval had elapsed. Because of this, you can't simply delete the file as soon as it has been sent; you need a managed approach.

The only real solution is a scheduled task that runs every few minutes and cleans up the data directory. Notice that the name of each file is the InterchangeID GUID of the message flow. The InterchangeID gives you a common key you can use to query all the messages created along the execution path. The cleanup script reads the file name and uses WMI to query the Messagebox for any suspended or active messages belonging to that interchange. If there are any, it leaves the file alone; otherwise, it deletes the data file.
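The cleanup task's decision logic might look like the following Python sketch. It is hypothetical: the WMI query is abstracted as an `active_interchanges` callable that returns the InterchangeIDs that still have active or suspended instances, and the `min_age_seconds` grace period (so a file written moments ago is not removed before its message reaches the Messagebox) is an added safeguard that the design above does not mention.

```python
import os
import tempfile
import time
from typing import Callable, List, Set


def clean_up(folder: str,
             active_interchanges: Callable[[], Set[str]],
             min_age_seconds: float = 300.0) -> List[str]:
    """Delete <InterchangeID>.msg files whose interchange has no active or
    suspended instances left. Returns the deleted file names, sorted."""
    # GUID text may differ in case between file names and query results
    busy = {interchange_id.lower() for interchange_id in active_interchanges()}
    now = time.time()
    deleted = []
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".msg":
            continue  # not a stored message file
        if stem.lower() in busy:
            continue  # still referenced: a retry may need this file
        path = os.path.join(folder, name)
        if now - os.path.getmtime(path) < min_age_seconds:
            continue  # too new: its message may not be in the Messagebox yet
        os.remove(path)
        deleted.append(name)
    return sorted(deleted)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        old = time.time() - 3600
        for name in ("11111111-aaaa.msg", "22222222-bbbb.msg"):  # shortened IDs
            path = os.path.join(folder, name)
            open(path, "wb").close()
            os.utime(path, (old, old))
        # Stub standing in for the WMI query: interchange 1111... is still active
        print(clean_up(folder, lambda: {"11111111-AAAA"}))  # ['22222222-bbbb.msg']
```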

Looping Through the Message

As stated previously, if you know you will need data from within the message at runtime and that data is aggregate in nature (sums, averages, counts, and so on), loop through the file only once. This seems like common sense, but it is often overlooked. Gather all the data you need in a single pass rather than several; this can have a dramatic effect on how your component performs.
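For example, a count, sum, and average over a flat file can be produced in one pass instead of three. This Python sketch assumes a CSV file with a header row and a numeric column named `Amount`; both that layout and `aggregate_amounts` are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def aggregate_amounts(records: TextIO, column: str = "Amount") -> Dict[str, float]:
    """Count, sum, and average one numeric column in a single pass."""
    count = 0
    total = 0.0
    for row in csv.DictReader(records):  # each row is read exactly once
        count += 1
        total += float(row[column])
    average = total / count if count else 0.0
    return {"count": count, "sum": total, "average": average}


if __name__ == "__main__":
    data = io.StringIO("Id,Amount\n1,10\n2,20\n3,30\n")
    print(aggregate_amounts(data))  # {'count': 3, 'sum': 60.0, 'average': 20.0}
```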

Large Message Decoding Component (Receive Side)

This component is used on the receive side, when BizTalk first processes the large message. Create a custom receive pipeline and add this component to the Decode stage. Then use the InboundFileDocumentSpecification property (of type SchemaWithNone) to select the inbound schema type if needed. If the file is a flat file or a binary file, this step is unnecessary, because the message will not contain any namespace or type information. The component relies on a deployed property schema that stores the location of the file in the message context. The same schema can also define any custom information, such as counts, sums, and averages, needed to route the document or required later at runtime.

Imports System
Imports System.IO
Imports System.Text
Imports System.Drawing
Imports System.Resources
Imports System.Reflection
Imports System.Diagnostics
Imports System.Collections
Imports System.ComponentModel
Imports Microsoft.BizTalk.Message.Interop
Imports Microsoft.BizTalk.Component.Interop
Imports Microsoft.BizTalk.Component
Imports Microsoft.BizTalk.Messaging

Namespace ABC.BizTalk.PipelineComponents

<ComponentCategory(CategoryTypes.CATID_PipelineComponent), _
     System.Runtime.InteropServices.Guid("89dedce4-0525-472f-899c-64dc66f60727"), _
     ComponentCategory(CategoryTypes.CATID_Decoder)> _
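Public Class LargeFileDecodingComponent
    Implements IBaseComponent, IPersistPropertyBag, _
        IComponentUI, Microsoft.BizTalk.Component.Interop.IComponent, IProbeMessage

    Private _InboundFileDocumentSpecification As _
        Microsoft.BizTalk.Component.Utilities.SchemaWithNone = _
        New Microsoft.BizTalk.Component.Utilities.SchemaWithNone("")

    Private resourceManager As System.Resources.ResourceManager = _
        New System.Resources.ResourceManager( _
        "ABC.BizTalk.PipelineComponents.LargeFileDecodingComponent", _
        [Assembly].GetExecutingAssembly)

    Private Const ABC_PROPERTY_SCHEMA_NAMESPACE As String = _
        "http://ABC.BizTalk.Schemas.ABCPropertySchema"

    ' Namespace of the BizTalk system context properties (holds InterchangeID)
    Private Const BTS_SYSTEM_PROPERTIES_NAMESPACE As String = _
        "http://schemas.microsoft.com/BizTalk/2003/system-properties"

    ' Folder for the stored message files. Hypothetical value: point it at a
    ' share on the clustered SQL Server disk array or SAN (keep the trailing \).
    Private Const FILE_LOCATION As String = "\\SQLCLUSTER\LargeMessages\"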

    '<summary>
    'Schema of the inbound documents this component accepts
    '</summary>
    <Description("The inbound request document specification. Only messages of " & _
        "this type will be accepted by the component.")> _
    Public Property InboundFileDocumentSpecification() As _
        Microsoft.BizTalk.Component.Utilities.SchemaWithNone
        Get
            Return _InboundFileDocumentSpecification
        End Get
        Set(ByVal Value As Microsoft.BizTalk.Component.Utilities.SchemaWithNone)
            _InboundFileDocumentSpecification = Value
        End Set
    End Property

    '<summary>
    'Name of the component
    '</summary>
    <Browsable(False)> _
    Public ReadOnly Property Name() As String Implements _
        Microsoft.BizTalk.Component.Interop.IBaseComponent.Name
        Get
            Return resourceManager.GetString("COMPONENTNAME", _
                System.Globalization.CultureInfo.InvariantCulture)
        End Get
    End Property

    '<summary>
    'Version of the component
    '</summary>
    <Browsable(False)> _
    Public ReadOnly Property Version() As String Implements _
        Microsoft.BizTalk.Component.Interop.IBaseComponent.Version
        Get
            Return resourceManager.GetString("COMPONENTVERSION", _
                System.Globalization.CultureInfo.InvariantCulture)
        End Get
    End Property

    '<summary>
    'Description of the component
    '</summary>
    <Browsable(False)> _
    Public ReadOnly Property Description() As String Implements _
        Microsoft.BizTalk.Component.Interop.IBaseComponent.Description
        Get
            Return resourceManager.GetString("COMPONENTDESCRIPTION", _
                System.Globalization.CultureInfo.InvariantCulture)
        End Get
    End Property

    '<summary>
    'Component icon to use in BizTalk Editor
    '</summary>
    <Browsable(False)> _
    Public ReadOnly Property Icon() As IntPtr Implements _
        Microsoft.BizTalk.Component.Interop.IComponentUI.Icon
        Get
            Return CType(Me.resourceManager.GetObject("COMPONENTICON", _
                System.Globalization.CultureInfo.InvariantCulture), _
                System.Drawing.Bitmap).GetHicon
        End Get
    End Property

    '<summary>
    'Gets class ID of component for usage from unmanaged code.
    '</summary>
    '<param name="classid">Class ID of the component</param>
    Public Sub GetClassID(ByRef classid As System.Guid) Implements _
        Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.GetClassID
        classid = New System.Guid("89dedce4-0525-472f-899c-64dc66f60727")
    End Sub

    '<summary>
    'not implemented
    '</summary>
    Public Sub InitNew() Implements _
        Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.InitNew
    End Sub

    '<summary>
    'Loads configuration properties for the component
    '</summary>
    '<param name="pb">Configuration property bag</param>
    '<param name="errlog">Error status</param>
    Public Overridable Sub Load(ByVal pb As _
        Microsoft.BizTalk.Component.Interop.IPropertyBag, _
        ByVal errlog As Integer) Implements _
        Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.Load
        ' Restore the schema selected in the pipeline designer, if any
        Dim val As Object = ReadPropertyBag(pb, "InboundFileDocumentSpecification")
        If Not val Is Nothing Then
            _InboundFileDocumentSpecification = _
                New Microsoft.BizTalk.Component.Utilities.SchemaWithNone(CStr(val))
        End If
    End Sub

    '<summary>
    'Saves the current component configuration into the property bag
    '</summary>
    '<param name="pb">Configuration property bag</param>
    '<param name="fClearDirty">not used</param>
    '<param name="fSaveAllProperties">not used</param>
    Public Overridable Sub Save(ByVal pb As _
        Microsoft.BizTalk.Component.Interop.IPropertyBag, ByVal fClearDirty As _
        Boolean, ByVal fSaveAllProperties As Boolean) Implements _
        Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.Save
        ' Persist the selected schema so Load can restore it
        WritePropertyBag(pb, "InboundFileDocumentSpecification", _
            _InboundFileDocumentSpecification.SchemaName)
    End Sub

    '<summary>
    'Reads property value from property bag
    '</summary>
    '<param name="pb">Property bag</param>
    '<param name="propName">Name of property</param>
    '<returns>Value of the property, or Nothing if it is not present</returns>
    Private Function ReadPropertyBag(ByVal pb As _
        Microsoft.BizTalk.Component.Interop.IPropertyBag, ByVal propName _
        As String) As Object
        Dim val As Object = Nothing
        Try
            pb.Read(propName, val, 0)
        Catch e As System.ArgumentException
            ' Property not found in the bag
            Return val
        Catch e As System.Exception
            Throw New System.ApplicationException(e.Message)
        End Try
        Return val
    End Function

    '<summary>
    'Writes property values into a property bag.
    '</summary>
    '<param name="pb">Property bag.</param>
    '<param name="propName">Name of property.</param>
    '<param name="val">Value of property.</param>
    Private Sub WritePropertyBag(ByVal pb As _
        Microsoft.BizTalk.Component.Interop.IPropertyBag, _
        ByVal propName As String, ByVal val As Object)
        Try
            pb.Write(propName, val)
        Catch e As System.Exception
            Throw New System.ApplicationException(e.Message)
        End Try
    End Sub

    '<summary>
    'The Validate method is called by the BizTalk Editor during the build
    'of a BizTalk project.
    '</summary>
    '<param name="obj">An Object containing the configuration properties.
    '</param>
    '<returns>The IEnumerator enables the caller to enumerate through a
    'collection of strings containing error messages. These error messages
    'appear as compiler error messages. To report successful property
    'validation, the method should return an empty enumerator.</returns>
    Public Function Validate(ByVal obj As Object) As _
        System.Collections.IEnumerator Implements _
        Microsoft.BizTalk.Component.Interop.IComponentUI.Validate
        'example implementation:
        'Dim errorList As New ArrayList()
        'errorList.Add("This is a compiler error")
        'Return errorList.GetEnumerator()
        Return Nothing
    End Function

    '<summary>
    'Called by the messaging engine when a new message arrives.
    'Checks whether the incoming message is in a recognizable format;
    'if it is, only this component within this stage will be executed
    '(FirstMatch equals true).
    '</summary>
    '<param name="pc">the pipeline context</param>
    '<param name="inmsg">the actual message</param>
    Public Function Probe(ByVal pc As _
        Microsoft.BizTalk.Component.Interop.IPipelineContext, ByVal inmsg As _
        Microsoft.BizTalk.Message.Interop.IBaseMessage) As Boolean Implements _
        Microsoft.BizTalk.Component.Interop.IProbeMessage.Probe

        Dim data As Stream = inmsg.BodyPart.Data
        Dim xmlreader As New Xml.XmlTextReader(data)
        xmlreader.MoveToContent()

        ' Convention used here: the selected schema's DocSpecName matches the
        ' root element's namespace with "http://" removed
        Dim isMatch As Boolean = _
            (InboundFileDocumentSpecification.DocSpecName = _
            xmlreader.NamespaceURI.Replace("http://", ""))

        ' Rewind so that Execute copies the message from the first byte
        If data.CanSeek Then
            data.Position = 0
        End If
        Return isMatch
    End Function

    '<summary>
    'Implements IComponent.Execute method.
    '</summary>
    '<param name="pContext">Pipeline context</param>
    '<param name="inmsg">Input message</param>
    '<returns>The input message, whose body now holds only the file path</returns>
    '<remarks>
    'IComponent.Execute method is used to initiate
    'the processing of the message in this pipeline component.
    '</remarks>
    Public Function Execute(ByVal pContext As _
        Microsoft.BizTalk.Component.Interop.IPipelineContext, ByVal inmsg As _
        Microsoft.BizTalk.Message.Interop.IBaseMessage) As _
        Microsoft.BizTalk.Message.Interop.IBaseMessage Implements _
        Microsoft.BizTalk.Component.Interop.IComponent.Execute
        ' Write the body to disk and replace it with the file path
        StoreMessageData(pContext, inmsg)
        Return inmsg
    End Function

    '<summary>
    'Writes the message data to a file and promotes the
    'file location to the message context.
    '</summary>
    '<param name="pContext">Pipeline context</param>
    '<param name="inMsg">Input message, modified in place</param>
    '<remarks>
    'Copies the body to disk, then replaces inMsg.BodyPart.Data with a
    'small stream that contains only the path of the stored file.
    '</remarks>
    Private Sub StoreMessageData(ByVal pContext As IPipelineContext, _
        ByVal inMsg As IBaseMessage)
        ' Name the file after the InterchangeID so the cleanup task can match
        ' it against the interchange's instances in the Messagebox
        Dim interchangeId As String = CStr(inMsg.Context.Read("InterchangeID", _
            BTS_SYSTEM_PROPERTIES_NAMESPACE))
        Dim FullFileName As String = FILE_LOCATION & interchangeId & ".msg"
        Dim source As Stream = inMsg.BodyPart.Data

        If source.CanSeek Then
            source.Position = 0
        Else
            Throw New Exception("The stream is not seekable")
        End If

        ' Copy the body to disk in 64 KB chunks rather than one byte at a time
        Dim buffer(64 * 1024 - 1) As Byte
        Dim bytesRead As Integer
        Dim dataFile As New FileStream(FullFileName, FileMode.Create, _
            FileAccess.Write, FileShare.None, 4 * 1024 * 1024)
        Try
            bytesRead = source.Read(buffer, 0, buffer.Length)
            Do While bytesRead > 0
                dataFile.Write(buffer, 0, bytesRead)
                bytesRead = source.Read(buffer, 0, buffer.Length)
            Loop
        Finally
            dataFile.Close()
        End Try

        ' The message body now carries only the path to the data file
        inMsg.BodyPart.Data = New MemoryStream(Encoding.UTF8.GetBytes(FullFileName))
        inMsg.Context.Promote("LargeFileLocation", _
            ABC_PROPERTY_SCHEMA_NAMESPACE, FullFileName)
    End Sub
End Class
End Namespace