A problem that many developers have discovered is that accommodating extremely large (200MB+) files can
be a major performance bottleneck. The shame is that in many cases the
documents that are being retrieved are simply going to be routed to
another outbound destination. This is typical of an Enterprise Service Bus
(ESB) architecture. In short, an ESB is software that is used to link internal and partner
systems to each other, which is basically what BizTalk is designed to do
out of the box. For these types of architectures, large files are
generally routed through the ESB from an external party to an internal
party or from internal to internal systems. Most times, the only logic
that needs to be performed is routing logic. In many cases, this logic
can be expressed as simple filter criteria based on the default
message context data, or by examining data elements within the message,
promoting them, and then implementing content-based routing. Also in
many cases, the actual message body's content is irrelevant beyond
extracting properties to promote. The performance bottleneck comes into
play when the entire file is received, parsed by the XMLReceive
pipeline, and then stored into the Messagebox. If you have ever had to
do this with a 200MB file, you know that even though it works, it has a nasty
impact on CPU utilization on your BizTalk and SQL Server machines: CPU usage
often goes to 100%, and system throughput essentially goes down the drain.
Now imagine having to process 10 or 20 of these per minute. The next problem is
sending the file. The system will take this entire performance hit all over
again when the large file is read back out of the Messagebox in SQL Server and
handed to the Endpoint Manager (EPM) to be sent. You can quickly see how
this type of scenario, as common as it is, usually requires either
significant hardware or a queuing mechanism whereby only a
small number of files can be processed at a time.
You'll find a simple solution in BizTalk Server's capability to natively understand and use streams.
The following examples show a decoding component that will receive the
incoming message, store the file to disk in a uniquely named file, and
store the path to the file in the IBaseMessagePart.Data
property. The end result will be a message that only contains the path
to the text file in its data, but will have a fully well-formed message
context so that it can be routed. The component will also promote a
property that stores the fact that this is a "large encoded message."
This property will allow you to route all messages encoded using this
pipeline component to a particular send port/pipeline that has the
corresponding encoding component. The encoding component will read the
data element for the path to the file, open a FileStream object over
the file stored on disk, set the stream's position to byte 0,
and set the IBaseMessagePart.Data property to the FileStream.
The end result will be that the file is streamed by the BizTalk runtime
from the file stored on the disk and is not required to pass through
the Messagebox. Performance is greatly improved, and the CPU
overhead on both the BizTalk Server host instance that is sending the
file and the SQL Server hosting the BizTalk Messagebox becomes
negligible.
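The receive-side half of this idea can be sketched outside BizTalk with plain System.IO streams. The module and function names below (LargeMessageStore, ExternalizeBody) are hypothetical, and the sketch assumes the caller supplies the target folder and the InterchangeID string. In a real decoding component the input would be IBaseMessagePart.Data and the returned stream would be assigned back to it.

```vbnet
Imports System
Imports System.IO
Imports System.Text

' Hypothetical helper illustrating the receive side: copy the incoming
' body to disk and return a tiny replacement body that holds only the path.
Public Module LargeMessageStore

    ' Streams "body" into "<folder>\<interchangeId>.msg" in fixed-size
    ' chunks, so the whole message is never held in memory at once.
    Public Function ExternalizeBody(ByVal body As Stream, ByVal folder As String, _
                                    ByVal interchangeId As String) As Stream
        Dim fullFileName As String = Path.Combine(folder, interchangeId & ".msg")
        If body.CanSeek Then body.Position = 0

        Dim buffer(81919) As Byte
        ' CreateNew throws instead of silently overwriting an existing file
        Using target As New FileStream(fullFileName, FileMode.CreateNew, FileAccess.Write)
            Dim bytesRead As Integer = body.Read(buffer, 0, buffer.Length)
            Do While bytesRead > 0
                target.Write(buffer, 0, bytesRead)
                bytesRead = body.Read(buffer, 0, buffer.Length)
            Loop
        End Using

        ' The replacement body carries only the UTF-8 encoded file path
        Return New MemoryStream(Encoding.UTF8.GetBytes(fullFileName))
    End Function

End Module
```

Copying in chunks keeps memory use flat regardless of file size, which is the point of the technique; a byte-at-a-time loop would work but is needlessly slow on a 200MB body.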
The partner to this is
the sending component. In many scenarios, BizTalk is implemented as a
routing engine or an Enterprise Service Bus. This is a fancy way of
saying that BizTalk is responsible for moving data from one location
within an organization to another. In many cases, what needs to be
moved is large amounts of data, either in binary format or in text
files. This is often the case with payment or EDI-based systems, in which
BizTalk is responsible for moving the files to a legacy system that
processes them. In this scenario, the same performance problem (or
lack of performance) will occur on the send side as on the receive side.
To account for this, the examples also include a send-side pipeline
component that is used to actually send the large file to the outbound
destination adapter.
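The send-side counterpart can be sketched the same way, again with hypothetical names (LargeMessageResolver, OpenExternalizedBody) and assuming the message body contains nothing but the UTF-8 encoded path written on the receive side. A real encoding component would assign the returned FileStream to IBaseMessagePart.Data so that the adapter streams the file from disk.

```vbnet
Imports System
Imports System.IO
Imports System.Text

' Hypothetical send-side helper: the message body holds only a path,
' and the real payload is streamed back from the file on disk.
Public Module LargeMessageResolver

    ' Reads the UTF-8 path stored in the body and opens the file it names,
    ' positioned at byte 0, ready to be streamed to the adapter.
    Public Function OpenExternalizedBody(ByVal pathBody As Stream) As Stream
        If pathBody.CanSeek Then pathBody.Position = 0
        ' The reader is not disposed on purpose: disposing it would also
        ' close pathBody, which still belongs to the caller.
        Dim reader As New StreamReader(pathBody, Encoding.UTF8)
        Dim fullFileName As String = reader.ReadToEnd().Trim()

        ' FileShare.Read lets another reader (e.g., a retry) open the same file
        Dim data As New FileStream(fullFileName, FileMode.Open, FileAccess.Read, _
                                   FileShare.Read, 4 * 1024 * 1024)
        ' Explicitly start at the first byte, as described in the text
        data.Position = 0
        Return data
    End Function

End Module
```

Opening the file read-only and shareable matters here because, as discussed under the clean-up caveat, BizTalk may resend the same message after an adapter failure.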
Caveats and Gotchas
The solution
outlined previously works very well so long as the issues described in
the following sections are taken into account. Do not simply copy and
paste the code into your project and leave it at that. The solution
provided in this section fundamentally alters some of the design
principles of the BizTalk Server product. The most important one of these is that the data for the message is no longer stored in the Messagebox. A quick list of the pros and cons is provided here:
Pros:
Provides extremely fast access for moving large messages
Simple to extend
Reusable across multiple receive locations
The message, with its full context, can be routed to an orchestration, and the data can be accessed from disk
Cons:
No ability to apply BizTalk maps
No failover via Messagebox
Custom solution requiring support by developer
Need a scheduled task to clean up old data
Redundancy, Failover, and High Availability
As was stated earlier, the
data for the large message will no longer be stored in SQL Server. This
is fundamentally different from how Microsoft designed the product. If
the data within the message is important and the system is a
mission-critical one that must properly deal with failovers and errors,
you need to make sure that the storage location for the external file is
also as robust as your SQL Server environment. Most architects in this
situation will simply create a share on the clustered SQL Server shared
disk array. This share is available to all BizTalk machines in the
BizTalk Server Group, and since it is stored on the shared array or the
storage area network (SAN), it should be as reliable as the data files
for SQL Server.
Dealing with Message Content and Metadata
A good rule of thumb for
this type of solution is to avoid looking at the message data at all
costs once the file has been received. Consider the following: assume
that you have received your large file into BizTalk and you need to
process it through an orchestration for some additional logic. What
happens? You will need to write .NET components to read the file and
manually parse it to get the data you need. The worst-case scenario is
that you need to load the data into an XmlDocument (DOM) or something
similar. This will have performance implications and can negate the
entire reason for the special large-file handling you are implementing.
If you know you are going to
need data either within an orchestration or for CBR, make sure you
write the code to gather this data within either the receiving or
sending pipeline components. If you can, open the large data file only
while it is being processed within the pipeline. The best
approach is to promote properties or create custom distinguished fields
from code within the component itself; BizTalk can then access these
values with little performance overhead.
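One way to gather routing data without reopening the file is to compute it while the body is being written to disk. The sketch below uses hypothetical names and counts line-feed-terminated records during the same chunked copy; the returned count could then be promoted through a property defined in your property schema, in the same way the decoding component promotes LargeFileLocation.

```vbnet
Imports System
Imports System.IO

' Hypothetical helper: copy the body to its target stream and count
' records in the same pass, so the file never has to be read twice.
Public Module CopyWithMetadata

    Public Function CopyAndCountRecords(ByVal source As Stream, ByVal target As Stream) As Long
        Dim buffer(81919) As Byte
        Dim records As Long = 0
        Dim lastByte As Integer = -1
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
        Do While bytesRead > 0
            ' Each line feed (byte 10) terminates one record
            For i As Integer = 0 To bytesRead - 1
                If buffer(i) = 10 Then records += 1
            Next
            lastByte = buffer(bytesRead - 1)
            target.Write(buffer, 0, bytesRead)
            bytesRead = source.Read(buffer, 0, buffer.Length)
        Loop
        ' Count a final record that has no trailing line feed
        If lastByte <> -1 AndAlso lastByte <> 10 Then records += 1
        Return records
    End Function

End Module
```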
Cleaning Up Old Data
If you read through the
code in the section "Large Message Encoding Component (Send Side)," you
will notice that there is no code that actually deletes the message
from the server. There is a good reason for this. Normally you would
think that once the message has flowed through the send pipeline it
would be okay to delete it, but this is not true. What about a sendside
adapter error? Imagine if you were sending the file to an FTP server and
it was down; BizTalk will attempt to resend the message after the retry
period has been reached. Because of this, you can't simply delete the
file at random. You must employ a managed approach.
The only real solution
to this is a scheduled task that runs every few
minutes and is responsible for cleaning up the data directory. You will
notice that the name of the file is actually the InterchangeID GUID for the message flow. The InterchangeID
provides you with a common key that you can use to query each of the
messages that have been created throughout the execution path. The
script that executes needs to read the name of the file and use WMI to
query the Messagebox and determine whether there are any suspended or
active messages for that Interchange. If there are, it doesn't delete
the file; otherwise, it will delete the data file.
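The clean-up logic described above can be sketched as follows, assuming files are named <InterchangeID>.msg as in the decoding component. The WMI lookup itself is deliberately left out: the hasPendingWork delegate is a hypothetical stand-in that should return True while any active or suspended message still belongs to that interchange. The minimumAge guard is an extra safety margin assumed here, not part of the original design.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

' Sketch of the scheduled clean-up task. hasPendingWork is a hypothetical
' stand-in for the WMI query against BizTalk described in the text.
Public Module LargeMessageCleanup

    Public Function CleanUp(ByVal folder As String, ByVal minimumAge As TimeSpan, _
                            ByVal hasPendingWork As Func(Of Guid, Boolean)) As List(Of String)
        Dim deleted As New List(Of String)()
        For Each filePath As String In Directory.GetFiles(folder, "*.msg")
            Dim interchangeId As Guid
            ' Ignore files that do not follow the <InterchangeID>.msg naming scheme
            If Not Guid.TryParse(Path.GetFileNameWithoutExtension(filePath), interchangeId) Then
                Continue For
            End If
            ' Assumed safety margin: leave very recent files alone
            Dim age As TimeSpan = DateTime.UtcNow - File.GetLastWriteTimeUtc(filePath)
            If minimumAge > TimeSpan.Zero AndAlso age < minimumAge Then Continue For
            ' Keep the file while BizTalk may still retry or resume the interchange
            If hasPendingWork(interchangeId) Then Continue For
            File.Delete(filePath)
            deleted.Add(filePath)
        Next
        Return deleted
    End Function

End Module
```

Returning the list of deleted paths makes it easy for the scheduled task to log what it removed on each run.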
Looping Through the Message
As stated previously, if
you do know you will need the data within the message at runtime, and
this data is of an aggregate nature (sums, averages, counts, etc.), only
loop through the file once. This seems like a commonsense thing, but it
is often overlooked. If you need to loop through the file, try to get
all the data you need in one pass rather than several. This can have
dramatic effects on how your component will perform.
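As an illustration of gathering several aggregates in one pass, the sketch below assumes a hypothetical comma-delimited layout whose second field is an amount (for example, "INV-001,125.50"). It reads each line once and updates the count and sum together, deriving the average at the end, instead of looping over the file once per statistic.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Public Module SinglePassTotals

    ' Aggregates collected in a single pass over the data
    Public Structure Totals
        Public Count As Long
        Public Sum As Decimal
        Public ReadOnly Property Average As Decimal
            Get
                If Count = 0 Then Return 0D
                Return Sum / Count
            End Get
        End Property
    End Structure

    ' Hypothetical record layout: the amount is the second comma-separated field.
    ' Note that disposing the reader also closes the supplied stream.
    Public Function Compute(ByVal data As Stream) As Totals
        Dim result As New Totals()
        Using reader As New StreamReader(data)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                Dim fields As String() = line.Split(","c)
                If fields.Length > 1 Then
                    ' Every statistic is updated from the same line read
                    result.Count += 1
                    result.Sum += Decimal.Parse(fields(1), CultureInfo.InvariantCulture)
                End If
                line = reader.ReadLine()
            Loop
        End Using
        Return result
    End Function

End Module
```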
Large Message Decoding Component (Receive Side)
This component is to be used
on the receive side when the large message is first processed by
BizTalk. You will need to create a custom receive pipeline and add this
pipeline component to the Decode stage. From there, use the component's
InboundFileDocumentSpecification property (of type SchemaWithNone) to
select the desired inbound schema type if needed. If the
file is a flat file or a binary file, then this step is not necessary,
as the message will not contain any namespace or type information. This
component relies on a property schema being deployed that will be used
to store the location of the file within the message context. This schema
can also be used to define any custom information, such as counts, sums,
and averages, that is needed to route the document or may be required
later on at runtime.
Imports System
Imports System.IO
Imports System.Text
Imports System.Drawing
Imports System.Resources
Imports System.Reflection
Imports System.Diagnostics
Imports System.Collections
Imports System.ComponentModel
Imports Microsoft.BizTalk.Message.Interop
Imports Microsoft.BizTalk.Component.Interop
Imports Microsoft.BizTalk.Component
Imports Microsoft.BizTalk.Messaging
Namespace ABC.BizTalk.PipelineComponents
<ComponentCategory(CategoryTypes.CATID_PipelineComponent), _
System.Runtime.InteropServices.Guid("89dedce4-0525-472f-899c-64dc66f60727"), _
ComponentCategory(CategoryTypes.CATID_Decoder)> _
Public Class LargeFileDecodingComponent
Implements IBaseComponent, IPersistPropertyBag, _
IComponentUI, Microsoft.BizTalk.Component.Interop.IComponent, IProbeMessage
Private _InboundFileDocumentSpecification As _
Microsoft.BizTalk.Component.Utilities.SchemaWithNone = New _
Microsoft.BizTalk.Component.Utilities.SchemaWithNone("")
Private resourceManager As System.Resources.ResourceManager = New _
System.Resources.ResourceManager( _
"ABC.BizTalk.PipelineComponents.LargeFileDecodingComponent", _
[Assembly].GetExecutingAssembly())
Private Const ABC_PROPERTY_SCHEMA_NAMESPACE As String = _
"http://ABC.BizTalk.Schemas.ABCPropertySchema"
'<summary>
'Schema of the inbound document. Leave it set to None for flat or binary files.
'</summary>
<Description("The inbound request document specification. Only messages of " & _
"this type will be accepted by the component.")> _
Public Property InboundFileDocumentSpecification() As _
Microsoft.BizTalk.Component.Utilities.SchemaWithNone
Get
Return _InboundFileDocumentSpecification
End Get
Set(ByVal Value As Microsoft.BizTalk.Component.Utilities.SchemaWithNone)
_InboundFileDocumentSpecification = Value
End Set
End Property
'<summary>
'Name of the component
'</summary>
<Browsable(False)> _
Public ReadOnly Property Name() As String Implements _
Microsoft.BizTalk.Component.Interop.IBaseComponent.Name
Get
Return resourceManager.GetString("COMPONENTNAME", _
System.Globalization.CultureInfo.InvariantCulture)
End Get
End Property
'<summary>
'Version of the component
'</summary>
<Browsable(False)> _
Public ReadOnly Property Version() As String Implements _
Microsoft.BizTalk.Component.Interop.IBaseComponent.Version
Get
Return resourceManager.GetString("COMPONENTVERSION", _
System.Globalization.CultureInfo.InvariantCulture)
End Get
End Property
'<summary>
'Description of the component
'</summary>
<Browsable(False)> _
Public ReadOnly Property Description() As String Implements _
Microsoft.BizTalk.Component.Interop.IBaseComponent.Description
Get
Return resourceManager.GetString("COMPONENTDESCRIPTION", _
System.Globalization.CultureInfo.InvariantCulture)
End Get
End Property
'<summary>
'Component icon to use in BizTalk Editor
'</summary>
<Browsable(False)> _
Public ReadOnly Property Icon() As IntPtr Implements _
Microsoft.BizTalk.Component.Interop.IComponentUI.Icon
Get
Return CType(Me.resourceManager.GetObject("COMPONENTICON", _
System.Globalization.CultureInfo.InvariantCulture), System.Drawing.Bitmap).GetHicon()
End Get
End Property
'<summary>
'Gets class ID of component for usage from unmanaged code.
'</summary>
'<param name="classid">
'Class ID of the component
'</param>
Public Sub GetClassID(ByRef classid As System.Guid) Implements _
Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.GetClassID
classid = New System.Guid("89dedce4-0525-472f-899c-64dc66f60727")
End Sub
'<summary>
'not implemented
'</summary>
Public Sub InitNew() Implements _
Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.InitNew
End Sub
'<summary>
'Loads configuration properties for the component
'</summary>
'<param name="pb">Configuration property bag</param>
'<param name="errlog">Error status</param>
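Public Overridable Sub Load(ByVal pb As _
Microsoft.BizTalk.Component.Interop.IPropertyBag, _
ByVal errlog As Integer) Implements _
Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.Load
'Restore the schema selected in the pipeline designer, if one was saved
Dim val As Object = ReadPropertyBag(pb, "InboundFileDocumentSpecification")
If Not val Is Nothing Then
_InboundFileDocumentSpecification = New _
Microsoft.BizTalk.Component.Utilities.SchemaWithNone(CStr(val))
End If
End Sub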
'<summary>
'Saves the current component configuration into the property bag
'</summary>
'<param name="pb">Configuration property bag</param>
'<param name="fClearDirty">not used</param>
'<param name="fSaveAllProperties">not used</param>
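Public Overridable Sub Save(ByVal pb As _
Microsoft.BizTalk.Component.Interop.IPropertyBag, ByVal fClearDirty As _
Boolean, ByVal fSaveAllProperties As Boolean) Implements _
Microsoft.BizTalk.Component.Interop.IPersistPropertyBag.Save
'Persist the selected schema so the setting survives a reload
WritePropertyBag(pb, "InboundFileDocumentSpecification", _
_InboundFileDocumentSpecification.SchemaName)
End Sub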
'<summary>
'Reads property value from property bag
'</summary>
'<param name="pb">Property bag</param>
'<param name="propName">Name of property</param>
'<returns>Value of the property</returns>
Private Function ReadPropertyBag(ByVal pb As _
Microsoft.BizTalk.Component.Interop.IPropertyBag, ByVal propName _
As String) As Object
Dim val As Object = Nothing
Try
pb.Read(propName, val, 0)
Catch e As System.ArgumentException
Return val
Catch e As System.Exception
Throw New System.ApplicationException(e.Message)
End Try
Return val
End Function
'<summary>
'Writes property values into a property bag.
'</summary>
'<param name="pb">Property bag.</param>
'<param name="propName">Name of property.</param>
'<param name="val">Value of property.</param>
Private Sub WritePropertyBag(ByVal pb As _
Microsoft.BizTalk.Component.Interop.IPropertyBag, _
ByVal propName As String, ByVal val As Object)
Try
pb.Write(propName, val)
Catch e As System.Exception
Throw New System.ApplicationException(e.Message)
End Try
End Sub
'<summary>
'The Validate method is called by the BizTalk Editor during the build
'of a BizTalk project.
'</summary>
'<param name="obj">An Object containing the configuration properties.
'</param>
'<returns>The IEnumerator enables the caller to enumerate through a
'collection of strings containing error messages. These error messages
'appear as compiler error messages. To report successful property
'validation, the method should return an empty enumerator.</returns>
Public Function Validate(ByVal obj As Object) As _
System.Collections.IEnumerator Implements _
Microsoft.BizTalk.Component.Interop.IComponentUI.Validate
'example implementation:
'Dim errorList As New ArrayList()
'errorList.Add("This is a compiler error")
'Return errorList.GetEnumerator()
Return Nothing
End Function
'<summary>
'called by the messaging engine when a new message arrives
'checks if the incoming message is in a recognizable format
'if the message is in a recognizable format, only this component
'within this stage will be executed (FirstMatch equals true)
'</summary>
'<param name="pc">the pipeline context</param>
'<param name="inmsg">the actual message</param>
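Public Function Probe(ByVal pc As _
Microsoft.BizTalk.Component.Interop.IPipelineContext, ByVal inmsg As _
Microsoft.BizTalk.Message.Interop.IBaseMessage) As Boolean Implements _
Microsoft.BizTalk.Component.Interop.IProbeMessage.Probe
'Flat and binary files carry no namespace; with no schema selected, accept them
If _InboundFileDocumentSpecification Is Nothing OrElse _
String.IsNullOrEmpty(_InboundFileDocumentSpecification.DocSpecName) Then
Return True
End If
Dim bodyStream As Stream = inmsg.BodyPart.Data
Dim startPosition As Long = 0
If bodyStream.CanSeek Then startPosition = bodyStream.Position
Try
Dim xmlreader As New System.Xml.XmlTextReader(bodyStream)
xmlreader.MoveToContent()
'Relies on the convention: target namespace = "http://" + DocSpecName
Return _InboundFileDocumentSpecification.DocSpecName = _
xmlreader.NamespaceURI.Replace("http://", "")
Catch ex As System.Xml.XmlException
Return False
Finally
'Rewind so that Execute can copy the whole body to disk
If bodyStream.CanSeek Then bodyStream.Position = startPosition
End Try
End Function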
'<summary>
'Implements IComponent.Execute method.
'</summary>
'<param name="pc">Pipeline context</param>
'<param name="inmsg">Input message</param>
'<returns>Original input message</returns>
'<remarks>
'IComponent.Execute method is used to initiate
'the processing of the message in this pipeline component.
'</remarks>
Public Function Execute(ByVal pContext As _
Microsoft.BizTalk.Component.Interop.IPipelineContext, ByVal inmsg As _
Microsoft.BizTalk.Message.Interop.IBaseMessage) As _
Microsoft.BizTalk.Message.Interop.IBaseMessage Implements _
Microsoft.BizTalk.Component.Interop.IComponent.Execute
'Externalize the body to disk and replace it with the file path
StoreMessageData(pContext, inmsg)
Return inmsg
End Function
'Root folder for the externalized message bodies. This value is a
'hypothetical placeholder: point it at the redundant shared storage
'discussed under "Redundancy, Failover, and High Availability."
Private Const FILE_LOCATION As String = "\\FileServer\LargeMessages\"
'<summary>
'Writes the message body to a uniquely named file, replaces the body with
'the path to that file, and promotes the path into the message context.
'</summary>
'<param name="pContext">Pipeline context</param>
'<param name="inMsg">Input message; its body part data is replaced</param>
Private Sub StoreMessageData(ByVal pContext As IPipelineContext, _
ByRef inMsg As IBaseMessage)
'Name the file after the InterchangeID system context property
Dim interchangeId As String = CStr(inMsg.Context.Read("InterchangeID", _
"http://schemas.microsoft.com/BizTalk/2003/system-properties"))
Dim FullFileName As String = Path.Combine(FILE_LOCATION, interchangeId & ".msg")
Dim source As Stream = inMsg.BodyPart.Data
If source.CanSeek Then
source.Position = 0
Else
Throw New InvalidOperationException("The stream is not seekable")
End If
'Copy the body to disk in chunks rather than one byte at a time
Dim buffer(81919) As Byte
Using dataFile As New FileStream(FullFileName, FileMode.CreateNew, _
FileAccess.Write, FileShare.None, 4 * 1024 * 1024)
Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
Do While bytesRead > 0
dataFile.Write(buffer, 0, bytesRead)
bytesRead = source.Read(buffer, 0, buffer.Length)
Loop
End Using
'The new body holds only the file path; the context is untouched for routing
inMsg.BodyPart.Data = New MemoryStream(Encoding.UTF8.GetBytes(FullFileName))
inMsg.Context.Promote("LargeFileLocation", _
ABC_PROPERTY_SCHEMA_NAMESPACE, FullFileName)
End Sub
End Class
End Namespace