Script Components
The script component
is one of the easiest ways to extend the functionality of the data flow.
It allows access to the data flow using script written in Visual Basic
.NET. When should you use a script component instead of a custom
component? Here are some advantages to using the script component:
Less component housekeeping required, allowing you to focus more on scripting the transformation logic.
Gives
guidance by providing script templates and classes that are generated
based on how the component is set up in the data flow. This, in turn,
improves the effectiveness of IntelliSense in the script editor.
The disadvantages include:
It’s harder to port scripts from one package to another than to install a custom component that can be used in any package.
Scripts must be updated to use the latest component column metadata whenever the metadata has changed.
Some
data flow functionality, such as multiple inputs, is not available to
the script component, due to the limitations of the component or its UI
editor.
To compare a
script component to a custom component further, let’s walk through
creating a script component that does the same transformation as the
component we created earlier in the section about writing custom
components. That is, the script component will take an input column of
data type integer and map that integer to the name of a month. For example, 1 will be mapped to “Jan”.
First create a data flow
in the Integration Services designer, and then add a source adapter that
contains at least one output column with a data type of DT_I4
to be used as the input column to our script component. If you have the
data flow from the section on constructing packages programmatically,
you can load that package in the designer as your starting point.
Once the data flow
with the source adapter is set up, place a script component onto the
design surface. A dialog box appears asking if the script component is
to be a source adapter, a transformation, or a destination adapter.
Select the Transformation option, which initializes the component to be a
synchronous transform with one input and one output. Internally in the
component, this means the SynchronousInputID
property of the output is set to the ID of the input. Next, connect the
output of the source adapter to the input of the component. This makes
the output columns from the adapter available as upstream columns for
the component, which means those columns can be selected as input
columns for the component.
Now we are ready to
select which input column we want to use. Double-click on the script
component to bring up its UI editor. Notice that this is the editor for
the script component, which lets you modify the component’s metadata,
and is not the editor for modifying the script itself. Select the Input
Columns page. Here the upstream columns originating from the source
adapter should be displayed as available input columns. Select the
column of data type DT_I4
to be used as the input column for the script component. If you are
using the data flow from the section on programmatic package
construction, select the EmployeeID column from the
HumanResources.Employee table, as shown in Figure 3. This adds an input column to the script component. Internally in the component, the LineageID of the input column is set to the LineageID of the upstream output column. Also notice that the usage type of the input column is set to ReadOnly
by default, which is correct for our case because the script component
will only be reading the value of the column and not writing into the
column. Although EmployeeID is not the most likely set of data to be
mapped to a month name, we will use it in this sample. Another
option for users who do not want to use EmployeeID is to change the data
source so that provides real month data—for example, “select month(startdate)as MonthNumber from DimPromotion” from the AdventureWorksDW database.
After selecting the
input column, we must add an output column to store the name of the
month. In the script component UI editor, select the page for Inputs and
Outputs, which displays the input and output of the script component in
a tree view. Rename the input to MapMonthInput and the output to MapMonthOutput. Expand the output, and select Output Columns. Click Add Column, and then rename the new output column to Month. Select the new output column, and change its data type to DT_WSTR, with a length of 7, as shown in Figure 4.
With the input and output
columns in place, we are ready to write the script for the component. In
the editor, select the Script page and click Design Script to bring up
Microsoft Visual Studio for Applications set up as the script editor for
the component. Here you can open up the Project Explorer window to see
the references that are set and the classes, such as BufferWrapper and ComponentWrapper, that are needed to make the script execution in the data flow possible. The class we need to edit is the ScriptMain class, which contains a template for an overridden method called MapMonthInput_ProcessInputRow.
This method will be called once for each row in the data buffer that
flows through the component, with the row passed into the method as the Row parameter. Add the code shown in Listing 3 to the class.
Listing 3. Script component example
Public Class ScriptMain
Inherits UserComponent
Private Month() As String = New String() {"Invalid", "Jan", "Feb", "Mar", "Apr",
"May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"}
Public Overrides Sub MapMonthInput_ProcessInputRow(ByVal Row As MapMonthInputBuffer)
If Row.EmployeeID < 1 Or Row.EmployeeID > 12 Then
Row.Month = Month(0)
Else
Row.Month = Month(Row.EmployeeID)
End If
End Sub
End Class
|
The Row
object has been generated with properties that correspond to the input
and output columns we added to the component, making access to the data
in the buffer very easy. You can see the implementations of these
properties in the BufferWrapper class. If you worked through the section on writing custom components, you will notice that the custom component’s ProcessInput method takes an entire data buffer and iterates through the rows within the method. In contrast, our script component’s MapMonthInput_ProcessInputRow method is passed a row at a time. What happened to the standard ProcessInput method for the script component? It is still there, but in the ComponentWrapper class. If you bring up the code for that class, you can see the ProcessInput method and how it iterates through the rows in the buffer, calling the MapMonthInput_ProcessInputRow method once per row. Figure 5 shows the script for the component being edited.
Once our script has
been added, close the script editor and then the script component
editor. This completes the configuration of our script component; its
output can be attached to the input of another transform or destination
adapter that needs the Month output column generated by the component.
This example shows
how to use the script component in the most common scenario: as a
synchronous transform. However, if you need to implement more advanced
functionality in the component, knowledge of data flow concepts becomes
invaluable. For example, if you need the script component to be a
transform with an asynchronous output instead of a synchronous output
that the UI sets up initially, you need to know that you must adjust the
configuration made by the UI by manually setting the SynchronousInputID property on the component’s output to 0 using the script component’s UI editor.