SSIS DATA PROFILING

By | April 3, 2016

SQL Server Integration Services offers a useful tool to analyze data before you bring it into your Data Warehouse.  The Profile Task will store the analysis in an XML file, which you can view using the Data Profile Viewer.  Before we review how to use the Profile Task, let’s take a look at the eight types of profiles that can be generated by this control.

  • Candidate Key Profile Request
    • Use this profile to identify the columns that make up a key in your data
  • Column Length Distribution Profile Request
    • This profile reports the distinct lengths of string values in selected columns and the percentage of rows in the table that each length represents. Use this profile to identify invalid data, for example a United States state code column with more than two characters.
  • Column Null Ratio Profile Request
    • As the name implies, this profile will report the percentage of null values in selected columns. Use this profile to identify unexpectedly high ratios of null values.
  • Column Pattern Profile Request
    • Reports a set of regular expressions that cover the specified percentage of values in a string column. Use this profile to identify invalid strings in your data, such as Zip Code/Postal Code that do not fit a specific format.
  • Column Statistics Profile Request
    • Reports statistics such as minimum, maximum, average and standard deviation for numeric columns, and minimum and maximum for datetime columns. Use this profile to look for out of range values, like a column of historical dates with a maximum date in the future.
  • Column Value Distribution Profile Request
    • This profile reports all the distinct values in selected columns and the percentage of rows in the table that each value represents. It can also report values that represent more than a specified percentage in the table.  This profile can help you identify problems in your data such as an incorrect number of distinct values in a column.  For example, it can tell you if you have more than 50 distinct values in a column that contains United States state codes.
  • Functional Dependency Profile Request
    • The Functional Dependency Profile reports the extent to which the values in one column (the dependent column) depend on the values in another column or set of columns (the determinant column). This profile can also help you identify problems in your data, such as values that are not valid. For example, you profile the dependency between a column of United States Zip Codes and a column of states in the United States. The same Zip Code should always have the same state, but the profile discovers violations of this dependency.
  • Value Inclusion Profile Request
    • The Value Inclusion Profile computes the overlap in the values between two columns or sets of columns. This profile can also determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables. This profile can also help you identify problems in your data such as values that are not valid. For example, you profile the ProductID column of a Sales table and discover that the column contains values that are not found in the ProductID column of the Products table.

Leave a Reply

Your email address will not be published. Required fields are marked *