Browsed by
Category: Integration Services

Various SQL date formats to use with SSIS, SSAS and/or SSRS

Various SQL date formats to use with SSIS, SSAS and/or SSRS

While designing data warehouses, SSIS packages, etc., like myself you’ve probably ran into the issue of having to format different date and time values that ends up in a long string of SQL code or expressions in SSIS. The purpose of this post is to collect some links to various pages where these kind of date and time formatting can be found.

Getting only date part of a datetime field

First of all, the function that I used the most is when I’m trying to get only the date part of a datetime field type. For example, when I’m looking at a sales order fact table, an order could have been placed at August 15th 2011 at 9:15am. This date would be stored as 2011-08-15 09:15:00:000. The issue here, is that it’s difficult to design your SSAS cube and use this field as a link to your time dimension table, because the time dimension table usually holds the dates only. Therefore, the following function could be used:


(thanks to Ben Nadel)

When you’re using SQL Server 2008 and above, you can also use the following function that is a lot easier to use. Since the date and time fields were introduced in SQL Server 2008, this won’t work in 2005 and earlier versions:


Date formats

Some simple date formats can be found here. These are just some standard SQL date formats but these could help you tremendously to avoid having to concatenate several strings together.

Calculations: First and last day of the month

You might have ran into the issue of calculating dates in relation to a certain input date. This page shows you how to get the first or last day of the month in relation to a certain date.

Calculations: First and last day of the year

Just as calculating the first and last day of the month, you can see here how you can calculate the first and last day of the year.

Standard functions

Other than the ones above, if you are looking for a calculation in terms of number of days, weeks, months, years, etc., the standard DATEADD and DATEDIFF functions in SQL would do the trick.

Monitor your SQL database size and disk space

Monitor your SQL database size and disk space

I faced the challenge of retrieving information about the size of different databases on different servers and the disk space left on the server. Basically I wanted to find a way how I can retrieve this data periodically so I could monitor the growth of the databases and the disk space left.

I decided to create an SSIS package from this that loops through the servers and databases to get the information out of it from one single package. In this post I’ll describe how I set this up. In case you’re facing a similar issue, bear in mind that you can basically get every piece of information about the databases as long as you can query it. I’ll stick to the basics to show how it works, I won’t go too much into detail which different data you can get out of it.
I’ll split the whole thing in parts so it’s well readable… the first couple of tasks look like this:
Before I start with the tasks in the package, I have created an OLE DB connection with a random server name and database. The location of the server and database name are really not important here because they will be changed later on.
The first task is a very straightforward task that retrieves all the SQL Servers where you want to retrieve the database information from. For now I just hardcoded the servers, you could have a table somewhere where these servers are listed.
Some settings that are important here is that on the general page, the ResultSet is set to Full result set. On the page Result Set, add one result set with the name 0 (zero) and in the variable name, select a variable that you have create before (the variable needs to be of type Object!).
Right after this package I have added a foreach loop container. This container has an ADO Enumerator with the result set variable from the previous package selected. On the section Variable mappings, add a variable of type String, that holds the name of the server in each iteration of this loop. The variable that I have used here is called “Server”.
The next package is a script task, and this task is responsible for setting the connection to the right server in each iteration. It’s a VB .NET script package, and you have to make sure that the variable “Server” is on the ReadOnly list.
Edit the script, and I have added the following code:
Dim ConnMgr As ConnectionManager = Dts.Connections(“ds.ServerName.master”)
Dim ConnStr As String = ConnMgr.ConnectionString
ConnStr = “Provider=SQLNCLI10.1;Data Source=” + Dts.Variables(“Server”).Value.ToString + “;Integrated Security=SSPI;Initial Catalog=master”
ConnMgr.ConnectionString = ConnStr
Dts.TaskResult = ScriptResults.Success
This code basically sets the connection manager that you have created before with a connection string for the different servers in each iteration.
Back to the control flow… the next package is an SQL task that retrieves all the databases from the server. This package has similar settings to the server, also a full result set in an object variable but then for the different databases on the server.
Time for the second part of the control flow:
In the tempdb of the server, I have created a table that holds the physical disk space of the server. The first task in the screenshot above creates the table, the second task inserts the data with the following statement:
INSERT INTO [tempdb].[dbo].PhysicalDisk
EXEC xp_fixeddrives
In my original setup I had an OLE source connection in the data flow task that retrieved the disk space with the EXEC xp_fixeddrives statement, but apparently SSIS didn’t like that very much… therefore I changed it to this setup, mainly because I couldn’t find a way to use a source connection component that executes a stored procedure.
The next step is a loop through the databases, and for each database, the data flow task is executed. The data flow task looks like this:
On the left side, several pieces of information about the database is retrieved, like the logical name, physical name, size, etc. Since this query is different for every loop, I have put the query in an expression variable.
On the right side, I retrieve the physical disk space that I have added in the previous steps in the control flow.
I’ve joined this information together, and just for testing purposes I have put this in a text file. Eventually I’ll put a database connection as the destination in order to keep track of the database size.
Hierarchical data with SCD Type 2

Hierarchical data with SCD Type 2

The question that is always asked with storing data with Slowly Changing Dimensions, is to what extent you want to keep the historical data. I would like to explain a situation of hierarchical data that is stored with the SCD method and how this affects the data.

As an example where I first got in touch with this subject, I’ll take the hierarchical relationship between employees and managers.
The employees’ data is stored as SCD Type 2, and this table also includes a column “Manager”. This column has a foreign key relationship to the surrogate key of the same table. So what does this do to your data?
To keep the example simple, I’ll take two employees (John and Jane), and Bill is their manager. This results in the following structure in the table:
Let’s say Bill moves to another location as of October 1st 2010. Since I have the column “Location” also set to a type 2 change, you might expect the following changes and additions to the table:
The changes for Bill seems correct, an extra line is added with a new surrogate key (line 4) and his original line has gotten an end date. Now I still know that Bill has been working in New York and then in Boston.
Nothing special than a simple SCD type 2 change until so far. But the above situation, is that really what should happen, isn’t there more that should be done? I probably wouldn’t have posted this if it was just that 🙂
Look at John and Jane. They have Bill as their manager because they both have the surrogate key of Bill in the column “ManagerKey”. The active record for Bill has now been changed to key 4, so Jane and Joe have to “change manager” as well. I have put this as a quote, because they don’t really change managers, but the link to their manager’s record in the employees table has changed.
Therefore, the end result of Bill’s change adds the following lines to the table for his subordinates:
In this example it doesn’t seem like a big change, but this entails that every single change in a manager’s data, the subordinates all the way down will get a new record in the table as well. In my case, there is a new CEO in the organization that led to these changes for every one of the 2500 employees in the company!
We’re not talking about a lot of data here, but in other hierarchical structures than employee data, this could lead to a massive data load that is not even necessary. Therefore you should ask yourself if you really need that information so much in detail. If you don’t need this, there are two options that look like the most straight-forward options:
  1. Change the column “ManagerKey” to an SCD Type 1 column. If this is used in the above example, the manager’s key in John’s and Jane’s profile will be overwritten from 3 to 4. A drawback of this option is that you lose the history of who has been whose manager.
  2. If you need to keep the history, you can use a foreign key relationship not on the surrogate key, but on the business key. In the above example, John and Jane would have the value 145 in the column “ManagerKey”, and nothing changes with their information and no new record for them are added.
It is advised to make this choice early on in your design phase, since going back from one to another could be a very time consuming job!