Before going into details in how to setup data partitioning on SQL server, it is important to understand data partitioning concept. So first, I will give a simple definition of what it is.
Understanding the concept
Data partitioning is a way to spread data on the same logical structure into different physical storage with a specific organization. The goal of data partitioning is to accelerate mainly IOs reads and writes.
Indeed, when you write your data, you can access several different physical storage units. Usually, when you insert data, you access the same partition but when you update, it can be on several partition. An example is when you update orders with a specific status (orders open, to be archived for example) and your partition scheme (this corresponds to the way you spread your data into different physical storage units) follows range of order seeded ids, you access several partitions at once, making your update faster. On the contrary, when you insert orders with the same partition scheme, you will access one partition, that we can usually the current partition, as orders will be inserted with incremental order id.
Following the same idea, when you read the orders by status, you will access several physical storage units at once, increasing your IO read parallelism, while you will read one partition when you like to access the most recently inserted orders.
Finally, maintenance on partitions is easier than on one big data storage unit. For example, old orders would not be accessed often, reducing the risk of real downtime when rebuilding offline indexes. Additionally data compression is completely understandable of old partitions as the access would be limited. Switching out to a different storage unit or purging an entire partition (for example, we like to keep only one year worth or orders) is much easier with a partition scheme set up.
One can define partition scheme as the mapping between the technical segmentation (range of order ids in our example) and the physical storage unit (on Sql server, filegroup and eventually physical file). Obviously Microsoft added additional layer such as the partition function, filegroup before files but it ends up been the segmentation and the physical storage unit mapping.
Implementing data partition
Before going into scripting, one should understand that there are many ways to implement data partitions. Some MSSql Dbas will say that data partitioning is only available on enterprise version. Those DBAs just simplify that data partitioning is, just an implementation, instead of a concept. Data partitioning can be implemented in any database engine on any edition as soon as one knows the limitation of the edition. We will review methods of data partitioning on the three main editions of Sql server, three methods that are different due to limitation of the Sql server edition. Obviously the limitations of the engine are only due to the cost license of Microsoft SQL server. Microsoft thinks that Data partitioning is only relevant for large corporation and therefore should not be a matter on smaller editions. Well, this is a point of view, rather a sales pitch, I would say.
In my experience, I set up data partitioning on the three different editions: Express, Standard and Enterprise. However small or big your company is, data is irrelevant to the budget you have. I do recall a very successful company using Sql server express edition. They were just using commodity – type servers with Express license to dedicate database shards. This was in telecommunication. In the mean time, I do remember another set up on standard version, with a table schema representing the partition scheme. Finally, for large table on enterprise, the use of Microsoft partition scheme and function was used.
I will present the three different set ups and first the one that most of people know, the enterprise edition data partitioning.
On Sql Server Enterprise
After SQL server 2005, Microsoft decided to add several features for VLD (very large database) only for the very few lucky companies that could afford Enterprise licensing.
Setting the partitioning first knows how your data will be stored, used and maintained. The classic example is your orders data.
Here is an example: your data will be extracted on data warehouses and only 1 year worth is relevant to the production. Your orders are inserted every day and your orders can be updated (mostly status of the order: open, fulfilled, paid, etc…) within 3 months. You have a billion orders per year.
You decide to ask your DBA to improve access to orders and making sure that this very large database is easy to maintain and no downtime will have to be endured when reindexing indexes occur.
The first think is to know how many partitions you will need. In this example, we will set up. Let’s say we like to set monthly partitions. As the minimum to keep is 1 year, the minimum to keep is 12 filled partitions. There are two options here.
The first one is to have your DBA creating every month a new partition and empty remove the old partition. This means that your DBA will need to be maintaining every month the partitions, which does not seem irrelevant. It is actually fairly simple if this becomes a routine deployment to do.
The other option is to set up a rolling partition scheme, where a month will overlap the other based on the “month number”. Therefore using 24 months to simplify the schema symmetry (you can use less month, but your maintenance routine would be more complex).
I did both. In the first option, you can use the order id to be your partition key as you will set up id range within your partitions.
In the second option, you do not have this choice because your id ranges would not be consistent, so you would use a persisted column to indicate the partition number to allocate your data. The persisting column would be a function based on the order creation date and therefore the month number. You can for example decide that all odd year would have month from 13 to 24 and all even year would be from 1 to 12.
Do not forget to align your partition scheme to the index cluster for performance boost. Always put your partition key at the end of each index.
Here is how you would set your persisted column based on the OrderCreationDate column and your partition schemes:
CREATE DATABASE DataPartitions
CREATE SCHEMA Enterprise Authorization dbo
CREATE PARTITION FUNCTION pfMonthNumber (tinyint)
AS RANGE RIGHT FOR
CREATE PARTITION SCHEME psMonthNumber
AS PARTITION pfMonthNumber
TO ([PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY] );
CREATE TABLE Enterprise.TblOrders
(OrderId Int identity(1,1) not null, OrderCreationDate Datetime Not NULL Default (GetDate()),
OrderPartitionNumber AS CONVERT(TINYINT,(DATEPART(M, OrderCreationDate) + CASE WHEN DATEPART(YEAR, OrderCreationDate) % 2 = 0 THEN 0 ELSE 12 END)) PERSISTED NOT NULL
, Primary Key Clustered (OrderId, OrderPartitionNumber))
INSERT INTO Enterprise.TblOrders
OUTPUT INSERTED.OrderId, INSERTED.OrderCreationDate, INSERTED.OrderPartitionNumber
Obviously, for the sake of your performance, your partition function would be spread on several filegroups that will redirect your data in different storage units. In our case, I did not work on it but this would be easy to set this up.
One can see above that an insert of different years would associate a different partition number, leading to keep 2 years worth of data in your table. By separating the data into 24 months, you can then easily keep 12 months by purging the data on the partitions of the other year.
The purge can be done with “switching out” partitions but will probably use a DELETE statement to keep the partition and a rebuild. It is longer and resource-wise taking more horse power from your server but it keeps the database to be “DBA” free of complex manipulation (switch out, merge, creation of partitions).
But obviously you can do the operation on partitions regularly instead.