30 December 2012, 14:34

 

There are many ways to set up your cube, but there are definite best practices to implement if you want it to scale. The first point is simply a reminder that a cube is an analytical tool and should not be used as a source of detailed reports. In my opinion, drillthrough should be avoided, as it misleads users about the purpose of a cube.

Your cube is an analytical tool

The main issue we encounter is the misuse of OLAP cubes. Because end users see the potential and look at the data in Excel (which is very friendly for building reports), they sometimes believe that details can be reported through MDX queries. Let’s take a simple, classic example. We have a bunch of orders from an e-commerce site. One can analyse ordered sales and quantities per day, month or year, but we should not build a report per order or per customer! Well, technically you can do it all, but the cube is not meant for this, and it defeats the purpose of aggregation designs. A cube should be an analytical tool! Just by following this principle you will probably solve 50% of your performance issues.

Avoid complex measures if you can

By complex measures, I mostly refer to measures that do not aggregate well, such as distinct count or last-child measures. Sums and counts are the measures that should be used the most, as they are easily aggregatable. I am not saying never to use the other aggregation types; I am just advising not to use them when they are not necessary. Your processing will then be optimized, because your aggregation designs will be much easier for the engine to build. That is another easy 20-30% increase in performance.
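To see why a distinct count does not aggregate, consider this small T-SQL sketch (the table and data are hypothetical, purely for illustration): the monthly distinct customer counts cannot simply be summed into a yearly figure, because customers who ordered in both months would be counted twice.

-- Hypothetical data: distinct counts are not additive across months.
DECLARE @Orders TABLE (OrderId INT, CustomerId INT, OrderMonth TINYINT);
INSERT INTO @Orders VALUES (1, 100, 1), (2, 100, 2), (3, 200, 2);

-- Per-month distinct customers: month 1 = 1, month 2 = 2 (sum = 3)...
SELECT OrderMonth, COUNT(DISTINCT CustomerId) AS DistinctCustomers
FROM @Orders
GROUP BY OrderMonth;

-- ...but the true yearly distinct count is 2, so the engine cannot reuse pre-aggregated values.
SELECT COUNT(DISTINCT CustomerId) AS DistinctCustomersYear
FROM @Orders;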

Partition your cube

Microsoft best practice advises partitioning a measure group when it reaches 200 million rows. Microsoft also advises avoiding very small partitions. Both are right, and I strongly advise partitioning your measure groups for the following reasons. The first one is to ease your processing: by separating your facts, you avoid reprocessing old historical facts that will never change, while those facts still remain in your cube as part of your analysis and mining strategies. I have many clients with fact tables holding millions of rows. There are many ways to optimize performance, and one of them is to pre-aggregate the fact tables, dropping granularities that are not useful. For example, we can aggregate order sales by day and product simply because we will never analyze by customer. However, this is risky, because your sales and marketing users might later require additional dimensions, such as the age of the customer at the time of the order, and your aggregate would then need to be recalculated. In other words, you can certainly optimize your cube by having fewer fact rows, but in general I lean more towards partitioning your fact tables, usually by date or by incremental ids. By doing so, your daily or hourly processing can focus on the current partitions. Even with proactive caching, it is nice to have a way to set up incremental processing on a specific partition.
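To make the partitioning idea concrete, each measure group partition is typically bound to a query that restricts the fact table to that partition's slice; a minimal sketch with hypothetical table and column names:

-- Hypothetical query binding for the "December 2012" partition of an order sales measure group.
SELECT OrderId, OrderDate, CustomerKey, ProductKey, SalesAmount, OrderQuantity
FROM dbo.FactOrderSales
WHERE OrderDate >= '2012-12-01' AND OrderDate < '2013-01-01';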

Set up your cube partitioning based on life cycle of your facts

This is especially true for transactions, which are usually the main source of measures in any cube. A transaction always has a life cycle, whether it is a one-day, one-month or one-year cycle. Rarely can a transaction change at any time, and if it does, it usually means your system has flaws. Your transactions should follow a clear, deterministic life cycle; otherwise you might have “zombie” transactions in your system. Let’s take again the example of an order in an e-commerce system, in a regular case. Your order is placed the same day you pay, is then fulfilled within the month, and can be reimbursed/returned within the following 90 days. After 90 days, the order is closed and cannot be touched; exceptions should only be made by manually asking IT to reopen the order. Looking at the statistics of this e-commerce site, we notice that orders are 100% created, 90% paid, 70% fulfilled and 10% reimbursed within the 90 days, and only 0.1% change after 90 days. Of course we could argue that incremental processing is not feasible because orders can change at any time, but that would deprive the sales team of a fair amount of near-real-time analysis. What I would do for this e-commerce site is run incremental processing once every hour on the current partition, process the current partition daily (sizing the partition to hold one month of data), and fully process the previous three months’ partitions daily. I would merge the monthly partitions, if they are small, once a year for the year minus 2 (or merge them into separate partitions if a year’s worth of data is more than 200 million rows). Finally, I would reprocess the whole measure group once a month if it is not too big; if it is, I would fully process only the partitions up to one year old. How often dimensions are updated depends on how frequently they change, but since we reprocess the current partition daily, I would usually update the dimensions once a day, just before the partition processing.
This way, newly created orders appear in your cube within the hour (without proactive caching). Payments usually appear within the hour as well; if not, they appear the next day. Fulfillments, returns/reimbursements and the order closing status appear the next day.
Again, there is no need to be absolutely accurate in real time, because this is an analysis tool. If people need to be accurate about a specific order status, an OLAP cube is not the tool to use; a simple OLTP query against the production database for that specific order is what you need. As for how many orders are still open, one can use an asynchronous query (which would give almost the same answer as our cube) or a synchronous query on the production system (if that system is well indexed / status-partitioned / sharded).
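For that single-order question, a plain OLTP lookup is the right tool; a minimal sketch with hypothetical table and column names:

-- Hypothetical OLTP query: the status of one specific order, straight from the production database.
SELECT OrderId, OrderStatus, LastUpdateDate
FROM dbo.Orders
WHERE OrderId = 123456;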

 Set up your cube processing schedule

You can then automate your processing by setting up a small table that gathers all the necessary information for your measure group partitions (cube name, measure group name, partition id, minimum and maximum values for your incremental processing, query binding source). Once you have done that, you can use SSIS to handle all your processing needs.
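A minimal sketch of such a control table (the names and data types are my own, not from the original post):

-- Hypothetical metadata table driving SSIS-based partition processing.
CREATE TABLE dbo.CubePartitionProcessing
(
      CubeName            VARCHAR(128)  NOT NULL,
      MeasureGroupName    VARCHAR(128)  NOT NULL,
      PartitionId         VARCHAR(128)  NOT NULL,
      MinIncrementalValue BIGINT        NULL,  -- lower bound for the next incremental load
      MaxIncrementalValue BIGINT        NULL,  -- upper bound already processed
      QueryBindingSource  VARCHAR(MAX)  NULL,  -- SQL statement bound to the partition
      CONSTRAINT PK_CubePartitionProcessing PRIMARY KEY (CubeName, MeasureGroupName, PartitionId)
);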


9 October 2012, 09:13

On Sql Server Standard

On SQL Server Standard edition, this is actually easier in a way. You do not manage a partition scheme or function, at least not in the way the Enterprise edition does: you set up tables to act as your partitions. If you want to spread the data over several storage units, you can then assign different filegroups to the different tables. In our case I did not do it, but it would be easy to set up. As in the previous example, we will keep only three months' worth of log data here, using a rolling scheme of 4 monthly partitions.
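For example, a sketch of how each partition table could be placed on its own filegroup (the filegroup, file name and path are hypothetical and must match your storage):

-- Hypothetical layout: one filegroup and one file per partition table.
ALTER DATABASE DataPartitions ADD FILEGROUP FG_LogTrace1;
ALTER DATABASE DataPartitions ADD FILE
      (NAME = N'LogTrace1', FILENAME = N'E:\SQLData\LogTrace1.ndf', SIZE = 512MB)
      TO FILEGROUP FG_LogTrace1;
GO
-- The corresponding table would then be created ON [FG_LogTrace1] instead of ON [PRIMARY].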

First, we create the 4 partitions (4 tables):

USE [DataPartitions]

GO

CREATE SCHEMA Standard Authorization dbo

GO

CREATE TABLE Standard.[LogTrace_Partition1](

      [id] [int] IDENTITY(1,1) NOT NULL,

      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition1]  DEFAULT (getdate()),

      [sessionID] [varchar](36) NULL,

      [userAgent] [varchar](500) NULL,

      [eventID] [int] NULL,

      [requestID] [int] NULL,

      [frontEndSessionID] [varchar](36) NULL,

      [trace] [varchar](5000) NULL,

      [serverName] [varchar](50) NULL,

      [userIP] [varchar](15) NULL,

      [note] [varchar](50) NULL,

PRIMARY KEY CLUSTERED

(

      [id] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

 

GO

CREATE TABLE Standard.[LogTrace_Partition2](

      [id] [int] IDENTITY(1,1) NOT NULL,

      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition2]  DEFAULT (getdate()),

      [sessionID] [varchar](36) NULL,

      [userAgent] [varchar](500) NULL,

      [eventID] [int] NULL,

      [requestID] [int] NULL,

      [frontEndSessionID] [varchar](36) NULL,

      [trace] [varchar](5000) NULL,

      [serverName] [varchar](50) NULL,

      [userIP] [varchar](15) NULL,

      [note] [varchar](50) NULL,

PRIMARY KEY CLUSTERED

(

      [id] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

CREATE TABLE Standard.[LogTrace_Partition3](

      [id] [int] IDENTITY(1,1) NOT NULL,

      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition3]  DEFAULT (getdate()),

      [sessionID] [varchar](36) NULL,

      [userAgent] [varchar](500) NULL,

      [eventID] [int] NULL,

      [requestID] [int] NULL,

      [frontEndSessionID] [varchar](36) NULL,

      [trace] [varchar](5000) NULL,

      [serverName] [varchar](50) NULL,

      [userIP] [varchar](15) NULL,

      [note] [varchar](50) NULL,

PRIMARY KEY CLUSTERED

(

      [id] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

CREATE TABLE Standard.[LogTrace_Partition4](

      [id] [int] IDENTITY(1,1) NOT NULL,

      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition4]  DEFAULT (getdate()),

      [sessionID] [varchar](36) NULL,

      [userAgent] [varchar](500) NULL,

      [eventID] [int] NULL,

      [requestID] [int] NULL,

      [frontEndSessionID] [varchar](36) NULL,

      [trace] [varchar](5000) NULL,

      [serverName] [varchar](50) NULL,

      [userIP] [varchar](15) NULL,

      [note] [varchar](50) NULL,

PRIMARY KEY CLUSTERED

(

      [id] ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

 

Second, we create the partition scheme (1 table):

 

CREATE TABLE Standard.LogTracePartition

(

PartitionId tinyint,

TableName varchar(256),

IsCurrentPartition Bit,

NextPartitionId tinyint,

PurgePartitionId tinyint null,

CONSTRAINT PK_LogTracePartition PRIMARY KEY CLUSTERED (PartitionId)

)

Go

insert into Standard.LogTracePartition

(PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)

Values (1,'Standard.LogTrace_Partition1',1,2,2)

GO

insert into Standard.LogTracePartition

(PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)

Values (2,'Standard.LogTrace_Partition2',0,3,3)

GO

insert into Standard.LogTracePartition

(PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)

Values (3,'Standard.LogTrace_Partition3',0,4,4)

GO

insert into Standard.LogTracePartition

(PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)

Values (4,'Standard.LogTrace_Partition4',0,1,1)

GO

 

Third, we create the stored procedure that switches the current partition (using a synonym) and truncates the data we no longer need to keep on production:

 

/*

Author: Chuge

Creation Date: 2012-10-08

Reason: Partition scheme on standard edition to switch current partition

Sample: EXEC Standard.P_SwitchLogTracePartition @SynonymName = 'Standard.LogTraceS', @Debug = 1

select object_id('Standard.LogTraceS')

select * from Standard.LogTraceS

*/

CREATE PROCEDURE Standard.P_SwitchLogTracePartition

@SynonymName VARCHAR(256) = 'Standard.LogTraceS',

@Debug bit = 0

AS

 

SET NOCOUNT ON

DECLARE @TableNameNewPartition varchar(256), @TableNamePurgePartition varchar(256), @Stmt VARCHAR(1024), @NextPartitionId tinyint, @PurgePartitionId tinyint

 

BEGIN TRY

 

SELECT TOP 1 @NextPartitionId = NextPartitionId, @PurgePartitionId = PurgePartitionId

FROM Standard.LogTracePartition

WHERE IsCurrentPartition = 1

 

SELECT @TableNameNewPartition = TableName

FROM Standard.LogTracePartition

WHERE PartitionId = @NextPartitionId

 

SELECT @TableNamePurgePartition = TableName

FROM Standard.LogTracePartition

WHERE PartitionId = @PurgePartitionId

 

IF OBJECT_ID(@TableNameNewPartition) IS NULL

BEGIN

      SELECT 1

      RETURN

END

IF OBJECT_ID(@TableNamePurgePartition) IS NULL

BEGIN

      SELECT 2

      RETURN

END

 

IF OBJECT_ID(@SynonymName) IS NOT NULL

BEGIN

      SET @Stmt = 'DROP SYNONYM ' + @SynonymName

      IF @Debug = 1

            PRINT @Stmt

      ELSE

            EXEC ( @Stmt )

END

 

SET @Stmt = 'TRUNCATE TABLE ' + @TableNamePurgePartition

IF @Debug = 1

      PRINT @Stmt

ELSE

      EXEC ( @Stmt )

SET @Stmt = 'CREATE SYNONYM ' + @SynonymName + ' FOR ' + @TableNameNewPartition

IF @Debug = 1

      PRINT @Stmt

ELSE

      EXEC ( @Stmt )

 

IF @Debug = 1

      RETURN

 

UPDATE P

SET IsCurrentPartition = CASE WHEN PartitionId = @NextPartitionId THEN 1 ELSE 0 END

FROM Standard.LogTracePartition P

 

END TRY

BEGIN CATCH

      DECLARE @Error VARCHAR(512)

      SET @Error = ERROR_MESSAGE()

      RAISERROR(@Error,16,1)

      RETURN @@Error

END CATCH

SET NOCOUNT OFF

GO

 

 

Fourth, we create the scheduled job that switches the partition at midnight on the 1st of each month:

 

 

USE [msdb]

GO

/****** Object:  Job [WebApper - Data partition Switch out]    Script Date: 02/01/2012 12:08:05 ******/

BEGIN TRANSACTION

DECLARE @ReturnCode INT

SELECT @ReturnCode = 0

/****** Object:  JobCategory [[Uncategorized (Local)]]]    Script Date: 02/01/2012 12:08:05 ******/

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)

BEGIN

EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

 

END

 

DECLARE @jobId BINARY(16)

EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'WebApper - Data partition Switch out',

            @enabled=1,

            @notify_level_eventlog=0,

            @notify_level_email=0,

            @notify_level_netsend=0,

            @notify_level_page=0,

            @delete_level=0,

            @description=N'No description available.',

            @category_name=N'[Uncategorized (Local)]',

            @owner_login_name=N'sa', @job_id = @jobId OUTPUT

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/****** Object:  Step [EXEC Standard.P_SwitchLogTracePartition @SynonymName = 'Standard.LogTrace', @Debug = 0]    Script Date: 02/01/2012 12:08:05 ******/

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'EXEC Standard.P_SwitchLogTracePartition @SynonymName = ''Standard.LogTrace'', @Debug = 0',

            @step_id=1,

            @cmdexec_success_code=0,

            @on_success_action=1,

            @on_success_step_id=0,

            @on_fail_action=2,

            @on_fail_step_id=0,

            @retry_attempts=0,

            @retry_interval=0,

            @os_run_priority=0, @subsystem=N'TSQL',

            @command=N'EXEC Standard.P_SwitchLogTracePartition @SynonymName = ''Standard.LogTrace'', @Debug = 0',

            @database_name=N'DataPartitions',

            @flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Webapper - Monthly',

            @enabled=1,

            @freq_type=16,

            @freq_interval=1,

            @freq_subday_type=1,

            @freq_subday_interval=0,

            @freq_relative_interval=0,

            @freq_recurrence_factor=1,

            @active_start_date=20110303,

            @active_end_date=99991231,

            @active_start_time=0,

            @active_end_time=235959

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION

GOTO EndSave

QuitWithRollback:

    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION

EndSave:

 

GO

 

On Sql Server Express

The Express edition has even more limitations, such as no SQL Server Agent (you need another scheduler) and a capped database size (4 GB, raised to 10 GB from SQL Server 2008 R2 onwards).
Your partitioning would then be managed by an application (most likely scheduled services and web services).
The partitions would be identical to the Standard edition ones, except that you would need to specify the context of the shard.
In the order example, one could decide to set up 24 databases, one per monthly partition, assuming that a month's worth of data and indexes stays below the size limit. SQL Server Express data partitioning works best with a service-oriented architecture where all applications go through intelligent web methods that select the proper database to read or update the data.
Obviously you can also use the approach shown above for the Standard edition inside each of these databases. You just need to keep the database size limitation in mind.
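A minimal sketch of what the central routing metadata could look like (the names are hypothetical; the actual routing logic would live in your web services):

-- Hypothetical shard map: one database per monthly partition, resolved by the application tier.
CREATE TABLE dbo.OrderShardMap
(
      ShardId      TINYINT NOT NULL PRIMARY KEY,
      PeriodStart  DATE    NOT NULL,
      PeriodEnd    DATE    NOT NULL,  -- exclusive upper bound
      DatabaseName SYSNAME NOT NULL   -- e.g. 'Orders_2012_10'
);

-- A web method looks up the target database for a given order date before querying it.
SELECT DatabaseName
FROM dbo.OrderShardMap
WHERE '2012-10-09' >= PeriodStart
  AND '2012-10-09' < PeriodEnd;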

 

9 October 2012, 09:11

Before going into the details of how to set up data partitioning on SQL Server, it is important to understand the concept of data partitioning. So first, I will give a simple definition of what it is.

 

Understanding the concept

Data partitioning is a way to spread data belonging to the same logical structure across different physical storage units, following a specific organization. The main goal of data partitioning is to accelerate IO reads and writes.

Indeed, when you write your data, you can access several different physical storage units. Usually, when you insert data you hit a single partition, but an update can touch several partitions. For example, if you update orders with a specific status (open orders to be archived, say) and your partition scheme (the way you spread your data across the physical storage units) follows ranges of seeded order ids, you access several partitions at once, making your update faster. Conversely, when you insert orders with that same partition scheme, you access only one partition, which we usually call the current partition, since orders are inserted with incremental order ids.

Following the same idea, when you read orders by status you access several physical storage units at once, increasing IO read parallelism, whereas you read a single partition when you want to access the most recently inserted orders.

Finally, maintenance on partitions is easier than on one big storage unit. For example, old orders are not accessed often, which reduces the risk of real downtime when rebuilding indexes offline. Additionally, data compression makes complete sense on old partitions, since they are rarely accessed. Switching a partition out to a different storage unit, or purging an entire partition (for example when we want to keep only one year's worth of orders), is much easier once a partition scheme is in place.

One can define a partition scheme as the mapping between the technical segmentation (ranges of order ids in our example) and the physical storage units (on SQL Server, filegroups and eventually physical files). Microsoft obviously added extra layers, such as the partition function and filegroups in front of files, but it boils down to a mapping between the segmentation and the physical storage.
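In SQL Server Enterprise terms, that mapping can be sketched as follows (a simplified illustration assuming two filegroups FG1 and FG2 already exist; it is not the exact setup used later in this post):

-- Hypothetical mapping: ranges of order ids (the segmentation) to filegroups (the physical storage).
CREATE PARTITION FUNCTION pfOrderIdRange (INT)
AS RANGE RIGHT FOR VALUES (1000000, 2000000);   -- three ranges: < 1M, 1M to 2M, >= 2M

CREATE PARTITION SCHEME psOrderIdRange
AS PARTITION pfOrderIdRange
TO (FG1, FG2, [PRIMARY]);                       -- one storage unit per range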

 

Implementing data partition

Before going into scripting, one should understand that there are many ways to implement data partitioning. Some MS SQL DBAs will say that data partitioning is only available in the Enterprise edition. Those DBAs reduce data partitioning to one particular implementation instead of a concept. Data partitioning can be implemented on any database engine and any edition, as long as one knows the limitations of that edition. We will review data partitioning methods for the three main editions of SQL Server, three methods that differ because of each edition's limitations. Obviously, the engine limitations are only due to Microsoft's licensing costs: Microsoft considers data partitioning relevant only for large corporations, and therefore not a concern for the smaller editions. Well, that is a point of view, or rather a sales pitch, I would say.
In my experience, I have set up data partitioning on all three editions: Express, Standard and Enterprise. However small or big your company is, the amount of data is unrelated to the budget you have. I recall a very successful company using the SQL Server Express edition: they simply used commodity-type servers with Express licenses as dedicated database shards, in the telecommunications business. I also remember another setup on the Standard edition, with a metadata table representing the partition scheme. Finally, for large tables on Enterprise, Microsoft's partition scheme and function were used.
I will present the three different setups, starting with the one most people know: Enterprise edition data partitioning.

On Sql Server Enterprise

Starting with SQL Server 2005, Microsoft added several features for VLDBs (very large databases), reserved for the few lucky companies that can afford Enterprise licensing.
Setting up partitioning starts with knowing how your data will be stored, used and maintained. The classic example is your orders data.
Here is an example: your data is extracted into data warehouses, and only one year's worth is relevant to production. Orders are inserted every day, and an order can be updated (mostly its status: open, fulfilled, paid, etc.) within 3 months. You have a billion orders per year.
You ask your DBA to improve access to orders and to make sure this very large database is easy to maintain, with no downtime to endure when index rebuilds occur.

The first thing is to know how many partitions you will need. Let's say we want monthly partitions. As the minimum to keep is one year, we need at least 12 filled partitions. There are two options here.
The first one is to have your DBA create a new partition every month and empty and remove the old one. This means your DBA has to maintain the partitions every month, which may sound like a burden, but it is actually fairly simple once it becomes a routine deployment.
The other option is to set up a rolling partition scheme, where one month overlaps another based on a "month number". We therefore use 24 months to keep the scheme symmetrical (you can use fewer months, but your maintenance routine becomes more complex).

I have done both. With the first option, you can use the order id as your partition key, since you set up id ranges within your partitions.

With the second option you do not have this choice, because your id ranges would not be consistent, so you use a persisted computed column to indicate the partition number where the data should land. The persisted column is a function of the order creation date, and therefore of the month number. You can, for example, decide that every odd year uses month numbers 13 to 24 and every even year uses 1 to 12.
Do not forget to align your indexes with your partition scheme for a performance boost, and always put the partition key at the end of each index key.
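For instance, on the Enterprise.TblOrders table created below, an aligned nonclustered index would repeat the partition key as the last key column and be created on the partition scheme (a sketch, not part of the original script):

-- Hypothetical aligned index: partition key last, index stored on the same partition scheme.
CREATE NONCLUSTERED INDEX IX_TblOrders_OrderCreationDate
ON Enterprise.TblOrders (OrderCreationDate, OrderPartitionNumber)
ON psMonthNumber (OrderPartitionNumber);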

Here is how you would set your persisted column based on the OrderCreationDate column and your partition schemes:

CREATE DATABASE DataPartitions

GO

USE DataPartitions

GO

CREATE SCHEMA Enterprise Authorization dbo

GO

CREATE PARTITION FUNCTION pfMonthNumber (tinyint)

AS RANGE RIGHT FOR

VALUES (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23);

GO

CREATE PARTITION SCHEME psMonthNumber

AS PARTITION pfMonthNumber

TO ([PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY] );

GO

CREATE TABLE Enterprise.TblOrders

(OrderId Int identity(1,1) not null, OrderCreationDate Datetime Not NULL Default (GetDate()),

OrderPartitionNumber AS CONVERT(TINYINT,(DATEPART(M, OrderCreationDate) + CASE WHEN DATEPART(YEAR, OrderCreationDate) % 2 = 0 THEN 0 ELSE 12 END)) PERSISTED NOT NULL

, Primary Key Clustered (OrderId, OrderPartitionNumber))

ON psMonthNumber (OrderPartitionNumber)

GO

INSERT INTO Enterprise.TblOrders

(OrderCreationDate)

OUTPUT INSERTED.OrderId, INSERTED.OrderCreationDate, INSERTED.OrderPartitionNumber

VALUES ('2011-06-25'),('2011-07-18'),('2012-06-25'),('2012-07-18')

GO

 

Obviously, for the sake of performance, your partition scheme would spread the data over several filegroups that redirect it to different storage units. In our case I did not do it, but it would be easy to set up.

One can see above that inserts from different years are assigned different partition numbers, which keeps two years' worth of data in the table. By separating the data into 24 months, you can then easily keep 12 months by purging the data sitting in the other year's partitions.

The purge can be done by "switching out" partitions, but you will more likely use a DELETE statement, which keeps the partition in place, followed by a rebuild. It takes longer and uses more horsepower on your server, but it keeps the database free of complex DBA manipulations (switch out, merge, creation of partitions).
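A minimal sketch of that DELETE-based purge for one month slice (month number 13, i.e. January of odd years in this scheme):

-- Hypothetical purge of month number 13, followed by a rebuild of the affected partition only.
DELETE FROM Enterprise.TblOrders
WHERE OrderPartitionNumber = 13;

-- With RANGE RIGHT, boundary value 13 lands in partition 14 of pfMonthNumber.
ALTER INDEX ALL ON Enterprise.TblOrders
REBUILD PARTITION = 14;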

 

But obviously you can perform those partition operations on a regular schedule instead.

14 April 2010, 10:15

Hello Guys,

 

I cannot tell you how many clients have an extremely weak security layer at the data tier, thinking they are off the hook because security at the application tier is well defined.

It is nice to have SSL/HTTPS, some firewall layers and all that jazz, but if you let your logins do whatever they like once they are on the database server, you risk losing your data, having it stolen, or, even worse, compromising the business overall.

How many clients develop their website with the application login set as db_owner of their most trusted database, and keep it this way once they go live in production, due to a lack of expertise on this topic?

Suddenly, a few months later, they realize that sneaky injection attacks or disgruntled employees have made their way into the data tier and wreaked havoc on the data. As soon as you are aware of it, you prevent the attacks by making your application tier more secure, and you fire the employees who caused the horror. But unfortunately you realize it too late and your entire website is compromised. You then take your website down for maintenance to fix the mess, because unfortunately you either do not have secured backups or, if you do, you have no idea how to handle a point-in-time restore, or you realize your backups were not actually set up correctly.

Worse! You have to reinstall the entire database server because the attacks were very damaging even at the server layer.

This nightmare can happen, and a good 10% to 20% of my clients call me to fix this kind of mess. One case was a client who had just fired a developer with unrestricted access to the data layer (including credit card information!), another was due to SQL injection. During a previous audit of their systems I had strongly warned them about security, but it always stayed at the bottom of their list; they argued that they had to develop and deploy new functionality first, and that if security was too much of a pain they would look at it afterwards!

This is frustrating :-( but this is the life of a SQL Server consultant: warn the client, then secure what can be secured without touching the client application code (backup regime, a development life cycle with DBA validation).

What is even worse is when the client needs to be PCI-DSS compliant. Then there is a huge amount of work to do.

Whatever the reason for securing the data, here are the things to make sure you do:

1. Reduce the number of sysadmin logins to a handful of users and the SQL service accounts: make sure those users are RESPONSIBLE for the sake of securing the production environment.

2. Secure the backup folder and wherever the backups are robocopied, taped, etc. This can be done at the file server layer and at the backup layer itself by setting up transparent data encryption (a topic I can develop further if you like).

3. Secure your sensitive data: some data, such as credit card numbers and passwords, has to be protected. You can use encryption at the application layer, a dedicated appliance for encrypting the data, or encryption at the data layer with the built-in encryption and symmetric/asymmetric keys. This is also something I can talk about if needed.

4. Your application logins need to have limited permissions on the databases they connect to. There are many ways to minimize the application logins' access. Here are the main ones:

4a. Restrict application access to stored procedure execution only.

4b. Avoid cross-database queries and cross-database ownership chaining. If that is not possible, keep access at the stored procedure execution level, add execution as owner, grant the owner authentication to the server, mark the database as trustworthy and set up synonyms to the other database's tables. Make sure the owner of the source database does not have sysadmin permission, and add the database owner to the database role dedicated to this access on the other database.

4c. Forbid dynamic SQL as much as you can, especially if your stored procedures require execution as owner to access other databases' tables.

4d. Avoid linked servers, not only for security, but also to keep a completely independent architecture and to avoid distributed transactions.

4e. Use Active Directory / LDAP security if possible, to set up common encryption that is compliant with Microsoft.

4f. Make sure a DBA audits the whole security policy regularly. Ideally, your DBA should be the only person allowed to deploy anything on the server and on the production databases.

4g. Use schemas and database roles to set permissions "dynamically" on a database. Granting EXECUTE at the schema level gives the application secured permissions on every stored procedure within that schema, so permissions are already in place for new procedures and maintainability is easy (see the sketch after this list).

4h. Cap data and log file sizes so that an attack on one database cannot make the other databases sharing the same resources unavailable if something goes wrong. Obviously this requires monitoring, and a DBA who can adjust database file sizes when necessary.

4i. Separate resources: back office development is usually not as tight as front office development, for several reasons such as third-party software or various developers running a lot of odd services, ETL "injections", and so on, so your front office databases can be at risk. It is better to separate instances, or even better to use different servers. Publication of back office data and extraction of front office data can be performed with a thorough architecture using replication, ETL tools and customized SQL publication development (in my experience, data publication is extremely frequent!). This secures the front office, avoids heterogeneous sources feeding it, and makes security much easier to set up.
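A minimal sketch of the schema-plus-role pattern from point 4g (the schema, role and user names are hypothetical):

-- Hypothetical setup: the application user only gets EXECUTE on one schema, through a role.
CREATE ROLE AppExecutor;
GRANT EXECUTE ON SCHEMA::App TO AppExecutor;       -- covers every current and future procedure in schema App
EXEC sp_addrolemember 'AppExecutor', 'WebAppUser'; -- WebAppUser = database user mapped to the application login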

 

 

 

 

 

 
