Tuesday, 15 January, 15:01

Hello Guys again,

Although my blog's main focus is SQL Server, I would like to share my experience in the telecommuting field: misconceptions and reality.

1. Telecommuters work more efficiently than on-site people

I believe hiring telecommuters is ultimately more efficient than hiring on-site people. Obviously, a telecommuter would come in once a day, once a week, once a month or even once a year to show his face and gather information, but with current technology, web conferences and training over the Internet have become so much better and easier that working remotely is not even a new thing anymore.

Most outsourcing to India is done via Skype, TeamViewer, WebEx, etc. The Internet is also the main advantage for those Indian teams, besides cost.

Add to that flexibility, project-based budgets and cost efficiency. The telecommuter is happy about being at home most of the time and can therefore take a pay cut, OR is fine being independent and having less paid leave, OR simply costs less in unnecessary extras such as food, travel reimbursement, a professional phone, a car, a desk, a computer, etc.

I have successfully worked with many consulting companies and direct clients this way. None of them ever felt I was not up to the task or asked for their money back.

 

2. Key to social progress

Another big aspect for the telecommuter is that TIME IS MONEY! The first thing you gain is your commute: on average 1 to 2 hours PER DAY, not counting transport costs.

But not only that: it is also more difficult to accept part-time jobs when working on-site, as the commute becomes proportionally more expensive in terms of both time and transport cost.

A telecommuter can easily ask to work part-time or three-quarter time instead of full-time. The company has an interest in this as well, since it cuts its costs.

Finally, I think it is a key to social progress. Why do we need full-time people? Why a full-time DBA? Isn't it much better to have three part-time DBAs, so that when one goes on vacation, another is there, and when one does not know the answer, another one does? Why not have talent for less cost, and several talents at once? Everybody is happy, and this is why I call it social progress!

Telecommuters usually do not care too much about career or money. They care about their well-being. They accept earning less to spend more time with their family and/or their hobbies.

I believe this is the future of social progress! I hope people agree with me, and that in the future companies will think this way!

Telecommuting does not mean outsourcing to people in India. It just means getting more talents who chose well-being over career!

I am open to any comments on that.

Best!

 

 

 

By clementhuge.over-blog.com - Posted in: General
Tuesday, 15 January, 14:11

Hello guys,

Today's topic relates to a series of trainings I gave at one of the top 10 e-commerce companies in Europe, where I worked as a database/ETL architect for two years. The idea of the series was to train junior to senior developers on how to optimize their application code for database-related operations.

One of the sessions I particularly liked was about sending arrays of values to the database in order to perform a series of DML operations.

I still have the video series, and it has no relation to anything done in this e-commerce company. The copyright is mine, so I can definitely share it with people. However, the videos are far too big to share on a blog.

I have also seen similar training sessions on the Net, but I would be happy to discuss your projects and how to implement this in ColdFusion, .NET, PHP or any other web language.

In this article I will summarize my findings and advice on this need, which is quite often overlooked in the web industry.

1. Ad-hoc versus stored procedure

As a DBA, there is absolutely no controversy here for me. Any application code should call stored procedures when reaching the database, for any DML operation or even for SELECTs.

Some DBAs will say it is because of performance. Well, this reason is shaky. It is true that stored procedures help the SQL Server engine reuse execution plans. However, performance can sometimes be an issue when dealing with searches.

Many searches like "give me the orders for specific dates and/or with specific customer names and/or specific cities" have what we call dynamic portions of (hopefully) SARGable conditions, and a stored procedure built on CASE WHEN conditions will not optimize those queries.

There has been a lot of debate here, but the truth and the solution are somewhat simple: you will eventually use parameterized queries.

Parameterized queries will give you the performance needed, increase security and avoid SQL injection, provided parameters are properly checked, either at the application layer or at the stored procedure layer.

And in that last sentence I gave my preference: the stored procedure.

Yes, indeed! Even though some queries might be very dynamic, for security reasons (and a little for performance), stored procedures are unequivocal for me.
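As a minimal sketch of this approach (table and column names are hypothetical), the search procedure builds only the filters that were actually supplied, while every value stays a parameter passed through sp_executesql:

CREATE PROCEDURE dbo.P_SearchOrders
      @OrderDateFrom datetime     = NULL,
      @OrderDateTo   datetime     = NULL,
      @CustomerName  varchar(100) = NULL,
      @City          varchar(100) = NULL
AS
SET NOCOUNT ON
DECLARE @Sql nvarchar(max)

-- Build only the filters that were supplied; the values themselves stay as parameters,
-- so plans are reused per filter combination and SQL injection is avoided.
SET @Sql = N'SELECT OrderId, OrderDate, CustomerName, City FROM dbo.Orders WHERE 1 = 1'
IF @OrderDateFrom IS NOT NULL SET @Sql = @Sql + N' AND OrderDate >= @OrderDateFrom'
IF @OrderDateTo   IS NOT NULL SET @Sql = @Sql + N' AND OrderDate <  @OrderDateTo'
IF @CustomerName  IS NOT NULL SET @Sql = @Sql + N' AND CustomerName = @CustomerName'
IF @City          IS NOT NULL SET @Sql = @Sql + N' AND City = @City'

EXEC sp_executesql @Sql,
     N'@OrderDateFrom datetime, @OrderDateTo datetime, @CustomerName varchar(100), @City varchar(100)',
     @OrderDateFrom = @OrderDateFrom, @OrderDateTo = @OrderDateTo,
     @CustomerName = @CustomerName, @City = @City
GO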

That said, if you want to update a table on several rows (for example, you want to update the whole shopping cart in one stored procedure call), you end up passing several values for the same parameter (for example, CartDetailIds) to the database.

2. Repeated stored procedure calls vs. one stored procedure call

Keeping the same example, in one click of a button you may decide to update the shopping cart (adding quantities, removing products, etc.). As I said, there are three ways to update your cart:

The first is to build your DML operation dynamically and execute it in one dynamic statement: as I stated, this is not very good for security reasons.

Another one is to call the same stored procedure n times, updating one cart detail at a time, like fetching stored procedure calls. This solution is not good either, for two reasons. The first is that you will need to deal with a transaction block at the application layer, and I would rather keep transactions as short as possible. The second is that performance degrades significantly. Do the test; I did, and I could see significant differences (I can share the tests upon request).

Finally, you can pass all the details in one stored procedure call and let the stored procedure deal with the transaction and the update operations at once. In this example, the trick is to pass some kind of array of data with CartDetailId and Quantity values, where quantity 0 means removing the line. We will discuss later how to pass such arrays, but first, conceptually, I would like to point out that you can definitely improve performance by using a MERGE statement (to deal with inserts, updates and deletes of cart details at the same time) and then the OUTPUT clause to return the results of the changes to the application (notably the inserts, since new CartDetailIds would be created). If you have no other table to update (which is probably not true, as you would probably update the Cart table for a total amount or last-update-date column), you can then rely on an implicit transaction and not even need a transaction block.
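As a minimal, self-contained sketch of that idea, where the @CartDetail table variable stands in for the real cart-detail table and @CartDetails for the array of values received by the stored procedure (as a list, XML or TVP, see section 3), the MERGE plus OUTPUT pattern could look like this; all names and sample values are hypothetical:

DECLARE @CartId int = 541000
DECLARE @CartDetail  TABLE (CartDetailId int IDENTITY(541001,1) PRIMARY KEY, CartId int, ProductId int, Quantity smallint)
DECLARE @CartDetails TABLE (CartDetailId int NULL, ProductId int NOT NULL, Quantity smallint NOT NULL)

INSERT INTO @CartDetail (CartId, ProductId, Quantity) VALUES (@CartId, 10, 1), (@CartId, 20, 2)
INSERT INTO @CartDetails VALUES (541001, 10, 3), (541002, 20, 0), (NULL, 30, 5)   -- an update, a removal and an insertion

MERGE @CartDetail AS target
USING (SELECT CartDetailId, ProductId, Quantity FROM @CartDetails) AS source
      ON target.CartDetailId = source.CartDetailId
WHEN MATCHED AND source.Quantity = 0 THEN
      DELETE                                                        -- quantity 0 removes the line
WHEN MATCHED THEN
      UPDATE SET target.Quantity = source.Quantity
WHEN NOT MATCHED BY TARGET THEN
      INSERT (CartId, ProductId, Quantity) VALUES (@CartId, source.ProductId, source.Quantity)
OUTPUT $action AS ChangeType,
       ISNULL(inserted.CartDetailId, deleted.CartDetailId) AS CartDetailId,
       inserted.Quantity;                                           -- returns the changes (new ids included) to the caller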

3. Solutions for passing arrays of data to stored procedures

There are four solutions for this. I tested them all, and I will comment on them from worst to best performance.

The worst one is calling a stored procedure several times, but this one was already discarded for the reasons stated above. On a performance scale, this solution scores 1/10.

The second one is passing arrays as lists (text with a separator). For example, in our case we could pass @CartDetailIds = '541001,541002,541004' with @CartDetailQuantities = '1,0,5'. The stored procedure then parses the text and allocates the corresponding quantity to each CartDetailId. To do so, you need to create a parsing function. This is pretty classic, and this kind of function can be written in any SQL Server version or any database engine (Oracle, MySQL, etc.), which makes the solution universal and therefore useful if you are not committed to a specific technology. You then use this function to gather the pieces and insert all the data into a single table variable. The preparation is done, and the DML operation becomes possible. In terms of performance, I would rate this solution a good 7/10. It does the job nicely and answers all the issues: security, performance, scalability.
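Here is a minimal sketch of that approach. The dbo.F_SplitList function name is hypothetical, and the parameters are declared locally only to keep the sketch self-contained; in practice they would be the stored procedure's parameters:

-- Parsing function: returns each item of a delimited list with its position,
-- so each id can be paired with the quantity at the same position.
CREATE FUNCTION dbo.F_SplitList (@List varchar(max), @Separator char(1))
RETURNS @Items TABLE (Position int IDENTITY(1,1), Item varchar(100))
AS
BEGIN
      DECLARE @Pos int
      WHILE LEN(@List) > 0
      BEGIN
            SET @Pos = CHARINDEX(@Separator, @List)
            IF @Pos = 0
            BEGIN
                  INSERT INTO @Items (Item) VALUES (@List)
                  SET @List = ''
            END
            ELSE
            BEGIN
                  INSERT INTO @Items (Item) VALUES (LEFT(@List, @Pos - 1))
                  SET @List = SUBSTRING(@List, @Pos + 1, LEN(@List))
            END
      END
      RETURN
END
GO
-- Inside the stored procedure: pair ids and quantities by position, then run the DML.
DECLARE @CartDetailIds varchar(max) = '541001,541002,541004',
        @CartDetailQuantities varchar(max) = '1,0,5'
DECLARE @CartDetails TABLE (CartDetailId int, Quantity smallint)

INSERT INTO @CartDetails (CartDetailId, Quantity)
SELECT CAST(i.Item AS int), CAST(q.Item AS smallint)
FROM dbo.F_SplitList(@CartDetailIds, ',') AS i
JOIN dbo.F_SplitList(@CartDetailQuantities, ',') AS q ON q.Position = i.Position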

The third one is passing XML. You can pass an array as an XML fragment like <CartDetail id="541001" Quantity="1"/><CartDetail id="541002" Quantity="0"/><CartDetail id="541004" Quantity="5"/>. Once it is passed to an xml parameter @CartDetails of the stored procedure, you can rebuild a table with XQuery and then do the same as above. The performance is good, and I would rate it a bit higher than the list solution. Actually, even in raw performance it beats the list solution, but not by much; I would give it 8/10. Note that XML is quite verbose. Its advantage is certainly the universality of the language, but the text is longer to pass, which might be a concern if you have to pass thousands of rows.
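A minimal sketch, assuming the stored procedure receives the fragment above in an xml parameter; the parameter is declared locally here only to keep the sketch self-contained:

DECLARE @CartDetails xml = '<CartDetail id="541001" Quantity="1"/><CartDetail id="541002" Quantity="0"/><CartDetail id="541004" Quantity="5"/>'
DECLARE @Details TABLE (CartDetailId int, Quantity smallint)

-- Shred the XML fragment with XQuery into a table variable, then run the DML as before.
INSERT INTO @Details (CartDetailId, Quantity)
SELECT c.value('@id', 'int'),
       c.value('@Quantity', 'smallint')
FROM @CartDetails.nodes('/CartDetail') AS T(c)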

The fourth solution is passing a table-valued parameter (TVP). This solution is by far the most efficient one. Again, during my tests, performance was very good and much better than the previous solutions (if you push the stress to hundreds of thousands of rows at once). Instantiating the table can be done from .NET Framework 3.5 and above. Another advantage is the integrity of the data passed into the stored procedure, as you decide the exact datatype of each column. In our example, the quantity could be a tinyint, allowing values from 0 to 255, reducing storage and eventually improving performance. Your main concern with TVPs is how to change one that is shared by several stored procedures: you cannot alter a table type while it is already in use. You might then consider versioning your TVPs, to avoid releasing everything in one big shot and inflating regression-testing time; this can be managed nicely if you know how to deal with versions.
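A minimal sketch of the TVP approach (SQL Server 2008 and above; the type, procedure and column names are hypothetical):

-- Table type shared by the application layer (passed as a DataTable from .NET 3.5+) and the stored procedures.
CREATE TYPE dbo.CartDetailTableType AS TABLE
(
      CartDetailId int     NULL,        -- NULL for lines that do not exist yet
      ProductId    int     NOT NULL,
      Quantity     tinyint NOT NULL     -- 0 means "remove the line"
)
GO
CREATE PROCEDURE dbo.P_UpdateCartDetails
      @CartId      int,
      @CartDetails dbo.CartDetailTableType READONLY   -- TVP parameters must be READONLY
AS
SET NOCOUNT ON
-- The MERGE/OUTPUT logic sketched earlier can consume @CartDetails directly as its source.
SELECT CartDetailId, ProductId, Quantity
FROM @CartDetails
GO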

4. Things that make the small difference

Now that I have talked about performance (the DBA in me), I would like to talk about architecture. I like TVPs for another reason. I am sometimes a bit too neat about architecture, and TVPs help me be even neater. Just as I like to place all my tables on a filegroup other than PRIMARY, and just as I like to define datatypes such as telephone or email and use those user-defined datatypes on my tables, I like TVPs because I can enforce integrity on all my parameters, even the array ones.

When integrity competes with performance, as foreign keys might, there can be a debate; but when we are only talking about defining a datatype or a check constraint, I am in favor of spending the time on it. Once we have spent that time, maybe an additional day or two to cover all grounds on each table, we achieve top-of-the-art storage. Data partitioning is the same kind of effort: you need to spend the time to produce the best possible structure for your database.

TVPs are the same kind of thing. Do not hesitate to spend extra time and ask extra questions of the project manager/business analyst to get the best storage and best performance from your database.

Then everything falls into place nicely: integrity, security, scalability.

Do not hesitate to ask me questions and to ask for the videos related to this training.

Have a good day!

By clementhuge.over-blog.com - Posted in: Architecture
Sunday, 30 December, 14:34

 

There are many ways to set up your cube, but there are definite best practices to implement if you want it to scale. The first point simply reminds us that a cube is an analytical tool and should not be used as a source of detailed reports. In my view, drillthrough should be avoided, as it misleads users about the purpose of a cube.

Your cube is an analytical tool

The main issue we encounter is the misleading use of the OLAP cube. Because end users see its potential and look at the data in Excel (which is very friendly for setting up reports), they sometimes believe detail-level data can be reported through MDX queries. Let's take a simple, classic example. We have a bunch of orders from an e-commerce site. One can analyse the ordered sales and quantities per day, month or year, but we should not report per order or per customer! Well, technically you can do it all, but the cube is not meant for this, and it defeats the purpose of aggregation design! A cube should be an analytical tool. These principles alone will probably solve 50% of your performance issues!

Avoid complex measures if you can

By complex measures, I mostly refer to measures that cannot be aggregated well, such as distinct count or last-child measures. Sums and counts are the measures that should be used the most, as they are easily aggregatable. I am not saying never to use the other aggregate types; I am just advising not to use them when they are not necessary. Your processing will then be optimized, because your aggregation designs will be much easier for the engine to build. That is another easy 20-30% increase in performance.

Partition your cube

Microsoft best practice advises partitioning your measure groups when they reach 200 million rows. Microsoft also advises avoiding small partitions. Both are right, and I strongly advise partitioning your measure groups for the following reasons. The first one is to ease your processing: by separating your facts, you avoid reprocessing old historical facts that will never change, while those facts still remain in your cube as part of your analysis and mining strategies. I have many clients with multi-million-row fact tables. There are many ways to optimize performance, and one of them is to aggregate the fact tables, dropping granularities that are not useful. For example, we can aggregate order sales by day and product just because we will never analyse by customer. However, this is risky, because your sales and marketing users might later require additional dimensions, such as the age of the customer at the time of the order, and your aggregate would then need to be recalculated. In other words, sure, you can optimize your cube by having fewer fact rows, but in general I lean more towards partitioning your fact tables, generally by date or by incremental ids. By doing so, your daily or hourly processing is focused on your current partitions. Even with proactive caching, it is nice to have a way to run incremental processing on a specific partition.

Set up your cube partitioning based on the life cycle of your facts

This is especially true when it comes to transactions, which are usually the main measure source of any cube. A transaction always has a life cycle, whether it is a one-day, one-month or one-year cycle. Rarely can your transactions change at any time, and if they do, your system has flaws: a transaction should follow a clear, deterministic life cycle, otherwise you might have "zombie" transactions in your system.

Let's take again the example of an order in an e-commerce system, in a regular case. Your order is placed the same day you pay, is then fulfilled within the month, and can be reimbursed/returned within the following 90 days. After 90 days, the order is closed and cannot be touched; exceptions are made only by manually asking the IT people to reopen the order status. We then gather statistics and notice, on this e-commerce site, that orders are 100% created, 90% paid, 70% fulfilled and 10% reimbursed within the 90 days, and that 0.1% change after 90 days. Of course, we could say that incremental processing is not feasible because orders can change at any time, but that would prevent the sales team from having a fair amount of analysis in almost real time.

What I would do in the case of this e-commerce site is run incremental processing every hour on the current partition, fully process the current partition daily (with each partition holding a month of data), and fully process the previous three monthly partitions daily as well. I would merge the monthly partitions, if they are small, every year for year minus 2 (or merge them into separate partitions if a year's worth of data exceeds 200 million rows). Finally, I would reprocess the whole thing once a month if it is not too big; if it is, I would fully process partitions up to one year old. How often to update dimensions depends on how frequently they change, but as we reprocess the current partition daily, I would usually update dimensions once a day, right before the partition processing.
This way, the creation of orders appears in your cube within the hour (without proactive caching). Payments usually appear within the hour as well, and if not, the next day. Fulfillment, returns/reimbursements and the order-closing status appear the next day too.
Again, there is no need to be absolutely accurate in real time, because this is an analysis tool. If people want to be accurate about a specific order status, an OLAP cube is not the tool to use; a simple OLTP query on the production database for a specific order detail is what you need. As for how many orders are still open, one can use an asynchronous query (which would be almost similar to our cube) or a synchronous query on the production system (if that system is well indexed / status-partitioned / sharded).

Set up your cube processing schedule

You can then automate your processing by setting up a small table that gathers all the necessary information about your measure-group partitions (cube name, measure group name, partition id, minimum and maximum values for your incremental processing, query binding source), as sketched below. Once you have done that, you can use SSIS to handle all your processing needs.
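A minimal sketch of such a metadata table; the column names are assumptions based on the fields listed above:

CREATE TABLE dbo.CubePartitionProcessing
(
      CubeName           varchar(128)  NOT NULL,
      MeasureGroupName   varchar(128)  NOT NULL,
      PartitionId        varchar(128)  NOT NULL,
      MinValue           bigint        NULL,   -- lower bound already processed (incremental processing)
      MaxValue           bigint        NULL,   -- upper bound to process on the next run
      QueryBindingSource varchar(max)  NULL,   -- source query used for the partition query binding
      LastProcessedDate  datetime      NULL,
      CONSTRAINT PK_CubePartitionProcessing PRIMARY KEY CLUSTERED (CubeName, MeasureGroupName, PartitionId)
)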


By clementhuge.over-blog.com - Posted in: Administration
Wednesday, 21 November, 20:03

1. Understanding the concept

One of the main challenges in a distributed system is how to handle transactions across several servers and technologies.

What if your application connects, for the same transaction, to several web services or database servers, and any failure requires a rollback of everything that has been done?

How do you also handle this during an ETL workflow spanning several servers in several data centers?

I can give you many examples where transactions are not easy to handle, noting that distributed transactions can also be dangerous and block the entire system if a rollback cannot occur.

 

2. Do not mix technologies

One of the nightmares I have encountered in my consulting life is the following example. Let's assume that a stored procedure, run from a SQL Server job, pushes a large amount of data through a linked server, then updates several tables in the source database, and finally bcps a bunch of data to flat files and sends an email with an attached error file (errors are bcp'ed to flat files too).

Throughout the process, a transaction block is designed with a TRY/CATCH block to roll it all back.

In some cases the bcp portion (run with xp_cmdshell) gets blocked by an OS that does not respond, for various reasons (it could be a memory leak). The production supervisor decides to stop the job, triggering a rollback. Unfortunately, he waits hours, the rollback does not complete, and the main table where data was pushed on the destination server stays frozen by a lot of locks. The supervisor calls the on-call DBA, who eventually fixes the issue by killing the OS process that runs the bcp and was hanging on the SQL Server box.

In this case, the main issue is the distribution of the transaction across several pieces of software: the rollback is initiated by the SQL Server engine (by stopping the job) but has no authority to roll back the DOS command (the bcp) initiated as a process at the OS level.

What does this mean? Do not mix technologies! If everything works well, the transaction is committed and nothing fails; but the DOS command is actually already committed on its own, because the transaction scope within SQL Server has no authority over it.

In this very case, it is much better to consider the tasks as independent and to run each step only if the previous step succeeded. It is better to implement this within an ETL/workflow system: at each step you can constrain success and failure and, based on a log system (sketched below), decide whether to mark a rollback on the specific technology of the step that failed.
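A minimal sketch of the kind of log table such a workflow can rely on (table and column names are hypothetical):

CREATE TABLE dbo.EtlStepLog
(
      LogId        int IDENTITY(1,1) NOT NULL,
      PackageName  varchar(128) NOT NULL,
      StepName     varchar(128) NOT NULL,
      Status       varchar(20)  NOT NULL,   -- 'Started', 'Succeeded', 'Failed'
      ErrorMessage varchar(max) NULL,
      LogDate      datetime     NOT NULL CONSTRAINT DF_EtlStepLog_LogDate DEFAULT (getdate()),
      CONSTRAINT PK_EtlStepLog PRIMARY KEY CLUSTERED (LogId)
)
-- Each task writes 'Started' on entry and 'Succeeded' or 'Failed' on exit; the contingency plan
-- (rollback, restart, alert) for the failing technology is then driven by the 'Failed' rows.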

3. Keep it simple

Obviously, a distributed environment requires a certain level of complexity when dealing with transactions, but if you can make it simple, make it simple!

Try to identify what is really transactional and what is not, for two reasons: one is performance, but the other is to keep your transactions simple.

A good architect will always look for new technologies and at various ways to set up transactions in a distributed environment, but will mainly use common sense to make the system viable, especially when one of the components fails.

4. Keep it scalable

The "keep it simple" paragraph is probably the hardest one to follow, because by nature a transaction in a distributed environment is complex.

"Keep it scalable" is easier to explain and is plain common sense. Most developers will make it work; most architects will make it work with the right environment setup and the right software layers.

For example, say the task is to purge a multi-million-row table on a SQL Server platform. As the production environment has a lot of concurrency, a good architect will make sure to keep transactions short, by implementing/using indexes and by keeping the DML operations as small as possible (usually in packets), as in the sketch below.
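A minimal sketch of such a purge by packets; the table name, column name and retention period are hypothetical, and a usable index on insertDate is assumed:

DECLARE @BatchSize   int = 5000,
        @RowsDeleted int = 1

WHILE @RowsDeleted > 0
BEGIN
      -- Each packet is its own short transaction, so concurrent activity is never blocked for long.
      DELETE TOP (@BatchSize)
      FROM dbo.OrderLog
      WHERE insertDate < DATEADD(DAY, -90, GETDATE())

      SET @RowsDeleted = @@ROWCOUNT     -- stop when nothing is left to purge
END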

5. Keep it manageable

Another thing to focus on is making sure that a complex system has very clear documentation and uses technology that is easy to manage. Another classic mistake of many companies is to set up technology that they cannot manage.

Another one is to set up a system that is hard to maintain when there is a new feature to implement, or hard to restart when a failure has blocked the system for a while. Both need to be considered when implementing such a system.

6. ETL system

What do I like about ETL? There is one thing I really like: the way it forces you to follow a strict workflow. Each task is independent of the transaction scope, and therefore, if a failure occurs, you need to set up the contingency plan within your ETL workflow. In SSIS, event handlers let you set up generic tasks shared by many different tasks, which is good.

I usually group the SQL tasks that I consider one transaction block within the same stored procedure call. This allows me to roll back several DML operations at once, as in the sketch below.
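A minimal sketch of that pattern; the procedure and table names are hypothetical:

CREATE PROCEDURE dbo.P_LoadOrderFacts
AS
SET NOCOUNT ON
BEGIN TRY
      BEGIN TRANSACTION

      -- Several DML operations that must succeed or fail as one unit
      INSERT INTO dbo.OrderFact (OrderId, OrderDate, Amount)
      SELECT OrderId, OrderDate, Amount
      FROM Staging.OrderFact

      UPDATE dbo.LoadControl
      SET LastLoadDate = GETDATE()
      WHERE TableName = 'OrderFact'

      COMMIT TRANSACTION
END TRY
BEGIN CATCH
      IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION                    -- the whole block is undone at once
      DECLARE @Error varchar(512)
      SET @Error = ERROR_MESSAGE()
      RAISERROR(@Error, 16, 1)                      -- lets the SSIS task fail and its event handler react
END CATCH
GO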

However, when changing technology (BCP, FTP, email, zip, etc.), those steps are handled outside that transaction scope, as independent tasks.

My ETL flow then needs contingency plans, where needed, for any error on any task.

Most of the time, the contingency plan is just to log the error; the system then alerts the production team and guides them to restart the system or fix the error, so that the transaction that fell victim to the failure gets back into the ETL process.

For specific tasks, you can also set up a rollback plan.

 

7. Development cycle

The classic mistake IT services make is believing that the development phase is the longest part of the project. Here are, for me, the main phases of a project:

1. Architectural study: check that the project is feasible and fits well within the current information system

2. Architectural implementation: define how to set up the servers and code layers to accommodate the needs

3. Infrastructure installation: install the necessary network and servers

4. Development: develop the different code layers (this task can be done during step 3) based on architectural implementation

5. Q&A phase: tests and validation of the code layers and load/stress tests

6. Production deployment: deploy on production

7. Production supervision: supervise the deployment for a period.

Based on my experience, the step that is too often ignored is step 2, the architectural implementation.

This reminds me of a project where I was in charge of both the architecture and the development (oh well, it happens). Phase 1 was fairly quick, as it involved the system already in place. Phase 2 could have been very fast as well, but it took me about two weeks to really grasp all the functional concepts and make them technically feasible. The project manager answered most of the functional needs, but technical limitations left the specs largely under-documented. I wrote the technical documentation, but it required a lot of changes, or at least a lot of assumptions about the functional specs, which were not precise enough, especially about error cases (a classic mistake as well).

The project manager gave me two months to deploy the system (a master data management system) to production and was worried that I had spent two weeks on step 2.

Well, thanks to it, the development phase took one week (he thought it would take three). The QA phase took one week (he thought two), and the production phase took one day to deploy and one week to supervise.

Even though the project manager had estimated step 2 at one week instead of two, he was pleasantly surprised that the project overall took five weeks instead of the eight weeks previously planned.

Since the deployment, there has been no bug fixing to implement, only new features.

I wanted to spend time on this story because I think it is why I wanted to be an architect. I believe development starts way too quickly, and the main issues come from bad architecture; this is especially true in a distributed environment with distributed transactions!

 

 

By clementhuge.over-blog.com - Posted in: Architecture
Tuesday, 9 October, 09:13

On SQL Server Standard

On SQL Server Standard edition, this is actually easier in a way. You do not manage a partition scheme or function, at least not in the way Enterprise edition does: you set up tables to act as your partitions. If you want to use several storage units, you can then assign different filegroups to the different tables (in our case I did not do it, but it would be easy to set up). As in the previous example, we keep only three months' worth of log data, with a rolling scheme of 4 monthly partitions.

First, we create the 4 partitions (4 tables):

USE [DataPartitions]
GO
CREATE SCHEMA Standard AUTHORIZATION dbo
GO
CREATE TABLE Standard.[LogTrace_Partition1](
      [id] [int] IDENTITY(1,1) NOT NULL,
      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition1] DEFAULT (getdate()),
      [sessionID] [varchar](36) NULL,
      [userAgent] [varchar](500) NULL,
      [eventID] [int] NULL,
      [requestID] [int] NULL,
      [frontEndSessionID] [varchar](36) NULL,
      [trace] [varchar](5000) NULL,
      [serverName] [varchar](50) NULL,
      [userIP] [varchar](15) NULL,
      [note] [varchar](50) NULL,
PRIMARY KEY CLUSTERED
(
      [id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE Standard.[LogTrace_Partition2](
      [id] [int] IDENTITY(1,1) NOT NULL,
      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition2] DEFAULT (getdate()),
      [sessionID] [varchar](36) NULL,
      [userAgent] [varchar](500) NULL,
      [eventID] [int] NULL,
      [requestID] [int] NULL,
      [frontEndSessionID] [varchar](36) NULL,
      [trace] [varchar](5000) NULL,
      [serverName] [varchar](50) NULL,
      [userIP] [varchar](15) NULL,
      [note] [varchar](50) NULL,
PRIMARY KEY CLUSTERED
(
      [id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE Standard.[LogTrace_Partition3](
      [id] [int] IDENTITY(1,1) NOT NULL,
      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition3] DEFAULT (getdate()),
      [sessionID] [varchar](36) NULL,
      [userAgent] [varchar](500) NULL,
      [eventID] [int] NULL,
      [requestID] [int] NULL,
      [frontEndSessionID] [varchar](36) NULL,
      [trace] [varchar](5000) NULL,
      [serverName] [varchar](50) NULL,
      [userIP] [varchar](15) NULL,
      [note] [varchar](50) NULL,
PRIMARY KEY CLUSTERED
(
      [id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
CREATE TABLE Standard.[LogTrace_Partition4](
      [id] [int] IDENTITY(1,1) NOT NULL,
      [insertDate] [datetime] NULL CONSTRAINT [DF_LogTrace_InsertDate_Partition4] DEFAULT (getdate()),
      [sessionID] [varchar](36) NULL,
      [userAgent] [varchar](500) NULL,
      [eventID] [int] NULL,
      [requestID] [int] NULL,
      [frontEndSessionID] [varchar](36) NULL,
      [trace] [varchar](5000) NULL,
      [serverName] [varchar](50) NULL,
      [userIP] [varchar](15) NULL,
      [note] [varchar](50) NULL,
PRIMARY KEY CLUSTERED
(
      [id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

 

Second, we create the partition scheme (1 table):

 

CREATE TABLE Standard.LogTracePartition
(
      PartitionId tinyint,
      TableName varchar(256),
      IsCurrentPartition bit,
      NextPartitionId tinyint,
      PurgePartitionId tinyint NULL,
      CONSTRAINT PK_LogTracePartition PRIMARY KEY CLUSTERED (PartitionId)
)
GO
INSERT INTO Standard.LogTracePartition (PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)
VALUES (1, 'Standard.LogTrace_Partition1', 1, 2, 2)
GO
INSERT INTO Standard.LogTracePartition (PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)
VALUES (2, 'Standard.LogTrace_Partition2', 0, 3, 3)
GO
INSERT INTO Standard.LogTracePartition (PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)
VALUES (3, 'Standard.LogTrace_Partition3', 0, 4, 4)
GO
INSERT INTO Standard.LogTracePartition (PartitionId, TableName, IsCurrentPartition, NextPartitionId, PurgePartitionId)
VALUES (4, 'Standard.LogTrace_Partition4', 0, 1, 1)
GO

 

Third, we create the stored procedure that switches the current partition (using a synonym) and truncates the data we no longer need in production:

 

/*
Author: Chuge
Creation Date: 2012-10-08
Reason: Partition scheme on Standard edition to switch the current partition
Sample: EXEC Standard.P_SwitchLogTracePartition @SynonymName = 'Standard.LogTraceS', @Debug = 1
        select object_id('Standard.LogTraceS')
        select * from Standard.LogTraceS
*/
CREATE PROCEDURE Standard.P_SwitchLogTracePartition
      @SynonymName VARCHAR(256) = 'Standard.LogTraceS',
      @Debug bit = 0
AS
SET NOCOUNT ON
DECLARE @TableNameNewPartition varchar(256), @TableNamePurgePartition varchar(256), @Stmt VARCHAR(1024), @NextPartitionId tinyint, @PurgePartitionId tinyint

BEGIN TRY

SELECT TOP 1 @NextPartitionId = NextPartitionId, @PurgePartitionId = PurgePartitionId
FROM Standard.LogTracePartition
WHERE IsCurrentPartition = 1

SELECT @TableNameNewPartition = TableName
FROM Standard.LogTracePartition
WHERE PartitionId = @NextPartitionId

SELECT @TableNamePurgePartition = TableName
FROM Standard.LogTracePartition
WHERE PartitionId = @PurgePartitionId

IF OBJECT_ID(@TableNameNewPartition) IS NULL
BEGIN
      SELECT 1
      RETURN
END
IF OBJECT_ID(@TableNamePurgePartition) IS NULL
BEGIN
      SELECT 2
      RETURN
END

IF OBJECT_ID(@SynonymName) IS NOT NULL
BEGIN
      SET @Stmt = 'DROP SYNONYM ' + @SynonymName
      IF @Debug = 1
            PRINT @Stmt
      ELSE
            EXEC ( @Stmt )
END

SET @Stmt = 'TRUNCATE TABLE ' + @TableNamePurgePartition
IF @Debug = 1
      PRINT @Stmt
ELSE
      EXEC ( @Stmt )

SET @Stmt = 'CREATE SYNONYM ' + @SynonymName + ' FOR ' + @TableNameNewPartition
IF @Debug = 1
      PRINT @Stmt
ELSE
      EXEC ( @Stmt )

IF @Debug = 1
      RETURN

UPDATE P
SET IsCurrentPartition = CASE WHEN PartitionId = @NextPartitionId THEN 1 ELSE 0 END
FROM Standard.LogTracePartition P

END TRY
BEGIN CATCH
      DECLARE @Error VARCHAR(512)
      SET @Error = ERROR_MESSAGE()
      RAISERROR(@Error, 16, 1)
      RETURN @@ERROR
END CATCH
SET NOCOUNT OFF
GO

 

 

Fourth, we create the scheduled job that switches the partition at midnight on the 1st of each month:

 

 

USE [msdb]
GO
/****** Object:  Job [WebApper - Data partition Switch out]    Script Date: 02/01/2012 12:08:05 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]]    Script Date: 02/01/2012 12:08:05 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'WebApper - Data partition Switch out',
            @enabled=1,
            @notify_level_eventlog=0,
            @notify_level_email=0,
            @notify_level_netsend=0,
            @notify_level_page=0,
            @delete_level=0,
            @description=N'No description available.',
            @category_name=N'[Uncategorized (Local)]',
            @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [EXEC Standard.P_SwitchLogTracePartition @SynonymName = 'Standard.LogTrace', @Debug = 0]    Script Date: 02/01/2012 12:08:05 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'EXEC Standard.P_SwitchLogTracePartition @SynonymName = ''Standard.LogTrace'', @Debug = 0',
            @step_id=1,
            @cmdexec_success_code=0,
            @on_success_action=1,
            @on_success_step_id=0,
            @on_fail_action=2,
            @on_fail_step_id=0,
            @retry_attempts=0,
            @retry_interval=0,
            @os_run_priority=0, @subsystem=N'TSQL',
            @command=N'EXEC Standard.P_SwitchLogTracePartition @SynonymName = ''Standard.LogTrace'', @Debug = 0',
            @database_name=N'DataPartitions',
            @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Webapper - Monthly',
            @enabled=1,
            @freq_type=16,
            @freq_interval=1,
            @freq_subday_type=1,
            @freq_subday_interval=0,
            @freq_relative_interval=0,
            @freq_recurrence_factor=1,
            @active_start_date=20110303,
            @active_end_date=99991231,
            @active_start_time=0,
            @active_end_time=235959
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

 

On SQL Server Express

The Express edition has even more limitations, such as no SQL Server Agent (you need a different scheduler) and a capped database size (4 GB per database up to SQL Server 2008, 10 GB from 2008 R2 onward).
Your partitioning would then be managed by an application (scheduled services and web services, probably).
The partitions would be identical to the Standard edition ones, except that you also need to specify the context of the shard.
In the order example, one could decide to set up 24 databases, one for each monthly partition, assuming that a month's worth of data and indexes stays under the size limit. SQL Server Express data partitioning works best with a service-oriented architecture, where all applications go through intelligent web methods that select the proper database to read or update the data, as sketched below.
Obviously, you can also use the system shown above for the Standard edition within each of those databases. You just need to keep the size limitation in mind.
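A minimal sketch of such a routing layer; the database, table and column names are hypothetical:

-- Routing table kept in a small central database: one row per monthly shard.
CREATE TABLE dbo.ShardMap
(
      YearMonth    char(6) NOT NULL,    -- e.g. '201210'
      DatabaseName sysname NOT NULL,    -- e.g. 'Orders_201210'
      CONSTRAINT PK_ShardMap PRIMARY KEY CLUSTERED (YearMonth)
)
GO
-- The data-access layer (web method) resolves the shard first, then queries it.
DECLARE @OrderDate datetime = '20121009', @DatabaseName sysname, @Sql nvarchar(max)

SELECT @DatabaseName = DatabaseName
FROM dbo.ShardMap
WHERE YearMonth = CONVERT(char(6), @OrderDate, 112)

SET @Sql = N'SELECT OrderId, OrderDate FROM ' + QUOTENAME(@DatabaseName)
         + N'.dbo.Orders WHERE OrderDate >= @OrderDate'
EXEC sp_executesql @Sql, N'@OrderDate datetime', @OrderDate = @OrderDate
GO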

 

By clementhuge.over-blog.com - Posted in: Administration
