Mercredi 25 août 2010 3 25 /08 /Août /2010 10:30

Hello guys,

 

Today we will discuss the different high availability (HA) strategy you have on SQL server.

 

Basically, it all depends on your needs, your SLA towards your clients and the level of satisfactory you would like to have to your client.

Also, it depends on your wallet. Some HA strategies are very expensive, some are very cheap.

I am going to rate all solution from 1 to 10, 10 being the best and 1 is the worse for the criteria.

 

Here are the question you should consider:

 

1. How much money would I lose if my system would not work for 24 hours?

 

It is pretty reasonnable to think that you would need one full day to rebuild your database server, giving a robust backup regime and a good database administrator. Contrarily to a web server, database server can take more time to rebuild. If your primary server crashed and you do not have an HA strategy in place, you would have to reinstall Sql server on another machine, restore all databases, rebuild all server objects that are not within msdb or master and/or remap all your datasource or swith DNS lookup towards the new server.

If 1 day downtime does not make your company suffer, you do not need an HA strategy! It would be too expensive to maintain, dardware and software wise and it is not necessary for your company. Obviously as a DBA, it is hard for me to say it but it makes sense.

The How much money you can lose should also indicate how much budget you should dedicate on your Disaster Revovery Plan and High Availability solution, if any.

 

2. What is your SLA or satisfactory level?

 

If 1 day is too much, then it all depends on your degree of HA desired. This can be real-time and low maintenance, real-time and high maintenance, not real-time and low maintenance, etc...

For example, log shipping is less ressource intensive than mirroring but does not supply a real-time solution. For many companies this would be fine to have the risk to lose up to 15 mn. of data and recover immediately a crash. In the mean time, log shipping as well as mirroring (until the next release of Microsoft SQL server 2010, according to my internal contacts at Microsoft) are not providing HA for your server objects. This means you would need to care about SQL server jobs, DTSX packages, credentials, DBMail, Logins, Service brokers, separately. LS and Mirroring would only cover the databases.

Another solution is clustering and give you real-time coverage of your databases but also your server objects. As your storeage is shared on this solution, you need to make sure your storeage area does not become a single point of failure. For performance, you would also need to make all possible for your SAN to be setup for optimized database files' storeage.

 

Here is the rate for all the solution for this particular criteria: minimizing downtime and data loss

Sql server cluster: 9

(not much chance to lose significant downtime and data)

Mirroring: 8

(not much chance to lose significant downtime (maybe logins not deployed correctly) and data but maybe some server objects)

Log Shipping: 6

(chance to lose some downtime as this is a manual failover, also logins not deployed correctly on secondary server and some other server objects)

Replication: 6

(chance to lose some downtime as this is a manual failover, also the publication need to be broken to make your data available and proper installation needs to be done (to avoid identities reseed, not for replication foreign key and triggers activation, etc...)

Sharding: 5

(with proper architecture, sharding can also be good and not losing significant downtime but it is hard to find the right architecture. Sharding is almost th eopposite of one centralized database)

 

3. How much maintenance is required?

 

Some solution require very reccurent care. Some others not. Upgrading/patching can also be far more difficult for one solution compare to another one.

 

Here is the rate for all the solution for this particular criteria: high maintenance

Sql server cluster: 7

(once installed, maintenance is minimal but still require senior level management)

Mirroring: 7

(once installed, maintenance is low but still require specific maintenance)

Log Shipping: 8

(once installed, maintenance is low but still require proper supervision)

Replication: 7

(once installed, maintenance is low, especially if properly installed (separate distributor from publisher) but still require proper supervision)

Sharding: 5

(versioning and deployment is much more challenging, also merging shards or spliting shards while activity is on, more backups to do, more work on datawarehousing...: this is probably the most important drawback)

 

4. How much human ressources (DBA, system engineers) you like to dedicate

 

At the install, SQL server cluster is intensive in human ressource, it requires network engineer to set up all the proper ips, private Vlan. It requires system engineers to set up the shared disk ressources and set up the cluster groups. It requires DBA to install SQL server services in cluster (but SSIS, installed as active/active). It is much easier though to manage SQL server cluster on a daily maintenance as all server objects are maintained in cluster, even the server ressource allocation such as memory, CPU allocation or degree of parallelism. With a virtual IP, application just connect onto one virtual IP or Server Name. When in need of failover, the management is fairly easy as you just move the cluster group that brings in less than 30 seconds all the ressources onto the secondary server, becoming the active one. Your SPOF at the storeage level results in controller cards that fail or disk that fail. Usually this is where your cost will go onto: securing your storeage tier.

All of this require senior level engineers even though on daily basis, there is nothing to do.

 

At the install, mirroring is not too difficult to install. Network engineer would ensure DNS suffix and proper private network and firewall settings and DBA would set up all the security correctly as well as synchronising the databases. Once all done, daily maintenance is not too difficult but activity needs to be supervize as mirroring can suffer in performance during large operations such as massive reindexing for example (suspend mirroring during reindexing). Also server objects have to be created on both side with server specificitt=y (keep login SID, make sure to use localhost as server name for the server on DTSX, etc...). Automatic failover is ensure with a witness server. Manual failover is also fairly easy to deal with. Application datasource needs to handle a failover parner to switch to the partner in cas of a failover.

Log shipping might be the easiest solution and might be the best to implement for most of the companies. It just requires proper networking between the two servers. DBA will set up very easily a log shippment between the two servers. Proper monitoring.

 

Here is the rate for all the solution for this particular criteria: human ressources

Sql server cluster: 9

(once installed, DBA only, mid-level)

Mirroring: 9

(once installed, DBA only, mid-level)

Log Shipping: 9

(once installed, DBA only, mid-level)

Replication: 6

(once installed, DBA only, senior level)

Sharding: 6

(once installed, DBA only, senior level)

 

5. How much performance can you accept to lose?

 

This is an obvious equation. Adding redundancy usually adds up in ressource usage. For example, high safety mirroring does not give as much performance as asynchroneous mirroring. The first one gives real-time mirroring of all your database-specific operations but require a two-commit transaction, whereas asynchroneous mirroring does not provide real-time but does not influence much performance. Asynchroneous mirroring is still a good deal because it is quasi-real-time.

SQL server clustering does provide high performance as it does not require your sql code to be played at two database servers. Redundancy is offered at the storeage level. A software, cluster administration, enables you to automatically failover.

That is probably the best advantage of sharding and grid solution. The ability to lose performance is almost none. You will probably gain performance by sharding or setting up a grid. Writing on grid might be a bit impacted but with asynchroneous transaction on the other grid node, you will not have too much impact. Concerning the sharding, the write should be done only on one shard and limit in terms of ressources the load to the dedicated shard. Scalability is your key in these two solutions.

Log shipping is also a good solution as backup and transfer should not hinder much your performance on the principal server. You can also sue your secondary servers as read-only databases for your near-real-time  datasource revenues. Mirroring can give a small advantage on entreprise version with the snapshot of your mirrored databases but is limited to one passive node exposure.

SQL server clustering does not give you more performance but is not impacted either. So this is a pretty good solution overall.

Replication can also help exposing data onto a different server to avoid ressource usage of your reporting tools and cubes onto the primary sources. Replication can severely hinder, as well as mirroring performance when logs are not getting through due to a network failure, security disruption (replication and mirroring), or human errors (especially on replication, it can quickly happen). Replication can for example make your transaction log grow as the transactions are not marked bein processed by the distributor. Mirroring can also be abottleneck if your log does not get through, your mirroring is suspended for maintenance (reindexing for example).

 

Here is the rate for all the solution for this particular criteria: High performance

Sql server cluster: 7

(once installed, your ressources are not very impacted by the cluster. However it does not help.)

Mirroring: 5

(once installed, your ressources are somewhat impacted but you might use your mirrored databases with snapshots on entreprise to expose your data to reporting)

Log Shipping: 8

(once installed, DBA only, your ressources are not impacted by much and is often using ressources your disaster recovery plan would use anyway (I am thinking of your log backups and transfer). Your secondary server can expose your database with a standby version)

Replication: 6

(once installed, replication is ressource intensive unless you dedicate a distribution server, your data can be exposed for reporting)

Sharding: 9

(once installed, the solution can load-balance your reads and also your writes as you can access several shards)

 

Conclusion:

Depending on where your company is located on the criteria above, the best solution for your company can be one of the solution above or even another (like third party replication even at the subsystem level. I have seen success in all high availability strategy.

I think a lot of time log shipping is not considered quickly but one thing I like a lot with log shipping is the fact that your disastry recovery plan usually handles as well log backups. Log shipping does not involve a lot more work and is very inexpensive.

Also, HA can also be done geographically. Based on network speed between the two datacenters, all the solution can be done but that comes with a definite price. Log shipping stays the cheapest solution. Mirroring is not far behind. Sharding is not that expensive either. Geoclustering is expensive definitely as you need to implement serious sussystem redundancy (like recoverypoint from EMC).

 

I hope this article can help. Do not hesite to comment.

 

 

Par clementhuge.over-blog.com - Publié dans : Architecture
Write a comment - See the 0 comments
Mercredi 14 avril 2010 3 14 /04 /Avr /2010 12:19

Hello Guys,

How are you doing? 

The topic of today is how to setup mirroring by minimizing downtime. Well it is not that difficult but it requires some tips to achieve this.

First, we need to make sure we have all the prerequisites set up.

0. secondary server identical if possible

Set the secondary server identical (same edition, same version, same patches, same drive letters, etc... This is the true high availability you need. Otherwise the high availability setup will eventually fail due to the lack of ressources on the secondary server. But this is the client's call! and the client's wallet.

1. Get all the logins (with their SID) copied on the secondary server. This is something that is usually forgot by clients who want to do the thing alone. Once they failover, they realize that their logins do not connect. Ouch!

This link will help you transfer the logins easily: http://support.microsoft.com/kb/246133/

2. Get all the other server objects you feel usefull for the immediate failover:

like setting SQL server jobs and some DTSX packages that need to be turned on quickly after the failover either by an automatic job that would check whether the database is online or by activating by hand after the failover.

3. The database you wish to mirror needs to be in full recovery mode.

4. Set up a test database and do the mirroring for it, whether you will use active directory or certificates/master key.

This is obviously an important step. You need to make sure that the network setup is working and you need to perform manual failover on this test database to make sure all is in good shape. Ideally, try to set up the witness on it and let it go for a day or so to check networkk disruption and tune the timeout accordingly. Please comment if you need more info. If you have an application on .Net, Coldfusion, java etc... set up the connection string with a failoverpartner, do a stupid query on it for a page and look at the page while performing manual failover to see if the failover partner is working at the application tier.

Once all the prerequisites are set up and basic mirroring tests are done. 

 

Second,

perform a full backup of your primary database (no need of course to stop services) and restore it on the secondary server. Obviously a shared path need to be available.

Then perform a log backup by scripting it. Perform a copy by scripting it. Restore the log backup. Ideally do it on the same batch.

To do so there are many ways:

1. use Sqlcmd to access remotely the server.

2. use xp_cmdshell (need to activate it with sp_configure) to be able to transfer file

If you do not succeed with SqlCmd, you can always set up a linked server with the credentials you like to perform the log restore.

Concerning the copy, if you have troubling getting the copy working because of security, you can set up a proxy account and run the cmd via a sql server job in your script (sp_startjob). If you need more details, let me know.

Once you have perform the log backup, the copy and the restore once via the same batch, you actually minimize the time you will need to stop the activity on the principal server. Yes, indeed, you need absolute synchronization.

Within your batch, add your script concerning the mirroring set up.

prepare script: the "alter database set partner = 'TCP://principal.domain.com:Port' " on the secondary server via linked server or sqlcmd. You can test it on your test database.

Then script the same for the principal server:

"alter database set partner = 'TCP://secondary.domain.com:Port' "

"alter database set witness= 'TCP://witness.domain.com:Port' "

 

Third, the time to synchronize

Choose the right time, like not much activity to do your mirroring.

You can actually try to run it directly without even stopping the activity. Technically you can still read. There will be no effect in reading the principal server by your application logins. It can works. I had databases for client that I mirroried without absolutely any downtime.

If activity still persists, you will need to kill them! and apply the "Alter database set single_user with rollback immediate" script. Obviously, you cannot forget to activate multi_user back after your mirroring is set up.

 

If you follow specific testing protocols and run thorough testings. There is absolutely no way you can miss or fail your mirroring.

I would be happy to assist you on your mirroring needs if needed. I have performed hundreds of them already on Sql server 2005 and 2008.

Any comments are welcome!

Par clementhuge.over-blog.com - Publié dans : Architecture
Write a comment - See the 0 comments
Mercredi 14 avril 2010 3 14 /04 /Avr /2010 10:15

Hello Guys,

 

I cannot tell you how many clients have extremely bad security layer at the data-tier, thinking they will be off the hook because their security at the application tier is well defined.

This is nice to have ssl/https protocol layers and some firewall layers and all that jazz but if you let your logins do whatever they feel like once they are on the database server, you are in risk of losing your data, being stolen your data or even worse compromise the business overall.

How many clients develop their website with setting their application login as db_owner of their most trusted database and keep it this way once they go live on production due to the lack of expertise of this topic.

suddenly a few months later, they realize that sneaky injection attacks or hungry employees make their way onto the data tier and set up mayhem in the data. as soon as you are aware of it, you prevent the attacks making your application tier more secure and you fire the employees that provoked the horror. But unfortunately you realize it too late and your entire website is compromise. You then put your website down due to maintenance to fix all the mess as unfortunately you either do not have secured backups or if you have, you have no idea how to deal with point-in-time restore or you realize that your backups were not actually set up correctly.

Worse! you have to reinstall the entire database server because the attacks were very damaging even at the server layer.

This nightmare can happen and a good 10% to 20% of my clients call me to fix this kind of mess. One was because they just fired a developer that had all-in access to the data layer (with credit card information !) and another one due to sql injection. during a previous audit of their systems, I warned them a big deal about security but it was always at the bottom of their list, advocating they had to develop and deploy new functionalities first and if security was too much of a pain, they will look at that afterwards!

This is frustrating :-( but this is the life of a SQL server consultant. Warn the client, then secure what can be secure without dealing with the client application code (backup regime, having a development lifecycle with DBA validation).

What even worse is when the client needs to be PCI-DSS compliant. Then there are a huge task to do.

No matter what the goal of securing the data, here are the things to make sure to do:

1. reduce the amount of sysadmin people to users and sql service accounts: make sure the users are RESPONSIBLE for the sake of securing the production environment

2. Secure the backup folder and where the backups are robocopied, taped, etc.. Securization can be done at the file server layer and at the backup layer itself by setting up transparent database encryption (if this is something that you like me to develop I can)

3. Secure your sensible data: some data have to be secured like credit card data, password, etc... You can either use an application layer encryption, a dedicated box for encrypting the data or data encryption at the data layer with build-in encryption and symetric/asymetric keys. This is something I can also talk about if needed.

4. Your application logins need to have limited permissions to the databases it connects. There are many ways to minimize the application logins accesses. Here are the main ways:

4a. Minimize the access to stored procedure execution

4b. avoid cross-database query and avoid cross-database chaining: if not possible, keep the stored procedure execution level, add the execution by owner, grant the owner authentication to server, make the database trustworthy and set up synonyms to the other database tables. Make sure the owner of the source database does not have sysadmin permission and add the database owner to the databaase role dedicated to this security on the other database. 

4c. Forbid dynamic sql as much as you can especially if your stored procedure require execution as owner to access other database tables

4d. Avoid linked servers, not only for security, but for having a complete independant architecture and avoid distributed transactions.

4e. Use Active Directory / LDAP security if possible to set up common encryption, that is compliant to Microsoft.

4f. Make sure a DBA will audit regularly all security policy. Best would be that your DBA is the only person allowed to deploy anything on the server and on the production database.

4g. Use of schema and database role: to set up "dynamically" the permissions onto a database. Granting execute on the same schema stored procedures will allow your developers to have secured permissions to any stored procedures within the same schema and therefore permissions will be already set and maintainibility will be easy.

4h. Secure data and log file size so that any attacks will not make the other databases that share the ressources unavailable if something goes wrong with this databases. Obviously this require supervision setup and a DBA that can adjust sizes of database files if necessary.

4i. Separate ressources: usually the back office software development is not as tight as front office development for several reasons like the implementation of a third-party software or different developpers running a lot of weird services, ETL "injections", so on and so forth. Your front office databases can be at risk. It is better to separate instances and at best having different server even. Publication of back office data and extraction of front office data can be perform with thourough atchitecture setup using replication, ETL tools and custo;ized publication SQL development (Along my experience, publication of data is extremely frequent!). This allows to secure and avoid the heterogenenous sources on the front office and then security os easier to setup.

 

 

 

 

 

 

Par clementhuge.over-blog.com - Publié dans : Administration
Write a comment - See the 0 comments
Samedi 10 avril 2010 6 10 /04 /Avr /2010 15:50

Hello Guys,

 

You might wonder what is new and whether this is that fantastic?

Well as usual, I cannot say that Microsoft changed the world on this one either but as good followers, they release some of the new good obvious features they should address to avoid being too late.

 

1. In-memory OLAP

One of the main features SQL server 2008 R2 address, with Gemini / Excel 2010, is letting the user take relational sources on excel and develop a real OLAP cube with PowerPivot. It accelerates the prototyping of new cubes and enable users like marketers, managers and analysts to go faster and not wait for an entire life cycle to get the most of the data in short period.

Now, this is not fantastic either: you still need the ressources to be able to get the cube going like RAM and CPU to process. Excel can hold about 100 millions rows which is nice.

 

2. Optimistic unicode storage management

That might be the most underestimate feature of SQL server 2008 R2. This allows you to accelerate everything concerning reading and writing strings into columns where you need to enter your strings into unicode storage because of a few lines. Let's say your company primarily deals with non-unicode countries such as France for example and then they have a small amount like 1% of their business in Greece or Hungary and they need to use the data model everywhere, therefore need to change all their varchar storage to nvarchar storage.... great!

This is annoying... very annoying. First you will have to migrate your data from varchar to nvarchar, also the store procedures parameters, all variables, all table-valued parameters, so on and so forth.

Also, before Sql server 2008 R2, your storage required twoce the size as before... bummer. What if your storage was limited? what if your indexes suddenly jump in space  and reindexing takes twice as long.... This is very annoying... and all of that for just 1% of the business...

He he! Microsoft with this new release did improve that. Even though R2 takes a tiny bit more space for the non-unicode data in a unicode column, it reduces probably by 40% the size needed for the unicde datatype.

In other words, think about having a database, with a major table with a very large varchar(512) column taking like 5GB of storage. Ouch. Suddenly you learn you need to migrate it to nvarchar(512) as your company is opening business to Russia. You need to find 5 more GB of space! Well with Sql Server 2008 R2, the non unicode data will take probably about 100 MB more only. So the migration would only be needed 100 MB only :-), these 100 MB is to store the fact that the data are non-unicode, not counting the potential index on the column or full text search catalog too...

Now you get it! finding data will also be much faster as you read way less i/o than before.

 

3. Master data management

It is interesting and Microsoft tries its best but this feature is not perfect! It is very not user-friendly and does not replace a good customized master data management system.

 

4. Logical CPU core

R2 now allow 256 cores. Before it was 64 cores. But to be honnest, unless you really need that many processors, your reaching this need means you either have not a scalable system and your data-tier is not optimized, or you have so many concurrent connections that you need to think about getting some kind of data-tier farm.

I guess 256 cores is nice when you think about datamining processing. It might be good for BI then.

 

5. Reporting services geospatial features

Reporting services adds new geospatial features so that you can litterally had a map of the United States and then you can click on each state to get the information, sales, costs, etc... However, this is pretty long to set up and not very user-friendly. It is worth looking at though.

 

There are other small features like multi-server tools.

 

Overall SQL server 2008 R2 is very interesting especially for BI new features but definitely not worth migrating from SQL server 2008 to 2008 R2, especially because everybody should wait for the first service pack ;-)

 

 

 

 

Par clementhuge.over-blog.com - Publié dans : Architecture
Write a comment - See the 0 comments
Samedi 10 avril 2010 6 10 /04 /Avr /2010 09:33

Hello Guys,

 

10 years ago, people did not care too much about getting the database fast or available. Databases were small and ECommerce or online payment or any kind of social community did not exists. Performance and availability were somewhat important for several companies but a large portion of them did not care much. They could either tell their employees that the database will be back up and running in a few hours. Performance_wise, obviously we could not wait forever but once again large databases were not that common.

I do remember working for a company that waited every single night an entire night batch to get their accounting consolidated. Not that this was that major of a work. I believe today it would take about 10-20 minutes to do the same work.

 

Now, we can say that machine 2 machine (telecommunication for example) and people 2 machine (Ecommerce for example) require extreme availability and performance. Additional technology enriches the panel of connection types such as widgets, services farms. Heterogeneous behaviour makes database availibility and performance a science.

 

Historically the data-tier has alwways been the centralized storage area where all data needed to be persisted, backed up, secured resides. There is therefore a direct connection with reading and writing on disk. For DBAs, that is the main bottleneck for performance and availability: the storage area.

 

Other problems might happen but there are all related to typical application single point of failures like the operating system failing, the memory failing, the network failing, etc...

You can have serious application requiring a very tiny portion of disk to run. Lately enriched website would need serious storage for images, large-code files, multimedia files but technically, these are all related to storage.

 

The data-tier could work the same way: bringing all data onto memory and therefore removing the impact of disk and all database editors work on that whether or not they wanted too.

NoSql and grid technology like xkoto or Oracle RAC try to make the persistent data brought to memory and make it like a buffer so that the "operational" database is on the cache layer.

Sql Server cachestore and SAN cache do that as well making reads much faster and ensurring somewhat of better performance.

 

So there are two areas to take care off:

- the getting the data or writing the data part: either on the cache of disk

- the transactional part to keep consistency of the data

 

getting data and writing data can be done fairly easily: you have a standalone server and you are done. You need to keep the data available: you get a replication, a mirroring on a different server that would be your partner. You can silmplify all by clustering the all thing although even this solution will give you a spof on the storage layer.

Writing the data, same.

 

Now that you cover your availability, you need to ensure that your data is secured. Your availability does not secure your data: it just makes sure your data will be available but if this is crap data, this will be still crap data. Recently a client of mine was subject to sql injection. Yeah, you got it: no form validation on the client side; no validation on the server side and a bunch of dynamic ad-hoc queries. Classic! Another client of mine recently did want to update one line out of millions of row but actually committed an update on all rows. Oops!

Well in this case, your availability does not help you. Great! you can still connect your data but it is all crap! You need to make sure you know your tolerance when you have a disastry like that and your backup regime will become handy. For both clients of mine we were able to recover all data correctly thanks to a good supervised backup regime.

 

Ok, let's way we cover all basis on that: good availability, good disastry recovery strategy, what is next? Well you get your performance to take care off and this is where it gets complicated.

Why?

 

Because sometimes performance does not help availability or disastry recovery plan and vice-versa.

 

Here are some clear example:*

1. you can denormalize data to make them eithier to query: more storage, more need of disk, more backup place, more network for the robocopy, more transaction log, etc...

2. you increse the need of indexing: same results

3. you increase the need of reindexing: same result

 

So how in the world will you be able to improve performance without getting your database on their knees.

Solutions 1 and 2 work no problem if you accept the consequences. If you add a couple of ressources in the momery, cpu, SAN area, you are probably fine and this might be some of the components to deal with.

 

But there are other areas

 

1. Federation of servers: get availability automatically be spreading the load onto different servers. Something I implemented very easily with telecommunication companies. We talk more about gateways in those companies. Fir high volumetric data, we just get several servers to handle the data onto different servers. This gives the availability because if one gateway does not work, then we go on another one. We secure the data by implementing serious backup regime and a datawarehouse that lets the database tier upfront very tiny: improving availability because less ressources needed and performance because less data to query no matter how your execution plan is.

 

2. Keep only operational data on your operational server; separate front and back office data: some clients unfortunately do not separate the data that resides at the back office and the front office. You should separate it not only at the database level but ideally at the server level. It is like the best practice for database architecture. You do not want to have CLR or application stuff on your server because your server is not made to handle different type of technologies that eat eachother on ressources like CPU or memory. generally your back office data will be rich and focus on the business to be run (datamining, billing, client information b2b, etc...). Front office will try to be light if possible.

 

3. Application tier: you can have as many layers as you want. In some way, it is good architecture to have multi-tier environnement to make it scalable. This will enable your to switch servers easily and maintain your layers independantly. However, it does not change the fact that, in an ECommerce, for example, your web navigation is critical. If you manage to make your client uses less dependancy for the most viewed pages you will gain in performance. Let's say there is one data you need to show to the client but you know it is changing all the time and it will consume ressources, you will try to make it available on a less-viewed page like you will make your client click on it, not show it automatically. I am not saying it is what you have to do. Functional aspect of your application might require the availability of the data very quickly. I said, you need to consider seriously the performance aspect of your application navigation from the web-tier to the application-tier to the data-tier.

 

4. Virtualising the data-tier: as virtualisation becomes very popular and cloud computing, this is definitely another performance and availability improvement. NoSql projects are looking on this sense by trying to get into the data-tier only if needed pretty much. Grids goes to the database layer but try to spread the load on identical data servers. In other words, they read data from the most ressourcefull server and write the data on the most available server and then peer-to-peer replications (mySql for example) or command-type replication (xkoto for example) can take care of it. Virtualisation is a way to do federation of servers in some ways, so I can dig that. However, it is to count on an additional layer of abstraction that might be a black box for you.

 

5. Data partitioning: you can obviously store data even within the same table in different raid array drives getting the loads more evenly whether your write or you read.

 

6. Make your database model towards performance. Debate around row-store vs. column-store / relational vs. non-relational database is important especially for performance. Lately referential integrity is less important than performance. Think about it, you delete a member of your website but because you want it to be fast you only delete the core information (the row on the primary table). Well tehcnically your member is gone whether you have parasite data. Then you can have a cleanup process on the background that would clean those useless data anymore. For a non-experienced DBA it is difficult to say that as this is one of the important point: referential integrity. But believe me, you have to adapt your database to the situation! After all, some people would say that database is just a way to store the data in a usefull order.

 

On all the solution, I prefer the federation of servers and I think you probably can apply to most of the industries. If ECommerce might require getting very large "lookup" data like looking at the memeber data, you can still federate data like orders or payments or emails sent, etc... I usually advise client to segment their databases/servers based on their websites, on the countries for international websites, on the alphabetical order of the logins, etc...

It is always possible to segment your data this way.

 

Any comments would be apreciated :-)

 

Clement

 

 

 

 

 

 

Par clementhuge.over-blog.com - Publié dans : Architecture
Write a comment - See the 0 comments

Présentation

Créer un Blog

Recherche

Calendrier

Mai 2012
L M M J V S D
  1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31      
<< < > >>
Créer un blog gratuit sur over-blog.com - Contact - C.G.U. - Rémunération en droits d'auteur - Signaler un abus