Over the past ten years, websites have gone through several stages in their need for database tools. In the beginning, websites stored nothing in a database at all: they were either institutional sites or simple services built around application algorithms. Then came e-commerce and extranets, where companies needed to store information, secure it, and give access to specific pieces of it. Finally, the internet became a tool to track every single click (think Xiti or other major advertisers). Information has never been so important. Google, built on its search engine, has drawn so many people to its software technology that it competes directly with Microsoft. And sure, the search engine algorithms are important, but they still need to crunch billions of data points.
But as more sales and revenue move online, the complexity of the services provided on the internet rises with them. Users demand performance and sophisticated services. Application servers therefore need to embed intelligence, usually driven by a humongous amount of data, in data mining technologies for example. E-commerce is becoming the main purchase channel in specific industries such as computer electronics. Payment transactions have to be extremely secure and perform extremely well. And after the catastrophic Enron affair, these systems have to respect strict rules that require companies to log every single event that could be a point of failure within the transaction process.
All of that makes databases much more complex to manage, and I have noticed that you now have to manage not only your databases efficiently but also your data. Some might think that data management is just ensuring referential integrity, but it is not. Not at all, actually. In a highly transactional environment, we might even argue that referential integrity should be delegated, in favor of performance and maintainability.
And here is why my title is “…data versus database management”. A database server gives you the ability to store data within ACID transactions and to query (select, update, delete) that data in a comprehensive way, with a server dedicated to the task.
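To make the ACID point concrete, here is a minimal sketch using Python's built-in sqlite3 module (any transactional engine behaves the same way; the account schema and amounts are purely illustrative): a transfer that fails halfway is rolled back as a whole, so the data never ends up in a half-written state.

```python
import sqlite3

# Illustrative schema: two accounts, with SQLite standing in for a real engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 60 WHERE name = 'alice'")
        # Something goes wrong before the matching credit is written:
        raise RuntimeError("simulated crash mid-transfer")
except RuntimeError:
    pass

# The half-finished transfer was rolled back: balances are unchanged.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

That atomicity is exactly what the database server is dedicated to, and it is what the application layer gets for free when it delegates storage to the engine.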
All of that is great and nice, but it is theoretical. Fortunately, there are simple practices to follow. No matter how great the server engine is, the DBA is still the man! But there is a shift now in what the DBA is.
Now, the DBA does not have to ensure referential integrity at all costs. He has to balance performance and integrity.
Now the DBA does not have to normalize the data at all costs. Since storage is cheap, he has to make sure performance is there, and sometimes accept some denormalization.
Now the DBA does not have to run one big transaction. He can split the process into small transactions and perform the actions asynchronously. Once again, performance makes the difference. Also, in transactional environments, since auditors want to make sure there is no breach, DBAs have to run asynchronous tasks that follow specific workflow rules.
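As a sketch of splitting one big transaction into small asynchronous ones, here is a toy Python example (the worker queue, the step names, and the audit_log schema are all illustrative assumptions, with SQLite standing in for a real engine): each step of the process is committed in its own small, independent transaction by a background worker, instead of one long-held transaction covering the whole workflow.

```python
import queue
import sqlite3
import threading

# Illustrative audit table; SQLite stands in for a real engine.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE audit_log (step TEXT)")
conn.commit()

tasks = queue.Queue()
lock = threading.Lock()

def worker():
    while True:
        step = tasks.get()
        if step is None:  # sentinel: stop the worker
            break
        with lock, conn:  # one small, independent transaction per step
            conn.execute("INSERT INTO audit_log VALUES (?)", (step,))
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

# The "big transaction" becomes three small asynchronous ones:
for step in ("authorize", "capture", "notify"):
    tasks.put(step)
tasks.put(None)
t.join()

count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(count)  # 3
```

Each insert commits on its own, so no lock is held across the whole workflow, and the audit trail the auditors want is still written step by step.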
This is a challenging time, and that is all good for DBAs. More and more, data is becoming the very core of many online businesses. DBAs are starting to be gurus, the people who carry the essence of the business.
Now DBAs have to elevate their skills. Not only do they have to ensure that the database servers are up, running, and working at full speed, but they also have to manage DATA more efficiently and give priority to performance rather than integrity.
Integrity is the enemy of performance. Although it is essential, so that you do not store garbage data, it is no longer the essence of the business. More and more businesses trust the application layer to take care of integrity, and to some extent there is some truth to it.
Now, I still think we should draw a line for where databases enforce integrity rules. And that line is based on one question: is the server storing mostly configuration data or transactional data?
To give you an example, let’s say your company is an online payment facilitator. It is very true that you cannot skip integrity enforcement when you set up a new account, a new market, or a new payment type. However, you should not be invasive when the application server inserts payment transactions. If you enforce integrity on, say, a billion-row table with several foreign-key, unique, and conditional constraints, while applications write to this table every 10 ms, you will get bottlenecks and heavy reads just to enforce that integrity. It might be good to set up this integrity at first, but if your online payment platform wants to make money, avoid it afterwards. If your application layer knows there are only three payment types and only ever inserts those three types into the database, that is sufficient integrity. At some point, just as every manager in a team has to delegate, the database has to delegate integrity to the applications. In other words, database servers should TRUST application servers.
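Here is a minimal sketch of that delegation, in Python with SQLite standing in for a real payment database (the table, the three payment types, and the function names are all hypothetical): the transactional table carries no foreign-key constraint, and the application layer rejects unknown payment types before they ever reach the database.

```python
import sqlite3

# The "only 3 types of payment" the application knows about (illustrative).
VALID_PAYMENT_TYPES = {"card", "wire", "wallet"}

conn = sqlite3.connect(":memory:")
# No FOREIGN KEY on payment_type: integrity is delegated to the application.
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, payment_type TEXT, amount_cents INTEGER)"
)

def insert_payment(payment_type, amount_cents):
    # Integrity check in the application layer, not in the database:
    if payment_type not in VALID_PAYMENT_TYPES:
        raise ValueError(f"unknown payment type: {payment_type}")
    conn.execute(
        "INSERT INTO payments (payment_type, amount_cents) VALUES (?, ?)",
        (payment_type, amount_cents),
    )
    conn.commit()

insert_payment("card", 1999)
try:
    insert_payment("cheque", 500)  # rejected in the app, never hits the database
except ValueError:
    pass

rows = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(rows)  # 1
```

The insert path pays no constraint-checking cost on the hot table, and the bad row is still stopped, just one layer earlier.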
Now, I am not going to make a lot of friends by saying this, but it is true. We live in a pragmatic world where a payment should not take more than 1 second to process. Beware of performance!
And if you manage your data instead of managing your database, you will:
1/ make your life easier.
2/ build a system that does not depend on a database engine that might just die on you.
Ideally, you want your database engine to be the best possible, but what you want most is to manage your data the best way. The DBA has a new challenge: his job has to move closer to the business.
Here are some good practices to achieve better performance in a transactional environment:
1/ Federate your transactions: having many servers to write to not only gives you more power and more read/write parallelism, it also gives you a natural way to implement high-availability strategies. If one node is down, you write to the other one.
2/ Keep your database transactions as small as possible and use uncommitted reads as much as possible: be pragmatic, how many times do you really need committed reads? Beware of committed reads, they can lock your server processes. The smaller your transaction is, the better, as you free memory, connections, and tables sooner.
3/ Implement a thorough partition analysis: before going into production, try to get an idea of which transactions are needed in real time for reporting and for updates. Set your partitioning scheme accordingly, so that you can move “old news” transactions aside and do maintenance on them (extracting, purging, reindexing, defragmenting) without disturbing your current activity.
4/ Think data instead of database: when you think about it, the smaller your data is, the easier your database will be to manage. If you have the opportunity to design better partitioning (see 3) or a better storage solution to keep your production data small, you will make your life easier.
5/ Identify your groups of data: configuration data and transactional data are different, and so should your servers be. When you just want to read a client’s configuration, your data repository should emphasize reads and integrity; when you insert transactions, it should emphasize writes and performance.
6/ Delegate to and trust the application layer: for many reasons, it is good to cache data within your application server in certain situations. It gives the application the ability to run without the database, and better performance for highly recurring requests for data that does not change that often. It frees up database CPU and RAM. The database engine can always cache execution plans, but it will still run the query, or at least send the result over the network. DBAs and application engineers have to find the right balance and know where to delegate each query. More importantly, when you insert or update within the database, constraints can slow down performance. The database should trust the application. One way of establishing trust is the use of application web services. Those web services enforce security by limiting IP ranges and setting up certificates, and they provide the trust the database needs. Obviously, if the database is accessed directly by 1,000 different applications, it will have a hard time trusting all of them; but if it is accessed via a web service, it will be more comfortable delegating some of the integrity rules.
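Point 3 above can be sketched as follows, with SQLite standing in for a real engine and per-month tables standing in for native partitions (all table names and the schema are illustrative): transactions are routed to a partition by month, so maintenance on an old partition never touches the current one.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")

def partition_for(day):
    # One table per month, e.g. payments_2008_01 (illustrative naming scheme).
    name = f"payments_{day.year}_{day.month:02d}"
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (day TEXT, amount_cents INTEGER)")
    return name

def insert(day, amount_cents):
    table = partition_for(day)
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (day.isoformat(), amount_cents))

insert(date(2008, 1, 15), 1000)
insert(date(2008, 1, 20), 2500)
insert(date(2008, 2, 2), 700)
conn.commit()

# Maintenance on an old partition (extract, purge, reindex) leaves the
# current partition completely untouched:
conn.execute("DROP TABLE payments_2008_01")
remaining = conn.execute("SELECT COUNT(*) FROM payments_2008_02").fetchone()[0]
print(remaining)  # 1
```

A real engine would do this with native partitioning rather than separate tables, but the principle is the same: old data can be purged or archived without locking current activity.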
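And point 6 can be sketched as a small application-side cache (the ConfigCache class and the fetch function are hypothetical stand-ins for a real database query): configuration data that rarely changes is fetched once and then served from memory, so recurring requests never reach the database at all.

```python
import time

class ConfigCache:
    """Application-side cache for configuration data (illustrative sketch)."""

    def __init__(self, fetch, ttl_seconds=60.0):
        self.fetch = fetch        # function that actually queries the database
        self.ttl = ttl_seconds    # how long a cached value stays fresh
        self._value = None
        self._loaded_at = None

    def get(self):
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at > self.ttl:
            self._value = self.fetch()   # cache miss: hit the database
            self._loaded_at = now
        return self._value               # cache hit: no database round-trip

calls = {"count": 0}

def fetch_client_config():
    # Stand-in for a real database query; counts how often it runs.
    calls["count"] += 1
    return {"currency": "EUR", "max_amount_cents": 500000}

cache = ConfigCache(fetch_client_config, ttl_seconds=60.0)
for _ in range(100):
    config = cache.get()

print(calls["count"])  # 1: the database was queried once for 100 reads
```

One hundred reads, one database query: that is the CPU and RAM the database gets back when the application layer is trusted to serve slow-changing data itself.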
Now, some DBAs, reading the last point (6), might think that the database will end up being just a storage space. Do not worry about it. The database engine will still be better at querying, since it is a set-based engine. It will still be better at storage compression. It will still be better at recovery strategies for your data. And let’s be honest, isn’t it better for our careers to think that we manage data and not databases? Because, think about it, data moves all the time; databases do not. In essence, it is harder to manage volatile systems than conventional, easy-to-follow systems. So the big bucks will go to the people who manage the unconventional systems, not the opposite!