Complexity is often a natural condition of most successful businesses. We build databases to handle complex data, to
maintain a layer of structure for important business information.
However, when building a database, or cluster of databases, typically the needs or requirements change over time. New
divisions or projects spring up. This is generally not a bad thing for a business or organisation. In most cases growth is good. However order to do this, without incurring huge amount of expense, often you a add these modules into existing databases rather than create new ones for different purposes.
There are many advantages to this approach; it makes accessing data easier if needed. However, in some (read: many) cases, you need to create new databases to handle different functions. One part of a business, such as vendor contract information, may have literally nothing to do with another, such as customer service records. So new databases are created. Maybe even the needs in one function, such as sales records increase beyond the capacity of the original
database they receive higher workloads than others.
A problem appears on one of your instances, so you fix that. Meanwhile a different problem occurs on different ones, and you fix that separately, so instances are created. The complexity grows even more. This problem only increases with Hybrid of on premise vs Cloud systems, and each comes with different needs.
Maybe there are entirely different databases all holding versions of the same data; these are hacked together through XML or JSON feeds, and it works okay. But the more often this happens the more complex your system gets. This, of course, is not something that you can completely avoid (such as dealing with the differences between operation and analytical structures) but ideally you can minimise this.
The next thing you know you've created a monster. It's inevitable. We've all experienced it.
While it's not possible to remove all complexity, sometimes it makes sense to take a step back and take a look at your structures. Below is list (admittedly simplified) of techniques that you can use to try to reduce complexity creep.
Efficient Database Design
While it may seem obvious, design needs to be addressed before anything else. If you start off with a bad or inefficient
database structure, complexity can spread like a virus.
- Build scalability into your systems from the ground up.
As a general rule, never assume that your small little database project will always remain so small. Quick and dirty design may work great in the short-term, but you never know when you could be creating a headache for yourself down the road
- Use constraints
Don't depend on software to handle this. If you enforce foreign key relationships at the data-level, you don't need to worry about different coding styles of different developers. This will ensure the integrity of your data, and less vulnerable to bad code.
- Avoid common lookup tables
I've seen this more often than not. Just because two different types of tables have similar structure does not
mean that they should necessary belong together. While this may seem simple and cleaner at first, it can cause
problems down the road. Though they both are worn on your feet and have similar attributes, you don't put shoes and socks in the same dresser drawer. The same is true in your tables. Even if you can visualise everything going together in this way, you will run into constraint problems and a ton of confusion (not to mention that your developers may want to cause you physical harm).
- Good normalisation vs database speed
On the other hand, are you repeating data in multiple places? Have you configured for scalability? Sure
sometimes it seems crazy to create a new table for each new piece of data and each look up can slow you
down, but remember that complexity often gets created by oversimplified as circumstances change.
The more you standardise, the less clutter and confusion. A large number of different configurations, setups, and tasks
and scripts is inefficient, time-consuming, particularly for complex operations. Creating standardised configurations
makes clustering easier, which will in turn increase the uptime of your instances.
Operational vs. Analytical Configurations
While in many cases it's not possible to configure all databases in the same way, you can typically break them in to some generalised categories. The reality is that different types of databases often operate in opposition to each other; what works well for business workflow may not work well for running analytics. So maybe you spin off a view of one into a view.
However, you can place almost all instance/views in one or the other models. As a policy make sure that all databases (within a specific purpose-type) are built with the same or similar configurations, whether it is needed or not. It may seem like it is unnecessary at first but it will save you a lot of time and frustration down the line.
For operational databases, use one standardised configuration and a different for databases used for analytical purposes.
Once you have everything configured in a similar fashion, you can manage everything from a central place. However, sometimes you don't have this option, particularly if you've inherited a veritable forest of platforms and configurations.
The problem with working with multiple platforms and configurations is that it can become difficult to manage them
without jumping from one to the other. Analytical and management tools (if they exist) tend to work for one platform
or another. They may also have limitations for the number of instances they can handle within one version of the application.
However, if you can have one tool that will work with all of these platforms, you will save a considerable amount of time. Ideally this tool should do more than monitor activity and statistics but it should also be able to assist in distributing the same scripts, tasks and reports to all instances.
Facilitate Group Operations
Once you have a good management tool, for each database, create a standard set of group operations that you can handle as one batch process.
Try to keep these as close to identical as possible, at least for groups of instances that have the same or similar functions. If they are configured in the same way, you will likely need only need one script for operations such as installations, updates, and reporting.
These are only a few examples of ways to reduce complexity creep and to gain a stronger hold over your database operations, however if you use these methods, you will certainly be in a better place.