The best database is one you don’t have to think about.
Wait, that can’t be right. The best database is one kept running by an entire team of brilliant specialists who have trained for years with the database vendor. The absolute best database is the one that challenges this team to use all their dark magic to keep everything running as the data grows. Also, ideally, the data would have something to do with money: more data should be a good thing, or the database being accessible should mean more money. Maybe don’t spend time planning for the scale of the database of the security exploits in your system, for example.
But this post isn’t about managing data; it is about creating a database so bad people will remember it for the rest of their careers. Like a literary classic, it will exist long after its typical lifetime, staying alive in discussions in bars and meetups. If done right, you can be like a Chekov short story: timeless, to do with strange peasants, written in another language, needing complex historical and cultural knowledge to understand, and easy to read but having a dozen possible interpretations.
First, your database should be treated as a database accidentally. It should start as a small file you write data to while building a prototype. Later, this file will drive the core of your entire business. Corruption of this database will be as catastrophic as the Mississippi River changing course, within the context of your business. Over time, you face many challenges to it growing into a monster, and each time, you decide to add onto the existing custom binary format that you came up with one Tuesday afternoon and start to believe there was an RFC process.
In this way, your database should not be from a database management vendor but something you created. It should be inspired by a true database. If you are forced to move it to a DBMS, do so in name only and keep your custom flavors within the database structure.
Second, your database should be written to by so many software components it is hard to inventory them all. Some should only be compiled binaries, the source code lost to time. If you are trying to be funny, keep the source code for some of the client systems within the database, and then have corruption take them from this world.
Third, it should be highly opinionated and tailored to your specific needs, or cognitive dissonance should make it seem like you need the database to work just like this because of your Complex and Unique Business Constraints. Ignore that your database is probably for reading and writing data, and sometimes quickly.
What do I mean by highly opinionated? Here are some opinions to put into your database and some great choices if you want programmers to whisper about your database in later job interviews. Combine them as you see fit:
The more stored procedures you have, the better you are as a person
The fewer stored procedures you use, the better you are as a person
One way to protect your data is to make it hard to understand even to the people who are paid to do so: Security through Exterme Obscurity and Partial Memory Loss
Foreign keys are for the weak, and they slow everything down
The more logical versions of the database schema you have in play, the better - if clients can still connect to your database from a version one human generation ago, then you have won some invisible battle with science
The name of everything should be a sort of gift-wrapped puzzle that you get to figure out later: a personal Secret Santa
Whatever level of normalization is widely practiced, go way under or way over this value. One Table to Rule Them All, or Everybody Has A Copy and Updates Hopefully Cascade are equally valid philosophies
Triggers should be used sparingly, i.e. 1 per table with a bunch of
IF
statements that control different scenarios and do ungodly things like send emails or create queue records for another trigger to pick upTable and column names should look like randomly generated text, but not be randomly generated text. When asked why the tables are named that way, mutter something obscene about SQL Injection
Documentation of the database schema is not practical, because you are Rapidly Adjusting to The Changing Business Landscape (or you just don’t want to do it)
Ideally, over the course of its life, the database has two primary paths of ownership to becoming legendary:
One person maintains it, and the database is a slow reflection of this person’s descent into madness
Multiple teams partially own it over a series of decades. Ideally, the teams speak different languages and are in different hemispheres or work on it in different decades
Sensitive data should be hidden in plain sight
Use one field for multiple purposes, occasionally, as a treat
Text handling of the data should go through all the classic stages of grief:
Denial: use
varchar
for everything with a fixed length and truncationAnger: find out that there are people who speak other languages and have longer names than you and your brother and that this needs to be reflected in the database.
Bargaining: create a series of stored procedures that try to convert things from one format into another and keep more complex scenarios in a different location, with a boolean flag to indicate if this is the case.
Depression:
nvarchar(max)
all fields for an entire dark winter.Acceptance: look up the right way to do it, and do this in a few new tables. Leave the older ones as is.
All
datetime
fields should be calleddate
ortransaction_date
ortime_it_happened
orthe_when,
field names that do not indicate if they are local/UTC/have a time component.
There should be at least one table in your database that only has 1 row. If this 1 row of data is removed, the entire system stops working, and the fire alarm in the building next door goes off (somehow).
Here is one example of a stored procedure in a legendary database:
SELECT
trx + ' ' + id as "ticket id",
case DateWhenItWentDown
when now() then "just now"
when 1 then "unknown"
when NULL then "bthom" -- beats the hell outta me
else "-"
end as "User created when?",
case SignedIn
when 0 then "we got in"
else null end as [token valid],
Convert(varchar(100), LastSignOn, 101) as "LastTokenTiming",
DateDiff('d', LastSignOn, getDate()) as [shouldRefreshBool],
AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' +
City + ', ' + State + ' ' + Zip as "Address",
'XXX-XX-' + Substring(CONvert(varchar(9), SSN_PLAIN_TEXT_POSSIBLE_PIE), 6, 4) as "not a s s n"
FROM tblSTR_GLOBAL_NORTHEAST_usr_grp_utf_2___final_view t
join (
select top 1 from client_token_map__v2_replicated_dave__ order by 1 desc
) xyz on xyz.123 = t.246
where trx like '%\_ %'
and (processed = 'not really%' or processed = NULL) -- handle retries
order by cast(trx as int(11)) desc
This stored procedure’s name is:
sproc_getUserTokenDetail_demo02_14_1995_DO_NOT_TOUCH_
best one i've even seen:
- created and maintained by 2 ex-soviet scientists who were expat in canada.
- all table names an fields shortened to 2 letters
- which were abbreviated in the russian language, but helpfully transliterated into latin letters.
(documention? yeah right)