Database and Domain Design
In which I don my gear and go exploring in the barren wastes of confusion and spite with only a one-page wiki spec as a map, seeking the one true design for my domain that will allow me to build an app atop it that meets the requirements in full.
Comments on this article can be found here at use.perl.org
(for the rest of the articles in this series, please see the table of contents)
Initial analysis
Before you start, have a read of the specifications page, since I'll be referring to it and I have no intention of duplicating its content.
Read it? Really? You, the guy on Bell South DSL, the access logs don't lie, go to the back of the class and you'll be writing vt100 mode lines on the blackboard after school.
Everybody read it now? Good. Are we sitting comfortably? Then I'll begin.
First thing, obvious enough[0]: Our two main "types of things" are going to be Todos and People. But that leaves plenty of possibilities for decisions within that. My first thought about Todos is to have a simple structure - pseudocode follows:
class Todo {
auto Int id;
rw Str project;
rw Str name;
rw ProjectManager submitter;
optional rw Str description;
optional rw URI test;
rw Str value;
rw Str complexity;
rw TodoState state default 'submitted';
};
Of course this begs the question of what a ProjectManager is[1] and how does it relate to a People, or indeed a PurplePeopleEater. Or perhaps even a Person? Donning my Object Orienteated Developer hat and taking a deep slug of the koolaid, clearly it was a subclass of person. i.e. ProjectManager->isa(Person)[2]. So then maybe actually we should have ProjectMember and a Project class and then a ProjectMembership relation ... so we don't actually need ProjectMember but maybe we need different types of membership and then a way for somebody to manage them. And then state transitions are only allowed to be made by specific types of person so perhaps instead of a state we should have a collection of transitions and then they can be typed so a TodoPaidTransition has a ViennaCommitteeMember creator attribute and we figure out the state by looking at all the transitions and seeing what's the latest, or which one moved us into the state that's latest on in the workflow graph, and ...
And suddenly we're an architecture astronaut, except launched by a bottle rocket and with no spacesuit.[3]
No.
No no no no no.
Looking at the design again, the only role that's actually special is vienna committee member. So let's call that Admin and everybody else User for the sake of argument[4]. The reason we don't need to distinguish ProjectManager is that "the person who submitted the todo" is quite sufficient, at least for the moment - and if requirements change we can change that later. Plus suddenly we don't have a problem with somebody being involved in the admin for some todos but the submitter of others as well, which may or may not be required but is nice to leave possible. So now we have:
class Todo {
auto Int id;
rw Str project;
rw Str name;
rw Person submitter;
optional rw Str description;
optional rw URI test;
rw Str value;
rw Str complexity;
rw TodoState state default 'submitted';
};
Now, possibly we should make the project part its own class but that'll be easy to do later so I'm punting the decision for now[5]. I've marked both the test URI and description as optional because while the primary purpose here is to handle TODO tests, I can't really see how you can write a test for documentation work, there's no strict need for a description if you do have a test and somebody might want to submit without either to get the todo out there and then add the rest later.
I've marked value and complexity as strings rather than anything else because I think people will really like being able to use them freeform - sure, value could be a Currency object or a CurrencyRange but how do you say "200, 300 if they also do X" that way? Complexity seems like the same principle but more so. TodoState is, for the moment, basically going to be enum qw(submitted open deleted taken being_implemented in_review completed paid) and I suspect we can use Class::Workflow to handle the transitions and their requirements.
Which brings us back to the Person object. Minimal requirements seem to be:
class Person {
rw Email email; # primary identifier
rw Str name;
rw Password password;
rw enum(qw(User Admin)) role;
};
There's probably lots more things we could add to offer a richer profile here, but other than display and edit of the extra attributes they don't really make any difference I can see to the app, so the question of their existence is punted and we'll go with this for now.
Making History
The trouble is, now we've (hopefully sensibly) chosen a simple design for Todos, there's a bunch of information we don't have like who published a todo, or when a proposal began implementation - and the latter is important since we're only allowing a limited amount of time before a todo returns back to being a free for all. Admittedly I'm probably going to store a deadline for this rather than the number of days permitted, but we'll still likely want to display how long the implementor has taken compared to how long they have left. Now, we could easily add all sorts of special purpose knobs and twiddleable things[6] to store such information, but we'd rapidly find ourselves with a profusion of tables - or worse still, a suffusion of yellow. Instead, we can easily capture all this information by logging changes to Todos, and be able to present a history as an added bonus.
There are a number of ways to model an audit log; a common one is to make all data fields nullable and to say "NOT NULL means the value was changed, NULL means it didn't"[7]. We can't use this one though since two of our fields (description and URI) can be unknown and hence NULL[8], so we'll need to do this another way. I'd been thinking for a couple of days that the trick would be to add a set of boolean columns named is_${field}_changed to make this easy to query, and then realised I was an idiot and there's a much cleaner approach.
The design for our todo_version table[9] is going to look like this (I'm pseucoding SQL this time):
table todo_version {
primary_key {
int todo_id !null;
int revision_id !null;
};
int changed_by_id !null, references(person->id);
datetime changed_at !null;
text name !null;
int submitter_id !null, references(person->id);
text description;
text test;
text value;
text complexity;
todo_state state;
};
The revision IDs here must be monotonically incrementing; while this is a little bit trickier to implement than using a global sequence for it, the display resulting from this will be much more user-friendly and it gives us a chance for a rather neat trick. When the todo is initially inserted we create revision 0 here, copying the full data, then on each update we create the next revision entry with full data. Now, given two values a and b we can determine whether a change has happened between them by the following SQL (translated fairly literally from the way DBIx::Class determines if it needs to mark a column as dirty for later UPDATE statements)[10]:
(a IS NULL != b IS NULL) OR COALESCE(a != b, false)
And a quick check shows the results are what we expect:[11]
SELECT ((a IS NULL != b IS NULL) OR COALESCE(a != b, false)) FROM ( SELECT NULL, NULL UNION ALL SELECT NULL, '' UNION ALL SELECT '', NULL UNION ALL SELECT '', 'foo' UNION ALL SELECT 'foo', 'foo' ) AS x (a, b); false true true true false
So then we should be able to make a reflexive join to the immediately previous revision, so to find out if for example the name of a todo has changed (note that since name isn't nullable the condition can simply be inequality) we can do:
SELECT this.revision_id, this.changed_by_id, this.changed_at
FROM
todo_version this
JOIN todo_version previous
ON (
this.todo_id = previous.todo_id
AND this.revision_id = previous.revision_id + 1
)
WHERE this.name != previous.name
ORDER BY this.revision_id DESC
LIMIT 1
If you're wondering why I'm going to all the trouble of figuring out what's arguably an implementation detail at design time, it's because I like to know if the odder corners of my design are possible before I commit to implementing them - it saves a lot of heartache, wailing and gnashing of teeth later, and given the cost of beer these days is good for my bank balance. In any case, given this seems to be workable, I'm going with this design for the moment.
State Approved Revisionism
Having established a plan for history maintenance our data model can handle submission, review, publishing, and editing. Looking at the take/approve/implement/verify/pay stuff, the first thing that jumps out at me is that approving somebody to start implementing requires both vienna -and- the submitter to say yes. So we're going to need to model an approval that can hang about:
class Approval {
ro Todo todo;
ro Person implementor;
ro Person approved_by;
auto DateTime approved_at;
};
Of course, we still don't have a way for an implementor to indicate their interest; I guess that should be something like:
class Implementation {
ro Todo todo;
ro Person implementor;
ro Str notes;
auto DateTime proposed_at;
optional rw DateTime deadline;
};
The thing is, an implementation proposal is (presumably) attached to a specific revision of the todo - if the todo changes, the proposal probably needs to change too. One way to model this would be to version the implementation as well as todos, but that's going to get complex fast so it's worth looking for another way. Thinking about it, it seems to be possible to cheat - hang the implementation off the TodoVersion rather than the Todo, and if somebody's updating an outdated implementation proposal we can simply default the notes to what they were for their last proposal; that's really a user convenience the domain model doesn't need to know about since now to it the implementation is already locked to the version. That also means we can shift the Approval to hang off the Implementation object, which simplifies that as well. What we end up with then, is:
class Implementation {
ro TodoVersion todo_version;
ro Person implementor;
ro Str notes;
auto DateTime proposed_at;
optional rw DateTime deadline;
};
class Approval {
ro Implementation implementation;
ro Person approved_by;
auto DateTime approved_at;
};
I think that's good to handle the proposal system, so it's time to go look at the spec again and see what's next. The key thing we haven't handled yet is the review process, which seems to be simply "implementor asks for review, submitter either (a) approves, (b) cancels implementation" - the review stage in the outline stati section only allows for the todo to go to accepted or rejected. I'm going to go out on a limb here[12] and guess it should actually be possible to (c) say "well, you're close, but no cigar[13]. Here's what you need to do to fix this" and leave the todo in implementation. So the Review will need to allow for a description; if a task moves to completion as a result, that is sufficient to determine success - but it should be possible to trace that to a specific Review, so we need to indicate which review marked the Implementation complete. Looking at the cancelled case, I wonder if the same field can be used to indicate failure - simply make it 'implementation ended by $this_review' and if the todo isn't complete, that's a fail. Except what about when the second or third attempt does succeed, then how do we tell which one was which? Versions to the rescue[14] - since an Implementation is against a specific version we can tell by whether we transitioned to the completed state as of the next version if this was a success or a failure. So we now have:
class Implementation {
ro TodoVersion todo_version;
ro Person implementor;
ro Str notes;
auto DateTime proposed_at;
optional rw DateTime deadline;
optional rw Review closed_by;
};
class Review {
ro Implementation implementation;
ro DateTime requested_at;
rw DateTime performed_at;
rw Str write_up;
};
Looking back over the spec, this design seems to cover everything written. Or at least all the data changes modelled; the notification stuff is orthogonal and I'm not planning to worry about that until the UI is up and working. But what about all the things users will want that are as yet unwritten?[15] Well, the key thing that springs to mind is that people are going to want to add explanations and notes to almost everything they do - and to things other people do; while implementation offers and reviews have direct fields for this, there's always a requirement for out of band information in any sort of project/task tracking app, and there's also a traditional and effective way of solving it - comments. Since I think people will usually be viewing the event stream at the Todo level, we can probably get away with just having comments there (and if that doesn't work in practice the code should be simple enough to re-use to add them elsewhere). So we need to add:
class TodoComment {
ro Todo todo;
ro Person author;
ro Str title;
ro Str body;
};
The question of how we format the body (and, indeed, other notes and similar fields) does spring to mind, but that's primarily a UI design decision so I'm going to punt it for now and declare comments modeled.
Measure twice, commit once.
At this point I think we're done for a first cut - I'm not trying to be exhaustive at this stage, just get close enough to right that we hopefully won't have to throw away most of our work later on realising part of the plan is utterly wrong[16]. So if we take the whole of what we have so far, adding appropriate arrayref attributes as we go along, the end result is:
class Person {
rw Email email; # primary identifier
rw Str name;
rw Password password;
rw enum(qw(User Admin)) role;
};
class Todo {
auto Int id;
rw Str project;
rw Str name;
rw Person submitter;
optional rw Str description;
optional rw URI test;
rw Str value;
rw Str complexity;
rw TodoState state default 'submitted';
ro ArrayRef[TodoVersion] versions;
ro ArrayRef[TodoComment] comments;
};
class TodoVersion {
ro Todo todo;
ro Int revision;
ro Person changed_by;
auto DateTime changed_at;
rw Str project;
rw Str name;
rw Person submitter;
optional rw Str description;
optional rw URI test;
rw Str value;
rw Str complexity;
rw TodoState state default 'submitted';
ro ArrayRef[Implemenation] implementations;
};
class TodoComment {
ro Todo todo;
ro Person author;
ro Str title;
ro Str body;
};
class Implementation {
ro TodoVersion todo_version;
ro Person implementor;
ro Str notes;
auto DateTime proposed_at;
optional rw DateTime deadline;
optional rw Review closed_by;
ro ArrayRef[Approval] approvals;
ro ArrayRef[Review] reviews;
};
class Approval {
ro Implementation implementation;
ro Person approved_by;
auto DateTime approved_at;
};
class Review {
ro Implementation implementation;
ro DateTime requested_at;
rw DateTime performed_at;
rw Str write_up;
};
A quick check back over the spec doesn't reveal anything I don't think this set of classes can model, so I'm going to declare the domain model design provisionally complete, which marks the end of this article.
Comments on this article can be found here at use.perl.org
The previous (and first) article in this series was the Introduction
[0]Or at least obvious to me that this is how I want to do it. Even the fact that parallel lines never meet turns out to be merely a postulate, so I'm simply going to declare that any reality where this isn't obvious is non-mstian and move on.
[1]I was originally going to write "of what a ProjectManager is when it's at home" but then realised that I wouldn't want to interrupt the good ones when they are to find out and that reminding people that the bad ones have homes too might result in said homes being set fire to in revenge for all the overtime keeping their minions from seeing their own homes. Which would be a bad thing. After all, the minions might get arrested for it.
[2]Stop with the sniggering at the back. Don't think I can't hear you.
[3]Or perhaps more like the early Russian monkey cosmonauts who were sent with just enough oxygen to go "woo I'm in space" or whatever the equivalent in monkey is before discovering that there was this slight problem in the carrying on breathing department.
[4]Which is not only clearer, but is going to save me a lot of typing and a lot of swearing when I keep inexplicably leaving the last e off commite.
[5]Always punt a decision where possible. Not only does it save you to concentrate on the decisions you can't, but it only takes one man to punt whereas you need three men in a boat.
[6]Technical terms, not to be confused with widgets, frobbers or gangles. All are important parts of the ActiveJobSecurity pattern popularised by the PDTalist cabal.
[7]I've also seen the "let's just serialise a hashref into a blob field" approach which is really flexible right up until the point you need to run a SELECT, much like every other attempt to use a database as a dumb datastore. I'd try to express how I feel about such designs but I just cleaned the screen on my laptop and apopleptic spittle tends to stick.
[8]This is perhaps pushing a point, given at the class level we're considering them to be optional rather than unknown so using a separate table for each one would be more pedantically normalised, but I'm calling overkill on that idea.
[9]I always use name_like_this for table names and in perl NameLikeThis for class names; the correspondence is usually fairly direct, except when it isn't. In that case I'll be sure to warn you.
[10]DBIx::Class tests defined-ness of a before the inequality check to avoid 'use of undefined value in eq' warnings but there's nothing causing an equivalent requirement in SQL. However, NULL != NULL returns NULL so we have to COALESCE to false to return an overall false rather than a NULL.
[11]Anybody saying 'use VALUES instead of the UNION ALL' is going to be clubbed to death with a printout of the postgres changelog since that appears in 8.2 and I only have 8.1 since I'm using Debian 4.0 (yes, I know 8.2's in backports. Four people told me 5 minutes after I got the UNION ALL working).
[12]My left leg, to be precise. The kneecap's already knackered so it's the least loss if anybody breaks it.
[13]INSERT INTO here (bill_clinton_joke);
[14]Da-da da DAH da DAAAAH
[15]I sometimes think that these are the -real- user requirements. What you start off with is often just a set of business rules, and the things the real people burst into tears over when they try to actually use the result are the things that are actually important, as opposed to important on paper (in theory, theory is the same as practice ...)
[16]Several of the ideas we considered and threw out would definitely have caused this sort of clusterfuck, which is why at each stage I've gone through as many scenarios as I can think of in my head and made sure the design we had so far could handle it. This sort of mental exercise becomes quite an effective tool with practice and while trying to design everything up front is always doomed, getting to the point where only the unforeseen problems (plus those questions you decide are safe to punt) are left tends makes the implementation process much smoother.