Wednesday, November 19, 2008

State Pattern using module extension

I try to favour delegation over inheritance. But sometimes Replace Inheritance with Delegation can be difficult. It turns out that when you have an inheritance hierarchy, and state is used in both the superclass and the subclasses, it is hard to remove the hierarchy and replace it with delegation. Let's look at an example. Here we're modeling bikes:

class Bicycle
  def wheel_circumference
    Math::PI * (@wheel_diameter + @tire_diameter)
  end
end

class FrontSuspensionMountainBike < Bicycle
  def off_road_ability
    @tire_diameter * TIRE_WIDTH_FACTOR + @front_fork_travel * FRONT_SUSPENSION_FACTOR
  end
end

class RigidMountainBike < Bicycle
  def off_road_ability
    @tire_diameter * TIRE_WIDTH_FACTOR
  end
end

class RoadBike < Bicycle
  def off_road_ability
    raise "You can't take a road bike off-road"
  end
end

In this hierarchy the @tire_diameter instance variable is used in both the superclass and subclass. You can imagine that if we were to try to have the Bicycle class delegate to a FrontSuspensionMountainBike object, we'd have to duplicate the @tire_diameter state. This becomes a bit awkward, particularly if @tire_diameter can change: you'd have to ensure that the @tire_diameter in Bicycle is kept in sync with the one in FrontSuspensionMountainBike. I'd probably decide that it wasn't worth the effort, and keep the inheritance hierarchy.

But what if we wanted to change the type of bike at run-time? Perhaps we want to upgrade a RigidMountainBike (a bike with no suspension) to a FrontSuspensionMountainBike. Using the state pattern would be ideal, but the traditional state pattern uses delegation. With Ruby modules we have a different option. Rather than represent the FrontSuspensionMountainBike and RigidMountainBike behaviour as subclasses of Bicycle, we could make them modules and extend Bicycle with the appropriate module for the behaviour that we want:

mountain_bike = RigidMountainBike.new
front_suspension_bike = FrontSuspensionMountainBike.new

becomes

mountain_bike = Bicycle.new.extend(RigidMountainBike)
front_suspension_bike = Bicycle.new.extend(FrontSuspensionMountainBike)

class Bicycle
  def wheel_circumference
    Math::PI * (@wheel_diameter + @tire_width)
  end
end

module FrontSuspensionMountainBike
  def off_road_ability
    @tire_width * TIRE_WIDTH_FACTOR + @front_fork.travel * FRONT_SUSPENSION_FACTOR
  end
end

module RigidMountainBike
  def off_road_ability
    @tire_width * TIRE_WIDTH_FACTOR
  end
end


So we could conceivably upgrade our mountain bike at run-time to add a fork with front suspension:

bike = Bicycle.new.extend(RigidMountainBike)
...
bike.add_front_suspension(fork)

module RigidMountainBike...
  def add_front_suspension(fork)
    @front_fork = fork
    extend(FrontSuspensionMountainBike)
  end
end

So we now have the ability to change behaviour at run-time, and we haven't introduced any duplication - the state and behaviour is still shared between the Bicycle class and the modules. But there's a problem:

bike.kind_of?(FrontSuspensionMountainBike) => true    
bike.kind_of?(RigidMountainBike) => true
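
The behaviour can be reproduced in plain Ruby. What follows is only a minimal sketch: the constant values, the Fork struct and the Bicycle constructor are assumptions made up for the illustration, since the post does not define them.

```ruby
# Hypothetical values; the post does not say what these factors are.
TIRE_WIDTH_FACTOR = 2
FRONT_SUSPENSION_FACTOR = 3

# Hypothetical stand-in for a fork object that responds to #travel.
Fork = Struct.new(:travel)

class Bicycle
  # Assumed constructor so the sketch has some state to share.
  def initialize(tire_width)
    @tire_width = tire_width
  end
end

module FrontSuspensionMountainBike
  def off_road_ability
    @tire_width * TIRE_WIDTH_FACTOR + @front_fork.travel * FRONT_SUSPENSION_FACTOR
  end
end

module RigidMountainBike
  def off_road_ability
    @tire_width * TIRE_WIDTH_FACTOR
  end

  def add_front_suspension(fork)
    @front_fork = fork
    # The most recently extended module is searched first during method lookup.
    extend(FrontSuspensionMountainBike)
  end
end

bike = Bicycle.new(2).extend(RigidMountainBike)
puts bike.off_road_ability                        # rigid calculation
bike.add_front_suspension(Fork.new(100))
puts bike.off_road_ability                        # front-suspension calculation
puts bike.kind_of?(FrontSuspensionMountainBike)   # true
puts bike.kind_of?(RigidMountainBike)             # still true
```

The new behaviour takes over because FrontSuspensionMountainBike sits in front of RigidMountainBike in the object's method lookup, but the older module is never removed.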

We were able to mix in the FrontSuspensionMountainBike behaviour, but the RigidMountainBike behaviour still exists on the bike object. It turns out that Ruby doesn't provide the ability to unmix a module. But all is not lost - there's an open-source library called mixology that does exactly what we want.

require 'mixology'

module RigidMountainBike

  def add_front_suspension(fork)
    @front_fork = fork
    unmix(RigidMountainBike)
    mixin(FrontSuspensionMountainBike)
  end

end

Mixology provides two methods: unmix, for removing a module from an object, and mixin, for adding a module to an object. Our bug is now fixed:

bike = Bicycle.new.extend(RigidMountainBike)
...
bike.add_front_suspension(fork)

bike.kind_of?(RigidMountainBike) => false

Tuesday, November 18, 2008

Bored with the rhetoric

During election time, I often find myself contemplating argumentative techniques as I watch the politicians do battle for votes. But it's not electioneering that I'm sick of - when you don't have a TV, live in the mountains, and avoid USA Today, you can moderate your intake pretty easily. It's programmer rhetoric that's gotten me down. We're an opinionated bunch, we developers. I'd like to think there's some relationship between an enjoyment of discrete mathematics and a healthy passion for the subtleties of logical argument, but rationality doesn't always reign supreme. Here are a few examples that I've seen over the past year:

Using the phrases "I'm just being pragmatic", or "my solution is the simplest thing that works" to convince someone of the merits of an idea.
If you respect the person you are arguing with, and you are both aiming for a pragmatic and simple solution (as we all should be), then these statements are not helping. You're trying to discredit the other person's argument by implying that they are not pragmatic, or their solution is too complex. If the two of you agree that one solution is simpler than the other, then focus on why the simpler solution is adequate.

"But that's not Agile"
I see a repeating pattern in software. Over time, we learn from our mistakes. Out of this learning, we come up with some high level goals (such as the four specified in the Agile manifesto). We then develop some set of practices to help guide people in achieving the goals. Then we follow the practices, refining them a bit to make sure the goals are met. But then, after time, we get caught up in the mechanics of the practices, and forget what the goals are all about. Practices are often much easier to understand. Often when I hear "but that's not Agile", it's to do with a practice. It might be "If a story is more than 3 lines long, we're not being Agile". But the manifesto goal is "Customer collaboration over contract negotiation". So long as we're not sacrificing customer collaboration with our 4th line on our story, then it shouldn't be a problem. And if you think we're hurting our customer collaboration, focus on that in the argument.

The Logical Fallacy
"see, that's why we need continuous integration"
This quote comes from me. I said it. I use programmer rhetoric too :( But I'm trying to get better... Anyway, I was arguing with my project manager about our need for continuous integration. We were developing on Macs, and deploying to Linux. We had a problem that was only exposed on Linux. I had been advocating for a Linux machine to run Cruise Control. So I jumped on the chance to further my argument. "See, that's why we need continuous integration". But my project manager called me out on it. He's a smart guy. And he said "So are you telling me that you would have had a test that exposed this problem?". I had to admit that no, I wouldn't have. I'd used a logical fallacy. Yes, continuous integration helps expose bugs. Yes, we had a bug. Both of those statements are true. But it does not follow that continuous integration would have exposed this bug. Shame on me...

As Steven Levitt says in Freakonomics, professionals tend to take advantage of their specialised knowledge to better themselves. He's talking specifically about realtors in one chapter, but programmers aren't above this behaviour. The skills and knowledge that we have gained in order to do our jobs are pretty rare, and we're privileged to be in the position we're in. But all too often I see developers taking advantage of their specialised knowledge during an argument and misleading their opponent by talking about something that the opponent doesn't understand. If you want to win the argument, explain your reasoning in a language that your adversary understands.

And yes, I do understand the irony of using rhetoric to describe my frustration with rhetoric. But I swear it's for good, not evil.

Wednesday, September 24, 2008

Discount for Voices That Matter Ruby Conference

I'll be speaking at the Voices That Matter Professional Ruby Conference. You can get a discount of $200 if you book using this code: PRDPKRL.

soylent green is people. so is software

I did a presentation on pair programming at a client a while back and it caused me to reflect a bit on the virtues of pairing. During my research, as is often the case for me, I found someone that articulated the benefits (and concerns) far better than I could. He may not have had pair programming in mind, but it is amazingly apt:

"Only if the various principles - names, definitions, intimations and perceptions - are laboriously tested and rubbed one against the other in a reconciliatory tone, without ill will during the discussion, only then will insight and reason radiate forth in each case, and achieve what is for man the highest possible force..." - Plato

Despite the somewhat sexist use of the word 'man' as a synonym for the human race, Plato has done a good job here of describing the process of argument and some of the things to watch out for. The section that resonated most with me was the notion of perceptions being "laboriously tested and rubbed one against the other". This is where the true advantage of pair programming comes to the fore. By rubbing one idea against another, new ideas are born... is there a euphemism there? Anyway, the point is that you may start out with a couple of different ideas, but the process of discussion will often modify these ideas, combine them, or spawn completely new ideas. I've even had misunderstandings inspire orthogonal solutions. One participant will explain an idea, and the second participant will misunderstand, but try to build on their (mis)understanding, sometimes even coming up with a superior design.

But you don't need to take my word for it. A study undertaken by Alistair Cockburn and Laurie Williams highlighted the benefits of pair programming. During their research, they found that pair programming improves design quality, reduces defects, reduces staffing risk, enhances technical skills through knowledge transfer and improves team communication. It costs about 15% more in development time (not double, as one might intuit). The cost in extra development time is more than compensated for (by at least an order of magnitude) with savings in reduced defects.

Pair programming was also considered a more enjoyable method of development by the participants in this study. I certainly find it more enjoyable, though I can understand that some software developers don't. If you're a personality who doesn't enjoy constant communication and collaboration then pairing can be tiresome. But, as my friend z once wrote "Soylent Green is people. So is software". Many difficulties associated with the software development process stem from poor communication, and pair programming can go a long way to addressing these problems.

Refactoring: Ruby Edition is now available on Safari

Refactoring: Ruby Edition is now available on Safari as a Rough Cut. It's also available for pre-order on Amazon.

Sunday, January 27, 2008

The evolution of a Domain in rails: Part 2 Separate Query From Modifier

When working on Refactoring, Ruby Edition, I realised that Separate Query From Modifier was one of my favourite refactorings. I'd been doing it for a while without realising that it had a formal name. I probably hadn't realised the full benefits of it either. From Refactoring, Ruby Edition: "When you have a function that gives you a value and has no observable side effects, you have a very valuable thing. You can call this function as often as you like. You can move the call to other places in the method. In short, you have a lot less to worry about." If you do not separate the querying code from the modifying code, code becomes difficult to understand and re-use. When I'm trying to track down a bug, I'm looking for two things: 1. The code that triggers the offending code (the query), and 2. the offending code itself (the modifier). If I have to search through a method that makes a query, then does some modification, then does another query, and some more modification, then my head starts to hurt. Which query returns the result that triggers the bug? And which modification is the bad modification? If we can separate the querying code from the modifying code, we can often achieve a better abstraction of our business rules and promote re-use. As a project evolves, we often have to introduce new trigger points for state changes. And in an agile environment, I often see the story cards evolve like this:

Story 1: Under condition 'X', 'A' should change such that...
Story 2: Under condition 'X', warn the user before making the change to 'A'

As Agile developers, we like this kind of story breakdown. After completing story 1, we can demonstrate to the user that we understand condition 'X' and the changes that should be made to 'A'. If we get it wrong, then we can fix it. Story 2 is the icing on the cake. It provides some nicer usability around the feature. It might also be a lower priority, and we might be able to release the code without Story 2 and gain some real business value before we polish it later. So the separation of the stories could be important. But if we mix query and modifier, it can become very difficult to introduce the warning, or introduce new trigger points for the desired state change.

But I've realised that Separate Query From Modifier is not always as easy as extracting conditional logic to one method, and having the modifying logic in another method. In my last post I said that in Rails, it can sometimes be difficult to re-wire multiple ActiveRecord objects of different type together according to some set of business rules without using the database as a storage mechanism, and without the validations getting in your way. We've often ended up with complex service methods that perform a query, do some modification (saving to the database), perform another query, do some more modification, and so on. You might end up with 5 or 6 queries that have to be performed in sequence. Later queries might depend on modifications that have been performed as a result of earlier queries. And the code is ugly, and difficult to re-use.

One way that we've solved this problem is to create a results object that represents the new relationships to be created. It's just a plain old Ruby object with some attributes. Let's say we're trying to build a new 'A' object, with relationships to B and C. Let's say A has_many B, and A has_one C. I'd create a results object called NewA, with an array attribute for the Bs and an attribute for the C. As I go through my algorithm, I can add my Bs and my C to the results object - without actually changing any underlying associations. (I now have a query without the modifier). Toward the end of the algorithm, I can present my results object to the user for confirmation, and if they confirm the change, then I can grab my results object and make the actual associations in the database (the modifier). I might even find that I can move some behaviour to this results object, and it will cease being a dumb data object. But even if I don't get to move any behaviour, the separation of query and modifier is worth the effort.
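
The idea can be sketched in plain Ruby. This is an illustration only, not code from the project: the A, B and C structs stand in for ActiveRecord models, the rule that skips unnamed Bs is invented, and plan_new_a and apply_new_a are hypothetical names.

```ruby
# Plain Ruby stand-ins for the models. In the application these would be
# ActiveRecord objects; every name in this sketch is hypothetical.
A = Struct.new(:bs, :c)
B = Struct.new(:name)
C = Struct.new(:name)

# The results object: it records the relationships we intend to create
# without touching any real association.
class NewA
  attr_reader :bs
  attr_accessor :c

  def initialize
    @bs = []
  end

  # An example of behaviour that could migrate onto the results object:
  # a description of the pending change for the confirmation step.
  def summary
    "A will have #{bs.size} B(s) and C '#{c&.name}'"
  end
end

# Query: works out what the new A should look like. No side effects.
def plan_new_a(candidate_bs, candidate_c)
  result = NewA.new
  # Invented business rule for the sketch: unnamed Bs are skipped.
  candidate_bs.each { |b| result.bs << b unless b.name.empty? }
  result.c = candidate_c
  result
end

# Modifier: applies a confirmed plan. In Rails this is where the
# associations would be made and saved.
def apply_new_a(plan)
  A.new(plan.bs.dup, plan.c)
end

plan = plan_new_a([B.new("first"), B.new(""), B.new("second")], C.new("only"))
puts plan.summary        # shown to the user before anything changes
a = apply_new_a(plan)    # run only once the user confirms
puts a.bs.map(&:name).inspect
```

Because plan_new_a changes nothing, it can be called as often as needed, for example once to build the warning and again from a different trigger point.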

Friday, January 25, 2008

The evolution of a Domain in rails: Part 1

I've been at my current client for about 15 months now, and during that time we've been working in fairly complex domains. Whilst our application started out very CRUD-like, it gradually moved away from CRUD in its simplest form (resources with simple attributes) to quite complex domain models. We now have quite a few functions that cause the interaction of 10-plus domain objects and often require the re-wiring of these objects according to a change in state triggered by a user. I was recently asked by a colleague to explain the limitations of ActiveRecord and some ways that we've overcome those issues. His concern was that Rails' tight coupling to the database would lead to anaemic domain objects that are effectively just DTOs, with all of the business logic creeping up into the view.

We've never ended up with business logic in the view, but we have made mistakes such that this behaviour has ended up in non-ideal parts of the application. Our first mistake was to place too much business logic in the controllers. This came about when we were trying to orchestrate the interactions between (say) three domain objects of different type. We asked ourselves "This behaviour doesn't belong on any one of the three domain objects, so where should it go?". Rails makes a clear distinction between Model, View, and Controller, and most of the examples only show "model" objects as being those that inherit from ActiveRecord::Base (and are therefore stored in a database). So given that we didn't want this business logic in the view, and it didn't belong on any of the "model" objects that we already had, the only place left was the controller. So our controllers got fat. And they were hard to test. And hard to change. And the business logic was very hard to re-use. And our model objects were thin (you might even say anaemic). Boo.

Fresh from reading Domain Driven Design, our teammate Pat Sarnacke came to the rescue, saying "We need services". And it's interesting to note that we were not alone in this discovery. So we extracted the business logic from our complex controller actions into Service objects. (From Domain Driven Design, service objects are objects "with no state of their own nor any meaning in the domain beyond the operation they host"). We got quite a bit of re-use of this logic by doing so. It was definitely a step in the right direction. But then our Services started to grow. They became large and complex, and although they displayed the positive trait of having low coupling (with perhaps only one public method), they weren't very cohesive. We extracted private methods that were cohesive within themselves, but the collection of private methods made no sense together, other than the fact that they were part of the "procedure" of the service. And so the service became hard to understand in isolation, and not particularly amenable to re-use. If you wanted to re-use the service as a whole, then you were in good shape (much better than when the logic was in the controller). But if you wanted to re-use one of the private methods, it wasn't so easy (and not simply because they were private).

It turns out that, in some cases, it can be difficult to re-wire multiple ActiveRecord objects of different type together according to some set of business rules without using the database as a temporary storage mechanism. We would often end up with service classes that would first associate model A to model B, save model A, and then query model B for some further modifications to be performed. We'd have to reload model B in order to make this query (because it depended on the relationship to A). And so our services ended up being a sprinkling of save!s and reloads. And this was not because it was impossible to perform this logic in any other way, but because it was much easier to make the association, save it, and then go on our merry way through the rest of the algorithm.

And this worked for a little while. But any sub-section of the service was almost impossible to re-use. The order of association became very fragile because of our validations - you'd have to disassociate C from A before associating B to A because A couldn't have both B and C. And perhaps C needed a replacement for A to be valid, so you'd have to find C a new 'A' before you could save C. And so not only was the service procedural, but it was strictly procedural - you had to perform each step in a defined order. This made re-use very difficult. And what we really wanted to do was to move some of the logic onto domain objects, but with all the saving and reloading that was going on, it seemed almost impossible to extract logic that made any sense outside of the context of our complex algorithm.

And the killer came when we were asked to warn the user before we made this complex change, and give them information about the change that was about to be made so that they could decide whether or not to proceed. In order to know whether the change was going to be made, and what the change would look like, we'd have to traverse the entire algorithm, by which time the change would have been made. In short, we couldn't separate the querying code from the modifying code (see Separate Query from Modifier in Refactoring). This was the reason we couldn't fulfill the feature, and the reason that we couldn't move cohesive logic to the domain objects.

It took us two steps to solve these two problems (fragility and Separate Query From Modifier). I'll describe the solution to the fragility here, and the solution to Separate Query from Modifier in the next post.

Fragility


Our validations started to dictate the order in which our objects had to be saved (and therefore the order of the steps in our algorithm), which made re-use of code very difficult. So we decided that we needed a way to save without triggering the validations, but ensure that only valid objects were left in the database after the algorithm had finished. ActiveRecord provides a method called save_without_validation that takes care of our first requirement. And if we called save_without_validation within a transaction, and then raised an Exception if any of the objects were invalid at the end of the algorithm, then the transaction would roll back, and we wouldn't have invalid records in the database.

So we added a method called save_and_record_without_validation to ActiveRecord::Base which calls save_without_validation on the object, and stores the object in the Thread.current hash so that we can go back at the end of the algorithm and ensure that it is valid. We have a method called validate_recorded_records! which takes a block that contains the calls to save_and_record_without_validation, and then validates after the block has been executed. The calling code might look something like this:

 1 old_parent = Parent.find_by_name("oldie")
 2 new_parent = Parent.find_by_name("newbie")
 3 new_child = Child.find(5)
 4 old_child = old_parent.child
 5 old_child.parent = new_parent
 6 old_parent.child = new_child
 7
 8 Parent.transaction do
 9   validate_recorded_records! do
10     old_child.save_and_record_without_validation
11     old_parent.save_and_record_without_validation
12   end
13 end


And if line 10 causes an invalid object until line 11 is executed, then it doesn't matter, because the validation only gets performed after the validate_recorded_records! block has been executed. The source code for our RecordingHelper can be found here and the tests here
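
For readers without access to that source, here is a rough sketch of how such a helper could work. It is not the project's RecordingHelper: it runs without Rails, FakeRecord is a made-up in-memory stand-in for a model, and the Thread.current key and the InvalidRecordedRecords error are assumed names. Only the two method names come from the post.

```ruby
# Hypothetical error raised when a recorded object is still invalid.
class InvalidRecordedRecords < StandardError; end

module RecordingHelper
  KEY = :recorded_records   # assumed key in the Thread.current hash

  # Saves without validating and remembers the object for a later check.
  def save_and_record_without_validation
    save_without_validation
    (Thread.current[KEY] ||= []) << self
  end
end

# Runs the block, then validates everything recorded inside it. Raising
# here is what would make an enclosing transaction roll back.
def validate_recorded_records!
  Thread.current[RecordingHelper::KEY] = []
  yield
  invalid = Thread.current[RecordingHelper::KEY].reject(&:valid?)
  raise InvalidRecordedRecords, "#{invalid.size} invalid record(s)" unless invalid.empty?
ensure
  Thread.current[RecordingHelper::KEY] = nil
end

# Made-up stand-in for a model: valid only once it has a partner.
class FakeRecord
  include RecordingHelper
  attr_accessor :partner
  attr_reader :saved

  def save_without_validation
    @saved = true   # a real model would write to the database here
  end

  def valid?
    !partner.nil?
  end
end

first = FakeRecord.new
second = FakeRecord.new

validate_recorded_records! do
  first.save_and_record_without_validation    # invalid at this point
  first.partner = second
  second.partner = first
  second.save_and_record_without_validation
end
puts "both records were valid by the end of the block"
```

The sketch leaves out the transaction, so nothing is undone when validation fails. In the real setup the raised exception is what triggers the rollback.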

So this solved our fragility problem - the order of execution of the statements no longer mattered as much, which enabled us to extract methods that could be re-used. But it still doesn't separate query from modifier - we'll tackle that in the next post.