Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1.
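The original listing didn't make it into this page, but the procedure is mechanical enough to sketch in a few lines of Java (the names are mine, not the author's), detecting a cycle with a set of values already seen:

```java
import java.util.HashSet;
import java.util.Set;

public class DigitSquares {
    // Sum of the squares of the decimal digits of n.
    public static int sumOfDigitSquares(int n) {
        int sum = 0;
        while (n > 0) {
            int d = n % 10;
            sum += d * d;
            n /= 10;
        }
        return sum;
    }

    // True if repeated application reaches 1; false if it enters a cycle.
    public static boolean reachesOne(int n) {
        Set<Integer> seen = new HashSet<>();
        while (n != 1 && seen.add(n)) {
            n = sumOfDigitSquares(n);
        }
        return n == 1;
    }
}
```

For example, 19 goes 82, 68, 100, 1, while 2 falls into the cycle 4, 16, 37, 58, 89, 145, 42, 20 and never reaches 1.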
Albert and Bernard just became friends with Cheryl, and they want to know when her birthday is. Cheryl gives them a list of 10 possible dates.
May 15, May 16, May 19
June 17, June 18
July 14, July 16
August 14, August 15, August 17
Cheryl then tells Albert and Bernard separately the month and the day of her birthday respectively.
Albert: I don’t know when Cheryl’s birthday is, but I know that Bernard does not know too.
Bernard: At first I don’t know when Cheryl’s birthday is, but I know now.
Albert: Then I also know when Cheryl’s birthday is.
So when is Cheryl’s birthday?
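The puzzle yields to brute force: just filter the date list one statement at a time. A Java sketch (class and method names are mine, not the original listing):

```java
import java.util.List;
import java.util.stream.Collectors;

public class Cheryl {
    static final List<String[]> DATES = List.of(
        new String[]{"May", "15"}, new String[]{"May", "16"}, new String[]{"May", "19"},
        new String[]{"June", "17"}, new String[]{"June", "18"},
        new String[]{"July", "14"}, new String[]{"July", "16"},
        new String[]{"August", "14"}, new String[]{"August", "15"}, new String[]{"August", "17"}
    );

    // How many remaining dates have the given value in the given field
    // (field 0 is the month, field 1 is the day).
    static long countBy(List<String[]> dates, int field, String value) {
        return dates.stream().filter(d -> d[field].equals(value)).count();
    }

    public static String solve() {
        // Albert knows Bernard doesn't know: Cheryl's month contains no day
        // that is unique across all ten dates.
        List<String[]> s1 = DATES.stream()
            .filter(d -> DATES.stream()
                .filter(e -> e[0].equals(d[0]))
                .allMatch(e -> countBy(DATES, 1, e[1]) > 1))
            .collect(Collectors.toList());
        // Bernard now knows: his day is unique among the remaining dates.
        List<String[]> s2 = s1.stream()
            .filter(d -> countBy(s1, 1, d[1]) == 1)
            .collect(Collectors.toList());
        // Albert now knows: his month is unique among the remaining dates.
        List<String[]> s3 = s2.stream()
            .filter(d -> countBy(s2, 0, d[0]) == 1)
            .collect(Collectors.toList());
        return s3.get(0)[0] + " " + s3.get(0)[1];
    }
}
```

Running solve() returns July 16.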
Yay.
For example, consider this Scalding job:
Testing this job end-to-end would be fragile because there is so much going on, and it would be tedious and noisy to build fake data to isolate and highlight edge cases. The pivot operations on lines 20-22 only deal with browser and country, yet test data with all 10 fields is required, including valid timestamps and user agents, just to get to the pivot logic.
There are a few ways to tackle this, and an approach I like is to use extension methods to break down the logic into smaller chunks of testable code. The result might look something like this.
Each block of code depends on only a few fields so it doesn’t require mocking the entire input set.
In this example only browser and country are required, so setting up test data is reasonably painless and the intent of the test case isn’t lost in a sea of tuples. Granted, this approach requires creating a helper job to set up the input and capture the output for test assertions, but I think it’s a worthwhile trade-off to reveal such a clear test case.
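To make the shape of the idea concrete outside of Scalding: the pivot logic becomes a pure function over just the fields it needs, so a test can feed it tiny two-field rows. A rough Java sketch (names here are illustrative, not from the job above):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PivotLogic {
    // Count occurrences of each (browser, country) pair.
    // Depends on exactly two fields, so test data stays tiny.
    public static Map<String, Integer> countByBrowserAndCountry(List<String[]> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            String key = row[0] + "/" + row[1]; // browser + country
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

A test for this needs two strings per row, not ten fields with valid timestamps and user agents.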
Option. The problem is I’d like to add a property to my data model, and I don’t want it to be optional because it’s not.
In Java land, I probably would have added the property and thought about how to deal with migrating old data later, but not in Scala. This battle has to be fought right now. That’s what type safe means, and it’s why the NullPointerException has all but died in pure Scala apps, not unlike the carpal tunnel epidemic.
In my case I’ve got some data in Couchbase and when I add a new property to my data model, I won’t be able to read the old data. What I need to do is transform the json before hydrating the object. Fortunately, this is a snap by using the rich transformation features described in JSON Coast-to-Coast.
My approach was to create a subclass of Reads[A] called ReadsWithDefaults[A] that reads json and uses a transformation to merge default values. It looks like this:
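The listing itself is Play-specific, but the essence of the transformation is an overlay of parsed values on top of defaults, so old documents pick up defaults for properties that didn’t exist when they were written. Roughly, in Java terms (names mine):

```java
import java.util.HashMap;
import java.util.Map;

public class Defaults {
    // Overlay parsed values on top of defaults; parsed values win,
    // and any property missing from the parsed data falls back
    // to its default before the object is hydrated.
    public static Map<String, Object> withDefaults(Map<String, Object> defaults,
                                                   Map<String, Object> parsed) {
        Map<String, Object> merged = new HashMap<>(defaults);
        merged.putAll(parsed);
        return merged;
    }
}
```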
Integrating with Google spreadsheets to manage application data unlocks powerful possibilities and will take you further than anything you could scaffold up in a comparable amount of time. To be clear, I’m not suggesting you take a runtime dependency on Google docs. I’m talking about using a lightweight business process that takes advantage of the rich collaboration features of Google docs and simply importing the data into your database using the API.
Google docs keeps a revision history of what was changed, when and by whom. That’s a pretty cool feature and one you’re not likely to build when there are more pressing customer facing features to work on.
Sometimes release notes aren’t detailed enough when trying to correlate a surprising change in a KPI, so it’s reassuring to have the nitty gritty revision history there when you need to dig a little deeper. It’s just like reviewing source control history when tracking down a bug that was introduced in the latest code release. Your data deserves the same respect.
My older brother carries a pocket knife at all times, and when he believes in something, he can’t help but sell everyone else on the idea. He used to tell me I needed to carry a pocket knife so I could cut off my seatbelt when trapped in a burning (or sinking) car. In his defense, GM recalled 240,000 SUVs in 2010 citing “…[the seat belt] may seem to be jammed.”
Naturally, I started carrying a pocket knife too. I figured that along with untrapping myself from a burning car or defending myself at an ATM, it would be handy when I needed to open a box. To my surprise, new opportunities presented themselves and I was using my knife several times a day. I could cut off a hanging thread, scrape bee pollen off my windshield, improve the vent in my coffee lid, remove a scratchy tag from my son’s shirt and yes, open a box.
It turns out I had a similar experience with concurrent editing. Working on the same document at the same time with multiple people is pretty cool, and you’d be surprised how this powerful ability can change the way you work. There’s no way you would justify an extravagant feature like this in your own back office tools and you get it for free with Google docs.
Since it’s a spreadsheet, you can use formulas and this is cooler than you might think. Consider a list of products. Chances are the product prices have some kind of relationship that you can capture with a formula and save yourself from tedious and error prone manual entry.
In reality though, you probably don’t care about the actual product prices as much as you care about your margins or some other derived number. With a spreadsheet, you can keep your formula together with your data. When you change a price, you can see how it affects your bottom line in a calculated field a few cells over. Or work backwards and calculate the price based on a combination of other factors. Or just use a formula as a sanity check that all your numbers add up to the right amount.
Again, this is the kind of feature that invites new ways of working. You might use a formula to find a bug before the data ever hits your app or add a new formula so you don’t get burned twice by the same mistake. A spreadsheet gives you a place to capture knowledge about your data. You could capture this knowledge in a wiki, but the fact that it’s in a Google spreadsheet that’s connected to your app makes it real. It’s the difference between a code comment and a unit test.
A data update is a deployment just like a code deployment and things can go horribly wrong when there’s a mistake in your data. Big companies get this and respond with change advisory boards and more process to protect themselves, but you can do better.
Your data deserves a repeatable one-click deployment process and you shouldn’t settle for anything less. An import is idempotent, which means you can run it multiple times and the end result is always the same. In other words, you can practice in a test environment and iterate until you get it right. Oh, and if you spin off a copy prior to making changes, you get one-click rollback too.
One-click deployment and rollback with revision history for your data without developer interaction. That’s a really big deal because the end result is confidence. Confidence means that as an organization, you can make decisions, execute and not look back.
Empower your team (not just developers) with a familiar tool like a spreadsheet and give them the opportunity to impress you. Watch as calculations and charts emerge to add deeper meaning to your data. Watch as collaboration occurs and team members are brought together rather than divided by functional roles. Watch as confidence increases in your ability to deploy changes.
Seriously. Don’t waste your time half-assing back office tools when you can invest a similar amount of effort to import data from Google spreadsheets. It’s a scrappy tool that delivers on the features that accelerate the way you work.
If you can do a half-assed job of anything, you’re a one-eyed man in a kingdom of the blind. – Kurt Vonnegut
And this Scala case class to hold the documents retrieved from the view.
It’s a common scenario to retrieve multiple documents at once, and the Java driver has a pretty straightforward API for that. The desired keys are simply specified as a json array.
The Java driver deals with strings, so it’s up to the client application to handle the json parsing. That was an excellent design decision and makes using the Java driver from Scala pretty painless. I’m using the Play Framework json libraries and an extension method _.as[TermOccurrence] defined on ViewRow to simplify the mapping of the response documents to Scala objects.
In order for this extension method to work, it requires an implicit Format[TermOccurrence], which is defined on the TermOccurrence companion object.
In my case, the left side of the join contained about 100K records while the right side was closer to 1B. Emitting all join keys from the mapper means that all 1B records from the right side of the join are shuffled, sorted and sent to a reducer. The reducer then ends up discarding most of the join keys that don’t match the left side.
Any best practices guide will tell you to push more work into the mapper. In the case of a join, that means dropping records in the mapper that will end up getting dropped by the reducer anyway. In order to do that, the mapper needs to know if a particular join key exists on the left hand side.
An easy way to accomplish this is to put the smaller dataset into the DistributedCache and then load all the join keys into a HashSet that the mapper can do a lookup against.
This totally works, but consumes enough memory that I was occasionally getting java.lang.OutOfMemoryError: Java heap space from the mappers. Enter the Bloom filter.
A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not. Elements can be added to the set, but not removed. The more elements that are added to the set, the larger the probability of false positives. -Wikipedia
I hadn’t heard of a Bloom filter before taking Algorithms: Design and Analysis, Part 1. If not for the course, I’m pretty sure I would have skimmed over the innocuous reference while poking around the Hadoop documentation. Fortunately, recent exposure made the term jump out at me and I quickly recognized it was exactly what I was looking for.
When I took the course, I thought the Bloom filter was an interesting idea that I wasn’t likely to use anytime soon, because I hadn’t needed one yet and I’ve been programming professionally for more than a few years. But you don’t know what you don’t know, right? It’s like thinking about buying a car you’d never noticed before and suddenly seeing it everywhere.
The documentation is thin, with little more than variable names to glean meaning from.
vectorSize - The vector size of this filter.
nbHash - The number of hash functions to consider.
hashType - Type of the hashing function (see Hash).

I know what you’re thinking. What could be more helpful than “The vector size of this filter” as a description for vectorSize? Well, the basic idea is there’s a trade-off between space, speed and probability of a false positive. Here’s how I think about it:

vectorSize - The amount of memory used to store hash keys. Larger values are less likely to yield false positives. If the value is too large, you might as well use a HashSet.
nbHash - The number of times to hash the key. Larger numbers are less likely to yield false positives at the expense of additional computation effort. Expect diminishing returns on larger values.
hashType - Type of the hashing function (see Hash). The Hash documentation was reasonable so I’m not going to add anything.

I used trial and error to figure out numbers that were good for my constraints.
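To make the trade-off concrete, here’s a toy Bloom filter in plain Java. This is not the Hadoop class, just the idea: each key sets nbHash bits scattered across a vectorSize-bit vector, and a membership test can return a false positive but never a false negative.

```java
import java.util.BitSet;

public class ToyBloomFilter {
    private final BitSet bits;
    private final int vectorSize; // memory used to store hash keys
    private final int nbHash;     // times each key is hashed

    public ToyBloomFilter(int vectorSize, int nbHash) {
        this.bits = new BitSet(vectorSize);
        this.vectorSize = vectorSize;
        this.nbHash = nbHash;
    }

    // Derive nbHash different indexes from one hash code. A real
    // implementation would use independent hash functions.
    private int index(String key, int i) {
        int h = key.hashCode() * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, vectorSize);
    }

    public void add(String key) {
        for (int i = 0; i < nbHash; i++) bits.set(index(key, i));
    }

    // May return a false positive, but never a false negative.
    public boolean membershipTest(String key) {
        for (int i = 0; i < nbHash; i++)
            if (!bits.get(index(key, i))) return false;
        return true;
    }
}
```

With vectorSize fixed, adding more keys raises the false positive rate; raising nbHash buys accuracy with extra computation, with diminishing returns.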
When you have your numbers worked out, simply swap out the HashSet for the BloomFilter and then blog about it.
Given a URL like http://www.google.com/?q=tld+uk, extract the root domain. In this case, it would be google.com. Sounds easy, right? Well it is, but http://www.google.com.br/ is also legit. Okay, so recursion to the rescue!
Not so fast…
.ac.uk .co.uk .gov.uk .parliament.uk .police.uk ... .metro.tokyo.jp .city.(cityname).(prefecturename).jp ...
http://en.wikipedia.org/wiki/.uk
http://en.wikipedia.org/wiki/.jp
Doh! Surely George Clinton is not happy about this and neither am I because it means I’m stuck doing a lookup against a list of arbitrary domains.
Using java.net.URI takes care of extracting the host, so the interesting part is parsing the host into a list of chunks that decrease in number of segments.
An implementation of splitIntoChunks isn’t terribly exciting, but probably a good interview question. How about an implementation that doesn’t mutate state? Sounds fun, but why make this more difficult? It’s not because I want to run this bit of code on multiple cores or distributed across machines, but because it challenges me to change the way I think about simple problems so that solving more complicated problems using functional idioms feels more natural. After all, when something is painful, you should do it more often. You know, like push-ups and deployments.
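The original was Scala; for what it’s worth, here’s how a no-mutation version might look translated to Java (splitIntoChunks is the only name taken from the text):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Domains {
    // "www.google.com.br" ->
    // ["www.google.com.br", "google.com.br", "com.br", "br"]
    // Each recursive step drops the left-most segment; nothing is mutated.
    public static List<String> splitIntoChunks(String host) {
        int dot = host.indexOf('.');
        if (dot < 0) return List.of(host);
        return Stream.concat(
                Stream.of(host),
                splitIntoChunks(host.substring(dot + 1)).stream())
            .collect(Collectors.toList());
    }
}
```

The first chunk in the list that appears in the suffix lookup table marks the boundary of the root domain.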
Scala ftw.
I banged out a naive implementation using nested loops that I could use for comparison to make sure I was implementing the real algorithm correctly.
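The naive version is just a pair of nested loops. A reconstruction in Java (not the author’s exact code):

```java
public class Inversions {
    // O(n^2) brute force: count pairs (i, j) with i < j and a[i] > a[j].
    // Slow, but a handy oracle for checking the fast version.
    public static long countNaive(int[] a) {
        long inversions = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] > a[j]) inversions++;
        return inversions;
    }
}
```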
The algorithm described in lecture is a divide and conquer algorithm based on a variation of merge sort and, as you might expect from its lineage, runs in O(n log(n)) time.
* The guts of the implementation are not shown per the Coursera honor code.
The assignment was to run this algorithm on a list of 100,000 integers and report the number of inversions. I ran both implementations as a sanity check.
It’s one thing to crunch numbers on a calculator or plot graphs to gain an intuition for algorithm complexity, but it’s quite a bit more meaningful to wait 15 seconds while a 2.7GHz quad-core Intel Core i7 grinds through ~5 billion comparisons in the O(n^2) implementation, versus a mere 140 ms to zip through 1.6 million comparisons using the O(n log(n)) implementation.
Yeah, science!
This bit of innocuous code encourages the reader to squander the power of dependency inversion and reduce it to a clunky tool that makes unit testing a little bit easier. That sounds harsh, so let’s start by discussing what Guice is and the problem it solves.
Guice and the like are referred to as IoC containers, where IoC stands for Inversion of Control. It’s a pretty general principle and, when applied to object-oriented programming, it manifests itself in the form of a technique called Dependency Inversion. In terms of the BillingService example, it means the code depends on a CreditCardProcessor abstraction rather than new’ing something specific like a PayPalCreditCardProcessor. Perhaps depends is an overloaded term here. With or without the new keyword, there is a dependency. In one case, a higher-level module is responsible for deciding what implementation to use, and in the other case, the class itself decides that it’s using a PayPalCreditCardProcessor, period.
Writing all your classes to declare their dependencies leaves you with the tedious task of building up complex object graphs before you can actually use your object. This is where Guice comes in. It’s a tool to simplify the havoc wreaked by inverting your dependencies, and it’s inevitable when guided by a few principles like DRY (Don’t Repeat Yourself). If you don’t believe me, go ahead and see for yourself. Write some truly SOLID code and you’ll end up writing an IoC container in the process.
So now that we’ve covered what Guice is and the problem it solves, we are ready to talk about what’s wrong with @PayPal. Specifying the concrete class you expect with an annotation is pretty much the same as just declaring the dependency explicitly. Sure, you get a few points for using an interface and injecting the object, but it’s really just going through the motions while entirely missing the point. It would be like the Karate Kid going into auto detailing after learning wax-on, wax-off.
Abstractions create seams in your code. It’s how you add new behavior as the application evolves, and it’s the key to managing complexity. Since we’re looking at a billing example, let’s throw out a few requirements that could pop up. How about some protection against running the same transaction twice in a short time period? How about checking a blacklist of credit cards or customers? Or maybe you need a card number that always fails in a particular way so QA can test the sad path. Or maybe your company works with a few payment gateways and wants to choose the least-cost option based on the charge amount or card type. In this little snippet of code, we’ve got two seams we can use to work in this behavior: the BillingService and the CreditCardProcessor.
Oh, wait a minute. We’re declaring that we need the PayPalCreditCardProcessor with that annotation, so now our code is rigid and we can’t inject additional behavior by wrapping it in a DoubleChargeCreditCardProcessor, open-closed style. That’s the ‘O’ in SOLID. So you’re probably thinking, why can’t you just change the annotation from @PayPal to @DoubleCharge? Let’s dive a little deeper into this example to find out:
I’m not going to rant about how extends is evil and that you’re better off with a decorator, because I’ve already done that, and this article is about how to wire up a decorator with Guice. So the challenge here is how to configure the container to supply the correct credit card processor as the first dependency of our double charge processor, which itself implements CreditCardProcessor. Looking at the Guice documentation, you would likely think the answer is to do this:
That’s wrong though. The CreditCardProcessor isn’t a thing, it’s a seam, and it’s where you put additional behavior like preventing duplicate charges in a short time period. If you look at the decorator, you’ll notice that it has nothing to do with PayPal. That’s because it’s a business rule and shouldn’t be mixed with integration code. Our business rule code and the PayPal integration code will likely live in different packages, and the CreditCardProcessor abstraction could get assembled differently for any number of reasons. Maybe your application supports multi-tenancy and each tenant can use a different payment gateway. We can’t reuse our double charge business rule if it’s hard-coded to wrap a PayPal processor, and that’s a problem.
While I don’t particularly like using annotations for this sort of thing, they’re not the root cause. As a mechanism, they work just fine and can help us accomplish our task. The problem is that the documentation is subtly wrong and encourages misuse of this feature. The better way to use binding annotations without undermining the point of injecting your dependencies is like so:
The difference is subtle, but the devil is in the details. In this last example, the DoubleChargeCreditCardProcessor doesn’t know or care what implementation it’s decorating. It simply declares a name for its dependency so it can be referenced unambiguously in a configuration module. This moves the configuration logic to… well, configuration code. Now you can see that the code is once again flexible, and you can easily imagine more sophisticated configuration logic that could consider tenant settings or environment variables in selecting the proper combination of credit card processors to assemble.
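Stripping away the container makes the point easier to see: the decorator takes whatever CreditCardProcessor it’s handed, and only the composition root knows the concrete chain. A plain-Java sketch (the processor names follow the post; the duplicate-charge rule here is a simplified stand-in):

```java
import java.util.HashSet;
import java.util.Set;

interface CreditCardProcessor {
    boolean charge(String cardNumber, int amountCents);
}

// Integration code: in the real post this talks to PayPal.
class FakePayPalCreditCardProcessor implements CreditCardProcessor {
    public boolean charge(String cardNumber, int amountCents) {
        return true; // pretend the gateway accepted it
    }
}

// Business rule: reject a repeat of an identical charge.
// Note that it neither knows nor cares what it decorates.
class DoubleChargeCreditCardProcessor implements CreditCardProcessor {
    private final CreditCardProcessor inner;
    private final Set<String> seen = new HashSet<>();

    DoubleChargeCreditCardProcessor(CreditCardProcessor inner) {
        this.inner = inner;
    }

    public boolean charge(String cardNumber, int amountCents) {
        if (!seen.add(cardNumber + ":" + amountCents)) return false;
        return inner.charge(cardNumber, amountCents);
    }
}
```

The composition root, whether a Guice module or a plain new expression, picks the chain: new DoubleChargeCreditCardProcessor(new FakePayPalCreditCardProcessor()). Swapping the gateway per tenant touches only that one spot.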
Having added a tenant_id to most of the tables in our database, we’ll need the application to take care of a few things. First, we need to apply a where clause to all queries to ensure that each tenant sees only their own data. This is pretty painless with NHibernate; we just have to define a parameterized filter:
And then apply it to each entity:
The last step is to set the value of the filter at runtime. This is done on the ISession like this:
The current tenant comes from c.Resolve&lt;Tenant&gt;(). In order for that to work, you have to tell Unity how to find the current tenant. In ASP.NET MVC, we can look at the host header on the request and find our tenant that way. We could just as easily use another strategy, though. If this were a WCF service, we could use an authentication header to establish the current tenant context. You could build out some interfaces and strategies around establishing the current tenant context; however, for this article I’ll just bang it out.
Second, we have to set the tenant_id when new entities are saved. This is a bit more complicated with NHibernate and requires a bit of a concession: we have to add a field to the entity in order for NHibernate to know how to persist the value. I’m using a private nullable int for this.
It’s private because I don’t want the business logic to deal with it, and it’s nullable because my tenant table is in a separate database, which means I can’t lean on the data model to enforce referential integrity. That’s a problem because the default value for an integer is zero, which could be happily saved by the database. By making it nullable, I can be sure the database will blow up if the tenant_id is not set.
So, back to the issue at hand. The tenant_id needs to be set when the entity is saved. For this, I’m using an interceptor and setting the value in the OnSave method:
This IInterceptor mechanism is a little wonky. If you change any data, you have to do it in both the entity instance and the state array that NHibernate uses to hydrate entities. It’s not a big deal; it’s just one of those things you have to accept, like the fact that Apple and Google are tracking your every move via your smart phone. Oh, and the interceptor gets wired up like this:
We’re almost done. There is one more case that needs to be handled. When NHibernate loads an entity by its primary key, it doesn’t run through the query engine which means the tenant filter isn’t applied. Fortunately, we can take care of this in the interceptor:
That’s it. Have fun and happy commingling.
Build a web application as though it’s for a single customer (tenant) and add multi-tenancy as a bolt-on feature by writing only new code. There are flavors of multi-tenancy; in this case I want each tenant to have its own database, but I want all tenants to share the same web application and figure out who’s who by looking at the host header on the http request.
To pull this off, we’re going to have to rely on our SOLID design principles, especially Single Responsibility and Dependency Inversion. We’ll get some help from frameworks like NHibernate and Unity.
Let’s take a look at a controller that uses NHibernate to talk to the database. I’m not going to get into whether you should talk directly to NHibernate from the controller or go through a service layer or repository, because it doesn’t affect how we’re going to add multi-tenancy. The important thing here is that the ISession is injected into the controller, and we aren’t using the service locator pattern to request the ISession from a singleton.
Alright, now it’s time to write some new code and make our web application connect to the correct database based on the host header in the http request. First, we’ll need a database to store a list of tenants along with the connection string for that tenant’s database. Here’s my entity:
I’ll use the repository pattern here so there is a crisp consumer of the ISession that connects to the lookup database rather than one of the tenant shards. This will be important later when we go to configure Unity.
So now we need a dedicated ISessionFactory for the lookup database, and we need to make sure that our NHibernateTenantRepository gets the right ISession. It’s not too bad; we just need to name them in the container so we can refer to them explicitly.
Hopefully that’s about what you were expecting, since it’s not really the interesting part. The more interesting part is configuring the ISession that gets injected into the UserController to connect to a different database based on the host header in the http request. The Unity feature we’re going to leverage for this is the LifetimeManager, an often overlooked feature of IoC containers.
Here we’re using a custom PerHostLifetimeManager. This tells Unity to maintain a session factory per host. When Unity runs across a host it doesn’t have a session factory for, it will run the InjectionFactory block to create one using the connection string associated with that tenant.
Since multiple simultaneous requests will be trying to get and set values with the same key, we need to make sure our PerHostLifetimeManager is thread safe. That’s pretty easy, since Unity comes with a SynchronizedLifetimeManager base class that takes care of the fact that Dictionary isn’t thread safe.
So what did we accomplish? Well, we didn’t touch any of our existing application code. We just wrote new code, and through configuration we added multi-tenancy! That’s pretty cool, but was it worth it? The goal in itself isn’t super important, but this exercise can certainly highlight areas of your codebase where you might be violating the single responsibility principle or leaking too many infrastructure concepts into your application logic.
Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request. Chain the receiving objects and pass the request along the chain until an object handles it. – Gang of Four
Variations of this pattern are the basis for Servlet Filters, IIS Modules and Handlers and several open source projects I’ve had the opportunity to work with including Sync4J, JAMES, Log4Net, Unity and yes, even Joomla. It’s an essential tool in the OO toolbox and key in transforming rigid procedural code into a composable Domain Specific Language.
I’ve blogged about this pattern before so what’s new this time?
How does it work? It’s pretty simple, there is just one interface to implement and it looks like this:
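The original interface is C#; an approximate Java rendering of the same shape (simplified, and not the actual Tamarack API) looks like this:

```java
import java.util.List;
import java.util.function.Function;

// One method to implement: do your work, optionally delegating
// to the next filter in the chain via executeNext.
interface Filter<T, R> {
    R execute(T input, Function<T, R> executeNext);
}

class Pipeline<T, R> {
    private final List<Filter<T, R>> filters;
    private final Function<T, R> terminal; // runs if the chain falls through

    Pipeline(List<Filter<T, R>> filters, Function<T, R> terminal) {
        this.filters = filters;
        this.terminal = terminal;
    }

    public R execute(T input) {
        return executeFrom(0, input);
    }

    private R executeFrom(int i, T input) {
        if (i == filters.size()) return terminal.apply(input);
        return filters.get(i).execute(input, next -> executeFrom(i + 1, next));
    }
}
```

A spam-score filter then reads like (text, next) -> next.apply(text) + (text.equals(text.toUpperCase()) ? 10 : 0), and short-circuiting is just declining to call next.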
Basically, you get an input to operate on and a value to return. The executeNext parameter is a delegate for the next filter in the chain. The filters are composed together in a chain, which is referred to as a Pipeline in the Tamarack framework. This structure is the essence of the Chain of Responsibility pattern, and it facilitates some pretty cool things:
Consider a block of code to process a blog comment coming from a web-based rich text editor. There are probably several things you’ll want to do before letting the text into your database.
What about dependency injection for complex filters? Take a look at this user login pipeline, and notice the generic syntax for adding filters by type. Those filters are built up using the supplied implementation of System.IServiceProvider. My favorite is UnityServiceProvider.
Here’s another place you might see the chain of responsibility pattern. Calculating the spam score of a block of text:
Prefer convention over configuration? Try this instead:
Let’s look at the IFilter interface in action. In the spam score calculator example, each filter looks for markers in the text and adds to the overall spam score by modifying the result of the next filter before returning.
In the login example, we look for the user in our local user store and if it exists we’ll short-circuit the chain and authenticate the request. Otherwise we’ll let the request continue to the next filter which looks for the user in an LDAP repository.
Why should I use it?
It’s pretty much my favorite animal. It’s like a lion and a tiger mixed… bred for its skills in magic. – Napoleon Dynamite
It’s simple and mildly opinionated in an effort to guide you and your code into The Pit of Success. It’s easy to write single-responsibility classes and use inversion of control and composition and convention over configuration and lots of other goodness. Try it out. Tell a friend.
The idea is to start a Stopwatch on the BeginRequest event and then log the elapsed time on the EndRequest event. I started by modifying the Global.asax to wire this up, but quickly got turned off because I was violating the Open-Closed Principle. I really just want to bolt in this behavior while I’m profiling the application and then turn it off when the kinks are worked out. IIS has a pretty slick extension point for this sort of thing that lets you hook into the request lifecycle events.
The only weird part here is getting a handle to the logger. I’m using an IoC container in my application; however, I can’t tell IIS how to build up my RequestDurationLoggerModule, so I’m stuck using the Service Locator pattern. The container could be a singleton, but I don’t like singletons, so I implemented IServiceProvider in Global.asax instead. All that’s left now is wiring in the module. Since Cassini behaves like IIS6, you have to use the legacy style configuration, which looks like this:
For IIS7 though, you add it like this:
Finally, it’s time to run the application and see the total request duration logged.
There are a few things to pick at. I’m looking at the member variable named tax that is null until you call CalculateTax for the first time. There isn’t anything to prevent the rest of the class from using the tax variable directly and possibly repeating the null check code in multiple places. I thought it would be fun to rewrite it using a closure. I don’t mean the kind of accidental closures we write with LINQ, but an honest to goodness proper closure.
A closure is created around the tax variable so only the CalculateTax function has access to it. That’s pretty awesome. I wouldn’t have thought of using this technique before learning JavaScript all over again. It’s fun code to write, but it’s not going to get checked in to source control. It’s basically a landmine for the next guy who has to make changes. The mental energy it takes to wrap your head around it is like listening to a joke with a really long setup and a lousy punch line. The solution is more complicated than the problem.
I still thought it was worth the mental exercise. Each language has its wheelhouse which makes a certain class of problems easy to solve. Admittedly this was a bit forced here, but opening your mind to other solutions may lead to a breakthrough solving a legitimately tough problem.
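The shape of the offending code is roughly this; DocumentService, the repository and file store interfaces, and the aspect’s name are all assumptions:

```csharp
using System;
using System.IO;

public class UnitOfWorkAttribute : Attribute { } // the AOP aspect applied below

public class FileEntity { public string Path { get; set; } }
public interface IFileRepository { void Save(FileEntity entity); }
public interface IFileStore { void Write(string path, Stream contents); }

public class DocumentService
{
    private readonly IFileRepository repository;
    private readonly IFileStore fileStore;

    public DocumentService(IFileRepository repository, IFileStore fileStore)
    {
        this.repository = repository;
        this.fileStore = fileStore;
    }

    [UnitOfWork] // wraps the entire method in a single database transaction
    public void Save(FileEntity entity, Stream contents)
    {
        repository.Save(entity);                // fast
        fileStore.Write(entity.Path, contents); // slow I/O, transaction still open
    }
}
```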
After doing some performance profiling, it quickly popped up as the top offender. Why? Because it’s holding a database transaction open while saving a file. In order to fix it, we have to ditch our UnitOfWork aspect and implement a finer-grained transaction. Basically, saving the entity and saving the file need to be separate operations so we can commit the transaction as soon as the entity is saved. And since saving the file could fail, we might have to clean up an orphaned file entity.
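A sketch of the finer-grained version, assuming an IUnitOfWorkFactory abstraction for starting transactions explicitly:

```csharp
using System;
using System.IO;

public interface IUnitOfWork : IDisposable { void Commit(); }
public interface IUnitOfWorkFactory { IUnitOfWork Create(); }

public class FileEntity { public string Path { get; set; } }
public interface IFileRepository
{
    void Save(FileEntity entity);
    void Delete(FileEntity entity);
}
public interface IFileStore { void Write(string path, Stream contents); }

public class DocumentService
{
    private readonly IUnitOfWorkFactory unitOfWorkFactory;
    private readonly IFileRepository repository;
    private readonly IFileStore fileStore;

    public DocumentService(IUnitOfWorkFactory unitOfWorkFactory,
                           IFileRepository repository,
                           IFileStore fileStore)
    {
        this.unitOfWorkFactory = unitOfWorkFactory;
        this.repository = repository;
        this.fileStore = fileStore;
    }

    public void Save(FileEntity entity, Stream contents)
    {
        SaveEntity(entity); // commit as soon as the entity is saved
        try
        {
            fileStore.Write(entity.Path, contents);
        }
        catch
        {
            DeleteEntity(entity); // clean up the orphaned file entity
            throw;
        }
    }

    // Plumbing that would be nice to push into the framework:
    private void SaveEntity(FileEntity entity)
    {
        using (var unitOfWork = unitOfWorkFactory.Create())
        {
            repository.Save(entity);
            unitOfWork.Commit();
        }
    }

    private void DeleteEntity(FileEntity entity)
    {
        using (var unitOfWork = unitOfWorkFactory.Create())
        {
            repository.Delete(entity);
            unitOfWork.Commit();
        }
    }
}
```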
That wasn’t too bad except that the number of lines of code exploded and we have a few private methods in our service that only deal with plumbing. It would be nice to push them into the framework. Since C# is awesome, we can use a combination of delegates and extension methods to do that.
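One way to shape those extension methods, again assuming an IUnitOfWorkFactory abstraction; the delegates carry the interesting work while the extensions own the transaction plumbing:

```csharp
using System;

public interface IUnitOfWork : IDisposable { void Commit(); }
public interface IUnitOfWorkFactory { IUnitOfWork Create(); }

public static class UnitOfWorkExtensions
{
    // Run an action inside a short-lived, committed transaction.
    public static void InTransaction(this IUnitOfWorkFactory factory, Action action)
    {
        using (var unitOfWork = factory.Create())
        {
            action();
            unitOfWork.Commit();
        }
    }

    // Run an action; if it throws, run a compensating action in its own
    // transaction before rethrowing.
    public static void WithCompensation(this IUnitOfWorkFactory factory,
                                        Action action, Action compensation)
    {
        try
        {
            action();
        }
        catch
        {
            factory.InTransaction(compensation);
            throw;
        }
    }
}
```

With these in place, the service method shrinks to a pair of calls: an InTransaction around the entity save, and a WithCompensation around the file write that deletes the orphaned entity on failure.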
Now we can move the extension methods into the framework and out of our service. In a less awesome language we could define these convenience methods on the IUnitOfWork interface and implement them in an abstract base class, but inheritance is evil and it’s a tradeoff we don’t have to make in C#.
If that wasn’t true, you could simply put all the best players on one team and watch them dominate. Russia tried it in the 2010 Winter Olympics and didn’t even medal. A successful team has an identity that transcends individuals. If you watch hockey, you’ll hear teams described as up-tempo, finesse, defensive or gritty and hard-hitting. One isn’t better than another; it really depends on what your core strengths are and then getting buy-in from everyone on the team.
On a software team, maybe you’ve got some pre-family twenty-something developers that are wizards with JavaScript. Or maybe you’ve got a veteran team and a company with deep enough pockets to invest in a long-term product strategy. It really doesn’t matter; just identify your strengths and get buy-in.
Developers are opinionated and not always team players; the cliché herding cats comes to mind. So how do you get developers to buy in? Try out this exercise. Get your team in front of a whiteboard and come up with a list of words that describe positive software quality factors. Here’s a start:
Have each team member pick their top 5 and write them on the board. Compare the lists and allow the team to negotiate with each other to come up with a common top 3. Why is a shared list important? Writing software is about trade-offs. We make dozens of decisions per day while writing code, and sharing a set of values is essential to consistent forward progress in a team environment.
If you don’t know where you’re going, you might not get there – Yogi Berra
Consider this list:
Compare that with:
Imagine you are sitting at the keyboard working on a feature. Can you see how the first list could influence completely different decisions than the second list? It’s not about right or wrong, it’s just about being explicit about how you make decisions and trusting the rest of your team to make similar choices.
Step 1: Install WebDeploy on the web server you want to deploy to.
Step 2: Configure Web.config transforms. This will enable you to change connection strings and whatnot based on your build configuration.
Currently this is only supported for web applications, but since it’s built on top of MSBuild tasks, you can do the same thing to an App.config with a little extra work. Take a peek at Microsoft.Web.Publishing.targets (C:\Program Files\MSBuild\Microsoft\VisualStudio\v10.0\Web) to see how to use the build tasks.
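A Web.QA.config transform that swaps a connection string might look like this; the connection string name and server are placeholders:

```xml
<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <connectionStrings>
    <add name="Main"
         connectionString="Server=qa-sql;Database=MyApp;Integrated Security=True"
         xdt:Transform="SetAttributes" xdt:Locator="Match(name)" />
  </connectionStrings>
</configuration>
```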
Step 3: Figure out the MSBuild command line arguments that work for your application. This took a bit of trial and error before landing on this:
C:\Windows\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe MyProject.sln /p:Configuration=QA /p:OutputPath=bin /p:DeployOnBuild=True /p:DeployTarget=MSDeployPublish /p:MsDeployServiceUrl=https://myserver:8172/msdeploy.axd /p:username=***** /p:password=***** /p:AllowUntrustedCertificate=True /p:DeployIisAppPath=ci /p:MSDeployPublishMethod=WMSVC
Step 4: Configure the Build Runner in TeamCity
Paste the command line parameters you figured out in Step 3 into the Build Runner configuration in TeamCity:
Step 5: Configure build dependencies in TeamCity. This means the integration tests will only run if the unit tests have passed and so on.
Step 6: Write some code.
Based on nearly eighty hours of conversations with fifteen all-time great programmers and computer scientists, the Q&A interviews in Coders at Work provide a multifaceted view into how great programmers learn to program, how they practice their craft, and what they think about the future of programming.
The author, Peter Seibel, asked each of the interviewees, “Do you consider yourself a hacker, artist, craftsman or engineer?” It’s a good question and one I forgot about until Uncle Bob wrote an article suggesting a correlation with one’s definition of done. It’s a clever angle, but there is more to explore.
On a PDP-8 with only eight instructions and two registers, you’re going to need some ingenuity, maybe even the rare kind of intellect that led a 14-year-old to win eight United States Chess Championships in a row.
It’s just you and your opponent at the board and you’re trying to prove something. — Bobby Fischer
The hacker lives in a world of tight constraints, like Houdini in a straitjacket inside a locked box submerged in icy water. The hacker gets a thrill from this kind of challenge and often relies on a toolbox of algorithms and decompilers to get the job done. The hacker isn’t satisfied with mainstream abstractions for IO; instead, the hacker knows exactly what happens when you write a file or open a socket and why you might choose UDP instead of TCP. The hacker is the guy that’s going to push the industry forward with new technology and techniques.
It’s a fine line though; finding loopholes and circumventing the intent of your framework can be a recipe for disaster. Oftentimes it’s not just you and the machine, it’s you and a team of developers writing software for users that expect their cell phone to stream music from Pandora via Bluetooth to their car stereo while rendering a map of the current location with real-time traffic info. And if all this takes a few extra seconds, your software is casually dismissed as a piece of crap.
The artist believes that software is an expression of one’s self. The process is creative and each solution is subtly unique like a brush stroke on a canvas. An artist believes what they do is special and you could no more teach someone how to be a great programmer than you could teach someone how to write a great song or novel. You’ve either got it or you don’t.
Just like a great band, truly great software comes from a small team with just the right mix of strengths and styles. Too many contributors and it’s a mess. Not enough and it’s too predictable. Boston was a great band, but it was dominated by Tom Scholz, the guitarist, keyboardist, songwriter and producer. He owned every note. It’s not necessarily a bad thing; Boston sold over 31 million albums in the United States. But consider the Beatles: two creative forces pushing and pulling on each other, clashing and meshing to produce a truly dynamic catalog.
If you’re having open heart surgery, you probably want an experienced doctor holding the scalpel. The human body is a complex system and reading a text book just isn’t the same as years of experience.
Programming is difficult. At its core, it is about managing complexity. Computer programs are the most complex things that humans make. Quality is illusive and elusive. – Douglas Crockford
When a fine furniture craftsman looks at a piece of wood, the unique shape and the location of the knots are thoughtfully considered. The craftsman observes how dry or green the wood is, how hard or soft it is and applies a set of best practices honed through years of experience to produce the finest end result.
The craftsman believes that most software problems are the same but knows a solution can’t be stamped out by code monkeys. Physical and financial constraints, subtleties of the problem domain and quirks of a particular framework or language require an experienced and steady hand to produce a quality product. The craftsman believes software development is a highly skilled trade and to become an expert you must start as an apprentice.
An engineer writes the kind of software you count on when you drive a car, fly in a plane or dial 911, where a failure means more than a delayed shipment or a double payment on your online order. The stakes are high and the problems are too big to fit in one person’s head, problems that only process and frameworks can wrangle. You can’t simply start typing and watch the design of your application emerge from nothing; it requires detailed analysis and thoughtful study.
Which one am I? I have a piece of paper that says Engineer on it, and while I enjoy writing frameworks as much as the next guy, I’m a craftsman. Maybe it’s because I’ve been doing this for a dozen years and I’d like to think my experience is worth more than a piece of paper.
Hey, if you want me to take a dump in a box and mark it guaranteed, I will. I got spare time. But for now, for your customer’s sake, for your daughter’s sake, ya might wanna think about buying a quality product from me. – Tommy Boy
Which one are you?
Metallica eventually succumbed to their own success and tried to crank out albums more accessible to wider audiences. This tension between music with something to say and music that appeals to the masses has been the bane of the industry since the invention of the radio.
If you don’t believe me, go ask 10 people what their favorite Metallica song is. You’re going to hear Enter Sandman unless you work at an auto shop or maybe Guitar Center, in which case you’re probably not reading this article. You’re going to hear Enter Sandman because that song was engineered by producers and marketing types to be played on the same radio stations that host morning shows like Jeff and Jer, yet that song couldn’t be further from the band’s roots.
Likewise, if you ask 10 programmers if they use design patterns, you will undoubtedly hear a resounding yes, citing the widely accessible singleton pattern. Why is this the case? It’s because the singleton most closely resembles something that looks familiar when writing procedural code, and it’s the easiest pattern to insert into a pile of crap. The problem is that the singleton represents design patterns about as well as Enter Sandman captures the essence of why Metallica was a great band.
Pattern catalogs should probably look more like this:
Design Patterns:

* Singleton*

*Not a design pattern
Or at least come with a warning label.
Warning: The Singleton is actually a global variable. Use this pattern as a last resort and be sure to hide the fact you are using it from the rest of your application and your manager.
So how do you know if you’re using the Singleton correctly? I’ll give you a hint: if you have any knowledge of the fact that the object you’re using is a singleton, then you’re not using it correctly. Let’s say you’re writing some caching logic, since it’s one of those places where singletons pop up.
Let’s start with a simple interface for finding a user object:
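Something like this, with User standing in for the domain entity:

```csharp
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IUserRepository
{
    User FindById(int id);
}
```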
You realize you keep doing the same lookup over and over, and adding some caching would really perk up the application. Hooray for the Singleton!
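A sketch of the singleton-flavored cache; SqlUserRepository is an assumed concrete implementation being wrapped, and IUserRepository and User are as above:

```csharp
using System.Collections.Generic;

public class CachingUserRepository : IUserRepository
{
    // The one and only instance, reachable from anywhere.
    public static readonly CachingUserRepository Instance =
        new CachingUserRepository(new SqlUserRepository());

    private readonly IUserRepository inner;
    private readonly Dictionary<int, User> cache = new Dictionary<int, User>();

    private CachingUserRepository(IUserRepository inner)
    {
        this.inner = inner;
    }

    public User FindById(int id)
    {
        User user;
        if (!cache.TryGetValue(id, out user))
        {
            user = inner.FindById(id);
            cache.Add(id, user);
        }
        return user;
    }
}
```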
Well, this probably works great in a console application or Windows service, but what if your code is running in a web application or as a WCF service? What if your Windows service is distributed across multiple machines? Should you care how your application is deployed in your CachingUserRepository? No, you shouldn’t. That object has one responsibility, and it’s to manage collaboration with the cache. Let’s try this instead:
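A sketch of the same repository as a decorator, with the caching concern injected behind a small ICache abstraction (an assumed interface, since any cache can hide behind it); IUserRepository and User are assumed to exist as before:

```csharp
public interface ICache
{
    object Get(string key);
    void Set(string key, object value);
}

public class CachingUserRepository : IUserRepository
{
    private readonly IUserRepository inner;
    private readonly ICache cache;

    // How the cache is scoped (singleton, per-request, distributed) is
    // decided where the application is wired up, not in this class.
    public CachingUserRepository(IUserRepository inner, ICache cache)
    {
        this.inner = inner;
        this.cache = cache;
    }

    public User FindById(int id)
    {
        var key = "user:" + id;
        var user = (User)cache.Get(key);
        if (user == null)
        {
            user = inner.FindById(id);
            cache.Set(key, user);
        }
        return user;
    }
}
```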
Maybe your cache should live for the lifetime of a single web request, or maybe a web session, or possibly a distributed cache is right for your scenario. The good news is that caching is one of those solved problems, like data access, so you can just pick your favorite library and plug it into your application with a simple adapter:
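For example, an adapter over System.Runtime.Caching.MemoryCache might be as small as this; ICache is the same assumed abstraction, and the five-minute expiration is arbitrary:

```csharp
using System;
using System.Runtime.Caching;

public interface ICache
{
    object Get(string key);
    void Set(string key, object value);
}

public class MemoryCacheAdapter : ICache
{
    public object Get(string key)
    {
        return MemoryCache.Default.Get(key);
    }

    public void Set(string key, object value)
    {
        // Absolute expiration keeps the sketch simple.
        MemoryCache.Default.Set(key, value, DateTimeOffset.Now.AddMinutes(5));
    }
}
```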
Let’s say you still want to roll your own cache and you decide that a singleton is the right scope for you. That’s okay; just recognize that using a singleton is a configuration decision and should not leak out into your application. The only place you should see that .Instance getter is where you bootstrap your application.