Clean Architecture Applied – The document generator

Now that I have time information about my workdays, the next step is to generate the invoice. There I face another problem. I want to generate a PDF, but I need an easy format to test against. Let me first take inventory of what I have:

  • I have an interface IDocumentGenerator that takes the workdays information
  • I found the DinkToPdf library that converts HTML to PDF

I learned that when I generate the PDF with DinkToPdf, the PDF has metadata included such as the date and time the PDF was created. So, this foils my plan to compare the created byte arrays. It’s a good thing that I have smart friends that remind me of the decorator pattern and pointed out that HTML is a lot easier to check.

I use a HTML templating engine to generate the HTML that represents an invoice. I found Fluid to be the most versatile library to generate the HTML I need.

To create the PDF, all I have to do is create a class that takes the HTML document generator, gets its output and let DinkToPdf do its work.

public interface IDocumentGenerator
{
    (string extension, byte[] document) Generate(Invoice invoice, Customer customer, BillableItem[] billableItems);
}

public class FluidHtmlDocumentGenerator : IDocumentGenerator
{
    public (string extension, byte[] document) Generate(Invoice invoice, Customer customer, BillableItem[] items)
    {
        using (var htmlStream = Assembly.GetExecutingAssembly().GetManifestResourceStream(TemplateLocation))
        using (var htmlReader = new StreamReader(htmlStream))
        {
            var htmlTemplate = htmlReader.ReadToEnd();
            FluidTemplate.TryParse(htmlTemplate, out var template);
            var invoiceDocument = GenerateHtml(customer, items, invoice, template);
            return ("html", invoiceDocument);
        }
    }
}

public class DinkToPdfDocumentGenerator : IDocumentGenerator
{
    public DinkToPdfDocumentGenerator(IDocumentGenerator htmlGenerator)
    {
        _htmlGenerator = htmlGenerator;
    }

    public (string extension, byte[] document) Generate(Invoice invoice, Customer customer, BillableItem[] billableItems)
    {
        _htmlGenerator.Generate(invoice, customer, billableItems);
        // use DinkToPdf to generate PDF
    }
}

When testing the code, I just use the FluidHtmlDocumentGenerator, convert the byte array into text and compare that to the output I expect. I have one test that generates a PDF so I can check that the DinkToPdfDocumentGenerator generates the correct output. All I do there is compare the length of the array returned. I can change the test slightly so it writes the output to a file. This allows me to see that I generate a PDF that looks like the rendered HTML. This is a manual process since I haven’t found a way to automate this. Which is just a small inconvenience as I verify that my layout is correct with the HTML tests.

Clean Architecture talks about being independent of libraries. Hiding these libraries behind interfaces allows me to implement other classes with different libraries. That is how I can easily change HTML or PDF generation libraries without my InvoiceGenerator needing any change at all. It’s a very flexible and extendable structure.

For example, in most of my tests, my setup of my document generator looks like this: new FluidHtmlDocumentGenerator(). In my actual production code, this changes to: new DinkToPdfDocumentGenerator(new FluidHtmlDocumentGenerator()).

The next infrastructure part is the customer repository, which I will write about next week.

Advertisements

Clean Architecture Applied – The work repository

Now that I have the general steps to get to an invoice, I can start working on the individual components that reside in the infrastructure layer; starting with the work repository.

The domain logic would need time data, which reminds me of the Repository pattern. So I create an interface IWorkRepository that gets work days between two dates: Task<WorkDay[]> GetWorkDays(DateTime start, DateTime end).

[DebuggerDisplay("{DebuggerDisplay}")]
public class WorkDay
{
    public WorkDay(string customerTag, DateTime day, int secondsWorked, bool billable)
    {
         Day = day;
         TimeWorked = TimeSpan.FromSeconds(secondsWorked);
         Billable = billable;
         CustomerTag = customerTag;
    }
    public DateTime Day { get; }
    public TimeSpan TimeWorked { get; }
    public bool Billable { get; }
    public string CustomerTag { get; }
    private string DebuggerDisplay => $"{CustomerTag} - {Day:d}: {TimeWorked}h ";
}
The private DebuggerDisplay field is a little workaround to allow easy string manipulation (especially dates) to get nice debug information.

This would allow me to use a fake work repository when testing to start verifying other behaviour. However, I learned about Flurl, a nice library that allows me to build HTTP requests easily, but also allows me to use it for testing. The next thing I did was build the TimeCampWorkRepository, referencing the actual TimeCamp API URL. Flurl allows me to use the TimeCampWorkRepository in the tests so I can verify that the JSON I retrieve from the API is correctly deserialised and mapped to the WorkDay structure. I got that JSON from actually calling the API and saving the response. This allows me to verify my own code to the point right where the actual API call will happen and how the returned information is handled.

This also ties into the Clean Architecture principle that the database, or more general data store, should be pushed to the infrastructure layer. I put this code in its own project to clearly separate this concern. Putting an interface between the actual call to the TimeCamp API and the control flow of the invoice generator allows me to add other data sources if I should ever change the app I use to track my time.

Next up is the document generator.

Clean Architecture Applied – Applying the principles

Before I dive into the technical details, let me give you a little warning:

This solution is over-engineered, I could write this fairly straight forward and be done in a few hours. The experiment here is to implement this app using the guidelines from Clean Architecture. So expect too many projects, a bit of plumbing code and a lot of interfaces.
Also, I cannot make the full code available as this code contains secrets (such as my customers). It would also allow anybody to generate invoices in my company's name. I will not be making this code easily available. Ever.
I thought about making an example repository to share with the world. I do not think I would give that repository the same attention I give this code, it would not represent the quality that I put into my work.
If you or your company need help with improving code quality, I offer consultancy services and training. Head over to More Than Code for more information.

The problem domain is not a difficult problem, so let’s check out how I structured the code to get it all working.

The first step is to get the data from the TimeCamp API, which I need to transform into days worked so I can sum them and then put them into a format that I can save as a PDF-file.

I started by creating a solution with a project InvoiceGenerator.Core. I can also name it .Business, .Domain or .Rules, pick something that clearly communicates what is in the project. At the same time, I create an InvoiceGenerator.Tests project, because all important parts need to be tested well.

Because this is not a large project, I only create one test project. I believe in testing use cases, such as “generate HTML invoice from TimeCamp data”, that test a lot my own code (I’ll get back to that in a minute). This allows me to change the inner working of a module, without losing any functionality. My tests will guard the functionality. That is why my tests will always be very specific. This test knows about the TimeCamp API and that the invoice will be in the HTML-format.

What I mean by “my own code” is all code that I write. It encompasses everything from domain logic, internal plumbing and all code right up to the point it goes out of my hands, such as to a database, the network or the hard drive.

Domain and Use Case

Invoice generator

The first test then sounds easy: generate an HTML invoice from TimeCamp data. It all starts in the core business logic. For that I create a class InvoiceGenerator in the Core project. This will orchestrate the interactions between all the little components. It determines the month the invoice should be generated for (the previous one). Then it gets the data about the days I worked. Finally, it generates invoices for all customers I worked for.

public async Task<Invoice[]> Generate()
{
  var startOfPreviousMonth = _dateProvider.Now.StartOfPreviousMonth();
  var endOfPreviousMonth = startOfPreviousMonth.AddMonths(1).AddDays(-1);
  var workDays = await _workHoursRepository.GetWorkDays(startOfPreviousMonth, endOfPreviousMonth);
  var billableWorkDays = workDays.Where(day => day.Billable);
  var invoices = new List<Invoice>();
  var previousInvoiceFileName = _invoiceReader.GetPreviousInvoice();
  var description = _dateProvider.Now.ToString("MMMM").ToLower();
  foreach (var daysForCustomer in billableWorkDays.GroupBy(x => x.CustomerTag))
  {
    var customerTag = daysForCustomer.Key;
    var customer = GetCustomer(customerTag);
    var billableItems = GetBillableItems(customer, daysForCustomer, startOfPreviousMonth);
    var invoice = GeneratedInvoice(previousInvoiceFileName, customer, description, billableItems);
    previousInvoiceFileName = invoice.FileName;
    invoices.Add(invoice);
  }
  return invoices.ToArray();
}

The _dateProvider is there to make injecting a fake date a lot easier. It’s quite difficult, read impossible, to change the DateTime.Now date. Which makes testing different months a lot easier.

The previous invoice is used to calculate the new invoice number, this ensures that I can keep numbers nice and sequential.

Adding immutable objects

Dealing with state is complex. Most bugs are introduced when unexpected data influences the flow of code. That is why I made most objects that pass through boundaries immutable. Objects from the InvoiceGenerator back to the console (which will become my UI, I will go into more detail in a future post on this aspect) or from the DocumentGenerator back to the InvoiceGenerator are being created by the originating code (InvoiceGenerator, DocumentGenerator).

public class Invoice
{
  private readonly string _description;
  private string _extension;
  private readonly int _id;
  private Invoice(int id, DateTime invoiceDate, string description, byte[] document)
  {
    _id = id;
    _description = description;
    InvoiceDate = invoiceDate;
    Document = document;
  }

  public byte[] Document { get; private set; }
  public DateTime InvoiceDate { get; }
  public DateTime ExpiryDate => InvoiceDate.AddMonths(1);
  public string Number => $"{InvoiceDate:yyyy}{_id:0000}";
  public string FileName => $"{Number}-invoice-{_description}.{_extension}";
  public static Invoice FromPrevious(DateTime invoiceDate, Customer customer, string description, byte[] document)
  {
    var id = GetId(invoiceDate);
    return new Invoice(id, invoiceDate, description, document);
  }
}

This way the document or invoice can’t be updated after it’s created. Not by my or any other code. I make it easy on myself by creating a static factory method that can take care of creating an object. I could just use the constructor, but I think the method is a bit cleaner to read.

The principles

The core domain consists of the general flow of the application. Getting workdays, grouping them per customer, transforming them to billable records and filling that data into invoice templates.

It then hands control to several plugins (such as a IWorkRepository or a DocumentGenerator). This is part of the layer approach. It also allows me to experiment with different ways of getting the workdays and different template engines to generate an invoice. It also means that I can easily test different components.

I had already thought about how I would tackle each issue, but let’s go through it step by step. Starting with the work repository.

Clean Architecture Applied – Intro

To make my life easier, I automated the generation of the invoices that I send as an independent contractor. This gave me the perfect excuse to build something that follows the rules set out by Uncle Bobs Clean Architecture. At least how I interpret them. Use this blog post as a source of inspiration on how to do it… Or a cautionary tale, however you see fit.

What is this Clean Architecture?

Let me start with defining what Clean Architecture is. It is a set of guidelines that will help me structure a system so it becomes a collection of loosely coupled components with the most stable parts, hopefully the domain, at its core. The further I move from the core, the more volatile the components become. Where volatile means “changing a lot” and not “behaving in unexpected ways”.

Just as the SOLID principles are there to provide guidelines to write maintainable code, so is Clean Architecture a set of guidelines to write easily maintainable and extendable software. As these are guidelines, I can follow or abandon them as I see fit. One of the founding rules for me, is that pragmatism takes priority over blindly following rules. The latter leads to religiously following a doctrine, which in itself is a dangerous practice. Always think for yourself and check that the pros outweigh the cons.

Source: http://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html

The image above resembles the Ports and Adapters and the Onion Architecture. That’s because Clean Architecture uses a lot of the same principles. It borrows heavily from Ports and Adapters for interoperability and how different components work together. It also uses the layered approach that can be found in the Onion Architecture. I think it kind of uses the best of those two systems and sprinkles a bit of best practices and pragmatism on top.

For the sake of the experiment, lets apply these guidelines. I will not go further into the specifics of Clean Architecture. Those can be found in the Clean Architecture book and the many articles that have already been written.

The problem details

Before I dive into the code, I will describe the problem I’m trying to solve. I want to automate making invoices, so that every month I have to do minimal work to get an invoice to send to my customers.

The data comes from my time keeping app TimeCamp. They have a very nice service which includes an API so I can easily access my time data. From that data I can calculate how many days I’ve worked for which customer.

With that information, I can then generate an invoice in PDF format with all information about my company, my client and the invoice lines. This way I don’t need to spend half an hour to an hour each month to fill out an Excel form to get an invoice. This might sound like a trivial time increase, but the experience from building this is equally valuable.

I encourage everybody reading this to experiment in their own time with a personal project and use techniques or technology you’re not familiar with. They’re awesome learning experiences and can be great poster projects when you want to advance your career.

Stay tuned because next week I will start detailing the architecture and the code of the application.

Different testing approaches

A while ago I attended a lecture of a programmer who explained the differences between the London and Detroit school
(also called Chicago school) of test driven design. I found his explanation kind of pointless.

The London school focuses on mock driven testing: dummy/stub/spy/mock out all the dependencies and injections. It focuses on interaction with other components. The Detroit school concentrates on the state of the system, which allows me to verify that the system correctly updates the state of the application. Source.

public class MyClass
{
  public MyClass(IRepository repository, ICalculator calculator) 
  {
    // save repository and calculator in fields
  }
  public void DoSomething(int id, int x, int y)
  {
    // update text in the database
    var sum = _calculator.Sum(x, y);
    _repository.Update(id, sum);
  }
}

public class MyClassLondonTests
{
  [Test]
  public void DoSomethingTest()
  {
    var mockRepo = new Mock<IRepository>();
    var mockCalc = new Mock<ICalculator>().Setup(x => x.Sum(2, 3)).Returns(5);
    new MyClass(mockRepo.Object, mockCalc.Object).DoSomething(1, 2, 3);
    mockRepo.Verify(x => x.Update(1, 5), Times.Once);
    mockCalc.Verify(x => x.Sum(2, 3), Times.Once);
  }
}

public class MyClassDetroitTests
{
  [Test]
  public void DoSomethingTest()
  {
    // made a fake in memory repository implementing the IRepository interface
    var fakeRepo = new InMemoryRepository();
    // my own calculator
    var calc = new Calculator();
    new MyClass(fakeRepo, calc).DoSomething(1, 2, 3);
    Assert.AreEqual(5, fakeRepo.Calculation.First(x => x.Id == 1).Value);
  }
}

This example is what I understand the London and Detroit schools represent. And I think it’s counterproductive to say one is better than the other. I think we should throw the schools out the window.

When I build a system, I don’t care that my system calls the Update function or the Sum function, in what order and how many times. I want to know that my use case does what it needs to do. In this case, and I think in most cases, checking my in-memory repository FakeRepository does that very easily.

The London school would just slow me down with unnecessary setup and verifying that I called all the correct functions with the correct inputs. It has an additional downside that is not to be underestimated: I want to be able to easily refactor my code. When I move the summation logic into MyClass, I would need to update a lot of the London school test setup, while with the Detroit setup, I would just need to remove the creation of the Calculator and injection into the MyClass constructor.

The Detroit school has the additional advantage that I use the Calculator class, I test that this class works in this use-case scenario. While there are programmers who like a unit test to just test the code in the MyClass and use an integration test to verify that the Calculator class works with the MyClass code, I find that a too fine-grained approach to testing.

While I sometimes write tests to verify specific parts of the code, I remove those later when the unit test, which tests the whole scenario, covers all the code in this scenario. In those small tests, I can make private functions public while I test them. When they work how I like, I make them private again and remove the temporary test.

Just to be clear, a scenario is a specific path through the code. So I can have a scenario “save sum to database” and another “database unavailable” to verify that the code keeps working even if something goes wrong.

The previous parts of this post described that I generally dislike using mocks and I advocate for using all the real parts. But that isn’t always convenient for tests. For example, getting a database in the correct state can be a time-consuming part of testing or simulating that a web-service call failure is handled correctly. For those parts, creating a mock/stub/fake/etc can be very time preserving and easy to set-up.

My general rule is: use all code that can run in-memory directly (think my own code, colleagues code, libraries that are used, etc), while code that requires access to external resources (think database, the internet, web-services, operating system information or hard drive access) should be replaced with a mock or a fake. I verify the result against whatever is most convenient.

Recently, I created mock-ups of HTML I wanted to generate, then generated the HTML using a fake data source, but keeping all the code I wrote and the library to generate the HTML from a template. Finally I compared the output with my mock-up. Afterwards, I refactored a lot of the internals of the code to be more efficient, I only needed to edit the test once or twice because I updated either the constructor or the signature of the method.

In this recent example I used both techniques. That’s the point I’m trying to make. I don’t care if I use some school of testing, I care about working code that is easy to understand. It’s not London or Detroit style of test driven design, both techniques are valuable. When calculating or generating things, London style value checking works very well. Detroit style is great for checking that external resources are called with the correct parameters.

So forget the terms that divide us and lets find joy in creating working software with all the tools at our disposal.

P.S. I’m opening up the comment section as an experiment. Let’s see if somebody has anything to say.

Static should be used sparingly

New year, new blog posts! Lets start with a problem from work. Let me present you with the problem code I had to analyse.

Once or twice a month, the name “Ken” got written to the database and another name was written to the log. This is a reproduction from the ASP.NET web call, actual code was a lot more complex.

private static string Name;
public static void Execute() 
{
  Name = Cache.GetNameFromSession();
  var description = $"I'm using the name {Name}";
  SaveToDatabase(description);
  Logger.Info($"Saving Name {Name} to the database");
}

If you figured out what went wrong, congratulations. I had to look an embarrassing amount of time to spot the static in the field description. After I noticed that, I recognised the timing problem that can occur.

Due to the static nature of the field, it is shared between multiple requests. The first request sets the Name to “Ken”, which creates the description “I’m using the name Ken” and saves that to the database. When it’s time to log the name, the second request has updated it to “Sophie” which is what will get logged.

Fortunately, the solution is quite simple: make the field non-static so it’s bound to the instance, then each request will have their own instance. The method itself is static and I couldn’t change that, so I cannot make the field non-static. The next solution is to call the cache where it needs to be called. A local variable is also not shared between multiple requests, so the problem doesn’t occur anymore.

public static void Execute() 
{
  var name = Cache.GetNameFromSession();
  var description = $"I'm using the name {name}";
  SaveToDatabase(description);
  Logger.Info($"Saving Name {name} to the database");
}

Let this be a lesson for all of us: use static sparingly. It’s great for a factory method such as Person.Create(name, age), but it’s the cause of a lot of subtle bugs when used in concurrent environments such as a web server.