# A Philosophy of Software Design

### Contents

Author: John Ousterhout
Publisher: Yaknyam Press
ISBN: 978-1-7321022-0-0

## Chapter 1: Introduction (It's All About Complexity)

• Writing software is one of the purest forms of creative activity
• Programs aren't bound by physical constraints
• Doesn't require any special physical skills, unlike, say, drawing or sculpture
• The only requirements for writing software are a creative mind and the ability to visualize
• The greatest limitation in building software is our ability to understand that which we create
• Complexity accumulates over time as we add on to a system
• Slows development
• Tools can help with complexity, but cannot eliminate it
• Simplifying where we can allows us to build more capable systems without drowning in complexity
• 2 general approaches to fighting complexity:
• Make code simple and obvious
• Encapsulation: hide the complexity so that we can push it out of our minds when we don't absolutely need to deal with it
• Modular design
• The malleability of software means the design is never fixed
• Software is not like a physical object, which has a build or manufacturing step after design
• However, for a large part of the history of software design, we pretended that software was physical — waterfall model
• Why doesn't waterfall work for software?
• Software is far more complex than any physical system
• We can't visualize software design in a way that makes its implications apparent
• As a result, problems often don't become apparent until after implementation has started
• Arguably, this is a failure in our tools
• Why can't we visualize software?
• Attempting to patch around defective design introduces more complexity that only makes the problem worse
• As a result, most software today is built in a more incremental fashion — Agile
• More properly, we pretend that it's built in an incremental fashion; plenty of organizations still practice waterfall, but dressed with the paraphernalia of agile
• Build a small subset of the software first
• Allows design problems to be fixed early, before we've invested heavily in the design
• Works because software is malleable enough to allow design changes during implementation
• Incremental development means that software is almost never "done"
• Because software is continually redesigned, developers should always be thinking about design when writing code
• If managing complexity is the most important part of design, developers should always be thinking about complexity when writing code
• 2 goals
• Define complexity
• What is it?
• Why does it matter?
• How can we recognize when we have unnecessary complexity?
• Techniques to reduce complexity
• No easy recipe to reduce complexity in all cases
• Present a set of guidelines which allows for the comparison and evaluation of alternative designs

### How To Use This Book

• Book best used in conjunction with code reviews
• Start by recognizing red flags
• While it can be difficult to come up with a design that avoids red flags, the effort is worth it to make the code easier to work with long-term
• Use moderation and discretion — all of these guidelines have exceptions and limits

## Chapter 2: The Nature of Complexity

• What is complexity?
• How can we tell when a system is unnecessarily complex?
• What causes unnecessary complexity?

### Complexity Defined

• Complexity is anything that makes it hard to understand and modify a system
• Difficulty in understanding how a piece of code works
• Difficulty in finding code that needs to be modified
• Difficulty in fixing bugs without introducing new bugs
• Complexity is only meaningful in relation to a particular programmer attempting to achieve a particular goal
• It's possible for a large system to have low complexity, if it's well designed and properly documented
• It's possible for a small system to have high complexity
• Complexity is determined by activity
• If a system has complicated internal components, but they're hidden behind well-designed interfaces and rarely need to be modified, then the system is not complex
• Complexity can be thought of as {$C = \sum_p c_p t_p$}
• {$C$} → Overall complexity
• {$c_p$} → complexity of each part {$p$}
• {$t_p$} → fraction of time spent working on part {$p$}
• Effectively hiding complexity is almost as good as eliminating it
• The problem is that abstractions leak, and often leak unpredictably — you might think you've effectively hidden complexity, only for that complexity to come roaring back in the form of particularly mysterious bugs
• In other words {$t_p$} is not as predictable as you think it is
• Complexity is more apparent to readers of code than it is to writers
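
A small worked example of the weighted sum above, as a sketch only. The part names and the numbers are invented for illustration; the notes give the formula but no way to measure {$c_p$} or {$t_p$}.

```python
# Hypothetical parts of a system: (name, complexity c_p, fraction of time t_p).
# All values are made up; the t_p fractions are chosen to sum to 1.
parts = [
    ("billing_rules", 9.0, 0.05),   # very complex, but rarely touched
    ("request_router", 3.0, 0.60),  # moderately complex, touched constantly
    ("report_export", 2.0, 0.35),
]


def overall_complexity(parts):
    """Return C = sum over all parts p of c_p * t_p."""
    return sum(c * t for _name, c, t in parts)


def contributions(parts):
    """Return each part's share c_p * t_p, largest first."""
    return sorted(((c * t, name) for name, c, t in parts), reverse=True)


total = overall_complexity(parts)
largest_share, largest_name = contributions(parts)[0]
```

With these made-up numbers the total is about 2.95, and the router contributes more than the far more complicated billing rules, simply because developers spend most of their time in it. That is the sense in which hiding a complex part (driving its {$t_p$} toward zero) is almost as good as eliminating it.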

### Symptoms of Complexity

• Change amplification
• Changes to functionality require changes to many often disconnected parts of the code
• Cognitive load
• How much information a developer needs to know in order to complete a task
• Higher cognitive load → greater chance of bugs
• Measuring complexity by lines of code required for a change misses the impact of cognitive load
• Hypothesis: functional programming has a higher baseline cognitive load than imperative programming, and this goes doubly for code that makes extensive use of higher-order functions
• Unknown Unknowns
• Not obvious what must be modified
• Not obvious what information you need to carry out a task
• Unknown unknowns are the worst form of complexity
• Often unknown unknowns are discovered only when a change introduces bugs
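
To make change amplification concrete, here is a minimal sketch. The page functions, the `banner` helper and the color value are all hypothetical, chosen only to show one design decision repeated in several places versus stated once.

```python
# Before: the same decision (the banner color) is repeated in every page,
# so changing it means finding and editing each function.
def home_page_before():
    return '<div style="background:#336699">Home</div>'


def about_page_before():
    return '<div style="background:#336699">About</div>'


# After: the decision lives in exactly one place.
BANNER_COLOR = "#336699"


def banner(title):
    """Render a banner using the single shared color."""
    return f'<div style="background:{BANNER_COLOR}">{title}</div>'


def home_page():
    return banner("Home")


def about_page():
    return banner("About")
```

The second version also removes an unknown unknown: a developer changing the color no longer has to discover how many pages hard-code it.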

### Causes of Complexity

• Complexity is caused by dependencies and obscurity
• Dependencies
• Any time you have code that cannot be modified in isolation, there exists a dependency
• Dependencies are a fundamental part of software and cannot be eliminated entirely
• However, we can simplify dependencies and make them obvious
• Obscurity
• Important information is not obvious
• Examples:
• Generic variable names
• Variable names without units
• Inconsistency
• Same variable being used in different ways in different parts of the codebase
• Different variables being used similarly
• Adding dependencies can add obscurity — not obvious that updating code in one location requires updates in another
• Often caused by inadequate documentation
• A good design requires less documentation to overcome its obscurity
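
A short sketch of the "generic names" and "names without units" examples. Both functions are hypothetical and compute the same retry delay; only the second tells the reader what the numbers mean.

```python
# Obscure: what do `n`, `d` and `m` mean, and is the result seconds or ms?
def delay(n, d=200, m=5000):
    return min(d * 2 ** n, m)


# Clearer: units are part of the names, and the return unit is documented.
def retry_delay_seconds(attempt, base_delay_ms=200, max_delay_ms=5000):
    """Return the wait before retry number `attempt` (0-based), in seconds."""
    delay_ms = min(base_delay_ms * 2 ** attempt, max_delay_ms)
    return delay_ms / 1000
```

A caller who passes the result of `delay` to something that expects seconds has no way to notice the mistake from the code alone; with the second version the mismatch is visible at the call site.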

### Complexity is Incremental

• Complexity is never caused by a single catastrophic error
• Instead it accumulates over time as small dependencies and obscurities get added
• Need a zero-tolerance approach to unnecessary complexity

### Conclusion

• Complexity is the result of the accumulation of dependencies and obscurities
• Complexity increases the number of modifications, the time and the risk involved in adding a particular piece of functionality to a codebase

## Chapter 3: Working Code Isn't Enough (Strategic vs. Tactical Programming)

• Many organizations prioritize features
• However, in order to have good design, one must take a step back to consider the program as a whole and fix problems before they become ingrained

### Tactical Programming

• Focus on getting something working as fast as possible
• Nearly impossible to produce a good system design
• Short-sighted
• No time spent planning for the future
• Buys short-term expediency at the cost of long-term complexity
• Tactical programming is one of the main ways of introducing accidental complexity into a codebase
• Once a team gets into the habit of programming tactically, it can be very difficult to break out
• A programmer who takes tactical programming to an extreme is known as a tactical tornado
• Rush features out quickly
• Are often praised by management
• But in reality, other programmers often have to clean up the mess that they create

### Strategic Programming

• Working code is not enough
• Working code should be the result of a good design
• Requires an investment mindset
• Invest time to improve the design, rather than tacking on a feature
• Investment can take two forms:
• Proactive investment
• Designing for extensibility
• Considering and evaluating alternatives
• Writing documentation
• Reactive investment
• Taking the time to update the design of the software in response to new requirements
• Removing complexity from the code when making a change

### How Much To Invest

• Lots of small investments in design improvement are more effective than a large investment upfront
• Investing 10-20% of total development time should be standard
• Investment will pay for itself over time with faster development
• More importantly, well designed code makes it easy to predict how long a feature will take — arguably more important than the raw speed of development

### Startups and Investment

• In a startup environment, there is often a lot of pressure to ship quickly
• However, when a codebase turns to spaghetti, it's often very difficult to fix, even with lots more engineers
• If you have bad code, word will get out and the company will have trouble recruiting the best engineers
• Is this actually true?
• Facebook started out with a "move fast and break things" approach which led to lots of accidental complexity being introduced into the codebase
• This led to an unmaintainable codebase
• New philosophy is "Move fast with stable infrastructure"
• Remains to be seen whether Facebook can clean up its code
• Meanwhile, Google and VMware started out with a much stronger engineering culture
• Proved that success doesn't have to imply rushed engineering
• This section is at best equivocal and at worst undermines Ousterhout's point
• Facebook suffered basically no long-term consequences for its awful codebase
• Facebook's hiring wasn't hampered by its move-fast-and-break-things reputation, and in fact, for a while, Facebook was poaching talent from Google
• And of course, Facebook's experience with building PHP tooling proves that, with sufficient thrust, pigs fly just fine

### Conclusion

• Design requires continuous investment
• Be consistent in prioritizing strategic programming
• The longer design improvements are put off, the more painful they are to implement

## Chapter 4: Modules Should Be Deep

### Modular Design

• Decompose software so that it can be treated as a system of relatively independent modules
• Ideally, the complexity of a system can be bounded by the complexity of the most complex module
• In practice, however, modules have to interact with one another, which adds further complexity
• The goal of modular design is to minimize these interactions
• Each module has two parts:
• Interface: everything a developer must know to use the module
• Implementation: the code that dictates how the module carries out the operations that its interface specifies
• The interface to a module should be much simpler than its implementation
• Reduces complexity of the parts of the system that interact with this module
• Allows more freedom to change implementation details without affecting the interface

### What's An Interface

• Interface contains two kinds of information
• Formal information:
• Specified in code
• Can be checked by the programming language
• Example:
• Names of functions
• (In strongly typed languages) Number and types of function parameters
• Informal information
• Information that cannot be specified in a way that the programming language can check
• Can only be described via names, comments and documentation
• Example: The fact that a function named deleteFile deletes a file is an example of informal information, since the behavior cannot be checked by the programming language
• Clearly specified interfaces help eliminate unknown unknowns
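
A sketch of the deleteFile example, rendered in Python as `delete_file`. The formal part is the name and the annotated signature; note that in Python the annotations are only checked by an external type checker such as mypy, not by the interpreter. Everything in the docstring is informal. The exact return-value contract is an assumption made for the example.

```python
import os
import tempfile


def delete_file(path: str) -> bool:
    """Delete the file at `path`.

    Informal interface (nothing here can be checked by the language):
    - the file is removed from disk, not merely emptied
    - returns True if a file was deleted, False if none existed
    - directories are never deleted; passing one raises an OSError subclass
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True


# Usage: create a scratch file, delete it, then try again.
fd, scratch = tempfile.mkstemp()
os.close(fd)
first = delete_file(scratch)   # the file existed
second = delete_file(scratch)  # the file is already gone
```

If the docstring were missing, the second call's behavior would be an unknown unknown: a caller could only learn it by reading the implementation or by hitting a bug.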

### Abstractions

• An abstraction is a simplified view of an entity that omits unimportant detail
• The interface of a module is our abstraction for that module
• A good abstraction omits all unimportant detail without omitting any important detail
• An abstraction that includes unimportant detail is complex
• An abstraction that does not include important detail is obscure
• All complex systems that we interact with have abstractions, even if they're not software
• Don't need to know how the engine of a car works — wheel and pedals form an abstraction for car control
• Don't need to know the internals of an oven to use it — oven controls provide an abstraction for setting the temperature, etc
• One thing I think this book and many other software design books neglect to mention is that there's a tradeoff between abstraction and performance
• I don't need to understand how my car's engine works, but Lewis Hamilton sure does
• Similarly, many professional cooks take the time to understand how their oven works and take measures to calibrate it in order to ensure consistent results

### Deep Modules

• The best modules provide powerful functionality with simple interfaces
• These are deep modules
• In contrast, shallow modules have relatively complex interfaces for relatively little functionality
• Depth can thus be defined as the ratio {$D = C_{\mathit{Im}} / C_{\mathit{If}}$}
• {$C_{\mathit{Im}}$}: Implementation complexity
• {$C_{\mathit{If}}$}: Interface complexity
• A good example of a deep module is the system calls for file I/O in Unix:
• 5 basic system calls
• int open(const char* path, int flags, mode_t permissions)
• ssize_t read(int fd, void* buffer, size_t count)
• ssize_t write(int fd, const void* buffer, size_t count)
• off_t lseek(int fd, off_t offset, int referencePosition)
• int close(int fd)
• Did he forget flush?
• These 5 system calls hide the complexity of hundreds of thousands of lines of filesystem code
• Another example of a deep module is the garbage collector in memory managed languages
• Most code doesn't even need to know that the garbage collector exists, so the interface complexity is zero or almost zero
• Complex implementation that works almost entirely behind the scenes
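
The five calls can be exercised from Python through the `os` module, which exposes thin wrappers over them. This is a sketch under simple assumptions: a small payload and a regular file, so a single `os.write` and a single `os.read` are treated as sufficient (in general both may transfer fewer bytes than requested). The `roundtrip` helper is hypothetical.

```python
import os
import tempfile


def roundtrip(path, payload: bytes) -> bytes:
    """Write `payload` to `path`, seek back, and read it again."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)  # open
    try:
        os.write(fd, payload)              # write
        os.lseek(fd, 0, os.SEEK_SET)       # lseek back to the start
        return os.read(fd, len(payload))   # read
    finally:
        os.close(fd)                       # close


with tempfile.TemporaryDirectory() as tmp:
    echoed = roundtrip(os.path.join(tmp, "demo.bin"), b"deep module")
```

Nothing in the caller mentions block allocation, directory layout, caching or device drivers, which is the point of the example.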

### Shallow Modules

• Relatively complex interface in comparison to the amount of functionality provided
• Example: void addNullValueForAttribute(String attribute)
• This method signature hides no complexity
• More complex to invoke this method than it is to directly add a null to the underlying data
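
A Python rendering of the same shallow method, as a sketch. The `Record` class and its `data` dictionary are assumptions made for the example; the notes only give the method signature.

```python
class Record:
    """Hypothetical container whose attributes live in a plain dict."""

    def __init__(self):
        self.data = {}

    # Shallow: the method hides nothing. The caller still has to know that
    # attributes are stored in a mapping and that None is a legal value.
    def add_null_value_for_attribute(self, attribute):
        self.data[attribute] = None


record = Record()
record.add_null_value_for_attribute("nickname")  # via the shallow method
record.data["middle_name"] = None                # doing it directly
```

The direct assignment is shorter than the method call and requires no extra knowledge, so the method adds interface without adding abstraction.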

### Classitis

• Conventional wisdom is that classes and functions should be small, not deep
• Results in a large number of interacting classes and functions
• While each individual class is simple the added complexity from their interaction often overwhelms the complexity savings in having smaller classes
• Leads to a more verbose programming style
• Ousterhout either doesn't understand or doesn't mention the reason behind the current trend towards many smaller interacting classes and functions: unit testing
• It's much easier to write unit tests around small classes, since you can create mocks for everything the class interacts with, making it easy to pass fake data to the class
• It's also a lot easier to Goodhart unit test coverage numbers by breaking up large modules into smaller modules and then writing unit tests on the smaller modules
• Maybe Ousterhout doesn't realize that out here, in the real world, we don't have fancy type systems and the only way to have any assurance that your code works at all is to smother it in unit tests

### Example: Java and Unix I/O

• Java seems to have a culture of classitis
• In order to read serialized objects from a file in Java, you have to wrap the following three classes one inside the next:
• FileInputStream
• BufferedInputStream
• ObjectInputStream
• The FileInputStream provides only rudimentary I/O
• BufferedInputStream adds buffering
• ObjectInputStream gets you the ability that you actually want: the ability to read and write serialized objects
• Requiring the developer to request buffering manually is error-prone
• Interfaces should be designed to make the common case as simple as possible
• The developer almost always wants a buffered stream, so FileInputStream should be buffered by default, with a non-default constructor that returns an unbuffered stream if the developer specifically asks for one
• In contrast, the Unix file interface is much simpler
• Optimizes for the common case of sequential access
• Random access is possible with lseek, but the developer doesn't have to concern themselves with that function if they don't need it
• This section seems really unfair to Java
• Java has abstractions that handle the common case: FileReader, for example
• Moreover, the Java example is doing way more than the Unix file interface -- the Unix filesystem API just gets you bytes, whereas here you're reading and deserializing objects
• It's like comparing a pocketknife to a CNC mill and saying the pocketknife is superior because it has a simpler interface
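
For comparison only, and not from the book: Python's built-in `open` follows the "buffered by default, unbuffered on request" design suggested above. The sketch below reads and writes a serialized object with `pickle` on top of it; the helper names `save_object` and `load_object` are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(path, obj):
    """Serialize `obj` to `path` through the default (buffered) stream."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Read back an object written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "point.pkl")
    save_object(path, {"x": 1, "y": 2})
    restored = load_object(path)
    # The common case is buffered; buffering=0 is the explicit opt-out.
    with open(path, "rb") as buffered, open(path, "rb", buffering=0) as raw:
        buffered_type = type(buffered).__name__
        raw_type = type(raw).__name__
```

The caller never asks for buffering, yet gets it. The unbuffered stream is still available to the rare caller who needs it.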

### Conclusion

• Hide complexity by separating interface from implementation
• Make modules deep

## Chapter 5: Information Hiding and Leakage

### Information Hiding

• Modules should encapsulate information
• Hidden information usually captures implementation decisions
• How to efficiently store and access data in a data structure
• How to use the file system
• Network protocol details
• Information hiding reduces complexity in two ways:
• Reduces cognitive load on users of the module
• Makes it easier to evolve the system by allowing changes to the implementation of the module to take place independently of the module's interface
• Declaring variables private is not the same as information hiding -- exposing internal information via getters and setters is not really any different than having a public variable
• Information can be partially hidden
• Using separate methods for "simple" and "advanced" usage hides the advanced usage from those who just want to use the module with its default settings
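
A minimal sketch of a class that hides a storage decision instead of exposing it through a getter. `VisitCounter` and its methods are hypothetical; the point is that callers ask questions ("how many visits?") rather than receiving the internal dictionary.

```python
class VisitCounter:
    """Counts visits per page. How the counts are stored stays private."""

    def __init__(self):
        # Implementation decision: a dict. It could become a database table
        # later without any change to the methods below.
        self._counts = {}

    def record(self, page):
        self._counts[page] = self._counts.get(page, 0) + 1

    def visits(self, page):
        return self._counts.get(page, 0)

    def top(self, n=3):
        """Most-visited pages; the default covers the simple case."""
        ranked = sorted(self._counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [page for page, _count in ranked[:n]]


counter = VisitCounter()
for page in ["/home", "/docs", "/home"]:
    counter.record(page)
```

A `get_counts()` method that returned `_counts` would make the field private in name only: every caller would then depend on it being a dictionary.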

### Information Leakage

• Information leakage is the opposite of information hiding
• Occurs when a design decision is reflected in multiple locations
• Creates a dependency between modules -- any change to the design decision will require changes in multiple places
• Information in a module's interface is leaked by definition -- simpler interfaces are better at information hiding
• However, information can leak even if it's not in the interface
• Information leakage is an important red flag
• When you encounter unexpected information leakage, you should figure out how to consolidate modules so that a change will only affect a single module
• Merge classes
• Create a new class or interface that encapsulates the information

### Temporal Decomposition

• A common cause of information leakage is temporal decomposition
• Structure of the system corresponds to the order in which events occur
• Example: process in which the system has to read a file, transform the data, and then write some output
• Tempting to have three classes: reader, modifier, writer
• However, both the reader and the writer will need to have knowledge of the file format
• A better design is to combine the reader and the writer into a single IO class, which is responsible for transforming the file on disk into a set of data structures for the modifier class
• When designing modules, focus on the information required to implement each operation, rather than the order of operations
• Operations that require the same information should be together, even if they're not called in order
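
A sketch of the combined design, assuming a trivial "name,score" CSV file format that is invented for the example. The hypothetical `ScoreFile` class is the only place that knows the format; the modifier step works on plain dictionaries.

```python
import csv
import io


class ScoreFile:
    """Owns the file format: one 'name,score' pair per line."""

    @staticmethod
    def parse(text):
        """Turn file contents into a {name: score} dict."""
        rows = csv.reader(io.StringIO(text))
        return {name: int(score) for name, score in rows}

    @staticmethod
    def render(scores):
        """Turn a {name: score} dict back into file contents."""
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        for name, score in scores.items():
            writer.writerow([name, score])
        return out.getvalue()


def curve(scores, bonus):
    """The 'modifier' step: knows nothing about CSV."""
    return {name: score + bonus for name, score in scores.items()}


original = "ann,90\nbob,72\n"
updated = ScoreFile.render(curve(ScoreFile.parse(original), 5))
```

Switching the format to, say, JSON would now touch `ScoreFile` only. With separate reader and writer classes, the same change would have to be made twice and kept in sync.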

### Example: HTTP Server

• Students were asked to implement a simple HTTP server
• Stateless protocol that consists of simple requests and responses

### Example: Too Many Classes

• The most common mistake made by students when implementing the HTTP server was having too many classes
• Example: separate classes for reading and parsing requests
• It doesn't make sense to read a request without parsing it
• Callers would need to invoke the two classes in the correct order
• A single Reader class that could both read and parse the request would have been a better choice
• Too many small classes can be just as detrimental to information hiding as too few large classes

### Example: HTTP parameter handling

• Most student applications correctly anticipated that the calling code should not need to know whether a parameter was specified in a header, a URL or a body
• Also correctly realized that parameters should be transparently URL encoded and decoded
• However the interface to the parameters was often too shallow
• Parameters were returned as a single Map object, rather than allowing the calling code to select individual parameters by name
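
A sketch of the deeper interface: one lookup by name, with the parameter source and the URL decoding hidden. The `Request` class and `get_parameter` are hypothetical; the body is assumed to be form-encoded, headers are left out for brevity, and letting body values override URL values is an arbitrary choice made for the example.

```python
from urllib.parse import parse_qs, urlsplit


class Request:
    """Hypothetical request object exposing parameters by name only."""

    def __init__(self, url, body=""):
        self._params = {}
        # Callers never learn which source a parameter came from.
        for source in (urlsplit(url).query, body):
            for name, values in parse_qs(source).items():
                self._params[name] = values[0]  # parse_qs already URL-decodes

    def get_parameter(self, name, default=None):
        return self._params.get(name, default)


request = Request("/search?q=deep%20modules&page=2", body="lang=en")
```

Because the internal map is never returned, the representation can change (for example, to keep repeated parameters) without breaking callers.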

### Example: Defaults in HTTP Responses

• The most common mistake in students' HTTP response code was inadequate defaults
• Callers should not have to worry about specifying the HTTP version — the response should use the same HTTP version as the request automatically
• Good defaults are another form of information hiding
• Classes should "do the right thing" without having to be told to do so explicitly
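
A sketch of a response that picks up the HTTP version from the request on its own. `HttpRequest`, `HttpResponse` and the default status values are assumptions made for the example, not the students' actual code.

```python
class HttpRequest:
    """Hypothetical request; only the version matters for this sketch."""

    def __init__(self, version="HTTP/1.1"):
        self.version = version


class HttpResponse:
    """Hypothetical response with defaults for everything but the request."""

    def __init__(self, request, status=200, reason="OK", body=""):
        self._version = request.version  # copied; the caller is never asked
        self.status = status
        self.reason = reason
        self.body = body

    def serialize(self):
        payload = self.body.encode("utf-8")
        head = (
            f"{self._version} {self.status} {self.reason}\r\n"
            f"Content-Length: {len(payload)}\r\n\r\n"
        )
        return head.encode("ascii") + payload


wire = HttpResponse(HttpRequest("HTTP/1.0"), body="hi").serialize()
```

The caller supplies only what is specific to this response (here, the body); the version, the status and the Content-Length header are all filled in automatically.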

### Information Hiding Within A Class

• We can extend the principle of information hiding to the internal structure of a class
• Use each instance variable in only a few places
• Make private methods encapsulate information just like public ones

### Taking it too far

• Don't hide information that the consumers of a class will need
• Minimize the amount of information a class exposes, but keep in mind that every class will need to expose some information

### Conclusion

• Information hiding is closely related to deep modules
• Modules that hide a lot of information are deep
• Try not to be influenced by the order of operations when decomposing a design into modules