This is a blogified version of my 18-minute PyJamas talk, Refactor refactoring—How changing your views on refactoring can make your job more satisfying.
Refactoring is a topic I have been interested in for most of my programming career but struggled to grasp. Early in my journey, I watched several talks by brilliant developers who impressed and intimidated me with their ability to test, manipulate code, and refactor it into an elegant solution. Additionally, most of the material on the subject was too cerebral for me to absorb.
Thankfully, I’ve recently realized there’s much misunderstanding about refactoring, and I want to clarify it. Being able to confidently refactor your code will multiply your effectiveness and bring you much more satisfaction.
What is refactoring?
Refactoring is improving code without changing its behavior.
This definition is one that most people don’t seem to know—at least in my experience.
When the teams I have been a part of talked about refactoring, they meant somebody needed to fix something. Usually, that meant project management would pull a developer or two away from creating new functionality to clean up code, reduce technical debt, or introduce a new dependency. Then, once they finished, there would be the additional pain of bringing their code back into the main line where we were all working.
That’s not refactoring.
Refactoring is a tool that improves the design of your code and how it communicates.
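To make that definition concrete, here is a minimal sketch. The function names, the threshold, and the discount are all invented for illustration; they are not from the talk. The two functions compute exactly the same result, and the second only communicates its intent better.

```python
# Before: works, but the names and magic numbers hide the intent.
def total_before(p, q):
    t = p * q
    if t > 100:
        t = t - t * 0.1
    return t


# After: same behavior, clearer design. The constants are hypothetical.
BULK_THRESHOLD = 100
BULK_DISCOUNT = 0.1


def order_total(unit_price, quantity):
    """Return the order total, applying a discount to large orders."""
    subtotal = unit_price * quantity
    if subtotal > BULK_THRESHOLD:
        return subtotal - subtotal * BULK_DISCOUNT
    return subtotal
```

If a caller could tell the two versions apart by their results, the change would be a behavior change, not a refactoring.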
Why refactor?
There are two primary reasons we refactor:
- to make our code more understandable, and
- to support new directions
Make code understandable
One of the guiding principles Guido van Rossum used to shepherd Python was “code is read much more often than it is written.” With that insight, we need to write our code in a way that drives understanding of what it does.
For example, look at a piece of code.
@app.get("/prices")
async def compute_price(
    type: str,
    age: Optional[int] = None,
    date: Optional[datetime.date] = None,
):
    result = await database.fetch_one(
        select(base_price_table.c.cost)
        .where(base_price_table.c.type == type),
    )
    if age and age < 6:
        return {"cost": 0}
    else:
        if type != 'night':
            holidays = await database.fetch_all(select(holidays_table))
            is_holiday = False
            reduction = 0
            for row in holidays:
                if date:
                    if date == row.holiday:
                        is_holiday = True
            if not is_holiday and date and date.weekday() == 0:
                reduction = 35
            # TODO apply reduction for others
            if age and age < 15:
                return {"cost": math.ceil(result.cost * .7)}
            else:
                if not age:
                    cost = result.cost * (1 - reduction / 100)
                    return {"cost": math.ceil(cost)}
                else:
                    if age > 64:
                        cost = result.cost * .75 * (1 - reduction / 100)
                        return {"cost": math.ceil(cost)}
                    else:
                        cost = result.cost * (1 - reduction / 100)
                        return {"cost": math.ceil(cost)}
        else:
            if age and age >= 6:
                if age and age > 64:
                    return {"cost": math.ceil(result.cost * .4)}
                else:
                    return result
            else:
                return {"cost": 0}
This is an API endpoint that calculates the price of a lift ticket for a ski lodge. This example reminds me that computers will execute code no matter how confusing it is to humans. We need to write code in such a way that somebody new to the code base should understand:
- not only what the code does but
- how many variations there are in its behavior and
- how the behavior varies
How would you feel if you had to maintain this code? Do you feel like you understand this code? What would it take if you needed to add a new kind of lift ticket? How would you approach that task?
Let’s compare that to a refactored version that does the same thing.
async def compute_price(
    kind: str,
    age: int = AGE_MISSING,
    date: datetime.date = DATE_MISSING,
):
    base_price = await get_base_price(kind)
    if kind == 'night':
        if age < 6:
            return 0
        if 64 < age:
            return math.ceil(base_price * .4)
        return base_price
    else:
        if age < 6:
            return 0
        if age < 15:
            return math.ceil(base_price * .7)
        price_with_date_discount = (
            base_price * await date_discount(date)
        )
        if 64 < age:
            return math.ceil(price_with_date_discount * 0.75)
        else:
            return math.ceil(price_with_date_discount)
This code makes it clear that there are only two kinds of tickets, and the age of the person is the primary way the price varies.
The prior code is not bad; we’ve all written code like this. This code reliably works. But the second example is so much more understandable.
This clarity is one thing we should strive to achieve when we refactor.
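The refactored version calls a `date_discount` helper that isn't shown here. As a rough sketch of what it might look like, the code below pulls the holiday and Monday logic out of the original endpoint. Several things are my assumptions: `DATE_MISSING` is taken to be `None`, the database query is swapped for an in-memory `HOLIDAYS` set, and the sample holiday is invented. The 35% Monday reduction comes from the original code.

```python
import asyncio
import datetime

DATE_MISSING = None  # assumption: the real sentinel is not shown in the post

# Assumption: stands in for the rows of holidays_table; the date is made up.
HOLIDAYS = {datetime.date(2019, 2, 18)}

MONDAY = 0  # datetime.date.weekday() returns 0 for Monday
MONDAY_MULTIPLIER = 0.65  # a 35% reduction, as in the original endpoint
NO_DISCOUNT = 1.0


async def date_discount(date):
    """Return the multiplier to apply to the base price for this date."""
    if date is DATE_MISSING:
        return NO_DISCOUNT
    if date in HOLIDAYS:
        return NO_DISCOUNT
    if date.weekday() == MONDAY:
        return MONDAY_MULTIPLIER
    return NO_DISCOUNT


# Usage: a non-holiday Monday gets the reduced multiplier.
multiplier = asyncio.run(date_discount(datetime.date(2019, 2, 25)))
```

Giving the rule its own name is what lets `compute_price` read as a list of age brackets instead of a maze of nested conditions.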
Support new directions
The second reason we refactor is to have our projects support new directions.
Distinguished computer science professor Terry Winograd once wrote:
The main activity of programming is not the origination of new independent programs but the integration, modification, and explanation of existing ones.
Essentially what he’s saying is that the majority of the time we program, we’re not writing new code. Instead, we are modifying what’s already there. We are explaining how it works, whether through documentation or meetings, or we’re trying to integrate things.
He goes on to say:
it's extremely difficult to modify existing programs... the cost of which is almost never measured but in at least one case exceeded the original development cost by a factor of 100.
That’s staggering. It is expensive to maintain code. Why is that?
One reason is that we are under pressure to deliver new features as often as possible. When we approach our project with a new feature, we look for code closely related to what we want to do. We then add more code in that space to support what we need without stopping to think about the overall design of our project and how to improve it.
This practice increases the technical debt of our project. Technical debt accumulates as we go along, making it more expensive to maintain our projects. Eventually, it gets to the point where, as I said before, somebody needs to go in and fix it. Teams go through that pain to get their code back to a place where they can use it again.
But thankfully, unlike bricklayers, it is easier for us to update code that’s already working. Brittle and stagnant code can become pliable and meet new requirements when refactored.
When should we refactor?
There are two primary times we should refactor.
Before we add new functionality
Kent Beck once said:
for each desired change, make the change easy (warning: this may be hard), then make the easy change
— Kent Beck 🌻 (@KentBeck) September 25, 2012
Approaching your project with a new requirement is the perfect time to refactor. There is a tangible reason why your code needs to change in a specific way.
You can look at your code and say, “If we had known in the past that we would need to do this, how would we have written our code?” Then you can refactor your code to achieve that.
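Here is a small sketch of "make the change easy, then make the easy change," using the senior discount from the lift ticket example. The "day" and "half-day" ticket kinds, the half-day rate, and every function and constant name are hypothetical; I made them up to show the shape of the idea.

```python
import math


# Starting point: adding a new kind means editing the branching logic.
def senior_price_v0(kind, base_price):
    if kind == "night":
        return math.ceil(base_price * 0.4)
    else:
        return math.ceil(base_price * 0.75)


# Step 1: make the change easy. Behavior is unchanged, but the rates
# now live in data instead of in the control flow.
SENIOR_RATES = {"night": 0.4}
DEFAULT_SENIOR_RATE = 0.75


def senior_price(kind, base_price, rates=SENIOR_RATES):
    rate = rates.get(kind, DEFAULT_SENIOR_RATE)
    return math.ceil(base_price * rate)


# Step 2: make the easy change. The new (hypothetical) ticket kind is
# one extra entry, and senior_price itself does not change.
RATES_WITH_HALF_DAY = {**SENIOR_RATES, "half-day": 0.5}
```

Step 1 is the refactoring: you can verify it against the old function before any new behavior exists. Step 2 is the feature, and by then it is nearly trivial.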
After achieving something
The Test Driven Development community has a phrase, “red, green, refactor.”
The red part comes first: a developer writes a test that exercises code they want to exist. Then they run the test, and it fails, traditionally showing a red mark.
Then they write the least amount of code to change the result, and eventually, the test passes, traditionally showing green.
Now that there’s a working layer of tests, the developer refactors their code to make it more elegant, understandable, or readable. They can be confident that the behavior has not changed, because the tests will fail if it does.
This practice is excellent because it separates two things many of us try to do simultaneously: solving the problem and writing elegant code. Doing both at once is a burden. The red, green, refactor approach allows us to write code that solves the problem first and then make it clear.
Any time you achieve something, look at your code and ask, “Is this code so clear that anyone can understand it, including me in 14 months?” Then refactor your code to make it more explicit and understandable.
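As a rough illustration of one red, green, refactor cycle, here is a sketch built around the "children under six ride free" rule from the lift ticket example. The function names and the test helper are mine, and the test takes the implementation as an argument only so the same check can run against both versions in one file.

```python
# Red: write the test first. Before any implementation exists, it fails.
def check_children_under_six_ride_free(is_free):
    assert is_free(5) is True
    assert is_free(6) is False
    assert is_free(None) is False  # a missing age is not free


# Green: the least code that makes the test pass, elegance aside.
def is_free_ticket_first_pass(age):
    if age is not None:
        if age < 6:
            return True
        else:
            return False
    else:
        return False


# Refactor: same behavior, clearer intent, verified by the same test.
FREE_AGE_LIMIT = 6  # hypothetical name for the threshold


def is_free_ticket(age):
    return age is not None and age < FREE_AGE_LIMIT


check_children_under_six_ride_free(is_free_ticket_first_pass)
check_children_under_six_ride_free(is_free_ticket)
```

In a real project the check would live in a test file and run under your test runner; the point is only that the test does not change between the green step and the refactor step.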
How to refactor
Everything above has been helpful in my journey of learning to refactor, but the thing that really made it satisfying was finding a framework that helped me refactor.
This framework is called the Flocking Rules, and two developers in the Ruby community, Sandi Metz and Katrina Owen, created it.
I appreciate these women so much because they spent years developing a way to teach refactoring to people who didn’t know design patterns.
Most refactoring advice expects you to know where you want to end up. But what if you don’t know that? Or what if you are implementing a pattern that doesn’t fit your situation well? This framework allows good design patterns to emerge from your needs.
Additionally, this framework has a fractal nature that allows you to sit in front of any code and improve it. And it enables your code to continue working, even if you were pulled away during a refactoring session.
Unfortunately, it’s much easier to understand the framework when watching someone perform it. Nonetheless, here is the framework:
The Flocking Rules
- Identify what to work on
- Find the things that are most alike
- Identify the smallest difference between them
- Make them identical. (Run tests after each step)
- Create a component (variable, parameter, function, or class) that will resolve the variations.
- Implement code to supply one variation.
- Replace one of the differences with a call to the component.
- Delete any unused code
- Repeat until you eliminate the differences
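The rules above can be sketched on a tiny example. This is my own illustration based on the senior discounts in the lift ticket code, not an example from the framework's authors, and all the function names are hypothetical.

```python
import math


# Starting point: two functions that are almost alike.
def night_senior_price(base_price):
    return math.ceil(base_price * 0.4)


def day_senior_price(base_price):
    return math.ceil(base_price * 0.75)


# Find the things that are most alike: the two functions above.
# Identify the smallest difference between them: 0.4 versus 0.75.

# Create a component (here, a parameter) that resolves the variation.
def senior_price(base_price, rate):
    return math.ceil(base_price * rate)


# Replace one difference at a time with a call to the component,
# running the tests after each replacement.
def night_senior_price_flocked(base_price):
    return senior_price(base_price, 0.4)


def day_senior_price_flocked(base_price):
    return senior_price(base_price, 0.75)


# Each step keeps the code working, so the old and new versions agree.
for price in (0, 19, 35):
    assert night_senior_price(price) == night_senior_price_flocked(price)
    assert day_senior_price(price) == day_senior_price_flocked(price)
```

Once every caller uses the new component, the old function bodies are unused and can be deleted, which is the "delete any unused code" step.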
In summary
"By continuously improving the design of code, we make it easier and easier to work with. This is in sharp contrast to what typically happens: little refactoring and a great deal of attention paid to expediently adding new features. If you get into the hygienic habit of refactoring continuously, you'll find that it is easier to extend and maintain code."
Practice the flocking rules with several refactoring katas, and then apply your learning to your daily work. You will find your work more satisfying.
Additionally, I’ve created the Python Refactoring Toolkit, an incredible resource to help you get a jump start on your refactoring journey.