jen@unicornds.org

Why Data Scientists Should Try Advent of Code

Advent of Code (AoC) is a daily coding challenge that happens in December. It's probably the most joyful and effective way to level up your coding skills!

Photo by Anna Zhynhel / Unsplash

We are just a few days away from December. When midnight EST arrives on December 1st, another round of the annual Advent of Code tradition will begin. I'm beyond thrilled!

This will be my second year taking on the Advent of Code (AoC) challenges, and my journey into this world was quite fortuitous.

Last year around this time, I was attending the Recurse Center (which is another thing Data Scientists should consider; more on that in a future post!). As December approached, I started hearing murmurs about "AoC". Fellow attendees shared their excitement and strategies for tackling the upcoming challenges. When December 1st arrived, the community erupted with animated discussions about each day's problems and solutions.

Coming from Data Science, I had never encountered "AoC" before (though I could at least deduce from context it wasn't about the esteemed congress member). It took me a few days to work up the courage, but I finally learned from fellow Recursers that it's an annual holiday-themed programming challenge: each day brings a new problem that you can solve in any programming language you choose.

The date I properly discovered AoC was Dec. 4th, 2023. And what followed was two months (yes, beyond December!) of complete obsession with coding challenges. I was hooked. Some days, I could barely peel myself away from the computer. After completing each day's challenge, I found myself diving into problems from previous AoC years. At some point, several Recursers pointed me toward Project Euler, a collection of mathematically-focused programming challenges. Naturally, I jumped into those too.

For both AoC and Project Euler, I used Python. While I'd been coding in Python for years - building everything from data analysis pipelines to production machine learning systems - these two months of intense problem-solving addiction taught me so much. The experience was like a high-intensity, maximum-joy training camp for Python skills. Here's what I learned along the way.

ABT: Always Be Testing

For my very first few AoC problems, I read the questions and immediately dived in. I wrote up the code, got an answer, and submitted it. If it was wrong, I tried to find the problem and submitted again, hoping for the best.

But AoC problems involve many little steps. Very quickly I was caught up in so many bugs and edge cases that submitting an answer felt more like a shot in the dark than a confident attempt.

So by day 3, I started to have a snippet of code that became a recurring section in my solution each day:

def mini_test():
    filename = "input-test.txt"
    assert get_total(filename) == 4361

    filename = "input-test2.txt"
    assert get_total(filename) == 7 + 592

The simple practice of having sanity tests was a big quality-of-life improvement. I've always advocated for writing tests. But the habit of constantly testing turned out to be valuable for hobby coding challenges too.

Performance and Optimization

As data scientists, our code might have the reputation of not being the most performant. There's a reason: we're typically not tasked with building systems that need sub-millisecond responses. If our big analysis takes a few minutes or even hours to churn through, we can always grab a coffee or, as I did in grad school, let it run overnight.

But Advent of Code will not let this complacency slide. Each problem comes in two parts, where the second part often tests your solution's scalability.

I learned this lesson the hard way on 2023 Day 5. My solution was highly inefficient, and required going through over 2 billion samples to find the correct answer. It took many hours.

Along the journey I learned many Python optimization methods. For instance, removing the first element of a list with arr.pop(0) requires shifting all remaining elements, making it dramatically slower than arr.pop(), which removes the item from the end. (When you really do need to pop from the front, collections.deque offers a fast popleft().) This seemingly small detail made the difference between a solution that ran in seconds versus hours. Another example is using copy.deepcopy() to duplicate a list, which is significantly slower than a list comprehension such as [a for a in arr]. Faster still is arr.copy(). Note that the last two make shallow copies, so they are only a drop-in replacement for deepcopy when the list holds immutable items such as numbers or strings.

"""A little benchmark demo for list operations in Python."""
import timeit

def benchmark_pop():
    setup = """
from collections import deque

arr = list(range(100000))
deq = deque(range(100000))
"""
    pop_start = timeit.timeit('arr.pop(0)', setup=setup, number=10000)
    pop_end = timeit.timeit('arr.pop()', setup=setup, number=10000)
    deque_pop = timeit.timeit('deq.popleft()', setup=setup, number=10000)

    print("Time to pop 10000 elements:")
    print(f"pop(0):     {pop_start:.4f} seconds")
    print(f"pop():      {pop_end:.4f} seconds")
    print(f"deque.popleft(): {deque_pop:.4f} seconds")


def benchmark_copy():
    setup = """
import copy
arr = list(range(10000))
"""

    # Test different copy methods
    deep_copy = timeit.timeit('copy.deepcopy(arr)', setup=setup, number=1000)
    list_comp = timeit.timeit('[x for x in arr]', setup=setup, number=1000)
    list_copy = timeit.timeit('arr.copy()', setup=setup, number=1000)

    print("\nTime to copy a list 1000 times:")
    print(f"copy.deepcopy(): {deep_copy:.4f} seconds")
    print(f"list comprehension: {list_comp:.4f} seconds")
    print(f"list.copy(): {list_copy:.4f} seconds")



if __name__ == "__main__":
    benchmark_pop()
    benchmark_copy()


"""
Time to pop 10000 elements:
pop(0):     0.1916 seconds
pop():      0.0007 seconds
deque.popleft(): 0.0005 seconds

Time to copy a list 1000 times:
copy.deepcopy(): 4.5081 seconds
list comprehension: 0.2337 seconds
list.copy(): 0.0244 seconds
"""
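One caveat about the copy benchmark: the three methods are not interchangeable. Here is a minimal sketch of the difference, using a small made-up nested list (the variable names are just for illustration). A shallow copy shares the inner lists with the original, while copy.deepcopy() duplicates them.

```python
import copy

# A hypothetical 2x2 puzzle grid stored as a list of lists.
grid = [[1, 2], [3, 4]]

shallow = grid.copy()                  # new outer list, same inner lists
comprehension = [row for row in grid]  # also shallow: rows are shared
deep = copy.deepcopy(grid)             # inner lists are duplicated too

# Copying each row is a common middle ground for 2D grids of numbers.
row_copies = [row.copy() for row in grid]

# Mutate the original after the copies were taken.
grid[0][0] = 99

shallow_changed = shallow[0][0] == 99           # True: shares the row
comprehension_changed = comprehension[0][0] == 99  # True: shares the row
deep_changed = deep[0][0] == 99                 # False: fully independent
row_copies_changed = row_copies[0][0] == 99     # False: rows were copied
```

So for a flat list of numbers, arr.copy() is the quick and safe choice. For nested data, the faster options only work if you never mutate the inner objects.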

Amazing Libraries

One of the unexpected joys of Advent of Code was discovering powerful Python libraries. While pandas and numpy are the bread and butter of data analysis, AoC introduced me to some incredibly useful standard library modules, as well as third-party packages. Here are some libraries that were particularly helpful for AoC.

Standard Libraries

  • collections.defaultdict is a dictionary that creates a default value the first time a missing key is accessed. No more if key in dict checks!
  • itertools has a combinations function that generates every r-element combination of an iterable. Its permutations and product utilities cover ordered arrangements and Cartesian products.
  • heapq is a priority queue implementation, built on a plain list, that's perfect for pathfinding problems. More on this later.
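To make these three modules concrete, here is a small sketch of how each one tends to show up in a puzzle solution. The helper name group_by_length and the sample data are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations, permutations, product
import heapq


def group_by_length(words):
    """Group words by length without any 'if key in dict' checks."""
    groups = defaultdict(list)  # missing keys start as an empty list
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)


# itertools: unordered pairs, orderings, and Cartesian products.
pairs = list(combinations([1, 2, 3], 2))
orders = list(permutations("ab"))
cells = list(product(range(2), repeat=2))

# heapq: a plain list used as a priority queue, smallest item first.
queue = []
heapq.heappush(queue, (5, "far"))
heapq.heappush(queue, (1, "near"))
heapq.heappush(queue, (3, "middle"))
closest = heapq.heappop(queue)

print(group_by_length(["a", "bb", "cc", "d"]))  # {1: ['a', 'd'], 2: ['bb', 'cc']}
print(pairs)    # [(1, 2), (1, 3), (2, 3)]
print(orders)   # [('a', 'b'), ('b', 'a')]
print(cells)    # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(closest)  # (1, 'near')
```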

External Libraries

  • numpy: Yay, we are already familiar with numpy! AoC often has questions involving matrices and vectors, and numpy can be very helpful for those.
  • networkx: A powerhouse for graph problems. Once in a while, AoC challenges involve graph structures, and networkx makes these tasks very manageable.
  • z3 (installed from PyPI as z3-solver): Don't let its unassuming documentation mislead you. z3 is such a powerful library that using it for AoC can feel like cheating. z3 is a theorem prover from Microsoft Research that can solve complex constraint satisfaction problems. Instead of writing intricate algorithms, you can often express the problem as a set of constraints and let z3 find the solution. I'm still trying to figure out how it works exactly. My current hypothesis is that its underlying algorithm is magic.

Visualizing the Problem

Data Scientists don't need convincing that visualization is extremely effective for problem solving. For AoC, I did find quick plotting very helpful, especially for questions that had hidden patterns.

A quick plot in a Colaboratory notebook using matplotlib can reveal the hidden pattern in an AoC problem.

Throughout my AoC binge, I relied heavily on visualization to make sense of a problem. Sometimes that meant good old-fashioned pen and paper.

Sometimes simple print() statements can work well enough for terminal-based visualization.
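As a rough sketch of what that kind of terminal rendering can look like, assume a puzzle state stored as sets of (x, y) coordinates. The render function and the characters it uses are my own choices for illustration, not anything AoC prescribes.

```python
def render(walls, visited, width, height):
    """Return the grid as text: 'O' for visited cells, '#' for walls, '.' otherwise."""
    lines = []
    for y in range(height):
        row = ""
        for x in range(width):
            if (x, y) in visited:
                row += "O"
            elif (x, y) in walls:
                row += "#"
            else:
                row += "."
        lines.append(row)
    return "\n".join(lines)


# Hypothetical state: a wall down the middle column, two visited cells on the left.
walls = {(1, 0), (1, 1)}
visited = {(0, 0), (0, 1)}
print(render(walls, visited, width=3, height=2))
# O#.
# O#.
```

Printing the grid after every step, or every hundredth step, is often enough to spot where a simulation goes off the rails.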

For animation, I began to use Python's built-in curses module. The resulting terminal animations really helped with debugging.

Filling the Gaps on Core Programming Concepts

While data scientists are well-versed in statistical methods and machine learning algorithms, we often have gaps in fundamental computer science concepts. Advent of Code exposed these gaps for me, particularly with puzzles that required classic algorithms and programming patterns.

Take 2023 Day 17, for example. This puzzle involved finding the path through a grid that minimizes heat loss, a classic pathfinding optimization problem. My many brute-force attempts failed until I admitted defeat and learned about Dijkstra's algorithm and how to implement it with Python's built-in heapq. Since then, I've used heapq for many pathfinding challenges from previous years of AoC. You'll always find one of these each year!
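For readers who haven't met it yet, here is a minimal sketch of Dijkstra's algorithm with heapq on a plain grid. It is not a solution to Day 17, which adds its own movement rules. It assumes a simplified setup of my own: you move up, down, left, or right, and entering a cell costs that cell's value. The function name shortest_path_cost is made up for this example.

```python
import heapq


def shortest_path_cost(grid):
    """Cheapest cost from the top-left to the bottom-right cell.

    Simplifying assumption: entering a cell costs its value, and the
    starting cell itself is free.
    """
    height, width = len(grid), len(grid[0])
    start, goal = (0, 0), (height - 1, width - 1)
    best = {start: 0}      # cheapest known cost to reach each cell
    queue = [(0, start)]   # heap of (cost so far, cell)

    while queue:
        cost, (r, c) = heapq.heappop(queue)  # always expand the cheapest cell
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue  # stale entry: a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width:
                new_cost = cost + grid[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return None  # goal unreachable (cannot happen on a full grid)


example = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
print(shortest_path_cost(example))  # 4: down, down, right, right
```

The heap is what makes this fast: instead of scanning every candidate cell for the cheapest one, heappop hands it over directly.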

Similarly, many puzzles that seemed impossibly complex became more manageable after I practiced dynamic programming over and over. Honestly, dynamic programming still feels magical, but after several repetitions, the thought process of breaking problems into smaller, overlapping subproblems started to feel a bit more fluid.
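Here is a small sketch of that "overlapping subproblems" idea, using a made-up puzzle rather than a real AoC one: count the ways to build a target string by concatenating pieces from a list. The function name count_ways is hypothetical, and the sketch assumes every piece is a non-empty string. functools.lru_cache remembers the answer for each starting position so it is only computed once.

```python
from functools import lru_cache


def count_ways(target, pieces):
    """Count the ways to build target by concatenating pieces (reuse allowed).

    Assumes every piece is non-empty; an empty piece would recurse forever.
    """
    pieces = tuple(pieces)

    @lru_cache(maxsize=None)
    def ways(start):
        # Subproblem: how many ways are there to build target[start:]?
        if start == len(target):
            return 1  # nothing left to build: one valid way
        return sum(
            ways(start + len(piece))
            for piece in pieces
            if target.startswith(piece, start)
        )

    return ways(0)


print(count_ways("aaaa", ["a", "aa"]))           # 5
print(count_ways("abc", ["ab", "c", "a", "bc"]))  # 2
```

Without the cache, the same suffixes get recounted again and again, and the running time explodes as the target grows. With it, each starting position is solved once. For very long targets, an iterative version that fills a table from the end avoids Python's recursion limit.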

Everyone's Journey is Different; Find a Goal that Works for You

If you have never done AoC before and would like to get better at coding, I highly suggest you give it a try.

For me, the month that I discovered AoC, I probably spent a concerning 80+ hours on these puzzles. While this intensity was clearly not sustainable, it was the most I have ever learned in such a short amount of time. At the end of the month, I felt like a more confident and adept programmer, with my passion for coding renewed.

It wasn't just coding skills I learned, but also problem-solving philosophies. The biggest lesson? Knowing when to step back. If I'm still stuck on the same problem after a couple of hours, it usually means I'm facing "unknown unknowns": fundamental concepts I haven't yet learned. Sometimes the best way forward is to pause, research common algorithms and patterns, study other solutions, then return with new knowledge.

And working on AoC while at Recurse was in itself an amazing treat. As an AoC newbie, I was exchanging ideas with and learning from amazingly accomplished programmers. Many were veterans of coding competitions. Many were using AoC to practice new languages, sometimes languages they themselves created! Looking back, it was truly a festive, educational, and joyful December to remember.