Graph Databases

What Neo4j taught me that SQL never could

The problem with relational thinking

For most of my career, the relational model was just the way data worked. Tables, rows, foreign keys, joins: it all felt natural because it was the only thing I knew. But the longer I worked with deeply interconnected data, the more I noticed a pattern: the harder a query was to express, the more likely it was that I was modeling the problem wrong.

When your data is fundamentally about relationships (people who know people, companies that own subsidiaries, events that trigger other events), a relational database fights you the whole way. Every join is a tax. Every many-to-many junction table is a compromise.

Modeling relationships as first-class citizens

Neo4j flips the mental model. Instead of treating relationships as something you reconstruct at query time through joins, it stores them directly as pointers between nodes. The graph structure isn't an emergent property; it's the storage format.

Here's what that looks like in practice. In SQL, finding all the people two degrees of separation from a given user means joining the friendships table to itself:

SELECT DISTINCT f2.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = ?
  AND f2.friend_id != ?;

In Cypher (Neo4j's query language), the same query reads almost like English:

MATCH (u:User {id: $id})-[:KNOWS*2]-(friend)
RETURN DISTINCT friend;

That isn't just syntactic sugar. It reflects a fundamentally different way of thinking about the problem. The relationship KNOWS has the same ontological weight as the User node itself. You don't reconstruct it; you traverse it.
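To make the comparison concrete without a Neo4j server, here is a minimal Python sketch. It runs the self-join from above against an in-memory SQLite table, then answers the same question by walking adjacency sets. The sample friendships and the function names `two_hops_sql` and `two_hops_graph` are made up for illustration, and the dictionary walk is only a stand-in for what a graph engine does, not Neo4j's implementation.

```python
import sqlite3

# Hypothetical sample data: (user_id, friend_id) rows, stored in both
# directions so that each friendship is symmetric.
EDGES = [(1, 2), (2, 1), (2, 3), (3, 2), (2, 4), (4, 2), (1, 5), (5, 1)]


def two_hops_sql(user_id):
    """Run the self-join from the post against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER)")
    conn.executemany("INSERT INTO friendships VALUES (?, ?)", EDGES)
    rows = conn.execute(
        """
        SELECT DISTINCT f2.friend_id
        FROM friendships f1
        JOIN friendships f2 ON f1.friend_id = f2.user_id
        WHERE f1.user_id = ?
          AND f2.friend_id != ?
        """,
        (user_id, user_id),
    ).fetchall()
    conn.close()
    return {row[0] for row in rows}


def two_hops_graph(user_id):
    """Answer the same question by following adjacency sets for two steps."""
    neighbors = {}
    for src, dst in EDGES:
        neighbors.setdefault(src, set()).add(dst)
    result = set()
    for friend in neighbors.get(user_id, ()):
        for friend_of_friend in neighbors.get(friend, ()):
            if friend_of_friend != user_id:  # skip the starting user
                result.add(friend_of_friend)
    return result


print(sorted(two_hops_sql(1)), sorted(two_hops_graph(1)))
```

Both functions return the same set of ids. The difference is in how the question is phrased: the first rebuilds each hop by matching column values, the second simply follows references.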

The performance revelation (which isn't the best part)

Everyone talks about performance when comparing graph databases to relational ones, and yes, the numbers are dramatic. Neo4j traverses relationships through index-free adjacency: each node points directly at its neighbors, so the cost of a traversal depends on how much of the graph you touch, not on the total size of the database. A SQL join, by contrast, gets more expensive as tables grow unless you carefully manage indices.
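A toy model can show why traversal cost tracks the neighborhood rather than the whole dataset. The sketch below is plain Python, not Neo4j's storage engine: `Node`, `pointer_steps`, and `scan_steps` are hypothetical names, and the scan deliberately models the worst case of an edge table with no index at all. A real relational database with a good index would sit somewhere between the two.

```python
class Node:
    """A node that holds direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []


def edge_rows(extra_edges):
    """Two edges we care about, plus `extra_edges` unrelated ones."""
    rows = [("alice", "bob"), ("bob", "carol")]
    rows += [(f"n{i}", f"n{i + 1}") for i in range(extra_edges)]
    return rows


def build_nodes(rows):
    """Turn edge rows into Node objects wired together by direct references."""
    nodes = {}
    for src, dst in rows:
        for name in (src, dst):
            if name not in nodes:
                nodes[name] = Node(name)
        nodes[src].neighbors.append(nodes[dst])
    return nodes


def pointer_steps(start, depth):
    """Count the neighbor references followed in a depth-limited traversal."""
    followed = 0
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in node.neighbors:
                followed += 1
                next_frontier.append(neighbor)
        frontier = next_frontier
    return followed


def scan_steps(start_name, rows, depth):
    """Count the rows examined when every hop scans the whole edge table."""
    examined = 0
    frontier = {start_name}
    for _ in range(depth):
        next_frontier = set()
        for src, dst in rows:
            examined += 1
            if src in frontier:
                next_frontier.add(dst)
        frontier = next_frontier
    return examined


for extra in (10, 10_000):
    rows = edge_rows(extra)
    alice = build_nodes(rows)["alice"]
    print(extra, pointer_steps(alice, 2), scan_steps("alice", rows, 2))
```

With these inputs the pointer count stays at 2 however many unrelated edges exist, while the scan count grows from 24 to 20004.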

But honestly? The performance gains are the least interesting thing about switching. The real shift is in how you think about your data.

When you model data as a graph, you start asking different questions.

Where graphs fall down (to be fair)

Graph databases aren't a silver bullet. They're worse than relational databases for several common use cases:

Aggregations over large datasets. If you need to sum a column across millions of rows, a relational database will crush a graph database. Graph stores are optimized for traversal, not for scanning and aggregating whole columns.

Tabular reporting. If your output needs to be a pivot table, a relational model maps to that naturally. Graphs require an awkward translation layer.

Transaction-heavy write patterns. While Neo4j supports ACID transactions, high-volume write workloads (think: logging millions of events per second) are better served by purpose-built stores.

The right answer is almost always: use both. Store the data where it fits and query it with the right tool. That's what Neo4j taught me: not that SQL is bad, but that different data shapes demand different tools.
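As a rough sketch of what "use both" can look like, the example below keeps graph-shaped data and tabular data in separate structures and lets each answer the part of the question it suits. Everything here is assumed for illustration: the referral data, the order totals, and the helpers `referral_network` and `revenue_for`. A dictionary stands in for the graph store and in-memory SQLite for the relational one.

```python
import sqlite3

# Hypothetical data: who referred whom (graph-shaped) and order totals (tabular).
REFERRALS = {"ann": ["ben", "cat"], "ben": ["dan"], "cat": [], "dan": []}
ORDERS = [("ann", 120.0), ("ben", 40.0), ("ben", 60.0), ("cat", 15.0), ("dan", 200.0)]


def referral_network(root):
    """Traversal step: collect everyone reachable from `root` via referrals."""
    seen, stack = set(), [root]
    while stack:
        person = stack.pop()
        for referred in REFERRALS.get(person, []):
            if referred not in seen:
                seen.add(referred)
                stack.append(referred)
    return seen


def revenue_for(customers):
    """Aggregation step: let SQL sum the order totals for the given customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
    names = sorted(customers)
    placeholders = ",".join("?" for _ in names)
    row = conn.execute(
        f"SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer IN ({placeholders})",
        names,
    ).fetchone()
    conn.close()
    return row[0]


network = referral_network("ann")
print(sorted(network), revenue_for(network))
```

The traversal decides which customers matter and the SQL engine does the summing, so neither tool is asked to do the other's job.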

What this means for AI

The reason I got into Neo4j wasn't the database itself. It was that knowledge graphs are the most promising approach I've seen to grounding LLMs in factual data.

When an LLM has no source of truth, it generates plausible text. When it can query a graph of verified relationships, it can reason about entities it's never seen before. This isn't just retrieval-augmented generation; it's giving AI a structured understanding of how things connect.

That's a topic for a separate post. But if the Neo4j experiment taught me one thing, it's that relationships are data too. And once you treat them that way, you can't unsee it.
