A neural network is a stack of filters
Forget the brain analogy for a minute. Neural networks are only loosely inspired by neurons; they are not a model of how the brain actually works. A better way to picture them:
Imagine you pour a messy input — a blurry photo, a sentence, a recording — through a stack of filters. Each filter pulls out some pattern. The first filter might notice edges. The next filter combines edges into shapes. The next combines shapes into objects. By the end of the stack, the computer has gone from "pile of pixels" to "this is a golden retriever."
That's a neural network. A stack of layers, each one transforming the input a little more abstractly than the last.
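That layer-by-layer picture can be sketched in a few lines. This is a toy illustration, not a real vision model: the weight matrices are random and the "edges → shapes → objects" labels are just comments, but the shape of the computation (each layer transforming the previous layer's output) is accurate:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(weights, x):
    # One layer: multiply by weights, then apply a simple nonlinearity (ReLU).
    return np.maximum(0, weights @ x)

# A "stack of filters": each layer transforms the previous layer's output.
w1 = rng.normal(size=(8, 4))   # raw input  -> low-level patterns ("edges")
w2 = rng.normal(size=(6, 8))   # patterns   -> combinations ("shapes")
w3 = rng.normal(size=(3, 6))   # shapes     -> final scores ("objects")

x = rng.normal(size=4)         # a tiny stand-in for "pile of pixels"
out = layer(w3, layer(w2, layer(w1, x)))
print(out.shape)               # three final scores come out the end
```

Stacking more layers just means composing more of these calls; that composition is the whole architecture.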
Deep learning just means: use a lot of layers. "Deep" literally refers to how many layers you stack. Modern LLMs have dozens to hundreds of layers.
The one idea behind it
Each layer is made of neurons — and a neuron is surprisingly dumb. It takes some numbers in, multiplies them by some weights, adds them up, and spits out one number. That's all.
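A single neuron really is just that. A minimal sketch (real networks also add a bias term and pass the result through a nonlinearity, which is included here as an optional bias for completeness):

```python
def neuron(inputs, weights, bias=0.0):
    # Multiply each input by its weight, sum them up, add a bias: one number out.
    return sum(i * w for i, w in zip(inputs, weights)) + bias

print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25]))  # 0.5 - 2.0 + 0.75 = -0.75
```

That's the entire unit. Everything interesting comes from wiring millions of them together.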
What makes it work: there are millions of these tiny, dumb neurons. Their weights (the numbers they multiply by) start out random. Then, during training:
- Feed the network an example.
- Compare its guess to the right answer.
- Slightly nudge every weight in the direction that would have made the guess better.
- Repeat billions of times.
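The four steps above can be written as a complete, if tiny, training loop. Here a single weight learns to double its input; the dataset, learning rate, and step count are made up for illustration, but this is genuine gradient descent on a squared error:

```python
# Toy gradient descent: teach one weight to map x -> 2x.
weight = 0.1                                      # starts out (nearly) random
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, right answer)

for step in range(200):                  # 4. repeat (billions of times, in real life)
    for x, target in examples:
        guess = weight * x               # 1. feed the network an example
        error = guess - target           # 2. compare its guess to the right answer
        gradient = 2 * error * x         # direction that makes the guess worse
        weight -= 0.01 * gradient        # 3. nudge the weight the other way

print(round(weight, 3))                  # converges to 2.0: it learned to double
```

A real network does exactly this, except the "nudge the other way" step has to be routed backward through every layer, which is what backpropagation does.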
That's it. That's the algorithm. It's called gradient descent (nudge the weights downhill on the error) plus backpropagation (the bookkeeping that tells each weight which way to nudge), and it has been powering the entire AI revolution.
You don't need to understand the math. You need to understand the shape: random network → feed it examples → nudge weights → eventually it's good.
Why "deep" matters
For decades, neural networks were small — a few layers — and they didn't work that well. Two things changed in the 2010s:
- Data — the internet gave us billions of labeled examples for free.
- Hardware — GPUs made it possible to train massive networks in reasonable time.
Suddenly, people could stack dozens of layers and train them on huge datasets. Performance on image recognition, speech, and translation jumped past what any hand-written program could do. That was the deep learning breakthrough, roughly 2012.
Every AI system you care about today — LLMs, image generators, speech models, self-driving — is deep learning.
Different shapes for different jobs
Not all neural networks look the same. There are architectures tailored to different kinds of data:
| Architecture | Best at | Where you'll see it today |
|---|---|---|
| CNN | Images, vision | Medical imaging, self-driving car perception |
| RNN / LSTM | Sequences, time series (legacy) | Older translation models, some speech systems |
| Transformer | Sequences and everything else | Every modern LLM: GPT, Claude, Gemini, Llama |
Why LLMs are "just" deep learning
Here's the reveal: an LLM is a deep neural network. A really, really big one, trained on an enormous amount of text. That's not a metaphor — that's literally the whole thing.
GPT, Claude, Gemini — all neural networks built on the transformer architecture, trained by nudging billions of weights based on trillions of tokens of text from the internet.
Everything else — fine-tuning, RLHF, agents, RAG, prompting — is built on top of that one idea.
What to take away
- A neural network is a stack of layers that transforms input step by step.
- "Deep" means "lots of layers." That's it.
- Training = show examples, nudge the weights, repeat billions of times.
- Different architectures (CNN, RNN, transformer) are tuned to different kinds of data.
- LLMs are deep neural networks — just very big ones, built on transformers.
Next: How Computers Turn Words Into Numbers — the bridge that lets text flow through these networks in the first place.