Thursday, January 29, 2026

How LLMs Work End-to-End

[Diagram: the end-to-end LLM pipeline, from input processing (green) through the transformer blocks (red) to prediction and generation (blue)]

Most people use LLMs daily but have no idea what actually happens between their prompt and the response.

I've optimized LLM workflows for 30+ production systems, and the teams that understand these internals make fundamentally different, and better, architectural decisions.

Here is what actually happens when you hit "send" on a prompt:

Phase 1: Input Processing (Green Layer)

1. Input Text
• Raw text: "The dog ran up the hill"
• This is what users see: natural language

2. Token Embeddings
• Text → subword tokens, mapped to IDs
• "The", "dog", "ran", "up", "the", "hill" become discrete units
• Why it matters: Token boundaries affect cost and context limits
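You can see tokenization for yourself. Here's a minimal sketch using OpenAI's open-source tiktoken library; the exact IDs depend on the tokenizer's vocabulary, so treat them as illustrative:

```python
# Minimal tokenization sketch with tiktoken; IDs are vocabulary-specific.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a GPT-4-era BPE vocabulary

ids = enc.encode("The dog ran up the hill")
print(ids)                                   # one integer ID per subword token
print([enc.decode([i]) for i in ids])        # ['The', ' dog', ' ran', ...]
```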

3. Positional Embeddings
• Each token gets position information (sinusoidal, learned, or rotary position vectors, depending on the model)
• The model learns that "dog" in position 2 relates to "ran" in position 3
• Why it matters: Without positional encoding, word order is meaningless
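To make this concrete, here's a sketch of the sinusoidal encoding from the original transformer paper; many modern LLMs use learned or rotary (RoPE) positions instead, but the idea is the same:

```python
# Sinusoidal positional encoding: a distinct, deterministic vector per position.
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]              # token positions 0..seq_len-1
    i = np.arange(d_model)[None, :]                # embedding dimensions
    angle = pos / 10000 ** (2 * (i // 2) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])           # even dims get sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])           # odd dims get cosine
    return pe

print(positional_encoding(6, 8)[:2])               # different vector per position
```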

4. Final Input Embedding
• Token embeddings + positional embeddings = rich vector representation
• This combined representation enters the transformer
• Why it matters: Quality here determines quality downstream
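The combination really is just element-wise addition. A toy sketch, reusing positional_encoding from above; the embedding table and token IDs here are made up, whereas in a real model both come from training and the tokenizer:

```python
# Final input embedding = token embedding + positional encoding.
import numpy as np

vocab_size, d_model, seq_len = 50_000, 8, 6
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))      # learned in practice

token_ids = np.array([17, 942, 3081, 88, 11, 5203])           # toy IDs for 6 tokens
x = embedding_table[token_ids] + positional_encoding(seq_len, d_model)
print(x.shape)                                 # (6, 8): one rich vector per token
```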

Phase 2: Transformer Blocks (Red Layer): ×N Layers

1. Multi-Head Self-Attention
• Model computes relationships between all tokens simultaneously
• "cat sat on mat" → each word attends to every other word
• Why it matters: This is where context understanding happens
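Here's a single-head sketch in NumPy; real models learn separate query/key/value projection matrices and run many heads in parallel, both omitted here for brevity:

```python
# Single-head self-attention: every token is scored against every other token.
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)               # pairwise token-to-token scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # softmax: rows are attention weights
    return w @ x                                # each output mixes all token vectors

x = np.random.default_rng(0).normal(size=(6, 8))   # 6 tokens, 8-dim embeddings
print(self_attention(x).shape)                     # (6, 8)
```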

2. Residual Connection & Layer Normalization (×2)
• Prevents vanishing gradients in deep networks
• Maintains information flow through many layers
• Why it matters: Enables scaling to 100+ billion parameters
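The pattern is simple to sketch, assuming the pre-norm arrangement most modern transformers use (some older models apply the normalization after the residual instead):

```python
# Pre-norm residual pattern: normalize, run the sublayer, add back the input.
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)      # per-token normalization

def with_residual(x, sublayer):
    return x + sublayer(layer_norm(x))          # the "+" is the residual connection
```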

3. Feed-Forward Network
• Dense neural network processes attended representations
• Non-linear transformations extract patterns
• Why it matters: This is where the actual "reasoning" computation happens
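A sketch of the position-wise feed-forward step: expand the dimension, apply a non-linearity, project back down. ReLU is used here for simplicity; many production LLMs use GELU or SwiGLU variants:

```python
# Feed-forward network, applied to every token independently.
import numpy as np

def feed_forward(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x @ w1) @ w2         # expand, ReLU, project back

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))                     # 6 tokens, 8-dim toy embeddings
w1, w2 = rng.normal(size=(8, 32)), rng.normal(size=(32, 8))  # 4x expansion
print(feed_forward(x, w1, w2).shape)            # (6, 8)
```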

Repeated N times (GPT-4 is rumored to have ~120 layers; its architecture is not public)
• Each layer refines understanding progressively
• Early layers: syntax and structure
• Middle layers: semantics and relationships  
• Later layers: task-specific reasoning
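Putting the pieces together, the whole red layer is just the same block applied over and over, reusing the sketches above (self_attention, with_residual, feed_forward):

```python
# One full transformer block, stacked N times; each pass refines every token.
def transformer_block(x, w1, w2):
    x = with_residual(x, self_attention)                     # attention sublayer
    x = with_residual(x, lambda h: feed_forward(h, w1, w2))  # feed-forward sublayer
    return x

N = 4                                           # toy depth; real models use far more
for _ in range(N):
    x = transformer_block(x, w1, w2)
```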

Phase 3: Prediction and Generation (Blue Layer)

1. Logits → Softmax
• The model outputs a probability distribution over the entire vocabulary
• "hill: 0.72, road: 0.08, yard: 0.06, path: 0.02"
• Why it matters: These probabilities determine output quality
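Softmax itself is a one-liner. The logits below are toy numbers, not any model's real scores, but they produce a distribution shaped like the example above:

```python
# Converting raw logits into a probability distribution with softmax.
import numpy as np

logits = np.array([4.1, 1.9, 1.6, 0.5])    # toy scores for 4 of ~100k vocab entries
probs = np.exp(logits - logits.max())      # subtract max for numerical stability
probs /= probs.sum()
for tok, p in zip(["hill", "road", "yard", "path"], probs):
    print(f"{tok}: {p:.2f}")               # "hill" dominates the distribution
```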

2. Sampling Strategy
• Greedy: Always pick highest probability (deterministic, boring)
• Temperature: Control randomness (higher = more creative)
• Top-P: Sample from top probability mass (balanced approach)
• Why it matters: Same logits + different sampling = completely different outputs
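Here are minimal sketches of all three strategies over the same toy distribution; this is illustrative code, not any particular provider's implementation:

```python
# Greedy, temperature, and top-p (nucleus) decoding over the same logits.
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([4.1, 1.9, 1.6, 0.5])          # same toy scores as above
probs = np.exp(logits - logits.max()); probs /= probs.sum()

def greedy(probs):
    return int(np.argmax(probs))                  # deterministic: always the top token

def temperature_sample(logits, t=0.8):
    scaled = logits / t                           # t < 1 sharpens, t > 1 flattens
    p = np.exp(scaled - scaled.max()); p /= p.sum()
    return int(rng.choice(len(p), p=p))

def top_p_sample(probs, p=0.9):
    order = np.argsort(probs)[::-1]               # most to least likely
    cut = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    keep = order[:cut]                            # smallest set with mass >= p
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

print(greedy(probs), temperature_sample(logits), top_p_sample(probs))
```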

3. Output Token
• Selected token: "hill"
• Process repeats for next token until completion
• Why it matters: Generation is iterative: each new token depends on all the tokens before it (see the loop sketched below)
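The whole loop fits in a few lines. Here `model` is a hypothetical stand-in for the full pipeline sketched above (embed → N transformer blocks → next-token logits):

```python
# The autoregressive loop: each chosen token is appended and fed back in.
import numpy as np

def generate(model, token_ids, max_new_tokens=20, eos_id=0):
    for _ in range(max_new_tokens):
        logits = model(token_ids)             # scores for the *next* token only
        next_id = int(np.argmax(logits))      # greedy; swap in temperature/top-p here
        token_ids = token_ids + [next_id]     # context grows by one token per step
        if next_id == eos_id:                 # stop on end-of-sequence
            break
    return token_ids
```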

