Thursday, January 29, 2026

How an LLM Works End-to-End

Most people use LLMs daily but have no idea what actually happens between their prompt and the response.

I've optimized LLM workflows for 30+ production systems, and the teams that understand these internals make fundamentally different and better architectural decisions.

Here is what actually happens when you hit "send" on a prompt:

Phase 1: Input Processing (Green Layer)

1. Input Text
• Raw text: "The dog ran up the hill"
• This is what users see: natural language

2. Token Embeddings
• Text → subword tokens, mapped to IDs
• "The", "dog", "ran", "up", "the", "hill" become discrete units
• Why it matters: Token boundaries affect cost and context limits
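
To see token boundaries for yourself, here is a minimal sketch using OpenAI's open-source tiktoken library (the tokenizer choice is an assumption; any BPE tokenizer makes the same point):

```python
# Minimal tokenization sketch (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")    # the encoding used by GPT-4-era models

text = "The dog ran up the hill"
ids = enc.encode(text)                        # text -> integer token IDs
print(ids)                                    # the discrete units the model actually sees
print([enc.decode([i]) for i in ids])         # each ID mapped back to its subword piece
print(f"{len(ids)} tokens for {len(text)} characters")  # what you are billed for
```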

3. Positional Embeddings
• Each token gets position information (sinusoidal or learned vectors; sketched after item 4 below)
• The model learns "dog" in position 2 relates to "ran" in position 3
• Why it matters: Without positional encoding, word order is meaningless

4. Final Input Embedding
• Token embeddings + positional embeddings = rich vector representation
• This combined representation enters the transformer
• Why it matters: Quality here determines quality downstream
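
Items 3 and 4 can be sketched together in a few lines of NumPy. Everything here is a toy stand-in: the sizes are tiny, the embedding table is random instead of trained, and the positions use the original Transformer's sinusoidal scheme (many modern LLMs use learned or rotary positions instead):

```python
import numpy as np

vocab_size, d_model, seq_len = 50_000, 64, 6          # toy dimensions
rng = np.random.default_rng(0)
token_table = rng.normal(size=(vocab_size, d_model))  # learned during training in reality

def positional_encoding(seq_len, d_model):
    # sinusoidal positions from "Attention Is All You Need"
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10_000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

token_ids = np.array([17, 402, 933, 88, 17, 1205])    # pretend IDs for our 6 tokens
tok_emb = token_table[token_ids]                      # (6, 64) token embeddings
pos_emb = positional_encoding(seq_len, d_model)       # (6, 64) position signal

x = tok_emb + pos_emb      # the final input embedding that enters the transformer
print(x.shape)             # (6, 64)
```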

Phase 2: Transformer Block (Red Layer), ×N Layers

1. Multi-Head Self-Attention
• Model computes relationships between all tokens simultaneously
• "cat sat on mat" → each word attends to every other word
• Why it matters: This is where context understanding happens
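
Here is a single-head version of scaled dot-product attention in NumPy, with made-up shapes. A real multi-head layer runs several of these in parallel and concatenates the results, and a decoder-style LLM also adds a causal mask so each token only attends to tokens on its left:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(x, Wq, Wk, Wv):
    Q, K, V = x @ Wq, x @ Wk, x @ Wv               # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # every token scored against every token
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # row-wise softmax -> attention weights
    return w @ V                                   # each output is a weighted mix of values

x = rng.normal(size=(6, 64))                       # 6 tokens, d_model = 64 (toy size)
Wq, Wk, Wv = (rng.normal(size=(64, 16)) for _ in range(3))  # one 16-dimensional head
print(attention(x, Wq, Wk, Wv).shape)              # (6, 16); heads get concatenated later
```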

2. Residual Connection & Layer Normalization (×2)
• Prevents gradient vanishing in deep networks
• Maintains information flow through many layers
• Why it matters: Enables scaling to 100+ billion parameters (wired up with the FFN in the sketch after item 3)

3. Feed-Forward Network
• Dense neural network processes attended representations
• Non-linear transformations extract patterns
• Why it matters: This is where the actual "reasoning" computation happens
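
Items 2 and 3 come together in the wiring of a single block. The sketch below uses the pre-norm ordering common in GPT-style models, an identity function as a placeholder for the attention sketch above, and invented weights and sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256                            # d_ff is typically 4x d_model

def layer_norm(x, eps=1e-5):
    # normalize each token vector to zero mean and unit variance
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.02, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.02, np.zeros(d_model)

def ffn(x):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2    # expand, ReLU, project back down

def transformer_block(x, self_attention):
    x = x + self_attention(layer_norm(x))          # residual connection #1: attention
    x = x + ffn(layer_norm(x))                     # residual connection #2: the FFN
    return x

x = rng.normal(size=(6, d_model))
print(transformer_block(x, lambda h: h).shape)     # (6, 64); the model stacks N of these
```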

Repeated N times (GPT-4 reportedly has ~120 layers)
• Each layer refines understanding progressively
• Early layers: syntax and structure
• Middle layers: semantics and relationships  
• Later layers: task-specific reasoning

Phase 3: Prediction and Generation (Blue Layer)

1. Logits → Softmax
• Model outputs probability distribution over vocabulary
• "hill: 0.72, road: 0.08, yard: 0.06, path: 0.02"
• Why it matters: These probabilities determine output quality
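
Softmax itself is just exponentiation plus normalization. The logits below are invented for four candidate tokens; a real model produces one logit per vocabulary entry, often 100k+:

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())    # subtract the max for numerical stability
    return z / z.sum()

vocab = ["hill", "road", "yard", "path"]
logits = np.array([4.0, 1.8, 1.5, 0.4])  # made-up raw scores from the final layer
for token, p in zip(vocab, softmax(logits)):
    print(f"{token}: {p:.2f}")
```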

2. Sampling Strategy
• Greedy: Always pick highest probability (deterministic, boring)
• Temperature: Control randomness (higher = more creative)
• Top-P: Sample from top probability mass (balanced approach)
• Why it matters: Same logits + different sampling = completely different outputs
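
All three strategies, applied to the same toy distribution (renormalized from the example above, since a real distribution covers the whole vocabulary):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = np.array(["hill", "road", "yard", "path"])
probs = np.array([0.72, 0.08, 0.06, 0.02])
probs = probs / probs.sum()                        # renormalize the truncated example

greedy = vocab[np.argmax(probs)]                   # deterministic: always "hill"

def with_temperature(probs, T):
    # T > 1 flattens the distribution (more random); T < 1 sharpens it
    logits = np.log(probs) / T
    z = np.exp(logits - logits.max())
    return z / z.sum()

temp_pick = rng.choice(vocab, p=with_temperature(probs, T=1.5))

def top_p(probs, p=0.9):
    # keep the smallest set of tokens whose cumulative probability covers p
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]
    return kept / kept.sum()

nucleus_pick = rng.choice(vocab, p=top_p(probs, p=0.9))
print(greedy, temp_pick, nucleus_pick)
```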

3. Output Token
• Selected token: "hill"
• Process repeats for next token until completion
• Why it matters: Generation is iterative; each token depends on all previous tokens
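
The entire pipeline collapses into a short loop. In the sketch below, `model` and the 10-token vocabulary are dummy stand-ins for phases 1-3, just to make the loop runnable:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(model, prompt_ids, max_new_tokens=10, eos_id=0):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = model(ids)                 # phases 1-3 over everything generated so far
        next_id = int(rng.choice(len(probs), p=probs))  # plug in any sampling strategy
        ids.append(next_id)                # the prediction becomes part of the input
        if next_id == eos_id:              # stop when the end-of-sequence token appears
            break
    return ids

dummy_model = lambda ids: np.full(10, 0.1)  # uniform distribution over a 10-token vocab
print(generate(dummy_model, prompt_ids=[5, 3, 7]))
```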


Wednesday, January 14, 2026

ETL vs. ELT vs. ETLT: What’s the Real Difference?

Here is where each approach fits, based on modern architecture needs 👇



📌 1. ETL (Extract, Transform, Load) — "The Classic"

Process: Data is extracted ➡ Transformed on a separate staging server ➡ Loaded into the Warehouse.
Best For: Complex transformations, strict security/compliance masking before data lands, or legacy on-prem systems with limited compute.

☁️ 2. ELT (Extract, Load, Transform) — "The Modern Standard"

Process: Extract raw data ➡ Load immediately into the Warehouse ➡ Transform using SQL/dbt inside the warehouse.
Best For: Modern Cloud Data Warehouses (Snowflake, BigQuery, Redshift) where storage is cheap and compute is massive.

⚖️ 3. ETLT (Extract, Transform, Load, Transform) — "The Hybrid"

Process: Lightweight cleaning (PII masking) before loading ➡ Heavy analytics transformations after loading.
Best For: When you need both strict Data Quality checks (pre-load) and complex analytical modeling (post-load).
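
Here is a hedged end-to-end sketch of the ETLT flow in Python, with sqlite3 standing in for a cloud warehouse and a hard-coded list standing in for the source system (all table and column names are invented):

```python
import sqlite3

# Extract: rows from the (pretend) source system
rows = [("alice@example.com", "US", 120.0), ("bob@example.com", "DE", 80.0)]

# Transform #1 (pre-load): mask PII before it ever lands in the warehouse
def mask_email(email):
    user, domain = email.split("@")
    return user[0] + "***@" + domain

masked = [(mask_email(e), country, amount) for e, country, amount in rows]

# Load: raw-but-masked rows go straight into the warehouse
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE orders (email TEXT, country TEXT, amount REAL)")
wh.executemany("INSERT INTO orders VALUES (?, ?, ?)", masked)

# Transform #2 (post-load): heavy analytical modeling in SQL inside the warehouse,
# the layer a tool like dbt would manage in a real ELT/ETLT stack
wh.execute("""
    CREATE TABLE revenue_by_country AS
    SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country
""")
print(wh.execute("SELECT * FROM revenue_by_country").fetchall())
```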



Monday, January 5, 2026

Oracle’s "Always Free" tier offers

Oracle Cloud is underrated for side projects

If you are still burning free credits on AWS, Azure, or GCP for your learning or pet projects, you are seriously missing out. Most "free tiers" are either time-boxed to 12 months or offer compute power so weak it barely runs a basic application.

I recently started deploying my projects to Oracle Cloud Infrastructure (OCI), and the resources they give away for free are genuinely surprising.

While others give you 1 vCPU and 1 GB of RAM, Oracle’s "Always Free" tier offers:
✅ 4 ARM Cores
✅ A massive 24 GB of RAM
✅ 10 TB of Data Egress monthly

This isn't just for static pages. This is enough power to run serious applications.


Check out this blog on how to create a Linux instance on Oracle Cloud and deploy an n8n project on it: https://lnkd.in/gAgxZViF