Every major AI concept explained clearly — no jargon, no PhD required. Click any card to expand the full explanation.
The simulation of human intelligence by machines
AI is when computers can do things that normally require human intelligence — understanding language, recognizing images, making decisions, solving problems.
AI isn't a single technology. It's an umbrella term covering machine learning, deep learning, natural language processing, computer vision, robotics, and more. Modern AI is mostly statistical pattern recognition at massive scale.
Spam filters, recommendation engines (Netflix, Spotify), facial recognition, virtual assistants (Siri, Alexa), autonomous vehicles, ChatGPT.
AI systems that learn from data without being explicitly programmed
Instead of coding rules manually, you feed the machine thousands of examples and let it discover the rules itself. The machine "learns" patterns from data.
1. Collect training data → 2. Train a model (adjust mathematical parameters) → 3. Evaluate accuracy → 4. Deploy to make predictions on new data.
Linear regression, decision trees, random forests, SVMs, k-means clustering, neural networks.
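The collect → train → evaluate → deploy loop can be sketched in a few lines. This is a toy 1-D linear regression fit by gradient descent; the data and numbers are made up for illustration, not from any real dataset.

```python
# 1. Collect training data: points generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# 2. Train: nudge two parameters (w, b) to reduce the squared error.
w, b, lr = 0.0, 0.0, 0.02
for _ in range(5000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

# 3. Evaluate: mean squared error on the data.
mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 4. Deploy: predict on new input.
print(round(w, 2), round(b, 2), round(mse, 4))  # ≈ 2.0 1.0 0.0
print(round(w * 10 + b, 1))                     # ≈ 21.0
```

Note that no rule "multiply by 2 and add 1" was ever written down — the parameters were discovered from examples, which is the whole point.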
Computing systems loosely inspired by the human brain
A neural network is a system of interconnected nodes ("neurons") arranged in layers. Data flows in one end, gets transformed by each layer, and predictions come out the other end.
Each connection has a weight. During training, the network makes a prediction, measures the error, and uses backpropagation to adjust weights to reduce the error. Repeat millions of times.
Deep Learning = neural networks with many hidden layers. "Deep" refers to the depth (number of layers). More layers = ability to learn more complex patterns.
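The predict → measure error → adjust weights loop can be shown with the smallest possible network: a single sigmoid neuron learning logical OR. A real deep network repeats exactly this update across many layers via backpropagation; this sketch has no hidden layers, so the gradient math stays one line per weight.

```python
import math

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table
w1, w2, b, lr = 0.0, 0.0, 0.0, 1.0

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

for _ in range(2000):
    for (x1, x2), target in data:
        pred = sigmoid(w1 * x1 + w2 * x2 + b)   # forward pass
        err = pred - target                     # measure the error
        w1 -= lr * err * x1                     # adjust each weight
        w2 -= lr * err * x2                     # in the direction that
        b  -= lr * err                          # reduces the error

preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in data]
print(preds)  # [0, 1, 1, 1]
```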
Massive AI models trained on text that can read and write like humans
LLMs are AI models trained on enormous amounts of text (books, websites, code) that can understand and generate human-like text. They work by predicting the most likely next word, over and over.
1. Collect hundreds of billions of words from the internet and books → 2. Train a transformer to predict the next word (self-supervised) → 3. Fine-tune with RLHF to make it helpful and safe.
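Step 2 — "predict the next word" — can be caricatured with bigram counts over a toy corpus. A real LLM learns this distribution with a transformer over billions of words, but the self-supervised idea (the labels come from the text itself) is the same.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which — no human labeling needed.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(word):
    # Greedily pick the most common follower.
    return follows[word].most_common(1)[0][0]

print(next_word("the"))  # "cat" — it follows "the" twice in this corpus
```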
Write, summarize, translate, reason, code, analyze, answer questions, create content — and when combined with tools, much more.
The neural network design behind every major modern AI model
The Transformer is the neural network architecture introduced in the 2017 paper "Attention Is All You Need." It's the foundation of GPT, BERT, Claude, Gemini, Stable Diffusion, and essentially every frontier AI model.
The self-attention mechanism allows the model to weigh how relevant each word is to every other word in a sequence. This captures long-range dependencies that previous architectures (RNNs) struggled with.
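Here is scaled dot-product attention in bare Python over three toy 2-D "token" vectors. The vectors are invented for illustration, and Q, K, V are set to the raw embeddings (real models apply learned projections first) — but the core move is visible: every token's output is a softmax-weighted mix of every token's value.

```python
import math

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # pretend embeddings (d = 2)
d = len(tokens[0])

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Relevance of each token to the query, scaled by sqrt(d):
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)  # how much to attend to each position
    # Output = weighted average of all value vectors:
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

outputs = [attend(t, tokens, tokens) for t in tokens]
print([[round(x, 2) for x in row] for row in outputs])
```

Because the weights cover the whole sequence at once, a token 1,000 positions away is as reachable as the neighbor — the long-range-dependency advantage over RNNs.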
Encoder-only (BERT): Understanding tasks. Decoder-only (GPT): Text generation. Encoder-Decoder (T5): Translation, summarization.
The technology behind AI image generation (DALL-E, Stable Diffusion)
Diffusion models generate images by learning to reverse a process that adds noise. Think of it as learning to "un-blur" a completely noisy image back into a clean, coherent picture.
Forward Process: Take a real image → gradually add random noise until it's pure static (T steps).
Reverse Process: Train a neural network to predict and remove the noise at each step. At inference, start from pure noise and denoise step-by-step into an image.
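The forward process can be sketched on a single "pixel" value with a simplified DDPM-style step, x_t = sqrt(1 − β)·x_{t−1} + sqrt(β)·noise. The constant β schedule here is illustrative, not a real noise schedule, and the reverse (learned) network is omitted — the point is only that after enough steps the starting value is destroyed.

```python
import math
import random

def forward(x0, steps=1000, beta=0.02):
    x = x0
    for _ in range(steps):
        # Shrink the signal a little, add a little Gaussian noise:
        x = math.sqrt(1 - beta) * x + math.sqrt(beta) * random.gauss(0, 1)
    return x

# Same noise sequence, wildly different starting points:
random.seed(0)
a = forward(10.0)
random.seed(0)
b = forward(-10.0)
print(round(a, 3), round(b, 3))  # nearly identical — the signal is gone
```

That destroyed signal is exactly why generation works in reverse: starting from pure noise loses nothing, and the trained denoiser adds structure back step by step.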
Text prompts are encoded and used to guide the denoising process (classifier-free guidance), allowing text-to-image generation.
Architecture that routes inputs to specialized sub-networks
Instead of using all of a model's neurons for every input, MoE models activate only a small subset of "experts" for each token. This gives you a huge effective model size with the compute cost of a much smaller one.
A learned "gating network" looks at each input token and decides which experts (specialized sub-networks) should process it. Typically top-2 or top-8 experts are chosen out of 8-64 total.
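Top-k routing can be sketched in a few lines. The four "experts" here are trivial made-up functions (real experts are full feed-forward sub-networks) and the gate scores are hard-coded rather than learned, but the mechanism is the real one: only the top-k experts ever run.

```python
import math

experts = [lambda x, s=s: x * s for s in (1.0, 2.0, 3.0, 4.0)]  # 4 toy experts

def moe(x, gate_scores, k=2):
    # Pick the k highest-scoring experts for this token:
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    exps = [math.exp(gate_scores[i]) for i in top]
    weights = [e / sum(exps) for e in exps]  # softmax over the top-k only
    # Only these k experts are evaluated — that's the compute saving.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# The gate (hypothetically) prefers experts 3 and 1 for this token:
print(round(moe(1.0, [0.1, 2.0, 0.0, 3.0]), 3))
```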
GPT-4 is widely believed to be a 1.8T parameter MoE model that activates ~220B per token. Mixtral 8x7B performs like a 47B model with 13B active parameters per token.
Giving LLMs access to external knowledge at query time
RAG is a technique that combines a search system with an LLM. When you ask a question, the system first retrieves relevant documents from a knowledge base, then passes those documents to the LLM to generate an answer.
LLMs have a knowledge cutoff date and can't access private data. RAG solves both problems — the model's knowledge is only limited by your retrieval database, which can be updated anytime.
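A minimal RAG round trip, assuming a made-up three-document knowledge base: score documents by crude word overlap with the question (real systems use embedding similarity), take the best match, and paste it into the prompt the LLM would receive. The final LLM call is omitted.

```python
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is closed on public holidays.",
    "Support is available by email at all hours.",
]

def retrieve(question, k=1):
    q = set(question.lower().split())
    # Rank documents by how many question words they share:
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

question = "What is the refund policy for returns?"
context = retrieve(question)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Updating the system's knowledge is now just editing `docs` — no retraining involved.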
LangChain, LlamaIndex, Haystack, Cognita.
How ChatGPT was trained to be helpful and safe
RLHF is the training technique that transforms a raw language model (which just predicts text) into a helpful assistant. Humans rank pairs of model responses, a separate reward model is trained to predict those rankings, and the language model is then optimized against that reward model with reinforcement learning (typically PPO) — teaching it what "good" responses look like.
DPO: Skips the separate reward model, directly trains on preference pairs. Simpler and often works just as well.
Constitutional AI (Anthropic): Uses AI-generated critiques instead of human feedback for many steps.
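The DPO objective on one preference pair can be written directly: push the policy's log-probability up on the chosen response and down on the rejected one, relative to a frozen reference model. The log-probability values below are invented to illustrate the shape of the loss, not taken from any real model.

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Implicit reward = beta * (policy logprob - reference logprob);
    # the loss is -log sigmoid of the chosen-minus-rejected margin.
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))

# Lower loss when the policy already prefers the chosen answer:
better = dpo_loss(-5.0, -9.0, -6.0, -6.0)
worse  = dpo_loss(-9.0, -5.0, -6.0, -6.0)
print(round(better, 3), round(worse, 3))
```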
Converting meaning into numbers for semantic search
An embedding is a dense numerical vector (list of numbers like [0.2, -0.7, 0.4, …]) that represents the meaning of text, an image, or other data. Similar concepts have similar vectors.
Traditional search is keyword-based: "cat" doesn't match "feline." Vector search is semantic: "cat" and "kitty" and "feline" all land near each other in embedding space. Search finds meaning, not just exact words.
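"Near each other in embedding space" is measured with cosine similarity. The 3-D vectors below are made up for illustration — real embeddings have hundreds of dimensions and come from a trained model — but the geometry is the same: related words point in similar directions.

```python
import math

vectors = {
    "cat":   [0.90, 0.80, 0.10],
    "kitty": [0.85, 0.75, 0.20],
    "car":   [0.10, 0.20, 0.95],
}

def cosine(a, b):
    # Dot product divided by the product of vector lengths:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(round(cosine(vectors["cat"], vectors["kitty"]), 3))  # close to 1.0
print(round(cosine(vectors["cat"], vectors["car"]), 3))    # much lower
```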
Semantic search, RAG, recommendation systems, duplicate detection, clustering, classification.
Adapting pre-trained models to specific tasks
Pre-trained models learn general knowledge from massive datasets. Fine-tuning adapts this knowledge to a specific task using a smaller, domain-specific dataset — without training from scratch.
Use fine-tuning when you need consistent tone/format, domain-specific knowledge baked in, faster responses (no long prompts), or when prompt engineering isn't enough.
The art of communicating effectively with AI models
Prompt engineering is the practice of designing inputs to AI models to get the best possible outputs. It's the new "programming" skill for the AI era.
Tree of Thoughts: Explore multiple reasoning paths. Self-Consistency: Generate multiple answers, pick the most common. ReAct: Reason + Act with tools.
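Self-consistency is just a majority vote over sampled answers. The samples below are hard-coded stand-ins for what five separate LLM calls might have returned; the aggregation step is the real technique.

```python
from collections import Counter

samples = ["42", "42", "41", "42", "40"]  # pretend outputs of 5 sampled runs
answer, count = Counter(samples).most_common(1)[0]
print(answer, count)  # "42" wins 3 of 5 votes
```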
Making AI models smaller and faster without losing much quality
Quantization reduces the precision of model weights from 32-bit or 16-bit floating point numbers to 8-bit or 4-bit integers. This dramatically reduces model size and speeds up inference.
A 70B parameter LLaMA model at FP16 needs ~140GB of GPU memory. At 4-bit (Q4), it needs ~40GB — runnable on consumer hardware.
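A round trip through simple linear quantization shows where the memory saving and the (small) accuracy cost both come from. The weight values are made up; real schemes (GPTQ, AWQ, etc.) are more sophisticated, but the map-to-integers idea is this one.

```python
weights = [-0.51, -0.20, 0.00, 0.33, 0.49]

# 4 bits → 16 levels between the min and max weight:
lo, hi = min(weights), max(weights)
scale = (hi - lo) / 15

quantized = [round((w - lo) / scale) for w in weights]  # small ints, 0..15
restored  = [q * scale + lo for q in quantized]         # dequantized floats

error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized)
print(round(error, 4))  # at most half a quantization step
```

Each weight now needs 4 bits instead of 16 — a 4x shrink — at the price of that rounding error.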
AI systems that autonomously take actions to achieve goals
An AI agent is a system that perceives its environment, makes decisions, takes actions, and learns from the results — autonomously, over extended time horizons, to achieve a specified goal.
ReAct: Alternate Reason → Act → Observe cycles. Plan-and-Execute: Make a plan first, then execute. Multi-Agent: Networks of specialized agents.
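The ReAct cycle reduces to a small loop: reason, act with a tool, observe, repeat until done. The "model" below is a hard-coded stub standing in for real LLM calls, and the single `search` tool is invented — the loop structure is the point.

```python
tools = {"search": lambda q: "Paris" if "France" in q else "unknown"}

def stub_model(goal, observations):
    # Stand-in for the LLM's reasoning step:
    if not observations:
        return ("search", goal)            # Reason: need a fact → Act
    return ("finish", observations[-1])    # Reason: fact found → answer

def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, arg = stub_model(goal, observations)
        if action == "finish":
            return arg
        observations.append(tools[action](arg))  # Observe the tool result
    return None

print(run_agent("capital of France"))  # "Paris"
```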
LangChain Agents, LangGraph, AutoGen, CrewAI, OpenAI Assistants API, Claude Computer Use.
AI that creates new content — text, images, audio, video, code
Generative AI refers to models that can create new, original content (as opposed to classification or prediction models). The content can be text, images, audio, video, code, 3D models, or any combination.
GenAI is projected to add up to $4.4 trillion annually to the global economy (McKinsey). It's transforming creative work, software development, content production, and professional services.
Teaching machines to see and understand visual information
Computer vision is the field of AI that enables machines to interpret and understand visual information from images and video — identifying objects, faces, text, scenes, and actions.
CNNs, Vision Transformers (ViT), CLIP, SAM (Segment Anything). Modern vision models are increasingly unified with language models in multimodal systems.
AI that understands text, images, audio, and video together
Multimodal AI can process and reason across multiple types of data simultaneously — looking at an image while reading text about it, or listening to audio while reading a transcript.
Humans don't experience the world as text only. We see, hear, and read simultaneously. Multimodal AI is the step toward systems that can interact with the world more like humans do.
Document understanding, visual QA, video summarization, medical imaging analysis, autonomous driving perception.
When AI confidently states things that are simply false
AI hallucinations occur when a language model generates information that sounds plausible and confident but is factually incorrect, fabricated, or nonsensical.
LLMs are trained to predict the most likely next token, not to be factually accurate. When they don't know something, they often generate plausible-sounding text rather than saying "I don't know."
RAG (ground answers in retrieved documents), tool use (let AI search instead of recall), chain-of-thought reasoning, and better training techniques all reduce hallucinations.
How much text an AI model can process at once
The context window is the maximum amount of text (measured in tokens) that an AI model can "see" and process in a single interaction. It's the model's working memory.
A small context window means the model can't process long documents or remember earlier parts of long conversations. A large context window enables new use cases — "Chat with your entire codebase."
A token is roughly 3/4 of a word in English. "unbelievable" = 3 tokens. The word "a" = 1 token. Code and non-English languages use tokens differently.
Teaching AI to think step-by-step before answering
Chain-of-Thought (CoT) prompting encourages AI models to break down complex problems into intermediate reasoning steps before giving a final answer — dramatically improving accuracy on math, logic, and reasoning tasks.
LLMs generate text token by token. When asked to reason step-by-step, they "think out loud," which gives the model more computation to work through the problem before committing to an answer.
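In its zero-shot form, chain-of-thought is nothing more than a suffix on the prompt. The question below is the classic bat-and-ball example; the actual model call is omitted, so this only shows the prompt construction.

```python
question = (
    "A bat and a ball cost $1.10 total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

direct_prompt = question
# Appending this phrase nudges the model to emit its reasoning
# before committing to a final answer:
cot_prompt = question + "\n\nLet's think step by step."

print(cot_prompt)
```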
How LLMs interact with external APIs, code, and real-world systems
Function calling lets an LLM identify when a task requires an external action — running a search, reading a file, calling an API — and output a structured request for that action instead of guessing the answer from memory.
Tool use transforms LLMs from static knowledge bases into active agents. The model no longer needs to "remember" current data — it can look it up, e.g. by emitting a call like get_weather(city="London"). This is the foundation of agentic AI.
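The round trip looks roughly like this: the model emits a structured tool call, the application dispatches it, and the result goes back into the conversation. Both `get_weather` and the JSON shape are made up for illustration — each provider defines its own schema.

```python
import json

def get_weather(city):
    # Stand-in for a real weather API call:
    return {"city": city, "temp_c": 14}

TOOLS = {"get_weather": get_weather}

# What the model might output instead of answering from memory:
model_output = '{"tool": "get_weather", "arguments": {"city": "London"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # fed back to the model so it can write the final answer
```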
Web search, code execution, database queries, sending emails, reading calendars, calling REST APIs, controlling IoT devices.
OpenAI (function calling / tools), Anthropic Claude (tool use), Google Gemini (function declarations), Mistral, LLaMA 3.1+.
The open standard for connecting AI to any data source or tool
MCP (Model Context Protocol) is an open protocol introduced by Anthropic in late 2024 that standardizes how AI applications connect to external data sources and tools. Think of it as a "USB-C port for AI" — one universal interface that works across any AI host and any data source.
Before MCP, every AI tool integration was built from scratch — a custom connector for GitHub, another for Slack, another for your database. Each AI app had to rebuild all these integrations independently. MCP makes integrations portable and reusable.
GitHub, GitLab, Slack, Google Drive, Notion, PostgreSQL, SQLite, filesystem, web search, Puppeteer (browser control), Docker, and hundreds more — all community-built and open-source.
MCP is becoming the standard for agentic AI tool use. Major IDEs (Cursor, VS Code, Zed), Claude Desktop, and growing numbers of apps support it. A single MCP server works across every compatible AI host.
Networks of AI agents working together to solve complex tasks
Agentic AI systems take sequences of actions over time to accomplish goals — planning, using tools, checking results, and adapting. Multi-agent systems add specialized agents that collaborate, delegate, and check each other's work.
Error propagation (mistakes compound), cost (many API calls), reliability, and giving agents the right level of autonomy without losing human oversight.
Ensuring AI systems do what humans actually want, safely
AI safety is the field focused on building AI systems that reliably do what we intend — and don't cause harm when things go wrong. Alignment is the challenge of making AI goals and values match human values.
Anthropic (Constitutional AI, interpretability), OpenAI Safety Team, DeepMind Safety, ARC Evals, MIRI, Center for AI Safety (CAIS), UK AI Safety Institute.