53 terms defined in plain language. No jargon without explanation.
Agentic AI: AI systems designed to operate with autonomy — planning multi-step workflows, making decisions, and executing actions without human approval at each step. The next evolution beyond chatbots.
AI Agent: An AI system that can autonomously perform multi-step tasks — browsing the web, writing and running code, using software tools, making decisions — with minimal human oversight. The current frontier of AI deployment in 2025-2026.
AI Governance: The frameworks, policies, and institutions for overseeing AI development and deployment. Includes government regulation, industry self-regulation, international agreements, and technical standards.
AI Safety: The field focused on ensuring AI systems behave as intended and don't cause unintended harm. Covers everything from preventing biased outputs to theoretical work on aligning superintelligent systems.
Alignment: The challenge of ensuring AI systems pursue goals that are beneficial to humans. A misaligned superintelligent AI wouldn't necessarily be malicious — it might simply optimize for the wrong objective with devastating efficiency.
API (Application Programming Interface): How software systems talk to each other. In AI context, the API is how developers access AI models — sending text in, getting responses back. OpenAI, Anthropic, and Google all offer APIs to their models.
Artificial General Intelligence (AGI): A hypothetical AI system that can understand, learn, and apply knowledge across any intellectual task a human can do — not just narrow, specific ones. No AGI system exists yet. Timelines for when (or whether) it will arrive are hotly debated.
Artificial Intelligence (AI): A broad field of computer science focused on building systems that can perform tasks typically requiring human intelligence — reasoning, learning, perception, language understanding, and decision-making.
Attention: The core innovation of Transformers. Instead of processing text sequentially (word by word), attention lets the model look at all words simultaneously and decide which ones are most relevant to each other. This is why LLMs can understand context across long passages.
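The relevance-weighing idea can be sketched in a few lines of plain Python: a query vector scores every key, a softmax turns the scores into weights, and the output is the weighted average of the values. The vectors and numbers below are toy illustrations, not any library's API.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    The output is a weighted average of the values, weighted by how
    well each key matches the query.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# Three "words", each represented by a 2-d vector (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]  # this query matches the first and third keys best
out, weights = attention(query, keys, values)
```

Because the query aligns with the first key, the first value dominates the output — that per-pair relevance score is what lets the model track context across a long passage.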
Backpropagation: The fundamental algorithm for training neural networks. It calculates how much each parameter contributed to an error, then adjusts the parameters to reduce that error. Popularized by Rumelhart, Hinton, and Williams in 1986 — the paper that made deep learning possible.
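The adjust-to-reduce-error loop can be shown with a single parameter; full backpropagation applies the same chain-rule gradient step to every weight in a deep network. The data and learning rate here are invented for illustration.

```python
# Fit y = w * x to data generated by the rule y = 2x, by repeatedly
# nudging w against the gradient of the squared error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0        # start with a wrong guess
lr = 0.02      # learning rate: how big each nudge is

for _ in range(200):
    for x, y in data:
        error = w * x - y
        grad = 2 * error * x   # d(error**2)/dw via the chain rule
        w -= lr * grad         # step downhill
# w has now converged to (approximately) 2.0
```

Scaling this from one parameter to billions — with the chain rule propagating error backward through every layer — is exactly what the 1986 algorithm does.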
Benchmark: A standardized test for measuring AI performance. Examples: MMLU (general knowledge), HumanEval (coding), GSM8K (math). Useful but imperfect — models can be optimized for benchmarks without gaining real capability.
Brier Score: A metric for evaluating probabilistic forecasts. Measures the mean squared error between predicted probabilities and actual outcomes. A score of 0 is perfect; always predicting 50% scores 0.25, the mark of an uninformative coin-flip forecaster. TexTak uses Brier scores to track forecast accuracy.
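The score is simple to compute. The forecasts below are invented for illustration: a sharp forecaster and a coin-flipper scored on the same four events.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes = [1, 0, 1, 1]                          # what actually happened
sharp = brier_score([0.9, 0.1, 0.8, 0.95], outcomes)  # confident and right
coin = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)    # always 50% -> 0.25
```

Lower is better: the confident, well-calibrated forecaster scores about 0.016, while the coin-flipper is pinned at exactly 0.25 no matter what happens.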
Compute: The computational resources required to train and run AI models. Measured in GPU-hours or FLOPS. Access to compute is one of the primary barriers to entry in frontier AI development — training a single large model can cost tens of millions of dollars.
Computer Vision: AI that processes and understands visual information — images and video. Applications include facial recognition, medical image analysis, autonomous driving, and image generation.
Constitutional AI: Anthropic's approach to AI alignment where the model is trained against a set of principles (a 'constitution') rather than relying solely on human feedback. The model critiques and revises its own outputs based on these principles.
Context Window: The maximum amount of text a language model can consider at once. Measured in tokens. A 128K-token context window means the model can 'see' roughly 100,000 words at a time. Larger windows enable longer documents and conversations.
Deep Learning: A subset of machine learning that uses neural networks with many layers (hence 'deep'). The approach that powers most modern AI, including language models, image generators, and speech recognition.
Diffusion Model: The architecture behind DALL-E, Stable Diffusion, and Midjourney. Works by learning to remove noise from images — training on the process of gradually corrupting images with static, then learning to reverse it. Generation starts from pure noise and progressively refines it into a coherent image.
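The corruption half of the process can be sketched for a single pixel value; real models apply a carefully tuned schedule of this to millions of pixels, then train a network to predict and subtract the noise. The linear schedule below is a toy, not the schedule any particular model uses.

```python
import math
import random

def add_noise(x, t, steps=100):
    """Toy forward-diffusion step: blend a clean value with Gaussian noise.

    At t = 0 the value is untouched; at t = steps only noise remains.
    """
    alpha = 1 - t / steps                      # fraction of signal surviving
    noise = random.gauss(0.0, 1.0)
    return math.sqrt(alpha) * x + math.sqrt(1 - alpha) * noise

clean = 5.0
slightly_noisy = add_noise(clean, t=10)   # mostly signal, a little static
pure_noise = add_noise(clean, t=100)      # the original value is gone
```

Generation runs this movie in reverse: start from `pure_noise` and repeatedly apply the learned denoiser until an image emerges.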
EU AI Act: The European Union's comprehensive AI regulation framework, passed in 2024. Classifies AI systems by risk level (unacceptable, high, limited, minimal) and imposes requirements accordingly. The most significant AI regulation in the world to date.
Fine-Tuning: Taking a pre-trained model and training it further on a specific, smaller dataset to specialize it for a particular task. Like a medical school graduate doing a residency in cardiology.
Foundation Model: A large AI model trained on broad data that can be adapted to many tasks. Rather than building a separate model for each use case, you start with a foundation model and fine-tune or prompt it. GPT-4, Claude, and LLaMA are foundation models.
Frontier Model: The most capable AI models at any given time — the cutting edge. Currently refers to GPT-4-class and above systems from OpenAI, Anthropic, Google, and Meta. The frontier advances every few months.
GAN (Generative Adversarial Network): A model architecture where two neural networks compete: one generates content, the other judges whether it's real or fake. The competition drives both to improve. Invented by Ian Goodfellow in 2014. Preceded the diffusion models that power modern image generators.
Generative AI: AI systems that create new content — text, images, music, video, code — rather than just analyzing or classifying existing content. The category that includes ChatGPT, DALL-E, Midjourney, Stable Diffusion, and Sora.
GPU (Graphics Processing Unit): The hardware that powers AI training and inference. Originally designed for video game graphics, GPUs excel at the parallel mathematical operations that neural networks require. NVIDIA dominates the market.
Guardrails: Safety mechanisms built into AI systems to prevent harmful outputs — content filters, refusal behaviors, output classifiers. The boundaries that keep models from generating dangerous, illegal, or harmful content.
Hallucination: When an AI model generates information that sounds confident and plausible but is factually wrong. Not lying (the model has no intent) — more like confabulation. A fundamental limitation of current language models that remains unsolved.
Inference: When a trained model generates output — answering a question, creating an image, making a prediction. Training is learning; inference is using what was learned. Most AI costs are now in inference, not training.
Interpretability: The ability to understand why an AI model made a specific decision. Most deep learning models are 'black boxes' — they work, but nobody fully understands how. Interpretability research tries to open the box.
Large Language Model (LLM): A neural network trained on massive amounts of text data that can generate, summarize, translate, and reason about language. Examples: GPT-4, Claude, LLaMA, Gemini. 'Large' refers to billions of parameters.
Machine Learning: A subset of AI where systems learn patterns from data rather than being explicitly programmed with rules. Instead of writing 'if X then Y,' you show the system thousands of examples and it figures out the pattern.
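The 'show examples, not rules' idea in miniature: instead of hard-coding that outputs are triple the inputs, estimate the multiplier from noisy examples with a one-line least-squares fit. The numbers are invented.

```python
# (input, output) pairs generated by roughly y = 3x, with measurement noise.
examples = [(1, 3.1), (2, 5.9), (3, 9.2), (4, 11.8)]

# Least-squares estimate of the slope a in y = a * x:
# a = sum(x*y) / sum(x*x). Nobody told the program "the rule is 3x" --
# it recovers that from the data.
a = sum(x * y for x, y in examples) / sum(x * x for x, _ in examples)
```

This is the whole field in one line: the rule lives in the data, and the program extracts it. Modern systems do the same thing with billions of examples and billions of adjustable numbers.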
MCP (Model Context Protocol): Anthropic's open standard for connecting AI models to external data sources and tools. Allows Claude and other models to interact with services like Google Drive, Slack, and databases through a standardized interface.
Moat: A competitive advantage that's hard for rivals to replicate. In AI: is it data, compute, talent, distribution, or brand? Whether any AI company has a durable moat — or whether open-source models erode all moats — is an active strategic question.
Multimodal AI: AI systems that can process multiple types of input — text, images, audio, video — rather than just one. GPT-4, Claude, and Gemini are multimodal: you can send them images and they can describe, analyze, or reason about them.
Natural Language Processing (NLP): The branch of AI focused on enabling computers to understand, interpret, and generate human language. Includes everything from spell-check to chatbots to machine translation.
Neural Network: A computing system inspired by biological neurons. Layers of interconnected nodes process information, with each connection having a learnable weight. The 'learning' happens by adjusting these weights based on data.
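A minimal forward pass through such a network, with hand-picked weights standing in for learned ones — every value flows through weighted connections and a squashing nonlinearity:

```python
import math

def forward(x, layers):
    """Pass input x through layers of (weights, biases) with tanh activations."""
    for weights, biases in layers:
        # Each neuron: weighted sum of inputs, plus bias, squashed by tanh.
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# A 2-input, 2-hidden-neuron, 1-output network (weights chosen by hand).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]),  # hidden layer: 2 neurons
    ([[1.0, 1.0]], [0.0]),                    # output layer: 1 neuron
]
out = forward([1.0, 2.0], layers)
```

Training (see Backpropagation) is nothing more than adjusting the numbers inside `layers` until the outputs match the data.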
Open Weights: Models whose trained parameters are publicly released, allowing anyone to run, modify, and build on them. Meta's LLaMA and Mistral are open-weight. Distinct from 'open source' in the traditional software sense — training data and methods may not be shared.
Parameter: A learnable value in a neural network — essentially a number that gets adjusted during training. GPT-4 reportedly has over 1 trillion parameters. More parameters generally (but not always) means more capability.
Perplexity: A measurement of how well a language model predicts text. Lower perplexity = better prediction. Roughly: if the model's perplexity on a sentence is 10, it's as 'surprised' as if it had to choose between 10 equally likely options for each word.
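The 'choosing among N equally likely options' intuition falls directly out of the formula — perplexity is the exponential of the average negative log-probability the model assigned to each token:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# If the model assigns every token probability 1/10, perplexity is exactly 10:
pp = perplexity([0.1] * 5)
```

A model that always predicted the right token with probability 1 would score a perplexity of 1 — no surprise at all.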
Prompt Engineering: The practice of crafting inputs to AI models to get better outputs. Since LLMs are sensitive to how questions are phrased, the wording, structure, and examples in a prompt significantly affect the response quality.
RAG (Retrieval-Augmented Generation): A technique that gives language models access to external knowledge by retrieving relevant documents before generating a response. Reduces hallucination by grounding answers in actual sources rather than relying solely on training data.
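A toy version of the retrieve-then-generate pipeline, using simple word overlap in place of the vector-embedding search real systems use. The corpus and prompt format here are invented for illustration.

```python
# A tiny document store standing in for a real knowledge base.
corpus = [
    "The Transformer architecture was introduced in 2017.",
    "Brier scores measure probabilistic forecast accuracy.",
    "GPUs excel at parallel matrix operations.",
]

def retrieve(question, docs):
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

question = "When was the Transformer architecture introduced?"
context = retrieve(question, corpus)

# The retrieved passage is prepended so the model answers from the source
# rather than from memory alone.
prompt = f"Context: {context}\n\nQuestion: {question}"
```

Production systems swap the overlap score for embedding similarity and add chunking, ranking, and citation steps, but the shape — retrieve first, then generate with the evidence in the prompt — is the same.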
Red Teaming: Deliberately trying to make AI systems fail, produce harmful content, or behave unexpectedly. Named after military exercises. AI companies employ red teams to find vulnerabilities before deployment.
Responsible AI: An umbrella term for practices aimed at developing AI that is fair, transparent, accountable, and safe. Every major AI company has a responsible AI team. Critics argue the term is often more marketing than substance.
RLHF (Reinforcement Learning from Human Feedback): A training technique where human evaluators rank model outputs by quality, and the model learns to produce responses that humans prefer. The key method behind making raw language models into helpful assistants like ChatGPT and Claude.
Scaling Laws: The empirical observation that model performance improves predictably as you increase model size, dataset size, and compute. The intellectual foundation behind the 'bigger is better' approach to AI development. Whether scaling laws will continue to hold is a central debate.
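Published scaling laws typically take a power-law form: loss falls as a power of parameter count, approaching an irreducible floor. The constants below are of the rough magnitude reported in the literature (a Kaplan-style scale constant and exponent, a Chinchilla-style irreducible term), combined here purely for illustration — not any lab's actual fitted curve.

```python
def loss(n_params, n_c=8.8e13, alpha=0.076, irreducible=1.69):
    """Toy power-law scaling curve: loss falls smoothly as params grow,
    approaching an irreducible floor that no amount of scale removes."""
    return irreducible + (n_c / n_params) ** alpha

small = loss(1e9)    # a 1B-parameter model
large = loss(1e12)   # a 1T-parameter model: lower loss, same curve
```

The predictability is the point: labs can estimate how much a 10x-larger training run will help before spending the money. The open question is whether the curve keeps holding.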
Synthetic Data: Training data generated by AI models rather than collected from the real world. Used when real data is scarce, expensive, or privacy-sensitive. Increasingly used to train newer models — raising questions about 'model collapse' when AI trains on AI output.
Text-to-Image: AI systems that generate images from written descriptions. Type 'a cat wearing a space suit on Mars' and get an image. Powered by diffusion models or transformer-based architectures.
Text-to-Video: AI systems that generate video clips from written descriptions. OpenAI's Sora is the most prominent example. Still early — physics and consistency are imperfect — but advancing rapidly.
Token: The basic unit that language models process. Not exactly a word — more like a word fragment. 'Understanding' might be split into 'under' + 'standing.' A typical English word is 1-3 tokens. Context windows are measured in tokens.
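A common rule of thumb — one token is roughly four characters of English — gives a quick estimate without running a real tokenizer. Actual tokenizers use learned subword merges (BPE), so real counts vary; this heuristic is an approximation, not how any model actually tokenizes.

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters of English per token."""
    return max(1, round(len(text) / 4))

n = estimate_tokens("Understanding context windows requires counting tokens.")
```

Handy for back-of-envelope math: a 128K-token context window holds on the order of 500,000 characters, which is where the 'roughly 100,000 words' figure comes from.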
Tool Use: The ability of AI models to interact with external tools — search engines, calculators, code interpreters, APIs, databases. Extends LLMs beyond text generation into real-world action.
Training: The process of feeding data to a neural network so it can learn patterns. Like education for AI — the model sees billions of examples and adjusts its parameters to get better at predicting what comes next.
Transformer: The neural network architecture behind virtually every modern LLM. Introduced in the 2017 paper 'Attention Is All You Need.' Its key innovation — the attention mechanism — allows the model to weigh the relevance of different parts of the input when generating each piece of output.
Turing Test: Alan Turing's proposed measure of machine intelligence: if a human can't distinguish between a machine's responses and a human's in conversation, the machine exhibits intelligent behavior. Modern AI arguably passes narrow versions, but the deeper question remains debated.