AI Glossary

Tom van Wees, Founder and CCO, Lleverage
November 19, 2024
5 min read

A practical guide to AI terminology, covering key concepts from basic machine learning to advanced language models. Written for product teams and developers, this glossary explains 70+ essential AI terms and concepts, from agents and benchmarking to vector databases and zero-shot learning.

Terms to understand when it comes to AI

Agent: An AI model designed to autonomously interact with its environment to perform tasks, often adapting to new information.

Agentic Workflow: A method of task automation where agents work in a structured sequence to complete complex tasks independently.

AGI (Artificial General Intelligence): A hypothetical form of AI that can understand, learn, and apply knowledge across a wide range of tasks at a human level.

AI Copilot: An AI assistant designed to collaborate with humans, often in real-time, to aid in tasks or decision-making.

Alignment: The process of ensuring an AI system's goals and actions align with human values and intentions.

ASI (Artificial Superintelligence): A hypothetical AI that surpasses human intelligence across all fields, including creativity, problem-solving, and emotional intelligence.

Benchmarking: The process of measuring an AI model's performance against set standards or other models.

Bias: Systematic errors in AI that can lead to unfair or inaccurate outcomes, often rooted in biased data.

Chain of Thought: A reasoning technique where AI models break down complex problems into intermediate steps for improved answers.

Chatbot: An AI-powered conversational agent that can communicate with users in text or voice formats to answer questions or provide assistance.

ChatGPT: A conversational AI chatbot developed by OpenAI, built on its GPT family of models, for natural language interactions.

Classification: The process of categorizing data points into predefined classes, such as spam vs. non-spam emails.

Claude: An advanced AI chatbot created by Anthropic with an emphasis on ethical and safe interactions.

Completions: Responses generated by AI models based on the input prompt, typically used in text-based interactions.

Compute: The computational resources (e.g., processors, GPUs) required to train and run AI models.

Content Enrichment or Enrichment: Improving raw data by adding context, such as tags, metadata, or categorizations, to enhance usability.

Conversational AI: AI designed specifically for understanding and generating human language in a conversational context.

Data Augmentation: The process of artificially creating new training data from existing data to enhance model performance.

Data Extraction: The process of pulling specific data or insights from unstructured sources, like text or images.

Data Ingestion: The initial step in the data pipeline where data is collected from various sources and processed for use.

Data Sets: Collections of data used to train, validate, or test AI models.

Deep Learning: A subset of machine learning using neural networks with multiple layers to learn complex patterns in data.

Determinism: When an AI model produces the same output each time it receives the same input.

Diffusion: A generative technique in which a model learns to reverse a gradual noising process, so it can create or modify data (most often images) by iteratively removing noise.

Embedding: A representation of data, often words or sentences, in a continuous vector space to capture its meaning or relationships.

Evaluations: Tests or assessments to measure the effectiveness or accuracy of AI models.

Explainable AI (XAI): AI systems designed with transparency to allow humans to understand how they reach their conclusions.

Few-shot Learning: A technique where AI models learn a task from only a handful of examples, often supplied directly in the prompt.
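With language models, few-shot learning often means placing a few worked examples directly in the prompt. Here is a minimal sketch of that idea; the function name, the task wording, and the example reviews are all made up for illustration, and the resulting string would still need to be sent to a model of your choice.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a prompt that shows labeled examples before the new input."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unlabeled so the model completes it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples; two shots are enough to show the pattern.
examples = [
    ("Great product, works as promised.", "positive"),
    ("Broke after two days.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Fast delivery and easy setup.")
print(prompt)
```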

Fine-tuning: The process of adapting a pre-trained model to a specific task with additional data.

Foundation Model: A large-scale AI model pre-trained on vast data that can be adapted to various downstream tasks.

Generative AI: AI that can produce new content, such as text, images, or music, rather than simply analyzing existing data.

Gemini: A family of AI models by Google focused on both conversational and multimodal tasks.

GPT (Generative Pretrained Transformer): A transformer-based model that generates text by repeatedly predicting the next token in a sequence.

GPU (Graphics Processing Unit): Hardware optimized for parallel processing, commonly used to accelerate AI computations.

Hallucination: When an AI model generates plausible-sounding information that is false or not supported by its training data or the provided sources.

Human-in-the-loop: A setup where human input guides or corrects AI decisions to improve performance or accuracy.

Inference: The process of making predictions or generating responses based on a trained AI model.

Knowledge Graph: A structured representation of interconnected facts that helps AI understand relationships between entities.

Large Language Model (LLM): A powerful type of AI trained on massive text data to understand and generate human language.

Latency: The time delay between a user's input and the AI's response.

Llama: Meta's family of openly available large language models, designed for various text generation and understanding tasks.

Machine Learning: A field of AI where algorithms learn from data to make predictions or decisions without explicit programming.

Metadata: Data that provides information about other data, often used to organize and retrieve data efficiently.

Mistral: A family of language models from Mistral AI, several of them released with open weights, focused on efficient performance at smaller scale for various NLP tasks.

Model Configs: The settings and hyperparameters that define an AI model's structure and behavior.

Multimodal: AI models that can process and combine multiple types of input, such as text, images, and audio.

Multitask Prompt Tuning (MPT): A technique where prompts are adjusted to allow a model to perform multiple tasks.

Natural Language Processing (NLP): The field of AI focused on enabling computers to understand and process human language.

Neural Network: A series of interconnected nodes, loosely inspired by the human brain, used to detect patterns and make decisions in AI.

Node: A building block within a Workflow in Lleverage.

Parameters: The values in a model that are adjusted during training to fit the data, such as weights in a neural network.

Parsing: The process of analyzing text to extract structured information, for example parsing a CV (résumé) into separate fields.

Pre-training: The initial phase of training a model on large datasets to develop foundational knowledge before fine-tuning.

Prompt: The input given to an AI model to generate a response, often structured to guide the model's output.

Prompt Chaining: The practice of linking multiple prompts to guide the AI through a sequence of responses.
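A minimal sketch of prompt chaining, assuming each step is a template with an {input} placeholder that receives the previous step's output. The fake_model function is a hypothetical stand-in for a real model call, so the example runs without any API.

```python
def fake_model(prompt):
    """Hypothetical stand-in for a real model call; it only echoes the prompt."""
    return f"[response to: {prompt}]"


def run_chain(templates, initial_input, model):
    """Feed the output of each prompt into the next prompt in the chain."""
    result = initial_input
    for template in templates:
        result = model(template.format(input=result))
    return result


templates = [
    "Summarize: {input}",
    "Translate to Dutch: {input}",
]
output = run_chain(templates, "long text", fake_model)
print(output)
```

Swapping fake_model for a function that calls an actual model keeps the chain logic unchanged.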

Prompt Engineering: Crafting and optimizing prompts to achieve the best responses from AI models.

Prompt IDE: An interface to design, test, and refine prompts for better model interactions.

Prompt Massaging: Adjusting prompts to refine or correct model responses without major modifications.

RAG (Retrieval Augmented Generation): A technique that retrieves relevant data from external sources and passes it to the model as context, to improve response accuracy.
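The retrieve-then-generate flow can be sketched in a few lines. This is an illustration under simplifying assumptions: retrieval here is a crude word-overlap ranking rather than semantic search, the documents and function names are invented, and the final prompt would still need to be sent to a model.

```python
def retrieve(question, documents, k=1):
    """Rank documents by shared words with the question (a crude stand-in for semantic search)."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question, documents, k=1):
    """Put the retrieved passages in front of the question as context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge base.
documents = [
    "Refunds are processed within 14 days.",
    "Our office is closed on public holidays.",
    "Support is available by email.",
]
prompt = build_rag_prompt("How long do refunds take?", documents)
print(prompt)
```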

Reinforcement Learning: A type of machine learning where models learn by receiving rewards or penalties for their actions.

RLHF (Reinforcement Learning from Human Feedback): Training models by optimizing based on human feedback on responses.

Semantic Search: A search that uses the meaning of words rather than exact matches to retrieve relevant information.

Sentiment Analysis: The process of identifying the emotional tone in text, often used in social media monitoring.

Similarity Search: Finding data points similar to a query by comparing their vector embeddings.
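As a rough illustration, similarity search can be done by scoring every stored vector against the query with cosine similarity and keeping the best matches. The three-dimensional vectors below are made up; real embeddings usually have hundreds or thousands of dimensions, and a vector database would avoid comparing against every item.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=2):
    """Return the k (name, score) pairs with the highest similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings, invented for this example.
index = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "holiday": [0.0, 0.1, 0.9],
}
results = most_similar([1.0, 0.0, 0.0], index, k=2)
print(results)
```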

Singularity: A theoretical point where AI surpasses human intelligence, leading to rapid and possibly unpredictable advances.

Structured Data: Data that is organized in a clear, defined format, such as tables or databases.

Structured Output: AI-generated data presented in an organized format like lists, tables, or fields.
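In practice, structured output is often requested as JSON and then checked before it is used. A minimal sketch, assuming a hypothetical two-field schema and a raw string standing in for a model response:

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"name": str, "amount": (int, float)}


def parse_structured_output(raw):
    """Parse a JSON response and verify the expected fields and types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or invalid field: {field}")
    return data


raw = '{"name": "Acme BV", "amount": 129.5}'  # stand-in for a model response
record = parse_structured_output(raw)
print(record["name"], record["amount"])
```

Validating like this catches responses that look right but omit a field, before they reach downstream code.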

Temperature: A parameter controlling the randomness of a model's output, where higher values lead to more varied responses.
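One common way temperature is applied is by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes that formulation; the three logits are invented values for three candidate tokens.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
print(cold)
print(hot)
```

At the lower temperature the most likely token takes a larger share of the probability; at the higher temperature the distribution flattens, so sampling produces more varied output.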

TensorFlow: An open-source framework by Google for building and deploying machine learning models.

Token: A unit of text, such as a word, part of a word, or a character, that a model processes to generate responses.

Token Limit: The maximum number of tokens a model can handle in a single input or output sequence.

Top-P (Nucleus Sampling): A decoding method that samples only from the smallest set of most likely tokens whose cumulative probability reaches a threshold p.
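A minimal sketch of the filtering step in nucleus sampling: rank tokens by probability, keep them until their cumulative probability reaches top_p, then renormalize. The token probabilities are invented, and a real decoder would go on to sample one token from the filtered set.

```python
def nucleus_filter(probs, top_p=0.9):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}  # hypothetical
nucleus = nucleus_filter(probs, top_p=0.9)
print(nucleus)
```

Here "rock" is dropped because the first three tokens already cover at least 90% of the probability.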

Training Data: Data used to train an AI model, helping it learn patterns and make predictions.

Transformer: A type of model architecture that uses attention mechanisms to relate all parts of a sequence to each other, which makes it especially effective for NLP tasks.

Unstructured Data: Data not organized in a pre-defined way, like raw text, audio, or images.

Variable: A storage element in programming or machine learning that can hold data values for processing.

Vector Database: A specialized database optimized for storing and retrieving vector embeddings (e.g., Weaviate, Pinecone).

Vectorizing: The process of converting text or other data into numerical vectors to enable similarity comparisons.
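To make the idea concrete, here is a deliberately simple vectorizer that counts how often each vocabulary word appears in a text (a bag-of-words approach). This is a swapped-in technique for illustration: production systems typically use a learned embedding model instead, and the vocabulary below is invented.

```python
def vectorize(text, vocabulary):
    """Represent text as word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


vocabulary = ["invoice", "payment", "holiday"]  # hypothetical vocabulary
vector = vectorize("Invoice payment reminder: payment due", vocabulary)
print(vector)
```

Two texts vectorized against the same vocabulary can then be compared numerically, for example with cosine similarity.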

Zero-shot Learning: When a model performs a task it wasn't explicitly trained for by leveraging general knowledge.

Lleverage is on a mission to make all engineers AI engineers by providing an end-to-end AI development platform that helps product and development teams build, test, and deploy AI features.