Cambrion Glossary
Welcome to the official glossary for the Cambrion API-based agentic document processing platform. This resource provides definitions for the key concepts, components, and architectural principles that underpin our platform. Use the A-Z navigation below to find the terms you need.
A
Agent / Agentic AI
An autonomous AI entity designed to perform complex tasks by reasoning, planning, and executing a sequence of operations. At Cambrion, an Agent is configured through a Pipeline to process unstructured documents. These agents go beyond simple data extraction by validating information, linking to external data sources, and learning from user feedback, a process central to Agentic AI.
Agentic Orchestration
Cambrion's proprietary method, and its core intellectual property, for how an agent selects, configures, and chains actions (such as calling different VLMs, applying validation rules, or using external tools). This process draws on deep workflow knowledge and user instructions to optimize for accuracy and relevance, going well beyond simple prompting.
API (Application Programming Interface)
The primary method for integrating Cambrion's capabilities into your existing workflows and systems. The platform provides a robust REST API that allows developers to programmatically send documents, manage pipelines, and receive structured data outputs, enabling seamless automation.
Asynchronous Processing
A processing mode in which requests are handled in the background, so the client does not have to wait for a real-time response. This suits high-volume or non-urgent tasks and improves system efficiency and scalability.
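The submit-then-poll pattern behind asynchronous processing can be sketched as follows. This is a minimal illustration, not Cambrion's actual interface: `FakeAsyncService`, its `submit` and `status` methods, and the job states are all hypothetical stand-ins that run in memory.

```python
import itertools


class FakeAsyncService:
    """In-memory stand-in for a document API (hypothetical, not Cambrion's real interface)."""

    def __init__(self, polls_until_done=2):
        self.polls_until_done = polls_until_done
        self._jobs = {}
        self._ids = itertools.count(1)

    def submit(self, document_name):
        # Return a job id immediately; the work is treated as happening in the background.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"document": document_name, "polls": 0}
        return job_id

    def status(self, job_id):
        # Simulate a job that finishes after a fixed number of status checks.
        job = self._jobs[job_id]
        job["polls"] += 1
        if job["polls"] < self.polls_until_done:
            return {"state": "processing"}
        return {"state": "done", "result": {"document": job["document"]}}


def wait_for_result(service, job_id, max_polls=10):
    """Poll until the job finishes or the poll budget runs out."""
    for _ in range(max_polls):
        reply = service.status(job_id)
        if reply["state"] == "done":
            return reply["result"]
        # A real client would sleep or back off here between polls.
    raise TimeoutError(f"{job_id} did not finish within {max_polls} polls")


service = FakeAsyncService()
job_id = service.submit("invoice.pdf")   # returns at once
result = wait_for_result(service, job_id)
print(job_id, result)
```

The client gets a job id back right away and is free to do other work; it collects the result later instead of blocking on the original request.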
B
Business Context
Additional information, such as business rules, requirements, or links to data sources, that is provided to a Cambrion Agent. This context allows the agent to validate data and make more intelligent decisions, ensuring the output is not just extracted but is also reliable and relevant to your specific workflow.
C
Continuous Learning
The ability of the Cambrion platform to improve its accuracy over time through a feedback loop. This is achieved via In-Context Learning from user instructions and API Hints, which deepens the agent's understanding of specific domains and workflows without model retraining.
D
Data Enrichment
The process of augmenting extracted data with additional information from internal or external sources. A Cambrion agent can perform enrichment by querying a database for product details or searching the web for company information, adding value beyond simple extraction.
Deployment
The process of making an agentic pipeline active and available for use. Cambrion supports rapid deployment to a secure EU cloud environment or, for enterprise customers, on-premise within their own infrastructure for maximum data control.
Document Flexibility
The platform's inherent ability to process a wide variety of document types and layouts without prior setup or configuration. This is a key benefit of the zero-shot, template-free approach, contrasting with legacy systems that require rigid templates for each document type.
E
Extraction
A core capability of the Cambrion platform, involving the automated identification and retrieval of specific data points from unstructured documents. Cambrion performs intelligent extraction that is not dependent on fixed templates, allowing it to handle diverse and complex document formats accurately.
F
Feedback Loop
The core mechanism that enables continuous learning. Users provide feedback on the agent's output via the UI, API Hints, or Cambrion's Feedback API. This information is fed back into the system to refine the agent's understanding and improve the accuracy of future processing for that specific use case.
Foundation Model
A large-scale, pre-trained artificial intelligence model that serves as the base for more specialized tasks. Cambrion's platform leverages state-of-the-art foundation models for their powerful reasoning and vision capabilities, orchestrating them to perform specific document automation tasks.
G
GDPR (General Data Protection Regulation)
A key compliance standard that the Cambrion platform is designed to meet. As a Europe-native company, Cambrion offers full data control, EU-based cloud hosting, and on-premise deployment options to ensure the highest level of data sovereignty and privacy.
H
Hints
A feedback mechanism available via the UI and API that allows users to guide and correct the agent's behavior, improving its accuracy over time.
I
IDP (Intelligent Document Processing)
The market category for technologies that extract and process data from documents. Cambrion represents the "4th Wave" of IDP, using Agentic AI and VLMs to overcome the limitations of older, template-based IDP solutions, which are often rigid and require extensive training.
In-Context Learning
A key feature whereby the agent's accuracy improves based on the context provided with each task, without the underlying model needing to be retrained. This allows the system to become progressively more tuned to specific customer workflows.
J
JSON (JavaScript Object Notation)
The standard format for the clean, validated, and ready-to-use structured data that the Cambrion platform outputs via its API. This format is universally easy for other software systems and workflows to ingest and utilize.
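As a rough illustration of how a downstream system might ingest such output, the sketch below parses a JSON payload with Python's standard `json` module. The payload and its field names (`document_type`, `fields`, `invoice_number`, and so on) are made up for this example and are not Cambrion's actual schema.

```python
import json

# Hypothetical payload; the field names are illustrative, not Cambrion's schema.
raw_output = """
{
  "document_type": "invoice",
  "fields": {
    "invoice_number": "INV-1001",
    "total": 1250.50,
    "currency": "EUR"
  }
}
"""

record = json.loads(raw_output)   # JSON text -> Python dict
fields = record["fields"]
total = fields["total"]           # numbers arrive as int or float
summary = f"{fields['invoice_number']}: {total:.2f} {fields['currency']}"
print(summary)
```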
K
Knowledge Worker
The primary user of modern enterprise software. Cambrion empowers non-technical knowledge workers by providing a self-serve UI to automate document-heavy tasks, freeing them from manual data entry to focus on higher-value activities.
L
Linking
An intelligent operation where the Cambrion agent validates or matches extracted data by connecting it with information from internal or external sources.
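One simple way to picture linking is fuzzy matching of an extracted value against master data. The sketch below uses Python's standard `difflib` for this; it is a swapped-in illustration, not a description of how Cambrion's agents actually link records, and `VENDOR_MASTER`, `link_vendor`, and the 0.8 cutoff are assumptions.

```python
import difflib

# Hypothetical vendor master data keyed by normalized name.
VENDOR_MASTER = {
    "acme gmbh": "V-001",
    "apex logistics": "V-002",
    "northwind traders": "V-003",
}


def link_vendor(extracted_name, cutoff=0.8):
    """Return the id of the closest master record, or None if nothing is close enough."""
    normalized = extracted_name.strip().lower()
    matches = difflib.get_close_matches(
        normalized, list(VENDOR_MASTER), n=1, cutoff=cutoff
    )
    return VENDOR_MASTER[matches[0]] if matches else None


linked = link_vendor("ACME Gmbh.")    # small spelling differences still match
unlinked = link_vendor("Zeta Corp")   # no sufficiently similar master record
print(linked, unlinked)
```

Returning `None` rather than the nearest weak match keeps uncertain links visible, so they can be routed to review instead of silently attached to the wrong record.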
M
Markdown Output
An alternative output format for structured data. Markdown is a lightweight markup language that can be used to create formatted text, making it useful for generating human-readable summaries or reports directly from the agent's output.
Multi-modal
The ability of an AI model to process and understand information from multiple types of data, or modalities, simultaneously. Cambrion's agents are multi-modal, leveraging Vision-Language Models (VLMs) to interpret both the text and the visual layout of a document to achieve higher accuracy.
N
Non-technical Users
A key audience for the Cambrion platform. The self-serve, no-code UI is designed specifically for business users and domain experts, enabling them to build and manage powerful automation pipelines without requiring engineering support.
O
On-Premise
A deployment option offered for enterprise clients that allows the Cambrion platform to be hosted within a company's own private infrastructure. This provides maximum data control and security, ensuring that sensitive documents never leave the corporate environment.
P
Pipeline
The core configuration unit in the Cambrion platform that defines a specific document processing workflow. A pipeline is set up in minutes using a simple UI, specifying the document types, data to be extracted, and any extraction requirements in natural language. Each pipeline is executed by a dedicated Cambrion Pipeline Agent and can be deployed via the API.
Playground
The interactive, self-serve user interface within the Cambrion platform used to configure and test pipelines in minutes. It is designed so that non-technical users can build pipelines without writing any code.
Q
Query
A request for information, either from a database or a web search, that a Cambrion agent can perform as part of its workflow. This ability allows the agent to perform data enrichment and validation by looking up information that is not present in the original document.
R
RAG (Retrieval-Augmented Generation)
An AI technique that enhances the accuracy of Large Language Models (LLMs) by providing them with relevant, retrieved information. Cambrion significantly improves RAG effectiveness by first transforming unstructured documents into clean, validated, structured data, which can then be used to populate vector databases for precise retrieval, eliminating the "Garbage In, Garbage Out" problem.
S
SDK (Software Development Kit)
A set of tools and libraries provided by Cambrion to simplify the integration of its platform into various programming environments. Alongside the REST API, the SDK helps developers to quickly add document automation capabilities to their applications.
Structured Data
The end product of Cambrion's process: clean and ready-to-use data organized in a predictable format. This is in direct contrast to the messy, varied formats of Unstructured Data and is essential for reliable automation and effective AI applications.
T
Template-Free
The architectural principle that Cambrion operates without relying on rigid, pre-defined templates. This allows the platform to handle variations in document structure and format dynamically, a major advantage over legacy OCR and IDP systems that fail when a layout changes.
U
Unstructured Data
Data that has no pre-defined data model or is not organized in a pre-defined manner. It is commonly estimated to account for 80-90% of all enterprise data and includes PDFs, images, emails, and handwritten notes. Cambrion specializes in transforming this type of data into valuable, Structured Data.
V
Validation
A critical step in Cambrion's process where the extracted data is checked for accuracy and reliability against business rules and internal or external data sources. This built-in validation ensures high data quality, which is essential for trustworthy downstream automation and AI systems.
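To make the idea concrete, here is a minimal sketch of rule-based validation on an extracted invoice. The rules, field names, and the `validate_invoice` helper are hypothetical examples chosen for illustration; they do not describe Cambrion's internal checks.

```python
from datetime import date


def validate_invoice(fields, known_vendor_ids, as_of):
    """Return a list of rule violations; an empty list means the record passed."""
    errors = []

    # Business rule: line items must add up to the stated total.
    line_sum = round(sum(fields["line_totals"]), 2)
    if line_sum != round(fields["total"], 2):
        errors.append(f"line items sum to {line_sum}, but total is {fields['total']}")

    # Data-source check: the vendor must exist in the reference data.
    if fields["vendor_id"] not in known_vendor_ids:
        errors.append(f"unknown vendor {fields['vendor_id']}")

    # Business rule: an invoice cannot be dated in the future.
    if date.fromisoformat(fields["invoice_date"]) > as_of:
        errors.append("invoice date is in the future")

    return errors


vendors = {"V-001", "V-002"}
today = date(2024, 3, 15)

good = {"line_totals": [100.0, 250.5], "total": 350.5,
        "vendor_id": "V-001", "invoice_date": "2024-03-01"}
bad = {"line_totals": [100.0, 250.5], "total": 400.0,
       "vendor_id": "V-999", "invoice_date": "2024-03-01"}

good_errors = validate_invoice(good, vendors, today)
bad_errors = validate_invoice(bad, vendors, today)
print(good_errors, bad_errors)
```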
VLM (Vision-Language Model)
State-of-the-art AI models that can understand and process information from both images and text simultaneously. Cambrion leverages a variety of leading VLMs as part of its Agentic Orchestration core, allowing its agents to process a wide range of documents and overcome the limits of legacy systems.
W
Workflow Integration
The process of connecting Cambrion's data processing capabilities into a company's existing business applications and workflows. This is typically achieved via the platform's REST API, allowing for a seamless flow of data from unstructured documents into systems like ERPs, CRMs, or other databases.
X
XML (eXtensible Markup Language)
A data output format that can be supported as a custom option. While JSON is the default, Cambrion can configure pipelines to deliver structured data in XML or other formats to meet the specific integration requirements of legacy enterprise systems.
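For a sense of what delivering the same structured data as XML involves, the sketch below converts a flat set of fields to XML with Python's standard `xml.etree.ElementTree`. The element names and the `fields_to_xml` helper are assumptions for illustration; the actual XML layout would depend on how a pipeline is configured.

```python
import xml.etree.ElementTree as ET


def fields_to_xml(root_tag, fields):
    """Serialize a flat dict of fields as child elements of a single root element."""
    root = ET.Element(root_tag)
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


# Hypothetical field names, mirroring a JSON-style record.
xml_text = fields_to_xml(
    "invoice",
    {"invoice_number": "INV-1001", "total": 1250.5, "currency": "EUR"},
)
print(xml_text)
```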
Y
Yield (Data Yield)
A metric used to evaluate the effectiveness of a pipeline, representing the percentage of documents processed that result in high-quality, fully validated structured data without errors or need for manual correction. A high data yield is a key indicator of a pipeline's ROI.
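Following the definition above, data yield is the share of processed documents that came out fully validated and needed no manual correction. A minimal sketch of that calculation, assuming each document outcome is recorded with two hypothetical flags, `validated` and `manually_corrected`:

```python
def data_yield(outcomes):
    """Percentage of documents that were fully validated and needed no manual correction."""
    if not outcomes:
        return 0.0  # avoid dividing by zero when nothing was processed
    clean = sum(
        1 for o in outcomes if o["validated"] and not o["manually_corrected"]
    )
    return 100.0 * clean / len(outcomes)


# Hypothetical outcomes for four processed documents.
outcomes = [
    {"validated": True, "manually_corrected": False},
    {"validated": True, "manually_corrected": False},
    {"validated": True, "manually_corrected": True},   # needed a manual fix
    {"validated": True, "manually_corrected": False},
]

yield_pct = data_yield(outcomes)
print(f"{yield_pct:.1f}%")
```

Here three of the four documents count as clean, giving a yield of 75%.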
Z
Zero-shot
The ability of an AI system to handle tasks it has not been explicitly trained on. Cambrion's agents have zero-shot capabilities, meaning they can instantly process and extract data from any document type without requiring costly data labeling, model training, or the creation of rigid templates.