AI Glossary for Advanced Users
Once you know the AI basics, you quickly run into the next level: subagents, vision models, Python running behind the scenes, Markdown. This glossary explains the terms we encounter as soon as we start using AI tools productively.
Orcha Team
March 24, 2026
Agents & Automation
In Part 1, we explained what an agent is. Here we cover the concepts that make agents truly useful in practice.
Subagent
A specialized agent that is called by a parent agent for a specific subtask. The main agent coordinates – the subagents are the specialists.
Finance example: The main agent processes an invoice. For the tax compliance check, it calls a tax subagent; for account coding, it calls a bookkeeping subagent. Think of it as a team lead delegating tasks to subject-matter experts.
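The delegation pattern can be sketched in a few lines of Python. Everything here is invented for illustration (function names, account codes, VAT rates) – real agent frameworks route subtasks via model calls, not plain functions – but the coordinator/specialist structure is the same.

```python
# Illustrative sketch of the subagent pattern: a coordinating "main agent"
# delegates each subtask to a specialist and merges the results.
# All names and values are made up for this example.

def tax_subagent(invoice):
    """Specialist: checks tax compliance (stub for illustration)."""
    return {"tax_ok": invoice["vat_rate"] in (0.07, 0.19)}

def bookkeeping_subagent(invoice):
    """Specialist: proposes an account code (stub for illustration)."""
    return {"account": "6815" if invoice["category"] == "software" else "6800"}

def main_agent(invoice):
    """Coordinator: delegates to the specialists and combines their answers."""
    result = {}
    result.update(tax_subagent(invoice))
    result.update(bookkeeping_subagent(invoice))
    return result

invoice = {"vat_rate": 0.19, "category": "software"}
print(main_agent(invoice))  # {'tax_ok': True, 'account': '6815'}
```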
Human-in-the-Loop
A workflow design where the AI handles routine cases autonomously but escalates uncertain or critical cases to a human. The gold standard for AI in finance.
Finance example: The AI codes 90% of invoices automatically. For the remaining 10% – unclear account assignments, unusual amounts – it flags the case for human review.
Confidence Score
A measure of how certain the AI is about its output – expressed as a number between 0 and 1 (or 0–100%). If the score falls below a defined threshold, the case is routed to a human.
Finance example: The AI identifies a vendor with 98% confidence → automatic processing. At only 65% → manual review. This is how we manage quality and risk.
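The routing logic behind this is simple enough to show in full. A minimal Python sketch, assuming a single threshold (the 0.90 value is an example, not a recommendation):

```python
# Confidence-based routing: results above the threshold are processed
# automatically, everything else goes to a human.
CONFIDENCE_THRESHOLD = 0.90  # example value; teams tune this to their risk appetite

def route(extraction):
    """Decide whether an AI extraction is processed automatically or reviewed."""
    if extraction["confidence"] >= CONFIDENCE_THRESHOLD:
        return "automatic processing"
    return "manual review"

print(route({"vendor": "Acme GmbH", "confidence": 0.98}))  # automatic processing
print(route({"vendor": "unclear",   "confidence": 0.65}))  # manual review
```

In practice the threshold is the main control knob: raising it sends more cases to humans (higher quality, more work), lowering it automates more (less work, more risk).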
Seeing & Understanding: How AI Reads Documents
AI can do more than process text – it can also “see” images, PDFs, and scans. This changes how we work with documents.
Multimodal
A model that can process not just text but also images, PDFs, or audio. Modern LLMs (Large Language Models, see Part 1) like Claude and ChatGPT are multimodal – we can send them a photo of an invoice and they read the contents.
Finance example: Instead of scanning and converting an invoice first, we upload the PDF directly – the AI reads the amount, vendor, and line items straight from the image.
OCR (Optical Character Recognition)
A technology that converts text from images or scanned documents into machine-readable text. The traditional method for making paper documents digitally processable.
Finance example: The scanner on the desk captures an image – OCR turns it into searchable text. But here’s the thing: OCR only recognizes characters; it doesn’t understand what they mean. (More on this in our comparison OCR vs. AI Extraction.)
Vision (Computer Vision / Image Recognition)
An LLM’s ability to not just read images but to understand them. While traditional OCR only recognizes individual characters, a vision model understands the full context: it knows that “12,500.00” next to “Net amount” on an invoice is the net amount – regardless of the layout.
Finance example: We photograph a stack of paper invoices with our phone – different vendors, different countries, different languages. The vision model identifies the vendor, amount, line items, and tax rates on every invoice – without us having to create a template for each layout.
OCR vs. Vision – the key difference
OCR recognizes letters. Vision understands meaning. A vision model can say: “This invoice has an incorrect tax calculation” – because it understands the context and can spot inconsistencies. Traditional OCR systems often need per-vendor templates for structured extraction – vision models don’t.
Good to know: Even though vision models work impressively well out of the box, a scalable process with high quality still requires significant engineering. Error handling, cost optimization, validation, and speed are challenges that only become apparent at higher volumes. (More on this: OCR vs. AI Extraction.)
How AI Works Behind the Scenes
When we use AI tools, there’s more happening behind the scenes than we see. These terms help us better understand what we’re getting back.
Python
A programming language widely used in data analysis. LLMs can write and execute Python code in the background when we ask them to analyze data. In ChatGPT and Claude, the code runs automatically behind the scenes – we usually see only the finished result, though the generated code can be expanded and inspected if we want.
Finance example: We upload an Excel file with accounts payable data and ask: “How often did we pay within the payment terms, broken down by vendor category?” The AI writes a Python script in the background that analyzes the data and creates a chart – we just get the finished result.
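For a sense of what such a background script looks like, here is a hand-written miniature version of the payment-terms question above. The data and field names are invented for illustration; the AI's real script would read the uploaded file and typically use a data-analysis library.

```python
# Miniature version of the kind of analysis script an LLM writes behind
# the scenes: share of on-time payments per vendor category.
# Data and field names are invented for this example.
from collections import defaultdict

payments = [
    {"vendor_category": "software", "paid_within_terms": True},
    {"vendor_category": "software", "paid_within_terms": False},
    {"vendor_category": "office",   "paid_within_terms": True},
    {"vendor_category": "office",   "paid_within_terms": True},
]

on_time = defaultdict(int)
total = defaultdict(int)
for p in payments:
    total[p["vendor_category"]] += 1
    on_time[p["vendor_category"]] += p["paid_within_terms"]

for category in total:
    share = on_time[category] / total[category]
    print(f"{category}: {share:.0%} paid within terms")
```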
HTML (Hypertext Markup Language)
The language that web pages are written in. Modern LLMs can generate complete, interactive HTML pages – with charts, dashboards, and filters. In Claude, these live previews are called “Artifacts.” We don’t need to understand any code for this.
Finance example: We upload a CSV file (a plain-text file with tabular data, like a stripped-down spreadsheet) with monthly figures and say: “Create an interactive dashboard with spending by category.” Claude generates a finished HTML page with a pie chart, bar chart, and dropdown filters – in seconds, with no developer needed.
Markdown
A simple text formatting system using symbols: # for headings, **bold** for bold text, - for bullet points. Nearly every AI tool outputs its responses in Markdown.
Why it matters: When Claude creates an analysis, it comes in Markdown. If you know the symbols, you immediately understand the structure – even in raw format.
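A few lines of raw Markdown show how little there is to learn – `#` becomes a heading, `**` becomes bold, `-` becomes a bullet point (the figures are invented):

```markdown
# Quarterly Analysis

**Total spend:** 12,500.00

- Software: 4,000.00
- Office: 2,500.00
```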
Python vs. HTML – when does the AI use which?
When we ask the AI to calculate things (totals, comparisons, statistical analyses), it writes Python. When we need a visualization (dashboard, interactive chart), it generates HTML. Often it uses both: Python for the calculation, HTML for the presentation.
Infrastructure & Operations
Terms that come up when the question is: where and how does the AI actually run?
Cloud vs. On-Premise
Cloud = the AI runs on the provider’s servers (e.g., at Anthropic or OpenAI). On-premise = the AI runs on the company’s own infrastructure. Most AI tools use the cloud, but for sensitive data there are also on-premise solutions.
Good to know: Cloud doesn’t automatically mean insecure – providers like Anthropic and OpenAI offer strict data protection agreements. On-premise offers maximum control but is significantly more effort to set up and operate.
Open Source vs. Open Weight – what’s the difference?
Two terms that are often confused. “Open source” means the complete source code is published – anyone can read, modify, and redistribute it. “Open weight” in the context of AI models means the trained weights are available, but not necessarily the training code or training data. (More on open-weight models in the Basics Glossary.)
Why it matters: When a provider says “open source,” it’s worth taking a closer look. Llama from Meta is open weight – we can use the model, but we can’t trace how it was trained.
Terminal (Command Line)
A text-based interface where we type commands directly to the computer – without graphical buttons or menus. Some AI tools – for example Claude Code, an AI developer tool from Anthropic – run in the terminal instead of the browser.
The terminal is more powerful because it can combine commands, automate tasks, and access anything the computer can do. The downside: we need to know the commands – there are no buttons to click.
The good news: Most finance teams never need the terminal. And if they do – just ask the AI: “What command do I need to type to do X?”
Now we know the key terms
With the basics glossary and this second part, we’ve covered the terms we encounter most often in our day-to-day work with AI tools. That doesn’t mean we need to understand everything in detail – but we can hold our own in conversations about subagents, vision models, or Python-based analyses.
This glossary is growing. We welcome suggestions and continuously add new terms.
Missed the basics?
The foundational terms – LLM, token, prompt, agent, RAG, hallucinations, temperature, and more – are explained in part one:
→ AI Terms Explained Simply (Part 1)

Related articles: AI Glossary Part 1 · OCR vs. AI Extraction · Just Ask Claude