RAG Isn't Dead, You Just Need a Better Starting Point

Phil Nash (X, LinkedIn), a developer relations engineer at IBM, opened his AI Engineer Europe talk by taking aim at the "RAG is dead" discourse. His counter is simple: if every business had less than a million tokens of data, maybe. But they don't, and not everyone wants to pay for a million input tokens on every query. RAG isn't going anywhere -- it's just harder than people think.

"It turns out that RAG is actually hard, and it's hard for different reasons for different projects."

The Problem Isn't the Loop

Phil argues that the basic retrieve-and-generate pattern isn't where teams struggle. The difficulty is everything around it: parsing messy documents (especially PDFs), choosing embedding models, tuning chunking strategies, and adapting search to specific data and user patterns. Every organization's documents and users are different, so there's no universal solution.

His reframe: what's missing isn't a better algorithm but a better starting point. An opinionated-but-flexible baseline built from open-source components -- sensible defaults with every layer exposed for customization. He presented a stack combining three open-source projects (Docling for document processing, OpenSearch for search, and Langflow for orchestration) as one such baseline, currently at version 0.4.0.

OpenRAG logo with the logos of its three open-source components: the Docling duck mascot, the OpenSearch logo, and the Langflow logo

Parsing Is the First Hard Problem

The document processing layer uses Docling, an open-source library from IBM Research Zurich. Phil walks through why parsing matters: RAG applications ingest HTML, Markdown, Word docs, slides, spreadsheets, audio, video, and PDFs. Each format has different challenges.

Docling runs multiple processing pipelines depending on file type:

Docling's pipeline architecture diagram showing multiple processing paths: a simple pipeline for text documents, an ASR pipeline for audio and video, and two PDF pipelines (standard multi-model and VLM), all converging into a DocTags intermediate representation that feeds into chunking and export

Simple pipeline -- text extraction and hierarchy for straightforward formats like Markdown and HTML
ASR pipeline -- automatic speech recognition for audio and video
Standard PDF pipeline -- a collection of small focused models for layout analysis, table extraction, and image extraction, with optional OCR for scanned documents
VLM pipeline -- a 258-million-parameter vision model (Granite Docling) that extracts everything in a single pass

All of these produce an intermediate representation -- an XML-like format called DocTags -- that can be converted to Markdown, HTML, or JSON. The chunker then uses the document's parsed structure rather than arbitrary character counts, which Nash presents as a meaningful improvement over naive chunking strategies. The whole thing runs offline, which matters for air-gapped environments.

Search Beyond Vectors

The retrieval layer uses OpenSearch, the open-source Elasticsearch fork. Nash's point here is that vector search alone isn't enough -- hybrid search combining vectors and keyword matching gives better results out of the box.

A few features he highlights:

Multiple embedding models simultaneously -- useful when migrating between embedding models, though he notes it slows search down
Configurable filtering and aggregation -- exposed to end users as "knowledge filters" so they can scope queries to specific document sets
JVector -- a disk-ANN-based vector index plugin that replaces the default HNSW/IVF options. The key property: indexes don't need to fit entirely in memory, and it supports live indexing

Making Retrieval Agentic

The most interesting architectural choice Phil describes is replacing the traditional retrieval pipeline with an agent. Instead of the standard embed-query, top-K, stuff-into-prompt flow, the user's query goes to an agent equipped with search tools. The agent decides what searches to run and how many.

Slide titled "Langflow — Agentic retrieval" showing the full agent flow in the Langflow visual editor, with embedding models on the left feeding into an OpenSearch multi-model search node, connected to an agent node with tools including a calculator and MCP flow, a prompt template, and a chat output

In his demo, the agent has access to an OpenSearch search tool, a calculator, and an MCP-based URL ingester. He demos the generation side running Granite 4 3B locally through Ollama, with Qwen3 Embedding 0.6B handling the embeddings -- also local.

"They're language models, not math models. So a calculator is always useful."

The orchestration layer (Langflow) provides a drag-and-drop visual editor for these flows. Nash demos adding guardrails to the agent pipeline by dragging in components -- the point being that customization doesn't require rewriting code.

Langflow editor zoomed into the agent node, showing the agent configuration panel with model selection (granite4-3b), agent instructions, and connected tools including an MCP URL ingester, calculator, and OpenSearch search

Everything Is Swappable

Phil is explicit that the stack isn't prescriptive about model providers. It supports OpenAI, Anthropic, WatsonX AI, and Ollama for both embedding and generation. The demo uses fully local models, but you can swap in hosted APIs without changing the pipeline.

OpenRAG settings page showing cloud connectors (Google Drive, SharePoint, OneDrive), model providers (OpenAI, Ollama, IBM watsonx.ai, Anthropic), and agent configuration with granite4-3b selected as the language model

The project also includes cloud connectors for Google Drive, SharePoint, and OneDrive for document syncing, a chat UI with suggested follow-up prompts, and API keys for using the search and agent capabilities in external applications. There's also an MCP server available for handing the stack off to other agents.

The Takeaway

Nash's closing argument is that whether RAG is "solved" depends entirely on your data and your users. The answer isn't a single algorithm -- it's a stack that gives you a strong baseline and lets you tune every layer against your own evaluation criteria.

"Well, that's kind of up to you, to your data and to your users."

Phil Nash spoke at AI Engineer Europe 2026. Developer relations engineer at IBM.