Vectorless RAG - PageIndex : Learn from First Principles - No 3rd Party Libraries

Learn from First Principles

Mar 21, 2026

Vectorless RAG: The Concepts

Note: This tutorial was written with AI assistance and reviewed/published by a human. I use AI as a co-pilot — the I own the quality.

What Problem Does RAG Solve?

Large Language Models (LLMs) are powerful but have a knowledge cutoff. They can’t answer questions about your private documents, recent data, or domain-specific content. Retrieval-Augmented Generation (RAG) solves this by retrieving relevant content from your documents and feeding it to the LLM as context.

Traditional (Vector) RAG — How It Works

The standard approach involves:

Chunk — Split documents into small text fragments (e.g., 500 tokens each)
Embed — Convert each chunk into a vector using an embedding model
Store — Save vectors in a vector database (Pinecone, Weaveworks, ChromaDB, etc.)
Query — Convert the user’s question into a vector
Search — Find the most similar vectors via cosine similarity
Generate — Pass matching chunks to the LLM to generate an answer

The Problem: Similarity ≠ Relevance

Vector search finds text that sounds like your question, but that’s not the same as text that answers it. Consider:

Question: “What was the revenue impact of the Q3 supply chain disruption?”

A vector search might return chunks about:

“Supply chain” in the operations section (similar words, wrong context)
“Revenue” in the quarterly summary (similar words, missing the connection)
But miss the actual analysis buried in section 4.2.3 of the financial review

This is what PageIndex calls “vibe retrieval” — matching by vibes (semantic similarity) rather than reasoning.

Other Vector RAG Drawbacks

Chunking destroys structure — A table split across two chunks loses meaning
No traceability — You can’t explain why a chunk was retrieved
Tuning overhead — Chunk size, overlap, embedding model, similarity threshold all need tuning
Infrastructure cost — Vector databases add operational complexity

Vectorless RAG — The PageIndex Approach

PageIndex (by VectifyAI) proposes a fundamentally different approach: No vectors. No embeddings. No chunking. Instead, it mimics how a human expert navigates a long document.

How a Human Expert Works

When a domain expert needs to find information in a 200-page report, they don’t read every page. They:

Scan the table of contents to understand the document structure
Reason about which sections are likely to contain the answer
Navigate to those specific sections
Read the relevant content
Synthesize an answer

PageIndex replicates this exact workflow using LLMs.

The Two-Phase Architecture

Phase 1: Index Construction

The document is transformed into a hierarchical tree index — think of it as a smart table of contents with summaries:

Document: "Microservices Architecture Guide"
├── 1. Introduction [pages 1-3]
│   Summary: "Defines microservices, compares with monoliths..."
│   ├── 1.1 What Are Microservices [pages 1-1]
│   ├── 1.2 Why Microservices Matter [pages 1-2]
│   └── 1.3 Monolith vs Microservices [pages 2-3]
├── 2. Core Design Principles [pages 3-6]
│   Summary: "Covers single responsibility, loose coupling, API design..."
│   ├── 2.1 Single Responsibility [pages 3-4]
│   ├── 2.2 Loose Coupling [pages 4-5]
│   └── ...
└── ...

Each node contains:

Title — The section heading
Page range — Where this section lives in the document
Summary — A concise description of the section’s content
Children — Subsections (creating the hierarchy)

The LLM generates this structure by reading the document and identifying its natural organization.

Phase 2: Reasoning-Based Retrieval

When a user asks a question:

Tree Search — The LLM receives the tree index (titles + summaries, NOT the full text) and reasons about which nodes are relevant
Context Extraction — The actual text from the selected nodes is extracted
Answer Generation — The LLM reads the extracted text and generates an answer with citations

User Question
    ↓
[LLM reads tree index]  ← titles + summaries only
    ↓
"Section 2.2 and 3.1 are relevant because..."  ← explicit reasoning
    ↓
[Extract text from sections 2.2 and 3.1]
    ↓
[LLM generates answer with page citations]
    ↓
Answer + Sources

Why This Works Better

AspectVector RAGVectorless RAGRetrievalSemantic similarity (approximate)LLM reasoning (deliberate)Document structureDestroyed by chunkingPreserved as hierarchyTraceabilityOpaque similarity scoresExplicit reasoning chainInfrastructureEmbedding model + vector DBJust an LLMTuningChunk size, overlap, thresholdMinimal

Performance

PageIndex powers Mafin 2.5, which achieved 98.7% accuracy on FinanceBench — a benchmark for financial document analysis. This significantly outperforms traditional vector-based RAG on complex, domain-specific documents.

What We Build in This Demo

This project implements both phases from scratch:

Indexer — Parses a document, sends it to an LLM, and builds the hierarchical tree index
Retriever — Accepts questions, uses LLM reasoning to navigate the tree, extracts context, and generates cited answers

Available in both JavaScript (zero dependencies) and Python (zero dependencies for markdown; pymupdf optional for PDF support). Supports multiple LLM providers: OpenAI (default), Google Gemini, Anthropic Claude, and Ollama (local).

JavaScript Demo: Vectorless RAG from Scratch

Note: This tutorial was written with AI assistance and reviewed/published by a human. We use AI as a co-pilot — the human owns the quality.

Prerequisites

Node.js 18+ (for built-in fetch-compatible APIs)
An LLM API key from any supported provider (see below)
No npm packages required — this is truly zero-dependency

Supported Providers

ProviderAPI Key Env VarDefault ModelGet a KeyOpenAI (default)OPENAI_API_KEYgpt-4o-miniplatform.openai.comGoogle GeminiGEMINI_API_KEYgemini-2.0-flashaistudio.google.comAnthropic ClaudeANTHROPIC_API_KEYclaude-sonnet-4-20250514console.anthropic.comOllama (local)none neededllama3ollama.com

Project Structure

js/
├── index.js          # CLI entry point + interactive query loop
├── indexer.js        # Document parsing + hierarchical tree construction
├── retriever.js      # LLM-driven tree search + answer generation
└── sample-data/
    └── sample-doc.md # Sample document (microservices guide)

Quick Start

# Option 1: Set your API key in .env (recommended)
cp ../.env.example ../.env
# Edit ../.env and add your key

# Option 2: Or export directly
export OPENAI_API_KEY="sk-your-key-here"

# Navigate to the js directory
cd js

# Run with the sample document (uses OpenAI by default)
node index.js --file ./sample-data/sample-doc.md

# Use a different provider
node index.js --file ./sample-data/sample-doc.md --provider gemini
node index.js --file ./sample-data/sample-doc.md --provider anthropic
node index.js --file ./sample-data/sample-doc.md --provider ollama

# Override the model
node index.js --file ./sample-data/sample-doc.md --model gpt-4o
node index.js --file ./sample-data/sample-doc.md --provider gemini --model gemini-1.5-pro

How It Works — File by File

1. indexer.js — Building the Tree Index

This file does two things:

a) Parse the document into “pages”

function parseMarkdownIntoPages(filePath, linesPerPage = 40) {
  const content = fs.readFileSync(filePath, "utf-8");
  const lines = content.split("\n");
  const pages = [];

  for (let i = 0; i < lines.length; i += linesPerPage) {
    const pageLines = lines.slice(i, i + linesPerPage);
    pages.push({
      pageNumber: pages.length + 1,
      text: pageLines.join("\n"),
    });
  }
  return pages;
}

Since markdown files don’t have real “pages,” we simulate them by grouping every 40 lines into a page. This gives the LLM page references to work with. For PDFs, each physical page would be one entry.

b) Ask the LLM to build a hierarchical index

The key prompt instructs the LLM to:

Read the entire document (tagged with page markers)
Identify the natural hierarchy (sections, subsections)
Return a JSON tree with title, page range, and summary for each node

The LLM response looks like:

{
  "title": "Microservices Architecture Guide",
  "start_page": 1,
  "end_page": 8,
  "summary": "A comprehensive guide covering microservices design, communication, deployment, security, and testing.",
  "children": [
    {
      "title": "Introduction",
      "start_page": 1,
      "end_page": 2,
      "summary": "Defines microservices and compares them with monolithic architecture.",
      "children": [...]
    }
  ]
}

c) LLM API call — zero dependencies, multi-provider

Instead of using any SDK, we make raw HTTPS calls. Each provider (OpenAI, Gemini, Claude, Ollama) has its own config for endpoint, auth, request format, and response parsing — all handled by a PROVIDERS config object:

const PROVIDERS = {
  openai:    { hostname: "api.openai.com",    path: "/v1/chat/completions", ... },
  gemini:    { hostname: "generativelanguage.googleapis.com", ... },
  anthropic: { hostname: "api.anthropic.com", path: "/v1/messages", ... },
  ollama:    { hostname: "localhost",          port: 11434, ... },
};

function callLLM(apiKey, model, systemPrompt, userPrompt, provider = "openai") {
  const cfg = PROVIDERS[provider];
  // builds request body, headers, and parses response per provider
}

This means zero npm packages regardless of which provider you use.

2. retriever.js — Reasoning-Based Retrieval

This is where the magic happens. Three steps:

Step 1: Tree Search (LLM reasons over the index)

The LLM receives ONLY the tree structure (titles + summaries, not full text) and reasons about which nodes answer the question:

function buildSearchPrompt(query, tree) {
  const slimTree = stripTextFromTree(tree); // remove full text, keep summaries

  return `You are given a user question and a hierarchical tree index...
    QUESTION: ${query}
    DOCUMENT TREE: ${JSON.stringify(slimTree, null, 2)}
    ...select the most specific nodes...`;
}

The LLM responds with its reasoning:

{
  "thinking": "The question asks about testing strategies. Node 0.6 covers Testing Strategies with children for the testing pyramid, E2E testing, and performance testing.",
  "selected_nodes": ["0.6.1", "0.6.2", "0.6.3"]
}

Step 2: Context Extraction

We look up the selected nodes in the tree, find their page ranges, and extract the actual text:

function extractContext(selectedNodeIds, tree, pages) {
  // Find each node, get its page range, collect text
  for (const nodeId of selectedNodeIds) {
    const node = allNodes.find((n) => n.node_id === nodeId);
    const nodePages = pages.filter(
      (p) => p.pageNumber >= node.start_page && p.pageNumber <= node.end_page
    );
    // ...
  }
}

Step 3: Answer Generation

The extracted text is sent to the LLM with instructions to answer the question and cite sources:

{
  "answer": "The microservices testing strategy follows a pyramid model with three levels...",
  "sources": [
    {"node_id": "0.6.1", "title": "The Testing Pyramid", "pages": "6-7"}
  ]
}

3. index.js — CLI and Interactive Loop

Ties everything together:

Parses CLI arguments
Builds or loads the index
Enters an interactive query loop where you type questions and get cited answers

Running the Demo

Step 1: Build the Index

# Set your key (or use .env)
export OPENAI_API_KEY="sk-..."
cd js
node index.js --file ./sample-data/sample-doc.md

You’ll see the tree being built:

🤖 Provider: openai | Model: gpt-4o-mini

📄 Parsing document: ./sample-data/sample-doc.md
   Found 8 pages

🌳 Building hierarchical tree index via LLM (openai/gpt-4o-mini)...

📋 Document Index:
0 Microservices Architecture Guide [pages 1-8]
  → A comprehensive guide to building microservices...
  0.1 Introduction [pages 1-2]
    → Defines microservices, why they matter...
    0.1.1 What Are Microservices [pages 1-1]
    ...

💾 Index saved to: ./sample-data/sample-doc_index.json

Step 2: Ask Questions

❓ Your question: What are the main communication patterns between microservices?

🌳 Step 1: Searching tree index (LLM reasoning)...
   Thinking: The question is about communication patterns. Section 3 covers
   Communication Patterns with subsections on synchronous, asynchronous, and
   service mesh approaches.
   Selected nodes: 0.3.1, 0.3.2, 0.3.3

📖 Step 2: Extracting context from selected nodes...
   Extracted 3 sections

💡 Step 3: Generating answer from context...

──────────────────────────────────────────────────────
📝 ANSWER:
──────────────────────────────────────────────────────
The three main communication patterns are:
1. Synchronous (HTTP/REST, gRPC) — request-response...
2. Asynchronous (Kafka, RabbitMQ) — event-driven...
3. Service Mesh (Istio, Linkerd) — infrastructure layer...

📌 Sources:
   • Synchronous Communication (Pages 4-4)
   • Asynchronous Communication (Pages 4-5)
   • Service Mesh (Pages 5-5)
──────────────────────────────────────────────────────

Step 3: Reuse the Index

Skip the indexing step on subsequent runs:

node index.js --index ./sample-data/sample-doc_index.json --file ./sample-data/sample-doc.md

Try These Questions

“How does Netflix handle microservices at scale?”
“What security measures should be implemented at the API gateway?”
“Compare synchronous and asynchronous communication patterns”
“What is the strangler fig pattern?”
“How should microservices testing be structured?”

Using Your Own Documents

Point --file at any markdown or text file:

node index.js --file ~/my-report.md
node index.js --file ~/meeting-notes.txt

Switching Providers

# Google Gemini
export GEMINI_API_KEY="your-key"
node index.js --file ./sample-data/sample-doc.md --provider gemini

# Anthropic Claude
export ANTHROPIC_API_KEY="your-key"
node index.js --file ./sample-data/sample-doc.md --provider anthropic

# Ollama (local, no key needed — just run: ollama serve)
node index.js --file ./sample-data/sample-doc.md --provider ollama --model llama3

Or put all your keys in the .env file and just use --provider:

node index.js --file doc.md --provider gemini

Key Takeaway

Notice what’s NOT happening:

No embedding model
No vector database
No chunking or overlap tuning
No similarity scores

Just structured indexing + LLM reasoning. The LLM explicitly tells you why it picked each section, and every answer includes page-level citations. That’s Vectorless RAG.

Python Demo: Vectorless RAG from Scratch

Note: This tutorial was written with AI assistance and reviewed/published by a human. We use AI as a co-pilot — the human owns the quality.

Prerequisites

Python 3.10+
An LLM API key from any supported provider (see below)
No pip packages required for markdown files
Optional: pymupdf for PDF support (pip install pymupdf)

Supported Providers

Project Structure

python/
├── main.py           # CLI entry point + interactive query loop
├── indexer.py        # Document parsing + hierarchical tree construction
├── retriever.py      # LLM-driven tree search + answer generation
└── sample-data/
    └── sample-doc.md # Sample document (microservices guide)

Quick Start

# Option 1: Set your API key in .env (recommended)
cp ../.env.example ../.env
# Edit ../.env and add your key

# Option 2: Or export directly
export OPENAI_API_KEY="sk-your-key-here"

# Navigate to the python directory
cd python

# Run with the sample document (uses OpenAI by default)
python main.py --file ./sample-data/sample-doc.md

# Use a different provider
python main.py --file ./sample-data/sample-doc.md --provider gemini
python main.py --file ./sample-data/sample-doc.md --provider anthropic
python main.py --file ./sample-data/sample-doc.md --provider ollama

# Run with a PDF (requires: pip install pymupdf)
python main.py --file ~/Downloads/report.pdf

# Override the model
python main.py --file ./sample-data/sample-doc.md --model gpt-4o
python main.py --file ./sample-data/sample-doc.md --provider gemini --model gemini-1.5-pro

How It Works — File by File

1. indexer.py — Building the Tree Index

Document Parsing

The indexer supports two input formats:

Markdown/Text — Groups lines into simulated “pages”:

def parse_markdown_into_pages(file_path, lines_per_page=40):
    with open(file_path, "r") as f:
        lines = f.readlines()

    pages = []
    for i in range(0, len(lines), lines_per_page):
        page_lines = lines[i : i + lines_per_page]
        pages.append({"page_number": len(pages) + 1, "text": "".join(page_lines)})
    return pages

PDF — Extracts text per physical page using PyMuPDF:

def parse_pdf_into_pages(file_path):
    import pymupdf
    doc = pymupdf.open(file_path)
    pages = []
    for i, page in enumerate(doc):
        text = page.get_text()
        if text.strip():
            pages.append({"page_number": i + 1, "text": text})
    return pages

LLM Call — Zero Dependencies, Multi-Provider

Uses Python’s built-in urllib to call any provider’s API directly. Each provider (OpenAI, Gemini, Claude, Ollama) has its own config for endpoint, auth, request format, and response parsing:

PROVIDERS = {
    "openai":    {"url": "https://api.openai.com/v1/chat/completions", ...},
    "gemini":    {"url_fn": lambda model, key: f"https://.../{model}:generateContent?key={key}", ...},
    "anthropic": {"url": "https://api.anthropic.com/v1/messages", ...},
    "ollama":    {"url": "http://localhost:11434/api/chat", ...},
}

def call_llm(api_key, model, system_prompt, user_prompt, provider="openai"):
    cfg = PROVIDERS[provider]
    # builds request body, headers, and parses response per provider

This means zero pip packages regardless of which provider you use (except pymupdf for PDF parsing).

Tree Construction

The LLM receives the full document (tagged with page markers) and returns a hierarchical JSON tree:

def build_index(file_path, api_key, model="gpt-4o-mini"):
    pages = parse_document(file_path)
    response = call_llm(api_key, model, SYSTEM_PROMPT, build_index_prompt(pages))
    tree = json.loads(response)
    assign_node_ids(tree)  # Add IDs like "0", "0.1", "0.1.1"
    return {"tree": tree, "pages": pages, ...}

2. retriever.py — Reasoning-Based Retrieval

Three-step pipeline:

Step 1: Tree Search

The LLM receives only the tree structure (titles + summaries) and reasons about relevance:

def search_tree(query, tree, api_key, model):
    response = call_llm(
        api_key, model,
        "You are a document retrieval assistant.",
        build_search_prompt(query, tree),
    )
    return json.loads(response)
    # Returns: {"thinking": "...", "selected_nodes": ["0.3.1", "0.3.2"]}

Step 2: Context Extraction

Extract actual text from the pages covered by selected nodes:

def extract_context(selected_node_ids, tree, pages):
    all_nodes = collect_all_nodes(tree)
    for node_id in selected_node_ids:
        node = next(n for n in all_nodes if n["node_id"] == node_id)
        node_pages = [p for p in pages
                      if node["start_page"] <= p["page_number"] <= node["end_page"]]
        text = "\n\n".join(p["text"] for p in node_pages)
        # ...

Step 3: Answer Generation

The LLM reads the extracted text and generates a cited answer:

def generate_answer(query, contexts, api_key, model):
    response = call_llm(api_key, model, "...", build_answer_prompt(query, contexts))
    return json.loads(response)
    # Returns: {"answer": "...", "sources": [{"title": "...", "pages": "..."}]}

3. main.py — CLI Entry Point

Uses argparse for CLI handling, then enters an interactive loop:

python main.py --file ./sample-data/sample-doc.md

Commands in the interactive loop:

Type a question → get a cited answer
tree → view the document index
quit → exit

Running the Demo

With Markdown (zero dependencies)

# Set your key (or use .env)
export OPENAI_API_KEY="sk-..."
cd python
python main.py --file ./sample-data/sample-doc.md

With PDF

pip install pymupdf
python main.py --file ~/Downloads/annual-report.pdf

Sample Session

🤖 Provider: openai | Model: gpt-4o-mini

📄 Parsing document: ./sample-data/sample-doc.md
   Found 8 pages

🌳 Building hierarchical tree index via LLM (openai/gpt-4o-mini)...

📋 Document Index:
0 Microservices Architecture Guide [pages 1-8]
  → Comprehensive guide covering design, deployment, security...
  0.1 Introduction [pages 1-2]
    ...
  0.5 Security Considerations [pages 5-6]
    0.5.1 Authentication and Authorization [pages 5-6]
    ...

❓ Your question: How should secrets be managed in microservices?

🌳 Step 1: Searching tree index (LLM reasoning)...
   Thinking: This question is about secrets management, which falls under
   security. Node 0.5.3 "Data Protection" covers secrets management,
   encryption, and tools like HashiCorp Vault.
   Selected nodes: 0.5.3

📖 Step 2: Extracting context from selected nodes...
   Extracted 1 section (Data Protection)

💡 Step 3: Generating answer from context...

──────────────────────────────────────────────────────
📝 ANSWER:
──────────────────────────────────────────────────────
Secrets in microservices should never be stored in code or plain-text
environment variables. Instead, use dedicated tools like HashiCorp Vault,
AWS Secrets Manager, or Kubernetes Secrets (with encryption at rest).
These tools provide secure storage and automatic rotation of database
passwords, API keys, and certificates.

📌 Sources:
   • Data Protection (Pages 6-6)
──────────────────────────────────────────────────────

Try These Questions

“What is the strangler fig pattern?”
“How does Uber organize its microservices?”
“What are the three pillars of observability?”
“Compare REST, gRPC, and GraphQL for microservices”
“What testing approach works best for microservices?”

Using Your Own Documents

# Markdown or text files
python main.py --file ~/notes/architecture-decisions.md

# PDF files (pip install pymupdf)
python main.py --file ~/Documents/whitepaper.pdf

# With a stronger model
python main.py --file ~/report.pdf --model gpt-4o

Key Takeaway

The Python implementation is functionally identical to the JavaScript version, with one bonus: PDF support. The core algorithm is the same — build a hierarchical tree, let the LLM reason over it, extract context from the selected nodes, and generate a cited answer.

No vectors. No embeddings. No chunking. Just structured indexing + LLM reasoning.

Source code

You can use your own Open AI/Gemini/ Anthropic Claude/ Ollama key

https://github.com/rajeshpillai/ai-examples/tree/main/vectorless-rag-pageindex

Algorisys Labs - The First Principles Learning Hub

Discussion about this post

Ready for more?

Algorisys Labs - The First Principles Learning Hub

Vectorless RAG - PageIndex : Learn from First Principles - No 3rd Party Libraries

Learn from First Principles

Vectorless RAG: The Concepts

What Problem Does RAG Solve?

Traditional (Vector) RAG — How It Works

The Problem: Similarity ≠ Relevance

Other Vector RAG Drawbacks

Vectorless RAG — The PageIndex Approach

How a Human Expert Works

The Two-Phase Architecture

Phase 1: Index Construction

Phase 2: Reasoning-Based Retrieval

Why This Works Better

Performance

What We Build in This Demo

Further Reading

JavaScript Demo: Vectorless RAG from Scratch

Prerequisites

Supported Providers

Project Structure

Quick Start

How It Works — File by File

1. indexer.js — Building the Tree Index

a) Parse the document into “pages”

b) Ask the LLM to build a hierarchical index

c) LLM API call — zero dependencies, multi-provider

2. retriever.js — Reasoning-Based Retrieval

Step 1: Tree Search (LLM reasons over the index)

Step 2: Context Extraction

Step 3: Answer Generation

3. index.js — CLI and Interactive Loop

Running the Demo

Step 1: Build the Index

Step 2: Ask Questions

Step 3: Reuse the Index

Try These Questions

Using Your Own Documents

Switching Providers

Key Takeaway

Python Demo: Vectorless RAG from Scratch

Prerequisites

Supported Providers

Project Structure

Quick Start

How It Works — File by File

1. indexer.py — Building the Tree Index

Document Parsing

LLM Call — Zero Dependencies, Multi-Provider

Tree Construction

2. retriever.py — Reasoning-Based Retrieval

Step 1: Tree Search

Step 2: Context Extraction

Step 3: Answer Generation

3. main.py — CLI Entry Point

Running the Demo

With Markdown (zero dependencies)

With PDF

Sample Session

Try These Questions

Using Your Own Documents

Key Takeaway

Source code

Discussion about this post

Ready for more?