April 12, 2026 · 3 min read

Building Mentora: a six-agent AI career coach with a three-tier memory layer

How I built Mentora, an agentic AI career coach platform, with six specialized agents on Groq, pgvector long-term memory, Redis episodic state, and a RAG pipeline, while keeping per-session AI cost at ~$0.02.

Agentic AI · Multi-Agent Systems · RAG · pgvector · Groq

Mentora is not a single-chatbot product. It is a multi-agent platform that coaches users across diagnostics, planning, accountability, mock interviews, escalation, and memory. The only reason it feels coherent is the shared memory layer underneath the agents.

My work covered the entire system: agent orchestration, the memory layer, the RAG pipeline, and the voice mock interview flow.

Six Agents, One Coherent Coach

The product is built around six specialized agents, each running on LLMs served by Groq.

  • Diagnostic agent assesses the user's current state
  • Planner agent translates goals into a structured path
  • Accountability agent tracks follow-through across sessions
  • Mock interview agent runs voice-driven practice
  • Escalation agent routes edge cases to human review
  • Memory agent maintains context across all the others

The hard part was not the agents themselves. It was making them feel like a single coach.
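
One way to picture "a single coach" is a router that hands every agent the same shared context. The sketch below is illustrative only, not Mentora's actual code: `SharedContext`, `route`, the intent strings, and the stub agent functions are all hypothetical names, and real agents would call an LLM instead of returning fixed strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SharedContext:
    """Context every agent reads from and writes to (hypothetical shape)."""
    user_id: str
    notes: List[str] = field(default_factory=list)


AgentFn = Callable[[str, SharedContext], str]


def diagnostic_agent(message: str, ctx: SharedContext) -> str:
    # Stub: record what was learned so later agents can reuse it.
    ctx.notes.append(f"diagnostic: {message}")
    return "Diagnostic recorded."


def planner_agent(message: str, ctx: SharedContext) -> str:
    # The planner sees notes written by earlier agents, so it need not re-ask.
    known = len(ctx.notes)
    ctx.notes.append(f"plan: {message}")
    return f"Plan drafted using {known} prior note(s)."


def escalation_agent(message: str, ctx: SharedContext) -> str:
    ctx.notes.append(f"escalated: {message}")
    return "Routed to human review."


# Only three of the six agents are stubbed here to keep the sketch short.
AGENTS: Dict[str, AgentFn] = {
    "diagnostic": diagnostic_agent,
    "planner": planner_agent,
    "escalation": escalation_agent,
}


def route(intent: str, message: str, ctx: SharedContext) -> str:
    """Dispatch to the agent for this intent; unknown intents escalate."""
    agent = AGENTS.get(intent, escalation_agent)
    return agent(message, ctx)
```

Because every agent receives the same `ctx`, the user sees one continuous conversation even though a different specialist handles each turn.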

A Three-Tier Memory Layer

Agentic products break when agents forget. So memory was the first thing I designed.

  • Long-term memory lives in PostgreSQL + pgvector: resumes and other documents are chunked and embedded for semantic recall
  • Episodic memory lives in Redis for fast access to session-level context
  • Working memory is request-scoped so each turn stays focused without leaking state

That layered model is what lets the planner reference what the diagnostic agent learned three sessions ago without re-asking.
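
The lookup order across the three tiers can be sketched as follows. This is a simplified illustration under stated assumptions: plain dictionaries stand in for PostgreSQL + pgvector and Redis, the `MemoryLayer` class and its method names are hypothetical, and promoting session facts to long-term storage at session end is one possible policy, not necessarily the one Mentora uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MemoryLayer:
    # Stand-in for PostgreSQL + pgvector: facts that persist across sessions.
    long_term: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Stand-in for Redis: context keyed by session id.
    episodic: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def recall(
        self,
        user_id: str,
        session_id: str,
        key: str,
        working: Dict[str, str],
    ) -> Optional[str]:
        """Check the narrowest tier first, then widen the search."""
        if key in working:  # request-scoped working memory
            return working[key]
        session = self.episodic.get(session_id, {})
        if key in session:  # session-level episodic memory
            return session[key]
        return self.long_term.get(user_id, {}).get(key)  # long-term memory

    def end_session(self, user_id: str, session_id: str) -> None:
        """Assumed policy: promote session facts, then clear the session."""
        facts = self.episodic.pop(session_id, {})
        self.long_term.setdefault(user_id, {}).update(facts)
```

The `working` dictionary is passed in per call rather than stored on the object, which is what keeps it request-scoped: nothing in it survives the turn unless it is written to one of the other tiers.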

RAG That Powers Real Surfaces

The retrieval pipeline does not just feed the agents. It also powers natural-language admin queries over project data and grounds the voice mock interview agent.

Key choices:

  1. Semantic retrieval over vector embeddings in pgvector for grounded, document-aware answers
  2. Llama 3.3 served on Groq as the core inference model for low-latency multi-agent calls
  3. Whisper speech-to-text (STT) for the voice mock interview flow, integrated end-to-end with the agent loop
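
The first choice, semantic retrieval over embeddings, reduces to two steps: chunk the documents, then rank chunks by similarity to the query vector. The sketch below shows both in plain Python with toy two-dimensional vectors. It is an assumption-laden illustration: `chunk_text`, `cosine_similarity`, and `top_k` are hypothetical helpers, real embeddings would come from an embedding model, and in a pgvector deployment the ranking would normally run in SQL (pgvector's `<=>` operator computes cosine distance) rather than in application code.

```python
import math
from typing import List, Sequence, Tuple


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the last window.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    indexed: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    scored = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]
```

The retrieved chunks would then be placed in the agent's prompt so its answer is grounded in the user's own documents. Keeping `k` small is also one of the simplest levers on prompt size.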

Per-Session Cost as a First-Class Constraint

Agentic systems get expensive fast. I treated cost as a product requirement, not a finance concern.

Through retrieval tuning, prompt scoping, and routing, per-session AI cost is held at ~$0.02. That number is what makes the platform usable, not just demoable.
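
Treating cost as a requirement means being able to compute it per session and compare it against a budget. The sketch below shows the arithmetic only. The `Call` record, `session_cost`, and `within_budget` are hypothetical names, and the per-million-token prices are parameters the caller supplies: the example values used with it here are placeholders, not Groq's actual rates. Only the $0.02 target comes from the article.

```python
from dataclasses import dataclass
from typing import Sequence

SESSION_BUDGET_USD = 0.02  # per-session target stated in the article


@dataclass(frozen=True)
class Call:
    """Token counts for one LLM call within a session."""
    input_tokens: int
    output_tokens: int


def session_cost(
    calls: Sequence[Call],
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Sum token usage across calls and price it per million tokens."""
    total_in = sum(c.input_tokens for c in calls)
    total_out = sum(c.output_tokens for c in calls)
    return (total_in * usd_per_m_input + total_out * usd_per_m_output) / 1_000_000


def within_budget(cost_usd: float, budget_usd: float = SESSION_BUDGET_USD) -> bool:
    return cost_usd <= budget_usd
```

With placeholder prices of $1.00 per million input tokens and $2.00 per million output tokens, a session of two calls using 3,500 input and 500 output tokens in total would cost (3,500 × 1.00 + 500 × 2.00) / 1,000,000 = $0.0045. Input tokens dominate that sum, which is why retrieval tuning and prompt scoping are the natural places to cut.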

What I Learned

Multi-agent products live or die on three things: memory, retrieval, and cost discipline. Get those right and the agents can stay specialized without the product feeling fragmented.

Mentora reinforced what I already believed: in agentic AI, the system design is the user experience.