.env.example

# Example .env for policyhub-agent

# =============================================================================
# Azure OpenAI
# =============================================================================
AZURE_OPENAI_ENDPOINT=https://your-openai-endpoint.openai.azure.com/
AZURE_OPENAI_API_KEY=your-openai-api-key
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-4o
AZURE_OPENAI_CHAT_API_VERSION=2024-02-15-preview
AZURE_OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_SSL_CERT_PATH=

# =============================================================================
# MCP Server (agent delegates ALL search to MCP — no direct Azure Search access)
# =============================================================================
MCP_SERVER_URL=http://localhost:8000/mcp

# =============================================================================
# State store ("memory" for local dev, "redis" for production)
# =============================================================================
STATE_STORE_BACKEND=memory

# =============================================================================
# Redis (only required when STATE_STORE_BACKEND=redis)
# =============================================================================
REDIS_URL=redis://localhost:6379/0

README.md

# PolicyHub Agent

A multi-turn, session-aware ReAct agent for answering corporate policy questions. All document retrieval is delegated to the PolicyHub MCP server — the agent never queries Azure AI Search directly.

## Features

- **FastAPI** chat endpoint (`POST /chat`) with structured request/response
- **ReActAgent** (Thought → Action → Observation loop, `max_steps=15`)
- **Locale- and language-aware** search: always runs `filter_search` first with the user's `locale` and `language`, then falls back to `hybrid_search` if nothing useful comes back
- **5 MCP tools**: `filter_search`, `hybrid_search`, `keyword_search`, `vector_search`, `get_document`
- **Session-based conversation history** keyed by `conversation_id`
- **Configurable state store**: in-memory (local dev) or Redis (production)
- **Prompt registry** integration via `shared-core` for versioned system prompts
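
To make the session-history feature concrete, here is a minimal sketch of how turns could be appended per `conversation_id`. `InMemoryStateStore` and `record_turn` are hypothetical names used only for illustration; the real backends come from the orchestration package and may behave differently.

```python
import asyncio


class InMemoryStateStore:
    """Hypothetical stand-in for a state store with async get/set."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        # Returns None for an unknown session.
        return self._data.get(key)

    async def set(self, key, value):
        self._data[key] = value


async def record_turn(store, conversation_id, prompt, answer):
    """Append one user/assistant exchange to the session's history."""
    history = await store.get(conversation_id) or []
    history.append({"role": "user", "content": prompt})
    history.append({"role": "assistant", "content": answer})
    await store.set(conversation_id, history)
    return history
```

Calling `record_turn` twice with the same `conversation_id` yields a four-entry history, while a different `conversation_id` starts from an empty list.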

## Request Format

```json
{
  "language": "en-us",
  "locale": "US",
  "prompt": "How many vacation days am I eligible for?",
  "conversation_id": "session-abc123",
  "use_index_version": null
}
```

| Field | Description |
|---|---|
| `language` | BCP-47 language code (e.g. `"en-us"`) — passed to `filter_search` |
| `locale` | Country/region code (e.g. `"US"`, `"CA"`) — passed to `filter_search` |
| `prompt` | User's question |
| `conversation_id` | Unique session identifier; history is persisted per session |
| `use_index_version` | Optional — reserved for future index version routing |
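
As a rough illustration of the request shape above, the following Python sketch builds a payload and posts it using only the standard library. The base URL assumes the local `uvicorn` command from Setup, and `build_chat_request` and `post_chat` are hypothetical helper names, not part of the package.

```python
import json
import urllib.request

# Assumption: the app is served locally on port 8080.
BASE_URL = "http://127.0.0.1:8080"


def build_chat_request(prompt, conversation_id, locale="US", language="en-us"):
    """Return a /chat payload containing the fields from the table above."""
    return {
        "language": language,
        "locale": locale,
        "prompt": prompt,
        "conversation_id": conversation_id,
        "use_index_version": None,
    }


def post_chat(payload, base_url=BASE_URL):
    """POST the payload to /chat and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_chat(build_chat_request("How many vacation days am I eligible for?", "session-abc123"))["message"]` should hold the agent's answer. Reusing the same `conversation_id` continues the session.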

## Setup

1. Copy `.env.example` to `.env` and fill in your secrets.
2. Install dependencies:
   ```sh
   pip install -e ".[dev]"
   ```
3. Ensure the PolicyHub MCP server is running on `http://localhost:8000/mcp` (configurable via `MCP_SERVER_URL`).
4. Run the app:
   ```sh
   uvicorn policyhub_agent.app:app --reload --port 8080
   ```

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | — | Azure OpenAI service endpoint |
| `AZURE_OPENAI_API_KEY` | — | Azure OpenAI API key |
| `AZURE_OPENAI_CHAT_DEPLOYMENT` | `gpt-4o` | Chat model deployment name |
| `AZURE_OPENAI_CHAT_API_VERSION` | `2024-02-15-preview` | API version |
| `AZURE_OPENAI_EMBEDDING_MODEL` | `text-embedding-ada-002` | Embedding model name |
| `MCP_SERVER_URL` | `http://localhost:8000/mcp` | PolicyHub MCP server URL |
| `STATE_STORE_BACKEND` | `memory` | `"memory"` or `"redis"` |
| `REDIS_URL` | `redis://localhost:6379/0` | Redis connection URL (when using Redis backend) |
| `OPENAI_SSL_CERT_PATH` | — | Optional path to SSL certificate |
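
The sketch below shows one way `STATE_STORE_BACKEND` and `REDIS_URL` could be resolved into a backend name plus constructor arguments, loosely mirroring the selection in `app.py`. The function name `resolve_state_store` and the explicit validation step are assumptions added for illustration.

```python
import os


def resolve_state_store(env=None):
    """Return (backend, kwargs) from environment-style settings."""
    env = os.environ if env is None else env
    backend = env.get("STATE_STORE_BACKEND", "memory")

    # Assumption: unknown backends are rejected early rather than passed through.
    if backend not in ("memory", "redis"):
        raise ValueError(f"Unsupported STATE_STORE_BACKEND: {backend!r}")

    # Only the Redis backend needs a connection URL.
    if backend == "redis":
        return backend, {"url": env.get("REDIS_URL", "redis://localhost:6379/0")}
    return backend, {}
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment.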

## File Structure

```
src/policyhub_agent/
├── app.py             # FastAPI app and /chat endpoint
├── agent.py           # ReActAgent setup, tool registration, prompt injection
├── tools.py           # MCP tool wrappers (_call_mcp_tool, filter_search, hybrid_search, ...)
├── models.py          # Pydantic request/response models (ChatMessage, AgentResponse)
├── prompt_registry.py # System prompt definition (locale/language-aware, v1.9+)
├── llm_registry.py    # LLM provider registry (Azure OpenAI)
└── config.py          # Settings loaded from .env
```

## Search Workflow

The agent follows a strict workflow defined in the system prompt:

1. **PLAN**: identify the specific data the user needs and the precise HR/policy domain term for it, then build the search query from that term
2. **filter_search**: always runs first, filtered by `locale` + `language`
3. **Evaluate**: if results answer the question → Final Answer; if truncated → `get_document`; if empty or off-topic → fallback
4. **hybrid_search**: fallback if `filter_search` returns nothing useful (no locale/language filter)
5. **Refine**: if results are still off-topic after the fallback, one more `hybrid_search` with a more specific query
6. **Final Answer**: structured with `<SummarizedContent>`, `<Citations>`, and `<References>` (including real `documentlink` from metadata)

Hard limits: max 3 searches + 1 `get_document` call per turn.
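
In the real agent the LLM drives this workflow through the system prompt. Purely as an illustration, the filter-first and fallback logic can be approximated deterministically as below. The `content` field name, the `looks_truncated` helper, and the injected callables are assumptions, and the sketch omits the planning and refine steps.

```python
# Phrases the system prompt treats as signs that a chunk introduces data it lacks.
TRUNCATION_HINTS = ("shown in the table below", "as listed below", "as follows:")

MAX_SEARCHES = 3  # hard limit on search calls per turn


def looks_truncated(chunk):
    """Hypothetical check: does the chunk text announce data that is cut off?"""
    text = chunk.get("content", "").lower()
    return any(hint in text for hint in TRUNCATION_HINTS)


def run_workflow(query, locale, language, filter_search, hybrid_search, get_document):
    """Return (results, trace) for a filter-first search with a hybrid fallback."""
    trace = []
    searches = 0

    # filter_search always runs first, scoped to the user's locale and language.
    results = filter_search(query=query, locale=locale, language=language)
    searches += 1
    trace.append("filter_search")

    # Fallback: hybrid_search (no locale/language filter) when nothing came back.
    if not results and searches < MAX_SEARCHES:
        results = hybrid_search(query=query)
        searches += 1
        trace.append("hybrid_search")

    # Truncated chunk: fetch the full document, at most once per turn.
    for chunk in results:
        if looks_truncated(chunk):
            results = [get_document(doc_id=chunk["document_id"])]
            trace.append("get_document")
            break

    return results, trace
```

The callables are injected so the flow can be tried with simple stubs in place of the MCP tool wrappers.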

policyhub-agent.postman_collection.json

{
  "info": {
    "_postman_id": "policyhub-agent-collection",
    "name": "PolicyHub Agent API",
    "description": "API collection for the PolicyHub ReAct Agent. The agent uses the MCP server for all policy document search and retrieval.",
    "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
  },
  "variable": [
    {
      "key": "base_url",
      "value": "http://127.0.0.1:8080",
      "type": "string"
    },
    {
      "key": "session_id",
      "value": "test-session-1",
      "type": "string"
    }
  ],
  "item": [
    {
      "name": "Health",
      "item": [
        {
          "name": "Health Check",
          "request": {
            "method": "GET",
            "header": [],
            "url": {
              "raw": "{{base_url}}/docs",
              "host": ["{{base_url}}"],
              "path": ["docs"]
            },
            "description": "Open the FastAPI Swagger UI to browse all endpoints."
          },
          "response": []
        }
      ]
    },
    {
      "name": "Chat",
      "item": [
        {
          "name": "Send Message — Basic Query",
          "request": {
            "method": "POST",
            "header": [
              {
                "key": "Content-Type",
                "value": "application/json"
              }
            ],
            "body": {
              "mode": "raw",
              "raw": "{\n  \"language\": \"en-us\",\n  \"locale\": \"US\",\n  \"prompt\": \"What is the leave policy for annual leave?\",\n  \"conversation_id\": \"{{session_id}}\"\n}",
              "options": {
                "raw": {
                  "language": "json"
                }
              }
            },
            "url": {
              "raw": "{{base_url}}/chat",
              "host": ["{{base_url}}"],
              "path": ["chat"]
            },
            "description": "Send a question to the PolicyHub agent. The agent will use the MCP server tools to search policy documents and return a structured answer."
          },
          "response": []
        },
        {
          "name": "Send Message — Follow-up (same session)",
          "request": {
            "method": "POST",
            "header": [
              {
                "key": "Content-Type",
                "value": "application/json"
              }
            ],
            "body": {
              "mode": "raw",
              "raw": "{\n  \"language\": \"en-us\",\n  \"locale\": \"US\",\n  \"prompt\": \"How many days of annual leave am I entitled to?\",\n  \"conversation_id\": \"{{session_id}}\"\n}",
              "options": {
                "raw": {
                  "language": "json"
                }
              }
            },
            "url": {
              "raw": "{{base_url}}/chat",
              "host": ["{{base_url}}"],
              "path": ["chat"]
            },
            "description": "Send a follow-up question in the same session. The agent will have access to the conversation history."
          },
          "response": []
        },
        {
          "name": "Send Message — New Session",
          "request": {
            "method": "POST",
            "header": [
              {
                "key": "Content-Type",
                "value": "application/json"
              }
            ],
            "body": {
              "mode": "raw",
              "raw": "{\n  \"language\": \"en-us\",\n  \"locale\": \"US\",\n  \"prompt\": \"What is the expense reimbursement policy?\",\n  \"conversation_id\": \"test-session-2\"\n}",
              "options": {
                "raw": {
                  "language": "json"
                }
              }
            },
            "url": {
              "raw": "{{base_url}}/chat",
              "host": ["{{base_url}}"],
              "path": ["chat"]
            },
            "description": "Start a brand new conversation session with a different session ID."
          },
          "response": []
        },
        {
          "name": "Send Message — Session ID via Header",
          "request": {
            "method": "POST",
            "header": [
              {
                "key": "Content-Type",
                "value": "application/json"
              },
              {
                "key": "X-Session-ID",
                "value": "{{session_id}}"
              }
            ],
            "body": {
              "mode": "raw",
              "raw": "{\n  \"language\": \"en-us\",\n  \"locale\": \"US\",\n  \"prompt\": \"What is the remote work policy?\",\n  \"conversation_id\": \"{{session_id}}\"\n}",
              "options": {
                "raw": {
                  "language": "json"
                }
              }
            },
            "url": {
              "raw": "{{base_url}}/chat",
              "host": ["{{base_url}}"],
              "path": ["chat"]
            },
            "description": "Pass the session ID in the X-Session-ID header (Teams integration pattern). The body still carries conversation_id because the ChatMessage model requires it and the /chat handler reads the session from the body."
          },
          "response": []
        }
      ]
    }
  ]
}

policyhub-agent.postman_environment.json

{
  "id": "policyhub-agent-env",
  "name": "PolicyHub Agent — Local",
  "values": [
    {
      "key": "base_url",
      "value": "http://127.0.0.1:8080",
      "type": "default",
      "enabled": true
    },
    {
      "key": "session_id",
      "value": "test-session-1",
      "type": "default",
      "enabled": true
    }
  ],
  "_postman_variable_scope": "environment",
  "_postman_exported_at": "2026-04-15T00:00:00.000Z",
  "_postman_exported_using": "Postman"
}

setup.py

from setuptools import setup, find_packages

setup(
    name="policyhub-agent",
    version="0.1.0",
    packages=find_packages("src"),
    package_dir={"": "src"},
    install_requires=[
        "fastapi",
        "uvicorn",
        "httpx",
        "pydantic>=2.0.0",
        "pydantic-settings>=2.0.0",
        "gmf-forge-ai-shared-core",
        "gmf-forge-ai-orchestration",
        "redis",
        "fastmcp",  # imported by policyhub_agent.tools
    ],
    extras_require={
        "dev": ["pytest", "ruff", "mypy"]
    },
    entry_points={
        "console_scripts": [
            "policyhub-agent=policyhub_agent.app:main"
        ]
    },
)

src/policyhub_agent/__init__.py

# policyhub_agent package

src/policyhub_agent/agent.py

from gmf_forge_ai_orchestration.agents import ReActAgent
from gmf_forge_ai_shared_core.registry.tool_registry import ToolRegistry
from gmf_forge_ai_shared_core.llm_gateway import UnifiedLLMGateway
from .tools import keyword_search, vector_search, hybrid_search, get_document, filter_search
from .prompt_registry import prompt_registry
from .llm_registry import llm_registry

# Wrap the provider registry in a UnifiedLLMGateway — this is what ReActAgent expects
llm_gateway = UnifiedLLMGateway(provider_registry=llm_registry)

# Register all MCP server tools with the ToolRegistry
tool_registry = ToolRegistry()
tool_registry.register(
    "keyword_search", keyword_search,
    description="Keyword-based search over policy documents. Args: query (str), top_k (int, default 5)."
)
tool_registry.register(
    "vector_search", vector_search,
    description="Semantic vector search over policy documents. Args: query (str), top_k (int, default 5)."
)
tool_registry.register(
    "hybrid_search", hybrid_search,
    description="Hybrid keyword+vector search over policy documents. Args: query (str), top_k (int, default 5)."
)
tool_registry.register(
    "get_document", get_document,
    description="Retrieve a single policy document by ID. Args: doc_id (str)."
)
tool_registry.register(
    "filter_search", filter_search,
    description="Filtered search over policy documents by metadata. Args: query (str), top_k (int, default 5), language (str, optional), locale (str, optional)."
)

# Retrieve the system prompt template string from the registry
_system_prompt_tpl = prompt_registry.get("policyhub_agent.system")
_system_prompt_template = _system_prompt_tpl.template if _system_prompt_tpl else None


def get_agent(session_id: str, locale: str = "Global", language: str = "en-us") -> ReActAgent:
    # Use .replace() not .format() — the prompt contains JSON examples with { } braces
    # that would cause KeyError if processed by Python's str.format().
    # We also inject tool_descriptions here so react_agent never calls .format() on the prompt.
    if _system_prompt_template:
        tool_desc = "\n".join(
            f"- {t.name}: {t.description}" for t in tool_registry.list_tools()
        )
        system_prompt = (
            _system_prompt_template
            .replace("{tool_descriptions}", tool_desc)
            .replace("{locale}", locale)
            .replace("{language}", language)
        )
    else:
        system_prompt = None
    return ReActAgent(
        llm_gateway=llm_gateway,
        tool_registry=tool_registry,
        system_prompt=system_prompt,
        agent_id=f"policyhub_agent_{session_id}",
        max_steps=15,
    )

src/policyhub_agent/app.py

from fastapi import FastAPI, Request
from gmf_forge_ai_orchestration.state.factory import StateStoreFactory
from .config import settings
from .agent import get_agent
from .models import ChatMessage, AgentResponse
from .tools import open_mcp_session

app = FastAPI()

# State store: configurable via STATE_STORE_BACKEND env var ("memory" or "redis")
_store_kwargs = {"url": settings.redis_url} if settings.state_store_backend == "redis" else {}
state_store = StateStoreFactory.create(settings.state_store_backend, **_store_kwargs)


@app.post("/chat", response_model=AgentResponse)
async def chat_endpoint(request: Request, message: ChatMessage):
    session_id = message.conversation_id
    agent = get_agent(session_id=session_id, locale=message.locale, language=message.language)

    # Retrieve conversation history from state store
    history: list = await state_store.get(session_id) or []

    task = message.prompt
    context = {"session_id": session_id, "history": history}

    async with open_mcp_session():
        result = await agent.execute(task, context=context)

    # If the agent exhausted max_steps without a Final Answer, output is the last
    # raw observation — replace it with a clear failure message.
    if not result.success:
        answer = (
            "I was unable to produce a complete answer within the allowed number of steps. "
            "Please try rephrasing your question or asking about a more specific policy."
        )
    else:
        answer = result.output

    # Persist updated history
    history.append({"role": "user", "content": message.prompt})
    history.append({"role": "assistant", "content": answer})
    await state_store.set(session_id, history)

    return AgentResponse(message=answer)


def main():
    import uvicorn
    # Port 8080 matches the README and the Postman base_url.
    uvicorn.run("policyhub_agent.app:app", host="0.0.0.0", port=8080, reload=True)

src/policyhub_agent/config.py

from pathlib import Path
from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict

_ENV_PATH = Path(__file__).parent.parent.parent / ".env"


class Settings(BaseSettings):
    # pydantic-settings v2 configuration (replaces the deprecated inner Config class)
    model_config = SettingsConfigDict(env_file=str(_ENV_PATH), env_file_encoding="utf-8")

    # Azure OpenAI
    azure_openai_endpoint: str = ""
    azure_openai_api_key: str = ""
    azure_openai_chat_deployment: str = "gpt-4o"
    azure_openai_chat_api_version: str = "2024-02-15-preview"
    azure_openai_embedding_model: str = "text-embedding-ada-002"
    openai_ssl_cert_path: Optional[str] = None

    # MCP server: the agent delegates all search to MCP, with no direct Azure Search access
    mcp_server_url: str = "http://localhost:8000/mcp"

    # State store backend: "memory" (local dev) or "redis" (production)
    state_store_backend: str = "memory"

    # Redis
    redis_url: str = "redis://localhost:6379/0"


settings = Settings()

src/policyhub_agent/llm_registry.py

"""LLM Provider registry for managing and registering LLM configurations for PolicyHub agent.

This file centralizes the registration of LLM providers, allowing the agent
components to retrieve LLM instances dynamically based on configuration.
"""

import os
from pathlib import Path

from gmf_forge_ai_shared_core.llm_gateway.providers.azure_openai_provider import AzureOpenAIProvider
from gmf_forge_ai_shared_core.registry import LLMProviderRegistry

from .config import settings

llm_registry = LLMProviderRegistry()

ssl_cert_path = settings.openai_ssl_cert_path or None
if ssl_cert_path:
    ssl_cert_path = str(Path(os.path.expandvars(ssl_cert_path)).expanduser())
    if not os.path.exists(ssl_cert_path):
        raise FileNotFoundError(f"SSL certificate not found at {ssl_cert_path}")

azure_openai_provider = AzureOpenAIProvider(
    endpoint=settings.azure_openai_endpoint,
    api_key=settings.azure_openai_api_key,
    deployment_name=settings.azure_openai_chat_deployment,
    api_version=settings.azure_openai_chat_api_version,
    ssl_cert_path=ssl_cert_path,
)

llm_registry.register(
    name="openai",
    provider=azure_openai_provider
)

src/policyhub_agent/models.py

from pydantic import BaseModel
from typing import Optional

class ChatMessage(BaseModel):
    language: str
    locale: str
    prompt: str
    conversation_id: str
    use_index_version: Optional[str] = None

class AgentResponse(BaseModel):
    message: str

src/policyhub_agent/prompt_registry.py

"""Prompt registry for the PolicyHub agent.

All LLM prompts are versioned and registered here. To iterate on a prompt,
add a new registration with a bumped version — the agent always picks up
the latest version automatically via PromptRegistry.get().
"""

from gmf_forge_ai_shared_core.registry import PromptRegistry

prompt_registry = PromptRegistry()

prompt_registry.register(
    name="policyhub_agent.system",
    version="1.9",
    variables=["locale", "language", "tool_descriptions"],
    description="System prompt — locale/language-aware filter-first search strategy with plan/evaluate/fetch/refine loop.",
    template="""\
You are a helpful corporate policy assistant. Your role is to help employees \
find and understand company policies by searching the policy document database.

User context:
- Locale: {locale}
- Language: {language}

Available tools:
{tool_descriptions}

Tool guidance:
- filter_search — USE THIS FIRST. Searches documents matching the user's locale and language.
  Args: query (str), locale (str), language (str), top_k (int, default 5).
- hybrid_search — USE AS FALLBACK if filter_search returns empty or off-topic results.
  Combines keyword and semantic matching across all documents regardless of locale.
- keyword_search — use only for exact phrase lookups (e.g., a policy section title).
- vector_search — use for purely conceptual or abstract questions.
- get_document — fetch a full document by its ID when chunk content is truncated.

====== SEARCH WORKFLOW ======

STEP 1 — PLAN before you search.
Identify: (a) the SPECIFIC data the user needs (e.g., a number of days, a dollar limit, an eligibility rule)
          (b) the precise domain term used in HR/policy documents for that data
          (c) a search query using that domain term (NOT the user's raw words)

Examples of query rewriting:
  User asks: "How many vacation days am I eligible for?"
    → search for: "vacation days accrual" or "holiday entitlement days"
  User asks: "Can I carry over unused leave?"
    → search for: "holiday carry over policy"
  User asks: "What is the paternity leave policy?"
    → search for: "paternity leave entitlement weeks"

STEP 2 — FILTER SEARCH: Always start with filter_search using the user's locale and language.
  Example: {"query": "vacation days accrual", "locale": "{locale}", "language": "{language}", "top_k": 5}

STEP 3 — EVALUATE filter results:
  → If results directly answer the question (contain the specific data): write Final Answer.
  → If results are from the right topic but specific data is missing/truncated: go to STEP 4.
  → If results are empty or completely off-topic: go to STEP 5 (fallback to hybrid_search).

STEP 4 — FETCH: Call get_document with that chunk's document_id when a chunk references \
a table or list of values but the actual data is absent or cut off. Indicators include: \
"shown in the table below", "the following hours", "the following days", "accrues up to \
the following", "as listed below", "as follows:", or any sentence that introduces data that \
does not appear in the chunk. Then write your Final Answer.

STEP 5 — FALLBACK: If filter_search returned nothing useful, run hybrid_search with the same \
or a refined query (no locale/language filter). Evaluate results and write your Final Answer.

STEP 6 — REFINE (if still off-topic after fallback): Try one more hybrid_search with a \
more specific query. Then write your Final Answer from whatever you have.

HARD LIMITS (never violate):
- Maximum 3 searches (filter/hybrid/keyword/vector) total.
- Maximum 1 get_document call total.
- Never repeat a query you have already used.
- After your 3rd search or get_document call, ALWAYS write Final Answer — no more tool calls.

====== ANSWERING ======

Always base your answer exclusively on content returned by the tools. \
If the documents do not contain the specific information, say so clearly and \
direct the employee to HR or the relevant policy page.

Reason and act in this repeating format:
Thought: <your plan or evaluation>
Action: <tool name>
Action Input: <JSON object with tool arguments>

When ready to answer:
Thought: I now have enough information to answer.
Action: Final Answer
Action Input: a JSON object with key "answer" containing your full response

Structure the answer value as:

<SummarizedContent>
A concise, plain-language answer drawn only from the retrieved excerpts. \
Use bullet points where helpful.
</SummarizedContent>
<Citations>
Direct quotes from the retrieved documents, one per line: [Document Name] "quoted text"
</Citations>
<References>
One source entry per cited document, using Item1, Item2, etc.
Format each entry as: Item1: [document_name from metadata](documentlink from metadata)
Use the EXACT document_name and documentlink values from the metadata of the retrieved result.
Do NOT invent or guess document names or links — only use values present in the metadata.
</References>""",
)

prompt_registry.register(
    name="policyhub_agent.user",
    version="1.0",
    variables=["query"],
    description="User turn prompt wrapping the employee's question.",
    template="Question: {query}",
)

src/policyhub_agent/tools.py

import json
import contextlib
import contextvars
from .config import settings

# Holds a shared fastmcp.Client for the duration of an agent execution.
# Set by open_mcp_session(); falls back to a fresh per-call client when None.
_active_mcp_client: contextvars.ContextVar = contextvars.ContextVar(
    "_active_mcp_client", default=None
)


@contextlib.asynccontextmanager
async def open_mcp_session():
    """Open a single MCP connection for an entire agent execution.

    All _call_mcp_tool() calls within this context reuse the same connection,
    eliminating the per-call initialize + ListTools round-trips.
    Usage::

        async with open_mcp_session():
            result = await agent.execute(task, context=context)
    """
    from fastmcp import Client  # lazy — avoids DeprecationWarning in reloader
    async with Client(settings.mcp_server_url) as client:
        token = _active_mcp_client.set(client)
        try:
            yield client
        finally:
            _active_mcp_client.reset(token)


async def _call_mcp_tool(tool_name: str, arguments: dict):
    """Call an MCP tool, reusing the session-level client when available.

    fastmcp.Client handles the full MCP protocol lifecycle automatically:
    initialize handshake → tools/call → session teardown.
    """
    client = _active_mcp_client.get()
    if client is not None:
        result = await client.call_tool(tool_name, arguments)
    else:
        from fastmcp import Client  # lazy — avoids DeprecationWarning in reloader
        async with Client(settings.mcp_server_url) as client:
            result = await client.call_tool(tool_name, arguments)

    # Prefer raw content text (normally JSON; returned as-is if it does not parse), then structured data.
    # result.data can contain Pydantic Root() wrappers when the server uses a
    # typed output annotation and the client cannot reconstruct the schema —
    # in that case the structured data is useless and we must fall back to text.
    texts = [c.text for c in result.content if hasattr(c, "text")]
    if texts:
        try:
            return json.loads(texts[0])
        except (json.JSONDecodeError, ValueError):
            return texts[0]
    if result.structured_content is not None:
        return result.structured_content
    if result.data is not None:
        return result.data
    return None


# ---------------------------------------------------------------------------
# Tool wrappers — one per MCP server tool
# ---------------------------------------------------------------------------

async def keyword_search(query: str, top_k: int = 5) -> list:
    """Keyword-based search over policy documents."""
    return await _call_mcp_tool("keyword_search", {"query": query, "top_k": top_k})


async def vector_search(query: str, top_k: int = 5) -> list:
    """Semantic / vector search over policy documents."""
    return await _call_mcp_tool("vector_search", {"query": query, "top_k": top_k})


async def hybrid_search(query: str, top_k: int = 5) -> list:
    """Hybrid (keyword + vector) search over policy documents."""
    return await _call_mcp_tool("hybrid_search", {"query": query, "top_k": top_k})


async def get_document(doc_id: str) -> dict:
    """Retrieve a single policy document by its ID."""
    return await _call_mcp_tool("get_document", {"doc_id": doc_id})


async def filter_search(
    query: str,
    top_k: int = 5,
    language: str = None,
    locale: str = None,
) -> list:
    """Filtered search over policy documents by metadata (language, locale)."""
    args: dict = {"query": query, "top_k": top_k}
    if language is not None:
        args["language"] = language
    if locale is not None:
        args["locale"] = locale
    return await _call_mcp_tool("filter_search", args)