Updated 2026-06-01

Hybrid Infrastructure

How Reflect Memory manages hybrid cloud + on-prem deployments, model hosting, egress controls, and internal model endpoints.

MD PDF

Doc-specific AI prompt

Summarize the "Hybrid Infrastructure" document from the Reflect Memory diligence bundle. Highlight any deployment or security impacts mentioned, note follow-up questions, and cite where to go next. Use anonymized descriptors for traction references — no company names or unverified pipeline claims. Production runs on self-managed infrastructure (not Railway or Vercel).

Hybrid Infrastructure

Questions this doc answers

How do you mix hosted connectors with private on-prem models?
What happens when one team needs cloud AI (ChatGPT/Claude) and another uses internal models?
How do you keep AI egress under control across those boundaries?

Shared memory across hybrid stacks

Reflect Memory abstracts the infrastructure tier from the AI tier. A hosted deployment can still connect to your on-prem models via RM_ALLOWED_MODEL_HOSTS, while self-host deployments can still serve ChatGPT/Claude via MCP if agent keys are available.

disableModelEgress (default true in self-host) prevents outbound LLM calls unless explicit.
requireInternalModelBaseUrl ensures every hosted model call resolves to allowedModelHosts.
allowedModelHosts also gates scheduled webhooks and automation triggers that otherwise would reach public endpoints.

Model connectivity patterns

Cloud-first + self-host complement: Use hosted API for general reads/writes, and self-hosted connectors for highly regulated data. Both sides read/write the same memory store, so context replicates naturally.
On-prem model bridging: Install Reflect Memory inside your VPC. Set RM_ALLOWED_MODEL_HOSTS=llama.local,llama.next, RM_REQUIRE_INTERNAL_MODEL_BASE_URL=true, and RM_DISABLE_MODEL_EGRESS=true. Your LLMs call the local REST API; the same environment also accepts MCP keys from corporate-approved tools.
Hybrid autop-run: Use the MCP server to let GPUs in your datacenter talk to Reflect Memory. Agents like Claude or Grok can reach internal memory via the same RM_PUBLIC_URL referencing your reverse proxy.

Connectivity guardrails

enforceModelHostPolicy(baseUrl) compares URLs to RM_ALLOWED_MODEL_HOSTS.
Private deployments still support optional RM_ALLOW_PUBLIC_WEBHOOKS for integration with approved partners.
Telemetry and billing events track cross-boundary usage so you can see if a self-hosted team is hitting cloud connectors or vice versa.