
Content Indexing Pipeline

How publisher content is indexed, stored, and refreshed.

Indexing Flow

Publisher RSS/Atom feed
  → POST /introspect (validate + preview)
  → POST /servers (create MCP server record with upstreamType: "content")
  → POST /index (fetch feed → parse → store in DynamoDB)
  → Content live at {slug}.mcp.xpay.sh
  → Every 6 hours: contentRefresher detects new articles

DynamoDB Schema

Table: xpay-content-server-index-{stage}

| Key | Type | Description |
|---|---|---|
| serverId (PK) | String | The MCP server ID this content belongs to |
| articleId (SK) | String | SHA-256 hash of the article URL (first 16 characters) |
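A minimal sketch of how such a sort key might be derived, assuming the URL is hashed as-is (UTF-8, no normalization) and the digest is hex-encoded before truncation. `article_id_for` is a hypothetical helper name, not part of the xpay codebase.

```python
import hashlib


def article_id_for(url: str) -> str:
    """Hypothetical helper: first 16 hex characters of SHA-256(url)."""
    # Assumption: the raw URL is UTF-8 encoded without normalization,
    # and the digest is hex-encoded before truncating to 16 characters.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
```

Sixteen hex characters keep 64 bits of the digest, so collisions between articles of the same server are very unlikely. Because the key depends only on the URL, the same article republished under a different URL would receive a new ID.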

Attributes

| Field | Type | Description |
|---|---|---|
| url | String | Original article URL |
| title | String | Article title |
| description | String | Plain-text description (max 500 characters) |
| content | String | Full article as markdown (max 50 KB) |
| publishedAt | Number | Unix timestamp (ms) of publish date |
| author | String | Author name |
| categories | List | Category tags from feed |
| lastIndexed | Number | Timestamp of when this article was last indexed |
| source | String | "rss" or "sitemap" |
| titleLower | String | Lowercase title for keyword search |
| descriptionLower | String | Lowercase description for keyword search |
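The attribute limits and derived lowercase fields could be applied when building each item roughly as follows. This is a hypothetical sketch: the function names are illustrative, "50 KB" is read as 50 KiB of UTF-8 bytes, and `lastIndexed` is assumed to use Unix milliseconds like `publishedAt`.

```python
import time
from typing import Optional, Sequence

MAX_DESCRIPTION_CHARS = 500
MAX_CONTENT_BYTES = 50 * 1024  # assumption: "50 KB" = 50 KiB of UTF-8 bytes


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def build_article_item(
    server_id: str,
    article_id: str,
    url: str,
    title: str,
    description: str,
    content_md: str,
    published_at_ms: int,
    author: str,
    categories: Sequence[str],
    source: str,
    now_ms: Optional[int] = None,
) -> dict:
    """Hypothetical builder for one row of the content index table."""
    if source not in ("rss", "sitemap"):
        raise ValueError(f"unexpected source: {source!r}")
    description = description[:MAX_DESCRIPTION_CHARS]
    return {
        "serverId": server_id,    # partition key
        "articleId": article_id,  # sort key
        "url": url,
        "title": title,
        "description": description,
        "content": truncate_utf8(content_md, MAX_CONTENT_BYTES),
        "publishedAt": published_at_ms,
        "author": author,
        "categories": list(categories),
        # assumption: lastIndexed uses Unix ms, like publishedAt
        "lastIndexed": now_ms if now_ms is not None else int(time.time() * 1000),
        "source": source,
        # precomputed lowercase copies for case-insensitive keyword search
        "titleLower": title.lower(),
        "descriptionLower": description.lower(),
    }
```

Capping `content` at 50 KB keeps each item far below DynamoDB's 400 KB item-size limit.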

GSI: PublishedAtIndex

| Key | Type | Description |
|---|---|---|
| serverId (PK) | String | Partition by server |
| publishedAt (SK) | Number | Sort by publish date |

Used by the list-recent tool to query a server's articles sorted by publish date.
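A sketch of the parameters such a query might use with the low-level DynamoDB Query API. `list_recent_query` and its `limit` default are hypothetical; the table, index, and key names come from the schema above. Setting `ScanIndexForward` to `False` returns the newest `publishedAt` values first.

```python
def list_recent_query(server_id: str, stage: str, limit: int = 10) -> dict:
    """Hypothetical: build Query parameters for newest-first articles."""
    return {
        "TableName": f"xpay-content-server-index-{stage}",
        "IndexName": "PublishedAtIndex",
        # Only the partition key is constrained; the GSI sort key orders results
        "KeyConditionExpression": "serverId = :sid",
        "ExpressionAttributeValues": {":sid": {"S": server_id}},
        "ScanIndexForward": False,  # descending publishedAt = newest first
        "Limit": limit,
    }
```

The dictionary can be passed as keyword arguments to boto3's `client("dynamodb").query(...)`; the low-level client returns items in typed attribute format (for example `{"S": "..."}`).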

Feed Refresh (Scheduled)

The contentRefresher Lambda runs every 6 hours:

  1. Scans xpay-mcp-proxy-servers for servers with upstreamType: "content" and status: "active"
  2. Fetches each server’s feedUrl
  3. Parses the feed and compares article URLs against existing index
  4. Indexes only new articles (URL hash not in existing set)
  5. Updates the server record with lastRefreshed timestamp and incremented articleCount

The refresher only adds new articles; it does not update or delete existing ones. Edited articles keep the content captured when they were first indexed, and articles removed from the feed remain in the index until they expire via TTL (if configured).

Performance

  • Each server refresh takes 1–3 seconds (feed fetch + DynamoDB writes)
  • Lambda timeout: 300 seconds (about 100 servers per run at the worst-case 3 seconds each)
  • At scale (1000+ servers), consider partitioning refreshes across multiple scheduled invocations
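The ~100-server estimate follows from dividing the timeout by the worst-case per-server time. The hypothetical helpers below sketch that arithmetic and one way to split servers across scheduled invocations (stable CRC32 sharding, an assumption rather than the documented design).

```python
import math
import zlib


def plan_invocations(num_servers: int, timeout_s: float = 300,
                     worst_case_s: float = 3) -> tuple[int, int]:
    """Return (servers per invocation, invocations needed) in the worst case."""
    per_invocation = math.floor(timeout_s / worst_case_s)
    return per_invocation, math.ceil(num_servers / per_invocation)


def shard_for(server_id: str, num_shards: int) -> int:
    """Stable shard assignment; crc32 is deterministic across runs, unlike hash()."""
    return zlib.crc32(server_id.encode("utf-8")) % num_shards
```

`plan_invocations(1000)` gives 100 servers per invocation and 10 invocations. Hash sharding only balances load on average, so a real deployment would want headroom, such as more shards than the minimum.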

HTML to Markdown Conversion

Article content is converted from HTML to markdown during indexing:

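| HTML element | Markdown output |
|---|---|
| `<h1>` to `<h6>` | `#` to `######` |
| `<strong>`, `<b>` | `**bold**` |
| `<em>`, `<i>` | `*italic*` |
| `<a href="...">` | `[text](url)` |
| `<img>` | `![alt](src)` |
| `<li>` | `- item` |
| `<blockquote>` | `> quote` |
| `<code>` | `` `code` `` |
| `<pre>` | Fenced code block (triple backticks) |
| `<p>` | Paragraph separated by a blank line |
| `<script>`, `<style>`, `<nav>`, `<header>`, `<footer>` | Removed entirely |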

HTML entities (&amp;, &lt;, &gt;, &quot;, &#39;, &nbsp;) are decoded.
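The mapping above can be approximated with Python's standard-library `HTMLParser`. This is a simplified sketch, not the production converter: nested lists, multi-paragraph blockquotes, and tables are not handled, whitespace handling is deliberately basic, and turning `&nbsp;` into a plain space is an assumption.

```python
import re
from html.parser import HTMLParser

HEADINGS = {f"h{n}": n for n in range(1, 7)}
INLINE_MARKS = {"strong": "**", "b": "**", "em": "*", "i": "*"}
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}  # dropped with contents


class MarkdownConverter(HTMLParser):
    def __init__(self) -> None:
        # convert_charrefs=True: entities such as &amp; and &#39; arrive decoded
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0         # > 0 while inside a removed element
        self.hrefs: list[str] = []  # pending link targets for open <a> tags
        self.in_pre = False         # preserve whitespace inside <pre>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.out.append("\n\n" + "#" * HEADINGS[tag] + " ")
        elif tag in INLINE_MARKS:
            self.out.append(INLINE_MARKS[tag])
        elif tag == "a":
            self.hrefs.append(attr.get("href") or "")
            self.out.append("[")
        elif tag == "img":
            self.out.append(f"![{attr.get('alt') or ''}]({attr.get('src') or ''})")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "blockquote":
            self.out.append("\n\n> ")
        elif tag == "pre":
            self.in_pre = True
            self.out.append("\n\n```\n")
        elif tag == "code" and not self.in_pre:
            self.out.append("`")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in INLINE_MARKS:
            self.out.append(INLINE_MARKS[tag])
        elif tag == "a":
            self.out.append(f"]({self.hrefs.pop() if self.hrefs else ''})")
        elif tag == "pre":
            self.in_pre = False
            if self.out and not self.out[-1].endswith("\n"):
                self.out.append("\n")
            self.out.append("```\n\n")
        elif tag == "code" and not self.in_pre:
            self.out.append("`")
        elif tag in HEADINGS or tag in ("p", "blockquote"):
            self.out.append("\n\n")

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Outside <pre>, collapse whitespace runs (including U+00A0 from &nbsp;)
        self.out.append(data if self.in_pre else re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownConverter()
    parser.feed(html)
    parser.close()
    # Squash the blank-line runs produced by adjacent block elements
    return re.sub(r"\n{3,}", "\n\n", "".join(parser.out)).strip()
```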

Server Record Fields

When a publisher server is created, these additional fields are set on the xpay-mcp-proxy-servers record:

| Field | Type | Description |
|---|---|---|
| upstreamType | String | "content": tells the MCP Proxy to route to the content server |
| feedUrl | String | The RSS/sitemap URL |
| feedType | String | "rss2.0", "atom", "rss1.0", or "sitemap" |
| articleCount | Number | Total indexed articles |
| lastRefreshed | Number | Timestamp of last successful refresh |
| category | String | "Content & Knowledge" |
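Each `feedType` value corresponds to a distinct XML root element, so detection could be sketched as below. `detect_feed_type` is a hypothetical helper; the root-element checks follow the RSS 2.0, RSS 1.0 (RDF), Atom, and sitemaps.org formats, but where and how the real pipeline assigns `feedType` is not documented here.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def detect_feed_type(xml_text: str) -> str:
    """Hypothetical: map a feed document's root element to a feedType value."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss" and root.get("version") == "2.0":
        return "rss2.0"
    if root.tag == f"{ATOM_NS}feed":
        return "atom"
    if root.tag == f"{RDF_NS}RDF":  # RSS 1.0 documents use an rdf:RDF root
        return "rss1.0"
    if root.tag == f"{SITEMAP_NS}urlset":
        return "sitemap"
    raise ValueError(f"unrecognized feed root element: {root.tag}")
```

Feeds are untrusted input, so a hardened parser (for example `defusedxml`) may be preferable in practice.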