Content Indexing Pipeline

How publisher content is indexed, stored, and refreshed.

Indexing Flow

  1. Publisher RSS/Atom feed
  2. POST /introspect (validate + preview)
  3. POST /servers (create MCP server record with upstreamType: "content")
  4. POST /index (fetch feed → parse → store in DynamoDB)
  5. Content live at {slug}.mcp.xpay.sh
  6. Every 6 hours: contentRefresher detects new articles

What Gets Indexed

Each article stored in the index includes:

| Field | Description |
| --- | --- |
| URL | Original article URL on your site |
| Title | Article headline |
| Description | Plain text excerpt (up to 500 characters) |
| Content | Full article as clean markdown (up to 50KB) |
| Published date | When the article was originally published |
| Author | Byline/author name |
| Categories | Topic tags from your feed |
| Source | Whether it came from RSS or sitemap |

Articles are deduplicated by URL, so re-indexing the same feed won't create duplicates.
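The field limits and URL-based deduplication can be sketched as follows. This is a minimal illustration, not the actual indexer: the `Article` class, `truncate_utf8`, and `add_to_index` are hypothetical names, the in-memory dict stands in for the DynamoDB index, and "50KB" is assumed to mean 50 × 1024 bytes of UTF-8.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_DESCRIPTION_CHARS = 500    # "up to 500 characters"
MAX_CONTENT_BYTES = 50 * 1024  # "up to 50KB" (assumed: 50 * 1024 UTF-8 bytes)


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing, partially cut multi-byte character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


@dataclass
class Article:
    """Hypothetical record mirroring the fields in the table above."""
    url: str
    title: str
    description: str
    content: str
    published: str = ""
    author: str = ""
    categories: list[str] = field(default_factory=list)
    source: str = "rss"  # "rss" or "sitemap"

    def __post_init__(self) -> None:
        # Enforce the documented size limits at construction time.
        self.description = self.description[:MAX_DESCRIPTION_CHARS]
        self.content = truncate_utf8(self.content, MAX_CONTENT_BYTES)


def add_to_index(index: dict[str, Article], article: Article) -> bool:
    """Store the article keyed by URL; return False if the URL is already indexed."""
    if article.url in index:
        return False
    index[article.url] = article
    return True
```

Calling `add_to_index` twice with the same article leaves a single entry in the index, which is the behavior the deduplication rule describes.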

Feed Refresh (Scheduled)

The contentRefresher Lambda runs every 6 hours:

  1. Scans xpay-mcp-proxy-servers for servers with upstreamType: "content" and status: "active"
  2. Fetches each server’s feedUrl
  3. Parses the feed and compares article URLs against existing index
  4. Indexes only new articles (URL hash not in existing set)
  5. Updates the server record with lastRefreshed timestamp and incremented articleCount

The refresher only adds new articles; it does not update or delete existing ones. Articles removed from the publisher's feed or site remain in the index until they expire via TTL (if configured).
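The add-only behavior of a single refresh can be sketched like this. It is an illustration under assumptions, not the Lambda's code: `refresh_server` and `url_hash` are hypothetical helpers, SHA-256 is an assumed hash function, feed fetching and DynamoDB writes are left out, and `lastRefreshed` is assumed to be epoch seconds.

```python
import hashlib
import time


def url_hash(url: str) -> str:
    """Assumed hashing scheme: SHA-256 hex digest of the article URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def refresh_server(server: dict, feed_articles: list, indexed_hashes: set) -> list:
    """Return only the articles whose URL hash is not yet indexed.

    Existing articles are never updated or removed. The server record is
    updated in place with lastRefreshed and an incremented articleCount.
    """
    new_articles = []
    for article in feed_articles:
        h = url_hash(article["url"])
        if h not in indexed_hashes:
            indexed_hashes.add(h)  # also guards against duplicates within one feed
            new_articles.append(article)

    server["lastRefreshed"] = int(time.time())  # assumed unit: epoch seconds
    server["articleCount"] = server.get("articleCount", 0) + len(new_articles)
    return new_articles
```

Running the function twice against an unchanged feed returns an empty list the second time, so `articleCount` only grows when the feed contains URLs that have not been seen before.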

Performance

  • Each server refresh takes 1–3 seconds (feed fetch + DynamoDB writes)
  • Lambda timeout: 300 seconds (enough for roughly 100 servers per run at up to 3 seconds each)
  • At scale (1000+ servers), consider partitioning refreshes across multiple scheduled invocations
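The capacity figure and the partitioning suggestion follow from simple arithmetic, sketched below. The sketch assumes servers are refreshed one after another and uses the 3-second upper bound; `partition_of` is a hypothetical way to assign servers to scheduled invocations by hashing an identifier, not a documented mechanism.

```python
import hashlib
import math

LAMBDA_TIMEOUT_S = 300    # from the Lambda timeout above
WORST_CASE_REFRESH_S = 3  # upper bound of the 1–3 second range


def servers_per_run(timeout_s: int = LAMBDA_TIMEOUT_S,
                    per_server_s: int = WORST_CASE_REFRESH_S) -> int:
    """Servers one invocation can refresh sequentially before timing out."""
    return timeout_s // per_server_s


def partitions_needed(server_count: int) -> int:
    """Number of scheduled invocations needed to cover every server."""
    return math.ceil(server_count / servers_per_run())


def partition_of(server_id: str, partition_count: int) -> int:
    """Stable assignment of a server to one partition via a hash of its id."""
    digest = hashlib.sha256(server_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

Under these assumptions, 1000 servers need 10 invocations. Each invocation would then process only the servers whose `partition_of` value matches its own partition number.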

HTML to Markdown Conversion

Article content is converted from HTML to markdown during indexing:

| HTML Element | Markdown Output |
| --- | --- |
| <h1> to <h6> | # to ###### |
| <strong>, <b> | **bold** |
| <em>, <i> | *italic* |
| <a href="..."> | [text](url) |
| <img> | ![alt](src) |
| <li> | - item |
| <blockquote> | > quote |
| <code> | `code` |
| <pre> | ``` code ``` |
| <p> | Paragraph with blank line |
| <script>, <style>, <nav>, <header>, <footer> | Removed entirely |

HTML entities (&amp;, &lt;, &gt;, &quot;, &#39;, &nbsp;) are decoded.
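The mapping above can be approximated with a small regex-based converter. This is a simplified sketch, not the converter the pipeline uses: `html_to_markdown` is a hypothetical function, it assumes well-formed, non-nested markup with double-quoted attributes, and a production converter would more likely use a real HTML parser.

```python
import html
import re


def _pre(m: re.Match) -> str:
    # Strip inner tags (e.g. <code>) so they are not converted a second time.
    body = re.sub(r"<[^>]+>", "", m.group(1)).strip("\n")
    return "\n```\n" + body + "\n```\n"


def _img(m: re.Match) -> str:
    tag = m.group(0)
    src = re.search(r'src="([^"]*)"', tag, re.I)
    alt = re.search(r'alt="([^"]*)"', tag, re.I)
    return f"![{alt.group(1) if alt else ''}]({src.group(1) if src else ''})"


def html_to_markdown(source: str) -> str:
    text = source
    # Remove non-content elements together with everything inside them.
    text = re.sub(r"(?is)<(script|style|nav|header|footer)\b[^>]*>.*?</\1>", "", text)
    text = re.sub(r"(?is)<pre\b[^>]*>(.*?)</pre>", _pre, text)
    # <h1>..<h6> become 1..6 leading '#' characters.
    text = re.sub(
        r"(?is)<h([1-6])\b[^>]*>(.*?)</h\1>",
        lambda m: "\n" + "#" * int(m.group(1)) + " " + m.group(2).strip() + "\n",
        text,
    )
    text = re.sub(r"(?is)<(strong|b)\b[^>]*>(.*?)</\1>", r"**\2**", text)
    text = re.sub(r"(?is)<(em|i)\b[^>]*>(.*?)</\1>", r"*\2*", text)
    text = re.sub(r'(?is)<a\b[^>]*?href="([^"]*)"[^>]*>(.*?)</a>', r"[\2](\1)", text)
    text = re.sub(r"(?is)<img\b[^>]*>", _img, text)
    text = re.sub(r"(?is)<li\b[^>]*>(.*?)</li>",
                  lambda m: "\n- " + m.group(1).strip(), text)
    text = re.sub(r"(?is)<blockquote\b[^>]*>(.*?)</blockquote>",
                  lambda m: "\n> " + m.group(1).strip() + "\n", text)
    text = re.sub(r"(?is)<code\b[^>]*>(.*?)</code>", r"`\1`", text)
    text = re.sub(r"(?is)<p\b[^>]*>(.*?)</p>",
                  lambda m: "\n\n" + m.group(1).strip() + "\n\n", text)
    # Drop any remaining tags, then decode entities such as &amp; and &nbsp;.
    text = re.sub(r"(?s)<[^>]+>", "", text)
    text = html.unescape(text)
    # Collapse runs of blank lines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Entities are decoded last so that escaped markup inside `<pre>` or `<code>` (for example `&lt;b&gt;`) is not mistaken for a real tag during the earlier passes.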

Server Record Fields

When a publisher server is created, these additional fields are set on the xpay-mcp-proxy-servers record:

| Field | Type | Description |
| --- | --- | --- |
| upstreamType | String | "content": tells the MCP Proxy to route to the content server |
| feedUrl | String | The RSS/sitemap URL |
| feedType | String | "rss2.0", "atom", "rss1.0", or "sitemap" |
| articleCount | Number | Total indexed articles |
| lastRefreshed | Number | Timestamp of last successful refresh |
| category | String | "Content & Knowledge" |
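As an illustration, the content-specific fields could be modeled as a typed dictionary. This is a sketch under assumptions: `ContentServerFields` and `new_content_fields` are hypothetical names, the initial `articleCount` of 0 and setting `lastRefreshed` at creation time are guesses, and the timestamp unit (epoch seconds here) is not stated in the table.

```python
from typing import TypedDict

FEED_TYPES = {"rss2.0", "atom", "rss1.0", "sitemap"}


class ContentServerFields(TypedDict):
    """Hypothetical shape of the extra fields on a content server record."""
    upstreamType: str
    feedUrl: str
    feedType: str
    articleCount: int
    lastRefreshed: int
    category: str


def new_content_fields(feed_url: str, feed_type: str, now: int) -> ContentServerFields:
    """Build the content-specific fields, rejecting unknown feed types."""
    if feed_type not in FEED_TYPES:
        raise ValueError(f"unsupported feedType: {feed_type!r}")
    return {
        "upstreamType": "content",
        "feedUrl": feed_url,
        "feedType": feed_type,
        "articleCount": 0,     # assumed starting value before the first index run
        "lastRefreshed": now,  # assumed unit: epoch seconds
        "category": "Content & Knowledge",
    }
```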