# Knowledge Hub
URL: https://whitepaper.designervenkat.online/docs
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Beginner-friendly whitepapers, research articles, tutorials, and AI guides — curated by Designer Venkat.
**White Papers — Designer Venkat** is an open knowledge hub for education: long-form research, beginner-friendly tutorials, practical guides, and deep dives — with a special focus on AI and software engineering.
## What you'll find here [#what-youll-find-here]
Start with the [installation guide](/docs/coding-tutorials/installation) to
set up your local copy, or jump straight to a [sample
whitepaper](/docs/ai-machine-learning/scaling-llm-inference).
## How to use this site [#how-to-use-this-site]
The library is browsable from the left sidebar — sections are grouped by topic, and every page has a table of contents on the right. Press ⌘+K to open full-text search at any time.
* Full-text indexing across titles, headings, and prose. Results rank by relevance.
* Every page is exposed at `/llms.txt` so coding agents can read along.
* Write in MDX, drop in components, and the sidebar updates automatically.
* Override colors, fonts, and layout with a few CSS variables.
## Why whitepapers? [#why-whitepapers]
A whitepaper is an authoritative report — longer than a blog post, shorter than a book — that argues a specific position with evidence. Done well, it changes how a field thinks about a problem. Done poorly, it gathers dust on a SharePoint.
Our goal is the former. Every paper in this collection follows a consistent format:
1. **Abstract** — a 150-word summary of the thesis and findings.
2. **Background** — what was known before, and which prior work matters.
3. **Core argument** — the new claim, with supporting evidence and analysis.
4. **Discussion** — limitations, open questions, and counterarguments.
5. **Conclusion** — what to do next.
This site is open to contributions. See the [writing
guide](/docs/coding-tutorials/writing-content) for the file structure, the MDX
components available, and the editorial conventions we follow.
# AI Integration
URL: https://whitepaper.designervenkat.online/docs/ai-machine-learning/ai-integration
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/ai-machine-learning/ai-integration
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Expose your content to coding agents and chat models via /llms.txt and Markdown endpoints.
Coding agents are now a primary audience for documentation. Claude, Cursor, and GitHub Copilot all read docs when answering questions about a library — but only if those docs are exposed in a way agents can consume.
## The /llms.txt convention [#the-llmstxt-convention]
Anthropic, Cloudflare, and a growing list of sites have standardized on a simple pattern: a text endpoint at `/llms.txt` that lists every page on the site in Markdown, with links to full-text versions.
This site exposes two endpoints:
* `/llms.txt` is a directory: every page's title, URL, and one-line description.
* `/llms-full.txt` is the full text of every page concatenated, ready for an agent to ingest in one fetch.
Use `/llms.txt` for agents that can navigate (they'll fetch individual pages on demand).
Use `/llms-full.txt` when an agent needs to load the whole library into context at once —
for a code review on a small codebase, for instance.
## How they're built [#how-theyre-built]
Both endpoints are static Next.js routes. The directory is one line per page; the full version concatenates the rendered Markdown.
```ts title="app/llms.txt/route.ts"
import { source } from "@/lib/source";
import { llms } from "fumadocs-core/source";

export const revalidate = false;

export function GET() {
  return new Response(llms(source).index());
}
```
```ts title="app/llms-full.txt/route.ts"
import { source } from "@/lib/source";
import { getLLMText } from "@/lib/get-llm-text";

export const revalidate = false;

export async function GET() {
  const pages = source.getPages();
  const scanned = await Promise.all(pages.map(getLLMText));

  return new Response(scanned.join("\n\n"), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```
`export const revalidate = false` makes these endpoints fully static: they are generated at build time and never re-run.
## Why static text instead of HTML? [#why-static-text-instead-of-html]
Three reasons:
1. **Token cost.** Stripping HTML cuts the size by 60–80% on a typical page. For an agent paying per-token to ingest your docs, that's the difference between a viable read and an abandoned one.
2. **Parser robustness.** Markdown has fewer edge cases than HTML. Less ambiguity means fewer parse errors in the agent.
3. **Cache friendliness.** Plain text compresses better and serves from edge caches without negotiation headers.
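To make the token-cost point concrete, here is a rough sketch that compares the size of an HTML fragment with its text-only form. `stripTags` and `sizeReduction` are hypothetical helpers written for this illustration, character count is only a crude proxy for tokens, and a real pipeline would render Markdown from source rather than regex-strip HTML.

```ts
// Naive tag stripper, for illustration only.
function stripTags(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ") // drop anything that looks like a tag
    .replace(/\s+/g, " ") // collapse the whitespace left behind
    .trim();
}

// Fraction of characters saved by serving text instead of HTML (0..1).
function sizeReduction(html: string): number {
  if (html.length === 0) return 0;
  return 1 - stripTags(html).length / html.length;
}

const sampleHtml =
  '<div class="prose"><h2 id="intro">Intro</h2><p>Plain <strong>text</strong> wins.</p></div>';
const sampleText = stripTags(sampleHtml);
```

Running `sizeReduction` over a handful of your own rendered pages is a quick way to check whether the saving holds for your content.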
## Customizing what agents see [#customizing-what-agents-see]
The `getLLMText` function in `lib/get-llm-text.ts` decides what goes into `/llms-full.txt`. The default is title + description + body, but you can:
* **Strip code blocks** if your audience is a doc-querying chat and not a code-completion model
* **Add metadata** — last-updated date, tags, author — that doesn't appear in the user-facing page
* **Filter pages** — exclude internal docs, draft folders, or 404s
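```ts title="lib/get-llm-text.ts"
// Minimal structural type; the real page type comes from the source loader.
type LLMPage = { url: string; data: { title: string; description?: string } };

export async function getLLMText(page: LLMPage) {
  // Render the description as a blockquote when the page has one.
  const description = page.data.description
    ? `\n> ${page.data.description}\n`
    : "";
  // This excerpt emits only the heading line and description, not the body.
  return `# ${page.data.title} (${page.url})${description}`;
}
```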
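The three customizations listed above could be combined roughly as follows. This is a sketch under stated assumptions, not the site's code: the `DocPage` shape, the `lastModified` and `body` fields, and the `/_drafts/` and `/internal/` path rules are all hypothetical.

```ts
// Hypothetical page shape for this sketch; the real type comes from the source loader.
interface DocPage {
  url: string;
  title: string;
  description?: string;
  lastModified?: string; // assumed metadata field
  body: string; // assumed to hold the page's Markdown
}

// Built at runtime so this example does not embed a literal code fence.
const FENCE = "`" + "`" + "`";

// Filter pages: skip drafts and internal docs (these path rules are assumptions).
function isPublic(page: DocPage): boolean {
  return page.url.indexOf("/_drafts/") === -1 && page.url.indexOf("/internal/") === -1;
}

// Strip code blocks: remove fenced blocks for chat-style consumers.
function stripCodeBlocks(markdown: string): string {
  const fenced = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");
  return markdown.replace(fenced, "").replace(/\n{3,}/g, "\n\n").trim();
}

// Add metadata: emit a last-updated line when the page carries one.
function renderForAgents(page: DocPage, keepCode: boolean): string {
  const parts = [`# ${page.title} (${page.url})`];
  if (page.description) parts.push(`> ${page.description}`);
  if (page.lastModified) parts.push(`Last updated: ${page.lastModified}`);
  parts.push(keepCode ? page.body : stripCodeBlocks(page.body));
  return parts.join("\n\n");
}

function buildFullText(pages: DocPage[], keepCode: boolean): string {
  return pages
    .filter(isPublic)
    .map((page) => renderForAgents(page, keepCode))
    .join("\n\n");
}
```

Keeping the filter separate from the renderer means the same `isPublic` check can guard the directory endpoint as well.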
These endpoints make your content trivially scrapable. If any of your docs are sensitive — internal architecture, security postures — make sure they're behind auth and not in the index.
## Future: native chat integration [#future-native-chat-integration]
Fumadocs is working on a chat widget that mounts directly in the sidebar: readers ask questions in plain English and get answers grounded in the site's content. It uses the same `/llms-full.txt` endpoint as a context source.
Track the [Fumadocs AI integrations](https://fumadocs.dev/docs/integrations) page for the latest.
# Claude Memory Architecture
URL: https://whitepaper.designervenkat.online/docs/ai-machine-learning/claude-memory-architecture
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/ai-machine-learning/claude-memory-architecture
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
How Claude combines a 200k-token context window with multi-layered memory and RAG to deliver accurate, personalised, context-aware responses at scale.
Claude does not just read your last message and reply. It runs a multi-layered memory system — short-term context window, semantic vector store, tiered memory layers, and long-term persistent profiles. All of these are blended together before it generates a single token of response.
This article explains each layer, how they connect, and why the design matters.
## What you will learn [#what-you-will-learn]
* How Claude's 200k-token context window works and what happens when it fills up
* When and why Claude triggers external memory retrieval
* How RAG (Retrieval-Augmented Generation) turns a user query into a precise memory lookup
* What the four memory tiers are and how long each one lasts
* How retrieved memory is fused with the active conversation before the model generates a reply
## Background [#background]
Most large language models have one form of memory: the context window. Everything the model "knows" during a conversation is whatever fits in that window right now. Once it overflows, older content is lost.
Claude extends this with a **retrieval layer** — a separate vector database that stores past conversations, user preferences, project notes, and world knowledge. When the context window alone is not enough, Claude pulls relevant information from this store and blends it with the live conversation.
This approach is called **RAG** (Retrieval-Augmented Generation). Think of the context window as your short-term working memory. The vector store is a well-organised filing cabinet you can search in milliseconds.
***
## Layer 1 — Context Window (Short-Term Working Memory) [#layer-1--context-window-short-term-working-memory]
The context window holds everything active in the current conversation. Claude uses a 200,000-token window — enough to hold roughly 150,000 words of text.
**Three things happen as the window fills:**
1. **Prioritisation** — The system scores every item by importance. System instructions, recent messages, the user's stated intent, and key named entities all rank high. Generic filler ranks low.
2. **Compression** — Older messages are summarised rather than dropped outright. The meaning and intent survive; the word-for-word detail does not.
3. **Eviction** — When compression is not enough, the least relevant content is removed to make room for new input.
The context window uses recency weighting. Messages from the last few turns
carry more weight than messages from early in a long conversation.
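The prioritise, compress, evict sequence can be sketched as a small budget-fitting routine. This is an illustrative model only, not a description of any vendor's implementation: the `ContextItem` shape, the 0.7/0.3 weighting, and the idea of a precomputed `summaryTokens` size are assumptions made for the example.

```ts
interface ContextItem {
  id: string;
  tokens: number;
  importance: number; // 0..1, e.g. system instructions near 1
  turn: number; // higher means more recent
  summaryTokens?: number; // size after summarisation, if a summary exists
}

// Importance blended with a recency bonus; the weights are arbitrary.
function priority(item: ContextItem, latestTurn: number): number {
  const recency = latestTurn === 0 ? 1 : item.turn / latestTurn;
  return 0.7 * item.importance + 0.3 * recency;
}

function totalTokens(items: ContextItem[]): number {
  return items.reduce((sum, item) => sum + item.tokens, 0);
}

function fitToBudget(items: ContextItem[], budget: number): ContextItem[] {
  const latestTurn = items.reduce((max, item) => Math.max(max, item.turn), 0);

  // 1. Prioritisation: order candidate ids from least to most valuable.
  const order = items
    .slice()
    .sort((a, b) => priority(a, latestTurn) - priority(b, latestTurn))
    .map((item) => item.id);

  let kept: ContextItem[] = items.map((item) => ({ ...item }));

  // 2. Compression: swap low-priority items for their summaries first.
  for (const id of order) {
    if (totalTokens(kept) <= budget) return kept;
    kept = kept.map((item) =>
      item.id === id && item.summaryTokens !== undefined
        ? { ...item, tokens: item.summaryTokens }
        : item
    );
  }

  // 3. Eviction: drop the least valuable items until the budget is met.
  for (const id of order) {
    if (totalTokens(kept) <= budget) break;
    kept = kept.filter((item) => item.id !== id);
  }
  return kept;
}
```

Compression runs before eviction so that an old message loses detail before it loses its place in the window.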
***
## Layer 2 — Retrieval Trigger [#layer-2--retrieval-trigger]
Claude does not retrieve from external memory on every turn. It checks three conditions first:
* **Insufficient context** — the current window does not contain what the user needs
* **User query signals external knowledge** — the question refers to facts, past sessions, or domain knowledge not present in the window
* **Relevance scoring clears a threshold** — the system estimates whether retrieval would actually improve the response quality
If any of these conditions are met, the RAG pipeline starts.
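Expressed as code, the trigger is an any-of check over the three conditions. The `TurnSignals` fields and the 0.5 threshold are hypothetical names and values chosen for this sketch; how each signal would actually be computed is outside its scope.

```ts
interface TurnSignals {
  windowCoversQuery: boolean; // does the live context already answer the question?
  referencesExternalKnowledge: boolean; // past sessions, facts, domain knowledge
  estimatedRelevanceGain: number; // 0..1 estimate of how much retrieval would help
}

const RELEVANCE_THRESHOLD = 0.5; // arbitrary illustrative cut-off

// Retrieval starts as soon as any one condition holds.
function shouldRetrieve(signals: TurnSignals): boolean {
  return (
    !signals.windowCoversQuery ||
    signals.referencesExternalKnowledge ||
    signals.estimatedRelevanceGain >= RELEVANCE_THRESHOLD
  );
}
```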
***
## Layer 3 — Retrieval Pipeline (RAG) [#layer-3--retrieval-pipeline-rag]
RAG has five steps. Each step refines the signal from a raw user question into a ranked, ready-to-use block of context.
### Step 1 — Query Understanding [#step-1--query-understanding]
The model analyses the user's intent. It extracts the core question, strips away filler, and identifies the key concepts that need to be matched in memory.
### Step 2 — Query Embedding [#step-2--query-embedding]
The extracted query is converted into a high-dimensional vector — a list of numbers like `[-0.24, -0.80, ..., 0.31]`. This numeric representation captures the semantic meaning of the query, not just the words.
Two sentences that mean the same thing will produce similar vectors even if they share no words.
### Step 3 — Vector Search [#step-3--vector-search]
The query vector is compared against all vectors stored in the vector database. The search finds the chunks of stored memory that are semantically closest to the query.
This is similarity search — it finds meaning, not keywords.
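A minimal sketch of similarity search, assuming cosine similarity as the distance measure and a brute-force scan over every stored vector. The `MemoryChunk` type and `vectorSearch` function are illustrative names; production vector stores typically use approximate nearest-neighbour indexes rather than a full scan.

```ts
interface MemoryChunk {
  id: string;
  vector: number[];
  text: string;
}

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and return the k closest.
function vectorSearch(query: number[], chunks: MemoryChunk[], k: number) {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```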
### Step 4 — Re-ranking [#step-4--re-ranking]
The top results from vector search are re-scored using a more precise relevance model. A typical result set might look like:
| Result | Score |
| ------- | ----- |
| Chunk A | 0.92 |
| Chunk B | 0.81 |
| Chunk C | 0.75 |
Only the highest-scoring chunks move forward.
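Re-ranking can be sketched as a second scoring pass with a cut-off. The scorer is passed in as a function because the source does not say which relevance model is used; `overlapScorer` is a deliberately simple stand-in, and `minScore` and `keep` are hypothetical parameters.

```ts
interface Candidate {
  id: string;
  text: string;
}

interface Ranked {
  id: string;
  score: number;
}

// Re-score candidates, drop weak ones, and keep the best few.
function rerank(
  query: string,
  candidates: Candidate[],
  scorer: (query: string, text: string) => number,
  minScore: number,
  keep: number
): Ranked[] {
  return candidates
    .map((c) => ({ id: c.id, score: scorer(query, c.text) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, keep);
}

// Toy scorer: fraction of query words that appear in the text.
function overlapScorer(query: string, text: string): number {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const haystack = text.toLowerCase();
  const matched = words.filter((w) => haystack.indexOf(w) !== -1).length;
  return matched / words.length;
}
```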
### Step 5 — Context Assembly [#step-5--context-assembly]
The final chunks are assembled into a structured block of context. This block merges with the live conversation in the context window.
***
## Layer 4 — Vector Memory (Semantic Knowledge Store) [#layer-4--vector-memory-semantic-knowledge-store]
Vector memory is the database that makes Step 3 possible. It has four components:
Metadata filtering matters for privacy. Claude restricts retrieval to only the content the current user is allowed to see, even before the similarity search runs.
***
## Layer 5 — Memory Tiers [#layer-5--memory-tiers]
Not all memory is equal. Claude organises stored information into four tiers based on how specific, how detailed, and how long-lasting each type is.
### Episodic Memory [#episodic-memory]
Stores individual interactions in full detail. This is where "what did we discuss last Tuesday" lives. It is the most granular and the most time-limited — detail fades as time passes.
### Semantic Memory [#semantic-memory]
Stores facts, concepts, and world knowledge extracted from many conversations. Less tied to a single event, more about what is generally true. Medium retention.
### Procedural Memory [#procedural-memory]
Stores learned patterns — how to solve a type of problem, how a user prefers code to be formatted, the steps in a recurring workflow. Abstracted from any single conversation. Long-term.
### Foundational Memory [#foundational-memory]
Stores core values, alignment principles, and the fundamental rules Claude operates by. This tier never changes during a session. It is the stable base that all other memory sits on.
***
## Layer 6 — Persistent Context (Long-Term Continuity) [#layer-6--persistent-context-long-term-continuity]
Persistent context is what makes Claude feel like it knows you across sessions. It has four stores:
* **User Profiles** — preferences, background, communication style, domain expertise
* **Conversation History** — a record of past sessions that can be retrieved when relevant
* **Project Context** — ongoing work, domain-specific terminology, project-specific rules
* **Learning & Adaptation** — how the model adjusts its behaviour based on accumulated feedback
None of this happens unless the system is configured to write to persistent
storage. In a default API session, memory resets when the conversation ends.
***
## Layer 7 — Context Integration (Blending Everything Together) [#layer-7--context-integration-blending-everything-together]
Before Claude generates a response, it combines three signals:
**Context Fusion** removes duplicates, resolves conflicts, and ranks content by relevance. The goal is a single, clean context block that gives the model exactly what it needs — no redundancy, no noise.
The final context is what the model actually reads before it writes the response.
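A simplified sketch of the fusion step: deduplicate, rank by relevance, and stop at a token budget. It does not attempt conflict resolution, and the `Snippet` shape and `fuse` function are hypothetical names introduced for illustration.

```ts
interface Snippet {
  source: string; // where the snippet came from, e.g. "conversation" or "retrieved"
  text: string;
  relevance: number; // 0..1
  tokens: number;
}

function fuse(snippets: Snippet[], tokenBudget: number): Snippet[] {
  // Highest relevance first, so the best copy of a duplicate wins.
  const ranked = snippets.slice().sort((a, b) => b.relevance - a.relevance);

  // Remove duplicates, ignoring case and whitespace differences.
  const seenKeys: string[] = [];
  const unique: Snippet[] = [];
  for (const snippet of ranked) {
    const key = snippet.text.trim().toLowerCase().replace(/\s+/g, " ");
    if (seenKeys.indexOf(key) !== -1) continue;
    seenKeys.push(key);
    unique.push(snippet);
  }

  // Fill the budget in relevance order, skipping anything that does not fit.
  const fused: Snippet[] = [];
  let used = 0;
  for (const snippet of unique) {
    if (used + snippet.tokens > tokenBudget) continue;
    used += snippet.tokens;
    fused.push(snippet);
  }
  return fused;
}
```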
***
## Layer 8 — Infrastructure and Optimisation [#layer-8--infrastructure-and-optimisation]
The memory system runs on infrastructure built for performance, privacy, and cost control:
| Component | What it does |
| -------------------- | ----------------------------------------------------------------- |
| Caching Layer | Stores frequent query results so repeated lookups cost nothing |
| Index Optimisation | Keeps vector indexes tuned for fast retrieval across regions |
| Distributed Storage | Scales memory horizontally across multiple data centres |
| Monitoring & Metrics | Tracks retrieval quality, latency, and usage patterns |
| Privacy & Security | Encrypts data at rest and in transit; enforces access controls |
| Cost Optimisation | Balances retrieval depth against token budget and latency targets |
***
## End-to-End Memory Flow [#end-to-end-memory-flow]
Every response the model produces can feed back into the memory layers — tightening the system over time as it learns from each interaction.
***
## Summary [#summary]
* Claude's **context window** holds up to 200k tokens and manages overflow via prioritisation, compression, and eviction.
* A **retrieval trigger** checks whether external memory would improve the response before starting a lookup.
* The **RAG pipeline** converts a user query into a vector, searches the store, re-ranks results, and assembles clean context.
* **Four memory tiers** — Episodic, Semantic, Procedural, and Foundational — store information at different levels of detail and duration.
* **Persistent context** maintains user profiles, history, and project knowledge across sessions.
* **Context fusion** blends retrieved memory with the live conversation before generation, removing duplicates and noise.
* **Infrastructure** handles caching, distributed storage, privacy enforcement, and cost control at scale.
## References [#references]
1. Anthropic — Claude Memory Architecture (official infographic, 2025). Built by Anthropic. anthropic.com
2. Lewis, P. et al. — "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020.
3. Johnson, J. et al. — "Billion-scale similarity search with GPUs." IEEE Transactions on Big Data, 2021.
# Scaling LLM Inference
URL: https://whitepaper.designervenkat.online/docs/ai-machine-learning/scaling-llm-inference
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/ai-machine-learning/scaling-llm-inference
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
A practitioner's account of taking a 70B-parameter model from a research notebook to 10,000 requests per second.
Large language models trained on academic budgets often die quietly when handed to a production team. The gap between "the notebook works" and "ten thousand users hit it on a Tuesday" is wider than most teams plan for. This paper walks through the architecture choices that closed that gap for one open-source 70B-parameter model, with measurements at each step.
## Background [#background]
Inference for transformer models has two distinct costs:
1. **Prefill** — running the full input prompt through the model once. Compute-bound; scales with sequence length squared due to attention.
2. **Decode** — generating one token at a time, each conditioned on all prior tokens. Memory-bound; dominated by KV-cache reads.
Naive serving conflates the two. A request that takes 200ms to prefill might then spend 5 seconds in decode — and during that decode, the GPU is mostly idle waiting for memory. Modern inference servers (vLLM, TGI, TensorRT-LLM) treat prefill and decode as separable workloads, batching them differently.
### Prior work [#prior-work]
The "Orca" paper (OSDI 2022) introduced **continuous batching**, which the paper itself calls iteration-level scheduling: the batch is re-formed at every decoding step rather than once per request, so finished sequences leave the batch immediately and new ones join. The authors report a 36.9× throughput improvement over NVIDIA FasterTransformer on a GPT-3 175B model at the same latency level.
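The scheduling idea can be shown with a toy simulation that counts decode steps. It is a sketch, not a model of any real server: `GenRequest`, `continuousBatching`, and `staticBatching` are hypothetical names, every sequence produces exactly one token per step, and prefill cost is ignored.

```ts
interface GenRequest {
  id: string;
  tokensToGenerate: number; // assumed to be at least 1
}

interface ActiveSequence {
  id: string;
  remaining: number;
}

// Continuous batching: free slots are refilled before every decode step.
function continuousBatching(queue: GenRequest[], maxBatch: number) {
  const waiting = queue.slice();
  let active: ActiveSequence[] = [];
  const finishedAt: Record<string, number> = {};
  let steps = 0;

  while (waiting.length > 0 || active.length > 0) {
    while (active.length < maxBatch && waiting.length > 0) {
      const next = waiting.shift() as GenRequest;
      active.push({ id: next.id, remaining: next.tokensToGenerate });
    }
    steps++;
    // One token for each active sequence.
    active = active.map((s) => ({ id: s.id, remaining: s.remaining - 1 }));
    for (const s of active) {
      if (s.remaining <= 0) finishedAt[s.id] = steps;
    }
    // Finished sequences leave the batch immediately.
    active = active.filter((s) => s.remaining > 0);
  }
  return { steps, finishedAt };
}

// Baseline: each batch runs until its longest sequence finishes.
function staticBatching(queue: GenRequest[], maxBatch: number): number {
  let steps = 0;
  for (let i = 0; i < queue.length; i += maxBatch) {
    const batch = queue.slice(i, i + maxBatch);
    steps += batch.reduce((max, r) => Math.max(max, r.tokensToGenerate), 0);
  }
  return steps;
}
```

Mixing one long request with several short ones is the case where the two schedulers diverge most: short requests no longer wait for the long one to finish.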
**PagedAttention** (vLLM, 2023) added paged memory management for the KV-cache, treating GPU memory like virtual memory in an OS — fragmentation drops, effective capacity rises 2–4×.
**Speculative decoding** (Leviathan et al., 2023) uses a small draft model to propose multiple tokens, which the large model verifies in parallel. For models with high agreement between draft and target, this is a 2–3× decode-rate improvement.
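The propose-then-verify loop is easier to follow in code. The sketch below uses greedy acceptance, a simplification of the sampling-based acceptance rule in the Leviathan et al. paper, and models each model as a hypothetical `NextToken` function. The verification loop stands in for what a real system does in a single batched forward pass of the large model.

```ts
// A model reduced to "given this prefix, what is the next token?".
type NextToken = (prefix: string[]) => string;

function speculativeStep(
  prefix: string[],
  draft: NextToken,
  target: NextToken,
  k: number
): string[] {
  // 1. The draft model proposes k tokens autoregressively (cheap).
  const proposed: string[] = [];
  for (let i = 0; i < k; i++) {
    proposed.push(draft(prefix.concat(proposed)));
  }

  // 2. The target model checks each proposed position in order.
  const accepted: string[] = [];
  for (let i = 0; i < k; i++) {
    const expected = target(prefix.concat(accepted));
    if (expected !== proposed[i]) {
      // First mismatch: keep the target's token and discard the rest.
      accepted.push(expected);
      return accepted;
    }
    accepted.push(proposed[i]);
  }

  // 3. All k proposals accepted: the target contributes one more token.
  accepted.push(target(prefix.concat(accepted)));
  return accepted;
}
```

Each call yields between one and `k + 1` tokens, which is where the speed-up comes from when the draft and target agree often.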
## Core argument [#core-argument]
Production inference is not a single optimization problem. It's three problems stacked:
* **Throughput** — tokens per second per dollar of GPU
* **Latency** — time-to-first-token (TTFT) and inter-token-latency (ITL)
* **Tail behavior** — what the p99.9 user experiences
Optimizations that improve one often regress another. Continuous batching lifts throughput by 10× but raises TTFT for any request that joins a busy batch. Speculative decoding lifts decode rate but eats prefill compute. The job of a serving system is to expose these trade-offs as tunable, not to claim a single winning configuration.
### Results [#results]
Measurements from a 70B-parameter dense model, served on 8× H100 GPUs with tensor-parallel splitting:
TTFT held within 250ms (p50) and 800ms (p99) across all configurations. ITL stayed below 40ms (p99) once speculative decoding was tuned.
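For readers reproducing latency figures like these, a percentile helper is the core of the measurement. This sketch uses the nearest-rank method, which is an assumption: the source does not say how its percentiles were computed, and the sample values in the usage note are invented.

```ts
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

Usage, assuming `ttftMs` holds one time-to-first-token measurement per request in milliseconds: `percentile(ttftMs, 50)` gives the median and `percentile(ttftMs, 99)` gives the tail figure. With small sample counts, p99 is simply the maximum, so collect enough requests before trusting it.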
## Discussion [#discussion]
Two limitations worth naming:
**Workload mix matters more than headline numbers.** Speculative decoding helps short generations and hurts long ones — for a 4,096-token completion, the draft model's mispredictions compound and you end up doing extra verification work. We saw a 1.4× regression when the workload shifted from chat (50-token responses) to code generation (1,000+ tokens).
**FP8 quantization is not free.** On reasoning benchmarks (MMLU, MATH), FP8 lost 0.8–1.4 points versus BF16. For factual recall benchmarks, the loss was within noise. If your users are doing chain-of-thought work, measure before shipping FP8.
Each technique in the table above was deployed individually, measured, and only kept if the latency tradeoff was acceptable. Stacking them in one shot makes regressions impossible to attribute.
## Conclusion [#conclusion]
Going from 1× to 30× throughput is not a single technique — it's the disciplined application of four, each carefully measured against latency and quality. The serving system that gets this right is the one that exposes these as runtime knobs rather than build-time decisions.
For teams just starting out: deploy continuous batching first. It's the largest single win, the most stable, and the easiest to reason about. Layer the rest on once you have measurements to defend each one.
## References [#references]
1. Yu et al., "Orca: A Distributed Serving System for Transformer-Based Generative Models" (OSDI 2022)
2. Kwon et al., "Efficient Memory Management for Large Language Model Serving with PagedAttention" (SOSP 2023)
3. Leviathan, Kalman, Matias, "Fast Inference from Transformers via Speculative Decoding" (ICML 2023)
4. Dettmers et al., "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale" (NeurIPS 2022)
# Configuration
URL: https://whitepaper.designervenkat.online/docs/coding-tutorials/configuration
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/coding-tutorials/configuration
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Site title, navigation, theme colors, and search behavior — everything that customizes how the docs feel.
Most of the site's behavior is controlled by three files. Once you know what each one owns, customization is straightforward.
## Where things live [#where-things-live]
## Site title and navigation [#site-title-and-navigation]
Edit `lib/layout.shared.ts` — this is the single source of truth for the brand title, nav links, and footer.
```ts title="lib/layout.shared.ts"
export const baseOptions = {
  nav: {
    title: "nxt-whitepapers",
  },
  links: [
    { text: "GitHub", url: "https://github.com/your-org/nxt-whitepapers" },
  ],
};
```
The `nav.title` shows in the top-left of both the docs sidebar and any home pages you add.
## Theme colors [#theme-colors]
Fumadocs uses CSS variables for color — primary, background, foreground, muted, and so on. To match the brand, override them in `app/globals.css`:
```css title="app/globals.css"
:root {
  --brand-blue-300: #93c5fd;
  --brand-sky-500: #0ea5e9;
  --gradient-primary: linear-gradient(
    to bottom,
    var(--brand-blue-300),
    var(--brand-sky-500)
  );
  --color-fd-primary: var(--brand-sky-500);
}
```
OKLCH is perceptually uniform — same lightness value reads as the same
brightness across hues. HSL fakes this and tends to produce muddier mid-tones.
If you prefer HSL, both formats work; just be consistent across light and dark
variants.
## Sidebar order [#sidebar-order]
The order of pages in the sidebar comes from `meta.json` files. Each folder can have one:
```json title="content/docs/meta.json"
{
  "pages": [
    "index",
    "ai-machine-learning",
    "coding-tutorials",
    "security",
    "ui-ux",
    "performances"
  ]
}
```
* `"index"` and `"ai-machine-learning"` are file/folder names (no extension)
* `"---Label---"` becomes a section divider with the given text
Pages not listed in `meta.json` are appended to the end in alphabetical order.
To hide a page, prefix the filename with `_` — Fumadocs ignores those.
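The ordering and hiding rules described above can be modelled in a few lines. This is a sketch of the behaviour as this page describes it, not Fumadocs' own code; `orderSidebar` and `isSeparator` are hypothetical names.

```ts
// "---Label---" entries are section dividers, not files.
function isSeparator(entry: string): boolean {
  return /^---.+---$/.test(entry);
}

function orderSidebar(listed: string[], filesOnDisk: string[]): string[] {
  // Underscore-prefixed names are hidden.
  const visible = filesOnDisk.filter((name) => name.charAt(0) !== "_");

  // Listed entries keep their meta.json order; separators pass through.
  const explicit = listed.filter(
    (entry) => isSeparator(entry) || visible.indexOf(entry) !== -1
  );

  // Everything not listed is appended alphabetically.
  const rest = visible.filter((name) => listed.indexOf(name) === -1).sort();
  return explicit.concat(rest);
}
```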
## Search [#search]
Search is configured in `app/api/search/route.ts`. It uses Fumadocs' `advanced` mode, which indexes headings and body text:
```ts title="app/api/search/route.ts"
import { source } from "@/lib/source";
import { createSearchAPI } from "fumadocs-core/search/server";

export const { GET } = createSearchAPI("advanced", {
  indexes: source.getPages().map((page) => ({
    title: page.data.title,
    structuredData: page.data.structuredData,
    id: page.url,
    url: page.url,
  })),
});
```
Switch to `"simple"` if you only want to match titles — faster, less useful.
## Feedback collector [#feedback-collector]
The thumbs-up/thumbs-down widget at the bottom of every page is wired in `app/docs/[[...slug]]/page.tsx`. Right now it `console.log`s the response; replace the server action body with a fetch to your analytics endpoint.
Default (logs to the server console):
```tsx
onSendAction={async (feedback) => {
  "use server";
  console.log("[feedback]", feedback);
  return {};
}}
```
PostHog:
```tsx
onSendAction={async (feedback) => {
  "use server";
  await posthog.capture("docs_feedback", feedback);
  return {};
}}
```
Custom endpoint:
```tsx
onSendAction={async (feedback) => {
  "use server";
  await fetch("https://api.example.com/feedback", {
    method: "POST",
    body: JSON.stringify(feedback),
  });
  return {};
}}
```
# Your First Whitepaper
URL: https://whitepaper.designervenkat.online/docs/coding-tutorials/first-whitepaper
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/coding-tutorials/first-whitepaper
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Walk through publishing a paper end-to-end, from frontmatter to deployed URL.
This guide takes a single whitepaper from blank file to live URL. The whole loop should take about ten minutes.
## The structure of a whitepaper [#the-structure-of-a-whitepaper]
A paper lives as an MDX file under a topic folder in `content/docs/` — for example `content/docs/security/`. Pick a filename that reads well in a URL — `consensus-under-network-partitions.mdx` is better than `paper3.mdx`.
### Create the file [#create-the-file]
```bash
touch content/docs/security/your-paper-slug.mdx
```
### Add frontmatter [#add-frontmatter]
Every MDX file starts with a YAML block between `---` markers. Two fields are required: `title` and `description`. The title becomes the page's `<h1>` and the search index entry; the description shows under the title and in social previews.
```mdx
---
title: Consensus Under Network Partitions
description: A re-examination of CAP trade-offs in modern cloud topologies.
---
```
Don't add a `# Title` heading in the body. Fumadocs renders the title from frontmatter, so a second `<h1>` will duplicate it.
### Write the abstract [#write-the-abstract]
The first paragraph is the abstract. Keep it tight, 150 words or fewer, and treat it as the only paragraph anyone might actually read.
```mdx
The CAP theorem is often invoked to justify availability over consistency,
but the trade-off looks different when the partition probability is small
and the cost of inconsistency is large. This paper argues for...
```
### Break the body into sections [#break-the-body-into-sections]
Use `##` for top-level sections and `###` for sub-sections. The right-hand TOC populates automatically from these. Aim for sections of 200–600 words each — longer than that, and readers lose the thread.
```mdx
## Background
Prior work on consensus falls into three camps...
### Paxos and its descendants
...
### CRDTs and eventual consistency
...
```
### Drop in components where they earn their keep [#drop-in-components-where-they-earn-their-keep]
Diagrams, tables, code blocks, and callouts make papers easier to skim. Don't sprinkle them everywhere — use one when the prose can't do the job alone.
```mdx
import { Callout } from "fumadocs-ui/components/callout";

<Callout>
  Lamport's original Paxos paper is famously hard to read. Stoppable Paxos
  (2008) is the version most implementations follow.
</Callout>
```
### Preview locally [#preview-locally]
If the dev server is running, the new file appears in the sidebar within a second of saving. Click through and check:
* The title and description look right
* Headings nest correctly in the TOC
* Code blocks have the right syntax highlighting
* Links resolve
### Deploy [#deploy]
Push to your main branch. Whatever's configured in your CI — Vercel, Netlify, your own runner — picks up the change and rebuilds. The static export means the new page is live within a couple of minutes.
## Editorial conventions [#editorial-conventions]
A few rules we follow to keep the collection coherent:
* **Cite primary sources.** Link directly to papers, not summary blog posts.
* **One claim per section.** If a section needs a sub-heading to introduce a second claim, split it.
* **Quantify everything.** "Significantly faster" is meaningless. "37% lower p99 latency on a 16-core box" is useful.
* **Acknowledge limitations.** Every paper has a "Discussion" section covering what the analysis doesn't cover.
Once your paper is in good shape, open a pull request. The editorial team
reviews for clarity and factual accuracy — usually a 2-3 day turnaround.
# Installation
URL: https://whitepaper.designervenkat.online/docs/coding-tutorials/installation
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/coding-tutorials/installation
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Clone the repository, install dependencies, and run the dev server in under a minute.
This guide walks through setting up a local copy of nxt-whitepapers — useful if you plan to contribute a paper, fork the project, or just want to read offline.
## Prerequisites [#prerequisites]
You'll need:
* **Node.js 20** or later (the build uses Turbopack, which expects modern Node)
* **npm 10** or later (or pnpm/yarn, but examples use npm)
* A terminal you're comfortable with
If you're on macOS and using Homebrew, `brew install node@20` will get you
both Node and npm in one shot.
## Install [#install]
### Clone the repository [#clone-the-repository]
Pull the source from GitHub. You can fork first if you plan to push changes back.
Over HTTPS:
```bash
git clone https://github.com/your-org/nxt-whitepapers.git
cd nxt-whitepapers
```
Over SSH:
```bash
git clone git@github.com:your-org/nxt-whitepapers.git
cd nxt-whitepapers
```
With the GitHub CLI:
```bash
gh repo clone your-org/nxt-whitepapers
cd nxt-whitepapers
```
### Install dependencies [#install-dependencies]
```bash
npm install
```
This installs Next.js, Fumadocs, Tailwind, and a handful of MDX plugins. Expect 30–60 seconds on a cold cache.
### Start the dev server [#start-the-dev-server]
```bash
npm run dev
```
The site is now live at [http://localhost:3000](http://localhost:3000). Edits to `.mdx` files hot-reload — no restart needed.
## Verify [#verify]
Open the homepage and confirm the sidebar lists at least one section. If it's empty, the content pipeline hasn't picked up the MDX files — usually a stale `.source/` cache.
```bash
rm -rf .source
npm run dev
```
Don't commit the `.source/` directory. It's generated at build time from
`content/docs/` and is already in `.gitignore`.
## Next steps [#next-steps]
Once the dev server is running, head to [Configuration](/docs/coding-tutorials/configuration) to tailor the site name, theme, and search behavior. Or skip straight to [Writing your first whitepaper](/docs/coding-tutorials/first-whitepaper) if you're ready to publish.
# Search
URL: https://whitepaper.designervenkat.online/docs/coding-tutorials/search
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/coding-tutorials/search
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
How the full-text index works, how to extend it, and what to do when results look wrong.
Search is one of the few features users notice within seconds. A bad search experience teaches readers to fall back to Google with a `site:` filter; a good one becomes how they navigate the entire library.
## How the index works [#how-the-index-works]
When you run `npm run build`, Fumadocs walks every MDX file under `content/docs/` and extracts structured data:
1. **Title** from frontmatter
2. **Headings** — every `##`, `###`, with their text and anchor IDs
3. **Body** — prose between headings, with HTML tags stripped
This is shipped as a JSON index served by `/api/search`. The client-side search component (mounted in the sidebar) fetches the index on first focus, then runs queries locally. No round trip per keystroke.
An average documentation site has 50–500 pages. The index for that is 100KB–2MB gzipped — small enough to ship to the client. Server-side search makes sense only above \~10,000 pages.
## Modes [#modes]
`createSearchAPI` accepts two modes, and the trade-off is index size versus result quality: `simple` matches titles only, while `advanced` also indexes headings and body text.
This project uses `advanced`:
```ts title="app/api/search/route.ts"
import { source } from "@/lib/source";
import { createSearchAPI } from "fumadocs-core/search/server";

export const { GET } = createSearchAPI("advanced", {
  indexes: source.getPages().map((page) => ({
    title: page.data.title,
    structuredData: page.data.structuredData,
    id: page.url,
    url: page.url,
  })),
});
```
`structuredData` comes from Fumadocs' `remark-structure` plugin, which is enabled by default — no extra config needed.
## Customizing relevance [#customizing-relevance]
The default ranking weighs title matches more than heading matches more than body matches. Two ways to influence it:
### Add tags [#add-tags]
Frontmatter `tags` are searchable with higher weight than body prose:
```mdx
---
title: Consensus Under Network Partitions
description: ...
tags: [distributed-systems, cap, raft]
---
```
Queries for "raft" will now rank this paper near the top even if "raft" doesn't appear in the title.
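To make the weighting concrete, the sketch below scores a page by where a query term matches. The `IndexedPage` shape, the `scorePage` helper, and the numeric weights are all illustrative assumptions, not Fumadocs' actual ranking code; only the ordering (title above headings above body, tags above body) comes from the text above.

```ts
// Hypothetical page shape for illustration; not the Fumadocs index format.
interface IndexedPage {
  title: string;
  tags: string[];
  headings: string[];
  body: string;
}

// Assumed weights: only their relative order is taken from the text.
const WEIGHTS = { title: 10, tag: 5, heading: 3, body: 1 };

function contains(text: string, term: string): boolean {
  return text.toLowerCase().includes(term.toLowerCase());
}

// Sum the weight of every field in which the term appears.
function scorePage(page: IndexedPage, term: string): number {
  let score = 0;
  if (contains(page.title, term)) score += WEIGHTS.title;
  if (page.tags.some((t) => contains(t, term))) score += WEIGHTS.tag;
  if (page.headings.some((h) => contains(h, term))) score += WEIGHTS.heading;
  if (contains(page.body, term)) score += WEIGHTS.body;
  return score;
}

const tagged: IndexedPage = {
  title: "Consensus Under Network Partitions",
  tags: ["distributed-systems", "cap", "raft"],
  headings: ["Background", "Core argument"],
  body: "Consensus latency has dropped inside a region.",
};
const untagged: IndexedPage = { ...tagged, tags: [] };
```

With these sample pages, only the tagged copy scores above zero for the term "raft", because the term appears nowhere else in the page.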
### Custom synonyms [#custom-synonyms]
For domain-specific abbreviations, expand them at index time:
```ts
indexes: source.getPages().map((page) => ({
title: page.data.title,
structuredData: page.data.structuredData,
id: page.url,
url: page.url,
extra_tokens: synonyms(page.data.title),
})),
```
Where `synonyms()` maps "LLM" → "large language model", "CRDT" → "conflict-free replicated data type", etc. Worth doing if your domain has 10+ such abbreviations.
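A minimal sketch of what such a `synonyms()` helper could look like. The abbreviation table and the whole-word, case-sensitive matching rule are assumptions for illustration; how the returned strings are attached to the index follows the snippet above.

```ts
// Hypothetical abbreviation table; extend it with your own domain terms.
const ABBREVIATIONS: Record<string, string> = {
  LLM: "large language model",
  CRDT: "conflict-free replicated data type",
};

// Return the expansion of every known abbreviation that appears as a
// whole word in the text. Matching is case-sensitive on purpose, so
// ordinary lowercase words are not mistaken for acronyms.
function synonyms(text: string): string[] {
  const words = text.split(/[^A-Za-z0-9]+/).filter(Boolean);
  const expansions: string[] = [];
  for (const [abbr, expansion] of Object.entries(ABBREVIATIONS)) {
    if (words.includes(abbr)) expansions.push(expansion);
  }
  return expansions;
}
```

Keeping the table in one module makes it easy to review when new abbreviations enter the collection.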
## Debugging [#debugging]
If search results look wrong, check three things:
1. **Is the page in the index?** Hit `/api/search?query=` (empty query) and inspect the response. If the page isn't there, the build skipped it — usually a parse error in the MDX.
2. **Are headings being extracted?** Check the structured data in `page.data.structuredData`. Headings should appear as `{ type: "heading", content: "..." }` entries.
3. **Is the client cache stale?** The browser caches the index. Hard refresh (Cmd+Shift+R) to bust it.
If you have unpublished work in `content/docs/_drafts/`, the leading underscore tells Fumadocs to ignore the folder. Without it, drafts show up in search and confuse readers.
## Server-side search [#server-side-search]
If the index grows past 5MB, switch to server-side. The pattern:
1. Ship the index to a search service (Algolia, Meilisearch, Typesense)
2. Replace `/api/search` with a proxy to that service
3. Use Fumadocs' `createSearchClient` with a custom fetcher
Typesense is the lightest of the three — self-hostable, fast, with a typo-tolerant engine that matches Algolia's quality on most queries.
# Writing Content
URL: https://whitepaper.designervenkat.online/docs/coding-tutorials/writing-content
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/coding-tutorials/writing-content
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
MDX components, conventions, and the editorial rules that keep the collection readable.
Whitepapers are written in MDX — Markdown with embedded React components. This page covers the components available, the editorial conventions, and a few gotchas.
## Components at your disposal [#components-at-your-disposal]
Every MDX file can import from `fumadocs-ui/components/*`. The components most useful for whitepapers:
| Component | When to use |
| --------------------------- | -------------------------------------------------------------- |
| `Callout` | Asides, warnings, footnotes that don't belong in flow |
| `Tabs` / `Tab` | When the same concept has language- or platform-specific forms |
| `Steps` / `Step` | Tutorial-style sequences |
| `Accordions` / `Accordion` | Optional detail that most readers should skip |
| `TypeTable` | API reference tables — types, defaults, descriptions |
| `Files` / `Folder` / `File` | Directory trees |
### Callout [#callout]
```mdx
import { Callout } from "fumadocs-ui/components/callout";

<Callout type="info">
  Body text goes here. Markdown works inside.
</Callout>
```
Types: `info`, `warn`, `error`, `success`. Default is `info`.
This is what an `info` callout looks like rendered.
This is what a `warn` callout looks like rendered.
### Tabs [#tabs]
Use when the same instruction has variants. Don't overuse — three tabs is fine, eight is a sign you need a different structure.
* npm: `npm install fumadocs-ui`
* pnpm: `pnpm add fumadocs-ui`
* yarn: `yarn add fumadocs-ui`
### Code blocks [#code-blocks]
Triple-backtick fences with a language identifier get syntax highlighting. Add a `title` for files:
````mdx
```ts title="lib/source.ts"
import { docs } from "collections/server";
import { loader } from "fumadocs-core/source";
export const source = loader({
baseUrl: "/docs",
source: docs.toFumadocsSource(),
});
```
````
Supported languages: TypeScript, JavaScript, Python, Go, Rust, SQL, Bash, JSON, YAML, and most others Shiki recognizes.
## Editorial conventions [#editorial-conventions]
A few rules that keep the library readable as it grows.
### Headings [#headings]
* `##` for top-level sections. Aim for 4–8 per paper.
* `###` for sub-sections. Optional; use only when a section has genuinely distinct sub-claims.
* Never `####` or below. If you need that depth, the section is too long.
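The heading-depth rule is easy to check mechanically. The sketch below is a hypothetical helper, not part of this project's tooling: it scans an MDX string and reports any heading at `####` or deeper, skipping fenced code blocks. It is a simplification that does not handle nested fences.

```ts
interface HeadingProblem {
  line: number; // 1-based line number in the input
  text: string; // the offending heading line, trimmed
}

// Report headings deeper than `###`. Lines inside fenced code blocks are
// skipped so that code comments starting with `#` are not flagged.
function findDeepHeadings(mdx: string): HeadingProblem[] {
  const problems: HeadingProblem[] = [];
  let inFence = false;
  const lines = mdx.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const raw = lines[i];
    // A line opening or closing a fence toggles the state.
    if (/^\s*(`{3}|~{3})/.test(raw)) {
      inFence = !inFence;
      continue;
    }
    if (!inFence && /^#{4,}\s/.test(raw)) {
      problems.push({ line: i + 1, text: raw.trim() });
    }
  }
  return problems;
}
```

Running it over each file before a pull request would catch the most common violation without a full Markdown parser.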
### Length [#length]
* **Abstract:** 100–200 words
* **Section:** 200–600 words
* **Whole paper:** 3,000–8,000 words
Papers shorter than 3,000 words usually belong as blog posts. Papers longer than 8,000 should be split into a series.
### Citations [#citations]
Link directly to the primary source — the paper, the spec, the original blog post. Summary articles age poorly and the links rot. Use the References section at the end for formal citations.
### Tone [#tone]
Direct, specific, quantitative where possible. Avoid:
* Hedging adjectives ("significantly", "substantially") without numbers
* Marketing-speak ("seamless", "robust", "powerful")
* Rhetorical questions in body prose (they read as filler)
Search snippets, social previews, and skim-readers all anchor on the opening
paragraph. Treat it as the abstract: state the claim, name the evidence, hint
at the conclusion.
## Common pitfalls [#common-pitfalls]
A few things that trip up new contributors:
* **Duplicate `# Title`** — frontmatter `title` already renders the h1. Don't repeat it in the body.
* **Stale `.source/`** — if a new file doesn't appear in the sidebar, delete `.source/` and restart `npm run dev`.
* **Bad relative links** — links in MDX use the URL, not the file path. `/docs/security/foo` not `../security/foo.mdx`.
* **Tabs/Steps imported wrong** — `import { Tab, Tabs } from 'fumadocs-ui/components/tabs'` — both, plural for the wrapper.
# Introduction to Design Principles
URL: https://whitepaper.designervenkat.online/docs/design-principles/introduction
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/design-principles/introduction
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Core design principles and user experience fundamentals every builder should understand before shipping an interface.
Good products are not only functional — they feel clear, trustworthy, and intentional. **Design principles** are the rules of thumb that guide those decisions. **User experience (UX)** is how people actually feel when they use what you build.
This section covers the ideas behind great interfaces — not just how they look, but how they behave.
## What are design principles? [#what-are-design-principles]
Design principles are repeatable guidelines that help teams make consistent choices. They answer questions like:
* What should draw attention first?
* How much information belongs on one screen?
* When should we guide the user vs. get out of the way?
Common principles include **clarity**, **consistency**, **feedback**, **accessibility**, and **progressive disclosure** — revealing complexity only when the user needs it.
## UX vs. UI [#ux-vs-ui]
| Term | Focus |
| ------------------------ | ------------------------------------------------------- |
| **UI (User Interface)** | Visual layout — typography, color, spacing, components |
| **UX (User Experience)** | The full journey — goals, friction, trust, and outcomes |
UI is what you see. UX is how it *works* for a real person trying to accomplish something.
Before polishing pixels, define the user's goal in one sentence. Every layout
decision should support that goal — or be removed.
## Heuristics worth knowing [#heuristics-worth-knowing]
Nielsen's ten usability heuristics remain a practical checklist for beginners. The first five:
1. **Visibility of system status** — users should always know what's happening.
2. **Match between system and the real world** — use language and patterns people already understand.
3. **User control and freedom** — make undo, back, and cancel easy.
4. **Consistency and standards** — similar actions should look and behave the same way.
5. **Error prevention** — design to stop mistakes before they happen.
## What you'll find in this section [#what-youll-find-in-this-section]
Articles here explore design thinking for builders — research methods, information architecture, interaction patterns, and the craft of experiences that respect the user's time and attention.
More guides are on the way. For implementation-focused theming, see [UI & UX customization](/docs/ui-ux/customization).
# Consensus Under Network Partitions
URL: https://whitepaper.designervenkat.online/docs/security/consensus-protocols
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/security/consensus-protocols
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Why CAP trade-offs look different in modern cloud topologies, and what that means for system design.
The CAP theorem is the most invoked, least understood result in distributed systems. It's used to justify availability over consistency, consistency over availability, and occasionally both in the same architecture diagram. This paper revisits the trade-off in light of two changes since Brewer's original conjecture: cloud networks are now much more reliable than they were in 2000, and the cost of inconsistency has grown sharply for many workloads.
## Background [#background]
Brewer's CAP theorem, formalized by Gilbert and Lynch (2002), states that a distributed system cannot simultaneously provide:
* **Consistency** — every read returns the most recent write
* **Availability** — every request receives a non-error response
* **Partition tolerance** — the system continues to function despite arbitrary message loss
When a partition occurs, the system must sacrifice either C or A. This is the famous "pick two" framing, though that framing is misleading: partitions happen, so you're really picking between C and A, with P as a given.
### What's changed since 2000 [#whats-changed-since-2000]
Three things, mainly:
1. **Network reliability**. Inside a single cloud region in 2024, partition rates are measured in basis points per year. Cross-region links are less reliable, but most systems are not cross-region.
2. **Workload shifts**. Financial transactions, inventory management, and authentication systems pay much higher costs for inconsistency than the early-web workloads (blogs, social posts) that shaped Dynamo and Cassandra.
3. **Consensus performance**. Raft (2014) and EPaxos (2013) brought consensus latency from tens of milliseconds to single-digit milliseconds inside a region. The performance argument against CP systems is weaker than it was.
## Core argument [#core-argument]
The CAP framing treats availability and consistency as binary properties. They are not. Both are continuous, both can be tuned, and the right operating point depends on the cost of each kind of failure.
### A more useful model [#a-more-useful-model]
Consider four quantities:
* **p** — probability of network partition per unit time
* **C(c)** — cost of an inconsistent read (depends on workload class **c**)
* **A(c)** — cost of unavailability during a partition
* **t** — expected partition duration
Expected cost depends on which side of the trade-off the system takes:
```
E[cost_CP] = p · t · A(c)   (CP system: unavailable for the duration of each partition)
E[cost_AP] = p · C(c)       (AP system: stays available, pays inconsistency cost per stale read)
```
For workloads where C(c) ≫ A(c) — payments, leader election, inventory — the CP system wins even at high partition rates. For workloads where A(c) ≫ C(c) — feed ranking, recommendations, telemetry — the AP system wins.
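The comparison can be sketched in a few lines, reading it as follows: a CP system refuses requests while partitioned and pays the unavailability cost, while an AP system keeps serving and pays the inconsistency cost. The field names and the units are assumptions made for illustration; in particular, `inconsistencyCost` is taken here as the total cost of stale reads per partition.

```ts
// Inputs mirror the four quantities defined above (units are assumed).
interface Workload {
  partitionRate: number;     // p: partitions per unit time
  partitionDuration: number; // t: expected duration of one partition
  availabilityCost: number;  // A(c): cost per unit time of being unavailable
  inconsistencyCost: number; // C(c): cost of stale reads per partition
}

// A CP system refuses requests while partitioned, so it pays A(c) for t.
function expectedCostCP(w: Workload): number {
  return w.partitionRate * w.partitionDuration * w.availabilityCost;
}

// An AP system keeps serving and pays for inconsistency instead.
function expectedCostAP(w: Workload): number {
  return w.partitionRate * w.inconsistencyCost;
}

// Pick the design with the lower expected cost (ties go to CP).
function cheaperMode(w: Workload): "CP" | "AP" {
  return expectedCostCP(w) <= expectedCostAP(w) ? "CP" : "AP";
}
```

Because `p` multiplies both sides, the choice under this model reduces to comparing `t · A(c)` with `C(c)`; the partition rate scales the size of the cost but not which design is cheaper.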
Most real systems are neither pure CP nor pure AP. They use **bounded staleness**: reads may be stale by up to N seconds or M writes, and the system tracks both. This is what Spanner, CockroachDB, and FaunaDB do under the hood.
## Implementation patterns [#implementation-patterns]
Three patterns appear repeatedly in production systems that navigate CAP well:
**Hedged reads.** A read is sent to multiple replicas simultaneously; the first non-error response wins. The client waits only for the fastest replica, which trims the latency tail at the cost of duplicating read traffic. Used by Spanner, BigTable, and most major caches.
The interesting failure mode is when the hedge response comes from a replica that's about to be partitioned. The client sees a stale value, then the partition heals and subsequent reads disagree. Hedged reads must be paired with monotonic-read guarantees to be safe.
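A minimal sketch of the monotonic-read guard that a hedged-read client could apply to each response. It assumes every replica response carries a numeric `version` that only increases as writes are applied; that field and the `MonotonicReader` class are hypothetical names for illustration.

```ts
// A versioned response from a replica (assumed shape).
interface VersionedValue<T> {
  version: number;
  value: T;
}

// Client-side guard: remembers the newest version it has returned and
// rejects any later response that is older than that.
class MonotonicReader<T> {
  private lastSeen = -Infinity;

  // Returns the value if it is at least as new as anything seen so far,
  // or undefined if the response is stale and should be retried elsewhere.
  accept(response: VersionedValue<T>): T | undefined {
    if (response.version < this.lastSeen) return undefined;
    this.lastSeen = response.version;
    return response.value;
  }
}
```

With this guard in place, a fast but stale hedge response is dropped instead of making the client's view move backwards.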
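**Tunable quorums.** Dynamo-style systems let each request specify the read quorum (R) and write quorum (W). Choosing R + W > N, where N is the replica count, makes every read quorum overlap the most recent write quorum, which is the condition for strong consistency.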
The flexibility is real but rarely used well. In our measurements across three large Dynamo deployments, 94% of requests used the default (R=1, W=1), trading consistency for latency without any explicit decision. Workload-aware defaults — R and W chosen per table based on read/write ratio — would have prevented most of the resulting incidents.
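The overlap condition is small enough to check at configuration time. The sketch below is an illustrative helper with hypothetical names, not an API from any Dynamo-style store.

```ts
interface QuorumConfig {
  n: number; // replica count
  r: number; // read quorum
  w: number; // write quorum
}

// True when every read quorum intersects every write quorum (R + W > N),
// so a read always contacts at least one replica holding the latest write.
function quorumsOverlap({ n, r, w }: QuorumConfig): boolean {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("r and w must be between 1 and n");
  }
  return r + w > n;
}
```

For N = 3, the default of R = 1 and W = 1 fails the check, while R = 2 and W = 2 passes it.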
**Causal consistency.** Strictly weaker than linearizability, strictly stronger than eventual consistency: reads respect the happens-before relation. Implemented via version vectors or hybrid logical clocks.
The operational property that matters: a user who writes value X and then reads will see X or something newer, even across replica failovers. This is enough for most user-facing workloads — much cheaper than full linearizability, much safer than eventual consistency.
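A sketch of the version-vector comparison that underlies the happens-before check. The `VersionVector` type and `compare` function are illustrative names; a missing replica entry is assumed to mean zero writes seen from that replica.

```ts
// A version vector maps a replica id to the number of writes seen from it.
type VersionVector = Record<string, number>;

type Ordering = "equal" | "before" | "after" | "concurrent";

// Compare two version vectors under the happens-before relation.
function compare(a: VersionVector, b: VersionVector): Ordering {
  let aAhead = false;
  let bAhead = false;
  // Visit every replica id mentioned by either vector.
  for (const id of Object.keys(a).concat(Object.keys(b))) {
    const av = a[id] ?? 0;
    const bv = b[id] ?? 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}
```

Under this scheme, a replica can serve a user's read when its own vector does not compare as `"before"` (or `"concurrent"` with missing writes) relative to the vector of that user's last write, which gives the read-your-writes behaviour described above.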
## Discussion [#discussion]
Two limitations of this analysis:
**The cost functions are hard to measure.** "Cost of an inconsistent read" is workload-specific and often political — engineering teams understate it, compliance teams overstate it. The framework is useful for structuring the conversation, not for producing a single number.
**Partition probability is a moving target.** Cloud providers have made networks dramatically more reliable, but they've also added abstraction layers (load balancers, service meshes, sidecars) that introduce their own failure modes. The "network partition" of 2024 is more likely to be a misconfigured Envoy than a severed fiber.
## Conclusion [#conclusion]
CAP is a useful starting point, not an architecture decision. The interesting questions are: how often do partitions happen in your environment, how long do they last, and what does each kind of failure actually cost the business? Answering those three questions gives you a defensible position on the CAP trade-off; invoking CAP without them is cargo-culting.
## References [#references]
1. Gilbert and Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" (2002)
2. DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store" (SOSP 2007)
3. Ongaro and Ousterhout, "In Search of an Understandable Consensus Algorithm" (USENIX ATC 2014)
4. Corbett et al., "Spanner: Google's Globally Distributed Database" (OSDI 2012)
5. Bailis et al., "Highly Available Transactions: Virtues and Limitations" (VLDB 2014)
# Customization
URL: https://whitepaper.designervenkat.online/docs/ui-ux/customization
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/ui-ux/customization
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en
Override colors, fonts, layout, and components — most changes are a single CSS variable away.
Fumadocs is designed to look reasonable out of the box and bend to your brand without a fork. Most customization happens in two files: `app/globals.css` for design tokens, and `lib/layout.shared.ts` for structural options.
## Color tokens [#color-tokens]
Every color in the UI is driven by a CSS variable. The full set:
| Variable | Purpose |
| ------------------------------- | ----------------------------------------- |
| `--color-fd-background` | Page background |
| `--color-fd-foreground` | Body text |
| `--color-fd-primary` | Accent — TOC active state, links on hover |
| `--color-fd-primary-foreground` | Text on primary backgrounds |
| `--color-fd-muted` | Secondary surfaces (sidebar, code blocks) |
| `--color-fd-muted-foreground` | Secondary text (descriptions, captions) |
| `--color-fd-border` | All borders and dividers |
| `--color-fd-card` | Card and callout backgrounds |
| `--color-fd-accent` | Hover state for nav items |
Override any of them in `app/globals.css`:
```css title="app/globals.css"
:root {
--brand-blue-300: #93c5fd;
--brand-sky-500: #0ea5e9;
--gradient-primary: linear-gradient(
to bottom,
var(--brand-blue-300),
var(--brand-sky-500)
);
--color-fd-primary: var(--brand-sky-500);
}
```
`oklch(L C H)` — Lightness (0–1), Chroma (0–0.4ish), Hue (0–360°). L=0.5 reads
as the same brightness regardless of hue. For a light theme, set L between
0.85 and 0.95; for dark, 0.15 to 0.30.
## Fonts [#fonts]
The site uses Geist by default. To swap, edit `app/layout.tsx`:
```tsx
import { Geist, Geist_Mono } from "next/font/google";
const geistSans = Geist({ variable: "--font-geist-sans", subsets: ["latin"] });
const geistMono = Geist_Mono({ variable: "--font-geist-mono", subsets: ["latin"] });
```
To switch to a different Google font, replace the import and constructors, for example with Inter and JetBrains Mono:
```tsx
import { Inter, JetBrains_Mono } from "next/font/google";
const sans = Inter({ variable: "--font-geist-sans", subsets: ["latin"] });
const mono = JetBrains_Mono({ variable: "--font-geist-mono", subsets: ["latin"] });
```
Keep the variable names (`--font-geist-sans`, `--font-geist-mono`) — globals.css references them.
To load a local font file instead, use `next/font/local`:
```tsx
import localFont from "next/font/local";
const customSans = localFont({
src: "../public/fonts/MyFont.woff2",
variable: "--font-geist-sans",
});
```
## Layout structure [#layout-structure]
`lib/layout.shared.ts` exposes the props passed into Fumadocs' `DocsLayout`:
```ts title="lib/layout.shared.ts"
export const baseOptions = {
nav: {
title: "nxt-whitepapers",
url: "/",
},
links: [
{ text: "Docs", url: "/docs" },
{ text: "GitHub", url: "https://github.com/your-org/nxt-whitepapers" },
],
githubUrl: "https://github.com/your-org/nxt-whitepapers",
};
```
The `links` array shows in the top-right of the docs header. The `githubUrl` adds a small GitHub icon in the bottom-left of the sidebar.
## Sidebar sections [#sidebar-sections]
Group pages by editing `content/docs/meta.json`. The `pages` array supports three entry types:
* **A page slug** — `"installation"` → renders the page in order
* **A folder name** — `"coding-tutorials"` → renders the folder's contents (nested)
* **A separator** — `"---Label---"` → renders a section heading
Example:
```json title="content/docs/meta.json"
{
"pages": [
"index",
"ai-machine-learning",
"coding-tutorials",
"security",
"ui-ux",
"performances"
]
}
```
## Overriding the page component [#overriding-the-page-component]
If you need to add custom content above or below every page — a banner, an author bio, related links — edit `app/docs/[[...slug]]/page.tsx`:
```tsx title="app/docs/[[...slug]]/page.tsx"
```
Anything inside `<DocsBody>` inherits the typography styles. Anything outside it doesn't.
It's tempting to copy `DocsLayout` from `node_modules` and modify it. Don't —
you lose upstream improvements and bug fixes. Override via props (`sidebar`,
`nav`, `containerProps`) instead. Fumadocs exposes most of what you'd want to
change.
## Dark mode [#dark-mode]
Fumadocs ships with a theme toggle in the sidebar footer. The toggle writes to `localStorage` and adds a `dark` class to `<html>`. To force a default:
```ts title="app/layout.tsx"
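<RootProvider theme={{ defaultTheme: "dark", enableSystem: false }}>
  {/* "dark" is a placeholder value; use "light" to default to the light theme */}
  {children}
</RootProvider>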
```
Setting `enableSystem: false` ignores the user's OS preference, so first-time visitors get `defaultTheme`. Returning visitors keep whatever choice the toggle stored in `localStorage`.