# MCP Documentation Server (@vjlanguage/mcp-vj-docs)

A Model Context Protocol (MCP) server for documentation crawling, indexing, and retrieval. This package provides tools for crawling websites, storing and indexing the content, and searching through that content using TF-IDF based search. The search results are optimized for large language models.

# MCP 文档服务器 (@vjlanguage/mcp-vj-docs)

一个用于文档爬取、索引和检索的模型上下文协议（MCP）服务器。该包提供了爬取网站、存储和索引内容以及使用基于 TF-IDF 的搜索来搜索内容的工具。搜索结果经过优化，适合大型语言模型使用。

## Features | 功能

- **Documentation Crawling**: Crawl documentation from websites using Firecrawl
- **Content Processing**: Convert HTML to Markdown and extract relevant content
- **Storage & Indexing**: Store documents using lowdb with TF-IDF based indexing
- **LLM-Optimized Search**: Search for documentation with aggregated results optimized for large language models
  - **Full Content Return**: No character length limits on search results
  - **Content-First Results**: Prioritizes content over URLs in search results
  - **Smart Deduplication**: Removes duplicate content and returns only the top 3 most relevant results
  - **AI-Optimized Format**: Results structured specifically for AI consumption and code generation
  - **Complete Document Context**: Returns full document content via `fullDocument` field for comprehensive context
- **Custom Corpus Management**: Add your own text corpus files for inclusion in search results
  - **Multiple Format Support**: Supports TXT, Markdown, and PDF files
  - **Automatic Indexing**: Files in corpus directory are automatically indexed and searchable
- **MCP Integration**: Expose tools for crawling and searching via Model Context Protocol
- **Path Handling**: Support for tilde (~) expansion in file paths
- **Server Modes**: Support for both SSE (Server-Sent Events) and stdio transports
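
To make the TF-IDF ranking mentioned above more concrete, here is a minimal sketch of how documents can be scored against a query. It is not taken from the package's source: the function names (`tokenize`, `buildIndex`, `search`), the smoothed IDF formula, and the sample documents are all assumptions chosen for illustration.

```javascript
// Minimal TF-IDF ranking sketch. All names are illustrative assumptions,
// not the package's actual implementation.

// Lowercase the text and split it on non-alphanumeric characters.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build per-document term counts plus document frequencies for each term.
function buildIndex(docs) {
  const df = new Map();
  const entries = docs.map((doc) => {
    const tokens = tokenize(doc.content);
    const tf = new Map();
    for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
    for (const t of tf.keys()) df.set(t, (df.get(t) || 0) + 1);
    return { doc, tf, length: tokens.length };
  });
  return { entries, df, size: docs.length };
}

// Score = sum over query terms of (normalized term frequency) * (smoothed idf).
// The default limit of 3 mirrors the "top 3" behavior described above.
function search(index, query, limit = 3) {
  const terms = tokenize(query);
  return index.entries
    .map(({ doc, tf, length }) => {
      let score = 0;
      for (const t of terms) {
        const count = tf.get(t) || 0;
        if (count === 0) continue;
        const idf = Math.log(1 + index.size / index.df.get(t));
        score += (count / length) * idf;
      }
      return { url: doc.url, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const index = buildIndex([
  { url: "/docs/auth", content: "JWT authentication with tokens" },
  { url: "/docs/api", content: "HTTP endpoints and API requests" },
  { url: "/docs/guide", content: "Getting started guide for the API" },
]);
console.log(search(index, "jwt authentication"));
```

Terms that appear in fewer documents get a higher IDF weight, so a rare term such as "jwt" contributes more to the score than a common one such as "api".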

## 功能

- **文档爬取**：使用 Firecrawl 从网站爬取文档
- **内容处理**：将 HTML 转换为 Markdown 并提取相关内容
- **存储和索引**：使用 lowdb 存储文档，并使用基于 TF-IDF 的索引
- **LLM 优化搜索**：搜索文档并返回经过聚合的结果，专为大型语言模型优化
  - **完整内容返回**：搜索结果没有字符长度限制
  - **内容优先结果**：在搜索结果中优先考虑内容而非 URL
  - **智能去重**：移除重复内容并仅返回前 3 个最相关的结果
  - **AI 优化格式**：结果结构专为 AI 消费和代码生成而设计
  - **完整文档上下文**：通过 `fullDocument` 字段返回完整文档内容，提供全面的上下文
- **自定义语料库管理**：添加您自己的文本语料库文件以包含在搜索结果中
  - **多格式支持**：支持 TXT、Markdown 和 PDF 文件
  - **自动索引**：语料目录中的文件自动索引并可搜索
- **MCP 集成**：通过模型上下文协议暴露爬取和搜索工具
- **路径处理**：支持波浪号（~）在文件路径中的扩展
- **服务器模式**：支持 SSE（服务器发送事件）和 stdio 传输

## Changelog | 更新日志

### 2025-04-11
- **Search Result Enhancement**: Modified search functionality to include relevant paragraphs for each individual result item, rather than only showing content for the top result.
- **Result Format Improvement**: Changed the structure to make it clearer which document content belongs to which search result.
- **Document Retrieval Enhancement**: Improved the `vjdoc_get_document` tool to support partial matching for both URL and title parameters.

### 2025年04月11日
- **搜索结果增强**：修改了搜索功能，为每个单独的结果项包含相关段落，而不仅仅是显示顶部结果的内容。
- **结果格式改进**：更改了结构，使其更清晰地显示哪些文档内容属于哪个搜索结果。
- **文档检索增强**：改进了 `vjdoc_get_document` 工具，支持 URL 和标题参数的部分匹配。

## Installation | 安装

```bash
# Install globally | 全局安装
npm install -g @vjlanguage/mcp-vj-docs

# Or use with npx | 或使用 npx
npx @vjlanguage/mcp-vj-docs
```

## Firecrawl Registration and API Key | Firecrawl 注册和 API 密钥

### English

This package uses the Firecrawl service for web crawling. To use it, you need to:

1. **Register for Firecrawl**:
   - Visit [Firecrawl website](https://firecrawl.dev) and create an account
   - Or use the local Firecrawl service by setting `FIRECRAWL_API_URL` to your local endpoint

2. **Get your API Key**:
   - After registration, navigate to your account dashboard
   - Find and copy your API key
   - Add this key to your environment variables or MCP configuration

3. **Configure the API Key**:
   - Set the `FIRECRAWL_API_KEY` environment variable
   - Or add it to your MCP configuration (see example below)

### 中文

本包使用 Firecrawl 服务进行网页爬取。要使用它，您需要：

1. **注册 Firecrawl**：
   - 访问 [Firecrawl 网站](https://firecrawl.dev) 并创建账户
   - 或通过设置 `FIRECRAWL_API_URL` 为您的本地端点来使用本地 Firecrawl 服务

2. **获取您的 API 密钥**：
   - 注册后，导航到您的账户仪表板
   - 找到并复制您的 API 密钥
   - 将此密钥添加到您的环境变量或 MCP 配置中

3. **配置 API 密钥**：
   - 设置 `FIRECRAWL_API_KEY` 环境变量
   - 或将其添加到您的 MCP 配置中（见下面的示例）

## Usage | 使用方法

### Environment Variables | 环境变量

- `VJDOC_DB_PATH` - Path to the database file (default: ./data/docs.json) | 数据库文件路径（默认：./data/docs.json）
- `VJDOC_MAX_DEPTH` - Maximum depth to crawl (default: 3) | 最大爬取深度（默认：3）
- `VJDOC_MAX_PAGES` - Maximum number of pages to crawl (default: 100) | 最大爬取页面数（默认：100）
- `VJDOC_LOG_DIR` - Directory for log files | 日志文件目录
- `VJDOC_LOG_TO_FILE` - Whether to log to file (true/false) | 是否记录到文件（true/false）
- `VJDOC_LOG_LEVEL` - Log level (error, warn, info, debug) | 日志级别（error, warn, info, debug）
- `FIRECRAWL_API_KEY` - API key for Firecrawl service | Firecrawl 服务的 API 密钥
- `FIRECRAWL_API_URL` - Custom URL for Firecrawl API | Firecrawl API 的自定义 URL
- `MCP_TRANSPORT` - Transport method (sse or stdio, default: sse) | 传输方法（sse 或 stdio，默认：sse）
- `VJDOC_TFIDF_FILES_DIR` - Directory for custom corpus files (default: ~/mcpdata/tfidf_files) | 自定义语料库文件目录（默认：~/mcpdata/tfidf_files）

```json
{
  "mcpServers": {
    "mcp-vj-docs": {
      "command": "npx",
      "args": ["-y", "@vjlanguage/mcp-vj-docs@latest"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
        "VJDOC_MAX_DEPTH": "4",
        "VJDOC_MAX_PAGES": "100",
        "VJDOC_DB_PATH": "~/mcpdata/docs.json",
        "VJDOC_LOG_DIR": "~/mcpdata/logs",
        "VJDOC_LOG_TO_FILE": "true",
        "VJDOC_LOG_LEVEL": "debug",
        "FIRECRAWL_API_URL": "http://localhost:5002",
        "VJDOC_TFIDF_FILES_DIR": "~/mcpdata/tfidf_files"
      },
      "disabled": false,
      "timeout": 3600,
      "autoApprove": ["vjdoc_search", "vjdoc_crawl", "vjdoc_add_corpus_file"]
    }
  }
}
```
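
As a rough illustration of how the environment variables and defaults listed above might be resolved, the sketch below reads them into a single configuration object and expands a leading `~`. The helper names (`expandTilde`, `intOrDefault`, `loadConfig`) and the use of `HOME`/`USERPROFILE` for the home directory are assumptions; the package's own logic may differ.

```javascript
// Expand a leading "~" to the given home directory.
function expandTilde(p, home) {
  if (p === "~") return home;
  if (p.startsWith("~/")) return home + p.slice(1);
  return p;
}

// Parse a positive integer, falling back when the value is unset or invalid.
function intOrDefault(value, fallback) {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Resolve settings from an env-like object, using the documented defaults.
function loadConfig(env = process.env) {
  // Assumption: the home directory comes from HOME (or USERPROFILE on Windows).
  const home = env.HOME || env.USERPROFILE || "";
  const path = (value, fallback) => expandTilde(value || fallback, home);
  return {
    dbPath: path(env.VJDOC_DB_PATH, "./data/docs.json"),
    maxDepth: intOrDefault(env.VJDOC_MAX_DEPTH, 3),
    maxPages: intOrDefault(env.VJDOC_MAX_PAGES, 100),
    transport: env.MCP_TRANSPORT === "stdio" ? "stdio" : "sse",
    corpusDir: path(env.VJDOC_TFIDF_FILES_DIR, "~/mcpdata/tfidf_files"),
  };
}

console.log(loadConfig({ HOME: "/home/dev", VJDOC_MAX_DEPTH: "4" }));
```

Passing an explicit object instead of `process.env` makes the resolution easy to check without touching the real environment.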

## MCP Tools | MCP 工具

The server exposes the following MCP tools:
服务器暴露以下 MCP 工具：

### 1. `vjdoc_crawl` Tool | `vjdoc_crawl` 工具

Crawls a website and indexes its content for search.
爬取网站并为搜索索引其内容。

**Parameters | 参数:**
- `url` (string, required): The URL to crawl (e.g., "https://example.com/docs") | 要爬取的 URL（例如，"https://example.com/docs"）
- `maxDepth` (number, optional): Maximum depth to crawl, default: 3 | 最大爬取深度，默认：3
- `maxPages` (number, optional): Maximum number of pages to crawl, default: 100 | 最大爬取页面数，默认：100
- `includePatterns` (array of strings, optional): Patterns to include in crawl (e.g., ["docs/*"]) | 要包含在爬取中的模式（例如，["docs/*"]）
- `excludePatterns` (array of strings, optional): Patterns to exclude from crawl (e.g., ["blog/*"]) | 要从爬取中排除的模式（例如，["blog/*"]）
- `defaultCategory` (string, optional): Default category for documents if not detected automatically | 如果未自动检测到，文档的默认类别

**Example | 示例:**
```json
{
  "url": "https://example.com/docs",
  "maxDepth": 3,
  "maxPages": 100,
  "includePatterns": ["docs/*"],
  "excludePatterns": ["blog/*"]
}
```

**Response | 响应:**
```json
{
  "success": true,
  "message": "Successfully crawled and indexed 42 pages from https://example.com/docs",
  "count": 42
}
```
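
How `includePatterns` and `excludePatterns` are interpreted is ultimately up to the crawler. The sketch below shows one plausible reading, assuming simple globs matched against the URL path, where `*` matches any run of characters and exclusions win over inclusions. `globToRegExp` and `shouldCrawl` are hypothetical helpers, not part of this package or of Firecrawl.

```javascript
// Convert a glob such as "docs/*" into an anchored RegExp.
// Assumption: "*" matches any run of characters, including "/".
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Decide whether a URL should be crawled under the given patterns.
function shouldCrawl(url, includePatterns = [], excludePatterns = []) {
  // Match against the path without its leading slash, e.g. "docs/intro".
  const path = new URL(url).pathname.replace(/^\//, "");
  const matches = (patterns) => patterns.some((p) => globToRegExp(p).test(path));
  if (matches(excludePatterns)) return false;
  // An empty include list means "include everything not excluded".
  return includePatterns.length === 0 || matches(includePatterns);
}

console.log(shouldCrawl("https://example.com/docs/intro", ["docs/*"], ["blog/*"]));
console.log(shouldCrawl("https://example.com/blog/post", ["docs/*"], ["blog/*"]));
```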

### 2. `vjdoc_search` Tool | `vjdoc_search` 工具

Searches indexed documents with results optimized for large language models.
搜索已索引的文档，结果经过优化，适合大型语言模型。

**Parameters | 参数:**
- `query` (string, required): The search query (e.g., "how to use the API") | 搜索查询（例如，"如何使用 API"）
- `limit` (number, optional): Maximum number of sources to consider, default: 10 | 要考虑的最大源数，默认：10
- `filters` (object, optional): Optional filters to narrow down search results | 可选过滤器，用于缩小搜索结果范围
  - `categories` (array of strings, optional): Filter by document categories | 按文档类别过滤
  - `dateFrom` (number, optional): Filter documents created after this timestamp | 过滤在此时间戳之后创建的文档
  - `dateTo` (number, optional): Filter documents created before this timestamp | 过滤在此时间戳之前创建的文档
  - `metadata` (object, optional): Filter by metadata fields | 按元数据字段过滤
- `userId` (string, optional): Optional user ID for personalized results | 可选的用户 ID，用于个性化结果

**Example | 示例:**
```json
{
  "query": "how to use the API",
  "limit": 5,
  "filters": {
    "categories": ["API Documentation"]
  }
}
```

**Response | 响应:**
```json
{
  "success": true,
  "results": {
    "paragraph": "The API can be used by making HTTP requests to the endpoints...",
    "sources": [
      {
        "url": "https://example.com/docs/api",
        "title": "API Documentation",
        "relevance": 0.85,
        "paragraph": "The API can be used by making HTTP requests to the endpoints...",
        "highlightedParagraph": "The **API** can be used by making **HTTP** requests to the **endpoints**...",
        "fullDocument": "Complete document content for this specific result..."
      }
    ]
  }
}
```
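
The `filters` parameter can be thought of as a predicate applied to each candidate document. The following sketch shows one way such a predicate could work. It assumes each document carries `category`, `timestamp` (milliseconds since the epoch), and `metadata` fields, that date bounds are inclusive, and that metadata values are compared by strict equality; none of these details are confirmed by the package.

```javascript
// Return true when a document passes every filter that is set.
// Field names on `doc` are assumptions for illustration.
function matchesFilters(doc, filters = {}) {
  const { categories, dateFrom, dateTo, metadata } = filters;
  if (categories && categories.length > 0 && !categories.includes(doc.category)) {
    return false;
  }
  if (dateFrom !== undefined && doc.timestamp < dateFrom) return false;
  if (dateTo !== undefined && doc.timestamp > dateTo) return false;
  if (metadata) {
    for (const [key, value] of Object.entries(metadata)) {
      if (!doc.metadata || doc.metadata[key] !== value) return false;
    }
  }
  return true;
}

const sampleDoc = {
  url: "https://example.com/docs/api",
  category: "API Documentation",
  timestamp: 1712190000000,
  metadata: { version: "v2" },
};
console.log(matchesFilters(sampleDoc, { categories: ["API Documentation"] }));
```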

### 3. `vjdoc_add_corpus_file` Tool | `vjdoc_add_corpus_file` 工具

Adds a custom corpus file to the TF-IDF files directory for inclusion in search results. This is perfect for adding your own code snippets, documentation, error solutions, or technical notes that you want to be searchable.

向 TF-IDF 文件目录添加自定义语料库文件，以包含在搜索结果中。这非常适合添加您自己的代码片段、文档、错误解决方案或技术笔记，使它们可被搜索。

**Parameters | 参数:**
- `content` (string, required): The text content to add to the corpus file | 要添加到语料库文件的文本内容
- `filename` (string, optional): Optional filename for the corpus file (without extension) | 语料库文件的可选文件名（不带扩展名）
- `category` (string, optional): Optional category for the corpus file | 语料库文件的可选类别

**Recommended Categories | 推荐类别:**
- `Code Snippet` - Reusable code patterns and examples | 可重用的代码模式和示例
- `API Documentation` - Function and parameter descriptions | 函数和参数描述
- `Error Solution` - Common errors and their fixes | 常见错误及其修复方法
- `Technical Note` - Personal learning summaries | 个人学习总结

**Example | 示例:**
```json
{
  "content": "// Quicksort implementation\nfunction quickSort(arr) {\n  if (arr.length <= 1) return arr;\n  const pivot = arr[0];\n  const left = [];\n  const right = [];\n  for (let i = 1; i < arr.length; i++) {\n    arr[i] < pivot ? left.push(arr[i]) : right.push(arr[i]);\n  }\n  return [...quickSort(left), pivot, ...quickSort(right)];\n}\n\n// Common error: Uncaught TypeError\n// Solution: check whether the variable is null/undefined",
  "filename": "quicksort_algorithm",
  "category": "Code Snippet"
}
```

**Response | 响应:**
```json
{
  "success": true,
  "message": "Successfully added corpus file: code_snippet_quicksort_algorithm.txt",
  "filename": "code_snippet_quicksort_algorithm.txt",
  "category": "Code Snippet"
}
```

### 4. `vjdoc_get_docs_meta` Tool | `vjdoc_get_docs_meta` 工具

Retrieves metadata about all documents and corpus files to help LLMs understand the available content and plan effective searches.

获取所有文档和语料库文件的元数据，帮助大型语言模型了解可用内容并规划有效的搜索。

**Parameters | 参数:**
- `query` (string, required): Natural language query or requirement | 自然语言查询或需求

**Response Format | 响应格式:**
```json
{
  "query": "Original natural language query",
  "documents": [
    {
      "url": "Document URL",
      "title": "Document title",
      "category": "Document category",
      "timestamp": 1712190000000,
      "keywords": ["keyword1", "keyword2", "..."],
      "summary": "Brief summary of document content..."
    }
  ],
  "totalDocuments": 42,
  "categories": ["API Documentation", "Code Snippet", "..."],
  "suggestion": "Search guidance for LLMs"
}
```

### 5. `vjdoc_get_document` Tool | `vjdoc_get_document` 工具

Gets the full content of a specific document by URL or title.

通过 URL 或标题获取特定文档的完整内容。

**Parameters | 参数:**
- `url` (string, optional): URL of the document to retrieve | 要检索的文档的 URL
- `title` (string, optional): Title of the document to retrieve | 要检索的文档的标题

**Notes | 注意:**
- At least one of `url` or `title` must be provided | 必须提供 `url` 或 `title` 中的至少一个
- The tool supports partial matching for both parameters | 该工具支持两个参数的部分匹配
  - When using `url` parameter, it will find documents where the URL contains the provided string | 使用 `url` 参数时，它将查找 URL 包含所提供字符串的文档
  - When using `title` parameter, it will find documents where the title contains the provided string (case-insensitive) | 使用 `title` 参数时，它将查找标题包含所提供字符串的文档（不区分大小写）
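
The matching rules above can be sketched as follows. This is an illustration only: `findDocument` is a hypothetical helper, and returning the first match (and requiring both conditions when both parameters are given) are assumptions, since the notes do not say how multiple matches are resolved.

```javascript
// Find a document by partial URL and/or partial, case-insensitive title.
function findDocument(docs, { url, title } = {}) {
  if (!url && !title) {
    throw new Error("At least one of `url` or `title` must be provided");
  }
  const match = docs.find((doc) => {
    // URL: substring match, case-sensitive.
    const urlOk = !url || doc.url.includes(url);
    // Title: substring match, case-insensitive.
    const titleOk = !title || doc.title.toLowerCase().includes(title.toLowerCase());
    return urlOk && titleOk;
  });
  return match ?? null;
}

const docs = [
  { url: "https://example.com/docs/auth", title: "Authentication Guide" },
  { url: "https://example.com/docs/api", title: "API Documentation" },
];
console.log(findDocument(docs, { url: "/docs/auth" }));
console.log(findDocument(docs, { title: "authentication" }));
```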

**Example | 示例:**
```json
{
  "url": "https://example.com/docs/auth"
}
```

or | 或

```json
{
  "title": "Authentication Guide"
}
```

**Response | 响应:**
```json
{
  "url": "https://example.com/docs/auth",
  "title": "Authentication Guide",
  "content": "Complete document content...",
  "metadata": {
    "category": "API Documentation",
    "lastModified": "2023-01-15T12:00:00Z"
  }
}
```

## Using with AI Coding Assistants | 与 AI 编码助手一起使用

You can use these MCP tools with various AI coding assistants to enhance your documentation workflow.
您可以在各种 AI 编码助手中使用这些 MCP 工具来增强您的文档工作流程。

### Using with Cursor | 在 Cursor 中使用

In Cursor, you can use the MCP tools through the command interface:
在 Cursor 中，您可以通过命令界面使用 MCP 工具：

1. **Setup | 设置**: Configure Cursor to use your MCP server | 配置 Cursor 使用您的 MCP 服务器

2. **Crawling | 爬取**: Use the `/mcp` command to invoke the crawl tool | 使用 `/mcp` 命令调用 crawl 工具
   ```
   /mcp mcp-vj-docs vjdoc_crawl {"url": "https://example.com/docs", "maxDepth": 3, "maxPages": 100}
   ```

3. **Searching | 搜索**: Use the `/mcp` command to invoke the search tool | 使用 `/mcp` 命令调用 search 工具
   ```
   /mcp mcp-vj-docs vjdoc_search {"query": "authentication", "limit": 5, "filters": {"categories": ["API Documentation"]}}
   ```

4. **Adding Corpus Files | 添加语料库文件**: Use the `/mcp` command to add custom corpus files | 使用 `/mcp` 命令添加自定义语料库文件
   ```
   /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "// Your code here", "category": "Code Snippet"}
   ```

5. **Getting Document Content | 获取文档内容**: Use the `/mcp` command to get full document content | 使用 `/mcp` 命令获取完整文档内容
   ```
   /mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth"}
   ```
   or | 或
   ```
   /mcp mcp-vj-docs vjdoc_get_document {"title": "Authentication Guide"}
   ```

### Advanced Workflow with AI Assistants | 与 AI 助手的高级工作流程

When working with AI assistants like Claude or GPT, you can create a more effective workflow:

1. **First, get document metadata** to understand what's available:
   ```
   /mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "I need to implement JWT authentication"}
   ```

2. **Then, search for relevant documents**:
   ```
   /mcp mcp-vj-docs vjdoc_search {"query": "JWT authentication implementation", "limit": 3}
   ```

3. **Finally, get the full content** of the most relevant document for comprehensive context:
   ```
   /mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth/jwt"}
   ```

4. **Ask the AI assistant** to explain or generate code based on the full document:
   ```
   Based on this documentation, please explain how to implement JWT authentication in my Node.js application.
   ```

This workflow ensures the AI has complete context while minimizing token usage by only retrieving full content for the most relevant documents.

当与 Claude 或 GPT 等 AI 助手一起工作时，您可以创建更有效的工作流程：

1. **首先，获取文档元数据**以了解有哪些可用内容：
   ```
   /mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "我需要实现 JWT 认证"}
   ```

2. **然后，搜索相关文档**：
   ```
   /mcp mcp-vj-docs vjdoc_search {"query": "JWT 认证实现", "limit": 3}
   ```

3. **最后，获取最相关文档的完整内容**以获得全面的上下文：
   ```
   /mcp mcp-vj-docs vjdoc_get_document {"url": "https://example.com/docs/auth/jwt"}
   ```

4. **请求 AI 助手**基于完整文档解释或生成代码：
   ```
   根据这份文档，请解释如何在我的 Node.js 应用程序中实现 JWT 认证。
   ```

这个工作流程确保 AI 拥有完整的上下文，同时通过仅检索最相关文档的完整内容来最小化令牌使用。

## Troubleshooting | 故障排除

### Common Issues | 常见问题

1. **Database Path Issues | 数据库路径问题**
   - Ensure the directory for your database exists | 确保您的数据库目录存在
   - Check if you have write permissions to the specified path | 检查您是否有写入指定路径的权限
   - For tilde paths, ensure your home directory is correctly detected | 对于波浪号路径，确保正确检测到您的主目录

2. **Firecrawl API Issues | Firecrawl API 问题**
   - Verify your API key is correct | 验证您的 API 密钥是否正确
   - Check if you've reached API rate limits | 检查您是否达到了 API 速率限制
   - If using a local Firecrawl service, ensure it's running | 如果使用本地 Firecrawl 服务，确保它正在运行

3. **Crawling Issues | 爬取问题**
   - Some websites may block crawlers | 某些网站可能会阻止爬虫
   - Check if the website requires authentication | 检查网站是否需要身份验证
   - Try reducing the crawl depth and page limit | 尝试减少爬取深度和页面限制

### Logs | 日志

Check the logs for more detailed error information:
查看日志以获取更详细的错误信息：

- If `VJDOC_LOG_TO_FILE` is enabled, check the log files in your log directory | 如果启用了 `VJDOC_LOG_TO_FILE`，请检查日志目录中的日志文件
- Otherwise, check the console output | 否则，检查控制台输出

## Search Tool Response Format | 搜索工具响应格式

The `vjdoc_search` tool returns results in the following format:

```json
{
  "results": [
    {
      "url": "https://example.com/docs/api",
      "title": "API Documentation",
      "relevance": 0.85,
      "category": "API Documentation",
      "paragraph": "Content excerpt most relevant to this document...",
      "highlightedParagraph": "Content with **highlighted** query terms for this document...",
      "fullDocument": "Complete content for this specific document..." // Only present for the most relevant result
    },
    {
      "url": "https://example.com/docs/guide",
      "title": "User Guide",
      "relevance": 0.75,
      "category": "Documentation",
      "paragraph": "Content excerpt most relevant to this document...",
      "highlightedParagraph": "Content with **highlighted** query terms for this document..."
      // No fullDocument field for lower-ranked results
    },
    // More results...
  ],
  "content": "Summary of content most relevant to the query...",
  "fullDocument": "Complete document of the most relevant result",
  "personalized": true
}
```

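Key fields:
- `results`: List of sources with relevance scores
  - Each result includes:
    - `url`: Document URL
    - `title`: Document title
    - `relevance`: Relevance score
    - `category`: Document category
    - `paragraph`: Relevant paragraph excerpt from this specific document
    - `highlightedParagraph`: Paragraph from this document with query terms highlighted
    - `fullDocument`: Complete document content (only for the most relevant result)
- `content`: Summary of the extracted content relevant to the query
- `fullDocument`: Complete document content of the most relevant result
- `personalized`: Whether the results were personalized based on the user ID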

### 搜索工具响应格式

`vjdoc_search` 工具返回以下格式的结果：

```json
{
  "results": [
    {
      "url": "https://example.com/docs/api",
      "title": "API Documentation",
      "relevance": 0.85,
      "category": "API Documentation",
      "paragraph": "与此文档最相关的内容段落...",
      "highlightedParagraph": "带有**高亮**查询词的此文档段落...",
      "fullDocument": "此特定文档的完整内容..." // 只有最相关的结果才包含此字段
    },
    {
      "url": "https://example.com/docs/guide",
      "title": "User Guide",
      "relevance": 0.75,
      "category": "Documentation",
      "paragraph": "与此文档最相关的内容段落...",
      "highlightedParagraph": "带有**高亮**查询词的此文档段落..."
      // 较低排名的结果没有 fullDocument 字段
    },
    // 更多结果...
  ],
  "content": "与查询最相关的摘要内容...",
  "fullDocument": "最相关结果的完整文档内容",
  "personalized": true
}
```

关键字段：
- `results`: 带有相关性分数的来源列表
  - 每个结果包括：
    - `url`: 文档 URL
    - `title`: 文档标题
    - `relevance`: 相关性分数
    - `category`: 文档类别
    - `paragraph`: 来自此特定文档的相关段落摘录
    - `highlightedParagraph`: 带有高亮查询词的此文档段落
    - `fullDocument`: 完整的文档内容（仅适用于最相关的结果）
- `content`: 与查询相关的提取内容摘要
- `fullDocument`: 最相关结果的完整文档内容
- `personalized`: 结果是否基于用户 ID 进行了个性化

## Examples | 示例

### Searching Across Database and Corpus | 在数据库和语料库中搜索

```
/mcp mcp-vj-docs vjdoc_search {"query": "authentication", "limit": 5}
```

This will search for "authentication" in both the crawled documents (database) and your custom corpus files.

### Using Natural Language Queries | 使用自然语言查询

For natural language requirements, you can use the metadata tool first:

```
/mcp mcp-vj-docs vjdoc_get_docs_meta {"query": "I need to implement user authentication in my React application"}
```

Then use the search tool with the refined query:

```
/mcp mcp-vj-docs vjdoc_search {"query": "React authentication implementation", "filters": {"categories": ["Code Snippet", "API Documentation"]}}
```

### Utilizing the fullDocument Field | 利用 fullDocument 字段

When working with LLMs, you can use the `fullDocument` field to provide comprehensive context:

```javascript
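// Example of passing the fullDocument field to an LLM.
// `searchDocs` and `llm.generateCode` are placeholder names for your own
// search wrapper and LLM client.
const searchResults = await searchDocs("how to implement JWT authentication");
const fullContext = searchResults.fullDocument;

// Ask the LLM to generate code based on the full document.
const generatedCode = await llm.generateCode(
  `Based on this document: ${fullContext}\n\nGenerate a JWT authentication implementation`
);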
```


## Real-World Use Cases | 实际使用场景

### Personal Knowledge Base | 个人知识库

- Save code snippets you frequently use for easy reference | 保存您经常使用的代码片段以便于参考
- Document API endpoints with examples | 使用示例记录 API 端点
- Keep track of error messages and their solutions | 跟踪错误消息及其解决方案
- Store configuration examples for different environments | 存储不同环境的配置示例
- Create a personal knowledge base of technical notes | 创建技术笔记的个人知识库

**Pro Tip | 专业提示:** 
Organize your corpus files with consistent categories to make searching more effective. You can then filter search results by category to find exactly what you need!
使用一致的类别组织您的语料库文件，使搜索更有效。然后，您可以按类别过滤搜索结果，以找到您需要的确切内容！

## PDF Support | PDF 支持

The system now supports adding PDF files to the corpus. PDFs are automatically converted to Markdown format for better searchability. | 系统现在支持将PDF文件添加到语料库。PDF会自动转换为Markdown格式以提高可搜索性。

**Adding a PDF file in Cline | 在Cline中添加PDF文件**:

Simply provide the absolute path to your PDF file:
```bash
cline mcp mcp-vj-docs vjdoc_add_corpus_file --filePath "/absolute/path/to/your/document.pdf" --category "Documentation"
```

**Adding a PDF file in Cursor | 在Cursor中添加PDF文件**:

Simply provide the absolute path to your PDF file:
```
/mcp mcp-vj-docs vjdoc_add_corpus_file {"filePath": "/absolute/path/to/your/document.pdf", "category": "Documentation"}
```

The system extracts text from the PDF and converts it to Markdown format, preserving structure like headings, code blocks, and lists where possible. | 系统从PDF中提取文本并将其转换为Markdown格式，尽可能保留标题、代码块和列表等结构。

## How It Works | 工作原理

1. When you add a corpus file, it's saved to the `VJDOC_TFIDF_FILES_DIR` directory | 当您添加语料库文件时，它会保存到 `VJDOC_TFIDF_FILES_DIR` 目录
2. If you don't specify a filename, one will be generated automatically | 如果您不指定文件名，将自动生成一个
3. The category will be added as a prefix to the filename | 类别将作为前缀添加到文件名中
4. The file is automatically indexed and will appear in search results | 文件会自动索引并出现在搜索结果中
5. You can search for this content later using the `vjdoc_search` tool | 您可以稍后使用 `vjdoc_search` 工具搜索此内容
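
The naming steps above can be sketched as a small function. The response example earlier maps the category `Code Snippet` and filename `quicksort_algorithm` to `code_snippet_quicksort_algorithm.txt`, which suggests lowercasing and underscore separators. The `slugify` rule and the `corpus_<timestamp>` fallback for missing filenames are assumptions, not the package's documented behavior.

```javascript
// Lowercase the text and collapse non-alphanumeric runs into underscores.
function slugify(text) {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Build "<category>_<filename>.txt"; generate a name when none is given.
// The timestamp-based fallback is a hypothetical naming scheme.
function buildCorpusFilename(category, filename, now = Date.now()) {
  const base = filename ? slugify(filename) : `corpus_${now}`;
  const prefix = category ? slugify(category) + "_" : "";
  return `${prefix}${base}.txt`;
}

console.log(buildCorpusFilename("Code Snippet", "quicksort_algorithm"));
// prints "code_snippet_quicksort_algorithm.txt"
```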

## Practical Workflow Examples | 实用工作流程示例

Here are some practical workflows combining these tools:
以下是结合这些工具的一些实用工作流程：

1. **Documentation Indexing | 文档索引**
   - Crawl your project documentation: | 爬取您的项目文档：
     ```
     /mcp mcp-vj-docs vjdoc_crawl {"url": "https://your-project-docs.com"}
     ```
   - Add custom code snippets: | 添加自定义代码片段：
     ```
     /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "// Your code here", "category": "Code Snippet"}
     ```
   - Search across all indexed content: | 搜索所有已索引内容：
     ```
     /mcp mcp-vj-docs vjdoc_search {"query": "how to implement feature X"}
     ```

2. **Personal Knowledge Base | 个人知识库**
   - Add error solutions as you encounter them: | 添加您遇到的错误解决方案：
     ```
     /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "Error: Module not found\nSolution: Run npm install", "category": "Error Solution"}
     ```
   - Add API documentation for your projects: | 为您的项目添加 API 文档：
     ```
     /mcp mcp-vj-docs vjdoc_add_corpus_file {"content": "function getData(id) - Retrieves data by ID from the API", "category": "API Documentation"}
     ```
   - Search your knowledge base when needed: | 在需要时搜索您的知识库：
     ```
     /mcp mcp-vj-docs vjdoc_search {"query": "module not found", "filters": {"categories": ["Error Solution"]}}
     ```

