# Process Management Improvements

This document describes the process management improvements implemented in Conduit.

## Overview

The updated implementation provides:
- Robust PID file handling with file locking
- Graceful shutdown with connection draining
- Health check endpoints for monitoring
- Integration with PM2 and systemd

## Key Components

### 1. PidManager (`src/utils/pidManager.ts`)
- File locking prevents race conditions
- Automatic cleanup of stale PID files
- Graceful stop with configurable timeout
- Process existence verification
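
The PID file behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/utils/pidManager.ts`: the function names `isProcessRunning`, `readLivePid`, and `writePidFile` are hypothetical, and exclusive file creation (the `wx` flag) stands in for whatever locking mechanism the real implementation uses.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: signal 0 tests for existence without delivering a signal.
export function isProcessRunning(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Returns the recorded PID if that process is alive.
// Otherwise removes the stale or corrupt file and returns null.
export function readLivePid(pidFile: string): number | null {
  let raw: string;
  try {
    raw = fs.readFileSync(pidFile, "utf8");
  } catch {
    return null; // no PID file present
  }
  const pid = Number.parseInt(raw.trim(), 10);
  if (Number.isInteger(pid) && pid > 0 && isProcessRunning(pid)) {
    return pid;
  }
  fs.rmSync(pidFile, { force: true });
  return null;
}

// Claims the PID file. Returns false if a live process already holds it.
export function writePidFile(pidFile: string, pid: number = process.pid): boolean {
  if (readLivePid(pidFile) !== null) return false;
  try {
    // "wx" fails with EEXIST if another process created the file first.
    fs.writeFileSync(pidFile, String(pid), { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}
```

Creating the file with `wx` makes the claim atomic at the filesystem level, so two processes starting at the same moment cannot both believe they own the PID file.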

### 2. GracefulShutdown (`src/utils/gracefulShutdown.ts`)
- Handles SIGTERM/SIGINT signals
- Tracks active connections
- Waits for requests to complete
- Configurable shutdown timeout (default: 30s)
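
One way to picture connection tracking and draining is the sketch below. It is an assumption-laden illustration rather than the actual `GracefulShutdown` class: `ConnectionTracker` and its methods are hypothetical names, and only the 30-second default comes from this document.

```typescript
// Hypothetical tracker; the real GracefulShutdown class may be structured differently.
export class ConnectionTracker {
  private active = 0;
  private shuttingDown = false;
  private waiters: Array<() => void> = [];

  get activeCount(): number {
    return this.active;
  }

  get isShuttingDown(): boolean {
    return this.shuttingDown;
  }

  // Call when a request starts. Returns false once shutdown has begun.
  begin(): boolean {
    if (this.shuttingDown) return false;
    this.active += 1;
    return true;
  }

  // Call when a request finishes. Wakes any pending drain when the count hits zero.
  end(): void {
    if (this.active > 0) this.active -= 1;
    if (this.active === 0) {
      for (const wake of this.waiters.splice(0)) wake();
    }
  }

  // Resolves true once all requests have completed, or false if the timeout expires first.
  drain(timeoutMs = 30_000): Promise<boolean> {
    this.shuttingDown = true;
    if (this.active === 0) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}

// Usage sketch (assumes an existing http.Server named `server`):
//   process.once("SIGTERM", async () => {
//     server.close();                       // stop accepting new connections
//     const drained = await tracker.drain();
//     process.exit(drained ? 0 : 1);
//   });
```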

### 3. Enhanced Server (`src/serverWrapper.ts`)
- Health endpoints: `/health`, `/ready`, `/alive`
- Integration with graceful shutdown
- Backward compatible with existing code
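
The routing behind the three health endpoints might look something like the sketch below. Only the 503 from `/ready` during shutdown is documented here; the 200 responses, the response bodies, and the `healthResponse` function itself are assumptions made for illustration.

```typescript
export interface HealthResult {
  status: number;
  body: { status: string };
}

// Hypothetical handler. Assumes /health and /alive return 200 whenever the process is up.
export function healthResponse(pathname: string, shuttingDown: boolean): HealthResult | null {
  switch (pathname) {
    case "/health":
    case "/alive":
      return { status: 200, body: { status: "ok" } };
    case "/ready":
      // Readiness fails during shutdown so load balancers stop sending traffic.
      return shuttingDown
        ? { status: 503, body: { status: "shutting_down" } }
        : { status: 200, body: { status: "ready" } };
    default:
      return null; // not a health endpoint; let the normal router handle it
  }
}
```

Keeping liveness separate from readiness lets an orchestrator stop routing traffic to a draining instance without restarting it.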

## Configuration

Before starting the service, create a config file at `~/.conduit/config.json`:

```json
{
  "Providers": [
    {
      "name": "your-provider",
      "apiKey": "YOUR_API_KEY",
      "baseUrl": "https://api.provider.com/v1",
      "model": "model-name"
    }
  ]
}
```
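
A quick structural check of this file before startup could be sketched as follows. The interfaces mirror the example above; treating all four fields as required, and rejecting an empty `Providers` array, are assumptions, and `validateConfig` is a hypothetical helper rather than part of Conduit.

```typescript
export interface ProviderConfig {
  name: string;
  apiKey: string;
  baseUrl: string;
  model: string;
}

export interface ConduitConfig {
  Providers: ProviderConfig[];
}

// Assumption: every field shown in the example config is required.
const REQUIRED_FIELDS = ["name", "apiKey", "baseUrl", "model"] as const;

// Returns a list of human-readable problems; an empty list means the shape looks valid.
export function validateConfig(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["config must be a JSON object"];
  }
  const providers = (input as { Providers?: unknown }).Providers;
  if (!Array.isArray(providers) || providers.length === 0) {
    return ['"Providers" must be a non-empty array'];
  }
  const errors: string[] = [];
  providers.forEach((p: unknown, i: number) => {
    const record = (typeof p === "object" && p !== null ? p : {}) as Record<string, unknown>;
    for (const field of REQUIRED_FIELDS) {
      const value = record[field];
      if (typeof value !== "string" || value.length === 0) {
        errors.push(`Providers[${i}].${field} must be a non-empty string`);
      }
    }
  });
  return errors;
}
```

Calling `validateConfig(JSON.parse(text))` on the file contents returns an empty array when the shape matches the example.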

## Usage

### Basic Commands

```bash
# Start the service
conduit start
# or
node dist/cli.js start

# Check status
conduit status

# Stop gracefully
conduit stop
```

### Health Checks

```bash
# Basic health
curl http://localhost:3456/health

# Readiness (returns 503 during shutdown)
curl http://localhost:3456/ready

# Liveness check
curl http://localhost:3456/alive
```

## Production Deployment

### Option 1: PM2 (Recommended)

```bash
# Install PM2
npm install -g pm2

# Start with PM2
pm2 start ecosystem.config.js

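# Save the current process list so PM2 can restore it after a reboot
pm2 save

# Generate the boot-time startup script (PM2 prints a command to run)
pm2 startup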
```

### Option 2: systemd (Linux)

```bash
# Install as systemd service
sudo ./scripts/install-systemd.sh

# Manage with systemctl
sudo systemctl start claude-code-router
sudo systemctl status claude-code-router
sudo systemctl stop claude-code-router
```

## Monitoring

The health endpoints described above are suitable for:
- Prometheus monitoring
- Kubernetes health probes
- Load balancer health checks
- Custom monitoring solutions

## Troubleshooting

### Service won't start
- Check whether port 3456 is already in use (for example, `lsof -i :3456`)
- Verify that the config file at `~/.conduit/config.json` exists and contains valid JSON
- Check for a leftover PID file: `~/.claude-code-router/.claude-code-router.pid`

### Graceful shutdown issues
- The default shutdown timeout is 30 seconds and is configurable
- Check for long-running requests
- Monitor active connections during shutdown

### Lock file issues
- Lock file: `~/.claude-code-router/.claude-code-router.pid.lock`
- A stale lock file is cleaned up automatically after 5 seconds
- Manual cleanup: `rm ~/.claude-code-router/.claude-code-router.pid.lock`
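
For illustration, a stale-lock check along these lines could be written as below. The 5-second threshold comes from this document; judging staleness by the lock file's modification time is an assumption, and `isLockStale` and `removeLockIfStale` are hypothetical names.

```typescript
import * as fs from "node:fs";

const STALE_AFTER_MS = 5_000; // threshold stated in this document

// Assumption: a lock is stale when its mtime is older than the threshold.
export function isLockStale(
  lockPath: string,
  now: number = Date.now(),
  staleAfterMs: number = STALE_AFTER_MS,
): boolean {
  try {
    const { mtimeMs } = fs.statSync(lockPath);
    return now - mtimeMs > staleAfterMs;
  } catch {
    return false; // a missing lock file is not stale
  }
}

// Removes the lock only if it is stale. Returns true when a file was removed.
export function removeLockIfStale(lockPath: string, now: number = Date.now()): boolean {
  if (!isLockStale(lockPath, now)) return false;
  fs.rmSync(lockPath, { force: true });
  return true;
}
```

Passing `now` as a parameter keeps the check easy to test without waiting for real time to elapse.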