# RuVector Security Remediation Guide

## Overview
This guide provides specific code changes to remediate the six security findings (four critical, one high, one medium) identified in the RuVector init system audit.

---

## REMEDIATION #1: Implement Backup Before Reset

**File:** `.claude/skills/cfn-local-ruvector-accelerator/src/cli/reset.rs`
**Current Risk:** Complete data loss without backup
**Severity:** CRITICAL

### Current Code
```rust
pub fn execute(&self) -> Result<()> {
    let ruvector_dir = self.project_dir.join(".ruvector");

    if !self.confirm {
        eprintln!("⚠️  This will delete all indexed data!");
        eprintln!("To proceed, run with --confirm");
        return Ok(());
    }

    if ruvector_dir.exists() {
        fs::remove_dir_all(&ruvector_dir)?;  // UNSAFE: No backup!
        info!("Reset complete: removed .ruvector directory");
    } else {
        info!("No RuVector data found to reset");
    }

    Ok(())
}
```

### Recommended Fix
```rust
use chrono::Local;
use std::io::Write;
use std::path::{Path, PathBuf};

pub fn execute(&self) -> Result<()> {
    let ruvector_dir = self.project_dir.join(".ruvector");

    if !self.confirm {
        eprintln!("⚠️  This will delete all indexed data!");
        eprintln!("To proceed, run with --confirm");
        return Ok(());
    }

    if ruvector_dir.exists() {
        // STEP 1: Create timestamped backup FIRST
        let backup_dir = self.create_timestamped_backup(&ruvector_dir)?;
        info!("Created backup at: {}", backup_dir.display());

        // STEP 2: Then proceed with deletion
        fs::remove_dir_all(&ruvector_dir)?;
        info!("Reset complete: removed .ruvector directory");
        info!("Backup preserved at: {}", backup_dir.display());

        // STEP 3: Log the operation
        self.log_deletion_event(&backup_dir)?;
    } else {
        info!("No RuVector data found to reset");
    }

    Ok(())
}

fn create_timestamped_backup(&self, source_dir: &Path) -> Result<PathBuf> {
    use std::fs::create_dir_all;

    let timestamp = Local::now().format("%Y%m%d_%H%M%S").to_string();
    let backup_dir = self.project_dir.join(".ruvector_backups")
        .join(format!("backup_{}", timestamp));

    create_dir_all(&backup_dir)?;

    // Copy entire directory
    copy_dir_recursive(source_dir, &backup_dir)?;

    debug!("Backup created: {}", backup_dir.display());
    Ok(backup_dir)
}

fn log_deletion_event(&self, backup_location: &Path) -> Result<()> {
    let log_entry = format!(
        "[{}] Reset command executed. Backup: {}",
        Local::now().to_rfc3339(),
        backup_location.display()
    );

    // Write to audit log
    let audit_log = self.project_dir.join(".ruvector_audit.log");
    std::fs::OpenOptions::new()
        .create(true)
        .append(true)
        .open(&audit_log)?
        .write_all(format!("{}\n", log_entry).as_bytes())?;

    Ok(())
}

fn copy_dir_recursive(src: &Path, dst: &Path) -> Result<()> {
    fs::create_dir_all(dst)?;

    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let ty = entry.file_type()?;
        let path = entry.path();
        let file_name = entry.file_name();
        let new_path = dst.join(file_name);

        if ty.is_dir() {
            copy_dir_recursive(&path, &new_path)?;
        } else {
            fs::copy(&path, &new_path)?;
        }
    }

    Ok(())
}
```
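
One follow-on concern: every reset adds another directory under `.ruvector_backups`, so the backup area grows without bound. A retention sweep could cap it. The sketch below is a hypothetical helper (`backups_to_prune` is not part of the codebase) that relies on the `backup_YYYYMMDD_HHMMSS` naming used above, where lexicographic order equals chronological order:

```rust
// Hypothetical retention helper: given the directory names found under
// ".ruvector_backups", return the ones to delete so that only the
// `keep` most recent remain. For "backup_YYYYMMDD_HHMMSS" names,
// lexicographic order is chronological order.
fn backups_to_prune(mut names: Vec<String>, keep: usize) -> Vec<String> {
    names.sort(); // oldest first
    let excess = names.len().saturating_sub(keep);
    names.truncate(excess); // keep only the oldest `excess` entries, i.e. the prune list
    names
}
```

The caller would then `fs::remove_dir_all` each returned name, ideally logging each removal to the same audit log as the reset itself.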

### Verification
```rust
#[test]
fn test_reset_creates_backup_before_deletion() {
    let temp_dir = tempdir().unwrap();
    let test_dir = temp_dir.path();

    // Create test data
    let ruvector = test_dir.join(".ruvector");
    fs::create_dir(&ruvector).unwrap();
    fs::write(ruvector.join("test.txt"), "important data").unwrap();

    // Execute reset with confirm
    let cmd = ResetCommand::new(test_dir, true);
    cmd.execute().unwrap();

    // Verify backup exists
    let backups = test_dir.join(".ruvector_backups");
    assert!(backups.exists());
    // Path::exists() does not expand globs; enumerate the directory instead
    let backup_count = fs::read_dir(&backups).unwrap()
        .filter_map(|e| e.ok())
        .filter(|e| e.file_name().to_string_lossy().starts_with("backup_"))
        .count();
    assert_eq!(backup_count, 1);

    // Verify original was deleted
    assert!(!ruvector.exists());
}
```

---

## REMEDIATION #2: Change CASCADE to RESTRICT

**File:** `.claude/skills/cfn-local-ruvector-accelerator/src/schema_v2.rs`
**Current Risk:** Uncontrolled cascading deletes
**Severity:** CRITICAL

### Current Code (Lines 232-283)
```sql
CREATE TABLE IF NOT EXISTS entities (
    ...
    parent_id INTEGER,
    ...
    FOREIGN KEY (parent_id) REFERENCES entities(id) ON DELETE CASCADE  -- UNSAFE!
);

CREATE TABLE IF NOT EXISTS refs (
    ...
    FOREIGN KEY (source_entity_id) REFERENCES entities(id) ON DELETE CASCADE  -- UNSAFE!
);

CREATE TABLE IF NOT EXISTS entity_embeddings (
    entity_id INTEGER PRIMARY KEY,
    ...
    FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE  -- UNSAFE!
);
```

### Recommended Fix
```sql
CREATE TABLE IF NOT EXISTS entities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    kind TEXT NOT NULL,
    name TEXT NOT NULL,
    ...
    parent_id INTEGER,
    ...
    -- Change CASCADE to RESTRICT to prevent silent cascades
    FOREIGN KEY (parent_id) REFERENCES entities(id) ON DELETE RESTRICT
);

-- Create audit trigger to log deletions
CREATE TRIGGER IF NOT EXISTS log_entity_deletion
    BEFORE DELETE ON entities
    FOR EACH ROW
BEGIN
    INSERT INTO deletion_audit_log (
        table_name,
        entity_id,
        entity_kind,
        deleted_at,
        deletion_method
    ) VALUES (
        'entities',
        OLD.id,
        OLD.kind,
        strftime('%s', 'now'),
        'direct_delete'
    );
END;

CREATE TABLE IF NOT EXISTS refs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_entity_id INTEGER NOT NULL,
    ...
    -- RESTRICT prevents cascading deletes
    FOREIGN KEY (source_entity_id) REFERENCES entities(id) ON DELETE RESTRICT
);

CREATE TABLE IF NOT EXISTS entity_embeddings (
    entity_id INTEGER PRIMARY KEY,
    ...
    -- RESTRICT prevents orphaned embeddings
    FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE RESTRICT
);

-- Add audit log table
CREATE TABLE IF NOT EXISTS deletion_audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    entity_id INTEGER,
    entity_kind TEXT,
    deleted_at INTEGER NOT NULL,
    deletion_method TEXT,
    created_at INTEGER DEFAULT (strftime('%s', 'now'))
);

CREATE INDEX IF NOT EXISTS idx_deletion_audit_timestamp
    ON deletion_audit_log(deleted_at);
```
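
One caveat: SQLite does not enforce foreign-key actions at all, `RESTRICT` included, unless the foreign-key pragma is enabled on every connection (it defaults to off). Without it, the schema change above is inert:

```sql
-- Run on each new connection; SQLite foreign keys default to OFF.
PRAGMA foreign_keys = ON;
```

In rusqlite this is typically issued right after `Connection::open`, for example via `execute_batch`.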

### Handling RESTRICT Violations
```rust
pub fn safe_delete_entity(&self, entity_id: i64) -> Result<()> {
    // First check for dependent records
    let ref_count: i64 = self.conn.query_row(
        "SELECT COUNT(*) FROM refs WHERE source_entity_id = ?",
        [entity_id],
        |row| row.get(0)
    )?;

    if ref_count > 0 {
        return Err(anyhow!(
            "Cannot delete entity: {} references depend on this entity. \
             Use force_delete_with_cascade() to remove all dependent data.",
            ref_count
        ));
    }

    // Safe to delete
    self.conn.execute("DELETE FROM entities WHERE id = ?", [entity_id])?;
    Ok(())
}

pub fn force_delete_with_cascade(&mut self, entity_id: i64) -> Result<()> {
    // Only called with explicit user approval.
    // Note: rusqlite's Connection::transaction() requires &mut self.
    let tx = self.conn.transaction()?;

    // Delete in dependency order (children first)
    tx.execute("DELETE FROM entity_embeddings WHERE entity_id = ?", [entity_id])?;
    tx.execute("DELETE FROM type_usage WHERE entity_id = ?", [entity_id])?;
    tx.execute("DELETE FROM refs WHERE source_entity_id = ?", [entity_id])?;
    tx.execute("DELETE FROM entities WHERE id = ?", [entity_id])?;

    // Log the cascade
    tx.execute(
        "INSERT INTO deletion_audit_log (table_name, entity_id, deleted_at, deletion_method) \
         VALUES ('entities', ?, strftime('%s', 'now'), 'force_cascade_delete')",
        [entity_id]
    )?;

    tx.commit()?;
    Ok(())
}
```
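
Note that `refs` is not the only table with a `RESTRICT` constraint: `entity_embeddings` and child rows via `entities.parent_id` will also block a delete, so checking `refs` alone can still surface a raw constraint error. A database-free sketch of aggregating per-table dependent counts into a single refusal message (`deletion_blockers` is a hypothetical helper, not existing code):

```rust
// Hypothetical helper: given dependent-row counts per RESTRICT'd table,
// return a human-readable refusal reason, or None when the delete is safe.
fn deletion_blockers(counts: &[(&str, i64)]) -> Option<String> {
    let blockers: Vec<String> = counts
        .iter()
        .filter(|&&(_, n)| n > 0)
        .map(|&(table, n)| format!("{} row(s) in {}", n, table))
        .collect();
    if blockers.is_empty() {
        None
    } else {
        Some(blockers.join(", "))
    }
}
```

`safe_delete_entity` could run one `COUNT(*)` per dependent table, feed the results through this helper, and return the combined message in its error.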

---

## REMEDIATION #3: Add Preview Mode to Cleanup

**File:** `.claude/skills/cfn-local-ruvector-accelerator/src/cli/cleanup.rs`
**Current Risk:** Deletion without visibility into impact
**Severity:** HIGH

### Current Code
```rust
fn remove_old_embeddings(&self, store: &SqliteStore, days: u32) -> Result<()> {
    info!("Removing embeddings older than {} days", days);

    let cutoff = SystemTime::now()
        .duration_since(UNIX_EPOCH)?
        .as_secs() - (days as u64 * 86400);

    let removed = if self.dry_run {
        store.count_old_embeddings(cutoff)?
    } else {
        store.remove_old_embeddings(cutoff)?  // UNSAFE: No preview!
    };

    if self.dry_run {
        info!("Would remove {} old embeddings", removed);
    } else {
        info!("Removed {} old embeddings", removed);
    }

    Ok(())
}
```

### Recommended Fix
```rust
pub struct CleanupCommand {
    // ... existing fields ...
    preview: bool,  // NEW: Add preview flag
    backup_before_delete: bool,  // NEW: Backup deleted records
}

impl CleanupCommand {
    pub fn execute(&self) -> Result<()> {
        info!("Starting cleanup process");

        if self.dry_run {
            info!("Running in dry-run mode - no changes will be made");
        }

        // STEP 1: Preview what will be deleted
        let preview = self.preview_cleanup()?;
        println!("\n{}", self.format_cleanup_preview(&preview));

        if self.preview {
            info!("Preview mode: showing what would be deleted");
            return Ok(());
        }

        // STEP 2: Get confirmation unless forced
        if !self.force && !self.dry_run {
            eprintln!("\n⚠️  This will permanently delete the records listed above.");
            eprintln!("Run with --force to proceed, or --preview to see without changes.");
            return Ok(());
        }

        // STEP 3: Backup deleted records if requested
        if self.backup_before_delete {
            self.export_cleanup_records(&preview)?;
        }

        // STEP 4: Execute cleanup
        self.execute_cleanup(&preview)?;

        Ok(())
    }

    fn preview_cleanup(&self) -> Result<CleanupPreview> {
        let store = SqliteStore::new(&self.project_dir.join(".ruvector").join("index.db"))?;

        let mut preview = CleanupPreview::default();

        if let Some(days) = self.older_than {
            let cutoff = SystemTime::now()
                .duration_since(UNIX_EPOCH)?
                .as_secs() - (days as u64 * 86400);
            preview.old_embeddings_count = Some(store.count_old_embeddings(cutoff)?);
            preview.oldest_embedding_date = store.find_oldest_embedding_before(cutoff)?;
        }

        if self.remove_orphans {
            preview.orphaned_embeddings_count = Some(store.count_orphaned_embeddings()?);
        }

        Ok(preview)
    }

    fn format_cleanup_preview(&self, preview: &CleanupPreview) -> String {
        let mut output = String::from("\n=== Cleanup Preview ===\n");

        if let Some(count) = preview.old_embeddings_count {
            output.push_str(&format!(
                "  Old embeddings (>{} days): {}\n",
                self.older_than.unwrap_or(30),
                count
            ));
            if let Some(date) = &preview.oldest_embedding_date {
                output.push_str(&format!("  Oldest embedding from: {}\n", date));
            }
        }

        if let Some(count) = preview.orphaned_embeddings_count {
            output.push_str(&format!("  Orphaned embeddings: {}\n", count));
        }

        output
    }

    fn export_cleanup_records(&self, preview: &CleanupPreview) -> Result<()> {
        let timestamp = Local::now().format("%Y%m%d_%H%M%S").to_string();
        let export_path = self.project_dir
            .join(".ruvector_backups")
            .join(format!("cleanup_export_{}.json", timestamp));

        fs::create_dir_all(export_path.parent().unwrap())?;

        // Export records before deletion
        let records = self.collect_records_for_deletion()?;
        let json = serde_json::to_string_pretty(&records)?;
        fs::write(&export_path, json)?;

        info!("Exported {} records to: {}", records.len(), export_path.display());
        Ok(())
    }
}

#[derive(Debug, Default)]
struct CleanupPreview {
    old_embeddings_count: Option<usize>,
    oldest_embedding_date: Option<String>,
    orphaned_embeddings_count: Option<usize>,
}
```
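
A subtlety in the cutoff arithmetic above (shared with the original code): `as_secs() - days as u64 * 86400` underflows when the retention window exceeds the current epoch time, which panics in debug builds and silently wraps in release. A defensive variant, with an assumed helper name:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Defensive cutoff computation: clamp underflow to 0 (meaning "match
// everything") instead of panicking or wrapping on absurd `days` values.
fn cutoff_epoch_secs(days: u32) -> u64 {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    now.checked_sub(days as u64 * 86400).unwrap_or(0)
}
```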

---

## REMEDIATION #4: Preserve Migration Backups

**File:** `.claude/skills/cfn-local-ruvector-accelerator/src/migration.rs`
**Current Risk:** Immediate loss of backup after migration
**Severity:** CRITICAL

### Current Code
```rust
fn cleanup_after_migration(&self, old_version: u32) -> Result<()> {
    // ... validation code ...

    // Drop backup tables after successful migration
    self.conn.execute_batch(
        r#"
        DROP TABLE IF EXISTS embeddings_v1_backup;
        DROP TABLE IF EXISTS files_v1_backup;
        "#
    )?;

    self.conn.execute("VACUUM", [])?;
    Ok(())
}
```

### Recommended Fix
```rust
const BACKUP_RETENTION_DAYS: u32 = 7;

fn cleanup_after_migration(&self, old_version: u32) -> Result<()> {
    info!("Cleaning up after migration from version {}", old_version);

    // Verify migration was successful
    let new_entities_count: i64 = self.conn.query_row(
        "SELECT COUNT(*) FROM entities",
        [],
        |row| row.get(0)
    )?;

    if new_entities_count == 0 && old_version > 0 {
        warn!("No entities found after migration, keeping backup tables");
        return Ok(());
    }

    // Create recovery record BEFORE dropping backups
    self.create_backup_recovery_record(old_version)?;

    // Check backup retention policy
    let should_keep_backup = self.should_keep_backup(old_version)?;

    if should_keep_backup {
        info!("Keeping backup tables for {} days (recovery period)", BACKUP_RETENTION_DAYS);
        return Ok(());
    }

    // Safe to drop backups - but create export first
    info!("Exporting backup data before cleanup");
    self.export_backup_tables()?;

    // ONLY NOW drop backup tables
    self.conn.execute_batch(
        r#"
        DROP TABLE IF EXISTS embeddings_v1_backup;
        DROP TABLE IF EXISTS files_v1_backup;
        "#
    )?;

    info!("Backup tables dropped successfully");

    // Run VACUUM to reclaim space
    debug!("Running VACUUM to reclaim database space");
    self.conn.execute("VACUUM", [])?;

    Ok(())
}

fn should_keep_backup(&self, old_version: u32) -> Result<bool> {
    // Check when migration was done
    let migration_time: i64 = self.conn.query_row(
        "SELECT applied_at FROM schema_version WHERE version = ? LIMIT 1",
        [old_version],
        |row| row.get(0)
    )?;

    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)?
        .as_secs() as i64;

    let age_days = (now - migration_time) / (24 * 3600);

    Ok(age_days < BACKUP_RETENTION_DAYS as i64)
}

fn create_backup_recovery_record(&self, old_version: u32) -> Result<()> {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)?
        .as_secs();

    self.conn.execute(
        "INSERT INTO migration_recovery (
            source_version, target_version, backup_created_at,
            backup_expires_at, status
         ) VALUES (?, ?, ?, ?, 'active')",
        rusqlite::params![
            old_version,
            2,
            now,
            now + (BACKUP_RETENTION_DAYS as u64 * 24 * 3600)
        ]
    )?;

    Ok(())
}

fn export_backup_tables(&self) -> Result<()> {
    use chrono::Local;

    let timestamp = Local::now().format("%Y%m%d_%H%M%S").to_string();
    let backup_dir = self.db_path
        .parent()
        .unwrap()
        .join(format!("migration_backup_{}", timestamp));

    fs::create_dir_all(&backup_dir)?;

    // Export v1 embeddings
    let mut stmt = self.conn.prepare(
        "SELECT pattern, embedding, metadata FROM embeddings_v1_backup"
    )?;

    let embeddings_file = fs::File::create(backup_dir.join("embeddings.json"))?;
    let mut writer = io::BufWriter::new(embeddings_file);

    // ... write JSON records ...

    info!("Backup exported to: {}", backup_dir.display());
    Ok(())
}
```
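
The retention decision in `should_keep_backup` can be factored into a pure function so the policy is unit-testable without touching SQLite. A sketch of an equivalent predicate (the name `backup_within_retention` is an assumption, not existing code):

```rust
const BACKUP_RETENTION_DAYS: u32 = 7;

// Pure form of the retention check: keep the backup while it is
// younger than the retention window, measured in whole days.
fn backup_within_retention(migration_epoch: i64, now_epoch: i64) -> bool {
    let age_days = (now_epoch - migration_epoch) / (24 * 3600);
    age_days < BACKUP_RETENTION_DAYS as i64
}
```

With this shape, the boundary case (exactly 7 days old, which drops the backup) is pinned down by a test rather than left implicit in the SQL-driven version.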

---

## REMEDIATION #5: Protect index_all.sh

**File:** `.claude/skills/cfn-local-ruvector-accelerator/index_all.sh`
**Current Risk:** Unconditional index deletion
**Severity:** CRITICAL

### Current Code
```bash
#!/bin/bash
# Index all files in the project

echo "Starting comprehensive indexing of all files..."
cd .claude/skills/cfn-local-ruvector-accelerator

# Clear existing index
rm -rf index/  # UNSAFE: No confirmation, no backup!
```

### Recommended Fix
```bash
#!/bin/bash
# Index all files in the project

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# Script lives three levels below the project root (.claude/skills/<skill>/)
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../../.." && pwd)"
INDEX_DIR="$SCRIPT_DIR/index"
PRESERVE_INDEX="${PRESERVE_INDEX:-false}"
BACKUP_INDEX="${BACKUP_INDEX:-true}"

log_info() {
    echo "[INFO] $1"
}

log_error() {
    echo "[ERROR] $1" >&2
}

# Function to backup index
backup_index() {
    if [[ ! -d "$INDEX_DIR" ]]; then
        return 0
    fi

    local timestamp=$(date +%Y%m%d_%H%M%S)
    local backup_dir="${INDEX_DIR}_backup_${timestamp}"

    log_info "Creating backup: $backup_dir"
    cp -r "$INDEX_DIR" "$backup_dir"
    log_info "Backup created successfully"
}

# Function to clear index
clear_index() {
    if [[ ! -d "$INDEX_DIR" ]]; then
        log_info "No existing index to clear"
        return 0
    fi

    if [[ "$BACKUP_INDEX" == "true" ]]; then
        backup_index
    fi

    log_info "Clearing index directory"
    rm -rf "$INDEX_DIR"

    # Log the action
    {
        echo "$(date '+%Y-%m-%d %H:%M:%S') - Index cleared"
    } >> "$SCRIPT_DIR/.index_audit.log"
}

# Show usage
show_usage() {
    cat << 'EOF'
Usage: ./index_all.sh [OPTIONS]

Options:
  --preserve-index    Keep existing index (incremental update)
  --no-backup         Don't back up before clearing
  --force             Force re-indexing all files

Environment:
  PRESERVE_INDEX=true ./index_all.sh
  BACKUP_INDEX=false ./index_all.sh
EOF
}

# Parse arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        --preserve-index)
            PRESERVE_INDEX=true
            shift
            ;;
        --no-backup)
            BACKUP_INDEX=false
            shift
            ;;
        --force)
            # Force reindexing (default behavior after clear)
            shift
            ;;
        --help)
            show_usage
            exit 0
            ;;
        *)
            log_error "Unknown option: $1"
            show_usage
            exit 1
            ;;
    esac
done

log_info "Starting comprehensive indexing of all files..."
cd "$SCRIPT_DIR"

# Clear or preserve index
if [[ "$PRESERVE_INDEX" == "true" ]]; then
    log_info "Preserving existing index (incremental mode)"
else
    log_info "Index will be cleared and rebuilt"
    clear_index
fi

# ... rest of indexing code ...
```

---

## REMEDIATION #6: Protect Test Script Cleanup

**File:** `.claude/skills/cfn-local-ruvector-accelerator/test-local-ruvector.sh`
**Current Risk:** Unprotected directory deletion via variables
**Severity:** MEDIUM

### Current Code
```bash
# Clean up previous test
rm -rf "$STORAGE_PATH" "$TEST_DIR"
```

### Recommended Fix
```bash
#!/bin/bash
# test-local-ruvector.sh - Test Local RuVector implementation

set -euo pipefail

# Use mktemp for safer temporary directories
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
STORAGE_PATH="$(mktemp -d -t ruvector-test-storage-XXXXXX)"
TEST_DIR="$(mktemp -d -t ruvector-test-project-XXXXXX)"

# Cleanup trap - only runs on exit
cleanup() {
    local exit_code=$?

    if [[ -d "$STORAGE_PATH" ]]; then
        echo "Cleaning up test storage: $STORAGE_PATH"
        rm -rf "$STORAGE_PATH"
    fi

    if [[ -d "$TEST_DIR" ]]; then
        echo "Cleaning up test directory: $TEST_DIR"
        rm -rf "$TEST_DIR"
    fi

    exit $exit_code
}

trap cleanup EXIT

# Verify paths are safe (sanity checks). mktemp honors $TMPDIR, so anchor
# the check there instead of hardcoding /tmp.
TMP_ROOT="${TMPDIR:-/tmp}"
if [[ "$STORAGE_PATH" != "$TMP_ROOT"/ruvector-test-storage* ]]; then
    echo "ERROR: Invalid storage path: $STORAGE_PATH" >&2
    exit 1
fi

if [[ "$TEST_DIR" != "$TMP_ROOT"/ruvector-test-project* ]]; then
    echo "ERROR: Invalid test directory: $TEST_DIR" >&2
    exit 1
fi

# Verify paths don't exist or are empty
if [[ -d "$STORAGE_PATH" && -n $(find "$STORAGE_PATH" -type f 2>/dev/null | head -1) ]]; then
    echo "ERROR: Storage path not empty: $STORAGE_PATH" >&2
    exit 1
fi

echo "🧪 Testing Local RuVector Accelerator..."
echo "Storage: $STORAGE_PATH"
echo "Test Dir: $TEST_DIR"

mkdir -p "$TEST_DIR"

# ... rest of test code ...
# Note: cleanup happens automatically via trap on exit
```

---

## Implementation Checklist

- [ ] Remediation #1: Reset backup mechanism implemented
- [ ] Remediation #2: CASCADE changed to RESTRICT
- [ ] Remediation #3: Cleanup preview mode added
- [ ] Remediation #4: Migration backups retained 7 days
- [ ] Remediation #5: index_all.sh protected
- [ ] Remediation #6: Test script cleanup protected
- [ ] Unit tests added for each remediation
- [ ] Integration tests verify fixes
- [ ] Backward compatibility verified
- [ ] Performance impact assessed
- [ ] Deployment plan documented
- [ ] Team review completed
- [ ] Production deployment approved

---

## Testing Strategy

Each remediation should include:
1. Unit test for the specific fix
2. Integration test with real data
3. Edge case testing (empty dirs, permissions, etc.)
4. Concurrent operation testing
5. Recovery/rollback testing

---

## Deployment Order

1. First: Implement backups (Remediation #1, #4)
2. Second: Add constraints (Remediation #2)
3. Third: Enhance UX (Remediation #3)
4. Fourth: Script fixes (Remediation #5, #6)
5. Finally: Full integration test and production deployment

---

## Success Criteria

- All destructive operations have backups
- No silent cascading deletes
- All dangerous operations require explicit confirmation
- Audit trail exists for all deletions
- Recovery mechanism available for 7+ days
- Tests pass 100%
- No performance regression
- All findings marked REMEDIATED
