# string-similarity-plus

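A string similarity calculator that normalizes special characters and Unicode variants, so strings that differ only in such characters (for example, a hyphen versus an en dash) still score as highly similar.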

## Features

- Calculate similarity percentage between two strings
- Normalize special characters (quotes, dashes, spaces, etc.)
- Find similar strings in an array based on a threshold
- Works with multilingual text including CJK characters

## Installation
```bash
npm install string-similarity-plus
```

## Usage
```javascript
const { calculateStringSimilarity, findSimilarStrings } = require('string-similarity-plus');

// Calculate the similarity between two strings that differ only in the dash character
const str1 = "<h2>2. 無限供應肉類火鍋放題 - 牛摩</h2>"; // ASCII hyphen
const str2 = "<h2>2. 無限供應肉類火鍋放題 – 牛摩</h2>"; // en dash
const similarity = calculateStringSimilarity(str1, str2);
console.log(`Similarity: ${similarity.toFixed(2)}%`); // Should show very high similarity

// Find similar strings in an array
const content = [
  "<h2>2. 無限供應肉類火鍋放題 – 牛摩</h2>",
  "<h2>Some other content</h2>",
  "<h2>無限供應火鍋放題牛摩</h2>",
];
const searchString = "<h2>2. 無限供應肉類火鍋放題 - 牛摩</h2>";
const SIMILARITY_THRESHOLD = 80; // Set your desired similarity threshold
const matches = findSimilarStrings(searchString, content, SIMILARITY_THRESHOLD);
console.log(matches); // Will show matching items
```

## API

### calculateStringSimilarity(str1, str2)

Calculates the similarity percentage between two strings.

- **Parameters**:
  - `str1` (string): First string to compare
  - `str2` (string): Second string to compare
- **Returns**: A number between 0 and 100 representing the similarity percentage
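
To make the 0 to 100 scale concrete, here is a minimal sketch of one common way to turn an edit distance into a percentage. This README does not state which metric the library uses, so the Levenshtein-based approach and the names `levenshtein` and `similarityPercent` are assumptions for illustration only, and the sketch skips the normalization step the library applies.

```javascript
// Hypothetical sketch, NOT the library's implementation.
// Levenshtein distance computed over code points, so CJK and astral
// characters each count as a single character.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] holds the distance between the first i-1 chars of s and the first j chars of t.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Scale the distance by the longer string's length to get a 0-100 score.
function similarityPercent(a, b) {
  const longest = Math.max(Array.from(a).length, Array.from(b).length);
  if (longest === 0) return 100; // two empty strings are treated as identical
  return (1 - levenshtein(a, b) / longest) * 100;
}

console.log(similarityPercent('kitten', 'sitting').toFixed(2)); // 57.14
```

Under this assumed metric, identical strings score 100 and strings with no characters in common at any alignment score 0.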

### findSimilarStrings(searchString, contentArray, threshold)

Finds strings in an array that are similar to the search string.

- **Parameters**:
  - `searchString` (string): String to search for
  - `contentArray` (array): Array of strings to search in
  - `threshold` (number, optional): Similarity threshold percentage (default: 80)
- **Returns**: Array of matching strings
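
The effect of `threshold` can be illustrated with a small stand-alone sketch. The helper `filterByThreshold` and the toy scorer `positionalScore` below are hypothetical names, not part of this package, and whether the library treats the threshold as inclusive is an assumption here.

```javascript
// Hypothetical helper: keep every candidate whose score meets the threshold.
// `score` is any function returning 0-100; the library's own metric is not assumed.
function filterByThreshold(search, candidates, score, threshold = 80) {
  return candidates.filter((candidate) => score(search, candidate) >= threshold);
}

// Toy scorer for demonstration only: the share of positions holding equal characters.
function positionalScore(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  const longest = Math.max(s.length, t.length);
  if (longest === 0) return 100;
  let same = 0;
  for (let i = 0; i < Math.min(s.length, t.length); i++) {
    if (s[i] === t[i]) same++;
  }
  return (same / longest) * 100;
}

const candidates = ['hello world', 'hello word', 'goodbye'];

// A stricter threshold keeps only the exact match.
console.log(filterByThreshold('hello world', candidates, positionalScore, 90));
// [ 'hello world' ]

// The default of 80 also admits the near match (9 of 11 positions agree).
console.log(filterByThreshold('hello world', candidates, positionalScore));
// [ 'hello world', 'hello word' ]
```

Raising the threshold trades recall for precision, so it is worth trying a few values against representative data before settling on one.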

## Special Character Handling

This library normalizes a range of special characters, including:

- Different types of quotes and apostrophes
- Various dashes and hyphens
- Different space characters
- Various brackets and parentheses
- Different types of dots, ellipses, and slashes
- And more...
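
As a rough illustration of what such normalization involves, the sketch below maps a few typographic variants to their ASCII counterparts. The function name `normalizeSpecialChars` and the character ranges are assumptions chosen for the example. The library's actual mapping tables are not documented here and may cover more or different characters.

```javascript
// Hypothetical normalizer: the character sets are illustrative,
// not the library's actual tables.
function normalizeSpecialChars(text) {
  return text
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'")         // curly single quotes -> '
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"')         // curly double quotes -> "
    .replace(/[\u2010-\u2015\u2212]/g, '-')              // hyphens, en/em dashes, minus sign -> -
    .replace(/[\u00A0\u2000-\u200A\u202F\u3000]/g, ' ')  // no-break, typographic, ideographic spaces -> space
    .replace(/\u2026/g, '...');                          // horizontal ellipsis -> three dots
}

// After normalization, the en dash variant matches the hyphen variant exactly.
const withHyphen = '無限供應肉類火鍋放題 - 牛摩';
const withEnDash = '無限供應肉類火鍋放題 \u2013 牛摩';
console.log(normalizeSpecialChars(withEnDash) === withHyphen); // true
```

Normalizing both inputs before scoring means purely typographic differences do not lower the similarity, while the CJK text itself is left untouched.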

## License

MIT