Groq OCR 🔬

<div align="center">
  <div>
    <h1 align="center">Groq OCR 🔬</h1>
  </div>
  <p>An npm library and CLI to run OCR with Groq provided models.</p>
  <a href="https://www.npmjs.com/package/groq-ocr"><img src="https://img.shields.io/npm/v/groq-ocr" alt="Current version"></a>
<a href="https://groq.com" target="_blank" rel="noopener noreferrer">
  <img
    src="https://groq.com/wp-content/uploads/2024/03/PBG-mark1-color.svg"
    alt="Powered by Groq for fast inference."
    width="200"
    height="200"
  />
</a>
</div>

## Table of Contents
- [Disclaimer](#disclaimer)
- [Installation](#installation)
- [Usage](#usage)
  - [Use as NPM package](#use-as-npm-package)
  - [ocr options](#ocr-options)
  - [Use as CLI](#use-as-cli)
  - [CLI Examples](#cli-examples)
  - [CLI Options](#cli-options)
- [How it works](#how-it-works)
- [Models](#models)
- [Roadmap](#roadmap)
- [Credit](#credit)

---

## Disclaimer

_This project is still in development‼️_

_Multi-page PDF support is experimental and work in progress._

_PDF support relies on [pdftopic](https://github.com/Ilyes-El-Majouti/pdftopic) library which requires node>=12 and imagemagick._

_JSON mode might fail with `json_validate_failed` error_

## Installation

`npm i groq-ocr` to use as an NPM package.

`npm i -g groq-ocr` to use as a CLI.

## Usage

### Use as NPM package:

```javascript
import { ocr, GroqVisionModel } from "groq-ocr";
const result = await ocr({
  filePath: "./filepath.jpg", // Allowed formats: jpg, jpeg, png, pdf.
  apiKey: process.env.GROQ_API_KEY, // Get your API key from https://console.groq.com/
  model: GroqVisionModel.LLAMA_32_90B, // available models: LLAMA_32_11B, LLAMA_32_90B. Default: LLAMA_32_11B
  jsonMode: false, // Default: false. Set to true to get JSON output.
  additionalInstructions: "Additional instructions to be included in the prompt.", // Use to give custom instructions to the model.
});
```

### ocr options:

- **filePath** (required): Path to image/PDF file or URL
  - Supported formats: `.jpg`, `.jpeg`, `.png`, `.pdf`
- **apiKey** (optional): Groq API key
  - Defaults to `GROQ_API_KEY` environment variable
- **model** (optional): Vision model to use
  - `GroqVisionModel.LLAMA_32_11B` (default) - Llama 3.2 11B Vision Preview
  - `GroqVisionModel.LLAMA_32_90B` - Llama 3.2 90B Vision Preview
- **jsonMode** (optional): Return structured JSON instead of markdown
  - Defaults to `false`
- **additionalInstructions** (optional): Additional instructions to be included in the prompt.
  - Defaults to "" - use to give custom instructions to the model.

### Use as CLI:

Either set your Groq API key as environment variable:

```bash
export GROQ_API_KEY=your-api-key
```

Or provide it as CLI option with `-k` flag when running commands.

### CLI Examples

```bash
# Basic usage
groq-ocr -f image.jpg

# Output as JSON
groq-ocr -f scan.pdf -j

# Save to file
groq-ocr -f receipt.png -o result.txt

# Use specific model and API key
groq-ocr -f document.jpg -m llama-3.2-90b-vision-preview -k your-api-key
```

### CLI Options

- `-f, --file <path>` (required): Path to input image/PDF file
- `-k, --api-key <key>`: Groq API key (defaults to `GROQ_API_KEY` env var)
- `-m, --model <model>`: Vision model to use:
  - `llama-3.2-11b-vision-preview` (default)
  - `llama-3.2-90b-vision-preview`
- `-j, --json`: Output in JSON format instead of markdown
- `-o, --output <path>`: Write result to file instead of console
- `-V, --version`: Display version number
- `-h, --help`: Display help information


## How it works

This library and CLI uses multimodal models with vision capabilities provided by [Groq](https://groq.com/) to run OCR on images and PDFs and return markdown or JSON.

PDFs are converted to images using [pdftopic](https://github.com/Ilyes-El-Majouti/pdftopic).

## Models

The plan is to support all models provided by Groq with vision capabilities.
[Groq vision models](https://console.groq.com/docs/vision)

Currently supported models:

```typescript
enum GroqVisionModel {
  LLAMA_32_11B = "llama-3.2-11b-vision-preview",
  LLAMA_32_90B = "llama-3.2-90b-vision-preview",
}
```

## Roadmap

- [x] Add support for local images OCR
- [x] Add support for remote images OCR
- [x] Add support for single page PDFs
- [x] Add support for JSON output in addition to markdown
- [x] Add CLI 
- [x] extend prompt with custom instructions
- [ ] Add support for multi-page PDFs OCR (Available but experimental)


## Credit

This project was highly inspired by [llama-ocr](https://github.com/Nutlope/llama-ocr/tree/main).

[![Formatted with Biome](https://img.shields.io/badge/Formatted_with-Biome-60a5fa?style=flat&logo=biome)](https://biomejs.dev/)