Mistral AI Releases OCR 3: A Smaller Optical Character Recognition (OCR) Model For Structured Document AI At Scale

Mistral AI has launched Mistral OCR 3, its latest optical character recognition service that powers the company's Document AI stack. The model, named mistral-ocr-2512, is built to extract interleaved text and images from PDFs and other documents while preserving structure, and it does this at an aggressive price of $2 per 1,000 pages, with a 50% discount when used through the Batch API.

What Is Mistral OCR 3 Optimized For?

Mistral OCR 3 targets typical enterprise document workloads. The model is tuned for forms, scanned documents, complex tables, and handwriting. It is evaluated on internal benchmarks drawn from real business use cases, where it achieves a 74% overall win rate over Mistral OCR 2 across these document categories, using a fuzzy match metric against ground truth.
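To make the win rate idea concrete, here is a minimal sketch of how such a number could be computed. Mistral does not publish the exact fuzzy match metric in this article, so the similarity function below (Python's difflib ratio), the strict "higher score wins" rule, and the sample strings are all assumptions for illustration only.

```python
from difflib import SequenceMatcher


def fuzzy_score(predicted: str, truth: str) -> float:
    """Similarity in [0, 1] between an OCR output and its ground truth.

    Assumption: difflib's ratio stands in for the unpublished metric.
    """
    return SequenceMatcher(None, predicted, truth).ratio()


def win_rate(new_outputs, old_outputs, truths) -> float:
    """Fraction of documents where the new model scores strictly higher."""
    wins = 0
    for new, old, truth in zip(new_outputs, old_outputs, truths):
        if fuzzy_score(new, truth) > fuzzy_score(old, truth):
            wins += 1
    return wins / len(truths)


# Hypothetical sample data: four short "documents" and two model outputs each.
truths = ["Invoice 1042", "Total: 18.50", "Signed by A. Smith", "Date 2024-01-05"]
new_outputs = ["Invoice 1042", "Total: 18.50", "Signed by A. Smith", "Date 2O24-O1-O5"]
old_outputs = ["Invoice 1O42", "Tota1: 18.5O", "Signed by A Srnith", "Date 2024-01-05"]

rate = win_rate(new_outputs, old_outputs, truths)
print(f"win rate: {rate:.0%}")
```

In this toy set the newer outputs win on three of the four documents, which is how a per-document comparison turns into a single percentage.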

The model outputs markdown that preserves document layout, and when table formatting is enabled, it enriches the output with HTML-based table representations. This combination gives downstream systems both the content and the structural information needed for retrieval pipelines, analytics, and agent workflows.

Place in Mistral Document AI

OCR 3 sits inside Mistral Document AI, the company's document processing capability that combines OCR with structured data extraction and Document QnA.

It now powers the Document AI Playground in Mistral AI Studio. In this interface, users upload PDFs or images and get back either clean text or structured JSON without writing code. The same underlying OCR pipeline is accessible via the public API, which allows teams to move from interactive exploration to production workloads without changing the core model.

Inputs, Outputs, And Structure

The OCR processor accepts multiple document formats through a single API. The document field can point to:

document_url for PDFs, pptx, docx and more
image_url for image types such as png, jpeg or avif
Uploaded or base64 encoded PDFs or images through the same schema

This is documented in the OCR Processor section of Mistral's Document AI docs.
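As a rough illustration of the input options listed above, the sketch below builds a request body for the OCR endpoint without sending it. The helper name build_ocr_payload, the "type" discriminator inside the document field, the extra ".jpg" extension, and the example.com URLs are assumptions for illustration; check the OCR Processor documentation for the authoritative schema.

```python
from typing import Optional
from urllib.parse import urlparse

# Image types named in the article, plus ".jpg" as an assumed alias for jpeg.
IMAGE_EXTENSIONS = (".png", ".jpeg", ".jpg", ".avif")


def build_ocr_payload(url: str, table_format: Optional[str] = None) -> dict:
    """Build a JSON-serializable request body (hypothetical helper).

    Assumption: the document field carries a "type" key naming which of
    document_url or image_url is set.
    """
    path = urlparse(url).path.lower()  # ignore any query string
    if path.endswith(IMAGE_EXTENSIONS):
        document = {"type": "image_url", "image_url": url}
    else:
        document = {"type": "document_url", "document_url": url}

    payload = {"model": "mistral-ocr-2512", "document": document}
    if table_format is not None:
        # The article mentions table_format="html" for HTML table output.
        payload["table_format"] = table_format
    return payload


pdf_request = build_ocr_payload("https://example.com/report.pdf", table_format="html")
image_request = build_ocr_payload("https://example.com/scan.PNG?page=1")
```

Routing on the file extension is only a convenience for this sketch; a caller that already knows the content type can set the field directly.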

The response is a JSON object with a pages array. Each page contains an index, a markdown string, a list of images, a list of tables when table_format="html" is used, detected hyperlinks, optional header and footer fields when header or footer extraction is enabled, and a dimensions object with the page size. There is also a document_annotation field for structured annotations and a usage_info block for accounting information.

When images and HTML tables are extracted, the markdown includes placeholders such as ![img-0.jpeg](img-0.jpeg) and [tbl-3.html](tbl-3.html). These placeholders are mapped back to the actual content using the images and tables arrays in the response, which simplifies downstream reconstruction.
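The placeholder mapping can be sketched as a small substitution pass over one page of the response. The per-item field names used here ("id", "content" for tables, "image_base64" for images) and the sample page are assumptions for illustration; the article only states that the images and tables arrays carry the referenced content.

```python
def resolve_placeholders(page: dict) -> str:
    """Return the page markdown with table and image placeholders filled in.

    Assumed item shapes (not confirmed by the article):
      tables: {"id": "tbl-3.html", "content": "<table>...</table>"}
      images: {"id": "img-0.jpeg", "image_base64": "data:image/jpeg;base64,..."}
    """
    markdown = page["markdown"]

    # Replace [tbl-N.html](tbl-N.html) links with the table's HTML.
    for table in page.get("tables") or []:
        ref = table["id"]
        markdown = markdown.replace(f"[{ref}]({ref})", table["content"])

    # Point ![img-N.jpeg](img-N.jpeg) at the embedded image data instead.
    for image in page.get("images") or []:
        ref = image["id"]
        markdown = markdown.replace(
            f"![{ref}]({ref})", f"![{ref}]({image['image_base64']})"
        )
    return markdown


# Hypothetical page object shaped like the response described above.
sample_page = {
    "index": 0,
    "markdown": "Intro\n\n![img-0.jpeg](img-0.jpeg)\n\n[tbl-3.html](tbl-3.html)",
    "images": [{"id": "img-0.jpeg", "image_base64": "data:image/jpeg;base64,AAAA"}],
    "tables": [{"id": "tbl-3.html", "content": "<table><tr><td>1</td></tr></table>"}],
}

resolved = resolve_placeholders(sample_page)
```

Pages without tables or images pass through unchanged, so the same function can run over every entry in the pages array.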

Upgrades Over Mistral OCR 2

Mistral OCR 3 introduces several concrete upgrades relative to OCR 2. The public release notes emphasize four main areas.

Handwriting: Mistral OCR 3 more accurately interprets cursive, mixed content annotations, and handwritten text placed on top of printed templates.
Forms: It improves detection of boxes, labels, and handwritten entries in dense layouts such as invoices, receipts, compliance forms, and government documents.
Scanned and complex documents: The model is more robust to compression artifacts, skew, distortion, low DPI, and background noise in scanned pages.
Complex tables: It reconstructs table structures with headers, merged cells, multi-row blocks, and column hierarchies, and it can return HTML tables with correct colspan and rowspan tags so that layout is preserved.
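To show why colspan and rowspan matter downstream, here is a minimal sketch that expands an HTML table into a rectangular grid using only the Python standard library. The class and function names and the sample table are hypothetical, and the code assumes well-formed, non-nested tables.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collects (text, colspan, rowspan) tuples for each table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = {
                "text": [],
                "colspan": int(a.get("colspan") or 1),
                "rowspan": int(a.get("rowspan") or 1),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            text = "".join(self._cell["text"]).strip()
            self.rows[-1].append((text, self._cell["colspan"], self._cell["rowspan"]))
            self._cell = None


def table_to_grid(html: str) -> list:
    """Expand merged cells so every grid position holds its cell's text."""
    collector = TableCellCollector()
    collector.feed(html)
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(collector.rows):
        c = 0
        for text, colspan, rowspan in cells:
            while (r, c) in grid:  # skip slots filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


sample_html = """
<table>
<tr><th rowspan="2">Item</th><th colspan="2">Price</th></tr>
<tr><th>Net</th><th>Gross</th></tr>
<tr><td>Pen</td><td>1.00</td><td>1.20</td></tr>
</table>
"""
grid = table_to_grid(sample_html)
```

Repeating the merged cell's text in every position it covers keeps the column hierarchy visible when the grid is later flattened for search or analytics.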

https://mistral.ai/data/mistral-ocr-3

Pricing, Batch Inference, And Annotations

The OCR 3 model card lists pricing at $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages when structured annotations are used.

Mistral also exposes OCR 3 through its Batch Inference API /v1/batch, which is documented under the batching section of the platform. Batch processing halves the effective OCR price to $1 per 1,000 pages by applying a 50% discount to jobs that run through the batch pipeline.
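The pricing above reduces to simple arithmetic, sketched here as a small estimator. The function name and the page counts are hypothetical, and because the article only states the batch price for standard OCR, the sketch deliberately refuses to guess a batch price for annotated pages.

```python
from decimal import Decimal

# Prices per 1,000 pages as quoted in the article (USD).
PRICE_PER_1000 = {"standard": Decimal("2"), "annotated": Decimal("3")}
BATCH_DISCOUNT = Decimal("0.5")  # 50% discount through the Batch API


def estimate_cost(pages: int, annotated: bool = False, batch: bool = False) -> Decimal:
    """Estimate OCR cost in USD for a given number of pages."""
    if annotated and batch:
        # The article does not give a batch price for annotated pages.
        raise ValueError("batch pricing for annotated pages is not stated")
    rate = PRICE_PER_1000["annotated" if annotated else "standard"]
    if batch:
        rate = rate * (1 - BATCH_DISCOUNT)
    return rate * pages / 1000


standard_cost = estimate_cost(250_000)
batch_cost = estimate_cost(250_000, batch=True)
annotated_cost = estimate_cost(250_000, annotated=True)
```

For a hypothetical 250,000-page archive, this works out to $500 for standard OCR, $250 through the batch pipeline, and $750 with annotations.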

The model integrates with two important features on the same endpoint, Annotations – Structured and BBox Extraction. These allow developers to attach schema-driven labels to regions of a document and get bounding boxes for text and other elements, which is useful when mapping content into downstream systems or UI overlays.

Key Takeaways

Model and positioning: Mistral OCR 3, named mistral-ocr-2512, is the new OCR service that powers Mistral's Document AI stack for page-based document understanding.
Accuracy gains: On internal benchmarks covering forms, scanned documents, complex tables, and handwriting, OCR 3 achieves a 74% overall win rate over Mistral OCR 2, and Mistral positions it as state-of-the-art against both traditional and AI-native OCR systems.
Structured outputs for RAG: The service extracts interleaved text and embedded images and returns markdown enriched with HTML-reconstructed tables, preserving layout and table structure so outputs can feed directly into RAG, agents, and search pipelines with minimal extra parsing.
API and document formats: Developers access OCR 3 via the /v1/ocr endpoint or SDK, passing PDFs as document_url and images such as png or jpeg as image_url, and can enable options like HTML table output, header or footer extraction, and base64 images in the response.
Pricing and batch processing: OCR 3 is priced at $2 per 1,000 pages and $3 per 1,000 annotated pages, and when used through the Batch API the effective price for standard OCR drops to $1 per 1,000 pages for large-scale processing.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.