When ABBYY Vantage processes a PDF document, it needs to decide how to extract the text. PDFs can contain an embedded text layer (searchable text written directly into the file) or they may be image-only files that require optical character recognition (OCR) to extract text. The PDF Processing Mode setting gives you explicit control over which method Vantage uses. This is especially useful when:
- You are working in regulated industries where reproducibility and auditability of extraction results are required.
- Your document set contains PDFs whose embedded text layers are low-quality or unreliable, so OCR produces better results.
- You are migrating from ABBYY FlexiCapture and need to replicate the processing behavior of your existing workflows.
- You need consistent, predictable processing behavior across all documents regardless of their content.
Available modes
| Mode | Description | When to use |
|---|---|---|
| Default (Recommended) | Uses the embedded PDF text layer when available and supplements it with OCR as needed. This is the standard Vantage processing behavior. | General use. Recommended for most document sets with a mix of text-layer and image-only PDFs. |
| Use Text Layer Only | Extracts text exclusively from the embedded PDF text layer. If no text layer exists, Vantage falls back to OCR automatically. | Use when you have high-quality, trusted text layers and want faster extraction without full OCR. Useful for regulated environments where the existing text layer is the authoritative source. |
| Use OCR Only | Ignores any embedded PDF text layer and performs full OCR on every page of the document. | Use when PDF text layers are known to be unreliable or corrupt, or when you need consistent OCR-based extraction across all documents regardless of their structure. |
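To make the differences between the three modes concrete, here is a minimal sketch that models, for a single PDF, which text source each mode relies on. It is an illustration only, not Vantage code: the names `PdfProcessingMode` and `text_sources` are hypothetical, the per-document granularity is a simplification, and the setting itself is chosen in the skill UI rather than through any API shown here.

```python
from enum import Enum


class PdfProcessingMode(Enum):
    # Hypothetical enum mirroring the three options in the table above.
    DEFAULT = "Default (Recommended)"
    TEXT_LAYER_ONLY = "Use Text Layer Only"
    OCR_ONLY = "Use OCR Only"


def text_sources(mode: PdfProcessingMode, has_text_layer: bool) -> tuple[str, ...]:
    """Return the text sources a PDF is read from under the given mode (illustrative model)."""
    if mode is PdfProcessingMode.OCR_ONLY:
        # Any embedded text layer is ignored; every page is OCRed.
        return ("ocr",)
    if not has_text_layer:
        # Image-only PDF: the other two modes both fall back to OCR.
        return ("ocr",)
    if mode is PdfProcessingMode.TEXT_LAYER_ONLY:
        # Text comes exclusively from the embedded layer.
        return ("text_layer",)
    # Default: start from the embedded layer and supplement it with OCR as needed.
    return ("text_layer", "ocr_if_needed")


for mode in PdfProcessingMode:
    print(f"{mode.value}: text-layer PDF -> {text_sources(mode, True)}, "
          f"image-only PDF -> {text_sources(mode, False)}")
```

The model shows why only Use OCR Only gives a single processing path for every input: the other two modes still branch on whether a text layer is present.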
Example scenarios
The following examples show typical situations where each mode is the best choice.
Use Text Layer Only
Your organization processes digitally born PDF invoices exported from a vendor’s ERP system. The embedded text layer is accurate and machine-generated. Selecting Use Text Layer Only delivers fast, reliable extraction without running unnecessary OCR.
Default (Recommended)
You process a high-volume mix of scanned paper documents and digitally born PDFs in the same workflow. Some files have clean text layers; others don’t. Default (Recommended) handles both automatically, with no per-document configuration.
Use OCR Only
Your document set consists of PDFs produced by a legacy scanning system that embeds a low-quality text layer during scanning. That embedded layer contains recognition errors that degrade field extraction. Use OCR Only bypasses it entirely and extracts clean text directly from the page image.
Regulated Environments
You work in a regulated industry (such as financial services or healthcare) where extraction results must be fully reproducible and auditable. Locking the mode to either Use Text Layer Only or Use OCR Only ensures the same processing path is always used, regardless of how documents arrive.
Where to configure
The PDF Processing Mode setting is available in the following locations:
- OCR skill settings: General tab, under Image Processing
- OCR activity settings within a Process skill: General tab, under Image Processing
Supported Technology Core versions
PDF Processing Mode is supported for skills using Technology Core 3. It is not available for earlier Technology Core versions.
Related topics
- OCR skill: Overview of the OCR skill and what it can extract.
- Set up an OCR skill: Create a new OCR skill and configure each tab.
- OCR activity: Run an OCR skill as part of a Process skill workflow.
- Technology Core versions: Choose the engine version that powers a skill.
- Skill Catalog: Discover, publish, and reuse skills across your tenant.
