Firecrawl unveils Rust-powered PDF Parser v2: up to 3x faster, with Auto, Fast, and OCR modes
Firecrawl's Rust-based PDF Parser v2 claims up to 3x faster extraction and defaults to Auto mode with OCR fallback, plus a maxPages parameter and unified billing (15 tokens = 1 credit).

Firecrawl rolled out PDF Parser v2, a ground-up rewrite that the company says is "up to 3x faster" and "more reliable across every document type," replacing the old extraction engine with a Rust-based parser and three modes: Auto, Fast, and OCR. The release highlights that Auto mode, the default, provides "fast extraction with automatic OCR fallback" and "handles the edge cases that break traditional parsers, including charts, tables, mixed encodings, and multi-column layouts." That lets users ingest academic papers, regulatory filings, and large document sets, or feed AI agents in real time, with no code changes required.
Developers can force a specific behavior via the parsePDF parameter; the release includes a Python example using formats=['markdown'] and an API key placeholder 'fc-YOUR_API_KEY'. The release-notes example:

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

# Auto mode (default): fast extraction with automatic OCR fallback
result = firecrawl.scrape(
    url='https://example.com/document.pdf',  # placeholder; the example URL was truncated in the source
    formats=['markdown'],
    parsePDF='auto'
)

# Fast mode: Rust-based text extraction only
result = firecrawl.scrape(
    url='https://example.com/document.pdf',
    formats=['markdown'],
    parsePDF='fast'
)

# OCR mode: for scanned or image-only PDFs
result = firecrawl.scrape(
    url='https://example.com/document.pdf',
    formats=['markdown'],
    parsePDF='ocr'
)
```

On the engineering side, the changelog records infrastructure and performance work including "Added static IP proxy pool + proxy location support," "Webhooks: Implemented signatures, refactored sending, added scrape error events," and "Performance: Optimized map, converted Rust natives to single NAPI library." The project added a maxPages parameter to the PDF parser in the v2 scrape API in PR #2047 by @devin-ai-integration[bot], and other repo activity includes #2067 by @rafaelsideguide for next cursor pagination and #2063 by @mogery adding a /team/queue-status endpoint.
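The maxPages parameter can be combined with parsePDF when building a scrape request. A minimal sketch of how such a request body might be assembled, as a plain dict with no network calls: the field names parsePDF and maxPages come from the release notes and PR #2047, but the helper function and the exact payload shape are assumptions, not a documented Firecrawl API.

```python
# Sketch of a v2 scrape request body using parsePDF and maxPages.
# Field names are from the release notes and PR #2047; the helper
# and overall payload shape are illustrative assumptions.

def build_scrape_payload(url, parse_pdf='auto', max_pages=None):
    """Assemble a scrape request body, adding maxPages only when set."""
    payload = {
        'url': url,
        'formats': ['markdown'],
        'parsePDF': parse_pdf,
    }
    if max_pages is not None:
        payload['maxPages'] = max_pages  # cap pages parsed from large PDFs
    return payload

payload = build_scrape_payload('https://example.com/report.pdf',
                               parse_pdf='fast', max_pages=50)
print(payload)
```

Keeping maxPages optional mirrors the parameter's role as an opt-in limit for large documents rather than a required field.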
The v2.6.0 highlights emphasize platform and billing changes: "Unified Billing Model - Credits and tokens merged into single system. Extract now uses credits (15 tokens = 1 credit), existing tokens work everywhere." Other v2.6.0 items listed verbatim include "Full Release of Branding Format - Full support across Playground, MCP, JS and Python SDKs," "Change Tracking - Faster and more reliable detection of web page content updates," "Reliability and Speed Improvements - All endpoints significantly faster with improved reliability," "Instant Credit Purchases - Buy credit packs directly from dashboard without waiting for auto-recharge," "Improved Markdown Parsing - Enhanced markdown conversion and main content extraction accuracy," and "Core Stability Fixes - Fixed change-tracking issues, PDF timeouts, and improved error handling."
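The 15-tokens-per-credit conversion is straightforward to apply when estimating Extract costs. A small helper, with the caveat that the function is ours and rounding partial credits up is an assumption about how metering typically works, not something the release notes specify:

```python
import math

TOKENS_PER_CREDIT = 15  # per the v2.6.0 unified billing note

def tokens_to_credits(tokens):
    """Convert a token count to credits, rounding partial credits up
    (rounding behavior is an assumption, not stated in the release)."""
    if tokens < 0:
        raise ValueError('token count must be non-negative')
    return math.ceil(tokens / TOKENS_PER_CREDIT)

print(tokens_to_credits(15))  # 1 credit
print(tokens_to_credits(45))  # 3 credits
print(tokens_to_credits(50))  # 4 credits (50/15 rounds up)
```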
Bug fixes in the release notes are explicit: "Corrected concurrency limit scaling," "Fixed search result links/descriptions and retry mechanism for empty results," "Re-signed expired screenshot URLs," "Trimmed null chars from PDF titles + fixed encoding," "Fixed sitemap parsing and added `.gz` sitemap support," "Fixed js-sdk `zod-to-json-schema` import," "Fixed webhook data format regression," and "Improved credit handling in account object," among others listed in the changelog.
Firecrawl positions the update as lowering friction for data projects, quoting its marketing copy: "Turn complex PDFs from the web into structured data much more quickly" and "Start getting Web Data for free and scale seamlessly as your project expands. No credit card needed." The announcement also claims the launch has gained "over 1k likes," a traction metric presented without platform context.
Missing from the release, per the notes, are independent benchmarks and a release date tied to the v2 rollout; the company has not published the dataset, hardware, or test methodology underlying the "up to 3x faster" claim, nor per-credit pricing beyond the 15 tokens = 1 credit conversion. For immediate practical use: specify parsePDF='fast' to prioritize Rust text extraction, parsePDF='ocr' for scanned or image-only PDFs, and set the maxPages parameter (PR #2047) when processing large documents to limit scope and cost.
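That guidance can be condensed into a small selection helper. This is illustrative only: the heuristics mirror the article's advice, not any decision logic Firecrawl has published.

```python
def choose_parse_pdf_mode(scanned_or_image_only=False, prioritize_speed=False):
    """Pick a parsePDF mode following the article's guidance:
    'ocr' for scanned/image-only PDFs, 'fast' when raw speed matters,
    otherwise the 'auto' default with its automatic OCR fallback."""
    if scanned_or_image_only:
        return 'ocr'
    if prioritize_speed:
        return 'fast'
    return 'auto'

print(choose_parse_pdf_mode())                            # auto
print(choose_parse_pdf_mode(prioritize_speed=True))       # fast
print(choose_parse_pdf_mode(scanned_or_image_only=True))  # ocr
```

Checking the image-only case first reflects the article's ordering: OCR is the only mode that recovers text from scans, so it overrides any speed preference.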