How does privacy compare between browser and cloud PDF tools?

Client-side tools never transmit the PDF to a server. File content stays in the browser's memory for the session and is destroyed when you close the tab. Cloud tools upload the PDF; even reputable providers retain logs, and their TLS termination points can be subject to lawful intercept or breach. For sensitive documents, client-side is strictly safer.

What are the memory limits?

Browsers cap WASM heap at 2-4GB depending on OS and build (4GB on 64-bit, sometimes 2GB on older Windows). For a typical PDF this is plenty; for 500-page scanned documents or high-DPI image PDFs, you may need to process page-by-page rather than loading the whole document.

What libraries power browser-based PDF tools?

The core stack is pdf.js (from Mozilla) for rendering and text extraction, pdf-lib for creation and modification, and tesseract.js for client-side OCR. Compression and image recoding rely on squoosh-style WASM encoders (mozjpeg, oxipng, libwebp). All are open source and production-ready.

Can I use Web Workers for PDF processing?

Yes, and you should. Main-thread PDF work kills INP and freezes the UI. Move pdf-lib, tesseract.js, and image re-encoding into a dedicated worker via Comlink. The worker can also use OffscreenCanvas for rendering without blocking the main thread.

Does WebGPU help PDF processing?

For compute-intensive tasks like image filtering, OCR pre-processing, and rasterization at scale, yes. WebGPU shaders can offload pixel operations that would otherwise saturate the CPU. General PDF structure manipulation (pdf-lib) doesn't benefit because it's symbolic, not pixel work.

How does mobile performance compare to desktop?

Mid-tier phones complete typical PDF operations 3-8x slower than desktops due to CPU, memory, and thermal throttling. Files over 50MB become noticeably slow on mobile. Display a progress indicator, avoid blocking the UI, and consider offering server-side fallback for genuinely large files.

What are the trade-offs vs server-side?

Client-side wins on privacy, offline capability, server cost, and latency for small files. Server-side wins on very large files, guaranteed compatibility with every PDF variant, real-time collaboration, and when the output must be emailed or linked without the user's device doing the work.

BLOG · UPDATED 2026-04-17

Client-Side PDF Processing with WebAssembly: The 2026 Architecture

Q: Is client-side PDF processing actually viable in 2026?

Yes, for most operations on files up to ~200MB. WebAssembly, WebGPU, and mature JS libraries (pdf-lib, pdf.js, tesseract.js) cover viewing, merging, splitting, form filling, compression, OCR, and text extraction. Very large files and heavy batch workflows still favor server-side processing because browser memory is capped at 2-4GB.

April 17, 2026 · 20 min read · By FastTool Editors

Three years ago, serious PDF tooling meant a server. You uploaded your tax return, your NDA, your medical record; the server parsed it, did the operation, and returned a result. By 2026 that architecture looks increasingly wrong for small-to-medium files. Browsers got fast. WebAssembly got mature. JavaScript libraries for PDF manipulation got good enough that the same operations run in-memory, on the user's device, in less time than the round-trip used to take. This post is about how to actually build that stack, where the limits are, and when to still keep a server in the loop.

Why client-side in 2026
The core stack: pdf.js, pdf-lib, tesseract.js
How WebAssembly unlocks this
Memory limits and large files
Web Workers + OffscreenCanvas
WebGPU for compute-heavy operations
Privacy trade-offs
Mobile performance reality
Operations catalogue: what works client-side
When you still need a server
Case: 80-tool browser PDF suite
FAQ

Why Client-Side in 2026

The economics and the user experience both flipped. In 2021 a server-based PDF tool made sense because browsers couldn't do the work. In 2026, keeping PDF processing on a server means:

Higher infrastructure cost (per-GB compute, storage egress).
Higher latency (network round-trip dominates small-file operations).
Real privacy obligations (GDPR, HIPAA, CCPA all touch uploaded user files).
Worse offline experience (useless without connectivity).
Harder compliance story (data residency, retention, audit trails).

Client-side architecture inverts all five. Standard processing stays in your browser. The user's CPU and GPU do the work. No regulatory exposure from uploads because there are no uploads. Offline works. Infrastructure costs are a static CDN bill, not a per-user compute bill.

The browser is the right PDF processing environment for 80-90% of personal and small-business workflows. Server-side survives for enterprise scale, very large documents, and workflows where the server already owns the data anyway.

The Core Stack: pdf.js, pdf-lib, tesseract.js

Three libraries handle the overwhelming majority of PDF operations entirely in the browser. None require a paid license. All are actively maintained.

Library	Role	Size	Works in worker
pdf.js	Parse, render, extract text	~400KB gzip	Yes (Mozilla provides pdf.worker.js)
pdf-lib	Create, modify, merge, split	~350KB gzip	Yes
tesseract.js	Client-side OCR	~1MB + language data	Yes, and should
mozjpeg-wasm	JPEG re-compression	~400KB	Yes
oxipng-wasm	PNG optimization	~600KB	Yes

pdf.js for rendering and text extraction

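import * as pdfjsLib from 'pdfjs-dist';
// The worker build must match the installed pdfjs-dist version, so pin it.
// pdfjs-dist 4.x and later ship the worker as pdf.worker.min.mjs instead.
pdfjsLib.GlobalWorkerOptions.workerSrc =
  `https://cdn.jsdelivr.net/npm/pdfjs-dist@${pdfjsLib.version}/build/pdf.worker.min.js`;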

async function extractText(file) {
  const buffer = await file.arrayBuffer();
  const pdf = await pdfjsLib.getDocument({ data: buffer }).promise;
  const pages = [];
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const content = await page.getTextContent();
    pages.push(content.items.map(item => item.str).join(' '));
  }
  return pages;
}

pdf-lib for modification

import { PDFDocument, rgb } from 'pdf-lib';

async function addWatermark(file, text) {
  const buffer = await file.arrayBuffer();
  const pdf = await PDFDocument.load(buffer);

  for (const page of pdf.getPages()) {
    const { width, height } = page.getSize();
    page.drawText(text, {
      x: width / 2 - 100,
      y: height / 2,
      size: 40,
      color: rgb(0.9, 0.1, 0.1),
      opacity: 0.25,
      rotate: { type: 'degrees', angle: -30 },
    });
  }

  return new Blob([await pdf.save()], { type: 'application/pdf' });
}

Merging and splitting

async function merge(files) {
  const merged = await PDFDocument.create();
  for (const f of files) {
    const src = await PDFDocument.load(await f.arrayBuffer());
    const pages = await merged.copyPages(src, src.getPageIndices());
    pages.forEach(p => merged.addPage(p));
  }
  return new Blob([await merged.save()], { type: 'application/pdf' });
}

OCR with tesseract.js

import { createWorker } from 'tesseract.js';
import * as pdfjsLib from 'pdfjs-dist';

async function ocrPdf(file, lang = 'eng') {
  const worker = await createWorker(lang);

  // Render each page to an offscreen canvas, then OCR the encoded image
  const pdf = await pdfjsLib.getDocument({ data: await file.arrayBuffer() }).promise;
  const results = [];
  try {
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const viewport = page.getViewport({ scale: 2 });
      const canvas = new OffscreenCanvas(Math.ceil(viewport.width), Math.ceil(viewport.height));
      const ctx = canvas.getContext('2d');
      await page.render({ canvasContext: ctx, viewport }).promise;

      // recognize() accepts a Blob; an ImageBitmap is not a documented input
      const blob = await canvas.convertToBlob({ type: 'image/png' });
      const { data } = await worker.recognize(blob);
      results.push(data.text);
    }
  } finally {
    // Release the OCR worker even if a page fails
    await worker.terminate();
  }
  return results;
}

Pipe OCR output through a text case converter and whitespace trimmer to clean it up before display. For structured extraction (invoice fields, form data), combine OCR with regex or named-entity extraction.
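The OCR example renders at a fixed scale of 2. PDF user space uses 72 points per inch, so a pdf.js viewport scale of 1 corresponds to roughly 72 DPI (ignoring any UserUnit entry), and scale 2 to about 144 DPI. The sketch below shows one way to choose a scale from a target DPI while keeping the canvas under a pixel budget. The helper names, the 300 DPI default, and the 16-megapixel cap are assumptions for illustration, not values from any library.

```javascript
// PDF user space uses 72 points per inch, so scale = dpi / 72.
const POINTS_PER_INCH = 72;

// Hypothetical helper: pick a render scale that targets `targetDpi`
// but never exceeds `maxPixels` for the resulting canvas.
function pickRenderScale(widthPt, heightPt, targetDpi = 300, maxPixels = 16e6) {
  const wanted = targetDpi / POINTS_PER_INCH;
  const pixelsAtWanted = widthPt * wanted * heightPt * wanted;
  if (pixelsAtWanted <= maxPixels) return wanted;
  // Pixel count grows with scale squared, so shrink by the square root.
  return wanted * Math.sqrt(maxPixels / pixelsAtWanted);
}

// Approximate RGBA backing-store size of a canvas rendered at `scale`.
function canvasBytes(widthPt, heightPt, scale) {
  return Math.ceil(widthPt * scale) * Math.ceil(heightPt * scale) * 4;
}
```

A US Letter page (612 x 792 points) at scale 2 needs about 7.8MB of pixel data per canvas, which is why capping the scale matters on phones.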

How WebAssembly Unlocks This

WebAssembly is a compilation target: C, C++, and Rust compile to a bytecode that runs at near-native speed in the browser. For PDF processing this matters because:

Battle-tested native libraries (mozjpeg, libwebp, poppler, Ghostscript-derivatives) compile to WASM with minimal changes.
Typed arrays give JS direct access to WASM memory without serialization cost.
SIMD extensions in WebAssembly 2 let encoders use vector instructions.
WebAssembly Garbage Collection (WasmGC) shipped in Chrome and Firefox in late 2023 and in Safari in late 2024, making GC languages (Kotlin, Scala, C# coming) viable without bundling their own garbage collector.

Real-world impact: compressing a 5MB JPEG inside a PDF takes ~800ms in pure JS; ~120ms in mozjpeg-wasm. Running OCR over a 20-page document takes ~90 seconds in JS-only OCR approaches; ~15 seconds with tesseract.js WASM.
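The point about typed arrays can be seen without compiling any module. The sketch below uses only the standard WebAssembly.Memory API: a typed-array view is a window onto linear memory, so writing PDF bytes into it involves no serialization. It also shows a common pitfall, assuming non-shared memory: growing the memory detaches earlier views.

```javascript
// One 64 KiB page of linear memory, growable to 4 pages.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });

// A typed-array view is a window onto the same bytes, not a copy.
const view = new Uint8Array(memory.buffer);
view.set([0x25, 0x50, 0x44, 0x46], 0); // "%PDF" header bytes

// A second view over the same buffer reads what the first one wrote.
const second = new Uint8Array(memory.buffer, 0, 4);
const header = String.fromCharCode(...second);

// Growing memory detaches the old ArrayBuffer, so earlier views become empty.
memory.grow(1);
const staleLength = view.length;

// Re-create views after every grow; the data itself is preserved.
const fresh = new Uint8Array(memory.buffer, 0, 4);
```

In practice this means any code that caches a view over an encoder's memory should rebuild it after calls that may allocate.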

Memory Limits and Large Files

The browser caps WebAssembly linear memory at 4GB on 64-bit and 2GB on 32-bit builds. Practical ceilings:

Text-only PDFs: 200MB+ handled comfortably.
Image-heavy PDFs: ~100MB before memory pressure.
Scanned documents with embedded 300-DPI images: ~50MB per load, but page-by-page streaming avoids the cap.
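A minimal sketch of how those ceilings might be encoded as a load decision. The thresholds are the figures from the list above; the helper name and the three content classes are hypothetical, and real limits vary by device.

```javascript
const MB = 1024 * 1024;

// Thresholds mirror the practical ceilings listed above; tune them per device.
const WHOLE_FILE_LIMITS = {
  text: 200 * MB,
  images: 100 * MB,
  scanned: 50 * MB,
};

// Hypothetical helper: decide how to load a PDF given a rough content class.
function chooseLoadStrategy(kind, sizeBytes) {
  const limit = WHOLE_FILE_LIMITS[kind];
  if (limit === undefined) throw new RangeError(`unknown PDF kind: ${kind}`);
  return sizeBytes <= limit ? 'whole-file' : 'page-by-page';
}
```

Calling chooseLoadStrategy('scanned', file.size) before any parsing lets the UI warn the user or switch pipelines early.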

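Strategy for large files: process page-by-page rather than holding everything at once. pdf.js can fetch a document incrementally via range requests and renders one page at a time, so release each page (page.cleanup()) before moving to the next. pdf-lib parses a whole source file into memory and has no streaming mode, so when merging keep only one source document alive at a time and let the garbage collector reclaim it before loading the next.

// Sequential merge pattern: one source document in memory at a time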
async function mergeStream(files) {
  const out = await PDFDocument.create();
  for (const f of files) {
    const buf = await f.arrayBuffer();
    const src = await PDFDocument.load(buf);
    for (let i = 0; i < src.getPageCount(); i++) {
      const [p] = await out.copyPages(src, [i]);
      out.addPage(p);
    }
    // src goes out of scope; GC can reclaim it
  }
  return out.save();
}

Web Workers + OffscreenCanvas

PDF work on the main thread ruins INP. Move everything to a worker.

// pdf-worker.ts
import * as Comlink from 'comlink';
import { PDFDocument } from 'pdf-lib';

const api = {
  async merge(buffers) {
    const out = await PDFDocument.create();
    for (const b of buffers) {
      const src = await PDFDocument.load(b);
      const pages = await out.copyPages(src, src.getPageIndices());
      pages.forEach(p => out.addPage(p));
    }
    const bytes = await out.save();
    // Transfer the result back to the main thread instead of copying it
    return Comlink.transfer(bytes, [bytes.buffer]);
  },
};

Comlink.expose(api);

// main.ts
import * as Comlink from 'comlink';

const worker = new Worker(new URL('./pdf-worker.ts', import.meta.url), { type: 'module' });
const api = Comlink.wrap(worker);

async function handleMerge(files) {
  // Array.from also accepts a FileList, which has no map()
  const buffers = await Promise.all(Array.from(files, f => f.arrayBuffer()));
  const merged = await api.merge(Comlink.transfer(buffers, buffers));
  // downloadBlob is your own helper for triggering a download
  downloadBlob(new Blob([merged], { type: 'application/pdf' }));
}

OffscreenCanvas lets the worker render pages without touching the DOM. Combined with transferable ArrayBuffers (zero-copy transfer between main and worker), the main thread stays free for user interactions.
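Transfer semantics are easy to check in isolation. The sketch below uses structuredClone with a transfer list, which follows the same rules as postMessage with transferables: ownership of the bytes moves to the receiver and the sender's buffer is detached. It assumes a runtime that provides structuredClone (current browsers, Node 17 or later).

```javascript
// structuredClone with a transfer list follows the same rules as
// worker.postMessage(data, [buffer]): ownership moves, bytes are not copied.
const original = new Uint8Array([0x25, 0x50, 0x44, 0x46]).buffer;

const moved = structuredClone(original, { transfer: [original] });

const sizeAfterTransfer = original.byteLength; // sender's buffer is now detached
const receivedSize = moved.byteLength;         // receiver owns all four bytes
```

The practical consequence: after handing a file's ArrayBuffer to the worker, the main thread can no longer read it, so keep the File object around if the UI needs the data again.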

WebGPU for Compute-Heavy Operations

WebGPU brings compute shaders to the browser. For PDF tooling, the wins are:

Image pre-processing for OCR: binarization, deskewing, noise reduction on GPU is 20-50x faster than CPU.
Batch rasterization: rendering 50 pages in parallel on the GPU vs sequentially on CPU.
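Format re-encoding: AVIF and WebP encoders can offload parallel stages such as color conversion and block transforms to the GPU. Entropy coding is largely sequential and stays on the CPU.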

WebGPU is stable in Chrome, Edge, and Safari. Firefox has shipped it on some platforms (Windows first, in Firefox 141), with the rest still in progress as of April 2026. For cross-browser code, feature-detect and fall back to WASM-SIMD or plain WASM.

// requestAdapter() resolves to null when no usable GPU is available
const adapter = navigator.gpu ? await navigator.gpu.requestAdapter() : null;
if (adapter) {
  const device = await adapter.requestDevice();
  // GPU-accelerated image pipeline
} else {
  // CPU fallback via WASM-SIMD
}
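The fallback chain described above (WebGPU, then WASM-SIMD, then plain WASM) can be sketched as a single detection function. The SIMD probe is a hand-assembled minimal module whose only function uses v128 instructions, the same idea the wasm-feature-detect package uses. The helper names and backend labels are hypothetical.

```javascript
// Minimal module whose only function uses v128 instructions; it validates
// only when the engine supports fixed-width SIMD.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3,
  2, 1, 0, 10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]);

function hasWasmSimd() {
  return typeof WebAssembly === 'object' && WebAssembly.validate(SIMD_PROBE);
}

// Hypothetical helper: resolves to 'webgpu', 'wasm-simd', or 'wasm'.
async function pickImageBackend() {
  const gpu = globalThis.navigator?.gpu;
  if (gpu) {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await gpu.requestAdapter().catch(() => null);
    if (adapter) return 'webgpu';
  }
  return hasWasmSimd() ? 'wasm-simd' : 'wasm';
}
```

Running the detection once at startup and caching the result keeps per-operation code free of capability checks.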

Privacy Trade-offs

The privacy story is the reason users increasingly prefer client-side PDF tools. A realistic comparison:

Property	Client-side	Cloud
File uploaded to external server	No	Yes
File visible to provider employees	No	Yes with admin access
File subject to lawful intercept	No	Yes
File lingers in logs / backups	No	Often yes
Regulatory scope (GDPR, HIPAA)	Minimal	Significant
Works offline	Yes	No
Works behind air-gap / firewall	Yes	No
Provider data breach impacts user	No	Yes

Client-side isn't a marketing claim; it's architecturally distinct, and you can verify it. Open the Network tab, run the operation, and check which outgoing requests fire while the file is being processed.

Mobile Performance Reality

Desktops do PDF work quickly. Phones don't. Typical comparisons we've measured on real devices:

Operation	M2 MacBook	Pixel 8	Moto G Power
Merge 5 PDFs (50 pages total)	0.8s	2.4s	6.1s
Compress 20MB photo PDF	3.2s	9.8s	28s
OCR 10 scanned pages	8s	22s	68s
Convert PDF to Word (20 pages)	2.1s	5.4s	14s

Mobile UX requires:

Progress indicators for anything over ~500ms.
Chunked processing with yield points (scheduler.yield) to keep the UI responsive.
Upfront file-size warnings above 20MB on mobile.
Graceful failures with clear error messages when memory runs out.
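The chunked-processing item can be sketched as follows. scheduler.yield is not available in every browser, so this version falls back to a zero-delay timeout. processInChunks, its 50ms budget, and the progress callback shape are assumptions for illustration.

```javascript
// Yield to the event loop: scheduler.yield() where available, otherwise a
// zero-delay timeout as a fallback.
function yieldToMain() {
  if (globalThis.scheduler && typeof globalThis.scheduler.yield === 'function') {
    return globalThis.scheduler.yield();
  }
  return new Promise(resolve => setTimeout(resolve, 0));
}

// Hypothetical helper: run `step` over `items`, yielding whenever the current
// slice has used more than `budgetMs`, and reporting progress as a fraction.
async function processInChunks(items, step, onProgress = () => {}, budgetMs = 50) {
  const results = [];
  let sliceStart = performance.now();
  for (let i = 0; i < items.length; i++) {
    results.push(await step(items[i], i));
    if (performance.now() - sliceStart > budgetMs) {
      onProgress((i + 1) / items.length);
      await yieldToMain();
      sliceStart = performance.now();
    }
  }
  onProgress(1);
  return results;
}
```

A per-page operation would pass the page indices as items and update a progress bar from onProgress. Inside a worker the yield matters less, but the progress reporting still applies.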

Operations Catalogue: What Works Client-Side

Everything on this list ships reliably in-browser:

View, render pages to images
Extract text (PDF text extractor)
Merge multiple PDFs (PDF merger)
Split by page or range (PDF splitter)
Add watermark, page numbers, signature (watermark, page numberer, signature adder)
Remove pages, rotate pages (remove, rotate)
Edit metadata (metadata editor)
Compress (image recode, font subsetting via PDF compressor)
Convert from images (image to PDF)
Convert to markdown (PDF to markdown)
Unlock with password (PDF unlock)
Fill forms (form filler)
OCR scanned pages with tesseract.js
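Several of these tools (split, remove, rotate) take a user-typed page selection. A small parser that turns such input into the zero-based indices that pdf-lib's copyPages expects might look like this. The function name and the accepted syntax ("1-3,5") are assumptions, not part of any library.

```javascript
// Hypothetical helper: turn "1-3,5" into zero-based page indices [0, 1, 2, 4].
// Input pages are 1-based, as users type them.
function parsePageRanges(spec, pageCount) {
  const indices = [];
  for (const part of spec.split(',')) {
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new SyntaxError(`invalid page range: "${part}"`);
    const first = Number(match[1]);
    const last = match[2] === undefined ? first : Number(match[2]);
    if (first < 1 || last < first || last > pageCount) {
      throw new RangeError(`range ${first}-${last} is outside 1-${pageCount}`);
    }
    for (let page = first; page <= last; page++) indices.push(page - 1);
  }
  return indices;
}
```

Validating against the real page count up front gives a clear error message instead of a failure deep inside the PDF library.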

When You Still Need a Server

Keep a server in the loop for:

Files over ~500MB or 1000+ pages where memory becomes the binding constraint.
Workflows requiring guaranteed rendering fidelity across every PDF variant (heavy forms, unusual fonts, obscure compression schemes) where server-side Ghostscript / Poppler is more comprehensive than pdf.js.
Collaboration workflows where multiple users need synchronized access to the same PDF with change tracking.
Digital signature workflows requiring CA-issued certificates whose private keys can't live in browser storage.
Enterprise SaaS where the customer already stores files server-side for other reasons (document management systems, DMS integration).
Operations requiring specialized commercial libraries (PDF/A conversion validation, heavy rasterization for archival).

For most consumer and small-business workflows, none of these apply. Ship client-side first; add server-side for the edge cases.

Case: 80-Tool Browser PDF Suite

A community-maintained 80-tool PDF suite running entirely in-browser was built in 2025 using pdf.js + pdf-lib + mozjpeg-wasm. Architecture notes from that project worth copying:

One shared Web Worker. A single long-lived worker handles all heavy operations; UI components post messages to it. Avoids spinning up a worker per operation.
Lazy-loaded WASM modules. mozjpeg loads only when compression is invoked; tesseract.js only on OCR. Initial bundle stays under 800KB.
Streaming file API. Large files are processed in chunks via File.stream() rather than arrayBuffer(). Keeps peak memory low.
IndexedDB session cache. Intermediate results (rendered page images, OCR outputs) cache in IndexedDB with TTL. Same file processed again in a session uses the cache.
No analytics inside processing code. Zero telemetry fires while PDF data is in memory. Usage metrics come from UI event counts only.
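Strict CSP. connect-src 'self' blocks fetch, XHR, WebSocket, and beacon requests to other origins, which closes the most likely paths for accidental data exfiltration. The Content Security Policy header is the privacy audit's best evidence.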

The pattern scales. Once the worker infrastructure and WASM loading are in place, adding a new PDF tool is a couple of hundred lines of glue and UI. That's how a small project ships 80 tools: you build the platform once.
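The lazy-loading note from that list can be sketched as a small wrapper that fetches a module on first use and reuses the same promise afterwards. lazyModule and the module path in the usage comment are hypothetical placeholders, not names from the project.

```javascript
// Hypothetical helper: wrap a loader so the module is fetched on first use
// and the same promise is reused afterwards.
function lazyModule(loader) {
  let pending = null;
  return () => {
    if (!pending) {
      pending = Promise.resolve()
        .then(loader)
        .catch(error => {
          pending = null; // allow a retry after a failed download
          throw error;
        });
    }
    return pending;
  };
}

// Usage sketch; './codecs/mozjpeg.js' is a placeholder path:
// const loadMozjpeg = lazyModule(() => import('./codecs/mozjpeg.js'));
// const { encode } = await loadMozjpeg();
```

Caching the promise rather than the resolved module means two tools that request the encoder at the same moment share one download.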

Frequently Asked Questions

Is client-side PDF processing actually viable in 2026?

Yes, for most files up to ~200MB. Beyond that, page-by-page processing or a server fallback covers the edge cases. For the 80-90% of personal and small-business workflows that fit in the browser, client-side is the better default.

How do I verify a tool is actually client-side?

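Load the tool fully while online, then open DevTools, go to the Network tab, set throttling to Offline, select the file, and run the operation. If it completes, the processing is client-side; a cloud tool fails as soon as it tries to upload. A tool that lazy-loads its WASM modules may need one online run first so they are cached. Also run the operation once online and inspect the request list, since passing the offline test doesn't prove nothing is sent when a connection is available.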

What about PDF/A archival compliance?

pdf-lib has no dedicated PDF/A mode, but PDF/A-1 and PDF/A-2 output is achievable if you add the required XMP metadata, an output intent with an embedded ICC profile, and fully embedded fonts yourself. Full compliance validation (not just creation) typically still requires server-side veraPDF. For generation-only workflows, client-side works.

Can I encrypt PDFs client-side?

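Yes, though not with stock pdf-lib, which has no built-in encryption API at the time of writing. Use a fork that adds password protection or a WASM build of a tool such as qpdf to apply AES-256. Client-side encryption is safer than cloud encryption because the password never touches any server.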

Does WebAssembly affect SEO?

WASM itself doesn't impact SEO directly. Good practices matter: ship a useful HTML landing page with content that describes the tool, lazy-load the WASM after initial render so LCP isn't harmed, and use schema.org SoftwareApplication markup.

What about digital signatures with actual certificates?

Adding visual signatures (drawn / image) works client-side. Cryptographically signing with a real X.509 certificate typically requires OS-integrated key store access, which browsers restrict. Workarounds: Web Crypto API for new key pairs generated in-browser, or delegate signing to a backend service that holds the cert.