May 28, 2026 · Tutorial

How to Verify PDF E-Signatures Programmatically with Node.js & Python (2026)

A developer's guide to extracting PKI certificate metadata, parsing ByteRange offsets, and verifying cryptographic signatures inside PDFs.

Michael Beckett

Founder, Signbee

TL;DR

Visual stamps inside a PDF do not prove a signature is valid. Only cryptographic verification can show that a document was signed with a specific certificate and has not been altered since. In this guide, we break down the PDF binary structure, parse ByteRange arrays, and write programmatic verification scripts in Node.js and Python.

How PDF digital signatures work

When you sign a PDF file, the PDF software does not just draw an image of your signature on the page. It embeds a cryptographic signature inside the PDF's binary structure. Because the signature is stored apart from the bytes it covers (the rest of the file), this is known as a detached signature.

Within the PDF body, the signing software creates a signature dictionary (an object with /Type /Sig). This object holds a placeholder slot where the signature itself will reside, along with information telling the parser how to find the signed bytes:

  • /Contents: A hex-encoded PKCS#7 / CMS (Cryptographic Message Syntax) container holding the signer's certificate (often with its chain), the signature value, and signed attributes such as the digest of the signed bytes. The hex string is zero-padded to fill the reserved placeholder.
  • /ByteRange: An array of integers, normally four, that tells the reader exactly which byte ranges of the document were hashed and signed. It covers everything except the /Contents value itself.
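To make the two entries concrete, here is a minimal Python sketch (standard library only) that finds the first /ByteRange, reads the /Contents hex string from the gap it leaves, and trims the zero padding by reading the DER length of the outer SEQUENCE. The function name extract_cms_blob is hypothetical; the sketch assumes a single, uncompressed signature dictionary and a definite-length encoding, whereas a full parser (such as pyHanko) resolves objects through the cross-reference table instead.

```python
import re

BYTE_RANGE_RE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def extract_cms_blob(pdf_bytes: bytes) -> bytes:
    """Return the DER-encoded CMS container stored in the first /Contents hole.

    Sketch only: assumes the first /ByteRange belongs to the signature you want
    and that the CMS container uses a definite-length DER header.
    """
    match = BYTE_RANGE_RE.search(pdf_bytes)
    if match is None:
        raise ValueError("no /ByteRange found; the document may not be signed")
    start1, len1, start2, _len2 = (int(g) for g in match.groups())

    # The gap between the two ranges is the /Contents value: <hex digits...>
    hole = pdf_bytes[start1 + len1:start2]
    raw = bytes.fromhex(re.sub(rb"[<>\s]", b"", hole).decode("ascii"))

    # Use the outer SEQUENCE's length prefix to drop the trailing zero padding
    if not raw or raw[0] != 0x30:
        raise ValueError("CMS container does not start with a DER SEQUENCE")
    first_len_byte = raw[1]
    if first_len_byte < 0x80:  # short form: the byte is the length itself
        header_len, body_len = 2, first_len_byte
    else:  # long form: the low 7 bits say how many length bytes follow
        n = first_len_byte & 0x7F
        if n == 0:
            raise ValueError("indefinite-length (BER) encoding is not handled here")
        header_len, body_len = 2 + n, int.from_bytes(raw[2:2 + n], "big")
    return raw[:header_len + body_len]
```

Reading the length prefix is one way to discard the padding; the alternative is to hand the padded bytes to an ASN.1 parser that tolerates trailing data.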

Understanding the ByteRange structure

Because a signature cannot contain its own hash without changing the file contents (and invalidating the hash), we must leave a hole in the binary stream where the signature is placed.

The /ByteRange array format looks like: [Start1, Length1, Start2, Length2]. For example, if a PDF is 100,000 bytes long, and the signature placeholder starts at byte 10,000 and ends at byte 25,000, the array will read:

/ByteRange [0 10000 25000 75000]

To verify this document, a program must read the first 10,000 bytes (offsets 0 to 9,999), concatenate them with the 75,000 bytes from offset 25,000 to the end of the file, and hash the combined sequence with the digest algorithm named in the CMS container (usually SHA-256).
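The reconstruction step can be sketched in a few lines of Python. The function names are hypothetical, and SHA-256 is only a default here: in a real check, the algorithm comes from the digestAlgorithm field of the CMS SignerInfo, and the result is compared with the messageDigest attribute inside the /Contents container.

```python
import hashlib


def signed_bytes(pdf_bytes: bytes, byte_range: list[int]) -> bytes:
    """Concatenate the (offset, length) pairs listed in /ByteRange."""
    if len(byte_range) % 2:
        raise ValueError("/ByteRange must hold offset/length pairs")
    return b"".join(
        pdf_bytes[start:start + length]
        for start, length in zip(byte_range[0::2], byte_range[1::2])
    )


def byte_range_digest(pdf_bytes: bytes, byte_range: list[int],
                      algorithm: str = "sha256") -> bytes:
    """Hash the signed bytes; compare this to the messageDigest in the CMS blob."""
    return hashlib.new(algorithm, signed_bytes(pdf_bytes, byte_range)).digest()
```

For the 100,000-byte example above, signed_bytes(pdf, [0, 10000, 25000, 75000]) returns 85,000 bytes: the 15,000-byte hole holding /Contents is skipped.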

PDF signature standards compared

Format | Specification | Key Feature
PKCS#7 Detached | RFC 2315 / PDF 1.7 (ISO 32000-1) | Basic binary signing envelope
PAdES-B-B | ETSI EN 319 142-1 | Basic electronic signature profile
PAdES-B-LT | ETSI EN 319 142-1 / PDF 2.0 | Embedded CRL/OCSP revocation info
Adobe.PPKLite | Adobe extension | /Filter name of Adobe's built-in signature handler
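To see which of these formats a given file uses, read the signature dictionary's /SubFilter entry (Adobe.PPKLite appears in /Filter, not /SubFilter). Both PAdES levels use ETSI.CAdES.detached; the long-term level additionally carries a /DSS (Document Security Store) dictionary with the revocation data. The sketch below is a naive byte scan with a hypothetical function name, so it will miss dictionaries stored in compressed object streams.

```python
import re

# /SubFilter names from ISO 32000 and ETSI PAdES, mapped to readable labels
SUBFILTER_FORMATS = {
    b"adbe.pkcs7.detached": "PKCS#7 detached",
    b"adbe.pkcs7.sha1": "PKCS#7 over a SHA-1 digest (deprecated in PDF 2.0)",
    b"adbe.x509.rsa_sha1": "raw RSA-SHA1 (deprecated in PDF 2.0)",
    b"ETSI.CAdES.detached": "PAdES (CAdES-based)",
    b"ETSI.RFC3161": "document timestamp (RFC 3161)",
}


def signature_formats(pdf_bytes: bytes) -> list[str]:
    """Naive byte scan: label every /SubFilter name found in the file."""
    names = re.findall(rb"/SubFilter\s*/([A-Za-z0-9.#_-]+)", pdf_bytes)
    return [
        SUBFILTER_FORMATS.get(name, f"unknown ({name.decode('latin-1')})")
        for name in names
    ]
```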

Programmatic verification in Node.js

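In Node.js, we can locate the /ByteRange and the hex-encoded /Contents inside the PDF buffer, reconstruct the signed data stream, and check it against the CMS container. node-forge handles the ASN.1 parsing, but its PKCS#7 verify() method is not implemented for signed data, so the digest and signature checks use Node's built-in crypto module (crypto.X509Certificate requires Node.js 15.6 or later). This is a sketch: it checks integrity and the signature itself, not the trust chain or revocation status.

JavaScript — PDF Signature Extraction & Verification
const fs = require('fs');
const crypto = require('crypto');
const forge = require('node-forge');

const { asn1 } = forge;
const OID_MESSAGE_DIGEST = '1.2.840.113549.1.9.4';
// Digest algorithm OIDs mapped to Node.js hash names (extend as needed)
const HASH_NAMES = {
  '1.3.14.3.2.26': 'sha1',
  '2.16.840.1.101.3.4.2.1': 'sha256',
  '2.16.840.1.101.3.4.2.2': 'sha384',
  '2.16.840.1.101.3.4.2.3': 'sha512'
};

function verifyPdfSignature(pdfFilePath) {
  const pdfBuffer = fs.readFileSync(pdfFilePath);
  const pdfString = pdfBuffer.toString('latin1');

  // 1. Locate the ByteRange [Start1 Length1 Start2 Length2].
  //    Naive: this only finds the first signature (see the pitfalls below).
  const byteRangeRegex = /\/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]/;
  const match = byteRangeRegex.exec(pdfString);
  if (!match) {
    throw new Error('No ByteRange match found. Document might not be signed.');
  }
  const byteRange = match.slice(1, 5).map((n) => parseInt(n, 10));

  // 2. Reconstruct the signed data segment (everything except the /Contents hole)
  const signedContent = Buffer.concat([
    pdfBuffer.subarray(byteRange[0], byteRange[0] + byteRange[1]),
    pdfBuffer.subarray(byteRange[2], byteRange[2] + byteRange[3])
  ]);

  // 3. Extract the CMS signature block (/Contents) that fills the hole
  const signatureHex = pdfBuffer
    .subarray(byteRange[0] + byteRange[1], byteRange[2])
    .toString('latin1')
    .replace(/[<>\s]/g, ''); // strip angle brackets, spaces and newlines
  const cmsDer = Buffer.from(signatureHex, 'hex').toString('binary');

  // 4. Parse the CMS ContentInfo. The hex string is zero-padded, so tell
  //    node-forge (1.3+) not to insist on consuming every byte.
  const contentInfo = asn1.fromDer(cmsDer, { parseAllBytes: false });
  // forge.pkcs7's verify() is not implemented for SignedData, so walk the
  // ASN.1 tree by hand: ContentInfo -> [0] SignedData -> SignerInfos (last field)
  const signedData = contentInfo.value[1].value[0];
  const signerInfo = signedData.value[signedData.value.length - 1].value[0];
  // SignerInfo: version, sid (issuerAndSerialNumber assumed), digestAlgorithm,
  // [0] signedAttrs, signatureAlgorithm, signature
  const [, sid, digestAlg, signedAttrs, , signatureNode] = signerInfo.value;
  if (signedAttrs.tagClass !== asn1.Class.CONTEXT_SPECIFIC || signedAttrs.type !== 0) {
    throw new Error('SignerInfo has no signed attributes; not handled in this sketch.');
  }
  const hashName = HASH_NAMES[asn1.derToOid(digestAlg.value[0].value)];
  if (!hashName) {
    throw new Error('Unsupported digest algorithm.');
  }

  // 5. Integrity: the ByteRange digest must equal the signed messageDigest attribute
  const mdAttr = signedAttrs.value.find(
    (attr) => asn1.derToOid(attr.value[0].value) === OID_MESSAGE_DIGEST
  );
  if (!mdAttr) {
    throw new Error('No messageDigest attribute found.');
  }
  const expectedDigest = Buffer.from(mdAttr.value[1].value[0].value, 'binary');
  const actualDigest = crypto.createHash(hashName).update(signedContent).digest();
  const digestMatches = expectedDigest.equals(actualDigest);

  // 6. Find the signer's certificate among the embedded certificates ([0] field)
  const certsNode = signedData.value.find(
    (node) => node.tagClass === asn1.Class.CONTEXT_SPECIFIC && node.type === 0
  );
  const normalizeSerial = (hex) => hex.toUpperCase().replace(/^0+/, '');
  const signerSerial = normalizeSerial(forge.util.bytesToHex(sid.value[1].value));
  const signerCert = (certsNode ? certsNode.value : [])
    .map((node) => new crypto.X509Certificate(Buffer.from(asn1.toDer(node).getBytes(), 'binary')))
    .find((cert) => normalizeSerial(cert.serialNumber) === signerSerial);
  if (!signerCert) {
    throw new Error('Signer certificate is not embedded in the CMS container.');
  }

  // 7. Authenticity: the signature covers the DER-encoded signed attributes,
  //    re-tagged from [0] IMPLICIT (0xA0) to SET OF (0x31) as CMS requires.
  //    Assumes RSA PKCS#1 v1.5 or ECDSA; RSA-PSS needs extra padding options.
  const signedAttrsDer = Buffer.from(asn1.toDer(signedAttrs).getBytes(), 'binary');
  signedAttrsDer[0] = 0x31;
  const signatureValid = crypto.verify(
    hashName,
    signedAttrsDer,
    signerCert.publicKey,
    Buffer.from(signatureNode.value, 'binary')
  );

  // 8. Extract certificate information (no chain or revocation checks here)
  const cnMatch = /^CN=(.*)$/m.exec(signerCert.subject);

  return {
    verified: digestMatches && signatureValid,
    digestMatches,
    signatureValid,
    signerCommonName: cnMatch ? cnMatch[1] : 'Unknown Signer',
    validFrom: signerCert.validFrom,
    validTo: signerCert.validTo
  };
}

// Example usage:
// const result = verifyPdfSignature('./signed_contract.pdf');
// console.log(`Verified: ${result.verified}, Signer: ${result.signerCommonName}`);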

Programmatic verification in Python

In Python, a widely used option is the pyHanko library together with pyhanko-certvalidator. pyHanko handles the low-level byte parsing, while pyhanko-certvalidator performs the certificate trust-chain checks.

Python — Signature Verification with pyHanko
from pyhanko.pdf_utils.reader import PdfFileReader
from pyhanko_certvalidator import ValidationContext
from pyhanko.sign.validation import validate_pdf_signature

def verify_python_pdf_signature(pdf_path, trust_roots=()):
    # Trust anchors: pass the CA root certificates you trust. With an empty
    # list, the signer's chain cannot be anchored, so status.trusted stays False.
    validation_context = ValidationContext(trust_roots=list(trust_roots))

    with open(pdf_path, 'rb') as f:
        reader = PdfFileReader(f)

        # Enumerate every signature embedded in the PDF document
        for embedded_sig in reader.embedded_signatures:
            print(f"Validating field: {embedded_sig.field_name}")

            status = validate_pdf_signature(
                embedded_sig, signer_validation_context=validation_context
            )

            # intact: the ByteRange digest matches; valid: the signature checks
            # out cryptographically; trusted: the chain ends in a trust root
            print(f"Signature Intact: {status.intact}")
            print(f"Signature Valid: {status.valid}")
            print(f"Chain Trusted: {status.trusted}")

            # Extract certificate metadata (an asn1crypto Certificate)
            cert = status.signing_cert
            print(f"Signed by: {cert.subject.human_friendly}")
            print(f"Issuer: {cert.issuer.human_friendly}")
            print(f"Valid from: {cert.not_valid_before}")
            print(f"Valid to: {cert.not_valid_after}")

# Usage
# verify_python_pdf_signature('NDA_signed.pdf')

Verifying certificate revocation (OCSP/CRL)

Verifying that the cryptographic signature is intact only tells you that the document hasn't been tampered with since signing. It does not prove the signer's identity was valid at that moment. You must check for certificate revocation.

Certificates can be revoked by the issuing Certificate Authority (CA) if the signer's private key is compromised, or if their corporate credentials change. There are two mechanisms for checking this:

  • Certificate Revocation Lists (CRLs): A list of revoked serial numbers published periodically by the CA. Your verification system must download this list and check if the signing certificate is present.
  • Online Certificate Status Protocol (OCSP): A real-time query to the CA's OCSP responder, whose URL is listed in the certificate's Authority Information Access extension. An OCSP exchange is much lighter than downloading large CRLs.
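Both mechanisms can be sketched with the cryptography package, which the earlier examples do not use. The sketch assumes you already hold the signer and issuer certificates as cryptography x509.Certificate objects and the CRL as DER bytes; the network calls (downloading the CRL, sending the OCSP request) are left out, and the function names are hypothetical.

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.x509 import ocsp
from cryptography.x509.oid import AuthorityInformationAccessOID


def is_revoked_in_crl(cert: x509.Certificate, issuer: x509.Certificate,
                      crl_der: bytes) -> bool:
    """CRL check: is the certificate's serial number on the issuer's signed list?"""
    crl = x509.load_der_x509_crl(crl_der)
    # Only trust a CRL whose signature verifies against the issuing CA's key
    if not crl.is_signature_valid(issuer.public_key()):
        raise ValueError("CRL signature does not verify against the issuer")
    return crl.get_revoked_certificate_by_serial_number(cert.serial_number) is not None


def build_ocsp_request(cert: x509.Certificate,
                       issuer: x509.Certificate) -> tuple[str, bytes]:
    """OCSP: return (responder URL from the AIA extension, DER-encoded request)."""
    try:
        aia = cert.extensions.get_extension_for_class(x509.AuthorityInformationAccess).value
    except x509.ExtensionNotFound:
        raise ValueError("certificate has no Authority Information Access extension")
    urls = [d.access_location.value for d in aia
            if d.access_method == AuthorityInformationAccessOID.OCSP]
    if not urls:
        raise ValueError("certificate lists no OCSP responder")
    # SHA-1 CertIDs are what lightweight OCSP responders expect (RFC 5019)
    request = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1()).build()
    return urls[0], request.public_bytes(serialization.Encoding.DER)
```

To query the responder, POST the request bytes to the returned URL with the Content-Type application/ocsp-request, then parse the reply with ocsp.load_der_ocsp_response(...) and read its certificate_status (GOOD, REVOKED, or UNKNOWN). A production checker must also verify the responder's signature on the response, which this sketch omits.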

When using an enterprise API like Signbee, revocation checks, RFC 3161 timestamp validation, and LTV (Long-Term Validation) compliance are handled automatically, saving weeks of low-level development.

Common verification pitfalls to avoid

Pitfall | Security Risk
Using RegExp for binary extraction | Regex matches fail on incremental updates or cross-reference streams
Ignoring certificate expiration | Accepting documents signed with expired or otherwise invalid certificates
Not verifying the root anchor | Anyone can generate a self-signed certificate and claim it is legitimate
Skipping hash matching checks | Accepting documents whose signed bytes were modified after signing; pages appended post-signing also go unnoticed unless you check that the ByteRange reaches the end of the file
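Comparing hashes alone does not catch appended content: bytes after the ByteRange are never hashed, so a later incremental update can add pages without breaking an earlier signature's digest. A minimal standard-library sketch of the extra coverage check follows; the function name is hypothetical, and the regex scan shares the limitations listed in the first row. In a multi-signature document, a False result for an earlier signature is expected, because each later signature appends its own revision; it means the later revisions need inspecting (the difference analysis pyHanko performs during validation), not that the file is forged.

```python
import re

BYTE_RANGE_RE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def byte_range_coverage(pdf_bytes: bytes) -> list[bool]:
    """For each /ByteRange found, report whether it covers the whole file.

    True means: starts at offset 0, leaves one non-empty hole for /Contents,
    and ends exactly at the last byte of the file.
    """
    results = []
    for match in BYTE_RANGE_RE.finditer(pdf_bytes):
        start1, len1, start2, len2 = (int(g) for g in match.groups())
        results.append(
            start1 == 0
            and start1 + len1 < start2
            and start2 + len2 == len(pdf_bytes)
        )
    return results
```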

Frequently Asked Questions

How does programmatic PDF signature verification differ from visual verification?

Visual verification only checks for the presence of a visual signature image in the document, which can be easily faked. Programmatic cryptographic verification extracts the digital signature from the PDF's binary data, parses the PKCS#7 envelope, extracts the signing certificate, and verifies the cryptographic hash of the PDF byte range. This proves that the document has not been altered since it was signed and, once the certificate chain is validated against a trusted root, ties the signature to the signer's identity through PKI (Public Key Infrastructure).

What is the role of ByteRange in PDF digital signatures?

The ByteRange is an array containing an even number of integers, read as (offset, length) pairs, that specifies which bytes of the PDF file are covered by the digital signature. It excludes the signature contents (/Contents) themselves, because a signature cannot sign itself. To verify a signature, you must reconstruct the exact signed byte sequence by concatenating the slices defined in the ByteRange and check whether its hash matches the signed hash contained in the PKCS#7 structure.

How do you check if an e-signature certificate has been revoked?

To verify if a certificate is revoked, you must query the Certificate Authority (CA) that issued the certificate. This is done programmatically using Online Certificate Status Protocol (OCSP) requests or by downloading Certificate Revocation Lists (CRLs) listed in the certificate's extensions. If the OCSP response or CRL indicates the certificate was revoked before the signing time, the signature is invalid.


Last updated: May 29, 2026 · Michael Beckett is the founder of Signbee and B2bee Ltd.
