Fraud Detection Engine

Every trick they try,
we catch it

11 detection methods powered by AI forensics, math validation, and deep document analysis. From PDF tampering to AI-generated fakes — nothing gets past DocEnsure.

11 Detection Methods
6 AI Signals
99% Accuracy
Core Detection

PDF Tampering Detection

DocEnsure analyzes the internal structure of every PDF to detect post-creation modifications. We inspect content streams, cross-reference tables, incremental saves, and object modifications to find evidence of tampering that's invisible to the naked eye.

How It Works

  • Parses PDF internal structure (xref table, content streams, object tree)
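  • Counts end-of-file markers (%%EOF): more than one points to incremental saves, which usually means the file was edited after creation (linearized files can legitimately contain two)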
  • Detects white-rectangle overlays used to hide original text
  • Identifies content stream modifications and rewritten objects
  • Checks for linearization tampering and broken cross-references
  • Analyzes page content order for insertion anomalies
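
The marker count in the list above can be sketched in a few lines of Python. This is a minimal illustration, not DocEnsure's implementation: the function names and the miniature byte strings are hypothetical, and the extra allowance for linearized files assumes that such files typically carry two %%EOF markers by design.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers in the raw bytes of a PDF file."""
    return pdf_bytes.count(b"%%EOF")


def looks_incrementally_saved(pdf_bytes: bytes) -> bool:
    """Heuristic: more %%EOF markers than the file layout explains."""
    # Assumption: a linearized PDF declares /Linearized near the start of
    # the file and normally contains two %%EOF markers, so allow one extra.
    allowed = 2 if b"/Linearized" in pdf_bytes[:1024] else 1
    return count_eof_markers(pdf_bytes) > allowed


# Hypothetical miniature inputs: not valid PDFs, just enough to count markers.
original = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\n%%EOF\n"
edited = original + b"2 0 obj\n<<>>\nendobj\ntrailer\n<<>>\n%%EOF\n"

print(count_eof_markers(original), looks_incrementally_saved(original))
print(count_eof_markers(edited), looks_incrementally_saved(edited))
```

A marker count on its own is only a hint. It carries weight when it lines up with the other structural checks, such as rewritten objects or an editor signature.
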

Editors Detected

  • Adobe Acrobat Pro / Reader edit traces
  • iText / iTextSharp library signatures
  • PDFtk, qpdf, and command-line tool markers
  • Foxit PhantomPDF modification artifacts
  • Nitro PDF editing footprints
  • Online PDF editors (Sejda, SmallPDF, PDFescape)
tampering_result.json
{
  "check": "pdf_tampering",
  "status": "FLAGGED",
  "eof_count": 3,
  "editor_detected": "iTextSharp 5.5.13",
  "white_rectangles": 2,
  "incremental_saves": true,
  "modified_objects": ["salary_amount", "net_pay"],
  "risk_score": 0.92,
  "details": "Document was created by SAP HR, then modified using iTextSharp. Two white rectangles detected overlaying original amount fields."
}
Financial Validation

Salary Slip Math Validation

Fraudsters edit salary amounts but forget to fix the math. DocEnsure extracts every number from the salary slip using OCR and validates all arithmetic relationships. If Basic + HRA + DA + Special Allowances does not equal Gross, or Gross minus Deductions does not equal Net Pay, we flag it instantly.

Validations Performed

  • Gross Pay = Basic + HRA + DA + Special Allowances
  • Net Pay = Gross Pay - (PF + ESI + Professional Tax + TDS + Other Deductions)
  • PF contribution approximately 12% of Basic Pay
  • ESI approximately 0.75% of Gross (if applicable)
  • Year-to-date totals match monthly accumulation
  • Tax deductions consistent with declared tax regime
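
The first three validations above reduce to simple arithmetic. The sketch below assumes the OCR step has already produced a dictionary of numbers; the field names, the rounding tolerance, and the one-percentage-point band around 12% are illustrative choices, not documented DocEnsure settings.

```python
def validate_salary(slip: dict, tolerance: float = 1.0) -> list:
    """Return a list of arithmetic errors found in an extracted salary slip."""
    errors = []

    # Gross Pay = Basic + HRA + DA + Special Allowances
    earnings = slip["basic"] + slip["hra"] + slip["da"] + slip.get("special", 0)
    if abs(earnings - slip["gross"]) > tolerance:
        errors.append(
            f"Gross mismatch: components sum to {earnings}, slip says {slip['gross']}"
        )

    # Net Pay = Gross Pay - total deductions
    deduction_keys = ("pf", "esi", "professional_tax", "tds", "other")
    deductions = sum(slip.get(key, 0) for key in deduction_keys)
    if abs(slip["gross"] - deductions - slip["net"]) > tolerance:
        errors.append(
            f"Net mismatch: {slip['gross']} - {deductions} does not equal {slip['net']}"
        )

    # PF should be approximately 12% of Basic (assumed band: 11% to 13%)
    if "pf" in slip and abs(slip["pf"] - 0.12 * slip["basic"]) > 0.01 * slip["basic"]:
        errors.append(f"PF {slip['pf']} is not about 12% of basic {slip['basic']}")

    return errors


# Figures taken from the sample salary_validation.json output on this page.
sample = {"basic": 45000, "hra": 18000, "da": 4500,
          "gross": 85000, "pf": 5400, "net": 79600}
for error in validate_salary(sample):
    print(error)
```

With these figures only the gross check fails. The PF and net amounts are internally consistent, which is typical when someone edits the gross and recomputes the net from it.
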

What It Detects

  • Inflated salary amounts where totals do not add up
  • Edited net pay that contradicts component breakdown
  • PF percentage mismatches indicating forged basic pay
  • Missing or extra components that break standard payroll structure
  • Rounding inconsistencies from manual number editing
salary_validation.json
{
  "check": "salary_math",
  "status": "FLAGGED",
  "extracted": {
    "basic": 45000,
    "hra": 18000,
    "da": 4500,
    "gross": 85000,
    "pf": 5400,
    "net": 79600
  },
  "errors": [
    "Gross mismatch: 45000+18000+4500 = 67500, not 85000",
    "PF check: 12% of 45000 = 5400 (OK)",
    "Net check: 85000-5400 = 79600 (consistent with the inflated gross)"
  ],
  "risk_score": 0.95,
  "verdict": "Gross salary inflated by 17500"
}
AI / ML Detection

AI-Generated Document Detection

With ChatGPT, Midjourney, and DALL-E, anyone can create realistic-looking documents from scratch. DocEnsure's 6-signal neural forensic engine analyzes image properties that AI generators cannot perfectly replicate, catching synthetic documents that fool the human eye.

6 AI Detection Signals

  • Frequency Spectrum Analysis — AI images lack natural high-frequency sensor noise patterns
  • Noise Residual Mapping — Real cameras produce varied noise; AI produces uniform patterns
  • Color Channel Correlation — Camera sensors produce correlated RGB noise; AI does not
  • Texture Naturalness (LBP) — Local Binary Pattern analysis detects AI texture artifacts
  • JPEG Compression Forensics — AI-generated images show different compression signatures
  • PDF Metadata Cross-Check — AI tools leave metadata traces (creation tool, timestamps)
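
As an illustration of the Texture Naturalness signal, the sketch below computes the basic 8-neighbour Local Binary Pattern code for each interior pixel of a grayscale image and collects the codes into a histogram. It covers only the feature extraction step. How DocEnsure turns such features into a score is not described here, and the function names and the nested-list image format are assumptions.

```python
def lbp_code(img, r, c):
    """8-neighbour Local Binary Pattern code for the pixel at row r, column c."""
    center = img[r][c]
    # Neighbours are visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Set the bit when the neighbour is at least as bright as the centre.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Hypothetical 4x4 grayscale patches (pixel values 0-255).
flat = [[10] * 4 for _ in range(4)]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(lbp_histogram(flat)[255], lbp_code(peak, 1, 1))
```

A perfectly flat patch puts every pixel in bin 255, while a natural photograph spreads its codes across many bins. A classifier would compare such histograms against reference distributions.
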
ai_detection.json
{
  "check": "ai_generated",
  "status": "FLAGGED",
  "ai_probability": 0.94,
  "signals": {
    "frequency_spectrum": 0.91,
    "noise_residual": 0.88,
    "color_correlation": 0.96,
    "texture_lbp": 0.93,
    "jpeg_artifacts": 0.87,
    "metadata_check": "FAIL"
  },
  "suspected_tool": "Midjourney v5",
  "details": "Image exhibits uniform noise distribution and lacks natural sensor patterns. Frequency analysis confirms synthetic origin."
}
Typography Analysis

Font Forensics

When someone edits a PDF, the replacement text almost never uses the exact same font as the original. DocEnsure extracts and analyzes every font used in the document, comparing font families, weights, sizes, and rendering characteristics to detect inconsistencies that reveal editing.

Analysis Details

  • Extracts all embedded and referenced fonts from PDF font tables
  • Compares font families across all text elements on each page
  • Detects font substitution (e.g., Arial replacing Calibri in edited fields)
  • Measures character spacing and kerning differences between text blocks
  • Identifies system default fonts used by PDF editors (Helvetica, Times-Roman)
  • Flags font count anomalies (payroll systems typically use 1-3 fonts)

Detection Thresholds

Signal               | Threshold                                | Action
Font count           | > 4 fonts                                | Flag for review
Font mismatch        | Different family on amounts              | High risk
Size variance        | > 0.5 pt difference                      | Flag for review
Editor font detected | Helvetica/Times in non-standard position | Suspicious
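
Two of the thresholds above, font count and font mismatch on amount fields, can be sketched as follows. The input shape mirrors the sample output on this page, but the function names, the way a family is derived from a font name, and the choice to count each distinct font name separately are assumptions made for the example.

```python
from collections import Counter


def font_family(name: str) -> str:
    """Reduce a PDF font name to its family, e.g. 'Calibri-Bold' -> 'Calibri'."""
    return name.split("-")[0]


def check_fonts(fonts, amount_font_names, max_fonts=4):
    """Apply the font-count and font-mismatch thresholds; return flag strings."""
    flags = []
    if not fonts:
        return flags
    if len(fonts) > max_fonts:
        flags.append(f"font_count: {len(fonts)} fonts found (flag for review)")

    # The dominant family is the one that renders the most characters.
    totals = Counter()
    for font in fonts:
        totals[font_family(font["name"])] += font["count"]
    dominant = totals.most_common(1)[0][0]

    for name in amount_font_names:
        if font_family(name) != dominant:
            flags.append(
                f"font_mismatch: {name} on amount fields, body uses {dominant} (high risk)"
            )
    return flags


# Font usage taken from the sample font_forensics.json output on this page.
fonts = [
    {"name": "Calibri", "count": 847},
    {"name": "Calibri-Bold", "count": 23},
    {"name": "Helvetica", "count": 4},
]
print(check_fonts(fonts, amount_font_names=["Helvetica"]))
```
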
font_forensics.json
{
  "check": "font_forensics",
  "status": "FLAGGED",
  "fonts_found": [
    { "name": "Calibri", "usage": "body text", "count": 847 },
    { "name": "Calibri-Bold", "usage": "headers", "count": 23 },
    { "name": "Helvetica", "usage": "salary_amount, net_pay", "count": 4 }
  ],
  "anomaly": "Helvetica used only on 2 amount fields; rest of document uses Calibri",
  "risk_score": 0.88,
  "verdict": "Font substitution detected on financial fields"
}
Pixel-Level Analysis

Image Forensics

For scanned documents and image-based PDFs, DocEnsure performs pixel-level forensic analysis to detect image manipulation. Our engine uses four complementary techniques that together catch even sophisticated Photoshop edits.

Error Level Analysis (ELA)

  • Re-compresses the image and measures error differences
  • Edited regions show different error levels than original content
  • Detects copy-paste, airbrushing, and content-aware fill

Copy-Move Detection

  • Finds duplicate regions within the same image
  • Catches cloned stamps, signatures, and text blocks
  • Works even after rotation, scaling, and compression
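
The core idea of copy-move detection, finding regions that repeat within one image, can be shown with a deliberately simplified sketch. It matches only exact duplicate pixel blocks. Surviving rotation, scaling, and recompression requires robust block features or keypoint matching, which is not shown. The function name, block size, and nested-list image format are assumptions.

```python
from collections import defaultdict


def find_duplicate_blocks(img, block=4):
    """Group top-left coordinates of identical block x block pixel patches."""
    seen = defaultdict(list)
    rows, cols = len(img), len(img[0])
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            patch = tuple(tuple(img[r + i][c:c + block]) for i in range(block))
            # Skip flat patches: plain background would match itself everywhere.
            if len({pixel for row in patch for pixel in row}) == 1:
                continue
            seen[patch].append((r, c))
    return [coords for coords in seen.values() if len(coords) > 1]


# Hypothetical 8x12 grayscale image with one 4x4 pattern pasted twice.
img = [[0] * 12 for _ in range(8)]
for i in range(4):
    for j in range(4):
        value = 1 + i * 4 + j
        img[i][j] = value          # original region at (0, 0)
        img[4 + i][8 + j] = value  # cloned region at (4, 8)

groups = find_duplicate_blocks(img)
print(groups)
```
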

Noise Analysis

  • Maps noise distribution across the entire image
  • Edited regions have different noise characteristics
  • Detects spliced content from different source images

Splicing Detection

  • Identifies inconsistent lighting and shadow directions
  • Detects boundary artifacts from image compositing
  • Analyzes JPEG grid alignment for splice boundaries
image_forensics.json
{
  "check": "image_forensics",
  "status": "FLAGGED",
  "techniques": {
    "ela": {
      "score": 0.87,
      "regions_flagged": 2,
      "details": "High error levels around salary amount region"
    },
    "copy_move": { "score": 0.12, "duplicates_found": 0 },
    "noise_analysis": {
      "score": 0.79,
      "inconsistent_regions": 1,
      "details": "Noise pattern differs in bottom-right quadrant"
    },
    "splicing": { "score": 0.34, "boundaries_found": 0 }
  },
  "composite_score": 0.82,
  "verdict": "Image manipulation detected via ELA and noise analysis"
}
Cross-Document Intelligence

Auto Candidate Identity Linking

When multiple documents are uploaded for the same candidate, DocEnsure automatically extracts identity fields and links them across documents. If the Aadhaar says "Rahul Sharma" but the salary slip says "Rahul S", our fuzzy matching engine catches it and flags the discrepancy.

Fields Extracted

  • Full name (with nickname and abbreviation handling)
  • Date of birth
  • PAN number
  • Aadhaar number (last 4 digits)
  • Employee ID / registration number
  • Father's name / spouse name

5-Signal Matching

  • Exact ID match — PAN or Aadhaar number matches across documents
  • Fuzzy name similarity — handles abbreviations, initials, and spelling variants
  • DOB correlation — exact or near-match date of birth
  • Employee ID linkage — same employee ID across salary slips and letters
  • Filename analysis — extracts candidate name from file naming patterns
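
The fuzzy name signal can be approximated with Python's standard library. The sketch below combines a character-level similarity ratio from difflib with a token check that lets a single initial stand for a full name part. Both function names are hypothetical, and the ratio uses a different scale from the percentages in the sample output, so the numbers are not comparable.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tokens_compatible(a: str, b: str) -> bool:
    """True if each part of the shorter name matches a part of the longer one.

    A single letter is treated as an initial and matches any part that
    starts with that letter.
    """
    parts_a = a.lower().replace(".", "").split()
    parts_b = b.lower().replace(".", "").split()
    short, long_ = (parts_a, parts_b) if len(parts_a) <= len(parts_b) else (parts_b, parts_a)
    remaining = list(long_)
    for part in short:
        match = next(
            (other for other in remaining
             if other == part
             or (len(part) == 1 and other.startswith(part))
             or (len(other) == 1 and part.startswith(other))),
            None,
        )
        if match is None:
            return False
        remaining.remove(match)  # each name part can be matched only once
    return True


for candidate in ("Rahul S", "R Sharma", "Rahul Kumar"):
    print(candidate, tokens_compatible(candidate, "Rahul Sharma"))
```

The token check accepts "Rahul S" and "R Sharma" as variants of "Rahul Sharma" and rejects "Rahul Kumar". A linking engine would combine a result like this with the ID, date of birth, and employee ID signals before deciding.
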
identity_linking.json
{
  "check": "identity_linking",
  "status": "LINKED",
  "candidate": "Rahul Sharma",
  "documents_linked": 4,
  "matches": [
    { "doc": "aadhaar.pdf", "name": "Rahul Sharma", "match": "exact" },
    { "doc": "pan_card.pdf", "name": "Rahul Sharma", "pan": "ABCPS1234R", "match": "exact" },
    { "doc": "salary_slip.pdf", "name": "Rahul S", "match": "fuzzy (87%)" },
    { "doc": "offer_letter.pdf", "name": "R Sharma", "emp_id": "EMP-4521", "match": "fuzzy (79%)" }
  ],
  "discrepancies": [
    "Name variation: 'Rahul S' vs 'Rahul Sharma' on salary slip",
    "Father name missing on salary slip (present on Aadhaar)"
  ],
  "confidence": 0.91
}
Scoring & Verdict

4-Level Verdict System

Every document receives a composite risk score from 0 to 1 based on all checks performed. The score maps to one of four actionable verdict levels, each with specific recommended actions for your team.

CLEAR

Score: 0.00 - 0.25

Document is authentic. No tampering indicators found across all checks. Safe to accept with no further action required.

REVIEW RECOMMENDED

Score: 0.26 - 0.50

Minor concerns detected — slight font variations, low-confidence metadata flags, or minor math rounding differences. Manual review advised before accepting.

SUSPICIOUS

Score: 0.51 - 0.75

Significant tampering indicators found. Multiple checks flagged issues including editor signatures, math mismatches, or image anomalies. Do not accept without thorough investigation.

FRAUDULENT

Score: 0.76 - 1.00

Strong evidence of forgery. Multiple critical checks failed — confirmed PDF editing, math fraud, AI generation, or identity mismatches. Reject immediately and escalate.
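
The mapping from score to verdict is a simple threshold lookup. The sketch below assumes each band includes its upper bound, so a score between two published ranges (for example 0.255) falls into the higher-risk band. The function name is hypothetical.

```python
def verdict(score: float) -> str:
    """Map a composite risk score in [0, 1] to one of the four verdict levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score <= 0.25:
        return "CLEAR"
    if score <= 0.50:
        return "REVIEW RECOMMENDED"
    if score <= 0.75:
        return "SUSPICIOUS"
    return "FRAUDULENT"


# Risk scores taken from sample outputs on this page.
for score in (0.67, 0.92):
    print(score, verdict(score))
```
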

18 Checks

Visual & Layout Analysis

DocEnsure performs 18 visual and layout checks on every document, analyzing alignment, spacing, color consistency, and structural integrity. These checks catch visual anomalies that indicate physical or digital manipulation.

Layout Checks

  • Text alignment consistency across all fields
  • Column and row spacing uniformity
  • Header and footer positioning validation
  • Logo placement and size consistency
  • Border and line straightness analysis
  • Page margin consistency

Visual Checks

  • Color consistency across text elements
  • Background uniformity (detects paste-over regions)
  • Print quality and resolution consistency
  • Scan angle and rotation detection
  • Shadow and lighting direction analysis
  • Pixelation and blur region detection
  • White-space anomaly detection
  • Contrast ratio consistency
  • Anti-aliasing pattern analysis
  • Compression artifact uniformity
  • DPI consistency across page regions
  • Watermark integrity validation
visual_analysis.json
{
  "check": "visual_layout",
  "total_checks": 18,
  "passed": 15,
  "flagged": 3,
  "flags": [
    {
      "check": "background_uniformity",
      "region": "salary_amount_field",
      "issue": "White rectangle overlay detected (255,255,255)"
    },
    {
      "check": "color_consistency",
      "issue": "Text color #000000 vs #1a1a1a in amount field"
    },
    {
      "check": "dpi_consistency",
      "issue": "Amount region at 150 DPI vs document body at 300 DPI"
    }
  ],
  "risk_score": 0.67
}
18 Checks

Metadata Forensics

Every PDF carries hidden metadata that reveals its creation history. DocEnsure extracts and analyzes 18 metadata properties to detect documents that have been modified, recreated, or generated using unexpected tools.

Metadata Analysis

  • Creator application identification (SAP, Tally, Oracle vs. PDF editors)
  • Producer tool chain verification
  • Creation date vs. modification date analysis
  • Timezone consistency checks
  • PDF version compatibility validation
  • XMP metadata cross-referencing with document info
  • Embedded file and attachment scanning
  • JavaScript and action detection
  • Digital signature certificate validation

Suspicious Patterns

  • Creation tool mismatch (salary slip "created by" Adobe Acrobat)
  • Future or impossible timestamps
  • Multiple producer tools indicating re-processing
  • Missing metadata that legitimate systems always include
  • Timezone inconsistencies with claimed document origin
  • Metadata stripped or anonymized (common in forged documents)
  • Embedded fonts from unexpected sources
  • Page count modifications
  • Document ID changes indicating recreation
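
Several of these patterns can be checked directly on the extracted metadata fields. The sketch below covers creator mismatch, the gap between creation and modification, and timezone consistency. It assumes the metadata has already been parsed into ISO 8601 strings; the marker list, the one-day threshold, and the function name are illustrative.

```python
from datetime import datetime, timedelta

# Illustrative substrings that suggest a PDF editor rather than a payroll system.
EDITOR_MARKERS = ("acrobat", "itext", "pdftk", "foxit", "nitro")


def metadata_flags(meta: dict, max_gap: timedelta = timedelta(days=1)) -> list:
    """Return human-readable flags for suspicious metadata patterns."""
    flags = []
    created = datetime.fromisoformat(meta["created"])
    modified = datetime.fromisoformat(meta["modified"])

    if any(marker in meta["creator"].lower() for marker in EDITOR_MARKERS):
        flags.append(f"Creator looks like a PDF editor: {meta['creator']}")

    if modified < created:
        flags.append("Modification date is earlier than creation date")
    elif modified - created > max_gap:
        flags.append(f"Modified {(modified - created).days} days after creation")

    if created.utcoffset() != modified.utcoffset():
        flags.append("Creation and modification timestamps use different UTC offsets")

    return flags


# Metadata taken from the sample metadata_forensics.json output on this page.
meta = {
    "creator": "Adobe Acrobat Pro DC 2023",
    "created": "2024-01-15T10:30:00+05:30",
    "modified": "2024-03-22T23:45:00+05:30",
}
for flag in metadata_flags(meta):
    print(flag)
```
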
metadata_forensics.json
{
  "check": "metadata_forensics",
  "total_checks": 18,
  "passed": 14,
  "flagged": 4,
  "metadata": {
    "creator": "Adobe Acrobat Pro DC 2023",
    "producer": "Adobe PDF Library 15.0",
    "created": "2024-01-15T10:30:00+05:30",
    "modified": "2024-03-22T23:45:00+05:30"
  },
  "flags": [
    "Creator is Adobe Acrobat (expected: SAP/HR payroll system)",
    "Modified 2 months after creation date",
    "Modification at 23:45 IST (outside normal business hours)",
    "PDF version upgraded from 1.4 to 1.7 (indicates re-save)"
  ],
  "risk_score": 0.74
}
14 Checks

Security Features

Government-issued and official documents contain security features like QR codes, watermarks, holograms, and official stamps. DocEnsure validates these security elements across 14 specialized checks to confirm document authenticity.

QR Code Verification

  • Decodes and validates QR code data against document content
  • Verifies QR code digital signatures (Aadhaar, DigiLocker)
  • Cross-references QR data with extracted document fields
  • Detects replaced or regenerated QR codes
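
Once a QR code has been decoded and its signature checked, the cross-reference step is a field-by-field comparison. The sketch below assumes both the QR payload and the printed fields are already available as dictionaries. Decoding and signature verification are not shown, and the function and field names are hypothetical.

```python
def qr_mismatches(qr_fields: dict, printed_fields: dict) -> list:
    """Compare decoded QR fields with fields read from the printed document."""
    issues = []
    for key, qr_value in qr_fields.items():
        printed = printed_fields.get(key)
        if printed is None:
            continue  # field was not extracted from the page; nothing to compare
        # Compare case-insensitively and ignore surrounding whitespace.
        if str(qr_value).strip().lower() != str(printed).strip().lower():
            issues.append(f"{key}: QR says '{qr_value}', document says '{printed}'")
    return issues


# Names taken from the sample security_features.json output on this page.
print(qr_mismatches({"name": "Rahul Kumar"}, {"name": "Rahul Sharma"}))
```
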

Watermark Analysis

  • Detects presence and integrity of expected watermarks
  • Validates watermark positioning and opacity
  • Identifies removed or overwritten watermarks

Hologram & Stamp Detection

  • Holographic element presence verification
  • Official stamp and seal detection
  • Stamp positioning and overlap analysis
  • Signature presence and placement validation
  • Microprint detection (where applicable)
  • Security thread and guilloche pattern verification
  • Embossing detection on scanned documents
security_features.json
{
  "check": "security_features",
  "total_checks": 14,
  "passed": 11,
  "flagged": 3,
  "results": {
    "qr_code": {
      "present": true,
      "valid_signature": true,
      "data_matches_document": false,
      "issue": "QR name 'Rahul Kumar' differs from printed 'Rahul Sharma'"
    },
    "watermark": { "present": true, "integrity": "intact" },
    "stamp": {
      "present": true,
      "positioning": "anomalous",
      "issue": "Company stamp overlaps signature boundary (likely pasted)"
    },
    "hologram": {
      "expected": true,
      "detected": false,
      "issue": "Hologram region not found on Aadhaar card"
    }
  },
  "risk_score": 0.61
}
10 Checks

Audit & Compliance

DocEnsure generates a complete, tamper-evident audit trail for every document verified. From SHA-256 hashing to chain of custody tracking, our compliance module helps your verification process meet regulatory requirements.

SHA-256 Document Integrity

  • Generates SHA-256 hash of every uploaded document at intake
  • Stores hash immutably to prove document was not altered after upload
  • Re-hashes and compares at any future audit point
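
The hash-at-intake and re-hash-at-audit steps map directly onto Python's hashlib module. This is a minimal sketch: the function names and the "sha256:" prefix (borrowed from the sample output on this page) are assumptions, and where the intake hash is stored is out of scope.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """SHA-256 digest of a document held in memory, as a prefixed hex string."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """SHA-256 digest of a file, read in chunks so large uploads fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def is_unaltered(data: bytes, stored_hash: str) -> bool:
    """Re-hash the document and compare with the hash recorded at intake."""
    return sha256_of_bytes(data) == stored_hash


# Hypothetical document bytes standing in for an uploaded file.
document = b"%PDF-1.4 example upload"
intake_hash = sha256_of_bytes(document)
print(is_unaltered(document, intake_hash))
print(is_unaltered(document + b" edited", intake_hash))
```
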

Chain of Custody

  • Tracks who uploaded, when, and from what IP/device
  • Records every verification action and result
  • Logs reviewer actions (approve, reject, escalate)
  • Maintains complete activity timeline per document

Compliance Timestamps

  • ISO 8601 timestamps on all verification events
  • Verification report generation with unique report IDs
  • Data retention policy compliance tracking
audit_trail.json
{
  "check": "audit_compliance",
  "total_checks": 10,
  "passed": 10,
  "document_hash": "sha256:a7f3b2c...e9d1f4",
  "integrity": "VERIFIED",
  "chain_of_custody": [
    {
      "action": "uploaded",
      "by": "hr@company.com",
      "at": "2024-03-15T10:30:00Z",
      "ip": "203.0.113.42"
    },
    {
      "action": "verified",
      "checks_run": 142,
      "verdict": "SUSPICIOUS",
      "at": "2024-03-15T10:31:47Z"
    },
    {
      "action": "escalated",
      "by": "hr@company.com",
      "to": "compliance@company.com",
      "at": "2024-03-15T10:45:00Z"
    }
  ],
  "report_id": "RPT-2024-03-15-0042",
  "retention_expires": "2027-03-15T00:00:00Z"
}

Ready to detect document fraud?

Get in touch to see how DocEnsure's 11 detection methods can protect your organization from forged, edited, and AI-generated documents.
