XSS defenses go wrong when teams treat “sanitize user input” like a single magic step. It isn’t. Different kinds of input need different handling, and some of the most common advice online is flat-out incomplete.

My opinion: if you only remember one thing, remember this — validate for business rules, encode for output, and sanitize only when you intentionally allow HTML.

That distinction matters because “sanitizing input” can mean wildly different things:

  • stripping dangerous characters
  • validating format
  • cleaning user-supplied HTML
  • escaping data before rendering
  • filtering URLs or CSS values

Those are not interchangeable.

The short version

Here’s the practical comparison:

Approach | Best for | Pros | Cons
--- | --- | --- | ---
Input validation | Emails, usernames, IDs, dates | Simple, predictable, blocks bad data early | Does not stop XSS by itself
Output encoding | Any untrusted data rendered into HTML/JS/CSS/URLs | Most reliable general defense | Must match the exact output context
HTML sanitization | Rich text editors, comments with formatting | Lets users keep safe HTML | Easy to misconfigure, library-dependent
Character stripping / regex cleaning | Very limited controlled formats | Fast for narrow cases | Dangerous as a general XSS defense
CSP | Defense in depth | Reduces impact of some XSS bugs | Not a replacement for proper handling

If your app does not need user HTML, do not sanitize HTML. Store the text and output-encode it.

1. Input validation

Input validation is about enforcing what the data should be, not trying to detect every possible attack payload.

For example:

  • usernames: letters, numbers, underscores, 3–30 chars
  • age: integer in a sensible range
  • country code: two uppercase letters
  • product ID: UUID or numeric ID
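To make those rules concrete, here is a minimal sketch of the other three checks. The function names, the 0 to 130 age range, and the exact ID formats are my assumptions for illustration, not rules from any particular codebase.

```javascript
// Hypothetical validators for the example rules above; ranges and formats are assumptions.

// Age: an integer within an assumed sensible range of 0 to 130.
function validateAge(input) {
  return Number.isInteger(input) && input >= 0 && input <= 130;
}

// Country code: exactly two uppercase ASCII letters (shape only, not a registry lookup).
function validateCountryCode(input) {
  return typeof input === 'string' && /^[A-Z]{2}$/.test(input);
}

// Product ID: either a hyphenated UUID or a positive decimal integer string.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_ID_RE = /^[1-9][0-9]{0,17}$/;

function validateProductId(input) {
  return typeof input === 'string' && (UUID_RE.test(input) || NUMERIC_ID_RE.test(input));
}
```

Each check describes what valid data looks like. None of them tries to recognize an attack.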

Pros

  • Easy to reason about
  • Improves data quality
  • Shrinks attack surface
  • Usually fast and cheap to implement

Cons

  • Doesn’t solve XSS when you later render the data unsafely
  • Breaks down for free-form text fields
  • Too many teams overestimate what regex can do here

A good validation rule:

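function validateUsername(input) {
  // Reject non-strings first: RegExp.test() coerces its argument to a string.
  return typeof input === 'string' && /^[a-zA-Z0-9_]{3,30}$/.test(input);
}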

A bad validation rule pretending to stop XSS:

function antiXssFilter(input) {
  return !/<script|javascript:|onerror=|onload=/i.test(input);
}

That second one is brittle and bypassable. Attackers do not politely use <script> every time.

Use validation to enforce expected structure. Don’t use it as your primary XSS control.
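To see how little the blocklist catches, here is a small sketch that runs three payloads through the same filter. The payload list is my own illustration and is nowhere near exhaustive.

```javascript
// The blocklist filter from the "bad validation rule" example, repeated here for the demo.
function antiXssFilter(input) {
  return !/<script|javascript:|onerror=|onload=/i.test(input);
}

// Illustrative payloads that contain none of the blocked substrings.
const payloads = [
  '<img src=x onerror =alert(1)>',            // whitespace before "=" is still valid HTML
  '<details open ontoggle=alert(1)>',         // an event handler the blocklist never mentions
  '<a href="java&#115;cript:alert(1)">x</a>', // entity-encoded scheme, decoded by the HTML parser
];

const accepted = payloads.filter((p) => antiXssFilter(p));
console.log(`${accepted.length} of ${payloads.length} payloads passed the filter`);
// 3 of 3 payloads passed the filter
```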

2. Output encoding

This is the workhorse defense. If you render untrusted data as text in the browser, output encoding is usually what you want.

The catch: encoding depends on where the data lands.

HTML text context

Safe pattern:

<div>{{ user.bio }}</div>

In modern templating systems, this is often auto-escaped by default.

Rendered safely, <img src=x onerror=alert(1)> becomes text, not executable HTML.
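If you are curious what the template engine does for you, this is roughly it. The sketch below uses a hypothetical escapeHtml helper. It is a simplified stand-in for a framework's escaper and only suits HTML text and quoted attribute values.

```javascript
// Hypothetical helper: encode the five characters that matter in HTML text
// and quoted attribute values. Not suitable for script, style, or URL contexts.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<img src=x onerror=alert(1)>'));
// &lt;img src=x onerror=alert(1)&gt;
```

The browser displays the encoded string as the literal characters the user typed, so nothing is lost and nothing executes.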

Pros

  • Reliable when used correctly
  • Built into many frameworks
  • Preserves original data
  • Works well for plain text user content

Cons

  • Context-sensitive
  • Easy to break when developers bypass framework protections
  • Not enough when you intentionally allow HTML
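"Context-sensitive" is easier to see with two non-HTML contexts side by side. This is a sketch under assumptions: buildSearchUrl and jsonForScript are hypothetical helper names, and a real framework may already provide equivalents.

```javascript
// URL context: a value placed inside a query string needs percent-encoding,
// not HTML entities. encodeURIComponent is the built-in for a single component.
function buildSearchUrl(query) {
  return '/search?q=' + encodeURIComponent(query);
}

// Script context: JSON embedded in an inline <script> block must not contain
// characters that could close the block or start HTML markup.
function jsonForScript(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028') // line separators, for older JavaScript parsers
    .replace(/\u2029/g, '\\u2029');
}

console.log(buildSearchUrl('a&b=<c>'));
// /search?q=a%26b%3D%3Cc%3E

console.log(jsonForScript({ bio: '</script>' }));
// {"bio":"\u003c/script\u003e"}
```

Neither output would be correct in the other's context. That is why one universal "escape" function tends to fail somewhere.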

Dangerous mistake: wrong sink

This is where teams get burned:

element.innerHTML = userInput;

If userInput is untrusted, you have just created an XSS vulnerability: innerHTML is a sink that parses the string as HTML.

Safer:

element.textContent = userInput;

Or, when building a new element:

const div = document.createElement('div');
div.textContent = userInput;
container.appendChild(div);

If you’re using React, Vue, Angular, Razor, Django templates, or similar, the default escaped rendering is usually the safe path. Problems start when someone reaches for raw HTML rendering like dangerouslySetInnerHTML, v-html, or direct DOM injection.


3. HTML sanitization

This is the right tool when users are allowed to submit rich text: comments with formatting, CMS content, support tickets with markup, WYSIWYG editor output.

You are no longer treating input as plain text. You are allowing some HTML, so you need a sanitizer that removes dangerous elements and attributes while preserving approved markup.

Pros

  • Supports rich content
  • Better user experience than stripping all formatting
  • Can enforce allowlists for tags and attributes

Cons

  • Harder than it looks
  • Misconfiguration creates holes
  • Sanitizer bypasses happen, so patching matters
  • HTML is not the only problem — URLs, SVG, MathML, and CSS can be tricky

Here’s a typical server-side example in Node.js using a sanitizer library:

import sanitizeHtml from 'sanitize-html';

const clean = sanitizeHtml(userHtml, {
  allowedTags: ['p', 'b', 'i', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'code', 'pre'],
  allowedAttributes: {
    a: ['href', 'title']
  },
  allowedSchemes: ['http', 'https', 'mailto']
});

That’s a sane starting point. Still, I would review every allowed tag and attribute with suspicion.

A few rules I follow:

  • Prefer a small allowlist
  • Be careful with style
  • Be very careful with SVG
  • Restrict URL schemes
  • Patch sanitizer dependencies promptly
  • Test with real payloads, not just happy-path formatting
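For the last rule, a payload smoke test can be as small as the sketch below. Everything in it is an assumption for illustration: the findLeaks name, the short payload list, and the crude marker patterns. The markers are a regression tripwire for test output. They can produce false positives and they prove nothing about safety.

```javascript
// Hypothetical smoke test: run known payloads through any sanitizer function
// and report outputs that still contain markers the allowlist should never emit.
const PAYLOADS = [
  '<script>alert(1)</script>',
  '<img src=x onerror=alert(1)>',
  '<a href="javascript:alert(1)">x</a>',
  '<svg onload=alert(1)></svg>',
  '<p style="background:url(javascript:alert(1))">x</p>',
];

// Crude markers: script tags, inline event handlers, javascript: URLs, SVG, style attributes.
const FORBIDDEN = [/<script/i, /\son\w+\s*=/i, /javascript:/i, /<svg/i, /\sstyle\s*=/i];

function findLeaks(sanitize, payloads = PAYLOADS) {
  const leaks = [];
  for (const payload of payloads) {
    const output = sanitize(payload);
    if (FORBIDDEN.some((re) => re.test(output))) {
      leaks.push({ payload, output });
    }
  }
  return leaks;
}

// A sanitizer that does nothing leaks every payload.
console.log(findLeaks((html) => html).length); // 5
```

In a real test suite you would pass your configured sanitizer as the function argument and fail the build when the returned array is not empty.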

If your product does not truly need HTML, don’t sanitize HTML “just in case.” That adds complexity you don’t need.

4. Character stripping and regex-based cleaning

This is the old-school move:

input = input.replace(/<script.*?>.*?<\/script>/gi, '');
input = input.replace(/[<>]/g, '');

I don’t recommend this as a general XSS strategy.

Pros

  • Simple for highly constrained input
  • Can be okay as a normalization step in narrow cases
  • Sometimes useful for cosmetic cleanup

Cons

  • Incomplete by design
  • Easy to bypass with encoding tricks or alternate payload forms
  • Often destroys legitimate user content
  • Gives teams false confidence

If the field is “first name,” then yes, you can aggressively constrain it. If the field is “message,” “profile bio,” or “article body,” regex stripping is the wrong abstraction.

The browser parses HTML, not your intentions. That parser is much more flexible than a few regular expressions.
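Here is a short demonstration of that gap, using the two replace calls from the start of this section. The wrapper function names and the payloads are mine, chosen for illustration.

```javascript
// The two stripping steps from the example above, wrapped in functions for the demo.
function stripScriptTags(input) {
  return input.replace(/<script.*?>.*?<\/script>/gi, '');
}

function stripAngleBrackets(input) {
  return input.replace(/[<>]/g, '');
}

// 1. Nested tags: removing the inner pair reassembles a working outer tag.
const nested = '<scr<script></script>ipt>alert(1)</script>';
console.log(stripScriptTags(nested));
// <script>alert(1)</script>

// 2. No script tag at all: event-handler payloads pass through untouched.
const img = '<img src=x onerror=alert(1)>';
console.log(stripScriptTags(img) === img);
// true

// 3. Attribute context: breaking out of a quoted attribute needs no angle brackets.
const attr = '" onmouseover="alert(1)';
const html = '<input value="' + stripAngleBrackets(attr) + '">';
console.log(html);
// <input value="" onmouseover="alert(1)">
```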

5. URL sanitization

URLs deserve their own section because developers often allow them into href, src, or redirect parameters without enough checks.

Bad:

link.href = userInput;

If userInput is javascript:alert(1), clicking the link executes that script in your page's origin.

Safer:

function isSafeUrl(value) {
  try {
    const url = new URL(value, 'https://example.com');
    return ['http:', 'https:'].includes(url.protocol);
  } catch {
    return false;
  }
}

Then:

if (isSafeUrl(userInput)) {
  link.href = userInput;
}

Pros

  • Effective for link and media handling
  • Easy to build around protocol allowlists

Cons

  • Developers forget relative URLs, base resolution, and odd schemes
  • Different sinks may have different parsing behavior

Treat URLs as structured data, not random strings.
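Two refinements I find useful are sketched below, with hypothetical function names and example hosts. The first returns the parser's normalized URL, so the sink receives exactly what was checked. The second is a same-origin test for redirect parameters, where a protocol allowlist alone is not enough.

```javascript
// Return a normalized http(s) URL string, or null if the value is not acceptable.
// The base URL is an assumption; use your own site's origin.
function toSafeHref(value, base = 'https://example.com') {
  let url;
  try {
    url = new URL(value, base);
  } catch {
    return null;
  }
  return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : null;
}

// For redirect targets: accept only URLs that resolve to the application's own origin.
function isSameOriginRedirect(value, origin) {
  try {
    return new URL(value, origin).origin === new URL(origin).origin;
  } catch {
    return false;
  }
}

console.log(toSafeHref('/profile/42'));            // https://example.com/profile/42
console.log(toSafeHref('  JaVaScRiPt:alert(1)'));  // null (parser trims and lowercases the scheme)
console.log(toSafeHref('java\tscript:alert(1)'));  // null (parser removes tabs and newlines)
console.log(isSameOriginRedirect('/dashboard', 'https://app.example'));       // true
console.log(isSameOriginRedirect('//evil.example/x', 'https://app.example')); // false
```

The whitespace and mixed-case inputs are the kind of thing a hand-written prefix check tends to miss. Leaning on the URL parser avoids reimplementing those rules.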

6. CSP as backup, not cleanup

Content Security Policy won’t sanitize input. It won’t fix unsafe innerHTML. What it does is reduce blast radius when something slips through.

A decent CSP can block inline script execution, restrict script sources, and make some classes of XSS much harder to exploit.
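As a rough idea of what that looks like in Node.js, here is a minimal nonce-based sketch. The function names and the directive set are assumptions for illustration. A real policy has to be tuned to the scripts, styles, and third parties your pages actually load.

```javascript
const { randomBytes } = require('crypto');

// Generate a fresh, unpredictable nonce for every response (16 random bytes, base64).
function createNonce() {
  return randomBytes(16).toString('base64');
}

// Build a fairly strict policy string; the directives here are an illustrative assumption.
function buildCsp(nonce) {
  return [
    "default-src 'self'",
    `script-src 'nonce-${nonce}'`, // only script tags carrying this nonce may run
    "object-src 'none'",
    "base-uri 'none'",
  ].join('; ');
}

console.log(buildCsp('abc123'));
// default-src 'self'; script-src 'nonce-abc123'; object-src 'none'; base-uri 'none'
```

Send the result as the Content-Security-Policy response header and put the same nonce value on each script tag you render for that response. An injected inline script without the nonce is then refused by the browser.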

Pros

  • Strong defense in depth
  • Helps contain mistakes
  • Useful visibility with reporting

Cons

  • Doesn’t replace output encoding or sanitization
  • Can be painful to retrofit
  • Weak CSPs are common and often overestimated

For implementation patterns, nonce usage, and rollout strategy, see CSP Guide.

What I recommend in real projects

Here’s the practical decision tree I use:

If the field should be plain text

  • Validate length and business rules
  • Store raw text
  • Output-encode on render
  • Use safe DOM APIs like textContent

Example:

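app.post('/comment', (req, res) => {
  // Coerce to a string and cap the length; store the raw text without HTML-escaping it.
  const comment = String(req.body.comment || '').slice(0, 2000);
  saveComment(comment);
  // 204 No Content: nothing is echoed back, so encoding happens only at render time.
  res.sendStatus(204);
});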

Then during rendering, escape by default in your template engine.

If the field should contain limited HTML

  • Sanitize with a maintained HTML sanitizer
  • Use a strict allowlist
  • Validate URLs inside attributes
  • Re-sanitize if content is transformed later
  • Render only into trusted HTML sinks after sanitization

If the field has a strict format

  • Validate hard
  • Reject anything outside expected shape
  • Still encode on output

That last part matters. Even validated data can become dangerous in the wrong output context.

Common mistakes

These show up constantly in code reviews:

  • Sanitizing once at input time and assuming the data is forever safe
  • Using the same escaping for HTML, JavaScript, CSS, and URLs
  • Trusting client-side sanitization alone
  • Allowing raw HTML because “the admin panel is internal”
  • Forgetting that stored XSS is usually worse than reflected XSS, because it runs for every viewer without a crafted link
  • Using innerHTML for convenience
  • Building custom sanitizers when a maintained library exists

My strongest opinion here: don’t invent your own XSS filter. I’ve never seen a homegrown one age well.

Pros and cons recap

If you want the blunt answer:

  • Best default: output encoding
  • Best for strict fields: input validation
  • Best for rich text: HTML sanitization
  • Worst general advice: strip “bad characters” and hope
  • Best backup layer: CSP

XSS prevention works best when you stop asking “How do I sanitize all user input?” and start asking “What kind of data is this, and where will it be rendered?”

That shift is where most security programs start getting this right.