URL parsing · SSRF · open redirect

One URL, two different hosts.

See where a URL splits: how the WHATWG parser (browser & fetch()) and RFC 3986 disagree on the host — the gap behind SSRF, open-redirect and allowlist-bypass bugs — plus an allowlist-bypass tester. Nothing is fetched; it runs in your browser.

Open the checker ↗

The gap attackers live in

The effective host

Where a browser / fetch() really connects (WHATWG) — IP forms collapsed, IDN→punycode, case folded.

WHATWG vs RFC 3986

Side-by-side components, with the host highlighted when the parsers disagree.

Allowlist-bypass tester

Enter your allowed host; common naive checks run against the URL and the fooled ones are flagged.

Confusion flags

Backslash, userinfo @, slashes, encoded delimiters, whitespace, IDN homographs, IP normalization to private/metadata ranges.

The five parser-confusion classes

Backslash confusion

A URL containing a backslash. For special schemes (http, https, ws, wss, ftp, file), the WHATWG parser that browsers implement treats \ as /, so the authority ends at the backslash. A library that keeps \ literal reads past it and can find a different host.
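A minimal sketch of the backslash case, assuming a runtime that exposes the WHATWG URL class (a browser or Node.js). The hosts evil.example and allowed.example are placeholders, and literalBackslashHost is a hypothetical stand-in for "a library that keeps \ literal", not the code of any real library.

```typescript
// Hypothetical splitter that keeps "\" literal: the authority runs to the first
// "/", "?" or "#", and the host is whatever follows the last "@" inside it.
function literalBackslashHost(raw: string): string {
  const authority = raw.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").split(/[\/?#]/)[0];
  return authority.slice(authority.lastIndexOf("@") + 1);
}

const backslashUrl = "https://evil.example\\@allowed.example/";

// WHATWG: "\" ends the authority for special schemes, so the host is evil.example
// and "@allowed.example/" becomes part of the path.
console.log(new URL(backslashUrl).hostname);     // evil.example

// Literal-backslash reading: "evil.example\" is userinfo, the host is allowed.example.
console.log(literalBackslashHost(backslashUrl)); // allowed.example
```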

Slash confusion

An irregular number of slashes after the scheme. Some libraries collapse extra slashes and still find an authority; others see an empty authority — so the host differs.
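As a rough illustration, the snippet below compares the WHATWG host with the authority captured by the regular expression from RFC 3986 Appendix B. The helper name rfcAuthority and the sample URLs are assumptions made for this sketch.

```typescript
// Authority component per the RFC 3986 Appendix B regular expression.
// Returns undefined when the input has no "//" authority marker at all.
function rfcAuthority(raw: string): string | undefined {
  const m = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/.exec(raw);
  return m ? m[4] : undefined;
}

const tripleSlash = "https:///evil.example/path";
const singleSlash = "https:/evil.example/path";

// WHATWG skips any run of slashes after a special scheme and still finds a host.
console.log(new URL(tripleSlash).hostname); // evil.example
console.log(new URL(singleSlash).hostname); // evil.example

// Appendix B: three slashes give an EMPTY authority, one slash gives none.
console.log(JSON.stringify(rfcAuthority(tripleSlash))); // ""
console.log(rfcAuthority(singleSlash));                 // undefined
```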

Scheme confusion

A malformed or missing scheme. Parsers disagree on whether an authority follows, turning //evil.example or http:/evil.example into a host for some and a path for others.
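The sketch below shows how the WHATWG parser alone already gives three different answers for these inputs, depending on whether a base URL is supplied and whether its scheme matches. The helper hostOrNull and the base URL are hypothetical, chosen only for illustration.

```typescript
// Returns the WHATWG hostname, or null when the parser rejects the input.
function hostOrNull(input: string, base?: string): string | null {
  try {
    return new URL(input, base).hostname;
  } catch {
    return null;
  }
}

const base = "https://allowed.example/app/";

// No scheme and no base: not an absolute URL, so parsing fails.
console.log(hostOrNull("//evil.example/x"));            // null
// No scheme, with a base: the base supplies only the scheme, the input supplies the host.
console.log(hostOrNull("//evil.example/x", base));      // evil.example
// One slash, no base: the special scheme still yields an authority.
console.log(hostOrNull("http:/evil.example/x"));        // evil.example
// One slash, base with the SAME scheme: treated as a path on the base host.
console.log(hostOrNull("https:/evil.example/x", base)); // allowed.example
```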

URL-encoded data confusion

Percent-encoded delimiters (%2F, %5C, %2E, %40, %23, %3F). Parsers differ on whether they decode before or after splitting components, moving the host/path boundary.
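A small sketch of the decode-order problem, assuming a WHATWG URL runtime. The "decode first" step is a hypothetical stand-in for a validator or proxy that percent-decodes the whole string before parsing; the hostnames are placeholders.

```typescript
const encodedSlash = "https://allowed.example%2F@evil.example/";

// WHATWG splits first: "%2F" stays inside the userinfo, the host follows the "@".
const splitFirst = new URL(encodedSlash);
console.log(splitFirst.username); // allowed.example%2F
console.log(splitFirst.hostname); // evil.example

// Hypothetical decode-first component: "%2F" becomes "/", which ends the authority early.
const decodeFirst = new URL(decodeURIComponent(encodedSlash));
console.log(decodeFirst.hostname); // allowed.example

// Inside the host itself, WHATWG does percent-decode for special schemes.
console.log(new URL("https://ex%61mple.com/").hostname); // example.com
```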

Scheme mixup

A URL handled by a parser that lacks scheme-specific rules, so scheme-specific normalization (e.g. for http) is skipped and the components land differently.
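The WHATWG parser itself can illustrate the effect: the same authority text lands differently once the scheme is one it has no special rules for. The ldap scheme and the hostnames below are only examples, picked as an assumption for this sketch.

```typescript
const authorityText = "evil.example\\@allowed.example/";

// Special scheme: "\" acts like "/", so the authority stops at evil.example.
console.log(new URL("https://" + authorityText).hostname); // evil.example

// Non-special scheme: "\" is ordinary data, so "evil.example\" becomes userinfo
// and the host is whatever follows the "@".
console.log(new URL("ldap://" + authorityText).hostname);  // allowed.example

// Host normalization is scheme-specific too: only special schemes fold case.
console.log(new URL("https://EXAMPLE.com/").hostname); // example.com
console.log(new URL("ldap://EXAMPLE.com/").hostname);  // EXAMPLE.com
```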

Honest scope

urldiffer runs only two parsers live, and both are exact: the browser-native WHATWG parser and the RFC 3986 grammar. It never fetches the URL or uploads anything. Other languages’ libraries are documented from research, not simulated, so treat the cross-language behaviour as background and verify it against the actual libraries in your stack. It is a deterministic diff tool, not a scanner or a proof of safety. The taxonomy and the cross-library findings come from "Exploiting URL Parsing Confusion" (Claroty Team82 + Snyk, 2022), which analysed 16 URL libraries across languages, including Python urllib/urllib3/rfc3986, curl, Go net/url, PHP parse_url, Node url/url-parse, Java, Ruby and Perl, and found that different libraries split the same input into different hosts.

Frequently asked questions

What does urldiffer do?

Paste one URL and it shows how two parsers decompose it side by side — the WHATWG URL parser (what a browser and fetch() actually use, i.e. where the request really goes) and the RFC 3986 generic-URI grammar (what many validators and older libraries approximate) — then flags the differences that cause SSRF, open-redirect and allowlist-bypass bugs. Optionally enter your intended "allowed host" and it shows whether common naive allowlist checks would be fooled. Everything runs in your browser.

Why compare parsers at all?

Almost every server-side URL allowlist is a two-parser system: one library validates the URL (for example, checks the host against a list) and a different client then fetches it. If those two parsers disagree about the host, an attacker can craft a URL that passes the check but makes the fetcher connect somewhere else. That disagreement is the root cause of a large class of SSRF and open-redirect vulnerabilities; Claroty Team82 and Snyk analysed 16 libraries and found five recurring inconsistency classes.
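One way to avoid the two-parser gap is to make the check with the same parser that will perform the request. The sketch below assumes the fetcher is WHATWG-based (a browser, or a runtime whose fetch() uses the WHATWG URL class); with a different fetcher, the check would need that fetcher's parser instead. The names isAllowedHost and api.internal are hypothetical.

```typescript
// Parse once with the WHATWG parser and compare the resulting host exactly.
function isAllowedHost(raw: string, allowedHosts: ReadonlySet<string>): boolean {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return false; // unparseable input is rejected rather than repaired
  }
  return parsed.protocol === "https:" && allowedHosts.has(parsed.hostname);
}

const allowedHosts = new Set(["api.internal"]);

console.log(isAllowedHost("https://api.internal/v1", allowedHosts));               // true
console.log(isAllowedHost("https://api.internal@evil.example/", allowedHosts));    // false
console.log(isAllowedHost("https://evil.example\\@api.internal/", allowedHosts));  // false
console.log(isAllowedHost("https://api.internal.evil.example/", allowedHosts));    // false
```

If a check like this is used, passing the parsed URL's href to the fetcher, rather than the raw string, keeps the validated value and the fetched value identical.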

Which parsers does it compute live?

Two, and both are exact: the WHATWG parser via the browser-native URL() (so the "effective host" is literally what your browser would connect to, including IP normalization and IDN→punycode), and an RFC 3986 Appendix-B decomposition. We deliberately do NOT fake the output of PHP parse_url, Python urllib, Node's legacy url, Go net/url etc. — running them faithfully in the browser is impossible, and guessing would be dishonest for a security tool. Their documented quirks are summarised in the reference section instead.
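A side-by-side comparison of this kind can be sketched as follows. This is not urldiffer's source: the function names, the returned shape and the choice to split userinfo at the first "@" are assumptions made for illustration. Only the regular expression is taken from RFC 3986 Appendix B.

```typescript
interface RfcParts {
  scheme: string; userinfo: string; host: string; port: string;
  path: string; query: string; fragment: string;
}

// RFC 3986 Appendix B decomposition, then authority = [userinfo "@"] host [":" port].
function rfc3986Parts(raw: string): RfcParts {
  const m = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/.exec(raw);
  const g = (i: number): string => (m && m[i] !== undefined ? m[i] : "");
  const authority = g(4);
  const at = authority.indexOf("@"); // assumption: userinfo ends at the FIRST "@"
  const hostPort = at >= 0 ? authority.slice(at + 1) : authority;
  const hp = /^(\[[^\]]*\]|[^:]*)(?::(\d*))?$/.exec(hostPort);
  return {
    scheme: g(2),
    userinfo: at >= 0 ? authority.slice(0, at) : "",
    host: hp ? hp[1] : hostPort,
    port: hp && hp[2] !== undefined ? hp[2] : "",
    path: g(5), query: g(7), fragment: g(9),
  };
}

// Compare the two hosts; whatwg is null when the WHATWG parser rejects the input.
function compareHosts(raw: string): { whatwg: string | null; rfc3986: string; differ: boolean } {
  let whatwg: string | null = null;
  try {
    whatwg = new URL(raw).hostname;
  } catch {
    // leave whatwg as null
  }
  const rfc3986 = rfc3986Parts(raw).host;
  return { whatwg, rfc3986, differ: whatwg !== rfc3986.toLowerCase() };
}

console.log(compareHosts("https://example.com:8443/a?b#c"));
// { whatwg: 'example.com', rfc3986: 'example.com', differ: false }
console.log(compareHosts("https://evil.example\\@allowed.example/"));
// { whatwg: 'evil.example', rfc3986: 'allowed.example', differ: true }
```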

How does the allowlist-bypass tester work?

Enter the host you intend to allow (e.g. api.internal). The tool computes the real effective host from the WHATWG parser, then runs four common hand-rolled checks against your URL — prefix match, "contains", host-before-@, and naive suffix — and marks any that say "allowed" while the browser would actually go elsewhere. A green "no bypass" means all four agree with reality for this input (it is not a proof of safety for all inputs).
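The idea can be sketched as below. The four check bodies are guesses at what such hand-rolled checks usually look like, not urldiffer's actual definitions, and the names naiveChecks and fooledChecks are hypothetical.

```typescript
type Check = (raw: string, allowed: string) => boolean;

// Strip "scheme://" so the checks below look at the text where a host is expected.
const afterScheme = (raw: string): string => raw.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");

// Assumed shapes of the four hand-rolled checks.
const naiveChecks: Array<[string, Check]> = [
  ["prefix", (raw, allowed) => afterScheme(raw).startsWith(allowed)],
  ["contains", (raw, allowed) => raw.includes(allowed)],
  ["hostBeforeAt", (raw, allowed) => afterScheme(raw).split(/[\/?#@:]/)[0] === allowed],
  ["naiveSuffix", (raw, allowed) => new URL(raw).hostname.endsWith(allowed)],
];

// Names of the checks that say "allowed" although the effective host is different.
function fooledChecks(raw: string, allowed: string): string[] {
  const effectiveHost = new URL(raw).hostname;
  if (effectiveHost === allowed.toLowerCase()) return []; // the URL really goes to the allowed host
  return naiveChecks.filter(([, check]) => check(raw, allowed)).map(([name]) => name);
}

console.log(fooledChecks("https://api.internal/v1", "api.internal"));
// []
console.log(fooledChecks("https://api.internal@evil.example/", "api.internal"));
// [ 'prefix', 'contains', 'hostBeforeAt' ]
console.log(fooledChecks("https://evilapi.internal/", "api.internal"));
// [ 'contains', 'naiveSuffix' ]
```

An empty result here means only that these four checks agree with the WHATWG host for that one input.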

What confusion classes does it flag?

Backslash (browsers treat \ as / for http/https; many libraries do not), userinfo "@" (WHATWG takes the host after the LAST @, whereas RFC 3986 userinfo cannot contain a raw @, so an RFC-style split ends the userinfo at the FIRST @), extra/missing slashes, protocol-relative //, percent-encoded delimiters (%2F, %5C, %2E, %40…), whitespace/control characters (browsers strip tab/newline/CR, validators often do not), IDN/Unicode homographs (shown as punycode), and IP normalization (hex/octal/decimal/short forms that resolve to a private, loopback or cloud-metadata address).
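Several of these normalizations can be observed directly with the WHATWG URL class. The sketch below assumes a browser or Node.js runtime; classifyIPv4 is a hypothetical helper covering only a few well-known ranges, not a complete classifier.

```typescript
// Rough classification of a dotted-quad host (illustrative, not exhaustive).
function classifyIPv4(host: string): string {
  const m = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(host);
  if (!m) return "not-ipv4";
  const a = Number(m[1]);
  const b = Number(m[2]);
  if (host === "169.254.169.254") return "cloud-metadata";
  if (a === 127) return "loopback";
  if (a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168)) return "private";
  if (a === 169 && b === 254) return "link-local";
  return "other";
}

const effectiveHost = (raw: string): string => new URL(raw).hostname;

// Decimal, hex, octal and short forms all collapse to a dotted quad.
console.log(effectiveHost("http://2130706433/")); // 127.0.0.1
console.log(effectiveHost("http://0x7f.1/"));     // 127.0.0.1
console.log(effectiveHost("http://0177.0.0.1/")); // 127.0.0.1
console.log(classifyIPv4(effectiveHost("http://2852039166/"))); // cloud-metadata

// Tabs and newlines are removed before parsing; case is folded.
console.log(effectiveHost("ht\ttp://exam\nple.com/")); // example.com
console.log(effectiveHost("HTTPS://EXAMPLE.COM/"));    // example.com

// Non-ASCII labels are converted to punycode ("xn--..." form).
console.log(effectiveHost("https://пример.example/"));
```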

Is this a vulnerability scanner?

No. It is a deterministic parsing & diff tool. It does not fetch the URL, send anything anywhere, or prove your code is safe or unsafe — it shows you, for one specific input, where parsers disagree and which naive checks that input would defeat. Use it to understand and reproduce parser-confusion behaviour; use real testing and a hardened URL library for production decisions.

Is it accurate and private?

The two live parsers are exact (browser-native WHATWG and the RFC 3986 grammar) and the naive checks are explicitly defined. Nothing is uploaded — there is no backend, no fetch, no logging; the URL you paste is processed entirely in your browser.
