URL parsing · SSRF · open redirect

One URL, two different hosts.

Paste a URL and see where it splits: how the WHATWG parser (what a browser and fetch() actually connect to) and RFC 3986 (what validators approximate) disagree on the host — the exact gap behind SSRF, open-redirect and allowlist-bypass bugs. Add your intended allowed host to see whether common naive checks get fooled. Nothing is fetched; it all runs in your browser.

Where the browser / fetch() actually connects (WHATWG)

http://evil.example

Effective href: http://allowed.example@evil.example/

⚠ Allowlist bypass: a naive check accepts "allowed.example", but the request really goes to "evil.example"

| Naive check | Result |
| --- | --- |
| url.startsWith("https://allowed.example") | BYPASSED: says allowed |
| url.includes("allowed.example") | BYPASSED: says allowed |
| url.split("://")[1].split("/")[0].split("@")[0] === "allowed.example" | BYPASSED: says allowed |
| extractedHost.endsWith("allowed.example") | rejected |

| Component | WHATWG (browser / fetch) | RFC 3986 (validators) |
| --- | --- | --- |
| Scheme | http | http |
| Userinfo | allowed.example | allowed.example |
| Host | evil.example | evil.example |
| Port | (empty) | (empty) |
| Path | / | / |
| Query | (empty) | (empty) |
| Fragment | (empty) | (empty) |

Parser-confusion flags

Medium: Userinfo "@" in authority

An "@" appears in the authority. WHATWG uses the host after the LAST "@" (everything before is userinfo); RFC 3986 treats userinfo as text before the FIRST "@". With more than one "@", parsers disagree on the host — the root of http://trusted@evil.example bypasses.
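The multi-"@" disagreement can be reproduced with a short sketch. The WHATWG half uses the browser-native `URL` class; `splitAtFirstAt` is a hypothetical helper written only for this illustration of a first-"@" splitter and is not taken from any real library.

```javascript
const raw = "http://user@allowed.example@evil.example/";

// WHATWG (browser-native): the host is whatever follows the LAST "@".
const whatwg = new URL(raw);
console.log(whatwg.hostname); // "evil.example"
console.log(whatwg.username); // "user%40allowed.example" (the inner "@" is escaped)

// Hypothetical first-"@" splitter (an assumption for illustration only):
// everything before the FIRST "@" in the authority is treated as userinfo.
function splitAtFirstAt(rawUrl) {
  const authority = rawUrl.slice(rawUrl.indexOf("://") + 3).split(/[\/?#]/)[0];
  const at = authority.indexOf("@");
  return at === -1
    ? { userinfo: "", hostport: authority }
    : { userinfo: authority.slice(0, at), hostport: authority.slice(at + 1) };
}

const firstAt = splitAtFirstAt(raw);
console.log(firstAt.userinfo); // "user"
console.log(firstAt.hostport); // "allowed.example@evil.example"
```

The two results name different hosts for the same string, which is the condition this flag reports.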

Two exact parsers (browser-native WHATWG + RFC 3986), computed in your browser; nothing is fetched or uploaded. Other languages’ libraries are documented, not simulated.

A deterministic parsing & diff tool — not a vulnerability scanner, and not a proof that your code is safe.

What it shows

The gap attackers live in

The effective host

The host a browser / fetch() really connects to (WHATWG), already normalized — IP hex/octal/decimal collapsed, IDN turned to punycode, case folded.
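As a rough illustration of that normalization, the sketch below feeds a few differently spelled hosts to the browser-native `URL` class and reads back `hostname`. The sample URLs are chosen for this example and are not part of the tool.

```javascript
// Each sample spells a host in a form a string-based validator may not recognise.
const samples = [
  "http://0x7f.1/",         // hex + short form
  "http://0177.0.0.1/",     // octal first octet
  "http://2130706433/",     // single decimal number
  "http://2852039166/",     // decimal form of a link-local metadata address
  "http://EXAMPLE.com/",    // mixed case
  "http://bücher.example/", // IDN label
];

// The WHATWG parser normalizes all of them before any connection is made.
const effectiveHosts = samples.map((s) => new URL(s).hostname);
console.log(effectiveHosts);
// [ "127.0.0.1", "127.0.0.1", "127.0.0.1", "169.254.169.254",
//   "example.com", "xn--bcher-kva.example" ]
```

A check that compares the raw string against a blocklist such as "127.0.0.1" would miss the first three spellings even though they normalize to the same loopback address.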

WHATWG vs RFC 3986

A side-by-side decomposition of every component, with the host row highlighted when the two parsers disagree — the root of parser-confusion vulnerabilities.

Allowlist-bypass tester

Enter your allowed host; four common hand-rolled checks (prefix, contains, host-before-@, suffix) run against the URL and any fooled into "allowed" are flagged.

Confusion flags

Backslash, userinfo @, extra slashes, protocol-relative, percent-encoded delimiters, whitespace/control, IDN homographs, and IP normalization into private/loopback/metadata ranges.
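A few of these flags are purely syntactic and can be approximated with pattern checks on the raw string. The sketch below is a hypothetical `confusionFlags` helper, not the tool's actual implementation; the flag names and regular expressions are assumptions, and it deliberately skips the IDN and IP-normalization flags.

```javascript
// Hypothetical flag detector: returns the names of syntactic patterns found in `raw`.
function confusionFlags(raw) {
  const flags = [];
  // Rough authority extraction: drop the scheme, drop leading slashes/backslashes,
  // then cut at the first path, query or fragment delimiter.
  const afterScheme = raw.replace(/^[a-z][a-z0-9+.-]*:/i, "");
  const authority = afterScheme.replace(/^[\/\\]*/, "").split(/[\/\\?#]/)[0];

  if (raw.includes("\\")) flags.push("backslash");
  if (authority.includes("@")) flags.push("userinfo-at");
  // http(s): followed by anything other than exactly two slashes and a host character.
  if (/^https?:(?!\/\/[^\/\\])/i.test(raw)) flags.push("irregular-slashes");
  if (/^\/\//.test(raw)) flags.push("protocol-relative");
  if (/%(2f|5c|2e|40|23|3f)/i.test(raw)) flags.push("encoded-delimiter");
  if (/[\u0000-\u0020\u007f]/.test(raw)) flags.push("whitespace-control");
  return flags;
}

console.log(confusionFlags("http://allowed.example@evil.example/"));
// [ "userinfo-at" ]
console.log(confusionFlags("http:/evil.example\\a%2Fb"));
// [ "backslash", "irregular-slashes", "encoded-delimiter" ]
```

Pattern checks like these only say that a URL deserves a closer look; the effective host still has to come from the parser itself.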

Honest by design

Only two parsers are computed — both exact (browser-native WHATWG + the RFC grammar). Other languages’ libraries are documented from research, never faked.

Nothing leaves your browser

100% static page. It never fetches the URL and has no backend, no logging — paste a payload, not a secret.

Open methodology

Exactly how it’s computed

No black box, no network. Two exact parsers and explicitly-defined checks — so you can verify every result.

The five parser-confusion classes

Backslash confusion

A URL containing a backslash. Browsers (and the WHATWG parser) treat \ as / for special schemes such as http and https, so the authority can end at a different point than it does for a library that keeps \ as a literal character.
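A minimal sketch of this class, assuming a validator that keeps the backslash literal. `literalBackslashHost` is a hypothetical stand-in for such a library, written only for this example.

```javascript
const raw = "http://evil.example\\@allowed.example/";

// WHATWG: for http/https the backslash ends the authority, like "/".
const whatwg = new URL(raw);
console.log(whatwg.hostname); // "evil.example"
console.log(whatwg.pathname); // "/@allowed.example/"

// Hypothetical literal-backslash parser: the authority runs to the first
// "/", "?" or "#", and the host is whatever follows the last "@".
function literalBackslashHost(rawUrl) {
  const authority = rawUrl.slice(rawUrl.indexOf("://") + 3).split(/[\/?#]/)[0];
  return authority.slice(authority.lastIndexOf("@") + 1);
}

console.log(literalBackslashHost(raw)); // "allowed.example"
```

The validator sees the allowed host while the browser connects to the other one.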

Slash confusion

An irregular number of slashes after the scheme. Some libraries collapse extra slashes and still find an authority; others see an empty authority — so the host differs.
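The sketch below contrasts the browser-native `URL` class, which skips extra slashes for http, with a hypothetical `strictAuthority` helper that only accepts an authority directly after exactly "//". The helper is an assumption made for illustration.

```javascript
const raw = "http:///evil.example/";

// WHATWG: extra slashes after the scheme are ignored, so a host is still found.
console.log(new URL(raw).hostname); // "evil.example"

// Hypothetical strict splitter: the authority is the text between "//" and the
// next "/", "?" or "#"; null means no authority component at all.
function strictAuthority(rawUrl) {
  const rest = rawUrl.slice(rawUrl.indexOf(":") + 1);
  if (!rest.startsWith("//")) return null;
  return rest.slice(2).split(/[\/?#]/)[0];
}

console.log(strictAuthority(raw));                    // "" (empty authority)
console.log(strictAuthority("http://evil.example/")); // "evil.example"
```

With three slashes the strict reading finds an empty host and treats evil.example as part of the path, while the browser still connects to it.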

Scheme confusion

A malformed or missing scheme. Parsers disagree on whether an authority follows, turning //evil.example or http:/evil.example into a host for some and a path for others.
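Both inputs named above can be tried against the browser-native `URL` class. The `base` value and the `looksRelative` check are hypothetical, standing in for a redirect handler that accepts anything starting with "/".

```javascript
// Hypothetical page the redirect target is resolved against.
const base = "https://allowed.example/app/";

// Hypothetical naive check: "starts with a slash, so it must be a local path".
const looksRelative = (target) => target.startsWith("/");

const target = "//evil.example/login";
console.log(looksRelative(target));          // true
console.log(new URL(target, base).hostname); // "evil.example"
console.log(new URL(target, base).href);     // "https://evil.example/login"

// A single slash after the scheme still yields a host under WHATWG rules.
console.log(new URL("http:/evil.example").hostname); // "evil.example"
```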

URL-encoded data confusion

Percent-encoded delimiters (%2F, %5C, %2E, %40, %23, %3F). Parsers differ on whether they decode before or after splitting components, moving the host/path boundary.
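The effect of decoding order can be sketched with the browser-native `URL` class alone: parse the raw string, or decode it first and then parse. The decode-first step is an assumed validator behaviour used here only for illustration.

```javascript
const raw = "http://allowed.example%2F@evil.example/";

// Split first: "%2F" is not a delimiter, so it stays inside the userinfo.
const splitFirst = new URL(raw);
console.log(splitFirst.hostname); // "evil.example"

// Decode first (assumed validator behaviour): "%2F" becomes "/" and ends the authority.
const decodeFirst = new URL(decodeURIComponent(raw));
console.log(decodeFirst.hostname); // "allowed.example"

// The WHATWG parser itself decodes some escapes after splitting:
console.log(new URL("http://evil%2Eexample/").hostname);                // "evil.example"
console.log(new URL("http://allowed.example/%2e%2e/admin").pathname);   // "/admin"
```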

Scheme mixup

A URL handled by a parser that lacks scheme-specific rules, so scheme-specific normalization (e.g. for http) is skipped and the components land differently.
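One way to see what skipped scheme-specific rules look like is to give the browser-native `URL` class the same authority under http and under a scheme it has no special rules for. The `foo` scheme is a made-up placeholder for this sketch.

```javascript
// Same authority, two schemes: http is "special" in WHATWG terms, foo is not.
console.log(new URL("http://0x7f.1/").hostname);      // "127.0.0.1"
console.log(new URL("foo://0x7f.1/").hostname);       // "0x7f.1"

console.log(new URL("http://EXAMPLE.com/").hostname); // "example.com"
console.log(new URL("foo://EXAMPLE.com/").hostname);  // "EXAMPLE.com"
```

Without the http rules, the IP form is not collapsed and the case is not folded, so a comparison against the normalized host gives a different answer.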

Source & honest scope: Taxonomy and the cross-library findings come from "Exploiting URL Parsing Confusion" (Claroty Team82 + Snyk, 2022), which analysed 16 URL libraries across languages — including Python urllib/urllib3/rfc3986, curl, Go net/url, PHP parse_url, Node url/url-parse, Java, Ruby and Perl — and found the same input is split into different hosts by different libraries. urldiffer computes only the WHATWG and RFC 3986 parsers live; treat the cross-language behaviour as documented background, and verify against the actual libraries in your stack.

Frequently asked questions

What does urldiffer do?

Paste one URL and it shows how two parsers decompose it side by side — the WHATWG URL parser (what a browser and fetch() actually use, i.e. where the request really goes) and the RFC 3986 generic-URI grammar (what many validators and older libraries approximate) — then flags the differences that cause SSRF, open-redirect and allowlist-bypass bugs. Optionally enter your intended "allowed host" and it shows whether common naive allowlist checks would be fooled. Everything runs in your browser.

Why compare parsers at all?

Almost every server-side URL allowlist is a two-parser system: one library validates the URL (e.g. checks the host against a list) and a different client then fetches it. If those two parsers disagree about the host, an attacker can craft a URL that passes the check but makes the fetcher connect somewhere else — that is the root cause of a large class of SSRF and open-redirect vulnerabilities (Claroty Team82 & Snyk analysed 16 libraries and found five recurring inconsistency classes).

Which parsers does it compute live?

Two, and both are exact: the WHATWG parser via the browser-native URL() (so the "effective host" is literally what your browser would connect to, including IP normalization and IDN→punycode), and an RFC 3986 Appendix-B decomposition. We deliberately do NOT fake the output of PHP parse_url, Python urllib, Node's legacy url, Go net/url etc. — running them faithfully in the browser is impossible, and guessing would be dishonest for a security tool. Their documented quirks are summarised in the reference section instead.
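A side-by-side decomposition along these lines might look like the sketch below. The regular expression is the one given in RFC 3986 Appendix B; splitting the authority into userinfo, host and port is not part of that regex, so the first-"@" split and the helper names (`rfc3986Split`, `whatwgSplit`, `differingComponents`) are assumptions of this example rather than the tool's code.

```javascript
// RFC 3986 Appendix B regex, plus an assumed first-"@" split of the authority.
function rfc3986Split(raw) {
  const m = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/.exec(raw);
  const authority = m[4] ?? "";
  const at = authority.indexOf("@");
  const hostport = at === -1 ? authority : authority.slice(at + 1);
  const portMatch = /:(\d*)$/.exec(hostport);
  return {
    scheme: m[2] ?? "",
    userinfo: at === -1 ? "" : authority.slice(0, at),
    host: portMatch ? hostport.slice(0, portMatch.index) : hostport,
    port: portMatch ? portMatch[1] : "",
    path: m[5],
    query: m[7] ?? "",
    fragment: m[9] ?? "",
  };
}

// The same seven components read from the browser-native WHATWG parser.
function whatwgSplit(raw) {
  const u = new URL(raw);
  return {
    scheme: u.protocol.slice(0, -1),
    userinfo: u.password ? u.username + ":" + u.password : u.username,
    host: u.hostname,
    port: u.port,
    path: u.pathname,
    query: u.search.slice(1),
    fragment: u.hash.slice(1),
  };
}

// Names of the components on which the two decompositions disagree.
function differingComponents(raw) {
  const a = whatwgSplit(raw);
  const b = rfc3986Split(raw);
  return Object.keys(a).filter((key) => a[key] !== b[key]);
}

console.log(differingComponents("http://allowed.example@evil.example/"));
// []
console.log(differingComponents("http://user@allowed.example@evil.example/"));
// [ "userinfo", "host" ]
```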

How does the allowlist-bypass tester work?

Enter the host you intend to allow (e.g. api.internal). The tool computes the real effective host from the WHATWG parser, then runs four common hand-rolled checks against your URL — prefix match, "contains", host-before-@, and naive suffix — and marks any that say "allowed" while the browser would actually go elsewhere. A green "no bypass" means all four agree with reality for this input (it is not a proof of safety for all inputs).
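The procedure described above can be sketched as follows. `naiveChecks` is a hypothetical reimplementation: the exact definitions of the four checks, and the choice to run the suffix check on the WHATWG hostname, are assumptions rather than the tool's confirmed logic.

```javascript
// Runs four hand-rolled checks and marks those that say "allowed"
// while the WHATWG effective host is something else.
function naiveChecks(rawUrl, allowedHost) {
  const effectiveHost = new URL(rawUrl).hostname;
  const rest = rawUrl.split("://")[1] ?? "";
  const verdicts = {
    prefix: rawUrl.startsWith("https://" + allowedHost),
    contains: rawUrl.includes(allowedHost),
    hostBeforeAt: rest.split("/")[0].split("@")[0] === allowedHost,
    suffix: effectiveHost.endsWith(allowedHost),
  };
  const result = {};
  for (const [name, saysAllowed] of Object.entries(verdicts)) {
    if (!saysAllowed) result[name] = "rejected";
    else result[name] = effectiveHost === allowedHost ? "allowed" : "BYPASSED";
  }
  return { effectiveHost, result };
}

console.log(naiveChecks("https://allowed.example@evil.example/", "allowed.example"));
// effectiveHost: "evil.example"
// result: prefix BYPASSED, contains BYPASSED, hostBeforeAt BYPASSED, suffix rejected

console.log(naiveChecks("https://evilallowed.example/", "allowed.example"));
// effectiveHost: "evilallowed.example"
// result: prefix rejected, contains BYPASSED, hostBeforeAt rejected, suffix BYPASSED
```

No single input defeats all four checks here, which is why a clean result for one URL says nothing about other inputs.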

What confusion classes does it flag?

Backslash (browsers treat \ as / for http/https; many libraries do not), userinfo "@" (WHATWG takes the host after the LAST @, RFC before the FIRST @), extra/missing slashes, protocol-relative //, percent-encoded delimiters (%2F, %5C, %2E, %40…), whitespace/control characters (browsers strip tab/newline/CR, validators often do not), IDN/Unicode homographs (shown as punycode), and IP normalization (hex/octal/decimal/short forms that resolve to a private, loopback or cloud-metadata address).

Is this a vulnerability scanner?

No. It is a deterministic parsing & diff tool. It does not fetch the URL, send anything anywhere, or prove your code is safe or unsafe — it shows you, for one specific input, where parsers disagree and which naive checks that input would defeat. Use it to understand and reproduce parser-confusion behaviour; use real testing and a hardened URL library for production decisions.

Is it accurate and private?

The two live parsers are exact (browser-native WHATWG and the RFC 3986 grammar) and the naive checks are explicitly defined. Nothing is uploaded — there is no backend, no fetch, no logging; the URL you paste is processed entirely in your browser.