Question 1

What does urldiffer do?

Accepted Answer

Paste one URL and it shows how two parsers decompose it side by side — the WHATWG URL parser (what a browser and fetch() actually use, i.e. where the request really goes) and the RFC 3986 generic-URI grammar (what many validators and older libraries approximate) — then flags the differences that cause SSRF, open-redirect and allowlist-bypass bugs. Optionally enter your intended "allowed host" and it shows whether common naive allowlist checks would be fooled. Everything runs in your browser.

Question 2

Why compare parsers at all?

Accepted Answer

Many server-side URL allowlists are two-parser systems: one library validates the URL (e.g. checks the host against a list) and a different client then fetches it. If the two parsers disagree about the host, an attacker can craft a URL that passes the check but makes the fetcher connect somewhere else. That disagreement is the root cause of a large class of SSRF and open-redirect vulnerabilities (Claroty Team82 and Snyk analysed 16 libraries and found five recurring inconsistency classes).
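
As a rough illustration of such a two-parser split, the sketch below pairs a hypothetical hand-rolled validator with the native WHATWG parser. The function names `validatorHost` and `fetcherHost` and the hosts `api.internal` and `evil.example` are assumptions made for this example; they are not part of urldiffer or of any particular library.

```javascript
// Hypothetical validator: takes the authority up to the first "/", "?" or "#"
// and treats everything after the last "@" as the host.
function validatorHost(input) {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]*)/i.exec(input);
  if (!m) return null;
  const authority = m[1];
  const hostPort = authority.slice(authority.lastIndexOf("@") + 1);
  return hostPort.replace(/:\d*$/, "").toLowerCase();
}

// What a WHATWG-based client (a browser or fetch()) would connect to.
function fetcherHost(input) {
  return new URL(input).hostname;
}

// A backslash ends the authority for the WHATWG parser, but not for the validator.
const crafted = "https://evil.example\\@api.internal/";
console.log(validatorHost(crafted)); // "api.internal"
console.log(fetcherHost(crafted));   // "evil.example"
```

The validator would approve this input as `api.internal`, while the request would actually be sent to `evil.example`.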

Question 3

Which parsers does it compute live?

Accepted Answer

Two, and both are exact: the WHATWG parser via the browser-native URL() (so the "effective host" is literally what your browser would connect to, including IP normalization and IDN→punycode), and an RFC 3986 Appendix-B decomposition. We deliberately do NOT fake the output of PHP parse_url, Python urllib, Node's legacy url, Go net/url etc.: running them faithfully in the browser is not practical, and guessing would be dishonest for a security tool. Their documented quirks are summarised in the reference section instead.
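
A minimal sketch of the two decompositions, assuming nothing about urldiffer's internals: the regular expression is the one printed in RFC 3986 Appendix B, and the WHATWG side uses the standard `URL` class. The helper names `rfc3986Parts` and `whatwgParts` and the sample input are made up for this example.

```javascript
// RFC 3986 Appendix B regular expression (as given in the specification).
const APPENDIX_B = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Generic-URI decomposition: purely syntactic, no normalization.
function rfc3986Parts(input) {
  const m = APPENDIX_B.exec(input);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// WHATWG decomposition via the native URL class (throws on unparseable input).
function whatwgParts(input) {
  const u = new URL(input);
  return {
    scheme: u.protocol.replace(/:$/, ""),
    host: u.hostname,
    port: u.port,
    path: u.pathname,
    query: u.search.replace(/^\?/, ""),
    fragment: u.hash.replace(/^#/, ""),
  };
}

const sample = "http://0x7f.1/a?b#c";
console.log(rfc3986Parts(sample)); // authority stays "0x7f.1"
console.log(whatwgParts(sample));  // host is normalized to "127.0.0.1"
```

The Appendix B expression only splits the string, so the hex/short-form host is reported unchanged, whereas the WHATWG parser resolves it to the loopback address the client would really contact.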

Question 4

How does the allowlist-bypass tester work?

Accepted Answer

Enter the host you intend to allow (e.g. api.internal). The tool computes the real effective host from the WHATWG parser, then runs four common hand-rolled checks against your URL — prefix match, "contains", host-before-@, and naive suffix — and marks any that say "allowed" while the browser would actually go elsewhere. A green "no bypass" means all four agree with reality for this input (it is not a proof of safety for all inputs).
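
The answer above names the four checks but does not spell out their code, so the following is only one plausible reading of them. The definitions in `naiveChecks`, the helper `rawAuthority` and the function `findBypasses` are assumptions for illustration, not the tool's actual source.

```javascript
// Text between "scheme://" and the first "/", "?" or "#" (no normalization).
function rawAuthority(url) {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]*)/i.exec(url);
  return m ? m[1] : "";
}

// Hypothetical versions of four common hand-rolled allowlist checks.
const naiveChecks = {
  // "Starts with the allowed host" once the scheme is stripped.
  prefix: (url, allowed) => url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").startsWith(allowed),
  // The allowed host appears anywhere in the string.
  contains: (url, allowed) => url.includes(allowed),
  // Whatever precedes the first "@" in the authority is taken as the host.
  hostBeforeAt: (url, allowed) => rawAuthority(url).split("@")[0] === allowed,
  // Suffix match with no dot boundary, on the text after the last "@".
  naiveSuffix: (url, allowed) => {
    const a = rawAuthority(url);
    return a.slice(a.lastIndexOf("@") + 1).replace(/:\d*$/, "").endsWith(allowed);
  },
};

// Names of checks that answer "allowed" although the real host differs.
function findBypasses(url, allowed) {
  const effectiveHost = new URL(url).hostname;
  if (effectiveHost === allowed) return [];
  return Object.keys(naiveChecks).filter((name) => naiveChecks[name](url, allowed));
}

console.log(findBypasses("https://api.internal@evil.example/", "api.internal"));
// [ 'prefix', 'contains', 'hostBeforeAt' ]
console.log(findBypasses("https://evilapi.internal/", "api.internal"));
// [ 'contains', 'naiveSuffix' ]
console.log(findBypasses("https://api.internal/v1", "api.internal"));
// []
```

An empty result only means that none of these four checks was fooled by this particular input.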

Question 5

What confusion classes does it flag?

Accepted Answer

Backslash (browsers treat \ as / in special schemes such as http and https; many libraries do not), userinfo "@" (WHATWG takes the host after the last @, whereas parsers that split the authority at the first @ take a different host), extra or missing slashes, protocol-relative //, percent-encoded delimiters (%2F, %5C, %2E, %40…), whitespace and control characters (browsers strip tab, newline and CR; validators often do not), IDN/Unicode homographs (shown as punycode), and IP normalization (hex, octal, decimal and short forms that resolve to a private, loopback or cloud-metadata address).
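
Several of these classes can be reproduced directly with the native `URL` class. The sketch below is a simplified, non-exhaustive flagger: `flagConfusion` and its flag names are hypothetical, and the private-address test covers only a few well-known IPv4 ranges.

```javascript
// Returns the WHATWG effective host plus a few coarse confusion flags.
function flagConfusion(input) {
  const host = new URL(input).hostname;
  const flags = [];
  if (input.includes("\\")) flags.push("backslash");
  if (input.includes("@")) flags.push("userinfo-@");
  if (/[\t\n\r]/.test(input)) flags.push("whitespace-control");
  if (/%(2f|5c|2e|40)/i.test(input)) flags.push("encoded-delimiter");
  if (host.split(".").some((label) => label.startsWith("xn--"))) flags.push("idn-punycode");
  // Only checked once the host has been normalized to a dotted-quad IPv4 address.
  const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  if (isIPv4 && /^(127\.|10\.|192\.168\.|169\.254\.)/.test(host)) flags.push("private-ip");
  return { host, flags };
}

console.log(flagConfusion("http://2130706433/"));       // host "127.0.0.1"
console.log(flagConfusion("http://0177.0.0.1/"));       // host "127.0.0.1"
console.log(flagConfusion("http://0xa9fea9fe/"));       // host "169.254.169.254"
console.log(flagConfusion("http://exa\tmple.com/"));    // host "example.com"
console.log(flagConfusion("http:\\\\evil.example\\x")); // host "evil.example"
console.log(flagConfusion("http://bücher.example/"));   // host shown as punycode
```

Each sample is a string that a validator working on the raw text would likely read differently from the host the browser ends up using.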

Question 6

Is this a vulnerability scanner?

Accepted Answer

No. It is a deterministic parsing & diff tool. It does not fetch the URL, send anything anywhere, or prove your code is safe or unsafe — it shows you, for one specific input, where parsers disagree and which naive checks that input would defeat. Use it to understand and reproduce parser-confusion behaviour; use real testing and a hardened URL library for production decisions.

Question 7

Is it accurate and private?

Accepted Answer

The two live parsers are exact (browser-native WHATWG and the RFC 3986 grammar) and the naive checks are explicitly defined. Nothing is uploaded — there is no backend, no fetch, no logging; the URL you paste is processed entirely in your browser.

One URL, two different hosts.

The gap attackers live in

The five parser-confusion classes

Honest scope

Frequently asked questions