Field notes on npm dependency confusion, typosquat patterns, and the metadata that escapes repository review entirely.

The blind spot everyone trips on

If you've shipped a Node project in the last five years, you have a package.json that resolves to between 600 and 2,400 transitive dependencies depending on how rough you've been with npm install. Your security tooling almost certainly scans the repo. It almost certainly does not scan the graph that npm actually downloaded on your last prod-2026-05-01 build.

Those two things are not the same. The repo declares intent. The graph is what runs.
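You can see the gap on your own project by diffing what package.json declares against what the lockfile resolved. The sketch below is minimal and makes a few assumptions. It expects an npm lockfile with lockfileVersion 2 or 3, whose "packages" map is keyed by install path. Both files must already be parsed into dicts, for example with json.load. The function name and the toy input are ours, not part of any tool.

```python
def declared_vs_resolved(package_json: dict, lockfile: dict) -> tuple[set[str], set[str]]:
    """Return (declared, resolved) package names for one project.

    Assumes lockfileVersion 2 or 3: lockfile["packages"] is keyed by install
    path, e.g. "" for the root and "node_modules/a/node_modules/b" for deps.
    """
    declared: set[str] = set()
    for field in ("dependencies", "devDependencies", "optionalDependencies"):
        declared.update(package_json.get(field, {}))

    resolved: set[str] = set()
    for path in lockfile.get("packages", {}):
        if "node_modules/" not in path:
            continue  # the root project ("") or a local workspace folder
        # The name is whatever follows the last "node_modules/", scope included.
        resolved.add(path.rsplit("node_modules/", 1)[1])
    return declared, resolved


# Toy input, not real project data.
pkg = {"dependencies": {"express": "^4.18.0"}}
lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "app"},
        "node_modules/express": {"version": "4.18.2"},
        "node_modules/body-parser": {"version": "1.20.1"},
        "node_modules/@babel/core": {"version": "7.24.0"},
        "node_modules/express/node_modules/debug": {"version": "2.6.9"},
    },
}
declared, resolved = declared_vs_resolved(pkg, lock)
print(sorted(resolved - declared))  # ['@babel/core', 'body-parser', 'debug']
```

On a real project, the resolved set is hundreds of names larger than the declared one. That difference is the graph the repo audit never looks at.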

I've spent the last six months looking at the gap, and the short version is this: the dependency graph leaks more than the repo because the metadata you don't review — maintainer changes, publish history, install scripts, source URLs, tarball contents — is where 80% of supply-chain risk actually lives. The repo audit catches the obvious. The graph audit catches everything else.

Four patterns I see weekly

Typosquat candidates. Someone publishes reqeusts-oauth (note the swapped letters) and it sits at 12 weekly downloads for a year. Then someone with a tired Tuesday and an autocomplete that lies adds it to a build. The package itself is plausible — it has a README, a package.json, even a couple of tests — and it ships a malicious postinstall script that lifts your ~/.npmrc and ~/.aws/credentials. The repo audit won't catch this. The repo doesn't have the malicious code. The graph does.
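The cheapest graph-side check for this pattern is to look for install-time lifecycle scripts in every resolved package's manifest. npm runs preinstall, install, and postinstall automatically when a dependency is installed. The sketch below works over already-parsed package.json dicts, and the helper names are hypothetical. A hit means "go read the script", not "malicious".

```python
# npm runs these lifecycle scripts automatically when a dependency is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(manifest: dict) -> dict[str, str]:
    """Return the install-time scripts a package.json dict declares."""
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


def packages_with_install_hooks(manifests: list[dict]) -> dict[str, dict[str, str]]:
    """Map package name -> install hooks, for every manifest that has any."""
    flagged = {}
    for manifest in manifests:
        hooks = install_hooks(manifest)
        if hooks:
            flagged[manifest.get("name", "<unnamed>")] = hooks
    return flagged


# Toy manifests, not real packages.
manifests = [
    {"name": "requests-oauth", "scripts": {"test": "jest"}},
    {"name": "reqeusts-oauth", "scripts": {"postinstall": "node setup.js", "test": "jest"}},
]
print(packages_with_install_hooks(manifests))
# {'reqeusts-oauth': {'postinstall': 'node setup.js'}}
```

npm install --ignore-scripts stops these hooks from running at all. The trade-off is that packages with legitimate native builds then need an explicit rebuild step.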

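Dependency confusion. Your private package is @internal-package/api-client, and nobody on your team ever claimed the @internal-package scope on public npm. Somebody else does, and publishes that exact name at version 99.99.99. Your CI pulls the public one because the resolver doesn't know better. Either the scope isn't pinned to your private registry in the .npmrc that CI actually reads, or a proxy that merges public and private sources serves whichever version is highest. The repo says "internal." The graph says "public." This pattern got Microsoft, Apple, and PayPal in 2021, and it still works in 2026 because resolver semantics are subtle.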
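For scoped internal packages, the usual guard has two halves. First, pin the scope to your private registry in .npmrc with a line of the form @scope:registry=<url>. Second, claim the same scope on public npm so nobody else can. The sketch below checks only the first half: which internal scopes your dependencies use that a given .npmrc text never pins. It uses a deliberately simplified line parser, with no ${VAR} expansion and no merging of user or global config. The function name and the registry URL are placeholders. Unscoped internal names can't be pinned this way at all.

```python
def unpinned_internal_scopes(
    dependency_names: list[str], npmrc_text: str, internal_scopes: set[str]
) -> list[str]:
    """Internal scopes used by dependencies but not pinned in this .npmrc text.

    Simplified parser: only looks for "@scope:registry=..." lines and skips
    comments; it does not merge user/global config or expand ${VARS}.
    """
    pinned = set()
    for raw in npmrc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, _url = line.partition("=")
        key = key.strip()
        if sep and key.startswith("@") and key.endswith(":registry"):
            pinned.add(key[: -len(":registry")])

    used = {name.split("/", 1)[0] for name in dependency_names if name.startswith("@")}
    return sorted((used & internal_scopes) - pinned)


deps = ["@internal-package/api-client", "@types/node", "lodash"]
ci_npmrc = "registry=https://registry.npmjs.org/\n"
print(unpinned_internal_scopes(deps, ci_npmrc, {"@internal-package"}))  # ['@internal-package']

pinned_npmrc = ci_npmrc + "@internal-package:registry=https://npm.internal.example/\n"
print(unpinned_internal_scopes(deps, pinned_npmrc, {"@internal-package"}))  # []
```

The check runs against the .npmrc that CI uses, not the one on a developer laptop. In this pattern, the CI environment is the one that resolves to public npm.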

Maintainer takeover. A maintainer of a package with 2 million weekly downloads gets phished or sells the namespace. The package keeps its name, its repo URL, its README. Version 4.2.0 ships a credential siphon. By the time the rollback PR lands, every CI run that resolved ^4.0.0 to 4.2.0 has been exposed. We track this as a publisher anomaly built from three signals: a new maintainer, a sudden version bump, and a non-trivial source diff. All three on the same day puts the package under active investigation.
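The three-signal rule is easy to state in code. This is a minimal sketch, and several of its definitions are our assumptions:

- Versions are plain MAJOR.MINOR.PATCH.
- A "sudden version bump" means a minor or major bump rather than a patch.
- A "non-trivial source diff" means at least 200 changed lines, a hypothetical cutoff.

Real thresholds would need tuning per package.

```python
from dataclasses import dataclass

DIFF_THRESHOLD = 200  # hypothetical cutoff for a "non-trivial" diff


@dataclass(frozen=True)
class Release:
    version: str        # assumed plain "MAJOR.MINOR.PATCH", no pre-release tags
    publisher: str      # account that published this version
    changed_lines: int  # size of the source diff against the previous release


def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def anomaly_signals(history: list[Release], new: Release) -> dict[str, bool]:
    """Evaluate the three publisher-anomaly signals for one new release."""
    previous = history[-1]
    return {
        "new_maintainer": new.publisher not in {r.publisher for r in history},
        # Assumption: a minor or major bump counts as "sudden"; a patch does not.
        "version_bump": parse(new.version)[:2] != parse(previous.version)[:2],
        "large_diff": new.changed_lines >= DIFF_THRESHOLD,
    }


def verdict(signals: dict[str, bool]) -> str:
    fired = sum(signals.values())
    if fired == len(signals):
        return "investigate"  # all three at once: hold the bump for manual review
    return "watch" if fired else "ok"


history = [Release("4.1.0", "alice", 120), Release("4.1.1", "alice", 8)]
print(verdict(anomaly_signals(history, Release("4.2.0", "mallory", 900))))  # investigate
print(verdict(anomaly_signals(history, Release("4.1.2", "alice", 5))))      # ok
```

The caret range is why exposure spreads so fast. ^4.0.0 accepts any 4.x.y at or above 4.0.0, so any install that re-resolves, rather than honoring a lockfile pinned to an earlier version, picks up 4.2.0.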

Sourcemap and registry leak. This one is my favorite because it's almost always a hygiene failure rather than an attack. The package was built fine, but the publish step accidentally included dist/*.map. Those sourcemaps let anyone reconstruct your internal API host, your auth flow, your feature flags. We grep these. They show up. The repo is clean. The published .tgz is not.
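Here is a sketch of that check. It assumes the input is a tarball as produced by npm pack, which is a gzip-compressed tar with files under package/. It lists every .map file and counts how many original sources each one embeds in its sourcesContent field, which is where internal code actually leaks. The function name is hypothetical.

```python
import json
import tarfile
from typing import BinaryIO


def sourcemap_leaks(tgz: BinaryIO) -> dict[str, int]:
    """Map each .map file in an npm tarball to its count of embedded sources."""
    leaks = {}
    with tarfile.open(fileobj=tgz, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".map")):
                continue
            source_map = json.load(tar.extractfile(member))
            # sourcesContent holds original files verbatim (null entries allowed).
            # A count of 0 still leaks file paths via the "sources" field.
            embedded = [s for s in source_map.get("sourcesContent") or [] if s]
            leaks[member.name] = len(embedded)
    return leaks


# Usage against a downloaded tarball (file name is a placeholder):
# with open("some-package-1.0.0.tgz", "rb") as f:
#     print(sourcemap_leaks(f))
```

A non-empty result is worth a look even when every count is zero. File paths alone can reveal internal project layout.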

What we built to find them

Four passes, run continuously against the public registry slice that intersects each customer's authorized scope:

Namespace crawl. We track every package that matches the customer's known prefixes (org scope, internal namespace conventions, brand-similar strings). New publish events trigger a fresh pass within 30 minutes.
Name-similarity scoring. Levenshtein distance, keyboard distance, and a visual-confusable check (0 vs O, rn vs m, etc.). A score above threshold marks a typosquat candidate. We compare against the customer's actual package.json declarations, so we only surface candidates that resemble their deps, not the entire ecosystem.
Maintainer graph. Every package has a publisher set. We watch for adds, removes, and "first publish by user X" events on packages the customer depends on. Combine with the package's stable diff history and we get an anomaly score.
Published file scan. Same secret-detection pipeline we use for Docker layers — regex, false-positive filter, entropy, ONNX classifier, semantic LLM. Applied to every file in the published tarball, including build output that never made it back to the repo.
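For the name-similarity pass, here is a minimal sketch of one way to combine the three ingredients:

- Fold a few visual confusables (rn to m, 0 to o, and so on).
- Compute an edit distance in which substituting a QWERTY-adjacent key costs half as much.
- Count a swap of two adjacent letters as one edit. This is optimal string alignment, a small extension of plain Levenshtein, and it makes reqeusts-oauth score as one typo away from requests-oauth.

The confusable table, the adjacency model (same-row neighbors only), and the threshold are all illustrative assumptions. The result is a distance, so lower means closer: "score above the similarity threshold" becomes "distance at or below the distance threshold".

```python
CONFUSABLES = (("rn", "m"), ("vv", "w"), ("0", "o"), ("1", "l"))  # illustrative subset
THRESHOLD = 1.0  # hypothetical: flag names at most one (weighted) edit apart

# Same-row QWERTY neighbors only; a fuller model would include diagonals.
ADJACENT: dict[str, set[str]] = {}
for row in ("qwertyuiop", "asdfghjkl", "zxcvbnm"):
    for i, ch in enumerate(row):
        ADJACENT[ch] = set(row[max(i - 1, 0):i] + row[i + 1:i + 2])


def normalize(name: str) -> str:
    name = name.lower()
    for lookalike, canonical in CONFUSABLES:
        name = name.replace(lookalike, canonical)
    return name


def weighted_distance(a: str, b: str) -> float:
    """Edit distance with cheap adjacent-key substitutions and transpositions."""
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = float(i)
    for j in range(len(b) + 1):
        d[0][j] = float(j)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            ca, cb = a[i - 1], b[j - 1]
            if ca == cb:
                sub = 0.0
            elif cb in ADJACENT.get(ca, set()):
                sub = 0.5  # fat-finger typo on a neighboring key
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swapped letters
    return d[-1][-1]


def typosquat_candidates(new_names: list[str], declared: list[str]) -> list[tuple[str, str]]:
    """Pair each new registry name with the declared dependency it imitates."""
    hits = []
    for name in new_names:
        for dep in declared:
            if name != dep and weighted_distance(normalize(name), normalize(dep)) <= THRESHOLD:
                hits.append((name, dep))
    return hits


print(typosquat_candidates(
    ["reqeusts-oauth", "rnoment", "lodasj", "express"],
    ["requests-oauth", "moment", "lodash", "express"],
))
# [('reqeusts-oauth', 'requests-oauth'), ('rnoment', 'moment'), ('lodasj', 'lodash')]
```

Scoring only against the declared list is what keeps this scoped to the customer's stack. The confusable folding is blunt, though. A legitimate name containing "rn" collapses too, which is one reason a hit is a candidate and not a verdict.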
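For the maintainer-graph pass, the raw events fall out of a simple diff over consecutive versions. This is a minimal sketch. It assumes you've already extracted, for each published version in order, the maintainer set and the account that published it from registry metadata. The Snapshot type and the event names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: str
    maintainers: frozenset[str]
    publisher: str


def publisher_events(snapshots: list[Snapshot]) -> list[tuple[str, str, str]]:
    """Emit (version, event, user) for maintainer adds/removes and first publishes."""
    events = []
    seen_publishers: set[str] = set()
    previous = None
    for snap in snapshots:
        if previous is not None:
            for user in sorted(snap.maintainers - previous):
                events.append((snap.version, "maintainer_added", user))
            for user in sorted(previous - snap.maintainers):
                events.append((snap.version, "maintainer_removed", user))
        # The package's very first publish is not an anomaly, so skip it.
        if seen_publishers and snap.publisher not in seen_publishers:
            events.append((snap.version, "first_publish", snap.publisher))
        seen_publishers.add(snap.publisher)
        previous = snap.maintainers
    return events


history = [
    Snapshot("4.1.0", frozenset({"alice"}), "alice"),
    Snapshot("4.1.1", frozenset({"alice"}), "alice"),
    Snapshot("4.2.0", frozenset({"alice", "mallory"}), "mallory"),
]
print(publisher_events(history))
# [('4.2.0', 'maintainer_added', 'mallory'), ('4.2.0', 'first_publish', 'mallory')]
```

These events are inputs to an anomaly score, not findings on their own. Maintainers are added and removed legitimately all the time.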
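For the published-file scan, the first two stages (regex and entropy) are simple to sketch. The false-positive filter, ONNX classifier, and LLM stages are beyond a short example. The one concrete pattern below is the widely used AWS access-key-ID shape: AKIA plus 16 uppercase letters or digits. The token regex and the 4.0 bits-per-character cutoff are illustrative assumptions.

```python
import math
import re
from collections import Counter

AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
TOKEN_LIKE = re.compile(r"[A-Za-z0-9+/_-]{20,}")  # long runs of key-ish characters


def shannon_entropy(s: str) -> float:
    """Bits per character of s, from its own character frequencies."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def scan_text(text: str, min_entropy: float = 4.0) -> list[tuple[str, str]]:
    findings = [("aws_access_key_id", m.group()) for m in AWS_ACCESS_KEY_ID.finditer(text)]
    for m in TOKEN_LIKE.finditer(text):
        token = m.group()
        if AWS_ACCESS_KEY_ID.fullmatch(token):
            continue  # already reported by the specific pattern
        if shannon_entropy(token) >= min_entropy:
            findings.append(("high_entropy", token))
    return findings


print(scan_text('const key = "AKIAIOSFODNN7EXAMPLE";'))
# [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
print(scan_text("token: abcdefghijklmnopqrstuvwxyzABCDEF"))
# [('high_entropy', 'abcdefghijklmnopqrstuvwxyzABCDEF')]
```

Applied to a tarball, this runs over every extracted file. Minified bundles produce many high-entropy runs, which is why the later filtering stages exist.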

This isn't novel. Socket, Aikido, and Snyk Advisor each cover parts of it. What I think we do differently is scope binding: we don't surface noise from the whole npm graveyard, we surface what's adjacent to your stack. A typosquat of a package you don't use isn't your problem.

Confidence vs. action

A finding from the published-file scan is high confidence. We have the bytes, we have the regex match, we have the entropy score. A finding from the maintainer graph is medium confidence — three out of three signals is suspicious, not conclusive, and we say so on the card.

The product decision behind this: we'd rather surface medium-confidence with explicit language ("3 of 3 anomaly signals; recommend manual review before allowing the bump") than collapse the difference and look definitive on something we're guessing about. The first time a security tool cries wolf on a ^4.0.0 bump that turned out to be fine, the team starts ignoring the badge. That kills the platform. We protect the badge by being honest about what each finding type means.

What this tier ships

npm, PyPI, and container-registry dependency crawls, maintainer-graph anomalies, and the published-file scan all ship in Community. Cross-correlation between the dependency graph and your CI/CD workflows ships in Pulse. Scope binding to private registry namespaces ships in Shield, where customers can declare which prefixes are theirs.

If you're not a customer yet and want a one-shot dependency audit for a specific public org, the /scan endpoint takes a GitHub org name and returns a free snapshot. It isn't the full graph: it covers the most recently updated public repos with a 12-pattern secret check. Useful as a starting point, not as the full audit.

What's next

The hard problem we haven't fully cracked: cross-customer signal amplification. If five customers depend on the same compromised package version, we'd ideally warn the sixth before they pull it. Today we can't, because aggregating findings across customers would breach tenant-isolation boundaries that we take seriously. The fix is a one-way anonymous signal channel that lets us publish a public advisory without revealing which customer triggered the discovery. We have a design. It hasn't shipped. When it does, advisories will appear in /intel/advisories under BVA-2026-0XXX.

Until then, your best signal is your own graph, scanned continuously, with someone explaining the why on every alert rather than just the what. That's what we're trying to build.

Why your dependency graph leaks more than your repo.