ShipSafe

Methodology

How we measure AI-built app security

Every statistic on this site comes from ShipSafe’s own independent, read-only scans of publicly deployed apps. We read the code and probe the live app from the public internet, and we report data exposure only when an actual anonymous request returned data, never on a guess. Below is each study, its sample size, and exactly what we count.

The studies

Cursor

67% had at least one critical-severity finding

sample: 100 deployed apps

Lovable

89% had a Supabase table readable with no login (RLS off)

sample: 50 deployed apps

Bolt.new

80% shipped at least one route with no authentication

sample: deployed apps

Replit

77% had at least one critical-severity finding

sample: 30 deployed apps

The widely quoted 67% figure comes specifically from the Cursor study. Across the two builders where we report this metric (Cursor and Replit), the share of apps with at least one critical finding ranged from 67% to 77%.
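As a rough illustration of how an "at least one critical finding" rate is derived, the sketch below counts each app once, however many critical findings it has, and divides by the sample size. The `Finding` type, the `critical_rate` function, and the severity label `"critical"` are hypothetical names chosen for this example, not ShipSafe's actual data model, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Finding:
    """Hypothetical record: one scanner finding attached to one app."""
    app_id: str
    severity: str  # assumed labels, e.g. "critical", "high", "low"


def critical_rate(findings: Iterable[Finding], total_apps: int) -> float:
    """Percentage of apps in the sample with at least one critical finding."""
    if total_apps <= 0:
        raise ValueError("total_apps must be positive")
    # A set collapses repeated critical findings on the same app,
    # so each app contributes at most once to the numerator.
    apps_with_critical = {f.app_id for f in findings if f.severity == "critical"}
    return 100.0 * len(apps_with_critical) / total_apps


# Made-up sample of 4 apps: "a" has two critical findings but counts once,
# "d" has no findings at all and still counts in the denominator.
sample = [
    Finding("a", "critical"),
    Finding("a", "critical"),
    Finding("b", "low"),
    Finding("c", "critical"),
]
rate = critical_rate(sample, total_apps=4)
```

Here `rate` is 50.0: two of the four apps have at least one critical finding. The denominator is the number of apps scanned, not the number of findings, which is why apps with no findings must still be included in `total_apps`.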

How a scan runs

  • Read-only and non-destructive: GET requests only, bounded, never writing or deleting.
  • Both surfaces: the code (or repo) for logic bugs, and the deployed app from the public internet.
  • Data exposure is proven, not inferred: a single anonymous read against a Supabase table or storage bucket, or a Firebase database, that actually returned rows.
  • Critical-severity follows the scanner's CWE/OWASP-mapped scale; a finding is counted once per app.
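The "proven, not inferred" rule above can be sketched as follows for the Supabase table case. This is a minimal illustration under stated assumptions, not ShipSafe's scanner: it assumes the standard Supabase REST path `/rest/v1/<table>` with the app's public anon key sent in the `apikey` and `Authorization` headers, and it assumes `limit=1` is what keeps the read bounded. The function names `build_anonymous_read` and `is_proven_exposure` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_anonymous_read(project_url: str, anon_key: str, table: str) -> Request:
    """Build a single read-only GET for at most one row of a table.

    Assumption: the anon key is the public key the app already ships to
    browsers, so this is the same request any anonymous visitor could send.
    """
    query = urlencode({"select": "*", "limit": "1"})
    url = f"{project_url.rstrip('/')}/rest/v1/{quote(table, safe='')}?{query}"
    headers = {
        "apikey": anon_key,
        "Authorization": f"Bearer {anon_key}",
        "Accept": "application/json",
    }
    return Request(url, method="GET", headers=headers)


def is_proven_exposure(status: int, body: bytes) -> bool:
    """Count exposure only when the response actually contains rows.

    An error status, an unparsable body, or an empty list (which is what a
    table with row-level security and no permissive policy typically
    returns) is not counted.
    """
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, list) and len(payload) > 0


# Building the request does not touch the network.
req = build_anonymous_read("https://example.supabase.co/", "anon-key", "profiles")
```

To send the request, one option is `urllib.request.urlopen(req, timeout=10)`, passing the response's `status` and `read()` result to `is_proven_exposure`. Keeping the classification separate from the network call means the "returned rows" rule can be checked against recorded responses without re-probing a live app.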

Scope and limits

These are point-in-time, automated scans of a sample of apps. They describe how often a class of issue appears across AI-built apps, not the security of any single app. A passing scan is not a guarantee, a penetration test, or a compliance audit. New code introduces new risk, so re-scan after changes.

Want the same proof for your own app?

Scan your app free