Your match rate is one of the most telling numbers in your entire data stack, and most marketing teams never audit it. They accept the percentage their identity vendor reports, assume it's roughly correct, and move on. The problem is that a low match rate silently degrades every downstream campaign: smaller audiences, weaker lookalikes, gaps in suppression, and attribution that misses whole swaths of customer activity.
This guide walks through what match rate actually measures, what good looks like across different use cases, the most common root causes of low rates, and a step-by-step audit process you can run today.
What Match Rate Actually Means
Match rate is the percentage of records in your input file that can be resolved to a known identity in the graph. If you upload 100,000 customer emails and the identity vendor matches 62,000, your match rate is 62%.
But that single number hides important detail. Match rate varies significantly by identifier type, by data vintage, and by the demographic composition of your file. A 62% overall rate might mean 85% on email-to-device matches and 40% on postal-to-phone matches, two very different problems with two very different solutions. Auditing means pulling apart the aggregate and understanding what's happening at each layer.
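The aggregate-versus-segment gap is easy to see with a toy calculation. The numbers below are invented to mirror the example above, not real benchmark data:

```python
# Illustrative: an aggregate match rate can mask very different segment rates.
segments = {
    "email_to_device": (85_000, 100_000),   # (matched, submitted)
    "postal_to_phone": (40_000, 100_000),
}

def match_rate(matched, submitted):
    return matched / submitted

per_segment = {name: match_rate(m, s) for name, (m, s) in segments.items()}
overall = match_rate(
    sum(m for m, _ in segments.values()),
    sum(s for _, s in segments.values()),
)
# A ~62% headline rate hides an 85% segment and a 40% segment.
```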
Equally important is match quality, not just match quantity. A match that links the wrong person is worse than no match at all: it generates false positives in suppression lists, pollutes lookalike seeds, and can create compliance exposure. That's why BIGDBM's Scored Identity Resolution attaches a confidence index (0–100) to every match, so you can apply threshold-based filtering depending on the stakes of the use case.
What Good Match Rates Look Like
Benchmarks vary by use case, but here are reasonable targets for well-maintained first-party data against a high-quality identity graph:
• Email to device ID: 55–75% is strong; below 40% warrants investigation
• Postal address to phone: 50–70%; lower for older or rural records
• Hashed email (SHA-256) to RampID or UID 2.0: 60–80% on a clean, engaged list
• CRM file onboarding to paid media platforms: 40–60% is typical; above 65% is excellent
If you're running significantly below these benchmarks, the gap usually isn't the identity graph; it's almost always a data hygiene or data vintage issue on your side. The audit process below is designed to isolate exactly which factor is dragging your numbers down.
Common Causes of Low Match Rates
Stale data. Email addresses and phone numbers churn at roughly 25–30% per year for consumer lists. A file that hasn't been refreshed in 18 months will show materially worse match rates than one cleansed last quarter. The identifier was once correct, but the graph can no longer find a current record attached to it.
Formatting inconsistencies. Emails submitted in mixed case, phone numbers with inconsistent country-code prefixes, postal addresses without standardized abbreviations: all of these create matching failures that have nothing to do with actual coverage gaps. They're parsing failures, not identity failures.
Role accounts and shared inboxes. B2C files often contain a higher proportion of role-based emails (info@, admin@, noreply@) than you'd expect, especially if your data collection happens at checkout where people sometimes provide a shared business address. These will never match to a consumer identity profile.
Thin demographic segments. Match rates are lower for certain populations: younger consumers who are more protective of their contact information, rural households with fewer digital touchpoints, and very high-income segments that have opted out of more data collection channels. If your customer file is concentrated in these demographics, your aggregate rate will reflect that composition.
Wrong identifier type for the use case. Trying to match a list of MD5-hashed emails to a graph that's indexed on SHA-256 will produce near-zero results. Confirm the exact hash format your graph accepts before diagnosing a coverage problem.
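To make the hash-format point concrete, here is a minimal sketch using Python's standard hashlib. Most graphs expect the email to be trimmed and lowercased before hashing; confirm your vendor's exact normalization rules, which may differ:

```python
import hashlib

def hash_email(email, algo="sha256"):
    """Normalize an email (trim, lowercase), then hash with the given algorithm."""
    normalized = email.strip().lower()
    return hashlib.new(algo, normalized.encode("utf-8")).hexdigest()

sha = hash_email("Jane.Doe@Example.com", "sha256")
md5 = hash_email("Jane.Doe@Example.com", "md5")
# The two digests share nothing in common; a list of MD5 hashes cannot be
# looked up against a SHA-256-indexed graph.
```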
A Step-by-Step Match Rate Audit
Step 1: Segment your file before submission. Don't submit your entire CRM as a single batch. Break it into cohorts by data source (web opt-in vs. purchase vs. loyalty), by age of record (acquired in the last 6 months vs. 6–24 months vs. older), and by identifier type (email only, phone only, email + postal). Submit each cohort separately so you get match rates by segment, not just an aggregate.
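The cohort split in Step 1 can be sketched as a simple bucketing function. The field names and the reference date are illustrative; adapt them to your CRM schema:

```python
from collections import defaultdict
from datetime import date

def cohort_key(record, today=date(2024, 6, 1)):
    """Bucket a CRM record by data source, record vintage, and identifier mix.
    Field names ("source", "acquired", etc.) are hypothetical."""
    age_days = (today - record["acquired"]).days
    vintage = "0-6mo" if age_days <= 180 else "6-24mo" if age_days <= 730 else "24mo+"
    ids = "+".join(k for k in ("email", "phone", "postal") if record.get(k)) or "none"
    return (record["source"], vintage, ids)

def segment(records):
    """Group records into cohorts for separate match-rate submission."""
    cohorts = defaultdict(list)
    for r in records:
        cohorts[cohort_key(r)].append(r)
    return cohorts
```

Submitting each cohort as its own batch means the vendor's report comes back already broken out by segment.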
Step 2: Standardize formatting before submission. Run your file through a basic normalization pass: lowercase all emails, strip whitespace, standardize phone format to 10-digit US with no country code, apply USPS CASS address certification if you have postal data. This step alone typically improves match rates by 3–8 percentage points.
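A minimal version of that normalization pass, covering the email and phone rules from Step 2 (CASS address certification requires a USPS-licensed tool and is out of scope here):

```python
import re

def normalize_email(email):
    """Lowercase and strip surrounding whitespace."""
    return email.strip().lower()

def normalize_phone(phone):
    """Reduce to 10-digit US format, dropping a leading '1' country code.
    Returns None when the result isn't a plausible 10-digit number."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None
```

Returning None for malformed phones lets you quarantine those records rather than submitting them and dragging down the rate.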
Step 3: Remove known-invalid records. Filter out role-based emails, unsubscribed records with high bounce history, and any records flagged as fraudulent in your own systems. These records inflate your denominator without contributing to the numerator.
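Step 3 might look like the following sketch. The role-account list, field names, and bounce cutoff are all illustrative choices, not a standard:

```python
ROLE_LOCALS = {"info", "admin", "noreply", "no-reply", "support", "sales", "contact"}

def is_role_account(email):
    """Flag role-based addresses that will never resolve to a consumer identity."""
    return email.split("@", 1)[0].lower() in ROLE_LOCALS

def prune(records, max_bounces=3):
    """Drop role accounts, fraud-flagged records, and chronic bouncers
    before submission, so they don't inflate the denominator."""
    return [
        r for r in records
        if not is_role_account(r["email"])
        and not r.get("fraud_flag", False)
        and r.get("bounce_count", 0) < max_bounces
    ]
```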
Step 4: Review the confidence score distribution, not just the match count. When using a scored identity system like BIGDBM's, look at what percentage of your matches fall above 80, between 60–80, and below 60. A file where 70% of matches cluster below 60 is effectively low-quality, even if the headline match rate looks adequate.
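The bucketing in Step 4 is a one-pass tally. Assuming scores arrive as plain numbers on the 0–100 index:

```python
def confidence_distribution(scores):
    """Share of matches scoring above 80, in 60-80, and below 60."""
    buckets = {"80+": 0, "60-80": 0, "<60": 0}
    for s in scores:
        if s > 80:
            buckets["80+"] += 1
        elif s >= 60:
            buckets["60-80"] += 1
        else:
            buckets["<60"] += 1
    total = len(scores)
    return {k: v / total for k, v in buckets.items()}
```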
Step 5: Compare against a control set. Take a subset of records where you have ground truth (customers who have transacted with you multiple times across channels and whose identity you can verify) and measure match rate on that subset specifically. If your known-good segment matches poorly, the issue is likely on the graph side. If your known-good segment matches well but the broader file doesn't, the issue is data quality in the broader file.
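The Step 5 decision logic reduces to two comparisons. The specific cutoffs below are illustrative heuristics, not industry standards:

```python
def diagnose(control_rate, overall_rate, graph_floor=0.60, gap=0.15):
    """Heuristic read of a control-set comparison; thresholds are assumptions."""
    if control_rate < graph_floor:
        # Even verified, multi-transaction customers match poorly.
        return "investigate graph-side coverage"
    if control_rate - overall_rate > gap:
        # Known-good records match fine; the broader file is the problem.
        return "investigate data quality in the broader file"
    return "rates consistent; no obvious gap"
```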
When Scored Resolution Changes the Equation
The traditional approach to low match rates is to cast a wider net: lower the threshold, accept more probabilistic matches, and live with the precision tradeoff. But that approach conflates quantity with usability. A high-volume, low-confidence match set will perform worse on suppression (you'll message people you should have excluded) and worse on lookalikes (you're seeding the model with questionable records).
Scored identity resolution inverts this logic. Instead of a binary accept/reject threshold, you tune the confidence floor per use case. For audience suppression, where a false positive means contacting someone who opted out, you want only high-confidence matches (85+). For broad prospecting, you can open the threshold to 65+ to maximize reach without sacrificing the ability to audit. BIGDBM's Scored Identity Resolution is built exactly for this kind of threshold-based workflow, giving your team full visibility into confidence distribution so you can make informed tradeoffs rather than accepting a black-box match rate number.
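A per-use-case confidence floor can be sketched as a small lookup, mirroring the 85+/65+ thresholds above. The record shape is hypothetical; a scored resolution API would supply the actual fields:

```python
# Floors mirror the thresholds discussed in the text: strict for suppression,
# looser for prospecting.
CONFIDENCE_FLOORS = {"suppression": 85, "prospecting": 65}

def usable_matches(matches, use_case):
    """Keep only matches at or above the confidence floor for this use case."""
    floor = CONFIDENCE_FLOORS[use_case]
    return [m for m in matches if m["confidence"] >= floor]
```

The same match file yields a smaller, safer audience for suppression and a larger one for prospecting, with the tradeoff made explicit instead of baked into a single opaque match rate.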
The bottom line: match rates are diagnostic, not fixed. Most low-rate situations are recoverable through data hygiene, formatting standardization, and smarter segmentation, before you ever need to evaluate a new identity vendor.