A step-by-step guide to auditing and cleaning your HubSpot CRM: fix duplicate contacts, broken lifecycle stages, dead automations, and bad data without blowing everything up.
You set up HubSpot with good intentions. Contacts were imported, lifecycle stages were configured, a few automations were turned on. For a while it worked fine.
Then the team grew. A few CSVs were imported without a clear naming convention. Someone left and their deals stayed open. Marketing built workflows on top of lifecycle stages that sales had quietly stopped using. A new hire set up a duplicate pipeline because the existing one was too confusing to work with.
Now you have a CRM that technically has data in it but nobody fully trusts. Sales works around it. Marketing emails the wrong people. Reports show numbers that don't match what's actually happening in the business.
The tempting answer is to wipe everything and start from scratch. It's almost never the right one. The history in your CRM has real value - deal records, contact activity, customer data. What you need isn't a blank slate. You need a structured cleanup that removes the noise without destroying what's underneath it.
This guide walks you through exactly that, step by step.
TL;DR - Quick Answers for Skimmers
HubSpot is easy to get started with. That's one of its biggest selling points. But easy to start means easy to set up without thinking through what happens six months later - and that's where most CRM problems begin.
Here's a pattern we see constantly. A founder imports a CSV of 3,000 contacts from a conference. A marketer sets up lifecycle stages based on HubSpot's defaults without defining what each one means internally. An SDR creates deals manually and everyone does it slightly differently. Someone builds a lead nurture workflow and then leaves the company. A year later nobody's sure what's running, what's accurate, or what's safe to touch.
The instinct at that point is to wipe everything and start clean. That's almost always the wrong call. The data underneath the mess has value - deal history, contact activity, customer records. A reckless cleanup destroys it. A structured one preserves what matters and removes what doesn't.
You probably already know something's off. Here are the specific signals that tell you the problem is structural:
| Signal | What it usually means |
|---|---|
| Pipeline reports don't match what sales is seeing | Deals are sitting in wrong stages, lifecycle data is off |
| Marketing is emailing churned customers | Lifecycle stages or suppression lists are broken |
| Sales ignores HubSpot and tracks deals in a spreadsheet | They don't trust the data |
| Thousands of contacts with no owner or activity | Bad import hygiene, no clear ownership rules |
| Automations sending emails nobody intended | Workflows built on outdated logic, never reviewed |
| Attribution reports show nothing useful | Deal sources and contact origins not captured consistently |
If two or more of those are true, you're not dealing with surface-level hygiene. You're dealing with a structural problem that's actively costing you pipeline visibility.
The worst thing you can do is start deleting before you know what you have.
Checklist before you touch anything:
The documentation step feels like overhead. It isn't. You can't clean something against an undefined standard. Even a rough one-page doc covering what each lifecycle stage should mean will save you multiple arguments and a week of rework later.
Before fixing anything, map the problem. In HubSpot, use filters to pull the following segments and note the numbers.
Contact audit - pull these lists:
| Filter | What you're looking for |
|---|---|
| Lifecycle stage = none | Contacts that were never properly classified |
| Contact owner = none | Records nobody is responsible for |
| No email address | Unusable for any outreach |
| Last activity date more than 12 months ago | Stale contacts clogging the database |
| Original source = Offline / Import | Batch imports with likely inconsistent data |
Do the same for Company records. Specifically look for:
You are not making decisions yet. You are sizing the problem. Once you have the numbers, the decision for each segment is one of three things: enrich and keep, move to a suppression list, or delete.
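If you would rather size the problem from an export than click through filters, the same segments can be counted in a few lines. This is a minimal sketch, not HubSpot's API: the field names (`lifecycle_stage`, `owner`, `email`, `last_activity`, `original_source`) and the `"Offline Sources"` value are assumptions about how your export is shaped, so rename them to match your own columns.

```python
from datetime import date, timedelta


def audit_contacts(contacts, today, stale_after_days=365, import_source="Offline Sources"):
    """Count contacts per audit segment. One contact can land in several segments."""
    cutoff = today - timedelta(days=stale_after_days)
    counts = {
        "no_lifecycle_stage": 0,
        "no_owner": 0,
        "no_email": 0,
        "stale": 0,
        "imported": 0,
    }
    for contact in contacts:
        if not contact.get("lifecycle_stage"):
            counts["no_lifecycle_stage"] += 1
        if not contact.get("owner"):
            counts["no_owner"] += 1
        if not contact.get("email"):
            counts["no_email"] += 1
        # Contacts with no recorded activity date are not counted as stale here;
        # review them separately (assumption about how you want to treat them).
        last_activity = contact.get("last_activity")
        if last_activity is not None and last_activity < cutoff:
            counts["stale"] += 1
        if contact.get("original_source") == import_source:
            counts["imported"] += 1
    return counts
```

Because the segments overlap, the counts will usually add up to more than the number of records. Treat them as a map of where the problems are, not as a total.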
To give you a sense of scale: a SaaS company we worked with had 11,000 contacts in HubSpot. The audit revealed that 4,200 had no lifecycle stage, 1,800 had no email address, and 600 were duplicates. Assuming little overlap between those groups, that is roughly 6,600 records, or about 60% of the database, that needed attention before any campaign should have been running against it.
Duplicates do more damage than most people realise. A contact entered twice means split conversation history, double-counted activity in reports, and the same person potentially receiving two separate email sequences from your team.
How to find and merge duplicates in HubSpot:
Manual checks to run after the automated tool:
Two things to know before you start merging. The record you merge into keeps its HubSpot ID - if you are connected to Salesforce or another tool that references that ID, be deliberate about which record you keep. And merging cannot be undone, so if you are genuinely unsure whether two records are the same person, tag them for manual review rather than merging immediately.
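Because merges are irreversible, it can help to build a review list from an export before merging anything. The sketch below groups exported contacts by normalised email address and only reports candidates; it does not merge. The `id` and `email` field names are assumptions about your export, and matching on email alone will miss duplicates entered under different addresses.

```python
from collections import defaultdict


def normalise_email(email):
    """Lower-case and trim an email so ' Jane@Example.com' matches 'jane@example.com'."""
    return email.strip().lower()


def find_duplicate_groups(contacts):
    """Return {normalised_email: [record ids]} for emails that appear more than once."""
    groups = defaultdict(list)
    for contact in contacts:
        email = normalise_email(contact.get("email") or "")
        if email:  # records without an email cannot be matched this way
            groups[email].append(contact["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}
```

Each group in the result is a candidate for manual review, which keeps the decision about which record to keep with a person rather than a script.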
Lifecycle stages are the backbone of your CRM. If they are wrong, everything built on top of them - reports, automations, lead scoring, sales handoff processes - is also wrong.
HubSpot's default stages are: Subscriber, Lead, Marketing Qualified Lead, Sales Qualified Lead, Opportunity, Customer, Evangelist, Other.
Most companies either use these defaults without defining what each one means for their specific business, or they customised them at some point in ways that no longer match how deals actually move. Both lead to the same problem: lifecycle data that nobody trusts.
Before you reassign anything, write a one-sentence definition for each stage you use:
| Lifecycle Stage | Example Definition |
|---|---|
| Lead | Any contact who has engaged with content or filled a form but hasn't been reviewed by sales |
| MQL | A contact who fits ICP criteria and has taken a high-intent action (demo request, pricing page visit, content download) |
| SQL | An MQL that an SDR has reviewed and confirmed as worth pursuing - a deal has been created |
| Opportunity | An SQL with an active deal in the pipeline and a scheduled next step |
| Customer | A contact associated with a closed-won deal |
Once you have definitions, pull a report of all contacts by lifecycle stage and look for obvious mismatches:
Fix these in batches using HubSpot's bulk edit. Select the filtered list, click Edit, update the lifecycle stage. Do not do this one record at a time - at any meaningful scale, that approach will take days.
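One mismatch is easy to check mechanically once the definitions exist. Taking the example definition of Customer above (a contact associated with a closed-won deal), the sketch below lists contacts that break it in either direction. The `lifecycle_stage` and `closed_won_deals` fields are hypothetical export columns, and the rule itself should be swapped for whatever definition your team writes down.

```python
def customer_stage_mismatches(contacts):
    """Find contacts whose Customer stage disagrees with their closed-won deal count."""
    marked_without_deal = []  # stage says Customer, but no closed-won deal
    deal_without_mark = []    # has a closed-won deal, but stage is something else
    for contact in contacts:
        stage = (contact.get("lifecycle_stage") or "").strip().lower()
        has_won_deal = (contact.get("closed_won_deals") or 0) > 0
        if stage == "customer" and not has_won_deal:
            marked_without_deal.append(contact["id"])
        elif stage != "customer" and has_won_deal:
            deal_without_mark.append(contact["id"])
    return {
        "marked_without_deal": marked_without_deal,
        "deal_without_mark": deal_without_mark,
    }
```

The two lists map directly to two bulk edits: one batch to move out of Customer after review, and one batch to move into it.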
Go to your deal pipeline and look at every deal with no activity in the last 30 days. For each one, you need a decision: still active, stalled and needs follow-up, or dead.
Deals sitting in a pipeline without activity are not harmless. A sales team that has 40 open deals when 25 of them are effectively dead is not managing a pipeline - they are managing a wishlist. And any forecast built on that data is fiction.
A simple triage framework:
| Last Activity | Action |
|---|---|
| 0 to 30 days | Leave open, no action needed |
| 31 to 60 days | Flag for rep to review and update close date |
| 61 to 90 days | Rep must provide a reason to keep open or close as lost |
| More than 90 days with no reason | Close as lost, add a reason in the closed lost reason field |
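The triage rules are simple enough to express as a function, which is handy if you want to tag an exported deal list in one pass. This is a sketch under two assumptions: boundary days (30, 60, 90) are assigned to the earlier band, and `keep_open_reason` is a hypothetical field where a rep records why an old deal should stay open.

```python
from datetime import date


def triage_deal(last_activity, today, keep_open_reason=""):
    """Return the triage action for a deal based on days since its last activity."""
    days_idle = (today - last_activity).days
    if days_idle <= 30:
        return "leave open"
    if days_idle <= 60:
        return "flag for rep review"
    if days_idle <= 90:
        return "rep must justify or close as lost"
    # Past 90 days, only a stated reason keeps the deal open.
    if keep_open_reason.strip():
        return "keep open (reason given)"
    return "close as lost"
```

Running this over every open deal gives you the size of each band before anyone has to make a judgment call on an individual record.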
Also review the pipeline stages themselves. If you have eight stages but deals only ever move through four of them, remove the ones nobody uses. A SaaS company we spoke to had a stage called "Legal Review" that hadn't had a deal in it for two years. It was creating confusion in reports and making the pipeline look more complex than it was.
Checklist for pipeline cleanup:
Automations are where CRM messes compound the fastest. A workflow built on bad data will execute on bad data at scale - and often silently, with no visible error to alert you.
Go to Automations > Workflows and sort by last updated date. Pull up every workflow that hasn't been reviewed in more than six months.
For each workflow, ask:
Common automation problems to look for specifically:
| Problem | What to do |
|---|---|
| Nurture sequence emailing customers | Add a suppression list: Lifecycle Stage = Customer |
| Workflow enrolling on a lifecycle stage you just redefined | Update the trigger to match the new definition |
| Internal notification going to a former employee's email | Update to current owner or a team inbox |
| Two workflows with overlapping triggers | Consolidate into one or add suppression logic to prevent double enrolment |
| Workflow with no unenrolment criteria | Add an exit condition so contacts don't stay in indefinitely |
Do not delete workflows immediately. Turn them off first, leave them inactive for 30 days, and confirm nothing breaks before you archive them.
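If you keep an inventory of your workflows in a spreadsheet, a few of the checks above can be run against it automatically. The sketch below flags workflows that have not been updated in roughly six months, that notify an address no longer on the team, or that have no unenrolment criteria. Every field name here (`name`, `last_updated`, `notify`, `has_unenrolment_criteria`) is a hypothetical column in your own inventory, not something HubSpot exports in this shape.

```python
from datetime import date, timedelta


def audit_workflows(workflows, active_emails, today, max_age_days=182):
    """Return {workflow name: [issues]} for workflows that need a closer look.

    active_emails should be a set of lower-cased addresses for current team members.
    """
    cutoff = today - timedelta(days=max_age_days)
    findings = {}
    for workflow in workflows:
        issues = []
        if workflow["last_updated"] < cutoff:
            issues.append("not updated in over six months")
        orphaned = [e for e in workflow.get("notify", []) if e.lower() not in active_emails]
        if orphaned:
            issues.append("notifies inactive address: " + ", ".join(orphaned))
        if not workflow.get("has_unenrolment_criteria", False):
            issues.append("no unenrolment criteria")
        if issues:
            findings[workflow["name"]] = issues
    return findings
```

The output is a review queue, not a delete list. Anything flagged still goes through the turn-off-and-wait step before it is archived.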
Inconsistent data entry is one of the most persistent sources of CRM mess because it compounds quietly over time. Nobody notices until the data is so inconsistent that filters stop working and reports become meaningless.
Common examples: "United Kingdom", "UK", "England", and "GB" all sitting in the same Country field. Job titles entered as free text with 40 variations of "Head of Marketing". Deal sizes entered as exact numbers by some reps and vague ranges by others.
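The Country example can be cleaned with a small alias table before you re-import or bulk edit. This is a minimal sketch: the alias list only covers the variants named above, mapping them all to "United Kingdom" is an assumption about your preferred canonical value, and anything unrecognised is returned for manual review rather than guessed.

```python
COUNTRY_ALIASES = {
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "gb": "United Kingdom",
    "england": "United Kingdom",
}


def normalise_country(raw):
    """Return the canonical country name, or None if the value is not in the alias table."""
    return COUNTRY_ALIASES.get(raw.strip().lower())


def unmapped_values(values):
    """List the distinct non-empty values that still need a decision, sorted for review."""
    return sorted({v.strip() for v in values if v.strip() and normalise_country(v) is None})
```

Growing the alias table from the `unmapped_values` output, one review pass at a time, is usually safer than trying to anticipate every variant up front.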
Checklist for property standardisation:
For a B2B SaaS company, the five properties worth getting right above everything else are: Industry, Company Size, Job Title (or Job Function as a dropdown), Lead Source, and Lifecycle Stage. If those five are clean and consistent, your segmentation and reporting become dramatically more useful overnight.
Once you have cleaned the data you are keeping, deal with what you are not.
Suppression list - who belongs here:
A suppression list keeps records in the database for reference without putting them in active contact pools. This matters for reporting - you may want to look back at a churned customer's history - but they should not be receiving nurture sequences.
Who to actually delete:
For company records:
A cleanup is wasted if the same mess rebuilds in the next six months. Three things prevent that.
Monthly hygiene review (should take under an hour):
Document decisions, not just outputs. What do lifecycle stages mean? What are the rules for closing deals as lost? What properties are required on a new contact? Put this somewhere the whole team can access. Decisions that live only in someone's head don't survive team changes - and every time someone new joins and has to figure it out themselves, the CRM drifts a little further from the standard.
Gate your entries. Use forms and integrations rather than manual imports wherever possible. Every manual CSV import is a potential data quality problem. Where manual imports are unavoidable, create an import template with required fields and a standard format for the most important properties.
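An import template is easier to enforce if a script checks each file before it goes anywhere near HubSpot. The sketch below validates a CSV against a list of required columns and reports rows with blanks. The column names in `REQUIRED_FIELDS` are illustrative assumptions; use whatever your own template requires.

```python
import csv
import io

# Hypothetical required columns; replace with the ones in your import template.
REQUIRED_FIELDS = ["Email", "Lifecycle Stage", "Lead Source", "Contact Owner"]


def validate_import(csv_text, required=REQUIRED_FIELDS):
    """Check a CSV for missing required columns and rows with blank required values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing_columns = [field for field in required if field not in header]
    if missing_columns:
        return {"missing_columns": missing_columns, "bad_rows": []}
    bad_rows = []
    # Row numbers start at 2 because line 1 is the header
    # (assumes no field contains an embedded line break).
    for line_no, row in enumerate(reader, start=2):
        blanks = [field for field in required if not (row.get(field) or "").strip()]
        if blanks:
            bad_rows.append((line_no, blanks))
    return {"missing_columns": [], "bad_rows": bad_rows}
```

A file that comes back with an empty `missing_columns` list and no `bad_rows` is ready to import; anything else goes back to whoever produced it.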
A CRM cleanup is manageable internally if the database is under 20,000 contacts, the workflow library is limited, and someone on the team has the time and HubSpot depth to work through it systematically.
It is worth bringing in outside help when:
A well-run HubSpot cleanup for a mid-size B2B SaaS CRM takes four to six weeks. The output is a database your sales team trusts, automations that run on accurate logic, and reports that reflect what's actually happening in your pipeline. That last part - having data you can actually make decisions from - is what makes the work worth doing.
About MendMartech
We work with lean B2B SaaS teams on GTM strategy, demand generation, positioning, and RevOps. HubSpot cleanup and rebuild is one of the most common starting points for our engagements. If your CRM is holding your pipeline visibility back, book a free 30-minute strategy call and we will tell you exactly where the gaps are.
