How to clean up your data before importing into the HubSpot CRM

For better or worse, your CRM system is only as good as the data you put into it, so it’s essential to clean up your data before you upload it into the system.

Inconsistent data means problems for your business. Bad data costs companies around the world $3 trillion per year. Some studies have shown that bad data could potentially cost companies as much as 10–25% of their revenue.

A fragmented import will lead to a CRM that is disorganized and difficult to use, but a clean CRM will ensure that none of your contacts slip through the cracks. We'll walk through the steps you need to take to clean and format your data for a flawless import. 

Components of a CRM

In any CRM, data is stored on objects. An object is, quite simply, a type of record that stores a certain category of information. On a given object, data is housed in properties, which provide specific characteristics about their containing object. HubSpot CRM is centered around four standard objects: Contacts, Companies, Deals, and Tickets.

Contacts: These are the individuals you are going to interact with: the ones you’ll be calling and emailing in order to start, continue, or maintain a business relationship. A Contact record will store information like First Name, Last Name, Email, and Phone Number. It will also include a history of when they were last contacted and by whom. Each Contact can only be associated with one Company, but can be associated with multiple Deals and/or Tasks.

Companies: These are the businesses where your Contacts work. You may engage with more than one Contact who works at the same Company. A Company record will store information like Name, Domain, Industry, and Phone Number. This is also where you’ll store information like what city the business is located in, how many employees it has, and revenue information. Companies can have multiple Contacts, Deals, and/or Tasks associated.

Deals: Deals are used to manage your sales process and track the revenue associated with a potential sale. Deals will move through stages, starting at the beginning of the sales process and ending with a sale (Closed Won) or not (Closed Lost). The Deal record will store the amount of the potential sale, when the sale is expected to close, and who is managing the sale. Each Deal can only be associated with one Company, but can be associated with multiple Contacts.

Tickets: Tickets are the service interactions you have with your customers. Tickets (like deals) will move through stages, from "created" to "closed." Tickets will store information like source, time to first agent reply, and time to close. Tickets can be associated with any Contact, Company, and/or Deal. They can be associated with just one object, or any combination.


Now that you're comfortable with CRM terminology, we can move on to cleaning up your data. The following steps will ensure that your data is clean, free of inconsistencies, and ready for importing. 

#1: Fix Formatting and Case Issues

Your CRM will be most useful if you keep the values in a property all in one format. No matter how you decide to divide up your properties, the values in a single category should be uniform in format across your contact records so that you can easily filter and search your database. Here are a few common areas that will need to be cleaned up: 

Names: Names are especially tricky to manage. A standard way to organize names is to have separate properties for first and last names, but some organizations may need additional name categories on record, like title or informal address, for mailings or nametags. Proper case for names is also important. Would you rather receive an email and be addressed as “Bob” or “bob”? The latter comes across as unprofessional and is a clear indicator of automation, which will hurt conversion rates and your reputation.
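As a rough illustration of the casing fix, here is a minimal Python sketch. The function name normalize_name is hypothetical, and the rule it applies (only re-case values that are entirely lowercase or entirely uppercase) is an assumption rather than a HubSpot feature. Names with internal capitals or lowercase particles still need a manual check.

```python
def normalize_name(raw: str) -> str:
    """Trim a name and fix obvious casing problems."""
    # Collapse stray whitespace around and inside the name.
    cleaned = " ".join(raw.split())
    # Only re-case values that are entirely lowercase or uppercase.
    # Mixed-case input such as "McDonald" is assumed to be intentional.
    if cleaned.islower() or cleaned.isupper():
        # str.title() capitalizes the letter after spaces, hyphens,
        # and apostrophes, so "o'neil" becomes "O'Neil".
        return cleaned.title()
    return cleaned


print(normalize_name("bob"))          # Bob
print(normalize_name("  MARY-ANN "))  # Mary-Ann
print(normalize_name("McDonald"))     # McDonald
```

Leaving mixed-case values alone is a conservative choice: it avoids turning a correctly entered "McDonald" into "Mcdonald".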

Phone Numbers: There are many ways to format a phone number:

  • 555-555-5555
  • (555)-555-5555
  • 5555555555

Are the phone numbers 10 digits or 11 digits? Do they have a “1-” in front of the number? Even then, phone numbers might be separated by office, mobile, and home phone numbers. The right formatting makes a huge difference, as it will ensure that the phone numbers are compatible with any systems that use them and will make things easier for your teams that routinely pull numbers to contact customers and prospects. 
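The variations above can be collapsed into one format with a few lines of Python. This is only a sketch: it assumes 10-digit US-style numbers, treats a leading "1" as a country code, and picks 555-555-5555 as the target format. The function name normalize_us_phone is hypothetical.

```python
import re
from typing import Optional


def normalize_us_phone(raw: str) -> Optional[str]:
    """Return the number as 555-555-5555, or None if it cannot be parsed."""
    # Drop everything that is not a digit: spaces, dashes, parentheses.
    digits = re.sub(r"\D", "", raw)
    # Treat a leading "1" on an 11-digit number as the country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # flag for manual review instead of guessing
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


for sample in ["555-555-5555", "(555)-555-5555", "5555555555", "1-555-555-5555"]:
    print(normalize_us_phone(sample))  # 555-555-5555 each time
```

Returning None for anything that is not 10 digits keeps bad values visible so someone can fix them before the import.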

Mailing addresses: Ensuring your address formatting is correct is critical. If you’re going to be mailing important information to prospects, customers, or employees, make sure that you don’t waste your budget and time mailing to an improperly formatted address.

For example, you should make sure that all of your contacts’ addresses include their state in a single format, either spelled out (Minnesota) or abbreviated (MN), so that it will be easier to pull and sort your mailing lists.
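One way to enforce a single state format is a small lookup table. The sketch below assumes you have chosen the abbreviated form. The table is deliberately partial and the function name normalize_state is hypothetical.

```python
from typing import Optional

# Deliberately partial table for illustration; a real one would list every state.
STATE_ABBREVIATIONS = {
    "minnesota": "MN",
    "wisconsin": "WI",
    "north dakota": "ND",
}


def normalize_state(raw: str) -> Optional[str]:
    """Return the two-letter abbreviation, or None for unknown values."""
    value = " ".join(raw.split())
    # Values that are already a known abbreviation only need upper-casing.
    if value.upper() in STATE_ABBREVIATIONS.values():
        return value.upper()
    # Otherwise look up the spelled-out name, ignoring case.
    return STATE_ABBREVIATIONS.get(value.lower())


print(normalize_state("Minnesota"))  # MN
print(normalize_state(" mn "))       # MN
```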

Email addresses: Being able to reach your customers through email is critical, but email formatting problems are common in most datasets. You may find emails that don’t follow the standard “name@domain” format. Someone may have submitted an email address as “name -at-” or something similar. There may be typos, stray spaces, or extra whitespace. If your emails aren’t formatted correctly, you’ll find a larger percentage of your emails bouncing or experiencing delivery issues, which hurts your company’s reputation among email providers.
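A basic cleanup pass can catch the most common of these problems. The Python sketch below makes several assumptions: the pattern is intentionally loose and is not a full validator, lowercasing the whole address is a simplification, and the function name clean_email is hypothetical. It cannot detect typos such as a misspelled domain.

```python
import re
from typing import Optional

# Intentionally loose: exactly one "@", no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_email(raw: str) -> Optional[str]:
    """Return a tidied address, or None if it still looks malformed."""
    # Remove all whitespace, including spaces inside the address.
    candidate = re.sub(r"\s+", "", raw).lower()
    # Rewrite an obfuscated "-at-" only when no real "@" is present.
    if "@" not in candidate:
        candidate = candidate.replace("-at-", "@", 1)
    return candidate if EMAIL_PATTERN.match(candidate) else None


print(clean_email("  Bob@Example.com "))     # bob@example.com
print(clean_email("name -at- example.com"))  # name@example.com
print(clean_email("bob@example"))            # None
```

Addresses that come back as None are candidates for manual correction or removal before the import.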

#2: Remove Whitespace and Unwanted Characters

Removing whitespace and unwanted characters from your datasets is critical to appropriately search and filter the data. Both are common problems but can have a seriously negative impact on your ability to use your data correctly. 

Whitespace: An extra space in a data field is a very common issue. Maybe the user accidentally hit the space bar after entering their data. They could have hit the space bar before entering their data as well, placing the whitespace at the beginning of the entry.

Another common issue comes from a user hitting the space bar twice between words when they only meant to hit it once. Whitespace can cause formatting and usability issues in some situations but can be hard to spot without the help of a tool or macro.
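All three whitespace problems (leading, trailing, and doubled spaces) can be handled in one step. This is a minimal Python sketch. The function names are hypothetical, and each record is assumed to be a dictionary of column names to values.

```python
def clean_whitespace(value: str) -> str:
    """Strip leading/trailing whitespace and collapse internal runs."""
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty pieces, so joining with one space does both jobs.
    return " ".join(value.split())


def clean_row(row: dict) -> dict:
    """Apply clean_whitespace to every text value in a record."""
    return {
        key: clean_whitespace(val) if isinstance(val, str) else val
        for key, val in row.items()
    }


print(clean_row({"firstname": " Bob ", "company": "Acme   Corp"}))
# {'firstname': 'Bob', 'company': 'Acme Corp'}
```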

Unwanted Characters: Fixing these characters is often low on the priority list, but they can have a big impact on the usability of the data within the set. They might look something like this: Ã, ¢, â, ê. In most cases, these characters aren't manually entered but are caused by encoding issues that arise when you save, import, or export data. If you’re removing these characters by hand, they should be relatively easy to spot.
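When the stray characters come from UTF-8 text that was read with the wrong encoding, the damage can sometimes be reversed rather than deleted. The sketch below assumes the text was mis-decoded as Windows-1252, which is a common case but not the only one. The function name repair_mojibake is hypothetical, and values that do not round-trip cleanly are returned unchanged.

```python
def repair_mojibake(value: str) -> str:
    """Try to undo UTF-8 text that was mis-decoded as Windows-1252."""
    try:
        # Re-encode with the wrong codec to recover the original bytes,
        # then decode those bytes as UTF-8.
        return value.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either the text was already fine or the damage is of another
        # kind; leave it untouched for manual review.
        return value


print(repair_mojibake("JosÃ©"))  # José
print(repair_mojibake("José"))   # José (already correct, left alone)
```

Repairing is preferable to stripping the characters, because stripping would turn "JosÃ©" into "Jos" and lose the accented letter for good.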

#3: Consolidate and Standardize to Improve Filtering

Consolidating and standardizing similar fields makes your data more searchable and more useful to your teams.

Job titles: Job titles are perhaps the most common field for standardization issues, largely because there are many acronyms and variants for the same role. “Chief Executive Officer” and “CEO” describe the same position, but the two wouldn’t appear in the same list if you filtered your data by job title. Consolidating and standardizing similar job titles will make things easier for the marketing, sales, and customer service teams that engage with prospects and customers.
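A simple alias table is often enough to consolidate titles. The Python sketch below is illustrative only: the alias list, the choice of the spelled-out form as the standard, and the function name standardize_titles are all assumptions. Titles it does not recognize are passed through and reported so you can decide how to handle them.

```python
# Hypothetical alias table: every key maps to the one form you standardize on.
TITLE_ALIASES = {
    "ceo": "Chief Executive Officer",
    "chief executive officer": "Chief Executive Officer",
    "cfo": "Chief Financial Officer",
    "chief financial officer": "Chief Financial Officer",
}


def standardize_titles(titles):
    """Return (standardized titles, sorted list of titles with no alias)."""
    standardized, unmapped = [], set()
    for title in titles:
        # Ignore case, periods, and extra spaces when matching.
        key = " ".join(title.lower().replace(".", "").split())
        if key in TITLE_ALIASES:
            standardized.append(TITLE_ALIASES[key])
        else:
            standardized.append(title)
            unmapped.add(title)
    return standardized, sorted(unmapped)


cleaned, needs_review = standardize_titles(["CEO", "C.E.O.", "Head of Sales"])
print(cleaned)       # ['Chief Executive Officer', 'Chief Executive Officer', 'Head of Sales']
print(needs_review)  # ['Head of Sales']
```

The same approach applies to other free-text fields that you filter on.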

Industry: Industry is another common data field where standardization issues arise. Competing companies might separately describe themselves as being in the “tech,” “software,” or “SaaS” industries. Determining how you would like to categorize different companies in your dataset is critical for delivering relevant marketing and sales materials. 

Company Associations: Over time, it’s common for HubSpot users to find that they have a lot of disconnected contacts and companies within their data. Regularly working to ensure the proper associations are in place will help your marketing, sales, and service teams better serve your customers. Failing to do so can make it difficult to find the right person to contact within the company and hamper your personalization efforts.

#4: Remove Extraneous Contacts

Companies that store a lot of data and collect that data over long periods of time will find that a percentage of that data will age out or become less useful. Keeping your database clean and effective will help you lower costs on data storage and marketing campaigns and save your employees a great deal of time in the long run.

Remove bounces: You don’t want to continually try to reach out to someone who is never going to reply. Worse, high bounce rates hurt your standing with email providers and reduce the deliverability of your email marketing materials. The same can be said for disconnected or incorrect phone numbers — reaching out and never receiving a reply is a drain on your resources and budget. 

Remove unsubscribes: Under the CAN-SPAM Act of 2003, it's illegal for companies to continue to send marketing materials to prospects who have unsubscribed from their mailing list. It’s easy to see how this might become a problem when companies have their data separated into several different platforms. A quick import of outdated data that includes unsubscribed individuals can quickly result in some pretty severe violations of the CAN-SPAM Act. You should always ensure that the dataset you're using to send out your marketing communications uses your most up-to-date subscriber information.
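If you can export a current list of unsubscribed addresses, filtering them out of the import file takes only a few lines. In this sketch the function name remove_unsubscribed and the "email" key are assumptions about how your records are structured.

```python
def remove_unsubscribed(contacts, unsubscribed_emails):
    """Drop every contact whose email appears in the unsubscribe list."""
    # Compare addresses case-insensitively and without stray spaces.
    suppressed = {email.strip().lower() for email in unsubscribed_emails}
    return [
        contact
        for contact in contacts
        if contact.get("email", "").strip().lower() not in suppressed
    ]


contacts = [{"email": "Bob@Example.com"}, {"email": "ann@example.com"}]
print(remove_unsubscribed(contacts, ["bob@example.com"]))
# [{'email': 'ann@example.com'}]
```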

Remove low engagement: If you have continually delivered new materials to a contact over a long period of time and they've failed to engage with those materials, there isn’t much point in keeping their data on hand and taking up space. You don’t want to continue sending emails to someone that subscribed to your mailing list seven years ago and hasn't engaged with a new email in the last four years. Your emails are likely hitting their “Spam” folder anyway.

While the timing of when it's appropriate to remove disengaged contacts is a policy that should be decided within your company, having a cutoff for disengagement is essential to ensuring your databases don’t become bloated with outdated information or uninterested contacts. 
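Once a cutoff has been agreed on, applying it is mechanical. The sketch below assumes each record carries a hypothetical last_engaged date and uses a 365-day cutoff purely as a placeholder. It separates contacts for review instead of deleting them.

```python
from datetime import date, timedelta


def split_by_engagement(contacts, today, max_inactive_days=365):
    """Return (contacts to keep, contacts to review for removal)."""
    cutoff = today - timedelta(days=max_inactive_days)
    keep, review = [], []
    for contact in contacts:
        last_engaged = contact.get("last_engaged")
        # Contacts with no recorded engagement go to the review pile too.
        if last_engaged is not None and last_engaged >= cutoff:
            keep.append(contact)
        else:
            review.append(contact)
    return keep, review


contacts = [
    {"email": "ann@example.com", "last_engaged": date(2023, 6, 1)},
    {"email": "bob@example.com", "last_engaged": date(2019, 5, 1)},
]
keep, review = split_by_engagement(contacts, today=date(2024, 1, 1))
print(len(keep), len(review))  # 1 1
```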

Remove duplicates: Whenever you import data into HubSpot or transfer data between platforms, there's a high probability that at least some data duplication will occur. Experts have found that duplication rates between 10% and 30% are not uncommon for companies without data quality initiatives in place.

When your teams are updating two different entries for the same record, you quickly lose sight of which entry is the correct one to reference, making it difficult to maintain a single customer view.

Imagine a sales rep going to contact a prospect, only to find two different entries for the same prospect, each containing contradictory data. Now that sales rep has to spend their time sorting things out. This same scenario happens over and over again at companies with data duplication issues and can be a huge time-sink.

Fortunately, HubSpot has some automatic deduplication functions.

  • When you add Contacts to HubSpot CRM, the system will deduplicate by email address.
  • When you add Companies to HubSpot, the system will deduplicate by Company Domain Name.
  • When you add new Deals or Tickets, there is no deduplication.

While HubSpot does some deduplication, it won't, for example, deduplicate a contact who is listed under both their personal and work emails, so it is best practice to upload as clean a list as possible.
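You can also merge exact email duplicates in your own file before uploading, so that two half-complete rows become one complete row. The Python sketch below is one possible approach, not HubSpot's own logic: the function name dedupe_by_email and the "email" key are assumptions, and the first row seen wins whenever two rows disagree. It will not catch the same person listed under two different addresses.

```python
def dedupe_by_email(rows):
    """Merge rows that share an email; return (merged rows, rows without email)."""
    merged = {}
    unmatched = []
    for row in rows:
        key = row.get("email", "").strip().lower()
        if not key:
            unmatched.append(row)  # nothing to match on; review by hand
            continue
        if key not in merged:
            merged[key] = dict(row)
            continue
        # Fill blanks in the first-seen row with values from the duplicate.
        for column, value in row.items():
            if value and not merged[key].get(column):
                merged[key][column] = value
    return list(merged.values()), unmatched


rows = [
    {"email": "bob@example.com", "firstname": "Bob", "phone": ""},
    {"email": "BOB@example.com", "firstname": "", "phone": "555-555-5555"},
]
unique, no_email = dedupe_by_email(rows)
print(unique)
# [{'email': 'bob@example.com', 'firstname': 'Bob', 'phone': '555-555-5555'}]
```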


In order to import contacts, companies, deals, tickets, or products into HubSpot, you'll need to have your import data stored in a file on your computer. The file must be formatted in a specific way before you can import it into HubSpot.

Keep the following in mind when setting up your import file:

  • Include a header row in your file and match each column's header with a property. You can include values for any of the default HubSpot properties or custom properties.


  • You can include most properties in your spreadsheet when importing a file into HubSpot. The recommended and required fields for each object are listed below:

    • Contacts: first name, last name, email (required for deduplication)
    • Companies: company domain name (required for deduplication)
    • Deals: deal name, pipeline, deal stage
    • Tickets: ticket name, ticket status
    • Products: name
    • Notes: activity date, note body
  • If your spreadsheet has a column header that does not correspond to an existing property in HubSpot, you will be prompted to create a custom property for it. 



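  • If you upload a CSV with new information for existing objects, any existing information will be overwritten by the new values you've imported. HubSpot checks for an existing object using the same identifiers it uses for deduplication: the email address for contacts and the Company Domain Name for companies.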
    • If you have the Automatically create and associate companies with contacts setting enabled, contacts will be automatically associated with company records after the import by matching the email address domain of the contact to the Company Domain Name on the company record.
    • If you don't want to overwrite an existing value for a property, you can either include the current value in the relevant column or leave the cell blank. HubSpot will not overwrite a property value unless there is a new value present in the file.
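Before uploading, it can help to confirm that the header row contains the fields listed above. The Python sketch below uses the standard csv module. The exact header spellings in EXPECTED_COLUMNS are assumptions (HubSpot lets you map columns to properties during the import), and the function name missing_columns is hypothetical.

```python
import csv
import io

# Header names are assumed spellings of the fields listed above.
EXPECTED_COLUMNS = {
    "contacts": ["First Name", "Last Name", "Email"],
    "companies": ["Company Domain Name"],
    "deals": ["Deal Name", "Pipeline", "Deal Stage"],
    "tickets": ["Ticket Name", "Ticket Status"],
    "products": ["Name"],
    "notes": ["Activity Date", "Note Body"],
}


def missing_columns(csv_text: str, object_type: str) -> list:
    """Return the expected headers that are absent from the file's first row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty file means an empty header
    present = {name.strip().lower() for name in header}
    return [
        column
        for column in EXPECTED_COLUMNS[object_type]
        if column.lower() not in present
    ]


sample = "First Name,Email\nBob,bob@example.com\n"
print(missing_columns(sample, "contacts"))  # ['Last Name']
```

To check a file on disk, read its contents into a string first and pass that string in.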

Example import files

During the import process, you'll have the option to map the columns of your file to properties in HubSpot. Below are examples of what your files should look like: