Salesforce 101: How To Clean URLs in Salesforce

This post was last updated on 03/25/21.

Overview

One common complaint we hear from new customers is how inconsistently websites and URLs are stored in their Salesforce system.

Reps will often include “www” or “https” as part of the address, or fail to clean the URL parameters when entering data. Messy data makes it more difficult to identify duplicates, which makes matching leads to accounts more difficult, and generally makes data governance a pain.

One of the first steps we take for all of our customers is adding a simple domain parsing field in Salesforce, on Accounts, which helps solve all of the these problems.

What makes up a URL?

URLs are made up of a protocol, domain, path, query string/parameters, and a fragment.

The unique element of a URL is the domain. This is the critical component to pull out of any given URL that represents a unique entity, because there is only one (for example, “kicksaw.com” or “salesforce.com”).

Url_Structure.jpg

Why parse domain from a website in Salesforce?

URLs are often copy/pasted from the web as reps conduct research. It’s not uncommon for three identical accounts to have three completely different URLs. Take these as an example:

  • https://www.kicksaw.com
  • https://kicksaw.com
  • https://kicksaw.com/?info=123

If you had three separate Kicksaw accounts with these domains, there would be no simple way of identifying them as duplicates. We’ve seen this problem crop up in dozens of SFDC instances, and because of that, we’ve developed a URL parser that stores the root domain of a given record. We’ve found that it makes data cleanup simples and improves data quality.

Why parse domain from an email address in Salesforce?

Luckily, email addresses are structured with only mailbox name and domain as the key elements. For example, every employee at Nike uses “nike.com” for their email address. This makes it easy to parse the domain from a given email.

email address structurev2-01.png

How do I parse a domain from a website field?

The formula below looks complex, but what it does is relatively simple. We’re essentially trying to solve for all of the edge cases in a given URL. The most simple example would be removing “http” or “https” from the beginning of the URL. We may also need to remove any “www,” and any URL parameters or fragments. Salesforce doesn’t offer this functionality out-of-the-box, so a formula such as this one is the simplest way to clean up your website data:

SUBSTITUTE(
IF(
FIND("/",
IF(
FIND("www.", Website) > 0,
IF(
FIND("//", Website) > 0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND("//www.", Website) + 5),
NULL
),
Website
),
IF(
FIND("//", Website) > 0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND("//", Website) + 1),
NULL),
Website
)
)
) > 0,
LEFT(
IF(
FIND("www.",Website)>0,
IF(
FIND("//",Website)>0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND("//www.",Website) + 5
),
NULL),
Website
),
IF(
FIND("//",Website)>0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND("//",Website) + 1
),
NULL
),
Website
)
),
FIND("/",
IF(
FIND("www.",Website)>0,
IF(
FIND("//",Website)>0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND("//www.", Website) + 5
),
NULL
),
Website),
IF(
FIND("//",Website) > 0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND("//",Website) + 1
),
NULL
),
Website
)
)
) -1
),
IF(
FIND("www.",Website)>0,
IF(
FIND("//",Website)>0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND("//www.",Website)+5),
NULL
),
Website
),
IF(
FIND("//",Website)>0,
SUBSTITUTE(
Website,
LEFT(
Website,
FIND(
"//",
Website) + 1
),
NULL
),
Website)
)
),
'www.',
''
)

How do I parse a domain from an email field?

Both of the formulas below can be dropped into a formula field in Salesforce to parse the domain from a given email. This is extremely helpful when auto-converting leads to the appropriate account, or when matching leads against existing contacts.

Option 1:

SUBSTITUTE(
Email,
LEFT(
Email,
FIND(
"@",
Email
)
),
NULL
)

Option 2:

RIGHT(
Email,
LEN(
Email
) -
FIND(
"@",
Email
)
)