800+ Million Emails Leaked Online by Email Verification Service

On February 25th, 2019, I discovered a non-password-protected, 150GB MongoDB instance. This is perhaps the biggest and most comprehensive email database I have ever reported. Upon verification, I was shocked at the massive number of emails that were publicly accessible to anyone with an internet connection. Some of the data was much more detailed than just the email address and included personally identifiable information (PII).

This database contained four separate collections of data that together held an astounding 808,539,939 records. The largest part of it was named ‘mailEmailDatabase’, and it contained three folders:

Emailrecords (count: 798,171,891 records)
emailWithPhone (count: 4,150,600 records)
businessLeads (count: 6,217,358 records)

Records in ‘Emailrecords’ were structured to include zip / phone / address / gender / email / user IP / DOB.

As part of the verification process, I cross-checked a random selection of records against Troy Hunt’s HaveIBeenPwned database. Based on the results, I came to the conclusion that this is not just another ‘Collection’ of previously leaked sources but a completely unique set of data. Although not all records contained detailed profile information about the email owner, a large number of records were very detailed. We are still talking about millions of records.

I started to analyze the content in an attempt to identify the owner and responsibly disclose the exposure, even though this was starting to look very much like a spam organization's dataset.

In addition to the email databases, this unprotected Mongo instance also revealed details on the possible owner of the database: a company named ‘Verifications.io’, which offered ‘Enterprise Email Validation’ services. Unfortunately, it appears that once emails were uploaded for verification, they were also stored in plain text. Once I reported my discovery to Verifications.io, the site was taken offline, and it is still down at the time of this publication. An archived version of the site remains available.

At this point, I teamed up with Vinny Troia, owner of NightLion Security, with whom I had worked on other projects and who had a similar experience finding the Exactis database. I also sent a data breach notification email to the company’s support (yes, I decided it was the right thing to do).

After researching Verifications.io online and comparing what we found with the information publicly available in the database, we came to the following conclusions.

How this all works:

Someone uploads a list of email addresses that they want to validate.
Verifications.io has a list of mail servers and internal email accounts that they use to “validate” an email address.
They do this by literally sending each address an email. If it does not bounce, the email is validated.
If it bounces, they put it in a bounce list so they can easily reject it on later validations.
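The four steps above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Verifications.io's actual code: `verify_addresses` and `send_probe` are hypothetical names, with `send_probe` standing in for whatever SMTP delivery the service really performed (returning True if the message was accepted and False if it bounced). The bounce list is a plain set, so an address that bounced once is rejected on later runs without sending anything.

```python
from typing import Callable, Iterable


def verify_addresses(
    addresses: Iterable[str],
    send_probe: Callable[[str], bool],
    bounce_list: set[str],
) -> tuple[list[str], list[str]]:
    """Split addresses into (validated, bounced).

    send_probe is a hypothetical stand-in for actually emailing the
    address: True means delivered, False means it bounced.
    bounce_list is updated in place so later runs can skip known-bad
    addresses without sending another message.
    """
    validated: list[str] = []
    bounced: list[str] = []
    for raw in addresses:
        # Assumption: addresses are normalized to lowercase before lookup.
        address = raw.strip().lower()
        # Previously bounced addresses are rejected from the stored list.
        if address in bounce_list:
            bounced.append(address)
            continue
        # Otherwise the "validation" is literally sending an email.
        if send_probe(address):
            validated.append(address)
        else:
            bounce_list.add(address)
            bounced.append(address)
    return validated, bounced


# Usage with a fake sender that only "delivers" to two known addresses.
deliverable = {"alice@example.com", "bob@example.com"}
bounces: set[str] = set()
ok, bad = verify_addresses(
    ["alice@example.com", "nobody@example.com", "Bob@example.com"],
    send_probe=lambda addr: addr in deliverable,
    bounce_list=bounces,
)
print(ok, bad)
```

The point of the sketch is that every "validated" address received a real message: the only way to learn that an address is live is to send to it and wait for a bounce that never comes.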

Here is the scenario:

“Mr. Threat Actor” has a list of 1000 companies that he wants to hack into. He has a bunch of potential users and passwords, but has no idea which ones are real. He could try to log in to a service or system using ALL of those accounts, but that type of brute-force attack is very noisy and would likely be identified. Instead, he uploads all of his potential email addresses to a service like verifications.io. The email verification service then sends tens of thousands of emails to validate these users (some real, some not). Each one of the users on the list gets their own spam message saying “hi”. Then the threat actor gets a cleaned, verified, and valid list of users at these companies. Now he knows who works there and who does not, and he can start a more focused phishing or brute-forcing campaign.
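The end product in this scenario is a per-company list of confirmed users. Assuming, purely for illustration, that each company is identified by the domain part of its employees' addresses (real organizations may use several domains), turning the verified output into that list is just a grouping step. `group_by_domain` is a hypothetical helper, not something taken from the exposed database.

```python
from collections import defaultdict


def group_by_domain(validated: list[str]) -> dict[str, list[str]]:
    """Group verified addresses by the part after the last '@'."""
    by_domain: dict[str, list[str]] = defaultdict(list)
    for address in validated:
        local, sep, domain = address.rpartition("@")
        # Skip malformed entries with no '@', no local part, or no domain.
        if not sep or not local or not domain:
            continue
        by_domain[domain.lower()].append(address)
    return dict(by_domain)


verified = ["alice@acme.example", "bob@acme.example", "carol@globex.example"]
print(group_by_domain(verified))
```

This is why a "cleaned" list is valuable to an attacker: every entry under a domain is a confirmed, live employee mailbox to target.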

How do I know this?

The database(s) included email accounts they use for sending mail as well as hundreds of SMTP servers, email, spam traps, keywords to avoid, IP addresses to blacklist, and more. This is why I initially thought they were potentially engaged in spam-related activities. It turns out that, technically, they are sending unwanted and unsolicited emails. This is the worst kind of spam, because they send millions of completely worthless “hello” emails that no one can make sense of.

As I mentioned, they acted fast, and the database was taken down the same day I sent the notification email to the company’s support. Ironically, they did reply to my notification. In their response, they stated that what I had discovered was public data and not client data. So why close the database and take the site offline if it was indeed “public”? In addition to the email profiles, this database also held access details and a user list (130 records) with names and credentials for an FTP server used to upload / download email lists (hosted on the same IP as the MongoDB instance). We can only speculate, but this was likely not meant to be public data.

You can read more about this discovery, which was featured in Wired.

About the author and security researcher:

Bob Diachenko has over 12 years of experience working in corporate/product/internal communications with a strong focus on infosecurity, IT and technology. In the past, Bob has worked with top-tier media, government agencies, and law enforcement to help secure exposed data. Follow Bob on Twitter and read his blog on LinkedIn.