On January 10th, I identified an unprotected Elasticsearch cluster containing 51 GB of what appeared to be OCR (optical character recognition) output from credit and mortgage reports. The database held more than 24 million records (24,349,524 to be exact).

BinaryEdge search engine screen capture with the database info

Each record did not represent a single report, however, but rather one part of a document. Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text. Each record was structured as follows:

  • “text” (data from documents)
  • “documentName”
  • “pageLength”
  • “uri” (IP used to retrieve the data)
  • “clientID”
  • “apiVersion”
  • “createdEpoch”
  • “isStartPage” (true or false)
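
Because each record is a part of a document rather than a whole report, the raw record count overstates the number of reports. A minimal sketch of how such records could be grouped back into documents, assuming each record is a JSON-like dictionary with the fields listed above and that “documentName” identifies the parent document (the sample values are made-up placeholders, not data from the exposed cluster):

```python
from collections import defaultdict
from typing import Dict, List


def group_by_document(records: List[dict]) -> Dict[str, List[dict]]:
    """Group OCR part-records under the document they belong to.

    Assumption: "documentName" identifies the parent document.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["documentName"]].append(record)
    return dict(grouped)


# Placeholder records for illustration only.
sample_records = [
    {"documentName": "report_a.pdf", "isStartPage": True, "text": "..."},
    {"documentName": "report_a.pdf", "isStartPage": False, "text": "..."},
    {"documentName": "report_b.pdf", "isStartPage": True, "text": "..."},
]

documents = group_by_document(sample_records)
print(len(sample_records), "records in", len(documents), "documents")
```

Counting records where “isStartPage” is true would be another way to estimate the number of documents, under the assumption that the flag marks the first page of each one.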

These documents contained highly sensitive data, such as social security numbers, names, phone numbers, addresses, credit history, and other details which are usually part of a mortgage or credit report. This information would be a gold mine for cyber criminals, who would have everything they need to steal identities, file false tax returns, or obtain loans or credit cards.

It is hard to tell how many people were actually affected by the breach. Given the sensitivity of the data, I immediately initiated a responsible disclosure protocol to privately alert the alleged owner of the Elasticsearch cluster.

For background: to discover data breaches, leakages, and vulnerabilities on the Internet, we use public search engines only, such as Shodan, Censys, etc. When we find a public database (data that’s fully accessible to anyone without any restrictions) we collect several digital samples for further analysis. If these samples contain any kind of private and sensitive data, we employ a Responsible Disclosure model to privately communicate the findings with data owners (the company or organization that left the information publicly accessible) and help them implement specific security safeguards to protect their private data.
Bob Diachenko

While researching the contents of the database, I noticed that a massive number of the documents referenced the CitiFinancial company. Based on this connection, I sent a notification to Citi’s responsible disclosure email address on Jan 10. The following day I received a request for additional technical details and got in touch with a Citi representative.

On Jan 15th, the instance was taken offline and the data was secured. The representative from Citi was grateful for the responsible notification, but did not send any statement of clarification. The message they sent to Zack Whittaker of TechCrunch (who assisted me in this investigation) reads as follows:

It appears the third party is a company that had purchased the loans and we have found no evidence that Citi’s systems were compromised

We teamed up with Zack to find out who was behind this data breach and analyzed the history of the IP address that appeared in the database records in the form “http://XX.XX.XXX.XXX:10013/api/documents/download/unique_ID”.
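
The IP address in question came from the “uri” field of each record. A minimal sketch of pulling the host and port out of such a value with Python's standard library, assuming the field holds a full URL in the form shown above (the address used here is a reserved documentation placeholder, not the real one):

```python
from urllib.parse import urlsplit


def host_and_port(uri):
    """Return the (host, port) pair embedded in a record's "uri" value."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port


# 192.0.2.10 is a documentation-only address used as a stand-in.
example = "http://192.0.2.10:10013/api/documents/download/unique_ID"
print(host_and_port(example))
```

The extracted address can then be looked up in passive DNS or similar historical resolution data.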

Read Zack’s take on this incident here: https://techcrunch.com/2019/01/23/financial-files/

RiskIQ Digital Footprint Snapshot showed a DNS resolution for that IP which led us to a company named Ascension Data & Analytics, which specializes in a variety of products and services for the financial industry, including document management with OCR, progressive property reports, etc.

There were also references to Ascension Data & Analytics throughout the database, which left almost no doubt as to the ultimate owner of the exposed documents. In any case, we will update this publication if/when we hear back from Ascension.

The Danger of Leaving the Front Door Open

We have previously reported that the lack of authentication allowed the installation of malware or ransomware on Elasticsearch servers. A publicly accessible configuration lets cybercriminals manage the entire system with full administrative privileges. Once the malware is in place, criminals could remotely access the server resources and even execute code to steal or completely destroy any data the server contains.
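
For administrators who want to confirm that their own cluster is not in this state, a rough sketch of a check follows. It assumes the cluster's HTTP endpoint is reachable at a URL you control (Elasticsearch listens on port 9200 by default) and treats an unauthenticated 200 response from the root endpoint as a sign that no authentication is enforced. The function names are hypothetical, and this should only be run against systems you are authorized to test.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status from an unauthenticated request to a verdict."""
    if status == 200:
        return "open"           # answered without credentials
    if status in (401, 403):
        return "auth required"  # credentials were demanded or refused
    return "unknown"


def check_cluster(base_url, timeout=5.0):
    """Send one unauthenticated GET to the cluster root and classify it."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still informative.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Example (hypothetical host): check_cluster("http://localhost:9200/")
```

An "open" result from outside your network means anyone who finds the address can read the data.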

Although the companies acted fast to secure their data, it is unclear how long it was publicly available or who else might have accessed the millions of records containing PII. Data privacy should always be a top priority, and companies need to be proactive when it comes to data protection.

About author and security researcher:

Bob Diachenko has over 12 years of experience working in corporate/product/internal communications with a strong focus on infosecurity, IT and technology. In the past Bob has worked with top tier media, government agencies, and law enforcement to help secure exposed data. Follow Bob on Twitter and read his blog on LinkedIn. Email: bob@securitydiscovery.com