(CNN) A massive online database apparently containing the personal information of up to one billion Chinese citizens was left unsecured and publicly accessible for more than a year — until an anonymous user in a hacker forum offered to sell the data and brought it to wider attention last week.
The leak could be one of the biggest ever recorded, cybersecurity experts say, highlighting the risks of collecting and storing vast amounts of sensitive personal data online, especially in a country where authorities have broad and unchecked access to such data.
The vast trove of Chinese personal data had been publicly accessible via what appeared to be an unsecured backdoor link — a shortcut web address that offers unrestricted access to anyone with knowledge of it — since at least April 2021, according to LeakIX, a site that detects and indexes exposed databases online.
Access to the database, which did not require a password, was shut down after an anonymous user advertised the more than 23 terabytes (TB) of data for sale for 10 bitcoin — roughly $200,000 — in a post on a hacker forum last Thursday.
The user claimed the database was collated by the Shanghai police and contained sensitive information on one billion Chinese nationals, including their names, addresses, mobile numbers, national ID numbers, ages and birthplaces, as well as billions of records of phone calls made to police to report on civil disputes and crimes.
A sample of 750,000 data entries from the three main indexes of the database was included in the seller’s post. CNN verified the authenticity of more than two dozen entries from the sample provided by the seller, but was unable to access the original database.
The Shanghai government and police department did not respond to CNN’s repeated written requests for comment.
The seller also claimed the unsecured database had been hosted by Alibaba Cloud, a subsidiary of Chinese e-commerce giant Alibaba. When reached by CNN for comment on Monday, Alibaba said "we are looking into this" and would communicate any updates. On Wednesday, Alibaba declined to comment.
But experts CNN spoke with said it was the owner of the data who was at fault, not the company hosting it.
“As it stands today, I believe this would be the largest leak of public information yet — certainly in terms of the breadth of the impact in China, we’re talking about most of the population here,” said Troy Hunt, a Microsoft regional director based in Australia.
China is home to 1.4 billion people, which means the data breach could potentially affect more than 70% of the population.
“It’s a little bit of a case where the genie is not going to be able to go back in the bottle. Once the data is out there in the form it appears to be now, there’s no going back,” said Hunt.
It is unclear how many people have accessed or downloaded the database during the 14 months or more it was left publicly available online. Two Western cybersecurity experts who spoke to CNN were both aware of the existence of the database before it was thrust into the public spotlight last week, suggesting it could be easily discovered by people who knew where to look.
Vinny Troia, a cybersecurity researcher and founder of dark web intelligence firm Shadowbyte, said he first discovered the database “around January” while searching for open databases online.
“The site that I found it on is public, anybody (could) access it, all you have to do is register for an account,” Troia said. “Since it was opened in April 2021, any number of people could have downloaded the data,” he added.
Troia said he downloaded one of the main indexes of the database, which appears to contain information on nearly 970 million Chinese citizens. But it was difficult to judge whether the open access was an oversight from the owners of the database, or if it was an intentional shortcut intended to be shared among a small number of people, he said.
“Either they forgot about it, or they intentionally left it open because it’s easier for them to access,” he said, referring to the authorities responsible for the database. “I don’t know why they would. It sounds very careless.”
Unsecured personal data — exposed through leaks, breaches, or some form of incompetence — is an increasingly common problem faced by companies and governments around the world, and cybersecurity experts say it is not unusual to find databases that are left open to public access.
In 2018, Troia discovered that a Florida-based marketing firm had exposed close to 2 TB of data that appeared to include personal information on hundreds of millions of American adults on a publicly accessible server, according to Wired.
In 2019, Victor Gevers, a Dutch cybersecurity researcher, found an online database containing names, national ID numbers, birth dates and location data of more than 2.5 million people in China’s far-western region of Xinjiang, which was left unprotected for months by Chinese firm SenseNets Technology, according to Reuters.
But the latest data leak is particularly worrying, cybersecurity researchers say, not only because of its potentially unprecedented volume, but also because of the sensitive nature of the information it contains.
A CNN analysis of the database sample found police records of cases spanning nearly two decades from 2001 to 2019. While the majority of the entries are civil disputes, there are also records of criminal cases ranging from fraud to rape.
In one case, a Shanghai resident was summoned by police in 2018 for using a virtual private network (VPN) to evade China’s firewall and access Twitter, allegedly retweeting “reactionary remarks involving the (Communist) Party, politics and leaders.”
In another record, a mother called the police in 2010, accusing her father-in-law of raping her 3-year-old daughter.
“There could be domestic violence, child abuse, all sorts of things in there, that to me is a lot more worrying,” said Hunt, the Microsoft regional director.
“Might this lead to extortion? We often see extortion of individuals after data leaks, examples where hackers can even try to ransom individuals.”
The Chinese government has recently stepped up efforts to improve protection of online user data privacy. Last year, the country passed its first Personal Information Protection Law, laying out ground rules on how personal data should be collected, used and stored. But experts have raised concerns that while the law can regulate technology companies, it could be challenging to enforce when applied to the Chinese state.
Bob Diachenko, a security researcher based in Ukraine, first came upon the database in April. In mid-June, his company detected that the database was attacked by an unknown malicious actor, who destroyed and copied the data and left a ransom note demanding 10 bitcoin for its recovery, Diachenko said.
It is not clear if this was the work of the same person who advertised the sale of the database information last week.
By July 1, the ransom note had disappeared, according to Diachenko, but only 7 gigabytes (GB) of data was available — instead of the 23 TB originally advertised.
Diachenko said this suggested the ransom had been resolved, but that the database owners had continued to use the exposed database for storage until it was shut down over the weekend.
“Maybe there was some junior developer who noticed it and tried to remove the notes before senior management noticed them,” he said.
Shanghai police did not respond to CNN's request for comment on the ransom note.
This story has been updated with additional developments Wednesday.
CNN’s Philip Wang contributed reporting.