If you’ve ever been to the IT security mega trade show, RSA Conference, you probably noticed the same thing. There is always a dominant theme. One year it is compliance. Another, identity theft. This year, loud and clear, it was threat management. Some vendors, like Skybox Security and Core Security, showed how their products help you predict and prepare for threats. Some, like Intel/McAfee and Kaspersky, were all about detecting threats quickly as they are being exploited. Others, for example BalaBit and LogRhythm, help you understand threats in context as they occur.
Threats here. Threats there. Threats everywhere.
The threat management vendors were having a heyday, but they weren’t alone. All the other vendors joined the fun. Entrust, an identity management vendor recently acquired by Datacard, told us that threats make us so vulnerable because our identity management is lacking. Firewall vendors CheckPoint and Palo Alto Networks reminded us that they were the original threat fighters. And RSA’s Archer product team said that threats are best managed with top flight governance, risk and compliance (GRC) software.
I wondered how an average security director could walk the aisles of the show floor, listen to the vendor pitches, and have any clue what products to buy. Two vendors may use nearly the same expressions to describe their wares, but sell products that are fundamentally different.
I see it from the vendor’s point of view. They want to get in on the spending spree happening around threat management these days. I hope they realize that as soon as they start using generic threat management language – as they all seemed to do – they increase their number of competitors to include every other vendor using the same language.
I liked the pitch I heard from Rick Gamache, CTO of Red Sky Alliance. His words stood out from the crowd with a fresh approach. Red Sky Alliance is just that, an alliance. In two years it has rapidly grown to over thirty member companies, including major banks, huge Internet retailers, an oil & gas company, and a smattering of others. In an alliance of high trust, members share with one another threats they are experiencing. Then the dedicated researchers at Red Sky Alliance and its associate, Wapack Labs, provide a deep analysis of the threat – the most thorough analysis of advanced persistent threats (APTs) available outside of government agencies, accompanied by actionable recommendations for mitigating all related threats. The members use that analysis to defend against any other attacks coming from the same Chinese, Russian or other sources.
Good luck to CISOs in their quest to manage threats. My advice is to listen with a critical ear and get customer references from their peers.
Each year since 2005, SecurityDreamer blogger and industry analyst, Steve Hunt, conducts surveys of end user security executives, tracking trends related to the business of security. We cover physical security and IT security equally at SecurityDreamer, carving our unique niche in the industry. Here is a taste of our findings. Sorry, the complete findings are not available except to Steve Hunt’s consulting clients and participants in the research.
I find that narratives yield more insight and are more accurate than statistics. Therefore, the SecurityDreamer approach is to conduct dozens of personal interviews, by phone, email or in person. Each interview covers a subset of topics. Data gathered is generally qualitative and anecdotal, rather than quantitative.
Consultants, Use of
Identity & Access Management
Operational Best Practices
Physical Information Protection
Strategy & Planning
Technology Lifecycle Management
Approximately 50 companies participated in the survey, representing 11 industries.
Summary Findings from the SecurityDreamer Research
While operational security budgets saw little growth across all industries, spending for new projects increased steadily in Energy, Finance, High-Tech and Entertainment. New IT security and physical security projects most notably included
- Security operations centers
- Virtual command centers
- Security information management systems (SIEM, PSIM)
- Networked cameras and sensors at high-risk facilities
CSOs and CISOs complained that their greatest business challenge is metrics: normal operational metrics, such as improved response time to security incidents or numbers of malicious code detections, are not compelling to business leaders. Security executives seek better ways to calculate ROI, justify purchases, and measure the success of deployments.
Most Surprising finding of 2012
Collecting Company Wisdom. Far more companies in more industries are documenting processes than we’ve seen in previous surveys. Continual Improvement (a la Baldrige, Kaizen, Six Sigma, etc.) appears to be the primary motivation. Security executives realize that much of the know-how of security operations resides in the heads of their local security managers. Hoping to benefit from the sharing of this business intelligence, companies are using a variety of techniques (surveys, performance reviews, online forms) to gather it.
Least Aware of This Threat
Physical threats to information rose to the top of the list of issues about which CISOs and CSOs know the least. Every security executive we interviewed had an understanding of physical threats to information (unauthorized visitors, dumpster diving, etc.) but almost none had studied or measured the risks associated with those threats, nor did they have thorough procedures in place to protect against them.
Least Prepared for This Threat
Two related concepts represent the threat that nearly all security executives feel least prepared to address: social engineering and physical penetration. Every security executive confessed that confidential company information was at risk of social engineering attacks (phony phone conversations, pretexting, impersonation, spear-phishing, etc.). Physical penetrations were even more frightening to some executives, who were certain that their confidential company information could be collected and conveyed out of the building (in the form of printed documents, photos, memory sticks, etc.) by
- an unauthorized visitor tailgating into the building
- an attacker bypassing security controls at doors and fences
- rogue employees or contractors
- an internal attacker of any type
Articulating the Value of Security…
It’s an uphill battle to convince the decision-makers in any business that they need to invest in security. Why? Because deep down, all professional businesspeople think security is an annoying layer of cost and inconvenience.
If you walk in and tell them, “We need more security,” they hear, “We need a more annoying layer of cost and inconvenience.”
Getting the buy-in for security products and services today means understanding what drives your company’s security purchase decisions—basically, what is going on in the mind of your bosses. Fear, uncertainty and doubt are not the cleverest tools to use anymore. The security industry is undergoing changes as it adjusts to the convergence of IT with physical security, and businesses are changing, too. Now businesses want something that sometimes seems like a foreign concept to the security profession: value. If you don’t adapt and start answering the questions your business is really interested in, you’ll never get the green light on new projects and upgrades.
Remember, nobody wants security; they want the benefits of security. That means that the housewife doesn’t want the finest deadbolt on the front door because of the excellence of its engineering or its impact resistance. She wants a comfortable, happy place to raise her family.
Businesses also want something other than security. If a bank manager has a mandate to reduce expenses related to bank tellers, she has a couple of options. She could fire all the tellers and lock up all the bank branches, but then the bank would have no interface with its customers. Or she could take all the money, put it in piles on the street corner under a clipboard that says, “Take what you want, but write it down so we may balance your account.” That wouldn’t work either, obviously.
The best solution for reducing teller expenses is to take the money, put it on the street corner locked in a box with a computer attached, and give customers a plastic card for authentication and auditing.
Security was never the point. The bank had a business objective and achieved it by using some security. That is how we all should think of security: as a way of helping our companies achieve the goals or value they seek.
Business managers, especially executives at the highest levels of an organization, have a very simple view of security: It is a tool in the corporate toolbox for enabling business. But they don’t even think of it as security.
The manager responsible for an online ecommerce business wants a few things. He wants to know who is using his Web site. He wants to ensure that each one can do everything on that site they need to do. He has a lot of people doing a lot of things, so he needs an easy way to manage it. And at the end of the day or the end of the quarter, he needs a report that tells him what has happened so he can improve customer satisfaction, reduce errors and increase profits.
In that example we have all four fundamental categories of security—authentication, authorization, administration and audit—but the manager doesn’t think of security once! That’s because security is not the point.
Focus on Value
I have suggested many times that, whenever possible, security professionals should purge the word “security” from their vocabulary. Instead, answer the questions inside your boss’s head, and don’t simply spout the ways security keeps bad things from happening.
Your upper management thinks in terms of money, not security. What people will be needed? What headcount can we reduce? How much will it cost? How much will we save? What new revenue can we earn as a result of this investment? And they think not in terms of security risks, but in terms of credit risk, market risks and operational risks. That’s where you can shine.
One U.S. company spent $35 million on physical security upgrades after 9-11, and $4 million on IT security upgrades. Last fall they failed their Sarbanes-Oxley audit because of poor security. How? Visitors were given a badge for the day, but they could still walk unescorted past cubicles with unattended computers logged into financial systems. At that moment the auditors no longer had confidence in the integrity of the numbers. Anyone could have moved a decimal point or added a zero.
If you know your facilities need more security, tell your managers how it will help them measure or achieve compliance to regulations like Sarbanes-Oxley: You audit employee behavior, or lock up financial systems, or shred financial documents, or do background checks, or secure backup tapes. For any business problem, you should be prepared to help your management identify the ways that the authentication, authorization, administration or audit solutions you’re proposing will solve their problem, or help customers make the gains they hope for.
Remember, it is not our job to secure the building. Our job is to secure the business.
I am also the editor of the Neohapsis Labs blog. The following is reprinted with permission from
by J. Schumacher
For a few years now, security professionals have been following the collective of Internet users calling themselves Anonymous, trying to understand their tactics as they cause cyber mayhem. Two well-written publications in recent weeks caught my eye: The New York Times’ “In Attack on Vatican Web Site, a Glimpse of Hackers’ Tactics” and Imperva’s “Hacker Intelligence Summary Report, the Anatomy of an Anonymous Attack”. These articles shed light on how Anonymous takes a call to arms, recruits members, and searches for action. After reading them I kept thinking about the current state of the Internet and wondering about the future of Anonymous and the cyber pandemonium it creates.
Taking the Imperva report as factual, the collective group of Anonymous has an approximate 10:1 ratio of laypeople to skilled hackers, which I believe limits the sophistication of attacks. I say “collective”, as targets for attacks are not often given from above, but must be approved or agreed upon by the masses before being launched. One very interesting note in Imperva’s report was that the attacks Imperva monitored in 2011 were not utilizing bots, malware or phishing techniques for exploit, but end users actively running tools or visiting special web sites to aid in the attack. There was a high level of public recruitment through social media of Twitter and Facebook, which can also act to inform the victim before the attack hits properly.
The New York Times article mentions that the attack on the Vatican took 18 days to gain enough recruits, and that automated scanning tools were used for reconnaissance on the Vatican’s virtual front during this time. In this attack Anonymous was seeking to interrupt the International Youth Day by a certain date, but when that failed Anonymous changed tactics to widespread distribution of software for Distributed Denial of Service (DDoS) so they could hit the Vatican with a thousand-person attack. There were mixed statements from Anonymous and Imperva (who was a contractor for Internet security monitoring) regarding whether any sites across the globe were truly taken offline for any amount of time.
I think Rob Rachwald, Imperva’s director of security, put it best in The New York Times article: “who is Anonymous? Anyone can use the Anonymous umbrella to hack anyone at anytime”. I believe Anonymous has currently reached its collective peak and will never be the same as in its early 4chan or even the 2008 days. By no means, however, has the world heard the last of Anonymous: people will be claiming affiliation to the collective “group” for a very long time to come, and I believe it will continue to evolve. How this change takes place is going to be exciting to see, as Anonymous claims an “ideas without leaders” mentality and relies on the general public for consensus on missions.
Recently, an interesting report from Symantec also came out about how Anonymous affiliates were tricked into installing the Zeus Trojan by a Pastebin tutorial covering how to install and use one of the attack tools, the Low Orbit Ion Cannon (LOIC), to support DDoS attacks. Established Twitter handles for Anonymous contributors (YourAnonNews, AnonymousIRC, AnonOps) have tweeted that this was not done by Anonymous. But with no leadership accountable (due to the collective nature of Anonymous), there is nothing to say whether this is true, whether another entity is sabotaging Anonymous’ public fanfare, or whether it was simply someone taking advantage of free publicity to trick users into installing malware. Since what many call the start of Anonymous in 2008 (the Scientology attacks), there have not been any other large scale compromises of those supporting attacks through infected tools, but this new activity could hurt the future of Anonymous recruitment and public support.
Depending on whether this recent instance of infected tools was a fluke, I see the future of Anonymous involving a growing number of skilled hackers honing their talents in a Wild West collaborative, while the true base of Anonymous remains largely unskilled hackers. The skilled will, at times, directly and indirectly work for entities (such as large scale crime syndicates as well as private entities) to whom they are lured by big pay for work that will never be reported in any newspaper. The skilled hackers will still participate in Anonymous causes, and they will also enable other Anonymous members (through writing attack tools, scripts or apps), while keeping knowledge of their well paid exploits limited to a smaller private offshoot group. These offshoots will put dedication into advanced exploits that require some financial backing to set up (such as servers for social engineering, injection data repositories, proxies and bots), but these exploits will most likely never be communicated to the larger Anonymous collective or used for the social causes of the masses, but rather for private gain.
At the same time, though, the unskilled hackers, making up the majority of the group, are essential to Anonymous at large for bringing attention and support to causes, identifying weaknesses in networks, performing DDoS attacks and being an overall distraction and a crowd to hide in. It seems bots will be unnecessary and replaced by humans where it is simpler. A large army that is not connected (outside of the odd one-off message to a public forum or social media) provides a large pool that the authorities must sift through to find the dedicated Anon. The collective group of Anonymous has shown support for many social causes, like the Occupy movement and the free speech outcries over proposed Internet legislation. At the same time, Anonymous seems to have very publicly promoted every hack and breach that has been reported since 2010, whether the data exposed belonged to government, private industry or public citizens.
I like to think of myself as a practical, but at times wishful, person. As I see it, the core ideology of the Anonymous movement is not going away, as their cause is not so much new as is the platform for their disobedience. There are some basic controls that organizations can implement to protect themselves from a virtual protest, whether the risk is from DDoS attacks or exploits of unpatched public devices. In the near term, I do not see a high probability of Anonymous becoming a super group of hackers that performs sophisticated attacks on the order of Stuxnet. Nor do I see the possibility of a large scale takedown of critical infrastructure. There will always be a risk, and sometimes possible threats, to critical infrastructure through technology, but this risk can be largely mitigated through proper assessment and mitigating controls.
Side note –
If the recent instance of infected tools continues with other causes, then I believe we have seen the end of wide support for Anonymous. Distrust has always been a concern for involved members, with very recent arrests across the globe of LulzSec members. Anonymous will need to do internal damage control to prevent the collapse of the collective group and public distrust in support for causes brought up by the Anons. Even if the hacking group Anonymous goes in a different direction, the damage has been done, and Internet society can never reverse the psychological damage of the last 5 years.
As I was writing this post, news was coming out that a prominent member of Anonymous, Sabu, along with 5 others, has been arrested by the FBI. We will have more details once the dust settles a bit and all news sources can be processed. Stay tuned.
The last time business managers were worked into a tizzy about “Mobile,” we called it Mobile Computing or Mobility and we talked about remote workers, laptop computers and USB memory sticks (thumb drives). Organizations routinely provisioned employees with both, and employees routinely wanted to use their personal laptops and USB sticks for business use. Therefore, companies had to deal with a mix of business and personal mobile computing devices. In those days Neohapsis recommended that its clients create strong “personal use” policies, promote awareness of the risks, and deploy technology measures to mitigate some of those risks. Today’s mobile discussion is similar in theme but quite different in details.
Mobile no longer merely means mobile computing or mobile workforce. Its common use now includes social networking, mobile websites, mobile apps, new messaging and communication platforms, photos, crowd sourcing, and videos used for personal and business reasons on a vast range of technologies including home PCs, corporate workstations, laptops, smartphones and tablets on the business network, the home Internet connection and in the Cloud.
The technology eco-system of Mobile is vast. Phones, cameras, PDAs (like the iPod Touch©), portable storage devices (external storage, iPods, memory sticks) and tablets are all included, but so are computers, servers and entire data centers. The glue that connects all of these systems in the Mobile conversation is simply one thing: the Internet. Mobile touches nearly every aspect of your IT environment.
So what does a business manager need to know in order to approach Mobile with reasonable security? That is the subject of Neohapsis Labs’ newest paper: The Secure Mobile Enterprise. Download it here.
I am also the editor of the Neohapsis Labs blog. The following is reprinted with permission from
By Patrick Harbauer, Neohapsis Senior Security Consultant and PCI Technical Lead
There are several PCI DSS requirements that are related to tasks that must be performed on a regular basis. The frequency of these tasks varies from daily to annual. There are also a few requirements that make it important to have PCI DSS compliant data retention policies and procedures in place. An example of a requirement that calls for a task to be performed periodically is requirement 11.2.2: Perform quarterly external vulnerability scans via an Approved Scanning Vendor (ASV). An example of a requirement that calls for compliant data retention policies and procedures is requirement 9.4: Use a visitor log to maintain a physical audit trail of visitor activity. Retain this log for a minimum of three months, unless otherwise restricted by law. If processes or checklists are not in place to track your compliance with these recurring tasks, you may be in for an unpleasant surprise during your next annual ROC assessment.
Are You Certifiable?
11.2.2 is one of the classic requirements where we see this happen all too often. When we ask a customer if we can review the certified, passing ASV scans from the last four quarters and we get a response such as, “Oops, Susie was responsible for that and she was reassigned to a different department…” we stick our fingers in our ears and say “la la la la,” but that hasn’t ever made the problem go away. Unfortunately, when this happens, instead of a 10 minute conversation reviewing 4 certified and passing ASV scans, we have to buy a few pizzas, cross our fingers and review several external vulnerability scan reports in hopes that the customer can demonstrate they are scanning and remediating to meet the spirit and intent of requirement 11.2.2.
A Ruleset Only a Mother Could Love
We have seen some very ugly firewall rule sets. We do understand that the business must be able to function and exists to make as large a profit as possible – not to sing the praises of PCI. But as QSAs, we do need to see six month firewall and router rule set reviews and evidence that the rule sets are being maintained with good hygiene. Maintaining clean and healthy firewall rule sets is similar to a good exercise regimen. If your doctor gives you a daily exercise program to maintain your health and you follow it in a haphazard fashion, your doctor is not going to be able to give you a good health report upon your next visit. Similarly, you need a solid program in place to make sure that your firewall rule sets remain healthy and only allow the outbound and inbound network traffic that is actually needed and authorized. And let’s face it, automation is needed for most organizations to manage their firewall and router rule sets effectively. Fortunately there are several excellent solutions available on the market that give you the ability to manage your firewall and router rule sets. For example, these solutions can analyze your rule sets to find overlapping and redundant rules, rules that have not been used over the last X days or rules that allow “any” access – a big PCI no-no. They can also provide the change control mechanisms needed to make sure that changes to firewall rule sets are reviewed and approved by authorized individuals and are properly documented so that rule sets are closely and properly managed.
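The kinds of checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` structure, its field names and the sample rules are hypothetical and do not correspond to any vendor's export format, and the redundancy check only catches exact duplicates, not the partial overlaps a commercial tool would find.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified firewall rule (not a real vendor format)."""
    name: str
    source: str
    destination: str
    port: str
    hits_in_window: int  # times the rule matched traffic in the last X days


def find_any_rules(rules: List[Rule]) -> List[str]:
    """Rules that allow "any" source, destination or port."""
    return [r.name for r in rules if "any" in (r.source, r.destination, r.port)]


def find_unused_rules(rules: List[Rule]) -> List[str]:
    """Rules that matched no traffic during the review window."""
    return [r.name for r in rules if r.hits_in_window == 0]


def find_duplicate_rules(rules: List[Rule]) -> List[Tuple[str, str]]:
    """Pairs of rules with identical source, destination and port."""
    first_seen: Dict[Tuple[str, str, str], str] = {}
    duplicates = []
    for r in rules:
        key = (r.source, r.destination, r.port)
        if key in first_seen:
            duplicates.append((first_seen[key], r.name))
        else:
            first_seen[key] = r.name
    return duplicates


# Made-up sample rule set for illustration only.
sample_rules = [
    Rule("allow-web", "any", "10.0.0.5", "443", 1200),
    Rule("allow-db", "10.0.0.5", "10.0.1.9", "5432", 300),
    Rule("allow-db-copy", "10.0.0.5", "10.0.1.9", "5432", 0),
    Rule("legacy-ftp", "10.0.2.0/24", "10.0.3.7", "21", 0),
]

print("any-access:", find_any_rules(sample_rules))
print("unused:", find_unused_rules(sample_rules))
print("duplicates:", find_duplicate_rules(sample_rules))
```

A report like this is a starting point for the six month review, not a replacement for it: each flagged rule still needs a human decision and a documented change.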
To assist you with making sure that your security program is giving proper attention to specific PCI requirements, we are providing the following two lists. These can be used to create a matrix, review your security operations and to correct any gaps that you may uncover. List 1 covers the frequency with which tasks must be performed related to specific PCI DSS requirements. List 2 shows data retention periods tied to specific requirements. With a little planning, you can keep your PCI compliance on track at all times and avoid unpleasant surprises when your friendly QSA shows up for your next ROC assessment!
List 1 – Recurring PCI Compliance Tasks
1.1.6 – Review firewall and router rule sets (Every 6 Months)
3.1.1 – Automatic or manual process for identifying and securely deleting stored cardholder data (Quarterly)
6.1 – All system components and software are protected from known vulnerabilities (Monthly)
6.6 – Address new threats and vulnerabilities for public-facing web applications (At least annually and after any changes)
8.5.5 – Remove/disable inactive user accounts (Quarterly)
9.5 – Review security of backup media storage location (Annually)
9.9.1 – Properly maintain inventory logs of all media and conduct media inventories (Annually)
10.6 – Review logs for all system components (Daily)
11.1 – Test for the presence of wireless access points and detect unauthorized wireless access points (Quarterly)
11.2.1 – Perform internal vulnerability scans (Quarterly)
11.2.2 – Perform external vulnerability scans via an Approved Scanning Vendor (Quarterly)
11.2.3 – Perform internal and external scans (After any significant change)
11.3 – Perform external and internal penetration testing (At least once a year and after any significant infrastructure or application upgrade or modification)
11.5 – Deploy file-integrity monitoring tools and perform critical file comparisons (Weekly)
12.1.2 – Perform and document a formal risk assessment (Annually)
12.1.3 – Review security policy and update when the environment changes (Annually)
12.2 – Develop daily operational security procedures (Daily)
12.6.1 – Educate personnel (Upon hire and at least annually)
12.6.2 – Require personnel to acknowledge that they have read and understand the security policy and procedures (Annually)
12.8.4 – Maintain a program to monitor service providers’ PCI DSS compliance status (Annually)
List 2 – Data Retention Periods
9.1.1 – Store video camera and/or controls mechanism log (3 months)
9.4 – Retain visitor logs (3 months)
10.7 – Retain audit trail history (1 year)
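One way to turn List 1 into a working matrix is a small script that flags tasks whose last completion date is older than the required interval. The sketch below is illustrative only: it covers a handful of the requirements, the day counts per frequency are rough assumptions (for example, 91 days for "quarterly"), and the function and variable names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List

# Approximate interval in days for each frequency label (assumed values).
INTERVAL_DAYS = {
    "daily": 1,
    "weekly": 7,
    "quarterly": 91,
    "semiannual": 182,
    "annual": 365,
}

# A few entries taken from List 1: requirement -> (task, frequency).
RECURRING_TASKS = {
    "1.1.6": ("Review firewall and router rule sets", "semiannual"),
    "10.6": ("Review logs for all system components", "daily"),
    "11.2.2": ("External vulnerability scan via an ASV", "quarterly"),
    "11.5": ("Critical file comparisons", "weekly"),
    "12.1.2": ("Formal risk assessment", "annual"),
}


def overdue_tasks(last_done: Dict[str, date], today: date) -> List[str]:
    """Return requirement numbers that were never done or are past due."""
    overdue = []
    for requirement, (_task, frequency) in RECURRING_TASKS.items():
        last = last_done.get(requirement)
        allowed = timedelta(days=INTERVAL_DAYS[frequency])
        if last is None or today - last > allowed:
            overdue.append(requirement)
    return overdue


# Made-up completion dates for illustration.
example_log = {
    "1.1.6": date(2012, 1, 15),
    "10.6": date(2012, 6, 1),
    "11.2.2": date(2012, 1, 1),
    "11.5": date(2012, 5, 28),
    "12.1.2": date(2011, 7, 1),
}
print(overdue_tasks(example_log, today=date(2012, 6, 1)))
```

Running something like this on a schedule, and assigning each requirement to a role rather than a person, helps avoid the "Susie was reassigned" problem.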
I am also the editor of the Neohapsis Labs blog. The following is reprinted with permission from
By Michael Pearce, Neohapsis & Neolabs
There has been a lot of concern and online chatter about iPhone/mobile applications and the private data that some send to various parties. Starting with the discovery of Path sending your entire address book to their servers, it has since also been revealed that other applications do the same thing. The other offenders include Facebook, Twitter, Instagram, Foursquare, Foodspotting, Yelp, and Gowalla. This corresponds nicely with some research I have been doing into device ID leakage on mobile devices, where I have seen the same leakages, excuses, and techniques applied and abused as those discussed around the address book leakages.
I have observed a few posts discussing the issues and proposing solutions. These solutions range from requiring iOS to request permission for address book access (as it does for location) to advising developers to hash sensitive data that they send through and compare hashes server side.
The first idea is a very good one; I see few reasons a device’s geolocation is less sensitive than its address book. The second is only partial advice, however, and if taken as it is given in Martin May’s post or Matt Gemmell’s arguments, it will not solve the privacy problems on its own. This is because 1. anonymised data isn’t anonymous, and 2. no matter what hashing algorithm you use, if the input material is sufficiently constrained you can compute, or precompute, all possible values.
Martin May’s two characteristics of a hash [link] :
- Identical inputs will yield the same hash
- It is virtually impossible to deduce the original input from a hash if a strong hashing algorithm is used.
Of these two characteristics of a hash, the privacy implications of the first are not fully discussed, and the second is incorrect as stated.
Hashing will not solve the privacy concerns because:
- Hashing Data does not Guarantee Privacy (When the same data is input)
- Hashing Data does not Guarantee Secrecy (When the input values are constrained)
The reasons not discussed for this are centered on the fact that real world input is constrained, not infinite. Telephone numbers are an extreme case of this, as I will discuss later.
A quick primer on hashing
Hashing is a destructive, theoretically one-way process where some data is taken and put through an algorithm to produce some output that is a shadow of the input. Like a shadow, the same output is always produced by the same input, but not the other way around. (Same car, same shadow).
A very simple example of a hashing function is the modulus (or remainder). For instance the output from 3 mod 2 is the remainder when 3 is divided by 2, or 1. The percent sign is commonly used in programming languages to denote this operation, so similarly
1 % 3 is 1
2 % 3 is 2
3 % 3 is 0
4 % 3 is 1
5 % 3 is 2
etc.
If you take some input, you get the same output every time from the same hashing function. The hashing process is one way because it intentionally discards some data about the original. This results in what are called collisions, and we can see some in our earlier example: using mod 3, 1 and 4 give the same hash, as do 2 and 5. The example given will cause collisions approximately one time in 3; modern strong hashing functions, however, are a great deal more complex than modulo 3. Even the “very broken” MD5 has collisions occur only one time in every 2^24, or 1 in ~17 000 000.
A key point is that, with a hashing algorithm, for any output there are theoretically an infinite number of inputs that can give it, and thus it is a one-way, irreversible process.
A second key point is that any input gives the same output every time. So, by checking if the hashes of two items are the same you can be pretty sure they are from the same source material.
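Both key points can be checked in a few lines of Python. This is a minimal sketch: `toy_hash` is a hypothetical name for the mod 3 example above, and SHA-256 from the standard `hashlib` module stands in for a strong hashing function (the discussion here does not depend on which one is used).

```python
import hashlib


def toy_hash(n: int) -> int:
    """The mod 3 example from the text: keeps only the remainder."""
    return n % 3


def strong_hash(data: str) -> str:
    """SHA-256 hex digest, chosen here purely for illustration."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Same input, same output, every time.
assert toy_hash(4) == toy_hash(4)
assert strong_hash("555-0100") == strong_hash("555-0100")

# Different inputs can share an output (a collision) in the toy hash.
assert toy_hash(1) == toy_hash(4)
assert toy_hash(2) == toy_hash(5)

# Group the inputs 1..9 by hash value: each output maps back to
# several inputs, so the output alone cannot identify the input.
buckets = {}
for n in range(1, 10):
    buckets.setdefault(toy_hash(n), []).append(n)
print(buckets)
```

With only three possible outputs the toy hash hides its input well but identifies nothing; a strong hash is the opposite, which is exactly why matching hashes are treated as matching inputs.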
Cooking Some Phone Number Hash(es)
(All calculations are approximate, if I’m not out by two orders of magnitude then…)
Phone numbers conform to a rather well known format, or set of formats. A modern GPU can run about 20 million hashes per second (2*10^7), or 1.7 trillion (1.7*10^12) per day. So, how does this fit with possible phone numbers?
A pretty standard phone number is made up of 1-3 digits for a country code, 3 for the local code, and 7 numbers, with perhaps 4 for the extension.
So, we have the following range of numbers:
0000000000000-0000 to 9999999999999-0000
Or, 10^13 possible numbers… About 6 days’ work to compute all possible values at 1.7 trillion hashes per day (and a LOT of storage space…)
If we now represent it in a few other forms that may occur to programmers…
+001 (234) 567-8910, 0012345678910, 001-234-5678910, 0012345678910(US), 001(234)5678910
We have maybe 10-20 times that, or a few months of calculation…
But real world phone numbers don’t fill all possible values. For instance, take a US phone number. It is also made up of the country code, 3 for the local code, and 7 numbers, with perhaps 4 for the extension. But:
- The country code is known
- The area code is only about 35% used since only 350 values are in use
- The 7 digit codes are not completely full (let’s guess 80%)
- Most numbers do not use extensions (let’s say 5% use them)
Now, we only have 350 * (10 000 000 * 0.8) * 1.05 or 2.94 billion combinations (2.94*10^9). That is only a little over two minutes on a modern GPU. Even allowing for different representations of numbers, you could store that in a few gigabytes of RAM for instant lookup, or recalculate every time and take longer. This is what is called a time-space tradeoff: you spend either memory to store the results or time to recalculate them.
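The US estimate is easy to check for yourself. The sketch below just redoes the arithmetic with the same guesses as above (350 area codes, 80% of 7-digit numbers in use, 5% with extensions, 20 million hashes per second); the variable names are mine, and the inputs remain rough guesses.

```python
# Rough guesses taken from the discussion above, not measured values.
HASHES_PER_SECOND = 20_000_000   # assumed GPU hashing rate
AREA_CODES_IN_USE = 350          # of 1 000 possible 3-digit codes
SUBSCRIBER_FILL = 0.8            # guessed share of 7-digit numbers in use
EXTENSION_FACTOR = 1.05          # 5% of numbers add an extension

# Number of plausible US phone numbers to hash.
us_keyspace = AREA_CODES_IN_USE * (10_000_000 * SUBSCRIBER_FILL) * EXTENSION_FACTOR

# Time to hash every one of them once.
seconds = us_keyspace / HASHES_PER_SECOND
minutes = seconds / 60

if __name__ == "__main__":
    print(f"{us_keyspace:.3g} candidate numbers")
    print(f"{seconds:.0f} seconds, or about {minutes:.1f} minutes")
```

Change any of the guesses by a factor of a few and the answer is still measured in minutes, which is the point.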
Anyway, the two takeaways for our discussion here regarding privacy are:
1. Every unique output value probably corresponds to a unique input value, so this hashing anonymisation still has privacy concerns.
Since the set of possible phone numbers is tiny compared with the output space of even a broken hashing algorithm, there is probably little chance of collisions.
2. Phone numbers can be reverse computed from raw hashes alone.
Because of the known constraints on input values, it is possible either to brute force the reverse values or to build a reasonably sized rainbow table on a modern system.
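To make the second takeaway concrete, here is a minimal sketch of the lookup-table approach, shrunk to a single area code and exchange (10 000 candidates) so it runs instantly. The function names are hypothetical, SHA-256 and the bare 10-digit format are my assumptions about how an app might hash numbers, and strictly speaking this is a plain precomputed table rather than a true rainbow table.

```python
import hashlib


def hash_number(number: str) -> str:
    """Unsalted SHA-256 of a phone number given as a digit string."""
    return hashlib.sha256(number.encode("ascii")).hexdigest()


def candidate_numbers(area_code: str, exchange: str):
    """Yield all 10 000 ten-digit numbers within one area code and exchange."""
    for line in range(10_000):
        yield f"{area_code}{exchange}{line:04d}"


def build_lookup(area_code: str, exchange: str) -> dict:
    """Precompute hash -> number for every candidate (space traded for time)."""
    return {hash_number(n): n for n in candidate_numbers(area_code, exchange)}


# The "attacker" precomputes the table once...
lookup = build_lookup("234", "567")

# ...and can then reverse any hash of a number inside that range.
leaked_hash = hash_number("2345678910")
recovered = lookup.get(leaked_hash)

if __name__ == "__main__":
    print(recovered)
```

Scaling this from 10 000 candidates to the full few billion changes only how long the precomputation takes and how much memory the table needs, not the method.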
Hashing Does NOT Guarantee Privacy
Anonymising data by removing specific user identifying information but leaving in unique identifiers does not work to assuage privacy concerns. This is because often clues are in the data, or in linkages between the data. AOL learned this the hard way when they released “anonymised” search data.
Furthermore, the network effect can reveal a lot about you: how many people you connect to, and how many they connect to, can be a powerful identifier. It can also predict a lot of things, like your career area and salary point (since more connections tends to mean richer).
For a good discussion of some of the privacy issues related to hashes see Matt Gemmell’s post, Hashing for Privacy in social apps.
Mobile apps also often send the device hardware identifier (which cannot be changed or removed) to servers and advertising networks. I have also observed the hash of this (or of the WiFi MAC address) being sent. Hashing hardly accomplishes anything here: anyone who knows the device ID can hash it and look for the result, and anyone who knows the hash can look for it, just as with the phone numbers. The hash is just as unique to my device as the ID itself, and just as impossible to change.
Hashing Does NOT Equal Secrecy
As discussed under “Cooking Some Phone Number Hash(es)”, it is possible to work back from a hash to the input, since we know some of the constraints operating upon phone numbers. Furthermore, even if we are not sure exactly how you are hashing data, we can simply put test data in and look for known hashes of it. If I know what 123456789 hashes to and I see it in the output, then I know how your app is hashing phone numbers.
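The test-data trick might look something like the sketch below. The list of algorithms and number formats is purely my guess at what a developer might plausibly use, and identify_scheme is a hypothetical helper, not part of any real tool.

```python
import hashlib

# The test number planted in the address book, as in the example above.
KNOWN_NUMBER = "123456789"

# Guesses at what an app might do; extend both as needed.
CANDIDATE_ALGORITHMS = ("md5", "sha1", "sha256")
CANDIDATE_FORMATS = {
    "digits only": lambda n: n,
    "leading plus": lambda n: "+" + n,
    "dashed": lambda n: f"{n[:3]}-{n[3:6]}-{n[6:]}",
}


def identify_scheme(observed_hashes, known_number=KNOWN_NUMBER):
    """Return the (algorithm, format) pairs whose hash of the known number
    appears among the hashes observed leaving the app."""
    matches = []
    for algorithm in CANDIDATE_ALGORITHMS:
        for label, fmt in CANDIDATE_FORMATS.items():
            data = fmt(known_number).encode("ascii")
            digest = hashlib.new(algorithm, data).hexdigest()
            if digest in observed_hashes:
                matches.append((algorithm, label))
    return matches


if __name__ == "__main__":
    # Simulate an app that sends SHA-1 of "+<number>" over the wire.
    observed = {hashlib.sha1(b"+123456789").hexdigest()}
    print(identify_scheme(observed))
```

Once one planted number matches, the same algorithm and format can be used to precompute hashes for every other number.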
The Full Solution to Privacy and Secrecy: Salt
Both of these issues can be greatly helped by increasing the complexity of the input into the hash function. This can both remove the tendency for anonymised data to carry identical identifiers across instances, and also reduce the chance of it becoming feasible to reverse-calculate all possible values. Unfortunately there is no perfect solution to this if user-matching functionality comes first.
The correct solution, as it should be used to store passwords, is entry-specific salting (for example with bcrypt). It is not feasible for a matching algorithm, though: it only works for comparing a fresh input against a stored hash, and it will not work for comparing stored hashes to stored hashes.
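A small sketch shows why entry-specific salting defeats matching. To stay within the Python standard library I have swapped in PBKDF2 (hashlib.pbkdf2_hmac) for bcrypt; the behaviour that matters here, a fresh random salt per entry, is the same. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_with_random_salt(value: str):
    """Hash a value with a fresh random salt; returns (salt, digest)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(value: str, salt: bytes, digest: bytes) -> bool:
    """Check a fresh input against one stored (salt, digest) pair."""
    candidate = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    # Two users upload the same phone number.
    salt_a, digest_a = hash_with_random_salt("0012345678910")
    salt_b, digest_b = hash_with_random_salt("0012345678910")

    # Stored hash vs stored hash: no match, so server side matching fails.
    print(digest_a == digest_b)
    # Fresh input vs stored hash: works, which is all passwords need.
    print(verify("0012345678910", salt_a, digest_a))
```

That is exactly what you want for passwords and exactly what breaks contact matching: the server can never tell that two stored entries came from the same number.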
However, if you as a developer are determined to make a server side matching service for your users, then you need to apply a hybrid approach. This is not good practice for highly sensitive information, but it should retain the functionality needed for server side matching.
Your first privacy step is to make sure your hashes do not match those collected or used by anyone else. Do this by adding some constant secret to them, a process called salting.
e.g., adding 9835476579080945368095468905486 to the start of every number before you hash
This will make all of your hashes different to those used by any other developer, but they will still compare properly with one another: the same input will give the same output.
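As a minimal sketch of the constant-salt step: the salt below is just the example value from above, the normalise step (stripping everything except digits so different spellings of a number match) is my own assumption, and SHA-256 stands in for whatever hash you actually use.

```python
import hashlib
import string

# Example value only; a real salt must be generated randomly and kept secret.
SECRET_SALT = "9835476579080945368095468905486"


def normalise(number: str) -> str:
    """Keep only ASCII digits so '+001 (234) 567-8910' and '0012345678910' agree."""
    return "".join(ch for ch in number if ch in string.digits)


def salted_hash(number: str, salt: str = SECRET_SALT) -> str:
    """Prepend the constant secret salt to the number, then hash."""
    material = salt + normalise(number)
    return hashlib.sha256(material.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # The same number in two formats still matches within your own system...
    print(salted_hash("+001 (234) 567-8910") == salted_hash("0012345678910"))
    # ...but not against someone else's unsalted hash of the same number.
    unsalted = hashlib.sha256(b"0012345678910").hexdigest()
    print(salted_hash("0012345678910") == unsalted)
```

A keyed construction such as HMAC would be a tidier way to mix in the secret, but plain prepending is what is described here and is enough to show the idea.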
However, there is still a problem: if your secret salt is leaked or disclosed, the reversing attacks outlined earlier become possible again. To avoid this, increase the complexity of the input by hashing more complex data. So, rather than just hashing the phone number, hash the name, email, and phone number together. This does introduce the problem of hashes disagreeing if any part of the input differs through misspellings, typos, etc.
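Here is a rough sketch of hashing the whole record, along with the typo problem it creates. The field cleanup (trimming, lower-casing, digits only) and the separator character are my own assumptions, and record_hash is a hypothetical helper.

```python
import hashlib
import string

FIELD_SEPARATOR = "\x1f"  # keeps ("ab", "c") distinct from ("a", "bc")


def record_hash(name: str, email: str, phone: str, salt: str) -> str:
    """Hash name, email and phone together with a constant secret salt."""
    fields = [
        salt,
        name.strip().lower(),
        email.strip().lower(),
        "".join(ch for ch in phone if ch in string.digits),
    ]
    material = FIELD_SEPARATOR.join(fields)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    salt = "example-secret-salt"  # placeholder value
    a = record_hash("Jane Doe", "jane@example.com", "001-234-5678910", salt)
    b = record_hash("jane doe ", "Jane@Example.com", "0012345678910", salt)
    c = record_hash("Jane Do", "jane@example.com", "0012345678910", salt)
    print(a == b)  # formatting differences are smoothed out by the cleanup
    print(a == c)  # a single misspelling in the name breaks the match
```

Cleanup like this catches formatting differences, but nothing short of fuzzy matching on the raw data will catch a genuinely different spelling, which is the tradeoff you accept for the larger input space.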
The best way to protect your users’ data from disclosure, and your reputation from damage due to a privacy breach:
- Don’t collect or send sensitive user data or hashes in the first place – using the security principle of least privilege.
- Ask for access in a very obvious and unambiguous way – informed consent.
[Update] Added author byline and clarified some wording.