Passwords and Data Mining
By Matt | February 27, 2009
I believe the working assumption must be that we’re under persistent, long-term attack by organized groups.
This is not just organized crime; I suspect organized criminal groups that are tolerated, if not outright sponsored, by states like Russia and China. It’s not just those old “bogeymen” either: many nations, from Poland to South Korea to Israel, have long histories of industrial espionage, and we must assume that at the very least Western Europe and the descendants of the British Empire (U.S.A. included) collect and preserve usernames and passwords found crossing the internet unencrypted.
When you read in the news about security breach after security breach, you have to assume organizations out there are selling, buying, collecting, and storing in data warehouses usernames and passwords. If people sell credit card information and known good email addresses, this data has value too.
Passwords don’t even have to be in cleartext to have value in such a market. As the cost of computing power comes down, attacking them through methods such as rainbow tables becomes more and more viable. By my calculations, you could build a highly efficient rainbow table of all possible Windows passwords on Amazon’s Elastic Compute Cloud for about $60,000/month. While that takes funding and organization, we’re not exactly talking about needing the resources of SPECTRE to pull it off. How many other databases out there also store passwords without salts, as Windows does, making them very susceptible to rainbow tables? And how many databases, if breached, are likely to have the salt revealed as well?
Most people use variations on the same password. So once you start to data mine 15+ years of data collected in security breaches, you have a big collection associating usernames, email addresses, and passwords. That gives you a fairly small number of passwords to try in focused attacks.
For an example, let’s consider this data set:
Jdoe     <unknown email>  password
Jdoe     <unknown email>  passw0rd
Jdoe     <unknown email>  password1
Jdoe     Jdoe@aol.com     passw0rd4
JohnDoe  Jdoe@aol.com     <unknown password>
Either through data mining or simply by using people as “mechanical turks,” we can make a good guess that if we see the username and email “JohnDoe Jdoe@aol.com,” his password will be something like password, passw0rd, or password3. So given just the username and email, we can set up a low-intensity attack: have a botnet try one password a day so it doesn’t raise alarm bells for bad login attempts. The “holy grail” for a hacker would be to get into Jdoe@aol.com itself, so that account resets get sent to the attacker…and given that set of data, there’s a darn good chance he’s using some sort of “password” variant for it.
We know single factor authentication is fundamentally insecure. There is no doubt today that it’s wholly inadequate for logging into systems you expect to trust. So what do we do until universal two factor authentication schemes are in place?
First, divide your accounts into ones you care about and ones you don’t.
For sites like newspaper comments or your favorite web forum, use a simple password because you don’t care who knows it. This is analogous to a simple padlock: while it doesn’t stop a determined attacker, it stops the casual attacks, and that is good enough.
For finances, work computers, and the like…use a highly secure password.
How do you generate them in a way that is highly resistant to data mining efforts?
My criteria are that you can re-create them at will from something you know (in your mind), and that you can even leave them sitting in plain view and still not be compromised.
How?
Let’s first think up a pattern. It can be a standard word or phrase you use in conjunction with how you identify each resource you need to access. Then run that compound phrase (base phrase + resource) through a hash generator.
So in Linux, we can issue a series of commands like this:
$ echo mybasicpassword@www.yahoo.com | sha256sum
9a7c3ff19da0207cae4c4c7f820d38397f672a47500795c4f56d6b45fe578603
$ echo mybasicpassword@www.d90.us | sha256sum
f4d0ccb1eb6b8e40472132cd44efc5b6b9bc976a4f951205e9e1bb96a12a1fda
$ echo mybasicpassword@bankofamerica | sha256sum
857a0d7ed6b510f7b7ab615072446552291429ba3c7ca40fe91553520b2f56a3
$ unset HISTFILE
The unset HISTFILE keeps the shell from saving the commands you just typed to its history file when you log off, where they would reveal to a hacker your secret “mybasicpassword” as well as the secret way you identify the resources. What you pipe into the hash generator can change: maybe you have to reset a password quarterly and make it a habit to add the month and year when you generated it. The only place that pattern should be is in your head, plus maybe a note that helps you remember when you last generated it.
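As a rough sketch of the same step, the commands above could be wrapped in a small shell function that reads the base phrase from standard input, so the phrase is never part of a command line at all. The name site_hash is made up for this illustration, and it assumes sha256sum from GNU coreutils is installed. It prints only the hex digest, without the file name column sha256sum normally appends.

```shell
# site_hash RESOURCE   (hypothetical helper, not part of any tool)
# Reads the base phrase from standard input and prints the SHA-256 hex digest
# of "base@resource" plus a trailing newline, which is the same input that
# `echo base@resource | sha256sum` hashes.
site_hash() {
    IFS= read -r base || return 1
    printf '%s@%s\n' "$base" "$1" | sha256sum | cut -d' ' -f1
    unset base   # do not leave the phrase in a shell variable
}
```

Run site_hash www.yahoo.com, type the base phrase, and press Enter. The phrase is still echoed on the terminal, so this only addresses the history file, not someone looking over your shoulder.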
Now simply write them down…
yahoo 9a7c 3ff1 9da0 207c ae4c 4c7f 820d 3839 7f67 2a47 5007 95c4 f56d 6b45 fe57 8603
d90 f4d0 ccb1 eb6b 8e40 4721 32cd 44ef c5b6 b9bc 976a 4f95 1205 e9e1 bb96 a12a 1fda
bankofamerica 857a 0d7e d6b5 10f7 b7ab 6150 7244 6552 2914 29ba 3c7c a40f e915 5352 0b2f 56a3
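If you would rather not split the digest by hand, one possible way to produce the four-character groups is a one-line sed substitution. The function name group4 is an assumption for this sketch.

```shell
# group4 HASH   (hypothetical helper)
# Inserts a space after every four characters and drops the trailing space.
group4() {
    printf '%s\n' "$1" | sed 's/..../& /g; s/ $//'
}

group4 857a0d7ed6b510f7b7ab615072446552291429ba3c7ca40fe91553520b2f56a3
# prints: 857a 0d7e d6b5 10f7 b7ab 6150 7244 6552 2914 29ba 3c7c a40f e915 5352 0b2f 56a3
```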
Now also pick a pattern of what part of the hash to use. Maybe it’s the 64th, 62nd, 60th, 1st, 3rd, and 7th characters in that order, so for “bankofamerica” you’d use 36f877.
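To double-check a selection like that, the picking can be sketched in a few lines of shell. The name pick_chars is invented for this example; positions are counted from 1, and the digest is passed without spaces.

```shell
# pick_chars HASH POS...   (hypothetical helper)
# Prints the characters of HASH found at the given 1-based positions,
# in the order the positions are listed.
pick_chars() {
    hash=$1
    shift
    out=
    for pos in "$@"; do
        out=$out$(printf '%s' "$hash" | cut -c "$pos")
    done
    printf '%s\n' "$out"
}

pick_chars 857a0d7ed6b510f7b7ab615072446552291429ba3c7ca40fe91553520b2f56a3 64 62 60 1 3 7
# prints: 36f877
```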
The nice part is that you can re-create the password at will wherever you are, yet it’s secure from other people unless they intercept unencrypted signals or torture it out of you.
Depending on how you make them, you may need to write a note to yourself — like the date you made the password.
Let’s take a slight variation on this theme for another example:
$ echo mybasicpassword@mybank_022709 | sha256sum
15218a3a5bed25963213e9b558f62d36dffc916dcc874ff307a37b26e62b6257
So in a secure place, like a TrueCrypt encrypted volume, you write a note like:
mybank 022709
That really doesn’t reveal much at all, since the algorithm (in this case ‘base password’@‘resource’_‘date’) still exists only in your head.
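A minimal sketch of the dated variant, assuming the date tag is written as MMDDYY the way 022709 is above. The function name dated_hash and the default of using today’s date are assumptions of this example, and sha256sum from GNU coreutils is assumed to be installed.

```shell
# dated_hash RESOURCE [MMDDYY]   (hypothetical helper)
# Reads the base phrase from standard input and hashes "base@resource_date".
# Without a second argument, today's date is used as the tag.
dated_hash() {
    stamp=${2:-$(date +%m%d%y)}
    IFS= read -r base || return 1
    printf '%s@%s_%s\n' "$base" "$1" "$stamp" | sha256sum | cut -d' ' -f1
    unset base   # do not leave the phrase in a shell variable
}
```

Passing the date from your note, as in dated_hash mybank 022709, re-creates an older digest; leaving it off starts a new one, and the new date is what goes on the cheat sheet.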
Now maybe the bank requires special letters and characters. So on your cheat sheet you write:
mybank 1521 8a3a 5bed 2596 3213 e9b5 58f6 2d36 dffc 916d cc87 4ff3 07a3 7b26 e62b 6257 +A-
Using the same choice of characters as above, and appending the A- noted after the plus sign, you look at that and realize you’ve set your password to be 72b123A- .
If you’re using a system you trust, you can use a tool like Password Safe to keep your website passwords, so you don’t have to type each one in each time.
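Reading a cheat-sheet line back can be sketched the same way. This example assumes the line layout shown above (a resource name, the digest in space-separated groups, and an optional suffix introduced by a plus sign); the name password_from_note is hypothetical.

```shell
# password_from_note "NOTE LINE" POS...   (hypothetical helper)
# Strips the resource name, splits off an optional "+suffix", removes the
# spaces from the digest, picks the 1-based positions, and appends the suffix.
password_from_note() {
    line=$1
    shift
    body=${line#* }                      # drop the leading resource name
    case $body in
        *" +"*) suffix=${body##* +}; body=${body% +*} ;;
        *)      suffix= ;;
    esac
    hash=$(printf '%s' "$body" | tr -d ' ')
    out=
    for pos in "$@"; do
        out=$out$(printf '%s' "$hash" | cut -c "$pos")
    done
    printf '%s%s\n' "$out" "$suffix"
}

password_from_note "mybank 1521 8a3a 5bed 2596 3213 e9b5 58f6 2d36 dffc 916d cc87 4ff3 07a3 7b26 e62b 6257 +A-" 64 62 60 1 3 7
# prints: 72b123A-
```

Keep in mind that storing the positions in a script would put part of the secret pattern on disk, so they are passed as arguments here instead of being hard-coded.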
Of course you *should* be using two factor authentication whenever you can. For the times you can’t, I believe the system I laid out here is almost as strong — and most importantly prevents the breach of one or any combination of resources from exposing many other resources where you have an account.
Topics: General Security, Linux