The Problem
Improvements in AI capabilities have made it harder to prove your identity online, or even to prove that you’re a real person and not just a computer program (a ‘bot’). For example, websites have long relied on CAPTCHA tests to keep bots out, but bots are increasingly able to pass these tests. The difficulty of CAPTCHA tests has been in an arms race with the ability of bots to solve them, and we need only look at the massive number of bots on websites such as Twitter to see that the bots are winning. Other methods for proving your humanity, such as sharing a picture or video of yourself, are also becoming less reliable as AI creates increasingly convincing deepfaked media.
Now, websites want to keep bots out for a variety of reasons:
Bots can waste the website’s resources without actually spending any money, or clicking any ads.
Bots can take advantage of promotions that the website wants to limit to one per customer, for example.
Bots can hurt the quality of the data that the website collects about its users, or influence the results of votes on the website.
On social networks, bots often pretend to be real people to inflate follower counts, run scams, spread messaging, etc. These fake accounts will continue to appear more human-like over time, and so will become more harmful.
We need to develop and adopt new systems for proving online identity and humanness, because our current systems are already insufficient and the problems will only become worse.
Solutions
Proof of Humanity (PoH), also known as Proof of Personhood (PoP), is any evidence that an online user is a real human. Having passed a CAPTCHA test is a PoH, but this evidence becomes less convincing over time as more bots are able to pass these tests. I’ll now discuss other possibilities for PoH, grouped into four general categories. I will also use the term ‘credential’ to refer to any piece of online proof or evidence.
Government Issued
Government-issued credentials, such as your passport, driver’s license, or social security number, are a very strong source of PoH. In fact, they provide evidence not just of your humanity but also of your specific identity. High-security websites, such as banking sites, are legally required to confirm these sorts of credentials before you can make an account. However, these credentials are very sensitive, and most people do not want to share them unless absolutely necessary. Fortunately, new advances in zero-knowledge proof technology, particularly zkTLS services like Reclaim Protocol, now allow you to create proofs that you own a particular credential (like a log-in to a government website) without sharing any more information about who you are.
Economic
If a user has spent money or anything else of value on a website, then this can provide PoH. This is because the business model of most entities that are running bots requires that each bot is very cheap to operate. For example, if a service is selling (fake) followers on Twitter for $0.01 each, and each of these followers follows about 100 accounts, then the service only earns about $1 per bot account it creates. Imagine if you had to provide evidence that you spent $10 on some e-commerce site in order to make a Twitter account. Then this business model would completely collapse. Again, zkTLS services can allow you to prove information about your account on one website (like amount of money spent on Amazon) to any other website, without giving away any other information.
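The arithmetic in this example can be written out as a small sketch. The function names are hypothetical, and the model is deliberately crude: it assumes a bot operator's only revenue is the follows it sells and its only cost is whatever it takes to sign up.

```python
def revenue_per_bot(price_per_follow: float, follows_per_bot: int) -> float:
    """Revenue a follower-selling service earns from one bot account."""
    return price_per_follow * follows_per_bot


def bot_is_profitable(price_per_follow: float, follows_per_bot: int,
                      signup_cost: float) -> bool:
    """True if one bot earns more than it costs to create (simplified model)."""
    return revenue_per_bot(price_per_follow, follows_per_bot) > signup_cost


# Figures from the example above: $0.01 per follow, about 100 follows per bot.
revenue = revenue_per_bot(0.01, 100)

# With free sign-up the bot pays for itself; with a $10 economic PoH it does not.
profitable_without_poh = bot_is_profitable(0.01, 100, signup_cost=0.0)
profitable_with_poh = bot_is_profitable(0.01, 100, signup_cost=10.0)
```

Under these assumptions, a $10 requirement costs ten times what a bot account can earn back, which is why the business model stops working.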
The advantage of economic PoH is that it is very easy to quantify (more on that in the next section), and many people spend substantial sums of money online. The disadvantage is that it is less accessible to low-income people, so any service requiring economic PoH would be introducing a barrier for them. That is why economic PoH should ideally be offered as one option alongside other PoH options.
Biometric
Biometric authentication uses information about a person’s physical body, such as their fingerprints, voice, eyes, or face. Biometrics are extremely useful for physical security checks like at border controls, because the entity that is doing the security checks is operating the machines that actually do the scans and can verify that they are working properly.
However, using biometrics online is more of a challenge. Imagine a web service that requires users to confirm their identity when logging in by reading out a given sentence, with voice identification software then confirming whether they are the right person. Well, with services like ElevenLabs, someone’s voice can easily be cloned from only a short sample of them speaking, and this clone can be made to say anything in their voice. So this won’t work.
Now imagine that a website requires you to show your face to your phone or computer camera, to be identified by face recognition software. At the moment this would be more complicated to pull off than the voice example, but you could hack your camera to output an image of a face that you have supplied, and the website would have no way of knowing that this is not a real image taken by the camera. The problem in this example is that secure biometric authentication needs trusted hardware, meaning that the hardware doing the scan can be trusted not to have been tampered with. The same problem exists for eye or fingerprint scans as well. There has been work on tamper-proof hardware such as the Orb, which scans people’s irises and saves the information in a privacy-preserving way, and which has built-in mechanisms for detecting whether it has been tampered with.
However, much like CAPTCHA tests, mechanisms for tamper detection will always be in an arms race with people trying to find new ways to bypass them. But this technology is quite new, so we will have to wait and see how useful it becomes.
Social
Lastly, the connections that people have in social networks can provide PoH. If you follow someone on Twitter and they follow you back, and they have provided some strong PoH, then you are probably a real human too. This idea exploits the fact that real humans tend to follow other real humans.
You can visualize a social network with what’s called a social graph (pictured below), where each node (dot) represents a user and each edge (line) represents some relationship between a pair of users. For example, you could create a social graph of the users on Twitter or Instagram, where an edge is drawn between a pair of users if they follow each other. You will tend to get clusters of humans who follow each other, and clusters of bots who follow each other, with fewer edges between the humans and bots because humans don’t tend to follow bots very often.
Now if there were no additional source of PoH information, it could be impossible to differentiate between the cluster of humans and the cluster of bots in a social graph. But if some PoH information can be used in addition to the social graph, a statistical model can be used to estimate the probability that each node is a real human. You don’t necessarily need to have any additional PoH information yourself: if other users who are likely to be real humans follow you, then you are likely to be human too.
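One very simple way to combine seed PoH information with a social graph is to let trust flow along edges and fade with distance. The sketch below is only an illustration of that idea, not a specific statistical model from the literature: the function name, the damping factor of 0.85, and the toy graph are all assumptions, and the resulting numbers are relative trust scores rather than calibrated probabilities.

```python
def propagate_trust(edges, seed_scores, damping=0.85, iterations=100):
    """Spread trust from users with strong PoH to the rest of the graph.

    edges:       iterable of (user_a, user_b) mutual-follow pairs
    seed_scores: dict mapping users with independent PoH to a score in [0, 1]
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    for node in seed_scores:
        neighbours.setdefault(node, set())

    # Users without their own PoH start at zero and only gain trust via edges.
    scores = {node: seed_scores.get(node, 0.0) for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbours.items():
            if node in seed_scores:
                updated[node] = seed_scores[node]  # seeds stay pinned
            else:
                mean = sum(scores[n] for n in adjacent) / len(adjacent)
                updated[node] = damping * mean  # trust fades with each hop
        scores = updated
    return scores


# Toy graph: a human cluster, a bot cluster, and a single edge between them.
edges = [
    ("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
    ("bot1", "bot2"), ("bot2", "bot3"), ("bot1", "bot3"),
    ("carol", "bot3"),
]
scores = propagate_trust(edges, seed_scores={"alice": 0.99})
```

In this toy graph only alice has her own PoH, yet bob and carol end up with noticeably higher scores than any of the bots, because the bot cluster is connected to the humans through a single edge.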
Quantifying PoH by Cost of Forgery
I have used terms like ‘strong’ or ‘weak’ evidence of humanity, but how can this strength actually be quantified into something like a PoH score? One way to assign an actual dollar value to various PoH credentials is with an important concept called cost of forgery.
The cost of forgery of a credential is the cost to fraudulently obtain the credential. Almost every credential has a price: even passports are traded in black markets. (The only actual figure I could find about this is from a 2002 New York Times Magazine article, which said that stolen blank Belgian or French passports were selling for $5000-$7500 each, which is $8500-$13000 today. If you have any more data about this, let me know!)
Online credentials can also be fraudulently obtained. For example, suppose you want to prove your humanity by showing how many people follow you on Twitter. The ‘real’ version of this credential would be if each of your followers were a real person who followed you because they want to see what you’re saying. At the time of this writing, you can search ‘Buy Twitter followers’ on Google and find a variety of services that will sell you followers for about $0.01 each. This puts an upper bound on the cost of forgery of Twitter followers: there may be cheaper ways to gain fake followers, but it definitely doesn’t cost more than about $0.01 to gain a fake follower.
We can find similar information for other popular websites around which whole economies have developed. For instance, you can buy Instagram followers for about $0.01 each, depending on the ‘quality’ of the followers purchased.
The cost of forgery for credentials like number of followers has to be periodically monitored, because the cost could change over time. These websites are constantly fighting against the wave of bots, and when a website is better able to keep them out the cost of forgery goes up, but when the bots are able to get through the cost goes down.
One interesting idea proposed by Upala is to estimate the cost of forgery of a credential by offering ‘bribes’ to the users to destroy their credential. This would be done through an ascending auction: starting at a price of zero, the price is slowly raised until someone agrees to destroy their credential for that price. This then determines a lower bound for the cost of forgery of that credential: if someone could have forged the credential for a cheaper price than the result of the auction, then they would have agreed to destroy their credential for less than the result of the auction. For example, suppose someone offered everyone in the US an opportunity to burn their own passport and receive $500. If nobody accepts this offer, then the cost to forge (or buy someone else’s) US passport must be at least $500.
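The ascending auction can be sketched as a simulation. In a real auction nobody knows in advance the lowest bribe each holder would accept; the sketch below assumes those ‘reservation prices’ are given, purely so the mechanism can be run. The function name, the whole-dollar price step, and the example figures are hypothetical and are not taken from Upala.

```python
def ascending_bribe_auction(reservation_prices, step=1, max_price=1_000_000):
    """Raise the bribe from zero until some holder agrees to destroy
    their credential.

    reservation_prices: lowest bribe (in whole dollars) each holder would accept
    Returns the first offered price that someone accepts, or None if
    nobody accepts any offer up to max_price.
    """
    if not reservation_prices:
        raise ValueError("the auction needs at least one credential holder")
    price = 0
    while price <= max_price:
        # A holder accepts as soon as the offer reaches their reservation price.
        if any(reserve <= price for reserve in reservation_prices):
            return price
        price += step
    return None


# Three hypothetical holders; the offer rises in $50 increments.
clearing_price = ascending_bribe_auction([620, 480, 900], step=50)
```

Here the first accepted offer is $500, because the holder who values their credential at $480 turned down $450. With a coarse step the accepted price can overshoot the lowest reservation price by up to one step, so the highest rejected offer is the more conservative lower bound.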
Most of the time we will just have to estimate the cost of forgery for a given credential, and different people will believe different estimates. We may see a future where people have digital wallets that contain proofs of their online credentials, and when someone wants to create a new account on a website they show their proofs to the website. The website then decides if the cost of forgery of those credentials is high enough. Or, for even more privacy, the website could display its estimates for the cost of forgery of various credentials, and a user could just generate a ZK proof that their credentials have a high enough cost of forgery. If this sounds overly complicated or far-fetched, the key takeaway from this section is this: online credentials will be essential for proving humanity online, and cost of forgery is how the strength of these credentials can be quantified.
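As a rough illustration of the wallet idea, here is a minimal sketch of the check a website might run. Everything in it is an assumption made for the example: the credential names, the per-unit estimates (only the $0.01 follower price comes from the figures above), and the choice to add the costs together on the reasoning that a forger would have to fake every credential presented. It performs the check in the clear; a ZK version would prove the same inequality without revealing the wallet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    kind: str               # hypothetical label, e.g. "twitter_follower"
    quantity: float = 1.0   # how many units of this credential the user holds


def estimated_forgery_cost(wallet, unit_costs):
    """Sum the website's per-unit cost-of-forgery estimates over a wallet.

    Credential kinds the website has no estimate for count as zero.
    """
    return sum(unit_costs.get(c.kind, 0.0) * c.quantity for c in wallet)


def accepts(wallet, unit_costs, threshold):
    """True if the wallet's estimated forgery cost meets the threshold."""
    return estimated_forgery_cost(wallet, unit_costs) >= threshold


# The website's own (illustrative) estimates, in dollars per unit.
unit_costs = {"twitter_follower": 0.01, "ecommerce_dollar_spent": 1.0}

wallet = [
    Credential("twitter_follower", 300),
    Credential("ecommerce_dollar_spent", 25),
]
total = estimated_forgery_cost(wallet, unit_costs)
allowed = accepts(wallet, unit_costs, threshold=10.0)
```

Different websites could plug in their own estimates and thresholds, which matches the point that different people will believe different estimates.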