So common, in fact, that OpenAI published a blog post suggesting that the amount of computational resources dedicated to training machine learning models has been growing exponentially. Meanwhile, people continue to eulogize Moore’s Law as CPU clock speeds stagnate. While compute is a bottleneck today, advances in alternative hardware such as graphics processing units (GPUs) and tensor processing units (TPUs) may very well fill the demand for increased computational power.
The worrying news: Advances in training data could dramatically reduce the resource needs of deep-faking
While training time remains a weakness of these types of systems, one of the major contributions of the Deep Video Portraits paper is in reducing the amount of training data required: “We construct the training corpus […] based on the tracked video frames of the target video sequence. Typically, two thousand video frames, i.e., about one minute of video footage, are sufficient to train our network.”
With a requirement of only one minute’s worth of training data, a lot of publicly available video footage (newscasts, vlogs, Snapchats, and more) becomes weaponizable.
As we discussed in our recent POV, advances in the use of synthetic data have significantly reduced the amount of real training data that needs to be collected. We expect advances in synthetic and adversarial data to continue to reduce the prohibitively large data requirements associated with modern machine learning tactics.
Digital forensics techniques struggle to keep up
With an understanding that these technologies will advance, many researchers are interested in how to detect the counterfeits produced by the latest tactics. Unfortunately, fraud detection is a war of attrition.
For example, in June 2018 a paper titled “In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking” was published, revealing that the (then) latest in counterfeit video could be detected by carefully watching the eyes. At the time, state-of-the-art systems used still photos as training data, so they rarely generated videos in which the subject blinked realistically. Then, in August 2018, the Deep Video Portraits paper was published; that system has a module dedicated to capturing and falsifying realistic blinking.
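To make the blinking cue concrete, here is a minimal sketch of one way a detector might flag suspiciously low blink rates. It uses the common eye-aspect-ratio heuristic over facial landmarks rather than the recurrent model from the In Ictu Oculi paper; the six-point landmark layout, the 0.2 threshold, and the synthetic input are illustrative assumptions, not the paper’s method.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye openness: ratio of average vertical gap to horizontal width.

    Assumes the common six-point eye layout where indices 0 and 3 are
    the horizontal corners and (1, 5) and (2, 4) are vertical pairs.
    """
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = 2.0 * np.linalg.norm(eye[0] - eye[3])
    return vertical / horizontal

def count_blinks(ear_per_frame, threshold=0.2, min_frames=2):
    """Count blinks as runs of consecutive frames with EAR below threshold."""
    blinks, run = 0, 0
    for ear in ear_per_frame:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks

# An open eye has a noticeably higher ratio than a closed one.
open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], float)
print(round(eye_aspect_ratio(open_eye), 2))  # ~0.67

# In a real pipeline, per-frame EAR values would come from a face
# tracker. Humans blink roughly 15-20 times per minute, so a minute of
# video with zero or one detected blink is a red flag. Synthetic demo:
ears = [0.3] * 100 + [0.1] * 3 + [0.3] * 100  # one dip below threshold
print(count_blinks(ears))  # -> 1
```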
A DARPA research program called Media Forensics (MediFor) has been devoting a lot of attention to the detection of fraudulent media. Hany Farid, a member of Media Forensics and professor of computer science at Dartmouth College, published an article titled “Digital Forensics in a Post-Truth Age” in May 2018. In it he also noted the unique challenges posed by the latest tactics, especially those that use generative adversarial networks, such as the system described in the Deep Video Portraits paper:
All indications are that fake news is a serious threat to our society and democracy. We in the digital forensic community must continue to develop and refine techniques that will allow individuals, media outlets, and governments to quickly and accurately authenticate digital videos, images, and audios. This task has recently been made even more difficult by rapid advances in machine learning that have made it easier than ever to create sophisticated and compelling fakes. These technologies have removed many of the time and skill barriers previously required to create high-quality fakes. Not only can these automatic tools be used to create compelling fakes, they can be turned against our forensic techniques in the form of generative adversarial networks (GANs) that modify fake content to bypass forensic detection.
It’s good to know that deeply skilled people such as Farid are working on this problem. It’s also a little disheartening to hear experts lament the difficulty of detecting the latest frauds. Farid lays out several barriers the academic community faces in the battle against digital frauds. Funding is, of course, a pressing concern for all types of research, but one of the most interesting struggles Farid mentions is the tenuous balance between academic openness and the escalatory nature of fraud creation and detection:
In the field of forensics, there has always been some tension between the goal of scientific openness and ensuring that our techniques are not easily circumvented. […]. Without necessarily advocating this as a solution for everyone, […], I have held back publication of new techniques for a year or so. This approach allows me to always have a few analyses that our adversaries are not aware of.
Clearly, it’s difficult to know the true state of the art on either side of this battle. The fraudsters don’t want to show their latest wares until the proverbial moment of (fake) truth, and the counter-fraudsters don’t want to advertise their detection capabilities to the fraudsters for similar reasons.
The field of digital forensics will likely see increased investment over the next few years. But given the culture of secrecy and the arms-race ethos of the field, it would be unwise to rely entirely on digital forensics in the fight against frauds.
Fingerprinting and chain of custody are critical for media organizations
Information forgery is not a uniquely modern problem. In medieval Europe, sealing wax and precious house seals were used to verify the authenticity of letters and other missives. The Heirloom Seal of the Realm served a similar purpose in ancient China. Stealing such seals, or counterfeiting them, was a path to sending fake messages that appeared to come from the king, emperor, or lord to whom the true seal belonged. Such seals were incredibly valuable and kept under lock and key.
The sealing wax concept moves the goalposts: instead of trying to show that a letter is fraudulent by examining the handwriting, it establishes a chain of custody. Digital tactics similar in concept to sealing wax have long been used in computer networking; digital signatures, public key encryption, SSL certificates, and protocols like DNSSEC all come at the problem of authenticity from this angle. Instead of verifying that the data in question is “real” video, we can verify that the data originated from a reliable source.
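To make the chain-of-custody idea concrete, here is a minimal sketch of signing and verifying a piece of media with public key cryptography, using the Ed25519 API from the third-party Python cryptography package. The publisher framing and the placeholder bytes are assumptions for illustration; a real deployment would also need key distribution, rotation, and revocation.

```python
# Sketch: a publisher signs media at creation; anyone holding the
# publisher's public key can later check the chain of custody.
# Uses the third-party `cryptography` package (pip install cryptography).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The publisher generates a long-lived key pair and distributes the
# public half (e.g. on its website, in DNS, or via a service like Keybase).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

footage = b"raw bytes of the video file"  # stand-in for real media
signature = private_key.sign(footage)     # published alongside the file

# Verification: anyone with the public key can confirm the bytes are
# exactly what the publisher signed.
try:
    public_key.verify(signature, footage)
    print("authentic: bytes match the publisher's signature")
except InvalidSignature:
    print("tampered, or not from this publisher")

# A single altered byte breaks the chain of custody.
try:
    public_key.verify(signature, footage + b"!")
    print("authentic")
except InvalidSignature:
    print("tampered, or not from this publisher")
```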
Mountains of work have already been done by software security experts to create systems of trust. Transport Layer Security (TLS), the protocol that powers secure HTTPS connections, is a prominent example. In TLS, trusted parties called Certificate Authorities evaluate and issue “certificates,” which are used to verify that the website you’re viewing was served to you by the owner of the URL you typed in. These certificates rely heavily on digital fingerprinting and public key encryption to verify the authenticity of the data. Another protocol, DNSSEC, uses public key encryption to establish the authenticity of DNS records.
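To see the fingerprinting in action, the short sketch below fetches a site’s TLS certificate and computes its SHA-256 fingerprint, the same digest shown in a browser’s certificate viewer. It uses only the Python standard library; the host name is a placeholder, and the script needs network access.

```python
# Sketch: compute the SHA-256 fingerprint of a site's TLS certificate.
# Standard library only; requires network access. Host is a placeholder.
import hashlib
import ssl

host = "example.com"
pem = ssl.get_server_certificate((host, 443))  # PEM-encoded certificate
der = ssl.PEM_cert_to_DER_cert(pem)            # canonical binary form
fingerprint = hashlib.sha256(der).hexdigest()
print(f"{host} certificate SHA-256: {fingerprint}")
```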
Companies like Keybase are trying to increase adoption of public key encryption by making it easier to use. Keybase helps users integrate encryption across multiple devices, and the service uses email addresses and social media accounts to help verify the identity associated with a public encryption key.
There is growing interest in decentralized computing, and identity and source authentication have been a huge aspect of this growth. From the InterPlanetary File System (IPFS) to cryptocurrencies, the nature of decentralized systems requires them to use digital fingerprinting extensively. As we wrote in January, blockchain technology has huge potential for creating consensus-driven data integrity through the use of hashing.
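As a rough illustration of that hashing idea, the sketch below builds a toy hash chain over article versions: each entry commits to the previous entry’s hash, so any edit to history breaks every later link. The record fields are hypothetical and do not follow the format of any particular blockchain.

```python
# Sketch: a toy hash chain over published articles. Each entry's hash
# covers its content plus the previous hash, so edits to history are
# detectable by re-walking the chain. Field names are illustrative.
import hashlib
import json

def entry_hash(fields):
    # Canonical JSON so the digest is stable across runs.
    payload = json.dumps(fields, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append(chain, content):
    prev = chain[-1]["hash"] if chain else "0" * 64
    entry = {"content": content, "prev": prev}
    entry["hash"] = entry_hash({"content": content, "prev": prev})
    chain.append(entry)

def verify(chain):
    prev = "0" * 64
    for e in chain:
        expected = entry_hash({"content": e["content"], "prev": e["prev"]})
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

chain = []
append(chain, "Article v1: original reporting")
append(chain, "Article v2: correction issued")
print(verify(chain))   # True
chain[0]["content"] = "Article v1: doctored"
print(verify(chain))   # False -- tampering breaks every later link
```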
There are already examples of blockchain technology being used to tackle the same issues that DNSSEC attempts to solve. Handshake, Blockchain DNS, and Namecoin are all examples of this tactic. Perhaps media companies like The New York Times will announce their own blockchains or start signing all of their articles using public key encryption.
Bottom Line: governments, media organizations, and any entity in the public sphere need to start signing and watermarking the digital information they create
The ability to create counterfeit and fraudulent information is powerful, so we can be sure that powerful organizations will continue to explore this technology. Don’t get caught off guard: start building a chain of custody strategy today.