One of the essential ideas driving Reality Defender is that ceaseless innovation is necessary to stay ahead of the ever-changing threat of deepfakes. This is why our world-class team of researchers prioritizes developing cutting-edge, multi-pronged detection methods capable of responding to the novel generative AI models of tomorrow.
Because we believe that collaboration across the industry is key to fighting the malicious misuse of AI, we actively look for opportunities to connect and share our findings, as we did most recently at RSAC in San Francisco. Later this month, our team will attend the 2024 Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle, where we will unveil a critical aspect of our cutting-edge research and methodology in a paper titled “AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection.”
Presenting at CVPR on June 21st
The paper, to be presented on Friday, June 21st during Session 6 in the conference’s Exhibit Hall, examines Reality Defender’s cross-modal learning method, which captures the correspondence between audio and visual modalities for improved detection. The model’s first stage pursues representation learning via self-supervision on real videos to capture the intrinsic audio-visual correspondences. The learned representations are then tuned in the second stage, where deepfake classification is performed via supervised learning on both real and fake videos. The model achieves 98.6% accuracy on the FakeAVCeleb dataset, outperforming the current audio-visual state of the art by 14.9%.¹
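For readers who want a concrete picture of the two-stage setup described above, here is a minimal sketch in PyTorch. It is an illustrative simplification, not the paper’s implementation: the toy encoders, feature dimensions, and the InfoNCE-style contrastive loss are hypothetical stand-ins chosen only to show the structure of self-supervised alignment on real videos followed by supervised real-vs-fake classification.

```python
# Hypothetical two-stage audio-visual deepfake detection sketch.
# This is NOT the AVFF architecture from the paper; it only
# illustrates the two-stage training pattern described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Toy audio branch: pooled audio features -> normalized embedding."""
    def __init__(self, in_dim=128, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))
    def forward(self, x):                      # x: (batch, in_dim)
        return F.normalize(self.net(x), dim=-1)

class VisualEncoder(nn.Module):
    """Toy visual branch: pooled frame features -> normalized embedding."""
    def __init__(self, in_dim=512, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_alignment_loss(a_emb, v_emb, temperature=0.07):
    """Stage 1 objective (InfoNCE-style): audio/visual pairs from the
    same real clip are pulled together; mismatched pairs are pushed apart."""
    logits = a_emb @ v_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(a_emb.size(0))      # diagonal entries are positives
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

class DeepfakeClassifier(nn.Module):
    """Stage 2 head: fuse the learned embeddings and classify real vs. fake."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * emb_dim, 128), nn.ReLU(),
                                  nn.Linear(128, 2))
    def forward(self, a_emb, v_emb):
        return self.head(torch.cat([a_emb, v_emb], dim=-1))

# --- Stage 1: self-supervised representation learning on real videos only ---
audio_enc, visual_enc = AudioEncoder(), VisualEncoder()
opt1 = torch.optim.Adam(list(audio_enc.parameters()) +
                        list(visual_enc.parameters()), lr=1e-4)
real_audio, real_frames = torch.randn(16, 128), torch.randn(16, 512)  # dummy batch
opt1.zero_grad()
loss1 = contrastive_alignment_loss(audio_enc(real_audio), visual_enc(real_frames))
loss1.backward()
opt1.step()

# --- Stage 2: supervised classification on both real and fake videos ---
clf = DeepfakeClassifier()
opt2 = torch.optim.Adam(clf.parameters(), lr=1e-4)
audio, frames = torch.randn(16, 128), torch.randn(16, 512)            # dummy batch
labels = torch.randint(0, 2, (16,))            # 0 = real, 1 = fake
with torch.no_grad():                          # encoders frozen here; could be fine-tuned
    a_emb, v_emb = audio_enc(audio), visual_enc(frames)
opt2.zero_grad()
loss2 = F.cross_entropy(clf(a_emb, v_emb), labels)
loss2.backward()
opt2.step()
```

The key design idea the sketch preserves is that the first stage never sees fake data: it learns what genuine audio-visual correspondence looks like, so the second-stage classifier can exploit violations of that correspondence as a deepfake signal.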
CVPR accepted fewer than a quarter of submitted papers this year, and we are grateful that the expert panel chose ours to be part of a program filled to the brim with unmissable presentations. This year’s agenda is a clear indicator of the robust growth within our industry and of the future opportunities (and threats) researchers and developers face as AI grows more powerful by the day.
On behalf of the Reality Defender team, I look forward to forging new partnerships and exchanging ideas with our distinguished colleagues, while representing our research and the award-winning multimodal approach we hope will set the tone for the future of deepfake detection.
1 "AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection," Trevine Oorloff, Surya Koppisetti, Nicolo Bonettini, Divyaraj Solanki, Ben Colman, Yaser Yacoob, Ali Shahriyari, Gaurav Bharaj. "We report 98.6% accuracy and 99.1% AUC on the FakeAVCeleb dataset, outperforming the current audio-visual state-of-the-art by 14.9% and 9.9%, respectively."