FGSM Attack on CNN-based image classifiers: Vulnerability analysis and an effective defense strategy
Abstract
Convolutional Neural Networks (CNNs) have demonstrated significant advantages and have therefore been widely applied across various domains. However, adversarial attacks have exposed critical vulnerabilities in these models, posing threats to the security and reliability of deep learning systems. Although numerous studies have investigated adversarial attacks on deep learning models, the specific impact of such attacks on CNN-based image classifiers remains an open issue, especially since many widely used CNN models form the foundation of essential real-world applications. This study analyzes the vulnerabilities of CNN image classifiers under the Fast Gradient Sign Method (FGSM) adversarial attack and proposes an effective defense strategy named WR_FGSM. Experimental results on standard benchmark datasets show that several CNN models suffer significant accuracy degradation under FGSM attacks. The adversarial images generated by this attack not only deceive CNN-based image classifiers but also appear visually indistinguishable from the original images to the human eye. Our proposed WR_FGSM defense incorporates adversarial training—one of the most effective existing defense strategies—together with a regularization technique applied during training. This approach effectively safeguards CNN models against FGSM attacks while maintaining a balance between adversarial robustness and the generalization capability of the models.
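As background for the attack studied in the paper, the standard FGSM perturbation is x_adv = x + ε · sign(∇_x L(x, y)). The sketch below is a minimal, generic PyTorch implementation of this formula (it is not the paper's own code); `epsilon` and the pixel range [0, 1] are illustrative assumptions.

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon=0.03):
    """Generic FGSM sketch: perturb x by epsilon in the direction of
    the sign of the loss gradient, then clip to the valid pixel range.
    This is an illustrative implementation, not the paper's code."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()
    # One gradient-sign step, then clip to the assumed [0, 1] pixel range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

In adversarial training (the basis of the proposed defense), such perturbed batches are fed back into the training loss alongside clean examples.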