CVE VULNERABILITY CLASSIFICATION IN SOURCE CODE BASED ON TOKEN ANALYSIS AND LSTM NETWORKS

  • Van Cong Nguyen, Institute of Information and Communication Technology, Le Quy Don Technical University
  • Huy Toan Le, Department of Digital Transformation and Environment Resources Data Information, Ministry of Natural Resources and Environment
  • Minh Thanh Ta, Institute of Information and Communication Technology, Le Quy Don Technical University
Keywords: Tokens, source code, deep learning, vulnerability detection, natural language processing, PHP source code

Abstract

As web applications become increasingly widespread, source code security is growing rapidly in importance. Exposed vulnerabilities present serious risks to both service providers and customers. Various models have been proposed to address this issue; however, most approaches rely on complex graph structures generated from source code or on expert-driven regular expression patterns. This paper introduces a model that combines token-based mechanisms with deep learning techniques for efficient vulnerability detection in PHP (Hypertext Preprocessor) web applications. Building on the PHP tokenization process, we develop a custom token scheme that merges tokens, supports key PHP features, and optimizes parsing. Using datasets such as the Software Assurance Reference Dataset (SARD) and SQL Injection Labs (SQLI-LABS), we train a deep learning model on these enhanced tokens to detect vulnerabilities in source code.
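
To make the token-based idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy regex lexer standing in for PHP's real tokenizer, and the token names (`T_USER_INPUT`, `T_STRING_LITERAL`, and so on), the merging rule for superglobal access, and the padding length are all hypothetical choices made for illustration. It shows how a PHP snippet might be reduced to abstract, merged tokens and then encoded as a fixed-length integer sequence of the kind a sequence model such as an LSTM takes as input.

```python
import re

# Hypothetical token patterns; PHP's real tokenizer is far richer than this.
TOKEN_SPEC = [
    ("T_OPEN_TAG", r"<\?php"),
    # Merge a superglobal read such as $_GET['id'] into a single token.
    ("T_USER_INPUT", r"\$_(?:GET|POST|REQUEST|COOKIE)\s*\[[^\]]*\]"),
    ("T_VARIABLE", r"\$[A-Za-z_]\w*"),
    ("T_STRING_LITERAL", r"'(?:\\.|[^'\\])*'" + "|" + r'"(?:\\.|[^"\\])*"'),
    ("T_NUMBER", r"\d+(?:\.\d+)?"),
    ("T_NAME", r"[A-Za-z_]\w*"),
    ("T_SYMBOL", r"[^\sA-Za-z_0-9]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return abstract tokens: names and symbols stay verbatim, the rest
    (variables, literals, user input) are replaced by their token kind."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        tokens.append(text if kind in ("T_NAME", "T_SYMBOL") else kind)
    return tokens


def encode(tokens, vocab, max_len):
    """Map tokens to integer ids (growing vocab as needed), then truncate
    or zero-pad to max_len so every sample has the same length."""
    ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


php_snippet = (
    "<?php $id = $_GET['id']; "
    'mysql_query("SELECT * FROM users WHERE id=" . $id);'
)
vocab = {"<pad>": 0}  # id 0 is reserved for padding
tokens = tokenize(php_snippet)
sequence = encode(tokens, vocab, max_len=16)
print(tokens)
print(sequence)
```

Abstracting variable names and literals keeps the vocabulary small, while keeping function names such as `mysql_query` verbatim preserves the signal a classifier would need; both are design assumptions of this sketch rather than claims about the paper's token scheme.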

Published
2025-01-20
Section
Articles