Constructing Molecular Structures with an Algorithm Combining K-Nearest Neighbor and the K-Dimension Search Tree
Abstract
The construction of molecular structures plays a significant role in fields such as materials science, sensors, nanotechnology, and drug design. However, constructing molecular structures from a raw dataset, which is noisy and incomplete, is a challenging but crucial task. K-Nearest Neighbor (KNN) classification is a lazy learning algorithm that searches the entire training set for the nearest neighbors of a target. As a result, each KNN query is quite time-consuming. In comparison, the K-Dimension tree (K-D tree) is a multi-dimensional binary tree, a storage structure that represents training data in a way that supports time-efficient search. In this article, we propose a method, the KNN-KD tree algorithm, that processes a raw labeled dataset of molecular properties by combining the advantages of KNN and the K-D tree.
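To illustrate the general idea of pairing KNN with a K-D tree, the following is a minimal sketch and not the implementation evaluated in this article. It assumes Euclidean distance, majority voting, and a median split on alternating axes. The function names (`build_kdtree`, `knn_search`, `knn_predict`) and the toy two-dimensional data with labels "A" and "B" are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, labels, depth=0):
    """Recursively build a K-D tree by splitting on the median of one axis."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the dimensions
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    mid = len(order) // 2
    left, right = order[:mid], order[mid + 1:]
    return {
        "point": points[order[mid]],
        "label": labels[order[mid]],
        "axis": axis,
        "left": build_kdtree([points[i] for i in left],
                             [labels[i] for i in left], depth + 1),
        "right": build_kdtree([points[i] for i in right],
                              [labels[i] for i in right], depth + 1),
    }


def knn_search(node, target, k, heap=None):
    """Collect the k nearest neighbors as a max-heap of (-sq_dist, label)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    d = sq_dist(node["point"], target)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node["label"]))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, node["label"]))
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn_search(near, target, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th nearest neighbor (or fewer than k were found so far).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_search(far, target, k, heap)
    return heap


def knn_predict(tree, target, k=3):
    """Classify the target by majority vote among its k nearest neighbors."""
    votes = Counter(label for _, label in knn_search(tree, target, k))
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters with made-up labels.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]
tree = build_kdtree(points, labels)
print(knn_predict(tree, (1.1, 1.0), k=3))  # prints "A"
```

The tree is built once. Each query then skips any subtree whose splitting plane lies farther away than the current k-th nearest neighbor, which is where the time saving over a full scan of the training set comes from.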