CLASSIFICATION OF BACTERIA USING FTIR WITH LDA, SVC, AND LR: EFFECTS OF PCA AND ANOVA PREPROCESSING
Abstract
This study applied three supervised models, linear discriminant analysis (LDA), support vector classification (SVC), and logistic regression (LR), to classify 15 bacterial species from their Fourier-transform infrared (FTIR) spectra. Each model was applied both directly to the preprocessed spectra and to spectra filtered by one of two methods: analysis of variance (ANOVA), used to select key spectral features, and principal component analysis (PCA), used to retain a reduced set of principal components. Selecting features with an ANOVA F-value threshold of 3 gave the highest accuracy, achieved by LDA (94%), followed by SVC (80%) and LR (74%). Retaining 300 principal components with PCA gave accuracies of 93.1%, 76.7%, and 72.3% for LDA, SVC, and LR, respectively. Both dimensionality-reduction approaches proved informative and yielded higher accuracies than the unfiltered data for the bacterial species studied.
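The two filtering steps described above can be illustrated with a minimal NumPy sketch. It assumes the spectra are stored as rows of a matrix `X` (one column per wavenumber) with species labels `y`; the function names `anova_f_values`, `select_by_f_threshold`, and `pca_project` are hypothetical and this is not the authors' implementation. The ANOVA step computes the one-way F-statistic for each wavenumber and keeps those above a threshold (3 in this study), while the PCA step projects the mean-centred spectra onto their leading principal axes.

```python
import numpy as np


def anova_f_values(X, y):
    """One-way ANOVA F-statistic for each column (wavenumber) of X given class labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        # Between-class spread, weighted by class size.
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        # Within-class spread around each class mean.
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    df_between = n_classes - 1
    df_within = n_samples - n_classes
    # Ratio of mean squares; constant columns give 0/0 = NaN and are never selected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (ss_between / df_between) / (ss_within / df_within)


def select_by_f_threshold(X, y, threshold=3.0):
    """Keep only the columns whose ANOVA F-value exceeds `threshold`."""
    mask = anova_f_values(X, y) > threshold
    return np.asarray(X)[:, mask], mask


def pca_project(X, n_components):
    """Project mean-centred X onto its first `n_components` principal axes via SVD.

    `n_components` must not exceed min(n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal axes ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

The reduced matrices returned by `select_by_f_threshold` or `pca_project` would then be passed to the classifiers, for example scikit-learn's `LinearDiscriminantAnalysis`, `SVC`, and `LogisticRegression`; scikit-learn's `f_classif` computes the same ANOVA F-statistic. In a train/test evaluation, the F-values and principal axes should be fitted on the training spectra only, so that no information from the test set leaks into the selected features.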