A PROPOSED DISTRIBUTED ARCHITECTURE FOR SEARCHING SEMANTICALLY ON A LARGE DATASET OF HACKING NEWS

  • Ngoc Long Do, Institute 486, Command 86
  • The Hung Nguyen, Institute 486, Command 86
  • Trung Dung Nguyen, Institute 486, Command 86
  • Xuan Duc Le, Institute 486, Command 86
  • Chi Thanh Nguyen, Institute of Information Technology and Electronics, Academy of Military Science and Technology
  • Quoc Khanh Nguyen, Institute of Information and Communication Technology, Le Quy Don Technical University
  • Thi Bich Van Pham, Institute of Information and Communication Technology, Le Quy Don Technical University
Keywords: Distributed system, knowledge graph, semantic search, natural language processing, hackernews, large language models

Abstract

In this paper, we propose a distributed architecture to support semantic search on large-scale datasets of online technology news. The solution combines knowledge graph modeling, natural language processing (NLP) techniques, and distributed processing on Apache Spark. The paper presents: (1) a distributed architecture that stores a large-scale semantic news dataset using the Resource Description Framework (RDF) model; (2) a pipeline for extracting knowledge from text using NLP tools such as dependency parsing and named entity recognition; (3) a distributed search engine that uses keyword expansion and graph reasoning to return semantically related results. The experimental results show that the proposed model improves semantic search capabilities on large-scale data compared to traditional keyword-based search methods.
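
To make the idea of keyword expansion over a graph concrete, the following is a minimal single-machine sketch and not the implementation described in the paper. It assumes a toy set of subject-predicate-object triples and two hypothetical predicates, `mentions` and `isA`; the article identifiers, terms, and function names are illustrative only, and no Spark or RDF library is used.

```python
# Hypothetical (subject, predicate, object) triples, standing in for the
# output of a knowledge-extraction step. All values are invented examples.
TRIPLES = [
    ("article:1", "mentions", "ransomware"),
    ("article:2", "mentions", "LockBit"),
    ("article:3", "mentions", "phishing"),
    ("LockBit", "isA", "ransomware"),
    ("ransomware", "isA", "malware"),
]


def build_index(triples):
    """Map each (predicate, object) pair to the set of subjects that have it."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault((predicate, obj), set()).add(subject)
    return index


def expand_keyword(keyword, index):
    """Return the keyword plus every narrower term reachable through isA edges."""
    expanded = {keyword}
    frontier = [keyword]
    while frontier:
        term = frontier.pop()
        for narrower in index.get(("isA", term), ()):
            if narrower not in expanded:
                expanded.add(narrower)
                frontier.append(narrower)
    return expanded


def keyword_search(keyword, triples):
    """Baseline: only articles that mention the exact keyword."""
    index = build_index(triples)
    return sorted(index.get(("mentions", keyword), ()))


def semantic_search(keyword, triples):
    """Articles that mention the keyword or any term expanded from it."""
    index = build_index(triples)
    hits = set()
    for term in expand_keyword(keyword, index):
        hits |= index.get(("mentions", term), set())
    return sorted(hits)


if __name__ == "__main__":
    print(keyword_search("malware", TRIPLES))
    print(semantic_search("malware", TRIPLES))
```

In this toy data no article mentions "malware" directly, so the exact-match baseline returns nothing, while the expanded search follows the `isA` edges to "ransomware" and "LockBit" and returns the two articles that mention them.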

Published
2025-08-28
Section
Articles