A PROPOSED DISTRIBUTED ARCHITECTURE FOR SEARCHING SEMANTICALLY ON A LARGE DATASET OF HACKING NEWS
Abstract
In this paper, we propose a distributed architecture to support semantic search on large-scale datasets of online technology news. The solution combines knowledge graph modeling, natural language processing (NLP) techniques, and distributed processing on Apache Spark. The paper presents: (1) a distributed architecture that stores a large-scale semantic news dataset using the Resource Description Framework (RDF) model; (2) a pipeline for extracting knowledge from text using NLP tools such as dependency parsing and named entity recognition; and (3) a distributed search engine that uses keyword expansion and graph reasoning to return semantically related results. Experimental results indicate that the proposed model improves semantic search capability on large-scale data compared with traditional keyword-based search methods.
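To make the idea of keyword expansion over a knowledge graph concrete, the following is a minimal single-machine sketch, not the system described in this paper. It assumes knowledge is held as (subject, predicate, object) triples and that expansion simply follows graph edges for a fixed number of hops. The triples, documents, and the function names `build_neighbors`, `expand_keyword`, and `search` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not from the paper's dataset.
TRIPLES = [
    ("Spark", "isA", "framework"),
    ("Spark", "developedBy", "Apache"),
    ("Flink", "developedBy", "Apache"),
]

# Hypothetical toy documents keyed by id.
DOCS = {
    1: "Apache releases a new version",
    2: "Flink adds streaming SQL features",
    3: "A guide to gardening",
}


def build_neighbors(triples):
    """Index each entity to the entities it shares a triple with (undirected)."""
    neighbors = defaultdict(set)
    for subj, _pred, obj in triples:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    return neighbors


def expand_keyword(keyword, neighbors, hops=1):
    """Return the keyword plus all entities reachable within `hops` edges."""
    expanded = {keyword}
    frontier = {keyword}
    for _ in range(hops):
        # Next frontier: unseen neighbors of the current frontier.
        frontier = {n for term in frontier for n in neighbors.get(term, ())} - expanded
        expanded |= frontier
    return expanded


def search(keyword, docs, neighbors, hops=1):
    """Return sorted ids of documents containing any expanded term (case-insensitive)."""
    terms = {t.lower() for t in expand_keyword(keyword, neighbors, hops)}
    return sorted(
        doc_id for doc_id, text in docs.items()
        if terms & set(text.lower().split())
    )
```

In this toy setting, calling `search` with `hops=0` behaves like plain keyword matching, while larger values of `hops` also retrieve documents that mention only related entities. A distributed version would partition the triples and documents across workers; that part is outside the scope of this sketch.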