Part-of-speech tagging for Vietnamese based on writing style and probability computation

Nguyễn Quang Châu; Phan Thị Tươi; Cao Hoàng Trụ

Vietnamese part-of-speech tagging based on style of texts and probability model

Abstract

Accurate part-of-speech (POS) tagging of words in Vietnamese texts is an important problem: it supports text parsing, polysemy resolution, semantic information extraction systems, and other applications. This paper presents an approach to POS tagging for Vietnamese texts. The method uses a probability model and draws on a lexicon that lists the possible POS tags for each word, a manually labelled corpus, and the syntax and context of the texts. We also built a corpus of 75,000 entries and a lexicon of 80,000 entries for Vietnamese language processing research and application development.
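The abstract does not spell out the probability model. As a rough illustration only, the sketch below shows one common way to combine a lexicon of permitted tags with probabilities estimated from a labelled corpus: a bigram hidden Markov model, decoded with the Viterbi algorithm and restricted to the tags the lexicon allows for each word. This is an assumption, not the authors' method; it does not model text style, and the tag set (N, V, A, P, R), the toy corpus, the lexicon and the function names `train` and `viterbi` are all hypothetical.

```python
import math
from collections import defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence

# Hypothetical hand-labelled toy corpus (NOT the paper's 75,000-entry corpus).
# Illustrative tags: N noun, V verb, A adjective, P pronoun, R adverb.
TRAIN = [
    [("tôi", "P"), ("đọc", "V"), ("sách", "N")],
    [("sách", "N"), ("hay", "A")],
    [("tôi", "P"), ("hay", "R"), ("đọc", "V"), ("sách", "N")],
]

# Hypothetical lexicon: the POS tags each word is allowed to take.
LEXICON = {"tôi": {"P"}, "đọc": {"V"}, "sách": {"N"}, "hay": {"A", "R"}}


def train(sentences):
    """Count tag bigrams, (tag, word) emissions and tag frequencies."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tag_count = defaultdict(int)
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_count[tag] += 1
            prev = tag
    return trans, emit, tag_count


def viterbi(words, lexicon, trans, emit, tag_count):
    """Return the most probable tag sequence, trying only lexicon-allowed tags."""
    if not words:
        return []
    tags = sorted(set(tag_count).union(*lexicon.values()))
    vocab = {w for t in emit for w in emit[t]} | set(words)

    def log_trans(prev, tag):  # add-one smoothed log P(tag | prev)
        total = sum(trans[prev].values())
        return math.log((trans[prev].get(tag, 0) + 1) / (total + len(tags)))

    def log_emit(tag, word):  # add-one smoothed log P(word | tag)
        count = emit[tag].get(word, 0)
        return math.log((count + 1) / (tag_count.get(tag, 0) + len(vocab)))

    def candidates(word):  # unknown words may take any known tag
        return sorted(lexicon.get(word, tags))

    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{t: (log_trans(START, t) + log_emit(t, words[0]), None)
             for t in candidates(words[0])}]
    for i in range(1, len(words)):
        column = {}
        for tag in candidates(words[i]):
            score, prev = max((best[i - 1][p][0] + log_trans(p, tag), p)
                              for p in best[i - 1])
            column[tag] = (score + log_emit(tag, words[i]), prev)
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    model = train(TRAIN)
    print(viterbi(["sách", "hay"], LEXICON, *model))
    print(viterbi(["tôi", "hay", "đọc", "sách"], LEXICON, *model))
```

Run as a script, this prints `['N', 'A']` and then `['P', 'R', 'V', 'N']`. In the toy data, *hay* may be an adjective ("good") or an adverb ("often"); the lexicon limits it to those two tags, and the transition probabilities learned from neighbouring tags choose A after a noun and R between a pronoun and a verb.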

PDF (Vietnamese)

Published

2017-06-08

Issue

Vol. 9 No. 2 (2006)

Section

ARTICLES

Copyright belongs to the VNU-HCM "Science and Technology Development" Journal. Copying or reprinting in any form requires permission from the Journal.
