ARABIC PART OF SPEECH TAGGING USING K-NEAREST NEIGHBOUR AND NAIVE BAYES CLASSIFIERS COMBINATION

Journal information	thescipub.com
Publication year	2014
File format	PDF
Article code	24124

Abstract:

Part-of-speech (POS) tagging is an important preprocessing step in many natural language
processing applications, such as text summarization, question answering and information retrieval. It
is the process of assigning every word in a given context its appropriate part of speech. Many POS
tagging techniques have been developed and evaluated in the literature. However, some POS tagging
models are known to perform poorly on Quranic Arabic because of the complexity of the
text. This complexity presents several challenges for POS tagging, such as high ambiguity,
data sparseness and a large number of unknown words. With this in mind, the main problem is to find
out how existing, efficient methods perform on Arabic and how the Quranic corpus can be used to
produce an efficient framework for Arabic POS tagging. We propose an experimental classifier-combination
framework for Arabic POS tagging, selecting the two best diverse probabilistic classifiers used in numerous
works on non-Arabic languages, namely K-Nearest Neighbour (KNN) and Naive Bayes (NB). Majority
voting is used as the combination strategy to exploit the advantages of both classifiers. In addition, an in-depth
study has been conducted on a large list of features to identify effective features and investigate their
role in enhancing the performance of POS taggers for Quranic Arabic. Hence, this study aims to
efficiently integrate different feature sets and tagging algorithms to produce a more accurate POS tagging
procedure. The data used in this study is the Arabic Quranic Corpus, an annotated linguistic resource
consisting of 77,430 words, with Arabic grammar, syntax and morphology for each word in the Holy Quran.
The highest accuracy achieved is 98.32%, which can be a significant improvement over the
state of the art for Quranic Arabic text. The most effective features that yield this accuracy are a
combination of w0 (the current word), p0 (POS of the current word), p-3 (POS of three words before), p-2
(POS of two words before) and p-1 (POS of the word before).
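
The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: p0 is treated as the label to predict rather than as an input, the KNN uses a simple feature-overlap similarity, the Naive Bayes uses add-one smoothing, and ties between the two voters fall back to the classifier at index `tie_break` (the abstract does not say how disagreements are resolved). All function and class names are hypothetical, and the tiny English training set is invented for illustration; it is not drawn from the Quranic Arabic Corpus.

```python
import math
from collections import Counter, defaultdict

def extract_features(words, tags, i):
    """Return (w0, p-3, p-2, p-1) for position i; '<s>' pads the sentence start."""
    def prev_tag(k):
        return tags[i - k] if i - k >= 0 else "<s>"
    return (words[i], prev_tag(3), prev_tag(2), prev_tag(1))

class KNN:
    """k-nearest neighbour over categorical features (overlap similarity, assumed)."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self
    def predict(self, x):
        # Rank training examples by how many feature positions match x.
        ranked = sorted(zip(self.X, self.y),
                        key=lambda p: -sum(a == b for a, b in zip(p[0], x)))
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]

class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (assumed)."""
    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)  # (position, label) -> value counts
        self.vocab = defaultdict(set)            # position -> seen values
        for x, label in zip(X, y):
            for j, v in enumerate(x):
                self.feat_counts[(j, label)][v] += 1
                self.vocab[j].add(v)
        return self
    def predict(self, x):
        n = sum(self.class_counts.values())
        def log_posterior(label):
            lp = math.log(self.class_counts[label] / n)
            for j, v in enumerate(x):
                num = self.feat_counts[(j, label)][v] + 1
                # The extra +1 reserves probability mass for unseen values.
                den = self.class_counts[label] + len(self.vocab[j]) + 1
                lp += math.log(num / den)
            return lp
        return max(self.class_counts, key=log_posterior)

def majority_vote(predictions, tie_break=0):
    """Return the most common label; on a tie, trust predictions[tie_break]."""
    votes = Counter(predictions)
    top, count = votes.most_common(1)[0]
    if list(votes.values()).count(count) > 1:
        return predictions[tie_break]
    return top

def tag(words, knn, nb):
    """Greedy left-to-right tagging; earlier predictions feed later features."""
    tags = []
    for i in range(len(words)):
        x = extract_features(words, tags, i)
        tags.append(majority_vote([knn.predict(x), nb.predict(x)]))
    return tags

# Invented toy training data (not from the Quranic Arabic Corpus).
train = [
    (["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["the", "dog", "runs"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "runs"], ["DET", "NOUN", "VERB"]),
]
X, y = [], []
for words, gold in train:
    for i in range(len(words)):
        X.append(extract_features(words, gold, i))
        y.append(gold[i])

knn = KNN(k=3).fit(X, y)
nb = NaiveBayes().fit(X, y)
print(tag(["the", "dog", "sleeps"], knn, nb))
```

With only two voters, every disagreement is a tie, so the tie-break rule decides all contested cases; a practical system would need a principled choice here, for example preferring the classifier with the higher validation accuracy.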

Keywords: Part of Speech, Natural Language Processing, Classification
