Supervised Term Weighting Methods for URL Classification
Journal information | thescipub.com
Publication year | 2014
File format | PDF
Article code | 24163
Abstract:
Many term weighting methods have been proposed in the literature for Information Retrieval and Text
Categorization. Term weighting, a part of the feature selection process, has not yet been explored for the
URL classification problem. We classify a web page using its URL alone, without fetching its content; hence
URL-based classification is faster than methods that require the page content. In this study, we investigate
the use of term weighting methods for selecting relevant URL features and their impact on the performance
of URL classification. We propose a New Relevance Factor (NRF) for the supervised term weighting method to
compute the URL weights and perform multiclass classification of URLs using a Naive Bayes classifier. To
evaluate the proposed method, we conducted various experiments on the ODP dataset, and our
experimental results show that the proposed supervised term weighting method based on NRF is suitable for
URL classification. We achieved an 11% improvement in Precision over existing binary
classifier methods and a 22% improvement in F1 over existing multiclass classifiers.
Keywords: Web Page Classification, Classification, URL Features, Term Weighting Method, Open Directory Project (ODP)
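
The pipeline summarized in the abstract (URL tokens, supervised term weights, multiclass Naive Bayes) can be illustrated with a small sketch. The abstract does not give the NRF formula, so this sketch substitutes the relevance frequency weight rf = log2(2 + a / max(1, b)) known from the supervised term weighting literature; it is not the paper's NRF. The example URLs, the class labels, the stop-token list, and the way the weights are folded into the Naive Bayes counts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical list of generic URL parts that carry no topical signal.
STOP_TOKENS = {"http", "https", "www", "com", "org", "net", "html", "htm", "php"}


def tokenize(url):
    """Split a URL into lowercase alphabetic tokens, dropping generic parts."""
    tokens = re.split(r"[^a-z]+", url.lower())
    return [t for t in tokens if len(t) > 1 and t not in STOP_TOKENS]


def relevance_frequency(docs, labels):
    """rf(t, c) = log2(2 + a / max(1, b)), where a is the number of URLs of
    class c containing t and b the number of URLs of other classes containing t.
    Stand-in weight: the paper's NRF is not defined in the abstract."""
    in_class = defaultdict(Counter)  # in_class[c][t]: URLs of class c containing t
    overall = Counter()              # overall[t]: URLs of any class containing t
    for tokens, label in zip(docs, labels):
        for t in set(tokens):
            in_class[label][t] += 1
            overall[t] += 1
    return {
        c: {t: math.log2(2 + a / max(1, overall[t] - a)) for t, a in counts.items()}
        for c, counts in in_class.items()
    }


def train(urls, labels):
    """Accumulate rf-weighted term mass per class (an illustrative choice)."""
    docs = [tokenize(u) for u in urls]
    rf = relevance_frequency(docs, labels)
    mass = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for t in tokens:
            mass[label][t] += rf[label][t]
    vocab = {t for d in docs for t in d}
    return {"vocab": vocab, "mass": mass, "priors": Counter(labels), "n": len(labels)}


def predict(model, url):
    """Return the class with the highest log posterior under add-one smoothing."""
    tokens = [t for t in tokenize(url) if t in model["vocab"]]
    best, best_score = None, -math.inf
    for c in sorted(model["priors"]):
        denom = sum(model["mass"][c].values()) + len(model["vocab"])
        score = math.log(model["priors"][c] / model["n"])
        for t in tokens:
            score += math.log((model["mass"][c][t] + 1.0) / denom)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy training set; URLs and labels are invented for this example.
TRAIN = [
    ("http://www.example.com/sports/football/scores.html", "Sports"),
    ("http://www.example.org/football-league/teams", "Sports"),
    ("http://www.example.net/tennis/players/ranking", "Sports"),
    ("http://www.example.com/health/nutrition/diet.html", "Health"),
    ("http://www.example.org/medicine/diet-advice", "Health"),
    ("http://www.example.net/health/fitness/clinic", "Health"),
]
model = train([u for u, _ in TRAIN], [c for _, c in TRAIN])
print(predict(model, "http://www.example.com/football/teams"))
print(predict(model, "http://www.example.org/diet/clinic"))
```

Because only the URL string is tokenized, no page is downloaded at prediction time, which is the speed advantage the abstract refers to. A real evaluation would replace the toy data with labeled ODP URLs and the rf weight with the paper's NRF.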