COMPARATIVE STUDY OF K-MEANS AND K-MEANS++ CLUSTERING ALGORITHMS ON CRIME DOMAIN
Journal | thescipub.com
Publication year | 2014
File format | PDF
Article code | 23852
Abstract:
This study presents the results of an experimental comparison of two document clustering techniques, k-means and k-means++, applied to crime document clustering. A drawback of k-means is that the user needs to define the initial centroid points. This becomes more critical in document clustering, because each center point is represented by a word and calculating the distance between words is not a trivial task. The k-means++ algorithm was introduced to overcome this problem by finding good initial center points. Since k-means++ has not been applied to crime document clustering before, this study compares k-means and k-means++ to investigate whether the k-means++ initialization process yields better results than k-means. We propose using the k-means++ clustering algorithm to identify the best seeds for the initial cluster centers when clustering crime documents. The method includes a preprocessing phase consisting of tokenization, stop-word removal and stemming. In addition, we evaluate the impact of two similarity/distance measures (cosine similarity and the Jaccard coefficient) on the results of both clustering algorithms. Experimental results on several settings of the crime data set showed that, by identifying the best seeds for the initial cluster centers, k-means++ performs significantly better than k-means (at the 95% confidence level). These results demonstrate the accuracy of the k-means++ clustering algorithm in clustering crime documents.
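The preprocessing phase described in the abstract (tokenization, stop-word removal, stemming) can be sketched as below. This is a minimal illustration, not the study's pipeline: the stop-word list and the suffix-stripping rules are hypothetical and deliberately tiny, and the stemmer is a crude stand-in for a real algorithm such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the study).
STOP_WORDS = {
    "a", "an", "and", "at", "by", "for", "in", "is",
    "of", "on", "the", "to", "was", "were", "with",
}

# Hypothetical suffix rules, tried in order; a crude stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenization, then stop-word removal, then stemming."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The suspects were robbing banks in the city"))
# ['suspect', 'robb', 'bank', 'city']
```

The stems need not be dictionary words: what matters for clustering is that variants such as "robbed" and "robbing" collapse to the same term ("robb" here), so documents about the same crime share vocabulary.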
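The two measures compared in the study, cosine similarity and the Jaccard coefficient, differ in what they look at. The sketch below assumes documents are already lists of preprocessed terms and uses raw term frequencies for cosine (no TF-IDF weighting; the study's exact weighting is not stated here). The helper `distance` is hypothetical: it applies the common convention distance = 1 - similarity.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: list[str], doc_b: list[str]) -> float:
    """Cosine of the angle between raw term-frequency vectors (no TF-IDF weighting)."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def jaccard_coefficient(doc_a: list[str], doc_b: list[str]) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of distinct terms; term counts are ignored."""
    set_a, set_b = set(doc_a), set(doc_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def distance(similarity: float) -> float:
    """Hypothetical helper: turn a similarity in [0, 1] into a distance."""
    return 1.0 - similarity


a = ["robb", "bank", "bank"]
b = ["robb", "bank", "car"]
print(round(cosine_similarity(a, b), 4))    # 0.7746
print(round(jaccard_coefficient(a, b), 4))  # 0.6667
```

Cosine rewards the repeated "bank" in the first document, whereas Jaccard only asks which distinct terms the two documents share, so the two measures can rank document pairs differently.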
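The difference between the two initializations can be sketched as follows. This is not the study's implementation: it assumes documents have already been turned into numeric vectors (for example, term-frequency vectors) and uses squared Euclidean distance, as in the standard k-means++ formulation, rather than the cosine or Jaccard measures the study evaluates. The names `random_seeds`, `kmeans_pp_seeds` and `lloyd` are hypothetical, and drawing plain k-means centroids uniformly at random is an assumption.

```python
import random

Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_seeds(points: list[Vector], k: int, rng: random.Random) -> list[Vector]:
    """Plain k-means initialization (assumed): k distinct points drawn uniformly."""
    return [list(p) for p in rng.sample(points, k)]


def kmeans_pp_seeds(points: list[Vector], k: int, rng: random.Random) -> list[Vector]:
    """k-means++ seeding: the first center is uniform; each later one is drawn
    with probability proportional to D(x)^2, the squared distance from x to
    the nearest center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def lloyd(points: list[Vector], centers: list[Vector], max_iter: int = 100):
    """Standard k-means (Lloyd) iterations starting from the given centers."""
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's previous center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
rng = random.Random(42)
labels, centers = lloyd(points, kmeans_pp_seeds(points, 2, rng))
print(labels)  # the two groups of three points land in different clusters
```

Only the seeding function changes between the two algorithms; passing `random_seeds(points, 2, rng)` to `lloyd` instead gives plain k-means. Because k-means++ weights candidates by D(x)^2, a point that is already a center has weight zero and is never picked twice, and far-away points are favoured, which spreads the initial centers out.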
Keywords: Crime Document Clustering, K-Means++, K-Means Algorithm, Similarity/Distance Measures