Search | Korea Science

K Nearest Neighbor Joins for Big Data Processing based on Spark (Spark 기반 빅데이터 처리를 위한 K-최근접 이웃 연결)

JIAQI, JI;Chung, Yeongjee
- Journal of the Korea Institute of Information and Communication Engineering / v.21 no.9 / pp.1731-1737 / 2017
K Nearest Neighbor Join (KNN Join) is a simple yet effective method in machine learning. It has long been widely used on small datasets. As data volumes grow, however, running it on a single machine becomes infeasible for real applications because of memory and time constraints. MapReduce, a popular batch-processing model that runs on clusters with a large number of computers, is now widely used for large-scale data processing. Hadoop is a framework that implements MapReduce, but its performance can be improved upon by a newer framework named Spark. In this study, we present a KNN Join implementation based on Spark. Thanks to Spark's in-memory computation capability, it can be faster and more effective than Hadoop. In our experiments, we study the influence of different factors on running time and demonstrate the robustness and efficiency of our approach.
https://doi.org/10.6109/jkiice.2017.21.9.1731
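
For readers unfamiliar with the operation in the abstract above, a KNN join pairs every point of one dataset R with its k nearest points in another dataset S. The following is a minimal single-machine sketch of that definition only; it is not the paper's Spark implementation, and the function name `knn_join` and the toy data are assumptions made for illustration.

```python
import heapq
import math


def knn_join(r_points, s_points, k):
    """For each point in r_points, return the indices of its k nearest points in s_points."""
    result = {}
    for i, r in enumerate(r_points):
        # nsmallest scans S once and keeps the k candidates with the smallest distance to r
        result[i] = heapq.nsmallest(
            k, range(len(s_points)), key=lambda j: math.dist(r, s_points[j])
        )
    return result


# Toy data (hypothetical)
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
joined = knn_join(R, S, k=2)
print(joined)
```

The cost of this brute-force form grows with the product of the two dataset sizes, which is the kind of memory and time pressure the abstract cites as the motivation for a cluster-based approach.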

A Hashing Method Using PCA-based Clustering (PCA 기반 군집화를 이용한 해슁 기법)

Park, Cheong Hee
- KIPS Transactions on Software and Data Engineering / v.3 no.6 / pp.215-218 / 2014
In hashing-based methods for approximate nearest neighbor (ANN) search, data points are mapped to k-bit binary codes and nearest neighbors are searched in the resulting binary embedding space. In this paper, we present a hashing method using a PCA-based clustering method, Principal Direction Divisive Partitioning (PDDP). PDDP is a clustering method that repeatedly partitions the cluster with the largest variance into two clusters by using the first principal direction. The proposed hashing method uses the first principal direction as the projection direction for binary coding. Experimental results demonstrate that the proposed method is competitive with other hashing methods.
https://doi.org/10.3745/KTSDE.2014.3.6.215
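
The core step described in the abstract above, projecting data onto the first principal direction and using the sign of the projection, can be sketched for a single bit as follows. This is an illustrative sketch only: the paper's full k-bit construction is not given in the abstract, and the helper names, the power-iteration approach, and the toy points are assumptions.

```python
import math


def first_principal_direction(points, iters=100):
    """Return (mean, unit vector) of the first principal direction via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Unnormalized covariance matrix (d x d); scaling does not change the direction
    cov = [[sum(c[a] * c[b] for c in centered) for b in range(d)] for a in range(d)]
    # Assumption: the all-ones start vector is not orthogonal to the principal direction
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def hash_bit(point, mean, direction):
    """One bit: which side of the mean the point falls on along the principal direction."""
    proj = sum((point[j] - mean[j]) * direction[j] for j in range(len(point)))
    return 1 if proj > 0 else 0


# Toy data (hypothetical): two groups separated along the x axis
points = [(0, 0), (1, 0.1), (2, -0.1), (10, 0), (11, 0.1), (12, -0.1)]
mean, direction = first_principal_direction(points)
bits = [hash_bit(p, mean, direction) for p in points]
print(bits)
```

Repeating the split on the resulting clusters, as PDDP does, would yield further directions and therefore further bits; how the paper assigns those bits is not specified here.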

Malware Classification System to Support Decision Making of App Installation on Android OS (안드로이드 OS에서 앱 설치 의사결정 지원을 위한 악성 앱 분류 시스템)

Ryu, Hong Ryeol;Jang, Yun;Kwon, Taekyoung
- Journal of KIISE / v.42 no.12 / pp.1611-1622 / 2015
Although Android systems provide a permission-based access control mechanism and require the user to decide whether to install an app based on its permission list, many users tend to ignore this step. Thus, an improved method is needed so that users can intuitively make informed decisions when installing a new app. In this paper, with regard to the permission-based access control system, we present a novel approach based on a machine-learning technique to support the user's decision-making on the fly. We apply the K-NN (K-Nearest Neighbors) classification algorithm, with the necessary weighting modifications, to malicious app classification, and use 152 Android permissions as features. Our experiment shows a superior classification result (93.5% accuracy) compared with previous work. We expect that our method can help users make informed decisions at the installation step.
https://doi.org/10.5626/JOK.2015.42.12.1611
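
As a rough illustration of K-NN classification over binary permission features, the sketch below uses Hamming distance and distance-weighted voting. The abstract does not state which weighting the authors apply, so the `1 / (1 + distance)` weight, the function names, and the four-permission toy data are all assumptions.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions where two permission vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def weighted_knn_classify(train, query, k):
    """train: list of (permission_vector, label). Closer neighbors get larger votes."""
    neighbors = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        # Assumed weighting scheme: inverse of (1 + Hamming distance)
        votes[label] += 1.0 / (1.0 + hamming(features, query))
    return max(votes, key=votes.get)


# Toy data (hypothetical): 4 permissions instead of the 152 used in the paper
train = [
    ((1, 1, 0, 0), "malicious"),
    ((1, 1, 1, 0), "malicious"),
    ((0, 0, 0, 1), "benign"),
    ((0, 0, 1, 1), "benign"),
    ((0, 1, 0, 1), "benign"),
]
label = weighted_knn_classify(train, (1, 1, 0, 1), k=3)
print(label)
```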

Location Estimation Method Employing Fingerprinting Scheme based on K-Nearest Neighbor Algorithm under WLAN Environment of Ship (선박의 WLAN 환경에서 K-최근접 이웃 알고리즘 기반 Fingerprinting 방식을 적용한 위치 추정 방법)

Kim, Beom-Mu;Jeong, Min A;Lee, Seong Ro
- Journal of the Korea Institute of Information and Communication Engineering / v.18 no.10 / pp.2530-2536 / 2014
Many studies have addressed location estimation in indoor environments that GPS signals do not reach, and a variety of estimation methods have been proposed as a result. In this paper, we consider in depth the problem of location estimation in a ship with a multi-story structure, and investigate a location estimation method that uses the fingerprint scheme based on the K-Nearest Neighbor algorithm. To employ the fingerprint scheme, a reliable DB is constructed by measuring 100 received signals at each of 39 reference points (RPs), and, based on this DB, a simulation is performed to estimate the location of a randomly positioned terminal. The simulation result confirms that the location estimation performance of the fingerprint scheme is quite satisfactory.
https://doi.org/10.6109/jkiice.2014.18.10.2530
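
The fingerprint scheme in the abstract above can be sketched in two steps: average the signal measurements taken at each reference point into a database, then estimate a terminal's position from the K reference points whose stored signals are closest to the observed one. The sketch below assumes 2-D positions, Euclidean distance in signal space, and an unweighted average of the K positions; none of these details are confirmed by the abstract, and all names and values are hypothetical.

```python
import math


def build_fingerprint_db(samples):
    """samples: {rp_position: [rss_vector, ...]} -> {rp_position: mean rss_vector}."""
    db = {}
    for pos, readings in samples.items():
        n = len(readings)
        db[pos] = tuple(sum(r[i] for r in readings) / n for i in range(len(readings[0])))
    return db


def estimate_location(db, observed, k):
    """Average the positions of the k reference points closest in signal space."""
    nearest = sorted(db, key=lambda pos: math.dist(db[pos], observed))[:k]
    return tuple(sum(p[i] for p in nearest) / len(nearest) for i in range(len(nearest[0])))


# Toy data (hypothetical): 3 reference points, 2 access points, 2 readings each
samples = {
    (0.0, 0.0): [(-40, -70), (-42, -68)],
    (10.0, 0.0): [(-70, -40), (-68, -42)],
    (0.0, 10.0): [(-50, -60), (-52, -58)],
}
db = build_fingerprint_db(samples)
location = estimate_location(db, observed=(-45, -65), k=2)
print(location)
```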

Efficient Nearest Neighbor Search on Moving Object Trajectories (이동객체궤적에 대한 효율적인 최근접이웃검색)

Kim, Gyu-Jae;Park, Young-Hee;Cho, Woo-Hyun
- Journal of the Korea Institute of Information and Communication Engineering / v.18 no.12 / pp.2919-2925 / 2014
Because of the rapid growth of mobile and wireless communication, location-based services are used in many applications, and the management and analysis of spatio-temporal data have become a hot issue in database research. Index structures and query processing for such data are very important for these applications. This paper addresses algorithms that build an index structure by using the Douglas-Peucker algorithm and efficiently process nearest neighbor search queries on moving object trajectories. We compare and analyze our algorithms through experiments. Our algorithms produce a small index structure and process queries more efficiently.
https://doi.org/10.6109/jkiice.2014.18.12.2919
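
The Douglas-Peucker algorithm named in the abstract above simplifies a polyline by keeping only the points that deviate from the straight line between the endpoints by more than a tolerance. The sketch below shows the standard recursive form on a 2-D trajectory; how the paper turns the simplified trajectory into an index is not described in the abstract and is not attempted here. The tolerance name `epsilon` and the sample trajectory are assumptions.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (or to a if a == b)."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    num = abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1))
    return num / math.dist(a, b)


def douglas_peucker(points, epsilon):
    """Return a simplified copy of the polyline, keeping its first and last points."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # Keep that point and simplify both halves recursively
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]


# Toy trajectory (hypothetical)
trajectory = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0), (5, 0.05), (6, 0)]
simplified = douglas_peucker(trajectory, epsilon=0.5)
print(simplified)
```

Fewer retained points mean fewer segments to index, which is consistent with the small index size the abstract reports.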

Machine Learning Model for Predicting the Residual Useful Lifetime of the CNC Milling Insert (공작기계의 절삭용 인서트의 잔여 유효 수명 예측 모형)

Won-Gun Choi;Heungseob Kim;Bong Jin Ko
- Journal of Advanced Navigation Technology / v.27 no.1 / pp.111-118 / 2023
For the implementation of a smart factory, it is necessary to collect data by connecting various sensors and devices in the manufacturing environment and to diagnose or predict failures in production facilities through data analysis. In this paper, to predict the residual useful lifetime of the milling inserts used for machining products on a CNC machine, we investigate the weighted k-NN algorithm, Decision Tree, SVR, XGBoost, Random Forest, 1D-CNN, and the frequency spectrum based on the vibration signal. The results show that the frequency spectrum does not provide a reliable criterion for accurately predicting the residual useful lifetime of an insert. The weighted k-nearest neighbor algorithm performed best, with an MAE of 0.0013, MSE of 0.004, and RMSE of 0.0192. This corresponds to an error of about 0.001 seconds in the remaining useful lifetime of the insert predicted by the weighted k-nearest neighbor algorithm, a level considered applicable to actual industrial sites.
https://doi.org/10.12673/jant.2023.27.1.111
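
To make the terms in the abstract above concrete, the sketch below shows one common form of weighted k-NN regression (inverse-distance weighting) together with the MAE and RMSE error measures. The abstract does not specify the paper's weighting or its vibration features, so the weighting, the one-dimensional toy feature, and all names are assumptions.

```python
import math


def weighted_knn_predict(train, query, k, eps=1e-9):
    """train: list of (feature_vector, target). Inverse-distance weighted average of k neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # eps avoids division by zero when the query coincides with a training point
    weights = [1.0 / (math.dist(x, query) + eps) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy data (hypothetical): one feature, remaining lifetime as the target
train = [((0.0,), 10.0), ((1.0,), 8.0), ((2.0,), 6.0), ((3.0,), 4.0)]
prediction = weighted_knn_predict(train, (1.5,), k=2)
print(prediction, mae([7.5], [prediction]), rmse([7.5], [prediction]))
```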

A study on the spatial neighborhood in spatial regression analysis (공간이웃정보를 고려한 공간회귀분석)

Kim, Sujung
- Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.505-513 / 2017
Recently, numerous small area estimation studies have been conducted to obtain more detailed and accurate estimation results. Most of these studies have employed spatial regression models, which require a clear definition of spatial neighborhoods. In this study, we introduce the Delaunay triangulation as a method for defining spatial neighborhoods and compare it with the k-nearest neighbor method. A simulation was conducted to determine which of the two methods is more efficient in defining spatial neighborhoods, and we demonstrate the performance of the proposed method using land price data.
https://doi.org/10.7465/jkdi.2017.28.3.505

Rejection Study of Nearest Neighbor Classifier for Diagnosis of Rotating Machine Fault (회전기계 고장 진단을 위한 최근접 이웃 분류기의 기각 전략)

Choe, Yeong-Il;Park, Gwang-Ho;Gi, Chang-Du
- Proceedings of the Korean Society of Precision Engineering Conference / 2000.11a / pp.81-84 / 2000
Rotating machines are used extensively and play important roles in industry. When a rotating machine breaks down, it is therefore necessary to identify the cause and deal with the trouble immediately, and many studies on the diagnosis of rotating machines are being carried out. So far, however, most studies have focused on achieving a high recognition rate. Without considering the error rate$^{(1)(2)(3)}$, this is not sufficient for an actual application system. If the system manager receives a result that misjudges the condition of a rotating machine and acts on it, the loss can be heavy. To provide a reliable diagnosis, we must therefore consider the error rate; that is, the system must be able to reject misjudged results. This study uses a nearest neighbor classifier for the diagnosis of rotating machines$^{(4)(8)}$ and applies Smith's rejection method$^{(1)}$, which has been used for handwritten character recognition. As a result, a reliable diagnosis of rotating machines is proposed.

Rejection Scheme of Nearest Neighbor Classifier for Diagnosis of Rotating Machine Fault (회전 기계 고장 진단을 위한 최근접 이웃 분류기의 기각 전략)

Choe, Yeong-Il;Park, Gwang-Ho;Gi, Chang-Du
- Journal of the Korean Society for Precision Engineering / v.19 no.3 / pp.52-58 / 2002
The purpose of condition monitoring and fault diagnosis is to detect faults occurring in machinery in order to improve the level of safety in plants and reduce operational and maintenance costs. For recognition performance, it is important not only to achieve a high recognition rate but also to minimize the error rate of diagnosis failures by using an effective rejection module. We examined the problem of performance evaluation for the rejection scheme, considering the accuracy of individual classes, in order to increase recognition performance. Among previous studies on rejection methods, we use Smith's method. A nearest neighbor classifier is used to classify the machine conditions from the vibration signals. The experimental results for the performance evaluation of rejection show that the modified optimum rejection method is superior to the others.

A Movie Recommender Systems using Personal Disposition in Hadoop (하둡에서 개인 성향을 이용한 영화 추천시스템)

Kim, Sun-Ho;Kim, Se-Jun;Mo, Ha-Young;Kim, Chae-Reen;Park, Gyu-Tae;Park, Doo-Soon
- Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.642-644 / 2014
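Because of the explosive growth of information, it has paradoxically become harder for users to find the information they want quickly, and a variety of new services are being offered to solve this problem. Among recommender systems, collaborative filtering is regarded as the most successful algorithm for recommending movies. Collaborative filtering uses preference ratings that users enter voluntarily to find the people judged to have tastes similar to the target user, that is, the nearest neighbors, and then recommends movies to the user based on the nearest neighbors' preference ratings. However, collaborative filtering has several well-known problems: sparsity, scalability, and transparency. In this paper, to mitigate the sparsity problem of collaborative filtering in movie recommender systems, we propose an efficient recommendation method that reflects each individual's disposition, and we evaluate its performance on Hadoop.
https://doi.org/10.3745/PKIPS.y2014m04a.642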
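
The nearest-neighbor form of collaborative filtering described in the abstract above can be sketched as follows: find the users most similar to the target user from their ratings, then score unseen movies by a similarity-weighted average of those neighbors' ratings. This is a plain user-based sketch under assumed choices (cosine similarity, toy ratings, hypothetical names); it does not model the personal-disposition extension the paper proposes, nor its Hadoop deployment.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts ({movie: rating})."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k, top_n):
    """Recommend up to top_n unseen movies from the k most similar users."""
    sims = {u: cosine_similarity(ratings[user], ratings[u]) for u in ratings if u != user}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    scores, weights = {}, {}
    for u in neighbors:
        if sims[u] <= 0:
            continue
        for movie, rating in ratings[u].items():
            if movie in ratings[user]:
                continue  # skip movies the user has already rated
            scores[movie] = scores.get(movie, 0.0) + sims[u] * rating
            weights[movie] = weights.get(movie, 0.0) + sims[u]
    predicted = {m: scores[m] / weights[m] for m in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Toy ratings (hypothetical)
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 5, "C": 4, "D": 1},
    "cat": {"A": 1, "B": 2, "D": 5},
    "dan": {"A": 4, "B": 4, "C": 5},
}
recommended = recommend(ratings, "ann", k=2, top_n=2)
print(recommended)
```

With few ratings per user, many user pairs share no rated movies and their similarity collapses to zero, which is the sparsity problem the abstract sets out to mitigate.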

Search Result 187, Processing Time 0.025 seconds
