• Title/Summary/Keyword: K-nearest neighbor (K-최근이웃)

Implementation of DTW-kNN-based Decision Support System for Discriminating Emerging Technologies (DTW-kNN 기반의 유망 기술 식별을 위한 의사결정 지원 시스템 구현 방안)

  • Jeong, Do-Heon;Park, Ju-Yeon
    • Journal of Industrial Convergence / v.20 no.8 / pp.77-84 / 2022
  • This study presents a method for implementing a decision support system for selecting emerging technologies by applying a machine learning-based automatic classification technique. The architecture of the entire system was built, and the research proceeded in detailed steps. First, candidate emerging technology items were selected and trend data were generated automatically using a big data system. After defining the conceptual model and pattern classification structure of technological development, an efficient machine learning method was identified through an automatic classification experiment. Finally, the analysis results of the system were interpreted and methods for utilization were derived. In the classification experiment with the proposed DTW-kNN approach, which combines the Dynamic Time Warping (DTW) method with the k-Nearest Neighbors (kNN) classification model, identification performance reached up to 87.7%; in the 'eventual' section in particular, where the trend fluctuates strongly, the maximum performance difference compared with the Euclidean Distance (ED) algorithm was 39.4 percentage points. In addition, the analysis results presented by the system confirmed that this decision support system can be used effectively to automatically classify and filter large amounts of trend data by type.
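To make the DTW-kNN idea in the abstract above concrete, here is a minimal sketch, not the authors' implementation. It assumes a textbook DTW with an absolute-difference cost and a majority-vote kNN; the function names, the toy series, and the labels "spike" and "flat" are hypothetical.

```python
from collections import Counter


def dtw_distance(a, b):
    """Textbook dynamic time warping with an absolute-difference step cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point distance for equal-length series (no time shifting)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_classify(query, train, k=1, distance=dtw_distance):
    """train is a list of (series, label) pairs; returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda pair: distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy trend data: one spiky pattern and one flat pattern.
train = [
    ([0, 0, 1, 5, 1, 0, 0, 0], "spike"),
    ([0, 0, 0, 0, 0, 0, 0, 0], "flat"),
]
query = [0, 0, 0, 0, 1, 5, 1, 0]  # the same spike, shifted two steps later

label_dtw = knn_classify(query, train, k=1)
label_ed = knn_classify(query, train, k=1, distance=euclidean_distance)
```

In this toy case DTW aligns the shifted spike with the training spike at zero cost, whereas the Euclidean distance places the query closer to the flat series. That is one plausible reason a warping-based distance could help on strongly fluctuating trends, although the abstract does not explain the gap it reports.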

A Parameter-Free Approach for Clustering and Outlier Detection in Image Databases (이미지 데이터베이스에서 매개변수를 필요로 하지 않는 클러스터링 및 아웃라이어 검출 방법)

  • Oh, Hyun-Kyo;Yoon, Seok-Ho;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.1 / pp.80-91 / 2010
  • As the volume of image data increases dramatically, good organization of image data is crucial for efficient image retrieval. Clustering is a typical way of organizing image data. However, traditional clustering methods require the user to provide the number of clusters as a parameter before clustering. In this paper, we discuss an approach for clustering image data that does not require this parameter. The proposed approach is based on Cross-Association, which finds structure or patterns hidden in data using the relationships between individual objects. To apply Cross-Association to the clustering of image data, we first convert the image data into a graph. We then perform Cross-Association on the resulting graph and interpret the results from a clustering perspective. We also propose a hierarchical clustering method and an outlier detection method based on Cross-Association. Through a series of experiments, we verify the effectiveness of the proposed approach. Finally, we discuss how to find a good value of k for the k-nearest neighbor search and compare the clustering results obtained with symmetric and asymmetric ways of building the graph.
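The difference between an asymmetric and a symmetric k-nearest-neighbor graph can be sketched as follows. This is an illustration only: the abstract does not specify how the graph is built, so the squared Euclidean distance, the "add the reverse edge" symmetrization rule, and the names `knn_graph` and `features` are all assumptions.

```python
def knn_graph(points, k, symmetric=False):
    """Return a set of directed edges (i, j) from each point to its k nearest neighbors."""
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: sq_dist(p, points[j]))
        for j in others[:k]:
            edges.add((i, j))
    if symmetric:
        # Assumed symmetrization: if either endpoint chose the other, link both ways.
        edges |= {(j, i) for i, j in edges}
    return edges


# Hypothetical 2-D feature vectors standing in for image descriptors.
features = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

asymmetric_edges = knn_graph(features, k=1)
symmetric_edges = knn_graph(features, k=1, symmetric=True)
```

With `k=1`, the third point selects the second as its neighbor but is not selected in return, so the asymmetric graph lacks the edge `(1, 2)` that the symmetric variant adds.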

Performance Comparison of Automatic Classification Using Word Embeddings of Book Titles (단행본 서명의 단어 임베딩에 따른 자동분류의 성능 비교)

  • Yong-Gu Lee
    • Journal of the Korean Society for Information Management / v.40 no.4 / pp.307-327 / 2023
  • To analyze the impact of word embedding on book titles, this study used word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as features for automatic classification. The classifier was the k-nearest neighbors (kNN) algorithm, and the categories for automatic classification were based on the DDC (Dewey Decimal Classification) main class 300 assigned to books by libraries. In the automatic classification experiment applying word embeddings to book titles, the Skip-gram architectures of Word2vec and fastText gave better kNN classification performance than TF-IDF features. In the optimization of various hyperparameters across the three models, the Skip-gram architecture of the fastText model showed good overall performance. Specifically, this model performed better with hierarchical softmax and larger embedding dimensions. From a performance perspective, fastText can generate embeddings for substrings or subwords using the n-gram method, which has been shown to increase recall. The Skip-gram architecture of the Word2vec model generally performed well at low dimensions (size 300) and with small negative sampling sizes (3 or 5).

A Concordance Study of the Preprocessing Orders in Microarray Data (마이크로어레이 자료의 사전 처리 순서에 따른 검색의 일치도 분석)

  • Kim, Sang-Cheol;Lee, Jae-Hwi;Kim, Byung-Soo
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.585-594 / 2009
  • In microarray experiments, researchers convert processed images of raw data into data suitable for statistical analysis; this step is called preprocessing. Microarray preprocessing consists of image filtering, imputation, and normalization. Several methods of normalization and imputation have been studied, but the order in which the two procedures should be applied has not been examined. This study investigates how the order of the preprocessing steps affects the identification of differentially expressed genes (DEG), using two-dye cDNA microarray data from colon cancer and gastric cancer. That is, we compare which combinations of imputation and normalization steps can detect the DEG. We used two imputation methods (K-nearest neighbor and Bayesian principal component analysis) and three normalization methods (global, within-print-tip group, and variance stabilization), so the preprocessing procedure has 12 variants. We evaluated a concordance measure of the DEG across the datasets to which the 12 different preprocessing orders were applied. When variance stabilization was used as the normalization method, there was a little variation in the sensitivity of DEG detection.
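One way to arrive at the 12 preprocessing variants mentioned above is to cross the two imputation methods, the three normalization methods, and the two possible orders. The sketch below enumerates them under that assumed reading. The `concordance` function is a hypothetical stand-in (Jaccard overlap of two DEG lists), since the abstract does not define the concordance measure it uses.

```python
from itertools import product

imputations = ["k-nearest neighbor", "Bayesian PCA"]
normalizations = ["global", "within-print-tip group", "variance stabilization"]
orders = ["imputation first", "normalization first"]

# Every (imputation, normalization, order) combination: 2 * 3 * 2 variants.
pipelines = list(product(imputations, normalizations, orders))


def concordance(deg_a, deg_b):
    """Assumed measure: share of genes common to both DEG lists (Jaccard index)."""
    a, b = set(deg_a), set(deg_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical DEG lists produced by two different preprocessing variants.
example = concordance(["g1", "g2", "g3"], ["g2", "g3", "g4"])
```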

Clustering Method based on Genre Interest for Cold-Start Problem in Movie Recommendation (영화 추천 시스템의 초기 사용자 문제를 위한 장르 선호 기반의 클러스터링 기법)

  • You, Tithrottanak;Rosli, Ahmad Nurzid;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.57-77 / 2013
  • Social media has become one of the most popular media in web and mobile applications. According to a study by the Nielsen Company, social networks and blogs were still the top destination of online users in 2011: nearly 4 in 5 active users visit social networks and blogs, and these sites dominate Americans' Internet time, accounting for 23 percent of time spent online. Facebook is the social network on which U.S. Internet users spend more time than on other services such as Yahoo, Google, AOL Media Network, Twitter, LinkedIn, and so on. Recently, most companies promote their products on Facebook by creating a "Facebook Page" that refers to a specific product. The "Like" option allows users to subscribe to a page and receive updates on their interests from it. Film makers, who produce many films around the world, also market and promote their films by exploiting the advantages of the "Facebook Page". In addition, a great number of streaming service providers allow users to subscribe to their services to watch movies and TV programs, which can be streamed instantly over the Internet to PCs, Macs, and TVs. Netflix alone, as the world's leading subscription service, has more than 30 million streaming members in the United States, Latin America, the United Kingdom, and the Nordics. As a matter of fact, a huge number of movies and TV programs in different genres are offered to subscribers. As a result, users need to spend a lot of time finding the right movies that match the genres they are interested in. In recent years, many researchers have proposed methods to improve the prediction of ratings or preferences so as to recommend the most relevant items, such as books, music, or movies, to a target user or to a group of users who share an interest in particular items.
One of the most popular methods for building a recommendation system is traditional Collaborative Filtering (CF). The method computes the similarity between the target user and other users, who are then clustered by shared interest according to the items they have rated. It then predicts items from the same group of users to recommend to that group. There are many kinds of items that could be studied for recommendation, such as books, music, movies, news, videos, and so on; in this paper, however, we focus only on movies as the items to recommend. CF faces several challenges. The first is the "sparsity problem", which occurs when there is not enough information about user preferences; recommendation accuracy is then lower than for neighbors with a large number of ratings. The second is the "cold-start problem", which occurs whenever new users or items, each with no ratings or only a few, are added to the system. For instance, no personalized predictions can be made for a new user without any ratings on record. In this research, we propose a clustering method based on users' genre interests extracted from a social network service (SNS) and on users' movie rating information to solve the "cold-start problem." The proposed method clusters the target user together with other users by combining user genre interest and rating information. A huge amount of interesting and useful user information is available from the Facebook Graph, and we extract it from the "Facebook Pages" that users have "Liked". Moreover, we use the Internet Movie Database (IMDb) as the main dataset. IMDb is an online database that contains a large amount of information related to movies, TV programs, and actors.
This dataset is used not only to provide movie information in our Movie Rating System but also as a resource for the movie genre information extracted from the "Facebook Page". First, the user must log in to the Movie Rating System with their Facebook account; at the same time, our system collects their genre interests from the "Facebook Page". We conducted a number of experiments to see how our method performs and compared it with other methods. First, we evaluated the proposed method in the normal recommendation case to see how it improves the recommendation results. Then we tested the method in the cold-start case. The experiments show that our method outperforms the other methods, producing better results in both cases.

A Learning Agent for Automatic Bookmark Classification (북 마크 자동 분류를 위한 학습 에이전트)

  • Kim, In-Cheol;Cho, Soo-Sun
    • The KIPS Transactions:PartB / v.8B no.5 / pp.455-462 / 2001
  • The World Wide Web has become one of the major services provided through the Internet. When searching the vast web space, users use bookmarking facilities to record the sites of interest encountered during navigation. A typical problem with bookmarking is that the list of bookmarks loses its coherent organization when it becomes too lengthy, and thus ceases to function as a practical finding aid. To maintain the bookmark file in an efficient, organized manner, the user has to classify all newly added bookmarks and update the folders. This paper introduces our learning agent, BClassifier, which automatically classifies bookmarks by analyzing the contents of the corresponding web documents. The chief source of training examples is the set of bookmarks that the user has already classified into bookmark folders by subject. Additionally, web pages found under the top categories of the Yahoo site are collected and included in the training examples, both to diversify the subject categories represented and to provide training examples for those categories. The agent employs the naive Bayesian learning method, a well-tested, probability-based categorization technique. The outcome of some experiments is also outlined and evaluated, and the naive Bayesian learning method is compared with other learning methods such as k-Nearest Neighbor and TFIDF.

Multiple Period Forecasting of Motorway Traffic Volumes by Using Big Historical Data (대용량 이력자료를 활용한 다중시간대 고속도로 교통량 예측)

  • Chang, Hyun-ho;Yoon, Byoung-jo
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.1 / pp.73-80 / 2018
  • In motorway traffic flow control, the conventional approach based on real-time response has given way to an advanced approach based on proactive response. Future traffic conditions over multiple time intervals are crucial input data for advanced motorway traffic flow control. Forecasting multiple-period traffic volumes requires overcoming the uncertainty of the future state, because uncertainty increases as the forecasting horizon expands. Multi-interval forecasting of traffic volumes therefore requires a viable approach to handling future uncertainties. This paper proposes a forecasting model that addresses the uncertainties of the future state based on the temporal evolution of traffic volume states that intrinsically exists in large historical data. The model selects past states from the historical data based on the state evolution of current traffic volumes, and the selected past states are then used to estimate future states. The model was also designed to be suitable for data management systems in practice. Test results demonstrated that the model can effectively overcome the uncertainties over multiple time periods and can generate very reliable predictions in terms of prediction accuracy. The results indicate that the model can be deployed on advanced data management systems.
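The idea of selecting similar past states and using what followed them to estimate future states can be sketched as a nearest-neighbor forecaster. This is a simplified illustration under stated assumptions, not the paper's model: the Euclidean window matching, the plain averaging of the `k` matched futures, and the names `forecast`, `history`, and `current` are all hypothetical.

```python
def forecast(history, current, horizon, k=2):
    """Predict the next `horizon` volumes from the k past windows most similar to `current`."""
    w = len(current)
    candidates = []
    # Only windows followed by at least `horizon` observed values can be used.
    for start in range(len(history) - w - horizon + 1):
        window = history[start:start + w]
        future = history[start + w:start + w + horizon]
        dist = sum((x - y) ** 2 for x, y in zip(window, current)) ** 0.5
        candidates.append((dist, future))
    candidates.sort(key=lambda c: c[0])
    nearest = [future for _, future in candidates[:k]]
    # Average the matched futures step by step across the horizon.
    return [sum(step) / len(nearest) for step in zip(*nearest)]


# Hypothetical historical volumes (vehicles per interval).
history = [100, 120, 150, 130, 100, 120, 150, 134, 90, 80, 70, 60]
prediction = forecast(history, current=[100, 120], horizon=2, k=2)
```

Here the two past windows that match `[100, 120]` exactly were followed by `[150, 130]` and `[150, 134]`, so the two-step forecast is their average.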

The Road condition-based Braking Strength Calculation System for a fully autonomous driving vehicle (완전 자율주행을 위한 도로 상태 기반 제동 강도 계산 시스템)

  • Son, Su-Rak;Jeong, Yi-Na
    • Journal of Internet Computing and Services / v.23 no.2 / pp.53-59 / 2022
  • Beyond level 3 autonomous driving, level 4 and level 5 autonomous driving technologies aim to maintain the optimal condition of the passengers as well as perfect driving of the vehicle. However, current autonomous driving technology depends too heavily on visual information from sensors such as LiDAR and the front camera, so fully autonomous driving is difficult on roads other than designated roads. This paper therefore proposes a Braking Strength Calculation System (BSCS), in which a vehicle classifies road conditions using data other than visual information and calculates the optimal braking strength according to road and driving conditions. The BSCS consists of the RCDM (Road Condition Definition Module), which classifies road conditions using the KNN algorithm, and the BSCM (Braking Strength Calculation Module), which calculates the optimal braking strength while driving based on current driving and road conditions. The experiments identified the most suitable value of K for the KNN algorithm and showed that the proposed RCDM is more accurate than the unsupervised K-means algorithm. By using not only visual information but also vibration data from the suspension, the BSCS can make the braking of autonomous vehicles smoother in various environments where visual information is limited.

Welding Bead Detection Inspection Using the Brightness Value of Vertical and Horizontal Direction (수직 및 수평 방향의 밝깃값을 이용한 용접 비드 검출 검사)

  • Jae Eun Lee;Jong-Nam Kim
    • Journal of the Institute of Convergence Signal Processing / v.23 no.4 / pp.241-248 / 2022
  • Shear Reinforcement of Dual Anchorage (SRD) is used to reinforce the safety of reinforced concrete structures at construction sites. Welding is used to make the shear reinforcement and plays an important role in determining the productivity and competitiveness of the products, so weld bead inspection is required. In this paper, we propose an algorithm for inspecting welding beads using image data of the beads. The proposed algorithm first calculates the brightness values along the vertical direction of the image and segments the welding bead in the vertical direction by finding the position corresponding to the 50% height point of the brightness distribution. The welding bead area is segmented in the same way in the horizontal direction, and the segmented image is then analyzed to determine whether a welding bead is present. The proposed algorithm reduces the amount of computation by performing the analysis after specifying the region of interest. In addition, accuracy is improved by using all brightness values in the vertical and horizontal directions, exploiting the difference in brightness between the base metal and the welding bead region in the SRD image. The experiment compared the results of five algorithms, including K-means and K-nearest neighbor, for detecting whether a welding bead is present, and the results showed that the proposed algorithm was the most accurate.
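The 50% height segmentation described above could look roughly like the following sketch. It rests on several assumptions that the abstract does not confirm: the brightness profile is taken as the sum of pixel values per row and per column, "50% height" is read as the midpoint between the profile's minimum and maximum, and the bead is assumed to be brighter than the base metal. The names `half_height_bounds` and `crop_bead` are hypothetical.

```python
def half_height_bounds(profile):
    """Return (first, last) indices where the profile reaches its half-height level."""
    lo, hi = min(profile), max(profile)
    threshold = lo + 0.5 * (hi - lo)  # assumed reading of the "50% height point"
    above = [i for i, value in enumerate(profile) if value >= threshold]
    return above[0], above[-1]


def crop_bead(image):
    """image is a list of rows of grayscale values; returns (top, bottom, left, right)."""
    row_profile = [sum(row) for row in image]        # brightness along the vertical direction
    col_profile = [sum(col) for col in zip(*image)]  # brightness along the horizontal direction
    top, bottom = half_height_bounds(row_profile)
    left, right = half_height_bounds(col_profile)
    return top, bottom, left, right


# Hypothetical 4x5 grayscale image with a bright 2x2 region standing in for the bead.
image = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10],
    [10, 200, 200, 10, 10],
    [10, 10, 10, 10, 10],
]
region = crop_bead(image)
```

Restricting later analysis to the returned region is consistent with the abstract's point that specifying the region of interest first reduces the amount of computation.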

Prediction of Divided Traffic Demands Based on Knowledge Discovery at Expressway Toll Plaza (지식발견 기반의 고속도로 영업소 분할 교통수요 예측)

  • Ahn, Byeong-Tak;Yoon, Byoung-Jo
    • KSCE Journal of Civil and Environmental Engineering Research / v.36 no.3 / pp.521-528 / 2016
  • The tollbooths of a main motorway toll plaza are usually operated proactively in response to variations in the traffic demands of two vehicle types: cars and other (heavy) vehicles. Accurately forecasting traffic volumes for the two vehicle types is therefore a key element of advanced tollgate operation. Unfortunately, the univariate short-term prediction techniques in the literature cannot easily generate the traffic demands of the two vehicle types simultaneously. Given this practical and academic background, forecasting the future traffic volumes of the two vehicle types at an acceptable level of accuracy is an attractive research topic in the Intelligent Transportation System (ITS) forecasting area. To address the shortcomings of univariate short-term prediction techniques, this article introduces a Multiple In-and-Out (MIO) forecasting model that generates the traffic volumes of the two vehicle types simultaneously. The MIO model, based on a non-parametric approach, is devised for conditions in which large-scale historical data can be accessed on-line. In a feasibility test with actual data, the proposed model outperformed Kalman filtering, one of the widely used univariate models, in terms of prediction accuracy despite using a multivariate prediction scheme.