• Title/Summary/Keyword: t-Nearest Neighbor

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.97-117
    • /
    • 2020
  • Product evaluation criteria are indicators that describe the attributes or values of products and enable users and manufacturers to measure and understand them. When companies analyze their products or compare them with competitors', appropriate criteria must be selected for objective evaluation. The criteria should reflect the product features that consumers considered when they purchased, used, and evaluated the products. However, current evaluation criteria do not reflect how consumer opinions differ from product to product. Previous studies used online reviews from e-commerce sites, which reflect consumer opinions, to extract product features and topics and use them as evaluation criteria. However, these approaches still produce criteria that are irrelevant to the products because extracted or improper words are not refined. To overcome this limitation, this research proposes the LDA-k-NN model, which extracts candidate criteria words from online reviews using LDA and refines them with k-nearest neighbor. The proposed approach starts with a preparation phase consisting of six steps. First, it collects review data from e-commerce websites. Most e-commerce websites classify their items into high-level, middle-level, and low-level categories. Review data for the preparation phase are gathered from each middle-level category and later merged to represent a single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from the reviews using part-of-speech information from a morpheme analysis module. After preprocessing, LDA produces the words for each topic in the reviews, and only the nouns among the topic words are kept as potential criteria words. Then, the words are tagged according to whether they can serve as criteria for each middle-level category. Next, every tagged word is vectorized by a pre-trained word embedding model. Finally, a k-nearest neighbor case-based approach is used to classify each word with tags.
After the preparation phase is set up, the criteria extraction phase is conducted on low-level categories. This phase starts with crawling reviews in the corresponding low-level category. The same preprocessing as in the preparation phase is carried out using the morpheme analysis module and LDA. Candidate criteria words are extracted by taking nouns from the data and are vectorized by the pre-trained word embedding model. Finally, evaluation criteria are extracted by refining the candidate words using the k-nearest neighbor approach and the reference proportion of each word in the word set. To evaluate the performance of the proposed model, an experiment was conducted with reviews from '11st', one of the biggest e-commerce companies in Korea. The review data came from the 'Electronics/Digital' section, one of the high-level categories in 11st. For performance evaluation, the suggested model was compared with three alternatives: the actual criteria of 11st, a model that extracts nouns with the morpheme analysis module and refines them by word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The evaluation task was to predict the evaluation criteria of 10 low-level categories with the suggested model and the three alternatives. The criteria words extracted by each model were combined into a single word set, which was used for survey questionnaires. In the survey, respondents chose every item they considered an appropriate criterion for each category. Each model received a score whenever a chosen word had been extracted by that model. The suggested model had higher scores than the other models in 8 out of 10 low-level categories. Paired t-tests on the scores of each model confirmed that the suggested model performs better in 26 tests out of 30. In addition, the suggested model was the best model in terms of accuracy.
This research proposes an evaluation criteria extraction method that combines topic extraction using LDA with refinement by the k-nearest neighbor approach. The method overcomes the limits of previous dictionary-based models and frequency-based refinement models. This study can help improve review analysis for deriving business insights in the e-commerce market.
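
The refinement step described above, classifying candidate words by their nearest tagged neighbors in embedding space, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two-dimensional vectors, the labels `criterion` and `other`, the use of cosine similarity, and the function names are all assumptions made for the example. A real system would use vectors from a pre-trained word embedding model.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_label(query, tagged, k=3):
    """Return the majority tag among the k tagged vectors most similar to query."""
    ranked = sorted(tagged, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def refine(candidates, tagged, k=3):
    """Keep only the candidate words whose neighbors are mostly tagged 'criterion'."""
    return [word for word, vec in candidates.items()
            if knn_label(vec, tagged, k) == "criterion"]


# Hypothetical toy data: (embedding, tag) pairs standing in for tagged words.
TAGGED = [
    ([0.9, 0.1], "criterion"),
    ([0.8, 0.2], "criterion"),
    ([0.7, 0.3], "criterion"),
    ([0.1, 0.9], "other"),
    ([0.2, 0.8], "other"),
    ([0.3, 0.7], "other"),
]
# Hypothetical candidate nouns taken from LDA topics, with made-up vectors.
CANDIDATES = {"battery": [0.85, 0.15], "delivery": [0.15, 0.85]}

print(refine(CANDIDATES, TAGGED))
```

With an odd `k` and two tags the vote cannot tie; with more tags or an even `k`, a tie-breaking rule would have to be chosen.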

Empirical Analysis & Comparisons of Web Document Classification Methods (문서분류 기법을 이용한 웹 문서 분류의 실험적 비교)

  • Lee, Sang-Soon;Choi, Jung-Min;Jang, Geun;Lee, Byung-Soo
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10d
    • /
    • pp.154-156
    • /
    • 2002
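  • With the growth of the Internet, a great deal of information and knowledge is available online in the form of web documents such as HTML pages, newsgroup articles, and e-mail. These web documents need to be classified for various purposes, and systems that apply such classification include Personal WebWatcher, InfoFinder, Webby, and NewT. Web document classification systems use document classification techniques to determine the class to which a web document belongs. Representative algorithms for document classification are Naive Bayesian, k-NN (k-Nearest Neighbor), and TFIDF (Term Frequency Inverse Document Frequency). In this paper, we compare and evaluate the performance of each of these document classification algorithms on web documents.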
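
As a rough illustration of one of the techniques named above, the sketch below weights terms by TF-IDF and assigns a document to the class whose summed prototype vector is most similar by cosine. This is only one common reading of a "TFIDF classifier"; the abstract does not give the paper's formulation, and the smoothed IDF formula, function names, and toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF dicts; also return the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # log(n / df) + 1 is an assumed smoothing choice so shared terms keep some weight.
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def build_prototypes(vectors, labels):
    """Sum the TF-IDF vectors of each class into one prototype per class."""
    prototypes = {}
    for vec, label in zip(vectors, labels):
        proto = prototypes.setdefault(label, Counter())
        for term, weight in vec.items():
            proto[term] += weight
    return prototypes


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, prototypes, idf):
    """Pick the class whose prototype is most similar to the new document."""
    query = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Hypothetical toy corpus of already tokenized web documents.
DOCS = [["game", "score", "team"], ["team", "win", "league"],
        ["stock", "market", "price"], ["market", "bank", "rate"]]
LABELS = ["sports", "sports", "finance", "finance"]

VECTORS, IDF = tfidf_vectors(DOCS)
PROTOTYPES = build_prototypes(VECTORS, LABELS)
print(classify(["team", "score"], PROTOTYPES, IDF))
```

The same TF-IDF vectors could instead be fed to a k-NN vote over individual training documents, which is how the k-NN method in the comparison would typically differ from the prototype approach.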

Gesture Recognition Using Higher Correlation Feature Information and PCA

  • Kim, Jong-Min;Lee, Kee-Jun
    • Journal of Integrative Natural Science
    • /
    • v.5 no.2
    • /
    • pp.120-126
    • /
    • 2012
  • This paper describes an algorithm that reduces dimensionality, maintains gesture recognition performance, and significantly reduces eigenspace construction time by combining higher-correlation feature information with Principal Component Analysis (PCA). Because the proposed method requires less computation than existing methods based on geometric information or stereo images, experiments show that it is well suited to building real-time systems. In addition, because the existing point-to-point method, a simple distance calculation, produces many errors, this paper reduces recognition errors and improves the recognition rate by using several successive input images as the unit of recognition with K-Nearest Neighbor, an improved class-to-class method.

CO-CLUSTER HOMOTOPY QUEUING MODEL IN NONLINEAR ALGEBRAIC TOPOLOGICAL STRUCTURE FOR IMPROVING POISON DISTRIBUTION NETWORK COMMUNICATION

  • V. RAJESWARI;T. NITHIYA
    • Journal of applied mathematics & informatics
    • /
    • v.41 no.4
    • /
    • pp.861-868
    • /
    • 2023
  • Nonlinear networks create complex homotopy structural communication in the wireless network medium because of their complex distribution approach. Due to this multicast topological connection structure, the queuing probability follows non-regular principles when creating routing structures. To resolve this problem, we propose a Co-cluster homotopy queuing model (Co-CHQT) for Nonlinear Algebraic Topological Structure (NLTS) to improve poison distribution network communication. Initially, it collects the routing propagation based on Nonlinear Distance Theory (NLDT) to estimate the nearest neighbor network nodes under the nonlinear relation $x(a,b) \to ax^2 + bx^2 = c$. Then the Quillen Network Decomposition Theorem (QNDT) is applied to sustain the non-regular routing propagation and create the cluster path. Each cluster is formed with a covariance structure based on a two-unicast 2(n+1)-Z network. Based on the poison distribution theory X(a,b) ≠ µ(C), the weights of a number of distribution routing strategies are estimated from the node response rate. Deriving the shortest path from the behavior of the node response, Hilbert-Krylov subspace clustering estimates the Cluster Head (CH) as the routing head. This solves the approximation routing strategy for nonlinear communication based on Max-equivalence theory (Max-T). The proposed system improves communication by constructing topological clusters at an optimized level, producing better performance in distance theory and throughput latency in non-variation delay-tolerant settings.

Machine Learning Algorithms for Predicting Anxiety and Depression (불안과 우울 예측을 위한 기계학습 알고리즘)

  • Kang, Yun-Jeong;Lee, Min-Hye;Park, Hyuk-Gyu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.207-209
    • /
    • 2022
  • In an IoT environment, life pattern data can be collected by recognizing human physical activity from smart devices. The model proposed in this paper consists of a prediction stage and a recommendation stage. The prediction stage predicts the scale of anxiety and depression by applying logistic regression and the k-nearest neighbor algorithm to a dataset collected from life pattern data. In the recommendation stage, once symptoms of anxiety and depression are classified, the principal component analysis algorithm is applied to recommend food and light exercise that can improve them. The proposed anxiety/depression prediction and food/exercise recommendations are expected to have a ripple effect on improving individuals' quality of life.

Development of Interactive Content Services through an Intelligent IoT Mirror System (지능형 IoT 미러 시스템을 활용한 인터랙티브 콘텐츠 서비스 구현)

  • Jung, Wonseok;Seo, Jeongwook
    • Journal of Advanced Navigation Technology
    • /
    • v.22 no.5
    • /
    • pp.472-477
    • /
    • 2018
  • In this paper, we develop interactive content services for preventing depression in users through an intelligent Internet of Things (IoT) mirror system. For the interactive content services, an IoT mirror device measures attention and meditation data from an EEG headset device and also measures facial expression data such as "sad", "angry", "disgust", "neutral", "happy", and "surprise", classified by a multi-layer perceptron algorithm through a webcam. It then sends the measured data to a oneM2M-compliant IoT server. Based on the data collected in the IoT server, a machine learning model is built to classify three levels of depression (RED, YELLOW, and GREEN) given by a proposed merge labeling method. Experimental results verified that the k-nearest neighbor (k-NN) model could achieve about 93% accuracy. In addition, according to the classified level, a social network service agent sent a corresponding alert message to family, friends, and social workers. Thus, we were able to provide an interactive content service between users and caregivers.

Machine Learning-based Detection of DoS and DRDoS Attacks in IoT Networks

  • Yeo, Seung-Yeon;Jo, So-Young;Kim, Jiyeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.7
    • /
    • pp.101-108
    • /
    • 2022
  • We propose an intrusion detection model that detects denial-of-service (DoS) and distributed reflection denial-of-service (DRDoS) attacks based on the empirical data of each Internet of Things (IoT) device, trained on system and network metrics that can be commonly collected from various IoT devices. First, we collect 37 system and network metrics from each IoT device considering IoT attack scenarios. We then train six types of machine learning models on them to identify the most effective models as well as the metrics that matter most in detecting and distinguishing IoT attacks. Our experimental results show that the Random Forest model has the best performance, with an accuracy of over 96%, followed by the K-Nearest Neighbor model and the Decision Tree model. Of the 37 metrics, we identified five types of CPU, memory, and network metrics that best capture the characteristics of the attacks in all the experimental scenarios. Furthermore, we found that packets with higher transmission speeds, rather than packets of larger size, represent the characteristics of DoS and DRDoS attacks more clearly in IoT networks.

Object-oriented Information Extraction and Application in High-resolution Remote Sensing Image

  • WEI Wenxia;Ma Ainai;Chen Xunwan
    • Proceedings of the KSRS Conference
    • /
    • 2004.10a
    • /
    • pp.125-127
    • /
    • 2004
  • High-resolution satellite images offer abundant information about the earth's surface for remote sensing applications. This information includes geometry, texture, and attribute characteristics. Pixel-based image classification cannot meet the classification precision required for high-resolution satellite images, and it produces large data redundancy. Object-oriented information extraction depends not only on spectral characteristics but also on geometric and structural information. It can provide an accessible and truly revolutionary approach. Using a Beijing SPOT 5 high-resolution image and object-oriented classification with the eCognition software, we achieve a precise classification of cultures. The test areas contain five culture types: water, vegetation, road, building, and bare land. We use nearest neighbor classification and assess the overall classification accuracy. The average over the five types reaches 0.90, the maximum for all of them is 1, and the standard deviation is less than 0.11. The overall accuracy reaches 95.47%. This method offers a new technique for applying high-resolution satellite images to remote sensing culture classification.

One-class Classification based Fault Classification for Semiconductor Process Cyclic Signal (단일 클래스 분류기법을 이용한 반도체 공정 주기 신호의 이상분류)

  • Cho, Min-Young;Baek, Jun-Geol
    • IE interfaces
    • /
    • v.25 no.2
    • /
    • pp.170-177
    • /
    • 2012
  • Process control is essential for operating semiconductor processes efficiently. This paper considers fault classification based on semiconductor cyclic signals for process control. In general, a process signal takes a different pattern depending on the cause of the fault. If faults can be classified by their cause, process control can be improved through definite and rapid diagnosis. One of the most important things in fault classification is reaching a definite diagnosis, even when a fault is classified several times. This paper proposes a method in which one-class classifiers treat each fault cause as its own class. The Hotelling $T^2$ chart, kNNDD (k-Nearest Neighbor Data Description), and Distance-based Novelty Detection are used as the one-class classifiers. PCA (Principal Component Analysis) is also used to reduce the data dimension because process signals are generally very long. In the experiment, data are generated based on real signal patterns from a semiconductor process. The objective of the experiment is to compare the proposed method with SVM (Support Vector Machine). Most of the experimental results show that the proposed method using Distance-based Novelty Detection performs well in classification and diagnosis problems.
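
To make the one-class idea concrete, the sketch below shows one common formulation of kNNDD: a new point is scored by the ratio of its distance to its k-th nearest training point over that training point's own k-th nearest-neighbor distance, and it is accepted when the ratio stays at or below a threshold. The abstract does not state which variant the paper uses, so the scoring rule, the threshold of 1.0, the function names, and the toy data are assumptions for illustration only.

```python
import math


def kth_neighbor(point, data, k, skip=None):
    """Return (distance, index) of the k-th nearest point in data, ignoring index skip."""
    dists = sorted((math.dist(point, p), i) for i, p in enumerate(data) if i != skip)
    return dists[k - 1]


def knndd_score(x, train, k=1):
    """Ratio of x's k-NN distance to the k-NN distance of that neighbor itself."""
    d_x, idx = kth_neighbor(x, train, k)
    d_nn, _ = kth_neighbor(train[idx], train, k, skip=idx)
    # Duplicate training points give a zero denominator; treat that as an outlier score.
    return d_x / d_nn if d_nn else math.inf


def is_normal(x, train, k=1, threshold=1.0):
    """Accept x as belonging to the training class if its score is small enough."""
    return knndd_score(x, train, k) <= threshold


# Hypothetical training signals for one class, already reduced to 2-D features.
TRAIN = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(is_normal((0.5, 0.5), TRAIN))
print(is_normal((5.0, 5.0), TRAIN))
```

For fault classification in the spirit of the abstract, one such model could be fitted per fault cause and a new signal assigned to the class whose model accepts it, although how the paper combines the per-class decisions is not described here.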

Synthesis, Structure, and Magnetic Properties of 1D Nickel Coordination Polymer Ni(en)(ox)·2H2O (en = ethylenediamine; ox = oxalate)

  • Chun, Ji-Eun;Lee, Yu-Mi;Pyo, Seung-Moon;Im, Chan;Kim, Seung-Joo;Yun, Ho-Seop;Do, Jung-Hwan
    • Bulletin of the Korean Chemical Society
    • /
    • v.30 no.7
    • /
    • pp.1603-1606
    • /
    • 2009
  • A new 1D oxalato-bridged compound, Ni(en)(ox)·2H2O (ox = oxalate; en = ethylenediamine), has been hydrothermally synthesized and characterized by single-crystal X-ray diffraction, IR spectrum, TG analysis, and magnetic measurements. In the structure, the Ni atoms are coordinated by four oxygen atoms from two oxalate ions and two nitrogen atoms from one ethylenediamine molecule. The oxalate anion acts as a bis-bidentate ligand bridging Ni atoms in cis-configuration. This completes the infinite zigzag neutral chain, [Ni(en)(ox)]. The interchain space is filled by water molecules that link the chains through a network of hydrogen bonds. The thermal variation of the magnetic susceptibility shows a broad maximum around 50 K, characteristic of one-dimensional antiferromagnetic coupling. The theoretical fit of the data for T > 20 K led to the nearest neighbor spin interaction J = -43 K and g = 2.25. The rapid decrease in susceptibility below 20 K indicates that this compound is a likely Haldane gap candidate material with S = 1.