• Title/Summary/Keyword: 최근접 이웃 방법 (nearest neighbor method)

Search Results: 109

Development of a Server-independent System to Identify and Communicate Fire Information and Location Tracking of Evacuees (화재정보 확인과 대피자 위치추적을 위한 서버 독립형 시스템 개발)

  • Lee, Chijoo;Lee, Taekwan
    • Journal of the Korea Institute of Building Construction / v.21 no.6 / pp.677-687 / 2021
  • If a fire breaks out in a building, occupants can evacuate more rapidly if they can identify the locations of the fire, the exits, and themselves. This study derives the requirements for system development, such as no distance limitation, no additional devices, no centralized server system, and low power consumption in an emergency, to identify information about the fire and the location of evacuees. The objective is to receive and transmit information and to reduce the time and effort needed to build the database for location tracking. Accordingly, this study develops a server-independent system that collects information related to a building fire and an evacuee's location and provides that information to the evacuee on their mobile device. The system is composed of a transmitting unit that disseminates fire location information and a mobile device application that determines the locations of the fire and the evacuee. The developed system can contribute to reducing human casualties because evacuees can identify the locations of the fire, the exits, and themselves even if the server system is impaired by fire or the power supply is interrupted, and regardless of the evacuee's position. Furthermore, this study proposes a theoretical basis for reducing the effort required to construct the database for k-nearest neighbor fingerprinting.
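
The location-tracking component relies on k-nearest neighbor fingerprinting, in which an observed signal vector is matched against pre-surveyed reference points. Below is a minimal sketch of that general technique, not the authors' system: the RSSI values, coordinates, and inverse-distance weighting are invented for illustration.

```python
import numpy as np

# Hypothetical fingerprint database: each row is a reference point with
# received signal strengths (RSSI, dBm) from four fixed transmitters.
fingerprints = np.array([
    [-40, -70, -80, -90],   # reference point 0
    [-55, -60, -75, -85],   # reference point 1
    [-70, -50, -65, -80],   # reference point 2
    [-85, -55, -50, -70],   # reference point 3
])
coordinates = np.array([[0, 0], [5, 0], [10, 0], [10, 5]])  # (x, y) in meters

def knn_fingerprint_locate(rssi, k=3):
    """Estimate a position by averaging the k reference points whose
    stored RSSI vectors are closest (Euclidean) to the observed one."""
    dists = np.linalg.norm(fingerprints - rssi, axis=1)
    nearest = np.argsort(dists)[:k]
    # Weight each neighbor by inverse distance so closer matches dominate.
    weights = 1.0 / (dists[nearest] + 1e-9)
    return np.average(coordinates[nearest], axis=0, weights=weights)

print(knn_fingerprint_locate(np.array([-50, -62, -77, -88])))
```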

Comparison of Effective Soil Depth Classification Methods Using Topographic Information (지형정보를 이용한 유효토심 분류방법비교)

  • Byung-Soo Kim;Ju-Sung Choi;Ja-Kyung Lee;Na-Young Jung;Tae-Hyung Kim
    • Journal of the Korean Geosynthetics Society / v.22 no.2 / pp.1-12 / 2023
  • Research on the causes of landslides and the prediction of vulnerable areas is being conducted globally. This study aims to predict the effective soil depth, a critical element in analyzing and forecasting landslide disasters, using topographic information. Topographic data from various institutions were collected and assigned as attribute information to a 100 m × 100 m grid, which was then reduced through data grading. The study predicted effective soil depth for two cases: three depth classes (shallow, normal, deep) and five depth classes (very shallow, shallow, normal, deep, very deep). Three classification models, K-Nearest Neighbor, Random Forest, and a Deep Artificial Neural Network, were used, and their performance was evaluated by calculating accuracy, precision, recall, and F1-score. Results showed that performance was in the upper 50% to low 70% range, with accuracy under the three-class criteria being about 5% higher than under the five-class criteria. Although the grading criteria and the classification models' performance presented in this study are still insufficient, the classification models can be applied to predict the effective soil depth. This study suggests the possibility of predicting more reliable values than the current effective soil depth estimates, which assume uniform values over large areas.
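
The evaluation the abstract describes, a k-NN classifier scored with accuracy, precision, recall, and F1, can be sketched as follows. The synthetic features stand in for the gridded topographic attributes; nothing here reproduces the study's actual data or tuning.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))         # stand-ins for slope, elevation, etc.
y = rng.integers(0, 3, size=1000)      # 3 depth classes: shallow/normal/deep

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Macro-averaged metrics, as commonly reported for multi-class problems.
acc = accuracy_score(y_te, pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="macro")
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```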

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.97-117 / 2020
  • Product evaluation criteria are indicators describing the attributes or values of products, which enable users or manufacturers to measure and understand them. When companies analyze their products or compare them with competitors' products, appropriate criteria must be selected for objective evaluation. The criteria should reflect the features of the products that consumers considered when they purchased, used, and evaluated them. However, current evaluation criteria do not reflect consumer opinions that differ from product to product. Previous studies tried to use online reviews from e-commerce sites, which reflect consumer opinions, to extract product features and topics and use them as evaluation criteria. However, these approaches still produce criteria irrelevant to the products because extracted or improper words are not refined. To overcome this limitation, this research suggests an LDA-k-NN model, which extracts candidate criterion words from online reviews using LDA and refines them with k-nearest neighbor. The proposed approach starts with a preparation phase consisting of six steps. First, it collects review data from e-commerce websites. Most e-commerce websites classify their items into high-level, middle-level, and low-level categories. Review data for the preparation phase are gathered from each middle-level category and later merged to represent a single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from the reviews using part-of-speech information from a morpheme analysis module. After preprocessing, the words per topic are obtained with LDA, and only the nouns among the topic words are chosen as candidate criterion words. These words are then tagged according to their suitability as criteria for each middle-level category. Next, every tagged word is vectorized using a pre-trained word embedding model. Finally, a k-nearest neighbor case-based approach is used to classify each word by tag. After the preparation phase, the criteria extraction phase is conducted on the low-level categories. This phase starts by crawling reviews in the corresponding low-level category. The same preprocessing as in the preparation phase is conducted using the morpheme analysis module and LDA. Candidate criterion words are extracted by taking nouns from the data and vectorizing them with the pre-trained word embedding model. Finally, evaluation criteria are extracted by refining the candidate words using the k-nearest neighbor approach and the reference proportion of each word in the word set. To evaluate the performance of the proposed model, an experiment was conducted with reviews on '11st', one of the biggest e-commerce companies in Korea. Review data came from the 'Electronics/Digital' section, one of the high-level categories on 11st. For performance evaluation, three other models were compared with the suggested model: the actual criteria of 11st, a model that extracts nouns with the morpheme analysis module and refines them by word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The evaluation was set up to predict the evaluation criteria of 10 low-level categories with the suggested model and the three models above. Criterion words extracted from each model were combined into a single word set, which was used in survey questionnaires. In the survey, respondents chose every item they considered an appropriate criterion for each category. Each model received a score when a chosen word had been extracted by that model.
The suggested model had higher scores than the other models in 8 out of 10 low-level categories. Paired t-tests on the scores of each model confirmed that the suggested model shows better performance in 26 of 30 tests. In addition, the suggested model was the best model in terms of accuracy. This research proposes an evaluation-criteria extraction method that combines topic extraction using LDA with refinement by the k-nearest neighbor approach. This method overcomes the limits of previous dictionary-based and frequency-based refinement models. This study can contribute to improving review analysis for deriving business insights in the e-commerce market.
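
The core of the LDA-k-NN pipeline, LDA proposing topic words and a k-NN classifier over word vectors keeping only criterion-like words, can be sketched roughly as follows. The toy corpus, the random stand-in "embeddings", and the seed tags are all invented placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import KNeighborsClassifier

reviews = [
    "battery life is great and the screen is bright",
    "screen resolution is sharp but battery drains fast",
    "fast delivery and nice packaging",
    "delivery was slow but the price was fair",
]

vec = CountVectorizer()
X = vec.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Collect the top words per LDA topic as candidate criterion words.
vocab = np.array(vec.get_feature_names_out())
candidates = {vocab[i] for comp in lda.components_ for i in comp.argsort()[-5:]}

# Stand-in for a pre-trained embedding model: map each word to a vector.
rng = np.random.default_rng(0)
embed = {w: rng.normal(size=16) for w in vocab}

# Seed words hand-tagged as criterion (1) or not (0) in the preparation phase.
seed_words = ["battery", "screen", "price", "nice", "fast", "but"]
seed_labels = [1, 1, 1, 0, 0, 0]
knn = KNeighborsClassifier(n_neighbors=3).fit(
    [embed[w] for w in seed_words], seed_labels)

# Refine: keep only candidates the k-NN classifier tags as criteria.
criteria = [w for w in candidates if knn.predict([embed[w]])[0] == 1]
print("extracted criteria:", criteria)
```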

Bibliometric Analysis on Health Information-Related Research in Korea (국내 건강정보관련 연구에 대한 계량서지학적 분석)

  • Jin Won Kim;Hanseul Lee
    • Journal of the Korean Society for Information Management / v.41 no.1 / pp.411-438 / 2024
  • This study aims to identify and comprehensively view health information-related research trends using bibliometric analysis. To this end, 1,193 papers from 2002 to 2023 related to "health information" were collected through the Korea Citation Index (KCI) database and analyzed from diverse aspects: research trends by period, academic fields, intellectual structure, and keyword changes. Results indicated that the number of papers related to health information increased steadily but has been decreasing since 2021. The main academic fields of health information-related research included "biomedical engineering," "preventive medicine/occupational environmental medicine," "law," "nursing," "library and information science," and "interdisciplinary research." Moreover, a co-word analysis was performed to understand the intellectual structure of research related to health information. Applying the parallel nearest neighbor clustering (PNNC) algorithm to identify the structure and clusters of the derived network revealed four clusters and 17 subgroups belonging to them, centered on two conglomerates: a "medical engineering perspective on health information" and a "social science perspective on health information." An inflection point analysis was performed to track the timing of changes in academic fields and keywords, and common changes were observed between 2010 and 2011. Finally, a strategy diagram was derived from the average publication year and word frequency, and high-frequency keywords were classified as "promising," "growth," or "mature." Unlike previous studies that mainly focused on content analysis, this study is meaningful in that it views the research area related to health information from an integrated perspective using various bibliometric methods.
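
As a rough illustration of the co-word analysis step (not of the PNNC algorithm itself), the sketch below builds the kind of keyword co-occurrence network that a clustering method would then partition; the keyword lists are invented.

```python
from itertools import combinations
from collections import Counter

# Invented author-keyword lists, one per paper.
papers = [
    ["health information", "privacy", "law"],
    ["health information", "nursing", "health literacy"],
    ["health information", "privacy", "biomedical engineering"],
]

# Count how often each pair of keywords appears in the same paper;
# these counts are the edge weights of the co-word network.
cooccurrence = Counter()
for keywords in papers:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooccurrence[(a, b)] += 1

for pair, count in cooccurrence.most_common(5):
    print(pair, count)
```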

Development of Regularized Expectation Maximization Algorithms for Fan-Beam SPECT Data (부채살 SPECT 데이터를 위한 정칙화된 기댓값 최대화 재구성기법 개발)

  • Kim, Soo-Mee;Lee, Jae-Sung;Lee, Soo-Jin;Kim, Kyeong-Min;Lee, Dong-Soo
    • The Korean Journal of Nuclear Medicine / v.39 no.6 / pp.464-472 / 2005
  • Purpose: SPECT using a fan-beam collimator improves spatial resolution and sensitivity. For reconstruction from fan-beam projections, it is necessary to implement direct fan-beam reconstruction methods without transforming the data into the parallel geometry. In this study, various fan-beam reconstruction algorithms were implemented and their performances were compared. Materials and Methods: The projector for fan-beam SPECT was implemented using a ray-tracing method. The direct reconstruction algorithms implemented for fan-beam projection data were FBP (filtered backprojection), EM (expectation maximization), OS-EM (ordered-subsets EM), and MAP-EM OSL (maximum a posteriori EM using the one-step-late method) with membrane and thin-plate models as priors. For comparison, the fan-beam projection data were also rebinned into parallel data using various interpolation methods, such as nearest neighbor, bilinear, and bicubic interpolation, and reconstructed using the conventional EM algorithm for parallel data. Noiseless and noisy projection data from the digital Hoffman brain and Shepp/Logan phantoms were reconstructed using the above algorithms. The reconstructed images were compared in terms of a percent error metric. Results: For the fan-beam data with Poisson noise, the MAP-EM OSL algorithm with the thin-plate prior showed the best results in both percent error and stability. Bilinear interpolation was the most effective method for rebinning from the fan-beam to the parallel geometry when accuracy and computational load were considered. Direct fan-beam EM reconstructions were more accurate than the standard EM reconstructions obtained from rebinned parallel data. Conclusion: Direct fan-beam reconstruction algorithms were implemented and provided significantly improved reconstructions.
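
At the heart of the compared algorithms is the multiplicative EM (MLEM) update. The sketch below shows that update in its generic form; the random system matrix is a placeholder for the ray-tracing fan-beam projector used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_bins = 64, 96
A = rng.random((n_bins, n_pixels))          # system (projection) matrix
x_true = rng.random(n_pixels)               # unknown activity image
y = rng.poisson(A @ x_true * 50) / 50.0     # noisy measured projections

x = np.ones(n_pixels)                       # uniform initial estimate
sens = A.sum(axis=0)                        # sensitivity (backprojected ones)
for _ in range(50):
    proj = A @ x                            # forward-project current estimate
    ratio = y / np.maximum(proj, 1e-12)     # compare with measured data
    x *= (A.T @ ratio) / sens               # multiplicative MLEM update

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```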

One-probe P300 based concealed information test with machine learning (기계학습을 이용한 단일 관련자극 P300기반 숨김정보검사)

  • Hyuk Kim;Hyun-Taek Kim
    • Korean Journal of Cognitive Science / v.35 no.1 / pp.49-95 / 2024
  • Polygraph examination, statement validity analysis, and the P300-based concealed information test are the three major examination tools used to determine a person's truthfulness and credibility in criminal procedure. Although the polygraph examination is the most common in criminal procedure, it has little admissibility as evidence due to its weak scientific basis. In the 1990s, to compensate for this weakness, Farwell and Donchin proposed the P300-based concealed information test technique. The P300-based concealed information test has two strong points. First, it is easy to conduct together with the polygraph. Second, it has a substantial scientific basis. Nevertheless, the P300-based concealed information test is used infrequently because of the number of probe stimuli it requires. A probe stimulus contains closed information relevant to the crime or other investigated situation. The traditional P300-based concealed information test protocol requires three or more probe stimuli, but these are hard to acquire because most crime-relevant information is disclosed during the investigation. In addition, the P300-based concealed information test uses an oddball paradigm, which creates an imbalance between the numbers of probe and irrelevant stimuli; this imbalance may cause systematic underestimation of the P300 amplitude of irrelevant stimuli. To overcome these two limitations, a one-probe P300-based concealed information test protocol was explored with various machine learning algorithms. According to this study, the parameters of the modified one-probe protocol are as follows. For female and male face stimuli, the recommended stimulus duration is 400 ms, the recommended number of repetitions is 60, the recommended P300 amplitude analysis method is the peak-to-peak method, the recommended cut-off for the guilty condition is 90%, and the recommended cut-off for the innocent condition is 30%. For two-syllable word stimuli, the recommended stimulus duration is 300 ms; the recommended repetitions (60), analysis method (peak-to-peak), and cut-offs (90% guilty, 30% innocent) are the same. It was also confirmed that the logistic regression (LR), linear discriminant analysis (LDA), and K Neighbors (KNN) algorithms are suitable methods for analyzing P300 amplitude. The one-probe P300-based concealed information test with machine learning protocol can help increase the utilization of the P300-based concealed information test and can support determining a person's truthfulness and credibility alongside the polygraph examination in criminal procedure.
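
A rough sketch of the analysis pipeline the study describes, peak-to-peak P300 amplitude per trial followed by a k-NN decision, appears below. The simulated ERP epochs, the 300-600 ms search window, and the subject-level averaging are illustrative assumptions, not the study's exact settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
fs = 250                                     # sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)                # 1 s epoch

def simulate_epoch(has_p300):
    eeg = rng.normal(0, 2, t.size)           # background EEG noise
    if has_p300:
        eeg += 8 * np.exp(-((t - 0.45) ** 2) / 0.005)  # P300-like bump
    return eeg

def peak_to_peak(epoch):
    win = (t >= 0.3) & (t <= 0.6)            # assumed P300 search window
    return epoch[win].max() - epoch[win].min()

def subject_feature(guilty):
    # Average the peak-to-peak amplitude over 60 repetitions, as above.
    amps = [peak_to_peak(simulate_epoch(guilty)) for _ in range(60)]
    return [np.mean(amps)]

X = [subject_feature(g) for g in [True] * 20 + [False] * 20]
y = [1] * 20 + [0] * 20                      # 1 = guilty, 0 = innocent
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("predicted:", knn.predict([subject_feature(True)]))
```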

Rainfall image DB construction for rainfall intensity estimation from CCTV videos: focusing on experimental data in a climatic environment chamber (CCTV 영상 기반 강우강도 산정을 위한 실환경 실험 자료 중심 적정 강우 이미지 DB 구축 방법론 개발)

  • Byun, Jongyun;Jun, Changhyun;Kim, Hyeon-Joon;Lee, Jae Joon;Park, Hunil;Lee, Jinwook
    • Journal of Korea Water Resources Association / v.56 no.6 / pp.403-417 / 2023
  • In this research, a methodology was developed for constructing an appropriate rainfall image database for estimating rainfall intensity from CCTV video. The database was constructed in the Large-Scale Climate Environment Chamber of the Korea Conformity Laboratories, which can control variables that show high irregularity and variability in real environments. 1,728 scenarios were designed under five different experimental conditions, from which 36 scenarios and a total of 97,200 frames were selected. Rain streaks were extracted using the k-nearest neighbor algorithm by calculating the difference between each image and the background. To prevent overfitting, only data with pixel values greater than a set threshold, relative to the average pixel value of each image, were selected. The area with maximum pixel variability was determined by shifting a window in 10-pixel steps and was set as the representative area (180×180) of the original image. After resizing to 120×120 as input for a convolutional neural network model, image augmentation was performed under unified shooting conditions. 92% of the data fell within an absolute PBIAS range of 10%. The final results of this study clearly have the potential to enhance the accuracy and efficacy of existing real-world CCTV systems through transfer learning.
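
For the rain-streak extraction step, OpenCV ships a KNN-based background subtractor that implements the same idea of differencing each frame against a learned background. A minimal sketch, assuming a hypothetical clip "cctv_rain.mp4" and illustrative thresholds:

```python
import cv2

cap = cv2.VideoCapture("cctv_rain.mp4")         # hypothetical CCTV clip
subtractor = cv2.createBackgroundSubtractorKNN(history=200,
                                               dist2Threshold=400.0,
                                               detectShadows=False)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    streaks = subtractor.apply(gray)            # foreground = moving rain pixels
    # Keep only strong responses, mirroring the thresholding step above.
    _, mask = cv2.threshold(streaks, 200, 255, cv2.THRESH_BINARY)
    cv2.imshow("rain streaks", mask)
    if cv2.waitKey(1) == 27:                    # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```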

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of a classifier. Base classifiers in an ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of the ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and the random subspace method. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained on a randomly chosen feature subspace of the original feature set, and the predictions of the ensemble members are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space, which makes KNN a good base classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving on an individual KNN model. The k parameter of the KNN base classifiers and the feature subsets selected for the base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of the base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-sample t-test of each financial ratio as an input variable against bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy against the latter portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model.
A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models. To evaluate its effectiveness, the classification accuracy of the proposed model was compared with that of the other models, and the Q-statistic values and average classification accuracies of the base classifiers were investigated. The experimental results showed that the proposed model outperformed the other models, including the single model and the random subspace ensemble model.
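
The baseline the study optimizes, a KNN random subspace ensemble, can be sketched with scikit-learn's BaggingClassifier (bootstrap disabled, feature subsampling enabled). The genetic algorithm that jointly tunes k and the feature subsets is not reproduced here; synthetic data stands in for the financial-ratio dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1800, n_features=24, n_informative=12,
                           random_state=0)  # stand-in for 24 financial ratios

# bootstrap=False with max_features < 1.0 gives the random subspace method:
# each k-NN base classifier sees a random half of the features.
ensemble = BaggingClassifier(
    estimator=KNeighborsClassifier(n_neighbors=5),
    n_estimators=30,
    max_features=0.5,
    bootstrap=False,
    random_state=0,
)
scores = cross_val_score(ensemble, X, y, cv=10)  # 10-fold CV, as in the study
print(f"mean accuracy: {scores.mean():.3f}")
```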

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction, but these statistical methods have not produced superior performance. In recent years, machine learning techniques have been widely used in stock market prediction, including artificial neural networks, SVM, and genetic algorithms. In particular, a case-based reasoning method known as k-nearest neighbor is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors, rather than the best similar neighbors for the target case, so it may take more cases into account even when fewer cases are applicable to the subject. Second, case-based reasoning may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability through k-nearest neighbor and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted by dividing the learning dataset into two types. For predicting the next day's closing price, four variables were used: the opening value, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used. The test data ran from January 1, 2018 to August 31, 2018 for both experiments. The performance of k-NN was compared with the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the first experiment, when the learning data were small. However, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN in the second experiment, when the learning data were large. These results show that prediction power is higher when more learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, high, low, and closing prices. To produce better results, it is also recommended that k-nearest neighbor find its nearest neighbors using a second-step filtering method that considers fundamental economic variables, as well as a sufficient amount of learning data.
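
The comparison the abstract reports can be sketched as follows: k-NN regression on (open, high, low, close) features against a random-walk benchmark (tomorrow's close = today's close), scored by MAPE. The simulated price series is a placeholder for the Samsung Electronics data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
close = 50000 * np.exp(np.cumsum(rng.normal(0, 0.01, 1500)))  # synthetic walk
open_ = close * (1 + rng.normal(0, 0.003, 1500))
high = np.maximum(open_, close) * 1.005
low = np.minimum(open_, close) * 0.995

X = np.column_stack([open_, high, low, close])[:-1]   # today's OHLC
y = close[1:]                                         # next day's close

split = 1200                                          # train/test boundary
knn = KNeighborsRegressor(n_neighbors=5).fit(X[:split], y[:split])

def mape(actual, pred):
    return np.mean(np.abs((actual - pred) / actual)) * 100

print("k-NN MAPE:       ", mape(y[split:], knn.predict(X[split:])))
print("random walk MAPE:", mape(y[split:], close[split:-1]))
```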