• Title/Summary/Keyword: k-Nearest neighbor


Assessment of Climate Change Effect on Temperature and Drought in Seoul : Based on the AR4 SRES A2 Scenario (기후변화가 서울지역의 기온 및 가뭄에 미치는 영향 평가 : AR4 SRES A2 시나리오를 기반으로)

  • Kyoung, Minsoo;Lee, Yongwon;Kim, Hungsoo;Kim, Byungsik
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.2B / pp.181-191 / 2009
  • This study proposes a technique for assessing the effect of climate change on drought in Korea, based on the AR4 SRES A2 scenario from the IPCC Fourth Assessment Report (2007). The IPCC provides monthly outputs of 24 climate models through the DDC. One of these is the BCM2 model, developed at BCCR in Norway; NCEP data are used for downscaling. K-NN (K-Nearest Neighbor) and ANN (Artificial Neural Network) were selected as the techniques for downscaling temperature and precipitation at the Seoul station. K-NN downscaled both temperature and precipitation well. ANN gave good results for temperature, but its precipitation results diverged. Finally, the SPI of the Seoul station was computed to evaluate the effect of climate change on drought. BCM2 predicted that temperature will rise and that drought severity will increase at the Seoul station because of increased drought spells.

Overview of Research Trends in Estimation of Forest Carbon Stocks Based on Remote Sensing and GIS (원격탐사와 GIS 기반의 산림탄소저장량 추정에 관한 주요국 연구동향 개관)

  • Kim, Kyoung-Min;Lee, Jung-Bin;Kim, Eun-Sook;Park, Hyun-Ju;Roh, Young-Hee;Lee, Seung-Ho;Park, Key-Ho;Shin, Hyu-Seok
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.3 / pp.236-256 / 2011
  • Changes in forest carbon stocks due to land use change are important data required by the UNFCCC (United Nations Framework Convention on Climate Change). Spatially explicit estimation of forest carbon stocks based on IPCC GPG (Intergovernmental Panel on Climate Change Good Practice Guidance) tier 3 gives high reliability. However, the current estimate, aggregated from NFI data, does not provide detailed forest carbon stocks by polygon or cell. To improve estimation, remote sensing and GIS have been used, especially in Europe and North America. We divided research trends in the main countries into four categories: remote sensing, GIS, geostatistics, and environmental modeling that considers spatial heterogeneity. The easiest approach to apply is to combine NFI data with a GIS-based forest type map. Given the particularly complicated forest structure of Korea, geostatistics is useful for estimating local variation in forest carbon. In addition, fine-scale imagery is good for verifying forest carbon stocks and determining CDM sites. Related domestic research is still at an initial stage, and forest carbon stocks are mainly estimated using k-nearest neighbor (k-NN). To select a suitable method for forests in Korea, the applicability of diverse spatial data and algorithms must be considered. A comparison between methods is also required.

Research on Oriental Medicine Diagnosis and Classification System by Using Neck Pain Questionnaire (경항통 설문지를 이용한 한의학적 진단 및 분류체계에 관한 연구)

  • Song, In;Lee, Geon-Mok;Hong, Kwon-Eui
    • Journal of Acupuncture Research / v.28 no.3 / pp.85-100 / 2011
  • Objectives : The purpose of this study is to support the preparation of oriental medicine clinical guidelines by drawing up standards for oriental medicine demonstration and diagnosis classification of neck pain. Methods : A statistical analysis of Gyeonghangtong(頸項痛), Nakchim(落枕), Sagyeong(斜頸), and Hanggang(項强), the categories into which experts' opinions on neck pain patients were classified by the Delphi method, was conducted using an oriental medicine diagnosis questionnaire. The results were classified using linear discriminant analysis (LDA), diagonal linear discriminant analysis (DLDA), diagonal quadratic discriminant analysis (DQDA), K-nearest neighbor classification (KNN), classification and regression trees (CART), and support vector machines (SVM). Results : The results are summarized as follows. 1. LDA had a hit rate of 84.47% in comparison with the original diagnosis. 2. A high hit rate was obtained when the test was conducted on three categories: a combined Gyeonghangtong and Hanggang category, a Sagyeong category, and a Nakchim category. 3. DLDA had a hit rate of 58.25% in comparison with the original diagnosis, and DQDA had an accuracy of 57.28%. 4. KNN had a hit rate of 69.90% in comparison with the original diagnosis. 5. CART had a hit rate of 69.60% in comparison with the original diagnosis. The hit rate was 70.87% when the test was performed on 8 significant questions selected by analysis of variance. 6. SVM had a hit rate of 80.58% in comparison with the original diagnosis. Conclusions : Statistical analysis using the oriental medicine diagnosis questionnaire on neck pain generally yielded significant results.

Experiments of Unmanned Underwater Vehicle's 3 Degrees of Freedom Motion Applied the SLAM based on the Unscented Kalman Filter (무인 잠수정 3자유도 운동 실험에 대한 무향 칼만 필터 기반 SLAM기법 적용)

  • Hwang, A-Rom;Seong, Woo-Jae;Jun, Bong-Huan;Lee, Pan-Mook
    • Journal of Ocean Engineering and Technology / v.23 no.2 / pp.58-68 / 2009
  • The increased use of unmanned underwater vehicles (UUV) has led to the development of alternative navigational methods that do not employ acoustic beacons and dead reckoning sensors. This paper describes a simultaneous localization and mapping (SLAM) scheme that uses range sonars mounted on a small UUV. A SLAM scheme is an alternative navigation method for measuring the environment through which the vehicle is passing and providing the relative position of the UUV. A technique for a SLAM algorithm that uses several ranging sonars is presented. This technique utilizes an unscented Kalman filter to estimate the locations of the UUV and surrounding objects. In order to work efficiently, the nearest neighbor standard filter is introduced as the data association algorithm in the SLAM for associating the stored targets returned by the sonar at each time step. The proposed SLAM algorithm was tested by experiments under various three degrees of freedom motion conditions. The results of these experiments showed that the proposed SLAM algorithm was capable of estimating the position of the UUV and the surrounding objects and demonstrated that the algorithm will perform well in various environments.
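The data association step mentioned in this abstract can be illustrated with a minimal sketch. It assumes 2-D point landmarks, plain Euclidean distance, and a fixed acceptance gate; a nearest neighbor standard filter would normally gate on a statistical distance derived from the filter's innovation covariance, which is omitted here. The function name `associate_nearest` and the `gate` value are hypothetical and do not come from the paper.

```python
import math

def associate_nearest(measurement, landmarks, gate=1.0):
    """Return the index of the stored landmark closest to `measurement`,
    or None when no landmark lies within `gate` (i.e. treat it as new)."""
    best_idx, best_dist = None, gate
    for idx, (lx, ly) in enumerate(landmarks):
        # Euclidean distance is an assumption made for this sketch.
        dist = math.hypot(measurement[0] - lx, measurement[1] - ly)
        if dist <= best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Illustrative landmark positions, not experimental data.
landmarks = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
print(associate_nearest((4.6, 5.3), landmarks))    # closest stored landmark
print(associate_nearest((20.0, 20.0), landmarks))  # nothing inside the gate
```

Returning `None` for an unmatched return lets the caller decide whether to initialize a new landmark or discard the measurement.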

Effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment

  • Kim, Byung-Soo;Rha, Sun-Young
    • Bioinformatics and Biosystems / v.1 no.1 / pp.67-72 / 2006
  • The aim of this paper is to discuss the effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment in the context of a one-sample problem. We conducted a cDNA microarray experiment to detect differentially expressed genes for the metastasis of colorectal cancer, based on twenty patients who underwent liver resection due to liver metastasis from colorectal cancer. Total RNAs from metastatic liver tumor and adjacent normal liver tissue from a single patient were labeled with Cy5 and Cy3, respectively, and competitively hybridized to a cDNA microarray with 7775 human genes. We used $M=\log_2(R/G)$ for the signal evaluation, where R and G denote the fluorescent intensities of the Cy5 and Cy3 dyes, respectively. The statistical problem comprises a one-sample test of E(M)=0 for each gene and involves multiple tests. The twenty cDNA microarray data sets would form a matrix of dimension 7775 by 20 if there were no missing values. However, missing values occur for various reasons. For each gene, the non-missing proportion (NMP) was defined as the proportion of non-missing values out of twenty. In detecting differentially expressed (DE) genes, we first used the genes whose NMP was greater than or equal to 0.4 and then increased the NMP threshold in steps of 0.1 to investigate its effect on the detection of DE genes. For each fixed NMP, we imputed the missing values with the K-nearest neighbor method (K=10) and applied the nonparametric t-test of Dudoit et al. (2002), SAM by Tusher et al. (2001), and the empirical Bayes procedure by Lönnstedt and Speed (2002) to assess the effect of missing values on the final outcome. These three procedures agreed substantially in detecting DE genes. Of the three, we used SAM to explore the acceptable NMP level. The optimum NMP for this data set turned out to be 80%. It is more desirable to find the optimum level of NMP for each data set by applying the method described in this note when the plot of (NMP, number of overlapping genes) shows a turning point.
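The filtering and imputation steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes missing values are stored as `None`, that gene-to-gene distance is the root mean squared difference over arrays observed in both genes, and that the imputed value is an unweighted mean over the neighbors. The function names and the toy matrix are hypothetical.

```python
import math

def log_ratio(r, g):
    """M = log2(R/G) for one spot."""
    return math.log2(r / g)

def non_missing_proportion(row):
    """Fraction of arrays in which this gene has an observed value."""
    return sum(v is not None for v in row) / len(row)

def knn_impute(matrix, k=10):
    """Fill each None with the mean of that column over the k nearest genes."""
    def distance(a, b):
        # Compare only the columns observed in both rows (an assumption).
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Neighbors must have column j observed to contribute a value.
            candidates = sorted(
                (distance(row, other), n)
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            )
            nearest = [matrix[n][j] for d, n in candidates[:k] if d < math.inf]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Toy genes-by-arrays matrix of M values; illustrative only.
matrix = [
    [1.0, 2.0, None, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 2.2, 3.4, 4.2],
    [-3.0, -2.0, -1.0, 0.0],
    [None, None, None, 0.5],
]
kept = [row for row in matrix if non_missing_proportion(row) >= 0.4]
filled = knn_impute(kept, k=2)
print(len(kept), round(filled[0][2], 3))
```

Raising the `0.4` threshold in steps and repeating the downstream test mirrors the sequential NMP sweep the abstract describes.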


Fast Search with Data-Oriented Multi-Index Hashing for Multimedia Data

  • Ma, Yanping;Zou, Hailin;Xie, Hongtao;Su, Qingtang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.7 / pp.2599-2613 / 2015
  • Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes, as it divides long codes into substrings and builds multiple hash tables. However, MIH assumes that the dataset codes are uniformly distributed, and it loses efficiency when dealing with non-uniformly distributed codes. Moreover, many results share the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method (DOMIH). We first compute the covariance matrix of the bits and learn an adaptive projection vector for each binary substring. Instead of using substrings as direct indices into hash tables, we project them with the corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are nearly uniformly distributed. Then, using the covariance matrix, we propose a ranking method for the binary codes. By assigning different bit-level weights to different bits, the returned binary codes are ranked at a finer-grained binary code level. Experiments conducted on large-scale reference datasets show that, compared to MIH, the time performance of DOMIH can be improved by 36.9%-87.4% and the search accuracy can be improved by 22.2%. To demonstrate the potential of DOMIH, we further use near-duplicate image retrieval as an example to show the applications and the good performance of our method.
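The baseline MIH idea of splitting codes into substrings and building one hash table per substring can be sketched as below. This illustrates plain MIH only, not the DOMIH projection or ranking steps. It assumes codes are stored as Python integers, that the code length is divisible by the number of substrings, and that the search radius is smaller than the number of substrings, so that by the pigeonhole principle any match agrees exactly with the query on at least one substring. All function names are hypothetical.

```python
from collections import defaultdict

def split_code(code, bits, m):
    """Split a `bits`-bit integer code into m equal-width substrings."""
    width = bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_tables(codes, bits, m):
    """One hash table per substring position, mapping substring -> code indices."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for t, sub in enumerate(split_code(code, bits, m)):
            tables[t][sub].append(idx)
    return tables

def search(query, codes, tables, bits, m, radius):
    """Indices of codes within Hamming distance `radius` of `query`."""
    assert radius < m, "sketch only covers exact substring lookups"
    candidates = set()
    for t, sub in enumerate(split_code(query, bits, m)):
        candidates.update(tables[t].get(sub, []))
    # Verify candidates with the full Hamming distance.
    return sorted(i for i in candidates
                  if bin(codes[i] ^ query).count("1") <= radius)

codes = [0b10110010, 0b10110011, 0b01001101, 0b10100110]
tables = build_tables(codes, bits=8, m=4)
print(search(0b10110010, codes, tables, bits=8, m=4, radius=1))
```

Larger radii require probing substrings within a small Hamming ball around each query substring, which this sketch leaves out for brevity.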

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD / v.19D no.2 / pp.151-160 / 2012
  • Most distributed high-dimensional indexing structures provide reasonable search performance, especially when the dataset is uniformly distributed. However, when the dataset is clustered or skewed, search performance gradually degrades compared with the uniformly distributed case. We propose a method for improving the k-nearest neighbor search performance of the distributed vector approximation-tree on strongly clustered or skewed datasets. The basic idea is to compute the volumes of the leaf nodes on the top-tree of a distributed vector approximation-tree and to assign different numbers of bits to them in order to preserve the discriminating power of the vector approximation. In other words, more bits are assigned to the high-density clusters. We conducted experiments comparing the search performance with the distributed hybrid spill-tree and the distributed vector approximation-tree using synthetic and real data sets. The experimental results show that our proposed scheme provides consistent results with significant performance improvements over the distributed vector approximation-tree for strongly clustered or skewed datasets.

Performance comparison of machine learning classification methods for decision of disc cutter replacement of shield TBM (쉴드 TBM 디스크 커터 교체 유무 판단을 위한 머신러닝 분류기법 성능 비교)

  • Kim, Yunhee;Hong, Jiyeon;Kim, Bumjoo
    • Journal of Korean Tunnelling and Underground Space Association / v.22 no.5 / pp.575-589 / 2020
  • In recent years, shield TBM construction has been continuously increasing in domestic tunnels. The main excavation tool in shield TBM construction is the disc cutter, which naturally wears during the excavation process and significantly degrades excavation efficiency. Therefore, it is important to know the appropriate time for disc cutter replacement. This study proposes a predictive model that uses a machine learning algorithm to determine whether or not a disc cutter should be replaced. To do this, shield TBM machine data that are highly correlated with disc cutter wear, together with disc cutter replacement records from a completed shield TBM site, are used as the input data for the model. The algorithms used in the study were the support vector machine, the k-nearest neighbor algorithm, and the decision tree algorithm, all of which are classification methods used in machine learning. To construct an optimal predictive model and evaluate its performance, the classification performance evaluation indices were compared and analyzed.
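A k-nearest neighbor classifier for a yes/no decision of this kind can be sketched in a few lines. This is a generic illustration, not the model from the study: the two input features, the training values, the labels `"keep"` and `"replace"`, and the choice of `k=3` with Euclidean distance are all made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up two-feature machine readings; not data from the study.
train_x = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
           (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
train_y = ["keep", "keep", "keep", "replace", "replace", "replace"]

print(knn_predict(train_x, train_y, (2.8, 3.1)))
print(knn_predict(train_x, train_y, (1.1, 1.0)))
```

In practice the features would be scaled to comparable ranges first, since k-NN distances are dominated by whichever feature has the largest numeric spread.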

Stock prediction analysis through artificial intelligence using big data (빅데이터를 활용한 인공지능 주식 예측 분석)

  • Choi, Hun
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1435-1440 / 2021
  • With the advent of the low interest rate era, many investors are flocking to the stock market. In the past, people invested in stocks labor-intensively, through company analysis and their own investment techniques. In recent years, however, stock investment using artificial intelligence and data has become widespread. The success rate of stock prediction through artificial intelligence is currently not high, so various artificial intelligence models are being used in an attempt to raise the prediction rate. In this study, we look at various artificial intelligence models and examine the pros, cons, and prediction rates of each model. The stock prediction approaches investigated in this study are the artificial neural network (ANN), deep learning or hierarchical learning (DNN), the k-nearest neighbor algorithm (k-NN), the convolutional neural network (CNN), the recurrent neural network (RNN), and LSTMs.

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most research on classification has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, there are limitations of space and time when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a uni-gram feature representation, which does not represent the real meaning of words well. Korean web page classification faces additional problems because Korean words often have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed to classify well in this environment (large data sets and word polysemy). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three different matrices and reduces their dimensions. Through this SVD step, it is possible to create a new low-level semantic space for representing vectors, which can make classification efficient and allow the latent meaning of words or documents (or web pages) to be analyzed. Although LSA is good at classification, it has some drawbacks. When SVD reduces the dimensions of the matrix and creates a new semantic space, it considers which dimensions represent the vectors well, but not which dimensions discriminate between them well. This is one reason why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects optimal dimensions to both discriminate and represent vectors well, minimizing these drawbacks and improving performance. The proposed method shows better and more stable performance than other LSA methods in a low-dimensional space. In addition, we obtain further improvement in classification by creating and selecting features, removing stopwords, and statistically weighting specific values.
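The SVD step described in this abstract can be written out as follows. This is the standard textbook form of LSA's rank-$k$ truncation, shown for orientation only; it is not the selective dimension method the paper proposes. The symbols $A$ (term-document matrix with $t$ terms and $d$ documents), $U$, $\Sigma$, $V$, $k$, and the folded-in document vector $\hat{q}$ are notation introduced here, not taken from the paper.

```latex
\begin{align}
A &= U \Sigma V^{\top}, & A &\in \mathbb{R}^{t \times d} \\
A_k &= U_k \Sigma_k V_k^{\top}, & k &\le \operatorname{rank}(A) \\
\hat{q} &= \Sigma_k^{-1} U_k^{\top} q, & q &\in \mathbb{R}^{t}
\end{align}
```

Here $U_k$ and $V_k$ keep the first $k$ columns of $U$ and $V$, and $\Sigma_k$ keeps the $k$ largest singular values. Standard LSA keeps dimensions in order of singular value, which is the "represents well" criterion the abstract contrasts with choosing dimensions that discriminate between classes.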
