• Title/Summary/Keyword: euclidean similarity

Path Prediction of Moving Objects on Road Networks through Analyzing Past Trajectories (도로 네트워크에서 이동 객체의 과거 궤적 분석을 통한 미래 경로 예측)

  • Kim, Jong-Dae; Won, Jung-Im; Kim, Sang-Wook
    • Journal of Korea Spatial Information System Society / v.8 no.2 s.17 / pp.109-120 / 2006
  • This paper addresses techniques for predicting the future path of an object moving on a road network. Most prior methods for future prediction focus on objects moving in Euclidean space. Many applications such as telematics, however, deal with objects that in most cases move only over road networks, and therefore require an effective method for predicting the future paths of moving objects on road networks. In this paper, we propose a novel method for predicting the future path of an object by analyzing past trajectories whose changing pattern is similar to that of the current trajectory of a query object. We devise a new function that measures the similarity between trajectories by reflecting the characteristics of road networks. Using this function, we predict the future path of a given moving object as follows. First, we search the past trajectories stored in moving object databases for candidate trajectories that contain subtrajectories similar to the given query trajectory. Then, we predict the future path of the query object by analyzing the paths that the retrieved candidate trajectories follow from the current position to their destinations. We also suggest a method that improves the accuracy of path prediction by treating moving paths that differ only slightly as the same group.

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A; Koo, Kyo Jung; Cha, Ji Won; Seo, Ah Jeong; Yeo, Un Yeong; Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.109-125 / 2019
  • As interest in intelligent search engines increases, various studies have been conducted to extract and utilize product-related features intelligently. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to deal with the synonyms of color terms in order to produce accurate results for users' color-related queries. Previous studies have suggested a dictionary-based approach to processing synonyms for color features. However, the dictionary-based approach has the limitation that it cannot handle unregistered color-related terms in user queries. To overcome this limitation, this research proposes a model that extracts RGB values from an internet search engine in real time and outputs similar color names based on the designated color information. First, a color term dictionary was constructed that includes color names and the R, G, B values of each color from the Korean color standard digital palette program and the Wikipedia color list for the basic color search. The dictionary was made more robust by adding 138 color names converted from English color names into Korean loanwords, with their corresponding RGB values. As a result, the final color dictionary includes a total of 671 color names and corresponding RGB values. The proposed method starts with the specific color that a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If the color exists in the dictionary, its RGB values in the dictionary are used as the reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results for the searched color are crawled and average RGB values are extracted from a certain middle area of each image. To extract the RGB values from images, a variety of approaches were attempted, since simply averaging the RGB values of the center area of images has limits. As a result, clustering the RGB values in a certain area of the image and taking the average value of the cluster with the highest density as the reference values showed the best performance. The reference RGB values of the searched color are then compared with the RGB values of all the colors in the previously constructed color dictionary. Then a color list is created with colors within the range of ${\pm}50$ for each of the R, G, and B values. Finally, using the Euclidean distance between the above results and the reference RGB values of the searched color, the color with the highest similarity among up to five colors becomes the final outcome. To evaluate the usefulness of the proposed method, we performed an experiment in which 300 color names and corresponding RGB values were obtained through questionnaires. They were used to compare the RGB values obtained from four different methods, including the proposed method. The average Euclidean distance in CIE-Lab using our method was about 13.85, a relatively low distance compared to 3088 for the case using the synonym dictionary only and 30.38 for the case using the dictionary with the Korean synonym website WordNet. The variant of the proposed method without clustering showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with real-time synonym processing for new color names. This method removes the limitation of the dictionary-based approach, which is the conventional synonym processing method. This research can contribute to improving the intelligence of e-commerce search systems, especially the color search feature.
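
The final matching step described in this abstract (a ±50 per-channel filter followed by ranking by Euclidean distance) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the five dictionary entries, the function name `similar_colors`, and its parameters are hypothetical, and the distance is computed in plain RGB space.

```python
import math

# Hypothetical mini dictionary (name -> RGB); the paper's dictionary has 671 entries.
COLOR_DICT = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "tomato": (255, 99, 71),
    "navy": (0, 0, 128),
    "sky blue": (135, 206, 235),
}


def similar_colors(reference, color_dict=COLOR_DICT, window=50, top_k=5):
    """Return up to top_k color names closest to the reference RGB triple."""
    # Step 1: keep only colors whose R, G and B each lie within +/- window.
    candidates = {
        name: rgb
        for name, rgb in color_dict.items()
        if all(abs(c - r) <= window for c, r in zip(rgb, reference))
    }
    # Step 2: rank the surviving candidates by Euclidean distance to the reference.
    ranked = sorted(candidates, key=lambda name: math.dist(candidates[name], reference))
    return ranked[:top_k]
```

Calling `similar_colors((230, 30, 50))` with this toy dictionary keeps only "crimson" and "red" after the window filter and ranks "crimson" first. The first element of the returned list plays the role of the most similar color.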

Habitat Classification and Distribution Characteristic of Aquatic Insect Functional Feeding Groups in the Geum River, Korea (금강 수계 서식지 유형분류 및 수서곤충 섭식기능군 분포특성)

  • Park, Young-Jun; Kim, Ki-Dong; Cho, Young-Ho; Han, Yong-Gu; Kim, Yeong-Jin; Nam, Sang-Ho
    • Korean Journal of Environment and Ecology / v.25 no.5 / pp.691-709 / 2011
  • This study was performed to classify habitat types according to environmental factors and to determine the distribution characteristics of the functional feeding groups of aquatic insects collected in those habitat types. Field surveys were conducted twice a year, in spring and fall, from 2007 to 2008 at 38 sites in the Geum River. During the field surveys, 15 environmental factors were measured at each of the 38 sites and analyzed by a similarity analysis method to classify habitat types. The similarity analysis showed that, at a Euclidean distance of 4, the 38 sites could be grouped into 7 classes: classes C1 and C3 belong to Head water (HD), classes C2, C4 and C5 belong to Middle stream (MS), and classes C6 and C7 belong to Large River (LR). We could also extract the main environmental factors affecting the classification of habitat types: Stream Width and Elevation among the physical factors; Water Temperature, Conductivity and DO among the chemical factors; and the percentages of Sand, Silt and Gravel among the substrate factors. A total of 142 species of aquatic insects in 46 families and 9 orders were collected during the field surveys, and the numbers of species and individuals showed a high correlation with the Velocity factor and the percentage of Sand in each habitat type. In addition, correlation analysis between functional feeding groups and environmental factors indicated that (1) Filtering-collectors (FC) were affected by Velocity, Stream Width and Silt, (2) Gathering-collectors (GC) by Velocity, (3) Predators (P) by Elevation, Velocity, Boulder, Conductivity and Sand, (4) Plant-piercers (PP) by Water Width and Silt, (5) Scrapers (SC) by Elevation and Conductivity, and (6) Shredders (SH) by Elevation, Boulder, DO, pH, Conductivity and Water Temperature. As a result of this study, Elevation, Stream Width, Velocity, Conductivity, Water Temperature and the percentage of Sand, which were derived by stepwise multiple regression analysis, were correlated ($r{\geqq}0.600$, p<0.01) with the inhabitation of the biotic community. Therefore, these six environmental factors were regarded as the major environmental factors that may strongly affect the distribution of functional feeding groups in the stream ecosystem of the Geum River.

Video Scene Detection using Shot Clustering based on Visual Features (시각적 특징을 기반한 샷 클러스터링을 통한 비디오 씬 탐지 기법)

  • Shin, Dong-Wook; Kim, Tae-Hwan; Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.47-60 / 2012
  • Video data is unstructured and structurally complex. As the importance of efficient management and retrieval of video data increases, video parsing based on the visual features contained in video content is being studied in order to reconstruct video data into a meaningful structure. Early studies on video parsing focused on splitting video data into shots, but detecting shot boundaries, which are defined by physical boundaries, does not consider the semantic association of video data. Recently, studies that use clustering methods to organize semantically associated video shots into video scenes, which are defined by semantic boundaries, have been actively pursued. Previous studies on video scene detection apply clustering algorithms based on a similarity measure between video shots that depends mainly on color features. However, correctly identifying a video shot or scene and detecting gradual transitions such as dissolve, fade and wipe are difficult, because the color features of video data contain noise and change abruptly when an unexpected object intervenes. In this paper, to solve these problems, we propose the Scene Detector by using Color histogram, corner Edge and Object color histogram (SDCEO), which detects video scenes by clustering similar shots that make up the same event, based on visual features including the color histogram, the corner edge and the object color histogram. SDCEO is noteworthy in that it uses the edge feature together with the color feature and, as a result, effectively detects gradual transitions as well as abrupt transitions. SDCEO consists of the Shot Bound Identifier and the Video Scene Detector. The Shot Bound Identifier comprises the Color Histogram Analysis step and the Corner Edge Analysis step. In the Color Histogram Analysis step, SDCEO uses the color histogram feature to organize shot boundaries. The color histogram, which records the percentage of each quantized color among all pixels in a frame, is chosen for its good performance, as also reported in other work on content-based image and video analysis. To organize shot boundaries, SDCEO joins associated sequential frames into shot boundaries by measuring the similarity of the color histograms between frames. In the Corner Edge Analysis step, SDCEO identifies the final shot boundaries by using the corner edge feature. SDCEO detects associated shot boundaries by comparing the corner edge feature between the last frame of the previous shot boundary and the first frame of the next shot boundary. In the Key-frame Extraction step, SDCEO compares each frame with all frames and measures the similarity by using the histogram Euclidean distance, and then selects as the key-frame the frame most similar to all frames contained in the same shot boundary. The Video Scene Detector clusters associated shots that make up the same event by using the hierarchical agglomerative clustering method based on visual features including the color histogram and the object color histogram. After detecting video scenes, SDCEO organizes the final video scenes by repeating the clustering until the similarity distance between shot boundaries is less than the threshold h. In this paper, we construct a prototype of SDCEO and carry out experiments with manually constructed baseline data; the experimental results, a precision of 93.3% for shot boundary detection and 83.3% for video scene detection, are satisfactory.
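
The Key-frame Extraction step in this abstract can be read as choosing the frame whose histogram is, in total, closest to every frame of the shot. The sketch below illustrates that reading only; the abstract does not specify how the per-frame distances are aggregated, so summing them is an assumption, and the function name `key_frame_index` and the toy three-bin histograms are hypothetical.

```python
import math


def key_frame_index(histograms):
    """Return the index of the frame with the smallest summed Euclidean
    distance from its color histogram to the histograms of all frames."""

    def total_distance(i):
        # Distance of frame i to every frame in the shot (itself contributes 0).
        return sum(math.dist(histograms[i], h) for h in histograms)

    return min(range(len(histograms)), key=total_distance)


# Toy shot: each histogram holds the fraction of pixels in three quantized colors.
shot = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
]
print(key_frame_index(shot))
```

In this toy shot the middle frame lies between the other two, so it is the one selected. Summing distances makes the choice a medoid of the shot; a different aggregation (for example the maximum distance) would be an equally plausible reading of the text.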

An Expert System for the Estimation of the Growth Curve Parameters of New Markets (신규시장 성장모형의 모수 추정을 위한 전문가 시스템)

  • Lee, Dongwon; Jung, Yeojin; Jung, Jaekwon; Park, Dohyung
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.17-35 / 2015
  • Demand forecasting is the activity of estimating the quantity of a product or service that consumers will purchase over a certain period of time. Developing precise forecasting models is considered important because companies can make strategic decisions on new markets based on the future demand estimated by the models. Many studies have developed market growth curve models, such as the Bass, Logistic and Gompertz models, which estimate future demand when a market is in its early stage. Among these, the Bass model, which explains demand in terms of two types of adopters, innovators and imitators, has been widely used in forecasting. Such models require sufficient demand observations to ensure reliable results. At the beginning of a new market, however, observations are not sufficient for the models to estimate the market's future demand precisely. For this reason, demand inferred from that of the most adjacent markets is often used as a reference in such cases. Reference markets can be those whose products are developed with the same categorical technologies. A market's demand may be expected to follow a pattern similar to that of a reference market when the adoption pattern of a product in the market is determined mainly by the technology related to the product. However, such processes may not always yield satisfactory results because the similarity between markets is judged by intuition and/or experience. There are two major drawbacks that human experts cannot effectively handle in this approach. One is the abundance of candidate reference markets to consider, and the other is the difficulty of calculating the similarity between markets. First, there can be too many markets to consider when selecting reference markets. In most cases, markets in the same category of an industrial hierarchy can be reference markets because they are usually based on similar technologies. However, markets can be classified into different categories even if they are based on the same generic technologies. Therefore, markets in other categories also need to be considered as potential candidates. Second, even domain experts cannot consistently calculate the similarity between markets with their own qualitative standards. This inconsistency means that adjacent reference markets may be missed, which may lead to imprecise estimation of future demand. Even if no reference markets are missed, the new market's parameters can hardly be estimated from the reference markets without quantitative standards. For this reason, this study proposes a case-based expert system that helps experts overcome these drawbacks in discovering reference markets. First, this study proposes using the Euclidean distance measure to calculate the similarity between markets. Based on their similarities, markets are grouped into clusters. Then, missing markets with the characteristics of the cluster are searched for. Potential candidate reference markets are extracted and recommended to users. After these steps are iterated, the definitive reference markets are determined according to the user's selection among those candidates. Finally, the new market's parameters are estimated from the reference markets. Two techniques are used in this procedure: one is the clustering technique from data mining, and the other is content-based filtering from recommender systems. The proposed system, implemented with these techniques, can determine the most adjacent markets based on whether a user accepts candidate markets. Experiments were conducted with five ICT experts to validate the usefulness of the system. In the experiments, the experts were given a list of 16 ICT markets whose parameters were to be estimated. For each market, the experts estimated the parameters of its growth curve models first by intuition and then with the system. A comparison of the experimental results shows that the estimated parameters are closer when the experts use the system than when they guess them without it.

The Principles of Fractal Geometry and Its Applications for Pulp & Paper Industry (펄프·제지 산업에서의 프랙탈 기하 원리 및 그 응용)

  • Ko, Young Chan; Park, Jong-Moon; Shin, Soo-Jung
    • Journal of Korea Technical Association of The Pulp and Paper Industry / v.47 no.4 / pp.177-186 / 2015
  • Until Mandelbrot introduced the concepts of fractal geometry and fractal dimension in the early 1970s, it was generally considered that the geometry of nature was too complex and irregular to describe analytically or mathematically. Here, a fractal dimension is a non-integer number such as 0.5, 1.5, or 2.5, in contrast to the integers used in traditional Euclidean geometry, i.e., 0 for a point, 1 for a line, 2 for an area, and 3 for a volume. Since his pioneering work on fractal geometry, the geometry of nature has been found to be fractal. For example, fractal geometry has been found in mountains, coastlines, clouds, lightning, earthquakes, turbulence, trees and plants. Even human organs are found to be fractal. This suggests that fractal geometry is the rule in nature rather than the exception. Fractal geometry has a hierarchical structure consisting of elements that have the same shape but different sizes, from the largest to the smallest. Thus, fractal geometry can be characterized by similarity and hierarchical structure. A process requires driving energy to proceed; otherwise, the process would stop. A hierarchical structure is considered ideal for generating such a driving force. This explains why natural processes and phenomena such as lightning, thunderstorms, earthquakes, and turbulence have fractal geometry. It would not be surprising to find that even human organs such as the brain, the lungs, and the circulatory system have fractal geometry. Until now, a normal (Gaussian) frequency distribution has commonly been used to describe the frequencies of an object. However, a log-normal frequency distribution has been found most frequently in natural phenomena and chemical processes such as corrosion and coagulation. It can be shown mathematically that if an object has a log-normal frequency distribution, it has fractal geometry. In other words, the two go hand in hand. Lastly, the application of fractal principles is discussed, focusing on the pulp and paper industry. The principles should be applicable to characterizing surface roughness, particle size distributions, and formation. They should also be applicable to wet-end chemistry for ideal mixing, felt and fabric design for the papermaking process, dewatering, drying, creping, and post-converting such as laminating, embossing, and printing.

A Systematic Study of the Theaceae 6 Species in Korea (한국산(韓國產) 차나무과(科) 6종(種)의 계통(系統) 분류학적(分類學的) 연구(硏究))

  • Kim, Sam Sik; Lee, Jeong Hwan
    • Journal of Korean Society of Forest Science / v.82 no.4 / pp.431-440 / 1993
  • This study was carried out to clarify the taxonomic relationships of the Korean Theaceae using characters from morphological, anatomical, electrophoretic and numerical methods. The results are summarized as follows. In the cluster analysis of morphological data by Euclidean distance, complete and average linkage clustering most distinctly separated the taxa at the subfamily level. In the principal components analysis (PCA), the cumulative contribution rate of three principal components was 91.1% of the total variance. By leaf venation, the taxa were classified into the semicraspedodromous type of Theoideae and the brochidodromous type of Ternstroemioideae. The stomatal types were classified as the paracytic type of Theoideae and the anomocytic type of Ternstroemioideae; subsidiary cells were found in the former but not in the latter. No isozyme band common to all taxa was found in the Theaceae banding patterns. However, the activity of Theoideae appeared at band No. 5 and below (Rf. 4.0-4.4), whereas that of Ternstroemioideae appeared at band No. 7 and above (Rf. 5.7-6.2). The cluster analyses of leaf characters and peroxidase isozymes gave similar results.

Detection of M:N corresponding class group pairs between two spatial datasets with agglomerative hierarchical clustering (응집 계층 군집화 기법을 이용한 이종 공간정보의 M:N 대응 클래스 군집 쌍 탐색)

  • Huh, Yong; Kim, Jung-Ok; Yu, Ki-Yun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.2 / pp.125-134 / 2012
  • In this paper, we propose a method to analyze M:N corresponding relations in semantic matching, focusing especially on feature class matching. Similarities between class pairs are measured by the spatial objects that coexist in the class pairs, and corresponding classes are obtained by clustering with these pairwise similarities. We applied a graph embedding method, which constructs a global configuration of the classes in a low-dimensional Euclidean space while preserving the above pairwise similarities, so that the distances between the embedded classes are proportional to the overall degree of similarity on the edge paths in the graph. Thus, the clustering problem could be solved by applying a general clustering algorithm to the embedded coordinates. We applied the proposed method to polygon object layers in a topographic map and land parcel categories in a cadastral map of the Suwon area and evaluated the results. F-measures of the detected class pairs were analyzed to validate the results. In addition, some class pairs that would not be detected by analyzing nominal class names were detected by the proposed method.

Managing the Reverse Extrapolation Model of Radar Threats Based Upon an Incremental Machine Learning Technique (점진적 기계학습 기반의 레이더 위협체 역추정 모델 생성 및 갱신)

  • Kim, Chulpyo; Noh, Sanguk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.4 / pp.29-39 / 2017
  • Various electronic warfare situations drive the need to develop an integrated electronic warfare simulator that can perform electronic warfare modeling and simulation on radar threats. In this paper, we analyze the components of a simulation system that reversely models radar threats emitting electromagnetic signals based on the parameters of the electronic information, and propose a method to incrementally maintain the reverse extrapolation model of RF threats. In the experiments, we evaluate the effectiveness of the incremental model update and also assess the method for integrating reverse extrapolation models. The individual models of RF threats are constructed by using a decision tree, a naive Bayesian classifier, an artificial neural network, and clustering algorithms based on Euclidean distance and cosine similarity measurement, respectively. Experimental results show that the accuracy of the reverse extrapolation models improves as the size of the threat sample increases. In addition, we use voting, weighted voting, and the Dempster-Shafer algorithm to integrate the results of the five different models of RF threats. As a result, the final decision of reverse extrapolation made through the Dempster-Shafer algorithm shows the best accuracy.
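
The two measures named in this abstract behave differently on the same pair of vectors, which may explain why they yield separate clustering models. The sketch below is a generic illustration of the two standard definitions, not code from the paper; the function names and the sample feature vectors are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors (0 means identical)."""
    return math.dist(a, b)


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Two hypothetical parameter vectors that point the same way but differ in scale.
a = (1.0, 2.0, 3.0)
b = (2.0, 4.0, 6.0)
print(euclidean_distance(a, b), cosine_similarity(a, b))
```

For these two vectors the cosine similarity is 1 (up to rounding) while the Euclidean distance is clearly non-zero, so a cosine-based clustering would treat them as the same pattern and a Euclidean one would not.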

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong; Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.111-125 / 2011
  • Clustering is the process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. In this way, clustering facilitates fast and correct search for relevant documents by narrowing the search range to the collection of documents belonging to related clusters. Effective clustering requires techniques for identifying similar documents and grouping them into a cluster, and for discovering the concept that is most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. To solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapping clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not as a tree but as a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm for complex concept detection. This system operates in three phases: 1) the preprocessing of documents, 2) the clustering using the HOC algorithm, and 3) the validation of semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of the terms appearing in the documents. First, it applies stopword removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight value, and the x-y coordinate value for each document is determined by combining the TF-IDF values of the terms in it. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated by the Euclidean distance method. Initially, a cluster is generated for each document by grouping those documents that are closest to it. Then, the distance between any two clusters is measured, and the closest clusters are grouped into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, a feature selection method is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm, that is, to see whether they have meaningful hierarchical relationships. Feature selection is a method of extracting key features from a document by identifying important and representative terms in the document and assigning weight values to them. To select key features correctly, a method is needed to determine how each term contributes to the class of the document. Among several methods achieving this goal, this paper adopted the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c by a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm contributes greatly to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.
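
The abstract does not spell out how the $\chi^2$ statistic between a term t and a class c is computed. A common form in text categorization is based on a 2x2 document-count table, and the sketch below assumes that form; the function name `chi_square`, its argument names, and the choice to return 0 for a degenerate table are all assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square dependency between a term and a class from document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # Assumption: a table with an empty row or column carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


# A term found only inside the class versus a term spread evenly everywhere.
print(chi_square(10, 0, 0, 10), chi_square(5, 5, 5, 5))
```

A term that occurs only in documents of the class gets a high score, while a term distributed evenly across classes gets 0, which matches the intuition of "dependency degree" described in the abstract.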