• Title/Summary/Keyword: 밀도기반군집 (density-based clustering)


Discretization of Continuous-Valued Attributes considering Data Distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • Lee, Sang-Hoon;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems, v.13 no.4, pp.391-396, 2003
  • This paper proposes a new approach that converts continuous-valued attributes into categorical ones by considering the distribution of the target attribute (class). The approach obtains optimal interval boundaries from the distribution of the data itself, without requiring any parameters. For each attribute, the distribution of the target attribute is projected onto a one-dimensional space. This space is then clustered according to criteria such as the density of each target attribute value and the amount of overlap among those densities. The resulting clusters are based on the probabilities with which the target attribute of an instance can be predicted, so the interval boundaries minimize the loss of information from the original data. The improved performance of the proposed discretization method is validated using the C4.5 algorithm and data sets from the UCI Machine Learning Repository.
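
The idea of placing interval boundaries where the class distribution changes can be illustrated with a rough sketch. This is not the paper's algorithm: it approximates class densities with a fixed-width histogram, and the function name `dominant_class_boundaries` and the `n_bins` parameter are assumptions introduced here (the paper itself states that no parameters are required).

```python
from collections import Counter

def dominant_class_boundaries(values, labels, n_bins=10):
    """Return cut points where the most frequent class changes between bins.

    Hypothetical helper: a histogram stands in for the class densities.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    # Per-bin class counts approximate each class's density along the attribute.
    bins = [Counter() for _ in range(n_bins)]
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx][y] += 1
    cuts, prev = [], None
    for i, counts in enumerate(bins):
        if not counts:
            continue  # empty bins carry no evidence about the class
        top = counts.most_common(1)[0][0]
        if prev is not None and top != prev:
            cuts.append(lo + i * width)  # left edge of the bin where the class flips
        prev = top
    return cuts
```

Each returned cut point marks a position on the attribute axis where the locally dominant class switches, which is the intuition behind boundaries that preserve class information.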

Response of Terrestrial Insect Community to the Vegetation Invasion at a Sand-Bed Stream (모래하천에서 식생 침입에 대한 육상곤충 군집의 반응)

  • Cho, Geonho;Cho, Kang-Hyun
    • Ecology and Resilient Infrastructure, v.4 no.1, pp.44-53, 2017
  • To investigate how the fauna and biological communities of terrestrial insects respond to vegetation encroachment on sandbars, the species composition, species diversity, functional species traits, and community structure of land-dwelling insects sampled with pitfall traps were compared between the bare and vegetated sandbars of a typical sand-bed stream, the Naeseong Stream, Korea. As riparian vegetation encroached on the sandbar, the species diversity of the insects increased but their density decreased. In particular, indicator species of bare sandbars, such as Cicindela laetescripta and Dianemobius csikii, were found at the bare sandbar. The insect communities of the bare and vegetated sandbars were clearly separated according to the coverage of riparian plants. The food web of the bare sandbar consisted of detritus, detritivores and scavengers, and predators, mainly Coleoptera. In contrast, the food web of the vegetated sandbar consisted of plants, sucking and chewing herbivores, and parasitoids and predators. These results show that, as riparian vegetation invaded the sandbar of a sand-bed stream, the biodiversity of terrestrial insects increased, the food web shifted from a detritus food chain to a grazing food chain, and the insect fauna specific to bare sandbars disappeared.

Design and Development of Clustering Algorithm Considering Influences of Spatial Objects (공간객체의 영향력을 고려한 클러스터링 알고리즘의 설계와 구현)

  • Kim, Byung-Cheol
    • The Journal of the Korea Contents Association, v.6 no.12, pp.113-120, 2006
  • This paper proposes DBSCAN-SI, a clustering algorithm that takes the influence of spatial objects into account. DBSCAN-SI extends the existing DBSCAN and DBSCAN-W algorithms and converts non-spatial properties into the influence of spatial objects during spatial clustering. The higher the influence derived from the properties used in clustering, the more likely an object is to be included in a cluster, so the clustering reflects not only spatial distance but also the magnitude of influence. From a property-centered perspective, the proposed technique can make up for a disadvantage of existing algorithms, which exclude objects from a cluster despite their high influence because few objects lie close to them around the cluster.
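
One way to picture how influence can raise the chance of cluster membership is a weighted core-point test, sketched below. This is an illustrative assumption, not the definition used by DBSCAN-SI or DBSCAN-W: each neighbor contributes a hypothetical influence weight instead of counting as 1, and the names `is_core_weighted` and `min_weight` are introduced here.

```python
import math

def is_core_weighted(points, weights, i, eps, min_weight):
    """Weighted variant of the DBSCAN core-point test (illustrative only).

    points  : list of (x, y) tuples
    weights : hypothetical influence value per point, derived from
              non-spatial properties
    A point is treated as core when the summed influence inside its
    eps-neighborhood (including itself) reaches min_weight.
    """
    total = sum(
        w for p, w in zip(points, weights)
        if math.dist(points[i], p) <= eps
    )
    return total >= min_weight
```

With all weights equal to 1 this reduces to the usual DBSCAN neighbor count, so a sparsely surrounded but highly influential object can still qualify as a core point.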


Development of Traffic State Classification Technique (교통상황 분류를 위한 클러스터링 기법 개발)

  • Woojin Kang;Youngho Kim
    • The Journal of The Korea Institute of Intelligent Transport Systems, v.22 no.1, pp.81-92, 2023
  • Traffic state classification is crucial for time-of-day (TOD) traffic signal control. This paper proposes a traffic state classification technique that applies the Deep-Embedded Clustering (DEC) method to high-dimensional traffic data observed at all signalized intersections in a traffic signal control sub-area (SA). Until now, signal timing plans have been determined from the traffic data observed at the critical intersection in the SA. This current method is limited in that it cannot consider the overall traffic situation in the SA. The proposed method alleviates the curse of dimensionality and is shown to overcome the shortcomings of the current signal timing plan.

Validation of Suitable Zooplankton Enumeration Method for Species Diversity Study Using Rarefaction Curve and Extrapolation (종 다양성 평가를 위한 호소 생태계 동물플랑크톤 조사 방법 연구: 희박화 분석(rarefaction analysis)을 이용한 적정 시료 농축 정도 및 부차 시료 추출량의 검증)

  • Hye-Ji Oh;Yerim Choi;Hyunjoon Kim;Geun-Hyeok Hong;Young-Seuk Park;Yong-Jae Kim;Kwang-Hyeon Chang
    • Korean Journal of Ecology and Environment, v.55 no.4, pp.274-284, 2022
  • Using sample-size-based rarefaction analyses, we sought to identify the appropriate degree of sample concentration and sub-sample extraction for estimating zooplankton species diversity more accurately in biodiversity assessments. For zooplankton collected from three reservoirs with different environmental characteristics, the estimated species richness (S) and Shannon's H' changed in different patterns from reservoir to reservoir as the amount of sub-sample extracted from the whole sample varied. Nevertheless, the diversity indices were highest when the largest amount of sub-sample was analyzed. In the rarefaction analysis of sample coverage, for the deep eutrophic reservoir (Juam), which had high numbers of zooplankton species and individuals, only 1 mL of sub-sample from 100 mL of concentrated sample represented 99.8% of the whole sample. In contrast, for the Soyang reservoir, which had very low numbers of species and individuals, the representation was relatively low, at 97%, when 10 mL of sub-sample was extracted from the same amount of concentrated sample. Thus, how well a sub-sample represents the whole zooplankton sample depends on the density of individuals in the sample collected in the field. Adjusting the degree of sample concentration and the amount of sub-sample extracted according to that density is expected to minimize the errors that arise when comparing species numbers and diversity indices among different water bodies.
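
For readers unfamiliar with the indices mentioned above, the following minimal sketch computes species richness (S), Shannon's H', and a classical individual-based rarefied richness. The rarefaction formula used here is Hurlbert's expected richness, which is an assumption on our part; the abstract does not say which estimator or software the authors used, and all function names are hypothetical.

```python
import math

def richness(counts):
    """Species richness S: number of species with at least one individual."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon's H' = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def rarefied_richness(counts, n):
    """Expected number of species in a random sub-sample of n individuals.

    Hurlbert's formula: E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)),
    where N is the total count and N_i the count of species i.
    """
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the total number of individuals")
    denom = math.comb(total, n)
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)
```

Evaluating `rarefied_richness` for increasing `n` traces a rarefaction curve, which shows how quickly a sub-sample of a given size approaches the richness of the whole sample.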

Graph Cut-based Automatic Color Image Segmentation using Mean Shift Analysis (Mean Shift 분석을 이용한 그래프 컷 기반의 자동 칼라 영상 분할)

  • Park, An-Jin;Kim, Jung-Whan;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications, v.36 no.11, pp.936-946, 2009
  • Graph cuts have recently attracted considerable attention for image segmentation because they can globally minimize energy functions composed of a data term, which reflects how well each pixel fits the prior information for each class, and a smoothness term, which penalizes discontinuities between neighboring pixels. Previous approaches to graph cuts-based automatic image segmentation generally use GMMs (Gaussian mixture models), with the means and covariance matrices calculated by the EM algorithm serving as prior information for each cluster. However, this is practicable only for clusters with a hyper-spherical or hyper-ellipsoidal shape, because each cluster is represented by a covariance matrix centered on its mean. For arbitrarily shaped clusters, this paper proposes graph cuts-based image segmentation using mean shift analysis. As prior information for estimating the data term, we use the set of mean trajectories toward each mode, starting from initial means randomly selected in $L^*u^*v^*$ color space. Since the mean shift procedure is computationally expensive, we transform the features in the continuous feature space into a 3D discrete grid and use a 3D kernel based on the first moment in the grid, which is needed to move the means to the modes. In the experiments, we investigate the problems of the recently popular mean shift-based and normalized cuts-based image segmentation methods, and the proposed method performs better than these two methods and than graph cuts-based automatic image segmentation using GMM on the Berkeley segmentation dataset.
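
The "mean trajectory toward a mode" can be illustrated with a basic mean shift iteration. The sketch below is a simplification under stated assumptions: it works on 1D values with a flat (uniform) kernel rather than the 3D grid kernel in $L^*u^*v^*$ space described in the abstract, and the function name and parameters are hypothetical.

```python
def mean_shift_trajectory(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Follow one mean shift trajectory from `start` toward a density mode.

    Each step replaces the current mean with the average of all points
    within `bandwidth` of it (flat kernel). Returns every visited mean.
    """
    mean = start
    trajectory = [mean]
    for _ in range(max_iter):
        window = [p for p in points if abs(p - mean) <= bandwidth]
        if not window:
            break  # no support under the kernel; the mean cannot move
        new_mean = sum(window) / len(window)
        trajectory.append(new_mean)
        if abs(new_mean - mean) < tol:
            break  # converged to a mode
        mean = new_mean
    return trajectory
```

Starting from several random initial means and collecting the trajectories gives, for each mode, a set of visited locations rather than a single mean and covariance, which is why the representation is not restricted to ellipsoidal clusters.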

GC-Tree: A Hierarchical Index Structure for Image Databases (GC-트리 : 이미지 데이타베이스를 위한 계층 색인 구조)

  • 차광호
    • Journal of KIISE:Databases, v.31 no.1, pp.13-22, 2004
  • With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. Despite many efforts, the performance of existing multidimensional indexing methods is not satisfactory in high dimensions. Dimensionality reduction and approximate solution methods have therefore been tried to deal with the so-called curse of dimensionality, but these methods inevitably lose precision in the query results. More recently, vector approximation-based methods such as the VA-file and the LPC-file were developed to preserve the precision of query results. However, the performance of the vector approximation-based methods depends largely on the size of the approximation file, and they lose the advantage of multidimensional indexing methods, which prune much of the search space. In this paper, we propose a new index structure called the GC-tree for efficient similarity search in image databases. The GC-tree is based on a special subspace partitioning strategy that is optimized for clustered high-dimensional images. It adaptively partitions the data space based on a density function and dynamically constructs an index structure. The resulting index structure adapts well to the strongly clustered distribution of high-dimensional images.

Game-bot detection based on Clustering of asset-varied location coordinates (자산변동 좌표 클러스터링 기반 게임봇 탐지)

  • Song, Hyun Min;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology, v.25 no.5, pp.1131-1141, 2015
  • In this paper, we propose a new machine learning-based method for distinguishing game-bots from normal players in MMORPGs by inspecting players' action log data, especially the event logs of in-game money increases and decreases. DBSCAN (Density-Based Spatial Clustering of Applications with Noise), one of the density-based clustering algorithms, is used to extract attributes describing the spatial characteristics of each player, such as the number of clusters and the ratios of core points, member points, and noise points. Above all, even if game-bot developers know the principles of this detection system, they cannot evade it, because moving across a wide area to hunt monsters is very inefficient and unproductive. As a result, game-bots differ clearly from normal players in their spatial characteristics: their ratio of noise points is very low, less than 5%, while that of normal players is high. In experiments on real MMORPG action log data, our game-bot detection system performs well, with high game-bot detection accuracy.
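
The spatial features described above can be sketched with a small, unoptimized DBSCAN written from the textbook definition. This is an illustration, not the authors' implementation: the function names, the feature dictionary keys, and the reading of "member points" as non-core (border) points inside a cluster are all assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return (labels, core_flags) for 2D points; labels use NOISE for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    core = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i], core[i] = cluster, True
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                core[j] = True
                queue.extend(nbrs)  # only core points expand the cluster
    return labels, core

def spatial_features(labels, core):
    """Summarize one player's event coordinates as clustering statistics."""
    n = len(labels)
    noise = sum(1 for lab in labels if lab == NOISE)
    n_core = sum(core)
    return {
        "n_clusters": len({lab for lab in labels if lab != NOISE}),
        "core_ratio": n_core / n,
        "member_ratio": (n - noise - n_core) / n,  # assumed: border points
        "noise_ratio": noise / n,
    }
```

Under the abstract's finding, a player whose `noise_ratio` falls below roughly 0.05 would look bot-like; the values of `eps` and `min_pts` that make this threshold meaningful are not given in the abstract and would have to be chosen for the game's coordinate scale.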

Patent data analysis using clique analysis in a keyword network (키워드 네트워크의 클릭 분석을 이용한 특허 데이터 분석)

  • Kim, Hyon Hee;Kim, Donggeon;Jo, Jinnam
    • Journal of the Korean Data and Information Science Society, v.27 no.5, pp.1273-1284, 2016
  • In this paper, we analyze patents on machine learning using keyword network analysis and clique analysis. To construct a keyword network, important keywords were extracted based on their TF-IDF weights and associations, and network structure analysis and clique analysis were performed. The density and clustering coefficient of the patent keyword network are low, which shows that patent keywords on machine learning are only weakly connected with each other. This is because the important patents on machine learning are registered mainly for application systems of machine learning rather than for machine learning techniques. In addition, the clique analysis showed that the keywords found by cliques in the 2005 patents cover subjects such as newsmaker verification, product forecasting, virus detection, biomarkers, and workflow management, while those in the 2015 patents cover subjects such as digital imaging, payment cards, calling systems, mammogram systems, and price prediction. Clique analysis can be used not only to identify specialized subjects but also to suggest search keywords in patent search systems.
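
The two network measures mentioned above have standard definitions for an undirected graph, sketched here with plain dictionaries. The adjacency format, the function names, and the use of the average local clustering coefficient are assumptions; the abstract does not state which variant of the coefficient or which tool the authors used.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def density(adj):
    """Fraction of possible edges that exist: 2m / (n * (n - 1))."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0
```

Low values of both measures, as reported for the patent keyword network, mean that few keyword pairs co-occur and that a keyword's neighbors rarely co-occur with each other.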

Analysis of Vegetation Variation after the Rehabilitation Treatment of Stream (자연형 하천 공법 적용후의 식생변화분석 - 서울시 양재천의 학여울 구간을 중심으로 -)

  • Shin, Joung-Yi
    • Journal of the Korean Society of Environmental Restoration Technology, v.2 no.3, pp.10-17, 1999
  • To confirm the effectiveness of the natural river improvement technique, vegetation in the Yangjae stream was analyzed between 1996 and 1998. The results showed that the number of riparian plant species increased from 41 to 53 and that the dominant species changed from annuals and biennials (Humulus japonicus, Persicaria thunbergii, Persicaria hydropiper, Panicum dichotomiflorum, Echinochloa crus-galli) to perennials (Phragmites communis). The variation in biomass and in the biodiversity index was measured and calculated for each rehabilitation method. At the point bar plots (A treatment plots), biomass varied from 302 to 828 $g/m^2$ and the biodiversity index changed from 1.53 to 1.52 between 1996 and 1998. In conclusion, the natural river improvement technique applied in Yangjaecheon for three years has contributed to the restoration of riparian plants. Further studies using this technique should follow in the near future.
