Search | Korea Science

Classification Techniques for XML Document Using Text Mining (텍스트 마이닝을 이용한 XML 문서 분류 기술)

Kim Cheon-Shik;Hong You-Sik
- Journal of the Korea Society of Computer and Information, v.11 no.2 s.40, pp.15-23, 2006
Millions of documents are already on the Internet, and new documents are being created all the time. Classifying these documents by the most suitable means is therefore an important problem for document management and querying. However, most users still rely on keyword-based document classification. This method does not classify documents efficiently, and it is weak at assigning documents to categories that depend on meaning. Manual classification by a person can sometimes be very accurate and is often required. In this paper, we therefore classify documents using a neural network algorithm and the C4.5 algorithm. We used resume data in XML format for the document classification experiment. The results showed excellent potential for document categorization. We therefore expect the approach to be applicable to a variety of document classification problems.

Data Streams classification using Local Concept-adapted IOLIN System (지역적 컨셉트 적응형 IOLIN시스템을 사용한 데이터 스트림의 분류)

Kim, Jae-Woo;Song, Jae-Won;Lee, Ju-Hong
- Journal of the Korea Society of Computer and Information, v.13 no.1, pp.37-44, 2008
Data streams tend to change in their patterns over time. This problem, known as concept drift, can reduce the predictive performance of a classification model. CVFDT and IOLIN tried to solve the concept drift problem through incremental updates of the classification model. However, they were shown to be unable to resolve local concept drift, in which local changes in patterns influence the overall classification results. In this paper, we propose an adapted IOLIN system that improves predictive performance by detecting local concept drift. The experimental results show that the proposed method, adaptive IOLIN, is about 2.8% more accurate than IOLIN and about 11.2% more accurate than CVFDT.

Data Clustering Algorithm Adaptive to Data Forms (데이터 형태에 적응하는 클러스터링 알고리즘)

Lee, K.H.;Lee, K.C.
- Proceedings of the Korea Information Processing Society Conference, 2000.10b, pp.1433-1436, 2000
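In clustering, existing algorithms such as k-means[7], DBSCAN[2], CURE[4], ROCK[5], and PAM[8] determine clusters according to some fixed shape, such as a circle or an ellipse. If the distribution of the data to be clustered happens to match the shape assumed by the algorithm, an accurate solution can be obtained, but this rarely happens with naturally distributed data. The CHAMELEON[1] algorithm, which addresses this problem by tracking the shape of the data, was published recently. Although it is independent of shape, its running time grows explosively as the amount of data increases. Considering that data mining datasets are typically large, this makes it difficult to apply in practice. To address this problem, this paper attempts to improve CHAMELEON[1] by electing representatives with k-means[7] (EF-CHAMELEON). It also introduces the ADF algorithm, which starts from the idea that many natural shapes can be composed of sets of very small circles. ADF uses k-means to build a large number of initial clusters that are small enough not to be affected by noise, and then merges them using the relative distances between clusters, which removes the dependence on shape. To stay faithful to unsupervised learning, it applies a threshold so that the algorithm stops at an appropriate stage. The experiments compare the performance of ADF, CHAMELEON[1], and EF-CHAMELEON on two-dimensional arrays containing a variety of shapes that existing clustering algorithms could not distinguish.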
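The two-stage idea in the abstract above (many tiny k-means clusters, then merging by inter-cluster distance until a threshold stops the process) can be illustrated with a minimal sketch. This is not the authors' ADF algorithm: the function names `kmeans`, `merge_micro_clusters`, and `cluster_by_merging` are hypothetical, the seeding is a simple deterministic choice, and a plain Euclidean distance between centroids stands in for the "relative distance" mentioned in the abstract.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on 2-D points; seeds are evenly spaced input points."""
    k = min(k, len(points))
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, assign

def merge_micro_clusters(centers, threshold):
    """Union-find merge of micro-clusters whose centroids lie within threshold.
    Assumption: absolute centroid distance replaces the paper's relative distance."""
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(centers))]

def cluster_by_merging(points, k, threshold):
    """Return one merged-cluster label per input point."""
    centers, assign = kmeans(points, k)
    roots = merge_micro_clusters(centers, threshold)
    return [roots[a] for a in assign]
```

For example, calling `cluster_by_merging` on an elongated run of points plus a distant group, with a small `k` and a threshold a little larger than the spacing between micro-cluster centroids, chains the micro-clusters along the elongated run into one cluster while leaving the distant group separate.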

Data Mining Techniques for Analyzing Promoter Sequences (프로모터 염기서열 분석을 위한 데이터 마이닝 기법)

김정자;이도헌
- Journal of the Korea Institute of Information and Communication Engineering, v.4 no.4, pp.739-744, 2000
As DNA sequences have become known through the Genome Project, techniques for handling molecular-level gene information are being actively researched. Given the vast amount of information on known sequences, it is also urgent to develop new computer algorithms for building databases and analyzing them efficiently. In this respect, this paper studies association rule search algorithms for finding the characteristics revealed by associations between promoter sequences and genes, which is one of the important research areas in molecular biology. This paper treats biological data, whereas previous search algorithms used transaction data. We therefore design a transformed association rule algorithm that accommodates the data types and biological properties. These results will help reduce the time and cost of biological experiments by minimizing the number of candidates.

Efficient k-Nearest Neighbor Join Query Processing Algorithm using MapReduce (맵리듀스를 이용한 효율적인 k-NN 조인 질의처리 알고리즘)

Yun, Deulnyeok;Jang, Miyoung;Chang, Jaewoo
- Proceedings of the Korea Information Processing Society Conference, 2014.11a, pp.767-770, 2014
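MapReduce-based k-NN join query processing algorithms for analyzing large-scale data have recently become very important in applications built on data mining and analysis. However, the representative existing approach, the Voronoi-based k-NN join query processing algorithm, is not suitable for large-scale data because the cost of building the Voronoi index is very high. In addition, the R-tree used to store Voronoi cell information is not well suited to distributed parallel processing in a MapReduce environment. This paper therefore proposes a new k-NN join query processing algorithm based on a grid index. First, to address the high index construction cost, we propose a dynamic grid index generation technique that takes the data distribution into account. Second, to perform k-NN join queries efficiently in a MapReduce environment, we propose a candidate region search and filtering algorithm that uses adjacent-cell information as a signature. Finally, a performance evaluation shows that the proposed technique achieves up to 3 times better query processing performance than the existing technique in terms of query processing time.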
https://doi.org/10.3745/PKIPS.y2014m11a.767
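To make the grid-index idea in the abstract above concrete, here is a minimal single-process sketch of a k-NN join that searches a point's own cell and then rings of adjacent cells. It is only an illustration under stated assumptions: it uses a uniform grid with a fixed cell size rather than the paper's dynamic, distribution-aware grid, it does not use signatures or MapReduce, and the names `build_grid` and `knn_join` are hypothetical.

```python
import math
from collections import defaultdict

def build_grid(points, cell):
    """Bucket 2-D points into square cells of side `cell` (uniform grid, an assumption)."""
    grid = defaultdict(list)
    for p in points:
        grid[(int(p[0] // cell), int(p[1] // cell))].append(p)
    return grid

def knn_join(R, S, k, cell):
    """For each point r in R, return its k nearest points in S.
    Cells are visited ring by ring around r's cell. After ring m has been
    searched, every unseen point is at least m * cell away, so the search
    stops once the k-th candidate is no farther than that bound."""
    if not S:
        return {r: [] for r in R}
    grid = build_grid(S, cell)
    xs = [c[0] for c in grid]
    ys = [c[1] for c in grid]
    result = {}
    for r in R:
        cx, cy = int(r[0] // cell), int(r[1] // cell)
        # Largest ring that can still contain an occupied cell.
        max_ring = max(abs(cx - min(xs)), abs(cx - max(xs)),
                       abs(cy - min(ys)), abs(cy - max(ys)))
        found = []
        ring = 0
        while ring <= max_ring:
            for dx in range(-ring, ring + 1):
                for dy in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy)) == ring:  # only the ring's border cells
                        for s in grid.get((cx + dx, cy + dy), ()):
                            found.append((math.dist(r, s), s))
            found.sort()
            if len(found) >= k and found[k - 1][0] <= ring * cell:
                break
            ring += 1
        result[r] = [s for _, s in found[:k]]
    return result
```

Points are expected as tuples so they can serve as dictionary keys. The early-termination test is what plays the role of candidate filtering here: cells beyond the current ring are never touched once the bound guarantees they cannot contain a closer neighbor.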

Forecasting Electric Power Demand Using Census Information and Electric Power Load (센서스 정보 및 전력 부하를 활용한 전력 수요 예측)

Lee, Heon Gyu;Shin, Yong Ho
- Journal of Korea Society of Industrial Information Systems, v.18 no.3, pp.35-46, 2013
To develop an accurate analytical model for forecasting domestic electricity demand, we propose a method for predicting electric power demand patterns that combines SMO classification with a subspace clustering technique, based on dimension reduction, that is suited to cluster analysis of high-dimensional data. For demand pattern prediction, hourly electricity load patterns and demographic and geographic characteristics can be analyzed by integrating wireless load monitoring data with census information at the sub-regional level. Using census information and power load data for the Seoul metropolitan area, the prediction of sub-regional demand patterns yielded a total of 18 characteristic clusters. The prediction accuracy for power demand patterns was approximately 85%.
https://doi.org/10.9723/jksiis.2013.18.3.035

An Incremental Clustering Technique of XML Documents using Cluster Histograms (클러스터의 히스토그램을 이용한 XML 문서의 점진적 클러스터링 기법)

Hwang, Jeong-Hee
- Journal of KIISE:Databases, v.34 no.3, pp.261-269, 2007
As basic research toward integrating and retrieving XML documents efficiently, this paper proposes a method for clustering XML documents by structure. We apply an algorithm designed for processing large volumes of transaction data to the clustering of XML documents, which differs considerably from previous algorithms that measure structural similarity. Our method clusters XML documents using cluster histograms, which represent the distribution of items within clusters, while also considering global cluster cohesion. We compare the proposed method with existing techniques experimentally. The experiments show that our method not only creates good-quality clusters but also improves processing time.

Automatic Error Detection of Morpho-syntactic Errors of English Writing Using Association Rule Analysis Algorithm (연관 규칙 분석 알고리즘을 활용한 영작문 형태.통사 오류 자동 발견)

Kim, Dong-Sung
- Annual Conference on Human and Language Technology, 2010.10a, pp.3-8, 2010
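In this study, we generate association rules from refined data on English writing error types collected in a series of earlier studies, and we use association rules whose usefulness has been verified through learning to automatically detect morpho-syntactic errors in English writing data. Finding morpho-syntactic errors in English writing data consumes a great deal of time and resources, so automation is essential. Whereas existing studies focus on lexical errors using statistical models or concentrate on syntactic processing grounded in linguistic theoretical frameworks, this study generates association rules from refined data through data mining, verifies them, and then detects morpho-syntactic errors. Previous studies required building large knowledge bases, such as rules fitted to a theoretical framework or large corpora for constructing language models, whereas this study uses a small amount of refined data. We used the Apriori algorithm to generate morpho-syntactic association rules for English writing error types. Because some of the association rules generated by the algorithm may be incorrect, only valid rules were learned, using statistical checks of rule usefulness such as correlation tests and cosine similarity. The association rules accumulated in this way were then used in an experiment on automatically detecting English writing errors.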
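The rule generation and filtering step described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the study's pipeline: it covers only the first two Apriori levels (frequent single items, then candidate pairs built from them), it applies only the cosine measure and not the correlation test, and the function name `pair_rules` and the thresholds are hypothetical. The cosine of a rule A -> B is computed as count(A and B) / sqrt(count(A) * count(B)).

```python
import math
from itertools import combinations

def pair_rules(transactions, min_support, min_cosine):
    """Return (a, b, support, confidence, cosine) for item pairs that pass
    both thresholds. Each transaction is a set of items (e.g. error tags)."""
    n = len(transactions)
    item_count = {}
    for t in transactions:
        for item in set(t):
            item_count[item] = item_count.get(item, 0) + 1
    # Apriori level 1: keep only items that are frequent on their own.
    frequent = sorted(i for i, c in item_count.items() if c / n >= min_support)
    rules = []
    # Apriori level 2: candidate pairs are built only from frequent items.
    for a, b in combinations(frequent, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        if support < min_support:
            continue
        confidence = both / item_count[a]  # confidence of the rule a -> b
        cosine = both / math.sqrt(item_count[a] * item_count[b])
        if cosine >= min_cosine:
            rules.append((a, b, support, confidence, cosine))
    return rules
```

Filtering on a symmetric measure such as cosine in addition to support and confidence is one way to discard rules that are frequent only because one of their items is common everywhere.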

Sequence Pattern Mining Using Meaning-based Transaction Structure for USN system (USN 환경에서 의미 기반 트랜잭션 구조를 이용한 순차 패턴 탐사 기법)

Choi, Pilsun;Kang, Donghyun;Kim, Hwan;Kim, Daein;Hwang, Buhyun
- Proceedings of the Korea Information Processing Society Conference, 2012.04a, pp.1105-1108, 2012
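Sequential pattern mining finds frequently occurring patterns among sets of ordered patterns. Stream data generated in a USN environment can be represented as a set of events with time attributes, and useful information can be mined from it with sequential pattern mining. However, because data in a stream environment is unbounded and arrives continuously, techniques that store all the data before mining patterns are problematic. This paper proposes a stream data mining technique that mines sequential patterns using an improved data processing scheme. The proposed technique generates transactions from the stream data using a variable window based on units of meaning, and it mines sequential patterns in the stream from the set of these transactions using hashing and a sliding window. The proposed technique thereby uses data storage space more efficiently, in a manner suited to real-time systems, and can quickly mine useful patterns.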
https://doi.org/10.3745/PKIPS.y2012m04a.1105
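The combination of variable-length transactions, a hash table, and a sliding window described in the abstract above might look roughly like the sketch below. Several parts are assumptions made for illustration: a time gap larger than `max_gap` is used as a stand-in for the paper's "unit of meaning" boundary, only ordered pairs of items are counted rather than longer sequences, and the names `to_transactions` and `SlidingPairMiner` are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations

def to_transactions(events, max_gap):
    """Cut a time-ordered stream of (timestamp, item) events into transactions
    wherever the gap between consecutive events exceeds max_gap (an assumption)."""
    current, last_t = [], None
    for t, item in events:
        if last_t is not None and t - last_t > max_gap and current:
            yield tuple(current)
            current = []
        current.append(item)
        last_t = t
    if current:
        yield tuple(current)

class SlidingPairMiner:
    """Counts ordered item pairs over the most recent `window` transactions."""

    def __init__(self, window, min_count):
        self.window = deque()
        self.size = window
        self.min_count = min_count
        self.counts = Counter()  # hash table: (earlier, later) -> count

    @staticmethod
    def _pairs(txn):
        # combinations preserves input order, so each pair is (earlier, later);
        # the set ensures a pair is counted once per transaction.
        return set(combinations(txn, 2))

    def add(self, txn):
        self.window.append(txn)
        self.counts.update(self._pairs(txn))
        if len(self.window) > self.size:
            # Evict the oldest transaction and undo its contribution.
            old = self.window.popleft()
            self.counts.subtract(self._pairs(old))

    def frequent(self):
        return {p for p, c in self.counts.items() if c >= self.min_count}
```

Because counts are decremented when a transaction leaves the window, memory use depends on the window contents rather than on the length of the stream, which is the property the abstract emphasizes for real-time use.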

Analysis of Algae Occurrence Characteristics According to Multifunctional Weir Structures in the Nakdong River (낙동강 보 구조물에 따른 조류발생 특성 분석)

Jo Bu Geon;Lee Sang Ung;Young Do Kim
- Proceedings of the Korea Water Resources Association Conference, 2023.05a, pp.147-147, 2023
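The construction of multifunctional weirs under the Four Major Rivers Project has changed the river environment of the Nakdong River. The river now shows the characteristics of a stagnant water body, with increased depth and slower flow velocity. This affects the occurrence of cyanobacteria; the toxins that cyanobacteria secrete are harmful to aquatic ecosystems and to humans, and cyanobacterial blooms have a variety of causal factors. To characterize cyanobacterial occurrence quantitatively, recent studies on algae management have applied data mining and machine learning techniques. In machine learning, prediction results vary with the choice of training data, which is an important consideration for cyanobacteria, whose causes are complex. On the Nakdong River, eight multifunctional weirs are arranged in series along a single channel. The river, divided into sections by the eight weirs, has different hydraulic and watershed characteristics in each section, so algae occurrence characteristics also differ by section. This study aims to classify the characteristics of each section and analyze the main factors affecting algae occurrence. For the eight weir sites on the Nakdong River, we analyzed the main factors influencing cyanobacterial occurrence and used machine learning techniques to quantitatively analyze the conditions for cyanobacterial occurrence according to those factors. By considering hydraulic factors as well as water quality factors, we sought to analyze algae occurrence characteristics at each weir, which differ in hydraulic residence time. We also sought to confirm whether the accuracy of cyanobacteria prediction can be improved depending on the training factors, and thereby to study cyanobacterial occurrence in stagnant rivers, with the aim of supporting preemptive management of cyanobacterial blooms in the Nakdong River.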
