• Title/Summary/Keyword: k-means clustering Algorithm


Determination of Tumor Boundaries on CT Images Using Unsupervised Clustering Algorithm (비교사적 군집화 알고리즘을 이용한 전산화 단층영상의 병소부위 결정에 관한 연구)

  • Lee, Kyung-Hoo;Ji, Young-Hoon;Lee, Dong-Han;Yoo, Seoung-Yul;Cho, Chul-Koo;Kim, Mi-Sook;Yoo, Hyung-Jun;Kwon, Soo-Il;Chun, Jun-Chul
    • Journal of Radiation Protection and Research / v.26 no.2 / pp.59-66 / 2001
  • Determining the spatial location and shape of the tumor boundary is a key issue in fractionated stereotactic radiotherapy (FSRT). We acquired consecutive transaxial images of a paraffin phantom and of 4 patients with brain tumors using helical computed tomography (HCT). A K-means classification algorithm was applied to convert the raw pixel values of the CT images into classified average pixel values. The classified images consist of 5 regions: tumor region (TR), normal region (NR), combination region (CR), uncommitted region (UR), and artifact region (AR). The major concern was how to separate the normal region from the tumor region within the combination region. Relative average deviation analysis was applied to reduce the average pixel values of the 5 regions to 2 regions, normal and tumor, by locating the maximum point among the average deviation pixel values. We then drew the gross tumor volume (GTV) boundary by connecting the maximum points in the images using a semi-automatic contour method implemented in IDL (Interactive Data Language). The error of the ROI boundary in the homogeneous phantom was estimated to be within $\pm 1\%$. For the 4 patients, we confirmed that the tumor lesions delineated by the physician and those delineated automatically by the K-means classification algorithm and relative average deviation analysis were similar. These methods can turn an uncertain boundary between normal and tumor regions into a clear one. They should therefore be useful in CT image-based treatment planning, especially when CT images intermittently fail to visualize the tumor volume as well as MRI images do.
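The classification step described in this abstract, replacing each raw pixel value by the average value of its K-means class, can be sketched as follows. This is a minimal illustration and not the authors' IDL implementation: the function names `kmeans_1d` and `classify_image`, the evenly spaced initial centres, and the use of pixel intensity as the only feature are all assumptions.

```python
def kmeans_1d(values, k, iters=100):
    """Plain k-means on scalar values; returns (centres, labels)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: k centres spread evenly over the value range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def assign(cs):
        # Each value goes to the nearest centre (ties go to the lower index).
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(iters):
        labels = assign(centres)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centre.
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:  # converged: centres no longer move
            break
        centres = updated
    return centres, assign(centres)


def classify_image(image, k=5):
    """Replace every pixel by the centre of its k-means class.

    At convergence each centre equals the mean pixel value of its class.
    """
    width = len(image[0])
    flat = [p for row in image for p in row]
    centres, labels = kmeans_1d(flat, k)
    out = [centres[lab] for lab in labels]
    return [out[i:i + width] for i in range(0, len(out), width)]
```

Calling `classify_image(ct_slice, k=5)` on a 2-D list of pixel values returns an image of the same shape with at most five distinct values, one per class. Mapping those classes to the TR, NR, CR, UR, and AR regions is a separate interpretive step that this sketch does not attempt.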


Development of Drought Map Based on Three-dimensional Spatio-temporal Analysis of Drought (가뭄사상에 대한 3차원적 시공간 분석을 통한 가뭄지도 개발)

  • Yoo, Jiyoung;So, Byung-Jin;Kwon, Hyun-Han;Kim, Tae-Woong
    • KSCE Journal of Civil and Environmental Engineering Research / v.40 no.1 / pp.25-33 / 2020
  • A drought event is characterized by its duration, severity, and affected area. In general, after a drought index is calculated from the hydro-meteorological time series at a station, a drought event is defined using run theory to identify its beginning and end times. However, this one-dimensional analysis is limited in its ability to capture the spatio-temporal occurrence characteristics and movement paths of drought. This study therefore defines a three-dimensional drought event using a simple clustering algorithm and develops a drought map that can be used to understand drought severity according to the spatio-temporal expansion of drought. Compared with two-dimensional monitoring information that shows only the spatial distribution of a drought index, the proposed drought map can show three-dimensional drought characteristics, including drought duration, spatial cumulative severity, and the centroid of the drought. The analysis of the drought map indicated that there was a drought event whose affected area was less than 10 %, whereas 11 drought events (44 %) affected more than 90 % of the total area. This means that it is important to understand the relationship between the spatial variation of the drought-affected area and the severity corresponding to various drought durations. A drought map based on three-dimensional drought analysis is useful for analyzing the spatio-temporal occurrence characteristics and propagation patterns of regional drought, and it can be used in developing mitigation measures for future extreme droughts.
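The abstract does not say which "simple clustering algorithm" is used. One plausible reading, taken here purely as an assumption, is that grid cells whose drought index falls below a threshold are grouped into one event whenever they are adjacent in time or space. The sketch below follows that reading; the function name `drought_events`, the threshold of -1.0, the 6-neighbour adjacency, and the output keys are all hypothetical.

```python
from collections import Counter, deque

# Neighbours in (time, row, column): one step along exactly one axis.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def drought_events(index, threshold=-1.0):
    """Group drought cells of index[t][y][x] into 3-D connected events."""
    n_t, n_y, n_x = len(index), len(index[0]), len(index[0][0])
    seen = set()
    events = []
    for t in range(n_t):
        for y in range(n_y):
            for x in range(n_x):
                if index[t][y][x] >= threshold or (t, y, x) in seen:
                    continue
                # Breadth-first search collects every cell connected to this one.
                queue = deque([(t, y, x)])
                seen.add((t, y, x))
                cells = []
                while queue:
                    ct, cy, cx = queue.popleft()
                    cells.append((ct, cy, cx))
                    for dt, dy, dx in NEIGHBOURS:
                        nt, ny, nx = ct + dt, cy + dy, cx + dx
                        inside = 0 <= nt < n_t and 0 <= ny < n_y and 0 <= nx < n_x
                        if inside and (nt, ny, nx) not in seen \
                                and index[nt][ny][nx] < threshold:
                            seen.add((nt, ny, nx))
                            queue.append((nt, ny, nx))
                per_step = Counter(c[0] for c in cells)  # drought cells per time step
                events.append({
                    "duration": max(per_step) - min(per_step) + 1,
                    # Cumulative severity: total deficit below the threshold.
                    "severity": sum(threshold - index[a][b][c] for a, b, c in cells),
                    # Centroid in (time, row, column) coordinates.
                    "centroid": tuple(sum(c[i] for c in cells) / len(cells)
                                      for i in range(3)),
                    "peak_area_fraction": max(per_step.values()) / (n_y * n_x),
                })
    return events
```

Each returned dictionary carries the three characteristics the abstract names (duration, cumulative severity, centroid) plus the largest fraction of the grid affected at any one time step, which corresponds to the affected-area percentages quoted above.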

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.53-77 / 2012
  • This study analyzes the differences in contents and tones of arguments among three major Korean newspapers: the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of argument when they cover sensitive issues and topics. It can be problematic if readers read the news without being aware of a newspaper's tone of argument, because the contents and tones of arguments can easily affect readers. It is therefore desirable to have a tool that can inform readers of the tone of argument a newspaper has. This study presents the results of clustering and classification techniques as part of a text mining analysis. We focus on six main subjects in newspapers (Culture, Politics, International, Editorial-opinion, Eco-business, and National issues) and attempt to identify differences and similarities among the newspapers. The basic unit of the text mining analysis is a paragraph of a news article. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean integrated news database system, which preserves news articles of the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo and makes them open to the public. About 3,030 articles from 2008 to 2012 were used. The International, National issues, and Politics sections were gathered around specific issues: the International section with the keyword 'Nuclear weapon of North Korea,' the National issues section with the keyword '4-major-river,' and the Politics section with the keyword 'Tonghap-Jinbo Dang.' All articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected. 
All of the collected data were processed and split into paragraphs. We removed stop-words using the Lucene Korean Module. We calculated keyword co-occurrence counts from the list of keyword pairs co-occurring in a paragraph and built a co-occurrence matrix from that list. Once the co-occurrence matrix was built, we used the cosine coefficient matrix as input for PFNet (Pathfinder Network). To analyze the three newspapers and find the significant keywords in each paper, we examined the list of the 10 highest-frequency keywords and the keyword networks of the 20 highest-frequency keywords to closely examine the relationships and show a detailed network map among keywords. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was first carried out to identify how the tone of argument of one newspaper differs from the others. Then, to analyze tones of arguments, all the paragraphs were divided into two types of tone, Positive and Negative. To identify and classify the tones of all the paragraphs and articles we had collected, a supervised learning technique was used: the Naïve Bayesian classifier provided in the MALLET package classified all the paragraphs in the articles. After classification, Precision, Recall, and F-value were used to evaluate the results. Based on the results of this study, three subjects (Culture, Eco-business, and Politics) showed some differences in contents and tones of arguments among the three newspapers. In addition, for National issues, the tones of arguments on the 4-major-rivers project differed from one another. It seems that the three newspapers have their own specific tones of argument in those sections. The keyword networks also showed different shapes from one another for the same period and the same section. 
This means that the frequently appearing keywords in the articles differ and that their contents are composed of different keywords. The Positive-Negative classification also showed that newspapers' tones of arguments can be classified relative to one another. These results indicate that the approach in this study is promising and could be extended into a new tool for identifying the different tones of arguments of newspapers.
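The keyword pipeline in this abstract (paragraph-level co-occurrence counts, then a cosine coefficient matrix fed to the Pathfinder Network) can be illustrated with a small sketch. It assumes that each paragraph has already been reduced to a list of keywords with stop-words removed, and that the cosine coefficient is the ordinary cosine similarity between rows of the co-occurrence matrix; the paper may define it differently. The function names are hypothetical, and the PFNet step itself is not shown.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_matrix(paragraphs):
    """Count, for each keyword pair, the paragraphs in which both appear."""
    vocab = sorted({word for para in paragraphs for word in para})
    position = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for para in paragraphs:
        # set() so a keyword repeated within one paragraph counts once.
        for a, b in combinations(sorted(set(para)), 2):
            matrix[position[a]][position[b]] += 1
            matrix[position[b]][position[a]] += 1
    return vocab, matrix


def cosine_matrix(matrix):
    """Cosine similarity between every pair of rows of the matrix."""
    norms = [sqrt(sum(v * v for v in row)) for row in matrix]
    size = len(matrix)
    result = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if norms[i] and norms[j]:  # leave 0.0 for all-zero rows
                dot = sum(a * b for a, b in zip(matrix[i], matrix[j]))
                result[i][j] = dot / (norms[i] * norms[j])
    return result
```

Two keywords get a high cosine value when they co-occur with the same set of other keywords, even if they rarely appear together themselves, which is why the cosine matrix rather than the raw counts is a common input for network pruning.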

The Behavior Analysis of Exhibition Visitors using Data Mining Technique at the KIDS & EDU EXPO for Children (유아교육 박람회에서 데이터마이닝 기법을 이용한 전시 관람 행동 패턴 분석)

  • Jung, Min-Kyu;Kim, Hyea-Kyeong;Choi, Il-Young;Lee, Kyoung-Jun;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.17 no.2 / pp.77-96 / 2011
  • An exhibition is a market event held for a specific duration to present exhibitors' main products to business or private visitors, and it plays a key role as an effective marketing channel. As the importance of exhibitions has grown, the domestic exhibition industry has achieved great quantitative growth. In contrast, its qualitative growth has lagged behind. To improve the quality of exhibitions, we need to understand the preferences and behavioral characteristics of visitors and use that understanding to raise their level of attention and satisfaction. In this paper, we therefore used the observation survey method, a kind of field research, to understand visitors and collect real data for behavior pattern analysis. The proposed methodology framework consists of three steps. The first step is to select a suitable exhibition to which the method can be applied. The second step is to carry out the observation survey and collect real data for further analysis. In this paper, we conducted the observation survey at the KIDS & EDU EXPO for Children in SETEC, covering 160 visitors and 78 booths from November 4th to 6th, 2010. The last step is to analyze the data recorded through observation. In this step, we first analyze the features of the exhibition using the demographic characteristics collected by the observation survey, and then analyze individual booth features from the records of visited booths. Through the analysis of individual booth features, we can figure out what kinds of events attract visitors' attention and what kinds of marketing activities affect visitors' behavior patterns. 
However, since previous research considered only the individual features influenced by an exhibition, the correlation among features has not been studied much. In this research, additional analysis with data mining techniques is therefore carried out to supplement the existing research. We analyze the relations among booths using data mining techniques to identify visitors' behavior patterns. We use two data mining techniques: clustering analysis and ARM (Association Rule Mining) analysis. In the clustering analysis, we use the K-means algorithm to figure out the correlation among booths. Through these techniques, we find two important features that affect visitors' behavior patterns in an exhibition: the geographical features of booths and the exhibit contents of booths. These features can be considered when an organizer plans the next exhibition. The results of our analysis are therefore expected to provide guidelines for understanding visitors and valuable insights for the earlier phases of exhibition planning. This research should also help to increase visitor satisfaction. Visitors' movement paths, booth locations, and the distances between booths can be taken into account when planning the next exhibition in advance. This research was conducted at the KIDS & EDU EXPO for Children in SETEC (Seoul Trade Exhibition & Convention), so there are constraints on applying it directly to other exhibitions. The results were also derived from a limited number of data samples. To obtain more accurate and reliable results, more experiments based on larger data samples and on exhibitions of various genres are needed.
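The ARM (Association Rule Mining) analysis mentioned in this abstract can be illustrated with the two standard rule measures, support and confidence, computed over pairs of booths. The sketch below is a hedged illustration only: the function name `booth_rules`, the thresholds, and the restriction to single-booth rules "visited A, so also visited B" are assumptions, and the paper's actual rule-mining procedure is not described in the abstract.

```python
from itertools import permutations


def booth_rules(visits, min_support=0.5, min_confidence=0.6):
    """Find rules A -> B between booths from per-visitor booth lists.

    support(A -> B)    = share of visitors who visited both A and B
    confidence(A -> B) = share of A's visitors who also visited B
    """
    records = [set(v) for v in visits]
    total = len(records)
    booths = sorted(set().union(*records))
    rules = []
    for a, b in permutations(booths, 2):
        visited_a = sum(1 for r in records if a in r)
        visited_both = sum(1 for r in records if a in r and b in r)
        support = visited_both / total
        confidence = visited_both / visited_a if visited_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules
```

With visit records such as `[["A", "B"], ["A", "B", "C"], ["A", "C"], ["B"]]`, where the booth labels are made up, the rule C -> A has confidence 1.0 because every visitor to C also visited A, whereas A -> C only reaches about 0.67. Asymmetries of this kind are what make rules informative about movement between booths.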

High-Risk Area for Human Infection with Avian Influenza Based on Novel Risk Assessment Matrix (위험 매트릭스(Risk Matrix)를 활용한 조류인플루엔자 인체감염증 위험지역 평가)

  • Sung-dae Park;Dae-sung Yoo
    • Korean Journal of Poultry Science / v.50 no.1 / pp.41-50 / 2023
  • Over the last decade, avian influenza (AI) has been considered an emerging disease that could become the next pandemic, particularly in countries like South Korea, with continuous animal outbreaks. In this situation, risk assessment is needed to prevent and prepare for human infection with AI. We therefore developed a risk assessment matrix for identifying high-risk areas of human infection with AI in South Korea, based on the notion that risk is the product of hazard and vulnerability. The matrix took the number of highly pathogenic avian influenza (HPAI) outbreaks in poultry farms as the hazard and the number of poultry-associated production facilities as the vulnerability. The average number of HPAI outbreaks in poultry farms across 229 municipalities, which forms the hazard axis of the matrix, was predicted using negative binomial regression on nationwide outbreak data from 2003 to 2018. The two components of the matrix were each classified into five groups using the K-means clustering algorithm and then multiplied, producing an area-specific risk level of human infection. As a result, Naju-si, Jeongeup-si, and Namwon-si were categorized as high-risk areas for human infection with AI. These findings should contribute to designing policies on human infection that minimize socio-economic damage.
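The matrix construction described in this abstract, grouping each axis into levels with K-means and multiplying the two levels, might look like the following sketch. The function names, the evenly spaced initial centres, and the convention that level 1 is the lowest group are assumptions made for illustration, and the area names in the usage note are placeholders rather than data from the study.

```python
def kmeans_levels(values, k=5, iters=100):
    """Cluster scalar values with k-means; return a level 1..k per value.

    Level 1 is the cluster with the lowest centre, level k the highest.
    """
    lo, hi = min(values), max(values)
    # Assumed initialisation: centres spread evenly over the value range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(iters):
        labels = assign(centres)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:
            break
        centres = updated
    # Rank clusters by centre so that levels are ordered from low to high.
    order = sorted(range(k), key=lambda j: centres[j])
    level_of = {cluster: rank + 1 for rank, cluster in enumerate(order)}
    return [level_of[lab] for lab in assign(centres)]


def risk_levels(hazard, vulnerability, k=5):
    """Risk per area = hazard level x vulnerability level (each 1..k)."""
    areas = sorted(hazard)
    h = kmeans_levels([hazard[a] for a in areas], k)
    v = kmeans_levels([vulnerability[a] for a in areas], k)
    return {a: hl * vl for a, hl, vl in zip(areas, h, v)}
```

For example, `risk_levels({"area_a": 0.1, "area_b": 5.5}, {"area_a": 3, "area_b": 42}, k=2)` assigns the highest product to `area_b`. With the default `k=5` the scores range from 1 to 25, matching a five-by-five matrix.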