• Title/Summary/Keyword: Big Data Clustering

Medical Image Analysis Using Artificial Intelligence

  • Yoon, Hyun Jin;Jeong, Young Jin;Kang, Hyun;Jeong, Ji Eun;Kang, Do-Young
    • Progress in Medical Physics / v.30 no.2 / pp.49-58 / 2019
  • Purpose: Automated analytical systems have begun to emerge as database systems that allow medical images to be scanned and processed on computers and big data to be constructed. Deep-learning artificial intelligence (AI) architectures have been developed and applied to medical images, making high-precision diagnosis possible. Materials and Methods: For diagnosis, the medical images need to be labeled and standardized. After the data are pre-processed and entered into the deep-learning architecture, the final diagnosis results can be obtained quickly and accurately. To mitigate overfitting caused by an insufficient amount of labeled data, data augmentation is performed through rotation and left-right flips to artificially increase the amount of data. Because various deep-learning architectures have been developed and published over the past few years, diagnosis results can be obtained simply by entering a medical image. Results: Classification and regression are performed by supervised machine-learning methods, and clustering and generation are performed by unsupervised machine-learning methods. When the convolutional neural network (CNN) method is applied in the deep-learning layers, feature extraction can be used to classify diseases very efficiently and thus to diagnose various diseases. Conclusions: AI based on deep-learning architectures has shown expertise in medical image analysis of the nerves, retina, lungs, digital pathology, breast, heart, abdomen, and musculoskeletal system.
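
The augmentation step described in this abstract (rotation plus left-right flips) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes images are small 2D lists of pixel values and that rotations are restricted to multiples of 90 degrees, and the function names are hypothetical.

```python
def flip_left_right(image):
    """Mirror each row of a 2D image given as a list of rows."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image, its three 90-degree rotations,
    and the left-right flip of each of those four images."""
    variants = [image]
    rotated = image
    for _ in range(3):
        rotated = rotate_90_clockwise(rotated)
        variants.append(rotated)
    # Assumption: flips are applied to every rotated variant as well.
    variants.extend(flip_left_right(v) for v in list(variants))
    return variants


sample = [[1, 2],
          [3, 4]]
augmented = augment(sample)
print(len(augmented), "images generated from one labeled image")
```

Each labeled image yields eight training samples that share the original label, which is how augmentation enlarges a small labeled set.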

Development of Multidimensional Analysis System for Bio-pathways (바이오 패스웨이 다차원 분석 시스템 개발)

  • Seo, Dongmin;Choi, Yunsoo;Jeon, Sun-Hee;Lee, Min-Ho
    • The Journal of the Korea Contents Association / v.14 no.11 / pp.467-475 / 2014
  • With the development of genomics, wearable devices, and IT/NT, a vast amount of biomedical data has recently been generated. Healthcare industries based on big data are also booming, and big-data technology for biomedical data is rapidly emerging as a core technology for improving national health and supporting an aging society. A pathway is deep biological knowledge that represents, as a network, the dynamics and interactions among proteins, genes, and cells. Pathways are widely used as an important part of biomedical big-data analysis. However, pathway analysis requires a lot of time and effort because pathways are highly diverse and voluminous. Moreover, multidimensional analysis systems for various pathways do not yet exist. In this paper, we propose a pathway analysis system that collects pathways of interest to the user from the KEGG pathway database, which provides the most widely used pathways, constructs a network based on the hierarchical structure of the pathways, and analyzes the dynamics and interactions among pathways by clustering the network and selecting core pathways from it. Finally, to verify the superiority of our pathway analysis system, we evaluate its performance in various experiments.

Analysis of the abstracts of research articles in food related to climate change using a text-mining algorithm (텍스트 마이닝 기법을 활용한 기후변화관련 식품분야 논문초록 분석)

  • Bae, Kyu Yong;Park, Ju-Hyun;Kim, Jeong Seon;Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1429-1437 / 2013
  • Research articles on food related to climate change were analyzed using a text-mining algorithm, one of the unstructured data analysis tools in big data analysis, with a focus on the frequencies of terms appearing in the abstracts. As a first step, a term-document matrix was constructed. A hierarchical clustering algorithm, based on dissimilarities among the selected terms and on expertise in the field, was then applied to classify the documents under consideration into a few labeled groups. Through this research, we were able to identify important topics in the field of food related to climate change and their trends over past years. The results are expected to be useful for future research on systematic responses and adaptation to climate change.
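
The two steps named in this abstract, building a term-document matrix and clustering terms hierarchically by dissimilarity, can be sketched as follows. This is only an illustration under stated assumptions: whitespace tokenization, cosine dissimilarity, and single-linkage merging are choices made here, not details given in the abstract, and the toy documents are invented.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] is the count of terms[i] in docs[j]."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors, k):
    """Single-linkage agglomerative clustering down to k clusters of row indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(cosine_dissimilarity(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    return clusters


# Invented toy abstracts, for illustration only.
docs = ["rice yield drought", "drought rice yield",
        "fish ocean warming", "ocean warming fish"]
terms, matrix = term_document_matrix(docs)
groups = [sorted(terms[i] for i in c) for c in agglomerate(matrix, 2)]
print(groups)
```

In practice the resulting groups would still need labels assigned by a domain expert, as the abstract notes.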

A Study on Social Issues and Consumption Behavior Using Big Data (빅데이터를 활용한 사회적 이슈와 소비행동 연구)

  • Baek, Seung-Heon;Kim, Gi-Tak
    • Journal of Korea Entertainment Industry Association / v.13 no.8 / pp.377-389 / 2019
  • This study conducted a social network big data analysis to investigate consumers' perceptions of Japanese sporting goods in relation to the Japanese boycott and to extract the problems and variables associated with those perceptions. The social network big data analysis was conducted in two areas, "Japanese boycott" and "Japanese sporting goods". Months of data were collected and investigated. The research method consisted of the following steps: identifying the issues of the times; setting keywords using social network analysis; clustering using CONCOR analysis with the TEXTOM and Ucinet 6 programs; selecting variables through expert meetings; preparing and administering the questionnaire; verifying the validity and reliability of the questionnaire; and verifying the hypotheses using a structural equation model. Based on the results of the social network big data analysis, four variables were extracted: boycott-related characteristics, nationality, attitude, and consumption behavior. A total of 30 questions and 292 questionnaires were used for the final hypothesis verification. The analysis showed, first, that the boycott-related characteristics had a positive relationship with nationality. Specifically, all of the characteristics related to boycotts (necessary boycott, sense of boycott, and perceived boycott benefits) were positively related to nationality. In addition, nationality was found to have a positive relationship with consumption behavior.

WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society / v.19 no.1 / pp.51-58 / 2018
  • As the number of SNS users and the volume of SNS data have increased explosively, research based on SNS big data has become active. In social mining, Latent Dirichlet Allocation (LDA), a typical topic model technique, is used to identify the similarity of texts in large volumes of unclassified SNS text data and to extract trends from them. However, LDA has difficulty inferring high-level topics from short-sentence data because infrequent word occurrences make the data semantically sparse. BTM addressed this limitation of LDA by modeling combinations of two words. However, BTM has its own limitation: because it is influenced more by the high-frequency word in each word pair, it cannot compute weights that take the relation to each topic into account. In this paper, we propose a technique that improves the accuracy of the existing BTM by reflecting the semantic relations between words.
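
The "combination of two words" that BTM relies on is usually called a biterm: an unordered pair of words that co-occur in the same short text. A minimal sketch of biterm extraction follows. It covers only this preprocessing step, not topic inference and not the word-vector weighting proposed in the paper; the whitespace tokenization and the example posts are assumptions made here.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(texts):
    """Count biterms over a corpus of short texts (whitespace tokenization assumed)."""
    counts = Counter()
    for text in texts:
        counts.update(extract_biterms(text.lower().split()))
    return counts


# Invented short posts, for illustration only.
posts = ["coffee seoul morning", "morning coffee"]
biterm_counts = count_biterms(posts)
print(biterm_counts.most_common(1))
```

Because biterms are pooled over the whole corpus, a word pair seen in several short posts gains weight even though each post alone is too sparse for a document-level model such as LDA.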

Analyzing fashion item purchase patterns and channel transition patterns using association rules and brand loyalty in big data (빅데이터의 연관규칙과 브랜드 충성도를 활용한 패션품목 구매패턴과 구매채널 전환패턴 분석)

  • Ki Yong Kwon
    • The Research Journal of the Costume Culture / v.32 no.2 / pp.199-214 / 2024
  • Until now, research on consumers' purchasing behavior has primarily focused on psychological aspects or depended on consumer surveys. However, there may be a gap between consumers' self-reported perceptions and their observable actions. In response, this study aimed to investigate consumer purchasing behavior utilizing a big data approach. To this end, this study investigated the purchasing patterns of fashion items, both online and in retail stores, from a data-driven perspective. We also investigated whether individual consumers switched between online websites and retail establishments for making purchases. Data on 516,474 purchases were obtained from fashion companies. We used association rule analysis and K-means clustering to identify purchase patterns that were influenced by customer loyalty. Furthermore, sequential pattern analysis was applied to investigate the usage patterns of online and offline channels by consumers. The results showed that high-loyalty consumers mainly purchased infrequently bought items in the brand line, as well as high-priced items, and that these purchase patterns were similar both online and in stores. In contrast, the low-loyalty group showed different purchasing behaviors for online versus in-store purchases. In physical environments, the low-loyalty consumers tended to purchase less popular or more expensive items from the brand line, whereas in online environments, their purchases centered around items with relatively high sales volumes. Finally, we found that both high and low loyalty groups exclusively used a single preferred channel, either online or in-store. The findings help companies better understand consumer purchase patterns and build future marketing strategies around items with high brand centrality.
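
Association rule analysis of purchase data rests on two standard measures, support and confidence. The sketch below computes both for a single rule. It is an illustration under stated assumptions: each purchase is represented as a set of item names, the baskets are invented, and nothing here reproduces the study's data or its K-means and sequential pattern steps.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Invented baskets, for illustration only.
baskets = [
    {"coat", "scarf"},
    {"coat", "scarf", "boots"},
    {"coat"},
    {"boots"},
]
rule_support = support(baskets, {"coat", "scarf"})
rule_confidence = confidence(baskets, {"coat"}, {"scarf"})
print(f"coat -> scarf: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Rules are typically kept only when both measures exceed chosen thresholds; the thresholds used in the study are not stated in the abstract.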

Spatial Clustering Analysis based on Text Mining of Location-Based Social Media Data (위치기반 소셜 미디어 데이터의 텍스트 마이닝 기반 공간적 클러스터링 분석 연구)

  • Park, Woo Jin;Yu, Ki Yun
    • Journal of Korean Society for Geospatial Information Science / v.23 no.2 / pp.89-96 / 2015
  • Location-based social media data have high potential for use in various areas such as big data and location-based services. In this study, we applied a series of analysis methods to determine how important keywords in location-based social media are spatially distributed by analyzing text information. For this purpose, we collected geo-tagged tweets from the Gangnam district of Seoul and its environs during August 2013. Principal keywords were extracted from these tweets. Among them, keywords in three categories (food, entertainment, and work and study) were selected and classified by category. Spatial clustering was then applied to the tweets containing keywords in each category. The clusters of each category were compared with buildings and benchmark POIs at the same locations. In this comparison, clusters in the food category showed high consistency with large-scale commercial areas. Clusters in the entertainment category corresponded to theaters and sports complexes. Clusters in the work and study category showed high consistency with areas where private institutes and office buildings are concentrated.

Clustering for Home Healthcare Service Satisfaction using Parameter Selection

  • Lee, Jae Hong;Kim, Hyo Sun;Jung, Yong Gyu;Cha, Byung Heon
    • International Journal of Advanced Culture Technology / v.7 no.2 / pp.238-243 / 2019
  • The importance of big data continues to be emphasized, and big data is applied in various fields through data mining techniques, with a great influence on the healthcare industry. The healthcare industry has many sectors, but only home healthcare is considered here. Applying these techniques to real problems does not always give perfect results. Therefore, data mining techniques are used to address these problems, and the algorithms that affect performance are evaluated. This paper focuses on the effects of healthcare services on patient satisfaction. The CVParameterSelection algorithm and the SMOreg algorithm, both among the classification methods of data mining, were evaluated through experiments and verification of the results. We analyzed the services of home healthcare institutions and patient satisfaction based on the name, address, and services provided by the institution, the mood of the patients, and so on. In particular, we evaluated the results of cross-validation using these two algorithms. However, the presence of variables that affect the outcome means the results are not perfect. The research in this paper was conducted using the cluster analysis method of the Weka system.

ACCELERATION OF MACHINE LEARNING ALGORITHMS BY TCHEBYCHEV ITERATION TECHNIQUE

  • LEVIN, MIKHAIL P.
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.22 no.1 / pp.15-28 / 2018
  • Machine Learning algorithms are now widely used to process Big Data in various applications, and many of these applications are executed at run time. The speed of Machine Learning algorithms is therefore a critical issue in these applications. However, most modern iterative Machine Learning algorithms use the successive iteration technique well known in Numerical Linear Algebra. This technique converges very slowly: it needs many iterations to solve the problems under consideration and therefore a lot of processing time, even on modern multi-core computers and clusters. The Tchebychev iteration technique is well known in Numerical Linear Algebra as an attractive candidate for decreasing the number of iterations in iterative Machine Learning algorithms and also for decreasing their running time, which is especially important in run-time applications. In this paper we consider the use of Tchebychev iterations to accelerate the well-known K-Means and SVM (Support Vector Machine) clustering algorithms in Machine Learning. Some examples of the use of our approach on modern multi-core computers under the Apache Spark framework are considered and discussed.

A Study on Initial Seeds Selection of K-Means for Big Data Clustering (빅데이터 클러스터링을 위한 K-Means 초기 중심 선정 연구)

  • Kim, Yeong-Ju;Heo, Yu-Gyeong;Back, Jong-Sang;Jeong, Hwan-Jong;Lee, Sung-Ro;Jung, Min-A
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.750-752 / 2014
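  • The K-Means algorithm is easy to implement and has a time complexity of O(n) for n patterns, which makes it widely used on large-scale data. However, the choice of initial cluster centers determines the number of assignment-recomputation iterations and the clustering result. In this paper, we review studies on the selection of initial cluster centers for the K-Means algorithm and propose a method for selecting initial K-Means centers by applying systematic random sampling. The proposed method can reduce clustering time on large-scale data and improve accuracy.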
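
Systematic random sampling picks a random starting position and then takes every step-th element. A minimal sketch of using it to choose k initial centers follows. The abstract does not give the exact procedure (for example, whether the data are ordered first), so the function name, the fixed step of n // k, and the use of the data in its given order are assumptions made for illustration.

```python
import random


def systematic_seeds(points, k, rng=None):
    """Pick k initial centers by systematic sampling: a random start
    within the first interval, then every step-th point after it."""
    rng = rng or random.Random()
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    step = n // k                  # sampling interval
    start = rng.randrange(step)    # random offset in [0, step)
    return [points[start + i * step] for i in range(k)]


# Hypothetical 1-D data, for illustration only.
data = list(range(10))
seeds = systematic_seeds(data, 3, rng=random.Random(0))
print(seeds)
```

Compared with drawing k points uniformly at random, the seeds are spread evenly across the data order, so two initial centers never come from adjacent positions.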