• Title/Summary/Keyword: Public Dataset

Medical Costs between Dietary Supplement Users and Non-users Using the Korea Health Panel Data (한국의료패널 자료를 활용한 건강기능식품 섭취에 따른 의료비 지출 비교분석)

  • Hye-Young Kwon;Soohyun Oh
    • Health Policy and Management, v.34 no.1, pp.87-93, 2024
  • Background: In recent years, studies have shown conflicting results regarding the benefits of dietary supplements in reducing healthcare expenditures. This study aimed to address this inconsistency by examining the association between supplement consumption and health expenditures using nationally representative data from the Korea Health Panel Survey (2019-2020). Methods: A 1:1 matched case-control dataset was established using propensity score matching based on supplement consumption. Total annual healthcare expenditures were then compared between the two groups. In addition, a multivariate regression analysis (Proc Surveyreg) was performed to determine the association between supplement consumption and medical costs. Results: The supplement user group spent about 1.72 million Korean won, while the non-user group spent about 1.43 million Korean won on medical services (p=0.0186). The results of the multivariate regression showed that costs were approximately 26.15% higher in the user group than in the non-user group (p=0.0004). Conclusion: Contrary to previous studies that showed the benefits of supplement use in reducing healthcare costs, this study showed that those who consistently consumed supplements spent more on medical services. This can be interpreted in the same context as previous studies suggesting that dietary supplement intake is a healthy behavior for managing one's health. However, we caution against drawing firm conclusions due to data limitations. Further analysis using patient-level epidemiologic data is needed.
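
The 1:1 matching step described in the Methods can be illustrated with a minimal sketch. It assumes propensity scores have already been estimated, and it uses greedy nearest-neighbor matching without replacement and a caliper; the abstract does not state which matching algorithm or caliper was used, so the function name `match_one_to_one`, the `caliper` parameter, and the toy scores are all hypothetical.

```python
def match_one_to_one(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on precomputed propensity scores.

    treated / controls: dict mapping a subject id to its propensity score.
    caliper: largest score difference accepted for a pair (an assumption;
    the abstract does not report one).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used (matching without replacement)
    pairs = []
    # Process treated subjects in ascending score order for reproducibility.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Toy scores, invented for illustration only.
users = {"u1": 0.30, "u2": 0.62}
non_users = {"n1": 0.31, "n2": 0.60, "n3": 0.90}
pairs = match_one_to_one(users, non_users)
```

Expenditures would then be compared across the matched pairs; a control with no close counterpart (here `n3`) is simply left unmatched.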

A School-tailored High School Integrated Science Q&A Chatbot with Sentence-BERT: Development and One-Year Usage Analysis (인공지능 문장 분류 모델 Sentence-BERT 기반 학교 맞춤형 고등학교 통합과학 질문-답변 챗봇 -개발 및 1년간 사용 분석-)

  • Gyeongmo Min;Junehee Yoo
    • Journal of The Korean Association For Science Education, v.44 no.3, pp.231-248, 2024
  • This study developed a chatbot for first-year high school students, employing open-source software and the Korean Sentence-BERT model for AI-powered document classification. The chatbot uses the Sentence-BERT model to find the six Q&A pairs most similar to a student's query and presents them in a carousel format. The initial dataset, built from online resources, was refined and expanded based on student feedback and usability throughout the operational period. By the end of the 2023 academic year, the chatbot integrated a total of 30,819 datasets and recorded 3,457 student interactions. Analysis revealed that students tended to use the chatbot when prompted by teachers during classes and, primarily, during self-study sessions after school, with an average of 2.1 to 2.2 inquiries per session, mostly via mobile phones. Text mining showed that student input terms covered not only science-related queries but also aspects of school life such as assessment scope. Topic modeling using BERTopic, based on Sentence-BERT, categorized 88% of student questions into 35 topics, shedding light on common student interests. A year-end survey confirmed the efficacy of the carousel format and the chatbot's role in addressing curiosities beyond integrated science learning objectives. This study underscores the importance of developing chatbots tailored for student use in public education and highlights their educational potential through long-term usage analysis.
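
The retrieval step (finding the six stored Q&A pairs most similar to a query) can be sketched as a cosine-similarity ranking over sentence embeddings. This is only an illustration: it assumes the embeddings have already been produced by a sentence encoder such as Sentence-BERT, and the function names and toy 2-dimensional vectors are hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_similar(query_vec, qa_items, k=6):
    """Return the k stored Q&A items most similar to the query embedding.

    qa_items: list of (question, answer, embedding) tuples; the embeddings
    are assumed to come from the same encoder as query_vec.
    """
    scored = [(cosine(query_vec, vec), q, a) for q, a, vec in qa_items]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings", invented for illustration.
qa_items = [
    ("What is inertia?", "Resistance to a change in motion.", [1.0, 0.0]),
    ("When is the exam?", "Ask your teacher.", [0.0, 1.0]),
    ("What is momentum?", "Mass times velocity.", [1.0, 1.0]),
]
results = top_k_similar([1.0, 0.0], qa_items, k=2)
```

The returned list could be rendered directly as carousel cards, ordered from most to least similar.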

Comprehensive RNA-sequencing analysis of colorectal cancer in a Korean cohort

  • Jaeim Lee;Jong-Hwan Kim;Hoang Bao Khanh Chu;Seong-Taek Oh;Sung-Bum Kang;Sejoon Lee;Duck-Woo Kim;Heung-Kwon Oh;Ji-Hwan Park;Jisu Kim;Jisun Kang;Jin-Young Lee;Sheehyun Cho;Hyeran Shim;Hong Seok Lee;Seon-Young Kim;Young-Joon Kim;Jin Ok Yang;Kil-yong Lee
    • Molecules and Cells, v.47 no.3, pp.100033.1-100033.13, 2024
  • Considering the recent increase in the number of colorectal cancer (CRC) cases in South Korea, we aimed to clarify the molecular characteristics of CRC unique to the Korean population. To gain insights into the complexities of CRC and promote the exchange of critical data, RNA-sequencing analysis was performed to reveal the molecular mechanisms that drive the development and progression of CRC; this analysis is critical for developing effective treatment strategies. We performed RNA-sequencing analysis of CRC and adjacent normal tissue samples from 214 Korean participants (381 samples in total, comprising 169 normal and 212 tumor samples) to investigate differential gene expression between the groups. We identified 19,575 genes expressed in CRC and normal tissues, with 3,830 differentially expressed genes (DEGs) between the groups. Functional annotation analysis revealed that the upregulated DEGs were significantly enriched in pathways related to the cell cycle, DNA replication, and IL-17, whereas the downregulated DEGs were enriched in metabolic pathways. We also analyzed the relationship between clinical information and subtypes using the Consensus Molecular Subtype (CMS) classification. Furthermore, we compared groups clustered within our dataset to CMS groups and performed additional analysis of the methylation data between DEGs and CMS groups to provide comprehensive biological insights from various perspectives. Our study provides valuable insights into the molecular mechanisms underlying CRC in Korean patients and serves as a platform for identifying potential target genes for this disease. The raw data and processed results have been deposited in a public repository for further analysis and exploration.

Analysis of Factors Affecting the Length of Stay in Children(Aged 0 to 12) with Injuries: Centering Around the Data from the Korea National Hospital Discharge In-Depth Injury Surveys (어린이(0-12세) 손상환자의 재원일수에 미치는 요인분석: 퇴원손상심층자료를 중심으로)

  • Lee Chae Kyung
    • The Journal of the Convergence on Culture Technology, v.9 no.3, pp.137-143, 2023
  • This study analyzed factors affecting the length of stay of children (aged 0 to 12) with injuries by examining the relationships between length of stay and patient characteristics. The study covered 7,804 patients aged 0 to 12 who were included in the Korea National Hospital Discharge In-Depth Injury Surveys, were diagnosed with injuries, their sequelae, or other consequences of external causes (S00-T98), and were discharged between 1 January 2016 and 31 December 2020. A frequency analysis, independent samples t-test, and ANOVA were performed, and a regression analysis was performed to identify factors affecting the length of stay. The average length of stay for the patients in this study was 5.5 days. The length of stay was longer for school-age children (aged 7 to 12) than for preschoolers (aged 0 to 6), and longer for children who had either public or private coverage than for children who did not. The length of stay for children admitted to a hospital in a rural area (Jeolla-do or Gyeongsang-do) was longer than that for children admitted to a hospital in a metropolitan area, and the length of stay for children admitted to a hospital with 100-299 beds was relatively long. Children who first visited a hospital for outpatient care had relatively short stays, whereas children who had been burned or injured in traffic crashes had relatively long stays. Children who got a secondary diagnosis and had a principal procedure or who died after being discharged were in hospital for a long time. The findings of this study should be useful, as they identify characteristics related to the length of stay for Korean children with injuries and the factors that determine it, based on a national dataset, namely the Korea National Hospital Discharge In-Depth Injury Surveys. The risk of child injuries can be reduced by taking preventive actions and providing safety education programs. The present study provides essential baseline data for the provision of active care for child injuries and the establishment of a range of policies for child injury prevention.

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems, v.26 no.1, pp.1-21, 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been used in various industries for practical applications. In business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. We therefore propose a new methodology that estimates the market sizes of product groups at a more detailed level than previously available. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, data on product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data for the extracted products is summed to estimate the market size of each product group. As experimental data, product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and then used a vector dimension of 300 and a window size of 15 for further experiments. We used the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining another measure such as Jaccard similarity with Word2Vec. Also, the product group clustering method could be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect to further improve the performance of the basic model conceptually proposed in this study.
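
The last two steps of the process (grouping product names by cosine similarity to an index word, then summing their sales) can be sketched as follows. This is a simplified illustration that assumes word vectors are already trained; the function name `estimate_market_size`, the threshold of 0.8, and the toy vectors and sales figures are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def estimate_market_size(index_vec, products, threshold=0.8):
    """Group products whose name vector is close to an index-word vector and sum their sales.

    products: list of (name, vector, sales) tuples.
    threshold: minimum cosine similarity for group membership (an assumed value).
    Returns (list of grouped product names, total sales of the group).
    """
    group = [(name, sales) for name, vec, sales in products
             if cosine(index_vec, vec) >= threshold]
    return [name for name, _ in group], sum(sales for _, sales in group)


# Toy 2-dimensional vectors and sales figures, invented for illustration.
products = [
    ("led lamp", [1.0, 0.1], 100),
    ("led bulb", [0.9, 0.2], 50),
    ("steel pipe", [0.0, 1.0], 70),
]
names, market_size = estimate_market_size([1.0, 0.0], products)
```

Raising the threshold narrows the product group and lowering it broadens the group, which mirrors the abstract's point that the market category level can be adjusted through the cosine similarity threshold.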

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems, v.26 no.4, pp.1-25, 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written texts, the sentiments consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) is a fine-grained analysis of the sentiments towards each aspect of an object. Because it has more practical value for business, ABSA is drawing attention from both academic and industrial organizations. For example, given a review that says "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and to judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect category. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms is called an explicit aspect. On the other hand, an aspect category like 'price', which has no specific aspect term but can be indirectly inferred from an emotional word such as 'expensive', is called an implicit aspect. So far, 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and mostly use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects. This study seeks answers to the following issues, ignored in previous studies, that arise when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types in the sentence-pair configuration of the input data? Third, is there any performance difference depending on the position of the sentence containing the aspect category in the QA or NLI type sentence-pair configuration? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. In addition, we found that it is more effective to reflect the output vector of the aspect category token than to use only the output vector of the [CLS] token as the classification vector. We also found that QA type input generally provides better performance than NLI, and that the order of the sentence with the aspect category in the QA type has no effect on performance. There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing ACSC models used in this study could similarly be applied to other tasks such as ATSC.
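
The two input design choices examined above (QA versus NLI style, and whether the aspect sentence comes first or second) can be made concrete with a small sketch that builds the sentence-pair string. The auxiliary-sentence templates are assumptions chosen for illustration (a question for QA, the bare aspect word for NLI); the abstract does not give the exact wording used in the paper, and `build_pair` is a hypothetical helper.

```python
def build_pair(review, aspect, style="QA", aspect_first=False):
    """Build a BERT-style sentence-pair input for aspect category sentiment classification.

    style: "QA" turns the aspect into a question; "NLI" uses the aspect as a
    short pseudo-sentence. Both templates are illustrative assumptions.
    aspect_first: place the aspect sentence before the review if True.
    """
    if style == "QA":
        aux = f"what do you think of the {aspect} ?"
    elif style == "NLI":
        aux = aspect
    else:
        raise ValueError(f"unknown style: {style!r}")
    first, second = (aux, review) if aspect_first else (review, aux)
    return f"[CLS] {first} [SEP] {second} [SEP]"


review = "The restaurant is expensive but the food is really fantastic"
qa_input = build_pair(review, "price", style="QA")
nli_input = build_pair(review, "food", style="NLI")
nli_aspect_first = build_pair(review, "food", style="NLI", aspect_first=True)
```

In practice a tokenizer would insert the special tokens and segment ids; the explicit `[CLS]` and `[SEP]` strings here only make the ordering visible.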

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.26 no.2, pp.79-104, 2020
  • As deep learning has attracted attention, it is increasingly being considered as a method for solving problems in various fields. In particular, deep learning is known to perform excellently on unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of deep learning for text and images, interest in image captioning technology and its applications is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling image comprehension and text generation simultaneously. Despite its high entry barrier, in that analysts must be able to process both image and text data, image captioning has established itself as one of the key fields in AI research owing to its wide applicability. In addition, many studies have been conducted to improve the performance of image captioning in various respects. Recent studies attempt to create advanced captions that not only describe an image accurately but also convey the information contained in the image in a more sophisticated way. Despite these efforts, it is difficult to find any study that interprets images from the perspective of domain experts in each field rather than from the perspective of the general public. Even for the same image, the parts of interest may differ according to the professional field of the person viewing it. Moreover, the way of interpreting and expressing the image also differs according to the level of expertise. The public tends to perceive an image from a holistic and general perspective, that is, by identifying the image's constituent objects and their relationships. In contrast, domain experts tend to perceive an image by focusing on the specific elements needed to interpret it based on their expertise. This implies that the meaningful parts of an image differ depending on the viewer's perspective, even for the same image, and image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a method to generate captions specialized for each domain by utilizing the expertise of experts in that domain. Specifically, after pre-training on a large amount of general data, expertise in the field is transplanted through transfer learning with a small amount of expertise data. However, a simple application of transfer learning using expertise data may cause another type of problem. Simultaneous learning with captions of various characteristics may cause a so-called 'inter-observation interference' problem, which makes it difficult to learn each characteristic point of view cleanly. When learning with a vast amount of data, most of this interference cancels out and has little impact on the learning results. In contrast, in fine-tuning, where learning is performed on a small amount of data, the impact of such interference can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each character. To confirm the feasibility of the proposed methodology, we performed experiments using the results of pre-training on the MSCOCO dataset, which comprises 120,000 images and about 600,000 general captions. Additionally, following the advice of an art therapist, about 300 pairs of 'image / expertise captions' were created, and this data was used for the expertise transplantation experiments. The experiments confirmed that the proposed methodology generates captions from the perspective of the implanted expertise, whereas captions generated through learning on general data contain much content irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation. To achieve this goal, we present a method that uses transfer learning to generate captions specialized for a specific domain. By applying the proposed methodology to expertise transplantation in various fields, we expect that much future research will be conducted to address the lack of expertise data and to improve the performance of image captioning.

A Study on the Buyer's Decision Making Models for Introducing Intelligent Online Handmade Services (지능형 온라인 핸드메이드 서비스 도입을 위한 구매자 의사결정모형에 관한 연구)

  • Park, Jong-Won;Yang, Sung-Byung
    • Journal of Intelligence and Information Systems, v.22 no.1, pp.119-138, 2016
  • Since the Industrial Revolution, which made the mass production and mass distribution of standardized goods possible, machine-made (manufactured) products have accounted for the majority of the market. However, in recent years, purchasing handmade products, even at higher prices, has become a noticeable trend as consumers have started to acknowledge the value of handmade products, such as the craftsman's commitment, belief in their quality and scarcity, and the sense of self-esteem from owning them. Consumer interest in these handmade products has shown explosive growth and has been coupled with the recent development of three-dimensional (3D) printing technologies. Etsy.com is the world's largest online handmade platform. Like any other online platform, it provides an online market where buyers and sellers meet virtually to share information and transact business. However, Etsy.com is different in that shops within this platform deal only with handmade products in a variety of categories, ranging from jewelry to toys. Since its establishment in 2005, despite being limited to handmade products, Etsy.com has enjoyed rapid growth in membership, transaction volume, and revenue. Most recently, in April 2015, it raised funds through an initial public offering (IPO) of more than 1.8 billion USD, which demonstrates the huge potential of online handmade platforms. After the success of Etsy.com, various online handmade platforms such as Handmade at Amazon, ArtFire, DaWanda, and Craft is ART have emerged and are now competing with each other, which has in turn increased the size of the market. According to Deloitte's 2015 holiday survey on which types of gifts respondents planned to buy during the holiday season, about 16% of U.S. consumers chose "homemade or craft items (e.g., Etsy purchase)," the same rate as for the computer game and shoes categories. This indicates that consumer interest in online handmade platforms will continue to rise. However, this high interest in the market for handmade products and their platforms has not yet led to academic research. Most extant studies have focused only on machine-made products and intelligent services for them, which indicates a lack of studies on handmade products and their intelligent services on virtual platforms. Therefore, this study used signaling theory and prior research on the effects of sellers' characteristics on their performance (e.g., total sales and price premiums) in the buyer-seller relationship to identify the key influencing e-Image factors (e.g., reputation, size, information sharing, and length of relationship). Their impacts on the performance of shops within the online handmade platform were then empirically examined; the dataset was collected from Etsy.com through web harvesting. The results of structural equation modeling revealed that reputation, size, and information sharing have significant effects on total sales, while reputation and length of relationship influence price premiums. This study extended online platform research to online handmade platforms by identifying, based on signaling theory, the key e-Image factors influencing within-platform shops' total sales and price premiums, and by testing them statistically. These findings are expected to be a stepping stone for future studies on intelligent online handmade services as well as on handmade products themselves. Furthermore, the findings provide online handmade platform operators with practical guidelines on how to implement intelligent online handmade services. They should also help shop managers build their marketing strategies in a more specific and effective manner by suggesting key influencing e-Image factors. The results of this study should contribute to the vitalization of intelligent online handmade services by providing clues on how to maximize within-platform shops' total sales and price premiums.

Inferring the Transit Trip Destination Zone of Smart Card User Using Trip Chain Structure (통행사슬 구조를 이용한 교통카드 이용자의 대중교통 통행종점 추정)

  • SHIN, Kangwon
    • Journal of Korean Society of Transportation, v.34 no.5, pp.437-448, 2016
  • Previous studies have suggested a transit trip destination inference method that constructs trip chains from incomplete (missing-destination) smart card datasets obtained from entry-only fare control systems. To explore the feasibility of this method, transit trip chains were constructed from pre-paid smart card tagging data collected in Busan on weekdays in October 2014 by tracing the card IDs, tagging times (boarding, alighting, transfer), and the trip linking distances between two consecutive transit trips in a daily sequence. Assuming that most trips in a transit trip chain are linked successively, the destination zone of each transit trip is inferred as the origin zone of the consecutive linked trip. Applying the model to complete trips with observed OD reveals that about 82% of the inferred trip destinations match the observed trip destinations, and that the inference error, defined as the distance between the inferred and observed alighting stops, is minimized when the trip linking distance is less than or equal to 0.5km. When the model is applied to incomplete trips with missing destinations, the overall destination missing rate decreases from 71.40% to 21.74%, and approximately 77% of the remaining destination-missing trips are single transit trips, for which the destination cannot be inferred. In addition, the model remarkably reduces the destination missing rate of multiple incomplete transit trips from 69.56% to 6.27%. Spearman's rank correlation and Chi-squared goodness-of-fit tests showed that the ranks of transit trips for each zone are not significantly affected by the inferred trips, but the transit trip distributions based only on the small number of complete trips differ significantly from those based on complete and inferred trips. Therefore, it is concluded that the model is applicable for deriving realistic transit trip patterns in cities with incomplete smart card data.
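
The core inference rule (a trip's missing destination is taken to be the origin zone of the card's next trip that day) can be sketched as follows. This is a simplified illustration under stated assumptions: the input covers a single day, the record fields `card_id`, `board_time`, `origin_zone`, and `dest_zone` are hypothetical names, the trip linking distance check is omitted, and the last trip of each chain is left uninferred because the abstract does not say how it is handled.

```python
from collections import defaultdict


def infer_destinations(trips):
    """Fill in missing destination zones from the next trip's origin zone.

    trips: list of dicts with keys card_id, board_time, origin_zone and
    dest_zone (None when the alighting tag is missing), all from one day.
    Returns new trip dicts; single trips and the last trip of each chain
    keep dest_zone=None when it was missing.
    """
    by_card = defaultdict(list)
    for trip in trips:
        by_card[trip["card_id"]].append(trip)

    result = []
    for card_trips in by_card.values():
        # Order each card's trips into a daily chain by boarding time.
        chain = sorted(card_trips, key=lambda t: t["board_time"])
        for current, following in zip(chain, chain[1:] + [None]):
            inferred = dict(current)  # copy so the input is not mutated
            if current["dest_zone"] is None and following is not None:
                inferred["dest_zone"] = following["origin_zone"]
            result.append(inferred)
    return result


# Toy records, invented for illustration.
trips = [
    {"card_id": "A", "board_time": "18:10", "origin_zone": "Z2", "dest_zone": None},
    {"card_id": "A", "board_time": "08:05", "origin_zone": "Z1", "dest_zone": None},
    {"card_id": "B", "board_time": "09:30", "origin_zone": "Z3", "dest_zone": None},
]
inferred = infer_destinations(trips)
```

Card B has a single trip, so its destination stays missing, which matches the abstract's observation that single transit trips cannot be inferred.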

The Factors Affecting the Population Outflow from Busan to the Seoul Metropolitan Area (지역별 수도권으로의 인구유출에 영향을 미치는 요인 연구: 부산시 사례를 중심으로)

  • LIM, Jaebin;Jeong, Kiseong
    • Land and Housing Review, v.12 no.2, pp.47-59, 2021
  • This study aims to review the trends of population outflows in the metropolitan area of Busan and to investigate the factors that affect out-migration to the Seoul metropolitan area. The following variables are considered for analysis: traditional population movement variables and quality of life variables, such as population, society, employment, housing, culture, safety, medical care, greenery, education, and childcare. The 'domestic population movement data' provided by the MDIS of the National Statistical Office was used for this research. Out of a total of 57 million population movement records for the period 2012-2017, out-migration from Busan to the Seoul metropolitan area was extracted. Independent variables were drawn from public data sources in accordance with the temporal and spatial settings of the study. A multiple linear regression model was specified based on the dataset, and the fit of the model was assessed using p-values, adjusted R², the Durbin-Watson statistic, and the F-statistic. The results of the analysis showed that the variables with a significant effect on population movement from Busan to the Seoul metropolitan area were as follows: 'single-person households', 'the elderly population', 'the total birth rate', 'the number of companies', 'the number of employees', 'the housing sales price index', 'cultural facilities', and 'the number of students per teacher'. Greater population out-movement was observed in areas with higher numbers of single-person households, lower proportions of the elderly, lower numbers of businesses, higher numbers of employees, higher numbers of housing sales, lower numbers of cultural facilities, and lower numbers of students. The findings suggest that policies should improve conditions such as quality jobs, culture, and welfare that can retain young people within Busan. Improvements in the quality of life and job creation are critical factors that can mitigate the outflow of Busan residents to the Seoul metropolitan area.