• Title/Summary/Keyword: 데이터 표현

Search Result 3,850, Processing Time 0.031 seconds

Temporal variation in the community structure of green tide forming macroalgae(Chlorophyta; genus Ulva) on the coast of Jeju Island, Korea based on DNA barcoding (DNA 바코드를 이용한 제주도 연안 파래대발생(green tide)을 형성하는 갈파래(genus Ulva) 군집구조 및 주요 종 구성의 시간적 변이)

  • Hye Jin Park;Seo Yeon Byeon;Sang Rul Park;Hyuk Je Lee
    • Korean Journal of Environmental Biology
    • /
    • v.40 no.4
    • /
    • pp.464-476
    • /
    • 2022
  • In recent years, macroalgal bloom occurs frequently in coastal oceans worldwide. It might be attributed to accelerating climate change. "Green tide" events caused by proliferation of green macroalgae (Ulva spp.) not only damage the local economy, but also harm coastal environments. These nuisance events have become common across several coastal regions of continents. In Korea, green tide incidences are readily seen throughout the year along the coastlines of Jeju Island, particularly the northeastern coast, since the 2000s. Ulva species are notorious to be difficult for morphology-based species identification due to their high degrees of phenotypic plasticity. In this study, to investigate temporal variation in Ulva community structure on Jeju Island between 2015 and 2020, chloroplast barcode tufA gene was sequenced and phylogenetically analyzed for 152 specimens from 24 sites. We found that Ulva ohnoi and Ulva pertusa known to be originated from subtropical regions were the most predominant all year round, suggesting that these two species contributed the most to local green tides in this region. While U. pertusa was relatively stable in frequency during 2015 to 2020, U. ohnoi increased 16% in frequency in 2020 (36.84%), which might be associated with rising sea surface temperature from which U. ohnoi could benefit. Two species (Ulva flexuosa, Ulva procera) of origins of Europe should be continuously monitored. The findings of this study provide valuable information and molecular genetic data of genus Ulva occurring in southern coasts of Korea, which will help mitigate negative influences of green tide events on Korea coast.

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.131-154
    • /
    • 2022
  • Recently, researches on unstructured data analysis have been actively conducted with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics from massive text data. In the early stages of topic modeling, most studies focused only on topic discovery. As the topic modeling field matured, studies on the change of the topic according to the change of time began to be carried out. Accordingly, interest in dynamic topic modeling that handle changes in keywords constituting the topic is also increasing. Dynamic topic modeling identifies major topics from the data of the initial period and manages the change and flow of topics in a way that utilizes topic information of the previous period to derive further topics in subsequent periods. However, it is very difficult to understand and interpret the results of dynamic topic modeling. The results of traditional dynamic topic modeling simply reveal changes in keywords and their rankings. However, this information is insufficient to represent how the meaning of the topic has changed. Therefore, in this study, we propose a method to visualize topics by period by reflecting the meaning of keywords in each topic. In addition, we propose a method that can intuitively interpret changes in topics and relationships between or among topics. The detailed method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is implemented to derive the top keywords of each period and their weight from text data. In the second step, we derive vectors of top keywords of each topic from the pre-trained word embedding model. Then, we perform dimension reduction for the extracted vectors. Then, we formulate a semantic vector of each topic by calculating weight sum of keywords in each vector using topic weight of each keyword. In the third step, we visualize the semantic vector of each topic using matplotlib, and analyze the relationship between or among the topics based on the visualized result. The change of topic can be interpreted in the following manners. From the result of dynamic topic modeling, we identify rising top 5 keywords and descending top 5 keywords for each period to show the change of the topic. Existing many topic visualization studies usually visualize keywords of each topic, but our approach proposed in this study differs from previous studies in that it attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers. The experiment was performed by dividing abstracts of artificial intelligence-related papers into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the consistency score, and utilized the pre-trained word embedding model of Word2vec trained with 'Wikipedia', an Internet encyclopedia. Based on the proposed methodology, we generated a semantic vector for each topic. Through this, by reflecting the meaning of keywords, we visualized and interpreted the themes by period. Through these experiments, we confirmed that the rising and descending of the topic weight of a keyword can be usefully used to interpret the semantic change of the corresponding topic and to grasp the relationship among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by era. The results of this study are meaningful in that they broadened the scope of topic understanding through the visualization of dynamic topic modeling results. In addition, the academic contribution can be acknowledged in that it laid the foundation for follow-up studies using various word embeddings and dimensionality reduction techniques to improve the performance of the proposed methodology.

School Experiences and the Next Gate Path : An analysis of Univ. Student activity log (대학생의 학창경험이 사회 진출에 미치는 영향: 대학생활 활동 로그분석을 중심으로)

  • YI, EUNJU;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.149-171
    • /
    • 2020
  • The period at university is to make decision about getting an actual job. As our society develops rapidly and highly, jobs are diversified, subdivided, and specialized, and students' job preparation period is also getting longer and longer. This study analyzed the log data of college students to see how the various activities that college students experience inside and outside of school might have influences on employment. For this experiment, students' various activities were systematically classified, recorded as an activity data and were divided into six core competencies (Job reinforcement competency, Leadership & teamwork competency, Globalization competency, Organizational commitment competency, Job exploration competency, and Autonomous implementation competency). The effect of the six competency levels on the employment status (employed group, unemployed group) was analyzed. As a result of the analysis, it was confirmed that the difference in level between the employed group and the unemployed group was significant for all of the six competencies, so it was possible to infer that the activities at the school are significant for employment. Next, in order to analyze the impact of the six competencies on the qualitative performance of employment, we had ANOVA analysis after dividing the each competency level into 2 groups (low and high group), and creating 6 groups by the range of first annual salary. Students with high levels of globalization capability, job search capability, and autonomous implementation capability were also found to belong to a higher annual salary group. The theoretical contributions of this study are as follows. First, it connects the competencies that can be extracted from the school experience with the competencies in the Human Resource Management field and adds job search competencies and autonomous implementation competencies which are required for university students to have their own successful career & life. Second, we have conducted this analysis with the competency data measured form actual activity and result data collected from the interview and research. Third, it analyzed not only quantitative performance (employment rate) but also qualitative performance (annual salary level). The practical use of this study is as follows. First, it can be a guide when establishing career development plans for college students. It is necessary to prepare for a job that can express one's strengths based on an analysis of the world of work and job, rather than having a no-strategy, unbalanced, or accumulating excessive specifications competition. Second, the person in charge of experience design for college students, at an organizations such as schools, businesses, local governments, and governments, can refer to the six competencies suggested in this study to for the user-useful experiences design that may motivate more participation. By doing so, one event may bring mutual benefits for both event designers and students. Third, in the era of digital transformation, the government's policy manager who envisions the balanced development of the country can make a policy in the direction of achieving the curiosity and energy of college students together with the balanced development of the country. A lot of manpower is required to start up novel platform services that have not existed before or to digitize existing analog products, services and corporate culture. The activities of current digital-generation-college-students are not only catalysts in all industries, but also for very benefit and necessary for college students by themselves for their own successful career development.

Improved Social Network Analysis Method in SNS (SNS에서의 개선된 소셜 네트워크 분석 방법)

  • Sohn, Jong-Soo;Cho, Soo-Whan;Kwon, Kyung-Lag;Chung, In-Jeong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.117-127
    • /
    • 2012
  • Due to the recent expansion of the Web 2.0 -based services, along with the widespread of smartphones, online social network services are being popularized among users. Online social network services are the online community services which enable users to communicate each other, share information and expand human relationships. In the social network services, each relation between users is represented by a graph consisting of nodes and links. As the users of online social network services are increasing rapidly, the SNS are actively utilized in enterprise marketing, analysis of social phenomenon and so on. Social Network Analysis (SNA) is the systematic way to analyze social relationships among the members of the social network using the network theory. In general social network theory consists of nodes and arcs, and it is often depicted in a social network diagram. In a social network diagram, nodes represent individual actors within the network and arcs represent relationships between the nodes. With SNA, we can measure relationships among the people such as degree of intimacy, intensity of connection and classification of the groups. Ever since Social Networking Services (SNS) have drawn increasing attention from millions of users, numerous researches have made to analyze their user relationships and messages. There are typical representative SNA methods: degree centrality, betweenness centrality and closeness centrality. In the degree of centrality analysis, the shortest path between nodes is not considered. However, it is used as a crucial factor in betweenness centrality, closeness centrality and other SNA methods. In previous researches in SNA, the computation time was not too expensive since the size of social network was small. Unfortunately, most SNA methods require significant time to process relevant data, and it makes difficult to apply the ever increasing SNS data in social network studies. For instance, if the number of nodes in online social network is n, the maximum number of link in social network is n(n-1)/2. It means that it is too expensive to analyze the social network, for example, if the number of nodes is 10,000 the number of links is 49,995,000. Therefore, we propose a heuristic-based method for finding the shortest path among users in the SNS user graph. Through the shortest path finding method, we will show how efficient our proposed approach may be by conducting betweenness centrality analysis and closeness centrality analysis, both of which are widely used in social network studies. Moreover, we devised an enhanced method with addition of best-first-search method and preprocessing step for the reduction of computation time and rapid search of the shortest paths in a huge size of online social network. Best-first-search method finds the shortest path heuristically, which generalizes human experiences. As large number of links is shared by only a few nodes in online social networks, most nods have relatively few connections. As a result, a node with multiple connections functions as a hub node. When searching for a particular node, looking for users with numerous links instead of searching all users indiscriminately has a better chance of finding the desired node more quickly. In this paper, we employ the degree of user node vn as heuristic evaluation function in a graph G = (N, E), where N is a set of vertices, and E is a set of links between two different nodes. As the heuristic evaluation function is used, the worst case could happen when the target node is situated in the bottom of skewed tree. In order to remove such a target node, the preprocessing step is conducted. Next, we find the shortest path between two nodes in social network efficiently and then analyze the social network. For the verification of the proposed method, we crawled 160,000 people from online and then constructed social network. Then we compared with previous methods, which are best-first-search and breath-first-search, in time for searching and analyzing. The suggested method takes 240 seconds to search nodes where breath-first-search based method takes 1,781 seconds (7.4 times faster). Moreover, for social network analysis, the suggested method is 6.8 times and 1.8 times faster than betweenness centrality analysis and closeness centrality analysis, respectively. The proposed method in this paper shows the possibility to analyze a large size of social network with the better performance in time. As a result, our method would improve the efficiency of social network analysis, making it particularly useful in studying social trends or phenomena.

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.125-148
    • /
    • 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in the amount of data is very large and relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification which classifies documents into predetermined categories, topic modeling which extracts major topics from a large number of documents, sentimental analysis or opinion mining that identifies emotions or opinions contained in texts, and Text Summarization which summarize the main contents from one document or several documents have been actively studied. Especially, the text summarization technique is actively applied in the business through the news summary service, the privacy policy summary service, ect. In addition, much research has been done in academia in accordance with the extraction approach which provides the main elements of the document selectively and the abstraction approach which extracts the elements of the document and composes new sentences by combining them. However, the technique of evaluating the quality of automatically summarized documents has not made much progress compared to the technique of automatic text summarization. Most of existing studies dealing with the quality evaluation of summarization were carried out manual summarization of document, using them as reference documents, and measuring the similarity between the automatic summary and reference document. Specifically, automatic summarization is performed through various techniques from full text, and comparison with reference document, which is an ideal summary document, is performed for measuring the quality of automatic summarization. Reference documents are provided in two major ways, the most common way is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in the process of preparing the summary, it takes a lot of time and cost to write the summary, and there is a limitation that the evaluation result may be different depending on the subject of the summarizer. Therefore, in order to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. On the other hand, as a representative attempt to overcome these limitations, a method has been recently devised to reduce the size of the full text and to measure the similarity of the reduced full text and the automatic summary. In this method, the more frequent term in the full text appears in the summary, the better the quality of the summary. However, since summarization essentially means minimizing a lot of content while minimizing content omissions, it is unreasonable to say that a "good summary" based on only frequency always means a "good summary" in its essential meaning. In order to overcome the limitations of this previous study of summarization evaluation, this study proposes an automatic quality evaluation for text summarization method based on the essential meaning of summarization. Specifically, the concept of succinctness is defined as an element indicating how few duplicated contents among the sentences of the summary, and completeness is defined as an element that indicating how few of the contents are not included in the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness. In order to evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor 's hotel reviews, summarized the reviews by each hotel and presented the results of the experiments conducted on evaluation of the quality of summaries in accordance to the proposed methodology. It also provides a way to integrate the completeness and succinctness in the trade-off relationship into the F-Score, and propose a method to perform the optimal summarization by changing the threshold of the sentence similarity.

Application of MicroPACS Using the Open Source (Open Source를 이용한 MicroPACS의 구성과 활용)

  • You, Yeon-Wook;Kim, Yong-Keun;Kim, Yeong-Seok;Won, Woo-Jae;Kim, Tae-Sung;Kim, Seok-Ki
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.13 no.1
    • /
    • pp.51-56
    • /
    • 2009
  • Purpose: Recently, most hospitals are introducing the PACS system and use of the system continues to expand. But small-scaled PACS called MicroPACS has already been in use through open source programs. The aim of this study is to prove utility of operating a MicroPACS, as a substitute back-up device for conventional storage media like CDs and DVDs, in addition to the full-PACS already in use. This study contains the way of setting up a MicroPACS with open source programs and assessment of its storage capability, stability, compatibility and performance of operations such as "retrieve", "query". Materials and Methods: 1. To start with, we searched open source software to correspond with the following standards to establish MicroPACS, (1) It must be available in Windows Operating System. (2) It must be free ware. (3) It must be compatible with PET/CT scanner. (4) It must be easy to use. (5) It must not be limited of storage capacity. (6) It must have DICOM supporting. 2. (1) To evaluate availability of data storage, we compared the time spent to back up data in the open source software with the optical discs (CDs and DVD-RAMs), and we also compared the time needed to retrieve data with the system and with optical discs respectively. (2) To estimate work efficiency, we measured the time spent to find data in CDs, DVD-RAMs and MicroPACS. 7 technologists participated in this study. 3. In order to evaluate stability of the software, we examined whether there is a data loss during the system is maintained for a year. Comparison object; How many errors occurred in randomly selected data of 500 CDs. Result: 1. We chose the Conquest DICOM Server among 11 open source software used MySQL as a database management system. 2. (1) Comparison of back up and retrieval time (min) showed the result of the following: DVD-RAM (5.13,2.26)/Conquest DICOM Server (1.49,1.19) by GE DSTE (p<0.001), CD (6.12,3.61)/Conquest (0.82,2.23) by GE DLS (p<0.001), CD (5.88,3.25)/Conquest (1.05,2.06) by SIEMENS. (2) The wasted time (sec) to find some data is as follows: CD ($156{\pm}46$), DVD-RAM ($115{\pm}21$) and Conquest DICOM Server ($13{\pm}6$). 3. There was no data loss (0%) for a year and it was stored 12741 PET/CT studies in 1.81 TB memory. In case of CDs, On the other hand, 14 errors among 500 CDs (2.8%) is generated. Conclusions: We found that MicroPACS could be set up with the open source software and its performance was excellent. The system built with open source proved more efficient and more robust than back-up process using CDs or DVD-RAMs. We believe that the operation of the MicroPACS would be effective data storage device as long as its operators develop and systematize it.

  • PDF

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.185-202
    • /
    • 2012
  • .Since the value of information has been realized in the information society, the usage and collection of information has become important. A facial expression that contains thousands of information as an artistic painting can be described in thousands of words. Followed by the idea, there has recently been a number of attempts to provide customers and companies with an intelligent service, which enables the perception of human emotions through one's facial expressions. For example, MIT Media Lab, the leading organization in this research area, has developed the human emotion prediction model, and has applied their studies to the commercial business. In the academic area, a number of the conventional methods such as Multiple Regression Analysis (MRA) or Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized because of its low prediction accuracy. This is inevitable since MRA can only explain the linear relationship between the dependent variables and the independent variable. To mitigate the limitations of MRA, some studies like Jung and Kim (2012) have used ANN as the alternative, and they reported that ANN generated more accurate prediction than the statistical methods like MRA. However, it has also been criticized due to over fitting and the difficulty of the network design (e.g. setting the number of the layers and the number of the nodes in the hidden layers). Under this background, we propose a novel model using Support Vector Regression (SVR) in order to increase the prediction accuracy. SVR is an extensive version of Support Vector Machine (SVM) designated to solve the regression problems. The model produced by SVR only depends on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold ${\varepsilon}$) to the model prediction. Using SVR, we tried to build a model that can measure the level of arousal and valence from the facial features. To validate the usefulness of the proposed model, we collected the data of facial reactions when providing appropriate visual stimulating contents, and extracted the features from the data. Next, the steps of the preprocessing were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As the comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted '${\varepsilon}$-insensitive loss function', and 'grid search' technique to find the optimal values of the parameters like C, d, ${\sigma}^2$, and ${\varepsilon}$. In the case of ANN, we adopted a standard three-layer backpropagation network, which has a single hidden layer. The learning rate and momentum rate of ANN were set to 10%, and we used sigmoid function as the transfer function of hidden and output nodes. We performed the experiments repeatedly by varying the number of nodes in the hidden layer to n/2, n, 3n/2, and 2n, where n is the number of the input variables. The stopping condition for ANN was set to 50,000 learning events. And, we used MAE (Mean Absolute Error) as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy for the hold-out data set compared to MRA and ANN. Regardless of the target variables (the level of arousal, or the level of positive / negative valence), SVR showed the best performance for the hold-out data set. ANN also outperformed MRA, however, it showed the considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to the researchers or practitioners who are willing to build the models for recognizing human emotions.

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.163-176
    • /
    • 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs, such as Twitter, have gained in popularity because of its ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' efforts and investment for content generation by recommending shorter posts. There has been a lot research into capturing the social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method to measure TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users are interacting with each other while watching television or movies, or visiting a new place. In order to measure TV ratings, some features are significant during certain hours of the day, or days of the week, whereas these same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing the time sensitive relevance is required to estimate TV ratings. Therefore, modeling time-related characteristics of features should be a key when measuring the TV ratings through microblogs. We show that capturing time-dependency of features in measuring TV ratings is vitally necessary for improving their accuracy. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as adverting or promoted tweets, we selected 149 thousand tweets for analysis. The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result is stems from the characteristics of the public channel, which broadcasts the program at the predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect the satisfaction or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that there is a time-dependency in the correlation of features between the before and after broadcasting time. Since the TV program is broadcast at the predetermined time regularly, users post tweets expressing their expectation for the program or disappointment over not being able to watch the program. The highly correlated features before the broadcast are different from the features after broadcasting. This result explains that the relevance of words with TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have the highest correlation before the broadcasting time, whereas 68 words reach the highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite containing a negative meaning. Understanding the time-dependency of features can be helpful in improving the accuracy of TV ratings measurement. This research contributes a basis to estimate the response to or satisfaction with the broadcasted programs using the time dependency of words in Twitter chatter. More research is needed to refine the methodology for predicting or measuring TV ratings.

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.73-95
    • /
    • 2021
  • This study uses Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries in Korea's food and beverage industry. Node2vec is the method that improves the limit of the structural equivalence representation of the network, which is known to be relatively weak compared to the existing link prediction method based on the number of common neighbors of the network. Therefore, the method is known to show excellent performance in both community detection and structural equivalence of the network. The vector value obtained by embedding the network in this way operates under the condition of a constant length from an arbitrarily designated starting point node. Therefore, it has the advantage that it is easy to apply the sequence of nodes as an input value to the model for downstream tasks such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of the Node2vec graph embedding method, this study applied the above method to the international trade information of the Korean food and beverage industry. Through this, we intend to contribute to creating the effect of extensive margin diversification in Korea in the global value chain relationship of the industry. The optimal predictive model derived from the results of this study recorded a precision of 0.95 and a recall of 0.79, and an F1 score of 0.86, showing excellent performance. This performance was shown to be superior to that of the binary classifier based on Logistic Regression set as the baseline model. In the baseline model, a precision of 0.95 and a recall of 0.73 were recorded, and an F1 score of 0.83 was recorded. In addition, the light GBM-based optimal prediction model derived from this study showed superior performance than the link prediction model of previous studies, which is set as a benchmarking model in this study. The predictive model of the previous study recorded only a recall rate of 0.75, but the proposed model of this study showed better performance which recall rate is 0.79. The difference in the performance of the prediction results between benchmarking model and this study model is due to the model learning strategy. In this study, groups were classified by the trade value scale, and prediction models were trained differently for these groups. Specific methods are (1) a method of randomly masking and learning a model for all trades without setting specific conditions for trade value, (2) arbitrarily masking a part of the trades with an average trade value or higher and using the model method, and (3) a method of arbitrarily masking some of the trades with the top 25% or higher trade value and learning the model. As a result of the experiment, it was confirmed that the performance of the model trained by randomly masking some of the trades with the above-average trade value in this method was the best and appeared stably. It was found that most of the results of potential export candidates for Korea derived through the above model appeared appropriate through additional investigation. Combining the above, this study could suggest the practical utility of the link prediction method applying Node2vec and Light GBM. In addition, useful implications could be derived for weight update strategies that can perform better link prediction while training the model. On the other hand, this study also has policy utility because it is applied to trade transactions that have not been performed much in the research related to link prediction based on graph embedding. The results of this study support a rapid response to changes in the global value chain such as the recent US-China trade conflict or Japan's export regulations, and I think that it has sufficient usefulness as a tool for policy decision-making.

Prediction of Sleep Disturbances in Korean Rural Elderly through Longitudinal Follow Up (추적 관찰을 통한 한국 농촌 노인의 수면 장애 예측)

  • Park, Kyung Mee;Kim, Woo Jung;Choi, Eun Chae;An, Suk Kyoon;Namkoong, Kee;Youm, Yoosik;Kim, Hyeon Chang;Lee, Eun
    • Sleep Medicine and Psychophysiology
    • /
    • v.24 no.1
    • /
    • pp.38-45
    • /
    • 2017
  • Objectives: Sleep disturbance is a very rapidly growing disease with aging. The purpose of this study was to investigate the prevalence of sleep disturbances and its predictive factors in a three-year cohort study of people aged 60 years and over in Korea. Methods: In 2012 and 2014, we obtained data from a survey of the Korean Social Life, Health, and Aging Project. We asked participants if they had been diagnosed with stroke, myocardial infarction, angina pectoris, arthritis, pulmonary tuberculosis, asthma, cataract, glaucoma, hepatitis B, urinary incontinence, prostate hypertrophy, cancer, osteoporosis, hypertension, diabetes, hyperlipidemia, or metabolic syndrome. Cognitive function was assessed using the Mini-Mental State Examination for dementia screening in 2012, and depression was assessed using the Center for Epidemiologic Studies Depression Scale in 2012 and 2014. In 2015, a structured clinical interview for Axis I psychiatric disorders was administered to 235 people, and sleep disturbance was assessed using the Pittsburgh Sleep Quality Index. The perceived stress scale and the State-trait Anger Expression Inventory were also administered. Logistic regression analysis was used to predict sleep disturbance by gender, age, education, depression score, number of coexisting diseases in 2012 and 2014, current anger score, and perceived stress score. Results: Twenty-seven percent of the participants had sleep disturbances. Logistic regression analysis showed that the number of medical diseases three years ago, the depression score one year ago, and the current perceived stress significantly predicted sleep disturbances. Conclusion: Comorbid medical disease three years previous and depressive symptoms evaluated one year previous were predictive of current sleep disturbances. Further studies are needed to determine whether treatment of medical disease and depressive symptoms can improve sleep disturbances.