• Title/Summary/Keyword: 계층분류 (hierarchical classification)

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.1-17
    • /
    • 2019
  • Because stock price forecasting matters both academically and practically, it has been studied actively. Forecasting research divides into work based on structured data and work based on unstructured data. With structured data such as historical stock prices and financial statements, past studies usually relied on technical analysis and fundamental analysis. In the big data era the amount of information has grown rapidly, and artificial intelligence methods that extract meaning by quantifying text, the unstructured data that makes up a large share of that information, have developed quickly. As a result, many studies now apply text mining to online news to predict stock prices. The approach adopted in many papers forecasts a stock's price from news about the target company itself. However, previous research shows that a company's stock price is affected not only by its own news but also by news about related companies. Finding highly relevant companies is not easy because of market-wide effects and random signals. Existing studies have therefore identified relevant companies mainly through predetermined international industry classification standards. According to recent research, however, homogeneity within the sectors of the Global Industry Classification Standard varies, so forecasting with all companies in a sector pooled together, rather than with only the relevant ones, can hurt predictive performance. To overcome this limitation, we first applied random matrix theory together with text mining to stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced. 
Therefore, a simple correlation analysis of the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering, we then used multiple kernel learning, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices from features of the financial news of the target firm and of its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory outperforms previous studies when cluster analysis is performed on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt theory from physics. It extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important to study not only artificial intelligence algorithms but also how to adjust the input values on theoretical grounds. 
Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) may have low relevance to one another, and we suggest that relevance should be defined theoretically rather than simply taken from GICS.
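
The abstract does not spell out how random matrix theory separates market-wide effects and random signals from the true correlation. A common recipe in econophysics, assumed here for illustration, compares the eigenvalues of the empirical correlation matrix with the Marchenko-Pastur range expected for purely random data. The function names and the sample eigenvalues below are hypothetical, and the sketch assumes more observations than assets.

```python
import math

def marchenko_pastur_bounds(n_assets, n_obs):
    """Eigenvalue range expected for a purely random correlation matrix."""
    if n_obs <= n_assets:
        raise ValueError("this sketch assumes more observations than assets")
    ratio = n_assets / n_obs
    lower = (1 - math.sqrt(ratio)) ** 2
    upper = (1 + math.sqrt(ratio)) ** 2
    return lower, upper

def classify_eigenvalues(eigenvalues, n_assets, n_obs):
    """Split eigenvalues into market mode, group modes, and noise."""
    _, upper = marchenko_pastur_bounds(n_assets, n_obs)
    ordered = sorted(eigenvalues, reverse=True)
    return {
        "market": ordered[0],                            # largest: market-wide effect
        "groups": [e for e in ordered[1:] if e > upper], # above the random range
        "noise": [e for e in ordered[1:] if e <= upper], # consistent with randomness
    }

# Hypothetical eigenvalues for 100 stocks observed over 400 days.
example = classify_eigenvalues([30.0, 4.0, 2.0, 1.0, 0.5], n_assets=100, n_obs=400)
print(example)
```

Under this assumption, only the "groups" eigenvalues would feed the cluster analysis; the market mode and the noise band would be filtered out first.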

Topic Modeling Insomnia Social Media Corpus using BERTopic and Building Automatic Deep Learning Classification Model (BERTopic을 활용한 불면증 소셜 데이터 토픽 모델링 및 불면증 경향 문헌 딥러닝 자동분류 모델 구축)

  • Ko, Young Soo;Lee, Soobin;Cha, Minjung;Kim, Seongdeok;Lee, Juhee;Han, Ji Yeong;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.39 no.2
    • /
    • pp.111-129
    • /
    • 2022
  • Insomnia is a chronic disease in modern society, with the number of new patients increasing by more than 20% in the last 5 years. It is a serious disease that requires diagnosis and treatment, because lack of sleep causes serious individual and social problems and the triggers of insomnia are complex. This study collected 5,699 documents from 'insomnia', a community on Reddit, a social media platform where users freely express opinions. Based on the International Classification of Sleep Disorders (ICSD-3) standard and guidelines prepared with the help of experts, an insomnia corpus was constructed by tagging each document as an insomnia tendency document or a non-insomnia tendency document. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained on the constructed corpus. In the performance evaluation, RoBERTa performed best, with an accuracy of 81.33%. For an in-depth analysis of the insomnia social data, topic modeling was performed with BERTopic, a recent method that addresses weaknesses of the widely used LDA. The analysis identified 8 topic groups ('Negative emotions', 'Advice and help and gratitude', 'Insomnia-related diseases', 'Sleeping pills', 'Exercise and eating habits', 'Physical characteristics', 'Activity characteristics', 'Environmental characteristics'). Users expressed negative emotions and sought help and advice from the Reddit insomnia community. They also mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart; activity characteristics such as zombies, hypnic jerk, and groggy; and environmental characteristics such as sunlight, blankets, temperature, and naps.

Weights for Evaluation items of Conformity index of Bird breeding sites on the West and South coasts of Korea (서·남해 연안성 조류번식지 적합성지수 평가항목 가중치 설정)

  • Kim, Chang-Hyeon;Kim, Won-Bin;Kim, Kyou-Sub;Lee, Chang-Hun
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.41 no.4
    • /
    • pp.40-48
    • /
    • 2023
  • This study is part of foundational research aimed at developing a suitability index for bird breeding grounds along Korea's South and West coasts, including islands. Focus Group Interviews (FGI) and Analytic Hierarchy Process (AHP) analyses were conducted. The results are as follows. First, in valuing the suitability of coastal bird breeding sites, 'Natural Value' (0.763) was weighted higher than 'Artificial Value' (0.237). Among the artificial values, all items ranked low except 'Protected Areas', which ensures the continued integrity of breeding spaces. Second, the 25 evaluation items classified in the two rounds of FGI were regrouped under higher-level concepts, and a total of 14 items, nine natural values and five artificial values, were finally selected. Third, the mid-level classification ranked importance in the order of 'Ecological Value' (0.392), 'Topographic Value' (0.251), 'Passive Interference' (0.124), 'Geological Value' (0.120), and 'Active Interference' (0.113). Fourth, the evaluation items were prioritized in the order of 'Vegetation Distribution' (0.187), 'Area of Mudflats' (0.118), 'Presence or Absence of Mudflats' (0.092), 'Appearance of Natural Enemies' (0.087), 'Protected Areas' (0.08), 'Island Area' (0.069), 'Over-Breeding Devastation' (0.064), 'Soil Composition Ratio' (0.056), 'Distance from Land' (0.054), 'Ocean Farm Area' (0.045), 'Cultivated Land Area' (0.041), 'Cultivation Behavior' (0.038), 'Angle of the Surface' (0.036), and 'Land Use' (0.033). The weights derived in this study can be used for priority evaluation focused on coastal bird breeding areas. 
However, the relationship with the habitat suitability of individual bird species still needs to be supplemented, and spatial analysis incorporating species-specific characteristics is left as future work.
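
To make the AHP weighting step concrete, the sketch below derives priorities from a pairwise comparison matrix using the row geometric-mean approximation. The abstract does not say which derivation method the authors used, and the 3.22 judgment ratio is a hypothetical value chosen only so that the output resembles the reported 0.763 / 0.237 split. The mid-level weights are copied from the abstract.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priorities with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgment: "natural" is 3.22 times as important as "artificial".
pairwise = [
    [1.0, 3.22],
    [1.0 / 3.22, 1.0],
]
natural, artificial = ahp_weights(pairwise)
print(round(natural, 3), round(artificial, 3))

# Mid-level weights as reported in the abstract; a complete set should sum to 1.
mid_level = {
    "Ecological Value": 0.392,
    "Topographic Value": 0.251,
    "Passive Interference": 0.124,
    "Geological Value": 0.120,
    "Active Interference": 0.113,
}
print(round(sum(mid_level.values()), 3))
```

A full AHP analysis would also compute a consistency ratio for each comparison matrix, which this sketch omits.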

Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)

  • Jeon, Min Jin;Hwang, Ji Won;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.1-22
    • /
    • 2021
  • Aspect-Based Sentiment Analysis (ABSA), which analyzes sentiment with respect to the aspects that appear in a text, is drawing attention because it can be used across many industries. ABSA analyzes sentiment separately for each of the multiple aspects that a text contains. It is being studied in various forms depending on the purpose, such as analyzing all targets or just aspects and sentiments. Here, the aspect refers to a property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, the aspects could be food taste, food price, quality of service, mood of the restaurant, and so on. If a review says, "The pasta was delicious, but the salad was not," the words "pasta" and "salad," which are directly mentioned in the sentence, are the targets. So far, most ABSA studies have analyzed sentiment based only on aspects or only on targets. However, even with the same aspects or targets, sentiment analysis may be inaccurate. This happens when aspects or sentiments are divided or when sentiment exists without a target. Consider the sentence "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In the sentence "Shrimp was delicious, but the price was extravagant," the target is "shrimp," yet opposite sentiments coexist depending on the aspect. Finally, in a sentence like "The food arrived too late and is cold now," there is no target (NULL), but the sentence conveys a negative sentiment toward the aspect "service." Failing to consider both aspects and targets in such cases, when sentiment or aspect is divided or when sentiment exists without a target, creates a dual dependency problem. 
To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereafter TASD). This study identified limitations of existing TASD research: local contexts are not fully captured, and the F1-score drops dramatically depending on the number of epochs and the batch size. The current model excels at capturing the overall context and the relations between words. However, it struggles with phrases in the local context and is relatively slow to learn. This study therefore tries to improve the model's performance. To that end, we added an auxiliary loss for aspect-sentiment classification by constructing CNN (Convolutional Neural Network) layers in parallel with the existing model. Whereas existing models analyze aspect-sentiment through BERT encoding, a pooler, and linear layers, this research added a CNN layer and adaptive average pooling to the existing model, and training proceeded by adding the extra aspect-sentiment loss to the existing loss. In other words, during training, the auxiliary loss computed through the CNN layers allowed the local context to be captured better. After training, the model performs aspect-sentiment analysis through the existing path. To evaluate the model, two datasets, SemEval-2015 Task 12 and SemEval-2016 Task 5, were used, and the F1-score increased compared to the existing models. When the batch size was 8 and the number of epochs was 5, the difference in F1-score between the existing models and this study was largest, at 29 and 45, respectively. Even when batch size and epochs were adjusted, the F1-scores were higher than those of the existing models. The model can thus learn effectively compared to existing models even when the batch size and number of epochs are small. It can therefore be useful in situations where resources are limited. Through this study, aspect-based sentiment can be analyzed more accurately. 
Through various business uses, such as product development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, because the approach uses a pre-trained model and recorded a relatively high F1-score even with limited resources, we believe it can be trained and used by small businesses that do not have much data.
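
The auxiliary-loss idea described above can be sketched without any deep learning framework: the main head and the parallel CNN head each produce logits for the same aspect-sentiment label, and their cross-entropy losses are added during training. The function names and the `aux_weight` parameter are assumptions for illustration; the abstract only says the extra loss is added to the existing loss, which corresponds to `aux_weight=1.0`.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

def training_loss(main_logits, cnn_logits, target, aux_weight=1.0):
    """Main loss plus the auxiliary loss from the parallel CNN branch."""
    main = cross_entropy(main_logits, target)
    aux = cross_entropy(cnn_logits, target)
    return main + aux_weight * aux

# Hypothetical logits over three sentiment classes; the true class is index 0.
loss = training_loss([2.0, 0.5, -1.0], [1.0, 1.2, -0.5], target=0)
print(round(loss, 4))
```

At inference time only the main head would be used, matching the abstract's statement that analysis after training goes through the existing method.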

Ecological Studies on the Forest Vegetation in the Mt. Joghe (조계산(曹溪山) 삼림식생(森林植生)의 생태학적(生態學的) 연구(硏究))

  • Chang, Seok Mo
    • Journal of Korean Society of Forest Science
    • /
    • v.80 no.1
    • /
    • pp.54-71
    • /
    • 1991
  • To classify and analyze the forest communities and their structures, the vegetation of Mt. Joghe was investigated from July 1980 to August 1989. The results obtained are as follows. 1. A total of 750 kinds of vascular plants (49 orders, 122 families, 434 genera, 627 species, 1 subspecies, 111 varieties and 11 forma) were observed on Mt. Joghe. The newly observed plants were Dioscorea quinqueloba, Spiranthes sinensis, Cephalanthera falcata, Angelica gigas, Clematis patens, Paeonia obovata, Hibiscus mutabilis, Ainsliaea acerifolia, Dictamnus dasycarpus, Cynanchum ascyrifolium, Vaccinium koreanum, Erythronium japonicum, Indigofera kirilowii (17 species), Broussonetia kazinoki var. humilis, Euonymus fortunei var. radicans, Juniperus communis var. nipponica, Callicarpa japonica var. taquetii (4 varieties) and Lindera obtusiloba for. villosum (1 forma). 2. The life-form spectrum of the flora of Mt. Joghe was classified as the $CH-D_1-R_5-e$ type. The distribution area was identical to the Southern type of Nakai, Lee, and Yim. A few subtropical species were also observed. 3. Simpson's species diversity index (Ds) was 0.9 and the Shannon-Wiener diversity index (H') was 1.004. These indices suggest that the vegetation of Mt. Joghe consists of complex forest communities. 4. Pte-Q was 1.81, which was higher than the nationwide mean of 1.68. The Urbanization Index (UI) was 28.75 for naturalized plant species and 17.49 for exotic woody plant species, which were similar to those of Mt. Baekun and Mt. Naejang. 5. The forest vegetation of Mt. 
Joghe was grouped into 3 vegetation types: 7 natural plant communities dominated by Quercus serrata, Quercus acutissima, Quercus variabilis, Carpinus laxiflora, Pinus densiflora and Platycarya strobilacea; 8 substitutional plant communities of Styrax japonica, Stewartia koreana, Lindera erythrocarpa, Zelkova serrata, Rhus chinensis, Cornus controversa and Fraxinus mandshurica; and 7 plantation communities composed of Pinus koraiensis, Pinus rigida, Magnolia obovata, Chamaecyparis obtusa, Larix leptolepis, Castanea crenata and Cryptomeria japonica. 6. Actual vegetation maps and profile diagrams were made by phytosociological classification. 7. As important and unique species of Mt. Joghe, Lindera sericea, Persicaria filiforme, Ilex macropoda, Ilex macropoda for. pseudo-macropoda, Stewartia koreana, Adenophora palustris and Corylopsis coreana, which were also reported by Lee (1977) and Kim and Yark (1989), were identified, and Vaccinium koreanum, Cremastra appendiculata, Juniperus communis var. nipponica, Cephalanthera falcata, Broussonetia kazinoki var. humilis, Paeonia obovata, Deutzia prunifolia, Dictamnus dasycarpus, Angelica gigas and Bupleurum falcatum were additionally observed.
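
The two diversity indices reported in item 3 can be computed from species abundance counts as sketched below. The abstract does not state which variant of Simpson's index or which logarithm base was used, so the `1 - sum(p^2)` form and the `base` parameter are assumptions, and the abundance counts are made up for illustration.

```python
import math

def simpson_diversity(counts):
    """Simpson's diversity in the 1 - sum(p_i^2) form."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def shannon_diversity(counts, base=math.e):
    """Shannon-Wiener index H' = -sum(p_i * log(p_i))."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)

# Hypothetical community: ten species with equal abundance.
counts = [10] * 10
print(round(simpson_diversity(counts), 3))
print(round(shannon_diversity(counts, base=10), 3))
```

For this evenly distributed hypothetical community the two functions give about 0.9 and 1.0 (base 10), which shows the scale on which values like those in the abstract arise; it does not reconstruct the authors' data.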

Implementation Strategy for the Elderly Care Solution Based on Usage Log Analysis: Focusing on the Case of Hyodol Product (사용자 로그 분석에 기반한 노인 돌봄 솔루션 구축 전략: 효돌 제품의 사례를 중심으로)

  • Lee, Junsik;Yoo, In-Jin;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.117-140
    • /
    • 2019
  • As population aging accelerates and various social problems affecting the vulnerable elderly are raised, the need for effective elderly care solutions that protect the health and safety of the elderly is growing. Recently, more and more people are using smart toys equipped with ICT for elderly care. In particular, log data collected through smart toys is highly valuable as a quantitative and objective indicator in areas such as policy-making and service planning. However, research on smart toys has been limited to topics such as the development of smart toys and the validation of their effectiveness. In other words, there is a dearth of research that derives insights from log data collected through smart toys and uses them for decision making. This study analyzes log data collected from a smart toy and derives insights to improve the quality of life of elderly users. Specifically, it performs a user-profiling analysis and elicits the behavior-based mechanism of change in quality of life. First, in the user-profiling analysis, two important dimensions for classifying types of elderly users were derived from five factors of the users' living management: 'Routine Activities' and 'Work-out Activities'. Based on these dimensions, hierarchical cluster analysis and K-means clustering were performed to classify all elderly users into three groups. Through the profiling analysis, the demographic characteristics of each group and their smart toy usage behavior were identified. Second, stepwise regression was performed to elicit the mechanism of change in quality of life. The effects of interaction, content usage, and indoor activity on the improvement of depression and lifestyle among the elderly were identified. 
In addition, user performance evaluation of and satisfaction with the smart toy were identified as variables mediating the relationship between usage behavior and change in quality of life. The specific mechanisms are as follows. First, interaction between the smart toy and the elderly user was found to improve depression, mediated by attitudes toward the smart toy. 'Satisfaction toward Smart Toy', a variable that affects the improvement of the elderly's depression, changes according to how users evaluate the smart toy's performance. It was also identified that interaction with the smart toy is what has a positive effect on the evaluation of the smart toy. These results can be interpreted as follows: elderly users who want emotional stability interact actively with the smart toy, assess it positively, and greatly appreciate its effectiveness. Second, content usage was confirmed to have a direct effect on improving lifestyle without going through other variables. Elderly users who use a lot of the content provided by the smart toy improved their lifestyle. This effect occurred regardless of the user's attitude toward the smart toy. Third, the log data show that a high degree of indoor activity improves both the lifestyle and the depression of the elderly. The more indoor activity, the better the lifestyle, and this effect occurs regardless of the user's attitude toward the smart toy. In addition, elderly users with a high degree of indoor activity are satisfied with the smart toy, which in turn improves their depression. However, elderly users who prefer outdoor activities to indoor activities, or who are less active due to health problems, are hard to satisfy with smart toys and do not obtain the depression-improving effect. In summary, three groups of elderly users were identified based on their activities, and the important characteristics of each type were identified. 
In addition, this study sought to identify the mechanism by which the elderly's behavior with the smart toy affects their actual lives, and to derive user needs and insights.

A study on the readability of web interface for the elderly user -Focused on readability of Typeface- (고령사용자를 위한 웹 인터페이스에서의 가독성에 관한 연구 -Typeface의 가독성을 중심으로-)

  • Lee, Hyun-Ju;Woo, Seo-Hye;Park, Eun-Young;Suh, Hye-Young;Back, Seung-Chul
    • Archives of design research
    • /
    • v.20 no.3 s.71
    • /
    • pp.315-324
    • /
    • 2007
  • The rapid development of information technology has made Korea one of the most advanced countries in information and communication in a short period of time. However, the gap between the aged and the young has widened seriously. Fewer than 10% of older adults currently use the internet. This means the elderly have many difficulties in using the internet because of their physical and cognitive differences. The purpose of this study is to help the aged easily obtain and use information by developing guidelines for Korean typography in the web interface. A literature search was conducted on web interface design guidelines for older adults. These guidelines were classified by interface component, and the study subjects needed for the Korean internet environment were selected. The subjects are the more comfortably readable typeface at each size, the proper text size for Gulim and Batang, the more comfortably readable leading, the appropriate letter spacing, the proper line length of body text, the suitable size proportion between title and body, and the more comfortably readable text alignment. Survey questions were drafted and improved after a pretest. Both online and offline survey programs were written, and the aged and the young were tested with these programs. The survey results show that the aged and the young differ in their satisfaction with the readability and legibility of web content. Accordingly, universal guidelines for the Korean typographical environment were specified for the future aged population. This study is expected to serve as basic data for a universal web interface in which older adults can easily use and acquire information.

The Research Trends of Papers in the Journal of Dental Hygiene Science (한국치위생과학회지 게재논문의 연구경향 분석)

  • Lee, Sun-Mi;Ahn, Se-Youn;Han, Hwa-Jin;Han, Ji-Youn;Lee, Chun-Sun;Kim, Chang-Hee
    • Journal of dental hygiene science
    • /
    • v.14 no.1
    • /
    • pp.67-73
    • /
    • 2014
  • This study analyzed 548 papers published in the Journal of Dental Hygiene Science from 2001 through Vol. 12, No. 6 in 2012. The findings are as follows. First, in terms of research design, cross-sectional research was the most common. Second, among survey papers, the most frequent research subjects were dental hygienists, followed by dental hygiene students. Third, papers with two authors were the most common, followed by those with three. Fourth, the most used statistical methods were descriptive statistics, the t-test, and ANOVA, in that order, and the most frequent research themes were dental health behavioral science, followed by clinical dental hygiene. Fifth, only 17.7% of the papers were supported by research funds. These results indicate that studies with more diverse designs are needed in the future, along with more active research on marginalized groups and special subjects as well as on activities related to dental hygiene.

Quality Characteristics of Doenjang by Aging Period (전통 된장의 숙성 기간에 따른 감각·화학적 품질특성)

  • Ku, Kyung-Hyung;Park, Kyungmin;Kim, Hyun Jung;Kim, Yoonsook;Koo, Minseon
    • Journal of the Korean Society of Food Science and Nutrition
    • /
    • v.43 no.5
    • /
    • pp.720-728
    • /
    • 2014
  • To characterize the quality of Doenjang, a fermented Korean soybean paste, after long-term aging, this study performed physico-chemical analyses and sensory evaluation by aging period (from 1 to 9 years). In proximate composition, the moisture, crude protein, crude lipid, crude ash, and salt contents differed little among the Doenjang samples. Amino-type nitrogen content was 1,046.7 mg% in the 1 year-aged sample, 990.9~996.9 mg% in the 2~5 year-aged samples, and 1,214.1~1,304.8 mg% in the samples fermented for more than 5 years. The $\Delta E$ value, reflecting total color difference between samples, increased with aging period. Linoleic and linolenic acids, essential fatty acids in soybeans, were the most abundant fatty acids, constituting 55% of total fatty acids. The major free sugar in Doenjang was fructose, at a content of 1.6~2.2% in 1~9 year-aged Doenjang. The glycoside form of isoflavones constituted 77.1% in Meju, and the aglycon form constituted 22.9%. However, the glycoside form of isoflavones in soybeans was converted to the aglycon form in Doenjang through fermentation and aging. In the sensory evaluation, brown color, salt smell, soy sauce flavor, and viscosity all increased with aging period, whereas sweet flavor, roast smell, beany flavor, salty taste, and acrid taste showed no significant differences. In cluster analysis of the sensory attributes by aging period, the 1 year-aged Doenjang differed significantly from the 2 year- and 3~5 year-aged Doenjang.

Analysis of Effect of Environment on Growth and Yield of Autumn Kimchi Cabbage in Jeonnam Province using Big Data (빅데이터를 활용한 재배환경이 전라남도 지방 가을배추의 생육과 수량에 미치는 영향 분석)

  • Wi, Seung Hwan;Lee, Hee Ju;Yu, In Ho;Jang, YoonAh;Yeo, Kyung-Hwan;An, Sewoong;Lee, Jin Hyoung
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.22 no.3
    • /
    • pp.183-193
    • /
    • 2020
  • This study was conducted to evaluate the effect of environmental factors on the growth of autumn-cultivated Kimchi cabbage using big data drawn from public open data (weather, soil information, crop growth, etc.). Growth data and environmental data such as temperature, daylength, and rainfall from 2010 to 2019 were collected. In the correlation matrix, height and leaf number were highly correlated with growing degree days (GDDs) and daylength, and yield was negatively correlated with GDDs and clay concentration. GDDs and daylength explained about 89% and 84% of the variation in height, respectively. These two environmental factors also explained about 85% and 79% of the variation in leaf number, respectively. In contrast, the coefficient of determination for yield was low when GDDs and clay concentration were used. Regional statistical analysis indicated that the relationship between yield and the sum of sand and silt was strong in the Haenam and Jindo areas. Hierarchical cluster analysis, performed to verify the association among yield, GDDs, and clay concentration, showed that Haenam and Jindo were clustered together. Although GDDs and yield vary by year and region, and some regions have similar clay concentrations, the observation data nevertheless formed groups. These results suggest that GDDs and soil texture are related to yield. The cluster analysis results can be used for further data analysis and for establishing agricultural policy.
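
The hierarchical cluster analysis mentioned above can be illustrated with a small agglomerative procedure. The abstract does not name the linkage criterion or distance measure, so average linkage with Euclidean distance is an assumption here. The feature values (standardized yield, GDDs, clay concentration) and the names "Region C" and "Region D" are entirely hypothetical.

```python
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

# Hypothetical standardized features: (yield, GDDs, clay concentration).
regions = {
    "Haenam": (0.9, 1.0, -0.8),
    "Jindo": (1.0, 0.9, -0.9),
    "Region C": (-1.0, -0.8, 1.0),
    "Region D": (-0.9, -1.1, 0.7),
}
names = list(regions)
clusters = agglomerate([regions[n] for n in names], n_clusters=2)
grouped = [sorted(names[i] for i in cluster) for cluster in clusters]
print(grouped)
```

With real data, features on different scales would need to be standardized before computing distances, as the hypothetical values here already are.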