• Title/Summary/Keyword: Python 3

A Study on Analysis of national R&D research trends for Artificial Intelligence using LDA topic modeling (LDA 토픽모델링을 활용한 인공지능 관련 국가R&D 연구동향 분석)

  • Yang, MyungSeok;Lee, SungHee;Park, KeunHee;Choi, KwangNam;Kim, TaeHyun
    • Journal of Internet Computing and Services / v.22 no.5 / pp.47-55 / 2021
  • Research trends in a specific subject area are usually analyzed by extracting keywords from literature (papers, patents, etc.) and applying topic modeling to examine related topics and how they change. Unlike these existing approaches, this paper applies LDA topic modeling to the project information of national R&D projects in the field of artificial intelligence, as provided by the National Science and Technology Knowledge Information Service (NTIS). By analyzing the extracted topics, the study aims to identify research topics and investment directions for national R&D projects. NTIS provides a vast amount of national R&D information, ranging from the projects carried out under national R&D programs to the research results (papers, patents, etc.) they generate. In this paper, search results were checked by running artificial intelligence keyword and related classification searches in the NTIS integrated search, and the basic data set was built by downloading project information for the latest three years. Using an LDA topic modeling library for Python, related topics and keywords were extracted from the basic data (research goals, research content, expected effects, keywords, etc.) and analyzed to derive insights on the direction of research investment.
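The abstract does not say which LDA library was used, so the following is only a minimal sketch of the technique itself: a toy collapsed Gibbs sampler written with the standard library. The documents, the hyperparameters `alpha` and `beta`, and the function names are all assumptions made for illustration; in practice a library such as gensim or scikit-learn would normally be used instead.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k
    assign = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic."""
    return [[w for w, _ in tw.most_common(n)] for tw in topic_word]


# Hypothetical, already-tokenized project descriptions.
docs = [
    "deep learning image recognition".split(),
    "image recognition neural network".split(),
    "robot control autonomous driving".split(),
    "autonomous driving robot sensor".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
print(top_words(topic_word))
```

With real project text, the tokenization step (for Korean, a morphological analyzer) and the choice of the number of topics matter far more than the sampler itself.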

Development and Application of the Butterfly Algorithm Based on Decision Making Tree for Contradiction Problem Solving (모순 문제 해결을 위한 의사결정트리 기반 나비 알고리즘의 개발과 적용)

  • Hyun, Jung Suk;Ko, Ye June;Kim, Yung Gyeol;Jean, Seungjae;Park, Chan Jung
    • The Journal of Korean Association of Computer Education / v.22 no.1 / pp.87-98 / 2019
  • It is easy to assume that contradictions are logically incorrect, or are empty sets with no solution. Such a dilemma is difficult to resolve because the contradiction hidden in it must be solved first. Paradoxically, contradiction resolution has therefore been viewed as innovative and creative problem solving. TRIZ, which analyzes the solution of a problem from the perspective of resolving contradictions, has been used by people rather than computers. The Butterfly model, which like TRIZ analyzes a problem from the perspective of resolving contradictions, classifies the type of contradiction problem using symbolic logic. To apply an appropriate concrete solution strategy to a given contradiction problem, we designed the Butterfly algorithm based on a decision-making tree. We also developed a visualization tool based on Python tkinter to find concrete solution strategies for given contradiction problems. To verify the tool, third-grade middle school students learned the Butterfly algorithm, analyzed the contradiction of a wooden support, and won the grand prize at an invention contest with a new solution. The Butterfly algorithm developed in this paper systematically reduces the solution space of contradiction problems at the beginning of problem solving and can help solve them without trial and error.

Position of Hungarian Merino among other Merinos, within-breed genetic similarity network and markers associated with daily weight gain

  • Attila, Zsolnai;Istvan, Egerszegi;Laszlo, Rozsa;David, Mezoszentgyorgyi;Istvan, Anton
    • Animal Bioscience / v.36 no.1 / pp.10-18 / 2023
  • Objective: In this study, we aimed to position the Hungarian Merino among other Merino-derived sheep breeds, explore the characteristics of our sampled animals' genetic similarity network within the breed, and highlight single nucleotide polymorphisms (SNPs) associated with daily weight gain. Methods: Hungarian Merino (n = 138) was genotyped on the Ovine SNP50 Bead Chip (Illumina, San Diego, CA, USA) and positioned among 30 Merino and Merino-derived breeds (n = 555). Population characteristics were obtained via PLINK, SVS, Admixture, and Treemix software; the within-breed network was analysed with the Python networkx 2.3 library. Daily weight gain of Hungarian Merino was standardised to 60 days and was collected from the database of the Association of Hungarian Sheep and Goat Breeders. For the identification of loci associated with daily weight gain, a multi-locus mixed model was used. Results: Supporting the breed's written history, the closest breeds to Hungarian Merino were Estremadura and Rambouillet (pairwise FST values are 0.035 and 0.036, respectively). Among Hungarian Merino, a highly centralised connectedness was revealed by network analysis of pairwise identity-by-state values, where the animal in the central node had a betweenness centrality value of 0.936. Probing of daily weight gain against the SNP data of Hungarian Merinos revealed five associated loci. Two of them, OAR8_17854216.1 and s42441.1 on chromosomes 8 and 9 (-log10P>22, false discovery rate<5.5e-20), and one locus on chromosome 20, s28948.1 (-log10P = 13.46, false discovery rate = 4.1e-11), were close to markers reported in other breeds concerning daily weight gain, six-month weight, and post-weaning gain. Conclusion: The position of Hungarian Merino among other Merino breeds has been determined. We have described the similarity network of the individuals to be applied in breeding practices and highlighted several markers useful for elevating the daily weight gain of Hungarian Merino.
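To make the betweenness centrality figure concrete, here is a minimal standard-library sketch of normalized betweenness for an undirected, unweighted graph (Brandes' algorithm). It is not the authors' code: they report using networkx, whose `betweenness_centrality` function would normally be called instead. The small graph and the node names are invented for illustration, and how edges were derived from the identity-by-state values is not stated in the abstract.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized betweenness; adj maps each node to a set of neighbours."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every unordered pair was counted from both endpoints, so dividing by
    # (n - 1)(n - 2) gives a value between 0 and 1.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}


# Hypothetical similarity network: animal "A" links to all others, and
# "B" and "C" are also directly connected to each other.
graph = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A"},
}
centrality = betweenness_centrality(graph)
print(max(centrality, key=centrality.get))
```

A value close to 1 for a single node, as reported in the abstract, means that almost all shortest paths between other animals pass through that one individual.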

RPA Log Mining-based Process Automation Status Analysis - An Empirical Study on SMEs (RPA 로그 마이닝 기반 프로세스 자동화 현황 분석 - 중소기업대상 실증 연구)

  • Young Sik Kang;Jinwoo Jung;Seonyoung Shim
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.265-288 / 2023
  • Process mining has generally analyzed the default logs of information systems such as SAP ERP, but as the use of automation software known as RPA expands, the logs written by RPA bots can also be used. In this study, we identified the actual status of RPA automation in the field by applying RPA bots to the work of three domestic manufacturing companies (in the cosmetics sector) and analyzing the logs the bots left. We implemented the RPA bots and wrote the logs using UiPath and Python, and analyzed the bot logs with Disco, software dedicated to process mining. Log analysis through process mining, covering the two aspects of bot utilization and performance, revealed requirements for improvement. In particular, every case showed room for improvement in bot utilization, and errors or exceptions were found in many process cases. Our approach is scientific and empirical in that it analyzes the automation status and performance of bots using data rather than existing qualitative methods such as surveys or interviews. Furthermore, our study is a meaningful first step toward bot behavior optimization and can serve as a foundation for ultimately performing process management.
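The two aspects analyzed in this study, bot utilization and errors per process, can be illustrated with a small sketch. The log layout below (bot, process, start, end, status), the eight-hour observation window, and all values are assumptions for illustration only; the abstract does not describe the actual log schema, and the authors performed their analysis in Disco rather than in code like this.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical bot log records: (bot, process, start, end, status).
LOG = [
    ("bot1", "order_entry", "2022-03-02 09:00", "2022-03-02 09:30", "ok"),
    ("bot1", "order_entry", "2022-03-02 10:00", "2022-03-02 10:20", "error"),
    ("bot1", "invoice", "2022-03-02 13:00", "2022-03-02 13:10", "ok"),
    ("bot2", "invoice", "2022-03-02 09:00", "2022-03-02 11:00", "ok"),
]
FMT = "%Y-%m-%d %H:%M"


def summarize(log, window_hours=8.0):
    """Return (utilization per bot, error rate per process)."""
    busy = defaultdict(float)   # bot -> hours spent running
    runs = defaultdict(int)     # process -> number of runs
    errors = defaultdict(int)   # process -> number of failed runs
    for bot, process, start, end, status in log:
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        busy[bot] += delta.total_seconds() / 3600
        runs[process] += 1
        if status != "ok":
            errors[process] += 1
    utilization = {bot: hours / window_hours for bot, hours in busy.items()}
    error_rate = {process: errors[process] / runs[process] for process in runs}
    return utilization, error_rate


utilization, error_rate = summarize(LOG)
print(utilization, error_rate)
```

Low utilization figures of this kind are what would point to idle bot capacity, while a high error rate for a process flags it for redesign.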

Remote Sensing based Algae Monitoring in Dams using High-resolution Satellite Image and Machine Learning (고해상도 위성영상과 머신러닝을 활용한 녹조 모니터링 기법 연구)

  • Jung, Jiyoung;Jang, Hyeon June;Kim, Sung Hoon;Choi, Young Don;Yi, Hye-Suk;Choi, Sunghwa
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.42-42 / 2022
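  • Algae monitoring in watersheds still relies heavily on point-based monitoring through on-site water sampling, which makes it difficult to efficiently monitor and respond to algal blooms that spread widely across a water body depending on climate, flow velocity, water temperature, and other conditions. In addition, because observation data have been limited, earlier studies mostly computed derived indices closely related to algae, such as NDVI, FGAI, and SEI, and mapped them to remote sensing data rather than using field measurements. This study aims to improve the accuracy and efficiency of algae monitoring. We first secured more than 7,000 algae observations using algae measuring equipment, and then examined the effectiveness of applying various machine learning techniques to map high-resolution satellite data from the same period to the field measurements. The study area is the Yeongju Dam watershed, located in the upper reaches of the Naeseongcheon stream of the Nakdong River. In the data collection stage, 7,291 algae measurements were taken in four campaigns from February to September 2020 for area-based in-situ observation, and spectral data for 13 bands (Bands 1 to 12, with Band 8 counted as both 8 and 8A) were extracted from Sentinel-2 data for the same times and locations. Next, an algae_monitoring Python library was built to apply the machine learning techniques. The library is organized into five stages so that it can be applied generally to various watersheds, not only Yeongju Dam: 1) a data preparation stage that splits the data into a training set and a test set; 2) a model application stage in which Random Forest, Gradient Boosting Regression, or XGBoost can be selected; 3) a performance test stage that checks the model results (R2, MSE, MAE, RMSE, NSE, KGE, etc.); 4) a visualization stage for the model results; and 5) an application stage that uses the selected model to convert satellite data into algae values. In this case study, using 14 inputs in total (12 Sentinel-2 bands plus the meteorological variables air temperature and cloud ratio), Random Forest showed the best overall fit among the machine learning techniques, with a Nash-Sutcliffe Efficiency (NSE) of about 0.96 on the test set (0.99 on the training set). This confirms that learning from wide-area satellite data together with a sufficiently large set of field measurements can greatly improve the efficiency of algae monitoring analysis.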
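The performance test stage relies on goodness-of-fit metrics that are easy to state in code. The sketch below gives standard definitions of RMSE, MAE, NSE, and KGE (in its 2009 form) using only the standard library. The internals of the algae_monitoring library are not described in the abstract, so the function names and the sample values here are assumptions, not the library's API.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error."""
    return mean([abs(o - s) for o, s in zip(obs, sim)])


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 is no better than
    always predicting the mean of the observations."""
    m = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - m) ** 2 for o in obs)
    return 1 - residual / spread


def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 form): combines correlation, the ratio
    of standard deviations, and the ratio of means."""
    mo, ms = mean(obs), mean(sim)
    so = math.sqrt(mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(mean([(s - ms) ** 2 for s in sim]))
    r = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)]) / (so * ss)
    return 1 - math.sqrt((r - 1) ** 2 + (ss / so - 1) ** 2 + (ms / mo - 1) ** 2)


# Hypothetical observed and predicted algae values.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(rmse(observed, predicted), mae(observed, predicted),
      nse(observed, predicted), kge(observed, predicted))
```

Comparing NSE on the training set with NSE on the test set, as the abstract does, is a quick check for overfitting: a large gap between the two would suggest the model has memorized the training data.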

Three-dimensional analysis of the positional relationship between the dentition and basal bone region in patients with skeletal Class I and Class II malocclusion with mandibular retrusion

  • Jun Wan;Xi Wen;Jing Geng;Yan Gu
    • The Korean Journal of Orthodontics / v.54 no.3 / pp.171-184 / 2024
  • Objective: This study aimed to determine the maxillary and mandibular basal bone regions and explore the three-dimensional positional relationship between the dentition and basal bone regions in patients with skeletal Class I and Class II malocclusions with mandibular retrusion. Methods: Eighty patients (40 each with Class I and Class II malocclusion) were enrolled. Maxillary and mandibular basal bone regions were determined using cone-beam computed tomography images. To measure the relationship between the dentition and basal bone region, the root position and root inclination were calculated using the coordinates of specific fixed points by a computer program written in Python. Results: In the Class II group, the mandibular anterior teeth inclined more labially (P < 0.05), with their apices positioned closer to the external boundary. The apex of the maxillary anterior root was positioned closer to the external boundary in both groups. Considering the molar region, the maxillary first molars tended to be more lingually inclined in females (P = 0.037), whereas the mandibular first molars were significantly more labially inclined in the Class II group (P < 0.05). Conclusions: Mandibular anterior teeth in Class II malocclusion exhibit a compensatory labial inclination trend with the crown and apex relative to the basal bone region when mandibular retrusion occurs. Moreover, as the root apices of the maxillary anterior teeth are much closer to the labial side in Class I and Class II malocclusion, the range of movement at the root apex should be limited to avoid extensive labial movement.
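The abstract says only that root position and inclination were computed from landmark coordinates by a Python program. As a rough sketch of what such a calculation can look like, the function below measures the angle between a tooth's long axis (root apex to crown point) and a reference axis. The choice of landmarks, the reference axis, and the coordinates are all assumptions for illustration and are not taken from the paper.

```python
import math


def inclination_deg(apex, crown, reference=(0.0, 0.0, 1.0)):
    """Angle in degrees between the apex-to-crown vector and a reference axis."""
    axis = [c - a for a, c in zip(apex, crown)]
    dot = sum(u * v for u, v in zip(axis, reference))
    norm = math.sqrt(sum(u * u for u in axis)) * math.sqrt(
        sum(v * v for v in reference)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# Hypothetical landmark coordinates in millimetres (x, y, z).
root_apex = (0.0, 0.0, 0.0)
crown_tip = (0.0, 5.0, 5.0)
print(inclination_deg(root_apex, crown_tip))
```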

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performances. In recent years, machine learning techniques have been widely used in stock market predictions, including artificial neural network, SVM, and genetic algorithm. In particular, a case-based reasoning method, known as k-nearest neighbor is also widely used for stock price prediction. Case based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of similar cases to create a classification for the new problem. However, case based reasoning has some problems. First, case based reasoning has a tendency to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case. So, case based reasoning may have to take into account more cases even when there are fewer cases applicable depending on the subject. Second, case based reasoning may select neighbors that are far away from the target case. Thus, case based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and the predictability can be degraded due to a deviation from the desired similar neighbor. This paper examines how the size of learning data affects stock price predictability through k-nearest neighbor and compares the predictability of k-nearest neighbor with the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of next day's closing price, we used four variables: opening value, daily high, daily low, and daily close. 
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data are from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN for the first experiment when the learning data was small. However, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN for the second experiment when the learning data was large. These results show that prediction power is higher when more learning data are used than when less learning data are used. Also, this paper shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets and does not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting alongside opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that k-nearest neighbor find nearest neighbors using a second-step filtering method that considers fundamental economic variables, as well as a sufficient amount of learning data.
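The comparison described here can be sketched in a few lines: predict the next day's close with k-NN on the four daily variables, predict it with a random walk (tomorrow equals today), and score both with MAPE. The price rows, the Euclidean distance, the value of `k`, and the train/test split below are assumptions for illustration; the abstract does not specify these details, and the numbers produced are unrelated to the paper's results.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training rows closest to query."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: euclidean(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical daily rows: (open, high, low, close).
rows = [
    (100, 103, 99, 102), (102, 104, 101, 103), (103, 105, 100, 101),
    (101, 102, 98, 99), (99, 101, 97, 100), (100, 104, 100, 103),
    (103, 106, 102, 105), (105, 107, 103, 104),
]
features = rows[:-1]                    # variables observed on day t
targets = [r[3] for r in rows[1:]]      # closing price on day t + 1

split = 5
train_x, train_y = features[:split], targets[:split]
test_x, test_y = features[split:], targets[split:]

knn_pred = [knn_predict(train_x, train_y, x, k=2) for x in test_x]
rw_pred = [x[3] for x in test_x]        # random walk: tomorrow = today

print(mape(test_y, knn_pred), mape(test_y, rw_pred))
```

Because the random walk forecast does not depend on the training data at all, its MAPE is identical across experiments that share a test period, which is why the abstract reports the same random walk figure for both.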

Perception and Appraisal of Urban Park Users Using Text Mining of Google Maps Review - Cases of Seoul Forest, Boramae Park, Olympic Park - (구글맵리뷰 텍스트마이닝을 활용한 공원 이용자의 인식 및 평가 - 서울숲, 보라매공원, 올림픽공원을 대상으로 -)

  • Lee, Ju-Kyung;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.4 / pp.15-29 / 2021
  • The study aims to grasp the perception and appraisal of urban park users through text analysis. This study used Google review data provided by Google Maps. Google Maps Review is an online review platform that provides information evaluating locations through social media and provides an understanding of locations from the perspective of general reviewers and regional guides who are registered as members of Google Maps. The study determined if the Google Maps Reviews were useful for extracting meaningful information about the user perceptions and appraisals for parks management plans. The study chose three urban parks in Seoul, South Korea; Seoul Forest, Boramae Park, and Olympic Park. Review data for each of these three parks were collected via web crawling using Python. Through text analysis, the keywords and network structure characteristics for each park were analyzed. The text was analyzed, as were park ratings, and the analysis compared the reviews of residents and foreign tourists. The common keywords found in the review comments for the three parks were "walking", "bicycle", "rest" and "picnic" for activities, "family", "child" and "dogs" for accompanying types, and "playground" and "walking trail" for park facilities. Looking at the characteristics of each park, Seoul Forest shows many outdoor activities based on nature, while the lack of parking spaces and congestion on weekends negatively impacted users. Boramae Park has the appearance of a city park, with various facilities providing numerous activities, but reviewers often cited the park's complexity and the negative aspects in terms of dog walking groups. At Olympic Park, large-scale complex facilities and cultural events were frequently mentioned, emphasizing its entertainment functions. Google Maps Review can function as useful data to identify parks' overall users' experiences and general feelings. 
Compared to data from other social media sites, Google Maps Review's data provides ratings and understanding factors, including user satisfaction and dissatisfaction.
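The keyword and network analysis described in this abstract starts from two simple counts: how many reviews mention each keyword, and how often two keywords appear in the same review. The following is a minimal sketch of those counts with the standard library. The review tokens are invented, and the crawling and tokenization steps that the authors performed are deliberately left out.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenized review keywords (one list per review).
reviews = [
    ["walking", "family", "picnic"],
    ["bicycle", "walking", "playground"],
    ["picnic", "family", "dogs"],
    ["walking", "picnic"],
]

# Number of reviews that mention each keyword.
keyword_freq = Counter(word for review in reviews for word in set(review))

# Edge weight = number of reviews in which both keywords appear.
cooccurrence = Counter()
for review in reviews:
    for a, b in combinations(sorted(set(review)), 2):
        cooccurrence[(a, b)] += 1

print(keyword_freq.most_common(3))
print(cooccurrence.most_common(3))
```

The co-occurrence counts can be read directly as a weighted edge list, which is the usual input for the kind of keyword network analysis the study reports.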

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology to extract answer information about queries from various types of unstructured documents collected from multi-sources existing on web in order to expand knowledge base. The proposed methodology is divided into the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news sources for "subject-predicate" separated queries and classify the proper documents. 2) Determine whether the sentence is suitable for extracting information and derive the confidence. 3) Based on the predicate feature, extract the information in the proper sentence and derive the overall confidence of the information extraction result. In order to evaluate the performance of the information extraction system, we selected 400 queries from the artificial intelligence speaker of SK-Telecom. Compared with the baseline model, it is confirmed that it shows higher performance index than the existing model. The contribution of this study is that we develop a sequence tagging model based on bi-directional LSTM-CRF using the predicate feature of the query, with this we developed a robust model that can maintain high recall performance even in various types of unstructured documents collected from multiple sources. The problem of information extraction for knowledge base extension should take into account heterogeneous characteristics of source-specific document types. The proposed methodology proved to extract information effectively from various types of unstructured documents compared to the baseline model. There is a limitation in previous research that the performance is poor when extracting information about the document type that is different from the training data. 
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study can prevent unnecessary extraction attempts on documents that do not contain the answer. It is meaningful that we provide a method by which precision can be maintained even in an actual web environment. The information extraction problem for knowledge base expansion targets unstructured documents on the real web, so it cannot be guaranteed that a document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies have the limitation of low precision, because they frequently attempt to extract an answer even from a document that contains no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first is a problem of data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and information extraction can go wrong when the morphological analysis is not performed properly. To improve information extraction results, a more advanced morphological analyzer needs to be developed. The second is a problem of entity ambiguity. The information extraction system of this study cannot distinguish between identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the person the query intends. 
Future research needs to take measures to distinguish between people with the same name. The third is a problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We developed an evaluation data set using 800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction systems that answer queries from multi-source web documents, in order to build an environment in which results can be evaluated more objectively.

Analysis of Football Fans' Uniform Consumption: Before and After Son Heung-Min's Transfer to Tottenham Hotspur FC (국내 프로축구 팬들의 유니폼 소비 분석: 손흥민의 토트넘 홋스퍼 FC 이적 전후 비교)

  • Choi, Yeong-Hyeon;Lee, Kyu-Hye
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.91-108 / 2020
  • Korea's famous soccer players are steadily performing well in international leagues, which led to higher interests of Korean fans in the international leagues. Reflecting the growing social phenomenon of rising interests on international leagues by Korean fans, the study examined the overall consumer perception in the consumption of uniform by domestic soccer fans and compared the changes in perception following the transfers of the players. Among others, the paper examined the consumer perception and purchase factors of soccer fans shown in social media, focusing on periods before and after the recruitment of Heung-Min Son to English Premier League's Tottenham Football Club. To this end, the EPL uniform is the collection keyword the paper utilized and collected consumer postings from domestic website and social media via Python 3.7, and analyzed them using Ucinet 6, NodeXL 1.0.1, and SPSS 25.0 programs. The results of this study can be summarized as follows. First, the uniform of the club that consistently topped the league, has been gaining attention as a popular uniform, and the players' performance, and the players' position have been identified as key factors in the purchase and search of professional football uniforms. In the case of the club, the actual ranking and whether the league won are shown to be important factors in the purchase and search of professional soccer uniforms. The club's emblem and the sponsor logo that will be attached to the uniform are also factors of interest to consumers. In addition, in the decision making process of purchase of a uniform by professional soccer fan, uniform's form, marking, authenticity, and sponsors are found to be more important than price, design, size, and logo. The official online store has emerged as a major purchasing channel, followed by gifts for friends or requests from acquaintances when someone travels to the United Kingdom. 
Second, a classification of key control categories through the convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm shows differences in the classification of individual groups, but groups that include the EPL's club and player keywords are identified as the key topics in relation to professional football uniforms. Third, between 2002 and 2006, the central theme for professional football uniforms was World Cup and English Premier League, but from 2012 to 2015, the focus has shifted to more interest of domestic and international players in the English Premier League. The subject has changed to the uniform itself from this time on. In this context, the paper can confirm that the major issues regarding the uniforms of professional soccer players have changed since Ji-Sung Park's transfer to Manchester United, and Sung-Yong Ki, Chung-Yong Lee, and Heung-Min Son's good performances in these leagues. The paper also identified that the uniforms of the clubs to which the players have transferred to are of interest. Fourth, both male and female consumers are showing increasing interest in Son's league, the English Premier League, which Tottenham FC belongs to. In particular, the increasing interest in Son has shown a tendency to increase interest in football uniforms for female consumers. This study presents a variety of researches on sports consumption and has value as a consumer study by identifying unique consumption patterns. It is meaningful in that the accuracy of the interpretation has been enhanced by using a cluster analysis via convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm to identify the main topics. Based on the results of this study, the clubs will be able to maximize its profits and maintain good relationships with fans by identifying key drivers of consumer awareness and purchasing for professional soccer fans and establishing an effective marketing strategy.
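The Clauset-Newman-Moore algorithm mentioned in this abstract greedily merges groups of nodes so as to increase modularity Q. The sketch below does not implement the merging procedure itself; it only computes Q for a given partition, which is the quantity being maximized. The keyword network and its grouping are invented for illustration. In Python, a full implementation is available as `greedy_modularity_communities` in networkx, although the abstract lists Ucinet, NodeXL, and SPSS as the analysis tools.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)   # edges inside each community
    degree = [0] * len(communities)     # summed node degree per community
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(internal, degree))


# Hypothetical keyword network: two tight triangles joined by one edge.
edges = [
    ("EPL", "Tottenham"), ("Tottenham", "Son"), ("EPL", "Son"),
    ("Son", "uniform"),
    ("uniform", "price"), ("price", "size"), ("uniform", "size"),
]
two_groups = [{"EPL", "Tottenham", "Son"}, {"uniform", "price", "size"}]
one_group = [{"EPL", "Tottenham", "Son", "uniform", "price", "size"}]

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
print(q_split, q_merged)
```

A partition that follows the two triangles scores higher than lumping every keyword together, which is the signal a greedy modularity method uses to decide where cluster boundaries lie.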