
Causal inference from nonrandomized data: key concepts and recent trends (비실험 자료로부터의 인과 추론: 핵심 개념과 최근 동향)

  • Choi, Young-Geun;Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.173-185 / 2019
  • Causal questions are prevalent in scientific research: for example, how effective a treatment was at preventing an infectious disease, how much a policy increased utility, or which advertisement would give the highest click rate for a given customer. Causal inference theory in statistics interprets such questions as inferring the effect of a given intervention (treatment or policy) on the data-generating process. Causal inference has been used in medicine, public health, and economics, and it has recently received attention as a tool for data-driven decision making. Many recent datasets are observational rather than experimental, which makes causal inference more complex. This review introduces key concepts and recent trends in statistical causal inference for observational studies. We first introduce the Neyman-Rubin potential outcome framework, which formalizes causal questions as average treatment effects, and discuss popular methods for estimating treatment effects, such as propensity score approaches and regression approaches. For recent trends, we briefly discuss (1) conditional (heterogeneous) treatment effects and machine learning-based approaches, (2) the curse of dimensionality in treatment effect estimation and its remedies, and (3) Pearl's structural causal model for handling more complex causal relationships and its connection to the Neyman-Rubin potential outcome model.
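
As a brief illustration of the quantities this abstract names, the average treatment effect and one propensity score estimator can be written as follows. The notation is an assumption made here, not taken from the paper: $Y(1)$ and $Y(0)$ are the potential outcomes, $T$ is the binary treatment indicator, $X$ the covariates, and $\hat{e}$ an estimated propensity score. The inverse probability weighting (IPW) estimator shown is one common propensity score approach and relies on the usual unconfoundedness and overlap assumptions.

```latex
\begin{align}
\tau &= \mathbb{E}\left[ Y(1) - Y(0) \right], \\
e(x) &= \Pr(T = 1 \mid X = x), \\
\hat{\tau}_{\mathrm{IPW}} &= \frac{1}{n} \sum_{i=1}^{n}
  \left( \frac{T_i Y_i}{\hat{e}(X_i)} - \frac{(1 - T_i)\, Y_i}{1 - \hat{e}(X_i)} \right).
\end{align}
```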

Topic Model Analysis of Research Themes and Trends in the Journal of Economic and Environmental Geology (기계학습 기반 토픽모델링을 이용한 학술지 "자원환경지질"의 연구주제 분류 및 연구동향 분석)

  • Kim, Taeyong;Park, Hyemin;Heo, Junyong;Yang, Minjune
    • Economic and Environmental Geology / v.54 no.3 / pp.353-364 / 2021
  • Since the mid-twentieth century, geology in South Korea has gradually evolved into an interdisciplinary field. The journal Economic and Environmental Geology (EEG) has a history of over 52 years and has published interdisciplinary articles based on geology. In this study, we performed a literature review using topic modeling based on Latent Dirichlet Allocation (LDA), an unsupervised machine learning model, to identify geological topics, historical trends (classic topics and emerging topics), and associations by analyzing the titles, keywords, and abstracts of 2,571 publications in EEG during 1968-2020. Eight topics were identified in EEG: 'petrology and geochemistry', 'hydrology and hydrogeology', 'economic geology', 'volcanology', 'soil contaminant and remediation', 'general and structural geology', 'geophysics and geophysical exploration', and 'clay mineral'. Before 1994, the classic topics ('economic geology', 'volcanology', and 'general and structural geology') dominated research. After 1994, emerging topics ('hydrology and hydrogeology', 'soil contaminant and remediation', and 'clay mineral') arose, and their share has gradually increased. The association analysis showed that EEG has tended to become more comprehensive, building on 'economic geology'. Our results show how geological research topics branch out and merge with other fields, and demonstrate a useful literature review tool for geological research in South Korea.

A Study on Elementary Education Examples for Data Science using Entry (엔트리를 활용한 초등 데이터 과학 교육 사례 연구)

  • Hur, Kyeong
    • Journal of The Korean Association of Information Education / v.24 no.5 / pp.473-481 / 2020
  • Data science starts with small data analysis and extends to machine learning and deep learning for big data analysis. Data science is a core area of artificial intelligence technology and should be systematically reflected in the school curriculum. Entry also provides a data analysis tool for data science education at the elementary level. In big data analysis, data samples are extracted and the analysis results are interpreted through statistical inference and judgment. In this paper, the big data analysis area, which requires statistical knowledge, is excluded from the elementary area, and data science education examples focusing on the elementary area are proposed. To this end, the general data science education stages are explained first, and elementary data science education stages are newly proposed. Then, following the elementary stages, an example of comparing the values of data variables and an example of analyzing correlations between data variables are proposed using the public small data provided by Entry. With the Entry data analysis examples proposed in this paper, data science convergence education can be provided in elementary school using data generated from various subjects. In addition, data science educational materials combined with text, audio, and video recognition AI tools can be developed using Entry.

Development of Cloud-Based Medical Image Labeling System and It's Quantitative Analysis of Sarcopenia (클라우드기반 의료영상 라벨링 시스템 개발 및 근감소증 정량 분석)

  • Lee, Chung-Sub;Lim, Dong-Wook;Kim, Ji-Eon;Noh, Si-Hyeong;Yu, Yeong-Ju;Kim, Tae-Hoon;Yoon, Kwon-Ha;Jeong, Chang-Won
    • KIPS Transactions on Computer and Communication Systems / v.11 no.7 / pp.233-240 / 2022
  • Most recent AI research has focused on developing AI models. Recently, however, artificial intelligence research has gradually shifted from model-centric to data-centric, and the importance of training data is receiving considerable attention as a result. Preparing training data takes up a significant part of the entire process and therefore requires a great deal of time and effort, and the way labeling data are generated also differs depending on the purpose of development. There is therefore a need for a tool with various labeling functions that addresses these unmet needs. In this paper, we describe a labeling system for creating precise labeling data for medical images quickly. To this end, we implemented a semi-automatic method using Back Projection and GrabCut techniques and an automatic method based on predictions from a machine learning model. We showed that the proposed system reduces the running time needed to generate labeling data, and a comparative evaluation showed that it is also superior in accuracy. In addition, by analyzing an image dataset of about 1,000 patients, we presented meaningful diagnostic indexes for men and women in the diagnosis of sarcopenia.

Analysis of the Relationship between Macpa Stress Index and Korean Job Stress Level - Focusing on Subway Construction Workers (맥파 스트레스와 한국인 직무스트레스의 상관관계 분석 - 도시철도 건설종사자를 대상으로)

  • Chae, Joung Sik;Lee, Yu Jeong;Chang, Seong Rok
    • Journal of the Korean Society of Safety / v.37 no.1 / pp.64-69 / 2022
  • This study measured the Macpa (pulse-wave) stress of subway construction workers with a heart rate variability (HRV) measuring instrument and surveyed the same workers on Korean job stress. It then analyzed the relationship between the Macpa stress index and the Korean job stress results and suggested stress management methods for each item. According to National Statistical Office data, the first subway line in Seoul opened in 1974, and the total extended length was 996 kilometers as of 2019. Because young workers have long avoided subway construction sites, many older workers are currently employed there and make up a substantial portion of the workforce. Productivity has been adversely affected by health problems due to the aging of workers, job stress due to heavy work, and personal health problems, so regulations and policies on job stress health management are being strengthened. To analyze the relationship, Macpa stress was measured with a heart rate variability instrument and the Korean job stress survey (short form) was administered to workers constructing the Sa-sang to Ha-dan line of the Busan subway. The independent variables were age, job duration, job position, employment type, and working type. The dependent variable for Macpa was the stress index; the dependent variables for the Korean job stress survey (short form) were job requirements, job autonomy, relationship conflict, job instability, organizational structure, inappropriate compensation, workplace culture, and total score. The SPSS 12.0 K statistics program was used for statistical analysis. The Kruskal-Wallis test, a nonparametric method, was used because the data could not be assumed to be normally distributed. The results indicated a significant correlation between the Macpa stress index and Korean job stress (short form). Older workers showed a higher Macpa index and higher job stress due to aging and heavy-duty work. The majority of workers were daily workers with unstable working conditions and uncertainty about the future. The study proposed a manual for reducing job stress among subway construction workers, and further research is needed to derive management tools by analyzing job stress factors.
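
To make the nonparametric test named in this abstract concrete, the following is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. The study itself used SPSS; the function names and the toy scores below are hypothetical and are not data from the paper. The sketch assigns average ranks to ties but omits the tie-correction factor.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Hypothetical stress scores for three age groups (illustration only).
GROUPS = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
H = kruskal_wallis_h(GROUPS)  # approximately 7.2 for this toy input
```

With three groups, H is compared against a chi-square distribution with 2 degrees of freedom, whose 5% critical value is about 5.99, so this toy input would be judged significant.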

Development of an intelligent skin condition diagnosis information system based on social media

  • Kim, Hyung-Hoon;Ohk, Seung-Ho
    • Journal of the Korea Society of Computer and Information / v.27 no.8 / pp.241-251 / 2022
  • Diagnosing and managing customers' skin conditions is an essential function in the cosmetics and beauty industry. As social media spreads into every part of society, questions and answers about varied and delicate concerns regarding the diagnosis and management of skin conditions are being actively exchanged in social media communities. However, because social media information is highly diverse, unstructured big data, an intelligent skin condition diagnosis system that combines appropriate skin condition information analysis with artificial intelligence technology is necessary. In this paper, we developed the skin condition diagnosis system SCDIS, which intelligently diagnoses and manages customers' skin conditions by turning text analysis information from social media into training data. In SCDIS, we built and used AnnTFIDF, an artificial neural network model that automatically diagnoses skin condition types using artificial neural networks, a deep learning method. The performance of AnnTFIDF was analyzed using test sample data, and the accuracy of its predicted skin condition types was about 95%. Based on the experimental and performance analysis results in this paper, SCDIS can be regarded as an intelligent tool that can be used efficiently in skin condition analysis and diagnosis management in the cosmetics and beauty industry. This study can also serve as basic research for addressing new technology trends, customized cosmetics manufacturing, and consumer-oriented technology demand in the beauty industry.

A study on the prediction of aquatic ecosystem health grade in ungauged rivers through the machine learning model based on GAN data (GAN 데이터 기반의 머신러닝 모델을 통한 미계측 하천에서의 수생태계 건강성 등급 예측 방안 연구)

  • Lee, Seoro;Lee, Jimin;Lee, Gwanjae;Kim, Jonggun;Lim, Kyoung Jae
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.448-448 / 2021
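  • Fluctuations in water quantity and quality in tributary streams, driven by recent rapid climate change, urbanization, and industrialization, are contributing substantially to biodiversity loss and declining aquatic ecosystem health. Efficient aquatic ecosystem management requires accumulating data through continuous monitoring of streamflow, water quality, and aquatic ecology, and identifying the causes of deteriorating aquatic ecosystem health through careful correlation analysis. However, continuous monitoring of the many tributary streams is difficult in practice, and, given the nature of aquatic ecosystems, a single influencing factor cannot accurately explain changes in ecosystem health. A technique is therefore needed that can efficiently predict aquatic ecosystem health while accounting for the spatiotemporal variability of streamflow and water quality in tributary streams and for a variety of influencing factors. In this study, we built a machine learning model based on empirical data to predict the grades (A to E) of aquatic ecosystem health indices (BMI, TDI, FAI) in ungauged rivers. The performance of a machine learning model can vary greatly with the quantity and quality of the training dataset, and overfitting or underfitting can occur when the distribution of the training dataset is imbalanced. To address this, we used a Generative Adversarial Network (GAN) algorithm, based on datasets from the actual monitoring network, to obtain the additional data (streamflow, water quality, weather, and aquatic ecosystem grades) needed to train the machine learning model. Model performance was evaluated with 5-fold cross-validation, and the accuracy of the GAN dataset was evaluated by comparison with the normal distribution of the actual monitoring network dataset. Finally, datasets for ungauged rivers predicted by the Soil and Water Assessment Tool (SWAT) model were used as validation data for the machine learning model to evaluate the accuracy of the predicted aquatic ecosystem health grades. The GAN-augmented machine learning model in this study could be used to select tributary streams of concern that require water quality and aquatic ecosystem management, and to evaluate the improvement in aquatic ecosystem health achieved by structural and non-structural best management practices. In addition, the predicted aquatic ecosystem health grades for ungauged rivers are expected to serve as baseline data for establishing integrated water management policies that organically link water quantity, water quality, and aquatic ecology.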

Visual Expression Effect by Digitization of Embroidery Design (자수 디자인의 디지털화에 의한 시각적 표현효과)

  • Kyung Ja Paek
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.407-413 / 2023
  • The purpose of this study is to provide basic information about various methods for easily applying unique embroidery effects to clothes, given the current expansion of digital fashion technology. Design techniques on virtual and real clothing were compared to show the visual expression of embroidery designs. Actual embroidery motifs were created using a computer embroidery machine, DTP embroidery motifs were made using digitalization techniques, and digital motifs were produced. Patch-pocket T-shirts were then produced using each embroidery technique to compare the visual expression effects on clothing. The results of this comparison are as follows. For real clothing, the scores were color (3.5), texture (4.0), gloss (3.8), and thickness (3.5); the color and thickness of the embroidery floss were found to convey the texture and gloss of the design sufficiently. For the embroidery design on virtual garments, the results of color (3.8), texture (4.3), gloss (3.9), and thickness (3.6) showed a high degree of similarity to the non-virtual results, confirming that digitized embroidery motifs are also a tool that can fully realize unique embroidery effects.

Implementation of reliable dynamic honeypot file creation system for ransomware attack detection (랜섬웨어 공격탐지를 위한 신뢰성 있는 동적 허니팟 파일 생성 시스템 구현)

  • Kyoung Wan Kug;Yeon Seung Ryu;Sam Beom Shin
    • Convergence Security Journal / v.23 no.2 / pp.27-36 / 2023
  • In recent years, ransomware attacks have become more organized and specialized, with sophisticated attacks targeting specific individuals or organizations through tactics such as social engineering, spear phishing, and even machine learning, and some operating as business models. To respond effectively, various studies and solutions are being developed and operated to detect and prevent attacks before they cause serious damage. In particular, honeypots can be used to minimize the risk of attack on IT systems and networks and can act as an early warning and advanced security monitoring tool. However, they offer only a limited ransomware response when the ransomware does not access the decoy file first or bypasses it completely. In this paper, the honeypot is optimized for the user environment to create reliable real-time dynamic honeypot files, minimizing the possibility that an attacker bypasses the honeypot and increasing the detection rate by preventing the attacker from recognizing the file as a honeypot. To this end, four models were designed, including a basic data collection model for dynamic honeypot generation (basic data collection model / user-defined model / sample statistical model / experience accumulation model), and their validity was verified.

Analysis and Orange Utilization of Training Data and Basic Artificial Neural Network Development Results of Non-majors (비전공자 학부생의 훈련데이터와 기초 인공신경망 개발 결과 분석 및 Orange 활용)

  • Kyeong Hur
    • Journal of Practical Engineering Education / v.15 no.2 / pp.381-388 / 2023
  • Through artificial neural network education using spreadsheets, non-major undergraduate students can understand the operating principles of artificial neural networks and develop their own artificial neural network software. Training in these principles starts with generating training data and assigning correct answer labels. Students then learn how the output value is calculated from the firing and activation function of the artificial neuron and from the parameters of the input, hidden, and output layers. Finally, they learn how to calculate the error between the correct label of each initially defined training sample and the output value calculated by the artificial neural network, and how to calculate the parameters of the input, hidden, and output layers that minimize the total sum of squared errors. This spreadsheet-based training was conducted for non-major undergraduate students, and image training data and basic artificial neural network development results were collected. In this paper, we analyzed the collected results for two types of training data with small 12-pixel images and the corresponding artificial neural network software, and presented methods and execution results for using the collected training data with the Orange machine learning model training and analysis tools.
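
The learning steps summarized in this abstract (label the training data, compute the neuron output through an activation function, measure the error against the label, and adjust parameters to reduce the total sum of squared errors) can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the paper's spreadsheet implementation: it assumes a single sigmoid neuron with no hidden layer, plain gradient descent, and four hypothetical 12-pixel (4 rows by 3 columns) images invented for the example.

```python
import math


def sigmoid(z):
    """Activation function of the artificial neuron."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, pixels):
    """Neuron output: weighted sum of the pixels passed through the sigmoid."""
    return sigmoid(sum(w * p for w, p in zip(weights, pixels)) + bias)


def sum_squared_error(weights, bias, samples):
    """Total squared error between the labels and the neuron outputs."""
    return sum((label - predict(weights, bias, pixels)) ** 2
               for pixels, label in samples)


def train(samples, epochs=5000, lr=0.2):
    """Batch gradient descent on the sum of squared errors (assumed settings)."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for pixels, label in samples:
            out = predict(weights, bias, pixels)
            # Derivative of (label - out)^2 with respect to the pre-activation.
            delta = -2.0 * (label - out) * out * (1.0 - out)
            for i, p in enumerate(pixels):
                grad_w[i] += delta * p
            grad_b += delta
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical training data: vertical bars are labeled 1, horizontal bars 0.
SAMPLES = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0], 1.0),
    ([0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0], 0.0),
    ([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0], 1.0),
    ([0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0], 0.0),
]

INITIAL_SSE = sum_squared_error([0.0] * 12, 0.0, SAMPLES)
WEIGHTS, BIAS = train(SAMPLES)
FINAL_SSE = sum_squared_error(WEIGHTS, BIAS, SAMPLES)
```

With all parameters at zero, every output is 0.5, so the initial sum of squared errors is 1.0; training should drive it well below that as the outputs move toward their labels.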