• Title/Abstract/Keywords: unstructured data

717 search results (processing time: 0.025 seconds)

Structuring of Pulmonary Function Test Paper Using Deep Learning

  • Jo, Sang-Hyun;Kim, Dae-Hoon;Kim, Yoon;Kwon, Sung-Ok;Kim, Woo-Jin;Lee, Sang-Ah
    • Journal of the Korea Society of Computer and Information / Vol. 26, No. 12 / pp.61-67 / 2021
  • In this paper, we propose a method for extracting and recognizing research-relevant information from images of unstructured pulmonary function test papers using character detection and recognition techniques. We also develop a post-processing method to reduce the character recognition error rate. The proposed structuring method applies a character detection model to the pulmonary function test paper images to detect all characters in the test paper, then passes each detected character image through a character recognition model to obtain a string. The obtained string is checked for validity using string matching, which completes the structuring. We confirm that the proposed structuring system is more efficient and stable than manual structuring by professionals: its error rate is within about 1%, and it processes each pulmonary function test paper within 2 seconds.
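
The validity check by string matching described in this abstract can be illustrated with a minimal sketch. The abstract does not say which matching algorithm the authors use, so this example swaps in Python's standard `difflib` similarity matching; the field vocabulary `EXPECTED_FIELDS`, the function name `validate_token`, and the cutoff value are all assumptions made for illustration.

```python
import difflib

# Hypothetical vocabulary of field labels expected on a test sheet
# (an assumption for illustration, not taken from the paper).
EXPECTED_FIELDS = ["FVC", "FEV1", "FEV1/FVC", "PEF", "FEF25-75%"]


def validate_token(token, vocabulary=EXPECTED_FIELDS, cutoff=0.6):
    """Return the vocabulary entry closest to an OCR token, or None.

    A token is accepted only if its similarity to some known field
    label is at least `cutoff`; otherwise it is flagged as invalid.
    """
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A recognizer that confuses "1" with "I" still maps to the right field.
corrected = validate_token("FEVI")
# A token unlike every known label is rejected.
rejected = validate_token("xyz")
```

Raising `cutoff` makes the check stricter, trading more rejected tokens for fewer wrong corrections.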

A Case Study on the 'Theory of Home Economics Education' Using Online Problem-Based Learning (온라인 문제중심학습을 활용한 '가정교육론' 수업 사례 연구)

  • Choi, Seong-Youn
    • Human Ecology Research / Vol. 60, No. 2 / pp.187-209 / 2022
  • The objective of this study was to conduct a 'Theory of Home Economics Education' class using online problem-based learning (PBL) for prospective home economics (HE) teachers. The aim was to enable teachers to analyze the learning experience in the classroom and to prepare operational strategies for online PBL on this basis. To achieve this, online PBL was applied to 31 students taking 'Theory of Home Economics Education' in the Department of HE at a university in Seoul, and results were collected from the learning process. Reflective journals were gathered, and a survey on the learning experience and its impacts was conducted. Learning activities, learning difficulties, and improvements were then analyzed. The main research results are as follows. First, students accessed Webex, an online video conferencing program, and performed two PBL tasks: 'Making Home Economics Promotion Materials' and 'Presenting Teaching Strategies to Improve Learner's Immersion in Online Classes'. Second, learners established their own identity in HE and learned about HE class plans themselves. They also gained realistic experience as HE teachers and learned communication and collaboration skills. Furthermore, they acquired creative problem-solving and self-directed learning abilities, community consciousness, and an attitude of consideration and respect. Third, students lacked knowledge of the learning content and had difficulty with data research, analysis processes, and solving unstructured problems. They were affected by a lack of time and encountered problems communicating with other team members in an online environment. To improve online class operation, it was considered necessary to reduce the learning burden by securing time and reducing the number of assignments, as well as to ensure active interaction with instructors and to explain PBL.

Fashion attribute-based mixed reality visualization service (패션 속성기반 혼합현실 시각화 서비스)

  • Yoo, Yongmin;Lee, Kyounguk;Kim, Kyungsun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / Korean Institute of Information and Communication Sciences 2022 Spring Conference / pp.2-5 / 2022
  • With the advent of deep learning and the rapid development of ICT (Information and Communication Technology), research using artificial intelligence is being actively conducted in various fields of society such as politics, economy, and culture. Deep learning-based artificial intelligence technology is subdivided into various domains such as natural language processing, image processing, speech processing, and recommendation systems. In particular, as the industry advances, there is a growing need for recommendation systems that analyze market trends and individual characteristics and make recommendations to consumers. In line with these technological developments, this paper extracts and classifies attribute information from structured or unstructured text and image big data by developing deep learning-based 'language processing intelligence' and 'image processing intelligence' technologies. We propose an artificial intelligence-based 'customized fashion advisor' service integration system that analyzes trends and new materials, discovers 'market-consumer' insights through consumer taste analysis, and can recommend styles and support virtual fitting and design.

Analysis of online parenting community posts on expanded newborn screening for metabolic disorders using topic modeling: a quantitative content analysis (토픽 모델링을 활용한 광범위 선천성 대사이상 신생아 선별검사 관련 온라인 육아 커뮤니티 게시 글 분석: 계량적 내용분석 연구)

  • Myeong Seon Lee;Hyun-Sook Chung;Jin Sun Kim
    • Women's Health Nursing / Vol. 29, No. 1 / pp.20-31 / 2023
  • Purpose: As more newborns have received expanded newborn screening (NBS) for metabolic disorders, the overall number of false-positive results has increased. The purpose of this study was to explore the psychological impacts experienced by mothers related to the NBS process. Methods: An online parenting community in Korea was selected, and questions regarding NBS were collected using web crawling for the period from October 2018 to August 2021. In total, 634 posts were analyzed. The collected unstructured text data were preprocessed, and keyword analysis, topic modeling, and visualization were performed. Results: Of 1,057 words extracted from posts, the top keyword based on 'term frequency-inverse document frequency' (TF-IDF) values was "hypothyroidism," followed by "discharge," "close examination," "thyroid-stimulating hormone levels," and "jaundice." The top keyword based on the simple frequency of appearance was "XXX hospital," followed by "close examination," "discharge," "breastfeeding," "hypothyroidism," and "professor." As a result of latent Dirichlet allocation (LDA) topic modeling, posts related to inborn errors of metabolism (IEMs) were classified into four main themes: "confirmatory tests of IEMs," "mother and newborn with thyroid function problems," "retests of IEMs," and "feeding related to IEMs." Mothers experienced substantial frustration, stress, and anxiety when they received positive NBS results. Conclusion: The online parenting community played an important role in acquiring and sharing information, as well as in providing psychological support related to NBS for mothers of newborns. Nurses can use this study's findings to develop timely and evidence-based information for parents whose children receive positive NBS results to reduce the negative psychological impact.
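
The two keyword rankings reported in this abstract differ because TF-IDF discounts words that appear in nearly every post. The sketch below shows that effect on toy data; the posts are invented for illustration, and the weighting (raw term count times `log(N / df)`, summed over documents) is one common TF-IDF variant chosen as an assumption, since the abstract does not state which variant the study used.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Total count of each term across all tokenized documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def tfidf_scores(docs):
    """Sum over documents of tf * idf, with idf = log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            scores[term] += tf * math.log(n_docs / doc_freq[term])
    return scores


# Toy tokenized posts (invented for illustration, not the study's data).
posts = [
    ["hospital", "hospital", "retest", "retest"],
    ["hospital", "hospital", "jaundice"],
    ["hospital", "retest", "retest", "discharge"],
]

top_by_frequency = term_frequencies(posts).most_common(1)[0][0]
top_by_tfidf = tfidf_scores(posts).most_common(1)[0][0]
```

Here "hospital" is the most frequent term, but because it occurs in every post its inverse document frequency is zero, so a term concentrated in fewer posts ranks first by TF-IDF.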

Chinese-clinical-record Named Entity Recognition using IDCNN-BiLSTM-Highway Network

  • Tinglong Tang;Yunqiao Guo;Qixin Li;Mate Zhou;Wei Huang;Yirong Wu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 7 / pp.1759-1772 / 2023
  • Chinese named entity recognition (NER) is a challenging task that seeks to find, recognize, and classify various types of information elements in unstructured text. Because Chinese text has no natural word boundaries like the spaces in English text, Chinese named entity identification is much more difficult. At present, most deep learning-based NER models are built on a bidirectional long short-term memory network (BiLSTM), yet their performance still has room for improvement. To further improve performance on Chinese NER tasks, we propose a new NER model, IDCNN-BiLSTM-Highway, which combines the BiLSTM, the iterated dilated convolutional neural network (IDCNN), and the highway network. In our model, the IDCNN is used to achieve multiscale context aggregation from a long sequence of words. The highway network is used to effectively connect different layers of the network, allowing information to pass through network layers smoothly without attenuation. Finally, the globally optimal tag sequence is obtained by introducing a conditional random field (CRF). The experimental results show that, compared with other popular deep learning-based NER models, our model achieves superior performance on two Chinese NER data sets, Resume and Yidu-S4k, with F1-scores of 94.98 and 77.59, respectively.
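
The claim that a highway network lets information pass "without attenuation" follows from its gating formula, y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid transform gate. The sketch below applies that standard formula element-wise to plain Python lists. It is not the paper's implementation: the function names are hypothetical, and H(x) and the gate logits are passed in directly rather than computed by learned layers.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def highway_layer(x, transform, gate_logits):
    """Element-wise highway combination y = T * H + (1 - T) * x.

    x           : input vector (list of floats)
    transform   : H(x), the candidate transformed vector (same length)
    gate_logits : pre-sigmoid values of the transform gate T(x)
    """
    out = []
    for x_i, h_i, g_i in zip(x, transform, gate_logits):
        t_i = sigmoid(g_i)  # how much of the transformed value to use
        out.append(t_i * h_i + (1.0 - t_i) * x_i)
    return out


x = [1.0, 2.0]
h = [3.0, 4.0]
mixed = highway_layer(x, h, [0.0, 0.0])        # gate at 0.5: equal blend
carried = highway_layer(x, h, [-50.0, -50.0])  # gate near 0: input carried through
```

When the gate saturates toward zero the layer reduces to the identity, which is why stacking many such layers does not degrade the signal.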

Development of a Standardized Clinical Practice Education Program in Occupational Therapy Student (작업치료 대학생의 임상실습 교육 프로그램 개발)

  • Lee, Min-Jae;Lee, Sun-Min
    • Journal of The Korean Society of Integrative Medicine / Vol. 10, No. 1 / pp.27-38 / 2022
  • Purpose: This study aimed to develop and validate a clinical practice education program and a clinical competence scale for occupational therapy students. Methods: The clinical practice education program was developed using the Delphi technique in a total of five stages. Based on the occupational therapist's job analysis, 21 experts rated importance in the first stage, and 19 new specialists rated importance in the second stage to derive constitutive factors. In the third stage, in-depth interviews were conducted with three experts based on the derived factors, and in the fourth stage, the final clinical practice education program was derived. In the final stage, the details of the clinical training program were drawn up based on the themes and were reviewed by two experts. Structured and unstructured interviews were conducted with 43 job experts. Results: The expert survey through the Delphi technique was conducted three times, and content analysis and descriptive statistics were used to examine the distribution of responses. The final 11 educational program topics and contents were derived. Topics are confirmation of client information, evaluation and intervention, cognitive therapy, spinal cord injury, brain injury, musculoskeletal disorders, pediatric occupational therapy, interventions in activities of daily living, driving rehabilitation, vocational rehabilitation, occupational therapy assessment tool, safety training and management. Conclusion: The clinical practice education program reduces the difference between school education and clinical education for occupational therapy students. The program helps college students understand occupational therapy practice and improves the quality of clinical education. 
It is suggested that, through further research on and supplementation of clinical practice education programs, clinical practice education be successfully operated in various practice institutions and used as basic data for designing and evaluating useful educational models.

Automated Prioritization of Construction Project Requirements using Machine Learning and Fuzzy Logic System

  • Hassan, Fahad ul;Le, Tuyen;Le, Chau;Shrestha, K. Joseph
    • International conference on construction engineering and project management / The 9th International Conference on Construction Engineering and Project Management / pp.304-311 / 2022
  • Construction inspection is a crucial stage that ensures that all contractual requirements of a construction project are verified. The construction inspection capabilities of state highway agencies have been greatly affected by budget reductions. As a result, efficient inspection practices such as risk-based inspection are required to optimize the use of limited resources without compromising inspection quality. Automated prioritization of textual requirements according to their criticality would be extremely helpful, since contractual requirements are typically presented in unstructured natural language in voluminous text documents. The current study introduces a novel model for predicting the risk level of requirements using machine learning (ML) algorithms. The ML algorithms tested in this study included naïve Bayes, support vector machines, logistic regression, and random forest. The training data includes sequences of requirement texts that were labeled with risk levels (such as very low, low, medium, high, very high) using a fuzzy logic system. The fuzzy model treats the three risk factors (severity, probability, detectability) as fuzzy input variables and applies fuzzy inference rules to determine the labels of requirements. The performance of the model was examined on a labeled dataset created by the fuzzy inference rules and three different membership functions. The developed requirement risk prediction model yielded a precision, recall, and F-score of 78.18%, 77.75%, and 75.82%, respectively. The proposed model is expected to provide construction inspectors with a means for the automated prioritization of voluminous requirements by their importance, thus helping to maximize the effectiveness of inspection activities under resource constraints.
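
To make the fuzzy labeling step concrete, the sketch below fuzzifies three risk-factor scores with triangular membership functions and picks a label with a simple min/max rule. Everything specific here is an assumption for illustration: the 0-10 scale (higher meaning riskier), the three fuzzy sets in `LEVELS`, and the toy rule base. The paper's actual inference rules, its five risk levels, and its three membership functions are not given in the abstract.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on a 0-10 scale (not the paper's definitions).
LEVELS = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzify(score):
    """Degree of membership of a crisp score in each fuzzy set."""
    return {name: triangular(score, *abc) for name, abc in LEVELS.items()}


def risk_label(severity, probability, detectability):
    """Toy rule base: each label fires as strongly as its weakest
    input (min as fuzzy AND); the strongest label wins (max)."""
    s, p, d = fuzzify(severity), fuzzify(probability), fuzzify(detectability)
    strengths = {name: min(s[name], p[name], d[name]) for name in LEVELS}
    return max(strengths, key=strengths.get)


label_high = risk_label(9.0, 9.0, 9.0)
label_low = risk_label(1.0, 1.0, 1.0)
```

Labels produced this way could then serve as training targets for a text classifier, which is the role the fuzzy system plays in the study.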

Analysis of Trends in Patients with Work-related Musculoskeletal Disorders and Literature Review of Risk Factors and Prevalence (작업관련 근골격계질환의 요양재해 추이 분석 및 위험요인과 유병률에 관한 고찰)

  • Nam-Soo Kim;Yong-Bae Kim
    • Journal of Korean Society of Occupational and Environmental Hygiene / Vol. 33, No. 3 / pp.298-307 / 2023
  • Objectives: The purpose of this study is to analyze recent trends in patients with work-related musculoskeletal disorders in South Korea and to summarize the major findings of the literature on the risk factors and prevalence of work-related musculoskeletal diseases. Methods: Industrial disaster data from the Ministry of Employment and Labor from 2012 to 2021 were used, and the literature on risk factors for work-related musculoskeletal diseases was reviewed using PubMed and RISS. Results: The number of patients with work-related musculoskeletal disorders declined until 2016 and has increased overall since 2017, with a particularly notable increase in the average annual number of patients with physical burden work. The average annual rate per ten thousand people for patients with physical burden work, non-accidental lower back pain, and carpal tunnel syndrome among work-related diseases was high in the mining industry. The average annual rate per ten thousand people for patients with accidental lower back pain was the highest in the fishing industry. Within the manufacturing field, it was the highest in the shipbuilding and ship repair industry. In the literature review, papers on work-related musculoskeletal diseases in unstructured work accounted for a high share of the search results. In addition, physical stress factors were prominent among risk factors, and the waist showed a high rate among pain areas. Conclusion: Even after the institutional implementation of a hazard investigation system related to musculoskeletal diseases, the number of patients with occupational musculoskeletal disorders continues to increase. Therefore, it is necessary to conduct regular surveys and implement effective improvement activities for vulnerable industries or occupations.

Analyzing Issues on Environment-Friendly Agriculture Using Topic Modeling and Network Analysis (토픽모델링과 네트워크분석을 활용한 친환경농업 이슈분석에 관한 연구)

  • Shin, Ye-Eun;Shin, Eun-Seo;Kim, Sang-Bum;Choi, Jin-Ah;Kim, Myunghyun;Han, Seokjun;An, Kyungjin
    • Journal of Korean Society of Rural Planning / Vol. 29, No. 4 / pp.35-53 / 2023
  • This study attempts to identify the flow of key topics and issues in research on environment-friendly agriculture conducted around the 2000s in South Korea and to compare them with the environment-friendly agriculture promotion plan, in order to assess their level of consistency and the direction of future development of environment-friendly agriculture. For the analysis of environment-friendly agriculture research trends and policy consistency, 'topic modeling', which is suitable for subject classification of large amounts of unstructured data, and 'text network analysis', which visualizes the relationship between keywords as a network and interprets its characteristics, were utilized. Overall, active discussions were held on 'technical discussions for the production and cultivation of environment-friendly agricultural products' and 'food safety & consumer awareness', and keywords such as production, cultivation, consumption, and safety were consistently linked to other keywords regardless of time. In addition, it was found that the issues of environment-friendly agriculture were partially consistent with the policy direction of each period. Considering that the ongoing '5th Environment-Friendly Agriculture Promotion Phase' emphasizes the strengthening of rural environment management and aims to ensure the continuous quantitative and qualitative development of environment-friendly agriculture, active discussions and research on its environmental contributions and management methods are needed.
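
The idea of keywords being "linked to other keywords" in a text network can be sketched with a document-level co-occurrence graph. This is only one common way to build such a network and is an assumption here; the abstract does not describe the study's construction method, and the toy keyword lists and function names below are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs):
    """Count how many documents each unordered keyword pair shares."""
    edges = Counter()
    for keywords in docs:
        # sorted(set(...)) gives each pair a single canonical ordering
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct neighbours of each keyword in the network."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {keyword: len(linked) for keyword, linked in neighbours.items()}


# Toy keyword lists per document (invented for illustration).
docs = [
    ["production", "cultivation", "safety"],
    ["production", "consumption"],
    ["production", "safety"],
]

edges = cooccurrence_edges(docs)
degrees = degree(edges)
```

A keyword with a high degree, such as "production" in this toy data, is the kind of node that stays connected to many other keywords across the corpus.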

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / Vol. 24, No. 3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. These text data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization. This problem has drawn the interest of many researchers seeking to manage this huge amount of information. It also calls for professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis in which a text document must be assigned to one or more predefined categories or classes. In the text classification field, various techniques are available, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary according to the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one. This kind of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them. 
In this study, we consider that data from different domains, that is, heterogeneous data, might have noise-like characteristics that can be utilized in the classification process. To build a classifier, a machine learning algorithm is applied on the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and the target data differ, the features of the two may also differ. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and to combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. 
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
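
The general idea of admitting only confidently labeled unlabeled documents into training can be sketched with a simple ensemble vote. This is not RSESLA itself, whose rule selection procedure the abstract does not detail; it is a generic agreement filter, and the function name, the `min_agreement` parameter, and the toy predictions are all assumptions for illustration.

```python
from collections import Counter


def select_confident(unlabeled_ids, predictions, min_agreement=1.0):
    """Keep unlabeled documents whose ensemble vote is confident enough.

    predictions: dict mapping classifier name -> {doc_id: predicted label}
    Returns {doc_id: pseudo_label} for documents where the share of
    classifiers agreeing on the majority label is >= min_agreement.
    """
    selected = {}
    for doc_id in unlabeled_ids:
        votes = Counter(preds[doc_id] for preds in predictions.values())
        label, count = votes.most_common(1)[0]
        if count / len(predictions) >= min_agreement:
            selected[doc_id] = label
    return selected


# Toy predictions from three hypothetical classifiers on two documents.
predictions = {
    "naive_bayes": {"d1": "sports", "d2": "politics"},
    "svm": {"d1": "sports", "d2": "sports"},
    "tree": {"d1": "sports", "d2": "tech"},
}

pseudo_labeled = select_confident(["d1", "d2"], predictions)
```

Only documents that pass the filter would be added to the training set with their pseudo-labels, which limits the risk of unlabeled data degrading the classifier.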