• Title/Summary/Keyword: Natural Language Process

Search Result 242, Processing Time 0.026 seconds

Worker Symptom-based Chemical Substance Estimation System Design Using Knowledge Base (지식베이스를 이용한 작업자 증상 기반 화학물질 추정 시스템 설계)

  • Ju, Yongtaek;Lee, Donghoon;Shin, Eunji;Yoo, Sangwoo;Shin, Dongil
    • Journal of the Korean Institute of Gas
    • /
    • v.25 no.3
    • /
    • pp.9-15
    • /
    • 2021
  • In this paper, a study on the construction of a knowledge base based on natural language processing and the design of a chemical substance estimation system for the development of a knowledge service for a real-time sensor information fusion detection system and symptoms of contact with chemical substances in industrial sites. The information on 499 chemical substances contact symptoms from the Wireless Information System for Emergency Responders(WISER) program provided by the National Institutes of Health(NIH) in the United States was used as a reference. AllegroGraph 7.0.1 was used, input triples are Cas No., Synonyms, Symptom, SMILES, InChl, and Formula. As a result of establishing the knowledge base, it was confirmed that 39 symptoms based on ammonia (CAS No: 7664-41-7) were the same as those of the WISER program. Through this, a method of establishing was proposed knowledge base for the symptom extraction process of the chemical substance estimation system.

Development of Supervised Machine Learning based Catalog Entry Classification and Recommendation System (지도학습 머신러닝 기반 카테고리 목록 분류 및 추천 시스템 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services
    • /
    • v.20 no.1
    • /
    • pp.57-65
    • /
    • 2019
  • In the case of Domeggook B2B online shopping malls, it has a market share of over 70% with more than 2 million members and 800,000 items are sold per one day. However, since the same or similar items are stored and registered in different catalog entries, it is difficult for the buyer to search for items, and problems are also encountered in managing B2B large shopping malls. Therefore, in this study, we developed a catalog entry auto classification and recommendation system for products by using semi-supervised machine learning method based on previous huge shopping mall purchase information. Specifically, when the seller enters the item registration information in the form of natural language, KoNLPy morphological analysis process is performed, and the Naïve Bayes classification method is applied to implement a system that automatically recommends the most suitable catalog information for the article. As a result, it was possible to improve both the search speed and total sales of shopping mall by building accuracy in catalog entry efficiently.

Creating Songs Using Note Embedding and Bar Embedding and Quantitatively Evaluating Methods (음표 임베딩과 마디 임베딩을 이용한 곡의 생성 및 정량적 평가 방법)

  • Lee, Young-Bae;Jung, Sung Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.11
    • /
    • pp.483-490
    • /
    • 2021
  • In order to learn an existing song and create a new song using an artificial neural network, it is necessary to convert the song into numerical data that the neural network can recognize as a preprocessing process, and one-hot encoding has been used until now. In this paper, we proposed a note embedding method using notes as a basic unit and a bar embedding method that uses the bar as the basic unit, and compared the performance with the existing one-hot encoding. The performance comparison was conducted based on quantitative evaluation to determine which method produced a song more similar to the song composed by the composer, and quantitative evaluation methods used in the field of natural language processing were used as the evaluation method. As a result of the evaluation, the song created with bar embedding was the best, followed by note embedding. This is significant in that the note embedding and bar embedding proposed in this paper create a song that is more similar to the song composed by the composer than the existing one-hot encoding.

System for Supporting the Decision about the Possibility of Concluding the Civil Law Agreements for Medical, Therapeutic and Dental Services

  • Hnatchuk, Yelyzaveta;Hovorushchenko, Tetiana;Shteinbrekher, Daria;Kysil, Tetiana
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.10
    • /
    • pp.155-164
    • /
    • 2022
  • The review of known decisions showed that currently there are no systems and technologies for supporting the decision about the possibility of concluding the civil law agreements for medical, therapeutic and dental services. The paper models the decision-making support process on the possibility of concluding the civil law agreements for medical, therapeutic and dental services, which is the theoretical basis for the development of rules, methods and system for supporting the decision about the possibility of concluding the civil law agreements for medical, therapeutic and dental services. The paper also developed the system for supporting the decision about the possibility of concluding the civil law agreements for medical, therapeutic and dental services, which automatically and free determines the possibility or impossibility of concluding the corresponding civil law agreement for the provision of a corresponding medical service. In the case of formation of a conclusion about the possibility of concluding the agreement, further conclusion and signing of the corresponding agreement takes place. In the case of forming a conclusion about the impossibility of concluding the agreement, a request is made for finalizing the relevant agreement for the provision of the relevant medical service, indicating the reasons for the impossibility of concluding the agreement - missing essential conditions in the agreement. After finalization, the agreement can be analyzed again by the developed system for supporting the decision.

Technology of Decision-Making Support Regarding the Possibility of Donation and Transplantation Considering Civil Law

  • Hnatchuk, Yelyzaveta;Hovorushchenko, Tetiana;Drapak, Georgii;Kysil, Tetiana
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.9
    • /
    • pp.307-315
    • /
    • 2022
  • The review of known decision-making support systems and technologies regarding the possibility of donation and transplantation showed that currently there are no systems and technologies of decision-making support regarding the possibility of donation and transplantation considering civil law. The paper models the decision-making support process regarding the possibility of donation and transplantation, which is a theoretical basis for the development of rules, methods and technology of decision-making support regarding the possibility of donation and transplantation considering civil law. The paper also developed the technology of decision-making support regarding the possibility of donation and transplantation considering civil law as a component of the Unified State Information System for Organ and Tissue Transplantation, which automatically and free of charge determines the possibility/impossibility of donation and transplantation. In the case of the possibility of donation, the admissible type of donation is also determined - over-life or after-life donation - and data about potential donor is entered in the relevant Donor Register. In the case of the possibility of transplantation, if the recipient needs a transplant of one of the paired organs or a part of the organ/tissue, then data about potential recipient are entered in the Transplantation List from both over-life and after-life donor, otherwise, if the recipient needs a transplant of a non-paired organ or both paired organs, then data about potential recipient are entered only in the Transplantation List from after-life donor.

Identification of User Preference Factor Using Review Information (리뷰 정보를 활용한 이용자의 선호요인 식별에 관한 연구)

  • Song, Sungjeon;Shim, Jiyoung
    • Journal of the Korean Society for information Management
    • /
    • v.39 no.3
    • /
    • pp.311-336
    • /
    • 2022
  • This study analyzed the contents of Goodreads review data, which is a social cataloging service with the participation of book users around the world, to identify the preference factors that affect book users' book recommendations in the library information service environment. To understand user preferences from a more detailed point of view, sub-datasets for each rating group, each book, and each user were constructed in the sample selection process. Stratified sampling was also performed based on the result of topic modeling of review text data to include various topics. As a result, a total of 90 preference factors belonging to 7 categories('Content', 'Character', 'Writing', 'Reading', 'Author', 'Story', 'Form') were identified. Also, the general preference factors revealed according to the ratings, as well as the patterns of preference factors revealed in books and users with clear likes and dislikes were identified. The results of this study are expected to contribute to more sophisticated recommendations in future recommendation systems by identifying specific aspects of user preference factors.

A Study on the Definition of Data Literacy for Elementary and Secondary Artificial Intelligence Education (초·중등 인공지능 교육을 위한 데이터 리터러시 정의 연구)

  • Kim, SeulKi;Kim, Taeyoung
    • 한국정보교육학회:학술대회논문집
    • /
    • 2021.08a
    • /
    • pp.59-67
    • /
    • 2021
  • The development of AI technology has brought about a big change in our lives. As AI's influence grows from life to society to the economy, the importance of education on AI and data is also growing. In particular, the OECD Education Research Report and various domestic information and curriculum studies address data literacy and present it as an essential competency. Looking at domestic and international studies, one can see that the definition of data literacy differs in its specific content and scope from researchers to researchers. Thus, the definition of major research related to data literacy was analyzed from various angles and derived from various angles. In key studies, Word2vec natural language processing methods, along with word frequency analysis used to define data literacy, are used to analyze semantic similarities and nominate them based on content elements of curriculum research to derive the definition of 'understanding and using data to process information'. Based on the definition of data literacy derived from this study, we hope that the contents will be revised and supplemented, and more research will be conducted to provide a good foundation for educational research that develops students' future capabilities.

  • PDF

A Study on Book Recovery Method Depending on Book Damage Levels Using Book Scan (북스캔을 이용한 도서 손상 단계에 따른 딥 러닝 기반 도서 복구 방법에 관한 연구)

  • Kyungho Seok;Johui Lee;Byeongchan Park;Seok-Yoon Kim;Youngmo Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.154-160
    • /
    • 2023
  • Recently, with the activation of eBook services, books are being published simultaneously as physical books and digitized eBooks. Paper books are more expensive than e-books due to printing and distribution costs, so demand for relatively inexpensive e-books is increasing. There are cases where previously published physical books cannot be digitized due to the circumstances of the publisher or author, so there is a movement among individual users to digitize books that have been published for a long time. However, existing research has only studied the advancement of the pre-processing process that can improve text recognition before applying OCR technology, and there are limitations to digitization depending on the condition of the book. Therefore, support for book digitization services depending on the condition of the physical book is needed. need. In this paper, we propose a method to support digitalization services according to the status of physical books held by book owners. Create images by scanning books and extract text information from the images through OCR. We propose a method to recover text that cannot be extracted depending on the state of the book using BERT, a natural language processing deep learning model. As a result, it was confirmed that the recovery method using BERT is superior when compared to RNN, which is widely used in recommendation technology.

  • PDF

Improving the Classification of Population and Housing Census with AI: An Industry and Job Code Study

  • Byung-Il Yun;Dahye Kim;Young-Jin Kim;Medard Edmund Mswahili;Young-Seob Jeong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.4
    • /
    • pp.21-29
    • /
    • 2023
  • In this paper, we propose an AI-based system for automatically classifying industry and occupation codes in the population census. The accurate classification of industry and occupation codes is crucial for informing policy decisions, allocating resources, and conducting research. However, this task has traditionally been performed by human coders, which is time-consuming, resource-intensive, and prone to errors. Our system represents a significant improvement over the existing rule-based system used by the statistics agency, which relies on user-entered data for code classification. In this paper, we trained and evaluated several models, and developed an ensemble model that achieved an 86.76% match accuracy in industry and 81.84% in occupation, outperforming the best individual model. Additionally, we propose process improvement work based on the classification probability results of the model. Our proposed method utilizes an ensemble model that combines transfer learning techniques with pre-trained models. In this paper, we demonstrate the potential for AI-based systems to improve the accuracy and efficiency of population census data classification. By automating this process with AI, we can achieve more accurate and consistent results while reducing the workload on agency staff.

A Study of Segmental and Syllabic Intervals of Canonical Babbling and Early Speech

  • Chen, Xiaoxiang;Xiao, Yunnan
    • Cross-Cultural Studies
    • /
    • v.28
    • /
    • pp.115-139
    • /
    • 2012
  • Interval or duration of segments, syllables, words and phrases is an important acoustic feature which influences the naturalness of speech. A number of cross-sectional studies regarding acoustic characteristics of children's speech development found that intervals of segments, syllables, words and phrases tend to change with the growing age. One hypothesis assumed that decreases in intervals would be greater when children were younger and smaller decreases in intervals when older (Thelen,1991), it has been supported by quite a number of researches on the basis of cross-sectional studies (Tingley & Allen,1975; Kent & Forner,1980; Chermak & Schneiderman, 1986), but the other hypothesis predicted that decreases in intervals would be smaller when children were younger and greater decreases in intervals when older (Smith, Kenney & Hussain, 1996). Researchers seem to come up with conflicting postulations and inconsistent results about the change trends concerning intervals of segments, syllables, words and phrases, leaving it as an issue unresolved. Most acoustic investigations of children's speech production have been conducted via cross-sectional designs, which involves studying several groups of children. So far, there are only a few longitudinal studies. This issue needs more longitudinal investigations; moreover, the acoustic measures of the intervals of child speech are hardly available. All former studies focus on word stages excluding the babbling stages especially the canonical babbling stage, but we need to find out when concrete changes of intervals begin to occur and what causes the changes. Therefore, we conducted an acoustic study of interval characteristics of segments and words concerning Canonical Babble ( CB) and early speech in an infant aged from 0;9 to 2;4 acquiring Mandarin Chinese. The current research addresses the following two questions: 1. Whether decreases in interval would be greater when children were younger and smaller when they were older or vice versa? 2. Whether the child speech concerning the acoustic features of interval drifts in the direction of the language they are exposed to? The female infant whose L1 was Southern Mandarin living in Changsha was audio- and video-taped at her home for about one hour almost on a weekly basis during her age range from 0;9 to 2;4 under natural observation by us investigators. The recordings were digitized. Parts of the digitized material were labeled. All the repetitions were excluded. The utterances were extracted from 44 sessions ranging from 30 minutes to one hour. The utterances were divided into segments as well as syllable-sized units. Age stages are 0;9-1;0,1;1-1;5, 1;6-2;0, 2;1-2;4. The subject was a monolingual normal child from parents with a good education. The infant was audio-and video-taped in her home almost every week. The data were digitized, segments and syllables from 44 sessions spanning the transition from babble to speech were transcribed in narrow IPA and coded for analysis. Babble was coded from age 0;9-1;0, and words were coded from 1;0 to 2;4, the data has been checked by two professionally trained persons who majored in phonetics. The present investigation is a longitudinal analysis of some temporal characteristics of the child speech during the age periods of 0;9-1;0, 1;1-1;5, 1;6-2;0, 2;1-2;4. The answer to Research Question 1 is that our results are in agreement with neither of the hypotheses. One hypothesis assumed that decreases in intervals would be greater when children were younger and smaller decreases in intervals when older (Thelen,1991); but the other hypothesis predicted that decreases in intervals would be smaller when children were younger and greater decreases in intervals when older (Smith, Kenney & Hussain, 1996). On the whole, there is a tendency of decrease in segmental and syllabic duration with the growing age, but the changes are not drastic and abrupt. For example, /a/ after /k/ in Table 1 has greater decrease during 1;1-1;5, while /a/ after /p/, /t/ and /w/ has greater decrease during 2;1-2;4. /ka/ has greater decrease during 1;1-1;5, while /ta/ and /na/ has greater decrease during 2;1-2;4.Across the age periods, interval change experiences lots of fluctuation all the time. The answer to Research Question 2 is yes. Babbling stage is a period in which the children's acoustic features of intervals of segments, syllables, words and phrases is shifted in the direction of the language to be learned, babbling and children's speech emergence is greatly influenced by ambient language. The phonetic changes in terms of duration would go on until as late as 10-12 years of age before reaching adult-like levels. Definitely, with the increase of exposure to ambient language, the variation would be less and less until they attain the adult-like competence. Via the analysis of the SPSS 15.0, the decrease of segmental and syllabic intervals across the four age periods proves to be of no significant difference (p>0.05). It means that the change of segmental and syllabic intervals is continuous. It reveals that the process of child speech development is gradual and cumulative.