• Title/Summary/Keyword: Korean NLP


Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu;Ko, Gun-Woo;Choi, Won-Jun;Jeong, Hee-Seok;Yoon, Hwa-Mook;Choi, Sung-Pil
    • Journal of the Korean Society for Information Management
    • /
    • v.35 no.4
    • /
    • pp.141-164
    • /
    • 2018
  • The volume of academic literature has grown rapidly and research has become increasingly complex, making it difficult for researchers to analyze trends in previous work. Addressing this problem requires classification information at the level of individual academic papers. However, no academic database in Korea provides such information. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, academic documents in the technical science field written in Korean were first collected and mapped to class 600 of the DDC using the K-Means clustering technique, in order to construct a learning set capable of multiple classification. After excluding records with missing metadata, the resulting training set comprised 63,915 Korean documents in the technical science field. Using this training set, we implemented and trained a deep-learning-based automatic classification engine for academic documents. Experiments on a manually built test set showed 78.32% accuracy and 72.45% F1 for multiple classification.
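The abstract reports accuracy and F1 for a multi-class task without stating how F1 was averaged. As a minimal sketch only, the following shows how accuracy and a macro-averaged F1 could be computed; the macro averaging, the function names, and the toy labels (DDC-style codes 610, 620, 630) are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of documents whose predicted class equals the gold class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred.

    Macro averaging is an assumption here; the abstract does not say
    which averaging scheme the authors used.
    """
    classes = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(classes)


# Toy labels (hypothetical), not data from the paper.
gold = ["610", "610", "620", "620", "630", "630"]
pred = ["610", "620", "620", "620", "630", "610"]

print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

Macro averaging weights every class equally, which matters when some DDC subclasses have far fewer documents than others.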

KorLexClas 1.5: A Lexical Semantic Network for Korean Numeral Classifiers (한국어 수분류사 어휘의미망 KorLexClas 1.5)

  • Hwang, Soon-Hee;Kwon, Hyuk-Chul;Yoon, Ae-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.1
    • /
    • pp.60-73
    • /
    • 2010
  • This paper describes KorLexClas 1.5, which provides a very large list of Korean numeral classifiers together with the co-occurring noun categories that select each numeral classifier. Unlike the KorLex components for other parts of speech, whose structure depends largely on their reference model (Princeton WordNet), KorLexClas 1.0 and its extended version 1.5 adopt a direct building method. This method demands considerable time and expert knowledge to establish the hierarchies of numeral classifiers and the relationships between lexical items. For efficiency of construction as well as reliability of KorLexClas 1.5, we use the following processes: (1) using various language resources and cross-checking them to select classifier candidates; (2) extending the list of numeral classifiers by using shallow parsing techniques; (3) setting up the hierarchies of the numeral classifiers based on previous linguistic studies; and (4) determining the LUB (Least Upper Bound) of the numeral classifiers in KorLexNoun 1.5. The last process gives KorLexClas 1.5 an open, extensible list of co-occurring nouns. KorLexClas 1.5 is expected to be used in a variety of NLP applications, including MT.
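To illustrate step (4), the sketch below computes a least upper bound for a set of nouns in a noun hierarchy, that is, the most specific node that dominates all of them. It assumes a single-parent tree and uses a tiny hypothetical hierarchy; the names `PARENT`, `ancestors`, and `least_upper_bound` are invented for illustration and do not reflect the actual KorLexNoun 1.5 data or tooling.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def least_upper_bound(nouns, parent):
    """Most specific node that dominates every noun in the list.

    Assumes each node has at most one parent (a tree). Hierarchies with
    multiple inheritance would need a different treatment.
    """
    common = set(ancestors(nouns[0], parent))
    for noun in nouns[1:]:
        common &= set(ancestors(noun, parent))
    # Walk upward from the first noun; the first shared node is the lowest one.
    for node in ancestors(nouns[0], parent):
        if node in common:
            return node
    return None


# Hypothetical child -> parent links, for illustration only.
PARENT = {
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "tree": "plant",
    "plant": "entity",
}

print(least_upper_bound(["dog", "cat"], PARENT))
print(least_upper_bound(["dog", "tree"], PARENT))
```

Storing only the LUB for a classifier, rather than every attested noun, is what keeps the co-occurring noun list open: any noun later added beneath that node is covered automatically.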

Automatic Recognition and Normalization System of Korean Time Expression using the individual time units (시간의 단위별 처리를 이용한 자동화된 한국어 시간 표현 인식 및 정규화 시스템)

  • Seon, Choong-Nyoung;Kang, Sang-Woo;Seo, Jung-Yun
    • Korean Journal of Cognitive Science
    • /
    • v.21 no.4
    • /
    • pp.447-458
    • /
    • 2010
  • Time expressions are a very important form of information in different types of data. Thus, the recognition of a time expression is an important factor in the field of information extraction. However, most previously designed systems consider only a specific domain, because time expressions do not have a regular form and frequently include different ellipsis phenomena. We present a two-level recognition method consisting of extraction and transformation phases to achieve generality and portability. In the extraction phase, time expressions are extracted by atomic time units for extensibility. Then, in the transformation phase, omitted information is restored using basis time and prior knowledge. Finally, every complete atomic time unit is transformed into a normalized form. The proposed system can be used as a general-purpose system, because it has a language- and domain-independent architecture. In addition, this system performs robustly in noisy data like SMS data, which include various errors. For SMS data, the accuracies of time-expression extraction and time-expression normalization by using the proposed system are 93.8% and 93.2%, respectively. On the basis of these experimental results, we conclude that the proposed system shows high performance in noisy data.
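The two phases described above can be sketched roughly as follows: an extraction step that picks out atomic time units one by one, and a transformation step that restores omitted units from a basis time and emits a normalized form. This is a simplified illustration, not the authors' system. The regular expressions, the rule that omitted time-of-day fields default to zero, and the ISO 8601 output format are all assumptions made here.

```python
import re
from datetime import datetime

# One pattern per atomic time unit (month, day, hour, minute in Korean).
UNIT_PATTERNS = {
    "month": re.compile(r"(\d{1,2})월"),
    "day": re.compile(r"(\d{1,2})일"),
    "hour": re.compile(r"(\d{1,2})시"),
    "minute": re.compile(r"(\d{1,2})분"),
}


def extract_units(text):
    """Extraction phase: collect whichever atomic units appear in the text."""
    units = {}
    for name, pattern in UNIT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            units[name] = int(match.group(1))
    return units


def normalize(units, basis):
    """Transformation phase: restore omitted units, then normalize.

    Assumption: an omitted year, month, or day is taken from the basis
    time, while an omitted hour or minute defaults to 0. AM/PM
    ambiguity is ignored in this sketch.
    """
    moment = datetime(
        year=basis.year,
        month=units.get("month", basis.month),
        day=units.get("day", basis.day),
        hour=units.get("hour", 0),
        minute=units.get("minute", 0),
    )
    return moment.isoformat(timespec="minutes")


basis_time = datetime(2010, 12, 1, 9, 0)  # e.g. when the message was received
print(normalize(extract_units("3시 30분에 만나요"), basis_time))
print(normalize(extract_units("12월 25일 7시"), basis_time))
```

Handling each unit independently means a new unit (for example, seconds or weekdays) can be added as one more pattern without touching the others, which is the extensibility argument the abstract makes.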


Characterization of Poly(methyl methacrylate)-tin (IV) Chloride Blend by TG-DTG-DTA, IR and Pyrolysis-GC-MS Techniques

  • Arshad, Muhammad;Masud, Khalid;Arif, Muhammad;Rehman, Saeed-Ur;Saeed, Aamer;Zaidi, Jamshed Hussain
    • Bulletin of the Korean Chemical Society
    • /
    • v.32 no.9
    • /
    • pp.3295-3305
    • /
    • 2011
  • Thermal behavior of poly(methyl methacrylate) was analyzed in the presence of tin(IV) chloride. Five different polymer-to-additive proportions were selected for casting films from a common solvent. TG, DTG and DTA were employed to monitor thermal degradation of the systems. IR and py-GC-MS helped identify the decomposition products. The blends start degrading at a temperature lower than that of the neat polymer and higher than that of the pure additive. Complex formation between the tin of the additive and the carbonyl oxygen (pendent groups of MMA units) was noticed in the films soon after the components were mixed. The samples were also heated at three different temperatures to determine the composition of residues left after the expulsion of volatiles. The polymer, blends and additive exhibited one-step, two-step and three-step degradation, respectively. $T_0$ is highest for the polymer, lowest for the additive and is either $60^{\circ}C$ or $70^{\circ}C$ for the blends. The amount of residue increases down the series [moving from blend-1 (minimum additive concentration) to blend-5 (maximum additive concentration)]. For blend-1, it is 7% of the original mass whereas it is 16% for blend-5. $T_{max}$ also goes up as the concentration of additive in the blends is elevated. The complexation appears to be the cause of the observed stabilization. Some new degradation products were noted apart from those reported earlier. These included methanol, isobutyric acid, acid chloride, etc. Molecular-level mixing of the constituents and the "positioning effect" of the additive may have brought about the formation of new compounds. Routes are proposed for the appearance of these substances. Horizontal burning tests were also conducted on the polymer and blends, and the results are discussed. Activation energies and reaction orders were calculated. Activation energy is highest for the polymer, at 138.9 kcal/mol, while the range for the blends is from 51 to 39 kcal/mol. Stability zones are highlighted for the blends. The interaction between the blended parts seems to be chemical in nature.

Effect of Dietary Microalgae, Diatom-Dominant, Oil Extracts on Growth, Body Composition and Shell Color of Juvenile Abalone Haliotis discus (배합사료내 규조류 우점인 미세조류 오일 추출물 첨가가 까막전복(Haliotis discus)의 성장, 체조성 및 패각 색채에 미치는 영향)

  • Kim, Hee Sung;Lee, Ki Wook;Jeong, Hae Seung;Kim, June;Yun, Ahyeong;Cho, Sung Hwoan;Lee, Gye-An;Kim, Keun-Yong
    • Korean Journal of Fisheries and Aquatic Sciences
    • /
    • v.50 no.6
    • /
    • pp.738-744
    • /
    • 2017
  • The effect of dietary inclusion of diatom-dominant microalgae oil extracts (MOE) on growth, body composition and shell color of juvenile abalone Haliotis discus was investigated. A total of 1,470 juvenile abalone were distributed into 21 plastic rectangular containers. Seven experimental diets were prepared: the MOE0, MOE0.01, MOE0.05, MOE0.1, MOE0.5, MOE1 and MOE2 diets contained MOE at concentrations of 0, 0.01, 0.05, 0.1, 0.5, 1 and 2%, respectively, at the expense of a mixture of squid liver and soybean oils. Each experimental diet was fed to triplicate groups of abalone once a day, with a little leftover, for 16 weeks. Weight gain and specific growth rate of abalone fed the MOE1 and MOE2 diets were higher than those of abalone fed all other diets. The shell length and soft body weight of abalone fed the MOE2 diet were longer and heavier, respectively, than those of abalone fed all other diets. Crude protein and ash content of the soft body of abalone were affected by dietary inclusion of MOE. The shell color of abalone fed all experimental diets differed from that of wild abalone. In conclusion, dietary inclusion of MOE improved growth of abalone but did not improve shell color.

Research Suggestion for Disaster Prediction using Safety Report of Korea Government (안전신문고를 이용한 재난 예측 방법론 제안)

  • Lee, Jun;Shin, Jindong;Cho, Sangmyeong;Lee, Sanghwa
    • Journal of Korean Society of Disaster and Security
    • /
    • v.12 no.4
    • /
    • pp.15-26
    • /
    • 2019
  • Anjunshinmungo (the safety e-report) has been in operation since 2014, and about 1 million reports had accumulated by June 2019. This study analyzes the contents of these more than 1 million safety e-reports to determine how powerful and meaningful the public's voice and interest are in the information age. In particular, we are interested in forecasting ability: we wanted to check whether safety e-reports are related to possible disasters. To this end, the researchers obtained the reported data as text and analyzed it with natural language analysis methods. Newspaper articles from the period covered by the safety e-reports were then analyzed, and the correlation between the contents of the newspaper articles and the safety e-reports was examined. As a result, accidents occurred within a few months as the number of reports related to response and confirmation increased, which suggests that analyzing the contents of previously filed safety reports on social instability can be used to predict future disasters.

The Big Data Analytics Regarding the Cadastral Resurvey News Articles

  • Joo, Yong-Jin;Kim, Duck-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.32 no.6
    • /
    • pp.651-659
    • /
    • 2014
  • With the popularization of the big data environment, big data have been highlighted as a key information strategy for establishing a national spatial data infrastructure for scientific land policy and the extension of the creative economy. Of particular interest is that cadastral information is a core national information source: it forms the basis of the spatial information that shapes people's daily lives, including the production and consumption of information related to real estate. The purpose of this paper is to suggest a scheme of big data analytics for articles on the cadastral resurvey project, in order to approach cadastral information in terms of spatial data integration. As the specific research method, the tm (Text Mining) package for R was used to read news reports in various formats as text, and nouns were extracted by using the KoNLP package. That is, we searched for the main keywords regarding the cadastral resurvey, performing compound noun extraction and data mining analysis, and presented a visualization of the results. In addition, news reports related to the cadastral resurvey between 2012 and 2014 were searched in newspapers, and nouns were extracted from the retrieved data for the data mining analysis of cadastral information. Furthermore, the approval rating, reliability, and improvement of rules were presented through correlation analyses among the extracted compound nouns. The correlation analysis among the most frequently used of the extracted nouns generated five groups of data consisting of 133 keywords. The most frequent words were "cadastral resurvey," "civil complaint," "dispute," "cadastral survey," "lawsuit," "settlement," "mediation," "discrepant land," and "parcel." In conclusion, the cadastral resurvey performed in some local governments has been proceeding smoothly with positive results. On the other hand, disputes raised by landowners have been provoking a stream of complaints about parcel surveying for the cadastral resurvey. Through such keyword analysis, public opinion and the types of civil complaints related to the cadastral resurvey project can be identified and prevented through pre-emptive responses via the direct call centre for cadastral surveying, electronic civil services and customer counseling, and high-quality cadastral information services can be provided. This study therefore provides a stepping stone toward big data analytics that can comprehensively examine and visualize a variety of news reports and opinions on the promotion of the cadastral resurvey project. This will contribute to establishing the foundation for a framework of information utilization that enables fast and accurate scientific decision making.
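The study itself used R packages for reading text and extracting nouns. As a rough stand-in, the Python sketch below starts from nouns that are assumed to be already extracted per article, counts keyword frequencies, and computes a Pearson correlation between the presence of two keywords across articles. The toy articles, function names, and the choice of presence-based correlation are assumptions for illustration; the abstract does not specify how its correlation analysis was defined.

```python
from collections import Counter
from math import sqrt


def term_frequencies(docs):
    """Count how often each extracted noun occurs across all articles."""
    return Counter(noun for doc in docs for noun in doc)


def presence_correlation(docs, a, b):
    """Pearson correlation between the 0/1 presence of two keywords.

    Returns 0.0 when either keyword appears in all or none of the
    articles, since the correlation is undefined in that case.
    """
    x = [1 if a in doc else 0 for doc in docs]
    y = [1 if b in doc else 0 for doc in docs]
    n = len(docs)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical noun lists, one per news article (not data from the study).
docs = [
    ["cadastral resurvey", "dispute", "lawsuit"],
    ["cadastral resurvey", "civil complaint"],
    ["dispute", "lawsuit", "mediation"],
    ["cadastral resurvey", "settlement"],
]

print(term_frequencies(docs).most_common(3))
print(presence_correlation(docs, "dispute", "lawsuit"))
```

Keywords whose presence is strongly correlated can then be grouped, which is one plausible way to arrive at keyword groups like those the abstract mentions.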

Design and Optimization of a Biomass Production System Combined with Wind Power Generation and LED on Marine Environment (LED가 결합된 야간풍력발전 활용을 포함한 해상환경 바이오매스 생산시스템의 최적 설계)

  • Hong, Gi Hoon;Cho, Sunghyun;Kang, Hoon;Park, Jeongpil;Kim, Tae-Ok;Shin, Dongil
    • Journal of the Korean Institute of Gas
    • /
    • v.19 no.2
    • /
    • pp.74-82
    • /
    • 2015
  • Carbon dioxide has been designated as one of the greenhouse gases that cause global warming. Among various ways to solve the $CO_2$ emission issue, 3rd-generation biomass (algae) production is considered a viable method to reduce $CO_2$ in the atmosphere. In this research, we propose a design of an innovative sustainable production system that utilizes 3rd-generation biomass in a floating production storage and offloading (FPSO) environment. Existing biomass production systems depend on solar energy and cannot continue producing biomass at night. Electricity produced from offshore wind farms also needs an efficient way to be stored through an energy storage system (ESS) or delivered in real time through the power grid, both of which require heavy capital investment. Thus, we design an offshore grid structure that uses LED lights, powered by the electricity produced from the wind farm, to supply the necessary light energy, resulting in maximized production of biomass and efficient use of wind farm energy. The final design integrates the biomass production system enhanced by LED lights with wind power generation. The suggested NLP (nonlinear programming) model for the optimal design, implemented in GAMS, would be useful for designing improved offshore biomass production systems combined with a wind farm.

Automatic Generation of Bibliographic Metadata with Reference Information for Academic Journals (학술논문 내에서 참고문헌 정보가 포함된 서지 메타데이터 자동 생성 연구)

  • Jeong, Seonki;Shin, Hyeonho;Ji, Seon-Yeong;Choi, Sungphil
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.56 no.3
    • /
    • pp.241-264
    • /
    • 2022
  • Bibliographic metadata can help researchers effectively utilize the essential publications they need and grasp academic trends in their own fields. Manual creation of the metadata is costly and time-consuming, yet it is nontrivial to automate metadata construction effectively with rule-based methods because article forms and styles vary widely across publishers and academic societies. Therefore, this study proposes a two-step extraction process based on rules and deep neural networks for generating bibliographic metadata of scientific articles to overcome these difficulties. The extraction target areas in articles were identified by using a deep neural network-based model, and then the details in the areas were analyzed and sub-divided into relevant metadata elements. The proposed model also includes a model for generating reference summary information, which separates the end of the text from the starting point of the references, extracts individual references with an essential rule set, and identifies all the bibliographic items in each reference with a deep neural network. In addition, in order to confirm the feasibility of a model that generates the bibliographic information of academic papers without pre- and post-processing, we conducted an in-depth comparative experiment with various settings and configurations. In the experiment, the method proposed in this paper showed higher performance.
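The rule-based part of the reference step (finding where the body text ends and splitting the reference list into individual entries) might look something like the sketch below. The heading pattern, the assumption that entries start with a bracketed number such as `[1]`, and the function name `split_references` are all hypothetical; the paper's actual rule set is not given in the abstract, and the neural tagging of items inside each reference is not covered here.

```python
import re

# Assumed rules: a line consisting only of a references heading marks the
# end of the body text, and each entry starts with a bracketed number.
HEADING = re.compile(r"^\s*(references|참고문헌)\s*$", re.IGNORECASE)
ENTRY_START = re.compile(r"^\s*\[(\d+)\]\s*")


def split_references(lines):
    """Return the individual reference strings found after the heading."""
    start = next((i for i, line in enumerate(lines) if HEADING.match(line)), None)
    if start is None:
        return []  # no reference section detected
    entries = []
    for line in lines[start + 1:]:
        if ENTRY_START.match(line):
            # A new entry begins; drop the "[n]" marker.
            entries.append(ENTRY_START.sub("", line, count=1).strip())
        elif entries and line.strip():
            # A continuation line belongs to the current entry.
            entries[-1] += " " + line.strip()
    return entries


# Placeholder text only; these are not real citations.
page = [
    "... the body text ends here.",
    "References",
    "[1] Author A. First placeholder title.",
    "    Journal X, 2020.",
    "[2] Author B. Second placeholder title. 2021.",
]

for entry in split_references(page):
    print(entry)
```

Each string returned here would then be passed to a sequence-labeling model to identify authors, title, venue, and year, which is the division of labor the abstract describes.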

Semantic Search System using Ontology-based Inference (온톨로지기반 추론을 이용한 시맨틱 검색 시스템)

  • Ha, Sang-Bum;Park, Yong-Tack
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.3
    • /
    • pp.202-214
    • /
    • 2005
  • The semantic web is a web paradigm that represents not general links between documents but the semantics and relations of documents. In addition, it enables software agents to understand the semantics of documents. We propose a semantic search based on inference with ontologies, which has the following characteristics. First, our search engine uses explicit ontologies for reasoning, enabling retrieval even when a search keyword differs from the terms used in documents. Second, even when the concepts of two ontologies do not match exactly, similar results can be found through a rule-based translator and ontological reasoning. Third, our approach increases the accuracy and precision of the search engine by using explicit ontologies to reason about the meanings of documents rather than guessing them from keywords alone. Fourth, domain ontology enables users to issue more detailed queries through an ontology-based automated query generator whose search coverage and accuracy are similar to those of NLP. Fifth, it enables agents to search automatically not only for documents matching a keyword but also for user-preferred information and knowledge from ontologies. It can search more accurately than current retrieval systems, which rely on database queries or keyword matching. We demonstrate that our system, which uses ontologies and inference based on explicit ontologies, can perform better than a keyword matching approach.
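The first characteristic, retrieving documents whose terms differ from the search keyword, can be illustrated with a very small inference step over subclass relations. The sketch below is only a toy: the ontology, the document annotations, and the function names are hypothetical, and the system described in the abstract presumably relies on a full ontology language and reasoner rather than a dictionary.

```python
def subclasses(concept, subclass_of):
    """Return the concept plus everything that is transitively a subclass of it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def semantic_search(query, docs, subclass_of):
    """Return IDs of documents annotated with the query concept or any subclass."""
    targets = subclasses(query, subclass_of)
    return sorted(doc_id for doc_id, concepts in docs.items() if targets & set(concepts))


# Hypothetical ontology (child -> parent) and document annotations.
SUBCLASS_OF = {"car": "vehicle", "truck": "vehicle", "sedan": "car"}
DOCS = {"d1": ["sedan"], "d2": ["truck"], "d3": ["bicycle"]}

print(semantic_search("vehicle", DOCS, SUBCLASS_OF))
print(semantic_search("car", DOCS, SUBCLASS_OF))
```

A plain keyword match on "vehicle" would return nothing for these documents, whereas the inferred subclass closure finds the documents about sedans and trucks.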