• Title/Summary/Keyword: stemming

Embedding with different levels for idiom disambiguation (관용표현 중의성 해소를 위한 다층위 임베딩 연구)

  • Park, Seo-Yoon;Kang, Ye-Jee;Kang, Hye-Rin;Jang, Yeon-Ji;Kim, Han-Saem
    • Annual Conference on Human and Language Technology / 2021.10a / pp.167-172 / 2021
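  • Many idiomatic expressions are ambiguous. That is, a single expression may be interpreted in two or more ways depending on context, with a literal meaning and an idiomatic meaning, so applying such idioms to natural language processing tasks without disambiguation causes problems. Based on this ambiguity and on the 'Idiom Principle', which holds that 'idiomatic expressions are already stored in the user's mind as a single token', this study built embeddings of idiomatic expressions at three levels (surface form, simple single token, and stemmed single token) and carried out an idiom classification study. The experiments confirmed that training on single tokens after applying stemming has a meaningful effect on idiom disambiguation compared with training on surface forms or on simple single tokens without stemming.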

Effects of Preprocessing on Text Classification in Balanced and Imbalanced Datasets

  • Mehmet F. Karaca
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.591-609 / 2024
  • This study examined all combinations of preprocessing methods in terms of their effects on reducing the number of words, shortening processing time, and classification success in balanced datasets and in datasets imbalanced at different ratios. The reductions in word count and in processing time provided by preprocessing were interrelated. Classification was more successful on the Turkish datasets, and the English datasets were affected more by whether the dataset was balanced. In highly imbalanced datasets, misclassified documents from the classes with few documents were assigned to a topically close class in the Turkish datasets and to the class with many documents in the English datasets. In terms of average scores, the highest classification success was obtained on the Turkish datasets without lowercasing, with stemming, and with stop-word removal, and on the English datasets with lowercasing, stemming, and stop-word removal. Stemming was the preprocessing method that most increased success on the Turkish datasets, whereas stop-word removal played that role on the English datasets. The maximum scores showed that feature selection, feature size, and classifier affect classification success more than preprocessing does. The study concluded that preprocessing is necessary for text classification because it shortens processing time and can achieve high classification success, that a given preprocessing method does not have the same effect in all languages, and that different preprocessing methods are more successful for different languages.
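
The idea of examining every combination of lowercasing, stemming, and stop-word removal can be sketched as follows. This is only an illustration under stated assumptions: the stop list, the `toy_stem` rule, and the function names are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
from itertools import product

# Hypothetical, deliberately tiny stop list (not from the paper).
STOP_WORDS = {"the", "a", "is", "of"}


def toy_stem(word):
    """Hypothetical stemmer: strip a trailing 's' from words longer than 3 letters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def preprocess(tokens, lowercase, stem, remove_stop):
    """Apply the selected preprocessing steps to a list of tokens."""
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stop:
        # Stop words are matched case-insensitively even when lowercasing is off.
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def vocabulary_sizes(tokens):
    """Map each (lowercase, stem, remove_stop) combination to its distinct-word count."""
    return {
        flags: len(set(preprocess(tokens, *flags)))
        for flags in product((False, True), repeat=3)
    }
```

Calling `vocabulary_sizes("The cats chase the cat".split())` returns eight entries, one per combination, so the reduction in word number can be compared across settings.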

An Experimental and Numerical Study on the Stemming Effect of a Polymer Gel in Explosive Blasting (화약발파에서 폴리머 겔의 전색효과에 관한 실험적 및 수치해석적 연구)

  • Baluch, Khaqan;Kim, Jung-Gyu;Ko, Young-Hun;Kim, Seung-Jun;Jung, Seung-Won;Yang, Hyung-Sik;Kim, Youg-Kye;Kim, Jong-Gwan
    • Explosives and Blasting / v.36 no.4 / pp.35-47 / 2018
  • In this study, several concrete-block blast tests and AUTODYN numerical analyses were conducted to analyze the effects of different stemming and coupling materials on explosion results. Air, sand, and polymer gel were used as both the stemming and coupling materials. The stemming and coupling effects of these materials were compared with those of the full-charge condition. Soil-covered or buried concrete blocks were used for field crater tests. It was found from the concrete block tests and numerical analyses that both the crater size and the peak pressure around the blast hole were higher when the polymer gel was used than when the sand and the decoupling condition were used. The numerical analyses revealed the same trend as those of the field tests. Pressure peaks in concrete block models were calculated to be 37, 30, and 16 MPa, respectively, for the cases of the polymer gel, sand, and no stemming and decoupling condition. The pressure peak was 52 MPa in the case of full-charge condition, which was the highest pressure. But the damage area for the case was smaller than that obtained from the use of polymer gel. Full-charge was also used as a reference test.

Comparative Study of Various Persian Stemmers in the Field of Information Retrieval

  • Moghadam, Fatemeh Momenipour;Keyvanpour, MohammadReza
    • Journal of Information Processing Systems / v.11 no.3 / pp.450-464 / 2015
  • In linguistics, stemming is the operation of reducing words to their more general form, which is called the 'stem'. Stemming is an important step in information retrieval systems, natural language processing, and text mining. Information retrieval systems are evaluated by metrics such as precision and recall, and these metrics measure the superiority of one information retrieval system over another. Stemmers reduce the size of the index file, increase the speed of information retrieval systems, and improve their performance by boosting precision and recall. There are few Persian stemmers, and most of them are based on morphological rules. In this paper we study Persian stemmers, which fall into three main classes: structural stemmers, lookup table stemmers, and statistical stemmers. We describe the algorithms of each class and present the weaknesses and strengths of each Persian stemmer. We also propose some metrics for comparing and evaluating the stemmers.
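
To make two of the stemmer classes named above concrete, here is a minimal sketch of a rule-based (structural) stemmer and a lookup table stemmer. It uses English toy data for readability. The suffix list, the table, and all function names are hypothetical and do not come from the paper or from any Persian stemmer; statistical stemmers are not shown.

```python
# Toy illustration only: the suffix list and lookup table are hypothetical.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order

LOOKUP = {"went": "go", "mice": "mouse"}  # irregular forms a suffix rule would miss


def structural_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lookup_stem(word: str) -> str:
    """Return the table entry if present, else fall back to the rule-based stem."""
    return LOOKUP.get(word, structural_stem(word))


def index_size(words, stem) -> int:
    """Number of distinct index terms after applying the given stemmer."""
    return len({stem(w) for w in words})
```

With this sketch, `index_size(["walk", "walks", "walked", "walking"], structural_stem)` collapses four surface forms into one index term, which illustrates how stemming can shrink an index.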

Information Retrieval Systems: Between Morphological Analyzers and Stemming Algorithms

  • Mohamed, Afaf Abdel Rhman;Ouni, Chafika;Eljack, Sarah Mustafa;Alfayez, Fayez
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.375-381 / 2022
  • The main objective of an Information Retrieval System (IRS) is to obtain suitable information within a reasonable time to satisfy a user need. To achieve this, an IRS should have a good indexing system based on natural language processing. In this context, we focus on the Arabic language processing techniques available for an IRS, with the goal of contributing to improved performance. Our contribution consists of integrating morphological analysis into an IRS in order to compare its impact with that of stemming algorithms.

Estimation of Future Death Burden of High Temperatures from Climate Change (기후변화로 인한 고온의 미래 사망부담 추정)

  • Yang, Jihoon;Ha, Jongsik
    • Journal of Environmental Health Sciences / v.39 no.1 / pp.19-31 / 2013
  • Objectives: Elevated temperatures during summer months have been reported since the early 20th century to be associated with increased daily mortality. However, future death impacts of high temperatures resulting from climate change could be variously estimated in consideration of the future changes in historical temperature-mortality relationships, mortality, and population. This study examined the future death burden of high temperatures resulting from climate change in Seoul over the period of 2001-2040. Methods: We calculated yearly death burden attributable to high temperatures stemming from climate change in Seoul from 2001-2040. These future death burdens from high temperature were computed by multiplying relative risk, temperature, mortality, and population at any future point. To incorporate adaptation, we assumed future changes in temperature-mortality relationships (i.e. threshold temperatures and slopes), which were estimated as short-term temperature effects using a Poisson regression model. Results: The results show that climate change will lead to a substantial increase in summer high temperature-related death burden in the future, even considering adaptation by the population group. The yearly death burden attributable to elevated temperatures ranged from approximately 0.7 deaths per 100,000 people in 2001-2010 to about 1.5 deaths per 100,000 people in Seoul in 2036-2040. Conclusions: This study suggests that adaptation strategies and communication regarding future health risks stemming from climate change are necessary for the public and for the political leadership of South Korea.

Stability of intervalwise receding horizon control for linear time-varying systems

  • Ki, Ki-Baek;Kwon, Wook-Hyun
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 1997.10a / pp.430-433 / 1997
  • In this paper, an intervalwise receding horizon control (IRHC) is proposed that stabilizes linear continuous-time and discrete-time time-varying systems, respectively, by means of a feedback control stemming from a receding horizon concept and a minimum quadratic cost. The results parallel those obtained for continuous [4],[9] and discrete [5],[15] time-varying systems, respectively.

The Pollution Potential of Animal Production Systems : Origin and Atmospheric Cycling of Their Pollutants (축산환경의 오염 잠재력 : 축산오염 물질의 발생과 대기환경계 순환)

  • 김기현;김동균;윤종만
    • Journal of Animal Environmental Science / v.1 no.2 / pp.155-164 / 1995
  • Despite considerable progress in our understanding of the environmental fate of pollutants stemming from animal production systems, relatively little is known about the processes and mechanisms regulating their dispersal (via emission) into, and deposition from, the earth's atmospheric system. Here we summarize up-to-date knowledge on this topic, with a main emphasis on the pollutants' origin, physico-chemical characteristics, and geochemical distribution behavior.

A Comparative Study on Requirements Analysis Techniques using Natural Language Processing and Machine Learning

  • Cho, Byung-Sun;Lee, Seok-Won
    • Journal of the Korea Society of Computer and Information / v.25 no.7 / pp.27-37 / 2020
  • In this paper, we propose a data-driven methodology that uses Natural Language Processing and Machine Learning to classify requirements into functional and non-functional requirements. Analysis of the classification results shows that models trained with data preprocessing and a classification algorithm that draw on the characteristics and information of existing requirements, using term weights based on TF and IDF, outperformed those that used stemming and stop words to classify the requirements. This observation also shows that term weights calculated without applying stemming and stop-word removal influenced the results positively. Furthermore, we investigate an optimized method for classifying software requirements into functional and non-functional requirements.
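
As a rough illustration of term weights based on TF and IDF, the sketch below computes one common variant: length-normalized term frequency multiplied by idf = log(N / df). The abstract does not state which weighting formula the authors used, so this variant and the function name `tf_idf` are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other TF-IDF formulations exist and would give different weights.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights
```

Under this variant, a term that occurs in every requirement gets weight zero, while a term confined to a few requirements is weighted up, which is why no explicit stop-word list is needed to down-weight very common words.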