• Title/Summary/Keyword: text base

A Rule-Based Analysis from Raw Korean Text to Morphologically Annotated Corpora

  • Lee, Ki-Yong;Markus Schulze
    • Language and Information
    • /
    • v.6 no.2
    • /
    • pp.105-128
    • /
    • 2002
  • Morphologically annotated corpora are the basis for many tasks in computational linguistics. Most current approaches use statistically driven methods of morphological analysis, which provide only POS tags. While this is sufficient for some applications, many others need a rule-based full morphological analysis that also yields lemmatization and segmentation. This work therefore aims (1) to introduce Kormoran, a rule-based Korean morphological analyzer built on the principle of linearity, which prohibits any combination of left-to-right and right-to-left analysis as well as backtracking, and (2) to show how it can be used as a POS tagger by adopting an ordinary preprocessing technique and by filtering out irrelevant morpho-syntactic information from the analyzed feature structures. It is shown that, besides providing a basis for subsequent syntactic or semantic processing, full morphological analyzers like Kormoran have greater power to resolve ambiguities than simple POS taggers. The focus of the present analysis is Korean text.
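
The second aim, turning a full analyzer into a POS tagger by filtering feature structures, can be illustrated with a minimal Python sketch. It assumes each morpheme is a flat feature structure (a dict); the Sejong-style tags (VV, VA, ETM) and the feature names `lemma`, `pos`, and `gloss` are hypothetical placeholders, since Kormoran's actual tag set and feature structures are not given here. The word form 가는 has three full analyses (from 가다 'go', 갈다 'grind', and 가늘다 'thin') but only two distinct POS sequences once everything except `pos` is discarded, which shows the lexical information a bare POS tagger no longer carries.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a morpheme is a flat feature structure,
# and one full analysis of a word form is a sequence of morphemes.
Morpheme = Dict[str, str]
Analysis = List[Morpheme]


def to_pos_sequence(analysis: Analysis) -> Tuple[str, ...]:
    """Filter each feature structure down to its 'pos' feature only."""
    return tuple(m["pos"] for m in analysis)


def as_pos_tagger(analyses: List[Analysis]) -> List[Tuple[str, ...]]:
    """Collapse full analyses into distinct POS sequences (first-seen order)."""
    distinct: List[Tuple[str, ...]] = []
    for analysis in analyses:
        seq = to_pos_sequence(analysis)
        if seq not in distinct:
            distinct.append(seq)
    return distinct


# Three full analyses of the word form 가는 (tags and features are assumptions)
ganeun: List[Analysis] = [
    [{"lemma": "가", "pos": "VV", "gloss": "go"}, {"lemma": "는", "pos": "ETM"}],
    [{"lemma": "갈", "pos": "VV", "gloss": "grind"}, {"lemma": "는", "pos": "ETM"}],
    [{"lemma": "가늘", "pos": "VA", "gloss": "thin"}, {"lemma": "ㄴ", "pos": "ETM"}],
]

if __name__ == "__main__":
    print(len(ganeun), "full analyses")
    print(as_pos_tagger(ganeun))  # [('VV', 'ETM'), ('VA', 'ETM')]
```

Only the filtering step is shown; the preprocessing technique mentioned in the abstract is not modeled.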

Analysis of Research Trends in Mathematics Education regarding the Educational Environment based on Digital Technology (디지털 기술 교육 환경 기반 수학교육에 대한 국내 선행 연구의 경향성 분석 연구)

  • Ko, Ho Kyoung;Maeng, Unkyoung;Son, Bok Eun
    • East Asian mathematical journal
    • /
    • v.39 no.4
    • /
    • pp.437-454
    • /
    • 2023
  • At the core of the changes in the era of the Fourth Industrial Revolution is the shift to a 'digital technology' base. These changes are unprecedented in scale and are expected to affect our lives more than ever before. Education is one of the major inflection points in the transition to the digital era, and information technology has become an essential element of the educational field. Accordingly, this study examines domestic (Korean) research trends related to educational environments based on digital technology, and then draws implications for establishing the digital-based educational environments that will emerge in the future. To this end, semantic network analysis was conducted to quantitatively structure text data obtained from studies on digital technology in mathematics education over the past 10 years, and the discussion proceeds from the results.
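
Semantic network analysis usually starts from a keyword co-occurrence network. The Python sketch below assumes each study has already been reduced to a list of keywords and uses normalized degree centrality as one common centrality measure; the keyword lists are hypothetical illustrations, not data from the study, and the study's actual preprocessing and centrality measures are not stated here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def cooccurrence_network(docs: List[List[str]]) -> Dict[Edge, int]:
    """Weighted undirected edges: number of documents containing both keywords."""
    edges: Dict[Edge, int] = defaultdict(int)
    for keywords in docs:
        # sorted(set(...)) yields each pair once, in canonical (a < b) order
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree_centrality(edges: Dict[Edge, int]) -> Dict[str, float]:
    """Distinct neighbours of each node divided by (number of nodes - 1)."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}


# Hypothetical keyword lists, one per study
docs = [
    ["digital", "textbook", "teacher"],
    ["digital", "ai", "teacher"],
    ["ai", "coding"],
]
edges = cooccurrence_network(docs)
centrality = degree_centrality(edges)
print(edges[("digital", "teacher")])  # 2
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

Edge weights record how often two keywords appear in the same study; central keywords are those linked to many others.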

A Text Mining-based Intrusion Log Recommendation in Digital Forensics (디지털 포렌식에서 텍스트 마이닝 기반 침입 흔적 로그 추천)

  • Ko, Sujeong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.2 no.6
    • /
    • pp.279-290
    • /
    • 2013
  • In digital forensics, log files are stored as large volumes of data for tracing users' past behavior. Without clues, it is difficult for investigators to analyze such large log data manually. In this paper, we propose a text mining technique that extracts intrusion logs from a large log set in order to recommend reliable evidence to investigators. In the training stage, the proposed method preprocesses a training log set, extracts intrusion association words from it with the Apriori algorithm, and computes the probability of intrusion for each association word by combining support and confidence. Robinson's method of computing confidences for filtering spam email is applied to extracting intrusion logs. The resulting association word knowledge base includes the intrusion-probability weights of the association words in order to improve accuracy. In the test stage, the probability that each log in a test log set is an intrusion log and the probability that it is a normal log are computed separately by Fisher's inverse chi-square classification algorithm based on the association word knowledge base, and intrusion logs are extracted by combining the two results. The intrusion logs are then recommended to investigators. Because the proposed method uses a training procedure that explicitly analyzes the meaning of data in large, unstructured log data, it mitigates the loss of accuracy caused by data ambiguity. In addition, by recommending intrusion logs with Fisher's inverse chi-square classification algorithm, it reduces the false positive (FP) rate and the laborious effort of extracting evidence manually.
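
The test-stage combination can be sketched in Python in the style popularized for spam filtering: Robinson's adjustment shrinks each word's probability toward a neutral value when the word is rare, and Fisher's inverse chi-square method is applied once to the probabilities (evidence for "normal") and once to their complements (evidence for "intrusion"). This is a hedged illustration, not the paper's implementation: the defaults `s = 1` and `x = 0.5`, the knowledge-base entries, and the way raw probabilities are obtained (the paper derives them from support and confidence) are all assumptions.

```python
import math
from typing import Iterable


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability P(X >= x2) for a chi-square variable with even dof."""
    if dof % 2:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def robinson_fw(p_w: float, n: int, s: float = 1.0, x: float = 0.5) -> float:
    """Robinson's adjustment: pull p(w) toward x when word w was seen only n times."""
    return (s * x + n * p_w) / (s + n)


def fisher_intrusion_score(word_probs: Iterable[float], eps: float = 1e-6) -> float:
    """Combine per-word intrusion probabilities into one score in [0, 1].

    Near 1 means 'intrusion', near 0 means 'normal', about 0.5 means unsure.
    """
    probs = [min(max(p, eps), 1.0 - eps) for p in word_probs]  # avoid log(0)
    if not probs:
        return 0.5
    dof = 2 * len(probs)
    # Fisher's method on (1 - p): a small product is evidence FOR intrusion
    intrusion = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), dof)
    # Fisher's method on p: a small product is evidence FOR a normal log
    normal = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), dof)
    return (intrusion - normal + 1.0) / 2.0


# Hypothetical knowledge base: association word -> (raw intrusion probability, count)
kb = {"wget": (0.95, 12), "chmod": (0.90, 5), "login": (0.20, 40)}
log_words = ["wget", "chmod", "login"]
probs = [robinson_fw(*kb[w]) for w in log_words]
print(round(fisher_intrusion_score(probs), 3))
```

A log whose score exceeds some threshold would then be recommended to the investigator; the threshold is not specified in the abstract.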

Estimate Customer Churn Rate with the Review-Feedback Process: Empirical Study with Text Mining, Econometrics, and Quasi-Experiment Methodologies (리뷰-피드백 프로세스를 통한 고객 이탈률 추정: 텍스트 마이닝, 계량경제학, 준실험설계 방법론을 활용한 실증적 연구)

  • Choi Kim;Jaemin Kim;Gahyung Jeong;Jaehong Park
    • Information Systems Review
    • /
    • v.23 no.3
    • /
    • pp.159-176
    • /
    • 2021
  • Preventing user churn is a prominent strategy for capitalizing on online games, as it avoids the initial investment required to develop another game. Extant literature has examined factors that may induce user churn, mainly from the perspectives of motives to play and of the game as a virtual society. However, such works largely overlook the service aspects of online games. Unmet user needs are a crucial driver of churn, especially for online services, where users expect continuous improvement in service quality via software updates. Hence, we examine the relationship between a game's quality management and its user base. Using text mining and survival analysis, we identify complaint factors that act as key predictors of user churn. Additionally, we find that enjoyment-related factors are greater threats to the user base than usability-related ones. Furthermore, a subsequent quasi-experiment shows that improvements in the complaint factors (i.e., via game patches) curb churn and foster user retention. Our results shed light on the responsive role of developers in retaining the user base of online games. Moreover, we provide practical insights for game operators, namely to identify and prioritize the more perilous complaint factors when planning successive game patches.
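
The abstract does not say which survival model was used, so as a hedged illustration of the basic idea, the Python sketch below implements the Kaplan-Meier estimator, which handles users who have not churned by the end of observation (censored users). The durations and churn flags are hypothetical; relating complaint factors to churn would require a regression model on top of this, which is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], churned: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival curve.

    durations[i]: how long user i was observed (e.g. days since first play)
    churned[i]:   1 if the user churned at that time, 0 if still active (censored)
    Returns (t, S(t)) at each distinct churn time.
    """
    data = sorted(zip(durations, churned))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        # group all users whose observation ends at the same time t
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            removed += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed  # churned and censored users both leave the risk set
    return curve


# Hypothetical users: days observed and whether they churned
print(kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1]))  # [(1, 0.75), (2, 0.5), (3, 0.0)]
```

The censored user at day 2 still counts as "at risk" at day 2, which is why survival drops by a factor of 2/3 there rather than 1/2.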

A Study on Lee Hae-Rang's Realism and Direction Standpoint - Focusing on The Performance Direction of Text "Hamlet" - (이해랑의 리얼리즘과 연출 관점에 대한 소고 - 텍스트 "햄릿" 공연 연출을 중심으로 -)

  • Ahn, Jang whan
    • (The) Research of the performance art and culture
    • /
    • no.22
    • /
    • pp.327-370
    • /
    • 2011
  • Shakespeare's "Hamlet" was first introduced in Korea in the early 1920s by Hyeon Cheol in 『Gaebyeok』. A performance of the complete play was staged at the Kinema Theater in Daegu in September 1951, during the Korean War, directed by Lee Hae-Rang (translated by Han Lo-Dan). Since then, numerous artists and companies have staged the play through the 1960s, 1970s, 1980s, and 1990s. Among the many artists and productions in the performance history of "Hamlet", this study examines the directorial standpoint of Lee Hae-rang, which has been one of the mainstays since the 1950s. To this end, focusing on the 1951 production of the text "Hamlet", the investigator consulted the performance scripts and reviews of Lee Hae-rang's productions for the opening of the Drama Center in 1962 and at HOAM Art Hall in 1985 and 1989. Chapter 2 examines the concept, standpoint, and background of realism, the foundation of his theatrical activity throughout his life. Chapter 3, before analyzing his directorial standpoint on the text "Hamlet", summarizes traditional and modern concepts of the text and analyzes a variety of standpoints and viewpoints on it. Based on this summary and analysis, his directorial standpoint is analyzed and examined, offering a clue for discussing the position of Lee Hae-rang's "Hamlet" in the history of Korean performance and its performance aesthetics.

Text-Dependent Speaker Recognition Using DTW and State-Dependent Parameter Weighting Method of HMM (DTW 와 HMM의 상태별 파라미터 가중 기법을 이용한 문맥 종속형 화자인식)

  • 이철희;정성환;김종교
    • Proceedings of the IEEK Conference
    • /
    • 2000.06d
    • /
    • pp.77-80
    • /
    • 2000
  • In this paper, speaker recognition based on both DTW and discrete HMMs is performed using a method that estimates state-dependent parameter weights from training data so that each speaker's acoustic characteristics are well reflected. In the proposed method, the optimal state sequence is found with the Viterbi algorithm. The optimal path is then determined by comparing the state sequence of the stored base pattern with those of the other patterns. Next, the frames whose patterns match the base pattern in the same state are identified, and the reference pattern is obtained by weighting according to the number of matched frames.
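
The DTW half of the method can be sketched in Python: a test utterance is aligned with each enrolled speaker's reference pattern, and the speaker with the smallest accumulated distance is chosen. This is a plain, unweighted DTW with Euclidean frame distance; the paper's HMM state-dependent weighting is not reproduced, and the speaker names and one-dimensional "features" are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per analysis frame


def dtw_distance(test: List[Frame], ref: List[Frame]) -> float:
    """Classic DTW with Euclidean frame distance and steps (i-1,j), (i,j-1), (i-1,j-1)."""
    n, m = len(test), len(ref)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(test[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def identify_speaker(test: List[Frame], references: Dict[str, List[Frame]]) -> str:
    """Text-dependent identification: pick the speaker whose reference pattern is closest."""
    return min(references, key=lambda spk: dtw_distance(test, references[spk]))


# Hypothetical 1-D features for two enrolled speakers saying the same phrase
refs = {
    "speaker_a": [(1.0,), (2.0,), (3.0,)],
    "speaker_b": [(3.0,), (1.0,), (0.0,)],
}
utterance = [(1.0,), (2.0,), (2.0,), (3.0,)]  # speaker_a, spoken more slowly
print(identify_speaker(utterance, refs))  # speaker_a
```

Because DTW can stretch the time axis, the slower utterance still aligns with speaker_a's pattern at zero cost. `math.dist` requires Python 3.8 or later.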

Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.16 no.2
    • /
    • pp.81-86
    • /
    • 2016
  • Term weighting is a popular technique that effectively weighs the term features to improve accuracy in document classification. While several successful term weighting algorithms have been suggested, none of them appears to perform consistently well across different data domains. In this paper, we propose several reasonable methods for combining different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically, we suggest two approaches: i) learning a single weight vector that lies in the convex hull of the base vectors while minimizing the class prediction loss, and ii) a mini-max classifier that aims for robustness of the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, on which they significantly outperform existing term weighting methods.
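
Approach i) keeps the learned weight vector in the convex hull of the base vectors, which amounts to keeping the mixing coefficients on the probability simplex. The Python sketch below shows only that constraint: a standard sort-based Euclidean projection onto the simplex, which a projected-gradient learner could apply after each update, followed by the convex combination itself. The projection method is a swapped-in standard technique, not necessarily the paper's solver; the loss minimization and the mini-max variant are not shown, and the base vectors are hypothetical.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {a : a_k >= 0, sum(a) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:  # holds for a prefix of the sorted entries; keep the last one
            theta = t
    return [max(x - theta, 0.0) for x in v]


def combine_term_weights(alphas: Sequence[float], base: Sequence[Sequence[float]]) -> List[float]:
    """Convex combination sum_k alpha_k * base[k] of base term-weight vectors."""
    a = project_to_simplex(alphas)
    dim = len(base[0])
    return [sum(a_k * vec[t] for a_k, vec in zip(a, base)) for t in range(dim)]


# Hypothetical base weight vectors from two schemes over a 3-term vocabulary
base = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0]]
print(combine_term_weights([0.5, 0.5], base))  # [2.0, 1.0, 0.5]
print(project_to_simplex([2.0, 0.0]))          # [1.0, 0.0]
```

Projecting rather than clipping keeps the coefficients non-negative and summing to one while staying as close as possible to the unconstrained update.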

Development of Welding Information System for Power and Industrial Plant (발전 및 산업 설비 지원 용접 기술 정보 시스템 개발)

  • 박주용;홍성호
    • Journal of Welding and Joining
    • /
    • v.17 no.3
    • /
    • pp.44-49
    • /
    • 1999
  • Power and industrial plants use various welding processes and many kinds of materials, so obtaining the proper welding information is difficult. In this research, a welding information system was developed to address this difficulty. It consists of a database system, a knowledge-based system, and diagram analysis programs. The database system contains a large database and various search methods suited to each kind of information, and it manages a large part of the welding information. The knowledge-based system is used to select the proper welding process and to analyze weld defects; it includes a program that converts text into knowledge and an inference mechanism. Finally, the diagram analysis programs calculate the ferrite content of the weld metal, and this calculation helps avoid crack occurrence. The developed system can be a useful tool for welding in power and industrial plants.
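
Ferrite content is commonly read off a constitution diagram whose axes are chromium and nickel equivalents. The abstract does not name the diagram, so the Python sketch below assumes the Schaeffler diagram and computes only its two equivalents from a weld-metal composition; reading the ferrite percentage off the digitized diagram is omitted, and the 18Cr-8Ni composition is hypothetical.

```python
from typing import Dict, Tuple


def schaeffler_equivalents(wt_pct: Dict[str, float]) -> Tuple[float, float]:
    """Chromium and nickel equivalents (wt%), the axes of the Schaeffler diagram."""

    def g(element: str) -> float:
        return wt_pct.get(element, 0.0)  # missing elements count as 0 wt%

    cr_eq = g("Cr") + g("Mo") + 1.5 * g("Si") + 0.5 * g("Nb")
    ni_eq = g("Ni") + 30.0 * g("C") + 0.5 * g("Mn")
    return cr_eq, ni_eq


# Hypothetical 18Cr-8Ni weld metal composition (wt%)
weld = {"Cr": 18.0, "Ni": 8.0, "Si": 0.5, "Mn": 1.5, "C": 0.05}
print(schaeffler_equivalents(weld))  # (18.75, 10.25)
```

Other diagrams (for example DeLong or WRC-1992) use different equivalent formulas, so the coefficients would change if the system was built on one of those.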

A Study of Construction of Character Image Data for Recognition Handwritten Text (필기체 문자 인식을 위한 문자 영상 데이터 구축에 관한 연구)

  • Lee, H.R.;Ko, K.C.;Lee, M.R.
    • Annual Conference on Human and Language Technology
    • /
    • 2000.10d
    • /
    • pp.63-67
    • /
    • 2000
  • Gathering standard image data is an essential preliminary step in developing a character recognition system. For this purpose, digitized images of handwritten characters were collected. The gathered image data include Korean characters, Chinese characters, numerals, English characters, special characters, and so on. This paper describes a handwritten character image database, which was designed and built with a storage structure different from the general storage structure for large-capacity multimedia data.

A Structural Analysis of Dictionary Text for the Construction of Lexical Data Base (어휘정보구축을 위한 사전텍스트의 구조분석 및 변환)

  • 최병진
    • Language and Information
    • /
    • v.6 no.2
    • /
    • pp.33-55
    • /
    • 2002
  • This research aims to transform the definition text of an English-English-Korean Dictionary (EEKD), which is encoded in EST files for publishing purposes, into a structured format for a Lexical Data Base (LDB). Constructing an LDB is very time-consuming and expensive. To save time and effort in building new lexical information, the present study extracts useful linguistic information from an existing printed dictionary. This paper describes the process of extracting and structuring lexical information from a printed dictionary (EEKD) as a lexical resource. The extracted information is represented in XML, which can be transformed into other representations to meet different application requirements.
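
The structuring step, turning a flat dictionary entry into XML, can be sketched with Python's standard `xml.etree.ElementTree`. The tab-separated input line format and the element names (`entry`, `headword`, `pos`, `sense`, `def`, `gloss`) are hypothetical; the real EEKD source encoding and the paper's XML schema are not described in the abstract.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_entry_line(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Parse a hypothetical flat line: 'headword<TAB>pos<TAB>def1 = gloss1; def2 = gloss2'."""
    headword, pos, senses_field = line.rstrip("\n").split("\t")
    senses = []
    for chunk in senses_field.split(";"):
        definition, gloss = (part.strip() for part in chunk.split("=", 1))
        senses.append((definition, gloss))
    return headword, pos, senses


def entry_to_xml(headword: str, pos: str, senses: List[Tuple[str, str]]) -> str:
    """Build one <entry> element with numbered <sense> children."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    for n, (definition, gloss) in enumerate(senses, start=1):
        sense = ET.SubElement(entry, "sense", n=str(n))
        ET.SubElement(sense, "def", lang="en").text = definition
        ET.SubElement(sense, "gloss", lang="ko").text = gloss
    return ET.tostring(entry, encoding="unicode")


line = "bank\tnoun\tan organization that keeps money = 은행; the side of a river = 둑"
print(entry_to_xml(*parse_entry_line(line)))
```

Keeping each sense as its own element makes it straightforward to transform the result into other representations later, for example with XSLT.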