• Title/Summary/Keyword: Lexical similarity

An Example-Based English Learning Environment for Writing

  • Miyoshi, Yasuo;Ochi, Youji;Okamoto, Ryo;Yano, Yoneo
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.292-297 / 2001
  • In learning to write in a second or foreign language, a learner has to acquire not only lexical and syntactic knowledge but also the skill of choosing suitable words for the content s/he is interested in. To support this, a learning system should infer the learner's intention and offer example phrases relevant to that content. However, a learner cannot always express the content of the desired phrase as input to the system. Therefore, the system should be equipped with a function for diagnosing the learner's intention. It should also be equipped with an analysis function that scores the similarity between the learner's intention and the phrases stored in the system, at both the syntactic and the idiomatic level, so that it can present appropriate example phrases. In this paper, we propose the architecture of an interactive support method for learning English writing, based on an analogical search technique that retrieves sample phrases from corpora. Our system can show candidate variations or next phrases to write, as well as a sentence from the corpora that is analogous to what the learner wants to express.

A Natural Language Question Answering System-an Application for e-learning

  • Gupta, Akash;Rajaraman, Prof. V.
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.285-291 / 2001
  • This paper describes a natural language question answering system that students can use to get solutions to their queries. Unlike AI question answering systems that focus on generating new answers, the present system retrieves existing ones from question-answer files. Unlike information retrieval approaches that rely on a purely lexical metric of similarity between query and document, it uses a semantic knowledge base (WordNet) to improve its ability to match questions. The paper describes the design and current implementation of the system as an intelligent tutoring system. The main drawback of existing tutoring systems is that the computer poses a question to the students and guides them toward the solution. In the present approach, a student asks any question related to the topic and gets a suitable reply. Based on the query, the student gets either a direct answer or a set of questions (at most 3 or 4) that most closely resemble the input. We further analyze application fields for this kind of system and discuss the scope for future research in this area.

A Machine Learning based Method for Measuring Inter-utterance Similarity for Example-based Chatbot (예제 기반 챗봇을 위한 기계 학습 기반의 발화 간 유사도 측정 방법)

  • Yang, Min-Chul;Lee, Yeon-Su;Rim, Hae-Chang
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.8 / pp.3021-3027 / 2010
  • An example-based chatbot generates a response to a user's utterance by searching for the most similar utterance in a collection of dialogue examples. Although finding an appropriate example is very important because it is closely tied to response quality, few studies have reported which features should be considered and how to use them when searching for similar utterances. In this paper, we propose a machine learning framework that uses various linguistic features. Experimental results show that using semantic and lexical features together significantly improves performance over conventional approaches in terms of 1) utilization of the example database, 2) precision of example matching, and 3) quality of responses.

Error Correction Method Improvement System Using Out-of-Vocabulary Rejection (미등록어 거절을 이용한 오류 보정 방법 개선 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Digital Convergence / v.10 no.8 / pp.173-178 / 2012
  • When a model is generated for the recognition vocabulary, triphones that were not prepared in advance are produced. As a result, the model cannot generate initial parameter estimates for words, and the system cannot configure the model. The precision of the Gaussian model then falls, which degrades recognition. We propose an error correction system that uses an out-of-vocabulary rejection algorithm. When the vocabulary recognition model is built, the recognition rate is improved by rejecting vocabulary that is not registered. In addition, the system captures lexical analysis and meaning using probability distributions, and it deactivates the string before the phoneme change is applied. The system determines the error correction rate using the phoneme similarity rate and reliability. In a performance comparison, the error correction rate improved by 2.8% relative to methods using error patterns, fault patterns, and meaning patterns.
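
The rejection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the vocabulary, and the threshold are hypothetical, and Python's `difflib.SequenceMatcher` ratio stands in for the phoneme similarity rate, whose definition the abstract does not give.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; the paper's phoneme similarity is not specified."""
    return SequenceMatcher(None, a, b).ratio()


def recognize(candidate, vocabulary, threshold=0.8):
    """Return the closest registered word, or None to reject the input as out-of-vocabulary."""
    best_word = max(vocabulary, key=lambda word: similarity(candidate, word))
    if similarity(candidate, best_word) >= threshold:
        return best_word
    return None  # rejected: nothing in the vocabulary is similar enough


# Hypothetical registered vocabulary, for illustration only.
VOCAB = ["seoul", "busan", "daegu"]
```

With these assumed values, `recognize("seoull", VOCAB)` is corrected to `"seoul"`, while `recognize("tokyo", VOCAB)` is rejected and returns `None`.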

An analysis on streetscape using the Model of Emotion Evaluation (가로경관에 대한 감성평가모형 적용 분석 연구)

  • Lee, Jin-Sook;Kim, Ji-Hye
    • Science of Emotion and Sensibility / v.16 no.2 / pp.149-156 / 2013
  • In this study, the Model of Emotion Evaluation, an emotional analysis actively applied in environmental assessment, was divided into two parts, the abbreviated model and the inferential model, through a pilot study and an experiment. In addition, the experiment was used to analyze the attributes of the evaluation vocabularies of two further representative models, the EPA Model and the PAD Model, and the results show a large difference in the development approach and lexical constitution of the two models. Factor analysis also showed that the vocabularies were abbreviated according to the respective models. Similarity relationships were analyzed using multidimensional scaling, and the results show that a mutual relationship was established to some degree. Based on this, we conclude that, rather than using the Model of Emotion Evaluation in a biased way, a more objective image analysis is possible by analyzing the characteristics of the model before applying it. In this study, the evaluation target was confined to the environmental assessment of streetscapes; further research on the Model of Emotion Evaluation that allows evaluation models to be compared in various areas is needed.

An XML Tag Indexing Method Based on Lexical Similarity (XML 태그를 분류에 따른 가중치 결정)

  • Jeong, Hye-Jin;Kim, Yong-Sung
    • The KIPS Transactions:PartB / v.16B no.1 / pp.71-78 / 2009
  • To extract indices and determine index weights more effectively, studies extract indices using document structure as well as content. However, most of these studies concentrate on calculating the importance of context rather than that of XML tags, and they determine tag importance from common sense rather than verifying it through objective experiment. For automatic indexing using the tag information of XML documents, which have become the standard for web document management, this paper classifies the major tags that make up a paper according to their importance and calculates the weights of terms extracted from low-weight tags. Using the weights obtained, it proposes a method for calculating the final weight by updating the weights of terms extracted from high-weight tags. To determine weights more objectively, the paper tests which tags users consider important and reflects the result in the weight calculation by classifying tag importance accordingly. It then verifies the effectiveness of the proposed index weights by comparing search performance against that obtained with index weights calculated using an existing method of determining tag importance.

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors in that it relies entirely on natural and climatic factors. Climate change has many effects on land and agriculture, including reduced annual rainfall, pests, heat waves, changes in sea level, and fluctuations in global ozone and atmospheric CO2. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed with natural language processing techniques, and the tags whose dimensions are reduced are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the metrics used to analyze performance. The proposed methodology gives a precision score of 94.40%, compared with the decision tree (83.94%) and the K-nearest neighbor algorithm (86.89%), for agricultural ontology construction.
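
As a rough illustration of the sentence similarity mentioned in the abstract above, the sketch below computes the standard Jaccard index over word sets. It is not the paper's Jaccard relative extractor, whose details the abstract does not give; the whitespace tokenizer and the function names are assumptions made for the example.

```python
def tokens(text):
    """Lowercase, whitespace-split word set (a simplifying assumption)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard index: size of the shared word set divided by size of the combined word set."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sentences are treated as identical
    return len(ta & tb) / len(ta | tb)
```

For example, `jaccard("rice needs water", "rice needs sunlight")` shares two of four distinct words and evaluates to 0.5.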

Analyzing dependency of Korean subordinate clauses using a composite kernel (복합 커널을 사용한 한국어 종속절의 의존관계 분석)

  • Kim, Sang-Soo;Park, Seong-Bae;Park, Se-Young;Lee, Sang-Jo
    • Korean Journal of Cognitive Science / v.19 no.1 / pp.1-15 / 2008
  • Analyzing the dependency relations among clauses is one of the most critical parts of parsing Korean sentences because it generates severe ambiguities. To analyze these relations successfully, the task has been addressed with various machine learning methods, including SVMs. Kernel methods in particular are commonly used for dependency analysis and are reported to show high performance. This paper proposes an expression and a composite kernel for dependency analysis of Korean clauses. The proposed expression adopts a composite kernel to obtain the similarity among clauses. The composite kernel consists of a parse tree kernel and a linear kernel: the parse tree kernel handles structural information, and the linear kernel handles lexical information. The proposed expression is defined as three types: an expression of layers in a clause, a relation expression between clauses, and an expression of the inner clause. The experiment proceeds in two steps, first with the relation expression between clauses and then with the expression of inner clauses. The experimental results show that the proposed expression achieves an accuracy of 83.31%.
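
One common way to combine a structural kernel with a lexical one is a weighted sum, sketched below. The abstract above does not say how the two kernels are combined, so the weighted sum, the `alpha` parameter, and the simplified tree kernel (a count of shared tree fragments rather than a full parse tree kernel) are all assumptions made for illustration.

```python
def tree_kernel(frags_a, frags_b):
    """Simplified stand-in for a parse tree kernel: number of shared tree fragments."""
    return float(len(set(frags_a) & set(frags_b)))


def linear_kernel(x, y):
    """Dot product of two lexical feature vectors of equal length."""
    return float(sum(xi * yi for xi, yi in zip(x, y)))


def composite_kernel(a, b, alpha=0.5):
    """Weighted sum of the structural and lexical kernels (assumed combination rule).

    Each clause is a pair: (collection of tree fragments, lexical feature vector).
    """
    frags_a, vec_a = a
    frags_b, vec_b = b
    structural = tree_kernel(frags_a, frags_b)
    lexical = linear_kernel(vec_a, vec_b)
    return alpha * structural + (1 - alpha) * lexical
```

A weighted sum of two valid kernels with non-negative weights is itself a valid kernel, which is why this form is a frequent choice when a composite kernel is passed to an SVM.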

Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization (부분 단어 토큰화 기법을 이용한 뉴스 기사 정치적 편향성 자동 분류 및 어휘 분석)

  • Cho, Dan Bi;Lee, Hyun Young;Jung, Won Sup;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.1-8 / 2021
  • Political news articles show polarized and biased characteristics, such as conservative and liberal leanings, which are called political bias. We constructed a keyword-based dataset to classify the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In our work, we expect the number of unknown tokens to be reduced if sentences are composed of subwords segmented by a language model. We propose a document embedding model with subword tokenization and apply it to an SVM and a feedforward neural network to classify political bias. Compared with the document embedding model based on morphological analysis, the document embedding model with subwords showed the highest accuracy, at 78.22%. We confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model in our bias classification task, we extracted keywords based on politicians. The bias of the keywords was verified by their average similarity with the vectors of politicians of each political tendency.
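
The verification step in the last sentence of the abstract above can be sketched as follows. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the function names and the toy two-dimensional vectors in the usage note are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(keyword_vec, group_vecs):
    """Mean cosine similarity between a keyword vector and one group's politician vectors."""
    return sum(cosine(keyword_vec, v) for v in group_vecs) / len(group_vecs)


def keyword_bias(keyword_vec, groups):
    """Name of the group whose politician vectors are, on average, closest to the keyword."""
    return max(groups, key=lambda name: average_similarity(keyword_vec, groups[name]))
```

With toy data such as `groups = {"conservative": [[1, 0], [1, 0]], "liberal": [[0, 1], [0, 1]]}`, the call `keyword_bias([1, 0], groups)` returns `"conservative"`.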