• Title/Summary/Keyword: Text Representation Model

A Multi-level Representation of the Korean Narrative Text Processing and Construction-Integration Theory: Morpho-syntactic and Discourse-Pragmatic Effects of Verb Modality on Topic Continuity (한국어 서사 텍스트 처리의 다중 표상과 구성 통합 이론: 주제어 연속성에 대한 양태 어미의 형태 통사적, 담화 화용적 기능)

  • Cho Sook-Whan;Kim Say-Young
    • Korean Journal of Cognitive Science / v.17 no.2 / pp.103-118 / 2006
  • The main purpose of this paper is to investigate the effects of discourse topic and morpho-syntactic verbal information on the resolution of null pronouns in Korean narrative text within the framework of the construction-integration theory (Kintsch, 1988; Singer & Kintsch, 2001; Graesser, Gernsbacher, & Goldman, 2003). Two conditions were designed: an explicit condition with both a consistently maintained discourse topic and person-specific verb modals, and a neutral condition with neither a discourse topic nor morpho-syntactic information. We measured the reading times for the target sentence containing a null pronoun, the question response times for finding an antecedent, and the accuracy rates for finding an antecedent. During the experiments, each passage was presented one sentence at a time on a computer-controlled display: each new sentence appeared on the screen when the participant pressed a button on the computer keyboard. The main findings indicate that processing is facilitated by macro-structure (topicality) in conjunction with micro-structure (morpho-syntax) in pronoun interpretation. It is speculated that global processing alone may not be able to determine which potential antecedent is to be focused on unless aided by lexical information. It is argued that the results largely support the resonance-based model but not the minimalist hypothesis.

Improved Transformer Model for Multimodal Fashion Recommendation Conversation System (멀티모달 패션 추천 대화 시스템을 위한 개선된 트랜스포머 모델)

  • Park, Yeong Joon;Jo, Byeong Cheol;Lee, Kyoung Uk;Kim, Kyung Sun
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.138-147 / 2022
  • Recently, chatbots have been applied in various fields with good results, and e-commerce platforms are making many attempts to use chatbots in shopping mall product recommendation services. This paper addresses a conversation system that recommends the fashion items a user wants, based on the conversation between the user and the system and on fashion image information. It builds on the transformer model, which currently performs well in various AI fields such as natural language processing, voice recognition, and image recognition. We propose an improved multimodal transformer model that increases recommendation accuracy by using dialogue (text) and fashion (image) information together in data preprocessing and data representation. We also propose a method to improve accuracy through data improvement based on an analysis of the data. The proposed system achieves a recommendation accuracy of 0.6563 WKT (weighted Kendall's tau), an improvement of 0.3191 WKT over the existing system's 0.3372 WKT.
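
The abstract scores recommendations with weighted Kendall's tau but does not give the exact definition used. The following is a minimal sketch of one simplified variant, assuming no tied scores and an additive hyperbolic weight per pair; the function name `weighted_kendall_tau` and the weighting scheme are illustrative assumptions, not the paper's evaluation code.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def weighted_kendall_tau(reference, predicted):
    """Simplified weighted Kendall's tau for two score lists without ties.

    Each pair (i, j) is weighted by 1/(rank_i + 1) + 1/(rank_j + 1), where
    rank 0 is the item with the highest reference score, so disagreements
    near the top of the ranking cost more. The weighting is an assumption
    made for illustration.
    """
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    # rank[item] = position of the item when sorted by descending reference score
    order = sorted(range(len(reference)), key=lambda i: -reference[i])
    rank = {item: pos for pos, item in enumerate(order)}
    agreement = 0.0
    total = 0.0
    for i, j in combinations(range(len(reference)), 2):
        weight = 1.0 / (rank[i] + 1) + 1.0 / (rank[j] + 1)
        agreement += (
            weight
            * sign(reference[i] - reference[j])
            * sign(predicted[i] - predicted[j])
        )
        total += weight
    return agreement / total


# Hypothetical scores for four candidate outfits.
reference_scores = [4, 3, 2, 1]
top_swapped = [3, 4, 2, 1]      # the two best items are in the wrong order
bottom_swapped = [4, 3, 1, 2]   # the two worst items are in the wrong order
print(weighted_kendall_tau(reference_scores, top_swapped))
print(weighted_kendall_tau(reference_scores, bottom_swapped))
```

With this weighting, a mistake at the top of the ranking lowers the score more than the same mistake at the bottom. SciPy's `scipy.stats.weightedtau` offers a fuller implementation.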

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, which can now closely imitate natural human speech. In particular, TTS models that offer varied voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.
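
A mean opinion score is the average of listener ratings, usually on a 1-to-5 scale. The sketch below shows that computation with an approximate 95% confidence interval; the ratings, the function name `mean_opinion_score`, and the normal-approximation interval are assumptions for illustration and do not reproduce the paper's listening test.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        # A single rating gives no estimate of spread.
        return mos, 0.0
    # Normal approximation: 1.96 standard errors on either side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical 1-to-5 naturalness ratings from four listeners.
naturalness_ratings = [3, 4, 3, 4]
mos, ci = mean_opinion_score(naturalness_ratings)
print(f"NMOS = {mos:.2f} +/- {ci:.2f}")
```

The same function applies to similarity ratings (SMOS); only the question put to the listeners changes.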

A Study on the Conceptual Modeling and Implementation of a Semantic Search System (시맨틱 검색 시스템의 개념적 모형화와 그 구현에 대한 연구)

  • Han, Dong-Il;Kwon, Hyeong-In;Chong, Hak-Jin
    • Journal of Intelligence and Information Systems / v.14 no.1 / pp.67-84 / 2008
  • This paper proposes a design and implementation of a semantic search system. The proposed model comprises three architecture layers, conceptually named Knowledge Acquisition, Knowledge Representation, and Knowledge Utilization. The three layers are designed to work together interactively so as to best satisfy users' information needs. The Knowledge Acquisition Layer covers the indexing and storage of semantic metadata from various sources of web content (e.g., text, images, and multimedia). The Knowledge Representation Layer covers the ontology schema and instances, and supports semantic search through ontology-based query expansion. Finally, the Knowledge Utilization Layer lets users enter search queries intuitively and obtain results without any knowledge of semantic web languages or ontologies. As far as the design and implementation of a semantic search site are concerned, the proposed semantic search system offers useful implications for researchers and practitioners seeking to move from research to commercial use.
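
Ontology-based query expansion, as named in the abstract, can be illustrated with a toy example. In the sketch below, the ontology, the metadata index, and the function names `expand_query` and `semantic_search` are all hypothetical; the paper's actual schema and query language are not described in the abstract.

```python
# Hypothetical toy ontology: each concept maps to its narrower concepts.
ONTOLOGY = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "suv"],
}

# Hypothetical semantic metadata index: document id -> annotated concepts.
METADATA_INDEX = {
    "doc1": {"sedan"},
    "doc2": {"bicycle"},
    "doc3": {"recipe"},
}


def expand_query(term, ontology):
    """Return the term plus every narrower concept reachable from it."""
    expanded = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue  # already visited; also guards against cycles
        expanded.add(current)
        stack.extend(ontology.get(current, []))
    return expanded


def semantic_search(term, ontology, index):
    """Return ids of documents annotated with any concept in the expansion."""
    concepts = expand_query(term, ontology)
    return sorted(doc for doc, tags in index.items() if tags & concepts)


print(semantic_search("vehicle", ONTOLOGY, METADATA_INDEX))
```

A plain keyword match on "vehicle" would find nothing here, because no document carries that exact tag; the expansion step is what connects the query to documents tagged with narrower concepts.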

Extracting and Clustering of Story Events from a Story Corpus

  • Yu, Hye-Yeon;Cheong, Yun-Gyung;Bae, Byung-Chull
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3498-3512 / 2021
  • This article describes how the events that make up text stories can be represented and extracted. We also report the results of a simple experiment on extracting and clustering events by emotion, under the assumption that different emotional events can be associated with the classified clusters. Each emotion cluster is based on Plutchik's model of eight basic emotions, and the attributes of NLTK-VADER are used as the classification criterion. Comparison with human raters shows lower accuracy for certain emotion types, while emotion types such as joy and sadness show relatively high accuracy. Evaluation against the NRC Word-Emotion Association Lexicon (also known as EmoLex) shows high accuracy (more than 90% for anger, disgust, fear, and surprise), though precision and recall are relatively low.
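
The abstract reports high accuracy alongside low precision and recall. One way such a pattern can arise (an assumption here, not something the abstract states) is one-vs-rest scoring of a rare emotion, where the many true negatives inflate accuracy. The sketch below uses made-up labels and a hypothetical helper `one_vs_rest_metrics` to show the effect.

```python
def one_vs_rest_metrics(gold, predicted, label):
    """Return (accuracy, precision, recall) for one label against all others."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    tn = sum(g != label and p != label for g, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up data: 20 story events, only 2 of which are truly "anger".
gold = ["anger", "anger"] + ["joy"] * 18
predicted = ["anger", "joy", "anger"] + ["joy"] * 17

accuracy, precision, recall = one_vs_rest_metrics(gold, predicted, "anger")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, the classifier finds only one of the two anger events and raises one false alarm, yet 18 of 20 decisions count as correct for the anger class.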

Comparative Study of Sentiment Analysis Model based on Korean Linguistic Characteristics (한국어 언어학적 특성 기반 감성분석 모델 비교 분석)

  • Kim, Gyeong-Min;Park, Chanjun;Jo, Jaechoon;Lim, Heui-Seok
    • Annual Conference on Human and Language Technology / 2019.10a / pp.149-152 / 2019
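  • Sentiment analysis is a field of natural language processing that classifies the sentiment of input text, and many recent studies have applied deep learning techniques such as CNNs, RNNs, and Transformers. For Korean sentiment analysis, using additional features such as morphemes and syllables is effective and can be expected to improve performance. In building a model, the architecture matters, but so does the knowledge representation scheme that lets a computer represent language in context. In this respect, BERT is a language-representation-based model that can understand context in a fully bidirectional manner. In this paper, we compare the performance of a recent model combining CNN and RNN with the Transformer-based Korean KoBERT model on the sentiment analysis task from various angles. The results show that the eojeol-level (word-phrase-level) Korean KoBERT model achieved a performance of 90.50%.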

Effective Method to Change Multimedia Scene Configuration Information Using DOM Update (DOM update를 이용한 효율적인 멀티미디어 장면 구성 정보 변경 방안)

  • Kim, Kyuheon;Park, JungWook;Kim, Byungchul
    • Journal of Broadcast Engineering / v.18 no.1 / pp.43-58 / 2013
  • A rich media service is an interactive media service that presents viewers with various multimedia elements (such as video, audio, and text) at the same time. These elements can be delivered using scene description standards such as BIFS (Binary Format for Scenes) and LASeR (Lightweight Application Scene Representation). By providing scene component information, a rich media service can support a variety of multimedia services, so users can receive personalized services that fit their temporal and spatial options. In conventional technology, when the scene is changed by the user or the service, the mobile device deletes the existing scene configuration information and creates new scene configuration information, which is very inefficient. In this paper, we propose a method that changes the scene by using the DOM (Document Object Model) to pass only the dynamically changed part of the configuration.
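
The contrast between rebuilding a scene and passing only its changed part can be sketched with a scene modeled as a dictionary of elements. The command names (`delete`, `insert`, `replace`), the element ids, and the functions `diff_scene` and `apply_updates` are hypothetical; they are not taken from the paper, BIFS, or LASeR. For brevity the sketch does not handle attributes that are removed from an element.

```python
def diff_scene(old, new):
    """Return update commands that turn the old scene into the new one."""
    commands = []
    for element_id in sorted(old.keys() - new.keys()):
        commands.append(("delete", element_id, None))
    for element_id, attrs in new.items():
        if element_id not in old:
            commands.append(("insert", element_id, attrs))
            continue
        # Keep only the attributes whose values actually changed.
        changed = {k: v for k, v in attrs.items() if old[element_id].get(k) != v}
        if changed:
            commands.append(("replace", element_id, changed))
    return commands


def apply_updates(scene, commands):
    """Apply update commands to a copy of the scene and return the copy."""
    updated = {element_id: dict(attrs) for element_id, attrs in scene.items()}
    for action, element_id, payload in commands:
        if action == "delete":
            updated.pop(element_id, None)
        elif action == "insert":
            updated[element_id] = dict(payload)
        elif action == "replace":
            updated[element_id].update(payload)
    return updated


# Hypothetical scene before and after a user moves the video and a banner appears.
old_scene = {
    "video1": {"x": 0, "y": 0, "src": "a.mp4"},
    "caption": {"text": "Hello"},
    "logo": {"src": "logo.png"},
}
new_scene = {
    "video1": {"x": 100, "y": 0, "src": "a.mp4"},
    "caption": {"text": "Hello"},
    "banner": {"text": "Sale"},
}

updates = diff_scene(old_scene, new_scene)
print(updates)
```

Only three small commands are produced; the unchanged caption and the unchanged video attributes are never retransmitted.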

The Case Study for The Construction of Similarities and Affordance (유사성 구성과 어포던스(affordance)에 대한 사례 연구 -대수 문장제 해결 과정에서-)

  • Park, Hyun-Jeong
    • The Mathematical Education / v.46 no.4 / pp.371-388 / 2007
  • This case study examines, from the perspective of affordance, how three middle school students perceive and activate previous knowledge while solving algebra word problems. The results showed that every subject first perceived the text as an affordance conveying superficial similarities, that is, a work (painting) situation rather than the problem structure, and then activated related solution knowledge on the basis of earlier problem-solving experience similar to the current situation. The subjects' processes for applying solution knowledge fell largely into two types. In the first type, the numerical information connected with the described problem situation, or a symbolic representation of its mathematical meaning, was transformed and applied with a solution formula suited to the current problem. This process was achieved by constructing a virtual mental model of the mathematical situation as the solver read the problem and integrated the symbolized information from the text. In the second type, the subjects symbolized a formal mathematical concept that was not connected with the problem situation described by the numerical information or the mathematical meaning of the text; they perceived superficial phrases or words in the problem as affordances and applied a previously used algorithmic formula as it was. In conclusion, the results suggest that many students store only algorithmic knowledge in memory through previous problem-solving experience, and that these memories are tied to particular phrases in the problems. The study also found that problem solving was most successful when reflection, normally the last step of problem solving, was carried out during the stages of understanding the problem and making a plan.

Multimedia Document Databases : Representation, Query Processing and Navigation

  • Kalakota, Ravi S.;Whinston, Andrew B.
    • The Journal of Information Technology and Database / v.1 no.1 / pp.31-62 / 1994
  • Information systems for application areas like office automation, customer service, or computer-aided manufacturing are usually highly interactive and deal with complex document structures composed of multiple media formats. To realize these systems, nonstandard database systems, which we call document databases, need to handle different types of coarse- and fine-grained document objects (like full-text documents, graphics, and images), hierarchical and non-hierarchical relationships between objects (like composition links and cross-references using hypertext structures), and document attributes of different types such as formatting/presentation information and access control. In this paper, we present the underlying data model for document databases based on descriptive markup languages that provide mechanisms for specifying the logical structure (or schema) of individual documents stored in the database. We then describe extensions to the data model for supporting the notion of composite structures ("join" operators for documents): composition and hyperlinking mechanisms for representing compound documents and inter-linked documents as unique entities separate from their components. Furthermore, due to the interactive nature of the application domains, the database system, in conjunction with clients (or browsers), has to support visual navigation and graphical query mechanisms. We describe the functionality of a new user interface paradigm called HyBrow for meeting the above-mentioned requirements. The underlying implementation strategy is also discussed.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. These text data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization. This problem has drawn the interest of many researchers in managing such information, and it also calls for professionals capable of classifying relevant information; hence text classification was introduced. Text classification is a challenging task in modern data analysis, in which a text document must be assigned to one or more predefined categories or classes. Various techniques are available in the text classification field, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build a classifier on the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and the target data differ, the features of the two may also differ. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and combine them into the same generalization. Therefore, in order to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
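
The abstract does not spell out how RSESLA selects documents or rules. As a loosely related illustration only, the sketch below shows a generic confidence-based filter often used in ensemble semi-supervised learning: an unlabeled document receives a pseudo-label only when every classifier agrees and the mean confidence clears a threshold. The function `select_pseudo_labels`, the threshold, and the data are hypothetical and should not be read as the RSESLA procedure.

```python
from collections import Counter


def select_pseudo_labels(predictions, threshold=0.8):
    """Pick unlabeled documents that the whole ensemble labels confidently.

    predictions maps a document id to a list of (label, confidence) tuples,
    one tuple per classifier in the ensemble.
    """
    selected = {}
    for doc_id, votes in predictions.items():
        label, count = Counter(lbl for lbl, _ in votes).most_common(1)[0]
        if count != len(votes):
            continue  # the classifiers disagree, so skip this document
        mean_confidence = sum(conf for _, conf in votes) / len(votes)
        if mean_confidence >= threshold:
            selected[doc_id] = label
    return selected


# Hypothetical ensemble outputs for unlabeled news, Twitter, and blog documents.
ensemble_predictions = {
    "news_17": [("sports", 0.95), ("sports", 0.90), ("sports", 0.85)],
    "tweet_03": [("sports", 0.90), ("politics", 0.80), ("sports", 0.70)],
    "blog_42": [("politics", 0.60), ("politics", 0.55), ("politics", 0.65)],
}

print(select_pseudo_labels(ensemble_predictions))
```

Filtering of this kind addresses the concern raised in the abstract that unlabeled data can degrade a classifier: documents with conflicting or weak predictions are simply left out of the next training round.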