• Title/Summary/Keyword: data dictionary

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.83-105
    • /
    • 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). 
These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB. Consistent with the literature, the results showed that gender and age affect the SWB state when the individual is exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
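
The idea of contextual polarity described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the words, the scores, and the names `GENERAL_LEXICON`, `CONTEXT_LEXICON`, and `score_text` are all hypothetical. It only assumes, as the abstract states, that a general lexicon assigns polarities between -1.0 and 1.0 and that a word which is neutral in general may carry polarity for a specific SWB factor.

```python
from typing import Dict, Optional

# Hypothetical general-purpose lexicon with polarities in [-1.0, 1.0].
GENERAL_LEXICON: Dict[str, float] = {
    "happy": 0.8,
    "terrible": -0.9,
    "deadline": 0.0,   # neutral in the general lexicon
    "overtime": 0.0,   # neutral in the general lexicon
}

# Hypothetical contextual lexicon: per-factor overrides for words that are
# neutral in general but carry polarity for a given SWB factor.
CONTEXT_LEXICON: Dict[str, Dict[str, float]] = {
    "stress": {"deadline": -0.6, "overtime": -0.5},
}


def score_text(text: str, factor: Optional[str] = None) -> float:
    """Average the polarity of scored words; contextual entries take priority."""
    contextual = CONTEXT_LEXICON.get(factor, {}) if factor else {}
    polarities = []
    for token in text.lower().split():
        polarity = contextual.get(token, GENERAL_LEXICON.get(token, 0.0))
        if polarity != 0.0:  # ignore words without sentiment
            polarities.append(polarity)
    return sum(polarities) / len(polarities) if polarities else 0.0
```

With the general lexicon alone, `score_text("deadline overtime")` finds no sentiment words and returns 0.0. Passing `factor="stress"` makes both words count as negative, which is the kind of sentence the abstract says conventional methods overlook.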

Development of Emotional Word Collection System using Hash Tag of SNS (SNS의 해시태그를 이용한 감정 단어 수집 시스템 개발)

  • Lee, Jong-Hwa;Lee, Yun-Jae;Lee, Hyun-Kyu
    • The Journal of Information Systems
    • /
    • v.27 no.2
    • /
    • pp.77-94
    • /
    • 2018
  • Purpose: As the amount of data has become enormous, more effort is needed to find the necessary information. Curation, a term borrowed from the museum curator, is a service that helps people collect, share, and assign value to content on the Internet. On SNS, hashtags (the # tag) are used to convey emotional vocabulary between users. Findings: This study is based on seven emotional sets ('Happy', 'Angry', 'Sad', 'Bad', 'Fearful', 'Surprised', 'Disgusted'), from which 327 emotional seeds were constructed; the autofill function of a web browser was then used to collect 1.5 million emotional words from the emotional seeds. The emotional dictionary of this study is considered meaningful as a tool for making emotional judgments from unstructured data.
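
As a rough illustration of growing an emotional vocabulary from seed hashtags, the sketch below groups hashtags that co-occur with a seed under that seed's emotion set. This is a swapped-in technique: the study itself collects words through a web browser's autofill function, which is not reproduced here. The seed table `SEEDS`, the sample posts, and the function name `collect_emotion_words` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

HASHTAG = re.compile(r"#(\w+)")

# Hypothetical seed hashtags mapped to emotion sets named in the abstract.
SEEDS: Dict[str, str] = {
    "happy": "Happy",
    "joyful": "Happy",
    "furious": "Angry",
}


def collect_emotion_words(posts: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect non-seed hashtags that co-occur with a seed, per emotion set."""
    collected: Dict[str, Set[str]] = defaultdict(set)
    for post in posts:
        tags = [tag.lower() for tag in HASHTAG.findall(post)]
        emotions = {SEEDS[tag] for tag in tags if tag in SEEDS}
        for emotion in emotions:
            for tag in tags:
                if tag not in SEEDS:
                    collected[emotion].add(tag)
    return dict(collected)
```

Posts without any seed hashtag contribute nothing, so the size and quality of the collected vocabulary depend directly on the seed list.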

Performance of speech recognition unit considering morphological pronunciation variation (형태소 발음변이를 고려한 음성인식 단위의 성능)

  • Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.111-119
    • /
    • 2018
  • This paper proposes a method to improve speech recognition performance by extracting various pronunciations of the pseudo-morpheme unit from an eojeol-unit corpus and generating a new recognition unit that considers pronunciation variations. In the proposed method, we first align the pronunciations of the eojeol units and the pseudo-morpheme units, and then expand the pronunciation dictionary by extracting new pronunciations of the pseudo-morpheme units from the pronunciations of the eojeol units. Then, we propose a new recognition unit that depends on pronunciation by tagging the obtained phoneme symbols according to the pseudo-morpheme units. The proposed units and their extended pronunciations are incorporated into the lexicon and language model of the speech recognizer. Experiments for performance evaluation are performed using a Korean speech recognizer with a trigram language model trained on a 100-million pseudo-morpheme corpus and an acoustic model trained on 445 hours of multi-genre broadcast speech data. The proposed method is shown to reduce the word error rate by a relative 13.8% on the news-genre evaluation data and by 4.5% on the total evaluation data.
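
The results above are reported as relative reductions in word error rate (WER). The sketch below shows how WER and a relative reduction are conventionally computed; it is a generic illustration, not the paper's evaluation code. The function names are hypothetical, and the abstract does not give the absolute WER values behind its percentages.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the new WER is lower than the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer
```

A relative reduction is measured against the baseline rather than in absolute percentage points: a drop from a WER of 20% to 15% is a 25% relative reduction.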

Work Experience of Irregular Clinical Research Nurses (비정규직 임상연구 간호사의 근무경험)

  • Kim, Hae-Ok
    • Journal of Digital Contents Society
    • /
    • v.16 no.4
    • /
    • pp.623-634
    • /
    • 2015
  • This research aims to perform an in-depth investigation of the meanings and essence of working as clinical research nurses in local general hospitals. In order to interpret and reveal the meanings of the role experience, data were collected from 7 participants over 3 months. Data were analyzed using Spradley's ethnographic research tools. The themes derived from this study were 'new experience about social learning process' and 'joys and sorrows through study participants', 'lack of specialized learning course in nursing curriculums' and 'roles of general research planner', 'one's own work space' and 'proactive work environment that is relaxing and filled with consideration for others', and 'hardship of being temporary employees'. Clinical research nurses have experienced expansion of roles through new social learning processes. In conclusion, this study will provide useful basic data to develop a new curriculum on clinical research nursing for nursing students and to improve working conditions for clinical research nurses.
  • The purpose of this study is to design and implement a sign language dictionary for the deaf to understand information communication terminologies. When deaf people who have difficulties in communication use the internet, they can get help from this dictionary in accessing various types of information and expressing their intentions. In order for the deaf to utilize the internet as efficiently as hearing people, they must first understand information communication terminologies.

Annotation of a Non-native English Speech Database by Korean Speakers

  • Kim, Jong-Mi
    • Speech Sciences
    • /
    • v.9 no.1
    • /
    • pp.111-135
    • /
    • 2002
  • An annotation model of a non-native speech database has been devised, wherein English is the target language and Korean is the native language. The proposed annotation model features overt transcription of predictable linguistic information in native speech by the dictionary entry and several predefined types of error specification found in native language transfer. The proposed model is, in that sense, different from previously explored annotation models in the literature, most of which are based on native speech. The validity of the newly proposed model is shown by its consistent annotation of 1) salient linguistic features of English, 2) contrastive linguistic features of English and Korean, 3) actual errors reported in the literature, and 4) the newly collected data in this study. The annotation method in this model adopts two widely accepted conventions, the Speech Assessment Methods Phonetic Alphabet (SAMPA) and the TOnes and Break Indices (ToBI). In the proposed annotation model, SAMPA is exclusively employed for segmental transcription and ToBI for prosodic transcription. The annotation of non-native speech is used to assess the speaking ability of English as a Foreign Language (EFL) learners.

Korean Nominal Bank, Using Language Resources of Sejong Project (세종계획 언어자원 기반 한국어 명사은행)

  • Kim, Dong-Sung
    • Language and Information
    • /
    • v.17 no.2
    • /
    • pp.67-91
    • /
    • 2013
  • This paper describes Korean Nominal Bank, a project that provides argument structure for instances of the predicative nouns in the Sejong parsed corpus. We use the language resources of the Sejong project so that the same set of data is annotated with more and more levels of annotation, since a new type of language resource building project could bring new information from separate and isolated processing. We base the annotation scheme on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. Our work also involves deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternation of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain various sentence types, including empty categories, raising, and left (or right) dislocation. We also need an explanation of idiosyncratic lexical features, such as collocation.

Linear Programming Model Discovery from Databases Using GPS and Artificial Neural Networks (GPS와 인공신경망을 활용한 데이터베이스로부터의 선형계획모형 발견법)

  • 권오병;양진설
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.25 no.3
    • /
    • pp.91-107
    • /
    • 2000
  • The linear programming model is a special form of useful knowledge that is embedded in a database. Since formulating models from scratch requires knowledge-intensive effort, knowledge-based formulation support systems have been proposed in the Decision Support Systems area. However, they rely on the assumption that sufficient domain knowledge has already been captured in a specific knowledge representation form. Hence, the purpose of this paper is to propose a methodology that finds useful knowledge for building linear programming models from a database. The methodology consists of two parts. The first part is to find a first-cut model based on a data dictionary. To do so, we applied the General Problem Solver (GPS) algorithm. The second part is to discover a second-cut model by applying a neural network technique. An illustrative example is described to show the feasibility of the proposed methodology.

A Study on the Actual Pronunciation of the Words of Foreign Origin and the Related Rules (외래어의 발음 실태와 발음 규정)

  • Cha Jae-Eun
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.17-20
    • /
    • 2006
  • The purpose of this paper is to investigate the actual pronunciation of words of foreign origin on TV news programs, and to review the related regulations. To investigate the actual pronunciation of the foreign words, the frequency data of the National Korean Language Institute are used as the subject of investigation. There is a big gap between the actual pronunciation and the orthography of words of foreign origin. An accepted standard pronunciation of foreign words is needed to teach or learn Korean efficiently. I suggest that the pronunciation of foreign words be marked in Korean dictionaries instead of revising the related regulations.

A Study on the Efficent Propulsion of Customer Relationship Management System for Library (도서관 CRM 시스템의 효율적 추진에 관한 연구)

  • You Yang-Keun
    • Journal of Korean Library and Information Science Society
    • /
    • v.35 no.3
    • /
    • pp.251-270
    • /
    • 2004
  • The purpose of this study is to introduce customer relationship management (CRM) as a means of providing information that better satisfies users, through the relationship between user-centered library management and library customers. The characteristics of library customers' information needs and a general CRM system design are introduced. The result is a plan for a library CRM system, which includes a conceptual modeling design for the CRM system, a data dictionary, and event classes.

Standardization of DRM Technologies in MPEG-21 (MPEG-21의 DRM 기술 표준화 현황 분석)

  • Jeong, Senator
    • Journal of Information Management
    • /
    • v.35 no.2
    • /
    • pp.107-130
    • /
    • 2004
  • MPEG-21 is an open standard framework for the creation, delivery, and consumption of digital content in an interoperable, rights-managed, and protected way. Focusing on DRM technologies, this paper covers the concepts and ongoing activities of the parts of MPEG-21: Digital Item Declaration, which defines the base unit of trade and delivery, Digital Item Identification, Intellectual Property Management & Protection, Rights Data Dictionary, Rights Expression Language, Persistent Association Technology, Event Reporting, and so on.