• Title/Summary/Keyword: word dictionary

A Method of Analyzing Sentiment Polarity of Multilingual Social Media: A Case of Korean-Chinese Languages (다국어 소셜미디어에 대한 감성분석 방법 개발: 한국어-중국어를 중심으로)

  • Cui, Meina;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.91-111 / 2016
  • For social media-based marketing, it is crucial to perform sentiment analysis on the unstructured text written by potential consumers of a company's products and services. Companies interested in global business in particular must collect and analyze data from multinational social media (e.g., YouTube, Instagram). Because these texts are multilingual, they are usually translated into a single target language before sentiment analysis is conducted. However, because translation does not account for cultural differences and high-quality dictionaries are lacking, translated sentences often lose their true meaning, which lowers the quality of sentiment analysis. Hence, this study proposes a method for multilingual sentiment analysis, focusing on Korean-Chinese cases, that avoids language translation. To show the feasibility of the idea, we compare the performance of the proposed method with that of legacy methods that rely on language translators. The results suggest that our method outperforms the legacy methods in terms of RMSE and can be applied by global business institutions.
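
The abstract above does not describe the scoring procedure, so the following is only a minimal sketch of the general idea: score each text with a lexicon in its own language (no translation step) and evaluate with RMSE. The `LEXICONS` entries, the `score` function, and the averaging rule are illustrative assumptions, not the authors' method.

```python
import math

# Hypothetical toy lexicons, one per language (not taken from the paper).
LEXICONS = {
    "ko": {"좋다": 1.0, "나쁘다": -1.0},
    "zh": {"好": 1.0, "差": -1.0},
}


def score(tokens, lang):
    """Average the polarity of the tokens found in the lexicon for `lang`."""
    lexicon = LEXICONS[lang]
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

A lower RMSE against human-labeled polarity would indicate a better method, which is the comparison the abstract reports.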

Critical Approach to the Discourse of Livelihood in Korean Newspaper's Editorial (민생 없는 민생 담론 -한국 종합일간지 사설에 대한 비판적 담론 분석)

  • Lee, JungMin;Lee, SangKhee
    • Korean journal of communication and information / v.67 / pp.88-118 / 2014
  • This study attempted to (1) clarify the meaning of 'people's livelihood' (Minsaeng, 民生) as conveyed by newspapers in Korean society and the specific matters it refers to, and (2) examine the discourse formed by the newspapers and what does and does not change in that discourse over time. Editorials were classified and analyzed using the framework of Fairclough's critical discourse analysis (CDA). From a political perspective, the discourse was clearly formed and changed with each administration. The discourse on 'people's livelihood' was critical and, at the same time, generally negative because it dealt with the important social incidents or controversies of the time. It was related to the broad social currents of Korea's democratization and globalization. Seen from an economic perspective, the discourse of the 1990s addressed labor strikes, inflation, housing problems, and the financial crisis, whereas the discourse of the 2000s shifted to issues ranging from economic growth and distribution to polarization, job creation, and the abolition of non-regular employment. The meaning of 'people's livelihood' produced in the editorials of the major daily newspapers differs from the word's dictionary definition, 'the people's lives'.

A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments (제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue;Chung, Woo-Keun
    • Journal of KIISE:Software and Applications / v.37 no.10 / pp.788-801 / 2010
  • Mobile devices are advancing remarkably, so research on mobile input methods is becoming more important. Many input methods exist, such as keypads, QWERTY keypads, touch, and speech recognition, but they are less convenient than typical desktop keyboards, so input strings usually contain many typing errors. These errors cause little trouble in communication between people, but they are a critical problem when searching a database such as a dictionary or address book, because correct results cannot be obtained. Hangeul in particular has more than 10,000 distinct characters, since each Hangeul character is a combination of consonants and vowels, so errors are more frequent than in English. The suffix tree is the most widely used data structure for handling query errors, but it does not cover the full variety of errors. In this paper, we propose a fast approximate Korean word searching system that tolerates a variety of typing errors. The system includes several algorithms for applying general approximate string searching to Hangeul. We also present a profanity filter built on the proposed system, which filters more than 90% of coined profanities.
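
The abstract does not spell out its algorithms, so the sketch below swaps in a plain Levenshtein distance computed over phonemes (jamo) instead of whole syllables. It assumes Unicode NFD normalization as the decomposition step; `to_jamo`, `levenshtein`, and `search` are illustrative names, and a linear scan stands in for whatever index the real system uses.

```python
import unicodedata


def to_jamo(text):
    """Decompose Hangeul syllables into jamo via Unicode NFD normalization."""
    return unicodedata.normalize("NFD", text)


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def search(query, words, max_distance=1):
    """Return the words whose jamo-level distance to the query is small enough."""
    q = to_jamo(query)
    return [w for w in words if levenshtein(q, to_jamo(w)) <= max_distance]
```

For example, `search("한굴", ["한글", "한국", "사전"])` keeps the first two words, since each differs from the query by a single jamo.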

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia (위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법)

  • Lee, Hokyung;An, Jaehyun;Yoon, Jeongmin;Bae, Kyoungman;Ko, Youngjoong
    • Journal of KIISE / v.44 no.8 / pp.813-821 / 2017
  • Entity linking finds the meaning of an entity mention, which may refer to an entity using different expressions, in a user's query by linking the mention to an entity in the knowledge base. The task poses four challenges: the difficulty of constructing a knowledge base, multiple representations of an entity mention, ambiguity of entity linking, and NIL entity recognition. In this paper, we first construct an entity name dictionary based on Wikipedia to build a knowledge base and solve the multiple representation problem. We then propose methods for NIL entity recognition and resolve the ambiguity of entity linking by training a support vector machine on several features, including context similarity, semantic relevance, clue word score, named entity type similarity of the mention, entity name matching score, and object popularity score. We apply the two proposed methods sequentially on the constructed knowledge base to obtain good entity linking performance. In the experiments, our system achieved F1 scores of 83.66% and 90.81% on the NIL entity recognition and entity linking disambiguation tasks.
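
As a rough illustration of the pipeline described above, the sketch below builds a name dictionary from (surface form, entity) pairs, looks up candidates for a mention, and returns NIL when nothing scores well enough. The `scorer` callable is a placeholder for the trained support vector machine, and the threshold rule, function names, and sample data are assumptions rather than the paper's design.

```python
from collections import defaultdict


def build_name_dictionary(pairs):
    """Map each lowercased surface form to the set of entities it may denote."""
    names = defaultdict(set)
    for surface, entity in pairs:
        names[surface.lower()].add(entity)
    return names


def link(mention, names, scorer, nil_threshold=0.5):
    """Pick the best-scoring candidate, or "NIL" if none is convincing.

    `scorer` is a hypothetical stand-in for a trained classifier: it maps a
    candidate entity to a confidence score.
    """
    candidates = names.get(mention.lower(), set())
    if not candidates:
        return "NIL"  # mention is absent from the dictionary
    best = max(sorted(candidates), key=scorer)
    return best if scorer(best) >= nil_threshold else "NIL"
```

In this sketch NIL covers two cases: the mention has no dictionary entry, or every candidate falls below the threshold.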

Analysis of the Earth Science Vocabularies Used in the 11th Grade Science Textbooks (지구과학 I 교과서 어휘 등급 분석 - 살아있는 지구 단원을 중심으로-)

  • Im, Young-Goo;Park, Hye-Jin;Lee, Hyonyong;Kim, Taesu;Oh, Heejin
    • Journal of Science Education / v.32 no.2 / pp.87-102 / 2008
  • The purposes of this study were to analyze the vocabulary used in the 'Living Earth' section of 11th-grade Earth science textbooks with the Science Word Analysis (SWA) program and to investigate the vocabulary that 11th-grade students identified as difficult. To this end, we extracted the Earth science vocabulary from six textbooks and classified it into scientific and non-scientific vocabulary with the SWA program, based on the standard Korean language dictionary. We also investigated the difficulty of each word through a questionnaire given to 360 students. The program analysis showed that, in all textbooks, scientific vocabulary classified as 'out of level' was more frequent than scientific vocabulary at any other level. In the survey, most of the words that students selected as difficult to understand were likewise classified as 'out of level'. These results suggest that students' cognitive level should be considered when developing science textbooks and that difficult words should be replaced with easier ones, as long as the meaning is unchanged.

The Analysis of Relevance of Vocabulary Used in the 'Water' unit of Chemistry I Textbook (화학 I 교과서의 "물"단원에 사용된 어휘의 적절성 분석)

  • Kim, Ji-Young;Cho, Mi-Ju;Goo, Mi-Na;Park, Jong-Seok
    • Journal of the Korean Chemical Society / v.54 no.4 / pp.471-478 / 2010
  • This study analyzed the vocabulary level of the 'Water' unit of the Chemistry I textbook and its appropriateness for the vocabulary level of 11th graders. The main tool for analyzing vocabulary level was the SWA (Science Word Analysis) program, which references the Standard Korean Dictionary and the Graduated Vocabulary of Korean Language Education. The results were as follows. The share of scientific vocabulary increased from Level-1 to Level-3 and tended to decrease from Level-3 to Level-5. Out of level had the largest average percentage, at 37%. For non-scientific vocabulary, Level-1 had the highest percentage, and the share decreased progressively at higher levels; Level-5 and Out of level together accounted for 18% on average. Accordingly, 6 Level-5 words and 82 Out of level words among the scientific vocabulary, and 53 Level-5 words and 145 Out of level words among the non-scientific vocabulary, are inappropriate. Overall, the textbooks are reasonable for grade 11 students, but they still contain a great many words that are inappropriate for them. Such words should be kept to a minimum and replaced with Level-1 to Level-4 vocabulary that matches the students' level of understanding.

An Efficient Method for Korean Noun Extraction Using Noun Patterns (명사 출현 특성을 이용한 효율적인 한국어 명사 추출 방법)

  • 이도길;이상주;임해창
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.173-183 / 2003
  • Morphological analysis is the most widely used method for extracting nouns from Korean texts. To extract nouns from each Eojeol, a morphological analyzer performs frequent dictionary lookups and applies many morphonological rules, and therefore requires many operations. Moreover, a morphological analyzer generates all possible morphological interpretations (sequences of morphemes) of a given Eojeol, which may be unnecessary from the point of view of noun extraction. To reduce this unnecessary computation, this paper proposes a method for Korean noun extraction that considers noun occurrence characteristics. Noun patterns denote conditions under which an Eojeol does or does not include nouns; these serve as positive cues and negative cues, respectively. Using exclusive information as negative cues makes it possible to reduce the search space of morphological analysis by ignoring Eojeols that do not include nouns. Post-noun syllable sequences (PNSS), used as positive cues, make it possible to extract nouns simply by checking the part of the Eojeol preceding the PNSS, and to guess unknown nouns. In addition, morphonological information is used instead of many morphonological rules to recover the lexical form from its altered surface form. Experimental results show that the proposed method is faster than other systems based on morphological analysis without losing accuracy.
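
The cue-based idea can be sketched in a few lines: skip an Eojeol that matches a negative cue, otherwise strip a trailing post-noun syllable sequence and return what precedes it. The `PNSS` and `NEGATIVE_ENDINGS` lists below are tiny hypothetical samples chosen for illustration, not the cue sets used in the paper, and a toy like this will over-extract on real text.

```python
# Hypothetical positive cues: a few common particles, longest match first.
PNSS = sorted(
    ["에서는", "에서", "으로", "은", "는", "이", "가", "을", "를", "의", "에"],
    key=len,
    reverse=True,
)

# Hypothetical negative cues: endings treated here as signalling "no noun".
NEGATIVE_ENDINGS = ("았다", "었다", "지만")


def extract_noun(eojeol):
    """Return the noun candidate preceding a PNSS, or None if no cue applies."""
    if eojeol.endswith(NEGATIVE_ENDINGS):
        return None  # negative cue: skip without further analysis
    for suffix in PNSS:
        if eojeol.endswith(suffix) and len(eojeol) > len(suffix):
            return eojeol[: -len(suffix)]  # positive cue: keep the part before it
    return None
```

With these sample cues, `extract_noun("학교에서")` yields `"학교"` without any dictionary lookup, which is the kind of saving the abstract describes.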

An Artificial Neural Network Based Phrase Network Construction Method for Structuring Facility Error Types (설비 오류 유형 구조화를 위한 인공신경망 기반 구절 네트워크 구축 방법)

  • Roh, Younghoon;Choi, Eunyoung;Choi, Yerim
    • Journal of Internet Computing and Services / v.19 no.6 / pp.21-29 / 2018
  • In the era of the Fourth Industrial Revolution, the concept of the smart factory is emerging. There are efforts to use data analysis to predict the occurrence of facility errors, which negatively affect utilization and productivity. Such prediction requires facility error logs, that is, data describing the situation and the type of each facility error. However, in many manufacturing companies, the types of facility error are not precisely defined or categorized. The workers who operate the facilities record the error type as unstructured text based on their own empirical judgment, which makes it impossible to analyze the data. Therefore, this paper proposes a framework for constructing a phrase network that supports the identification and classification of facility error types using error logs written by operators. Specifically, phrases indicating error types are extracted from the text data using a dictionary that classifies terms by their usage. A phrase network is then constructed by calculating the similarity between the extracted phrases. The performance of the proposed method was evaluated on real-world facility error logs. The proposed method is expected to contribute to the accurate identification of error types and to the prediction of facility errors.
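
The network construction step can be sketched as follows, assuming each extracted phrase already has a vector representation (for instance from a neural embedding model, which is not shown here). Cosine similarity, the `threshold` value, and the function names are assumptions for illustration; the abstract does not state which similarity measure the authors use.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_phrase_network(vectors, threshold=0.8):
    """Connect every pair of phrases whose similarity reaches the threshold.

    `vectors` maps a phrase to its embedding; the result maps an edge
    (phrase_a, phrase_b) to the similarity that justified it.
    """
    edges = {}
    for (p, u), (q, v) in combinations(sorted(vectors.items()), 2):
        similarity = cosine(u, v)
        if similarity >= threshold:
            edges[(p, q)] = similarity
    return edges
```

Connected groups in the resulting graph would then be candidates for a single error type, such as differently worded reports of the same fault.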

A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 활용한 제조업 현장의 품질 불량 예측 방법론)

  • Roh, Jeong-Min;Kim, Yongsung
    • Journal of Platform Technology / v.9 no.3 / pp.52-62 / 2021
  • Quality control is critical at manufacturing sites and is key to predicting the risk of quality defects before manufacturing. However, the reliability of manual quality control methods is affected by human and physical limitations because manufacturing processes vary across industries. These limitations become particularly obvious in domains with numerous manufacturing processes, such as the manufacture of major nuclear equipment. This study proposes a novel method for predicting the risk of quality defects using natural language processing and machine learning. The study used production data collected over 6 years at a factory that manufactures main equipment installed in nuclear power plants. In the text preprocessing stage, a mapping method was applied to the word dictionary so that domain knowledge could be appropriately reflected, and a hybrid algorithm combining n-grams, Term Frequency-Inverse Document Frequency, and Singular Value Decomposition was constructed for sentence vectorization. Next, in the experiment to classify risky processes resulting in poor quality, k-fold cross-validation was applied to cases ranging from unigrams to cumulative trigrams. To obtain objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms, achieving a maximum accuracy of 0.7685 and a maximum F1-score of 0.8641. Thus, the proposed method is effective. The performance of the proposed method was also compared with the votes of field engineers, and the results revealed that the proposed method outperformed them. Thus, the method can be implemented for quality control at manufacturing sites.
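
To make the vectorization step more concrete, the sketch below computes TF-IDF weights over cumulative n-grams (all n-grams from unigrams up to a chosen maximum). It covers only the n-gram and TF-IDF parts of the hybrid described above and leaves out the Singular Value Decomposition and the classifiers. The weighting formula (relative term frequency times `log(N / df)`) is one common variant and an assumption here, as are the function names.

```python
import math
from collections import Counter


def cumulative_ngrams(tokens, max_n):
    """All n-grams for n = 1..max_n, e.g. max_n=3 gives uni-, bi- and trigrams."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(documents, max_n=3):
    """Return one {ngram: weight} dict per tokenized document."""
    counts_per_doc = [Counter(cumulative_ngrams(doc, max_n)) for doc in documents]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    n_docs = len(documents)
    return [
        {
            term: (count / sum(counts.values())) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        for counts in counts_per_doc
    ]
```

An n-gram that appears in every document gets weight zero under this variant, so only terms that distinguish one process record from another contribute to the vector.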

Movie Recommended System base on Analysis for the User Review utilizing Ontology Visualization (온톨로지 시각화를 활용한 사용자 리뷰 분석 기반 영화 추천 시스템)

  • Mun, Seong Min;Kim, Gi Nam;Choi, Gyeong cheol;Lee, Kyung Won
    • Design Convergence Study / v.15 no.2 / pp.347-368 / 2016
  • Recent research on word of mouth (WOM) implies that consumers use WOM information about products in their purchase process. This study suggests methods that use opinion mining and visualization to understand consumers' opinions of individual goods and markets. We develop a domain ontology based on reviews confined to the "movie" category, because people who want to watch a movie now commonly refer to others' reviews, and analyze it through opinion mining and visualization. The study differs from other research in that, during the ontology process, it classifies the attributes of evaluation factors and compiles a dictionary of vocabulary for those factors. We aim to show through the results that the research method is valid. The results fall into three parts. First, we explain methods for developing a domain ontology using keyword extraction and topic modeling. Second, we visualize the reviews of each movie to understand the audience's overall opinion of specific movies. Third, we find clusters of products that received similar assessments according to the evaluation results. The case study yields three main clusters containing 130 movies, grouped according to audience opinion.