• Title/Abstract/Keywords: data dictionary

Search results: 346 (processing time: 0.026 s)

Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / Vol. 22, No. 2 / pp. 97-113 / 2016
  • Driven by the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous volumes of data for various purposes. In recent years in particular, many studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, one branch of text mining, scores opinions based on the polarity distribution of the words in a document, and usually relies on a sentiment dictionary that records the positivity and negativity of vocabulary items. As part of this line of work, this study constructs sentiment dictionaries customized to a specific data domain. A common sentiment dictionary applied without regard to domain characteristics cannot reflect contextual expressions used only in that domain, so a dictionary customized to the domain can be expected to improve sentiment analysis. This study therefore proposes a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, a genre-customized dictionary is constructed for each genre, and the performance of the customized dictionaries is compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and six IMDb genres are selected: 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi'. The five highest-ranking and five lowest-ranking movies per genre are selected as the training data set, and two years of movie data from September 2012 to June 2014 are collected as the test data set. Using the SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build a customized sentiment dictionary per genre and compare prediction accuracy on review ratings. The analysis shows that prediction with the customized dictionaries improves accuracy. The overall improvement is 2.82% and is statistically significant, and the customized dictionary for 'sci-fi' yields the highest accuracy improvement among the six genres. Although this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. We consider only adjectives as additional terms in the customized sentiment dictionary; other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. The customized sentiment dictionary also needs to be applied to other domains, such as product reviews.
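The SO-PMI scoring mentioned in this abstract can be illustrated with a minimal sketch. It assumes document-level co-occurrence, whitespace tokenization, and additive smoothing; the function name `so_pmi`, the seed-word sets, and the `smoothing` parameter are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Score each non-seed word by SO-PMI using document co-occurrence.

    SO-PMI(w) = PMI(w, positive seeds) - PMI(w, negative seeds).
    The p(w) term cancels, leaving
    log2( hits(w, pos) * hits(neg) / (hits(w, neg) * hits(pos)) ).
    """
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    word_docs = Counter()  # documents containing the word
    with_pos = Counter()   # documents containing the word and a positive seed
    with_neg = Counter()   # documents containing the word and a negative seed
    n_pos = n_neg = 0      # documents containing any positive / negative seed

    for doc in docs:
        tokens = set(doc.lower().split())
        has_pos = bool(tokens & pos_seeds)
        has_neg = bool(tokens & neg_seeds)
        n_pos += has_pos
        n_neg += has_neg
        for word in tokens - pos_seeds - neg_seeds:
            word_docs[word] += 1
            if has_pos:
                with_pos[word] += 1
            if has_neg:
                with_neg[word] += 1

    scores = {}
    for word in word_docs:
        # Smoothing (an assumption here) avoids division by zero and log(0).
        numerator = (with_pos[word] + smoothing) * (n_neg + smoothing)
        denominator = (with_neg[word] + smoothing) * (n_pos + smoothing)
        scores[word] = math.log2(numerator / denominator)
    return scores
```

Words with a positive score lean toward the positive seeds and words with a negative score lean toward the negative seeds. Running the function separately on the reviews of each genre would give one score table per genre, which is one plausible reading of a genre-customized dictionary.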

A New Dictionary Mechanism for Efficient Fault Diagnosis (효율적인 고장진단을 위한 딕셔너리 구조 개발)

  • Kim Sang-Wook;Kim Yong-Joon;Chun Sung-Hoon;Kang Sung-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD / Vol. 43, No. 4 / pp. 49-55 / 2006
  • This paper considers a fault dictionary for locating faults. The foremost problem in fault diagnosis is the size of the data: as circuits grow larger, the data needed for fault diagnosis increase to the point where the dictionary can no longer be stored on storage media. In order to generate the dictionary, some dictionaries, i.e. pass-fail dictionaries, store only a portion of the information. The deleted data make it difficult to diagnose fault models other than the single stuck-at fault. This paper proposes a new dictionary format that makes the dictionary small without deleting any information.
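The information loss in a pass-fail dictionary can be shown with a small sketch. It assumes the usual definition, in which the dictionary keeps one pass/fail bit per test for each fault instead of the full output response. The function names and the fault labels used in the note below are hypothetical, and the paper's new format is not reproduced here.

```python
def build_pass_fail_dictionary(responses, good_outputs):
    """Reduce full responses to one fail bit per test.

    responses: {fault_name: [circuit output for each test]}
    good_outputs: [fault-free output for each test]
    Returns {fault_name: tuple of bools}, True where the test fails.
    """
    return {
        fault: tuple(out != good for out, good in zip(outputs, good_outputs))
        for fault, outputs in responses.items()
    }


def diagnose(dictionary, observed_signature):
    """Return every candidate fault whose stored signature matches."""
    observed = tuple(observed_signature)
    return sorted(f for f, sig in dictionary.items() if sig == observed)
```

For example, with fault-free outputs `["00", "01", "11"]`, a fault `A` producing `["10", "01", "11"]` and a fault `B` producing `["01", "01", "11"]` differ in their full responses, yet both fail only the first test. The pass-fail dictionary therefore returns both as candidates, which illustrates the kind of resolution loss the abstract describes.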

Ternary Decomposition and Dictionary Extension for Khmer Word Segmentation

  • Sung, Thaileang;Hwang, Insoo
    • Journal of Information Technology Applications and Management / Vol. 23, No. 2 / pp. 11-28 / 2016
  • In this paper, we propose a dictionary extension and a ternary decomposition technique to improve the effectiveness of Khmer word segmentation. Most word segmentation approaches depend on a dictionary. However, the dictionary being used is not fully reliable and cannot cover all the words of the Khmer language, which leads to unknown, or out-of-vocabulary, words. Our approach is to extend the original dictionary with new words to make it more reliable. In addition, we use ternary decomposition for the segmentation process. We also use the invisible space of Khmer Unicode (the zero-width space, U+200B) to segment our training corpus. With our segmentation algorithm, based on ternary decomposition and the invisible space, we can extract new words from the training text and add them to the dictionary. To test an unannotated text, we used the extended wordlist and a segmentation algorithm that does not rely on the invisible space. Our results remarkably outperformed other approaches, achieving precision, recall and F-measure of 88.8%, 91.8% and 90.6%, respectively.
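The dictionary-extension step can be sketched as follows, assuming the training corpus marks word boundaries with U+200B. The function names are hypothetical. The greedy longest-match segmenter is a simple stand-in used only to show how an extended wordlist is consumed; it is not the paper's ternary decomposition.

```python
ZWSP = "\u200b"  # zero-width space, the "invisible space" in the abstract


def extract_words(annotated_text):
    """Split ZWSP-annotated text into its words, skipping empty pieces."""
    return [w for w in annotated_text.split(ZWSP) if w.strip()]


def extend_dictionary(dictionary, annotated_text):
    """Return a new wordlist that also contains words seen in the corpus."""
    return set(dictionary) | set(extract_words(annotated_text))


def longest_match_segment(text, dictionary):
    """Greedy longest-match segmentation of text without ZWSP marks.

    Stand-in technique (an assumption): characters that start no
    dictionary word are emitted as single-character tokens.
    """
    max_len = max(map(len, dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens
```

A typical flow would call `extend_dictionary` on the annotated training text once, then pass the resulting set to the segmenter for unannotated input.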

A Practical Algorithm for Two-Dimensional Dictionary Matching (2차원 사전 정합을 위한 실용적인 알고리즘)

  • Lee, Gwang-Su
    • The Transactions of the Korea Information Processing Society / Vol. 6, No. 3 / pp. 812-820 / 1999
  • In the two-dimensional dictionary matching problem, we are given a two-dimensional text T and a dictionary D = {P_1, ..., P_k}, a set of two-dimensional patterns. We seek the locations of all the dictionary patterns that appear in T. We present a new two-dimensional pattern matching algorithm that handles a single pattern, and then show how to extend it into a two-dimensional dictionary matching algorithm. The suggested algorithm is practical in the sense that it uses only a small amount of extra space, proportional to the size of the dictionary, and that it is simple to implement without depending on complicated data structures.
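To make the problem statement concrete, here is a brute-force baseline, assuming the text and each pattern are given as lists of equal-length strings (one string per row). It is not the paper's algorithm; it only defines what "all locations of all dictionary patterns" means. The function names and the dictionary-of-named-patterns layout are hypothetical.

```python
def find_pattern(text, pattern):
    """Return the (row, col) of every top-left corner where pattern occurs."""
    rows, cols = len(text), len(text[0])
    p_rows, p_cols = len(pattern), len(pattern[0])
    hits = []
    for r in range(rows - p_rows + 1):
        for c in range(cols - p_cols + 1):
            # Compare the pattern row by row against the text window.
            if all(text[r + i][c:c + p_cols] == pattern[i]
                   for i in range(p_rows)):
                hits.append((r, c))
    return hits


def dictionary_match(text, dictionary):
    """Run the single-pattern search for every pattern in the dictionary.

    dictionary: {pattern_name: pattern rows}
    """
    return {name: find_pattern(text, pat) for name, pat in dictionary.items()}
```

This baseline rescans the text once per pattern, which is exactly the cost that dedicated dictionary matching algorithms aim to avoid.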

Hyper-Text Compression Method Based on LZW Dictionary Entry Management (개선된 LZW 사전 관리 기법에 기반한 효과적인 Hyper-Text 문서 압축 방안)

  • Sin, Gwang-Cheol;Han, Sang-Yong
    • The KIPS Transactions:PartA / Vol. 9A, No. 3 / pp. 311-316 / 2002
  • LZW is a popular variant of LZ78 for compressing text documents. It yields a high compression rate and is widely used by many commercial programs. Its core idea is to assign each frequently occurring group of characters an entry in a dictionary. When a group of characters that already has a dictionary entry appears in the data stream, the group is replaced by the index of that entry. In this paper, we propose a new, efficient method that uses counters to find the least-used entries in the dictionary. We also achieve a higher compression rate by preassigning dictionary entries to widely used tags in hyper-text documents. Experimental results show that the proposed method is more effective than V.42bis and the Unix compression method: it performs 3-8% better on the standard Calgary Corpus and 23-24% better on HTML documents.
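The dictionary mechanism described above can be sketched with textbook LZW plus an optional preloaded table. This is a minimal sketch under stated assumptions: input characters are limited to code points below 256, the `preload` parameter and the helper `_initial_table` are hypothetical names, and the counter-based management of least-used entries from the paper is not shown.

```python
def _initial_table(preload):
    """Single characters 0-255, then every prefix of each preloaded string.

    Prefixes are added because LZW grows a match one character at a time,
    so a long entry is reachable only if its prefixes are also entries.
    """
    table = [chr(i) for i in range(256)]
    seen = set(table)
    for s in preload:
        for k in range(2, len(s) + 1):
            if s[:k] not in seen:
                seen.add(s[:k])
                table.append(s[:k])
    return table


def lzw_encode(text, preload=()):
    """Encode text as a list of dictionary indices."""
    table = {s: i for i, s in enumerate(_initial_table(preload))}
    codes, current = [], ""
    for ch in text:
        if current + ch in table:
            current += ch                        # keep extending the match
        else:
            codes.append(table[current])         # emit the longest match
            table[current + ch] = len(table)     # learn a new entry
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes, preload=()):
    """Rebuild the text; the decoder regrows the same table as the encoder."""
    if not codes:
        return ""
    table = _initial_table(preload)
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        # A code equal to len(table) refers to the entry being built now.
        entry = table[code] if code < len(table) else previous + previous[0]
        table.append(previous + entry[0])
        out.append(entry)
        previous = entry
    return "".join(out)
```

Encoding `"<p>a<p>"` with `preload=("<p>",)` emits three codes, against six without preloading, which shows why preassigning common tags can help on hyper-text input. Encoder and decoder must be given the same `preload` sequence.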

A Study of Methodology for Automatic Construction of OWL Ontologies from Sejong Electronic Dictionary (대용량 OWL 온톨로지 자동구축을 위한 세종전자사전 활용 방법론 연구)

  • Song Do Gyu
    • Language and Information / Vol. 9, No. 1 / pp. 19-34 / 2005
  • Ontology is an indispensable component in the intelligent and semantic processing of knowledge and information, such as in the semantic web. However, ontology construction requires collecting vast amounts of data and arduous effort to process these unstructured data. This study proposes a methodology to automatically construct and generate ontologies from the Sejong Electronic Dictionary. Because the Sejong Electronic Dictionary is structured in XML format, it can be processed automatically by software tools into OWL (Web Ontology Language)-based ontologies as specified by the W3C. This paper presents the process and a concrete application of this methodology.

A Structural Analysis of Dictionary Text for the Construction of Lexical Data Base (어휘정보구축을 위한 사전텍스트의 구조분석 및 변환)

  • 최병진
    • Language and Information / Vol. 6, No. 2 / pp. 33-55 / 2002
  • This research aims to transform the definition text of an English-English-Korean Dictionary (EEKD), which is encoded in EST files for publishing purposes, into a structured format for a Lexical Data Base (LDB). Constructing an LDB is very time-consuming and expensive. To save time and effort in building new lexical information, the present study extracts useful linguistic information from an existing printed dictionary. This paper describes the process of extracting and structuring lexical information from a printed dictionary (EEKD) used as a lexical resource. The extracted information is represented in XML format, which can be transformed into other representations for different application requirements.

A Study on Feature Classification and Data Dictionary of Digital Map (수치지도 지형지물 분류체계 개선 및 자료사전에 관한 연구)

  • 조우석;이동구;윤영보
    • Spatial Information Research / Vol. 10, No. 3 / pp. 455-468 / 2002
  • For the systematic and efficient management of national land, the National Geography Institute (NGI, the national mapping agency) has been producing the national basemap through an automated process since the mid-1980s. Under the National Geographic Information System (NGIS) Development Plan, NGI began producing digital maps at scales of 1:1,000, 1:5,000 and 1:25,000 in 1995. However, the digital maps generated under the NGIS Development Plan need to be modified and corrected because of the lack of technology and experience in making digital maps at the time. In this context, the data dictionary for these digital maps is in great need of improvement. Previous research fully recognizes that the data dictionary is a key element for users and producers of digital maps, both to rectify the existing problems in digital maps and to maximize their application. In this paper, we analyzed the existing problems in digital maps based on previous research and on interviews with engineers in different fields of geospatial engineering. The existing data dictionary was then redefined and modified. During the modification process, a relational matrix was established for each topographic feature defined in the existing feature classification system. This paper presents the newly proposed data dictionary, which conforms to the feature classification system newly defined in previous research performed by NGI.

Development and Evaluation of Video English Dictionary for Silver Generation (실버세대를 위한 동영상 영어사전의 개발 및 평가)

  • Kim, Jeiyoung;Park, Ji Su;Shon, Jin Gon
    • KIPS Transactions on Software and Data Engineering / Vol. 9, No. 11 / pp. 345-350 / 2020
  • Based on an analysis of the physical and learning characteristics and the requirements of the silver generation, a video English dictionary was developed and evaluated as English learning content. The video English dictionary was developed using OCR as the input method and video as the output method, and 17 members of the silver generation were evaluated for academic achievement, learning satisfaction, and ease of use. The analysis showed that both the text English dictionary and the video English dictionary yielded high learning satisfaction, but the video English dictionary scored higher than the text English dictionary in academic achievement and ease of use.

Radioisotope identification using sparse representation with dictionary learning approach for an environmental radiation monitoring system

  • Kim, Junhyeok;Lee, Daehee;Kim, Jinhwan;Kim, Giyoon;Hwang, Jisung;Kim, Wonku;Cho, Gyuseong
    • Nuclear Engineering and Technology / Vol. 54, No. 3 / pp. 1037-1048 / 2022
  • A radioactive isotope identification algorithm is a prerequisite for a low-resolution scintillation detector applied to an unmanned radiation monitoring system. In this paper, a sparse representation with dictionary learning approach is proposed and applied to plastic gamma-ray spectra. Label-consistent K-SVD was used to learn a discriminative dictionary for the spectra corresponding to a mixture of four isotopes (133Ba, 22Na, 137Cs, and 60Co). A Monte Carlo simulation was employed to produce the simulated data used as learning samples, and experimental measurements were conducted to obtain practical spectra. After the hyperparameters were determined, two dictionaries tailored to the learning samples were tested while varying the source position and the measurement time. They achieved average accuracies of 97.6% and 98.0% over all testing spectra. The average accuracy of each dictionary was above 96% for spectra measured over 2 s. They also showed acceptable performance when the spectra were artificially shifted. Thus, the proposed method could be useful for identifying radioisotopes in gamma-ray spectra from a plastic scintillation detector even when the dictionary is adapted only to simulated data. Furthermore, owing to the outstanding properties of sparse representation, the proposed approach can easily be built into an in situ monitoring system.