• Title/Summary/Keyword: dictionary-based

Search Result 555, Processing Time 0.027 seconds

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

  • Lee, Jong Hwa;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.20 no.2
    • /
    • pp.113-124
    • /
    • 2015
  • Unlike structured data which are gathered and saved in a predefined structure, unstructured text data which are mostly written in natural language have larger applications recently due to the emergence of web 2.0. Text mining is one of the most important big data analysis techniques that extracts meaningful information in the text because it has not only increased in the amount of text data but also human being's emotion is expressed directly. In this study, we used R program, an open source software for statistical analysis, and studied algorithm implementation to conduct analyses (such as Frequency Analysis, Cluster Analysis, Word Cloud, Social Network Analysis). Especially, to focus on our research scope, we used keyword extract method based on a Data Dictionary. By applying in real cases, we could find that R is very useful as a statistical analysis software working on variety of OS and with other languages interface.

Implementation of Augmentative and Alternative Communication System Using Image Dictionary (이미지 사전을 이용한 보완대체 의사소통 시스템의 구현)

  • Ryu Je;Kim Woo-Sung;Han Kwang-Rok
    • Journal of Korea Multimedia Society
    • /
    • v.9 no.9
    • /
    • pp.1208-1221
    • /
    • 2006
  • In this paper, we implement an AAC(Augmentative and Alternative Communication) system based on image dictionary in order that speech defectives can easily communicate their opinion to others by using images. Normally, those who have a speech defect use only a few limited words to express their intentions and it is an effective way to use images in their communication because they have a difficulty in speaking. Therefore we make verbal images of verb and adjective that play an important role in expressing the speaker's intention, define pattern of semantic relation between the verbal images and noun images, and construct the image dictionary. In this AAC system, when a user clicks a verbal image, the system generates a sentence by selecting noun images which are component parts of corresponding pattern based on semantic relation with the verbal image. We evaluate the implemented system by how efficiently children of speech defect can express their intention and the result shows more than 70% of success rate in communication.

  • PDF

Dictionary Attack on Huang-Wei's Key Exchange and Authentication Scheme (Huang-Wei의 키 교환 및 인증 방식에 대한 사전공격)

  • Kim, Mi-Jin;Nam, Jung-Hyun;Won, Dong-Ho
    • Journal of Internet Computing and Services
    • /
    • v.9 no.2
    • /
    • pp.83-88
    • /
    • 2008
  • Session initiation protocol (SIP) is an application-layer prolocol to initiate and control multimedia client session. When client ask to use a SIP service, they need to be authenticated in order to get service from the server. Authentication in a SIP application is the process in which a client agent present credentials to another SIP element to establish a session or be granted access to the network service. In 2005, Yang et al. proposed a key exchange and authentication scheme for use in SIP applications, which is based on the Diffie-Hellman protocol. But, Yang et al.'s scheme is not suitable for the hardware-limited client and severs, since it requires the protocol participant to perform significant amount of computations (i.e., four modular exponentiations). Based on this observation. Huang and Wei have recently proposed a new efficient key exchange and authentication scheme thor improves on Yang et al.'s scheme. As for security, Huang and Wei claimed, among others, that their scheme is resistant to offline dictionary attacks. However, the claim turned out to be untrue. In this paper, we show thor Huang and Wei's key exchange and authentication scheme is vulnerable to on offline dictionary attack and forward secrecy.

  • PDF

One Pass Identification processing Password-based

  • Park, Byung-Jun;Park, Jong-Min
    • Journal of information and communication convergence engineering
    • /
    • v.4 no.4
    • /
    • pp.166-169
    • /
    • 2006
  • Almost all network systems provide an authentication mechanism based on user ID and password. In such system, it is easy to obtain the user password using a sniffer program with illegal eavesdropping. The one-time password and challenge-response method are useful authentication schemes that protect the user passwords against eavesdropping. In client/server environments, the one-time password scheme using time is especially useful because it solves the synchronization problem. In this paper, we present a new identification scheme: OPI(One Pass Identification). The security of OPI is based on the square root problem, and OPI is secure: against the well known attacks including pre-play attack, off-line dictionary attack and server comprise. A number of pass of OPI is one, and OPI processes the password and does not need the key. We think that OPI is excellent for the consuming time to verify the prover.

A Study on Feature Classification and Data Dictionary of Digital Map (수치지도 지형지물 분류체계 개선 및 자료사전에 관한 연구)

  • 조우석;이동구;윤영보
    • Spatial Information Research
    • /
    • v.10 no.3
    • /
    • pp.455-468
    • /
    • 2002
  • Toward the systematic and efficient management of national land, National Geography Institute(NGI, National mapping agency) has been producing national basemap in automated process since middle of 1980's. Under the National Geographic Information System(NGIS) Development Plan, NGI began to produce digital maps in the scales of 1:1,000, 1:5,000, 1:25,000 since 1995. However, those of digital maps that have been generated under NGIS Development Plan need to be modified and corrected due to lack of technology and experience in making digital maps. In this context, those digital maps generated are currently in great need for improving the data dictionary. It is fully appreciated in previous research that data dictionary will be a key element far users and generators of digital maps to rectify the existing problems in digital maps as well as to maximize the application of digital maps. In this paper, we analyzed existing problems in digital maps based on previous researches and interviews with engineers in different fields of geospatial engineering. And then, the existing data dictionary has been redefined and modified. In the line of modification process, a relational matrix was established fur each topographic feature defined in the existing feature classification system. This paper presents newly proposed data dictionary which conforms to newly defined feature classification system from previous research performed by NGI.

  • PDF

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.219-239
    • /
    • 2019
  • As the importance of providing customized services to individuals becomes important, researches on personalized recommendation systems are constantly being carried out. Collaborative filtering is one of the most popular systems in academia and industry. However, there exists limitation in a sense that recommendations were mostly based on quantitative information such as users' ratings, which made the accuracy be lowered. To solve these problems, many studies have been actively attempted to improve the performance of the recommendation system by using other information besides the quantitative information. Good examples are the usages of the sentiment analysis on customer review text data. Nevertheless, the existing research has not directly combined the results of the sentiment analysis and quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments shown in the reviews into the rating scores. In other words, we propose a new algorithm that can directly convert the user 's own review into the empirically quantitative information and reflect it directly to the recommendation system. To do this, we needed to quantify users' reviews, which were originally qualitative information. In this study, sentiment score was calculated through sentiment analysis technique of text mining. The data was targeted for movie review. Based on the data, a domain specific sentiment dictionary is constructed for the movie reviews. Regression analysis was used as a method to construct sentiment dictionary. Each positive / negative dictionary was constructed using Lasso regression, Ridge regression, and ElasticNet methods. Based on this constructed sentiment dictionary, the accuracy was verified through confusion matrix. The accuracy of the Lasso based dictionary was 70%, the accuracy of the Ridge based dictionary was 79%, and that of the ElasticNet (${\alpha}=0.3$) was 83%. Therefore, in this study, the sentiment score of the review is calculated based on the dictionary of the ElasticNet method. It was combined with a rating to create a new rating. In this paper, we show that the collaborative filtering that reflects sentiment scores of user review is superior to the traditional method that only considers the existing rating. In order to show that the proposed algorithm is based on memory-based user collaboration filtering, item-based collaborative filtering and model based matrix factorization SVD, and SVD ++. Based on the above algorithm, the mean absolute error (MAE) and the root mean square error (RMSE) are calculated to evaluate the recommendation system with a score that combines sentiment scores with a system that only considers scores. When the evaluation index was MAE, it was improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD and 0.188 for SVD ++. When the evaluation index is RMSE, UBCF is 0.0431, IBCF is 0.0882, SVD is 0.1103, and SVD ++ is 0.1756. As a result, it can be seen that the prediction performance of the evaluation point reflecting the sentiment score proposed in this paper is superior to that of the conventional evaluation method. In other words, in this paper, it is confirmed that the collaborative filtering that reflects the sentiment score of the user review shows superior accuracy as compared with the conventional type of collaborative filtering that only considers the quantitative score. We then attempted paired t-test validation to ensure that the proposed model was a better approach and concluded that the proposed model is better. In this study, to overcome limitations of previous researches that judge user's sentiment only by quantitative rating score, the review was numerically calculated and a user's opinion was more refined and considered into the recommendation system to improve the accuracy. The findings of this study have managerial implications to recommendation system developers who need to consider both quantitative information and qualitative information it is expect. The way of constructing the combined system in this paper might be directly used by the developers.

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, which is one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, the sentiment analysis methods have been widely used in many fields. As good examples, data-driven surveys are based on analyzing the subjectivity of text data posted by users and market researches are conducted by analyzing users' review posts to quantify users' reputation on a target product. The basic method of sentiment analysis is to use sentiment dictionary (or lexicon), a list of sentiment vocabularies with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to be different across domains. For example, a sentiment word, 'sad' indicates negative meaning in many fields but a movie. In order to perform accurate sentiment analysis, we need to build the sentiment dictionary for a given domain. However, such a method of building the sentiment lexicon is time-consuming and various sentiment vocabularies are not included without the use of general-purpose sentiment lexicon. In order to address this problem, several studies have been carried out to construct the sentiment lexicon suitable for a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer being serviced and SentiWordNet does not work well because of language difference in the process of converting Korean word into English word. There are restrictions on the use of such general-purpose sentiment lexicons as seed data for building the sentiment lexicon for a specific domain. In this article, we construct 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. The proposed dictionary, which is a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built to quickly construct the sentiment dictionary for a target domain. Especially, it constructs sentiment vocabularies by analyzing the glosses contained in Standard Korean Language Dictionary (SKLD) by the following procedures: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each of glosses to either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive meaning, while negative words and phrases are extracted from the glosses classified as negative meaning. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is more extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabularies, each of which is one of 1-grams, 2-grams, phrases, and sentence patterns. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend on sentiment analysis is to use deep learning technique without sentiment dictionaries. The importance of developing sentiment dictionaries is declined gradually. However, one of recent studies shows that the words in the sentiment dictionary can be used as features of deep learning models, resulting in the sentiment analysis performed with higher accuracy (Teng, Z., 2016). This result indicates that the sentiment dictionary is used not only for sentiment analysis but also as features of deep learning models for improving accuracy. The proposed dictionary can be used as a basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful to automatically and quickly build large training sets for deep learning models.

Syllable-based Korean POS Tagging Based on Combining a Pre-analyzed Dictionary with Machine Learning (기분석사전과 기계학습 방법을 결합한 음절 단위 한국어 품사 태깅)

  • Lee, Chung-Hee;Lim, Joon-Ho;Lim, Soojong;Kim, Hyun-Ki
    • Journal of KIISE
    • /
    • v.43 no.3
    • /
    • pp.362-369
    • /
    • 2016
  • This study is directed toward the design of a hybrid algorithm for syllable-based Korean POS tagging. Previous syllable-based works on Korean POS tagging have relied on a sequence labeling method and mostly used only a machine learning method. We present a new algorithm integrating a machine learning method and a pre-analyzed dictionary. We used a Sejong tagged corpus for training and evaluation. While the machine learning engine achieved eojeol precision of 0.964, the proposed hybrid engine achieved eojeol precision of 0.990. In a Quiz domain test, the machine learning engine and the proposed hybrid engine obtained 0.961 and 0.972, respectively. This result indicates our method to be effective for Korean POS tagging.

A Novel Multiple Kernel Sparse Representation based Classification for Face Recognition

  • Zheng, Hao;Ye, Qiaolin;Jin, Zhong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.4
    • /
    • pp.1463-1480
    • /
    • 2014
  • It is well known that sparse code is effective for feature extraction of face recognition, especially sparse mode can be learned in the kernel space, and obtain better performance. Some recent algorithms made use of single kernel in the sparse mode, but this didn't make full use of the kernel information. The key issue is how to select the suitable kernel weights, and combine the selected kernels. In this paper, we propose a novel multiple kernel sparse representation based classification for face recognition (MKSRC), which performs sparse code and dictionary learning in the multiple kernel space. Initially, several possible kernels are combined and the sparse coefficient is computed, then the kernel weights can be obtained by the sparse coefficient. Finally convergence makes the kernel weights optimal. The experiments results show that our algorithm outperforms other state-of-the-art algorithms and demonstrate the promising performance of the proposed algorithms.

Domain Adaptation Image Classification Based on Multi-sparse Representation

  • Zhang, Xu;Wang, Xiaofeng;Du, Yue;Qin, Xiaoyan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.5
    • /
    • pp.2590-2606
    • /
    • 2017
  • Generally, research of classical image classification algorithms assume that training data and testing data are derived from the same domain with the same distribution. Unfortunately, in practical applications, this assumption is rarely met. Aiming at the problem, a domain adaption image classification approach based on multi-sparse representation is proposed in this paper. The existences of intermediate domains are hypothesized between the source and target domains. And each intermediate subspace is modeled through online dictionary learning with target data updating. On the one hand, the reconstruction error of the target data is guaranteed, on the other, the transition from the source domain to the target domain is as smooth as possible. An augmented feature representation produced by invariant sparse codes across the source, intermediate and target domain dictionaries is employed for across domain recognition. Experimental results verify the effectiveness of the proposed algorithm.