• Title/Summary/Keyword: dictionary seed


Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS (SNS대상의 지능형 자연어 수집, 처리 시스템 구현을 통한 한국형 감성사전 구축에 관한 연구)

  • Lee, Jong-Hwa
    • The Journal of Information Systems
    • /
    • v.29 no.3
    • /
    • pp.237-251
    • /
    • 2020
  • Purpose This research studied a hierarchical Hangul emotion index by organizing the emotions that SNS users express. In the author's preliminary work, Plutchik's (1980) English-based emotion taxonomy was reinterpreted in Korean, and hashtags with implicit meaning on SNS were studied. To build a multidimensional emotion dictionary and classify three-dimensional emotions, an emotion seed was selected for each of seven emotion sets, and an emotion word dictionary was constructed by collecting the SNS hashtags derived from each seed. We also explore the priority of each Hangul emotion index. Design/methodology/approach In transforming sentences into a matrix through word vectorization, weights were extracted with TF-IDF (Term Frequency-Inverse Document Frequency), and the Nonnegative Matrix Factorization (NMF) algorithm was used to reduce the dimensionality of the matrix for each emotion set; the emotional dimension was resolved using the characteristic values of the emotion words. Cosine distance was used to measure the similarity between emotion-word vectors within an emotion set. Findings Customer needs analysis is the ability to read changes in emotion, and research on Korean emotion words serves that need. In addition, the ranking of emotion words within an emotion set provides a criterion for reading the depth of an emotion. By providing companies with effective information for emotional marketing, the sentiment index developed in this research can open up and add value to new business opportunities. Moreover, if the emotion dictionary is eventually connected to the emotional DNA of a product, it will become possible to define an "emotional DNA": the set of emotions a product should evoke.
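The TF-IDF weighting and cosine-distance steps described in the abstract can be sketched as follows. This is a minimal illustration with toy "hashtag documents" and a plain TF-IDF formula; the paper's actual corpus, seven emotion sets, and the NMF dimension-reduction step are not reproduced here.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """TF-IDF weight of every term in every document (docs: lists of tokens)."""
    n_docs = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

def term_vector(weights, term):
    """Document-space vector of a term: its TF-IDF weight in each document."""
    return {i: w.get(term, 0.0) for i, w in enumerate(weights)}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy hashtag documents gathered around two emotion seeds (illustrative only)
docs = [["joy", "happy", "smile"], ["joy", "happy", "laugh"],
        ["fear", "scared", "dark"]]
W = tfidf_matrix(docs)
```

Words that co-occur in the same documents (e.g. "joy" and "happy" above) end up with nearly parallel vectors and a cosine similarity close to 1, while words from different emotion sets score near 0.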

Product Evaluation Summarization Through Linguistic Analysis of Product Reviews (상품평의 언어적 분석을 통한 상품 평가 요약 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB
    • /
    • v.17B no.1
    • /
    • pp.93-98
    • /
    • 2010
  • In this paper, we introduce a system that summarizes product evaluations through linguistic analysis, to make effective use of the explosively increasing number of product reviews. Our system analyzes the polarity of product reviews by product feature, i.e., the aspects on which customers evaluate a product, such as 'design' and 'material' for a skirt. The system shows customers a graph summarizing the percentages of positive and negative reviews. We build an opinion-word dictionary for each product feature through context-based automatic expansion from a small set of seed words, and judge the polarity of reviews by product feature with the extracted dictionary. In experiments using product reviews from online shopping malls, our system achieves an average accuracy of 69.8% in extracting the judgemental word dictionary and 81.8% in sentence-level polarity resolution.
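The feature-wise polarity summarization can be sketched roughly as below. The seed dictionaries and reviews here are invented for illustration; the paper's context-based automatic dictionary expansion is omitted, and only the final counting step is shown.

```python
# Hypothetical seed dictionaries per product feature (illustrative only);
# the paper expands these automatically from context, which is not shown.
FEATURE_LEXICON = {
    "design":   {"positive": {"pretty", "stylish"}, "negative": {"ugly", "plain"}},
    "material": {"positive": {"soft", "durable"},   "negative": {"rough", "flimsy"}},
}

def summarize(reviews):
    """Count positive/negative review sentences per feature and return the
    percentage of positive mentions (None if a feature is never mentioned)."""
    counts = {f: {"positive": 0, "negative": 0} for f in FEATURE_LEXICON}
    for sentence in reviews:
        words = set(sentence.lower().split())
        for feature, lex in FEATURE_LEXICON.items():
            pos = len(words & lex["positive"])
            neg = len(words & lex["negative"])
            if pos > neg:
                counts[feature]["positive"] += 1
            elif neg > pos:
                counts[feature]["negative"] += 1
    summary = {}
    for f, c in counts.items():
        total = c["positive"] + c["negative"]
        summary[f] = round(100 * c["positive"] / total, 1) if total else None
    return summary
```

The resulting percentages per feature are what the system would plot as the positive/negative review graph.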

Analysis of the Lee-Chen's One-Time Password Authentication Scheme (Lee와 Chen의 일회용 비밀번호 인증기법 분석)

  • You, Il-Sun;Kim, Bo-Nam;Kim, Heung-Jun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.2
    • /
    • pp.285-292
    • /
    • 2009
  • In 2005, Lee and Chen suggested an enhanced one-time password authentication scheme that prevents the stolen-verifier attack to which the Yeh-Shen-Hwang scheme is vulnerable. The Lee-Chen scheme addresses the stolen-verifier attack by deriving each user's pre-shared secret SEED from the server secret. However, we investigated the scheme and found that it suffers from an off-line dictionary attack on the server secret. We demonstrate that this attack can be easily tackled with the help of a Hardware Security Module (HSM). Moreover, we improve the scheme so that it is not weak to denial-of-service attacks and does not allow compromise of past session keys even if the current password is stolen. A comparison between the Lee-Chen scheme and the proposed one shows that the proposed scheme is the stronger.
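The off-line dictionary attack the abstract describes can be sketched as follows. The derivation formula here is illustrative, not the Lee-Chen paper's exact construction: it only assumes that each user's SEED is a deterministic function of the server secret, so an attacker holding a stolen SEED can test candidate secrets off-line.

```python
import hashlib

def derive_seed(server_secret: str, user_id: str) -> str:
    """Illustrative stand-in for Lee-Chen's derivation: each user's
    pre-shared SEED is computed from the server secret."""
    return hashlib.sha256((server_secret + user_id).encode()).hexdigest()

def offline_dictionary_attack(stolen_seed, user_id, candidate_secrets):
    """Try every candidate server secret against a stolen SEED value.
    No interaction with the server is needed, hence 'off-line'."""
    for guess in candidate_secrets:
        if derive_seed(guess, user_id) == stolen_seed:
            return guess
    return None
```

The HSM countermeasure discussed in the paper would keep the server secret inside tamper-resistant hardware, so `derive_seed` could not be recomputed by the attacker at all.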

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, one of the text mining techniques, extracts the subjective content embedded in text documents. Sentiment analysis methods have recently been widely used in many fields: for example, data-driven surveys analyze the subjectivity of text posted by users, and market research quantifies users' reputation of a target product by analyzing their review posts. The basic method of sentiment analysis uses a sentiment dictionary (or lexicon), a list of sentiment vocabulary with positive, neutral, or negative semantics. In general, the meaning of a sentiment word is likely to differ across domains. For example, the sentiment word 'sad' carries negative meaning in most domains, but not in the movie domain. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a lexicon is time-consuming, and without a general-purpose sentiment lexicon as a starting point, many sentiment words are missed. To address this problem, several studies have constructed domain-specific sentiment lexicons based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet works poorly because of the language gap introduced when converting Korean words into English. Such restrictions limit the use of these general-purpose lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that improves on existing general-purpose lexicons.
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. In particular, it constructs sentiment vocabulary by analyzing the glosses in the Standard Korean Language Dictionary (SKLD) as follows: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as carrying either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model reaches 89.45%. The sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603, and we add sentiment information for frequently used coined words and emoticons found mainly on the Web. KNU-KSL contains a total of 14,843 sentiment entries, each a 1-gram, 2-gram, phrase, or sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not tied to particular domains. The recent trend in sentiment analysis is to apply deep learning without sentiment dictionaries, and the perceived importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, yielding sentiment analysis with higher accuracy (Teng, Z., 2016). This indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a feature source that improves the accuracy of deep learning models.
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features for deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
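The gloss-driven lexicon construction above can be sketched as below. A toy keyword rule stands in for the trained Bi-LSTM classifier, purely to show how classified glosses turn into lexicon entries; the cue sets, glosses, and headwords are invented for illustration.

```python
# Stand-in for the paper's Bi-LSTM gloss classifier: a toy keyword rule
# plays the role of the trained model (illustrative assumption only).
POSITIVE_CUES = {"good", "pleasing", "grateful"}
NEGATIVE_CUES = {"bad", "painful", "sorrowful"}

def classify_gloss(gloss: str) -> str:
    """Label a dictionary gloss as positive, negative, or neutral."""
    words = set(gloss.lower().split())
    if words & POSITIVE_CUES:
        return "positive"
    if words & NEGATIVE_CUES:
        return "negative"
    return "neutral"

def build_lexicon(entries):
    """entries: (headword, gloss) pairs from a dictionary such as SKLD.
    A headword whose gloss classifies as positive or negative enters the
    sentiment lexicon under that polarity."""
    lexicon = {"positive": [], "negative": []}
    for word, gloss in entries:
        label = classify_gloss(gloss)
        if label != "neutral":
            lexicon[label].append(word)
    return lexicon
```

In the actual pipeline, the Bi-LSTM replaces `classify_gloss`, and the extracted entries are then merged with external sources such as SentiWordNet and SenticNet.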

Double Encryption Authentication System for Resisting Dictionary Attack in Linux (리눅스에서 사전공격방지를 위한 이중 암호 인증 시스템)

  • 최종혁;허기택;주낙근
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2001.11a
    • /
    • pp.666-670
    • /
    • 2001
  • In the existing Linux password authentication system, when a user account is created and a password is first set at login, a random number is generated using the user's process ID and the password-creation time as a seed, producing a salt value. The salt prevents identical passwords from being stored as identical values; the system then applies the one-way DES algorithm to the salt and the user's password and stores the result in the password file. However, while the user's password is stored encrypted, the salt is stored in the clear, so if the password file is stolen, weak passwords can easily be cracked by an attacker using dictionary-attack tools such as John the Ripper or Crack. To address this vulnerability to dictionary attacks, we design and implement a password authentication system resistant to dictionary attacks by additionally encrypting the already-encrypted user passwords with another secret key held by the system.
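The double-encryption idea can be sketched as follows. SHA-256 and HMAC stand in for the DES crypt step and the second system-key encryption respectively, purely for illustration; the paper's exact algorithms and file layout are not reproduced.

```python
import hashlib
import hmac
import os

# Second secret key held by the system, kept outside the password file
SYSTEM_SECRET = os.urandom(32)

def first_hash(password: str, salt: bytes) -> bytes:
    """Stand-in for the salted one-way DES crypt step (SHA-256 here)."""
    return hashlib.sha256(salt + password.encode()).digest()

def store(password: str):
    """Double protection: salt-and-hash the password, then MAC the result
    with the system secret so the stored value alone cannot be attacked
    with an off-line dictionary."""
    salt = os.urandom(8)
    protected = hmac.new(SYSTEM_SECRET, first_hash(password, salt),
                         hashlib.sha256).digest()
    return salt, protected

def verify(password: str, salt: bytes, protected: bytes) -> bool:
    """Recompute both layers and compare in constant time."""
    candidate = hmac.new(SYSTEM_SECRET, first_hash(password, salt),
                         hashlib.sha256).digest()
    return hmac.compare_digest(candidate, protected)
```

Without `SYSTEM_SECRET`, an attacker who steals the salt and the protected value cannot test candidate passwords, which is exactly the property the paper is after.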


Performance Improvement of Bilingual Lexicon Extraction via Pivot Language and Word Alignment Tool (중간언어와 단어정렬을 통한 이중언어 사전의 자동 추출에 대한 성능 개선)

  • Kwon, Hong-Seok;Seo, Hyeung-Won;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology
    • /
    • 2013.10a
    • /
    • pp.27-32
    • /
    • 2013
  • This paper proposes a method for automatically extracting a bilingual lexicon from a parallel corpus for little-studied language pairs. The method uses a pivot language as an intermediary and the publicly available word-alignment tool Anymalign to build context vectors. As a result, no seed dictionary is needed to translate the context vectors, and translation accuracy for low-frequency words, a weakness of statistical methods, is improved. In addition, by using bidirectional translation probabilities above a given threshold as context-vector elements, top-5 translation accuracy is greatly increased. We ran lexicon-extraction experiments in both directions for two different language pairs, Korean-Spanish and Korean-French. Compared with previously reported results, translation accuracy improved by between 3.41% and 67.91% for high-frequency words, and by between 5.06% and 990% for low-frequency words.
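The pivot-language idea can be sketched as below: source-to-pivot and pivot-to-target translation probabilities are chained, and only pairs clearing a threshold are kept. The probability tables are toy values, not Anymalign output, and the paper's context-vector comparison is reduced to simple probability chaining.

```python
def pivot_lexicon(src_pivot, pivot_tgt, threshold=0.1):
    """Chain src->pivot and pivot->tgt translation probabilities and keep,
    for each source word, the target word with the highest combined score
    among those above the threshold (illustrative simplification)."""
    lexicon = {}
    for src, pivots in src_pivot.items():
        scores = {}
        for pivot, p_sp in pivots.items():
            for tgt, p_pt in pivot_tgt.get(pivot, {}).items():
                # Sum over all pivot words linking src and tgt
                scores[tgt] = scores.get(tgt, 0.0) + p_sp * p_pt
        kept = {t: v for t, v in scores.items() if v >= threshold}
        if kept:
            lexicon[src] = max(kept, key=kept.get)
    return lexicon

# Toy tables: Korean -> English (pivot) -> French
src_pivot = {"집": {"house": 0.8, "home": 0.2}}
pivot_tgt = {"house": {"maison": 0.9},
             "home": {"maison": 0.5, "foyer": 0.5}}
```

Summing over every pivot word that links a source-target pair is what lets evidence accumulate for the correct translation even when individual alignment probabilities are low.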


A S/KEY Based Secure Authentication Protocol Using Public Key Cryptography (공개키를 적용한 S/KEY 기반의 안전한 사용자 인증 프로토콜)

  • You, Il-Sun;Cho, Kyung-San
    • The KIPS Transactions:PartC
    • /
    • v.10C no.6
    • /
    • pp.763-768
    • /
    • 2003
  • In this paper, we propose an S/KEY-based authentication protocol using smart cards to address the vulnerabilities of both the S/KEY authentication protocol and the secure one-time password protocol proposed by Yeh, Shen and Hwang [1]. Because our protocol is based on public key cryptography, it can authenticate the server and distribute a session key without any pre-shared secret. It can also prevent off-line dictionary attacks by using a randomly generated value stored in the user's smart card. Most importantly, it truly achieves the strength of the S/KEY scheme: no secret information need be stored on the server.
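The S/KEY hash chain underlying both the original scheme and this protocol can be sketched as follows. This shows only the generic one-time password mechanism (with SHA-256 standing in for the hash), not the paper's public-key extensions.

```python
import hashlib

def hash_chain(seed: str, n: int) -> str:
    """Apply the one-way hash n times to the seed: the core of S/KEY."""
    value = seed.encode()
    for _ in range(n):
        value = hashlib.sha256(value).digest()
    return value.hex()

def verify_otp(stored: str, otp: str) -> bool:
    """The server stores H^n(seed); the client sends H^(n-1)(seed).
    Hashing the received value once must reproduce the stored value,
    which is then replaced by the OTP for the next round."""
    return hashlib.sha256(bytes.fromhex(otp)).hexdigest() == stored
```

Because the server only ever holds an already-hashed value, compromising its database reveals no reusable secret, which is the S/KEY strength the abstract refers to.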

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels such as social media and SNS have been creating enormous amounts of data, and the portion of it that is unstructured text has grown geometrically. Since it is impractical to read all of this text, it is important to access it rapidly and grasp its key points. Out of this need for efficient understanding, many studies on text summarization have been proposed for handling and exploiting huge volumes of text. In particular, many recent summarization methods use machine learning and artificial intelligence algorithms to generate summaries objectively and effectively, so-called "automatic summarization". However, almost all text summarization methods proposed to date construct the summary around the most frequent content of the original documents. Such summaries struggle to include low-weight subjects that are mentioned less often in the original text. If a summary covers only the major subjects, bias occurs and information is lost, making it hard to ascertain every subject the documents contain. To avoid this bias, one can summarize with a balance between the topics a document contains so that every subject is represented, but an unbalanced distribution across subjects still remains. To retain subject balance in the summary, it is necessary to consider the proportion of every subject in the original documents and also to allocate the subjects' portions equally, so that even sentences on minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance across all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries, we use two summary evaluation metrics: "completeness" and "succinctness".
Completeness means that the summary should fully cover the contents of the original documents, and succinctness means that the summary contains minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term is related to each topic. From these weights, the terms most related to each topic can be identified, and the subjects of the documents emerge from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; we call these "seed terms". However, the seed terms alone are too few to describe each subject, so terms similar to the seed terms must be added for a well-constructed subject dictionary. Word2Vec is used for this word expansion: after training, the similarity between any two terms can be derived from their word vectors using cosine similarity, where a higher cosine similarity indicates a stronger relationship between two terms. Terms with high similarity to the seed terms of each subject are selected, and after filtering these expanded terms, the subject dictionary is finally constructed. The next phase allocates a subject to every sentence of the original documents. To grasp the content of each sentence, a frequency analysis is first conducted over the terms composing the subject dictionaries. A TF-IDF weight is then calculated for each subject, indicating how much each sentence says about that subject. However, TF-IDF weights can grow without bound, so the weights of every subject in each sentence are normalized to values between 0 and 1.
Each sentence is then allocated to the subject for which it has the maximum TF-IDF weight, yielding a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between the sentences of a subject, forming a similarity matrix, and by iteratively selecting sentences it is possible to generate a summary that fully covers the contents of the original documents while minimizing internal duplication. For evaluation, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between the proposed method's summaries and frequency-based summaries verifies that the proposed method better retains the balance of all the subjects the documents originally contain.
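The sentence-allocation and balanced-selection phases can be sketched roughly as below. Term-overlap counts stand in for the normalized TF-IDF weights, and round-robin selection stands in for the Sen2Vec similarity step; the subject dictionaries and sentences are invented for illustration.

```python
from collections import Counter

def allocate_sentences(sentences, subject_dicts):
    """Assign each sentence to the subject whose dictionary terms it
    matches most (a simplified stand-in for the normalized TF-IDF step)."""
    groups = {subj: [] for subj in subject_dicts}
    for sent in sentences:
        words = Counter(sent.lower().split())
        scores = {subj: sum(words[t] for t in terms)
                  for subj, terms in subject_dicts.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            groups[best].append(sent)
    return groups

def balanced_summary(groups, k):
    """Draw sentences round-robin across subject groups so minor subjects
    are not crowded out by frequent ones."""
    summary = []
    while len(summary) < k and any(groups.values()):
        for subj in list(groups):
            if groups[subj] and len(summary) < k:
                summary.append(groups[subj].pop(0))
    return summary
```

A frequency-based summary would simply take the top-scoring sentences overall, letting the dominant subject fill the whole summary; the round-robin draw is what enforces the subject balance.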

Utilizing Local Bilingual Embeddings on Korean-English Law Data (한국어-영어 법률 말뭉치의 로컬 이중 언어 임베딩)

  • Choi, Soon-Young;Matteson, Andrew Stuart;Lim, Heui-Seok
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.10
    • /
    • pp.45-53
    • /
    • 2018
  • Recently, studies about bilingual word embedding have been gaining much attention. However, bilingual word embedding with Korean is not actively pursued due to the difficulty in obtaining a sizable, high quality corpus. Local embeddings that can be applied to specific domains are relatively rare. Additionally, multi-word vocabulary is problematic due to the lack of one-to-one word-level correspondence in translation pairs. In this paper, we crawl 868,163 paragraphs from a Korean-English law corpus and propose three mapping strategies for word embedding. These strategies address the aforementioned issues including multi-word translation and improve translation pair quality on paragraph-aligned data. We demonstrate a twofold increase in translation pair quality compared to the global bilingual word embedding baseline.

The ginseng magnate BongSang Son: his life and achievements (인삼왕 손봉상의 업적을 통해 본 개성인삼 개척사)

  • Kim, Johyung;Ock, Soonjong
    • Journal of Ginseng Culture
    • /
    • v.2
    • /
    • pp.27-38
    • /
    • 2020
  • Gaesung was the mecca of Korean ginseng. The factors that made Gaesung the leading brand of Korean ginseng were manifold: natural conditions, large commercial capital, a red ginseng factory, creative business systems, and more. BongSang Son, SungHak Kong and JeongHo Kim can be cited as famous Gaesung ginseng merchants. As leaders of the modern ginseng industry, they supplied cultivation methods, prevention of plant diseases, excellent ginseng seed, and a prepayment system for farming capital. The Gaesung merchants also adopted modern marketing techniques: commercial advertisement, made-to-order sales, and redesigned packaging of ginseng products. The book 'The Dictionary of Korean Companies and Stores', published in 1927, introduces BongSang Son as a great businessman of Gaesung. He was not only a merchant but also an educator and social worker who practiced the spirit of entrepreneurship. BongSang Son's pioneering role contributed to the development of Korean ginseng and of Gaesung, and thanks to such efforts by the Gaesung merchants, the Korean ginseng industry took a great step forward. This article considers the development of the Korean ginseng industry through the life and achievements of the ginseng magnate BongSang Son, a representative Gaesung merchant.