• Title/Summary/Keyword: Data dictionary


A Technique for Product Effect Analysis Using Online Customer Reviews (온라인 고객 리뷰를 활용한 제품 효과 분석 기법)

  • Lim, Young Seo;Lee, So Yeong;Lee, Ji Na;Ryu, Bo Kyung;Kim, Hyon Hee
    • KIPS Transactions on Software and Data Engineering / v.9 no.9 / pp.259-266 / 2020
  • In this paper, we propose a novel scheme for product effect analysis, termed PEM, that uses online customer reviews to assess the effectiveness of products intended to improve a current condition, such as health supplements and cosmetics. The proposed technique preprocesses online customer reviews to remove advertisements automatically, constructs a word dictionary composed of symptoms, effects, increases, and decreases, and measures products' effects from the reviews. Using Naver Shopping Review datasets collected through crawling, we compared the performance of PEM with that of two other methods, one using a traditional sentiment dictionary and one using an RNN model. Our experimental results show that the proposed technique outperforms the other two methods. In addition, by applying the proposed technique to online customer reviews about atopic dermatitis and acne, we identified treatments reported as effective on online social media. The proposed technique can be applied to various products and social media because it can score the effect of products from reviews in various media, including blogs.
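
The abstract does not give PEM's scoring rule, so the following is only a minimal sketch of how a four-part word dictionary (symptoms, effects, increases, decreases) might be turned into a review score. The dictionary entries, the adjacent-word pairing rule, and the name `effect_score` are all assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical dictionary entries; the paper's actual word lists are not given.
SYMPTOMS = {"itching", "redness", "acne"}
EFFECTS = {"moisture", "elasticity"}
INCREASES = {"increased", "improved", "more"}
DECREASES = {"decreased", "reduced", "less"}


def effect_score(review):
    """Score a review by pairing adjacent target and direction words.

    Assumed rule: a symptom that decreases or an effect that increases
    counts +1; a symptom that increases or an effect that decreases
    counts -1.
    """
    tokens = re.findall(r"[a-z]+", review.lower())
    score = 0
    for left, right in zip(tokens, tokens[1:]):
        pair = {left, right}
        has_symptom = bool(pair & SYMPTOMS)
        has_effect = bool(pair & EFFECTS)
        has_increase = bool(pair & INCREASES)
        has_decrease = bool(pair & DECREASES)
        if (has_symptom and has_decrease) or (has_effect and has_increase):
            score += 1
        elif (has_symptom and has_increase) or (has_effect and has_decrease):
            score -= 1
    return score
```

A product-level score could then be taken as the mean of `effect_score` over its reviews, though that aggregation is also an assumption.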

Disambiguation of Homograph Suffixes using Lexical Semantic Network(U-WIN) (어휘의미망(U-WIN)을 이용한 동형이의어 접미사의 의미 중의성 해소)

  • Bae, Young-Jun;Ock, Cheol-Young
    • KIPS Transactions on Software and Data Engineering / v.1 no.1 / pp.31-42 / 2012
  • To process the suffix-derived nouns of Korean, most Korean language processing systems register suffix-derived nouns in a dictionary. However, this approach is limited because suffixes are highly productive. Therefore, unregistered suffix-derived nouns need to be analyzed semantically. In this paper, we propose a method to disambiguate homograph suffixes using the Korean lexical semantic network (U-WIN) for the semantic analysis of suffix-derived nouns. For the experiments, we used 33,104 suffix-derived nouns containing homograph suffixes from the morphologically and semantically tagged Sejong Corpus. We first semantically tagged the homograph suffixes, extracted the roots of the suffix-derived nouns, and mapped the roots to nodes in U-WIN. We then assigned distance weights to the U-WIN nodes that could combine with each homograph suffix and used these weights to disambiguate the homograph suffixes. Experiments on the 35 homograph suffixes that occur in the Sejong Corpus, out of the 49 homograph suffixes in a Korean dictionary, yielded 91.01% accuracy.

EVALUATION OF THE SYNTHETIC SPEECH QUALITY BY THE TD-PCULI METHOD

  • Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kwon, Ki-Hyung;Chin, Yong-Ohk
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.977-983 / 1994
  • In this paper, we evaluate the quality of synthetic speech produced by the proposed TD-PCULI speech synthesis method. For the synthesis, we extracted parameters from Korean monosyllables by analyzing speech waveforms in the time domain. We constructed a Korean data format dictionary for synthesis-by-rule based on the frequencies in a large-vocabulary Korean pronunciation dictionary; it contains 19 V-type syllables, 80 CV-type syllables, 30 VC-type syllables, and 100 CVC-type syllables. Using them, we synthesized various Korean monosyllables, words, and sentences. We tested 10 syllables selected from each of the 4 Korean syllable types with the objective MOS (Mean Opinion Score) evaluation method on 4 items, i.e., intelligibility, clearness, loudness, and naturalness, using a randomly selected group of listeners with no prior knowledge of the syllables. We also tested whether duration and F0 could be modified by varying the duration (i.e., 150 msec, 300 msec, 500 msec, 700 msec, and 1 sec) and the central fundamental frequency (i.e., 80 Hz, 118 Hz, 140 Hz, 170 Hz, and 200 Hz). The experimental results show that the noise introduced during synthesis-by-rule was reduced enough to yield very clear speech and that the prosodic elements can be controlled well.


Development of the Rule-based Smart Tourism Chatbot using Neo4J graph database

  • Kim, Dong-Hyun;Im, Hyeon-Su;Hyeon, Jong-Heon;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication / v.13 no.2 / pp.179-186 / 2021
  • We have developed a smart tourism app and Instagram and YouTube content to provide personalized tourism information and travel product information to individual tourists. In this paper, we develop a rule-based smart tourism chatbot with the khaiii (Kakao Hangul Analyzer III) morphological analyzer and the Neo4J graph database. To understand the intention of the user's question, the proposed chatbot system uses a morpheme analyzer, a proper noun dictionary including tourist destination names, and a general noun dictionary containing words frequently used in tourist information searches. The tourism knowledge base built using the Neo4J graph database provides adequate answers to tourists' questions. The nodes of Neo4J are Area, based on the tourist destination address; Contents, with tourist information as properties; and Service, which includes service attribute data frequently used for search. A Neo4J query is created from the analyzed intention of a tourist's question together with the properties of nodes and relationships in the Neo4J database. An answer to the question is made by searching the tourism knowledge base. We create the tourism knowledge base using more than 1,300 items of Jeju tourism information used in the smart tourism app. We plan to develop a multilingual smart tour chatbot using named entity recognition (NER), intention classification using conditional random fields (CRF), and transfer learning using pretrained language models.
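
As a rough illustration of the dictionary-lookup and query-generation steps described above, the sketch below maps a question to an intent and then to a parameterized Cypher query. The node labels `Area`, `Contents`, and `Service` come from the abstract; the relationship types `LOCATED_IN` and `PROVIDES`, the properties `name` and `title`, and the dictionary entries are hypothetical. Plain regular-expression tokenization stands in for the khaiii morphological analyzer.

```python
import re

# Hypothetical dictionary entries; the paper's dictionaries are not listed.
PROPER_NOUNS = {"aewol", "seongsan"}     # tourist destination names
GENERAL_NOUNS = {"parking", "restroom"}  # frequently searched service words


def parse_intent(question):
    """Pick the first destination and service word found in the question."""
    tokens = re.findall(r"\w+", question.lower())
    area = next((t for t in tokens if t in PROPER_NOUNS), None)
    service = next((t for t in tokens if t in GENERAL_NOUNS), None)
    return {"area": area, "service": service}


def build_cypher(intent):
    """Build a parameterized Cypher query string and its parameters.

    Relationship types and property names here are assumptions.
    """
    if intent["area"] is None:
        return None, {}
    query = "MATCH (c:Contents)-[:LOCATED_IN]->(a:Area {name: $area})"
    params = {"area": intent["area"]}
    if intent["service"] is not None:
        query += " MATCH (c)-[:PROVIDES]->(s:Service {name: $service})"
        params["service"] = intent["service"]
    query += " RETURN c.title AS title"
    return query, params
```

Passing values as query parameters rather than formatting them into the string keeps user input out of the query text; the resulting pair could be handed to a Neo4j driver session.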

Determination of Intrusion Log Ranking using Inductive Inference (귀납 추리를 이용한 침입 흔적 로그 순위 결정)

  • Ko, Sujeong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.1-8 / 2019
  • One method for extracting the most appropriate information from a large amount of log data is inductive inference. In this paper, we use an SVM (Support Vector Machine), an excellent classification method for inductive inference, to determine the ranking of intrusion logs in digital forensic analysis. For this purpose, the logs of the training log set are classified into intrusion logs and normal logs. Associated words are extracted from each classified set to generate a related-word dictionary, and each log is expressed as a vector based on the generated dictionary. Next, the logs are learned using the SVM. We classify test logs into normal logs and intrusion logs by using the log set extracted through learning. Finally, the recommendation order of the intrusion logs is determined so that they can be recommended to the forensic analyst.
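
The dictionary-generation and vectorization steps can be sketched as below. This covers only the preprocessing that would feed an SVM; the classifier itself is omitted. Selecting the `top_k` most frequent words per class is an assumed stand-in for the paper's associated-word extraction, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(log):
    """Split a log line into lowercase word tokens."""
    return re.findall(r"[a-z0-9_]+", log.lower())


def build_dictionary(intrusion_logs, normal_logs, top_k=50):
    """Collect the most frequent words of each class into one sorted list.

    Frequency-based selection is an assumption, not the paper's rule.
    """
    words = set()
    for logs in (intrusion_logs, normal_logs):
        counts = Counter(tok for log in logs for tok in tokenize(log))
        words.update(word for word, _ in counts.most_common(top_k))
    return sorted(words)


def vectorize(log, dictionary):
    """Express a log as a count vector over the dictionary words."""
    counts = Counter(tokenize(log))
    return [counts[word] for word in dictionary]
```

The resulting vectors, labeled as intrusion or normal, would be the training input for an SVM, and a signed distance from its decision boundary is one plausible basis for ranking.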

Detection of Adverse Drug Reactions Using Drug Reviews with BERT+ Algorithm (BERT+ 알고리즘 기반 약물 리뷰를 활용한 약물 이상 반응 탐지)

  • Heo, Eun Yeong;Jeong, Hyeon-jeong;Kim, Hyon Hee
    • KIPS Transactions on Software and Data Engineering / v.10 no.11 / pp.465-472 / 2021
  • In this paper, we present an approach for detecting adverse drug reactions from drug reviews to compensate for the limitations of the spontaneous adverse drug reaction reporting system. Considering that negative reviews usually contain adverse drug reactions, we performed sentiment analysis on drug reviews and extracted the negative reviews. We then applied the MedDRA dictionary and named entity recognition to the negative reviews to detect adverse drug reactions. For the experiment, drug reviews of Celecoxib, Naproxen, and Ibuprofen were collected from 5 drug review sites and analyzed. Our results showed that detecting adverse drug reactions from reviews can compensate for the under-reporting limitation of the spontaneous adverse drug reaction reporting system.

Methodology of Automatic Editing for Academic Writing Using Bidirectional RNN and Academic Dictionary (양방향 RNN과 학술용어사전을 이용한 영문학술문서 교정 방법론)

  • Roh, Younghoon;Chang, Tai-Woo;Won, Jongwun
    • The Journal of Society for e-Business Studies / v.27 no.2 / pp.175-192 / 2022
  • Artificial intelligence-based natural language processing technology is playing an important role in helping users write English-language documents. For academic documents in particular, English proofreading services should reflect academic characteristics such as formal style and technical terms. However, these services usually do not, because they are based on general English sentences. In addition, since existing studies mainly aim to improve grammatical completeness, their fluency improvement is limited. This study proposes an automatic academic English editing methodology that conveys the meaning of sentences clearly based on the use of technical terms. The proposed methodology consists of two phases: misspelling correction and fluency improvement. In the first phase, appropriate corrective words are provided according to the input typo and its context. In the second phase, the fluency of the sentence is improved using an automatic post-editing model based on a bidirectional recurrent neural network that learns from pairs of original and edited sentences. Experiments were performed with actual English editing data, and the superiority of the proposed methodology was verified.
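
The first phase, suggesting corrective words from an academic dictionary, might look roughly like the sketch below. It uses string similarity from Python's standard `difflib` module in place of the paper's context-aware correction, which the abstract does not specify. The term list and the name `suggest` are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical academic term dictionary; the paper's dictionary is not given.
ACADEMIC_TERMS = ["methodology", "hypothesis", "regression", "heuristic"]


def suggest(word, dictionary=ACADEMIC_TERMS, n=3):
    """Return up to n dictionary terms similar to a suspected typo.

    Words already in the dictionary need no correction, so they
    yield an empty list. Context is ignored in this sketch.
    """
    lowered = word.lower()
    if lowered in dictionary:
        return []
    return get_close_matches(lowered, dictionary, n=n, cutoff=0.6)
```

A context-aware version would rerank these candidates with a language model score for the surrounding sentence.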

Artificial Intelligence Algorithms, Model-Based Social Data Collection and Content Exploration (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk;Leem, Choon Seong
    • The Journal of Bigdata / v.4 no.2 / pp.23-34 / 2019
  • Recently, crimes that exploit digital platforms have been continuously increasing: about 140,000 cases occurred in 2015 and about 150,000 in 2016. Conventional investigation techniques are therefore considered limited in handling such online crimes. The manual online searches and cognitive investigation methods that investigators broadly use today are not enough to cope proactively with rapidly changing civil crimes. In addition, the characteristics of content posted to unspecified users of social media make investigations more difficult. This study suggests site-based collection and the Open API among web content collection methods, considering the characteristics of the online media where infringement crimes occur. Since illegal content is published and deleted quickly, and new words and variants are generated quickly and in many forms, it is difficult to recognize them promptly with dictionary-based morphological analysis that relies on manual registration. To solve this problem, we propose a tokenizing method that augments existing dictionary-based morphological analysis with WPM (Word Piece Model), a data preprocessing method for quickly recognizing and responding to illegal content posted in online infringement crimes. In the data analysis, the optimal precision is verified through a vote-based ensemble method that uses classification models based on supervised learning for the investigation of illegal content. This study applies a sorting algorithm model centered on illegal multilevel business cases to proactively recognize crimes that invade the public economy, and presents an empirical study to deal effectively with social data collection and content investigation.
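
To show why a word-piece vocabulary can handle words missing from a manually maintained dictionary, the sketch below implements greedy longest-match-first subword tokenization. The `##` continuation prefix and `[UNK]` token follow the convention popularized by BERT and are assumptions here, as is the toy vocabulary. The abstract does not describe the authors' exact tokenizer or how its vocabulary was trained.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right.

    Pieces after the first carry a '##' prefix. If any position has no
    matching piece, the whole word maps to the unknown token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        pieces.append(piece)
        start = end
    return pieces
```

With a toy vocabulary `{"un", "##aff", "##able"}`, the unseen word `unaffable` splits into known pieces, so a newly coined variant can still be represented without a manual dictionary entry.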


Analysis of the Yearbook from the Korea Meteorological Administration using a text-mining algorithm (텍스트 마이닝 알고리즘을 이용한 기상청 기상연감 자료 분석)

  • Sun, Hyunseok;Lim, Changwon;Lee, YungSeop
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.603-613 / 2017
  • Many people have recently posted about personal interests on social media. The development of the Internet and computer technology has enabled documents to be stored in digital form, which has resulted in an explosion in the amount of textual data generated; consequently, there is an increased demand for technology that creates valuable information from a large number of documents. Text mining techniques are often used because text-based data is mostly unstructured and therefore not suitable for the direct application of statistical analysis or data mining techniques. This study analyzed the Meteorological Yearbook data of the Korea Meteorological Administration (KMA) with a text mining technique. First, a term dictionary was constructed through preprocessing and a term-document matrix was generated. This term dictionary was then used to calculate the annual frequency of each term and to observe the change in relative frequency for frequently appearing words. We also used regression analysis to identify terms with increasing and decreasing trends. We analyzed trends in the Meteorological Yearbook of the KMA, including trends in weather-related news, weather status, and the work that the KMA focused on. This study aims to provide useful information that can help analyze and improve meteorological services and reflect meteorological policy.
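
The frequency and trend steps can be illustrated with a small sketch: compute each term's relative frequency per year, then fit an ordinary least-squares line over the years and read the sign of the slope. Whitespace tokenization, one document per year, and the function names are simplifying assumptions. The abstract does not state the exact regression model used.

```python
from collections import Counter


def relative_frequency(docs_by_year, term):
    """Return {year: share of that year's tokens equal to term}."""
    result = {}
    for year, text in docs_by_year.items():
        tokens = text.lower().split()
        counts = Counter(tokens)
        result[year] = counts[term] / len(tokens) if tokens else 0.0
    return result


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def trend(docs_by_year, term):
    """Positive slope suggests an increasing term, negative a decreasing one."""
    freq = relative_frequency(docs_by_year, term)
    years = sorted(freq)
    return ols_slope(years, [freq[y] for y in years])
```

In practice one would also test whether the slope differs significantly from zero before labeling a term as increasing or decreasing.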

A study on procedures of search and seize in digital data

  • Kim, Woon Go
    • Journal of the Korea Society of Computer and Information / v.22 no.2 / pp.133-139 / 2017
  • Today, not only the activities of individuals and corporations but also the future of society, referred to as the fourth industrial revolution, depend on digital technology. Since the traces of crimes committed in a digital society must inevitably be found in digital form, judicial dependence on digital evidence in criminal procedure inevitably increases. On the other hand, as systems move to cloud computing environments in which many users share the virtual computing resources of service providers, a search for evidence in cloud computing resources may reach data unrelated to the crime. This increases the possibility of infringing on basic rights in criminal procedure, so the admissibility of digital data as evidence in criminal procedure is limited. Therefore, considering these two aspects of digital evidence, this point should be fully taken into account both in prior control at the warrant issuance and execution stage and in judging the admissibility of evidence after seizure. There is a view that prior control is useless, but it needs to be exercised leniently so that post hoc control through judging the admissibility of evidence can be made effective. In other words, more effort than ever is needed, including legislation, to ensure proper criminal procedures in line with the digital age.