• Title/Summary/Keyword: Data dictionary

Development and Validation of the Letter-unit based Korean Sentimental Analysis Model Using Convolution Neural Network (회선 신경망을 활용한 자모 단위 한국형 감성 분석 모델 개발 및 검증)

  • Sung, Wonkyung;An, Jaeyoung;Lee, Choong C.
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.13-33 / 2020
  • This study proposes a Korean sentiment analysis algorithm that uses letter-unit (jamo) embedding and convolutional neural networks. Sentiment analysis is a natural language processing technique for analyzing subjective information expressed in text, such as a person's attitudes, opinions, and propensities. Research on Korean sentiment analysis has recently increased steadily. However, it has not used a general-purpose sentiment dictionary; instead, each field has built and used its own. The problem with this practice is that it does not reflect the characteristics of the Korean language. In this study, we developed a sentiment analysis model that builds syllable vectors from the onset, peak (nucleus), and coda of each syllable, skipping morphological analysis in the sentiment analysis procedure. As a result, we were able to minimize word-learning problems and the problem of unregistered words, and the model achieved an accuracy of 88%. The model is less affected by the unstructured nature of the input data and allows polarity classification according to the context of the text. We hope this model will make Korean sentiment analysis easier for non-experts.
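To make the onset/peak/coda idea concrete, the sketch below splits precomposed Hangul syllables into their three jamo using standard Unicode arithmetic: a syllable's code point minus U+AC00 encodes the onset, nucleus, and coda indices. It covers only the decomposition step. How the authors turn jamo into embedding vectors and feed them to the CNN is not described here, and the names `decompose` and `jamo_tokens` are illustrative.

```python
# Hangul syllable decomposition into onset / nucleus (peak) / coda.
ONSETS = ["ㄱ", "ㄲ", "ㄴ", "ㄷ", "ㄸ", "ㄹ", "ㅁ", "ㅂ", "ㅃ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅉ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
NUCLEI = ["ㅏ", "ㅐ", "ㅑ", "ㅒ", "ㅓ", "ㅔ", "ㅕ", "ㅖ", "ㅗ", "ㅘ", "ㅙ",
          "ㅚ", "ㅛ", "ㅜ", "ㅝ", "ㅞ", "ㅟ", "ㅠ", "ㅡ", "ㅢ", "ㅣ"]
CODAS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

SYLLABLE_BASE = 0xAC00   # code point of "가"
SYLLABLE_COUNT = 11172   # 19 onsets * 21 nuclei * 28 codas (incl. none)


def decompose(text):
    """Return one (onset, nucleus, coda) triple per character.

    Characters outside the precomposed Hangul block pass through
    unchanged as (char, "", "")."""
    triples = []
    for ch in text:
        index = ord(ch) - SYLLABLE_BASE
        if 0 <= index < SYLLABLE_COUNT:
            onset, rest = divmod(index, 21 * 28)
            nucleus, coda = divmod(rest, 28)
            triples.append((ONSETS[onset], NUCLEI[nucleus], CODAS[coda]))
        else:
            triples.append((ch, "", ""))
    return triples


def jamo_tokens(text):
    """Flatten the triples into a jamo token sequence, dropping empty codas."""
    return [j for triple in decompose(text) for j in triple if j]
```

Operating on jamo rather than on whole words means a misspelled or unregistered word still shares most of its units with known words, which is consistent with the abstract's point about unregistered words.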

Automatic Merging of Distributed Topic Maps based on T-MERGE Operator (T-MERGE 연산자에 기반한 분산 토픽맵의 자동 통합)

  • Kim Jung-Min;Shin Hyo-Pil;Kim Hyoung-Joo
    • Journal of KIISE:Software and Applications / v.33 no.9 / pp.787-801 / 2006
  • Ontology merging is the process of integrating two ontologies into a new ontology. How best to do this remains a subject of ongoing research in the Semantic Web, data integration, knowledge management systems, and other ontology-related applications. Earlier research on ontology merging, however, has focused on developing effective ontology matching approaches and has not analyzed or solved the problems of merging two ontologies once the correspondences between them are given. In this paper, we propose a specific ontology merging process and a generic operator, T-MERGE, for integrating two source ontologies into a new ontology. We also define a taxonomy of merging conflicts arising from differing representations in the input ontologies, together with a method for detecting and resolving them. The T-MERGE operator encapsulates conflict detection and resolution and the merging of two entities based on the given correspondences between them. We define a data structure, MergeLog, for logging the execution of the T-MERGE operator; MergeLog is used to report detailed merge results to users and to recover from errors. In our experiments, we used oriental philosophy ontologies, western philosophy ontologies, the Yahoo western philosophy dictionary, and the Naver philosophy dictionary as input ontologies. The experiments show that the automatic merging module has advantages in time and effort over manual merging by an expert.
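A minimal sketch of a T-MERGE-style operator, assuming each topic map is a dictionary of topics with names and types and that correspondences arrive as pairs of topic ids. The single conflict check (both topics typed, but differently) and the rename rule for clashing ids are hypothetical stand-ins for the paper's conflict taxonomy, and a plain list of `MergeLogEntry` records stands in for MergeLog.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    id: str
    names: set[str] = field(default_factory=set)
    types: set[str] = field(default_factory=set)


@dataclass
class MergeLogEntry:
    action: str   # "merge", "copy", "rename", or "conflict"
    detail: str


def t_merge(a, b, correspondences, log):
    """Hypothetical T-MERGE-style operator: merge topic map b into a copy of a.

    a, b: dict mapping topic id -> Topic (inputs are not modified)
    correspondences: list of (a_id, b_id) pairs denoting the same subject
    log: list that receives MergeLogEntry records
    """
    merged = {tid: Topic(t.id, set(t.names), set(t.types)) for tid, t in a.items()}
    target_of = {b_id: a_id for a_id, b_id in correspondences}

    for b_id, topic in b.items():
        if b_id in target_of:
            target = merged[target_of[b_id]]
            # Assumed conflict kind: both sides typed, but with different types.
            if topic.types and target.types and topic.types != target.types:
                log.append(MergeLogEntry("conflict", f"type mismatch: {target.id} vs {b_id}"))
            target.names |= topic.names
            target.types |= topic.types
            log.append(MergeLogEntry("merge", f"{b_id} -> {target.id}"))
            continue
        # No correspondence: copy the topic, renaming it if its id clashes.
        new_id = b_id
        while new_id in merged:
            new_id += "_b"
        if new_id != b_id:
            log.append(MergeLogEntry("rename", f"{b_id} -> {new_id}"))
        merged[new_id] = Topic(new_id, set(topic.names), set(topic.types))
        log.append(MergeLogEntry("copy", new_id))
    return merged
```

Recording every merge, copy, rename, and conflict as a separate entry keeps the log usable both for reporting results to a user and for tracing back an unwanted merge.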

A Study on the New Trends of EDI based Internet (인터넷을 기반으로 하는 EDI 신조류)

  • 조원길
    • The Journal of Information Technology / v.4 no.1 / pp.125-139 / 2001
  • EDI (Electronic Data Interchange) works by providing a collection of standard message formats and an element dictionary, giving businesses a simple way to exchange data via any electronic messaging service. Open-edi is electronic data interchange among autonomous parties that uses public standards and aims at interoperability across time, business sectors, information technologies, and data types. The number of Internet services using XML/EDI has grown rapidly because XML/EDI is easily extensible and exchangeable. To use such a service, the client does not need to install EDI software, only an Internet browser. Consequently, it has become much easier and faster to handle the trading process in an office. XML/EDI is eBusiness electronic data interchange based on XML (eXtensible Markup Language). eXedi is a service that realizes B2B over XML/EDI; it can be used easily by small and medium-sized companies, and companies anywhere can access it through their existing Internet connection. XML/EDI provides a standard framework for exchanging different types of data (for example, an invoice, a healthcare claim, or a project status), so that information, whether in a transaction, exchanged via an Application Program Interface (API), web automation, a database portal, a catalog, a workflow document, or a message, can be searched, decoded, manipulated, and displayed consistently and correctly. This is achieved by first implementing EDI dictionaries and then extending our vocabulary via online repositories to include our business language, rules, and objects.
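To illustrate how a shared element dictionary lets two parties exchange structured messages, here is a sketch using Python's standard `xml.etree.ElementTree`. The invoice elements and the contents of `ELEMENT_DICTIONARY` are hypothetical and do not come from any EDI or XML/EDI standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical element dictionary for an invoice message (not a real standard).
ELEMENT_DICTIONARY = {
    "Invoice": "Root element of an invoice message",
    "InvoiceNumber": "Identifier assigned by the seller",
    "Seller": "Trading party issuing the invoice",
    "Buyer": "Trading party receiving the invoice",
    "Amount": "Total amount due",
}


def build_invoice(number, seller, buyer, amount):
    """Serialize an invoice as XML using only dictionary elements."""
    root = ET.Element("Invoice")
    for tag, value in (("InvoiceNumber", number), ("Seller", seller),
                       ("Buyer", buyer), ("Amount", amount)):
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def undefined_elements(xml_text):
    """List element names in a message that the dictionary does not define."""
    root = ET.fromstring(xml_text)
    return sorted({el.tag for el in root.iter()} - set(ELEMENT_DICTIONARY))


def read_fields(xml_text):
    """Decode a flat invoice message into a tag -> text mapping."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}
```

Because both sender and receiver consult the same dictionary, the receiver can reject or flag elements it does not know; extending the vocabulary amounts to adding entries to the shared dictionary.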

Text Mining for Korean: Characteristics and Application to 2011 Korean Economic Census Data (한국어 텍스트 마이닝의 특성과 2011 한국 경제총조사 자료에의 응용)

  • Goo, Juna;Kim, Kyunga
    • The Korean Journal of Applied Statistics / v.27 no.7 / pp.1207-1217 / 2014
  • The 2011 Korean Economic Census was the first economic census in Korea. It contains text data on the menus served by Korean-food restaurants as well as structured data on restaurant characteristics, including area, opening year, and total sales. In this paper, we applied text mining to the text data and investigated the statistical and technical issues and characteristics of Korean text mining. Pork belly roast was the most popular menu across provinces and/or restaurant types in 2010, and the number of restaurants per 10,000 people was especially high in Kangwon-do and Daejeon Metropolitan City. Beef tartare and fried pork cutlet are popular menus in start-up restaurants, while whole chicken soup and maeuntang (spicy fish stew) are popular in long-lived restaurants. These results can serve as a guideline for menu development by restaurant owners and for government policy-making that helps small restaurants choose suitable menus for a successful business.
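A sketch of the kind of frequency counting and per-capita rate behind results like these, meant to be run on toy records rather than census data. Removing spaces before counting is an assumed preprocessing step, since Korean menu names are often written with inconsistent spacing (for example, "삼겹살 구이" and "삼겹살구이"); the paper's actual pipeline is not described here.

```python
from collections import Counter


def normalize_menu(item):
    """Assumed preprocessing: drop all whitespace so spacing variants match."""
    return "".join(item.split())


def top_menus(records, k=3):
    """records: (province, comma-separated menu text) pairs.

    Returns the k most frequent normalized menu names with their counts."""
    counts = Counter()
    for _province, menu_text in records:
        for item in menu_text.split(","):
            item = normalize_menu(item)
            if item:
                counts[item] += 1
    return counts.most_common(k)


def per_10000(restaurants, population):
    """Restaurants per 10,000 residents."""
    return restaurants * 10_000 / population
```

Without the normalization step, the two spellings of the same dish would be counted as different menus, which is one reason Korean text needs different preprocessing from space-delimited English.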

An Analysis of IT Proposal Evaluation Results using Big Data-based Opinion Mining (빅데이터 분석 기반의 오피니언 마이닝을 이용한 정보화 사업 평가 분석)

  • Kim, Hong Sam;Kim, Chong Su
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.1 / pp.1-10 / 2018
  • Current evaluation practices for IT projects suffer from several problems, including the difficulty of explaining the evaluation results and an improperly scaled scoring system. This study aims to develop an opinion mining methodology that extracts key factors for causal relationship analysis, and to assess the feasibility of quantifying evaluation scores from text comments using opinion mining based on big data analysis. The research was performed in the domain of publicly procured IT proposal evaluations, which are managed by the National Procurement Service. Around 10,000 sets of comments and evaluation scores were gathered; most were in digital form, but some were paper documents, so a more refined form of the text was prepared using various tools. Keywords for factors and polarity indicators were extracted from this text, and domain experts selected some of them as key factors and indicators. These keywords were also grouped into dimensions. The causal relationship between keyword or dimension factors and evaluation scores was analyzed with correlation and regression analysis under two research models: a keyword-based model and a dimension-based model. The results show that keyword factors such as planning, strategy, technology, and PM have the greatest effect on the evaluation result, and that keywords are more appropriate factors for causal relationship analysis than dimensions. The analysis also suggests that evaluation scores can be composed or calculated from unstructured text comments using opinion mining, provided a comprehensive polarity dictionary for Korean is available. This study may contribute to big data-based evaluation methodology and opinion mining for IT proposal evaluation, leading to more reliable and effective IT proposal evaluation methods.

A Study on the Polarity of Apartment Price News Using Big Data Analysis Method (빅데이터 분석기법을 활용한 아파트 가격 관련 뉴스 기사의 극성 분석)

  • Cho, Sang-Yeon;Hong, Eun-Pyo
    • Journal of Digital Convergence / v.17 no.9 / pp.47-54 / 2019
  • This study examines the polarity of news articles on apartment prices using opinion mining, which has been widely used for big data analysis. The analyses used internet news articles posted on Naver in two years, 2012 and 2018. We proposed a sentiment analysis model and designed a method for constructing a topic-oriented sentiment dictionary. Analysis with the proposed model confirmed that media companies differed, according to their leanings, in the social issues they selected when apartment prices were rising. We also found more positive articles from media companies whose sentiment was similar to that of the government in power. In this paper, we proposed a sentiment analysis model that can be used in the real estate field and analyzed the polarity of unstructured real estate data. To apply the model to various fields in the future, sentiment dictionaries should be built by theme, and diverse unstructured data should be collected over longer periods.
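A sketch of a topic-oriented sentiment dictionary, in which each topic carries its own term polarities, and of aggregating article polarity by media outlet. The dictionary entries, the topic key `apartment_price`, and the term-counting rule are assumptions for illustration, not the dictionary the authors built.

```python
from collections import Counter

# Hypothetical topic-oriented dictionary: term polarity is defined per topic.
TOPIC_DICTIONARY = {
    "apartment_price": {"안정": 1, "회복": 1, "폭등": -1, "급락": -1, "불안": -1},
}


def article_polarity(text, topic):
    """Classify an article by summing topic-specific term weights times occurrences."""
    lexicon = TOPIC_DICTIONARY[topic]
    score = sum(weight * text.count(term) for term, weight in lexicon.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def positive_share_by_outlet(articles, topic):
    """articles: (outlet, text) pairs; returns each outlet's share of positive articles."""
    totals, positives = Counter(), Counter()
    for outlet, text in articles:
        totals[outlet] += 1
        if article_polarity(text, topic) == "positive":
            positives[outlet] += 1
    return {outlet: positives[outlet] / totals[outlet] for outlet in totals}
```

Keying the lexicon by topic reflects the abstract's call for theme-specific dictionaries: the same word can carry different polarity in, say, real estate news and stock market news.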

Development of Online Fashion Thesaurus and Taxonomy for Text Mining (텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축)

  • Seyoon Jang;Ha Youn Kim;Songmee Kim;Woojin Choi;Jin Jeong;Yuri Lee
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.6 / pp.1142-1160 / 2022
  • Text data plays a significant role in understanding and analyzing trends in the consumer, business, and social sectors. Text analysis requires a corpus that reflects specific domain knowledge, but in the field of fashion, professional corpora are insufficient. This study aims to develop a taxonomy and thesaurus that account for the specialized nature of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WGSN, Pantone, and online platforms, and terms were then extracted through preprocessing in Python. The taxonomy consists of seven clothing attributes: items, silhouettes, details, styles, colors, textiles, and patterns/prints. The corpus was completed by processing synonyms of terms from fashion books such as dictionaries. In total, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy, and all data were then developed into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus was also verified by comparing its text mining results with those obtained using a previously developed corpus. This study contributes to a text data standard and enables meaningful text mining analysis in the fashion field.
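A sketch of how a thesaurus and taxonomy might be applied during text mining: variant terms are first mapped to a standard term, which is then looked up under one of the seven attributes. The entries in `SYNONYMS` and `TAXONOMY` are hypothetical examples, not taken from the authors' web dictionary.

```python
# The seven clothing attributes named in the abstract.
ATTRIBUTES = ("items", "silhouettes", "details", "styles",
              "colors", "textiles", "patterns/prints")

# Hypothetical thesaurus entries: variant (lowercased) -> standard term.
SYNONYMS = {"청바지": "데님 팬츠", "jeans": "데님 팬츠",
            "오버핏": "오버사이즈", "oversized": "오버사이즈"}

# Hypothetical taxonomy entries: standard term -> attribute.
TAXONOMY = {"데님 팬츠": "items", "오버사이즈": "silhouettes", "체크": "patterns/prints"}


def standardize(term):
    """Map a raw term to its standard form (unknown terms are returned normalized)."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def tag_terms(terms):
    """Group standardized terms by attribute; unknown terms go under 'unclassified'."""
    grouped = {}
    for term in terms:
        std = standardize(term)
        attr = TAXONOMY.get(std, "unclassified")
        grouped.setdefault(attr, []).append(std)
    return grouped
```

Collapsing synonyms before counting prevents one concept (for example, Korean and English names for the same garment) from being split across several low-frequency terms.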

Classification of Unstructured Customer Complaint Text Data for Potential Vehicle Defect Detection (잠재적 차량 결함 탐지를 위한 비정형 고객불만 텍스트 데이터 분류)

  • Ju Hyun Jo;Chang Su Ok;Jae Il Park
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.2 / pp.72-81 / 2023
  • This research proposes a novel approach to the challenge of categorizing unstructured customer complaints in the automotive industry. The goal is to identify potential vehicle defects from the algorithm's findings, helping automakers mitigate the significant losses and reputational damage caused by mass claims. To this end, our model uses the Word2Vec method to analyze large volumes of unstructured customer complaint data from the National Highway Traffic Safety Administration (NHTSA). By developing a score dictionary for eight pre-selected criteria, our algorithm can efficiently categorize complaints and detect potential vehicle defects: calculating the score of each complaint lets it identify patterns and correlations that may indicate defects in the vehicle. One key benefit of this approach is its ability to handle large volumes of unstructured data, which can be challenging for traditional methods. Machine learning techniques let us extract meaningful insights from customer complaints, which can help automakers prioritize and address potential defects before they become widespread. In conclusion, this research offers a promising approach for categorizing unstructured customer complaints in the automotive industry and identifying potential vehicle defects, helping automakers improve the quality of their products and enhance customer satisfaction. Further studies can build on this approach to explore other applications and extend it to other industries.
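A minimal sketch of a score dictionary for complaint categorization. It assumes word vectors have already been trained with Word2Vec (small toy vectors would stand in for them) and that each criterion is defined by a few seed words. Scoring a word by its best cosine similarity to a criterion's seeds, and assigning a complaint to the highest-scoring criterion, are assumptions: the paper's eight criteria and exact scoring rule are not given in this summary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_score_dictionary(vectors, seeds):
    """Score every vocabulary word against each criterion.

    vectors: word -> embedding (assumed to come from a trained Word2Vec model)
    seeds: criterion -> list of seed words for that criterion
    Returns word -> {criterion: best cosine similarity to any seed word}.
    """
    return {
        word: {c: max(cosine(vec, vectors[s]) for s in words)
               for c, words in seeds.items()}
        for word, vec in vectors.items()
    }


def classify(complaint, table):
    """Sum per-criterion scores over known tokens; return the top criterion or None."""
    totals = {}
    for token in complaint.lower().split():
        for criterion, score in table.get(token, {}).items():
            totals[criterion] = totals.get(criterion, 0.0) + score
    return max(totals, key=totals.get) if totals else None
```

Precomputing the dictionary once means each complaint is scored with simple lookups, which keeps the per-complaint cost low for large complaint volumes.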

The Sensitivity Analysis for Customer Feedback on Social Media (소셜 미디어 상 고객피드백을 위한 감성분석)

  • Song, Eun-Jee
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.4 / pp.780-786 / 2015
  • Social media such as social network services contain many spontaneous customer opinions, so companies have recently been collecting and analyzing customer feedback with systems that analyze big data on social media in order to run their businesses efficiently. However, data collected from online sites are difficult to analyze accurately with existing morphological analyzers because they contain spacing and spelling errors. In addition, many online sentences are short and contain too few features to select from, so established feature selection methods such as mutual information and the chi-square statistic cannot support sentiment classification well. To solve these problems, this paper proposes a module that revises meanings using initial consonants/vowels and a phrase pattern dictionary, together with a feature selection method that uses the priority of word classes in a sentence. Based on the word classes extracted by the morphological analyzer, these mechanisms separate and analyze predicates and substantives, build a property database subordinate to each word class, and extract positive/negative sentiment using the accumulated property database.
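A sketch of initial-consonant normalization followed by lexicon-based polarity. Both dictionaries are hypothetical, and whitespace splitting with prefix matching is a crude stand-in for the morphological analyzer and word-class priorities the paper describes.

```python
# Hypothetical dictionaries; entries are illustrative only.
INITIAL_CONSONANT_DICT = {"ㄱㅅ": "감사", "ㅊㅋ": "축하", "ㅂㄹ": "별로"}
POLARITY = {"감사": 1, "축하": 1, "좋": 1, "별로": -1, "실망": -1}


def normalize(tokens):
    """Expand initial-consonant abbreviations (e.g. "ㄱㅅ") to full words."""
    return [INITIAL_CONSONANT_DICT.get(t, t) for t in tokens]


def sentiment(sentence):
    """Return "positive", "negative", or "neutral" for one short sentence."""
    tokens = normalize(sentence.split())
    # Prefix matching approximates stemming, e.g. "좋아요" matches the stem "좋".
    score = sum(w for t in tokens for term, w in POLARITY.items() if t.startswith(term))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Expanding abbreviations before the polarity lookup is what lets short, informal posts such as "배송 빠름 ㄱㅅ" contribute a sentiment signal at all.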

The Implementation of Database Building System for Korean Medical Paper Database (한의학술논문 데이터베이스 구축을 위한 입력 및 검수 시스템 개발)

  • Yea, Sang-Jun;Kim, Ik-Tae;Jang, Yun-Ji;Seong, Bo-Seok;Jang, Hyun-Chul;Kim, Sang-Kyun;Kim, An-Na;Song, Mi-Young;Kim, Chul
    • Korean Journal of Oriental Medicine / v.18 no.3 / pp.141-146 / 2012
  • Objectives: KIOM (Korean Institute of Oriental Medicine) has built a Korean medical paper database and provides it through the information portal OASIS. About 1,600 papers and 48,000 references are added to the database annually. Because updating the database requires considerable manpower and time, improving the efficiency and quality of this work is very important. Methods: In this paper, we implemented a web-based database building system that uses OASIS's pre-built database to improve the working process, data quality, and ease of management. Results: First, we designed and implemented a web-based system for entering paper bibliographies efficiently; reusing OASIS's paper and reference database raised efficiency. Second, we improved the refining process with the web-based system to raise data quality. Third, we developed manager functions in the web-based system to control and check the working process. Conclusions: If a Korean medical dictionary is added and external paper databases are linked in the future, we expect work efficiency and data quality to improve further. Because the database schemas of the OASIS system and the developed system differ, we are implementing a data transformation system.
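A sketch of the kind of check an input and validation system might run on each bibliographic record: required fields are present, the year is numeric, and the title does not duplicate one already in the database. The field names and the title normalization rule are hypothetical and are not taken from the OASIS schema.

```python
import re

REQUIRED_FIELDS = ("title", "authors", "journal", "year")  # hypothetical schema


def normalize_title(title):
    """Lowercase and strip non-word characters so near-identical titles compare equal."""
    return re.sub(r"\W+", "", title).lower()


def check_record(record, existing_titles):
    """Return a list of problems found in one bibliographic record.

    existing_titles: set of normalize_title() values already in the database.
    """
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not record.get(f)]
    year = record.get("year")
    if year and not str(year).isdigit():
        problems.append("year is not numeric")
    title = record.get("title", "")
    if title and normalize_title(title) in existing_titles:
        problems.append("possible duplicate of an existing record")
    return problems
```

Comparing normalized titles lets the system flag records that differ only in punctuation or spacing, a common source of duplicate entries when many papers are keyed in by hand.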