• Title/Summary/Keyword: Document Frequency

Terms Based Sentiment Classification for Online Review Using Support Vector Machine (Support Vector Machine을 이용한 온라인 리뷰의 용어기반 감성분류모형)

  • Lee, Taewon;Hong, Taeho
    • Information Systems Review / v.17 no.1 / pp.49-64 / 2015
  • Customer reviews containing subjective opinions about products or services in online stores are being generated rapidly, and their influence on customers has become immense due to the widespread use of SNS. In addition, a number of studies have focused on opinion mining to analyze positive and negative opinions and to find better solutions for customer support and sales. For opinion mining, it is very important to select the key terms that reflect customers' sentiment in the reviews. We propose a document-level, terms-based sentiment classification model that selects the optimal terms using part-of-speech (POS) tags. Support vector machines (SVMs) are used to build a predictor for opinion mining, and we used combinations of POS tags and four term extraction methods for the feature selection of the SVM. To validate the proposed opinion mining model, we applied it to customer reviews on Amazon. After crawling 80,000 reviews, we eliminated meaningless terms known as stopwords and extracted useful terms using a part-of-speech tagging approach. The terms extracted by document frequency, TF-IDF, information gain, and the chi-squared statistic were ranked, and 20 ranked terms were used as features of the SVM model. Our experimental results show that the SVM model with four POS tags outperforms the benchmark model, which is built by extracting only adjective terms. In addition, the SVM model based on the chi-squared statistic shows the best performance among the SVM models built with the four term extraction methods. The proposed opinion mining model is expected to improve customer service and help online stores gain a competitive advantage.
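Two of the term-ranking criteria named in this abstract, document frequency and the chi-squared statistic, can be sketched in a few lines of Python. The abstract gives no formulas, so the textbook definitions below are an assumption about what the authors computed, and the function names and the toy reviews are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def chi_squared(docs, labels, term, positive_label):
    """Chi-squared statistic of the 2x2 table (term present?) x (class positive?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        is_pos = label == positive_label
        if present and is_pos:
            a += 1  # term present, positive class
        elif present:
            b += 1  # term present, other class
        elif is_pos:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical toy reviews, already tokenized and stopword-filtered.
reviews = [["good", "battery"], ["good", "screen"], ["bad", "battery"], ["bad", "screen"]]
sentiment = ["pos", "pos", "neg", "neg"]
df = document_frequency(reviews)
chi_good = chi_squared(reviews, sentiment, "good", "pos")
chi_battery = chi_squared(reviews, sentiment, "battery", "pos")
```

In this toy data "good" and "battery" have the same document frequency, but only "good" is associated with a class, which is the kind of difference a class-aware criterion such as chi-squared can capture.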

A Study on the "Terms of Reference" in the ICC Rules of Arbitration (ICC 중재규칙(ICC Rules of Arbitration)의 "위탁조건"(Terms of Reference)에 관한 연구)

  • Oh, Won-Suk
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.31 / pp.81-106 / 2006
  • The Terms of Reference are one of the most distinctive features of ICC arbitration. No document of this type is required to be drawn up under the rules of any of the other major international arbitration institutions. The purpose of this paper is to examine their advantages and to introduce the main contents of Article 18 of the ICC Rules of Arbitration, which have led to the wide recognition of the Terms of Reference. As the volume of our international commercial transactions ranks about tenth in the world, the use of ICC arbitration is expected to increase continuously. The Terms of Reference provide the parties and the arbitrators with an opportunity to identify and agree on procedural and other matters, such as the applicable law, the language of the arbitration, and the timetable for the arbitration. They also afford the parties and the arbitrators an opportunity to identify the substantive issues to be addressed in the arbitration and to delimit the precise scope of the Arbitral Tribunal's mandate. The contents of the Terms of Reference provided in Article 18(1) include a summary of the parties' claims, a list of issues, and the procedural rules. As for their effects, the Terms of Reference are not intended to replace the parties' arbitration agreement, but in certain circumstances they may be regarded as a form of submission agreement. Article 18(2) provides that the Terms of Reference shall be signed by the parties and the Arbitral Tribunal, and requires the Arbitral Tribunal to transmit a signed copy of the Terms of Reference to the Court within two months of the date on which the file was transmitted to it by the Secretariat. The Court has the power to extend the two-month time limit for the Terms of Reference on the reasoned request of the Arbitral Tribunal or on the Court's own initiative. Article 18(3) provides that if any of the parties refuses to take part in drawing up the Terms of Reference or to sign them, they shall be submitted to the Court for approval. Article 18(4) allows the Arbitral Tribunal to establish a provisional timetable in a separate document. This provision encourages the acceleration of the arbitration process. The timetable provided for therein is merely "provisional" and may be modified, as necessary, during the course of the arbitration.

Representative Labels Selection Technique for Document Cluster using WordNet (문서 클러스터를 위한 워드넷기반의 대표 레이블 선정 방법)

  • Kim, Tae-Hoon;Sohn, Mye
    • Journal of Internet Computing and Services / v.18 no.2 / pp.61-73 / 2017
  • In this paper, we propose a document cluster labeling method that uses the information content of the words in clusters to understand what the clusters imply. To do so, we calculate the weight and frequency of the words. These two measures are used to determine the weight among the words in the cluster. As a next step, we identify candidate labels using WordNet. At this stage, the candidate labels are matched to the least common hypernym of the words in the cluster. Finally, the representative labels are determined with respect to the information content and the weight of the words. To demonstrate the superiority of our method, we perform a heuristic experiment using two measures: the suitability of the candidate label ($Suitability_{cl}$) and the appropriacy of the representative label ($Appropriacy_{rl}$). With the proposed method, the suitability of the candidate label decreases slightly compared with existing methods, but the computational cost is about 20% of that of the conventional methods. We also confirmed that the appropriacy of the representative label is better than that of the existing methods. As a result, the method is expected to help data analysts interpret document clusters more easily.
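The candidate-label step in this abstract relies on finding a hypernym shared by the words of a cluster. The sketch below illustrates that idea on a tiny hand-written taxonomy instead of WordNet. It assumes that "least common hypernym" means the most specific ancestor shared by all words; the `PARENT` table and the function names are hypothetical and not taken from the paper.

```python
# Hypothetical toy taxonomy (child -> parent); a real system would query WordNet.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore",
    "carnivore": "animal", "animal": "entity",
}


def hypernym_path(word):
    """Return the chain from a word up to the taxonomy root, word first."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def least_common_hypernym(words):
    """Return the most specific node shared by every word's hypernym path."""
    if not words:
        return None
    paths = [hypernym_path(w) for w in words]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking the first path bottom-up, the first shared node is the deepest one.
    for node in paths[0]:
        if node in shared:
            return node
    return None


label_close = least_common_hypernym(["dog", "wolf"])
label_wide = least_common_hypernym(["dog", "wolf", "cat"])
```

A tight cluster yields a specific label ("canine"), while adding a more distant word pushes the candidate label up the hierarchy ("carnivore").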

Code for Unplanned Encounters at Sea(CUES): Its Limitation and Recommendations for Improvement (해상에서의 우발적 조우 시 신호 규칙(CUES)의 제한점과 개선을 위한 제언)

  • Oh, Dongkeon
    • Strategy21 / s.44 / pp.323-351 / 2018
  • Adopted at the Western Pacific Naval Symposium (WPNS) in 2014, the Code for Unplanned Encounters at Sea (CUES) has been the most valuable output in WPNS history. First written and proposed by the Australian Navy in 1999, CUES aims to reduce the possibility of naval conflict by establishing a code among the navies of the Western Pacific region. It faced strong opposition and demands from the People's Liberation Army Navy (PLAN) at WPNS 2012 and 2013, but was finally adopted at WPNS 2014 with many changes to its detailed provisions. Since then, navies in the Western Pacific region have followed CUES to prevent maritime conflicts in the region. CUES, however, sometimes does not work properly. The contents of CUES are a mixture of parts of the Multinational Maritime Tactical Signal and Maneuvering Book (MTP) and the International Regulations for Preventing Collisions at Sea 1972 (COLREGs). They cover means of radio communication, such as frequencies and signals, instructions for maneuvering, and so on. Thus, it is not a new document for the U.S. Navy and its allies, but navies other than U.S. allies, such as the PLAN, need training to implement it at sea. Many provisions of CUES were changed because of the PLAN's opposition, and CUES has many shortcomings and practical limitations. First, CUES is not legally binding, and there is no means of forcing naval assets at sea to follow it. Second, CUES applies only to naval assets: naval ships (warships, naval auxiliaries, and submarines) and naval aircraft. Third, the geographical scope of CUES is not clear. Fourth, there is no provision for submerged submarines. Finally, CUES has no time-based framework or roadmap for training. In this regard, this paper offers six recommendations for improvement. First, CUES should be reviewed by WPNS or other international institutions while keeping its non-binding status, so that WPNS could send signals to navies that do not respond to CUES at sea. Second, the participation of maritime law enforcement agencies (MLEs), such as coast guards, is inevitable. Third, navies should use the full text of the MTP rather than the current CUES, which extracts only some parts of it. Fourth, CUES needs provisions on submerged submarines, which are regarded as offensive weapons in themselves. Fifth, the geographic scope of CUES should be made clear. Since some countries claim that a rock with a concrete structure is their territory, CUES should apply in all seas, including EEZs and territorial seas. Finally, a detailed training plan is required to implement CUES at sea. The Rim of the Pacific (RIMPAC) exercise is a good venue for CUES training, because all but six WPNS member countries participate in RIMPAC. CUES is a meaningful document not only for navies but also for nation-states in the region. To prevent the escalation of conflict in the region, potentially caused by an unplanned collision at sea, CUES should be applied more strictly. CUES will continue to be taken up at subsequent WPNS meetings and should therefore continue to improve in effectiveness as both an operational and a diplomatic agreement.

A Study on Joseon Royal Cuisine through Sachanbalgi of the Jangseogak Archives - Focusing on Royal Birthday, Child birth, Weddings and Funerals- (장서각 소장 사찬발기를 통한 조선왕실의 사찬음식 연구 - 탄일, 출산, 가례, 상례를 중심으로 -)

  • Chung, Hae-Kyung;Shin, Dayeon;Woo, Nariyah
    • Journal of the Korean Society of Food Culture / v.34 no.5 / pp.508-533 / 2019
  • This study investigated the Sachanbalgi, which record the royal feasts given by the royal family of the Joseon Dynasty of Korea. These records are contained within the Gungjung Balgi, which recorded the types and quantity of items used in royal court ceremonies. The Eumsikbalgi is the general name for the records of food found within this document. Using these Eumsikbalgi, and in particular the Sachanbalgi, this study investigated the food eaten and bestowed by the Joseon royal family. The Sachanbalgi describes four categories or occasions of feasts: royal birthdays, childbirth, royal weddings, and funerals. These records allow us to reconstruct who the attendees were and what the table settings and food were for instances not directly indicated in oral records, books, or other documents. The food at these Sachan (feasts) was diverse, being related to the specific event, and its contents varied based on the position of the person who was receiving the food. Usually, Bab (rice) was not found at a Sachanbalgi, and only on two occasions were meals with Bab observed. Specifically, it was served with Gwaktang (seaweed soup) at a childbirth feast. There were seven kinds of soups and stews that appeared in the Sachanbalgi: Gwaktang, Yeonpo (octopus soup), Japtang (mixed food stew), Chogyetang (chilled chicken soup), Sinseonro (royal hot pot), and Yukjang (beef and soybean paste). Nureumjeok (grilled brochette) and Saengchijeok (pheasant), and Ganjeonyueo (pan-fried cow liver fillet) and Saengseonjeonyueo (pan-fried fish fillet) were eaten. Yangjeonyueo, Haejeon, Tigakjeon (pan-fried kelp) and other dishes, known and unknown, were also recorded. Boiled meat slices appeared at high frequency (40 times) in the records; likewise, 22 kinds of rice cake and traditional sweets were frequently served at feasts. Five kinds of non-alcoholic beverages were provided. 
Seasonal fruits and nuts, such as fresh pear or fresh chestnut, are thought to have been served following the event. In addition, a variety of dishes including salted dry fish, boiled dish, kimchi, fruit preserved in honey, seasoned vegetables, mustard seeds, fish, porridge, fillet, steamed dishes, stir-fried dishes, vegetable wraps, fruit preserved in sugar, and jellied foods were given to guests, and noodles appear 16 times in the records. Courtiers were given Banhap, Tanghap, Myeonhap, wooden bowls, or lunchboxes. The types of food provided at royal events tracked the season. In addition, considering that for feasts food of the royal household was set out for receptions of guests, cooking instructions for the food in the lunchbox-type feasts followed the cooking instructions used in the royal kitchen at the given time. Previous studies on royal cuisine have dealt mostly with the Jineosang presented to the king, but in the Sachanbalgi, the food given by the royal family to its relatives, retainers, and attendants is recorded. The study of this document is important because it extends the knowledge regarding the food of the royal families of the Joseon Dynasty. The analysis of Sachanbalgi and the results of empirical research conducted to reconstruct the precise nature of that food will improve modern knowledge of royal cuisine.

A Trend Analysis and Policy proposal for the Work Permit System through Text Mining: Focusing on Text Mining and Social Network analysis (텍스트마이닝을 통한 고용허가제 트렌드 분석과 정책 제안 : 텍스트마이닝과 소셜네트워크 분석을 중심으로)

  • Ha, Jae-Been;Lee, Do-Eun
    • Journal of Convergence for Information Technology / v.11 no.9 / pp.17-27 / 2021
  • The aim of this research was to identify the issues surrounding the work permit system and public awareness of the system, and to suggest ideas for government policy on it. To this end, the research applied text mining to social data. Using Textom, it collected 1,453,272 texts from 6,217 online documents containing 'work permit system' published from January to December 2020, and carried out text mining and social network analysis. From analyses of top keyword frequency and degree centrality, the research extracted 100 frequently mentioned keywords and identified job problems, the importance of the policy process, industrial competitiveness, and the improvement of foreign workers' living conditions as the major keywords. In addition, through semantic network analysis, the research identified a major awareness, 'employment policy', and various kinds of ambient awareness such as 'international cooperation', 'workers' human rights', 'law', 'recruitment of foreigners', 'corporate competitiveness', 'immigrant culture', and 'foreign workforce management'. Finally, the research suggested ideas worth considering when establishing government policies on the work permit system and conducting related research.
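Keyword frequency and degree centrality, the two measures this abstract uses to pick major keywords, can be illustrated with a small sketch. It assumes that two keywords are linked when they co-occur in the same document and that degree is normalized by the number of other nodes. The study used Textom, whose exact settings the abstract does not state, so the function names and the toy documents here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_frequency(docs):
    """Total number of occurrences of each keyword across all documents."""
    return Counter(word for tokens in docs for word in tokens)


def degree_centrality(docs):
    """Normalized degree of each keyword in a document-level co-occurrence graph."""
    neighbors = {}
    for tokens in docs:
        unique = sorted(set(tokens))
        for word in unique:
            neighbors.setdefault(word, set())
        for u, v in combinations(unique, 2):  # link every co-occurring pair
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {word: 0.0 for word in neighbors}
    return {word: len(adj) / (n - 1) for word, adj in neighbors.items()}


# Hypothetical tokenized documents.
documents = [["policy", "worker", "employment"], ["policy", "law"], ["policy", "worker"]]
freq = keyword_frequency(documents)
centrality = degree_centrality(documents)
```

Here "policy" is both the most frequent keyword and the one connected to every other keyword, so it would rank first under either measure.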

Analysis of User Reviews of Running Applications Using Text Mining: Focusing on Nike Run Club and Runkeeper (텍스트마이닝을 활용한 러닝 어플리케이션 사용자 리뷰 분석: Nike Run Club과 Runkeeper를 중심으로)

  • Gimun Ryu;Ilgwang Kim
    • Journal of Industrial Convergence / v.22 no.4 / pp.11-19 / 2024
  • The purpose of this study was to analyze user reviews of running applications using text mining. As analysis data, the study used user reviews of Nike Run Club and Runkeeper in the Google Play Store, collected with the selenium package of Python 3, and separated the morphemes with the OKT analyzer, keeping only Korean nouns. After morpheme separation, we created a rankNL dictionary to remove stopwords. To analyze the data, we used TF, TF-IDF, and LDA topic modeling. The results of this study are as follows. First, the keywords 'record', 'app', and 'workout' were identified as the top keywords in the user reviews of the Nike Run Club and Runkeeper applications, and there were differences between the TF and TF-IDF rankings. Second, LDA topic modeling identified the topics 'basic items', 'additional features', 'errors', and 'location-based data' for Nike Run Club, and the topics 'errors', 'voice function', 'running data', 'benefits', and 'motivation' for Runkeeper. Based on these results, it is recommended that errors be fixed and improvements be made to strengthen the competitiveness of the applications.
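The difference between TF and TF-IDF rankings reported in this abstract can be seen in a minimal sketch. It assumes one common TF-IDF variant (relative term frequency multiplied by log(N/df)); the abstract does not say which weighting the authors used, and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf} dict per document, with idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: once per document
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        scores.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores


# Hypothetical reviews reduced to nouns.
reviews = [["record", "app", "record"], ["app", "error"]]
scores = tf_idf(reviews)
```

A word such as "app" that occurs in every review has a high raw frequency but a TF-IDF of zero under this variant, which is one way the two rankings can diverge.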

Reliability improvement methods of AF track circuits for the train control system (열차내 연산시스템용 AF궤도회로 신뢰성향상 방안 연구)

  • Park, Jae-Young
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.10 / pp.4762-4767 / 2012
  • The AF track circuit, which detects train position and transmits various train control data for DTG to the on-board equipment, is composed of a single operation system. If this system fails, the driver must operate the train manually until the system is restored, because the system cannot control switch machines and signals automatically. In this process, human error can lead to train delays, collisions, derailments, and critical safety accidents. This paper therefore analyzes the effect of each failure mode on the system and the train, and quantifies the failure valuation point and class. Based on this quantified analysis, MTBF increased, and MTTR and the number of failures decreased, after adopting an independent power supply installation, replacing defective capacitors, installing a resistor cooling system, and improving maintenance methods. In addition, the failure factors of AF track circuits were reduced by conducting preventive maintenance, a quantitative approach to the experience-based maintenance system.
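For readers unfamiliar with the two reliability figures mentioned here, the sketch below computes MTBF and MTTR from maintenance records using their usual definitions, plus the steady-state availability MTBF / (MTBF + MTTR). The numbers are invented for illustration and are not results from the paper.

```python
def mtbf(total_operating_hours, failure_count):
    """Mean time between failures: operating time divided by number of failures."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return total_operating_hours / failure_count


def mttr(repair_hours):
    """Mean time to repair: average duration of the recorded repairs."""
    if not repair_hours:
        raise ValueError("MTTR is undefined without repair records")
    return sum(repair_hours) / len(repair_hours)


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical records: 1000 operating hours, 4 failures, three logged repairs.
example_mtbf = mtbf(1000, 4)
example_mttr = mttr([2, 4, 6])
example_availability = availability(96.0, 4.0)
```

A higher MTBF and a lower MTTR both raise availability, which is why the abstract reports the two together.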

A Method for Information Source Selection using Thesaurus for Distributed Information Retrieval

  • Goto, Shoji;Ozono, Tadachika;Shintani, Toramatsu
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.272-277 / 2001
  • In this paper, we describe a new method for selecting information sources in a distributed environment. Recently, there has been much research on distributed information retrieval, that is, information retrieval (IR) based on a multi-database model in which the existence of multiple sources is modeled explicitly. Distributed IR needs a method for selecting appropriate sources for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate sources if a query contains polysemous words. In this paper, we describe an information-source selection method that uses two types of thesaurus. One is a thesaurus automatically constructed from the documents in a source. The other is a hand-crafted general-purpose thesaurus (e.g., WordNet). The terms used in the documents of a source differ from one source to another, and the meaning of a term differs depending on the situation in which it is used. This difference is a characteristic of the source. In our method, the meanings of a term are distinguished by the relationships between the term and other terms, and these relationships appear in the co-occurrence-based thesaurus. We describe an algorithm for evaluating the usefulness of a source for a query based on a thesaurus. As a practical application of our method, we have developed Papits, a multi-agent-based information sharing system. A selection experiment shows that our method is effective for selecting appropriate sources.
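The document-frequency baseline that this abstract contrasts itself with can be sketched as follows. This is a deliberately simple, hypothetical scoring rule (the summed fraction of a source's documents that contain each query term), not the scoring of any particular published system and not the thesaurus-based method of the paper.

```python
def rank_sources_by_df(sources, query_terms):
    """Rank sources by the summed fraction of their documents containing each query term."""
    scores = {}
    for name, docs in sources.items():
        total = len(docs)
        score = 0.0
        for term in query_terms:
            df = sum(1 for tokens in docs if term in tokens)  # document frequency
            score += df / total if total else 0.0
        scores[name] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical sources: the polysemous word "jaguar" occurs in both.
sources = {
    "zoo": [["jaguar", "cat"], ["jaguar", "habitat"]],
    "cars": [["jaguar", "engine"], ["sedan", "engine"]],
}
ranking_single = rank_sources_by_df(sources, ["jaguar"])
ranking_context = rank_sources_by_df(sources, ["jaguar", "engine"])
```

With the single query term "jaguar", the statistic alone prefers the "zoo" source regardless of which sense the user meant. Only extra context ("engine") changes the ranking, which is the weakness with polysemous words that the abstract points out.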

A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation (단어 중의성 해소를 위한 지도학습 방법의 통계적 자질선정에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.22 no.2 / pp.5-25 / 2011
  • This study aims to identify the most effective statistical feature selection method and context window size for word sense disambiguation using supervised methods. Features were selected by four different methods: information gain, document frequency, chi-square, and relevancy. The weight comparison showed that identifying the most appropriate features could improve word sense disambiguation performance, and information gain performed best. The SVM classifier was not affected by feature selection and performed better with a larger feature set and context size. The Naive Bayes classifier performed best with 10 percent of the feature set, and the kNN classifier with less than 10 percent. When feature selection methods are applied to word sense disambiguation, combining a small feature set with a larger context window, or a large feature set with a small context window, yields the best performance improvements.
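Information gain, the criterion reported as most effective in this abstract, measures how much knowing whether a context word is present reduces uncertainty about the sense. The sketch below uses the standard entropy-based definition, which is an assumption about the paper's exact formulation; the toy contexts for the ambiguous word "bank" are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(contexts, senses, feature):
    """Entropy reduction over senses from knowing whether `feature` is in the context."""
    with_f = [s for ctx, s in zip(contexts, senses) if feature in ctx]
    without_f = [s for ctx, s in zip(contexts, senses) if feature not in ctx]
    total = len(senses)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(
        len(part) / total * entropy(part) for part in (with_f, without_f) if part
    )
    return entropy(senses) - remainder


# Hypothetical context windows around the ambiguous word "bank".
contexts = [["river", "water"], ["river", "fish"], ["money", "loan"], ["money", "account"]]
senses = ["shore", "shore", "finance", "finance"]
ig_river = information_gain(contexts, senses, "river")
ig_water = information_gain(contexts, senses, "water")
```

"river" separates the two senses perfectly and gets the maximum gain of 1 bit, while "water" appears in only one context and is less informative, so a feature selector based on this score would keep "river" first.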