• Title/Summary/Keyword: topic extraction

An Evaluation of Applying Knowledge Base to Academic Information Service

  • Lee, Seok-Hyoung;Kim, Hwan-Min;Choe, Ho-Seop
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.3 no.1
    • /
    • pp.81-95
    • /
    • 2013
  • Building the most effective knowledge base for a field requires a series of precise text-processing steps: automatic extraction of information from documents across domains, recognition of entity names, detection of core topics, analysis of the relations between the extracted information and topics, and automatic inference of new knowledge. Applying such a knowledge base to information-knowledge management and services is a core requirement for the intellectualization of information. In this paper, the knowledge base, a core resource and comprehensive technology for the intellectualization of science and technology information, is described, and the usability of academic information services built on it is evaluated. The knowledge base proposed in this article combines information representation with knowledge storage; it comprises identifier systems ranging from terms to documents and integrates terminologies, intelligent word networks, topic networks, classification schemes, and authority data.

Multi-Topic Meeting Summarization using Lexical Co-occurrence Frequency and Distribution (어휘의 동시 발생 빈도와 분포를 이용한 다중 주제 회의록 요약)

  • Lee, Byung-Soo;Lee, Jee-Hyong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2015.07a
    • /
    • pp.13-16
    • /
    • 2015
  • This paper proposes a meeting-minutes summarization method that uses lexical co-occurrence frequency and distribution. Unlike ordinary documents, meeting minutes contain several distinct sub-topics, as well as malformed sentences and irrelevant chatter, and these characteristics must be taken into account during summarization. Conventional summarization methods, which assume a single topic and extract the most important sentences from the document as a whole, are therefore ill-suited to multi-topic meeting minutes. The proposed method first segments the minutes using lexical co-occurrence frequency. Next, for each segmented block corresponding to a topic, it builds a set of key words and extracts the important sentences of that block. Finally, it produces the summary by considering the positions and dependency relations of the extracted sentences. Experiments on the AMI meeting corpus confirm that the proposed method outperforms baseline summarization methods both in evaluations at different summary-compression ratios and in per-topic evaluations of the summaries.
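The segmentation step described in the abstract, splitting the minutes into blocks where lexical cohesion between adjacent regions drops, can be sketched in a TextTiling-style toy (not the authors' implementation; `window` and `threshold` are illustrative assumptions):

```python
from collections import Counter

def cohesion(a, b):
    """Cosine-style lexical overlap between two word lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sum(v * v for v in ca.values()) ** 0.5
    nb = sum(v * v for v in cb.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def segment(sentences, window=2, threshold=0.1):
    """Place a topic boundary wherever lexical cohesion between the
    windows before and after a gap falls below `threshold`."""
    words = [s.lower().split() for s in sentences]
    boundaries = []
    for gap in range(window, len(words) - window + 1):
        left = [w for s in words[gap - window:gap] for w in s]
        right = [w for s in words[gap:gap + window] for w in s]
        if cohesion(left, right) < threshold:
            boundaries.append(gap)
    # split the sentence list into blocks at the detected boundaries
    blocks, start = [], 0
    for b in boundaries:
        blocks.append(sentences[start:b])
        start = b
    blocks.append(sentences[start:])
    return blocks
```

Each returned block would then feed the per-block keyword-set and sentence-extraction steps.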

Automatic Payload Signature Update System for the Classification of Dynamically Changing Internet Applications

  • Shim, Kyu-Seok;Goo, Young-Hoon;Lee, Dongcheul;Kim, Myung-Sup
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.3
    • /
    • pp.1284-1297
    • /
    • 2019
  • Network environments are growing rapidly, which makes traffic classification for network management increasingly difficult. Automatic signature extraction is a hot topic in traffic classification research. However, existing automatic payload-signature generation systems suffer from problems such as semi-automatic operation, generation of disposable signatures, generation of false-positive signatures, and signatures that are not kept up to date. Therefore, we provide a fully automatic signature update system that performs all the processes automatically: traffic collection, signature generation, signature management, and signature verification. The traffic collection step automatically collects ground-truth traffic through the traffic measurement agent (TMA) and traffic management server (TMS). The signature management step removes unnecessary signatures, the signature generation step generates new signatures, and the signature verification step removes false-positive signatures. The proposed system solves the problems of existing systems. Applying it to a campus network showed that, for four applications, high recall and low false-positive rates can be maintained.
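The core idea of payload-signature generation can be illustrated with a toy longest-common-substring extractor (a simplification invented here, not the paper's algorithm): the longest byte sequence shared by every collected payload of an application is a natural candidate signature.

```python
def common_signature(payloads, min_len=4):
    """Return the longest byte string shared by every payload, or b""
    if nothing at least `min_len` bytes long is common to all.
    A toy stand-in for automatic payload-signature generation."""
    if not payloads:
        return b""
    shortest = min(payloads, key=len)
    # try candidate substrings of the shortest payload, longest first
    for length in range(len(shortest), min_len - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in p for p in payloads):
                return candidate
    return b""
```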

Determining plasma boundary in Alvand-U tokamak

  • Yahya Sadeghi
    • Nuclear Engineering and Technology
    • /
    • v.55 no.9
    • /
    • pp.3485-3492
    • /
    • 2023
  • One of the major topics of tokamak research is the determination of the magnetic profile due to the magnetic coil fields and the plasma current by means of data from magnetic probes. The most practical approach is the current filament method, which models the plasma column with multiple current-carrying filaments whose total current equals the plasma current. Determining the plasma boundary in the Alvand-U tokamak is the main purpose of this paper. To determine the magnetic field profile and the plasma boundary, the computing code requires information about the magnetic coils, their positions, and their currents. The code then determines the plasma shape and finally extracts the plasma boundary. We discuss how the plasma boundary is determined and how the computing code performs in extracting it. The developed algorithm proves effective when run on an ordinary PC (Intel(R) Core(TM) i3-10100 CPU @ 3.60 GHz, 8.00 GB of RAM). Finally, we present the results of a test run of the computing code on a typical experimental pulse.
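A minimal sketch of the filament idea, under the simplifying assumption of infinite straight filaments normal to a 2-D plane (real boundary-reconstruction codes use axisymmetric current loops and elliptic integrals): the field at a point is the superposition of each filament's tangential field with magnitude mu0*I/(2*pi*r).

```python
from math import pi

MU0 = 4e-7 * pi  # vacuum permeability, T*m/A

def filament_field(x, y, filaments):
    """Superpose the fields of infinite straight filaments normal to the
    (x, y) plane. Each filament is (fx, fy, current_in_amps); the field of
    one filament is mu0*I/(2*pi*r) directed tangentially, i.e.
    B = mu0*I/(2*pi*r**2) * (-dy, dx)."""
    bx = by = 0.0
    for (fx, fy, current) in filaments:
        dx, dy = x - fx, y - fy
        r2 = dx * dx + dy * dy
        if r2 == 0.0:
            continue  # skip the singular point on the filament itself
        coeff = MU0 * current / (2.0 * pi * r2)
        bx += -coeff * dy
        by += coeff * dx
    return bx, by
```

A boundary-finding code would sum such filament fields with the coil fields and trace the contour of the resulting flux function.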

Depth tracking of occluded ships based on SIFT feature matching

  • Yadong Liu;Yuesheng Liu;Ziyang Zhong;Yang Chen;Jinfeng Xia;Yunjie Chen
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.4
    • /
    • pp.1066-1079
    • /
    • 2023
  • Detector-based multi-target tracking is an active and important research topic. It comprises two closely related processes: target detection, which locates the exact position of each target, and target tracking, which monitors the targets' temporal and spatial changes. As detectors have improved, tracking performance has reached a new level. A persistent problem in tracking research is re-acquiring a target after it has been occluded. To address this problem, this paper proposes a DeepSORT model based on SIFT features to improve ship tracking. Unlike previous feature-extraction networks, the SIFT algorithm requires no pre-training on target-specific data and can be applied to ship tracking quickly. We also improve and test the matching method of our model to find a balance between tracking accuracy and tracking speed. Experiments show that the model achieves more satisfactory results.
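SIFT-based matching typically filters correspondences with Lowe's ratio test, accepting a match only when the nearest descriptor is clearly closer than the second nearest. A minimal sketch on toy descriptor vectors (in practice the descriptors would come from a SIFT extractor, and the `ratio` value is a conventional default, not taken from this paper):

```python
def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def ratio_match(query_desc, train_descs, ratio=0.75):
    """Lowe's ratio test: return the index of the best-matching train
    descriptor, or None when the best match is ambiguous."""
    dists = sorted((l2(query_desc, d), i) for i, d in enumerate(train_descs))
    if not dists:
        return None
    if len(dists) < 2:
        return dists[0][1]
    (d1, i1), (d2, _) = dists[0], dists[1]
    return i1 if d1 < ratio * d2 else None
```

Rejecting ambiguous matches this way is what lets appearance features re-associate a ship after occlusion without accumulating false matches.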

Prisma Statement: The Strategic Advantages and Disadvantages of Foreign Direct Investments (FDI)

  • Phouthakannha NANTHARATH
    • The Journal of Industrial Distribution & Business
    • /
    • v.14 no.10
    • /
    • pp.1-9
    • /
    • 2023
  • Purpose: In an increasingly globalized world, foreign direct investment (FDI) plays an essential role in the economic improvement of countries. This study delves into FDI and offers a complete analysis of its strategic advantages and disadvantages, thoroughly examining the existing literature to identify and explore them. Research design, data and methodology: The selected studies were examined systematically and rigorously. The evaluation followed a thematic approach in which common subjects and patterns associated with FDI's strategic benefits and drawbacks were identified and synthesized. Data extraction captured the relevant facts from the chosen studies, along with their objectives. Results: This study presents the findings of the review, which explores the strategic advantages and disadvantages of FDI based on the evaluation of previous research. A comprehensive review of the identified benefits and drawbacks highlights their implications for businesses engaged in FDI. Conclusions: In sum, the findings offer valuable insights for practitioners, guiding their decision-making in the international business landscape. Organizations can position themselves for success and sustainable development in the global marketplace by leveraging the advantages and effectively managing the challenges.

How Chinese Online Media Users Respond to Carbon Neutrality: A Quantitative Textual Analysis of Comments on Bilibili, a Chinese Video Sharing Platform

  • Zha Yiru
    • Asian Journal for Public Opinion Research
    • /
    • v.11 no.2
    • /
    • pp.145-162
    • /
    • 2023
  • This research investigates how users of Bilibili, a video-sharing website based in China, have responded to carbon neutrality. By conducting quantitative textual analyses of 3,311 comments on Bilibili using LDA topic extraction and content statistics, this research finds that: (1) Bilibili users assigned more weight to geopolitical topics (56.3%) than to energy (22.0%) and environmental topics (21.7%). (2) When assessing carbon neutrality, Bilibili users weighed geopolitical (53.8%) and energy factors (15.8%) more heavily than factors related to class (9.2%), the economy (8.9%), the environment (8.7%), and definition (3.6%). (3) More Bilibili users held negative (64.6%) attitudes towards carbon neutrality, with only smaller portions expressing positive (26.8%) and neutral (8.6%) attitudes. (4) Negative attitudes towards carbon neutrality were mainly driven by geopolitical concerns about the West's approach to China, other countries' free-riding on China's efforts, and the West's manipulation of rules, as well as doubts about the feasibility of the energy transition and suspicion that capitalists exploit consumers through this concept. This research highlights the geopolitical concerns behind the environmental attitudes of Chinese people, deepening our understanding of their psychological constructs and crisis sensitivity towards environmental issues.

Analysis of Nitrosamines Concentration in Condom by using LC-MS/MS (LC-MS/MS를 이용한 콘돔에 함유된 니트로사민류 농도 분석)

  • Park, Na-Youn;Kim, Sungmin;Jung, Woong;Kho, Younglim
    • Journal of the Korean Chemical Society
    • /
    • v.62 no.3
    • /
    • pp.181-186
    • /
    • 2018
  • Nitrosamines are nitroso compounds produced by nitrosation reactions between secondary amines and nitrite, and they have been found to form during the vulcanization step of rubber-product manufacturing. Recently, nitrosamines have been detected in rubber products and have become a major topic. Condoms are disposable medical devices, and their safety is important because they come into direct contact with skin and mucous membranes. In this study, we developed an analytical method for nitrosamines in condoms based on the ISO 29941 method. The samples were eluted with distilled water, and the target compounds were extracted by liquid-liquid extraction with dichloromethane; the extracts were then concentrated and quantitatively analyzed by LC-MS/MS. The accuracy of the analytical method ranged from 85.8 to 108.7%, precision was below 11.5%, and the detection limits ranged from 0.11 (NDPA and NDBA) to 0.48 (NPYR) ng/mL. Among the 31 condom samples, NDBA was detected in 2 samples by distilled-water extraction, while NDMA was detected in 1 sample, NDEA in 4 samples, and NDBA in 26 samples by artificial-saliva (pH 4.5) extraction. The total amount of nitrosamines in all samples was less than 500 µg/kg.

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, greatly influencing society. This is an unmatched phenomenon in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data in that it satisfies the conditions of volume (the amount of data), velocity (data input and output speeds), and variety (the diversity of data types). If the trend of an issue can be discovered in SNS Big Data, it can serve as an important new source for the creation of value, because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) the topic keyword set corresponding to the daily ranking; (2) a daily time-series graph of a topic over the course of a month; (3) the importance of a topic, shown through a treemap based on a scoring system and frequency; and (4) a daily time-series graph of keywords retrieved by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process large amounts of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data, because Hadoop is designed to scale up from single-node computing to thousands of machines. 
Furthermore, we use MongoDB, a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data-processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly. TITS therefore uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; interaction with the data is easy, and the library manages real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed with these libraries and detects issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the effectiveness of our issue-detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and on this basis confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
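Functions (1) and (4) above, the daily keyword ranking and the per-keyword time series, can be sketched as simple aggregations over (date, text) pairs (a toy illustration, not the TITS implementation, which runs on Hadoop and MongoDB):

```python
from collections import Counter, defaultdict

def keyword_timeseries(tweets, keyword):
    """Count daily mentions of `keyword` in (date, text) pairs,
    the kind of series a system like TITS would plot per day."""
    daily = defaultdict(int)
    for date, text in tweets:
        if keyword in text.lower():
            daily[date] += 1
    return dict(sorted(daily.items()))

def top_keywords(tweets, n=3):
    """Daily-ranking keyword set: the most frequent tokens overall."""
    counts = Counter(w for _, text in tweets for w in text.lower().split())
    return [w for w, _ in counts.most_common(n)]
```

A production pipeline would first apply the stop-word removal and noun extraction the abstract mentions before counting.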

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.97-117
    • /
    • 2020
  • Product evaluation criteria are indicators describing the attributes or values of products, which enable users or manufacturers to measure and understand them. When companies analyze their products or compare them with competitors', appropriate criteria must be selected for objective evaluation. The criteria should reflect the features consumers considered when they purchased, used, and evaluated the products. However, current evaluation criteria do not reflect how consumer opinions differ from product to product. Previous studies tried to use online reviews from e-commerce sites, which reflect consumer opinions, to extract product features and topics and use them as evaluation criteria. However, they still produce criteria irrelevant to the products, because improperly extracted words are not refined. To overcome this limitation, this research proposes an LDA-k-NN model that extracts candidate criteria words from online reviews using LDA and refines them with the k-nearest-neighbor approach. The proposed approach starts with a preparation phase consisting of six steps. First, review data are collected from e-commerce websites. Most e-commerce websites classify their items into high-, middle-, and low-level categories; review data for the preparation phase are gathered from each middle-level category and later merged to represent a single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from the reviews using part-of-speech information from a morpheme analysis module. After preprocessing, per-topic words from the reviews are obtained with LDA, and only the nouns among the topic words are kept as candidate criteria words. These words are then tagged according to their suitability as criteria for each middle-level category. Next, every tagged word is vectorized with a pre-trained word embedding model. Finally, a k-nearest-neighbor, case-based approach is used to classify each word with the tags. 
After the preparation phase, the criteria extraction phase is conducted on low-level categories. This phase starts by crawling reviews in the corresponding low-level category. The same preprocessing as in the preparation phase is performed using the morpheme analysis module and LDA. Candidate criteria words are extracted by taking the nouns and vectorizing them with the pre-trained word embedding model. Finally, evaluation criteria are obtained by refining the candidate words using the k-nearest-neighbor approach and the reference proportion of each word in the word set. To evaluate the performance of the proposed model, an experiment was conducted with reviews from '11st', one of the biggest e-commerce companies in Korea. Review data came from the 'Electronics/Digital' section, one of the high-level categories on 11st. Three other models were used for comparison: the actual criteria of 11st, a model that extracts nouns with the morpheme analysis module and refines them by word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The evaluation was set up to predict the evaluation criteria of 10 low-level categories with the suggested model and the 3 models above. The criteria words extracted by each model were combined into a single word set, which was used in survey questionnaires. In the survey, respondents chose every item they considered an appropriate criterion for each category, and each model scored when a chosen word had been extracted by it. The suggested model had higher scores than the other models in 8 out of 10 low-level categories. Paired t-tests on the scores of each model confirmed that the suggested model performs better in 26 of 30 tests. In addition, the suggested model was the best model in terms of accuracy. 
This research proposes an evaluation-criteria extraction method that combines topic extraction using LDA with refinement by the k-nearest-neighbor approach. The method overcomes the limits of previous dictionary-based models and frequency-based refinement models. This study can contribute to improving review analysis for deriving business insights in the e-commerce market.
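The k-NN refinement step can be sketched with cosine similarity over toy word vectors (the words, vectors, and tag names below are invented for illustration; the paper uses a pre-trained word embedding model and its own tag scheme):

```python
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def knn_tag(candidate_vec, tagged_words, k=3):
    """Classify a candidate word by majority vote of the tags of its
    k nearest (cosine) neighbours among already-tagged words.
    `tagged_words` is a list of (word, vector, tag) triples."""
    neighbours = sorted(tagged_words,
                        key=lambda wv: -cosine(candidate_vec, wv[1]))[:k]
    votes = Counter(tag for _, _, tag in neighbours)
    return votes.most_common(1)[0][0]
```

Candidate nouns tagged "criterion" by the vote would survive refinement; the rest would be discarded as noise.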