• Title/Summary/Keyword: keyword extraction

A Corpus Construction System of Consistent Document Categorization and Keyword Extraction (일관성 있는 문서분류 및 키워드 추출을 위한 말뭉치 구축도구)

  • Jeong, Jae-Cheol;Park, So-Young;Chang, Ju-No;Kihl, Tae-Suk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.675-676 / 2010
  • As the number of documents on the web grows rapidly, efficient document classification is needed to retrieve the desired information from a very large collection. In this paper, we propose a corpus construction tool for annotating each product description document with classification information such as category, keywords, and usage. The tool helps a human annotator identify this information correctly by providing a verification step that checks the inputs of other human annotators. Because the system is web-based, annotators can build the corpus anytime and anywhere.

Perception and Trend Differences between Korea, China, and the US on Vegan Fashion -Using Big Data Analytics- (빅데이터를 이용한 비건 패션 쟁점의 분석 -한국, 중국, 미국을 중심으로-)

  • Jiwoon Jeong;Sojung Yun
    • Journal of the Korean Society of Clothing and Textiles / v.47 no.5 / pp.804-821 / 2023
  • This study examines current trends and perceptions of veganism and vegan fashion in Korea, China, and the United States. Using the big data tools Textom and Ucinet, we conducted a cluster analysis of keywords. Frequency analysis based on keyword extraction, together with CONCOR analysis, yielded the following results. First, the nations' perceptions of veganism and vegan fashion differ significantly, although Korea and the United States generally share a similar understanding of vegan fashion. Second, industrial structures, such as products and businesses, shaped how Korea perceived veganism. Third, owing to its ongoing sociopolitical tensions, the United States views veganism as an ethical consumption method that ties into activism. In contrast, China views veganism as a healthy diet rather than a lifestyle and associates it with Buddhist vegetarianism. This perception stems from China's religious history and culinary culture. This study is meaningful in that it uses big data to extract keywords related to vegan fashion in Korea, China, and the United States, and it deepens our understanding of vegan fashion by comparing perceptions across nations.

Text Analysis on the Research Trends of Nature Restoration in Korea (텍스트 분석을 활용한 국내 자연환경복원 연구동향 분석)

  • Lee, Gil-sang;Jung, Yee-rim;Song, Young-keun;Lee, Sang-hyuk;Son, Seung-Woo
    • Journal of the Korean Society of Environmental Restoration Technology / v.27 no.2 / pp.29-42 / 2024
  • As a global response to climate and biodiversity challenges, there is an emphasis on the conservation and restoration of ecosystems that can simultaneously reduce carbon emissions and enhance biodiversity. This study comprised a text analysis and keyword extraction of 1,100 research papers addressing nature restoration in Korea, aiming to provide a quantitative and systematic evaluation of domestic research trends in this field. To discern the major research topics of these papers, topic modeling was applied and correlations were established through network analysis. Research on nature restoration showed a mainly upward trend from 2002 to 2022, with a slight recent decline. The most common keywords were "species," "forest," and "water." Research topics were broadly classified into (1) predictions of habitat size and species distribution, (2) the conservation and utilization of natural resources in urban areas, (3) ecosystem and landscape management in protected areas, (4) the planting and growth of vegetation, and (5) habitat formation methods. The number of studies on nature restoration is increasing across various domains in Korea, with each domain experiencing professional development.

Development and Performance Analysis of a Cultural Heritage Search Application Utilizing Image Recognition (이미지 인식을 활용한 문화유산 검색 어플리케이션 개발)

  • Hyun-Ji Kim;Tae-Hyun Shin;Hyun-Bin Jeong;Da-Hyun Kim;Jai-Soon Baek;Yong-Han Yu;Sung-Jin Kim
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.181-183 / 2024
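  • This paper presents the development and performance analysis of a cultural heritage search application that uses image recognition, map-based search, and keyword search. We designed and implemented an application that combines these technologies and functions to provide users with personalized cultural heritage information. We also conducted experiments and analyses to evaluate and improve the application's performance. The results confirm that an application using image recognition and map-based search can improve the user experience by providing cultural heritage information quickly and accurately. This research is expected to make an important contribution to the development and performance improvement of cultural heritage search applications.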

Export Control System based on Case Based Reasoning: Design and Evaluation (사례 기반 지능형 수출통제 시스템 : 설계와 평가)

  • Hong, Woneui;Kim, Uihyun;Cho, Sinhee;Kim, Sansung;Yi, Mun Yong;Shin, Donghoon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.109-131 / 2014
  • As demand for nuclear power plant equipment continues to grow worldwide, the importance of handling nuclear strategic materials is also increasing. While the number of cases submitted for the export of nuclear-power commodities and technology is increasing dramatically, preadjudication (or, put simply, prescreening) of strategic materials has so far been done by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, not to mention that it takes a long time to develop an expert. Because human experts must manually evaluate all the documents submitted for export permission, the current practice of nuclear material export is neither time-efficient nor cost-effective. To alleviate the problem of relying only on costly human experts, our research proposes a new system designed to help field experts make their decisions more effectively and efficiently. The proposed system is built upon case-based reasoning, which in essence extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for a case-based reasoning system, designs a case-based reasoning system for the control of nuclear material exports, and evaluates the performance of alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). A keyword extraction method is an essential component of the case-based reasoning system because it is used to extract the key features of the cases. The fully automatic method was implemented using TF-IDF, a widely used de facto standard method for representative keyword extraction in text mining.
TF (Term Frequency) is based on the frequency count of a term within a document and shows how important the term is within that document, while IDF (Inverse Document Frequency) is based on the infrequency of the term across the document set and shows how uniquely the term represents the document. The results show that the semi-automatic approach, which is based on collaboration between machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity along with a new framework for document analysis. The proposed algorithm considers both document-to-document similarity ($\alpha$) and document-to-nuclear system similarity ($\beta$) in order to derive the final score ($\gamma$) for deciding whether the presented case involves strategic material. The final score ($\gamma$) represents the document similarity between the past cases and the new case. The score is derived not only from conventional TF-IDF but also from a nuclear system similarity score, which takes the context of the nuclear system domain into account. Finally, the system retrieves the top-3 documents in the case base that are considered most similar to the new case and presents them with a degree of credibility. With this final score and the credibility score, it becomes easier for a user to see which documents in the case base are most worth consulting, so that the user can make a proper decision at relatively low cost. The system was evaluated by developing a prototype and testing it with field data. The system workflows and outcomes have been verified by field experts.
This research is expected to contribute to the growth of the knowledge service industry by proposing a new system that can effectively reduce the burden of relying on costly human experts for the export control of nuclear materials and that can serve as a meaningful example of a knowledge service application.
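
The TF-IDF weighting described in the abstract above can be illustrated with a minimal sketch. It assumes one common variant (TF as the term count divided by document length, IDF as the natural log of N over the document frequency); the abstract does not say which variant the authors used, and the toy documents and the helper names `tf_idf` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def top_keywords(doc_weights, k=3):
    """Pick the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical toy corpus standing in for export-permission documents.
docs = [
    "reactor coolant pump export license".split(),
    "pump valve export license".split(),
    "textile machine export license".split(),
]
weights = tf_idf(docs)
keywords = top_keywords(weights[0], k=2)
```

Terms that occur in every document, such as "export" here, receive an IDF of zero and drop out of the keyword ranking, which is the "uniqueness" effect the abstract attributes to IDF.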

A Study on the Development of Dynamic Models under Inter Port Competition (항만의 경쟁상황을 고려한 동적모형 개발에 관한 연구)

  • 여기태;이철영
    • Journal of the Korean Institute of Navigation / v.23 no.1 / pp.75-84 / 1999
  • Although many studies have modelled port competition, both the theoretical framework and the methodology remain weak. This study therefore presents a new algorithm, ESD (Extensional System Dynamics), for evaluating port competition and applies it to simulate port systems in Northeast Asia. The detailed objectives of this paper are to develop the Unit Port Model using the SD (System Dynamics) method; to develop the Competitive Port Model using the ESD method; to perform sensitivity analysis by altering parameters; and to propose port development strategies. To this end, the algorithm for evaluating port competition was developed in two steps. First, the SD method was adopted to develop the Unit Port models, and second, the HFP (Hierarchical Fuzzy Process) method was introduced to extend the SD method. The proposed models were then applied to five ports (Pusan, Kobe, Yokohama, Kaohsiung, and Keelung) with real data on each port, and several findings were derived. First, the factors for the Unit Port were extracted through consultation with experts such as researchers, professors, harbor-related research fellows, and an expert group, yielding five factor groups: location, facility, service, cargo volumes, and port charge. Second, the system's feedback-loop structure was readily identified by locating representative and detailed factors on the keyword network of the STGB map. Third, for the target year of 2003, the simulation for Pusan port showed that the number of liners would increase from 829 to 1,450 ships and that container cargo volume would increase from 4.56 million TEU to 7.74 million TEU. It also showed that, because of the increase in liners and container cargo volumes, berth length should be expanded from 2,162 m to 4,729 m. This berth expansion reduced the number of congested ships from 97 to 11.
It was also found that the port charge fluctuated. Simulation results for Kobe, Yokohama, Kaohsiung, and Keelung in Northeast Asia were also obtained. Finally, the inter-port competition models developed by the ESD method were used to simulate container cargo volumes for Pusan port. The results showed that container cargo volume was smaller under competition than without it, which means that Pusan port lacks competitive power relative to the other ports. The models developed in this study were then applied to estimate changes in container cargo volumes under competition by altering several parameters. The results were found to be very helpful for port managers in charge of planning port development.

Document classification using a deep neural network in text mining (텍스트 마이닝에서 심층 신경망을 이용한 문서 분류)

  • Lee, Bo-Hui;Lee, Su-Jin;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.615-625 / 2020
  • In text mining, the document-term frequency matrix is built from terms extracted from documents for which group information exists. In this study, we generated a document-term frequency matrix for classifying documents by research field. We applied the traditional term weighting function, term frequency-inverse document frequency (TF-IDF), to the generated matrix. In addition, we applied term frequency-inverse gravity moment (TF-IGM). We also generated a document-keyword weighted matrix by extracting keywords to improve document classification accuracy. Based on the extracted keyword matrix, we classified documents using a deep neural network. To find the optimal deep neural network model, we verified document classification accuracy while varying the number of hidden layers and hidden nodes. The model with eight hidden layers showed the highest accuracy, and across all parameter settings classification accuracy with TF-IGM was higher than with TF-IDF. In addition, the deep neural network was confirmed to be more accurate than the support vector machine. Therefore, we propose applying TF-IGM and a deep neural network to document classification.

A Study on the Feature Point Extraction Methodology based on XML for Searching Hidden Vault Anti-Forensics Apps (은닉형 Vault 안티포렌식 앱 탐색을 위한 XML 기반 특징점 추출 방법론 연구)

  • Kim, Dae-gyu;Kim, Chang-soo
    • Journal of Internet Computing and Services / v.23 no.2 / pp.61-70 / 2022
  • Ordinary smartphone users often use Vault apps to protect personal information such as their photos and videos. However, criminals are increasingly using Vault app functions for anti-forensic purposes to hide illegal videos. These apps are among those registered on Google Play. This paper proposes a methodology for extracting feature points through XML-based keyword frequency analysis in order to find Vault apps used by criminals, applying text mining techniques to extract the feature points. XML syntax was compared and analyzed using the strings.xml files included in 15 hidden Vault anti-forensics apps and 15 non-hidden Vault apps. In the hidden Vault anti-forensics apps, more hiding-related words are found, and at a higher frequency, in the first and second rounds of terminology processing. Unlike most conventional methods, which statically analyze APK files from an engineering point of view, this paper is meaningful in that it takes a humanities and sociological point of view to find features for classifying anti-forensics apps. In conclusion, text mining through XML parsing can provide basic data for finding hidden Vault anti-forensics apps.
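
The keyword frequency analysis of strings.xml described in the abstract above can be sketched roughly as follows. This is an illustration only: the sample XML, the `HIDDEN_TERMS` list, and the `keyword_frequencies` helper are hypothetical, and the paper's actual terminology processing rounds are not reproduced here.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical strings.xml content; a real file would be taken from an unpacked APK.
SAMPLE_STRINGS_XML = """<resources>
    <string name="app_name">Calculator</string>
    <string name="hint_pin">Enter PIN to unlock hidden vault</string>
    <string name="msg_hide">Hide photos and videos in the secret vault</string>
</resources>"""

def keyword_frequencies(xml_text):
    """Count lowercase word occurrences across all <string> resources."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for node in root.iter("string"):
        text = "".join(node.itertext())
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

# Assumed list of hiding-related terms, chosen only for this example.
HIDDEN_TERMS = {"hidden", "hide", "secret", "vault"}

freq = keyword_frequencies(SAMPLE_STRINGS_XML)
hidden_score = sum(freq[term] for term in HIDDEN_TERMS)
```

Comparing `hidden_score` between apps is one simple way to turn the word counts into a feature; the choice of terms and any threshold would need to come from the data.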

Investigating an Automatic Method for Summarizing and Presenting a Video Speech Using Acoustic Features (음향학적 자질을 활용한 비디오 스피치 요약의 자동 추출과 표현에 관한 연구)

  • Kim, Hyun-Hee
    • Journal of the Korean Society for Information Management / v.29 no.4 / pp.191-208 / 2012
  • Two fundamental aspects of speech summary generation are the extraction of key speech content and the style in which the extracted speech synopses are presented. We first investigated whether acoustic features (speaking rate, pitch pattern, and intensity) are equally important and, if not, which one can be effectively modeled to compute the significance of segments for lecture summarization. We found that intensity (that is, the difference between the maximum dB and the minimum dB) is the most efficient factor for speech summarization. We evaluated the intensity-based method by comparing it with the keyword-based method in terms of which method produces better speech summaries and how similar the weight values assigned to segments by the two methods are. We then investigated how to present speech summaries to viewers. In sum, for speech summarization, we suggest how to extract key segments from a speech video efficiently using acoustic features and how to present the extracted segments to viewers.
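
As a rough illustration of the intensity-based method in the abstract above, the sketch below scores each segment by the difference between its maximum and minimum dB and keeps the highest-scoring segments. The dB readings, the number of segments kept, and the function names are assumptions for the example; the paper's actual segmentation and weighting are not specified here.

```python
def intensity_range(db_values):
    """Intensity of a segment: difference between its maximum and minimum dB."""
    return max(db_values) - min(db_values)

def select_key_segments(segments, k):
    """Return the indices of the k segments with the widest dB range, in playback order."""
    ranked = sorted(
        range(len(segments)),
        key=lambda i: intensity_range(segments[i]),
        reverse=True,
    )
    return sorted(ranked[:k])

# Hypothetical per-segment dB readings from a lecture video.
segments = [
    [55.0, 60.0, 58.0],
    [50.0, 72.0, 61.0],
    [62.0, 64.0, 63.0],
    [48.0, 66.0, 59.0],
]
summary = select_key_segments(segments, k=2)
```

Returning the selected indices in playback order keeps the summary chronologically coherent when the segments are shown to viewers.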

Development of Survey to Inquire Continuity of English Curriculum between Elementary and Middle School (초, 중등 영어 교과과정 연계성 인식 조사를 위한 설문도구 개발)

  • Won, Eun-Sok;Jeon, Young-Ju
    • The Journal of the Korea Contents Association / v.16 no.8 / pp.568-579 / 2016
  • This study set out to design survey questions for investigating continuity in the English curriculum between elementary and middle school by integrating English teachers' opinions. To achieve this goal, the authors designed a three-step development procedure. First, to set the fundamental structure of the research, the researchers identified the main keywords by applying keyword extraction to previous studies. Next, they administered a pilot survey to 102 English teachers and analyzed their quantitative and qualitative responses. Of these teachers, 22 were selected as interviewees. Based on the results of the pilot survey and the interviews, this study proposes a survey containing 40 questions designed to examine seven factors, plus 3 open-ended questions, for use in researching perceptions of continuity in the English curriculum.