

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.3, pp.1639-1658, 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have been employed to identify named entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they are unable to recognize unregistered or unlearned objects. In this paper, a method is proposed to extract entities, such as technologies, theories, or person names, by analyzing the collocation relationships among words that appear together around specific words in the abstracts of academic journals. The method proceeds as follows. First, the data are preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. After this, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created by analyzing and classifying the information in the sentences.
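The extraction pipeline this abstract walks through (sentence detection, POS tagging, then collocation analysis around noun candidates) can be sketched minimally; the tiny POS lexicon, window size, and sample sentence below are invented stand-ins, not the authors' tagger or corpus:

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger.
POS_LEXICON = {
    "we": "PRP", "propose": "VB", "a": "DT", "novel": "JJ",
    "clustering": "NN", "algorithm": "NN", "for": "IN", "graphs": "NN",
}

def split_sentences(text):
    """Step 1: sentence detection -- separate the text into single sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def pos_tag(sentence):
    """Step 2: POS tagging via the toy lexicon (unknown words default to NN)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(t, POS_LEXICON.get(t, "NN")) for t in tokens]

def collocations(tagged, window=2):
    """Step 3: count which context tags co-occur around noun candidates."""
    counts = Counter()
    for i, (tok, tag) in enumerate(tagged):
        if tag == "NN":  # entity candidates are nouns; inspect surrounding tags
            context = tagged[max(0, i - window):i] + tagged[i + 1:i + 1 + window]
            for _, ctag in context:
                if ctag != "NN":  # exclude other entity candidates, as described
                    counts[(tok, ctag)] += 1
    return counts

sents = split_sentences("We propose a novel clustering algorithm. It works for graphs.")
stats = collocations(pos_tag(sents[0]))
```

A recognition model would then be trained on such (candidate, context-tag) statistics rather than on the word list alone.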

Exploring trends in blockchain publications with topic modeling: Implications for forecasting the emergence of industry applications

  • Jeongho Lee;Hangjung Zo;Tom Steinberger
    • ETRI Journal, v.45 no.6, pp.982-995, 2023
  • Technological innovation generates products, services, and processes that can disrupt existing industries and lead to the emergence of new fields. Distributed ledger technology, or blockchain, offers novel transparency, security, and anonymity characteristics in transaction data that may disrupt existing industries. However, research attention has largely examined its application to finance. Less is known about broader applications, particularly in Industry 4.0. This study investigates academic research publications on blockchain and predicts emerging industries using academia-industry dynamics. It adopts latent Dirichlet allocation and dynamic topic models to analyze large text data with a high capacity for dimensionality reduction. Prior studies confirm that research contributes to technological innovation through spillover, including products, processes, and services. Using insights from the knowledge structure of publications, this study predicts the emerging industries that will likely incorporate blockchain technology.
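The dimensionality-reduction idea behind the study's topic models can be illustrated with a toy collapsed Gibbs sampler for latent Dirichlet allocation; the three-document corpus, topic count, and hyperparameters below are invented, and the authors presumably rely on full library implementations:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K=2, alpha=0.1, beta=0.01, iters=200, seed=7):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})          # vocabulary size
    z = [[rng.randrange(K) for _ in d] for d in docs]  # topic of each token
    ndk = [[0] * K for _ in docs]                  # doc-topic counts
    nkw = [defaultdict(int) for _ in range(K)]     # topic-word counts
    nk = [0] * K                                   # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                        # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional: p(k) ∝ (ndk+α) · (nkw+β)/(nk+Vβ)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # theta: smoothed per-document topic mixture (the low-dimensional view)
    return [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
            for d, doc in enumerate(docs)]

docs = [["blockchain", "ledger", "ledger"], ["finance", "bank", "finance"],
        ["blockchain", "ledger", "finance"]]
theta = lda_gibbs(docs)
```

Each document is reduced from a bag of words to a K-dimensional topic mixture, which is what makes trend comparison across thousands of publications tractable.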

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems, v.22 no.2, pp.33-56, 2016
  • Many researchers have focused on developing bankruptcy prediction models using modeling techniques, such as statistical methods including multiple discriminant analysis (MDA) and logit analysis, or artificial intelligence techniques including artificial neural networks (ANN), decision trees, and support vector machines (SVM), to secure enhanced performance. Most bankruptcy prediction models in academic studies have used financial ratios as the main input variables. The bankruptcy of firms is associated with both the firm's financial state and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed despite the fact that exploiting only financial ratios has some drawbacks. Accounting information, such as financial ratios, is based on past data, and it is usually determined one year before bankruptcy. Thus, a time lag exists between the point of closing financial statements and the point of credit evaluation. In addition, financial ratios do not capture environmental factors, such as external economic conditions. Therefore, using only financial ratios may be insufficient for constructing a bankruptcy prediction model, because they essentially reflect past corporate internal accounting information while neglecting recent information. Thus, qualitative information must be added to the conventional bankruptcy prediction model to supplement accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various information sources, previous studies have made only limited use of qualitative information. Recently, however, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry, with an increasing amount of unstructured text data available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling.
Nevertheless, the use of qualitative information on the web for business prediction modeling is still in an early stage, restricted to limited applications such as stock prediction and movie revenue prediction. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented in unstructured text form due to the complexity of managing and processing unstructured text data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well the qualitative information is transformed into quantitative information suitable for incorporation into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. A sentiment index is constructed at the industry level by extracting sentiment from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between an occurring term and the actual economic condition of the industry, rather than the inherent semantics of the term. The experimental results showed that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing predictive performance.
The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy prediction. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction, because the bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.
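The industry-level sentiment index described above can be sketched as keyword-based scoring against a domain lexicon; the lexicon entries and news snippets below are invented, not the paper's construction-sector lexicon:

```python
# Hypothetical domain-specific lexicon: terms scored by their association with
# the industry's actual economic condition, not their inherent semantics.
CONSTRUCTION_LEXICON = {"boom": 1, "orders": 1, "default": -1, "slump": -1, "unsold": -1}

def sentiment_index(articles, lexicon=CONSTRUCTION_LEXICON):
    """Industry-level sentiment index: mean per-article score of lexicon hits."""
    scores = []
    for text in articles:
        tokens = text.lower().split()
        hits = [lexicon[t] for t in tokens if t in lexicon]
        scores.append(sum(hits) / len(hits) if hits else 0.0)
    return sum(scores) / len(scores)

news = ["construction orders boom this quarter",
        "builders face slump as unsold homes rise",
        "major contractor default feared"]
idx = sentiment_index(news)  # negative index signals a poor economic atmosphere
```

The resulting index would enter the prediction model as one quantitative variable alongside the financial ratios.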

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems, v.21 no.1, pp.103-122, 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents lack keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task, as it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and extend, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords with respect to their relevance in the text, without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Keyword extraction algorithms thus classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords, and as a result, keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of keywords assigned by authors can be found in the full text of an article. Conversely, this means that 10% to 36% of author-assigned keywords do not appear in the article and thus cannot be generated by keyword extraction algorithms. Our preliminary experimental results also show that 37% of author-assigned keywords are not included in the full text. This is why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating the keywords that have high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. First, the IVSM system was implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiments, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
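Steps (1)-(5) above can be sketched directly; the keyword sets, weights, and sample document below are hypothetical, and a real system would add stemming and stop-word removal:

```python
import math
import re
from collections import Counter

# Hypothetical controlled vocabulary: keyword-set name -> term weights.
KEYWORD_SETS = {
    "logistics": {"port": 2.0, "shipping": 1.5, "cargo": 1.0},
    "retail":    {"store": 2.0, "customer": 1.5, "price": 1.0},
}

def cosine(weights, doc_tf):
    dot = sum(w * doc_tf.get(t, 0) for t, w in weights.items())
    len_kw = math.sqrt(sum(w * w for w in weights.values()))   # step (1)
    len_doc = math.sqrt(sum(f * f for f in doc_tf.values()))   # step (3)
    return dot / (len_kw * len_doc) if dot else 0.0

def assign_keywords(document, top_n=1):
    doc_tf = Counter(re.findall(r"[a-z]+", document.lower()))  # step (2)
    scored = {name: cosine(w, doc_tf) for name, w in KEYWORD_SETS.items()}  # step (4)
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:top_n]                                      # step (5)

best = assign_keywords("Shipping volume at the port rose, and cargo fees fell.")
```

Because the match is against keyword-set vectors rather than the document's own terms, a set can be assigned even when some of its terms never occur in the document.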

An Exploratory Study of e-Learning Satisfaction: A Mixed Methods of Text Mining and Interview Approaches (이러닝 만족도 증진을 위한 탐색적 연구: 텍스트 마이닝과 인터뷰 혼합방법론)

  • Sun-Gyu Lee;Soobin Choi;Hee-Woong Kim
    • Information Systems Review, v.21 no.1, pp.39-59, 2019
  • E-learning has improved educational effectiveness by making it possible to learn anytime and anywhere, moving beyond traditional one-way lecture-based education. As the use of e-learning systems grows with the increasing popularity of e-learning, measuring e-learning satisfaction has become important. In this study, we used a mixed research method, performing qualitative and quantitative research together, to identify the satisfaction factors of e-learning. As quantitative research, we collected reviews from Udemy.com by text mining, classified high- and low-rated lectures, and applied topic modeling to derive factors from the reviews. As qualitative research, this study also conducted in-depth one-on-one interviews with e-learning learners. By combining these results, we were able to derive the factors of e-learning satisfaction and dissatisfaction, and based on these factors, we suggest ways to improve e-learning satisfaction. Whereas past research was mainly survey-based, this study collects actual data by text mining. The academic significance of this study is that the topic modeling results are combined with factors based on the information systems success model.
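The quantitative branch (splitting reviews by rating, then deriving salient terms per group) can be sketched as follows; the reviews and ratings are invented, and simple term frequency stands in here for the study's topic modeling:

```python
import re
from collections import Counter

# Invented sample of (rating, review text) pairs.
reviews = [
    (5, "clear explanations and great examples"),
    (5, "great pacing, clear slides"),
    (1, "audio quality is poor and pacing too fast"),
    (2, "poor support, outdated content"),
]

def top_terms(group, n=2):
    """Most frequent non-stopword terms in a group of reviews."""
    stop = {"and", "is", "too", "the", "a"}
    words = [w for _, text in group for w in re.findall(r"[a-z]+", text.lower())
             if w not in stop]
    return [w for w, _ in Counter(words).most_common(n)]

high = [r for r in reviews if r[0] >= 4]  # high-rated lectures (satisfaction)
low = [r for r in reviews if r[0] <= 2]   # low-rated lectures (dissatisfaction)
satisfaction_terms = top_terms(high)
dissatisfaction_terms = top_terms(low)
```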

The development of the comics studies in Korea (우리나라 만화 연구 경향 분석과 향후 과제)

  • Lee, Sang-Min;Yim, Hak-Soon
    • Cartoon and Animation Studies, s.16, pp.1-20, 2009
  • This paper explores research trends in the area of comics studies in Korea. In this article, 664 academic articles are examined in terms of the characteristics of the researchers, the field of comics studies, the research theme, and the research methodology. This study is based on the recognition that there has been no consensus on what the core essence of comics studies is. As a result, there are only a few articles on the academic identity of comics studies, and comics studies have not seriously considered the distinctive characteristics of Korean comics. In Korea, over 50% of the academic articles on comics have been published by comics scholars in the fields of pedagogy and the human sciences. Since the 1990s, comics studies have started to consider the value of comics positively, and comics text studies have also increased. Studies on comics policy and the comics industry have increased since 2000. The rise of comics studies is concomitant with the increased awareness of comics in Korea. The article concludes that comics studies need to become an independent academic discipline in the future. Interdisciplinary studies are necessary to examine the diverse aspects of comics. In addition, the infrastructure for comics studies should be established in order to improve the field.


Research on the Usage of Electronic Information Resources of the Humanities Scholars in Korea (인문학자의 전자정보원 이용행태에 관한 연구)

  • Yoon, Cheong-Ok
    • Journal of the Korean Society for Library and Information Science, v.43 no.2, pp.5-28, 2009
  • The purpose of this study is to investigate the use of electronic information resources by humanities scholars in Korea and to propose the planning of academic library and information services to serve their needs. To collect data, a postal survey was conducted from November 2007 through January 2008. Out of 799 humanities scholars sampled from 25 universities, 132 responded, for a completion rate of 16%. The major findings of this study are as follows. First, the majority of humanities scholars distribute their time equally between research and education, and conduct independent research. Second, they use, to a certain degree, electronic information resources, largely in text format, and depend upon the electronic collections of their academic libraries. Third, with the exception of a couple of electronic journal resources, the electronic resources that these humanities scholars regularly use vary so widely that none could be considered a common resource. Fourth, they value the convenience of accessing and using electronic resources, but worry about the quality and scope of the contents. It is suggested that academic libraries (1) become the gateway for the electronic information that is available both inside and outside the library, (2) provide an integrated search feature for, and 'single sign-on' access to, electronic resources, and (3) plan customized user education for specific subject fields in the humanities.

Automatic Generation of Bibliographic Metadata with Reference Information for Academic Journals (학술논문 내에서 참고문헌 정보가 포함된 서지 메타데이터 자동 생성 연구)

  • Jeong, Seonki;Shin, Hyeonho;Ji, Seon-Yeong;Choi, Sungphil
    • Journal of the Korean Society for Library and Information Science, v.56 no.3, pp.241-264, 2022
  • Bibliographic metadata can help researchers effectively utilize the essential publications they need and grasp academic trends in their own fields. Manual creation of such metadata is costly and time-consuming, yet it is nontrivial to automate metadata construction using rule-based methods because article forms and styles vary greatly across publishers and academic societies. Therefore, this study proposes a two-step extraction process based on rules and deep neural networks for generating the bibliographic metadata of scientific articles to overcome these difficulties. The target extraction areas in articles were identified by a deep neural network-based model, and the details in those areas were then analyzed and subdivided into the relevant metadata elements. The proposed model also includes a component for generating reference summary information, which is able to separate the end of the text from the starting point of the references, extract individual references using an essential rule set, and identify all the bibliographic items in each reference with a deep neural network. In addition, in order to confirm the feasibility of a model that generates the bibliographic information of academic papers without pre- and post-processing, we conducted an in-depth comparative experiment with various settings and configurations. As a result, the method proposed in this paper showed higher performance.
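The reference-handling stage (finding where the body ends and the references begin, splitting individual entries by rule, then extracting bibliographic items) can be sketched with rules alone; the heading and citation pattern below are simplified assumptions, and the paper uses a deep neural network for the final item-identification step:

```python
import re

def split_references(fulltext):
    """Locate the end-of-text boundary and split numbered reference entries."""
    _, _, tail = fulltext.partition("References")   # assumed section heading
    return [e.strip() for e in re.split(r"\n(?=\[\d+\])", tail.strip()) if e.strip()]

def parse_entry(entry):
    """Pick out bibliographic items with a simplified citation pattern."""
    m = re.match(r"\[(\d+)\]\s*(?P<authors>[^,]+),\s*\"(?P<title>[^\"]+)\",\s*"
                 r"(?P<venue>[^,]+),\s*(?P<year>\d{4})", entry)
    return m.groupdict() if m else {}

text = ('... conclusion of the article.\nReferences\n'
        '[1] Kim, "Metadata extraction", J. Inf. Sci., 2021\n'
        '[2] Lee, "Reference parsing", Proc. XYZ, 2020')
entries = split_references(text)
first = parse_entry(entries[0])
```

Real reference styles vary too much for one regex, which is exactly why the study hands the item-identification step to a learned model.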

Text Mining and Association Rules Analysis to a Self-Introduction Letter of Freshman at Korea National College of Agricultural and Fisheries (1) (한국농수산대학 신입생 자기소개서의 텍스트 마이닝과 연관규칙 분석 (1))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research, v.22 no.1, pp.113-129, 2020
  • In this study, we performed topic analysis and association rule analysis by text mining to extract meaningful information and rules from the self-introduction letters of freshmen at Korea National College of Agriculture and Fisheries in 2020. The analyzed items are those related to 'academics' and 'in-school activities' during high school. In the text mining results, the keywords of the 'academics' item were 'study', 'thought', 'effort', 'problem', and 'friend', and the keywords of 'in-school activities' were 'activity', 'thought', 'friend', 'club', and 'school', in that order. As a result of the association analysis, the keywords 'thinking', 'studying', 'effort', and 'time' played a central role in the 'academics' item, and the central keywords of 'in-school activities' were 'thought', 'activity', 'school', 'time', and 'friend'. The results of the frequency analysis and association analysis were visualized with word clouds and correlation graphs to make them easier to understand. In the next study, TF-IDF (term frequency-inverse document frequency) analysis, which weights keyword frequency by inverse document frequency, will be performed as a method of extracting keywords from a large number of documents.
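TF-IDF, as previewed for the follow-up study, weights a term's in-document frequency by the log inverse of its document frequency, so terms common to every document score zero; the toy 'letters' below are invented examples:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights: (tf / doc length) * log(N / df)."""
    N = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(N / df[t]) for t in tf})
    return weights

letters = [["study", "effort", "study"], ["club", "friend"], ["study", "club"]]
w = tf_idf(letters)
```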

Science and Technology Policy Studies, Society, and the State : An Analysis of a Co-evolution Among Social Issue, Governmental Policy, and Academic Research in Science and Technology (과학기술정책 연구와 사회, 정부 : 과학기술의 사회이슈, 정부정책, 학술연구의 공진화 분석)

  • Kwon, Ki-Seok;Jeong, Seohwa;Yi, Chan-Goo
    • Journal of Korea Technology Innovation Society, v.21 no.1, pp.64-91, 2018
  • This study explores the interactive patterns among social issues, academic research, and governmental policy on science and technology over the last 20 years. In particular, we try to understand whether science and technology policy research and governmental policy meet social needs appropriately. To do this, we collected text data from news articles, papers, and governmental documents, and based on these data, social network analysis and cluster analysis were carried out. According to the results, science and technology policy research tended to focus, at the initial stage, on fragmented technological innovation meeting urgent practical needs. Recently, however, science and technology policy research has shown co-evolutionary patterns responding to society. Furthermore, a time lag has been observed in the process of interaction among the three bodies. Based on these results, we put forward some suggestions for future research in science and technology policy. First, the level of analysis needs to shift from the micro level to the meso or macro level. Second, more research effort should be focused on the policy process in science and technology and its public management. Finally, sensitivity to social issues should be enhanced through studies on agenda setting in science and technology policy.
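A keyword co-occurrence network of the kind used in such social network analyses can be sketched as follows; the three 'documents' (a news issue, a governmental document, a paper) and their keywords are invented:

```python
from collections import Counter
from itertools import combinations

# Invented keyword sets for one document from each of the three bodies.
docs = [
    {"R&D", "budget", "innovation"},             # news article (social issue)
    {"R&D", "innovation", "evaluation"},         # governmental document
    {"innovation", "evaluation", "governance"},  # academic paper
]

edges = Counter()
for keywords in docs:
    for a, b in combinations(sorted(keywords), 2):  # co-occurring pairs = edges
        edges[(a, b)] += 1

degree = Counter()
for (a, b), weight in edges.items():  # weighted degree centrality per keyword
    degree[a] += weight
    degree[b] += weight

central = degree.most_common(1)[0][0]  # keyword linking the three bodies most
```

Comparing such networks across time slices is what reveals the co-evolution and time-lag patterns the study reports.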