• Title/Summary/Keyword: Natural Language Analysis

Search Result 523, Processing Time 0.028 seconds

Data-Driven Technology Portfolio Analysis for Commercialization of Public R&D Outcomes: Case Study of Big Data and Artificial Intelligence Fields (공공연구성과 실용화를 위한 데이터 기반의 기술 포트폴리오 분석: 빅데이터 및 인공지능 분야를 중심으로)

  • Eunji Jeon;Chae Won Lee;Jea-Tek Ryu
    • The Journal of Bigdata
    • /
    • v.6 no.2
    • /
    • pp.71-84
    • /
    • 2021
  • Because small and medium-sized enterprises have fallen short in securing technological competitiveness in big data and artificial intelligence (AI), core technologies of the Fourth Industrial Revolution, it is important to strengthen the competitiveness of the overall industry through technology commercialization. In this study, we aimed to propose priorities for technology transfer and commercialization to put public research results to practical use. We used public research performance information and filled in missing values of the 6T classification with a deep learning model combined with an ensemble method. Then, we conducted topic modeling to derive the converging fields of big data and AI. We classified the technology fields into four segments of a technology portfolio based on technology activity and technology efficiency, and estimated the commercialization potential of those fields. We proposed commercialization priorities for 10 detailed technology fields that require long-term investment. Such systematic analysis can promote active utilization of technology and efficient technology transfer and commercialization.
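The four-segment portfolio described above can be illustrated with a minimal sketch. The abstract does not say how the activity and efficiency cut-offs were chosen, so the median split, the segment labels, and the function name `classify_portfolio` are all assumptions made for illustration.

```python
from statistics import median


def classify_portfolio(fields):
    """Assign each technology field to one of four portfolio segments.

    fields: dict mapping field name -> (activity, efficiency).
    The median cut-offs and the segment labels are hypothetical;
    the original study may define its segments differently.
    """
    activity_cut = median(a for a, _ in fields.values())
    efficiency_cut = median(e for _, e in fields.values())

    segments = {}
    for name, (activity, efficiency) in fields.items():
        activity_level = "high" if activity >= activity_cut else "low"
        efficiency_level = "high" if efficiency >= efficiency_cut else "low"
        segments[name] = f"{activity_level} activity / {efficiency_level} efficiency"
    return segments


if __name__ == "__main__":
    # Made-up scores for four made-up fields.
    demo = {"A": (10, 0.9), "B": (10, 0.1), "C": (1, 0.9), "D": (1, 0.1)}
    for field, segment in classify_portfolio(demo).items():
        print(field, "->", segment)
```

A median split always places fields on both sides of each axis, which is convenient for a sketch; a fixed threshold would be the natural alternative when absolute levels matter.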

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.133-148
    • /
    • 2014
  • Recently, with the advent of various information channels, the amount of data has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of information are clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon.
We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
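To make the distinguishing criterion concrete, the following is a minimal sketch of lexicon-based opinion classification. The tiny lexicon, its scores, and the function names are hypothetical and are not taken from the study; a real sentiment lexicon would be far larger and typically weighted.

```python
import re

# Hypothetical toy sentiment lexicon: word -> polarity score.
SENTIMENT_LEXICON = {
    "great": 1, "excellent": 1, "enjoyable": 1,
    "boring": -1, "terrible": -1, "dull": -1,
}


def lexicon_score(text, lexicon=SENTIMENT_LEXICON):
    """Sum the polarity scores of all lexicon words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def classify(text, lexicon=SENTIMENT_LEXICON):
    """Label a document by the sign of its lexicon score."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("A great and enjoyable film"))
    print(classify("Boring, dull plot"))
```

Because the lexicon is the only learned artifact, it can be reused on a similar domain without retraining, and each prediction can be explained by listing the matched words, which matches the two advantages the abstract attributes to opinion mining.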

A study on detective story authors' style differentiation and style structure based on Text Mining (텍스트 마이닝 기법을 활용한 고전 추리 소설 작가 간 문체적 차이와 문체 구조에 대한 연구)

  • Moon, Seok Hyung;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.89-115
    • /
    • 2019
  • This study was conducted to present, through data analysis, the stylistic differences between Arthur Conan Doyle and Agatha Christie, both famous as writers of classical mystery novels, and further to present an analytical methodology for the study of style based on text mining. We chose mystery novels because the unique devices of classical mystery novels have strong stylistic characteristics, and we chose Arthur Conan Doyle and Agatha Christie, who are well known to general readers, so that people unfamiliar with this kind of research can relate to the subjects of analysis. The primary objective of this study is to identify how the differences exist within the text and to interpret the effects of these differences on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, the writer's grammatical style of writing was defined as part of style and analyzed. Two series and four books were selected for each writer, and the text was divided into sentences to secure data. After an emotional score was measured and assigned to each sentence, the emotions were visualized as a graph following the page progression, and the trend of event progression in the novel was identified under eight themes by applying topic modeling by page. By organizing co-occurrence matrices and performing network analysis, we were able to visually see changes in relationships between characters as events progressed. In addition, all sentences were classified under a grammatical system based on a total of six types of writing style to identify differences between writers and between works. This enabled us to identify not only the general grammatical writing style of each author, but also the stylistic characteristics inherent in their unconscious, and to interpret the effects of these characteristics on the reader.
This series of research processes can help to understand the context of the entire text based on a defined understanding of the style, and furthermore to integrate stylistic studies that were previously conducted individually. This prior understanding can also contribute to discovering and clarifying the existence of text in unstructured data, including online text. This could help enable more accurate recognition of emotions and delivery of commands on an interactive artificial intelligence platform that currently converts voice into natural language. In the face of increasing attempts to analyze online texts, including new media, in many ways and to discover social phenomena and managerial value, this study is expected to contribute to more meaningful online text analysis and semantic interpretation when linked to such studies. However, the fact that the analysis data consist of only two series and four books per author is a limitation, in that the analysis was not attempted on a sufficient quantity of data. Applying writing characteristics developed for Korean text to English text could also be a limitation. Limiting the diverse stylistic characteristics to six types, which narrows the possible interpretations, was also considered a limitation. In addition, it is regrettable that the research analyzed classical mystery novels rather than text that is commonly used today, and that a wider range of classical mystery novel writers was not compared. Subsequent research will attempt to increase the diversity of interpretations by taking into account a wider variety of grammatical systems and stylistic structures, and will also be applied to the online text analysis frequently used today to assess the potential for interpretation. It is expected that this will enable the interpretation and definition of the specific structure of the style and that various uses can be considered.
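The co-occurrence step that feeds the network analysis can be sketched as below. This is an illustrative assumption of how such counts might be built, with sentence-level co-occurrence and exact name matching; the study's actual unit of co-occurrence and name resolution are not described in the abstract, and `cooccurrence` is a hypothetical helper.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(sentences, characters):
    """Count how often each pair of character names shares a sentence.

    Returns a Counter keyed by alphabetically sorted name pairs.
    Matching is by exact token, so aliases are not resolved.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z]+", sentence))
        present = sorted(name for name in characters if name in tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    text = [
        "Holmes looked at Watson.",
        "Watson nodded.",
        "Holmes and Watson met Lestrade.",
    ]
    for pair, n in cooccurrence(text, ["Holmes", "Watson", "Lestrade"]).items():
        print(pair, n)
```

Computing these counts separately for successive portions of a book gives a sequence of weighted graphs, which is one way to visualize how relationships change as events progress.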

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education
    • /
    • v.45 no.3
    • /
    • pp.292-303
    • /
    • 2021
  • This study identifies the terms frequently used together with energy in online science news articles and the topics of the news reports, in order to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. A total of 2,171 online news articles in the science category, published by 11 major newspaper companies in Korea during the one year from March 1, 2018, were selected by using energy as a search term. As a result of natural language processing, a total of 51,224 sentences consisting of 507,901 words were compiled for analysis. Using the R program, term frequency analysis, semantic network analysis, and structural topic modeling were performed. The results show that the terms with exceptionally high frequencies were technology, research, and development, which reflects the characteristics of news articles that report new findings. Terms used more than once per two articles were industry-related terms (industry, product, system, production, market) and terms that would naturally be expected to be energy-related, such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, also appeared among the highest-frequency terms. From the network analysis, two clusters were found: terms related to industry and technology, and terms related to basic science and research. From the analysis of terms paired with energy, it was also found that terms related to the use of energy, such as 'energy efficiency,' 'energy saving,' and 'energy consumption,' were the most frequently used. Out of 16 topics found, four contexts of energy were drawn: 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that introducing the concept of energy degradation as a starting point for energy classes can be effective.
They also show the need to introduce high-tech industries or the context of environment and health into energy learning.
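The "more than once per two articles" criterion amounts to a per-article rate above 0.5. The study itself used R; the Python sketch below is only an illustration of that threshold, and the function name `frequent_terms` and the toy token lists are assumptions.

```python
from collections import Counter


def frequent_terms(articles, min_rate=0.5):
    """Return terms whose total count per article exceeds min_rate.

    articles: list of token lists, one list per article.
    A rate above 0.5 means the term occurs more than once
    per two articles on average.
    """
    counts = Counter(token for article in articles for token in article)
    n_articles = len(articles)
    return {
        term: count / n_articles
        for term, count in counts.items()
        if count / n_articles > min_rate
    }


if __name__ == "__main__":
    # Made-up tokenized articles.
    demo = [
        ["energy", "industry"],
        ["energy", "sun"],
        ["energy", "industry", "industry"],
        ["heat"],
    ]
    print(frequent_terms(demo))
```

Note that this rate counts total occurrences, so a term repeated many times in one article can pass the threshold; counting the number of articles containing the term would be a stricter alternative.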

Analysis on Dynamics of Korea Startup Ecosystems Based on Topic Modeling (토픽 모델링을 활용한 한국의 창업생태계 트렌드 변화 분석)

  • Heeyoung Son;Myungjong Lee;Youngjo Byun
    • Knowledge Management Research
    • /
    • v.23 no.4
    • /
    • pp.315-338
    • /
    • 2022
  • In 1986, Korea established legal systems to support small and medium-sized start-ups, which have become a main pillar of national development. These legal systems have stimulated the start-up ecosystem, with more than 1 million new start-up companies founded every year during the past 30 years. To analyze the trend of Korea's start-up ecosystem, in this study, we collected 1.18 million news articles from 1991 to 2020. Then, we extracted news articles that contain the keywords "start-up", "venture", and "start-up". We employed network analysis and topic modeling to analyze the collected news articles. Our analysis can contribute to analyzing the government policy direction shown in the history of start-up support policy. Specifically, our analysis identifies the dynamic characteristics of government as influenced by external environmental factors (e.g., society, economy, and culture). The results of our analysis suggest that the start-up ecosystem in Korea has changed and developed mainly through government policies for corporate governance, industrial development planning, deregulation, and economic prosperity planning. Our keyword frequency analysis contributes to understanding entrepreneurial productivity attributed to activities among the networked components in industrial ecosystems. Our analyses and results provide practitioners and researchers with practical and academic implications that can help to establish dedicated support policies through forecasting of the economic environment surrounding start-ups. Korean entrepreneurial productivity has been empowered by growing numbers of large companies in the mobile phone industry. The spectrum of large companies incorporates content startups, platform providers, online shopping malls, and youth-oriented start-ups.
In addition, economic situational factors, which are related to the global expansion of the mobile industry and to government efforts to foster start-ups, contribute to the growth of Korean entrepreneurial productivity. Our research also has methodological implications. We apply natural language processing to 30 years of media articles, which enables more rigorous analysis than existing studies that only observe changes in government and policy in a qualitative manner.

ZFC and Non-Denumerability (ZFC와 열거불가능성)

  • An, Yohan
    • Korean Journal of Logic
    • /
    • v.22 no.1
    • /
    • pp.43-86
    • /
    • 2019
  • If first-order ZFC is consistent (has a model $M_1$), it has a transitive denumerable model $M_2$. This leads to a paradoxical situation called the 'Skolem paradox'. It can be easily resolved by Skolem's typical resolution, but in the process we must accept the model-theoretic relativity of the concept of set. This relativity can generate a situation where the meaning of the set concept, for example, is given differently depending on the two models. The problem is as follows: because the sentence '¬denu(PN)', which indicates that PN is not denumerable, is equally true in the two models, an indistinguishability problem arises, namely that the concept <¬denu> is not formally distinguishable in ZFC. First, I will give a detailed analysis of the nature of this problem. Then I will provide three ways of responding to this problem from the standpoint of supporting ZFC. First, I will argue that the <¬denu> concept, which can be relative to the different models, can be 'almost' distinguished in ZFC by using the formalization of model theory in ZFC. Second, I will show that <¬denu> can change its meaning intrinsically or naturally, through its contextual dependency, based on semantic considerations about the quantifier that plays a key role in the relativity of <¬denu>. Thus, I will show that the model-relative change in the meaning of the <¬denu> concept is a natural phenomenon external to the language, not something for which ZFC is responsible.
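The situation described above can be written out as follows. This is a standard textbook rendering of Skolem's paradox rather than the author's own formalization, and it assumes that PN denotes the power set $\mathcal{P}(\mathbb{N})$ and that denu$(x)$ abbreviates "there is a bijection between $\mathbb{N}$ and $x$".

```latex
\begin{align*}
\mathrm{denu}(x) \;&:\equiv\; \exists f\,\bigl(f\colon \mathbb{N}\to x \text{ is a bijection}\bigr),\\[2pt]
\mathrm{ZFC} \vdash \neg\,\mathrm{denu}\bigl(\mathcal{P}(\mathbb{N})\bigr)
  \;&\Longrightarrow\; M_2 \models \neg\,\mathrm{denu}\bigl(\mathcal{P}(\mathbb{N})\bigr),\\[2pt]
\mathcal{P}(\mathbb{N})^{M_2} \subseteq M_2
  \;&\Longrightarrow\; \text{there is a bijection } g\colon \mathbb{N}\to \mathcal{P}(\mathbb{N})^{M_2}
  \text{ in the metatheory},\\[2pt]
\text{resolution:}\quad & g \notin M_2 .
\end{align*}
```

The third line uses the transitivity and denumerability of $M_2$: every element of $\mathcal{P}(\mathbb{N})^{M_2}$ is itself an element of $M_2$, so the set is denumerable when viewed from outside. The quantifier $\exists f$ in $\neg\,\mathrm{denu}$ ranges only over elements of the model, which is why the same sentence can be true in $M_2$ while the external bijection $g$ exists.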

A Study on Building Knowledge Base for Intelligent Battlefield Awareness Service

  • Jo, Se-Hyeon;Kim, Hack-Jun;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.4
    • /
    • pp.11-17
    • /
    • 2020
  • In this paper, we propose a method to build a knowledge base based on natural language processing for an intelligent battlefield awareness service. The current command and control system manages and utilizes the collected battlefield information and tactical data only at a basic level, such as registration, storage, and sharing, while information fusion and situation analysis are performed by an analyst. Because of the analyst's time constraints and cognitive limitations, generally only one interpretation is drawn, and biased thinking can be reflected. Therefore, it is essential for the command and control system to be aware of the battlefield situation and to establish an intelligent decision support system. To do this, it is necessary to build a knowledge base specialized for the command and control system and to develop intelligent battlefield awareness services based on it. In this paper, among the entity names suggested in the Exobrain corpus, which is private data, the top 250 types of meaningful names were applied, and a weapon system entity type was additionally identified to properly represent battlefield information. Based on this, we proposed a way to build a battlefield-aware knowledge base through mention extraction, cross-reference resolution, and relationship extraction.

A Study of the Definition and Components of Data Literacy for K-12 AI Education (초·중등 AI 교육을 위한 데이터 리터러시 정의 및 구성 요소 연구)

  • Kim, Seulki;Kim, Taeyoung
    • Journal of The Korean Association of Information Education
    • /
    • v.25 no.5
    • /
    • pp.691-704
    • /
    • 2021
  • The development of AI technology has brought about a big change in our lives. The importance of AI and data education is also growing as AI's influence grows from daily life to society to the economy. In response, the OECD Education Research Report and various domestic information and curriculum studies deal with data literacy and present it as an essential competency. However, the definition of data literacy and the content and scope of its components vary among researchers. Thus, to present an objective and comprehensive definition and set of components, we analyze the definitions in key data literacy studies and the frequency of words used in their components, together with the semantic similarity of words obtained through Word2Vec, a deep learning natural language processing method. The result was revised and supplemented through expert review, and we defined data literacy as the 'basic ability of knowledge construction and communication to collect, analyze, and use data and process it as information for problem solving'. Furthermore, we propose the components of each category of knowledge, skills, and values and attitudes. We hope that the definition and components of data literacy derived from this study will serve as a good foundation for the systematization of, and educational research on, AI education related to students' future competency.
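Semantic similarity between Word2Vec embeddings is usually measured with cosine similarity. The sketch below shows only that comparison step; the three-dimensional vectors are made up for illustration and stand in for embeddings that a trained Word2Vec model would supply, and `most_similar` is a hypothetical helper rather than a library call.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real Word2Vec vectors typically
# have on the order of a hundred or more dimensions.
EMBEDDINGS = {
    "collect": [0.9, 0.1, 0.0],
    "gather": [0.8, 0.2, 0.1],
    "communicate": [0.0, 0.2, 0.9],
}


def most_similar(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector is closest to `word`."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )


if __name__ == "__main__":
    print(most_similar("collect"))
```

Grouping words that different researchers use for the same component (for example, near-synonyms for collecting data) is one way such similarity scores could support a consolidated definition.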

A Study on the Current State of the Library's AI Service and the Service Provision Plan (도서관의 인공지능(AI) 서비스 현황 및 서비스 제공 방안에 관한 연구)

  • Kwak, Woojung;Noh, Younghee
    • Journal of Korean Library and Information Science Society
    • /
    • v.52 no.1
    • /
    • pp.155-178
    • /
    • 2021
  • In the era of the 4th industrial revolution, public libraries need a strategy for promoting intelligent library services in order to actively respond to changes in the external environment such as artificial intelligence. Therefore, in this study, based on the concept of artificial intelligence and an analysis of domestic and foreign artificial intelligence related trends, policies, and cases, we proposed the future direction for the introduction and development of artificial intelligence services in the library. Currently, libraries operate reference information services that automatically provide answers by introducing artificial intelligence technologies such as deep learning and natural language processing, and have developed big data-based AI book recommendation and automatic book inspection systems to increase business utilization and provide customized services for users. In companies and industries, both domestic and overseas, technologies based on artificial intelligence, such as autonomous driving and personal customization, are being developed and offered as services, in a form that provides optimal results by self-learning from information using deep learning. Accordingly, in the future, libraries should promote service development that utilizes artificial intelligence to recommend personalized books based on the user's usage records, recommend reading and culture programs, and, for book delivery services, introduce real-time delivery through transport methods such as autonomous drones and cars.

Exploiting Chunking for Dependency Parsing in Korean (한국어에서 의존 구문분석을 위한 구묶음의 활용)

  • Namgoong, Young;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.7
    • /
    • pp.291-298
    • /
    • 2022
  • In this paper, we present a method for dependency parsing with chunking in Korean. Dependency parsing is the task of determining the governor of every word in a sentence. In general, the syntactic governor is determined in Korean, and the syntactic structure must then be transformed into a semantic structure for further processing such as semantic analysis in natural language processing. Deciding whether to determine the syntactic or the semantic governor is a notorious problem. For example, the syntactic governor of the word "먹고 (eat)" in the sentence "밥을 먹고 싶다 (would like to eat)" is "싶다 (would like to)", which is an auxiliary verb and therefore cannot be a semantic governor. In order to mitigate this somewhat, we propose Korean dependency parsing after chunking, which is the process of segmenting a sentence into constituents. A constituent is a word or a group of words that functions as a single unit within a dependency structure and is called a chunk in this paper. Compared to traditional dependency parsing, the proposed method has some advantages: (1) The number of input units in parsing can be reduced, so parsing can be faster. (2) The effectiveness of parsing can be improved by considering the relation between two head words in chunks. Through experiments on the Sejong dependency corpus, we have shown that the UAS and LAS of the proposed method are 86.48% and 84.56%, respectively, and that the number of input units is reduced by about 22%p.
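UAS (unlabeled attachment score) and LAS (labeled attachment score) are the standard dependency parsing metrics: the share of input units whose predicted governor is correct, and the share whose governor and relation label are both correct. A minimal sketch of how they are computed follows; the function name, the example labels, and the toy parse are hypothetical and not taken from the paper.

```python
def attachment_scores(gold, predicted):
    """Compute (UAS, LAS) in percent.

    gold, predicted: lists of (head_index, relation_label) tuples,
    one per input unit (word or chunk), in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = len(gold)
    if total == 0:
        return 0.0, 0.0
    # UAS: only the head index has to match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the head index and the label have to match.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


if __name__ == "__main__":
    # Made-up three-unit example; head 0 denotes the root.
    gold = [(2, "obj"), (3, "aux"), (0, "root")]
    predicted = [(2, "obj"), (3, "dep"), (1, "root")]
    uas, las = attachment_scores(gold, predicted)
    print(f"UAS={uas:.2f}% LAS={las:.2f}%")
```

When chunks rather than words are the input units, the same computation applies, with `total` being the smaller number of chunks.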