• Title/Summary/Keyword: 비정형데이터

Search Result 580, Processing Time 0.027 seconds

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.49-67
    • /
    • 2015
  • Recently, emerging the notion of big data and social media has led us to enter data's big bang. Social networking services are widely used by people around the world, and they have become a part of major communication tools for all ages. Over the last decade, as online social networking sites become increasingly popular, companies tend to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about propagating of negative opinions on social networking sites such as Facebook and Twitter, as well as e-commerce sites. The effect of online word of mouth (WOM) such as product rating, product review, and product recommendations is very influential, and negative opinions have significant impact on product sales. This trend has increased researchers' attention to a natural language processing, such as a sentiment analysis. A sentiment analysis, also refers to as an opinion mining, is a process of identifying the polarity of subjective information and has been applied to various research and practical fields. However, there are obstacles lies when Korean language (Hangul) is used in a natural language processing because it is an agglutinative language with rich morphology pose problems. Therefore, there is a lack of Korean natural language processing resources such as a sentiment lexicon, and this has resulted in significant limitations for researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence, and provides API (Application Programming Interface) service to open and share a sentiment lexicon data with the public (www.openhangul.com). For the pre-processing, we have created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words. In order to classify them, we first identified stop words which often quite likely to play a negative role in sentiment analysis and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, adverbs as they have sentimental expressions such as positive, neutral, and negative. On the other hands, non-sentiment words are interjection, determiner, numeral, postposition, etc. as they generally have no sentimental expressions. To build a reliable sentiment lexicon, we have adopted a concept of collective intelligence as a model for crowdsourcing. In addition, a concept of folksonomy has been implemented in the process of taxonomy to help collective intelligence. In order to make up for an inherent weakness of folksonomy, we have adopted a majority rule by building a voting system. Participants, as voters were offered three voting options to choose from positivity, negativity, and neutrality, and the voting have been conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been made by college students in Korea, and we keep this voting system open by maintaining the project as a perpetual study. Besides, any change in the sentiment score of words can be an important observation because it enables us to keep track of temporal changes in Korean language as a natural language. Lastly, our study offers a RESTful, JSON based API service through a web platform to make easier support for users such as researchers, companies, and developers. Finally, our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon plays an important role as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using Korean sentiment lexicon we built. Moreover, our study sheds new light on the value of folksonomy by combining collective intelligence, and we also expect to give a new direction and a new start to the development of Korean natural language processing.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.93-107
    • /
    • 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites by crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-news network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a quasi-network of user-issue. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with that of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using Netminer. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining-related studies have been focused on the application of the second steps, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have actively been studied to improve the quality of analysis results by preserving the meaning of words and documents in the process of representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, Unstructured text should be preceded by a structuring task that transforms the original document into a form that the computer can understand before analysis. It is called "Embedding" that arbitrary objects are mapped to a specific dimension space while maintaining algebraic properties for structuring the text data. Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents in various aspects. Particularly, with the demand for analysis of document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec which extends word2Vec and embeds each document into one vector is most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using the whole corpus included in the document. This causes a limit that the document vector is affected by not only core words but also miscellaneous words. Additionally, the traditional document embedding schemes usually map each document into a single corresponding vector. Therefore, it is difficult to represent a complex document with multiple subjects into a single vector accurately using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. In the case of a document without keywords, this method can be applied after extract keywords through various analysis methods. However, since this is not the core subject of the proposed method, we introduce the process of applying the proposed method to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. all text in a document is tokenized and each token is represented as a vector having N-dimensional real value through word embedding. After that, to overcome the limitations of the traditional document embedding method that is affected by not only the core word but also the miscellaneous words, vectors corresponding to the keywords of each document are extracted and make up sets of keyword vector for each document. Next, clustering is conducted on a set of keywords for each document to identify multiple subjects included in the document. Finally, a Multi-vector is generated from vectors of keywords constituting each cluster. The experiments for 3.147 academic papers revealed that the single vector-based traditional approach cannot properly map complex documents because of interference among subjects in each vector. With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.

The Effect of Expert Reviews on Consumer Product Evaluations: A Text Mining Approach (전문가 제품 후기가 소비자 제품 평가에 미치는 영향: 텍스트마이닝 분석을 중심으로)

  • Kang, Taeyoung;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.63-82
    • /
    • 2016
  • Individuals gather information online to resolve problems in their daily lives and make various decisions about the purchase of products or services. With the revolutionary development of information technology, Web 2.0 has allowed more people to easily generate and use online reviews such that the volume of information is rapidly increasing, and the usefulness and significance of analyzing the unstructured data have also increased. This paper presents an analysis on the lexical features of expert product reviews to determine their influence on consumers' purchasing decisions. The focus was on how unstructured data can be organized and used in diverse contexts through text mining. In addition, diverse lexical features of expert reviews of contents provided by a third-party review site were extracted and defined. Expert reviews are defined as evaluations by people who have expert knowledge about specific products or services in newspapers or magazines; this type of review is also called a critic review. Consumers who purchased products before the widespread use of the Internet were able to access expert reviews through newspapers or magazines; thus, they were not able to access many of them. Recently, however, major media also now provide online services so that people can more easily and affordably access expert reviews compared to the past. The reason why diverse reviews from experts in several fields are important is that there is an information asymmetry where some information is not shared among consumers and sellers. The information asymmetry can be resolved with information provided by third parties with expertise to consumers. Then, consumers can read expert reviews and make purchasing decisions by considering the abundant information on products or services. Therefore, expert reviews play an important role in consumers' purchasing decisions and the performance of companies across diverse industries. If the influence of qualitative data such as reviews or assessment after the purchase of products can be separately identified from the quantitative data resources, such as the actual quality of products or price, it is possible to identify which aspects of product reviews hamper or promote product sales. Previous studies have focused on the characteristics of the experts themselves, such as the expertise and credibility of sources regarding expert reviews; however, these studies did not suggest the influence of the linguistic features of experts' product reviews on consumers' overall evaluation. However, this study focused on experts' recommendations and evaluations to reveal the lexical features of expert reviews and whether such features influence consumers' overall evaluations and purchasing decisions. Real expert product reviews were analyzed based on the suggested methodology, and five lexical features of expert reviews were ultimately determined. Specifically, the "review depth" (i.e., degree of detail of the expert's product analysis), and "lack of assurance" (i.e., degree of confidence that the expert has in the evaluation) have statistically significant effects on consumers' product evaluations. In contrast, the "positive polarity" (i.e., the degree of positivity of an expert's evaluations) has an insignificant effect, while the "negative polarity" (i.e., the degree of negativity of an expert's evaluations) has a significant negative effect on consumers' product evaluations. Finally, the "social orientation" (i.e., the degree of how many social expressions experts include in their reviews) does not have a significant effect on consumers' product evaluations. In summary, the lexical properties of the product reviews were defined according to each relevant factor. Then, the influence of each linguistic factor of expert reviews on the consumers' final evaluations was tested. In addition, a test was performed on whether each linguistic factor influencing consumers' product evaluations differs depending on the lexical features. The results of these analyses should provide guidelines on how individuals process massive volumes of unstructured data depending on lexical features in various contexts and how companies can use this mechanism from their perspective. This paper provides several theoretical and practical contributions, such as the proposal of a new methodology and its application to real data.

An Investigation on the Periodical Transition of News related to North Korea using Text Mining (텍스트마이닝을 활용한 북한 관련 뉴스의 기간별 변화과정 고찰)

  • Park, Chul-Soo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.63-88
    • /
    • 2019
  • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis over North Korea represented in South Korean mass media. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and method of analysis of the text mining techniques, confirmed by analysis of the data. In this study, R program was used to apply the text mining technique. R program is free software for statistical computing and graphics. Also, Text mining methods allow to highlight the most frequently used keywords in a paragraph of texts. One can create a word cloud, also referred as text cloud or tag cloud. This study proposes a procedure to find meaningful tendencies based on a combination of word cloud, and co-occurrence networks. This study aims to more objectively explore the images of North Korea represented in South Korean newspapers by quantitatively reviewing the patterns of language use related to North Korea from 2016. 11. 1 to 2019. 5. 23 newspaper big data. In this study, we divided into three periods considering recent inter - Korean relations. Before January 1, 2018, it was set as a Before Phase of Peace Building. From January 1, 2018 to February 24, 2019, we have set up a Peace Building Phase. The New Year's message of Kim Jong-un and the Olympics of Pyeong Chang formed an atmosphere of peace on the Korean peninsula. After the Hanoi Pease summit, the third period was the silence of the relationship between North Korea and the United States. Therefore, it was called Depression Phase of Peace Building. This study analyzes news articles related to North Korea of the Korea Press Foundation database(www.bigkinds.or.kr) through text mining, to investigate characteristics of the Kim Jong-un regime's South Korea policy and unification discourse. The main results of this study show that trends in the North Korean national policy agenda can be discovered based on clustering and visualization algorithms. In particular, it examines the changes in the international circumstances, domestic conflicts, the living conditions of North Korea, the South's Aid project for the North, the conflicts of the two Koreas, North Korean nuclear issue, and the North Korean refugee problem through the co-occurrence word analysis. It also offers an analysis of South Korean mentality toward North Korea in terms of the semantic prosody. In the Before Phase of Peace Building, the results of the analysis showed the order of 'Missiles', 'North Korea Nuclear', 'Diplomacy', 'Unification', and ' South-North Korean'. The results of Peace Building Phase are extracted the order of 'Panmunjom', 'Unification', 'North Korea Nuclear', 'Diplomacy', and 'Military'. The results of Depression Phase of Peace Building derived the order of 'North Korea Nuclear', 'North and South Korea', 'Missile', 'State Department', and 'International'. There are 16 words adopted in all three periods. The order is as follows: 'missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', 'North and South Korea', 'Military', 'Kaesong Industrial Complex', 'Defense', 'Sanctions', 'Denuclearization', 'Peace', 'Exchange and Cooperation', and 'South Korea'. We expect that the results of this study will contribute to analyze the trends of news content of North Korea associated with North Korea's provocations. And future research on North Korean trends will be conducted based on the results of this study. We will continue to study the model development for North Korea risk measurement that can anticipate and respond to North Korea's behavior in advance. We expect that the text mining analysis method and the scientific data analysis technique will be applied to North Korea and unification research field. Through these academic studies, I hope to see a lot of studies that make important contributions to the nation.

과학자(科學者)의 정보생산(情報生産) 계속성(繼續性)과 정보유통(情報流通)(2)

  • Garvey, W.D.
    • Journal of Information Management
    • /
    • v.6 no.5
    • /
    • pp.131-134
    • /
    • 1973
  • 본고(本稿)시리이즈의 제1보(第一報)에서 우리는 물리(物理), 사회과학(社會科學) 및 공학분야(工學分野)의 12,442명(名)의 과학자(科學者)와 기술자(技術者)에 대한 정보교환활동(情報交換活動)의 78례(例)에 있어서 일반과정(一般過程)과 몇 가지 결과(結果)를 기술(記述)한 바 있다. 4년반(年半) 이상(以上)의 기간(其間)($1966{\sim}1971$)에서 수행(遂行)된 이 연구(硏究)는 현재(現在)의 과학지식(科學知識)의 집성체(集成體)로 과학자(科學者)들이 연구(硏究)를 시작(始作)한 때부터 기록상(記錄上)으로 연구결과(硏究結果)가 취합(聚合)될 때까지 각종(各種) 정형(定形), 비정형(非定形) 매체(媒體)를 통한 유통정보(流通情報)의 전파(傳播)와 동화(同化)에 대한 포괄적(包括的)인 도식(圖式)으로 표시(表示)할 수 있도록 설정(設定)하고 또 시행(施行)되었다. 2보(二報), 3보(三報), 4보(四報)에서는 데이터 뱅크에 수집(蒐集) 및 축적(蓄積)된 데이터의 일반적(一般的)인 기술(記述)을 적시(摘示)하였다. (1) 과학(科學)과 기술(技術)의 정보유통(情報流通)에 있어서 국가적(國家的) 회합(會合)의 역할(役割)(Garvey; 4보(報)) 국가적(國家的) 회합(會合)은 투고(投稿)와 이로 인한 잡지중(雜誌中) 게재간(揭載間)의 상대적(相對的)인 오랜 기간(期間)동안 이러한 연구(硏究)가 공개매체(公開媒體)로 인하여 일시적(一時的)이나마 게재여부(揭載如否)의 불명료성(不明瞭性)을 초래(招來)하기 전(前)에 과학연구(科學硏究)의 초기전파(初期傳播)를 위하여 먼저 행한 주요(主要) 사례(事例)와 마지막의 비정형매체(非定形媒體)의 양자(兩者)를 항상 조직화(組織化)하여 주는 전체적(全體的)인 유통과정(流通過程)에 있어서 명확(明確)하고도 중요(重要)한 기능(機能)을 갖는다는 것을 알 수 있었다. (2) 잡지(雜誌)에 게재(揭載)된 정보(情報)의 생산(生産)과 관련(關聯)되는 정보(情報)의 전파과정(傳播過程)(Garvey; 1보(報)). 이 연구(硏究)를 위해서 우리는 정보유통과정(情報流通過程)을 따라 많은 노력(努力)을 하였는데, 여기서 유통과정(流通過程)의 인상적(印象的)인 면목(面目)은 특별(特別)히 연구(硏究)로부터의 정보(情報)는 잡지(雜誌)에 게재(揭載)되기까지 진정으로는 공개적(公開的)이 못된다는 것과 이러한 사실(事實)은 선진연구(先進硏究)가 자주 시대(時代)에 뒤떨어지게 된다는 것을 발견할 수 있었다. 경험(經驗)이 많은 정보(情報)의 수요자(需要者)는 이러한 폐물화(廢物化)에 매우 민감(敏感)하며 자기(自己) 연구(硏究)에 당면한, 진행중(進行中)이거나 최근(最近) 완성(完成)된 연구(硏究)에 대하여 정보(情報)를 얻기 위한 모든 수단(手段)을 발견(發見)코자 하였다. 예를 들어, 이들은 잡지(雜誌)에 보문(報文)을 발표(發表)하기 전(前)에 발생(發生)하는 정보전파과정(情報傳播過程)을 통하여 유루(遺漏)될지도 모르는 정보(情報)를 얻기 위하여 한 잡지(雜誌)나 2차자료(二次資料) 또는 전형적(典型的)으로 이용(利用)되는 다른 잡지류중(雜誌類中)에서 당해정보(當該情報)가 발견(發見)되기를 기다리지 않는다는 것이다. (3) "정보생산 과학자(情報生産 科學者)"에 의한 정보전파(情報傳播)의 계속성(繼續性)(이 연구(硏究) 시리이즈의 결과(結果)는 본고(本稿)의 주내용(主內容)으로 되어 있다.) 1968/1969년(年)부터 1970/1971년(年)의 이년기간(二年期間)동안 보문(報文)을 낸 과학자(科學者)(1968/1969년(年) 잡지중(雜誌中)에 "질이 높은" 보문(報文)을 발표(發表)한)의 약 2/3는 1968/1969의 보문(報文)과 동일(同一)한 대상영역(對象領域)의 연구(硏究)를 계속(繼續) 수행(遂行)하였다. 그래서 우리는 본연구(本硏究)에 오른 대부분(大部分)의 저자(著者)가 정상적(正常的)인 과학(科學), 즉 연구수행중(硏究遂行中) 의문(疑問)에 대한 완전(完全)한 해답(解答)을 얻게 되는 가장 중요(重要)한 추구(追求)로서 Kuhn(제5보(第5報))에 의하여 기술(技術)된 방법(방법)으로 과학(연구)(科學(硏究))을 실행(實行)하였음을 알았다. 최근(最近)에 연구(硏究)를 마치고 그 결과(結果)를 보문(報文)으로서 발표(發表)한 이들 과학자(科學者)들은 다음 단계(段階)로 해야 할 사항(事項)에 대하여 선행(先行)된 동일견해(同一見解)를 가진 다른 연구자(硏究자)들의 연구(硏究)와 대상(對象)에 밀접(密接)하게 관련(關聯)되고 있다. 이 계속성(繼續性)의 효과(效果)에 대한 지표(指標)는 보문(報文)과 동일(同一)한 영역(領域)에서 연구(硏究)를 계속(繼續)한 저자(著者)들의 약 3/4은 선행(先行) 보문(報文)에 기술(技術)된 연구결과(硏究結果)에서 직접적(直接的)으로 새로운 연구(硏究)가 유도(誘導)되었음을 보고(報告)한 사항(事項)에 반영(反映)되어 있다. 그렇지만 우리들의 데이터는 다음 영역(領域)으로 기대(期待)하지 않은 전환(轉換)을 일으킬 수도 있음을 보여주고 있다. 동일(同一) 대상(對象)에서 연구(硏究)를 속행(續行)하였던 저자(著者)들의 1/5 이상(以上)은 뒤에 새로운 영역(領域)으로 연구(硏究)를 전환(轉換)하였고 또한 이 영역(領域)에서 연구(硏究)를 계속(繼續)하였다. 연구영역(硏究領域)의 이러한 변화(變化)는 연구자(硏究者)의 일반(一般) 정보유통(情報流通) 패턴에 크게 변화(變化)를 보이지는 않는다. 즉 새로운 지적(知的) 문제(問題)에 대한 변화(變化)에서 야기(惹起)되는 패턴에 있어서 저자(著者)들은 오래된 문제(問題)의 방법(方法)과 기술(技術)을 새로운 문제(問題)로 맞추려 한다. 과학사(科學史)의 최근(最近) 해석(解釋)(Hanson: 6보(報))에서 예기(豫期)되었던 바와 같이 정상적(正常的)인 과학(科學)의 계속성(繼續性)은 항상 절대적(絶對的)이 아니며 "과학지식(科學知識)"의 첫발자욱은 예전 연구영역(硏究領域)의 대상(對象)에 관계(關係)없이 나타나는 다른 영역(領域)으로 내딛게 될지도 모른다. 우리들의 연구(硏究)에서 저자(著者)의 1/3은 동일(同一) 영역(領域)의 대상(對象)에서 속계적(續繼的)인 연구(硏究)를 수행(遂行)치 않고 새로운 영역(領域)으로 옮아갔다. 우리는 이와 같은 데이터를 (a) 저자(著者)가 각개과학자(各個科學者)의 활동(活動)을 통하여 집중적(集中的)인 과학적(科學的) 노력(努力)을 시험(試驗)할 때 각자(各自)의 연구(硏究)에 대한 많은 양(量)의 계속성(繼續性)이 어떤 진보중(進步中)의 과학분야(科學分野)에서도 나타난다는 것과 (b) 이 계속성(繼續性)은 과학(科學)에 대한 집중적(集中的) 진보(進步)의 필요적(必要的) 특질(特質)이라는 것을 의미한다. 또한 우리는 이 계속성(繼續性)과 관련(關聯)되는 유통문제(流通問題)라는 새로운 대상영역(對象領域)으로 전환(轉換)할 때 연구(硏究)의 각단계(各段階)의 진보(進步)와 새로운 목적(目的)으로 전환시(轉換時) 양자(兩者)가 다 필요(必要)로 하는 각개(各個) 과학자(科學者)의 정보수요(情報需要)를 위한 시간(時間) 소비(消費)라는 것을 탐지(探知)할 수 있다. 이러한 관찰(觀察)은 정보(情報)의 선택제공(選擇提供)시스팀이 현재(現在) 필요(必要)로 하는 정보(情報)의 만족(滿足)을 위하여는 효과적(效果的)으로 매우 융통성(融通性)을 띠어야 한다는 것을 암시(暗示)하는 것이다. 본고(本稿)의 시리이즈에 기술(記述)된 전정보유통(全情報流通) 과정(過程)의 재검토(再檢討) 결과(結果)는 과학자(科學者)들이 항상 그들의 요구(要求)를 조화(調和)시키는 신축성(伸縮性)있는 유통체제(流通體制)를 발전(發展)시켜 왔다는 것을 시사(示唆)해 주고 있다. 이 시스팀은 정보전파(情報傳播) 사항(事項)을 중심(中心)으로 이루어 지며 또한 이 사항(事項)의 대부분(大部分)의 참여자(參與者)는 자기자신(自己自身)이 과학정보(科學情報) 전파자(傳播者)라는 기본적(基本的)인 정보전파체제(情報傳播體制)인 것이다. 그러나 이 과정(過程)의 유통행위(流通行爲)에서 살펴본 바와 같이 우리는 대부분(大部分)의 정보전파자(情報傳播者)가 역시 정보(情報)의 동화자(同化者)-다시 말해서 과학정보(科學情報)의 생산자(生産者)는 정보(情報)의 이용자(利用者)라는 것을 알 수 있다. 이 연구(硏究)에서 전형적(典型的)인 과학자((科學者)는 과학정보(科學情報)의 생산(生産)이나 전파(傳播)의 양자(兩者)에 연속적(連續的)으로 관계(關係)하고 있음을 보았다. 만일(萬一) 연구자(硏究者)가 한 편(編)의 연구(硏究)를 완료(完了)한다면 이 연구자(硏究者)는 다음에 무엇을 할 것이냐 하는 관념(觀念)을 갖게 되고 따라서 "완료(完了)된" 연구(硏究)에 관한 정보(情報)를 이용(利用)하여 동시(同時)에 새로운 일을 시작(始作)하게 된다. 예를 들어, 한 과학자(科學者)가 동일(同一) 영역(領域)의 다른 동료연구자(同僚硏究者)에게 완전(完全)하며 이의(異議)에 방어(防禦)할 수 있는 보고서(報告書)를 제공(提供)할 수 있는 단계(段階)에 도달(到達)하였다면 우리는 이 과학자(科學者)가 정보유통과정(情報流通過程)에서 많은 역할(役割)을 해낼 수 있다는 것을 알 것이다. 즉 이 과학자(科學者)는 다른 과학자(科學者)들에게 최신(最新)의 과학적(科學的) 결과(結果)를 제공(提供)할 때 하나의 과학정보(科學情報) 전파자(傳播者)가 되며, 이 연구(硏究)의 의의(意義)와 타당성(妥當性)에 관한 논평(論評)이나 비평(批評)을 동료(同僚)로부터 구(求)하는 관점(觀點)에서 보면 이 과학자(科學者)는 하나의 정보탐색자(情報探索者)가 된다. 또한 장래(將來)의 이용(利用)을 위하여 증정(贈呈)이나 동화(同化)한 이 정보(情報)로부터 피이드백을 받아 드렸을 때의 범주(範疇)에서 보면 (잡지(雜誌)에 투고(投稿)하기 위하여 원고(原稿)를 작성(作成)하는 경우에 있어서와 같이) 과학자(科學者)는 하나의 정보이용자(情報利用者)가 되고 이러한 모든 가능성(可能性)에서 정보생산자(情報生産者)는 다음 정보생산(情報生産)에 이미 들어가 있다고 볼 수 있다(저자(著者)들의 2/3는 보문(報文)이 게재(揭載)되기 전(前)에 이미 새로운 연구(硏究)를 시작(始作)하였다). 과학자(科學者)가 자기연구(自己硏究)를 마치고 예비보고서(豫備報告書)를 만든 후(後) 자기연구(自己硏究)에 관한 정보(情報)의 전파(傳播)를 계속하게 되는데 이와 관계(關係)되는 일반적(一般的)인 패턴을 보면 소수(少數)의 동료(同僚)그룹에 출석(出席)하는 경우 (예로 지역집담회)(地域集談會))와 대중(大衆) 앞에서 행(行)하는 경우(예로 국가적 회합(國家的 會合)) 등이 있다. 그러는 동안에 다양성(多樣性) 있는 성문보고서(成文報告書)가 이루어진다. 그러나 과학자(科學者)들이 자기연구(自己硏究)를 위한 주정보전파목표(主情報傳播目標)는 과학잡지중(科學雜誌中)에 게재(揭載)되는 보문(報文)이라는 것이 명확(明確)한 사실(事實)인 것이다. 이러한 목표(目標)에 도달(到達)할 때까지의 각(各) 정보전파단계(情報傳播段階)에서 과학자(科學者)들은 목표달성(目標達成)을 위하여 청중(聽衆), 자기동화(自己同化)된 정보(情報) 및 이미 이용(利用)된 정보(情報)로부터 피이드백을 탐색(探索)하게 된다. 우리가 본고(本稿)의 시리이즈중(中)에 표현(表現)하려 했던 바와 같이 이러한 활동(活動)은 조사수임자(調査受任者)의 의견(意見)이 원고(原稿)에 반영(反映)되고 또 그 원고(原稿)가 잡지게재(雜誌揭載)를 위해 수리(受理)될 때까지 계속적(繼續的)으로 정보(情報)를 탐색(探索)하는 과학자(科學者)나 기타(其他)사람들에게 효과적(效果的)이었다. 원고(原稿)가 수리(受理)되면 그 원고(原稿)의 저자(著者)들은 그 보문(報文)의 주내용(主內容)에 대하여 적극적(積極的)인 정보전파자(情報傳播者)로서의 역할(役割)을 종종 중지(中止)하는 일이 있는데 이때에는 저자(著者)들의 역할(役割)이 변화(變化)하는 것을 볼 수 있었다. 즉 이 저자(著者)들은 일시적(一時的)이긴 하나 새로운 일을 착수(着手)하기 위하여 정보(情報)의 동화자(同化者)를 찾게 된다. 또한 전(前)에 행한 일에 대한 의견(意見)이나 비평(批評)이 새로운 일에 영향(影響)을 끼치게 된다. 동시(同時)에 새로운 과학정보생산(科學情報生産) 과정(過程)에 들어가게 되고 현재(現在) 진행중(進行中)이거나 최근(最近) 완료(完了)한 연구(硏究)에 대한 정보(情報)를 항상 찾게 된다. 활발(活潑)한 연구(硏究)를 하는 과학자(科學者)들에게는, 동화자(同化者)로서의 역할(役割)과 전파자(傳播者)로서의 역할(役割)을 분리(分離)시킨다는 것은 실제적(實際的)은 못된다. 즉 후자(後者)를 완성(完成)하기 위해서는 전자(前者)를 이용(利用)하게 된다는 것이다. 과학자(科學者)들은 한 단계(段階)에서 한 전파자(傳播者)로서의 역할(役割)이 뚜렷하나 다른 단계(段階)에서는 정보교환(情報交換)이 기본적(基本的)으로 정보동화(情報同化)에 직결(直結)되고 있는 것이다. 정보전파자(情報傳播者)와 정보동화자간(情報同化者間)의 상호관계(相互關係)(또는 정보생산자(情報生産者)와 정보이용자간(情報利用者間))는 과학(科學)에 있어서 하나의 필수양상(必修樣相)이다. 과학(科學)의 유통구조(流通構造)가 전파자(傳播者)(이용자(利用者)로서의 역할(役割)보다는)의 필요성(必要性)에서 볼 때 복잡(複雜)하고 다이나믹한 시스팀으로 구성(構成)된다는 사실(事實)은 과학(科學)의 발전과정(發展過程)에서 필연적(必然的)으로 나타난다. 이와 같은 사실(事實)은 과학정보(科學情報)의 전파요원(傳播要員)이 국가적 회합(國家的 會合)에서 자기연구(自己硏究)에 대한 정보(情報)의 전파기회(傳播機會)를 거절(拒絶)하고 따라서 전파정보(電波情報)를 판단(判斷)하고 선별(選別)하는 것을 감소(減少)시키며 결과적(結果的)으로 잡지(雜誌)나 단행본(單行本)에서 비평(批評)을 하고 추고(推敲)하는 것이 배제(排除)될 때는 유형적(有形的) 과학(科學)은 급속(急速)히 비과학성(非科學性)을 띠게 된다는 것을 Lysenko의 생애(生涯)에 대한 Medvedev의 기술중(記述中)[7]에 지적(指摘)한 것과 관계(關係)되고 있다.

  • PDF

Artificial Intelligence In Wheelchair: From Technology for Autonomy to Technology for Interdependence and Care (휠체어 탄 인공지능: 자율적 기술에서 상호의존과 돌봄의 기술로)

  • HA, Dae-Cheong
    • Journal of Science and Technology Studies
    • /
    • v.19 no.2
    • /
    • pp.169-206
    • /
    • 2019
  • This article seeks to explore new relationships and ethics of human and technology by analyzing a cultural imaginary produced by artificial intelligence. Drawing on theoretical reflections of the Feminist Scientific and Technological Studies which understand science and technology as the matter of care(Puig de la Bellacas, 2011), this paper focuses on the fact that artificial intelligence and robots materialize cultural imaginary such as autonomy. This autonomy, defined as the capacity to adapt to a new environment through self-learning, is accepted as a way to conceptualize an authentic human or an ideal subject. However, this article argues that artificial intelligence is mediated by and dependent on invisible human labor and complex material devices, suggesting that such autonomy is close to fiction. The recent growth of the so-called 'assistant technology' shows that it is differentially visualizing the care work of both machines and humans. Technology and its cultural imaginary hide the care work of human workers and actively visualize the one of the machine. And they make autonomy and agency ideal humanness, leaving disabled bodies and dependency as unworthy. Artificial intelligence and its cultural imaginary negate the value of disabled bodies while idealizing abled-bodies, and result in eliminating the real relationship between man and technology as mutually dependent beings. In conclusion, the author argues that the technology we need is not the one to exclude the non-typical bodies and care work of others, but the one to include them as they are. This technology responsibly empathizes marginalized beings and encourages solidarity between fragile beings. Inspired by an art performance of artist Sue Austin, the author finally comes up with and suggests 'artificial intelligence in wheelchair' as an alternative figuration for the currently dominant 'autonomous artificial intelligence'.

Analyzing Self-Introduction Letter of Freshmen at Korea National College of Agricultural and Fisheries by Using Semantic Network Analysis : Based on TF-IDF Analysis (언어네트워크분석을 활용한 한국농수산대학 신입생 자기소개서 분석 - TF-IDF 분석을 기초로 -)

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Kim, S.H.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research
    • /
    • v.23 no.1
    • /
    • pp.89-104
    • /
    • 2021
  • Based on the TF-IDF weighted value that evaluates the importance of words that play a key role, the semantic network analysis(SNA) was conducted on the self-introduction letter of freshman at Korea National College of Agriculture and Fisheries(KNCAF) in 2020. The top three words calculated by TF-IDF weights were agriculture, mathematics, study (Q. 1), clubs, plants, friends (Q. 2), friends, clubs, opinions, (Q. 3), mushrooms, insects, and fathers (Q. 4). In the relationship between words, the words with high betweenness centrality are reason, high school, attending (Q. 1), garbage, high school, school (Q. 2), importance, misunderstanding, completion (Q.3), processing, feed, and farmhouse (Q. 4). The words with high degree centrality are high school, inquiry, grades (Q. 1), garbage, cleanup, class time (Q. 2), opinion, meetings, volunteer activities (Q.3), processing, space, and practice (Q. 4). The combination of words with high frequency of simultaneous appearances, that is, high correlation, appeared as 'certification - acquisition', 'problem - solution', 'science - life', and 'misunderstanding - concession'. In cluster analysis, the number of clusters obtained by the height of cluster dendrogram was 2(Q.1), 4(Q.2, 4) and 5(Q. 3). At this time, the cohesion in Cluster was high and the heterogeneity between Clusters was clearly shown.

Structural and functional characteristics of rock-boring clam Barnea manilensis (암석을 천공하는 돌맛조개(Barnea manilensis)의 구조 및 기능)

  • Ji Yeong Kim;Yun Jeon Ahn;Tae Jin Kim;Seung Min Won;Seung Won Lee;Jongwon Song;Jeongeun Bak
    • Korean Journal of Environmental Biology
    • /
    • v.40 no.4
    • /
    • pp.413-422
    • /
    • 2022
  • Barnea manilensis is a bivalve which bores soft rocks, such as, limestone or mudstone in the low intertidal zone. They make burrows which have narrow entrances and wide interiors and live in these burrows for a lifetime. In this study, the morphology and the microstructure of the valve of rock-boring clam B. manilensis were observed using a stereoscopic microscope and FE-SEM, respectively. The chemical composition of specific part of the valve was assessed by energy dispersive X-ray spectroscopy (EDS) analysis. 3D modeling and structural dynamic analysis were used to simulate the boring behavior of B. manilensis. Microscopy results showed that the valve was asymmetric with plow-like spikes which were located on the anterior surface of the valve and were distributed in a specific direction. The anterior parts of the valve were thicker than the posterior parts. EDS results indicated that the valve mainly consisted of calcium carbonate, while metal elements, such as, Al, Si, Mn, Fe, and Mg were detected on the outer surface of the anterior spikes. It was assumed that the metal elements increased the strength of the valve, thus helping the B. manilensis to bore sediment. The simulation showed that spikes located on the anterior part of the valve received a load at all angles. It was suggested that the anterior part of the shell received the load while drilling rocks. The boring mechanism using the amorphous valve of B. manilensis is expected to be used as basic data to devise an efficient drilling mechanism.

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People easily believe that news and stock index are closely related. They think that securing news before anyone else can help them forecast the stock prices and enjoy great profit, or perhaps capture the investment opportunity. However, it is no easy feat to determine to what extent the two are related, come up with the investment decision based on news, or find out such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract the information that can assist the investment decisions. The reality however is that the world is inundated with a massive wave of news in real time. And news is not patterned text. This study suggests the stock-index invest model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes the news and creates investment information. To verify the validity of the model, the relationship between the result of news opinion mining and stock-index was empirically analyzed by using statistics. Steps in the mining that converts news into information for investment decision making, are as follows. First, it is indexing information of news after getting a supply of news from news provider that collects news on real-time basis. Not only contents of news but also various information such as media, time, and news type and so on are collected and classified, and then are reworked as variable from which investment decision making can be inferred. Next step is to derive word that can judge polarity by separating text of news contents into morpheme, and to tag positive/negative polarity of each word by comparing this with sentimental dictionary. Third, positive/negative polarity of news is judged by using indexed classification information and scoring rule, and then final investment decision making information is derived according to daily scoring criteria. For this study, KOSPI index and its fluctuation range has been collected for 63 days that stock market was open during 3 months from July 2011 to September in Korea Exchange, and news data was collected by parsing 766 articles of economic news media M company on web page among article carried on stock information>news>main news of portal site Naver.com. In change of the price index of stocks during 3 months, it rose on 33 days and fell on 30 days, and news contents included 197 news articles before opening of stock market, 385 news articles during the session, 184 news articles after closing of market. Results of mining of collected news contents and of comparison with stock price showed that positive/negative opinion of news contents had significant relation with stock price, and change of the price index of stocks could be better explained in case of applying news opinion by deriving in positive/negative ratio instead of judging between simplified positive and negative opinion. And in order to check whether news had an effect on fluctuation of stock price, or at least went ahead of fluctuation of stock price, in the results that change of stock price was compared only with news happening before opening of stock market, it was verified to be statistically significant as well. In addition, because news contained various type and information such as social, economic, and overseas news, and corporate earnings, the present condition of type of industry, market outlook, the present condition of market and so on, it was expected that influence on stock market or significance of the relation would be different according to the type of news, and therefore each type of news was compared with fluctuation of stock price, and the results showed that market condition, outlook, and overseas news was the most useful to explain fluctuation of news. On the contrary, news about individual company was not statistically significant, but opinion mining value showed tendency opposite to stock price, and the reason can be thought to be the appearance of promotional and planned news for preventing stock price from falling. Finally, multiple regression analysis and logistic regression analysis was carried out in order to derive function of investment decision making on the basis of relation between positive/negative opinion of news and stock price, and the results showed that regression equation using variable of market conditions, outlook, and overseas news before opening of stock market was statistically significant, and classification accuracy of logistic regression accuracy results was shown to be 70.0% in rise of stock price, 78.8% in fall of stock price, and 74.6% on average. This study first analyzed relation between news and stock price through analyzing and quantifying sensitivity of atypical news contents by using opinion mining among big data analysis techniques, and furthermore, proposed and verified smart investment decision making model that could systematically carry out opinion mining and derive and support investment information. This shows that news can be used as variable to predict the price index of stocks for investment, and it is expected the model can be used as real investment support system if it is implemented as system and verified in the future.