• Title/Summary/Keyword: Text data

A Study on Open API of Securities and Investment Companies in Korea for Activating Big Data

  • Ryu, Gui Yeol
    • International journal of advanced smart convergence / v.8 no.2 / pp.102-108 / 2019
  • Big data is associated with three key concepts: volume, variety, and velocity. Securities and investment services produce and store large amounts of text and numeric data, and in the US they hold the most data per company on average. Gartner found that demand for big data was highest in finance, at 25%. Securities and investment companies therefore both produce the most data and have the highest demand for it. In Korea, insurance and credit card companies use big data more actively than banks, while research on the use of big data in securities and investment companies has been scarce. We surveyed 22 major securities and investment companies in Korea with a view to activating big data. We found that they actively use AI for investment recommendations. As for the big data of these companies, we studied their open APIs. Of the 22 major companies, only six offer open APIs. The user OS is 100% Windows, and the languages used are mainly VB, C#, MFC, and Excel, all available on Windows. Real-time analysis and decision making are difficult because developers cannot receive data directly using Hadoop, the big data platform. Development manuals are mainly provided on the Web, and only three companies provide them as files. Development documentation in file format is more convenient than the web type. To activate big data in the securities and investment fields, we found that companies should support Linux, Java, and Python, and provide easy-to-view development manuals and videos, for example on YouTube.

Suggestions on how to convert official documents to Machine Readable (공문서의 기계가독형(Machine Readable) 전환 방법 제언)

  • Yim, Jin Hee
    • The Korean Journal of Archival Studies / no.67 / pp.99-138 / 2021
  • In the era of big data, analyzing unstructured data as well as structured data is emerging as an important task. Official documents produced by government agencies are also subject to big data analysis as large, text-based unstructured data. From the perspective of internal work efficiency, knowledge management, records management, etc., public documents need to be analyzed as big data to derive useful implications. However, since many of the public documents currently held by public institutions are not in an open format, big data analysis requires a pre-processing step that extracts text from a bitstream. In addition, since contextual metadata is not sufficiently stored in the document file, separate efforts to secure metadata are required for high-quality analysis. In conclusion, current official documents have a low level of machine readability, which makes big data analysis expensive.

Feature Analyze and Research of National Convergence R&D: With Focus on the Text Mining (국가 융합 R&D 특성 분석에 관한 연구: 텍스트분석을 중심으로)

  • Yoo, KiCheol;Lee, TaeHee;Choi, SangHyun;Lee, JungHwan
    • Journal of Information Technology Applications and Management / v.27 no.1 / pp.59-73 / 2020
  • There is growing interest in convergence. National R&D programs also provide various policies and institutional support to promote convergence research. However, the characteristics of convergence research are not clearly specified at either the academic or the government level. This research collects, refines, analyzes, models, verifies, and visualizes national R&D data obtained through the National Science and Technology Information Service (NTIS). The method derives the characteristics of convergence research through text mining, focusing on the unstructured information in national R&D project data. The study confirmed a gap in perception between the definition of convergence research and practice at research sites. To address this, the study suggests that convergence among research subjects, collaboration among research topics reflecting the varied backgrounds and characteristics of researchers, and information-based analysis of the characteristics of convergence research be considered when establishing convergence policy.

Systemic Analysis of Research Activities and Trends Related to Artificial Intelligence(A.I.) Technology Based on Latent Dirichlet Allocation (LDA) Model (Latent Dirichlet Allocation (LDA) 모델 기반의 인공지능(A.I.) 기술 관련 연구 활동 및 동향 분석)

  • Chung, Myoung Sug;Lee, Joo Yeoun
    • Journal of Korea Society of Industrial Information Systems / v.23 no.3 / pp.87-95 / 2018
  • Recently, with the technological development of artificial intelligence, the related market has been expanding rapidly. In the artificial intelligence technology field, which is still in its early stage but expanding, it is important to reduce uncertainty about research directions and investment fields. Therefore, this study examined technology trends using text mining and topic modeling, two big data analysis methods, and identified trends in core technologies and their future growth potential. We hope that the results of this study will provide researchers with an understanding of artificial intelligence technology trends and new implications for future research directions.

Comparison and Analysis of Domestic and Foreign Sports Brands Using Text Mining and Opinion Mining Analysis (텍스트 마이닝과 오피니언 마이닝 분석을 활용한 국내외 스포츠용품 브랜드 비교·분석 연구)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • The Journal of the Korea Contents Association / v.18 no.6 / pp.217-234 / 2018
  • In this study, big data analysis was conducted for domestic and international sports goods brands. Text mining, TF-IDF, opinion mining, and interestity graph analyses were conducted using the social matrix program Textom and the fashion data analysis platform MISP. To examine recent recognition of sports brands, the study period was limited to one year, from January 1, 2017 to December 31, 2017. The analysis yielded four results. First, we could confirm the products representing each brand. Second, we could confirm the marketing that represents each brand. Third, the common words extracted from each brand were identified. Fourth, the positive and negative sentiment toward each brand was confirmed.
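
The abstract above names TF-IDF as one of its analysis steps but does not say how it was computed. As a rough illustration only, the following sketch shows one common TF-IDF variant (term frequency normalized by document length, inverse document frequency as log(N / df)). The function name `tf_idf` and the toy review strings are assumptions made for this example; they are not taken from the study, Textom, or MISP.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed variant: tf = term count / document length,
    idf = log(number of documents / documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy reviews, not data from the study.
reviews = [
    "light running shoes",
    "running jacket warm",
    "light jacket",
]
scores = tf_idf(reviews)
```

In this variant a term that occurs in every document scores zero, so words shared by all brands drop out and brand-specific words rise to the top.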

Time Series Analysis of Patent Keywords for Forecasting Emerging Technology (특허 키워드 시계열 분석을 통한 부상 기술 예측)

  • Kim, Jong-Chan;Lee, Joon-Hyuck;Kim, Gab-Jo;Park, Sang-Sung;Jang, Dong-Sick
    • KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.355-360 / 2014
  • Forecasting emerging technology plays an important role in business strategy and R&D investment. There are various approaches to technology forecasting, including patent analysis. Qualitative methods based on experts' evaluations and opinions have mainly been used for technology forecasting with patents. However, qualitative methods do not ensure the objectivity of the results and require high cost and a long time. To make up for these weaknesses, patent data can be analyzed quantitatively and statistically using text mining techniques. In this paper, we suggest a new method of technology forecasting using text mining and ARIMA analysis.

PreBAC: a novel Access Control scheme based Proxy Re-Encryption for cloud computing

  • Su, Mang;Wang, Liangchen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.5 / pp.2754-2767 / 2019
  • Cloud computing is widely used for spreading and processing information, and it gives users an easy and quick way to access data and retrieve services. Generally, to prevent information leakage, data in the cloud is transferred in encrypted form. As one of the traditional security technologies, access control is an important part of cloud security. However, current access control schemes are not suitable for the cloud, so it is vital to design an access control scheme that takes complex factors into account to satisfy the various requirements of cipher text protection. We present a novel access control scheme for cipher text based on proxy re-encryption (PRE) technology (PreBAC). It is suitable for protecting data confidentiality and information privacy. First, we give the motivations and related work, and then specify the system model for our scheme. Second, the algorithms are given and the security of our scheme is proved. Finally, comparisons with other schemes are made to show the advantages of PreBAC.

Topic Modeling of Suicide Papers using Text Mining (텍스트마이닝을 활용한 자살 관련 논문 토픽 모델링)

  • Cho, Kyoung Won;Kim, Ha-young;Kim, Mi-ri;Woo, Young Woon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.275-277 / 2019
  • The purpose of this study is to classify the topics of the suicide-related papers published so far and to identify the proportions of the main topics and the trends of those topics over the past 20 years. For this purpose, a text mining technique used in big data analysis was applied to the database of the Korea Citation Index (KCI), where information sharing about papers is most active. This study, which grasps how suicide-related research has changed over time, will serve as basic data for establishing a strategy to adapt the academic direction of suicide research in the future.

Association Modeling on Keyword and Abstract Data in Korean Port Research

  • Yoon, Hee-Young;Kwak, Il-Youp
    • Journal of Korea Trade / v.24 no.5 / pp.71-86 / 2020
  • Purpose - This study investigates research trends by searching for English keywords and abstracts in 1,511 Korean journal articles in the Korea Citation Index from the 2002-2019 period using the term "Port." The study aims to lay the foundation for a more balanced development of port research. Design/methodology - Using abstract and keyword data, we perform frequency analysis and word embedding (Word2vec). A t-SNE plot shows the main keywords extracted using the TextRank algorithm. To analyze which words were used in what context in our two nine-year subperiods (2002-2010 and 2010-2019), we use Scattertext and scaled F-scores. Findings - First, during the 18-year study period, port research has developed through the convergence of diverse academic fields, covering 102 subject areas and 219 journals. Second, our frequency analysis of 4,431 keywords in 1,511 papers shows that the words "Port" (60 times), "Port Competitiveness" (33 times), and "Port Authority" (29 times), among others, are attractive to most researchers. Third, a word embedding analysis identifies the words highly correlated with the top eight keywords and visually shows four different subject clusters in a t-SNE plot. Fourth, we use Scattertext to compare words used in the two research sub-periods. Originality/value - This study is the first to apply abstract and keyword analysis and various text mining techniques to Korean journal articles in port research and thus has important implications. Further in-depth studies should collect a greater variety of textual data and analyze and compare port studies from different countries.
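
The keyword frequency analysis and sub-period comparison described in this abstract can be pictured with a small counting sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function `keyword_counts_by_period`, the `split_year` parameter, and the three sample records are all hypothetical, and the sketch does not attempt the Word2vec, TextRank, or Scattertext steps.

```python
from collections import Counter


def keyword_counts_by_period(records, split_year):
    """Count keywords separately for papers up to and after split_year.

    records: iterable of (publication_year, list_of_keywords) pairs.
    Keywords are lowercased so "Port" and "port" are counted together.
    """
    early, late = Counter(), Counter()
    for year, keywords in records:
        target = early if year <= split_year else late
        target.update(keyword.lower() for keyword in keywords)
    return early, late


# Hypothetical sample records, not data from the study.
records = [
    (2005, ["Port", "Port Competitiveness"]),
    (2009, ["Port", "Port Authority"]),
    (2015, ["Port", "Logistics"]),
]
early, late = keyword_counts_by_period(records, split_year=2010)
```

Calling `early.most_common(5)` and `late.most_common(5)` then gives the top keywords of each sub-period for a side-by-side comparison.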

Analysis of VR Game Trends using Text Mining and Word Cloud -Focusing on STEAM review data- (텍스트마이닝과 워드 클라우드를 활용한 VR 게임 트렌드 분석 -스팀(steam) 리뷰 데이터를 중심으로-)

  • Na, Ji Young
    • Journal of Korea Game Society / v.22 no.1 / pp.87-98 / 2022
  • With the development of technology related to the fourth industrial revolution and increased demand for non-face-to-face services, VR games are attracting attention. This study collected VR game review data from the online game platform STEAM and analyzed chronological trends using text mining and word cloud analysis. According to the results, experience and perceived cost were the major trends from 2016 to 2017, increased demand for FPS and rhythm games from 2018 to 2019, and story and immersion from 2020 to 2021. The study aims to help expand the base of VR games by identifying the keywords that VR users take an interest in during each period.