• Title/Summary/Keyword: Text Index

Association Modeling on Keyword and Abstract Data in Korean Port Research

  • Yoon, Hee-Young;Kwak, Il-Youp
    • Journal of Korea Trade / v.24 no.5 / pp.71-86 / 2020
  • Purpose - This study investigates research trends by searching for English keywords and abstracts in 1,511 Korean journal articles in the Korea Citation Index from the 2002-2019 period using the term "Port." The study aims to lay the foundation for a more balanced development of port research. Design/methodology - Using abstract and keyword data, we perform frequency analysis and word embedding (Word2vec). A t-SNE plot shows the main keywords extracted using the TextRank algorithm. To analyze which words were used in what context in our two nine-year subperiods (2002-2010 and 2010-2019), we use Scattertext and scaled F-scores. Findings - First, during the 18-year study period, port research has developed through the convergence of diverse academic fields, covering 102 subject areas and 219 journals. Second, our frequency analysis of 4,431 keywords in 1,511 papers shows that the words "Port" (60 times), "Port Competitiveness" (33 times), and "Port Authority" (29 times), among others, are attractive to most researchers. Third, a word embedding analysis identifies the words highly correlated with the top eight keywords and visually shows four different subject clusters in a t-SNE plot. Fourth, we use Scattertext to compare words used in the two research sub-periods. Originality/value - This study is the first to apply abstract and keyword analysis and various text mining techniques to Korean journal articles in port research and thus has important implications. Further in-depth studies should collect a greater variety of textual data and analyze and compare port studies from different countries.

Development of a Korean Text Recognition System (한글 문서 인식 시스템 개발 연구)

  • 고견;이일병
    • Korean Journal of Cognitive Science / v.1 no.1 / pp.77-102 / 1989
  • This paper reports on the development of a recognition system for Korean characters, numbers, and punctuation marks that applies a syntactic approach after extracting each character or punctuation mark from a page of text. First, using the projection profile method (Masuda et al. 1985, Pavlidis 1981), we segment a page into column-major or row-major regions and then extract lines of characters from them. Considering the height, width, and connectivity of each character block, we then extract syllables from the extracted lines. Following the research of Lee and others, we distinguish syllables into six types of formal pattern (남궁재찬 1982, 이주근 et al. 1981) and punctuation marks and numbers into two kinds of formal pattern, and we discriminate the surface structure of the extracted syllables. Using the Index-Removal algorithm, we subdivide them into 44 kinds of basic Korean subpatterns and special characters (numbers, punctuation marks) and recognize them by the syntactic method (이주근 et al. 1981).
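
The projection profile step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a page already binarized into rows of 0 (background) and 1 (ink); the function names `row_profile`, `segment_runs`, `segment_lines`, and `segment_columns` are hypothetical and are not taken from the paper, whose actual procedure also uses block height, width, and connectivity.

```python
def row_profile(image):
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in image]


def segment_runs(profile, threshold=0):
    """Return (start, end) index pairs for runs where the profile exceeds threshold.

    The end index is exclusive, so image[start:end] is one segment.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended at a blank row/column
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the edge of the image
    return runs


def segment_lines(image):
    """Split a page into text lines using the row projection profile."""
    return segment_runs(row_profile(image))


def segment_columns(image, top, bottom):
    """Split one text line (rows top..bottom) into blocks using column sums."""
    band = image[top:bottom]
    column_profile = [sum(col) for col in zip(*band)]
    return segment_runs(column_profile)


# Toy 6x6 "page": two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)                 # row ranges of the text lines
first_line_blocks = segment_columns(page, *lines[0])  # blocks in the first line
```

Blank rows separate lines and blank columns separate character blocks; a real scan would need a noise-tolerant threshold rather than the zero used here.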

What Topics Have Been Studied in Korean Mathematics Education for 15 Years: Latent Topic Modeling Analysis

  • Hwang, Jihyun
    • Research in Mathematical Education / v.24 no.4 / pp.313-335 / 2021
  • The purpose of this research is to identify the topics discussed in Korean mathematics education studies and to examine research trends over 15 years. I applied latent Dirichlet allocation (LDA) to the original text datasets, including the English abstracts of 3,157 articles published in eight journals indexed by the Korean Citation Index (KCI) from 1997 to 2019. I identified an LDA model with 60 topics; the research trends in 2,884 articles published between 2002 and 2018 were as follows: mathematics educators paid the most attention to teacher education from 2010 to 2015 and to curriculum analysis after 2016. The findings of this research can contribute to understanding what has been discussed in the Korean mathematics education community, as well as what will and should be emphasized more in the future compared with global research trends. In addition, LDA has the potential to identify the topics and keywords of newly written manuscripts submitted to any journal, supplementing the information provided by authors.

A Study on Informetric Analysis for Measuring the Qualitative Research Performance (연구성과의 질적 평가를 위한 계량정보학적 분석에 관한 연구)

  • Kang, Dae-Shin;Moon, Sung-Been
    • Journal of the Korean Society for information Management / v.26 no.3 / pp.377-394 / 2009
  • Existing bibliometric methods have limitations in satisfying the various requests of interested parties, including researchers, managers, and policy makers, who want to identify 1) which research group or researcher is the key player, and the overall trends, in particular technological sub-fields, 2) which research groups, institutions, or countries mainly use their research outputs, 3) what spin-offs research outputs produce in scientific and technological fields, and 4) where they stand when their quantitative and qualitative research outputs are compared with those of competing institutions. It is essential to develop new informetric indicators and methodologies in order to satisfy stakeholders' various demands and to strengthen qualitative analysis in measuring research performance. This study suggests informetric indicators, such as an article quality index, a citation impact index, an international cooperation index, and an excellent article production index, and methodologies including citation analysis and text mining.

Internet Information Orientation: The Link to National Competitiveness on Internet

  • Song, In Kuk;Kang, Mingoo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.8 / pp.3028-3039 / 2015
  • Recently, Korea's web index ranked in the top 10 among eighty-six countries, making Korea the only Asian country ranked at the top level. Korea has also led in Internet penetration rate, for both high-speed broadband and wireless Internet. However, such achievements have not guaranteed a high national level of effective Internet-based information use. According to the OECD, Korea's national informatization index has remained in the middle of the OECD countries. Despite the heightened pressure to enhance the effective use of information over the Internet in practice, previous research has rarely attempted to develop an Internet-related framework or to identify Internet capabilities. This study proposes a framework, named "Internet Information Orientation," that illustrates the relationship between Internet capabilities and national competitiveness on the Internet. The research identified specific Internet capabilities, reclassified them based on the research issues presented at the 6th international conference on Internet held in December 2014, and finally described the rigorous research endeavors on these issues. As a result, the 16 papers presented and selected as outstanding papers at the conference address issues that are brought together here, including: Wireless Network, Internet of Things, Green Computing, Multimedia Processing, Big Data and Text Mining, Database in Cloud Environment, Business Intelligence, Software Engineering, IT Strategy & Policy, and Social Network Services.

A Prediction of Stock Price Through the Big-data Analysis (인터넷 뉴스 빅데이터를 활용한 기업 주가지수 예측)

  • Yu, Ji Don;Lee, Ik Sun
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.3 / pp.154-161 / 2018
  • This study was conducted to predict stock market prices, based on the assumption that internet news articles might affect the rise and fall of stock market prices. Accuracy was evaluated by comparing the values predicted by the companies' forecasting models with the actual stock index. This paper collected stock news from the internet and analyzed its relationship with the stock price index. Since internet news content consists mainly of unstructured text, this study used text mining and multiple regression analysis to analyze the news articles. Company H was selected as a representative automobile manufacturer, and prediction models for the stock price index of company H were developed. Two prediction models for forecasting the upturn and decline of the H stock index are derived and presented. Of the two, prediction model (1) has the lower error value, so its prediction performance is relatively better than that of prediction model (2). As further research, if this study is extended with a real artificial-intelligence investment decision system and applied to real investment, more practical research results could be obtained.

A Study on the 'Tangaek-Unhoei(湯液韻彙)' Index of Herbal Medicine in the Inje-Ji(仁濟志) of the Imwon-Gyeongje-Ji(林園經濟志), by Seo-Yugu(徐有榘) Focusing on 'Fang(方)' (풍석(楓石) 서유구(徐有榘)의 『임원경제지(林園經濟志)』 「인제지(仁濟志)」 '탕액운휘(湯液韻彙)'와 처방 제형에 대한 연구 - '방(方)'을 중심으로 -)

  • JEON, Jongwook
    • Journal of Korean Medical classics / v.36 no.4 / pp.25-40 / 2023
  • Objectives: This paper studies the Tangaek-Unhoei(湯液韻彙) index of herbal medicine in the Inje-Ji(仁濟志) of the Imwon-Gyeongje-Ji(林園經濟志), which contains about 4,800 formulas. Created by the 19th-century Joseon scholar Seo Yugu, it not only lists the formulas by name but also provides an index by topic, which enabled the collection and effective application of massive medical information. Methods: We quantitatively examined the nearly 4,800 herbal medicines in the Tangaek-Unhoei and their categorization. Any uncommon or particular categorization was examined further by analyzing the original text. Results & Conclusions: The prescriptions contained in the Inje-Ji are categorized under 26 headings. They are listed according to the 106 units of the Chinese character dictionary and organized by double headings. This unique index makes it easy to browse the contents of such a vast book containing massive medicinal knowledge. In addition, the fifty or so remedies called 'Fang(方)' exemplify the author's attitude toward medicinal knowledge, which is both rational and inclusive. This is an attitude that should be recognized beyond tradition.

Research Trends in Record Management Using Unstructured Text Data Analysis (비정형 텍스트 데이터 분석을 활용한 기록관리 분야 연구동향)

  • Deokyong Hong;Junseok Heo
    • Journal of Korean Society of Archives and Records Management / v.23 no.4 / pp.73-89 / 2023
  • This study aims to analyze the frequency of keywords used in Korean abstracts, which are unstructured text data in the domestic record management research field, using text mining techniques to identify domestic record management research trends through distance analysis between keywords. To this end, 1,157 keywords of 77,578 journals were visualized by extracting 1,157 articles from 7 journal types (28 types) searched by major category (complex study) and middle category (literature informatics) from the institutional statistics (registered site, candidate site) of the Korean Citation Index (KCI). Analysis of t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext using Word2vec was performed. As a result of the analysis, first, it was confirmed that keywords such as "record management" (889 times), "analysis" (888 times), "archive" (742 times), "record" (562 times), and "utilization" (449 times) were treated as significant topics by researchers. Second, Word2vec analysis generated vector representations between keywords, and similarity distances were investigated and visualized using t-SNE and Scattertext. In the visualization results, the research area for record management was divided into two groups, with keywords such as "archiving," "national record management," "standardization," "official documents," and "record management systems" occurring frequently in the first group (past). On the other hand, keywords such as "community," "data," "record information service," "online," and "digital archives" in the second group (current) were garnering substantial focus.

A Text Processing Method for Devanagari Scripts in Android (안드로이드에서 힌디어 텍스트 처리 방법)

  • Kim, Jae-Hyeok;Maeng, Seung-Ryol
    • The Journal of the Korea Contents Association / v.11 no.12 / pp.560-569 / 2011
  • In this paper, we propose a text processing method for Hindi characters, written in Devanagari script, on Android. The key points of the text processing are to devise automata, which define the rules for combining letters into syllables, and to implement a font rendering engine, which retrieves and displays the glyph images corresponding to specific characters. In general, an automaton depends on the type and number of characters. For the soft keyboard, we designed the automata with 14 consonants and 34 vowels based on Unicode. Finally, a combined syllable is converted into a glyph index using a mapping table, and the index is used as a handle to load its glyph image. Following the multilingual framework of the FreeType font engine, Devanagari script can be supported at the system level by adding the implementation of our method to the font engine as the Hindi module. The proposed method is verified through a simple message system.
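
The two ideas in the abstract above, an automaton that combines key presses into syllables and a mapping table from syllables to glyph indices, can be sketched as follows. This is a deliberately tiny illustration under stated assumptions, not the paper's automata: it handles only two consonants and two dependent vowel signs (real Devanagari also needs conjuncts, independent vowels, and more), and the names `compose_syllables` and `build_glyph_table`, as well as the glyph index values, are hypothetical.

```python
# Assumed toy alphabet, given as Unicode code points.
CONSONANTS = {"\u0915", "\u0916"}    # DEVANAGARI LETTER KA, KHA
VOWEL_SIGNS = {"\u093E", "\u093F"}   # DEVANAGARI VOWEL SIGN AA, I


def compose_syllables(keys):
    """Two-state automaton: START -> CONSONANT -> (optional vowel sign) -> START."""
    syllables = []
    current = ""
    state = "START"
    for ch in keys:
        if ch in CONSONANTS:
            if current:
                syllables.append(current)   # previous bare consonant is complete
            current, state = ch, "CONSONANT"
        elif ch in VOWEL_SIGNS and state == "CONSONANT":
            syllables.append(current + ch)  # consonant + vowel sign closes a syllable
            current, state = "", "START"
        else:
            raise ValueError(f"unexpected input {ch!r} in state {state}")
    if current:
        syllables.append(current)
    return syllables


def build_glyph_table():
    """Hypothetical mapping table: every syllable the automaton can emit -> glyph index."""
    table = {}
    for consonant in sorted(CONSONANTS):
        for sign in [""] + sorted(VOWEL_SIGNS):
            table[consonant + sign] = len(table)
    return table


GLYPH_TABLE = build_glyph_table()

typed = "\u0915\u0915\u093F"               # key presses: KA, KA, vowel sign I
syllables = compose_syllables(typed)
glyph_indices = [GLYPH_TABLE[s] for s in syllables]
```

In a rendering engine, each glyph index would serve as the handle for loading the corresponding glyph image; here the indices are simply positions in the toy table.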

An Empirical Study on Improving the Performance of Text Categorization Considering the Relationships between Feature Selection Criteria and Weighting Methods (자질 선정 기준과 가중치 할당 방식간의 관계를 고려한 문서 자동분류의 개선에 대한 연구)

  • Lee Jae-Yun
    • Journal of the Korean Society for Library and Information Science / v.39 no.2 / pp.123-146 / 2005
  • This study aims to find consistent strategies for feature selection and feature weighting that can improve the effectiveness and efficiency of a kNN text classifier. Feature selection criteria and feature weighting methods are as important as classification algorithms for achieving good performance in text categorization systems. Most previous studies chose conflicting strategies for feature selection criteria and weighting methods. In this study, the performance of several feature selection criteria is measured, taking into account the storage space for inverted index records and the classification time. The classification experiments examine the performance of IDF as a feature selection criterion and the performance of conventional feature selection criteria, e.g. mutual information, as feature weighting methods. The results of these experiments suggest that by using measures that prefer low-frequency features both as the feature selection criterion and as the feature weighting method, we can increase classification speed by a factor of three to five without losing classification accuracy.
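
The strategy described in the abstract above, using one low-frequency-preferring measure for both feature selection and feature weighting, can be sketched with IDF and a small kNN classifier. This is a toy illustration on invented documents, not the paper's experimental setup: the IDF form log(N/df), the cosine similarity, the majority vote, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def idf_scores(docs):
    """IDF per term, assuming the form log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def select_features(idf, k):
    """Keep the k terms with the highest IDF (ties broken alphabetically)."""
    return set(sorted(idf, key=lambda t: (-idf[t], t))[:k])


def vectorize(doc, idf, features):
    """TF-IDF vector restricted to the selected features (same measure as selection)."""
    tf = Counter(t for t in doc if t in features)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query_vec, train_vecs, labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)[:k]
    return Counter(labels[i] for i in ranked).most_common(1)[0][0]


# Invented toy corpus; "the" occurs in every document, so its IDF is 0.
train_docs = [
    ["port", "cargo", "ship", "the"],
    ["port", "container", "ship", "the"],
    ["stock", "price", "news", "the"],
    ["stock", "market", "price", "the"],
]
labels = ["port", "port", "finance", "finance"]

idf = idf_scores(train_docs)
features = select_features(idf, k=8)          # drops the lowest-IDF term
train_vecs = [vectorize(d, idf, features) for d in train_docs]
predicted = knn_classify(vectorize(["ship", "cargo", "the"], idf, features),
                         train_vecs, labels)
```

Dropping low-IDF terms shrinks the vectors that have to be compared, which is one plausible source of the speed-up the abstract reports; the sketch does not attempt to reproduce that measurement.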