Search | Korea Science

Media-based Analysis of Gasoline Inventory with Korean Text Summarization (한국어 문서 요약 기법을 활용한 휘발유 재고량에 대한 미디어 분석)

Sungyeon Yoon;Minseo Park
- The Journal of the Convergence on Culture Technology / v.9 no.5 / pp.509-515 / 2023
Despite the continued development of alternative energy sources, fuel consumption is increasing. In particular, the price of gasoline fluctuates greatly with international oil prices, and gas stations adjust their gasoline inventory in response to these price fluctuations. In this study, news datasets are used to analyze gasoline consumption patterns through fluctuations in gasoline inventory. First, news datasets are collected by web crawling. Second, the news datasets are summarized using KoBART, a model that summarizes Korean text. Finally, the summaries are preprocessed and the factors behind the fluctuations are derived with an N-gram language model and TF-IDF. This approach makes it possible to analyze and predict gasoline consumption patterns.
https://doi.org/10.17703/JCCT.2023.9.5.509
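
The last step of the pipeline in this abstract (scoring n-grams with TF-IDF) can be illustrated with a minimal sketch. The crawling and KoBART summarization steps are not shown. The example summaries are invented English stand-ins, and the weighting used here (relative term frequency times log(N / df)) is one common TF-IDF variant, assumed for illustration rather than taken from the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=2):
    """Score every n-gram of every document with TF-IDF.

    TF is the relative frequency of the n-gram inside the document and
    IDF is log(N / df). This is an assumed variant, not the paper's.
    """
    grams_per_doc = [Counter(ngrams(doc.split(), n)) for doc in docs]
    df = Counter()
    for grams in grams_per_doc:
        df.update(grams.keys())  # each document counts once per n-gram
    scores = []
    for grams in grams_per_doc:
        total = sum(grams.values())
        scores.append({
            g: (c / total) * math.log(len(docs) / df[g])
            for g, c in grams.items()
        })
    return scores

# Hypothetical, already-summarized news snippets (not data from the paper).
summaries = [
    "gasoline inventory falls as oil prices rise",
    "gasoline inventory rises as demand slows",
    "oil prices rise on supply cuts",
]
scores = tfidf(summaries, n=2)
```

In this toy corpus a bigram that occurs in only one summary, such as "inventory falls", scores higher than one shared by several summaries, such as "gasoline inventory", which is the property that lets TF-IDF surface document-specific fluctuation factors.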

Optimizing Innovative Tools for Dissemination of Information in Nigerian Academic Libraries During Post-COVID Era

Halimah Odunayo AMUDA;Ayotola Olubunmi ONANUGA
- International Journal of Knowledge Content Development & Technology / v.14 no.1 / pp.19-31 / 2024
To support the mission of the institutions to which they are attached, academic libraries provide both manual and digital services, but the COVID-19 pandemic, which spanned March to September 2020, changed this scenario. In Nigeria, about 249,606 cases were confirmed, and to curb the spread of the disease, physical academic activities were suspended by the Nigeria Centre for Disease Control (NCDC). As a result, innovative tools became indispensable for the successful delivery of library services in Nigerian academic libraries. Whether these tools are still being used to reform library services in the post-COVID era remains unclear, hence the need for this study. This study examined librarians' use of innovative tools for information dissemination in Nigerian academic libraries during the post-COVID era using a descriptive survey design. Quantitative and qualitative data were obtained from 144 librarians as respondents. A total enumeration sampling technique was adopted because the population was small. The findings revealed that innovative tools such as videoconferencing, WhatsApp, teleconferencing, Facebook, LinkedIn, and web-based learning applications are still used by librarians for disseminating information in the post-COVID era. These tools are useful and beneficial to librarians, as they facilitate easy participation and engagement of library users in various discussions. Inadequate funding and a lack of advanced technology skills were identified as major impediments to the successful use of innovative tools for information dissemination. It was therefore suggested that academic libraries throughout Nigeria prioritize staff training in the digital skills needed to cope in this era of advanced technology.
https://doi.org/10.5865/IJKCT.2024.14.1.019

A System for Automatic Classification of Traditional Culture Texts (전통문화 콘텐츠 표준체계를 활용한 자동 텍스트 분류 시스템)

Hur, YunA;Lee, DongYub;Kim, Kuekyeng;Yu, Wonhee;Lim, HeuiSeok
- Journal of the Korea Convergence Society / v.8 no.12 / pp.39-47 / 2017
The Internet has increased the number of digital web documents related to the history and traditions of Korean culture. However, users who search for creators or materials related to traditional culture often cannot find the information they want, and the results are insufficient. Document classification is required to make this information effectively accessible. In the past, documents were classified manually, which costs a great deal of time and money. Therefore, this paper develops an automatic text classification model for traditional cultural content, based on data from the Korean information culture field organized according to a systematic classification of traditional cultural content. The study applied a TF-IDF model, a Bag-of-Words model, and a combined TF-IDF/Bag-of-Words model to extract word frequencies from the 'Korea Traditional Culture' data, and developed the automatic text classification model using the Support Vector Machine classification algorithm.
https://doi.org/10.15207/JKCS.2017.8.12.039

Development of Block-based Code Generation and Recommendation Model Using Natural Language Processing Model (자연어 처리 모델을 활용한 블록 코드 생성 및 추천 모델 개발)

Jeon, In-seong;Song, Ki-Sang
- Journal of The Korean Association of Information Education / v.26 no.3 / pp.197-207 / 2022
In this paper, we develop a machine-learning-based block code generation and recommendation model intended to reduce learners' cognitive load during coding education. The model learns the blocks a learner has assembled in a block programming environment, using a natural language processing model with fine-tuning, and then generates and recommends candidate blocks for the next step. To develop the model, a training dataset was produced by preprocessing 50 block codes from the popular block programming website 'Entry'. After dividing the preprocessed blocks into training, validation, and test datasets, we developed models that generate block code based on LSTM, Seq2Seq, and GPT-2. In the performance evaluation, GPT-2 outperformed the LSTM and Seq2Seq models on the BLEU and ROUGE scores, which measure sentence similarity. The output generated by the GPT-2 model showed relatively similar BLEU and ROUGE scores across cases, except when the number of blocks was 1 or 17.
https://doi.org/10.14352/jkaie.2022.26.3.197
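
To make the BLEU score mentioned in this abstract concrete, here is a simplified sketch: a single reference, uniform weights over 1-grams and 2-grams, a brevity penalty, and no smoothing. The block names are hypothetical and are not taken from Entry or from the paper, and the paper's exact BLEU configuration is not stated in the abstract.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU with uniform weights, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clipped overlap: a candidate n-gram is credited at most as often
        # as it appears in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

# Hypothetical block sequences (names invented for illustration).
reference = ["when_start", "repeat_10", "move_10", "turn_15"]
candidate = ["when_start", "repeat_10", "move_10", "say_hello"]
score = bleu(candidate, reference)
```

Here three of four unigrams and two of three bigrams match, so the score is the geometric mean of 3/4 and 2/3. With very short sequences, such as a single block, higher-order n-grams do not exist and an unsmoothed score collapses to zero, so scores at extreme sequence lengths are less comparable under this kind of metric.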

Development of Overhead Projector Films, CD-ROM, and Bio-Cosmos Home Page as Teaching Resources for High School Biology (고교 생물의 오버헤드 프로젝터용 필름 제작 및 전달 매체로서의 CD-ROM과 홈페이지의 설계)

Song, Bang-Ho;Sin, Youn-Uk;Choi, Mie-Sook;Park, Chang-Bo;Ahn, Na-Young;Kang, Jae-Seuk;Kim, Jeung-Hyun;Seo, Hae-Ae;Kwon, Duck-Kee;Sohn, Jong-Kyung;Chung, Hwa-Sook;Yang, Hong-Jun;Park, Sung-Ho
- Journal of The Korean Association For Science Education / v.19 no.3 / pp.428-440 / 1999
Colorful overhead projector films, named Bio-cosmos II, comprising photographs, pictures, concept maps, and diagrams, were developed and manufactured as audio-visual teaching aids and resources for high school biology, and a CD-ROM and web site were constructed to deliver them to schools. The content of the films was organized on the basis of an analysis of seven biology textbooks approved by the Ministry of Education. The films were designed around various instructional strategies and produced using multimedia and a range of educational software. The CD-ROM consists of the following scenes: logo, initial main, chapter list, contents, and quit. The initial main scene lists the chapters according to the biology areas of the General Science, Biology I, and Biology II textbooks. Each chapter links to scenes with detailed concept maps, the subordinate subjects, and contents. The subject screens consist of various types of summary diagrams, including lesson contents, figures, pictures, photographs and their explanations, experimental procedures and results, tables of summarized contents, and additional animations with video captures, explanations, a glossary, and so on. Most files were produced in Adobe Photoshop by scanning the pictures, figures, and photographs; they were then annotated, modified, stored as PICT or PSD files, and converted to JPG files, with attention to quality in terms of instructional strategy and graphic properties such as gracefulness, clearness, colorfulness, brightness, and distinctness. In total, 14 films for the biology areas of General Science, 80 for Biology I, and 142 for Biology II were produced and loaded onto the CD-ROM and web site, and an attempt was made to open the files through an Internet home page at http://gic.kyungpook.ac.kr/biocosmos.

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
- Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
Most research on classification has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and Bayesian classifiers and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these methods face space and time limitations when classifying the very large number of web pages on today's Internet. Moreover, most classification studies use a uni-gram feature representation, which does not capture the real meaning of words well. Korean web page classification poses additional problems because Korean words often have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in this setting (large datasets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimension. This yields a new lower-dimensional semantic space for representing vectors, which makes classification efficient and makes it possible to analyze the latent meaning of words or documents (or web pages). Although LSA is good at classification, it has a drawback. When SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent the vectors well, not which dimensions discriminate between them well. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects the optimal dimensions to both discriminate and represent vectors well, minimizing this drawback and improving performance. The proposed method shows better and more stable performance than other LSA variants in low-dimensional spaces. In addition, we obtain further improvement in classification by creating and selecting features, removing stopwords, and statistically weighting specific values.
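
The SVD step described in this abstract can be written out as follows. This is the standard rank-k truncation used in LSA, with symbols chosen here for illustration (A is the m-by-n term-document matrix and a_j is its j-th column). It does not reproduce the paper's selective dimension criterion, which the abstract does not specify.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad
U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
V_k \in \mathbb{R}^{n \times k},\quad
\sigma_1 \ge \dots \ge \sigma_k > 0

\hat{d}_j = \Sigma_k^{-1} U_k^{\top} a_j
```

Standard LSA keeps the k largest singular values, which gives the best rank-k approximation of A in the least-squares sense. That criterion concerns how well the vectors are represented, which is the limitation the abstract points out: a supervised variant would instead choose which of the dimensions to keep using class labels.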

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

Jeong, Hanjo;Park, Byeonghwa
- Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
As opinion mining in big data applications has attracted attention, a great deal of research on unstructured data has been carried out. Social media on the Internet generate unstructured or semi-structured data every second, often in the natural human languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which yields incorrect results that are far from users' intentions. Although much progress has been made in recent years in improving search engine performance so as to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, avoiding expensive sense-tagging processes. It tests the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. In the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities, using cross-validation. Only nouns, the target of word sense disambiguation, were selected. 
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because it was tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the preprocessing stage, the sense-tagged terms were determined by vector-space-model-based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiments show that better precision and recall are obtained with the merged corpus. The study suggests that the method can practically enhance the performance of Internet search engines and help determine the meaning of a sentence more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. It assumes that all senses are independent. Although this assumption is unrealistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. 
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
https://doi.org/10.13088/jiis.2015.21.1.01
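
A Naïve Bayes sense classifier of the kind this abstract describes can be sketched in a few lines. The class name, the add-one smoothing, and the toy training pairs are assumptions made for illustration. The example uses English context words for the ambiguous Korean noun 배 (romanized here as "bae", which can mean "pear" or "ship"), not data from the dictionary or the Sejong Corpus.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    """Naive Bayes sense classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.sense_counts = Counter()            # training examples per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (context_words, sense) pairs."""
        for words, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(words)
            self.vocab.update(words)

    def predict(self, words):
        """Return the sense maximizing log P(sense) + sum of log P(word | sense)."""
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Hypothetical sense-tagged usage examples, standing in for dictionary examples.
examples = [
    (["eat", "sweet", "fruit"], "bae_pear"),
    (["fruit", "tree", "ripe"], "bae_pear"),
    (["sail", "sea", "harbor"], "bae_ship"),
    (["sea", "captain", "sail"], "bae_ship"),
]
model = NaiveBayesWSD()
model.train(examples)
pear_guess = model.predict(["sweet", "ripe"])
ship_guess = model.predict(["harbor", "sea"])
```

Each context word contributes an independent factor to the score, which is the independence assumption the abstract calls unrealistic. Smoothing keeps a context word that was never seen with a sense from driving that sense's probability to zero.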

Design and Implementation of a Question Management System based on a Concept Lattice (개념 망 구조를 기반으로 한 문항 관리 시스템의 설계 및 구현)

Kim, Mi-Hye
- The Journal of the Korea Contents Association / v.8 no.11 / pp.412-425 / 2008
One important factor in improving learners' academic achievement in e-learning is to help learners study by providing a variety of evaluation questions and letting them find the questions they want. However, most question retrieval systems depend on keyword search based only on syntactic analysis and/or on hierarchical browsing organized by subject topic. In such systems it is not easy to find integrative questions that are associated with one another. To address this problem, this paper proposes a question management and retrieval system that allows users to easily manage questions and to effectively find questions for study on the Web. We then implemented a system that provides access to questions in the domain of C language programming. The system makes it easy to search not only for questions related to a single theme but also for questions that integrate several topics through the interrelationships between topics and questions. This is done by retrieving questions according to the conceptual interrelationships between questions, starting from the user's query. Consequently, the proposed system is expected to help learners understand the basic theories and concepts of a subject and to improve their ability to apply knowledge comprehensively and solve problems.
https://doi.org/10.5392/JKCA.2008.8.11.412
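
Assuming the concept lattice in this abstract is the one from formal concept analysis, the sketch below enumerates its nodes (formal concepts) for a tiny question-topic table. The question IDs, topics, and function names are hypothetical, and the brute-force enumeration over topic subsets is only suitable for toy inputs. It is not the paper's implementation.

```python
from itertools import combinations

def questions_with(topics, context):
    """Questions that cover every topic in the given set (the extent)."""
    return {q for q, ts in context.items() if topics <= ts}

def common_topics(questions, context):
    """Topics shared by all given questions (the intent)."""
    if not questions:
        return set().union(*context.values())
    return set.intersection(*(context[q] for q in questions))

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every topic subset."""
    all_topics = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_topics) + 1):
        for combo in combinations(all_topics, size):
            extent = questions_with(set(combo), context)
            intent = common_topics(extent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Hypothetical C-programming questions tagged with topics.
context = {
    "q1": {"pointer", "array"},
    "q2": {"pointer", "function"},
    "q3": {"array", "loop"},
}
concepts = formal_concepts(context)
```

A query for the topic "pointer" lands on the concept whose extent is {q1, q2}. Moving down the lattice from there narrows the result to q1 (pointer and array) or q2 (pointer and function), which is one way related, integrative questions can be reached from a single-topic query.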

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

Eo, Kyun Sun;Lee, Kun Chang
- Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
Over the past decade, the development of the Web has caused an explosive increase in data. Feature selection is an important step in extracting valuable information from large amounts of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word2vec (word embedding to vector) and BOW (Bag-of-Words) features. The FS methods adopted are CFS (correlation-based FS) and IG (information gain). To select an optimal FS method, a number of classifiers were compared: LR (logistic regression), NN (neural network), NBN (naive Bayesian network), RF (random forest), RS (random subspace), and ST (stacking). Empirical results on the electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best opinion mining performance. Results on the laptop and restaurant datasets showed that the RF classifier using IG applied to Word2vec features performs best.
https://doi.org/10.14400/JDC.2019.17.2.163
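
Information gain, one of the two feature selection methods named in this abstract, can be illustrated for binary Bag-of-Words features as follows. The reviews and labels are invented, and treating each term as a present/absent feature is an assumption. The abstract does not say how the features were encoded.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_term, without_term) if part)
    return entropy(labels) - remainder

# Hypothetical product reviews as token sets, with sentiment labels.
docs = [
    {"great", "battery"},
    {"great", "screen"},
    {"poor", "battery"},
    {"poor", "screen"},
]
labels = ["pos", "pos", "neg", "neg"]
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy data "great" and "poor" separate the classes perfectly and receive an information gain of 1 bit, while "battery" and "screen" occur equally in both classes and receive 0. A selector would keep the top-ranked terms and pass only those to the classifier.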

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
- Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has evolved along with information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more and more frequently. In this study, we propose a method for analyzing which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process consisting of Pre-Cybercriminality (e.g., risk identification) and Post-Cybercriminality steps. Of these, this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul (loan)' and 'sachae (private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether each item was related to cybercriminality, particularly financial fraud. We then selected as keywords those vocabulary items that were nominals or symbols. With the selected keywords, we searched web sources such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. Preprocessing is divided into three steps: performing morphological analysis, removing stop words, and selecting valid parts of speech. In the morphological analysis step, a complex sentence is broken down into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. 
In the part-of-speech selection step, only two kinds, nouns and symbols, are retained. Since nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal a text is, the more frequently symbols are used. To turn the preprocessed data into learning data, each item must be classified as legitimate or not, so each selected item is labeled 'legal' or 'illegal'. The processed data are then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning dataset and a test dataset. In this study, we used 70% of the data for learning and 30% for testing. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal door-to-door sales, which is clearly superior to that of Term Frequency, MLE, and the other methods. Hence, the result suggests that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying an SVM-like discrimination algorithm to text analysis. Moreover, the study will also contribute to practitioners in the fields of brand management and opinion mining.
https://doi.org/10.13088/jiis.2017.23.3.119
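
Two of the data preparation steps in this abstract, building a Document-Term Matrix and splitting the labeled data 70/30, can be sketched as below. The token lists, function names, and random seed are hypothetical. SVM training with gamma 0.5 and cost 10 is not shown, since it would normally be delegated to a library and the abstract does not name the one used.

```python
import random
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i]."""
    vocabulary = sorted({token for doc in docs for token in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

def split_70_30(rows, labels, seed=0):
    """Shuffle the labeled rows and split them into 70% learning, 30% test."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = len(indices) * 7 // 10
    train = [(rows[i], labels[i]) for i in indices[:cut]]
    test = [(rows[i], labels[i]) for i in indices[cut:]]
    return train, test

# Hypothetical preprocessed posts: only nouns and symbols remain.
docs = [["loan", "!!", "loan"]] * 5 + [["weather", "news"]] * 5
labels = ["illegal"] * 5 + ["legal"] * 5

vocabulary, rows = document_term_matrix(docs)
train, test = split_70_30(rows, labels)
```

Mixing the two classes before the random split, as the abstract describes, keeps the split from putting all 'illegal' items in one partition merely because of file order.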
