• Title/Summary/Keyword: Data dictionary

A Comparative Study between Stock Price Prediction Models Using Sentiment Analysis and Machine Learning Based on SNS and News Articles (SNS와 뉴스기사의 감성분석과 기계학습을 이용한 주가예측 모형 비교 연구)

  • Kim, Dongyoung;Park, Jeawon;Choi, Jaehyun
    • Journal of Information Technology Services / v.13 no.3 / pp.221-233 / 2014
  • As the economy has developed and public interest in the stock market has grown, many studies have attempted to predict stock price fluctuations. Recent studies increasingly rely on scientific and technological forecasting methods, and the data they use are becoming more diverse. In this paper, we propose stock price prediction models that apply sentiment analysis and machine learning to news articles and SNS data in order to improve prediction accuracy. The proposed models are generated through a four-step process: data collection, sentiment dictionary construction, sentiment analysis, and machine learning. News articles were collected from economy-related newspapers, and SNS data were collected from Twitter. The sentiment dictionary was built from the collected news articles and then used for sentiment analysis. In the machine learning phase, we generated prediction models by applying various classification techniques to the data produced by sentiment analysis. We then conducted 10-fold cross-validation to measure the models' performance. The experimental results showed accuracy above 80% for a number of the techniques and F1 scores close to 0.8. This is a significant improvement over conventional studies that use opinion mining or data mining techniques.
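
The 10-fold cross-validation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `k_fold_indices` and `cross_validated_accuracy`, and the caller-supplied `fit` and `predict` callables, are assumptions made for illustration, and the data are split in their given order without shuffling.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(features, labels, fit, predict, k=10):
    """Average accuracy over k folds.

    fit(train_features, train_labels) returns a model (hypothetical interface);
    predict(model, feature) returns a predicted label.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        hits = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)
```

Each sample is used for testing exactly once and for training in the other k-1 folds, so the averaged score uses all of the data without testing a model on samples it was trained on.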

Text Extraction In WWW Images (웹 영상에 포함된 문자 영역의 추출)

  • 김상현;심재창;김중수
    • Proceedings of the IEEK Conference / 2000.06d / pp.15-18 / 2000
  • In this paper, we propose a method for extracting text from Web images. Our approach is based on contrast detection and pixel component ratio analysis at the mouse position. The extracted data, passed through OCR, can be used for real-time dictionary lookup or language translation applications in a Web browser.

ManBIF: a Program for Mining and Managing Biobank Impact Factor Data

  • Yu, Ki-Jin;Nam, Jung-Min;Her, Yun;Chu, Min-Seock;Seo, Hyung-Seok;Kim, Jun-Woo;Jeon, Jae-Pil;Park, Hye-Kyung;Park, Kie-Jung
    • Genomics & Informatics / v.9 no.1 / pp.37-38 / 2011
  • Biobank Impact Factor (BIF), a very effective criterion for evaluating the activity of biobanks, can be estimated from the citation information of biobanks in scientific papers. We have developed a program, ManBIF, to extract this citation information from PDF files in the literature. The program manages a dictionary of expressions that represent biobanks and their resources, mines the citation information by converting PDF files to text files and searching them with the dictionary, and produces a statistical report file. It can be used as an important tool by biobanks.
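
The dictionary search described above can be sketched as follows. This is only an illustration of the general idea, not ManBIF's code: the function name `count_mentions`, the dictionary layout, and the biobank name in the usage note are all hypothetical, and the text is assumed to have already been extracted from the PDF.

```python
import re
from collections import Counter


def count_mentions(text, expression_dictionary):
    """Count case-insensitive whole-phrase matches per biobank.

    expression_dictionary maps a biobank identifier to the list of
    expressions (names, abbreviations) that refer to it (assumed layout).
    """
    counts = Counter()
    for biobank, expressions in expression_dictionary.items():
        for expression in expressions:
            # \b keeps "EBB" from matching inside a longer word such as "webbed".
            pattern = r"\b" + re.escape(expression) + r"\b"
            counts[biobank] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts
```

Calling `count_mentions(paper_text, {"EXAMPLE": ["Example Biobank", "EBB"]})` on each converted paper and summing the counters would give the kind of per-biobank tally a statistical report could be built from.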

Efficient and Secure Sound-Based Hybrid Authentication Factor with High Usability

  • Mohinder Singh B;Jaisankar N.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2844-2861 / 2023
  • The internet is among the most prevalent terms in use today. Over the years, people have become more dependent on the internet because it makes their work easier. It has become part of everyone's life as a means of communication in almost every area, including financial transactions, education, and personal health operations. A great deal of data is being digitized and placed online. Many researchers have proposed authentication factors, biometric and/or non-biometric, as the first line of defense for securing online data. Among these factors, passwords and passphrases are used by many users around the world. However, their usability is low, and passwords are susceptible to brute-force and dictionary attacks. This paper proposes generating a novel passcode from a hybrid authentication factor: sound. The strength of the proposed passcode against brute-force and dictionary attacks is evaluated using the Shannon entropy and passcode (or password) entropy formulae, and its usability is also evaluated. The entropy value of the proposed passcode is 658.2, which is higher than that of other authentication factors: 13.2 for a 6-digit PIN, 101.4 for a password with passphrase combined with keystroke dynamics, 193 for fingerprint, and 30 for voice biometrics. The proposed passcode is far better than other authentication factors in terms of both strength and usability.
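
For readers unfamiliar with the two entropy measures named in the abstract, the sketch below gives their common textbook forms: password entropy as length times log2 of the alphabet size (assuming symbols are chosen uniformly and independently), and Shannon entropy as the average information per symbol of an observed string. The function names are illustrative, and the sketch is not claimed to reproduce the figures quoted in the abstract, whose exact derivation is not given here.

```python
import math
from collections import Counter


def passcode_entropy_bits(length, alphabet_size):
    """Textbook password entropy: length * log2(alphabet_size).

    Assumes every symbol is drawn uniformly and independently.
    """
    return length * math.log2(alphabet_size)


def shannon_entropy_bits_per_symbol(sequence):
    """Shannon entropy H = -sum(p * log2(p)) over observed symbol frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

The first measure grows with both passcode length and the number of possible symbols, which is why it is a common proxy for resistance to brute-force search; the second is lower for strings with repeated or predictable symbols.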

A Semi-Automatic Semantic Mark Tagging System for Building Dialogue Corpus (대화 말뭉치 구축을 위한 반자동 의미표지 태깅 시스템)

  • Park, Junhyeok;Lee, Songwook;Lim, Yoonseob;Choi, Jongsuk
    • KIPS Transactions on Software and Data Engineering / v.8 no.5 / pp.213-222 / 2019
  • Determining the meaning of a keyword in a speech dialogue system is an important technology for the future implementation of intelligent speech dialogue interfaces. After keywords are extracted from a user's utterance to grasp its intention, the intention is determined using the semantic marks of the keywords. One keyword can have several semantic marks, and we regard attaching the semantic mark that matches the user's intention to each keyword as a word sense disambiguation problem. In this study, about 23% of all keywords in the corpus are manually tagged to build a semantic mark dictionary, a synonym dictionary, and a context vector dictionary, and the remaining 77% are then tagged automatically. The semantic mark of a keyword is determined by calculating context vector similarity against the context vector dictionary. For an unregistered keyword, the semantic mark of the most similar keyword is attached using the synonym dictionary. We compare system performance with a manually constructed training set and a semi-automatically expanded training set, using 3 high-frequency and 3 low-frequency keywords selected from the corpus. In experiments, we obtained an accuracy of 54.4% with the manually constructed training set and 50.0% with the semi-automatically expanded training set.
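
A minimal sketch of choosing a semantic mark by context vector similarity is given below. The abstract does not state which similarity measure is used; cosine similarity over sparse term-weight vectors is assumed here, and the names `cosine_similarity` and `pick_semantic_mark` and the dictionary layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pick_semantic_mark(context_vector, context_vector_dictionary):
    """Return the semantic mark whose stored vector best matches the context.

    context_vector_dictionary maps each candidate mark of one keyword to a
    context vector (assumed layout).
    """
    return max(
        context_vector_dictionary,
        key=lambda mark: cosine_similarity(context_vector,
                                           context_vector_dictionary[mark]),
    )
```

For a keyword with the hypothetical marks `FRUIT` and `COMPANY`, an utterance context such as `{"eat": 1.0, "juice": 1.0}` would select whichever mark's stored vector shares more weighted terms with it.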

Designing a FRBR Work Grouping Algorithm of Bibliographic Records using a Role Term Dictionary of Authors (저자역할용어사전 구축 및 저작군집화에 관한 연구)

  • Yun, Jaehyuk;Do, Seulki;Oh, Sam G.
    • Journal of the Korean Society for information Management / v.37 no.2 / pp.197-223 / 2020
  • The purpose of this study is to analyze the issues that arise when grouping KORMARC records using the FRBR WORK concept and to suggest a new method. Previous studies did not sufficiently address the criteria or processes for identifying the representative authors of records and their derivatives. Our study therefore focused on devising a method of identifying the representative author when a work has multiple contributors. We developed a method that identifies representative authors using an author role dictionary constructed by extracting role terms from the statement of responsibility field (245). We also designed a way to group records into a work by calculating similarity measures of authors and titles. The accuracy of WORK grouping was highest when blank spaces, parentheses, and controlling processes were removed from titles and the measured similarity rates of authors and titles were higher than 80 percent. This was an experimental study in which we developed an author role dictionary that can be used to select a representative author and measured the similarity of authors and titles in order to achieve effective WORK grouping of KORMARC records. Future work will attempt to improve the similarity measure for titles, incorporate FRBR Group 1 entities such as expression, manifestation, and item data into the algorithm, and improve the algorithm by using other forms of MARC data that are widely used in Korea.
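
The grouping rule in the abstract (normalize titles, then require author and title similarity above 80 percent) might look roughly like the sketch below. The similarity measure is not specified in the abstract; Python's `difflib.SequenceMatcher` ratio is used here purely as a stand-in. The record layout, the function names, and the choice to strip only the parenthesis characters (not their contents) are likewise assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase the title and drop whitespace and parenthesis characters."""
    return re.sub(r"[\s()]+", "", title).lower()


def similarity(a, b):
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def same_work(record_a, record_b, threshold=0.8):
    """Group two records into one work if both similarities exceed the threshold."""
    title_sim = similarity(normalize_title(record_a["title"]),
                           normalize_title(record_b["title"]))
    author_sim = similarity(record_a["author"].lower(),
                            record_b["author"].lower())
    return title_sim > threshold and author_sim > threshold
```

Requiring both similarities to pass keeps different works by the same author apart, while normalization lets records that differ only in spacing or bracketing of the title fall into the same group.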

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As providing customized services to individuals grows in importance, research on personalized recommendation systems continues to be carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, its recommendations have mostly been based on quantitative information such as users' ratings, which limits accuracy. To address this problem, many studies have tried to improve recommendation performance by using information other than the quantitative ratings, for example sentiment analysis of customer review text. Nevertheless, existing research has not directly combined sentiment analysis results with quantitative rating scores in the recommendation system. This study therefore aims to reflect the sentiments expressed in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and feeds it directly into the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. Sentiment scores were calculated using a text-mining sentiment analysis technique. The data consisted of movie reviews, from which a domain-specific sentiment dictionary was constructed. Regression analysis was used to construct the sentiment dictionary: positive and negative dictionaries were each built using Lasso regression, Ridge regression, and ElasticNet. The accuracy of each constructed dictionary was verified with a confusion matrix. The accuracy was 70% for the Lasso-based dictionary, 79% for the Ridge-based dictionary, and 83% for the ElasticNet-based dictionary ($\alpha = 0.3$).
Therefore, in this study, the sentiment score of each review is calculated with the ElasticNet-based dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing ratings. To show this, the proposed algorithm was applied to memory-based user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and the model-based matrix factorization methods SVD and SVD++. For each algorithm, the mean absolute error (MAE) and root mean square error (RMSE) were calculated to compare a system using ratings combined with sentiment scores against a system using ratings alone. In terms of MAE, the improvement was 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. In terms of RMSE, the improvement was 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. These results confirm that collaborative filtering that reflects the sentiment scores of user reviews predicts more accurately than conventional collaborative filtering that considers only the quantitative scores. We then performed paired t-tests to validate the improvement and concluded that the proposed model is the better approach. To overcome the limitation of previous research that judges a user's sentiment only by the quantitative rating score, this study quantified the review text so that the user's opinion is captured in finer detail and incorporated into the recommendation system to improve accuracy.
The findings of this study are expected to have managerial implications for recommendation system developers who need to consider both quantitative and qualitative information. The method of constructing the combined system in this paper might be used directly by such developers.
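
The two evaluation measures named in the abstract, MAE and RMSE, have standard definitions, sketched below together with one possible way of folding a sentiment score into a rating. The abstract does not say how the two are combined, so the weighted average in `combine_rating`, its `weight` parameter, and the assumption that the sentiment score is already on the rating scale are all illustrative guesses rather than the authors' rule.

```python
import math


def combine_rating(rating, sentiment_score, weight=0.5):
    """Blend a rating with a sentiment score on the same scale (assumed rule)."""
    return (1 - weight) * rating + weight * sentiment_score


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```

Both measures are lower for better predictions. RMSE penalizes large individual errors more heavily than MAE, which is why reporting both gives a fuller picture of a recommender's accuracy.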

Structured Analysis of SNS for Development of Production Inventory System Fitted to Minor Enterprise (중소기업에 적합한 생산재고관리 시스템 개발을 위한 SNS 의 구조적 분석)

  • Jeon, Tae-Joon
    • IE interfaces / v.6 no.1 / pp.47-54 / 1993
  • The Sequential Numbering System (SNS) is a production and inventory management system that is more effective and practical for small and medium-sized enterprises than a Material Requirements Planning (MRP) system or a Just-in-Time (JIT) system. The purpose of this paper is a structured analysis of SNS as the first phase of software development. Data Flow Diagrams (DFD), a Data Dictionary (DD), and Mini-Specs are used to analyze the system down to the second level. The results can be used for SNS software design and programming.

Research and Development of Document Recognition System for Utilizing Image Data (이미지데이터 활용을 위한 문서인식시스템 연구 및 개발)

  • Kwag, Hee-Kue
    • The KIPS Transactions:PartB / v.17B no.2 / pp.125-138 / 2010
  • The purpose of this research is to enhance the document recognition system that is essential for developing a full-text retrieval system for the document image data stored in the digital library of a public institution. To this end, the main tasks are: 1) analyzing the document image data and developing image preprocessing and document structure analysis technologies for it, and 2) building a specialized knowledge base consisting of document layouts and properties, character models, and a word dictionary. In addition, with a management tool developed for this knowledge base, the document recognition system can handle various types of document image data. We have so far developed a prototype document recognition system that combines the specialized knowledge base with the document structure analysis library, both adapted to the document image data housed in the National Archives of Korea. Based on these results, we plan to build a test bed and evaluate the performance of the document recognition system in order to maximize the usefulness of the full-text retrieval system.

The practical use with online database program of cosmetics' raw materials. (화장품원료 온라인 데이터베이스 구축과 활용)

  • Jeon Sang-hoon;Kim Ju-Duck
    • Journal of the Society of Cosmetic Scientists of Korea / v.29 no.2 s.43 / pp.233-250 / 2003
  • The KCID (Korean Cosmetic Ingredient Dictionary) and ICID (International Cosmetic Ingredient Dictionary) are often used in cosmetics research and in the export and import of cosmetics. So far, however, there has been no database of cosmetic raw materials, so a great deal of time is spent finding the raw material data that is needed. This study constructs a database of cosmetic raw materials and develops a program to search it. We used a Linux machine with the Apache web server, the MySQL database server, and PHP. The database holds 11,817 raw materials registered under ICID, 866 registered under KCID, and 28,008 registered by trade name. The database was organized in relational form. Because it is available online, it can ultimately reduce the time these tasks take. The product of this study can serve as a good basis for reorganizing the data, and in the future it can be linked with other databases.