• Title/Summary/Keyword: Portal system


Analyzing the Effect of Online media on Overseas Travels: A Case study of Asian 5 countries (해외 출국에 영향을 미치는 온라인 미디어 효과 분석: 아시아 5개국을 중심으로)

  • Lee, Hea In;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.53-74
    • /
    • 2018
  • Since South Korea's economy is heavily dependent on overseas markets, the tourism industry is considered very important to the national economy, improving the country's balance of payments and increasing income and employment. Accordingly, more accurate forecasting of tourism demand is needed to promote the industry. Related research has used economic variables such as exchange rates and income as factors influencing tourism demand. As information technology has become widespread, some researchers have also analyzed the effect of media on tourism demand, showing that media have a considerable influence on travelers' decision making, such as choosing an outbound destination. Furthermore, with online information search now providing the latest information and social media enabling two-way communication, up-to-date travel information can be obtained more quickly than before. Information in online media such as blogs naturally creates a word-of-mouth effect through the sharing of useful information, known as eWOM. Like other service industries, the tourism industry is characterized by the difficulty of evaluating its value before it is experienced directly, so most travelers search for information in advance from various sources to reduce the perceived risk of a destination; they can therefore also be influenced by online media such as online news. In this study, we propose that the number of online media postings, which drives the word-of-mouth effect, may affect the number of outbound travelers. We divided online media into public and private media according to their characteristics, selecting online news as public media and blogs, one of the most popular social media for tourist information, as private media. Based on previous studies of eWOM effects in online news and blogs, we analyzed the relationship between eWOM volume and outbound tourism demand through a panel model. To this end, we collected data on the number of national outbound travelers from 2007 to 2015 provided by the Korea Tourism Organization. According to these statistics, the destinations with the highest Korean outbound tourism demand are China, Japan, Thailand, Hong Kong, and the Philippines, which were selected as the dependent variable in this study. To measure eWOM volume, we collected online news and blog postings for the same period from Naver, the largest portal site in South Korea. A panel model was then established to analyze the effect of online media on Korean outbound travel demand and to identify significant differences in the influence of online media across time and countries. The results can be summarized as follows. First, the impact of online news and blog eWOM on the number of outbound travelers was significant. We found that the numbers of online news articles and blog postings influence the number of outbound travelers; in particular, both the month that includes the departure date and the three months before departure were found to have an effect. This shows that online news and blogs are online media with a significant influence on outbound tourism demand. Next, we found that an increase in eWOM volume in online news has a negative effect on departures, while an increase in blog postings has a positive effect; the country-specific models show the same pattern. This paper shows that online media can be used as a new variable in tourism demand modeling by examining the influence of the eWOM effect of online media. We also found that both social media and news media play an important role in predicting and managing Korean tourism demand, and that the influence of the two media differs by country.
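The abstract above describes a panel regression of monthly outbound travelers on lagged eWOM volume. The following is a minimal sketch of such a fixed-effects specification in Python, assuming a hypothetical long-format file (`outbound_panel.csv`) with columns for country, month, traveler counts, and news/blog posting counts; it illustrates the modeling idea and is not the authors' actual estimation code.

```python
# Minimal sketch of a fixed-effects panel regression relating eWOM volume
# (news and blog posting counts) to monthly outbound travelers.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per (country, month).
df = pd.read_csv("outbound_panel.csv")  # columns: country, month, travelers, news_posts, blog_posts
df = df.sort_values(["country", "month"])

# eWOM volume three months before departure, per the lag structure described above.
for col in ("news_posts", "blog_posts"):
    df[f"{col}_lag3"] = df.groupby("country")[col].shift(3)

# Country fixed effects absorb time-invariant destination characteristics.
model = smf.ols(
    "travelers ~ news_posts + blog_posts + news_posts_lag3 + blog_posts_lag3 + C(country)",
    data=df.dropna(),
).fit()
print(model.summary())
```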

Issue tracking and voting rate prediction for 19th Korean president election candidates (댓글 분석을 통한 19대 한국 대선 후보 이슈 파악 및 득표율 예측)

  • Seo, Dae-Ho;Kim, Ji-Ho;Kim, Chang-Ki
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.199-219
    • /
    • 2018
  • With the everyday use of the Internet and the spread of various smart devices, users can now communicate in real time and existing communication styles have changed. As the Internet has shifted who produces information, data have grown massive, giving rise to what is called big data. Big data is seen as a new opportunity to understand social issues. In particular, text mining explores patterns in unstructured text data to find meaningful information. Since text data exists in many places such as newspapers, books, and the web, it is diverse and large in volume, making it suitable for understanding social reality. In recent years there have been increasing attempts to analyze text from the web, such as SNS and blogs, where the public can communicate freely; this is recognized as a useful way to grasp public opinion immediately, so it can be used for political, social, and cultural research. Text mining has received much attention as a way to investigate the public's view of candidates and to predict voting rates instead of polling, because many people question the credibility of surveys and respondents tend to refuse to answer or to conceal their real intentions when polled. This study collected comments from the largest Internet portal site in Korea and examined the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, a period that includes the ban on publishing opinion polls just before election day. We analyzed word frequencies, associated emotional words, topic emotions, and candidate voting rates. Frequency analysis identified the words that were the most important issues each day; in particular, following each presidential debate, the candidate who became an issue appeared at the top of the frequency ranking. Analysis of associated emotional words identified the issues most relevant to each candidate, and topic emotion analysis identified each candidate's topics and the public's emotions toward those topics. Finally, we estimated the voting rate by combining comment volume and sentiment score. In this way, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments are an effective tool for tracking the issues surrounding presidential candidates and for predicting the voting rate. In particular, this study produced daily issues and a quantitative sentiment index, predicted the voting rate for each candidate, and exactly matched the ranking of the top five candidates. Each candidate can thus objectively grasp public opinion and reflect it in campaign strategy: positive issues can be used more actively, negative issues can be corrected, and candidates should be aware that a moral problem can severely damage their reputation. Voters can look objectively at the issues and public opinion surrounding each candidate and make more informed decisions; by referring to results like these before voting, they can see public opinion drawn from big data and vote from a more objective perspective. If candidates campaign with reference to big data analysis, the public will participate more actively on the web, recognizing that their wants are being reflected, and the various web channels for expressing political views can contribute to political participation by the people.
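The abstract combines comment volume with a sentiment score to estimate each candidate's vote share, but does not spell out the exact combination rule. The sketch below shows one plausible way to weight volume by sentiment and normalize to percentages; the candidate labels, counts, and the weighting formula are all illustrative assumptions, not the study's method.

```python
# Minimal sketch: combine per-candidate comment volume and mean sentiment
# into a normalized vote-share estimate (all figures are hypothetical).
candidates = {
    # candidate: (number of comments, mean sentiment score in [-1, 1])
    "A": (81_000, 0.12),
    "B": (53_000, 0.05),
    "C": (40_000, -0.03),
    "D": (32_000, 0.20),
    "E": (20_447, -0.10),
}

def weighted_score(volume: int, sentiment: float) -> float:
    """Weight comment volume by a sentiment factor shifted to stay positive."""
    return volume * (1.0 + sentiment)

raw = {name: weighted_score(v, s) for name, (v, s) in candidates.items()}
total = sum(raw.values())
shares = {name: 100.0 * score / total for name, score in raw.items()}

for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1f}%")
```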

A Document Collection Method for More Accurate Search Engine (정확도 높은 검색 엔진을 위한 문서 수집 방법)

  • Ha, Eun-Yong;Gwon, Hui-Yong;Hwang, Ho-Yeong
    • The KIPS Transactions:PartA
    • /
    • v.10A no.5
    • /
    • pp.469-478
    • /
    • 2003
  • Internet search engines use web robots that visit servers connected to the Internet periodically or aperiodically. They extract and classify the collected data according to their own methods and construct the databases that underlie web search services. These procedures are repeated very frequently on the web. Many search engine sites operate this process strategically in order to become popular Internet portal sites that provide users with ways to find information on the web. A web search engine contacts tens of thousands of web servers, maintains its existing databases, and crawls to obtain data about newly connected web servers, but these jobs are decided and conducted by the search engine alone: it runs web robots to collect data from web servers without any knowledge of the servers' states. Each search engine issues many requests and receives responses from web servers, which is one cause of increased Internet traffic. If each web server instead notified web robots with a summary of its public documents, and each robot used that summary to collect only the corresponding documents, unnecessary Internet traffic would be eliminated, the accuracy of the data held by search engines would improve, and the processing overhead of web-related jobs on both web servers and search engines would decrease. In this paper, a monitoring system on the web server is designed and implemented; it monitors the state of the documents on the server, summarizes the changes to modified documents, and sends the summary to web robots that want to retrieve documents from that server. An efficient web robot for the search engine is also designed and implemented; it uses the notified summary to fetch the corresponding documents from the web servers, extracts index terms, and updates its databases.
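As a rough illustration of the server-side monitoring idea described above, the sketch below scans a document root, records a content hash per public document, and reports which documents changed since the previous summary so that a cooperating robot can fetch only those. The paths, summary file name, and JSON format are assumptions, not the paper's implementation.

```python
# Minimal sketch: monitor public documents, summarize which ones changed,
# and let a crawler fetch only the changed documents.
import hashlib
import json
import pathlib
import time

DOC_ROOT = pathlib.Path("/var/www/html")          # hypothetical document root
SUMMARY_FILE = DOC_ROOT / "change-summary.json"   # hypothetical summary location

def build_summary() -> dict:
    """Record a content hash and modification time for every public document."""
    summary = {"generated": time.time(), "documents": {}}
    for path in DOC_ROOT.rglob("*.html"):
        data = path.read_bytes()
        summary["documents"][str(path.relative_to(DOC_ROOT))] = {
            "sha1": hashlib.sha1(data).hexdigest(),
            "mtime": path.stat().st_mtime,
        }
    return summary

def changed_since(old: dict, new: dict) -> list[str]:
    """Return documents whose hash differs from the previous summary."""
    old_docs = old.get("documents", {})
    return [
        name for name, meta in new["documents"].items()
        if old_docs.get(name, {}).get("sha1") != meta["sha1"]
    ]

if __name__ == "__main__":
    previous = json.loads(SUMMARY_FILE.read_text()) if SUMMARY_FILE.exists() else {}
    current = build_summary()
    # A cooperating web robot would download change-summary.json and request
    # only the documents listed here instead of re-crawling the whole site.
    print(changed_since(previous, current))
    SUMMARY_FILE.write_text(json.dumps(current))
```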

Verification of Non-Uniform Dose Distribution in Field-In-Field Technique for Breast Tangential Irradiation (유방암 절선조사 시 종속조사면 병합방법의 불균등한 선량분포 확인)

  • Park, Byung-Moon;Bae, Yong-Ki;Kang, Min-Young;Bang, Dong-Wan;Kim, Yon-Lae;Lee, Jeong-Woo
    • Journal of radiological science and technology
    • /
    • v.33 no.3
    • /
    • pp.277-282
    • /
    • 2010
  • This study verifies the non-uniform dose distribution of the field-in-field (FIF) technique for breast tangential irradiation using a two-dimensional ionization chamber array (MatriXX, Wellhofer Dosimetrie, Germany). The MatriXX and an inverse planning system (Eclipse, ver. 6.5, Varian, Palo Alto, USA) were used. Hybrid plans were made from the original plans of twenty patients. To verify the non-uniform dose distribution of the FIF technique, the prescribed dose of each portal (90 cGy) was delivered to the MatriXX, and the measured doses were compared with the planned doses. Quantitative analyses were performed with a commercial analysis tool (OmniPro IMRT, ver. 1.4, Wellhofer Dosimetrie, Germany). The delivered doses at the normalization points differed from the calculated doses by 1.6% on average. In the analysis of line profiles, differences of 1.3-5.5% (avg. 2.4%) and 0.9-3.9% (avg. 2.5%) were observed in the longitudinal and transverse planes, respectively. In the gamma-index analyses (criteria: 3 mm, 3%), the passing rate ($\gamma$-index $\leq$ 1) ranged from 90.23% to 99.69% (avg. 95.11%, std. 2.81) across the twenty patient cases. In conclusion, this study confirmed the availability of the FIF technique by comparing calculated and measured doses using the MatriXX. In the future, various clinical applications of the FIF technique would be good trials for better treatment results.
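For readers unfamiliar with the gamma-index analysis mentioned above, the following is a minimal brute-force sketch of a 2-D gamma comparison with 3 mm / 3% criteria. It assumes planned and measured doses on the same regular grid and is meant only to illustrate the metric, not to reproduce the OmniPro IMRT analysis.

```python
# Minimal sketch of a 2-D gamma-index comparison between planned and measured
# dose grids (both numpy arrays on the same regular grid).
import numpy as np

def gamma_index(planned, measured, spacing_mm=1.0, dta_mm=3.0, dd_percent=3.0):
    """Brute-force gamma: for each measured point, minimize the combined
    distance-to-agreement and dose-difference term over all reference points."""
    ny, nx = measured.shape
    dose_norm = dd_percent / 100.0 * planned.max()  # global dose-difference criterion
    ys, xs = np.mgrid[0:ny, 0:nx]
    gamma = np.empty_like(measured, dtype=float)
    for iy in range(ny):
        for ix in range(nx):
            dist2 = ((ys - iy) ** 2 + (xs - ix) ** 2) * spacing_mm ** 2
            dose2 = (planned - measured[iy, ix]) ** 2
            gamma[iy, ix] = np.sqrt(dist2 / dta_mm ** 2 + dose2 / dose_norm ** 2).min()
    return gamma

# Passing rate: fraction of points with gamma <= 1, e.g.
# g = gamma_index(planned_dose, measured_dose)
# print(100.0 * (g <= 1.0).mean())
```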

CT Simulation Technique for Craniospinal Irradiation in Supine Position (전산화단층촬영모의치료장치를 이용한 배와위 두개척수 방사선치료 계획)

  • Lee, Suk;Kim, Yong-Bae;Kwon, Soo-Il;Chu, Sung-Sil;Suh, Chang-Ok
    • Radiation Oncology Journal
    • /
    • v.20 no.2
    • /
    • pp.165-171
    • /
    • 2002
  • Purpose: In order to perform craniospinal irradiation (CSI) in the supine position on patients who are unable to lie in the prone position, a new simulation technique using a CT simulator was developed and its feasibility was evaluated. Materials and Methods: A CT simulator and a 3-D conformal treatment planning system were used to develop CSI in the supine position. The head and neck were immobilized with a thermoplastic mask in the supine position and the entire body was immobilized with a Vac-Loc. A volumetric image was then obtained using the CT simulator. In order to improve the reproducibility of the patient setup, datum lines and points were marked on the head and the body. Virtual fluoroscopy was performed with the removal of visual obstacles such as the treatment table and the immobilization devices. After the virtual simulation, the treatment isocenters of each field were marked on the body and the immobilization devices in the conventional simulation room. Each treatment field was confirmed by comparing the fluoroscopy images with the digitally reconstructed radiography (DRR)/digitally composite radiography (DCR) images from the virtual simulation. The port verification films from the first treatment were also compared with the DRR/DCR images for geometric verification. Results: CSI in the supine position was successfully performed in 9 patients. It required less than 20 minutes to construct the immobilization device and to obtain the whole-body volumetric images. This not only reduced patient discomfort but also eliminated position-change variables during the long conventional simulation process. In addition, by obtaining the CT volumetric image, critical organs such as the eyeballs and spinal cord were better defined, and the accuracy of the port designs and shielding was improved. The differences between the DRRs and the portal films were less than 3 mm in the vertebral contour. Conclusion: CSI in the supine position is feasible for patients who cannot lie in the prone position, such as pediatric patients under the age of 4 years, patients in poor general condition, or patients with a tracheostomy.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.93-107
    • /
    • 2014
  • In this paper, we report our observations on user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of how companies collect data about customer needs. Most companies have failed to properly uncover customer needs for products or services from demographic data such as age, income level, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with business information appropriate to current circumstances. Part of the problem is the increasing regulation of personal data gathering and privacy, which makes demographic or transaction data collection more difficult and is a significant hurdle for traditional recommendation approaches, because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach constructs double two-mode networks, a user-news network and a news-issue network, and integrates them into one quasi-network as the input for issue clustering. One contribution of this research is a methodology that utilizes enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. To build multi-layered two-mode networks from news logs, we need tools such as text mining and topic analysis; we used SAS Enterprise Miner 12.1, which provides text mining and clustering modules for textual data analysis, as well as NetMiner 4 for network visualization and analysis. Our approach to user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network; for simplicity, 100 topics were extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue clustering phase, we classify issues through structural equivalence and compare the results with clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build the multi-layered two-mode network, after which we compared the issue clustering results from SAS with those from network analysis. The experimental dataset came from a web site ranking service and the biggest portal site in Korea; the sample contains 150 million transaction logs and 13,652 news articles from 5,000 panel users over one year. User-article and article-issue networks were constructed and merged into a user-issue quasi-network using NetMiner. Our issue clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering. Despite extensive efforts to supply recommendation systems with user information, most projects succeed only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support for decision making in companies because it enriches user-related data with unstructured textual data. To overcome the problem of insufficient data in traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
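The core network-merging step described above (user-article and article-issue networks combined into a user-issue quasi-network, then issues clustered by structural equivalence with PAM) can be sketched as follows. The incidence matrices here are random toy data, and the PAM step assumes the scikit-learn-extra package; the study itself used SAS Enterprise Miner and NetMiner.

```python
# Minimal sketch: merge double two-mode networks into a user-issue
# quasi-network and cluster issues by structural equivalence with PAM.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn_extra.cluster import KMedoids  # assumed dependency (scikit-learn-extra)

rng = np.random.default_rng(0)
n_users, n_articles, n_issues = 50, 200, 20

# Two-mode incidence matrices: user x article (access logs) and article x issue (topic analysis).
user_article = (rng.random((n_users, n_articles)) < 0.05).astype(int)
article_issue = (rng.random((n_articles, n_issues)) < 0.10).astype(int)

# Merge into a user-issue quasi-network by matrix multiplication.
user_issue = user_article @ article_issue

# Structural equivalence: issues are similar when the same users reach them.
issue_profiles = user_issue.T                       # one row of user weights per issue
distance = squareform(pdist(issue_profiles, metric="euclidean"))

clusters = KMedoids(n_clusters=5, metric="precomputed",
                    method="pam", random_state=0).fit_predict(distance)
print(clusters)
```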

The effect of Big-data investment on the Market value of Firm (기업의 빅데이터 투자가 기업가치에 미치는 영향 연구)

  • Kwon, Young jin;Jung, Woo-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.99-122
    • /
    • 2019
  • According to a recent IDC (International Data Corporation) report, by 2025 the total volume of data is estimated to reach 163 zettabytes, ten times that of 2016, and the main generators of information are shifting from consumers to corporations. The so-called wave of big data is arriving, and its aftermath affects entire industries and individual firms. Effective management of vast amounts of data is therefore more important than ever for firms. However, there have been no previous studies measuring the effects of big data investment, even though a number of previous studies have quantitatively measured the effects of IT investment. We therefore quantitatively analyze the effect of big data investment in order to assist firms' investment decision making. This study applied the event study methodology, which rests on the efficient market hypothesis, to measure the effect of firms' big data investments on the response of market investors. In addition, five sub-variables were set to analyze this effect in more depth: firm size, industry classification (finance and ICT), investment completion status, and vendor involvement. To measure the impact of big data investment announcements, data from 91 announcements from 2010 to 2017 were used, and the investment effect was observed empirically through changes in firm value immediately after disclosure. The announcement data were collected from the 'News' category of Naver, the largest portal site in Korea. When selecting target companies, we extracted disclosures of listed companies in the KOSPI and KOSDAQ markets, using the search keywords 'Big data construction', 'Big data introduction', 'Big data investment', 'Big data order', and 'Big data development'. The results of the empirical analysis are as follows. First, we found that the market value of the 91 publicly listed firms that announced big data investments increased by 0.92%; in particular, the market value of financial firms, non-ICT firms, and small-cap firms increased significantly. This can be interpreted as market investors perceiving a firm's big data investment positively. Second, the increase in the market value of financial firms and non-ICT firms after a big data investment announcement was shown to be statistically significant. Third, this study measured the effect of big data investment by firm size, classifying firms into the top 30% and the bottom 30% by market capitalization and excluding the middle group in order to maximize the difference; the analysis showed that the investment effect for the smaller firms was greater, and the difference between the two groups was clear. Fourth, one of the most distinctive features of this study is that the big data investment announcements were classified according to vendor involvement. We showed that the investment effect for the group with vendor involvement is very large, indicating that market investors view the participation of specialist big data vendors very positively. Last but not least, it is also interesting that market investors evaluate an announcement more positively when the big data investment is still planned rather than already completed. Applied to industry, it would therefore be effective, in terms of increasing market value, for a company to make a disclosure when it decides to invest in big data. Our study has an academic implication in that prior research on the impact of big data investment has been nonexistent, and a practical implication in that it can serve as a reference for business decision makers considering big data investment.
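A minimal sketch of the event-study calculation implied above: fit a market model over a pre-event estimation window, then measure the cumulative abnormal return around each announcement. The window lengths and column names are assumptions; the study's exact specification is not given in the abstract.

```python
# Minimal sketch of an event-study cumulative abnormal return (CAR) around an
# announcement date, using a simple market model fitted on an estimation window.
import numpy as np
import pandas as pd

def cumulative_abnormal_return(stock_ret: pd.Series, market_ret: pd.Series,
                               event_idx: int, est_window: int = 120,
                               event_window: int = 1) -> float:
    """CAR over [event_idx, event_idx + event_window) using R_i = alpha + beta * R_m."""
    est = slice(event_idx - est_window, event_idx)
    beta, alpha = np.polyfit(market_ret.iloc[est].to_numpy(),
                             stock_ret.iloc[est].to_numpy(), 1)
    evt = slice(event_idx, event_idx + event_window)
    expected = alpha + beta * market_ret.iloc[evt].to_numpy()
    return float((stock_ret.iloc[evt].to_numpy() - expected).sum())

# Usage (hypothetical): average the CAR over all 91 announcements and test
# whether the mean differs from zero.
# cars = [cumulative_abnormal_return(r_firm, r_market, t) for (r_firm, r_market, t) in events]
```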

Calculation of Damage to Whole Crop Corn Yield by Abnormal Climate Using Machine Learning (기계학습모델을 이용한 이상기상에 따른 사일리지용 옥수수 생산량에 미치는 피해 산정)

  • Ji Yung Kim;Jae Seong Choi;Hyun Wook Jo;Moonju Kim;Byong Wan Kim;Kyung Il Sung
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.43 no.1
    • /
    • pp.11-21
    • /
    • 2023
  • This study was conducted to estimate the damage to whole crop corn (WCC; Zea mays L.) under abnormal climate using machine learning with the Representative Concentration Pathway (RCP) 4.5 scenario and to present the damage through mapping. A total of 3,232 WCC records were collected, and climate data were obtained from the Korea Meteorological Administration's open meteorological data portal. The machine learning model used was DeepCrossing. Damage was calculated by the model using climate data from the automated synoptic observing system (ASOS, 95 sites) as the difference between the dry matter yield under normal climate (DMYnormal) and that under abnormal climate (DMYabnormal). The normal climate was defined as the 40 years of climate data corresponding to the years of the WCC data (1978-2017), and the levels of abnormal temperature and precipitation were set according to the RCP 4.5 standard. DMYnormal ranged from 13,845 to 19,347 kg/ha. The damage to WCC differed depending on the region and on the level of abnormal temperature and precipitation. The damage from abnormal temperature in 2050 and 2100 ranged from -263 to 360 and from -1,023 to 92 kg/ha, respectively, and the damage from abnormal precipitation in 2050 and 2100 ranged from -17 to 2 and from -12 to 2 kg/ha, respectively. The maximum damage was 360 kg/ha, under abnormal temperature in 2050. As the average monthly temperature increases, the DMY of WCC tends to increase. The damage calculated under the RCP 4.5 standard was mapped using QGIS. Although this study applied a scenario in which greenhouse gas reduction is carried out, additional research should be conducted using an RCP scenario in which greenhouse gas reduction is not performed.
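The damage definition above (DMYnormal minus DMYabnormal per ASOS site) can be expressed compactly as below, assuming a trained yield model with a scikit-learn-style predict method and per-site climate feature tables; the model and column names are placeholders, not the study's DeepCrossing implementation.

```python
# Minimal sketch of the damage calculation: predict dry matter yield (DMY)
# under normal and abnormal climate and take the per-site difference.
import pandas as pd

def calculate_damage(model, normal_climate: pd.DataFrame,
                     abnormal_climate: pd.DataFrame) -> pd.Series:
    """Damage = DMY_normal - DMY_abnormal, one value per ASOS site."""
    dmy_normal = model.predict(normal_climate)      # kg/ha under the 40-year normal climate
    dmy_abnormal = model.predict(abnormal_climate)  # kg/ha under RCP 4.5 abnormal temperature/precipitation
    return pd.Series(dmy_normal - dmy_abnormal,
                     index=normal_climate.index, name="damage_kg_per_ha")

# A positive value means yield loss; the per-site series can then be joined to
# site coordinates and exported for mapping in QGIS.
```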

Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.45-69
    • /
    • 2016
  • Recently, sentiment analysis using open Internet data has been actively performed for various purposes. As online communication channels become popular, companies try to capture public sentiment toward them from open online information sources. This research analyzes public sentiment toward the top 10 Korean companies using a multi-categorical sentiment lexicon. Whereas existing research on public sentiment measurement with big data classifies sentiment into dimensions, this research classifies public sentiment into multiple categories. The dimensional sentiment structure has commonly been applied in sentiment analysis because it is academically well established and has the clear advantage of capturing the degree of sentiment and the interrelation of the dimensions. However, the dimensional structure is less effective for measuring public sentiment because human sentiment is too complex to be divided into a few dimensions, and special training is needed for ordinary people to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training when they feel; they would not express their feelings in dimensional terms such as positive/negative or active/passive, but rather in categorical terms such as sadness, rage, or happiness. That is, the categorical approach to sentiment analysis is more natural than the dimensional approach. Accordingly, this research proposes a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. A multi-categorical sentiment structure classifies sentiments in the way ordinary people do, although it may contain some subjectivity. In this research, nine categories are used: 'Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom', and 'Pain'. To capture public sentiment toward the top 10 Korean companies, Internet news data on the companies were collected over the past 25 months from a representative Korean portal site. Based on sentiment words extracted in previous research, we created a sentiment lexicon and analyzed the frequency of those words in the news data. The frequency of each sentiment category was calculated as a ratio of the total number of sentiment words in order to rank the distributions. Sentiment comparisons among the top four companies, 'Samsung', 'Hyundai', 'SK', and 'LG', were visualized separately. As a next step, we tested hypotheses to demonstrate the usefulness of the multi-categorical sentiment lexicon, examining how effectively categorical sentiment can be used as a relative comparison index in cross-sectional and time-series analysis. To test its effectiveness as a cross-sectional comparison index, pairwise t-tests and Duncan tests were conducted. Two pairs of companies, 'Samsung' and 'Hanjin' and 'SK' and 'Hanjin', were chosen to test whether each categorical sentiment differed significantly in the pairwise t-tests; since the 'Sadness' category has the largest vocabulary, it was chosen for the Duncan test of whether the subgroups of companies differ significantly. Five sentiment categories for Samsung and Hanjin and four sentiment categories for SK and Hanjin were found to differ significantly, and in the 'Sadness' category six significantly different subgroups were identified. To test the effectiveness of the sentiment lexicon as a time-series comparison index, the 'nut rage' incident at Hanjin was selected as an example case: the term frequency of sentiment words in the month the incident happened was compared with that of the preceding month. The sentiment categories were regrouped into positive/negative sentiment to examine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the change in the word list for the 'Rage' sentiment was shown in more detail. As a result, there was a large before-and-after difference in the sentiment that ordinary people felt toward the company. Both hypotheses turned out to be statistically significant, so sentiment analysis in the business domain using multi-categorical sentiment lexicons has persuasive power. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when assessing public sentiment in a business environment.
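A minimal sketch of the frequency step described above: count lexicon words per sentiment category in a set of news texts and convert the counts to ratios. The three-category lexicon and the whitespace tokenizer are illustrative simplifications of the study's nine-category Korean lexicon.

```python
# Minimal sketch: multi-categorical sentiment word counting with ratio output.
from collections import Counter

lexicon = {  # category -> sentiment words (illustrative entries only)
    "Sadness": {"grief", "loss", "tears"},
    "Anger": {"outrage", "furious", "protest"},
    "Happiness": {"joy", "delight", "celebrate"},
}

def category_ratios(articles: list[str]) -> dict[str, float]:
    """Return each category's share of all matched sentiment words."""
    counts = Counter()
    for text in articles:
        tokens = text.lower().split()
        for category, words in lexicon.items():
            counts[category] += sum(1 for t in tokens if t in words)
    total = sum(counts.values()) or 1
    return {c: counts[c] / total for c in lexicon}

print(category_ratios(["Shareholders celebrate record profits despite protest at the gate"]))
```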

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.123-138
    • /
    • 2017
  • Since the stock market is driven by traders' expectations, studies have attempted to predict stock price movements through analysis of various sources of text data. To predict stock price movements, research has been conducted not only on the relationship between text data and stock price fluctuations, but also on trading stocks based on news articles and social media responses. Studies that predict stock price movements have also applied classification algorithms by constructing term-document matrices, in the same way as other text mining approaches. Because documents contain many words, it is better to select the words that contribute most when building a term-document matrix: based on word frequency, words with too little frequency or importance are removed, and words are also selected by measuring the degree to which they contribute to correctly classifying a document. The basic idea behind constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We then extract the words around each selected neutral word and use them to generate the term-document matrix. The neutral-word approach starts from the idea that stock movements are only weakly related to the presence of the neutral words themselves, while the words surrounding a neutral word are more likely to affect stock price movements; the resulting term-document matrix is then fed to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock, and then excluded from the selected words those that also appear in news articles about other stocks. Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. Three months of news data were used for training, and the remaining one month of news articles was applied to the model to predict the next day's stock price movements. We used SVM, boosting, and random forest to build models and predict the movements of stock prices. The stock market was open for a total of 80 days during the four months (2016/02/01-2016/05/31); the first 60 days were used as the training set and the remaining 20 days as the test set. The proposed neutral-word-based algorithm showed better classification performance than the sparsity-based word selection method. This study predicted stock price movements by collecting and analyzing news articles on the top 10 stocks by market capitalization, using a term-document-matrix-based classification model and comparing the performance of the existing sparsity-based word extraction method with the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news about other stocks to determine which words to extract: it removes not only the words that appear in both rising and falling cases but also the words that commonly appear in news about other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. A limitation of this study is that the prediction task was set up as classifying rises and falls, and the experiment was conducted only on the top ten stocks, which do not represent the entire stock market. In addition, it is difficult to demonstrate investment performance, because stock price fluctuation and profit rate may differ. Further research should therefore use more stocks and predict returns through trading simulation.
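The neutral-word idea above can be sketched as follows: keep only the words in a small window around each neutral word, build a term-document matrix from those context words, and train a classifier on next-day movement labels. The neutral words, window size, toy documents, and labels are all assumptions for illustration; the study used Korean news and compared SVM, boosting, and random forest.

```python
# Minimal sketch: build a term-document matrix from words surrounding chosen
# neutral words, then classify next-day stock movement.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

NEUTRAL_WORDS = {"announced", "reported"}   # hypothetical neutral terms
WINDOW = 3                                  # words kept on each side of a neutral word

def context_only(text: str) -> str:
    """Keep only tokens within WINDOW positions of any neutral word."""
    tokens = text.lower().split()
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in NEUTRAL_WORDS:
            keep.update(range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)))
    return " ".join(tokens[i] for i in sorted(keep))

train_docs = ["The firm announced a new battery plant overseas",
              "Analysts reported weak quarterly shipments again"]
train_labels = [1, 0]  # 1 = price up next day, 0 = down (toy labels)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform([context_only(d) for d in train_docs])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, train_labels)

X_test = vectorizer.transform([context_only("The firm announced record battery orders")])
print(clf.predict(X_test))
```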