• Title/Summary/Keyword: 개별 시스템 (individual system)

A Study on Industry-specific Sustainability Strategy: Analyzing ESG Reports and News Articles (산업별 지속가능경영 전략 고찰: ESG 보고서와 뉴스 기사를 중심으로)

  • WonHee Kim;YoungOk Kwon
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.287-316 / 2023
  • As the global energy crisis and the COVID-19 pandemic have emerged as social issues, there is a growing demand for companies to move away from profit-centric business models and embrace sustainable management that balances environmental, social, and governance (ESG) factors. ESG activities of companies vary across industries, and industry-specific weights are applied in ESG evaluations. Therefore, it is important to develop strategic management approaches that reflect the characteristics of each industry and the importance of each ESG factor. Additionally, as the focus on ESG disclosure strengthens, specific guidelines are needed to identify and report on the sustainable management activities of domestic companies. To understand corporate sustainability strategies, analyzing ESG reports and news articles by industry can help identify strategic characteristics in specific industries. However, each company has its own unique strategies and report structures, making it difficult to grasp detailed trends or action items. In our study, we analyzed ESG reports (2019-2021) and news articles (2019-2022) of six companies in the 'Finance,' 'Manufacturing,' and 'IT' sectors to examine the sustainability strategies of leading domestic ESG companies. Text mining techniques such as keyword frequency analysis and topic modeling were applied to identify management strategies and issues specific to each industry and each ESG element. The analysis revealed that in the 'Finance' sector, customer-centric management strategies and efforts to promote an inclusive culture within and outside the company were prominent. Strategies addressing climate change, such as carbon neutrality and expanding green finance, were also emphasized. In the 'Manufacturing' sector, the focus was on creating sustainable communities through occupational health and safety issues, sustainable supply chain management, low-carbon technology development, and eco-friendly investments to achieve carbon neutrality. 
In the 'IT' sector, there was a tendency to focus on technological innovation and digital responsibility to enhance social value through technology. Furthermore, the key issues identified in the ESG factors were as follows: under the 'Environmental' element, issues such as greenhouse gas and carbon emission management, industry-specific eco-friendly activities, and green partnerships were identified. Under the 'Social' element, key issues included social contribution activities through stakeholder engagement, supporting the growth and coexistence of members and partner companies, and enhancing customer value through stable service provision. Under the 'Governance' element, key issues were identified as strengthening board independence through the appointment of outside directors, risk management and communication for sustainable growth, and establishing transparent governance structures. The exploration of the relationship between ESG disclosures in reports and ESG issues in news articles revealed that the sustainability strategies disclosed in reports were aligned with the issues related to ESG disclosed in news articles. However, there was a tendency to strengthen ESG activities for prevention and improvement after negative media coverage that could have a negative impact on corporate image. Additionally, environmental issues were mentioned more frequently in news articles compared to ESG reports, with environmental-related keywords being emphasized in the 'Finance' sector in the reports. Thus, ESG reports and news articles shared some similarities in content due to the sharing of information sources. However, the impact of media coverage influenced the emphasis on specific sustainability strategies, and the extent of mentioning environmental issues varied across documents. Based on our study, the following contributions were derived. 
From a practical perspective, companies need to consider their characteristics and establish sustainability strategies that align with their capabilities and situations. From an academic perspective, unlike previous studies on ESG strategies, we present a subdivided methodology through analysis considering the industry-specific characteristics of companies.

Export Prediction Using Separated Learning Method and Recommendation of Potential Export Countries (분리학습 모델을 이용한 수출액 예측 및 수출 유망국가 추천)

  • Jang, Yeongjin;Won, Jongkwan;Lee, Chaerok
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.69-88 / 2022
  • One of the characteristics of South Korea's economic structure is that it is highly dependent on exports. Thus, many businesses are closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specializing in exports are struggling due to the spread of COVID-19. Therefore, this study aimed to develop a model that forecasts exports for the following year to support SMEs' export strategy and decision making. This study also proposed a strategy for recommending promising export countries for each item based on the forecasting model. We analyzed important variables used in previous studies, such as country-specific, item-specific, and macro-economic variables, and collected those variables to train our prediction model. Next, exploratory data analysis (EDA) showed that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we suggest a separated learning method. In a separated learning method, the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. Thus, the characteristics of each group can be learned more precisely using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on exports to decrease the skewness of the target variable. After the separation, we found that each group has different characteristics in countries and goods. For example, in Group 1, most of the exporting countries are developing countries and the majority of exported goods are low-value products such as glass and prints. On the other hand, major export destinations of South Korea such as China, the USA, and Vietnam are included in Group 4 and Group 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and Exponential Moving Average (EMA) for prediction. 
Considering the characteristics of each group, models were built using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the performance of the model, we compared different model structures and algorithms. As a result, the separated learning model was found to outperform the other models. After the model was built, we also provided the variable importance of each group using SHAP values to add explainability to our model. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first phase, the BCG matrix was used to find Star and Question Mark markets that are expected to grow rapidly. In the second phase, we calculated scores for each country and made recommendations according to ranking. Using this recommendation framework, potential export countries were selected and information about those countries was presented for each item. This study has several implications. First, most preceding studies have focused on a specific situation or country, whereas this study uses various variables and develops a machine learning model for a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction. By separating the dataset into 5 homogeneous subgroups, we could enhance the predictive performance of the model. A more detailed explanation of the models by group is also provided using SHAP values. Lastly, this study has several practical implications. There are some platforms that provide trade information, including KOTRA, but most of them are based on past data. Therefore, it is not easy for companies to predict future trends. By utilizing the model and recommendation strategy in this research, trade-related services on each platform can be improved so that companies, including SMEs, can fully utilize the service when making strategies and decisions for exports.
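
The separated learning idea in the abstract above (route each series to a subgroup by export size, then apply that subgroup's predictor) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the cut points in `THRESHOLDS` are hypothetical, and `mean_forecast` is a simple stand-in for the LGBM models that the abstract reports for Groups 1 to 4. Only the EMA recurrence is a standard formula.

```python
from bisect import bisect_right


def assign_group(export_value, thresholds):
    """Return a 1-based subgroup index; thresholds must be sorted ascending."""
    return bisect_right(thresholds, export_value) + 1


def ema_forecast(history, alpha=0.5):
    """Exponential moving average of the series, used as the next-period forecast."""
    if not history:
        raise ValueError("history must not be empty")
    ema = history[0]
    for value in history[1:]:
        ema = alpha * value + (1 - alpha) * ema
    return ema


def mean_forecast(history):
    """Stand-in for a learned model (the abstract uses LGBM for Groups 1 to 4)."""
    return sum(history) / len(history)


def separated_forecast(history, thresholds, models):
    """Pick the subgroup from the latest export value, then apply its model."""
    group = assign_group(history[-1], thresholds)
    return group, models[group](history)


# Hypothetical cut points that split export values into five subgroups.
THRESHOLDS = [10, 100, 1_000, 10_000]
MODELS = {1: mean_forecast, 2: mean_forecast, 3: mean_forecast,
          4: mean_forecast, 5: ema_forecast}

if __name__ == "__main__":
    print(separated_forecast([2, 4], THRESHOLDS, MODELS))
    print(separated_forecast([20_000, 30_000], THRESHOLDS, MODELS))
```

Keeping the routing step separate from the per-group models makes it easy to swap in a different predictor for a single group without touching the others.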

A Study on the Operation Plan of the Gangwon-do Disaster Management Resources Integrated Management Center (강원도 재난관리자원 통합관리센터 운영방안에 관한 연구)

  • Hang-Il Jo;Sang-Beom Park;Kye-Won Jun
    • Journal of Korean Society of Disaster and Security / v.17 no.1 / pp.9-16 / 2024
  • In Korea, as disasters become larger and more complex, the focus of disaster management is shifting from response and recovery to prevention and preparedness. To prevent and prepare for disasters, each local government stockpiles and manages disaster management resources. However, although these resources are stored in individual warehouses, they are managed by department rather than by warehouse, and the heavy workload of those in charge results in insufficient management. To manage these resources intensively, integrated disaster management resource management centers are established and operated at the metropolitan/provincial level. In Gangwon-do, the subject of this study, a rented warehouse is operated as the integrated disaster management resource management center. A leased integrated management center has the inconvenience of having to relocate every 1 to 2 years, so it is deemed necessary to build a dedicated facility on an available site. To select a candidate location, network analysis was used to measure access to and use of facilities along the interconnected routes of networks such as roads and railways. In the network analysis, the Location-Allocation method, which has been widely used to determine the locations of multiple facilities, was applied. As a result, Hoengseong-gun in Gangwon-do was identified as a suitable candidate site. In addition, if the integrated management center uses Korea's logistics system to stockpile disaster management resources, local governments can mobilize disaster management resources in 3 days, and it reportedly takes 3 days to return to normal life after a disaster occurs. Each city's disaster management resource stockpile is 3 days' worth per week, and the integrated management center stores 3 times the maximum of the city's 4-day stockpile.
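
As a rough illustration of the location-allocation idea mentioned in the abstract above, the sketch below picks the single candidate site with the lowest demand-weighted travel cost. It is a simplified single-facility version under stated assumptions, not the study's analysis: the site names, city names, demand weights, and travel costs are all hypothetical, and a real network analysis would derive the costs from road and rail routes.

```python
def weighted_cost(site, demand, travel_cost):
    """Total demand-weighted travel cost from one candidate site to every city."""
    return sum(weight * travel_cost[site][city] for city, weight in demand.items())


def best_site(candidates, demand, travel_cost):
    """Return the candidate site with the lowest total weighted travel cost."""
    return min(candidates, key=lambda site: weighted_cost(site, demand, travel_cost))


# Hypothetical demand weights (for example, relative population) per city.
DEMAND = {"city_1": 1, "city_2": 2, "city_3": 1}

# Hypothetical network travel costs (for example, minutes) from site to city.
TRAVEL_COST = {
    "site_A": {"city_1": 10, "city_2": 40, "city_3": 30},
    "site_B": {"city_1": 25, "city_2": 15, "city_3": 20},
}

if __name__ == "__main__":
    for site in TRAVEL_COST:
        print(site, weighted_cost(site, DEMAND, TRAVEL_COST))
    print("selected:", best_site(list(TRAVEL_COST), DEMAND, TRAVEL_COST))
```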

Work & Life Balance and Conflict among Employees : Work-life Balance Effect that Reflects Work Characteristics (일·생활 균형과 구성원간 갈등관계 : 직장 내 업무 특성을 반영한 WLB 효과 중심으로)

  • Lee, Yang-pyo;Choi, Chang-bum
    • Journal of Venture Innovation / v.7 no.1 / pp.183-200 / 2024
  • Recently, with the MZ generation's entry into society and the social participation of women, conflicts are occurring between workplace groups that value work-life balance (WLB) and existing groups that emphasize collaboration, owing to differences in work orientation. Public institutions and companies that utilize work-life balance support systems show differences in job commitment depending on the nature of the work and how actively the support system is used. Accordingly, it is necessary to verify the effectiveness of the WLB support systems actually operated by companies and to present universally valid standards. The purpose of this study is, first, to verify the effectiveness of support systems for work-life balance and to find practical consensus amid changes in policies and perceptions of the working environment. Second, the relationship between the level of work-life balance and job commitment according to work characteristics was analyzed to verify their mutual influence, in order to establish standards for WLB operation that reflect work characteristics. For the study, a 2X2 matrix model was used to analyze the impact of work-life balance and work characteristics on job commitment, and four hypotheses were established. The hypotheses concerned the job commitment levels of, first, conflict-type group members; second, leading group members; third, agreeable group members; and fourth, cooperative group members. To conduct this study, an online survey was conducted targeting employees working in public institutions and large corporations. The survey ran for a total of 9 days, from October 23 to 31, 2023. A total of 163 people responded, and the analysis was based on a valid sample of 152 people after excluding 11 responses that were insincere or abandoned midway. 
The hypothesis testing produced the following results. First, the conflict-type group had the lowest level of job commitment, at 1.43. Second, the leading group showed the highest level of job commitment, at 4.54. Third, the agreeable group showed a slightly lower level of job commitment, at 2.58. Fourth, the cooperative group showed a slightly higher level of job commitment, at 3.80. The academic implication of the study is that it subdivides employees' personalities into factors based on the level of work-life balance and the nature of work. The practical implication is that it analyzes, by group, the effectiveness of the WLB support systems operated by public institutions and large corporations.

Policy Direction for The Farmland Sizing Suitable to Regional Trait (지역특성을 반영한 영농규모화사업의 발전방향-충남지역을 중심으로-)

  • Shim, Jae-Sung
    • The Journal of Natural Sciences / v.14 no.1 / pp.83-121 / 2004
  • This study was carried out to examine how solid the production foundation of rice in Chung-Nam Province is and, where it is not, to explore alternative measures based on the size of farms specializing in rice, whose direction would be pivotal to rice-industry policy. The results obtained can be summarized as follows : 1. Rice production in Chung-Nam Province is the highest in Korea, and its paddy field area is the second largest, which implies that rice production in Chung-Nam Province may be severely influenced by global market trends. Farms specializing in rice, which are becoming the core group of rice farming, account for 7.7 percent of the total number of farm households in Korea. The average per-area financial support that the Government provided to farm households had a noticeable effect on improving the farm-size program policy. 2. The farm-size program in Chung-Nam Province, carried out from 1980 to 2002, increased the cultivated paddy field area to 19,484 hectares. The program promoted the buying and selling of farmland, and farmland transactions reached 6,431 households and 16,517 hectares in 1995-2002. Meanwhile, long-term letting and hiring of farmland was so active that the acreage involved reached 6,970 hectares across 7,059 farm households. However, the farm-exchange-and-unity program did not meet expectations, because retiring farm operators were reluctant to sell their farms. Another reason for the delay in farm transactions was the social complications that accompany the exchange and consolidation of scattered farms. Such difficulties would negatively affect the attainment of the farm-size program's targets in general. 3. The following measures were presented to advance the farm-size promotion program : a. 
An occupation shift project, followed by a social security program for retiring and elderly farm operators, should be promptly established, and a number of incentives for promoting the letting and hiring of farmland and the farm-exchange-and-unity program should also be set up. b. To establish an effective core system of rice production, all farm operators should increase the unit area yield of rice and lower the production cost. To do so, many rice production teams equipped with managerial techniques and capabilities need to be organized, along with an appropriate array of facilities including an information system. This plan should be in line with the various structural measures for regional integration based on farm system building. c. To extend farm size and improve farm management, individual farm sizes should be enlarged for maximized management and the farm-size grouping method should be utilized. In conclusion, the farm-size project in Chung-Nam Province, which has continued since the 1980s, can be said to have been satisfactorily achieved. However, many problems remain to be solved to break down the barriers to attaining the desired farm-size operation. The farm-size project is closely related to farm specialization in rice; thus, positive support for farm households, including an integrated program for both retiring farmers and off-farm operators, should be considered to pursue the progressive development of the farm-size program, which is a key means of successfully strengthening rice farming in Chung-Nam Province.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, growing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, advances in IT and the increased penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis have been continuously increasing. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request the analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many programs for data analysis. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by those who need the analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and particular attention is focused on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Of the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of documents. 
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, it is essential to analyze the entire document set at once to identify the topic of each document. This condition leads to long analysis times when topic modeling is applied to a large number of documents. In addition, it has a scalability problem: processing time increases exponentially as the number of analysis objects increases. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by repeatedly applying topic modeling to each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without first being combined. However, despite its many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear. This means that local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established. That is, assuming that the global topic is the ideal answer, the deviation of a local topic from the global topic needs to be measured. Because of these difficulties, this method has not been studied sufficiently compared with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach to solve the above two problems. 
First, we divide the entire document cluster (global set) into sub-clusters (local sets), and generate a reduced entire document cluster (RGS, reduced global set) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. Along with this, we verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the results for the global and local sets. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. In addition, through an additional experiment, we confirmed that the proposed methodology can provide results similar to those of topic modeling on the entire set. We also proposed a reasonable method for comparing the results of both methods.
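
The mapping step described above (linking each local topic to a topic from the reduced global set) can be sketched in a few lines. The abstract does not say which similarity measure the authors use, so cosine similarity over term weights is an assumption here, and the topic names and weights are made-up toy data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two topics given as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_local_to_global(local_topics, global_topics):
    """Map each local topic to the most similar global (RGS) topic."""
    mapping = {}
    for local_name, local_terms in local_topics.items():
        mapping[local_name] = max(
            global_topics,
            key=lambda g: cosine(local_terms, global_topics[g]),
        )
    return mapping


# Toy data: topic names and term weights are hypothetical.
GLOBAL_TOPICS = {
    "G_economy": {"market": 0.6, "stock": 0.4},
    "G_sports": {"game": 0.7, "team": 0.3},
}
LOCAL_TOPICS = {
    "L1_0": {"stock": 0.5, "market": 0.3, "bank": 0.2},
    "L2_0": {"team": 0.6, "game": 0.4},
}

if __name__ == "__main__":
    print(map_local_to_global(LOCAL_TOPICS, GLOBAL_TOPICS))
```

Once every local topic has a global label, documents clustered in different local sets can be compared under the same global topic, which is the kind of check the accuracy verification in the abstract relies on.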

Quality Assurance for Intensity Modulated Radiation Therapy (세기조절방사선치료(Intensity Modulated Radiation Therapy; IMRT)의 정도보증(Quality Assurance))

  • Cho Byung Chul;Park Suk Won;Oh Do Hoon;Bae Hoonsik
    • Radiation Oncology Journal / v.19 no.3 / pp.275-286 / 2001
  • Purpose : To set up quality assurance (QA) procedures for implementing intensity modulated radiation therapy (IMRT) clinically, and to report the QA procedures performed for one patient with prostate cancer. Materials and methods : $P^3IMRT$ (ADAC) and a linear accelerator (Siemens) with a multileaf collimator (MLC) were used to implement IMRT. First, the positional accuracy and reproducibility of the MLC and the leaf transmission factor were evaluated. RTP commissioning was performed again to account for the small field effect. After RTP recommissioning, a test plan for a C-shaped PTV was made using 9 intensity modulated beams, and the calculated isocenter dose was compared with the dose measured in a solid water phantom. For patient-specific IMRT QA, one patient with prostate cancer was planned using 6 beams with a total of 74 segmented fields. The same beams were used to recalculate the dose in a solid water phantom. The doses of these beams were measured with a 0.015 cc micro-ionization chamber, a diode detector, films, and an array detector and compared with the calculated ones. Results : The positioning accuracy of the MLC was about 1 mm, and the reproducibility was around 0.5 mm. For the leaf transmission factor for 10 MV photon beams, interleaf leakage was measured at $1.9\%$ and midleaf leakage at $0.9\%$ relative to a $10\times\;cm^2$ open field. Penumbra measurements with film, a diode detector, a micro-ionization chamber, and a conventional 0.125 cc chamber showed that the $80\%$-$20\%$ penumbra width measured with the 0.125 cc chamber was 2 mm larger than that measured with film, which means that a 0.125 cc ionization chamber is unacceptable for measuring small fields such as a 0.5 cm beamlet. After RTP recommissioning, the discrepancy between the measured and calculated dose profiles for a small field of $1\times1\;cm^2$ was less than $2\%$. 
The isocenter dose of the C-shaped PTV test plan, measured twice with a micro-ionization chamber in a solid phantom, showed errors of up to $12\%$ for individual beams, but the total delivered dose agreed with the calculated dose within $2\%$. The transverse dose distribution measured with EC-L film agreed with the calculated one in general. The isocenter dose for the patient measured in the solid phantom agreed within $1.5\%$. On-axis dose profiles of each individual beam at the position of the central leaf, measured with film and the array detector, showed that the calculated dose underestimated the dose by about $2\%$ in the out-of-field region, while inside the field the measured dose agreed within $3\%$, except at some positions. Conclusion : Tighter quality control of the MLC is necessary for IMRT than for conventional large-field treatment, and QA procedures need to be developed to check intensity patterns more efficiently. In conclusion, we set up appropriate QA procedures for IMRT through a series of verifications, including measurement of the absolute dose at the isocenter with a micro-ionization chamber, film dosimetry for verifying the intensity pattern, and a further measurement with an array detector for comparing off-axis dose profiles.

A Study on the Present Condition and Improvement of Cultural Heritage Management in Seoul - Based on the Results of Regular Surveys (2016~2018) - (서울특별시 지정문화재 관리 현황 진단 및 개선방안 연구 - 정기조사(2016~2018) 결과를 중심으로 -)

  • Cho, Hong-seok;Suh, Hyun-jung;Kim, Ye-rin;Kim, Dong-cheon
    • Korean Journal of Heritage: History & Science / v.52 no.2 / pp.80-105 / 2019
  • With the increasing complexity and irregularity of disaster types, the need for cultural asset preservation and management from a proactive perspective has increased as a number of cultural properties have been destroyed and damaged by various natural and humanistic factors. In consideration of these circumstances, the Cultural Heritage Administration enacted an Act in December 2005 to enforce the regular commission of surveys for the systematic preservation and management of cultural assets, and through a recent revision of this Act, the investigation cycle has been reduced from five to three years, and the object of regular inspections has been expanded to cover registered cultural properties. According to the ordinance, a periodic survey of city- or province-designated heritage is to be carried out mainly by metropolitan and provincial governments. The Seoul Metropolitan Government prepared a legal basis for commissioning regular surveys under the Seoul Special City Cultural Properties Protection Ordinance 2008 and, in recognition of the importance of preventive management due to the large number of cultural assets located in the city center and the high demand for visits, conducted regular surveys of the entire city-designated cultural assets from 2016 to 2018. Upon the first survey being completed, it was considered necessary to review the policy effectiveness of the system and to conduct a comprehensive review of the results of the regular surveys that had been carried out to enhance the management of cultural assets. Therefore, the present study examined the comprehensive management status of the cultural assets designated by the Seoul Metropolitan Government for three years (2016-2018), assessing the performance and identifying limitations. Additionally, ways to improve it were sought, and a DB establishment plan for the establishment of an integrated management system under the auspices of the Seoul Metropolitan Government was proposed. 
Specifically, survey forms were administered under the Guidelines for the Operation of Periodic Surveys of National Designated Cultural Assets; however, the types of survey forms were reclassified and further subdivided in consideration of the characteristics of the designated cultural assets, and manuals were developed for consistent and specific information technologies in respect of the scope and manner of the survey. Based on this analysis, it was confirmed that 401 cases (77.0%) out of 521 cases were generally well preserved; however, 102 cases (19.6%) were found to require special measures such as attention, precision diagnosis, and repair. Meanwhile, there were 18 cases (3.4%) of unsurveyed cultural assets. These were inaccessible to the investigation at this time due to reasons such as unknown location or closure to the public. Regarding the specific types of cultural assets, among a total of 171 cultural real estate properties, 63 cases (36.8%) of structural damage were caused by the failure and elimination of members, and 73 cases (42.7%) of surface area damage were the result of biological damage. Almost all plants and geological earth and scenic spots were well preserved. In the case of movable cultural assets, 25 cases (7.1%) among 350 cases were found to have changed location, and structural damage and surface area damage was found according to specific material properties, excluding ceramics. In particular, papers, textiles, and leather goods, with material properties that are vulnerable to damage, were found to have greater damage than those of other materials because they were owned and managed by individuals and temples. Thus, it has been confirmed that more proactive management is needed. Accordingly, an action plan for the comprehensive preservation and management status check shall be developed according to management status and urgency, and the project promotion plan and the focus management target should be selected and managed first. 
In particular, concerning movable cultural assets, there have been some cases in which new locations have gone unreported after changes in ownership (management); therefore, a new system is required to strengthen the obligation to report changes in ownership (management) or location. Based on the current status diagnosis and improvement measures, it is expected that the foundation of a proactive and efficient cultural asset management system can be realized through the establishment of an effective mid- to long-term database of the integrated management system pursued by the Seoul Metropolitan Government.

Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site (사용자 리뷰의 평가기준 별 이슈 식별 방법론: 호텔 리뷰 사이트를 중심으로)

  • Byun, Sungho;Lee, Donghoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.23-43 / 2016
  • As a result of the growth of Internet data and the rapid development of Internet technology, "big data" analysis has gained prominence as a major approach for evaluating and mining enormous data for various purposes. Especially, in recent years, people tend to share their experiences related to their leisure activities while also reviewing others' inputs concerning their activities. Therefore, by referring to others' leisure activity-related experiences, they are able to gather information that might guarantee them better leisure activities in the future. This phenomenon has appeared throughout many aspects of leisure activities such as movies, traveling, accommodation, and dining. Apart from blogs and social networking sites, many other websites provide a wealth of information related to leisure activities. Most of these websites provide information of each product in various formats depending on different purposes and perspectives. Generally, most of the websites provide the average ratings and detailed reviews of users who actually used products/services, and these ratings and reviews can actually support the decision of potential customers in purchasing the same products/services. However, the existing websites offering information on leisure activities only provide the rating and review based on one stage of a set of evaluation criteria. Therefore, to identify the main issue for each evaluation criterion as well as the characteristics of specific elements comprising each criterion, users have to read a large number of reviews. In particular, as most of the users search for the characteristics of the detailed elements for one or more specific evaluation criteria based on their priorities, they must spend a great deal of time and effort to obtain the desired information by reading more reviews and understanding the contents of such reviews. 
Although some websites break down the evaluation criteria and direct users to enter their reviews under different levels of criteria, the excessive number of input sections makes the whole process inconvenient. Further, problems arise if a user does not follow the instructions for the input sections or fills in the wrong section. Finally, treating the breakdown of evaluation criteria as a realistic alternative is difficult, because identifying all the detailed elements of each evaluation criterion is a challenging task. For example, when reviewing a hotel, people tend to write only single-level reviews of components such as accessibility, rooms, services, or food. Such reviews may touch on the most frequently asked questions, such as the distance to the nearest subway station or the condition of the bathroom, but they still lack detailed information on these questions. In addition, when a breakdown of the evaluation criteria is provided along with various input sections, a user might fill in only the accessibility criterion, or enter the wrong information, such as comments about rooms under the accessibility criterion. The reliability of the segmented reviews is thus greatly reduced. In this study, we propose an approach to overcome two limitations of existing leisure activity information websites: (1) the low reliability of reviews for each evaluation criterion and (2) the difficulty of identifying the detailed contents that make up each evaluation criterion. In our proposed methodology, we first examine the review content and construct a lexicon for each evaluation criterion from the terms frequently used for that criterion. Next, the sentences in the review documents that contain terms in the constructed lexicon are decomposed into review units, which are then regrouped by evaluation criterion. 
Finally, the issues in the review units for each evaluation criterion are derived, and the summary results are provided. The review units themselves are provided alongside the derived issues. This approach therefore aims to save users time and effort, because they read only the information relevant to each evaluation criterion rather than the entire text of every review. Our proposed methodology is based on topic modeling, which is actively used in text analysis. Each review is decomposed into sentence units rather than treated as a single document. After decomposition, the review units are reorganized according to each evaluation criterion and then used in the subsequent analysis. In this respect, this work differs substantially from existing topic modeling-based studies. In this paper, we collected 423 reviews from hotel information websites and decomposed them into 4,860 review units. We then reorganized the review units according to six evaluation criteria. By applying our methodology to these review units, we present the analysis results and demonstrate the utility of the proposed methodology.
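
The decomposition step described in this abstract (splitting reviews into sentence-level review units and regrouping them by evaluation criterion through a lexicon) can be illustrated with a minimal sketch. The lexicon contents, the criterion names, and the function names `split_sentences` and `group_review_units` are all hypothetical and chosen for illustration; the abstract does not give the authors' actual lexicon or implementation, and the topic modeling step that derives the issues is not shown.

```python
import re
from collections import defaultdict

# Hypothetical lexicon (criterion -> frequently used terms). The study builds
# its lexicon from the review data; these entries are illustrative only.
LEXICON = {
    "accessibility": {"subway", "station", "airport", "walk"},
    "rooms": {"room", "bed", "bathroom"},
    "food": {"breakfast", "restaurant", "buffet"},
}


def split_sentences(review):
    """Split a review into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def group_review_units(reviews, lexicon):
    """Assign each sentence (review unit) to every criterion whose lexicon it hits."""
    units = defaultdict(list)
    for review in reviews:
        for sentence in split_sentences(review):
            tokens = set(re.findall(r"[a-z]+", sentence.lower()))
            for criterion, terms in lexicon.items():
                if tokens & terms:  # sentence shares at least one lexicon term
                    units[criterion].append(sentence)
    return dict(units)


if __name__ == "__main__":
    sample = [
        "The subway station is a five minute walk away. "
        "The bed was comfortable but the bathroom was small.",
        "Breakfast was great!",
    ]
    for criterion, sentences in group_review_units(sample, LEXICON).items():
        print(criterion, sentences)
```

In this sketch a sentence that mentions terms from several criteria is placed under each of them; whether the original method allows such overlap is not stated in the abstract.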

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.123-138
    • /
    • 2017
  • Since the stock market is driven by the expectations of traders, studies have sought to predict stock price movements by analyzing various sources of text data. Research has addressed not only the relationship between text data and stock price fluctuations, but also stock trading based on news articles and social media responses. Like other text mining approaches, studies that predict stock price movements have applied classification algorithms to a term-document matrix. Because documents contain many words, it is better to select the words that contribute most when building the term-document matrix. Words whose frequency or importance is too low are removed. Words are also selected by measuring how much each one contributes to correctly classifying a document. The conventional approach to constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We then extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The idea behind the neutral words is that stock price movements have little relation to the presence of the neutral words themselves, whereas the words surrounding them are more likely to affect stock price movements. The generated term-document matrix is then fed to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock. Among the selected words, we then excluded those that also appear in news articles for other stocks. 
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used three months of news articles as training data and applied the model to the remaining month of articles to predict the next day's stock price movements. We used SVM, boosting, and random forest to build the models and predict stock price movements. The stock market was open for a total of 80 days over the four months (2016/02/01 ~ 2016/05/31); we used the first 60 days as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than the word selection method based on sparsity. This study predicted stock price movements by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations and compared the performance of the existing sparsity-based word extraction method with that of the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news articles for other stocks to determine which words to extract. In other words, it removed not only the words that appeared in both the rise and fall categories but also the words that commonly appeared in the news for other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. The limitations of this study are that the prediction task was set up only to classify rises and falls, and that the experiment was conducted only for the top ten stocks. The 10 stocks used in the experiment do not represent the entire stock market. In addition, it is difficult to demonstrate investment performance, because stock price fluctuations and rates of return may differ. 
Therefore, further research is needed that uses more stocks and predicts returns through trading simulation.
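
The idea of building a term-document matrix from the words around neutral words can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how neutrality is measured or how wide the context is, so the neutrality criterion (similar document frequency in "up" and "down" articles), the `tolerance` and `window` parameters, and all function names are hypothetical. The additional step of excluding words that appear in news for other stocks is not shown.

```python
from collections import Counter


def select_neutral_words(docs, labels, tolerance=0.1):
    """Assumed criterion: a word is neutral if it occurs in 'up' and 'down'
    articles at nearly the same document rate (difference <= tolerance)."""
    up = [set(d.split()) for d, y in zip(docs, labels) if y == "up"]
    down = [set(d.split()) for d, y in zip(docs, labels) if y == "down"]
    if not up or not down:
        return set()
    neutral = set()
    for word in set().union(*up, *down):
        up_rate = sum(word in d for d in up) / len(up)
        down_rate = sum(word in d for d in down) / len(down)
        if up_rate > 0 and down_rate > 0 and abs(up_rate - down_rate) <= tolerance:
            neutral.add(word)
    return neutral


def context_terms(tokens, neutral, window=2):
    """Keep non-neutral tokens lying within `window` positions of a neutral word."""
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in neutral:
            keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    return [tokens[i] for i in sorted(keep) if tokens[i] not in neutral]


def term_document_matrix(docs, neutral, window=2):
    """Return (vocabulary, count rows) built only from the context terms."""
    rows = [Counter(context_terms(d.split(), neutral, window)) for d in docs]
    vocab = sorted(set().union(*rows))
    return vocab, [[row[t] for t in vocab] for row in rows]


if __name__ == "__main__":
    docs = [
        "samsung shares surge on strong earnings",
        "samsung shares plunge on weak demand",
    ]
    labels = ["up", "down"]
    neutral = select_neutral_words(docs, labels)
    vocab, matrix = term_document_matrix(docs, neutral, window=1)
    print(sorted(neutral), vocab, matrix)
```

The resulting count rows could then be passed, together with the rise/fall labels, to a classifier such as the SVM, boosting, or random forest models named in the abstract.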