• Title/Summary/Keyword: 표본수 결정

Search Result 446, Processing Time 0.025 seconds

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.237-262
    • /
    • 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and the sentence classification task on the training set was performed after reading each sentence in person. Then, based on the training set, a Support Vector Machine model was utilized to predict the tone of sentences in the test set. Finally, the machine learning model calculated the percentages of positive, neutral, and negative sentences in each prospectus. To predict the price movement of an IPO stock, four different machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables using technical analysis and prospectus tone variables together show higher accuracy than models that use only quantitative variables. More specifically, the prediction accuracy was improved by 1.45% points in the Random Forest model, 4.34% points in the Artificial Neural Network model, and 5.07% points in the Support Vector Machine model. After testing the performance of these machine learning techniques, the Artificial Neural Network model using both quantitative variables and prospectus tone variables was the model with the highest prediction accuracy rate, which was 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify the statistically significant difference between the models. The model using only quantitative variables and the model using both the quantitative variables and the prospectus tone variables were compared, and it was confirmed that the predictive performance improved significantly at a 1% significance level.

Analysis of Factors Affecting the Length of Stay in Children(Aged 0 to 12) with Injuries: Centering Around the Data from the Korea National Hospital Discharge In-Depth Injury Surveys (어린이(0-12세) 손상환자의 재원일수에 미치는 요인분석: 퇴원손상심층자료를 중심으로)

  • Lee Chae Kyung
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.3
    • /
    • pp.137-143
    • /
    • 2023
  • This study was conducted to analyze factors affecting the length of stay in children with injuries by determining relationships between length of stay and characteristics of children(aged 0 to 12) with injuries. 7,804 patients aged 0 to 12 who participated in the Korea Nation Hospital Discharge In-Depth Injury Surveys, got a diagnosis of sequelae of injuries and of other consequences of external causes(S00-T98), and were discharged between 1 January 2016 and 31 December 2020 were investigated. A frequency analysis, independent samples t-test, and ANOVA were performed. Also, to identify factors affecting the length of stay, a regression analysis was performed. The average length of stay for the patients investigated in this study was 5.5 days. The length of stay for school-age children(aged 7 to 12) and children who had either public or private coverage was higher than that for preschoolers(aged 0 to 6) and children who didn't have public or private coverage, respectively. The length of stay for children admitted to a hospital in a rural area(Jeolla-do or Gyeongsang-do) was higher than that for children admitted to a hospital in a metropolitan area and the length of stay for children admitted to a hospital that had 100-299 hospital beds was relatively long. However, children who first visited a hospital for outpatient care stayed relatively short in hospital and children who had been burned or injured in traffic crashes stayed relatively long in hospital. Children who got a secondary diagnosis and had a principal procedure or who died after being discharged were in hospital for a long time. The findings of this study shall be useful, as they identified characteristics related to the length of stay for Korean children with injuries and factors that determine the length of stay for those children by analyzing the national dataset, or more specifically, the data from the Korea National Hospital Discharge In-Depth Injury Surveys. The risk of child injuries can be easily reduced by taking actions to prevent them and providing safety education programs. The present study has provided essential baseline data for the provision of aggressive care for child injuries and the establishment of a range of policies for child injury prevention.

Effect of the Angle of Ventricular Septal Wall on Left Anterior Oblique View in Multi-Gated Cardiac Blood Pool Scan (게이트 심장 혈액풀 스캔에서 심실중격 각도에 따른 좌전사위상 변화에 대한 연구)

  • You, Yeon Wook;Lee, Chung Wun;Seo, Yeong Deok;Choi, Ho Yong;Kim, Yun Cheol;Kim, Yong Geun;Won, Woo Jae;Bang, Ji-In;Lee, Soo Jin;Kim, Tae-Sung
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.20 no.1
    • /
    • pp.13-19
    • /
    • 2016
  • Purpose In order to calculate the left ventricular ejection fraction (LVEF) accurately, it is important to acquire the best septal view of left ventricle in the multi-gated cardiac blood pool scan (GBP). This study aims to acquire the best septal view by measuring angle of ventricular septal wall (${\theta}$) using enhanced CT scan and compare with conventional method using left anterior oblique (LAO) 45 view. Materials and Methods From March to July in 2015, we analyzed the 253 patients who underwent both enhanced chest CT and GBP scan in the department of nuclear medicine at National Cancer Center. Angle (${\theta}$) between ventricular septum and imaginary midline was measured in transverse image of enhanced chest CT scan, and the patients whose difference between the angle of ${\theta}$ and 45 degree was more than 10 degrees were included. GBP scan was acquired using both LAO 45 and LAO ${\theta}$ views, and LVEFs measured by automated and manual region of interest (Auto-ROI and Manual-ROI) modes respectively were analyzed. Results $Mean{\pm}SD$ of ${\theta}$ on total 253 patients was $37.0{\pm}8.5^{\circ}$. Among them, the patients whose difference between 45 and ${\theta}$ degrees were more than ${\pm}10$ degrees were 88 patients ($29.3{\pm}6.1^{\circ}$). In Auto-ROI mode, there was statistically significant difference between LAO 45 and LAO ${\theta}$ (LVEF $45=62.0{\pm}6.6%$ vs. LVEF ${\theta}=64.0{\pm}5.6%$; P = 0.001). In Manual-ROI mode, there was also statistically significant difference between LAO 45 and LAO ${\theta}$ (LVEF $45=66.7{\pm}7.2%$ vs. LVEF ${\theta}=69.0{\pm}6.4%$; P < 0.001). Intraclass correlation coefficients of both methods were more than 95%. In case of comparison between Auto-ROI and Manual ROI of each LAO 45 and LAO ${\theta}$, there was no significant difference statistically. Conclusion We could measure the angle of ventricular septal wall accurately by using transverse image of enhanced chest CT and applied to LAO acquisition in the GBP scan. It might be the alternative method to acquire the best septal view of LAO effectively. We could notify significant difference between conventional LAO 45 and LAO ${\theta}$ view.

  • PDF

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.

The Characteristics and Performances of Manufacturing SMEs that Utilize Public Information Support Infrastructure (공공 정보지원 인프라 활용한 제조 중소기업의 특징과 성과에 관한 연구)

  • Kim, Keun-Hwan;Kwon, Taehoon;Jun, Seung-pyo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.1-33
    • /
    • 2019
  • The small and medium sized enterprises (hereinafter SMEs) are already at a competitive disadvantaged when compared to large companies with more abundant resources. Manufacturing SMEs not only need a lot of information needed for new product development for sustainable growth and survival, but also seek networking to overcome the limitations of resources, but they are faced with limitations due to their size limitations. In a new era in which connectivity increases the complexity and uncertainty of the business environment, SMEs are increasingly urged to find information and solve networking problems. In order to solve these problems, the government funded research institutes plays an important role and duty to solve the information asymmetry problem of SMEs. The purpose of this study is to identify the differentiating characteristics of SMEs that utilize the public information support infrastructure provided by SMEs to enhance the innovation capacity of SMEs, and how they contribute to corporate performance. We argue that we need an infrastructure for providing information support to SMEs as part of this effort to strengthen of the role of government funded institutions; in this study, we specifically identify the target of such a policy and furthermore empirically demonstrate the effects of such policy-based efforts. Our goal is to help establish the strategies for building the information supporting infrastructure. To achieve this purpose, we first classified the characteristics of SMEs that have been found to utilize the information supporting infrastructure provided by government funded institutions. This allows us to verify whether selection bias appears in the analyzed group, which helps us clarify the interpretative limits of our study results. Next, we performed mediator and moderator effect analysis for multiple variables to analyze the process through which the use of information supporting infrastructure led to an improvement in external networking capabilities and resulted in enhancing product competitiveness. This analysis helps identify the key factors we should focus on when offering indirect support to SMEs through the information supporting infrastructure, which in turn helps us more efficiently manage research related to SME supporting policies implemented by government funded institutions. The results of this study showed the following. First, SMEs that used the information supporting infrastructure were found to have a significant difference in size in comparison to domestic R&D SMEs, but on the other hand, there was no significant difference in the cluster analysis that considered various variables. Based on these findings, we confirmed that SMEs that use the information supporting infrastructure are superior in size, and had a relatively higher distribution of companies that transact to a greater degree with large companies, when compared to the SMEs composing the general group of SMEs. Also, we found that companies that already receive support from the information infrastructure have a high concentration of companies that need collaboration with government funded institution. Secondly, among the SMEs that use the information supporting infrastructure, we found that increasing external networking capabilities contributed to enhancing product competitiveness, and while this was no the effect of direct assistance, we also found that indirect contributions were made by increasing the open marketing capabilities: in other words, this was the result of an indirect-only mediator effect. Also, the number of times the company received additional support in this process through mentoring related to information utilization was found to have a mediated moderator effect on improving external networking capabilities and in turn strengthening product competitiveness. The results of this study provide several insights that will help establish policies. KISTI's information support infrastructure may lead to the conclusion that marketing is already well underway, but it intentionally supports groups that enable to achieve good performance. As a result, the government should provide clear priorities whether to support the companies in the underdevelopment or to aid better performance. Through our research, we have identified how public information infrastructure contributes to product competitiveness. Here, we can draw some policy implications. First, the public information support infrastructure should have the capability to enhance the ability to interact with or to find the expert that provides required information. Second, if the utilization of public information support (online) infrastructure is effective, it is not necessary to continuously provide informational mentoring, which is a parallel offline support. Rather, offline support such as mentoring should be used as an appropriate device for abnormal symptom monitoring. Third, it is required that SMEs should improve their ability to utilize, because the effect of enhancing networking capacity through public information support infrastructure and enhancing product competitiveness through such infrastructure appears in most types of companies rather than in specific SMEs.

The 1998, 1999 Patterns of Care Study for Breast Irradiation After Breast-Conserving Surgery in Korea (1998, 1999년도 우리나라에서 시행된 유방보존수술 후 방사선치료 현황 조사)

  • Suh Chang-Ok;Shin Hyun Soo;Cho Jae Ho;Park Won;Ahn Seung Do;Shin Kyung Hwan;Chung Eun Ji;Keum Ki Chang;Ha Sung Whan;Ahn Sung Ja;Kim Woo Cheol;Lee Myung Za;Ahn Ki Jung
    • Radiation Oncology Journal
    • /
    • v.22 no.3
    • /
    • pp.192-199
    • /
    • 2004
  • Purpose: To determine the patterns on evaluation and treatment in the patient with early breast cancer treated with conservative surgery and radiotherapy and to improve the radiotherapy techiniques, nationwide survey was peformed. Materials and Methods: A web-based database system for korean Patterns of Care Study (PCS) for 6 common cancers was developed. Two hundreds sixty-one randomly selected records of eligible patients treated between 1998$\~$1999 from 15 hospitals were reviewed. Results: The patients ages ranged from 24 to 85 years(median 45 years). Infiltrating ductal carcinoma was most common histologic type (88.9$\%$) followed by medullary carcinoma (4.2$\%$) and infiltrating lobular carcinoma (1.5$\%$). Pathologic T stage by AJCC was T1 in 59.7$\%$ of the casses, T2 in 29.5$\%$ of the cases, Tis in 8.8$\%$ of the cases. Axillary lymph node dissection was peformed I\in 91.2$\%$ of the cases and 69.7$\%$ were node negative. AJCC stage was 0 in 8.8$\%$ of the cases, stage I in 44.9$\%$ of the cases, stage IIa in 33.3$\%$ of the cases, and stage IIb in 8.4$\%$ of the cases. Estrogen and progesteron receptors were evaluated in 71.6$\%$, and 70.9$\%$ of the patients, respectively. Surgical methods of breast-conserving surgery was excision/lumpectomy in 37.2$\%$, wide excision in 11.5$\%$, quadrantectomy in 23$\%$ and partial mastectomy in 27.5$\%$ of the cases. A pathologically confirmed negative margin was obtained in 90.8$\%$ of the cases. Pathological margin was involved with tumor in 10 patients and margin was close (less than 2 mm) in 10 patients. All the patients except one recieved more than 90$\%$ of the planned radiotherapy dose. Radiotherapy volume was breast only In 88$\%$ of the cases, breast+supraclavicular fossa (SCL) in 5$\%$ of the cases, and breast+ SCL+ posterior axillary boost in 4.2%$\%$of the cases. Only one patient received isolated internal mammary lymph node irradiation. Used radiation beam was Co-60 in 8 cases, 4 MV X-ray in 115 cases, 6 MV X-ray in 125 cases, and 10 MV X-ray in 11 cases. The radiation dose to the whole breast was 45$\~$59.4 Gy (median 50.4) and boost dose was 8$\~$20 Gy (median 10 Gy). The total radiation dose delivered was 50.4$\~$70.4 Gy (median 60.4 Gy). Conclusion: There was no major deviation from current standard in the patterns of evaluation and treatment for the patients with early breast cancer treated with breast conservation method. Some varieties were identified in boost irradiation dose. Separate analysis for the datails of radiotherapy planning will be followed and the outcome of treatment is needed to evaluate the process.