• Title/Summary/Keyword: Information System Effectiveness


The Advancement of Underwriting Skill by Selective Risk Acceptance (보험Risk 세분화를 통한 언더라이팅 기법 선진화 방안)

  • Lee, Chan-Hee
    • The Journal of the Korean life insurance medical association
    • /
    • v.24
    • /
    • pp.49-78
    • /
    • 2005
  • Ⅰ. Research Background and Purpose o The household penetration rate of the Korean insurance market has reached 86%, marking entry into market maturity, and distribution is moving from the traditional captive channel toward multiple channels such as bancassurance, online-only insurers, and growing telemarketing (TM) sales o With the successive launch of advanced health products such as LTC (long-term care), CI (critical illness), and indemnity medical insurance, underwriting urgently needs to prepare from the perspective of insurance risk management o As products, marketing, and other areas closely tied to underwriting change, modernization of underwriting acceptance techniques is urgently required, and building advanced underwriting techniques that properly classify and evaluate risk is essential o Ultimately, this study seeks measures that can satisfy customers' diverse coverage needs and contribute to maximizing insurers' overall profit by strengthening the competitiveness of products, marketing, and underwriting Ⅱ. Risk Segmentation Cases in Advanced Insurance Markets 1. Premium Differentiation by Environmental Risk (1) Premium loading for hazardous occupations o In most advanced markets, including the United States and Europe, premiums are differentiated according to the insured's occupational risk at the time of application o Occupation classification and loading methods differ by benefit; separate methods are used for ordinary death, accidental death, waiver of premium, and DI o Loadings are calculated either as a fixed multiple of the standard risk rate or as a flat extra premium per unit of sum assured - For miners, accidental death coverage is rated at 300% of the standard risk rate, and ordinary death coverage carries a $2.95 flat extra per $1,000 of coverage (2) Premium loading for hazardous hobbies o Because hobby-related accidents occur persistently, hobby activities are also recognized as risk factors and premiums are differentiated o The extra premium is charged at a fixed rate per unit of sum assured (independent of the amount insured); for some hazardous hobbies such as new extreme sports, statistics are insufficient and the underwriter determines the loading rate - For paragliding 26~50 times a year, a flat extra of $2 for accidental death and $8 for DI is charged per $1,000 of coverage o Separately from premium loading, exclusions are applied to hazardous hobbies: if an insured event arises from the hazardous hobby, all benefits including death are excluded from coverage (3) Premium loading for hazardous residence / travel o Temporary or permanent residence in a particular country is evaluated considering climate risk, local sanitation and medical standards, travel risk, and war and riot risk o For each benefit (ordinary death, accidental death, etc.), a loading is applied or the risk is declined o The extra premium applies uniformly over the entire policy term - For Russia, ordinary death carries a $2 flat extra per $1,000 of coverage, and accidental death is declined (4) Premium differentiation for other risks o Aviation risk is classified into three categories (commercial transport, private flying, and military flying), and loadings are set based on the application, supplementary questionnaires, medical reports, and flight history information - For crop-dusting pilots, ordinary death carries a $6 flat extra per $1,000 of coverage, and accidental death is declined o In the United States, Japan, and elsewhere, traffic accident and traffic violation records are used to discount premiums for accident-free drivers (used as a preferred-risk factor) 2.
Premium Differentiation by Physical Risk (1) Premium loading for substandard risks 1) Acceptance up to a total risk index of 500 (excess risk index 400) o Loadings are applied in 13 steps: 25-point increments up to 300 and 50-point increments above 300 2) Simultaneous use of the lien (benefit reduction) and loading methods o Because the loading decreases by the amount of the benefit reduction, applicants can be offered a choice, which is useful for high-risk insureds 3) Temporary loading for histories of specific cancers o Depending on the disease profile, an extra premium is charged for 1~5 years after issue; once the loading period has passed, the standard premium applies 4) Return-of-the-extra-premium option o If the policy remains in force and the insured survives a specified period, the extra premium is refunded (2) Enhanced annuities for substandard risks o In the UK, enhanced annuity products that pay increased benefits to substandard lives have been developed and are on sale o Benefits are increased relative to standard lives according to various physical and environmental risk factors such as smoking, occupation, and medical history (3) Price segmentation of preferred risks o In the US market, standard lives are divided into as many as 8 classes under evaluation criteria covering 8~14 medical and non-medical risk factors, with differentiated premium discounts - Medical history, blood pressure, family history, smoking, BMI, cholesterol, driving record, hazardous hobbies, residence, flight history, alcohol/drug use, etc. o Discount rates vary by company, class, and eligibility criteria (up to 75%); eligible ages range from a minimum of 16~20 to a maximum of 65~75, and the minimum sum assured is USD 100,000 (the minimum amount requiring an HIV test) o In the Japanese market, lives are classified into 3~4 classes using 3~4 risk factors, with preferred-risk discounts o In Europe, non-smoker or preferred-risk discounts apply only in some markets such as the UK Ⅲ. Current Status and Problems of the Korean Insurance Market 1. Coverage Limits by Environmental Risk (1) Coverage limits for hazardous occupations o Based on the industry-wide standard occupational risk classification, each insurer sets its own coverage limits per risk class; this entails problems such as inequity with non-hazardous occupations, limited coverage for high-risk occupations, and an unstable profit structure - Miners are assigned risk class 1 and limited to a maximum of KRW 100 million for death and KRW 20,000 per day for hospitalization o In July 2002 the Financial Supervisory Service approved risk indices per risk class as reference risk rates, but they were set at about 70% for non-hazardous and 200% for hazardous occupations, making practical application difficult (2) Coverage limits for hazardous hobbies o Coverage limits are set by applying the occupational risk class of workers in the corresponding field; detailed information such as license status or club membership is not collected through supplementary questionnaires - Paragliding is assigned risk class 2 and limited to a maximum of KRW 200 million in death coverage (3) Coverage limits by residence / overseas travel o Each insurer restricts coverage in regions where accidents are frequent - Accident insurance is unavailable in parts of Gangwon and Chungcheong; the hospitalization benefit is limited to KRW 20,000 per day in parts of Jeonbuk and Taebaek o For overseas stays, including travel, fixed eligibility requirements apply: coverage limits are set, applications are restricted, or accident-focused products are declined - For Russia, short stays are assigned risk class 1 and accident insurance is unavailable; long stays are declined 2.
Differentiated Acceptance by Physical Risk (1) Substandard acceptance methods o Excess risk indices for increasing and constant risks are converted into benefit reductions applied to death coverage (up to 5 years), leaving serious risk exposure after year 5 o Premium loading is used by only some companies, mainly on base policies, with acceptance up to a total risk index of 300 (8 steps) - When the base policy is rated, riders cannot be attached, and applicants with a cancer history are mostly declined o Exclusions apply to 39 body parts and 5 diseases (exclusions on living benefits such as hospitalization and surgery) (2) Non-smoker / preferred-risk premium discounts o Since their first introduction in 1999, a single class has been operated using 3~4 risk factors; insurer S operates two classes (non-smoking preferred and non-smoking standard) o Discount rates vary by company and product, up to 22% of the gross premium; smoking status is verified by a cotinine test using urine strips o Preferred-risk sales account for 2~15% of new business, varying with company policy Ⅳ. Directions for Modernizing Underwriting Techniques 1. Premium differentiation by occupational risk o In step with the unification of life and non-life occupational risk classifications, reorganize risk indices into 3 classes and apply differentiated rates with non-hazardous occupations as the baseline 2. Exclusions for hazardous hobbies o Introduce exclusions for insured events (including death) caused by the hobby 3. Substantially expand acceptance through modernized substandard underwriting o Expand the total risk index from 300 to 500 through wider use of premium loading, hedging risk and minimizing declines 4. Combine premium loading with benefit reduction o Develop loading methods incorporating reduction periods, giving customers a choice 5. Temporary premium loading o For specific cancers such as stomach and thyroid cancer, apply level extra premiums during the high-risk early policy years 6. Extend premium loading to riders, combined with exclusions o Extend loading to death-related riders such as term riders, and apply exclusions to living-benefit riders 7. Expand segmentation of standard risks o Refine classes by adding risk evaluation factors such as cholesterol and HDL Ⅴ. Expected Effects 1. Open coverage to high-risk occupations, hazardous hobbyists, and substandard risks 2. Improve equity among policyholders and meet the diverse coverage needs of customers 3. Increase premium income and improve mortality margins through expanded sales and risk hedging 4. Strengthen insurers in preparation for full-scale price competition 5. Enhance company image, reduce resistance to medical examinations, and prevent portfolio deterioration Ⅵ. Conclusion o Moving away from passive, uniform acceptance, insurers should adopt appropriate risk evaluation tools that assess insureds from multiple angles and offer suitable premiums and reasonable acceptance terms, o accompanied by modernized acceptance techniques together with the specialization of underwriting staff and the building of information acquisition and system infrastructure, o which will contribute greatly not only to managing insurers' mortality profit and loss, but also to strengthening the competitiveness of Korean life insurance underwriting and globalizing underwriters in preparation for market opening and a rapidly changing insurance environment.
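The two extra-premium mechanisms recurring in the abstract (rating at a multiple of the standard risk rate, and a flat extra per $1,000 of coverage) reduce to simple arithmetic; a minimal sketch follows. The miner figures (300% of standard, $2.95 per $1,000) are the ones quoted in the abstract, while the $100,000 policy size is an assumed example.

```python
# Sketch of the two extra-premium methods described in the abstract.
# The $100,000 sum assured is an illustrative assumption.

def multiple_of_standard(standard_rate, multiplier):
    """Extra premium expressed as a multiple of the standard risk rate."""
    return standard_rate * multiplier

def flat_extra(sum_assured, extra_per_thousand):
    """Flat extra premium charged per $1,000 of coverage."""
    return sum_assured / 1000 * extra_per_thousand

# Miner example from the abstract: accidental death rated at 300% of the
# standard rate; ordinary death carries a $2.95 flat extra per $1,000.
accidental_rate = multiple_of_standard(1.0, 3.0)  # 3x the standard rate
ordinary_extra = flat_extra(100_000, 2.95)        # $295 on a $100k policy
```

Note that the flat extra scales with the sum assured, while the multiple-of-standard method scales with the underlying risk rate, which is why the two are used for different benefits.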


Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.127-148
    • /
    • 2020
  • The data center is a physical facility for accommodating computer systems and related components, and is an essential foundation technology for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, proportional expansion of data center infrastructure is inevitable. Monitoring the health of these data center facilities is a way to maintain and manage the system and prevent failure. If a failure occurs in some element of the facility, it may affect not only the relevant equipment but also other connected equipment, and may cause enormous damage. In particular, failures of IT facilities occur irregularly because of their interdependence, and their causes are difficult to identify. Previous studies predicting failures in data centers treated each server as a single, independent state, without assuming that devices interact. Therefore, in this study, data center failures were classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), and the analysis focused on complex failures occurring within servers. Server-external failures include power, cooling, and user errors; since such failures can be prevented in the early stages of data center facility construction, various solutions are being developed. On the other hand, the causes of failures occurring inside servers are difficult to determine, and adequate prevention has not yet been achieved, largely because server failures do not occur in isolation: a failure may trigger failures in other servers or be triggered by them. In other words, while existing studies analyzed failures under the assumption of single servers that do not affect one another, this study assumes that failures propagate between servers.
To define the complex failure situation in the data center, failure history data for each piece of equipment in the data center was used. Four major failure types are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures occurring on each device are sorted in chronological order; when a failure occurs on one piece of equipment, any failure occurring on another piece of equipment within 5 minutes of that time is defined as occurring simultaneously. After constructing sequences of devices that failed at the same time, five devices that frequently failed simultaneously within the sequences were selected, and the cases where the selected devices failed at the same time were confirmed through visualization. Since the server resource information collected for failure analysis is time-series data with sequential flow, we used Long Short-Term Memory (LSTM), a deep learning algorithm that can predict the next state from previous states. In addition, unlike the single-server case, the Hierarchical Attention Network deep learning model structure was used, considering that the degree of involvement in multiple failures differs per server. This algorithm increases prediction accuracy by giving greater weight to servers with greater impact on the failure. The study began by defining the failure types and selecting the analysis targets. In the first experiment, the same collected data were analyzed and compared under both a single-server assumption and a multiple-server assumption. The second experiment improved prediction accuracy in the complex-server case by optimizing the threshold for each server.
In the first experiment, which assumed single and multiple servers respectively, the single-server case predicted that three of the five servers had no failure even though failures actually occurred, whereas under the multiple-server assumption all five servers were correctly predicted to have failed. These experimental results support the hypothesis that servers affect one another. This study confirmed that prediction performance was superior when multiple servers were assumed rather than a single server. In particular, applying the Hierarchical Attention Network algorithm, under the assumption that each server's effect differs, improved the analysis. In addition, prediction accuracy could be improved by applying a different threshold for each server. This study showed that failures whose causes are difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring on servers in data centers. It is expected that failures can be prevented in advance using the results of this study.
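The co-occurrence rule above (failures on different devices within 5 minutes are treated as simultaneous) can be sketched as a simple windowed grouping. The device names and timestamps below are invented for illustration; the study's actual event schema is not given in the abstract.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

def group_simultaneous(events):
    """events: iterable of (timestamp, device).
    Returns groups of devices whose failures started within WINDOW
    of the first failure in each group."""
    groups, current, start = [], [], None
    for ts, device in sorted(events):
        if start is None or ts - start > WINDOW:
            if current:
                groups.append(current)
            current, start = [device], ts
        else:
            current.append(device)
    if current:
        groups.append(current)
    return groups

events = [
    (datetime(2020, 1, 1, 0, 0), "server_A"),
    (datetime(2020, 1, 1, 0, 3), "db_B"),    # within 5 min of server_A
    (datetime(2020, 1, 1, 0, 20), "net_C"),  # a separate incident
]
groups = group_simultaneous(events)  # [["server_A", "db_B"], ["net_C"]]
```

This anchors each group at its first event; a sliding-window variant that re-anchors on every event would be an equally plausible reading of the rule.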

Measuring the Economic Impact of Item Descriptions on Sales Performance (온라인 상품 판매 성과에 영향을 미치는 상품 소개글 효과 측정 기법)

  • Lee, Dongwon;Park, Sung-Hyuk;Moon, Songchun
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.1-17
    • /
    • 2012
  • Personalized smart devices such as smartphones and smart pads are widely used. Unlike traditional feature phones, these smart devices allow users to choose from a variety of functions that support not only daily life but also business operations. In fact, a huge number of applications are accessible to smart device users in online and mobile application markets. Users can choose apps that fit their own tastes and needs, which is impossible for conventional phone users. With the increase in app demand, the tastes and needs of app users are becoming more diverse. To meet these requirements, numerous apps with diverse functions are being released on the market, leading to fierce competition. Unlike offline markets, online markets have the limitation that purchasing decisions must be made without experiencing the items. Therefore, online customers rely more on item-related information shown on the item page, where online markets commonly provide details about each item. Customers can gain confidence in the quality of an item through this online information and decide whether to purchase it. The same is true of online app markets. To win the sales competition against other apps that perform similar functions, app developers need to focus on writing app descriptions that attract the attention of customers. If we can measure the effect of app descriptions on sales, apart from an app's price and quality, app descriptions that facilitate sales can be identified. This study intends to provide such a quantitative result for app developers who want to promote the sales of their apps. For this purpose, we collected app details, including descriptions written in Korean, from one of the largest app markets in Korea, and then extracted keywords from the descriptions. Next, the impact of the keywords on sales performance was measured through our econometric model.
Through this analysis, we were able to analyze the impact of each keyword itself, apart from that of design or quality. The keywords, comprising the attributes and evaluations of each app, were extracted by a morpheme analyzer. Our model, with the keywords as input variables, was established to analyze their impact on sales performance. A regression analysis was conducted for each category in which apps are included; this was required because we found that the keywords emphasized in app descriptions differ from category to category. The analysis, conducted for both free and paid apps, showed which keywords have more impact on sales performance for each type of app. In the analysis of paid apps in the education category, keywords such as 'search+easy' and 'words+abundant' showed higher effectiveness. In the same category, free apps whose keywords emphasize app quality showed higher sales performance. One interesting finding is that keywords describing not only the app but also the need for the app have a significant impact: language learning apps, whether free or paid, showed higher sales performance when their descriptions included the keywords 'foreign language study+important'. This result shows that purchase motivation affected sales. While item reviews are widely researched in online markets, item descriptions are not very actively studied. In mobile app markets, newly introduced apps may not have many item reviews because of low sales volume; in such cases, item descriptions become more important when customers decide whether to purchase. This study is the first attempt to quantitatively analyze the relationship between an item description and its impact on sales performance. The results show that our research framework successfully provides a list of the most effective sales key terms with estimates of their effectiveness.
Although this study is performed for a specified type of item (i.e., mobile apps), our model can be applied to almost all of the items traded in online markets.
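The econometric idea above, regressing a sales measure on indicators for keyword presence in the description, can be sketched with ordinary least squares. The keyword names echo those quoted in the abstract, but the sales figures and design below are synthetic, not the study's data or exact model.

```python
import numpy as np

# Synthetic example: regress log sales on binary keyword indicators.
descriptions = [
    {"search_easy": 1, "words_abundant": 0},
    {"search_easy": 1, "words_abundant": 1},
    {"search_easy": 0, "words_abundant": 1},
    {"search_easy": 0, "words_abundant": 0},
]
log_sales = np.array([3.0, 4.0, 2.0, 1.0])

keywords = ["search_easy", "words_abundant"]
# Design matrix: a constant column plus one 0/1 column per keyword.
X = np.array([[1] + [d[k] for k in keywords] for d in descriptions], float)

# OLS: each coefficient estimates the keyword's marginal effect on
# log sales, holding the other keywords fixed.
beta, *_ = np.linalg.lstsq(X, log_sales, rcond=None)
intercept, effects = beta[0], dict(zip(keywords, beta[1:]))
```

With this toy data the fit is exact: intercept 1.0, and effects of 2.0 and 1.0 for the two keywords, which is the kind of per-keyword effectiveness estimate the study reports.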

Monitoring of a Time-series of Land Subsidence in Mexico City Using Space-based Synthetic Aperture Radar Observations (인공위성 영상레이더를 이용한 멕시코시티 시계열 지반침하 관측)

  • Ju, Jeongheon;Hong, Sang-Hoon
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.6_1
    • /
    • pp.1657-1667
    • /
    • 2021
  • Anthropogenic activities and natural processes cause land subsidence, the sudden sinking or gradual settlement of the earth's solid surface. Mexico City, the capital of Mexico, is one of the areas with the most severe land subsidence, resulting from excessive groundwater extraction; groundwater is the primary water resource, accounting for almost 70% of total water usage in the city. Traditional terrestrial observations such as the Global Navigation Satellite System (GNSS) or leveling surveys have been preferred for measuring land subsidence accurately. Although GNSS observations provide highly accurate measurements of surface displacement with very high temporal resolution, their use has often been limited by sparse spatial coverage and by being time-consuming and costly. Space-based synthetic aperture radar (SAR) interferometry, however, has been widely used as a powerful tool to monitor surface displacement with high spatial resolution and high accuracy at mm to cm scale, regardless of day or night and weather conditions. In this paper, advanced interferometric approaches were applied to derive a time-series of land subsidence in Mexico City using twenty ALOS PALSAR L-band observations spanning four years, acquired from February 11, 2007 to February 22, 2011. We utilized persistent scatterer interferometry (PSI) and small baseline subset (SBAS) techniques to suppress atmospheric artifacts and topographic errors. The results show maximum subsidence rates of -29.5 cm/year for the PSI method and -27.0 cm/year for the SBAS method. In addition, we discuss differences in subsidence rates across three districts into which the study area is divided according to distinctive geotechnical characteristics. The most significant subsidence occurred in lacustrine sediments, which have higher compressibility than the harder bedrock.

<Field Action Report> Local Governance for COVID-19 Response of Daegu Metropolitan City (<사례보고> 코로나바이러스감염증-19 유행과 로컬 거버넌스 - 2020년 대구광역시 유행에 대한 대응을 중심으로 -)

  • Kyeong-Soo Lee;Jung Jeung Lee;Keon-Yeop Kim;Jong-Yeon Kim;Tae-Yoon Hwang;Nam-Soo Hong;Jun Hyun Hwang;Jaeyoung Ha
    • Journal of agricultural medicine and community health
    • /
    • v.49 no.1
    • /
    • pp.13-36
    • /
    • 2024
  • Objectives: The purpose of this field case report is 1) to analyze the community's strategy and performance in responding to infectious diseases through the case of Daegu Metropolitan City's COVID-19 crisis response, 2) to interpret this case using governance theory and an infectious disease response governance framework, and 3) to propose a strategic model for the community to prepare for future infectious disease outbreaks. Methods: Daegu Metropolitan City's infectious disease crisis response was analyzed through the researchers' participatory observations, review of the COVID-19 White Paper of Daegu Metropolitan City and the Daegu Medical Association's COVID-19 White Paper, a literature review of domestic and international governance, and administrative documents. Results: Through the researchers' participatory observation and literature review, the following were derived: 1) establishment of leadership and a response system for the infectious disease crisis in Daegu Metropolitan City; 2) citizen participation and communication strategies through the pan-citizen response committee; 3) cooperation between Daegu Metropolitan City and governance of public-private medical facilities; 4) decision-making and crisis response through participation and communication among the Daegu Metropolitan City Medical Association, the Medi-City Daegu Council, and private-sector medical experts; 5) symptom monitoring, patient triage strategies, and treatment for confirmed infectious disease patients by members of the Daegu Medical Association; and 6) strategies and implications for establishing and utilizing a local infectious disease crisis response information system. Conclusions: The results empirically demonstrate that collaborative community governance through the participation of citizens, private-sector experts, and community medical facilities is a key element of effective response to infectious disease crises.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. One major drawback, however, is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound on the generalization error. In addition, the SVM solution may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only representative samples near the boundaries, called support vectors. A number of experimental studies have shown that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can potentially degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class problems as much as SVM does in binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used for efficient multi-class computation to reduce computation time, but they can deteriorate classification performance. Third, the difficulty in multi-class prediction lies in the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers that in another. Such data sets often produce a default classifier with a skewed boundary, reducing classification accuracy. SVM ensemble learning is one machine learning approach to cope with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations: observations incorrectly predicted by previous classifiers are chosen more often than those correctly predicted. Boosting thus attempts to produce new classifiers that better predict examples on which the current ensemble performs poorly, reinforcing the training of misclassified observations from the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process considers geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation was performed three times with different random seeds to ensure that the comparison among the three classifiers did not happen by chance. In each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine; that is, cross-validated folds were tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy among the individual classifiers, MGM-Boost (52.95%) shows higher accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test was used to examine whether the performance of each classifier over the 30 folds differs significantly; the results indicate that MGM-Boost's performance differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
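The geometric mean-based accuracy that motivates MGM-Boost can be illustrated directly: taking the geometric mean of per-class recalls punishes a classifier that sacrifices a minority class, where an arithmetic mean would not. The recall values below are invented for illustration, not the study's results.

```python
import math

def geometric_mean_accuracy(per_class_recall):
    """Geometric mean of per-class recalls (multi-class G-mean)."""
    product = math.prod(per_class_recall)
    return product ** (1 / len(per_class_recall))

# A classifier that largely ignores one minority class: the arithmetic
# mean still looks respectable, but the geometric mean collapses.
recalls = [0.9, 0.8, 0.1]
arith = sum(recalls) / len(recalls)       # 0.6
gmean = geometric_mean_accuracy(recalls)  # about 0.42
```

This asymmetry is why a boosting variant that optimizes the geometric mean reinforces minority-class observations more strongly than plain AdaBoost.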

Human Embryonic Stem Cell-derived Neuroectodermal Spheres Revealing Neural Precursor Cell Properties (인간 배아줄기세포 유래 신경전구세포의 특성 분석)

  • Han, Hyo-Won;Kim, Jang-Hwan;Kang, Man-Jong;Moon, Seong-Ju;Kang, Yong-Kook;Koo, Deog-Bon;Cho, Yee-Sook
    • Development and Reproduction
    • /
    • v.12 no.1
    • /
    • pp.87-95
    • /
    • 2008
  • Neural stem/precursor cells derived from pluripotent human embryonic stem cells (hESCs) have considerable therapeutic potential due to their ability to generate various neural cell types, which can be used in cell-replacement therapies for neurodegenerative diseases. However, production of neural cells from hESCs remains technically very difficult. Understanding the neural-tube-like rosette-forming neural precursor cells derived from hESCs may provide useful information for increasing the efficiency of hESC neural differentiation. Generally, neural rosettes have been derived from differentiating hEBs in an attached culture system; however, this is time-consuming and complicated. Here, we examined whether neural rosettes could be formed in a suspension culture system, bypassing the attachment requirement. First, we tested whether the size of hESC clumps affected the formation of human embryoid bodies (hEBs) and neural differentiation. We confirmed that hEBs derived from 500×500 μm square hESC clumps differentiated into the neural lineage more effectively than those of other sizes. To induce rosette formation, regular-size hEBs were derived by incubating hESC clumps (500×500 μm) in EB medium for 1 wk in suspension on low-attachment culture dishes and further incubated for an additional 1~2 wks in neuroectodermal sphere (NES) culture medium. We observed neural-tube-like rosette structures in hEBs after 7~10 days of differentiation. Their identity as neural precursor cells was assessed by measuring the expression of neural precursor markers (Vimentin, Nestin, MSI1, MSI2, Prominin-1, Pax6, Sox1, N-cadherin, Otx2, and Tuj1) by RT-PCR and immunofluorescence staining. We also confirmed that neural rosettes could be terminally differentiated into mature neural cell types by additional incubation for 2~6 wks in NES medium without growth factors.
Neuronal (Tuj1, MAP2, GABA) and glial (S100β and GFAP) markers were highly expressed after 2~3 and 4 wks of incubation, respectively. Expression of the oligodendrocyte markers O1 and CNPase was significantly increased after 5~6 wks of incubation. Our results demonstrate that rosette-forming neural precursor cells can be successfully derived in a suspension culture system, which will not only help us understand the neural differentiation process of hESCs but also simplify the derivation of neural precursors from hESCs.


A Comparison of Single and Multi-matrix Models for Bird Strike Risk Assessment (단일 및 다중 매트릭스 모델의 비교를 통한 항공기-조류 충돌 위험성 평가 모델 분석)

  • Hong, Mi-Jin;Kim, Myun-Sik;Moon, Young-Min;Choi, Jin-Hwan;Lee, Who-Seung;Yoo, Jeong-Chil
    • Korean Journal of Environment and Ecology
    • /
    • v.33 no.6
    • /
    • pp.624-635
    • /
    • 2019
  • Bird strike accidents, collisions between aircraft and birds, have been increasing annually as the number of aircraft in operation grows to meet heavier demand for air traffic. As such, many airports have conducted studies to assess and manage bird strike risks effectively by identifying and ranking bird species that can damage aircraft based on bird strike records. This study investigated the bird species likely to threaten aircraft and compared the risk of each species estimated by the single-matrix and multi-matrix risk assessment models based on Integrated Flight Information Service (IFIS) data collected at Gimpo, Gimhae, and Jeju Airports in South Korea from 2005 to 2013. We found a difference in the assessment results between the two models. The single-matrix model estimated 2 species and 6 taxa at Gimpo and Gimhae Airports and 2 species and 5 taxa at Jeju Airport to have a risk score of "high" or above, whereas the multi-matrix model estimated 3 species and 5 taxa at Gimpo Airport, 4 species and 5 taxa at Gimhae Airport, and 2 species and 3 taxa at Jeju Airport to have a risk score of "very high" or above. Although both models identified similar high-risk species at Gimpo and Gimhae Airports, there was a significant difference at Jeju Airport. Gimpo and Gimhae Airports are near a river estuary, an excellent habitat for large, heavy waterbirds, whereas Jeju Airport is near the coast and the city center, where mostly small, light bird species are observed. Since collisions with such species have little effect on the aircraft fuselage, the impact of the variables common to the two models was small, and the additional variables caused a significant difference between the two models' estimates.
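A single-matrix assessment of the kind compared above typically reads a risk class off a likelihood-by-severity table. The sketch below is a generic, hypothetical matrix for illustration only; the abstract does not specify the study's actual ranks, scales, or labels.

```python
# Hypothetical single-matrix bird-strike assessment: likelihood rank
# (strike/observation frequency) x severity rank (expected damage,
# driven mainly by body mass). Ranks and labels are illustrative.

RISK_MATRIX = [
    # severity ->  1           2           3
    ["low",       "low",      "moderate"],   # likelihood 1
    ["low",       "moderate", "high"],       # likelihood 2
    ["moderate",  "high",     "very high"],  # likelihood 3
]

def risk_class(likelihood, severity):
    """Look up the risk class for 1-based likelihood/severity ranks."""
    return RISK_MATRIX[likelihood - 1][severity - 1]

# A heavy waterbird near an estuary airport vs. a frequently observed
# but small coastal species: same likelihood, very different risk.
heron_risk = risk_class(likelihood=3, severity=3)    # "very high"
sparrow_risk = risk_class(likelihood=3, severity=1)  # "moderate"
```

A multi-matrix model, as contrasted in the abstract, layers additional variable matrices on top of this common likelihood/severity core, which is why the two models diverge most where the common variables carry little signal.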

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between objects entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to cause errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit composing Korean texts. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm as well as more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted with Old Testament texts using the deep learning package Keras based on Theano.
After pre-processing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, with the following (21st) character as the output. In total, 1,023,411 input-output pairs were included in the dataset, which we split into training, validation, and test sets in a 70:15:15 ratio. All simulations were run on a system equipped with a 16-core Intel Xeon CPU and an NVIDIA GeForce GTX 1080 GPU. We compared the loss evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All optimization algorithms except stochastic gradient descent showed similar validation loss and perplexity, clearly superior to those of stochastic gradient descent, which also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer model required 69% more training time than the 3-layer model, yet its validation loss and perplexity were not significantly better and were even worse under some conditions. On the other hand, the sentences generated automatically by the 4-layer model tended to be closer to natural language than those of the 3-layer model. Although the completeness of the generated sentences differed slightly between the models, sentence generation was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely applicable to Korean language processing and speech recognition, which underlie artificial intelligence systems.
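The preprocessing the abstract describes has two steps: decomposing Korean syllables into phonemes (jamo), then building (20-character input, next-character output) pairs. The paper's exact code is not given, so the following is a minimal pure-Python sketch of both steps; the function names and the standard Unicode jamo tables are assumptions, not the authors' implementation.

```python
# Standard Unicode jamo tables for precomposed Hangul syllables.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def to_phonemes(text):
    """Decompose precomposed Hangul syllables into jamo; pass other chars through."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # inside the precomposed Hangul syllable block
            out.append(CHOSEONG[code // 588])
            out.append(JUNGSEONG[(code % 588) // 28])
            tail = JONGSEONG[code % 28]
            if tail:  # the final consonant may be absent
                out.append(tail)
        else:
            out.append(ch)
    return out

def make_windows(seq, width=20):
    """Build (input window, next symbol) training pairs, as in the paper's setup."""
    return [(seq[i:i + width], seq[i + width])
            for i in range(len(seq) - width)]
```

For example, `to_phonemes("한글")` yields the six jamo `ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ`; applying `make_windows` to the full phoneme sequence produces the roughly one million input-output pairs reported above.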

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.85-107
    • /
    • 2019
  • Online consumers browse products in a particular product line or brand before purchasing, or simply navigate widely without making a purchase. Research on the browsing and purchasing behavior of online consumers has steadily progressed, and services and applications based on consumer behavior data have been developed in practice. In recent years, with the development of big data technology, personalization strategies and recommendation systems have been employed in attempts to optimize users' shopping experience. Even so, only a small fraction of online consumers who visit a website actually proceed to the purchase stage. This is because consumers do not visit websites solely to purchase products; they use and browse them differently according to their shopping motives and purposes. It is therefore important to analyze the various types of visits, not only purchase visits, in order to understand the behavior of online consumers. In this study, to capture the diversity and complexity of online consumers' search behavior and to typify it, we performed a session-level clustering analysis based on clickstream data from an e-commerce company. For the analysis, we converted more than 8 million page-level data points into visit-level sessions, yielding a total of over 500,000 website visit sessions. For each session, 12 features such as page views, duration, search diversity, and page-type concentration were extracted for clustering. Given the size of the dataset, we used the Mini-Batch K-means algorithm, which offers advantages in learning speed and efficiency while maintaining clustering performance similar to that of standard K-means. 
The optimal number of clusters was found to be four, and differences in session-level characteristics and purchase rates were identified for each cluster. Online consumers visit a website several times, learn about products, and then decide to purchase. To analyze this purchasing process over multiple visits, we constructed consumer visit-sequence data based on the navigation patterns derived from the clustering analysis. Each visit sequence comprises the series of visits leading up to a purchase, and the items constituting a sequence are the cluster labels derived above. We built separate sequence data for consumers who made purchases and for consumers who only explored products without purchasing during the same period, and then applied sequential pattern mining to extract frequent patterns from each dataset. With the minimum support set to 10%, the frequent patterns consist of sequences of cluster labels. While some patterns were common to both datasets, others were frequent in only one of them. A comparative analysis of the extracted patterns showed that purchasing consumers repeatedly searched for a specific product before deciding to buy it. The contribution of this study is that it analyzes the visit types of online consumers using large-scale clickstream data and explains the purchasing process from a data-driven perspective by analyzing their patterns. Most studies on the typology of online consumers have focused on the characteristics of each type and on the key factors that distinguish the types. 
In this study, we not only typified the behavior of online consumers but also analyzed in what order those types are arranged to form a series of search patterns. In addition, online retailers will be able to improve purchase conversion through marketing strategies and recommendations tailored to the various visit types, and to evaluate the effect of such strategies through changes in consumers' visit patterns.
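The sequential pattern mining step above extracts, from sequences of cluster labels, every ordered subsequence whose support (the fraction of sequences containing it, gaps allowed) meets the 10% minimum. The paper does not specify its implementation, so the following is a minimal pure-Python sketch of an Apriori-style (GSP-like) level-wise search; the function names and the toy label alphabet are assumptions for illustration.

```python
def contains(seq, pattern):
    """True if pattern is an order-preserving subsequence of seq (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)  # each `in` consumes up to its match

def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

def frequent_patterns(sequences, min_support=0.10):
    """Level-wise search: extend only patterns that are themselves frequent."""
    items = sorted({x for s in sequences for x in s})
    frequent = {}
    level = [(x,) for x in items]  # candidate patterns of length 1
    while level:
        survivors = []
        for p in level:
            sup = support(sequences, p)
            if sup >= min_support:
                frequent[p] = sup
                survivors.append(p)
        # grow each surviving pattern by one item to form the next candidate level
        level = [p + (x,) for p in survivors for x in items]
    return frequent
```

With visit sequences such as `["A", "B", "C", "B"]` (letters standing in for the four cluster labels) and a 50% minimum support, the search returns patterns like `("A", "B")`, mirroring how the study compares frequent patterns between purchasing and non-purchasing consumers.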