• Title/Summary/Keyword: Statistical Selection Method


Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • The development of artificial intelligence technologies has accelerated with the Fourth Industrial Revolution, and AI research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. The field of artificial intelligence has achieved more technological advances than ever, due to recent interest in the technology and research on various algorithms. The knowledge-based system is a sub-domain of artificial intelligence that aims to enable artificial intelligence agents to make decisions by using machine-readable and processible knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. More recently, the purpose of a knowledge base has been to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases are used for intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is a time-consuming task and still requires a lot of expert effort. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the biggest knowledge bases, which aims to extract structured content from the various information in Wikipedia. DBpedia contains various information extracted from Wikipedia such as titles, categories, and links, but the most useful knowledge comes from Wikipedia infoboxes, which present a user-created summary of some unifying aspect of an article. 
This knowledge is created by the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. In this way, DBpedia can expect high reliability in terms of accuracy of knowledge by generating knowledge from semi-structured infobox data created by users. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia has limitations in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we explain a knowledge extraction model that follows the DBpedia ontology schema and learns from Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classifying documents into ontology classes, classifying the sentences suitable for extracting triples, and selecting values and transforming them into the RDF triple structure. The structure of a Wikipedia infobox is defined by infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these infobox templates. Based on these mapping relations, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the class of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from sentences that are classified as appropriate, and we convert the knowledge into triples. To train the models, we generated a training data set from the Wikipedia dump by adding BIO tags to sentences, and trained models for about 200 classes and about 2,500 relations. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process. 
Through this proposed process, it is possible to utilize structured knowledge by extracting knowledge according to the ontology schema from text documents. In addition, this methodology can significantly reduce the effort of the experts to construct instances according to the ontology schema.
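
The abstract mentions generating training data by adding BIO tags to sentences. A minimal sketch of that tagging step is given here, assuming whitespace-tokenized input and exact matching of an infobox value inside the sentence; the function name `bio_tag`, the example sentence, and the label `country` are hypothetical and are not taken from the paper.

```python
from typing import List


def bio_tag(tokens: List[str], value: List[str], label: str) -> List[str]:
    """Tag the first exact occurrence of `value` in `tokens` with B-/I- tags.

    Every token outside the matched span receives the tag "O".
    This is an illustrative assumption about how infobox values could be
    aligned with sentences, not the authors' actual procedure.
    """
    tags = ["O"] * len(tokens)
    n = len(value)
    if n == 0:
        return tags
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == value:
            tags[i] = f"B-{label}"          # first token of the value
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"      # remaining tokens of the value
            break
    return tags


# Hypothetical sentence and infobox value.
sentence = ["Seoul", "is", "the", "capital", "of", "South", "Korea"]
example_tags = bio_tag(sentence, ["South", "Korea"], "country")
```

A sequence labeller such as the CRF or Bi-LSTM-CRF named in the abstract would then be trained on token and tag pairs of this shape.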

What Determines the Emotional Quality of Homepage? - from the emotion, users and designers perspectives (무엇이 홈페이지의 감성 품질을 결정하는가? -감성 측면과 디자이너의 측면 그리고 사용자 측면을 중심으로)

  • 박수이;최동성;김진우
    • Archives of design research
    • /
    • v.15 no.4
    • /
    • pp.97-110
    • /
    • 2002
  • As users' environments change, their primary needs for homepages also become more complicated. Today, users want not only usability from homepages but also appropriate emotional experiences. Despite these needs, users do not always experience the emotions that designers intend to convey through a homepage. In this research paper, we analyzed the factors related to emotional quality, which means the degree to which users feel the target emotions intended by designers. To analyze these factors, three hypotheses were tested, concerning the factor of emotions, the factor of users, and the factor of designers. The first hypothesis, on the factor of emotions, is that unclear emotional dimensions in users' minds are related to emotional quality. The second hypothesis, on the factor of users, is that the diversity of users' experiences of the same homepage is related to emotional quality. The third hypothesis, on the factor of designers, is that the appropriate selection of design elements is related to emotional quality. In previous research, we selected 13 basic emotional dimensions and 30 representative emotional words based on statistical results and evaluations by professional designers. For this research, we conducted an experiment and a user survey. In the experiment, we asked 30 designers to design homepages focusing on a typical emotion presented by a researcher. Based on the design process and user evaluation, we performed statistical analyses: ANOVA with the Tukey post hoc method and factor analysis. We found a discrepancy between the emotions that designers intended and the emotions that users actually experienced from the homepages. The analysis shows that the factor of users and the factor of designers were related to emotional quality, but the factor of emotions was not. The definiteness of emotions was not related to emotional quality. 
However, the diversity of emotions that users feel when seeing the same homepage and the design elements that designers chose to convey the target emotion were related to emotional quality.


A Comparative Study of Health Behaviors by Chronic Diseases of the Low-income Middle-aged People in Seoul's Apartment Residents (서울시 임대아파트에 거주하는 일부 저소득 중장년의 만성질병별 건강행태 비교연구)

  • Yang, Junmo;Park, Haemo;Lee, Sundong
    • Journal of Society of Preventive Korean Medicine
    • /
    • v.18 no.2
    • /
    • pp.11-30
    • /
    • 2014
  • Objective: To compare differences in health behaviors by chronic disease among middle-aged residents of low-income housing in Seoul. Method: Of the 1469 residents aged 35 to 60 living in low-income housing in Seoul's District A, 318 were selected by the equal probability of selection method. The t-test, ANOVA, $\chi^2$ test, and OR (95% CI, P-value) were used to analyze the data, and the significance level was 5%. Results: There were no significant differences in any health behavior for vascular and metabolic diseases, but there were statistically significant differences for gastrointestinal diseases by sleep hours (p=0.001), liver diseases by smoking, drinking, and sleep hours (p=0.004, p=0.001, and p=0.033, respectively), and musculoskeletal diseases by sleep hours and health exam (p=0.0000 and p=0.002, respectively). Statistically significant differences were also found for tumors by sleep hours (p=0.004), depression by sleep hours and health exam (p=0.001 and p=0.013, respectively), allergies by sleep hours (p=0.004), and thyroid diseases by smoking and health exam (p=0.013 and p=0.007, respectively). After adjusting for the confounding factors for diseases, the OR was obtained for each health behavior. There were no statistically significant differences in any health behavior for vascular diseases, metabolic diseases, and tumors. However, the OR for gastrointestinal diseases was 4.10 (1.63-10.36, 0.0028) and 2.96 (1.05-8.41, 0.0041) at 5-7 and 7-9 sleep hours. The OR for liver diseases was 3.13 (1.03-9.48, 0.0437) at 7-9 sleep hours, and the OR values for musculoskeletal diseases were 2.91 (1.23-6.88, 0.00149) and 4.46 (1.68-11.86, 0.0027) at 5-7 and 7-9 sleep hours. The OR values for depression were 4.82 (1.70-13.66, 0.0031) and 4.13 (1.19-14.31, 0.0026) at 5-7 and 7-9 sleep hours. The OR values for allergy were 2.78 (1.22-6.32, 0.0015) and 3.93 (1.49-10.39, 0.0058) at 5-7 and 7-9 sleep hours. 
For health exams, the OR was statistically significant for liver diseases at 1-2 health exams, 0.35 (0.14-0.90, 0.00301); for musculoskeletal diseases at 3-4 health exams, 0.26 (0.09-0.79, 0.0175); for depression at 3-4 health exams, 0.17 (0.04-0.66, 0.0106); for allergies at 1-2 health exams, 0.30 (0.13-0.70, 0.0055); and for thyroid diseases at 1-2 health exams and annual health exams, 0.07 (0.01-0.60, 0.00154) and 0.09 (0.01-0.96, 0.0461). Health behaviors thus differed significantly by disease, especially in sleep hours and the number of health exams. Conclusion: Only sleep hours and health exams showed statistically significant differences across chronic diseases. Sleep hours were positively correlated with the risk of disease, while health exams were inversely related.
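
The results above are reported as OR (95% CI, P-value). As a rough illustration of how an odds ratio and its interval are formed, the sketch below computes an unadjusted OR from a 2x2 table with the standard log-scale (Woolf) interval. The abstract's values are adjusted for confounders, so this is a simpler calculation than the one behind the reported numbers; the function name and the cell counts are hypothetical.

```python
from math import exp, log, sqrt
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald-type confidence interval.

    Table layout (hypothetical counts):
                     disease   no disease
        exposed         a          b
        unexposed       c          d
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1, as in the sleep-hour results above, corresponds to significance at roughly the 5% level.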

A Study on Infant Weaning Practices Based on Maternal Education and Income Levels (양육인의 교육 및 수입정도에 따른 이유기 식생활관리에 대한 실태조사)

  • Kim, Song-Suk
    • Journal of the Korean Society of Food Science and Nutrition
    • /
    • v.34 no.7
    • /
    • pp.1000-1007
    • /
    • 2005
  • The aim of the present study was to examine the relationship of maternal factors such as knowledge, attitude and practice of weaning with infant feeding. The subjects were 103 mothers visiting a public health center in Gumi, Kyungbook, who filled out self-administered questionnaires. About 90% of the participants recognized the importance of complementary foods and proper weaning practices. Recognition of the importance of the infant weaning process differed significantly by education level. Concerning the appropriate time for the introduction of weaning foods, 53% of mothers had commenced weaning at age 4-6 months, while 38% had done so at age 6-8 months. Approximately 76% of mothers fed their babies without knowledge of age-appropriate weaning methods and types of weaning foods. There were no statistical differences in maternal weaning knowledge between levels of education and household income. Mothers with higher levels of education and family income tended to show high perception scores regarding the possibility of food allergies caused by baby foods. Demand for reliable sources and education related to nutritious weaning foods and weaning practices was strong in the group with higher education. Knowledge of weaning methods and baby foods was obtained from mass media by 59 of the 103 mothers and from friends caring for babies by 35, while 9 obtained advice from health professionals or family. Advice from health professionals was not the main influence on their decision to introduce weaning foods. Although commercial baby foods were the most commonly used first weaning foods, mothers in the higher education groups considered that commercial baby foods are not nutritionally better than home-made foods. The current findings suggest that, to improve the weaning process, mothers should be educated on the selection and preparation of nutritious, balanced weaning foods and on good weaning practices. 
It is advised that supportive health professionals from community public health centers should lead the education of infant feeding practices based on maternal characteristics and on basic food and nutritional knowledge.

Identification of White Hanwoo Breed Using Single Nucleotide Polymorphism Markers (단일염기다형성 마커를 이용한 백우 품종 식별 방법)

  • Kim, Seungchang;Kim, Kwanwoo;Roh, Heejong;Kim, Dongkyo;Kim, Sungwoo;Kim, Chalan;Lee, Sanghoon;Ko, Yeounggyu;Cho, Changyeon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.1
    • /
    • pp.240-246
    • /
    • 2020
  • This study was conducted to develop specific Single Nucleotide Polymorphism (SNP) markers to identify the genetic characteristics and breed of White Hanwoo (WH) using a molecular biological method. SNP genotyping was performed with an Illumina Bovine HD 777K SNP chip using DNA extracted from 48 Hanwoo and 22 WH. The minor allele frequency (MAF) difference of each SNP was calculated, and the statistical significance (P-value) of the MAF difference was calculated through Fisher's Exact test (Genotype). SNPs with a 100% MAF difference were selected based on the marker selection criteria, yielding nine SNP markers with genetic differences. The selected markers have different alleles that are specific to Hanwoo and to WH. Based on these results, it can be concluded that the Hanwoo and WH varieties can be clearly distinguished by using these SNPs, and a patent for the WH breed identification markers was registered. WH is a breed that shows the characteristics of a Korean native species that is separate from the native Hanwoo. It is expected that genetic characteristics research on the WH can be used to identify the breed and as a knowledge base for enhancing the value of breeding stock.
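
The marker selection described above combines an allele frequency difference with Fisher's exact test. The sketch below shows both calculations for a single SNP using only the Python standard library. It works on allele counts in a 2x2 table, which is an assumption (the abstract mentions a genotype-based test), and the counts are hypothetical, sized to match 48 Hanwoo and 22 WH animals with two alleles each.

```python
from math import comb


def allele_freq_difference(a: int, b: int, c: int, d: int) -> float:
    """Absolute difference in the frequency of one allele between two breeds.

    a, b: counts of allele 1 and allele 2 in breed 1
    c, d: counts of allele 1 and allele 2 in breed 2
    """
    return abs(a / (a + b) - c / (c + d))


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))  # small tolerance for float ties
    return min(p, 1.0)


# Hypothetical fully breed-specific SNP: 96 Hanwoo alleles vs 44 WH alleles.
diff = allele_freq_difference(96, 0, 0, 44)
p_value = fisher_exact_2x2(96, 0, 0, 44)
```

Under these assumed counts the frequency difference is 1.0, which matches the 100% difference criterion stated in the abstract.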

A Study on Epidemiologic Characteristics of Recurrent Abdominal Pain in Elementary School Children (반복성 복통증 환아의 역학적 특징에 관한 조사)

  • Oh, Sang-Hyun;Yang, Eun-Seok;Park, Sang-Kee;Park, Young-Bong;Park, Jong;Park, Sang-Hak;Moon, Kyung-Rye
    • Pediatric Gastroenterology, Hepatology & Nutrition
    • /
    • v.2 no.1
    • /
    • pp.21-29
    • /
    • 1999
  • Purpose: The aims of this study were to examine the clinical characteristics, patterns of medical care utilization, and factors determining medical care utilization of elementary school children with recurrent abdominal pain (RAP), and to find possible factors influencing the onset and course of the disorder. Method: We administered questionnaires to children from two primary schools in Kwangju from June 1 to June 30, 1998, and carried out statistical analysis. Result: 1) A total of 1417 questionnaires were collected; 715 respondents were male and 702 were female, a male-to-female ratio of 1.02:1. The average age was 10.3 years. 2) 268 children (18.9%) had RAP: 132 boys (18.4%) and 136 girls (19.2%). 3) The pain lasted 10 minutes or less in 68.5% of cases. 178 children with RAP (66.3%) visited a doctor. Among pupils with RAP, the most frequently utilized medical facility was pediatrics (35.2%), followed by internal medicine (31.5%) and pharmacy (29.25). Among older students, the utilization rate of pediatrics decreased, while that of internal medicine increased. The major factors affecting the selection of a medical facility were geographic accessibility, kindness of the personnel, good results, and traffic convenience. 4) Symptoms accompanying abdominal pain were headache (44.5%), chest pain (28.2%), dizziness (26.6%), and vomiting (9%), and 119 children (44.5%) had no accompanying symptoms. 5) Abdominal pain occurred after meals in 95 children (35.3%), before meals in 55 children (20.5%), and at school in 39 children (14.7%). The highest incidence rate of RAP was observed on Monday (21.4%), and the lowest on Saturday (8.7%). 6) The most frequently involved part of the abdomen was the periumbilical area (38%), followed by the epigastrium and the suprapubic area. 
The most frequent type of abdominal pain was burning pain (36.9%), followed by dull, cramping, and colicky pain. Conclusion: RAP is a frequent disease entity in children. Too often, children with RAP are treated by departments other than pediatrics. A child's growth and development are distinct from those of an adult. It is therefore necessary to choose specialized medical care and an adequate medical facility.


Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • The recent explosive growth of electronic commerce provides many advantageous purchase opportunities to customers. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide recommendation lists to the users. Thus, the product recommender system in an online shopping store is known as one of the most popular tools for one-to-one marketing. However, recommender systems which do not properly reflect users' preferences cause disappointment and waste users' time. In this study, we propose a novel recommender system which uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting users' preferences precisely. The research data were collected from a real-world online shopping store which deals in products from famous art galleries and museums in Korea. The data initially contained 5759 transactions, of which 3167 remained after deletion of null data. We transformed the categorical variables into dummy variables and excluded outlier data. The proposed model consists of two steps. The first step predicts customers who have a high likelihood of purchasing products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who have a high likelihood of purchasing products in each product group. We performed the above data mining techniques using SAS E-Miner software. We partitioned the datasets into two sets, modeling and validation, for the logistic regression and decision trees. We also partitioned the datasets into three sets, training, test, and validation, for the artificial neural network model. The validation dataset is the same for all experiments. 
Then we combine the results of each predictor using multi-model ensemble techniques such as bagging and bumping. Bagging is the abbreviation of "Bootstrap Aggregation," and it combines outputs from several machine learning techniques to raise the performance and stability of prediction or classification. This technique is a special form of the averaging method. Bumping is the abbreviation of "Bootstrap Umbrella of Model Parameter," and it only considers the model which has the lowest error value. The results show that bumping outperforms bagging and the other predictors except for the "Poster" product group. For the "Poster" product group, the artificial neural network model performs better than the other models. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extracted thirty-one association rules according to the values of the Lift, Support, and Confidence measures. We set the minimum transaction frequency to support associations to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. This study also excludes extracted association rules with a lift value below 1. We finally obtained fifteen association rules after excluding duplicate rules. Among the fifteen association rules, eleven rules contain associations between products in the "Office Supplies" product group, one rule includes an association between the "Office Supplies" and "Fashion" product groups, and the other three rules contain associations between the "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We tested the usability of the proposed system by using a prototype and real-world transaction and profile data. To this end, we constructed the prototype system using ASP, JavaScript, and Microsoft Access. 
In addition, we surveyed user satisfaction with the recommended product list from the proposed system and with randomly selected product lists. The survey participants were 173 persons who use MSN Messenger, Daum Café, and P2P services. We evaluated user satisfaction using a five-point Likert scale. This study also performed a paired-sample t-test on the results of the survey. The results show that the proposed model outperforms the random selection model at the 1% statistical significance level. This means that users were significantly more satisfied with the recommended product list. The results also show that the proposed system may be useful in real-world online shopping stores.
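
The market basket step above filters rules by support, confidence, and lift. The following sketch computes these three measures for one rule and applies the thresholds stated in the abstract (support of at least 5%, confidence of at least 10%, lift of at least 1). The function names, the toy transactions, and the item names are hypothetical; the study itself used SAS E-Miner, not this code.

```python
from typing import Iterable, List, Set, Tuple


def rule_metrics(transactions: List[Set[str]],
                 antecedent: Iterable[str],
                 consequent: Iterable[str]) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # baskets with antecedent
    n_c = sum(1 for t in transactions if c <= t)          # baskets with consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # baskets with both
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def passes_thresholds(support: float, confidence: float, lift: float,
                      min_support: float = 0.05,
                      min_confidence: float = 0.10,
                      min_lift: float = 1.0) -> bool:
    """Apply the thresholds quoted in the abstract; rules with lift below 1 are dropped."""
    return (support >= min_support
            and confidence >= min_confidence
            and lift >= min_lift)


# Hypothetical baskets, for illustration only.
baskets = [{"pen", "notebook"}, {"pen", "notebook"}, {"pen"}, {"scarf"}]
metrics = rule_metrics(baskets, ["pen"], ["notebook"])
keep = passes_thresholds(*metrics)
```

A lift above 1 indicates that the two items are bought together more often than would be expected if purchases were independent, which is why rules below that value are excluded.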