• Title/Summary/Keyword: Regression models

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes applying CNN to stock market prediction, one of the most challenging tasks in machine learning research. Because CNN is strong at interpreting images, the proposed model adopts CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. In other words, we propose a machine learning algorithm that mimics experts called 'technical analysts', who examine graphs of past price movements to predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into 5-day intervals. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in a $40 \times 40$ pixel image, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices: each image becomes a combination of three matrices that express color values on the R (red), G (green), and B (blue) scales. In step 4, it splits the dataset of graph images into training and validation datasets, using 80% of the total dataset for training and the remaining 20% for validation. Finally, in step 5, CNN classifiers are trained on the images of the training dataset. 
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer and a $2 \times 2$ max pooling filter in the pooling layer. The two hidden layers had 900 and 32 nodes, respectively, and the output layer had 2 nodes (one for predicting an upward trend and the other for a downward trend). The activation function was ReLU (Rectified Linear Unit) for the convolution and hidden layers and Softmax for the output layer. To validate CNN-FG, we applied it to the prediction of the KOSPI200 over 2,026 days in eight years (from 2009 to 2016). To balance the proportions of the two groups of the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset from 80% of the total dataset (1,560 samples) and the validation dataset from the remaining 20% (390 samples). The independent variables of the experimental dataset were twelve technical indicators widely used in previous studies, including Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), and CCI (commodity channel index). To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective in terms of prediction accuracy. 
Thus, this paper sheds light on how to apply deep learning techniques to business problem solving.
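The CNN-FG layer sizes above can be sanity-checked with the standard output-size arithmetic for convolution and pooling. This is a minimal sketch, not the authors' code: it assumes that "$5 \times 5 \times 6$" and "$5 \times 5 \times 9$" denote 5×5 kernels with 6 and 9 filters, stride 1, and a 2×2 max-pooling layer after each convolution, none of which the abstract states explicitly. The helper names are hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a square convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size: int, window: int = 2, stride: int = 2) -> int:
    """Spatial output size of non-overlapping max pooling."""
    return (size - window) // stride + 1


def flattened_size(image_size: int = 40, padding: int = 0) -> int:
    """Trace a square RGB input through conv(5x5) -> 2x2 pool -> conv(5x5) -> 2x2 pool.

    ASSUMPTION: 6 and 9 filters, stride 1, a pooling layer after each
    convolution, and the same padding on both convolutions; the abstract
    specifies none of these details.
    """
    size, channels = image_size, 3  # 40 x 40 image stored as R, G, B matrices
    for filters in (6, 9):  # assumed filter counts of the two convolutions
        size = conv_output_size(size, kernel=5, padding=padding)
        size = pool_output_size(size)
        channels = filters
    return size * size * channels


print(flattened_size(padding=0))  # 441: 40 -> 36 -> 18 -> 14 -> 7, times 9 maps
print(flattened_size(padding=2))  # 900: 40 -> 40 -> 20 -> 20 -> 10, times 9 maps
```

Under "same" padding the flattened size comes out at 900, the size reported for the first hidden layer; the abstract does not say whether that number was chosen this way.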

Pareto Ratio and Inequality Level of Knowledge Sharing in Virtual Knowledge Collaboration: Analysis of Behaviors on Wikipedia (지식 공유의 파레토 비율 및 불평등 정도와 가상 지식 협업: 위키피디아 행위 데이터 분석)

  • Park, Hyun-Jung;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.19-43 / 2014
  • The Pareto principle, also known as the 80-20 rule, states that for many events, including natural phenomena, roughly 80% of the effects come from 20% of the causes. It has been recognized as a golden rule in business and is widely applied, for example in the observation that 20 percent of customers account for 80 percent of total sales. On the other hand, the Long Tail theory, which holds that "the trivial many" produce more value than "the vital few," has recently gained popularity as developments in ICT (Information and Communication Technology) have drastically reduced distribution and inventory costs. This study set out to illuminate how these two primary business paradigms, the Pareto principle and the Long Tail theory, relate to the success of virtual knowledge collaboration. The importance of virtual knowledge collaboration is soaring in this era of globalization and virtualization, which transcends geographical and temporal constraints. Many previous studies on knowledge sharing have focused on the factors that affect it, seeking to boost individual knowledge sharing and to resolve the social dilemma arising from the fact that rational individuals are more likely to consume knowledge than to contribute it. Knowledge collaboration can be defined as the creation of knowledge not only by sharing knowledge but also by transforming and integrating it. From this perspective, the relative distribution of knowledge sharing among participants can matter as much as the absolute amount of knowledge each individual shares. In particular, an issue of interest is whether greater contribution by the top 20 percent of participants enhances the efficiency of overall knowledge collaboration. This study examines the effect of this kind of knowledge sharing distribution on the efficiency of knowledge collaboration and extends the analysis to reflect work characteristics. 
All analyses were conducted on actual behavioral data rather than self-reported questionnaire surveys. More specifically, we analyzed the collaborative behaviors of the editors of 2,978 English Wikipedia featured articles, the highest quality grade of articles in English Wikipedia. To examine the effect of the Pareto principle, we adopted the Pareto ratio: the number of knowledge contributions made by the top 20 percent of participants divided by the total number of knowledge contributions made by all participants of an article group. In addition, the Gini coefficient, which is commonly used to measure income inequality within a group of people, was applied to reveal the effect of inequality in knowledge contribution. Hypotheses were set up on the assumption that a higher share of knowledge contribution by more highly motivated participants leads to higher collaboration efficiency, but that if this share becomes too high, collaboration efficiency deteriorates because overall informational diversity is threatened and less motivated participants are discouraged from contributing. Cox regression models were formulated for each of the focal variables (Pareto ratio and Gini coefficient) with seven control variables, such as the number of editors involved in an article, the average time between successive edits of an article, and the number of sections in a featured article. The dependent variable of the Cox models is the time from article initiation to promotion to featured-article status, which indicates the efficiency of knowledge collaboration. To examine whether the effects of the focal variables vary with the characteristics of the group task, we classified the 2,978 featured articles into two categories: Academic and Non-academic. Academic articles are those that cite at least one paper published in an SCI, SSCI, A&HCI, or SCIE journal. 
We assumed that academic articles are more complex, entail more information processing and problem solving, and thus require more skill variety and expertise. The results indicate the following. First, the Pareto ratio and the inequality of knowledge sharing relate to collaboration efficiency in an online community in a curvilinear fashion, promoting it up to an optimal point and undermining it thereafter. Second, this curvilinear effect of the Pareto ratio and inequality of knowledge sharing on collaboration efficiency is more pronounced for more academic tasks in an online community.
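Both focal variables can be computed directly from per-editor contribution counts. A minimal sketch, assuming each article's contributions are available as a list of edit counts per editor; the function names, the rounding rule for the top 20% (rounded up, at least one participant), and the sample counts are assumptions, not details from the study.

```python
import math


def pareto_ratio(contributions: list[int], top_share: float = 0.2) -> float:
    """Share of all contributions made by the top `top_share` of participants.

    ASSUMPTION: the number of top participants is rounded up and is at least 1;
    the abstract does not say how fractional counts are handled.
    """
    if not contributions or sum(contributions) == 0:
        raise ValueError("need at least one non-zero contribution")
    ranked = sorted(contributions, reverse=True)
    k = max(1, math.ceil(top_share * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


def gini(contributions: list[int]) -> float:
    """Gini coefficient: 0 means perfectly equal, (n-1)/n means one person did everything."""
    n = len(contributions)
    total = sum(contributions)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero contribution")
    xs = sorted(contributions)  # ascending order
    # Standard formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


edits = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]  # hypothetical edit counts of one article's 10 editors
print(pareto_ratio(edits))    # 0.7 (the top 2 editors made 70 of 100 edits)
print(round(gini(edits), 2))  # 0.63
```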

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.149-169 / 2020
  • The "Urban Regeneration New Deal Project", one of the government's major national projects, aims to develop underdeveloped areas by investing 50 trillion won in 100 locations in the first year and 500 locations over the next four years. The project is drawing keen attention from the media and local governments. However, the project model fails to reflect the inherent characteristics of each area because it divides project areas into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type." The keywords for successful urban regeneration in Korea, namely "resident participation," "regional specialization," "ministerial cooperation," and "public-private cooperation," suggest that when local governments propose urban regeneration projects to the central government, it is most important to understand the characteristics of the city accurately and to pursue projects that suit those characteristics with the help of local residents and private companies. In addition, considering gentrification, one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types that suit the characteristics of the area. To address the limitations of the 'Urban Regeneration New Deal Project' methodology, this study proposes a system that uses various machine learning algorithms to recommend suitable urban regeneration types for urban regeneration sites, referring to the urban regeneration types of the '2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan', which is based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017). 
To identify regional characteristics, approximately 100,000 pieces of text data were collected for 22 regions where projects were carried out under the four types of urban regeneration. Using the collected data, we extracted the key keywords for each region by urban regeneration type and conducted topic modeling to explore whether the types differed. As a result, many topics related to real estate and the economy appeared in old residential areas, whereas in declining and underdeveloped areas, topics reflecting the characteristics of areas where industrial activities were active in the past appeared. In the historical and cultural resource areas, which contain traces of the past, many keywords related to the government appeared, and political and cultural topics resulting from various events could be identified. Finally, in low-use and underdeveloped areas, many topics on real estate and accessibility emerged, indicating good accessibility; these areas mainly had the characteristics of regions where development is planned or likely. Furthermore, a model was implemented that recommends urban regeneration types tailored to regional characteristics for regions outside Seoul. The model was built with machine learning, and the data were randomly split into training and test sets at an 8:2 ratio. To compare performance across models, two input representations (Count Vector and TF-IDF Vector) were combined with five classifiers (SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting), yielding a total of 10 models. The best-performing model was Gradient Boosting with TF-IDF Vector input, with an accuracy of 97%. 
Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new project sites in the process of carrying out urban regeneration projects.
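The two input representations compared above can be illustrated with a small pure-Python sketch. It assumes the texts are already tokenized, uses raw term counts for the Count Vector and, for TF-IDF, the smoothed IDF $\ln\frac{1+n}{1+\mathrm{df}} + 1$ with L2 row normalization (the default in scikit-learn's `TfidfVectorizer`); the study's exact preprocessing and weighting are not given. The documents and helper names are hypothetical, and the 8:2 split uses a fixed seed only for reproducibility.

```python
import math
import random
from collections import Counter


def count_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Count Vector: one column per vocabulary term, raw term frequencies."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """TF-IDF Vector: counts weighted by smoothed IDF, then L2-normalized per document."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]  # rarer terms weigh more
    vectors = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        vectors.append([v / norm for v in weighted])
    return vocab, vectors


def split_80_20(items, seed: int = 0):
    """Random 8:2 train/test split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


docs = [  # hypothetical tokenized documents, one per region
    ["real_estate", "economy", "housing"],
    ["industry", "factory", "economy"],
    ["history", "culture", "festival"],
    ["real_estate", "access", "station"],
]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
train, test = split_80_20(range(10))
print(len(vocab), len(train), len(test))  # 10 8 2
```

Either matrix can then be passed to any of the five classifiers; TF-IDF differs only in down-weighting terms that occur in many regions' texts.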

Correlation between soluble transferrin receptor concentration and inflammatory markers (수용성 트랜스페린 수용체의 농도와 염증 인자와의 관련성에 관한 연구)

  • Kim, So Young;Son, Meong Hi;Yeom, Jung suk;Park, Ji sook;Park, Eun Sil;Seo, Ji-Hyun;Lim, Jae-Young;Park, Chan-Hoo;Woo, Hyang-Ok;Youn, Hee-Shang
    • Clinical and Experimental Pediatrics / v.52 no.4 / pp.435-440 / 2009
  • Purpose: Soluble transferrin receptor (sTfR) concentration is used as an iron parameter to evaluate erythropoiesis and iron status. The aim of our study was to evaluate the correlation between sTfR concentration and inflammatory parameters and to distinguish iron deficiency anemia from anemia of inflammation. Methods: One hundred and forty-four infants younger than two years of age who visited Gyeongsang University Hospital over the 7 years from 2000 to 2006 were enrolled. Patients who had hemoglobin (Hb) <11 g/dL and ferritin <12 μg/L were excluded. Routine hematologic laboratory tests, serum ferritin, sTfR, and inflammatory markers [C-reactive protein (CRP), interleukin-6 (IL-6), and absolute neutrophil count (ANC)] were investigated. Results: In all patients, sTfR concentration correlated with Hb, ferritin, MCV, and ANC, but not with CRP or IL-6. In multiple regression models, a positive correlation was found between sTfR concentration and IL-6 (r=0.078, P=0.043), and negative correlations were found between sTfR concentration and ANC (r=-0.117, P=0.033) and MCV (r=-0.027, P=0.009). Conclusion: sTfR concentration was influenced by inflammatory parameters. Therefore, sTfR does not appear to be a useful parameter for discriminating between iron deficiency anemia and anemia of inflammation in infants.

Development of Estimation Models for Parking Units -Focused on Gwangju Metropolitan City Condominium Apartments- (주차원단위 산정 모형 개발에 관한 연구 -광주광역시 공동 주택 아파트를 대상으로-)

  • Kwon, Sung-Dae;Ko, Dong-Bong;Park, Je-Jin;Ha, Tae-Jun
    • KSCE Journal of Civil and Environmental Engineering Research / v.34 no.2 / pp.549-559 / 2014
  • The rapid expansion of cities led to a shortage of housing in urban areas. The government compensated for this shortage through large-scale residential developments that increased the housing supply. Condominium apartments continue to account for more than 83% of the entire housing supply, and the proportion of apartments is steadily increasing, at about 50%. Because of this increase, illegally parked cars resulting from the shortage of parking spaces within apartment complexes have become increasingly problematic: they block the passage of emergency vehicles and heighten tension among neighboring residents competing for parking spaces. In particular, parking for future residents is planned on the basis of estimated parking demand. However, the parking unit method used to estimate parking demand is based on exclusive-use floor area, which is believed to be far from actual parking demand. The reason for this discrepancy is that as the number of households decreases and the exclusive-use area expands, the planned parking increases, whereas as the number of households increases and the exclusive-use area shrinks, the planned parking decreases. Methods to recalculate parking units based on estimated parking demand are therefore an urgent concern. To estimate parking units for condominium apartments, this study first reviewed the existing literature and selected survey sites to collect the necessary data. Field study data and surveys were then collected and analyzed to identify the problems underlying parking units, and problems with the current parking unit calculation method used in traffic impact assessment were derived. 
By identifying the factors that influence parking demand and performing a factor analysis on the collected data, variables related to parking demand were selected to develop the parking unit estimation model. Finally, by comparing the existing traffic impact assessment parking unit estimate with the newly developed model on the collected data, a far more realistic parking unit estimate reflecting the characteristics of the residents was suggested. The parking unit estimation model developed in this study is expected to serve as a guideline for future parking lot legislation, as well as a basis for more realistic estimates of parking demand based on the resident characteristics of an apartment complex.

A Study on the Determinants of Consumer-Oriented Nursing Service Quality;SERVQUAL Model based (소비자 중심의 간호서비스 질 결정요인에 관한 연구;SERVQUAL모형을 중심으로)

  • Joo, Mee-Kyoung
    • Journal of Korean Academy of Nursing Administration / v.8 no.1 / pp.169-191 / 2002
  • As society becomes increasingly centered on consumers and services, patients are demanding better medical services. Influenced by various social surroundings, consumers have come to hold certain expectations of nursing services, and they perceive and evaluate the quality of the services they actually receive against those expectations. It is therefore necessary to know exactly what consumers want from nursing services. The purpose of this study is to examine the determinants of nursing service quality by investigating consumers' expectations and perceptions of nursing services according to consumer-oriented attributes, on the basis of the SERVQUAL model. The subjects were 1,144 outpatients who continued to visit the same hospital after being hospitalized and nursed at one of 9 hospitals randomly selected from second-level medical institutions in Seoul, from January to February 2001. The collected data were analyzed with descriptive statistics, t-tests, GLM, and multiple regression using the SAS program. The Delphi method was used to develop the research tool, and the results are as follows. The determinants for evaluating the quality of nursing services consist of 5 categories: Tangibility, Reliability, Responsiveness, Assurance, and Empathy. Cronbach's $\alpha$ was 0.96 for the expectation of nursing services, 0.94 for the perception of nursing services, and 0.96 for the importance of nursing services. The determinants in the expectation of nursing services ranked in the order of Assurance, Empathy, Reliability, Responsiveness, and Tangibility; those in the perception of nursing services in the order of Assurance, Empathy, Reliability, Tangibility, and Responsiveness; and those in the importance of nursing services in the order of Empathy, Assurance, Reliability, Tangibility, and Responsiveness. 
Finally, those in the quality of nursing services ranked in the order of Tangibility, Responsiveness, Empathy, Reliability, and Assurance. Expectations of nursing services differed by the subjects' age, gender, clinical department, and reason for hospitalization. Hypothesis testing showed that the group with higher personal needs differed significantly in its expectations of nursing services, and that subjects who had external communication perceived nursing services more highly than those who did not. Thus, perception of nursing services differed significantly depending on whether the subjects had external communication. The determinants in the expectation of nursing services explained up to 14.96% of the variance in nursing service quality; the statistically significant determinants in the expectation were, in order, Reliability, Assurance, and Tangibility, and higher expectation was associated with a lower evaluation of nursing service quality. The determinants in the perception of nursing services explained up to 29.85% of the variance in nursing service quality; the statistically significant determinants in the perception were, in order, Responsiveness, Reliability, Tangibility, Empathy, and Assurance, and higher perception was associated with a higher evaluation of nursing service quality. Based on these results, I would like to make the following proposals. Because this research is oriented toward understanding consumer-oriented nursing services, further research should identify other elements that determine the quality of nursing services. 
Furthermore, because this research is based on the SERVQUAL model of Parasuraman, A., et al. (1991), which deals only with the expectation, perception, and quality of consumer-oriented nursing services, it will be necessary in the future to examine and verify the findings with other models that include nursing service providers. On the other hand, because this research evaluates the actual quality of nursing services based on the expectation and perception of nursing services, it can be used as fundamental data for developing marketing strategies and for estimating service quality. I hope that this research will be repeated periodically to provide useful data for developing marketing strategies in the nursing service area.
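Two quantities mentioned above, Cronbach's $\alpha$ and a quality score derived from expectation and perception, can be sketched as follows. The reliability formula is the standard $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$ for $k$ items with total score $T$; the gap score assumes the conventional SERVQUAL definition (perception minus expectation), because the abstract does not state the exact formula used. The scores and helper names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][r] is respondent r's score on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def gap_scores(perception: list[float], expectation: list[float]) -> list[float]:
    """ASSUMED SERVQUAL-style quality per item: perception minus expectation."""
    return [p - e for p, e in zip(perception, expectation)]


# Hypothetical 5-point scores: 3 items (rows) answered by 4 respondents (columns).
expectation_items = [[5, 4, 4, 3], [5, 5, 4, 3], [4, 4, 3, 3]]
print(round(cronbach_alpha(expectation_items), 3))              # 0.915
print(gap_scores(perception=[4, 3, 5], expectation=[5, 4, 4]))  # [-1, -1, 1]
```

A negative gap means the service fell short of what the respondent expected, which is consistent with the finding above that higher expectation goes with lower evaluated quality.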

Managerial Implication of Trails in the Taebaeksan National Park Derived from the Analysis of Visitors Behaviors Using Automatic Visitor Counter Data (탐방객 자동 계수기 데이터를 활용한 태백산국립공원 탐방로 탐방 행태 분석 및 관리 방안 제언)

  • Sung, Chan Yong;Cho, Woo;Kim, Jong-Sub
    • Korean Journal of Environment and Ecology / v.34 no.5 / pp.446-453 / 2020
  • This study built a model to predict the daily number of visitors to 18 trails in Taebaeksan National Park using automatic visitor counter data, analyzed the factors affecting the daily number of visitors to each trail, and classified the trails by visitor behavior. Results of the multiple regression models for the daily number of visitors to the 18 trails indicated that events such as the National Foundation Day celebration and the Snow Festival affected the number of visitors on all 18 trails and were the most critical factor determining the daily number of visitors to Taebaeksan National Park. Long holidays of three days or more and other national holidays also affected the daily number of visitors to the trails. Precipitation had a negative impact on the number of visitors to trails where most visitors came for sightseeing or camping rather than hiking, but had no significant impact on trails where many visitors came to hike. This indicates that visitors who intended to hike went ahead even when the weather was poor. Temperature had a positive effect on the number of visitors who intended to hike but a negative effect on the number of visitors to the trails near Danggol Plaza, where the Snow Festival is held every winter, suggesting that the Snow Festival is a decisive factor for trail management. K-means clustering showed that the 18 trails of Taebaeksan National Park could be classified into three types: those affected by the Snow Festival (type 1), those that have sightseeing points and are therefore visited mostly by non-hikers (type 2), and those visited mostly by hikers (type 3). Since visitor behaviors and illegal actions differ by trail type, the results of this study can be used to prepare trail management plans based on trail characteristics.
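The trail typology above comes from K-means clustering. A minimal sketch of Lloyd's algorithm follows; the feature vectors, the deterministic farthest-first initialization, and the Euclidean distance are assumptions, since the abstract does not say which variables or settings were used. Only k = 3 is taken from the reported result.

```python
import math


def farthest_first_init(points, k):
    """ASSUMED seeding: start at points[0], then repeatedly add the point
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm for k-means. Returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical two-dimensional features for six trails (for example, scaled regression effects).
trails = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
labels, centroids = kmeans(trails, k=3)
print(labels)  # [0, 0, 2, 2, 1, 1]
```

Plain random seeding can converge to a poor local optimum on small data like this, which is why the sketch uses a deterministic farthest-first start; k-means++ is a common alternative.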

Biomass, Net Production and Nutrient Distribution of Bamboo Phyllostachys Stands in Korea (왕대속(屬) 대나무림(林)의 물질생산(物質生産) 및 무기영양물(無機營養物) 분배(分配)에 관한 연구(硏究))

  • Park, In Hyeop;Ryu, Suk Bong
    • Journal of Korean Society of Forest Science / v.85 no.3 / pp.453-461 / 1996
  • Three Phyllostachys stands of P. pubescens, P. bambusoides, and P. nigra var. henonis in Sunchon were studied to investigate biomass, net production, and nutrient distribution. In each stand, five $10\,\mathrm{m} \times 10\,\mathrm{m}$ quadrats were set up and 20 sample culms aged 2 years or older were harvested for dimension analysis. One-year-old culms and subterranean parts were estimated by the harvested quadrat method. The largest mean DBH, height, and basal area were found in the P. pubescens stand, followed by the P. nigra var. henonis stand and the P. bambusoides stand. There was little difference in accuracy among three allometric biomass regression models, $\log Wt = A + B\log D$, $\log Wt = A + B\log(D^2H)$, and $\log Wt = A + B\log D + C\log H$, where Wt, D, and H are dry weight, DBH, and height, respectively. Analysis of covariance showed significant differences in intercept among the linear allometric biomass regressions of the three Phyllostachys species. Biomass including subterranean parts was largest in the P. pubescens stand (103.621 t/ha), followed by the P. nigra var. henonis stand (86.447 t/ha) and the P. bambusoides stand (36.767 t/ha). Leaf biomass was 6.3% to 7.8% of total biomass in each stand. The ratio of aboveground to subterranean biomass in each stand was 1.87 to 2.26. Net production including subterranean parts was greatest in the P. pubescens stand (6.115 t/ha/yr), followed by the P. nigra var. henonis stand (5.609 t/ha/yr) and the P. bambusoides stand (3.252 t/ha/yr). The highest net assimilation ratio was estimated in the P. pubescens stand (2.979), followed by the P. nigra var. henonis stand (2.752) and the P. bambusoides stand (2.187). The biomass accumulation ratio of each stand was 2.679 to 5.358. Concentrations of N, P, and Mg were highest in leaves, followed by subterranean parts and culms+branches, in all three species. The concentration of Ca was highest in leaves, followed by culms+branches and subterranean parts, in all three species. 
The differences in biomass among the three species' stands were caused by their culm size, leaf biomass, net assimilation ratio, and efficiency of leaves in producing culms.
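The first allometric model above, $\log Wt = A + B\log D$, is an ordinary least-squares line fitted on log-transformed data. A minimal sketch, assuming base-10 logarithms and hypothetical DBH and weight values; it omits the back-transformation bias correction that is sometimes applied when predicting on the original scale.

```python
import math


def fit_allometric(dbh: list[float], dry_weight: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(Wt) = A + B * log10(D). Returns (A, B)."""
    if len(dbh) < 2:
        raise ValueError("need at least two observations")
    xs = [math.log10(d) for d in dbh]
    ys = [math.log10(w) for w in dry_weight]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope B
    a = y_bar - b * x_bar    # intercept A
    return a, b


def predict_weight(a: float, b: float, d: float) -> float:
    """Back-transform the fitted line to the original weight scale."""
    return 10 ** (a + b * math.log10(d))


dbh_cm = [5.0, 8.0, 10.0, 12.0, 15.0]          # hypothetical DBH values
weight_kg = [0.1 * d ** 2.5 for d in dbh_cm]    # synthetic weights built with A = -1, B = 2.5
a, b = fit_allometric(dbh_cm, weight_kg)
print(round(a, 6), round(b, 6))  # -1.0 2.5
```

The $D^2H$ model is fitted the same way after replacing $D$ with $D^2H$; the three-parameter model needs a multiple regression on $\log D$ and $\log H$.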

Speed-up Techniques for High-Resolution Grid Data Processing in the Early Warning System for Agrometeorological Disaster (농업기상재해 조기경보시스템에서의 고해상도 격자형 자료의 처리 속도 향상 기법)

  • Park, J.H.;Shin, Y.S.;Kim, S.K.;Kang, W.S.;Han, Y.K.;Kim, J.H.;Kim, D.J.;Kim, S.O.;Shim, K.M.;Park, E.W.
    • Korean Journal of Agricultural and Forest Meteorology / v.19 no.3 / pp.153-163 / 2017
  • The objective of this study is to speed up the models that estimate weather variables (e.g., minimum/maximum temperature, sunshine hours, and PRISM (Parameter-elevation Regression on Independent Slopes Model) based precipitation) used in the Agrometeorological Early Warning System (http://www.agmet.kr). Weather estimation currently runs on high-performance multi-core CPUs with 8 physical cores and 16 logical threads. Nonetheless, the server is not even dedicated to handling a single county, indicating that calculating the 10 counties of the Seomjin River Basin involves very high overhead. To reduce this overhead, several caching and parallelization techniques were applied, and their performance and applicability were measured. The results are as follows: (1) for simple calculations such as Growing Degree Days accumulation, the time required for input and output (I/O) is significantly greater than that for calculation, suggesting the need for a technique that reduces disk I/O bottlenecks; (2) when there are many I/O operations, it is advantageous to distribute them across several servers, but each server must have its own cache for input data so that the servers do not compete for the same resource; and (3) GPU-based parallel processing is most suitable for models with large computational loads, such as PRISM.

Association between Sleep Duration, Dental Caries, and Periodontitis in Korean Adults: The Korea National Health and Nutrition Examination Survey, 2013~2014 (한국 성인에서 수면시간과 영구치 우식증 및 치주질환과의 관련성: 2013~2014 국민건강영양조사)

  • Lee, Da-Hyun;Lee, Young-Hoon
    • Journal of dental hygiene science / v.17 no.1 / pp.38-45 / 2017
  • We evaluated the association between sleep duration, dental caries, and periodontitis using representative nationwide data. We examined 8,356 subjects aged $\geq 19$ years who participated in the sixth Korea National Health and Nutrition Examination Survey (2013~2014). Sleep duration was grouped into $\leq 5$, 6, 7, 8, and $\geq 9$ hours. Presence of dental caries was defined as caries in $\geq 1$ permanent tooth on dental examination. Periodontal status was assessed using the community periodontal index (CPI), and a CPI code of $\geq 3$ was defined as periodontitis. A chi-square test and multiple logistic regression analysis were used to determine statistical significance. Model 1 was adjusted for age and sex; model 2 additionally for household income, educational level, and marital status; and model 3 additionally for smoking status, alcohol consumption, blood pressure level, fasting blood glucose level, total cholesterol level, and body mass index. The prevalence of dental caries by sleep duration followed a U-shaped curve: 33.4%, 29.4%, 28.4%, 29.4%, and 31.8% for $\leq 5$, 6, 7, 8, and $\geq 9$ hours of sleep, respectively. In the fully adjusted model 3, the risk of dental caries was significantly higher with $\leq 5$ than with 7 hours of sleep (odds ratio, 1.23; 95% confidence interval, 1.06~1.43). The prevalence of periodontitis by sleep duration also followed a U-shaped curve: 34.4%, 28.6%, 28.1%, 31.3%, and 32.5%, respectively. The risk of periodontitis was significantly higher with $\geq 9$ than with 7 hours of sleep in models 1 and 2, but the association was no longer significant in model 3. In a nationally representative sample, sleep duration was significantly associated with dental caries and weakly associated with periodontitis. Adequate sleep is required to prevent oral diseases such as dental caries and periodontitis.
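The odds ratios above come from adjusted multiple logistic regression. As a simpler illustration of how an odds ratio and its 95% confidence interval relate, the sketch below computes a crude (unadjusted) odds ratio from a 2×2 table with the Woolf (logit) interval. It does not reproduce models 1 to 3, and the counts are invented rather than taken from the survey.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio and Woolf (logit) confidence interval from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the Woolf interval needs a continuity correction")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Invented counts for illustration only:
# short sleepers: 20 with caries, 80 without; 7-hour sleepers: 10 with, 90 without.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With these toy counts the point estimate is 2.25 but the interval includes 1, so the crude association would not be significant at the 5% level; the narrower interval reported in the study (1.06~1.43) reflects its much larger sample.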