• Title/Summary/Keyword: clusters

Spatial Distribution of Benthic Macroinvertebrate Assemblages in Wetlands of Jeju Island, Korea (제주도 일대 습지에 서식하는 저서성 대형무척추동물의 군집 분포 특성)

  • Yung Chul Jun;Seung Phil Cheon;Mi Suk Kang;Jae Heung Park;Chang Su Lee;Soon Jik Kwon
    • Korean Journal of Ecology and Environment / v.57 no.1 / pp.1-16 / 2024
  • Most wetlands worldwide have suffered from extensive human exploitation. Unfortunately, despite their ecological importance and economic value, they have been less studied than river and lake ecosystems, and Korea is no exception. This study aimed to estimate the assemblage attributes and distribution characteristics of benthic macroinvertebrates in fifty wetlands distributed throughout subtropical Jeju Island in 2021. A total of 133 taxa belonging to 53 families, 19 orders, 5 classes and 3 phyla were identified during the survey periods. Taxa richness ranged from 4 to 31 taxa per wetland, with an average of 17.5 taxa. Predatory insect groups such as Odonata, Hemiptera and Coleoptera accounted for 67.7% of the total taxa richness and 68.2% of the total abundance, and Coleoptera were the most diverse and abundant of these. Taxa richness and abundance did not differ significantly among wetland types classified according to the National Wetland Classification System. Three endangered species (Clithon retropictum, Lethocerus deyrolli and Cybister (Cybister) chinensis) were recorded, along with several species whose distribution is restricted to Jeju Island. Cluster analysis based on the similarity of benthic macroinvertebrate composition broadly divided the 50 wetlands into two major clusters: small wetlands in lowland areas and medium-sized wetlands in middle mountainous regions. The cluster groups differed significantly in wetland area, long-axis length, percentage of fine particles and macrophyte composition ratio. Indicator Species Analysis selected 19 important indicator species. Ceriagrion melanurum had the highest indicator value (63%), followed by Noterus japonicus (59%) and Polypylis hemisphaerula (58%).
Our results are expected to provide fundamental information on the biodiversity and habitat environments of benthic macroinvertebrates in wetland ecosystems. They should therefore help establish conservation and restoration plans for small wetlands, which are relatively vulnerable to human disturbance.
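Indicator Species Analysis is usually based on the IndVal index of Dufrêne and Legendre. For a species in a group, IndVal multiplies specificity by fidelity. Specificity is the species' mean abundance in that group divided by the sum of its mean abundances over all groups. Fidelity is the fraction of the group's sites where the species occurs. The abstract does not state the exact variant or the permutation test used. The following is therefore only a minimal sketch of that standard formula on hypothetical counts, and the wetland and species names are invented.

```python
from collections import defaultdict

def indval(abundance, groups):
    """IndVal (percent) for every (species, group) pair.

    abundance: dict site -> dict species -> count (missing species count as 0)
    groups:    dict site -> cluster label
    """
    sites_by_group = defaultdict(list)
    for site, label in groups.items():
        sites_by_group[label].append(site)

    species = {sp for counts in abundance.values() for sp in counts}
    scores = {}
    for sp in species:
        # Mean abundance of the species within each group.
        means = {
            g: sum(abundance[s].get(sp, 0) for s in sites) / len(sites)
            for g, sites in sites_by_group.items()
        }
        total = sum(means.values())
        for g, sites in sites_by_group.items():
            # Specificity A: share of the summed group means that falls in group g.
            a = means[g] / total if total else 0.0
            # Fidelity B: fraction of sites in group g where the species occurs.
            b = sum(1 for s in sites if abundance[s].get(sp, 0) > 0) / len(sites)
            scores[(sp, g)] = 100.0 * a * b
    return scores

# Hypothetical counts for four wetlands in two clusters.
abundance = {
    "W1": {"sp_a": 10, "sp_b": 1},
    "W2": {"sp_a": 6},
    "W3": {"sp_b": 4},
    "W4": {"sp_b": 2},
}
groups = {"W1": "lowland", "W2": "lowland", "W3": "montane", "W4": "montane"}
scores = indval(abundance, groups)
for (sp, g), value in sorted(scores.items()):
    print(f"{sp:5s} {g:8s} {value:6.1f}")
```

A species confined to one group and present at all of its sites scores 100. A widespread species scores lower in every group, which is why only a subset of taxa emerges as indicators.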

Breeding of New Ever-bearing Strawberry 'Miha' with High Hardness (고경도 사계성 딸기 '미하' 육성)

  • Jong Nam Lee;Jong Taek Suh;Su Jeong Kim;Hwang Bae Shon;Ki Deog Kim;Hye Jin Kim;Mi Ja Choi;Yul Ho Kim;Su Young Hong
    • Korean Journal of Plant Resources / v.37 no.1 / pp.87-92 / 2024
  • 'Miha' is a new strawberry (Fragaria × ananassa Duch.) cultivar released by the Highland Agriculture Research Institute in 2019. It originates from a 2014 cross between 'Monterey' and 'Saebong No. 3', both of which show excellent ever-bearing characteristics, including continuous flowering and large fruits under long-day, high-temperature conditions. Its characteristics and productivity were examined in summer cultivation from 2015 to 2019, and the line was designated 'Saebong No. 12'. After regional adaptability tests, 'Saebong No. 12' was selected as an elite cultivar and named 'Miha'. 'Miha' has intermediate, elliptic leaves and vigorous growth, and its fruits are conical and dark red. 'Miha' had 21.9 leaves, 6.2 fewer than the control cultivar 'Goha' (28.1). Its number of flower clusters was similar to that of 'Goha'. Its average fruit weight was 13.4 g, 4.3 g heavier than that of 'Goha'. Its fruit hardness was 36.2 g·mm⁻², 10.1 g·mm⁻² higher than that of 'Goha'. Its marketable yield was 37,393 kg·ha⁻¹, 156% of that of 'Goha' (23,970 kg·ha⁻¹), i.e. about 56% higher. Because of its high fruit hardness, the new ever-bearing cultivar 'Miha' is expected to be popular in the export and bakery markets.
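The comparisons with the control cultivar reduce to simple differences and one ratio. The short check below recomputes them from the reported figures. The 'Goha' fruit weight and hardness are derived from the stated differences and are not reported directly in the abstract. The check shows that 37,393 / 23,970 is about 1.56, meaning 'Miha' yielded about 156% of the 'Goha' yield, or about 56% more.

```python
# Values reported in the abstract.
miha = {"leaves": 21.9, "fruit_weight_g": 13.4, "hardness_g_mm2": 36.2, "yield_kg_ha": 37393}
goha_leaves = 28.1
goha_yield = 23970

# Control values implied by the reported differences (derived, not reported).
goha_fruit_weight = miha["fruit_weight_g"] - 4.3
goha_hardness = miha["hardness_g_mm2"] - 10.1

leaf_difference = goha_leaves - miha["leaves"]
yield_ratio = miha["yield_kg_ha"] / goha_yield     # 'Miha' yield as a multiple of 'Goha'
yield_increase_pct = (yield_ratio - 1) * 100       # percentage increase over 'Goha'

print(f"Leaf difference: {leaf_difference:.1f}")
print(f"Implied 'Goha' fruit weight: {goha_fruit_weight:.1f} g")
print(f"Implied 'Goha' hardness: {goha_hardness:.1f} g/mm^2")
print(f"Yield: {yield_ratio * 100:.0f}% of 'Goha', i.e. {yield_increase_pct:.0f}% more")
```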

Epidemiological Characteristic and Risk Factor of COVID-19 Cluster Related to Educational Facilities in Gangwon-do, Korea (December 10, 2020-September 23, 2021) (강원도내 교육시설관련 코로나바이러스감염증19 집단발생의 역학적특성과 위험요인 (2020.12.10-2021.9.23))

  • Hyosug Choi;Mi Young Kim;Shinyoung Lee;Eunmi Kim;Yeo Jin Kim
    • Pediatric Infection and Vaccine / v.31 no.1 / pp.102-112 / 2024
  • Purpose: To identify the epidemiological characteristics and risk factors of coronavirus disease 2019 (COVID-19) outbreaks by type of educational facility, through analysis of COVID-19 clusters associated with educational facilities. Methods: This study is based on epidemiological investigations of COVID-19 clusters in Gangwon-do, Korea. These clusters were reported to the Korea Disease Control and Prevention Agency's Integrated Disease and Health Management System between December 10, 2020 and September 23, 2021. The study population comprised 407 patients in 19 facilities whose clusters were classified as related to educational facilities. Three sources were analyzed retrospectively to evaluate infectivity and the characteristics of risk factors: preliminary epidemiological survey reports, in-depth epidemiological surveys by telephone, and risk assessments from field epidemiological investigations. Results: Of the 407 confirmed patients related to the 19 educational facilities, 204 (50.1%) were students under the age of 19. The preceding spreader was a family member for 155 patients (38.1%) and a teacher for 125 (30.7%). The most common place of exposure was the home (139 patients, 34.2%). Conclusions: Clusters related to educational facilities arose more from household transmission than from risks within school facilities. Nevertheless, continuous infection-control efforts are needed in educational facilities. Teachers' adherence to COVID-19 prevention principles and personal hygiene in daily life should also be strengthened.

Brief Introduction of Research Progresses in Control and Biocontrol of Clubroot Disease in China

  • He, Yueqiu;Wu, Yixin;He, Pengfei;Li, Xinyu
    • 한국균학회소식:학술대회논문집 (The Korean Society of Mycology News: Conference Proceedings) / 2015.05a / pp.45-46 / 2015
  • Clubroot disease of crucifers has occurred in China since 1957. It has spread throughout the country, especially in the southwest and northeast, where it causes 30-80% losses in some fields. The disease has expanded in recent years with seed imports and the practice of the floating seedling system. For its effective control, the Ministry of Agriculture of China set up a program in 2010. Its research team is led by Dr. Yueqiu He of Yunnan Agricultural University and includes 20 principal researchers from 11 universities and 5 institutions. Over 5 years, the team has made considerable progress in several areas: the patterns of disease occurrence, resource collection, resistance identification and breeding, exploration of biological agents, formulation, chemical evaluation and control strategy. About 1,200 accessions of local and commercial crucifers were screened for resistance in the field and by artificial inoculation in the laboratory. From these, 10 resistant cultivars were bred: 7 Chinese cabbage and 3 cabbage cultivars. More than 800 antagonistic strains were isolated, including bacteria, Streptomyces and fungi. Around 100 chemicals were evaluated for control efficacy in the field and greenhouse. Of these, 6 showed high efficacy, and fluazinam and cyazofamid in particular controlled about 80% of the disease. However, fluazinam has a negative effect on soil microbes. Once the pathogen Plasmodiophora brassicae has infected its host and established a parasitic relationship, the disease cannot be controlled by biocontrol agents or chemicals. We found that the earlier the pathogen infected its host, the more severe the disease was, so early control is the most effective. For Chinese cabbage, all control measures should be applied within the first 30 days, because new infections occurring more than 30 days after seeding do not cause severe symptoms.
For example, the biocontrol agent Bacillus subtilis strain XF-1 achieved an average control efficacy of 70%-85%. It was mixed into the seedling substrate and applied as a drench 3 times after transplanting: immediately, and after 7 and 14 days. XF-1 has been studied in depth with respect to its control mechanisms, its genome, and the development and application of biocontrol formulations. It produces antagonistic proteins, enzymes, antibiotics and IAA, which promote rhizogenesis and growth. Its genome was sequenced with an Illumina/Solexa Genome Analyzer and assembled into 20 scaffolds. The gaps between scaffolds were then filled by long-fragment PCR amplification to obtain the complete genome of 4,061,186 bp. The genome has a GC content of 43.8%, 108 tandem repeats with an average of 2.65 copies, and 84 transposons. A total of 3,853 CDSs were predicted, 112 of which were assigned to secondary metabolite biosynthesis, transport and catabolism. Among these are five giant NRPS/PKS gene clusters, which are highly homologous to biosynthetic genes of confirmed function in other strains. They are responsible for the biosynthesis of a polyketide (pksABCDEFHJLMNRS, 72.9 kb), surfactin (srfABCD, 26.148 kb), bacilysin (bacABCDE, 5.903 kb), bacillibactin (dhbABCEF, 11.774 kb) and fengycin (ppsABCDE, 37.799 kb). Many key regulatory genes for secondary metabolites were also predicted in XF-1, such as comABPQKXZ, degQ, sfp, yczE, degU, ycxABCD and ywfG. XF-1 therefore has the potential to synthesize the secondary metabolites surfactin, fengycin, bacillibactin, bacilysin and bacillaene. Thirty-two compounds were detected in cell extracts of XF-1 by MALDI-TOF-MS: one macrolactin (m/z 441.06), two fusaricidins (m/z 850.493 and 968.515), one circulocin (m/z 852.509), nine surfactins (m/z 1044.656~1102.652), five iturins (m/z 1096.631~1150.57) and fourteen fengycins (m/z 1449.79~1543.805).
The three main compound types are surfactin, fengycin and iturin, which together make up 56.67% of the total extract. Surfactin is the most abundant (30.37% of the total extract), followed by fengycin (23.28%), which shows a rich diversity of chemical structures, and iturin (3.02%). The same main components were detected in Bacillus sp. 355, another effective biocontrol bacterium against crucifer clubroot. Surfactin, iturin and fengycin may therefore be the main active components of XF-1 against P. brassicae. Twenty-one fengycin-type compounds with antifungal activity were characterized by LC-ESI-MS/MS. They include fengycin A ($C_{16}{\sim}C_{19}$), fengycin B ($C_{14}{\sim}C_{17}$), fengycin C ($C_{15}{\sim}C_{18}$), fengycin D ($C_{15}{\sim}C_{18}$) and fengycin S ($C_{15}{\sim}C_{18}$). Furthermore, a novel compound was identified as dehydroxyfengycin $C_{17}$ from its MS and 1D and 2D NMR spectral data. Its molecular weight is 1488.8480 Da and its formula is $C_{75}H_{116}N_{12}O_{19}$. Fengycin-type compounds (FTCPs, $250\,{\mu}g/mL$) were used to treat resting spores of P. brassicae ($10^7$/mL). Their effect was assessed by detecting leakage of cytoplasmic components and cell destruction. After 12 h of treatment, the absorbances at 260 nm (A260) and 280 nm (A280) had increased gradually to near their maximum, accompanied by the collapse of the resting spores. Almost no intact cells were observed after 24 h. These results suggest that the FTCPs of XF-1 can lyse the spores, and that the diverse FTCPs are a main mechanism of clubroot biocontrol. Five media were tested: MOLP, PSA, LB, Landy and LD. MOLP was the most suitable for growth of the strain, and Landy sucrose medium was the least suitable for its longevity. However, lipopeptide yield was highest in Landy sucrose medium.
HPLC analysis of the lipopeptides from B. subtilis XF-1 fermented in the five media showed that the lipopeptide components were the same, while their contents differed. Thus the medium affects the lipopeptide content of XF-1 but not its composition, and nutrient deficiency seems to promote lipopeptide secretion. Volatile components active against the fungus Cylindrocarpon spp. were collected in sealed vessels. They were analyzed by HS-SPME-GC-MS for eight biocontrol Bacillus species and for four positive XF-1 mutants obtained by chemical mutagenesis. All shared the same main volatile components (pyrazines, aldehydes, oxazolidinones and sulfides), which together accounted for 91.62% of the volatiles in XF-1. Pyrazines were the most abundant (47.03%), followed by aldehydes (23.84%), oxazolidinones (15.68%) and sulfides (5.07%).
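Two figures in this abstract can be checked with simple arithmetic. First, the MALDI-TOF-MS compound counts must sum to 32, which leaves 14 fengycin-type compounds. Second, the reported molecular weight of dehydroxyfengycin $C_{17}$ (1488.8480 Da) matches the monoisotopic mass of $C_{75}H_{116}N_{12}O_{19}$ computed from standard isotope masses. The sketch assumes that the reported weight is a monoisotopic mass rather than an average molecular weight, and it handles only simple formulas without parentheses.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C75H116N12O19'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

# Compound classes counted by MALDI-TOF-MS; fengycins are the remainder of 32.
detected = {"macrolactin": 1, "fusaricidin": 2, "circulocin": 1, "surfactin": 9, "iturin": 5}
fengycins = 32 - sum(detected.values())

mass = monoisotopic_mass("C75H116N12O19")
print(f"fengycin-type compounds: {fengycins}")
print(f"monoisotopic mass: {mass:.4f} Da")
```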

An Expert System for the Estimation of the Growth Curve Parameters of New Markets (신규시장 성장모형의 모수 추정을 위한 전문가 시스템)

  • Lee, Dongwon;Jung, Yeojin;Jung, Jaekwon;Park, Dohyung
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.17-35 / 2015
  • Demand forecasting is the activity of estimating the quantity of a product or service that consumers will purchase over a certain period. Developing precise forecasting models is important because companies make strategic decisions on new markets based on the future demand these models estimate. Many studies have developed market growth curve models, such as the Bass, logistic and Gompertz models, which estimate future demand when a market is in its early stage. Among them, the Bass model, which explains demand in terms of two types of adopters (innovators and imitators), has been widely used in forecasting. Such models require sufficient demand observations to produce reliable results. At the start of a new market, however, there are too few observations for the models to estimate future demand precisely. As an alternative, demand is often inferred from that of the most closely related markets, which serve as references. Reference markets can be those whose products are developed with technologies of the same category. A market's demand can be expected to follow a pattern similar to that of a reference market when product adoption in the market is determined mainly by the technology behind the product. However, this process does not always produce satisfactory results, because the similarity between markets is judged by intuition and/or experience. The approach has two major drawbacks that human experts cannot handle effectively. One is the abundance of candidate reference markets to consider, and the other is the difficulty of calculating the similarity between markets. First, there can be too many markets to consider when selecting reference markets. Typically, markets in the same category of an industrial hierarchy can serve as reference markets because they are usually based on similar technologies.
However, markets can be classified into different categories even if they are based on the same generic technologies, so markets in other categories also need to be considered as potential candidates. Second, even domain experts cannot consistently calculate the similarity between markets with their own qualitative standards. This inconsistency can cause closely related reference markets to be missed, which may lead to imprecise estimates of future demand. Even when no reference markets are missed, the new market's parameters can hardly be estimated from them without quantitative standards. This study therefore proposes a case-based expert system that helps experts overcome these drawbacks in discovering reference markets. First, the Euclidean distance is used to measure the similarity between markets, and markets are grouped into clusters based on their similarities. Next, missing markets that share the characteristics of a cluster are searched for. Potential candidate reference markets are then extracted and recommended to the user. After these steps are iterated, the definite reference markets are determined from the user's selections among the candidates. Finally, the new market's parameters are estimated from the reference markets. The procedure uses two techniques: clustering from data mining, and content-based filtering from recommender systems. The system implemented with these techniques can determine the most closely related markets based on whether the user accepts the candidate markets. Experiments with five ICT experts were conducted to validate the usefulness of the system. The experts were given a list of 16 ICT markets whose growth curve parameters were to be estimated. For each market, they estimated the parameters first by intuition and then with the system.
The results show that the estimated parameters were closer when the experts used the system than when they guessed them without it.
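The approach rests on two standard pieces. The first is the Bass diffusion model, whose cumulative adoption curve is $N(t) = m\,\frac{1 - e^{-(p+q)t}}{1 + (q/p)\,e^{-(p+q)t}}$, with innovation coefficient $p$, imitation coefficient $q$ and market potential $m$. The second is a Euclidean-distance similarity between markets. The sketch below is not the authors' system. It assumes that each market is described by a numeric feature vector, and it estimates a new market's $p$ and $q$ by averaging those of its $k$ nearest reference markets. It omits the clustering and interactive recommendation steps. The feature values, market names and parameters are hypothetical.

```python
import math

def bass_cumulative(t, p, q, m):
    """Cumulative adopters N(t) under the Bass model (p: innovation, q: imitation)."""
    e = math.exp(-(p + q) * t)
    return m * (1 - e) / (1 + (q / p) * e)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def estimate_from_references(new_features, references, k=2):
    """Average the Bass (p, q) of the k references closest to new_features.

    references: list of (name, feature_vector, (p, q)) tuples.
    """
    ranked = sorted(references, key=lambda r: euclidean(new_features, r[1]))
    chosen = ranked[:k]
    p = sum(r[2][0] for r in chosen) / k
    q = sum(r[2][1] for r in chosen) / k
    return [r[0] for r in chosen], p, q

# Hypothetical reference markets: (name, normalized technology features, (p, q)).
references = [
    ("market_A", (0.9, 0.1, 0.3), (0.02, 0.40)),
    ("market_B", (0.8, 0.2, 0.4), (0.03, 0.35)),
    ("market_C", (0.1, 0.9, 0.7), (0.01, 0.60)),
]
names, p, q = estimate_from_references((0.88, 0.12, 0.33), references, k=2)
forecast = [round(bass_cumulative(t, p, q, m=1_000_000)) for t in range(1, 6)]
print(names, p, q)
print(forecast)
```

A plain average treats the chosen references equally. Weighting them by inverse distance would be a natural variant.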

IR Study on the Adsorption of Carbon Monoxide on Silica Supported Ruthenium-Nickel Alloy (실리카 지지 루테늄-니켈 합금에 있어서 일산화탄소의 흡착에 관한 IR 연구)

  • Park, Sang-Youn;Yoon, Dong-Wook
    • Applied Chemistry for Engineering / v.17 no.4 / pp.349-356 / 2006
  • We investigated the adsorption and desorption of CO on silica-supported Ru/Ni alloys at various Ru/Ni mole ratios and CO partial pressures, using Fourier transform infrared (FT-IR) spectroscopy. For the Ru-$SiO_{2}$ sample, four bands were observed on adsorption, at $2080.0cm^{-1}$, $2021.0{\sim}2030.7cm^{-1}$, $1778.9{\sim}1799.3cm^{-1}$ and $1623.8cm^{-1}$. Three bands were observed on vacuum desorption, at $2138.7cm^{-1}$, $2069.3cm^{-1}$ and $1988.3{\sim}2030.7cm^{-1}$. For the Ni-$SiO_{2}$ sample, four bands were observed on adsorption, at $2057.7cm^{-1}$, $2019.1{\sim}2040.3cm^{-1}$, $1862.9{\sim}1868.7cm^{-1}$ and $1625.7cm^{-1}$. Two bands were observed on vacuum desorption, at $2009.5{\sim}2040.3cm^{-1}$ and $1828.4{\sim}1868.7cm^{-1}$. These absorption bands agree approximately with previous reports. The Ru/Ni-$SiO_{2}$ samples had Ru/Ni mole ratios of 9/1, 8/2, 7/3, 6/4 and 5/5. For these samples, three bands were observed on adsorption, at $2001.8{\sim}2057.7cm^{-1}$, $1812.8{\sim}1926.5cm^{-1}$ and $1623.8{\sim}1625.7cm^{-1}$. Three bands were observed on vacuum desorption, at $2140.6cm^{-1}$, $2073.1cm^{-1}$ and $1969.0{\sim}2057.7cm^{-1}$. On both CO adsorption and vacuum desorption, the spectral pattern of the Ru/Ni-$SiO_{2}$ sample with a 9/1 mole ratio closely resembles that of Ru-$SiO_{2}$. By contrast, the patterns of the samples with Ru/Ni mole ratios of 8/2 and below closely resemble that of Ni-$SiO_{2}$. These observations suggest that the surfaces of the alloy clusters on the Ru/Ni-$SiO_{2}$ samples contain more Ni than the mole ratio of the sample would indicate.
For the Ru/Ni-$SiO_{2}$ samples, the absorption band shifts may be ascribed to several factors: variations in surface concentration, strain due to the difference in atomic size, changes in bonding energy and electron density, and changes in surface geometry that accompany the variation in surface concentration. Further studies are needed to elucidate these phenomena. These include LEED and Auger spectroscopy of CO adsorption on Ru/Ni alloy cluster surfaces, investigation of the interaction between the Ru/Ni alloy clusters and $SiO_{2}$, and MO calculations for the system.

Association of osteoarthritis and bone mineral density in women -The health and nutritional examination survey in Kuri- (여성의 골관절염과 골밀도간의 관련성 분석 -구리시민 건강.영양진단 조사결과를 바탕으로-)

  • Sheen, Seung-Soo;Lee, Soon-Young;Min, Byung-Hyun;Suh, Il
    • Journal of Preventive Medicine and Public Health / v.30 no.4 s.59 / pp.669-685 / 1997
  • Previous studies reporting an inverse relationship between osteoarthritis and osteoporosis suggest a possible pathophysiologic link between the two. This study examined the hypothesis that 'the bone mineral density (BMD) of Korean women with osteoarthritis is significantly higher than that of women without osteoarthritis'. Subjects were sampled from the health and nutritional examination survey in Kuri city by multi-stage sampling based on established clusters in the city. The survey was conducted from August 18 to September 10, 1997. Of the 1,656 people selected, 867 responded (348 men and 519 women), a response rate of 52.4%. The 420 women who underwent BMD measurement, radiologic examination and anthropometric examination were included in the analysis. The results were as follows. 1. General characteristics: mean BMD was $0.493\ g/cm^2$, mean age was 43.0 years and mean BMI was $23.9\ kg/m^2$. Of these women, 106 were postmenopausal and 19 had undergone hysterectomy. There were no cases of hip osteoarthritis, 64 cases of knee osteoarthritis and 2 cases of hand osteoarthritis. 2. Univariate analysis: mean BMD was significantly lower in women with knee osteoarthritis than in women without it (0.4269 vs. $0.5057\ g/cm^2$). Hip and hand osteoarthritis cases were too few for BMD comparisons. BMD differed significantly among the premenopausal (0.5204), postmenopausal (0.4206) and hysterectomy (0.4881) groups. It also differed significantly among the diabetes (0.4297), impaired glucose tolerance (0.4874) and normal (0.5057) groups. Age, parity, BMI and bioimpedance were also significantly related to BMD. 3. Multivariate analysis: multiple linear regression was used to examine the relationship between osteoarthritis and BMD while controlling for the variables that were significant in the univariate analyses. In this model, knee osteoarthritis was no longer significantly associated with BMD, whereas age and menopause had significant negative relationships with BMD. Diabetes, parity, BMI and bioimpedance did not have significant relationships with BMD. Subjects were then stratified by menopausal status, and multiple linear regression was performed within each stratum. Age showed a significant negative relationship with BMD in the postmenopausal group, as did age and knee osteoarthritis in the hysterectomy group. These results do not support the findings of many previous studies of white men and women. Further studies of biological plausibility in Korean women are recommended, and a longitudinal study to verify the relationship between osteoarthritis and BMD would also be valuable.
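The key multivariate finding is an association that is significant in univariate analysis but disappears after adjustment. This is the classic pattern of confounding, for example by age. The following is a minimal sketch on synthetic data, not the survey data. In it, BMD is made to depend only on age, and knee osteoarthritis is made to occur only at older ages. Regressing BMD on osteoarthritis alone gives a negative coefficient, while adding age as a covariate drives the osteoarthritis coefficient to zero. Significance tests are omitted, and the small least-squares solver is written out only to keep the example dependency-free.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X: list of rows, each starting with 1.0 for the intercept.
    """
    n = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][-1] / aug[i][i] for i in range(n)]

# Synthetic data: osteoarthritis only at older ages, BMD driven by age alone.
ages = [30, 35, 40, 45, 50, 55, 60, 65, 70]
knee_oa = [1 if a >= 55 else 0 for a in ages]
bmd = [0.70 - 0.005 * a for a in ages]

unadjusted = ols([[1.0, oa] for oa in knee_oa], bmd)
adjusted = ols([[1.0, a, oa] for a, oa in zip(ages, knee_oa)], bmd)
print(f"OA coefficient, unadjusted: {unadjusted[1]:.4f}")
print(f"OA coefficient, adjusted for age: {adjusted[2]:.4f}")
```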

A Study on Dietary Behavior of Chinese Consumers Segmented by Dietary Lifestyle (중국 현지 소비자들의 식생활 라이프스타일 세분화에 따른 식행동 연구)

  • Oh, Ji Eun;Yoon, Hei-Ryeo
    • Journal of the Korean Society of Food Culture / v.32 no.5 / pp.383-393 / 2017
  • This study analyzed the dietary lifestyle of local Chinese consumers and classified their dietary characteristics according to dietary lifestyle factors and dietary behaviors. The survey was conducted for one month from 1 January 2017 among 300 adult men and women living in China, through the online survey company SurveyMonkey. Four dietary lifestyle factors were identified (gourmet, health, convenience and economy), and respondents were grouped into 4 clusters according to their dietary lifestyle factor scores. Group 1, the gourmet-economy group, had a high proportion of people living alone and a high frequency of eating out, but a relatively low proportion eating three regular meals a day. Their dietary lifestyle was sensitive to gourmet and economic factors but less sensitive to health and convenience factors. Group 2, the wide-interest group, contained a high proportion of people in their 30s and had higher education and income levels than the other groups. Their dietary lifestyle scores tended to be higher than those of the other groups. They sought a variety of new foods and gourmet meals for the enjoyment of dining and life, as well as well-being ingredients and health-related foods. Group 3, the health-economy group, was a family-type consumer group with a lower income level than the other groups. Its members sought health and natural foods and tended to pursue good value for money when purchasing food. Finally, group 4 had relatively higher proportions of women over 30 and of people with a college or higher education than the other groups. This group was more interested in health and taste than in price and convenience. It showed the highest LOHAS (Lifestyles of Health and Sustainability) orientation among the groups, centered on middle-aged Chinese women. Its members also applied their nutritional knowledge directly in daily life.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, increasing demand for big data analysis has driven vigorous development of related technologies and tools. In addition, advances in IT and the growing penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming widespread, and attempts to gain insights through data analysis keep increasing. Big data analysis is therefore likely to become more important across industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts, who deliver the results to those who requested them. However, growing interest in big data analysis has stimulated programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually falling and analysis technology is spreading. Analysis is therefore expected to be performed increasingly by those who need the results themselves. Interest in various kinds of unstructured data is also growing, with particular attention on text data. New web-based platforms and techniques have led to mass production of text data and active attempts to analyze it, and the results of text analysis are used in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Of the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling extracts the major issues from a large collection of documents, identifies the documents that correspond to each issue, and presents those documents as clusters. It is considered very useful because it reflects the semantic elements of documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire document set must be analyzed at once to identify the topic of each document, which makes analysis slow when topic modeling is applied to many documents. It also has a scalability problem: processing time increases exponentially as the number of documents grows. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. A large number of documents is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method allows topic modeling on a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed at each location without first being combined. Despite these advantages, however, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear. For each document, local topics can be identified, but global topics cannot. Second, a method for measuring the accuracy of the approach must be established. Treating the global topics as the ideal answer, the deviation of local topics from global topics needs to be measured. Because of these difficulties, this approach has been studied less than other topic modeling approaches. In this paper, we propose a topic modeling approach that solves these two problems.
First, we divide the entire document set (global set) into subsets (local sets). We then generate a reduced global set (RGS) consisting of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We verify the accuracy of the methodology by checking whether each document is assigned to the same topic in the global-set and local-set results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the methodology. An additional experiment confirmed that the methodology can provide results similar to topic modeling of the entire set. We also propose a reasonable method for comparing the results of the two approaches.
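The abstract does not say how RGS topics and local topics are matched or how agreement is scored. One simple possibility is shown here purely as an assumption. Each topic is represented as a word-weight vector over a shared vocabulary, and each local topic is mapped to the global (RGS) topic with the highest cosine similarity. The method then reports the fraction of documents whose mapped local topic equals their global topic. All topic ids, vectors and document assignments are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def map_local_to_global(local_topics, global_topics):
    """Map each local topic id to the most similar global topic id (cosine similarity)."""
    return {
        lt: max(global_topics, key=lambda gt: cosine(vec, global_topics[gt]))
        for lt, vec in local_topics.items()
    }

def agreement(doc_local_topic, doc_global_topic, mapping):
    """Fraction of documents whose mapped local topic equals their global topic."""
    hits = sum(1 for d, lt in doc_local_topic.items() if mapping[lt] == doc_global_topic[d])
    return hits / len(doc_local_topic)

# Hypothetical vocabulary: ["election", "vote", "stock", "market"].
global_topics = {"G_politics": [0.5, 0.4, 0.05, 0.05], "G_economy": [0.05, 0.05, 0.5, 0.4]}
local_topics = {"L1_a": [0.1, 0.1, 0.4, 0.4], "L1_b": [0.6, 0.3, 0.05, 0.05]}
mapping = map_local_to_global(local_topics, global_topics)

doc_local = {"d1": "L1_a", "d2": "L1_b", "d3": "L1_a"}
doc_global = {"d1": "G_economy", "d2": "G_politics", "d3": "G_politics"}
print(mapping)
print(agreement(doc_local, doc_global, mapping))
```

In this toy case document d3 is the one disagreement, so the agreement score is 2/3.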

A Study on the Clustering Method of Row and Multiplex Housing in Seoul Using K-Means Clustering Algorithm and Hedonic Model (K-Means Clustering 알고리즘과 헤도닉 모형을 활용한 서울시 연립·다세대 군집분류 방법에 관한 연구)

  • Kwon, Soonjae;Kim, Seonghyeon;Tak, Onsik;Jeong, Hyeonhee
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.95-118 / 2017
  • Recently, transactions of row and multiplex housing have become active, centered on downtown areas, and platform services such as Zigbang and Dabang are growing. Yet row and multiplex housing remains a blind spot for real estate information. This has become a social problem as market size changes and as changing demand creates information asymmetry. The Seoul Metropolitan Government and the Korean Appraisal Board (hereafter, KAB) use 5 zones or 25 districts, as do existing real estate studies. These divisions follow administrative boundaries. Because they were zoned for urban planning, they are not a district classification suited to real estate research. Building on existing studies, this study argues that Seoul's spatial structure needs to be redefined for estimating future housing prices. It therefore attempts to delineate areas without spatial heterogeneity by reflecting the price characteristics of row and multiplex housing. Simple division by existing administrative districts has caused inefficiencies, so this study aims to cluster Seoul into new areas for more efficient real estate analysis. A hedonic model was applied to actual transaction price data for row and multiplex housing, and the K-means clustering algorithm was used to cluster Seoul's spatial structure. The data were actual transaction prices of row and multiplex housing in Seoul from January 2014 to December 2016, together with the 2016 official land values. Both were provided by the Ministry of Land, Infrastructure and Transport (hereafter, MOLIT). Data preprocessing consisted of three steps. Underground (basement) transactions were removed, prices were standardized per unit area, and transaction cases with standardized values above 5 or below -5 were removed.
Preprocessing reduced the data from 132,707 to 126,759 cases, and the analysis was performed in R. After preprocessing, the data model was constructed. K-means clustering was performed first, followed by a regression analysis using the hedonic model and a cosine similarity analysis. Based on this model, we clustered on the basis of longitude and latitude within Seoul and compared the result with the existing areas. The goodness of fit of the model exceeded 75%, and the variables used in the hedonic model were significant. As a result, the existing 5 zones or 25 districts were redrawn into 16 districts. This study thus derived a method of clustering row and multiplex housing in Seoul that reflects property price characteristics, using the K-means clustering algorithm and a hedonic model. It also presents academic and practical implications, its limitations and directions for future research. Academically, it clusters areas by property price characteristics to address the problems of the areas used by the Seoul Metropolitan Government, KAB and existing real estate research. It also extends research beyond apartments, the main subject of existing real estate studies. In addition, it proposes a method of classifying areas in Seoul using public information released under Government 3.0, namely MOLIT actual transaction data. Practically, the results can serve as basic data for real estate research on row and multiplex housing. They are expected to stimulate such research and to increase the accuracy of actual transaction price models.
Future research should conduct various analyses to overcome the limitations of the threshold, and deeper research is needed.
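The outlier filter and the clustering step can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' R code. It assumes that the ±5 cut-off applies to z-scores of price per unit area. It clusters on (longitude, latitude) only and uses plain Lloyd's K-means with the first k points as initial centroids. The coordinates and prices are hypothetical, and the hedonic regression and cosine-similarity steps are omitted.

```python
import math
import statistics

def drop_outliers(values, limit=5.0):
    """Keep values whose z-score (population SD) lies within [-limit, limit]."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= limit * sd]

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; the first k points are used as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical prices per unit area: forty ordinary values and one extreme value.
prices = [95.0 + i % 10 for i in range(40)] + [1000.0]
kept = drop_outliers(prices)

# Hypothetical (longitude, latitude) pairs for six transactions.
points = [(126.90, 37.55), (127.05, 37.50), (126.91, 37.56),
          (127.06, 37.51), (126.92, 37.54), (127.04, 37.49)]
labels, centroids = kmeans(points, k=2)
print(len(prices), "->", len(kept), "prices kept")
print(labels)
```

With the population standard deviation, no single value's |z| can exceed sqrt(n - 1). A cut-off of 5 can therefore only remove points from samples of more than 26 values, which is no constraint for a data set of over 130,000 transactions.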