• Title/Summary/Keyword: open data quality

KOMUChat: Korean Online Community Dialogue Dataset for AI Learning (KOMUChat : 인공지능 학습을 위한 온라인 커뮤니티 대화 데이터셋 연구)

  • YongSang Yoo;MinHwa Jung;SeungMin Lee;Min Song
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.219-240 / 2023
  • Conversational AI that users can interact with satisfactorily is a long-standing research topic. Developing conversational AI requires training data that reflects real conversations between people, but current Korean datasets either are not in question-answer format or use honorifics, making it difficult for users to feel a sense of closeness. In this paper, we propose a conversation dataset (KOMUChat) consisting of 30,767 question-answer sentence pairs collected from online communities. The question-answer pairs were collected from the post titles and first comments of love and relationship counseling boards used by men and women. In addition, we removed abusive records through automatic and manual cleansing to build a high-quality dataset. To verify the validity of KOMUChat, we trained a generative language model on KOMUChat and on a benchmark dataset and compared the results. The results showed that our dataset outperformed the benchmark dataset in terms of answer appropriateness, user satisfaction, and fulfillment of conversational AI goals. The dataset is the largest open-source single-turn text dataset presented so far, and it is significant in that it builds a friendlier Korean dataset by reflecting the text styles of online communities.

A Study of Family Caregiver's Burden for the Terminally Ill Patients (지역사회 말기질환자 가족 부담감에 관한 연구)

  • Han, Sung-Suk;Ro, You-Ja;Yang, Soo;Yoo, Yang-Sook;Kim, Sek-Il;Hwang, Hee-Hyung
    • Journal of Home Health Care Nursing / v.10 no.1 / pp.58-72 / 2003
  • The purpose of this study was to describe the perceived burden of caregivers of terminally ill patients and to analyze the relationship between the perceived burden and the demographics, illness characteristics, family relationships, and economic factors of the family and patients. The sample consisted of 132 caregivers who care for terminally ill patients in Kyung-Gi province, Seoul, Korea. The study period was from August to September 2002. The perceived burden of the family caregiver was measured with the burden scale (20 items, 4-point scale) developed by Montgomery et al. (1985). The data were analyzed with SAS using t-tests and ANOVA. The results were as follows: 1. The mean family caregiver's burden score was 3.02, indicating that caregivers perceive a severe level of burden. The highest-scoring burden items were 'I feel it is painful to watch patient's diseases' (3.77), 'I feel afraid for what the future holds for my patients' (3.66), and 'I feel it reduced to amount of privacy time' (3.64). 2. The caregiver's burden was significantly related to patient's gender (F=3.17, p=0.0020), patient's job (F=2.49, p=0.0476), caregiver's age (F=4.29, p=0.0030), and caregiver's job (F=2.49, p=0.0476). 3. The caregiver's burden showed no significant difference according to illness characteristics. 4. The caregiver's burden was significantly associated with patient's family relationship (F=4.05, p=0.0041), patient's care mean period in a day(F=47.18,

Future Development Strategies for KODISA Journals: Overview of 2016 and Strategic Plans for the Future (KODISA 학술지 성장전략: 2016 개관 및 미래 성장개요)

  • Hwang, Hee-Joong;Lee, Jung-Wan;Youn, Myoung-Kil;Kim, Dong-Ho;Lee, Jong-Ho;Shin, Dong-Jin;Kim, Byung-Goo;Kim, Tae-Joong;Lee, Yong-Ki;Kim, Wan-Ki
    • Journal of Distribution Science / v.15 no.5 / pp.75-83 / 2017
  • Purpose - The fourth industrial revolution has converged with the existing industrial revolution to increase the accessibility of knowledge and information. As a result, it has become easier for scholars to actively pursue and compile research in various fields. This study assesses the current standing of the KODISA journals, namely the Journal of Distribution Science (JDS), the International Journal of Industrial Distribution & Business (IJIDB), the East Asian Journal of Business Management (EAJBM), and the Journal of Asian Finance, Economics and Business (JAFEB), in a rapidly evolving era. Novel strategies for creating the future vision of KODISA 2020 are also examined. Research design, data, and methodology - This research analyzes the published journals of KODISA in order to offer a vision for KODISA 2020. Part 1 reviews the current state of the KODISA journals and gives an overview of past achievements. Part 2 discusses the activities needed for the KODISA journals (JDS, IJIDB, EAJBM, and JAFEB) to branch out internationally, and part 3 statistically analyzes significant journals. Part 4 offers strategies for the continued growth of KODISA and visions for KODISA 2020. Results - Among the KODISA publications, IJIDB was second, JDS was 23rd (out of 54 journals in economics), and EAJBM was 22nd (out of 79 journals in the management field). This shows the high quality of the KODISA journals. According to the 2016 publication analysis, JDS, IJIDB, etc. had 157, 15, 16, and 28 publications, respectively. JDS showed an increase of 14% over the previous year. Additionally, JAFEB showed a significant increase of 68%, which indicates a higher rate of paper submission than the other journals. IJIDB and EAJBM did not show any significant increases. JDS published many studies related to distribution, distribution management, and consumer behavior. To raise the KODISA journals to SCI status, first, many more international conferences will be held to increase international recognition. Second, the systematic functions of the journals will be developed further to increase their stability. Third, graduate schools will be opened to foster potential leaders in this field and to build a platform for innovators and leaders. Conclusions - Within KODISA, JDS was first published in 1999 and was registered in SCOPUS in February 2017. Other sister publications within KODISA are preparing for SCOPUS registration as well. The KODISA journals will prepare to be innovative journals for 2020 and beyond.

Comparative Analysis of Diversity Characteristics (γ-, α-, and β-diversity) of Biological Communities in the Korean Peninsula Estuaries (하구 순환 유지 여부에 따른 하구 주요 생물 군집별 다양성 특성 연구: 열린하구와 닫힌하구에서의 γ-, α- 및 β-다양성 비교)

  • Oh, Hye-Ji;Jang, Min-Ho;Kim, Jeong-Hui;Kim, Yong-Jae;Lim, Sung-Ho;Won, Doo-Hee;Moon, Jeong-Suk;Kwon, Soonhyun;Chang, Kwang-Hyeon
    • Korean Journal of Ecology and Environment / v.55 no.1 / pp.84-98 / 2022
  • Estuaries are important in terms of biodiversity because they have the characteristics of transition waters, created by the mixing of fresh water and seawater. Estuarine water circulation provides a variety of habitats with different environments by inducing gradients in the chemical and physical environment, such as water quality and river bed structure, which are ultimately the main factors influencing biological community composition. If the water circulation is interrupted, the loss of brackish areas and the interception of migration of biological communities will lead to changes in the spatial distribution of biodiversity. In this study, among the sites covered by the Estuary Aquatic Ecosystem Health Assessment, we selected study sites where changes in biodiversity can be assessed along the spatial gradient from the upper reaches of the river to the lower estuarine area. The α-, γ- and β-diversity of diatom, benthic macroinvertebrate, and fish communities were calculated, and the data were divided into open and closed estuaries and compared to determine the trends in biodiversity variation due to estuarine circulation. As a result, all communities showed higher γ-diversity at open estuary sites. The benthic macroinvertebrate community showed a clear difference in β-diversity between open and closed estuaries; consequently, the estuarine transects were considered a factor that decreases the spatial heterogeneity of diversity among sites. The biodiversity trends analyzed in this study will be used to identify estuaries with low γ- and β-diversity by community, providing a useful resource for further monitoring and management to maintain estuarine health.
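
The relationship among the three diversity measures in the abstract above can be illustrated with a minimal sketch. It assumes richness-based measures and Whittaker's multiplicative partition (β = γ / mean α); the abstract does not state which indices the authors used, and the function name, site labels, and taxa below are hypothetical.

```python
def diversity_partition(sites):
    """Return (alpha, beta, gamma) for a mapping of site name -> taxa.

    Assumption: alpha is the mean per-site richness, gamma is the richness
    of the pooled sites, and beta = gamma / alpha (Whittaker's
    multiplicative partition). This is an illustrative choice, not
    necessarily the method used in the study.
    """
    if not sites:
        raise ValueError("at least one site is required")
    taxa_sets = [set(taxa) for taxa in sites.values()]
    gamma = len(set().union(*taxa_sets))                  # pooled richness
    alpha = sum(len(t) for t in taxa_sets) / len(taxa_sets)  # mean richness
    if alpha == 0:
        raise ValueError("no taxa recorded at any site")
    beta = gamma / alpha                                  # turnover among sites
    return alpha, beta, gamma


# Hypothetical transect from the upper river to the lower estuary.
open_estuary = {
    "upper": {"sp_a", "sp_b", "sp_c"},
    "middle": {"sp_b", "sp_c", "sp_d"},
    "lower": {"sp_d", "sp_e", "sp_f"},
}
alpha, beta, gamma = diversity_partition(open_estuary)
print(alpha, beta, gamma)
```

Under this definition, a β value near 1 means the sites share nearly the same taxa, while larger values indicate stronger compositional differences along the transect.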

GIS-based Market Analysis and Sales Management System : The Case of a Telecommunication Company (시장분석 및 영업관리 역량 강화를 위한 통신사의 GIS 적용 사례)

  • Chang, Nam-Sik
    • Journal of Intelligence and Information Systems / v.17 no.2 / pp.61-75 / 2011
  • A Geographic Information System (GIS) is a system that captures, stores, analyzes, manages, and presents data with reference to geographic location. In the late 1990s and early 2000s its use was limited to government sectors such as public utility management, urban planning, landscape architecture, and environmental contamination control. However, a growing number of open-source packages running on a range of operating systems has enabled many private enterprises to explore viewing GIS-based sales and customer data on their own computer monitors. K telecommunication company has dominated the Korean telecommunication market by providing diverse services, such as high-speed internet, PSTN (Public Switched Telephone Network), VoIP (Voice over Internet Protocol), and IPTV (Internet Protocol Television). Even though the telecommunication market in Korea is huge, competition among the major service providers is fiercer than ever. Service providers have struggled to acquire as many new customers as possible, attempted to cross-sell more products to their regular customers, and made greater efforts to retain their best customers by offering unprecedented benefits. Most service providers, including K telecommunication company, tried to adopt customer relationship management (CRM) and to analyze customers' demographic and transactional data statistically in order to understand customer behavior. However, customer information management remained at a basic level, and the quality and quantity of customer data were insufficient both for understanding customers and for designing a marketing and sales strategy. For example, the 3,074 legal regional divisions then in use, originally defined by the government, were too broad for calculating sub-regional service subscription and cancellation ratios. Additional external data such as house size, house price, and household demographics were also needed to measure sales potential. Furthermore, making tables and reports was time-consuming, and the results were insufficient for a clear judgment about the market situation. In 2009, the company needed a dramatic shift in the way it conducted marketing and sales activities, and it finally developed a dedicated GIS-based market analysis and sales management system. This system greatly improved the efficiency with which the company could manage and organize all customer and sales-related information and access that information easily and visually. After the GIS information system was developed and applied to marketing and sales activities at the corporate level, the company was reported to have increased sales and market share substantially. This was because, by analyzing past market and sales initiatives, estimating sales potential, and targeting key markets, the system could make suggestions and enable the company to focus its resources on the demographics most likely to respond to a promotion. This paper reviews the subjective and unclear marketing and sales activities that K telecommunication company had operated and introduces the whole process of developing the GIS information system. The process consists of the following 5 modules: (1) customer profile cleansing and standardization, (2) internal/external DB enrichment, (3) segmentation of the 3,074 legal regions into 46,590 sub-regions called blocks, (4) GIS data mart design, and (5) GIS system construction. The objective of this case study is to emphasize the need for GIS systems and to show how they work in private enterprises by reviewing the development process of K company's market analysis and sales management system. We hope that this paper offers valuable guidelines to companies that are considering introducing or constructing a GIS information system.

Quality characteristics of different parts of garlic sprouts produced by smart farms during growth (스마트팜 생산 새싹마늘의 부위별 및 생육 기간에 따른 품질 특성)

  • Yu-Ri Choi;Su-Hwan Kim;Chae-Mi Lee;Dong-Hun Lee;Chae-Yun Lee;Hyeong-Woo Jo;Jae-Hee Jeong;Imkyung Oh;Ho-Kyung Ha;Jungsil Kim;Chang-Ki Huh
    • Food Science and Preservation / v.30 no.2 / pp.272-286 / 2023
  • Garlic sprouts can provide data on functional and food processing materials. This study compared the leaves, bulbs, and roots of garlic sprouts grown on smart farms during two growth periods (20 and 25 days). In addition, data for garlic bulbs grown in open fields were presented as reference materials. The total free sugar content of all garlic sprouts decreased as the growth period increased. The total organic acid content of all plant parts decreased as the growth period progressed, except for the root section. Potassium, phosphorus, and sulfur content increased during growth in all parts of the garlic sprouts. Alliin content decreased in all parts of the plant over time, whereas thiosulfinate content increased in the roots but decreased in the leaves and bulbs. Total polyphenol content increased in all parts of the plant during the growth period, except for the bulb, whereas the flavonoid content did not change significantly over time. The 2,2-diphenyl-1-picrylhydrazyl (DPPH) and 2,2'-azinobis(3-ethylbenzothiazoline-6-sulfonate) (ABTS) free radical scavenging activities, as well as the superoxide dismutase (SOD)-like activity of garlic sprouts, were 37.45-65.47%, 59.12-89.81%, and 89.52-98.59%, respectively. These activities tended to decrease during the growth period. Here, we showed that garlic sprouts have higher levels of functional substances and physiological activities than general garlic sprouts. It was also determined that a growth period of 20 days was suitable for garlic sprouts. Data for research on functional and food-processing materials can be obtained by analyzing garlic sprouts produced by smart farms.

SUITABILITY OF SHELLFISHES FOR PROCESSING 3. Suitability of Pacific oyster for processing (패류의 가공적성 3. 굴의 가공적성)

  • LEE Eung-Ho;CHUNG Seung-Yong;KIM Soo-Hyeun;RYU Byeong-Ho;HA Jin-Hwan;OH Hoo-Gyu;SUNG Nak-Ju;YANG Syng-Tack
    • Korean Journal of Fisheries and Aquatic Sciences / v.8 no.2 / pp.90-100 / 1975
  • Estimating the pre-processing condition of oysters is of great importance for distributors and processors. This study attempted to establish basic data for evaluating the processing suitability of the oyster, which is the most important shellfish for domestic use and export. The data were analysed by measuring the condition index, chemical composition, and heavy metal content of oysters. In order to eliminate the manual work that has to be done on a tightly closed oyster shell and to avoid the shrinkage of oyster meat that accompanies the steaming process, chemical means of opening oysters were examined. A polyphosphate pretreatment method for frozen oysters was also sought to improve product quality. Preventing undesirable color change in canned oyster meat is another problem to solve. The important results are as follows: 1. The ratios of meat volume and meat weight to the holding capacity of the shells may be useful as a condition index for oysters. 2. As a whole, monthly changes in the moisture and fat content of oysters were inversely correlated. Protein content decreased slightly from April, decreased rapidly in July, increased rapidly again in August, and decreased slightly from September to November. In April, the glycogen content was 4 percent. From then until September, glycogen decreased rapidly. From July to September it was only 0.7 to 1 percent, but it increased from October. There was little seasonal change in pH value. The pH value of oyster meat was 6.0 to 6.2. The crude ash content decreased slightly from June to August. 3. The ranges of monthly change in heavy metal content were as follows: total mercury was 0 to 0.019 ppm, cadmium was 0.026 to 0.053 ppm, copper was 0.111 to 0.594 ppm, and lead was 0.061 to 0.581 ppm. 4. Based on the condition index, chemical composition, and heavy metal content of oysters, the suitable harvest season for raw material for processing was from the end of December to the end of May of the following year. 5. Pretreating oyster meat with 10 percent polyphosphate in 5 percent salt solution appeared effective in reducing thawing drip during cold storage. 6. Pretreatment with $Na_2EDTA$ and BHA did not prevent color change in the canned oyster meat during storage. 7. Magnesium chloride was effective in opening the valves of oysters.

KoFlux's Progress: Background, Status and Direction (KoFlux 역정: 배경, 현황 및 향방)

  • Kwon, Hyo-Jung;Kim, Joon
    • Korean Journal of Agricultural and Forest Meteorology / v.12 no.4 / pp.241-263 / 2010
  • KoFlux is a Korean network of micrometeorological tower sites that use eddy covariance methods to monitor the cycles of energy, water, and carbon dioxide between the atmosphere and the key terrestrial ecosystems in Korea. KoFlux embraces the mission of AsiaFlux, i.e., to bring Asia's key ecosystems under observation to ensure the quality and sustainability of life on earth. The main purposes of KoFlux are to provide (1) an infrastructure to monitor, compile, archive, and distribute data for the science community and (2) a forum and short courses for the application and distribution of knowledge and data among scientists, including practitioners. The KoFlux community pursues the vision of AsiaFlux, i.e., "thinking community, learning frontiers", by creating information and knowledge of ecosystem science on carbon, water, and energy exchanges in key terrestrial ecosystems in Asia, by promoting multidisciplinary cooperation and the integration of scientific research and practice, and by providing local communities with sustainable ecosystem services. Currently, KoFlux has seven sites in key terrestrial ecosystems (i.e., five sites in Korea and two sites in the Arctic and Antarctic). KoFlux has systematized standardized data processing based on scrutiny of the data observed from these ecosystems and has synthesized the processed data to construct an open-access database for further use. Through publications, workshops, and training courses on a regular basis, KoFlux has provided an agora for building networks, exchanging information among flux measurement and modelling experts, and educating scientists in flux measurement and data analysis. Despite such persistent initiatives, collaborative networking is still limited within the KoFlux community. In order to break down the walls between different disciplines and boost partnership and ownership of the network, KoFlux will be housed in the National Center for Agro-Meteorology (NCAM) at Seoul National University in 2011 and will provide several core services of NCAM. Such concerted efforts will facilitate the augmentation of the current monitoring network, the education of next-generation scientists, and the provision of sustainable ecosystem services to our society.

Comparative assessment and uncertainty analysis of ensemble-based hydrologic data assimilation using airGRdatassim (airGRdatassim을 이용한 앙상블 기반 수문자료동화 기법의 비교 및 불확실성 평가)

  • Lee, Garim;Lee, Songhee;Kim, Bomi;Woo, Dong Kook;Noh, Seong Jin
    • Journal of Korea Water Resources Association / v.55 no.10 / pp.761-774 / 2022
  • Accurate hydrologic prediction is essential to analyze the effects of drought, flood, and climate change on flow rates, water quality, and ecosystems. Disentangling the uncertainty of hydrological models is one of the important issues in hydrology and water resources research. Hydrologic data assimilation (DA), a technique that updates the states or parameters of a hydrological model to produce the most likely estimates of the model's initial conditions, is one way to minimize uncertainty in hydrological simulations and improve predictive accuracy. In this study, two ensemble-based sequential DA techniques, the ensemble Kalman filter and the particle filter, are comparatively analyzed for daily discharge simulation at the Yongdam catchment using airGRdatassim. The results showed that the Kling-Gupta efficiency (KGE) improved from 0.799 in the open-loop simulation to 0.826 with the ensemble Kalman filter and to 0.933 with the particle filter. In addition, we analyzed the effects of hyper-parameters related to the data assimilation methods, such as the precipitation and potential evaporation forcing error parameters and the selection of perturbed and updated states. Under the forcing error conditions, the particle filter was superior to the ensemble Kalman filter in terms of the KGE index. The optimal forcing noise was relatively smaller for the particle filter than for the ensemble Kalman filter. In addition, including more state variables in the updating step improved the performance of data assimilation, implying that the selection of updating states can be treated as a hyper-parameter. The simulation experiments in this study implied that DA hyper-parameters need to be carefully optimized to exploit the potential of DA methods.
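
For readers unfamiliar with the KGE score reported above, the following is a minimal sketch of the commonly used formulation from Gupta et al. (2009), KGE = 1 - sqrt((r - 1)^2 + (α - 1)^2 + (β - 1)^2), where r is the linear correlation, α the ratio of standard deviations, and β the ratio of means between simulated and observed discharge. The abstract does not say which KGE variant the authors used, so this form is an assumption; the function name and the discharge values are hypothetical.

```python
import math
from statistics import mean, pstdev


def kling_gupta_efficiency(sim, obs):
    """KGE of simulated vs. observed series (assumed 2009 formulation)."""
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must have the same length (>= 2)")
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    if sd_s == 0 or sd_o == 0 or mu_o == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")
    # Population covariance, consistent with pstdev above.
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (sd_s * sd_o)   # correlation component
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical daily discharge values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
print(kling_gupta_efficiency(simulated, observed))
```

A perfect simulation gives KGE = 1, and the score decreases as correlation, variability, or bias departs from the observations.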

Effect of Sample Preparations on Prediction of Chemical Composition for Corn Silage by Near Infrared Reflectance Spectroscopy (시료 전처리 방법이 근적외선분광법을 이용한 옥수수 사일리지의 화학적 조성분 평가에 미치는 영향)

  • Park Hyung-Soo;Lee Jong-Kyung;Lee Hyo-Won;Hwang Kyung-Jun;Jung Ha-Yeon;Ko Moon-Suck
    • Journal of The Korean Society of Grassland and Forage Science / v.26 no.1 / pp.53-62 / 2006
  • Near infrared reflectance spectroscopy (NIRS) has been increasingly used as a rapid, accurate method of evaluating the chemical composition of forages. Analysis of forage quality by NIRS usually involves dry ground samples. Costs might be reduced if samples could be analyzed without drying or grinding. The objective of this study was to investigate the effect of sample preparation and spectral math treatments on the ability of NIRS to predict the chemical composition of corn silage. A population of 112 corn silage samples representing a wide range of chemical parameters was used in this investigation. Samples of corn silage were scanned at 2 nm intervals over the wavelength range 400-2500 nm in oven-dried grinding (ODG), liquid nitrogen grinding (LNG), or intact fresh (IF) condition, and the optical data were recorded as log 1/Reflectance (log 1/R). Samples were analysed for neutral detergent fiber (NDF), acid detergent fiber (ADF), acid detergent lignin (ADL), crude protein (CP), and crude ash content, expressed on a dry-matter (DM) basis. The spectral data were regressed against a range of chemical parameters using modified partial least squares (MPLS) multivariate analysis in conjunction with four spectral math treatments to reduce the effect of extraneous noise. The optimum calibrations were selected on the basis of minimizing the standard error of cross validation (SECV). The results of this study show that NIRS predicted the chemical parameters with a very high degree of accuracy in ODG (the coefficient of determination of cross validation, $R^2_{cv}$, ranged from 0.70 to 0.95). The optimum equations were selected on the basis of minimizing the standard error of prediction (SEP). The optimum sample preparation method and spectral math treatment were, for ADF, the ODG method using the 2,10,5 math treatment (SEP = 0.99, $R^2_v = 0.93$), and for CP, the ODG method using the 1,4,4 math treatment (SEP = 0.29, $R^2_v = 0.91$).