• Title/Summary/Keyword: Data Sets

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has prompted the use of text-mining techniques to analyze large-scale social media data more rigorously. Although social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies have added grammatical factors to the feature sets used to train classification models. The other applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the more extensive semantic features that existing sentiment analysis has underestimated. The result of the Word2Vec algorithm is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that, among the related words extracted, words expressing some emotion about the keyword are three times more numerous with Word2Vec than with co-occurrence analysis. The difference arises from Word2Vec's vectorization of semantic features. It is therefore possible to say that the Word2Vec algorithm can capture hidden related words that traditional analysis has not found. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjectives as "emotional words" in Korean. The emotion words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. 
The process of extracting these trigger factors of an emotional word is named "Emotion Trigger" in this study. As a case study, the data sets used in the study were collected by searching with three keywords (professor, prosecutor, and doctor), because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The text data comprise about 100,000 items (Professor: 25720, Doctor: 35110, Prosecutor: 43225), gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding the Emotion Trigger can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. A future study will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger come from Twitter, so the data have a number of distinct features that we did not deal with in this study. These features will be considered in further study.
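
The contrast the abstract draws between co-occurrence analysis and vector-based related words can be illustrated with a minimal Python sketch. This is not the authors' Java implementation: the toy sentences, the three-dimensional `toy_vectors`, and the helper names are all hypothetical, and the vectors stand in for embeddings that a trained Word2Vec model would supply.

```python
from collections import Counter
from math import sqrt


def cooccurring_words(sentences, keyword):
    """Count words that appear in the same sentence as `keyword`."""
    counts = Counter()
    for tokens in sentences:
        if keyword in tokens:
            counts.update(t for t in set(tokens) if t != keyword)
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(vectors, keyword, top_n=2):
    """Rank all other words by cosine similarity to `keyword`."""
    target = vectors[keyword]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != keyword]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical tokenized sentences (already POS-filtered in a real pipeline).
sentences = [
    ["professor", "angry", "scandal"],
    ["doctor", "rebate", "angry"],
    ["professor", "research", "funds"],
]

# Hypothetical embeddings; a trained model would provide these.
toy_vectors = {
    "professor": [0.9, 0.1, 0.0],
    "angry": [0.8, 0.3, 0.1],
    "furious": [0.85, 0.2, 0.05],
    "rebate": [0.0, 0.2, 0.9],
}

cooc = cooccurring_words(sentences, "professor")
neighbors = nearest_words(toy_vectors, "professor")
```

In this toy setup "furious" never shares a sentence with "professor", so co-occurrence counting cannot surface it, while the vector comparison ranks it first. That is the kind of hidden related word the abstract attributes to Word2Vec.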

Sea Surface pCO2 and Its Variability in the Ulleung Basin, East Sea Constrained by a Neural Network Model (신경망 모델로 구성한 동해 울릉분지 표층 이산화탄소 분압과 변동성)

  • PARK, SOYEONA;LEE, TONGSUP;JO, YOUNG-HEON
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.21 no.1
    • /
    • pp.1-10
    • /
    • 2016
  • Currently available data sets of surface seawater partial pressure of carbon dioxide ($pCO_2$) in the East Sea are not sufficient to quantify statistically the carbon dioxide flux through the air-sea interface. To compensate for the scarcity of $pCO_2$ measurements, we construct a neural network (NN) model based on satellite data to map $pCO_2$ for areas that were not observed. The NN model is constructed for the Ulleung Basin, where $pCO_2$ data are best available, to map and estimate the variability of $pCO_2$ based on in situ $pCO_2$ for the years from 2003 to 2012, together with sea surface temperature (SST) and chlorophyll data from the MODIS (Moderate-resolution Imaging Spectroradiometer) sensor of the Aqua satellite and geographic information. The NN model was trained to achieve a correlation higher than 95% between in situ and predicted $pCO_2$ values. The RMSE (root mean square error) of the NN model output was $19.2\,\mu\mathrm{atm}$, much less than the variability of in situ $pCO_2$. The variability of $pCO_2$ shows a stronger negative correlation with SST than with chlorophyll. As SST decreases, the variability of $pCO_2$ increases. When SST is lower than $15^{\circ}\mathrm{C}$, $pCO_2$ variability is clearly affected by both SST and chlorophyll. In contrast, when SST is higher than $15^{\circ}\mathrm{C}$, the variability of $pCO_2$ is less sensitive to changes in SST and chlorophyll. The mean rate of annual $pCO_2$ increase estimated from the NN model output in the Ulleung Basin is $0.8\,\mu\mathrm{atm}\,\mathrm{yr}^{-1}$ from 2003 to 2014. Because the NN model can successfully map $pCO_2$ for the whole study area with a higher resolution and a lower RMSE than previous studies, it can be a potentially useful tool for understanding the carbon cycle in the East Sea, where accessibility is limited by international affairs.
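
The two skill measures quoted in the abstract above, RMSE and the correlation between in situ and predicted values, can be computed as in the following Python sketch. The sample values are hypothetical and are not taken from the study; the sketch only shows how the two numbers are defined.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical pCO2 values in micro-atmospheres, for illustration only.
in_situ = [350.0, 360.0, 370.0, 380.0]
predicted = [352.0, 358.0, 373.0, 377.0]

error = rmse(in_situ, predicted)
correlation = pearson_r(in_situ, predicted)
```

An RMSE is only informative relative to the spread of the observations, which is why the abstract compares it with the variability of in situ $pCO_2$.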

Generation of Pseudo Porosity Logs from Seismic Data Using a Polynomial Neural Network Method (다항식 신경망 기법을 이용한 탄성파 탐사 자료로부터의 유사공극률 검층자료 생성)

  • Choi, Jae-Won;Byun, Joong-Moo;Seol, Soon-Jee
    • Journal of the Korean earth science society
    • /
    • v.32 no.6
    • /
    • pp.665-673
    • /
    • 2011
  • To estimate hydrocarbon reserves, the porosity of the reservoir must be determined. The porosity of an area without a well is generally calculated by extrapolating the porosity logs measured at wells. However, if seismic data as well as well logs exist at the same site, a more accurate pseudo porosity log can be obtained with an artificial neural network technique that extracts the relations between the seismic data and the well logs at the site. In this study, we have developed a module that creates pseudo porosity logs using the polynomial neural network method. To obtain more accurate pseudo porosity logs, we selected the seismic attributes that showed high correlation values in the correlation analysis between the seismic attributes and the porosity logs. Through the training procedure between the selected seismic attributes and the well logs, our module produces the correlation weights that can be used to generate the pseudo porosity log in the well-free area. To verify the reliability and applicability of the developed module, we applied it to field data acquired from the F3 Block in the North Sea and compared the results with those from the probabilistic neural network method in a commercial program. We could confirm the reliability of our module because both results showed a similar trend. Moreover, since the pseudo porosity logs from the polynomial neural network method are closer to the true porosity logs at the wells than those from the probabilistic method, we concluded that the polynomial neural network method is effective for data sets with insufficient wells, such as the F3 Block in the North Sea.

Validation and Calibration of Semi-Quantitative Food Frequency Questionnaire - With Participants of the Korean Health and Genome Study - (반정량식품섭취빈도조사지의 타당성 검증 및 보정 - 지역사회 유전체 코호트 참여자를 대상으로 -)

  • Ahn, Youn-Jhin;Lee, Ji-Eun;Cho, Nam-Han;Shin, Chol;Park, Chan;Oh, Berm-Seok;Kimm, Ku-Chan
    • Korean Journal of Community Nutrition
    • /
    • v.9 no.2
    • /
    • pp.173-182
    • /
    • 2004
  • We carried out a validation-calibration study of the food frequency questionnaire (FFQ) that we had previously developed for a community-based cohort of the Korean Genome and Health Study of the Korea National Genome Research Institute. We collected a total of 254 3-day diet records (DRs) from 400 subjects, 200 each randomly selected from the two study cohorts of Ansung and Ansan. The FFQ was administered at the time of cohort recruitment in 2001, and DRs were collected during a two-month period from January through February of 2002. The mean age was 52.2 years. Farming for men and housewife for women were the most common occupations. The majority of the subjects had 6 to 12 years of education. The general characteristics, including demographic and other data, were not different from those of the total cohort subjects. Absolute levels of consumed nutrients, including total energy (energy), protein, fat, carbohydrate, calcium, phosphorus, sodium, potassium, iron, retinol, carotene, vitamin A, thiamin, riboflavin, niacin and vitamin C, were compared. The average energy intake was not significantly different between the data collected by the two methods. However, consumption of protein and fat was higher in the DR data, whereas that of carbohydrate was higher in the FFQ data. A significant correlation of each nutrient's consumption between the data sets was observed (p < 0.05) except in the case of iron; the average correlation coefficient was 0.22, ranging from 0.11 for iron to 0.33 for energy. The results of cross-classification by quantile ranged from 25.2% (carotene) to 35.0% (phosphorus) for exact classification and from 64.6% (vitamin A) to 76.4% (retinol) for adjacent classification. The proportion of completely opposite classification was 8.1% on average. The calibration slope was estimated by regression, and calibration parameters ranged from 0.025 for carotene to 0.423 for niacin. 
We conclude that the FFQ we have developed is an appropriate tool for assessing nutrient intakes as ranking exposures in epidemiological studies, given that the amounts of consumed nutrients obtained by the FFQ were similar to those collected by DRs, that the correlations between consumed nutrients collected by these methods were significant, and that the classification results were relatively fair. The correlation coefficients, however, were lower than expected, which may be mainly due to the survey season. In fact, any short-term dietary survey cannot accurately reflect overall dietary intakes, which change heavily depending on the season. Further studies, including the analysis of chemical indices, would be helpful for studies of the causal relationship between diet and disease.
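
The cross-classification reported above can be sketched in Python as follows. The abstract says only "quantile", so the choice of four groups is an assumption, as are the rank-based grouping rule, the convention that "adjacent" includes exact agreement, and the sample intakes, which are hypothetical.

```python
def quantile_groups(values, n_groups=4):
    """Assign each value a rank-based group from 0 to n_groups - 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, index in enumerate(order):
        groups[index] = rank * n_groups // len(values)
    return groups


def cross_classification(ffq, dr, n_groups=4):
    """Return the shares of exact, adjacent (same or neighboring group),
    and completely opposite classification between two methods."""
    ffq_groups = quantile_groups(ffq, n_groups)
    dr_groups = quantile_groups(dr, n_groups)
    pairs = list(zip(ffq_groups, dr_groups))
    n = len(pairs)
    exact = sum(a == b for a, b in pairs) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / n
    opposite = sum(abs(a - b) == n_groups - 1 for a, b in pairs) / n
    return exact, adjacent, opposite


# Hypothetical nutrient intakes for eight subjects under each method.
ffq_intake = [1, 2, 3, 4, 5, 6, 7, 8]
dr_intake = [2, 1, 4, 3, 8, 6, 7, 5]

exact, adjacent, opposite = cross_classification(ffq_intake, dr_intake)
```

Because the measure depends only on ranks, it evaluates how well the FFQ orders subjects rather than how close its absolute values are, which matches the abstract's emphasis on ranking exposures.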

Regeneration of a defective Railroad Surface for defect detection with Deep Convolution Neural Networks (Deep Convolution Neural Networks 이용하여 결함 검출을 위한 결함이 있는 철도선로표면 디지털영상 재 생성)

  • Kim, Hyeonho;Han, Seokmin
    • Journal of Internet Computing and Services
    • /
    • v.21 no.6
    • /
    • pp.23-31
    • /
    • 2020
  • This study was carried out to generate various images of railroad surfaces with random defects as training data in order to improve defect detection. Defects on the surface of railroads are caused by various factors, such as friction between track binding devices and adjacent tracks, and can cause accidents such as broken rails, so railroad maintenance for defects is necessary. Therefore, various studies on defect detection and inspection using image processing or machine learning on railway surface images have been conducted to automate railroad inspection and to reduce railroad maintenance costs. In general, the performance of image processing methods and machine learning techniques is affected by the quantity and quality of data. For this reason, some studies require specific devices or vehicles to acquire images of the track surface at regular intervals to build a database of various railway surface images. In contrast, in this study, in order to reduce the operating cost of image acquisition, we constructed the 'Defective Railroad Surface Regeneration Model' by applying methods presented in related studies of the Generative Adversarial Network (GAN). Thus, we aimed to detect defects on railroad surfaces even without a dedicated database. The constructed model is designed to learn to generate railroad surfaces by combining different railroad surface textures with the original surface, taking into account the ground truth of the railroad defects. The generated images of the railroad surface were used as training data for a defect detection network based on the Fully Convolutional Network (FCN). To validate its performance, we clustered and divided the railroad data into three subsets: one subset of original railroad texture images and the remaining two subsets of other railroad surface texture images. In the first experiment, we used only original texture images as the training set for the defect detection model. 
In the second experiment, we trained on images generated by combining the original images with a few railroad textures from the other images. Each defect detection model was evaluated against the ground truth in terms of 'intersection over union (IoU)' and F1-score. As a result, the scores increased by about 10~15% when the generated images were used, compared with the case in which only the original images were used. This shows that it is possible to detect defects by using the existing data and a few different texture images, even for railroad surface images for which no dedicated training database has been constructed.
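
The two evaluation measures named above, IoU and F1-score, can be computed from a predicted defect mask and its ground truth as in this Python sketch. The flattened six-pixel masks are hypothetical; the formulas are the standard definitions, not code from the study.

```python
def iou_and_f1(predicted, truth):
    """Compute intersection over union and F1-score for binary masks
    given as flat sequences of 0 (background) and 1 (defect)."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # defect found
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false alarm
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # missed defect
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return iou, f1


# Hypothetical flattened masks for illustration.
predicted_mask = [1, 1, 0, 0, 1, 0]
truth_mask = [1, 0, 0, 1, 1, 0]

iou, f1 = iou_and_f1(predicted_mask, truth_mask)
```

For the same prediction F1 is never lower than IoU, since F1 = 2·IoU / (1 + IoU), so the two scores move together but are not interchangeable when comparing reported gains.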

A Practical Method to Quantify Very Low Fluxes of Nitrous Oxide from a Rice Paddy (벼논에서 미량 아산화질소 플럭스의 정량을 위한 실용적 방법)

  • Okjung, Ju;Namgoo, Kang;Hoseup, Soh;Jung-Soo, Park
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.24 no.4
    • /
    • pp.285-294
    • /
    • 2022
  • To calculate greenhouse gas emissions in the agricultural sector accurately, Korea has been developing country-specific emission factors through direct measurement of gas fluxes using the closed-chamber method. For rice paddies, country-specific emission factors have been developed only for methane (CH4). It is thus necessary to develop those for nitrous oxide (N2O), which is affected by the application of nitrogen fertilizer. However, since the concentration of N2O emitted from rice cultivation is very low, QA/QC measures such as the method detection limit and the practical quantification limit are important. In this study, N2O emission from a rice paddy was evaluated in relation to the amount of nitrogen fertilizer, taking into account both the method detection and practical quantification limits for N2O concentration. The N2O emission from rice paddy soils affected by nitrogen fertilizer application was estimated as follows. The method detection limit (MDL) of N2O concentration was calculated at the 95% confidence level based on the pooled standard deviation of concentration data sets from repeated measurements (10 times for 3 days) of a standard gas containing 98 nmol mol-1 N2O. The practical quantification limit (PQL) of the N2O concentration was estimated by multiplying the pooled standard deviation by 10. For the N2O flux data measured during the rice cultivation period in 2021, the MDL and PQL of N2O concentration were 18 nmol mol-1 and 87 nmol mol-1, respectively. The measured values above the PQL were only about 12% of the total data. The cumulative N2O emission estimated based on the MDL and PQL was higher than the cumulative emission without nitrogen fertilizer application. This research would contribute to improving the reliability of N2O flux quantification for accurate estimates of greenhouse gas emissions and their uncertainties.
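
The MDL and PQL calculation described above can be sketched in Python. The abstract states that the PQL is 10 times the pooled standard deviation; the form MDL = t × pooled standard deviation is an assumption here, as are the multiplier passed as `t_value` and the sample readings, which are hypothetical. The reported values (18 and 87 nmol mol-1) would imply a multiplier of roughly 2.1.

```python
from math import sqrt
from statistics import stdev


def pooled_std(groups):
    """Pooled standard deviation across several groups of replicates."""
    weighted_variance = sum((len(g) - 1) * stdev(g) ** 2 for g in groups)
    degrees_of_freedom = sum(len(g) - 1 for g in groups)
    return sqrt(weighted_variance / degrees_of_freedom)


def detection_limits(groups, t_value):
    """Return (MDL, PQL), assuming MDL = t_value * s_pooled
    and PQL = 10 * s_pooled."""
    s = pooled_std(groups)
    return t_value * s, 10 * s


# Hypothetical standard-gas readings (nmol mol-1), one list per day.
daily_readings = [
    [98, 100, 96],
    [97, 99, 101],
    [95, 98, 101],
]

# 2.0 is a placeholder; a real analysis would take the Student's t
# critical value for the chosen confidence level and degrees of freedom.
mdl, pql = detection_limits(daily_readings, t_value=2.0)
```

Readings below the MDL cannot be distinguished from zero, and readings between the MDL and the PQL are detectable but not reliably quantifiable, which is why the abstract reports the share of data above the PQL.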

A Study on the Medical Application and Personal Information Protection of Generative AI (생성형 AI의 의료적 활용과 개인정보보호)

  • Lee, Sookyoung
    • The Korean Society of Law and Medicine
    • /
    • v.24 no.4
    • /
    • pp.67-101
    • /
    • 2023
  • The utilization of generative AI in the medical field is also being rapidly researched. Access to vast data sets reduces the time and energy spent in selecting information. However, as the effort put into content creation decreases, there is a greater likelihood of associated issues arising. For example, with generative AI, users must discern the accuracy of results themselves, as these AIs learn from data within a set period and generate outcomes. While the answers may appear plausible, their sources are often unclear, making it challenging to determine their veracity. Additionally, the possibility of presenting results from a biased or distorted perspective cannot be discounted at present on ethical grounds. Despite these concerns, the field of generative AI is continually advancing, with an increasing number of users leveraging it in various sectors, including biomedical and life sciences. This raises important legal considerations regarding who bears responsibility and to what extent for any damages caused by these high-performance AI algorithms. A general overview of issues with generative AI includes those discussed above, but another perspective arises from its fundamental nature as a large-scale language model ('LLM') AI. There is a civil law concern regarding "the memorization of training data within artificial neural networks and its subsequent reproduction". Medical data, by nature, often reflects personal characteristics of patients, potentially leading to issues such as the regeneration of personal information. The extensive application of generative AI in scenarios beyond traditional AI brings forth the possibility of legal challenges that cannot be ignored. 
Examining the technical characteristics of generative AI with a focus on legal issues, especially the protection of personal information, shows that current laws on personal information protection, particularly in the context of health and medical data utilization, are inadequate. These laws provide processes for anonymizing and de-identifying specific personal information but fall short when generative AI is applied as software in medical devices. To address the functionalities of generative AI in clinical software, a reevaluation and adjustment of existing laws for the protection of personal information are imperative.

The Study of Metrics development for Entrepreneurial Program Effectiveness (청소년 창업교육프로그램 효과성 측정지표 개발 연구)

  • Byun, Youngjo;Kim, Myung Seuk;Yang, Young Seok
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.9 no.4
    • /
    • pp.77-85
    • /
    • 2014
  • The goal of Bizcool entrepreneurship education for youth is to help students understand the start-up process and to strengthen their entrepreneurial will and business creativity, rather than to train narrow start-up skills such as writing a business plan. The effects of this education enable Bizcool students to understand the concept of start-up training correctly and help spread demand for entrepreneurship education. Feedback on how entrepreneurship education affects the students who complete it is necessary and can suggest improvements. Despite this importance, few measurement tools and indexes for evaluating the effectiveness of entrepreneurship education have been developed or studied so far. This research proposes suitable indexes for that purpose. Specifically, it first develops a set of 49 questions for evaluating the effectiveness of entrepreneurship education, classified into 3 large categories and 11 subcategories, such as entrepreneurial orientation, creativity, and entrepreneurship preparation activities, that represent the effects of entrepreneurship education. The research then carries out an empirical survey using the derived question set, sampling 287 students from 5 different Bizcools. Through exploratory and confirmatory factor analysis, the survey yields a final 3 large categories, 8 subcategories (innovativeness, risk-taking, problem-solving ability, cooperative decision-making ability, efficient behavior capacity, data-collecting ability, career search, and start-up search and preparation), and 38 measurement indexes. A reliability test was carried out for each index and yielded acceptable values. Finally, this research confirms a gap between start-up club members and non-members in terms of the effectiveness of entrepreneurship education and 9 different indexes.

The Brassica rapa Tissue-specific EST Database (배추의 조직 특이적 발현유전자 데이터베이스)

  • Yu, Hee-Ju;Park, Sin-Gi;Oh, Mi-Jin;Hwang, Hyun-Ju;Kim, Nam-Shin;Chung, Hee;Sohn, Seong-Han;Park, Beom-Seok;Mun, Jeong-Hwan
    • Horticultural Science & Technology
    • /
    • v.29 no.6
    • /
    • pp.633-640
    • /
    • 2011
  • Brassica rapa is an A genome model species for Brassica crop genetics, genomics, and breeding. With the sequencing of the B. rapa genome complete, functional analysis of the genome is the next challenge. Expressed sequence tags (ESTs) are fundamental resources supporting annotation and functional analysis of the genome, including identification of tissue-specific genes and promoters. As of July 2011, 147,217 ESTs from 39 cDNA libraries of B. rapa had been reported in the public database. However, little information can be retrieved from the sequences because of the lack of organized databases. To leverage the sequence information and to maximize the use of publicly available EST collections, the Brassica rapa tissue-specific EST database (BrTED) was developed. BrTED includes sequence information for 23,962 unigenes assembled by the StackPack program. The unigene set is used as a query unit for various analyses such as BLAST against the TAIR gene model, functional annotation using MIPS and UniProt, gene ontology analysis, and prediction of tissue-specific unigene sets based on statistical tests. The database is composed of two main units: an EST sequence processing and information retrieval unit and a tissue-specific expression profile analysis unit. Information and data in both units are tightly interconnected through a web-based browsing system. RT-PCR evaluation of 29 selected unigene sets successfully produced amplicons from the target tissues of B. rapa. BrTED allows the user to identify and analyze the expression of genes of interest and aids efforts to interpret the B. rapa genome through functional genomics. In addition, it can be used as a public resource providing reference information for studying the genus Brassica and other closely related crop crucifer plants.

A Study on the Demand for Equipment Development in Nursing (간호기기 개발수요 조사연구)

  • Chang, Soon-Book;Kim, Eui-Sook;Whang, Ae-Ran;Kang, Kyu-Sook;Suh, Mi-Hae
    • The Korean Nurse
    • /
    • v.35 no.2
    • /
    • pp.71-91
    • /
    • 1996
  • The objectives of this study were to identify the need for equipment development in nursing and to determine the priorities for that development. This descriptive study was conducted between March 2 and May 30, 1995; the subjects, including 421 patients, 223 family members, and 198 nurses from neurosurgery, orthopedic, rehabilitation medicine, internal medicine and intensive care units of nine general hospitals in Seoul, completed a questionnaire developed by the research team. The questionnaire consisted of 35 open and closed questions. Data were analyzed using frequencies and percentages. The results were summarized as follows: 1) The average age of the nurses was 27.9 years; 48% of the patients were between 20 and 40 years of age, and 17% were over 60. The average length of experience for the nurse subjects was four years and five months, with 36.9% having over five years of experience. The most frequent diagnoses of patients were spinal disc(35.9%), internal medicine disease(26.0%), cerebral vascular accident(16.6%) and spinal cord injury(10%). 2) Many of the nurses(96.4%) reported deficiencies with existing equipment and 96.5% of the nurses, but only 79.8% of the patients, nurses' time. Further, 82.3% of the nurses and 75.8% of the patients felt that the development of new equipment would lead to a decrease in the cost of nursing care. 3) Nurses felt that the greatest areas of inconvenience were patient feeding(71.7%), hygiene(71.2%), caring for a patient confined to bed(70.7%), patient clothing(67.2%), mobility transfers(63.5%) and urinary elimination(52.0%). However, patients and family members listed the following as being the most inconvenient: urinary elimination(58.7%), hygiene(50.5%), feeding(48.4%), mobility transfers(47.1%) and bed care(45.2%). 4) Generally, the nurses listed more inconveniences, and patients and family members listed more demands for the development of equipment. 
These included utensils with large handles, and regulators for tube feedings; mattresses that provide for automatic position change and massage, which have patient-controlled levers and a place for bed pan insertion; automatic lifts for transfer from bed to wheelchair; equipment to facilitate washing and oral hygiene, as well as equipment that will allow patients with spinal cord injuries easy access to showers; a bed pan/urinal for women that is comfortable and effective and from which urine can be measured and disposed of easily; disposable dressing sets and tracheostomy care sets and a convenient way of measuring changes in wound size; a safe delivery system for oxygen, a variety of mask sizes and better control of humidity and tracheal material than at present, as well as a communication system for patients with tracheostomies; clothing that will allow access to various parts of the body for treatment or assessment without patients having to remove all of their clothing; and finally a system that will allow the patient to control lighting, telephones and pagers. Priority areas for equipment development reported by the nurses were urinary elimination(58.7%), hygiene(50.5%), feeding(48.4%), mobility transfers(47.1%) and bowel elimination(40.8%). Those reported by the patients and family members were feeding(71.7%), hygiene(70.0%), bed care(70.7%), clothing(67.2%), mobility transfers(63.6%), urinary elimination(52.9%) and bowel elimination(50.5%). Altogether, nurses, patients and family members listed the following as priorities: clothing(178), bed care(144), urinary elimination(92), environment(81) and hygiene(70). Further, a health professional forum listed urinary elimination, oxygen delivery, medication delivery, mobility transfers, bed care and hygiene, in that order, as priority areas. From this study it can be concluded that the first need is to develop equipment that will address the problems of urinary elimination. 
To do this, nurses who are interested in equipment development should organize an equipment development team to provide a forum for discussion and production of equipment for nursing.
