• Title/Summary/Keyword: term co-occurrence

Text Mining Driven Content Analysis of Social Perception on Schizophrenia Before and After the Revision of the Terminology (조현병과 정신분열병에 대한 뉴스 프레임 분석을 통해 본 사회적 인식의 변화)

  • Kim, Hyunji;Park, Seojeong;Song, Chaemin;Song, Min
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.53 no.4
    • /
    • pp.285-307
    • /
    • 2019
  • In 2011, the Korean Medical Association revised the name of schizophrenia to remove the social stigma attached to patients. Although about nine years have passed since the terminology was revised, no study has quantitatively analyzed how much social awareness has changed. This study therefore investigates the changes in social awareness of schizophrenia caused by the revision of the disease name by analyzing Naver news articles related to the disease. LDA topic modeling, TF-IDF, word co-occurrence, and sentiment analysis were used for the text analysis. The results showed that social awareness of the disease was more negative after the revision of the terminology. In addition, of the two terms in use after the revision, the former term was associated with more negative social awareness. In other words, the revision of the disease name did not resolve the stigma.

A Method for Information Source Selection using Thesaurus for Distributed Information Retrieval

  • Goto, Shoji;Ozono, Tadachika;Shintani, Toramatsu
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2001.01a
    • /
    • pp.272-277
    • /
    • 2001
  • In this paper, we describe a new method for selecting information sources in a distributed environment. Recently, there has been much research on distributed information retrieval, that is, information retrieval (IR) based on a multi-database model in which the existence of multiple sources is modeled explicitly. Distributed IR needs a method for selecting appropriate sources for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate sources if a query contains polysemous words. We describe an information-source selection method that uses two types of thesaurus. One is a thesaurus automatically constructed from the documents in a source. The other is a hand-crafted general-purpose thesaurus (e.g., WordNet). The terms used in documents differ from source to source, and the meaning of a term differs depending on the situation in which the term is used. This difference is a characteristic of the source. In our method, the meanings of a term are distinguished by the relationships between the term and other terms, and these relationships appear in the co-occurrence-based thesaurus. We describe an algorithm for evaluating the usefulness of a source for a query based on a thesaurus. As a practical application of our method, we have developed Papits, a multi-agent-based information sharing system. A selection experiment shows that our method is effective for selecting appropriate sources.

Representative Keyword Extraction from Few Documents through Fuzzy Inference (퍼지 추론을 이용한 소수 문서의 대표 키워드 추출)

  • 노순억;김병만;허남철
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.117-120
    • /
    • 2001
  • In this work, we propose a new method of extracting and weighting representative keywords (RKs) from a few documents that might interest a user. To extract RKs, we first extract candidate terms and then choose a number of terms, called initial representative keywords (IRKs), from them through fuzzy inference. Then, by expanding and reweighting the IRKs using term co-occurrence similarity, the final RKs are obtained. The performance of our approach is heavily influenced by the effectiveness of the IRK selection method, so we choose fuzzy inference because it is more effective in handling the uncertainty inherent in selecting representative keywords of documents. The problem addressed in this paper can be viewed as that of calculating the center of document vectors. To show the usefulness of our approach, we therefore compare it with two well-known methods, Rocchio and Widrow-Hoff, on a number of document collections. The results show that our approach outperforms the other approaches.

Text Categorization using Topic Signature and Co-occurrence Features (Topic Signature와 동시 출현 단어 쌍을 이용한 문서 범주화)

  • Bae, Won-Sik;Han, Yo-Sub;Cha, Jeong-Won
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2008.06c
    • /
    • pp.262-267
    • /
    • 2008
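  • This paper describes a text categorization system that uses pairs of words co-occurring within a document as the unit of feature extraction. The feature unit is defined as a word pair on the assumption that words that frequently co-occur in documents are strongly related, and that a pair of strongly related words is more likely than a single word to appear only in documents of a particular category, which should help improve classification ability. Features are extracted using the Topic Signature Term Extraction method, which is based on the log-likelihood ratio and was proposed in the field of document summarization, and documents are classified with a Naive Bayes classifier. The approach showed good results in a performance evaluation on the Reuters-21578 document collection, and we expect it to contribute to future research.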
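
The feature-extraction step in the abstract above can be sketched roughly as follows. This is a minimal illustration, assuming the log-likelihood ratio takes the common binomial form (a pair seen in k1 of n1 in-category documents versus k2 of n2 other documents); the abstract does not give the authors' exact formulation. The function names (`llr`, `doc_pairs`, `signature_pairs`) and the toy documents are hypothetical, and the Naive Bayes classification step is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def _log_binom(k, n, p):
    """Log-likelihood of k hits in n trials under hit probability p."""
    eps = 1e-12  # clamp p so that log(0) never occurs
    p = min(max(p, eps), 1.0 - eps)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a feature seen in k1 of n1 in-category documents
    and k2 of n2 out-of-category documents (assumed binomial form)."""
    p = (k1 + k2) / (n1 + n2)
    p1, p2 = k1 / n1, k2 / n2
    return 2.0 * (_log_binom(k1, n1, p1) + _log_binom(k2, n2, p2)
                  - _log_binom(k1, n1, p) - _log_binom(k2, n2, p))


def doc_pairs(tokens):
    """Unordered pairs of distinct words that co-occur in one document."""
    return set(combinations(sorted(set(tokens)), 2))


def signature_pairs(docs, labels, target):
    """Score word pairs that are relatively more frequent inside `target`.

    Assumes at least one document inside and one outside the target category.
    """
    inside, outside = Counter(), Counter()
    n1 = n2 = 0
    for tokens, label in zip(docs, labels):
        if label == target:
            n1 += 1
            inside.update(doc_pairs(tokens))
        else:
            n2 += 1
            outside.update(doc_pairs(tokens))
    scores = {}
    for pair, k1 in inside.items():
        k2 = outside[pair]
        if k1 / n1 > k2 / n2:  # keep only pairs skewed toward the target
            scores[pair] = llr(k1, n1, k2, n2)
    return scores


# Hypothetical toy corpus; not taken from Reuters-21578.
docs = [["oil", "price", "rise"], ["oil", "price", "fall"],
        ["team", "win", "game"], ["team", "lose", "game"]]
labels = ["crude", "crude", "sport", "sport"]
scores = signature_pairs(docs, labels, "crude")
top_pair = max(scores, key=scores.get)
```

In this sketch the highest-scoring pairs for a category would serve as its features; how many pairs to keep is a tuning choice that the abstract does not specify.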

Correlation of Primary Spontaneous Pneumothorax and Air Pollution in Adolescents

  • Gu, Byung Mo;Ko, Ho Hyun;Ra, Yong Joon;Lee, Hee Sung;Kim, Hyoung Soo;Lee, Hong Kyu
    • Journal of Chest Surgery
    • /
    • v.54 no.1
    • /
    • pp.53-58
    • /
    • 2021
  • Background: We aimed to investigate the characteristics of primary spontaneous pneumothorax (PSP) in adolescents and to analyze the relationship between the occurrence of PSP and air pollutants. Methods: Data pertaining to age, sex, body mass index, smoking status, initial pneumothorax volume, presence of bullae, treatment methods, and city of residence were retrospectively obtained from January 2010 to December 2014. We investigated the association between short-term exposure to air pollutants (SO2, NO2, O3, CO, and PM10) and the occurrence of PSP using a case-crossover design with conditional logistic regression. Results: We collected information from 598 patients who were admitted for PSP, with a mean follow-up duration of 62.9 months. The majority (91.1%) of the patients were male. In the case-crossover design, conditional logistic regression showed that no air pollutant was associated with the occurrence of pneumothorax. The results were consistent across all city subgroups (Anyang, Gunpo, Uiwang, and Gwacheon). Conclusion: In our study, the incidence rate of pneumothorax was 153.8 per 100,000 person-years in male adolescents and 16.7 per 100,000 person-years in female adolescents. The case-crossover design showed that PSP in adolescents is unlikely to be related to air pollution.

Variation of Calcium Carbonate Content and Dansgaard-Oeschger Events in the Continental Slope of the Central Bering Sea during the Last 65 Kyr (베링해 중부 대륙사면 지역의 지난 65,000년 동안 탄산염 함량 변화와 Dansgaard-Oeschger 사건들)

  • Kim, Sung-Han;Khim, Boo-Keun;Itaki, Takuya;Shin, Hye-Sun
    • Ocean and Polar Research
    • /
    • v.30 no.3
    • /
    • pp.215-224
    • /
    • 2008
  • A piston core (MR06-04 PC23A) collected from the northern continental slope in the central Bering Sea has recorded the high-resolution millennial-scale variation of calcium carbonate ($CaCO_3$) content during the last 65 kyr. The age of the core sediments was estimated by lithologic correlation of the deglacial laminated layers with a neighboring core (HLY02023JPC), complemented by the last appearance datum of both Lychnocanoma nipponica sakaii (54 kyr) and Amphimelissa setosa (85 kyr). Core MR06-04 PC23A is thus probably younger than about 65 kyr. Two distinct events of a significant increase of $CaCO_3$ in the deglacial laminated sediments clearly correspond to MWP1A and MWP1B in the Bering Sea (Gorbarenko et al. 2005) and to T1ANP and T1BNP in the North Pacific (Gorbarenko 1996). These pronounced peaks of $CaCO_3$ content result from the elevated carbonate production in the surface water and the subsequent weakened dilution due to terrestrial input, along with an enhanced oxygen minimum zone. The $CaCO_3$ contents are low (${\sim}2\%$) during the last glacial period, mainly because of low carbonate production caused by an expanded sea-ice cover and increased dilution by terrigenous particles due to the closer distance to the continent during the sea-level low stand. The occurrence of seven distinct $CaCO_3$ peaks in core MR06-04 PC23A during MIS 3 and MIS 4 is remarkable, and they most likely correlate to the short-term millennial Dansgaard-Oeschger events.

Analysis of Consumer Awareness of Cycling Wear Using Web Mining (웹마이닝을 활용한 사이클웨어 소비자 인식 분석)

  • Kim, Chungjeong;Yi, Eunjou
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.5
    • /
    • pp.640-649
    • /
    • 2018
  • This study analyzed consumer awareness of cycling wear using web mining, one of the big data analysis methods. For this, the texts of postings and comments related to cycling wear posted from 2006 to 2017 at the Naver cafe 'people who commute by bicycle' were collected and analyzed using R packages. A total of 15,321 documents were used for data analysis. The keywords of cycling wear were extracted using a Korean morphological analyzer (KoNLP) and converted to a TDM (Term Document Matrix) and a co-occurrence matrix to calculate the frequency of the keywords. The most frequent keyword for cycling wear was 'tights', including the opinion that wearers feel embarrassed because they are too tight. When purchasing cycling wear, consumers appeared to consider 'price', 'size', and 'brand'. Since 2016, 'low price' and 'cost effectiveness' have become more frequent than before, which indicates that consumers tend to prefer practical products. Moreover, the findings showed that it is necessary to improve not only the design and wearability, but also the material functionality, such as sweat absorbance and quick drying, and the function of the pad. These results were similar to those of previous studies using a questionnaire. Therefore, real-time analysis of the opinions and requirements of consumers using web mining is expected to serve as an objective indicator that can be reflected in product development.
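
The conversion from extracted keywords to a TDM and a co-occurrence matrix, as described in the abstract above, can be sketched as follows. The study itself used R packages and KoNLP; this Python version is only an illustrative stand-in, and the function names (`build_tdm`, `cooccurrence`) and the toy English tokens are hypothetical. It assumes co-occurrence is counted per document, that is, two keywords co-occur once for every document that contains both.

```python
from collections import Counter


def build_tdm(docs):
    """Term-document matrix: one row per term, one column per document."""
    vocab = sorted({t for doc in docs for t in doc})
    counts = [Counter(doc) for doc in docs]
    tdm = [[c[t] for c in counts] for t in vocab]
    return vocab, tdm


def cooccurrence(tdm):
    """Term-by-term co-occurrence counts from a binarized TDM.

    Entry [i][j] is the number of documents containing both term i and
    term j; the diagonal holds each term's document frequency.
    """
    binary = [[1 if c > 0 else 0 for c in row] for row in tdm]
    n = len(binary)
    return [[sum(a * b for a, b in zip(binary[i], binary[j]))
             for j in range(n)] for i in range(n)]


# Hypothetical tokenized postings standing in for morphological-analyzer output.
docs = [["tights", "price"],
        ["tights", "size", "price"],
        ["brand", "size"]]
vocab, tdm = build_tdm(docs)
co = cooccurrence(tdm)
idx = {t: i for i, t in enumerate(vocab)}
keyword_freq = {t: sum(tdm[idx[t]]) for t in vocab}  # total count per keyword
```

Here `co[idx["price"]][idx["tights"]]` counts the postings that mention both keywords, which is the kind of figure a keyword network or frequency ranking would be built from.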

Automatic Product Feature Extraction for Efficient Analysis of Product Reviews Using Term Statistics (효율적인 상품평 분석을 위한 어휘 통계 정보 기반 평가 항목 추출 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB
    • /
    • v.16B no.6
    • /
    • pp.497-502
    • /
    • 2009
  • In this paper, we introduce an automatic product feature extraction system that improves the efficiency of product review analysis. Our system consists of two parts: a review collection and correction part and a product feature extraction part. The former collects reviews from Internet shopping malls and revises spoken-style or ungrammatical sentences. In the latter, product features, that is, items that can be used as evaluation criteria, such as 'size' and 'style' for a skirt, are automatically extracted by utilizing term statistics in reviews and web documents on the Internet. We choose nouns in reviews as candidates for product features, and calculate the degree of association between candidate nouns and products by combining an inner association degree and an outer association degree. The inner association degree is calculated from noun frequency in reviews, and the outer association degree is calculated from the co-occurrence frequency of a candidate noun and a product name in web documents. In the evaluation, our extraction method showed an average recall of 90%, which is better than the results of previous approaches.

Topic-Network based Topic Shift Detection on Twitter (트위터 데이터를 이용한 네트워크 기반 토픽 변화 추적 연구)

  • Jin, Seol A;Heo, Go Eun;Jeong, Yoo Kyung;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.30 no.1
    • /
    • pp.285-302
    • /
    • 2013
  • This study identified topic shifts and patterns over time by analyzing an enormous amount of Twitter data, which is characterized by high accessibility and brevity. First, we extracted keywords for a certain product and used them to represent a topic network, which, through co-word analysis, allows for an intuitive understanding of the keywords associated with topics by means of nodes and edges. We conducted temporal analysis of term co-occurrence as well as topic modeling to examine the results of the network analysis. In addition, comparing topic shifts on Twitter with the corresponding retrieval results from newspapers confirms that Twitter responds immediately to news media and spreads negative issues quickly. Our findings suggest that companies can utilize the proposed technique to identify the public's negative opinions as quickly as possible and apply it to timely decision making and effective responses to their customers.

Development of a smart rain gauge system for continuous and accurate observations of light and heavy rainfall

  • Han, Byungjoo;Oh, Yeontaek;Nguyen, Hoang Hai;Jung, Woosung;Shin, Daeyun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.334-334
    • /
    • 2022
  • Improving old-fashioned rain gauge systems for automatic, timely, continuous, and accurate precipitation observation is essential for weather/climate prediction and early warning of natural hazards, since heavy and extreme precipitation events (especially floods) have recently become more frequent and more severe worldwide due to climate change. Although a rain gauge accuracy of 0.1 mm is recommended by the World Meteorological Organization (WMO), traditional rain gauges of both the weighing and tipping bucket types are often unable to meet that demand due to several existing technical limitations together with higher production and maintenance costs. Therefore, we aim to introduce a newly developed and cost-effective hybrid rain gauge system with 0.1 mm accuracy that combines the advantages of the weighing and tipping bucket types for continuous, automatic, and accurate precipitation observation, where the errors from long-term load cells and external environmental sources (e.g., winds) can be removed via an automatic drainage system and an artificial intelligence-based data quality control procedure. Our rain gauge system consists of an instrument unit for measuring precipitation, a communication unit for transmitting and receiving measured precipitation signals, and a database unit for storing, processing, and analyzing precipitation data. This newly developed rain gauge was designed according to the weather instrument criteria, where precipitation amounts filled into the tipping bucket are measured considering the receiver's diameter, the maximum measurement of precipitation, drainage time, and the conductivity marking. Moreover, it is also designed to transmit the measured precipitation data stored in the PCB through RS232, RS485, and TCP/IP, and to connect to a data logger to enable data collection and analysis based on user needs.
Preliminary results from a comparison with an existing 1.0-mm tipping bucket rain gauge indicated that our rain gauge performs excellently in continuous precipitation observation, with higher measurement accuracy, more precipitation days correctly observed (120 days), and a lower error of roughly 27 mm over the measurement period.
