• Title/Summary/Keyword: Sparseness


Korean Semantic Role Labeling Using Domain Adaptation Technique (도메인 적응 기술을 이용한 한국어 의미역 인식)

  • Lim, Soojong;Bae, Yongjin;Kim, Hyunki;Ra, Dongyul
    • Journal of KIISE / v.42 no.4 / pp.475-482 / 2015
  • Developing a high-performance Semantic Role Labeling (SRL) system for a domain requires a large amount of manually annotated training data from that domain. However, SRL training data of sufficient size is available for only a few domains. The performance of Korean SRL degrades by roughly 15% or more when a system is applied directly to another domain that has relatively little training data. This paper proposes two techniques to minimize performance degradation in domain transfer. First, we propose a domain adaptation algorithm for Korean SRL based on the prior model, one of the domain adaptation paradigms. Second, we propose using simplified features related to morphological and syntactic tags when training on small target-domain data, in order to suppress the data sparseness problem. We compared other domain adaptation techniques with ours experimentally, using news as the source domain and Wikipedia as the target domain. The highest performance was achieved when our two techniques were applied together. Our system achieved an F1 score of 64.3%, which is 2.4~3.1% higher than methods from other research.

EGFR Analysis in Cytologic Samples of Lung Adenocarcinoma by Microdissection (미세 절제에 의한 폐 선암 세포 검체에서 EGFR 분석)

  • Han, Jeong Yeon;Lee, Hoon Taek;Oh, Seo Young
    • Korean Journal of Clinical Laboratory Science / v.47 no.3 / pp.125-131 / 2015
  • The discovery of activating mutations in EGFR in a subset of lung adenocarcinomas was a major advance in our understanding of lung adenocarcinoma biology, and has led to groundbreaking studies that have demonstrated the efficacy of tyrosine kinase inhibitor therapy. Cytologic specimen procedures have become increasingly popular for obtaining diagnostic material in lung carcinomas. However, the small amount of material or the sparseness of tumor cells obtained from cytologic preparations frequently limits the number of specialized studies, such as mutation analysis, that can be performed. In this study, we used microdissection to isolate small numbers of tumor cells and assess them for EGFR mutations in 76 cytological smear slides from patients with lung adenocarcinomas. We compared our results with previous molecular assays that had been performed on either surgical or cytology specimens as part of each patient's initial clinical work-up. Not only were we able to detect the identical EGFR mutation through pyrosequencing, but we were also able to consistently detect the mutation from as few as 25 microdissected tumor cells. Furthermore, isolating a purer population of tumor cells increased the sensitivity of mutation detection, as we were able to detect mutations in microdissection-enriched cases. Therefore, microdissection can not only significantly increase the number of lung adenocarcinoma patients who can be screened for EGFR mutations, but can also facilitate the use of cytologic samples in the newly emerging field of molecular-based personalized therapies.

A Homonym Disambiguation System based on Semantic Information Extracted from Dictionary Definitions (사전의 뜻풀이말에서 추출한 의미정보에 기반한 동형이의어 중의성 해결 시스템)

  • Hur, Jeong;Ock, Cheol-Young
    • Journal of KIISE:Software and Applications / v.28 no.9 / pp.688-698 / 2001
  • A homonym can be disambiguated by other words in its context, such as the nouns and predicates used with it. This paper proposes a homonym disambiguation system based on statistical semantic information extracted from dictionary definitions. The semantic information consists of the nouns and predicates that are used with the homonym in definitions. In order to extract accurate semantic information, definitions are classified into two types. In the first type, the title word and the head word (the homonym) of the definition stand in a hyponym-hypernym relation. This relation is a one-level semantic hierarchy and can be extended to deeper levels in order to overcome the problem of data sparseness. In the second type, the homonym is used in the middle of the definition. The system considers nouns and predicates simultaneously to disambiguate the homonym. Nine homonyms are examined in order to determine the weights of nouns and predicates, which affect the accuracy of homonym disambiguation. In experiments using the training corpus (dictionary definitions), the average accuracy of homonym disambiguation is 96.11% when the weights are 0.9 and 0.1 for nouns and verbs, respectively. A further experiment measuring the generality of the system yields an average accuracy of 80.73% on 1,796 sentences not used in training, taken from Korean Information Base I and the ETRI corpus.
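
The weighting reported in this abstract can be illustrated with a minimal sketch. It assumes that each candidate sense already has a noun-evidence score and a predicate-evidence score, and that the two are combined linearly with the reported weights 0.9 and 0.1. The function name, the sense labels, the score values, and the linear combination itself are illustrative assumptions, not the authors' implementation.

```python
def pick_sense(scores, noun_weight=0.9, pred_weight=0.1):
    """Return the sense label with the highest weighted evidence score.

    scores maps a sense label to a (noun_score, predicate_score) pair.
    The linear combination is an assumption made for illustration.
    """
    def combined(item):
        noun_score, pred_score = item[1]
        return noun_weight * noun_score + pred_weight * pred_score

    return max(scores.items(), key=combined)[0]


# Hypothetical evidence scores for two senses of one homonym.
example = {"sense_1": (0.2, 0.9), "sense_2": (0.6, 0.1)}
best = pick_sense(example)  # noun evidence dominates with weights 0.9 / 0.1
```

With the noun weight at 0.9, the sense supported by noun evidence wins even when the predicate evidence points the other way, which is the behavior the reported weights imply.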


A Study of Recommendation Systems for Supporting Command and Control (C2) Workflow (지휘통제 워크플로우 지원 추천 시스템 연구)

  • Park, Gyudong;Jeon, Gi-Yoon;Sohn, Mye;Kim, Jongmo
    • Journal of Internet Computing and Services / v.23 no.1 / pp.125-134 / 2022
  • The development of information communication and artificial intelligence technology calls for an intelligent command and control (C2) system for the Korean military, and various studies are under way to achieve it. In particular, as the volume of information in the C2 workflow increases exponentially, this study focuses on collaborative filtering (CF) and recommendation systems (RS) that can provide essential information to users of the C2 system. An RS that performs information filtering in the C2 system should provide explainable recommendations and consider the context of the tasks and users. In this paper, we propose a contextual pre-filtering context-aware recommender system (CARS) framework that recommends information in the C2 workflow. The proposed framework consists of four components: 1) contextual pre-filtering that filters data in advance based on the context and relationships of the users, 2) feature selection to overcome data sparseness, which is a weak point of CF, 3) the proposed CF, which calculates user similarity from feature distances between users, and 4) rule-based post-filtering to reflect user preferences. To evaluate the proposed framework, it was compared with existing CF methods using various distance measures on two real-world experimental datasets. The comparative experiments showed that the proposed framework was superior in terms of MAE, MSE, and MSLE.
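
For readers unfamiliar with the three error measures named in this abstract, the following is a minimal sketch of their usual definitions. The abstract does not say how the authors computed them; the function names and the sample ratings are assumptions, and MSLE is taken here in its common form, which compares log(1 + value) for non-negative values.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + x); inputs must be >= 0."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical true ratings and predicted ratings.
actual = [3.0, 5.0, 4.0]
predicted = [2.0, 5.0, 6.0]
errors = (mae(actual, predicted), mse(actual, predicted), msle(actual, predicted))
```

Lower values are better for all three measures. MSLE penalizes relative rather than absolute differences, so it is less dominated by large rating values than MSE.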

Construction of X-band automatic radar scatterometer measurement system and monitoring of rice growth (X-밴드 레이더 산란계 자동 측정시스템 구축과 벼 생육 모니터링)

  • Kim, Yi-Hyun;Hong, Suk-Young;Lee, Hoon-Yol
    • Korean Journal of Soil Science and Fertilizer / v.43 no.3 / pp.374-383 / 2010
  • Microwave radar can penetrate cloud cover regardless of weather conditions and can be used day and night. In particular, a ground-based polarimetric scatterometer has the advantage of monitoring crop conditions continuously with full polarization and at different frequencies. Kim et al. (2009) measured backscattering coefficients of paddy rice using an L-, C-, and X-band scatterometer system with full polarization and various angles during the rice growth period, and showed the necessity of near-continuous automatic measurement to eliminate the difficulties, inaccuracy and sparseness of data acquisition arising from manual operation of the system. In this study, we constructed an X-band automatic scatterometer system, analyzed the scattering characteristics of paddy rice from X-band scatterometer data, and estimated rice growth parameters using backscattering coefficients in the X-band. The system was installed inside a shelter in an experimental paddy field at the National Academy of Agricultural Science (NAAS) before rice transplanting. The scatterometer system consists of X-band antennas, an HP8720D vector network analyzer, RF cables and a personal computer that controls frequency, polarization and data storage. The system automatically measures fully polarimetric backscattering coefficients of the rice crop every 10 minutes. The backscattering coefficients were calculated from the measured data at a fixed incidence angle of $45^{\circ}$ and with full polarization (HH, VV, HV, VH) by applying the radar equation, and were compared with rice growth data such as plant height, stem number, fresh dry weight and Leaf Area Index (LAI) collected at the same time. We examined the temporal behaviour of the backscattering coefficients of the rice crop at X-band during the rice growth period. The HH- and VV-polarization backscattering coefficients steadily increased toward the panicle initiation stage, decreased thereafter, and increased again in early September. We analyzed the relationships between backscattering coefficients in the X-band and plant parameters, and predicted the rice growth parameters using the backscattering coefficients. It was confirmed that the X-band is sensitive to grain maturity near the harvesting season.

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions:PartB / v.11B no.6 / pp.749-758 / 2004
  • In this paper, we propose a new method for disambiguating target word selection in English-Korean machine translation that uses only a raw corpus, without additional human effort. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). These two techniques can represent complex semantic structures in given contexts such as text passages. We construct linguistic semantic knowledge with the two techniques and use that knowledge for target word selection in English-Korean machine translation. For target word selection, we utilize grammatical relationships stored in a dictionary. We use the k-nearest neighbor (k-NN) learning algorithm to resolve the data sparseness problem in target word selection, and estimate the distance between instances based on these models. In the experiments, we use TREC data of AP news to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation calculation, we show the relationship between accuracy and two important factors: the dimensionality of the latent space and the k value of k-NN learning.
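
The k-nearest neighbor step described in this abstract can be sketched as follows. The sketch assumes that each training instance is already represented as a vector in a latent semantic space (for example, one produced by LSA) and is labeled with a target word. The use of cosine similarity, the majority vote, and all names and vectors below are illustrative assumptions; the abstract does not specify these details.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_select(query, examples, k=3):
    """Pick the target word by majority vote among the k most similar examples.

    examples is a list of (latent_vector, target_word) pairs.
    """
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(word for _, word in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-dimensional latent vectors labeled with placeholder target words.
examples = [
    ([1.0, 0.0], "word_a"),
    ([0.9, 0.1], "word_a"),
    ([0.0, 1.0], "word_b"),
    ([0.1, 0.9], "word_b"),
    ([0.2, 0.8], "word_b"),
]
choice = knn_select([0.8, 0.2], examples, k=3)
```

Because neighbors are found in the reduced latent space rather than by exact co-occurrence, a query can borrow evidence from similar instances even when its own word pair was never observed, which is how k-NN helps with data sparseness.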

A Case of Kabuki Syndrome Confirmed by Genetic Analysis: A Novel Frameshift Mutation in the KMT2D Gene (분자유전학적으로 진단된 가부키 증후군 1례)

  • Park, Su Jin;Ahn, Moon Bae;Jang, Woori;Cho, Won Kyung;Chae, Hyo Jin;Kim, Myung Shin;Suh, Byung Kyu
    • Journal of The Korean Society of Inherited Metabolic disease / v.17 no.3 / pp.103-108 / 2017
  • Kabuki syndrome is a rare congenital disorder that causes multiple birth defects and mental retardation. Mutation of the lysine methyltransferase 2D (KMT2D) gene is the primary cause of Kabuki syndrome. We report a 4-year-old Korean girl diagnosed with Kabuki syndrome based on distinctive facial features (eversion of the lower lateral eyelid, arched eyebrows, depressed nasal tip, prominent ears), skeletal anomalies, short stature, and molecular analysis, which revealed a novel frameshift mutation in the KMT2D gene. The patient had a history of congenital cardiac malformations (coarctation of the aorta, ventricular septal defect, atrial septal defect, patent ductus arteriosus), subclinical hypothyroidism, and dysmorphic features at birth including webbed neck, short fingers, high arched palate, micrognathia and horseshoe kidney. She showed unique facial features such as long palpebral fissures, long eyelashes, arched eyebrows with sparseness of the lateral third, a broad nasal root, anteverted ears, and a small mouth. Her facial features suggested Kabuki syndrome, and genetic analysis revealed a novel heterozygous frameshift mutation (c.4379dup, p.Leu1461Thrfs*30) in exon 15 of the KMT2D gene. The diagnosis was made through thorough physical examination, history taking, and genetic testing. It is challenging to diagnose patients with Kabuki syndrome at birth, since the characteristic facial features are expressed gradually during growth. Clinical suspicion raised during regular follow-ups may lead to earlier diagnosis and intervention.
