• Title/Summary/Keyword: Corpus analysis

Structural Disambiguation using Mutual Information and the Measure of Confidence (상호 정보를 이용한 구조적 모호성 해소와 결과에 대한 확신도 측정)

  • 심광섭
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.153-176 / 1993
  • Structural ambiguity is one of the problems that arise in the analysis of natural language sentences. It has been considered very difficult to solve. Structural ambiguity, however, must be resolved no matter how difficult it may be; otherwise natural language processing would be virtually impossible. A statistical approach to structural disambiguation is proposed in this dissertation. The information-theoretic concept of mutual information is employed in resolving structural ambiguity. Mutual information can be acquired automatically from text corpora. If a structural disambiguation subsystem were capable of evaluating for itself whether its results are correct, it would be possible to develop a more intelligent natural language processing system. In this paper, the concept of a confidence measure is also proposed to endow the disambiguation subsystem with such intelligence. The confidence measure is a numeric value calculated after structural disambiguation. Experiments were performed to show the validity of the approach. Mutual information was automatically acquired from a corpus of 1.6 million words collected from scientific abstracts. The accuracy of structural disambiguation was 80% over 1,639 test sentences. Note that there was no manual tuning in advance of the experiments. The task of detecting and correcting errors in structural disambiguation will be performed very effectively if the concept of confidence measure is employed in the process.
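The abstract above does not give the formula it uses, so the following is only a minimal sketch of one common reading: pointwise mutual information, log2(P(x, y) / (P(x) P(y))), estimated from co-occurrence counts and used to choose between candidate attachment sites. The function names `pmi_table` and `choose_attachment`, the (head, dependent) pair representation, and the toy data are all assumptions made for illustration, not the author's method.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Estimate pointwise mutual information for each observed (head, dependent) pair.

    PMI(h, d) = log2( P(h, d) / (P(h) * P(d)) ), with all probabilities
    taken as relative frequencies over the list of pairs.
    """
    n = len(pairs)
    pair_counts = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(d for _, d in pairs)
    return {
        (h, d): math.log2((c / n) / ((head_counts[h] / n) * (dep_counts[d] / n)))
        for (h, d), c in pair_counts.items()
    }


def choose_attachment(table, candidate_heads, dependent):
    """Pick the candidate head with the highest PMI; unseen pairs score -inf (assumed rule)."""
    return max(
        candidate_heads,
        key=lambda h: table.get((h, dependent), float("-inf")),
    )


# Hypothetical toy corpus of (head, dependent) co-occurrences.
toy_pairs = (
    [("eat", "with_fork")] * 3
    + [("pizza", "with_cheese")] * 2
    + [("eat", "with_cheese")]
)
toy_table = pmi_table(toy_pairs)
best_head = choose_attachment(toy_table, ["eat", "pizza"], "with_cheese")
```

In this toy data "with_cheese" co-occurs more strongly with "pizza" than with "eat", so the noun attachment is selected. A confidence value could be derived from the gap between the best and second-best scores, but the abstract does not say how its confidence measure is computed.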

Effect of Ovarian Changes according to Four Season for Reproduction of Jeju Crossbred Horses (Jeju crossbred에서 계절에 따른 난소주기 변화 연구)

  • Yu, Yeong-Ju;Park, Seol-Hwa;Shin, Sang-Min;Yang, Byoung-Chul;Seong, Pil-Nam;Woo, Jae-Hoon;Kim, Nam-Young;Son, Jun-Kyu
    • Journal of Embryo Transfer / v.32 no.3 / pp.177-182 / 2017
  • This study was conducted to investigate seasonal changes in the ovarian cycle of mares. Twenty-four Jeju crossbred horses (Thoroughbred × Jeju horse) raised at the Subtropical Livestock Research Institute, National Institute of Animal Science, RDA, were examined by ultrasonography once a week (May 2016 to June 2017) to identify follicles and corpora lutea. Blood samples were collected twice a week for analysis of P4 (progesterone) levels. Mares were considered to have resumed ovarian cyclicity on the day of ovulation if that ovulation was followed by regular ovarian cycles. Only 13 (61.9%) of the 21 cases showed a normal ovarian cycle, and 8 cases (38.1%) showed a delayed ovarian cycle. Estrus ceased in 3 cases (16.7%) in October, 5 cases (27.8%) in November, and 5 cases (27.8%) in December, while the remaining 5 cases (27.8%) maintained estrus through the winter. Horses that stopped estrus remained out of heat until March of the following year, and 27.8% continued in heat during the non-breeding season. Estrus returned in 11 (61.1%) of 18 cases in April and 2 (11.1%) of 18 cases in May.

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires constructing information about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly created words are out-of-vocabulary words and appear untranslated in the output sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately because the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, we must continuously expand the English-Korean transfer dictionary by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary, which consists of constructing a corpus from internet newspapers, extracting the words that are not in the existing dictionary and the frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports the expansion of the transfer dictionary. Expanding the dictionary is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary, and so it is expected to contribute to better translation quality.

A Study on Building Knowledge Base for Intelligent Battlefield Awareness Service

  • Jo, Se-Hyeon;Kim, Hack-Jun;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information / v.25 no.4 / pp.11-17 / 2020
  • In this paper, we propose a method for building a knowledge base, based on natural language processing, for an intelligent battlefield awareness service. The current command and control system manages and uses collected battlefield information and tactical data only at a basic level, such as registration, storage, and sharing, while information fusion and situation analysis are performed by an analyst. Because of the analyst's time constraints and cognitive limitations, generally only one interpretation is drawn, and biased thinking can be reflected in it. Therefore, it is essential for the command and control system to be aware of the battlefield situation and to establish an intelligent decision support system. To do this, it is necessary to build a knowledge base specialized for the command and control system and to develop intelligent battlefield awareness services based on it. In this paper, among the entity names suggested in the Exobrain corpus, which is private data, the top 250 types of meaningful names were applied, and a weapon system entity type was additionally identified to properly represent battlefield information. Based on this, we propose a way to build a battlefield-aware knowledge base through mention extraction, cross-reference resolution, and relationship extraction.

Research Analysis in Automatic Fake News Detection (자동화기반의 가짜 뉴스 탐지를 위한 연구 분석)

  • Jwa, Hee-Jung;Oh, Dong-Suk;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.10 no.7 / pp.15-21 / 2019
  • Research on detecting fake information gained a lot of interest after the US presidential election in 2016. Information from unknown sources is produced in the shape of news, and its rapid spread is fueled by public interest in stimulating and interesting issues. In addition, the wide use of mass communication platforms such as social network services makes this phenomenon worse. The Poynter Institute created the International Fact Checking Network (IFCN) to provide guidelines for fact judgments by skilled professionals and released a "Code of Ethics" for fact-checking agencies. However, this type of approach is costly because of the large number of experts required to test the authenticity of each article. Therefore, research on automated fake news detection technology that can identify fake news efficiently is gaining more attention. In this paper, we survey fake news detection systems and research, which are developing rapidly, mainly thanks to recent advances in deep learning. We also organize the shared tasks and training corpora that have been released in various forms, so that researchers can easily participate in this field, which deserves a lot of research effort.

Exploiting Chunking for Dependency Parsing in Korean (한국어에서 의존 구문분석을 위한 구묶음의 활용)

  • Namgoong, Young;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.11 no.7 / pp.291-298 / 2022
  • In this paper, we present a method for dependency parsing with chunking in Korean. Dependency parsing is the task of determining the governor of every word in a sentence. In Korean, the syntactic governor is usually determined first, and the syntactic structure must then be transformed into a semantic structure for further processing such as semantic analysis. A notorious problem is deciding whether to determine the syntactic or the semantic governor. For example, the syntactic governor of the word "먹고 (eat)" in the sentence "밥을 먹고 싶다 (would like to eat)" is "싶다 (would like to)", which is an auxiliary verb and therefore cannot be a semantic governor. To mitigate this, we propose Korean dependency parsing after chunking, which is the process of segmenting a sentence into constituents. A constituent is a word or a group of words that functions as a single unit within a dependency structure and is called a chunk in this paper. Compared to traditional dependency parsing, the proposed method has some advantages: (1) the number of input units in parsing can be reduced, so parsing can be faster; (2) the effectiveness of parsing can be improved by considering the relation between the head words of two chunks. Through experiments on the Sejong dependency corpus, we show that the UAS and LAS of the proposed method are 86.48% and 84.56%, respectively, and that the number of input units is reduced by about 22%p.
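For readers unfamiliar with the two scores reported above, the sketch below computes them under their usual definitions: UAS (unlabeled attachment score) is the fraction of words whose predicted governor is correct, and LAS (labeled attachment score) additionally requires the dependency label to match. The function name `attachment_scores`, the (head index, label) representation, and the toy sentence are illustrative assumptions and are not taken from the paper.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one or more sentences flattened into word lists.

    gold, predicted: lists of (head_index, label) tuples, one per word,
    aligned by position.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")
    n = len(gold)
    # UAS counts a word as correct when only the head index matches.
    head_correct = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts a word as correct when both head index and label match.
    both_correct = sum(g == p for g, p in zip(gold, predicted))
    return head_correct / n, both_correct / n


# Hypothetical four-word sentence; head index 0 denotes the root.
toy_gold = [(2, "obj"), (0, "root"), (2, "aux"), (2, "punct")]
toy_pred = [(2, "obj"), (0, "root"), (2, "dep"), (1, "punct")]
uas, las = attachment_scores(toy_gold, toy_pred)
```

Here three of four heads are correct and two of four head-label pairs are correct, so LAS can never exceed UAS.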

Efficiency of Equilume light mask on the resumption of early estrous cyclicity and ovulation in Thoroughbred mares

  • Kim, Seongmin;Jung, Heejun;Murphy, Barbara Anne;Yoon, Minjung
    • Journal of Animal Science and Technology / v.64 no.1 / pp.1-9 / 2022
  • Equilume light masks had no impact on hastening the resumption of estrous cyclicity in mares maintained in outdoor pastures on the mainland of Korea due to the cold weather conditions. Jeju Island is a major horse-breeding site in Korea and is warmer than the mainland during the winter season. Therefore, the primary objective of this study was to explore the efficiency of the Equilume light mask on the resumption of seasonal estrous cycles in Thoroughbred mares on Jeju Island. A total of 20 nonpregnant mares were randomly divided into the Equilume light mask (n = 9) and stable lighting (n = 11) groups. The experiment was performed at seven different horse-breeding farms located on Jeju Island from November 15, 2020, to February 15, 2021. The mares were exposed to the respective lights from 16:00 to 23:00. Follicle size and uterine edema were measured by ultrasound scanning. Body condition scores (BCS) were also monitored during the experiment. Statistical analysis was conducted using the SAS and SPSS software, and p-values of < 0.05 were considered statistically significant. Two of the nine (22.2%) mares in the Equilume light mask group and three of the 11 (27.28%) mares in the stable lighting group were still cycling in December and January and were considered all-year-round cycling mares. On February 15, there was no difference between groups in the resumption of early seasonal estrus cycle, which was determined by follicles > 25 mm in addition to uterine edema. All mares in the Equilume light mask group and five of the eight mares (62.5%) in the stable lighting group had resumed cycling. Interestingly, six of the seven mares (87.5%) in the Equilume light mask group and four of eight mares (50%) in the stable lighting group had already ovulated on February 15 (p > 0.05), as determined by the presence of a recent corpus luteum. No difference was observed in BCS and uterine edema between groups (p > 0.05). In conclusion, the Equilume light mask can be an effective approach to induce early seasonal estrus cycles of mares in Jeju Island, and it also enhances the efficiency of farm management by reducing labor.

An Exploratory Study on ChatGPT's Performance to Answer to Police-related Traffic Laws: Using the Driver's License Test and the Road Traffic Accident Appraiser (ChatGPT의 경찰 관련 교통법규 응답 능력에 대한 탐색적 연구 - 운전면허 학과시험과 도로교통사고감정사 1차 시험을 대상으로 -)

  • Sang-yub Lee
    • Journal of Digital Policy / v.2 no.4 / pp.1-10 / 2023
  • This preliminary study sought to identify effective ways to use ChatGPT in traffic policing by analyzing ChatGPT's responses to the driver's license test and the road traffic accident appraiser test. I collected ChatGPT responses to the driver's license test item pool and the road traffic accident appraiser test using the OpenAI API with Python code over 30 repeated experiments, and analyzed the percentage of correct answers by test, year, and section, as well as response consistency. First, the average correct answer rates for the driver's license test and the road traffic accident appraiser test were 44.60% and 35.45%, respectively, both below the pass criteria, and the correct answer rate after 2022 was lower than the average. Second, the percentage of correct answers by section ranged from 29.69% to 56.80%, a significant difference. Third, ChatGPT produced the same response more than 95% of the time when the answer was correct. Using ChatGPT effectively requires user expertise, evaluation data and analysis methods, a well-designed traffic law corpus, and periodic learning.

Pattern Clustering of Symmetric Regional Cerebral Edema on Brain MRI in Patients with Hepatic Encephalopathy (간성뇌증 환자의 뇌 자기공명영상에서 대칭적인 지역 뇌부종 양상의 군집화)

  • Chun Geun Lim;Hui Joong Lee
    • Journal of the Korean Society of Radiology / v.85 no.2 / pp.381-393 / 2024
  • Purpose: Metabolic abnormalities in hepatic encephalopathy (HE) cause brain edema or demyelinating disease, resulting in symmetric regional cerebral edema (SRCE) on MRI. This study aimed to investigate the usefulness of the clustering analysis of SRCE in predicting the development of brain failure. Materials and Methods: MR findings and clinical data of 98 consecutive patients with HE were retrospectively analyzed. The correlation between the 12 regions of SRCE was calculated using the phi (φ) coefficient, and the pattern was classified by hierarchical clustering using the φ² distance measure and Ward's method. The classified patterns of SRCE were correlated with clinical parameters such as the model for end-stage liver disease (MELD) score and HE grade. Results: Significant associations were found between 22 pairs of regions of interest, including the red nucleus and corpus callosum (φ = 0.81, p < 0.001), crus cerebri and red nucleus (φ = 0.72, p < 0.001), and red nucleus and dentate nucleus (φ = 0.66, p < 0.001). After hierarchical clustering, 24 cases were classified into Group I, 35 into Group II, and 39 into Group III. Group III had a higher MELD score (p = 0.04) and HE grade (p = 0.002) than Group I. Conclusion: Our study demonstrates that the SRCE patterns can be useful in predicting hepatic preservation and the occurrence of cerebral failure in HE.
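The phi (φ) coefficient named in the abstract above measures association between two binary variables, such as the presence or absence of edema in two brain regions across patients. The sketch below uses the standard 2×2 contingency-table formula; the function name `phi_coefficient`, the 0/1 encoding, and the toy data are assumptions for illustration, and the hierarchical clustering step with Ward's method is not reproduced here.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient for two aligned binary sequences (truthy = finding present).

    phi = (n11 * n00 - n10 * n01) / sqrt(n1_ * n0_ * n_1 * n_0),
    where n1_, n0_ are row totals and n_1, n_0 are column totals.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Phi is undefined when a variable is constant; returning 0.0 is an assumed convention.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence/absence of edema in two regions for eight patients.
region_a = [1, 1, 1, 0, 0, 0, 1, 0]
region_b = [1, 1, 0, 0, 0, 0, 1, 1]
phi = phi_coefficient(region_a, region_b)
```

With these toy values the table is n11 = 3, n10 = 1, n01 = 1, n00 = 3, giving φ = (9 − 1) / 16 = 0.5.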

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who usually judge the positive, neutral, or negative meanings of frequently used words in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the total numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning toward the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note covering the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service'. Such information will enable customers to make good choices when they purchase brand-new vehicles. In addition, automobile makers will be able to figure out the preference and the positive/negative points for new models on the market. In the near future, the weak points of the models will be improved through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings and is of limited use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. As a result of sentiment analysis without aspects, only a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus is reported to customers and car makers. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon appropriate to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
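Steps (3) and (4) of the existing lexicon-based approach described in the abstract above can be sketched as follows. This is a minimal illustration only: the lexicon contents, the whitespace tokenization, the sum-of-word-scores classification rule, and the names `classify_sentence` and `polarity_ratios` are all assumptions, and the paper's own topic-based algorithm is not reproduced.

```python
def classify_sentence(sentence, lexicon):
    """Label a sentence by summing the lexicon scores of its words (assumed rule)."""
    score = sum(lexicon.get(word, 0) for word in sentence.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_ratios(sentences, lexicon):
    """Return (positive_ratio, negative_ratio) over all sentences in the collection."""
    if not sentences:
        return 0.0, 0.0
    labels = [classify_sentence(s, lexicon) for s in sentences]
    total = len(labels)
    return labels.count("positive") / total, labels.count("negative") / total


# Hypothetical domain lexicon and review sentences for a car model.
toy_lexicon = {"great": 1, "comfortable": 1, "noisy": -1, "poor": -1}
toy_reviews = [
    "the design is great",
    "the engine is noisy",
    "seats are comfortable",
    "it has four doors",
]
pos_ratio, neg_ratio = polarity_ratios(toy_reviews, toy_lexicon)
```

In this toy collection two of four sentences are positive and one is negative. Computing the same ratios separately for the sentences assigned to each aspect would give the per-aspect view the abstract argues for.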