• Title/Summary/Keyword: Corpus analysis

Search Result 417, Processing Time 0.037 seconds

Age Determination by Tooth Wear and Histological Analysis of Seasonal Variation of Breeding in the Big White-Toothed Shrew, Crocidura lasiura (우수리땃쥐 Crocidura lasiura의 치아 마모에 의한 연령결정과 번식의 계절적 변이의 조직학적 분석)

  • Jeong, Soon-Jeong;Yoon, Myung-Hee;Choi, Jung-Mi;Kim, Hyun-Dae;Lim, Do-Seon;Park, Jin-Ju;Choi, Baik-Dong;Jeong, Moon-Jin
    • Applied Microscopy / v.40 no.1 / pp.37-45 / 2010
  • Wild-caught specimens of the big white-toothed shrew, Crocidura lasiura, were classified into three age classes by tooth wear and molar height, and seasonal variations in breeding and reproductive organs were examined. Juveniles had no tooth wear on the molars, had third molars lower than the first and second molars, and were found only in non-breeding condition. Young adults had little tooth wear, with third molars as high as the first and second molars, and old adults had heavy tooth wear on the molars; young and old adults were in breeding or non-breeding condition depending on the season. Histological examination confirmed that the breeding condition of young and old adult males lasted from early February to early October, with breeding activity highest in April, whereas that of females lasted from the end of March to October; males reached sexual maturity earlier than females. Breeding seems to cease in the non-breeding season because of a shortage of food resources, namely soil invertebrates. Young and old adult males in the breeding season had large testes with enlarged seminiferous tubules filled with numerous germ cells and expanded caudal epididymides containing a vast number of spermatozoa; they weighed more than 10.0 g in body weight and 0.03 g in testis and epididymis weight. Females in the breeding season were pregnant with 4~6 litters or had Graafian follicles and corpora lutea in the ovary, and weighed more than 9.6 g in body weight.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data. These data are produced and distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn the interest of many researchers in managing it and created a need for professionals capable of classifying relevant information; hence text classification was introduced. Text classification is a challenging task in modern data analysis, in which a text document must be assigned to one or more predefined categories or classes. Available techniques include K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model varies with the type of words used in the corpus and the type of features created for classification. Most attempts so far have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing or modifying an algorithm, we focus on modifying how the data are used. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them. 
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build a classifier on the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, the features are determined by the vocabulary of the documents, so if the viewpoints of the training data and the target data differ, the features of the two may also differ. We attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently, which causes difficulties for traditional machine learning algorithms because they were not developed to recognize different types of data representation at one time and combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. 
In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
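
The abstract above does not spell out how RSESLA selects documents, so the following is only a minimal sketch of the general idea of keeping confidently labeled unlabeled documents in an ensemble semi-supervised setting. The function name `select_confident`, the unanimity rule, and the 0.8 threshold are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter

def select_confident(predictions, threshold=0.8):
    """Keep unlabeled documents that every model labels identically and confidently.

    predictions: {doc_id: [(label, confidence), ...]}, one pair per model (view).
    The unanimity rule and the threshold are hypothetical choices, not RSESLA itself.
    """
    selected = {}
    for doc_id, votes in predictions.items():
        if not votes:
            continue
        label_counts = Counter(label for label, _ in votes)
        top_label, count = label_counts.most_common(1)[0]
        if count != len(votes):
            # The models disagree, so the pseudo-label is treated as unreliable.
            continue
        mean_confidence = sum(conf for _, conf in votes) / len(votes)
        if mean_confidence >= threshold:
            selected[doc_id] = top_label
    return selected

if __name__ == "__main__":
    toy = {
        "d1": [("sports", 0.90), ("sports", 0.85)],
        "d2": [("sports", 0.90), ("politics", 0.95)],
        "d3": [("tech", 0.60), ("tech", 0.70)],
    }
    print(select_confident(toy))
```

Documents that pass the filter could then be added to the training set with their pseudo-labels; documents on which the models disagree, or agree only weakly, are left out so that they do not degrade the classifier.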

Age Determination by Tooth Wear and Histological Analysis of Seasonal Variation of Breeding in the Lesser White-Toothed Shrew, Crocidura suaveolens (작은땃쥐 Crocidura suaveolens의 치아 마모에 의한 연령결정과 번식의 계절적 변이의 조직학적 분석)

  • Jeong, Soon-Jeong;Yoon, Myung-Hee;Kim, Sook-Hyang;Ham, Joo-Hyun;Lim, Do-Seon;Choi, Baik-Dong;Park, Jin-Ju;Jeong, Moon-Jin
    • Applied Microscopy / v.40 no.3 / pp.125-132 / 2010
  • Captured specimens of the lesser white-toothed shrew, Crocidura suaveolens, were classified into three age classes by tooth wear, and seasonal variations of the reproductive organs were investigated. Juveniles had no tooth wear on the molars, and their third molars were lower than the first and second molars; young adults had smooth tooth wear, with third molars as high as the first and second molars; and old adults had heavy tooth wear, with third molars also as high as the first and second molars. Histological examination confirmed that the breeding season of adult males ran from early February to early October, with breeding peaks in April and July, and that the non-breeding season ran from mid-October to late January. Young and old adult males in the breeding season had large testes with enlarged seminiferous tubules filled with numerous germ cells and expanded caudal epididymides containing a vast number of spermatozoa. Young and old adult males in the non-breeding season had small testes with extremely slender seminiferous tubules containing only spermatogonia and reduced caudal epididymides without spermatozoa. Males weighing more than 3.9 g in body weight and 0.013 g in testis and epididymis weight reached sexual maturity in the breeding season, and females weighing more than 3.8 g in body weight in the breeding season were pregnant with 5~6 litters or had Graafian follicles and corpora lutea in the ovary.

Effects of Characteristics of Ovarian Follicular Fluid and Anti-Inhibin Serum on Steroid Hormone Secretion by Hanwoo Granulosa Cells In Vitro (한우 난소의 Follicular Fluid의 특징과 과립막 세포의 스테로이드호르몬 분비에 대한 Anti-Inhibin Serum의 첨가효과)

  • 성환후;민관식;양병철;노환국;최선호;임기순;장유민;박성재;장원경
    • Korean Journal of Animal Reproduction / v.25 no.2 / pp.119-124 / 2001
  • This study was performed to investigate the effects of the peptide-to-carrier ratio on the immune and biological functions of inhibin immunization in Hanwoo. A peptide sequence from the $\alpha$-subunit (19~32 peptide) of porcine inhibin was synthesized as the antigen and conjugated to human serum albumin (HSA) as the carrier protein. Anti-inhibin sera (AI) were produced from rabbits 52 days after injection of the inhibin $\alpha$-subunit peptide conjugate as antigen at 2-week intervals. Immunoblotting analysis using an antibody specific for the inhibin $\alpha$-subunit revealed that inhibin was detected in bovine follicular fluid (bFF) from 1.0 cm follicles. However, it was not detected at any stage of the corpus luteum or in follicular fluid from 0.1 cm follicles. The maximal content of estradiol-17$\beta$ in Hanwoo ovarian follicular fluid was detected at a follicular size (diameter) of 2.0 cm, and the mean total content of this hormone decreased significantly with decreasing follicle diameter. However, the progesterone content of follicular fluid was high in 1.0 cm follicles. Progesterone secretion by Hanwoo granulosa cells cultured for 48 hr in vitro was significantly (p<0.05) inhibited in the 5% bFF and 5% bFF + 5% AI addition groups compared with the control group. Estradiol-17$\beta$ secretion by Hanwoo granulosa cells cultured for 48 hr in vitro was significantly (p<0.05) increased in the 5% AI and 5% AI + 5% bFF addition groups compared with the control group. However, the groups with 5% AI added did not differ from the control groups in progesterone and estradiol-17$\beta$. Taken together, we suggest that inhibin in mature FF plays a pivotal role in the biosynthesis of steroid hormones by follicular cells during follicular development.

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness : From Cognitive Semantics Perspective (의미간의 유사도 연구의 패러다임 변화의 필요성-인지 의미론적 관점에서의 고찰)

  • Choi, Youngseok;Park, Jinsoo
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.111-123 / 2013
  • Measures of semantic similarity/relatedness between two concepts play an important role in research on system integration and database integration. Moreover, current research on keyword recommendation and tag clustering strongly depends on this kind of semantic measure. For this reason, many researchers in various fields, including computer science and computational linguistics, have tried to improve methods for calculating semantic similarity/relatedness. The study of similarity between concepts aims to discover how a computational process can model the way a human determines the relationship between two concepts. Most research on calculating semantic similarity uses ready-made reference knowledge, such as a semantic network or dictionary, to measure concept similarity. The topological method calculates relatedness or similarity between concepts based on various forms of a semantic network, including a hierarchical taxonomy. This approach assumes that the semantic network reflects human knowledge well. The nodes in a network represent concepts, and ways to measure the conceptual similarity between two nodes are also regarded as ways to determine the conceptual similarity of two words (i.e., two nodes in a network). Topological methods can be categorized as node-based or edge-based, also called the information content approach and the conceptual distance approach, respectively. The node-based approach calculates similarity between concepts based on how much information the two concepts share in terms of a semantic network or taxonomy, while the edge-based approach estimates the distance between the nodes that correspond to the concepts being compared. Both approaches assume that the semantic network is static; that is, the topological approach does not consider changes in the semantic relations between concepts in the semantic network. 
However, as information and communication technologies make it easier for people to share knowledge, the semantic relations between concepts in a semantic network may change. To explain this change, we adopt cognitive semantics. The basic assumption of cognitive semantics is that humans judge semantic relations based on their cognition and understanding of concepts. This cognition and understanding is called 'World Knowledge.' World knowledge can be categorized as personal knowledge and cultural knowledge. Personal knowledge is knowledge from personal experience; everyone can have different personal knowledge of the same concept. Cultural knowledge is the knowledge shared by people who live in the same culture or use the same language. People in the same culture have a common understanding of specific concepts. Cultural knowledge can be the starting point for discussing changes in semantic relations. If the culture shared by people changes for some reason, their cultural knowledge may also change. Today's society and culture are changing at a fast pace, and the change in cultural knowledge is not a negligible issue in research on semantic relationships between concepts. In this paper, we propose future directions for research on semantic similarity. In other words, we discuss how research on semantic similarity can reflect the change in semantic relations caused by the change in cultural knowledge. We suggest three directions for future research on semantic similarity. First, the research should include a versioning and update methodology for semantic networks. Second, a dynamically generated semantic network can be used to calculate semantic similarity between concepts. If researchers can develop a methodology to extract the semantic network from a given knowledge base in real time, this approach can solve many problems related to changes in semantic relations. 
Third, a statistical approach based on corpus analysis can be an alternative to methods that use a semantic network. We believe that these proposed research directions can be a milestone in research on semantic relations.
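
As a rough illustration of the edge-based (conceptual distance) approach described in this abstract, the sketch below counts edges on the shortest path between two concepts in a small is-a taxonomy and converts the distance to a similarity score. The taxonomy, the function names, and the `1 / (1 + distance)` conversion are assumptions chosen for the example; the paper does not prescribe a particular formula.

```python
from collections import deque

# Hypothetical child -> parent taxonomy, used only for illustration.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

def build_graph(parent):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, par in parent.items():
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    return graph

def path_length(graph, a, b):
    """Number of edges on the shortest path from a to b (breadth-first search)."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == b:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two concepts are not connected

def edge_similarity(graph, a, b):
    """Map path length to a score in (0, 1]; unconnected concepts score 0."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)

if __name__ == "__main__":
    g = build_graph(PARENT)
    print(edge_similarity(g, "dog", "cat"))
    print(edge_similarity(g, "dog", "sparrow"))
```

Because the score depends only on the fixed `PARENT` map, it illustrates the static-network assumption the abstract criticizes: if the relations between concepts shift over time, the taxonomy has to be versioned or regenerated before the scores change.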

The Effect of Domain Specificity on the Performance of Domain-Specific Pre-Trained Language Models (도메인 특수성이 도메인 특화 사전학습 언어모델의 성능에 미치는 영향)

  • Han, Minah;Kim, Younha;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.251-273 / 2022
  • Recently, research on applying deep learning to text analysis has steadily continued. In particular, studies have been actively conducted on understanding the meaning of words and performing tasks such as summarization and sentiment classification through pre-trained language models trained on large datasets. However, existing pre-trained language models are limited in that they do not understand specific domains well. Therefore, in recent years, the flow of research has shifted toward creating language models specialized for a particular domain. Domain-specific pre-trained language models understand the knowledge of a particular domain better and show performance improvements on various tasks in that field. However, domain-specific further pre-training is expensive because corpus data of the target domain must be acquired. Furthermore, many cases have been reported in which the performance improvement after further pre-training is insignificant in some domains. It is therefore difficult to decide to develop a domain-specific pre-trained language model when it is not clear whether performance will improve substantially. In this paper, we present a way to check the expected performance improvement from further pre-training in a domain before actually performing it. Specifically, after selecting three domains, we measured the increase in classification accuracy through further pre-training in each domain. We also developed and presented new indicators to estimate the specificity of a domain based on the normalized frequency of the keywords used in each domain. Finally, we conducted classification using a pre-trained language model and domain-specific pre-trained language models for the three domains. As a result, we confirmed that the higher the domain specificity index, the higher the performance improvement through further pre-training.
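
The abstract states only that the specificity indicators are based on the normalized frequency of domain keywords; it does not give a formula. The sketch below is therefore a hypothetical stand-in that shows one way such an index could be computed: the share of the top keywords' normalized frequency in a domain corpus that is not matched by a general corpus. The function names, the `top_k` parameter, and the formula itself are assumptions, not the indicators defined in the paper.

```python
from collections import Counter

def normalized_freq(tokens):
    """Relative frequency of each token in a corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def specificity_index(domain_tokens, general_tokens, top_k=3):
    """Hypothetical index in [0, 1]; higher means the domain's keywords are rarer in general text."""
    dom = normalized_freq(domain_tokens)
    gen = normalized_freq(general_tokens)
    # Top-k domain keywords by normalized frequency, ties broken alphabetically.
    keywords = sorted(dom, key=lambda w: (-dom[w], w))[:top_k]
    if not keywords:
        return 0.0
    # Frequency mass of each keyword that the general corpus does not account for.
    gaps = [max(dom[w] - gen.get(w, 0.0), 0.0) for w in keywords]
    return sum(gaps) / sum(dom[w] for w in keywords)

if __name__ == "__main__":
    medical = ["stent", "stent", "patient", "dose"]
    general = ["patient", "people", "time", "day"]
    print(specificity_index(medical, general))
```

Under this toy definition, a corpus whose frequent terms barely occur in general text scores close to 1, which matches the abstract's qualitative claim that more specific domains benefit more from further pre-training.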

Content-based Recommendation Based on Social Network for Personalized News Services (개인화된 뉴스 서비스를 위한 소셜 네트워크 기반의 콘텐츠 추천기법)

  • Hong, Myung-Duk;Oh, Kyeong-Jin;Ga, Myung-Hyun;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.57-71 / 2013
  • Over a billion people in the world generate news minute by minute. People can forecast some news, but most news comes from unexpected events such as natural disasters, accidents, and crimes. People spend much time watching the huge amount of news delivered by many media because they want to understand what is happening now, to predict what might happen in the near future, and to share and discuss the news. People make better daily decisions by obtaining useful information from the news they see. However, it is difficult for people to choose news that suits them and to obtain useful information from it, because there are so many news media, such as portal sites and broadcasters, and most news articles consist of gossip and breaking news. User interest changes over time, and many people have no interest in outdated news. A personalized news service therefore needs to reflect users' recent interests, which means that it should manage user profiles dynamically. In this paper, a content-based news recommendation system is proposed to provide a personalized news service. A personalized service requires the user's personal information, and a social network service is used to extract it. The proposed system constructs a dynamic user profile based on recent user information from Facebook, one such social network service. User information contains personal information, recent articles, and Facebook Page information. Facebook Pages are used by businesses, organizations, and brands to share their content and connect with people. Facebook users can add a Facebook Page to indicate their interest in it. The proposed system uses this Page information to create a user profile and to match user preferences to news topics. 
However, some Pages cannot be matched directly to a news topic because a Page deals with an individual object and does not provide topic information suitable for news. Freebase, a large collaborative database of well-known people, places, and things, is used to match Pages to news topics through the hierarchy information of its objects. By using the recent Page information and articles of Facebook users, the proposed system can maintain a dynamic user profile. The generated user profile is used to measure user preferences for news. To generate a news profile, the news category predefined by the news media is used, and keywords of news articles are extracted after analysis of the news content, including title, category, and scripts. The TF-IDF technique, which reflects how important a word is to a document in a corpus, is used to identify the keywords of each news article. The same format is used for the user profile and the news profile so that the similarity between user preferences and news can be measured efficiently. The proposed system calculates all similarity values between user profiles and news profiles. Existing methods of similarity calculation in the vector space model do not cover synonyms, hypernyms, and hyponyms because they handle only the given words. The proposed system applies WordNet to the similarity calculation to overcome this limitation. The top-N news articles with the highest similarity values for a target user are recommended to that user. To evaluate the proposed news recommendation system, user profiles are generated from Facebook accounts with the participants' consent, and we implement a Web crawler to extract news information from PBS, a non-profit public broadcasting television network in the United States, and construct news profiles. We compare the performance of the proposed method with that of two benchmark algorithms. One is a traditional method based on TF-IDF. The other is the 6Sub-Vectors method, which divides the points to get keywords into six parts. 
Experimental results demonstrate that the proposed system provides useful news to users by applying the user's social network information and WordNet functions, in terms of the prediction error of recommended news.
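
The TF-IDF baseline mentioned in this abstract (same-format user and news profiles, similarity in a vector space, top-N recommendation) can be sketched roughly as follows. This is a minimal illustration, not the proposed system: it omits the Facebook, Freebase, and WordNet steps, and the profile data, function names, and the choice of cosine similarity are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: {doc_id: [token, ...]} with non-empty token lists -> {doc_id: {token: weight}}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[doc_id] = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_n(user_id, vectors, n=2):
    """Rank every other profile by similarity to the user profile."""
    user = vectors[user_id]
    scores = [(cosine(user, vec), doc_id)
              for doc_id, vec in vectors.items() if doc_id != user_id]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:n]]

if __name__ == "__main__":
    # Hypothetical profiles; the user profile shares the news-profile format.
    profiles = {
        "user": ["election", "vote", "senate"],
        "n1": ["election", "vote", "result"],
        "n2": ["storm", "flood", "rescue"],
        "n3": ["senate", "budget", "tax"],
    }
    print(top_n("user", tfidf_vectors(profiles)))
```

Because this baseline matches only identical tokens, a news item about a "ballot" would score zero against a profile containing "vote", which is the synonym limitation the abstract says WordNet is used to address.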