• Title/Summary/Keyword: Two-phase search method


Hierarchical Overlapping Clustering to Detect Complex Concepts (A Method for Detecting Complex Concepts by Hierarchical Clustering That Allows Overlap)

  • Hong, Su-Jeong; Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.111-125 / 2011
  • Clustering is the process of grouping similar or related documents into a cluster and assigning a meaningful concept to that cluster. Clustering therefore enables fast and accurate search by restricting the search to the documents that belong to related clusters. Effective clustering requires techniques for identifying similar documents and grouping them into a cluster, and for discovering the concept most relevant to each cluster. One problem that often arises in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods could neither identify and represent a complex concept that belongs to several different clusters at the same level of the concept hierarchy, nor validate the semantic hierarchical relationship between a complex concept and each of its simple concepts. To solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapping clusters at the same level of the concept hierarchy. To detect complex concepts, the HOC algorithm represents the clustering result as a lattice rather than a tree. We developed a system that employs the HOC algorithm for complex concept detection. The system operates in three phases: 1) preprocessing of the documents, 2) clustering with the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice produced by clustering. The preprocessing phase represents each document as an x-y coordinate in a two-dimensional space, based on the weights of the terms that appear in it.
First, the documents are refined by stop-word removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight, and the x-y coordinate of each document is determined by combining the TF-IDF values of its terms. The clustering phase uses the HOC algorithm, in which the similarity between documents is computed with the Euclidean distance. Initially, a cluster is generated for each document by grouping the documents closest to it. Then, the distance between every pair of clusters is measured, and the closest clusters are merged into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, feature selection is applied to check whether the cluster concepts built by the HOC algorithm form meaningful hierarchical relationships. Feature selection extracts key features from a document by identifying its important, representative terms and assigning weights to them. To select key features correctly, a method is needed to determine how much each term contributes to the class of the document. Among several such methods, this paper adopted the $\chi^2$ statistic, which measures the degree of dependency between a term $t$ and a class $c$ and expresses that relationship as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out on the well-known Reuters-21578 news collection. The evaluation showed that the HOC algorithm contributes substantially to detecting and producing complex concepts by generating the concept hierarchy as a lattice.
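The TF-IDF weighting step of the preprocessing phase can be sketched as below. This is a minimal illustration, assuming the documents are already tokenized, stop-word-filtered and stemmed, and assuming the common variant tf × log(N / df); the abstract does not state which TF-IDF variant is used, nor how the weights are combined into x-y coordinates, so that mapping is not attempted here. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of documents, each a list of index terms (assumed to be
    already stop-word-filtered and stemmed). Weight = raw term frequency
    times log(N / df), one common TF-IDF variant (an assumption, not
    necessarily the paper's formula).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus of three preprocessed documents.
docs = [["cluster", "concept", "cluster"],
        ["concept", "lattice"],
        ["tree", "cluster"]]
w = tf_idf(docs)
```

A term that occurs in every document receives weight 0, since log(N / N) = 0; rarer terms are weighted more heavily.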
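For reference, the traditional Agglomerative Hierarchical Clustering that HOC starts from can be sketched on 2-D document coordinates with Euclidean distance. This sketch assumes single linkage (cluster distance = distance between the closest pair of members), which the abstract does not specify, and it builds an ordinary merge tree: it does not implement HOC's overlapping clusters or its lattice output. The function name `agglomerative` and the sample points are hypothetical.

```python
import math
from itertools import combinations


def agglomerative(points):
    """Traditional agglomerative hierarchical clustering (single linkage).

    points: list of (x, y) document coordinates.
    Returns the merges as (cluster_a, cluster_b, distance) tuples, where
    clusters are frozensets of point indices. The last merge yields the
    root cluster. Overlap (as in HOC) is NOT allowed here.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def dist(a, b):
        # Single linkage: Euclidean distance of the closest member pair.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find and merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical coordinates: two tight pairs far apart.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
merges = agglomerative(points)
```

Each merge removes two clusters and adds their union, so every document ends up in exactly one branch of the tree. HOC, as described above, relaxes exactly this constraint so that a cluster can join several parents at the same level.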
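The $\chi^2$ statistic used in the validation phase is commonly computed from a 2×2 contingency table of document counts for a term $t$ and a class $c$. The sketch below uses that standard form, $\chi^2(t,c) = N(n_{11}n_{00} - n_{10}n_{01})^2 / ((n_{11}+n_{01})(n_{10}+n_{00})(n_{11}+n_{10})(n_{01}+n_{00}))$; the abstract does not give the paper's exact formula, so treat this as an assumption. The function name and counts are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square dependency between a term t and a class c.

    n11: documents in class c that contain t
    n10: documents not in c that contain t
    n01: documents in class c that do not contain t
    n00: documents not in c that do not contain t
    Returns 0.0 when a margin is empty (statistic undefined).
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


# Hypothetical counts: t is concentrated in class c.
strong = chi_square(40, 10, 10, 40)
# Hypothetical counts: t is spread evenly, i.e. independent of c.
none = chi_square(25, 25, 25, 25)
```

Larger values indicate stronger dependency between the term and the class; a term distributed independently of the class scores 0.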

Expression of EGFR in Non-small Cell Lung Cancer and Its Effects on Survival (EGFR Expression Rate in Non-small Cell Lung Cancer and Its Effect on Survival Rate)

  • Kim, Hak-Ryul; Jeong, Eun-Taik
    • Tuberculosis and Respiratory Diseases / v.44 no.6 / pp.1285-1295 / 1997
  • Background: EGFR acts at one of the initial steps of the signal transduction pathway in multistep carcinogenesis. It is homologous to the oncogene erbB-2 and is the receptor for EGF and TGF-alpha. EGFR plays an important role in the growth and differentiation of tumor cells. EGFR expression in non-small cell lung cancer was therefore examined for evidence of its value as a clinical prognostic factor. Methods: To investigate the role of EGFR in lung cancer, the author performed immunohistochemical staining for EGFR on 57 resected primary non-small cell lung cancer specimens and analyzed the correlation of EGFR expression with clinical parameters, S- and $G_1$-phase fractions, and survival. Results: 1) EGFR was detected in 56% of the 57 patients (by histologic type: squamous cell carcinoma 50%, adenocarcinoma 63%, large cell carcinoma 75%; by TNM stage: stage I 64%, stage II 38%, stage III 55%; by cellular differentiation: well 50%, moderately 52%, poorly 65%). None of these differences was statistically significant. 2) On flow cytometric analysis, the mean S-phase fractions of the EGFR (+) and (-) groups were $22.3 \pm 10.5\%$ and $18.0 \pm 10.9\%$ (p > 0.05), and the mean $G_1$-phase fractions were $68.4 \pm 11.6\%$ and $71.1 \pm 12.8\%$ (p > 0.05), respectively. 3) The two-year survival rates of the EGFR (+) and (-) groups were 53% and 84%, and the median survival times were 26 and 53 months, respectively (p < 0.05; Kaplan-Meier method, generalized Wilcoxon test). Conclusion: EGFR immunostaining may be a simple and useful method for predicting survival in non-small cell lung cancer.
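The survival figures above come from Kaplan-Meier estimates. As a rough illustration of how such a curve is computed, here is a minimal product-limit estimator. The function name and the follow-up data are hypothetical (not the study's patients), and the generalized Wilcoxon comparison between groups is not implemented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times: follow-up time for each patient (e.g. months)
    events: 1 if death was observed at that time, 0 if censored
    Returns (time, S(time)) at each distinct time with at least one death.
    Patients censored at a death time are counted as still at risk then.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical group of six patients (months, death observed?).
curve = kaplan_meier([6, 12, 12, 20, 30, 36], [1, 1, 0, 1, 0, 0])
```

A two-year survival rate corresponds to reading S(24) off such a curve (here the value after the last death at or before 24 months), and the median survival time is the first time at which S drops to 0.5 or below.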


Au-Ag-Te Mineralization by Boiling and Dilution of Meteoric Groundwater in the Tongyeong Epithermal Gold System, Korea: Implications from Reaction Path Modeling (Reaction Path Modeling of Au-Ag-Te Mineralization by Boiling and Dilution of Mineralizing Fluids in the Tongyeong Epithermal System)

  • Maeng-Eon Park; Kyu-Youl Sung
    • Economic and Environmental Geology / v.34 no.6 / pp.507-522 / 2001
  • At the Tongyeong mine, quartz, rhodochrosite (kutnahorite), muscovite, illite, pyrite, galena, chalcopyrite, sphalerite, acanthite, and hessite are the principal vein minerals. They were deposited under epithermal conditions in two stages. The ore mineral assemblages and associated gangue phases in stage II can be clearly divided into two general associations: an early cycle (band) marked by the introduction of most of the sulfides and electrum, and a later cycle in which base-metal and carbonate-bearing assemblages (mostly rhodochrosite) became dominant. Tellurides and some electrum occur as small rounded grains within subhedral to euhedral pyrite or anhedral galena in stage II. Sulfide mineralization is zoned from pyrite to galena and sphalerite. We used computer modeling to simulate the formation of four stages of vein genesis. Using the CHILLER program, we studied the reaction of a single fluid with andesite host rock at $280^{\circ}C$, the isobaric cooling of a single fluid from $260^{\circ}C$ to $120^{\circ}C$, and the boiling and mixing of a fluid with decreasing pressure and temperature. The calculations show that alteration minerals precipitate through fluid-andesite interaction as temperature drops. Speciation calculations confirm that hydrothermal fluids with moderately high salinities and a pH of 5.7 (acidic) were capable of transporting significant quantities of base metals. The abundance of gold in the fluid depends critically on the ratio of total base metals and iron to sulfide in the aqueous phase, because gold is transported as an $\mathrm{Au(HS)_2^-}$ complex, which is sensitive to sulfide activity. Modeling results for the Tongyeong mineralization show a strong influence of shallow hydrogenic processes such as boiling and fluid mixing. The variable banding in stage II mineralization is best explained by multiple boiling events in the hydrothermal fluid, followed by lateral mixing of the fluid with overlying diluted, steam-heated groundwater.
The similarity between the calculated mineral assemblages and the observed electrum compositions and field relationships demonstrates the utility of numerical simulation for identifying the chemical processes that accompany boiling and mixing in Te-bearing Au-Ag systems. This approach has been applied in models to narrow the search area for epithermal ores.
