• Title/Summary/Keyword: One-Dimensional Search


Dipole-Dipole Array Geoelectric Survey for Fracture Zone Detection (전기비저항 탐사법을 이용한 지하 천부 파쇄대 조사)

  • Kim, Geon Yeong; Lee, Jeong Mo; Jang, Tae U
    • Journal of the Korean Geophysical Society / v.2 no.3 / pp.217-224 / 1999
  • Although faults can be found by geological surveys, the surface traces of faults are not easily discovered by traditional geological surveys because of alluvial cover. In and around faults and fracture zones, the electrical resistivity tends to be lower than that of the surroundings due to the content of groundwater and clay minerals. Therefore, electrical resistivity surveys are effective for detecting buried faults and fracture zones. Dipole-dipole array electrical resistivity surveys, which can reveal the two-dimensional subsurface electrical resistivity structure, were carried out in two areas: Yongdang-ri, Woongsang-eup, Yangsan-si, Kyungsangnam-do and Malbang-ri, Woedong-eup, Kyungju-si, Kyungsangpook-do. The former is next to the Dongrae Fault, and the latter, near the Ulsan Fault, is close to a region in which debatable Quaternary fault traces had recently been found. From each measured data set, an electrical resistivity cross-section was obtained using an inversion program whose reliability was verified against analytic solutions. One low-resistivity zone was found in the inverted cross-section from the Yongdang-ri survey data, and two low-resistivity zones were found in that from the Malbang-ri survey data. They were almost vertical and 15∼20 m wide. Considering the shape and the very low resistivity values (<100 Ωm) of those zones in the inverted sections, they were interpreted as fracture zones, although this should be confirmed by trenching. The reliability of the interpretation might be improved by adding more parallel resistivity survey lines and interpreting the results in three dimensions, and/or by adding other geophysical surveys.
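
As background on how such dipole-dipole readings are reduced to resistivity values before inversion, the sketch below applies the standard geometric factor for a dipole-dipole array; the electrode spacing, separation factor, and reading are hypothetical illustration values, not data from the paper.

```python
import numpy as np

def dipole_dipole_apparent_resistivity(a, n, delta_v, current):
    """Apparent resistivity (ohm-m) for a dipole-dipole array.

    a       : electrode spacing within each dipole (m)
    n       : dipole separation factor (dipole offset = n * a)
    delta_v : measured potential difference (V)
    current : injected current (A)
    """
    k = np.pi * a * n * (n + 1) * (n + 2)  # standard geometric factor
    return k * delta_v / current

# Hypothetical reading: 10 m dipoles at separation factor n = 3
rho_a = dipole_dipole_apparent_resistivity(a=10.0, n=3, delta_v=0.012, current=0.5)
print(f"apparent resistivity: {rho_a:.1f} ohm-m")
# Inverted values well below ~100 ohm-m, as in the paper, would flag a
# candidate fracture zone.
```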


Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok; Yang, Seok Woo; Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods used to handle big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require many computations, which can lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, ranging from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. On top of that, the expression and selection of text features affect the performance of the classifier for sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data from the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information from the data, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, we assume that words similar to them also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification: conducting selective word elimination under specific rules and constructing word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form the word embedding. Second, we additionally select words that are similar to the words with low information gain values and build the word embedding. Finally, the filtered text and word embeddings are fed into the deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Since Yelp only shows the number of helpful votes, we extracted 100,000 reviews that received more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text data, was applied to each dataset. To evaluate the proposed methods, we compared their performance against Word2Vec and GloVe word embeddings that used all the words. We showed that one of the proposed methods performs better than the embeddings using all the words: by removing unimportant words, we can obtain better performance. However, removing too many words lowered the performance.
For future research, diverse preprocessing methods and an in-depth analysis of word co-occurrence should be considered when measuring similarity values among words. Also, we applied the proposed method only with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, making it possible to identify promising combinations of word embedding methods and elimination methods.
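
A minimal sketch of the two-step elimination idea described above, assuming sklearn's mutual information as the information-gain measure and gensim's Word2Vec for the similarity search; the toy corpus, thresholds, and parameters are illustrative, not the paper's settings.

```python
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

# Toy corpus and binary labels (illustrative, not the paper's data)
docs = ["the plot was dull and slow",
        "a moving and brilliant film",
        "slow pacing but brilliant acting",
        "dull characters and a weak plot"]
labels = [0, 1, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(docs)
words = vec.get_feature_names_out()
ig = mutual_info_classif(X, labels, discrete_features=True)

# Step 1: eliminate words with comparatively low information gain
low_ig = {w for w, g in zip(words, ig) if g < 0.05}

# Step 2: also eliminate words whose Word2Vec vectors are close to those words
w2v = Word2Vec([d.split() for d in docs], vector_size=50, min_count=1, seed=1)
similar = {s for w in low_ig if w in w2v.wv
             for s, sim in w2v.wv.most_similar(w, topn=3) if sim > 0.5}

removed = low_ig | similar
filtered = [" ".join(t for t in d.split() if t not in removed) for d in docs]
print(removed)
print(filtered)  # this filtered text would then feed the CNN / BiLSTM models
```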

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong; Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.111-125 / 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to that cluster. By doing so, clustering facilitates fast and accurate search for relevant documents by narrowing the search range down to the collection of documents belonging to related clusters. For effective clustering, techniques are required for identifying similar documents and grouping them into a cluster, and for discovering the concept most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. In order to solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapping clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not as a tree but as a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm to carry out the goal of complex concept detection. This system operates in three phases: 1) preprocessing of the documents, 2) clustering using the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of the terms appearing in them. First, it applies stopword removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight, and the x-y coordinate value for each document is determined by combining the TF-IDF values of the terms in it. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated as the Euclidean distance. Initially, a cluster is generated for each document by grouping the documents closest to it. Then, the distance between any two clusters is measured, and the closest clusters are grouped into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, a feature selection method is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm, checking whether they have meaningful hierarchical relationships. Feature selection extracts key features from a document by identifying and assigning weight values to its important and representative terms. In order to select key features correctly, a method is needed to determine how much each term contributes to the class of the document. Among the several methods that achieve this goal, this paper adopted the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection.
The results of the performance evaluation showed that the HOC algorithm contributes greatly to detecting and producing complex concepts by generating the concept hierarchy as a lattice structure.
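
The $\chi^2$ term-class dependency used in the validation phase can be computed from a 2x2 contingency table of term occurrence versus class membership. Below is a minimal sketch in that standard form; the toy documents and classes are illustrative, not drawn from Reuters-21578.

```python
def chi_square(docs, classes, term, cls):
    """chi^2 dependency of `term` on class `cls` from a 2x2 contingency table.

    docs    : list of token sets, one per document
    classes : class label per document
    """
    A = sum(1 for d, c in zip(docs, classes) if term in d and c == cls)
    B = sum(1 for d, c in zip(docs, classes) if term in d and c != cls)
    C = sum(1 for d, c in zip(docs, classes) if term not in d and c == cls)
    D = sum(1 for d, c in zip(docs, classes) if term not in d and c != cls)
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

# Toy collection: terms that concentrate in one class score high
docs = [{"oil", "price"}, {"oil", "export"}, {"film", "review"}, {"film", "price"}]
classes = ["econ", "econ", "arts", "arts"]
print(chi_square(docs, classes, "oil", "econ"))    # 4.0 -> strong dependency
print(chi_square(docs, classes, "price", "econ"))  # 0.0 -> no dependency
```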

A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.79-92 / 2015
  • Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method for integrating the national R&D data and for helping users navigate the integrated data via a knowledge map service. The knowledge map service is built using a lightweight ontology and a topic modeling method. The national R&D data is integrated with the research project as its center; that is, the other R&D data, such as research papers, patents, and reports, are connected with the research project as its outputs. The lightweight ontology is used to represent the simple relationships between the integrated data, such as project-outputs relationships, document-author relationships, and document-topic relationships. The knowledge map enables us to infer further relationships, such as co-author and co-topic relationships. To extract the relationships between the integrated data, a Relational Data-to-Triples transformer is implemented. Also, a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service. Knowledge maps can be divided into two types: one is used in the area of knowledge management to store, manage, and process an organization's data as knowledge; the other is used for analyzing and representing knowledge extracted from science & technology documents. This research focuses on the latter. In this research, a knowledge map service is introduced for integrating the national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), which are the two major repositories and services of national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map. Using a lightweight ontology enables us to represent and process knowledge as a simple network, which fits the knowledge navigation and visualization characteristics of a knowledge map. The lightweight ontology is used to represent the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontologies, researchers are implicitly connected through the national R&D data by author relationships and performer relationships. A knowledge map for displaying the researchers' network is created from the co-authoring relationships in the national R&D documents and the co-participation relationships in the national R&D projects. To sum up, a knowledge map service system based on topic modeling and ontology is introduced for processing knowledge about the national R&D data, such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system has three goals: 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based information search on the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. The S&T information, such as research papers, research reports, patents, and GTB, is updated daily from NDSL, and the R&D project information, including participants and outputs, is updated from NTIS. The S&T information and the national R&D information are obtained and integrated into the integrated database.
The knowledge base is constructed by transforming the relational data into triples referencing the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and the topic keywords representing them. The topic modeling approach enables us to extract relationships and topic keywords based on semantics, not on simple keyword matching. Lastly, we present an experiment on the construction of the integrated knowledge base using the lightweight ontology and topic modeling, and we also introduce the knowledge map services created on top of the knowledge base.
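
A minimal sketch of the Relational Data-to-Triples idea using rdflib: rows linking projects to outputs and authors become RDF triples under a lightweight ontology. The namespace, property names, and rows below are hypothetical stand-ins, not the paper's R&D ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF

ND = Namespace("http://example.org/rnd/")  # hypothetical ontology namespace

rows = [  # (project_id, output_type, output_id, author) -- illustrative rows
    ("P001", "paper",  "DOC42", "Kim"),
    ("P001", "patent", "PAT07", "Lee"),
]

g = Graph()
for proj, kind, out, author in rows:
    project, output = ND[proj], ND[out]
    g.add((project, RDF.type, ND.Project))
    g.add((output, RDF.type, ND[kind.capitalize()]))
    g.add((project, ND.hasOutput, output))          # project-outputs relationship
    g.add((output, ND.hasAuthor, Literal(author)))  # document-author relationship

print(g.serialize(format="turtle"))  # triples ready for loading into a triple store
```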

The Impact of Market Environments on Optimal Channel Strategy Involving an Internet Channel: A Game Theoretic Approach (시장 환경이 인터넷 경로를 포함한 다중 경로 관리에 미치는 영향에 관한 연구: 게임 이론적 접근방법)

  • Yoo, Weon-Sang
    • Journal of Distribution Research / v.16 no.2 / pp.119-138 / 2011
  • Internet commerce has been growing at a rapid pace for the last decade. Many firms try to reach wider consumer markets by adding the Internet channel to their existing traditional channels. Despite the various benefits of the Internet channel, a significant number of firms have failed in managing this new type of channel. Previous studies could not clearly explain these conflicting results associated with the Internet channel. One of the major reasons is that most previous studies conducted their analyses under a specific market condition and presented the findings as the general impact of Internet channel introduction; their results are therefore strongly influenced by the specific market settings. However, firms face various market conditions in the real world. The purpose of this study is to investigate the impact of various market environments on a firm's optimal channel strategy by employing a flexible game theory model. We capture various market conditions with consumer density and the disutility of using the Internet.

    The channel structures analyzed in this study are as follows. Before the Internet channel is introduced, a monopoly manufacturer sells its products through an independent physical store. From this structure, the manufacturer could introduce its own Internet channel (MI). The independent physical store could also introduce its own Internet channel and coordinate it with the existing physical store (RI). Alternatively, an independent Internet retailer such as Amazon could enter the market (II); in this case, two types of independent retailers compete with each other. In this model, consumers are uniformly distributed over a two-dimensional space. Consumer heterogeneity is captured by a consumer's geographical location (${\chi}_i$) and his or her disutility of using the Internet channel (${\delta}_{N_i}$).
    Various market conditions are captured by these two consumer heterogeneities. Case (a) illustrates a market with symmetric consumer distributions; the model also explicitly captures asymmetric distributions of consumer disutility in a market. In a market like case (c), the average consumer disutility of using an Internet store is relatively smaller than that of using a physical store. For example, this case represents a market in which 1) the product is suitable for Internet transactions (e.g., books) or 2) the level of E-Commerce readiness is high, as in Denmark or Finland. On the other hand, in a market like case (b), the average consumer disutility of using an Internet store is relatively greater than that of using a physical store. Countries like Ukraine and Bulgaria, or the market for "experience goods" such as shoes, are examples of this market condition. The scenarios of consumer distributions analyzed in this study are as follows: the range of the disutility of using the Internet (${\delta}_{N_i}$) is held constant, while the range of the consumer distribution (${\chi}_i$) varies from -25 to 25, from -50 to 50, from -100 to 100, from -150 to 150, and from -200 to 200.
    The analysis results can be summarized as follows. As the average travel cost in a market decreases while the average disutility of Internet use remains the same, the average retail price, total quantity sold, physical store profit, monopoly manufacturer profit, and thus total channel profit all increase. On the other hand, the quantity sold through the Internet and the profit of the Internet store decrease with a decreasing average travel cost relative to the average disutility of Internet use. We find that a channel that has an advantage over the other kind of channel serves a larger portion of the market. In a market with a high average travel cost, in which the Internet store has a relative advantage over the physical store, for example, the Internet store becomes a mass retailer serving a larger portion of the market. This result implies that the Internet becomes a more significant distribution channel in markets characterized by greater geographical dispersion of buyers, or as consumers become more proficient in Internet usage. The results indicate that the degree of price discrimination also varies depending on the distribution of consumer disutility in a market. The manufacturer in a market in which the average travel cost is higher than the average disutility of using the Internet has a stronger incentive for price discrimination than the manufacturer in a market where the average travel cost is relatively lower. We also find that the manufacturer has a stronger incentive to maintain a high price level when the average travel cost in a market is relatively low. Additionally, the retail competition effect due to Internet channel introduction strengthens as the average travel cost in a market decreases. This result indicates that a manufacturer's channel power relative to that of the independent physical retailer becomes stronger with a decreasing average travel cost. This implication is counter-intuitive, because it is widely believed that the negative impact of Internet channel introduction on a competing physical retailer is more significant in a market like Russia, where consumers are geographically dispersed, than in a market like Hong Kong, which has a condensed geographic distribution of consumers.
    The equilibrium results illustrate how this happens. When managers consider the overall impact of the Internet channel, however, they should consider not only channel power but also sales volume. When both are considered, the introduction of the Internet channel is revealed to be more harmful to a physical retailer in Russia than to one in Hong Kong, because the decrease in sales volume for a physical store due to Internet channel competition is much greater in Russia than in Hong Kong. The results show that the manufacturer is always better off with any type of Internet store introduction. The independent physical store benefits from opening its own Internet store when the average travel cost is high relative to the disutility of using the Internet. Under the opposite market condition, however, the independent physical retailer could be worse off when it opens its own Internet outlet and coordinates both outlets (RI). This is because the low average travel cost significantly reduces the channel power of the independent physical retailer, further aggravating the already weak channel power caused by myopic inter-channel price coordination. The results imply that channel members and policy makers should explicitly consider the factors determining the relative distributions of both kinds of consumer disutility when they make a channel decision involving an Internet channel. These factors include the suitability of a product for Internet shopping, the level of E-Commerce readiness of a market, and the degree of geographic dispersion of consumers in a market. Despite its academic contributions and managerial implications, this study is limited in the following ways. First, a series of numerical analyses was conducted to derive equilibrium solutions because of the complex forms of the demand functions. In the process, we set V=100, ${\lambda}$=1, and ${\beta}$=0.01; future research may change this parameter set to check the generalizability of this study. Second, five different scenarios for market conditions were analyzed; future research could try different sets of parameter ranges. Finally, the model setting allows only one monopoly manufacturer in the market. Accommodating multiple competing manufacturers (brands) would generate more realistic results.
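
As a rough illustration of the demand side of such a model, the Monte Carlo sketch below lets consumers, heterogeneous in location and Internet disutility, pick the channel with the higher net utility. The linear utility forms and every parameter value here are assumptions for illustration only, not the paper's equilibrium model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, lam = 100.0, 1.0                    # reservation value, travel-cost rate
x = rng.uniform(-100, 100, 10_000)     # consumer locations (store at x = 0)
delta = rng.uniform(0, 80, 10_000)     # disutility of using the Internet

def channel_shares(p_store, p_net):
    """Fraction of consumers buying from the physical vs. Internet store."""
    u_store = V - p_store - lam * np.abs(x)  # net utility, physical store
    u_net = V - p_net - delta                # net utility, Internet store
    buy = np.maximum(u_store, u_net) > 0     # participate only if utility > 0
    store = buy & (u_store >= u_net)
    return store.mean(), (buy & ~store).mean()

print(channel_shares(p_store=40.0, p_net=45.0))
# Widening the location range (more geographically dispersed buyers, hence a
# higher average travel cost) shifts share toward the Internet channel,
# consistent with the travel-cost result summarized above.
```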
