• Title/Summary/Keyword: Small-scale mining

Improving methods for normalizing biomedical text entities with concepts from an ontology with (almost) no training data at BLAH5 the CONTES

  • Ferre, Arnaud;Ba, Mouhamadou;Bossy, Robert
    • Genomics & Informatics / v.17 no.2 / pp.20.1-20.5 / 2019
  • Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.

A study on the CRM strategy for medium and small industry of distribution (중소유통업체의 CRM 도입방안에 관한 연구)

  • Kim, Gi-Pyoung
    • Journal of Distribution Science / v.8 no.3 / pp.37-47 / 2010
  • CRM refers to operating activities that maintain and promote good relationships with customers in order to ultimately maximize the company's profits: understanding the value of customers to meet their demands, establishing a strategy that maximizes lifetime value, and operating the business successfully by integrating the customer management processes. In Korea, many large businesses are actively introducing CRM for use in their marketing strategy; however, most small and medium-sized companies do not understand CRM clearly or find it difficult to introduce because of the large investment required. This study presents a CRM promotion strategy and activity plan suited to small and medium-sized companies by surveying precedents and analyzing the success factors of leading companies that have already implemented CRM, so that distributors in the industry can build close relationships with consumers, overcome their weakness in scale, and strengthen their competitiveness in a rapidly changing and fiercely competitive market. Building CRM involves five stages: recognizing the need to establish CRM, establishing an integrated CRM database, establishing customer analysis and a marketing strategy through data mining, putting the customer analysis obtained through data mining to practical use, and implementing response analysis and a closed-loop process. The case study of leading companies shows that CRM is needed in types of business where companies are in constant contact with their customers. To meet customer needs, these companies actively analyze their customer information and use it to develop their own CRM programs, personalized for their customers, to provide high-quality services and products. For customers who help them make a profit, a VIP marketing strategy is used to keep those customers from ending their relationship with the company. CRM should be carried out through continuous management.
In other words, customer profitability should be maximized through customer segmentation; maximizing customer profitability is the key to CRM. The success factors of CRM among distributors in Korea are as follows. First, top management must have the will to pursue CS management. Second, a company-wide culture of respect for customers should be built. Third, specialized customer management and CRM staff should be trained. Fourth, CRM behaviors should be developed for all staff members. Fifth, CRM should be carried out through systematic cooperation between related departments. To make use of the case study, a company should understand its customers and establish customer management programs in order to set the optimal CRM strategy and pursue it continuously according to a long-term plan. To this end, customers should be segmented on the basis of the collected information and customer data, and a responsive customer system should be designed with a strategy differentiated by customer class. For future CRM, integrated CRM, in which customer information is gathered in one place, is essential. As customers' expectations rise considerably, effective ways to meet those expectations should be pursued. With the rapid improvement of information technology, RFID (Radio Frequency Identification) has appeared, allowing large amounts of information about products and customers to be obtained in real time. A strategy for successful CRM promotion should improve the organizations in charge of customer contact, re-plan the customer management processes, and establish a system integrated with the marketing strategy, so as to maintain good relationships with customers according to a long-term plan and a method suited to market conditions, and it should be run as a company-wide program.
In addition, a CRM program should be continuously improved and supplemented to fit the company's characteristics. In particular, a strategy for successful CRM for small and medium-sized distributors should be as follows. First, they should change their existing perception of CRM and care deeply for their customers. Second, they should benchmark the CRM techniques of leading companies and identify success points to adopt. Third, they should seek the methods best suited to their particular conditions by developing ideas that combine their own strengths with marketing. Fourth, they should develop a CRM model that promotes relationships with individual customers through small but noticeable events, following the precedent of small businesses in Switzerland.

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.227-240 / 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies that in turn serve as a basis for further development. As the importance of patents increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former lacks the ability to analyze technological information in detail, while the latter is unable to identify the relationships between such technologies. To overcome the limitations of network-based and keyword-based analyses, this study blends the two methods and proposes a keyword-network-based analysis methodology. In this study, we collected significant technology information from each patent related to Light Emitting Diodes (LEDs) through text mining, built a keyword network, and then executed a community network analysis on the collected data. The results of the analysis are as follows. First, the patent keyword network showed very low density and an exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, the density varies depending on the size of a network; increasing the size of a network generally leads to a decrease in the density. The clustering coefficient is a network-level measure that illustrates the tendency of nodes to cluster in densely interconnected modules. This measure shows the small-world property, in which a network can be highly clustered even though it has a small average distance between nodes in spite of the large number of nodes.
Therefore, the low density of the patent keyword network means that its nodes are connected sporadically, and the high clustering coefficient shows that nodes in the network are closely connected to one another. Second, the cumulative degree distribution of the patent keyword network, like that of other knowledge networks such as citation or collaboration networks, followed a clear power-law distribution. A well-known mechanism behind this pattern is preferential attachment, whereby a node with more links is likely to attain further new links as the network evolves. Unlike normal distributions, the power-law distribution does not have a representative scale. This means that one cannot pick a representative or average value because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of a heavy-tailed scale-free distribution represents the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of a power-law distribution suggests that the preferential attachment mechanism is the origin of heavy-tailed distributions in a wide range of growing patent keyword networks. Third, we found that among keywords that flowed into a particular field, the vast majority of keywords with new links join existing keywords in the associated community in forming the concept of a new patent. This finding held for both the short-term (4-year) and long-term (10-year) analyses.
Furthermore, using the keyword combination information that was derived from the methodology suggested by our study enables one to forecast which concepts combine to form a new patent dimension and refer to those concepts when developing a new patent.
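
The two network measures described in this abstract can be illustrated with a minimal Python sketch. It assumes an undirected simple graph given as a node list and a list of distinct edge pairs; the keyword names and the toy network are hypothetical and are not taken from the study's LED patent data.

```python
from itertools import combinations


def density(nodes, edges):
    """Number of ties divided by the number of all possible ties (undirected graph)."""
    n = len(nodes)
    possible = n * (n - 1) // 2
    return len(edges) / possible if possible else 0.0


def average_clustering(nodes, edges):
    """Mean local clustering coefficient; nodes with fewer than two neighbors count as 0."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total = 0.0
    for v in nodes:
        k = len(neighbors[v])
        if k < 2:
            continue
        # Count ties that actually exist among the neighbors of v.
        links = sum(1 for a, b in combinations(neighbors[v], 2) if b in neighbors[a])
        total += links / (k * (k - 1) / 2)
    return total / len(nodes) if nodes else 0.0


# Hypothetical toy keyword network: two triangles joined by a single tie.
nodes = ["led", "chip", "phosphor", "lens", "driver", "heat_sink"]
edges = [
    ("led", "chip"), ("chip", "phosphor"), ("led", "phosphor"),
    ("lens", "driver"), ("driver", "heat_sink"), ("lens", "heat_sink"),
    ("phosphor", "lens"),
]

print(density(nodes, edges))
print(average_clustering(nodes, edges))
```

In this toy network fewer than half of the possible ties are present, yet most nodes sit inside a fully connected triangle, so the clustering coefficient is well above the density. That is the qualitative pattern the abstract reports, although a real patent keyword network would be far larger and far sparser.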

Discovery of Market Convergence Opportunity Combining Text Mining and Social Network Analysis: Evidence from Large-Scale Product Databases (B2B 전자상거래 정보를 활용한 시장 융합 기회 발굴 방법론)

  • Kim, Ji-Eun;Hyun, Yoonjin;Choi, Yun-Jeong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.87-107 / 2016
  • Understanding market convergence has become essential for small and mid-size enterprises. Identifying convergence items among heterogeneous markets could lead to product innovation and successful market introduction. Previous research has two limitations. First, traditional research focusing on patent databases is suitable for detecting technology convergence but has failed to recognize market demands. Second, most research concentrates on identifying the relationships between existing products or technologies. This study presents a platform to identify market convergence opportunities by using product databases from a global B2B marketplace. We also attempt to identify convergence opportunities in different industries by applying structural hole theory. This paper shows the mechanisms for market convergence: extraction of product and service attributes using text mining, association analysis among attributes, and network analysis based on structural holes. To discover market demand, we analyzed 240,002 e-catalogs from January 2013 to July 2016.

A Study on the Revitalization Strategy for Inter-Korean Railway by Building the Railway Logistics Depot - Focused on the Donghae Line - (철도 물류기지 구축을 통한 남북철도 활성화 방안 연구 - 동해선을 중심으로 -)

  • Kim, Young-Min;Cho, Chi-Hyun
    • Journal of Distribution Science / v.8 no.2 / pp.5-12 / 2010
  • The share of railway transportation in Korea stays at about 6% per year. However, continued investment and logistics rationalization are expected to change this for railway logistics. Research on railway logistics, and on the inter-Korean railway that could contribute greatly to its development, is far from sufficient. The purpose of this paper is to study a revitalization strategy for the inter-Korean railway by forecasting the demand for and the scale of a railway logistics depot. The revitalization strategies for the inter-Korean railway through a railway logistics depot are as follows. First, it is necessary to strengthen partnerships with coal users at the logistics depot. Second, financial assistance should be provided for the maintenance of North Korea's decrepit track as well as for the construction of the northern Donghae line from Gangneung to Jejin. Third, railway rates for long- and short-distance transportation and for large shippers need to be applied flexibly. Fourth, it is necessary to obtain railway traffic rights by participating in foreign mining development. Fifth, small shippers such as cement companies should be continually sought out.

Network Anomaly Traffic Detection Using WGAN-CNN-BiLSTM in Big Data Cloud-Edge Collaborative Computing Environment

  • Yue Wang
    • Journal of Information Processing Systems / v.20 no.3 / pp.375-390 / 2024
  • Edge computing architecture has effectively alleviated the computing pressure on cloud platforms, reduced network bandwidth consumption, and improved the quality of service for user experience; however, it has also introduced new security issues. Existing anomaly detection methods in big data scenarios with cloud-edge computing collaboration face several challenges, such as sample imbalance, difficulty in dealing with complex network traffic attacks, and difficulty in effectively training large-scale data or overly complex deep-learning network models. A lightweight deep-learning model was proposed to address these challenges. First, normalization on the user side was used to preprocess the traffic data. On the edge side, a trained Wasserstein generative adversarial network (WGAN) was used to supplement the data samples, which effectively alleviates the imbalance issue of a few types of samples while occupying a small amount of edge-computing resources. Finally, a trained lightweight deep learning network model is deployed on the edge side, and the preprocessed and expanded local data are used to fine-tune the trained model. This ensures that the data of each edge node are more consistent with the local characteristics, effectively improving the system's detection ability. In the designed lightweight deep learning network model, two sets of convolutional pooling layers of convolutional neural networks (CNN) were used to extract spatial features. The bidirectional long short-term memory network (BiLSTM) was used to collect time sequence features, and the weight of traffic features was adjusted through the attention mechanism, improving the model's ability to identify abnormal traffic features. The proposed model was experimentally demonstrated using the NSL-KDD, UNSW-NB15, and CIC-IDS2018 datasets.
The accuracies of the proposed model on the three datasets were as high as 0.974, 0.925, and 0.953, respectively, showing superior accuracy to other comparative models. The proposed lightweight deep learning network model has good application prospects for anomaly traffic detection in cloud-edge collaborative computing architectures.

Treatment of Contaminated Sediment for Water Quality Improvement of Small-scale Reservoir (소하천형 호수의 수질개선을 위한 퇴적저니 처리방안 연구)

  • 배우근;이창수;정진욱;최동호
    • Journal of Soil and Groundwater Environment / v.7 no.4 / pp.31-39 / 2002
  • Pollutants from industry, mining, agriculture, and other sources have contaminated sediments in many surface water bodies. Sediment contamination poses a severe threat to human health and the environment because many toxic contaminants that are barely detectable in the water column can accumulate in sediments at much higher levels. The purpose of this study was to develop an optimal treatment and disposal plan for sediment, in order to improve water quality in a small-scale reservoir, based on an evaluation of the degree of contamination. The degree of contamination was investigated for 23 samples from 9 sites at different sediment depths in the small-scale J river. The analysis of the contaminated sediments showed that the copper concentrations of 4 samples were higher than the hazardous waste regulation (3 mg/L) and that those of all samples exceeded the soil pollution warning level for agricultural areas. The lead and mercury concentrations of all samples were below both regulations. The need for sediment dredging was evaluated for organic matter and nutrients against the standard levels of Paldang Lake and the lower Han River in Korea and Tokyo Bay and Yokohama Bay in Japan. The degree of contamination for organic matter and nutrients was not serious. Compared with the heavy-metal standard levels of Japan, America, and Canada, the contaminated sediment was judged to be at the lowest effect level or the limit of tolerance level, because the American and Canadian standard levels were established on the basis of the worst effects on benthic organisms. The optimal treatment method for sediment containing heavy metals was cement-based solidification/stabilization to prevent heavy metal leaching.

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • The k nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large scale data sets. Thus, MapReduce, a (key, value)-based distributed framework, is increasingly used with Locality Sensitive Hashing, which is efficient for high-dimensional and sparse data. Based on the two-stage strategy, we use the locality sensitive hashing technique to divide users into small subsets, and then calculate the similarity between pairs in the small subsets using a brute force method on MapReduce. In particular, the candidate group generation stage is important because the brute-force calculation is performed in the following step. However, existing methods do not prevent large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction by regrouping candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
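
The two-stage strategy this abstract builds on (hash users into small candidate groups, then compare pairs by brute force only within each group) can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it omits MapReduce and the proposed regrouping step, and it assumes MinHash signatures over integer item IDs with Jaccard similarity. All function names and parameters are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(items, seeds):
    """One min-hash value per seed; similar sets tend to share values (items must be non-empty)."""
    return tuple(min(hash((seed, x)) for x in items) for seed in seeds)


def lsh_groups(user_items, num_hashes=2, num_tables=4, seed=0):
    """Stage 1: users whose signatures collide in any table form a candidate group."""
    rng = random.Random(seed)
    groups = defaultdict(set)
    for table in range(num_tables):
        seeds = [rng.getrandbits(32) for _ in range(num_hashes)]
        for user, items in user_items.items():
            groups[(table, minhash_signature(items, seeds))].add(user)
    return [g for g in groups.values() if len(g) > 1]


def knn_graph(user_items, k=2, **lsh_kwargs):
    """Stage 2: brute-force similarity inside each candidate group, keep the top k per user."""
    best = defaultdict(dict)
    for group in lsh_groups(user_items, **lsh_kwargs):
        for u, v in combinations(sorted(group), 2):
            sim = jaccard(user_items[u], user_items[v])
            best[u][v] = sim
            best[v][u] = sim
    return {
        u: sorted(best.get(u, {}).items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        for u in user_items
    }


# Hypothetical users described by the integer IDs of the items they interacted with.
users = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {10, 11, 12}, "d": {1, 2, 4}}
graph = knn_graph(users, k=1)
```

Because the brute-force step is quadratic in the size of each group, one oversized candidate group can dominate the running time, which is the problem the abstract says existing methods do not prevent.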

Effects of Geological Conditions on the Geomorphological Development of the Southwestern Coastal Regions of Korea (서남해안지역(西南海岸地域)의 지형발달(地形發達)에 미친 지질조건(地質條件))

  • Kim, Suh Woon
    • Economic and Environmental Geology / v.4 no.1 / pp.11-18 / 1971
  • The geotectonics and geomorphic structure of Korea resulted from the Song-rim Disturbance and the Daebo orogenic movements. Afterward this mountainous peninsula underwent several geological changes on a small scale, and it was also claimed that the steady rising of the elevated peneplain of the eastern coast and the submerging of the southwestern coastal area are largely due to tilted block movement. These views have been generally accepted as sound in several ways, but they are limited in range or lacking in theoretical integration. The present writer investigated the geology of Mt. Chi-ri-san and of the Honam coal mining area for a geological map in 1965. The results of these studies convinced the present writer that the conventional views, which were based upon a theory of lateral pressure, should be reconsidered in many respects, and more recent studies made it clear that the morphological development of the southwestern area can be better explained by orogenic movement and rock control. The measurement of the submerging speed of the western coastal area (Pak. Y. A., 1969) and a new account of the geology and tectonics of the Mid-central region of South Korea (Kim O.J., 1970) encourage a new explanation. The present writer's research on the extreme southwestern portion of the peninsula shows that the steady submerging of this area cannot be attributed to a simple downthrown block phenomenon caused by block movement. It is no more than the result of differential uplift in the eastern and western coastal areas and the rise in sea level in the post-glacial period. This phenomenon can be easily explained by comparing the rate of sea-level rise and the amount of heat flow in Korea with those in other areas of the world. The existence of erosional planes in the Sobaik-San ranges also provides evidence of an upheaval in the western coastal area.
Although the Sobaik-San ranges largely follow the direction of the Sinian system, they consist of numerous branches whose trends, because of disharmonic folding, run more or less differently from the main trend and converge at Mt. Sobaik-San and Chupungryung. The undulation of the land is not wholly caused by orogenic movements; rather, the present writer confirmed that the diversity of morphological development is the direct reflection of geological conditions such as the rocks and processes that constitute the basic elements of geomorphic structure. An east-west mountain range, which could be named the Hansan mountain range, was claimed to be oriented by joint control. Geological conditions such as the special erosion and weathering of agglomerate and breccia tuff usually produce pothole-like submarine features, which cause the whirling phenomenon in the channels of the southwestern coast.

Simulation Study on E-commerce Recommender System by Use of LSI Method (LSI 기법을 이용한 전자상거래 추천자 시스템의 시뮬레이션 분석)

  • Kwon, Chi-Myung
    • Journal of the Korea Society for Simulation / v.15 no.3 / pp.23-30 / 2006
  • A recommender system for an e-commerce site receives information from customers about which products they are interested in, and recommends products that are likely to fit their needs. In this paper, we investigate several methods for large-scale product purchase data for the purpose of producing useful recommendations for customers. We apply the traditional data mining techniques of cluster analysis and collaborative filtering (CF), and CF with reduction of product dimensionality by use of latent semantic indexing (LSI). If the reduced product dimensionality obtained from LSI shows a latent customer purchasing trend similar to that in the original customer-product purchase data, we expect that the reduced computational effort of finding the nearest neighbors of a target customer may improve the efficiency of recommendation. In simulation experiments on synthetic customer-product purchase data, the CF-based method with reduction of product dimensionality performs better than the traditional CF methods with respect to recall, precision, and the F1 measure. In general, the recommendation quality increases as the size of the neighborhood increases. However, our simulation results show that, after a certain point, the improvement gain diminishes. We also find that, as the number of recommended products increases, the precision becomes worse, while the improvement gain in recall is relatively small after a certain point. We consider that this information may be useful in applying recommender systems.
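
The evaluation measures named in this abstract can be illustrated with a short Python sketch. It assumes the usual set-based definitions of precision, recall, and F1 for a top-N recommendation list; the product IDs are hypothetical and the numbers are not results from the paper.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F1 for one customer's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ranked recommendations and the products the customer actually bought.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
bought = {"p1", "p2", "p5"}

for n in (2, 4, 6):
    p, r, f = precision_recall_f1(ranked[:n], bought)
    print(f"top-{n}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case, lengthening the list from 2 to 6 products halves the precision while recall rises by only one third, which mirrors the trade-off the abstract describes for an increasing number of recommended products.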
