• Title/Summary/Keyword: Web system

Search Result 7,854, Processing Time 0.041 seconds

A Study on the Meaning and Strategy of Keyword Advertising Marketing

  • Park, Nam Goo
    • Journal of Distribution Science
    • /
    • v.8 no.3
    • /
    • pp.49-56
    • /
    • 2010
  • At the initial stage of Internet advertising, banner advertising came into fashion. As the Internet developed into a central part of daily lives and the competition in the on-line advertising market was getting fierce, there was not enough space for banner advertising, which rushed to portal sites only. All these factors was responsible for an upsurge in advertising prices. Consequently, the high-cost and low-efficiency problems with banner advertising were raised, which led to an emergence of keyword advertising as a new type of Internet advertising to replace its predecessor. In the beginning of 2000s, when Internet advertising came to be activated, display advertisement including banner advertising dominated the Net. However, display advertising showed signs of gradual decline, and registered minus growth in the year 2009, whereas keyword advertising showed rapid growth and started to outdo display advertising as of the year 2005. Keyword advertising refers to the advertising technique that exposes relevant advertisements on the top of research sites when one searches for a keyword. Instead of exposing advertisements to unspecified individuals like banner advertising, keyword advertising, or targeted advertising technique, shows advertisements only when customers search for a desired keyword so that only highly prospective customers are given a chance to see them. In this context, it is also referred to as search advertising. It is regarded as more aggressive advertising with a high hit rate than previous advertising in that, instead of the seller discovering customers and running an advertisement for them like TV, radios or banner advertising, it exposes advertisements to visiting customers. Keyword advertising makes it possible for a company to seek publicity on line simply by making use of a single word and to achieve a maximum of efficiency at a minimum cost. The strong point of keyword advertising is that customers are allowed to directly contact the products in question through its more efficient advertising when compared to the advertisements of mass media such as TV and radio, etc. The weak point of keyword advertising is that a company should have its advertisement registered on each and every portal site and finds it hard to exercise substantial supervision over its advertisement, there being a possibility of its advertising expenses exceeding its profits. Keyword advertising severs as the most appropriate methods of advertising for the sales and publicity of small and medium enterprises which are in need of a maximum of advertising effect at a low advertising cost. At present, keyword advertising is divided into CPC advertising and CPM advertising. The former is known as the most efficient technique, which is also referred to as advertising based on the meter rate system; A company is supposed to pay for the number of clicks on a searched keyword which users have searched. This is representatively adopted by Overture, Google's Adwords, Naver's Clickchoice, and Daum's Clicks, etc. CPM advertising is dependent upon the flat rate payment system, making a company pay for its advertisement on the basis of the number of exposure, not on the basis of the number of clicks. This method fixes a price for advertisement on the basis of 1,000-time exposure, and is mainly adopted by Naver's Timechoice, Daum's Speciallink, and Nate's Speedup, etc, At present, the CPC method is most frequently adopted. The weak point of the CPC method is that advertising cost can rise through constant clicks from the same IP. If a company makes good use of strategies for maximizing the strong points of keyword advertising and complementing its weak points, it is highly likely to turn its visitors into prospective customers. Accordingly, an advertiser should make an analysis of customers' behavior and approach them in a variety of ways, trying hard to find out what they want. With this in mind, her or she has to put multiple keywords into use when running for ads. When he or she first runs an ad, he or she should first give priority to which keyword to select. The advertiser should consider how many individuals using a search engine will click the keyword in question and how much money he or she has to pay for the advertisement. As the popular keywords that the users of search engines are frequently using are expensive in terms of a unit cost per click, the advertisers without much money for advertising at the initial phrase should pay attention to detailed keywords suitable to their budget. Detailed keywords are also referred to as peripheral keywords or extension keywords, which can be called a combination of major keywords. Most keywords are in the form of texts. The biggest strong point of text-based advertising is that it looks like search results, causing little antipathy to it. But it fails to attract much attention because of the fact that most keyword advertising is in the form of texts. Image-embedded advertising is easy to notice due to images, but it is exposed on the lower part of a web page and regarded as an advertisement, which leads to a low click through rate. However, its strong point is that its prices are lower than those of text-based advertising. If a company owns a logo or a product that is easy enough for people to recognize, the company is well advised to make good use of image-embedded advertising so as to attract Internet users' attention. Advertisers should make an analysis of their logos and examine customers' responses based on the events of sites in question and the composition of products as a vehicle for monitoring their behavior in detail. Besides, keyword advertising allows them to analyze the advertising effects of exposed keywords through the analysis of logos. The logo analysis refers to a close analysis of the current situation of a site by making an analysis of information about visitors on the basis of the analysis of the number of visitors and page view, and that of cookie values. It is in the log files generated through each Web server that a user's IP, used pages, the time when he or she uses it, and cookie values are stored. The log files contain a huge amount of data. As it is almost impossible to make a direct analysis of these log files, one is supposed to make an analysis of them by using solutions for a log analysis. The generic information that can be extracted from tools for each logo analysis includes the number of viewing the total pages, the number of average page view per day, the number of basic page view, the number of page view per visit, the total number of hits, the number of average hits per day, the number of hits per visit, the number of visits, the number of average visits per day, the net number of visitors, average visitors per day, one-time visitors, visitors who have come more than twice, and average using hours, etc. These sites are deemed to be useful for utilizing data for the analysis of the situation and current status of rival companies as well as benchmarking. As keyword advertising exposes advertisements exclusively on search-result pages, competition among advertisers attempting to preoccupy popular keywords is very fierce. Some portal sites keep on giving priority to the existing advertisers, whereas others provide chances to purchase keywords in question to all the advertisers after the advertising contract is over. If an advertiser tries to rely on keywords sensitive to seasons and timeliness in case of sites providing priority to the established advertisers, he or she may as well make a purchase of a vacant place for advertising lest he or she should miss appropriate timing for advertising. However, Naver doesn't provide priority to the existing advertisers as far as all the keyword advertisements are concerned. In this case, one can preoccupy keywords if he or she enters into a contract after confirming the contract period for advertising. This study is designed to take a look at marketing for keyword advertising and to present effective strategies for keyword advertising marketing. At present, the Korean CPC advertising market is virtually monopolized by Overture. Its strong points are that Overture is based on the CPC charging model and that advertisements are registered on the top of the most representative portal sites in Korea. These advantages serve as the most appropriate medium for small and medium enterprises to use. However, the CPC method of Overture has its weak points, too. That is, the CPC method is not the only perfect advertising model among the search advertisements in the on-line market. So it is absolutely necessary that small and medium enterprises including independent shopping malls should complement the weaknesses of the CPC method and make good use of strategies for maximizing its strengths so as to increase their sales and to create a point of contact with customers.

  • PDF

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People easily believe that news and stock index are closely related. They think that securing news before anyone else can help them forecast the stock prices and enjoy great profit, or perhaps capture the investment opportunity. However, it is no easy feat to determine to what extent the two are related, come up with the investment decision based on news, or find out such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract the information that can assist the investment decisions. The reality however is that the world is inundated with a massive wave of news in real time. And news is not patterned text. This study suggests the stock-index invest model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes the news and creates investment information. To verify the validity of the model, the relationship between the result of news opinion mining and stock-index was empirically analyzed by using statistics. Steps in the mining that converts news into information for investment decision making, are as follows. First, it is indexing information of news after getting a supply of news from news provider that collects news on real-time basis. Not only contents of news but also various information such as media, time, and news type and so on are collected and classified, and then are reworked as variable from which investment decision making can be inferred. Next step is to derive word that can judge polarity by separating text of news contents into morpheme, and to tag positive/negative polarity of each word by comparing this with sentimental dictionary. Third, positive/negative polarity of news is judged by using indexed classification information and scoring rule, and then final investment decision making information is derived according to daily scoring criteria. For this study, KOSPI index and its fluctuation range has been collected for 63 days that stock market was open during 3 months from July 2011 to September in Korea Exchange, and news data was collected by parsing 766 articles of economic news media M company on web page among article carried on stock information>news>main news of portal site Naver.com. In change of the price index of stocks during 3 months, it rose on 33 days and fell on 30 days, and news contents included 197 news articles before opening of stock market, 385 news articles during the session, 184 news articles after closing of market. Results of mining of collected news contents and of comparison with stock price showed that positive/negative opinion of news contents had significant relation with stock price, and change of the price index of stocks could be better explained in case of applying news opinion by deriving in positive/negative ratio instead of judging between simplified positive and negative opinion. And in order to check whether news had an effect on fluctuation of stock price, or at least went ahead of fluctuation of stock price, in the results that change of stock price was compared only with news happening before opening of stock market, it was verified to be statistically significant as well. In addition, because news contained various type and information such as social, economic, and overseas news, and corporate earnings, the present condition of type of industry, market outlook, the present condition of market and so on, it was expected that influence on stock market or significance of the relation would be different according to the type of news, and therefore each type of news was compared with fluctuation of stock price, and the results showed that market condition, outlook, and overseas news was the most useful to explain fluctuation of news. On the contrary, news about individual company was not statistically significant, but opinion mining value showed tendency opposite to stock price, and the reason can be thought to be the appearance of promotional and planned news for preventing stock price from falling. Finally, multiple regression analysis and logistic regression analysis was carried out in order to derive function of investment decision making on the basis of relation between positive/negative opinion of news and stock price, and the results showed that regression equation using variable of market conditions, outlook, and overseas news before opening of stock market was statistically significant, and classification accuracy of logistic regression accuracy results was shown to be 70.0% in rise of stock price, 78.8% in fall of stock price, and 74.6% on average. This study first analyzed relation between news and stock price through analyzing and quantifying sensitivity of atypical news contents by using opinion mining among big data analysis techniques, and furthermore, proposed and verified smart investment decision making model that could systematically carry out opinion mining and derive and support investment information. This shows that news can be used as variable to predict the price index of stocks for investment, and it is expected the model can be used as real investment support system if it is implemented as system and verified in the future.

The Effect of Expert Reviews on Consumer Product Evaluations: A Text Mining Approach (전문가 제품 후기가 소비자 제품 평가에 미치는 영향: 텍스트마이닝 분석을 중심으로)

  • Kang, Taeyoung;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.63-82
    • /
    • 2016
  • Individuals gather information online to resolve problems in their daily lives and make various decisions about the purchase of products or services. With the revolutionary development of information technology, Web 2.0 has allowed more people to easily generate and use online reviews such that the volume of information is rapidly increasing, and the usefulness and significance of analyzing the unstructured data have also increased. This paper presents an analysis on the lexical features of expert product reviews to determine their influence on consumers' purchasing decisions. The focus was on how unstructured data can be organized and used in diverse contexts through text mining. In addition, diverse lexical features of expert reviews of contents provided by a third-party review site were extracted and defined. Expert reviews are defined as evaluations by people who have expert knowledge about specific products or services in newspapers or magazines; this type of review is also called a critic review. Consumers who purchased products before the widespread use of the Internet were able to access expert reviews through newspapers or magazines; thus, they were not able to access many of them. Recently, however, major media also now provide online services so that people can more easily and affordably access expert reviews compared to the past. The reason why diverse reviews from experts in several fields are important is that there is an information asymmetry where some information is not shared among consumers and sellers. The information asymmetry can be resolved with information provided by third parties with expertise to consumers. Then, consumers can read expert reviews and make purchasing decisions by considering the abundant information on products or services. Therefore, expert reviews play an important role in consumers' purchasing decisions and the performance of companies across diverse industries. If the influence of qualitative data such as reviews or assessment after the purchase of products can be separately identified from the quantitative data resources, such as the actual quality of products or price, it is possible to identify which aspects of product reviews hamper or promote product sales. Previous studies have focused on the characteristics of the experts themselves, such as the expertise and credibility of sources regarding expert reviews; however, these studies did not suggest the influence of the linguistic features of experts' product reviews on consumers' overall evaluation. However, this study focused on experts' recommendations and evaluations to reveal the lexical features of expert reviews and whether such features influence consumers' overall evaluations and purchasing decisions. Real expert product reviews were analyzed based on the suggested methodology, and five lexical features of expert reviews were ultimately determined. Specifically, the "review depth" (i.e., degree of detail of the expert's product analysis), and "lack of assurance" (i.e., degree of confidence that the expert has in the evaluation) have statistically significant effects on consumers' product evaluations. In contrast, the "positive polarity" (i.e., the degree of positivity of an expert's evaluations) has an insignificant effect, while the "negative polarity" (i.e., the degree of negativity of an expert's evaluations) has a significant negative effect on consumers' product evaluations. Finally, the "social orientation" (i.e., the degree of how many social expressions experts include in their reviews) does not have a significant effect on consumers' product evaluations. In summary, the lexical properties of the product reviews were defined according to each relevant factor. Then, the influence of each linguistic factor of expert reviews on the consumers' final evaluations was tested. In addition, a test was performed on whether each linguistic factor influencing consumers' product evaluations differs depending on the lexical features. The results of these analyses should provide guidelines on how individuals process massive volumes of unstructured data depending on lexical features in various contexts and how companies can use this mechanism from their perspective. This paper provides several theoretical and practical contributions, such as the proposal of a new methodology and its application to real data.

A Study on the Buyer's Decision Making Models for Introducing Intelligent Online Handmade Services (지능형 온라인 핸드메이드 서비스 도입을 위한 구매자 의사결정모형에 관한 연구)

  • Park, Jong-Won;Yang, Sung-Byung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.119-138
    • /
    • 2016
  • Since the Industrial Revolution, which made the mass production and mass distribution of standardized goods possible, machine-made (manufactured) products have accounted for the majority of the market. However, in recent years, the phenomenon of purchasing even more expensive handmade products has become a noticeable trend as consumers have started to acknowledge the value of handmade products, such as the craftsman's commitment, belief in their quality and scarcity, and the sense of self-esteem from having them,. Consumer interest in these handmade products has shown explosive growth and has been coupled with the recent development of three-dimensional (3D) printing technologies. Etsy.com is the world's largest online handmade platform. It is no different from any other online platform; it provides an online market where buyers and sellers virtually meet to share information and transact business. However, Etsy.com is different in that shops within this platform only deal with handmade products in a variety of categories, ranging from jewelry to toys. Since its establishment in 2005, despite being limited to handmade products, Etsy.com has enjoyed rapid growth in membership, transaction volume, and revenue. Most recently in April 2015, it raised funds through an initial public offering (IPO) of more than 1.8 billion USD, which demonstrates the huge potential of online handmade platforms. After the success of Etsy.com, various types of online handmade platforms such as Handmade at Amazon, ArtFire, DaWanda, and Craft is ART have emerged and are now competing with each other, at the same time, which has increased the size of the market. According to Deloitte's 2015 holiday survey on which types of gifts the respondents plan to buy during the holiday season, about 16% of U.S. consumers chose "homemade or craft items (e.g., Etsy purchase)," which was the same rate as those for the computer game and shoes categories. This indicates that consumer interests in online handmade platforms will continue to rise in the future. However, this high interest in the market for handmade products and their platforms has not yet led to academic research. Most extant studies have only focused on machine-made products and intelligent services for them. This indicates a lack of studies on handmade products and their intelligent services on virtual platforms. Therefore, this study used signaling theory and prior research on the effects of sellers' characteristics on their performance (e.g., total sales and price premiums) in the buyer-seller relationship to identify the key influencing e-Image factors (e.g., reputation, size, information sharing, and length of relationship). Then, their impacts on the performance of shops within the online handmade platform were empirically examined; the dataset was collected from Etsy.com through the application of web harvesting technology. The results from the structural equation modeling revealed that the reputation, size, and information sharing have significant effects on the total sales, while the reputation and length of relationship influence price premiums. This study extended the online platform research into online handmade platform research by identifying key influencing e-Image factors on within-platform shop's total sales and price premiums based on signaling theory and then performed a statistical investigation. These findings are expected to be a stepping stone for future studies on intelligent online handmade services as well as handmade products themselves. Furthermore, the findings of the study provide online handmade platform operators with practical guidelines on how to implement intelligent online handmade services. They should also help shop managers build their marketing strategies in a more specific and effective manner by suggesting key influencing e-Image factors. The results of this study should contribute to the vitalization of intelligent online handmade services by providing clues on how to maximize within-platform shops' total sales and price premiums.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-drive analytics techniques have been recently applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In the main concept of existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts who usually judge the positive, neutral, or negative meanings of the frequently used words from the collected text documents. In order to research the preference for a particular product, the existing approach collects (1) review posts, which are related to the product, from several product review web sites; (2) extracts sentences (or phrases) in the collection after the pre-processing step such as stemming and removal of stop words is performed; (3) classifies the polarity (either positive or negative sense) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the total numbers of the positive and negative sentences (or phrases) by the total number of the sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) including the positive and negative meaning to/against the product. As a motivated example, given a product like Sonata made by Hyundai Motors, customers often want to see the summary note including what positive points are in the 'car design' aspect as well as what negative points are in thesame aspect. They also want to gain more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such an information will enable customers to make good choice when they attempt to purchase brand-new vehicles. In addition, automobile makers will be able to figure out the preference and positive/negative points for new models on market. In the near future, the weak points of the models will be improved by the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings and is limited to apply to real applications. The main disadvantages of the existing approach is as follows: (1) The main aspects (e.g., car design, quality, performance, and service) to a product (e.g., Hyundai Sonata) are not considered. Through the sentiment analysis without considering aspects, as a result, the summary note including the positive and negative ratios of the product and top-k sentences (or phrases) with the highest sentiment scores in the entire corpus is just reported to customers and car makers. This approach is not enough and main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across different domains, the sentiment lexicon which is proper to each domain needs to be constructed. The efficient way to construct the sentiment lexicon per domain is required because the sentiment lexicon construction is labor intensive and time consuming. To address the above problems, in this article, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents the digest in which a few important sentences with the positive and negative meanings are listed in each aspect. Unlike the existing approach, using hidden topics makes experts construct the sentimental lexicon easily and quickly. Furthermore, reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithms more largely than that of the existing approach. In the experiments, we collected large review documents to the domestic vehicles such as K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method, compared with the existing method.

Gene Expression Analysis of Inducible cAMP Early Repressor (ICER) Gene in Longissimus dorsi of High- and Low Marbled Hanwoo Steers (한우 등심부위 근육 내 조지방함량에 따른 inducible cAMP early repressor (ICER) 유전자발현 분석)

  • Lee, Seung-Hwan;Kim, Nam-Kuk;Kim, Sung-Kon;Cho, Yong-Min;Yoon, Du-hak;Oh, Sung-Jong;Im, Seok-Ki;Park, Eung-Woo
    • Journal of Life Science
    • /
    • v.18 no.8
    • /
    • pp.1090-1095
    • /
    • 2008
  • Marbling (intramuscular fat) is an important factor in determining meat quality in Korean beef market. A grain based finishing system for improving marbling leads to inefficient meat production due to an excessive fat production. Identification of intramuscular fat-specific gene might be achieved more targeted meat production through alternative genetic improvement program such as marker assisted selection (MAS). We carried out ddRT-PCR in 12 and 27 month old Hanwoo steers and detected 300 bp PCR product of the inducible cAMP early repressor (ICER) gene, showing highly gene expression in 27 months old. A 1.5 kb sequence was re-sequenced using primer designed base on the Hanwoo EST sequence. We then predicted the open reading frame (ORF) of ICER gene in ORF finder web program. Tissue distribution of ICER gene expression was analysed in eight Hanwoo tissue using realtime PCR analysis. The highest ICER gene expression showed in Small intestine followed by Longissimus dorsi. Interestingly, the ICER gene expressed 2.5 time higher in longissimus dorsi than in same muscle type, Rump. For gene expression analysis in high- and low marbled individuals, we selected 4 and 3 animal based on the muscle crude fat contents (high is 17-32%, low is 6-7% of crude fat contents). The ICER gene expression was analysed using ANOVA model. Marbling (muscle crude fat contents) was affected by ICER gene (P=0.012). Particularly, the ICER gene expression was 4 times higher in high group (n=4) than low group (n=3). Therefore, ICER gene might be a functional candidate gene related to marbling in Hanwoo.

Stud and Puzzle-Strip Shear Connector for Composite Beam of UHPC Deck and Inverted-T Steel Girder (초고성능 콘크리트 바닥판과 역T형 강거더의 합성보를 위한 스터드 및 퍼즐스트립 전단연결재에 관한 연구)

  • Lee, Kyoung-Chan;Joh, Changbin;Choi, Eun-Suk;Kim, Jee-Sang
    • Journal of the Korea Concrete Institute
    • /
    • v.26 no.2
    • /
    • pp.151-157
    • /
    • 2014
  • Since recently developed Ultra-High-Performance-Concrete (UHPC) provides very high strength, stiffness, and durability, many studies have been made on the application of the UHPC to bridge decks. Due to high strength and stiffness of UHPC bridge deck, the structural contribution of top flange of steel girder composite to UHPC deck would be much lower than that of conventional concrete deck. At this point of view, this study proposes a inverted-T shaped steel girder composite to UHPC deck. This girder requires a new type of shear connector because conventional shear connectors are welded on top flange. This study also proposes three different types of shear connectors, and evaluate their ultimate strength via push-out static test. The first one is a stud shear connector welded directly to the web of the girder in the transverse direction. The second one is a puzzle-strip type shear connector developed by the European Commission, and the last one is the combination of the stud and the puzzle-strip shear connectors. Experimental results showed that the ultimate strength of the transverse stud was 26% larger than that given in the AASHTO LRFD Bridge Design Specifications, but a splitting crack observed in the UHPC deck was so severe that another measure needs to be developed to prevent the splitting crack. The ultimate strength of the puzzle-strip specimen was 40% larger than that evaluated by the equation of European Commission. The specimens combined with stud and puzzle-strip shear connectors provided less strength than arithmetical sum of those. Based on the experimental observations, there appears to be no advantage of combining transverse stud and puzzle-strip shear connectors.

The Brassica rapa Tissue-specific EST Database (배추의 조직 특이적 발현유전자 데이터베이스)

  • Yu, Hee-Ju;Park, Sin-Gi;Oh, Mi-Jin;Hwang, Hyun-Ju;Kim, Nam-Shin;Chung, Hee;Sohn, Seong-Han;Park, Beom-Seok;Mun, Jeong-Hwan
    • Horticultural Science & Technology
    • /
    • v.29 no.6
    • /
    • pp.633-640
    • /
    • 2011
  • Brassica rapa is an A genome model species for Brassica crop genetics, genomics, and breeding. With the completion of sequencing the B. rapa genome, functional analysis of the genome is forthcoming issue. The expressed sequence tags are fundamental resources supporting annotation and functional analysis of the genome including identification of tissue-specific genes and promoters. As of July 2011, 147,217 ESTs from 39 cDNA libraries of B. rapa are reported in the public database. However, little information can be retrieved from the sequences due to lack of organized databases. To leverage the sequence information and to maximize the use of publicly-available EST collections, the Brassica rapa tissue-specific EST database (BrTED) is developed. BrTED includes sequence information of 23,962 unigenes assembled by StackPack program. The unigene set is used as a query unit for various analyses such as BLAST against TAIR gene model, functional annotation using MIPS and UniProt, gene ontology analysis, and prediction of tissue-specific unigene sets based on statistics test. The database is composed of two main units, EST sequence processing and information retrieving unit and tissue-specific expression profile analysis unit. Information and data in both units are tightly inter-connected to each other using a web based browsing system. RT-PCR evaluation of 29 selected unigene sets successfully amplified amplicons from the target tissues of B. rapa. BrTED provided here allows the user to identify and analyze the expression of genes of interest and aid efforts to interpret the B. rapa genome through functional genomics. In addition, it can be used as a public resource in providing reference information to study the genus Brassica and other closely related crop crucifer plants.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. Those text data were produced and distributed through various media platforms such as World Wide Web, Internet news feeds, microblog, and social media. However, this enormous amount of easily obtained information is lack of organization. Therefore, this problem has raised the interest of many researchers in order to manage this huge amount of information. Further, this problem also required professionals that are capable of classifying relevant information and hence text classification is introduced. Text classification is a challenging task in modern data analysis, which it needs to assign a text document into one or more predefined categories or classes. In text classification field, there are different kinds of techniques available such as K-Nearest Neighbor, Naïve Bayes Algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, while dealing with huge amount of text data, model performance and accuracy becomes a challenge. According to the type of words used in the corpus and type of features created for classification, the performance of a text classification model can be varied. Most of the attempts are been made based on proposing a new algorithm or modifying an existing algorithm. This kind of research can be said already reached their certain limitations for further improvements. In this study, aside from proposing a new algorithm or modifying the algorithm, we focus on searching a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of training data upon which this classifier is built. The real world datasets in most of the time contain noise, or in other words noisy data, these can actually affect the decision made by the classifiers built from these data. In this study, we consider that the data from different domains, which is heterogeneous data might have the characteristics of noise which can be utilized in the classification process. In order to build the classifier, machine learning algorithm is performed based on the assumption that the characteristics of training data and target data are the same or very similar to each other. However, in the case of unstructured data such as text, the features are determined according to the vocabularies included in the document. If the viewpoints of the learning data and target data are different, the features may be appearing different between these two data. In this study, we attempt to improve the classification accuracy by strengthening the robustness of the document classifier through artificially injecting the noise into the process of constructing the document classifier. With data coming from various kind of sources, these data are likely formatted differently. These cause difficulties for traditional machine learning algorithms because they are not developed to recognize different type of data representation at one time and to put them together in same generalization. Therefore, in order to utilize heterogeneous data in the learning process of document classifier, we apply semi-supervised learning in our study. However, unlabeled data might have the possibility to degrade the performance of the document classifier. Therefore, we further proposed a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contributing to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules will be selected and applied for the final decision making. In this paper, three different types of real-world data sources were used, which are news, twitter and blogs.

End-use analysis of household water by metering (가정용수의 용도별 사용량 조사 및 원단위 분석)

  • Kim, Hwa-Soo;Lee, Doo-Jin;Kim, Ju-Whan;Kim, Jung-Hyun;Jung, Kwan-Soo
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2008.05a
    • /
    • pp.869-877
    • /
    • 2008
  • The purpose of this study is to investigate the trends and patterns of variou kind of water uses in a household by metering in Korea. Water use components are classified by toilet, washbowl, bathing, laundry, kitchen, etc. Flow meters are installed in 146 household selected by sampling in all around Korea. The data are gathered by web-based data collection system from the year 2002 to 2006, considering pre-investigated data such as occupation, revenue, family members, housing types, age, floor area, water saving devices, education, etc. Reliable data are selected by upper fence method for each observed water use component and statistical characteristics are estimated for each residential type to determine liter per capita per day. Estimated domestic per capita day show an indoor water use with the range from $150{\ell}pcd$ to $169{\ell}pcd$ for each housing type as the order of high rise apartment, multi-house, and single house. As the order of consuming amount among water use components, it is investigated that toilet($38.5{\ell}pcd$) is the first, and the second is laundry water($30.8{\ell}pcd$), the third is kitchen($28.4{\ell}pcd$), the fourth is bathtub($24.7{\ell}pcd$), the next is washbowl($15.4{\ell}pcd$). The results are compared with water uses in U.K. and U.S. As life style has been changed into western style, pattern of water use in Korea is tend to be similar with the U.S. water use pattern. Compared with the surveying results by Bradley, on 1985. Thirty liter of total use increased with the advancement of economic level, and a little change of water use pattern can be found. Especially, toilet water take almost half part of total water use and laundry water shows lowest as 11% in surveying at the year of 1985. But, this study shows that 39 liter, 28% of toilet water, has been decreased by the spread of saving devices and campaign. It is supposed that the spread large sized laundry machine make by-hand laundry has been decreased and water use increased. Unit water amount of each end-use in household can be applied to design factor for water and wastewater facilities, and it play a role as information in establishing water demand forecasting and conservation policy.

  • PDF