• Title/Summary/Keyword: web data


GWB: An integrated software system for Managing and Analyzing Genomic Sequences (GWB: 유전자 서열 데이터의 관리와 분석을 위한 통합 소프트웨어 시스템)

  • Kim In-Cheol;Jin Hoon
    • Journal of Internet Computing and Services
    • /
    • v.5 no.5
    • /
    • pp.1-15
    • /
    • 2004
  • In this paper, we explain the design and implementation of GWB (Gene WorkBench), a web-based, integrated system for efficiently managing and analyzing genomic sequences. Most existing software systems that handle genomic sequences rarely provide both management and analysis facilities. The analysis programs also tend to be standalone programs that cover only one or a few of the required functions. Moreover, these programs are widely distributed over the Internet and require different execution environments. Because using these programs together requires a great deal of manual work and data conversion, many life science researchers suffer great inconvenience. To overcome the problems of existing systems and to support genomic research more conveniently and effectively, this paper integrates both management and analysis facilities into a single system called GWB. The most important issues in the design of GWB are how to integrate many different analysis programs into a single software system, and how to provide the data or databases in the different formats required to run these programs. To address these issues, GWB integrates different analysis programs by using common input/output interfaces called wrappers, suggests a common format for genomic sequence data, organizes local databases consisting of a relational database and an indexed sequential file, and provides facilities for converting data among several well-known formats and for exporting local databases into XML files.


Features of Korean Webtoons through the Statistical Analysis (웹툰 통계 분석을 통한 한국 웹툰의 특징)

  • Yoon, Ki-Heon;Jung, Kiu-Ha;Choi, In-Soo;Choi, Hae-Sol
    • Cartoon and Animation Studies
    • /
    • s.38
    • /
    • pp.177-194
    • /
    • 2015
  • This study, conducted over two months by a research team of Pusan National University at the request of the Korea Manwha Contents Agency in Dec. 2013, is a statistical analysis of the 'Korean Webtoon DB and its Flow Report', which resulted from a complete survey of Korean webtoons that had been published with payment in official media from early 2000 to 2013. Webtoons, meaning cartoons published on the web, have become a typical form of Korean cartoons and have developed into a major industry since the 2000s, when traditional print cartoons declined and the social environment changed. Today, they represent Korea's cultural content. This study collected the webtoons officially published in media with payment, among Korean webtoons published from the early 2000s to Jan. Based on the collected data, it analyzed the general characteristics of webtoons, including cartoonists, the number of cartoons, the distribution across media, genre, and publication cycle. According to the data analysis and statistics, a great many Korean webtoons are still published on the main portal websites, but platforms are diversifying and publication cycles tend to be getting shorter. In terms of genre, traditionally popular genres such as drama, comic, fantasy, and action remain popular, and the genres of history, sports, and food are on the rise in line with social trends. Regarding webtoon applications, events such as relay webtoons and brand webtoons, as well as a new type of webtoon featuring PPL commercialism, have appeared. Such phenomena can yield shared profits for cartoonists, media, and ordering bodies, and represent various attempts to test the possibilities of webtoons. In addition, one development that deserves attention in the expansion of webtoons is the increase in webtoons for adults. The study subjects are the webtoons published with payment, excluding free webtoons.
However, this study failed to collect the webtoons published on websites that have since closed, as well as lost information on cartoonists and their lost webtoons, and a complete survey of all webtoons, including free ones, is still needed. Despite these limitations, this study is meaningful in that it categorized and analyzed Korean webtoons according to official media, webtoons, cartoonists, and genres, and in that it provided fundamental material for understanding the current state of webtoons. It is expected that this study will contribute to stimulating more research on webtoons and to producing more supplementary data for use by the Korean cartoon industry and academia.

An Efficient Estimation of Place Brand Image Power Based on Text Mining Technology (텍스트마이닝 기반의 효율적인 장소 브랜드 이미지 강도 측정 방법)

  • Choi, Sukjae;Jeon, Jongshik;Subrata, Biswas;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.113-129
    • /
    • 2015
  • Location branding is a very important income-generating activity that gives special meaning to a specific location while producing identity and communal value, based on an understanding of the place's branding concept and methodology. Many other areas, such as marketing, architecture, and city construction, influence the creation of an impressive brand image. A place brand that is well recognized by both Koreans and foreigners creates significant economic effects. There has been research on creating strategic and detailed place brand images, and the representative study was carried out by Anholt, who surveyed two million people from 50 different countries. However, such investigations, including survey research, require a great deal of labor and significant expense. As a result, more affordable, objective, and effective research methods are needed. The purpose of this paper is to find a way to measure the intensity of a brand image objectively and at low cost through text mining. The proposed method extracts the keywords and factors that constitute the location brand image from related web documents. In this way, we can measure the brand image intensity of a specific location. The performance of the proposed methodology was verified through comparison with Anholt's city image consistency index ranking of 50 cities around the world. Four methods are applied in the test. First, the RANDOM method arbitrarily ranks the cities included in the experiment. The HUMAN method first prepares a questionnaire and selects 9 volunteers who are well acquainted with both brand management and the cities to be evaluated. They are then asked to rank the cities, and the rankings are compared with Anholt's evaluation results. The TM method applies the proposed method to evaluate the cities with all evaluation criteria.
TM-LEARN, an extension of TM, selects significant evaluation items from the items in every criterion. The method then evaluates the cities with all selected evaluation criteria. RMSE is used as the metric to compare the evaluation results. The experimental results for this paper's methodology are as follows. First, the method appeared to be more accurate than the evaluation method that targets ordinary people. Second, compared to the traditional survey method, the time and cost are much lower because this research used automated means. Third, the proposed methodology is very timely because the evaluation can be repeated at any time. Fourth, compared to Anholt's method, which evaluated only a pre-specified set of cities, the proposed methodology is applicable to any location. Finally, the proposed methodology has relatively high objectivity because the research was based on open source data. As a result, our text mining approach to city image evaluation was found to be valid in terms of accuracy, cost-effectiveness, timeliness, scalability, and reliability. The proposed method provides managers in the public and private sectors with clear guidelines regarding brand management. In the public sector, for example, local officers could use the proposed method to formulate strategies and enhance the image of their places efficiently. Rather than conducting burdensome questionnaires, local officers could quickly monitor the current place image in advance, and then decide to proceed with the formal place image test only if the results from the proposed method are out of the ordinary, whether those results indicate an opportunity or a threat to the place.
Moreover, by combining the proposed method with morphological analysis, extraction of meaningful facets of the place brand from text, sentiment analysis, and more, marketing strategy planners or civil engineering professionals may obtain deeper and richer insights for better place brand images. In the future, a prototype system will be implemented to show the feasibility of the idea proposed in this paper.
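
The ranking comparison described in this abstract can be illustrated with a minimal sketch. It assumes that each method outputs one rank per city and that RMSE is taken over the rank differences against a reference ranking; the abstract does not spell out the exact computation, so this is only one plausible reading. The function name `rank_rmse` and the city labels and ranks are hypothetical placeholders, not data from the paper.

```python
import math

def rank_rmse(reference, candidate):
    """RMSE between two rankings given as {city: rank} dicts over the same cities."""
    if set(reference) != set(candidate):
        raise ValueError("rankings must cover the same cities")
    squared = [(reference[c] - candidate[c]) ** 2 for c in reference]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical placeholder ranks (not taken from the paper).
reference = {"A": 1, "B": 2, "C": 3, "D": 4}    # stands in for the reference ranking
text_mining = {"A": 1, "B": 3, "C": 2, "D": 4}  # stands in for a text-mining ranking
random_rank = {"A": 4, "B": 3, "C": 2, "D": 1}  # stands in for an arbitrary ranking

tm_error = rank_rmse(reference, text_mining)
random_error = rank_rmse(reference, random_rank)
print(f"text mining: {tm_error:.3f}, random: {random_error:.3f}")
```

A lower RMSE means the candidate ranking sits closer to the reference ranking, which is how the four methods in the abstract would be compared under this assumption.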

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.49-67
    • /
    • 2015
  • Recently, the emergence of big data and social media has brought us into a big bang of data. Social networking services are widely used by people around the world, and they have become one of the major communication tools for all ages. Over the last decade, as online social networking sites have become increasingly popular, companies have tended to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM), such as product ratings, product reviews, and product recommendations, is very influential, and negative opinions have a significant impact on product sales. This trend has increased researchers' attention to natural language processing tasks such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is the process of identifying the polarity of subjective information and has been applied to various research and practical fields. However, obstacles arise when the Korean language (Hangul) is used in natural language processing, because it is an agglutinative language whose rich morphology poses problems. Therefore, there is a lack of Korean natural language processing resources such as a sentiment lexicon, and this has resulted in significant limitations for researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence and provides an API (Application Programming Interface) service to open and share the sentiment lexicon data with the public (www.openhangul.com). For pre-processing, we created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words.
To classify them, we first identified stop words, which are quite likely to play a negative role in sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they carry sentimental expressions such as positive, neutral, and negative. On the other hand, non-sentiment words are interjections, determiners, numerals, postpositions, etc., as they generally carry no sentimental expression. To build a reliable sentiment lexicon, we adopted the concept of collective intelligence as a model for crowdsourcing. In addition, the concept of folksonomy was implemented in the taxonomy process to support collective intelligence. To make up for an inherent weakness of folksonomy, we adopted majority rule by building a voting system. Participants, as voters, were offered three voting options (positivity, negativity, and neutrality), and the voting was conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been cast by college students in Korea, and we keep this voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of a word can be an important observation because it enables us to keep track of temporal changes in Korean as a natural language. Lastly, our study offers a RESTful, JSON-based API service through a web platform to make the lexicon easier to use for researchers, companies, and developers. Our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon plays an important role as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built.
Moreover, our study sheds new light on the value of folksonomy by combining it with collective intelligence, and we also expect it to give a new direction and a new start to the development of Korean natural language processing.
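
The majority-rule voting described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how ties or empty ballots are handled, so leaving such words unlabeled is a choice made here, and the function name `majority_label`, the romanized sample words, and the votes are all hypothetical rather than taken from the actual service.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")

def majority_label(votes):
    """Return the label with the most votes, or None for an empty ballot or a tie."""
    counts = Counter(v for v in votes if v in LABELS)  # ignore unknown labels
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the word unlabeled (an assumption of this sketch)
    return ranked[0][0]

# Hypothetical ballots keyed by romanized Korean words; votes are illustrative only.
ballots = {
    "jota": ["positive", "positive", "neutral"],                  # "to be good"
    "nappeuda": ["negative", "negative", "negative", "neutral"],  # "to be bad"
    "geunyang": ["neutral", "positive"],                          # "just", tied vote
}
lexicon = {word: majority_label(votes) for word, votes in ballots.items()}
print(lexicon)
```

Because the voting system stays open, rerunning the aggregation as new votes arrive would let a word's label change over time, which matches the abstract's remark about tracking temporal change.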

GIS-based Disaster Management System for a Private Insurance Company in Case of Typhoons(I) (지리정보기반의 재해 관리시스템 구축(I) -민간 보험사의 사례, 태풍의 경우-)

  • Chang Eun-Mi
    • Journal of the Korean Geographical Society
    • /
    • v.41 no.1 s.112
    • /
    • pp.106-120
    • /
    • 2006
  • Natural and man-made disasters have been expected to be one of the potential themes that can integrate human geography and physical geography. Typhoons like Rusa and Maemi caused great losses to insurance companies as well as to the public sector. We have implemented a natural disaster management system for a private insurance company to produce better estimates of hazards from high wind and to calculate vulnerability to damage. Climatic gauge sites and the addresses of insured objects were geo-coded, and the pressure values along all the typhoon tracks were vectorized into line objects. National GIS topographic maps at a scale of 1:5,000 were updated into base maps, and a digital elevation model with 30-meter spacing and land cover maps were used to reflect the effect of land roughness on wind velocity. All the data were converted to grid coverage with $1km{\times}1km$ cells. The vulnerability curve of Munich Re was adopted, and a preprocessor and postprocessor for the wind velocity model were implemented. Overlaying the locations of contracts on the grid value coverage can show the relative risk under a given scenario. The wind velocities calculated by the model were compared with observed values (average $R^2=0.68$). The wind speed models were calibrated by dropping the data from two climatic gauges, which improved the $R^2$ values. The comparison of calculated loss with the actual historical loss of the insurance company showed both underestimation and overestimation. This system enables the company to have quantitative data for optimizing the re-insurance ratio, to plan the allocation of enterprise resources, and to upgrade the international creditability of the company. A flood model, a storm surge model, and a flash flood model are being added; ultimately, combined disaster vulnerability will be calculated for a total disaster management system.

The Current Status of Utilization of Palliative Care Units in Korea: 6 Month Results of 2009 Korean Terminal Cancer Patient Information System (말기암환자 정보시스템을 이용한 우리나라 암환자 완화의료기관의 이용현황)

  • Shin, Dong-Wook;Choi, Jin-Young;Nam, Byung-Ho;Seo, Won-Seok;Kim, Hyo-Young;Hwang, Eun-Joo;Kang, Jina;Kim, So-Hee;Kim, Yang-Hyuck;Park, Eun-Cheol
    • Journal of Hospice and Palliative Care
    • /
    • v.13 no.3
    • /
    • pp.181-189
    • /
    • 2010
  • Purpose: Health policy making has recently become increasingly evidence-based, and the Korean Terminal Cancer Patient Information System (KTCPIS) was developed to meet this need. We aimed to report its development process and statistics from 6 months of data. Methods: Items for KTCPIS were developed through consultation with practitioners. The E-Velos web-based clinical trial management system was used as the technical platform. Data were collected for patients who were registered with 34 inpatient palliative care services designated by the Ministry of Health, Welfare, and Family Affairs, from January 1 to June 30, 2009. Descriptive statistics were used for the analysis. Results: From the nationally representative set of 2,940 patients, we obtained the following results. Mean age was $64.8{\pm}12.9$ years, and 56.6% were male. Lung cancer (18.0%) was the most common diagnosis. Only 50.3% of patients had their terminal diagnosis confirmed by two or more physicians, and 69.7% were aware of their terminal diagnosis at the time of admission. About half of the patients were admitted to the units on their own without any formal referral. Average and worst pain scores were significantly reduced after 1 week compared to those at the time of admission. 73.4% died in the units, and discharge to home accounted for only 13.3%. Mean length of stay per admission was $20.2{\pm}21.2$ days, with a median of 13 days. Conclusion: Nationally representative data on the characteristics of patients and their caregivers, and on current service delivery practice in palliative care units, were obtained through the operation of KTCPIS.

A User Profile-based Filtering Method for Information Search in Smart TV Environment (스마트 TV 환경에서 정보 검색을 위한 사용자 프로파일 기반 필터링 방법)

  • Sean, Visal;Oh, Kyeong-Jin;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.97-117
    • /
    • 2012
  • Nowadays, Internet users tend to do a variety of things at the same time, such as web browsing, social networking, and multimedia consumption. While watching a video, a user who becomes interested in a product has to search for information to learn more about it. With the conventional approach, the user has to search separately with search engines like Bing or Google, which can be inconvenient and time-consuming. For this reason, a video annotation platform has been developed to give users more convenient and more interactive ways to engage with video content. In a future smart TV environment, users can follow annotated information, for example a link to a vendor to buy the product of interest. It is even better to enable users to search for information by discussing directly with friends. Users can effectively get useful and relevant information about the product from friends who share common interests or may have used it before, which is more reliable than results from search engines. Social networking services provide an appropriate environment for people to share products, so that they can show new things to their friends and share their personal experiences of a specific product. Meanwhile, they can also absorb the most relevant information about the product they are interested in through comments or discussion among friends. However, within a very large graph of friends, the existing conventional approach is still limited in determining the most appropriate people to ask about a specific product. When users want to share or discuss a product, they simply share it with all friends as a new feed item. This means a newly posted article is blindly spread to all friends without considering their background interests or knowledge. As a result, the number of responses will be huge.
Users cannot easily absorb the relevant and useful responses from friends, since the friends come from various fields of interest and knowledge. To overcome this limitation, we propose a method for filtering a user's friends for information search, which leverages semantic video annotation and social networking services. Our method filters and identifies those who can give the user useful information about a specific product. By examining the existing Facebook information regarding users and their social graph, we construct a user profile of product interest. With the user's permission and authentication, the user's particular activities are enriched with a domain-specific ontology such as GoodRelations and with BestBuy data sources. In addition, we assume that the object in the video is already annotated using Linked Data. Thus, the detailed information on the product that the user would like to ask about is retrieved via the product URI. Our system calculates the similarities among them in order to identify the most suitable friends for seeking information about the product. The system ranks a user's friends by a score that indicates who is most likely to give the user useful information about a specific product of interest. We conducted an experiment with a group of respondents to verify and evaluate our system. First, a user profile accuracy evaluation was conducted to demonstrate how well the user profile of product interest constructed by our system represents the user's interests. Then, the filtering method was evaluated by inspecting the ranked results against human judgment. The results show that our method works effectively and efficiently in filtering. Our system fulfills user needs by helping the user select appropriate friends from whom to seek useful information about a specific product of interest. As a result, it helps to inform and persuade the user in purchase decisions.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, often in the natural (human) languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can produce results that are far from users' intentions. Even though much progress has been made in recent years in enhancing search engine performance to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, avoiding expensive sense-tagging processes. It tests the effectiveness of the method with the Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. For the experiment in this study, the Korean standard unabridged dictionary and the Sejong Corpus were tested both in combination and as separate entities, using cross validation. Only nouns, the target of word sense disambiguation, were selected.
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged based on sense indices defined by the dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model-based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiment shows that better precision and recall are obtained with the merged corpus. This study suggests that the method can practically enhance the performance of Internet search engines and help us understand the meaning of a sentence more accurately in natural language processing related to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence.
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
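
As a rough illustration of Naïve Bayes word sense disambiguation trained on sense-labeled example sentences, the sketch below uses a multinomial model with add-one smoothing, treating context words as conditionally independent given the sense. It is not the paper's implementation: the smoothing choice, the function names `train` and `disambiguate`, and the English toy examples for the word "bank" are assumptions made here for readability.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: iterable of (sense, context_words) pairs, e.g. from dictionary examples."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(model, context):
    """Pick the sense maximizing log P(sense) + sum of log P(word | sense)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)
        for w in context:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical toy examples for the ambiguous English noun "bank".
examples = [
    ("bank/finance", ["deposit", "money", "account"]),
    ("bank/finance", ["loan", "money", "interest"]),
    ("bank/river", ["river", "water", "fishing"]),
    ("bank/river", ["muddy", "river", "shore"]),
]
model = train(examples)
print(disambiguate(model, ["money", "loan"]))
print(disambiguate(model, ["river", "water"]))
```

In the setting the abstract describes, the example sentences would come from dictionary entries and the sense-tagged corpus instead of a hand-written list, with a morphological analyzer supplying the context terms.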

A Study on the Model of Appraisal and Acquisition for Digital Documentary Heritage : Focused on 'Whole-of-Society Approach' in Canada (디지털기록유산 평가·수집 모형에 대한 연구 캐나다 'Whole-of-Society 접근법'을 중심으로)

  • Pak, Ji-Ae;Yim, Jin Hee
    • The Korean Journal of Archival Studies
    • /
    • no.44
    • /
    • pp.51-99
    • /
    • 2015
  • The purpose of archival appraisal has gradually changed from the selection of records to the documentation of society. In particular, the qualitative and quantitative development of current digital technology and the web has become the driving force that enables semantic acquisition rather than physical acquisition. Under these circumstances, the concept of 'documentary heritage' has been re-established internationally, led by UNESCO. Library and Archives Canada (LAC) reflects this trend. To revive the spirit of total archives, LAC has been trying to develop a new appraisal model and a new acquisition model at the same time, namely the 'Whole-of-society approach'. The features of this approach can be summarized in three main points. First, it targets documentary heritage, and acquisition refers to semantic acquisition, not physical acquisition. Second, because the object of management is documentary heritage, cooperation between documentary heritage institutions is a prerequisite. Lastly, it can document not only what has already happened but also what is happening in current society. As an appraisal method, the 'Whole-of-society approach' is a way to identify social components based on social theories. As an acquisition method, the approach targets digital records, which include 'digitized' heritage and 'born-digital' heritage. It makes the semantic acquisition of documentary heritage possible through data linking, by mapping the identified social components to metadata components and publishing them as linked open data. This study points out that it is hard to realize the documentation of society under the domestic appraisal system, since its purpose is limited to selection. To overcome this limitation, we suggest a guideline that applies the 'Whole-of-society approach'.

Current Status and Future Development Direction of University Archives' Information Services : Based on the Interview with the Archives' Staff (대학기록관 기록정보서비스의 현황과 발전 방안 실무자 면담을 중심으로)

  • Lee, Hye Kyoung;Rieh, Hae-Young
    • The Korean Journal of Archival Studies
    • /
    • no.40
    • /
    • pp.131-180
    • /
    • 2014
  • Various theoretical studies have been conducted to invigorate university archives, but the services currently provided in the field have not been studied much. This study aims to investigate the usage and users of domestic university archives, examine the types of archival information services provided, understand the characteristics and limitations of the services, and suggest directions for development. The study set three research objectives. First, to identify the users of university archives, their reasons for use, and the kinds of archival materials used. Second, to identify the kinds of services and programs that university archives provide to users. Third, to identify the difficulties university archives face in delivering information services, the plans they are considering for the future, and the best possible direction for improving the services. The authors decided to interview staff at university archives to identify the current status of the services. To this end, the range of services offered by university archives was defined first, and then key research questions were composed. To collect valid data, the authors carried out face-to-face, email, and phone interviews with the staff of 12 university archives, and also examined their Web sites. The collected data were categorized by interview question topic for analysis. The analysis yielded useful information, including the demographics of the research participants, the characteristics of the archives' users and requests, the types and activities of the services the university archives offered, the limitations of archival information services, the archives' future plans, and the best possible direction for development. Based on the findings, this study proposed implications and suggestions for archival information services in university archives, in 3 domains as follows.
First, university archives should build close relationships with internal university administrative units, student groups, and faculty members for effective collection and better use of archives. Second, university archives need to acquire both administrative records by transfer and manuscripts and archives by active collection. In particular, archives should try to acquire archives unique to their own universities. Third, the archives should develop and provide various services that raise awareness of university archives and attract more potential users. Finally, to solve the problems the archives face, such as the lack of understanding of the value of archives and the shortage of archival materials, it was suggested that archivists actively collect archival materials and proactively provide valuable information from the archives wherever it is needed.