• Title/Summary/Keyword: Information searching

Search Results: 2,872

An Ontology-Based Movie Contents Recommendation Scheme Using Relations of Movie Metadata (온톨로지 기반 영화 메타데이터간 연관성을 활용한 영화 추천 기법)

  • Kim, Jaeyoung;Lee, Seok-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.25-44
    • /
    • 2013
  • Accessing movie contents has become easier with the advent of smart TVs, IPTV, and web services that can be used to search for and watch movies, and searches for movies matching user preferences are increasing accordingly. However, because the amount of available movie content is so large, users must spend considerable time and effort searching for it. Hence, much research has been devoted to recommending personalized items through the analysis and clustering of user preferences and user profiles. In this study, we propose a recommendation system that uses an ontology-based knowledge base. Our ontology can represent not only relations between movie metadata but also relations between metadata and user profiles; the relations between metadata items indicate similarity between movies. To build the knowledge base, our ontology model considers two aspects: the movie metadata model and the user model. For the ontology-based movie metadata model, we selected the main metadata that affect users' choice of movies: genre, actor/actress, keywords, and synopsis. The user model contains the user's demographic information and the relations between the user and movie metadata. In our design, the movie ontology model consists of seven concepts (Movie, Genre, Keywords, Synopsis Keywords, Character, and Person), eight attributes (title, rating, limit, description, character name, character description, person job, person name), and ten relations between concepts. For our knowledge base, we entered individual data for 14,374 movies into each concept of the contents ontology model. This movie metadata knowledge base is used to search for movies related to the metadata a user is interested in, and it can find similar movies through the relations between concepts. We also propose an architecture for movie recommendation consisting of four components. The first component searches for candidate movies based on the user's demographic information: we partition users into groups according to demographic information so that movies can be recommended per group, define the rules that assign users to groups, and generate the query used to retrieve candidate movies for recommendation. The second component searches for candidate movies based on user preferences. When choosing a movie, users consider metadata such as genre, actor/actress, synopsis, and keywords; users enter their preferences, and the system searches for movies based on them. Unlike existing movie recommendation systems, the proposed system can find similar movies through the relations between concepts. Each metadata item of a recommended candidate movie carries a weight that is used to decide the recommendation order. The third component merges the results of the first and second components: we calculate the weight of each movie from the weight values of its metadata and sort the movies by weight. The fourth component analyzes the result of the third component, determines the level of contribution of each metadata item, and applies a contribution weight to the metadata. The result of this step is presented to users as the final recommendation. We tested the usability of the proposed scheme with a web application implemented using JSP, JavaScript, and the Protégé API. In our experiment, we collected results from twenty men and women aged 20 to 29, using 7,418 movies with ratings of 7.0 or higher. We provided Top-5, Top-10, and Top-20 recommendation lists to the users, who then chose the movies that interested them. On average, users chose 2.1 interesting movies from the Top-5 list, 3.35 from the Top-10 list, and 6.35 from the Top-20 list, which is better than the results yielded by using each metadata item alone.
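
The third and fourth components merge the candidate sets and sort by aggregated metadata weights; below is a minimal sketch of that merge-and-rank step. All names, data structures, and weight values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the merge-and-rank step described above.
# The metadata weights and movie lists are illustrative assumptions.

def score(movie, metadata_weights):
    """Sum the weights of the metadata items a candidate movie matched."""
    return sum(metadata_weights[m] for m in movie["matched_metadata"])

def merge_and_rank(demographic_candidates, preference_candidates, metadata_weights):
    # Merge the two candidate sets, keeping one entry per title.
    merged = {m["title"]: m for m in demographic_candidates + preference_candidates}
    # Sort movies by their aggregated metadata weight, highest first.
    return sorted(merged.values(),
                  key=lambda m: score(m, metadata_weights),
                  reverse=True)

weights = {"genre": 0.4, "actor": 0.3, "keyword": 0.2, "synopsis": 0.1}  # assumed
demo = [{"title": "A", "matched_metadata": ["genre"]}]
pref = [{"title": "B", "matched_metadata": ["genre", "actor"]},
        {"title": "A", "matched_metadata": ["genre"]}]
for rank, movie in enumerate(merge_and_rank(demo, pref, weights), start=1):
    print(rank, movie["title"])
```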

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.73-95
    • /
    • 2021
  • This study uses the Node2vec graph embedding method and LightGBM link prediction to explore untapped export candidate countries for Korea's food and beverage industry. Node2vec improves on the structural-equivalence representation of a network, which is known to be relatively weak in existing link prediction methods based on the number of common neighbors, and is therefore known to perform well in both community detection and structural equivalence. The vectors obtained by embedding the network this way are produced from fixed-length random walks starting at arbitrarily designated nodes, so the node sequences are easy to feed as input to downstream models such as Logistic Regression, Support Vector Machines, and Random Forests. Based on these features of the Node2vec graph embedding method, this study applies it to international trade data for the Korean food and beverage industry, aiming to contribute to extensive-margin diversification for Korea within the industry's global value chain. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86, excellent performance that is superior to the Logistic Regression binary classifier set as the baseline model, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. In addition, the LightGBM-based optimal prediction model outperformed the link prediction model of a previous study used here as a benchmark: that model recorded a recall of only 0.75, while the proposed model achieved 0.79. The difference in performance between the benchmark and this study's model comes from the model training strategy. In this study, trades were grouped by trade value, and prediction models were trained differently for these groups. Specifically, we compared (1) randomly masking trades and training the model on all trades without any condition on trade value, (2) randomly masking some of the trades with above-average trade value and training on them, and (3) randomly masking some of the trades in the top 25% by trade value and training on them. The experiment confirmed that the model trained by randomly masking some of the above-average-value trades performed best and most stably. Additional investigation showed that most of the potential export candidates for Korea derived by this model were appropriate. Taken together, this study demonstrates the practical utility of link prediction combining Node2vec and LightGBM, and derives useful implications for weight-update strategies that yield better link prediction during model training. This study also has policy utility because it applies graph-embedding-based link prediction to trade transactions, which have rarely been studied in this line of research. The results support rapid responses to changes in the global value chain, such as the recent US-China trade conflict or Japan's export regulations, and we believe the method is useful enough to serve as a tool for policy decision-making.
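
A minimal sketch of the embedding-plus-classifier pipeline the abstract describes, assuming the open-source node2vec, lightgbm, and networkx packages; the toy trade graph, hyperparameters, and Hadamard edge encoding are illustrative assumptions, not the paper's data or settings.

```python
# Minimal sketch of Node2vec embedding followed by LightGBM link
# prediction. Toy graph and hyperparameters are illustrative only.
import networkx as nx
import numpy as np
from node2vec import Node2Vec
from lightgbm import LGBMClassifier

# Toy exporter-importer graph; edges stand in for observed trade links.
G = nx.Graph([("KOR", "USA"), ("KOR", "VNM"), ("USA", "JPN"),
              ("VNM", "JPN"), ("KOR", "JPN"), ("CHN", "USA"), ("CHN", "JPN")])

# Embed nodes with fixed-length random walks from every node.
model = Node2Vec(G, dimensions=16, walk_length=10, num_walks=50,
                 workers=1).fit(window=5, min_count=1)

def edge_feature(u, v):
    # Hadamard product of endpoint embeddings, a common edge encoding.
    return model.wv[u] * model.wv[v]

# Positives: observed links; negatives: node pairs with no link yet.
pos, neg = list(G.edges()), list(nx.non_edges(G))
X = np.array([edge_feature(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))
clf = LGBMClassifier(n_estimators=50).fit(X, y)  # real use needs far more data

# Rank the unlinked pairs: high probability = promising export candidate.
probs = clf.predict_proba(np.array([edge_feature(u, v) for u, v in neg]))[:, 1]
for (u, v), p in zip(neg, probs):
    print(u, v, round(float(p), 3))
```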

A Study on the Risk Factors for Maternal and Child Health Care Program with Emphasis on Developing the Risk Score System (모자건강관리를 위한 위험요인별 감별평점분류기준 개발에 관한 연구)

  • 이광옥
    • Journal of Korean Academy of Nursing
    • /
    • v.13 no.1
    • /
    • pp.7-21
    • /
    • 1983
  • For the flexible and rational distribution of limited health resources based on measurements of individual risk, the so-called Risk Approach has been proposed by the World Health Organization as a managerial tool in maternal and child health care programs. This approach requires, in principle, a technique for measuring the degree of risk, or discriminating the future outcomes of pregnancy, on the basis of prior information obtainable in prenatal care settings. Numerous recent studies have focused on identifying relevant risk factors as that prior information, on defining the adverse pregnancy outcomes to be discriminated, and on developing scoring systems that quantify the risk factors as determinants of pregnancy outcomes. Once a scoring system is established, a technique for classifying patients into those with normal and those with adverse outcomes can easily be developed. The scoring system should meet four basic requirements: 1) easy to construct, 2) easy to use, 3) theoretically sound, and 4) valid. In search of a feasible methodology meeting these requirements, the author applied the likelihood method, a well-known principle of statistical analysis, to develop such a scoring system through the following steps. Step 1: Classify the patients into four groups. Group $A_1$: adverse outcomes on the fetal (neonatal) side only. Group $A_2$: adverse outcomes on the maternal side only. Group $A_3$: adverse outcomes on both the maternal and fetal (neonatal) sides. Group B: normal outcomes. Step 2: Construct the marginal tabulation of the distribution of risk factors for each group. Step 3: To calculate the risk score, take the logarithmic transformation of the relative proportions of the distribution and round them off to integers. Step 4: Test the validity of the score chart. A total of 2,282 maternity records registered between January 1, 1982 and December 31, 1982 at Ewha Womans University Hospital were used for this study, and the "Questionnaire for Maternity Record for Prenatal and Intrapartum High Risk Screening" developed by the Korean Institute for Population and Health was used to rearrange the information on the records into an easily analyzable form. The findings are summarized as follows. 1) The risk score chart constructed on the basis of the likelihood method is presented in Table 4 of the main text. 2) Analysis of the risk score chart showed that a total of 24 risk factors have significant predictive power for discriminating pregnancy outcomes into the four groups defined above: (1) age, (2) marital status, (3) age at first pregnancy, (4) medical insurance, (5) number of pregnancies, (6) history of Cesarean sections, (7) number of living children, (8) history of premature infants, (9) history of overweight newborns, (10) history of congenital anomalies, (11) history of multiple pregnancies, (12) history of abnormal presentation, (13) history of obstetric abnormalities, (14) past illness, (15) hemoglobin level, (16) blood pressure, (17) heart status, (18) general appearance, (19) edema status, (20) result of abdominal examination, (21) cervix status, (22) pelvis status, (23) chief complaints, and (24) reason for examination. 3) The validity of the score chart turned out to be as follows: a) sensitivity: Group $A_1$ 0.75, Group $A_2$ 0.78, Group $A_3$ 0.92, all combined 0.85; b) specificity: 0.68. 4) The diagnosability of the score chart for a set of hypothetical prevalences of adverse outcomes (using the "all combined" sensitivity) was calculated as follows:
Hypothetical prevalence: 5% / 10% / 20% / 30% / 40% / 50% / 60%
Diagnosability: 12% / 23% / 40% / 53% / 64% / 75% / 80%
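
Step 3 leaves the exact transformation open; under one plausible reading, a factor's score is the rounded logarithm of its frequency ratio between an adverse group and the normal group. A minimal sketch with invented frequencies and an assumed x10 scaling:

```python
# Minimal sketch of the likelihood-based scoring in Steps 2-3 under one
# plausible reading: a factor's score for a group is the rounded log of
# its relative-frequency ratio against the normal group. Frequencies
# and the x10 scaling are invented for illustration.
import math

# Relative proportion of mothers showing the factor, per outcome group.
freq = {
    "adverse_fetal": {"anemia": 0.30, "edema": 0.25},
    "normal":        {"anemia": 0.10, "edema": 0.20},
}

def risk_score(factor, group, baseline="normal"):
    """Rounded log-likelihood ratio of the factor in `group` vs. baseline."""
    ratio = freq[group][factor] / freq[baseline][factor]
    return round(10 * math.log10(ratio))  # assumed scaling to integer scores

for f in ("anemia", "edema"):
    print(f, risk_score(f, "adverse_fetal"))
```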


A Regression-Model-based Method for Combining Interestingness Measures of Association Rule Mining (연관상품 추천을 위한 회귀분석모형 기반 연관 규칙 척도 결합기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.127-141
    • /
    • 2017
  • Advances in Internet technologies and the proliferation of mobile devices have enabled consumers to access a wide range of goods and services, with the adverse side effect that they have a hard time finding the items that suit them even after devoting much time to searching. Accordingly, businesses use recommender systems to help consumers find desired items more easily. Association Rule Mining (ARM) is advantageous for recommender systems in that it provides intuitive rules with interestingness measures (support, confidence, and lift) describing the relationship between items. Given an item, its relevant items can be distinguished with the help of these measures, which express the strength of the relationship between items; based on that strength, the most pertinent items can be chosen and exposed on the given item's web page. However, the diversity of the measures can make it unclear which items are more recommendable: given two rules, for example, one rule's support and confidence may not both be superior to the other's. Such discrepancies among the measures make it difficult to select the proper items for recommendation. Moreover, in an online environment where a web page or mobile screen can show only a limited number of recommendations that attract consumer interest, prudent selection of the items included in the recommendation list is very important: exposing items of little interest may lead consumers to ignore the recommendations, and such consumers will likely pay no attention to other marketing activities either. The measures should therefore be aligned with the probability that consumers accept the recommendations. For this reason, this study proposes a model-based approach that combines the measures into one unified measure that consistently determines the ranking of recommended items. A regression model was designed to describe how well the measures (the independent variables: support, confidence, and lift) explain consumers' acceptance of recommendations (the dependent variable: the hit rate of recommended items). The model is intuitive to understand and easy to use in that the equation consists of the commonly used ARM measures and can be used to estimate hit rates. An experiment using transaction data from one of Korea's largest online shopping malls showed that the proposed model can improve the hit rates of recommendations: from the top of the list down to 13th place, items ranked higher by the proposed model show higher hit rates than those from the competitive model. This indicates that the proposed model outperforms the competitive model in online recommendation environments. A web page typically shows consumers around ten recommendations, a range over which the proposed model outperforms; moreover, since a mobile device cannot display many items simultaneously due to its limited screen size, the newly devised technique is also suitable for mobile recommender systems. While this study covers cross-selling in online shopping malls that handle merchandise, the proposed method can be expected to apply in various situations where association rules are used; for example, it could be applied to medical diagnostic systems that predict candidate diseases from a patient's symptoms. To increase the efficiency of the model, additional variables should be considered in future studies. For example, price is a good candidate for an explanatory variable because it has a major impact on consumer purchase decisions: if the prices of recommended items are much higher than those of the items a consumer is interested in, the consumer may hesitate to accept the recommendations.
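
A minimal sketch of the proposed combination: regress observed hit rates on support, confidence, and lift, then rank candidate rules by predicted hit rate. The rule data below are invented for illustration.

```python
# Minimal sketch: regress observed hit rates on the ARM interestingness
# measures, then rank candidate rules by predicted hit rate.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: (support, confidence, lift) of a mined rule "given item -> candidate".
X_train = np.array([
    [0.020, 0.40, 1.8],
    [0.015, 0.55, 2.3],
    [0.030, 0.35, 1.2],
    [0.010, 0.60, 2.9],
])
# Observed hit rate of each rule's recommendation (fraction of exposures clicked).
y_train = np.array([0.12, 0.21, 0.08, 0.25])

model = LinearRegression().fit(X_train, y_train)

# Rank new candidate rules by their predicted hit rate.
candidates = np.array([[0.025, 0.50, 2.0], [0.012, 0.45, 1.5]])
predicted = model.predict(candidates)
order = np.argsort(predicted)[::-1]
print("recommendation order:", order, "predicted hit rates:", predicted[order])
```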

Image Analysis of Electrophoresis Gels by Using Region Growing with Multiple Peaks (다중 피크의 영역 성장 기법에 의한 전기영동 젤의 영상 분석)

  • 김영원;전병환
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.5_6
    • /
    • pp.444-453
    • /
    • 2003
  • Recently, interest in biotechnology (BT) has grown greatly, and image analysis techniques for electrophoresis gels are in high demand for analyzing genetic information and searching for new bioactive materials. For this purpose, the location and quantity of each band in a lane must be measured. Most existing techniques search for peaks in a lane's intensity profile, but a peak is a poor representative of a band because its location corresponds neither to the brightest pixel nor to the band's center of gravity. Measuring band quantity is also unreliable in most of these approaches because various enhancement processes are applied to the original images to make peak extraction easier. In this paper, we measure band quantity as the accumulated brightness within each band region, which is extracted without any processing that changes relative brightness, and we take the region's center of gravity as the band location. We first extract lanes with an entropy-based threshold computed from the gel-image histogram, and then propose and apply three methods for extracting bands. In the MER method, peaks and valleys are found along a vertical search line that bisects each lane, and the minimum enclosing rectangle of each band is set between two successive valleys. In the RG-1 method, each band is extracted by region growing from a peak used as a seed, separating overlapping neighbor bands. In the RG-2 method, peaks and valleys are found along two vertical lines that trisect each lane; the left and right peaks may be paired if they appear to belong to the same band, and each band region is then grown from one peak, or from both if both exist. To compare the three methods, we measured the location and amount of the bands. When lane length is normalized to a unit value, the average band-location errors of MER, RG-1, and RG-2 were 6%, 3%, and 1%, respectively; when the total band amount is normalized to a unit value, the average band-amount errors were 8%, 5%, and 2%, respectively. In conclusion, RG-2 proved the most reliable in measuring the location and amount of bands.
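
A minimal sketch of region growing from a peak seed on a 1-D lane profile, in the spirit of the RG methods above; the stopping rule and profile values are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of region growing from a peak seed on a 1-D lane profile.
import numpy as np

def grow_band(profile, seed, stop_ratio=0.2):
    """Grow a band region outward from `seed` while intensity keeps
    falling and stays above stop_ratio * peak intensity (a valley stops it)."""
    lo = hi = seed
    floor = profile[seed] * stop_ratio
    while lo > 0 and floor < profile[lo - 1] <= profile[lo]:
        lo -= 1
    while hi < len(profile) - 1 and floor < profile[hi + 1] <= profile[hi]:
        hi += 1
    return lo, hi

profile = np.array([1, 2, 5, 9, 14, 9, 4, 2, 1, 3, 8, 12, 7, 2, 1], float)
lo, hi = grow_band(profile, seed=4)
region = np.arange(lo, hi + 1)
quantity = profile[region].sum()                        # accumulated brightness
location = (region * profile[region]).sum() / quantity  # center of gravity
print(f"band pixels {lo}..{hi}, quantity={quantity:.1f}, location={location:.2f}")
```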

A Discourse Analysis Related to the Media Reform: A Case Study of Chosun Ilbo and Hankyoreh Shinmun (언론개혁에 관련된 담론 분석 : ≪조선일보≫≪한겨레신문≫을 중심으로)

  • Chung, Jae-Chorl
    • Korean journal of communication and information
    • /
    • v.17
    • /
    • pp.112-144
    • /
    • 2001
  • This study analyzes how and why Chosun Ilbo and Hankyoreh Shinmun produce particular social discourses about media reform in different ways, and in doing so attempts to disclose the ideological nature of media reform discourses in their social contexts. For this purpose, a content analysis method was applied to straight news, while an interpretive discourse analysis was applied to editorials and columns. As a theoretical framework, articulation theory was used to explain the relationships among the social forces, ideological elements, discourse practices, and subjects that produce media reform discourses, in an attempt to understand the overall conjuncture of media reform in its social context. The analysis period ran from January 10 to August 10, 2001. Newspaper articles related to media reform were obtained from KINDS, the newspaper article database produced by the Korea Press Foundation, by searching for the keyword "media reform"; 765 articles were analyzed in total, 429 from Hankyoreh Shinmun and 236 from Chosun Ilbo. The results, first, show empirically that both Chosun Ilbo and Hankyoreh Shinmun used straight news to serve their companies' interests and value judgments, selecting and excluding events related to media reform or exaggerating and downplaying the meanings of those events, although to differing degrees. Accordingly, this paper argues that the three major newspapers' monopoly over newspaper readership in Korean society could produce a one-sided social consensus on various social issues through such distorted and unequal reporting. Second, the discourse analysis indicates that the discourse of ideological confrontation between right and left produced by Chosun Ilbo functioned as a mechanism for the right to assert its power, articulating the demand for media reform with anti-communist ideology. This had the discursive effect of suppressing the demands for media reform made by civic groups and scholars, and led many people to regard media reform as an ideological matter in Korean society.


Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.111-125
    • /
    • 2011
  • Clustering is the process of grouping similar or relevant documents into clusters and assigning a meaningful concept to each cluster. Clustering thereby enables fast and accurate search for relevant documents by narrowing the search to the collection of documents belonging to related clusters. Effective clustering requires techniques for identifying similar documents and grouping them into a cluster, and for discovering the concept most relevant to that cluster. One problem that often appears in this context is the detection of a complex concept that overlaps several simple concepts at the same hierarchical level. Previous clustering methods could neither identify and represent a complex concept belonging to several different clusters at the same level of the concept hierarchy, nor validate the semantic hierarchical relationship between a complex concept and each of its simple concepts. To solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional agglomerative hierarchical clustering algorithm to allow overlapping clusters at the same level of the concept hierarchy; the HOC algorithm represents the clustering result not as a tree but as a lattice, which makes complex concepts detectable. We built a system that employs the HOC algorithm for complex concept detection. The system operates in three phases: 1) preprocessing of documents, 2) clustering with the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice obtained from clustering. The preprocessing phase represents each document as an (x, y) coordinate in a 2-dimensional space based on the weights of the terms appearing in it: after refinement steps such as stopword removal and stemming extract the index terms, each index term is assigned a TF-IDF weight, and the document's coordinate is determined by combining the TF-IDF values of its terms. The clustering phase uses the HOC algorithm, in which document similarity is calculated by Euclidean distance: initially a cluster is generated for each document by grouping the documents closest to it, then the distances between clusters are measured and the closest clusters are merged into a new cluster, repeating until the root cluster is generated. The validation phase applies feature selection to check whether the cluster concepts built by the HOC algorithm have meaningful hierarchical relationships. Feature selection extracts key features from a document by identifying important, representative terms and assigning them weights; to select key features correctly, one must determine how much each term contributes to the class of the document. Among the several methods achieving this, this paper adopts the $\chi^2$ statistic, which measures the degree of dependency between a term t and a class c and expresses their relationship as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out on the well-known Reuters-21578 news collection. The evaluation showed that the HOC algorithm contributes greatly to detecting and producing complex concepts by generating the concept hierarchy as a lattice structure.
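
A minimal sketch of agglomerative clustering that permits overlap, a simplified reading of HOC rather than the authors' exact algorithm: at each level a cluster may merge with every neighbor within a distance threshold, so one document set can acquire two parents and the hierarchy becomes a lattice. Points and thresholds are invented.

```python
# Minimal sketch of hierarchical clustering with overlap allowed.
import itertools
import numpy as np

def centroid(cluster, points):
    return points[list(cluster)].mean(axis=0)

def hoc_level(clusters, points, threshold):
    """One agglomeration level: pairwise merges under `threshold`.
    A cluster joining two merges gives its members two parents."""
    merged, used = [], set()
    for a, b in itertools.combinations(range(len(clusters)), 2):
        d = np.linalg.norm(centroid(clusters[a], points) -
                           centroid(clusters[b], points))
        if d <= threshold:
            merged.append(clusters[a] | clusters[b])
            used.update((a, b))
    merged += [c for i, c in enumerate(clusters) if i not in used]
    # Drop duplicate clusters while keeping them as plain sets.
    return [set(c) for c in {frozenset(c) for c in merged}]

# Toy 2-D document coordinates (as produced by the TF-IDF step above).
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.5, 0.0], [0.9, 0.0], [1.0, 0.0]])
level = [{i} for i in range(len(pts))]
for t in (0.3, 0.7, 2.0):        # growing thresholds up to the root
    level = hoc_level(level, pts, t)
    print(t, level)               # middle level shows an overlapping cluster
```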

A Study on Network Effect by Using System Dynamics Analysis: A Case of Cyworld (시스템 다이내믹스 기법을 이용한 네트워크 효과 분석: 싸이월드 사례)

  • Kim, Ga-Hye;Yang, Hee-Dong
    • Information Systems Review
    • /
    • v.11 no.1
    • /
    • pp.161-179
    • /
    • 2009
  • Nowadays an increasing number of Internet users run personal websites such as blogs or Cyworld pages. Because this type of personal medium strongly influences communication among people, businesses have come to care about network effects, network software, and social networks. For instance, Cyworld created the 'Minihompy' service for individual web-logs and acquired 2.4 million users in 2007. Although many assumed that the popularity of the Minihompy or blogs would be a passing fad, Cyworld has improved its service and expanded its network with various contents. This expansion reflects the survival efforts of ISPs (Internet Service Providers) in fierce competition, focused on enhancing usability for users. However, Cyworld's network effect has been gradually diminishing: the low production cost for service vendors and the low searching and switching costs for users combine to make it hard for ISPs to sustain their market share. To overcome this lackluster trend, Cyworld has adopted new strategies to lock users into its service, but whether these efforts to sustain and expand the network effect will succeed remains unclear and uncertain. If we could understand in advance how a new service would improve the network effect, and which service would bring more effect, ISPs would get substantial help in launching new business strategies; despite many ideas for increasing users' time online, ISPs cannot guarantee how new service strategies will end up affecting profitability. Therefore, this research studies the network effect of Cyworld's Minihompy using the System Dynamics method, which can analyze the dynamic relation between users and ISPs, and aims to predict changes in the network effect under new service strategies. 'Page view' and 'duration time' can be enhanced in the short term because new services enhance functionality, but such services cannot grow the network in the long run. Limitations of this research include that we predict the future based only on limited data, and that we restrict the independent variables of the network effect to two: increasing the number of users and increasing service functionality. Despite these limitations, this study may give some insights to policy makers and others facing stiff competition in the network business.
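
A minimal sketch of the kind of stock-and-flow model System Dynamics uses here: a user stock grown by word-of-mouth adoption and drained by churn that falls as service functionality rises. All parameter values are illustrative assumptions, not estimates from the paper.

```python
# Minimal stock-and-flow sketch of a network effect (dt = 1 month).
POPULATION = 10_000_000      # potential user pool
CONTACT_RATE = 0.4           # contacts per user per month
ADOPTION_FRACTION = 0.05     # contacts that convert, scaled by saturation
BASE_CHURN = 0.03            # monthly churn at functionality = 1.0

def simulate(months, functionality=1.0, users=100_000):
    history = []
    for _ in range(months):
        saturation = users / POPULATION
        # Network effect: adoption driven by existing users meeting non-users.
        adoption = CONTACT_RATE * ADOPTION_FRACTION * users * (1 - saturation)
        churn = BASE_CHURN / functionality * users
        users += adoption - churn     # integrate the user stock
        history.append(users)
    return history

base = simulate(36)
improved = simulate(36, functionality=1.5)   # strategy: better functionality
print(f"after 36 months: base={base[-1]:,.0f}, improved={improved[-1]:,.0f}")
```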

The Validity and Reliability of 'Computerized Neurocognitive Function Test' in the Elementary School Child (학령기 정상아동에서 '전산화 신경인지기능검사'의 타당도 및 신뢰도 분석)

  • Lee, Jong-Bum;Kim, Jin-Sung;Seo, Wan-Seok;Shin, Hyoun-Jin;Bai, Dai-Seg;Lee, Hye-Lin
    • Korean Journal of Psychosomatic Medicine
    • /
    • v.11 no.2
    • /
    • pp.97-117
    • /
    • 2003
  • Objective: This study examines the validity and reliability of the Computerized Neurocognitive Function Test among normal elementary school children. Methods: The K-ABC, the K-PIC, and the Computerized Neurocognitive Function Test were administered to 120 normal children (equal numbers of boys and girls) from June 2002 to January 2003. The children had above-average intelligence and passed the exclusion criteria. To verify test-retest reliability, the Computerized Neurocognitive Function Test was administered again 4 weeks later to 30 randomly selected children. Results: In the correlation analysis for validity, four continuous performance tests matched the findings in adults. In the memory tests, the results replicated previous research, with a difference between the forward and backward tests of short-term memory. The higher cognitive function tests each consisted of tests with different purposes. A factor analysis of 43 variables from the 12 tests extracted 10 factors, which together explained 75.5% of the total variance: sustained attention, information processing speed, vigilance, verbal learning, allocation of attention and concept formation, flexibility, concept formation, visual learning, short-term memory, and selective attention, in that order. In the correlation analysis with the K-ABC, conducted to prepare explanatory criteria, selectively significant correlations (p<.05-.001) were found with the K-ABC subscales. The test-retest results reflected practice effects, which were especially prominent in the higher cognitive function tests; however, the split-half reliability (r=0.548-0.7726, p<.05) and internal consistency (0.628-0.878, p<.05) of each group examined were significantly high. Conclusion: The performance of normal children on the Computerized Neurocognitive Function Test showed developmental characteristics different from those of adults, and basal information for preparing the explanatory criteria could be acquired by examining its relation to a standardized intelligence test with a neuropsychological background.
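
The split-half figures reported above come from a standard reliability computation; a minimal sketch of split-half reliability with the Spearman-Brown correction, on an invented score matrix.

```python
# Minimal sketch of split-half reliability: correlate odd- and even-item
# half scores, then apply the Spearman-Brown correction.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(0, 5, size=(30, 20))  # 30 children x 20 test items

odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown: reliability of the full-length test from the half test.
r_full = 2 * r_half / (1 + r_half)
print(f"half correlation={r_half:.3f}, split-half reliability={r_full:.3f}")
```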


Construction of Web-Based Database for Anisakis Research (고래회충 연구를 위한 웹기반 데이터베이스 구축)

  • Lee, Yong-Seok;Baek, Moon-Ki;Jo, Yong-Hun;Kang, Se-Won;Lee, Jae-Bong;Han, Yeon-Soo;Cha, Hee-Jae;Yu, Hak-Sun;Ock, Mee-Sun
    • Journal of Life Science
    • /
    • v.20 no.3
    • /
    • pp.411-415
    • /
    • 2010
  • Anisakis simplex is a parasitic nematode with a complex life cycle involving crustaceans, fish, squid, and whales. When people eat under-processed or raw fish, it causes anisakidosis and also plays a critical role in inducing serious allergic reactions in humans. However, no web-based database on A. simplex at the DNA or protein level has been reported so far. In this context, we constructed a web-based database for Anisakis research, proceeding as follows. First, sequences of the order Ascaridida were downloaded and converted into multi-FASTA format, which was stored as the database for stand-alone BLAST. Second, all of the nucleotide and EST sequences were clustered and assembled, and the EST sequences were translated into amino acid sequences for nuclear localization signal prediction. In addition, we added vector, E. coli, and repeat sequences to the database so that potential contamination can be checked. The web-based database offers several advantages: only data matching nucleotide sequences directly related to the order Ascaridida are found and retrieved in BLAST searches; it is very convenient for checking contamination when constructing cDNA or genomic libraries from Anisakis; and BLAST results on Anisakis sequence information can be accessed quickly. Taken together, the web-based database on A. simplex will be valuable for developing species-specific PCR markers and for studying SNPs in future A. simplex-related research.
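
A minimal sketch of the first construction step: write sequences to multi-FASTA and build a stand-alone BLAST database. It assumes the NCBI BLAST+ makeblastdb binary is on PATH; the sequences and file names are illustrative placeholders.

```python
# Minimal sketch: multi-FASTA export plus a stand-alone BLAST database.
import subprocess

records = {
    "seq1_ascaridida": "ATGCGTACGTTAGC",
    "seq2_ascaridida": "TTGACCGGTAAGCT",
}

# Write the records in multi-FASTA format.
with open("ascaridida.fasta", "w") as fh:
    for name, seq in records.items():
        fh.write(f">{name}\n{seq}\n")

# Build the nucleotide BLAST database for stand-alone searching.
subprocess.run(
    ["makeblastdb", "-in", "ascaridida.fasta",
     "-dbtype", "nucl", "-out", "ascaridida_db"],
    check=True,
)
# Query it with, e.g.: blastn -db ascaridida_db -query probe.fasta
```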