• Title/Summary/Keyword: electronic paper

Improved Method of License Plate Detection and Recognition using Synthetic Number Plate (인조 번호판을 이용한 자동차 번호인식 성능 향상 기법)

  • Chang, Il-Sik;Park, Gooman
    • Journal of Broadcast Engineering / v.26 no.4 / pp.453-462 / 2021
  • Car number recognition requires a large amount of license plate data, balanced across plate designs from the oldest to the most recent. In practice, however, it is difficult to obtain real data covering the full range from past to current plates. To address this problem, deep-learning license plate recognition studies are being conducted with synthetic license plates. Because synthetic data differ from real data, various data augmentation techniques are used to narrow the gap. Existing data augmentation has simply used methods such as brightness adjustment, rotation, affine transformation, blur, and noise. In this paper, we apply, alongside these data augmentation methods, a style transformation method that converts synthetic data into the style of real-world data. In addition, real license plate data are noisy when captured from a distance or in a dark environment, and if characters are recognized directly from such input, the chance of misrecognition is high. To improve character recognition, we applied the DeblurGANv2 method as an image quality improvement step, increasing the accuracy of license plate recognition. YOLO-V5 was used as the deep learning method for both license plate detection and license plate number recognition. To evaluate the synthetic license plate data, we constructed a test set from license plates we collected ourselves. License plate detection without style conversion recorded 0.614 mAP. With the style transformation applied, detection performance improved to 0.679 mAP. In addition, the successful detection rate was 0.872 without image enhancement and 0.915 after image enhancement, confirming that performance improved.
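
The conventional augmentations named in the abstract above (brightness, blur, noise) can be sketched in a few lines. This is only an illustration in plain Python on a grayscale image stored as a list of rows; the function names, the parameter ranges, and the 3x3 mean filter are assumptions made for this sketch, not the authors' pipeline, and the style transformation and DeblurGANv2 steps are not covered.

```python
import random

def adjust_brightness(img, delta):
    """Add delta to every pixel, clamping to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(vals) / len(vals)))
        out.append(row)
    return out

def augment(img, rng):
    """Random brightness shift, blur with probability 0.5, then noise.

    The ranges (-40..40, sigma=5.0) are arbitrary values chosen for illustration.
    """
    out = adjust_brightness(img, rng.randint(-40, 40))
    if rng.random() < 0.5:
        out = box_blur(out)
    return add_gaussian_noise(out, sigma=5.0, rng=rng)

# Usage: produce one augmented copy of a tiny hypothetical 2x2 "plate" image.
plate = [[100, 150], [200, 250]]
augmented = augment(plate, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which helps when comparing detection results across runs.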

Function and Use Evaluation of 'Classification & Disposal Schedule Management' in the Standard Records Management System (표준 기록관리시스템의 '기준관리' 기능 및 이용 평가)

  • Chung, Sang-hee
    • The Korean Journal of Archival Studies / no.37 / pp.189-237 / 2013
  • Since central government agencies began to establish and use the Standard Records Management System (RMS) in 2007, more and more local governments and other public organizations have built RMS. RMS is the essential tool for records management in electronic environments, but it is not known how well its functions reflect records management standards and practice, or how many records managers use RMS in performing their work. This paper evaluates the 'classification & disposal schedule management' function of RMS. This function has four subfunctions: review of classification and preservation period, management of schedule items, assignment of the classification scheme, and reclassification. The classification and disposal schedule is at the heart of intellectual control of records and a core area of records management, so it is important to analyze whether this function performs its role well in RMS. This research evaluated both the function and the use of classification & disposal schedule management in RMS. The functional evaluation compares and analyzes how well RMS meets the functional requirements set by domestic and foreign standards. The use evaluation investigates how records managers use RMS in managing the classification & disposal schedule and what problems arise in that use. Implications were drawn from a survey of records managers working in central government agencies, regional local governments, and basic local governments, and are considered from institutional, functional, use, and administrative aspects. It is important to communicate with stakeholders so that the 'classification & disposal schedule management' function, and further all functions of RMS, can be used smoothly in records management practice. Users of RMS have to raise demands or call for technical solutions to the problems that come up in use, while RMS developers and administrators must make more of an effort to satisfy those demands, reflect them in RMS, and enhance the system.

Analysis of Fire Occurrence Characteristics According to Ignition Heat Sources (발화열원에 따른 화재발생 특성 분석)

  • Lee, Kyung-Su;Kim, Tae-Hyeung;Lee, Jae-Ou
    • Journal of the Society of Disaster Information / v.18 no.2 / pp.280-289 / 2022
  • Purpose: In this study, the characteristics of fire occurrence according to ignition heat sources such as operating equipment, cigarette/lighter fire, and flame/fire were analyzed. Method: One-way ANOVA and cross-analysis were used to analyze the characteristics of fire occurrence by verifying the differences in ignition environment, fire damage status and scale, and cause of ignition according to the ignition heat source. Result: The analysis found that fires caused by operating equipment occurred more frequently on weekdays than fires from other ignition heat sources, and that they had the highest number of victims, the greatest mobilization of firefighting resources, and the greatest property damage. The initial ignition was generated by electric and electronic devices, and the combustion was spread by synthetic resin. Fires caused by cigarettes and lighters occurred most often on Saturdays and Sundays, and were characterized by more mobilization of the police force than of the firefighting force. In particular, it was found that the initial ignition and combustion spread were caused by paper, wood, and hay. Fires caused by flames or sparks also occurred most frequently on Saturdays and Sundays, with initial ignition and combustion spread caused by paper, wood, and hay. In particular, these fires tended to occur in the places farthest from a fire station. The characteristics common to all ignition heat sources were that fires occurred most frequently in the afternoon, that the predominant fire type was the building structure fire, and that the most common extent of burning was the ignition point only. Conclusion: In order to prevent fires and minimize damage, it is necessary to analyze the tendencies of fire occurrence and to prepare appropriate countermeasures according to the fire occurrence factors. In order to analyze the characteristics of fire occurrence using public data in the future, it is necessary to standardize disaster data and to open the data and promote its use.
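
The one-way ANOVA mentioned in the Method above compares group means by taking the ratio of between-group to within-group variance. The following is a minimal sketch of that F statistic in plain Python; the function name and the sample numbers are hypothetical and are not taken from the study's data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)                               # number of groups
    n = sum(len(g) for g in groups)               # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Usage with made-up values for three ignition heat source groups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A p-value would then be read from the F distribution with `df_b` and `df_w` degrees of freedom, which a statistics package normally supplies.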

Records Management and Archives in Korea : Its Development and Prospects (한국 기록관리행정의 변천과 전망)

  • Nam, Hyo-Chai
    • Journal of Korean Society of Archives and Records Management / v.1 no.1 / pp.19-35 / 2001
  • After almost a century of discontinuity in the archival tradition of the Chosun dynasty, Korea entered a new age of records and archival management by legislating and executing the basic laws (the Records and Archives Management of Public Agencies Act of 1999). The Annals of the Chosun dynasty recorded the major historical facts of five hundred years of national affairs. The Annals are a major accomplishment in human history and rare in the world. This was possible because the Annals were composed of records collected, selected, and compiled from primary sources written by generations of historians. Because important public records need to be preserved in their original forms in modern archives, we had to develop and establish a modern archival system to appraise and select important national records for archival preservation. However, the colonization of Korea deprived us of the opportunity to do so, and our fine archival tradition was not carried on. A centralized archival system began to develop with the establishment of GARS under the Ministry of Government Administration in 1969. GARS built a modern repository in Pusan in 1984, succeeding to the tradition of the History Archives of the Chosun dynasty. In 1998, GARS moved its headquarters to the Taejon Government Complex and acquired state-of-the-art audio-visual archives preservation facilities. From 1996, GARS introduced an automated archival management system to replace the manual registration and management system, complementing preservation microfilming. Digitization of the holdings was the key project for providing digital images of archives to users. To do this, GARS purchased new computer and server systems and developed application software. In parallel, GARS drastically renovated its staffing toward a high level of professionalization by recruiting more archivists with historical and library science backgrounds. Conservators and computer system operators were also recruited. The new archival laws have been in effect since January 1, 2000. They brought the following changes to records and archival administration in Korea. First, the laws regulate the records and archives of all public agencies, including the Legislature, the Judiciary, the Administration, the constitutional institutions, the Army, Navy, and Air Force, and the National Intelligence Service. A nationwide unified records and archives management system became available. Second, public archives and records centers are to be established according to the level of the agency: a central archives at the national level, special archives for the National Assembly and the Judiciary, local government archives for metropolitan cities and provinces, and a records center or special records center for administrative agencies. A records manager will be responsible for the records management of each administrative division. Third, the records of public agencies are registered in the computer system as they are produced. The records are therefore traceable and can be searched or retrieved easily through the Internet or a computer network. Fourth, qualified records managers and archivists who are professionally trained in records management and archival science must be assigned, to guarantee the professional management of records and archives. Fifth, the illegal treatment of public records and archives constitutes a punishable crime. In the future, public records and archival management will develop along with the Korean government's 'Electronic Government Project.' The following changes are in prospect. First, public agencies will digitize paper records, audio-visual records, and publications as well as electronic documents, thus promoting administrative efficiency and productivity. Second, the National Assembly has already established its Special Archives. The Judiciary and the National Intelligence Service will follow. More archives will be established at city and provincial levels. Third, the more our society develops into a knowledge-based information society, the more the records management function will become one of the important functions of national government. As more universities, academic associations, and civil society groups participate in promoting archival awareness and in establishing archival science, and as more people come to realize the importance of records and archives management to the point of a national public campaign, records and archival management in Korea will develop in ways significantly different from present practice.

How Enduring Product Involvement and Perceived Risk Affect Consumers' Online Merchant Selection Process: The 'Required Trust Level' Perspective (지속적 관여도 및 인지된 위험이 소비자의 온라인 상인선택 프로세스에 미치는 영향에 관한 연구: 요구신뢰 수준 개념을 중심으로)

  • Hong, Il-Yoo B.;Lee, Jung-Min;Cho, Hwi-Hyung
    • Asia Pacific Journal of Information Systems / v.22 no.1 / pp.29-52 / 2012
  • Consumers differ in the way they make a purchase. An audiophile would willingly make a bold, yet serious, decision to buy a top-of-the-line home theater system while showing no interest in replacing his two-decade-old shabby car. Conversely, an automobile enthusiast wouldn't mind spending forty thousand dollars on a new Jaguar convertible, yet cares little about his junky component system. It is product involvement that helps explain such differences in purchase style among individuals. Product involvement refers to the extent to which a product is perceived to be important to a consumer (Zaichkowsky, 2001). It is an important factor that strongly influences the consumer's purchase decision-making process, and thus has been of prime interest to consumer behavior researchers. Furthermore, researchers have found that involvement is closely related to perceived risk (Dholakia, 2001). While abundant research addresses how product involvement relates to overall perceived risk, little attention has been paid to the relationship between involvement and different types of perceived risk in an electronic commerce setting. Given that perceived risk can be a substantial barrier to online purchase (Jarvenpaa, 2000), research addressing this issue will offer useful implications on which specific types of perceived risk an online firm should focus on mitigating if it is to increase sales to their fullest potential. Meanwhile, past research has focused on consumer responses such as information search and dissemination as consequences of involvement, neglecting other behavioral responses like online merchant selection. For example, will a consumer seriously considering the purchase of a pricey Gucci bag perceive a great degree of risk in online buying and therefore choose to buy it from a digital storefront rather than from an online marketplace to mitigate that risk? Will a consumer require greater trust on the part of the online merchant when the perceived risk of online buying is rather high? We intend to answer these research questions through an empirical study. This paper explores the impact of enduring product involvement and perceived risks on required trust level, and further on online merchant choice. For the purpose of the research, five types or components of perceived risk are taken into consideration: financial, performance, delivery, psychological, and social risks. A research model has been built around the constructs under consideration, and 12 hypotheses have been developed from the model to examine the relationships between enduring involvement and the five components of perceived risk, between the five components of perceived risk and required trust level, between enduring involvement and required trust level, and finally between required trust level and preference toward an e-tailer. To attain our research objectives, we conducted an empirical analysis consisting of two phases of data collection: a pilot test and a main survey. The pilot test was conducted with 25 college students to ensure that the questionnaire items were clear and straightforward. The main survey was then conducted with 295 college students at a major university over nine days, from December 13, 2010 to December 21, 2010. The measures employed to test the model included eight constructs: (1) enduring involvement, (2) financial risk, (3) performance risk, (4) delivery risk, (5) psychological risk, (6) social risk, (7) required trust level, and (8) preference toward an e-tailer. The statistical package SPSS 17.0 was used to test the internal consistency among the items within the individual measures. Based on the Cronbach's α coefficients of the individual measures, the reliability of all the variables is supported. Meanwhile, the Amos 18.0 package was employed to perform a confirmatory factor analysis designed to assess the unidimensionality of the measures. The goodness of fit of the measurement model was satisfactory. Unidimensionality was tested using convergent, discriminant, and nomological validity, and the statistical evidence showed that all three types of validity were satisfied. Next, structural equation modeling was used to analyze the individual paths among the research constructs. The results indicated that enduring involvement has significant positive relationships with all five components of perceived risk, while only performance risk is significantly related to the trust level required by consumers for purchase. It can be inferred from the findings that product performance problems are most likely to occur when a merchant behaves in an opportunistic manner. Positive relationships were also found between involvement and required trust level and between required trust level and online merchant choice. Enduring involvement is concerned with the pleasure a consumer derives from a product class and/or with the desire for knowledge about the product class, and thus is likely to motivate the consumer to look for ways of mitigating perceived risk by requiring a higher level of trust on the part of the online merchant. Likewise, a consumer requiring a high level of trust in the merchant will choose a digital storefront rather than an e-marketplace, since a digital storefront is believed to be more trustworthy than an e-marketplace because it fulfills orders itself rather than acting as an intermediary. The findings of the present research have both academic and practical implications. The first academic implication is that enduring product involvement is a strong motivator of consumer responses, especially the selection of a merchant, in the context of electronic shopping. Secondly, academicians are advised to note that an individual component or type of perceived risk can be used as an important research construct, since it allows one to pinpoint the specific types of risk that are influenced by antecedents or that influence consequents. Meanwhile, our research provides implications useful for online merchants (both online storefronts and e-marketplaces). Merchants may develop strategies to attract consumers by managing the perceived performance risk involved in purchase decisions, since it was found to have a significant positive relationship with the level of trust required by a consumer on the part of the merchant. One way to manage performance risk would be to thoroughly examine the product before shipping to ensure that it has no deficiencies or flaws. Secondly, digital storefronts are advised to focus on symbolic goods (e.g., cars, cell phones, fashion outfits, and handbags), in which consumers are relatively more involved, whereas e-marketplaces should put their emphasis on non-symbolic goods (e.g., drinks, books, MP3 players, and bike accessories).
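
The internal-consistency check described above rests on Cronbach's α, which for k items is k/(k-1) multiplied by one minus the ratio of summed item variances to the variance of the total score. The sketch below computes it in plain Python; the function name and the sample scores are made up for illustration and do not reproduce the study's SPSS analysis or data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for one construct.

    items: list of k lists, each holding one questionnaire item's scores
    across the same respondents, in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-respondent sum
    item_var = sum(variance(col) for col in items)     # sample variances
    return k / (k - 1) * (1 - item_var / variance(totals))

# Usage: three hypothetical items answered by four respondents.
alpha = cronbach_alpha([[2, 4, 4, 5], [3, 4, 5, 5], [2, 3, 5, 4]])
```

A common rule of thumb treats values of about .70 or higher as acceptable, though the threshold is a convention rather than a property of the formula.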

The Effects of Environmental Dynamism on Supply Chain Commitment in the High-tech Industry: The Roles of Flexibility and Dependence (첨단산업의 환경동태성이 공급체인의 결속에 미치는 영향: 유연성과 의존성의 역할)

  • Kim, Sang-Deok;Ji, Seong-Goo
    • Journal of Global Scholars of Marketing Science / v.17 no.2 / pp.31-54 / 2007
  • The exchange between buyers and sellers in the industrial market is changing from short-term to long-term relationships. Long-term relationships are governed mainly by formal contracts or informal agreements, but many scholars now assert that controlling a relationship with formal contracts under environmental dynamism is inappropriate. In this case, partners will depend on each other's flexibility or interdependence. The former, flexibility, provides a general frame of reference, order, and standards against which to guide and assess appropriate behavior in dynamic and ambiguous situations, thus motivating the value-oriented performance goals shared between partners. It is based on social sacrifices, which can potentially minimize opportunistic behaviors. The latter, interdependence, means that each firm has a high level of dependence in a dynamic channel relationship. When interdependence is high in magnitude and symmetric, each firm enjoys a high level of power and the bonds between the firms should be reasonably strong. Strong shared power is likely to promote commitment because of the common interests, attention, and support found in such channel relationships. This study deals with environmental dynamism in the high-tech industry. Firms in the high-tech industry regard coping successfully with environmental changes as a key success factor. However, because few studies deal with environmental dynamism and supply chain commitment in the high-tech industry, it is very difficult to find effective strategies for coping with them. This paper presents the results of an empirical study on the relationship between environmental dynamism and supply chain commitment in the high-tech industry. We examined the effects of consumer, competitor, and technological dynamism on supply chain commitment. Additionally, we examined the moderating effects of the flexibility and dependence of supply chains. This study was confined to the type of high-tech industry characterized by rapid technology change and short product lifecycles. Flexibility among the firms of this industry, which is characterized by hard and fast growth, is more important than in any other industry. Thus, a variety of environmental dynamism can affect a supply chain relationship. The targeted industries were the electronic parts, metal product, computer, electric machine, automobile, and medical precision manufacturing industries. Data were collected as follows. During the survey, the researchers obtained the list of parts suppliers of two companies, N and L, which are internationally competitive in the mobile phone manufacturing industry, and of the suppliers in a business relationship with S company, a semiconductor manufacturer. They were asked to respond to the survey via telephone and e-mail. During the two month period of February-April 2006, we were able to collect data from 44 companies. The respondents were restricted to staff of the subcontractor (the supplier) with direct dealing authority and at least three months of experience dealing with a manufacturer (an industrial material buyer). The measurement validation procedures covered scale reliability and discriminant and convergent validity. The traditionally employed reliability measures, such as Cronbach's alpha, were also used. All the reliabilities were greater than .70. A series of exploratory factor analyses was conducted. We conducted confirmatory factor analyses to assess the validity of our measurements. A series of chi-square difference tests was conducted to ensure discriminant validity. For each pair, we estimated two models (an unconstrained model and a constrained model) and compared the two model fits. All these tests supported discriminant validity. Also, all items loaded significantly on their respective constructs, providing support for convergent validity. We then examined composite reliability and average variance extracted (AVE). The composite reliability of each construct was greater than .70. The AVE of each construct was greater than .50. According to the multiple regression analysis, customer dynamism had a negative effect and competitor dynamism had a positive effect on a supplier's commitment. In addition, flexibility and dependence had significant moderating effects on customer and competitor dynamism. On the other hand, none of the hypotheses about technological dynamism showed significant effects on commitment. In other words, technological dynamism had no direct effect on the supplier's commitment and was not moderated by the flexibility and dependence of the supply chain. This study contributes as a rare study on environmental dynamism and supply chain commitment in the high-tech industry. In particular, it verified the effects of three sectors of environmental dynamism on the supplier's commitment. It also empirically tested how those effects were moderated by flexibility and dependence. The results showed that flexibility and interdependence strengthen the supplier's commitment under environmental dynamism in the high-tech industry. Thus, relationship managers in the high-tech industry should make supply chain relationships flexible and interdependent. The limitations of the study are as follows. First, regarding the research setting, the study was conducted in the high-tech industry, in which the direction of change in the power balance of supply chain dyads is usually determined by manufacturers. The results are therefore difficult to generalize, and a future study needs to control for the power structure between partners. Second, we treated flexibility as positive throughout the paper, but it can also be negative, for example violating an agreement or moving in the wrong direction. Therefore, the multi-dimensionality of flexibility needs to be investigated in future research.
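
The composite reliability and AVE thresholds quoted above (.70 and .50) are usually computed from standardized factor loadings. The sketch below uses the common formulas, assuming each item's error variance equals one minus its squared loading; the function names and the loadings are hypothetical, and the abstract does not state which exact formulas the authors used.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Assumes standardized loadings, so each error variance is 1 - loading^2.
    """
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Usage: made-up loadings for a three-item construct.
loadings = [0.8, 0.8, 0.8]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
meets_thresholds = cr > 0.70 and ave > 0.50
```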

Oxide perovskite crystals type ABCO4: application and growth

  • Pajaczkowska, A.
    • Proceedings of the Korea Association of Crystal Growth Conference / 1996.06a / pp.258-292 / 1996
  • In recent years, great interest has arisen in the preparation of YBCO thin films on different substrate materials. Preparing an epitaxial film is a very difficult problem, and there are many requirements that substrate materials must fulfill. The main problems are lattice mismatch (misfit) and similarity of structure. From paper [1] it follows that the difference in interatomic distances and angles between substrate and film is a more important problem than similarity of structure. In this work we present the interatomic distances and angle relations between substrate materials of different orientations belonging to the ABCO4 group (where A is Sr or Ca, B is a rare earth element, and C is Al or Ga) and YBCO thin films. Many materials are used as substrates for HTsC thin films. The ABCO4 group of compounds is characterized by small dielectric constants (necessary for microwave applications of HTsC films), absence of twins, and small misfit [2]. The three most interesting compounds, CaNdAlO4, SrLaAlO4, and SrLaGaO4, were investigated. All these compounds have a pseudo-perovskite structure with space group I4/mmm. This structure is very similar to the structure of YBCO. The SLG substrate has the lowest misfit (0.3%) and dielectric constant. For the preparation of thin films, substrate planes of <100> orientation are mainly used for this group of compounds. Good quality films of <001> orientation are obtained [3]. In this case not only the a-a misfit plays a role; the c-3b misfit is very important too. Sometimes substrates of <001> and <110> orientations were manufactured for the preparation of thin films [3]. Different misfits for different YBCO faces have been analyzed. It has been found that the mismatch factor for the (100) face is very similar to that for the (001) face, so thin films can be prepared on both orientations. SrLaAlO4 (SLA) and SrLaGaO4 (SLG) crystals of general formula ABCO4 have been grown by the Czochralski method. The quality of SLA and SLG crystals depends strongly on the axial temperature gradient and on the growth and rotation rates. High quality crystals were obtained at an axial temperature gradient near the crystal-melt interface lower than 50℃/cm, a growth rate of 1-3 mm/h, and a rotation rate varying from 10 to 20 rpm [4]. Strong anisotropy in the morphology of SLA and SLG single crystals grown by the Czochralski method is clearly visible. On the basis of our considerations for ABCO4-type tetragonal crystals, {001}, {101}, and {110} faces can appear for the ionic-type model [5]. The morphology of these crystals depends on the ionic-covalent character of the bonding and on the crystal growth parameters. Point defects are observed in the crystals and are reflected in color changes (colorless, yellow, green). Point defects are detected in directions perpendicular to the oxide planes and are connected with instability of the oxygen position in the lattice. To investigate facet formation, crystals were doped with Cr3+, Er3+, Pr3+, and Ba2+. Chromium, a larger ion that substitutes for Al3+, clearly induces faceting. {110} faces appear easily, and SLA crystals crack even when the amount of Cr is below 0.3 at.%. SLG single crystals are not so sensitive to the content of chromium ions. It was also found that if a {110} face appears at the beginning of the growth process, the crystal changes its color on the {110} plane, but only on the shoulder part. The projection of the {110} face has a great number of oxygen positions, which can easily become defective. Pure and doped SLA and SLG crystals measured by EPR in the <110> direction show more intense lines than in other directions, which suggests that the number of oxygen defects on the {110} plane is higher. In order to find the origin of the colors and their relation to crystal stability, a set of SLA and SLG crystals was investigated using optical spectroscopy. The colored samples exhibit an absorption band stretching from the UV absorption edge of the crystal, at about 240 nm, to about 550 nm. In the case of the colorless sample, the absorption spectrum consists of a relatively weak band in the UV region. The spectral positions and intensities of the absorption bands of SLA are typical of imperfections similar to color centers, which may be created in most oxide crystals by UV and X-radiation. It is pointed out that the growth of multicomponent oxide crystals by the Czochralski method depends on the preparation of the melt and its stoichiometry, the orientation of the seed, the temperature gradient at the crystal-melt interface, the growth parameters (rotation and pulling rate), and control of the red-ox atmosphere during seeding and growth. Growth parameters influence the morphology of the crystal-melt interface and the type and concentration of defects.
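
The lattice misfit discussed above is a relative difference between substrate and film lattice spacings. Conventions differ on which spacing goes in the denominator; the sketch below normalizes by the substrate spacing, which is an assumption of this example and may not match the definition used in the paper. The function name and the spacings are placeholders, not measured values for SLG, SLA, or YBCO.

```python
def lattice_misfit_percent(a_substrate, a_film):
    """Relative mismatch (a_substrate - a_film) / a_substrate, in percent.

    A positive value means the film spacing is smaller than the substrate's.
    """
    return 100.0 * (a_substrate - a_film) / a_substrate

# Usage with placeholder spacings in angstroms (hypothetical numbers).
misfit = lattice_misfit_percent(a_substrate=4.0, a_film=3.9)
```

For an orthorhombic film on a tetragonal substrate, the same function can be applied separately to each in-plane direction to compare orientations.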

A Study On Design of ZigBee Chip Communication Module for Remote Radiation Measurement (원격 방사선 측정을 위한 ZigBee 원칩형 통신 모듈 설계에 대한 연구)

  • Lee, Joo-Hyun;Lee, Seung-Ho
    • Journal of IKEEE / v.18 no.4 / pp.552-558 / 2014
  • This paper proposes a design for a ZigBee-chip-based communication module for remote measurement of radiation levels. The proposed communication module consists of the two control processors generally required to configure a ZigBee system and a one-chip module that forms the ZigBee RF device. The ZigBee-chip-based communication module for remote radiation measurement consists of a wireless communication controller; a sensor and high-voltage generator; a charger and power supply circuit; a wired communication part; and an RF circuit and antenna. The wireless communication controller controls ZigBee wireless communication and measures the radiation level remotely. The sensor and high-voltage generator generates 500 V in two consecutive stages and amplifies and filters the radiation pulses detected by the G-M tube. The charger and power supply circuit charges the lithium-ion battery and supplies power to the one-chip processors. The wired communication part provides an RS-485/422 interface for wired remote communication and a USB interface for connecting to a PC and for debugging. The RF circuit and antenna part uses RLC passive components with a chip antenna to form the balun and the antenna impedance matching circuit, allowing wireless communication. After the ZigBee-chip-based communication module was configured, tests were conducted to measure radiation levels remotely: data were successfully transmitted over distances of 10 m and 100 m, so the radiation level was measured under remote conditions. The communication module makes remote radiation measurement economical, since it both consumes less electricity and costs less. By securing the linearity of the radiation measuring device and miniaturizing the device itself, it is possible to set up an environment in which radiation can be measured reliably and the radiation level monitored in real time.

Odysseus/Parallel-OOSQL: A Parallel Search Engine using the Odysseus DBMS Tightly-Coupled with IR Capability (오디세우스/Parallel-OOSQL: 오디세우스 정보검색용 밀결합 DBMS를 사용한 병렬 정보 검색 엔진)

  • Ryu, Jae-Joon;Whang, Kyu-Young;Lee, Jae-Gil;Kwon, Hyuk-Yoon;Kim, Yi-Reun;Heo, Jun-Suk;Lee, Ki-Hoon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.4
    • /
    • pp.412-429
    • /
    • 2008
  • As the amount of electronic documents increases rapidly with the growth of the Internet, a parallel search engine capable of handling a large number of documents is becoming ever more important. To implement a parallel search engine, we need to partition the inverted index and search through the partitioned index in parallel. There are two methods of partitioning the inverted index: 1) document-identifier based partitioning and 2) keyword-identifier based partitioning. However, each method alone has the following drawbacks. The former is convenient for inserting documents and has high throughput, but has poor performance for top-k query processing. The latter has good performance for top-k query processing, but is inconvenient for inserting documents and has low throughput. In this paper, we propose a hybrid partitioning method to compensate for the drawbacks of each method. We design and implement a parallel search engine that supports the hybrid partitioning method using the Odysseus DBMS tightly coupled with information retrieval capability. We first introduce the architecture of the parallel search engine, Odysseus/Parallel-OOSQL. We then show the effectiveness of the proposed system through systematic experiments. The experimental results show that the query processing time of the document-identifier based partitioning method is approximately inversely proportional to the number of blocks in the partition of the inverted index. The results also show that the keyword-identifier based partitioning method has good performance in top-k query processing. The proposed parallel search engine can be optimized for performance by customizing the methods of partitioning the inverted index according to the application environment. The Odysseus/Parallel-OOSQL parallel search engine is capable of indexing, storing, and querying 100 million web documents per node or tens of billions of web documents for the entire system.
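
The two partitioning schemes named in this abstract can be illustrated with a toy in-memory index. The following Python sketch is not the Odysseus/Parallel-OOSQL implementation; the function names, the whitespace tokenizer, and the use of `zlib.crc32` to assign keywords to partitions are all assumptions made for illustration, and it does not attempt the hybrid method or top-k ranking.

```python
import zlib
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def partition_by_document(docs, n):
    """Document-identifier based: each partition indexes a disjoint subset of documents."""
    subsets = [{} for _ in range(n)]
    for doc_id, text in docs.items():
        subsets[doc_id % n][doc_id] = text
    return [build_inverted_index(subset) for subset in subsets]


def partition_by_keyword(docs, n):
    """Keyword-identifier based: each partition owns the full posting lists of some terms."""
    partitions = [{} for _ in range(n)]
    for term, postings in build_inverted_index(docs).items():
        # crc32 gives a hash that is stable across runs (hypothetical choice).
        partitions[zlib.crc32(term.encode("utf-8")) % n][term] = postings
    return partitions


def search_document_partitioned(partitions, term):
    """Every partition must be consulted; the partial results are merged."""
    result = set()
    for part in partitions:
        result |= part.get(term, set())
    return result


def search_keyword_partitioned(partitions, term):
    """Only the partition that owns the term is consulted."""
    owner = zlib.crc32(term.encode("utf-8")) % len(partitions)
    return set(partitions[owner].get(term, set()))


docs = {
    0: "parallel search engine",
    1: "search index",
    2: "parallel index partition",
    3: "engine",
}
by_doc = partition_by_document(docs, 2)
by_kw = partition_by_keyword(docs, 2)
```

In this sketch, inserting a document under document-based partitioning touches one partition only, whereas under keyword-based partitioning it may touch one partition per distinct term, which is consistent with the insertion trade-off the abstract describes.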

Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia pacific journal of information systems
    • /
    • v.18 no.1
    • /
    • pp.79-96
    • /
    • 2008
  • One of the roles of the Semantic Web services is to execute dynamic intra-organizational services including the integration and interoperation of business processes. Since different organizations design their processes differently, the retrieval of similar semantic business processes is necessary in order to support inter-organizational collaborations. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand results from an exact matching engine when querying the OWL (Web Ontology Language) MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes. The Handbook is intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. In order to use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we need to find a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning and used subjective ratings for correct answers and similarity values between processes. To generate a semantic-preserving test data set, we create 20 variants for each target process that are syntactically different but semantically equivalent using mutation operators. These variants represent the correct answers of the target process. We devise diverse similarity algorithms based on values of process attributes and structures of business processes.
We use simple similarity algorithms for text retrieval such as TF-IDF and Levenshtein edit distance to devise our approaches, and use a tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider similarity of process structure, such as part process, goal, and exception. Since we can identify relationships between a semantic process and its subcomponents, this information can be used for calculating similarities between processes. Dice's coefficient and Jaccard similarity measures are used to calculate the proportion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms. We measure the retrieval performance in terms of precision, recall, and F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance in terms of all measures. TF-IDF and the method incorporating the TF-IDF measure and Levenshtein edit distance perform better than the other devised methods. These two measures focus on the similarity between the names and descriptions of processes. In addition, we calculate a rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values among the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and derivatives of these measures, show greater coefficients than measures based on values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, performs reasonably well in both experiments. For retrieving semantic processes, it therefore appears better to consider diverse aspects of process similarity, such as process structure and values of process attributes.
We generate semantic process data and a dataset for retrieval experiments from the MIT Process Handbook repository. We suggest imprecise query algorithms that expand retrieval results from an exact matching engine such as SPARQL, and compare the retrieval performance of the similarity algorithms. As for limitations and future work, we need to perform experiments with other datasets from other domains. In addition, since there are many similarity values from diverse measures, we may find better ways to identify relevant processes by applying these values simultaneously.
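
Several of the measures named in this abstract have standard textbook definitions, shown in the minimal Python sketch below. It is not the authors' code: the function names are hypothetical, processes are assumed to be represented as plain sets of subcomponent labels, and the TF-IDF, tree edit distance, and combined Lev-TFIDF-JaccardAll measures are not reproduced because the abstract does not specify them.

```python
def jaccard(a, b):
    """Jaccard similarity: |A and B| / |A or B| over two collections of labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice's coefficient: 2 * |A and B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(s, t):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]


def precision_recall_f(retrieved, relevant):
    """Precision, recall, and F-measure (harmonic mean of the two)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0, 0.0, 0.0
    p = hits / len(retrieved)
    r = hits / len(relevant)
    return p, r, 2 * p * r / (p + r)


# Hypothetical subcomponent labels for two processes.
process_a = {"receive order", "check stock", "ship goods"}
process_b = {"check stock", "ship goods", "send invoice"}
```

For the two example processes, two of the four distinct labels are shared, so the Jaccard similarity is 0.5 and Dice's coefficient is 2/3.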