1. Introduction
Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate?
Although much less pleasurable and subjective, that is the type of problem that query optimizers are called to solve. Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer. All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run. What is the plan that needs the least amount of time?
Partially supported by the National Science Foundation under Grants IRI-9113736 and IRI-9157368 (PYI Award) and by grants from DEC, IBM, HP, AT&T, Informix, and Oracle.
Such query optimization is absolutely necessary in a DBMS. The cost difference between two alternatives can be enormous. For example, consider the following database schema, which will be used throughout this chapter:
emp(name,age,sal,dno)
dept(dno,dname,floor,budget,mgr,ano)
acnt(ano,type,balance,bno)
bank(bno,bname,address)
Further, consider the following very simple SQL query:
select name, floor
from emp, dept
where emp.dno=dept.dno and sal>100K.
Assume the characteristics below for the database contents, structure, and run-time environment:
Consider the following three different plans:
P1 Through the B+-tree, find all tuples of emp that satisfy the selection on emp.sal. For each one, use the hashing index to find the corresponding dept tuples. (Nested loops, using the index on both relations.)
P2 For each dept page, scan the entire emp relation. If an emp tuple agrees on the dno attribute with a tuple on the dept page and satisfies the selection on emp.sal, then the emp-dept tuple pair appears in the result. (Page-level nested loops, using no index.)
P3 For each dept tuple, scan the entire emp relation and store all emp-dept tuple pairs. Then, scan this set of pairs and, for each one, check if it has the same values in the two dno attributes and satisfies the selection on emp.sal. (Tuple-level formation of the cross product, with subsequent scan to test the join and the selection.)
Calculating the expected I/O costs of these three plans shows the tremendous difference in efficiency that equivalent plans may have. P1 needs 0.32 seconds, P2 needs a bit more than an hour, and P3 needs more than a whole day. Without query optimization, a system may choose plan P2 or P3 to execute this query with devastating results. Query optimizers, however, examine "all" alternatives, so they should have no trouble choosing P1 to process the query.
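The gap between P1 and P2 comes from simple page-access counting. The sketch below is a minimal cost model under stated assumptions: a cost of 20 ms per page access, a clustered B+-tree on emp.sal, a hash index on dept.dno, and illustrative values for index depth, relation sizes, and bucket length. The function names and every numeric input are hypothetical and chosen only for illustration.

```python
# Hypothetical page-I/O cost model for plans P1 and P2 of the example.
# Every numeric input below is an illustrative assumption.

PAGE_ACCESS_SECONDS = 0.020  # assumed cost of one disk page access


def cost_p1(btree_levels: int, emp_result_pages: int,
            emp_result_tuples: int, avg_bucket_pages: float) -> float:
    """Index nested loops: one descent of the B+-tree on emp.sal, a read
    of the clustered emp pages that qualify, then one probe of the hash
    index on dept.dno per qualifying emp tuple."""
    page_ios = (btree_levels + emp_result_pages
                + emp_result_tuples * avg_bucket_pages)
    return page_ios * PAGE_ACCESS_SECONDS


def cost_p2(dept_pages: int, emp_pages: int) -> float:
    """Page-level nested loops with no index: read each dept page once
    and, for each of them, scan every page of emp."""
    page_ios = dept_pages + dept_pages * emp_pages
    return page_ios * PAGE_ACCESS_SECONDS


if __name__ == "__main__":
    p1 = cost_p1(btree_levels=3, emp_result_pages=1,
                 emp_result_tuples=10, avg_bucket_pages=1.2)
    p2 = cost_p2(dept_pages=10, emp_pages=20_000)
    print(f"P1: {p1:.2f} s")
    print(f"P2: {p2:.1f} s ({p2 / 3600:.2f} h)")
```

With these assumed inputs, P1 costs 16 page accesses (0.32 s) and P2 costs 200,010 page accesses (about 4,000 s, just over an hour). P3 is left out because its cost also depends on how the stored pairs are written and reread, which would require further assumptions.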
The path that a query traverses through a DBMS until its answer is generated is shown in Figure 1. The system modules through which it moves have the following functionality:
- The Query Parser checks the validity of the query and then translates it into an internal form, usually a relational calculus expression or something equivalent.
- The Query Optimizer examines all algebraic expressions that are equivalent to the given query and chooses the one that is estimated to be the cheapest.
- The Code Generator or the Interpreter transforms the access plan generated by the optimizer into calls to the query processor.
- The Query Processor actually executes the query.
Queries are posed to a DBMS by interactive users or by programs written in general-purpose programming languages (e.g., C/C++, Fortran, PL-1) that have queries embedded in them. An interactive (ad hoc) query goes through the entire path shown in Figure 1. On the other hand, an embedded query goes through the first three steps only once, when the program in which it is embedded is compiled (compile time). The code produced by the Code Generator is stored in the database and is simply invoked and executed by the Query Processor whenever control reaches that query during the program execution (run time). Thus, independent of the number of times an embedded query needs to be executed, optimization is not repeated until database updates make the access plan invalid (e.g., index deletion) or highly suboptimal (e.g., extensive changes in database contents). There is no real difference between optimizing interactive or embedded queries, so we make no distinction between the two in this chapter.
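The compile-once behavior of embedded queries can be sketched as a plan cache keyed by query text and stamped with a catalog version. This is a minimal sketch under assumptions: the class, its methods, and the single global version counter are hypothetical, and real systems track plan dependencies in their own ways.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class PlanCache:
    """Hypothetical store of compiled access plans for embedded queries.

    A plan is compiled (parsed, optimized, code-generated) once and reused
    until the catalog version changes, e.g. after an index is dropped.
    """
    compile_plan: Callable[[str], str]
    catalog_version: int = 0
    compilations: int = 0
    _plans: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def invalidate(self) -> None:
        # Called when a database update may make stored plans invalid
        # or highly suboptimal.
        self.catalog_version += 1

    def get(self, query: str) -> str:
        cached = self._plans.get(query)
        if cached is not None and cached[0] == self.catalog_version:
            return cached[1]  # run time: reuse the stored plan
        plan = self.compile_plan(query)  # compile time: optimize once
        self.compilations += 1
        self._plans[query] = (self.catalog_version, plan)
        return plan


if __name__ == "__main__":
    cache = PlanCache(compile_plan=lambda q: f"plan for: {q}")
    q = "select name, floor from emp, dept where emp.dno=dept.dno"
    for _ in range(5):
        cache.get(q)
    print(cache.compilations)  # 1: optimized only once
    cache.invalidate()         # e.g. an index was deleted
    cache.get(q)
    print(cache.compilations)  # 2: re-optimized after invalidation
```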
The area of query optimization is very large within the database field. It has been studied in a great variety of contexts and from many different angles, giving rise to several diverse solutions in each case. The purpose of this chapter is to primarily discuss the core problems in query optimization and their solutions, and only touch upon the wealth of results that exist beyond that. More specifically, we concentrate on optimizing a single flat SQL query with `and' as the only boolean connective in its qualification (also known as conjunctive query, select-project-join query, or nonrecursive Horn clause) in a centralized relational DBMS, assuming that full knowledge of the run-time environment exists at compile time. Likewise, we make no attempt to provide a complete survey of the literature, in most cases providing only a few example references. More extensive surveys can be found elsewhere [JK84, MCS88].
The rest of the chapter is organized as follows. Section 2 presents a modular architecture for a query optimizer and describes the role of each module in it. Section 3 analyzes the choices that exist in the shapes of relational query access plans, and the restrictions usually imposed by current optimizers to make the whole process more manageable. Section 4 focuses on the dynamic programming search strategy used by commercial query optimizers and briefly describes alternative strategies that have been proposed. Section 5 defines the problem of estimating the sizes of query results and/or the frequency distributions of values in them, and describes in detail histograms, which represent the statistical information typically used by systems to derive such estimates. Section 6 discusses query optimization in non-centralized environments, i.e., parallel and distributed DBMSs. Section 7 briefly touches upon several advanced types of query optimization that have been proposed to solve some hard problems in the area. Finally, Section 8 summarizes the chapter and raises some questions related to query optimization that still have no good answer.
2 Query Optimizer Architecture
2.1 Overall Architecture
In this section, we provide an abstraction of the query optimization process in a DBMS. Given a database and a query on it, several execution plans exist that can be employed to answer the query. In principle, all the alternatives need to be considered so that the one with the best estimated performance is chosen. An abstraction of the process of generating and testing these alternatives is shown in Figure 2, which is essentially a modular architecture of a query optimizer. Although one could build an optimizer based on this architecture, in real systems, the modules shown do not always have so clear-cut boundaries as in Figure 2. Based on Figure 2, the entire query optimization process can be seen as having two stages: rewriting and planning. There is only one module in the first stage, the Rewriter, whereas all other modules are in the second stage. The functionality of each of the modules in Figure 2 is analyzed below.
2.2 Module Functionality
Rewriter: This module applies transformations to a given query and produces equivalent queries that are hopefully more efficient, e.g., replacement of views with their definition, flattening out of nested queries, etc. The transformations performed by the Rewriter depend only on the declarative, i.e., static, characteristics of queries and do not take into account the actual query costs for the specific DBMS and database concerned. If the rewriting is known or assumed to always be beneficial, the original query is discarded; otherwise, it is sent to the next stage as well. By the nature of the rewriting transformations, this stage operates at the declarative level.
Planner: This is the main module of the planning stage. It examines all possible execution plans for each query produced in the previous stage and selects the overall cheapest one to be used to generate the answer of the original query. It employs a search strategy, which examines the space of execution plans in a particular fashion. This space is determined by two other modules of the optimizer, the Algebraic Space and the Method-Structure Space. For the most part, these two modules and the search strategy determine the cost, i.e., running time, of the optimizer itself, which should be as low as possible. The execution plans examined by the Planner are compared based on estimates of their cost so that the cheapest may be chosen. These costs are derived by the last two modules of the optimizer, the Cost Model and the Size-Distribution Estimator.
Algebraic Space: This module determines the action execution orders that are to be considered by the Planner for each query sent to it. All such series of actions produce the same query answer, but usually differ in performance. They are usually represented in relational algebra as formulas or in tree form. Because of the algorithmic nature of the objects generated by this module and sent to the Planner, the overall planning stage is characterized as operating at the procedural level.
Method-Structure Space: This module determines the implementation choices that exist for the execution of each ordered series of actions specified by the Algebraic Space. This choice is related to the available join methods for each join (e.g., nested loops, merge scan, and hash join), if supporting data structures are built on the fly, if/when duplicates are eliminated, and other implementation characteristics of this sort, which are predetermined by the DBMS implementation.
This choice is also related to the available indices for accessing each relation, which is determined by the physical schema of each database stored in its catalogs. Given an algebraic formula or tree from the Algebraic Space, this module produces all corresponding complete execution plans, which specify the implementation of each algebraic operator and the use of any indices.
Cost Model: This module specifies the arithmetic formulas that are used to estimate the cost of execution plans. For every different join method, for every different index type access, and in general for every distinct kind of step that can be found in an execution plan, there is a formula that gives its cost. Given the complexity of many of these steps, most of these formulas are simple approximations of what the system actually does and are based on certain assumptions regarding issues like buffer management, disk-CPU overlap, sequential vs. random I/O, etc. The most important input parameters to a formula are the size of the buffer pool used by the corresponding step, the sizes of relations or indices accessed, and possibly various distributions of values in these relations. While the first one is determined by the DBMS for each query, the other two are estimated by the Size-Distribution Estimator.
Size-Distribution Estimator: This module specifies how the sizes (and possibly frequency distributions of attribute values) of database relations and indices as well as (sub)query results are estimated. As mentioned above, these estimates are needed by the Cost Model. The specific estimation approach adopted in this module also determines the form of statistics that need to be maintained in the catalogs of each database, if any.
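The division of labor among the planning-stage modules can be sketched as a handful of cooperating functions. Everything here is a toy under stated assumptions: the cardinalities, the single uniform join selectivity (applied, for simplicity, to every pair of relations), and the per-method cost formulas are invented for illustration, and the search strategy is plain exhaustive enumeration rather than the dynamic programming used by commercial optimizers. All function names are hypothetical.

```python
import math
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical statistics kept in the catalogs (illustrative values).
CARDINALITY: Dict[str, int] = {"emp": 1000, "dept": 100, "acnt": 500}
JOIN_SELECTIVITY = 0.01  # assumed, and applied to every pair for simplicity
JOIN_METHODS = ("nested_loops", "merge_scan", "hash_join")

Order = Tuple[str, ...]
Plan = Tuple[Order, Tuple[str, ...]]  # (join order, join method per join)


def algebraic_space(relations: List[str]) -> List[Order]:
    """Action execution orders to consider: every left-deep join order."""
    return list(permutations(relations))


def method_structure_space(order: Order) -> List[Plan]:
    """Complete execution plans: pick one join method for each join."""
    plans: List[Plan] = [(order, ())]
    for _ in range(len(order) - 1):
        plans = [(o, ms + (m,)) for o, ms in plans for m in JOIN_METHODS]
    return plans


def estimate_size(left_rows: float, right_rows: float) -> float:
    """Size-Distribution Estimator: uniform join selectivity."""
    return left_rows * right_rows * JOIN_SELECTIVITY


def join_cost(method: str, left_rows: float, right_rows: float) -> float:
    """Cost Model: crude, made-up per-method formulas."""
    if method == "nested_loops":
        return left_rows * right_rows
    if method == "merge_scan":  # sort both inputs, then merge them
        return (left_rows * math.log2(left_rows + 1)
                + right_rows * math.log2(right_rows + 1)
                + left_rows + right_rows)
    return 3.0 * (left_rows + right_rows)  # hash_join: build, then probe


def plan_cost(plan: Plan) -> float:
    """Sum the join costs along a left-deep order, tracking result sizes."""
    order, methods = plan
    rows = float(CARDINALITY[order[0]])
    total = 0.0
    for rel, method in zip(order[1:], methods):
        inner = float(CARDINALITY[rel])
        total += join_cost(method, rows, inner)
        rows = estimate_size(rows, inner)
    return total


def planner(relations: List[str]) -> Tuple[Plan, float]:
    """Search strategy: exhaustively cost every plan, keep the cheapest."""
    candidates = (plan for order in algebraic_space(relations)
                  for plan in method_structure_space(order))
    best = min(candidates, key=plan_cost)
    return best, plan_cost(best)


if __name__ == "__main__":
    (order, methods), cost = planner(["emp", "dept", "acnt"])
    print(order, methods, round(cost, 1))
```

Keeping the modules as separate functions mirrors the architecture: swapping in a different cost formula or estimator changes which plan wins without touching the search strategy.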
2.3 Description Focus
Of the six modules of Figure 2, three are not discussed in any detail in this chapter: the Rewriter, the Method-Structure Space, and the Cost Model. The Rewriter is a module that exists in some commercial DBMSs (e.g., DB2-Client/Server and Illustra), although not in all of them. Most of the transformations normally performed by this module are considered an advanced form of query optimization, and not part of the core (planning) process. The Method-Structure Space specifies alternatives regarding join methods, indices, etc., which are based on decisions made outside the development of the query optimizer and do not really affect much of the rest of it. For the Cost Model, for each alternative join method, index access, etc., offered by the Method-Structure Space, either there is a standard straightforward formula that people have devised by simple accounting of the corresponding actions (e.g., the formula for tuple-level nested loops join) or there are numerous variations of formulas that people have proposed and used to approximate these actions (e.g., formulas for finding the tuples in a relation having a random value in an attribute). In either case, the derivation of these formulas is not considered an intrinsic part of the query optimization field. For these reasons, we do not discuss these three modules any further until Section 7, where some Rewriter transformations are described. The following three sections provide a detailed description of the Algebraic Space, the Planner, and the Size-Distribution Estimator modules, respectively.
3 Algebraic Space
As mentioned above, a flat SQL query corresponds to a select-project-join query in relational algebra. Typically, such an algebraic query is represented by a query tree whose leaves are database relations and non-leaf nodes are algebraic operators like selections (denoted by $\sigma$), projections (denoted by $\pi$), and joins (denoted by $\bowtie$). An intermediate node indicates the application of the corresponding operator on the relations generated by its children, the result of which is then sent further up. Thus, the edges of a tree represent data flow from bottom to top, i.e., from the leaves, which correspond to data in the database, to the root, which is the final operator producing the query answer. Figure 3 gives three examples of query trees for the query
select name, floor
from emp, dept
where emp.dno=dept.dno and sal>100K .
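A query tree of this kind maps naturally onto a small recursive data structure. The sketch below, with hypothetical class and function names, builds one plausible tree for the query above (selection on emp, then the join with dept, then the projection) and prints it root first, so data flows from the most indented lines upward.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    name: str  # leaf: a stored database relation


@dataclass(frozen=True)
class Operator:
    op: str                       # "select", "project", or "join"
    arg: str                      # predicate or attribute list
    children: Tuple["Node", ...]  # inputs; results flow up to this node


Node = Union[Relation, Operator]


def show(node: Node, depth: int = 0) -> str:
    """Render the tree top-down, indenting each child one level."""
    pad = "  " * depth
    if isinstance(node, Relation):
        return pad + node.name
    lines = [f"{pad}{node.op}[{node.arg}]"]
    lines += [show(child, depth + 1) for child in node.children]
    return "\n".join(lines)


# One tree for the example query: select on emp, join with dept, project.
tree = Operator("project", "name, floor", (
    Operator("join", "emp.dno=dept.dno", (
        Operator("select", "sal>100K", (Relation("emp"),)),
        Relation("dept"),
    )),
))

if __name__ == "__main__":
    print(show(tree))
```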
For a complicated query, the number of all query trees may be enormous. To reduce the size of the space that the search strategy has to explore, DBMSs usually restrict the space in several ways. The first typical restriction deals with selections and projections:
R1 Selections and projections are processed on the fly and almost never generate intermediate relations. Selections are processed as relations are accessed for the first time. Projections are processed as the results of other operators are generated. For example, plan P1 of Section 1 satisfies restriction R1: the index scan of emp finds emp tuples that satisfy the selection on emp.sal on the fly and attempts to join only those; furthermore, the projection on the result attributes occurs as the join tuples are generated. For queries with no join, R1 is moot. For queries with joins, however, it implies that all operations are dealt with as part of join execution. Restriction R1 eliminates only suboptimal query trees, since separate processing of selections and projections incurs additional costs. Hence, the Algebraic Space module specifies alternative query trees with join operators only, selections and projections being implicit.
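Restriction R1 maps naturally onto pipelined iterators. In this sketch, which uses Python generators and a few invented emp and dept tuples, the selection on sal is applied as emp is scanned and the projection onto name and floor is applied as each join tuple is produced, so no intermediate relation is ever built. The data and function names are hypothetical, and unlike plan P1 the inner side here is a plain scan rather than a hash-index lookup.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]

# A few invented emp and dept tuples, for illustration only.
EMP: List[Row] = [
    {"name": "Ann", "age": 40, "sal": 150_000, "dno": 1},
    {"name": "Bob", "age": 30, "sal": 60_000, "dno": 1},
    {"name": "Cy", "age": 50, "sal": 120_000, "dno": 2},
]
DEPT: List[Row] = [
    {"dno": 1, "dname": "Toys", "floor": 2},
    {"dno": 2, "dname": "Shoes", "floor": 1},
]


def scan_with_selection(rel: Iterable[Row],
                        pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Apply the selection while the relation is first accessed, so
    non-qualifying tuples never reach the join."""
    return (t for t in rel if pred(t))


def join_and_project(outer: Iterable[Row], inner: List[Row],
                     attrs: List[str]) -> Iterator[Tuple[object, ...]]:
    """Tuple-level nested loops on dno; each join tuple is projected as
    soon as it is produced, and no intermediate relation is stored."""
    for o in outer:
        for i in inner:
            if o["dno"] == i["dno"]:
                joined = {**o, **i}
                yield tuple(joined[a] for a in attrs)


if __name__ == "__main__":
    high_paid = scan_with_selection(EMP, lambda t: t["sal"] > 100_000)
    for row in join_and_project(high_paid, DEPT, ["name", "floor"]):
        print(row)  # ('Ann', 2) then ('Cy', 1)
```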
Given a set of relations to be combined in a query, the set of all alternative join trees is determined by two algebraic properties of join: commutativity ($R_1 \bowtie R_2 \equiv R_2 \bowtie R_1$) and associativity ($(R_1 \bowtie R_2) \bowtie R_3 \equiv R_1 \bowtie (R_2 \bowtie R_3)$). The first determines which relation will be inner and which outer in the join execution. The second determines the order in which joins will be executed. Even with the R1 restriction, the number of alternative join trees generated by commutativity and associativity is very large, $\Omega(N!)$ for N relations. Thus, DBMSs usually further restrict the space that must be explored. In particular, the second typical restriction deals with cross products.
R2 Cross products are never formed, unless the query itself asks for them. Relations are always combined through joins in the query.
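The growth of the unrestricted space can be checked by brute force. The sketch below, whose function names and edge-set encoding of join predicates are our own assumptions, enumerates every join tree over a set of relations as nested (outer, inner) pairs and can optionally discard trees that contain cross products. For N relations the full count is (2N-2)!/(N-1)!, which is at least N!. For the chain of predicates emp.dno=dept.dno, dept.ano=acnt.ano, acnt.bno=bank.bno used in this section's four-relation example, restriction R2 cuts 120 trees down to 40.

```python
from itertools import combinations
from math import factorial
from typing import FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # one join predicate between two relations


def connected(rels: FrozenSet[str], edges: Set[Edge]) -> bool:
    """True if the join graph restricted to `rels` is connected."""
    start = next(iter(rels))
    seen, frontier = {start}, [start]
    while frontier:
        r = frontier.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == r and y in rels and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen == set(rels)


def joinable(left: FrozenSet[str], right: FrozenSet[str],
             edges: Set[Edge]) -> bool:
    """True if some join predicate links the two sides."""
    return any((a in left and b in right) or (a in right and b in left)
               for a, b in edges)


def join_trees(rels: FrozenSet[str],
               edges: Optional[Set[Edge]] = None) -> List[object]:
    """Every join tree over `rels` as nested (outer, inner) pairs.

    Ordered pairs reflect commutativity; trying every split reflects
    associativity. If `edges` is given, only trees without cross
    products (restriction R2) are kept."""
    if len(rels) == 1:
        return [next(iter(rels))]
    trees: List[object] = []
    members = sorted(rels)
    for k in range(1, len(members)):
        for chosen in combinations(members, k):
            left = frozenset(chosen)
            right = rels - left
            if edges is not None and not (
                    connected(left, edges) and connected(right, edges)
                    and joinable(left, right, edges)):
                continue
            for lt in join_trees(left, edges):
                for rt in join_trees(right, edges):
                    trees.append((lt, rt))
    return trees


RELS = frozenset({"emp", "dept", "acnt", "bank"})
CHAIN = {("emp", "dept"), ("dept", "acnt"), ("acnt", "bank")}

if __name__ == "__main__":
    n = len(RELS)
    print(len(join_trees(RELS)))                     # 120
    print(factorial(2 * n - 2) // factorial(n - 1))  # 120, closed form
    print(factorial(n))                              # 24, i.e. N!
    print(len(join_trees(RELS, CHAIN)))              # 40 without cross products
```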
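The third typical restriction deals with the shape of join trees:
R3 The inner operand of each join is a database relation, never an intermediate result.
For example, consider the following query: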
select name, floor, balance, address
from emp, dept, acnt, bank
where emp.dno=dept.dno and dept.ano=acnt.ano and acnt.bno=bank.bno
Figure 5 shows three possible cross-product-free join trees that can be used to combine the emp, dept, acnt, and bank relations to answer the query. Tree T1 satisfies restriction R3, whereas trees T2 and T3 do not, since they have at least one join with an intermediate result as the inner relation.
Because of their shape (Figure 5), join trees that satisfy restriction R3, e.g., tree T1, are called left-deep. Trees whose outer relation is always a database relation, e.g., tree T2, are called right-deep. Trees with at least one join between two intermediate results, e.g., tree T3, are called bushy. Restriction R3 is of a more heuristic nature than R1 and R2 and may well eliminate the optimal plan in several cases. It has been claimed that most often the optimal left-deep tree is not much more expensive than the optimal tree overall. The typical arguments used are two:
- Having original database relations as inners increases the use of any preexisting indices.
- Having intermediate relations as outers allows sequences of nested loops joins to be executed in a pipelined fashion.
Both index usage and pipelining reduce the cost of join trees. Moreover, restriction R3 significantly reduces the number of alternative join trees, to $O(2^N)$ for many queries with N relations. Hence, the Algebraic Space module of the typical query optimizer only specifies join trees that are left-deep.
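The effect of restriction R3 on the four-relation example can be checked directly. Under R3 a left-deep tree is fully described by a join order, so the sketch below enumerates permutations of emp, dept, acnt, and bank and keeps those in which every relation after the first joins with some relation already combined (R2). The function name and the edge-set encoding of the join predicates are our own assumptions. For a chain of predicates like this one the count is exactly 2^(N-1); other join-graph shapes, such as a star, can yield more, which is consistent with the bound holding for many queries rather than all.

```python
from itertools import permutations
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # one join predicate between two relations


def left_deep_orders(rels: List[str],
                     edges: Set[Edge]) -> List[Tuple[str, ...]]:
    """Left-deep trees (R3) as join orders: order[0] is the first outer,
    and every later relation is a database relation joined as the inner.
    Keeps only orders where each new inner joins with a relation that is
    already combined, i.e. no cross products (R2)."""
    def joins(a: str, b: str) -> bool:
        return (a, b) in edges or (b, a) in edges

    valid = []
    for order in permutations(rels):
        if all(any(joins(order[i], prev) for prev in order[:i])
               for i in range(1, len(order))):
            valid.append(order)
    return valid


RELS = ["emp", "dept", "acnt", "bank"]
CHAIN = {("emp", "dept"), ("dept", "acnt"), ("acnt", "bank")}

if __name__ == "__main__":
    orders = left_deep_orders(RELS, CHAIN)
    print(len(orders))  # 8 == 2 ** (4 - 1)
    for o in orders:
        print(" join ".join(o))
```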
In summary, typical query optimizers make restrictions R1, R2, and R3 to reduce the size of the space they explore. Hence, unless otherwise noted, our descriptions follow these restrictions as well.
Thursday, April 8, 2010
Computers are no longer merely tools that help humans do their jobs; they have already begun to replace much of the routine human work that does not require thinking. Going further, experts are trying to reproduce the workings of the human brain in a computer system.
A neural network is an information-processing system designed to mimic the way the human brain solves problems, learning through changes to its synaptic weights. A neural network is able to identify activities based on past data: it studies past data and thereby becomes able to make decisions about data it has not yet seen.
A neural network is defined as an information-processing system whose characteristics resemble those of biological neural networks. It was created as a generalization of mathematical models of human cognition, based on the following assumptions:
1. Information processing occurs in simple elements called neurons.
2. Signals flow between neurons through connection links.
3. Each connection link has a weight, which multiplies the signal transmitted through it.
4. Each neuron applies an activation function to the weighted sum of its incoming signals to determine its output signal.
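The four assumptions above describe a single artificial neuron. A minimal sketch follows; the bias term, the choice of sigmoid and step activations, and the function names are our assumptions rather than details given in the text.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def step(x: float) -> float:
    return 1.0 if x >= 0 else 0.0


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  bias: float = 0.0,
                  activation: Callable[[float], float] = sigmoid) -> float:
    """Each incoming signal is multiplied by the weight of its connection
    link, the products are summed (plus a bias), and the activation
    function turns that sum into the neuron's output signal."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input link is required")
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(net)


if __name__ == "__main__":
    print(neuron_output([1.0, 1.0], [0.5, 0.5], bias=-0.75, activation=step))  # 1.0
    print(round(neuron_output([1.0, 0.0], [0.5, 0.5]), 4))  # 0.6225, i.e. sigmoid(0.5)
```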
Artificial neural networks have several major advantages over other computational methods, namely:
1. The ability to acquire knowledge even under noisy and uncertain conditions.
2. The ability to represent knowledge flexibly.
3. The ability to tolerate distortion (errors or faults), so that a small disturbance in the data can be treated as mere noise.
4. The ability to process knowledge efficiently on a parallel system, which shortens the time needed to operate them.
Given these capabilities, artificial neural networks are well suited to applications such as:
1. Classification: assigning a given input to one of a set of specified categories.
2. Association: describing an object as a whole given only a part of it, or given another, related object.
3. Self-organization: the ability to process input data without requiring target outputs.
4. Optimization: finding the best answer or solution, often by minimizing a cost function.
An artificial neural network is characterized by:
1. The pattern of connections between neurons (called the network architecture).
2. The method of determining the connection weights (called training or learning).
3. The activation function.
The basic concept of neural networks
The architecture of a neural network can be described by its framework and its interconnection scheme. The framework of an artificial neural network can be characterized by the number of layers and the number of nodes in each layer.
The layers that make up a neural network can be divided into three types:
1. Input layer: the nodes in the input layer are called input units. Input units receive input from the outside world; this input is a representation of a problem.
2. Hidden layer: the nodes in the hidden layers are called hidden units. The output of this layer cannot be observed directly.
3. Output layer: the nodes in the output layer are called output units. The output of this layer is the neural network's answer to the problem.
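The three kinds of layers can be illustrated with a forward pass through a tiny fully connected network. This is a sketch under assumptions: the layer sizes, the weights, the biases, and the sigmoid activation are all hypothetical, and the input layer simply passes its values on to the hidden units.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dense_layer(inputs: Sequence[float], weights: Matrix,
                biases: Sequence[float]) -> List[float]:
    """One fully connected layer with a sigmoid activation.
    Row j of `weights` holds the weights of the links into unit j."""
    outputs = []
    for row, bias in zip(weights, biases):
        net = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-net)))
    return outputs


def forward(x: Sequence[float], w_hidden: Matrix, b_hidden: Sequence[float],
            w_out: Matrix, b_out: Sequence[float]) -> List[float]:
    """Input layer -> hidden layer -> output layer."""
    hidden = dense_layer(x, w_hidden, b_hidden)  # hidden units: not observed directly
    return dense_layer(hidden, w_out, b_out)     # output units: the network's answer


# Hypothetical weights for 2 input units, 3 hidden units, 1 output unit.
W_HIDDEN = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
B_HIDDEN = [0.0, 0.1, -0.1]
W_OUT = [[0.6, -0.2, 0.9]]
B_OUT = [0.05]

if __name__ == "__main__":
    print(forward([1.0, 2.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```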
Most neural networks adjust their weights during a training procedure. Training can be supervised, in which each input pattern used in training is paired with a required target. The second kind is unsupervised training, in which the weights are adjusted in response to the input without any accompanying target; the network classifies the patterns it sees based on their similarity.
The difference is that supervised training uses the class-membership information of each training example. With this information, a supervised pattern-classification algorithm can detect errors and feed them back into the network.
Unsupervised training, by contrast, modifies the network parameters without such information. In this training model, the network does not use the class membership of the training examples; instead, it uses information local to groups of neurons to modify its parameters.
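The supervised case can be made concrete with the classic perceptron learning rule, a technique swapped in here for illustration since the text names no specific algorithm: each training input is paired with its target, and the error between target and output is fed back to adjust the weights and bias. The function names, the learning rate, the epoch limit, and the AND training set are hypothetical choices.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (input pattern, target class)


def predict(w: List[float], b: float, x: Sequence[int]) -> int:
    """Step activation on the weighted sum of the inputs."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data: List[Example], lr: float = 1.0,
                     max_epochs: int = 100) -> Tuple[List[float], float]:
    """Supervised training: the error (target - output) for each pattern
    is fed back to adjust the weights and the bias."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in data:
            error = target - predict(w, b, x)
            if error:
                mistakes += 1
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if mistakes == 0:  # every training pattern classified correctly
            break
    return w, b


AND_DATA: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_perceptron(AND_DATA)
    print(w, b)
    print([predict(w, b, x) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```

The perceptron rule only converges when the classes are linearly separable, as they are for AND; an unsupervised method would instead update weights without any `target` values.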
An artificial neural network solves problems through a process of learning from examples. It is usually given a set of training patterns drawn from a set of sample patterns, and it learns from these examples.
Humans learn, understand, and remember information fully, partially, or sometimes not at all, depending on each person's capacity to learn and to store information in the brain. Because the brain does not store information completely, the chance of losing stored information is high.
The main difference between the human brain and an artificial neural network is that the human brain can forget, whereas a neural network is unlikely to. A trained neural network stores the targeted information deeply and permanently within its cells. The cells of a neural network cannot be damaged, whereas human nerve cells can be. When nerve cells in the human brain are damaged, some of the information they contain is lost, and the person forgets it.
In humans, data and information are stored in structured units in the brain. In a neural network, data and information are stored as weights and biases in files, so potential future damage can be anticipated by keeping backups.
Another difference is accuracy. Once a neural network has been trained, it will solve the same problem with the same result even if the problem is repeated many times, whereas humans cannot do so.
In the same unit of time, an artificial neural network can transmit more information, more quickly, than the human brain, because a neural network works electronically whereas the human brain works chemically.
The following is a complete comparison between the capabilities of the human brain and those of a CPU:
About Me
- teknotutorial