I/O Cost and Buffer Requirements for Various Join Implementation

Author: Christoph F. Eick
Source (in part): He. Garcia-Molina,...:"Database System Implementation"

Assumptions: Cost to generate the output relation are not considered; that means, the minimum required buffer size is 2 blocks (one block for each of the two relations to be joined). The two relations to be joined are R and S, and B(S) and B(R) denote the number of blocks of S, and R, and T(S) and T(R) denote the number of tupels of R and S, respectively. Moreover, we assume B(S)<=B(R); moreover, we assume that R has a hashed index on the join attribute, and that each value of R's join attribute occurs with frequency f.

Join VariantBuffersizeApprox. Cost
Block Nested Loop JoinB(S)+1B(R)+B(S)
Block Nested Loop JoinM>=2B(R)*B(S)/(M-1)
Simple Sort Merge Joinsqrt(B(S))5*(B(R)+B(S))
Sort Merge Joinsqrt(B(S)+B(R))3*(B(R)+B(S))
Hash Joinsqrt(B(S))3*(B(R)+B(S))
Hybrid Hash Joinsqrt(B(S))(3 - 2M/B(S))*(B(R)+B(S))
Hash Index-based JoinM>=3B(S) + T(S)*(1+f)

last updated: February 25, 2000, 2:22p