关系代数角度的三种连接

连接可以理解为Selection+Cartesian product,Cartesian product是固定的,让连接操作变得复杂的,是selection操作,连接的区分条件也就是这些筛选条件。

  • θ连接,R和S笛卡尔积的结果选取R中属性A与S中属性H之间满足θ条件的元组。

R(BθH)S=σ{R(B)θS(H)}(R×S)R\Join_{(B{\theta}H)} S=\sigma_{\{R(B){\theta}S(H)\}}(R\times S)

  • 等值连接,也就是先做笛卡尔积,筛选规定字段值相同的元组。由关系R和关系S的笛卡尔积中选取R中属性B和S中属性H上值相等的元组所构成。等值连接是一种特殊的θ连接,当θ连接中的运算符为等号,就是等值连接。

R(B=H)S=σ{R(B)=S(H)}(R×S)R\Join_{(B=H)} S=\sigma_{\{R(B)=S(H)\}}(R\times S)

  • 自然连接,由关系R和关系S的笛卡尔积中选取相同属性组B上值相等的元组所构成。其实自然连接是一种特殊的等值连接,要求关系R和关系S必须必须有相同的属性组B。

RS=σ{R(B)=S(B)}(R×S)R\Join S=\sigma_{\{R(B)=S(B)\}}(R\times S)

除法操作

适用特征: 查询…全部的/所有的

前提条件: 给定关系 R(A1 ,A2 , … ,An) 为 n 度关系,关系 S(B1 ,B2 , … ,Bm) 为 m 度关系 。如果可以进行关系 R 与关系 S 的除运算,当且仅当:属性集 { B1 ,B2 , … , Bm } 是属性集 { A1 ,A2 , … ,An } 的真子集。

The relation returned by the division operator will have attributes=(All attributes of R - All attributes of S), for each tuple, return all of them from relation R which are associated to every S’s tuple.

思考

像Select、Project、Union、Differ、Cartesian product、Rename在关系运算中都是属于基本运算操作符,那么Join、Intersect、Divide则属于扩展运算符因为它们都能由基本运算符获得组成部分,比如: 我们可以将除法操作记成:

temp1πRS(R)temp1 \leftarrow \pi_{R-S}(R)

temp2πRS((temp1×S)R)temp2 \leftarrow \pi_{R-S}((temp1\times{S})-R)

resulttemp1temp2result \leftarrow temp1-temp2

temp1表示的投影部分在关系R中存在,但是在关系S中不存在,本质上即选择了R中不在S中的属性,并将结果存储在temp1中。

temp2我们逐步拆解,(TEMP1 × S)表示两部分的笛卡尔积,意在将temp1中的每个元组与S中的元组两两组合来建立一个新的关系;((TEMP1 × S) - R)表示从上面的笛卡尔积中去除关系R。返回在笛卡尔积中出现但是不在关系R中的元组;这里的投影操作表示只保留关系R中,但不在关系S中的属性,并将结果存储在temp2中。

最后从temp1中去除temp2。

未完待续

聚合运算

What if we want a relation with information about “sum of salaries” of employees, or the “average age” of students?

We need more expressive power, we can use aggregation functions to summarize information from multiple tuples into aggregate values. We can use an aggregation operator γ and a function such as SUM, AVG, MIN, MAX, or COUNT. What if NULL?

SUM: NULL values are generally treated as if they were zeros. So, if a column has NULL values, they won’t affect the sum.

AVG: NULL values are typically ignored when calculating the average. The average is computed based on the non-NULL values in the specified column.

MIN and MAX: NULL values may or may not be included in the result, depending on the specific database system and query settings. You can use the IS NULL or IS NOT NULL conditions to control how NULL values are treated in these cases.

COUNT: NULL values are generally included in the count. If you want to count non-NULL values only, you can use the COUNT(column_name) function.

Attention: it’s important to note that the handling of NULL values in aggregation functions can vary between database management systems.

group by 多个字段

假设你有一个员工表(Employees),其中包含员工的信息,包括部门(Department)和地区(Region)。你想要计算每个部门和地区的平均工资。可以使用GROUP BY子句来实现这个目标:

1
2
3
SELECT Department, Region, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY Department, Region;

在这个查询中,我们使用了两个字段,即Department和Region,来指定分组的方式。AVG(Salary)计算了每个部门和地区的平均工资,并使用AS关键字给结果列取了一个别名AverageSalary。

这样,就可以将数据按照多个字段的组合进行分组,并对每个组执行聚合操作。

exercise 2

COMP9311 Exercise 2

Question 1

  1. Get the student ids of the students who got more than 70 marks in the subject 1011.

π{student}(σ(Subject=1011 and Mark>70)MARK)\pi_{\{student\}} (\sigma_{(Subject=1011 \text{ } and \text{ } Mark>70)}MARK)

  1. Get the names and subject codes for which the students got more than 70 marks in the subject.

π{Name,Subject}(σ(Mark>70)STUDENT(Student=Sid)MARK)\pi_{\{Name,Subject\}} (\sigma_{(Mark>70)} STUDENT \Join_{(Student=Sid)} MARK)

  1. Get the name of students whose average marks were over 80.

R1σ(avg(Mark)>80)(γStudent,avg(Mark))MARKR1 \leftarrow \sigma_{(avg(Mark)>80)} (\gamma Student,avg(Mark))MARK

R2STUDENT(Sid=Student)R1R2 \leftarrow STUDENT \Join_{(Sid=Student)} R1

R3π{Name}R2R3 \leftarrow \pi_{\{Name\}}R2

Question 2

  1. Find the name of all authors who are book editors.

π{firstName,lastName}(authorbook)\pi{\{firstName,lastName\}}(author \Join book)

natural join

  1. Find the name of all authors who are book editors.

R1π{authorID}authorπ{authorID}bookR1 \leftarrow \pi_{\{authorID\}}author-\pi_{\{authorID\}}book

R2π{firstName,lastName}(authorR1)R2 \leftarrow \pi_{\{firstName,lastName\}}(author \Join R1)

  1. Find the name of all authors who have at least one publication in the database.

π{firstName,lastName}authorauthorPub\pi_{\{firstName,lastName\}} author \Join authorPub

  1. Find the name of the authors authored a pub that was published in July.

R1π{bookID}(σ(month=July)book)R1 \leftarrow \pi_{\{bookID\}}(\sigma_{(month='July')} book)

R2π{bookID}pubR1R2 \leftarrow \pi_{\{bookID\}}pub \Join R1

R3π{authorID}R2authorR3 \leftarrow \pi_{\{authorID\}} R2 \Join author

R4π{firstName,lastName}R3R4 \leftarrow \pi_{\{firstName,lastName\}} R3

Confusing: Is there anything wrong with the sequence of selection and join operation? In each step, can i omit the projection for intermediate attribute?

  1. Find the name of all authors who are book editors but do not have any publication in the database.

R1π{authorID}bookR1 \leftarrow \pi_{\{authorID\}}book

R2π{authorID}authorPubR2 \leftarrow \pi_{\{authorID\}}authorPub

R3author(R1R2)R3 \leftarrow author \Join (R1-R2)

R4π{firstName,lastName}authorR4 \leftarrow \pi_{\{firstName,lastName\}} author

assignment_1_this_year

assignment_1.pdf