https://moodle.cs.ubbcluj.ro/pluginfile.php/50651/mod_resource/content/1/DBMSs_Lecture12_I.pdf
Parallel Database Systems
- performance improvement
- parallelize operations:
- loading data
- building indexes
- query evaluation
- data can be distributed, but distribution is dictated solely by performance reasons
Parallel Databases Architectures
shared-memory
-
several CPUs:
- attached to an interconnection network
- can access a common region in the main memory

shared-disk
-
a CPU:
- its own private memory
- can access all disks through a network

shared-nothing
Interference
-
specific to shared-memory and shared-disk architectures
-
add CPUs:
- increased contention for memory and network bandwidth
⇒ existing CPUs are slowing down
-
main reason that led to the shared-nothing architecture, currently considered as the best option for large parallel database systems
The Shared-Nothing
- linear speed-up
- required processing time for operations decreases proportionally to the increase in the number of CPUs and disks

- linear scale-up
- num. of CPUs and disks grows proportionally to the amount of data
⇒ performance is sustained


Parallel Query Evaluation
- context
- DBMS based on a shared-nothing architecture
- evaluate a query in a parallel manner
- operators in an execution plan can be evaluated in parallel
- 2 operators are evaluated in parallel
- one operator is evaluated in a parallel manner
- an operator is said to block if it doesn’t produce results until it consumes all its inputs (e.g. sorting, aggregation)
- pipelined parallelism
- an operator consumes the output of another operator
- limited by blocking operators
- parallel evaluation on partitioned data
- every operator in a plan can be evaluated in a parallel manner by partitioning the input data
- partitions are processed in parallel, the results are then combined
- processor = CPU + its local disk
Data Partitioning
- horizontally partition a large dataset on several disks
- partitions are then read / written in parallel
- round-robin partitioning
- n processors
- the ith tuple is assigned to processor i % n
- hash partitioning
- determine the processor for a tuple t
- apply a hash function to t ((some of) its attributes)