Lecture9:10.pdf
💡Recall
🖋 Notes
Indexes
- motivating example
- file of students records sorted by name
- good file organization
- retrieve students in alphabetical order
- not a good file organization
- retrieve students whose age is in a given range
- retrieve students who live in Timișoara
- index
- auxiliary data structure that speeds up operations which can't be efficiently carried out given the file's organization
- enables the retrieval of the rids of records that meet a selection condition (e.g. the rids of records describing students who live in Timișoara)
Syntax
- search key
- set of one or more attributes of the indexed file (different from the key that identifies records)
- an index speeds up queries with equality / range selection conditions on the search key
- entries
- records in the index (e.g., <search key, rid>)
- enable the retrieval of records with a given search key value
- an index can improve the efficiency of certain types of queries, not of all queries (analogy - when searching for a book at the library, index cards sorted on author name cannot be used to efficiently locate a book given its title)
- organization techniques (access methods) - examples
- B+ trees
- hash-based structures
- changing the data in the file ⇒ update the indexes associated with the file (e.g. inserting records, updating search key columns, updating columns that are not part of the key, but are included in the index)
- index size
- as small as possible, as indexes are brought into main memory for searches
Data Entries
Let k* be a data entry in an index, the data entry:
alternative 1
- is an actual data record with search key value = k
alternative 2
- is a pair <k, rid> (rid – id of a data record with search key value = k)
alternative 3
- is a pair <k, rid_list> (rid_list – list of ids of data records with search key value = k)
a1
- the file of data records needn't be stored in addition to the index
- the index is seen as a special file organization
- at most 1 index / collection of records should use alternative a1 (to avoid redundancy)
a2, a3
- data entries point to corresponding data records
- in general, the size of an entry is much smaller than the size of a data record
- a3 is more compact than a2, but can contain variable-length records
- can be used by several indexes on a collection of records
- independent of the file organization
Examples

- file with Employee records hashed on age
- record <Ionescu, 60, F, 200>:
- apply hash function to age: convert 60 to its binary representation, take the 2 least significant bits as the bucket identifier for the record
- index file that uses alternative 1 (data entries are the actual data records), search key age
- index that uses alternative 2 (data entries have the form <search key, rid>), search key bonus
- both indexes use hashing to locate data entries

Ex. 3
- files with students records
- index built on attribute city
- entries: <city, rid>, where rid identifies a student record
- such an index would speed up queries about students living in a given city:
- find entries in the index with city = 'Timișoara'
- follow rids from obtained entries to retrieve records describing students who live in Timișoara
Clustered / Unclustered Indexes
- clustered index:
- the order of the data records is close to / the same as the order of the data entries
- unclustered index:
- index that is not clustered

- index that uses alternative 1 - clustered (by definition, since the data entries are the actual data records)
- indexes using alternatives 2 / 3 are clustered only if the data records are ordered on the search key
- in practice:
- expensive to maintain the sort order for files, so they are rarely kept sorted
- a clustered index is an index that uses alternative 1 for data entries
- an index that uses alternative 2 or 3 for data entries is unclustered
- on a collection of records:
- there can be at most 1 clustered index
- and several unclustered indexes
- range search query (e.g., where age between 20 and 30)
- cost of using an unclustered index
- each data entry that meets the condition in the query could contain a rid pointing to a distinct page
- the number of I/O operations could be equal to the number of data entries that satisfy the query's condition
Primary / Secondary Indexes
- primary index
- the search key includes the primary key
- secondary index
- index that is not primary
- unique index
- duplicates
- primary indexes, unique indexes cannot contain duplicates
- secondary indexes can contain duplicates
Composite Search Keys