β
What is the difference between random I/O and sequential I/O?
β
What is an index in a database?
data structure to help speed up data retrieval operation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| - Employee table
| EmpID | Name | Department |
| ----- | ------- | ---------- |
| 101 | Alice | HR |
| 102 | Bob | Sales |
| 103 | Charlie | HR |
- create Index on EmpID
Key | Value (RowID / Pointer)
------+------------------------
101 | β Row 1
102 | β Row 2
103 | β Row 3
- SQL
SELECT * FROM Employees WHERE EmpID = 102;
|
- index is always SORTED
- ππ»
SELECT
, scan entire table β, quickly locate data βοΈ - ππ»
INSERT
, UPDATE
, DELETE
β
How does an index work?
β
What factors should you consider when creating an index?
- 1οΈβ£ Query patterns: add index to
WHERE
, JOIN
, ORDER BY
, GROUP BY
- 2οΈβ£ Selectivity: high selectivity = many unique values
- 3οΈβ£ Read vs. Write workload: index speed up reads, slow down writes
- 4οΈβ£ Table size: index is better for large tables
- 5οΈβ£ Sort, Group: index help sort, grouping columns
- 6οΈβ£ uniqueness: use
UNIQUE
index to enforce data integrity
1
2
3
4
5
| When is index most efficient?
- ππ» GROUP BY, ORDER BY
- ππ» high cardinality, lots of unique values
- ππ» more read, not write
- ππ» big table
|
β
What factors should we be aware when creating index?
- 1οΈβ£ aware of Range conditions like
BETWEEN
, LIKE
, <
, >
- index is only used up to the column where range starts
- columns after that in multi-column index are ignored
-- composite index in month, city
WHERE month BEWTEEN 202301 AND 202312
AND city = Seoul
-- month use index
-- city might NOT use index
- 2οΈβ£ Use Equality Conditions
- use
=
and IN
-- replace range with equality
-- not efficient
AND month BETWEEN 200801 AND 200812
-- more efficient
AND month IN ('200801','200802','200803','200804','200805','200806','200807','200808','200809','200810','200811','200812')
-- column is String
WHERE CAST(col AS INT) = 123
-- ππ» disable index due to data mismatch
β
What should you be careful about when using indexes?
- β οΈ overuse of index slow down write operations
- β οΈ index maintenance overhead: changing indexed columns often lead to constant re-balancing of index structure
- β οΈ using index in low cardinality columns: if use index in
gender
column with only a few distinct values, index will not be effective! - β οΈ correct order in
composite(multi-level) index
: composite index
query should start with first column. Put high-cardinality columns first
β
Is it good to create many indexes on a table?
π‘ What is a covering index
?
covering index: index that contains all the columns needed to satisfy a query
DB can answer the query just using index, WO accessing table
- index holds required data
- DB DOES NOT NEED to access actual table rows
- no table lookup
- ππ» reduce I/O, just get data from index, not real table
- ππ» faster read, do not have to access table
- ππ» efficient for read
1
2
3
4
5
6
7
8
9
10
11
12
13
| CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255),
age INT
);
-- create covering index
CREATE INDEX idx_email_age ON users (email, age);
SELECT age FROM users WHERE email = 'john@example.com';
-- WHERE clause knows wmail
-- SELECT needs only age
-- index idx_email_age already has both email and age
|
π‘ What is a multi-column index
(or composite index)?
multi-column index: single index that includes multiple columns from table
useful when queries filter or sort using more than one column
- when queries involve more than one column
- βοΈ order of column matters
- index is most effective when query filters based on leftmost columns first
- better to set from broad to narrow categories
- λλΆλ₯ β‘οΈ μ€λΆλ₯ β‘οΈ μλΆλ₯
food
β‘οΈ vegetable
β‘οΈ carrot
1
| an index on (first_name, last_name) can efficiently support queries filtering on first_name or both, but not just last_name
|
- example of composite index
1
2
3
4
5
6
7
8
9
10
| CREATE INDEX idx_email_age ON users (email, age);
-- βοΈ efficient, use leftmost column
SELECT * FROM users WHERE email = 'john@example.com';
-- βοΈ efficient, use both index columns
SELECT * FROM users WHERE email = 'john@example.com' AND age = 30;
-- β not efficient, skips first column
SELECT * FROM users WHERE age = 30;
|
π‘ What are nodes in B-Tree
and B+Tree
indexes?
- root node: topmost node, one node per tree, search starts here
- internal node: node that routes search paths, between root and leaf node
- leaf node: most bottom node, node that stores actual data(same leaf node, same depth)
- linked leaf nodes:
B+Tree
only, connected from left β‘οΈ right for range scanning
1
2
3
4
5
| [50] β Root
/ \
[20, 40] [60, 80] β Internal nodes
|
[10] [20] [30] [40] β Leaf nodes
|
π‘ Can you explain the difference between B-Tree
and B+Tree
indexes?
- 곡ν΅μ : balanced tree structures used in indexing
μ°¨μ΄μ : how/where they store data
- βοΈ B-Tree: store key and actual data in both internal and leaf nodes
- tree with sorted keys
DB navigates through the tree to find target key
, get value
, then access the associated data
- ππ» fast point lookup
- ππ» increase node size and I/O
- ππ» need more memory, data is saved both in root, branch, leaf node
1
2
3
4
5
6
| internal node: store key β data
leaf node: store key β data
leaf connection: not linked
point lookup: fast
range query: slow(internal nodes are not linked, has to go through multiple levels)
use: old DBs
|
1
2
3
4
5
6
7
| if DB is [5, 10, 15, 20, 25, 30, 35, 40]
[15] /*internal node*/
/ \
[5,10] [20,25,30] /*leaf nodes*/
\
[35,40]
|
- βοΈ B+Tree: store data in only in leaf node
- store only
key and pointer
in root, internal node - actual data resides in leaf nodes
- leaf nodes are linked by
LinkedList
together - ππ» fast range query, support sequential scans
BETWEEN, <, >, LIKE %abc%
- ππ» memory efficiency, only has to save data in leaf node
- used as default in modern databases
- used in
MySQL
, PostgreSQL
1
2
3
4
5
6
| internal node: only keys
leaf node: store key β data
leaf connection: linked
point lookup: fast
range query:fast(leaces are linked)
use: modern RDBMS
|
1
2
3
4
5
6
7
8
9
10
11
| if DB is [5, 10, 15, 20, 25, 30, 35, 40]
[15] /*internal node: store only keys*/
/ \
[5,10] [20,25] /*leaf node: store key and data*/
\
[30,35]
\
[40]
Leaf node chain: [5,10] β [20,25] β [30,35] β [40] /*linked*/
|
- example of real-world query
1
2
3
4
5
6
| CREATE INDEX idx_age ON users (age);
SELECT id, email
FROM users
WHERE age BETWEEN 20 AND 30
ORDER BY id;
|
π‘ What is a Hash index
, and when would you use it?
use hash function to map search keys
into fixed-sized bucket number
π‘ What is a clustering index
?
determine physical order of data in the table
- rows are stored on disk in the same order as index
- only ONE clustering index per table is possible
- Why? only one row can be
clustering index
- (idμμΌλ‘ μ λ ¬ν΄λμ§, λμ΄μμΌλ‘ μ λ ¬ν΄λμ§, λ μ§ μμΌλ‘ μ λ ¬ν΄ λμ§ νλλ§ κ³ λ₯Ό μ μμΌλκΉ)
- in InnoDB,
primary key
is clustering index
by default
1
2
3
4
5
6
7
8
9
10
11
12
13
| clustering index: created_at
+-----------------------------+
| posts Table (Physical Rows)|
+-----------------------------+
| id=9 | created_at=2023-01-01 |
| id=4 | created_at=2023-01-02 |
| id=12 | created_at=2023-01-03 |
| id=5 | created_at=2023-01-04 |
| id=7 | created_at=2023-01-05 |
+-----------------------------+
β
(physically sorted)
|
- ππ»
RANGE queries
like ORDER BY
- ππ» ordered scans are fast
- ππ» if many
INSERT
, DELETE
, UPDATE
, lot of re-ordering has to take place
1
2
| SELECT * FROM posts
WHERE created_at BETWEEN '2023-01-02' AND '2023-01-04';
|
- leaf node of the clustering index store the actual data rows and pointers
β
What is a non-clustering index?
- index that DOES NOT affect the physical order of data in the table
- instead, use
index columns
- and
pointer
: point to where to find data in actual table row - thus, only store
index
and pointer
1
2
3
4
5
| email β row location
------------------------------
alice@example.com β row ID 1
bob@example.com β row ID 2
carol@example.com β row ID 3
|
1
2
3
4
5
| SELECT * FROM users WHERE email = 'bob@example.com';
-- find email
-- then go to row ID 2 with pointer
-- then get user data
|
β
What is a full-text index?
- specialized index used for searching large text data
- such as paragraph, documents hash- support advanced text search
- search
words inside text
- ππ»
MATCH AGAINST
1
2
| SELECT * FROM posts
WHERE MATCH(content) AGAINST('database indexing');
|
- this query will return posts that mention
database
or indexing
- then, sort by relevance

β
Can you explain different types of index scans in a database?
1
| SELECT name FROM users;
|
- βοΈ Index Range Scan: scan specific range based on conditions
BETWEEN
, <
, >
, LIKE 'A%
1
| SELECT * FROM employees WHERE age BETWEEN 30 AND 40;
|
- βοΈ Index Unique Scan: retrieve single row with unique index
1
| SELECT * FROM users WHERE id = 123;
|
- βοΈ Index Skip Scan: when you are using composite index(multi-column index) but your query skips the first column
1
2
3
4
5
6
7
8
9
10
| -- have composite index on department, job_title
CREATE INDEX idx_dept_job ON employees(department, job_title);
-- however, run query on job_title
-- the query DOES NOT filter the first column
SELECT * FROM employees WHERE job_title = 'Manager';
-- in composite index, need to use first or both index in query
-- thus, normally, index will not be used
-- π’ with index skip scan, DB will use index smartly
|
1
2
| SELECT * FROM employees
WHERE department = 'Sales' AND job_title = 'Manager';
|
β
Have you ever checked a query execution plan? Can you explain what it is and how you analyze it?

β
What is a hint in SQL, and when would you use it?
1
2
3
| SELECT /*+ INDEX(employees idx_department) */ *
FROM employees
WHERE department = 'Engineering';
|
- β οΈ overusing hints can reduce maintainability
β
How do you check if an index is being used by a query?

β
How does GROUP BY affect index usage in queries?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| -- Index: (country, gender, name)
-- βοΈ index is used
SELECT country, COUNT(*)
FROM users
GROUP BY country;
SELECT country, gender, COUNT(*)
FROM users
GROUP BY country, gender;
-- β index is NOT used
-- order is wrong
SELECT gender, country
FROM users
GROUP BY gender, country;
|
1
2
3
4
5
6
7
8
9
10
11
12
13
| μ) μΈλ±μ€ (a, b, c)κ° μμ λ
1. μΈλ±μ€κ° μ μ©λλ κ²½μ°:
- GROUP BY a : μΈλ±μ€ 첫 λ²μ§Έ 컬λΌμΈ aλ§ μ¬μ©νλ―λ‘, μΈλ±μ€ μ μ©
- GROUP BY a, b : aμ bλ μΈλ±μ€μ 첫 λ²μ§Έμ λ λ²μ§Έ 컬λΌμ ν΄λΉνλ―λ‘, μΈλ±μ€ μ μ©
- GROUP BY a, b, c : λͺ¨λ 컬λΌμ΄ μΈλ±μ€μ ν¬ν¨λμ΄ μμ΄ μΈλ±μ€ μ μ©
2. μΈλ±μ€κ° μ μ©λμ§ μλ κ²½μ°:
- GROUP BY b : bλ λ λ²μ§Έ 컬λΌμ΄μ§λ§ 첫 λ²μ§Έ 컬λΌμΈ aκ° ν¬ν¨λμ§ μμμ, μΈλ±μ€κ° μ μ©λμ§ μμ
- GROUP BY b, a : bκ° μ²« λ²μ§Έλ‘ λμ€κ³ , aλ λ λ²μ§Έλ‘ λμ€κΈ° λλ¬Έμ, μμκ° λ§μ§ μμ μΈλ±μ€κ° μ μ©λμ§ μμ
- GROUP BY a, c, b, d : μΈλ±μ€μ ν¬ν¨λμ§ μμ d 컬λΌμ΄ μκΈ° λλ¬Έμ, μΈλ±μ€κ° μ μ©λμ§ μμ
|
β
How would you create an index on a table with name, country, and gender?
1
| CREATE INDEX idx_name_country_gender ON users(name, country, gender);
|
- β οΈ Be careful, in
composite index
including leading column is important!
- always include leading column
1
2
3
4
5
6
7
8
| -- βοΈ use index
SELECT * FROM users WHERE name = 'John';
SELECT * FROM users WHERE name = 'John' AND country = 'Korea';
-- β index will not be used
-- unless Index Skip Scan is enabled
SELECT * FROM users WHERE country = 'Korea';
|
- β οΈ In
GROUP-BY
, leading column and order is very important
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
| -- βοΈ use index
SELECT name, COUNT(*) FROM users
GROUP BY name;
SELECT name, country, COUNT(*) FROM users
GROUP BY name, country;
SELECT name, country, gender, COUNT(*) FROM users
GROUP BY name, country, gender;
-- β index will not be used
-- leading column is not included
SELECT country, COUNT(*) FROM users
GROUP BY country;
-- β index will not be used
-- order is wrong
SELECT country, name FROM users
GROUP BY country, name;
|