Angorithm4 Webinar #10

Co-host by Jiawei Wang 2021-12-24

1. 10th?

1. Intro to Performance Engineering

2021-10-15
Hardware development (Complex)
Matrix multiplication (Python 21041.67s -> Optimized C 3.04s)

2. Intro to Computer Architecture

2021-10-22
M1 Pro/Max Released
RISC vs CISC

3. Comparision Sort I

2021-10-29
Trade off in ISA (Semantic Gap)
Merge Sort (Comparision Sort i)

4. Comparision Sort II

2021-11-05
Dynamic Programming (Fds)
Quick Sort (Comparision Sort ii)

5. Comparision Sort III

2021-11-19
Virtual Machine
Heap Sort (Comparision Sort iii)
Lower Bounds of Comparision Sort (O(NlogN))

6. Microarchitecture

2021-11-26
angorithm4.org
Turing Machine (State)
MIPS-Simulator and Eval (AST-Interpreter)

7. JavaScript. The Core

2021-12-03
Prototype inherience
Runtime

8. Intro to DataBase I

2021-12-10
Buffer Pool
Hashing Part I (Static Hashing)

9. Intro to DataBase II and Digital Modulation

2021-12-17
Digital Modulation
Hashing Part II (Dynamic Hashing)

10. Intro to DataBase III

2021-12-24
Below the DBMS VM
B-Tree

2. B-Tree

Dynamic Hashing

Trade off
Chained Hashing: Worst case O(N)
- Resolve collisions by placing all elements with the same hash key into the same bucket.
Extendible Hashing: Worst case O(1) + extra manage time
- “Only Split the chain when overflowed”
Linear Hashing: Same as Extendible Hashing, but take less time to manage it
- “Two Hash Functions”, Balance

What is a B-Tree?

B Stands for Balance
B- Tree (1971 Boeing)
B+ Tree (1973 IBM)
Blink Tree (1981 CMU)

A B-Tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in O(log n).

The original B-Tree from 1971 stored keys + values in all nodes in the tree.
A B+Tree only stores values in leaf nodes. Inner nodes only guide the search process.
Like Binary Search Tree (Kind of)

Example: B+ Tree Visualization by David Gales

3. Below the DBMS VM

Video: D. Richard Hipp 2015 in CMU

What Happends Underneath?

`emails_random.txt`

ckenzie_Leigh5000@irrepsy.com
Chester_Zaoui4418@sheye.org
Gladys_Pratt1129@hourpy.biz
Vera_Driscoll43@cispeto.com
John_Smith9073@brety.org
Alexander_Brett1097@sheye.org
Phillip_May4129@hourpy.biz
Nina_Nicholls4940@kideod.biz
Johnathan_Needham5231@yahoo.com
Chad_Cox1524@twipet.com
Johnny_Eastwood9872@kideod.biz
Henry_Fleming7688@liret.org
David_Jenkin6789@twace.org
Angela_Foxley2732@fuliss.net
Jaylene_Rixon5612@qater.org
Lara_Snell8705@qater.org
Enoch_Clayton8217@zorer.org
Claire_Tindall9581@fuliss.net
Cristal_Tate2593@famism.biz
Erin_Eagle3884@eirey.tech
Caleb_Dubois1513@fuliss.net
Martin_Neal1410@nimogy.biz
Hazel_Dale9267@kideod.biz
Barry_Williams8002@hourpy.biz
Macy_Roth2551@eirey.tech
Irene_Kaur5056@twipet.com
Ethan_Newton6130@ovock.tech
Kieth_Warden8091@bretoux.com
Maria_Flynn2748@mafthy.com
Ronald_Palmer8118@kideod.biz
... 10000 lines

1. Postgresql

$  sudo -u postgres psql postgres

2. Create Table emails

CREATE TABLE emails (email VARCHAR(128));

3. Import Data

COPY emails(email) FROM '/Users/Angold4/WorkSpace/Webinar/2021-12-24/src/emails_random.txt' DELIMITER ',' CSV;

4. Do some stuff:

Find all emails that begin with ‘A’

SELECT * FROM emails WHERE email LIKE 'A%';

Count all emails that begin with ‘A’

SELECT count(*) FROM emails WHERE email LIKE 'A%';

Find Min Email

SELECT min(email) FROM emails;

SELECT * FROM emails WHERE email = 'Abbey_Chester9370@famism.biz';

explain SELECT * FROM emails WHERE email = 'Abbey_Chester9370@famism.biz';

                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on emails  (cost=0.00..199.00 rows=1 width=28)
Filter: ((email)::text = 'Abbey_Chester9370@famism.biz'::text)

Build a Hash Index

A table index is a replica of a subset of a table’s attributes
that are organized and/or sorted for efficient access using a subset of those attributes.

CREATE INDEX idx_emails_hash ON emails USING HASH(email);

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using idx_emails_hash on emails  (cost=0.00..8.02 rows=1 width=28)
 Index Cond: ((email)::text = 'Abbey_Chester9370@famism.biz'::text)

Build a B-Tree Index (Limitation of Hashing)

SELECT * FROM emails WHERE email like 'Abbey%';

explain SELECT * FROM emails WHERE email like 'Abbey%';

                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on emails  (cost=0.00..199.00 rows=1 width=28)
 Filter: ((email)::text ~~ 'Abbey%'::text)

CREATE INDEX idx_emails_tree ON emails USING BTREE(email);

explain SELECT * FROM emails WHERE email like 'Abbey%';

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Index Only Scan using idx_emails_tree on emails  (cost=0.29..4.31 rows=1 width=28)
   Index Cond: ((email >= 'Abbey'::text) AND (email < 'Abbez'::text))
   Filter: ((email)::text ~~ 'Abbey%'::text)