Angorithm4 Webinar #10

Co-host by Jiawei Wang 2021-12-24

1. 10th?

1st

1. Intro to Performance Engineering

  • 2021-10-15
  • Hardware development (Complex)
  • Matrix multiplication (Python 21041.67s -> Optimized C 3.04s)

2. Intro to Computer Architecture

3. Comparision Sort I

  • 2021-10-29
  • Trade off in ISA (Semantic Gap)
  • Merge Sort (Comparision Sort i)

4. Comparision Sort II

  • 2021-11-05
  • Dynamic Programming (Fds)
  • Quick Sort (Comparision Sort ii)

5. Comparision Sort III

  • 2021-11-19
  • Virtual Machine
  • Heap Sort (Comparision Sort iii)
  • Lower Bounds of Comparision Sort (O(NlogN))

6. Microarchitecture

  • 2021-11-26
  • angorithm4.org
  • Turing Machine (State)
  • MIPS-Simulator and Eval (AST-Interpreter)

7. JavaScript. The Core

8. Intro to DataBase I

  • 2021-12-10
  • Buffer Pool
  • Hashing Part I (Static Hashing)

9. Intro to DataBase II and Digital Modulation

10. Intro to DataBase III

2. B-Tree

Dynamic Hashing

Dhash
  • Trade off

  • Chained Hashing: Worst case O(N)

    • Resolve collisions by placing all elements with the same hash key into the same bucket.
  • Extendible Hashing: Worst case O(1) + extra manage time

    • “Only Split the chain when overflowed”
  • Linear Hashing: Same as Extendible Hashing, but take less time to manage it

    • “Two Hash Functions”, Balance

What is a B-Tree?

  • B Stands for Balance
  • B- Tree (1971 Boeing)
  • B+ Tree (1973 IBM)
  • Blink Tree (1981 CMU)

A B-Tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in O(log n).

B-Tree
  • The original B-Tree from 1971 stored keys + values in all nodes in the tree.
  • A B+Tree only stores values in leaf nodes. Inner nodes only guide the search process.
  • Like Binary Search Tree (Kind of)

Example: B+ Tree Visualization by David Gales

3. Below the DBMS VM

Video: D. Richard Hipp 2015 in CMU

What Happends Underneath?

emails_random.txt

ckenzie_Leigh5000@irrepsy.com
Chester_Zaoui4418@sheye.org
Gladys_Pratt1129@hourpy.biz
Vera_Driscoll43@cispeto.com
John_Smith9073@brety.org
Alexander_Brett1097@sheye.org
Phillip_May4129@hourpy.biz
Nina_Nicholls4940@kideod.biz
Johnathan_Needham5231@yahoo.com
Chad_Cox1524@twipet.com
Johnny_Eastwood9872@kideod.biz
Henry_Fleming7688@liret.org
David_Jenkin6789@twace.org
Angela_Foxley2732@fuliss.net
Jaylene_Rixon5612@qater.org
Lara_Snell8705@qater.org
Enoch_Clayton8217@zorer.org
Claire_Tindall9581@fuliss.net
Cristal_Tate2593@famism.biz
Erin_Eagle3884@eirey.tech
Caleb_Dubois1513@fuliss.net
Martin_Neal1410@nimogy.biz
Hazel_Dale9267@kideod.biz
Barry_Williams8002@hourpy.biz
Macy_Roth2551@eirey.tech
Irene_Kaur5056@twipet.com
Ethan_Newton6130@ovock.tech
Kieth_Warden8091@bretoux.com
Maria_Flynn2748@mafthy.com
Ronald_Palmer8118@kideod.biz
... 10000 lines

1. Postgresql

$  sudo -u postgres psql postgres

2. Create Table emails

CREATE TABLE emails (email VARCHAR(128));

3. Import Data

COPY emails(email) FROM '/Users/Angold4/WorkSpace/Webinar/2021-12-24/src/emails_random.txt' DELIMITER ',' CSV;
page

4. Do some stuff:

Find all emails that begin with ‘A’
SELECT * FROM emails WHERE email LIKE 'A%';
Count all emails that begin with ‘A’
SELECT count(*) FROM emails WHERE email LIKE 'A%';
Find Min Email
SELECT min(email) FROM emails;
SELECT * FROM emails WHERE email = 'Abbey_Chester9370@famism.biz';
explain SELECT * FROM emails WHERE email = 'Abbey_Chester9370@famism.biz';
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on emails  (cost=0.00..199.00 rows=1 width=28)
Filter: ((email)::text = 'Abbey_Chester9370@famism.biz'::text)
Build a Hash Index

A table index is a replica of a subset of a table’s attributes
that are organized and/or sorted for efficient access using a subset of those attributes.

CREATE INDEX idx_emails_hash ON emails USING HASH(email);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using idx_emails_hash on emails  (cost=0.00..8.02 rows=1 width=28)
 Index Cond: ((email)::text = 'Abbey_Chester9370@famism.biz'::text)
Build a B-Tree Index (Limitation of Hashing)
SELECT * FROM emails WHERE email like 'Abbey%';
explain SELECT * FROM emails WHERE email like 'Abbey%';
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on emails  (cost=0.00..199.00 rows=1 width=28)
 Filter: ((email)::text ~~ 'Abbey%'::text)
CREATE INDEX idx_emails_tree ON emails USING BTREE(email);
explain SELECT * FROM emails WHERE email like 'Abbey%';
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Index Only Scan using idx_emails_tree on emails  (cost=0.29..4.31 rows=1 width=28)
   Index Cond: ((email >= 'Abbey'::text) AND (email < 'Abbez'::text))
   Filter: ((email)::text ~~ 'Abbey%'::text)
BufferPool
10th