Database Management System

A C++ implementation of a database management system that supports both row-oriented (heap files) and column-oriented storage formats. This project demonstrates fundamental database concepts including record serialization, page management, and various database operations.

Overview

This project implements a complete database management system with the following key features:

Fixed-length record serialization with configurable field sizes
Slotted page layout for efficient record storage
Heap file organization with linked-list directory structure
Column-oriented storage for analytical queries
CRUD operations: Create, Read, Update, Delete
Range-based selection queries
Performance measurement capabilities

Architecture

Core Components

Record Management

Record Structure: Fixed-length records with configurable field sizes (default: 100 fields × 10 bytes each)
Serialization: Fixed-length serialization scheme with padding/truncation
Memory Management: Automatic record allocation and deallocation

Page Layout

Slotted Directory: Efficient page organization with bitmap-based slot management
Fixed-length Slots: Each slot accommodates exactly one record
Page Capacity: Calculated based on page size and slot size constraints

Heap File Organization

Linked-list Directory: Quick access to pages with free slots
Page Allocation: Dynamic page allocation with growing file size
Directory Structure: | address: int | {address: int, free_slots: int}[] |

Programs

Data Conversion and Storage

`csv2heapfile`

Converts CSV data to row-oriented heap file format.

./csv2heapfile <csv_file> <heap_file> <page_size>

Parameters:

csv_file: Input CSV file path
heap_file: Output heap file path
page_size: Page size in bytes

Features:

Processes CSV records and stores them in heap file format
Automatically handles page overflow and allocation
Measures and reports processing time

`csv2colstore`

Converts CSV data to column-oriented storage format.

./csv2colstore <csv_file> <directory> <page_size>

Parameters:

csv_file: Input CSV file path
directory: Output directory for column files
page_size: Page size in bytes

Features:

Splits records by columns and stores each column in separate heap files
Creates RECORD_SIZE number of heap files (one per column)
Each column file contains {record_id, value} pairs

Database Operations

`scan`

Scans and displays all records in a heap file.

./scan <heap_file> <page_size>

Parameters:

heap_file: Input heap file path
page_size: Page size in bytes

Output: CSV format to stdout

`select`

Performs range-based selection on row-oriented heap files.

./select <heap_file> <attribute> <range_start> <range_end> <page_size>

Parameters:

heap_file: Input heap file path
attribute: Attribute index (0-based)
range_start: Start value for range query
range_end: End value for range query
page_size: Page size in bytes

Features:

Performs range queries on specified attribute
Outputs first 5 characters of matching values
Measures and reports query execution time

`select2`

Performs range-based selection on column-oriented storage.

./select2 <directory> <attribute> <range_start> <range_end> <page_size>

Parameters:

directory: Directory containing column files
attribute: Attribute index (0-based)
range_start: Start value for range query
range_end: End value for range query
page_size: Page size in bytes

Features:

Optimized for column-oriented queries
Only scans the relevant column file
Measures and reports query execution time

`select3`

Performs a join operation between two columns in column-oriented storage.

./select3 <directory> <s_attribute> <r_attribute> <range_start> <range_end> <page_size>

Parameters:

directory: Directory containing column files
s_attribute: Source attribute index for range query (0-based)
r_attribute: Result attribute index to output (0-based)
range_start: Start value for range query on s_attribute
range_end: End value for range query on s_attribute
page_size: Page size in bytes

Features:

Performs range query on s_attribute and outputs corresponding values from r_attribute
Essentially a column-to-column join operation
Measures and reports query execution time

CRUD Operations

`insert`

Inserts new records from CSV into existing heap file.

./insert <heap_file> <csv_file> <page_size>

Parameters:

heap_file: Target heap file (read-write mode)
csv_file: CSV file containing records to insert
page_size: Page size in bytes

`update`

Updates a specific field in a record.

./update <heap_file> <record_id> <attribute> <new_value> <page_size>

Parameters:

heap_file: Target heap file (read-write mode)
record_id: Record identifier in format <PID>/<SLOT>
attribute: Attribute index to update (0-based)
new_value: New value for the attribute
page_size: Page size in bytes

`delete`

Deletes a record from the heap file.

./delete <heap_file> <record_id> <page_size>

Parameters:

heap_file: Target heap file (read-write mode)
record_id: Record identifier in format <PID>/<SLOT>
page_size: Page size in bytes

Features:

Logical deletion using bitmap flags
Reports free slot count before and after deletion
Outputs "Free: X" messages showing slot availability

Utility Programs

`write_fixed_len_pages`

Converts CSV data to raw page files.

./write_fixed_len_pages <csv_file> <page_file> <page_size>

Parameters:

csv_file: Input CSV file path
page_file: Output raw page file path
page_size: Page size in bytes

Features:

Converts CSV records to fixed-length page format
Outputs raw page data without heap file structure
Reports record count, page count, and processing time

`read_fixed_len_page`

Reads and displays raw page files in CSV format.

./read_fixed_len_page <page_file> <page_size>

Parameters:

page_file: Input raw page file path
page_size: Page size in bytes

Features:

Reads raw page files and converts back to CSV format
Outputs records to stdout in CSV format
Reports record count, page count, and processing time

Configuration

Constants (in `library.h`)

constexpr int RECORD_SIZE = 100;  // Maximum fields per record
constexpr int FIELD_SIZE = 10;    // Maximum bytes per field

Page Size Considerations

The page size must satisfy:

page_size >= capacity * (slot_size) + capacity + sizeof(int)
capacity <= (page_size - sizeof(int)) / (slot_size + 1)

For heap files: slot_size = RECORD_SIZE * FIELD_SIZE For column stores: slot_size = 2 * FIELD_SIZE

Building

Prerequisites

GCC/G++ compiler
Make utility

Compilation

make

This will compile all programs and create the following executables:

csv2heapfile
csv2colstore
scan
select
select2
select3
insert
update
delete
write_fixed_len_pages
read_fixed_len_page

Cleaning

make clean

Usage Examples

1. Convert CSV to Heap File

./csv2heapfile data.csv data.heap 4096

2. Convert CSV to Column Store

./csv2colstore data.csv colstore/ 4096

3. Scan All Records

./scan data.heap 4096

4. Range Query on Row Store

./select data.heap 0 "A" "Z" 4096

5. Range Query on Column Store

./select2 colstore/ 0 "A" "Z" 4096

6. Insert New Records

./insert data.heap new_records.csv 4096

7. Update a Record

./update data.heap "0/5" 2 "new_value" 4096

8. Delete a Record

./delete data.heap "0/5" 4096

9. Convert CSV to Raw Pages

./write_fixed_len_pages data.csv pages.raw 4096

10. Read Raw Pages

./read_fixed_len_page pages.raw 4096

11. Column Join Query

./select3 colstore/ 0 2 "A" "Z" 4096

Performance Features

Timing Measurement: Most programs report execution time in milliseconds
Memory Management: Efficient memory allocation and deallocation
Page-level Operations: Optimized for page-based I/O
Column-oriented Optimization: Faster analytical queries on column stores
Statistics Reporting: Page and record counts for utility programs
Slot Management: Free slot tracking for deletion operations

File Formats

Heap File Format

Header: Linked-list directory structure
Data Pages: Fixed-length pages with slotted organization
Record Layout: Fixed-length fields with padding/truncation

Column Store Format

Directory Structure: One heap file per column
Column Files: Each contains {record_id, value} pairs
Optimized Layout: Efficient for column-wise access patterns

Error Handling

The system includes comprehensive error handling for:

File I/O errors
Invalid record sizes
Out-of-range attribute indices
Memory allocation failures
Page overflow conditions

Design Patterns

Iterator Pattern: RecordIterator, page_iterator, heap_iterator
Wrapper Classes: page_wrapper, heap_wrapper for safe operations
RAII: Automatic resource management for pages and records
Fixed-length Serialization: Predictable storage requirements

Limitations

Fixed record and field sizes
No indexing support
No transaction management
No concurrent access control
Limited to string-based data types

Future Enhancements

Variable-length record support
B-tree indexing
Transaction management
Concurrent access control
Additional data types
Query optimization
Compression support

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md
csv2colstore.cc		csv2colstore.cc
csv2heapfile.cc		csv2heapfile.cc
delete.cc		delete.cc
insert.cc		insert.cc
library.cc		library.cc
library.h		library.h
makefile		makefile
read_fixed_len_page.cc		read_fixed_len_page.cc
scan.cc		scan.cc
select.cc		select.cc
select2.cc		select2.cc
select3.cc		select3.cc
update.cc		update.cc
write_fixed_len_pages.cc		write_fixed_len_pages.cc

Uh oh!

Uh oh!

nguyennm1024/PageDB

Folders and files

Latest commit

History

Repository files navigation

Database Management System

Overview

Architecture

Core Components

Record Management

Page Layout

Heap File Organization

Programs

Data Conversion and Storage

csv2heapfile

csv2colstore

Database Operations

scan

select

select2

select3

CRUD Operations

insert

update

delete

Utility Programs

write_fixed_len_pages

read_fixed_len_page

Configuration

Constants (in library.h)

Page Size Considerations

Building

Prerequisites

Compilation

Cleaning

Usage Examples

1. Convert CSV to Heap File

2. Convert CSV to Column Store

3. Scan All Records

4. Range Query on Row Store

5. Range Query on Column Store

6. Insert New Records

7. Update a Record

8. Delete a Record

9. Convert CSV to Raw Pages

10. Read Raw Pages

11. Column Join Query

Performance Features

File Formats

Heap File Format

Column Store Format

Error Handling

Design Patterns

Limitations

Future Enhancements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`csv2heapfile`

`csv2colstore`

`scan`

`select`

`select2`

`select3`

`insert`

`update`

`delete`

`write_fixed_len_pages`

`read_fixed_len_page`

Constants (in `library.h`)

Packages