Skip to content

nguyennm1024/PageDB

Repository files navigation

Database Management System

A C++ implementation of a database management system that supports both row-oriented (heap files) and column-oriented storage formats. This project demonstrates fundamental database concepts including record serialization, page management, and various database operations.

Overview

This project implements a complete database management system with the following key features:

  • Fixed-length record serialization with configurable field sizes
  • Slotted page layout for efficient record storage
  • Heap file organization with linked-list directory structure
  • Column-oriented storage for analytical queries
  • CRUD operations: Create, Read, Update, Delete
  • Range-based selection queries
  • Performance measurement capabilities

Architecture

Core Components

Record Management

  • Record Structure: Fixed-length records with configurable field sizes (default: 100 fields Ă— 10 bytes each)
  • Serialization: Fixed-length serialization scheme with padding/truncation
  • Memory Management: Automatic record allocation and deallocation

Page Layout

  • Slotted Directory: Efficient page organization with bitmap-based slot management
  • Fixed-length Slots: Each slot accommodates exactly one record
  • Page Capacity: Calculated based on page size and slot size constraints

Heap File Organization

  • Linked-list Directory: Quick access to pages with free slots
  • Page Allocation: Dynamic page allocation with growing file size
  • Directory Structure: | address: int | {address: int, free_slots: int}[] |

Programs

Data Conversion and Storage

csv2heapfile

Converts CSV data to row-oriented heap file format.

./csv2heapfile <csv_file> <heap_file> <page_size>

Parameters:

  • csv_file: Input CSV file path
  • heap_file: Output heap file path
  • page_size: Page size in bytes

Features:

  • Processes CSV records and stores them in heap file format
  • Automatically handles page overflow and allocation
  • Measures and reports processing time

csv2colstore

Converts CSV data to column-oriented storage format.

./csv2colstore <csv_file> <directory> <page_size>

Parameters:

  • csv_file: Input CSV file path
  • directory: Output directory for column files
  • page_size: Page size in bytes

Features:

  • Splits records by columns and stores each column in separate heap files
  • Creates RECORD_SIZE number of heap files (one per column)
  • Each column file contains {record_id, value} pairs

Database Operations

scan

Scans and displays all records in a heap file.

./scan <heap_file> <page_size>

Parameters:

  • heap_file: Input heap file path
  • page_size: Page size in bytes

Output: CSV format to stdout

select

Performs range-based selection on row-oriented heap files.

./select <heap_file> <attribute> <range_start> <range_end> <page_size>

Parameters:

  • heap_file: Input heap file path
  • attribute: Attribute index (0-based)
  • range_start: Start value for range query
  • range_end: End value for range query
  • page_size: Page size in bytes

Features:

  • Performs range queries on specified attribute
  • Outputs first 5 characters of matching values
  • Measures and reports query execution time

select2

Performs range-based selection on column-oriented storage.

./select2 <directory> <attribute> <range_start> <range_end> <page_size>

Parameters:

  • directory: Directory containing column files
  • attribute: Attribute index (0-based)
  • range_start: Start value for range query
  • range_end: End value for range query
  • page_size: Page size in bytes

Features:

  • Optimized for column-oriented queries
  • Only scans the relevant column file
  • Measures and reports query execution time

select3

Performs a join operation between two columns in column-oriented storage.

./select3 <directory> <s_attribute> <r_attribute> <range_start> <range_end> <page_size>

Parameters:

  • directory: Directory containing column files
  • s_attribute: Source attribute index for range query (0-based)
  • r_attribute: Result attribute index to output (0-based)
  • range_start: Start value for range query on s_attribute
  • range_end: End value for range query on s_attribute
  • page_size: Page size in bytes

Features:

  • Performs range query on s_attribute and outputs corresponding values from r_attribute
  • Essentially a column-to-column join operation
  • Measures and reports query execution time

CRUD Operations

insert

Inserts new records from CSV into existing heap file.

./insert <heap_file> <csv_file> <page_size>

Parameters:

  • heap_file: Target heap file (read-write mode)
  • csv_file: CSV file containing records to insert
  • page_size: Page size in bytes

update

Updates a specific field in a record.

./update <heap_file> <record_id> <attribute> <new_value> <page_size>

Parameters:

  • heap_file: Target heap file (read-write mode)
  • record_id: Record identifier in format <PID>/<SLOT>
  • attribute: Attribute index to update (0-based)
  • new_value: New value for the attribute
  • page_size: Page size in bytes

delete

Deletes a record from the heap file.

./delete <heap_file> <record_id> <page_size>

Parameters:

  • heap_file: Target heap file (read-write mode)
  • record_id: Record identifier in format <PID>/<SLOT>
  • page_size: Page size in bytes

Features:

  • Logical deletion using bitmap flags
  • Reports free slot count before and after deletion
  • Outputs "Free: X" messages showing slot availability

Utility Programs

write_fixed_len_pages

Converts CSV data to raw page files.

./write_fixed_len_pages <csv_file> <page_file> <page_size>

Parameters:

  • csv_file: Input CSV file path
  • page_file: Output raw page file path
  • page_size: Page size in bytes

Features:

  • Converts CSV records to fixed-length page format
  • Outputs raw page data without heap file structure
  • Reports record count, page count, and processing time

read_fixed_len_page

Reads and displays raw page files in CSV format.

./read_fixed_len_page <page_file> <page_size>

Parameters:

  • page_file: Input raw page file path
  • page_size: Page size in bytes

Features:

  • Reads raw page files and converts back to CSV format
  • Outputs records to stdout in CSV format
  • Reports record count, page count, and processing time

Configuration

Constants (in library.h)

constexpr int RECORD_SIZE = 100;  // Maximum fields per record
constexpr int FIELD_SIZE = 10;    // Maximum bytes per field

Page Size Considerations

The page size must satisfy:

page_size >= capacity * (slot_size) + capacity + sizeof(int)
capacity <= (page_size - sizeof(int)) / (slot_size + 1)

For heap files: slot_size = RECORD_SIZE * FIELD_SIZE For column stores: slot_size = 2 * FIELD_SIZE

Building

Prerequisites

  • GCC/G++ compiler
  • Make utility

Compilation

make

This will compile all programs and create the following executables:

  • csv2heapfile
  • csv2colstore
  • scan
  • select
  • select2
  • select3
  • insert
  • update
  • delete
  • write_fixed_len_pages
  • read_fixed_len_page

Cleaning

make clean

Usage Examples

1. Convert CSV to Heap File

./csv2heapfile data.csv data.heap 4096

2. Convert CSV to Column Store

./csv2colstore data.csv colstore/ 4096

3. Scan All Records

./scan data.heap 4096

4. Range Query on Row Store

./select data.heap 0 "A" "Z" 4096

5. Range Query on Column Store

./select2 colstore/ 0 "A" "Z" 4096

6. Insert New Records

./insert data.heap new_records.csv 4096

7. Update a Record

./update data.heap "0/5" 2 "new_value" 4096

8. Delete a Record

./delete data.heap "0/5" 4096

9. Convert CSV to Raw Pages

./write_fixed_len_pages data.csv pages.raw 4096

10. Read Raw Pages

./read_fixed_len_page pages.raw 4096

11. Column Join Query

./select3 colstore/ 0 2 "A" "Z" 4096

Performance Features

  • Timing Measurement: Most programs report execution time in milliseconds
  • Memory Management: Efficient memory allocation and deallocation
  • Page-level Operations: Optimized for page-based I/O
  • Column-oriented Optimization: Faster analytical queries on column stores
  • Statistics Reporting: Page and record counts for utility programs
  • Slot Management: Free slot tracking for deletion operations

File Formats

Heap File Format

  • Header: Linked-list directory structure
  • Data Pages: Fixed-length pages with slotted organization
  • Record Layout: Fixed-length fields with padding/truncation

Column Store Format

  • Directory Structure: One heap file per column
  • Column Files: Each contains {record_id, value} pairs
  • Optimized Layout: Efficient for column-wise access patterns

Error Handling

The system includes comprehensive error handling for:

  • File I/O errors
  • Invalid record sizes
  • Out-of-range attribute indices
  • Memory allocation failures
  • Page overflow conditions

Design Patterns

  • Iterator Pattern: RecordIterator, page_iterator, heap_iterator
  • Wrapper Classes: page_wrapper, heap_wrapper for safe operations
  • RAII: Automatic resource management for pages and records
  • Fixed-length Serialization: Predictable storage requirements

Limitations

  • Fixed record and field sizes
  • No indexing support
  • No transaction management
  • No concurrent access control
  • Limited to string-based data types

Future Enhancements

  • Variable-length record support
  • B-tree indexing
  • Transaction management
  • Concurrent access control
  • Additional data types
  • Query optimization
  • Compression support

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published