A C++ implementation of a database management system that supports both row-oriented (heap files) and column-oriented storage formats. This project demonstrates fundamental database concepts including record serialization, page management, and various database operations.
This project implements the storage layer of a database management system, with the following key features:
- Fixed-length record serialization with configurable field sizes
- Slotted page layout for efficient record storage
- Heap file organization with linked-list directory structure
- Column-oriented storage for analytical queries
- CRUD operations: Create, Read, Update, Delete
- Range-based selection queries
- Performance measurement capabilities
- Record Structure: Fixed-length records with configurable field sizes (default: 100 fields × 10 bytes each)
- Serialization: Fixed-length serialization scheme with padding/truncation
- Memory Management: Automatic record allocation and deallocation
- Slotted Directory: Efficient page organization with bitmap-based slot management
- Fixed-length Slots: Each slot accommodates exactly one record
- Page Capacity: Calculated based on page size and slot size constraints
- Linked-list Directory: Quick access to pages with free slots
- Page Allocation: Dynamic page allocation with growing file size
- Directory Structure:
| address: int | {address: int, free_slots: int}[] |
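The fixed-length serialization with padding/truncation listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: `serialize_record` and `read_field` are hypothetical names, and the choice of `'\0'` as the padding byte is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

constexpr int RECORD_SIZE = 100;  // fields per record (default stated in this README)
constexpr int FIELD_SIZE = 10;    // bytes per field (default stated in this README)

// Hypothetical helper: writes each field into its own FIELD_SIZE-byte cell.
// Short values are padded with '\0' (assumed padding byte); long values are truncated.
// Fields beyond RECORD_SIZE are ignored; missing fields stay all-padding.
std::vector<char> serialize_record(const std::vector<std::string>& fields) {
    std::vector<char> buf(static_cast<std::size_t>(RECORD_SIZE) * FIELD_SIZE, '\0');
    const std::size_t n = std::min<std::size_t>(fields.size(), RECORD_SIZE);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t len = std::min<std::size_t>(fields[i].size(), FIELD_SIZE);
        std::memcpy(buf.data() + i * FIELD_SIZE, fields[i].data(), len);
    }
    return buf;
}

// Hypothetical inverse: reads field i back, stopping at the first padding byte.
std::string read_field(const std::vector<char>& buf, std::size_t i) {
    const char* cell = buf.data() + i * FIELD_SIZE;
    const char* end = std::find(cell, cell + FIELD_SIZE, '\0');
    return std::string(cell, end);
}
```

Because every record occupies exactly RECORD_SIZE * FIELD_SIZE bytes, the offset of any field can be computed without scanning the record.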
Converts CSV data to row-oriented heap file format.
./csv2heapfile <csv_file> <heap_file> <page_size>
Parameters:
- csv_file: Input CSV file path
- heap_file: Output heap file path
- page_size: Page size in bytes
Features:
- Processes CSV records and stores them in heap file format
- Automatically handles page overflow and allocation
- Measures and reports processing time
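Timing of this kind is commonly done with `std::chrono`. The sketch below shows one way to measure elapsed milliseconds around a unit of work; `time_ms` is a hypothetical helper, and whether the project uses `steady_clock` or another clock is not stated here.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns its wall-clock duration
// in milliseconds, measured with a monotonic clock.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A caller would wrap the conversion loop in a lambda and print the returned value, for example `std::cout << "TIME: " << time_ms(load) << " ms\n";` (the output label is illustrative).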
Converts CSV data to column-oriented storage format.
./csv2colstore <csv_file> <directory> <page_size>
Parameters:
- csv_file: Input CSV file path
- directory: Output directory for column files
- page_size: Page size in bytes
Features:
- Splits records by columns and stores each column in separate heap files
- Creates RECORD_SIZE heap files (one per column)
- Each column file contains {record_id, value} pairs
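The column split can be pictured in memory as below. This is a sketch under assumptions: `ColumnEntry` and `split_into_columns` are hypothetical names, record ids are assumed to be the row's position in the CSV, and writing each column to its own heap file is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one column-file entry.
struct ColumnEntry {
    int record_id;      // position of the source row (assumed numbering scheme)
    std::string value;  // the field value for this column
};

// Splits row-oriented records into one vector of entries per column.
// Assumes every row has at least `num_columns` fields.
std::vector<std::vector<ColumnEntry>> split_into_columns(
        const std::vector<std::vector<std::string>>& rows, std::size_t num_columns) {
    std::vector<std::vector<ColumnEntry>> columns(num_columns);
    for (std::size_t rid = 0; rid < rows.size(); ++rid) {
        for (std::size_t col = 0; col < num_columns; ++col) {
            columns[col].push_back({static_cast<int>(rid), rows[rid][col]});
        }
    }
    return columns;
}
```

Storing the record id next to each value is what later lets a query on one column find the matching value in another.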
Scans and displays all records in a heap file.
./scan <heap_file> <page_size>
Parameters:
- heap_file: Input heap file path
- page_size: Page size in bytes
Output: CSV format to stdout
Performs range-based selection on row-oriented heap files.
./select <heap_file> <attribute> <range_start> <range_end> <page_size>
Parameters:
- heap_file: Input heap file path
- attribute: Attribute index (0-based)
- range_start: Start value for range query
- range_end: End value for range query
- page_size: Page size in bytes
Features:
- Performs range queries on specified attribute
- Outputs first 5 characters of matching values
- Measures and reports query execution time
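The selection logic over already-decoded records might look like the following sketch. `select_range` is a hypothetical name, and the lexicographic, inclusive comparison is an assumption about how the range bounds are interpreted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the first five characters of attribute `attr` for every record whose
// value lies in [start, end]. Comparison is lexicographic and inclusive on both
// ends, which is an assumption, not something this README specifies.
std::vector<std::string> select_range(const std::vector<std::vector<std::string>>& records,
                                      std::size_t attr,
                                      const std::string& start,
                                      const std::string& end) {
    std::vector<std::string> out;
    for (const auto& rec : records) {
        if (attr >= rec.size()) continue;  // skip records lacking that attribute
        const std::string& v = rec[attr];
        if (v >= start && v <= end) out.push_back(v.substr(0, 5));
    }
    return out;
}
```

On a row-oriented heap file this scan has to read every full record, even though only one attribute is compared.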
Performs range-based selection on column-oriented storage.
./select2 <directory> <attribute> <range_start> <range_end> <page_size>
Parameters:
- directory: Directory containing column files
- attribute: Attribute index (0-based)
- range_start: Start value for range query
- range_end: End value for range query
- page_size: Page size in bytes
Features:
- Optimized for column-oriented queries
- Only scans the relevant column file
- Measures and reports query execution time
Performs a join operation between two columns in column-oriented storage.
./select3 <directory> <s_attribute> <r_attribute> <range_start> <range_end> <page_size>
Parameters:
- directory: Directory containing column files
- s_attribute: Source attribute index for range query (0-based)
- r_attribute: Result attribute index to output (0-based)
- range_start: Start value for range query on s_attribute
- range_end: End value for range query on s_attribute
- page_size: Page size in bytes
Features:
- Performs range query on s_attribute and outputs corresponding values from r_attribute
- Essentially a column-to-column join operation
- Measures and reports query execution time
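One way to realize this column-to-column lookup is a hash join on record id, sketched below. The names `IdValue` and `select_project` are hypothetical, the hash map is an illustrative choice rather than the project's confirmed method, and the range comparison is assumed to be lexicographic and inclusive.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory form of a {record_id, value} pair from a column file.
struct IdValue {
    int record_id;
    std::string value;
};

// Filters s_column by [start, end] and returns the r_column values that share
// the same record_id, in the order the matches appear in s_column.
std::vector<std::string> select_project(const std::vector<IdValue>& s_column,
                                        const std::vector<IdValue>& r_column,
                                        const std::string& start,
                                        const std::string& end) {
    // Index the result column by record id for constant-time lookups.
    std::unordered_map<int, const std::string*> r_by_id;
    for (const auto& e : r_column) r_by_id[e.record_id] = &e.value;

    std::vector<std::string> out;
    for (const auto& e : s_column) {
        if (e.value < start || e.value > end) continue;  // outside the range
        const auto it = r_by_id.find(e.record_id);
        if (it != r_by_id.end()) out.push_back(*it->second);
    }
    return out;
}
```

Only two column files are touched, regardless of how many columns the table has.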
Inserts new records from CSV into existing heap file.
./insert <heap_file> <csv_file> <page_size>
Parameters:
- heap_file: Target heap file (read-write mode)
- csv_file: CSV file containing records to insert
- page_size: Page size in bytes
Updates a specific field in a record.
./update <heap_file> <record_id> <attribute> <new_value> <page_size>
Parameters:
- heap_file: Target heap file (read-write mode)
- record_id: Record identifier in format <PID>/<SLOT>
- attribute: Attribute index to update (0-based)
- new_value: New value for the attribute
- page_size: Page size in bytes
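Parsing a record identifier of the form <PID>/<SLOT> could be done as in this sketch. `RecordId` and `parse_record_id` are hypothetical names, and rejecting negative or malformed input with an exception is an assumed policy.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical parsed form of a "<PID>/<SLOT>" identifier.
struct RecordId {
    int page_id;
    int slot;
};

// Parses text such as "0/5" into {page_id = 0, slot = 5}.
// Throws std::invalid_argument when the text does not match the format.
RecordId parse_record_id(const std::string& text) {
    const std::size_t slash = text.find('/');
    if (slash == std::string::npos || slash == 0 || slash + 1 == text.size()) {
        throw std::invalid_argument("record_id must look like <PID>/<SLOT>");
    }
    const std::string pid_text = text.substr(0, slash);
    const std::string slot_text = text.substr(slash + 1);
    std::size_t used = 0;
    const int pid = std::stoi(pid_text, &used);
    if (used != pid_text.size()) throw std::invalid_argument("invalid page id");
    const int slot = std::stoi(slot_text, &used);
    if (used != slot_text.size()) throw std::invalid_argument("invalid slot");
    if (pid < 0 || slot < 0) throw std::invalid_argument("negative page id or slot");
    return {pid, slot};
}
```

With the identifier "0/5" used in this README's examples, the result addresses slot 5 of page 0.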
Deletes a record from the heap file.
./delete <heap_file> <record_id> <page_size>
Parameters:
- heap_file: Target heap file (read-write mode)
- record_id: Record identifier in format <PID>/<SLOT>
- page_size: Page size in bytes
Features:
- Logical deletion using bitmap flags
- Reports free slot count before and after deletion
- Outputs "Free: X" messages showing slot availability
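Logical deletion with occupancy flags can be illustrated as follows. This is a simplified model, not the project's page layout: `SlottedPage` is a hypothetical type that keeps only the per-slot flags and omits the record bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical page model: one occupancy flag per slot, record bytes omitted.
struct SlottedPage {
    std::vector<bool> used;

    explicit SlottedPage(std::size_t capacity) : used(capacity, false) {}

    // Number of slots currently available for new records.
    std::size_t free_slots() const {
        return static_cast<std::size_t>(std::count(used.begin(), used.end(), false));
    }

    // Logical delete: clears the flag and leaves the slot's bytes untouched.
    // Returns false when the slot is out of range or already free.
    bool erase(std::size_t slot) {
        if (slot >= used.size() || !used[slot]) return false;
        used[slot] = false;
        return true;
    }
};
```

Calling `free_slots()` before and after `erase()` yields the two counts that the "Free: X" messages report. Because only the flag changes, the freed slot can be reused by a later insert without moving other records.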
Converts CSV data to raw page files.
./write_fixed_len_pages <csv_file> <page_file> <page_size>
Parameters:
- csv_file: Input CSV file path
- page_file: Output raw page file path
- page_size: Page size in bytes
Features:
- Converts CSV records to fixed-length page format
- Outputs raw page data without heap file structure
- Reports record count, page count, and processing time
Reads and displays raw page files in CSV format.
./read_fixed_len_page <page_file> <page_size>
Parameters:
- page_file: Input raw page file path
- page_size: Page size in bytes
Features:
- Reads raw page files and converts back to CSV format
- Outputs records to stdout in CSV format
- Reports record count, page count, and processing time
constexpr int RECORD_SIZE = 100; // Maximum fields per record
constexpr int FIELD_SIZE = 10; // Maximum bytes per field
The page size must satisfy:
page_size >= capacity * (slot_size) + capacity + sizeof(int)
capacity <= (page_size - sizeof(int)) / (slot_size + 1)
For heap files: slot_size = RECORD_SIZE * FIELD_SIZE
For column stores: slot_size = 2 * FIELD_SIZE
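The capacity bound above translates directly into code. In this sketch `page_capacity` is a hypothetical helper name; the arithmetic follows the formula as stated, with one directory byte per slot and one `int` of page metadata.

```cpp
#include <cstddef>

// Largest capacity satisfying
//   page_size >= capacity * slot_size + capacity + sizeof(int).
// Returns 0 when the page cannot even hold the int header.
constexpr std::size_t page_capacity(std::size_t page_size, std::size_t slot_size) {
    return page_size < sizeof(int)
               ? 0
               : (page_size - sizeof(int)) / (slot_size + 1);
}
```

As a worked example, assuming a 4-byte `int` and a 4096-byte page: a heap file with slot_size = 100 * 10 = 1000 fits (4096 - 4) / 1001 = 4 records per page, while a column store with slot_size = 2 * 10 = 20 fits (4096 - 4) / 21 = 194 entries per page.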
- GCC/G++ compiler
- Make utility
make
This will compile all programs and create the following executables:
- csv2heapfile
- csv2colstore
- scan
- select
- select2
- select3
- insert
- update
- delete
- write_fixed_len_pages
- read_fixed_len_page
make clean
./csv2heapfile data.csv data.heap 4096
./csv2colstore data.csv colstore/ 4096
./scan data.heap 4096
./select data.heap 0 "A" "Z" 4096
./select2 colstore/ 0 "A" "Z" 4096
./insert data.heap new_records.csv 4096
./update data.heap "0/5" 2 "new_value" 4096
./delete data.heap "0/5" 4096
./write_fixed_len_pages data.csv pages.raw 4096
./read_fixed_len_page pages.raw 4096
./select3 colstore/ 0 2 "A" "Z" 4096
- Timing Measurement: Most programs report execution time in milliseconds
- Memory Management: Efficient memory allocation and deallocation
- Page-level Operations: Optimized for page-based I/O
- Column-oriented Optimization: Faster analytical queries on column stores
- Statistics Reporting: Page and record counts for utility programs
- Slot Management: Free slot tracking for deletion operations
- Header: Linked-list directory structure
- Data Pages: Fixed-length pages with slotted organization
- Record Layout: Fixed-length fields with padding/truncation
- Directory Structure: One heap file per column
- Column Files: Each contains {record_id, value} pairs
- Optimized Layout: Efficient for column-wise access patterns
The system includes comprehensive error handling for:
- File I/O errors
- Invalid record sizes
- Out-of-range attribute indices
- Memory allocation failures
- Page overflow conditions
- Iterator Pattern: RecordIterator, page_iterator, heap_iterator
- Wrapper Classes: page_wrapper, heap_wrapper for safe operations
- RAII: Automatic resource management for pages and records
- Fixed-length Serialization: Predictable storage requirements
- Fixed record and field sizes
- No indexing support
- No transaction management
- No concurrent access control
- Limited to string-based data types
- Variable-length record support
- B-tree indexing
- Transaction management
- Concurrent access control
- Additional data types
- Query optimization
- Compression support