Input / Output Operations
The bottleneck in most data pipelines is not calculation, but IO (Input/Output). PardoX solves this by moving data ingestion entirely to the Rust core, bypassing Python's slow file handling and object creation overhead.
1. CSV Files (Text Data)
PardoX features a multi-threaded CSV reader. Instead of reading line-by-line like standard Python libraries, PardoX memory-maps the file and uses parallel workers to parse chunks simultaneously.
Basic Usage
import pardox as px
# Automatically detects headers and infers schema
df = px.read_csv("dataset.csv")
How it works
Intelligent Type Inference
The engine scans the first N rows to determine if a column is Integer, Float, or String.
Parallel Parsing
The file is split into logical blocks, and multiple CPU cores parse them concurrently.
Fail-Fast Error Handling
If a row is malformed, PardoX will report the error immediately rather than silently corrupting data.
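The sketch below ties these three behaviors together. The keyword arguments infer_schema_rows and n_threads, and the use of ValueError for parse failures, are illustrative assumptions rather than confirmed API.
import pardox as px

# Hypothetical tuning knobs (names are assumptions, not confirmed API):
# - infer_schema_rows: how many leading rows the type-inference scan reads
# - n_threads: how many parallel workers parse the memory-mapped chunks
try:
    df = px.read_csv(
        "dataset.csv",
        infer_schema_rows=1000,
        n_threads=8,
    )
except ValueError as err:
    # Fail-fast: a malformed row surfaces here instead of silently
    # corrupting the resulting DataFrame (exception type is an assumption)
    print(f"CSV parse failed: {err}")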
2. Native SQL (Database Ingestion)
Load data directly from SQL databases without Python drivers (like psycopg2) or toolkits (like SQLAlchemy) as intermediaries. PardoX connects to the database at the Rust level, fetches the binary result stream, and constructs the DataFrame in memory.
Usage
The read_sql function requires a standard connection string and a SQL query.
import pardox as px

# Format: postgres://user:password@host:port/database
conn_str = "postgres://admin:secret@localhost:5432/analytics_db"
query = """
SELECT id, amount, date
FROM sales
WHERE region = 'US-West'
"""
# Executes query and returns PardoX DataFrame
df = px.read_sql(conn_str, query)
Performance Note
This method is significantly faster than pandas.read_sql because it avoids materializing each SQL value as a Python object (PyObject) before converting it again into internal arrays.
3. The Native Format (.prdx)
The PRDX format is the native binary representation of a PardoX HyperBlock. It is designed for instant persistence.
Key Features
- No Serialization Overhead: Unlike CSV or JSON, saving to .prdx is effectively a direct memory dump to disk.
- Memory Mapping: Reading a .prdx file leverages OS-level memory mapping, allowing near-instant access to data without CPU-intensive parsing.
Saving to Disk
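A minimal sketch; the method name to_prdx is an assumption for illustration, not confirmed API.
import pardox as px

df = px.read_csv("dataset.csv")

# Hypothetical writer method (name is an assumption): dumps the
# in-memory HyperBlock directly to disk with no serialization step.
df.to_prdx("dataset.prdx")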
Loading from Disk
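Again a sketch; read_prdx is an assumed function name.
# Hypothetical reader (name is an assumption): memory-maps the file,
# so the OS pages data in lazily instead of parsing it up front.
df = px.read_prdx("dataset.prdx")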
Benchmark
In tests with 10 GB datasets, reading a .prdx file achieves a throughput of 4.6 GB/s, limited only by the speed of the NVMe SSD.
4. Apache Arrow Bridge
PardoX is designed to play well with others. If you have data in PyArrow, you can convert it to a PardoX DataFrame with zero copies: only memory pointers are passed.
import pyarrow as pa
import pardox as px
# Assuming you have a PyArrow Table
arrow_table = pa.Table.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
# Convert to PardoX DataFrame
df = px.from_arrow(arrow_table)
Interoperability
This bridge allows seamless integration with the Arrow ecosystem, including Polars, DuckDB, and Apache Spark.
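As a sketch of the reverse direction, assuming PardoX exposes a hypothetical to_arrow() method, the same DataFrame could be handed to Polars without copying:
import polars as pl

# Hypothetical export (to_arrow is an assumption): hand the same
# memory to Polars via the Arrow bridge, again without copying.
pl_df = pl.from_arrow(df.to_arrow())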