Distributed Queries¶

How queries are distributed and executed across nodes.

Query Execution Flow¶

graph TB
    Q[Query] --> P[Query Planner]
    P --> S[Shard Selection]
    S --> N1[Node 1: Partition A]
    S --> N2[Node 2: Partition B]
    S --> N3[Node 3: Partition C]
    N1 --> A[Aggregator]
    N2 --> A
    N3 --> A
    A --> R[Final Result]

Query Types¶

Type	Distribution	Example
Point	Single node	`get_account(pubkey)`
Range	Multiple nodes	`get_transactions(start, end)`
Scatter-Gather	All nodes	`search(pattern)`

Query Planning¶

pub struct QueryPlan {
    pub sub_queries: Vec<SubQuery>,
    pub aggregation: AggregationType,
    pub estimated_cost: u64,
}

impl QueryPlanner {
    pub fn plan(&self, query: &Query) -> QueryPlan {
        match query.query_type {
            QueryType::PointLookup { key } => {
                let node = self.ring.get_node(&key);
                QueryPlan::single(node, query)
            }
            QueryType::RangeScan { start, end } => {
                let nodes = self.ring.get_range_nodes(start, end);
                QueryPlan::parallel(nodes, query)
            }
            QueryType::FullScan => {
                QueryPlan::scatter_gather(self.all_nodes(), query)
            }
        }
    }
}

Parallel Execution¶

pub async fn execute_parallel(&self, plan: QueryPlan) -> Result<QueryResult> {
    let futures: Vec<_> = plan.sub_queries
        .iter()
        .map(|sq| self.execute_sub_query(sq))
        .collect();

    let results = futures::future::join_all(futures).await;

    self.aggregate(results, plan.aggregation)
}

Result Aggregation¶

Operation	Method
`COUNT`	Sum partial counts
`SUM`	Sum partial sums
`AVG`	Weighted average
`MIN/MAX`	Compare partials
`LIMIT`	Merge and truncate

Performance Tips¶

Use point queries when possible
Add filters to reduce data transfer
Limit result size with pagination
Cache frequent queries client-side