DataHub Python Builds

These prebuilt wheel files can be used to install our Python packages as of a specific commit.

Build context

Built at 2026-07-02T13:25:13.077566+00:00.

{
  "timestamp": "2026-07-02T13:25:13.077566+00:00",
  "branch": "cr-timerseries-agg-batch",
  "commit": {
    "hash": "68c311883525abd2c3602ab303823ca81676d1d7",
    "message": "perf(graphql): batch-load Dashboard.statsSummary via timeseries aggregation API\n\nAdds `TimeseriesAspectService.batchGetAggregatedStats()` and the backing\n`ESAggregatedStatsDAO.getBatchAggregatedStats()` to execute a single\nOpenSearch request for up to N URNs at once, using an outer\n`terms(\"batch_urn_outer\")` aggregation keyed by URN.\n\nPreviously the DAO had no batch path; every caller fired one ES request\nper entity, and per-entity aggregation queries had two hardcoded limits:\n\n- Bucket sort order was always ascending `_key` for STRING grouping\n  buckets \u2014 there was no way to request top-N by metric value.\n- Bucket size was always `MAX_TERM_BUCKETS` (1,440 for 24 h \u00d7 60 min),\n  forcing ES to materialise all user buckets so the caller could sort\n  and trim client-side.\n\nNew additions:\n\n- `GroupingBucket.pdl`: adds optional `size`, `orderByMetric`, and\n  `ascending` fields so callers can express \"top-5 by metric DESC\"\n  directly in the query rather than fetching everything and discarding.\n- `ESAggregatedStatsDAO.makeGroupingAggregationBuilder`: respects the\n  new PDL fields \u2014 STRING buckets use `_key` order by default but switch\n  to metric-ordered `terms` when `orderByMetric=true`; `size` caps the\n  bucket count at the call site instead of always defaulting to\n  MAX_TERM_BUCKETS.\n- `ElasticSearchTimeseriesAspectService.batchGetAggregatedStats`:\n  sub-batches the URN list by `TimeseriesAspectServiceConfig.BatchAggConfig\n  .maxUrnsPerBatch` (default 50) and delegates each sub-batch to the DAO.\n  When the feature flag is off the method falls back to the existing\n  per-URN `getAggregatedStats` default interface method, making the\n  change backward-compatible.\n\n`DashboardStatsSummaryResolver` previously shared the same utility path\nas the `DashboardUsageStatsResolver` detail view (`getUserUsageCounts`\nfrom `DashboardUsageStatsUtils`):\n\n- It fired two separate ES queries per dashboard (one `getAspectValues`\n  for viewCount, one `getAggregatedStats` for users).\n- The user query fetched all users with six aggregation specs\n  (SUM + CARDINALITY for usageCount, viewsCount, executionsCount) using\n  a STRING grouping bucket with no size cap \u2014 returning every user ever\n  seen for that dashboard.\n- `uniqueUserCountLast30Days` was computed as `userUsageCounts.size()`,\n  a client-side count of the returned buckets.\n- Top-5 users were selected by client-side sort of the full list, with\n  the rest discarded.\n- On a search results page showing N dashboards this meant 2 \u00d7 N\n  concurrent ES requests, each potentially materialising thousands of\n  user buckets.\n\nThe new `DashboardStatsSummaryBatchLoader` (DataLoader pattern) fires\nexactly three query types for all N dashboards in a single GraphQL\nrequest, each sub-batched into groups of \u2264 50 URNs:\n\n  A. `batchGetAspectValues` (no time window, limit=1) \u2192 viewCount\n  B. `batchGetAggregatedStats` CARDINALITY on `userCounts.user` \u2192\n     uniqueUserCountLast30Days (ES-native cardinality, exact regardless\n     of user volume)\n  C. `batchGetAggregatedStats` SUM on `userCounts.usageCount`, STRING\n     grouping size=5 orderByMetric=true ascending=false \u2192\n     topUsersLast30Days already ranked by ES, no client sort needed\n\n`DashboardStatsSummaryResolver` is gated by the existing\n`timeseriesAspectAggBatchLoadEnabled` feature flag (env var\n`TIMESERIES_ASPECT_AGG_BATCH_LOAD_ENABLED`, default true); when disabled\nit falls back to the old per-URN path unchanged.\n\nDetail pages that fetch statsSummary for a single dashboard at a time\nare unaffected \u2014 the old path is still used there because there is no\nfan-out to batch. The fan-out reduction applies to search and browse\nresult pages that render many dashboards simultaneously.\n\nA smoke test (`test_stats_summary_graphql.py`) is added that seeds 3\ndashboards and asserts result correctness. Parity between the batch path\nand the per-URN fallback path was verified manually by running the test\nwith the feature flag both enabled and disabled.\n\nNote: the unbatched fallback path in this resolver and the feature flag\ngating it will be removed in a follow-up PR once the batch path has\nbeen validated in production, making the batch loader the permanent\nimplementation.\n\nNote: a further follow-up PR will apply the same batch-loader pattern to\nDatasetStatsSummaryResolver and ChartStatsSummaryResolver. The timeseries\nservice and DAO changes in Part 1 require no further modification to\nsupport those resolvers.\n\nCo-Authored-By: Claude Sonnet 4.6 "
  },
  "base": {
    "hash": "70c1d041f56b0463b114a18d628ea0bf19da4264",
    "message": "feat(teradata): report missing tables from cache during column extraction (#18096)"
  },
  "pr": {
    "number": 18131,
    "title": "perf(graphql): batch-load Dashboard.statsSummary via timeseries aggregation API",
    "url": "https://github.com/datahub-project/datahub/pull/18131"
  }
}
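The commit message above describes two things. The first is splitting the dashboard URN list into sub-batches of at most `maxUrnsPerBatch` (default 50). The second is having ES rank the top users instead of sorting them client-side. The Python sketch below illustrates both ideas under stated assumptions:

- The real implementation is Java (`ElasticSearchTimeseriesAspectService`, `ESAggregatedStatsDAO`), and every function name here is hypothetical.
- The batched request count assumes one ES request per query type per sub-batch.
- The OpenSearch field paths (`urn`, `userCounts.user`, `userCounts.usageCount`) and a non-nested mapping are assumptions.
- The 30-day time-window filter is omitted.

```python
import math
from typing import Any, Dict, List

# Default sub-batch size quoted in the commit message (maxUrnsPerBatch).
MAX_URNS_PER_BATCH = 50
# Query types A (viewCount), B (unique users), C (top users) from the commit message.
QUERY_TYPES = 3


def chunk_urns(urns: List[str], size: int = MAX_URNS_PER_BATCH) -> List[List[str]]:
    """Split the URN list into consecutive sub-batches of at most `size` URNs."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [urns[i : i + size] for i in range(0, len(urns), size)]


def request_counts(n_dashboards: int, size: int = MAX_URNS_PER_BATCH) -> Dict[str, int]:
    """Estimate ES requests for N dashboards on each path.

    Per-URN path: 2 queries per dashboard (viewCount + users).
    Batched path: assumes one request per query type per sub-batch.
    """
    return {
        "per_urn": 2 * n_dashboards,
        "batched": QUERY_TYPES * math.ceil(n_dashboards / size),
    }


def top_users_request(batch: List[str], top_n: int = 5) -> Dict[str, Any]:
    """Hypothetical OpenSearch body for query C on one sub-batch.

    An outer terms aggregation keyed by URN (named as in the commit message)
    wraps an inner terms aggregation that ES orders by summed usage,
    descending, capped at `top_n` buckets.
    """
    return {
        "size": 0,  # only aggregations are needed, not hits
        "query": {"bool": {"filter": [{"terms": {"urn": batch}}]}},
        "aggs": {
            "batch_urn_outer": {
                "terms": {"field": "urn", "size": max(1, len(batch))},
                "aggs": {
                    "top_users": {
                        "terms": {
                            "field": "userCounts.user",
                            "size": top_n,
                            "order": {"usage_sum": "desc"},
                        },
                        "aggs": {
                            "usage_sum": {"sum": {"field": "userCounts.usageCount"}}
                        },
                    }
                },
            }
        },
    }


urns = [f"urn:li:dashboard:(looker,d{i})" for i in range(120)]
print([len(b) for b in chunk_urns(urns)])  # [50, 50, 20]
print(request_counts(len(urns)))  # {'per_urn': 240, 'batched': 9}
```

For 120 dashboards this gives 240 requests on the per-URN path versus 9 on the batched path. If `userCounts` is mapped as a `nested` type, the inner terms aggregation would need to be wrapped in a `nested` aggregation.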

Usage

Current base URL: unknown. Replace <base-url> in the install commands below with the URL these artifacts are served from.

Package                       Size      Install command
acryl-datahub                 4.299 MB  uv pip install 'acryl-datahub @ <base-url>/artifacts/wheels/acryl_datahub-0.0.0.dev1-py3-none-any.whl'
acryl-datahub-actions         0.116 MB  uv pip install 'acryl-datahub-actions @ <base-url>/artifacts/wheels/acryl_datahub_actions-0.0.0.dev1-py3-none-any.whl'
acryl-datahub-airflow-plugin  0.072 MB  uv pip install 'acryl-datahub-airflow-plugin @ <base-url>/artifacts/wheels/acryl_datahub_airflow_plugin-0.0.0.dev1-py3-none-any.whl'
acryl-datahub-dagster-plugin  0.021 MB  uv pip install 'acryl-datahub-dagster-plugin @ <base-url>/artifacts/wheels/acryl_datahub_dagster_plugin-0.0.0.dev1-py3-none-any.whl'
acryl-datahub-gx-plugin       0.011 MB  uv pip install 'acryl-datahub-gx-plugin @ <base-url>/artifacts/wheels/acryl_datahub_gx_plugin-0.0.0.dev1-py3-none-any.whl'
prefect-datahub               0.011 MB  uv pip install 'prefect-datahub @ <base-url>/artifacts/wheels/prefect_datahub-0.0.0.dev1-py3-none-any.whl'
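Once the base URL is known, the install commands in the table can be generated instead of edited by hand. This is a minimal Python sketch, assuming every wheel keeps the `<name>-0.0.0.dev1-py3-none-any.whl` pattern shown above. The helper names and the `https://example.com/builds` host are placeholders and not part of this build.

```python
from typing import List

# Version and tag shared by every wheel filename in the table.
WHEEL_VERSION = "0.0.0.dev1"
WHEEL_TAG = "py3-none-any"

PACKAGES: List[str] = [
    "acryl-datahub",
    "acryl-datahub-actions",
    "acryl-datahub-airflow-plugin",
    "acryl-datahub-dagster-plugin",
    "acryl-datahub-gx-plugin",
    "prefect-datahub",
]


def wheel_filename(package: str, version: str = WHEEL_VERSION) -> str:
    """Wheel filenames use '_' where the distribution name has '-'."""
    return f"{package.replace('-', '_')}-{version}-{WHEEL_TAG}.whl"


def install_command(package: str, base_url: str) -> str:
    """Build a `uv pip install` command using a PEP 508 direct reference."""
    url = f"{base_url.rstrip('/')}/artifacts/wheels/{wheel_filename(package)}"
    return f"uv pip install '{package} @ {url}'"


if __name__ == "__main__":
    # Placeholder host; substitute the real base URL.
    for pkg in PACKAGES:
        print(install_command(pkg, "https://example.com/builds"))
```

Each command uses the `name @ url` direct-reference form, so the name before `@` must match the wheel's distribution name.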