Data analytics and reporting specialist that reads raw data files (CSV, Excel, SQL query results, JSON), calculates summary statistics, identifies trends, builds charts and dashboards, tracks KPIs, and produces structured reports. Use when the user asks to analyze a dataset, create a dashboard, compute metrics or KPIs, generate bar/line/pie charts, run statistical summaries, perform A/B test analysis, segment customers, measure campaign performance, or produce a data-driven business report from spreadsheet or database data.
87
84%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
Follow these steps for any data analysis request:
# Remove outliers (IQR method)
Q1, Q3 = df["value"].quantile([0.25, 0.75])
IQR = Q3 - Q1
df = df[df["value"].between(Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)]
# Parse dates with coercion (silently converts unparseable values to NaT)
df["date"] = pd.to_datetime(df["date"], errors="coerce")# Period-over-period growth
prev_period = df[df["period"] == "Q3"]["revenue"].sum()
curr_period = df[df["period"] == "Q4"]["revenue"].sum()
growth_pct = (curr_period - prev_period) / prev_period * 100
# Conversion rate
conversion_rate = df[df["converted"] == True].shape[0] / df.shape[0] * 100
# Customer segmentation by spend
df["segment"] = pd.cut(
df["total_spend"],
bins=[0, 100, 500, float("inf")],
labels=["Low", "Mid", "High"]
)
segment_summary = df.groupby("segment")["total_spend"].agg(["count", "sum", "mean"])-- Period-over-period KPI
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS total_revenue,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(revenue) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1
ORDER BY 1;
-- Cohort retention skeleton
SELECT
first_order_month,
DATE_DIFF(order_month, first_order_month, MONTH) AS months_since_first,
COUNT(DISTINCT customer_id) AS retained_customers
FROM cohort_base
GROUP BY 1, 2
ORDER BY 1, 2;from scipy import stats
control = df[df["variant"] == "control"]["conversion"]
treatment = df[df["variant"] == "treatment"]["conversion"]
t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"Control mean: {control.mean():.4f}")
print(f"Treatment mean: {treatment.mean():.4f}")
print(f"Lift: {(treatment.mean() - control.mean()) / control.mean() * 100:.2f}%")
print(f"p-value: {p_value:.4f} ({'significant' if p_value < 0.05 else 'not significant'} at 95% CI)")Always save charts with consistent settings so outputs are report-ready:
import matplotlib.pyplot as plt
# Use tight_layout and explicit dpi for all saved charts
plt.tight_layout()
plt.savefig("chart_name.png", dpi=150)
# Always label axes with units and give titles that describe the insight, not just the data
plt.xlabel("Date"); plt.ylabel("Revenue ($)")
plt.title("Weekly Revenue Declined 12% in Q4") # insight-first, not "Revenue Over Time"## [Report Title] — [Date Range]
### Key Findings
- [Metric 1]: [Value] ([+/-X%] vs prior period)
- [Metric 2]: [Value] ([+/-X%] vs prior period)
- [Metric 3]: [Value] ([+/-X%] vs prior period)
### Trend Analysis
[2–3 sentences describing the dominant trend and its likely drivers.]
### Segments / Breakdown
| Segment | Value | Share | vs Prior |
|---------|-------|-------|----------|
| ... | ... | ... | ... |
### Recommendation
[One concrete action supported directly by the data above.]Dashboard: [Name]
Filters: Date range picker | Region dropdown | Segment multi-select
Row 1 — KPI Tiles (4 across)
• Total Revenue | MoM Growth % | Active Users | Conversion Rate
Row 2 — Primary Chart
• Line chart: Daily/Weekly [Metric] over selected date range, with prior-period overlay
Row 3 — Breakdown Charts (2 across)
• Bar chart: [Metric] by [Dimension]
• Pie/Donut: Share by [Segment]
Row 4 — Data Table
• Sortable table: [Dimension] | [Metric A] | [Metric B] | Change %
Refresh cadence: [Daily / Weekly]
Data source: [Table / file name]| Scenario | Recommended Chart |
|---|---|
| Trend over time | Line chart |
| Comparing categories | Bar / horizontal bar chart |
| Part-to-whole share | Pie or stacked bar |
| Distribution of values | Histogram or box plot |
| Two-variable relationship | Scatter plot |
| Multiple KPIs at a glance | KPI tiles / scorecard |
| Ranking | Sorted horizontal bar |
Before delivering any analysis, confirm:
010799b
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.