Back to Blog

Making sense of social attribution

7 min readWired Sixth Intelligence
#data-engineering#aws-glue#s3#marketing-ops#quadrant-analysis

Making sense of social attribution

The attribution gap we didn't see coming

We had a marketing team watching posts go viral. Engagement numbers looked great. But when asked "Which posts actually drove revenue," we had no idea.

That is the problem with most social media dashboards: they are islands. Likes, shares, reach, sentiment, each metric lives in its own corner of the internet, and none of them talk to each other. You end up making decisions based on whatever number looks best that week.

Here's the ugly truth: our performance data sat in Looker with strict access controls. Our social sentiment lived in Sprinklr, exported as CSVs every morning. Different teams, different tools, different stories.

So we built something to bridge the gap. Not perfect automation, pragmatic workarounds that actually work in a messy enterprise environment.

Two metrics, one picture

Two metrics drive this analysis, and it helps to define them upfront.

  • First streams: The number of unique users who consume a piece of content for the first time. It captures the initial engagement moment — the first interaction a user has with a title.
  • Account reach: The total number of unique accounts exposed to content, whether through organic discovery, paid promotion, or owned channels.

Together they tell a story: first streams tells you which content hooks new viewers, account reach tells you how far your distribution goes.

Making sense of the mess: the quadrant

Before touching any code, we needed a framework marketers could actually use. We plot normalized first streams against normalized mentions, and suddenly the chaos has a pattern.

What the quadrants actually mean:

  • High Performers (High first streams, high mentions): Double down. These posts are your winners.
  • Platform Push (High first streams, low mentions): The message is spreading but people aren't talking about it. Consider adjusting the content.
  • Leave It (Low first streams, low mentions): Content isn't resonating. Kill it or fix it.
  • Underdogs (Low first streams, high mentions): Your audience loves it, but not enough people see it. Amplify these.

The stack we chose

We went hybrid: serverless for heavy historical data, PostgreSQL for fast-moving cache.

ComponentWhy we picked it
AirflowPython-based orchestration, industry standard
Sprinklr exportsDaily CSVs give us cross-account mentions
StreamlitFast Python web apps stakeholders can actually use
PostgreSQLFast caching for uploaded CSVs (first streams & account reach)
Amazon S3Cheap storage for messy, multi-format data
AWS GlueETL without paying for a database we don't need
AthenaQuery layer for Sprinklr mentions data

The pattern: Sprinklr data (large, historical) goes S3 → Glue → Athena. Uploaded CSVs (small, frequently refreshed) go Streamlit → PostgreSQL. The dashboard joins them on the fly.

How the Pipeline Fits Together

Getting the data in

Sprinklr data goes into S3 first, then through Glue ETL, into Athena. The scheduled CSV exports from Sprinklr land here, automated, reliable, queryable.

First Streams & Account Reach takes a different path. Marketers upload CSVs directly through Streamlit, which stores them in PostgreSQL. The data gets cached, refreshed on demand, and stays close to where people actually work.

Cataloging the chaos

AWS Glue lets us treat S3 folders like SQL tables without actually running a database. Schema-on-read means we don't pay thousands for infrastructure that changes every time a social platform tweaks their export format.

The actual logic

The heavy lifting happens when data meets. Sprinklr data lives in Athena (S3-backed). First Streams & Account Reach lives in PostgreSQL (Streamlit cache). The join happens on the fly when someone queries the dashboard.

# Join Athena (Sprinklr mentions) with PostgreSQL (cached first streams data)
def correlate_intelligence():
    """
    Fetches Sprinklr mentions from Athena, cached first streams from PostgreSQL.
    Joins and normalizes on the fly.
    """
    # Query Sprinklr mentions from Athena
    mentions_df = query_athena(
        "SELECT * FROM social_mentions WHERE date >= CURRENT_DATE - 7"
    )
    
    # Fetch cached first streams data from PostgreSQL
    streams_df = query_postgres("SELECT * FROM first_streams_cache")
    
    # Join on date/account
    joined = mentions_df.merge(streams_df, on=['date', 'account'])
    
    # Normalize and categorize
    norm_streams = joined['first_streams'] / joined['first_streams'].max()
    norm_mentions = joined['mentions'] / joined['mentions'].max()
    
    # Categorize based on median performance
    if norm_streams > median_streams:
        return "High Performers" if norm_mentions > median_mentions else "Platform Push"
    else:
        return "Leave It" if norm_mentions > median_mentions else "Underdogs"

The key insight: We don't move all the data into one place. We query where it lives and join on the fly. PostgreSQL handles the fast-changing first streams data; Athena handles the historical Sprinklr mentions. The dashboard stitches them together when you need them.

Streamlit makes it actionable

This is where it clicks. Our Streamlit dashboard is designed for action. Marketers hover over dots and immediately see which campaign gets more budget versus which one needs a PR fix.

Upload flow: CSVs uploaded through Streamlit land in PostgreSQL, cached for fast queries. The cache refreshes when needed, not on a rigid schedule, so teams can upload fresh data whenever they want.

Query flow: When someone opens the dashboard, it queries Athena for Sprinklr mentions and pulls from PostgreSQL for cached first streams. The join happens in real-time, and results populate instantly.

What we found

High Performers (High first streams, high mentions): These are your winners. Pour more fuel on the fire.

Platform Push (High first streams, low mentions): The message is spreading but people aren't talking about it. Consider adjusting the content.

Leave It (Low first streams, low mentions): Content isn't resonating. Kill it or fix it.

Underdogs (Low first streams, high mentions): Your audience loves it, but not enough people see it. Amplify these.

What changed

MetricBeforeAfter
Response timeDays of manual reconciliationNear real-time
Data ownershipVendor lock-inOwned logic
Team alignmentSiloed teams, different storiesOne view, one conversation
DecisionsGut feeling, vanity metricsCorrelation-based, data-backed

What we learned

  1. Solve for politics first. Perfect automation is a myth in enterprise settings. Build for the "semi-automated" reality where humans still click buttons and email CSVs.

  2. Own your logic. Don't let some BI tool decide what "successful" looks like. Define your own quadrants based on what actually matters to your business.

  3. Keep it serverless. S3, Glue, and Athena give you a stack that's cheap to run and scales when you need it. No database maintenance. No hardware upgrades.

The point

This pipeline turns social media data from isolated metrics into actual insights. Marketing teams stop guessing and start knowing.

The goal isn't just data. It's clarity that moves the needle.

Thanks

This project started with late-night brainstorms about marketing bottlenecks. Haikal (Head of Digital) pushed us to bridge the attribution gap. Jerry (Social Media Manager) gave feedback, aligned the required data, and helped shape every decision. Justin, another intern, bootstrapped the sprinkler data pipeline before Bernice took over on the refinement and build.


This article is part of Wired Sixth's "Intel" series on data strategy and engineering.

Comments (0)

Loading comments...