Stack Overflow:

100 Million Uniques / Month on 9 Web Servers

David Haney | hachyderm.io/@haney

David Haney

  • Sr. Director of Eng at Stack Overflow by day
  • Craft beer, movie, and gaming nerd by night
  • Creator: codesession.io
  • 8+ years @ Stack Overflow
  • Joined Stack as Developer

LET'S GO BACK IN TIME!

First stop: 2008

Tardis from Dr. Who

2008 Is Cool

2008 stackoverflow.com

StackOverflow.com Launches

  • C#, .NET, MSSQL, jQuery, Subversion
  • That's what founders knew
  • Started with off-the-shelf tools:
    • ASP.NET MVC
    • LINQ to SQL
    • MSSQL + SQL fulltext search
    • Built-in output caching

Lessons Learned

  1. Start with what you know

2009-2011: SO Gets Popular!

  • And starts chugging w/ more traffic
  • How do you fix performance problems?
  • MEASURE THEM!
  • ...But if no .NET profiling tools exist yet...
  • CREATE ONE FIRST!

MiniProfiler (c. 2011)

Stack Exchange Mini Profiler

MiniProfiler: https://github.com/miniprofiler/dotnet

Profiling Perf

  • Shows you where it's slow
  • Lets you make it fast
  • Verifies your perf fixes
  • If you're not profiling, you're guessing
Clue board game
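
To make "measure, don't guess" concrete, here is a minimal sketch of step timing built only on System.Diagnostics.Stopwatch. It is not MiniProfiler's API; StepTimer and ProfilingDemo are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not MiniProfiler): records how long each named step takes.
public sealed class StepTimer
{
    private readonly List<(string Name, TimeSpan Elapsed)> _steps = new();

    public IReadOnlyList<(string Name, TimeSpan Elapsed)> Steps => _steps;

    // Usage: using (timer.Step("load")) { ...code to measure... }
    public IDisposable Step(string name) => new Scope(this, name);

    private sealed class Scope : IDisposable
    {
        private readonly StepTimer _owner;
        private readonly string _name;
        private readonly Stopwatch _watch = Stopwatch.StartNew();

        public Scope(StepTimer owner, string name)
        {
            _owner = owner;
            _name = name;
        }

        // Leaving the using block stops the clock and records the step.
        public void Dispose()
        {
            _watch.Stop();
            _owner._steps.Add((_name, _watch.Elapsed));
        }
    }
}

public static class ProfilingDemo
{
    public static void Run()
    {
        var timer = new StepTimer();
        using (timer.Step("fast step")) { Thread.Sleep(5); }
        using (timer.Step("slow step")) { Thread.Sleep(50); }

        // Print the timings so the slow step is visible instead of guessed at.
        foreach (var (name, elapsed) in timer.Steps)
            Console.WriteLine($"{name}: {elapsed.TotalMilliseconds:F1} ms");
    }
}
```

Run the same timings again after a fix to check that the fix actually helped.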

Lessons Learned

  1. Start with what you know
  2. Measure performance (w/ existing tools)

What Did MiniProfiler Reveal?

  • We wrote some slow methods
  • .NET Framework had big flaws (in 2011)

<3 you Microsoft!

Fixing Our Slow

  • All created objects cost memory; memory is finite
  • Calculations cost CPU; CPU is finite
  • Managed GCs are tricky
  • Enter Caching, Optimization, and Pooling

Caching / Memoization

  • Storing method output for future callers (thus skipping the expensive method after ~1st time)
  • Trades memory for CPU: spend memory on stored results to skip repeated work
  • Local or remote (start local)
  • Eviction algorithms (LRU, FIFO, LIFO, etc.) decide what to drop when full
  • TERRIFIC for read-often write-rarely loads
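
The memoization idea above can be sketched as a small wrapper around any expensive function. This is a local, in-process cache with no eviction, so it assumes the set of inputs is bounded; Memoizer and MemoizeDemo are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Concurrent;

public static class Memoizer
{
    // Wraps an expensive function so each distinct input is normally computed
    // once; later callers get the stored result.
    // Caveat: under a race, GetOrAdd may run the function more than once for
    // the same key, but only one result is stored.
    public static Func<TKey, TValue> Memoize<TKey, TValue>(Func<TKey, TValue> expensive)
        where TKey : notnull
    {
        var cache = new ConcurrentDictionary<TKey, TValue>();
        return key => cache.GetOrAdd(key, expensive);
    }
}

public static class MemoizeDemo
{
    // Returns how many times the underlying function actually ran.
    public static int Run()
    {
        int calls = 0;
        Func<int, int> slowSquare = n => { calls++; return n * n; };

        var cachedSquare = Memoizer.Memoize(slowSquare);
        cachedSquare(12);
        cachedSquare(12);
        cachedSquare(12);

        return calls; // 1: the second and third calls hit the cache
    }
}
```

A cache like this never forgets anything. Add an eviction policy or an expiry before using it on unbounded input.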

Optimization

  • GC: what is created must be later destroyed
  • A common pitfall: string concatenation
  • In .NET, strings are immutable (read-only) reference objects
  • So each s1 + s2 at runtime creates a new string object
  • Solution: StringBuilder
  • Uses a mutable char buffer under the hood; appended strings are copied into it
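
A small side-by-side sketch of the concatenation pitfall and the StringBuilder fix. ConcatDemo and its method names are hypothetical; both methods return the same text and differ only in how many intermediate strings they allocate.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Naive: every += allocates a brand-new string and copies the old contents,
    // leaving the previous string behind for the GC.
    public static string JoinWithConcat(string[] parts)
    {
        string result = "";
        foreach (var part in parts)
            result += part;
        return result;
    }

    // StringBuilder appends into a growable internal buffer and builds
    // the final string once, in ToString().
    public static string JoinWithBuilder(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (var part in parts)
            sb.Append(part);
        return sb.ToString();
    }
}
```

For a handful of pieces the difference is negligible; it shows up in loops and hot paths, which is where a profiler should point you first.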

Pooling

  • Creating the same object a lot? Pool it!
  • Store n instances of the object in a queue; take one when needed, put it back when done
  • Reset / reinitialize object when it is returned
  • Reusing objects ~= constant memory
  • Object pool example (a bit dated)
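
A minimal sketch of the pooling pattern described above, assuming a thread-safe queue and a caller-supplied reset action. SimplePool, Rent, and Return are hypothetical names for this illustration, not the dated example the slide links to.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class SimplePool<T> where T : class
{
    private readonly ConcurrentQueue<T> _items = new();
    private readonly Func<T> _create;
    private readonly Action<T> _reset;
    private readonly int _maxSize;

    public SimplePool(Func<T> create, Action<T> reset, int maxSize = 16)
    {
        _create = create;
        _reset = reset;
        _maxSize = maxSize;
    }

    // Take an instance from the pool, or create one if the pool is empty.
    public T Rent() => _items.TryDequeue(out var item) ? item : _create();

    // Reset the instance and put it back for the next caller.
    // The size check is approximate under concurrency; extras are simply
    // dropped and left to the GC.
    public void Return(T item)
    {
        _reset(item);
        if (_items.Count < _maxSize)
            _items.Enqueue(item);
    }
}
```

Usage, pooling StringBuilder instances:

```csharp
var pool = new SimplePool<System.Text.StringBuilder>(
    () => new System.Text.StringBuilder(),
    sb => sb.Clear());

var builder = pool.Rent();
builder.Append("hello");
pool.Return(builder); // cleared and ready for reuse
```

For arrays specifically, .NET ships System.Buffers.ArrayPool<T>, which is worth checking before writing your own pool.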

Fixing Framework Slow

  • LINQ to SQL very slow (in 2011)
  • Entity Framework 4 was slow too
  • Too much manual SQL DTO mapping code
  • USE A BETTER .NET ORM!
  • ...But if no better .NET ORM exists yet...
  • CREATE ONE FIRST!

Prototype Dapper (Sam's ORM)

var cars = con.Query<Car>("SELECT * FROM cars").ToList();

ORM Benchmarks

Dapper (c. 2011)

Dapper Dot Net

Dapper: https://github.com/DapperLib/Dapper

Lessons Learned

  1. Start with what you know
  2. Measure performance (w/ existing tools)
  3. Fix / replace the slow (w/ existing tools)

More Traffic = More Logs

  • Logging in SQL table, but hard to grok
  • Tough to see trends, groups, counts
  • Queries are hard: avg errors / min? hour? day?
  • USE A LOG AGGREGATOR!
  • ...But if no good log aggregator exists yet...
  • CREATE ONE FIRST!
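
To show the kind of question an aggregator answers, here is a sketch that buckets in-memory log entries by minute and averages the error counts. LogEntry, LogStats, and the "Error" level string are assumptions made for this example; they are not Opserver's data model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical log shape for illustration only.
public record LogEntry(DateTime Timestamp, string Level, string Message);

public static class LogStats
{
    // Counts error entries per minute bucket (timestamp truncated to the minute).
    public static Dictionary<DateTime, int> ErrorsPerMinute(IEnumerable<LogEntry> logs) =>
        logs.Where(e => e.Level == "Error")
            .GroupBy(e => new DateTime(
                e.Timestamp.Year, e.Timestamp.Month, e.Timestamp.Day,
                e.Timestamp.Hour, e.Timestamp.Minute, 0))
            .ToDictionary(g => g.Key, g => g.Count());

    // Average over the minutes that had at least one error;
    // minutes with no errors are not counted as zero here.
    public static double AverageErrorsPerMinute(IEnumerable<LogEntry> logs)
    {
        var perMinute = ErrorsPerMinute(logs);
        return perMinute.Count == 0 ? 0 : perMinute.Values.Average();
    }
}
```

Switching the bucket to hours or days only changes the GroupBy key, which is the sort of slicing that is painful against a raw SQL log table.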

Opserver (c. 2011)

Opserver: https://opserver.github.io/Opserver/

Lessons Learned

  1. Start with what you know
  2. Measure performance (w/ existing tools)
  3. Fix / replace the slow (w/ existing tools)
  4. Logs are a gold mine (if mined)

DBs Are Slow (But Needed)

  • Tune database perpetually
  • Use indexes and covered queries
  • Avoid smells like everything WITH (NOLOCK)
  • Read-only replicas reduce lock contention on the primary
  • Measure your queries (Opserver can help)
Opserver SQL Queries

Lessons Learned

  1. Start with what you know
  2. Measure performance (w/ existing tools)
  3. Fix / replace the slow (w/ existing tools)
  4. Logs are a gold mine (if mined)
  5. Measure & tune your database

Defer Complexity

  • Our initial tools worked well for YEARS!
  • As load caused tools to fail:
    • ASP.NET MVC → ASP.NET Core MVC
    • LINQ to SQL → Dapper
    • SQL fulltext search → Elasticsearch
    • Built-in output caching → Redis

Lessons Learned

  1. Start with what you know
  2. Measure performance (w/ existing tools)
  3. Fix / replace the slow (w/ existing tools)
  4. Logs are a gold mine (if mined)
  5. Measure & tune your database
  6. Solve actual problems, not future ones

Fast Forward to 2014

  • 54th most visited website in the world
  • 110 Stack Exchange network sites
  • 4 million users
  • 8 million questions
  • 40 million answers
  • 560 million pageviews / month
  • Just 25 servers

Lessons Learned

  1. Start with what you know
  2. Measure performance (w/ existing tools)
  3. Fix / replace the slow (w/ existing tools)
  4. Logs are a gold mine (if mined)
  5. Measure & tune your database
  6. Solve actual problems, not future ones
  7. Performance is a (cost-saving) feature

Stack Overflow Today (2022)

  • 179 Stack Exchange network sites
  • 1.3 billion monthly page views (network)
  • 51.2 million unique SO visitors / month
  • 22.8 million SO questions / 33.7 million SO answers
  • ...AND STILL JUST 25 SERVERS

9 (+2) Web Servers

4 SQL Servers

2 Redis Servers

3 TagEngine Servers

3 Elasticsearch Servers

2 HAProxy Servers

LESSONS LEARNED RECAP

  1. Start with what you know
  2. Measure performance (w/ existing tools)
  3. Fix / replace the slow (w/ existing tools)
  4. Logs are a gold mine (if mined)
  5. Measure & tune your database
  6. Solve actual problems, not future ones
  7. Performance is a (cost-saving) feature

Sound Fun? Work With Us!

https://s.tk/jobs

THANK YOU

David Haney | hachyderm.io/@haney