#GlueCon 2014 Notes: DevOps vs. the Enterprise: What We Can Learn from Mainframe Developers – Mike Baukes, ScriptRock

(Notes by @tomkane)

  • “Going big & screwing up on DevOps implementations”
  • What is a “DevOps implementation”? No one knows. But that doesn’t stop people from using that term.
  • DevOps is not like this: http://www.scriptrock.com/devops-in-a-box
  • Case Study
    • High frequency trading company
    • Old industry. Legacy systems. Fighting inertia.
    • Timeframe of 9 months to spin up continuous integration & delivery
    • Threatening a marketwide disruption in Australia
    • Multi-million dollar project w/ multiple layers
  • Key Observations
    • Are development environments representative of production?
    • Are we at risk?
    • The Risk/Security teams loved DevOps
  • Existing Company Culture
    • Teams were spread across multiple geographies & org charts
    • Avg. request fulfillment was around 2 weeks
    • Fulfillment could take months
    • Rife w/ process debt
    • A precursor to this project had already taken ~16 months and was only 70% to spec.
  • Team Culture for this project
    • Recruited the best
    • Immediately caused tension w/ other teams. (210 other devs)
    • This caused elitism and arrogance.
  • Existing Processes
    • Plan, Build, Run through different environments
      • ~170 apps
      • ~750 machines
      • ~10 environments
    • Configuration management & environment provisioning were bottlenecks
  • Approach
    • Automate w/ Chef
    • Speak w/ teams in their language
    • Project Managers loved reporting artifacts from Test-Driven Infrastructure
    • Spent 90% of time writing code. Only 10% educating stakeholders
      • “Ended up doing it all ourselves in order to hit deadlines.”
      • Some things were blown away 6 months later because we didn’t invest in sustainability. Left a train wreck behind us.
  • Lessons
    • Empathize w/ stakeholders
      • Spend more time understanding the problem & existing systems, processes, etc.
    • Don’t automate what you don’t understand
    • DevOps is organizational change
    • Any difference between dev & production means #fail
  • How I would do this next time
    • Wave 1
      • Know what you have
      • Configuration drift detection
      • Configuration visibility across all devices
    • Wave 2
      • Map accountabilities to org
      • Executable documentation (Cucumber?)
      • Pre-Flight Checks (Software Checklists)
    • Wave 3
      • Test-Driven Automation (after you understand the problem)
      • Service-Driven Infrastructure
      • Remove duplicate documentation
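
The Wave 1 items above ("Know what you have", configuration drift detection) and the lesson that any difference between dev & production is a failure can be sketched in a few lines of Python. This is a minimal illustration and not something shown in the talk: the function names `detect_drift` and `has_drift`, the flat key/value snapshot format, and the sample settings are all hypothetical. A real setup would collect snapshots from the machines themselves rather than hard-coding them.

```python
from typing import Any, Dict


def detect_drift(baseline: Dict[str, Any], observed: Dict[str, Any]) -> Dict[str, Any]:
    """Compare two flat configuration snapshots and report how they differ.

    baseline: the settings we expect (for example, production).
    observed: the settings actually found (for example, a dev machine).
    """
    # Settings the baseline defines but the observed machine lacks.
    missing = sorted(key for key in baseline if key not in observed)
    # Settings present on the observed machine that the baseline never defined.
    unexpected = sorted(key for key in observed if key not in baseline)
    # Settings present on both sides whose values disagree: (expected, actual).
    changed = {
        key: (baseline[key], observed[key])
        for key in sorted(baseline)
        if key in observed and baseline[key] != observed[key]
    }
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


def has_drift(report: Dict[str, Any]) -> bool:
    """Return True if any category of the drift report is non-empty."""
    return any(report.values())


if __name__ == "__main__":
    # Hypothetical snapshots, made up purely for illustration.
    production = {"java_version": "1.7.0_55", "max_heap_mb": 4096, "ntp_enabled": True}
    dev = {"java_version": "1.7.0_51", "max_heap_mb": 4096, "debug_port": 5005}

    report = detect_drift(production, dev)
    for category, details in report.items():
        print(f"{category}: {details}")
    print("drift detected" if has_drift(report) else "environments match")
```

Run on a schedule across every environment, a report like this gives the configuration visibility Wave 1 asks for before any automation is written.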