Infrastructure simplifying engineer
Date: 2019-01-25
I’d like to share my story about migration monolith application into microservices. Please, keep in mind that it was during 2012 - 2014. It is transcription of my presentation at dotnetconf(RU) 2014-10-12.
The main project idea was to crawl articles from the internet, analyze it, save & create user feed. On one hand, our infrastructure had to be reliable, but on the other hand, we were on a budget. As a result, we agreed that:
I’m going to share a story about changing every part of the infrastructure.
It was a simple part of the infrastructure. Crawlers should download, analyze & save. The first one implementation was a single crawler, however, the world was changing & a lot of different crawlers appeared. Crawlers were communicating with each other by MsSQL.
Downtime wasn’t a problem for crawlers, so it was really easy to scale them:
Our database was about 1TB.
There were different ways how to create a cluster:
As a result, we decided to use the 1st. I’d like to notice that we didn’t use witness because of async mode, so we created scripts for auto switch master -> slave & manual slave -> master.
The clock was ticking, a performance was degrading, we were searching bottlenecks. Sometimes, it wasn’t easy, i.e. were optimizing SQL queries by disk io, when, we found that the performance was low because of lack of RAM. However it was not enough, as a temporary solution, we migrated from HDD to SSD. On one hand, it increased the performance dramatically, but on the other hand, it wasn’t a long-term solution.
Our application had been migrated to a message queue. We were writing only results in the database. As a result, we reduced database load. But we faced a problem: how to organize a message queue cluster? At the first, we created a cold standby.
A web part consisted of two parts: static content & user dynamic content.
WEB static part of the infrastructure was about 2TB, it had to:
There were 2 major issues: how to sync files & how to create a web cluster. There were some ways how to sync files: buy a storage, use DFS & save files at each server. It was tough decision, however, we decided to choose the 3rd way. For the web cluster, we decided to use NLB & CDN.
It’s not a really good idea to use a single server for a high-load project, you have to balance traffic somehow. There were 4 ways in our case:
We chose the 4th way. We were balancing haproxy by DNS round robin & users by cookie.
A few months later, we faced that SQL performance was not good enough again. It was a sophisticated issue, however, after long conversations we decided to choose Availability & Partition tolerance from CAP theorem. As a result, we implemented a MongoDB cluster(sharding & replica). There was an interesting experience: how to create MongoDB backup, how to upgrade & a lot of MongoDB bugs.
We implemented the 3-2-1 rule:
Also, we created & tested disaster recovery plan. You can read about monitoring here.
As you can see, the infrastructure wasn’t as easy as pie. We had to update it somehow. Casual update looked like:
All logical steps were identical for staging/preprod/productions environments, however, it was slightly different at details. So, we created PowerShell scripts with OOP magic. It was a continuous process of improving our CI/CD infrastructure.
2012 | 2014 | |
---|---|---|
Servers | 3 | 60 |
RAM GB | 72 | 800 |
SSD GB | 200 | 10000 |
MsSQL GB | 150 | 700 |
MongoDB GB | 0 | 700 |
Articles per day | 10000 | 150000 |
It was an amazing story about moving 3 desktop PC into reliable infrastructure. Be patient & do plans.