September 6, 2015
Every profession has a toolbelt: the set of tools needed to do the job, which has to be within reach at all times. Writers have their pen and paper, chefs have their knives, and carpenters have their actual tools.
If you do DevOps, your tools usually live on your computer, tablet, smartphone, and sometimes your smartwatch.
Our tools vary, but they all serve the same purpose: keeping the servers running, keeping the developers happy, and, even more importantly, keeping the clients happy.
Here’s a list of tools I use in my day-to-day DevOps work:
- Operating Systems
  - Linux (RHEL, CentOS, Ubuntu, Debian)
  - Unix (Solaris, AIX, HP-UX, etc.)
  - Mac OS X
- Infrastructure as a Service
  - Amazon Web Services
  - Digital Ocean
- Virtualization Platforms
- Containerization Tools
- Linux OS Installation
- Configuration Management
- Test, Build and Deployment Systems
  - Travis CI
  - CircleCI
- Web Servers
- Monitoring, Alerting, and Trending
  - New Relic
  - UptimeRobot
  - Pushover
This list will probably grow every other week. DevOps means staying on top of this growing stack of tools and using whichever ones help you get the job done more efficiently.
July 16, 2015
The only productive way to load test a website is to test a real-world page that itself performs:
- Loading and processing of multiple PHP files.
- Establishing multiple MySQL connections and performing multiple table reads.
This is the minimum, because testing an almost empty, static page (as most examples do) tells us nothing about how the different parts of a web-server hold up under stress, nor how that web-server setup will handle real-world concurrent connections to websites running web-apps such as WordPress.
- Ideally, this test would also:
  - perform GETs of all page assets (css, js, images), and
  - simulate traffic of which 10% is DB writes (we’ll skip this because it’s more complicated to set up).
Luckily, this type of test is very easy to do in a quick (and somewhat dirty) way by using Apache’s ab (Apache Bench) application, which is included with each Apache version in its \bin directory.
An ab test won’t be the most extensive test, and it comes with its own caveats, but it will quickly show you:
- Whether there is an immediate problem with the setup (it will show up as Apache crashing).
- How far you can push the Apache, PHP, and MySQL web-server (in concurrent connections and page-request load).
- Which Apache and PHP settings you should modify to get better performance and eliminate the crashes.
There are some problems with ab to be aware of:
- ab will not parse HTML to get the additional assets of each page (css, images, etc.).
- ab can start to error out, breaking the test, as the number of requests to perform increases, as more connections are established but not returned, and as the load increases and more time passes (see ab -h for an explanation of …).
- ab is an HTTP/1.0 client, not an HTTP/1.1 client, so “Connection: Keep-Alive” requests (the ab -k switch) for dynamic pages will not work: dynamic pages don’t have a predetermined “Content-Length” value, and “Transfer-Encoding: chunked” is not possible with HTTP/1.0 clients.
ab and the KeepAlive issue:
KeepAlive (Apache directive):
A Keep-Alive connection with an HTTP/1.0 client can only be used when the length of the content is known in advance. This implies that dynamic content will generally not use Keep-Alive connections to HTTP/1.0 clients.
Compatibility with HTTP/1.0 Persistent Connections (Hypertext Transfer Protocol HTTP/1.1 standard):
A persistent connection with an HTTP/1.0 client cannot make use of the chunked transfer-coding, and therefore MUST use a Content-Length for marking the ending boundary of each message.
Chunked transfer encoding (Wikipedia):
Chunked transfer encoding allows a server to maintain an HTTP persistent connection for dynamically generated content. In this case the HTTP Content-Length header cannot be used to delimit the content and the next HTTP request/response, as the content size is as yet unknown.
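To make that framing concrete, here is a minimal Python sketch of how a body of unknown total length can be sent with chunked transfer encoding: each chunk is prefixed with its size in hexadecimal plus CRLF and followed by CRLF, and a zero-size chunk ends the body. The function name chunk_encode is illustrative, and trailers are left out.

```python
from typing import Iterable


def chunk_encode(parts: Iterable[bytes]) -> bytes:
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body.

    The sender never needs the total length up front: each chunk carries
    its own size, and a zero-length chunk marks the end of the message.
    """
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-size chunk would end the body early, so skip empty parts
        out += f"{len(part):X}\r\n".encode("ascii")  # chunk size in hex, then CRLF
        out += part + b"\r\n"                        # chunk data, then CRLF
    out += b"0\r\n\r\n"  # last chunk plus the blank line ending a trailer-less body
    return bytes(out)


body = chunk_encode([b"<html>", b"generated on the fly", b"</html>"])
print(body)
```

Because the end of the body is marked in-band, the server never needs a Content-Length up front. An HTTP/1.0 client like ab can’t read this framing, so for dynamic pages the server falls back to closing the connection to mark the end of the response, which is why ab -k doesn’t help there.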
ab will flood the Apache server with requests as fast as it can generate them (not unlike a DDoS attack), and it has no option to set a delay between requests.
Because these requests are generated on the same local system they are sent to (i.e., the physical network is bypassed), this creates a peak level of requests that will cause Apache to stop responding and the OS to start blocking/dropping additional requests, especially if the requested page is a simple PHP file that can be processed within a millisecond.
In this context, with ab, the bigger the -c value (the number of requests to perform concurrently), the lower the -n value (the total number of requests to perform) should be. Even with a -c of 5, -n should not be more than 200.
Expect ab tests to be very non-deterministic under higher concurrent loads: they will fail and succeed randomly. Even a -c of 2 will cause issues.
These are the error messages displayed by ab:
apr_socket_recv: An existing connection was forcibly closed by the remote host. (730054)
apr_pollset_add(): Not enough space (12)
When this happens (a message is displayed saying that Apache has crashed), just ignore it (Apache is still running) and keep repeating the test until “Failed requests:” is reported as “0” AND, in the “Percentage of the requests served within a certain time (ms)” table, the 99% value is roughly 2 to 20 times the 50% value (and not 200 times). Otherwise, the test is not reliable, due to the issues that arise when ab floods Apache on loopback (and to how the OS responds to that flood).
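That pass/fail rule can be checked mechanically. Below is a minimal Python sketch, assuming the ab report has been saved as text and includes the 50% and 99% lines of the percentile table. The names parse_ab and is_reliable, and the default 20x threshold, simply encode the rule of thumb above; they are not part of ab.

```python
import re


def parse_ab(output: str) -> dict:
    """Pull the fields used by the reliability check out of ab's text report."""
    failed = re.search(r"^Failed requests:\s+(\d+)", output, re.MULTILINE)
    rps = re.search(r"^Requests per second:\s+([\d.]+)", output, re.MULTILINE)
    # Percentile lines look like "  50%      3" (milliseconds).
    percentiles = {
        int(pct): int(ms)
        for pct, ms in re.findall(r"^[ \t]*(\d+)%[ \t]+(\d+)", output, re.MULTILINE)
    }
    return {
        "failed": int(failed.group(1)) if failed else None,
        "rps": float(rps.group(1)) if rps else None,
        "percentiles": percentiles,
    }


def is_reliable(report: dict, max_ratio: float = 20.0) -> bool:
    """Rule of thumb: 0 failed requests and the 99% time within ~20x of the 50% time."""
    p = report["percentiles"]
    if report["failed"] != 0 or 50 not in p or 99 not in p:
        return False
    p50 = max(p[50], 1)  # ab reports whole milliseconds; avoid dividing by 0
    return p[99] / p50 <= max_ratio


sample = """\
Failed requests:        0
Requests per second:    2173.91 [#/sec] (mean)
  50%      3
  99%     33
 100%     33 (longest request)
"""
report = parse_ab(sample)
print(report["rps"], is_reliable(report))  # 2173.91 True
```

A run that reports failures, or whose 99% time is hundreds of times the median, comes back False and should simply be repeated.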
This is what you should see on a good test of a simple index.php page:
ab -l -r -n 100 -c 10 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/
Benchmarking www.example.com (be patient).....done
Server Software: Apache/2.4.10
Server Hostname: www.example.com
Server Port: 80
Document Path: /
Document Length: Variable
Concurrency Level: 10
Time taken for tests: 0.046 seconds
Complete requests: 100
Failed requests: 0
Keep-Alive requests: 100
Total transferred: 198410 bytes
HTML transferred: 167500 bytes
Requests per second: 2173.91 [#/sec] (mean)
Time per request: 4.600 [ms] (mean)
Time per request: 0.460 [ms] (mean, across all concurrent requests)
Transfer rate: 4212.17 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 2
Processing: 1 4 5.9 3 33
Waiting: 1 4 5.9 3 32
Total: 1 4 6.0 3 33
Percentage of the requests served within a certain time (ms)
100% 33 (longest request)
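Several of the numbers in that report are derived from a handful of raw values. The Python sketch below recomputes them from the sample run above (concurrency 10, 100 complete requests, 0.046 seconds, 198410 bytes transferred). The function name ab_summary is illustrative, and the formulas are the ones that reproduce the report’s values.

```python
def ab_summary(concurrency: int, complete: int, seconds: float, total_bytes: int) -> dict:
    """Recompute ab's derived metrics from the raw figures of a run."""
    return {
        # Completed requests divided by wall-clock test time.
        "requests_per_sec": complete / seconds,
        # Mean time per request as seen by one of the concurrent clients.
        "ms_per_request": concurrency * seconds * 1000 / complete,
        # Mean time per request across all concurrent clients.
        "ms_per_request_all": seconds * 1000 / complete,
        # Transfer rate in Kbytes/sec (bytes / 1024 / seconds).
        "kbytes_per_sec": total_bytes / 1024 / seconds,
    }


stats = ab_summary(concurrency=10, complete=100, seconds=0.046, total_bytes=198410)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

This prints 2173.91, 4.60, 0.46 and 4212.17, matching the report. It also shows why the two “Time per request” lines differ by exactly the concurrency level.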
Before Performing The Load Test
Make sure that:
- You’ve rebooted the system and don’t have anything extra open/running (e.g., YouTube videos playing in your browser).
- None of these extra PHP extensions are loaded: Zend OPcache, APC, or Xdebug.
- You wait 4 minutes before performing another ab test, to avoid TCP/IP port exhaustion (also known as ephemeral port exhaustion).
- In a test where KeepAlive works (it doesn’t in ab tests of dynamic pages), the number of Apache worker threads is set higher than the number of concurrent users/visitors/connections.
- If Apache or PHP crashes, you reboot the computer or VM before performing another test (some things get stuck and persist even after Apache and/or mod_fcgid’s PHP processes are restarted).
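The 4-minute wait lines up with a common TIME_WAIT duration. As a simplified model, assume each non-keep-alive request ties up one ephemeral port until TIME_WAIT expires. This Python sketch uses Windows defaults as assumptions (a dynamic port range of 49152 to 65535 and a 240-second TIME_WAIT); other systems and tuned machines differ, so check your own settings.

```python
# Assumed values (Windows Vista and later defaults); check your own system.
PORT_LOW, PORT_HIGH = 49152, 65535
TIME_WAIT_SECONDS = 240


def ports_available() -> int:
    """Size of the ephemeral (dynamic) port range."""
    return PORT_HIGH - PORT_LOW + 1


def max_sustained_new_connections_per_sec() -> float:
    """New connections/sec the OS can keep up if every used port sits in TIME_WAIT."""
    return ports_available() / TIME_WAIT_SECONDS


def ports_still_blocked(requests_sent: int, seconds_since_run: float) -> int:
    """Ports a finished burst of non-keep-alive requests still holds.

    Simplification: the whole burst is treated as ending at one moment, so
    every port it used frees up together once TIME_WAIT has passed.
    """
    if seconds_since_run >= TIME_WAIT_SECONDS:
        return 0
    return min(requests_sent, ports_available())


print(ports_available())                                              # 16384
print(round(max_sustained_new_connections_per_sec(), 1))              # 68.3
print(ports_still_blocked(2700, 60), ports_still_blocked(2700, 240))  # 2700 0
```

Under these assumptions, the roughly 2,000 requests per second seen in the sample run would use up all 16384 ports in about 7.5 seconds, and waiting out the full TIME_WAIT period before the next run lets them all be reused.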
Start The AB Test
Install WordPress at http://www.example.com/blog.
Open the terminal.
Restart Apache and MySQL, and prime the web-server (with 1 request):
ab -n 1 -c 1 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/blog/
- Run the Apache Bench program to simulate:
1 concurrent user doing 100 page hits
This is 100 sequential page loads by a single user:
ab -l -r -n 100 -c 1 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/blog/
This shows you how well the web-server will handle a simple load of 1 user doing a number of page loads.
5 concurrent users each doing 10 page hits
This is 50 page loads by 5 different concurrent users, each user doing 10 sequential page loads.
ab -l -r -n 50 -c 5 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/blog/
This represents the peak load of a website that gets about 50,000+ hits a month. Congratulations, your website / business / idea has made it (and no doubt is on its way up).
10 concurrent users each doing 10 page hits
This is 100 page loads by 10 different concurrent users, each user doing 10 sequential page loads.
ab -l -r -n 100 -c 10 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/blog/
This is where the load really starts to stress test the web-server, as 10 concurrent (simultaneous) users is a lot of traffic. Most websites will be lucky to see 1 or 2 users (visitors) a minute, so let me say it again: 10 users per second is a lot.
30 concurrent users each doing 20 page hits
This is 600 page loads by 30 different concurrent users, each user doing 20 sequential page loads.
ab -l -r -n 600 -c 30 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/blog/
This is the edge of what a non-cached WordPress setup will be able to handle without crashing or timing out the web-server (or ab itself). This type of load represents an extremely active website or forum, the top 1%.
90 concurrent users each doing 30 page hits
This is 2700 page loads by 90 different concurrent users, each user doing 30 sequential page loads.
ab -n 2700 -c 90 -k -H "Accept-Encoding: gzip, deflate" http://www.example.com/blog/
Only a fully cached (using mod_cache) Apache setup will be able to handle this type of load. This represents some of the busiest sites on the net, and with a non-cached WordPress setup there is no hope of this not maxing out the web-server and, unless your settings are just right, crashing it.
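Each scenario above maps to ab flags the same way: -c is the number of concurrent users, and -n is users multiplied by hits per user. Here is a small Python helper that builds those command lines, assuming the same flags used in the scenarios (-l, -r, -k and the gzip header) are wanted for every run; the function name ab_command is hypothetical.

```python
import shlex


def ab_command(users: int, hits_per_user: int, url: str) -> list[str]:
    """Build an ab invocation where `users` clients each make `hits_per_user` requests."""
    if users < 1 or hits_per_user < 1:
        raise ValueError("users and hits_per_user must be positive")
    return [
        "ab",
        "-l",  # accept variable document length (dynamic pages)
        "-r",  # don't exit on socket receive errors
        "-n", str(users * hits_per_user),  # total requests across all users
        "-c", str(users),                  # number of concurrent users
        "-k",  # request HTTP KeepAlive
        "-H", "Accept-Encoding: gzip, deflate",
        url,
    ]


for users, hits in [(1, 100), (5, 10), (10, 10), (30, 20), (90, 30)]:
    print(shlex.join(ab_command(users, hits, "http://www.example.com/blog/")))
```

Deriving -n from the two scenario numbers keeps the command consistent with the description, so a mismatch between the stated user count and -c can’t creep in.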
Analyze the AB Results
We only care about 3 things:
- How many requests per second are we seeing? The other metrics are not really useful, as they are not representative of anything real in this ab context.
  - This value will remain roughly the same regardless of the concurrency level used.
- Are there any errors in the website’s or Apache’s (general) error logs, or in the PHP logs?
  - When things start to choke, PHP memory issues will start coming up. A lot of PHP scripts also begin to crash (and take out Apache + PHP processes) if they are not written with concurrency in mind.
- At what concurrency level does Apache crash and/or time out?
  - If this is happening at a low concurrency level, something is wrong and you need to adjust your Apache and PHP settings, either lower or higher.
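The first and third questions can be tracked across a series of runs. Here is a Python sketch that assumes you record the concurrency level, requests per second, and failed requests from each ab report. AbRun, breaking_point and rps_spread are hypothetical names: breaking_point returns the lowest concurrency at which failures appeared, and rps_spread compares the best and worst clean runs, where a ratio near 1.0 means requests per second stayed roughly constant.

```python
from typing import NamedTuple, Optional


class AbRun(NamedTuple):
    concurrency: int
    requests_per_sec: float
    failed_requests: int


def breaking_point(runs: list[AbRun]) -> Optional[int]:
    """Lowest concurrency level at which a run reported failed requests, if any."""
    failing = [r.concurrency for r in runs if r.failed_requests > 0]
    return min(failing) if failing else None


def rps_spread(runs: list[AbRun]) -> float:
    """Best-to-worst requests/sec ratio among runs with no failures."""
    clean = [r.requests_per_sec for r in runs if r.failed_requests == 0]
    if not clean:
        raise ValueError("no clean runs to compare")
    return max(clean) / min(clean)
```

Failed runs are excluded from the spread on purpose, since their requests-per-second figure says more about ab flooding loopback than about the server.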
Also, in my experience, switching from 32-bit to 64-bit versions of Apache, PHP, and MySQL provides only limited/marginal performance gains (and in some cases the effect is even negative).
To sum everything up, 99% of all performance gains will come from using Apache’s caching mechanisms (via mod_cache) and the PHP Zend OPcache extension, and afterwards (once the bottleneck moves from Apache with PHP to MySQL) from improving MySQL performance: tuning my.ini settings, and optimizing/restructuring MySQL queries with the help of MySQL’s Slow Query log (to see what the problem is).
Having said that, there are also performance-robbing issues that can exist in the OS, in the Apache/MySQL/PHP settings, and even in the client’s browser, that are covered here.
July 13, 2015
NoSQL databases are a class of databases in which data is not stored in tabular relations, unlike relational databases. The term is commonly read as “Not Only SQL”. Storing data in structures other than tabular relations is sometimes a better fit, in scenarios outlined later in this post.
NoSQL databases are good at a number of things: for example, storing big data for internet-scale applications, serving scenarios that need low-latency data storage, and improving performance.
Here are three signs that you should consider NoSQL databases for your solution:
#####Your data is BIG:
You’re collecting a ton of data over the internet, like web analytics, sensor data from the Internet of Things (IoT), or lots of user data (such as tweets on Twitter), and you need a place to store it plus an efficient way to do a “distributed read” across all of it. If that’s what you’re looking for, NoSQL is a perfect fit.
#####Your data is “schema-less”:
Data stored in a traditional RDBMS has a schema defined ahead of time. If your schema changes a lot, or if you don’t have a predefined schema and want to be able to add a field whenever you like, NoSQL is a good fit.
There are two ways to scale your solution. You can scale up, adding more memory, processing capacity, and storage to a single machine, or you can scale out, using multiple machines to support your solution.
NoSQL is a perfect solution when you have a ton of data that doesn’t fit on a single server. Most NoSQL databases are horizontally scalable: you can shard (divide) your data across multiple servers and still have efficient access to it. NoSQL databases are elastic; you can run thousands of nodes/servers and scale horizontally well.
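As a rough illustration of what sharding means, here is a Python sketch that routes each record key to one of N servers by hashing the key. It is not how any particular NoSQL product does it (real systems such as Cassandra use consistent hashing or range partitioning), and the node names are hypothetical.

```python
import hashlib


def shard_for(key: str, servers: list[str]) -> str:
    """Pick the server that owns `key` using a stable hash of the key."""
    # md5 is used only for an even, stable spread of keys, not for security.
    # Python's built-in hash() is randomized per process, so it can't be used here.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["db-node-1", "db-node-2", "db-node-3"]  # hypothetical node names
placement = {user: shard_for(user, servers) for user in ["alice", "bob", "carol", "dave"]}
print(placement)
```

Because every client computes the same hash, any client can find a record with a single lookup. With simple modulo hashing, though, adding a server moves most keys to a different node, which is why production systems use consistent hashing instead.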
####What will you be missing?
#####No relations and joins:
Most NoSQL databases, for example MongoDB, have no support for relations between collections (graph databases like Neo4j do, but that’s a completely different type of database), which means there’s no support for joins when you query your data. If your data has relationships you would normally join on, you can combine related documents and save them as sub-documents, or do a manual join by running two queries against your database.
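A manual join means issuing one query, collecting the keys it returns, issuing a second query for the related records, and stitching the results together in application code. Here is a Python sketch with in-memory lists standing in for two collections; the data, the field names, and the find helper are hypothetical, and with a real database driver each find call would be a query.

```python
from collections import defaultdict

# Stand-ins for two document collections (hypothetical data and field names).
users = [
    {"_id": 1, "name": "Ann"},
    {"_id": 2, "name": "Raj"},
]
posts = [
    {"_id": 10, "user_id": 1, "title": "Hello"},
    {"_id": 11, "user_id": 1, "title": "Again"},
    {"_id": 12, "user_id": 2, "title": "Hi"},
]


def find(collection, predicate):
    """Stand-in for a database query: return documents matching `predicate`."""
    return [doc for doc in collection if predicate(doc)]


def users_with_posts(wanted_names):
    # Query 1: fetch the users we care about.
    matched = find(users, lambda u: u["name"] in wanted_names)
    ids = {u["_id"] for u in matched}
    # Query 2: fetch every post belonging to those users in one round trip.
    related = find(posts, lambda p: p["user_id"] in ids)
    # Join in application code by grouping post titles under their author.
    by_user = defaultdict(list)
    for p in related:
        by_user[p["user_id"]].append(p["title"])
    return {u["name"]: by_user[u["_id"]] for u in matched}


print(users_with_posts({"Ann"}))  # {'Ann': ['Hello', 'Again']}
```

The sub-document alternative would store the titles inside each user document instead, trading the second query for duplicated data whenever posts are shared or updated elsewhere.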
#####No atomic operations:
Most NoSQL databases, like MongoDB, have no atomic operations or transactions spanning multiple tables/records. For example, in MongoDB a write to a single document is atomic, but if you write multiple documents at the same time, they’re not atomic as a whole. If you need transaction support for data that is critical to your solution, you should store that piece of data in an RDBMS instead.
NoSQL databases generally lack the reporting you’re used to from RDBMSs. If you want to generate reports, create graphs, or do something with all of the data in your NoSQL stack, you’ll need to start coding and write what are called “map-reduce” jobs for the reports you want.
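The idea behind such a job: a map step emits (key, value) pairs from each document, and a reduce step combines all the values that share a key. Here is a plain-Python sketch that totals page views per day from log-style documents. It runs in a single process and only illustrates the pattern; it is not any database’s map-reduce API, and the document fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical log documents, as they might be stored in a NoSQL collection.
docs = [
    {"day": "2015-07-01", "path": "/", "views": 3},
    {"day": "2015-07-01", "path": "/blog", "views": 5},
    {"day": "2015-07-02", "path": "/", "views": 4},
]


def map_phase(doc):
    # Emit one (key, value) pair per document: the day and its view count.
    yield doc["day"], doc["views"]


def reduce_phase(key, values):
    # Combine every value emitted for the same key.
    return key, sum(values)


def map_reduce(documents):
    grouped = defaultdict(list)  # the "shuffle": group emitted values by key
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


print(map_reduce(docs))  # {'2015-07-01': 8, '2015-07-02': 4}
```

In a distributed database the map step runs next to the data on each node, and only the much smaller grouped values travel over the network to be reduced.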
####Where is NoSQL the best fit?
Here are a few sample scenarios where NoSQL is the best fit. They typically involve high-volume, low-latency data storage. A few examples:
#####Storing logs / user analytics data:
Servers generate a large number of logs that contain useful information about their operation, including errors, warnings, and user behaviour. By default, most servers store this data in plain-text log files on their local file systems.
While plain-text logs are accessible and human-readable, they are difficult to use and analyse. A NoSQL column-oriented database like Cassandra is a perfect place to store log data: Cassandra can handle a huge number of writes, which is useful when you store logs or metadata about user behaviour on your site.
#####Internet of Things - Storing data from the sensors:
Sensor-enabled devices are pumping out a ton of data, but sensor data is only useful if you can do something with it. NoSQL databases let you store and make sense of sensor data at that volume, which makes it possible to build applications that weren’t practical before.
The Internet of Things is a world where physical assets and devices are connected to each other and share information, making life easier and more convenient. To enable this interconnected world, you need to capture all the data these devices are pumping out and make sense of it, and this is one of the scenarios where NoSQL is a great fit.
####So, do I migrate?
It depends on the application you’re building. If your application has a lot of transactional and relational features, you can go ahead with an RDBMS. If your application has scalability issues with your current data infrastructure, then I think you should seriously consider NoSQL. You can even use both, which is what most companies do.
For data that is relational, they tend to go with a relational database, but for data that has no relations, or is big data, they use a NoSQL solution. I think this is the best approach going forward.
July 2, 2015
I recently rolled my own PaaS solution using Dokku, and ran into a couple of problems that I think others might run into as well. It took just a few hours, and I now have a very cheap but efficient CI stack. All things considered, the entire stack is made up of:
- CircleCI
- dokku-alt (and Digital Ocean for VM hosting)
My main goal was to be able to simply run my test commands, build my Docker container, and deploy the app every time I pushed to GitHub.
The first thing you need to do is Docker-ize your app. There are many tutorials on this, so I will assume that this part has been done.
dokku is an open source competitor to Heroku. It is gaining a lot of popularity, and Digital Ocean even has a premade VM that runs dokku out of the box. While I wanted to go this route, it could not handle Docker containers. Another open source project based on dokku, called dokku-alt, could, however.
dokku-alt is essentially dokku but with many more features that make things very simple. The setup is a bit more manual than the Digital Ocean dokku image, but it is still pretty easy. To set it up, start a new Digital Ocean/Ubuntu VM. I would recommend selecting at least the second droplet “size”, as I tried the $5/month one and was constantly running out of memory during npm installs. I would also recommend including your SSH keys here, so that Digital Ocean adds them when creating the droplet and you won’t have to do this later.
If you are new to Digital Ocean, I highly recommend it, and if you’d like, you can use my referral link and get $10 towards a droplet (one free month in this case!).
Once the droplet is set up, go to the DNS records and point a new A record at the IP of the droplet that was just created (this is optional but makes things a lot nicer). For this tutorial, I pointed *.ci.freekrai to the IP of the droplet. I will explain why this is nice later in the tutorial.
After everything is set up, SSH into the droplet from your local machine, using the droplet’s IP address or, if you set up the A record, its hostname. Now, run the script suggested by dokku-alt:
$ sudo bash -c "$(curl -fsSL bit.ly/dokku-alt)"
This will handle the full installation for you, and at the end, will tell you to point your browser to ci.website.com:2000 to finish the setup. I used the pre-set SSH key, as it was already mine, then changed the IP to ci.website.com and enabled virtual hosting. This gives different projects much nicer URLs, such as app.ci.website.com, which is why I suggested adding the wildcard A record in Digital Ocean.
After you save these settings, press Ctrl-C to stop the setup script on the dokku-alt VM. That’s it!
You can now test that this works by adding a new remote to your Docker project’s git remotes. To do so, use the following command, being sure to change the URL accordingly:
Also, be sure to change app to whatever you want app.ci.website.com to be. I made mine dev in this case because it would be the development version of the app. After this is done, run git push dokku master to push the app up to dokku-alt. dokku-alt will recognize the Dockerfile and start the container appropriately. You will know it’s working if, at the end, it says something about visiting your app’s URL.
One issue I ran into was exposing the proper port! If dokku-alt does not detect that port 80, 8080, or 5000 is EXPOSE’d in the Dockerfile, it won’t actually launch the site.
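Since that mistake is easy to make, a quick check before pushing can catch it. Here is a minimal Python sketch that scans a Dockerfile’s EXPOSE instructions for one of the ports mentioned above. The function names are hypothetical, and it only handles plain numeric entries such as EXPOSE 5000 or EXPOSE 8080/tcp (variables like $PORT are ignored).

```python
import re

# Ports that, per the note above, dokku-alt looks for before launching the site.
WEB_PORTS = {80, 8080, 5000}


def exposed_ports(dockerfile_text: str) -> set[int]:
    """Collect numeric ports from EXPOSE lines (e.g. 'EXPOSE 5000' or 'EXPOSE 8080/tcp')."""
    ports = set()
    for line in dockerfile_text.splitlines():
        # Dockerfile instructions are case-insensitive, so match 'expose' too.
        match = re.match(r"\s*EXPOSE\s+(.+)", line, re.IGNORECASE)
        if not match:
            continue
        for token in match.group(1).split():
            number = token.split("/")[0]  # drop a protocol suffix like /tcp
            if number.isdigit():
                ports.add(int(number))
    return ports


def has_web_port(dockerfile_text: str) -> bool:
    return bool(exposed_ports(dockerfile_text) & WEB_PORTS)


example = 'FROM node\nEXPOSE 5000\nCMD ["npm", "start"]\n'  # hypothetical Dockerfile
print(has_web_port(example))  # True
```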
CircleCI is a continuous integration service (its front end is built with React!) that integrates well with GitHub and is very cheap. It is actually free if you are okay with running only one build at a time. In my case, this was perfectly acceptable, as I would never have too many builds going on at once.
To begin, sign up with your GitHub account and link your project. That’s technically it. At this point, Circle recognized that I was running a Node.js app, so it read my package.json and built my Docker container. It ran the tests from the test suite in my package.json and simply ended. You may or may not have to do more to get Circle to work with your app specifically; this is where the circle.yml file comes in. The docs are very good if you find that you need more configuration.
Add a private SSH key to the Project Settings for this specific project. In my case, I used my own private key, but it would definitely be better to create a deploy key for Circle and then configure dokku-alt to accept deployments (and only deployments) from that key.
For this tutorial, however, we will use the circle.yml file to deploy to our dokku-alt instance at the end of a successful test run. It is very simple:
All it says is: at the deployment step, in production, on the master branch, run ./deploy.sh. This allows you to have multiple configurations for production, demo, staging, etc., each of which can run a different deploy script.
Save this circle.yml file in the root of your repository, and then create another new file called deploy.sh. This file is even simpler:
We just do exactly what we did before to check that dokku-alt was running correctly: push to the dokku remote, as if it were being done manually. Be sure to chmod +x deploy.sh so that Circle can execute the file, and then push to master.
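If you’d rather keep the deploy step in Python than in a shell script, here is a hedged sketch of the same idea, not a copy of the deploy.sh described above. It assumes the usual dokku remote format dokku@host:app with this post’s example host and app name (ci.website.com and dev); deploy_command and deploy are hypothetical names, and the master-only check mirrors the circle.yml rule.

```python
import subprocess

# Assumptions for illustration: the usual dokku remote format (dokku@host:app),
# plus the example host and app name used earlier in this post.
DOKKU_URL = "dokku@ci.website.com:dev"


def deploy_command(branch: str) -> list[str]:
    """Return the git push that deploys `branch`, or [] when it shouldn't deploy."""
    if branch != "master":
        return []  # mirror the circle.yml rule: only master deploys
    # Pushing straight to the URL works even if no "dokku" remote is configured
    # on the CI machine; HEAD:master sends whatever commit CI checked out.
    return ["git", "push", DOKKU_URL, "HEAD:master"]


def deploy(branch: str, dry_run: bool = False) -> int:
    """Run the deploy push and return git's exit status (0 on success)."""
    cmd = deploy_command(branch)
    if not cmd:
        print(f"branch {branch!r} is not master; skipping deploy")
        return 0
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.call(cmd)


deploy("master", dry_run=True)  # prints: would run: git push dokku@ci.website.com:dev HEAD:master
```

In CI you would call deploy(os.environ.get("CIRCLE_BRANCH", "")) without dry_run and exit with its return value (CircleCI sets CIRCLE_BRANCH to the branch being built), and point circle.yml at this script instead of ./deploy.sh.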
At this point, you can watch Circle receive the new push, initialize your container, run your tests, and then, on success, run the deploy.sh script.
So far this has been working great, and I hope that if you followed this tutorial, you might avoid some of the mistakes that I made. If you have any questions, please email me or find me on Twitter.