A Behind The Scenes Look at Daylite Cloud and How We Monitor Its Performance

Our Company / December 11, 2015 / Kristie

Last week we announced the biggest news in Marketcircle history: Daylite Cloud. Some customers have asked how we track its performance, so here’s a behind-the-scenes look at what we use to monitor Daylite Cloud.


Monitoring Daylite Cloud

With any cloud service, you need to be able to monitor it. Monitoring was built into the core of Daylite Cloud: we planned for it before starting work on the new architecture, and the facilities for collecting stats and the tools to emit those numbers were there from day one.

With Daylite Cloud, we collect stats at a level of detail that’s almost ridiculous. For example, we monitor how long the various parts of each sync take. Every sync has 80 metrics associated with it, which show us what part of the sync happened at what time, how long it took, and so on. Because this information is so granular, we can quickly dissect problems and fix them.

This way, if sync speed were to slow down, we could pinpoint what’s slowing it down. Knowing which component of the sync is slow tells us where to look first instead of guessing where the problem is.
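To make the idea of per-phase sync metrics concrete, here is a minimal sketch of how timings for each part of a sync could be recorded and the slowest part picked out. It is an illustration only, not Daylite Cloud’s actual code: the `SyncMetrics` class and the phase names are hypothetical.

```python
import time
from contextlib import contextmanager


class SyncMetrics:
    """Collects how long each named phase of one sync took, in seconds.

    Hypothetical helper for illustration; not part of Daylite Cloud.
    """

    def __init__(self, clock=time.perf_counter):
        self.clock = clock      # injectable so timings can be faked in tests
        self.durations = {}     # phase name -> accumulated seconds

    @contextmanager
    def phase(self, name):
        """Time the enclosed block and add it to the named phase."""
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return (phase, seconds) for the slowest phase, or None if empty."""
        if not self.durations:
            return None
        return max(self.durations.items(), key=lambda item: item[1])
```

Wrapping each step in `with metrics.phase("fetch_changes"):` and then calling `metrics.slowest()` points at the component to investigate first.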

Here are two of the six boards that we monitor for sync.

Sync Performance

Here you can see a breakdown of the different parts of a sync. Generally we want things in the millisecond and microsecond range.

[Image: Sync Performance board]

Sync Overhead

This board shows the overhead required to make a sync happen, such as memory and CPU. We set thresholds for these and use a service called PagerDuty to notify us if anything reaches its threshold. By monitoring sync overhead we can see, for example, whether a sync is taking up too much memory.
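As a rough sketch of threshold-based alerting, the check below compares one sample of metrics against configured limits and calls a notifier for each breach. The metric names, the limits, and the `notify` callback are assumptions for illustration; in a real setup the callback would hand off to an alerting service such as PagerDuty, whose API is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, e.g. "memory_mb"
    limit: float  # alert when the sampled value reaches this limit


def check_thresholds(
    sample: Dict[str, float],
    thresholds: List[Threshold],
    notify: Callable[[str], None],
) -> List[str]:
    """Call notify() for every metric at or above its limit.

    Returns the names of the breached metrics. Metrics missing from
    the sample are skipped rather than treated as breaches.
    """
    breached = []
    for threshold in thresholds:
        value = sample.get(threshold.metric)
        if value is not None and value >= threshold.limit:
            notify(
                f"{threshold.metric} is {value}, "
                f"at or above threshold {threshold.limit}"
            )
            breached.append(threshold.metric)
    return breached
```

Passing the notifier in as a parameter keeps the check itself easy to test without contacting any external service.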

[Image: Sync Overhead board]

Sync isn’t the only thing we monitor…

DAV Performance

We also monitor performance and overhead for CardDAV and CalDAV so we can look at sync speed, how much memory is being used, etc. CardDAV and CalDAV are the industry-standard protocols that Apple Contacts and Calendars use to communicate with other apps. To learn more about how Daylite uses CardDAV and CalDAV, you can read our blog post 8 Benefits of Sharing Integration with Apple Contacts & Calendars.

Changes often happen outside of our control (e.g. if Apple makes changes). Tracking these metrics lets us figure out what is causing an issue so we can resolve it for our customers using CardDAV and CalDAV.

In one instance, a customer was using a third-party app, and we could see that the client app kept asking for verification of credentials. By looking at DAV performance, we could see that authentication was failing and being retried frequently. We knew right away that the customer’s password was incorrect, and we let them know. Having such fine-grained stats is what allows us to identify the root of a problem like this.
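The pattern in that example (a client retrying authentication and failing every time) can be sketched as a simple check over authentication events. This is an illustrative assumption about how such a signal might be derived, not a description of our tooling; the event shape `(account, succeeded)` and the `min_failures` cutoff are hypothetical.

```python
from collections import Counter


def accounts_with_repeated_auth_failures(events, min_failures=5):
    """Return accounts that failed at least min_failures times and never succeeded.

    events is an iterable of (account, succeeded) pairs covering some
    time window. Both the pair shape and the default cutoff are
    illustrative assumptions.
    """
    failures = Counter()
    successes = Counter()
    for account, succeeded in events:
        if succeeded:
            successes[account] += 1
        else:
            failures[account] += 1
    return sorted(
        account
        for account, count in failures.items()
        if count >= min_failures and successes[account] == 0
    )
```

An account that fails repeatedly with no successes in between is a strong hint that the stored password in the client app is wrong, rather than that the service is having trouble.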

[Images: Grafana boards for DAV Performance and the DAV Overhead overview]

Attachment Data

We’re dealing with terabytes of attachment data that is stored (fully encrypted) on third-party services, so we monitor how much attachment data is being downloaded, uploaded, etc. over a period of time.

[Image: Attachment Server board]

Account Creation

The Daylite Bridge board allows us to track the time it takes for account creation. It also shows us the creation time when you invite a user.

[Image: Daylite Bridge board]

Emails

We use Mandrill for delivering emails such as welcome emails, user invitations, and password change requests. We monitor these emails to make sure they’re sent out on time.

[Image: Emails board]

Authentication

A number of services (such as DAV) need to verify credentials, so we need to make sure this happens in a reasonable timeframe. This board shows authentication for DAV.

[Image: DAV Authentication board]

Postgres

We use Postgres (on FreeBSD with ZFS) heavily. The number of deleted rows tells us whether we should vacuum the database more aggressively. Vacuuming directly impacts performance. Basically, deleted rows are an indicator of when we need to “empty the trash,” so to speak.
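As a hedged illustration of the “deleted rows” signal, PostgreSQL’s standard statistics view `pg_stat_user_tables` reports estimated live and dead row counts per table. The sketch below flags tables whose share of dead rows is high. The 20% cutoff and the function name are arbitrary assumptions, and this is not necessarily how our board computes its numbers.

```python
# Query against PostgreSQL's built-in statistics view. Running it needs a
# database connection, which is deliberately left out of this sketch.
DEAD_ROWS_QUERY = """
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
"""


def tables_needing_vacuum(rows, max_dead_ratio=0.2):
    """Return names of tables whose dead-row share exceeds max_dead_ratio.

    rows is an iterable of (relname, n_live_tup, n_dead_tup) tuples, the
    shape returned by DEAD_ROWS_QUERY. The 0.2 default is an illustrative
    cutoff, not a recommended setting.
    """
    flagged = []
    for relname, live, dead in rows:
        total = live + dead
        # Skip empty tables to avoid dividing by zero.
        if total > 0 and dead / total > max_dead_ratio:
            flagged.append(relname)
    return flagged
```

Tables that show up in the result repeatedly are candidates for more aggressive vacuuming.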

[Image: Postgres board]

Network Traffic

Here’s an example of the network activity that we monitor for specific services. We also have other boards that show more general activity.

[Image: Network Traffic board]

These are just a few of the things we monitor for Daylite Cloud. By putting this infrastructure in place, we’re able to pinpoint problems and mitigate unexpected issues that may arise. Since most of our customers are also entrepreneurs, you probably know as well as we do that good old Murphy is always ready to pounce when you least expect it. This way, we’re better prepared. 😉
