Demo culture for remote companies

The engineering community has undergone a remote transformation in the last two years. With such a radical change in corporate culture comes a number of challenges that an office culture never had to face. Is scrum still relevant in fully remote companies? Are standups needed? What is the process to onboard new hires into the engineering culture? How do we invent better remote-first approaches, rather than trying to (dysfunctionally) emulate an office experience in a remote environment?

A well-implemented “demo culture” can help companies establish healthy habits for sharing, soliciting constructive comments without specifically asking for them, being a source of inspiration, and facilitating cross-team collaboration while preventing “silo-building” within the company. As we continue to build The Company of Golioth, we are also building The Culture of Golioth. This article will walk through some of the pieces we have already implemented.

How Demo Culture works

Demo culture also informally translates “lunch conversations” you might have had in an in-office setup into a more efficient, asynchronous remote equivalent. You no longer need to find a place to sit at the right table: anyone can be part of any conversation.

On a day-to-day basis, we use two tools to achieve our goals:

  • Loom – A video sharing service to quickly record demos
  • A “#demos” Slack channel to capture conversations around the topic

An engineer might have a new feature on the Golioth back end that they have been developing, but it only exists on their local branch and local build. They decide to set up a small example application that tests the feature and helps to showcase that it is part of an upcoming release.

These manifest as 2-3 minute videos showing a very small part of development, possibly a subtask of an overall story in our development. Because they act as markers throughout a sprint, we also reference these during sprint reviews. This is especially helpful if there is work that wasn’t planned, carries across multiple sprints, or is an idea for an upcoming feature. Sometimes…the demos are just for fun or to showcase something the engineer is learning.

Benefits

You might be asking, “Why are you so informal about it? If demos and showcasing work are critical to The Company of Golioth, why not make it a formalized task that engineers need to do every day/week/sprint/month?”

The informality is one of the best parts, and also why it’s a part of The Culture of Golioth. We want team members to share ideas they’re having and to make it part of their process as they explore new things. Most of all, we want people to be excited about the things they’re working on.

On a broader basis, the entire company benefits. First and foremost is early access to new ideas. An engineer who regularly shares what they are thinking about helps to start conversations that later come up in formal settings such as sprint planning. Since we are a fully remote company, video meetings are a necessity to transfer information, but also a hindrance to getting other work done (as in any company). Demos pollinate interesting ideas without the formality (and synchronous demands) of group video chats.

Another benefit is that demos always include a hands-on approach, and not just a passing comment about “how something could be better”. This means that an engineer who doesn’t like the placement of a button could take the initiative to actually move it on their local machine and then showcase it, all without interrupting the flow of other engineers. They simply show the difference, state their case, and move on. There’s a lot of power in “show, don’t tell” to help foster understanding of the points being proposed.

When ideas bubble up as a possible change in direction, either on a smaller development scale or even at a company level, we can center the discussion on the video that showcases the proposed change. Because it’s informal, it’s just part of the “lunch conversation” mentioned earlier. Unfortunately, hashing out an idea in person at lunch is still much easier; you can immediately talk through what might be good or bad about a new approach. With Demo Culture, a video showcases the idea and the Slack channel acts as an anchor around which we can discuss it. That feedback might also spin off future demos from the original submitter or from others. The videos also make it easier to come back to those conversations weeks or months later (more on that in just a bit).

Finally, in an ever-growing company, we need more ways to interact with each other, as simple video chats will get harder to do. Watching someone on video is no replacement for sitting next to them at lunch, but it can help you get to know someone’s personality and recognize their areas of interest. As a company grows, teams tend to silo because it becomes harder for everyone to keep track of more people. Seeing new, friendly faces showcasing something they’re interested in not only allows people in other teams to see what’s being worked on, but also might inspire new collaborations with other teams. Cross-pollination is possibly the most beneficial output of Demo Culture.

Challenges

Trying to change company culture is always tough, even if you’re building it from the ground up. And we are! We are still a small team, so this is the right time to put ideas and practices in place. But a corollary to that is the challenge as a company grows and shifts. When we bring on new employees, we will need to lead by example and work to maintain the culture over time.

As much as we can encourage employees to demonstrate what they’re working on, each employee is different. Some might want to only show completely finished work, whereas others might be comfortable showing an early demo of something. You need to plan for an uneven distribution, especially as new employees lean into the culture. At the top level, putting requirements on top of something like demos could negatively impact how they are perceived. And while we showed above that sharing demos should be very “low overhead”, it’s still important that employees feel like they have the time to think through and contribute something to the discourse.

Finally, the work itself can be another challenge. Someone working on a long-term project might feel that there’s only something to contribute at the end. We still try to encourage them to share waypoints along their journey. It helps others in the organization understand what they’re working on, allows people to get excited about upcoming features, and solicits soft feedback from future stakeholders. In all cases, it’s important that we encourage our teammates to share whatever they feel comfortable sharing, and hopefully to enjoy doing it.

Demo of a Demo

We must walk the walk! Let’s look at how this works in practice.

In our last blog post, we showed how to set device credentials using goliothctl and the Zephyr shell. This was the result of lots of great work from our backend team over the past few weeks. Throughout the process, they shared progress and showed how it would work using different tools.

First our lead engineer Alvaro showed a demo of the provisioning from a shell on our test Arduino SDK:

Then Miguel showed how to set up credentials using goliothctl:

Then there was another demo where Alvaro showed how he was using goliothctl:

Mike from the DevRel team showed how he was using this to provision devices for a training event, using similar methods that weren’t yet available to the public:

Finally this all culminated in a released feature that is now available to all users and that we shared publicly on the blog:

Demo Culture is an evolving concept

We are continuously codifying and experimenting with the concept of Demo Culture at Golioth. As we continue to build out this idea, we will share with the broader community. If you know of other companies doing similar things, we would love to hear about it on our community Discord or on social media. We will share some of the companies we took inspiration from below.

Reference material

The concept of “Demo culture” is not new as an overall topic. We have borrowed and modified from a range of different sources, most notably GitLab, to help bolster our initial concept of how it will work. What’s more, we continue to refine this process.

Why is it so hard to build IoT products? Beyond the many technical challenges during R&D, companies accustomed to building products that are not connected have to go through a challenge of cultural transformation. A big part of this is shifting from selling a standalone hardware product to selling a hardware product with a software service. This moves a company from the low bar of warranty-based service to working with service level agreements (SLAs) and all their challenges. Let’s explore how this impacts company culture inside these organizations.

Building great, but not connected devices

If your company is used to shipping devices that are not connected, it’s likely the structure of your operations is fairly straightforward. If the device works, everyone is happy. If the device doesn’t work, the customer sooner or later finds out.

They might get a free repair or replacement if the device is still under warranty. If it’s outside of the warranty window, it depends on your company’s policy of post-warranty support, or possibly on a 3rd party that has made a business of supporting your older equipment.

Expectations for connected IoT devices

For connected products, the bar is set higher. The expectations of customers are higher because they no longer pay only for “a box”, but they also pay a monthly subscription.

From an engineering perspective, it’s no longer just hardware tightly coupled with software; it’s now hardware and software-as-a-service (SaaS). And while hardware has a warranty, service has a service level agreement (SLA). These three letters cause a lot of headaches if not executed right. They can turn a successful proof of concept into a production disaster.

Challenges of meeting an SLA

A transformation needs to happen across your organization in order to provide a good service. Most of these steps will not be discovered in the proof of concept phase. Only an organization that works across teams will survive this transition.

While proving out your product’s viability, the scale you are dealing with is orders of magnitude smaller than your production scale. In the early days, your engineering team handles both development and operations, because neither is refined enough to be handed over to the operations team.

If you forego thinking about the transition to production, you are in for a surprise! It is critical to prove both the hardware and the software-as-a-service can be supported and operated at scale.

The following are a handful of critical checks you should do before wrapping up your proof-of-concept.

SLA means regular software updates and quick bug fixes

With any large population of connected devices, your customer will need guarantees of security of the system, and therefore require periodic updates and a commitment to timely fixes of critical vulnerabilities.

To do that at scale, you need a system that can support rolling out a large volume of over-the-air (OTA) updates in a very short time, while keeping the cloud and personnel costs at commercially sensible levels. That also means you must be able to quickly identify which devices require an update, which are a priority, and which ones can wait. In summary, you need a flexible, scalable rollout system.
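
As a rough illustration of the "which devices need an update, and in what order" question, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Device` fields, the idea of using `internet_facing` as a priority signal, and the `plan_rollout` helper are hypothetical and not part of any Golioth API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    firmware: tuple[int, int, int]  # (major, minor, patch)
    internet_facing: bool           # hypothetical priority signal


def plan_rollout(devices, target, batch_size):
    """Return batches of device IDs that still need `target`, highest priority first."""
    outdated = [d for d in devices if d.firmware < target]
    # Internet-facing devices go first; within each group, oldest firmware first.
    outdated.sort(key=lambda d: (not d.internet_facing, d.firmware))
    ids = [d.device_id for d in outdated]
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


fleet = [
    Device("a", (1, 2, 0), False),
    Device("b", (1, 0, 3), True),
    Device("c", (1, 3, 0), False),  # already on the target version
    Device("d", (1, 1, 0), True),
]
batches = plan_rollout(fleet, target=(1, 3, 0), batch_size=2)
print(batches)  # [['b', 'd'], ['a']]
```

Rolling out in small batches like this also gives you a natural point to pause if the first batch reports problems.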

SLA means 3rd party issues are your issues

Where traditionally you would have full end-to-end control of your product, there is now a 3rd party component to your problems: a network. It is unlikely you or your customer fully control the end-to-end communication path between the IoT device and the cloud application. It is even less likely that you can do anything to solve those network problems, apart from picking up a phone and calling the network provider. That does not scale. Nonetheless, what customers see is your device and your application, so it is your problem when it doesn’t work–whatever the reason.

An obvious mitigation strategy is to have solid contracts in place with all the vendors, but even that is not bulletproof or commercially feasible. My experience is that even with great contracts, most real-world situations end up in a finger-pointing discussion.

In those situations, the best you can do is arm yourself with sophisticated tools and evidence.

Both your device and your cloud application need to be equipped with network diagnostics tools, not only for the initial installation purposes, but also for long term operations.

Ideally, your device will be able to self-recover from frequently encountered (or transient) problems, while caching data it was unable to communicate during the outage. After a major outage, you better make sure devices reconnect in an orderly fashion, otherwise your cloud will crash as soon as your devices recover. And then your devices crash, and you end up in a loop of engineering horror and customer disappointment.
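
One common way to get an orderly reconnect is exponential backoff with random jitter, so that thousands of devices do not all retry at the same instant. The sketch below is only an illustration in Python (real device firmware would more likely be written in C); the function name and the `base` and `cap` defaults are assumptions.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    Uses "full jitter": a uniform random delay between 0 and an
    exponentially growing ceiling, so devices do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Delays for the first 12 attempts of one device; every device draws its own.
delays = [reconnect_delay(n) for n in range(12)]
```

The cap keeps a long outage from pushing retry intervals to hours, while the jitter spreads the load on the cloud side over the whole window.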

Where automated recovery is not possible, an alert to responsible personnel should be made as soon as possible from the cloud side. At the same time, you don’t want to overload your support with false alarms or transient problems, as that will ultimately make them ignore the alarms when they are actually needed. Your tooling needs to be smart enough to assess the (un)certainty and urgency of the problems.
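
A simple way to filter out transient problems is to alert only when failures are both repeated and long-lived. The following Python sketch assumes a hypothetical stream of health-check samples per device; the thresholds and the `should_alert` name are made up for illustration.

```python
def should_alert(samples, now, min_outage_s=600, min_failures=3):
    """Decide whether a device's failures look persistent enough to alert someone.

    `samples` is a list of (timestamp_s, ok) health-check results, oldest first.
    Alert only if the most recent checks all failed, there are at least
    `min_failures` of them, and the streak began at least `min_outage_s` ago.
    """
    streak = []
    for ts, ok in reversed(samples):
        if ok:
            break
        streak.append(ts)
    if len(streak) < min_failures:
        return False
    # streak[-1] is the oldest failure in the current unbroken run.
    return now - streak[-1] >= min_outage_s


transient = [(0, True), (60, False), (120, True)]
persistent = [(0, True), (60, False), (360, False), (660, False)]
print(should_alert(transient, now=180))   # False
print(should_alert(persistent, now=700))  # True
```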

SLA means resolving problems on time

Historically, response (not resolution) time has been a parameter that defined a good SLA. Response time in simple terms meant how long it took your support team to react to a reported issue. However, a reaction doesn’t mean much if the customer is left stranded for weeks without a resolution. Therefore, more and more companies want a higher degree of confidence before they sign a contract, and will require a commitment to resolution time.

In the past, shipping back a device would have been an option. Today, you will be expected to resolve problems remotely.

At scale, you certainly don’t want to remotely access the console of each individual device. If you have 100 devices and a patch takes you 10 minutes, that’s 1000 minutes, or almost 17 developer-hours. At 1000 devices, that’s roughly 167 developer-hours, almost a month of regular 8-hour workdays! Will your customers wait one month to have their problem fixed? Or is it worth employing a four-person team to bring that number down to one business week (and still have unhappy customers)?
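
The arithmetic is easy to check for your own fleet size. This small Python helper (the function name is just for illustration) uses the same assumption of 10 minutes per device:

```python
def manual_patch_hours(devices, minutes_per_device=10, engineers=1):
    """Wall-clock hours to patch every device by hand, one console at a time."""
    return devices * minutes_per_device / 60 / engineers


print(round(manual_patch_hours(100), 1))                    # 16.7 hours
print(round(manual_patch_hours(1000), 1))                   # 166.7 hours
print(round(manual_patch_hours(1000, engineers=4) / 8, 1))  # 5.2 eight-hour days
```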

What you need in this case is a system that can:

  • Get you information about the entire population of devices efficiently
  • Let you perform actions across a large number of devices at scale
  • Report back per-device results at scale

With those tools, a single engineer will have the power of an entire team.
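
To make the "act on many devices, get per-device results back" idea concrete, here is a minimal Python sketch. It is not how Golioth implements fleet operations; `run_on_fleet` and `fake_patch` are hypothetical names, and `fake_patch` stands in for whatever remote call your platform provides.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_fleet(device_ids, action, max_workers=32):
    """Apply `action(device_id)` to every device and collect per-device results.

    Returns {device_id: ("ok", value)} or {device_id: ("error", message)}, so
    one failing device never hides the outcome of the others.
    """
    def attempt(device_id):
        try:
            return device_id, ("ok", action(device_id))
        except Exception as exc:  # record the failure instead of aborting the run
            return device_id, ("error", str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(attempt, device_ids))


def fake_patch(device_id):
    # Stand-in for a real remote call; "dev-2" is made to fail on purpose.
    if device_id == "dev-2":
        raise TimeoutError("device unreachable")
    return "patched"


report = run_on_fleet(["dev-1", "dev-2", "dev-3"], fake_patch)
failed = sorted(d for d, (status, _) in report.items() if status == "error")
print(failed)  # ['dev-2']
```

The important design choice is the per-device result: the engineer ends up with a short list of devices that need attention rather than a single pass/fail for the whole fleet.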

SLA requires efficient root cause analysis

While you might have resolved the problem for your specific customer, you don’t want to stop there. Other customers are likely having the same problem; they just don’t know it yet, or haven’t reported it.

That’s where you will benefit from a continuous stream of logs and operational metrics from your devices that can be browsed at scale. Let’s say a specific version of software on your device randomly crashes. With a good set of logs and metrics, you could quickly narrow down the problem to something like high CPU usage, low memory, or specific revision of hardware coupled with specific history of updates.
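
Narrowing down a crash often starts with nothing fancier than comparing crash rates across device attributes. The Python sketch below uses made-up records and hypothetical field names (`hw_rev`, `firmware`, `crashed`) to show the idea:

```python
from collections import Counter


def crash_rate_by(records, key):
    """Crash rate per value of `key`, for example per hardware revision.

    `records` is a list of dicts, one per device, each with a boolean
    "crashed" field plus whatever attributes the fleet reports.
    """
    total, crashed = Counter(), Counter()
    for rec in records:
        total[rec[key]] += 1
        crashed[rec[key]] += rec["crashed"]
    return {value: crashed[value] / total[value] for value in total}


# Illustrative data only, not real measurements.
records = [
    {"hw_rev": "A", "firmware": "1.3.0", "crashed": False},
    {"hw_rev": "A", "firmware": "1.3.0", "crashed": False},
    {"hw_rev": "B", "firmware": "1.3.0", "crashed": True},
    {"hw_rev": "B", "firmware": "1.3.0", "crashed": True},
    {"hw_rev": "B", "firmware": "1.2.0", "crashed": False},
]
by_hw = crash_rate_by(records, "hw_rev")
by_fw = crash_rate_by(records, "firmware")
```

In this toy data set, revision "B" hardware on firmware 1.3.0 stands out immediately, which is the kind of combination the paragraph above describes.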

The holy grail of exceeding SLAs

The reactive approach outlined above still carries a lot of cost and risk. But it’s a starting point from which you can get to the optimal solution. Ideally, you should be able to find problems ahead of time and prevent them, rather than reacting to them. That’s where having access to large volumes of historical monitoring data and logs will unlock a ton of opportunities.

With a detailed knowledge of what happened with each of your devices over their lifetime, it becomes much easier to start seeing patterns, and to connect the dots.

Ultimately, a couple of months of work of a data engineer (together with a domain expert) can get you business insights beyond what you might be able to imagine.

Conclusion

The challenge of transforming a hardware-based business into a hardware and software-as-a-service business is tremendous. Proofs of concept are therefore omnipresent, but a successful hardware trial is only the beginning of a long journey to a scalable, profitable product line. Ultimately, your company’s reputation as a “cloud” business is at stake.

Obviously I have been thinking deeply about this topic as we build Golioth to make IoT products easier to bootstrap, and much easier to maintain. In future posts I’ll dig into the details of how we address each issue raised here. The important thing today is that we all embrace the concept that adding connectivity is a fundamental change, and not just a hardware revision.

Before you start your migration from proof-of-concept to production (and deployment), make sure you have all the infrastructure in place to support rapid growth. Great engineers that made the early trials successful require tools and support to handle issues at production scale.

A successful proof-of-concept or prototype should prove scalability. It should demonstrate your company’s ability to exceed service level expectations of your customers, and make profit for the company building a connected device.