How (not) to build a secure mobile messaging platform

Lately there have been noticeable efforts to build secure mobile messaging platforms. There are simply too many already to even start listing them. Most nation states seem to be working to obtain one, with or without commercial partners. Products come and go. So far I have not seen one that addresses the fundamental problem: there is a difference between mass surveillance and actually being targeted by a state level aggressor. This is a post about a few things that you have to take into account when the game is not only about mass surveillance.

Hardware architecture

The biggest issue with just taking some generic reference hardware and slapping a hardened Android on it is the architecture. This should convey the idea perfectly: [figure: common_architecture]

The hardening is usually focused on what is referred to as the application processor, which runs the main operating system. The communications processor is ignored, although it is significant for several reasons:

  • It is actively in contact with the phone network
  • It is not simple – some tasks require complex logic and serious calculating power
  • Some of those tasks include adjusting operations to the feedback given by the network base stations
  • The architecture allows it to bypass the application processor and independently access sources of potentially interesting information – for instance the microphone
  • Malign activities done by the communications processor can be made mostly invisible to the other components
  • It does offer an attack surface for parties with resources to hack the common chipsets
  • Some or most of these chips are black boxes to the vendors using them

As a result, if you are using an encrypted VOIP service while someone has control of the communications processor, they can listen to the conversation through a side channel, and you have no way of detecting it. This is not a theoretical threat either. Let’s take this historical device for instance:


Yes, it’s a Nokia 3310. A few models in that line had network monitoring firmware. At least on those models you could command the application processor to power off independently. After that you could call the phone with special codes, and the communications processor would answer the call and let you listen to everything. The phone looked completely dead to the user. Taking a device that has even a chance of functioning like that into any secure working area is a huge risk!

The point here is that hardening only the application processor leaves a major gap. I am not saying that the most common phones nowadays have security vulnerabilities or backdoors in their communications processors. What is significant is that the hardware architecture of modern phones was never designed for security. Every single component is treated as trustworthy. At least the ones facing the network should be sandboxed properly, but they are not!
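To make the architectural point concrete, here is a minimal Python sketch that models the components as a graph and asks what the network side can reach without the application processor being involved. The component names and the links between them are assumptions made up for illustration; they do not describe any particular chipset.

```python
from collections import deque

# Hypothetical component graphs: an edge A -> B means A can drive or read B
# directly. Names and links are illustrative assumptions only.
COMMON = {
    "network": ["baseband"],
    "baseband": ["microphone", "shared_memory", "app_processor"],
    "app_processor": ["microphone", "display", "storage", "shared_memory"],
}

# Same parts, but the baseband can only talk to the application processor.
SANDBOXED = {
    "network": ["baseband"],
    "baseband": ["app_processor"],
    "app_processor": ["microphone", "display", "storage", "shared_memory"],
}

def reachable(graph, start, blocked=frozenset()):
    """Return every component reachable from start without passing through blocked ones."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Treat the hardened application processor as a wall the attacker cannot pass.
exposed_common = reachable(COMMON, "network", blocked={"app_processor"})
exposed_sandboxed = reachable(SANDBOXED, "network", blocked={"app_processor"})

print("microphone" in exposed_common)     # True: the baseband reaches it directly
print("microphone" in exposed_sandboxed)  # False: only the baseband itself is exposed
```

In the first graph, hardening the application processor changes nothing for the microphone, which is exactly the gap described above.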

It should be noted that secure sourcing is hard. Remember these images of the NSA intercepting hardware shipments to modify them before they reach customers:


With a proper architecture that does not implicitly trust every component, an attacker would at least have to meddle with several components in the supply chain, which makes the option less attractive and harder to pull off. Some of the commonly exploited basic cases might prove impossible. Defensive architecture really could level the playing field.

Other interfaces

Let’s say the mobile device had a perfect encrypted VOIP solution. It was completely audited, and accredited for use. You could go into hostile network environments and communicate securely from there, without any fear of incidents. The encryption was heavy duty enough to protect even the most precious state secrets for the required 50+ years.

Then someone bought a cheap Bluetooth headset and paired it with the mobile device, and an eavesdropper staying within 500 meters or so captured the connection and took a jab at an encryption scheme that is several orders of magnitude easier to attack. While not broken, those features were typically not designed for securing really classified communications. They were designed for the consumer market. Also, nearly every implementation by default allows downgrading the protocol version, because consumers want their devices to work.

Now, you would be running the risk of an information leak. Also, you would have to audit and accredit the Bluetooth chip, with its settings and all, and the client devices. At an immense cost. Depending on the mandated requirements for encryption and key management, that might even prove impossible. Or, you would have to disable the features altogether in a secure fashion. Multiply this by the number of similar interface features the mobile device offers. The answer to that is probably: many.
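The downgrade problem mentioned above can be sketched in a few lines of Python. This is a generic illustration of version negotiation with and without an enforced minimum, not the actual Bluetooth pairing logic; the function name and the version tuples are hypothetical.

```python
def negotiate(local_versions, peer_versions, minimum=None):
    """Pick the highest protocol version both sides support.

    With minimum=None this behaves like a consumer device: it quietly falls
    back to whatever the peer offers. With a minimum set, it refuses instead.
    """
    common = set(local_versions) & set(peer_versions)
    if not common:
        raise ValueError("no common protocol version")
    chosen = max(common)
    if minimum is not None and chosen < minimum:
        raise ValueError(f"peer only offers {chosen}, below required {minimum}")
    return chosen

# Hypothetical version numbers as (major, minor) tuples.
phone = [(4, 2), (4, 0), (2, 1)]
cheap_headset = [(2, 1)]

print(negotiate(phone, cheap_headset))  # (2, 1): works, but silently downgraded

try:
    negotiate(phone, cheap_headset, minimum=(4, 2))
except ValueError as err:
    print("refused:", err)
```

The strict variant is the one users complain about, because the cheap headset simply stops working.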

Users have a hard time accepting the disabling of several expected features of a mobile phone, while the pointy haired bosses wish to keep the costs down. One way to meet in the middle might be to allow some devices while the user is not working with secured connections, and to drop everything while a secure mode is on. But even that requires a certain level of guarantee that nothing coming in via those interfaces can have a permanent effect. This is something not even the NSA’s recommendations take properly into account.

Trust models

Now here’s the issue. The cryptographic algorithms are just one part of encryption. Once the basics are laid out correctly, key management becomes more important, and it becomes the primary attack surface of the encryption. Probably no one is stupid enough to attack, for instance, ECDHE-ECDSA-AES256-GCM-SHA384 head-on when they can just look for weaknesses in how the keys are produced, transported, and utilized.

Interestingly, here the requirements of private users and larger organizations (government agencies, large corporations) differ. The organizations want a full blown PKI, because they need the flexibility and management features. The solutions for that target audience usually offer all of that. The private users, well. They’d rather not trust external CAs, because they do not consider them trustworthy, either for implementation issues or on principle. The principle is probably the most significant issue here for many.

After all, if the CA can screw users over by, for instance, generating secondary certificates with their identities, why would a citizen who does not trust his government trust a CA run by the same government? Why would he trust any commercial CA that can be affected by that government? Such a user might be better off with something that has less centralized key management. Ideally, if you cannot trust PKI systems, things should work more like key signing parties where you meet with people, authenticate them, and cross-sign your keys. However, solutions that come halfway, like Silent Circle, are probably more convenient.
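The in-person part of a key signing party boils down to comparing fingerprints before signing anything. A minimal Python sketch of that check follows, assuming the fingerprint is simply SHA-256 over the raw public key bytes; real systems define their own fingerprint formats, and the function names here are hypothetical.

```python
import hashlib
import hmac

def fingerprint(public_key_bytes):
    """Return a SHA-256 fingerprint grouped in blocks of four hex digits."""
    digest = hashlib.sha256(public_key_bytes).hexdigest().upper()
    return " ".join(digest[i:i + 4] for i in range(0, len(digest), 4))

def matches(public_key_bytes, fingerprint_from_owner):
    """Compare the key we received with the fingerprint the owner gave us in person."""
    def normalize(text):
        # Ignore spacing and letter case so that read-aloud values still match.
        return "".join(text.split()).upper().encode("utf-8")

    expected = normalize(fingerprint(public_key_bytes))
    return hmac.compare_digest(expected, normalize(fingerprint_from_owner))

# Placeholder bytes standing in for a real public key.
received_key = b"placeholder public key bytes"
spoken = fingerprint(received_key).lower()

print(matches(received_key, spoken))            # True
print(matches(b"substituted key", spoken))      # False
```

Only after this check succeeds would you cross-sign the other party's key with your own.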

Any secure mobile messaging platform that wishes to gain considerable market penetration would probably have to offer both models simultaneously, and let the users choose.

Identification and key management

It is clearly insufficient to identify only the mobile device before trusting it with information to be shared with the other party. That’s where strong electronic identification comes into play as the gold standard of user authentication. Basically it means 2-factor authentication, which combines both “what you have” and “what you know” to determine the identity.

However, some alternatives fail spectacularly when talking about mobile devices. Mobile certificates and other locally installed certificates are roughly as useful as the classic ident system. At best they make a nice Douglas Adams style skit where the device is trying to figure out whether it actually has the certificate file that it has, and whether it can detect tampering done to itself. (It is impossible, and usually just leads to chicken-and-egg type problems.)

What is required is an HSM that provides cryptographic services to the system while simultaneously guaranteeing that leaking the private keys is impossible. Until that is available in some form to mitigate the risks related to key storage, building a secure mobile messaging platform is slightly dubious as an idea. The current architecture of common mobile devices, to my knowledge, lacks HSM functionality. That might change, however, because there is a considerable push for enabling mobile payments, which ultimately requires solving the same issue.

The role could however be fulfilled also by a commercial token or by some of the most common HSMs on this planet:


The example card has no magnetic stripe, and the number series are informational only. It is a direct debit card, where the chip works as an HSM. For it to be exploited, the user most commonly has to have lost control of both their PIN (“what you know”) and the physical card itself (“what you have”). While cloning the chip including the encrypted data is possible, there haven’t been attacks based on that in the wild. The security model related to the smart card is actually surprisingly solid. The phones just lack the reader hardware.
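The smart card model can be illustrated with a toy Python class. This is only a sketch of the interface idea: the key is generated inside the token, there is no method to export it, and wrong PINs burn a retry counter. Python cannot actually enforce the isolation that real hardware does, HMAC stands in for a real signature scheme, and the class name and the retry limit of three are assumptions.

```python
import hashlib
import hmac
import secrets

class ToyToken:
    """Toy model of a PIN-protected token; not a real HSM."""

    MAX_TRIES = 3  # assumed retry limit

    def __init__(self, pin):
        self._key = secrets.token_bytes(32)  # generated inside, never returned
        self._pin_hash = hashlib.sha256(pin.encode("utf-8")).digest()
        self._tries_left = self.MAX_TRIES

    def sign(self, pin, message):
        """Authenticate a message if the PIN is right; block after repeated failures."""
        if self._tries_left == 0:
            raise PermissionError("token is blocked")
        candidate = hashlib.sha256(pin.encode("utf-8")).digest()
        if not hmac.compare_digest(candidate, self._pin_hash):
            self._tries_left -= 1
            raise PermissionError("wrong PIN")
        self._tries_left = self.MAX_TRIES
        return hmac.new(self._key, message, hashlib.sha256).digest()

card = ToyToken("1234")
tag = card.sign("1234", b"hello")
print(len(tag))  # 32 bytes of HMAC-SHA256 output
```

A thief who holds the card but not the PIN gets three guesses in this model, which is why both factors usually have to be lost before the card can be abused.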

Availability, quality of keys and key management, pricing, ease of use, and the relation all work against building a truly secure mobile messaging platform. At the end of the day, most solutions I have seen so far have been heavily based on the strength of user passwords and on unwavering trust in the components of the application.
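A bit of arithmetic shows why leaning on user passwords is a problem. The Python sketch below computes an upper bound for the entropy of a password, assuming every character is chosen uniformly at random, which real users rarely manage; the function name is hypothetical.

```python
import math

def password_bits(alphabet_size, length):
    """Upper bound on entropy in bits for a uniformly random password."""
    return length * math.log2(alphabet_size)

print(round(password_bits(26, 8), 1))   # 8 lowercase letters: about 37.6 bits
print(round(password_bits(94, 12), 1))  # 12 printable ASCII characters: about 78.7 bits
```

Compare that with the 256 bits of an AES-256 key: whenever the key is derived from or unlocked by a password, the attacker only has to search the password space.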

Layering issues

Let’s take two different approaches to secure mobile messaging. The first is to use whatever VOIP solution and slap an ordinary VPN product on top of it. The other is to build an integrated end-to-end encrypted messaging stack. The security profiles of these two types of solutions are significantly different.

Take the following issues for example:

  • How can the VPN credentials and the credentials to the services be enforced to be the same and irrefutable?
  • After the VPN credentials are lost, what kind of attack surface do the services offer when compared to the more integrated solution with end-to-end encryption?
  • Was the VOIP system actually designed to be secure by itself? Is the vendor just slapping pieces of mediocre applications together to build complexity against auditors?
  • If the VOIP system actually has security features such as encryption, why are they not good enough to stand on their own? Why is the VPN required at all?
  • How much information does the VPN solution leak for side channel attacks?
  • If the centralized parts of the messaging platform are (partially) compromised, have all the messages been leaked? Would an end-to-end type system prove at least somewhat more resilient?

If nothing else, the VPN based solution is significantly more complex because of the loose coupling of the layers. Working out a simple sequence diagram of all the key management and encryption related activity that takes place while using the messaging platform proves this instantly. Likewise, the risk of simply screwing up a detail is several orders of magnitude higher. From the viewpoint of complexity it is instinctively a very bad idea to mix common VPN tools with what started off as a relatively simple messaging system.

About that complexity

There are 13 million source lines of code (SLOC) in Android alone. The hardware components working with the communications, audio processing, accessories, and so on probably have a few million lines more. That’s a lot of Java, C/C++, and some other more niche languages to audit and accredit for security. Too much actually, and several features allow pulling in other code dynamically from external sources. Take browser plugins for example…

To be honest, I would rather have something that is entirely stupid, but thoroughly auditable, and audited for secure messaging. It is okay to have a separate “recreational” phone, and another one for serious tasks. Even this nasty TETRA phone is probably too complex:


That’s however where the problems begin. It is hard, especially for decision-makers, to understand the value of simplicity when it comes to security. After all, their 12-year-old son seems to be doing fine with all the gadgets, and those constantly come with promises of security.

Again, I am not saying there are known major vulnerabilities with the alternative. I am saying that I like the risk profile of going KISS more when the security really counts.


In my humble opinion we are still several evolutions away from having any hope of building truly secure mobile messaging platforms. While some solutions are actually doing alright against lower level adversaries, most of them have architectural problems that become significant when state level aggressors enter the play.

The main issue is that there are too many unchecked components in the present hardware platforms. There is no real security architecture either, as several important pieces, such as sandboxing of critical components and HSMs for key storage, are missing. User identification will continue to be slightly unsatisfactory in the near future, and several of the solutions marketed as secure are scarily complex. The solution would be to re-design the basics from an entirely different viewpoint.

My dream would be to get some actual hardware developers to work on this, and to get, for instance, the OpenBSD folks to build the software layer from the ground up. Make the foundation preferably open source to benefit all. That will probably never happen though, because most target audiences are simply happy with commercial level security features. I fear there just is not enough support to warrant the use of resources. Furthermore, the phone could probably never be sold in some countries, because the authorities would not certify it for sale.


6 thoughts on “How (not) to build a secure mobile messaging platform”

  1. Interestingly my first GSM mobile the Motorola MicroTAC II had a slot for a credit card with chip reader in addition to the SIM card. I am not aware however of any application that ever made use of it.


  2. Good post.

    Do you think that Project Ara, with a modular smartphone, could help with the issues you mention?
    (securing the hardware, and then securing the software on top of it)


  3. Very detailed post.

    Anvaya Solutions, Inc. is developing hardware security modules to address the exact issues highlighted in this article. We understand that security protection and containment at localized hardware level is realistic and realizable instead of solutions at the distributed software stack level. Our solutions include dedicated hardware, offering a very low gate count and a very low power profile for small form factor applications as well.


  4. You can get micro SD cards with a Secure Element. Can be used for TRNG and key storage. Adds a few seconds to ZRTP key negotiation and about a second to IM encryption.

