Threats / 01 Decompilation and modification
Decompilation - transforming compiled binary machine code into a reconstructed version of the original source code - is often the first step attackers will take when targeting your app. The aim here is to analyze your app’s code, understand its logic, and identify any vulnerabilities it contains. This can be done manually - opening the app in a decompiler, reading the decompiled code, and searching for keywords - or automatically with the help of a variety of scanning tools.
Decompilation may give attackers access to data they can use immediately, without even needing to run the app:
- credentials
- API keys
- cryptographic keys
- proprietary algorithms
- resource files
But it also gives attackers insights into how exactly your app
- enforces user authentication
- uses platform APIs and IPC mechanisms
- manages sensitive data
- communicates with backend services
- makes use of cryptography
- tries to protect itself
And it enables them to target any weak points. They might use the knowledge gleaned to exploit your app directly, or to design malware targeting those weak points in order to harvest your end users’ sensitive data.
There’s also a danger of modification or tampering with the compiled binaries, by adding, changing, or removing code. Adversaries might use this for something relatively benign, such as accessing ‘locked’ features and content on their own devices.
Or they could inject malicious code into the app, repackage it, and use social engineering techniques like phishing to distribute the malicious clone to your legitimate customers.
And here’s an additional problem with attackers modifying the compiled binaries: they may be able to remove or override any RASP (Runtime Application Self-Protection) or network security features. This would allow them to run (and/or to redistribute) an unprotected version of what you might think is a secure application.
Decompilation and modification, in other words, can be used to target all of your application’s key assets:
- Internal data and intellectual property (IP)
- Restricted functionalities
- Sensitive user data
This is why it’s so crucial that applications have comprehensive protection, including:
- Code and resource hardening, to prevent reverse engineering and static analysis through decompilation
- Integrity and anti-tampering mechanisms, to prevent the binaries from executing if they have been modified
Decompilation and Modification in Practice
When we think of APKs and IPAs, we usually think of them being downloaded from Google Play or the App Store. And these distribution services (and others like Huawei AppGallery and the Amazon AppStore) offer some reassurance to developers and users in terms of underwriting the integrity and authenticity of apps.
These platforms perform analysis and validation of apps before publishing them; they offer code and package signing functionalities to associate apps with their authentic developers. They also use signing to enforce the integrity of apps, so that developers can be confident that users are running the same app that they built with so much care, without any modifications.
iOS apps uploaded to and downloaded from the App Store additionally have their binaries and some assets encrypted automatically using Apple's FairPlay technology. When an app is downloaded from the App Store, it is encrypted using the unique device ID of the device that initiated the download. In principle, this means that the app can only be used on the device that downloaded it. So, it cannot be shared or used on other devices.
But since the files are decrypted after download, with a jailbroken device it is possible to extract the IPA to another machine. There, an attacker is free of the iOS device restrictions and can wield any tools they need to crack it.
It’s even more straightforward to transfer an APK from an Android device onto another machine. And it’s easier still to download it directly from a third-party APK downloader site, of which there are dozens.
At this point, the attacker can begin decompilation and modification. So, what do these involve?
Decompilation involves ‘reversing’ the compilation stage of the app build process.
- For native Android apps, that means reconstructing Java and Kotlin code (via Smali) from the Dalvik bytecode stored in .dex files. (In fact, full decompilation is not even necessary: anyone who can read Smali or bytecode can reverse engineer and modify a native Android app directly at that level.) There are free, open source tools to do this automatically with Android apps (APK, AAB) and libraries (AAR), some with intuitive graphical interfaces. In other cases, attackers will chain tools together: first extracting the code, then disassembling it, then decompiling it. They might then open the decompiled source code in a text editor or a standard IDE (Integrated Development Environment).
- For native iOS apps, it means reconstructing Swift and Objective-C code from the machine code stored in the .app file, first through disassembly, and then through decompilation. Just as for Android, there are free, open source tools to do this automatically with iOS apps (IPA, XCARCHIVE), and libraries (Frameworks).
For hybrid apps, decompilation is not necessary at all to access the proprietary code: JavaScript, HTML, and CSS are interpreted, not compiled, languages, meaning the source code is bundled into the application as-is, perhaps with some obfuscation applied (easily de-obfuscated using tools such as JStillery). Generally speaking, hybrid apps are open targets.
And all three types of app - native Android, native iOS, and hybrid - may contain native libraries written in C and C++. These can also be extracted and disassembled, although it may be a relatively labor-intensive process. Understanding disassembled native code is generally considered a harder and rarer skill than understanding decompiled Kotlin or Swift.
Modification - also known as tampering, modding, and patching - involves changing the app’s code so that it behaves differently from how the original developer intended.
Tampering can also be achieved dynamically during runtime. But our focus in this section is on modification of the app’s binaries.
In this scenario, an adversary decompiles the app to be able to work at the source code level in an IDE such as Android Studio or Xcode. Or, if they have the expertise, they simply use a hex editor to examine and modify the binaries directly.
After making any changes they like, the attacker can then recompile, repackage, and re-sign the app. Then they can use the modified version themselves and/or distribute it to other users.
What can attackers achieve by decompilation?
Reverse Engineering; Internal Data and Intellectual Property (IP) Theft
Application binaries may contain sensitive logic, some necessary secrets, and (in most cases) some unnecessary secrets as well. Decompilation gives adversaries access to all of these.
Even if developers follow secure coding practices and avoid unforced errors such as hardcoding credentials or cryptographic keys into their compiled package, there will always be information for attackers to exploit.
Strings are a typical target, and attackers can easily search for sensitive indicators such as ‘key’, ‘API’, ‘nfc’, ‘password’, ‘crypto’, ‘URL’, or ‘http’ using GUI or command-line tools.
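As a hypothetical illustration (the class name and values below are invented), this is the kind of material a simple string search surfaces in decompiled code:

```kotlin
// Hypothetical example: constants like these survive compilation as
// plaintext strings and are trivially found by searching decompiled
// output for indicators such as "key", "API", "URL", or "http".
object ApiConfig {
    const val API_KEY = "sk_live_51Hxxxxxxxxxxxx"      // invented placeholder key
    const val BASE_URL = "https://api.example.com/v1/" // backend endpoint, now exposed
    const val HMAC_SECRET = "0123456789abcdef"         // hardcoded secret: worst case
}
```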
Intellectual Property (IP) is another important example: that might be anything from image files, to prototyped new features, to proprietary algorithms.
Decompilation also enables an attacker to identify the application’s weak spots. Likely targets are how the app
- enforces user authentication
- uses platform APIs and IPC mechanisms
- manages sensitive data
- communicates with backend services
- makes use of cryptography
Let’s look at each of these in a bit more depth.
User authentication
User authentication is fundamental to how applications manage access to sensitive data and functionalities. By decompiling the application and studying its authentication logic, attackers may be able to find ways to exploit it.
That may mean gaining insights into how exactly authentication is divided between local authentication and authentication performed at the backend; whether and how any biometric authentication is implemented; and if cryptographic materials are handled effectively.
Attackers will also be able to identify which authentication APIs and libraries are used, and whether they contain any known vulnerabilities. Even if the APIs, libraries, and protocols such as OAuth 2.0 are themselves secure, the developer’s implementation of the authentication logic may be flawed.
If local data (such as tokens and session IDs) is used, how is it generated and stored? And for how long does it remain valid? The decompiled code offers answers to these questions, and attackers might use such information to target the app for malware-based attacks and credential harvesting.
And if the attacker identifies that credentials and tokens are not properly encrypted in transmission to servers, they may be able to target the app’s users by intercepting network communications.
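To make this concrete, here is a deliberately weak, invented sketch of purely local session validation. Once decompiled, the storage keys, the token lifetime logic, and the single branch an attacker needs to flip are all in plain view:

```kotlin
import android.content.Context

// Invented example of authentication handled entirely on the client.
// Decompilation reveals the SharedPreferences keys, the token format,
// and the one boolean branch a tampered build can force to true.
fun isSessionValid(context: Context): Boolean {
    val prefs = context.getSharedPreferences("auth", Context.MODE_PRIVATE)
    val token = prefs.getString("session_token", null) ?: return false
    val expiry = prefs.getLong("token_expiry", 0L)
    // Local-only check: nothing is confirmed with the backend.
    return token.isNotEmpty() && System.currentTimeMillis() < expiry
}
```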
Use of platform APIs and Inter-process communication (IPC) mechanisms
At runtime, applications request access to data and functionalities from other applications, from the system, and from the device itself. The operating system manages these requests, granting and denying access based on available APIs, the security policy, and the user’s own inputs.
Decompiling the application provides insights into what access the app requests, the permissions it asks for, and what access it provides to other applications. This information can be exploited if the attacker can identify entry points for accessing sensitive data and functionalities via the target application.
For Android, one particularly important source of information is available to attackers even without decompilation. The AndroidManifest.xml declares the app’s target API levels, data sharing and IPC mechanisms, plus permissions requested from the end user. This information becomes even more valuable to an attacker when combined with information gained through decompilation and reverse engineering.
WebViews are also worth mentioning here: whether the app manages them itself (rather than deferring to the default browser); whether JavaScript input is accepted; and, if so, whether it is cached.
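A minimal, invented WebView configuration shows how much an attacker can learn from a few lines of decompiled setup code (the bridge class and URL are hypothetical):

```kotlin
import android.webkit.JavascriptInterface
import android.webkit.WebView

// Hypothetical bridge: any method annotated @JavascriptInterface is
// callable from JavaScript running inside the WebView.
class AccountBridge {
    @JavascriptInterface
    fun accountId(): String = "user-account-id" // exposed to page scripts
}

fun configureWebView(webView: WebView) {
    webView.settings.javaScriptEnabled = true // the WebView accepts JavaScript
    webView.settings.allowFileAccess = true   // page content can read local files
    webView.addJavascriptInterface(AccountBridge(), "Account") // JS-to-native entry point
    webView.loadUrl("https://example.com/checkout") // target page, found by string search
}
```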
Sensitive data management
Sometimes it may be necessary for the app to store sensitive data locally on the user’s device, and locally stored sensitive data that can be harvested is a prime target for malware developers. Decompilation of the app may give them crucial information on how sensitive data is managed and whether the app’s users are worth targeting.
Equally, the app may manage sensitive data without storing it; sensitive data-in-use is exposed in memory, and understanding how the app generates and processes sensitive data during runtime can yield valuable insights for an adversary. For example, the app may expose sensitive data through use of public APIs, or it may not be set up to overwrite the sensitive data as soon as it has been used. An attacker can exploit this knowledge gained through decompilation and static analysis in order to target the app later through dynamic analysis or malware.
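One pattern an attacker checks for is whether sensitive values linger in memory after use. A minimal sketch of the overwrite-after-use idea mentioned above, using a char array (which can be zeroed) instead of an immutable String:

```kotlin
// Sketch: an immutable String cannot be cleared and may sit on the heap
// until garbage collection; a CharArray can be overwritten immediately.
fun handlePin(pin: CharArray) {
    try {
        // ... use the PIN, e.g. to derive a key or authorize an operation ...
    } finally {
        pin.fill('\u0000') // overwrite the sensitive data as soon as it has been used
    }
}
```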
Communication with backend services
Almost all apps communicate with the outside world via the network, and often apps managing the most sensitive data (relating to financial transactions, personally identifiable information, health records) are the most heavily dependent on interactions with backend services.
Finding all network requests in the decompiled source code is therefore a valuable step for a potential attacker. By searching for strings like ‘http’, ‘https’, ‘url’, and ‘network’, the reverse engineer can identify points to target in network-based attacks. This could be by targeting a particular domain to redirect traffic using a proxy server, or by identifying weakly encrypted or cleartext traffic.
Use of cryptography
Mobile apps use cryptography to secure user data and ensure confidentiality, especially during highly sensitive processes such as making and receiving contactless payments. Both Android and iOS offer cryptographic APIs for key generation and storage, and there are a number of third party libraries that are used as alternatives or complements to the platforms’ own.
By decompiling the application and analyzing its use of cryptography (e.g. which cryptographic APIs and specific algorithms are used, how they are used, and for what purposes), an adversary may be able to identify weaknesses: hard-coded secret keys; deprecated algorithms which can be cracked with modern computing power; ineffective management of encrypted data stored locally on the device.
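As an invented illustration of such weaknesses, the snippet below combines a hardcoded key with a deprecated algorithm and mode; every element is immediately visible in decompiled code:

```kotlin
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

// Invented example of the weaknesses listed above: the key is hardcoded
// (recoverable by string search) and DES in ECB mode is long deprecated.
fun weakEncrypt(plaintext: ByteArray): ByteArray {
    val key = SecretKeySpec("8bytekey".toByteArray(), "DES") // hardcoded secret key
    val cipher = Cipher.getInstance("DES/ECB/PKCS5Padding")  // weak algorithm and mode
    cipher.init(Cipher.ENCRYPT_MODE, key)
    return cipher.doFinal(plaintext)
}
```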
What can attackers achieve by modification?
Abuse of Restricted Functionalities
Attackers can abuse restricted functionalities, for instance by accessing ‘locked’ features and content on their own devices, cheating or spoofing functionalities, or bypassing local authentication checks.
Modifications are usually made with one of two main goals in mind:
- to exploit the application directly,
- to exploit the application’s users, for instance by repackaging the app and distributing it via app stores or via social engineering techniques like phishing. The repackaged app is designed to seem as similar as possible to the legitimate app, but with additional or modified logic to steal sensitive data, control devices remotely, or display advertisements for the attacker’s own benefit.
From the perspective of security, modifying an app is functionally equal to extracting the app’s code and resources for use in a different app.
Prevention and Mitigation
Decompilation and modification are unpreventable in themselves, and so the priority is instead to make them pointless to the adversary.
Mitigating the danger of decompilation is fundamentally about making the decompiled code difficult (and ideally impossible) to understand: if the output cannot be understood, there is no benefit to decompilation. This can be achieved through code and resource hardening, mainly by means of obfuscation and encryption.
And the same basic principle applies to modification: attackers tamper with apps in order to change their functionalities, to then exploit those changes during runtime. If the tampered app is never run, then the attacker cannot take advantage of their modifications. This is the fundamental rationale behind anti-tampering & integrity controls.
Note
Google and Apple are of course conscious of the dangers of decompilation and especially modification, and both platforms take some steps to prevent them. Android Studio offers automatic obfuscation through its R8 tool during the build process, and so the majority of Android apps are obfuscated to some extent. And iOS apps uploaded to and downloaded from the App Store have their binaries and some assets encrypted automatically using Apple's FairPlay technology, which restricts execution of the apps to authorized iOS devices. However, neither of the two big platforms offers comprehensive protection against reverse engineering through decompilation.
R8 obfuscation - like all obfuscation - makes the code more difficult to read and understand, but logic can still be reverse engineered from obfuscated code, and there are a number of tools to assist with the deobfuscation of Java and Kotlin code. In the case of iOS FairPlay encryption, the binaries are decrypted when downloaded and installed on an iOS device, so it is possible (with a jailbroken device) to export the unencrypted IPA to another machine and carry out exactly the type of disassembly, decompilation, and modification described on this page. To mitigate modification and tampering, on the other hand, both platforms also offer code signing functionalities; for more information, see below on this page.
Code and Resource Hardening
Obfuscation
- Obfuscation involves renaming identifiers, file names, class names, method names, symbols, and strings, as well as adding ‘junk’ code, without fundamentally changing the content or logic of the app.
- During runtime the operating system can execute an obfuscated app as normal, with potentially only a small effect on performance, because the fundamental logic is unchanged.
- Obfuscation makes the code more difficult to read and understand, but logic can still be reverse engineered from obfuscated code, and there are tools to assist with deobfuscation in any programming language. (A configuration sketch for enabling R8 obfuscation on Android follows below.)
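In an Android project, R8 obfuscation is typically enabled per build type in the module’s Gradle Kotlin DSL. A minimal sketch using the conventional defaults (the file names are the standard ones, not specific to any real project):

```kotlin
// build.gradle.kts (app module) - minimal sketch of enabling R8.
android {
    buildTypes {
        release {
            isMinifyEnabled = true   // turns on R8 shrinking and obfuscation
            isShrinkResources = true // also removes unused resources
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro" // project-specific keep rules
            )
        }
    }
}
```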
Encryption
- Encryption is more powerful than obfuscation because it fully transforms code and resources into meaningless ciphertext which cannot be read or understood by either a human or a machine. This ciphertext can only be restored to its original form through use of a decryption key. Encrypted code must therefore be decrypted before it can be executed by the operating system.
- This is possible through symmetric encryption, where the encryption key and the decryption key are identical. The most secure, effective approach is for the app’s code and resources to be encrypted with algorithmically generated encryption keys, and for the app to contain a corresponding algorithm to generate the matching decryption keys during runtime. In this scenario, the code and resources will remain encrypted at all times except when they need to be accessed by the OS during execution on a user’s device. (A minimal code sketch of this scheme follows below.)
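Here is that scheme in miniature, using the standard javax.crypto APIs with AES-GCM. The key parameter stands in for whatever key material the app’s embedded algorithm regenerates at runtime; the scheme, not the key management, is the point:

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

// Sketch of symmetric encryption: the same key material, regenerated by
// the app's algorithm at runtime, both encrypts and decrypts.
fun encrypt(key: SecretKeySpec, plain: ByteArray): Pair<ByteArray, ByteArray> {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv to cipher.doFinal(plain) // code/resources stay as ciphertext at rest
}

fun decrypt(key: SecretKeySpec, iv: ByteArray, sealed: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, iv))
    return cipher.doFinal(sealed) // decrypted only at the moment of use
}
```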
Virtualization
- Virtualization is a specialized mechanism to prevent decompilation and reverse-engineering of Java and Kotlin code, as used in native Android development.
- Java and Kotlin are compiled to bytecode which is then run on virtual machines (VMs). This makes it possible to ‘translate’ Java and Kotlin code (method calls, field types, and calls to object fields inside methods) into a special set of instructions that can only be processed by a unique VM, created specifically to process those instructions. Since these instructions are only comprehensible to this particular VM, and behave nothing like Java, Kotlin, or standard bytecode, they are particularly difficult for the reverse engineer to understand. (A toy illustration of the idea follows below.)
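A toy, entirely invented illustration: the ‘program’ below is just an opaque array of integers, and only this bespoke interpreter gives it meaning. A real virtualizer generates a unique instruction set for each build:

```kotlin
// Toy virtual machine with an invented instruction set. To a reverse
// engineer, secretProgram is meaningless without this exact interpreter.
fun runVm(program: IntArray): Int {
    val stack = ArrayDeque<Int>()
    var pc = 0
    while (pc < program.size) {
        when (program[pc]) {
            0x1A -> { stack.addLast(program[pc + 1]); pc += 2 } // PUSH literal
            0x2B -> { val b = stack.removeLast()                // ADD top two values
                      val a = stack.removeLast()
                      stack.addLast(a + b); pc++ }
            0x3C -> return stack.removeLast()                   // RETURN result
            else -> error("unknown opcode")
        }
    }
    return 0
}

// Computes 2 + 3, but nothing in the integer stream hints at that:
val secretProgram = intArrayOf(0x1A, 2, 0x1A, 3, 0x2B, 0x3C)
```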
Application Integrity and Anti-Tampering
To mitigate modification and tampering, both Android and iOS platforms offer code signing functionalities. The idea here is that developers sign their apps with a private key which only they can access. This private key is tied to a public key certificate which identifies the owner of the private key, i.e. the developer, and also contains cryptographic hashes - uniquely derived representations - of the contents of the app and its code. Whenever the app is installed on an Android or iOS device, the operating system calculates the cryptographic hashes of the app’s contents and checks that they match the hashes in the public certificate. If the hashes don’t match, it’s clear that the app has been modified, and the system will not allow the app to run.
This is a good solution, but it has one notable flaw: it remains possible for an adversary to modify the application and then simply re-sign the app with their own private key, creating a new public key certificate with cryptographic hashes matching the newly modified app. This means that responsibility is left to the end user to download and use apps only from trusted sources and legitimate developers.
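One common hardening response, sketched below with real Android APIs (API level 28+) but an invented pinned hash, is for the app to compare its own signing certificate at runtime against a value embedded at build time, so that a re-signed clone can detect the mismatch:

```kotlin
import android.content.Context
import android.content.pm.PackageManager
import java.security.MessageDigest

// Sketch of a self-check against re-signing. EXPECTED_CERT_SHA256 is a
// hypothetical placeholder for the hash of the developer's real certificate.
const val EXPECTED_CERT_SHA256 = "3fa9...placeholder"

fun isSignatureGenuine(context: Context): Boolean {
    val info = context.packageManager.getPackageInfo(
        context.packageName, PackageManager.GET_SIGNING_CERTIFICATES
    )
    val signers = info.signingInfo?.apkContentsSigners ?: return false
    val digest = MessageDigest.getInstance("SHA-256")
    return signers.any { signer ->
        digest.digest(signer.toByteArray())
            .joinToString("") { "%02x".format(it) } == EXPECTED_CERT_SHA256
    }
}
```

Of course, this check is itself just code inside the app: an attacker who finds it can patch it out, which is exactly the general problem discussed further below.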
To reinforce this mechanism, therefore, Google in addition offers the Play Integrity API, which developers can optionally integrate into their Android apps. With Play Integrity, at sensitive points during runtime (e.g. when the user performs a transaction), the app can request verifications from Google Play servers: as well as checking whether the app is running on a genuine Android device, and was installed legitimately via Google Play, Play Integrity also checks whether the app has been modified and/or re-signed by comparing the certificate of the app on the device with the certificate it has on record. If there’s any discrepancy, the Google Play server indicates to the app that the app’s integrity has been compromised, and the app (or rather its associated server) can block the transaction.
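On the client side, a classic Play Integrity request looks roughly like the following minimal sketch (nonce and error handling are simplified; sendTokenToBackend is a hypothetical helper, since the real verdict check happens on the server):

```kotlin
import android.content.Context
import com.google.android.play.core.integrity.IntegrityManagerFactory
import com.google.android.play.core.integrity.IntegrityTokenRequest

// Minimal sketch of requesting an integrity token at a sensitive moment.
// The nonce should come from your server; the opaque token goes back to
// the server, which obtains and inspects the verdict via Google Play.
fun requestIntegrityVerdict(context: Context, serverNonce: String) {
    val integrityManager = IntegrityManagerFactory.create(context)
    integrityManager
        .requestIntegrityToken(
            IntegrityTokenRequest.builder().setNonce(serverNonce).build()
        )
        .addOnSuccessListener { response -> sendTokenToBackend(response.token()) }
        .addOnFailureListener { /* treat the device/app as untrusted */ }
}

fun sendTokenToBackend(token: String) { /* hypothetical helper */ }
```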
This successfully prevents wholesale repackaging and re-signing of an application, but it doesn’t guarantee the integrity of the app’s code in itself. And it remains possible to extract code for repurposed use in another app.
The Play Integrity API library can also be removed from a modified version of the app. If the backend authentication logic is secure, this will mean the modified app cannot be used to perform genuine transactions. But it can still be repackaged and redistributed to unsuspecting users who believe they are using the legitimate, unmodified version.
This exposes a general problem that anti-tampering and integrity controls must ideally solve: if some component within your app is fundamental to checking the integrity of the app, what’s to stop an attacker from simply removing or overriding that component? As we’ve seen, a digital signature is intended to confirm the authenticity and integrity of the app, but an attacker can simply modify the app and re-sign it with their own key.
Encryption can help us here. As mentioned previously:
the most secure, effective approach is for the app’s code and resources to be encrypted with algorithmically generated encryption keys, and for the app to contain a corresponding algorithm to generate the matching decryption keys during runtime. In this scenario, the code and resources will remain encrypted at all times except when they need to be accessed by the OS during execution on a user’s device.
If those keys are algorithmically generated using the app’s file contents as inputs, any encrypted file becomes (1) impossible for an attacker to understand, and (2) impossible for an attacker to modify. If they modify any encrypted file, the inputs to the algorithms will be different, and the decryption keys will not match the encryption keys, making decryption impossible. The modified file will be unusable.
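A simplified sketch of that idea: if the key for one encrypted resource is derived from a hash of other packaged files, then decryption, which recomputes the key from what is actually on disk, can only succeed when nothing has been touched. (Illustrative only; real implementations are considerably more involved.)

```kotlin
import java.security.MessageDigest
import javax.crypto.Cipher
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

// Illustrative sketch: the decryption key is derived from the bytes of
// the app's own files. Modify any of those files and the recomputed key
// no longer matches the build-time key, so AES-GCM decryption fails.
fun deriveKey(vararg protectedFiles: ByteArray): SecretKeySpec {
    val digest = MessageDigest.getInstance("SHA-256")
    protectedFiles.forEach(digest::update)
    return SecretKeySpec(digest.digest(), "AES") // 32-byte AES-256 key
}

fun decryptResource(iv: ByteArray, sealed: ByteArray, vararg inputs: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, deriveKey(*inputs), GCMParameterSpec(128, iv))
    return cipher.doFinal(sealed) // throws AEADBadTagException if tampered with
}
```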