DJI – The ART of obfuscation

Study of an Android runtime (ART) hijacking mechanism for bytecode
injection through a step-by-step analysis of the packer used to protect the
DJI Pilot Android application.

Introduction

In the world of Android applications, it’s not uncommon to come across
applications protected by a packer. The role of a packer is to protect all
or part of the application code from static analysis. There are many reasons
why a developer might want to protect an application:

Protect valuable business logic;
Protect application monetization logic (e.g. a license management mechanism);
Evade conventional analysis tools to hide malicious logic;
…

Here, we take a look at the DJI Pilot
application, not to understand why developers want to protect their code – this
has already been the subject of previous work (see in particular this
DJI Pilot analysis)
– but to highlight a runtime mechanism implemented by DJI to protect its
application code. This protection is based on the use of a modified version of
the SecNeo packer.

The article details the various stages in the analysis to understand how the
application code is obfuscated. A Python proof-of-concept named
DxFx for statically unpacking
the DJI Pilot application is provided as practical support for this article.
DxFx does not claim to be a SecNeo unpacker. Its sole aim is to improve the
reader’s understanding of the various mechanisms implemented by the packer
through Python code. It will not be maintained in the future.

Targeted application

The analysis is performed on the latest version of the DJI Pilot application:

Version: 2.5.1.17
SHA256: 642aa123437c259eea5895fe01dc4210c4a3a430842b79612074d88745f54714
Download link

DxFx provided in support of the article has also been tested on the following
versions of the DJI Pilot application:

Version: 2.5.1.15
SHA256: d6f96f049bc92b01c4782e27ed94a55ab232717c7defc4c14c1059e4fa5254c8

and

Version: 2.5.1.10
SHA256: 860d9d75dc2b2e9426f811589b624b96000fea07cc981b15005686d3c55251d9

Bytecode, where are you?

Primary analysis

Static analysis of the APK initially reveals that the result of bytecode
decompilation is, to say the least, uncluttered…

This is because, like other packers, SecNeo leaves only a bootstrap code in
the bytecode to launch the application’s unpacking phase. Here, the packer
bootstrap code loads the native libDexHelper.so library:

The first step in the analysis is therefore to find the bytecode containing the
application’s business logic.

The packer logic is present in the native library libDexHelper.so. However,
the code of this library is itself packed. So, we have to unpack… the packer
to analyze its logic.

As the aim of this article is not to understand how the packer itself is
protected, this part is not dealt with in-depth, and we simply dump the
library at runtime from the DJI Pilot application process memory space. There
are a multitude of ways to do this, using tools such as gdb or Frida.

However, you may be in for a few surprises:

Cannot attach to process 25562: Operation not permitted (1), process 25562 is already traced by process 25598

or:

Failed to attach: process not found

The packer contains some countermeasures, as partially described in this
issue,
to prevent the use of dynamic tools. Fortunately, these can be easily bypassed.

Once libDexHelper.so has been dumped from memory, it can be analyzed with a
disassembly tool.

First look at the packer binary

An initial brief analysis of the libDexHelper.so library reveals the presence
of the decrypt_jar_128K symbol. Hooking the associated function with Frida
shows that the buffer passed as its second argument contains a DEX file once
the function returns:

'use strict';

const dlopen_ext = Module.getExportByName(null, '__loader_android_dlopen_ext');

function main() {
  const decrypt_jar_128K_addr = Module.getExportByName(
    'libDexHelper.so', 'decrypt_jar_128K'
  );

  /**
  * decrypt_jar_128K function hook
  */
  Interceptor.attach(decrypt_jar_128K_addr, {
    onEnter: function(args) {
      this.dex_buffer_ptr = args[1];
    },
    onLeave: function() {
      console.log(`\nReading dex buffer @ ${this.dex_buffer_ptr}`);
      console.log(this.dex_buffer_ptr.readByteArray(16));
    }
  });
}

/**
 * Bootstrap
 */
const boot_intercept = Interceptor.attach(dlopen_ext, {
  onEnter: function(args) {
    this.name = args[0].readUtf8String();
  },
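  onLeave: function() {
    // The library name may be null; only proceed once libDexHelper.so is loaded.
    if (this.name !== null && this.name.includes('libDexHelper.so')) {
      main();
      boot_intercept.detach();
    }
  }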
});

The result of the script is:

Reading dex buffer @ 0x74d1e63140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 4a 8b b5 fd 1b 58 54 1f  dex.035.J....XT.

Reading dex buffer @ 0x74d268c140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 6f 02 2a 0b 48 26 a5 e0  dex.035.o.*.H&..

Reading dex buffer @ 0x74d3005140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 8a b4 08 1c 90 61 5a 34  dex.035......aZ4

Reading dex buffer @ 0x74d3643140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 cb b9 8e 72 35 3a d8 bc  dex.035....r5:..

Reading dex buffer @ 0x74d4055140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 c2 8b a3 7b 64 3b c6 54  dex.035....{d;.T

Reading dex buffer @ 0x74d4a5f140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 dd 47 c2 4e a1 39 cc 79  dex.035..G.N.9.y

Reading dex buffer @ 0x74d552f140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 58 17 ae a9 56 21 f1 1f  dex.035.X...V!..

Reading dex buffer @ 0x74d5a77140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 84 62 14 0d ac 5f b7 f8  dex.035..b..._..

So, here we can see that 8 DEX files (with the dex.035 magic) are
unpacked. It is possible to modify the previous hook to be able to dump the
various DEX files as they are unpacked. Another solution is to understand where
the packed DEX files are stored in the APK and how we can unpack them
statically.

Static unpacking of DEX files

The advantage of the dynamic extraction method is that it is quick to
implement. However, it requires running the application in an environment
that allows the process to be instrumented. Static extraction, on the other
hand, enables cold unpacking of DEX files directly from the APK. The drawback
of the static approach is that it requires a slightly deeper understanding of
how the packer works.

DEX files where are you?

Some versions of the SecNeo packer store the bytecode in the classes0.jar
file located in the APK assets. Unfortunately, this is not the case here as
the file does not exist.

However, if we take a closer look at the classes.dex file located at the root
of the APK and supposed to contain only the packer bootstrap code, we can see
that something is wrong with its size:

du -h classes.dex
63M     classes.dex

63MB is a very large size for the code we observed in the first analysis.
Usually, the multidex mechanism
will split the bytecode file into several .dex files well before reaching
this size. File entropy analysis also gives us some interesting clues:

We can see 8 peaks where the entropy approaches its maximum of 8 bits per
byte, which suggests that these chunks are encrypted. The previous Frida hook
revealed that 8 DEX files were unpacked, which is probably no coincidence. The
8 chunks shown in the graph are each 128KB long, which ties in with the name of
the decrypt_jar_128K function. A differential analysis against the dynamically
obtained files finally confirms that the classes.dex file contains all 8 DEX
files, appended after the SecNeo bootstrap code. The first 128KB of each DEX
file is encrypted, probably to conceal information that could reveal the
presence of the hidden files, such as the magic number in the header.
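The entropy measurement described above can be reproduced with a few lines of Python. This is a minimal sketch, not the tool used for the graph: the function names and the 7.9 threshold are our own assumptions, chosen only to flag chunks that look encrypted.

```python
import math
from collections import Counter

CHUNK_SIZE = 128 * 1024  # 128KB, the chunk size discussed above


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chunk_entropies(blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (offset, entropy) for each consecutive chunk of blob."""
    for offset in range(0, len(blob), chunk_size):
        yield offset, shannon_entropy(blob[offset:offset + chunk_size])


def high_entropy_offsets(blob: bytes, threshold: float = 7.9,
                         chunk_size: int = CHUNK_SIZE) -> list:
    """Return the offsets of chunks whose entropy reaches the threshold.

    The threshold is an arbitrary value picked for illustration.
    """
    return [off for off, h in chunk_entropies(blob, chunk_size)
            if h >= threshold]
```

Calling high_entropy_offsets on the contents of classes.dex should list candidate offsets for the encrypted chunks. Note that compressed data also scores high, so the result is only a hint.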

Encryption analysis

To understand how the first 128KB of each DEX is decrypted, we need to analyze
how the decrypt_jar_128K function works.

One of the function’s basic blocks contains the encryption logic:

loc_8DC78
ADD             W3, W3, #1      ; i++
LDRB            W6, [X5],#1     ; x = buffer[cursor++]
AND             W7, W3, #0xFF   ; i %= 256
SUB             W0, W5, W1
MOV             X3, X7
CMP             X2, X0
LDRB            W0, [X8,X7]     ; +--
ADD             W4, W4, W0      ; | j = (j + S[i]) % 256
AND             W9, W4, #0xFF   ; +--
MOV             X4, X9
LDRB            W10, [X8,X9]    ; +--
STRB            W10, [X8,X7]    ; |
STRB            W0, [X8,X9]     ; | S[i], S[j] = S[j], S[i]
LDRB            W7, [X8,X7]     ; +--
ADD             W0, W7, W0      ; +--
UXTB            W0, W0          ; |
LDRB            W0, [X8,X0]     ; | x = S[(S[i] + S[j]) % 256] ^ x
EOR             W0, W0, W6      ; +--
STURB           W0, [X5,#-1]    ; buffer[cursor-1] = x
B.HI            loc_8DC78

This is RC4’s pseudo-random generation
algorithm (PRGA):

i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    t := (S[i] + S[j]) mod 256
    K := S[t]
    output K
endwhile
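For reference, here is a minimal Python sketch of RC4 applied to the first 128KB of a buffer. It assumes the packer uses the standard key-scheduling algorithm (KSA) together with the PRGA above. The helper names are ours, not those of DxFx.

```python
def rc4_ksa(key: bytes) -> list:
    """Standard RC4 key scheduling: derive the permutation S from the key."""
    s = list(range(256))  # S starts as the identity permutation
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray(len(data))
    for n, byte in enumerate(data):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out[n] = byte ^ s[(s[i] + s[j]) % 256]
    return bytes(out)


def decrypt_first_chunk(blob: bytes, key: bytes,
                        chunk_size: int = 128 * 1024) -> bytes:
    """Decrypt only the first chunk_size bytes and keep the rest untouched."""
    return rc4_crypt(key, blob[:chunk_size]) + blob[chunk_size:]
```

Since RC4 is a stream cipher, the same function is used to encrypt and decrypt. Only the key and the boundaries of the encrypted region need to be recovered from the packer.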

Analysis of the decrypt_jar_128K CFG gives us information about where different
parts of the RC4 algorithm are located:

Encryption key generation

The key’s cross-references lead to a generation function based on a simple XOR
between a 16-byte hardcoded constant and the 16 first bytes of the string
com.dji.industry.pilot:
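The derivation described above can be sketched in Python as follows. The 16-byte hardcoded constant is deliberately not reproduced here: the constant parameter is a placeholder that must be filled with the value read from libDexHelper.so, and the function name is our own.

```python
PACKAGE_NAME = b"com.dji.industry.pilot"


def derive_dex_key(constant: bytes,
                   package_name: bytes = PACKAGE_NAME) -> bytes:
    """XOR a 16-byte constant with the first 16 bytes of the package name.

    `constant` stands for the hardcoded value found in libDexHelper.so;
    it has to be extracted from the binary by the reader.
    """
    if len(constant) != 16:
        raise ValueError("the constant must be exactly 16 bytes long")
    if len(package_name) < 16:
        raise ValueError("the package name must be at least 16 bytes long")
    return bytes(c ^ p for c, p in zip(constant, package_name[:16]))
```

With an all-zero placeholder constant, the function simply returns b"com.dji.industry", the first 16 bytes of the package name.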

We are now able to statically unpack DEX files.

The DEX decryption is implemented in the DexPool class of DxFx.

However, disassembly of the unpacked DEX files reveals a problem. The code for
a large number of methods seems to have been stolen, overwritten, and replaced
mainly by nop instructions:

We can therefore assume that the packer has a second bytecode protection
mechanism.

Bytecode where are you? Again…

Method debug info

The various methods whose code is stolen all seem to contain a
debug info offset
(debug_info_off) which also appears in the body of the method:

There seems to be something fishy about debug_info_off: this field could play
a role in the method code unpacking mechanism, perhaps as an identifier.
Moreover, a classes.dgc file located in the APK assets contains a large
number of the debug info offsets used in stolen methods… The classes.dgc file
therefore seems a potentially interesting candidate for further analysis.
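To inspect this field ourselves, we can read the fixed 16-byte header of a code_item as defined by the DEX format. The sketch below is a simplified reader: the helper names and the nop-ratio heuristic (with its arbitrary 0.5 threshold) are our own assumptions for spotting stolen methods.

```python
import struct
from typing import NamedTuple

# registers_size, ins_size, outs_size, tries_size (u16 each),
# then debug_info_off and insns_size (u32 each), all little-endian.
CODE_ITEM_HEADER = struct.Struct("<HHHHII")


class CodeItemHeader(NamedTuple):
    registers_size: int
    ins_size: int
    outs_size: int
    tries_size: int
    debug_info_off: int
    insns_size: int  # expressed in 16-bit code units


def read_code_item(dex: bytes, offset: int):
    """Return (header, insns) for the code_item located at offset."""
    header = CodeItemHeader(*CODE_ITEM_HEADER.unpack_from(dex, offset))
    start = offset + CODE_ITEM_HEADER.size
    insns = dex[start:start + 2 * header.insns_size]
    return header, insns


def looks_stolen(insns: bytes, threshold: float = 0.5) -> bool:
    """Heuristic: flag a body made mostly of nop (0x0000) code units."""
    units = [insns[i:i + 2] for i in range(0, len(insns) - 1, 2)]
    if not units:
        return False
    nops = sum(1 for unit in units if unit == b"\x00\x00")
    return nops / len(units) >= threshold
```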

The classes.dgc file

An entropy analysis reveals that the beginning of the file (oddly enough, a
128KB chunk) probably contains encrypted data:

This is a good lead to follow in the libDexHelper.so binary.

Encryption analysis

Likely, a mechanism similar to the 128KB chunk encryption of DEX
files is used for the classes.dgc file. Analysis of libDexHelper.so reveals
a function whose scheme also corresponds to an RC4 encryption algorithm:

We can confirm that this is the classes.dgc decryption function by using a
simple Frida hook:

'use strict';

const dlopen_ext = Module.getExportByName(null, '__loader_android_dlopen_ext');

function main() {
  // Obfuscated export name of the RC4 routine in libDexHelper.so
  const rc4_fct_addr = Module.getExportByName(
    'libDexHelper.so',
    'p416302DA23BEF5D5A81473ACFAC4DA25'
  );

  Interceptor.attach(rc4_fct_addr, {
    onEnter: function(args) {
      // Dump the first 32 bytes of the buffer passed as first argument
      console.log(args[0].readByteArray(32));
    }
  });
}

Interceptor.attach(dlopen_ext, {
  onEnter: function(args) {
    this.name = args[0].readUtf8String();
  },
  onLeave: function(retval) {
    // Only install the hook once the library has actually been loaded
    if (!retval.isNull() && this.name !== null &&
        this.name.includes('libDexHelper.so')) {
      main();
    }
  }
});

The result is:

           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  ef bd de 50 8b bb 81 c7 80 63 35 ca 95 6e 1d 1d  ...P.....c5..n..
00000010  36 d5 ef 02 df 2a 50 2b e8 88 03 c3 9b 45 da 5f  6....*P+.....E._

It matches the first bytes of the classes.dgc file:

As with the decrypt_jar_128K function, the basic block that initializes S to
the identity permutation contains a cross-reference to the key.

Encryption key generation

From the cross-references, it is possible to locate the key generation function.
The CFG of the function looks a bit like the one used to generate the DEX
decryption key. However, a slightly more complex mechanism is used to generate
the key:

First, the MD5 hash of a 4096-byte binary
blob in memory is computed. MD5 is identified by looking at a sub-function called
in the previous CFG. This sub-function corresponds to the
MD5 algorithm for calculating a
block (512 bits). The algorithm is flattened and contains hardcoded K
constants (0xe8c7b756, 0xd76aa478, …).

The binary blob is loaded directly from libDexHelper.so and can be found even
in the packed version of the library. This chunk appears to be preceded by a
kind of header containing the name mthfilekey:

Once the MD5 has been calculated, a deterministic sequence is generated by
calling another sub-function. Analysis of the function reveals that it is a
Fibonacci sequence:

Next, the 16 bytes of the MD5 hash are XORed with 16 bytes retrieved directly
from the 4096-byte chunk (mthfilekey) following a deterministic walk based
on the Fibonacci sequence previously generated.

We are now able to statically generate the RC4 key that decrypts the first
128KB of the classes.dgc file.
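The three steps above (MD5, Fibonacci sequence, XOR) can be sketched in Python. Be careful: the exact walk through the 4096-byte chunk is not spelled out in this article, so the sketch assumes that the i-th key byte is taken at offset fib(i) modulo the chunk size. This is an illustrative guess only; the actual logic lives in BinHelper.code_pool_key.

```python
import hashlib


def fibonacci(count: int) -> list:
    """Return the first `count` Fibonacci numbers, starting at 1, 1."""
    seq = []
    a, b = 1, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq


def derive_code_pool_key(blob: bytes) -> bytes:
    """Derive a 16-byte key from the mthfilekey chunk.

    ASSUMPTION: the byte XORed with the i-th MD5 byte is read at offset
    fib(i) % len(blob). The real walk used by the packer may differ.
    """
    if len(blob) != 4096:
        raise ValueError("expected the 4096-byte mthfilekey chunk")
    digest = hashlib.md5(blob).digest()
    offsets = [f % len(blob) for f in fibonacci(len(digest))]
    return bytes(d ^ blob[o] for d, o in zip(digest, offsets))
```

The resulting 16 bytes would then be used as the RC4 key for the first 128KB of classes.dgc.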

The classes.dgc decryption is implemented in the CodePool._decrypt_chunk method of DxFx.

The RC4 key generation is done by the BinHelper.code_pool_key method of DxFx.

classes.dgc file format

Once decrypted, looking at classes.dgc reveals that the beginning of the
file contains a table indexing all the application methods
(code_item)
whose code has been stolen:

Each table item points to the code_item of a method:

However, as it stands, the Dalvik opcodes present in the method bodies seem
inconsistent and are therefore probably obfuscated… We will address this
later. At this stage, we have all the elements needed to link the stolen
bytecode, even if still obfuscated, to the application’s various damaged
methods. First of all, it’s interesting to understand when the packer repairs
the methods so that the application can run normally. This mechanism is
particularly interesting because it relies on a feature of ART.

ART hijacking

ART in a nutshell

The Android Runtime (ART) is
Dalvik’s successor runtime in charge of optimizing and executing code for
Android applications and other Android system components. The Android Runtime — How Dalvik and ART work?
article by Paulina Sadowska is a great introduction to ART.

Class loading mechanism

When a method is to be executed, the runtime must first check that the class to
which the method belongs is loaded. If this is not the case, the runtime will
load and link the class. The linking process involves several phases as
described in the Java Language Specification:

Class verification;
Class preparation;
Resolution.

The stage we’re interested in here is the class verification because it’s
precisely this stage that is instrumented by the packer. Among other things,
this step checks the bytecode of the class’s various methods for
inconsistencies. It is implemented in the ClassLinker::VerifyClass method of
ART.

One of the interesting features of VerifyClass is that it calls the
UpdateClassAfterVerification method:

static void UpdateClassAfterVerification(Handle<mirror::Class> klass,
                                         PointerSize pointer_size,
                                         verifier::FailureKind failure_kind)
    REQUIRES_SHARED(Locks::mutator_lock_) {

  // [...]

  // Now that the class has passed verification, try to set nterp entrypoints
  // to methods that currently use the switch interpreter.
  if (interpreter::CanRuntimeUseNterp()) {
    for (ArtMethod& m : klass->GetMethods(pointer_size)) {
      if (class_linker->IsQuickToInterpreterBridge(m.GetEntryPointFromQuickCompiledCode())) {
        runtime->GetInstrumentation()->InitializeMethodsCode(&m, /*aot_code=*/nullptr);
      }
    }
  }
}

UpdateClassAfterVerification updates the entry points of the various methods
of the verified class. So, it has to iterate over all
the methods of the class and call the Instrumentation::InitializeMethodsCode
method:

Anatomy of the hook

The Instrumentation::InitializeMethodsCode method provides a crossing point
on every method in the application that can be executed. It is precisely this
crossing point that is exploited by the packer to repair methods whose code has
been stolen. To do this, libDexHelper.so places a hook on
InitializeMethodsCode:

The prologue of the Instrumentation::InitializeMethodsCode method is patched to
redirect the execution flow to a function in libDexHelper.so that we call
PatchMethodCode:

A few moments later… we
can deduce the hook’s anatomy and the different operations performed by
PatchMethodCode:

Once the PatchMethodCode function is called, it first loads the
obfuscated bytecode of the current method, using debug_info_off as an
identifier to look it up in the method index table of the classes.dgc file.
The code is passed to the function we call here DecryptMethodCode to be
de-obfuscated. Then the code_item (dex::CodeItem) of the method
(art::ArtMethod) is patched to point to the buffer containing the
de-obfuscated bytecode.
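A static unpacker can mimic this repair directly on the DEX bytes. The sketch below is a simplified, hypothetical equivalent and not the DxFx code: it assumes the classes.dgc table has already been parsed into a dictionary (here named pool) that maps each debug_info_off to the de-obfuscated instructions. It also assumes that the stolen body has the same size as the nop-filled one.

```python
import struct

CODE_ITEM_HEADER_SIZE = 16  # fixed part of a DEX code_item


def patch_code_item(dex: bytearray, code_item_off: int, pool: dict) -> bool:
    """Restore the body of one method in place.

    `pool` is a hypothetical mapping {debug_info_off: insns_bytes} built
    from classes.dgc. Returns True if the method was patched.
    """
    # debug_info_off and insns_size follow the four u16 header fields.
    debug_info_off, insns_size = struct.unpack_from(
        "<II", dex, code_item_off + 8
    )
    insns = pool.get(debug_info_off)
    if insns is None:
        return False  # this method was not stolen
    if len(insns) != 2 * insns_size:
        raise ValueError("recovered body does not fit the original one")
    start = code_item_off + CODE_ITEM_HEADER_SIZE
    dex[start:start + len(insns)] = insns
    return True
```

Unlike the runtime hook, which redirects the code_item pointer to a new buffer, this version overwrites the instructions in the file itself.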

This mechanism ensures that the damaged code in each method is repaired before
the method is executed. At this point, the last thing we need to understand is
how bytecode is obfuscated in classes.dgc. To do this, we need to analyze the
DecryptMethodCode function.

Bytecode de-obfuscation

The function is rather small, and an analysis of a few basic blocks gives a
good idea of how it works:

The function iterates over each opcode. The obfuscated opcodes are XORed with
the low byte of the method’s debug_info_off offset. The result of this
operation is then used as the index of a substitution table. The obfuscated
opcode is replaced by the one obtained from the substitution table:

opcode = S[obfuscated_opcode ^ (debug_info_off & 0xff)]
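In Python, this substitution could look like the sketch below. The table sbox must be extracted from libDexHelper.so and is not reproduced here. The width_of callback is a hypothetical helper returning the size of an instruction, in 16-bit code units, from its real opcode. Payload pseudo-instructions such as switch tables are ignored for simplicity.

```python
def deobfuscate_opcode(obfuscated: int, debug_info_off: int,
                       sbox: bytes) -> int:
    """Apply opcode = S[obfuscated_opcode ^ (debug_info_off & 0xff)]."""
    return sbox[(obfuscated ^ (debug_info_off & 0xFF)) & 0xFF]


def deobfuscate_insns(insns: bytes, debug_info_off: int, sbox: bytes,
                      width_of) -> bytes:
    """Restore the opcode byte of every instruction in a method body.

    Only the opcode (low byte of the first code unit) is rewritten;
    operands are left as they are. `width_of(opcode)` is a caller-supplied
    function giving the instruction width in 16-bit code units.
    """
    out = bytearray(insns)
    pos = 0
    while pos < len(out):
        opcode = deobfuscate_opcode(out[pos], debug_info_off, sbox)
        out[pos] = opcode
        # Advance by at least one code unit to guarantee termination.
        pos += 2 * max(1, width_of(opcode))
    return bytes(out)
```

The instruction width has to be derived from the real opcode, which is why each opcode is decoded before the cursor moves to the next instruction.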

Since the substitution table is theoretically a maximum of 256 bytes, one might
assume that one of the RC4 KSA previously reversed is reused to generate it,
but… no.

The S substitution table is simply stored in the libDexHelper.so library
and can be directly extracted from the packed binary. We have everything we
need to fix all the damaged methods and the unpacked DEX can be decompiled
properly:

We are now able to perform static unpacking of the application.

The method fixing step is implemented in the Dex class of DxFx.

The bytecode de-obfuscation is located in the MethodCipher class of DxFx.

Conclusion

By walking through the analysis used to build a static unpacker, we have seen
the different encryption and obfuscation algorithms used by the packer at each
stage. We were also able to highlight an interesting protection mechanism that
injects bytecode by hijacking the Android runtime.

