GitHub – LukasOgunfeitimi/TikTok-ReverseEngineering

by oqtey

This project is for reverse engineering the TikTok Virtual Machine (VM).

TikTok uses a custom virtual machine (VM) as part of its obfuscation and security layers. This project includes tools to:

  • Deobfuscate webmssdk.js, the file that contains the virtual machine.
  • Decompile TikTok’s virtual machine instructions into readable form.
  • Script Inject: replace webmssdk.js with the deobfuscated VM injector.
  • Sign URLs: generate signed URLs that can be used to perform auth-based requests, e.g. posting comments.

When looking at webmssdk.js you’re met with a
heavily obfuscated file. The main method of obfuscating JavaScript
is to take advantage of bracket notation, which lets you index a variable
using another variable.

So when you see something like this:

// Line 3391 of ./deobfVersions/raw.js
r[Gb[301]](Gb[57], e))

You have absolutely no idea what it’s indexing.

Every such lookup goes through an array Gb, defined as:

    var Gb = ["ydTGHdFNV", "sNxpGNHMrpLV", "xyrNMLEN Fpp rpMu", "ydWyNe", ...].map(function(a) {
        return a.split("").map(function(c) {
            return "LsfVNxutyOcrEMpYAGdFHneaUKRXSgoJDbhqICzPZklivTmWBwQj".indexOf(c) == -1 ? c : "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"["LsfVNxutyOcrEMpYAGdFHneaUKRXSgoJDbhqICzPZklivTmWBwQj".indexOf(c)]
        }).join("")
    });

As you can see, this array isn’t readable either, because every string is encoded
using the substitution alphabet "LsfVNxutyOcrEMpYAGdFHneaUKRXSgoJDbhqICzPZklivTmWBwQj".

Because this code gets executed immediately, we can simply take this snippet,
run it in any console and retrieve:

[
    "isTrusted",
    "beforeunload",
    "filename too long",
    "isView",
    ...
]

Now that the strings are readable, we can use RegEx to go through
the script and replace every use of the array with the string it refers to, as seen here.
The same pass also converts the bracket notation back to readable dot notation.
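A minimal sketch of what such a RegEx pass could look like. This is not the project's actual script: the function name inlineStringTable and the sample input are made up for illustration, and the table only holds the four decoded strings shown above.

```javascript
// Decoded string table (only the four entries shown above; the real table is much longer).
const Gb = ["isTrusted", "beforeunload", "filename too long", "isView"];

// Hypothetical helper: inlines `Gb[n]` lookups, then simplifies bracket notation.
function inlineStringTable(source, table, name = "Gb") {
  // Step 1: replace every `Gb[<number>]` with the quoted string it refers to.
  const lookup = new RegExp(`\\b${name}\\[(\\d+)\\]`, "g");
  const inlined = source.replace(lookup, (match, index) => {
    const value = table[Number(index)];
    // Leave out-of-range indices untouched rather than guessing.
    return value === undefined ? match : JSON.stringify(value);
  });

  // Step 2: turn `obj["prop"]` into `obj.prop`, but only when the property is a
  // plain identifier and the bracket follows an expression (identifier, `)` or `]`).
  return inlined.replace(/(?<=[\w$)\]])\["([A-Za-z_$][\w$]*)"\]/g, ".$1");
}

const sample = "e[Gb[0]] && r[Gb[3]](Gb[1], Gb[2])";
console.log(inlineStringTable(sample, Gb));
// e.isTrusted && r.isView("beforeunload", "filename too long")
```

A RegEx like this is only a heuristic. It can misfire on text inside string literals or after keywords such as return, so the output should be checked by hand.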

After that we’re left with webmssdk1.

The example from above now looks like this

r.addEventListener("abort", e),

Much better.

Another significant obfuscation method used is for disguising function calls.

Each function is defined in an array Ab.

  var Ab = [function(e) {
      return "[object Array]" === Object.prototype.toString.call(e)
  }
  , function(e) {
      return e && e.__esModule && Object.prototype.hasOwnProperty.call(e, "default") ? e.default : e
  }
  , function() {
      var Ga;
      Ga = [0, 1],
      (je = !Ga[0],
      le && (setTimeout(function() {
          document.dispatchEvent(new Event(pe))
      }, Ga[1]),
      document.removeEventListener("DOMContentLoaded", Ab[40]),
      document.removeEventListener("readystatechange", Ab[75])))
  }
  ...]

Each function is then invoked as Ab[index](args).

In common IDEs, clicking on such a call just brings us to
the start of the array, making it difficult to keep track of which call
invokes which function.

We can make this readable by:

  • Taking the array
  • Replacing each function element with its own standard function declaration, named function Abindex(args)
  • Replacing each call to Ab[index](args) with Abindex(args)

We can do this by using the AST form of the script via Babel, as seen here

Which gives us this
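The call-site half of that transformation can be sketched with a plain RegEx. This is a simplification for illustration only: the project itself works on the AST, and the helper name renameTableReferences is made up. Turning the array elements into real function declarations needs a parser and is not shown.

```javascript
// Hypothetical helper: rewrites `Ab[40](x)` to `Ab40(x)` and bare references
// such as `Ab[75]` to `Ab75`. It does not touch the array definition itself.
function renameTableReferences(source, tableName = "Ab") {
  const reference = new RegExp(`\\b${tableName}\\[(\\d+)\\]`, "g");
  return source.replace(reference, (_, index) => `${tableName}${index}`);
}

const before = 'document.removeEventListener("DOMContentLoaded", Ab[40]); Ab[0](list);';
console.log(renameTableReferences(before));
// document.removeEventListener("DOMContentLoaded", Ab40); Ab0(list);
```

Unlike an AST transform, this would also rewrite matching text inside string literals or comments, which is one reason to prefer the AST route for the real file.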

The part of the script that executes the bytecode
is a nested if-else statement, as seen here

It is actually just a normal switch statement, but it has been disguised pretty well. After I
converted some of the cases manually, AI was able to help me out and do the rest, which gave me this.
It looks pretty standard for a bytecode VM.

While debugging the virtual machine later and seeing which functions it uses,
I was able to tell what it’s doing and renamed some of the variables.

After all of this, and a few more small obfuscation techniques,
here is the latest version of the file.


With the file fully deobfuscated, figuring out the functionality was much easier.
I easily found how the VM was being initiated here.

The bytecode is stored as a long string that’s all been XOR’ed with a key that
lies within the string.

// Line 3046 of latestDeobf.js
// Getting XOR key
for (var t = atob(payload), r = 0, n = 4; n < 8; ++n) r += t.charCodeAt(n);

// Decrypting bytecode
unZip(Uint8Array.from(t.slice(8), XOR, r % 256), {  i: 2 }, t && t.out, t && t.dictionary),

// Extracting strings, functions and metadata for each function
for (var n = leb128(t), o = 0; o < n; ++o) strings.push(Ab27(t)); 
i = leb128(t);
for (o = 0; o < i; ++o) {
  for (var argsLength = leb128(t), isStrictMode = Boolean(leb128(t)), exceptionHandlers = new Array(), p = leb128(t), m = 0; m < p; ++m)    exceptionHandlers.push([leb128(t), leb128(t), leb128(t), leb128(t)]);
  for (var instructions = new Array(), h = leb128(t), v = 0; v < h; ++v) instructions.push(leb128(t));
  instructionSets.push([instructions, argsLength, isStrictMode, exceptionHandlers]);
}
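The key derivation and XOR step above can be isolated into a small runnable sketch. The function name xorDecode and the demo payload are made up. The logic follows the snippet: the key is the sum of bytes 4 to 7 modulo 256, and everything from byte 8 onwards is XOR'ed with it. The decompression step (unZip) is left out.

```javascript
// Hypothetical helper mirroring the snippet above: derive the one-byte key from
// bytes 4..7 of the decoded payload, then XOR every byte after the 8-byte header.
function xorDecode(payloadBase64) {
  const raw = atob(payloadBase64); // binary string, one character per byte
  let sum = 0;
  for (let n = 4; n < 8; ++n) sum += raw.charCodeAt(n);
  const key = sum % 256;
  // Uint8Array.from accepts an iterable plus a mapping function.
  return Uint8Array.from(raw.slice(8), (ch) => ch.charCodeAt(0) ^ key);
}

// Demo payload (made up): header bytes 4..7 are 1, 2, 3, 4, so the key is 10.
const header = String.fromCharCode(0, 0, 0, 0, 1, 2, 3, 4);
const demo = btoa(header + String.fromCharCode(104 ^ 10, 105 ^ 10));
console.log(Array.from(xorDecode(demo))); // [ 104, 105 ]
```

In the real file the bytes returned at this point are still compressed, so they would be passed to the decompression routine before any values are read.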

NOTE: The string was gzip-compressed and each value was LEB128-encoded, both for compression.
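For reference, here is a minimal sketch of a LEB128 reader. It implements the standard unsigned variant. Whether the VM's own leb128 reader is unsigned or signed, and how it tracks its position, is an assumption here, and the createReader name is made up.

```javascript
// Hypothetical reader over a byte array. Each call to leb128() consumes one
// unsigned LEB128 integer: 7 payload bits per byte, high bit set = more bytes follow.
function createReader(bytes) {
  let offset = 0;
  return {
    leb128() {
      let result = 0;
      let shift = 0;
      for (;;) {
        if (offset >= bytes.length) throw new RangeError("unexpected end of input");
        const byte = bytes[offset++];
        // Multiplying instead of using << avoids 32-bit overflow on large values.
        result += (byte & 0x7f) * 2 ** shift;
        shift += 7;
        if ((byte & 0x80) === 0) return result;
      }
    },
    get offset() {
      return offset;
    },
  };
}

const reader = createReader(Uint8Array.of(0xe5, 0x8e, 0x26, 0x02));
console.log(reader.leb128()); // 624485
console.log(reader.leb128()); // 2
```

Small values fit in a single byte, which is why this encoding helps keep the instruction stream compact.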


Virtual Machine decompiling

TikTok is using a full-fledged bytecode VM. If you browse through it, you’ll see that it supports
scopes, nested functions and exception handling. This isn’t a typical VM, and it shows that
it is definitely sophisticated.

To write a form of decompilation, I simply went through each of the cases
and wrote appropriate output code for each one. For any case that jumps to another position
(for loops, for example), like this:

case 2:
    var a = instructions[index++];
    stack[pointer] ? --pointer : index += a;
    break;

I would simply stop it from doing so:

case 2:
    var a = instructions[index++];
    //stack[pointer] ? --pointer : index += a;

    addCode(`// if (!v${pointer}) skip ${a} to ${index + a}`, byteCodePos)
    break;
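To show the idea end to end, here is a toy decompiler over a made-up instruction set. Opcodes 0 (push constant) and 1 (add) are invented for this demo and do not correspond to TikTok's real opcodes. Only opcode 2 mirrors the case above. The walker never follows a jump; it just records it as a comment.

```javascript
// Toy linear decompiler. Opcodes 0 and 1 are hypothetical; opcode 2 mirrors the
// conditional-jump case shown above and is recorded instead of being executed.
function decompile(instructions) {
  const lines = [];
  let index = 0;
  let pointer = -1; // statically tracked index of the top stack slot
  while (index < instructions.length) {
    const position = index;
    const opcode = instructions[index++];
    switch (opcode) {
      case 0: { // push constant (hypothetical)
        const value = instructions[index++];
        lines.push(`${position}: v${++pointer} = ${value}`);
        break;
      }
      case 1: { // add the two top slots (hypothetical)
        lines.push(`${position}: v${pointer - 1} = v${pointer - 1} + v${pointer}`);
        --pointer;
        break;
      }
      case 2: { // conditional jump: emit a note, keep walking linearly
        const a = instructions[index++];
        lines.push(`${position}: // if (!v${pointer}) skip ${a} to ${index + a}`);
        break;
      }
      default:
        lines.push(`${position}: // unknown opcode ${opcode}`);
    }
  }
  return lines;
}

console.log(decompile([0, 1, 2, 3, 0, 5, 1]).join("\n"));
// 0: v0 = 1
// 2: // if (!v0) skip 3 to 7
// 4: v1 = 5
// 6: v0 = v0 + v1
```

Because jumps are never taken, every instruction is visited exactly once. That is what makes a straight top-to-bottom dump possible, at the cost of leaving control flow for the reader to reconstruct.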

After doing this for all the cases I dumped the output files here.
It’s not completely readable, but you should be able to make out a general idea
of what each function is doing. For example, VM223
generates random characters.


As this is a JavaScript file executed on the web, it is actually possible to replace
the normal webmssdk.js with the deobfuscated file and use TikTok normally.

This can be achieved using two browser extensions: Tampermonkey, for executing
custom code, and CSP, which disables the Content Security Policy so that I can fetch files from blocked origins.
That lets me put latestDeobf.js on my own file server and have it fetched on every page load, so I can easily
edit the file and have the changes take effect each time I refresh. This makes it much easier to debug
when reversing functions.

The script can be found here


Now that we have deobfuscated the file and decompiled the VM we can
start to reverse any function we want and figure out what it’s doing.

When you make a request to the server it usually consists of 3 additional headers.

Header       Description
msToken      Sent by the server and reissued on each request.
X-Bogus      Generated by webmssdk.js based on request.
_signature   Generated by webmssdk.js based on request.

When making a request that doesn’t require authentication, like querying a user, only X-Bogus
needs to be generated, which can be done using window.frontierSign. _signature isn’t needed
and any msToken can be used.

This popular API lets you make those requests.
It uses a browser automation library called Playwright, which simply sets up a browser instance so that it can easily call window.frontierSign.

When it comes to making authentication-based requests like posting a comment, _signature is needed,
and nothing that generates it is exposed on window.


The initial function call for each request is VM86, which then calls:

VM113 for X-Bogus

VM189 for _signature

I was able to write a signer which
successfully signs URLs.

Here’s a demo of posting a comment and checking it using a
private browser to ensure it’s successful.


PostCommentTest1.mp4


NOTE: There are also some bot protection methods within VM86, such as mouse tracking (VM120)
and environment checking (VM265), but they are completely client-side checks and don’t communicate
anything to the server, so they can be ignored when generating the signatures.


  • Note: The TikTok VM is constantly changing with new releases. There’s a high chance the main algorithms will change and the new VM will need to be decompiled again.
