This crackme doesn’t require a lot of reversing skills but provides plenty of opportunities to learn new tools and techniques. At least I learned a few new things.
First of all, the archive contains three files: a Windows executable, a password-protected zip archive (the default “hackthebox” password does not work), and a beautiful picture. Let’s check what’s inside the Windows executable:
```
$ diec
PE64
    Linker: Microsoft Linker(14.36.32825)
    Compiler: Microsoft Visual C/C++(19.36.32825)[C]
    Tool: Visual Studio(2022 version 17.6)
    Packer: PyInstaller
```
This was the first thing I learned: there is a way to create an executable file for different platforms that bundles a Python interpreter and all the necessary dependencies! Luckily for me, there’s no need to dig into its innards and unpack it by hand (although that would be an interesting exercise), since an unpacker already exists, and there is even a web version.
Running `pyinstxtractor` produces a lot of files, but only two of them look interesting: `maze.pyc` and `obf_path.pyc`, which `maze.pyc` imports. The other files seem to be dependencies, dependencies of dependencies, and some kind of glue for PyInstaller to work.
Writing a decompiler for `pyc` files also seems like an interesting challenge, but I cut corners and used one that already exists. Now, the fun begins.
The first thing `maze.py` does is check that the input string is `Y0u_St1ll_1N_4_M4z3`; it then executes the `obf_path.obfuscate_route()` function. Decompiling `obf_path.pyc` doesn’t help much: it loads a binary blob and executes it. The question is how to convert this blob back to Python code.
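Before executing a blob like this, it can be useful to look at what it contains. Below is a minimal sketch that unmarshals a code object and summarises it without running it. The blob here is a stand-in I build myself (the real one is not reproduced), and `describe` is a hypothetical helper name. Keep in mind that marshalled code objects are tied to the Python version that produced them, so the real blob has to be loaded with a matching interpreter.

```python
import marshal

# Stand-in for the embedded blob: marshal a tiny code object ourselves.
# (The real blob lives in obf_path.py and is not reproduced here.)
demo_code = compile("import zlib\nprint('hello')", "<demo>", "exec")
blob = marshal.dumps(demo_code)


def describe(blob: bytes) -> dict:
    """Unmarshal a code object and summarise it without executing it."""
    code = marshal.loads(blob)
    return {
        "filename": code.co_filename,
        # Names the code refers to: imports, globals, attributes.
        "names": list(code.co_names),
        # String and bytes constants often reveal nested payloads.
        "consts": [c for c in code.co_consts if isinstance(c, (str, bytes))],
    }


summary = describe(blob)
print(summary)
```

Seeing names such as `zlib`, `lzma` or `exec` in the summary is usually a good sign that another layer is hiding inside.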
A quick Google search didn’t tell me how to generate `pyc` files from `code` objects, so I had to dig deeper. And that was the second thing I learned. There is a module named `py_compile` that generates bytecode from a source file, so it had to contain something related to my question. It turned out that all it does is delegate to another module:
```python
    if invalidation_mode == PycInvalidationMode.TIMESTAMP:
        source_stats = loader.path_stats(file)
        bytecode = importlib._bootstrap_external._code_to_timestamp_pyc(
            code, source_stats['mtime'], source_stats['size'])
    else:
        source_hash = importlib.util.source_hash(source_bytes)
        bytecode = importlib._bootstrap_external._code_to_hash_pyc(
            code,
            source_hash,
            (invalidation_mode == PycInvalidationMode.CHECKED_HASH),
        )
```
and then it writes this blob to the `pyc` file. I thought that the `_code_to_timestamp_pyc` function would do some complex things, but no, the `pyc` file format is quite simple:
```python
def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
    "Produce the data for a timestamp-based pyc."
    data = bytearray(MAGIC_NUMBER)
    data.extend(_pack_uint32(0))
    data.extend(_pack_uint32(mtime))
    data.extend(_pack_uint32(source_size))
    data.extend(marshal.dumps(code))
    return data
```
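To make the layout concrete, here is a small sketch that builds the same 16-byte header using only public names and then reads it back. The helpers `build_timestamp_pyc` and `parse_timestamp_pyc` are hypothetical names of mine, and the sketch assumes a timestamp-based pyc from Python 3.7 or later (magic number, flags, mtime, source size, then the marshalled code object).

```python
import importlib.util
import marshal
import struct


def build_timestamp_pyc(code, mtime=0, source_size=0) -> bytes:
    """Same layout as _code_to_timestamp_pyc, built from public names only."""
    header = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, mtime, source_size)
    return header + marshal.dumps(code)


def parse_timestamp_pyc(data: bytes):
    """Split a timestamp-based pyc into its header fields and code object."""
    magic = data[:4]
    flags, mtime, source_size = struct.unpack_from("<III", data, 4)
    code = marshal.loads(data[16:])
    return magic, flags, mtime, source_size, code


demo = compile("answer = 6 * 7", "<demo>", "exec")
pyc = build_timestamp_pyc(demo, mtime=1, source_size=14)
magic, flags, mtime, size, code = parse_timestamp_pyc(pyc)

namespace = {}
exec(code, namespace)
print(len(pyc) - len(marshal.dumps(demo)), flags, mtime, size, namespace["answer"])
# 16 0 1 14 42
```

The header is always 16 bytes, which is why everything after offset 16 can be handed straight to `marshal.loads`.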
So, for our task, we need to call `_code_to_timestamp_pyc` directly, passing it the code object from the `obf_path.py` file:
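```python
import importlib._bootstrap_external
from marshal import loads

v = loads(b'...')  # the marshalled blob from obf_path.py (elided)
# mtime and source_size default to 0, which is fine for decompiling
pyc_data = importlib._bootstrap_external._code_to_timestamp_pyc(v)
with open('obf_path_unpacked.pyc', 'wb') as f:
    f.write(pyc_data)
```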
Decompiling the result with `uncompyle6` creates another challenge: the next layer uses `lzma` and `zlib` to decompress a binary blob and then executes it. Unpacking it and writing the result to standard output produces Python code that looks a bit weird: between long lines that assign a string made of many repeated `__regboss__` to a variable of the same name, there is another call to `exec(loads(…))`. I googled this string just out of curiosity, and it turns out that this code was generated by a “protector” named Regboss. Protector! Which compresses the code and assigns a variable with a long name around it! The world is going crazy.
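Peeling compression layers by hand gets tedious, so here is a rough sketch of how it could be automated. It assumes nothing about the order in which `lzma` and `zlib` were applied and simply tries both decompressors until neither accepts the data. The helper name `peel` and the demo payload are my own inventions, not something taken from the challenge.

```python
import lzma
import zlib


def peel(blob: bytes, max_layers: int = 16) -> tuple[bytes, list[str]]:
    """Strip zlib/lzma layers until neither decompressor accepts the data."""
    layers = []
    for _ in range(max_layers):
        for name, decompress, error in (
            ("zlib", zlib.decompress, zlib.error),
            ("lzma", lzma.decompress, lzma.LZMAError),
        ):
            try:
                blob = decompress(blob)
            except error:
                continue  # not this format, try the next one
            layers.append(name)
            break
        else:
            # Neither decompressor worked: we reached the payload.
            return blob, layers
    return blob, layers


# Hypothetical payload wrapped in two layers, purely for demonstration.
wrapped = lzma.compress(zlib.compress(b"print('inner layer')"))
payload, layers = peel(wrapped)
print(layers, payload)
# ['lzma', 'zlib'] b"print('inner layer')"
```

The payload is only printed here, never passed to `exec`, which is the safer habit when dealing with a protector’s output.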
In any case, decompiling the blob using the same technique with `importlib` will eventually give you the source code. Simplifying the logic a bit, it looks like this:
```python
import sys
from time import sleep

index_file = "maze.png"
index = open(index_file, "rb").read()
# The seed is the sum of four bytes taken from the picture
seed = index[4817] + index[2624] + index[2640] + index[2720]
print("\n\nG00d!! you could escape the obfuscated path")
print("take this it may help you: ")
sleep(2)
print(f"\nseed({seed})\nfor i in range(300):\n randint(32,125)\n")
print("Be Careful!!!! the route from here is not safe.")
sys.exit(0)
```
So, there are no flags here, but it looks like a hint for the next step:
```python
seed(493)
for i in range(300):
    randint(32,125)
```
Returning to where we started, `maze.py` has the following logic after the `obf_path.obfuscate_route()` function I just described:
- unpack the `enc_maze.zip` archive with the hardcoded password
- read the contents of the `maze` file from the archive
- “decrypt” the content using the following logic and write the result to the `dec_maze` file:
```python
for i in range(0, len(data), 10):
    data[i] = (data[i] + 80) % 256
else:
    for i in range(0, len(data), 10):
        data[i] = (data[i] ^ key[i % len(key)]) % 256
```
The `dec_maze` file looks like an ELF binary, but none of the analysis tools can recognize it:
```
$ xxd -l 64 dec_maze
00000000: 3f45 4c46 0201 0100 0000 5f00 0000 0000  ?ELF......_.....
00000010: 0300 3e00 6000 0000 8010 0000 0000 3d00  ..>.`.........=.
00000020: 4000 0000 0000 0000 4b31 0000 0000 0000  @.......K1......
00000030: 0000 3d00 4000 3800 0d00 4000 6d00 1c00  ..=.@.8...@.m...

$ file dec_maze
dec_maze: data

$ objdump -f ./dec_maze
objdump: ./dec_maze: file format not recognized
```
If you look closely, the file is corrupt: the first byte of the ELF magic should be `7f`, and some other fields in the header have strange values too. My idea was that the file was not fully “decrypted”, or that the decryption algorithm had to be modified somehow to get the correct result.
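A quick way to see which header bytes are off is to check them against what a 64-bit ELF header must contain. The sketch below uses the first 32 bytes copied from the `xxd` dump and relies on three facts about the format: the magic starts with `0x7f`, the padding bytes 9 to 15 of `e_ident` are zero, and `e_version` (a 32-bit field at offset 20) equals 1. The helper name `suspicious_offsets` is hypothetical.

```python
import struct

# First 32 bytes of dec_maze, copied from the xxd dump above.
HEADER = bytes.fromhex(
    "3f454c4602010100" "00005f0000000000"
    "03003e0060000000" "8010000000003d00"
)


def suspicious_offsets(header: bytes) -> list[int]:
    """Return offsets of header fields that violate basic ELF expectations."""
    bad = []
    if header[0] != 0x7F:  # magic must be \x7fELF
        bad.append(0)
    for off in range(9, 16):  # e_ident padding must be zero
        if header[off] != 0:
            bad.append(off)
    (e_version,) = struct.unpack_from("<I", header, 20)
    if e_version != 1:  # EV_CURRENT
        bad.append(20)
    return bad


print(suspicious_offsets(HEADER))
# [0, 10, 20]
```

The broken offsets are all multiples of 10, which matches the step of 10 in the “decryption” loops and suggests that the loops touch the right bytes but leave them with the wrong values.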
And indeed, when I carefully read `maze.py` again, a problem emerged:
```python
key = [0] * len(data)
```
In other words, the second round of the “decryption” algorithm actually does nothing: it XORs the data with 0, resulting in the same value. My gut feeling said that I needed to use the “hint” code from the `obf_path.obfuscate_route()` function:
```python
import random

key = []
random.seed(493)
for i in range(300):
    key.append(random.randint(32,125))
```
and voila:
```
$ file dec_maze
dec_maze: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=fda317f523cc4b926eea4e2565e7b9e6390f5aff, for GNU/Linux 3.2.0, stripped
```
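Putting the two rounds and the corrected key together, the whole “decryption” can be sketched as a pair of small functions. This is my own restatement rather than the code from `maze.py`: `make_key`, `decrypt` and `encrypt` are hypothetical names, and `encrypt` exists only so the sketch can check itself on made-up data.

```python
import random


def make_key(seed: int = 493, length: int = 300) -> list[int]:
    """Key stream from the hint: seed(493), then 300 calls to randint(32, 125)."""
    rng = random.Random(seed)
    return [rng.randint(32, 125) for _ in range(length)]


def decrypt(data: bytes, key: list[int]) -> bytes:
    """Apply both rounds to every 10th byte: add 80, then XOR with the key."""
    out = bytearray(data)
    for i in range(0, len(out), 10):
        out[i] = (out[i] + 80) % 256
    for i in range(0, len(out), 10):
        out[i] ^= key[i % len(key)]
    return bytes(out)


def encrypt(data: bytes, key: list[int]) -> bytes:
    """Inverse of decrypt, used only to sanity-check the sketch."""
    out = bytearray(data)
    for i in range(0, len(out), 10):
        out[i] = ((out[i] ^ key[i % len(key)]) - 80) % 256
    return bytes(out)


key = make_key()
sample = b"\x7fELF" + bytes(range(60))  # made-up data, not the real maze file
assert decrypt(encrypt(sample, key), key) == sample
```

To use it on the real data, one would pass the bytes of the `maze` file extracted from the archive to `decrypt` and write the result to `dec_maze`.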
Now we finally have a binary to analyze! During the disassembly round, the following logic was revealed (in pseudo-C):
```c
fgets(input, 64, stdin);
if (input[0] != 'H' || input[1] != 'T' || input[2] != 'B')
    return -1;

int length = strlen(input);
for (int i = 1, flag_index = 0; i < length; ++i, ++flag_index)
{
    if (i + 1 == length)
        break;
    if ((input[i-1] + input[i] + input[i+1]) != encrypted_flag[flag_index])
        return -1;
}

return 0;
```
In other words, the algorithm compares the sum of three adjacent characters in the input data with the corresponding integer in the `encrypted_flag` array. This means that in this case `'H' + 'T' + 'B'` should be equal to `0xDE`, then `'T' + 'B' + '{'` should be equal to `0x111`, and so on.
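The arithmetic is easy to confirm with a couple of lines of Python. `window_sums` is a hypothetical helper that mirrors the check in the binary.

```python
def window_sums(text: str) -> list[int]:
    """Sum of each run of three adjacent characters."""
    codes = [ord(ch) for ch in text]
    return [sum(codes[i:i + 3]) for i in range(len(codes) - 2)]


print([hex(s) for s in window_sums("HTB{")])
# ['0xde', '0x111']
```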
To reverse this algorithm, we need to do the following:
- get the current integer from `encrypted_flag`
- subtract the sum of the two previous characters from it
- the result is the current character we need
In code it might look like this (in pseudo-Java now, because why not):
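```java
// encrypted_flag[i] == flag[i] + flag[i+1] + flag[i+2],
// so the flag is two characters longer than the array.
var flag = new char[encrypted_flag.length + 2];
flag[0] = 'H';
flag[1] = 'T';
flag[2] = 'B';

// encrypted_flag[0] only covers the known "HTB" prefix, so start at 1.
for (int i = 1; i < encrypted_flag.length; ++i) {
    var c = encrypted_flag[i] - (flag[i] + flag[i+1]);
    flag[i+2] = (char) c;
}
```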
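The same recovery can be sketched in Python and checked with a round trip. The flag below is made up purely to exercise the code and `recover` is a hypothetical helper; the real values come from the `encrypted_flag` array in the binary, which I am not reproducing here.

```python
def recover(sums: list[int], prefix: str = "HT") -> str:
    """Rebuild the text from three-character sums, given its first two characters."""
    chars = [ord(ch) for ch in prefix]
    for total in sums:
        # Each sum covers the two previous characters plus the unknown one.
        chars.append(total - chars[-2] - chars[-1])
    return "".join(chr(c) for c in chars)


# Made-up flag used only to exercise the round trip; not the real one.
sample = "HTB{example}"
sums = [sum(map(ord, sample[i:i + 3])) for i in range(len(sample) - 2)]
print(recover(sums))
# HTB{example}
```

Only the first two characters are needed as a starting point, since the first sum already yields the third one.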
And that’s all we need.
The challenge can be found here.