Revenge of the Unitrix

By Zvi Azran & Dana Yosifovich

Summary

Unitrix is the name given to an old homographic exploit that misuses the invisible Unicode character U+202E, which reverses the display direction of the text that follows it. We revisited this old exploit and found that many attackers still use it to trick users into opening malicious files.

For example, to disguise an executable as a PDF file, an attacker can name it so that the user sees GTAcheatsheetexe.pdf. The real filename is actually “GTAcheatsheetU+202Efdp.exe”, with an invisible character flipping the ending. Combined with a changed icon, this method works with any file extension and can give a file the appearance of any file type (examples: pdf ⇄ fdp, png ⇄ gnp, mp4 ⇄ 4pm).
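
To make the flip concrete, here is a minimal Python sketch that approximates what the user sees. It is a simplified model, not the full Unicode Bidirectional Algorithm: it assumes the rest of the name is plain left-to-right text such as ASCII, and simply reverses every run that starts at U+202E and ends at a closing U+202C (or at the end of the name). The function name `rendered_name` is our own, for illustration only.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (ends the override)


def rendered_name(filename: str) -> str:
    """Approximate how a filename containing RLO/PDF is displayed.

    Simplified model: assumes the rest of the name is plain left-to-right
    text (e.g. ASCII). Each RLO run, up to the next PDF or the end of the
    name, is shown reversed. Nested overrides, embeddings, isolates and
    genuine right-to-left letters are not modelled.
    """
    out = []
    i = 0
    while i < len(filename):
        ch = filename[i]
        if ch == RLO:
            end = filename.find(PDF, i + 1)
            if end == -1:
                end = len(filename)
            out.append(filename[i + 1:end][::-1])  # the overridden run, reversed
            i = end + 1  # skip past the closing PDF (if any)
        elif ch == PDF:
            i += 1  # stray PDF: invisible, no effect in this model
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(rendered_name("GTAcheatsheet" + RLO + "fdp.exe"))  # GTAcheatsheetexe.pdf
```

The real extension is still ".exe"; only the display order of its characters has changed.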

Since the beginning of the year, we’ve seen many malicious files using the Unitrix exploit. The phenomenon turned out to be far more common than we first thought: many different malware families, and even simple cryptominers, use it, including Revenge RAT, Echelon stealer, Orcus, and more.

This report will explain the whats, the whys, and the hows of Unicode homographic exploits; how Unitrix fits into the picture; and examples of malware we encountered using it (listed in the Malware Overview).

Unitrix Overview

Unicode Basics

The Unicode Standard is one of the most essential building blocks of the global computing world — allowing anyone to write and read in their own language. To do that, it lists unique “code points” (U+hex), that are used to represent letters (D, ž,Dž, ʶ,愛, 𓂀), symbols (+∊≠, £¥₪, ҂˚˟˿), marks (ם֑֟֯, ী,◌҉), separators (, , , ,  ), emojis (😊, 🙏, 👍), and much more. For simplicity, we will refer to these code points as “characters” (as most people do) for the rest of the explanation.

Simply listing unique numbers for characters is not enough. Characters can change their shape or change the sentence depending on the context. To support that, every character comes with a list of properties. These properties may define the width (AA) of the character (if it has a width at all), its role in the sentence (-”.), its direction (EƎ), and much more.
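
These properties are published in the Unicode Character Database, and Python's standard `unicodedata` module exposes several of them. A small sketch (the helper name `properties` is ours) that prints the general category, bidirectional class and East Asian width for a few characters:

```python
import unicodedata


def properties(ch: str) -> dict:
    """A few Unicode Character Database properties of one character."""
    return {
        "code point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),                  # e.g. Lu = uppercase letter
        "bidi class": unicodedata.bidirectional(ch),           # direction: L, R, CS, ...
        "east asian width": unicodedata.east_asian_width(ch),  # Na = narrow, F = fullwidth
    }


# Latin A, FULLWIDTH LATIN CAPITAL LETTER A, HEBREW LETTER ALEF, FULL STOP
for ch in ["A", "\uff21", "\u05d0", "."]:
    print(properties(ch))
```

The Hebrew letter has bidi class "R" (right-to-left), while the full stop has the weak class "CS" and takes its direction from its surroundings.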

What about the visual side? This is handled by fonts. Fonts are files that store glyphs (pictures of characters) that define how characters look. If a font does not contain a glyph for a character, a substitute or replacement character is displayed instead, e.g. � or □.

Now that we’ve discussed the basics, let’s concentrate on the most famous Unicode challenge out there: the homograph attack (or ‘visual spoofing’).

Homograph attacks

A homograph attack uses texts that look the same to a user but are completely different to the computer.
Homograph attacks can be carried out in multiple ways:

  • Swapping between two different characters that look the same. For example, can you spot the one different character between “ReasonLabs” and “ReasonLabs”?
(LATIN SMALL LETTER A, U+0061, versus CYRILLIC SMALL LETTER A, U+0430)

Many languages have characters that look like characters from other languages. These confusables are catalogued by the Unicode Consortium (in the confusables data of Unicode Technical Standard #39), so attackers and defenders work from the same list. The problem with defending against homograph attacks is the sheer number of options for each character, which turns any sentence into a combinatorial explosion.
Case in point: here are 15 “a” lookalikes: a,a, 𝑎, 𝗮, 𝕒, 𝚊, а, ɑ, α, 𝔞, 𝒂, 𝘢, 𝙖, 𝐚, 𝖺.

  • Adding invisible characters: the string “Reason‪‬‭⁠⁡⁢⁣⁤⁦⁧⁩𝅳𝅴𝅵𝅶𝅷𝅸𝅹𝅺󠀁󠀠󠀡󠀢󠀣󠀤󠀥󠀦󠀧󠀨󠀩󠀪󠀫󠀬󠀭󠀮󠀯󠀰󠀱󠀲󠀳󠀴󠀵󠀶󠀷󠀸󠀹󠀺󠀻󠀼󠀽󠀾󠀿󠁀󠁁󠁂󠁃󠁄󠁅󠁆󠁇󠁈󠁉󠁊󠁋󠁌󠁍󠁎󠁏󠁐󠁑󠁒󠁓󠁔󠁕󠁖󠁗󠁘󠁙󠁚󠁛󠁜󠁝󠁞󠁟󠁠󠁡󠁢󠁣󠁤󠁥󠁦󠁧󠁨󠁩󠁪󠁫󠁬󠁭󠁮󠁯󠁰󠁱󠁲󠁳󠁴󠁵󠁶󠁷󠁸󠁹󠁺󠁻󠁼󠁽󠁾󠁿Labs” is 126 characters long, and its invisible characters are all unique (check for yourself).

Invisible characters are an important part of some writing systems, and they are also used as a kind of digital signature (a hidden watermark) that can help legally prove who used your text. Unfortunately, they can also be abused: when length checks on strings are skipped, a very long string padded with invisible characters will pass inspection without the user noticing.

  • Using font files: any character can potentially be shown as another character. However, this is a harder attack to carry out because the user needs to install and use the malicious font.
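
The first two techniques (lookalike letters and invisible characters) can be flagged with a simple heuristic. The sketch below, with a hypothetical `suspicious_chars` helper, reports invisible format characters (general category Cf) and non-ASCII letters. It is not the UTS #39 confusable-detection algorithm, and legitimate non-English text will also be flagged. It also shows why Unicode normalization alone is not a defense: NFKC folds stylistic variants such as 𝑎 back to "a", but leaves the Cyrillic "а" untouched.

```python
import unicodedata


def suspicious_chars(text: str) -> list:
    """Flag characters that commonly appear in homograph tricks.

    Heuristic only: legitimate non-English text will also be flagged.
    """
    findings = []
    for i, ch in enumerate(text):
        name = unicodedata.name(ch, "<unnamed>")
        if unicodedata.category(ch) == "Cf":  # invisible format characters
            findings.append((i, name, "invisible format character"))
        elif ch.isalpha() and not ch.isascii():  # possible lookalike letter
            findings.append((i, name, "non-ASCII letter"))
    return findings


print(suspicious_chars("Re\u0430sonLabs"))   # Cyrillic "а" at index 2
print(suspicious_chars("Reason\u2060Labs"))  # WORD JOINER at index 6

# NFKC folds MATHEMATICAL ITALIC SMALL A back to "a" ...
print(unicodedata.normalize("NFKC", "\U0001d44e"))
# ... but a cross-script lookalike survives normalization unchanged.
print(unicodedata.normalize("NFKC", "\u0430") == "a")
```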

What problems do these attacks cause?

  • A user can be tricked into opening a file that looks ‘safe’, but is actually a virus or malware.
  • A program will slow down or crash when processing huge invisible texts.
  • Users may be tricked into entering their private information on an attacker's website or sending emails to an attacker's email address (HostSplit/HostBond exploits).

How does the industry defend against homograph attacks?

There are various ways to defend against these attacks, depending on the severity of the threat. A good read on the subject is how Chrome decides when to display Unicode in Internationalized Domain Names (IDNs).
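
One common building block for IDNs is to show the ASCII (Punycode) form of a suspicious domain, which makes a lookalike label stand out as "xn--…". The sketch below uses Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490); it illustrates the idea only and is not Chrome's actual display policy. The domain names are illustrative.

```python
def to_ascii_domain(domain: str) -> str:
    """Return the ASCII (Punycode) form using Python's built-in 'idna' codec.

    Note: this codec implements IDNA 2003 (RFC 3490). Browsers apply their
    own display rules on top of IDNA; this is not one of them.
    """
    return domain.encode("idna").decode("ascii")


genuine = "reasonlabs.com"
spoofed = "re\u0430sonlabs.com"  # U+0430 CYRILLIC SMALL LETTER A instead of "a"

print(to_ascii_domain(genuine))  # reasonlabs.com
print(to_ascii_domain(spoofed))  # an "xn--..." label: the lookalike is exposed
```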

Having discussed the basic elements of Unicode, we can now explore the Unitrix exploit.

Unitrix explained

Unitrix is an old exploit, old enough that it has come back into fashion! The name Unitrix was coined by Avast in 2011, but the trick itself has famously been used since the early days of the internet to troll users.

The premise behind the idea is very simple: one invisible character, U+202E, makes the characters that follow it display in reverse order, up to the end of the text (or until a closing formatting character, as discussed below).

This character is called the Right-To-Left Override (RLO), and it is one of the 12 invisible bidirectional formatting characters defined by the Unicode Bidirectional Algorithm, which are used to enforce different direction constraints on the text. Why do we need them? They help display text correctly for the more than 300 million speakers of right-to-left languages, such as Hebrew or Arabic.
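
For reference, the 12 bidirectional formatting characters (the three implicit marks plus the nine explicit embedding, override and isolate characters of UAX #9) can be listed with Python's `unicodedata`. The helper `has_bidi_formatting` is a hypothetical addition showing how the same table could be used to flag filenames.

```python
import unicodedata

# The 12 bidirectional formatting characters (Unicode Bidirectional Algorithm, UAX #9).
BIDI_FORMATTING = (
    "\u200e", "\u200f", "\u061c",                      # implicit marks: LRM, RLM, ALM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings, PDF, overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates: LRI, RLI, FSI, PDI
)

for ch in BIDI_FORMATTING:
    print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):<4} {unicodedata.name(ch)}")


def has_bidi_formatting(name: str) -> bool:
    """True if the string contains any bidi formatting character."""
    return any(ch in BIDI_FORMATTING for ch in name)
```

All twelve have the general category Cf (format), which is why none of them has a visible glyph of its own.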

You could even argue that the Trojan Source exploits, which visually rearrange parts of lines within source code, are versions of Unitrix because they also use U+202E. The main problem, however, arises when U+202E is used in the filenames of executables to trick users into clicking on them.

Some examples from what we’ve seen:

Additionally, the icon of the file is changed to look like the type of file the user has been tricked into clicking on.

Very rarely, U+202E is used legitimately in filenames. This is done by adding a closing U+202C (Pop Directional Formatting) or U+202D (Left-To-Right Override) so that the file extension is displayed correctly. For example, “Not_Unitrix.U+202EcodU+202C.exe” (displayed as “Not_Unitrix.doc.exe”) is not a Unitrix problem, because the file extension is not flipped. It can still be misleading, as in “SU+202Eputes_epykU+202C.exe” (displayed as “Skype_setup.exe”), but at least we know it's an executable.
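
The distinction above (an RLO only hides the extension if nothing closes it before the name ends) can be checked with a small stack-based sketch. The function name `extension_is_flipped` is hypothetical, and the model is deliberately simplified: it tracks only U+202E, U+202D and U+202C, and ignores embeddings and isolates.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def extension_is_flipped(filename: str) -> bool:
    """True if a right-to-left override is still in effect at the end of the name.

    Simplified model: only RLO, LRO and PDF are tracked; embeddings
    (U+202A/U+202B) and isolates (U+2066 to U+2069) are ignored.
    """
    overrides = []
    for ch in filename:
        if ch in (RLO, LRO):
            overrides.append(ch)      # a new override starts
        elif ch == PDF and overrides:
            overrides.pop()           # PDF closes the most recent override
    return bool(overrides) and overrides[-1] == RLO


for name in ("GTAcheatsheet" + RLO + "fdp.exe",
             "Not_Unitrix." + RLO + "cod" + PDF + ".exe",
             "S" + RLO + "putes_epyk" + PDF + ".exe"):
    print(repr(name), extension_is_flipped(name))
```

Only the first name is reported as flipped; the other two close the override before ".exe", so the real extension stays visible.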

Another point to mention is that although other invisible characters could be used in the same way, they are not exploitable in filenames. For example, “s⁧⁨mp4⁩.exe⁩” (flipped depending on the environment) has its invisible characters displayed as visible symbols in File Explorer. The Unitrix exploit could have been avoided if the same approach had been applied to U+202E in the past.

The situation on Android and iOS is no different. Users can easily be tricked by a filename into downloading and running an APK/IPA application archive, ignoring the warnings.

MITRE ATT&CK tracks known attacks that use Unitrix under technique T1036.002 (Masquerading: Right-to-Left Override): https://attack.mitre.org/techniques/T1036/002/

Malware Overview

The following malware families have all been using Unitrix to deliver themselves:

  1. Njrat \ Bladabindi
  2. Quervar virus
  3. Orcus RAT
  4. Echelon stealer
  5. Revenge RAT
  6. Cryptominer

Finding these malware families isn’t difficult if you put your mind to it. More malware families than we realize use this very old but successful technique to deliver infostealers, cryptominers, RATs, and whatnot.

Below are examples of filenames of malicious files in the wild that contain the Unitrix character. These filenames are UTF-8 encoded; by decoding them as Windows-1252 (cp1252) instead, we can easily see how the computer views the text:

However, for the user, this is displayed as:
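
The cp1252 view described above is easy to reproduce. U+202E is encoded in UTF-8 as the three bytes E2 80 AE, and Windows-1252 maps each of those bytes to a separate visible character (â, €, ®), which is why the hidden character suddenly shows up. A minimal sketch, using the PDF example filename from the summary as illustrative input:

```python
RLO = "\u202e"
name = "GTAcheatsheet" + RLO + "fdp.exe"   # illustrative Unitrix filename

raw = name.encode("utf-8")          # U+202E becomes the bytes E2 80 AE
as_cp1252 = raw.decode("cp1252")    # read each byte as one Windows-1252 character
print(raw[13:16])                   # b'\xe2\x80\xae'
print(as_cp1252)                    # GTAcheatsheetâ€®fdp.exe
```

The sequence "â€®" is therefore a handy string to look for in logs or file listings that were decoded with the wrong code page.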

1. Njrat \ Bladabindi

We have been keeping an eye on Unitrix usage in the wild and have spotted a campaign of “.sln” files (Visual Studio solution files, which means this campaign targets developers). However, these are not .sln files at all, but .scr files with a Visual Studio icon: a very confusing file for the average user.

SHA1 c0c6269ea11ad39d961ae56c0fae0f3aa633fc04

This is in fact a sneaky infostealer that harvests passwords and user data from the user’s browser files and uses Discord Webhooks (a way that Discord offers to send messages and updates to a text channel) to send the stolen data to the attacker.

The malware writes a file named “PerfWatson.exe” to the hard disk. PerfWatson.exe is also the name of a process associated with Visual Studio, so the malware uses it to further hide behind its Visual Studio project facade.

In summary, the infostealer:

  • Uses the Discord CDN (Content Delivery Network) to download an additional file, “final.exe”: hxxps[:]//cdn[.]discordapp[.]com[/]attachments[/]933…..7634/..1743…16/final[.]exe (truncated)
  • Employs Discord webhooks to send stolen information:

  • Verifies the user’s IP address by using https://api64.ipify.org.
  • Sends host information to the C&C (Command & Control) address 94.71.213.142, partly in plaintext and partly as base64-encoded text.
  • Creates persistence in the Startup folder and the registry Run key, and adds a firewall rule that allows network communication for the program it drops at “C:\ProgramData\NtUserRuntime.exe”, which is detected as Bladabindi.

For the Virus Total hunters among us, a useful way to catch more samples of this Bladabindi distribution is by searching by icon: https://www.virustotal.com/gui/search/main_icon_dhash%253Ad2de8f9c9e9ad818/files

Here we see 30 files that all share the same icon hash, all with more than 30 positives and similar-looking names. While we are on the subject of ‘similar’, we can also click on Virus Total’s ‘Similarity’ button and see that 29 of them share the same ‘imphash’ (import hash), and half of them were downloaded from cdn.discordapp.com.

We can also gather more potential file names, such as:

Additionally, if you search for this pdb path “C:\\Users\\gamze\\Desktop\\Stub\\Stub\\obj\\x86\\Debug\\Stub.pdb”, it yields 15 more files, with different icons.

If we want to search for more files that use the Unitrix trick in Virus Total, we can do it as follows:

name:rcs..sln

2. Quervar Virus

The Quervar virus has existed since 2013. It is a file infector that targets Microsoft Office documents on the machine: it encrypts each document's contents, stores them inside its own executable, and opens the original document when the user clicks on the infected file. Quervar gives the infected file the name of the original document, adds the Unitrix character followed by the original extension in reverse, and ends the name with a “.scr” extension. As a result, the user’s infected documents appear to keep their original names, with an added “rcs” before the extension.

For example:

Original file name: bill no 354 i c l.xls
New file name: bill no 354 i c l U+202Eslx.scr
Display post-infection:
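
The naming scheme described above explains where the visible “rcs” comes from. The sketch below reconstructs it as a plain string transformation, under the assumption that U+202E is inserted directly after the original name (the exact spacing in real samples may differ); the function names `quervar_style_name` and `displayed` are hypothetical, and `displayed` uses a simplified model that reverses everything after a single U+202E.

```python
import os

RLO = "\u202e"


def quervar_style_name(original: str, real_ext: str = ".scr") -> str:
    """Sketch of the naming scheme: stem + RLO + reversed extension + real extension."""
    stem, ext = os.path.splitext(original)  # ("bill no 354 i c l", ".xls")
    return stem + RLO + ext[1:][::-1] + real_ext


def displayed(name: str) -> str:
    """Simplified display: everything after the (single) RLO is reversed."""
    prefix, _, tail = name.partition(RLO)
    return prefix + tail[::-1]


infected = quervar_style_name("bill no 354 i c l.xls")
print(displayed(infected))  # bill no 354 i c lrcs.xls
```

Reversing “slx.scr” yields “rcs.xls”, so the user sees the familiar “.xls” ending with “rcs” glued in front of it.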

When the user clicks and opens this file, two processes are created:

  1. Excel.exe opens the original Excel file, so the user will not suspect that something bad has happened.
  2. The malicious process (in this case: “11E00.exe”).

The Excel document is not interesting for further analysis, so we will focus on the malicious process that was created.

The process adds persistence by creating a .lnk file in the Startup folder:
“C:\Users\{user}\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\38B48.lnk”.

The target of the link file is “C:\Users\{user}\AppData\Local\Temp\305E0\11E00.exe -cook shell32.dll”.

After creating persistence, the process stops its activity. An analyst therefore has to restart the machine to observe the rest of the behavior, and this is how the attacker evades sandbox products: the malicious activity does not show up during a regular execution. After a restart, the real flow continues, and the sample executes its malicious activity.

3. Orcus

Another good example of Unitrix in filenames is the Orcus RAT, which has been delivered in files disguised with media-file extensions, such as “.avi”:

Orcus is a Remote Access Trojan that gives the attacker remote access to the infected host and can carry out malicious actions such as password stealing, live command execution, screen capture, webcam and microphone recording, keylogging, and more, since its users can also develop custom plugins for it.

An excerpt of this file's code (SHA1: fe445e052ee2dda6ddb26a5337757f2c8b8fcd56):

4. Cryptominer combined with Echelon infostealer

Echelon is a Russian open-source infostealer written in C# that sends the stolen information via a Telegram bot.

The file is disguised as a text file with a misleading “.txt” extension, but it is in fact an .exe file. SHA1: 30b814398e42f86fccd29514168328d0fc0d5f6f

When executed, the file at first appears to be a cryptominer: it drops a miner on the host and starts mining at xmr.pool.minergate.com on port 45700 with user=mouysyroussse@mail[.]ru.
However, its strings reveal much more interesting activity, including theft of clipboard data and crypto wallets.

Examples from the strings contained in the file, that indicate its infostealing activity:

After execution is finished, the information is sent via a Telegram bot in the following format:

5. Revenge RAT

Revenge RAT is a Remote Access Trojan written in .NET that has been around since at least 2019 and is capable of reconnaissance and of spying on users. An open-source version is available on GitHub, though the version below has some differences. More details will follow in a separate article.

In this case, we’ve seen many different Revenge RAT files with something in common: they use disguised “mp4” extensions, such as:

Filename:
SHA1: ccac9c7ebc86bb57747d6bda1e886ff7f9b16578

This file is creatively named “sexexe..mp4”, which is actually “sexU+202E4pm..exe”.

The file's source code is not obfuscated, and Revenge RAT strings are visible even in the file metadata:

Abilities and actions of the Revenge RAT:

  • Captures screens, audio, and webcam
  • Process injection
  • Dynamically loads DLLs
  • Persistence under SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\Client
  • Creates mutex called “RV_MUTEX”
  • Gathers user information
  • Communicates with a C2 server (in this example, 69.207.180.32, an address that also serves the Orcus RAT)
  • Creates a registry key under HKEY_CURRENT_USER\\SOFTWARE named with the base64 encoding of the mutex, “UlZfTVVURVg=”
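
Defenders can derive this registry indicator directly from the mutex name, and the same standard-library calls decode the base64 fields the RAT sends to its C2 server. A minimal sketch with Python's `base64` module:

```python
import base64

mutex = "RV_MUTEX"
key_name = base64.b64encode(mutex.encode("ascii")).decode("ascii")
print(key_name)                                          # UlZfTVVURVg=

# Decoding works the same way for any base64 field recovered from traffic.
print(base64.b64decode("UlZfTVVURVg=").decode("ascii"))  # RV_MUTEX
```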

The program first sends the following user information back to its C2 server:

Key & ID of the attacked host, IP address, computer name & user name, OS name, processor name and type, total physical memory, volume size, installed antivirus, installed firewall, name of the foreground window, and language, all encoded in base64.

6. Cryptominer disguised as Need For Speed torrent download

How could we complete this article without mentioning a classic miner?
As in the earlier examples, the attacker flips the characters of “.scr” and “torrent”. The user believes they are looking at a torrent download of the Need For Speed computer game, and for good reason: the “rcs” is very well hidden in the name:

The actual name is:

In addition to the misleading name, the file uses the µTorrent icon.

The file also contains a real torrent download, so the user will not suspect that anything malicious has happened. When the user clicks on the torrent0.torrent download, the XMR miner starts running in the background.

SHA1 1ed6550cb59f41dc7a4131dc0460775b448316a0

Other than its delivery method, there is nothing new to report on this miner: it keeps a configuration file in the program data folder, runs the mining process as “svchost.exe”, and uses the pool us1.ethermine.org on port 4444.

Hunting

Alongside the successful misleading filenames, we have seen several failed attempts that leave the Unicode character in plain sight:

When hunting for these failed files, it’s a good idea to search for the character in its escaped ASCII (Basic Latin) form.

Ways to search for filenames with Unitrix, in order of efficiency (some legitimate file names contain these patterns, so these are meant for hunting, not for AV or YARA rules):

  • Explicitly search for the U+202E character (the syntax varies between environments), e.g. with a SQL query such as LIKE U&'%\202E%'.

Finding failed attempts:

  • Search for the escaped form in ASCII, e.g. “202E”, “u202E”, “\202E” etc.
  • Search for plaintext flipped extensions, e.g. “gnp.exe”.
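
The three hunting patterns above can be combined into one filename check. This is a sketch under stated assumptions: the `hunt` function and both extension lists are our own illustrative choices, not an exhaustive IOC set, and the bare "202e" substring match will also hit some hash-named files, in line with the false-positive warning above.

```python
RLO = "\u202e"

# Illustrative lists (assumptions, not an exhaustive IOC set).
FLIPPED_EXTS = ("fdp", "gnp", "gpj", "4pm", "3pm", "cod", "slx", "nls", "iva", "tnerrot")
EXECUTABLE_EXTS = ("exe", "scr")


def hunt(filename: str) -> list:
    """Return the hunting patterns a filename matches (expect false positives)."""
    hits = []
    lower = filename.lower()
    if RLO in filename:
        hits.append("literal U+202E")
    elif "202e" in lower:  # escaped forms: 202E, u202E, \202E, ...
        hits.append("escaped 202E")
    for flipped in FLIPPED_EXTS:
        for ext in EXECUTABLE_EXTS:
            if lower.endswith(f"{flipped}.{ext}"):
                hits.append(f"flipped extension {flipped}.{ext}")
    return hits


print(hunt("GTAcheatsheet" + RLO + "fdp.exe"))
print(hunt("photo_u202Egnp.exe"))
```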

Virus Total Hunt

If you’d like to find more samples that use this method, the search options here are countless. Try searching for the following patterns:

-name:‮nls..scr
N.B. The two dots are the result of an attacker using this method without fully understanding it:

-name:‮gnp.exe
-name:‮‮tnerrot.scr
-name:‮cod.scr
-name:‮3pm.exe
-name:‮4pm.exe

Conclusion

We hope this article has convinced you that the Unitrix threat not only still exists, but should be taken seriously. As silly as it may seem, this exploit is widely used all over the world, with new campaigns and malware delivered daily. We’ve seen Unitrix used in cryptominers, infostealers, file infectors, and RATs: basically, everyone is still using it to fool users into running malware.

It’s easy to miss the signs of a Unitrix exploit because files are often submitted for analysis with their hash as the filename instead of the original name, so this very valuable piece of information is usually missing.

There is a strong case to be made that U+202E should be displayed as a visible character in filenames. At the very least, the programming community needs to understand the scope of the problem. In general, Unicode-related threats are all well known, but they are not always dealt with. Let's at least not be tricked by one character.