Analyzing Incident Artifacts in Support of Forensic Identification

“Time isn’t the main thing, it’s the only thing”
Miles Davis

Author: Pablo Canseco

Abstract

The practice of investigating artifacts of interest during a cyber incident can seem overwhelming. The starting point of an incident can be an alert, a suspicious log, a misplaced or misspelled process, anomalous network activity, or what is commonly referred to as an indicator of compromise (IOC). What we see on our displays is merely the tip of an iceberg of underlying layers of digital data. Each layer is translated and delivered to yet another layer until it becomes the information we view on our displays, and that is just the start. When we consume, produce, and transmit information, we leave trails of digital artifacts across complex networks of transmission media and the systems that make transmission possible, even when an attempt is made to delete them.

In this paper, we analyze and present elements that are part of the practice of identifying forensic artifacts of evidence. We explore some of the drivers behind indicators and the intelligence behind them, and some of the factors that can influence forensic practitioners in making a positive identification of malicious artifacts.

Introduction

“Every kind of cyber operation—malicious or not—leaves a trail,” from “A Guide to Cyber Attribution” by the U.S. Office of the Director of National Intelligence.

The story of a network’s compromise is made of many attributes, intricate details, and sometimes puzzling ideas. By the time a threat has entered an organization’s network, it has only won half its battle, and that holds for the attacker as well as the defender.

As the threat actor lurks in the shadows cast by a (possible) SMBv1 vulnerability in your (impossible) Windows XP system, it can remain undetected, slithering laterally to shared drives and file servers, unless a light is cast by tools like Wireshark and others.

Analyzing network artifacts can be tricky for many reasons. The incident responder may witness an indicator of compromise at first sight, but there still needs to be a process of identification and analysis. If a compromise is validated, the question of containment arises; network activity can be severed if an infected host is attempting to establish a connection with an external party to download its instructions or, worse, an advanced payload like a rootkit or ransomware. But how does an incident handler get to that conclusion?

Incident handlers and forensic practitioners work against the clock when a compromise is suspected. They confront many decisions, and each is influenced by how much they know, and do not know, about their hypothetical adversary. If a decision is rushed and ill-informed, they can disrupt services and the availability of certain network activity. If they shut down a live system, they could be ridding the system of important network forensic data held in memory. (Messier, 2017) What if we could “freeze” time or, better yet, replay a series of events as we do with packet captures? This method would give us the opportunity to analyze our network traffic samples offline, even after the threat is contained and eradicated. It could also provide an organization with forensic identification of network artifacts that can help improve its information security posture and its collection and digestion of threat intelligence, and even assist in cyber attribution.

Statement of purpose

Forensic identification is the practice of identifying infected hosts by analyzing evidence of recent infections. (NIST, et al., 2013) In that spirit, we will investigate a scenario in our network that alerts us to the potential presence of a renowned Trojan named LokiBot. We will analyze the alert to understand its drivers and collect any relevant alert information on which to build our investigation. We will then replay a network traffic sample to identify anomalous network activity, basing our investigation on the information we learned from the alert while keeping an open mind to other possibilities. Throughout this paper, we will also elaborate on the intelligence background of key elements that aid our investigation and their relevance, and bring attention to topics that support evidentiary findings, such as timestamp analysis, cyber threat intelligence, beaconing, and others.

The Framework to Our Methods

Our digital world today is abuzz with new nomenclature to describe a wide array of topics, especially in technology, where acronyms appear mandatory, even compulsive, but often playful and fun. All in all, this is fitting for our busier lives and our attempt to do more with less. The same can be said of malware, where the average specimen has 125 lines of code, in stark contrast to the highly heralded Stuxnet malware, with about 15,000 lines of code. This is an important distinction when we consider that many programs, including many designed to thwart malware, contain thousands of lines of code. This leaves Blue Teamers at a seeming disadvantage. However, Tavis Ormandy, author of an analytical paper on Sophos antivirus titled “Sophail: A Critical Analysis of Sophos Antivirus,” notes that defenders do not have to analyze the thousands of lines of code in a program to be able to defend it. (Sikorski & Honig, 2012, p. 23) Defenders can instead focus on the areas that are most vulnerable according to their threat landscape, attack surface, and threat intelligence gathering, and on how those threats tend to manifest in their systems.

After a compromise has gained a foothold in an organization’s systems and network, having passed undetected through the organization’s firewalls, UTMs, or NGFWs, sophisticated intruders can remain undetected and maintain persistence for days, months, and even years until they accomplish their objective(s). (Sikorski & Honig, 2012, pp. 21–22)

Malware can use evasion techniques to hide itself and, at the very least, obfuscate your threat hunting, if your organization has instituted such a noble pursuit to this day. To the end of identifying and containing a malicious actor, monitoring and analyzing your network for an attacker’s lateral movement or call-home beacons to C2 servers is an act of preparedness. It gives further credence to the cliché that it is not a matter of if your organization will be compromised, but when.

A Series of Unfortunate Syslog Events

“Time isn’t the main thing, it’s the only thing,” Miles Davis said.

Timestamp analysis is often overlooked but can ease the burden of event correlation. (Fletcher & Carbone, 2015) The U.S. Department of Justice’s forensic chart, “Digital Forensic Analysis Methodology,” asks during the analysis phase when an event was created, transmitted, modified, or used, and what else happened during that time. (U.S. Department of Justice et al., 2007) If systems and event logs are not synchronized and cannot accurately depict a chain of events, can we confidently answer those questions? Without synchronized clocks, a forensic investigation can become incongruent and mired in inconsistencies, leading to greater risk. (Symmetricom, 2009)

Threat actors can alter the timestamps in a file system, but metadata can also shed light on those changes. (Fletcher & Carbone, 2015) Separately, correlating time between systems and events is essential to piecing together the puzzle left by a series of events; it creates an accurate timeline and representation of what transpired and, very importantly, when. An accurate sequence of events helps a forensic investigator stay organized, provides context, corroborates information, and can reveal inconsistencies. (Fletcher & Carbone, 2015)

It is important to remember that time is organized and logged differently depending on the file system format. (SANS & Carbone, 2016) Consider for a moment that an administrator is aware of critical event logs that cause a server to shut down intermittently. We know that time across systems is commonly synchronized via NTP servers using the UTC standard, but what if our system is using its own domain time server that is, on top of that, out of sync? We could be missing the preceding, current, and succeeding microevents that led to a system crash. Now, let us take that scenario and add more than one machine and, in addition, the immediacy of datagrams or packets arriving almost instantaneously one after another. If we were looking at a security breach, we would be interested in knowing the exact time the compromise took place, what the threat actor did afterward, and what the organization’s security team did or did not do. Synchronizing devices to a single time source is especially important while conducting forensic analysis. (Fletcher & Carbone, 2015)
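The correlation problem described above can be sketched in a few lines of Python. This is a minimal illustration, not a forensic tool: the host names, timestamps, UTC offsets, and the 120-second clock skew are all hypothetical values chosen for the example, and it assumes the skew of each source is already known.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time, utc_offset_hours, clock_skew_seconds=0.0):
    """Convert a naive local timestamp string to UTC and remove a known clock skew.

    clock_skew_seconds is how far the source clock runs AHEAD of true time.
    """
    naive = datetime.strptime(local_time, "%Y-%m-%d %H:%M:%S")
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return aware.astimezone(timezone.utc) - timedelta(seconds=clock_skew_seconds)


def build_timeline(events):
    """Return (utc_time, host, message) rows sorted into one chronological timeline."""
    rows = [(to_utc(ts, offset, skew), host, msg)
            for host, ts, offset, skew, msg in events]
    return sorted(rows, key=lambda row: row[0])


# Hypothetical log entries: (host, local timestamp, UTC offset in hours,
# known clock skew in seconds, message).
events = [
    ("file-server", "2020-11-20 09:15:07", -5, 0.0, "service stopped"),
    ("domain-controller", "2020-11-20 14:15:02", 0, 0.0, "logon failure"),
    ("workstation", "2020-11-20 15:17:04", 1, 120.0, "process created"),
]

for when, host, message in build_timeline(events):
    print(when.isoformat(), host, message)
```

Read as local wall-clock times, the three entries appear hours apart; once normalized to UTC and corrected for the workstation's fast clock, they fall within a few seconds of each other, which is the kind of ordering an investigator needs before drawing conclusions.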

Wireshark Time

Wireshark keeps an internal timestamp in UTC; however, if the packet capture is taken from a system whose clock is incorrect, the capture times will be skewed. (Wireshark, n.d.) For example, if one works in a team made up of members across the globe, a PCAP captured in a different time zone will display in local time unless we make the conversion. When multiple PCAP files of the same stream of data, captured at two distinct locations, need to be synchronized, the Time Shift feature in Wireshark can be used.

To reference time around a packet in a packet capture, Wireshark can be set to reference one particular packet as the basis of time instead of the default of taking time in relation to the beginning of the capture. (Sanders, 2017)
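The arithmetic behind a time reference and a time shift is simple enough to sketch outside of Wireshark. The snippet below is an illustration of the idea only and does not use or reproduce Wireshark's implementation; the epoch values are hypothetical.

```python
from datetime import datetime, timezone


def relative_to_reference(epochs, ref_index):
    """Express every packet time as seconds relative to a chosen reference packet."""
    ref = epochs[ref_index]
    return [round(t - ref, 6) for t in epochs]


def shift(epochs, offset_seconds):
    """Apply a constant offset, e.g. to align a capture taken on a skewed clock."""
    return [t + offset_seconds for t in epochs]


def as_utc(epoch):
    """Render an epoch timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


# Hypothetical capture timestamps in seconds since the Unix epoch.
packet_times = [1605881700.0, 1605881700.25, 1605881702.5]

print(relative_to_reference(packet_times, 1))  # times around the second packet
print(as_utc(shift(packet_times, -120.0)[0]))  # first packet after a 2-minute correction
```

Packets before the reference receive negative values, which makes it easy to see what happened immediately before and after the packet of interest.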

Indicators of Compromise

An indicator of compromise is any information expressed independently to describe an intrusion. (Sanders & Smith, 2013) Below, we will look at some of the most salient indicators of compromise for a positive identification of malicious network artifacts.

LokiBot Malware

LokiBot has seen a resurgence of activity since July 2020, according to the Cybersecurity and Infrastructure Security Agency (CISA), which was led by Chris Krebs until November 17, 2020. (Cybersecurity & Infrastructure Security Agency, 2020) LokiBot belongs to a family of malware that steals information such as credentials, and it is a renowned keylogger.

In our case scenario, our Computer Incident Response Team (CIRT) has just received an alert from their Suricata IDS.

The alert has found a match to the rule below:

The Emerging Threats signature rule above explicitly identifies the threat as a trojan; this information forms the basis of our suspicion.

Below, we identify the top talkers on the network. We can also deduce that sizeable data packets are transmitted to the external IP 104.27.191.242, which indicates a type of upload. Talos Reputation Center reveals that this IP belongs to Cloudflare, a cloud-service infrastructure provider. (Cisco Talos Intelligence Group, n.d.)

Wireshark Screenshot 1
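The idea of a top-talkers view can be sketched as a simple byte count per source and destination pair. The two IP addresses of interest come from our scenario, but the packet sizes and the 10.1.16.1 address below are hypothetical placeholders, not values taken from the capture.

```python
from collections import Counter


def top_talkers(packets, n=3):
    """Sum bytes per (source, destination) pair and return the n largest flows."""
    totals = Counter()
    for src, dst, length in packets:
        totals[(src, dst)] += length
    return totals.most_common(n)


# Hypothetical packet summaries: (source IP, destination IP, length in bytes).
sample = [
    ("10.1.16.101", "104.27.191.242", 1400),
    ("10.1.16.101", "104.27.191.242", 1400),
    ("104.27.191.242", "10.1.16.101", 300),
    ("10.1.16.101", "10.1.16.1", 80),
]

for (src, dst), total in top_talkers(sample):
    print(f"{src} -> {dst}: {total} bytes")
```

Counting each direction separately is a deliberate choice here: a flow in which the internal host sends far more than it receives is what hints at an upload.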

Our attention turns to the sizeable traffic between these two hosts, especially due to the nature of the exchange. We use the Wireshark display filter expression ip.dst==10.1.16.101 && ip.src==104.27.191.242 to narrow the view to traffic from the external IP to our internal host.

Wireshark Screenshot 2

As we see above, exporting the HTTP object list reveals two sources. The first is a well-known Microsoft service file that checks for network connectivity; therefore, it is normal. (Microsoft, 2008) The remaining HTTP traffic over port 80 goes to the external IP under the hostname himkon.ga, but how do we know whether this is intentional on the part of the user or incidental?

Digging further into the traffic between our internal host and the external server, we see that our internal host is sending a POST request to the Uniform Resource Identifier (URI) hxxp://himkon.ga/amaswitch/fre.php. A POST request is common when a user is sending information to a server, such as when uploading a file or filling out a form. The question then becomes: how do we know our user is not filling out a web form or conducting other legitimate and innocuous activity?

Wireshark Screenshot 4

In the example above, we can clearly see the content type alongside a Mozilla User-Agent. The content type application/octet-stream tells us the data sent is binary, but its true data type is unknown. (Mozilla, 2019)

Thus far we can enumerate a number of red flags (other than the alert), including a 404 error when our internal host contacts the PHP landing page. This tells us that the web server, which is potentially a Command & Control server, is not fielding ordinary “web” requests over port 80.
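The red flags enumerated above can be written down as a small checklist. This is a hedged sketch of the reasoning, not a detection rule: the function name and the wording of each flag are our own, and none of these conditions is malicious in isolation.

```python
def red_flags(method, uri, content_type, status_code):
    """Return the list of red flags raised by one HTTP request/response pair."""
    flags = []
    if method.upper() == "POST":
        flags.append("POST request sends data from the host to the server")
    # Ignore any parameters after the media type, e.g. "; charset=...".
    if content_type.split(";")[0].strip().lower() == "application/octet-stream":
        flags.append("body is opaque binary (application/octet-stream)")
    if status_code == 404:
        flags.append(f"server answered 404 for {uri}")
    return flags


# The URI path comes from our scenario; the benign example is hypothetical.
suspicious = red_flags("POST", "/amaswitch/fre.php", "application/octet-stream", 404)
benign = red_flags("GET", "/index.html", "text/html", 200)

print(len(suspicious), "flags:", suspicious)
print(len(benign), "flags:", benign)
```

The value lies in the combination: a form submission normally posts a form-encoded body and receives a successful response, whereas our exchange posts binary data to a page the server claims does not exist.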

The information we have gathered from the network artifacts above tells us half the story. It is enough, however, for a positive identification that should prod the CIRT to move forward with containment and a static analysis of the infected host machine, such as looking for malign or hijacked processes and hidden directories and files.

In the after-action review, we may want to look at the end user’s recent downloads from the web before the suspected network traffic occurred, including phishing emails with malicious links or dangerous file type attachments, among others. The information we gather after we eradicate the malware can tell us what went wrong and where improvements need to be made, including other preventative measures that the organization can implement.

Malware Attributes & Traits

In our LokiBot example, we identified an Emerging Threats rule, which was our first alert to suspicious activity on our network. Signature-based rules are designed to look for specific and known properties in network packets. However useful this alert is, it still relies on an existing fingerprint that has been previously disclosed, which can include hashes and/or file names. Just like with antivirus software, if the traffic does not match the signature, the alert is not raised. This is an important consideration given that, according to Trend Micro, much malware code is recycled from parts of other malware. (Trend Micro, 2018) We can see the evidence in many infamous examples, like the NSA EternalBlue exploit leaked by a group named Shadow Brokers, which paved the way for worms like WannaCry and NotPetya that wreaked havoc around the world. (Greenberg, 2018)
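The limitation of an exact fingerprint can be demonstrated with a file hash. In this minimal sketch the “sample” is a harmless, made-up byte string standing in for a disclosed specimen; changing a single byte, as a recycled variant might, is enough to evade the hash-based signature.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def matches_signature(data, known_hashes):
    """An alert is raised only when the digest is already in the known set."""
    return sha256_hex(data) in known_hashes


# Illustrative bytes only; these are not real malware samples.
original = b"illustrative payload bytes, not real malware"
variant = original + b"\x00"  # a one-byte change, as a recycled variant might have

known_hashes = {sha256_hex(original)}

print(matches_signature(original, known_hashes))  # the disclosed sample is caught
print(matches_signature(variant, known_hashes))   # the modified variant is missed
```

This is why hashes and file names work best alongside behavioral indicators, such as the network traits examined earlier, which tend to survive small changes to the code.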

Two of the most popular Intrusion Detection System engines are Snort and Suricata, and their primary ruleset sources are Emerging Threats (ET) and the Vulnerability Research Team (VRT). Both ruleset sources have open-source, community-driven rulesets available for download for each engine, as well as a paid subscription service. Though these engines allow rules to be preset with priority numbers, some organizations have multiple engines running at the same time and in separate network domains. (Sanders & Smith, 2013)

Now that we have a background about where some of the rulesets come from, we can unpack our LokiBot rule example.

It is important to reiterate that signature-based IDS rules look for indicators of compromise in the network packet data, in much the same way that we scoured for anomalous evidence in Wireshark.

A rule has two syntax sections: the header and the options. The first word in the header indicates an action, here the word “alert,” followed by the protocol used. LokiBot is known for beaconing over HTTP, hence the HTTP element in the rule. What follows are the source and destination to which the rule applies, and this is important in understanding the data flow we want our rule to cover. The flow direction in this instance is crucial considering that, as a renowned keylogger, LokiBot may be exfiltrating data out of our network. The content option of the rule identifies Application layer data, which can also be viewed in our Wireshark analysis. The flow option requires that there is an ongoing connection in the direction of the server. The remaining options are not as relevant to our analysis at this point because they relate more to how the rule should work with the IDS engine to detect the threat.
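The header and options structure described above can be illustrated by splitting a rule into its parts. The rule string below is a hypothetical example written for this sketch in Suricata-style syntax; it is not the Emerging Threats LokiBot rule, and the naive split on semicolons would not handle escaped semicolons inside option values.

```python
# Hypothetical rule for illustration only; not the actual Emerging Threats signature.
RULE = ('alert http $HOME_NET any -> $EXTERNAL_NET any '
        '(msg:"EXAMPLE hypothetical trojan check-in"; '
        'flow:established,to_server; content:"POST"; http_method; '
        'sid:1000001; rev:1;)')


def parse_rule(rule):
    """Split a rule into a header dictionary and an ordered list of options."""
    header_text, _, options_text = rule.partition("(")
    action, protocol, src, src_port, direction, dst, dst_port = header_text.split()
    header = {
        "action": action, "protocol": protocol,
        "source": src, "source_port": src_port,
        "direction": direction,
        "destination": dst, "destination_port": dst_port,
    }
    options = []
    for item in options_text.rstrip().rstrip(")").split(";"):
        item = item.strip()
        if not item:
            continue
        # Options are either "key:value" or a bare keyword such as http_method.
        key, _, value = item.partition(":")
        options.append((key.strip(), value.strip().strip('"')))
    return header, options


header, options = parse_rule(RULE)
print(header["action"], header["protocol"], header["source"], "->", header["destination"])
for key, value in options:
    print(f"  {key}: {value}")
```

Seen this way, the header answers who is talking to whom and over which protocol, while the options describe what the engine should look for inside that traffic.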

IP Bread Crumbs & Beaconing

An IP address is an example of an atomic indicator, in that it is very specific and cannot be broken down further. (Sanders & Smith, 2013) While a source IP is not a definitive object of attribution or malicious intent, it does provide statistical data sets for further analysis. For example, it can provide quantitative information about the number of attacks from a certain pool of source IPs, and it can correlate those attacks with other criminal activity, which can indicate the reuse of criminal infrastructure. (McAfee, 2020) In 2020, the IP geolocations where cybercrime against cloud accounts originated were predominantly in countries where cybercrime laws are not typically enforced. In contrast, none were in Europe, which is largely attributed to its having some of the most stringent data protection laws. (McAfee, 2020) If you are part of an organization that has a significant footprint in the cloud with products like Microsoft Office 365, you are encouraged to use defenses against password spraying, such as enforcing password complexity, authentication failure lockouts, and multi-factor authentication. It is worth noting that IP addresses can be shared in threat intelligence feeds for blocklists and can be analyzed with GeoIP in Wireshark.
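Checking observed addresses against a shared blocklist can be sketched with Python’s standard ipaddress module. The blocklist networks and two of the addresses below are hypothetical and drawn from ranges reserved for documentation; only 10.1.16.101 comes from our scenario, and it is included as an address that should not match.

```python
import ipaddress


def blocklist_hits(addresses, blocklist_cidrs):
    """Map each address to the blocklist networks that contain it, if any."""
    networks = [ipaddress.ip_network(cidr) for cidr in blocklist_cidrs]
    hits = {}
    for text in addresses:
        ip = ipaddress.ip_address(text)
        matched = [str(net) for net in networks if ip in net]
        if matched:
            hits[text] = matched
    return hits


# Hypothetical threat-feed entries (documentation address ranges).
feed = ["203.0.113.0/24", "198.51.100.0/25"]
observed = ["203.0.113.45", "198.51.100.200", "10.1.16.101"]

print(blocklist_hits(observed, feed))
```

Matching on networks rather than single addresses reflects how feeds are often distributed, but it also shows the cost of an atomic indicator: 198.51.100.200 sits just outside the listed /25 and passes unnoticed.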

In terms of cyber attribution, IP addresses can hint at the point of origin and the infrastructure when assessing responsibility (Office of the Director of National Intelligence, 2018), but they can be obfuscated given the availability of VPNs. IP addresses are still a factor to consider, however, especially if an organization is being DoS’ed from a particular source.

Just as malware can display the presence of recycled code, giving away hints of another group’s involvement, threat actors can also be creatures of habit, using the same C2 servers and revealing the same approaches, objectives, and whereabouts. (FireEye, 2014)
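Call-home beacons to a C2 server often stand out because of their regularity. The sketch below uses one common heuristic, the variation of the intervals between connections, as an assumption of ours rather than a method taken from the sources cited here; the timestamps and the 0.1 threshold are hypothetical.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Return the coefficient of variation of the gaps between connections.

    Values near zero mean the connections are evenly spaced. Returns None
    when there are too few connections to judge.
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(intervals)
    if average == 0:
        return None
    return pstdev(intervals) / average


def looks_like_beacon(timestamps, max_variation=0.1):
    """Flag a series of connections whose spacing is suspiciously regular."""
    score = beacon_score(timestamps)
    return score is not None and score <= max_variation


# Hypothetical connection times in seconds from the start of a capture.
regular = [0, 60, 120.5, 180, 240.2]  # roughly one connection per minute
irregular = [0, 5, 200, 210, 900]     # bursty, human-like browsing

print(looks_like_beacon(regular))
print(looks_like_beacon(irregular))
```

Malware authors can add jitter to their beacons, so a low score is a reason to look closer rather than proof of compromise, and a high score does not clear a host.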

Life Expectancy of Indicators & Cyber Threat Intelligence (CTI)

All intelligence has a shelf life. Consider indicators, which also have an evolutionary and maturity model. When a new malware sample is discovered, its parts are collected, identified, and shared until organizations are able to add a detection signature; this period is known as the immature phase. As signatures are updated and fewer false positives are detected, indicators reach their maturity phase, and they are retired when they become ineffective. (Sanders & Smith, 2013)

As new variants of malware are found in the wild, new intelligence is necessary to combat these threats. Security teams can fine-tune signatures and gauge a signature’s performance by its effectiveness in detecting threats. But an organization’s risk can become commensurate with its reliance on that intelligence. In other words, the more an organization relies on community-based signatures, for example, the more it depends on their effectiveness. And if an organization’s Cyber Threat Intelligence is outdated, it can skew key decisions, with implications for its security posture.

U.S. Government agencies have CISA’s EINSTEIN project, which can generate around thirty thousand alerts daily. (CISA, n.d.-b) The partnership between the public and private sectors has grown stronger in the past years as U.S. critical infrastructure faces constant threats. (CISA, n.d.-a) Private sector companies can also find resources to update their strategic, tactical, and operational CTI. Organizations may use research reports from well-established vendors on the latest threats. For example, according to the latest report by McAfee, 2020 has seen a dramatic increase in threats that claim association with COVID-19, whether in phishing attacks or in others, such as the reported increase of 1902% in PowerShell attacks over the last four quarters. (McAfee, 2020) This means that the threat landscape has shifted as more companies allow their employees to work from home. If an organization’s security posture relied heavily on its employees working from the office behind a firewall, it has lost that vantage point since COVID-19 began. The presence or absence of adequate antivirus or endpoint protection would certainly have a greater impact now. Case in point: the increase in PowerShell attacks in 2020 can be attributed to a Trojan family named Donoff and, as we know, Trojans can be downloaded from the Internet.

Static Indicators of Compromise

In contrast to network-based indicators of compromise are those changes that take effect or present themselves on the infected host machine. Let us return to our LokiBot example. We know that a very common attack vector for LokiBot is phishing, using a technique described in MITRE’s ATT&CK in which a user is tricked into executing a malicious file. (CISA, 2020) This means that our user may have downloaded the Trojan file, which then installed itself in the file system. We know that malware tends to hide itself in hidden or temporary folders in Windows such as %APPDATA%, and while there are shared repositories of hashes of LokiBot executable files, detection is not assured because of how frequently malware is recycled by different group campaigns. We also know that, as a password stealer, LokiBot is known to implement a user-space Windows keylogger by hooking through the Windows API function SetWindowsHookEx, and it may include a DLL file that also hijacks other processes. (Sikorski & Honig, 2012) With the combination of known indicators of compromise,

However, because we were able to identify network artifacts characteristic of a known malware, this information can provide a CIRT with the insight to eradicate the threat.
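A host-based sweep of a folder such as the one %APPDATA% points to can be sketched as follows. This is a minimal illustration under our own assumptions: the list of extensions and the function name are hypothetical choices, and the resulting digests would still need to be compared against a shared repository of known hashes, with the caveats about recycled malware noted above.

```python
import hashlib
import os

# Assumed file types of interest for this sketch; adjust to the investigation.
SUSPECT_EXTENSIONS = {".exe", ".dll"}


def hash_executables(root):
    """Walk a directory tree and return {path: SHA-256 digest} for executable-like files."""
    results = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in SUSPECT_EXTENSIONS:
                path = os.path.join(folder, name)
                with open(path, "rb") as handle:
                    results[path] = hashlib.sha256(handle.read()).hexdigest()
    return results


if __name__ == "__main__":
    # On Windows this expands to the user's AppData\Roaming folder; elsewhere
    # the variable is usually unset and the current directory is used instead.
    target = os.environ.get("APPDATA", ".")
    for path, digest in hash_executables(target).items():
        print(digest, path)
```

Reading each file fully into memory keeps the sketch short; a production sweep would read in chunks and handle files that are locked or unreadable.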

E Pluribus Unum

The techniques we have used in this paper to analyze malicious network artifacts rely on Wireshark, a protocol analyzer, with references to Network Intrusion Detection Systems and Cyber Threat Intelligence, among others. This is a high-level, network-based approach to some of the characteristics of adverse events, and it is complementary to other means of forensic identification. It is similar to deconstructing a malware sample, where one may use static analysis to understand the malware’s functionality and even assist in creating network-based signatures like the ones found in Emerging Threats and VRT for Suricata and Snort. Yet static analysis of malware samples can be largely ineffective against more sophisticated types of malware behavior. (Sikorski & Honig, 2012) In dynamic analysis, analysts benefit from running a malware sample in a sandbox to gain a deeper understanding of how, and the extent to which, a malware may affect the system, its behavior, values allocated in memory, the registry, and more.

In Conclusion

We have scratched the surface of some of the techniques and tools used in the detection and identification of indicators of compromise. For every tool and technique we have examined to uncover the underpinnings of malicious activity, we can expect an equal effort to obfuscate those findings. We can conclusively state, however, that a diversity of methodologies, techniques, tools, and skills can aid in the detection, handling, forensic collection, and analysis of an incident. As we have demonstrated in this paper, threats leave digital traces of themselves in a multitude of forms, and while no single attribute makes a threat whole, threats can variably demonstrate common traits. Those traits are distinct samples of the digital traces they left elsewhere, and I believe we can gain confidence in arguing that cyber defenders and threat hunters alike are also improving their capabilities to remain one step ahead of the adversary.

References

Cisco Talos Intelligence Group. (n.d.). IP and Domain Reputation Center || Cisco Talos Intelligence Group – Comprehensive Threat Intelligence. Cisco Talos. Retrieved November 20, 2020, from https://talosintelligence.com/reputation_center/

Cybersecurity & Infrastructure Security Agency. (n.d.-a). Critical Infrastructure Partnerships and Information Sharing | CISA. CISA. Retrieved November 20, 2020, from https://www.cisa.gov/critical-infrastructure-partnerships-and-information-sharing

Cybersecurity & Infrastructure Security Agency. (n.d.-b). EINSTEIN | CISA. CISA. Retrieved November 20, 2020, from https://www.cisa.gov/einstein

Cybersecurity & Infrastructure Security Agency. (2020, September 22). LokiBot Malware | CISA. CISA. https://us-cert.cisa.gov/ncas/alerts/aa20-266a

Duncan, B. (2019, January 11). Wireshark Tutorial: Display Filter Expressions. Palo Alto Networks. https://unit42.paloaltonetworks.com/using-wireshark-display-filter-expressions/

FireEye. (2014). Digital Bread Crumbs. https://www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/rpt-digital-bread-crumbs.pdf

Fletcher, D., & Carbone, R. (2015, July 15). Forensic Timeline Analysis using Wireshark. SANS Digital Forensics and Incident Response. https://digital-forensics.sans.org/community/papers/gcfa/forensic-timeline-analysis-wireshark-giac-gcfa-gold-certification_9603

Greenberg, A. (2018, December 7). The Untold Story of NotPetya, the Most Devastating Cyberattack in History. Wired. https://www.wired.com/story/notpetya-cyberattack-ukraine-russia-code-crashed-the-world/

McAfee. (2020, July). McAfee Labs COVID-19 Threats Report. https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-july-2020.pdf

Microsoft. (2008, November 25). Network Connectivity Status Indicator and Resulting Internet Communication in Windows Vista. Microsoft Docs. https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-vista/cc766017(v=ws.10)?redirectedfrom=MSDN

Mozilla. (2019, November 3). MIME types (IANA media types). MDN Web Docs. https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types

National Institute of Standards and Technology, Souppaya, M., & Scarfone, K. (2013, July). NIST SP 800-83r1. Guide to Malware Incident Prevention and Handling for Desktops and Laptops. NIST. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-83r1.pdf

Office of the Director of National Intelligence. (2018, September 14). A Guide to Cyber Attribution. DNI. https://www.dni.gov/files/CTIIC/documents/ODNI_A_Guide_to_Cyber_Attribution.pdf

Pantazopoulos, R. (2017, June 27). Loki-Bot: Information Stealer, Keylogger, & More! SANS. https://www.sans.org/reading-room/whitepapers/malicious/loki-bot-information-stealer-keylogger-more-37850

Sanders, C., & Smith, J. (2013). Applied Network Security Monitoring: Collection, Detection, and Analysis (1st ed.). Syngress.

SANS, & Carbone, R. (2016, March 23). Filesystem Timestamps: What Makes Them Tick? SANS. https://digital-forensics.sans.org/community/papers/gcfa/filesystem-timestamps-tick_11322

Sikorski, M., & Honig, A. (2012). Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software (1st ed.). No Starch Press.

Symmetricom. (2009, May 19). A Hidden Security Danger: Network Timing The role of accurate timing in reducing network security risk. Uplink. http://www.uplink.no/Portals/0/NTP%20Hidden%20Security%20Danger.pdf

Trend Micro. (2018, February 5). How Hackers Recycle Top Threats. Trend Micro, Inc. https://blog.trendmicro.com/how-hackers-recycle-top-threats/

U.S. Department of Justice, Carroll, O., Song, T., & Brannon, S. (2007, August 22). Digital Forensic Analysis Methodology. U.S. Department of Justice. https://www.justice.gov/sites/default/files/criminal-ccips/legacy/2015/03/26/forensics_chart.pdf

Wireshark. (n.d.). 7.6. Time Stamps. Retrieved November 20, 2020, from https://www.wireshark.org/docs/wsug_html_chunked/ChAdvTimestamps.html