Unleashing Intelligence: Transforming NetFlow and Flow Log Data into Actionable Intelligence
Tom Dixon, Senior Field Engineer
In our modern digital landscape, organizations are drowning in an ocean of data. Raw data in isolation lacks meaning and utility, so collecting more of it only adds hay to the proverbial haystack, making the needle even harder to find.
But, and this is the main point of this blog post, by leveraging the power of context and employing advanced analytical techniques, we can start to extract valuable information and, ultimately, turn that ocean into actionable intelligence. The process has two steps: turn data into information, and then turn that information into intelligence.
- Data is raw facts: numbers, values, metrics, etc.
- Information is generated by combining data and presenting it in a way that answers questions.
- Intelligence is the combination of information and context that allows us to make informed decisions.
For example:
419307235394 1688685620 1688690060 ACCEPT use1-az3 us-east-1 172.31.10.126 813 83.249.123.141 56376 10798456351 7254123 egress i-0de814a9e4cc66495 eni-00e1cef132ce581b0 OK 16 - us-east-1 - - subnet-0f0ab6b9e12aefb52 19 - IPv4 5 vpc-02554f4d12fd1d472
This is the type of raw data we get from NetFlow, VPC Flow Logs, and NSG Flow Logs.
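To make the record a little less opaque, here is a minimal Python sketch that splits it into named fields. The field order in `ASSUMED_FIELDS` is an assumption inferred from the sample values (custom-format flow logs can list fields in any order), and `parse_flow_record` is a hypothetical helper, not part of any product or SDK.

```python
# Field order is an ASSUMPTION inferred from the sample values; custom-format
# flow logs can list fields in any order, so check your own log format.
ASSUMED_FIELDS = [
    "account_id", "start", "end", "action", "az_id", "region",
    "srcaddr", "srcport", "dstaddr", "dstport", "bytes", "packets",
    "flow_direction", "instance_id", "interface_id", "log_status",
]

# Fields that are easier to work with as integers.
INT_FIELDS = {"start", "end", "srcport", "dstport", "bytes", "packets"}


def parse_flow_record(line: str) -> dict:
    """Split a space-separated flow record into a dict of named fields.

    Only the first len(ASSUMED_FIELDS) values are named; the rest are ignored.
    """
    record = dict(zip(ASSUMED_FIELDS, line.split()))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


SAMPLE = (
    "419307235394 1688685620 1688690060 ACCEPT use1-az3 us-east-1 "
    "172.31.10.126 813 83.249.123.141 56376 10798456351 7254123 egress "
    "i-0de814a9e4cc66495 eni-00e1cef132ce581b0 OK"
)

record = parse_flow_record(SAMPLE)
print(record["srcaddr"], "->", record["dstaddr"], record["bytes"], "bytes")
```

Even with names attached, this is still only data: it says what was observed, not what it means.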
However, as is typical with raw data, it is difficult to comprehend and offers no actionable insight on its own. To unlock its potential, we need to process the data and enrich it with relevant context, first producing information and then completing the transformation into intelligence.
Information answers questions such as: How much data is leaving my Azure production environment? How many hosts are communicating with New Zealand? How much SSH traffic is in my environment? By enriching the original flow data and creating a common dataset to operate from, those raw IPs and ports suddenly take on meaning, and we can transform the original example above into:
A device inside my network communicated with an external destination over port 813, and more than 10 GB of data was transferred outbound.
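As a rough illustration of that step, the sketch below derives a similar sentence from the record's values. It assumes the field meanings shown in the `flow` dict (inferred from the sample, not confirmed), and it uses "private address" as a stand-in for "inside my network", which is a simplification. `summarize` is a hypothetical helper.

```python
import ipaddress

# Values copied from the sample record; the field names are ASSUMED.
flow = {
    "srcaddr": "172.31.10.126",
    "srcport": 813,
    "dstaddr": "83.249.123.141",
    "bytes": 10798456351,
    "start": 1688685620,
    "end": 1688690060,
}


def side(address: str) -> str:
    """Label an address internal or external (private range = internal)."""
    return "internal" if ipaddress.ip_address(address).is_private else "external"


def summarize(flow: dict) -> str:
    """Turn one flow record into a human-readable statement."""
    gigabytes = flow["bytes"] / 1e9
    minutes = (flow["end"] - flow["start"]) / 60
    return (
        f"An {side(flow['srcaddr'])} device ({flow['srcaddr']}) sent "
        f"{gigabytes:.1f} GB to an {side(flow['dstaddr'])} destination "
        f"({flow['dstaddr']}) from port {flow['srcport']} "
        f"over {minutes:.0f} minutes"
    )


summary = summarize(flow)
print(summary)
```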
Finally, we can garner real intelligence by adding first-party context that helps us make decisions and take action. This context tells me what device an IP address actually is and, by extension, what it should and should not be doing.
My AWS-hosted web server in us-east-1, which is vulnerable to CVE:xxxxx, communicated with a unique destination that is ‘known hostile’ over ports designated for my custom application, resulting in over 10 GB of data being transferred outbound. It did this because vulnerability ‘xyz’ was exploited, and the data could be transferred because the server was assigned a security group of “temp-all-access”.
Context breathes life into raw data, providing the necessary background and understanding for analysis. By integrating additional data sources such as threat intelligence feeds, asset inventories, geolocation data, and user information, we can contextualize the network traffic events. This process adds valuable dimensions, enabling us to interpret the data in a more meaningful way.
Data enrichment involves augmenting the raw NetFlow, VPC Flow Log, and NSG Flow Log data with contextual information. This can be achieved through various methods, such as:
- IP Reputation Analysis: Leveraging threat intelligence feeds to identify known malicious IP addresses and flag potentially suspicious traffic.
- GeoIP Mapping: Determining the geographic origin of network traffic by mapping IP addresses to physical locations. This information can help identify unusual traffic patterns or potential threats.
- User Context: Associating network traffic events with specific users or devices to understand their behavior and detect anomalies.
- Integration with Asset Inventories: Linking network traffic data with asset inventory databases allows for improved identification of assets involved in network communications.
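The enrichment methods above can be sketched as simple lookups. Everything in this example is hypothetical: the three tables are made-up stand-ins for a real threat feed, GeoIP database, and asset inventory, and the addresses come from ranges reserved for documentation. It is meant to show the shape of the idea, not a production design.

```python
import ipaddress

# HYPOTHETICAL stand-ins for real data sources (documentation IP ranges).
THREAT_FEED = {"203.0.113.7"}                             # known-hostile IPs
GEOIP = {ipaddress.ip_network("203.0.113.0/24"): "NZ"}    # network -> country
ASSETS = {                                                # IP -> asset details
    "10.0.1.20": {"name": "web-01", "owner": "storefront", "env": "prod"},
}


def lookup_country(address: str):
    """Return the country of the first matching network, or None."""
    ip = ipaddress.ip_address(address)
    return next((cc for net, cc in GEOIP.items() if ip in net), None)


def enrich(flow: dict) -> dict:
    """Return a copy of the flow with reputation, geo, and asset context."""
    enriched = dict(flow)
    enriched["dst_known_hostile"] = flow["dstaddr"] in THREAT_FEED
    enriched["dst_country"] = lookup_country(flow["dstaddr"])
    enriched["src_asset"] = ASSETS.get(flow["srcaddr"])  # None if unknown
    return enriched


flow = {"srcaddr": "10.0.1.20", "dstaddr": "203.0.113.7", "bytes": 52_000}
enriched = enrich(flow)
print(enriched)
```

Returning a copy rather than mutating the input keeps the raw record intact, which is useful when the same flow is enriched again later with fresher context.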
Once the data has been enriched and contextualized, it becomes information that can be analyzed. Analytical techniques such as statistical analysis, anomaly detection, machine learning, and data visualization can be applied to extract valuable insights. Some common objectives of analysis include:
- Visibility On-Premises and in the Cloud: Understanding the normal behavior of network traffic to identify deviations and potential security incidents.
- Compromise and Threat Detection: Analyzing patterns and indicators of compromise to detect and respond to cyber threats in real time.
- Performance Optimization: Identifying network bottlenecks, optimizing resource allocation, and improving overall network performance.
- Governance: Leveraging the enriched data to ensure compliance with regulatory requirements and to conduct audits when necessary.
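As one small example of what anomaly detection can look like, the sketch below flags hosts whose outbound volume is far above the rest, using a modified z-score based on the median absolute deviation. This is just one simple technique chosen for illustration, not a prescribed method; the host totals are invented, and `flag_outliers` is a hypothetical helper.

```python
from statistics import median


def flag_outliers(bytes_by_host: dict, threshold: float = 3.5) -> list:
    """Return hosts whose outbound bytes are unusually high.

    Uses the modified z-score: 0.6745 * (x - median) / MAD, where MAD is
    the median absolute deviation. Only high-side outliers are flagged.
    """
    values = list(bytes_by_host.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, so the score is undefined
    return sorted(
        host
        for host, value in bytes_by_host.items()
        if 0.6745 * (value - med) / mad > threshold
    )


# INVENTED per-host outbound totals, in bytes, for illustration only.
outbound = {
    "10.0.0.5": 1_200_000,
    "10.0.0.6": 1_500_000,
    "10.0.0.7": 900_000,
    "10.0.0.8": 1_100_000,
    "10.0.0.9": 10_798_456_351,
}

suspects = flag_outliers(outbound)
print(suspects)
```

A median-based score is used here instead of a mean-based one because a single very large transfer would otherwise inflate the average and hide itself.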
By combining the enriched information with expert knowledge, experience, and domain-specific insights, we can transform it into actionable intelligence. Intelligence empowers organizations to make informed decisions, proactively mitigate risks, detect emerging threats, and optimize network operations. This intelligence can be disseminated through alerts, reports, dashboards, and other channels, enabling stakeholders to act swiftly and effectively.