Graph Data Analytics for Cybersecurity

The ever-increasing number of significant security incidents has led to an emerging interest in combating cyber threats. To enable cooperation in preventing attacks, it is necessary to have structured and standardized formats to describe an incident. Currently used formats are complex and extensive as they are designed for automated processing. This hampers readability and prevents humans from understanding the documented incident.

Cyber security experts have a challenging job. They analyze vast datasets to find and patch security holes and to track anomalies. Reacting quickly to an attack is vital. Structured threat intelligence is crucial because it helps experts understand attacks, but only if they can read and analyze it. They also need to be able to edit it easily to add missing or additional information.

Cybercrime is booming

Organizations like LinkedIn, Sony, the CIA, and NASDAQ have been hacked in the last few years. These breaches have resulted in millions of dollars in lost revenue, exposure of private information, and downtime.

These attacks show no sign of stopping. Criminals are well aware of the value of information. Today, there is a black market for zero-day exploits, attacks that take advantage of a previously unknown security vulnerability. The best hackers can sell their discoveries to the highest bidder. The market is booming, stimulated by governments looking to arm themselves.

Cyber security teams are under increasing pressure. They can draw on vast amounts of data to defend their organizations. Standard monitoring systems can generate terabytes of data, but can that data be used to thwart attacks?

Graph technologies can tackle big data

Large enterprises generate an estimated 10 to 100 billion events each day. Security teams draw on a pool of data from IP logs, communication and server logs, and network logs. This volume is challenging for traditional security information and event management (SIEM) tools. They are designed to analyze system events, records, and network flows for intrusion detection, but they struggle with data at this scale.

Volume is not even the most significant issue. Security data is often unstructured and comes from incomplete, heterogeneous sources. Storing such data in table-oriented tools can work, but at a price:

  • complexity in structuring and querying the data;

  • poor performance when querying the data;

  • difficulty integrating new sources.

Graph databases like InfiniteGraph and Neo4j make it easy to store and query unstructured data, even as the volume grows.

What does a cyber attack look like?

Let’s use a concrete example to understand why graph technologies help cyber security. Cisco has published a blog post detailing how its graph analytics capability can protect customers against zero-day exploits, which target previously undiscovered security flaws in software. Attackers can use such a flaw to attack systems in the window between its discovery and the release of a patch.

For example, the domains used in a phishing attack are linked to several entities:

  • a registrar: a commercial entity or organization that manages the reservation of Internet domain names;

  • a name server: a software server or computer hardware that implements a network service to respond to queries by turning a domain name into an IP address;

  • an IP address: a numerical label assigned to each device in a computer network that uses the Internet Protocol for communication.

An IP address is unique, but name servers and registrars are shared, so they can link a domain name to other domain names. A graph model is ideal for representing these entities and their connections.
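The entity model above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the approach of any particular graph database; the domain names, registrars, name servers, and IP addresses are made-up placeholders, and the helper names `build_graph` and `related_domains` are assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical records: (domain, registrar, name server, IP address).
RECORDS = [
    ("phish-login.example", "registrar-a", "ns1.shady.example", "203.0.113.10"),
    ("secure-update.example", "registrar-a", "ns1.shady.example", "203.0.113.11"),
    ("benign-shop.example", "registrar-b", "ns.hosting.example", "198.51.100.7"),
]


def build_graph(records):
    """Return an undirected adjacency map over typed nodes such as ('domain', name)."""
    graph = defaultdict(set)
    for domain, registrar, name_server, ip in records:
        domain_node = ("domain", domain)
        neighbors = (
            ("registrar", registrar),
            ("name_server", name_server),
            ("ip", ip),
        )
        for neighbor in neighbors:
            graph[domain_node].add(neighbor)
            graph[neighbor].add(domain_node)
    return graph


def related_domains(graph, domain):
    """Map each domain two hops away to the entities it shares with `domain`."""
    start = ("domain", domain)
    found = {}
    for shared in graph[start]:
        for other in graph[shared]:
            if other != start and other[0] == "domain":
                found.setdefault(other[1], set()).add(shared)
    return found


graph = build_graph(RECORDS)
links = related_domains(graph, "phish-login.example")
for name, shared in sorted(links.items()):
    print(name, "via", sorted(shared))
```

Starting from one known phishing domain, the two-hop lookup surfaces other domains that share its registrar or name server, which is the kind of pivot an analyst would make when expanding an investigation.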

Why is graph visualization so important?

Visualizing data as a graph is a generic and very intuitive way to represent relationships. It lets us describe things in terms that everyone is familiar with, so everyone can understand the relationships in the data and decide which data to pursue next. For example, we have network-level data such as DNS records, IP addresses, and domains. When we load that data into a graph model, the relationships become visible, and everyone can communicate clearly about what they see.


A graph model for network data

What does graph visualization do?

  • Predict: Visualize cyber threat intelligence

Cyber analysts need to identify which threats exist and how they could impact the organization. There’s no shortage of available intelligence. The challenge is to make sense of it and share the insight. This is where graph and timeline visualization is crucial.

Threat intelligence is densely connected. Visualizing these connections as a graph uncovers the patterns, anomalies, and outliers in a way that reveals your threat landscape.

Analysts need to see cause and effect. Combining graph visualization and timeline views reveals how and why cyber threats happen and their impact on the network.

  • Monitor: build smarter SOCs

Billions of alerts are raised across networks every day. Security Operations Centers (SOCs) make sense of these alerts. These central hubs provide dashboards and visual interfaces that track activity, helping analysts monitor and respond in real time.

Interactive timeline and graph visualization are essential components of an effective SOC, providing an intuitive, insightful, and fast view of the data.

Using interactive graph visualization, we can see events unfold in the network at a glance, powering cyber threat analysis processes.

Using Key Data Graphs to Build Intelligent Threat Analytics Solutions

Access to multidimensional cybersecurity big data brings new opportunities for us to handle cyber threats. Intelligent threat analytics prioritizes creating the following data graphs:

  • Environmental data graph: includes information about vulnerabilities in assets, IT system architecture, and files

  • Behavioral data graph: includes file analysis logs, network-side detection alerts, device-side detection alerts, application logs, sandbox logs, and honeypot logs

  • Intelligence data graph: includes threat intelligence collected from diverse external sources

  • Knowledge data graph: includes various knowledge bases such as ATT&CK, CAPEC, and CWE

Although independent, the four graphs are related via entities of a specified type, thus achieving global linkage while ensuring clear data representation.
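As a rough sketch of how shared entities can link otherwise independent graphs, the example below stores each graph as a list of triples and reports the entities that appear in more than one graph. The triple layout, the `type:value` naming, and every entity shown are assumptions made for illustration only.

```python
# Each graph is a list of (source, relation, target) triples.
# Entities are written as "type:value"; all values are hypothetical.
GRAPHS = {
    "environment": [("asset:web-01", "has_vulnerability", "vuln:EXAMPLE-VULN-1")],
    "behavior": [("asset:web-01", "connected_to", "ip:203.0.113.10")],
    "intelligence": [("ip:203.0.113.10", "attributed_to", "actor:example-group")],
    "knowledge": [("vuln:EXAMPLE-VULN-1", "exploited_by", "technique:example-technique")],
}


def entity_index(graphs):
    """Map every entity to the set of graph names in which it appears."""
    index = {}
    for name, triples in graphs.items():
        for source, _relation, target in triples:
            for entity in (source, target):
                index.setdefault(entity, set()).add(name)
    return index


def linking_entities(graphs):
    """Return only the entities that connect two or more graphs."""
    return {
        entity: names
        for entity, names in entity_index(graphs).items()
        if len(names) > 1
    }


links = linking_entities(GRAPHS)
for entity, names in sorted(links.items()):
    print(entity, "links", sorted(names))
```

Keeping the graphs separate and joining them only through typed entities means each graph can be updated on its own schedule while a query can still walk from an asset to its behavior, related intelligence, and relevant knowledge.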

(1) Environmental data graph

“Environment” refers to the various types of entities within the protected cyberspace, their properties, and the connections between them.

Creating an environmental data graph requires risk assessment tools and services, vulnerability management, asset management, and business data (such as IT system architecture, human resources, and enterprise data) to diversify and link environmental entities.

“Dark assets” that are not under management, along with assets exposed on the Internet, contribute to the attack surface. To address pervasive threats, it is imperative to identify the critical relationships and entities that matter to security, and to assess how intensively and extensively a threat might impact the network or system. This is the only way to detect the attack surface accurately.
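One simple way to reason about this is with set operations over asset inventories. The sketch below assumes three hypothetical inputs (a managed inventory, the results of a network discovery scan, and a list of Internet-exposed hosts); the asset names are invented, and a real assessment would weigh far more than set membership.

```python
# Hypothetical inventories; in practice these would come from asset
# management, network discovery, and external exposure scans.
managed = {"web-01", "db-01", "mail-01"}
discovered = {"web-01", "db-01", "mail-01", "test-vm-7", "old-ftp"}
internet_exposed = {"web-01", "old-ftp"}

# Dark assets: seen on the network but absent from the managed inventory.
dark_assets = discovered - managed

# Attack surface in this simplified view: dark assets plus any
# discovered asset that is reachable from the Internet.
attack_surface = dark_assets | (internet_exposed & discovered)

print("dark assets:", sorted(dark_assets))
print("attack surface:", sorted(attack_surface))
```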

(2) Behavioral data graph

“Behavior” refers to the actions of entities that can be detected and collected in the protected cyberspace. A combination of user and entity behavior analytics (UEBA) and SIEM can efficiently address the requirements for collecting behavioral data.

This process should be inclusive while remaining systematic and normalized. The main difference between the behavioral graph and the other graphs is that it is added to and updated more frequently, and its data remains valid for shorter periods.

To maximize efficiency, it is crucial to manage data throughout the lifecycle, design interactive capabilities between behavior and environments/knowledge, properly link entities, and construct the ontology model of behavioral data.
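Because behavioral data changes often and goes stale quickly, lifecycle management can be approximated with a time-to-live on each edge. The class below is a minimal sketch under that assumption; `BehaviorGraph`, its method names, and the one-hour TTL are hypothetical choices, and timestamps are plain numbers of seconds for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorGraph:
    """Edges keyed by (source, action, target), each with a last-seen time."""

    ttl_seconds: float
    edges: dict = field(default_factory=dict)

    def observe(self, source, action, target, timestamp):
        """Add a new edge or refresh the last-seen time of an existing one."""
        self.edges[(source, action, target)] = timestamp

    def expire(self, now):
        """Drop edges older than the TTL and return how many were removed."""
        stale = [
            edge for edge, seen in self.edges.items()
            if now - seen > self.ttl_seconds
        ]
        for edge in stale:
            del self.edges[edge]
        return len(stale)

    def active(self):
        """Return the edges currently held, in sorted order."""
        return sorted(self.edges)


graph = BehaviorGraph(ttl_seconds=3600)
graph.observe("user:alice", "logged_in", "asset:web-01", timestamp=0)
graph.observe("asset:web-01", "connected_to", "ip:203.0.113.10", timestamp=3000)
removed = graph.expire(now=4000)
print(removed, graph.active())
```

Refreshing the timestamp on repeated observations keeps recurring behavior alive, while one-off events age out without manual cleanup.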

(3) Intelligence data graph

Threat intelligence improves decision-making on security incidents by providing better context. It has become an essential strategic resource applied in various fields, including attack attribution, security operations, threat analytics, risk assessment, and situational awareness.

Different threat intelligence providers may interpret threat intelligence from different perspectives. A useful intelligence data graph should be time-sensitive, accurate, and inclusive. The key to enhancing its efficiency is to select scenario-specific threat intelligence sources and build a special-purpose data graph from them.

(4) Knowledge data graph

Knowledge data graphs provide correlative knowledge of threats in specific environments. This enables security devices to generate alerts on potential dangers, assess the depth and extent of a threat’s impact, and recommend appropriate countermeasures. Incident analytics enabled by knowledge graphs can expand the context of associated entities in the environment, intelligence, and behavior graphs while being actionable, reusable, and interpretable, which leads to automated analytics. We use open-source project data to build knowledge bases.
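The enrichment step described above might look roughly like the following. The knowledge base here is a hand-written dictionary rather than real ATT&CK data: T1566 (Phishing) is used only as a familiar technique identifier, while the alert type, the mapping from alert to technique, and the countermeasure wording are hypothetical.

```python
# Hypothetical, hand-written knowledge base keyed by technique ID.
KNOWLEDGE = {
    "T1566": {
        "name": "Phishing",
        "countermeasures": ["user awareness training", "email filtering"],
    },
}

# Hypothetical mapping from a detector's alert type to a technique ID.
ALERT_TO_TECHNIQUE = {"suspicious_email_link": "T1566"}


def enrich(alert):
    """Return a copy of the alert with technique context attached, if known."""
    technique_id = ALERT_TO_TECHNIQUE.get(alert["type"])
    entry = KNOWLEDGE.get(technique_id)
    enriched = dict(alert)
    if entry is not None:
        enriched["technique"] = {"id": technique_id, **entry}
    return enriched


alert = {"type": "suspicious_email_link", "asset": "mail-01"}
enriched = enrich(alert)
print(enriched["technique"]["name"], enriched["technique"]["countermeasures"])
```

Alerts with no known mapping pass through unchanged, so the enrichment never blocks the pipeline when the knowledge base has a gap.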


Knowledge Data Graph

Conclusion

Efficient security solutions require a unified, highly automated toolchain and platform to ingest and distribute extensive, multi-source, heterogeneous data. This enables security devices to promptly detect, track, and respond to threats and helps people conduct security operations, mitigation, and research. A scalable and usable data graph needs infrastructure support for data storage and processing. It should also ensure data interactions and associations within and across different graphs.

This requires systematic design and optimization of the ontology database, including entity attributes, relationships, and types. Additionally, it requires a scalable and unified standard and language (such as STIX or MAEC) to describe instances at the data layer of the graph architecture.
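To give a feel for what describing instances in STIX looks like, the sketch below builds a small STIX 2.1 bundle as plain Python dictionaries: an indicator for a domain, an attack pattern, and a relationship joining them. It deliberately avoids any STIX library, the `stix_object` helper is an assumption introduced here, and the domain, names, and timestamp are placeholders.

```python
import json
import uuid

TIMESTAMP = "2024-01-01T00:00:00.000Z"  # hypothetical creation time


def stix_object(object_type, **properties):
    """Build a STIX 2.1 object dict with the common required properties."""
    return {
        "type": object_type,
        "spec_version": "2.1",
        "id": f"{object_type}--{uuid.uuid4()}",
        "created": TIMESTAMP,
        "modified": TIMESTAMP,
        **properties,
    }


indicator = stix_object(
    "indicator",
    name="Suspected phishing domain",
    pattern="[domain-name:value = 'phish-login.example']",
    pattern_type="stix",
    valid_from=TIMESTAMP,
)
attack_pattern = stix_object("attack-pattern", name="Phishing")
relationship = stix_object(
    "relationship",
    relationship_type="indicates",
    source_ref=indicator["id"],
    target_ref=attack_pattern["id"],
)

bundle = {
    "type": "bundle",
    "id": f"bundle--{uuid.uuid4()}",
    "objects": [indicator, attack_pattern, relationship],
}
print(json.dumps(bundle, indent=2))
```

The relationship object is itself a graph edge, with `source_ref` and `target_ref` pointing at node identifiers, which is why a format like STIX maps naturally onto the data layer of a graph architecture.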

Finally, we need industry standards for intelligent security around technology, data, regulations, and architecture.