Problem Space

Radia is an immersive visualization tool built to represent the complex data of call graphs. A call graph is a visual representation of the calling relationships between functions; in reverse engineering it is typically generated from the binary under analysis. Reverse engineering (RE) is “the process of discovering undocumented internal principles of a piece of code” to “1. enable legacy code to continue to function, 2. identify critical vulnerabilities, 3. understand the true purpose and actions of code” [2].

The problem with current visualization tools is that their format, most commonly a 2D graph, is too limited to express complex data effectively. IDA Pro, one of the RE industry’s most widely used disassemblers and debuggers, recognizes in its graphing tutorial that the visualization “is mainly interesting for small programs, because the graph quickly becomes extremely complex in the case of a program containing a large number of functions” [3]. When a tool’s own documentation acknowledges the limits of its graphing feature, that points to a real problem.

Current visualization software emphasizes detailed information and labeling to link the data with code blocks. Navigation remains simplistic, typically limited to scrolling, selection, and zooming. In short, current graphing tools can neither paint a clear picture of the overall code landscape nor integrate visualization effectively into RE workflows.

The audience for these tools is reverse engineers, security researchers, malware analysts, and other code auditors. In the past, engineers and programmers have been the ones to create these visualization solutions. Approaching the problem from an interaction design perspective, rather than an engineering one, shifts the focus from the technical details of the code to the usability and relevance of the graphs. Addressing these design issues includes developing appropriate visual metaphors, creating better interactivity, and making the information easier to parse.

Within this context, this research project also examines the potential use of virtual reality (VR) in visualization tools. The research question I am posing is: can immersive VR visualizations of code structures improve reverse engineers’ understanding of that code?

Background

The goal of this project is to build an effective visualization tool with a user interface (UI) that highlights known, potentially problematic functions within the code. The work encompasses designing and implementing appropriate UI techniques for travel, navigation, and selection within a three-dimensional environment.

Using VR rather than other approaches can help reduce distractions while making the code structures more tangible to the user. As the user floats through the code, and as the nodes and connections fill their visual field, the structure becomes more relatable to them both physically and mentally. Research on immersive visualization suggests that it is more effective for certain tasks. For example, a study comparing the same collection tasks in an egocentric, first-person immersive virtual environment (VE) and in an exocentric monitor view found that the error rate was lower in the VE than on the monitor [1]. Immersive visualization has been reported to be particularly effective for interconnected data that contains node and edge structures, such as network topologies and the relations within computer code.

Figure 1. Radia is a visualization tool for discovering logical relationships within binary code.

Design Considerations

Radia aims to create an interactive and immersive environment that visualizes the symbolic relationships within code and augments the task of reverse engineering binaries. A successful solution will integrate into the user’s RE workflow, reduce the time needed to discover inputs and outputs, complex parsing routines, and calls to unsafe functions, and give insight into the typical run-time flow of the application. Navigation and manipulation of the data in the VE should happen in an intuitive and naturalistic way. Lastly, the visual interface of Radia should transform all code markings, interconnections, and potential problems into an elegant and meaningful representation.
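
As an illustration of the kind of discovery task described above, the following minimal sketch searches a call graph for paths that lead to unsafe functions. It is not Radia's implementation: the graph contents, the watch list, and the name find_unsafe_call_paths are all hypothetical and chosen only for this example.

```python
from collections import deque

# Hypothetical call graph (caller -> callees); the names are illustrative only.
CALL_GRAPH = {
    "main": ["parse_input", "log_message"],
    "parse_input": ["read_header", "strcpy"],
    "read_header": ["memcpy"],
    "log_message": ["sprintf"],
    "strcpy": [],
    "memcpy": [],
    "sprintf": [],
}

# Assumed watch list of functions an auditor might consider unsafe.
UNSAFE_FUNCTIONS = {"strcpy", "sprintf", "gets"}


def find_unsafe_call_paths(graph, entry, unsafe):
    """Breadth-first search from `entry`.

    Returns a dict mapping each reachable unsafe function to one
    shortest call path (a list of function names) that reaches it.
    """
    paths = {}
    visited = {entry}
    queue = deque([[entry]])
    while queue:
        path = queue.popleft()
        for callee in graph.get(path[-1], []):
            if callee in visited:
                continue
            visited.add(callee)
            new_path = path + [callee]
            if callee in unsafe:
                paths[callee] = new_path
            queue.append(new_path)
    return paths


if __name__ == "__main__":
    for name, path in find_unsafe_call_paths(CALL_GRAPH, "main", UNSAFE_FUNCTIONS).items():
        print(name, "reached via", " -> ".join(path))
```

A visualization could use such paths to decide which nodes and edges to emphasize, so that the route from an entry point to a risky call stands out from the rest of the graph.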

The visual design of the Radia call graphs uses a colourful set of geometric shapes and coloured directional paths to represent specific elements of the code and reveal their nature.

This solution breaks the different conceptual levels of the graph into areas with context-sensitive controls to alleviate the visual complexity. This, in turn, decreases the cognitive load on the user, so they can focus their attention on discovery as they travel through the graph. Travel and data manipulation are mapped to the keyboard, using the WASD key convention familiar from video games to move through the space.
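
The WASD travel scheme can be sketched as follows. This is a minimal example under stated assumptions, not Radia's code: the key bindings, the coordinate convention (a yaw of 0 degrees faces +z, with +x to the right), and the function step_camera are hypothetical.

```python
import math

# Assumed bindings: each key contributes (forward, strafe) motion.
KEY_TO_MOTION = {
    "w": (1.0, 0.0),   # forward
    "s": (-1.0, 0.0),  # backward
    "a": (0.0, -1.0),  # strafe left
    "d": (0.0, 1.0),   # strafe right
}


def step_camera(position, yaw_degrees, pressed_keys, speed=1.0):
    """Return the new (x, y, z) position after one movement step.

    Movement stays on the horizontal plane and is relative to the
    direction the user is facing (yaw). Diagonal movement is
    normalized so it is not faster than straight movement.
    """
    forward = sum(KEY_TO_MOTION[k][0] for k in pressed_keys if k in KEY_TO_MOTION)
    strafe = sum(KEY_TO_MOTION[k][1] for k in pressed_keys if k in KEY_TO_MOTION)

    yaw = math.radians(yaw_degrees)
    # Facing direction and its right-hand perpendicular, as (x, z) pairs.
    forward_vec = (math.sin(yaw), math.cos(yaw))
    right_vec = (math.cos(yaw), -math.sin(yaw))

    dx = forward * forward_vec[0] + strafe * right_vec[0]
    dz = forward * forward_vec[1] + strafe * right_vec[1]
    length = math.hypot(dx, dz)
    if length == 0:
        return position  # no keys pressed, or opposing keys cancel out

    scale = speed / length
    x, y, z = position
    return (x + dx * scale, y, z + dz * scale)
```

Making the movement relative to the view direction is one common choice in first-person navigation; a real implementation would also need to handle vertical travel and frame timing, which are omitted here.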

Selecting code clusters and specific nodes through an unobtrusive UI reveals potential problem areas, string references, ingress and egress calls, and the number of basic blocks, and provides a space to add linked notes and mark nodes for further review. This is done with context-sensitive menus and a minimal overlay display to keep the visual field clear.
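
The per-node information listed above could be held in a record like the one sketched below. The class FunctionNode, its field names, and the sample values are hypothetical and only illustrate one possible data model; they are not taken from Radia.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FunctionNode:
    """Hypothetical record for one function node in the call graph."""
    name: str
    basic_block_count: int
    string_refs: List[str] = field(default_factory=list)
    callers: List[str] = field(default_factory=list)   # ingress calls
    callees: List[str] = field(default_factory=list)   # egress calls
    notes: List[str] = field(default_factory=list)     # analyst's linked notes
    marked_for_review: bool = False

    def add_note(self, text: str) -> None:
        """Attach an analyst note to this node."""
        self.notes.append(text)

    def summary(self) -> dict:
        """Compact values suitable for a minimal overlay display."""
        return {
            "name": self.name,
            "basic_blocks": self.basic_block_count,
            "string_refs": len(self.string_refs),
            "ingress": len(self.callers),
            "egress": len(self.callees),
            "notes": len(self.notes),
            "marked": self.marked_for_review,
        }


# Usage with made-up values.
node = FunctionNode(
    name="parse_input",
    basic_block_count=12,
    string_refs=["%s: bad header"],
    callers=["main"],
    callees=["read_header", "strcpy"],
)
node.add_note("Copies user input without a length check?")
node.marked_for_review = True
```

Keeping the overlay limited to counts, with the full lists available on selection, is one way to match the goal of a clear visual field.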

Conclusion

Data visualization is a difficult task to get right. However, with clear visual metaphors and interaction cues, an immersive environment can help engage the user’s spatial reasoning and visual cortex to break down information into useful components. Radia is, first and foremost, an information-gathering tool for illustrating interconnections. The interface’s success will depend largely on how naturalistic and intuitive it is to use, and on whether it allows users to identify potential problems in the code. In the end, the hope is to improve the available tools for auditing code so that problems can be readily identified and remediated, and ultimately to make code less vulnerable to exploitation by malware.

References

  • [1] Bayyari, A., and Tudoreanu, M. E. The impact of immersive virtual reality displays on the understanding of data visualization. Proc. of the ACM Symp. on Virtual Reality Software and Technology (ACM 2006). New York, USA, 2006.
  • [2] Dolan-Gavitt, Brendan, et al. Repeatable reverse engineering for the greater good with PANDA. 2014. https://mice.cs.columbia.edu/getTechreport.php?techreportID=1588&disposition=inline&format=pdf.
  • [3] Hex-Rays IDA. Graphing with IDA Pro. DataRescue, 1 (1). 2005. 4.