The ImSafe project started in 1999 as part of my thesis at the University of Liege for my degree in Electrical Engineering. The goal of the project was to create a tool implementing a few of the latest ideas in intrusion detection. There are a lot of papers out there (see the references) about anomaly detection, but very few open-source applications implementing those ideas.
In the end, the thesis did not do that much towards the implementation but focused more on testing the ideas and developing new approaches. I will try to summarize the idea behind ImSafe here, but if you want to learn more, you should check the references and my thesis, which unfortunately is in French.
ImSafe is now becoming an open-source project hosted on sourceforge.net. I'm looking forward to receiving feedback from the community on this product, to finding developers, and to joining other IDS projects.
2. Host-based anomaly detection
Intrusion detection is a really challenging field, with many people focusing on many different approaches. Commercial products mainly focus on the network side of it. You can find a lot of network scanners out there doing simple pattern matching or using more evolved detection mechanisms to spot attacks on your network.
Host-based intrusion detection is well known in the form of virus detection tools, but rarely as more than that. You can find products that detect file changes on your system (e.g. Tripwire) or analyse the logs of your NT systems to detect anomalies, and so on.
ImSafe is a new tool in this category, known as host-based IDS, but the first one to truly implement pure anomaly detection at the process level. The idea behind anomaly detection is that you don't know what an attacker may do to corrupt your system, but you do know how your system is supposed to behave in a normal situation. An anomaly detector simply compares the actual state of the system with its own knowledge of how the system should behave. You can now see why a learning stage is needed in the process.
3. OK, but how does ImSafe work?
ImSafe monitors specific applications on your system: those that are potential targets for crackers (e.g. your FTP server). ImSafe traces the system calls of those processes and tries to predict the next system call with a certain probability. The idea is that after a good learning phase, it should be possible to make a "not too bad" prediction of the next call, given the past ones. If the predictions turn out to be wrong, an alarm is raised.
3.1 What do we monitor?
We monitor the sequence of system calls made by a process, for example:
...setuid, open, read, mmap, open, close,...
In practice we only monitor the system call number, which is faster.
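As a rough illustration of working with numbers instead of names, here is a minimal sketch. The table is a small subset of the Linux i386 system call numbers, chosen as an assumption for the example; this document does not say which platform or table ImSafe actually uses, and `encode_trace` is a hypothetical helper.

```python
# Illustrative subset of the Linux i386 system call table.
# Assumption: the platform and table ImSafe uses are not stated here.
SYSCALL_NUMBERS = {
    "read": 3,
    "open": 5,
    "close": 6,
    "execve": 11,
    "setuid": 23,
    "mmap": 90,
}


def encode_trace(names):
    """Translate a trace of system call names into their numbers."""
    return [SYSCALL_NUMBERS[name] for name in names]


trace = ["setuid", "open", "read", "mmap", "open", "close"]
encoded = encode_trace(trace)
print(encoded)
```

Comparing small integers is cheaper than comparing strings, which is presumably the speed gain mentioned above.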
3.2 How do we process the data?
The audit trail of calls is then cut into fixed-length sequences by a sliding window mechanism. For example, with a window of length 4, the trace above gives us:
setuid, open, read, mmap
open, read, mmap, open
read, mmap, open, close
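The windowing step can be sketched in a few lines. This is only an illustration under the assumption that the window advances one call at a time, as in the example; `sliding_windows` is a hypothetical name, not a function from ImSafe.

```python
def sliding_windows(trace, length):
    """Return every run of `length` consecutive calls in `trace`, in order."""
    return [tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)]


trace = ["setuid", "open", "read", "mmap", "open", "close"]
windows = sliding_windows(trace, 4)
for window in windows:
    print(", ".join(window))
```

A trace shorter than the window simply produces no sequences.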
We store those sequences in a database using a tree data structure, so each path from the root to a leaf represents a sequence that has been seen. Moreover, we add a label at the leaf which represents the probability of occurrence of this call (the one at the leaf) given that the past calls have been seen.
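One possible way to picture such a tree is the sketch below. It keeps a counter on every node and estimates the leaf probability as a relative frequency (leaf count divided by parent count). That estimate is an assumption made for illustration; the text only says that a probability is stored at the leaf, not how it is computed. `Node`, `insert` and `probability` are hypothetical names.

```python
class Node:
    """One system call in the tree; `count` is how often this path was seen."""

    def __init__(self):
        self.count = 0
        self.children = {}


def insert(root, sequence):
    """Add one fixed-length sequence to the tree, updating counts on the path."""
    node = root
    node.count += 1
    for call in sequence:
        node = node.children.setdefault(call, Node())
        node.count += 1


def probability(root, sequence):
    """Estimate P(last call | preceding calls) from the stored counts."""
    node = root
    for call in sequence[:-1]:
        node = node.children.get(call)
        if node is None:
            return 0.0  # the past calls were never seen during learning
    leaf = node.children.get(sequence[-1])
    return 0.0 if leaf is None else leaf.count / node.count


root = Node()
insert(root, ("open", "read", "mmap"))
insert(root, ("open", "read", "mmap"))
insert(root, ("open", "read", "close"))

p_mmap = probability(root, ("open", "read", "mmap"))
p_exec = probability(root, ("open", "read", "execve"))
```

Here `mmap` followed `open, read` in two of the three stored sequences, while `execve` never did, so the second lookup returns zero.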
You can now see that if we used an "infinite" sequence length, each training example would be stored completely in the database. Each time we monitored a process, we would then predict the next call with 100% accuracy if the sequence is in the DB, and 0% otherwise. That is of course not what we want!
So we use a smaller sequence length (6 is the default value).
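The effect of the sequence length can be illustrated with a toy experiment. The two traces below are invented for the example: they contain the same kind of normal activity in a different order. `coverage` is a hypothetical helper that reports which fraction of the sequences in a new trace were already seen during learning.

```python
def sliding_windows(trace, length):
    """Return every run of `length` consecutive calls in `trace`."""
    return [tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)]


def coverage(train, test, length):
    """Fraction of the test sequences that also occur in the training trace."""
    known = set(sliding_windows(train, length))
    candidates = sliding_windows(test, length)
    if not candidates:
        return 0.0
    return sum(window in known for window in candidates) / len(candidates)


# Invented traces: the same normal behaviour, in a different order.
train = ["open", "read", "read", "close", "open", "read", "close"]
test = ["open", "read", "close", "open", "read", "read", "close"]

short = coverage(train, test, 3)  # short sequences generalise to the new trace
whole = coverage(train, test, 7)  # one sequence per trace: pure memorisation
```

With short sequences every part of the new trace is recognised; with a sequence as long as the whole trace, nothing is.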
3.3 Does it work?
Yes! After a good learning phase, we have a fine profile of the application and, in fact, a small database containing this profile (about 1000 leaves for wu-ftpd). At that point we can easily monitor the application without too many false positives and detect intrusions (buffer overflows, brute force on passwords, DoS, ...).
[Figure: a usual FTP login on wu-ftpd]
[Figure: a buffer overflow on wu-ftpd 2.5]
3.4 What is the problem?
False positives. We still have a high volume of false positives, meaning that we report attacks when there are none. One execution in a hundred gives us an alarm, and that is HUGE!
We can lower that level by modifying a few parameters, but the system is then vulnerable to certain well-crafted attacks. There is a solution: rely on other mechanisms to detect those attacks, and that is what we are doing with the FBOd system.
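The trade-off can be made concrete with a toy alarm rule. Everything in this sketch is an assumption made for illustration: the text does not say which parameters ImSafe exposes, so `min_probability`, `window` and `max_misses` are hypothetical, and the probability values are invented. A call counts as a miss when its predicted probability is below `min_probability`, and an alarm is raised when more than `max_misses` misses fall inside the last `window` calls.

```python
from collections import deque


def alarms(probabilities, min_probability, window, max_misses):
    """Return the indices of the calls at which an alarm would be raised."""
    recent = deque(maxlen=window)  # miss/hit flags for the last `window` calls
    raised = []
    for index, p in enumerate(probabilities):
        recent.append(p < min_probability)
        if sum(recent) > max_misses:
            raised.append(index)
    return raised


# Invented per-call probabilities for three runs.
normal_run = [0.9, 0.8, 0.01, 0.9, 0.7, 0.9]     # one isolated misprediction
attack_run = [0.9, 0.01, 0.02, 0.0, 0.01, 0.9]   # a burst of unexpected calls
crafted_run = [0.9, 0.01, 0.9, 0.01, 0.9, 0.9]   # misses spread out on purpose

strict = dict(min_probability=0.05, window=4, max_misses=0)
lenient = dict(min_probability=0.05, window=4, max_misses=2)

false_positive = alarms(normal_run, **strict)
quiet_normal = alarms(normal_run, **lenient)
detected = alarms(attack_run, **lenient)
missed = alarms(crafted_run, **lenient)
```

The strict setting raises an alarm on the normal run, the lenient one does not, but the lenient setting also lets the spread-out run through, which is the kind of weakness described above.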
4. The Fast Buffer Overflow Detection Heuristic
Monitoring system calls is a good idea, but we can monitor other variables to detect potential attacks. For example, when we receive a system call, we can easily get the value of the EIP (program counter) from where the system call was issued.
One should expect those values to stay in a certain range, which is where the code segment is in memory. Look at this trace:
As you can see, as soon as the shellcode of the buffer overflow starts being executed, the ESP stops growing. This is to be expected: we ARE IN the stack, since this one is a stack overflow, and the system calls originate from the stack.
Our system simply assumes that this kind of behavior is far from normal, especially if the system call is an execve().
It helps us detect a buffer overflow faster, so we can raise our thresholds in the main system and lower the rate of false positives while still detecting attacks.
This feature is really new and still a work in progress; lots of tests need to be done to validate it.
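The heuristic can be sketched as a simple range check. This is not the FBOd implementation, only an illustration under assumptions: the address ranges are invented but typical-looking values for a 32-bit Linux process, the list of executable ranges is supplied by hand, and `check_call` with its three verdicts is a hypothetical interface.

```python
# Hypothetical executable ranges of the monitored process (start, end).
# Assumption: a real monitor would obtain these from the process itself.
EXECUTABLE_RANGES = [
    (0x08048000, 0x08060000),  # program code
    (0x40000000, 0x40120000),  # shared library code
]


def issued_from_code(eip, ranges):
    """True if the program counter lies inside one of the known code ranges."""
    return any(start <= eip < end for start, end in ranges)


def check_call(syscall, eip, ranges):
    """Classify one system call by the address it was issued from."""
    if issued_from_code(eip, ranges):
        return "ok"
    # Outside every code range, for instance on the stack: far from normal,
    # and an execve() from there is the classic sign of shellcode.
    return "alarm" if syscall == "execve" else "suspicious"


normal = check_call("open", 0x0804A123, EXECUTABLE_RANGES)
odd = check_call("read", 0xBFFFF6A0, EXECUTABLE_RANGES)
shell = check_call("execve", 0xBFFFF6A0, EXECUTABLE_RANGES)
```

Because the check needs no learning phase, it can flag a stack overflow on the very first call issued by the shellcode.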
6. To learn more
The basic idea (system call tracing) is not mine and has been floating around in a lot of papers, so I really encourage you to read those papers to learn more about it. If you want to learn more about ImSafe, you can read the thesis (in French) and the Frequently Asked Questions.
I will update this document as time goes by... please send me your questions and comments.