Sameh Attia: Review: Graylog delivers open source log management for the dedicated do-it-yourselfer

http://www.networkworld.com/article/3001358/opensource-subnet/review-graylog-delivers-open-source-log-management-for-the-dedicated-do-it-yourselfer.html

In most big security breaches, there’s a familiar thread: something funny was going on, but no one noticed. The information was in the logs, but no one was looking for it. Logs from the hundreds or thousands of network devices are the secret sauce to problem solving, security alerting, and performance and capacity management. Gathering logs together, analyzing them, reporting, and alerting on them is a basic part of good IT practice.

Graylog is an open-source log management tool, complete with a three-tier architecture, super-scalable storage (based on Elasticsearch), an easy-to-use web interface, and a powerful toolkit to parse messages, build ad-hoc dashboards, and set alerts on logs. It sounds great—and our testing shows that the functionality provided is solid and reliable, with one caveat: you have to be willing to do a lot of work yourself.
If you regard writing regular expressions as a fun afternoon’s work, if you have a fairly limited set of homogeneous network devices and log sources that you understand pretty well, and if you don’t need reporting or correlation, then Graylog is a great choice at an excellent price. However, Graylog has significant limitations compared to commercial products. The money you save by not paying for a commercial log management tool (such as Splunk) may be eaten up by the time you invest in customizing and adapting Graylog to your environment.

Easy to install and get started

We started by downloading the pre-built VM based on Ubuntu 14.04 (“Trusty”) provided by the Graylog team. Graylog has worked hard to make installation easy. Our testing focused on Graylog for network and security monitoring, which means that most of the log messages we had to feed it either came via SYSLOG or were in the Windows Event Log. Since we had an existing SYSLOG receiver on a dedicated IP address, we were able to swap Graylog in quickly, and we were up and receiving SYSLOG messages within an hour.
The pre-built VMs are nearly suitable for production deployment, and even support scalability features (such as separating out front-end from back-end services). The Graylog VM does not include a system management control panel (it’s not an appliance, but a handily pre-installed system), so maintaining the underlying operating system is your responsibility, and it is done from the command line. We ran Graylog on our production VMware cluster for several months, and went through two upgrades of the software, all without problems or data loss.
If you need more control, you also have the option to install Graylog’s components yourself onto popular Linux distributions (Ubuntu, Debian, and CentOS) or use common DevOps orchestration tools (Puppet and Chef are supported, as well as Ansible and Vagrant) to automate putting Graylog on your own supported Linux platform. Graylog does not run on Windows.
Although the Unix command line is needed for system management, most Graylog operations are handled through the web-based GUI. Graylog’s GUI has a modern feel, and we found the design of the GUI intuitive and easy-to-learn. The Graylog team has taken care to make navigation easy and makes full use of the power of the browser — without requiring add-ons such as Java or Adobe Flash.


Evaluating Graylog: Starting with data collection

To evaluate Graylog, we considered the entire cycle of log management and came up with six main areas to test: data collection and storage, parsing and normalization, searching, reporting, correlation and analysis of data, and alerting. Although Graylog has the potential for high-performance scalability through its choice of database and its multi-tier architecture, we did not test for performance.
Before any log management tool can be useful, you have to get your logs into it. Graylog has the obligatory SYSLOG receiver (both TCP and UDP) as well as an agent-based Windows Event Log collector that we focused on in our testing. The Graylog team has defined their own structured logging format called “GELF” (Graylog Extended Log Format), which it offers as a way for applications (and other log collection tools) to pre-parse their log data before sending it off to Graylog. Graylog also includes an HTTP-based interface for direct submission of log data.
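To make the idea of pre-parsed GELF data concrete, here is a minimal sketch of how an application might build a GELF message before submitting it. It assumes the GELF 1.1 payload layout (a JSON object with `version`, `host`, `short_message`, and custom fields prefixed with an underscore); the host name, field names, and values are made up for illustration and are not from our test setup.

```python
import json
import time


def make_gelf(host, short_message, level=6, **extra):
    """Build a GELF 1.1 payload as a JSON string.

    Custom fields are sent as "additional fields", which GELF marks
    with a leading underscore.
    """
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the Unix epoch
        "level": level,            # syslog severity; 6 = informational
    }
    for key, value in extra.items():
        if key == "id":
            # The GELF specification reserves the additional field "_id".
            raise ValueError("'id' is not allowed as an additional field")
        message["_" + key] = value
    return json.dumps(message)


# Hypothetical firewall event with two pre-parsed fields.
payload = make_gelf("fw01.example.com", "Denied TCP connection",
                    level=4, src_ip="192.0.2.10", dst_port=445)
print(payload)
```

The resulting JSON string is what would be handed to a GELF input, for example as the body of an HTTP POST; the URL and port depend on how the input is configured in your own installation.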
Graylog has a particularly strong set of log collection tools for the development community, with documented support for pulling logs from Ruby on Rails, Heroku (an application deployment environment), and using JSON. From our point of view as network and security managers, these were not particularly interesting, but Graylog is not aimed only at network and security log management — as a general tool, it can also meet the needs of operations teams looking to monitor complex interconnected application environments.
For other common formats, such as logs being stored in databases, flat text files, non-agent-based Windows, or anything else you might want, Graylog relies on the open source world. For example, if you don’t want to install a Graylog agent on all of your Windows servers, you’ve got to build an agentless solution yourself. To facilitate the linkage between contributed software and the core product, the company has created a “Graylog Marketplace”. That’s great if you find what you want, but if you don’t, then you have to write it, adapt something else, or assemble parts yourself.

For example, in our testing, we needed to link Graylog to our production Sophos Enterprise Console to monitor anti-malware events and other issues. Sophos writes to a database, but Graylog can’t read from that. So we ended up installing an additional product from Sophos to extract messages and write them to text files, bringing up a SMB file share to get the files to the Graylog world, and then using another open source tool, Logstash, to translate the Sophos format to GELF format and send it over to Graylog. Did it work? Yes. Did we have to learn multiple tools and build our own pattern recognizers to make it all work? Yes — a lot of tools and a lot of work.
Of course, watching what other people in the community have done with Graylog can give you new ideas of ways to collect and monitor data. For example, one of the contributed tools does HTTP polling and collects statistics, turning these polls into log messages, which can then be displayed or analyzed using Graylog’s dashboard tools, making Graylog even more useful.
As with most open source products, if the path you are following is one that multiple people have followed before, life is easy with Graylog. You’ll find reliable tools, good documentation, and some support. But if you want to collect logs in many different ways, Graylog gives you the framework to solve your problem, but leaves the details — and the long-term support — up to you.

Parsing and normalization

One of the most important features of modern log management tools is the ability to parse and normalize log messages before storing them. Without good parsing, the log management tool has little benefit, because the log messages have no real meaning. If you can’t tell, for example, source IP addresses from destination IP addresses, or network port numbers from disk error counts, then you haven’t moved out of the 1990s in log management.
Graylog has two main ways of handling the parsing problem. The first, and obviously their favorite, is GELF, their own structured log format. When messages are sent in using GELF format, the data are pre-parsed, and Graylog can store them in its databases quickly and with high accuracy. Unfortunately, Graylog doesn’t support CEF (Common Event Format, the almost-identical structured log format pushed by ArcSight nearly 10 years ago and adopted by many network and security vendors), so enterprises that normalize their logs into CEF format hoping for a drop-in replacement are out of luck.
The second way to handle log parsing is to do it yourself as the messages enter Graylog, using what Graylog calls “Extractors.” These are essentially regular expressions you write (or download from the Graylog Marketplace if your device is supported) that parse out messages. While these extractors are not hard to write, Graylog’s facilities for controlling the extractors need work. For example, you can’t have different extractors called into play depending on who is sending the message, which means that — for all practical purposes — each different type of device in your network has to send to a different Graylog input process (either on a different IP address or a different port number). This adds to the complexity of deployment.
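Since extractors are essentially regular expressions, the kind of work involved can be sketched in plain Python. This is not Graylog's extractor configuration format, only an illustration of the pattern-writing task; the log line and the field names are hypothetical, and every real device will need its own pattern.

```python
import re

# Hypothetical firewall log line; real devices each use their own format.
LINE = "Nov 21 10:15:02 fw01 DENY TCP 192.0.2.10:51514 -> 198.51.100.7:445"

# Named groups become the parsed fields of the message.
PATTERN = re.compile(
    r"(?P<action>ALLOW|DENY)\s+(?P<protocol>\w+)\s+"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<src_port>\d+)\s+->\s+"
    r"(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<dst_port>\d+)"
)


def extract_fields(line):
    """Return the parsed fields, or an empty dict if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return {}
    fields = match.groupdict()
    # Store ports as numbers so they can be compared and aggregated.
    fields["src_port"] = int(fields["src_port"])
    fields["dst_port"] = int(fields["dst_port"])
    return fields


fields = extract_fields(LINE)
print(fields)
```

A pattern like this only makes sense for one message format, which is why mixing device types on a single input quickly becomes awkward.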
Overall, Graylog’s parsing is very generic, and this results in one of the main weaknesses of Graylog: the lack of any sort of data dictionary or information schema. When deploying Graylog, the network manager is thrust into the role of developing an information framework for reading their own log data and making use of it. Although the IT industry has decades of experience with this, and over a dozen SIEM products have hit the marketplace, Graylog doesn’t include any of that accumulated knowledge. The network manager has to decide everything from which fields to capture, to what names to use, to where DNS lookups should occur, to which messages are important and which ones are not.
Some network managers may find this a fascinating exercise and dive deep into the time-consuming task of trying to understand and categorize every element of every log they get from every device. But for many others, this is less attractive. Part of the value of a log management tool is some semantic knowledge of what logs mean and how they should be interpreted. Graylog doesn’t provide that, and the Graylog Marketplace doesn’t help, as no effort is made to keep the various extractors and GELF tools in synchronization with each other.
Normalization is another area where Graylog leaves you almost entirely on your own. The one type of normalization built-in is date formats, where Graylog has a well-designed converter that helps to parse date formats between different log systems into a single consistent timestamp.
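The date normalization described above can be illustrated with a short sketch. This is not Graylog's converter, just the general technique of trying several known formats and emitting one consistent UTC timestamp; the list of formats and the choice to treat timestamps without an offset as UTC are assumptions made for the example.

```python
from datetime import datetime, timezone

# Formats to try, in order. Real deployments would list whatever
# their own log sources actually emit.
FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with a numeric offset
    "%d/%b/%Y:%H:%M:%S %z",  # web server access log style
    "%Y-%m-%d %H:%M:%S",     # no offset; assumed to be UTC here
]


def normalize_timestamp(text):
    """Parse a timestamp in any known format and return it as UTC ISO 8601."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format, try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    raise ValueError("unrecognized timestamp format: " + text)


for sample in ("2015-11-21T10:15:02+0100",
               "21/Nov/2015:10:15:02 -0500",
               "2015-11-21 10:15:02"):
    print(sample, "->", normalize_timestamp(sample))
```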
But for all other normalization, Graylog depends on another open source tool, Drools, a business rules management system. Graylog borrows the Drools Expert business rules engine, which can be used to further parse and normalize messages.
Has Graylog brought together the pieces that are needed to build a good parsing and normalization system? Yes, definitely. Has your typical network manager been cast adrift in a time-consuming sea of confusion compared to other log management tools? Equally true. Graylog brings a completely blank slate to the table. From our point of view, too blank. There must be a way to build the accumulated best practices of network and security log management into the tool without turning this into a do-it-yourself nightmare, but Graylog has not found it.

Searching, reporting, and analysis

Once you’ve got your messages into Graylog, searching and reporting are the main ways to get everything out. At this time, Graylog does not have an internal reporting interface. You’re welcome to write your own reports (more of the do-it-yourself style of the product) using the documented APIs.
We didn’t test performance of Graylog for searching, but in our test system with gigabytes of messages going back several months, most queries returned results immediately. Even when we used large time windows, answers were nearly instantaneous.
Searching using the GUI is quick and intuitive. You select a time window (such as “in the last hour”, or a more specific absolute time if you want), then just type what you’re looking for, including wildcards. To make use of the parsed fields in a message, you simply specify the field and then an operator. Graylog includes the normal ones, such as “is equal,” “contains,” or “is between,” as well as Boolean operators such as AND, OR, and NOT, but also some more esoteric ones, such as fuzzy proximity searches. For example, “network world”~3 matches messages which have “network” and “world” in any order, within 3 words of each other.
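To show what the proximity example means in practice, here is a rough approximation in Python. It is not how the search engine computes a match; it simply treats "within 3 words" as a difference of at most 3 in word position, which is an assumption made for this sketch, and the function name is hypothetical.

```python
import re


def within_words(message, first, second, max_distance):
    """Return True if both words occur within max_distance positions of
    each other, in either order (case-insensitive)."""
    words = re.findall(r"\w+", message.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in first_positions
               for j in second_positions)


print(within_words("Network World review of Graylog", "network", "world", 3))
print(within_words("the world wide network", "network", "world", 3))
print(within_words("world of open source network tools", "network", "world", 3))
```

The first two messages match, because the words are one and two positions apart; the third does not, because they are four positions apart.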
The Graylog GUI is advanced, but could use some work to match similar tools. Doing a drill-down into results by further refining a search query is easy, but requires you to go back to the keyboard and type in field names and values, rather than simply clicking on a value to add it to the query.
When a search returns results, Graylog immediately offers up a timeline histogram, which can be helpful in identifying patterns or finding when an event occurred, as well as the option to save a query or move results to a dashboard.
We used Graylog multiple times to debug problems in our network, track email messages, and investigate security events. Each time we were able to find the information we needed using the search language, and having all our logs together in a single place saved a lot of time in tracking problems that spanned multiple systems.

Dashboards are Graylog’s main analysis tool and are excellent for getting visibility into what is happening across many messages. Dashboards are designed to aggregate data. For example, after sending our email security gateway logs to Graylog, we could build a dashboard that included a strip graph showing incoming levels of spam, viruses, and other threats.

Graylog’s dashboards are built up graphically, which makes them easy to assemble, and you can quickly start by exporting a search over to a dashboard. Each dashboard is composed of widgets that compute values or display graphs, such as “count of results returned,” “number of unique IP addresses seen,” and “average response time to HTTP request.”
Graylog security is based on users and groups (“roles” in Graylog). In our testing, we linked this to our Active Directory for authentication and group mapping. Dashboards are fully integrated into this security model, so a group such as “executives” could have access to certain dashboards, but not others.
While we found Graylog’s reporting lacking and the searching good, the dashboards are really a step up in the world of log management and make the product stand out as a leader in this area.

Correlation and alerting

Alerting and correlation are real-time activities: looking at message flows, comparing them to business rules, and then causing some action to happen. For example, one login failure in an hour is not very interesting, but a thousand login failures an hour is interesting, and only by looking at all the message flows can you differentiate between these.
Alerting in Graylog begins with a feature called streams. A stream is a type of saved search that runs continuously as log messages flow into the system. Streams can be fed into dashboards, sent over to other log management systems, or simply be used as convenient pre-packaged content for different types of analysis. Access to streams is also defined by the Graylog users and groups system. For example, you could allow application developers to see messages only from development and QA systems by creating the appropriate stream, while hiding those messages from system operators (who might be confused).
We found that building streams made us re-think some of the data model we used in the collection, parsing and normalization parts of our testing. For example, when we wanted to count alerts across different brands of firewall and IDS in our network, we had to be very careful to normalize the messages in the same way so that we had a minimum of false positives, yet caught every event that was interesting. The results were valuable, but only once we put in the time to really understand and properly categorize and normalize different types of messages.
Once messages are filtered into a stream, Graylog allows for alerting to occur based on simple conditions: number of messages in a time period that match the stream, or a value in a message that passes a threshold or matches a particular string. Built-in alerting supports email and HTTP posts, but you can write your own or find something interesting in the Marketplace. For example, we downloaded a tool from the Marketplace that linked a Graylog stream to our Nagios network monitoring system, sending an alert into Nagios when a stream alert was triggered for over-temperature conditions in our servers.
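The simplest of these conditions, a message count within a time period, can be sketched as a sliding-window counter. This is an illustration of the idea rather than Graylog's implementation; the class name is hypothetical, and the sketch assumes messages arrive in timestamp order.

```python
from collections import deque


class ThresholdAlert:
    """Fire when more than `threshold` matching messages fall inside a
    sliding window of `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def observe(self, timestamp):
        """Record one matching message and report whether the alert fires."""
        self.timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop messages that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


# One login failure per second: the alert fires once the count in the
# last hour goes above 1000.
alert = ThresholdAlert(threshold=1000, window_seconds=3600)
fired = [alert.observe(second) for second in range(1001)]
print(fired.index(True))
```

The same login failures spread out at one per hour would never fill the window, which is the difference between the uninteresting and the interesting case.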
Correlation is not a current feature in Graylog: you can’t correlate across events, unless you write your own tools to do so and link them in with Graylog’s API. Graylog also doesn’t have the easy ability to connect to other databases as messages fly in, such as linking to an asset inventory or configuration management database. This makes common correlation use cases more difficult, such as differentiating alerts between critical and test systems.
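The asset inventory use case amounts to a lookup performed on each message as it arrives. The sketch below shows the idea with an in-memory table; the inventory contents, host names, and function names are hypothetical, and a real deployment would query its own inventory or configuration management database instead.

```python
# Hypothetical asset inventory, keyed by host name.
ASSETS = {
    "db01.example.com": "critical",
    "qa-web03.example.com": "test",
}


def enrich(message, assets=ASSETS):
    """Return a copy of the message with an asset_class field added."""
    enriched = dict(message)
    enriched["asset_class"] = assets.get(message.get("host"), "unknown")
    return enriched


def should_page(message):
    """Only alerts from critical systems are worth waking someone up for."""
    return enrich(message)["asset_class"] == "critical"


print(should_page({"host": "db01.example.com", "short_message": "disk failure"}))
print(should_page({"host": "qa-web03.example.com", "short_message": "disk failure"}))
```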

Is it right for me?

After using Graylog in production for several months, we found a solid product with excellent performance and an easy-to-use GUI. Although we found a few areas where the product could be improved, overall we think that Graylog is enterprise ready.

However, being enterprise-ready doesn’t mean that it’s ready to deploy, and we found that the do-it-yourself nature of the product requires a significant investment in time before the value of Graylog above a simple store-and-search tool is realized.
In some cases, the missing pieces represent major weaknesses. Even if Graylog is bulletproof and enterprise-ready, not every plug-in, content pack, and partner tool we had to add on was at the same level of quality and reliability. Thus, you could end up with a solid Graylog installation surrounded by a rat’s nest of other tools and products that have lower uptime and higher support costs.
With that being said, network managers who have the time and energy to invest in heavy customization of their log management systems may find Graylog an attractive option. This is doubly true for network managers who have been working with existing commercial systems and who can re-use some of the data dictionary, log parsing, and normalization techniques from their existing systems.
If you’ve never had a log management system, it is going to be difficult to make an intelligent choice of whether Graylog is right for you, and even more difficult to make effective use of its features. But for network managers frustrated with their current tools, Graylog may be a great escape valve. In our testing and use of other tools, we have found that changing the behavior of commercial tools can be next to impossible, error prone, or entirely unsupported. With Graylog, it’s easy to make these changes. Network managers with very diverse equipment vendors may find that it’s easier to migrate to Graylog than to try to retrofit a commercial package to their needs. The tradeoff is a significant one, and will be a big factor in deciding whether Graylog is right for you.

Saturday, November 21, 2015
