Splunk 5.0's Report Accelerator, better than Summary Indexing?
Over the last 3 years, I've worked with nearly 100 clients as a Splunk Professional Services Consultant across all sectors of business, and I have seen Splunk grow first-hand from the back-office system admin's lifesaver to the Enterprise Big Data Engine that it is today. Splunk's latest 5.0.1 release is nothing short of amazing. They've done the impossible, again, and they did it with their renowned Splunk personality. It broadens the meaning of "All Bat Belt, No Tights". Among the awesome features in 5.0, like data replication and improved SDKs, is the Report Accelerator. Lately, the questions I get asked most often are about the difference between Report Acceleration and Summary Indexing. Is one just as good as the other? Are there specific scenarios where it would be bad to use Report Acceleration? Are there requirements for using it? Can I accelerate everything?
All great questions.
In order to use Report Acceleration, your search must include a streaming command (search, rex, bin, lookup) followed by a reporting command (chart, stats, top, rare, timechart). There are numerous other streaming and reporting commands. Basically, the results need to be in a table with cell values so that statistical computations can be performed on them, and for the same reason you can't use it on every search you have. So far, this is the same requirement you would have with a Summary Index. So what makes it different?
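To make the "streaming commands, then a reporting command" rule concrete, here is a minimal Python sketch that checks a search string against it. It is only an illustration of the rule as stated in this post, not Splunk's actual eligibility check: the command lists are just the examples named above, the function names are made up, and the pipe splitting is naive (it would be confused by a "|" inside a quoted string).

```python
# Command lists taken from the examples in this post; they are not exhaustive.
STREAMING = {"search", "rex", "bin", "lookup"}
REPORTING = {"chart", "stats", "top", "rare", "timechart"}


def command_names(search):
    """Return the command name of each pipe-separated segment of a search string."""
    names = []
    for i, segment in enumerate(search.strip().split("|")):
        words = segment.split()
        if not words:
            continue  # empty segment, e.g. when the search starts with "|"
        name = words[0].lower()
        if i == 0 and name not in STREAMING | REPORTING:
            # A leading segment without a command keyword is an implicit "search".
            name = "search"
        names.append(name)
    return names


def looks_accelerable(search):
    """True if the pipeline is only streaming commands followed by one reporting command."""
    names = command_names(search)
    if not names:
        return False
    *head, last = names
    return last in REPORTING and all(name in STREAMING for name in head)


if __name__ == "__main__":
    print(looks_accelerable("index=web status=500 | stats count by host"))
    print(looks_accelerable("index=web | transaction session_id | stats count"))
```

The first example passes because an implicit search is followed directly by stats; the second fails because transaction is not in the streaming list used here.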
FASTER. Report Acceleration is faster than Summary Indexing. It is faster by design, due to the way the acceleration summaries are stored: in the indexes, at the bucket level. This is crucial because late-arriving data no longer goes missing from the summary as it would with a Summary Index, nor does it create a bottleneck, since that "late data" gets stored in a hot bucket and the acceleration summaries can span both hot AND warm buckets simultaneously. To accomplish the same effect with a Summary Index, the late-arriving data would force a full rebuild of the summary index, which leaves the request that initiated the queries sitting idle while the summary index rebuilds. In my opinion, this aspect of Report Acceleration is simply brilliant.
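The bucket-level idea can be pictured with a toy model in Python. This is purely a conceptual sketch under my own assumptions, not how Splunk implements acceleration summaries: the class, the bucket ids, and the "count by host" report are all hypothetical. It shows only the point made above, that a late event makes one bucket's summary stale rather than forcing a rebuild of everything.

```python
from collections import Counter, defaultdict


class BucketSummaries:
    """Toy model: one partial count-by-host summary per bucket."""

    def __init__(self):
        self.events = defaultdict(list)  # bucket id -> host name of each event
        self.summaries = {}              # bucket id -> Counter of events per host
        self.rebuilds = []               # log of which buckets were re-summarized

    def add_event(self, bucket_id, host):
        self.events[bucket_id].append(host)
        # Only the bucket that received the event has a stale summary.
        self.summaries.pop(bucket_id, None)

    def report(self):
        """Merge the per-bucket summaries, rebuilding only the stale ones."""
        total = Counter()
        for bucket_id, hosts in self.events.items():
            if bucket_id not in self.summaries:
                self.summaries[bucket_id] = Counter(hosts)
                self.rebuilds.append(bucket_id)
            total += self.summaries[bucket_id]
        return total


if __name__ == "__main__":
    store = BucketSummaries()
    store.add_event("warm_1", "web01")
    store.add_event("hot_1", "web02")
    store.report()                      # first run summarizes both buckets
    store.add_event("hot_1", "web01")   # late-arriving event lands in the hot bucket
    print(store.report())
    print(store.rebuilds)
```

After the late event, only the hot bucket is summarized again; the warm bucket's summary is reused as-is.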
AUTO-BACKFILL. This was just never possible out of the box with Summary Indexing. With the way most customers use Summary Indexing, you're basically up a creek without a paddle if you encounter any kind of data interruption whatsoever. Short of taking a crash course in Python to script up a method of backfilling your data, most customers decide the time to implement that strategy would far outweigh the value gained from writing it, and of course having to write it when you have a data interruption is the last thing anyone wants to do. Well, problem solved: use Report Acceleration. By design, acceleration summaries auto-update and auto-rebuild.
Although Report Acceleration is faster and has these very cool features, it will never entirely replace the need for Summary Indexing. Everything that you can do with a Summary Index you can also do with Report Acceleration, except query a summary index or report on non-streaming commands. You can't use Report Acceleration on a summary index because acceleration summaries are stored within the queried index at the bucket level and span hot and warm buckets simultaneously, which is not possible in the structure of a summary index.
So, all things considered, my default answer to any customer who asks me this is that there isn't really a "pro" or a "con" for using one or the other. It depends on the use case. The general rule is that Report Acceleration is good for just about everything, as long as there are about 50k+ events and you're not trying to report on a non-streamable set of data.
I hope this has helped clear things up a little about the latest version of Splunk.
Happy holidays!