Cleaned up and renamed for publication.

This commit is contained in:
Elf M. Sternberg 2013-05-08 20:34:51 -07:00
parent 4d7c3d5f0a
commit bfc1791dd8
6 changed files with 491 additions and 11 deletions


@ -1,13 +1,30 @@
# simple
# splunksearch
A Clojure library designed to ... well, that part is up to you.
A Clojure command line program that enables simple search access to a
splunk utility.
## Usage
FIXME
This is a bit tricky because this program relies on both the Splunk
Java SDK, which is not in Clojure or Maven repos, and the Splunk Java
SDK's Command utility class. You'll have to download the Splunk Java
SDK yourself and install the Splunk JAR file. You'll also have to
create, from the root of this project directory, resources/com/splunk,
and deposit Command.class (which can be found in the SDK's tree
somewhere) in the newly created directory.
I used lein localrepo to install the Splunk JAR file. It seems to
have worked for me.
Once all that's done, *and* you've got an instance of Splunk up and
running, *and* you've successfully configured your .splunkrc file, you
can try:
lein run 'search <your search here>'
## License
Copyright © 2013 FIXME
Copyright © 2013 Elf M. Sternberg
Distributed under the Eclipse Public License, the same as Clojure.
Distributed under the Apache Public License, under the same terms as
other Splunk software.


@ -1,3 +0,0 @@
# Introduction to simple
TODO: write [great documentation](http://jacobian.org/writing/great-documentation/what-to-write/)

doc/splunk_search.nw Normal file

@ -0,0 +1,452 @@
% -*- Mode: noweb; noweb-code-mode: clojure-mode ; noweb-doc-mode: latex-mode -*-
\documentclass{article}
\usepackage{noweb}
\usepackage[T1]{fontenc}
\usepackage{hyperref}
\begin{document}
% Generate code and documentation with:
%
% noweave -filter l2h -delay -x -html splunk_search.nw | htmltoc > splunk_search.html
% notangle -Rcore.clj splunk_search.nw > core.clj
\section{Introduction}
Two months ago I was complaining that I didn't have a job, and then
suddenly
\nwanchorto{http://www.elfsternberg.com/2013/03/14/gave-lightning-talk/}{I
had one}. I work for \nwanchorto{http://splunk.com}{Splunk} now, an
enterprise-level machine data analysis company. Their core product is
pretty magical: you can shove just about any logfile, real-time
stream, script-generated datasource, or anything else at it, and then
interrogate the data to understand and monitor your networks, server
farms, client connections, whatever. It does an amazing job of
correlating data through little more than keyword observation. There's
even a \nwanchorto{http://www.splunk.com/download}{free version},
which, while limited to a half-gig of data, is a good way to start off
taking apart your weblog files.
They have me niftying up the third-party web framework, the thing
large companies use to integrate the data we collect with their own
network intelligence dashboards, visualization systems, whatever.
It's all my kind of thing: Python back-end, Javascript/Backbone/jQuery
front-end stuff, lots of clever closures and event handling.
But I decided, for entertainment purposes only, to learn more about a
part of the system I know almost nothing about: the Java SDK. And it
wasn't enough to work in a language I don't know (since I don't know
Java) with SDKs I've never seen before. I had to go make it work with
a language that, to the best of my knowledge, no one has ever
demonstrated compatibility with Splunk before. I had to make it work
in \nwanchorto{http://clojure.org/}{Clojure}.
Yes, I'm a Hipster Hacker. I mean, come on, if you're not making it
difficult for yourself, what is education for, anyway?
\subsection{Disclaimer}
I am a newbie to all of this. Clojure, Splunk, even Java. This is
entirely raw and beginner level stuff. I did this for my own
edification. This code in no way represents the state of the art at
Splunk. It's definitely not warranted in any way by me or anyone
else, and it's licensed under the Apache Public License, V2. Use at
your own risk. Don't make me go ALL CAPS on you.
\subsection{Literate Program}
A note: this article was written with the
\nwanchorto{http://en.wikipedia.org/wiki/Literate_programming}{Literate
Programming} toolkit
\nwanchorto{http://www.cs.tufts.edu/~nr/noweb/}{Noweb}. Where you see
something that looks like \textless \textless this \textgreater \textgreater, it's a placeholder for code
described elsewhere in the document. Placeholders with an equal sign
at the end of them indicate the place where that code is defined. The
link (U-\textgreater) indicates that the code you're seeing is used later in the
document, and (\textless-U) indicates it was used earlier but is being defined
here.
\subsection{Install everything}
First, install Clojure, Leiningen, the Splunk free server, and the
\nwanchorto{http://dev.splunk.com/view/splunk-sdk-java/SP-CAAAECN}{Splunk
Java SDK}. Start up the free server and give it some data. I gave it
the dataset from earthquake.usgs.gov (``earthquakes greater than
Richter 2.5 in magnitude over the past 30 days''), but you can give it
whatever you want.
Completing the installation is a bit tricky because this example
relies on both the Splunk Java SDK, which is not in any Clojure or
Maven repos, and the Splunk Java SDK's Command utility class, which
Splunk has provided as a helpful tool for unpacking the command line
and for understanding the Splunk command file (.splunkrc on most Linux
and Mac boxes). You'll have to download the Splunk Java SDK yourself
and install the Splunk JAR file. You'll also have to create, from the
root of this project directory, resources/com/splunk, and deposit
Command.class (which can be found in the SDK's tree somewhere) in the
newly created directory.
I used lein localrepo to install the Splunk JAR file. It seems to
have worked for me.
\subsection{Create the project}
In your workspace directory, whatever you call it, build a new project
and call it \texttt{splunksearch}.
\texttt{lein new splunksearch}
Open the new directory and create a new directory, \texttt{lib}. Find
the Splunk jar file in the SDK, and copy it into \texttt{lib}. Edit
project.clj so it looks something like this:
<<project cli rev 1>>=
(defproject splunksearch "0.1.0"
:description "SplunkSearch: An implementation of the Splunk Search example program in Clojure"
:url "https://github.com/elfsternberg/splunk-search"
:license {:name "Apache Public Licence 2.0"
:url "http://www.apache.org/licenses/LICENSE-2.0"}
:dependencies [
[org.clojure/clojure "1.4.0"]
[commons-cli "1.2"]
[gson "2.1"]
[opencsv "2.3"]]
:plugins [[lein-localrepo "0.4.1"]]
:main splunksearch.core)
@
Run \texttt{lein deps} to install your dependencies.
Then, use the \texttt{localrepo} command to install the splunk jar in
your Maven repository:
\texttt{lein localrepo coords splunk-1.1.jar | xargs lein localrepo install}
Now, edit the project file to show your new dependency:
<<project cli rev 2>>=
(defproject splunksearch "0.1.0"
:description "SplunkSearch: An implementation of the Splunk Search example program in Clojure"
:url "https://github.com/elfsternberg/splunk-search"
:license {:name "Apache Public Licence 2.0"
:url "http://www.apache.org/licenses/LICENSE-2.0"}
:dependencies [
[org.clojure/clojure "1.4.0"]
[commons-cli "1.2"]
[gson "2.1"]
[opencsv "2.3"]
[splunk "1.1"]]
:plugins [[lein-localrepo "0.4.1"]]
:main splunksearch.core)
@
Now you're ready to begin. Fun, huh?
\subsection{Create a simple query}
The Splunk Java SDK has its own custom methods for dealing with
arguments from the command line, from your run commands file (all
those files in your home directory that end in ``rc'' on Linux and
sometimes Mac OS), and arguments to send to the remote Splunk server.
We need two sets of arguments, one to define access to the service,
and one to define the output mode of the data we expect to get back.
First, we have to create the namespace and import everything we're
going to need:
<<create the namespace>>=
(ns splunksearch.core
(:require [clojure.java.io :refer :all])
(:import (com.splunk Service ServiceArgs Args ResultsReaderJson ResultsReaderCsv ResultsReaderXml Event))
(:import (com.splunk Command))
(:import (java.io InputStreamReader OutputStreamWriter)))
@
I pretty much followed the code in order, a list of procedures.
Remember, this document serves mostly as a stream-of-consciousness
``this is what I learned about Clojure while wrestling with Splunk''
history of my project.
The Splunk \texttt{Command} class requires a series of definitions
added, and then it attempts to parse the command line. Command
instantiates via a static factory method, so that's accessed with a
slash; the \texttt{doto} macro allows me to access the created object
repeatedly, passing methods and arguments, and guarantees the object
created is returned regardless of the return object of the last
method. Clojure args have to be coerced via \texttt{into-array} into
a Java array for Java-based args parsers to make sense of them.
<<parse command arguments>>=
(defn build-splunk-command [args]
(let [command
(doto (Command/splunk "search")
(.addRule "count" Integer
"The Maximum Number of results to return (default: 100)")
(.addRule "earliest_time" String
"Search earliest time")
(.addRule "field_list" String
"A comma-separated list of the fields to return")
(.addRule "latest_time" String
"Search latest time")
(.addRule "offset" Integer
"The first result (inclusive) from which to begin returning data. (default: 0)")
(.addRule "output" String
"Which search results to output {events, results, preview, searchlog, summary, timeline} (default: results)")
(.addRule "output_mode" String
"Search output format {csv, raw, json, xml} (default: xml)")
(.addRule "reader"
"Use ResultsReader")
(.addRule "status_buckets" Integer
"Number of status buckets to use for search (default: 0)")
(.addRule "verbose"
"Display search progress")
(.parse (into-array String args)))]
(if (not= (count (.args command)) 1)
(Command/error "Search expression required" nil))
command))
@
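The three interop idioms used above can be tried at a REPL without
Splunk installed. This is only a sketch under stated assumptions: JDK
classes stand in for the Splunk \texttt{Command} class, and the
function names are made up for illustration.
<<sketch: static calls, doto, and into-array>>=
;; Sketch only: JDK classes stand in for com.splunk.Command.
(defn static-call-demo []
  ;; Class/method calls a static method, just as Command/splunk does.
  (Integer/parseInt "100"))

(defn doto-demo []
  ;; doto passes the new object to each call in turn and returns the
  ;; object itself, not the boolean that .add returns.
  (doto (java.util.ArrayList.)
    (.add "count")
    (.add "offset")))

(defn into-array-demo [args]
  ;; A Clojure sequence is not a Java array; into-array builds a String[].
  (into-array String args))
@
Calling \texttt{(doto-demo)} hands back the list with both entries in
it, which is the same reason \texttt{build-splunk-command} gets the
\texttt{Command} object back after the final \texttt{.parse} call.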
My example code was taken from the \texttt{Search/Program.java} file,
provided with the SDK. That program had a ton of local variables to
control search generation, stream configuration, reader and output
generation. I decided that all had to go into a simple map, which I
could then refer to at any time.
Yes, the names above are repeated here. That's a bit of a code smell,
I think.
<<build the argument map>>=
(defn build-argument-map [command]
(let [opts (.opts command)
ruleset [["count" 100]
["earliest_time" nil ]
["reader" false]
["verbose" false]
["field_list" nil ]
["latest_time" nil ]
["offset" 0]
["output" "results"]
["output_mode" "xml"]]]
(into {} (for [[k v] ruleset] [k (if (.containsKey opts k) (.get opts k) v)]))))
@
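The defaults-merging step can be exercised on its own. A minimal
sketch, assuming the parsed options behave like a
\texttt{java.util.HashMap} (which the \texttt{.containsKey} and
\texttt{.get} calls above suggest); the names \texttt{merge-defaults},
\texttt{example-opts}, and \texttt{example-map} are hypothetical.
<<sketch: merging parsed options with defaults>>=
;; Sketch only: a java.util.HashMap stands in for (.opts command).
(defn merge-defaults [opts ruleset]
  ;; For each [key default] pair, prefer the parsed option if present.
  (into {} (for [[k v] ruleset]
             [k (if (.containsKey opts k) (.get opts k) v)])))

(def example-opts
  (doto (java.util.HashMap.)
    (.put "count" 25)
    (.put "output_mode" "csv")))

(def example-map
  (merge-defaults example-opts
                  [["count" 100] ["offset" 0] ["output_mode" "xml"]]))
@
Since \texttt{into} builds the whole map eagerly, the result is an
ordinary Clojure map that can be called as a function of its keys,
which is how the later chunks use it.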
Now that I have my argument map, I need to process it into Args
objects understood by the Splunk Service class. I don't know about
you, but this all feels just a bit fiddly:
<<build the queryargs object>>=
(defn build-splunk-queryargs [argument-map]
(let [rulelist ["earliest_time" "field_list" "latest_time" "status_buckets"]
args (Args.)]
(doseq [fieldname rulelist]
(if (argument-map fieldname)
(.put args fieldname (argument-map fieldname))))
args))
@
Later, I have to build the output Args object, which the Splunk
Service uses to configure the output. Obviously. It's the exact
same code, and it wasn't cut and paste. I'm thinking this needs an
abstraction.
<<build the output args object>>=
(defn build-splunk-output-args [argument-map]
(let [rulelist ["count" "offset" "output_mode"]
args (Args.)]
(doseq [fieldname rulelist]
(if (argument-map fieldname)
(.put args fieldname (argument-map fieldname))))
args))
@
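One possible shape for that abstraction is sketched here. It is not
part of the tangled program: \texttt{copy-fields} is a hypothetical
helper, and a \texttt{java.util.LinkedHashMap} stands in for
\texttt{Args} on the assumption that \texttt{Args} accepts
\texttt{.put} like any Java map.
<<sketch: one helper for both args builders>>=
;; Hypothetical helper: copy the named fields that have a truthy value
;; from argument-map into any mutable java.util.Map, then return it.
(defn copy-fields [target argument-map fieldnames]
  (doseq [fieldname fieldnames
          :let [value (argument-map fieldname)]
          :when value]
    (.put target fieldname value))
  target)

;; With the SDK on the classpath, a builder might shrink to
;;   (copy-fields (Args.) argument-map ["count" "offset" "output_mode"])
;; Here a LinkedHashMap stands in for Args.
(def example-output-args
  (copy-fields (java.util.LinkedHashMap.)
               {"count" 100 "offset" 0 "output_mode" "xml" "latest_time" nil}
               ["count" "offset" "output_mode" "latest_time"]))
@
Like the originals, this skips any field whose value is nil or false.
Note that 0 is truthy in Clojure, so an offset of 0 is still copied.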
Now that we have everything, \textit{le sigh}, it's time to pass it
all to the server. I don't even care much about the Service object
once I've instantiated it. I'm using it for a single query, a single
Job on the server. And it's a bit redundant to pass both the command
and the argument map, but I'm using the map to configure other program
behaviors, so it stays as a separate copy of the command content.
Between the double-dot macro and the chain operators, I'm seeing a
lot of Haskellian inspiration in Clojure.
<<send the query to the server>>=
(defn build-splunk-job [command argument-map]
(let [queryargs (build-splunk-queryargs argument-map)
service (Service/connect (.opts command))
job (.. service (getJobs) (create (first (.args command)) queryargs))]
(while (not (.isDone job))
(if (argument-map "verbose")
(println (format "\n%03.1f%% done -- %d scanned -- %d matched -- %d results"
(* (.getDoneProgress job) 100.0)
(.getScanCount job)
(.getEventCount job)
(.getResultCount job))))
(Thread/sleep 1000))
job))
@
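The double-dot form is easier to see on a class everyone has. A
minimal sketch, with a \texttt{String} standing in for the
\texttt{Service} object; both function names are made up.
<<sketch: the double-dot macro>>=
;; Sketch only: a String stands in for the Service object.
(defn double-dot-demo [s]
  ;; Reads left to right, like s.trim().toUpperCase() in Java.
  (.. s (trim) (toUpperCase)))

(defn nested-demo [s]
  ;; The same two calls written inside-out, without the macro.
  (.toUpperCase (.trim s)))
@
Both functions return the same string; the macro only changes the
order in which the calls are written.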
There are a number of different things we can get from the server.
The correct setting is usually ``preview'', meaning ``show me any
valid data you've collected, even if it's not complete.'' Preview
will return everything even if the job \textit{is} complete, so it's
safe to use at all times. But you can see the options below. Here,
using the argument map and the instructions to build the output args
object, I ask for a stream (an actual Java \texttt{InputStream}) from
the server of the data I want:
<<request a stream from the server>>=
(defn get-splunk-stream [job argument-map]
(let [outputargs (build-splunk-output-args argument-map)
output (argument-map "output")]
(case output
"events" (.getEvents job outputargs)
"results" (.getResults job outputargs)
"preview" (.getPreview job outputargs)
"searchlog" (.getSearchLog job outputargs)
"summary" (.getSummary job outputargs)
"timeline" (.getTimeline job outputargs))))
@
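A \texttt{case} form with no default clause throws an
\texttt{IllegalArgumentException} when nothing matches, so a mistyped
output name fails with a fairly terse message. A minimal sketch of
adding an explicit default; \texttt{stream-kind} and the keywords it
returns are hypothetical and stand in for the real \texttt{Job} calls.
<<sketch: case with a default clause>>=
;; Sketch only: keywords stand in for the streams the Job would return.
(defn stream-kind [output]
  (case output
    "results" :final-results
    "preview" :partial-results
    ;; A trailing expression with no test value is the default clause.
    (throw (IllegalArgumentException.
            (str "Unknown output type: " output)))))
@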
There are two kinds of read operations: we can use the Splunk
ResultsReader, which parses the content and turns it into a
hashmap of keys and values, or we can just get the raw data. I'm
going to deal with the readers first.
Instantiating a reader irked me. I could not figure out how to make
Java constructors act like first-class objects; I wanted to be able to
pick a class and pass it back to the calling function, which could
instantiate it on the fly after dereferencing it. I'm sure there are
Clojure gurus who can help with that:
<<construct a reader of the appropriate type>>=
(defn construct-reader [stream output-mode]
(case output-mode
"xml" (ResultsReaderXml. stream)
"json" (ResultsReaderJson. stream)
"csv" (ResultsReaderCsv. stream)))
@
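One way to get something close to first-class constructors is to wrap
each one in an ordinary function and pass the function around. This
is a sketch of that idea, not necessarily what a Clojure guru would
recommend: \texttt{StringBuilder} and \texttt{StringBuffer} stand in
for the reader classes, and \texttt{constructors} and
\texttt{construct} are made-up names.
<<sketch: constructors wrapped in functions>>=
;; Sketch only: StringBuilder and StringBuffer stand in for the
;; ResultsReader classes. Each map value is an ordinary function, so
;; it can be returned, stored, and called later.
(def constructors
  {"builder" (fn [^String s] (StringBuilder. s))
   "buffer"  (fn [^String s] (StringBuffer. s))})

(defn construct [kind s]
  ;; Look the function up by name, then call it like any other.
  ((constructors kind) s))

;; For the readers, an entry might look like
;;   "xml" (fn [stream] (ResultsReaderXml. stream))
@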
This returns a reader-ready function to print out content based upon
the output mode. The \texttt{when} macro for handling loops was a
real eye-opener.
<<make a stream reader with a mode parser>>=
(defn get-streamtype-reader [output-mode]
(fn [stream]
(with-open [reader (construct-reader stream output-mode)]
(loop [e (.getNextEvent reader)]
(when (not= e nil)
(println "EVENT:********")
(doseq [k (seq (.keySet e))]
(printf "%s ---> %s\n" k (.get e k)))
(recur (.getNextEvent reader)))))))
@
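The read-until-nil loop can be tried without a Splunk stream. A
minimal sketch, assuming only the JDK: a \texttt{java.util.LinkedList}
stands in for the reader because its \texttt{.poll} method returns nil
once it is empty, and \texttt{drain!} and \texttt{handle} are
hypothetical names.
<<sketch: loop, when, and recur>>=
;; Sketch only: a LinkedList stands in for the results reader.
(defn drain! [queue handle]
  (loop [e (.poll queue)]
    ;; when has no else branch, so the loop simply ends (returning nil)
    ;; the first time .poll hands back nil.
    (when (not= e nil)
      (handle e)
      (recur (.poll queue)))))
@
Passing a function such as \texttt{println} as \texttt{handle} prints
each element in order, much as the reader above prints each event.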
This returns a reader-ready function to print out the raw content.
<<make a generic stream reader>>=
(defn generic-reader []
(fn [stream]
(with-open [reader (InputStreamReader. stream "UTF8")
writer (OutputStreamWriter. System/out)]
(try
(let [buffer (char-array 1024)]
(while true
(let [count (.read reader buffer)]
(if (== count -1) (throw (Exception. "EOF")))
(.write writer buffer 0 count))))
(catch Exception e nil)))))
@
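The copy loop above ends by throwing and swallowing an exception at
end of input. The same loop/recur pattern used for the event reader
could end it without one. A sketch under stated assumptions: a
\texttt{StringReader} and \texttt{StringWriter} stand in for the
Splunk stream and \texttt{System/out}, and \texttt{copy-chars} is a
hypothetical name.
<<sketch: copying a stream without an exception>>=
;; Sketch only: any java.io.Reader and java.io.Writer will do.
(defn copy-chars [reader writer]
  (let [buffer (char-array 1024)]
    (loop [n (.read reader buffer)]
      ;; .read returns -1 at end of input, which ends the loop.
      (when-not (== n -1)
        (.write writer buffer 0 n)
        (recur (.read reader buffer))))
    (.flush writer)
    writer))
@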
I'm completely sure that there's a better way to achieve the symmetry
I wanted, and that the function that takes no argument above in ``make
a generic stream reader'' is completely gratuitous, but I kinda liked
the way this one came out.
<<get the stream reader>>=
(defn get-reader [command argument-map]
(let [use-reader (.. command opts (containsKey "reader"))]
(if use-reader
(get-streamtype-reader (argument-map "output_mode"))
(generic-reader))))
@
This has to reveal just how bleedingly new I am to Clojure, because
I've never seen a ``let cascade'' like this before in anyone else's
code. But it works!
<<main>>=
(defn -main [& args]
(let [command (build-splunk-command args)]
(let [argument-map (build-argument-map command)]
(let [job (build-splunk-job command argument-map)]
(let [stream (get-splunk-stream job argument-map)]
(let [reader (get-reader command argument-map)]
(reader stream)))))))
@
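For what it's worth, a single \texttt{let} binds its names in order,
and each binding can see the ones before it, so a cascade like this
can usually be flattened. A minimal sketch with made-up functions and
plain arithmetic in place of the Splunk calls:
<<sketch: nested let versus one let>>=
;; Sketch only: arithmetic stands in for the Splunk calls.
(defn nested [x]
  (let [a (inc x)]
    (let [b (* a 2)]
      (let [c (str b)]
        c))))

(defn flat [x]
  ;; Bindings are evaluated top to bottom; b can use a, c can use b.
  (let [a (inc x)
        b (* a 2)
        c (str b)]
    c))
@
Both functions return the same value for the same input.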
And the entire program ends up looking like
<<core.clj>>=
<<create the namespace>>
<<parse command arguments>>
<<build the argument map>>
<<build the queryargs object>>
<<build the output args object>>
<<send the query to the server>>
<<request a stream from the server>>
<<construct a reader of the appropriate type>>
<<make a stream reader with a mode parser>>
<<make a generic stream reader>>
<<get the stream reader>>
<<main>>
@
And that's it.
You can now run the program with lein:
<<command_line>>=
lein run --output_mode csv "search magnitude > 4.5"
@
Assuming you have something in your Splunk database with a
``magnitude,'' you should get a CSV dump of all the fields related to
it.
Now you know how to talk to the basic Splunk service using Clojure.
I'm sure if you're comfortable with Clojure none of the Java access
weirdness surprised you at all, but for me this was a pretty good
exercise. At times I felt like I was writing in a ``fantasy LISP'', a
Lisp that actually, you know, had \textit{real world} applicability,
and that should have done the things I would have expected of a LISP.
That fantasy LISP was pretty close to the real deal; it only took me a
few hours of \texttt{lein run} sessions to knock out all the bugs.

project.clj Normal file

@ -0,0 +1,14 @@
(defproject splunksearch "0.1.0"
:description "SplunkSearch: An implementation of the Splunk Search example program in Clojure"
:url "https://github.com/elfsternberg/splunk-search"
:license {:name "Apache Public Licence 2.0"
:url "http://www.apache.org/licenses/LICENSE-2.0"}
:dependencies [
[org.clojure/clojure "1.4.0"]
[commons-cli "1.2"]
[gson "2.1"]
[opencsv "2.3"]
[splunk "1.1"]]
:plugins [[lein-localrepo "0.4.1"]]
:main splunksearch.core)


@ -1,4 +1,4 @@
(ns simple.core
(ns splunksearch.core
(:require [clojure.java.io :refer :all])
(:import (com.splunk Service ServiceArgs Args ResultsReaderJson ResultsReaderCsv ResultsReaderXml Event))
(:import (com.splunk Command))


@ -1,6 +1,6 @@
(ns simple.core-test
(ns splunksearch.core-test
(:use clojure.test
simple.core))
splunksearch.core))
(deftest a-test
(testing "FIXME, I fail."