diff --git a/README.md b/README.md index f853220..efdd124 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,30 @@ -# simple +# splunksearch -A Clojure library designed to ... well, that part is up to you. +A Clojure command line program that enables simple search access to a +Splunk utility. ## Usage -FIXME +This is a bit tricky because this program relies on both the Splunk +Java SDK, which is not in Clojure or Maven repos, and the Splunk Java +SDK's Command utility class. You'll have to download the Splunk Java +SDK yourself and install the Splunk JAR file. You'll also have to +create, from the root of this project directory, resources/com/splunk, +and deposit Command.class (which can be found in the SDK's tree +somewhere) in the newly created directory. + +I used lein localrepo to install the Splunk JAR file. It seems to +have worked for me. + +Once all that's done, *and* you've got an instance of Splunk up and +running, *and* you've successfully configured your .splunkrc file, you +can try: + +lein run 'search ' ## License -Copyright © 2013 FIXME +Copyright © 2013 Elf M. Sternberg -Distributed under the Eclipse Public License, the same as Clojure. +Distributed under the Apache Public License, under the same terms as +other Splunk software. 
diff --git a/doc/intro.md b/doc/intro.md deleted file mode 100644 index ed9211c..0000000 --- a/doc/intro.md +++ /dev/null @@ -1,3 +0,0 @@ -# Introduction to simple - -TODO: write [great documentation](http://jacobian.org/writing/great-documentation/what-to-write/) diff --git a/doc/splunk_search.nw b/doc/splunk_search.nw new file mode 100644 index 0000000..3330cee --- /dev/null +++ b/doc/splunk_search.nw @@ -0,0 +1,452 @@ + % -*- Mode: noweb; noweb-code-mode: clojure-mode ; noweb-doc-mode: latex-mode -*- +\documentclass{article} +\usepackage{noweb} +\usepackage[T1]{fontenc} +\usepackage{hyperref} +\begin{document} + +% Generate code and documentation with: +% +% noweave -filter l2h -delay -x -html backbonestore.nw | htmltoc > backbonestore.html +% notangle -Rstore.js backbonestore.nw > store.js +% notangle -Rindex.html backbonestore.nw > index.html + +\section{Introduction} + +Two months ago I was complaining that I didn't have a job, and then +suddenly +\nwanchorto{http://www.elfsternberg.com/2013/03/14/gave-lightning-talk/}{I + had one}. I work for \nwanchorto{http://splunk.com}{Splunk} now, an +enterprise-level machine data analysis company. Their core product is +pretty magical: you can shove just about any logfile, real-time +stream, script-generated datasource, or anything else at it, and then +interrogate the data to understand and monitor your networks, server +farms, client connections, whatever. It does an amazing job of +correlating data through little more than keyword observation. There's +even a \nwanchorto{http://www.splunk.com/download}{free version}, +which, while limited to a half-gig of data, is a good way to start off +taking apart your weblog files. + +They have me niftying up the third-party web framework, the thing +large companies use to integrate the data we collect with their own +network intelligence dashboards, visualization systems, whatever. 
+It's all my kind of thing: Python back-end, JavaScript/Backbone/jQuery +front-end stuff, lots of clever closures and event handling. + +But I decided, for entertainment purposes only, to learn more about a +part of the system I know almost nothing about: the Java SDK. And it +wasn't enough to work in a language I don't know (since I don't know +Java) with SDKs I've never seen before. I had to go make it work with +a language that, to the best of my knowledge, no one has ever +demonstrated compatibility with Splunk before. I had to make it work +in \nwanchorto{http://clojure.org/}{Clojure}. + +Yes, I'm a Hipster Hacker. I mean, come on, if you're not making it +difficult for yourself, what is education for, anyway? + +\subsection{Disclaimer} + +I am a newbie to all of this. Clojure, Splunk, even Java. This is +entirely raw and beginner-level stuff. I did this for my own +edification. This code in no way represents the state of the art at +Splunk. It's definitely not warranted in any way by me or anyone +else, and it's licensed under the Apache Public License, V2. Use at +your own risk. Don't make me go ALL CAPS on you. + +\subsection{Literate Program} + +A note: this article was written with the +\nwanchorto{http://en.wikipedia.org/wiki/Literate_programming}{Literate + Programming} toolkit +\nwanchorto{http://www.cs.tufts.edu/~nr/noweb/}{Noweb}. Where you see +something that looks like \textless \textless this \textgreater \textgreater, it's a placeholder for code +described elsewhere in the document. Placeholders with an equal sign +at the end of them indicate the place where that code is defined. The +link (U-\textgreater) indicates that the code you're seeing is used later in the +document, and (\textless-U) indicates it was used earlier but is being defined +here. + +\subsection{Install everything} + +First, install Clojure, Leiningen, the Splunk free server, and the +\nwanchorto{http://dev.splunk.com/view/splunk-sdk-java/SP-CAAAECN}{Splunk +Java SDK}. 
Start up the free server and give it some data. I gave it +the dataset from earthquake.usgs.gov ("earthquakes greater than +Richter 2.5 in magnitude over the past 30 days"), but you can give it +whatever you want. + +Completing the installation is a bit tricky because this example +relies on both the Splunk Java SDK, which is not in any Clojure or +Maven repos, and the Splunk Java SDK's Command utility class, which +Splunk has provided as a helpful tool for unpacking the command line +and for understanding the Splunk command file (.splunkrc on most Linux +and Mac boxes). You'll have to download the Splunk Java SDK yourself +and install the Splunk JAR file. You'll also have to create, from the +root of this project directory, resources/com/splunk, and deposit +Command.class (which can be found in the SDK's tree somewhere) in the +newly created directory. + +I used lein localrepo to install the Splunk JAR file. It seems to +have worked for me. + +\subsection{Create the project} + +In your workspace directory, whatever you call it, build a new project +and call it \texttt{splunksearch}. + +\texttt{lein new splunksearch} + +Open the new directory and create a subdirectory, \texttt{lib}. Find +the Splunk jar file in the SDK, and copy it into \texttt{lib}. Edit +project.clj so it looks something like this: + +<>= +(defproject splunksearch "0.1.0" + :description "SplunkSearch: An implementation of the Splunk Search example program in Clojure" + :url "https://github.com/elfsternberg/splunk-search" + :license {:name "Apache Public Licence 2.0" + :url "http://www.apache.org/licenses/LICENSE-2.0"} + :dependencies [ + [org.clojure/clojure "1.4.0"] + [commons-cli "1.2"] + [gson "2.1"] + [opencsv "2.3"]] + :plugins [[lein-localrepo "0.4.1"]] + :main splunksearch.core) +@ + +Run \texttt{lein deps} to install your dependencies. 
+ +Then, use the \texttt{localrepo} command to install the Splunk jar in +your Maven repository: + +\texttt{lein localrepo coords splunk-1.1.jar | xargs lein localrepo install} + +Now, edit the project file to show your new dependency: + +<>= +(defproject splunksearch "0.1.0" + :description "SplunkSearch: An implementation of the Splunk Search example program in Clojure" + :url "https://github.com/elfsternberg/splunk-search" + :license {:name "Apache Public Licence 2.0" + :url "http://www.apache.org/licenses/LICENSE-2.0"} + :dependencies [ + [org.clojure/clojure "1.4.0"] + [commons-cli "1.2"] + [gson "2.1"] + [opencsv "2.3"] + [splunk "1.1"]] + :plugins [[lein-localrepo "0.4.1"]] + :main splunksearch.core) +@ + +Now you're ready to begin. Fun, huh? + +\subsection{Create a simple query} + +The Splunk Java SDK has its own custom methods for dealing with +arguments from the command line, from your run commands file (all +those files in your home directory that end in ``rc'' on Linux and +sometimes Mac OS), and arguments to send to the remote Splunk server. +We need two sets of arguments, one to define access to the service, +and one to define the output mode of the data we expect to get back. + +First, we have to create the namespace and import everything we're +going to need: + +<>= +(ns splunksearch.core + (:require [clojure.java.io :refer :all]) + (:import (com.splunk Service ServiceArgs Args ResultsReaderJson ResultsReaderCsv ResultsReaderXml Event)) + (:import (com.splunk Command)) + (:import (java.io InputStreamReader OutputStreamWriter))) + +@ + +I pretty much followed the code in order, a list of procedures. +Remember, this document serves mostly as a stream-of-consciousness +``this is what I learned about Clojure while wrestling with Splunk'' +history of my project. + +The Splunk \texttt{Command} class needs a series of rule definitions +added before it attempts to parse the command line. 
Command +instantiates via a static factory method, so that's accessed with a +slash; the \texttt{doto} macro allows me to access the created object +repeatedly, passing methods and arguments, and guarantees the object +created is returned regardless of the return object of the last +method. Clojure args have to be coerced via \texttt{into-array} into +a Java array for Java-based args parsers to make sense of them. + +<>= +(defn build-splunk-command [args] + (let [command + (doto (Command/splunk "search") + (.addRule "count" Integer + "The Maximum Number of results to return (default: 100)") + (.addRule "earliest_time" String + "Search earliest time") + (.addRule "field_list" String + "A comma-separated list of the fields to return") + (.addRule "latest_time" String + "Search latest time") + (.addRule "offset" Integer + "The first result (inclusive) from which to begin returning data. (default: 0)") + (.addRule "output" String + "Which search results to output {events, results, preview, searchlog, summary, timeline} (default: results)") + (.addRule "output_mode" String + "Search output format {csv, raw, json, xml} (default: xml)") + (.addRule "reader" + "Use ResultsReader") + (.addRule "status_buckets" Integer + "Number of status buckets to use for search (default: 0)") + (.addRule "verbose" + "Display search progress") + (.parse (into-array String args)))] + (if (not= (count (.args command)) 1) + (Command/error "Search expression required" nil)) + command)) + +@ + +My example code was taken from the \texttt{Search/Program.java} file, +provided with the SDK. That program had a ton of local variables to +control search generation, stream configuration, reader and output +generation. I decided that all had to go into a simple map, which I +could then refer to at any time. + +Yes, the names above are repeated here. That's a bit of a code smell, +I think. 
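
The single-map idea can be tried without the SDK at all. What follows is only a minimal sketch under stated assumptions: \texttt{option-rules}, \texttt{merge-parsed-options} and \texttt{parsed-opts} are hypothetical names invented for this illustration, and a plain \texttt{java.util.HashMap} stands in for whatever \texttt{(.opts command)} hands back.

```clojure
(import 'java.util.HashMap)

;; Hypothetical rule table: option name paired with its default value.
(def option-rules
  [["count" 100]
   ["offset" 0]
   ["output" "results"]
   ["output_mode" "xml"]
   ["verbose" false]])

(defn merge-parsed-options
  "Build a Clojure map from the rule table, preferring any value
  found in `parsed` (a java.util.Map) over the default."
  [rules parsed]
  (into {}
        (for [[k default] rules]
          [k (if (.containsKey parsed k) (.get parsed k) default)])))

;; Stand-in for the parsed command line (assumption: not the real Command).
(def parsed-opts
  (doto (HashMap.)
    (.put "output_mode" "csv")))

(def argument-map (merge-parsed-options option-rules parsed-opts))

(println (argument-map "output_mode"))
```

Anything the user supplied wins; everything else falls back to the default in the table, so the rest of the program only ever has to consult one map.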
+ +<>= + +(defn build-argument-map [command] + (let [opts (.opts command) + ruleset [["count" 100] + ["earliest_time" nil ] + ["reader" false] + ["verbose" false] + ["field_list" nil ] + ["latest_time" nil ] + ["status_buckets" nil ] + ["offset" 0] + ["output" "results"] + ["output_mode" "xml"]]] + (into {} (for [[k v] ruleset] [k (if (.containsKey opts k) (.get opts k) v)])))) +@ + +Now that I have my argument map, I need to process it into Args +objects understood by the Splunk Service class. I don't know about +you, but this all feels just a bit fiddly: + +<>= +(defn build-splunk-queryargs [argument-map] + (let [rulelist ["earliest_time" "field_list" "latest_time" "status_buckets"] + args (Args.)] + (doseq [fieldname rulelist] + (if (argument-map fieldname) + (.put args fieldname (argument-map fieldname)))) + args)) + +@ + +Later, I have to build the output Args object, which the Splunk +Service uses to configure the output. Obviously. It's the exact +same code, and it wasn't cut and paste. I'm thinking this needs an +abstraction. + +<>= + +(defn build-splunk-output-args [argument-map] + (let [rulelist ["count" "offset" "output_mode"] + args (Args.)] + (doseq [fieldname rulelist] + (if (argument-map fieldname) + (.put args fieldname (argument-map fieldname)))) + args)) + +@ + +Now that we have everything, \textit{le sigh}, it's time to pass it +all to the server. I don't even care much about the Service object +once I've instantiated it. I'm using it for a single query, a single +Job on the server. And it's a bit redundant to pass both the command +and the argument map, but I'm using the map to configure other program +behaviors, so it stays as a separate copy of the command content. +Between the double-dot argument and the chain operators, I'm seeing a +lot of Haskellian inspiration in Clojure. + +<>= +(defn build-splunk-job [command argument-map] + (let [queryargs (build-splunk-queryargs argument-map) + service (Service/connect (.opts command)) + job (.. 
service (getJobs) (create (first (.args command)) queryargs))] + (while (not (.isDone job)) + (if (argument-map "verbose") + (println (format "\n%03.1f%% done -- %d scanned -- %d matched -- %d results" + (* (.getDoneProgress job) 100.0) + (.getScanCount job) + (.getEventCount job) + (.getResultCount job)))) + (Thread/sleep 1000)) + job)) +@ + +There are a number of different things we can get from the server. +The correct setting is usually ``preview'', meaning ``show me any +valid data you've collected, even if it's not complete.'' Preview +will return everything even if the job \textit{is} complete, so it's +safe to use at all times. But you can see the options below. Here, +using the argument map and the instructions to build the output args +object, I ask for a stream (an actual Java \texttt{InputStream}) from +the server of the data I want: + +<>= +(defn get-splunk-stream [job argument-map] + (let [outputargs (build-splunk-output-args argument-map) + output (argument-map "output")] + (case output + "results" (.getResults job outputargs) + "preview" (.getPreview job outputargs) + "searchlog" (.getSearchLog job outputargs) + "summary" (.getSummary job outputargs) + "timeline" (.getTimeline job outputargs) + ))) + +@ + +There are two kinds of read operations: we could use the Splunk +ResultsReader, which parses the content for you and turns it into a +hashmap of keys and values, or you can just get the raw data. I'm +going to deal with the readers first. + +Instantiating a reader irked me. I could not figure out how to make +Java constructors act like first-class objects; I wanted to be able to +pick a class and pass it back to the calling function, which could +instantiate it on the fly after dereferencing it. I'm sure there are +Clojure gurus who can help with that: + +<>= + +(defn construct-reader [stream output-mode] + (case output-mode + "xml" (ResultsReaderXml. stream) + "json" (ResultsReaderJson. stream) + "csv" (ResultsReaderCsv. 
stream))) + +@ + +This returns a reader-ready function to print out content based upon +the output mode. The \texttt{when} macro for handling loops was a +real eye-opener. + +<>= + +(defn get-streamtype-reader [output-mode] + (fn [stream] + (with-open [reader (construct-reader stream output-mode)] + (loop [e (.getNextEvent reader)] + (when (not= e nil) + (println "EVENT:********") + (doseq [k (seq (.keySet e))] + (printf "%s ---> %s\n" k (.get e k))) + (recur (.getNextEvent reader))))))) +@ + + +This returns a reader-ready function to print out the raw content. + +<>= + +(defn generic-reader [] + (fn [stream] + (with-open [reader (InputStreamReader. stream "UTF8") + writer (OutputStreamWriter. System/out)] + (try + (let [buffer (char-array 1024)] + (while true + (let [count (.read reader buffer)] + (if (== count -1) (throw (Exception. "EOF"))) + (.write writer buffer 0 count)))) + (catch Exception e nil))))) +@ + +I'm completely sure that there's a better way to achieve the symmetry +I wanted, and that the function that takes no argument above in ``make +a generic stream reader'' is completely gratuitous, but I kinda liked +the way this one came out. + +<>= + +(defn get-reader [command argument-map] + (let [use-reader (.. command opts (containsKey "reader"))] + (if use-reader + (get-streamtype-reader (argument-map "output_mode")) + (generic-reader)))) + +@ + +This has to reveal just how bleedingly new I am to Clojure, because +I've never seen a ``let cascade'' like this before in anyone else's +code. But it works! + +<
>= +(defn -main[& args] + (let [command (build-splunk-command args)] + (let [argument-map (build-argument-map command)] + (let [job (build-splunk-job command argument-map)] + (let [stream (get-splunk-stream job argument-map)] + (let [reader (get-reader command argument-map)] + (reader stream))))))) + +@ + +And the entire program ends up looking like + +<>= +<> + +<> + +<> + +<> + +<> + +<> + +<> + +<> + +<> + +<> + +<> + +<
> +@ + +And that's it. + +You can now run the program with lein: + +<>= +lein run --output_mode csv "search magnitude > 4.5" +@ + +Assuming you have something in your Splunk database with a +``magnitude,'' you should get a CSV dump of all the fields related to +it. + +Now you know how to talk to the basic Splunk service using Clojure. +I'm sure if you're comfortable with Clojure none of the Java access +weirdness surprised you at all, but for me this was a pretty good +exercise. At times I felt like I was writing in a ``fantasy LISP'', a +Lisp that actually, you know, had \textit{real world} applicability, +and that should have done the things I would have expected of a LISP. +That fantasy LISP was pretty close to the real deal; it only took me a +few hours of \texttt{lein run} sessions to knock out all the bugs. + + + diff --git a/project.clj b/project.clj new file mode 100644 index 0000000..f7e3f38 --- /dev/null +++ b/project.clj @@ -0,0 +1,14 @@ +(defproject splunksearch "0.1.0" + :description "SplunkSearch: An implementation of the Splunk Search example program in Clojure" + :url "https://github.com/elfsternberg/splunk-search" + :license {:name "Apache Public Licence 2.0" + :url "http://www.apache.org/licenses/LICENSE-2.0"} + :dependencies [ + [org.clojure/clojure "1.4.0"] + [commons-cli "1.2"] + [gson "2.1"] + [opencsv "2.3"] + [splunk "1.1"]] + :plugins [[lein-localrepo "0.4.1"]] + :main splunksearch.core) + diff --git a/src/simple/core.clj b/src/splunksearch/core.clj similarity index 99% rename from src/simple/core.clj rename to src/splunksearch/core.clj index df21d36..86f4562 100644 --- a/src/simple/core.clj +++ b/src/splunksearch/core.clj @@ -1,4 +1,4 @@ -(ns simple.core +(ns splunksearch.core (:require [clojure.java.io :refer :all]) (:import (com.splunk Service ServiceArgs Args ResultsReaderJson ResultsReaderCsv ResultsReaderXml Event)) (:import (com.splunk Command)) diff --git a/test/simple/core_test.clj b/test/splunksearch/core_test.clj similarity index 
60% rename from test/simple/core_test.clj rename to test/splunksearch/core_test.clj index e428c6d..451dcc8 100644 --- a/test/simple/core_test.clj +++ b/test/splunksearch/core_test.clj @@ -1,6 +1,6 @@ -(ns simple.core-test +(ns splunksearch.core-test (:use clojure.test - simple.core)) + splunksearch.core)) (deftest a-test (testing "FIXME, I fail."