User:Tsu2/elasticsearch 1.0

Elasticsearch on openSUSE (and likely SUSE)

If you arrived at this page expecting a complete introduction to Elasticsearch, Logstash and Kibana, you will likely be disappointed. This wiki page is intended to supplement the standard Logstash guides (see links below) with some foundation information and a preview of what the Logstash tutorials teach.

This page updates information about Elasticsearch and its sister applications Logstash and Kibana running on openSUSE and SUSE. Older ES required Ruby, which is no longer the case: it is now a standalone Java application that can be downloaded and run from anywhere, or installed properly. For a fast-changing project like Elasticsearch, no one should be using old versions, and there are generally few reasons not to update to the latest release. Today Elasticsearch also offers many options for installing (or not installing), including new "embedded" options.

This month (February 2014) Elasticsearch is releasing 1.0, which includes substantial improvements over the older 0.90 branch. If you are new to Elasticsearch, I highly recommend ignoring the 0.90 branch and starting with 1.0, even in these last few days before its official release.

Whichever option you choose for running Elasticsearch (and likely Logstash and Kibana), you will need to install a suitable Java runtime.
I recommend openjdk-1.7 from the openSUSE repositories, which is what I have been using.

New to Elasticsearch and Big Data?

The Tutorials

The starting point for newcomers approaching Elasticsearch is the set of Logstash tutorials (you only have to download the Logstash jar file; you don't have to install anything).
Current Logstash tutorial URLs:
http://logstash.net/docs/1.3.3/tutorials/getting-started-simple
http://logstash.net/docs/1.3.3/tutorials/getting-started-centralized
http://logstash.net/docs/1.3.3/tutorials/10-minute-walkthrough/
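Note: As of this writing, the tutorials are still based on ES 0.90, but they may be updated for 1.0 at some point. In the meantime, if you decide to run the tutorials against Elasticsearch 1.0, "indexer.conf" must be modified as follows. The reason is that the elasticsearch output in Logstash 1.3.x expects the cluster to run the same 0.90.x version of Elasticsearch that Logstash bundles, whereas elasticsearch_http talks to the REST API on port 9200 and therefore works with 1.0.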

Original indexer.conf for ES 0.90 and below

output {
  stdout { debug => true debug_format => "json"}

  elasticsearch {
    host => "127.0.0.1"
  }
}

indexer.conf for ES 1.0

output {
  stdout { debug => true debug_format => "json"}

  elasticsearch_http {
    host => "127.0.0.1"
  }
}

Running Logstash

Once you have a suitable Java installed, you can simply download Logstash and run it from any directory you choose with this command:

java -jar logstash-[version]-flatjar.jar agent -f configfile

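The command is the standard way to launch a jar file. The "agent" argument runs Logstash as an agent that listens, parses and sends data according to the instructions in the configfile. Despite the curly braces, the configfile uses Logstash's own configuration syntax, not JSON.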

The configfile is easy to read. It describes inputs (how Logstash receives data), filters (instructions to transform the data, such as parsing and applying tags) and, lastly, outputs (where the transformed data goes).
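As a sketch of that three-part structure, here is a minimal config file assuming Logstash 1.3.x. It reads lines typed into the console, attaches a tag to each event and prints the result. The tag name "typed_by_hand" is purely illustrative.

```
# Minimal sketch of a Logstash 1.3.x config file
input {
  # Accept lines typed into the console as events
  stdin { }
}

filter {
  # Transform each event; here we only attach an illustrative tag
  mutate {
    add_tag => [ "typed_by_hand" ]
  }
}

output {
  # Print every transformed event, including its tags, to the console
  stdout { codec => rubydebug }
}
```

Saved under any name (for example, a hypothetical skeleton.conf), it would be started with java -jar logstash-1.3.3-flatjar.jar agent -f skeleton.conf.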

From time to time the tutorials use a curl command. curl is the usual command line tool for interrogating an Elasticsearch node over HTTP (by default you specify a URL using port 9200). Although curl and its usage are fundamental to manipulating and querying ES, you do not need to learn the curl commands immediately: web tools such as Marvel, elasticsearch-head, elasticsearch-hq and bigdesk (there are others as well) issue the necessary requests automatically and display the results in a web page.

If you're new to Logstash: out of the box it can already parse Apache and syslog log files, because it ships with the Grok patterns needed for basic recognition and parsing. Logstash can parse other types of files if you either install a plugin (someone else did the work) or define your own Grok pattern.
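A filter section along these lines illustrates both approaches. It is a sketch for Logstash 1.3.x using grok's array-style match setting. COMBINEDAPACHELOG is one of the patterns shipped with Logstash; the "myapp" type and the field names in the second pattern are hypothetical, standing in for a log format of your own.

```
filter {
  if [type] == "apache" {
    # Built-in pattern for Apache "combined" access logs
    grok {
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
  } else if [type] == "myapp" {
    # Hypothetical custom format, built from smaller built-in patterns;
    # each %{PATTERN:name} stores the matched text in a field called "name"
    grok {
      match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
    }
  }
}
```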

Some noteworthy items described in the Logstash tutorial config files:

Input

stdin

stdin { }

Allows you to simply type into the console where you launched Logstash; whatever you type is accepted by Logstash as input.

file

file { }

By describing a path within the curly braces, you can input a local file such as the system log. On openSUSE the path would be /var/log/messages.
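A minimal file input for the openSUSE system log might look like this. It is a sketch; the "syslog" type label is an arbitrary choice, and reading /var/log/messages may require root permissions. Note that start_position only affects files Logstash has not seen before; after that, it resumes from the position recorded in its sincedb file.

```
input {
  file {
    # openSUSE system log
    path => "/var/log/messages"
    # Arbitrary label, handy for conditionals in the filter section
    type => "syslog"
    # Also read the existing contents, not only lines appended from now on
    start_position => "beginning"
  }
}

output {
  stdout { codec => rubydebug }
}
```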

TCP

  tcp { 
    type => "apache"
    port => 3333
  }

The tcp input opens a general-purpose TCP port that accepts data over a network connection. Any client that can write to a socket will do; netcat, for instance, can send a file to port 3333 on the Logstash host.

Redis

redis {
  host => " "
  # these settings should match the output of the agent
  data_type => " "
  key => " "

  # We use the 'json' codec here because we expect to read
  # json events from redis.
  codec => json
}

Redis can act as a simple queue between the Logstash agents that ship data and the Logstash indexer, instead of the agents sending data directly to an Elasticsearch node.
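In a centralized setup the queue sits between two separate Logstash config files. The following sketch assumes Redis runs on the same machine (127.0.0.1) and uses a Redis list named "logstash"; both values are assumptions, and whatever you choose must match on both sides. The file names are hypothetical.

```
# --- shipper.conf: runs where the logs are produced ---
input {
  file {
    path => "/var/log/messages"
    type => "syslog"
  }
}
output {
  # Push each event onto the Redis list "logstash" as JSON
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}

# --- indexer.conf: a separate file, run as a separate Logstash agent ---
input {
  # Pop events from the same Redis list; settings must match the shipper
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}
output {
  elasticsearch_http { host => "127.0.0.1" }
}
```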

Output

stdout

stdout { codec => rubydebug }

Printing every event to the console is a nice way to see that something is happening, and how fast.

Launch embedded Elasticsearch

elasticsearch { embedded => true }

Launches an embedded instance of Elasticsearch on demand and outputs data to it. This is convenient for learning, since it avoids any formal ES installation. If you have ES installed, you probably don't want to launch an embedded instance; simply send to the installed ES instead.
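Put together, a complete config that needs no separately installed Elasticsearch might look like the following sketch for Logstash 1.3.x. It echoes each typed line to the console and also indexes it into the embedded instance.

```
input {
  stdin { }
}

output {
  # Show each event on the console...
  stdout { codec => rubydebug }
  # ...and index it into an Elasticsearch instance started inside Logstash
  elasticsearch { embedded => true }
}
```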

Output to installed Elasticsearch

elasticsearch_http { host => " " }

Points to the location of the Elasticsearch instance (a network address or resolvable name).
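A slightly fuller sketch, assuming the installed ES runs on the same machine and listens on the default HTTP port 9200:

```
output {
  elasticsearch_http {
    # Network address or resolvable name of the installed Elasticsearch node
    host => "localhost"
    # HTTP (REST) port; 9200 is the default
    port => 9200
  }
}
```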

Embedded Elasticsearch and Kibana

You may be surprised to learn that you can launch all three apps (Logstash, Elasticsearch, Kibana) at once simply by running Logstash, without installing either of the other two. These Elasticsearch and Kibana instances are called "embedded", perhaps because their code is included in the standard Logstash binary.

An embedded instance of Elasticsearch is launched by stating the following within the output section of the Logstash configfile:

elasticsearch { embedded => true }

Although an embedded instance of Elasticsearch is launched from the output section of the Logstash config file, the web front end (Kibana, served by default at http://localhost:9292) is launched as part of the Logstash command by appending "-- web" (two dashes, a space, then web), e.g.:

java -jar logstash-1.3.3-flatjar.jar agent -f logstash-simple.conf -- web

Clear the data between tutorials

One thing the tutorials don't mention: if you want to run them again from the beginning, you should first clear the data from your previous run. For an installed ES, I created a script file with the following command, which can be run from anywhere (it requires root permissions). Stop the Elasticsearch service before deleting its data.

rm -r -f /var/lib/elasticsearch/*

If you're running an embedded ES, invoke the following command from the directory where you launched Logstash (and therefore the embedded Elasticsearch); inspect your Logstash config file and verify the path first. I recommend placing this command in a script file as well. Note: I include the "-f" flag not because openSUSE requires it (there it can be omitted) but because CentOS otherwise asks for interactive confirmation.

rm -r -f data/*

Concluding

And there you have it: a quick preview of the topics in the Logstash tutorials. I hope the information here is valuable to others just getting their toes wet. I made many missteps and worked through many misconceptions before getting things right.