Configuring Ganglia

Wes Stevens

04 Jun, 2010 10:44 PM via web

My group would like to get more metrics out of Ganglia than we're currently being provided with. For example, we'd like to view a one-hour interval at any point in the recorded past instead of just the previous hour, and we'd like to see activity down to the thread level to help us identify problems as they arise across our clusters. Is it possible to configure Ganglia to provide more parameters and unlock additional features, and if so, how? Help greatly appreciated.

  1. Support Staff 2 Posted by David E. Young on 07 Jun, 2010 11:42 AM

    I'll investigate these questions. There are no immediate answers; sorry about that. This will take some research.

  2. Support Staff 3 Posted by David E. Young on 07 Jun, 2010 01:19 PM

    Created a lighthouse ticket.

  3. Support Staff 4 Posted by David E. Young on 10 Jun, 2010 12:56 PM

    Hey Wes. Would you try and elaborate on what thread-level metrics you're interested in? This will help me focus on how we might obtain this information and via what type of interface. Thanks.

    -- david

  4. Support Staff 5 Posted by David E. Young on 02 Jul, 2010 12:32 PM

    Hey Wes. I saw your posting on the ganglia support list. I've written gmond plugins in both Python and C, and find the C approach generally less onerous. I can send you the source and compiled gmond modules if you like so you can take a look. We have two modules; one collects mysqld metrics, the other some metrics from SMB. These modules should be considered experimental, but they seem to run fine here.

    I built the C modules on a Fedora 11 box and run them on CentOS 5.4 with Ganglia 3.1.7. They're 64-bit and will not run with the ganglia version as shipped with our original CentOS DVD.

    Let me know.

    -- david

  5. Support Staff 6 Posted by David E. Young on 02 Jul, 2010 12:39 PM

    I should qualify the comment regarding running with ganglia 3.1.1. You may certainly compile the modules on a 32-bit CentOS 5.4 machine and they should (probably) run just fine. I just haven't tried that so I can't make promises.

  6. 7 Posted by Wes Stevens on 02 Jul, 2010 05:08 PM

    Yea sure, would love to have them thanks!

  7. 8 Posted by Wes Stevens on 02 Jul, 2010 08:35 PM

    Hey Dave I'm not sure if this is the best place to ask this, but I ran the makefile on the ganglia_plugins package you provided this morning and I get this compile error:

    [wjs@A3797200 ganglia_plugins]$ make
    gcc -c -g -fno-strict-aliasing -fPIC -fpic -fno-omit-frame-pointer -std=gnu99 -Wall -I. -I/usr/include/ganglia -I/usr/include/apr-1 -m64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE utils.c -o lib64/utils.o
    In file included from utils.c:10:
    utils.h:5:25: error: gm_protocol.h: No such file or directory
    make: *** [lib64/utils.o] Error 1

    I'm not sure what it does; my guess is something related to communication with the gmond daemon. I can't find gm_protocol.h anywhere, although it's included in utils.h. Thanks again in advance.

  8. 9 Posted by Wes Stevens on 02 Jul, 2010 09:16 PM

    Ah, never mind. Harry found me the ganglia development package that contains this stuff.

  9. 10 Posted by Wes Stevens on 06 Jul, 2010 04:53 PM

    Hey Dave do you know if the SaffronOne_0 and SaffronOne_1 metrics will work if Saffron is configured to run as a single process and we run an ingest? Any other caveats I might not be aware of?

    I'm also taking a look at your C code and wondering if you are still developing this stuff and you might have later versions of this available?

    Lastly, I'm wondering if there is any documentation that you guys may have for this project. I've been taking a look at it, and with the sheer amount of code it can take me a while to digest exactly what was done here, although I've got it running and displaying metrics in Ganglia just fine. Thanks

  10. Support Staff 11 Posted by David E. Young on 06 Jul, 2010 06:53 PM

    If you're running single-process then the metrics won't be available. The current version assumes a standard deployment. This is mostly because the smb metrics module gets all of its information from the proc filesystem; there is no API as of yet to smb itself.
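
    To illustrate the kind of information the proc filesystem exposes, here is a minimal Python sketch that reads a process's thread count from /proc/<pid>/status on Linux. It is not the modsmb implementation (that module is written in C and its source is not shown in this thread); the function names are made up for the example.

```python
import os


def parse_thread_count(status_text):
    """Return the value of the 'Threads:' line in /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def thread_count(pid):
    """Read /proc/<pid>/status (Linux only); return None if it cannot be read."""
    path = os.path.join("/proc", str(pid), "status")
    try:
        with open(path) as handle:
            return parse_thread_count(handle.read())
    except (IOError, OSError):
        return None


if __name__ == "__main__":
    # Thread count of the current interpreter process.
    print(thread_count(os.getpid()))
```

    A per-process count like this says nothing about what goes on inside a single VM, which is why a single-process deployment gives the module so little to report.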

    You have the latest version. I'm on different tasking right now, but if you folks see bugs or a need for new features then I should have time to look at these requests.

    There was zero ganglia documentation (at least that I found) on how to write metric modules in C. In the ganglia 3.1.7 source tree there is a module shell directory (ganglia-3.1.7/gmond/modules/example), plus of course the modules shipped with ganglia. Those are what I used to get going. It's actually not as bad as it might first appear. If you look at the skeleton module you'll see that there isn't all that much code to hook into the ganglia API.

    Finally, remember that you have two alternatives to a C approach: a) Python; and b) gmetric. I found Python to impact gmond a bit more than I liked; I prefer the tight integration of C and knowing the metric modules are running as fast/efficiently as possible (assuming of course my code is good). Gmetric is the old interface/way to insert metrics into gmond. It doesn't format as nicely as the API, but it is a possible alternative. Google around for some tutorials; I played with gmetric a bit.
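
    As a rough illustration of the gmetric route, the sketch below wraps the gmetric command-line tool from Python. It assumes gmetric is on the PATH and can reach a running gmond; the helper names are hypothetical and only the --name, --value, --type and --units options are used.

```python
import subprocess


def build_gmetric_command(name, value, value_type="string", units=""):
    """Build the argument list for a single gmetric invocation."""
    command = [
        "gmetric",
        "--name=" + name,
        "--value=" + str(value),
        "--type=" + value_type,
    ]
    if units:
        command.append("--units=" + units)
    return command


def publish(name, value, value_type="string", units=""):
    """Run gmetric and return its exit status (0 means the metric was sent)."""
    return subprocess.call(build_gmetric_command(name, value, value_type, units))
```

    For example, publish("ingest_queue_depth", 12, "uint32", "items") would send one sample of a made-up metric; run it from cron or a loop to keep the value fresh.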

    -- david

  11. 12 Posted by Wes Stevens on 09 Jul, 2010 04:52 PM

    Hey Dave, at some point in the future we may be interested in changing the smb metrics to suit our configuration, however my biggest fear is overburdening you and keeping you from doing other important work, especially if that work is helping the rest of my team. I am not really sure what to do at this point, so I figure I will wait until my team starts raising a fuss. May have other questions in the future though. Thanks

  12. Support Staff 13 Posted by David E. Young on 12 Jul, 2010 12:28 PM

    Don't worry about that. At present, part of my standing tasking is extensions to ganglia, so unless something changes drastically here I'm available to look at these extensions. Just let me know. Thanks!

    -- david

  13. 14 Posted by Wes Stevens on 12 Jul, 2010 04:57 PM

    Probably the biggest problem I'm having right now is creating our own simple Python template for generating our own metrics. We think such a template will really give us an idea of where we want to go as far as a final template, whether we want it in C or Python. I have followed this tutorial almost to a T with zero results:

    http://tech.xlab.si/2008/10/writing-custom-python-metric-modules-fo...

    despite how incredibly simple and straightforward the tutorial is. The only way I significantly stray from the tutorial is putting test.py in /usr/lib64/ganglia/python_modules instead of /usr/local/share/ganglia/python_modules. I'm not sure if that's keeping everything from working, but I do not have permission to use /usr/local/share. When we run gmond with the debug option we get an error that it cannot find metric_init in test.py, which doesn't make any sense because it is clearly there; either it isn't finding test.py at all or something else is going on.

    In relation to what we've been talking about, we haven't had any of the SMB metrics plotting data aside from the Tomcat ones. This doesn't mean all of them don't work, but I strongly suspect at least most of them don't. Even during ingests they show nothing. If you have any questions about our configuration please let me know, perhaps even Jim Fleming could help you there.

    Thanks again in advance

  14. Support Staff 15 Posted by David E. Young on 12 Jul, 2010 05:53 PM

    I generally see all of the SMB metrics plot. On the occasions I haven't, restarting gmetad/gmond on my test cluster (sometimes several times) clears that up. I don't think it's a problem with the module, but you can verify by looking in /var/log/messages. Modsmb will write diagnostics to the system logger. All of this assumes you're using ganglia 3.1.7 (64-bit); I haven't tested with any other version. Ganglia can be flaky, even the newest version.

    I also had a python metric module working on ganglia 3.1.1 (the initial version shipped with our original DVD). On that dvd, ganglia is 32-bit because at that time only 32-bit rpms were available.

    The ganglia 3.1.7 RPMs we sent you folks were compiled without python support, so unless you've rebuilt them you'll have trouble. Also, ganglia 3.1.1 only supports python up to version 2.3.

    I don't know enough about the environment you're working in to offer anything more constructive. Let me know.

    -- david

  15. 16 Posted by Wes Stevens on 12 Jul, 2010 06:05 PM

    Indeed, this environment meets all those requirements: it is 3.1.7, 64-bit, the rpm has python support, and we have python 2.5 installed. It's just not working.

    I have restarted the gmond and gmetad daemons dozens of times and even run ingests, and we have yet to have any of the non-Tomcat metrics plot anything.

    As for the SaffronOne_0 and SaffronOne_1 metrics, we think they don't work because our Saffron is configured for a single process and these metrics expect two. The plugin probably shows nothing because it isn't made to recognize the process name of a single-process instance, but I haven't investigated.

    As for the Enterprise or ClusterCoordinator stuff, I have no idea why they're not plotting anything.

  16. Support Staff 17 Posted by David E. Young on 12 Jul, 2010 06:11 PM

    Oh, of course. Yes, as mentioned previously the modsmb plugin will only plot SMB metrics if you're running multi-process. That includes all smb processes, not just SaffronOne. HostManager is not currently plotted at all because it really doesn't impact the system.

    As for python, that I have not tried (yet) with 3.1.7. Unless things have changed, in addition to your module-specific config file, you must also place modpython.conf in /etc/ganglia/conf.d. There's a sample for this in the examples directory that came with the modsmb tarball.

    -- david

  17. 18 Posted by Wes Stevens on 12 Jul, 2010 06:21 PM

    > Oh, of course. Yes, as mentioned previously the modsmb plugin will only plot SMB metrics if you're running multi-process. That includes all smb processes, not just SaffronOne. HostManager is not currently plotted at all because it really doesn't impact the system.

    Could we get it to plot data for a single saffron process then? Like, just SaffronOne, without any postfix like 0 or 1.

    As for the modpython.conf in /etc/ganglia/conf.d, we have indeed had this located there, as well as test.conf. Perhaps we should set up a meeting on webex? I guess I would need Harry's approval.

  18. Support Staff 19 Posted by David E. Young on 12 Jul, 2010 06:36 PM

    Plotting a specific class instance within a single VM (such as SaffronOne) would require some type of API to either the Java or VM layer of SMB, neither of which exists right now.

    When I get a bit of time (I'm on a specific task right now) I'll see about getting python support for ganglia onto my test cluster. A webex is probably a bit premature right now; I'd go to the ganglia forum first.

    This is a good project; have fun with it.

    -- david

  19. Support Staff 20 Posted by David E. Young on 14 Jul, 2010 12:17 PM

    Hi Wes. I found a bit of time yesterday to try python with ganglia 3.1.7. I installed the example.py module that comes with the distribution and it works ok. I noticed one subtlety that's easy to miss, though. In modpython.conf you'll see this line:

    params = "/usr/lib64/ganglia/python_modules"

    make sure you've created the python_modules directory in /usr/lib64/ganglia, and drop your python modules in there. Alternatively, of course, you can change the params attribute.
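
    A quick way to check this is sketched below in Python: pull the params path out of the modpython.conf text and list the .py files in that directory. The function names are hypothetical, and the regular expression only handles the simple double-quoted form shown above; it is not a full parser for gmond's configuration syntax.

```python
import os
import re

# Matches a line such as:  params = "/usr/lib64/ganglia/python_modules"
PARAMS_PATTERN = re.compile(r'^\s*params\s*=\s*"([^"]+)"', re.MULTILINE)


def modules_dir_from_conf(conf_text):
    """Return the first quoted params value in the config text, or None."""
    match = PARAMS_PATTERN.search(conf_text)
    return match.group(1) if match else None


def list_metric_modules(directory):
    """Return sorted .py file names in the directory, or None if it is missing."""
    if not os.path.isdir(directory):
        return None
    return sorted(name for name in os.listdir(directory) if name.endswith(".py"))
```

    If list_metric_modules() returns None for the path found in the config, the directory has not been created yet.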

    hth, david

  20. Support Staff 21 Posted by David E. Young on 14 Jul, 2010 01:29 PM

    Also, you've probably read this but I'm going to throw it into the mix anyhow:

    http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_gmond_python_...

    -- david

  21. 22 Posted by Wes Stevens on 14 Jul, 2010 06:43 PM

    Already had test.py located in /usr/lib64/ganglia/python_modules and also read that tutorial a while back. Here's the actual error message I get:

    [wjs@A3797200 ~]$ /usr/sbin/gmond -d10
    loaded module: core_metrics
    loaded module: cpu_module
    loaded module: disk_module
    loaded module: load_module
    loaded module: mem_module
    loaded module: net_module
    loaded module: proc_module
    loaded module: sys_module
    loaded module: multicpu_module
    loaded module: smb_module
    loaded module: python_module
    loaded module: mysql_module
    [PYTHON] Can't find the metric_init function in the python module [test].
    ...
    Unable to find the metric information for 'test'. Possible that the module has not been loaded.
    ...
    ......
    .........

    EDIT: /var/log/messages says the same thing and doesn't reveal anything new

  22. 23 Posted by Wes Stevens on 14 Jul, 2010 07:04 PM

    Here's more information that might help (in following the tutorial http://tech.xlab.si/2008/10/writing-custom-python-metric-modules-for-ganglia/)

    [wjs@A3797200 ~]$ ls -1 /etc/ganglia/conf.d
    diskusage.pyconf.off
    modgstatus.conf
    modmysql.conf
    modpython.conf
    modpython.conf~
    modpython.conf.save
    modsmb.conf
    multicpu.conf
    tcpconn.pyconf
    test.pyconf
    test.pyconf~

    [wjs@A3797200 ~]$ ls -1 /usr/lib64/ganglia/python_modules
    compile.py
    compiler.py
    example.py
    example.pyc
    multidisk.py
    multidisk.pyc
    tcpconn.py
    tcpconn.pyc
    test.py
    test.py~
    test.pyc

    [wjs@A3797200 ~]$ cat /etc/ganglia/conf.d/modpython.conf
    /*
      params - path to the directory where mod_python
               should look for python metric modules

      the "pyconf" files in the include directory below
      will be scanned for configurations for those modules
    */
    modules {
      module {
        name = "python_module"
        path = "modpython.so"
        params = "/usr/lib64/ganglia/python_modules"
      }
    }

    include ('/etc/ganglia/conf.d/*.pyconf')

    [wjs@A3797200 ~]$ cat /etc/ganglia/conf.d/test.pyconf
    modules {
      module {
        name = "test"
        language = "python"
      }
    }

    collection_group {
      collect_once = yes
      time_threshold = 20
      metric {
        name = "test"
        title = "Prints some string"
      }
    }

    [wjs@A3797200 ~]$ cat /usr/lib64/ganglia/python_modules/test.py
    import os

    def getString(name):
      test_file = "/home/notec/wjs/Desktop/test"

      try:
        p = os.popen(test_file, 'r')
        return p.read()

      except IOError:
        return "Error"

    def metric_init(params):
      d1 = {'name': 'test',
      'call_back': getString,
      'time_max': 90,
      'value_type': 'string',
      'units': '',
      'slope': 'both',
      'format': '%s',
      'description': 'Example metric'}

      return [d1]

    def metric_cleanup():
      pass

  23. Support Staff 24 Posted by David E. Young on 14 Jul, 2010 07:11 PM

    Tell you what. Tar up all of your python stuff (.py files, config files) and send me the tarball. I want to make sure I've got the exact stuff you do; cut'n'pasting the stuff might alter the formatting some way. I'll take a look...

    -- david

  24. 25 Posted by Wes Stevens on 14 Jul, 2010 07:25 PM

    Sure thing (see attached)

  25. Support Staff 26 Posted by David E. Young on 14 Jul, 2010 08:01 PM

    Ok, Wes. I reproduced your problem but don't understand it. Especially after finding a solution. I simply renamed your module to getstr.py (rather than test.py), and changed the metric name to Get_String. Works just fine.

    With your original code I also got this message from gmond:

    Exception AttributeError: "'module' object has no attribute 'metric_init'" in 'garbage collection' ignored
    Fatal Python error: unexpected exception during garbage collection

    Makes no sense does it? I told you ganglia could be flaky. Try the attached version.

    -- david

  26. 27 Posted by Wes Stevens on 14 Jul, 2010 08:41 PM

    Alright, no longer getting the error. However, no luck getting Ganglia to show the Get_String metric anywhere; I restarted a bunch of times with no effect. Are you able to see it?

  27. Support Staff 28 Posted by David E. Young on 15 Jul, 2010 12:36 PM

    Yes, I see it on the node view page under Time and String Metrics. Note the first line:

    Prints some string          Frodo Baggins
    Last Boot Time              Tue, 06 Jul 2010 08:03:29 -0400
    Gexec Status                OFF
    Gmond Started               Thu, 15 Jul 2010 08:34:02 -0400
    Last Reported               0 days, 0:00:10
    Machine Type                x86_64
    Operating System            Linux
    Operating System Release    2.6.30.10-105.2.23.fc11.x86_64
    Uptime                      9 days, 0:30:53

  28. Support Staff 29 Posted by David E. Young on 15 Jul, 2010 01:22 PM

    Oh, one more thing. I'm sure you saw this, but when I was experimenting with your code I changed the os.popen() to reference a script named /tmp/frodo. That script simply echoes 'Frodo Baggins' to stdout. Also, note that the script must (obviously) be executable by everyone. I made this mistake.
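
    For reference, a renamed module along these lines might look like the sketch below. It is not the attached file, which is not reproduced in this thread; it keeps the descriptor fields from the original test.py, uses the Get_String name and the /tmp/frodo script mentioned above, and adds an explicit executable check. The run_script helper is made up for the example, and the code has only been considered as a standalone script, not loaded into gmond.

```python
import os
import subprocess

# Script path taken from the thread; it must be executable by the user gmond runs as.
SCRIPT_PATH = "/tmp/frodo"


def run_script(path):
    """Return the script's stdout as text, or "Error" if it cannot be run."""
    if not os.access(path, os.X_OK):
        return "Error"
    try:
        process = subprocess.Popen([path], stdout=subprocess.PIPE)
        output = process.communicate()[0]
    except OSError:
        return "Error"
    return output.decode("utf-8", "replace").strip()


def get_string(name):
    """Callback that gmond invokes with the metric name."""
    return run_script(SCRIPT_PATH)


def metric_init(params):
    """Return the list of metric descriptors this module provides."""
    descriptor = {
        "name": "Get_String",
        "call_back": get_string,
        "time_max": 90,
        "value_type": "string",
        "units": "",
        "slope": "both",
        "format": "%s",
        "description": "Example metric",
    }
    return [descriptor]


def metric_cleanup():
    """Nothing to release in this example."""
    pass
```

    The module and metric names in the matching .pyconf file would need to change to getstr and Get_String as well.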

    -- david

  29. Support Staff 30 Posted by David E. Young on 15 Jul, 2010 05:15 PM

    And another note. I think I understand why your original code, when named test.py, didn't work. Python has a package named 'test' in its standard path. If the ganglia libpython module simply imports python metric modules, then what probably happened is the python 'test' package was imported rather than your module.

    Only thing that makes sense right now.
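
    One way to test a candidate module name for this kind of collision is sketched below. It assumes Python 3.4 or later for importlib.util.find_spec (newer than the Python versions discussed in this thread), and the function name is hypothetical.

```python
import importlib.util


def shadows_existing_module(name):
    """Return True if `name` already resolves to an importable module or package.

    Run this from a directory that does not contain your own <name>.py,
    otherwise the check will find your file and report a collision.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    for candidate in ("test", "getstr"):
        print(candidate, shadows_existing_module(candidate))
```

    A name that comes back True is best avoided for a metric module, since a plain import may pick up the existing module instead.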

    -- david
