C'in the Light
After the group discussion, I had almost convinced myself that MLC++ was, after all, not such a good idea to base our tools on, in spite of being written in a(n) (more ;) efficient language.
The cons being:
But apart from Weka and MLC++/MLJ, there seem to be quite a few other libraries that deserve a look, especially PLearn.
1) PLearn
Maintainer: Yoshua Bengio Activity: 90 minutes ago url:
Description:
PLearn is a C++ library aimed at research and development in the field of statistical machine learning algorithms. PLearn is a C++ library that uses the object-oriented and operator overloading capabilities of the C++ language to allow, among other things, to express cost functions and their optimization as a standard C++ program, in a declarative manner that is as close as possible to their mathematical formalization.<--snip-->
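To make the operator-overloading idea concrete, here is a minimal sketch of my own. It does not use PLearn's actual classes: `Dual`, `constant`, `squared_error` and `fit_weight` are hypothetical names invented for this illustration. The cost is written almost exactly as on paper, and the overloaded operators carry the derivative along so that a plain gradient-descent loop can minimise it.

```cpp
#include <cassert>

// Hypothetical illustration type, NOT part of PLearn: a value together with
// its derivative with respect to one chosen parameter.
struct Dual {
    double value;
    double deriv;
};

// Overloaded arithmetic applies the sum and product rules automatically.
Dual operator+(Dual a, Dual b) { return {a.value + b.value, a.deriv + b.deriv}; }
Dual operator-(Dual a, Dual b) { return {a.value - b.value, a.deriv - b.deriv}; }
Dual operator*(Dual a, Dual b) {
    return {a.value * b.value, a.deriv * b.value + a.value * b.deriv};
}

// A constant has derivative zero.
Dual constant(double c) { return {c, 0.0}; }

// Squared-error cost of the one-parameter model y = w * x,
// written the way it appears mathematically: (w * x - y)^2.
Dual squared_error(Dual w, double x, double y) {
    Dual residual = w * constant(x) - constant(y);
    return residual * residual;
}

// Plain gradient descent on w for a single training pair (x, y).
double fit_weight(double x, double y, double learning_rate, int steps) {
    assert(steps >= 0);
    double w = 0.0;
    for (int i = 0; i < steps; ++i) {
        // Seeding deriv = 1.0 marks w as the variable we differentiate by.
        Dual cost = squared_error(Dual{w, 1.0}, x, y);
        w -= learning_rate * cost.deriv;
    }
    return w;
}
```

For example, `fit_weight(2.0, 6.0, 0.1, 100)` should settle near 3, since 3 * 2 = 6. A real library would of course handle vectors and many parameters; this only shows why the cost function can stay an ordinary C++ expression.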
Some quotes from the manual which I liked:
Most neural-network and general machine-learning simulation environments define their own scripting language. While it is very tempting for every computer-scientist to craft his own language, creating a complete, clear, efficient and bug-free language is a horrendous task, so this is how things usually go: one starts building a simple scripting syntax (typically lisp-like because it's easy to parse) to specify simple experiments. Quickly it appears too limited, and it may grow to include loops, functions, data structures, etc... Eventually it ends up including some sort of support for object-oriented programming, and finally for efficiency you want it to be compiled rather than interpreted! In the end, you end up with a huge mess of a system that was not designed to grow that much from the beginning, and which is often impossible to comprehend and maintain for anybody but its author. The end-result might sometimes be impressive, but at the cost of a lot of efforts diverted from your actual research. While C++ is far from being the perfect language, it is very powerful, can be both very expressive and generate highly efficient code, and most of all it has the immense advantage of being developed and well-supported by worldwide teams of dedicated and competent people...
Comment: need to check on support for pitting different algorithms against each other.
2) ClustLib
Activity: 2002-08-21 17:00 url url
Description:
The ClustLib project is intended to build several C++ libraries with basic functionalities for clustering in the context of data mining. The planned packages include access to MySQL databases, scalable vector quantization, multidimensional density estimation.
3) ROSETTA
Activity: 2001-05-24 15:00 url
Description:
The ROSETTA C++ library is a collection of C++ classes and routines that enable discernibility-based empirical modelling and data mining, developed as part of my dissertation. It comprises useful routines for machine learning in general and for rough set theory in particular.
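For anyone unfamiliar with what "discernibility-based" means in rough set theory, here is a small sketch of the decision-relative discernibility matrix: for every pair of objects with different decisions, it records the attributes on which the two objects differ. This is my own illustration and not ROSETTA's API; `Row`, `AttributeSet` and `discernibility_matrix` are hypothetical names, and attribute values are assumed to be already discretised to integers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical layout, NOT ROSETTA's API: one object of a decision table.
struct Row {
    std::vector<int> attributes;  // condition attribute values (discretised)
    int decision;                 // decision (class) value
};

using AttributeSet = std::set<std::size_t>;

// Entry (i, j) holds the indices of the condition attributes on which rows
// i and j differ, but only for pairs with different decisions. Pairs with
// the same decision (and the diagonal) get an empty set.
std::vector<std::vector<AttributeSet>> discernibility_matrix(
        const std::vector<Row>& table) {
    const std::size_t n = table.size();
    std::vector<std::vector<AttributeSet>> matrix(
        n, std::vector<AttributeSet>(n));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            // All rows must describe the same set of attributes.
            assert(table[i].attributes.size() == table[j].attributes.size());
            if (table[i].decision == table[j].decision) continue;
            for (std::size_t a = 0; a < table[i].attributes.size(); ++a) {
                if (table[i].attributes[a] != table[j].attributes[a]) {
                    matrix[i][j].insert(a);
                    matrix[j][i].insert(a);  // the matrix is symmetric
                }
            }
        }
    }
    return matrix;
}
```

Reducts and decision rules are then derived from these sets; the sketch stops at the matrix itself.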