I’ve been thinking a bit about the work we do at our lab. If I look at the trajectory of the kinds of papers I’d be thinking about writing, they’ve gone from logics for probabilistic reasoning, to topics at the intersection of logic and learning, and now to neurosymbolic AI with touches of ethics and explainability peppered in.
Now, one recurring theme here is understanding the foundations for combining reasoning and learning. This remains perhaps the most important agenda from the perspective of my own research interests, and approaches that fit in this landscape are of general interest to me.
But then I was also thinking that there is a sense in which we deal with machine learning models, formulations and concepts at an abstract level, where to a large extent we try not to tinker with the actual learning framework itself, or sometimes even with the trained machine learning model at all.
There are exceptions: our work with Nick on multiplexnet looked at loss functions for neural networks, where the loss functions are augmented with logical formulas. In this case, we jointly train the neural network on its performance over the formula and backpropagate the loss back into the network. Likewise, our recent machine learning journal article with Andreas directly manipulates the architecture of the neural network so that we can perform neural program induction. Giannis, on the other hand, has been looking at the semantics of sum-product networks to incorporate prior constraints and generate counterfactuals.
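To give a flavour of the first idea, here is a minimal sketch, not the multiplexnet construction itself, of what it can look like to augment a standard loss with a soft logical constraint and backpropagate the combined quantity. The particular constraint ("if the network asserts A it should also assert B") and all the names are assumptions made purely for illustration:

```python
import torch
import torch.nn.functional as F

def constrained_loss(logits_a, logits_b, targets_a, weight=1.0):
    """Data loss plus a soft penalty for violating the constraint A -> B (illustrative only)."""
    p_a = torch.sigmoid(logits_a)  # probability the network asserts A
    p_b = torch.sigmoid(logits_b)  # probability the network asserts B
    data_loss = F.binary_cross_entropy(p_a, targets_a)
    # Soft encoding of the implication A -> B: penalise probability mass
    # placed on A that is not also placed on B.
    violation = torch.relu(p_a - p_b).mean()
    return data_loss + weight * violation

# Because the combined loss is differentiable, backpropagation pushes the
# logical constraint into the network's weights along with the data term.
logits_a = torch.randn(8, requires_grad=True)
logits_b = torch.randn(8, requires_grad=True)
targets_a = torch.randint(0, 2, (8,)).float()
constrained_loss(logits_a, logits_b, targets_a).backward()
```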
So it’s not really the case that you can divorce yourself from the machine learning setup completely. But at a conceptual level, what I think is happening is that we are treating the machine learning model at a meta level. The simplest way to see this, of course, is the classic definition of machine learning as simply a function that maps your inputs to an output. Presumably, you will be getting inputs that you’ve never seen before, which captures the generalization of this machine learning function.
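To make the "model as a function" view concrete, here is a minimal Python sketch in which a trained model is typed and used purely as a mapping from inputs to outputs, with no reference to how it was built; the names are purely illustrative:

```python
from typing import Callable, TypeVar

X = TypeVar("X")  # input space: images, sentences, feature vectors, ...
Y = TypeVar("Y")  # output space: labels, scores, structured predictions, ...

# At the meta level, a trained model is nothing more than a function from X to Y.
Model = Callable[[X], Y]

def accuracy(model: Model, data: list[tuple[X, Y]]) -> float:
    """Evaluate a model purely through its input-output behaviour."""
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)

# Any callable qualifies, regardless of how (or whether) it was trained.
print(accuracy(lambda x: x >= 0, [(-1.0, False), (2.0, True), (0.5, True)]))  # 1.0
```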
At this abstract level, you pay almost no attention to how this function is constructed. But with this function in place, you can start talking about meta-level properties. For instance, suppose you are interested in robustness. Then you might ask: for small perturbations of the input space, do we expect the output label to change dramatically? Or, for instance, if you are interested in a neuro-symbolic architecture where we want to combine the neural input with a reasoning system, we might treat the neural predictions, or even the neural concepts, as objects for a logical solver and then use the solver for our reasoning. When you plug in backpropagation, this becomes the more end-to-end type of architecture that is currently very popular.
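As a sketch of what such a meta-level property can look like, here is a simple robustness check that treats the model as a black box: it only ever queries input-output behaviour and never looks at weights, gradients, or the training procedure. The toy model, the perturbation, and the sample count are all assumptions made for the example:

```python
import random

def is_locally_robust(model, x, perturb, num_samples=100):
    """Meta-level robustness check: does the predicted label survive small perturbations?

    The model is treated purely as a black-box function; we never inspect
    its weights, gradients, or training procedure.
    """
    y = model(x)
    return all(model(perturb(x)) == y for _ in range(num_samples))

# Toy example: a threshold classifier over a single feature.
toy_model = lambda value: int(value > 0.5)
small_noise = lambda value: value + random.uniform(-0.01, 0.01)

print(is_locally_robust(toy_model, 0.90, small_noise))   # True: far from the decision boundary
print(is_locally_robust(toy_model, 0.505, small_noise))  # very likely False: near the boundary
```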
Here, I would argue, to a large extent we don’t really worry too much about the machine learning model. We don’t deal with the statistics and the mathematics of the distributions captured by these objects and reason about them. In fact, there is quite a large conceptual and theoretical difference between core machine learning theory papers and theoretical papers on logic and learning and neuro-symbolic AI. I would say the latter are still theory papers, but they deal with different kinds of notions, because they don’t really reason about the distributions, or the matrices and variances and other linear-algebra concepts that you need to semantically characterize a machine learning model. You mostly treat the model as a function. Perhaps you want to capture gradient descent as an operation that you carry out over the models of your logical formulas. But that is at an abstract, functional level where you don’t worry about the intricacies of the probabilistic and statistical components being trained in the model.
This makes me feel that there should be a meta-level treatment of this kind of pipeline, which I am simply going to refer to here as meta-machine learning. To some extent, meta-machine learning has already been part of our discussions for decades. Every paper from the 80s that dealt with pipelines of a hybrid nature, combining some kind of additional computational task beyond the classic act of machine learning, is in this sense a meta-approach. Even systems and applications built on top of machine learning models could be seen as a meta-level approach: once the machine learning model is trained, a whole set of programs sit on top to make the application work, and they interact with the machine learning model in an input-output sense (including through APIs) or in some kind of functional sense.
I would argue that most of the services we see today that use large language models through APIs are also, in this sense, meta-level, because they typically interact with the machine learning model in a very limited way. Of course, there is nothing to stop you if you want to interact with the training itself. But as long as you don’t concern yourself with all the usual mathematical machinery of machine learning models and simply take the functionality and build on top of it (assuming, of course, you have the data for training and testing), this is largely meta-level.
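A small sketch of what I mean by a meta-level service: the program below only ever treats the language model as a function from prompt to text, hidden behind an interface, so whether that function is backed by an API call or a local model is irrelevant to the logic built on top. The `complete` interface, the validation rule, and the stub model are hypothetical names invented for the example:

```python
from typing import Callable, Protocol

class TextModel(Protocol):
    """Anything that maps a prompt to a completion, e.g. a thin API client."""
    def complete(self, prompt: str) -> str: ...

def answer_with_check(model: TextModel, question: str,
                      is_acceptable: Callable[[str], bool],
                      max_tries: int = 3) -> str | None:
    """A meta-level loop: query the model, validate the output, retry if needed.

    Nothing here touches training, weights, or gradients; the model is used
    only through its input-output behaviour.
    """
    for _ in range(max_tries):
        answer = model.complete(question)
        if is_acceptable(answer):
            return answer
    return None

# Usage with a stand-in model; a real service would swap in an API-backed client.
class EchoModel:
    def complete(self, prompt: str) -> str:
        return f"Answer to: {prompt}"

print(answer_with_check(EchoModel(), "What is 2 + 2?", lambda a: "Answer" in a))
```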
What I find fascinating about meta-level machine learning is that it is a possible future for computer science. If it really is the case that every area of computer science might benefit from having some kind of data-driven learning component to support the field in one way or another, then you mostly need to think about meta-level architectures and meta-level frameworks.
Coming back to my own work, and speaking only for myself and not for my collaborators, I would say I find meta-level machine learning the most interesting way to think about machine learning. Everyone clearly has their own bias, and I would argue many statisticians would see it the other way around. They would likely argue that without dealing with the statistical aspects of these models, there is not much to say.
But I would say there is an area of computation that can view these models at an abstract level: think in terms of input-output pairs, think in terms of the functionality they produce, all of which sits at the meta-level. I am curious to see how this becomes clearer and clearer as we move forward and use machine learning for everything from security and privacy to software engineering.