Abstract 
A problem of learning of a target function based
on its noisy observations at random points via penalized empirical
risk minimization with convex loss function and convex complexity
penalty will be discussed. This includes a number of regression and large
margin classification problems.
We will consider two specific settings. In one of them,
the target function belongs to the linear span of a very large
dictionary of given functions. In a more general case, it belongs
to the linear span of a large number of reproducing kernel Hilbert
spaces. Aggregation problems for large ensembles of kernel machines
and statistical problems for additive models in regression and
classification can be studied in this framework.
It is assumed that the target function has a "sparse
representation" in the "dictionary" and the goal is to recover the
function with an error that depends mainly on the "sparsity" of the problem
(even if the size of the "dictionary" is very large).
An approach based on $\ell_1$type complexity penalization will be
discussed. We prove several probabilistic bounds showing the relationship
between the sparsity of the empirical solution and the sparsity of the
target function and provide oracle inequalities on the excess risk and
the $L_2$error that depend on the sparsity of the problem.
(Some parts of this talk are based on a joint work and on discussions
with Philippe Rigollet, Alexandre Tsybakov and Ming Yuan).
