Seminars of the 2006-2007 academic year

  • Tuesday 31 October 2006, 11:00, Alex Etienne, Université de Neuchâtel, Switzerland

Bias and variance of the Gini index.

An unequivocal analytical formulation of the Gini index for the special case of an infinite continuous distribution is presented. It is shown how the many other formulations found in the literature may be derived from it. For simple sampling, the bias created by the ratio enclosed in the index is approximated. The different variance estimation techniques are reviewed and compared through simulations.
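As a rough illustration of the ratio mentioned in the abstract above, the sketch below computes one common plug-in form of the Gini index on a finite sample. It is not the speaker's formulation: the function name `gini_index` and the rank-based expression are assumptions chosen for illustration. The index is a ratio of a rank-weighted sum to the sample total, which is why its sampling bias has to be approximated rather than read off directly.

```python
def gini_index(values):
    """Plug-in Gini index of a sample of non-negative values (illustrative)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with a positive total")
    # Rank-weighted sum over the ordered sample, ranks starting at 1.
    weighted = sum(rank * value for rank, value in enumerate(x, start=1))
    # Equivalent to sum_i sum_j |x_i - x_j| / (2 * n**2 * mean).
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    print(gini_index([1, 1, 1, 1]))  # perfectly equal sample
    print(gini_index([0, 0, 0, 1]))  # one unit holds everything
```

A perfectly equal sample gives 0, and the index grows as the total concentrates on fewer units.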

  • Tuesday 7 November 2006, 11:00, Leonhard Held, Université de Zurich, Switzerland

In this talk I will describe methods for model choice and model criticism based on probabilistic forecasts of external data. Special emphasis will be given to multivariate predictions and predictions of count data. The methodology will be illustrated through a case study from chronic disease epidemiology.

  • Tuesday 28 November 2006, 11:00, Eva Cantoni, Université de Genève, Switzerland

Variable selection is an important step in any statistical analysis. While this issue is well addressed in the linear regression setting, for example, this was not the case until recently for marginal longitudinal models. I will present a generalized version of Mallows's Cp for variable selection in marginal longitudinal models. The definition of this criterion is very general, so it can also address robustness, heteroscedasticity, and missing values. I will go on to present a Markov chain Monte Carlo technique that makes it possible to handle situations where the number of covariates is very large, so that a criterion that has to be computed for each model cannot be considered.


E. Cantoni, J. Mills Flemming & E. Ronchetti (2005). "Variable Selection for Marginal Longitudinal Generalized Linear Models", Biometrics, 61, 507-514.

E. Cantoni, C. Field, J. Mills Flemming & E. Ronchetti (2006). "Longitudinal variable selection by cross-validation

  • Tuesday 19 December 2006, 11:00, Ingrid Van Keilegom, Université Catholique de Louvain la Neuve, Belgium

A Goodness-of-fit Test for Semiparametric Models in Multiresponse Regression

Abstract: We propose an empirical likelihood test that is able to test the goodness-of-fit of a class of semiparametric regression models. The class includes as special cases fully parametric models, semiparametric models, like the multi-index and the partially linear models, and models with shape constraints, like monotone regression models. Another feature of the test is that it allows both the response variable and the covariate to be multivariate, which means that multiple regression curves can be tested simultaneously. The test also allows the presence of infinite dimensional nuisance parameters in the model to be tested. It is shown that the empirical likelihood test statistic is asymptotically normally distributed under certain mild conditions and permits a wild bootstrap calibration. Despite the fact that the class of models which can be detected consistently by the proposed test is very large, the empirical likelihood test enjoys good power properties against departures from a hypothesized model within the class.

This is joint work with Song Chen, Iowa State University.

  • Tuesday 6 February 2007, Joe Whitaker, Department of Mathematics and Statistics, Lancaster University, England

Weighted independence graphs for finite population surveys.

The analysis of survey data, collected on a set of response variables defined over a finite population, may benefit from a bird's eye view of their inter-relationships and, in particular, of their strengths. This overall analysis should highlight those variables that strongly modify the conditional distribution of another variable and, by contrast, should indicate those which have little effect. We introduce a weighted graph based on measures of independence strength calculated from the population that fulfils this purpose. We show that the graph may be properly defined in terms of population measures without any appeal to super populations, probability modelling or likelihood. A sample of young women and their smoking behaviours, taken from the General Household Survey, is used as an illustration.

  • Tuesday 13 February 2007, Mikhail Kanevski, Institute of Geomatics and Analysis of Risk, University of Lausanne.

Machine learning algorithms for environmental and pollution data

The presentation deals with the description and application of machine learning algorithms (MLA) for environmental and pollution spatial (spatio-temporal) data. The main approaches considered consist of traditional artificial neural networks of different architectures (multilayer perceptron, general regression neural networks, self-organising maps, etc.) and recent developments in statistical learning theory (Support Vector Machines, Support Vector Regression).

A variety of examples of MLA application, both as exploratory data analysis and as modelling tools, are given. Current and future trends in MLA applications for environmental data are discussed. The results of MLA are compared with geostatistical predictions/simulations. Case studies considered include: soil and water systems pollution, soil types and hydro-geological units classification, topo-climatic data modelling, optimisation of monitoring networks, and others.

  • Tuesday 13 March 2007, 11:00, Valentin Rousson, Biostatistics Unit, University of Zurich, Switzerland

A mixed approach for proving non-inferiority with respect to binary endpoints

When a new treatment is compared to an established one in a randomized study, it is standard practice to statistically test for equivalence (or for non-inferiority) rather than for significance. When the endpoint is binary, one usually compares two treatments using either an odds-ratio or a difference of proportions. In this talk, we propose a mixed approach which uses an odds-ratio to define "practical equivalence" and then uses a difference of proportions to show non-inferiority. The mixed approach is shown to be more powerful than the conventional odds-ratio approach when the efficiency of the established treatment is known (with good precision) and when it is high (with more than 50% of success), as is often the case. The gain in power achieved may lead in turn to a substantial reduction in the sample size needed to prove non-inferiority.

The method can be generalized to ordinal endpoints.

This is joint work with Burkhardt Seifert, University of Zurich.
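To make the two scales in this abstract concrete, here is a minimal sketch of how an odds-ratio margin could be translated into a difference-of-proportions margin when the success rate of the established treatment is treated as known. This is an assumption about the mechanics, not the speakers' procedure; the function name `risk_difference_margin` and its parameters `p_ref` and `odds_ratio_margin` are hypothetical.

```python
def risk_difference_margin(p_ref, odds_ratio_margin):
    """Difference of proportions implied by an odds-ratio margin (illustrative).

    p_ref: assumed known success probability of the established treatment.
    odds_ratio_margin: smallest odds ratio (new vs. established) still
        regarded as practically equivalent, in (0, 1].
    """
    if not 0.0 < p_ref < 1.0:
        raise ValueError("p_ref must lie strictly between 0 and 1")
    if not 0.0 < odds_ratio_margin <= 1.0:
        raise ValueError("odds_ratio_margin must lie in (0, 1]")
    odds_ref = p_ref / (1.0 - p_ref)
    # Odds of success for a new treatment sitting exactly on the margin.
    odds_new = odds_ratio_margin * odds_ref
    p_new = odds_new / (1.0 + odds_new)
    return p_ref - p_new


if __name__ == "__main__":
    # Hypothetical inputs: 80% success for the established treatment,
    # odds-ratio margin of 0.5.
    print(risk_difference_margin(0.8, 0.5))
```

With these hypothetical inputs the margin corresponds to a success rate of two thirds for the new treatment, so the same odds-ratio margin maps to different proportion margins depending on `p_ref`.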

  • Tuesday 27 March 2007, 11:00, Camélia Goga, Institut de Mathématiques de Bourgogne, Université de Bourgogne, France

Influence function: new applications in survey sampling theory

The linearization approach via the influence function was proposed by Deville (1999) to approximate the variance of a complex statistic by the variance of the Horvitz-Thompson estimator of the total of the linearized variable.
We propose to extend this approach to two-sample surveys. A class of composite estimators is derived, and the variance of these estimators is approximated by linearizing with partial influence functions.

A second application of Deville's approach arises in estimating parameters of functional data (curves). We are interested in estimating a mean curve and the eigenelements of the covariance operator, and then in studying their asymptotic properties in the survey sampling setting.
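For readers unfamiliar with the building blocks named in this abstract, the sketch below shows the Horvitz-Thompson estimator of a total and the standard linearized variable for a simple ratio of two totals. It does not cover the two-sample or functional-data extensions of the talk; the function names and the choice of a ratio as the "complex statistic" are assumptions made for illustration.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a total: sum of y_k / pi_k over the sample."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    return sum(yk / pk for yk, pk in zip(y, pi))


def linearized_ratio(y, x, pi):
    """Estimated linearized variable u_k = (y_k - R x_k) / X for R = Y / X.

    The unknown totals Y and X are replaced by their Horvitz-Thompson
    estimates, as is usual when the linearized variable is used in practice.
    """
    y_hat = horvitz_thompson_total(y, pi)
    x_hat = horvitz_thompson_total(x, pi)
    r_hat = y_hat / x_hat
    return [(yk - r_hat * xk) / x_hat for yk, xk in zip(y, x)]


if __name__ == "__main__":
    # Hypothetical sample values and inclusion probabilities.
    y = [10.0, 20.0, 15.0]
    x = [2.0, 5.0, 3.0]
    pi = [0.5, 0.25, 0.5]
    u = linearized_ratio(y, x, pi)
    print(horvitz_thompson_total(y, pi))
    print(u)
```

The variance of the estimated ratio is then approximated by the variance of the Horvitz-Thompson estimator applied to `u`, which depends on the sampling design and is not computed here.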

  • Tuesday 17 April 2007, 11:00, Beat Hulliger, Fachhochschule Nordwestschweiz, Switzerland

Outliers and Influential Observations in Establishment Surveys

The detection and treatment of outliers in data with missing values is an important part of the editing and imputation process of surveys, in particular of establishment surveys. The application of the BACON-EEM algorithm and of Transformed Rank Correlations to the famous MU284 data set of Swedish Municipalities shows how these methods work in practice. From the point of view of the analysis of a survey, it is important to know the influential observations for the statistics used in the analysis. Since outliers are defined with respect to a specific model, they may not coincide with influential observations. A specific form of influence measure is used in so-called selective editing, which tries to target the editing and imputation effort at the important errors. The relation between outliers, influence measures and the score function is discussed and illustrated with the MU284 data set.

  • Tuesday 24 April 2007, 11:00, Yuedong Wang, Department of Statistics and Applied Probability, University of California, Santa Barbara, USA

Spline Smoothing with Correlated Random Errors

Abstract. Spline smoothing techniques are commonly used to estimate the mean function in a nonparametric regression model. Their performance depends greatly on the choice of smoothing parameters. Many methods of selecting smoothing parameters, such as GML, GCV and UBR, have been developed under the assumption of independent observations. They tend to underestimate smoothing parameters when data are correlated.

We assume that observations are correlated and that the correlation matrix depends on a parsimonious set of parameters. We extend the GML, GCV and UBR methods to estimate the smoothing parameters and the correlation parameters simultaneously.

  • Tuesday 8 May 2007, 11:00, Pedro Luis do Nascimento Silva, Southampton Statistical Sciences Research Institute, University of Southampton

Imputation for missing and outlying anthropometric data used for nutritional status assessment in a Brazilian household sample survey

Joint work with Andre Martins Costa (IBGE) and Mauricio Teixeira Leite de Vasconcellos (IBGE)

Anthropometric data such as standing height (recumbent length for babies) and body mass (weight) are frequently utilized to estimate the prevalence of certain undesirable nutritional states, such as malnutrition and obesity, in populations of regions or countries, and to examine the association of occurrence of such states with known or potential risk factors. Obtaining these anthropometric measurements in large scale household sample surveys is subject to a number of difficulties which lead to missing or suspicious data being recorded. The approach used to edit and impute the anthropometric data from the Household Budget Survey taken in Brazil in 2002/2003, based on modelling the bivariate data for height and weight given age and sex, is described. This application showed that the approach has a number of limitations, leading to ongoing research aiming to overcome the perceived shortcomings.

  • Tuesday 26 June 2007, 11:00, Prof. P.R. Parthasarathy, Indian Institute of Technology, Madras, India

Exact Transient Solution of a State-dependent Birth and Death Process.

The time-dependent solutions of certain simple birth and death processes are usually derived by solving partial differential equations satisfied by the generating functions. In this talk, we obtain the transient probabilities of state-dependent birth and death processes in terms of a power series expression using continued fractions. In this study, the underlying forward Kolmogorov differential-difference equations are first transformed into a set of linear algebraic equations by employing Laplace transforms. This leads to a J-fraction which is expressed as a formal power series. Inverting, we get the transient probabilities of state-dependent BDPs in closed form. Several examples are presented to illustrate this approach.
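The closed-form continued-fraction solution of the talk is not reproduced here. As a numerical point of comparison, the sketch below approximates the transient probabilities of a state-dependent birth and death process by uniformization on a truncated state space, which is a different and purely numerical technique. The function name, the truncation to `n_states` states, and the tolerance are all assumptions made for illustration.

```python
import math


def transient_probabilities(birth, death, n_states, start, t,
                            tol=1e-12, max_terms=100000):
    """Approximate p_n(t) on states 0..n_states-1 by uniformization.

    birth(n) and death(n) are the state-dependent rates; births out of the
    last state are forbidden so that the truncated chain stays closed.
    """
    lam = [birth(n) if n < n_states - 1 else 0.0 for n in range(n_states)]
    mu = [death(n) if n > 0 else 0.0 for n in range(n_states)]
    rate = max(l + m for l, m in zip(lam, mu))
    v = [1.0 if n == start else 0.0 for n in range(n_states)]
    if rate == 0.0 or t == 0.0:
        return v
    # Poisson weight for zero jumps of the uniformized chain.
    weight = math.exp(-rate * t)
    total_weight = weight
    result = [weight * p for p in v]
    for k in range(1, max_terms + 1):
        if 1.0 - total_weight <= tol:
            break
        # One step of the discrete-time chain with matrix I + Q / rate.
        nxt = [0.0] * n_states
        for n in range(n_states):
            nxt[n] += v[n] * (1.0 - (lam[n] + mu[n]) / rate)
            if n + 1 < n_states:
                nxt[n + 1] += v[n] * lam[n] / rate
            if n > 0:
                nxt[n - 1] += v[n] * mu[n] / rate
        v = nxt
        weight *= rate * t / k
        total_weight += weight
        result = [r + weight * p for r, p in zip(result, v)]
    return result


if __name__ == "__main__":
    # Hypothetical rates: linear births and deaths, started in state 1.
    probs = transient_probabilities(lambda n: 0.5 * n, lambda n: 1.0 * n,
                                    n_states=50, start=1, t=2.0)
    print(probs[:5])
```

For large values of `rate * t` the initial Poisson weight underflows, so this sketch is only suitable for moderate time horizons; a closed-form solution of the kind described in the abstract avoids both that limitation and the truncation error.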