Statistical Modeling for Data Science
- Graphs or networks may be used to model complex interactions between variables. Each relation (edge) in the graph or network models an interaction between two variables. Examples are undirected graphs, such as those found in Gaussian graphical models, and directed graphs, such as those found in Bayesian networks. The main aim of network analysis is to determine the structure of the network.
- Stochastic differential and difference equations can be used to describe models from the natural and engineering sciences. Finding approximate statistical models that solve such equations may provide useful information, for example, for the statistical control of such processes in mechanical engineering. These techniques can help to bridge the gap between the applied sciences and data science.
- Local models and globalization: Statistical models are usually only applicable in sub-regions of the domain of the variables involved. Local models can then be used. The study of structural breaks can be useful in identifying regions for local modelling of time series. Concept drift analysis can also be used to investigate model changes over time.
There are often hierarchies of increasingly global structures in time series. In music, for example, notes provide a simple local structure, while bars, motifs, phrases, sections, and other elements provide increasingly global structures. Properties of local models can be combined with more global characteristics to find global properties of a time series.
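As a minimal sketch of how a structural break can delimit regions for local models, the following Python code searches for the single split point that minimizes the summed squared error of two local constant (mean-only) models. The function names `sse` and `best_single_break`, the `min_segment` parameter, and the toy series are illustrative assumptions, not taken from the text. Practical break detection usually allows several breaks and relies on formal tests.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_break(series, min_segment=2):
    """Return (index, cost) of the split that minimizes the two-segment SSE.

    The series is split into series[:index] and series[index:], and each
    part is described by its own local constant model (its mean).
    """
    if len(series) < 2 * min_segment:
        raise ValueError("series too short for the requested segment length")
    best_index, best_cost = None, float("inf")
    for index in range(min_segment, len(series) - min_segment + 1):
        cost = sse(series[:index]) + sse(series[index:])
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index, best_cost


# Toy series (made up for illustration) with a level shift after 5 values.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 5.2, 4.9, 5.1, 5.0]
break_index, cost = best_single_break(series)

# One local model (a mean) per region on either side of the break.
local_means = (
    sum(series[:break_index]) / break_index,
    sum(series[break_index:]) / (len(series) - break_index),
)
print(break_index, local_means)
```

Applying the same search recursively inside each segment is one simple way to obtain a hierarchy of increasingly local regions.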
Mixture models
Mixture models can also be used to generalize local models to global models. Since standard mathematical models are often far too simplistic to be accurate for heterogeneous data or larger regions of interest, model combination is critical for the characterization of real relationships.
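To illustrate how a mixture combines local models into one global model, here is a hedged sketch that fits a two-component, one-dimensional Gaussian mixture with the EM algorithm. The choice of Gaussian components, the function names, the starting values, and the simulated data are all assumptions made for this example. Each component plays the role of a local model, and the mixing weights tie the components into a single global density.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    means = [min(data), max(data)]  # crude but deterministic start
    stds = [overall_std, overall_std]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 0 for every observation.
        resp0 = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], stds[0])
            p1 = weights[1] * normal_pdf(x, means[1], stds[1])
            denom = p0 + p1
            resp0.append(p0 / denom if denom > 0.0 else 0.5)
        # M-step: weighted updates of weights, means, and standard deviations.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            total = sum(resp)
            means[k] = sum(r * x for r, x in zip(resp, data)) / total
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, data)) / total
            stds[k] = max(math.sqrt(var), 1e-6)  # avoid a collapsed component
            weights[k] = total / n
    return weights, means, stds


# Simulated heterogeneous data: two sub-populations with different levels.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(6.0, 1.0) for _ in range(200)]

weights, means, stds = fit_two_gaussians(data)
print(weights, means, stds)
```

The fixed number of components and iterations keeps the sketch short. In practice both would be chosen by a convergence criterion and a model selection step.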
Model validation and model selection
When more than one model is proposed for a given task, such as prediction, statistical tests for comparing models may help to structure the set of models, for example, in terms of predictive power. Predictive power is commonly evaluated with resampling methods, in which the distribution of power characteristics is studied by artificially varying the subpopulation used to learn the model.
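The resampling idea can be sketched as follows: the training subpopulation is varied by repeated random splits, and the resulting distribution of test errors is compared across models. The two candidate models (a mean-only model and a least-squares line), the helper names, the 70/30 split, and the simulated data are assumptions chosen for illustration only.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Model 1: always predict the training mean of y."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Model 2: simple least-squares regression line."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def resampled_errors(fit, xs, ys, rounds=200, train_fraction=0.7, seed=0):
    """Test MSE for each of `rounds` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = int(train_fraction * len(xs))
    errors = []
    for _ in range(rounds):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(statistics.fmean((ys[i] - model(xs[i])) ** 2 for i in test))
    return errors


# Simulated data with a linear trend plus noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0.0, 1.0) for x in xs]

# The same seed gives both models identical splits, so errors are paired.
errors_mean = resampled_errors(fit_mean, xs, ys)
errors_line = resampled_errors(fit_line, xs, ys)
print(statistics.fmean(errors_mean), statistics.fmean(errors_line))
```

Because both models see identical splits, the per-split error differences could be passed to a paired statistical test to compare their predictive power.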
Perturbation experiments provide a further way to test the performance of models. In such experiments, the reliability of the various models against noise is measured.
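A perturbation experiment might look like the sketch below, which adds noise of increasing scale to part of the data and records how far each model's output moves from its unperturbed value. The two "models" compared here (the sample mean and the sample median as location estimators), the 10% perturbation fraction, and the noise scales are assumptions picked to keep the example small.

```python
import random
import statistics


def perturb(data, scale, fraction, rng):
    """Return a copy of data with Gaussian noise added to a random subset."""
    noisy = list(data)
    n_hit = max(1, int(fraction * len(data)))
    for i in rng.sample(range(len(data)), n_hit):
        noisy[i] += rng.gauss(0.0, scale)
    return noisy


def instability(estimator, data, scale, fraction=0.1, repeats=200, seed=0):
    """Average absolute change of the estimate under repeated perturbation."""
    rng = random.Random(seed)
    baseline = estimator(data)
    changes = [
        abs(estimator(perturb(data, scale, fraction, rng)) - baseline)
        for _ in range(repeats)
    ]
    return statistics.fmean(changes)


# Simulated clean data.
data_rng = random.Random(42)
data = [data_rng.gauss(10.0, 1.0) for _ in range(101)]

scales = [0.5, 5.0, 50.0]
estimators = {"mean": statistics.fmean, "median": statistics.median}
results = {
    name: [instability(est, data, scale) for scale in scales]
    for name, est in estimators.items()
}
for name, values in results.items():
    print(name, values)
```

A model whose instability grows slowly with the noise scale would be judged more reliable against noise in this sense.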
Meta-analysis methods for evaluating combined models, such as model averaging, are also available.
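One simple form of model averaging is sketched below: predictions from several fitted models are combined with weights proportional to the inverse of each model's validation error. This weighting rule is only one of many possibilities and is an assumption here, as are the two hypothetical models and the illustrative validation errors, which are not results from any real data set.

```python
def inverse_error_weights(validation_errors):
    """Weights proportional to 1 / error, normalized to sum to one."""
    inverse = [1.0 / e for e in validation_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def averaged_prediction(models, weights, x):
    """Weighted average of the individual model predictions at x."""
    return sum(w * model(x) for model, w in zip(models, weights))


# Two hypothetical fitted models: a line and a constant.
models = [lambda x: 2.0 + 0.5 * x, lambda x: 4.5]

# Illustrative validation MSEs (made-up inputs, not measured results).
validation_errors = [1.0, 3.0]

weights = inverse_error_weights(validation_errors)  # roughly [0.75, 0.25]
prediction = averaged_prediction(models, weights, 4.0)
print(weights, prediction)
```

The combined model can then be evaluated with the same resampling or perturbation procedures as any single model.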
Model selection has become increasingly important in recent years, since the number of classification and regression models proposed in the literature has grown at an accelerating rate.