A generic shader replaces a specific stage of the shading pipeline with a user-defined program to be executed on need - thereafter, kernel. Shaders generally run in parallel with limited inter-communication between different executions - thereafter instances - usually limited to simplified first-derivative computation and cache optimizations. Being simply a sequence of operations, kernels are defined using special programming languages tailored to match the needs for explicit parallelization and efficiency. Various shading languages have been designed for this purpose.
Depending on the stage being replaced, a shader fetches specific data while its output is handed to successive stages. Input data is typically read-only and can be categorized in two main types:
The output is ideologically always varying (although two instances may actually output the same value). Fourth generation shading pipelines allow to control how output interpolation is performed when primitives are rasterized and pixel shader's varying input is generated.
A vertex shader replaces part of the geometry stage of a graphics pipeline. Vertex shaders consume vertices filled by the Input Assembly stage by applying the specified kernel "for each vertex". The result, which usually include an affine transform, is then fetched by the next state - the Primitive Assembly stage. A vertex shader always produces a single transformed "vertex" and runs on a vertex processor.
Producing vertex position for further rasterization is the typical task of the vertex shader.
Note the current meaning of "vertex" may or may not match the intuitive idea of a vertex. In general, it is better to think at a "vertex" as the basic input data set. This is especially important for generic processing, in which a vertex may hold attribute which does not map to any "geometrical" meaning.
Although vertex shaders were the first hardware accelerated shader type with a high degree of flexibility (see GeForce3, Radeon R200), their feature set was considerably different from other stages for a long time. Even if the exposed instruction set can be considered unified, the performance characteristics of vertex processing units can be considerably different from other execution units. Historically, branching has been considerably more efficient and flexible on vertex processors. Similarly, dynamic array indexing was possible only on vertex processors up to fourth generation pipelines.
Geometry shaders replace a part of the geometry stage subsequent to Primitive Assembly stage and prior to Rasterization. Differently from other shader types, which replaced well-known tasks, the notion of a geometry shader have been only recently introduced to realtime systems so they currently don't map to anything possible before. Additionally, the problem being solved is conceptually very different so a generic geometry shader will be considerably different from a typical shader (both vertex and fragment).
Pixel shaders determine (or contribute to the determination of) the color of a pixel.