This contribution presents a method for hierarchically segmenting video scenes into different moving objects and subobjects using a 2-dimensional description of these scenes. To this end, information from single images as well as from successive images is used to split a scene into different objects. Furthermore, each of these objects is characterized by a transform h(x, T), which implicitly describes the surface and the three-dimensional motion of the moving object in the scene. This description enables an object-oriented prediction of the image contents from one image to the next, as may be used in low-bitrate image coding.
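The idea of object-oriented prediction can be illustrated with a minimal sketch. It assumes that a segmentation of the image to be predicted is available as a label map and that each object carries its own coordinate transform. The concrete form of h(x, T) is not given here, so a pure translation is used as a stand-in; the names `translation` and `predict_next` are hypothetical and chosen only for this illustration.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Labels = List[List[int]]
# Maps a pixel position in the predicted image to its source position
# in the previous image.
Transform = Callable[[int, int], Tuple[float, float]]


def translation(dx: float, dy: float) -> Transform:
    """Assumed stand-in for h(x, T): the object content shifts by (dx, dy)."""
    return lambda x, y: (x - dx, y - dy)


def predict_next(prev: Image, labels: Labels,
                 transforms: Dict[int, Transform]) -> Image:
    """Predict the next image object by object.

    For every pixel, the transform of the object it belongs to gives the
    position in the previous image from which the value is copied
    (nearest-neighbour sampling, clamped to the image border).
    """
    height, width = len(prev), len(prev[0])
    pred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x, src_y = transforms[labels[y][x]](x, y)
            sx = min(max(int(round(src_x)), 0), width - 1)
            sy = min(max(int(round(src_y)), 0), height - 1)
            pred[y][x] = prev[sy][sx]
    return pred


# Two objects: the upper row (label 1) moves one pixel to the right,
# the lower row (label 0) stays where it is.
prev = [[1.0, 2.0, 3.0, 4.0],
        [7.0, 8.0, 9.0, 6.0]]
labels = [[1, 1, 1, 1],
          [0, 0, 0, 0]]
transforms = {0: translation(0, 0), 1: translation(1, 0)}
pred = predict_next(prev, labels, transforms)
```

In a coding context, only the label map, the per-object transform parameters and the remaining prediction error would need to be transmitted, which is what makes such a description attractive at low bitrates. The sketch does not treat newly uncovered image areas in any special way.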