The present invention relates to a method of improving the resolution of a moving object in a digital image sequence. More in particular, the present invention relates to a method of improving the resolution of a small moving object in a digital image sequence, the object consisting mainly or exclusively of boundary pixels.
In many image processing applications the most interesting events are related to changes occurring in the scene: e.g. moving persons or moving objects. In this document we focus on multi-frame Super-Resolution (SR) reconstruction of small moving objects, i.e. objects that are comprised mainly, or even solely, of boundary pixels, in undersampled image sequences. These so-called ‘mixed pixels’ depict both the foreground (the moving object) and the local background of a scene. Especially for small moving objects, resolution improvement is useful. Multi-frame SR reconstruction improves the spatial resolution of a set of sub-pixel displaced Low-Resolution (LR) images by exchanging temporal information for spatial information.
The concept of SR reconstruction has already been in existence for more than 20 years, as evidenced by the paper by R. Y. Tsai and T. S. Huang: “Multiframe image restoration and registration,” in Advances in Computer Vision and Image Processing, JAI Press, 1984, vol. 1, pp. 317-339. However, little attention has been given to SR reconstruction on moving objects. This subject has been addressed in, for example, the paper by A. W. M. van Eekeren, K. Schutte, J. Dijk, D. J. J. de Lange, and L. J. van Vliet: “Super-resolution on moving objects and background,” Proc. IEEE 13th International Conference on Image Processing (ICIP'06), vol. 1, 2006, pp. 2709-2712. Another publication addressing SR reconstruction is the paper by M. Ben-Ezra, A. Zomet, and S. K. Nayar: “Video super-resolution using controlled sub-pixel detector shifts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 977-987, 2005.
Some Prior Art techniques, such as the one disclosed in the paper by Ben-Ezra et al., apply different SR reconstruction methods, for example iterated-back-projection or projection onto convex sets, while having the use of a validity map in their reconstruction process in common. This makes these methods robust to motion outliers. These known methods perform well on large moving objects (the number of mixed pixels is small in comparison to the total number of object pixels) with a simple motion model, such as translation. Other Prior Art techniques use optical flow to segment a moving object and subsequently apply SR reconstruction to it. In these known techniques, the background is static and SR reconstruction is done solely on a masked large moving object.
In the article by Van Eekeren et al. mentioned above, an algorithm was presented that, after segmentation, simultaneously performs SR reconstruction on a large moving object and the background using a Prior Art SR reconstruction technique. However, in that article no SR reconstruction is applied to the boundary (mixed pixels) of the moving object because of a cluttered background.
In the paper by F. W. Wheeler and A. J. Hoogs: “Moving vehicle registration and super-resolution,” Proc. IEEE Applied Imagery Pattern Recognition Workshop (AIPR'07), 2007, super-resolution reconstruction is performed on moving vehicles of approximately 10 by 20 pixels. For object registration a trajectory model is used in combination with consistency of local background and vehicle. However, in this known SR reconstruction approach no attention is given to mixed pixels. An interesting subset of moving objects are faces. In Prior Art techniques in that area which use SR reconstruction the modelling of complex motion is a key element. However, the faces in the used LR input images are far larger than the small objects addressed by the present invention.
When a moving object is small (that is, when it consists mainly or even solely of mixed pixels) and the background is cluttered, even the most advanced pixel-based SR reconstruction methods of the Prior Art will fail. Any pixel-based SR reconstruction method makes an error at the object boundary, because it is unable to separate the space-time variant background and foreground information within a mixed pixel.
U.S. Pat. No. 7,149,262 (Columbia University) discloses a resolution enhancement algorithm for obtaining a polynomial model mapping of low resolution image data to high resolution image data. However, said patent fails to mention super-resolution and hardly mentions moving objects, and is therefore incapable of suggesting an improved SR reconstruction method.
European Patent Application EP 1 923 834 (TNO), published on 21 May 2008, discloses a method for detecting a moving object in a sequence of images captured by a moving camera. The method comprises the step of constructing a multiple number of different images by subtracting image values in corresponding pixels of multiple pairs of images. One image is a representation of a high-resolution image having a higher spatial resolution than the original captured images. This known method does not concern the identification of a moving object, only its detection.
It is an object of the present invention to overcome these and other problems of the Prior Art and to provide a method of improving the resolution of a moving object in a digital image sequence, which method has an improved resolution at the object boundary, in particular when the object consists mainly, or even entirely, of boundary pixels.
To solve the above-mentioned problems the present invention proposes to perform SR reconstruction on small moving objects using a simultaneous boundary and intensity estimation of a moving object. Assuming rigid objects that move with constant speed through the real world, a proper registration is done by fitting a trajectory through the object's location in each frame. The boundary of a moving object is modelled with a sub-pixel precise polygon and the object's intensities are modelled on a High-Resolution (HR) pixel grid.
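By way of illustration only, the registration by trajectory fitting can be sketched as follows. The sketch assumes that a coarse object location is available for each frame and that the constant-speed trajectory is fitted by ordinary least squares; the function names `fit_constant_velocity` and `registered_position` are hypothetical, and the fitting criterion is an assumption of this example rather than a requirement of the method.

```python
def fit_constant_velocity(positions):
    """Least-squares fit of x(k) = x0 + vx*k and y(k) = y0 + vy*k.

    positions: list of (x, y) object locations, one per frame k = 0, 1, ...
    Returns [(x0, vx), (y0, vy)]. Requires at least two frames.
    """
    n = len(positions)
    ks = range(n)
    k_mean = sum(ks) / n
    s_kk = sum((k - k_mean) ** 2 for k in ks)
    params = []
    for axis in (0, 1):
        c_mean = sum(p[axis] for p in positions) / n
        s_kc = sum((k - k_mean) * (p[axis] - c_mean)
                   for k, p in zip(ks, positions))
        velocity = s_kc / s_kk
        params.append((c_mean - velocity * k_mean, velocity))
    return params


def registered_position(params, k):
    """Sub-pixel object location at frame k according to the fitted trajectory."""
    (x0, vx), (y0, vy) = params
    return (x0 + vx * k, y0 + vy * k)


# Usage: noisy per-frame detections are replaced by points on the fitted line.
detections = [(1.0, 2.0), (1.3, 1.4), (1.5, 1.1), (1.7, 0.5), (2.0, 0.0)]
trajectory = fit_constant_velocity(detections)
print(registered_position(trajectory, 2))
```

Because all frames contribute to the same two parameters per axis, the fitted location in each frame is typically more precise than the individual detections, which is what makes sub-pixel registration of a small object plausible under the constant-speed assumption.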
More in particular, the present invention provides a method of improving the resolution of a moving object in a digital image sequence, the method comprising the steps of:
constructing a high resolution image background model,
detecting the moving object using the high resolution image background model,
registering the object, and
producing a high-resolution object description,
wherein the step of producing a high-resolution object description involves an iterative optimisation of a function based upon an edge model of the moving object.
By using an iterative optimisation of a function and a polygonal model of the (edge of the) moving object, it is possible to produce an accurate high-resolution object description, and thereby to accurately identify the object.
The function may also be based upon a high resolution intensity description, and preferably is a cost function. It is further preferred that the high-resolution object description comprises a sub-pixel accurate boundary and/or a high-resolution intensity description. The step of registering the object preferably involves a model-based object trajectory.
The step of producing a high-resolution object description may involve solving an inverse problem. Advantageously, the high resolution image background may be estimated using a pixel-based super-resolution method.
In a particularly advantageous embodiment, the iterative optimisation of a cost function involves a polygonal description parameter and/or an intensity parameter. The edge model preferably is a polygonal edge model.
In a further embodiment, the method of the present invention may comprise the further steps of:
subjecting the high-resolution object description to a camera model to produce a low resolution modelled image sequence,
producing a difference sequence from a registered image sequence and the modelled image sequence,
feeding the difference sequence to the cost function, and
minimising the cost function to produce the next iteration of the polygon description parameter and/or an intensity parameter.
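The further steps listed above form a loop, which may be sketched on a one-dimensional toy problem. The sketch rests on several assumptions that are not taken from this description: the camera model is a circular shift followed by block averaging, only the intensity parameter is optimised (the polygon parameter is omitted), and plain gradient descent with a fixed step size is used as the minimiser. All function names are hypothetical.

```python
def camera_model(hr, shift, factor):
    """Toy camera: circularly shift the HR signal, then average blocks of `factor` pixels."""
    n = len(hr)
    shifted = [hr[(i + shift) % n] for i in range(n)]
    return [sum(shifted[m:m + factor]) / factor for m in range(0, n, factor)]


def cost_and_gradient(hr, frames, shifts, factor):
    """Sum of squared differences between measured and modelled LR frames, with its gradient."""
    n = len(hr)
    cost = 0.0
    grad = [0.0] * n
    for measured, shift in zip(frames, shifts):
        modelled = camera_model(hr, shift, factor)
        # Difference sequence between the registered and the modelled frames.
        diff = [y - y_hat for y, y_hat in zip(measured, modelled)]
        cost += sum(d * d for d in diff)
        for m, d in enumerate(diff):
            for j in range(factor):
                i = (m * factor + j + shift) % n
                grad[i] += -2.0 * d / factor
    return cost, grad


def estimate_intensities(frames, shifts, factor, n, step=0.5, iterations=200):
    """Iteratively update the HR intensity estimate by descending the cost function."""
    hr = [0.0] * n
    for _ in range(iterations):
        _, grad = cost_and_gradient(hr, frames, shifts, factor)
        hr = [h - step * g for h, g in zip(hr, grad)]
    return hr


# Usage: two LR frames with a one-HR-pixel displacement between them.
truth = [0.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0]
shifts = [0, 1]
frames = [camera_model(truth, s, 2) for s in shifts]
estimate = estimate_intensities(frames, shifts, 2, len(truth))
print(cost_and_gradient(estimate, frames, shifts, 2)[0])
```

In this toy setting the data term alone does not determine the HR signal uniquely, so the loop drives the model error towards zero without necessarily reproducing `truth`. This is one reason why a regularisation term is useful in practice.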
Advantageously, the function may comprise a regularisation term for regulating the amount of intensity variation within the object, preferably according to a bilateral total variation criterion.
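As a hedged illustration of such a regularisation term, a bilateral total variation measure in its commonly used form can be sketched as below: a weighted sum of absolute differences between the image and shifted copies of itself, with weights decaying geometrically with the shift distance. The window size, the decay factor `alpha` and the handling of the image border (pixel pairs falling outside the image are simply skipped) are assumptions of this example.

```python
def bilateral_total_variation(image, window=1, alpha=0.7):
    """Weighted L1 difference between an image and its shifted copies.

    image: list of rows with equal length; window: maximum shift in pixels;
    alpha: weight decay per pixel of shift, with 0 < alpha < 1.
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for l in range(-window, window + 1):      # horizontal shift
        for m in range(0, window + 1):        # vertical shift
            if l + m < 0 or (l == 0 and m == 0):
                continue
            weight = alpha ** (abs(l) + abs(m))
            for r in range(rows):
                for c in range(cols):
                    rr, cc = r + m, c + l
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += weight * abs(image[r][c] - image[rr][cc])
    return total


# Usage: a flat object is not penalised, intensity variation is.
flat = [[5.0, 5.0, 5.0], [5.0, 5.0, 5.0]]
textured = [[5.0, 9.0, 5.0], [1.0, 5.0, 7.0]]
print(bilateral_total_variation(flat), bilateral_total_variation(textured))
```

Adding this value, scaled by a regularisation weight, to the data term of the cost function penalises intensity variation within the object while, owing to the L1 norm, penalising sharp edges less severely than a quadratic smoothness term would.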
A preferred embodiment of the method of the present invention can be summarised as follows. After applying SR reconstruction to the background, the local background intensities are known on an HR grid. When the intensities of the moving object and the position of the edges of the boundary are known as well, the intensities of the mixed pixels can be calculated. By minimizing the model error between the measured intensities and the estimated intensities, a sub-pixel precise boundary and an intensity description of the moving object are obtained.
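The mixed-pixel calculation and the minimisation of the model error can be illustrated with a minimal one-dimensional sketch. It assumes a single image row, a known uniform foreground intensity, a background value known per pixel, and an exhaustive grid search over the two edge positions; in the method described here the boundary and the intensities are estimated simultaneously, so this example shows the principle only. All names are hypothetical.

```python
def coverage(pixel_index, left, right):
    """Fraction of pixel [pixel_index, pixel_index + 1) covered by the object [left, right)."""
    overlap = min(right, pixel_index + 1.0) - max(left, float(pixel_index))
    return max(0.0, overlap)


def model_row(background, foreground, left, right):
    """Mixed-pixel intensities: area-weighted blend of foreground and background."""
    row = []
    for i, b in enumerate(background):
        a = coverage(i, left, right)
        row.append(a * foreground + (1.0 - a) * b)
    return row


def model_error(measured, background, foreground, left, right):
    """Squared error between measured and modelled pixel intensities."""
    modelled = model_row(background, foreground, left, right)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(measured, modelled))


def estimate_edges(measured, background, foreground, step=0.05):
    """Grid search for the sub-pixel edge positions that minimise the model error."""
    n = len(background)
    candidates = [i * step for i in range(int(round(n / step)) + 1)]
    best = None
    for left in candidates:
        for right in candidates:
            if right <= left:
                continue
            err = model_error(measured, background, foreground, left, right)
            if best is None or err < best[0]:
                best = (err, left, right)
    return best[1], best[2]


# Usage: an object spanning less than two pixels is located with sub-pixel precision.
background = [10.0, 12.0, 11.0, 9.0, 10.0]
measured = model_row(background, 50.0, 1.35, 2.80)
print(estimate_edges(measured, background, 50.0))
```

The example shows why knowledge of the local background matters: the edge position is recoverable from a mixed pixel only because the background contribution to that pixel can be subtracted in proportion to the uncovered area.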
Especially for small moving objects the approach of the present invention improves the recognition significantly. However, the use of the inventive SR reconstruction method is not limited to small moving objects. It can also be used to improve the resolution of boundary regions of larger moving objects. This might give an observer some useful extra information about the object.
The present invention also provides a computer program product for carrying out the method as defined above. A computer program product may comprise a set of computer executable instructions stored on a data carrier, such as a CD or a DVD. The set of computer executable instructions, which allow a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.
The present invention additionally provides a device for improving the resolution of a moving object in a digital image sequence, the device comprising:
an image background unit for constructing a high resolution image background model,
a detection unit for detecting the moving object using the high resolution image background model,
a registering unit for registering the object, and
an object description unit for producing a high-resolution object description,
wherein the object description unit is arranged for performing an iterative optimisation of a cost function based upon an edge model of the moving object. The edge model preferably is a polygonal model, while the registering unit preferably applies a model-based object trajectory.
The present invention further provides a system comprising a device as defined above, which system is a radar system, an infra-red detection system, a medical system such as an NMR system or echoscopy system, an alarm system, a vehicle alert system, or a gaming system.
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:
FIG. 1 schematically shows a flow diagram of the construction of a 2D HR scene zk at frame k and the degradation to an LR frame ŷk via a camera model.
FIG. 2 schematically shows a flow diagram of the merging of foreground and background to obtain HR image zk.
FIG. 3 schematically shows two examples of the calculation of the measure Γp at vertex vp of polygon p.
FIG. 4 schematically shows a flow diagram of estimating a high-resolution description of a moving object (p and f).
FIG. 5 schematically shows the chord method for finding the value of δ that gives the maximum distance D, the value Tδ being used as a threshold value.
FIG. 6 schematically shows four times SR reconstruction of a simulated under-sampled image sequence containing a small moving car.
FIG. 7 schematically shows the quantitative performance (normalised MSE) of the SR reconstruction method of the present invention on a simulated image sequence containing a moving car (6 pixels) for varying SNR and SCR.
FIG. 8 schematically shows a top view of the set-up for capturing real-world data.
FIG. 9 schematically shows four times SR reconstruction of a vehicle captured by an infrared camera (50 frames) at a large distance.
FIG. 10 schematically shows a four times SR result of a vehicle compared with the same vehicle at a four times smaller distance.
FIG. 11 schematically shows a preferred embodiment of a system according to the present invention.
By way of example, a model of the real world on a two-dimensional (2D) High-Resolution (HR) grid will be described. In addition, it will be described how this is observed by an optical camera system.
2D High-Resolution Scene
A camera's field-of-view at frame k is modelled as a 2D HR image, consisting of R pixels, sampled at or above the Nyquist rate without significant degradation due to motion, blur or noise. Let us express this image in lexicographical notation as the vector zk=[zk,1, . . . , zk,R]T. The vector zk is constructed from a translated HR background intensity description b=[b1, . . . , bV]T, consisting of V pixels, and a translated HR foreground intensity description f=[f1, . . . , fQ]T, consisting of Q pixels. This is depicted in the left part of FIG. 1. Note that the foreground f has a different apparent motion with respect to the camera than the background b. The foreground (small moving object) is not solely described by its intensity description f, but also by a sub-pixel precise polygon boundary p=[v1x, v1y, . . . , vPx, vPy]T with P being the number of vertices. The following assumptions are made about a moving object: 1) the aspect angle of the object stays the same and 2) the object is moving at constant speed. These are realistic assumptions given the high frame rate of today's image sensors, in particular if a moving object is far away.
At frame k the HR background and the HR foreground are translated and merged to the 2D HR image zk in which the rth pixel is defined by: