To talk about megapixels, one must first define what a pixel is. In photography, the "pixel" is the smallest circle of confusion (CoC) the camera+lens system can produce.
This article about depth-of-field has more information about CoC. A short explanation: no camera and no lens is perfect. Even with perfect focus, an infinitely small point of light is never recorded as a point, but rather as a small circle... sometimes not so small, sometimes looking like an irregular smudge instead of a circle, but let's leave these complications aside.
If we consider 1 CoC = 1 digital pixel, the megapixel estimate will be very low. But this is a pessimistic estimate. We must take into account the concept of MTF, which is the maximum contrast difference between two adjacent pixels. Digital pixels can achieve an MTF of 100%, while in photography an MTF of just 50% is considered "sharp enough".
The animation below shows CoCs of different sizes and their translation to digital pixels. The translation changes depending on the offset between the circles and the sensor pixels:
The MTF varies depending on whether a CoC strikes a pixel in the middle or at the border, so we need to take an average. On average, a circle of confusion with a diameter of 1.5 digital pixels yields an MTF of 50%.
With this relationship in hand, we can take 1 CoC = 1.5 digital pixels, and then estimate how many "real" megapixels a camera+lens system can deliver.
For example, a system that resolves 2000 horizontal lines and 1500 vertical lines has 6.75 effective megapixels (2000 × 1.5 × 1500 × 1.5). To reach this resolution, which may seem low but is actually quite good, the camera sensor must have a much higher raw resolution (20MP or more).
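If you prefer to see the arithmetic spelled out, here is a small Python sketch of the same estimate (the function name is mine; the 1.5-pixels-per-CoC factor is just the rule of thumb above):

```python
# Effective megapixels from resolved line counts, using the rule of
# thumb above: 1 CoC = 1.5 digital pixels along each axis.
def effective_megapixels(h_lines, v_lines, pixels_per_coc=1.5):
    return (h_lines * pixels_per_coc) * (v_lines * pixels_per_coc) / 1e6

print(effective_megapixels(2000, 1500))  # 6.75
```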
From a scientific point of view, it makes more sense to talk about resolved lines and MTF. But the megapixel is a unit that means something to the layman.
Another reason to express sharpness in MP is the presentation medium. Very few pictures are printed; the final destination of most pictures is a screen (a photo gallery, a Web site, etc.). Since screens are made of pixels, it makes sense to talk in megapixels.
A Full HD screen has 2MP, so an image must have at least this effective resolution to look sharp on Full HD. (Actually, since a screen can show pixels with 100% MTF, the sharpness improves asymptotically as higher-resolution images are shown.)
It is important to give credit to DxOLabs here: they popularized the concept of "perceptual megapixels" (PMP) and maintain an excellent Web site about cameras and lenses. But their definition of PMP is a trade secret; the definition employed in this article is entirely based on our own research.
Having said all that, how can you estimate the perceptual resolution of the equipment you possess? Cameras, lenses, different apertures of each lens, focus calibration — every component has a huge impact on the effective resolution.
An empirical but simple and effective method is the following: take a picture of a high-contrast subject at the sensor's native resolution, look at it at 100% magnification, and count how many gray pixels appear in the transition between the black and the white areas.
If there is only one gray pixel between black and white, that is 50% MTF. Two gray pixels: 33%. Three: 25%. Four: 20%. Five: 17%. In some situations we find intermediate MTFs, e.g. when there are two gray pixels but one of them is almost white or almost black, so the contrast is somewhere between 33% and 50%.
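For those who prefer code, the counting rule above boils down to this small sketch (the helper name is mine; the idea is simply that a black-to-white transition spread over n gray pixels gives a largest adjacent-pixel jump of about 1/(n+1)):

```python
# Rough MTF from the number of gray pixels in a black-to-white
# transition: n intermediate steps mean the largest jump between
# two adjacent pixels is about 1/(n+1) of the full black-white range.
def mtf_from_gray_pixels(n_gray):
    return 1.0 / (n_gray + 1)

for n in range(1, 6):
    print(n, "gray pixel(s):", f"{mtf_from_gray_pixels(n):.0%}")
# 1: 50%, 2: 33%, 3: 25%, 4: 20%, 5: 17%
```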
Since you are taking pictures at the sensor's native resolution, it is almost impossible for the MTF to be higher than 50%, and it will typically be worse. In my equipment (Nikon D3200), only one lens has a 50% MTF at most apertures: the famous 35mm 1.8G DX, at the cost of some disadvantages (harsh bokeh, etc.). Other lenses produce MTFs between 16% and 40%.
Once you have estimated the MTF of your lens at each aperture, you can estimate the perceptual megapixels with a bit of math:
MPe = MP ÷ 0.5² × MTF²
The rationale is: we divide the raw sensor megapixels (MP) by 0.5 (the "sharp enough" MTF for photography) and multiply by the MTF we've found in the tests. Since the megapixel is essentially a unit of area, both MTFs must be squared to make the units compatible.
For example, my 50mm lens reaches an MTF of 40% in the best case; so the perceptual resolution of this lens on my camera (which has 24.2 raw MP) is:
MPe = 24.2 ÷ 0.5² × 0.4²
MPe = 96.8 × 0.16
MPe = 15.5 effective megapixels
The best aperture for this lens is f/8. The MTF falls to 16% at f/1.8 and 33% at f/16, which corresponds to perceptual resolutions of 2.5MP and 10.6MP, respectively. The weak wide-open performance does not automatically mean that the lens is bad. On a full-frame camera, it would deliver a decent 5.5MP thanks to the larger sensor. The actual problem in my case is using a lens optimized for full-frame on a DX camera. My DX zoom, which is considered a "snapshooter" lens, does better than that wide-open, because it is DX-optimized.
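For reference, the same arithmetic in a short Python sketch (the per-aperture MTF values are the ones I quoted above for my own copy of the lens, so treat them as illustration only):

```python
# Perceptual megapixels: MPe = MP ÷ 0.5² × MTF²
def perceptual_megapixels(raw_mp, mtf, reference_mtf=0.5):
    return raw_mp * (mtf / reference_mtf) ** 2

RAW_MP = 24.2  # Nikon D3200 sensor
for aperture, mtf in [("f/1.8", 0.16), ("f/8", 0.40), ("f/16", 0.33)]:
    print(aperture, round(perceptual_megapixels(RAW_MP, mtf), 1), "MPe")
# prints roughly 2.5, 15.5 and 10.5 MPe respectively
```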
What could be considered an acceptable effective resolution? In my opinion, an image with 6 effective MP can already be considered good. Taking a picture with this level of sharpness is not easy; it takes technique, good equipment and some knowledge of the factors that affect sharpness (focus, ISO, the best aperture for the given lens, etc.). 6MP (3000×2000) is the JPEG size that I use to export my images from RAW.
But higher initial resolutions are always welcome, since they allow some cropping. Also, a 6MP image generated from a 15MP RAW is sharper than an image that was born as 6MP, since digital pixels can have an MTF higher than 50%. (The MTF of the downsized image can go up to, but not higher than, 80%.)
The highest possible sharpness is not always desirable; in the case of the 50mm lens mentioned before, the widest aperture can still be useful for portraits, where the extra sharpness would be detrimental since it reveals small skin imperfections. The "dream-like" character that the wide-open lens creates is considered attractive in many situations.
The other complication is that all these numbers change according to the object being shot, the object distance, the color of the object, etc. The same 50mm lens can deliver an MTF approaching 50% at f/8, and 33% at f/1.8, when shooting a distant building. Building windows are good targets since they are high-contrast. It is possible that post-processing worked better with colored objects (the building I shot was green, with white window frames; green and white are different colors, while black and white are shades of a single color: gray).
This performance difference for different colors is probably one of the ingredients of the "secret sauce" that DxOMark uses to estimate effective megapixels.
The relationship between MTF and megapixels is related to the so-called Nyquist limit, which affects all digital media. To represent a signal with frequency f, it takes at least 2f samples, or pixels.
For example, to represent a "zebra" pattern (black and white lines) it takes at least one pixel per line. Black pixel, white pixel, black, white... Quite clear, right? 60 line pairs take at least 120 pixels to be encoded in digital form.
This limit is absolute; one cannot escape it. If one tries to represent a higher-frequency signal, e.g. 90 line pairs with the same 120 pixels, the final result is a false signal of 30 line pairs. This is the dreaded aliasing or moiré, which adds false patterns to an image. Feeding the pixels a pattern finer than the Nyquist limit is not only useless, it spoils the image.
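A quick numerical sketch makes this concrete (plain sampling arithmetic, nothing camera-specific): a 90-line-pair pattern sampled on 120 pixels produces exactly the same pixel values as a 30-line-pair pattern.

```python
import numpy as np

# One picture width of 120 pixels, sampling two sinusoidal "zebra" patterns:
# 90 line pairs (above the Nyquist limit of 60) and 30 line pairs.
n = np.arange(120)
too_fine = np.cos(2 * np.pi * 90 * n / 120)  # 90 line pairs
alias = np.cos(2 * np.pi * 30 * n / 120)     # 30 line pairs

# The sampled values are identical: the 90-lp pattern is recorded
# as a false 30-lp pattern, i.e. aliasing / moiré.
print(np.allclose(too_fine, alias))  # True
```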
The Nyquist limit is theoretical, and assumes a perfect decoding that by definition cannot be implemented. If the digital-to-analog decoder (in the case of a picture, such a decoder would be a screen or a printer) is not perfect, the moiré and aliasing effects show up at frequencies below the Nyquist limit. Still using the same example, the 120 pixels would have trouble representing any signal above 30 or 40 line pairs in practice.
The Nyquist limit, as far as the theory goes, guarantees an MTF of 100% up to the cut-off frequency, and 0% above it. It is clear that we can represent 60 line pairs in 120 pixels with 100% MTF, since the pixels can simply alternate colors, but 40 line pairs in 120 pixels does not map so neatly onto the pixel grid, and the MTF must be lower than 100% in that case. Indeed, the solution to avoid moiré and aliasing is a low-pass filter that reduces the MTF as the frequency increases.
Until 2012 or 2013, every sensor had an OLPF (optical low-pass filter), which took away some sharpness but avoided moiré. The ideal OLPF would reduce the MTF gradually from 100% to 50% in the vicinity of the Nyquist cut-off frequency, and down to 0% above it. But such a filter is physically impossible. Every OLPF takes away sharpness because it must depress the MTF more than strictly necessary in order to avoid moiré effectively.
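A birefringent OLPF essentially splits each point of light into two copies roughly one pixel apart, which behaves like averaging neighbouring pixels. The sketch below is my own simplification of that idea (not a model of any particular filter); it shows the contrast dropping as the pattern gets finer, reaching zero exactly at the Nyquist frequency:

```python
import numpy as np

def mtf_after_olpf(line_pairs, pixels=120):
    """How much of a sinusoidal test pattern survives a two-pixel
    average (a crude stand-in for a birefringent OLPF)."""
    n = np.arange(pixels)
    signal = np.cos(2 * np.pi * line_pairs * n / pixels)
    filtered = 0.5 * (signal + np.roll(signal, 1))  # spread over two pixels
    # Strength of the pattern after vs. before filtering.
    before = np.abs(np.fft.rfft(signal))[line_pairs]
    after = np.abs(np.fft.rfft(filtered))[line_pairs]
    return after / before

for lp in (20, 40, 60):
    print(lp, "line pairs:", round(float(mtf_after_olpf(lp)), 2))
# 20 lp: 0.87, 40 lp: 0.5, 60 lp: 0.0
```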
The current trend is to abolish the OLPF. First, because the lenses themselves have limited resolution compared with modern sensors, so they already play the role of a low-pass filter. Filtering twice degrades the image; if an OLPF limits the effective resolution to 20MP and the lens is also 20MP, the combination yields only 10MP. Second, fixing moiré in software is viable these days.
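One way to reproduce that "20MP + 20MP = 10MP" figure (my own back-of-the-envelope model, not something from DxO or any manufacturer) is to assume the two blurs add in quadrature, which in megapixel terms becomes a reciprocal sum, like resistors in parallel:

```python
# Combine two resolution limits (lens and OLPF), assuming their blurs add
# in quadrature; in megapixel (area) terms that is a reciprocal sum.
def combined_megapixels(mp_lens, mp_olpf):
    return 1.0 / (1.0 / mp_lens + 1.0 / mp_olpf)

print(combined_megapixels(20, 20))  # 10.0
```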