CMOS Imager With 1024 SPADs and TDCs for Single-Photon Timing and 3-D Time-of-Flight

Federica Villa, Rudi Lussana, Danilo Bronzi, Student Member, IEEE, Simone Tisa, Alberto Tosi, Member, IEEE, Franco Zappa, Senior Member, IEEE, Alberto Dalla Mora, Davide Contini, Daniel Durini, Member, IEEE, Sasha Weyers, and Werner Brockherde

I. INTRODUCTION

THE last decades saw a growing interest in the scientific community toward single-photon time-resolved measurements of very faint and fast optical signals for safety and security, medical and biological applications. High-performance, ultra-compact, multi-channel instruments capable of measuring photon arrival times with a time jitter better than a few hundred picoseconds are required in many fields, such as time-of-flight (TOF) measurement in 3-D depth ranging and light detection and ranging (LIDAR), and TOF of gamma rays in advanced TOF-resolved positron emission tomography [1] in nuclear medicine imaging, just to mention a few. Other applications require an accurate waveform (intensity versus time) reconstruction of fast and faint optical signals through repetitive excitation of the sample under investigation by means of time-correlated single-photon counting (TCSPC) [2], [3]. The breakthrough is a multi-pixel sensor able to provide single-photon sensitivity and highly precise timing electronics in a single chip. Furthermore, demanding TCSPC applications require extremely good accuracy, with integral (INL) and differential (DNL) non-linearities much better than $\frac{1}{2}$ least-significant bit (LSB), i.e., the timing resolution. Arrays of single-photon avalanche diodes (SPADs) are the best candidates because they have no read-out noise, unlike CCDs, no limited sensitivity, unlike CMOS active pixel sensors (APS), and no excess noise, which is typical of analog avalanche photodiode arrays. SPAD arrays provide true single-photon sensitivity and high frame rates, thus bridging the gap between high-sensitivity (but low-speed) CCD imagers and high-speed (but low-sensitivity) CMOS APS sensors.

Arrays of SPADs with time-to-digital converters (TDCs) have already been reported in the literature [4]–[8]. Some of them have a multiplexed architecture, in which one TDC is shared among tens of SPADs [4], [5]. For instance, Ref. [4] reports a large number of SPADs (128 $\times$ 128), but the image resolution is low because each column works as a single macro-pixel. Other papers present fully parallel in-pixel electronics with good timing resolution but poor linearity, for example 52 ps, 40% DNL and 140% INL [6], 55 ps, 30% DNL and 250% INL [7], or 119 ps, 50% DNL and 120% INL [8]. Many SPAD arrays show non-uniform detection performance and also low uniformity in the measured arrival time. In Ref. [7], the peak photon detection efficiency (PDE) varies from 3% to 27.5% depending on the pixel. Ref. [6] and Ref. [8] have a pixel-to-pixel TDC non-uniformity larger than the LSB (8 LSB and 2 LSB, respectively), thus compensation is needed to actually exploit their timing resolution. Despite the high dark count rate (DCR) density of those SPADs, the DCR is kept to 50 cps [7] and 100 cps [8] by using a very small active diameter (5.6 $\mu$m [7] and 8.6 $\mu$m [8]).

Arrays with fully parallel electronics, i.e., one TDC per pixel, have a low fill-factor (e.g., 1% [7], 2.3% [8]), hence a low overall active area despite the large number of pixels (0.5 mm$^2$ [7], 0.06 mm$^2$ [8]). Furthermore, the peak PDE is lower than 30% and drops below 5% for wavelengths shorter than 350 nm and longer than 800 nm. The limited overall active area and PDE prevent their use in photon-starved applications, such as optical time domain reflectometry [9], in which large collection areas are required. In Ref. [10], a high fill-factor is achieved, but with vertical integration of two wafers (one for the SPADs and another one for the timing electronics) and just 2 ns timing resolution.

In this paper we present a single-chip CMOS imager, fabricated in a low-cost 0.35 μm CMOS technology, consisting of 32 × 32 pixels with 150 μm pitch, each able to detect single photons (in the 300–900 nm wavelength range), to time-stamp their arrival times (with 312 ps resolution) for acquiring waveforms and time-resolved maps, and to count photons for providing photon-number (i.e., intensity) resolved 2-D videos (e.g., very useful for properly aligning the sensor with the optical setup). Compared to arrays designed in more scaled technologies, the developed imager has worse timing resolution, but reaches much better linearity than reported in [4]–[8] (Section III) and a fill-factor (3.14%) higher than the 1% [7] or 2.3% [8] obtained in more scaled technologies.

Concerning detection performance, we designed large active area (both at pixel- and overall array- level), high PDE, in a well assessed low-DCR technology, aiming at advanced TCSPC applications, where uniformity and sensitivity are essential requirements. An in-depth optical characterization of the SPAD detectors was described in Ref. [11]. Concerning array design, we traded-off all in-pixel TDC electronics, pixel pitch, fill-factor, and pixel count, aiming at real applications where high throughput and simple optics are a must, like in confocal microscopy, in which well-spaced laser excitation/photodetection spots are required [12], and 1024 pixels are quite enough, though a bit lower than denser arrays [7].

Target applications are fluorescence lifetime imaging microscopy (FLIM), where the average fluorescence relaxation time [13] is measured, FLIM-based Förster resonance energy transfer (FRET) for assessing the energy transfer between two chromophores with tens of picoseconds dynamics [14], diffuse optical tomography where internal compositions of biological tissues are characterized through photon delays from 1 ns to tens of nanoseconds [15], proteomics and DNA sequencing [16].

The paper is organized as follows: Section II describes the chip architecture, i.e., the global timing electronics, the in-pixel circuitry, and the readout logics. Section III presents the characterization of the TDCs (linearity, precision, uniformity and crosstalk). Section IV reports 2-D and 3-D static images and videos. Section V draws conclusions and perspectives.

II. CHIP STRUCTURE

The 32 × 32 array has 150 μm × 150 μm pixels with SPAD detectors having a circular photoactive area of 30 μm diameter. It is fabricated in a 0.35 μm CMOS technology, automotive certified, selected to provide high yield and uniformity of SPAD performance [11]. In fact, the developed SPADs reach state-of-the-art performance in terms of noise (i.e., DCR), afterpulsing, yield, reliability, and optical crosstalk compared to all other CMOS SPADs reported so far, and are even comparable to best-in-class custom-process SPADs [17].

Since the employed n-well isolation is not so deep, the achieved PDE in the near infra-red (NIR) is lower than what attained by custom-processed SPADs. Nevertheless in the near ultra-violet (NUV) the obtained sensitivity outperforms all other SPADs, both custom- and CMOS-based, presented so far in literature, as it can be observed in Fig. 1, thanks to a thinner Si$_3$N$_4$ passivation layer. The PDE is 55% at 450 nm wavelength, 45% at 400 and 500 nm, and still 20% at 300 nm in the NUV and at 650 nm in the NIR ends of the Silicon sensitivity. The detector characterization of the SPADs integrated in the array was presented in [11].

The 0.35 μm technology employed prevents shrinking the in-pixel electronics, thus limiting the attainable fill-factor to just 3.14% with no microlens array, but at the same time it allows the use of large 30 μm SPADs with a DCR of just 120 cps (counts/s) with no need for cooling (Fig. 2). Such large SPADs mitigate the fill-factor limitation, and the resulting fill-factor is even better than that reported in more scaled technologies, e.g., 1% [7] or 2.3% [8] in 130 nm technology.

Fig. 3. Imager architecture, composed of: 32 × 32 pixels (split in two identical blocks to speed up readout), external counters (four blocks of 16 × 16) for multiple gate-on periods, readout circuitry (row selectors and output multiplexers), 16 multiphase clocks, global STOP channel and registers.
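
The fill-factor and the total active area quoted in this paper follow directly from the pixel geometry. The short Python sketch below reproduces that arithmetic; the function and constant names are illustrative only, and a square pixel with a single circular active area is assumed.

```python
import math

# Geometry quoted in the text.
PIXEL_PITCH_UM = 150.0    # pixel pitch
SPAD_DIAMETER_UM = 30.0   # diameter of the circular active area
N_PIXELS = 32 * 32


def spad_area_um2(diameter_um):
    """Area of a circular active region, in square micrometers."""
    return math.pi * (diameter_um / 2.0) ** 2


def fill_factor(diameter_um, pitch_um):
    """Ratio of active area to pixel area for a square pixel."""
    return spad_area_um2(diameter_um) / pitch_um ** 2


ff = fill_factor(SPAD_DIAMETER_UM, PIXEL_PITCH_UM)
# 1 mm^2 = 1e6 um^2
total_area_mm2 = N_PIXELS * spad_area_um2(SPAD_DIAMETER_UM) * 1e-6

print(f"fill-factor: {100 * ff:.2f} %")
print(f"total active area: {total_area_mm2:.2f} mm^2")
```

The result is a fill-factor of about 3.14% and a total active area of roughly 0.7 mm², in agreement with the values stated in the text.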

In each pixel, when a photon hits and triggers the SPAD, the avalanche sensing electronics provides a START signal to the in-pixel TDC. The global STOP is provided to all TDCs by an external sync (e.g., the excitation laser sync-out).

Fig. 3 shows all chip building blocks, namely reference clock frequency doubler (fx2), delay locked loop (DLL), STOP interpolator and synchronizer for the global timing electronics, array of pixels (with SPAD and TDC each), and readout circuitry (with row selector and output multiplexer). The next paragraphs describe in detail the TDC technique employed and each component of the chip.

A. Sliding Scale and TDC Structure

The chip was designed to achieve sub-nanosecond timing resolution and a full-scale range of some hundreds of nanoseconds. Such ranges could be achieved through interpolation methods based, for instance, on pulse-shrinking delay line [23], tapped delay line [24], or standard and cyclic Vernier delay line [25] elements. In all those methods, linearity is limited by component mismatches, but it can be greatly improved by employing the “Sliding Scale” technique, which requires separately measuring the time intervals between a reference clock and both the asynchronous START and STOP signals [26], [27]. In this way, even if the same START-STOP time interval is converted, different portions of the START and STOP interpolator ranges are exploited, thus the interpolators’ deterministic nonlinearities are converted into stochastic jitter [28]. Therefore the “Sliding Scale” technique improves linearity at the cost of a worse single-shot precision, which degrades due to the interpolators’ INL [28]:

$$\sigma_{TDC} = \sqrt{\sigma_q^2 + \sigma_{INL-START}^2 + \sigma_{INL-STOP}^2 + \sigma_{clk}^2 + \sigma_n^2}$$

where $\sigma_q$ is the quantization error, given by

$$\sigma_q = \sqrt{\frac{\text{LSB}_{\text{START}}^2}{12} + \frac{\text{LSB}_{\text{STOP}}^2}{12}} = \frac{\text{LSB}}{\sqrt{6}}$$

and $\sigma_{INL-START}$ and $\sigma_{INL-STOP}$ are the INL standard deviations of the START and STOP interpolators respectively, $\sigma_{clk}$ is the reference clock jitter, and $\sigma_n$ is any additional jitter of signals within the TDC. Although the “Sliding Scale” technique adds the $\sigma_{INL}$ contributions to $\sigma_q$, both DNL and INL improve remarkably [29].
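
As a numerical check of the quantization term, the Python sketch below evaluates $\sigma_q$ and the quadrature sum for $\sigma_{TDC}$, and compares $\sigma_q$ with a small Monte Carlo simulation. The simulation is only an idealized model, not the chip behavior: it assumes perfectly uniform bins and START and STOP phases that are independent and uniformly distributed with respect to the reference clock. All function names are hypothetical.

```python
import math
import random


def sigma_q(lsb):
    """Quantization error of a START-STOP difference (two equal LSBs)."""
    return math.sqrt(lsb ** 2 / 12 + lsb ** 2 / 12)


def sigma_tdc(lsb, s_inl_start=0.0, s_inl_stop=0.0, s_clk=0.0, s_n=0.0):
    """Quadrature sum of the single-shot precision contributions."""
    return math.sqrt(sigma_q(lsb) ** 2 + s_inl_start ** 2
                     + s_inl_stop ** 2 + s_clk ** 2 + s_n ** 2)


def simulate_sigma_q(lsb, trials=100_000, seed=1):
    """Monte Carlo estimate of sigma_q for ideal (uniform) interpolators.

    START and the START-STOP interval are drawn at random, so that both
    edges fall at random positions inside a time bin.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        start = rng.uniform(0.0, 1000.0) * lsb
        stop = start + rng.uniform(0.0, 100.0) * lsb
        measured = (math.floor(stop / lsb) - math.floor(start / lsb)) * lsb
        errors.append(measured - (stop - start))
    mean = sum(errors) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / trials)


LSB_PS = 312.0
print(f"analytical sigma_q: {sigma_q(LSB_PS):.1f} ps")
print(f"simulated  sigma_q: {simulate_sigma_q(LSB_PS):.1f} ps")
```

With a 312 ps LSB the quantization term alone is about 127 ps rms; the remaining terms of $\sigma_{TDC}$ add in quadrature on top of it.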

We carefully selected the TDC conversion technique by taking into consideration that the interpolator must be replicated into each pixel, hence it should require small dimensions and low power consumption. We designed an interpolator similar to a tapped delay line, but instead of propagating the START signal along the delay line, we decided to propagate a reference clock, whilst the state of multiphase clocks is sampled in correspondence of the START or the STOP signal, respectively, by means of separate interpolators. In this way, we effectively implement the “Sliding Scale” technique and we can keep the DLL propagating the reference clock outside the pixel, as a global electronics. Instead, the pulse shrinking methods would require the integration of one different DLL into each pixel, and the Vernier delay line technique would provide very short time bin, but at the expense of larger area and higher power consumption.

We conceived a TDC structure similar to the one presented in Ref. [25], but we decided to employ just one interpolation stage, for smaller pixel dimensions, hence higher fill-factor. Fig. 4 shows the TDC global architecture: the in-pixel TDC components are a coarse 6 bit counter, which assures a long full-scale range (more than 300 ns), and 4 bit interpolators, for fine resolution. The coarse counter is used to count the number of clock periods (of a 100 MHz external clock, internally doubled to 200 MHz) between START and STOP signals. Then, the fine interpolator, based on a global DLL which divides the clock period into 16 intervals of 312 ps each, includes an in-pixel discriminator to detect the phase of the START signal with respect to the 16 multiphase clocks. An identical discriminator detects the phase of the asynchronous global STOP. Two synchronizers assure the correct synchronization between coarse counter and the START and STOP interpolators, respectively. The TDC acts as a flash converter and requires a negligible (about 1 ns) conversion time, due only to the propagation delays. In summary, the 6 bit counter, the START interpolator and its synchronizer are integrated into each pixel, whereas the 16-tap DLL, the STOP interpolator and its synchronizer are all global components of the array.

Each measurement consists of 6 bits from the in-pixel coarse counter \( (N_{\text{coarse}}) \), 4 bits from the in-pixel START interpolator \( (N_{\text{START}}) \), and 4 bits from the global STOP interpolator \( (N_{\text{STOP}}) \). The photon time-tag \( T_{\text{meas}} \) is the elapsed time between the START (i.e., the photon) and the STOP (i.e., the global sync) events and it is computed as

$$T_{\text{meas}} = \left( N_{\text{coarse}} - \frac{N_{\text{START}} - N_{\text{STOP}}}{16} \right) \cdot T_{\text{ck}}$$

(3)

where \( T_{\text{ck}} \) is one period of the internal (frequency-doubled) reference clock, which corresponds to 5 ns when a 100 MHz external clock is used. Therefore, the TDC provides an LSB of \( T_{\text{ck}} / 16 = 312 \, \text{ps} \), with a full-scale range FSR = 320 ns and 10-bit resolution.
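
The conversion from the three raw codes to a time-tag can be sketched in a few lines of Python. This is only an illustration of Eq. (3) as written above; the function names and the range checks are assumptions introduced here, not part of the chip firmware or software.

```python
def internal_clock_period_ns(ext_clock_hz):
    """Period of the internally doubled reference clock, in ns."""
    return 1e9 / (2.0 * ext_clock_hz)


def t_meas_ns(n_coarse, n_start, n_stop, ext_clock_hz=100e6):
    """Photon time-tag from Eq. (3).

    n_coarse: 6-bit coarse counter code (0..63)
    n_start:  4-bit START interpolator code (0..15)
    n_stop:   4-bit STOP interpolator code (0..15)
    """
    if not (0 <= n_coarse < 64 and 0 <= n_start < 16 and 0 <= n_stop < 16):
        raise ValueError("code out of range for 6-bit coarse / 4-bit fine fields")
    t_ck = internal_clock_period_ns(ext_clock_hz)
    return (n_coarse - (n_start - n_stop) / 16.0) * t_ck


# LSB and full-scale range for the two external clocks used in the paper.
for f_ext in (80e6, 100e6):
    t_ck = internal_clock_period_ns(f_ext)
    print(f"{f_ext / 1e6:.0f} MHz: LSB = {1e3 * t_ck / 16:.1f} ps, "
          f"FSR = {64 * t_ck:.0f} ns")

print(t_meas_ns(10, 4, 3))  # one LSB less than 10 full clock periods
```

With a 100 MHz external clock the sketch gives an LSB of 312.5 ps and a 320 ns range, and with 80 MHz an LSB of about 390.6 ps, matching the two time bins used in the characterization.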

B. Global Timing Logics

The global timing logics comprises a 16 taps DLL, for generating clocks with constant phase shift, a frequency multiplier, for doubling the frequency of the external reference clock, the STOP interpolator and the STOP synchronizer, for synchronization between fine interpolator and coarse counter.

The structure of the 16 taps DLL is shown in Fig. 5: it propagates the reference clock by generating 16 multiphase clocks (shifted by 312 ps steps when using a 100 MHz external clock), which feed the TDC interpolators. As shown in Fig. 5, the DLL is constituted by 16 delay cells, whose delay is adjustable by means of the control voltage (internally generated by phase detector and charge pump). The phase detector generates voltage pulses (up or down depending on which clock is in advance), with a duration proportional to the phase delay between \( \text{clk first} \) and \( \text{clk last} \). Then the charge pump consequently charges or discharges a capacitor, depending on which clock is in advance [30]. The internal control voltage makes the TDC insensitive to process, voltage and temperature drifts. Simulations have been performed to assure the correct operation in a wide temperature range, from 0 to 80 °C, and also considering worst-cases process variations. The 16 multiphase clocks are connected to the 1024 TDC’s START interpolators, one for each pixel, and to the single global STOP interpolator. In order to reduce power consumption of the multiphase clock buffers, clocks are distributed only when SPADs are enabled.

The frequency multiplier doubles the external reference clock frequency, allowing to use external clock generators at lower frequency. Fig. 6 shows the implementation of the circuit, based on a four-tap DLL, which makes the internal clock stable and locked. A Set-Reset latch has been used to lock both the rising and the falling edge of the output clock to the rising edge of a clock generated by the DLL, thus a 50% output duty-cycle is assured, independent of the input duty-cycle.

The global interpolator detects the phase of the STOP with respect to the 16 multiphase clocks, and provides a 16-level thermometric scale, which is then converted into a 4 bit binary code through a 16-to-4 encoder. Identical 4 bit interpolators have been employed also for all the 1024 STARTs, one for each pixel. The interpolator is composed of 16 latches, which acquire the state of the multiphase clocks at the rising edge of the STOP (or START) signal. Since 16 of these latches are replicated into each pixel, it is important to reduce their power consumption. The schematic of the latch is shown in Fig. 7: M1–M4 form a sense-amplifier based latch [31], M7 and M8 assure a fast recharge after commutation, M5 and M6 form a pseudo p-MOS NOR gate, which pre-charges both branches at the same level when no input is received. Compared to a CMOS gate, it saves a large amount of power at such a high clock frequency (200 MHz).
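
The principle of sampling the multiphase clocks and encoding the result can be illustrated with a purely behavioral Python model. It is a sketch under assumptions that are not taken from the circuit: ideal clocks with 50% duty cycle, tap k delayed by k/16 of the clock period, and an encoder that returns the index of the last tap found high. Under these assumptions the sampled word is a circular run of ones whose edge position plays the role of the thermometric level.

```python
N_TAPS = 16


def sample_multiphase(t, t_ck):
    """State of the 16 multiphase clocks at time t (behavioral model).

    Clock k is the reference clock delayed by k * t_ck / 16 and is
    assumed high during the first half of its period.
    """
    step = t_ck / N_TAPS
    return [1 if ((t - k * step) % t_ck) < t_ck / 2 else 0
            for k in range(N_TAPS)]


def encode(latches):
    """4-bit code: index k where tap k is high and tap k+1 is low."""
    for k in range(N_TAPS):
        if latches[k] == 1 and latches[(k + 1) % N_TAPS] == 0:
            return k
    raise ValueError("no 1->0 transition found (invalid latch pattern)")


T_CK_NS = 5.0
for b in (0, 5, 15):
    t = (b + 0.5) * T_CK_NS / N_TAPS   # middle of time bin b
    word = sample_multiphase(t, T_CK_NS)
    print(b, word, encode(word))
```

In this model an event falling in time bin b always decodes to code b, which is the property the 16-to-4 encoder has to guarantee regardless of how the latch pattern is physically represented.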

The synchronizer block assures the correct synchronization between interpolator and coarse counter. If the START (or STOP) signal arrives too close to the clock rising edge, a metastable condition can occur, and it is necessary to refer to the fine conversion to solve the uncertainty. Synchronizer circuits for (in-pixel) START and (global) STOP have been employed to solve such metastable conditions.

Fig. 7. Schematics of the latch, based on a sense-amplifier, constituting the core block of the fine interpolator.

Fig. 8. Pixel block diagram, with 30 μm SPAD and 10 bit TDC.

C. In-Pixel Electronics

All functionalities of the chip have been implemented in-pixel. Fig. 8 shows the block diagram of the pixel [29], which includes a large photoactive area SPAD (30 μm diameter) [11], a quenching circuit with active reset for fast avalanche sensing and quenching (two high-voltage transistors have been used to bias the SPADs at excess voltages as high as 7 V) [32], pulse shaping electronics for proper synchronization, and the 10 bit TDC. A 10 bit memory latch stores the results of the conversion, and the B0:B9 output buffers drive the readout data buses. Thanks to the 1024 in-pixel memory latches, a new frame can be acquired while performing the readout of the previous one, in a global shutter technique.

Finally, since many applications benefit from combined 2-D (photon-counting) imaging and 3-D (photon-timing) ranging, we implemented both. In fact, the 6-bit asynchronous counter counts either the number of clock periods between START and STOP, in photon-timing mode, or the photons detected within a time window, in photon-counting mode.

D. Readout Circuitry

Since many applications require very high frame-rate and synchronous acquisition from all pixels, we optimized the global electronics for improving data throughput. The planar array is divided in two sub-arrays, in which pixels of each half-column share the same data bus. In each sub-array the in-pixel output buffers of one row are activated at the same time and two separate multiplexers (one for the upper and one for the lower sub-array) serially feed the results of each column to the output pads. The two advantages are the doubling of output throughput and the halving of parasitic capacitance of column data buses, at the cost of doubling the output pads. With a 50 MHz readout clock, a 100 000 fps maximum frame rate is easily achieved (i.e., 128 MB/s data rate).
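
The frame rate and data rate quoted above can be checked with a few lines of Python. The sketch assumes that each sub-array delivers one 10-bit pixel word per readout clock cycle, which is an inference from the description rather than a stated specification; names are illustrative.

```python
N_PIXELS = 32 * 32
N_SUBARRAYS = 2          # upper and lower halves, read out in parallel
BITS_PER_PIXEL = 10      # photon-timing word on the B0:B9 buses
READOUT_CLOCK_HZ = 50e6


def frame_readout_time_s(n_pixels=N_PIXELS, n_subarrays=N_SUBARRAYS,
                         clock_hz=READOUT_CLOCK_HZ):
    """Readout time, assuming one pixel word per sub-array per clock cycle."""
    return (n_pixels / n_subarrays) / clock_hz


def data_rate_bytes_per_s(fps, n_pixels=N_PIXELS, bits=BITS_PER_PIXEL):
    """Raw output data rate at a given frame rate."""
    return n_pixels * bits * fps / 8


t_frame = frame_readout_time_s()
print(f"readout time: {t_frame * 1e6:.2f} us "
      f"(about {1 / t_frame:,.0f} fps)")
print(f"data rate at 100 000 fps: "
      f"{data_rate_bytes_per_s(100_000) / 1e6:.0f} MB/s")
```

Under this assumption the readout takes 10.24 μs per frame, i.e., close to 100 000 fps, and the 128 MB/s figure corresponds to 1024 words of 10 bits at that frame rate.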

E. Overall Chip

The minimum frame duration, required for reading out the whole 32 × 32 array, is 10 μs, corresponding to 100 000 fps maximum frame rate. In photon-timing mode the maximum duration of the SPAD gate-on period is limited by the TDC range (320 ns with an external 100 MHz reference clock). Since in many applications the typical probability of detecting photons for each laser shot is lower than 5% [33], a photon-timing system with a single gate-on period per frame would be very inefficient. To overcome this problem we implemented a multi-window timing feature: more than one gate-on period is opened within each frame and the first photon within the frame triggers the TDC conversion. One global STOP is provided to the chip for each gate-on period, and up to 64 STOP-interpolator results can be stored in a global memory and then read out at the end of each frame. In order to associate the proper STOP to each pixel, a further 1024 6-bit counters have been added: at every gate-on period each counter is incremented if no photon has yet been detected in the corresponding pixel.
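
The benefit of opening several gate-on periods per frame can be estimated with a simple probability model. The Python sketch below assumes that every gate has the same detection probability and that gates are statistically independent; both are simplifying assumptions made here, and the function name is hypothetical.

```python
MAX_GATES_PER_FRAME = 64   # limited by the 6-bit gate counter


def p_conversion_per_frame(p_shot, n_gates):
    """Probability that a pixel records a photon in at least one gate."""
    if not 0.0 <= p_shot <= 1.0:
        raise ValueError("p_shot must be a probability")
    if not 1 <= n_gates <= MAX_GATES_PER_FRAME:
        raise ValueError("n_gates must be between 1 and 64")
    return 1.0 - (1.0 - p_shot) ** n_gates


for n in (1, 10, 50):
    print(f"{n:2d} gates: {100 * p_conversion_per_frame(0.05, n):.1f} % "
          "of pixels give a valid conversion")
```

With a 5% detection probability per laser shot, a single gate yields a valid conversion in only 5% of the frames for a given pixel, whereas 50 gates per frame raise this to more than 90%.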

In this way the total conversion result of each pixel has 16 bits: six from the coarse counter, four from the START fine interpolator and six from the additional counter for multiple gate-on windows.

Since no strict timing constraints must be respected, these counters have been placed outside the pixel, as shown in Fig. 3. Fig. 9 shows a micrograph of the overall chip (total chip size 9 × 9 mm²).

The total power consumption in photon-counting mode is less than 70 mW, whereas in photon-timing it depends on the required timing resolution and on the number of gate-on windows per frame. Table I shows the power consumption for each array block in photon-timing mode, at two different timing resolutions, attained by varying the external reference clock.

TABLE I
POWER CONSUMPTION OF THE IMAGER MAIN BLOCKS, IN PHOTON-TIMING MODE, FOR DLL CLOCK FREQUENCIES OF 160 AND 200 MHz

<table>
<thead>
<tr>
<th>Circuit block</th>
<th>Notes</th>
<th>Power consumption</th>
<th>Units</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>@ 390 ps</td>
<td>@ 312 ps</td>
<td></td>
</tr>
<tr>
<td>DLL</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>In-pixel electronics</td>
<td>1 conversion per frame</td>
<td>260</td>
<td>mW</td>
</tr>
<tr>
<td>Clock distribution</td>
<td>1 gate per frame</td>
<td>33</td>
<td>mW</td>
</tr>
<tr>
<td></td>
<td>50 gates per frame</td>
<td>1.65</td>
<td>W</td>
</tr>
<tr>
<td></td>
<td>Digital I/O at 100 000 fps</td>
<td>20</td>
<td>mW</td>
</tr>
<tr>
<td></td>
<td>Analog I/O</td>
<td>&lt; 5</td>
<td>mW</td>
</tr>
<tr>
<td>TOTAL</td>
<td>1 gate per frame</td>
<td>358</td>
<td>mW</td>
</tr>
<tr>
<td></td>
<td>50 gates per frame</td>
<td>1.9</td>
<td>W</td>
</tr>
</tbody>
</table>

III. TDCs CHARACTERIZATION

In this section, we present the characterization of the TDC within the array, when all 1024 TDCs are running at the same time. We start from linearity (DNL and INL) and single-shot precision, i.e., the standard deviation of TDC conversion results when a constant START-STOP time interval is measured many times. Then we report the uniformity of TDC conversions among different pixels and finally the crosstalk among TDCs.

A. Linearity

The array design aimed at optimizing the linearity of the measurement, both taking care of the delay cells matching and implementing the Sliding-Scale technique.

In order to quantitatively measure the linearity of the chip, we performed a code density test, collecting about 40 000 samples for each pixel within a range of 300 ns. No differences were observed among different pixels, hence Fig. 10 shows the code density test, the DNL and the INL of one pixel in the array center. The obtained DNL is 2% LSB rms (±6% maximum) and the INL is 10% LSB rms (±25% maximum). Such relative linearity is much better than that reported in the literature [4]–[8]. Considering the DNL absolute value, only Ref. [7] shows a slightly better DNL (16.5 ps [7] versus our 18.7 ps), but with a much worse INL (137.5 ps [7] versus our 78 ps).
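
For readers unfamiliar with the code density test, the following Python sketch shows one common way to derive DNL and INL from a histogram of conversion results. It assumes that the input events are uniformly distributed in time over the tested range, so that every bin should ideally collect the same number of counts; it is not the authors' analysis code, and the helper names are hypothetical.

```python
import math


def dnl_inl(histogram):
    """DNL and INL, in LSB, from a code density histogram.

    DNL[i] is the relative deviation of bin i from the mean bin content;
    INL[i] is the running sum of the DNL up to bin i.
    """
    n_bins = len(histogram)
    total = sum(histogram)
    if n_bins == 0 or total == 0:
        raise ValueError("empty histogram")
    expected = total / n_bins
    dnl = [h / expected - 1.0 for h in histogram]
    inl = []
    acc = 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def lsb_fraction_to_ps(fraction, lsb_ps=312.0):
    """Convert a non-linearity expressed in LSB into picoseconds."""
    return fraction * lsb_ps


dnl, inl = dnl_inl([110, 90, 100, 100])
print("DNL:", [round(d, 3) for d in dnl], "rms:", round(rms(dnl), 3))
print("INL:", [round(i, 3) for i in inl])
print("6% LSB =", round(lsb_fraction_to_ps(0.06), 1), "ps")
```

The last line reproduces the conversion used in the comparison above: a ±6% LSB maximum DNL with a 312 ps bin corresponds to about 18.7 ps.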

B. Single-Shot Precision and Uniformity

We characterized the single-shot precision of the TDCs integrated in each pixel of the array, by illuminating the corresponding SPADs with a pulsed laser with 60 ps full-width at half maximum (FWHM). In order to check the reliability of the imager, two different reference clock frequencies were used, namely 80 and 100 MHz, corresponding to a time bin, i.e., LSB, of 390 and 312 ps, respectively. The typical pulse responses of one pixel in the array center are shown in Fig. 11: the single-shot precision is 660 ps (left) and 600 ps (right) FWHM, corresponding to a standard deviation of 280 and 254 ps rms, respectively. Depending on the measured time delay, the precision shows variations of ±20 ps with respect to the typical response represented in Fig. 11.
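
The relation between the FWHM and rms values quoted above is the one that holds for a Gaussian response, FWHM = 2√(2 ln 2)·σ. The sketch below applies it; the Gaussian shape is an assumption about the pulse response, and the function name is illustrative.

```python
import math

# For a Gaussian distribution, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_rms(fwhm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / FWHM_PER_SIGMA


for fwhm_ps in (660.0, 600.0):
    print(f"{fwhm_ps:.0f} ps FWHM -> {fwhm_to_rms(fwhm_ps):.1f} ps rms")
```

The conversion gives about 280 ps and 255 ps rms for the 660 ps and 600 ps FWHM responses, consistent with the values reported for the 390 ps and 312 ps time bins.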

For assessing TDC response uniformity, we homogeneously illuminated the imager with a pulsed laser and all 1024 TDCs measured the arrival time of the first photon per pixel. A 100 MHz clock was used, resulting in a 312 ps bin. After accumulating 1024 histograms, with about 40 000 conversions per pixel, the centroid and the histograms spread (in FWHM) were computed, across the whole TDC range. No significant differences (lower than $\frac{1}{2}$ LSB) were observed at different delays. The maximum error between the measured time and the average time measured by each pixel of the array is shown in Fig. 12 left, whereas the single-shot precision is shown in Fig. 12 right.

The maximum difference across centroids among different pixels is 260 ps (less than 1 LSB) and the standard deviation is 42 ps, a clear signature of the excellent pixel uniformity. Fig. 12 right shows the single-shot precision (i.e., the distribution width) of each pixel; the average value across all pixels is 609 ps FWHM, i.e., 259 ps rms. The standard deviation of the FWHM among the 1024 pixels is just 27.8 ps rms, showing again excellent uniformity of TDC performance over the entire array.
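
The uniformity figures above are statistics computed over per-pixel histograms. A minimal Python sketch of such a computation is given below; it is not the authors' analysis code. It assumes that each histogram is a list of counts per time bin, takes the bin index times the bin width as the time of a bin, and uses the population standard deviation across pixels.

```python
import math
import statistics


def centroid_and_rms(histogram, bin_width_ps):
    """Centroid and rms width of one arrival-time histogram, in ps."""
    total = sum(histogram)
    if total == 0:
        raise ValueError("empty histogram")
    mean_bin = sum(i * c for i, c in enumerate(histogram)) / total
    var_bin = sum(c * (i - mean_bin) ** 2
                  for i, c in enumerate(histogram)) / total
    return mean_bin * bin_width_ps, math.sqrt(var_bin) * bin_width_ps


def uniformity(values_ps):
    """Peak-to-peak spread and standard deviation across pixels."""
    return max(values_ps) - min(values_ps), statistics.pstdev(values_ps)


# Toy example with three "pixels" (hypothetical data, 312 ps bins).
pixels = [[0, 1, 2, 1, 0], [0, 2, 2, 0, 0], [0, 0, 2, 2, 0]]
centroids = [centroid_and_rms(h, 312.0)[0] for h in pixels]
print("centroids (ps):", centroids)
print("peak-to-peak, std (ps):", uniformity(centroids))
```

Applied to the 1024 measured histograms, the same two statistics would correspond to the 260 ps maximum difference and 42 ps standard deviation of the centroids reported above.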

C. TDC Crosstalk

We assessed the crosstalk among different TDCs by measuring the FWHM of photon arrival time histograms (with about 50 000 conversions) in each pixel, by illuminating the chip with a pulsed laser at both low (red lines in Fig. 13) and high (blue lines in Fig. 13) optical power. In the first case less than 1% of the pixels receive a photon, thus it is possible to assume that while the hit pixel is performing a TDC conversion, no neighbors are working. In the second case, more than 85% of the pixels receive a photon almost simultaneously, therefore it is possible to consider, with good approximation, that most of the neighboring pixels are working simultaneously. The centroid (Fig. 13 left) averaged among all 1024 array pixels is 97.21 ns and 97.26 ns, at low and high optical power respectively, and in all pixels the TOF measured at high conversion rates is longer than at low conversion rates. This fact is mostly due to the heating of the array (which causes a longer propagation delay in the STOP signal input buffer) and not to crosstalk among TDCs. Furthermore, note that the time difference in the measured centroid at low and high optical power is only 50 ps, i.e., 16% LSB. The FWHM (Fig. 13 right) is 609.7 and 612.5 ps, at low and high optical power respectively, showing negligible dependence on counting conditions, i.e., photon rate. Therefore, we can conclude that there is almost no crosstalk among TDCs since neither the accuracy (centroid) nor the precision (FWHM spread) of measurements changes significantly with the number of working TDCs.

IV. 2-D AND 3-D MEASUREMENTS

In order to prove the photon counting and photon timing capability of the SPAD imager, we performed combined 2-D/3-D acquisitions, working at high frame-rates. 2-D images and videos were acquired by using the chip in photon-counting mode (100 000 fps maximum frame rate), whereas 3-D distance-resolved ranging was achieved by measuring the round trip duration of a laser pulse shining the target scene.

Fig. 14 shows the 2-D image of the face of a doll at 60 cm distance from the imager, acquired through a 55 mm objective, with no post-processing. The field-of-view (FOV) of the camera was about $5 \times 5$ cm$^2$ and the integration time was 5 ms with low ambient illumination. In order to increase the image dynamic range (ratio between the maximum and the minimum counts per frame), the 5 ms integration time ($T_{int}$) was obtained by summing 500 frames ($N_{frame}$) of 10 $\mu$s each. The resulting dynamic range is:

$$DR = 20 \log_{10} \frac{N_{frame} \cdot 2^6}{DCR \cdot T_{int}} = 94.5 \text{ dB}$$

(4)

set also by the very low dark counting rate of 120 cps.
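
Eq. (4) can be evaluated numerically as a quick check. The Python sketch below simply plugs in the values given in the text (500 frames, 6-bit counter, 120 cps DCR, 5 ms integration time); the function name is illustrative.

```python
import math


def dynamic_range_db(n_frames, counter_bits, dcr_cps, t_int_s):
    """Dynamic range of Eq. (4): maximum counts over dark counts, in dB."""
    max_counts = n_frames * 2 ** counter_bits
    dark_counts = dcr_cps * t_int_s
    return 20.0 * math.log10(max_counts / dark_counts)


dr = dynamic_range_db(n_frames=500, counter_bits=6, dcr_cps=120.0, t_int_s=5e-3)
print(f"DR = {dr:.1f} dB")
```

The sketch returns about 94.5 dB. Since the dark counts in the denominator scale with the DCR, a ten times higher DCR would cost 20 dB of dynamic range for the same acquisition settings.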

Fig. 15 shows some frames of a 2-D video, acquired at 50 000 fps. A chopper with five blades was rotating at 39 000 r/min (i.e., 1/5 period corresponds to about 300 $\mu$s) in a scene with a neon lamp illumination, flickering at 100 Hz (5 ms half-period). Frames 1, 8 and 15 show the chopper rotating 1/5 of a full turn, with the neon lamp at the maximum brightness; instead frames 262, 270 and 277 (about 5 ms after the previous ones) show that the neon gas discharge is almost off.

Fig. 15. Eight frames from a 2-D video acquired at 50 000 fps of an optical chopper rotating at 39 000 r/min, illuminated by a neon lamp flickering at 100 Hz. All images have the same color scale (0–60 counts).

Fig. 16. 3-D reconstruction of a human target with office lighting, acquired by the SPAD imager in 5 ms, with a 4 mm depth resolution, with no post-processing.
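
The timing relations in the chopper experiment can be verified with simple arithmetic. The Python sketch below only restates the numbers given in the text (frame rate, rotation speed, number of blades, flicker frequency); variable and function names are illustrative.

```python
FPS = 50_000
CHOPPER_RPM = 39_000
N_BLADES = 5
FLICKER_HZ = 100

frame_period_us = 1e6 / FPS                        # time between frames
blade_period_us = 60e6 / CHOPPER_RPM / N_BLADES    # 1/5 of a full turn
half_flicker_ms = 1e3 / FLICKER_HZ / 2             # lamp bright-to-dark time


def elapsed_us(first_frame, last_frame):
    """Time elapsed between two frame indices of the video."""
    return (last_frame - first_frame) * frame_period_us


print(f"frame period: {frame_period_us:.0f} us")
print(f"1/5 chopper turn: {blade_period_us:.0f} us")
print(f"frames 1 to 15: {elapsed_us(1, 15):.0f} us")
print(f"frames 1 to 262: {elapsed_us(1, 262) / 1e3:.2f} ms "
      f"(lamp half-period: {half_flicker_ms:.0f} ms)")
```

Frames 1 to 15 span 280 μs, close to the roughly 308 μs needed for one fifth of a turn, while frame 262 is about 5.2 ms after frame 1, i.e., about half a flicker period later.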

Active illumination is instead necessary for 3-D acquisitions based on TOF photon timing: we employed a 750 nm pulsed laser (90 mW average emitted optical power, 70 ps FWHM) and a 250 mm objective (f/4.8) placed in front of the SPAD imager, thus obtaining a 60 × 60 cm² FOV at 5 m distance from the camera. Fig. 16 shows an example of 3-D reconstruction, with no post-processing, of a human target obtained with 5 ms dwell time. Each pixel acquired about 3000 photons, so depth-information is measured with a precision given by

$$\sigma_N = \frac{\sigma_1}{\sqrt{N}}.$$  (5)

The overall precision $\sigma_N$ of 0.7 mm (i.e., 4.6 ps rms) is achieved after $N = 3000$ valid events, with the previously reported single-shot precision $\sigma_1$ of 254 ps rms.
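
The conversion from timing precision to depth precision uses Eq. (5) together with the round-trip relation d = c·t/2. The Python sketch below reproduces the quoted figures; it assumes statistically independent events and propagation at the vacuum speed of light, and the function names are illustrative.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light in vacuum


def averaged_precision(sigma_1, n_events):
    """Eq. (5): precision of the mean of n independent measurements."""
    return sigma_1 / math.sqrt(n_events)


def time_to_depth_m(t_s):
    """Depth corresponding to a round-trip time (factor 1/2 for go and return)."""
    return C_M_PER_S * t_s / 2.0


sigma_n_s = averaged_precision(254e-12, 3000)
print(f"sigma_N = {sigma_n_s * 1e12:.1f} ps rms")
print(f"depth precision = {time_to_depth_m(sigma_n_s) * 1e3:.2f} mm")
print(f"one 312 ps LSB = {time_to_depth_m(312e-12) * 1e3:.1f} mm")
```

With 3000 events the 254 ps single-shot precision averages down to about 4.6 ps, i.e., about 0.7 mm, whereas a single 312 ps time bin corresponds to almost 47 mm of depth. This is why accumulating many photons per pixel is essential for millimeter-level ranging.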

V. CONCLUSION

We presented a CMOS SPAD imager, based on 32 × 32 smart-pixels, each containing a SPAD detector and a TDC. The chip can count the number of photons detected into each pixel at 100 000 fps or can measure the time-tagging, i.e., the photon arrival time, of one photon per pixel per frame. The measured time delay can be either the photon TOF in 3-D-ranging/LIDAR applications, or the fluorescence photon emission in FLIM/FRET imaging, or the time-resolved photon waveform of very fast optical signals. The 1024 fully independent pixels operate in parallel, with no multiplexing during either detection or TDC conversion, in a global shutter mode. In photon-counting mode, at every frame the camera provides 1024 words of 6 bits each, with 100 000 fps maximum frame rate. In photon timing mode, at every frame the camera provides 1024 words of 10 bits each, i.e., 312 ps time bin and 320 ns full-scale range.

The imager opens the way to ultra-sensitive (single-photon sensitivity) high-speed (a hundred kiloframes/s) acquisitions of fast optical phenomena and dynamic sequences of events. Ongoing applications are in time-resolved spectroscopy, fluorescence lifetime imaging, diffuse optical tomography, molecular imaging in life sciences, TOF 3-D ranging, and atmospheric layer sensing through LIDAR. These applications in particular benefit from the large total active area of the array (0.7 mm²) combined with the highest PDE among SPAD arrays for photon-timing, which allows reducing the optical power of the active illumination employed to excite the samples and targets under investigation. Other applications are gesture recognition for human-machine interfaces, in which sub-centimeter resolution and hundreds of frames per second are required [34], gaming and mixed reality requiring 3-D depth-resolved dynamic acquisitions at medium distances (up to ten meters) with centimeter resolution [35], and also touchless interaction with mobile devices even in light-starved environments.

ACKNOWLEDGMENT

The authors would like to thank S. Masci for valuable support in preparation and wire bonding of the chips.

REFERENCES


