op {
  graph_op_name: "AudioSpectrogram"
  in_arg {
    name: "input"
    description: <<END
Float representation of audio data.
END
  }
  out_arg {
    name: "spectrogram"
    description: <<END
3D representation of the audio frequencies as an image.
END
  }
  attr {
    name: "window_size"
    description: <<END
How wide the input window is in samples. For the highest efficiency
this should be a power of two, but other values are accepted.
END
  }
  attr {
    name: "stride"
    description: <<END
How far apart the centers of adjacent sample windows should be, in samples.
END
  }
  attr {
    name: "magnitude_squared"
    description: <<END
Whether to return the squared magnitude or just the
magnitude. Using squared magnitude can avoid extra calculations.
END
  }
  summary: "Produces a visualization of audio data over time."
  description: <<END
Spectrograms are a standard way of representing audio information as a series of
slices of frequency information, one slice for each window of time. By joining
these together into a sequence, they form a distinctive fingerprint of the sound
over time.

This op expects to receive audio data as an input, stored as floats in the range
-1 to 1, together with a window width in samples, and a stride specifying how
far to move the window between slices. From this it generates a three-dimensional
output. The lowest dimension has an amplitude value for each frequency during
that time slice. The next dimension is time, with successive frequency slices.
The final dimension is for the channels in the input, so a stereo input would
have two entries here, for example.

This means the layout, when converted and saved as an image, is rotated 90
degrees clockwise from a typical spectrogram: time increases down the Y axis,
and frequency increases from left to right.
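
As a rough illustration of this layout, the following pure-Python sketch builds a
nested list shaped `[channels][time][frequency]`. It is a reference model, not the
kernel's implementation, and several details are assumptions not stated in this
description: the input is taken as a list of samples, each holding one float per
channel; each window is multiplied by a periodic Hann window and zero-padded to
the next power of two (`fft_length`); each slice holds `fft_length // 2 + 1`
frequency bins; and the number of time slices is
`1 + (samples - window_size) // stride`. The names `spectrogram`, `periodic_hann`,
`next_power_of_two`, and `dft` are hypothetical.

```python
import cmath
import math


def periodic_hann(n):
    # Periodic Hann window of length n (assumed windowing, not confirmed here).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def next_power_of_two(n):
    # Smallest power of two >= n, used as the assumed FFT length.
    p = 1
    while p < n:
        p *= 2
    return p


def dft(frame):
    # Naive O(N^2) DFT; returns only bins 0..N/2 because the input is real.
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        bins.append(acc)
    return bins


def spectrogram(audio, window_size, stride, magnitude_squared=False):
    """audio: list of samples, each a list of per-channel floats in [-1, 1].

    Returns a nested list shaped [channels][time][frequency].
    """
    num_samples = len(audio)
    num_channels = len(audio[0]) if audio else 0
    fft_length = next_power_of_two(window_size)
    window = periodic_hann(window_size)
    # No slices at all if the input is shorter than one window.
    if num_samples < window_size:
        num_slices = 0
    else:
        num_slices = 1 + (num_samples - window_size) // stride
    result = []
    for c in range(num_channels):
        channel = [sample[c] for sample in audio]
        slices = []
        for s in range(num_slices):
            start = s * stride
            frame = [channel[start + i] * window[i] for i in range(window_size)]
            frame += [0.0] * (fft_length - window_size)  # zero-pad to FFT length
            row = []
            for z in dft(frame):
                power = z.real ** 2 + z.imag ** 2  # squared magnitude
                row.append(power if magnitude_squared else math.sqrt(power))
            slices.append(row)
        result.append(slices)
    return result
```

For example, 1000 stereo samples with `window_size=256` and `stride=128` give
`1 + 744 // 128 = 6` time slices of `129` bins each, so the result is shaped
`[2][6][129]`. The naive DFT keeps the sketch dependency-free; a real
implementation would use an FFT.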

Each value in the result represents the square root of the sum of the squares of
the real and imaginary parts of an FFT on the current window of samples, that is,
the magnitude of each frequency bin (or the squared magnitude when
`magnitude_squared` is true). In this way, the lowest dimension represents the
strength of each frequency in the current window, and adjacent windows are
concatenated in the next dimension.

To get a more intuitive and visual look at what this operation does, you can run
tensorflow/examples/wav_to_spectrogram to read in an audio file and save out the
resulting spectrogram as a PNG image.
END
}