op {
  graph_op_name: "AudioSpectrogram"
  in_arg {
    name: "input"
    description: <<END
Float representation of audio data.
END
  }
  out_arg {
    name: "spectrogram"
    description: <<END
3D representation of the audio frequencies as an image.
END
  }
  attr {
    name: "window_size"
    description: <<END
How wide the input window is in samples. For the highest efficiency
this should be a power of two, but other values are accepted.
END
  }
  attr {
    name: "stride"
    description: <<END
How far apart the centers of adjacent sample windows should be, in samples.
END
  }
  attr {
    name: "magnitude_squared"
    description: <<END
Whether to return the squared magnitude or just the magnitude. Using squared
magnitude can avoid extra calculations. When this is true, each output value is
the sum of the squares of the real and imaginary parts of the FFT for that
frequency bin; otherwise it is the square root of that sum.
END
  }
  summary: "Produces a visualization of audio data over time."
  description: <<END
Spectrograms are a standard way of representing audio information as a series of
slices of frequency information, one slice for each window of time. By joining
these together into a sequence, they form a distinctive fingerprint of the sound
over time.

This op expects to receive audio data as an input, stored as floats in the range
-1 to 1, together with a window width in samples, and a stride specifying how
far to move the window between slices. From this it generates a three-dimensional
output. The lowest dimension has an amplitude value for each frequency during
that time slice. The next dimension is time, with successive frequency slices.
The final dimension is for the channels in the input, so a stereo audio input
would have two channels here, for example.
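
For example, the op can be called along these lines (a sketch only, assuming the
op is exposed through `tf.raw_ops.AudioSpectrogram`; the variable names are
purely illustrative):

```python
import tensorflow as tf

# One second of 16 kHz stereo audio, shaped [samples, channels],
# with float values in the range [-1, 1].
audio = tf.random.uniform([16000, 2], minval=-1.0, maxval=1.0)

spectrogram = tf.raw_ops.AudioSpectrogram(
    input=audio,
    window_size=1024,   # samples per analysis window
    stride=512,         # samples to advance between windows
    magnitude_squared=False)

# Result shape: [channels, time_slices, frequency_bins].
print(spectrogram.shape)
```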

This means the layout when converted and saved as an image is rotated 90 degrees
clockwise from a typical spectrogram. Time is descending down the Y axis, and
the frequency decreases from left to right.

Each value in the result represents the square root of the sum of the squares of
the real and imaginary parts of an FFT on the current window of samples. In this
way, the lowest dimension represents the power of each frequency in the current
window, and adjacent windows are concatenated in the next dimension.
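
As a rough sketch of that idea only (ignoring the windowing and FFT-length
details of the actual kernel), one time slice could be computed in NumPy as:

```python
import numpy as np

def spectrogram_slice(window_samples, magnitude_squared=False):
  # window_samples: 1-D float array of `window_size` values in [-1, 1].
  fft = np.fft.rfft(window_samples)     # complex spectrum of this window
  power = fft.real**2 + fft.imag**2     # squared magnitude per frequency bin
  return power if magnitude_squared else np.sqrt(power)
```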

To get a more intuitive and visual look at what this operation does, you can run
tensorflow/examples/wav_to_spectrogram to read in an audio file and save out the
resulting spectrogram as a PNG image.
END
}