SpeakUp Click is a compact add-on board providing an offline speech recognition solution. This board features the VS1053, an Ogg Vorbis/MP3/AAC/WMA/FLAC/MIDI audio codec from VLSI Solution. It provides an audio encoder and decoder for those formats on a single chip alongside a low-power DSP processor, data memory, 16KB instruction RAM, and 0.5KB RAM for user application. This board also contains the STM32F415RG, an ARM Cortex-M4 32-bit MCU from STMicroelectronics. The STM32 stores many voice commands, compares them with those received from the VS1053, and then either sends the data to the host MCU or executes the command itself, thus enabling this board to act as a stand-alone solution. This Click board™ makes the perfect solution for voice-controlled applications, home automation, or any human-machine interface.
SpeakUp Click is supported by a mikroSDK compliant library, which includes functions that simplify software development. This Click board™ comes as a fully tested product, ready to be used on a system equipped with the mikroBUS™ socket.
SpeakUp Click is based on the VS1053, an Ogg Vorbis/MP3/AAC/WMA/FLAC/MIDI audio codec from VLSI Solution. This audio codec takes audio samples through an onboard microphone or external one connected to the 3.5mm audio jack. Received input bitstream is decoded and passed through a digital volume control to an 18-bit oversampling, multi-bit, sigma-delta DAC. The decoding is controlled via a serial control bus. In addition, it is possible to add application-specific features like DSP effects to the user RAM. On this Click board™, there is a Line Out header (L, G, R) for connecting the headphones for hearing recorded samples used for comparison.
As a brain, the SpeakUp Click features the STM32F415RG, an ARM Cortex-M4 32-bit MCU from STMicroelectronics. This MCU communicates with the VS1053 via its SPI serial interface and is responsible for processing received data from the audio codec and comparing stored voice data to the received one. It can hold more than 200 voice commands, depending on their length. Moreover, the STM32 in Standalone Mode can execute tasks according to the voice command through the HD1 and HD2 headers which contain GPIOs and 3.3V power rails. The GPIOs can have ON, OFF, Toggle, Pulse, and None states, with additional parameters for Pulse. In Click™ Mode, each recorded voice command is given an index number which can be sent to the host MCU over a USB or UART interface. In addition, the STM32 firmware can also be upgraded via the PROG JTAG header.
The SpeakUp Click, via STM32, establishes the connection with the host MCU via one of the selected mikroBUS™ interfaces (UART, SPI, or I2C). The UART interface is set as default, while SPI/I2C can serve as an additional communication method if users want to create their own libraries. The user is also provided with other functions for the STM32, such as reset, interrupt, and PWM, from the mikroBUS™ socket. The onboard push buttons make it possible to perform some basic configuration without using the software. Push button SW1 is used for recording voice commands, which shouldn't last more than 1 second, while pressing push button SW2 for more than 2 seconds erases all recorded voice commands. Pressing both push buttons together resets the Click board™. There are two LEDs, amber and red, to provide the board's visual status.
We also provide a free SpeakUp application based on the Dynamic Time Warping (DTW) algorithm that lets you configure this Click board™ through the software. The SpeakUp Click board™ features a mini USB connector for connecting the board to the PC, which recognizes it as an HID device. After a successful connection, the SpeakUp Click board™ will perform ambient noise detection and calibrate itself, a process that lasts about 10 seconds. The PC application allows you to add voice commands to the SpeakUp Click with a preset time limit per command. Each recorded command is automatically played back so you can make sure it is OK, and it can be named to avoid future confusion. Application settings include the acceptance threshold, recording timeout, word length, and more.
NOTE: You can find the complete instructions for using and configuring the SpeakUp Click in our User Manual.
This Click board™ can be operated only with a 3.3V logic voltage level. Appropriate logic voltage level conversion must be performed before using the board with MCUs that operate at different logic levels. The Click board™ also comes equipped with a library containing functions and example code that can be used as a reference for further development.
This table shows how the pinout on SpeakUp Click corresponds to the pinout on the mikroBUS™ socket (the latter shown in the two middle columns).
|Notes||Pin||mikroBUS™ pin no.||mikroBUS™ pin||mikroBUS™ pin||mikroBUS™ pin no.||Pin||Notes|
|SPI Chip Select||CS||3||CS||RX||14||TX||UART TX|
|SPI Clock||SCK||4||SCK||TX||13||RX||UART RX|
|SPI Data OUT||SDO||5||MISO||SCL||12||SCL||I2C Clock|
|SPI Data IN||SDI||6||MOSI||SDA||11||SDA||I2C Data|
|Label||Name||Default||Description|
|LD1||LD1||-||Ready for Recording and Listening Status LED Indicator|
|LD2||LD2||-||Operational Status LED Indicator|
|LD3||PWR||-||Power LED Indicator|
|SW1||SW1||Populated||Button for Recording the Voice Command|
|SW2||SW2||Populated||Button for Deleting Voice Command|
|HD1-HD2||-||Unpopulated||User-Programmable GPIOs Headers|
|HD3||LINE OUT||Unpopulated||Stereo Headphone Header|
|CN4||PROG||Unpopulated||JTAG Programming Header|
|Applications||Can be used for voice-controlled applications, home automation, or any human-machine interface|
|On-board modules||VS1053 – audio codec from VLSI Solution STM32F415RG – ARM Cortex-M4 32-bit MCU from STMicroelectronics|
|Key Features||Recognize over 200 different voice commands, Standalone capabilities with user programmable GPIOs, onboard MCU, sound received through internal or external mic, comes with a dedicated software tool for easy configuration, selectable serial interface, ultra fast operation, and more|
|Click board size||L (57.15 x 25.4 mm)|
SpeakUp click and SpeakUp 2 click are speaker-dependent speech recognition click boards with standalone capabilities. They work by matching sounds with pre-recorded commands. The full-featured version (SpeakUp) is powered by an STM32F415RG MCU, has an additional MP3 codec chip, and a connector for an external microphone. The simplified version (SpeakUp 2) is powered by an FT900 32-bit MCU. Both boards are programmed using a dedicated software tool for easy configuration.
This guide uses the original SpeakUp as a reference, but all instructions apply to SpeakUp 2 as well.
Wouldn't you rather issue verbal commands and have your machines comply, instead of pressing keys, pushing buttons and flipping switches all the time? There's a wide range of applications for the SpeakUp.
What gives the SpeakUp its speech recognition capabilities is the firmware we developed for the on-board MCU. It’s based on the DTW algorithm, which makes it decisive: it turns your talk into action almost instantly.
The main goal of a speech recognition system is to substitute for a human listener, although it is very difficult for an artificial system to achieve the flexibility offered by the human ear and brain. The working principle of speech recognition systems is roughly based on comparing input data to prerecorded patterns. These patterns can be arranged as phonemes or words. Through this comparison, the pattern to which the input data is most similar is accepted as the symbolic representation of the data. It is very difficult to compare raw speech signals directly. Because the intensity of speech signals can vary significantly, the signals must be preprocessed first. This preprocessing is called Feature Extraction.
First, short-time feature vectors are obtained from the input speech data, and then these vectors are compared to the previously classified patterns. The feature vectors extracted from the speech signal must represent the speech data well, be of a size that can be processed efficiently, and have distinct characteristics.
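As a rough illustration of short-time feature extraction, the sketch below splits a sample buffer into fixed-length, non-overlapping frames and reduces each frame to a single value, its mean absolute amplitude. The feature choice, the frame layout, and the name `frame_features` are assumptions made for illustration; the source does not say which features the SpeakUp firmware actually extracts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes one feature per complete frame into out[]
   and returns the number of frames produced. A trailing partial frame
   is ignored. Intended for short frames, so the 32-bit sum cannot overflow. */
size_t frame_features(const int16_t *samples, size_t n_samples,
                      size_t frame_len, uint32_t *out, size_t out_cap)
{
    size_t n_frames = 0;

    if (frame_len == 0) {
        return 0;
    }
    for (size_t start = 0;
         start + frame_len <= n_samples && n_frames < out_cap;
         start += frame_len) {
        uint32_t sum = 0;
        for (size_t i = 0; i < frame_len; i++) {
            int32_t s = samples[start + i];
            sum += (uint32_t)(s < 0 ? -s : s);   /* absolute amplitude */
        }
        out[n_frames++] = sum / (uint32_t)frame_len;
    }
    return n_frames;
}
```

A real front end would typically use richer features per frame, but the shape is the same: a long signal becomes a short sequence of compact vectors that can be compared efficiently.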
The SpeakUp firmware uses the Dynamic Time Warping (DTW) algorithm, a word-based, isolated-word, speaker-dependent, template-matching algorithm.
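To make the idea of template matching with DTW concrete, here is a minimal sketch of the textbook DTW distance between two one-dimensional feature sequences. It is not the SpeakUp firmware's implementation: the scalar features, the absolute-difference cost, the `DTW_MAX_LEN` bound, and the function name `dtw_distance` are all assumptions for illustration.

```c
#include <stddef.h>

#define DTW_MAX_LEN 64   /* assumed upper bound on frames per sequence */

static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

static double min3(double a, double b, double c)
{
    double m = a < b ? a : b;
    return m < c ? m : c;
}

/* Textbook DTW: cost[i][j] is the cheapest alignment of the first i
   elements of a with the first j elements of b.
   Returns -1.0 if a sequence is empty or longer than DTW_MAX_LEN. */
double dtw_distance(const double *a, size_t n, const double *b, size_t m)
{
    static double cost[DTW_MAX_LEN + 1][DTW_MAX_LEN + 1];
    const double big = 1e300;   /* stands in for "unreachable" */

    if (n == 0 || m == 0 || n > DTW_MAX_LEN || m > DTW_MAX_LEN) {
        return -1.0;
    }
    for (size_t i = 0; i <= n; i++) {
        for (size_t j = 0; j <= m; j++) {
            cost[i][j] = big;
        }
    }
    cost[0][0] = 0.0;

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            double d = abs_diff(a[i - 1], b[j - 1]);
            cost[i][j] = d + min3(cost[i - 1][j],       /* a advances alone */
                                  cost[i][j - 1],       /* b advances alone */
                                  cost[i - 1][j - 1]);  /* both advance */
        }
    }
    return cost[n][m];
}
```

Because the alignment may stretch or compress either sequence, the same word spoken slightly faster or slower still yields a small distance, which is why DTW suits isolated-word, speaker-dependent matching.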
The SpeakUp software configuration tool is a free PC application for configuring the SpeakUp click board. With it, you can configure the board to recognize over 200 different voice commands and have the on-board MCU carry them out instantly. It is available for download. The software is designed with ease of use and simplicity in mind. The UI is based on tabs and drop-down menus and requires no programming skills to use. Still, it has all the essential features and options that give you full control of the set-up process.
This flowchart shows the typical workflow of programming SpeakUp. The process is explained in detail in the remainder of this article.
Connect the SpeakUp click board to the computer via the USB cable. It will be recognized as a USB Human Interface Device (HID) in the Device Manager of the Control Panel.
Once you connect the SpeakUp to your computer you’re just a few clicks away from configuring it. The set-up process is dead simple. Launch the application, and it will lead you through the initial steps of recording and assigning commands.
After the successful connection, the SpeakUp click™ board will perform ambient noise detection and calibrate itself. The process lasts about 10 seconds and is done when the red signal LED turns off. After that, the board is ready for recording voice commands. You can set custom calibration parameters for any subsequent usage in the Project Settings.
To create a new project, press the Create New Project button from the main toolbar of the SpeakUp software.
A new window will open, where you can enter your project’s name and destination folder (if the destination folder doesn’t exist, the software will prompt you to create it). To finish project creation after inputting the required information, press the Create button.
You can also choose to open the settings menu as soon as the project is created by checking the appropriate box.
To configure project settings, press the Open Settings Window button and the Settings window will open. In the General Settings you can configure the SpeakUp’s functionality.
Acceptance threshold: This is the parameter you should adjust to define how closely your delivery has to match your pre-recorded command. At lower values, you’ll have to deliver the command precisely the way you recorded it. At higher values the matching doesn’t have to be so precise, but this increases the probability that the SpeakUp will pick up irrelevant speech and interpret it as a command. You should be able to reach the sweet spot value through some trial & error.
Recording timeout: The timeframe in which the SpeakUp click board expects recording input after the record button is pressed. You can choose between 5, 10, and 15 seconds.
Word Length: Length of the voice command being recorded, in seconds. Can be 1, 1.5, 2, 2.5, or 3 seconds.
Noise level: The minimum sound volume that can trigger voice command recognition. Lower values let quieter speech trigger recognition, which makes the board more sensitive to noise and hiss. Higher values require louder pronunciation and make the board less sensitive to noise and hiss.
We recommend that you keep auto detection enabled. That way the SpeakUp Click board will measure the noise level, and perform noise calibration automatically. Auto detection can last a bit longer, usually around 10 seconds. Sudden changes in sound levels will lengthen the time of calibration and will result in improper sound level values.
Notify master: Notifies the master (MCU or PC) when a voice command is recognized by sending the 16-bit index number of that command via the chosen communication interface (UART or USB).
Data rate: Sets the speed used for sending data to the master (MCU or PC).
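One way to picture the acceptance threshold described above is as an upper limit on the matching distance between the spoken input and each stored command. The sketch below works under that assumption: given one distance per stored command, it returns the index of the closest command, or -1 when nothing falls within the threshold. How the firmware actually scales and applies the threshold is not stated in the source, and `best_match` is a hypothetical name.

```c
#include <stddef.h>

/* Returns the index of the stored command with the smallest distance
   that does not exceed the threshold, or -1 if no command qualifies. */
int best_match(const double *distances, size_t n_templates, double threshold)
{
    int best = -1;

    for (size_t i = 0; i < n_templates; i++) {
        if (distances[i] > threshold) {
            continue;   /* too far from this command to count as a match */
        }
        if (best < 0 || distances[i] < distances[best]) {
            best = (int)i;
        }
    }
    return best;
}
```

With distances of 5, 2, and 9, a threshold of 1 rejects everything while a threshold of 3 accepts the second command. Raising the threshold further never changes the winner, but it lets unrelated speech with larger distances through, which is the trade-off the setting controls.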
In this section, you can rename GPIO pins according to your needs and set their starting conditions. The new GPIO pin aliases will be applied in the main window too. Set the corresponding initial GPIO pin state in the Initial Pin States section. Condition can be either low (logical 0) or high (logical 1).
When a new command is recorded, it is time to assign it an action. The action will be performed when the voice command is recognized. Also, a 16-bit index number of the voice command will be sent via chosen communication interface (UART or USB).
There are five types of action that can be assigned:
NONE: When this option is selected, no action will be performed on the corresponding GPIO pin upon voice command matching.
ON: When this option is selected, a corresponding GPIO pin will be set to logical high state upon voice command matching.
OFF: When this option is selected, a corresponding GPIO pin will be set to logical low state upon voice command matching.
TOGGLE: When this option is selected, a corresponding GPIO pin state will be toggled upon voice command matching.
PULSE: When this option is selected, a train of pulses will be sent to the corresponding GPIO pin upon voice command matching.
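The effect of the non-pulse actions on a pin can be summarized in a few lines. The sketch below models a pin as a boolean and returns its new state for a given action; the type and function names are hypothetical, and PULSE is left out because it is a timed sequence rather than a single state change.

```c
#include <stdbool.h>

/* Hypothetical names for the actions listed above (PULSE omitted). */
typedef enum {
    ACTION_NONE,
    ACTION_ON,
    ACTION_OFF,
    ACTION_TOGGLE
} gpio_action_t;

/* Returns the pin state after the action is applied to current_state. */
bool apply_action(gpio_action_t action, bool current_state)
{
    switch (action) {
    case ACTION_ON:     return true;            /* logical high */
    case ACTION_OFF:    return false;           /* logical low */
    case ACTION_TOGGLE: return !current_state;  /* invert */
    case ACTION_NONE:
    default:            return current_state;   /* leave untouched */
    }
}
```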
The pulse parameters can be set in the Pulse Parameters window (click the Edit pulse parameters icon to open it):
Period (T) is the time it takes for a signal to complete a single cycle (the sum of the high-state and low-state times).
Duty ratio (D) is the percentage of T in which a signal is active, i.e. the ratio of the high-state time to the complete period. Thus, a 60% duty cycle means the signal is ON for 60% of the period and OFF for the remaining 40%.
N is the number of times the pulse is repeated.
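The relationship between T, D, and N is simple arithmetic, worked through in the sketch below. It assumes the period is given in milliseconds and the duty ratio in whole percent; the source does not state the units used by the software, and the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t high_ms;    /* time spent high in each period */
    uint32_t low_ms;     /* time spent low in each period */
    uint64_t total_ms;   /* duration of the whole pulse train */
} pulse_timing_t;

/* Derives per-period high/low times and the total duration from
   period T (ms, assumed unit), duty ratio D (percent), and count N. */
pulse_timing_t pulse_timing(uint32_t period_ms, uint32_t duty_percent,
                            uint32_t count)
{
    pulse_timing_t t;

    if (duty_percent > 100) {
        duty_percent = 100;   /* clamp out-of-range input */
    }
    t.high_ms  = (uint32_t)(((uint64_t)period_ms * duty_percent) / 100);
    t.low_ms   = period_ms - t.high_ms;
    t.total_ms = (uint64_t)period_ms * count;
    return t;
}
```

For example, T = 500 ms, D = 60%, and N = 3 give 300 ms high and 200 ms low per period, for a total of 1500 ms.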
When you’re finished recording and configuring voice commands, it is time to upload the project to the SpeakUp click™ board. This is done via the Upload Project button. You can monitor the upload process in the Toolbar. After it’s done, an appropriate message will be displayed in the Status Bar.
Each recorded voice command is given an index number which is sent to the host MCU. You can export voice command names and their indexes as constants. The exported document will be in the form of a source file (in any of the three languages), as shown below.
/*
This file is generated by SpeakUp Software.
It containts voice commands constants.
Creation date: 4/3/2014 Creation time: 11:20:09 AM
Name: Turn ON Program A Index: 0 Length: 0.0 s
Description: Turns on Program A
*/
const VCMD_TURN_ON_PROGRAM_A = 0;
/*
Name: Turn ON program B Index: 1 Length: 0.0 s
Description: Turns on Program A
*/
const VCMD_TURN_ON_PROGRAM_B = 1;
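On the host side, the exported constants can drive a simple dispatcher. The sketch below is an assumption-heavy illustration: it assumes the 16-bit index arrives as two raw bytes, takes the byte order as a parameter because the source does not state it, and uses hypothetical function names. The two constants are repeated as an enum only so the example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as the exported constants; repeated so the sketch
   compiles without the generated file. */
enum {
    VCMD_TURN_ON_PROGRAM_A = 0,
    VCMD_TURN_ON_PROGRAM_B = 1
};

/* Assumption: the index is received as two bytes. The byte order is a
   parameter because it is not documented in the source. */
uint16_t decode_vcmd_index(const uint8_t bytes[2], bool low_byte_first)
{
    uint8_t lo = low_byte_first ? bytes[0] : bytes[1];
    uint8_t hi = low_byte_first ? bytes[1] : bytes[0];
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Hypothetical dispatcher: returns 'A' or 'B' for the known commands
   and '?' for any other index. Real code would start the program here. */
char dispatch_vcmd(uint16_t index)
{
    switch (index) {
    case VCMD_TURN_ON_PROGRAM_A: return 'A';
    case VCMD_TURN_ON_PROGRAM_B: return 'B';
    default:                     return '?';
    }
}
```

Using the named constants rather than bare numbers keeps the host code readable and makes it easy to regenerate the file when commands are re-recorded and their indexes change.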
You can perform some basic configuration directly on the SpeakUp without using the software. Different combinations of button presses will allow you to record, re-record or erase commands. You’ll get feedback from the on-board LEDs. However, you won’t be able to assign specific actions with this method.
Push-button 1 - To record your voice command, press and hold the button while speaking. You must stay within the time limit for each command (default settings: 1 second). You can also record multiple commands at once by pronouncing them one by one while keeping the button pressed. Just make sure to wait for the red LED to flash between pronouncing subsequent commands. Proceed in this way for as many commands as you need. Each command will be assigned a unique index.
Push-button 2 - If you press it for more than 2 seconds, all recorded voice commands will be erased. If both push-buttons are pressed for more than 2 seconds, the SpeakUp click board will reset.
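The button behavior described above can be expressed as a small decision function. This is only a sketch of the rules as stated in the text, not the firmware's logic; the enum, the function name, and the idea of passing in the hold time in milliseconds are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    BTN_NONE,
    BTN_RECORD,      /* push-button 1 held: record while speaking */
    BTN_ERASE_ALL,   /* push-button 2 held for more than 2 s */
    BTN_RESET        /* both buttons held for more than 2 s */
} button_event_t;

#define LONG_PRESS_MS 2000u

/* Maps the current button states and hold time to an event. */
button_event_t classify_buttons(bool sw1, bool sw2, uint32_t held_ms)
{
    if (sw1 && sw2) {
        return held_ms > LONG_PRESS_MS ? BTN_RESET : BTN_NONE;
    }
    if (sw2) {
        return held_ms > LONG_PRESS_MS ? BTN_ERASE_ALL : BTN_NONE;
    }
    if (sw1) {
        return BTN_RECORD;
    }
    return BTN_NONE;
}
```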
Two indicator LEDs provide the following signals:
Amber LED - the board is ready for recording or listening.
Red LED - the board is performing an operation.
When the voice command is recognized, both LEDs are lit for half a second.
For better recording results, keep ambient noise as low as possible and keep the speaker 10 to 20 cm from the microphone. If a voice command is not detected reliably, record it several times to account for variations in pronunciation. Always play back the recorded voice command to check whether ambient noise was recorded as well. For the same reason, it is recommended that the SpeakUp click™ board be placed on a surface that doesn’t transfer mechanical vibrations. This is a speaker-dependent system: if there are several users, each person should record the voice commands separately, because pronunciation differs from person to person. The number of voice commands that can be recorded depends on their length; it is typically more than 200 (or 100 for SpeakUp 2) for a voice command length of 1 second. Please keep in mind that the recording is performed by the SpeakUp click board™, not the computer, so there is no need to connect an external microphone to the computer.