How to Build a Speech Authentication System with Django and Next JS - Part 1


Learn how to build a speech authentication system with Django in this comprehensive tutorial. With the help of a Python package that converts speech to text, users can enter their details by speaking. The system includes a registration and login page and once logged in, users will be greeted with a welcome voice. Follow the step-by-step guide to set up the backend API using Django and the frontend using Next.js and Chakra UI.


The article also covers the prerequisites, project setup, installations, and building of the API. The API includes endpoints for Registration, Login, JWT Token Refresh, User Retrieve, and Speech to Text.


Hey there! To make the most of this article, it would be helpful to have a solid understanding of Django, Django Rest Framework, and Next.js. Don't worry though, we'll do our best to explain things clearly and make it easy to follow along.

Getting Started

First of all, we will work on the backend. Django will be used for the backend and Next.js + Chakra UI for the frontend. Django is used because we will be using a Python package that helps us convert speech to text.

Application Design

The overall application design will be very simple. The backend will be an API which the frontend will consume.

  1. A user's voice will be recorded on the frontend.

  2. The frontend sends the recorded wav file to the backend in Base64 format.

  3. The backend will process the data and convert it to text.

  4. The text is returned to the frontend as a response.

  5. The text is then entered into the UI input box.

That's it, very simple.
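Steps 2 and 3 can be sketched in Python. This is a minimal sketch, assuming the recording travels in a JSON body under a `record` key (the key the view later in this article reads). `make_silent_wav`, `build_payload`, and `read_payload` are hypothetical helpers, and the silent WAV only stands in for a real browser recording.

```python
import base64
import io
import json
import wave


def make_silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Hypothetical stand-in for the browser recording: a short silent mono WAV."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


def build_payload(wav_bytes: bytes) -> str:
    """What the frontend sends: JSON with the audio as base64 text."""
    return json.dumps({"record": base64.b64encode(wav_bytes).decode("ascii")})


def read_payload(body: str) -> bytes:
    """What the backend does first: recover the original WAV bytes."""
    return base64.b64decode(json.loads(body)["record"])


wav_bytes = make_silent_wav()
body = build_payload(wav_bytes)
recovered = read_payload(body)
```

The decoded bytes are identical to what was recorded, so the backend can treat them exactly like the contents of a WAV file.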

Project Setup

This project is already available on GitHub. Clone it and follow the installation guide in the Readme to set up the project.


To build this project, we will need to have the following installed:

  • Python 3.9 - This is used for the backend, and Django is used as the framework.

  • Node.js and npm - These are used for the frontend.

Building the API

Let's clone the project.

git clone
cd speech_to_text_auth

For now, we will be working on the API. Move into the api directory.

Set up your virtual environment

python -m venv venv
# Or python3 if you have that installed
python3 -m venv venv

Check out this Python documentation to learn about virtual environments.

Activate the environment.

# windows: cmd.exe
venv\Scripts\activate.bat

# bash or zsh
source venv/bin/activate

If the above commands don't work, check out this Python documentation for help activating virtual environments. It includes a list of commands for common shells.

Installing requirements

# This will install the requirements in the requirements.txt file
pip install -r requirements.txt

Move into the src directory. There we have the account application, config, tests, and utils directories.

Now we need to create our .env file. This is where our secret configurations are kept. Decouple is used to read this file from our config files.

Create a new file, .env and copy the contents of .env.example into it.

Now we need to create a PostgreSQL database for our project. By default, the project uses PostgreSQL, but this can be changed in the settings file. We can also create a separate database if we want to run tests. It's customary to keep development and testing databases separate because tests use autogenerated data, which we don't want in our development database.
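As a rough illustration, a Django `DATABASES` setting with a dedicated test database might look like the sketch below. The environment variable names and default values are hypothetical, and the sketch reads from `os.environ` only to stay dependency-free (the project itself uses Decouple), so check the repository's settings and `.env.example` for the real keys.

```python
import os

# Hypothetical variable names and defaults, for illustration only.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "speech_auth_dev"),
        "USER": os.environ.get("DB_USER", "postgres"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
        # Django's test runner uses this database name instead of NAME,
        # so autogenerated test data never touches the development database.
        "TEST": {"NAME": os.environ.get("TEST_DB_NAME", "speech_auth_test")},
    }
}
```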

The config folder is where our settings are located. This directory contains a settings directory which contains two Python files. Arranging our settings like this helps us write separate settings for different environments.

We could

  • develop using one settings file,

  • test using another,

  • maybe create a file for our staging environment on a live server,

  • and another for our production environment.

The tests folder contains our tests.

Accounts Application

This is the only application in the project and the most important, because we will store users' data here. The API views, models, and serializers are designed here.

This is a basic Django custom user model implementation. By inheriting the Django AbstractBaseUser model, we can build our User model and make email required.

This is the DBML structure used for the diagram above.

// Use DBML to define your database structure
// Docs:
// DB Diagram tool:

Table users {
  pk integer [primary key]
  email varchar
  active boolean
  staff boolean
  admin boolean
  created timestamp
}

Table profile {
  pk integer [primary key]
  user_id integer
  fullname varchar
  sex varchar
  phone varchar
  country varchar
}

Ref: profile.user_id - users.pk // one-to-one

The User model has a one-to-one relationship with the Profile model. This is a very common way to design a User table. The reason is that when a change in business logic slightly affects how a user is created, we can update the Profile model instead and avoid migration issues with the User model.

When a user is created, we automatically create its profile by listening to the post_save signal. Learn more about Django Signals.

We need a custom user registration form because we need a way to make the Django admin create a User correctly. This form allows the admin to enter the user's profile data when creating the user. The default Django admin form will have only fields for the User model.

By overriding the model form's save method, we get the validated profile data and populate the already created user profile. As soon as the user is saved, the profile is created by the post_save signal from earlier.

Learn more about creating Django Model Form.

One important thing here is that we create a UserAdmin class to override the default admin that Django would otherwise create. Read more about Customizing Django Model Admin.

Django BaseUserAdmin is a special ModelAdmin that allows us to override the form used for creating a new User instance in the admin. So we pass the form we created earlier here.


An important thing to note here is that after creating a Custom Django User Model we need to tell Django we want to use it instead of the default.

The line below tells Django to use our new Custom User Model instead of its default User Model. This is a very common implementation in Django. Here is a Django Rest Framework GitHub template that we can use as a project boilerplate.

AUTH_USER_MODEL = 'account.User'

Decouple is used to read the environment keys stored in .env. This documentation shares how to effectively use Decouple with Django.

Views and Serializers

Now we will cover the Django REST Framework views and serializers. We need the following endpoints:

  1. Registration [POST],

  2. Login [POST],

  3. JWT Token Refresh [POST],

  4. User Retrieve [GET], and

  5. Speech to Text [POST].

Registration [POST]: This endpoint is used to create a new user. The data received through all the endpoints will be in JSON format. The endpoint uses the imported RegisterSerializer to convert POST data into native Python datatypes and validate them. An important thing to note here is that we are using JSON Web Tokens (JWT) to authenticate our users. As soon as a new account is created, new access and refresh tokens are included in the response. Read more about Simple JWT, a JSON Web Token authentication plugin for the Django REST Framework.

Login [POST]: Validates the user data and returns a response that contains the tokens and logged-in user data.

JWT Token Refresh [POST]: This is used to get a new access token using a valid refresh token. JWT is a stateless authentication method, meaning no session is stored for the user. The access token expires 5 minutes after it is generated, so the user must then obtain a new access token using the refresh token, which is valid for 24 hours. The lifetimes of both the access and refresh tokens can be edited in the project settings. Learn more about this at Simple JWT.
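For reference, Simple JWT reads its lifetimes from a `SIMPLE_JWT` dictionary in the Django settings. The fragment below is a minimal sketch using the 5 minute and 24 hour values mentioned above. The project's actual settings may include more keys, so treat this as an illustration rather than a copy of the repository's configuration.

```python
from datetime import timedelta

# Settings fragment: lifetimes for the tokens issued by Simple JWT.
SIMPLE_JWT = {
    # Access tokens are short-lived and sent with every authenticated request.
    "ACCESS_TOKEN_LIFETIME": timedelta(minutes=5),
    # Refresh tokens live longer and are only used to obtain new access tokens.
    "REFRESH_TOKEN_LIFETIME": timedelta(hours=24),
}
```

Shorter access token lifetimes limit how long a leaked token stays useful, at the cost of more frequent refresh calls from the frontend.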

User Retrieve [GET]: We need to be able to get the data of the currently authenticated user. This endpoint returns the data of the authenticated user.

Speech to Text [POST]: This endpoint uses the SpeechRecognition package to convert binary audio to text. The package supports Google, IBM, and Microsoft speech-to-text APIs and many more. Here we are using Google Speech-to-Text. Because of the abstraction the package provides, we don't need to write the API calls that upload our audio files for processing ourselves.

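import base64
import io

import speech_recognition as sr
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView


class SpeechToTextView(APIView):

    def post(self, request, *args, **kwargs):
        """
        Convert speech to text, using Google Speech Recognition API
        Accepts a base64 encoded string of audio data

        Returns:
            Response: Response object with text
        """
        try:
            # The frontend sends the recording under the 'record' key
            record: str = request.data.get('record')
            decoded_b64 = base64.b64decode(record)

            # Convert decoded_b64 to in-memory bytes buffer
            r = sr.Recognizer()
            with sr.AudioFile(io.BytesIO(decoded_b64)) as source:
                # listen for the data (load audio to memory)
                audio_data = r.record(source)
                # recognize (convert from speech to text)
                text = r.recognize_google(audio_data)

            return Response(data={'text': text})
        except Exception:
            # Any decoding or recognition failure becomes a 400 response
            return Response(
                data={'message': "Error converting speech to text"},
                status=status.HTTP_400_BAD_REQUEST,
            )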

The data passed to this endpoint will be in base64 format because base64 is plain text, which makes binary audio easy to embed in a JSON request body (at the cost of a payload roughly a third larger than the raw bytes).

decoded_b64 = base64.b64decode(record)

This takes the data received and decodes it using the built-in base64 library. The decoded data is returned in binary form. This is done so that we can convert it into a buffered stream in memory using io.BytesIO(decoded_b64). BytesIO comes from the built-in io library, which is used for working with input and output (I/O) streams.
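To see what "a stream in memory" means in practice, here is a small standalone sketch. The `record` value is a stand-in made from a few placeholder bytes, not a real recording. The point is only that `io.BytesIO` lets code read bytes the same way it would read a file opened in binary mode.

```python
import base64
import io

# Stand-in for the `record` value: base64 text of a few placeholder bytes
# (a real request would carry a complete WAV file).
record = base64.b64encode(b"RIFF\x00\x00\x00\x00WAVE").decode("ascii")

decoded_b64 = base64.b64decode(record)  # raw bytes again
stream = io.BytesIO(decoded_b64)        # file-like object held in memory

header = stream.read(4)   # read like a file opened in binary mode
position = stream.tell()  # the stream tracks its own position
stream.seek(0)            # rewind so another reader starts at the beginning
```

Because the stream behaves like a file, libraries that expect a file object can consume the upload without anything being written to disk.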

What are Buffered I/O streams in Python?

Buffered I/O streams in Python's io module are classes that help improve the performance of I/O operations by buffering data in memory. Think of buffering as a way of temporarily storing data in a queue so that it can be accessed more efficiently later on. Read more on Buffered streams to understand in depth how they work.

When working with objects like buffers, files, or database connections in Python, context managers are used to access them because the resource is released automatically, even if an unhandled exception occurs inside the with block.

with sr.AudioFile(io.BytesIO(decoded_b64)) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)

If an error occurs while trying to record the audio source, which is the buffer we created earlier in memory, the I/O connection will be closed safely. This article explains why it's important to use context managers when dealing with objects like this.
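The cleanup guarantee can be checked with a small generic sketch. It uses `io.BytesIO` directly as the context manager rather than SpeechRecognition's `AudioFile`, and the `ValueError` is a simulated failure, so it illustrates the general behaviour of a with block and not the internals of the package.

```python
import io

buffer = io.BytesIO(b"placeholder audio bytes")

try:
    with buffer as source:
        source.read(4)
        # Simulate something going wrong while the buffer is in use.
        raise ValueError("simulated failure while processing audio")
except ValueError:
    pass

# The with block closed the buffer on the way out, despite the exception.
closed_after_error = buffer.closed
```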

So the buffer is recorded and passed to Google for speech-to-text processing. If all is well, we return the text in the response. Otherwise, we return a Bad Request error response.

Conclusion 🤓

In conclusion, this article provides a comprehensive tutorial on how to build a speech authentication system with Django. The system allows users to enter their details by speaking and includes a registration and login page. The article covers the step-by-step guide to setting up the backend API using Django. The article also covers the necessary installations, project setup, and the design of the application. Overall, this article is a great resource for developers looking to build a speech authentication system.

To keep this article brief, the next part, Building the Frontend with Next.js, will be covered in another article.

You can follow me on Twitter, Hashnode, and GitHub, where I post amazing projects and articles.

Thanks for reading 😉.