How to Build a Speech Authentication System with Django and Next JS - Part 1
Learn how to build a speech authentication system with Django in this comprehensive tutorial. With the help of a Python package that converts speech to text, users can enter their details by speaking. The system includes a registration and login page and once logged in, users will be greeted with a welcome voice. Follow the step-by-step guide to set up the backend API using Django and the frontend using Next.js and Chakra UI.
TL;DR
The system includes a registration and login page and once logged in, users will be greeted with a welcome voice. Follow the step-by-step guide to set up the backend API using Django and the frontend using Next.js and Chakra UI. The article also covers the prerequisites, project setup, installations, and building of the API. The API includes endpoints for Registration, Login, JWT Token Refresh, User Retrieve, and Speech to Text.
Prerequisites
Hey there! To make the most of this article, it would be helpful to have a solid understanding of Django, Django Rest Framework, and Next.js. Don't worry though, we'll do our best to explain things clearly and make it easy to follow along.
Getting Started
First, we will work on the backend. Django will be used for the backend and Next.js + Chakra UI for the frontend. We use Django because we rely on a Python package that converts speech to text.
Application Design
The overall application design will be very simple. The backend will be an API which the frontend will consume.
A user's voice will be recorded on the frontend.
The frontend sends the recorded wav file to the backend in Base64 format.
The backend will process the data and convert it to text.
The text is returned to the frontend as a response.
The text is then entered into the UI input box.
That's it, very simple.
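The round trip above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the project's code: the real frontend is written with Next.js, the helper names build_payload and read_payload are made up for this example, and the "record" key is taken from the view shown later in this article.

```python
import base64
import json


def build_payload(wav_bytes: bytes) -> str:
    """Frontend side: wrap raw audio bytes in a JSON body as Base64 text."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return json.dumps({"record": encoded})


def read_payload(body: str) -> bytes:
    """Backend side: pull the Base64 string out of the JSON body and decode it."""
    record = json.loads(body)["record"]
    return base64.b64decode(record)


# Stand-in bytes; a real request would carry the recorded WAV file.
fake_audio = b"RIFF\x00\x00\x00\x00WAVEfmt "
body = build_payload(fake_audio)
restored = read_payload(body)
```

The bytes that come out on the backend side are identical to the ones that went in, which is all the speech-to-text step needs.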
Project Setup
This project is already available on GitHub. Clone it and follow the installation guide in the Readme to set up the project.
Installations
To build this project, we will need to have the following installed:
Python 3.9 - This is used for the backend, and Django is used as the framework.
Node.js & npm - These are used for the frontend.
Building the API
Let's clone the project.
git clone https://github.com/devvspaces/speech_to_text_auth
cd speech_to_text_auth
For now, we will be working on the API. Move into the api directory.
Set up your virtual environment
python -m venv venv
# Or python3 if you have that installed
python3 -m venv venv
Check out this Python documentation to learn about virtual environments.
Activate the environment.
# windows: cmd.exe
venv\Scripts\activate
# bash or zsh
source venv/bin/activate
If the above commands don't work, check out this Python documentation for help activating virtual environments. It includes a list of activation commands for common shells.
Installing requirements
# This will install the requirements in the requirements.txt file
pip install -r requirements.txt
Move into the src directory. There we have the account application, config, tests, and utils directories.
Now we need to create our .env file. This is where our secret configurations are kept. Decouple is used for accessing this file in our config files, e.g. manage.py, wsgi.py, etc.
Create a new file, .env, and copy the contents of .env.example into it.
Now we need to create a Postgres DB for our project. By default, the project uses PostgreSQL, but this can be changed in the settings file. We can also create a DB for tests if we want to run tests. It's customary to have separate databases for development and testing because testing uses autogenerated data, which we don't want in our development database.
The config folder is where our settings are located. This directory contains a settings directory which contains two Python files, base.py and test.py. Arranging our settings files like this helps us write separate settings for different environments.
We could:
develop using base.py,
test using test.py,
maybe create a file staging.py for our staging environment on a live server, and
production.py for our production environment.
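As a rough sketch of how one settings file per environment can be selected, the snippet below maps an environment name to a dotted module path and sets the DJANGO_SETTINGS_MODULE environment variable, which Django reads at start-up. The module paths are assumptions based on the directory layout described above, and the SETTINGS_MODULES mapping and select_settings helper are hypothetical names for this example, not part of the project.

```python
import os

# Hypothetical mapping from an environment name to a settings module path.
# The paths assume the layout described above: config/settings/base.py, etc.
SETTINGS_MODULES = {
    "development": "config.settings.base",
    "test": "config.settings.test",
}


def select_settings(env_name: str) -> str:
    """Return the dotted settings path for the given environment name."""
    try:
        return SETTINGS_MODULES[env_name]
    except KeyError:
        raise ValueError(f"Unknown environment: {env_name!r}") from None


# Django reads the DJANGO_SETTINGS_MODULE environment variable at start-up.
os.environ["DJANGO_SETTINGS_MODULE"] = select_settings("test")
```

Adding a staging or production entry would then be a one-line change to the mapping.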
The tests folder contains our tests.
Accounts Application
This is the only application on the project and the most important because we will store users' data here. The API views, models, and serializers are designed here.
This is a basic Django custom user model implementation. By inheriting the Django AbstractBaseUser model, we can build our User model and make email required.
This is the DBML structure used for the diagram above.
// Use DBML to define your database structure
// Docs: https://www.dbml.org/docs
// DB Diagram tool: https://dbdiagram.io/home
Table users {
  pk integer [primary key]
  email varchar
  active boolean
  staff boolean
  admin boolean
  created timestamp
}
Table profile {
  pk integer [primary key]
  user_id integer
  fullname varchar
  sex varchar
  phone varchar
  country varchar
}
Ref: profile.user_id - users.pk // one-to-one
The User model has a one-to-one relationship with the Profile model. This is a very common way to design a User table. The reason is that when a change in business logic slightly affects what is stored about a user, we can update the Profile model instead, so that we won't have migration issues with the User model.
When a user is created, we automatically create its profile by listening to the post_save signal. Learn more about Django Signals.
We need a custom user registration form because we need a way to make the Django admin create a User correctly. This form allows the admin to enter the user's profile data when creating the user. The default Django admin form will have only fields for the User model.
By overriding the model form's save method, we get the validated profile data and use it to populate the user's profile. As soon as user.save() is invoked, the profile is created by the post_save signal from earlier.
Learn more about creating Django Model Form.
One important thing here is that we create a UserAdmin class to override the default admin that Django would create. Read more about Customizing Django Model Admin.
Django's BaseUserAdmin is a special ModelAdmin that allows us to override the form used for creating a new User instance in the admin. So we pass the form we created earlier here.
An important thing to note here is that after creating a Custom Django User Model we need to tell Django we want to use it instead of the default.
The line below tells Django to use our new custom User model instead of its default User model. This is a very common implementation in Django; here is a Django Rest Framework GitHub template that we can use as a project boilerplate.
AUTH_USER_MODEL = 'account.User'
Decouple is used to access the environment keys in your .env file. This documentation shares how to effectively use decouple with Django.
Views and Serializers
Now we will cover the Django REST framework views and serializers. We need the following endpoints:
Registration [POST],
Login [POST],
JWT Token Refresh [POST],
User Retrieve [GET], and
Speech to Text [POST].
Registration [POST]: The endpoint will be used to create a new user. The data that will be received through all the endpoints will be in JSON format. The endpoint uses the imported RegisterSerializer to convert post data into native Python datatypes and validate them. An important thing to note here is that we are using JSON Web Token (JWT) for authenticating our users. As soon as a new account is created, new access and refresh tokens are included in the response. Read more about Simple JWT, A JSON Web Token authentication plugin for the Django REST Framework.
Login [POST]: Validates the user data and returns a response that contains the tokens and logged-in user data.
JWT Token Refresh [POST]: This is used to get a new access token using a valid refresh token. JWT is a stateless authentication method, meaning there is no session stored for the user. The access token expires 5 minutes after it is generated, after which the user must obtain a new access token using the refresh token, which is valid for 24 hours. The lifetimes of both the access and refresh tokens can be edited in the settings file. Learn more about this at Simple JWT.
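For reference, the two lifetimes mentioned above correspond to the ACCESS_TOKEN_LIFETIME and REFRESH_TOKEN_LIFETIME keys of Simple JWT's SIMPLE_JWT setting. The fragment below is a minimal sketch of what such a block could look like in a Django settings file. The values simply mirror the 5 minutes and 24 hours described in this article and are not copied from the project.

```python
from datetime import timedelta

# Settings read by Simple JWT. The values mirror the lifetimes described
# in this article and are an assumption, not the project's actual config.
SIMPLE_JWT = {
    "ACCESS_TOKEN_LIFETIME": timedelta(minutes=5),
    "REFRESH_TOKEN_LIFETIME": timedelta(hours=24),
}
```

Raising the access token lifetime reduces how often the frontend has to call the refresh endpoint, at the cost of a longer window in which a leaked token stays valid.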
User Retrieve [GET]: We need to be able to get the data of the currently authenticated user. This endpoint returns the data of the authenticated user.
Speech to Text [POST]: This endpoint uses the SpeechRecognition package to convert audio binary to text. This package supports Google, IBM, and Microsoft speech-to-text APIs and many more. Here we are using Google Speech-to-Text. Because of the abstraction the package provides, we don't need to call the API ourselves to upload our audio files for processing.
import base64
import io

import speech_recognition as sr
from drf_yasg.utils import swagger_auto_schema
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from . import serializers  # assumed: the account app's own serializers module


class SpeechToTextView(APIView):

    @swagger_auto_schema(
        request_body=serializers.SpeechBody,
    )
    def post(self, request, *args, **kwargs):
        """
        Convert speech to text, using Google Speech Recognition API
        Accepts a base64 encoded string of audio data

        Returns:
            Response: Response object with text
        """
        try:
            record: str = self.request.data.get('record')
            decoded_b64 = base64.b64decode(record)

            # Convert decoded_b64 to in-memory bytes buffer
            r = sr.Recognizer()
            with sr.AudioFile(io.BytesIO(decoded_b64)) as source:
                # listen for the data (load audio to memory)
                audio_data = r.record(source)
                # recognize (convert from speech to text)
                text = r.recognize_google(audio_data)
            return Response(data={'text': text})
        except Exception:
            # Any failure (bad Base64, unreadable audio, recognition error) becomes a 400
            return Response(
                data={'message': "Error converting speech to text"},
                status=status.HTTP_400_BAD_REQUEST)
The data passed to this endpoint is Base64-encoded because Base64 represents the binary audio as plain text, which can be embedded directly in a JSON request body.
decoded_b64 = base64.b64decode(record)
This takes the data received and decodes it using the built-in base64 library. The decoded data is returned in binary form so that we can turn it into an in-memory buffered stream using io.BytesIO(decoded_b64). BytesIO comes from the built-in io library, which is used for working with input and output (I/O) streams.
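To see why wrapping the decoded bytes in io.BytesIO is useful, here is a small standard-library sketch. It builds one second of silence as a WAV file in memory to stand in for a real recording, Base64-encodes it, and then decodes it and reads it back as if it were a file on disk. It uses Python's wave module instead of SpeechRecognition, so it only illustrates the file-like behaviour of the buffer, not the recognition step.

```python
import base64
import io
import wave

# Build a tiny silent WAV in memory to stand in for a real recording.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)      # mono
    writer.setsampwidth(2)      # 16-bit samples
    writer.setframerate(16000)  # 16 kHz
    writer.writeframes(b"\x00\x00" * 16000)  # one second of silence
record = base64.b64encode(buffer.getvalue()).decode("ascii")

# What the backend does: decode, then wrap the bytes in a file-like object.
decoded_b64 = base64.b64decode(record)
with wave.open(io.BytesIO(decoded_b64), "rb") as reader:
    frame_rate = reader.getframerate()
    duration_seconds = reader.getnframes() / frame_rate
```

No temporary file is written at any point. Any library that accepts a file-like object can read the audio straight from memory.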
What are Buffered I/O streams in Python?
Buffered I/O streams in Python's io module are classes that help improve the performance of I/O operations by buffering data in memory. Think of buffering as a way of temporarily storing data in a queue so that it can be accessed more efficiently later on. Read more on Buffered streams to deeply understand how it works.
When working with objects like buffers, files, or database connections in Python, context managers are used to access them because the resource is closed automatically, even if an unhandled exception occurs inside the with block.
with sr.AudioFile(io.BytesIO(decoded_b64)) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
If an error occurs while trying to record the audio source, which is the Buffer we created earlier in the memory, the I/O connection will be closed safely. This article explains why it's important to use Context Managers when dealing with objects like this.
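The general guarantee can be demonstrated with a plain io.BytesIO, which is itself a context manager. This sketch deliberately raises an error inside the with block and then checks that the stream was still closed. It does not involve sr.AudioFile, whose own clean-up behaviour is not shown here.

```python
import io

stream = io.BytesIO(b"audio bytes")
try:
    with stream as source:
        source.read(5)
        # Simulate something going wrong while the stream is in use.
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

# The with block closed the stream on the way out, despite the exception.
closed_after_error = stream.closed
```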
So the buffer is recorded and passed to Google for speech-to-text processing. If all is well, we return the text in the response; otherwise, we return a 400 Bad Request response.
Conclusion
In conclusion, this article provides a comprehensive tutorial on how to build a speech authentication system with Django. The system allows users to enter their details by speaking and includes a registration and login page. The article covers the step-by-step guide to setting up the backend API using Django. The article also covers the necessary installations, project setup, and the design of the application. Overall, this article is a great resource for developers looking to build a speech authentication system.
To keep this article brief, the next part, Building the Frontend with Next.js, will be covered in another article.
You can follow me on Twitter, Hashnode, Dev.to, and GitHub, where I post amazing projects and articles.
Thanks for reading.