Saptak's Blog Posts
Handling nested serializer validations in Django Rest Framework
Posted: 2020-05-03T01:22:50+05:30
I understand that the title of this post is a little confusing. Recently, while working on the Projects API in Weblate, I came across an interesting issue. The Projects API in Weblate allowed you to get an attribute called source_language. Every project has only one source_language, and in the API it was a read-only property.
{
    "name": "master_locales",
    "slug": "master_locales",
    "web": "https://example.site",
    "source_language": {
        "code": "en",
        "name": "English",
        "direction": "ltr",
        "web_url": "http://example.site/languages/en/",
        "url": "http://example.site/api/languages/en/"
    },
    "web_url": "http://example.site/projects/master_locales/",
    "url": "http://example.site/api/projects/master_locales/",
    "components_list_url": "http://example.site/api/projects/master_locales/components/",
    "repository_url": "http://example.site/api/projects/master_locales/repository/",
    "statistics_url": "http://example.site/api/projects/master_locales/statistics/",
    "changes_list_url": "http://example.site/api/projects/master_locales/changes/",
    "languages_url": "http://example.site/api/projects/master_locales/languages/"
}
As you can see, unlike the other relational fields, it's not a HyperlinkedIdentityField. It uses the nested language serializer to show all the attributes of the source_language.
Now, previously, when a project was created via the API, a default language was always assigned to the project, and there was no way to define the source_language while creating the project.
Problem?
Doing GET on Language Serializer when sending POST on Project Serializer
So we needed to add a way to define the source_language of the project when sending a POST request to the Project API, and also a way to update the source_language when editing a project via the API. To reuse the same serializer, the request body for the POST request would look something like this:
{
    "name": "master_locales",
    "slug": "master_locales",
    "web": "https://example.site",
    "source_language": {
        "code": "ru",
        "name": "Russian",
        "direction": "ltr"
    }
}
Now, in general, we would have a Python serializer like this:
from rest_framework import serializers


class LanguageSerializer(serializers.ModelSerializer):
    # AbsoluteURLField is not part of DRF; it is assumed to be defined
    # elsewhere in the project.
    web_url = AbsoluteURLField(source="get_absolute_url", read_only=True)

    class Meta:
        model = Language
        fields = ("code", "name", "direction", "web_url", "url")
        extra_kwargs = {
            "url": {"view_name": "api:language-detail", "lookup_field": "code"}
        }


class ProjectSerializer(serializers.ModelSerializer):
    # Nested serializer: incoming source_language data is validated by it too.
    source_language = LanguageSerializer(required=False)
    # ...
    # Other parts of the serializer
The problem with code like this is that when ProjectSerializer gets a request like the one shown above and validates its data, it also runs the validation of the nested LanguageSerializer on the source_language part. The code property of the Language model has a unique constraint. So, when LanguageSerializer tries to validate
{
    "code": "ru",
    "name": "Russian",
    "direction": "ltr"
}
it will throw an error "This field must be unique" for the code property if a language with the code ru already exists in the database.
Solution
So there are a few steps to get this done.
Remove validators from code field
extra_kwargs = {
    "url": {"view_name": "api:language-detail", "lookup_field": "code"},
    "code": {"validators": []},
}
Add "code": {"validators": []} to the extra_kwargs so that LanguageSerializer no longer runs the uniqueness validator on the data it receives.
Add manual validation for code field
Removing the validator also removes the uniqueness check for POST requests made to the LanguageSerializer itself. The LanguageSerializer in Weblate doesn't support POST, but in general you would need to add a validation function to the LanguageSerializer manually, so that it still throws an error if someone validates data before adding a language that already exists. To do that, add a function validate_code like this:
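def validate_code(self, value):
    # Run the lookup once and reuse the result.
    exists = Language.objects.filter(code=value).exists()
    # True when this serializer is the nested source_language field
    # of a ProjectSerializer.
    nested_in_project = (
        isinstance(self.parent, ProjectSerializer)
        and self.field_name == "source_language"
    )
    if nested_in_project:
        # A project must point at a language that already exists.
        if not exists:
            raise serializers.ValidationError(
                "Language with this language code was not found."
            )
    elif exists:
        # Standalone use: keep the uniqueness check that was removed above.
        raise serializers.ValidationError(
            "Language with this language code already exists."
        )
    return value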
Note: DRF looks up field-level validation functions by name, so the function must be called validate_{field_name} (here, validate_code) for it to run when that field is validated.
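To see why the name matters, here is a minimal plain-Python sketch of name-based hook lookup. It is a simplified stand-in, not DRF's actual implementation; TinySerializer, LanguageLike and run_validation are hypothetical names invented for this illustration.

```python
class TinySerializer:
    """Hypothetical stand-in that mimics name-based validator lookup."""

    fields = ("code", "name")

    def run_validation(self, data):
        validated = {}
        for field_name in self.fields:
            value = data[field_name]
            # Look for a method called validate_<field_name>; skip if absent.
            hook = getattr(self, "validate_" + field_name, None)
            if hook is not None:
                value = hook(value)
            validated[field_name] = value
        return validated


class LanguageLike(TinySerializer):
    def validate_code(self, value):
        # Found by the lookup above because the name matches the "code" field.
        return value.lower()

    def check_name(self, value):
        # Never called: the name does not follow the validate_<field> pattern.
        return value.upper()


result = LanguageLike().run_validation({"code": "RU", "name": "Russian"})
print(result)
```

A method with any other name, like check_name here, is silently ignored, which is an easy mistake to miss.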
Override create() in ProjectSerializer
Finally, we would want to override the create() function of ProjectSerializer to:
- Validate the source_language data using the above validation to check that a language with that code exists
- Modify the source_language key of validated_data to hold the Language model object rather than the dictionary, so it can be used to create a project with the foreign key
- Lastly, create a project with the new validated_data
The code would look something like this:
def create(self, validated_data):
    source_language_validated = validated_data.get("source_language")
    if source_language_validated:
        validated_data["source_language"] = Language.objects.get(
            code=source_language_validated.get("code")
        )
    project = Project.objects.create(**validated_data)
    return project
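The post mentions updating source_language when editing a project, but only create() is shown. One possible way to share the lookup between create() and an update() method is a small helper. The sketch below is plain Python, with a dictionary standing in for the Language table; resolve_source_language, LANGUAGES and get_language are hypothetical names, and in a real serializer the lookup would be Language.objects.get(code=...).

```python
# Hypothetical in-memory stand-in for the Language table.
LANGUAGES = {"en": "<Language: en>", "ru": "<Language: ru>"}


def get_language(code):
    """Stand-in for Language.objects.get(code=code)."""
    return LANGUAGES[code]


def resolve_source_language(validated_data, lookup=get_language):
    """Return a copy with the nested dict replaced by the looked-up object."""
    resolved = dict(validated_data)
    nested = resolved.get("source_language")
    if nested:
        resolved["source_language"] = lookup(nested["code"])
    return resolved


payload = {"name": "master_locales", "source_language": {"code": "ru"}}
resolved = resolve_source_language(payload)
print(resolved["source_language"])

# Data without source_language passes through unchanged.
unchanged = resolve_source_language({"name": "master_locales"})
```

Both create() and update() could call such a helper before handing the data to the model, so the dictionary-to-object swap lives in one place.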
And now, when you create a project through the Project API, you can define its source language using the source_language key. There might be several other ways to go about it, but this is one of the ways I found that works.
Also, this feature is now live in Weblate 4.* versions, which allow you to define the source_language via the API.
Creating Custom Whoosh Plugin
Posted: 2020-04-19T13:16:52+05:30
Recently, while trying to work on a query parser feature in Weblate, I came across a search engine library called Whoosh. It provides nice features like indexing of text, parsing of search queries, scoring algorithms, etc. One good thing about this library is that most of these features are customizable and extensible.
Now, the feature I was trying to implement is an exact search query. For an exact search query, the backend searches for an exact match of the query text provided to it instead of doing the normal substring search. Whoosh provides a plugin for regex, which can be accessed via whoosh.qparser.RegexPlugin(). So we could technically write a regex to do the exact match. But a regex search will have worse performance than a simple string comparison.
So, one way of supporting a new kind of query is to create a custom Whoosh plugin. And that's what this post is going to be about.
Simple Whoosh Plugin
In some cases, you will probably not need a complicated plugin, but just want to extend an existing plugin to match a different kind of query. For example, let's say you want to extend SingleQuotePlugin to parse queries wrapped in either single quotes or double quotes.
import whoosh.qparser


class QuotePlugin(whoosh.qparser.SingleQuotePlugin):
    """Single and double quotes to specify a term."""

    expr = r"(^|(?<=\W))['\"](?P<text>.*?)['\"](?=\s|\]|[)}]|$)"
In the above example, QuotePlugin extends the already existing SingleQuotePlugin class. It just overrides the expression used to parse the query. The expression, stored in the variable expr, is a regular expression whose ?P<text> group denotes the TermQuery. A TermQuery is the final term (or terms) searched for in the database. So the above regex says to parse any query in which the TermQuery is wrapped in single quotes or double quotes.
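To get a feel for what this expression captures, it can be tried directly with Python's re module, outside Whoosh. This only checks the regex itself, not how Whoosh applies it during parsing; the extract helper and the sample queries are made up for illustration.

```python
import re

# Same expression as QuotePlugin.expr above.
EXPR = re.compile(r"(^|(?<=\W))['\"](?P<text>.*?)['\"](?=\s|\]|[)}]|$)")


def extract(query):
    """Return the captured text group, or None if nothing matches."""
    match = EXPR.search(query)
    return match.group("text") if match else None


single = extract("'hello world'")
double = extract('say "exact phrase" now')
unquoted = extract("plain words")
# The two character classes are independent, so mismatched quotes also match.
mismatched = extract("'mixed\"")
```

Here single and double hold the text between the quotes, and unquoted is None. If mismatched quotes should not be accepted, capturing the opening quote and matching it again with a backreference is one option.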
Query Class
A query class is the class that the final parsed term will be an instance of. Unless otherwise specified, it's usually Term. So if we want our plugin to parse the query and show it as an instance of a custom class, we need to define a custom query class.
class Exact(whoosh.query.Term):
    """Class for queries with exact operator."""

    pass
So, as you can see, we can have a simple class that just extends whoosh.query.Term, so that while checking the parsed terms we get it as an instance of Exact. That will help us differentiate the query from a normal Term instance.
Custom Whoosh Plugin
After writing the query class, we will need to write the custom plugin class.
class ExactPlugin(whoosh.qparser.TaggingPlugin):
    """Exact match plugin with quotes to specify an exact term."""

    class ExactNode(whoosh.qparser.syntax.TextNode):
        qclass = Exact

        def r(self):
            return "Exact %r" % self.text

    expr = r"\=(^|(?<=\W))(['\"]?)(?P<text>.*?)\2(?=\s|\]|[)}]|$)"
    nodetype = ExactNode
In the above example, unlike the simple case, we extend TaggingPlugin instead of any other pre-defined plugin. Most of the pre-defined plugins in Whoosh also extend TaggingPlugin, so it is a good fit as a parent class.
Then, we create an ExactNode class, which we assign as the node type of the custom plugin. A node type class basically defines the query class to be used by this custom plugin, along with various representations and properties of the parsed node. qclass holds the query class created before, so that the final parsed term is an Exact instance.
Apart from that, we have the expr, which contains the regex to parse the query term, just like in the simple example.
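The expression can be tried on its own with Python's re module to see which part of a query ends up in the text group. This is only a check of the regex outside Whoosh; the exact_text helper and the sample queries are hypothetical.

```python
import re

# Same expression as ExactPlugin.expr above.
EXPR = re.compile(r"\=(^|(?<=\W))(['\"]?)(?P<text>.*?)\2(?=\s|\]|[)}]|$)")


def exact_text(query):
    """Return the text following the = operator, or None if there is none."""
    match = EXPR.search(query)
    return match.group("text") if match else None


bare = exact_text("=hello")
quoted = exact_text('="hello world"')
# The \2 backreference requires the closing quote to equal the opening one,
# so the apostrophe inside the double-quoted term does not end it.
apostrophe = exact_text('="it\'s here"')
in_context = exact_text("find =word here")
no_operator = exact_text("hello world")
```

An unquoted term stops at the next whitespace, while a quoted term runs to the matching closing quote, which is what lets the operator cover multi-word phrases.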
Finally...
After creating the custom plugin, you can:
- add this plugin to the list of plugins defined in the Whoosh query parser class
- use the query class to make an isinstance() check when making database queries
- check for the node type in the different nodes used by the parser
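For the isinstance() check, one thing to keep in mind is that Exact subclasses Term, so the more specific class has to be tested first. The sketch below uses small stand-in classes instead of importing Whoosh and maps each parsed term to a Django-style lookup. The to_lookup helper and the choice of the iexact and icontains lookups are assumptions for illustration, not necessarily what Weblate does.

```python
class Term:
    """Stand-in for whoosh.query.Term (fieldname and text only)."""

    def __init__(self, fieldname, text):
        self.fieldname = fieldname
        self.text = text


class Exact(Term):
    """Stand-in for the custom Exact query class."""


def to_lookup(query):
    """Map a parsed term to keyword arguments for a Django-style filter()."""
    # Exact is a subclass of Term, so it must be checked first.
    if isinstance(query, Exact):
        return {query.fieldname + "__iexact": query.text}
    if isinstance(query, Term):
        return {query.fieldname + "__icontains": query.text}
    raise TypeError("Unsupported query type: %r" % type(query))


exact_kwargs = to_lookup(Exact("source", "hello"))
term_kwargs = to_lookup(Term("source", "hello"))
```

If the two checks were swapped, every Exact instance would be treated as a plain Term and fall back to the substring search.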