Saptak's Blog Posts

Creating Custom Whoosh Plugin

Posted: 2020-04-19T13:16:52+05:30

Recently, while working on a query parser feature in Weblate, I came across a search engine library called Whoosh. It provides some nice features like text indexing, parsing of search queries, scoring algorithms, etc. One good thing about this library is that most of these features are customizable and extensible.

Now, the feature I was trying to implement is an exact search query. With an exact search query, the backend searches for an exact match of the query text provided to it instead of doing the normal substring search. Whoosh provides a plugin for regex, which can be accessed via whoosh.qparser.RegexPlugin(). So we can technically write a regex to do the exact match. But a regex search will have worse performance than a simple string comparison.

So, one of the ways of doing a new kind of query parsing is creating a custom whoosh plugin. And that's what this blog is going to be about.

Simple Whoosh Plugin

In some cases, you will probably not need a complicated plugin, but just want to extend the feature of an existing plugin to match a different kind of query. For example, let's say you want to extend the ability of SingleQuotePlugin to parse queries wrapped in either single-quotes or double-quotes.

import whoosh.qparser

class QuotePlugin(whoosh.qparser.SingleQuotePlugin):
    """Single and double quotes to specify a term."""
    expr = r"(^|(?<=\W))['\"](?P<text>.*?)['\"](?=\s|\]|[)}]|$)"

In the above example, QuotePlugin extends the already existing SingleQuotePlugin class. It just overrides the expression used to parse the query. The expression, stored in the variable expr, is a regular expression whose ?P<text> named group captures the TermQuery. A TermQuery is the final term (or terms) searched for in the database. So the above regex tells the parser to accept any query in which the TermQuery is wrapped in single quotes or double quotes.
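To see what this expression captures, it can be tried outside Whoosh with Python's standard re module. This is only a rough sketch: the names QUOTE_EXPR and quoted_text are made up for illustration, and the snippet does not go through the Whoosh parser at all.

```python
import re

# Same expression as in QuotePlugin above, compiled on its own.
QUOTE_EXPR = re.compile(r"(^|(?<=\W))['\"](?P<text>.*?)['\"](?=\s|\]|[)}]|$)")


def quoted_text(query):
    """Return the text captured by the named group, or None if nothing matches.

    Hypothetical helper for illustration only; not part of Whoosh.
    """
    match = QUOTE_EXPR.search(query)
    return match.group("text") if match else None


single = quoted_text("'hello world'")
double = quoted_text('"hello world"')
in_field = quoted_text('source:"foo bar" baz')
unquoted = quoted_text("hello")
# The opening and closing quotes are matched independently,
# so a mixed pair like 'foo" is accepted as well.
mixed = quoted_text("'foo\"")
```

Note that the expression does not require the closing quote to be the same character as the opening one.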

Query Class

A query class is the class that the final parsed term will be an instance of. Unless otherwise mentioned, it's usually Term. So if we want our plugin to parse the query and return it as an instance of a custom class, we need to define a custom query class.

import whoosh.query

class Exact(whoosh.query.Term):
    """Class for queries with exact operator."""

    pass

So, as you can see, we can have a simple class that just extends whoosh.query.Term, so that while checking the parsed terms, we get it as an instance of Exact. That will help us differentiate the query from a normal Term instance.

Custom Whoosh Plugin

After writing the query class, we will need to write the custom plugin class.

class ExactPlugin(whoosh.qparser.TaggingPlugin):
    """Exact match plugin with quotes to specify an exact term."""

    class ExactNode(whoosh.qparser.syntax.TextNode):
        qclass = Exact

        def r(self):
            return "Exact %r" % self.text

    expr = r"\=(^|(?<=\W))(['\"]?)(?P<text>.*?)\2(?=\s|\]|[)}]|$)"
    nodetype = ExactNode

In the above example, unlike the simple case, we extend TaggingPlugin instead of any other pre-defined plugin. Most of the pre-defined plugins in whoosh also extend TaggingPlugin. So it is a good fit as a parent class.

Then, we create an ExactNode class and assign it as the node type of the custom plugin. A node type class basically defines the query class to be used by the plugin, along with various representations and properties of the parsed node. Here qclass is set to the query class created before, so that the final parsed term becomes an Exact instance.

Apart from that, we have the expr which contains the regex just like in the simple example to parse the query term.
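As with the simple plugin, the expression can be checked on its own with the standard re module. This is a sketch under the same assumptions as before: EXACT_EXPR and exact_text are illustrative names, and the snippet only exercises the regex, not the Whoosh parser.

```python
import re

# Same expression as in ExactPlugin above, compiled on its own.
EXACT_EXPR = re.compile(r"\=(^|(?<=\W))(['\"]?)(?P<text>.*?)\2(?=\s|\]|[)}]|$)")


def exact_text(query):
    """Return the text following the '=' operator, or None if nothing matches.

    Hypothetical helper for illustration only; not part of Whoosh.
    """
    match = EXACT_EXPR.search(query)
    return match.group("text") if match else None


quoted = exact_text('="hello world"')
# Without quotes, the match stops at the first whitespace.
bare = exact_text("=hello world")
# No '=' operator, so the expression does not match at all.
plain = exact_text("hello")
```

The \2 backreference is what makes the closing quote agree with the optional opening quote, so an unquoted term is read as a single word.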

Finally...

After creating the custom plugin, you can:

  • add this plugin to the list of plugins defined in the whoosh query parser class
  • use the query class to make an isinstance() check when making database queries
  • check for the node type in the different nodes used by the parser

Ticket Ordering or Positioning (back-end)

Posted: 2017-06-21T13:41:00+05:30

One of the many feature requests that we got for our open event organizer server or the eventyay website is ticket ordering. The event organizers wanted to show the tickets in a particular order on the website and wanted to control that order. This was a common request by many and also an important enhancement. There were two main things to deal with where ticket ordering was concerned. Firstly, how do we store the position of a ticket within the set of tickets? Secondly, we needed to provide a UI in the event creation/edit wizard to control the order or position of a ticket. In this blog, I will talk about how we store the position of the tickets in the backend and use it to show them on the public page of the event.

So, as you would expect, we need to store the position of each ticket in the database. We already have a table for tickets in our database. All we needed to do was add a 'position' column to that table. The table would still be in third normal form (3NF), since each individual ticket of a particular event can have only one position value. Since we use Flask-Migrate, all we need to do is migrate and upgrade to add this new column.
Once this was done, we could store the position in the database whenever tickets were submitted from the front-end with their proper position values. So, we needed to save the ticket position received from the form along with all the other ticket details. How we get the position value for a particular ticket in the front-end will be discussed in the second blog. For now, we assume that we have a database with position values assigned to tickets and that we need to show them in ascending order on the event page.

So, in other words, what we needed to do was sort the list of tickets associated with the event object in ascending order of their position attribute. Python's built-in sorted function and a lambda function came to the rescue. All we needed was to call sorted with a lambda function as the key for sorting. So the final code for sorting would look something like this:
sorted_tickets = sorted(event.tickets, key=lambda x: x.position)
Hence, sorted_tickets will have the tickets sorted in ascending order of their position. After this, we pass this newly sorted list of tickets as a parameter while rendering templates, wherever we need to show the tickets. And voila! We get to show the tickets in their sorted order.
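To make the sorting step concrete, here is a small self-contained sketch. The Ticket class and the sample data are stand-ins made up for illustration; in the real server the tickets come from event.tickets.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical stand-in for the real ticket model.
    name: str
    position: int


tickets = [Ticket("VIP", 3), Ticket("Early bird", 1), Ticket("Regular", 2)]

# sorted() returns a new list and leaves the original list untouched.
sorted_tickets = sorted(tickets, key=lambda x: x.position)
names = [ticket.name for ticket in sorted_tickets]
```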
So this all works considering that we get the information from the front-end, which is absolutely feasible for newly created events. But what about already created events? What happens with the tickets already created in those events? How do we give them a position value?
To make the feature work for already created tickets, before sorting we check whether each ticket has a position value. If it doesn't, we assign it a position that is 1 more than the index of the ticket in the list of tickets. That way we ensure that those tickets can be re-arranged too.
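The fallback for older tickets could look roughly like the sketch below. It assumes that a missing position shows up as None; the Ticket class and the function names are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical stand-in; a missing position is assumed to be None.
    name: str
    position: Optional[int] = None


def assign_missing_positions(tickets):
    """Give every ticket without a position the value index + 1."""
    for index, ticket in enumerate(tickets):
        if ticket.position is None:
            ticket.position = index + 1
    return tickets


def sorted_by_position(tickets):
    """Fill in missing positions first, then sort in ascending order."""
    return sorted(assign_missing_positions(tickets), key=lambda x: x.position)


legacy = sorted_by_position([Ticket("A"), Ticket("B"), Ticket("C")])
legacy_positions = [ticket.position for ticket in legacy]

# When only some tickets have a position, two tickets can end up with the
# same value; sorted() is stable, so they keep their original relative order.
mixed = sorted_by_position([Ticket("A", 2), Ticket("B"), Ticket("C", 1)])
mixed_names = [ticket.name for ticket in mixed]
```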
Okay. That all sounds good. But how do we re-arrange tickets in front-end? Well, for that you have to read the next blog.

PyCon India 2016 : A weekend to remember

Posted: 2016-10-05T01:29:00+05:30

PyCon India is one of the best experiences I have had in the recent past. PyCon India this year was held in New Delhi from 23rd September to 25th September, 2016. Three days filled with learning, interaction and meeting like-minded people; I couldn't have asked for anything more. The entire weekend was about Python and development, and I loved it.


Day 1 - Devsprint

The first day was special. I was conducting a devsprint on my GSoC project, Open Event Organizer Server under FOSSASIA. To represent the project on such a huge platform was a real honor for me. The devsprint was a really good experience. I pitched the project to an audience made up of people from many different organisations who were much more knowledgeable in Python than I am. I got to meet a few developers, helped them set up the project, walked them through its various components, and told them about In the Heat of Code, a contest by FOSSASIA which involved my project. To see so many eager faces willing to contribute was such a great experience. I also got to meet a lot of great Python developers and discuss my project with them. All in all, an experience to remember.
Apart from the devsprint, I met some fellow GSoCers, organisation heads and Pythonistas, had an awesome lunch and also had the experience of volunteering. The day ended with a volunteers' assembly where plans for the next 2 days were discussed. The first day had pretty much established that the next 2 days were going to be awesome.

After the closure, I had the opportunity of going out with Sayan Chowdhury, who is associated with both FOSSASIA and the Fedora Project, and with other Fedora Project contributors. It was a great meetup between the people of 2 organisations and led to discussions about various things. Later we went to grab some food together and enjoy ourselves. After that, it was time to call it a day.

Day 2 - Volunteering Experience

Woke up early in the morning all excited. This was a big day. First chance of volunteering at a PyCon, meeting some awesome people and Pythonistas. Who wouldn't be excited, right? So, I went to JNU Convention Center, quickly finished my breakfast and was there in Audi 2, ready for the experience to blow me away. Meanwhile, the booths of RedHat, Digital Ocean, goibibo, JetBrains, ZeOmega and IAMAI had been set up. I roamed about the booths meeting representatives of each organisation and learning what they work on and how - from containers to cloud services to IDEs, it was simply a great learning experience. Plus, I got goodies from the booths. So that was awesome as well.

After that it was time for the keynote by Baishampayan Ghose, better known as BG, which was followed by some really awesome talks - some of them I understood, while others got me interested in learning new things. Apart from the talks, I attended the open space and lightning talks as well, while also meeting people like Kushal Das, whom I had wanted to meet for quite some time.

At the end of the day, it was time for the volunteer and speaker party at Bar-b-Que Nation. The delicious food, mixed with discussions about open source and development in various organisations, made it one of the best parties of my life.

Day 3 - Last Day

Everyone was a little sad because it was the last day, but at the same time it was time to make the most of this day. So, my aim as well was to meet, connect and interact with as many people as I could and also attend the talks that interested me. I roamed about trying to gather as much information as I could. It was all so overwhelming. Then we had a staircase DGPLUG meet, where Kushal Das talked about how open source inspires us all and why one should do open source at all. He was surrounded by many students who were new to this entire world, and his talk was so inspiring that I felt proud to be a part of the open source community.


The day ended with a vote of thanks. Group photos were clicked, promises were made to meet again, and there were last-minute discussions and sharing of contact details. Though it was the last day, it actually marked a beginning for me - a beginning of being part of such conferences and meeting awesome new people. So now I am really looking forward to any developer conference and to being a part of it.

Resizing Uploaded Image (Python)

Posted: 2016-09-29T18:07:00+05:30

When we make websites that accept image uploads, such as an event organizing server, the image for an event needs to be shown in different sizes on different pages. A high-resolution image is overkill in a place where we just need to show a thumbnail. So what most CMS websites do is resize the uploaded image and store a smaller copy as the thumbnail. So how do we do that? Let's find out.


Here, I am going to discuss how to do it in Python. Python has a module named PIL (Python Imaging Library). We will be using PIL to resize the image. Firstly, we store the image in a temporary folder. Then we open the image as a PIL Image object.
While resizing, we maintain the aspect ratio. Firstly, we fix a particular width for the thumbnail. Then we derive the height from that width, keeping the aspect ratio intact, using the following code:
[Image: screenshot of the code, "Screenshot from 2016-08-14 04:23:05"]
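The code in that screenshot is not available as text, so what follows is only a minimal sketch of the calculation described above, not the original code. The function name thumbnail_size and the sample dimensions are assumptions made for the example.

```python
def thumbnail_size(original_width, original_height, target_width):
    """Return (width, height) for a thumbnail that keeps the aspect ratio.

    Hypothetical helper: scales the height by the same factor as the width.
    """
    ratio = target_width / float(original_width)
    target_height = int(round(original_height * ratio))
    return target_width, target_height


# A 1600x900 upload scaled down to a 400 pixel wide thumbnail.
size = thumbnail_size(1600, 900, 400)
```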
Once we have the width and height that we want to resize the image to, we use the resize() method of the Image class in PIL. It takes a tuple (width, height) as a parameter.
[Image: screenshot of the code, "Screenshot from 2016-08-14 04:28:13"]
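Again, the screenshot's code is not available as text, so here is a rough sketch of the resize step instead. It assumes Pillow, the maintained fork of PIL, is installed, and it uses a blank image created in memory as a stand-in for the uploaded file.

```python
from PIL import Image

# Stand-in for the uploaded file: a blank 1600x900 image created in memory.
# With a real upload you would use Image.open(path_to_temporary_file) instead.
image = Image.new("RGB", (1600, 900), color="white")

# Fix the width, then derive the height so the aspect ratio is preserved.
width, height = image.size
target_width = 400
target_height = int(round(height * target_width / float(width)))

# resize() takes a (width, height) tuple and returns a new Image object.
thumbnail = image.resize((target_width, target_height))
```

The thumbnail object can then be written out with its save() method, or passed on to whatever upload code your storage uses.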
After the image is resized, you can either save it on your local disk or upload it to your storage space.