The main project I’m working on at The Recurse Center is an interactive project measurement dashboard. This week I’ll talk about the current state of the design and construction. I’m actively working on rearranging the user interface so a second post some other week will be needed to cover that.
Motivation
My wife, Justine, uses a series of interconnected spreadsheets to model the sales forecast of book publishing projects she’s managing. The spreadsheets consider things like sales projections, revenue splits, printing costs, and projected returns to try and come up with a month-by-month view of the business’s cash flow. The spreadsheets are fairly clever in their implementation, but still have limitations like having to manually re-link fields if a publishing date is changed.
Approach
I reviewed the spreadsheets and determined the core building blocks are Measurements and Visuals.
Measurements are quantities that describe the project. I noticed a few kinds of measurements:
- Fixed Values: 50/50% revenue split (just making these numbers up!)
- Fixed Values at Dates: Marketing $10,000 on July 1
- Reoccurring Fixed Values: $1000 Software Fees monthly from July 1 to September 1
- Distributed Values: 10,000 Books sold between September 1 and December 1 with strong initial sales followed by a cooldown
- Related Values: Printing cost is $5 times the number of books sold
- Related Values offset in time: Book returns are 10% of books sold with a 1 month lag
Visuals are just the results of measurements and optionally may include some aggregation
- Graphs
- Charts
- Charts of data totaled by month
- Pie Charts
I gathered my thoughts and observations into this Miro Board prior to starting development including some thoughts about GUI wireframes and SQL models. I think it’s always interesting to ideas evolve and looking back at this presentation, I only kept about 50% of the original ideas.
Architecture
I’m interested in improving my skills with Python and Javascript so I constructed programs using a Python backend using the Django and Django Rest Framework and Javascript using React.
The user creates and views measurements in one interface and visualizes the results in a second interface. The second interface stores the state of the users visualizations so that they can easily be recalled. A REST API is used to communicate with the database. When the user creates new measurements, a Django Signal is triggered that causes the results table to be updated.
Handling of Related Measurements
Creating a view of database state in React is not particularly novel. However allowing a user to create measurements that depend on other measurements required more cleverness and this section describes a little bit about how they are handled.
The first step was defining how the frontend would communicate these measurements. This could have been a dedicated SQL database structure but to simplify implementation I decided to use a string syntax to define related measurements. Here is an example of the syntax:
{p33m97l+20}[Test Project:Book Cost]*.25+{p33m104l+20}[Other Project:Marketing Cost]
The curly braces are what’s parsed by the database but the square brackets contain more human-friendly text descriptions of what project and measurement are being specified. Inside the curly braces there is a project tag, measurement tag, and time offset code. The project and measurement tags are actually the keys to the project and measurement in the database. The time offset code allows the result to “lag” or “lead” the measurement. Now since writing this by hand would be a pretty big ask of the users, they can actually select these values in a modal and insert them into the measurement. Users can specify addition, subtraction, multiplication, division, exponents, and parentheses (but more to come on this later).
When a user saves a new measurement with this syntax a HTTP POST request is generated to the database. Django first creates a measurement configuration record:
#views.py
class MeasureViewSet(ModelViewSet):
serializer_class = MeasureSerializer
def get_queryset(self):
return Measure.objects.filter(project=self.kwargs['project_pk'])
def get_serializer_context(self):
return {'project_id': self.kwargs['project_pk'], "request_data": self.request.data}
A measurement is described by several parameters which are a nested object inside the HTTP POST request. This requires a custom serializer to unpack and generate/update the nested models.
#serializers.py
class MeasureSerializer(serializers.ModelSerializer):
def create(self, validated_data):
parameter_data = validated_data.pop('parameters')
project_id = self.context['project_id']
measure = Measure.objects.create(
project_id=project_id, **validated_data)
for parameter in parameter_data:
Parameter.objects.create(measure=measure, **parameter)
return measure
def update(self, instance, validated_data):
if self.context['request_data'].get('parameters'):
parameter_data = self.context['request_data'].pop('parameters')
validated_parameter_data = validated_data.pop('parameters')
# note that we're using raw data from the request data and not the validated data
# when we go to update the nested object because we need the ID field which is stripped
# from validated data
parameter_dict = dict((i.id, i) for i in instance.parameters.all())
for item_data in parameter_data:
if 'id' in item_data:
# if exists id remove from the dict and update
parameter_item = parameter_dict.pop(item_data['id'])
# remove id from validated data as we don't require it.
item_data.pop('id')
# loop through the rest of keys in validated data to assign it to its respective field
for key in item_data.keys():
setattr(parameter_item, key, item_data[key])
parameter_item.save()
else:
# else create a new object
Parameter.objects.create(measure=instance, **item_data)
# delete remaining elements because they're not present in my update call
if len(parameter_dict) > 0:
for item in parameter_dict.values():
item.delete()
for attr, value in validated_data.items():
setattr(instance, attr, value)
instance.save()
return instance
parameters = ParameterSerializer(many=True, read_only=False)
After the measurement is created, a signal function is called that uses the measurement’s parameters to perform calculations of the results and updates a table of measurement results. This function is fairly hefty so I won’t reproduce it here, but I will discuss a few interesting aspects.
The first is what happens to parse the related expression string syntax and calculate the result. A series of regular expressions are used to pull the project, measure, and offset out of the expression string. Then the database queries for those results and puts the actual values into the expression string. Finally, since I’m avoiding writing my own parser, the python eval() method is used to act as the calculator (at some risk– I do pass an empty globals dictionary but the user could write some stupid stuff into the string that I’m not checking for).
Another key aspect is updated related expression measurements that depend on measurements that themselves have dependents. Suppose we have a measurement tree as follows:
Clearly the order of update will matter or else related measurements will use old data to generate. Before I wrote the function that handled this in the database I wrote a small python simulation and used the dependency scenario given in the diagram as test data. In writing it, I used dictionaries and functions like get_measure_by_id to make it clear where later I’d be using Django queries. My solution uses a tree data structure and recursion to cause the update to propagate through the tree in the correct order:
class measure_list():
def __init__(self) -> None:
self.data = [measure(1, [2, 3, 4]),
measure(2, [5]),
measure(3, []),
measure(4, []),
measure(5, [3, 6, 7, 8]),
measure(6, []),
measure(7, []),
measure(8, [3])]
def get_measure_by_id(self, id:int):
return list(filter(lambda x: x.get_id()==id,self.data))[0]
def get_dependents_by_id(self,id:int) -> list[int]:
return list(filter(lambda x: x.get_id()==id,self.data))[0].get_depends_on()
def tell_id_to_update(self,id:int):
measure = self.get_measure_by_id(id)
did_measure_update = measure.update()
return did_measure_update
class measure ():
def __init__(self, id: int, depends_on: list[int], ) -> None:
self.id = id
self.depends_on = depends_on
self.is_updated = False
def set_measure_list(self,measure_list: measure_list) -> None:
self.measure_list = measure_list
def get_id(self) -> int:
return self.id
def get_depends_on(self) -> list[int]:
return self.depends_on
def update(self):
if self.depends_on == []:
self.is_updated = True
print(str(self.get_id()) + "has updated")
return True
else:
results = []
for child in self.depends_on:
results.append(self.measure_list.tell_id_to_update(child))
self.is_updated = True
print(str(self.get_id()) + "has updated")
return True
my_measure_list = measure_list()
for measure in my_measure_list.data:
measure.set_measure_list(my_measure_list)
print(my_measure_list.tell_id_to_update(1))
Output:
3has updated
6has updated
7has updated
3has updated
8has updated
5has updated
2has updated
3has updated
4has updated
1has updated
True
If you look at the output the solution one inefficiency in this approach is that the child “leaf” nodes are told to update multiple times. It’s not harmful but it is slightly inefficient. I could store if the child has been updated and skip the update but in the actual Django implementation I didn’t want to have to manage the “child has been updated already on this update” state. So I’ve left it for now. I think it will be workable since the trees are not likely to be overly complex and the database updates are not particularly costly to perform.
Another problem in this implementation is that circular dependency is not handled. I think that would be easy enough to detect and trigger an error so I plan on fixing it eventually.
The code running on the database looks very similar to this prototype, just with functions like measure_query = Results.objects.filter(measure=measure_id).filter(date=date) used in place of the dictionary functions.
One final big assumption in the code is that since a measurement could have results with null date, a single date, or several dates, a dependent measurement will first look for a child measurement at the same date, then at null date, then it will just assume 0.
Summary
That was a whole lot of discussion of a backend to not really have much to show. I guess if it looked good it wouldn’t be the backend? Regardless I will post a chart to show that this actually works- I’m able to create, update, and delete measurements that then generate valid results in the database. In a future week I’ll share how the user actually makes these charts!