DailyProjectData Queries are hard and complicated

I've been spending my night time internet time working on SQL queries that will return records with the DailyProjectData (DPD) information conforming to the DPD REST API. It's been a pretty rough road just getting the queries to work correctly. Now that I have it working, I have to work on the performance of the queries. I know that I wrote a query that performs badly, but just getting it to work was pretty tough since I'm a noob with SQL.

Here is the Code Issue Query that I wrote that returns the amount of code issues for all tools between a specific timeframe for all data. If you want to get the data for a specific project, there needs to be a sensordata.resource clause in the query. The records that are returned have columns with the type of code issue, the latest runtime, tool, and the code issue total:


SELECT issue_type, tool_runtime, tool, sum(cast(issue_count as integer)) from (

-- This sub query gets all of the records without grouping the issue counts
SELECT sensordata_properties.key AS issue_type, sensordata.runtime
AS tool_runtime, sensordata.tool, value AS issue_count FROM "sensordata"
INNER JOIN sensordata_properties ON (sensordata_properties.sensordata_id = sensordata.id) WHERE (sdt_id = (select id FROM SensorDataType where name ~* 'CodeIssue')
AND tstamp > '2008-08-012T00:00:00.000' AND tstamp < '2008-08-13T00:00:00' AND sensordata_properties.key LIKE 'Type_%' AND sensordata.runtime IN (

-- This subquery gets the latest runtime

SELECT latest_runtime FROM (select max(sensordata.runtime) AS latest_runtime, sensordata.tool FROM "sensordata" INNER JOIN sensordata_properties
ON (sensordata_properties.sensordata_id = sensordata.id) WHERE (sdt_id = (select id
FROM SensorDataType WHERE name ~* 'CodeIssue') AND tstamp > '2008-08-012T00:00:00.000' AND tstamp < '2008-08-13T00:00:00' AND sensordata_properties.key LIKE 'Type_%') GROUP BY tool) AS latest_runtime_query))

GROUP BY sensordata.tool, runtime, sensordata_properties.key, value ORDER BY key) AS issue_sum_query GROUP BY issue_type, tool_runtime, tool ORDER BY issue_type

Phew. Thats a big query. I need to figure out how to refactor the query so I don't repeat the same sub query twice. These queries are currently being using in rails with the find_by_sql method, but James told me to check out the :includes directive because it is very powerful and can handle what I'm trying to do without using specific SQL

The good news is that our DPD service implementation is compatible with the Wicket ProjectBrowser that the CSDL team is working on. Right now we can retrieve the raw SensorData and Build DPD summaries. Hopefully, I'll be able to get the rest of DPD queries working and we can start using the ProjectBrowser at work.