Carolyn Meinel's Research Project (continued #2)

Meinel and Team Ckar captain Christopher Karvetski used IARPA's HFC MTurk workers to extract information about forecasting accuracy. Together we built a dynamically updated spreadsheet to manage our more than 40 forecasting models, which ingested forecasts and rationales from ~500 MTurkers and from a team of human forecasters with known high abilities. Every morning it pulled the latest MTurk data from the IARPA API in JSON format, converted it to comma-separated values (CSV) format, and inserted it into our spreadsheet. It then updated immediately whenever one of the non-MTurk humans added a forecast. Karvetski did the lion's share of this work, for example calculating the Brier scores of all MTurkers as each question was scored. Meinel's contributions included using her forecasting skills to adjust weights and other settings, which Karvetski then used to tweak the models until their outputs looked "good."
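As a rough illustration of the JSON-to-CSV step and the per-forecaster Brier scoring described above, here is a minimal Python sketch. It is not Team Ckar's actual code, which lived in a spreadsheet. The record fields (`forecaster`, `question`, `probs`) and the function names are hypothetical, and the score used is the original multi-category Brier definition (sum of squared errors over all answer options, ranging from 0 to 2), which is an assumption about the scoring rule.

```python
import csv
import io
import json
from collections import defaultdict


def json_to_csv(json_text):
    """Flatten a JSON list of forecast records into CSV text.

    The field names are hypothetical; the real IARPA API schema is not
    given in the text above.
    """
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["forecaster", "question", "probs"])
    for rec in records:
        # Store the probability vector in one cell, separated by ";".
        probs = ";".join(str(p) for p in rec["probs"])
        writer.writerow([rec["forecaster"], rec["question"], probs])
    return out.getvalue()


def brier_score(probs, outcome_index):
    """Multi-category Brier score: 0 is perfect, 2 is the worst possible."""
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def mean_brier_by_forecaster(csv_text, resolved):
    """Average Brier score per forecaster over resolved questions only.

    `resolved` maps a question id to the index of the answer that occurred.
    """
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        question = row["question"]
        if question not in resolved:
            continue  # question not scored yet
        probs = [float(p) for p in row["probs"].split(";")]
        scores[row["forecaster"]].append(brier_score(probs, resolved[question]))
    return {f: sum(s) / len(s) for f, s in scores.items()}
```

Calling `mean_brier_by_forecaster` again each time a question resolves would refresh the leaderboard, which is the kind of running update the paragraph above describes.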

We devoted the first half of the competition to testing many models, including some built specifically for time series data. All competitors were allocated a maximum of forty models at any given time. We tested more than forty in total by swapping models out for new ones and by adjusting the weights or inputs of existing ones. This included modeling the performance of the MTurk forecasters, both from the evolution of their Brier scores and by using semantic search and some other NLP models to evaluate their rationales. Then in mid-August we began going for the highest scores possible because of the lure of the prize money. At that time we turned on our top-level forecasting model, which drew on inputs from our other models and added a human supervision layer (column J).
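The text does not say how the top-level model combined its inputs, so the following Python sketch only illustrates one common approach: a weighted average of the other models' probability vectors, followed by an optional human override standing in for the supervision layer. The function names, the weighting scheme, and the override behavior are all assumptions, not the team's documented method.

```python
def combine_forecasts(model_probs, weights):
    """Weighted average of per-model probability vectors.

    model_probs: dict of model name -> list of probabilities (same length).
    weights:     dict of model name -> non-negative weight.
    Both structures are hypothetical stand-ins for spreadsheet columns.
    """
    total = sum(weights[name] for name in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_options = len(next(iter(model_probs.values())))
    combined = [0.0] * n_options
    for name, probs in model_probs.items():
        if len(probs) != n_options:
            raise ValueError(f"model {name!r} has the wrong number of options")
        share = weights[name] / total
        for i, p in enumerate(probs):
            combined[i] += share * p
    # Renormalize so the result is a proper probability vector.
    norm = sum(combined)
    return [c / norm for c in combined]


def supervised_forecast(combined, human_override=None):
    """Return the human's forecast if one was entered, else the model's."""
    return list(human_override) if human_override is not None else combined
```

For example, `combine_forecasts({"a": [0.5, 0.5], "b": [1.0, 0.0]}, {"a": 1, "b": 1})` gives equal say to both models. Raising one weight is the kind of adjustment described above as tweaking models until their outputs looked "good."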

Fig. 1 below shows a screenshot of a sample instance. The "Forecast_validation" and "calculation_column" columns are for error checking. When validating, any non-zero number calls for an investigation (Meinel's job; that's what she gets for being an industrial engineer). Calculations for Column W, "Super_turker_forecast", are illustrated in Figure 2 below.

Ckar forecasting spreadsheet of 11-4-2019

Figure 1: The Team Ckar forecasting system. It illustrates how spreadsheets can be used for surprisingly sophisticated forms of machine learning. (And we don't have to fear spreadsheets becoming an existential risk.)
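The exact formula behind the error-checking columns is not given here. One plausible check, sketched below in Python purely as an assumption, is that each forecast's probabilities sum to 1, so the validation cell holds the residual and anything non-zero gets flagged. The function names and the row layout are hypothetical.

```python
def forecast_validation(probs):
    """Residual of the probability sum; 0 means the row passes.

    Assumed check only: the spreadsheet's real formula is not documented
    in the text. Rounded to 6 places to absorb floating-point noise.
    """
    return round(sum(probs) - 1.0, 6)


def rows_needing_review(rows):
    """Return the ids of rows whose validation value is non-zero.

    rows: dict of row id -> list of probabilities (hypothetical layout).
    """
    return [row_id for row_id, probs in rows.items()
            if forecast_validation(probs) != 0]
```

A row such as `[0.5, 0.6]` would be flagged with a residual of about 0.1, while `[0.2, 0.3, 0.5]` would pass.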

Figure 2 below shows the calculations, done by Meinel and Karvetski, that went into Column W above. This leaderboard includes ~500 MTurkers (the number changed slightly over time).

MTurk leaderboard with superTurkers identified.
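How superTurkers were selected and how their forecasts fed Column W is shown only in the figure, so the Python sketch below is a guess at the general shape: rank workers by mean Brier score (lower is better), keep the top few, and average their forecasts on a question. The cutoff `top_n`, the unweighted mean, and all names are assumptions.

```python
def pick_super_turkers(mean_brier, top_n):
    """Return the top_n workers with the lowest mean Brier score.

    mean_brier: dict of worker id -> mean Brier score (hypothetical input).
    Ties are broken by worker id so the result is deterministic.
    """
    ranked = sorted(mean_brier, key=lambda w: (mean_brier[w], w))
    return ranked[:top_n]


def super_turker_forecast(forecasts, super_turkers):
    """Unweighted mean of the superTurkers' probability vectors.

    forecasts: dict of worker id -> list of probabilities for one question.
    Returns None if no superTurker has forecast this question.
    """
    rows = [forecasts[w] for w in super_turkers if w in forecasts]
    if not rows:
        return None
    n_options = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(n_options)]
```

A weighted mean that favors the best-scoring workers would be a natural variation, but nothing in the text confirms which form was used.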

Fig. 3 below: Forecasting accuracies of the best model per team. Our team, led by Karvetski, is Ckar. IFP is "Individual Forecasting Problem."

Final Leaderboard, all models, all competitors, GFC II

Result:


© 2024 Carolyn Meinel. All rights reserved.