model.tdr module

Author: Dominic Hunt
class model.tdr.TDR(alpha=0.3, beta=4, tau=0.3, invBeta=None, expect=None, avReward=None, **kwargs)[source]

Bases: model.modelTemplate.Model

The TD-learning algorithm

Name

The name of the class used when recording what has been used.

Type: string
currAction

The current action chosen by the model. Used to pass the participant's action to the model when fitting.

Type: int
Parameters:
  • alpha (float, optional) – Learning rate parameter. Default 0.3
  • beta (float, optional) – Sensitivity parameter for probabilities. Default 4
  • invBeta (float, optional) – Inverse of sensitivity parameter. Defined as \(\frac{1}{\beta+1}\). Default 0.2
  • tau (float, optional) – Learning rate for the average reward. Default 0.3
  • number_actions (integer, optional) – The maximum number of valid actions the model can expect to receive. Default 2.
  • number_cues (integer, optional) – The initial maximum number of stimuli the model can expect to receive. Default 1.
  • number_critics (integer, optional) – The number of different reaction learning sets. Default number_actions*number_cues
  • action_codes (dict with string or int as keys and int values, optional) – A dictionary used to convert between the action references used by the task or dataset and references used in the models to describe the order in which the action information is stored.
  • prior (array of floats in [0, 1], optional) – The prior probability of the states being the correct one. Default ones((number_actions, number_cues)) / number_critics
  • expect (array of floats, optional) – The initialisation of the expected reward. Default ones((number_actions, number_cues)) * 5 / number_cues
  • stimFunc (function, optional) – The function that transforms the stimulus into a form the model can understand and a string to identify it later. Default is blankStim
  • rewFunc (function, optional) – The function that transforms the reward into a form the model can understand. Default is blankRew
  • decFunc (function, optional) – The function that takes the internal values of the model and turns them into a decision. Default is model.decision.discrete.weightProb
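
A minimal usage sketch, assuming the package is importable as model.tdr and that the keyword arguments behave as documented above; the values shown are the defaults from the constructor signature:

    from model.tdr import TDR

    # Instantiate the model, overriding the documented optional parameters
    # where the task requires it.
    model = TDR(alpha=0.3,
                beta=4,
                tau=0.3,
                number_actions=2,
                number_cues=1)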
actorStimulusProbs()[source]

Calculates in the model-appropriate way the probability of each action.

Returns: probabilities – The probabilities associated with the action choices
Return type: 1D ndArray of floats
calcProbabilities(actionValues)[source]

Calculate the probabilities associated with the actions

Parameters: actionValues (1D ndArray of floats) – The action values to be converted into probabilities
Returns: probArray – The probabilities associated with the actionValues
Return type: 1D ndArray of floats
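
The transformation itself is not specified on this page; one common way to turn action values into choice probabilities with a sensitivity parameter beta is the softmax sketched below. This is an illustrative assumption, not necessarily the computation used by calcProbabilities:

    import numpy as np

    def softmax_probabilities(action_values, beta):
        # Scale the action values by the sensitivity parameter; subtracting
        # the maximum keeps the exponentials numerically stable.
        scaled = beta * np.asarray(action_values, dtype=float)
        scaled -= scaled.max()
        exp_values = np.exp(scaled)
        return exp_values / exp_values.sum()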
delta(reward, expectation, action, stimuli)[source]

Calculates the comparison between the reward and the expectation

Parameters:
  • reward (float) – The reward value
  • expectation (float) – The expected reward value
  • action (int) – The chosen action
  • stimuli ({int | float | tuple | None}) – The stimuli received
Returns: delta – The difference between the reward and the expectation
Return type: float
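
updateModel below describes delta as the difference between the reward and the expected reward; because the class also carries an average-reward term (avReward, with learning rate tau), models of this family often compute the prediction error relative to that baseline. A sketch under that assumption (the exact form used by this module is not documented here):

    def td_delta(reward, expectation, av_reward=0.0):
        # TD prediction error with an average-reward baseline subtracted,
        # as in R-learning style models (an assumption, not confirmed here).
        return reward - av_reward - expectation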

lastChoiceReinforcement()[source]

Allows the model to update its expectations once the action has been chosen.

returnTaskState()[source]

Returns all the relevant data for this model

Returns: results – The dictionary contains a series of keys including Name, Probabilities, Actions and Events.
Return type: dict
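
Continuing the instantiation sketch above, the recorded quantities can be read straight from the returned dictionary (only the keys listed above are guaranteed by this description):

    results = model.returnTaskState()
    print(results["Name"])                    # the class name used when recording
    probabilities = results["Probabilities"]
    actions = results["Actions"]
    events = results["Events"]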
rewardExpectation(observation)[source]

Calculate the estimated reward based on the action and stimuli

This contains parts that are task dependent

Parameters: observation ({int | float | tuple}) – The set of stimuli
Returns:
  • actionExpectations (array of floats) – The expected rewards for each action
  • stimuli (list of floats) – The processed observations
  • activeStimuli (list of [0, 1] mapping to [False, True]) – A list of the stimuli that were or were not present
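
The return values suggest that the expected reward for each action combines the stored expectations with the processed stimuli. A sketch of one plausible reading, assuming expect is an (number_actions, number_cues) array as initialised above (not necessarily the module's exact computation):

    import numpy as np

    def action_expectations(expect, stimuli):
        # Weight each cue's stored expectation by the processed stimulus
        # values and sum over cues, giving one expected reward per action.
        return np.asarray(expect, dtype=float) @ np.asarray(stimuli, dtype=float)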
storeState()[source]

Stores the state of all the important variables so that they can be accessed later

updateModel(delta, action, stimuli, stimuliFilter)[source]
Parameters:
  • delta (float) – The difference between the reward and the expected reward
  • action (int) – The action chosen by the model in this trialstep
  • stimuli (list of float) – The weights of the different stimuli in this trialstep
  • stimuliFilter (list of bool) – A list describing if a stimulus cue is present in this trialstep
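
A sketch of how such an update is commonly applied, assuming the chosen action's expectation is nudged by alpha * delta on the cues that were present and the average reward is tracked with learning rate tau; this is an assumption consistent with the parameters above, not a documented formula:

    import numpy as np

    def update_expectations(expect, av_reward, delta, action, stimuli,
                            stimuli_filter, alpha=0.3, tau=0.3):
        # Move the chosen action's expectation towards the observed outcome,
        # but only for the cues that were active on this trialstep.
        expect = np.asarray(expect, dtype=float).copy()
        active = np.asarray(stimuli_filter, dtype=bool)
        weights = np.asarray(stimuli, dtype=float)[active]
        expect[action, active] += alpha * delta * weights

        # Track the running average reward with its own learning rate tau
        # (assumed form).
        av_reward += tau * delta
        return expect, av_reward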