Friday, December 12, 2008
TSP algorithm, using hidden variable solution
I came up with an approach to the Traveling Salesman Problem by imagining two travelers—one the traditional traveler moving forward in time long the optimum path, and a second backwards traveler moving backwards through time through the optimal path.
The two travelers meet and the solution collapses to a single traveler moving forward in time along the optimum path, while the algorithm works by minimizing the total weighted distance between the two travelers at each step.
I went ahead and generalized beyond the traditional TSP problem to a general solution for node and a traveler which allows moving to one node more than once, allowing what I call hubs, and nodes not reachable from all nodes.
The complete algorithm follows.
Given two travelers T_1 and T_2 and m nodes N, where m is a natural number, each traveler can be at a particular node, and each node has a distance from every other node in the space. However, it is not assumed that each node is reachable from every other node, but that at least one path exists between each node and one other node.
If no distance information is given then the travelers are assumed to be in a m-1 dimensional space with each node the same unit distance from every other node.
There is also a weight associated with the path from each node to another, which in general is considered to be cost. But it can be that or as many things as desired.
For instance, if it takes $200 U.S. to go from N_1 to N_2, then that cost is what's used for a traveler at N_1 considering going to N_2.
Assume T_1 is at N_1 and T_2 is at N_m. For the first iteration T_1 considers moving to N_2 which is reachable, and then T_2 considers moving from each reachable node in turn EXCEPT N_1 or N_2 as it excludes the node that T_1 is already at, and also excludes the one being considered.
For each potential move T_2 calculates the cost from that node to N_m, as T_2 is moving backwards in time, which I'll call cost_2, and then it calculates the straight line distance (a unit for the distance normalized algorithm) to T_1 at N_2--not at N_1, as it is checking where T_1 will be—and it multiplies the cost for that move times the cost of the move T_1 is considering, which I'll call cost_1, times the distance to T_1, and stores that data.
After T_2 has gone through every possible move, it simply takes the one that is smallest cost_1*cost_2*distance, and stores that info. If there are more weights, like time, it'd multiply time_1*time_2*cost_1*cost_2*distance, and so forth with as many multiplications as there are weights.
Now T_1 considers moving to another reachable node, like N_3 if there is a path from it current node to that one, and T_2 does the same calculation again, and stores the smallest cost_1*cost_2*distance.
This process continues until all possible moves by T_1 are done, and now T_1 has a set of cost_1*cost_2*distance values from the calculations by T_2, and selects the smallest one. If there is more than one path with the least value then it doesn't matter which is taken, so the first can be taken.
Now both T_1 and T_2 actually move to their respective nodes given by that smallest value.
Now the paths from nodes N_1 and N_m to the chosen nodes in the direction moved is removed from consideration. However, the reverse path for each is still available.
And you have a complete iteration.
Note that you have a maximum (m-2)^2 checks, so the algorithm is polynomial time.
If the travelers start at different nodes and every node is reachable from every other node, but you still have nearly the same number after the first iteration as now only paths are being removed and not nodes, but the algorithm is still polynomial time.
T_1 and T_2 continue until there are no more paths available available or there is only one node available.
Note that in the latter case then EITHER can move to complete, so you can arbitrarily say that T_1 moves forward in that case.
And that is how every node is reached, even if some nodes are not reachable from certain other nodes.
To have a complete circle back to a starting point, you have both T_1 and T_2 start at the same node, like N_1.
The cost for a round trip path with a starting point from each node in turn is stored with the optimum path being the one with the minimum total cost.
Notice with this algorithm you may go to a node multiple times, and a well-visited node is called a hub.
If the optimum path from each node is the same then the graph is said to be perfectly correlated, and corresponds with a case where the costs correlate very well with the distance between nodes.
If only one path is optimal then the graph is said to be dis-correlated, and that corresponds to a case where costs have little or no relation to distances between nodes.
The percentage correlation can be found then by dividing the number of starting nodes that give an optimal path by the total number of nodes, where a graph for which there is a high percentage correlation is said to be a rational graph, where it seems to me that a 67% correlation or higher would qualify.
That could have real world relevance for determining, say, whether or not prices along paths correlate well between measures along the path, or it might reveal problem spots, where, for instance, there is a high cost associated with an otherwise desirable path, which, for instance, might help a city plan places for new roads.
The two travelers meet and the solution collapses to a single traveler moving forward in time along the optimum path, while the algorithm works by minimizing the total weighted distance between the two travelers at each step.
I went ahead and generalized beyond the traditional TSP problem to a general solution for node and a traveler which allows moving to one node more than once, allowing what I call hubs, and nodes not reachable from all nodes.
The complete algorithm follows.
Given two travelers T_1 and T_2 and m nodes N, where m is a natural number, each traveler can be at a particular node, and each node has a distance from every other node in the space. However, it is not assumed that each node is reachable from every other node, but that at least one path exists between each node and one other node.
If no distance information is given then the travelers are assumed to be in a m-1 dimensional space with each node the same unit distance from every other node.
There is also a weight associated with the path from each node to another, which in general is considered to be cost. But it can be that or as many things as desired.
For instance, if it takes $200 U.S. to go from N_1 to N_2, then that cost is what's used for a traveler at N_1 considering going to N_2.
Assume T_1 is at N_1 and T_2 is at N_m. For the first iteration T_1 considers moving to N_2 which is reachable, and then T_2 considers moving from each reachable node in turn EXCEPT N_1 or N_2 as it excludes the node that T_1 is already at, and also excludes the one being considered.
For each potential move T_2 calculates the cost from that node to N_m, as T_2 is moving backwards in time, which I'll call cost_2, and then it calculates the straight line distance (a unit for the distance normalized algorithm) to T_1 at N_2--not at N_1, as it is checking where T_1 will be—and it multiplies the cost for that move times the cost of the move T_1 is considering, which I'll call cost_1, times the distance to T_1, and stores that data.
After T_2 has gone through every possible move, it simply takes the one that is smallest cost_1*cost_2*distance, and stores that info. If there are more weights, like time, it'd multiply time_1*time_2*cost_1*cost_2*distance, and so forth with as many multiplications as there are weights.
Now T_1 considers moving to another reachable node, like N_3 if there is a path from it current node to that one, and T_2 does the same calculation again, and stores the smallest cost_1*cost_2*distance.
This process continues until all possible moves by T_1 are done, and now T_1 has a set of cost_1*cost_2*distance values from the calculations by T_2, and selects the smallest one. If there is more than one path with the least value then it doesn't matter which is taken, so the first can be taken.
Now both T_1 and T_2 actually move to their respective nodes given by that smallest value.
Now the paths from nodes N_1 and N_m to the chosen nodes in the direction moved is removed from consideration. However, the reverse path for each is still available.
And you have a complete iteration.
Note that you have a maximum (m-2)^2 checks, so the algorithm is polynomial time.
If the travelers start at different nodes and every node is reachable from every other node, but you still have nearly the same number after the first iteration as now only paths are being removed and not nodes, but the algorithm is still polynomial time.
T_1 and T_2 continue until there are no more paths available available or there is only one node available.
Note that in the latter case then EITHER can move to complete, so you can arbitrarily say that T_1 moves forward in that case.
And that is how every node is reached, even if some nodes are not reachable from certain other nodes.
To have a complete circle back to a starting point, you have both T_1 and T_2 start at the same node, like N_1.
The cost for a round trip path with a starting point from each node in turn is stored with the optimum path being the one with the minimum total cost.
Notice with this algorithm you may go to a node multiple times, and a well-visited node is called a hub.
If the optimum path from each node is the same then the graph is said to be perfectly correlated, and corresponds with a case where the costs correlate very well with the distance between nodes.
If only one path is optimal then the graph is said to be dis-correlated, and that corresponds to a case where costs have little or no relation to distances between nodes.
The percentage correlation can be found then by dividing the number of starting nodes that give an optimal path by the total number of nodes, where a graph for which there is a high percentage correlation is said to be a rational graph, where it seems to me that a 67% correlation or higher would qualify.
That could have real world relevance for determining, say, whether or not prices along paths correlate well between measures along the path, or it might reveal problem spots, where, for instance, there is a high cost associated with an otherwise desirable path, which, for instance, might help a city plan places for new roads.